commits
Pull iommu fix from Joerg Roedel:
- Fix device mask to catch all affected devices in the recently added
quirk for QAT devices in the Intel VT-d driver.
* tag 'iommu-fix-v6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix buggy QAT device mask
Pull misc fixes from Andrew Morton:
"Nine hotfixes.
Six for MM, three for other areas. Four of these patches address
post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
memcg: fix possible use-after-free in memcg_write_event_control()
MAINTAINERS: update Muchun Song's email
mm/gup: fix gup_pud_range() for dax
mmap: fix do_brk_flags() modifying obviously incorrect VMAs
mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
tmpfs: fix data loss from failed fallocate
kselftests: cgroup: update kmem test precision tolerance
mm: do not BUG_ON missing brk mapping, because userspace can unmap it
mailmap: update Matti Vaittinen's email address
Impacted QAT device IDs that need extra dtlb flush quirk is ranging
from 0x4940 to 0x4943. After bitwise AND device ID with 0xfffc the
result should be 0x4940 instead of 0x494c to identify these devices.
Fixes: e65a6897be5e ("iommu/vt-d: Add a fix for devices need extra dtlb flush")
Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20221203005610.2927487-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull ARM fix from Russell King:
"One further ARM fix for 6.1 from Wang Kefeng, fixing up the handling
for kfence faults"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9278/1: kfence: only handle translation faults
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call. As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file. Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.
Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses. The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently dropped
the file type check with it allowing any file to slip through. With the
invarients broken, the d_name and parent accesses can now race against
renames and removals of arbitrary files and cause use-after-free's.
Fix the bug by resurrecting the file type check in __file_cft(). Now that
cgroupfs is implemented through kernfs, checking the file operations needs
to go through a layer of indirection. Instead, let's check the superblock
and dentry type.
Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org
Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org> [3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull media fix from Mauro Carvalho Chehab:
"A v4l-core fix related to validating DV timings related to video
blanking values"
* tag 'media/v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: v4l2-dv-timings.c: fix too strict blanking sanity checks
This is a similar fixup like arm64 does, only handle translation faults
in case of unexpected kfence report when alignment faults on ARM, see
more from commit 0bb1fbffc631 ("arm64: mm: kfence: only handle translation
faults").
Fixes: 75969686ec0d ("ARM: 9166/1: Support KFENCE for ARM")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
I'm moving to the @linux.dev account. Map my old addresses and update it
to my new address.
Link: https://lkml.kernel.org/r/20221208115548.85244-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This reverts commit f35b5d7d676e59e401690b678cd3cfec5e785c23.
It has been reported to cause huge performance regressions on some loads
(will-it-scale.per_process_ops, but also building the kernel with
clang).
The commit did speed up gcc builds by a small amount, so it's not an
unambiguous regression, but until the big regressions are understood,
let's revert it.
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210181535.7144dd15-yujie.liu@intel.com
Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%2BrGce@dev-arch.thelio-3990X/
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ARM SoC fix from Arnd Bergmann:
"One more last minute revert for a boot regression that was found on
the popular colibri-imx7"
* tag 'soc-fixes-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
Revert "ARM: dts: imx7: Fix NAND controller size-cells"
Sanity checks were added to verify the v4l2_bt_timings blanking fields
in order to avoid integer overflows when userspace passes weird values.
But that assumed that userspace would correctly fill in the front porch,
backporch and sync values, but sometimes all you know is the total
blanking, which is then assigned to just one of these fields.
And that can fail with these checks.
So instead set a maximum for the total horizontal and vertical
blanking and check that each field remains below that.
That is still sufficient to avoid integer overflows, but it also
allows for more flexibility in how userspace fills in these fields.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 4b6d66a45ed3 ("media: v4l2-dv-timings: add sanity checks for blanking values")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to
```
virt_to_page(0)
```
that in order expands to:
```
pfn_to_page(virt_to_pfn(0))
```
and then virt_to_pfn(0) to:
```
((((unsigned long)(0) - PAGE_OFFSET) >> PAGE_SHIFT) +
PHYS_PFN_OFFSET)
```
where PAGE_OFFSET and PHYS_PFN_OFFSET are the DRAM offset(0x80000000) and
PAGE_SHIFT is 12. This way we obtain 16MB(0x01000000) summed to the base of
DRAM(0x80000000).
When ZERO_PAGE(0) is then used, for example in bio_add_page(), the page
gets an address that is out of DRAM bounds.
So instead of using fake virtual page 0 let's allocate a dedicated
zero_page during paging_init() and assign it to a global 'struct page *
empty_zero_page' the same way mmu.c does and it's the same approach used
in m68k with commit dc068f462179 as discussed here[0]. Then let's move
ZERO_PAGE() definition to the top of pgtable.h to be in common between
mmu.c and nommu.c.
[0]: https://lore.kernel.org/linux-m68k/2a462b23-5b8e-bbf4-ec7d-778434a3b9d7@google.com/T/#m1266ceb63
ad140743174d6b3070364d3c9a5179b
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
For dax pud, pud_huge() returns true on x86. So the function works as long
as hugetlb is configured. However, dax doesn't depend on hugetlb.
Commit 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax") fixed
devmap-backed huge PMDs, but missed devmap-backed huge PUDs. Fix this as
well.
This fixes the below kernel panic:
general protection fault, probably for non-canonical address 0x69e7c000cc478: 0000 [#1] SMP
< snip >
Call Trace:
<TASK>
get_user_pages_fast+0x1f/0x40
iov_iter_get_pages+0xc6/0x3b0
? mempool_alloc+0x5d/0x170
bio_iov_iter_get_pages+0x82/0x4e0
? bvec_alloc+0x91/0xc0
? bio_alloc_bioset+0x19a/0x2a0
blkdev_direct_IO+0x282/0x480
? __io_complete_rw_common+0xc0/0xc0
? filemap_range_has_page+0x82/0xc0
generic_file_direct_write+0x9d/0x1a0
? inode_update_time+0x24/0x30
__generic_file_write_iter+0xbd/0x1e0
blkdev_write_iter+0xb4/0x150
? io_import_iovec+0x8d/0x340
io_write+0xf9/0x300
io_issue_sqe+0x3c3/0x1d30
? sysvec_reschedule_ipi+0x6c/0x80
__io_queue_sqe+0x33/0x240
? fget+0x76/0xa0
io_submit_sqes+0xe6a/0x18d0
? __fget_light+0xd1/0x100
__x64_sys_io_uring_enter+0x199/0x880
? __context_tracking_enter+0x1f/0x70
? irqentry_exit_to_user_mode+0x24/0x30
? irqentry_exit+0x1d/0x30
? __context_tracking_exit+0xe/0x70
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7fc97c11a7be
< snip >
</TASK>
---[ end trace 48b2e0e67debcaeb ]---
RIP: 0010:internal_get_user_pages_fast+0x340/0x990
< snip >
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
Link: https://lkml.kernel.org/r/1670392853-28252-1-git-send-email-ssengar@linux.microsoft.com
Fixes: 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax")
Signed-off-by: John Starks <jostarks@microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently tpm transactions are executed unconditionally in
tpm_pm_suspend() function, which may lead to races with other tpm
accessors in the system.
Specifically, the hw_random tpm driver makes use of tpm_get_random(),
and this function is called in a loop from a kthread, which means it's
not frozen alongside userspace, and so can race with the work done
during system suspend:
tpm tpm0: tpm_transmit: tpm_recv: error -52
tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics
CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
Call Trace:
tpm_tis_status.cold+0x19/0x20
tpm_transmit+0x13b/0x390
tpm_transmit_cmd+0x20/0x80
tpm1_pm_suspend+0xa6/0x110
tpm_pm_suspend+0x53/0x80
__pnp_bus_suspend+0x35/0xe0
__device_suspend+0x10f/0x350
Fix this by calling tpm_try_get_ops(), which itself is a wrapper around
tpm_chip_start(), but takes the appropriate mutex.
Signed-off-by: Jan Dabros <jsd@semihalf.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/all/c5ba47ef-393f-1fba-30bd-1230d1b4b592@suse.cz/
Cc: stable@vger.kernel.org
Fixes: e891db1a18bf ("tpm: turn on TPM on suspend for TPM 1.x")
[Jason: reworked commit message, added metadata]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm fixes from Dave Airlie:
"Last set of fixes for final, scattered bunch of fixes, two amdgpu, one
vmwgfx, and some misc others.
amdgpu:
- S0ix fix
- DCN 3.2 array out of bounds fix
shmem:
- Fixes to shmem-helper error paths
bridge:
- Fix polarity bug in bridge/ti-sn65dsi86
dw-hdmi:
- Prefer 8-bit RGB fallback before any YUV mode in dw-hdmi, since
some panels lie about YUV support
vmwgfx:
- Stop using screen objects when SEV is active"
* tag 'drm-fixes-2022-12-09' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: fix array index out of bound error in DCN32 DML
drm/amdgpu/sdma_v4_0: turn off SDMA ring buffer in the s2idle suspend
drm/vmwgfx: Don't use screen objects when SEV is active
drm/shmem-helper: Avoid vm_open error paths
drm/shmem-helper: Remove errant put in error path
drm: bridge: dw_hdmi: fix preference of RGB modes over YUV420
drm/bridge: ti-sn65dsi86: Fix output polarity setting bug
drm/vmwgfx: Fix race issue calling pin_user_pages
This reverts commit 753395ea1e45c724150070b5785900b6a44bd5fb.
It introduced a boot regression on colibri-imx7, and potentially any
other i.MX7 boards with MTD partition list generated into the fdt by
U-Boot.
While the commit we are reverting here is not obviously wrong, it fixes
only a dt binding checker warning that is non-functional, while it
introduces a boot regression and there is no obvious fix ready.
Fixes: 753395ea1e45 ("ARM: dts: imx7: Fix NAND controller size-cells")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/Y4dgBTGNWpM6SQXI@francesco-nb.int.toradex.com/
Link: https://lore.kernel.org/all/20221205144917.6514168a@xps-13/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The example on how to use and test Capture Overlay specified
the wrong video device node. Back in 2015 the loop_video control
moved from the output device to the capture device, but this
example code is still referring to the output video device.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Store the frame address where arm_get_current_stackframe() looks for it
(ARM_r7 instead of ARM_fp if CONFIG_THUMB2_KERNEL=y). Otherwise frame->fp
gets set to 0, causing unwind_frame() to fail.
# bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }'
Attaching 1 probe...
@[
__schedule+1059
]: 1
A typical first unwind instruction is 0x97 (SP = R7), so after executing
it SP ends up being 0 and -URC_FAILURE is returned.
unwind_frame(pc = ac9da7d7 lr = 00000000 sp = c69bdda0 fp = 00000000)
unwind_find_idx(ac9da7d7)
unwind_exec_insn: insn = 00000097
unwind_exec_insn: fp = 00000000 sp = 00000000 lr = 00000000 pc = 00000000
With this patch:
# bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }'
Attaching 1 probe...
@[
__schedule+1059
__schedule+1059
schedule+79
schedule_hrtimeout_range_clock+163
schedule_hrtimeout_range+17
ep_poll+471
SyS_epoll_wait+111
sys_epoll_pwait+231
__ret_fast_syscall+1
]: 1
Link: https://lore.kernel.org/r/20220920230728.2617421-1-tnovak@fb.com/
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Tomislav Novak <tnovak@fb.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Add more sanity checks to the VMA that do_brk_flags() will expand. Ensure
the VMA matches basic merge requirements within the function before
calling can_vma_merge_after().
Drop the duplicate checks from vm_brk_flags() since they will be enforced
later.
The old code would expand file VMAs on brk(), which is functionally
wrong and also dangerous in terms of locking because the brk() path
isn't designed for file VMAs and therefore doesn't lock the file
mapping. Checking can_vma_merge_after() ensures that new anonymous
VMAs can't be merged into file VMAs.
See https://lore.kernel.org/linux-mm/CAG48ez1tJZTOjS_FjRZhvtDA-STFmdw8PEizPDwMGFd_ui0Nrw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20221205192304.1957418-1-Liam.Howlett@oracle.com
Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull perf fix from Borislav Petkov:
- Fix a use-after-free case where the perf pending task callback would
see an already freed event
* tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix perf_pending_task() UaF
Pull block fix from Jens Axboe:
"A small fix for initializing the NVMe quirks before initializing the
subsystem"
* tag 'block-6.1-2022-12-08' of git://git.kernel.dk/linux:
nvme initialize core quirks before calling nvme_init_subsystem
drm-misc-fixes for v6.1 final?:
- Fix polarity bug in bridge/ti-sn65dsi86.
- Prefer 8-bit RGB fallback before any YUV mode in dw-hdmi, since some
panels lie about YUV support.
- Fixes to shmem-helper error paths.
- Small vmwgfx to stop using screen objects when SEV is active.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/8110f02d-d155-926e-8674-c88b806c3a3a@linux.intel.com
AT91 fixes for 6.1 #3
It contains:
- build fix for SAMA5D3 devices which don't have an L2 cache and due to this
accesssing outer_cache.write_sec in sama5_secure_cache_init() could throw
undefined reference to `outer_cache' if CONFIG_OUTER_CACHE is disabled
from common sama5_defconfig.
* tag 'at91-fixes-6.1-3' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: at91: fix build for SAMA5D3 w/o L2 cache
Link: https://lore.kernel.org/r/20221125093521.382105-1-claudiu.beznea@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
If node_types does not have video/vbi/meta inputs or outputs,
then set num_inputs/num_outputs to 0 instead of 1.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 0c90f649d2f5 (media: vivid: add vivid_create_queue() helper)
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
MT_MEMORY_RO is introduced by commit 598f0a99fa8a ("ARM: 9210/1:
Mark the FDT_FIXED sections as shareable"), which is a readonly
memory type for FDT area, but there are some different between
ARM_LPAE and non-ARM_LPAE, we need to setup PMD_SECT_AP2 and
L_PMD_SECT_RDONLY for MT_MEMORY_RO when ARM_LAPE enabled.
non-ARM_LPAE 0xff800000-0xffa00000 2M PGD KERNEL ro NX SHD
ARM_LPAE 0xff800000-0xffc00000 4M PMD RW NX SHD
ARM_LPAE+fix 0xff800000-0xffc00000 4M PMD ro NX SHD
Fixes: 598f0a99fa8a ("ARM: 9210/1: Mark the FDT_FIXED sections as shareable")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
We use "unsigned long" to store a PFN in the kernel and phys_addr_t to
store a physical address.
On a 64bit system, both are 64bit wide. However, on a 32bit system, the
latter might be 64bit wide. This is, for example, the case on x86 with
PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans
32bit.
The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses
that case, and assumes that the maximum PFN is limited by an 32bit
phys_addr_t. This implies, that SWP_PFN_BITS will currently only be able
to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong.
Let's rely on the number of bits in phys_addr_t instead, but make sure to
not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in
is_pfn_swap_entry() unhappy. Note that swp_entry_t is effectively an
unsigned long and the maximum swap offset shares that value with the swap
type.
For example, on an 8 GiB x86 PAE system with a kernel config based on
Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently
fail removing migration entries (remove_migration_ptes()), because
mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as
swp_offset_pfn() wrongly masks off PFN bits. For example,
split_huge_page_to_list()->...->remap_page() will leave migration entries
in place and continue to unlock the page.
Later, when we stumble over these migration entries (e.g., via
/proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these
migration entries shouldn't exist anymore and the page was unlocked.
[ 33.067591] kernel BUG at include/linux/swapops.h:497!
[ 33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G E 6.1.0-rc8+ #16
[ 33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[ 33.067606] EIP: pagemap_pmd_range+0x644/0x650
[ 33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31
[ 33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000
[ 33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68
[ 33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
[ 33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0
[ 33.067625] Call Trace:
[ 33.067628] ? madvise_free_pte_range+0x720/0x720
[ 33.067632] ? smaps_pte_range+0x4b0/0x4b0
[ 33.067634] walk_pgd_range+0x325/0x720
[ 33.067637] ? mt_find+0x1d6/0x3a0
[ 33.067641] ? mt_find+0x1d6/0x3a0
[ 33.067643] __walk_page_range+0x164/0x170
[ 33.067646] walk_page_range+0xf9/0x170
[ 33.067648] ? __kmem_cache_alloc_node+0x2a8/0x340
[ 33.067653] pagemap_read+0x124/0x280
[ 33.067658] ? default_llseek+0x101/0x160
[ 33.067662] ? smaps_account+0x1d0/0x1d0
[ 33.067664] vfs_read+0x90/0x290
[ 33.067667] ? do_madvise.part.0+0x24b/0x390
[ 33.067669] ? debug_smp_processor_id+0x12/0x20
[ 33.067673] ksys_pread64+0x58/0x90
[ 33.067675] __ia32_sys_ia32_pread64+0x1b/0x20
[ 33.067680] __do_fast_syscall_32+0x4c/0xc0
[ 33.067683] do_fast_syscall_32+0x29/0x60
[ 33.067686] do_SYSENTER_32+0x15/0x20
[ 33.067689] entry_SYSENTER_32+0x98/0xf1
Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it
readable and consistent.
[david@redhat.com: rely on sizeof(phys_addr_t) and min_t() instead]
Link: https://lkml.kernel.org/r/20221206105737.69478-1-david@redhat.com
[david@redhat.com: use "int" for comparison, as we're only comparing numbers < 64]
Link: https://lkml.kernel.org/r/1f157500-2676-7cef-a84e-9224ed64e540@redhat.com
Link: https://lkml.kernel.org/r/20221205150857.167583-1-david@redhat.com
Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull timer fix from Borislav Petkov:
- Revert a fix to RISC-V timers supposed to address an uncertainty
whether clock events are received during S3 or not which locks up
other RISC-V platforms. The issue will be fixed differently later.
* tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
Per syzbot it is possible for perf_pending_task() to run after the
event is free()'d. There are two related but distinct cases:
- the task_work was already queued before destroying the event;
- destroying the event itself queues the task_work.
The first cannot be solved using task_work_cancel() since
perf_release() itself might be called from a task_work (____fput),
which means the current->task_works list is already empty and
task_work_cancel() won't be able to find the perf_pending_task()
entry.
The simplest alternative is extending the perf_event lifetime to cover
the task_work.
The second is just silly, queueing a task_work while you know the
event is going away makes no sense and is easily avoided by
re-arranging how the event is marked STATE_DEAD and ensuring it goes
through STATE_OFF on the way down.
Reported-by: syzbot+9228d6098455bb209ec8@syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marco Elver <elver@google.com>
Pull io_uring fix from Jens Axboe:
"A single small fix for an issue related to ordering between
cancelation and current->io_uring teardown"
* tag 'io_uring-6.1-2022-12-08' of git://git.kernel.dk/linux:
io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
Pull NVMe fix from Christoph:
"nvme fixes for Linux 6.1
- initialize core quirks before calling nvme_init_subsystem
(Pankaj Raghav)"
* tag 'nvme-6.1-2022-12-07' of git://git.infradead.org/nvme:
nvme initialize core quirks before calling nvme_init_subsystem
amd-drm-fixes-6.1-2022-12-07:
amdgpu:
- S0ix fix
- DCN 3.2 array out of bounds fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221207222751.9558-1-alexander.deucher@amd.com
When SEV is enabled gmr's and mob's are explicitly disabled because
the encrypted system memory can not be used by the hypervisor.
The driver was disabling GMR's but the presentation code, which depends
on GMR's, wasn't honoring it which lead to black screen on hosts
with SEV enabled.
Make sure screen objects presentation is not used when guest memory
regions have been disabled to fix presentation on SEV enabled hosts.
Fixes: 3b0d6458c705 ("drm/vmwgfx: Refuse DMA operation when SEV encryption is active")
Cc: <stable@vger.kernel.org> # v5.7+
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reported-by: Nicholas Hunt <nhunt@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221201175341.491884-1-zack@kde.org
The L2 cache is present on the newer SAMA5D2 and SAMA5D4 families, but
apparently not for the older SAMA5D3.
Solves a build-time regression with the following symptom:
sama5.c:(.init.text+0x48): undefined reference to `outer_cache'
Fixes: 3b5a7ca7d252 ("ARM: at91: setup outer cache .write_sec() callback if needed")
Signed-off-by: Peter Rosin <peda@axentia.se>
[claudiu.beznea: delete "At least not always." from commit description]
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/b7f8dacc-5e1f-0eb2-188e-3ad9a9f7613d@axentia.se
>From what I can see, this is not needed. And since using it issues a 'deprecated'
warning, just drop it.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
After ARM supports p4d page tables, the pg_level for note_page()
in walk_pmd() should be 4, not 3, fix it.
Fixes: 84e6ffb2c49c ("arm: add support for folded p4d page tables")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Fix tmpfs data loss when the fallocate system call is interrupted by a
signal, or fails for some other reason. The partial folio handling in
shmem_undo_range() forgot to consider this unfalloc case, and was liable
to erase or truncate out data which had already been committed earlier.
It turns out that none of the partial folio handling there is appropriate
for the unfalloc case, which just wants to proceed to removal of whole
folios: which find_get_entries() provides, even when partially covered.
Original patch by Rui Wang.
Link: https://lore.kernel.org/linux-mm/33b85d82.7764.1842e9ab207.Coremail.chenguoqic@163.com/
Link: https://lkml.kernel.org/r/a5dac112-cf4b-7af-a33-f386e347fd38@google.com
Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Guoqi Chen <chenguoqic@163.com>
Link: https://lore.kernel.org/all/20221101032248.819360-1-kernel@hev.cc/
Cc: Rui Wang <kernel@hev.cc>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: <stable@vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull powerpc fixes from Michael Ellerman:
- Fix oops in 32-bit BPF tail call tests
- Add missing declaration for machine_check_early_boot()
Thanks to Christophe Leroy and Naveen N. Rao.
* tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Add missing declaration for machine_check_early_boot()
powerpc/bpf/32: Fix Oops on tail call tests
This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.
On the subject of suspend, the RISC-V SBI spec states:
This does not cover whether any given events actually reach the hart or
not, just what the hart will do if it receives an event. On PolarFire
SoC, and potentially other SiFive based implementations, events from the
RISC-V timer do reach a hart during suspend. This is not the case for the
implementation on the Allwinner D1 - there timer events are not received
during suspend.
To fix this, the CLOCK_EVT_FEAT_C3STOP (mis)feature was enabled for the
timer driver - but this has broken both RCU stall detection and timers
generally on PolarFire SoC and potentially other SiFive based
implementations.
If an AXI read to the PCIe controller on PolarFire SoC times out, the
system will stall, however, with CLOCK_EVT_FEAT_C3STOP active, the system
just locks up without RCU stalling:
io scheduler mq-deadline registered
io scheduler kyber registered
microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
microchip-pcie 2000000000.pcie: MEM 0x2008000000..0x2087ffffff -> 0x0008000000
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: axi read request error
microchip-pcie 2000000000.pcie: axi read timeout
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
Freeing initrd memory: 7332K
Similarly issues were reported with clock_nanosleep() - with a test app
that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250 & the blamed
commit in place, the sleep times are rounded up to the next jiffy:
== CPU: 1 == == CPU: 2 == == CPU: 3 == == CPU: 4 ==
Mean: 7.974992 Mean: 7.976534 Mean: 7.962591 Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000 Hi: 10.495000 Hi: 8.864000 Hi: 4.736000
Lo: 6.087000 Lo: 6.380000 Lo: 4.872000 Lo: 3.403000
Samples: 521 Samples: 521 Samples: 521 Samples: 521
Fortunately, the D1 has a second timer, which is "currently used in
preference to the RISC-V/SBI timer driver" so a revert here does not
hurt operation of D1 in its current form.
Ultimately, a DeviceTree property (or node) will be added to encode the
behaviour of the timers, but until then revert the addition of
CLOCK_EVT_FEAT_C3STOP.
Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98/
Link: https://lore.kernel.org/linux-riscv/bf6d3b1f-f703-4a25-833e-972a44a04114@sholland.org/
Link: https://lore.kernel.org/r/20221122121620.3522431-1-conor.dooley@microchip.com
Some PMUs (notably the traditional hardware kind) have boundary issues
with the OS filter. Specifically, it is possible for
perf_event_attr::exclude_kernel=1 events to trigger in-kernel due to
SKID or errata.
This can upset the sigtrap logic some and trigger the WARN.
However, if this invalid sample is the first we must not loose the
SIGTRAP, OTOH if it is the second, it must not override the
pending_addr with a (possibly) invalid one.
Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Link: https://lkml.kernel.org/r/Y3hDYiXwRnJr8RYG@xpf.sh.intel.com
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, can and netfilter.
Current release - new code bugs:
- bonding: ipv6: correct address used in Neighbour Advertisement
parsing (src vs dst typo)
- fec: properly scope IRQ coalesce setup during link up to supported
chips only
Previous releases - regressions:
- Bluetooth fixes for fake CSR clones (knockoffs):
- re-add ERR_DATA_REPORTING quirk
- fix crash when device is replugged
- Bluetooth:
- silence a user-triggerable dmesg error message
- L2CAP: fix u8 overflow, oob access
- correct vendor codec definition
- fix support for Read Local Supported Codecs V2
- ti: am65-cpsw: fix RGMII configuration at SPEED_10
- mana: fix race on per-CQ variable NAPI work_done
Previous releases - always broken:
- af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
avoid null-deref
- af_can: fix NULL pointer dereference in can_rcv_filter
- can: slcan: fix UAF with a freed work
- can: can327: flush TX_work on ldisc .close()
- macsec: add missing attribute validation for offload
- ipv6: avoid use-after-free in ip6_fragment()
- nft_set_pipapo: actually validate intervals in fields after the
first one
- mvneta: prevent oob access in mvneta_config_rss()
- ipv4: fix incorrect route flushing when table ID 0 is used, or when
source address is deleted
- phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"
* tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
s390/qeth: fix use-after-free in hsci
macsec: add missing attribute validation for offload
net: mvneta: Fix an out of bounds check
net: thunderbolt: fix memory leak in tbnet_open()
ipv6: avoid use-after-free in ip6_fragment()
net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
net: phy: mxl-gpy: add MDINT workaround
net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
xen/netback: don't call kfree_skb() under spin_lock_irqsave()
dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
ethernet: aeroflex: fix potential skb leak in greth_init_rings()
tipc: call tipc_lxc_xmit without holding node_read_lock
can: esd_usb: Allow REC and TEC to return to zero
can: can327: flush TX_work on ldisc .close()
can: slcan: fix freed work crash
can: af_can: fix NULL pointer dereference in can_rcv_filter
net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
ipv4: Fix incorrect route flushing when table ID 0 is used
ipv4: Fix incorrect route flushing when source address is deleted
...
Syzkaller reports a NULL deref bug as follows:
BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
Read of size 4 at addr 0000000000000138 by task file1/1955
CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
? io_tctx_exit_cb+0x53/0xd3
kasan_report+0xbb/0x1f0
? io_tctx_exit_cb+0x53/0xd3
kasan_check_range+0x140/0x190
io_tctx_exit_cb+0x53/0xd3
task_work_run+0x164/0x250
? task_work_cancel+0x30/0x30
get_signal+0x1c3/0x2440
? lock_downgrade+0x6e0/0x6e0
? lock_downgrade+0x6e0/0x6e0
? exit_signals+0x8b0/0x8b0
? do_raw_read_unlock+0x3b/0x70
? do_raw_spin_unlock+0x50/0x230
arch_do_signal_or_restart+0x82/0x2470
? kmem_cache_free+0x260/0x4b0
? putname+0xfe/0x140
? get_sigframe_size+0x10/0x10
? do_execveat_common.isra.0+0x226/0x710
? lockdep_hardirqs_on+0x79/0x100
? putname+0xfe/0x140
? do_execveat_common.isra.0+0x238/0x710
exit_to_user_mode_prepare+0x15f/0x250
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x42/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0023:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Kernel panic - not syncing: panic_on_warn set ...
This happens because the adding of task_work from io_ring_exit_work()
isn't synchronized with canceling all work items from eg exec. The
execution of the two are ordered in that they are both run by the task
itself, but if io_tctx_exit_cb() is queued while we're canceling all
work items off exec AND gets executed when the task exits to userspace
rather than in the main loop in io_uring_cancel_generic(), then we can
find current->io_uring == NULL and hit the above crash.
It's safe to add this NULL check here, because the execution of the two
paths are done by the task itself.
Cc: stable@vger.kernel.org
Fixes: d56d938b4bef ("io_uring: do ctx initiated file note removal")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20221206093833.3812138-1-harshit.m.mogalapalli@oracle.com
[axboe: add code comment and also put an explanation in the commit msg]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.1
- fix SRCU protection of nvme_ns_head list (Caleb Sander)
- clear the prp2 field when not used (Lei Rao)"
* tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme:
nvme: fix SRCU protection of nvme_ns_head list
nvme-pci: clear the prp2 field when not used
A device might have a core quirk for NVME_QUIRK_IGNORE_DEV_SUBNQN
(such as Samsung X5) but it would still give a:
"missing or invalid SUBNQN field"
warning as core quirks are filled after calling nvme_init_subnqn. Fill
ctrl->quirks from struct core_quirks before calling nvme_init_subsystem
to fix this.
Tested on a Samsung X5.
Fixes: ab9e00cc72fa ("nvme: track subsystems")
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Why&How]
LinkCapacitySupport array is indexed with the number of voltage states and
not the number of max DPPs. Fix the error by changing the array
declaration to use the correct (larger) array size of total number of
voltage states.
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
vm_open() is not allowed to fail. Fortunately we are guaranteed that
the pages are already pinned, thanks to the initial mmap which is now
being cloned into a forked process, and only need to increment the
refcnt. So just increment it directly. Previously if a signal was
delivered at the wrong time to the forking process, the
mutex_lock_interruptible() could fail resulting in the pages_use_count
not being incremented.
Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects")
Cc: stable@vger.kernel.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221130185748.357410-3-robdclark@gmail.com
Pull vfs fix from Al Viro:
"Amir's copy_file_range() fix"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: fix copy_file_range() averts filesystem freeze protection
We set the PIOC to GPIO mode. This way the pin becomes an
input signal will be usable by the controller. Without
this change the udc on the 9g20ek does not work.
Cc: nicolas.ferre@microchip.com
Cc: ludovic.desroches@microchip.com
Cc: alexandre.belloni@bootlin.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: kernel@pengutronix.de
Fixes: 5cb4e73575e3 ("ARM: at91: add at91sam9g20ek boards dt support")
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20221114185923.1023249-3-m.grzeschik@pengutronix.de
vivid_update_format_cap() can be called from an s_ctrl callback.
In that case (keep_controls == true) no control framework functions
can be called that take the control handler mutex.
The new call to v4l2_ctrl_modify_dimensions() did exactly that.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 6bc7643d1b9c (media: vivid: add pixel_array test control)
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
This patch fixes the following build error:
In file included from ./include/linux/io.h:13,
from ./arch/arm/mach-rpc/include/mach/uncompress.h:9,
from arch/arm/boot/compressed/misc.c:31:
./arch/arm/include/asm/io.h:85:22: error: conflicting types for ‘__raw_writeb’
85 | #define __raw_writeb __raw_writeb
| ^~~~~~~~~~~~
./arch/arm/include/asm/io.h:86:20: note: in expansion of macro ‘__raw_writeb’
86 | static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
| ^~~~~~~~~~~~
In file included from arch/arm/boot/compressed/misc.c:26:
arch/arm/boot/compressed/misc-ep93xx.h:13:20: note: previous definition of ‘__raw_writeb’ was here
13 | static inline void __raw_writeb(unsigned char value, unsigned int ptr)
| ^~~~~~~~~~~~
To: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 0361c7e504b1 ("ARM: ep93xx: multiplatform support")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
the batch size while this test case has been left behind. This has led
to a test failure reported by test bot:
not ok 2 selftests: cgroup: test_kmem # exit=1
Update the tolerance for the pcp charges to reflect the
MEMCG_CHARGE_BATCH change to fix this.
[akpm@linux-foundation.org: update comments, per Roman]
Link: https://lkml.kernel.org/r/Y4m8Unt6FhWKC6IH@dhcp22.suse.cz
Fixes: 1813e51eece0a ("memcg: increase MEMCG_CHARGE_BATCH to 64")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212010958.c1053bd3-yujie.liu@intel.com
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull input fix from Dmitry Torokhov:
- a fix for Raydium touchscreen driver to stop leaking memory when
sending commands to the chip
* tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()
There's no declaration for machine_check_early_boot(), which leads to a
build failure with W=1. Add one.
Fixes: 2f5182cffa43 ("powerpc/64s: early boot machine check handler")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221125132521.2167039-1-mpe@ellerman.id.au
The perf_event_attr::sigtrap functionality relies on data->addr being
set. However commit 7b0846301531 ("perf: Use sample_flags for addr")
changed this to only initialize data->addr when not 0.
Fixes: 7b0846301531 ("perf: Use sample_flags for addr")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y3426b4OimE%2FI5po%40hirez.programming.kicks-ass.net
Pull HID fixes from Jiri Kosina:
"A regression fix for handling Logitech HID++ devices and memory
corruption fixes:
- regression fix (revert) for catch-all handling of Logitech HID++
Bluetooth devices; there are devices that turn out not to work with
this, and the root cause is yet to be properly understood. So we
are dropping it for now, and it will be revisited for 6.2 or 6.3
(Benjamin Tissoires)
- memory corruption fix in HID core (ZhangPeng)
- memory corruption fix in hid-lg4ff (Anastasia Belova)
- Kconfig fix for I2C_HID (Benjamin Tissoires)
- a few device-id specific quirks that piggy-back on top of the
important fixes above"
* tag 'for-linus-2022120801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
Revert "HID: logitech-hidpp: Enable HID++ for all the Logitech Bluetooth devices"
Revert "HID: logitech-hidpp: Remove special-casing of Bluetooth devices"
HID: usbhid: Add ALWAYS_POLL quirk for some mice
HID: core: fix shift-out-of-bounds in hid_report_raw_event
HID: uclogic: Add HID_QUIRK_HIDINPUT_FORCE quirk
HID: fix I2C_HID not selected when I2C_HID_OF_ELAN is
HID: hid-lg4ff: Add check for empty lbuf
HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch V 10
HID: uclogic: Fix frame templates for big endian architectures
The SJA1105 family has 45 L2 policing table entries
(SJA1105_MAX_L2_POLICING_COUNT) and SJA1110 has 110
(SJA1110_MAX_L2_POLICING_COUNT). Keeping the table structure but
accounting for the difference in port count (5 in SJA1105 vs 10 in
SJA1110) does not fully explain the difference. Rather, the SJA1110 also
has L2 ingress policers for multicast traffic. If a packet is classified
as multicast, it will be processed by the policer index 99 + SRCPORT.
The sja1105_init_l2_policing() function initializes all L2 policers such
that they don't interfere with normal packet reception by default. To have
a common code between SJA1105 and SJA1110, the index of the multicast
policer for the port is calculated because it's an index that is out of
bounds for SJA1105 but in bounds for SJA1110, and a bounds check is
performed.
The code fails to do the proper thing when determining what to do with the
multicast policer of port 0 on SJA1105 (ds->num_ports = 5). The "mcast"
index will be equal to 45, which is also equal to
table->ops->max_entry_count (SJA1105_MAX_L2_POLICING_COUNT). So it passes
through the check. But at the same time, SJA1105 doesn't have multicast
policers. So the code programs the SHARINDX field of an out-of-bounds
element in the L2 Policing table of the static config.
The comparison between index 45 and 45 entries should have determined the
code to not access this policer index on SJA1105, since its memory wasn't
even allocated.
With enough bad luck, the out-of-bounds write could even overwrite other
valid kernel data, but in this case, the issue was detected using KASAN.
Kernel log:
sja1105 spi5.0: Probed switch chip: SJA1105Q
==================================================================
BUG: KASAN: slab-out-of-bounds in sja1105_setup+0x1cbc/0x2340
Write of size 8 at addr ffffff880bd57708 by task kworker/u8:0/8
...
Workqueue: events_unbound deferred_probe_work_func
Call trace:
...
sja1105_setup+0x1cbc/0x2340
dsa_register_switch+0x1284/0x18d0
sja1105_probe+0x748/0x840
...
Allocated by task 8:
...
sja1105_setup+0x1bcc/0x2340
dsa_register_switch+0x1284/0x18d0
sja1105_probe+0x748/0x840
...
Fixes: 38fbe91f2287 ("net: dsa: sja1105: configure the multicast policers, if present")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Radu Nicolae Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20221207132347.38698-1-radu-nicolae.pirea@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With how task_work is added and signaled, we can have TIF_NOTIFY_SIGNAL
set and no task_work pending as it got run in a previous loop. Treat
TIF_NOTIFY_SIGNAL like get_signal(), always clear it if set regardless
of whether or not task_work is pending to run.
Cc: stable@vger.kernel.org
Fixes: 46a525e199e4 ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Either ublk_can_use_task_work() is true or not, io commands are
forwarded to ublk server in reverse order, since llist_add() is
always to add one element to the head of the list.
Even though block layer doesn't guarantee request dispatch order,
requests should be sent to hardware in the sequence order generated
from io scheduler, which usually considers the request's LBA, and
order is often important for HDD.
So forward io commands in the sequence made from io scheduler by
aligning task work with current io_uring command's batch handling,
and it has been observed that both can get similar performance data
if IORING_SETUP_COOP_TASKRUN is set from ublk server.
Reported-by: Andreas Hindborg <andreas.hindborg@wdc.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221121155645.396272-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull misc fixes from Andrew Morton:
"Nine hotfixes.
Six for MM, three for other areas. Four of these patches address
post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
memcg: fix possible use-after-free in memcg_write_event_control()
MAINTAINERS: update Muchun Song's email
mm/gup: fix gup_pud_range() for dax
mmap: fix do_brk_flags() modifying obviously incorrect VMAs
mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
tmpfs: fix data loss from failed fallocate
kselftests: cgroup: update kmem test precision tolerance
mm: do not BUG_ON missing brk mapping, because userspace can unmap it
mailmap: update Matti Vaittinen's email address
Impacted QAT device IDs that need extra dtlb flush quirk is ranging
from 0x4940 to 0x4943. After bitwise AND device ID with 0xfffc the
result should be 0x4940 instead of 0x494c to identify these devices.
Fixes: e65a6897be5e ("iommu/vt-d: Add a fix for devices need extra dtlb flush")
Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20221203005610.2927487-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call. As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file. Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.
Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses. The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently dropped
the file type check with it allowing any file to slip through. With the
invarients broken, the d_name and parent accesses can now race against
renames and removals of arbitrary files and cause use-after-free's.
Fix the bug by resurrecting the file type check in __file_cft(). Now that
cgroupfs is implemented through kernfs, checking the file operations needs
to go through a layer of indirection. Instead, let's check the superblock
and dentry type.
Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org
Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org> [3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is a similar fixup like arm64 does, only handle translation faults
in case of unexpected kfence report when alignment faults on ARM, see
more from commit 0bb1fbffc631 ("arm64: mm: kfence: only handle translation
faults").
Fixes: 75969686ec0d ("ARM: 9166/1: Support KFENCE for ARM")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
This reverts commit f35b5d7d676e59e401690b678cd3cfec5e785c23.
It has been reported to cause huge performance regressions on some loads
(will-it-scale.per_process_ops, but also building the kernel with
clang).
The commit did speed up gcc builds by a small amount, so it's not an
unambiguous regression, but until the big regressions are understood,
let's revert it.
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210181535.7144dd15-yujie.liu@intel.com
Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%2BrGce@dev-arch.thelio-3990X/
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sanity checks were added to verify the v4l2_bt_timings blanking fields
in order to avoid integer overflows when userspace passes weird values.
But that assumed that userspace would correctly fill in the front porch,
backporch and sync values, but sometimes all you know is the total
blanking, which is then assigned to just one of these fields.
And that can fail with these checks.
So instead set a maximum for the total horizontal and vertical
blanking and check that each field remains below that.
That is still sufficient to avoid integer overflows, but it also
allows for more flexibility in how userspace fills in these fields.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 4b6d66a45ed3 ("media: v4l2-dv-timings: add sanity checks for blanking values")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to
```
virt_to_page(0)
```
that in order expands to:
```
pfn_to_page(virt_to_pfn(0))
```
and then virt_to_pfn(0) to:
```
((((unsigned long)(0) - PAGE_OFFSET) >> PAGE_SHIFT) +
PHYS_PFN_OFFSET)
```
where PAGE_OFFSET and PHYS_PFN_OFFSET are the DRAM offset(0x80000000) and
PAGE_SHIFT is 12. This way we obtain 16MB(0x01000000) summed to the base of
DRAM(0x80000000).
When ZERO_PAGE(0) is then used, for example in bio_add_page(), the page
gets an address that is out of DRAM bounds.
So instead of using fake virtual page 0 let's allocate a dedicated
zero_page during paging_init() and assign it to a global 'struct page *
empty_zero_page' the same way mmu.c does and it's the same approach used
in m68k with commit dc068f462179 as discussed here[0]. Then let's move
ZERO_PAGE() definition to the top of pgtable.h to be in common between
mmu.c and nommu.c.
[0]: https://lore.kernel.org/linux-m68k/2a462b23-5b8e-bbf4-ec7d-778434a3b9d7@google.com/T/#m1266ceb63
ad140743174d6b3070364d3c9a5179b
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
For dax pud, pud_huge() returns true on x86. So the function works as long
as hugetlb is configured. However, dax doesn't depend on hugetlb.
Commit 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax") fixed
devmap-backed huge PMDs, but missed devmap-backed huge PUDs. Fix this as
well.
This fixes the below kernel panic:
general protection fault, probably for non-canonical address 0x69e7c000cc478: 0000 [#1] SMP
< snip >
Call Trace:
<TASK>
get_user_pages_fast+0x1f/0x40
iov_iter_get_pages+0xc6/0x3b0
? mempool_alloc+0x5d/0x170
bio_iov_iter_get_pages+0x82/0x4e0
? bvec_alloc+0x91/0xc0
? bio_alloc_bioset+0x19a/0x2a0
blkdev_direct_IO+0x282/0x480
? __io_complete_rw_common+0xc0/0xc0
? filemap_range_has_page+0x82/0xc0
generic_file_direct_write+0x9d/0x1a0
? inode_update_time+0x24/0x30
__generic_file_write_iter+0xbd/0x1e0
blkdev_write_iter+0xb4/0x150
? io_import_iovec+0x8d/0x340
io_write+0xf9/0x300
io_issue_sqe+0x3c3/0x1d30
? sysvec_reschedule_ipi+0x6c/0x80
__io_queue_sqe+0x33/0x240
? fget+0x76/0xa0
io_submit_sqes+0xe6a/0x18d0
? __fget_light+0xd1/0x100
__x64_sys_io_uring_enter+0x199/0x880
? __context_tracking_enter+0x1f/0x70
? irqentry_exit_to_user_mode+0x24/0x30
? irqentry_exit+0x1d/0x30
? __context_tracking_exit+0xe/0x70
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7fc97c11a7be
< snip >
</TASK>
---[ end trace 48b2e0e67debcaeb ]---
RIP: 0010:internal_get_user_pages_fast+0x340/0x990
< snip >
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
Link: https://lkml.kernel.org/r/1670392853-28252-1-git-send-email-ssengar@linux.microsoft.com
Fixes: 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax")
Signed-off-by: John Starks <jostarks@microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently tpm transactions are executed unconditionally in
tpm_pm_suspend() function, which may lead to races with other tpm
accessors in the system.
Specifically, the hw_random tpm driver makes use of tpm_get_random(),
and this function is called in a loop from a kthread, which means it's
not frozen alongside userspace, and so can race with the work done
during system suspend:
tpm tpm0: tpm_transmit: tpm_recv: error -52
tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics
CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
Call Trace:
tpm_tis_status.cold+0x19/0x20
tpm_transmit+0x13b/0x390
tpm_transmit_cmd+0x20/0x80
tpm1_pm_suspend+0xa6/0x110
tpm_pm_suspend+0x53/0x80
__pnp_bus_suspend+0x35/0xe0
__device_suspend+0x10f/0x350
Fix this by calling tpm_try_get_ops(), which itself is a wrapper around
tpm_chip_start(), but takes the appropriate mutex.
Signed-off-by: Jan Dabros <jsd@semihalf.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/all/c5ba47ef-393f-1fba-30bd-1230d1b4b592@suse.cz/
Cc: stable@vger.kernel.org
Fixes: e891db1a18bf ("tpm: turn on TPM on suspend for TPM 1.x")
[Jason: reworked commit message, added metadata]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm fixes from Dave Airlie:
"Last set of fixes for final, scattered bunch of fixes, two amdgpu, one
vmwgfx, and some misc others.
amdgpu:
- S0ix fix
- DCN 3.2 array out of bounds fix
shmem:
- Fixes to shmem-helper error paths
bridge:
- Fix polarity bug in bridge/ti-sn65dsi86
dw-hdmi:
- Prefer 8-bit RGB fallback before any YUV mode in dw-hdmi, since
some panels lie about YUV support
vmwgfx:
- Stop using screen objects when SEV is active"
* tag 'drm-fixes-2022-12-09' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: fix array index out of bound error in DCN32 DML
drm/amdgpu/sdma_v4_0: turn off SDMA ring buffer in the s2idle suspend
drm/vmwgfx: Don't use screen objects when SEV is active
drm/shmem-helper: Avoid vm_open error paths
drm/shmem-helper: Remove errant put in error path
drm: bridge: dw_hdmi: fix preference of RGB modes over YUV420
drm/bridge: ti-sn65dsi86: Fix output polarity setting bug
drm/vmwgfx: Fix race issue calling pin_user_pages
This reverts commit 753395ea1e45c724150070b5785900b6a44bd5fb.
It introduced a boot regression on colibri-imx7, and potentially any
other i.MX7 boards with MTD partition list generated into the fdt by
U-Boot.
While the commit we are reverting here is not obviously wrong, it fixes
only a dt binding checker warning that is non-functional, while it
introduces a boot regression and there is no obvious fix ready.
Fixes: 753395ea1e45 ("ARM: dts: imx7: Fix NAND controller size-cells")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/Y4dgBTGNWpM6SQXI@francesco-nb.int.toradex.com/
Link: https://lore.kernel.org/all/20221205144917.6514168a@xps-13/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The example on how to use and test Capture Overlay specified
the wrong video device node. Back in 2015 the loop_video control
moved from the output device to the capture device, but this
example code is still referring to the output video device.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Store the frame address where arm_get_current_stackframe() looks for it
(ARM_r7 instead of ARM_fp if CONFIG_THUMB2_KERNEL=y). Otherwise frame->fp
gets set to 0, causing unwind_frame() to fail.
# bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }'
Attaching 1 probe...
@[
__schedule+1059
]: 1
A typical first unwind instruction is 0x97 (SP = R7), so after executing
it SP ends up being 0 and -URC_FAILURE is returned.
unwind_frame(pc = ac9da7d7 lr = 00000000 sp = c69bdda0 fp = 00000000)
unwind_find_idx(ac9da7d7)
unwind_exec_insn: insn = 00000097
unwind_exec_insn: fp = 00000000 sp = 00000000 lr = 00000000 pc = 00000000
With this patch:
# bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }'
Attaching 1 probe...
@[
__schedule+1059
__schedule+1059
schedule+79
schedule_hrtimeout_range_clock+163
schedule_hrtimeout_range+17
ep_poll+471
SyS_epoll_wait+111
sys_epoll_pwait+231
__ret_fast_syscall+1
]: 1
Link: https://lore.kernel.org/r/20220920230728.2617421-1-tnovak@fb.com/
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Tomislav Novak <tnovak@fb.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Add more sanity checks to the VMA that do_brk_flags() will expand. Ensure
the VMA matches basic merge requirements within the function before
calling can_vma_merge_after().
Drop the duplicate checks from vm_brk_flags() since they will be enforced
later.
The old code would expand file VMAs on brk(), which is functionally
wrong and also dangerous in terms of locking because the brk() path
isn't designed for file VMAs and therefore doesn't lock the file
mapping. Checking can_vma_merge_after() ensures that new anonymous
VMAs can't be merged into file VMAs.
See https://lore.kernel.org/linux-mm/CAG48ez1tJZTOjS_FjRZhvtDA-STFmdw8PEizPDwMGFd_ui0Nrw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20221205192304.1957418-1-Liam.Howlett@oracle.com
Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drm-misc-fixes for v6.1 final?:
- Fix polarity bug in bridge/ti-sn65dsi86.
- Prefer 8-bit RGB fallback before any YUV mode in dw-hdmi, since some
panels lie about YUV support.
- Fixes to shmem-helper error paths.
- Small vmwgfx to stop using screen objects when SEV is active.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/8110f02d-d155-926e-8674-c88b806c3a3a@linux.intel.com
AT91 fixes for 6.1 #3
It contains:
- build fix for SAMA5D3 devices which don't have an L2 cache and due to this
accesssing outer_cache.write_sec in sama5_secure_cache_init() could throw
undefined reference to `outer_cache' if CONFIG_OUTER_CACHE is disabled
from common sama5_defconfig.
* tag 'at91-fixes-6.1-3' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: at91: fix build for SAMA5D3 w/o L2 cache
Link: https://lore.kernel.org/r/20221125093521.382105-1-claudiu.beznea@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
MT_MEMORY_RO is introduced by commit 598f0a99fa8a ("ARM: 9210/1:
Mark the FDT_FIXED sections as shareable"), which is a readonly
memory type for FDT area, but there are some different between
ARM_LPAE and non-ARM_LPAE, we need to setup PMD_SECT_AP2 and
L_PMD_SECT_RDONLY for MT_MEMORY_RO when ARM_LAPE enabled.
non-ARM_LPAE 0xff800000-0xffa00000 2M PGD KERNEL ro NX SHD
ARM_LPAE 0xff800000-0xffc00000 4M PMD RW NX SHD
ARM_LPAE+fix 0xff800000-0xffc00000 4M PMD ro NX SHD
Fixes: 598f0a99fa8a ("ARM: 9210/1: Mark the FDT_FIXED sections as shareable")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
We use "unsigned long" to store a PFN in the kernel and phys_addr_t to
store a physical address.
On a 64bit system, both are 64bit wide. However, on a 32bit system, the
latter might be 64bit wide. This is, for example, the case on x86 with
PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans
32bit.
The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses
that case, and assumes that the maximum PFN is limited by an 32bit
phys_addr_t. This implies, that SWP_PFN_BITS will currently only be able
to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong.
Let's rely on the number of bits in phys_addr_t instead, but make sure to
not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in
is_pfn_swap_entry() unhappy. Note that swp_entry_t is effectively an
unsigned long and the maximum swap offset shares that value with the swap
type.
For example, on an 8 GiB x86 PAE system with a kernel config based on
Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently
fail removing migration entries (remove_migration_ptes()), because
mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as
swp_offset_pfn() wrongly masks off PFN bits. For example,
split_huge_page_to_list()->...->remap_page() will leave migration entries
in place and continue to unlock the page.
Later, when we stumble over these migration entries (e.g., via
/proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these
migration entries shouldn't exist anymore and the page was unlocked.
[ 33.067591] kernel BUG at include/linux/swapops.h:497!
[ 33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G E 6.1.0-rc8+ #16
[ 33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[ 33.067606] EIP: pagemap_pmd_range+0x644/0x650
[ 33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31
[ 33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000
[ 33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68
[ 33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
[ 33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0
[ 33.067625] Call Trace:
[ 33.067628] ? madvise_free_pte_range+0x720/0x720
[ 33.067632] ? smaps_pte_range+0x4b0/0x4b0
[ 33.067634] walk_pgd_range+0x325/0x720
[ 33.067637] ? mt_find+0x1d6/0x3a0
[ 33.067641] ? mt_find+0x1d6/0x3a0
[ 33.067643] __walk_page_range+0x164/0x170
[ 33.067646] walk_page_range+0xf9/0x170
[ 33.067648] ? __kmem_cache_alloc_node+0x2a8/0x340
[ 33.067653] pagemap_read+0x124/0x280
[ 33.067658] ? default_llseek+0x101/0x160
[ 33.067662] ? smaps_account+0x1d0/0x1d0
[ 33.067664] vfs_read+0x90/0x290
[ 33.067667] ? do_madvise.part.0+0x24b/0x390
[ 33.067669] ? debug_smp_processor_id+0x12/0x20
[ 33.067673] ksys_pread64+0x58/0x90
[ 33.067675] __ia32_sys_ia32_pread64+0x1b/0x20
[ 33.067680] __do_fast_syscall_32+0x4c/0xc0
[ 33.067683] do_fast_syscall_32+0x29/0x60
[ 33.067686] do_SYSENTER_32+0x15/0x20
[ 33.067689] entry_SYSENTER_32+0x98/0xf1
Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it
readable and consistent.
[david@redhat.com: rely on sizeof(phys_addr_t) and min_t() instead]
Link: https://lkml.kernel.org/r/20221206105737.69478-1-david@redhat.com
[david@redhat.com: use "int" for comparison, as we're only comparing numbers < 64]
Link: https://lkml.kernel.org/r/1f157500-2676-7cef-a84e-9224ed64e540@redhat.com
Link: https://lkml.kernel.org/r/20221205150857.167583-1-david@redhat.com
Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull timer fix from Borislav Petkov:
- Revert a fix to RISC-V timers supposed to address an uncertainty
whether clock events are received during S3 or not which locks up
other RISC-V platforms. The issue will be fixed differently later.
* tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
Per syzbot it is possible for perf_pending_task() to run after the
event is free()'d. There are two related but distinct cases:
- the task_work was already queued before destroying the event;
- destroying the event itself queues the task_work.
The first cannot be solved using task_work_cancel() since
perf_release() itself might be called from a task_work (____fput),
which means the current->task_works list is already empty and
task_work_cancel() won't be able to find the perf_pending_task()
entry.
The simplest alternative is extending the perf_event lifetime to cover
the task_work.
The second is just silly, queueing a task_work while you know the
event is going away makes no sense and is easily avoided by
re-arranging how the event is marked STATE_DEAD and ensuring it goes
through STATE_OFF on the way down.
Reported-by: syzbot+9228d6098455bb209ec8@syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marco Elver <elver@google.com>
When SEV is enabled gmr's and mob's are explicitly disabled because
the encrypted system memory can not be used by the hypervisor.
The driver was disabling GMR's but the presentation code, which depends
on GMR's, wasn't honoring it which lead to black screen on hosts
with SEV enabled.
Make sure screen objects presentation is not used when guest memory
regions have been disabled to fix presentation on SEV enabled hosts.
Fixes: 3b0d6458c705 ("drm/vmwgfx: Refuse DMA operation when SEV encryption is active")
Cc: <stable@vger.kernel.org> # v5.7+
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reported-by: Nicholas Hunt <nhunt@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221201175341.491884-1-zack@kde.org
The L2 cache is present on the newer SAMA5D2 and SAMA5D4 families, but
apparently not for the older SAMA5D3.
Solves a build-time regression with the following symptom:
sama5.c:(.init.text+0x48): undefined reference to `outer_cache'
Fixes: 3b5a7ca7d252 ("ARM: at91: setup outer cache .write_sec() callback if needed")
Signed-off-by: Peter Rosin <peda@axentia.se>
[claudiu.beznea: delete "At least not always." from commit description]
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/b7f8dacc-5e1f-0eb2-188e-3ad9a9f7613d@axentia.se
Fix tmpfs data loss when the fallocate system call is interrupted by a
signal, or fails for some other reason. The partial folio handling in
shmem_undo_range() forgot to consider this unfalloc case, and was liable
to erase or truncate out data which had already been committed earlier.
It turns out that none of the partial folio handling there is appropriate
for the unfalloc case, which just wants to proceed to removal of whole
folios: which find_get_entries() provides, even when partially covered.
Original patch by Rui Wang.
Link: https://lore.kernel.org/linux-mm/33b85d82.7764.1842e9ab207.Coremail.chenguoqic@163.com/
Link: https://lkml.kernel.org/r/a5dac112-cf4b-7af-a33-f386e347fd38@google.com
Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Guoqi Chen <chenguoqic@163.com>
Link: https://lore.kernel.org/all/20221101032248.819360-1-kernel@hev.cc/
Cc: Rui Wang <kernel@hev.cc>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: <stable@vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull powerpc fixes from Michael Ellerman:
- Fix oops in 32-bit BPF tail call tests
- Add missing declaration for machine_check_early_boot()
Thanks to Christophe Leroy and Naveen N. Rao.
* tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Add missing declaration for machine_check_early_boot()
powerpc/bpf/32: Fix Oops on tail call tests
This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.
On the subject of suspend, the RISC-V SBI spec states:
This does not cover whether any given events actually reach the hart or
not, just what the hart will do if it receives an event. On PolarFire
SoC, and potentially other SiFive based implementations, events from the
RISC-V timer do reach a hart during suspend. This is not the case for the
implementation on the Allwinner D1 - there timer events are not received
during suspend.
To fix this, the CLOCK_EVT_FEAT_C3STOP (mis)feature was enabled for the
timer driver - but this has broken both RCU stall detection and timers
generally on PolarFire SoC and potentially other SiFive based
implementations.
If an AXI read to the PCIe controller on PolarFire SoC times out, the
system will stall, however, with CLOCK_EVT_FEAT_C3STOP active, the system
just locks up without RCU stalling:
io scheduler mq-deadline registered
io scheduler kyber registered
microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
microchip-pcie 2000000000.pcie: MEM 0x2008000000..0x2087ffffff -> 0x0008000000
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: axi read request error
microchip-pcie 2000000000.pcie: axi read timeout
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
Freeing initrd memory: 7332K
Similarly issues were reported with clock_nanosleep() - with a test app
that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250 & the blamed
commit in place, the sleep times are rounded up to the next jiffy:
== CPU: 1 == == CPU: 2 == == CPU: 3 == == CPU: 4 ==
Mean: 7.974992 Mean: 7.976534 Mean: 7.962591 Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000 Hi: 10.495000 Hi: 8.864000 Hi: 4.736000
Lo: 6.087000 Lo: 6.380000 Lo: 4.872000 Lo: 3.403000
Samples: 521 Samples: 521 Samples: 521 Samples: 521
Fortunately, the D1 has a second timer, which is "currently used in
preference to the RISC-V/SBI timer driver" so a revert here does not
hurt operation of D1 in its current form.
Ultimately, a DeviceTree property (or node) will be added to encode the
behaviour of the timers, but until then revert the addition of
CLOCK_EVT_FEAT_C3STOP.
Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98/
Link: https://lore.kernel.org/linux-riscv/bf6d3b1f-f703-4a25-833e-972a44a04114@sholland.org/
Link: https://lore.kernel.org/r/20221122121620.3522431-1-conor.dooley@microchip.com
Some PMUs (notably the traditional hardware kind) have boundary issues
with the OS filter. Specifically, it is possible for
perf_event_attr::exclude_kernel=1 events to trigger in-kernel due to
SKID or errata.
This can upset the sigtrap logic some and trigger the WARN.
However, if this invalid sample is the first we must not loose the
SIGTRAP, OTOH if it is the second, it must not override the
pending_addr with a (possibly) invalid one.
Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Link: https://lkml.kernel.org/r/Y3hDYiXwRnJr8RYG@xpf.sh.intel.com
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, can and netfilter.
Current release - new code bugs:
- bonding: ipv6: correct address used in Neighbour Advertisement
parsing (src vs dst typo)
- fec: properly scope IRQ coalesce setup during link up to supported
chips only
Previous releases - regressions:
- Bluetooth fixes for fake CSR clones (knockoffs):
- re-add ERR_DATA_REPORTING quirk
- fix crash when device is replugged
- Bluetooth:
- silence a user-triggerable dmesg error message
- L2CAP: fix u8 overflow, oob access
- correct vendor codec definition
- fix support for Read Local Supported Codecs V2
- ti: am65-cpsw: fix RGMII configuration at SPEED_10
- mana: fix race on per-CQ variable NAPI work_done
Previous releases - always broken:
- af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
avoid null-deref
- af_can: fix NULL pointer dereference in can_rcv_filter
- can: slcan: fix UAF with a freed work
- can: can327: flush TX_work on ldisc .close()
- macsec: add missing attribute validation for offload
- ipv6: avoid use-after-free in ip6_fragment()
- nft_set_pipapo: actually validate intervals in fields after the
first one
- mvneta: prevent oob access in mvneta_config_rss()
- ipv4: fix incorrect route flushing when table ID 0 is used, or when
source address is deleted
- phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"
* tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
s390/qeth: fix use-after-free in hsci
macsec: add missing attribute validation for offload
net: mvneta: Fix an out of bounds check
net: thunderbolt: fix memory leak in tbnet_open()
ipv6: avoid use-after-free in ip6_fragment()
net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
net: phy: mxl-gpy: add MDINT workaround
net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
xen/netback: don't call kfree_skb() under spin_lock_irqsave()
dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
ethernet: aeroflex: fix potential skb leak in greth_init_rings()
tipc: call tipc_lxc_xmit without holding node_read_lock
can: esd_usb: Allow REC and TEC to return to zero
can: can327: flush TX_work on ldisc .close()
can: slcan: fix freed work crash
can: af_can: fix NULL pointer dereference in can_rcv_filter
net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
ipv4: Fix incorrect route flushing when table ID 0 is used
ipv4: Fix incorrect route flushing when source address is deleted
...
Syzkaller reports a NULL deref bug as follows:
BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
Read of size 4 at addr 0000000000000138 by task file1/1955
CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
? io_tctx_exit_cb+0x53/0xd3
kasan_report+0xbb/0x1f0
? io_tctx_exit_cb+0x53/0xd3
kasan_check_range+0x140/0x190
io_tctx_exit_cb+0x53/0xd3
task_work_run+0x164/0x250
? task_work_cancel+0x30/0x30
get_signal+0x1c3/0x2440
? lock_downgrade+0x6e0/0x6e0
? lock_downgrade+0x6e0/0x6e0
? exit_signals+0x8b0/0x8b0
? do_raw_read_unlock+0x3b/0x70
? do_raw_spin_unlock+0x50/0x230
arch_do_signal_or_restart+0x82/0x2470
? kmem_cache_free+0x260/0x4b0
? putname+0xfe/0x140
? get_sigframe_size+0x10/0x10
? do_execveat_common.isra.0+0x226/0x710
? lockdep_hardirqs_on+0x79/0x100
? putname+0xfe/0x140
? do_execveat_common.isra.0+0x238/0x710
exit_to_user_mode_prepare+0x15f/0x250
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x42/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0023:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Kernel panic - not syncing: panic_on_warn set ...
This happens because the adding of task_work from io_ring_exit_work()
isn't synchronized with canceling all work items from eg exec. The
execution of the two are ordered in that they are both run by the task
itself, but if io_tctx_exit_cb() is queued while we're canceling all
work items off exec AND gets executed when the task exits to userspace
rather than in the main loop in io_uring_cancel_generic(), then we can
find current->io_uring == NULL and hit the above crash.
It's safe to add this NULL check here, because the execution of the two
paths are done by the task itself.
Cc: stable@vger.kernel.org
Fixes: d56d938b4bef ("io_uring: do ctx initiated file note removal")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20221206093833.3812138-1-harshit.m.mogalapalli@oracle.com
[axboe: add code comment and also put an explanation in the commit msg]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.1
- fix SRCU protection of nvme_ns_head list (Caleb Sander)
- clear the prp2 field when not used (Lei Rao)"
* tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme:
nvme: fix SRCU protection of nvme_ns_head list
nvme-pci: clear the prp2 field when not used
A device might have a core quirk for NVME_QUIRK_IGNORE_DEV_SUBNQN
(such as Samsung X5) but it would still give a:
"missing or invalid SUBNQN field"
warning as core quirks are filled after calling nvme_init_subnqn. Fill
ctrl->quirks from struct core_quirks before calling nvme_init_subsystem
to fix this.
Tested on a Samsung X5.
Fixes: ab9e00cc72fa ("nvme: track subsystems")
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Why&How]
LinkCapacitySupport array is indexed with the number of voltage states and
not the number of max DPPs. Fix the error by changing the array
declaration to use the correct (larger) array size of total number of
voltage states.
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
vm_open() is not allowed to fail. Fortunately we are guaranteed that
the pages are already pinned, thanks to the initial mmap which is now
being cloned into a forked process, and only need to increment the
refcnt. So just increment it directly. Previously if a signal was
delivered at the wrong time to the forking process, the
mutex_lock_interruptible() could fail resulting in the pages_use_count
not being incremented.
Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects")
Cc: stable@vger.kernel.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221130185748.357410-3-robdclark@gmail.com
We set the PIOC to GPIO mode. This way the pin becomes an
input signal will be usable by the controller. Without
this change the udc on the 9g20ek does not work.
Cc: nicolas.ferre@microchip.com
Cc: ludovic.desroches@microchip.com
Cc: alexandre.belloni@bootlin.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: kernel@pengutronix.de
Fixes: 5cb4e73575e3 ("ARM: at91: add at91sam9g20ek boards dt support")
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20221114185923.1023249-3-m.grzeschik@pengutronix.de
vivid_update_format_cap() can be called from an s_ctrl callback.
In that case (keep_controls == true) no control framework functions
can be called that take the control handler mutex.
The new call to v4l2_ctrl_modify_dimensions() did exactly that.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 6bc7643d1b9c (media: vivid: add pixel_array test control)
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
This patch fixes the following build error:
In file included from ./include/linux/io.h:13,
from ./arch/arm/mach-rpc/include/mach/uncompress.h:9,
from arch/arm/boot/compressed/misc.c:31:
./arch/arm/include/asm/io.h:85:22: error: conflicting types for ‘__raw_writeb’
85 | #define __raw_writeb __raw_writeb
| ^~~~~~~~~~~~
./arch/arm/include/asm/io.h:86:20: note: in expansion of macro ‘__raw_writeb’
86 | static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
| ^~~~~~~~~~~~
In file included from arch/arm/boot/compressed/misc.c:26:
arch/arm/boot/compressed/misc-ep93xx.h:13:20: note: previous definition of ‘__raw_writeb’ was here
13 | static inline void __raw_writeb(unsigned char value, unsigned int ptr)
| ^~~~~~~~~~~~
To: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 0361c7e504b1 ("ARM: ep93xx: multiplatform support")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
the batch size while this test case has been left behind. This has led
to a test failure reported by test bot:
not ok 2 selftests: cgroup: test_kmem # exit=1
Update the tolerance for the pcp charges to reflect the
MEMCG_CHARGE_BATCH change to fix this.
[akpm@linux-foundation.org: update comments, per Roman]
Link: https://lkml.kernel.org/r/Y4m8Unt6FhWKC6IH@dhcp22.suse.cz
Fixes: 1813e51eece0a ("memcg: increase MEMCG_CHARGE_BATCH to 64")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212010958.c1053bd3-yujie.liu@intel.com
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The perf_event_attr::sigtrap functionality relies on data->addr being
set. However commit 7b0846301531 ("perf: Use sample_flags for addr")
changed this to only initialize data->addr when not 0.
Fixes: 7b0846301531 ("perf: Use sample_flags for addr")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y3426b4OimE%2FI5po%40hirez.programming.kicks-ass.net
Pull HID fixes from Jiri Kosina:
"A regression fix for handling Logitech HID++ devices and memory
corruption fixes:
- regression fix (revert) for catch-all handling of Logitech HID++
Bluetooth devices; there are devices that turn out not to work with
this, and the root cause is yet to be properly understood. So we
are dropping it for now, and it will be revisited for 6.2 or 6.3
(Benjamin Tissoires)
- memory corruption fix in HID core (ZhangPeng)
- memory corruption fix in hid-lg4ff (Anastasia Belova)
- Kconfig fix for I2C_HID (Benjamin Tissoires)
- a few device-id specific quirks that piggy-back on top of the
important fixes above"
* tag 'for-linus-2022120801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
Revert "HID: logitech-hidpp: Enable HID++ for all the Logitech Bluetooth devices"
Revert "HID: logitech-hidpp: Remove special-casing of Bluetooth devices"
HID: usbhid: Add ALWAYS_POLL quirk for some mice
HID: core: fix shift-out-of-bounds in hid_report_raw_event
HID: uclogic: Add HID_QUIRK_HIDINPUT_FORCE quirk
HID: fix I2C_HID not selected when I2C_HID_OF_ELAN is
HID: hid-lg4ff: Add check for empty lbuf
HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch V 10
HID: uclogic: Fix frame templates for big endian architectures
The SJA1105 family has 45 L2 policing table entries
(SJA1105_MAX_L2_POLICING_COUNT) and SJA1110 has 110
(SJA1110_MAX_L2_POLICING_COUNT). Keeping the table structure but
accounting for the difference in port count (5 in SJA1105 vs 10 in
SJA1110) does not fully explain the difference. Rather, the SJA1110 also
has L2 ingress policers for multicast traffic. If a packet is classified
as multicast, it will be processed by the policer index 99 + SRCPORT.
The sja1105_init_l2_policing() function initializes all L2 policers such
that they don't interfere with normal packet reception by default. To have
a common code between SJA1105 and SJA1110, the index of the multicast
policer for the port is calculated because it's an index that is out of
bounds for SJA1105 but in bounds for SJA1110, and a bounds check is
performed.
The code fails to do the proper thing when determining what to do with the
multicast policer of port 0 on SJA1105 (ds->num_ports = 5). The "mcast"
index will be equal to 45, which is also equal to
table->ops->max_entry_count (SJA1105_MAX_L2_POLICING_COUNT). So it passes
through the check. But at the same time, SJA1105 doesn't have multicast
policers. So the code programs the SHARINDX field of an out-of-bounds
element in the L2 Policing table of the static config.
The comparison between index 45 and 45 entries should have determined the
code to not access this policer index on SJA1105, since its memory wasn't
even allocated.
With enough bad luck, the out-of-bounds write could even overwrite other
valid kernel data, but in this case, the issue was detected using KASAN.
Kernel log:
sja1105 spi5.0: Probed switch chip: SJA1105Q
==================================================================
BUG: KASAN: slab-out-of-bounds in sja1105_setup+0x1cbc/0x2340
Write of size 8 at addr ffffff880bd57708 by task kworker/u8:0/8
...
Workqueue: events_unbound deferred_probe_work_func
Call trace:
...
sja1105_setup+0x1cbc/0x2340
dsa_register_switch+0x1284/0x18d0
sja1105_probe+0x748/0x840
...
Allocated by task 8:
...
sja1105_setup+0x1bcc/0x2340
dsa_register_switch+0x1284/0x18d0
sja1105_probe+0x748/0x840
...
Fixes: 38fbe91f2287 ("net: dsa: sja1105: configure the multicast policers, if present")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Radu Nicolae Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20221207132347.38698-1-radu-nicolae.pirea@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With how task_work is added and signaled, we can have TIF_NOTIFY_SIGNAL
set and no task_work pending as it got run in a previous loop. Treat
TIF_NOTIFY_SIGNAL like get_signal(), always clear it if set regardless
of whether or not task_work is pending to run.
Cc: stable@vger.kernel.org
Fixes: 46a525e199e4 ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Either ublk_can_use_task_work() is true or not, io commands are
forwarded to ublk server in reverse order, since llist_add() is
always to add one element to the head of the list.
Even though block layer doesn't guarantee request dispatch order,
requests should be sent to hardware in the sequence order generated
from io scheduler, which usually considers the request's LBA, and
order is often important for HDD.
So forward io commands in the sequence made from io scheduler by
aligning task work with current io_uring command's batch handling,
and it has been observed that both can get similar performance data
if IORING_SETUP_COOP_TASKRUN is set from ublk server.
Reported-by: Andreas Hindborg <andreas.hindborg@wdc.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221121155645.396272-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>