Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
Without EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary
physical memory regions into the userspace via /dev/mem. At the same
time, pages may change their properties (e.g., from anonymous pages to
named pages) while they are still being mapped in the userspace, leading
to "corruption" detected by the page table check.
To avoid these false positives, this patch makes PAGE_TABLE_CHECK
depends on EXCLUSIVE_SYSTEM_RAM. This dependency is understandable
because PAGE_TABLE_CHECK is a hardening technique but /dev/mem without
STRICT_DEVMEM (i.e., !EXCLUSIVE_SYSTEM_RAM) is itself a security
problem.
Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be
mapped via /dev/mem. However, these pages are always considered as named
pages, so they won't break the logic used in the page table check.
Cc: <stable@vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20230515130958.32471-4-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When hcd->localmem_pool is non-null, localmem_pool is used to allocate
DMA memory. In this case, the dma address will be properly returned (in
dma_handle), and dma_mmap_coherent should be used to map this memory
into the user space. However, the current implementation uses
pfn_remap_range, which is supposed to map normal pages.
Instead of repeating the logic in the memory allocation function, this
patch introduces a more robust solution. Here, the type of allocated
memory is checked by testing whether dma_handle is properly set. If
dma_handle is properly returned, it means some DMA pages are allocated
and dma_mmap_coherent should be used to map them. Otherwise, normal
pages are allocated and pfn_remap_range should be called. This ensures
that the correct mmap functions are used consistently, independently
with logic details that determine which type of memory gets allocated.
Fixes: a0e710a7def4 ("USB: usbfs: fix mmap dma mismatch")
Cc: stable@vger.kernel.org
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Link: https://lore.kernel.org/r/20230515130958.32471-3-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current implementation of usbdev_mmap uses usb_alloc_coherent to
allocate memory pages that will later be mapped into the user space.
Meanwhile, usb_alloc_coherent employs three different methods to
allocate memory, as outlined below:
* If hcd->localmem_pool is non-null, it uses gen_pool_dma_alloc to
allocate memory;
* If DMA is not available, it uses kmalloc to allocate memory;
* Otherwise, it uses dma_alloc_coherent.
However, it should be noted that gen_pool_dma_alloc does not guarantee
that the resulting memory will be page-aligned. Furthermore, trying to
map slab pages (i.e., memory allocated by kmalloc) into the user space
is not resonable and can lead to problems, such as a type confusion bug
when PAGE_TABLE_CHECK=y [1].
To address these issues, this patch introduces hcd_alloc_coherent_pages,
which addresses the above two problems. Specifically,
hcd_alloc_coherent_pages uses gen_pool_dma_alloc_align instead of
gen_pool_dma_alloc to ensure that the memory is page-aligned. To replace
kmalloc, hcd_alloc_coherent_pages directly allocates pages by calling
__get_free_pages.
Reported-by: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.comm
Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1]
Fixes: f7d34b445abc ("USB: Add support for usbfs zerocopy.")
Fixes: ff2437befd8f ("usb: host: Fix excessive alignment restriction for local memory allocations")
Cc: stable@vger.kernel.org
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20230515130958.32471-2-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "snps,hsphy_interface" is string, not u8. Fix the type.
Fixes: 389d77658801 ("dt-bindings: usb: Convert DWC USB3 bindings to DT schema")
Cc: stable <stable@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20230515172456.179049-1-marex@denx.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "udc" pointer was never set in the probe() function so it will
lead to a NULL dereference in udc_pci_remove() when we do:
usb_del_gadget_udc(&udc->gadget);
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/ZG+A/dNpFWAlCChk@kili
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While exercising the unbind path, with the current implementation
the functionfs_unbind would be calling which waits for the ffs->mutex
to be available, however within the same time ffs_ep0_read is invoked
& if no setup packets are pending, it will invoke function
wait_event_interruptible_exclusive_locked_irq which by definition waits
for the ev.count to be increased inside the same mutex for which
functionfs_unbind is waiting.
This creates deadlock situation because the functionfs_unbind won't
get the lock until ev.count is increased which can only happen if
the caller ffs_func_unbind can proceed further.
Following is the illustration:
CPU1 CPU2
ffs_func_unbind() ffs_ep0_read()
mutex_lock(ffs->mutex)
wait_event(ffs->ev.count)
functionfs_unbind()
mutex_lock(ffs->mutex)
mutex_unlock(ffs->mutex)
ffs_event_add()
<deadlock>
Fix this by moving the event unbind before functionfs_unbind
to ensure the ev.count is incrased properly.
Fixes: 6a19da111057 ("usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait")
Cc: stable <stable@kernel.org>
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
Link: https://lore.kernel.org/r/20230525092854.7992-1-quic_uaggarwa@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At iMX8QM platform, enable NCM gadget and run 'iperf3 -s'.
At host, run 'iperf3 -V -c fe80::6863:98ff:feef:3e0%enxc6e147509498'
[ 5] 0.00-1.00 sec 1.55 MBytes 13.0 Mbits/sec 90 4.18 KBytes
[ 5] 1.00-2.00 sec 1.44 MBytes 12.0 Mbits/sec 75 4.18 KBytes
[ 5] 2.00-3.00 sec 1.48 MBytes 12.4 Mbits/sec 75 4.18 KBytes
Expected speed should be bigger than 300Mbits/sec.
The root cause of this performance drop was found to be data corruption
happening at 4K borders in some Ethernet packets, leading to TCP
checksum errors. This corruption occurs from the position
(4K - (address & 0x7F)) to 4K. The u_ether function's allocation of
skb_buff reserves 64B, meaning all RX addresses resemble 0xXXXX0040.
Force trb_burst_size to 16 can fix this problem.
Cc: stable@vger.kernel.org
Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20230518154946.3666662-1-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull x86 cpu fix from Thomas Gleixner:
"A single fix for x86:
- Prevent a bogus setting for the number of HT siblings, which is
caused by the CPUID evaluation trainwreck of X86. That recomputes
the value for each CPU, so the last CPU "wins". That can cause
completely bogus sibling values"
* tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
Pull perf fixes from Thomas Gleixner:
"A small set of perf fixes:
- Make the MSR-readout based CHA discovery work around broken
discovery tables in some SPR firmwares.
- Prevent saving PEBS configuration which has software bits set that
cause a crash when restored into the relevant MSR"
* tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Correct the number of CHAs on SPR
perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
Traditionally, all CPUs in a system have identical numbers of SMT
siblings. That changes with hybrid processors where some logical CPUs
have a sibling and others have none.
Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2. If it
is an Ecore, smp_num_siblings == 1.
smp_num_siblings describes if the *system* supports SMT. It should
specify the maximum number of SMT threads among all cores.
Ensure that smp_num_siblings represents the system-wide maximum number
of siblings by always increasing its value. Never allow it to decrease.
On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
not updated in any cpu sibling map because the system is treated as an
UP system when probing Ecore CPUs.
Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21
Notice that the "before" 'package_cpus_list' has only one CPU. This
means that userspace tools like lscpu will see a little laptop like
an 11-socket system:
-Core(s) per socket: 1
-Socket(s): 11
+Core(s) per socket: 16
+Socket(s): 1
This is also expected to make the scheduler do rather wonky things
too.
[ dhansen: remove CPUID detail from changelog, add end user effects ]
CC: stable@kernel.org
Fixes: bbb65d2d365e ("x86: use cpuid vector 0xb when available for detecting cpu topology")
Fixes: 95f3d39ccf7a ("x86/cpu/topology: Provide detect_extended_topology_early()")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com
Pull unwinder fixes from Thomas Gleixner:
"A set of unwinder and tooling fixes:
- Ensure that the stack pointer on x86 is aligned again so that the
unwinder does not read past the end of the stack
- Discard .note.gnu.property section which has a pointlessly
different alignment than the other note sections. That confuses
tooling of all sorts including readelf, libbpf and pahole"
* tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
vmlinux.lds.h: Discard .note.gnu.property section
The number of CHAs from the discovery table on some SPR variants is
incorrect, because of a firmware issue. An accurate number can be read
from the MSR UNC_CBO_CONFIG.
Fixes: 949b11381f81 ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230508140206.283708-1-kan.liang@linux.intel.com
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address. When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries. (Note: Only kernel mappings set
Global=1.)
Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.
As a workaround, never enable PCIDs on affected processors.
I expect there to eventually be microcode mitigations to replace this
software workaround. However, the exact version numbers where that
will happen are not known today. Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.
Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Pull debugobjects fixes from Thomas Gleixner:
"Two fixes for debugobjects:
- Prevent the allocation path from waking up kswapd.
That's a long standing issue due to the GFP_ATOMIC allocation flag.
As debug objects can be invoked from pretty much any context waking
kswapd can end up in arbitrary lock chains versus the waitqueue
lock
- Correct the explicit lockdep wait-type violation in
debug_object_fill_pool()"
* tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Don't wake up kswapd from fill_pool()
debugobjects,locking: Annotate debug_object_fill_pool() wait type violation