commits
Pull ide fix from Jens Axboe:
"This is a leftover fix from 5.11, where I forgot to ship it your way"
* tag 'ide-5.11-2021-02-28' of git://git.kernel.dk/linux-block:
ide/falconide: Fix module unload
Pull Kbuild fixes from Masahiro Yamada:
- Fix UNUSED_KSYMS_WHITELIST for Clang LTO
- Make -s builds really silent irrespective of V= option
- Fix build error when SUBLEVEL or PATCHLEVEL is empty
* tag 'kbuild-fixes-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL again
kbuild: make -s option take precedence over V=1
ia64: remove redundant READELF from arch/ia64/Makefile
kbuild: do not include include/config/auto.conf from adjust_autoksyms.sh
kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTO
kbuild: lto: add _mcount to list of used symbols
Unloading the falconide module results in a crash:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Oops: 00000000
Modules linked in: falconide(-)
PC: [<002930b2>] ide_host_remove+0x2e/0x1d2
SR: 2000 SP: 00b49e28 a2: 009b0f90
d0: 00000000 d1: 009b0f90 d2: 00000000 d3: 00b48000
d4: 003cef32 d5: 00299188 a0: 0086d000 a1: 0086d000
Process rmmod (pid: 322, task=009b0f90)
Frame format=7 eff addr=00000000 ssw=0505 faddr=00000000
wb 1 stat/addr/data: 0000 00000000 00000000
wb 2 stat/addr/data: 0000 00000000 00000000
wb 3 stat/addr/data: 0000 00000000 00018da9
push data: 00000000 00000000 00000000 00000000
Stack from 00b49e90:
004c456a 0027f176 0027cb0a 0027cb9e 00000000 0086d00a 2187d3f0 0027f0e0
00b49ebc 2187d1f6 00000000 00b49ec8 002811e8 0086d000 00b49ef0 0028024c
0086d00a 002800d6 00279a1a 00000001 00000001 0086d00a 2187d3f0 00279a58
00b49f1c 002802e0 0086d00a 2187d3f0 004c456a 0086d00a ef96af74 00000000
2187d3f0 002805d2 800de064 00b49f44 0027f088 2187d3f0 00ac1cf4 2187d3f0
004c43be 2187d3f0 00000000 2187d3f0 800b66a8 00b49f5c 00280776 2187d3f0
Call Trace: [<0027f176>] __device_driver_unlock+0x0/0x48
[<0027cb0a>] device_links_busy+0x0/0x94
[<0027cb9e>] device_links_unbind_consumers+0x0/0x130
[<0027f0e0>] __device_driver_lock+0x0/0x5a
[<2187d1f6>] falconide_remove+0x12/0x18 [falconide]
[<002811e8>] platform_drv_remove+0x1c/0x28
[<0028024c>] device_release_driver_internal+0x176/0x17c
[<002800d6>] device_release_driver_internal+0x0/0x17c
[<00279a1a>] get_device+0x0/0x22
[<00279a58>] put_device+0x0/0x18
[<002802e0>] driver_detach+0x56/0x82
[<002805d2>] driver_remove_file+0x0/0x24
[<0027f088>] bus_remove_driver+0x4c/0xa4
[<00280776>] driver_unregister+0x28/0x5a
[<00281a00>] platform_driver_unregister+0x12/0x18
[<2187d2a0>] ide_falcon_driver_exit+0x10/0x16 [falconide]
[<000764f0>] sys_delete_module+0x110/0x1f2
[<000e83ea>] sys_rename+0x1a/0x1e
[<00002e0c>] syscall+0x8/0xc
[<00188004>] ext4_multi_mount_protect+0x35a/0x3ce
Code: 0029 9188 4bf9 0027 aa1c 283c 003c ef32 <265c> 4a8b 6700 00b8 2043 2028 000c 0280 00ff ff00 6600 0176 40c0 7202 b2b9 004c
Disabling lock debugging due to kernel taint
This happens because the driver_data pointer is uninitialized.
Add the missing platform_set_drvdata() call. For clarity, use the
matching platform_get_drvdata() as well.
Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Fixes: 5ed0794cde593 ("m68k/atari: Convert Falcon IDE drivers to platform drivers")
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull arch/csky updates from Guo Ren:
"Features:
- add new memory layout 2.5G(user):1.5G(kernel)
- add kmemleak support
- reconstruct VDSO framework: add VDSO with GENERIC_GETTIMEOFDAY,
GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO
- add faulthandler_disabled() check
- support (fix) swapon
- add (fix) _PAGE_ACCESSED for default pgprot
- abort uaccess retries upon fatal signal (from arm)
Fixes and optimizations:
- fix perf probe failure
- fix show_regs doesn't contain regs->usp
- remove custom asm/atomic.h implementation
- fix barrier design
- fix futex SMP implementation
- fix asm/cmpxchg.h with correct ordering barrier
- cleanup asm/spinlock.h
- fix PTE global for 2.5:1.5 virtual memory
- remove prologue of page fault handler in entry.S
- fix TLB maintenance synchronization problem
- add show_tlb for CPU_CK860 debug
- fix FAULT_FLAG_XXX param for handle_mm_fault
- fix update_mmu_cache called with user io mapping
- fix do_page_fault parent irq status
- fix a size determination in gpr_get()
- pgtable.h: Coding convention
- kprobe: Fix code in simulate without 'long'
- fix pfn_valid error with wrong max_mapnr
- use free_initmem_default() in free_initmem()
- fix compile error"
* tag 'csky-for-linus-5.12-rc1' of git://github.com/c-sky/csky-linux: (30 commits)
csky: Fixup compile error
csky: use free_initmem_default() in free_initmem()
csky: Fixup pfn_valid error with wrong max_mapnr
csky: Add VDSO with GENERIC_GETTIMEOFDAY, GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO
csky: kprobe: Fixup code in simulate without 'long'
csky: Fixup swapon
csky: pgtable.h: Coding convention
csky: Fixup _PAGE_ACCESSED for default pgprot
csky: remove unused including <linux/version.h>
csky: Fix a size determination in gpr_get()
csky: Reconstruct VDSO framework
csky: mm: abort uaccess retries upon fatal signal
csky: Sync riscv mm/fault.c for easy maintenance
csky: Fixup do_page_fault parent irq status
csky: Add faulthandler_disabled() check
csky: Fixup update_mmu_cache called with user io mapping
csky: Fixup FAULT_FLAG_XXX param for handle_mm_fault
csky: Add show_tlb for CPU_CK860 debug
csky: Fix TLB maintenance synchronization problem
csky: Add kmemleak support
...
Commit 78d3bb4483ba ("kbuild: Fix <linux/version.h> for empty SUBLEVEL
or PATCHLEVEL") fixed the build error for empty SUBLEVEL or PATCHLEVEL
by prepending a zero.
Commit 9b82f13e7ef3 ("kbuild: clamp SUBLEVEL to 255") re-introduced
this issue.
This time, we cannot take the same approach because we have C code:
#define LINUX_VERSION_PATCHLEVEL $(PATCHLEVEL)
#define LINUX_VERSION_SUBLEVEL $(SUBLEVEL)
Replace empty SUBLEVEL/PATCHLEVEL with a zero.
Fixes: 9b82f13e7ef3 ("kbuild: clamp SUBLEVEL to 255")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-and-tested-by: Sasha Levin <sashal@kernel.org>
Pull s390 cleanups from Vasily Gorbik:
"Update defconfigs and sort config select list"
* tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/Kconfig: sort config S390 select list once again
s390: update defconfigs
Pull more RISC-V updates from Palmer Dabbelt:
"A pair of patches that slipped through the cracks:
- enable CPU hotplug in the defconfigs
- some cleanups to setup_bootmem
There's also a single fix for some randconfig build failures:
- make NUMA depend on SMP"
* tag 'riscv-for-linus-5.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Cleanup setup_bootmem()
RISC-V: Enable CPU Hotplug in defconfigs
RISC-V: Make NUMA depend on SMP
: error: C++ style comments are not allowed in ISO C90
// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
^
error: (this will be reported only once per input file)
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
'make -s' should be really silent. However, 'make -s V=1' prints noisy
log messages from some shell scripts.
Of course, such a combination is odd, but the build system needs to do
the right thing even if a user gives strange input.
If -s is given, KBUILD_VERBOSE should be forced to 0.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull power management fixes from Rafael Wysocki:
"These fix a crash in intel_pstate during resume from suspend-to-RAM
that may occur after recent changes and two resource leaks in error
paths in the operating performance points (OPP) framework, add a new
C-states table to intel_idle and update the cpuidle MAINTAINERS entry
to cover the governors too.
Specifics:
- Fix recently introduced crash in the intel_pstate driver that
occurs if scale-invariance is disabled during resume from
suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR
values made by the platform firmware (Rafael Wysocki).
- Fix a memory leak and add a missing clk_put() in error paths in the
OPP framework (Quanyang Wang, Viresh Kumar).
- Add new C-states table for SnowRidge processors to the intel_idle
driver (Artem Bityutskiy).
- Update the MAINTAINERS entry for cpuidle to make it clear that the
governors are covered by it too (Lukas Bulwahn)"
* tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: add SnowRidge C-state table
cpufreq: intel_pstate: Fix fast-switch fallback path
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
...and add comments at the top and bottom.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Pull more SCSI updates from James Bottomley:
"This is a few driver updates (iscsi, mpt3sas) that were still in the
staging queue when the merge window opened (all committed on or before
8 Feb) and some small bug fixes which came in during the merge window
(all committed on 22 Feb)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits)
scsi: hpsa: Correct dev cmds outstanding for retried cmds
scsi: sd: Fix Opal support
scsi: target: tcmu: Fix memory leak caused by wrong uio usage
scsi: target: tcmu: Move some functions without code change
scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
scsi: aic7xxx: Remove unused function pointer typedef ahc_bus_suspend/resume_t
scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
scsi: ufs: Fix a duplicate dev quirk number
scsi: aic79xx: Fix spelling of version
scsi: target: core: Prevent underflow for service actions
scsi: target: core: Add cmd length set before cmd complete
scsi: iscsi: Drop session lock in iscsi_session_chkready()
scsi: qla4xxx: Use iscsi_is_session_online()
scsi: libiscsi: Reset max/exp cmdsn during recovery
scsi: iscsi_tcp: Fix shost can_queue initialization
scsi: libiscsi: Add helper to calculate max SCSI cmds per session
scsi: libiscsi: Fix iSCSI host workq destruction
scsi: libiscsi: Fix iscsi_task use after free()
scsi: libiscsi: Drop taskqueuelock
scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
...
After the following patches,
commit de043da0b9e7 ("RISC-V: Fix usage of memblock_enforce_memory_limit")
commit 1bd14a66ee52 ("RISC-V: Remove any memblock representing unusable memory area")
commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
some logic is useless, kill the mem_start/start/end and unneeded code.
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The existing code is essentially
free_initmem_default()->free_reserved_area() without poisoning.
Note that existing code missed to update the managed page count of the
zone.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
READELF is defined by the top Makefile.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull SCSI fixes from James Bottomley:
"This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).
The big core two fixes are for power management ("block: Do not accept
any requests while suspended" and "block: Fix a race in the runtime
power management code") which finally sorts out the resume problems
we've occasionally been having.
To make the resume fix, there are seven necessary precursors which
effectively renames REQ_PREEMPT to REQ_PM, so every "special" request
in block is automatically a power management exempt one.
All of the non-PM preempt cases are removed except for the one in the
SCSI Parallel Interface (spi) domain validation which is a genuine
case where we have to run requests at high priority to validate the
bus so this becomes an autopm get/put protected request"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
scsi: cxgb4i: Fix TLS dependency
scsi: ufs: Un-inline ufshcd_vops_device_reset function
scsi: ufs: Re-enable WriteBooster after device reset
scsi: ufs-mediatek: Use correct path to fix compile error
scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
scsi: block: Do not accept any requests while suspended
scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
scsi: block: Introduce BLK_MQ_REQ_PM
scsi: block: Fix a race in the runtime power management code
scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
scsi: ufs-pci: Fix restore from S4 for Intel controllers
scsi: ufs-mediatek: Keep VCC always-on for specific devices
scsi: ufs: Allow regulators being always-on
scsi: ufs: Clear UAC for RPMB after ufshcd resets
...
* pm-cpufreq:
cpufreq: intel_pstate: Fix fast-switch fallback path
* pm-cpuidle:
intel_idle: add SnowRidge C-state table
MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Pull more xfs updates from Darrick Wong:
"The most notable fix here prevents premature reuse of freed metadata
blocks, and adding the ability to detect accidental nested
transactions, which are not allowed here.
- Restore a disused sysctl control knob that was inadvertently
dropped during the merge window to avoid fstests regressions.
- Don't speculatively release freed blocks from the busy list until
we're actually allocating them, which fixes a rare log recovery
regression.
- Don't nest transactions when scanning for free space.
- Add an idiot^Wmaintainer light to detect nested transactions. ;)"
* tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: use current->journal_info for detecting transaction recursion
xfs: don't nest transactions when scanning for eofblocks
xfs: don't reuse busy extents on extent trim
xfs: restore speculative_cow_prealloc_lifetime sysctl
Prevent incrementing device->commands_outstanding for ioaccel command
retries that are driver initiated. If the command goes through the retry
path, the device->commands_outstanding counter has already accounted for
the number of commands outstanding to the device. Only commands going
through function hpsa_cmd_resolve_events decrement this counter.
- ioaccel commands go to either HBA disks or to logical volumes comprised
of SSDs.
The extra increment is causing device resets to hang.
- Resets wait for all device outstanding commands to complete before
returning.
Replace unused field abort_pending with retry_pending. This is a
maintenance driver so these changes have the least impact/risk.
Link: https://lore.kernel.org/r/161342801747.29388.13045495968308188518.stgit@brunhilda
Tested-by: Joe Szczypek <jszczype@redhat.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The CPU hotplug support has been tested on QEMU, Spike, and SiFive
Unleashed so let's enable it by default in RV32 and RV64 defconfigs.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The max_mapnr is the number of PFNs, not absolute PFN offset.
Using set_max_mapnr API instead of setting the value directly.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Commit cd195bc4775a ("kbuild: split adjust_autoksyms.sh in two parts")
split out the code that needs include/config/auto.conf.
This script no longer needs to include include/config/auto.conf.
Fixes: cd195bc4775a ("kbuild: split adjust_autoksyms.sh in two parts")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull block fixes from Jens Axboe:
"Two minor block fixes from this last week that should go into 5.11:
- Add missing NOWAIT debugfs definition (Andres)
- Fix kerneldoc warning introduced this merge window (Randy)"
* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
block: add debugfs stanza for QUEUE_FLAG_NOWAIT
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes
SCSI_CXGB4_ISCSI selects CHELSIO_T4. The latter depends on TLS || TLS=n, so
since 'select' does not check dependencies of the selected symbol,
SCSI_CXGB4_ISCSI should also depend on TLS || TLS=n.
This prevents the following kconfig warning and restricts SCSI_CXGB4_ISCSI
to 'm' whenever TLS=m.
WARNING: unmet direct dependencies detected for CHELSIO_T4
Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_CHELSIO [=y] && PCI [=y] && (IPV6 [=y] || IPV6 [=y]=n) && (TLS [=m] || TLS [=m]=n)
Selected by [y]:
- SCSI_CXGB4_ISCSI [=y] && SCSI_LOWLEVEL [=y] && SCSI [=y] && PCI [=y] && INET [=y] && (IPV6 [=y] || IPV6 [=y]=n) && ETHERNET [=y]
Link: https://lore.kernel.org/r/20201208220505.24488-1-rdunlap@infradead.org
Fixes: 7b36b6e03b0d ("[SCSI] cxgb4i v5: iscsi driver")
Cc: Karen Xie <kxie@chelsio.com>
Cc: linux-scsi@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull operating performance points (OPP) framework fixes for 5.11-rc2
from Viresh Kumar:
"This contains two patches to fix freeing of resources in error paths."
* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's ->fast_switch() callback will be invoked,
so it must not be NULL.
However, after commit a365ab6b9dfb ("cpufreq: intel_pstate: Implement
the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(), which is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original ->adjust_perf()
callback pointer.
Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.
The following has been changed.
1. C1E latency changed from 10us to 15us. It was measured using the
open source "wult" tool (the "nic" method, 15us is the 99.99th
percentile).
2. C1E power break even changed from 20us to 25us, which may result
in less C1E residency in some workloads.
3. C6 latency changed from 50us to 130us. Measured the same way as C1E.
The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.
On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull more block updates from Jens Axboe:
"A few stragglers (and one due to me missing it originally), and fixes
for changes in this merge window mostly. In particular:
- blktrace cleanups (Chaitanya, Greg)
- Kill dead blk_pm_* functions (Bart)
- Fixes for the bio alloc changes (Christoph)
- Fix for the partition changes (Christoph, Ming)
- Fix for turning off iopoll with polled IO inflight (Jeffle)
- nbd disconnect fix (Josef)
- loop fsync error fix (Mauricio)
- kyber update depth fix (Yang)
- max_sectors alignment fix (Mikulas)
- Add bio_max_segs helper (Matthew)"
* tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits)
block: Add bio_max_segs
blktrace: fix documentation for blk_fill_rw()
block: memory allocations in bounce_clone_bio must not fail
block: remove the gfp_mask argument to bounce_clone_bio
block: fix bounce_clone_bio for passthrough bios
block-crypto-fallback: use a bio_set for splitting bios
block: fix logging on capacity change
blk-settings: align max_sectors on "logical_block_size" boundary
block: reopen the device in blkdev_reread_part
block: don't skip empty device in in disk_uevent
blktrace: remove debugfs file dentries from struct blk_trace
nbd: handle device refs for DESTROY_ON_DISCONNECT properly
kyber: introduce kyber_depth_updated()
loop: fix I/O error on fsync() in detached loop devices
block: fix potential IO hang when turning off io_poll
block: get rid of the trace rq insert wrapper
blktrace: fix blk_rq_merge documentation
blktrace: fix blk_rq_issue documentation
blktrace: add blk_fill_rwbs documentation comment
block: remove superfluous param in blk_fill_rwbs()
...
Because the iomap code using PF_MEMALLOC_NOFS to detect transaction
recursion in XFS is just wrong. Remove it from the iomap code and
replace it with XFS specific internal checks using
current->journal_info instead.
[djwong: This change also realigns the lifetime of NOFS flag changes to
match the incore transaction, instead of the inconsistent scheme we have
now.]
Fixes: 9070733b4efa ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The SCSI core has been modified recently such that it only processes PM
requests if rpm_status != RPM_ACTIVE. Since some Opal requests are
submitted while rpm_status != RPM_ACTIVE, set flag RQF_PM for Opal
requests.
See also https://bugzilla.kernel.org/show_bug.cgi?id=211227.
[mkp: updated sha for PM patch]
Link: https://lore.kernel.org/r/20210222021042.3534-1-bvanassche@acm.org
Fixes: d80210f25ff0 ("sd: add support for TCG OPAL self encrypting disks")
Fixes: e6044f714b25 ("scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE")
Cc: chriscjsus@yahoo.com
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: stable@vger.kernel.org
Reported-by: chriscjsus@yahoo.com
Tested-by: chriscjsus@yahoo.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In theory these are orthogonal, but in practice all NUMA systems are
SMP. NUMA && !SMP doesn't build, everyone else is coupling them, and I
don't really see any value in supporting that configuration.
Fixes: 4f0e8eef772e ("riscv: Add numa support for riscv64 platform")
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Atish Patra <atishp@atishpatra.org>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
It could help to reduce the latency of the time-related functions
in user space.
We have referenced arm's and riscv's implementation for the patch.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Commit fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
does not work as expected if the .config file has already specified
CONFIG_UNUSED_KSYMS_WHITELIST="my/own/white/list" before enabling
CONFIG_LTO_CLANG.
So, the user-supplied whitelist and LTO-specific white list must be
independent of each other.
I refactored the shell script so CONFIG_MODVERSIONS and CONFIG_CLANG_LTO
handle whitelists in the same way.
Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into 5.11, all marked for stable as well:
- Fix issue around identity COW'ing and users that share a ring
across processes
- Fix a hang associated with unregistering fixed files (Pavel)
- Move the 'process is exiting' cancelation a bit earlier, so
task_works aren't affected by it (Pavel)"
* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
kernel/io_uring: cancel io_uring before task works
io_uring: fix io_sqe_files_unregister() hangs
io_uring: add a helper for setting a ref node
io_uring: don't assume mm is constant across submits
This was missed in 021a24460dc2. Leads to the numeric value of
QUEUE_FLAG_NOWAIT (i.e. 29) showing up in
/sys/kernel/debug/block/*/state.
Fixes: 021a24460dc28e7412aecfae89f60e1847e685c0
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
More and more statements are being added to ufshcd_vops_device_reset() and
this function is being called from multiple locations in the driver.
Un-inline the function to allow the compiler to make better decisions.
Link: https://lore.kernel.org/r/20201208135635.15326-3-stanley.chu@mediatek.com
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the clock reference counting by calling the missing clk_put() in the
error path.
Cc: v5.10 <stable@vger.kernel.org> # v5.10
Fixes: dd461cd9183f ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The current pattern in the file entry does not make the files in the
governors subdirectory to be a part of the CPU IDLE TIME MANAGEMENT
FRAMEWORK.
Adjust the file pattern to include files in governors.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit 36e2c7421f02 ("fs: don't allow splice read/write without
explicit ops") we've required that file operation structures explicitly
enable splice support, rather than falling back to the default handlers.
Most /proc files use the indirect 'struct proc_ops' to describe their
file operations, and were fixed up to support splice earlier in commits
40be821d627c..b24c30c67863, but the mountinfo files interact with the
VFS directly using their own 'struct file_operations' and got missed as
a result.
This adds the necessary support for splice to work for /proc/*/mountinfo
and friends.
Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Reported-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209971
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull io_uring thread rewrite from Jens Axboe:
"This converts the io-wq workers to be forked off the tasks in question
instead of being kernel threads that assume various bits of the
original task identity.
This kills > 400 lines of code from io_uring/io-wq, and it's the worst
part of the code. We've had several bugs in this area, and the worry
is always that we could be missing some pieces for file types doing
unusual things (recent /dev/tty example comes to mind, userfaultfd
reads installing file descriptors is another fun one... - both of
which need special handling, and I bet it's not the last weird oddity
we'll find).
With these identical workers, we can have full confidence that we're
never missing anything. That, in itself, is a huge win. Outside of
that, it's also more efficient since we're not wasting space and code
on tracking state, or switching between different states.
I'm sure we're going to find little things to patch up after this
series, but testing has been pretty thorough, from the usual
regression suite to production. Any issue that may crop up should be
manageable.
There's also a nice series of further reductions we can do on top of
this, but I wanted to get the meat of it out sooner rather than later.
The general worry here isn't that it's fundamentally broken. Most of
the little issues we've found over the last week have been related to
just changes in how thread startup/exit is done, since that's the main
difference between using kthreads and these kinds of threads. In fact,
if all goes according to plan, I want to get this into the 5.10 and
5.11 stable branches as well.
That said, the changes outside of io_uring/io-wq are:
- arch setup, simple one-liner to each arch copy_thread()
implementation.
- Removal of net and proc restrictions for io_uring, they are no
longer needed or useful"
* tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits)
io-wq: remove now unused IO_WQ_BIT_ERROR
io_uring: fix SQPOLL thread handling over exec
io-wq: improve manager/worker handling over exec
io_uring: ensure SQPOLL startup is triggered before error shutdown
io-wq: make buffered file write hashed work map per-ctx
io-wq: fix race around io_worker grabbing
io-wq: fix races around manager/worker creation and task exit
io_uring: ensure io-wq context is always destroyed for tasks
arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
io_uring: cleanup ->user usage
io-wq: remove nr_process accounting
io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
net: remove cmsg restriction from io_uring based send/recvmsg calls
Revert "proc: don't allow async path resolution of /proc/self components"
Revert "proc: don't allow async path resolution of /proc/thread-self components"
io_uring: move SQPOLL thread io-wq forked worker
io-wq: make io_wq_fork_thread() available to other users
io-wq: only remove worker from free_list, if it was there
io_uring: remove io_identity
io_uring: remove any grabbing of context
...
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the
sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to
be unsigned to make it easier for the users.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Brian Foster reported a lockdep warning on xfs/167:
============================================
WARNING: possible recursive locking detected
5.11.0-rc4 #35 Tainted: G W I
--------------------------------------------
fsstress/17733 is trying to acquire lock:
ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]
but task is already holding lock:
ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]
stack backtrace:
CPU: 38 PID: 17733 Comm: fsstress Tainted: G W I 5.11.0-rc4 #35
Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
Call Trace:
dump_stack+0x8b/0xb0
__lock_acquire.cold+0x159/0x2ab
lock_acquire+0x116/0x370
xfs_trans_alloc+0x1ad/0x310 [xfs]
xfs_free_eofblocks+0x104/0x1d0 [xfs]
xfs_blockgc_scan_inode+0x24/0x60 [xfs]
xfs_inode_walk_ag+0x202/0x4b0 [xfs]
xfs_inode_walk+0x66/0xc0 [xfs]
xfs_trans_alloc+0x160/0x310 [xfs]
xfs_trans_alloc_inode+0x5f/0x160 [xfs]
xfs_alloc_file_space+0x105/0x300 [xfs]
xfs_file_fallocate+0x270/0x460 [xfs]
vfs_fallocate+0x14d/0x3d0
__x64_sys_fallocate+0x3e/0x70
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The cause of this is the new code that spurs a scan to garbage collect
speculative preallocations if we fail to reserve enough blocks while
allocating a transaction. While the warning itself is a fairly benign
lockdep complaint, it does expose a potential livelock if the rwsem
behavior ever changes with regards to nesting read locks when someone's
waiting for a write lock.
Fix this by freeing the transaction and jumping back to xfs_trans_alloc
like this patch in the V4 submission[1].
[1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/
Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When user deletes a tcmu device via configFS, tcmu calls
uio_unregister_device(). During that call uio resets its pointer to struct
uio_info provided by tcmu. That means, after uio_unregister_device() uio
will no longer execute any of the callbacks tcmu had set in uio_info.
Especially, if userspace daemon still holds the corresponding uio device
open or mmap'ed while tcmu calls uio_unregister_device(), uio will not call
tcmu_release() when userspace finally closes and munmaps the uio device.
Since tcmu does refcounting for the tcmu device in tcmu_open() and
tcmu_release(), in the decribed case refcount does not drop to 0 and tcmu
does not free tcmu device's resources. In extreme cases this can cause
memory leaking of up to 1 GB for a single tcmu device.
After uio_unregister_device(), uio will reject every open, read, write,
mmap from userspace with -EOI. But userspace daemon can still access the
mmap'ed command ring and data area. Therefore tcmu should wait until
userspace munmaps the uio device before it frees the resources, as we don't
want to cause SIGSEGV or SIGBUS to user space.
That said, current refcounting during tcmu_open and tcmu_release does not
work correctly, and refcounting better should be done in the open and close
callouts of the vm_operations_struct, which tcmu assigns to each mmap of
the uio device (because it wants its own page fault handler).
This patch fixes the memory leak by removing refcounting from tcmu_open and
tcmu_close, and instead adding new tcmu_vma_open() and tcmu_vma_close()
handlers that only do refcounting.
Link: https://lore.kernel.org/r/20210218175039.7829-3-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull RISC-V updates from Palmer Dabbelt:
"A handful of new RISC-V related patches for this merge window:
- A check to ensure drivers are properly using uaccess. This isn't
manifesting with any of the drivers I'm currently using, but may
catch errors in new drivers.
- Some preliminary support for the FU740, along with the HiFive
Unleashed it will appear on.
- NUMA support for RISC-V, which involves making the arm64 code
generic.
- Support for kasan on the vmalloc region.
- A handful of new drivers for the Kendryte K210, along with the DT
plumbing required to boot on a handful of K210-based boards.
- Support for allocating ASIDs.
- Preliminary support for kernels larger than 128MiB.
- Various other improvements to our KASAN support, including the
utilization of huge pages when allocating the KASAN regions.
We may have already found a bug with the KASAN_VMALLOC code, but it's
passing my tests. There's a fix in the works, but that will probably
miss the merge window.
* tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits)
riscv: Improve kasan population by using hugepages when possible
riscv: Improve kasan population function
riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization
riscv: Improve kasan definitions
riscv: Get rid of MAX_EARLY_MAPPING_SIZE
soc: canaan: Sort the Makefile alphabetically
riscv: Disable KSAN_SANITIZE for vDSO
riscv: Remove unnecessary declaration
riscv: Add Canaan Kendryte K210 SD card defconfig
riscv: Update Canaan Kendryte K210 defconfig
riscv: Add Kendryte KD233 board device tree
riscv: Add SiPeed MAIXDUINO board device tree
riscv: Add SiPeed MAIX GO board device tree
riscv: Add SiPeed MAIX DOCK board device tree
riscv: Add SiPeed MAIX BiT board device tree
riscv: Update Canaan Kendryte K210 device tree
dt-bindings: add resets property to dw-apb-timer
dt-bindings: fix sifive gpio properties
dt-bindings: update sifive uart compatible string
dt-bindings: update sifive clint compatible string
...
The type of 'val' is 'unsigned long' in simulate_blz32, so 'val < 0'
can't be true.
Cast 'val' to 'long' here to determine branch token or not,
Fixup instructions: bnezad32, bhsz32, bhz32, blsz32, blz32
Link: https://lore.kernel.org/linux-csky/CAJF2gTQjKXR9gpo06WAWG1aquiT87mATiMGorXs6ChxOxoe90Q@mail.gmail.com/T/#t
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Some randconfig builds fail with undefined references to _mcount
when CONFIG_TRIM_UNUSED_KSYMS is set:
ERROR: modpost: "_mcount" [drivers/tee/optee/optee.ko] undefined!
ERROR: modpost: "_mcount" [drivers/fsi/fsi-occ.ko] undefined!
ERROR: modpost: "_mcount" [drivers/fpga/dfl-pci.ko] undefined!
Since there is already a list of symbols that get generated at link
time, add this one as well.
Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Commit 436e980e2ed5 ("kbuild: don't hardcode depmod path") stopped
hard-coding the path of depmod, but in the process caused trouble for
distributions that had that /sbin location, but didn't have it in the
PATH (generally because /sbin is limited to the super-user path).
Work around it for now by just adding /sbin to the end of PATH in the
depmod.sh script.
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For cancelling io_uring requests it needs either to be able to run
currently enqueued task_works or having it shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.
Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix new kernel-doc warnings in fs/block_dev.c:
../fs/block_dev.c:1066: warning: Excess function parameter 'whole' description in 'bd_abort_claiming'
../fs/block_dev.c:1837: warning: Function parameter or member 'dev' not described in 'lookup_bdev'
Fixes: 4e7b5671c6a8 ("block: remove i_bdev")
Fixes: 37c3fc9abb25 ("block: simplify the block device claiming interface")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
UFS 3.1 specification mentions that the WriteBooster flags listed below
will be set to their default values, i.e. disabled, after power cycle or
any type of reset event. Thus we need to reset the flag variables kept in
struct hba to align with the device status and ensure that
WriteBooster-related functions are configured properly after device reset.
Without this fix, WriteBooster will not be enabled successfully after by
ufshcd_wb_ctrl() after device reset because hba->wb_enabled remains true.
Flags required to be reset to default values:
- fWriteBoosterEn: hba->wb_enabled
- fWriteBoosterBufferFlushEn: hba->wb_buf_flush_enabled
- fWriteBoosterBufferFlushDuringHibernate: No variable mapped
Link: https://lore.kernel.org/r/20201208135635.15326-2-stanley.chu@mediatek.com
Fixes: 3d17b9b5ab11 ("scsi: ufs: Add write booster feature support")
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In function _allocate_opp_table, opp_dev is allocated and referenced
by opp_table via _add_opp_dev. But in the case that the subsequent calls
return -EPROBE_DEFER, it will jump to err label and opp_table will be
freed. Then opp_dev becomes an unreferenced object to cause memory leak.
So let's call _remove_opp_dev to do the cleanup.
This fixes the following kmemleak report:
unreferenced object 0xffff000801524a00 (size 128):
comm "swapper/0", pid 1, jiffies 4294892465 (age 84.616s)
hex dump (first 32 bytes):
40 00 56 01 08 00 ff ff 40 00 56 01 08 00 ff ff @.V.....@.V.....
b8 52 77 7f 08 00 ff ff 00 3c 4c 00 08 00 ff ff .Rw......<L.....
backtrace:
[<00000000b1289fb1>] kmemleak_alloc+0x30/0x40
[<0000000056da48f0>] kmem_cache_alloc+0x3d4/0x588
[<00000000a84b3b0e>] _add_opp_dev+0x2c/0x88
[<0000000062a380cd>] _add_opp_table_indexed+0x124/0x268
[<000000008b4c8f1f>] dev_pm_opp_of_add_table+0x20/0x1d8
[<00000000e5316798>] dev_pm_opp_of_cpumask_add_table+0x48/0xf0
[<00000000db0a8ec2>] dt_cpufreq_probe+0x20c/0x448
[<0000000030a3a26c>] platform_probe+0x68/0xd8
[<00000000c618e78d>] really_probe+0xd0/0x3a0
[<00000000642e856f>] driver_probe_device+0x58/0xb8
[<00000000f10f5307>] device_driver_attach+0x74/0x80
[<0000000004f254b8>] __driver_attach+0x58/0xe0
[<0000000009d5d19e>] bus_for_each_dev+0x70/0xc8
[<0000000000d22e1c>] driver_attach+0x24/0x30
[<0000000001d4e952>] bus_add_driver+0x14c/0x1f0
[<0000000089928aaa>] driver_register+0x64/0x120
Cc: v5.10 <stable@vger.kernel.org> # v5.10
Fixes: dd461cd9183f ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
[ Viresh: Added the stable tag ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull power management updates from Rafael Wysocki:
"These update cpufreq (core and drivers), cpuidle (polling state
implementation and the PSCI driver), the OPP (operating performance
points) framework, devfreq (core and drivers), the power capping RAPL
(Running Average Power Limit) driver, the Energy Model support, the
generic power domains (genpd) framework, the ACPI device power
management, the core system-wide suspend code and power management
utilities.
Specifics:
- Use local_clock() instead of jiffies in the cpufreq statistics to
improve accuracy (Viresh Kumar).
- Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
drivers (Viresh Kumar).
- Clean up the cpufreq core, the intel_pstate driver and the
schedutil cpufreq governor (Rafael Wysocki).
- Fix up error code paths in the sti-cpufreq and mediatek cpufreq
drivers (Yangtao Li, Qinglang Miao).
- Fix cpufreq_online() to return error codes instead of success (0)
in all cases when it fails (Wang ShaoBo).
- Add mt8167 support to the mediatek cpufreq driver and blacklist
mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
- Modify the tegra194 cpufreq driver to always return values from the
frequency table as the current frequency and clean up that driver
(Sumit Gupta, Jon Hunter).
- Modify the arm_scmi cpufreq driver to allow it to discover the
power scale present in the performance protocol and provide this
information to the Energy Model (Lukasz Luba).
- Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
Rohár).
- Clean up the CPPC cpufreq driver (Ionela Voinescu).
- Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
Bergmann).
- Rework the poling interval selection for the polling state in
cpuidle (Mel Gorman).
- Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
(Ulf Hansson).
- Modify the OPP framework to support empty (node-less) OPP tables in
DT for passing dependency information (Nicola Mazzucato).
- Fix potential lockdep issue in the OPP core and clean up the OPP
core (Viresh Kumar).
- Modify dev_pm_opp_put_regulators() to accept a NULL argument and
update its users accordingly (Viresh Kumar).
- Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
- Add support for governor feature flags to devfreq, make devfreq
sysfs file permissions depend on the governor and clean up the
devfreq core (Chanwoo Choi).
- Clean up the tegra20 devfreq driver and deprecate it to allow
another driver based on EMC_STAT to be used instead of it (Dmitry
Osipenko).
- Add interconnect support to the tegra30 devfreq driver, allow it to
take the interconnect and OPP information from DT and clean it up
(Dmitry Osipenko).
- Add interconnect support to the exynos-bus devfreq driver along
with interconnect properties documentation (Sylwester Nawrocki).
- Add suport for AMD Fam17h and Fam19h processors to the RAPL power
capping driver (Victor Ding, Kim Phillips).
- Fix handling of overly long constraint names in the powercap
framework (Lukasz Luba).
- Fix the wakeup configuration handling for bridges in the ACPI
device power management core (Rafael Wysocki).
- Add support for using an abstract scale for power units in the
Energy Model (EM) and document it (Lukasz Luba).
- Add em_cpu_energy() micro-optimization to the EM (Pavankumar
Kondeti).
- Modify the generic power domains (genpd) framwework to support
suspend-to-idle (Ulf Hansson).
- Fix creation of debugfs nodes in genpd (Thierry Strudel).
- Clean up genpd (Lina Iyer).
- Clean up the core system-wide suspend code and make it print driver
flags for devices with debug enabled (Alex Shi, Patrice Chotard,
Chen Yu).
- Modify the ACPI system reboot code to make it prepare for system
power off to avoid confusing the platform firmware (Kai-Heng Feng).
- Update the pm-graph (multiple changes, mostly usability-related)
and cpupower (online and offline CPU information support) PM
utilities (Todd Brandt, Brahadambal Srinivasan)"
* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
cpufreq: Fix cpufreq_online() return value on errors
cpufreq: Fix up several kerneldoc comments
cpufreq: stats: Use local_clock() instead of jiffies
cpufreq: schedutil: Simplify sugov_update_next_freq()
cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
PM: domains: create debugfs nodes when adding power domains
opp: of: Allow empty opp-table with opp-shared
dt-bindings: opp: Allow empty OPP tables
media: venus: dev_pm_opp_put_*() accepts NULL argument
drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
drm/lima: dev_pm_opp_put_*() accepts NULL argument
PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
opp: Reduce the size of critical section in _opp_kref_release()
PM / EM: Micro optimization in em_cpu_energy
cpufreq: arm_scmi: Discover the power scale in performance protocol
...
Pull NTB fixes from Jon Mason:
"Bug fix for IDT NTB and Intel NTB LTR management support"
* tag 'ntb-5.11' of git://github.com/jonmason/ntb:
ntb: intel: add Intel NTB LTR vendor support for gen4 NTB
ntb: idt: fix error check in ntb_hw_idt.c
Pull misc vfs updates from Al Viro:
"Assorted stuff pile - no common topic here"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
whack-a-mole: don't open-code iminor/imajor
9p: fix misuse of sscanf() in v9fs_stat2inode()
audit_alloc_mark(): don't open-code ERR_CAST()
fs/inode.c: make inode_init_always() initialize i_ino to 0
vfs: don't unnecessarily clone write access for writable fds
This flag is now dead, remove it.
Fixes: 1cbd9c2bcf02 ("io-wq: don't create any IO workers upfront")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add missing ":" after rwbs function parameter documentation that fixes
following warning :-
./kernel/trace/blktrace.c:1877: warning: Function parameter or member 'rwbs' not described in 'blk_fill_rwbs'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 1f83bb4b4914 ("blktrace: add blk_fill_rwbs documentation comment")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Freed extents are marked busy from the point the freeing transaction
commits until the associated CIL context is checkpointed to the log.
This prevents reuse and overwrite of recently freed blocks before
the changes are committed to disk, which can lead to corruption
after a crash. The exception to this rule is that metadata
allocation is allowed to reuse busy extents because metadata changes
are also logged.
As of commit 97d3ac75e5e0 ("xfs: exact busy extent tracking"), XFS
has allowed modification or complete invalidation of outstanding
busy extents for metadata allocations. This implementation assumes
that use of the associated extent is imminent, which is not always
the case. For example, the trimmed extent might not satisfy the
minimum length of the allocation request, or the allocation
algorithm might be involved in a search for the optimal result based
on locality.
generic/019 reproduces a corruption caused by this scenario. First,
a metadata block (usually a bmbt or symlink block) is freed from an
inode. A subsequent bmbt split on an unrelated inode attempts a near
mode allocation request that invalidates the busy block during the
search, but does not ultimately allocate it. Due to the busy state
invalidation, the block is no longer considered busy to subsequent
allocation. A direct I/O write request immediately allocates the
block and writes to it. Finally, the filesystem crashes while in a
state where the initial metadata block free had not committed to the
on-disk log. After recovery, the original metadata block is in its
original location as expected, but has been corrupted by the
aforementioned dio.
This demonstrates that it is fundamentally unsafe to modify busy
extent state for extents that are not guaranteed to be allocated.
This applies to pretty much all of the code paths that currently
trim busy extents for one reason or another. Therefore to address
this problem, drop the reuse mechanism from the busy extent trim
path. This code already knows how to return partial non-busy ranges
of the targeted free extent and higher level code tracks the busy
state of the allocation attempt. If a block allocation fails where
one or more candidate extents is busy, we force the log and retry
the allocation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pull Kbuild fixes from Masahiro Yamada:
- Fix UNUSED_KSYMS_WHITELIST for Clang LTO
- Make -s builds really silent irrespective of V= option
- Fix build error when SUBLEVEL or PATCHLEVEL is empty
* tag 'kbuild-fixes-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL again
kbuild: make -s option take precedence over V=1
ia64: remove redundant READELF from arch/ia64/Makefile
kbuild: do not include include/config/auto.conf from adjust_autoksyms.sh
kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTO
kbuild: lto: add _mcount to list of used symbols
Unloading the falconide module results in a crash:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Oops: 00000000
Modules linked in: falconide(-)
PC: [<002930b2>] ide_host_remove+0x2e/0x1d2
SR: 2000 SP: 00b49e28 a2: 009b0f90
d0: 00000000 d1: 009b0f90 d2: 00000000 d3: 00b48000
d4: 003cef32 d5: 00299188 a0: 0086d000 a1: 0086d000
Process rmmod (pid: 322, task=009b0f90)
Frame format=7 eff addr=00000000 ssw=0505 faddr=00000000
wb 1 stat/addr/data: 0000 00000000 00000000
wb 2 stat/addr/data: 0000 00000000 00000000
wb 3 stat/addr/data: 0000 00000000 00018da9
push data: 00000000 00000000 00000000 00000000
Stack from 00b49e90:
004c456a 0027f176 0027cb0a 0027cb9e 00000000 0086d00a 2187d3f0 0027f0e0
00b49ebc 2187d1f6 00000000 00b49ec8 002811e8 0086d000 00b49ef0 0028024c
0086d00a 002800d6 00279a1a 00000001 00000001 0086d00a 2187d3f0 00279a58
00b49f1c 002802e0 0086d00a 2187d3f0 004c456a 0086d00a ef96af74 00000000
2187d3f0 002805d2 800de064 00b49f44 0027f088 2187d3f0 00ac1cf4 2187d3f0
004c43be 2187d3f0 00000000 2187d3f0 800b66a8 00b49f5c 00280776 2187d3f0
Call Trace: [<0027f176>] __device_driver_unlock+0x0/0x48
[<0027cb0a>] device_links_busy+0x0/0x94
[<0027cb9e>] device_links_unbind_consumers+0x0/0x130
[<0027f0e0>] __device_driver_lock+0x0/0x5a
[<2187d1f6>] falconide_remove+0x12/0x18 [falconide]
[<002811e8>] platform_drv_remove+0x1c/0x28
[<0028024c>] device_release_driver_internal+0x176/0x17c
[<002800d6>] device_release_driver_internal+0x0/0x17c
[<00279a1a>] get_device+0x0/0x22
[<00279a58>] put_device+0x0/0x18
[<002802e0>] driver_detach+0x56/0x82
[<002805d2>] driver_remove_file+0x0/0x24
[<0027f088>] bus_remove_driver+0x4c/0xa4
[<00280776>] driver_unregister+0x28/0x5a
[<00281a00>] platform_driver_unregister+0x12/0x18
[<2187d2a0>] ide_falcon_driver_exit+0x10/0x16 [falconide]
[<000764f0>] sys_delete_module+0x110/0x1f2
[<000e83ea>] sys_rename+0x1a/0x1e
[<00002e0c>] syscall+0x8/0xc
[<00188004>] ext4_multi_mount_protect+0x35a/0x3ce
Code: 0029 9188 4bf9 0027 aa1c 283c 003c ef32 <265c> 4a8b 6700 00b8 2043 2028 000c 0280 00ff ff00 6600 0176 40c0 7202 b2b9 004c
Disabling lock debugging due to kernel taint
This happens because the driver_data pointer is uninitialized.
Add the missing platform_set_drvdata() call. For clarity, use the
matching platform_get_drvdata() as well.
Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Fixes: 5ed0794cde593 ("m68k/atari: Convert Falcon IDE drivers to platform drivers")
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull arch/csky updates from Guo Ren:
"Features:
- add new memory layout 2.5G(user):1.5G(kernel)
- add kmemleak support
- reconstruct VDSO framework: add VDSO with GENERIC_GETTIMEOFDAY,
GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO
- add faulthandler_disabled() check
- support (fix) swapon
- add (fix) _PAGE_ACCESSED for default pgprot
- abort uaccess retries upon fatal signal (from arm)
Fixes and optimizations:
- fix perf probe failure
- fix show_regs doesn't contain regs->usp
- remove custom asm/atomic.h implementation
- fix barrier design
- fix futex SMP implementation
- fix asm/cmpxchg.h with correct ordering barrier
- cleanup asm/spinlock.h
- fix PTE global for 2.5:1.5 virtual memory
- remove prologue of page fault handler in entry.S
- fix TLB maintenance synchronization problem
- add show_tlb for CPU_CK860 debug
- fix FAULT_FLAG_XXX param for handle_mm_fault
- fix update_mmu_cache called with user io mapping
- fix do_page_fault parent irq status
- fix a size determination in gpr_get()
- pgtable.h: Coding convention
- kprobe: Fix code in simulate without 'long'
- fix pfn_valid error with wrong max_mapnr
- use free_initmem_default() in free_initmem()
- fix compile error"
* tag 'csky-for-linus-5.12-rc1' of git://github.com/c-sky/csky-linux: (30 commits)
csky: Fixup compile error
csky: use free_initmem_default() in free_initmem()
csky: Fixup pfn_valid error with wrong max_mapnr
csky: Add VDSO with GENERIC_GETTIMEOFDAY, GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO
csky: kprobe: Fixup code in simulate without 'long'
csky: Fixup swapon
csky: pgtable.h: Coding convention
csky: Fixup _PAGE_ACCESSED for default pgprot
csky: remove unused including <linux/version.h>
csky: Fix a size determination in gpr_get()
csky: Reconstruct VDSO framework
csky: mm: abort uaccess retries upon fatal signal
csky: Sync riscv mm/fault.c for easy maintenance
csky: Fixup do_page_fault parent irq status
csky: Add faulthandler_disabled() check
csky: Fixup update_mmu_cache called with user io mapping
csky: Fixup FAULT_FLAG_XXX param for handle_mm_fault
csky: Add show_tlb for CPU_CK860 debug
csky: Fix TLB maintenance synchronization problem
csky: Add kmemleak support
...
Commit 78d3bb4483ba ("kbuild: Fix <linux/version.h> for empty SUBLEVEL
or PATCHLEVEL") fixed the build error for empty SUBLEVEL or PATCHLEVEL
by prepending a zero.
Commit 9b82f13e7ef3 ("kbuild: clamp SUBLEVEL to 255") re-introduced
this issue.
This time, we cannot take the same approach because we have C code:
#define LINUX_VERSION_PATCHLEVEL $(PATCHLEVEL)
#define LINUX_VERSION_SUBLEVEL $(SUBLEVEL)
Replace empty SUBLEVEL/PATCHLEVEL with a zero.
Fixes: 9b82f13e7ef3 ("kbuild: clamp SUBLEVEL to 255")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-and-tested-by: Sasha Levin <sashal@kernel.org>
Pull more RISC-V updates from Palmer Dabbelt:
"A pair of patches that slipped through the cracks:
- enable CPU hotplug in the defconfigs
- some cleanups to setup_bootmem
There's also a single fix for some randconfig build failures:
- make NUMA depend on SMP"
* tag 'riscv-for-linus-5.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Cleanup setup_bootmem()
RISC-V: Enable CPU Hotplug in defconfigs
RISC-V: Make NUMA depend on SMP
'make -s' should be really silent. However, 'make -s V=1' prints noisy
log messages from some shell scripts.
Of course, such a combination is odd, but the build system needs to do
the right thing even if a user gives strange input.
If -s is given, KBUILD_VERBOSE should be forced to 0.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull power management fixes from Rafael Wysocki:
"These fix a crash in intel_pstate during resume from suspend-to-RAM
that may occur after recent changes and two resource leaks in error
paths in the operating performance points (OPP) framework, add a new
C-states table to intel_idle and update the cpuidle MAINTAINERS entry
to cover the governors too.
Specifics:
- Fix recently introduced crash in the intel_pstate driver that
occurs if scale-invariance is disabled during resume from
suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR
values made by the platform firmware (Rafael Wysocki).
- Fix a memory leak and add a missing clk_put() in error paths in the
OPP framework (Quanyang Wang, Viresh Kumar).
- Add new C-states table for SnowRidge processors to the intel_idle
driver (Artem Bityutskiy).
- Update the MAINTAINERS entry for cpuidle to make it clear that the
governors are covered by it too (Lukas Bulwahn)"
* tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: add SnowRidge C-state table
cpufreq: intel_pstate: Fix fast-switch fallback path
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
Pull more SCSI updates from James Bottomley:
"This is a few driver updates (iscsi, mpt3sas) that were still in the
staging queue when the merge window opened (all committed on or before
8 Feb) and some small bug fixes which came in during the merge window
(all committed on 22 Feb)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits)
scsi: hpsa: Correct dev cmds outstanding for retried cmds
scsi: sd: Fix Opal support
scsi: target: tcmu: Fix memory leak caused by wrong uio usage
scsi: target: tcmu: Move some functions without code change
scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
scsi: aic7xxx: Remove unused function pointer typedef ahc_bus_suspend/resume_t
scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
scsi: ufs: Fix a duplicate dev quirk number
scsi: aic79xx: Fix spelling of version
scsi: target: core: Prevent underflow for service actions
scsi: target: core: Add cmd length set before cmd complete
scsi: iscsi: Drop session lock in iscsi_session_chkready()
scsi: qla4xxx: Use iscsi_is_session_online()
scsi: libiscsi: Reset max/exp cmdsn during recovery
scsi: iscsi_tcp: Fix shost can_queue initialization
scsi: libiscsi: Add helper to calculate max SCSI cmds per session
scsi: libiscsi: Fix iSCSI host workq destruction
scsi: libiscsi: Fix iscsi_task use after free()
scsi: libiscsi: Drop taskqueuelock
scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
...
After the following patches,
commit de043da0b9e7 ("RISC-V: Fix usage of memblock_enforce_memory_limit")
commit 1bd14a66ee52 ("RISC-V: Remove any memblock representing unusable memory area")
commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
some logic is useless, kill the mem_start/start/end and unneeded code.
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The existing code is essentially
free_initmem_default()->free_reserved_area() without poisoning.
Note that existing code missed to update the managed page count of the
zone.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Pull SCSI fixes from James Bottomley:
"This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).
The big core two fixes are for power management ("block: Do not accept
any requests while suspended" and "block: Fix a race in the runtime
power management code") which finally sorts out the resume problems
we've occasionally been having.
To make the resume fix, there are seven necessary precursors which
effectively renames REQ_PREEMPT to REQ_PM, so every "special" request
in block is automatically a power management exempt one.
All of the non-PM preempt cases are removed except for the one in the
SCSI Parallel Interface (spi) domain validation which is a genuine
case where we have to run requests at high priority to validate the
bus so this becomes an autopm get/put protected request"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
scsi: cxgb4i: Fix TLS dependency
scsi: ufs: Un-inline ufshcd_vops_device_reset function
scsi: ufs: Re-enable WriteBooster after device reset
scsi: ufs-mediatek: Use correct path to fix compile error
scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
scsi: block: Do not accept any requests while suspended
scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
scsi: block: Introduce BLK_MQ_REQ_PM
scsi: block: Fix a race in the runtime power management code
scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
scsi: ufs-pci: Fix restore from S4 for Intel controllers
scsi: ufs-mediatek: Keep VCC always-on for specific devices
scsi: ufs: Allow regulators being always-on
scsi: ufs: Clear UAC for RPMB after ufshcd resets
...
Pull more xfs updates from Darrick Wong:
"The most notable fix here prevents premature reuse of freed metadata
blocks, and adding the ability to detect accidental nested
transactions, which are not allowed here.
- Restore a disused sysctl control knob that was inadvertently
dropped during the merge window to avoid fstests regressions.
- Don't speculatively release freed blocks from the busy list until
we're actually allocating them, which fixes a rare log recovery
regression.
- Don't nest transactions when scanning for free space.
- Add an idiot^Wmaintainer light to detect nested transactions. ;)"
* tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: use current->journal_info for detecting transaction recursion
xfs: don't nest transactions when scanning for eofblocks
xfs: don't reuse busy extents on extent trim
xfs: restore speculative_cow_prealloc_lifetime sysctl
Prevent incrementing device->commands_outstanding for ioaccel command
retries that are driver initiated. If the command goes through the retry
path, the device->commands_outstanding counter has already accounted for
the number of commands outstanding to the device. Only commands going
through function hpsa_cmd_resolve_events decrement this counter.
- ioaccel commands go to either HBA disks or to logical volumes comprised
of SSDs.
The extra increment is causing device resets to hang.
- Resets wait for all device outstanding commands to complete before
returning.
Replace unused field abort_pending with retry_pending. This is a
maintenance driver so these changes have the least impact/risk.
Link: https://lore.kernel.org/r/161342801747.29388.13045495968308188518.stgit@brunhilda
Tested-by: Joe Szczypek <jszczype@redhat.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit cd195bc4775a ("kbuild: split adjust_autoksyms.sh in two parts")
split out the code that needs include/config/auto.conf.
This script no longer needs to include include/config/auto.conf.
Fixes: cd195bc4775a ("kbuild: split adjust_autoksyms.sh in two parts")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull block fixes from Jens Axboe:
"Two minor block fixes from this last week that should go into 5.11:
- Add missing NOWAIT debugfs definition (Andres)
- Fix kerneldoc warning introduced this merge window (Randy)"
* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
block: add debugfs stanza for QUEUE_FLAG_NOWAIT
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes
SCSI_CXGB4_ISCSI selects CHELSIO_T4. The latter depends on TLS || TLS=n, so
since 'select' does not check dependencies of the selected symbol,
SCSI_CXGB4_ISCSI should also depend on TLS || TLS=n.
This prevents the following kconfig warning and restricts SCSI_CXGB4_ISCSI
to 'm' whenever TLS=m.
WARNING: unmet direct dependencies detected for CHELSIO_T4
Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_CHELSIO [=y] && PCI [=y] && (IPV6 [=y] || IPV6 [=y]=n) && (TLS [=m] || TLS [=m]=n)
Selected by [y]:
- SCSI_CXGB4_ISCSI [=y] && SCSI_LOWLEVEL [=y] && SCSI [=y] && PCI [=y] && INET [=y] && (IPV6 [=y] || IPV6 [=y]=n) && ETHERNET [=y]
Link: https://lore.kernel.org/r/20201208220505.24488-1-rdunlap@infradead.org
Fixes: 7b36b6e03b0d ("[SCSI] cxgb4i v5: iscsi driver")
Cc: Karen Xie <kxie@chelsio.com>
Cc: linux-scsi@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull operating performance points (OPP) framework fixes for 5.11-rc2
from Viresh Kumar:
"This contains two patches to fix freeing of resources in error paths."
* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's ->fast_switch() callback will be invoked,
so it must not be NULL.
However, after commit a365ab6b9dfb ("cpufreq: intel_pstate: Implement
the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(), which is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original ->adjust_perf()
callback pointer.
Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.
The following has been changed.
1. C1E latency changed from 10us to 15us. It was measured using the
open source "wult" tool (the "nic" method, 15us is the 99.99th
percentile).
2. C1E power break even changed from 20us to 25us, which may result
in less C1E residency in some workloads.
3. C6 latency changed from 50us to 130us. Measured the same way as C1E.
The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.
On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull more block updates from Jens Axboe:
"A few stragglers (and one due to me missing it originally), and fixes
for changes in this merge window mostly. In particular:
- blktrace cleanups (Chaitanya, Greg)
- Kill dead blk_pm_* functions (Bart)
- Fixes for the bio alloc changes (Christoph)
- Fix for the partition changes (Christoph, Ming)
- Fix for turning off iopoll with polled IO inflight (Jeffle)
- nbd disconnect fix (Josef)
- loop fsync error fix (Mauricio)
- kyber update depth fix (Yang)
- max_sectors alignment fix (Mikulas)
- Add bio_max_segs helper (Matthew)"
* tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits)
block: Add bio_max_segs
blktrace: fix documentation for blk_fill_rw()
block: memory allocations in bounce_clone_bio must not fail
block: remove the gfp_mask argument to bounce_clone_bio
block: fix bounce_clone_bio for passthrough bios
block-crypto-fallback: use a bio_set for splitting bios
block: fix logging on capacity change
blk-settings: align max_sectors on "logical_block_size" boundary
block: reopen the device in blkdev_reread_part
block: don't skip empty device in in disk_uevent
blktrace: remove debugfs file dentries from struct blk_trace
nbd: handle device refs for DESTROY_ON_DISCONNECT properly
kyber: introduce kyber_depth_updated()
loop: fix I/O error on fsync() in detached loop devices
block: fix potential IO hang when turning off io_poll
block: get rid of the trace rq insert wrapper
blktrace: fix blk_rq_merge documentation
blktrace: fix blk_rq_issue documentation
blktrace: add blk_fill_rwbs documentation comment
block: remove superfluous param in blk_fill_rwbs()
...
Because the iomap code using PF_MEMALLOC_NOFS to detect transaction
recursion in XFS is just wrong. Remove it from the iomap code and
replace it with XFS specific internal checks using
current->journal_info instead.
[djwong: This change also realigns the lifetime of NOFS flag changes to
match the incore transaction, instead of the inconsistent scheme we have
now.]
Fixes: 9070733b4efa ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The SCSI core has been modified recently such that it only processes PM
requests if rpm_status != RPM_ACTIVE. Since some Opal requests are
submitted while rpm_status != RPM_ACTIVE, set flag RQF_PM for Opal
requests.
See also https://bugzilla.kernel.org/show_bug.cgi?id=211227.
[mkp: updated sha for PM patch]
Link: https://lore.kernel.org/r/20210222021042.3534-1-bvanassche@acm.org
Fixes: d80210f25ff0 ("sd: add support for TCG OPAL self encrypting disks")
Fixes: e6044f714b25 ("scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE")
Cc: chriscjsus@yahoo.com
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: stable@vger.kernel.org
Reported-by: chriscjsus@yahoo.com
Tested-by: chriscjsus@yahoo.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In theory these are orthogonal, but in practice all NUMA systems are
SMP. NUMA && !SMP doesn't build, everyone else is coupling them, and I
don't really see any value in supporting that configuration.
Fixes: 4f0e8eef772e ("riscv: Add numa support for riscv64 platform")
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Atish Patra <atishp@atishpatra.org>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Commit fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
does not work as expected if the .config file has already specified
CONFIG_UNUSED_KSYMS_WHITELIST="my/own/white/list" before enabling
CONFIG_LTO_CLANG.
So, the user-supplied whitelist and LTO-specific white list must be
independent of each other.
I refactored the shell script so CONFIG_MODVERSIONS and CONFIG_CLANG_LTO
handle whitelists in the same way.
Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into 5.11, all marked for stable as well:
- Fix issue around identity COW'ing and users that share a ring
across processes
- Fix a hang associated with unregistering fixed files (Pavel)
- Move the 'process is exiting' cancelation a bit earlier, so
task_works aren't affected by it (Pavel)"
* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
kernel/io_uring: cancel io_uring before task works
io_uring: fix io_sqe_files_unregister() hangs
io_uring: add a helper for setting a ref node
io_uring: don't assume mm is constant across submits
This was missed in 021a24460dc2. Leads to the numeric value of
QUEUE_FLAG_NOWAIT (i.e. 29) showing up in
/sys/kernel/debug/block/*/state.
Fixes: 021a24460dc28e7412aecfae89f60e1847e685c0
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
More and more statements are being added to ufshcd_vops_device_reset() and
this function is being called from multiple locations in the driver.
Un-inline the function to allow the compiler to make better decisions.
Link: https://lore.kernel.org/r/20201208135635.15326-3-stanley.chu@mediatek.com
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current pattern in the file entry does not make the files in the
governors subdirectory to be a part of the CPU IDLE TIME MANAGEMENT
FRAMEWORK.
Adjust the file pattern to include files in governors.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit 36e2c7421f02 ("fs: don't allow splice read/write without
explicit ops") we've required that file operation structures explicitly
enable splice support, rather than falling back to the default handlers.
Most /proc files use the indirect 'struct proc_ops' to describe their
file operations, and were fixed up to support splice earlier in commits
40be821d627c..b24c30c67863, but the mountinfo files interact with the
VFS directly using their own 'struct file_operations' and got missed as
a result.
This adds the necessary support for splice to work for /proc/*/mountinfo
and friends.
Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Reported-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209971
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull io_uring thread rewrite from Jens Axboe:
"This converts the io-wq workers to be forked off the tasks in question
instead of being kernel threads that assume various bits of the
original task identity.
This kills > 400 lines of code from io_uring/io-wq, and it's the worst
part of the code. We've had several bugs in this area, and the worry
is always that we could be missing some pieces for file types doing
unusual things (recent /dev/tty example comes to mind, userfaultfd
reads installing file descriptors is another fun one... - both of
which need special handling, and I bet it's not the last weird oddity
we'll find).
With these identical workers, we can have full confidence that we're
never missing anything. That, in itself, is a huge win. Outside of
that, it's also more efficient since we're not wasting space and code
on tracking state, or switching between different states.
I'm sure we're going to find little things to patch up after this
series, but testing has been pretty thorough, from the usual
regression suite to production. Any issue that may crop up should be
manageable.
There's also a nice series of further reductions we can do on top of
this, but I wanted to get the meat of it out sooner rather than later.
The general worry here isn't that it's fundamentally broken. Most of
the little issues we've found over the last week have been related to
just changes in how thread startup/exit is done, since that's the main
difference between using kthreads and these kinds of threads. In fact,
if all goes according to plan, I want to get this into the 5.10 and
5.11 stable branches as well.
That said, the changes outside of io_uring/io-wq are:
- arch setup, simple one-liner to each arch copy_thread()
implementation.
- Removal of net and proc restrictions for io_uring, they are no
longer needed or useful"
* tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits)
io-wq: remove now unused IO_WQ_BIT_ERROR
io_uring: fix SQPOLL thread handling over exec
io-wq: improve manager/worker handling over exec
io_uring: ensure SQPOLL startup is triggered before error shutdown
io-wq: make buffered file write hashed work map per-ctx
io-wq: fix race around io_worker grabbing
io-wq: fix races around manager/worker creation and task exit
io_uring: ensure io-wq context is always destroyed for tasks
arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
io_uring: cleanup ->user usage
io-wq: remove nr_process accounting
io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
net: remove cmsg restriction from io_uring based send/recvmsg calls
Revert "proc: don't allow async path resolution of /proc/self components"
Revert "proc: don't allow async path resolution of /proc/thread-self components"
io_uring: move SQPOLL thread io-wq forked worker
io-wq: make io_wq_fork_thread() available to other users
io-wq: only remove worker from free_list, if it was there
io_uring: remove io_identity
io_uring: remove any grabbing of context
...
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the
sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to
be unsigned to make it easier for the users.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Brian Foster reported a lockdep warning on xfs/167:
============================================
WARNING: possible recursive locking detected
5.11.0-rc4 #35 Tainted: G W I
--------------------------------------------
fsstress/17733 is trying to acquire lock:
ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]
but task is already holding lock:
ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]
stack backtrace:
CPU: 38 PID: 17733 Comm: fsstress Tainted: G W I 5.11.0-rc4 #35
Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
Call Trace:
dump_stack+0x8b/0xb0
__lock_acquire.cold+0x159/0x2ab
lock_acquire+0x116/0x370
xfs_trans_alloc+0x1ad/0x310 [xfs]
xfs_free_eofblocks+0x104/0x1d0 [xfs]
xfs_blockgc_scan_inode+0x24/0x60 [xfs]
xfs_inode_walk_ag+0x202/0x4b0 [xfs]
xfs_inode_walk+0x66/0xc0 [xfs]
xfs_trans_alloc+0x160/0x310 [xfs]
xfs_trans_alloc_inode+0x5f/0x160 [xfs]
xfs_alloc_file_space+0x105/0x300 [xfs]
xfs_file_fallocate+0x270/0x460 [xfs]
vfs_fallocate+0x14d/0x3d0
__x64_sys_fallocate+0x3e/0x70
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The cause of this is the new code that spurs a scan to garbage collect
speculative preallocations if we fail to reserve enough blocks while
allocating a transaction. While the warning itself is a fairly benign
lockdep complaint, it does expose a potential livelock if the rwsem
behavior ever changes with regards to nesting read locks when someone's
waiting for a write lock.
Fix this by freeing the transaction and jumping back to xfs_trans_alloc
like this patch in the V4 submission[1].
[1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/
Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When user deletes a tcmu device via configFS, tcmu calls
uio_unregister_device(). During that call uio resets its pointer to struct
uio_info provided by tcmu. That means, after uio_unregister_device() uio
will no longer execute any of the callbacks tcmu had set in uio_info.
Especially, if userspace daemon still holds the corresponding uio device
open or mmap'ed while tcmu calls uio_unregister_device(), uio will not call
tcmu_release() when userspace finally closes and munmaps the uio device.
Since tcmu does refcounting for the tcmu device in tcmu_open() and
tcmu_release(), in the decribed case refcount does not drop to 0 and tcmu
does not free tcmu device's resources. In extreme cases this can cause
memory leaking of up to 1 GB for a single tcmu device.
After uio_unregister_device(), uio will reject every open, read, write,
mmap from userspace with -EOI. But userspace daemon can still access the
mmap'ed command ring and data area. Therefore tcmu should wait until
userspace munmaps the uio device before it frees the resources, as we don't
want to cause SIGSEGV or SIGBUS to user space.
That said, current refcounting during tcmu_open and tcmu_release does not
work correctly, and refcounting better should be done in the open and close
callouts of the vm_operations_struct, which tcmu assigns to each mmap of
the uio device (because it wants its own page fault handler).
This patch fixes the memory leak by removing refcounting from tcmu_open and
tcmu_close, and instead adding new tcmu_vma_open() and tcmu_vma_close()
handlers that only do refcounting.
Link: https://lore.kernel.org/r/20210218175039.7829-3-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull RISC-V updates from Palmer Dabbelt:
"A handful of new RISC-V related patches for this merge window:
- A check to ensure drivers are properly using uaccess. This isn't
manifesting with any of the drivers I'm currently using, but may
catch errors in new drivers.
- Some preliminary support for the FU740, along with the HiFive
Unleashed it will appear on.
- NUMA support for RISC-V, which involves making the arm64 code
generic.
- Support for kasan on the vmalloc region.
- A handful of new drivers for the Kendryte K210, along with the DT
plumbing required to boot on a handful of K210-based boards.
- Support for allocating ASIDs.
- Preliminary support for kernels larger than 128MiB.
- Various other improvements to our KASAN support, including the
utilization of huge pages when allocating the KASAN regions.
We may have already found a bug with the KASAN_VMALLOC code, but it's
passing my tests. There's a fix in the works, but that will probably
miss the merge window.
* tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits)
riscv: Improve kasan population by using hugepages when possible
riscv: Improve kasan population function
riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization
riscv: Improve kasan definitions
riscv: Get rid of MAX_EARLY_MAPPING_SIZE
soc: canaan: Sort the Makefile alphabetically
riscv: Disable KSAN_SANITIZE for vDSO
riscv: Remove unnecessary declaration
riscv: Add Canaan Kendryte K210 SD card defconfig
riscv: Update Canaan Kendryte K210 defconfig
riscv: Add Kendryte KD233 board device tree
riscv: Add SiPeed MAIXDUINO board device tree
riscv: Add SiPeed MAIX GO board device tree
riscv: Add SiPeed MAIX DOCK board device tree
riscv: Add SiPeed MAIX BiT board device tree
riscv: Update Canaan Kendryte K210 device tree
dt-bindings: add resets property to dw-apb-timer
dt-bindings: fix sifive gpio properties
dt-bindings: update sifive uart compatible string
dt-bindings: update sifive clint compatible string
...
The type of 'val' is 'unsigned long' in simulate_blz32, so 'val < 0'
can't be true.
Cast 'val' to 'long' here to determine branch token or not,
Fixup instructions: bnezad32, bhsz32, bhz32, blsz32, blz32
Link: https://lore.kernel.org/linux-csky/CAJF2gTQjKXR9gpo06WAWG1aquiT87mATiMGorXs6ChxOxoe90Q@mail.gmail.com/T/#t
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Some randconfig builds fail with undefined references to _mcount
when CONFIG_TRIM_UNUSED_KSYMS is set:
ERROR: modpost: "_mcount" [drivers/tee/optee/optee.ko] undefined!
ERROR: modpost: "_mcount" [drivers/fsi/fsi-occ.ko] undefined!
ERROR: modpost: "_mcount" [drivers/fpga/dfl-pci.ko] undefined!
Since there is already a list of symbols that get generated at link
time, add this one as well.
Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Commit 436e980e2ed5 ("kbuild: don't hardcode depmod path") stopped
hard-coding the path of depmod, but in the process caused trouble for
distributions that had that /sbin location, but didn't have it in the
PATH (generally because /sbin is limited to the super-user path).
Work around it for now by just adding /sbin to the end of PATH in the
depmod.sh script.
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For cancelling io_uring requests it needs either to be able to run
currently enqueued task_works or having it shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.
Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix new kernel-doc warnings in fs/block_dev.c:
../fs/block_dev.c:1066: warning: Excess function parameter 'whole' description in 'bd_abort_claiming'
../fs/block_dev.c:1837: warning: Function parameter or member 'dev' not described in 'lookup_bdev'
Fixes: 4e7b5671c6a8 ("block: remove i_bdev")
Fixes: 37c3fc9abb25 ("block: simplify the block device claiming interface")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
UFS 3.1 specification mentions that the WriteBooster flags listed below
will be set to their default values, i.e. disabled, after power cycle or
any type of reset event. Thus we need to reset the flag variables kept in
struct hba to align with the device status and ensure that
WriteBooster-related functions are configured properly after device reset.
Without this fix, WriteBooster will not be enabled successfully after by
ufshcd_wb_ctrl() after device reset because hba->wb_enabled remains true.
Flags required to be reset to default values:
- fWriteBoosterEn: hba->wb_enabled
- fWriteBoosterBufferFlushEn: hba->wb_buf_flush_enabled
- fWriteBoosterBufferFlushDuringHibernate: No variable mapped
Link: https://lore.kernel.org/r/20201208135635.15326-2-stanley.chu@mediatek.com
Fixes: 3d17b9b5ab11 ("scsi: ufs: Add write booster feature support")
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In function _allocate_opp_table, opp_dev is allocated and referenced
by opp_table via _add_opp_dev. But in the case that the subsequent calls
return -EPROBE_DEFER, it will jump to err label and opp_table will be
freed. Then opp_dev becomes an unreferenced object to cause memory leak.
So let's call _remove_opp_dev to do the cleanup.
This fixes the following kmemleak report:
unreferenced object 0xffff000801524a00 (size 128):
comm "swapper/0", pid 1, jiffies 4294892465 (age 84.616s)
hex dump (first 32 bytes):
40 00 56 01 08 00 ff ff 40 00 56 01 08 00 ff ff @.V.....@.V.....
b8 52 77 7f 08 00 ff ff 00 3c 4c 00 08 00 ff ff .Rw......<L.....
backtrace:
[<00000000b1289fb1>] kmemleak_alloc+0x30/0x40
[<0000000056da48f0>] kmem_cache_alloc+0x3d4/0x588
[<00000000a84b3b0e>] _add_opp_dev+0x2c/0x88
[<0000000062a380cd>] _add_opp_table_indexed+0x124/0x268
[<000000008b4c8f1f>] dev_pm_opp_of_add_table+0x20/0x1d8
[<00000000e5316798>] dev_pm_opp_of_cpumask_add_table+0x48/0xf0
[<00000000db0a8ec2>] dt_cpufreq_probe+0x20c/0x448
[<0000000030a3a26c>] platform_probe+0x68/0xd8
[<00000000c618e78d>] really_probe+0xd0/0x3a0
[<00000000642e856f>] driver_probe_device+0x58/0xb8
[<00000000f10f5307>] device_driver_attach+0x74/0x80
[<0000000004f254b8>] __driver_attach+0x58/0xe0
[<0000000009d5d19e>] bus_for_each_dev+0x70/0xc8
[<0000000000d22e1c>] driver_attach+0x24/0x30
[<0000000001d4e952>] bus_add_driver+0x14c/0x1f0
[<0000000089928aaa>] driver_register+0x64/0x120
Cc: v5.10 <stable@vger.kernel.org> # v5.10
Fixes: dd461cd9183f ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
[ Viresh: Added the stable tag ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull power management updates from Rafael Wysocki:
"These update cpufreq (core and drivers), cpuidle (polling state
implementation and the PSCI driver), the OPP (operating performance
points) framework, devfreq (core and drivers), the power capping RAPL
(Running Average Power Limit) driver, the Energy Model support, the
generic power domains (genpd) framework, the ACPI device power
management, the core system-wide suspend code and power management
utilities.
Specifics:
- Use local_clock() instead of jiffies in the cpufreq statistics to
improve accuracy (Viresh Kumar).
- Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
drivers (Viresh Kumar).
- Clean up the cpufreq core, the intel_pstate driver and the
schedutil cpufreq governor (Rafael Wysocki).
- Fix up error code paths in the sti-cpufreq and mediatek cpufreq
drivers (Yangtao Li, Qinglang Miao).
- Fix cpufreq_online() to return error codes instead of success (0)
in all cases when it fails (Wang ShaoBo).
- Add mt8167 support to the mediatek cpufreq driver and blacklist
mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
- Modify the tegra194 cpufreq driver to always return values from the
frequency table as the current frequency and clean up that driver
(Sumit Gupta, Jon Hunter).
- Modify the arm_scmi cpufreq driver to allow it to discover the
power scale present in the performance protocol and provide this
information to the Energy Model (Lukasz Luba).
- Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
Rohár).
- Clean up the CPPC cpufreq driver (Ionela Voinescu).
- Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
Bergmann).
- Rework the poling interval selection for the polling state in
cpuidle (Mel Gorman).
- Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
(Ulf Hansson).
- Modify the OPP framework to support empty (node-less) OPP tables in
DT for passing dependency information (Nicola Mazzucato).
- Fix potential lockdep issue in the OPP core and clean up the OPP
core (Viresh Kumar).
- Modify dev_pm_opp_put_regulators() to accept a NULL argument and
update its users accordingly (Viresh Kumar).
- Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
- Add support for governor feature flags to devfreq, make devfreq
sysfs file permissions depend on the governor and clean up the
devfreq core (Chanwoo Choi).
- Clean up the tegra20 devfreq driver and deprecate it to allow
another driver based on EMC_STAT to be used instead of it (Dmitry
Osipenko).
- Add interconnect support to the tegra30 devfreq driver, allow it to
take the interconnect and OPP information from DT and clean it up
(Dmitry Osipenko).
- Add interconnect support to the exynos-bus devfreq driver along
with interconnect properties documentation (Sylwester Nawrocki).
- Add suport for AMD Fam17h and Fam19h processors to the RAPL power
capping driver (Victor Ding, Kim Phillips).
- Fix handling of overly long constraint names in the powercap
framework (Lukasz Luba).
- Fix the wakeup configuration handling for bridges in the ACPI
device power management core (Rafael Wysocki).
- Add support for using an abstract scale for power units in the
Energy Model (EM) and document it (Lukasz Luba).
- Add em_cpu_energy() micro-optimization to the EM (Pavankumar
Kondeti).
- Modify the generic power domains (genpd) framwework to support
suspend-to-idle (Ulf Hansson).
- Fix creation of debugfs nodes in genpd (Thierry Strudel).
- Clean up genpd (Lina Iyer).
- Clean up the core system-wide suspend code and make it print driver
flags for devices with debug enabled (Alex Shi, Patrice Chotard,
Chen Yu).
- Modify the ACPI system reboot code to make it prepare for system
power off to avoid confusing the platform firmware (Kai-Heng Feng).
- Update the pm-graph (multiple changes, mostly usability-related)
and cpupower (online and offline CPU information support) PM
utilities (Todd Brandt, Brahadambal Srinivasan)"
* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
cpufreq: Fix cpufreq_online() return value on errors
cpufreq: Fix up several kerneldoc comments
cpufreq: stats: Use local_clock() instead of jiffies
cpufreq: schedutil: Simplify sugov_update_next_freq()
cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
PM: domains: create debugfs nodes when adding power domains
opp: of: Allow empty opp-table with opp-shared
dt-bindings: opp: Allow empty OPP tables
media: venus: dev_pm_opp_put_*() accepts NULL argument
drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
drm/lima: dev_pm_opp_put_*() accepts NULL argument
PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
opp: Reduce the size of critical section in _opp_kref_release()
PM / EM: Micro optimization in em_cpu_energy
cpufreq: arm_scmi: Discover the power scale in performance protocol
...
Pull misc vfs updates from Al Viro:
"Assorted stuff pile - no common topic here"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
whack-a-mole: don't open-code iminor/imajor
9p: fix misuse of sscanf() in v9fs_stat2inode()
audit_alloc_mark(): don't open-code ERR_CAST()
fs/inode.c: make inode_init_always() initialize i_ino to 0
vfs: don't unnecessarily clone write access for writable fds
Add missing ":" after rwbs function parameter documentation that fixes
following warning :-
./kernel/trace/blktrace.c:1877: warning: Function parameter or member 'rwbs' not described in 'blk_fill_rwbs'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 1f83bb4b4914 ("blktrace: add blk_fill_rwbs documentation comment")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Freed extents are marked busy from the point the freeing transaction
commits until the associated CIL context is checkpointed to the log.
This prevents reuse and overwrite of recently freed blocks before
the changes are committed to disk, which can lead to corruption
after a crash. The exception to this rule is that metadata
allocation is allowed to reuse busy extents because metadata changes
are also logged.
As of commit 97d3ac75e5e0 ("xfs: exact busy extent tracking"), XFS
has allowed modification or complete invalidation of outstanding
busy extents for metadata allocations. This implementation assumes
that use of the associated extent is imminent, which is not always
the case. For example, the trimmed extent might not satisfy the
minimum length of the allocation request, or the allocation
algorithm might be involved in a search for the optimal result based
on locality.
generic/019 reproduces a corruption caused by this scenario. First,
a metadata block (usually a bmbt or symlink block) is freed from an
inode. A subsequent bmbt split on an unrelated inode attempts a near
mode allocation request that invalidates the busy block during the
search, but does not ultimately allocate it. Due to the busy state
invalidation, the block is no longer considered busy to subsequent
allocation. A direct I/O write request immediately allocates the
block and writes to it. Finally, the filesystem crashes while in a
state where the initial metadata block free had not committed to the
on-disk log. After recovery, the original metadata block is in its
original location as expected, but has been corrupted by the
aforementioned dio.
This demonstrates that it is fundamentally unsafe to modify busy
extent state for extents that are not guaranteed to be allocated.
This applies to pretty much all of the code paths that currently
trim busy extents for one reason or another. Therefore to address
this problem, drop the reuse mechanism from the busy extent trim
path. This code already knows how to return partial non-busy ranges
of the targeted free extent and higher level code tracks the busy
state of the allocation attempt. If a block allocation fails where
one or more candidate extents is busy, we force the log and retry
the allocation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>