Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
code
Clone this repository
https://tangled.org/tjh.dev/kernel
git@gordian.tjh.dev:tjh.dev/kernel
For self-hosted knots, clone URLs may differ based on your setup.
Pull vfs fixes from Al Viro:
"Eric's s_inodes softlockup fixes + Jan's fix for recent regression
from pipe rework"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: call fsnotify_sb_delete after evict_inodes
fs: avoid softlockups in s_inodes iterators
pipe: Fix bogus dereference in iov_iter_alignment()
Pull xfs fixes from Darrick Wong:
"Fix a few bugs that could lead to corrupt files, fsck complaints, and
filesystem crashes:
- Minor documentation fixes
- Fix a file corruption due to read racing with an insert range
operation.
- Fix log reservation overflows when allocating large rt extents
- Fix a buffer log item flags check
- Don't allow administrators to mount with sunit= options that will
cause later xfs_repair complaints about the root directory being
suspicious because the fs geometry appeared inconsistent
- Fix a non-static helper that should have been static"
* tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Make the symbol 'xfs_rtalloc_log_count' static
xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
xfs: split the sunit parameter update into two parts
xfs: refactor agfl length computation function
libxfs: resync with the userspace libxfs
xfs: use bitops interface for buf log item AIL flag check
xfs: fix log reservation overflows when allocating large rt extents
xfs: stabilize insert range start boundary to avoid COW writeback race
xfs: fix Sphinx documentation warning
When a filesystem is unmounted, we currently call fsnotify_sb_delete()
before evict_inodes(), which means that fsnotify_unmount_inodes()
must iterate over all inodes on the superblock looking for any inodes
with watches. This is inefficient and can lead to livelocks as it
iterates over many unwatched inodes.
At this point, SB_ACTIVE is gone and dropping refcount to zero kicks
the inode out out immediately, so anything processed by
fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop.
After that, the call to evict_inodes will evict everything else with a
zero refcount.
This should speed things up overall, and avoid livelocks in
fsnotify_unmount_inodes().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull ext4 bug fixes from Ted Ts'o:
"Ext4 bug fixes, including a regression fix"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: clarify impact of 'commit' mount option
ext4: fix unused-but-set-variable warning in ext4_add_entry()
jbd2: fix kernel-doc notation warning
ext4: use RCU API in debug_print_tree
ext4: validate the debug_want_extra_isize mount option at parse time
ext4: reserve revoke credits in __ext4_new_inode
ext4: unlock on error in ext4_expand_extra_isize()
ext4: optimize __ext4_check_dir_entry()
ext4: check for directory entries too close to block end
ext4: fix ext4_empty_dir() for directories with holes
Fix the following sparse warning:
fs/xfs/libxfs/xfs_trans_resv.c:206:1: warning: symbol 'xfs_rtalloc_log_count' was not declared. Should it be static?
Fixes: b1de6fc7520f ("xfs: fix log reservation overflows when allocating large rt extents")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.
Previous efforts were made in 2 functions, see:
c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes
but there hasn't been an audit of all walkers, so do that now. This
also consistently moves the cond_resched() calls to the bottom of each
loop in cases where it already exists.
One loop remains: remove_dquot_ref(), because I'm not quite sure how
to deal with that one w/o taking the i_lock.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull block fixes from Jens Axboe:
"Let's try this one again, this time without the compat_ioctl changes.
We've got those fixed up, but that can go out next week.
This contains:
- block queue flush lockdep annotation (Bart)
- Type fix for bsg_queue_rq() (Bart)
- Three dasd fixes (Stefan, Jan)
- nbd deadlock fix (Mike)
- Error handling bio user map fix (Yang)
- iocost fix (Tejun)
- sbitmap waitqueue addition fix that affects the kyber IO scheduler
(David)"
* tag 'block-5.5-20191221' of git://git.kernel.dk/linux-block:
sbitmap: only queue kyber's wait callback if not already active
block: fix memleak when __blk_rq_map_user_iov() is failed
s390/dasd: fix typo in copyright statement
s390/dasd: fix memleak in path handling error case
s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
block: Fix a lockdep complaint triggered by request queue flushing
block: Fix the type of 'sts' in bsg_queue_rq()
block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT
nbd: fix shutdown and recv work deadlock v2
iocost: over-budget forced IOs should schedule async delay
The description of 'commit' mount option dates back to ext3 times.
Update the description to match current meaning for ext4.
Reported-by: Paul Richards <paul.richards@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191218111210.14161-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
and swidth values could cause xfs_repair to fail loudly. The problem
here is that repair calculates the where mkfs should have allocated the
root inode, based on the superblock geometry. The allocation decisions
depend on sunit, which means that we really can't go updating sunit if
it would lead to a subsequent repair failure on an otherwise correct
filesystem.
Port from xfs_repair some code that computes the location of the root
inode and teach mount to skip the ondisk update if it would cause
problems for repair. Along the way we'll update the documentation,
provide a function for computing the minimum AGFL size instead of
open-coding it, and cut down some indenting in the mount code.
Note that we allow the mount to proceed (and new allocations will
reflect this new geometry) because we've never screened this kind of
thing before. We'll have to wait for a new future incompat feature to
enforce correct behavior, alas.
Note that the geometry reporting always uses the superblock values, not
the incore ones, so that is what xfs_info and xfs_growfs will report.
[1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
Reported-by: Alex Lyakas <alex@zadara.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
We cannot look at 'i->pipe' unless we know the iter is a pipe. Move the
ring_size load to a branch in iov_iter_alignment() where we've already
checked the iter is a pipe to avoid bogus dereference.
Reported-by: syzbot+bea68382bae9490e7dd6@syzkaller.appspotmail.com
Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull KVM fixes from Paolo Bonzini:
"PPC:
- Fix a bug where we try to do an ultracall on a system without an
ultravisor
KVM:
- Fix uninitialised sysreg accessor
- Fix handling of demand-paged device mappings
- Stop spamming the console on IMPDEF sysregs
- Relax mappings of writable memslots
- Assorted cleanups
MIPS:
- Now orphan, James Hogan is stepping down
x86:
- MAINTAINERS change, so long Radim and thanks for all the fish
- supported CPUID fixes for AMD machines without SPEC_CTRL"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
MAINTAINERS: remove Radim from KVM maintainers
MAINTAINERS: Orphan KVM for MIPS
kvm: x86: Host feature SSBD doesn't imply guest feature AMD_SSBD
kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
KVM: PPC: Book3S HV: Don't do ultravisor calls on systems without ultravisor
KVM: arm/arm64: Properly handle faulting of device mappings
KVM: arm64: Ensure 'params' is initialised when looking up sys register
KVM: arm/arm64: Remove excessive permission check in kvm_arch_prepare_memory_region
KVM: arm64: Don't log IMP DEF sysreg traps
KVM: arm64: Sanely ratelimit sysreg messages
KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in kvm_vgic_create()
KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()
KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()
Under heavy loads where the kyber I/O scheduler hits the token limits for
its scheduling domains, kyber can become stuck. When active requests
complete, kyber may not be woken up leaving the I/O requests in kyber
stuck.
This stuck state is due to a race condition with kyber and the sbitmap
functions it uses to run a callback when enough requests have completed.
The running of a sbt_wait callback can race with the attempt to insert the
sbt_wait. Since sbitmap_del_wait_queue removes the sbt_wait from the list
first then sets the sbq field to NULL, kyber can see the item as not on a
list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This
results in the sbt_wait being inserted onto the wait list but ws_active
doesn't get incremented. So the sbitmap queue does not know there is a
waiter on a wait list.
Since sbitmap doesn't think there is a waiter, kyber may never be
informed that there are domain tokens available and the I/O never advances.
With the sbt_wait on a wait list, kyber believes it has an active waiter
so cannot insert a new waiter when reaching the domain's full state.
This race can be fixed by only adding the sbt_wait to the queue if the
sbq field is NULL. If sbq is not NULL, there is already an action active
which will trigger the re-running of kyber. Let it run and add the
sbt_wait to the wait list if still needing to wait.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reported-by: John Pittman <jpittman@redhat.com>
Tested-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Warning is found when compile with "-Wunused-but-set-variable":
fs/ext4/namei.c: In function ‘ext4_add_entry’:
fs/ext4/namei.c:2167:23: warning: variable ‘sbi’ set but not used
[-Wunused-but-set-variable]
struct ext4_sb_info *sbi;
^~~
Fix this by moving the variable @sbi under CONFIG_UNICODE.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/cb5eb904-224a-9701-c38f-cb23514b1fff@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>