Linux kernel ============ This file was moved to Documentation/admin-guide/README.rst Please notice that there are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. In order to build the documentation, use ``make htmldocs`` or ``make pdfdocs``. There are various text files in the Documentation/ subdirectory, several of them using the Restructured Text markup notation. See Documentation/00-INDEX for a list of what is contained in each file. Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.
Since the commit list is not ordered, it is possible for nfs_scan_commit_list
to hold a request that nfs_lock_and_join_requests() is waiting for, while
at the same time trying to grab a request that nfs_lock_and_join_requests
already holds.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The writeback code wants to send a commit after processing the pages,
which is why we want to delay releasing the struct path until after
that's done.
Also, the layout code expects that we do not free the inode before
we've put the layout segments in pnfs_writehdr_free() and
pnfs_readhdr_free()
Fixes: 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")
Fixes: 4714fb51fd03 ("nfs: remove pgio_header refcount, related cleanup")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Since commit 18290650b1c8 ("NFS: Move buffered I/O locking into
nfs_file_write()") nfs_file_write() has not flushed the correct byte
range during synchronous writes. generic_write_sync() expects that
iocb->ki_pos points to the right edge of the range rather than the
left edge.
To replicate the problem, open a file with O_DSYNC, have the client
write at increasing offsets, and then print the successful offsets.
Block port 2049 partway through that sequence, and observe that the
client application indicates successful writes in advance of what the
server received.
Fixes: 18290650b1c8 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Jacob Strauss <jsstraus@amazon.com>
Signed-off-by: Tarang Gupta <tarangg@amazon.com>
Tested-by: Tarang Gupta <tarangg@amazon.com>
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This field hasn't been used since commit 57b691819ee2 ("NFS: Cache
access checks more aggressively").
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When a byte range lock (or flock) is taken out on an NFS file, the
validity of the cached data is checked and the inode is marked
NFS_INODE_INVALID_DATA. However the cached data isn't flushed from
the page cache.
This is sufficient for future read() requests or mmap() requests as
they call nfs_revalidate_mapping() which performs the flush if
necessary.
However an existing mapping is not affected. Accessing data through
that mapping will continue to return old data even though the inode is
marked NFS_INODE_INVALID_DATA.
This can easily be confirmed using the 'nfs' tool in
git://github.com/okirch/twopence-nfs.git
and running
nfs coherence FILENAME
on one client, and
nfs coherence -r FILENAME
on another client.
It appears that prior to Linux 2.6.0 this worked correctly.
However commit:
http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ca9268fe3ddd075714005adecd4afbd7f9ab87d0
removed the call to inode_invalidate_pages() from nfs_zap_caches(). I
haven't tested this code, but inspection suggests that prior to this
commit, file locking would invalidate all inode pages.
This patch adds a call to nfs_revalidate_mapping() after a
successful SETLK so that invalid data is flushed. With this patch the
above test passes. To minimize impact (and possibly avoid a GETATTR
call) this only happens if the mapping might be mapped into
userspace.
Cc: Olaf Kirch <okir@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
RPC_TASK_NO_RETRANS_TIMEOUT is set when cl_noretranstimeo
is set, which happens when RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT is set,
which happens when NFS_CS_NO_RETRANS_TIMEOUT is set.
This flag means "don't resend on a timeout, only resend if the
connection gets broken for some reason".
cl_discrtry is set when RPC_CLNT_CREATE_DISCRTRY is set, which
happens when NFS_CS_DISCRTRY is set.
This flag means "always disconnect before resending".
NFS_CS_NO_RETRANS_TIMEOUT and NFS_CS_DISCRTRY are both only set
in nfs4_init_client(), and it always sets both.
So we will never have a situation where only one of the flags is set.
So this code, which tests if timeout retransmits are allowed, and
disconnection is required, will never run.
So it makes sense to remove this code as it cannot be tested and
could confuse people reading the code (like me).
(alternately we could leave it there with a comment saying
it is never actually used).
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Commit fbe77c30e9ab ("NFS: move rw_mode to nfs_pageio_header")
reintroduced some pointless code that commit 518662e0fcb9 ("NFS: fix
usage of mempools.") had recently removed.
Remove it again.
Cc: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Adopt the use of xprt_pin_rqst to eliminate contention between
Call-side users of rb_lock and the use of rb_lock in
rpcrdma_reply_handler.
This replaces the mechanism introduced in 431af645cf66 ("xprtrdma:
Fix client lock-up after application signal fires").
Use recv_lock to quickly find the completing rqst, pin it, then
drop the lock. At that point invalidation and pull-up of the Reply
XDR can be done. Both are often expensive operations.
Finally, take recv_lock again to signal completion to the RPC
layer. It also protects adjustment of "cwnd".
This greatly reduces the amount of time a lock is held by the
reply handler. Comparing lock_stat results shows a marked decrease
in contention on rb_lock and recv_lock.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[trond.myklebust@primarydata.com: Remove call to rpcrdma_buffer_put() from
the "out_norqst:" path in rpcrdma_reply_handler.]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NFS-over-RDMA client updates for Linux 4.14
Bugfixes and cleanups:
- Constify rpc_xprt_ops
- Harden RPC call encoding and decoding
- Clean up rpc call decoding to use xdr_streams
- Remove unused variables from various structures
- Refactor code to remove imul instructions
- Rearrange rx_stats structure for better cacheline sharing
To reduce false cacheline sharing, separate counters that are likely
to be accessed in the Call path from those accessed in the Reply
path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
For a while now any NFSv2 mount where sec= is specified uses
AUTH_NULL. If sec= is not specified, the mount uses AUTH_UNIX.
Commit e68fd7c8071d ("mount: use sec= that was specified on the
command line") attempted to address a very similar problem with
NFSv3, and should have fixed this too, but it has a bug.
The MNTv1 MNT procedure does not return a list of security flavors,
so our client makes up a list containing just AUTH_NULL. This should
enable nfs_verify_authflavors() to assign the sec= specified flavor,
but instead, it incorrectly sets it to AUTH_NULL.
I expect this would also be a problem for any NFSv3 server whose
MNTv3 MNT procedure returned a security flavor list containing only
AUTH_NULL.
Fixes: e68fd7c8071d ("mount: use sec= that was specified on ... ")
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=310
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>