Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:
"Stuff in here:

- acct.c fixes and general rework of mnt_pin mechanism. That allows
to go for delayed-mntput stuff, which will permit mntput() on deep
stack without worrying about stack overflows - fs shutdown will
happen on shallow stack. IOW, we can do Eric's umount-on-rmdir
series without introducing tons of stack overflows on new mntput()
call chains it introduces.
- Bruce's d_splice_alias() patches
- more Miklos' rename() stuff.
- a couple of regression fixes (stable fodder, in the end of branch)
and a fix for API idiocy in iov_iter.c.

There definitely will be another pile, maybe even two. I'd like to
get Eric's series in this time, but even if we miss it, it'll go right
in the beginning of for-next in the next cycle - the tricky part of
prereqs is in this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
fix copy_tree() regression
__generic_file_write_iter(): fix handling of sync error after DIO
switch iov_iter_get_pages() to passing maximal number of pages
fs: mark __d_obtain_alias static
dcache: d_splice_alias should detect loops
exportfs: update Exporting documentation
dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED
dcache: remove unused d_find_alias parameter
dcache: d_obtain_alias callers don't all want DISCONNECTED
dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
dcache: d_splice_alias mustn't create directory aliases
dcache: close d_move race in d_splice_alias
dcache: move d_splice_alias
namei: trivial fix to vfs_rename_dir comment
VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
cifs: support RENAME_NOREPLACE
hostfs: support rename flags
shmem: support RENAME_EXCHANGE
shmem: support RENAME_NOREPLACE
btrfs: add RENAME_NOREPLACE
...

+641 -461
+23 -15
Documentation/filesystems/nfs/Exporting
···

   c/ Helper routines to allocate anonymous dentries, and to help attach
      loose directory dentries at lookup time. They are:
-    d_alloc_anon(inode) will return a dentry for the given inode.
+    d_obtain_alias(inode) will return a dentry for the given inode.
       If the inode already has a dentry, one of those is returned.
       If it doesn't, a new anonymous (IS_ROOT and
         DCACHE_DISCONNECTED) dentry is allocated and attached.
       In the case of a directory, care is taken that only one dentry
       can ever be attached.
-    d_splice_alias(inode, dentry) will make sure that there is a
-      dentry with the same name and parent as the given dentry, and
-      which refers to the given inode.
-      If the inode is a directory and already has a dentry, then that
-      dentry is d_moved over the given dentry.
-      If the passed dentry gets attached, care is taken that this is
-      mutually exclusive to a d_alloc_anon operation.
-      If the passed dentry is used, NULL is returned, else the used
-      dentry is returned. This corresponds to the calling pattern of
-      ->lookup.
-
+    d_splice_alias(inode, dentry) or d_materialise_unique(dentry, inode)
+      will introduce a new dentry into the tree; either the passed-in
+      dentry or a preexisting alias for the given inode (such as an
+      anonymous one created by d_obtain_alias), if appropriate. The two
+      functions differ in their handling of directories with preexisting
+      aliases:
+      d_splice_alias will use any existing IS_ROOT dentry, but it will
+        return -EIO rather than try to move a dentry with a different
+        parent. This is appropriate for local filesystems, which
+        should never see such an alias unless the filesystem is
+        corrupted somehow (for example, if two on-disk directory
+        entries refer to the same directory.)
+      d_materialise_unique will attempt to move any dentry. This is
+        appropriate for distributed filesystems, where finding a
+        directory other than where we last cached it may be a normal
+        consequence of concurrent operations on other hosts.
+      Both functions return NULL when the passed-in dentry is used,
+      following the calling convention of ->lookup.
+

 Filesystem Issues
 -----------------
···

   fh_to_dentry (mandatory)
     Given a filehandle fragment, this should find the implied object and
-    create a dentry for it (possibly with d_alloc_anon).
+    create a dentry for it (possibly with d_obtain_alias).

   fh_to_parent (optional but strongly recommended)
     Given a filehandle fragment, this should find the parent of the
-    implied object and create a dentry for it (possibly with d_alloc_anon).
-    May fail if the filehandle fragment is too small.
+    implied object and create a dentry for it (possibly with
+    d_obtain_alias). May fail if the filehandle fragment is too small.

   get_parent (optional but strongly recommended)
     When given a dentry for a directory, this should return a dentry for
+2 -1
Documentation/filesystems/vfs.txt
···
 	If the 'rcu_walk' parameter is true, then the caller is doing a
 	pathwalk in RCU-walk mode. Sleeping is not permitted in this mode,
 	and the caller can be asked to leave it and call again by returning
-	-ECHILD.
+	-ECHILD. -EISDIR may also be returned to tell pathwalk to
+	ignore d_automount or any mounts.

 	This function is only used if DCACHE_MANAGE_TRANSIT is set on the
 	dentry being transited from.
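The three outcomes of ->d_manage() in RCU-walk mode can be modelled in plain userspace C. This is only an illustrative sketch: the enum and the function `classify_d_manage()` are hypothetical names that do not exist in the kernel, and the mapping is read off the `switch` added to fs/namei.c in this merge.

```c
#include <errno.h>

/* Hypothetical names: the kernel defines neither this enum nor this helper. */
enum rcu_walk_action {
	RCU_WALK_CHECK_MOUNTS,		/* 0: carry on with the mountpoint checks */
	RCU_WALK_IGNORE_MOUNTS,		/* -EISDIR: treat as a plain directory */
	RCU_WALK_DROP_TO_REFWALK	/* -ECHILD or any other value: leave RCU-walk */
};

/* Map a ->d_manage(dentry, true) return value to what pathwalk does next. */
enum rcu_walk_action classify_d_manage(int ret)
{
	switch (ret) {
	case 0:
		return RCU_WALK_CHECK_MOUNTS;
	case -EISDIR:
		return RCU_WALK_IGNORE_MOUNTS;
	case -ECHILD:
	default:
		return RCU_WALK_DROP_TO_REFWALK;
	}
}
```

Any unrecognised error is grouped with -ECHILD, mirroring the `default:` label that sits next to `case -ECHILD:` in the patch.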
+1 -1
fs/Makefile
···
 		attr.o bad_inode.o file.o filesystems.o namespace.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o splice.o sync.o utimes.o \
-		stack.o fs_struct.o statfs.o
+		stack.o fs_struct.o statfs.o fs_pin.o

 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o block_dev.o direct-io.o mpage.o
+4 -3
fs/bad_inode.c
···
 	return -EIO;
 }

-static int bad_inode_rename (struct inode *old_dir, struct dentry *old_dentry,
-		struct inode *new_dir, struct dentry *new_dentry)
+static int bad_inode_rename2(struct inode *old_dir, struct dentry *old_dentry,
+			     struct inode *new_dir, struct dentry *new_dentry,
+			     unsigned int flags)
 {
 	return -EIO;
 }
···
 	.mkdir = bad_inode_mkdir,
 	.rmdir = bad_inode_rmdir,
 	.mknod = bad_inode_mknod,
-	.rename = bad_inode_rename,
+	.rename2 = bad_inode_rename2,
 	.readlink = bad_inode_readlink,
 	/* follow_link must be no-op, otherwise unmounting this inode
 	   won't work */
+11 -1
fs/btrfs/inode.c
···
 	return ret;
 }

+static int btrfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
+			 struct inode *new_dir, struct dentry *new_dentry,
+			 unsigned int flags)
+{
+	if (flags & ~RENAME_NOREPLACE)
+		return -EINVAL;
+
+	return btrfs_rename(old_dir, old_dentry, new_dir, new_dentry);
+}
+
 static void btrfs_run_delalloc_work(struct btrfs_work *work)
 {
 	struct btrfs_delalloc_work *delalloc_work;
···
 	.link = btrfs_link,
 	.mkdir = btrfs_mkdir,
 	.rmdir = btrfs_rmdir,
-	.rename = btrfs_rename,
+	.rename2 = btrfs_rename2,
 	.symlink = btrfs_symlink,
 	.setattr = btrfs_setattr,
 	.mknod = btrfs_mknod,
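The `flags & ~RENAME_NOREPLACE` test in btrfs_rename2() is the same mask check that the cifs and hostfs conversions in this merge use: any bit outside the set a filesystem understands is rejected with -EINVAL. A minimal userspace sketch of that pattern follows; `check_rename_flags()` is a hypothetical helper for illustration, and the flag values are the ones in the Linux UAPI header `<linux/fs.h>`.

```c
#include <errno.h>

/* Values as defined in the Linux UAPI header <linux/fs.h>. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1U << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE  (1U << 1)
#endif

/*
 * Hypothetical helper, not a kernel function: return -EINVAL if 'flags'
 * contains any bit outside the 'supported' mask, 0 otherwise.
 */
int check_rename_flags(unsigned int flags, unsigned int supported)
{
	if (flags & ~supported)
		return -EINVAL;
	return 0;
}
```

With `supported` set to `RENAME_NOREPLACE` this matches the btrfs and cifs checks; hostfs passes the equivalent of `RENAME_NOREPLACE | RENAME_EXCHANGE`.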
+1 -8
fs/btrfs/super.c
···
 	struct btrfs_path *path;
 	struct btrfs_key location;
 	struct inode *inode;
-	struct dentry *dentry;
 	u64 dir_id;
 	int new = 0;

···
 		return dget(sb->s_root);
 	}

-	dentry = d_obtain_alias(inode);
-	if (!IS_ERR(dentry)) {
-		spin_lock(&dentry->d_lock);
-		dentry->d_flags &= ~DCACHE_DISCONNECTED;
-		spin_unlock(&dentry->d_lock);
-	}
-	return dentry;
+	return d_obtain_root(inode);
 }

 static int btrfs_fill_super(struct super_block *sb,
+1 -1
fs/ceph/super.c
···
 			goto out;
 		}
 	} else {
-		root = d_obtain_alias(inode);
+		root = d_obtain_root(inode);
 	}
 	ceph_init_dentry(root);
 	dout("open_root_inode success, root dentry is %p\n", root);
+1 -1
fs/cifs/cifsfs.c
···
 	.link = cifs_hardlink,
 	.mkdir = cifs_mkdir,
 	.rmdir = cifs_rmdir,
-	.rename = cifs_rename,
+	.rename2 = cifs_rename2,
 	.permission = cifs_permission,
 /*	revalidate:cifs_revalidate,   */
 	.setattr = cifs_setattr,
+2 -2
fs/cifs/cifsfs.h
···
 extern int cifs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
 extern int cifs_mkdir(struct inode *, struct dentry *, umode_t);
 extern int cifs_rmdir(struct inode *, struct dentry *);
-extern int cifs_rename(struct inode *, struct dentry *, struct inode *,
-		       struct dentry *);
+extern int cifs_rename2(struct inode *, struct dentry *, struct inode *,
+			struct dentry *, unsigned int);
 extern int cifs_revalidate_file_attr(struct file *filp);
 extern int cifs_revalidate_dentry_attr(struct dentry *);
 extern int cifs_revalidate_file(struct file *filp);
+12 -2
fs/cifs/inode.c
···
 }

 int
-cifs_rename(struct inode *source_dir, struct dentry *source_dentry,
-	    struct inode *target_dir, struct dentry *target_dentry)
+cifs_rename2(struct inode *source_dir, struct dentry *source_dentry,
+	     struct inode *target_dir, struct dentry *target_dentry,
+	     unsigned int flags)
 {
 	char *from_name = NULL;
 	char *to_name = NULL;
···
 	FILE_UNIX_BASIC_INFO *info_buf_target;
 	unsigned int xid;
 	int rc, tmprc;
+
+	if (flags & ~RENAME_NOREPLACE)
+		return -EINVAL;

 	cifs_sb = CIFS_SB(source_dir->i_sb);
 	tlink = cifs_sb_tlink(cifs_sb);
···

 	rc = cifs_do_rename(xid, source_dentry, from_name, target_dentry,
 			    to_name);
+
+	/*
+	 * No-replace is the natural behavior for CIFS, so skip unlink hacks.
+	 */
+	if (flags & RENAME_NOREPLACE)
+		goto cifs_rename_exit;

 	if (rc == -EEXIST && tcon->unix_ext) {
 		/*
+118 -78
fs/dcache.c
···
 /**
  * d_find_alias - grab a hashed alias of inode
  * @inode: inode in question
- * @want_discon: flag, used by d_splice_alias, to request
- *          that only a DISCONNECTED alias be returned.
  *
  * If inode has a hashed alias, or is a directory and has any alias,
  * acquire the reference to alias and return it. Otherwise return NULL.
···
  * of a filesystem.
  *
  * If the inode has an IS_ROOT, DCACHE_DISCONNECTED alias, then prefer
- * any other hashed alias over that one unless @want_discon is set,
- * in which case only return an IS_ROOT, DCACHE_DISCONNECTED alias.
+ * any other hashed alias over that one.
  */
-static struct dentry *__d_find_alias(struct inode *inode, int want_discon)
+static struct dentry *__d_find_alias(struct inode *inode)
 {
 	struct dentry *alias, *discon_alias;

···
 			if (IS_ROOT(alias) &&
 			    (alias->d_flags & DCACHE_DISCONNECTED)) {
 				discon_alias = alias;
-			} else if (!want_discon) {
+			} else {
 				__dget_dlock(alias);
 				spin_unlock(&alias->d_lock);
 				return alias;
···
 		alias = discon_alias;
 		spin_lock(&alias->d_lock);
 		if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
-			if (IS_ROOT(alias) &&
-			    (alias->d_flags & DCACHE_DISCONNECTED)) {
-				__dget_dlock(alias);
-				spin_unlock(&alias->d_lock);
-				return alias;
-			}
+			__dget_dlock(alias);
+			spin_unlock(&alias->d_lock);
+			return alias;
 		}
 		spin_unlock(&alias->d_lock);
 		goto again;
···

 	if (!hlist_empty(&inode->i_dentry)) {
 		spin_lock(&inode->i_lock);
-		de = __d_find_alias(inode, 0);
+		de = __d_find_alias(inode);
 		spin_unlock(&inode->i_lock);
 	}
 	return de;
···
 }
 EXPORT_SYMBOL(d_find_any_alias);

-/**
- * d_obtain_alias - find or allocate a dentry for a given inode
- * @inode: inode to allocate the dentry for
- *
- * Obtain a dentry for an inode resulting from NFS filehandle conversion or
- * similar open by handle operations. The returned dentry may be anonymous,
- * or may have a full name (if the inode was already in the cache).
- *
- * When called on a directory inode, we must ensure that the inode only ever
- * has one dentry. If a dentry is found, that is returned instead of
- * allocating a new one.
- *
- * On successful return, the reference to the inode has been transferred
- * to the dentry. In case of an error the reference on the inode is released.
- * To make it easier to use in export operations a %NULL or IS_ERR inode may
- * be passed in and will be the error will be propagate to the return value,
- * with a %NULL @inode replaced by ERR_PTR(-ESTALE).
- */
-struct dentry *d_obtain_alias(struct inode *inode)
+static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
 {
 	static const struct qstr anonstring = QSTR_INIT("/", 1);
 	struct dentry *tmp;
···
 	}

 	/* attach a disconnected dentry */
-	add_flags = d_flags_for_inode(inode) | DCACHE_DISCONNECTED;
+	add_flags = d_flags_for_inode(inode);
+
+	if (disconnected)
+		add_flags |= DCACHE_DISCONNECTED;

 	spin_lock(&tmp->d_lock);
 	tmp->d_inode = inode;
···
 	iput(inode);
 	return res;
 }
+
+/**
+ * d_obtain_alias - find or allocate a DISCONNECTED dentry for a given inode
+ * @inode: inode to allocate the dentry for
+ *
+ * Obtain a dentry for an inode resulting from NFS filehandle conversion or
+ * similar open by handle operations. The returned dentry may be anonymous,
+ * or may have a full name (if the inode was already in the cache).
+ *
+ * When called on a directory inode, we must ensure that the inode only ever
+ * has one dentry. If a dentry is found, that is returned instead of
+ * allocating a new one.
+ *
+ * On successful return, the reference to the inode has been transferred
+ * to the dentry. In case of an error the reference on the inode is released.
+ * To make it easier to use in export operations a %NULL or IS_ERR inode may
+ * be passed in and the error will be propagated to the return value,
+ * with a %NULL @inode replaced by ERR_PTR(-ESTALE).
+ */
+struct dentry *d_obtain_alias(struct inode *inode)
+{
+	return __d_obtain_alias(inode, 1);
+}
 EXPORT_SYMBOL(d_obtain_alias);

 /**
- * d_splice_alias - splice a disconnected dentry into the tree if one exists
- * @inode: the inode which may have a disconnected dentry
- * @dentry: a negative dentry which we want to point to the inode.
+ * d_obtain_root - find or allocate a dentry for a given inode
+ * @inode: inode to allocate the dentry for
  *
- * If inode is a directory and has a 'disconnected' dentry (i.e. IS_ROOT and
- * DCACHE_DISCONNECTED), then d_move that in place of the given dentry
- * and return it, else simply d_add the inode to the dentry and return NULL.
+ * Obtain an IS_ROOT dentry for the root of a filesystem.
  *
- * This is needed in the lookup routine of any filesystem that is exportable
- * (via knfsd) so that we can build dcache paths to directories effectively.
+ * We must ensure that directory inodes only ever have one dentry. If a
+ * dentry is found, that is returned instead of allocating a new one.
  *
- * If a dentry was found and moved, then it is returned. Otherwise NULL
- * is returned. This matches the expected return value of ->lookup.
- *
- * Cluster filesystems may call this function with a negative, hashed dentry.
- * In that case, we know that the inode will be a regular file, and also this
- * will only occur during atomic_open. So we need to check for the dentry
- * being already hashed only in the final case.
+ * On successful return, the reference to the inode has been transferred
+ * to the dentry. In case of an error the reference on the inode is
+ * released. A %NULL or IS_ERR inode may be passed in and will be the
+ * error will be propagate to the return value, with a %NULL @inode
+ * replaced by ERR_PTR(-ESTALE).
  */
-struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
+struct dentry *d_obtain_root(struct inode *inode)
 {
-	struct dentry *new = NULL;
-
-	if (IS_ERR(inode))
-		return ERR_CAST(inode);
-
-	if (inode && S_ISDIR(inode->i_mode)) {
-		spin_lock(&inode->i_lock);
-		new = __d_find_alias(inode, 1);
-		if (new) {
-			BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
-			spin_unlock(&inode->i_lock);
-			security_d_instantiate(new, inode);
-			d_move(new, dentry);
-			iput(inode);
-		} else {
-			/* already taking inode->i_lock, so d_add() by hand */
-			__d_instantiate(dentry, inode);
-			spin_unlock(&inode->i_lock);
-			security_d_instantiate(dentry, inode);
-			d_rehash(dentry);
-		}
-	} else {
-		d_instantiate(dentry, inode);
-		if (d_unhashed(dentry))
-			d_rehash(dentry);
-	}
-	return new;
+	return __d_obtain_alias(inode, 0);
 }
-EXPORT_SYMBOL(d_splice_alias);
+EXPORT_SYMBOL(d_obtain_root);

 /**
  * d_add_ci - lookup or allocate new dentry with case-exact name
···
 }

 /**
+ * d_splice_alias - splice a disconnected dentry into the tree if one exists
+ * @inode: the inode which may have a disconnected dentry
+ * @dentry: a negative dentry which we want to point to the inode.
+ *
+ * If inode is a directory and has an IS_ROOT alias, then d_move that in
+ * place of the given dentry and return it, else simply d_add the inode
+ * to the dentry and return NULL.
+ *
+ * If a non-IS_ROOT directory is found, the filesystem is corrupt, and
+ * we should error out: directories can't have multiple aliases.
+ *
+ * This is needed in the lookup routine of any filesystem that is exportable
+ * (via knfsd) so that we can build dcache paths to directories effectively.
+ *
+ * If a dentry was found and moved, then it is returned. Otherwise NULL
+ * is returned. This matches the expected return value of ->lookup.
+ *
+ * Cluster filesystems may call this function with a negative, hashed dentry.
+ * In that case, we know that the inode will be a regular file, and also this
+ * will only occur during atomic_open. So we need to check for the dentry
+ * being already hashed only in the final case.
+ */
+struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
+{
+	struct dentry *new = NULL;
+
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+
+	if (inode && S_ISDIR(inode->i_mode)) {
+		spin_lock(&inode->i_lock);
+		new = __d_find_any_alias(inode);
+		if (new) {
+			if (!IS_ROOT(new)) {
+				spin_unlock(&inode->i_lock);
+				dput(new);
+				return ERR_PTR(-EIO);
+			}
+			if (d_ancestor(new, dentry)) {
+				spin_unlock(&inode->i_lock);
+				dput(new);
+				return ERR_PTR(-EIO);
+			}
+			write_seqlock(&rename_lock);
+			__d_materialise_dentry(dentry, new);
+			write_sequnlock(&rename_lock);
+			__d_drop(new);
+			_d_rehash(new);
+			spin_unlock(&new->d_lock);
+			spin_unlock(&inode->i_lock);
+			security_d_instantiate(new, inode);
+			iput(inode);
+		} else {
+			/* already taking inode->i_lock, so d_add() by hand */
+			__d_instantiate(dentry, inode);
+			spin_unlock(&inode->i_lock);
+			security_d_instantiate(dentry, inode);
+			d_rehash(dentry);
+		}
+	} else {
+		d_instantiate(dentry, inode);
+		if (d_unhashed(dentry))
+			d_rehash(dentry);
+	}
+	return new;
+}
+EXPORT_SYMBOL(d_splice_alias);
+
+/**
  * d_materialise_unique - introduce an inode into the tree
  * @dentry: candidate dentry
  * @inode: inode to bind to the dentry, to which aliases may be attached
···
 		struct dentry *alias;

 		/* Does an aliased dentry already exist? */
-		alias = __d_find_alias(inode, 0);
+		alias = __d_find_alias(inode);
 		if (alias) {
 			actual = alias;
 			write_seqlock(&rename_lock);
+1 -1
fs/direct-io.c
···
 {
 	ssize_t ret;

-	ret = iov_iter_get_pages(sdio->iter, dio->pages, DIO_PAGES * PAGE_SIZE,
+	ret = iov_iter_get_pages(sdio->iter, dio->pages, DIO_PAGES,
 				&sdio->from);

 	if (ret < 0 && sdio->blocks_available && (dio->rw & WRITE)) {
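The diff does not say why a page count is a better limit than a byte count, but one plausible reading is simple arithmetic: a buffer of N * PAGE_SIZE bytes that does not start on a page boundary touches N + 1 pages, so a byte limit cannot by itself bound the number of entries written into a fixed-size page array. The sketch below shows only that arithmetic; `pages_spanned()` and `SKETCH_PAGE_SIZE` are hypothetical names, and the 4096-byte page size is an assumption.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the illustration */

/*
 * Number of pages touched by 'len' bytes that start 'offset' bytes into
 * the first page (offset is expected to be smaller than the page size).
 */
size_t pages_spanned(size_t offset, size_t len)
{
	if (len == 0)
		return 0;
	return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For example, with a hypothetical 64-entry array, 64 pages' worth of bytes starting at offset 512 spans 65 pages, one more than the array can hold; capping by page count avoids that case whatever the starting offset is.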
-1
fs/ext4/namei.c
···
 	.rmdir = ext4_rmdir,
 	.mknod = ext4_mknod,
 	.tmpfile = ext4_tmpfile,
-	.rename = ext4_rename,
 	.rename2 = ext4_rename2,
 	.setattr = ext4_setattr,
 	.setxattr = generic_setxattr,
+78
fs/fs_pin.c
···
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/fs_pin.h>
+#include "internal.h"
+#include "mount.h"
+
+static void pin_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct fs_pin, rcu));
+}
+
+static DEFINE_SPINLOCK(pin_lock);
+
+void pin_put(struct fs_pin *p)
+{
+	if (atomic_long_dec_and_test(&p->count))
+		call_rcu(&p->rcu, pin_free_rcu);
+}
+
+void pin_remove(struct fs_pin *pin)
+{
+	spin_lock(&pin_lock);
+	hlist_del(&pin->m_list);
+	hlist_del(&pin->s_list);
+	spin_unlock(&pin_lock);
+}
+
+void pin_insert(struct fs_pin *pin, struct vfsmount *m)
+{
+	spin_lock(&pin_lock);
+	hlist_add_head(&pin->s_list, &m->mnt_sb->s_pins);
+	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
+	spin_unlock(&pin_lock);
+}
+
+void mnt_pin_kill(struct mount *m)
+{
+	while (1) {
+		struct hlist_node *p;
+		struct fs_pin *pin;
+		rcu_read_lock();
+		p = ACCESS_ONCE(m->mnt_pins.first);
+		if (!p) {
+			rcu_read_unlock();
+			break;
+		}
+		pin = hlist_entry(p, struct fs_pin, m_list);
+		if (!atomic_long_inc_not_zero(&pin->count)) {
+			rcu_read_unlock();
+			cpu_relax();
+			continue;
+		}
+		rcu_read_unlock();
+		pin->kill(pin);
+	}
+}
+
+void sb_pin_kill(struct super_block *sb)
+{
+	while (1) {
+		struct hlist_node *p;
+		struct fs_pin *pin;
+		rcu_read_lock();
+		p = ACCESS_ONCE(sb->s_pins.first);
+		if (!p) {
+			rcu_read_unlock();
+			break;
+		}
+		pin = hlist_entry(p, struct fs_pin, s_list);
+		if (!atomic_long_inc_not_zero(&pin->count)) {
+			rcu_read_unlock();
+			cpu_relax();
+			continue;
+		}
+		rcu_read_unlock();
+		pin->kill(pin);
+	}
+}
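Both kill loops in fs_pin.c rely on atomic_long_inc_not_zero(): a pin found under rcu_read_lock() may already have dropped its last reference, so a new reference is taken only if the count is still nonzero, and the loop retries otherwise. The following is a userspace model of that primitive using C11 atomics, not the kernel implementation; `ref_get_unless_zero()` and `ref_put()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of atomic_long_inc_not_zero(): take a reference only
 * if the object is still live (count > 0).
 */
bool ref_get_unless_zero(atomic_long *count)
{
	long old = atomic_load(count);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Model of the drop side (compare pin_put()): returns true when the
 * caller has just dropped the last reference.
 */
bool ref_put(atomic_long *count)
{
	return atomic_fetch_sub(count, 1) == 1;
}
```

A count that has reached zero never goes back up, which is what makes it safe for pin_put() to hand the object to call_rcu() while a concurrent mnt_pin_kill() or sb_pin_kill() is still looking at the list entry.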
-7
fs/fuse/dir.c
···
 	return err;
 }

-static int fuse_rename(struct inode *olddir, struct dentry *oldent,
-		       struct inode *newdir, struct dentry *newent)
-{
-	return fuse_rename2(olddir, oldent, newdir, newent, 0);
-}
-
 static int fuse_link(struct dentry *entry, struct inode *newdir,
 		     struct dentry *newent)
 {
···
 	.symlink = fuse_symlink,
 	.unlink = fuse_unlink,
 	.rmdir = fuse_rmdir,
-	.rename = fuse_rename,
 	.rename2 = fuse_rename2,
 	.link = fuse_link,
 	.setattr = fuse_setattr,
+2 -2
fs/fuse/file.c
···
 	while (nbytes < *nbytesp && req->num_pages < req->max_pages) {
 		unsigned npages;
 		size_t start;
-		unsigned n = req->max_pages - req->num_pages;
 		ssize_t ret = iov_iter_get_pages(ii,
 					&req->pages[req->num_pages],
-					n * PAGE_SIZE, &start);
+					req->max_pages - req->num_pages,
+					&start);
 		if (ret < 0)
 			return ret;

+1
fs/hostfs/hostfs.h
···
 extern int link_file(const char *from, const char *to);
 extern int hostfs_do_readlink(char *file, char *buf, int size);
 extern int rename_file(char *from, char *to);
+extern int rename2_file(char *from, char *to, unsigned int flags);
 extern int do_statfs(char *root, long *bsize_out, long long *blocks_out,
 		     long long *bfree_out, long long *bavail_out,
 		     long long *files_out, long long *ffree_out,
+20 -10
fs/hostfs/hostfs_kern.c
···
 	return err;
 }

-static int hostfs_rename(struct inode *from_ino, struct dentry *from,
-			 struct inode *to_ino, struct dentry *to)
+static int hostfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
+			  struct inode *new_dir, struct dentry *new_dentry,
+			  unsigned int flags)
 {
-	char *from_name, *to_name;
+	char *old_name, *new_name;
 	int err;

-	if ((from_name = dentry_name(from)) == NULL)
+	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
+		return -EINVAL;
+
+	old_name = dentry_name(old_dentry);
+	if (old_name == NULL)
 		return -ENOMEM;
-	if ((to_name = dentry_name(to)) == NULL) {
-		__putname(from_name);
+	new_name = dentry_name(new_dentry);
+	if (new_name == NULL) {
+		__putname(old_name);
 		return -ENOMEM;
 	}
-	err = rename_file(from_name, to_name);
-	__putname(from_name);
-	__putname(to_name);
+	if (!flags)
+		err = rename_file(old_name, new_name);
+	else
+		err = rename2_file(old_name, new_name, flags);
+
+	__putname(old_name);
+	__putname(new_name);
 	return err;
 }

···
 	.mkdir = hostfs_mkdir,
 	.rmdir = hostfs_rmdir,
 	.mknod = hostfs_mknod,
-	.rename = hostfs_rename,
+	.rename2 = hostfs_rename2,
 	.permission = hostfs_permission,
 	.setattr = hostfs_setattr,
 };
+28
fs/hostfs/hostfs_user.c
···
 #include <sys/time.h>
 #include <sys/types.h>
 #include <sys/vfs.h>
+#include <sys/syscall.h>
 #include "hostfs.h"
 #include <utime.h>

···
 	if (err < 0)
 		return -errno;
 	return 0;
+}
+
+int rename2_file(char *from, char *to, unsigned int flags)
+{
+	int err;
+
+#ifndef SYS_renameat2
+# ifdef __x86_64__
+# define SYS_renameat2 316
+# endif
+# ifdef __i386__
+# define SYS_renameat2 353
+# endif
+#endif
+
+#ifdef SYS_renameat2
+	err = syscall(SYS_renameat2, AT_FDCWD, from, AT_FDCWD, to, flags);
+	if (err < 0) {
+		if (errno != ENOSYS)
+			return -errno;
+		else
+			return -EINVAL;
+	}
+	return 0;
+#else
+	return -EINVAL;
+#endif
 }

 int do_statfs(char *root, long *bsize_out, long long *blocks_out,
+6 -1
fs/internal.h
···
 /*
  * read_write.c
  */
-extern ssize_t __kernel_write(struct file *, const char *, size_t, loff_t *);
 extern int rw_verify_area(int, struct file *, const loff_t *, size_t);

 /*
···
  * pipe.c
  */
 extern const struct file_operations pipefifo_fops;
+
+/*
+ * fs_pin.c
+ */
+extern void sb_pin_kill(struct super_block *sb);
+extern void mnt_pin_kill(struct mount *m);
+1 -1
fs/mount.h
···
 	int mnt_id;			/* mount identifier */
 	int mnt_group_id;		/* peer group identifier */
 	int mnt_expiry_mark;		/* true if marked for expiry */
-	int mnt_pinned;
+	struct hlist_head mnt_pins;
 	struct path mnt_ex_mountpoint;
 };

+20 -14
fs/namei.c
···
 }
 EXPORT_SYMBOL(follow_down_one);

-static inline bool managed_dentry_might_block(struct dentry *dentry)
+static inline int managed_dentry_rcu(struct dentry *dentry)
 {
-	return (dentry->d_flags & DCACHE_MANAGE_TRANSIT &&
-		dentry->d_op->d_manage(dentry, true) < 0);
+	return (dentry->d_flags & DCACHE_MANAGE_TRANSIT) ?
+		dentry->d_op->d_manage(dentry, true) : 0;
 }

 /*
···
 		 * Don't forget we might have a non-mountpoint managed dentry
 		 * that wants to block transit.
 		 */
-		if (unlikely(managed_dentry_might_block(path->dentry)))
+		switch (managed_dentry_rcu(path->dentry)) {
+		case -ECHILD:
+		default:
 			return false;
+		case -EISDIR:
+			return true;
+		case 0:
+			break;
+		}

 		if (!d_mountpoint(path->dentry))
-			return true;
+			return !(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);

 		mounted = __lookup_mnt(path->mnt, path->dentry);
 		if (!mounted)
···
 		 */
 		*inode = path->dentry->d_inode;
 	}
-	return read_seqretry(&mount_lock, nd->m_seq);
+	return read_seqretry(&mount_lock, nd->m_seq) &&
+		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }

 static int follow_dotdot_rcu(struct nameidata *nd)
···
 		}
 		path->mnt = mnt;
 		path->dentry = dentry;
-		if (unlikely(!__follow_mount_rcu(nd, path, inode)))
-			goto unlazy;
-		if (unlikely(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT))
-			goto unlazy;
-		return 0;
+		if (likely(__follow_mount_rcu(nd, path, inode)))
+			return 0;
 unlazy:
 		if (unlazy_walk(nd, dentry))
 			return -ECHILD;
···
  * The worst of all namespace operations - renaming directory. "Perverted"
  * doesn't even start to describe it. Somebody in UCB had a heck of a trip...
  * Problems:
- *	a) we can get into loop creation. Check is done in is_subdir().
+ *	a) we can get into loop creation.
  *	b) race potential - two innocent renames can create a loop together.
  *	   That's where 4.4 screws up. Current fix: serialization on
  *	   sb->s_vfs_rename_mutex. We might be more accurate, but that's another
···
 	if (error)
 		return error;

-	if (!old_dir->i_op->rename)
+	if (!old_dir->i_op->rename && !old_dir->i_op->rename2)
 		return -EPERM;

 	if (flags && !old_dir->i_op->rename2)
···
 		if (error)
 			goto out;
 	}
-	if (!flags) {
+	if (!old_dir->i_op->rename2) {
 		error = old_dir->i_op->rename(old_dir, old_dentry,
 					      new_dir, new_dentry);
 	} else {
+		WARN_ON(old_dir->i_op->rename != NULL);
 		error = old_dir->i_op->rename2(old_dir, old_dentry,
 					       new_dir, new_dentry, flags);
 	}
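After this change vfs_rename() prefers ->rename2 whenever a filesystem provides it and falls back to ->rename only when it does not. The dispatch can be sketched in userspace with a two-member operations table. Everything here is illustrative: `struct rename_ops`, `dispatch_rename()` and the two sample callbacks are hypothetical, the hooks take plain strings instead of inodes and dentries, and the -EINVAL for "flags given but no ->rename2" is an assumption because the hunk ends before that return statement.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of inode_operations with only the rename hooks. */
struct rename_ops {
	int (*rename)(const char *from, const char *to);
	int (*rename2)(const char *from, const char *to, unsigned int flags);
};

/* Sample callbacks; the return values just identify which hook ran. */
int sketch_rename(const char *from, const char *to)
{
	(void)from;
	(void)to;
	return 1;
}

int sketch_rename2(const char *from, const char *to, unsigned int flags)
{
	(void)from;
	(void)to;
	(void)flags;
	return 2;
}

int dispatch_rename(const struct rename_ops *ops, const char *from,
		    const char *to, unsigned int flags)
{
	if (!ops->rename && !ops->rename2)
		return -EPERM;
	/* Return value not visible in the hunk; -EINVAL is assumed. */
	if (flags && !ops->rename2)
		return -EINVAL;
	if (!ops->rename2)
		return ops->rename(from, to);
	return ops->rename2(from, to, flags);
}
```

This ordering is why the converted filesystems in this merge (bad_inode, btrfs, cifs, ext4, fuse, hostfs) drop their `.rename` entries: the WARN_ON in the patch fires if a filesystem sets both hooks.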
+33 -34
fs/namespace.c
···
 #include <linux/namei.h>
 #include <linux/security.h>
 #include <linux/idr.h>
-#include <linux/acct.h>		/* acct_auto_close_mnt */
 #include <linux/init.h>		/* init_rootfs */
 #include <linux/fs_struct.h>	/* get_fs_root et.al. */
 #include <linux/fsnotify.h>	/* fsnotify_vfsmount_delete */
···
 	list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
 }

+static void attach_shadowed(struct mount *mnt,
+			struct mount *parent,
+			struct mount *shadows)
+{
+	if (shadows) {
+		hlist_add_behind_rcu(&mnt->mnt_hash, &shadows->mnt_hash);
+		list_add(&mnt->mnt_child, &shadows->mnt_child);
+	} else {
+		hlist_add_head_rcu(&mnt->mnt_hash,
+				m_hash(&parent->mnt, mnt->mnt_mountpoint));
+		list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+	}
+}
+
 /*
  * vfsmount lock must be held for write
  */
···

 	list_splice(&head, n->list.prev);

-	if (shadows)
-		hlist_add_behind_rcu(&mnt->mnt_hash, &shadows->mnt_hash);
-	else
-		hlist_add_head_rcu(&mnt->mnt_hash,
-				m_hash(&parent->mnt, mnt->mnt_mountpoint));
-	list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+	attach_shadowed(mnt, parent, shadows);
 	touch_mnt_namespace(n);
 }

···

 static void mntput_no_expire(struct mount *mnt)
 {
-put_again:
 	rcu_read_lock();
 	mnt_add_count(mnt, -1);
 	if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */
···
 		rcu_read_unlock();
 		unlock_mount_hash();
 		return;
-	}
-	if (unlikely(mnt->mnt_pinned)) {
-		mnt_add_count(mnt, mnt->mnt_pinned + 1);
-		mnt->mnt_pinned = 0;
-		rcu_read_unlock();
-		unlock_mount_hash();
-		acct_auto_close_mnt(&mnt->mnt);
-		goto put_again;
 	}
 	if (unlikely(mnt->mnt.mnt_flags & MNT_DOOMED)) {
 		rcu_read_unlock();
···
 	 * so mnt_get_writers() below is safe.
 	 */
 	WARN_ON(mnt_get_writers(mnt));
+	if (unlikely(mnt->mnt_pins.first))
+		mnt_pin_kill(mnt);
 	fsnotify_vfsmount_delete(&mnt->mnt);
 	dput(mnt->mnt.mnt_root);
 	deactivate_super(mnt->mnt.mnt_sb);
···
 }
 EXPORT_SYMBOL(mntget);

-void mnt_pin(struct vfsmount *mnt)
+struct vfsmount *mnt_clone_internal(struct path *path)
 {
-	lock_mount_hash();
-	real_mount(mnt)->mnt_pinned++;
-	unlock_mount_hash();
+	struct mount *p;
+	p = clone_mnt(real_mount(path->mnt), path->dentry, CL_PRIVATE);
+	if (IS_ERR(p))
+		return ERR_CAST(p);
+	p->mnt.mnt_flags |= MNT_INTERNAL;
+	return &p->mnt;
 }
-EXPORT_SYMBOL(mnt_pin);
-
-void mnt_unpin(struct vfsmount *m)
-{
-	struct mount *mnt = real_mount(m);
-	lock_mount_hash();
-	if (mnt->mnt_pinned) {
-		mnt_add_count(mnt, 1);
-		mnt->mnt_pinned--;
-	}
-	unlock_mount_hash();
-}
-EXPORT_SYMBOL(mnt_unpin);

 static inline void mangle(struct seq_file *m, const char *s)
 {
···
 			continue;

 		for (s = r; s; s = next_mnt(s, r)) {
+			struct mount *t = NULL;
 			if (!(flag & CL_COPY_UNBINDABLE) &&
 			    IS_MNT_UNBINDABLE(s)) {
 				s = skip_mnt_tree(s);
···
 				goto out;
 			lock_mount_hash();
 			list_add_tail(&q->mnt_list, &res->mnt_list);
-			attach_mnt(q, parent, p->mnt_mp);
+			mnt_set_mountpoint(parent, p->mnt_mp, q);
+			if (!list_empty(&parent->mnt_mounts)) {
+				t = list_last_entry(&parent->mnt_mounts,
+					struct mount, mnt_child);
+				if (t->mnt_mp != p->mnt_mp)
+					t = NULL;
+			}
+			attach_shadowed(q, parent, t);
 			unlock_mount_hash();
 		}
 	}
+1 -1
fs/nfs/getroot.c
···
 	 * if the dentry tree reaches them; however if the dentry already
 	 * exists, we'll pick it up at this point and use it as the root
 	 */
-	ret = d_obtain_alias(inode);
+	ret = d_obtain_root(inode);
 	if (IS_ERR(ret)) {
 		dprintk("nfs_get_root: get root dentry failed\n");
 		goto out;
+1 -1
fs/nilfs2/super.c
···
 			iput(inode);
 		}
 	} else {
-		dentry = d_obtain_alias(inode);
+		dentry = d_obtain_root(inode);
 		if (IS_ERR(dentry)) {
 			ret = PTR_ERR(dentry);
 			goto failed_dentry;
+14 -5
fs/super.c
···
 
 #include <linux/export.h>
 #include <linux/slab.h>
-#include <linux/acct.h>
 #include <linux/blkdev.h>
 #include <linux/mount.h>
 #include <linux/security.h>
···
 		return -EACCES;
 #endif
 
-	if (flags & MS_RDONLY)
-		acct_auto_close(sb);
-	shrink_dcache_sb(sb);
-
 	remount_ro = (flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY);
+
+	if (remount_ro) {
+		if (sb->s_pins.first) {
+			up_write(&sb->s_umount);
+			sb_pin_kill(sb);
+			down_write(&sb->s_umount);
+			if (!sb->s_root)
+				return 0;
+			if (sb->s_writers.frozen != SB_UNFROZEN)
+				return -EBUSY;
+			remount_ro = (flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY);
+		}
+	}
+	shrink_dcache_sb(sb);
 
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
-4
include/linux/acct.h
···
 struct pacct_struct;
 struct pid_namespace;
 extern int acct_parm[]; /* for sysctl */
-extern void acct_auto_close_mnt(struct vfsmount *m);
-extern void acct_auto_close(struct super_block *sb);
 extern void acct_collect(long exitcode, int group_dead);
 extern void acct_process(void);
 extern void acct_exit_ns(struct pid_namespace *);
 #else
-#define acct_auto_close_mnt(x) do { } while (0)
-#define acct_auto_close(x) do { } while (0)
 #define acct_collect(x,y) do { } while (0)
 #define acct_process() do { } while (0)
 #define acct_exit_ns(ns) do { } while (0)
+1
include/linux/dcache.h
···
 extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
 extern struct dentry *d_find_any_alias(struct inode *inode);
 extern struct dentry * d_obtain_alias(struct inode *);
+extern struct dentry * d_obtain_root(struct inode *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
 extern void shrink_dcache_for_umount(struct super_block *);
+2
include/linux/fs.h
···
 
 	/* AIO completions deferred from interrupt context */
 	struct workqueue_struct *s_dio_done_wq;
+	struct hlist_head s_pins;
 
 	/*
 	 * Keep the lru lists last in the structure so they always sit on their
···
 
 extern int kernel_read(struct file *, loff_t, char *, unsigned long);
 extern ssize_t kernel_write(struct file *, const char *, size_t, loff_t);
+extern ssize_t __kernel_write(struct file *, const char *, size_t, loff_t *);
 extern struct file * open_exec(const char *);
 
 /* fs/dcache.c -- generic fs support functions */
+17
include/linux/fs_pin.h
···
+#include <linux/fs.h>
+
+struct fs_pin {
+	atomic_long_t count;
+	union {
+		struct {
+			struct hlist_node s_list;
+			struct hlist_node m_list;
+		};
+		struct rcu_head rcu;
+	};
+	void (*kill)(struct fs_pin *);
+};
+
+void pin_put(struct fs_pin *);
+void pin_remove(struct fs_pin *);
+void pin_insert(struct fs_pin *, struct vfsmount *);
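Note on usage: struct fs_pin is designed to be embedded in an owning object. In this merge, kernel/acct.c embeds it in struct bsd_acct_struct and recovers the owner with container_of() inside the kill callback. A minimal user-space sketch of that pattern follows; pin_sketch, owner_sketch, container_of_sketch and fire_pin are hypothetical names invented for illustration, and the macro is a simplified offsetof-based version without the kernel macro's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct fs_pin: only the count and callback are modelled. */
struct pin_sketch {
	long count;
	void (*kill)(struct pin_sketch *);
};

/* Hypothetical owner that embeds the pin, as bsd_acct_struct does. */
struct owner_sketch {
	int killed;
	struct pin_sketch pin;
};

/* Simplified container_of: plain pointer arithmetic, no type checking. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The callback receives only the pin and recovers its owner from it. */
static void owner_kill(struct pin_sketch *pin)
{
	struct owner_sketch *o =
		container_of_sketch(pin, struct owner_sketch, pin);
	o->killed = 1;
}

/* Generic code sees nothing but the pin and its callback. */
static void fire_pin(struct pin_sketch *pin)
{
	assert(pin->kill != NULL);
	pin->kill(pin);
}
```

The point of the layout is that the generic side never needs to know the owner's type; each user supplies its own kill callback.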
+2 -2
include/linux/mount.h
···
 };
 
 struct file; /* forward dec */
+struct path;
 
 extern int mnt_want_write(struct vfsmount *mnt);
 extern int mnt_want_write_file(struct file *file);
···
 extern void mnt_drop_write_file(struct file *file);
 extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
-extern void mnt_pin(struct vfsmount *mnt);
-extern void mnt_unpin(struct vfsmount *mnt);
+extern struct vfsmount *mnt_clone_internal(struct path *path);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
 
 struct file_system_type;
+1 -1
include/linux/uio.h
···
 void iov_iter_init(struct iov_iter *i, int direction, const struct iovec *iov,
 			unsigned long nr_segs, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
-			size_t maxsize, size_t *start);
+			unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
 			size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
+197 -251
kernel/acct.c
···
 #include <asm/div64.h>
 #include <linux/blkdev.h> /* sector_div */
 #include <linux/pid_namespace.h>
+#include <linux/fs_pin.h>
 
 /*
  * These constants control the amount of freespace that suspend and
···
 /*
  * External references and all of the globals.
  */
-static void do_acct_process(struct bsd_acct_struct *acct,
-		struct pid_namespace *ns, struct file *);
+static void do_acct_process(struct bsd_acct_struct *acct);
 
-/*
- * This structure is used so that all the data protected by lock
- * can be placed in the same cache line as the lock. This primes
- * the cache line to have the data after getting the lock.
- */
 struct bsd_acct_struct {
+	struct fs_pin pin;
+	struct mutex lock;
 	int active;
 	unsigned long needcheck;
 	struct file *file;
 	struct pid_namespace *ns;
-	struct list_head list;
+	struct work_struct work;
+	struct completion done;
 };
-
-static DEFINE_SPINLOCK(acct_lock);
-static LIST_HEAD(acct_list);
 
 /*
  * Check the amount of free space and suspend/resume accordingly.
  */
-static int check_free_space(struct bsd_acct_struct *acct, struct file *file)
+static int check_free_space(struct bsd_acct_struct *acct)
 {
 	struct kstatfs sbuf;
-	int res;
-	int act;
-	u64 resume;
-	u64 suspend;
 
-	spin_lock(&acct_lock);
-	res = acct->active;
-	if (!file || time_is_before_jiffies(acct->needcheck))
+	if (time_is_before_jiffies(acct->needcheck))
 		goto out;
-	spin_unlock(&acct_lock);
 
 	/* May block */
-	if (vfs_statfs(&file->f_path, &sbuf))
-		return res;
-	suspend = sbuf.f_blocks * SUSPEND;
-	resume = sbuf.f_blocks * RESUME;
-
-	do_div(suspend, 100);
-	do_div(resume, 100);
-
-	if (sbuf.f_bavail <= suspend)
-		act = -1;
-	else if (sbuf.f_bavail >= resume)
-		act = 1;
-	else
-		act = 0;
-
-	/*
-	 * If some joker switched acct->file under us we'ld better be
-	 * silent and _not_ touch anything.
-	 */
-	spin_lock(&acct_lock);
-	if (file != acct->file) {
-		if (act)
-			res = act > 0;
+	if (vfs_statfs(&acct->file->f_path, &sbuf))
 		goto out;
-	}
 
 	if (acct->active) {
-		if (act < 0) {
+		u64 suspend = sbuf.f_blocks * SUSPEND;
+		do_div(suspend, 100);
+		if (sbuf.f_bavail <= suspend) {
 			acct->active = 0;
 			pr_info("Process accounting paused\n");
 		}
 	} else {
-		if (act > 0) {
+		u64 resume = sbuf.f_blocks * RESUME;
+		do_div(resume, 100);
+		if (sbuf.f_bavail >= resume) {
 			acct->active = 1;
 			pr_info("Process accounting resumed\n");
 		}
 	}
 
 	acct->needcheck = jiffies + ACCT_TIMEOUT*HZ;
-	res = acct->active;
 out:
-	spin_unlock(&acct_lock);
+	return acct->active;
+}
+
+static struct bsd_acct_struct *acct_get(struct pid_namespace *ns)
+{
+	struct bsd_acct_struct *res;
+again:
+	smp_rmb();
+	rcu_read_lock();
+	res = ACCESS_ONCE(ns->bacct);
+	if (!res) {
+		rcu_read_unlock();
+		return NULL;
+	}
+	if (!atomic_long_inc_not_zero(&res->pin.count)) {
+		rcu_read_unlock();
+		cpu_relax();
+		goto again;
+	}
+	rcu_read_unlock();
+	mutex_lock(&res->lock);
+	if (!res->ns) {
+		mutex_unlock(&res->lock);
+		pin_put(&res->pin);
+		goto again;
+	}
 	return res;
 }
 
-/*
- * Close the old accounting file (if currently open) and then replace
- * it with file (if non-NULL).
- *
- * NOTE: acct_lock MUST be held on entry and exit.
- */
-static void acct_file_reopen(struct bsd_acct_struct *acct, struct file *file,
-		struct pid_namespace *ns)
+static void close_work(struct work_struct *work)
 {
-	struct file *old_acct = NULL;
-	struct pid_namespace *old_ns = NULL;
+	struct bsd_acct_struct *acct = container_of(work, struct bsd_acct_struct, work);
+	struct file *file = acct->file;
+	if (file->f_op->flush)
+		file->f_op->flush(file, NULL);
+	__fput_sync(file);
+	complete(&acct->done);
+}
 
-	if (acct->file) {
-		old_acct = acct->file;
-		old_ns = acct->ns;
-		acct->active = 0;
-		acct->file = NULL;
+static void acct_kill(struct bsd_acct_struct *acct,
+		      struct bsd_acct_struct *new)
+{
+	if (acct) {
+		struct pid_namespace *ns = acct->ns;
+		do_acct_process(acct);
+		INIT_WORK(&acct->work, close_work);
+		init_completion(&acct->done);
+		schedule_work(&acct->work);
+		wait_for_completion(&acct->done);
+		pin_remove(&acct->pin);
+		ns->bacct = new;
 		acct->ns = NULL;
-		list_del(&acct->list);
+		atomic_long_dec(&acct->pin.count);
+		mutex_unlock(&acct->lock);
+		pin_put(&acct->pin);
 	}
-	if (file) {
-		acct->file = file;
-		acct->ns = ns;
-		acct->needcheck = jiffies + ACCT_TIMEOUT*HZ;
-		acct->active = 1;
-		list_add(&acct->list, &acct_list);
+}
+
+static void acct_pin_kill(struct fs_pin *pin)
+{
+	struct bsd_acct_struct *acct;
+	acct = container_of(pin, struct bsd_acct_struct, pin);
+	mutex_lock(&acct->lock);
+	if (!acct->ns) {
+		mutex_unlock(&acct->lock);
+		pin_put(pin);
+		acct = NULL;
 	}
-	if (old_acct) {
-		mnt_unpin(old_acct->f_path.mnt);
-		spin_unlock(&acct_lock);
-		do_acct_process(acct, old_ns, old_acct);
-		filp_close(old_acct, NULL);
-		spin_lock(&acct_lock);
-	}
+	acct_kill(acct, NULL);
 }
 
 static int acct_on(struct filename *pathname)
 {
 	struct file *file;
-	struct vfsmount *mnt;
-	struct pid_namespace *ns;
-	struct bsd_acct_struct *acct = NULL;
+	struct vfsmount *mnt, *internal;
+	struct pid_namespace *ns = task_active_pid_ns(current);
+	struct bsd_acct_struct *acct, *old;
+	int err;
+
+	acct = kzalloc(sizeof(struct bsd_acct_struct), GFP_KERNEL);
+	if (!acct)
+		return -ENOMEM;
 
 	/* Difference from BSD - they don't do O_APPEND */
 	file = file_open_name(pathname, O_WRONLY|O_APPEND|O_LARGEFILE, 0);
-	if (IS_ERR(file))
+	if (IS_ERR(file)) {
+		kfree(acct);
 		return PTR_ERR(file);
+	}
 
 	if (!S_ISREG(file_inode(file)->i_mode)) {
+		kfree(acct);
 		filp_close(file, NULL);
 		return -EACCES;
 	}
 
 	if (!file->f_op->write) {
+		kfree(acct);
 		filp_close(file, NULL);
 		return -EIO;
 	}
-
-	ns = task_active_pid_ns(current);
-	if (ns->bacct == NULL) {
-		acct = kzalloc(sizeof(struct bsd_acct_struct), GFP_KERNEL);
-		if (acct == NULL) {
-			filp_close(file, NULL);
-			return -ENOMEM;
-		}
+	internal = mnt_clone_internal(&file->f_path);
+	if (IS_ERR(internal)) {
+		kfree(acct);
+		filp_close(file, NULL);
+		return PTR_ERR(internal);
 	}
-
-	spin_lock(&acct_lock);
-	if (ns->bacct == NULL) {
-		ns->bacct = acct;
-		acct = NULL;
+	err = mnt_want_write(internal);
+	if (err) {
+		mntput(internal);
+		kfree(acct);
+		filp_close(file, NULL);
+		return err;
 	}
-
 	mnt = file->f_path.mnt;
-	mnt_pin(mnt);
-	acct_file_reopen(ns->bacct, file, ns);
-	spin_unlock(&acct_lock);
+	file->f_path.mnt = internal;
 
-	mntput(mnt); /* it's pinned, now give up active reference */
-	kfree(acct);
+	atomic_long_set(&acct->pin.count, 1);
+	acct->pin.kill = acct_pin_kill;
+	acct->file = file;
+	acct->needcheck = jiffies;
+	acct->ns = ns;
+	mutex_init(&acct->lock);
+	mutex_lock_nested(&acct->lock, 1); /* nobody has seen it yet */
+	pin_insert(&acct->pin, mnt);
 
+	old = acct_get(ns);
+	if (old)
+		acct_kill(old, acct);
+	else
+		ns->bacct = acct;
+	mutex_unlock(&acct->lock);
+	mnt_drop_write(mnt);
+	mntput(mnt);
 	return 0;
 }
+
+static DEFINE_MUTEX(acct_on_mutex);
 
 /**
  * sys_acct - enable/disable process accounting
···
 
 		if (IS_ERR(tmp))
 			return PTR_ERR(tmp);
+		mutex_lock(&acct_on_mutex);
 		error = acct_on(tmp);
+		mutex_unlock(&acct_on_mutex);
 		putname(tmp);
 	} else {
-		struct bsd_acct_struct *acct;
-
-		acct = task_active_pid_ns(current)->bacct;
-		if (acct == NULL)
-			return 0;
-
-		spin_lock(&acct_lock);
-		acct_file_reopen(acct, NULL, NULL);
-		spin_unlock(&acct_lock);
+		acct_kill(acct_get(task_active_pid_ns(current)), NULL);
 	}
 
 	return error;
 }
 
-/**
- * acct_auto_close - turn off a filesystem's accounting if it is on
- * @m: vfsmount being shut down
- *
- * If the accounting is turned on for a file in the subtree pointed to
- * to by m, turn accounting off. Done when m is about to die.
- */
-void acct_auto_close_mnt(struct vfsmount *m)
-{
-	struct bsd_acct_struct *acct;
-
-	spin_lock(&acct_lock);
-restart:
-	list_for_each_entry(acct, &acct_list, list)
-		if (acct->file && acct->file->f_path.mnt == m) {
-			acct_file_reopen(acct, NULL, NULL);
-			goto restart;
-		}
-	spin_unlock(&acct_lock);
-}
-
-/**
- * acct_auto_close - turn off a filesystem's accounting if it is on
- * @sb: super block for the filesystem
- *
- * If the accounting is turned on for a file in the filesystem pointed
- * to by sb, turn accounting off.
- */
-void acct_auto_close(struct super_block *sb)
-{
-	struct bsd_acct_struct *acct;
-
-	spin_lock(&acct_lock);
-restart:
-	list_for_each_entry(acct, &acct_list, list)
-		if (acct->file && acct->file->f_path.dentry->d_sb == sb) {
-			acct_file_reopen(acct, NULL, NULL);
-			goto restart;
-		}
-	spin_unlock(&acct_lock);
-}
-
 void acct_exit_ns(struct pid_namespace *ns)
 {
-	struct bsd_acct_struct *acct = ns->bacct;
-
-	if (acct == NULL)
-		return;
-
-	spin_lock(&acct_lock);
-	if (acct->file != NULL)
-		acct_file_reopen(acct, NULL, NULL);
-	spin_unlock(&acct_lock);
-
-	kfree(acct);
+	acct_kill(acct_get(ns), NULL);
 }
 
 /*
···
  * do_exit() or when switching to a different output file.
  */
 
-/*
- * do_acct_process does all actual work. Caller holds the reference to file.
- */
-static void do_acct_process(struct bsd_acct_struct *acct,
-		struct pid_namespace *ns, struct file *file)
+static void fill_ac(acct_t *ac)
 {
 	struct pacct_struct *pacct = &current->signal->pacct;
-	acct_t ac;
-	mm_segment_t fs;
-	unsigned long flim;
 	u64 elapsed, run_time;
 	struct tty_struct *tty;
-	const struct cred *orig_cred;
-
-	/* Perform file operations on behalf of whoever enabled accounting */
-	orig_cred = override_creds(file->f_cred);
-
-	/*
-	 * First check to see if there is enough free_space to continue
-	 * the process accounting system.
-	 */
-	if (!check_free_space(acct, file))
-		goto out;
 
 	/*
 	 * Fill the accounting struct with the needed info as recorded
 	 * by the different kernel functions.
 	 */
-	memset(&ac, 0, sizeof(acct_t));
+	memset(ac, 0, sizeof(acct_t));
 
-	ac.ac_version = ACCT_VERSION | ACCT_BYTEORDER;
-	strlcpy(ac.ac_comm, current->comm, sizeof(ac.ac_comm));
+	ac->ac_version = ACCT_VERSION | ACCT_BYTEORDER;
+	strlcpy(ac->ac_comm, current->comm, sizeof(ac->ac_comm));
 
 	/* calculate run_time in nsec*/
 	run_time = ktime_get_ns();
···
 	/* convert nsec -> AHZ */
 	elapsed = nsec_to_AHZ(run_time);
 #if ACCT_VERSION == 3
-	ac.ac_etime = encode_float(elapsed);
+	ac->ac_etime = encode_float(elapsed);
 #else
-	ac.ac_etime = encode_comp_t(elapsed < (unsigned long) -1l ?
+	ac->ac_etime = encode_comp_t(elapsed < (unsigned long) -1l ?
 				(unsigned long) elapsed : (unsigned long) -1l);
 #endif
 #if ACCT_VERSION == 1 || ACCT_VERSION == 2
···
 		/* new enlarged etime field */
 		comp2_t etime = encode_comp2_t(elapsed);
 
-		ac.ac_etime_hi = etime >> 16;
-		ac.ac_etime_lo = (u16) etime;
+		ac->ac_etime_hi = etime >> 16;
+		ac->ac_etime_lo = (u16) etime;
 	}
 #endif
 	do_div(elapsed, AHZ);
-	ac.ac_btime = get_seconds() - elapsed;
+	ac->ac_btime = get_seconds() - elapsed;
+#if ACCT_VERSION==2
+	ac->ac_ahz = AHZ;
+#endif
+
+	spin_lock_irq(&current->sighand->siglock);
+	tty = current->signal->tty; /* Safe as we hold the siglock */
+	ac->ac_tty = tty ? old_encode_dev(tty_devnum(tty)) : 0;
+	ac->ac_utime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_utime)));
+	ac->ac_stime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_stime)));
+	ac->ac_flag = pacct->ac_flag;
+	ac->ac_mem = encode_comp_t(pacct->ac_mem);
+	ac->ac_minflt = encode_comp_t(pacct->ac_minflt);
+	ac->ac_majflt = encode_comp_t(pacct->ac_majflt);
+	ac->ac_exitcode = pacct->ac_exitcode;
+	spin_unlock_irq(&current->sighand->siglock);
+}
+/*
+ * do_acct_process does all actual work. Caller holds the reference to file.
+ */
+static void do_acct_process(struct bsd_acct_struct *acct)
+{
+	acct_t ac;
+	unsigned long flim;
+	const struct cred *orig_cred;
+	struct pid_namespace *ns = acct->ns;
+	struct file *file = acct->file;
+
+	/*
+	 * Accounting records are not subject to resource limits.
+	 */
+	flim = current->signal->rlim[RLIMIT_FSIZE].rlim_cur;
+	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
+	/* Perform file operations on behalf of whoever enabled accounting */
+	orig_cred = override_creds(file->f_cred);
+
+	/*
+	 * First check to see if there is enough free_space to continue
+	 * the process accounting system.
+	 */
+	if (!check_free_space(acct))
+		goto out;
+
+	fill_ac(&ac);
 	/* we really need to bite the bullet and change layout */
 	ac.ac_uid = from_kuid_munged(file->f_cred->user_ns, orig_cred->uid);
 	ac.ac_gid = from_kgid_munged(file->f_cred->user_ns, orig_cred->gid);
-#if ACCT_VERSION == 2
-	ac.ac_ahz = AHZ;
-#endif
 #if ACCT_VERSION == 1 || ACCT_VERSION == 2
 	/* backward-compatible 16 bit fields */
 	ac.ac_uid16 = ac.ac_uid;
···
 	ac.ac_ppid = task_tgid_nr_ns(rcu_dereference(current->real_parent), ns);
 	rcu_read_unlock();
 #endif
-
-	spin_lock_irq(&current->sighand->siglock);
-	tty = current->signal->tty; /* Safe as we hold the siglock */
-	ac.ac_tty = tty ? old_encode_dev(tty_devnum(tty)) : 0;
-	ac.ac_utime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_utime)));
-	ac.ac_stime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_stime)));
-	ac.ac_flag = pacct->ac_flag;
-	ac.ac_mem = encode_comp_t(pacct->ac_mem);
-	ac.ac_minflt = encode_comp_t(pacct->ac_minflt);
-	ac.ac_majflt = encode_comp_t(pacct->ac_majflt);
-	ac.ac_exitcode = pacct->ac_exitcode;
-	spin_unlock_irq(&current->sighand->siglock);
-	ac.ac_io = encode_comp_t(0 /* current->io_usage */); /* %% */
-	ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
-	ac.ac_swaps = encode_comp_t(0);
-
 	/*
 	 * Get freeze protection. If the fs is frozen, just skip the write
 	 * as we could deadlock the system otherwise.
 	 */
-	if (!file_start_write_trylock(file))
-		goto out;
-	/*
-	 * Kernel segment override to datasegment and write it
-	 * to the accounting file.
-	 */
-	fs = get_fs();
-	set_fs(KERNEL_DS);
-	/*
-	 * Accounting records are not subject to resource limits.
-	 */
-	flim = current->signal->rlim[RLIMIT_FSIZE].rlim_cur;
-	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
-	file->f_op->write(file, (char *)&ac,
-			       sizeof(acct_t), &file->f_pos);
-	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim;
-	set_fs(fs);
-	file_end_write(file);
+	if (file_start_write_trylock(file)) {
+		/* it's been opened O_APPEND, so position is irrelevant */
+		loff_t pos = 0;
+		__kernel_write(file, (char *)&ac, sizeof(acct_t), &pos);
+		file_end_write(file);
+	}
 out:
+	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim;
 	revert_creds(orig_cred);
 }
 
···
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
-static void acct_process_in_ns(struct pid_namespace *ns)
+static void slow_acct_process(struct pid_namespace *ns)
 {
-	struct file *file = NULL;
-	struct bsd_acct_struct *acct;
-
-	acct = ns->bacct;
-	/*
-	 * accelerate the common fastpath:
-	 */
-	if (!acct || !acct->file)
-		return;
-
-	spin_lock(&acct_lock);
-	file = acct->file;
-	if (unlikely(!file)) {
-		spin_unlock(&acct_lock);
-		return;
+	for ( ; ns; ns = ns->parent) {
+		struct bsd_acct_struct *acct = acct_get(ns);
+		if (acct) {
+			do_acct_process(acct);
+			mutex_unlock(&acct->lock);
+			pin_put(&acct->pin);
+		}
 	}
-	get_file(file);
-	spin_unlock(&acct_lock);
-
-	do_acct_process(acct, ns, file);
-	fput(file);
 }
 
 /**
- * acct_process - now just a wrapper around acct_process_in_ns,
- * which in turn is a wrapper around do_acct_process.
+ * acct_process
  *
  * handles process accounting for an exiting task
  */
···
 	 * alive and holds its namespace, which in turn holds
 	 * its parent.
 	 */
-	for (ns = task_active_pid_ns(current); ns != NULL; ns = ns->parent)
-		acct_process_in_ns(ns);
+	for (ns = task_active_pid_ns(current); ns != NULL; ns = ns->parent) {
+		if (ns->bacct)
+			break;
+	}
+	if (unlikely(ns))
+		slow_acct_process(ns);
 }
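Note on the reworked check_free_space(): the new body only evaluates the threshold that matters for the current state. While accounting is active it compares free blocks against the suspend threshold; while paused, against the resume threshold. Between the two thresholds the state is left alone. The sketch below models just that decision in user space. next_active is a hypothetical helper, and suspend_pct and resume_pct stand in for the SUSPEND and RESUME constants, whose values are not visible in this diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space sketch of the suspend/resume decision in check_free_space().
 * Returns the new value of the "active" flag. The percentages are supplied
 * by the caller because SUSPEND and RESUME are not shown in the diff.
 */
static int next_active(int active, uint64_t f_blocks, uint64_t f_bavail,
		       unsigned suspend_pct, unsigned resume_pct)
{
	assert(suspend_pct <= 100 && resume_pct <= 100);
	if (active) {
		uint64_t suspend = f_blocks * suspend_pct / 100;
		if (f_bavail <= suspend)
			return 0;	/* would log "Process accounting paused" */
	} else {
		uint64_t resume = f_blocks * resume_pct / 100;
		if (f_bavail >= resume)
			return 1;	/* would log "Process accounting resumed" */
	}
	return active;
}
```

With illustrative thresholds of 2 and 4 percent on a 1000-block filesystem, an active instance pauses at 20 free blocks or fewer, and a paused one resumes only once 40 or more are free.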
+1 -1
mm/filemap.c
···
 		 * that this differs from normal direct-io semantics, which
 		 * will return -EFOO even if some bytes were written.
 		 */
-		if (unlikely(status < 0) && !written) {
+		if (unlikely(status < 0)) {
 			err = status;
 			goto out;
 		}
+8 -9
mm/iov_iter.c
···
 EXPORT_SYMBOL(iov_iter_init);
 
 static ssize_t get_pages_iovec(struct iov_iter *i,
-		   struct page **pages, size_t maxsize,
+		   struct page **pages, unsigned maxpages,
 		   size_t *start)
 {
 	size_t offset = i->iov_offset;
···
 	len = iov->iov_len - offset;
 	if (len > i->count)
 		len = i->count;
-	if (len > maxsize)
-		len = maxsize;
 	addr = (unsigned long)iov->iov_base + offset;
 	len += *start = addr & (PAGE_SIZE - 1);
+	if (len > maxpages * PAGE_SIZE)
+		len = maxpages * PAGE_SIZE;
 	addr &= ~(PAGE_SIZE - 1);
 	n = (len + PAGE_SIZE - 1) / PAGE_SIZE;
 	res = get_user_pages_fast(addr, n, (i->type & WRITE) != WRITE, pages);
···
 }
 
 static ssize_t get_pages_bvec(struct iov_iter *i,
-		   struct page **pages, size_t maxsize,
+		   struct page **pages, unsigned maxpages,
 		   size_t *start)
 {
 	const struct bio_vec *bvec = i->bvec;
 	size_t len = bvec->bv_len - i->iov_offset;
 	if (len > i->count)
 		len = i->count;
-	if (len > maxsize)
-		len = maxsize;
+	/* can't be more than PAGE_SIZE */
 	*start = bvec->bv_offset + i->iov_offset;
 
 	get_page(*pages = bvec->bv_page);
···
 EXPORT_SYMBOL(iov_iter_alignment);
 
 ssize_t iov_iter_get_pages(struct iov_iter *i,
-		   struct page **pages, size_t maxsize,
+		   struct page **pages, unsigned maxpages,
 		   size_t *start)
 {
 	if (i->type & ITER_BVEC)
-		return get_pages_bvec(i, pages, maxsize, start);
+		return get_pages_bvec(i, pages, maxpages, start);
 	else
-		return get_pages_iovec(i, pages, maxsize, start);
+		return get_pages_iovec(i, pages, maxpages, start);
 }
 EXPORT_SYMBOL(iov_iter_get_pages);
 
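Note on the get_pages_iovec() change: the old code capped len at maxsize bytes before adding the in-page offset, so the page count n computed afterwards could come out one page higher than maxsize / PAGE_SIZE when the buffer did not start on a page boundary. The new code adds the offset first and then caps the total at maxpages pages. The sketch below reproduces only that arithmetic in user space; pages_to_pin and SKETCH_PAGE_SIZE are hypothetical names, and the 4096-byte page size is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/*
 * Mirrors the arithmetic of the new get_pages_iovec(): the in-page offset
 * is added to len first, and only then is the total capped at maxpages
 * pages. Returns the number of pages to pin; *start gets the offset.
 */
static size_t pages_to_pin(unsigned long addr, size_t len,
			   unsigned maxpages, size_t *start)
{
	assert(start != NULL);
	*start = addr & (SKETCH_PAGE_SIZE - 1);
	len += *start;
	if (len > (size_t)maxpages * SKETCH_PAGE_SIZE)
		len = (size_t)maxpages * SKETCH_PAGE_SIZE;
	return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Because the cap is applied after the offset is folded in, the result can never exceed maxpages, which is what a caller sizing its pages array by page count needs.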
+30 -2
mm/shmem.c
···
 	return shmem_unlink(dir, dentry);
 }
 
+static int shmem_exchange(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry)
+{
+	bool old_is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+	bool new_is_dir = S_ISDIR(new_dentry->d_inode->i_mode);
+
+	if (old_dir != new_dir && old_is_dir != new_is_dir) {
+		if (old_is_dir) {
+			drop_nlink(old_dir);
+			inc_nlink(new_dir);
+		} else {
+			drop_nlink(new_dir);
+			inc_nlink(old_dir);
+		}
+	}
+	old_dir->i_ctime = old_dir->i_mtime =
+	new_dir->i_ctime = new_dir->i_mtime =
+	old_dentry->d_inode->i_ctime =
+	new_dentry->d_inode->i_ctime = CURRENT_TIME;
+
+	return 0;
+}
+
 /*
  * The VFS layer already does all the dentry stuff for rename,
  * we just have to decrement the usage count for the target if
  * it exists so that the VFS layer correctly free's it when it
  * gets overwritten.
  */
-static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry)
+static int shmem_rename2(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *inode = old_dentry->d_inode;
 	int they_are_dirs = S_ISDIR(inode->i_mode);
+
+	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
+		return -EINVAL;
+
+	if (flags & RENAME_EXCHANGE)
+		return shmem_exchange(old_dir, old_dentry, new_dir, new_dentry);
 
 	if (!simple_empty(new_dentry))
 		return -ENOTEMPTY;
···
 	.mkdir = shmem_mkdir,
 	.rmdir = shmem_rmdir,
 	.mknod = shmem_mknod,
-	.rename = shmem_rename,
+	.rename2 = shmem_rename2,
 	.tmpfile = shmem_tmpfile,
 #endif
 #ifdef CONFIG_TMPFS_XATTR
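Note on the link counts in shmem_exchange(): a subdirectory's ".." entry counts as a link to its parent, so swapping a directory with a non-directory across two different parents moves one link from one parent to the other. When both entries are directories, or neither is, or both share a parent, the counts stay as they are. A user-space sketch of just that bookkeeping, with the hypothetical names dir_sketch and exchange_nlink standing in for the inode and the drop_nlink()/inc_nlink() calls:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the link count of a directory inode. */
struct dir_sketch {
	unsigned nlink;
};

/*
 * Mirrors the link-count bookkeeping in shmem_exchange(): counts change
 * only when the entries live in different directories and exactly one of
 * the two entries is itself a directory.
 */
static void exchange_nlink(struct dir_sketch *old_dir, bool old_is_dir,
			   struct dir_sketch *new_dir, bool new_is_dir)
{
	assert(old_dir != NULL && new_dir != NULL);
	if (old_dir != new_dir && old_is_dir != new_is_dir) {
		if (old_is_dir) {
			old_dir->nlink--;	/* directory leaves old_dir */
			new_dir->nlink++;	/* and lands in new_dir */
		} else {
			new_dir->nlink--;	/* directory leaves new_dir */
			old_dir->nlink++;	/* and lands in old_dir */
		}
	}
}
```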