
Merge tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

- An iopen glock locking scheme rework that speeds up deletes of inodes
accessed from multiple nodes

- Various bug fixes and debugging improvements

- Convert gfs2-glocks.txt to ReST

* tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: fix use-after-free on transaction ail lists
gfs2: new slab for transactions
gfs2: initialize transaction tr_ailX_lists earlier
gfs2: Smarter iopen glock waiting
gfs2: Wake up when setting GLF_DEMOTE
gfs2: Check inode generation number in delete_work_func
gfs2: Move inode generation number check into gfs2_inode_lookup
gfs2: Minor gfs2_lookup_by_inum cleanup
gfs2: Try harder to delete inodes locally
gfs2: Give up the iopen glock on contention
gfs2: Turn gl_delete into a delayed work
gfs2: Keep track of deleted inode generations in LVBs
gfs2: Allow ASPACE glocks to also have an lvb
gfs2: instrumentation wrt log_flush stuck
gfs2: introduce new gfs2_glock_assert_withdraw
gfs2: print mapping->nrpages in glock dump for address space glocks
gfs2: Only do glock put in gfs2_create_inode for free inodes
gfs2: Allow lock_nolock mount to specify jid=X
gfs2: Don't ignore inode write errors during inode_go_sync
docs: filesystems: convert gfs2-glocks.txt to ReST
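The "Keep track of deleted inode generations in LVBs" commit is the heart of the rework: a node that deletes an inode records the inode's generation number in the lock value block (LVB) of the iopen glock, so other nodes can recognize stale lookups without taking the lock exclusively. The following user-space sketch mirrors the logic of `gfs2_inode_remember_delete()` and `gfs2_inode_already_deleted()` from the diff below; it omits the `cpu_to_be32`/`be64_to_cpu` endianness conversions the kernel performs on the on-wire LVB, and the struct layout is a simplified stand-in:

```c
#include <assert.h>
#include <stdint.h>

#define GFS2_MAGIC 0x01161970  /* on-disk magic number from gfs2_ondisk.h */

/* Simplified stand-in for the iopen glock's lock value block payload. */
struct inode_lvb {
	uint32_t magic;
	uint64_t generation_deleted;
};

/* Mirror of gfs2_inode_remember_delete(): initialize the magic on first
 * use, then record the generation of the inode being deleted. */
static void remember_delete(struct inode_lvb *lvb, uint64_t generation)
{
	if (lvb->magic == 0)
		lvb->magic = GFS2_MAGIC;
	if (lvb->magic == GFS2_MAGIC)
		lvb->generation_deleted = generation;
}

/* Mirror of gfs2_inode_already_deleted(): a lookup whose generation is
 * at or below the recorded value refers to an inode some node already
 * deleted, so the caller should return -ESTALE. */
static int already_deleted(const struct inode_lvb *lvb, uint64_t generation)
{
	if (lvb->magic != GFS2_MAGIC)
		return 0;
	return generation <= lvb->generation_deleted;
}
```

Generations only ever increase as a dinode block is reused, which is why a single "highest deleted generation" value in the LVB is enough.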

+489 -141
+84 -63
Documentation/filesystems/gfs2-glocks.txt Documentation/filesystems/gfs2-glocks.rst
··· 1 - Glock internal locking rules 2 - ------------------------------ 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ============================ 4 + Glock internal locking rules 5 + ============================ 3 6 4 7 This documents the basic principles of the glock state machine 5 8 internals. Each glock (struct gfs2_glock in fs/gfs2/incore.h) ··· 27 24 namely shared (SH), deferred (DF) and exclusive (EX). Those translate 28 25 to the following DLM lock modes: 29 26 30 - Glock mode | DLM lock mode 31 - ------------------------------ 32 - UN | IV/NL Unlocked (no DLM lock associated with glock) or NL 33 - SH | PR (Protected read) 34 - DF | CW (Concurrent write) 35 - EX | EX (Exclusive) 27 + ========== ====== ===================================================== 28 + Glock mode DLM lock mode 29 + ========== ====== ===================================================== 30 + UN IV/NL Unlocked (no DLM lock associated with glock) or NL 31 + SH PR (Protected read) 32 + DF CW (Concurrent write) 33 + EX EX (Exclusive) 34 + ========== ====== ===================================================== 36 35 37 36 Thus DF is basically a shared mode which is incompatible with the "normal" 38 37 shared lock mode, SH. In GFS2 the DF mode is used exclusively for direct I/O 39 38 operations. The glocks are basically a lock plus some routines which deal 40 39 with cache management. 
The following rules apply for the cache: 41 40 42 - Glock mode | Cache data | Cache Metadata | Dirty Data | Dirty Metadata 43 - -------------------------------------------------------------------------- 44 - UN | No | No | No | No 45 - SH | Yes | Yes | No | No 46 - DF | No | Yes | No | No 47 - EX | Yes | Yes | Yes | Yes 41 + ========== ========== ============== ========== ============== 42 + Glock mode Cache data Cache Metadata Dirty Data Dirty Metadata 43 + ========== ========== ============== ========== ============== 44 + UN No No No No 45 + SH Yes Yes No No 46 + DF No Yes No No 47 + EX Yes Yes Yes Yes 48 + ========== ========== ============== ========== ============== 48 49 49 50 These rules are implemented using the various glock operations which 50 51 are defined for each type of glock. Not all types of glocks use ··· 56 49 57 50 Table of glock operations and per type constants: 58 51 59 - Field | Purpose 60 - ---------------------------------------------------------------------------- 61 - go_xmote_th | Called before remote state change (e.g. to sync dirty data) 62 - go_xmote_bh | Called after remote state change (e.g. to refill cache) 63 - go_inval | Called if remote state change requires invalidating the cache 64 - go_demote_ok | Returns boolean value of whether its ok to demote a glock 65 - | (e.g. checks timeout, and that there is no cached data) 66 - go_lock | Called for the first local holder of a lock 67 - go_unlock | Called on the final local unlock of a lock 68 - go_dump | Called to print content of object for debugfs file, or on 69 - | error to dump glock to the log. 70 - go_type | The type of the glock, LM_TYPE_..... 
71 - go_callback | Called if the DLM sends a callback to drop this lock 72 - go_flags | GLOF_ASPACE is set, if the glock has an address space 73 - | associated with it 52 + ============= ============================================================= 53 + Field Purpose 54 + ============= ============================================================= 55 + go_xmote_th Called before remote state change (e.g. to sync dirty data) 56 + go_xmote_bh Called after remote state change (e.g. to refill cache) 57 + go_inval Called if remote state change requires invalidating the cache 58 + go_demote_ok Returns boolean value of whether its ok to demote a glock 59 + (e.g. checks timeout, and that there is no cached data) 60 + go_lock Called for the first local holder of a lock 61 + go_unlock Called on the final local unlock of a lock 62 + go_dump Called to print content of object for debugfs file, or on 63 + error to dump glock to the log. 64 + go_type The type of the glock, ``LM_TYPE_*`` 65 + go_callback Called if the DLM sends a callback to drop this lock 66 + go_flags GLOF_ASPACE is set, if the glock has an address space 67 + associated with it 68 + ============= ============================================================= 74 69 75 70 The minimum hold time for each lock is the time after a remote lock 76 71 grant for which we ignore remote demote requests. 
This is in order to ··· 91 82 92 83 Locking rules for glock operations: 93 84 94 - Operation | GLF_LOCK bit lock held | gl_lockref.lock spinlock held 95 - ------------------------------------------------------------------------- 96 - go_xmote_th | Yes | No 97 - go_xmote_bh | Yes | No 98 - go_inval | Yes | No 99 - go_demote_ok | Sometimes | Yes 100 - go_lock | Yes | No 101 - go_unlock | Yes | No 102 - go_dump | Sometimes | Yes 103 - go_callback | Sometimes (N/A) | Yes 85 + ============= ====================== ============================= 86 + Operation GLF_LOCK bit lock held gl_lockref.lock spinlock held 87 + ============= ====================== ============================= 88 + go_xmote_th Yes No 89 + go_xmote_bh Yes No 90 + go_inval Yes No 91 + go_demote_ok Sometimes Yes 92 + go_lock Yes No 93 + go_unlock Yes No 94 + go_dump Sometimes Yes 95 + go_callback Sometimes (N/A) Yes 96 + ============= ====================== ============================= 104 97 105 - N.B. Operations must not drop either the bit lock or the spinlock 106 - if its held on entry. go_dump and do_demote_ok must never block. 107 - Note that go_dump will only be called if the glock's state 108 - indicates that it is caching uptodate data. 98 + .. Note:: 99 + 100 + Operations must not drop either the bit lock or the spinlock 101 + if its held on entry. go_dump and do_demote_ok must never block. 102 + Note that go_dump will only be called if the glock's state 103 + indicates that it is caching uptodate data. 109 104 110 105 Glock locking order within GFS2: 111 106 ··· 117 104 2. Rename glock (for rename only) 118 105 3. Inode glock(s) 119 106 (Parents before children, inodes at "same level" with same parent in 120 - lock number order) 107 + lock number order) 121 108 4. Rgrp glock(s) (for (de)allocation operations) 122 109 5. Transaction glock (via gfs2_trans_begin) for non-read operations 123 110 6. i_rw_mutex (if required) ··· 130 117 is on a per-inode basis. 
Locking of rgrps is on a per rgrp basis. 131 118 In general we prefer to lock local locks prior to cluster locks. 132 119 133 - Glock Statistics 134 - ------------------ 120 + Glock Statistics 121 + ---------------- 135 122 136 123 The stats are divided into two sets: those relating to the 137 124 super block and those relating to an individual glock. The ··· 186 173 1. To be able to better set the glock "min hold time" 187 174 2. To spot performance issues more easily 188 175 3. To improve the algorithm for selecting resource groups for 189 - allocation (to base it on lock wait time, rather than blindly 190 - using a "try lock") 176 + allocation (to base it on lock wait time, rather than blindly 177 + using a "try lock") 191 178 192 179 Due to the smoothing action of the updates, a step change in 193 180 some input quantity being sampled will only fully be taken ··· 208 195 measuring system, but I hope this is as accurate as we 209 196 can reasonably make it. 210 197 211 - Per sb stats can be found here: 212 - /sys/kernel/debug/gfs2/<fsname>/sbstats 213 - Per glock stats can be found here: 214 - /sys/kernel/debug/gfs2/<fsname>/glstats 198 + Per sb stats can be found here:: 199 + 200 + /sys/kernel/debug/gfs2/<fsname>/sbstats 201 + 202 + Per glock stats can be found here:: 203 + 204 + /sys/kernel/debug/gfs2/<fsname>/glstats 215 205 216 206 Assuming that debugfs is mounted on /sys/kernel/debug and also 217 207 that <fsname> is replaced with the name of the gfs2 filesystem ··· 222 206 223 207 The abbreviations used in the output as are follows: 224 208 225 - srtt - Smoothed round trip time for non-blocking dlm requests 226 - srttvar - Variance estimate for srtt 227 - srttb - Smoothed round trip time for (potentially) blocking dlm requests 228 - srttvarb - Variance estimate for srttb 229 - sirt - Smoothed inter-request time (for dlm requests) 230 - sirtvar - Variance estimate for sirt 231 - dlm - Number of dlm requests made (dcnt in glstats file) 232 - queue - Number 
of glock requests queued (qcnt in glstats file) 209 + ========= ================================================================ 210 + srtt Smoothed round trip time for non blocking dlm requests 211 + srttvar Variance estimate for srtt 212 + srttb Smoothed round trip time for (potentially) blocking dlm requests 213 + srttvarb Variance estimate for srttb 214 + sirt Smoothed inter request time (for dlm requests) 215 + sirtvar Variance estimate for sirt 216 + dlm Number of dlm requests made (dcnt in glstats file) 217 + queue Number of glock requests queued (qcnt in glstats file) 218 + ========= ================================================================ 233 219 234 220 The sbstats file contains a set of these stats for each glock type (so 8 lines 235 221 for each type) and for each cpu (one column per cpu). The glstats file contains ··· 242 224 for the glock in question, along with some addition information on each dlm 243 225 reply that is received: 244 226 245 - status - The status of the dlm request 246 - flags - The dlm request flags 247 - tdiff - The time taken by this specific request 227 + ====== ======================================= 228 + status The status of the dlm request 229 + flags The dlm request flags 230 + tdiff The time taken by this specific request 231 + ====== ======================================= 232 + 248 233 (remaining fields as per above list) 249 234 250 235
+1
Documentation/filesystems/index.rst
··· 88 88 f2fs 89 89 gfs2 90 90 gfs2-uevents 91 + gfs2-glocks 91 92 hfs 92 93 hfsplus 93 94 hpfs
+1 -1
MAINTAINERS
··· 7251 7251 S: Supported 7252 7252 W: http://sources.redhat.com/cluster/ 7253 7253 T: git git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 7254 - F: Documentation/filesystems/gfs2*.txt 7254 + F: Documentation/filesystems/gfs2* 7255 7255 F: fs/gfs2/ 7256 7256 F: include/uapi/linux/gfs2_ondisk.h 7257 7257
+3 -1
fs/gfs2/export.c
··· 134 134 struct gfs2_sbd *sdp = sb->s_fs_info; 135 135 struct inode *inode; 136 136 137 - inode = gfs2_lookup_by_inum(sdp, inum->no_addr, &inum->no_formal_ino, 137 + if (!inum->no_formal_ino) 138 + return ERR_PTR(-ESTALE); 139 + inode = gfs2_lookup_by_inum(sdp, inum->no_addr, inum->no_formal_ino, 138 140 GFS2_BLKST_DINODE); 139 141 if (IS_ERR(inode)) 140 142 return ERR_CAST(inode);
+185 -23
fs/gfs2/glock.c
··· 125 125 { 126 126 struct gfs2_glock *gl = container_of(rcu, struct gfs2_glock, gl_rcu); 127 127 128 - if (gl->gl_ops->go_flags & GLOF_ASPACE) { 128 + kfree(gl->gl_lksb.sb_lvbptr); 129 + if (gl->gl_ops->go_flags & GLOF_ASPACE) 129 130 kmem_cache_free(gfs2_glock_aspace_cachep, gl); 130 - } else { 131 - kfree(gl->gl_lksb.sb_lvbptr); 131 + else 132 132 kmem_cache_free(gfs2_glock_cachep, gl); 133 - } 134 133 } 135 134 136 135 /** ··· 163 164 { 164 165 struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; 165 166 166 - BUG_ON(atomic_read(&gl->gl_revokes)); 167 + gfs2_glock_assert_withdraw(gl, atomic_read(&gl->gl_revokes) == 0); 167 168 rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms); 168 169 smp_mb(); 169 170 wake_up_glock(gl); ··· 464 465 gl->gl_tchange = jiffies; 465 466 } 466 467 468 + static void gfs2_set_demote(struct gfs2_glock *gl) 469 + { 470 + struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; 471 + 472 + set_bit(GLF_DEMOTE, &gl->gl_flags); 473 + smp_mb(); 474 + wake_up(&sdp->sd_async_glock_wait); 475 + } 476 + 467 477 static void gfs2_demote_wake(struct gfs2_glock *gl) 468 478 { 469 479 gl->gl_demote_state = LM_ST_EXCLUSIVE; ··· 634 626 */ 635 627 if ((atomic_read(&gl->gl_ail_count) != 0) && 636 628 (!cmpxchg(&sdp->sd_log_error, 0, -EIO))) { 637 - gfs2_assert_warn(sdp, !atomic_read(&gl->gl_ail_count)); 629 + gfs2_glock_assert_warn(gl, 630 + !atomic_read(&gl->gl_ail_count)); 638 631 gfs2_dump_glock(NULL, gl, true); 639 632 } 640 633 glops->go_inval(gl, target == LM_ST_DEFERRED ? 
0 : DIO_METADATA); ··· 765 756 return; 766 757 } 767 758 759 + void gfs2_inode_remember_delete(struct gfs2_glock *gl, u64 generation) 760 + { 761 + struct gfs2_inode_lvb *ri = (void *)gl->gl_lksb.sb_lvbptr; 762 + 763 + if (ri->ri_magic == 0) 764 + ri->ri_magic = cpu_to_be32(GFS2_MAGIC); 765 + if (ri->ri_magic == cpu_to_be32(GFS2_MAGIC)) 766 + ri->ri_generation_deleted = cpu_to_be64(generation); 767 + } 768 + 769 + bool gfs2_inode_already_deleted(struct gfs2_glock *gl, u64 generation) 770 + { 771 + struct gfs2_inode_lvb *ri = (void *)gl->gl_lksb.sb_lvbptr; 772 + 773 + if (ri->ri_magic != cpu_to_be32(GFS2_MAGIC)) 774 + return false; 775 + return generation <= be64_to_cpu(ri->ri_generation_deleted); 776 + } 777 + 778 + static void gfs2_glock_poke(struct gfs2_glock *gl) 779 + { 780 + int flags = LM_FLAG_TRY_1CB | LM_FLAG_ANY | GL_SKIP; 781 + struct gfs2_holder gh; 782 + int error; 783 + 784 + error = gfs2_glock_nq_init(gl, LM_ST_SHARED, flags, &gh); 785 + if (!error) 786 + gfs2_glock_dq(&gh); 787 + } 788 + 789 + static bool gfs2_try_evict(struct gfs2_glock *gl) 790 + { 791 + struct gfs2_inode *ip; 792 + bool evicted = false; 793 + 794 + /* 795 + * If there is contention on the iopen glock and we have an inode, try 796 + * to grab and release the inode so that it can be evicted. This will 797 + * allow the remote node to go ahead and delete the inode without us 798 + * having to do it, which will avoid rgrp glock thrashing. 799 + * 800 + * The remote node is likely still holding the corresponding inode 801 + * glock, so it will run before we get to verify that the delete has 802 + * happened below. 
803 + */ 804 + spin_lock(&gl->gl_lockref.lock); 805 + ip = gl->gl_object; 806 + if (ip && !igrab(&ip->i_inode)) 807 + ip = NULL; 808 + spin_unlock(&gl->gl_lockref.lock); 809 + if (ip) { 810 + struct gfs2_glock *inode_gl = NULL; 811 + 812 + gl->gl_no_formal_ino = ip->i_no_formal_ino; 813 + set_bit(GIF_DEFERRED_DELETE, &ip->i_flags); 814 + d_prune_aliases(&ip->i_inode); 815 + iput(&ip->i_inode); 816 + 817 + /* If the inode was evicted, gl->gl_object will now be NULL. */ 818 + spin_lock(&gl->gl_lockref.lock); 819 + ip = gl->gl_object; 820 + if (ip) { 821 + inode_gl = ip->i_gl; 822 + lockref_get(&inode_gl->gl_lockref); 823 + clear_bit(GIF_DEFERRED_DELETE, &ip->i_flags); 824 + } 825 + spin_unlock(&gl->gl_lockref.lock); 826 + if (inode_gl) { 827 + gfs2_glock_poke(inode_gl); 828 + gfs2_glock_put(inode_gl); 829 + } 830 + evicted = !ip; 831 + } 832 + return evicted; 833 + } 834 + 768 835 static void delete_work_func(struct work_struct *work) 769 836 { 770 - struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_delete); 837 + struct delayed_work *dwork = to_delayed_work(work); 838 + struct gfs2_glock *gl = container_of(dwork, struct gfs2_glock, gl_delete); 771 839 struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; 772 840 struct inode *inode; 773 841 u64 no_addr = gl->gl_name.ln_number; 842 + 843 + spin_lock(&gl->gl_lockref.lock); 844 + clear_bit(GLF_PENDING_DELETE, &gl->gl_flags); 845 + spin_unlock(&gl->gl_lockref.lock); 774 846 775 847 /* If someone's using this glock to create a new dinode, the block must 776 848 have been freed by another node, then re-used, in which case our ··· 859 769 if (test_bit(GLF_INODE_CREATING, &gl->gl_flags)) 860 770 goto out; 861 771 862 - inode = gfs2_lookup_by_inum(sdp, no_addr, NULL, GFS2_BLKST_UNLINKED); 772 + if (test_bit(GLF_DEMOTE, &gl->gl_flags)) { 773 + /* 774 + * If we can evict the inode, give the remote node trying to 775 + * delete the inode some time before verifying that the delete 776 + * has happened. 
Otherwise, if we cause contention on the inode glock 777 + * immediately, the remote node will think that we still have 778 + * the inode in use, and so it will give up waiting. 779 + * 780 + * If we can't evict the inode, signal to the remote node that 781 + * the inode is still in use. We'll later try to delete the 782 + * inode locally in gfs2_evict_inode. 783 + * 784 + * FIXME: We only need to verify that the remote node has 785 + * deleted the inode because nodes before this remote delete 786 + * rework won't cooperate. At a later time, when we no longer 787 + * care about compatibility with such nodes, we can skip this 788 + * step entirely. 789 + */ 790 + if (gfs2_try_evict(gl)) { 791 + if (gfs2_queue_delete_work(gl, 5 * HZ)) 792 + return; 793 + } 794 + goto out; 795 + } 796 + 797 + inode = gfs2_lookup_by_inum(sdp, no_addr, gl->gl_no_formal_ino, 798 + GFS2_BLKST_UNLINKED); 863 799 if (!IS_ERR_OR_NULL(inode)) { 864 800 d_prune_aliases(inode); 865 801 iput(inode); ··· 916 800 917 801 if (!delay) { 918 802 clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags); 919 - set_bit(GLF_DEMOTE, &gl->gl_flags); 803 + gfs2_set_demote(gl); 920 804 } 921 805 } 922 806 run_queue(gl, 0); ··· 1047 931 gl->gl_object = NULL; 1048 932 gl->gl_hold_time = GL_GLOCK_DFT_HOLD; 1049 933 INIT_DELAYED_WORK(&gl->gl_work, glock_work_func); 1050 - INIT_WORK(&gl->gl_delete, delete_work_func); 934 + INIT_DELAYED_WORK(&gl->gl_delete, delete_work_func); 1051 935 1052 936 mapping = gfs2_glock2aspace(gl); 1053 937 if (mapping) { ··· 1261 1145 static void handle_callback(struct gfs2_glock *gl, unsigned int state, 1262 1146 unsigned long delay, bool remote) 1263 1147 { 1264 - int bit = delay ? 
GLF_PENDING_DEMOTE : GLF_DEMOTE; 1265 - 1266 - set_bit(bit, &gl->gl_flags); 1148 + if (delay) 1149 + set_bit(GLF_PENDING_DEMOTE, &gl->gl_flags); 1150 + else 1151 + gfs2_set_demote(gl); 1267 1152 if (gl->gl_demote_state == LM_ST_EXCLUSIVE) { 1268 1153 gl->gl_demote_state = state; 1269 1154 gl->gl_demote_time = jiffies; ··· 1871 1754 rhashtable_walk_exit(&iter); 1872 1755 } 1873 1756 1757 + bool gfs2_queue_delete_work(struct gfs2_glock *gl, unsigned long delay) 1758 + { 1759 + bool queued; 1760 + 1761 + spin_lock(&gl->gl_lockref.lock); 1762 + queued = queue_delayed_work(gfs2_delete_workqueue, 1763 + &gl->gl_delete, delay); 1764 + if (queued) 1765 + set_bit(GLF_PENDING_DELETE, &gl->gl_flags); 1766 + spin_unlock(&gl->gl_lockref.lock); 1767 + return queued; 1768 + } 1769 + 1770 + void gfs2_cancel_delete_work(struct gfs2_glock *gl) 1771 + { 1772 + if (cancel_delayed_work_sync(&gl->gl_delete)) { 1773 + clear_bit(GLF_PENDING_DELETE, &gl->gl_flags); 1774 + gfs2_glock_put(gl); 1775 + } 1776 + } 1777 + 1778 + bool gfs2_delete_work_queued(const struct gfs2_glock *gl) 1779 + { 1780 + return test_bit(GLF_PENDING_DELETE, &gl->gl_flags); 1781 + } 1782 + 1783 + static void flush_delete_work(struct gfs2_glock *gl) 1784 + { 1785 + flush_delayed_work(&gl->gl_delete); 1786 + gfs2_glock_queue_work(gl, 0); 1787 + } 1788 + 1789 + void gfs2_flush_delete_work(struct gfs2_sbd *sdp) 1790 + { 1791 + glock_hash_walk(flush_delete_work, sdp); 1792 + flush_workqueue(gfs2_delete_workqueue); 1793 + } 1794 + 1874 1795 /** 1875 1796 * thaw_glock - thaw out a glock which has an unprocessed reply waiting 1876 1797 * @gl: The glock to thaw ··· 1991 1836 int ret; 1992 1837 1993 1838 ret = gfs2_truncatei_resume(ip); 1994 - gfs2_assert_withdraw(gl->gl_name.ln_sbd, ret == 0); 1839 + gfs2_glock_assert_withdraw(gl, ret == 0); 1995 1840 1996 1841 spin_lock(&gl->gl_lockref.lock); 1997 1842 clear_bit(GLF_LOCK, &gl->gl_flags); ··· 2133 1978 char gflags_buf[32]; 2134 1979 struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; 
2135 1980 char fs_id_buf[sizeof(sdp->sd_fsname) + 7]; 1981 + unsigned long nrpages = 0; 2136 1982 1983 + if (gl->gl_ops->go_flags & GLOF_ASPACE) { 1984 + struct address_space *mapping = gfs2_glock2aspace(gl); 1985 + 1986 + nrpages = mapping->nrpages; 1987 + } 2137 1988 memset(fs_id_buf, 0, sizeof(fs_id_buf)); 2138 1989 if (fsid && sdp) /* safety precaution */ 2139 1990 sprintf(fs_id_buf, "fsid=%s: ", sdp->sd_fsname); ··· 2148 1987 if (!test_bit(GLF_DEMOTE, &gl->gl_flags)) 2149 1988 dtime = 0; 2150 1989 gfs2_print_dbg(seq, "%sG: s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d " 2151 - "v:%d r:%d m:%ld\n", fs_id_buf, state2str(gl->gl_state), 2152 - gl->gl_name.ln_type, 2153 - (unsigned long long)gl->gl_name.ln_number, 2154 - gflags2str(gflags_buf, gl), 2155 - state2str(gl->gl_target), 2156 - state2str(gl->gl_demote_state), dtime, 2157 - atomic_read(&gl->gl_ail_count), 2158 - atomic_read(&gl->gl_revokes), 2159 - (int)gl->gl_lockref.count, gl->gl_hold_time); 1990 + "v:%d r:%d m:%ld p:%lu\n", 1991 + fs_id_buf, state2str(gl->gl_state), 1992 + gl->gl_name.ln_type, 1993 + (unsigned long long)gl->gl_name.ln_number, 1994 + gflags2str(gflags_buf, gl), 1995 + state2str(gl->gl_target), 1996 + state2str(gl->gl_demote_state), dtime, 1997 + atomic_read(&gl->gl_ail_count), 1998 + atomic_read(&gl->gl_revokes), 1999 + (int)gl->gl_lockref.count, gl->gl_hold_time, nrpages); 2160 2000 2161 2001 list_for_each_entry(gh, &gl->gl_holders, gh_list) 2162 2002 dump_holder(seq, gh, fs_id_buf);
+16
fs/gfs2/glock.h
··· 205 205 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { \ 206 206 gfs2_dump_glock(NULL, gl, true); \ 207 207 BUG(); } } while(0) 208 + #define gfs2_glock_assert_warn(gl, x) do { if (unlikely(!(x))) { \ 209 + gfs2_dump_glock(NULL, gl, true); \ 210 + gfs2_assert_warn((gl)->gl_name.ln_sbd, (x)); } } \ 211 + while (0) 212 + #define gfs2_glock_assert_withdraw(gl, x) do { if (unlikely(!(x))) { \ 213 + gfs2_dump_glock(NULL, gl, true); \ 214 + gfs2_assert_withdraw((gl)->gl_name.ln_sbd, (x)); } } \ 215 + while (0) 216 + 208 217 extern __printf(2, 3) 209 218 void gfs2_print_dbg(struct seq_file *seq, const char *fmt, ...); 210 219 ··· 244 235 245 236 extern void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state); 246 237 extern void gfs2_glock_complete(struct gfs2_glock *gl, int ret); 238 + extern bool gfs2_queue_delete_work(struct gfs2_glock *gl, unsigned long delay); 239 + extern void gfs2_cancel_delete_work(struct gfs2_glock *gl); 240 + extern bool gfs2_delete_work_queued(const struct gfs2_glock *gl); 241 + extern void gfs2_flush_delete_work(struct gfs2_sbd *sdp); 247 242 extern void gfs2_gl_hash_clear(struct gfs2_sbd *sdp); 248 243 extern void gfs2_glock_finish_truncate(struct gfs2_inode *ip); 249 244 extern void gfs2_glock_thaw(struct gfs2_sbd *sdp); ··· 318 305 gl->gl_object = NULL; 319 306 spin_unlock(&gl->gl_lockref.lock); 320 307 } 308 + 309 + extern void gfs2_inode_remember_delete(struct gfs2_glock *gl, u64 generation); 310 + extern bool gfs2_inode_already_deleted(struct gfs2_glock *gl, u64 generation); 321 311 322 312 #endif /* __GLOCK_DOT_H__ */
+16 -5
fs/gfs2/glops.c
··· 91 91 memset(&tr, 0, sizeof(tr)); 92 92 INIT_LIST_HEAD(&tr.tr_buf); 93 93 INIT_LIST_HEAD(&tr.tr_databuf); 94 + INIT_LIST_HEAD(&tr.tr_ail1_list); 95 + INIT_LIST_HEAD(&tr.tr_ail2_list); 94 96 tr.tr_revokes = atomic_read(&gl->gl_ail_count); 95 97 96 98 if (!tr.tr_revokes) { ··· 270 268 struct gfs2_inode *ip = gfs2_glock2inode(gl); 271 269 int isreg = ip && S_ISREG(ip->i_inode.i_mode); 272 270 struct address_space *metamapping = gfs2_glock2aspace(gl); 273 - int error = 0; 271 + int error = 0, ret; 274 272 275 273 if (isreg) { 276 274 if (test_and_clear_bit(GIF_SW_PAGED, &ip->i_flags)) ··· 291 289 error = filemap_fdatawait(mapping); 292 290 mapping_set_error(mapping, error); 293 291 } 294 - error = filemap_fdatawait(metamapping); 295 - mapping_set_error(metamapping, error); 292 + ret = filemap_fdatawait(metamapping); 293 + mapping_set_error(metamapping, ret); 294 + if (!error) 295 + error = ret; 296 296 gfs2_ail_empty_gl(gl); 297 297 /* 298 298 * Writeback of the data mapping may cause the dirty flag to be set ··· 612 608 if (gl->gl_demote_state == LM_ST_UNLOCKED && 613 609 gl->gl_state == LM_ST_SHARED && ip) { 614 610 gl->gl_lockref.count++; 615 - if (queue_work(gfs2_delete_workqueue, &gl->gl_delete) == 0) 611 + if (!queue_delayed_work(gfs2_delete_workqueue, 612 + &gl->gl_delete, 0)) 616 613 gl->gl_lockref.count--; 617 614 } 615 + } 616 + 617 + static int iopen_go_demote_ok(const struct gfs2_glock *gl) 618 + { 619 + return !gfs2_delete_work_queued(gl); 618 620 } 619 621 620 622 /** ··· 702 692 .go_lock = inode_go_lock, 703 693 .go_dump = inode_go_dump, 704 694 .go_type = LM_TYPE_INODE, 705 - .go_flags = GLOF_ASPACE | GLOF_LRU, 695 + .go_flags = GLOF_ASPACE | GLOF_LRU | GLOF_LVB, 706 696 .go_free = inode_go_free, 707 697 }; 708 698 ··· 726 716 const struct gfs2_glock_operations gfs2_iopen_glops = { 727 717 .go_type = LM_TYPE_IOPEN, 728 718 .go_callback = iopen_go_callback, 719 + .go_demote_ok = iopen_go_demote_ok, 729 720 .go_flags = GLOF_LRU | GLOF_NONDISK, 730 721 
}; 731 722
+7 -2
fs/gfs2/incore.h
··· 345 345 GLF_OBJECT = 14, /* Used only for tracing */ 346 346 GLF_BLOCKING = 15, 347 347 GLF_INODE_CREATING = 16, /* Inode creation occurring */ 348 + GLF_PENDING_DELETE = 17, 348 349 GLF_FREEING = 18, /* Wait for glock to be freed */ 349 350 }; 350 351 ··· 379 378 atomic_t gl_revokes; 380 379 struct delayed_work gl_work; 381 380 union { 382 - /* For inode and iopen glocks only */ 383 - struct work_struct gl_delete; 381 + /* For iopen glocks only */ 382 + struct { 383 + struct delayed_work gl_delete; 384 + u64 gl_no_formal_ino; 385 + }; 384 386 /* For rgrp glocks only */ 385 387 struct { 386 388 loff_t start; ··· 402 398 GIF_ORDERED = 4, 403 399 GIF_FREE_VFS_INODE = 5, 404 400 GIF_GLOP_PENDING = 6, 401 + GIF_DEFERRED_DELETE = 7, 405 402 }; 406 403 407 404 struct gfs2_inode {
+37 -12
fs/gfs2/inode.c
··· 115 115 * placeholder because it doesn't otherwise make sense), the on-disk block type 116 116 * is verified to be @blktype. 117 117 * 118 + * When @no_formal_ino is non-zero, this function will return ERR_PTR(-ESTALE) 119 + * if it detects that @no_formal_ino doesn't match the actual inode generation 120 + * number. However, it doesn't always know unless @type is DT_UNKNOWN. 121 + * 118 122 * Returns: A VFS inode, or an error 119 123 */ 120 124 ··· 162 158 if (error) 163 159 goto fail; 164 160 161 + error = -ESTALE; 162 + if (no_formal_ino && 163 + gfs2_inode_already_deleted(ip->i_gl, no_formal_ino)) 164 + goto fail; 165 + 165 166 if (blktype != GFS2_BLKST_FREE) { 166 167 error = gfs2_check_blk_type(sdp, no_addr, 167 168 blktype); ··· 180 171 error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, &ip->i_iopen_gh); 181 172 if (unlikely(error)) 182 173 goto fail; 174 + gfs2_cancel_delete_work(ip->i_iopen_gh.gh_gl); 183 175 glock_set_object(ip->i_iopen_gh.gh_gl, ip); 184 176 gfs2_glock_put(io_gl); 185 177 io_gl = NULL; ··· 199 189 inode->i_mode = DT2IF(type); 200 190 } 201 191 202 - gfs2_set_iop(inode); 192 + if (gfs2_holder_initialized(&i_gh)) 193 + gfs2_glock_dq_uninit(&i_gh); 203 194 204 - unlock_new_inode(inode); 195 + gfs2_set_iop(inode); 205 196 } 206 197 207 - if (gfs2_holder_initialized(&i_gh)) 208 - gfs2_glock_dq_uninit(&i_gh); 198 + if (no_formal_ino && ip->i_no_formal_ino && 199 + no_formal_ino != ip->i_no_formal_ino) { 200 + if (inode->i_state & I_NEW) 201 + goto fail; 202 + iput(inode); 203 + return ERR_PTR(-ESTALE); 204 + } 205 + 206 + if (inode->i_state & I_NEW) 207 + unlock_new_inode(inode); 208 + 209 209 return inode; 210 210 211 211 fail: ··· 227 207 return ERR_PTR(error); 228 208 } 229 209 210 + /** 211 + * gfs2_lookup_by_inum - look up an inode by inode number 212 + * @sdp: The super block 213 + * @no_addr: The inode number 214 + * @no_formal_ino: The inode generation number (0 for any) 215 + * @blktype: Requested block type (see 
gfs2_inode_lookup) 216 + */ 230 217 struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr, 231 - u64 *no_formal_ino, unsigned int blktype) 218 + u64 no_formal_ino, unsigned int blktype) 232 219 { 233 220 struct super_block *sb = sdp->sd_vfs; 234 221 struct inode *inode; 235 222 int error; 236 223 237 - inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, 0, blktype); 224 + inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, no_formal_ino, 225 + blktype); 238 226 if (IS_ERR(inode)) 239 227 return inode; 240 228 241 - /* Two extra checks for NFS only */ 242 229 if (no_formal_ino) { 243 - error = -ESTALE; 244 - if (GFS2_I(inode)->i_no_formal_ino != *no_formal_ino) 245 - goto fail_iput; 246 - 247 230 error = -EIO; 248 231 if (GFS2_I(inode)->i_diskflags & GFS2_DIF_SYSTEM) 249 232 goto fail_iput; ··· 748 725 if (error) 749 726 goto fail_gunlock2; 750 727 728 + gfs2_cancel_delete_work(ip->i_iopen_gh.gh_gl); 751 729 glock_set_object(ip->i_iopen_gh.gh_gl, ip); 752 730 gfs2_set_iop(inode); 753 731 insert_inode_hash(inode); ··· 805 781 fail_free_inode: 806 782 if (ip->i_gl) { 807 783 glock_clear_object(ip->i_gl, ip); 808 - gfs2_glock_put(ip->i_gl); 784 + if (free_vfs_inode) /* else evict will do the put for us */ 785 + gfs2_glock_put(ip->i_gl); 809 786 } 810 787 gfs2_rs_delete(ip, NULL); 811 788 gfs2_qa_put(ip);
+1 -1
fs/gfs2/inode.h
··· 92 92 u64 no_addr, u64 no_formal_ino, 93 93 unsigned int blktype); 94 94 extern struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr, 95 - u64 *no_formal_ino, 95 + u64 no_formal_ino, 96 96 unsigned int blktype); 97 97 98 98 extern int gfs2_inode_refresh(struct gfs2_inode *ip);
+39 -17
fs/gfs2/log.c
···
 #include "util.h"
 #include "dir.h"
 #include "trace_gfs2.h"
+#include "trans.h"

 static void gfs2_log_shutdown(struct gfs2_sbd *sdp);
···
         struct gfs2_bufdata *bd;
         struct buffer_head *bh;

-        fs_err(sdp, "Error: In gfs2_ail1_flush for ten minutes! t=%d\n",
-               current->journal_info ? 1 : 0);
-
         list_for_each_entry_reverse(tr, &sdp->sd_ail1_list, tr_list) {
                 list_for_each_entry_reverse(bd, &tr->tr_ail1_list,
                                             bd_ail_st_list) {
···
 restart:
         ret = 0;
         if (time_after(jiffies, flush_start + (HZ * 600))) {
+                fs_err(sdp, "Error: In %s for ten minutes! t=%d\n",
+                       __func__, current->journal_info ? 1 : 0);
                 dump_ail_list(sdp);
                 goto out;
         }
···
                 list_del(&tr->tr_list);
                 gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
                 gfs2_assert_warn(sdp, list_empty(&tr->tr_ail2_list));
-                kfree(tr);
+                gfs2_trans_free(sdp, tr);
         }

         spin_unlock(&sdp->sd_ail_lock);
···
                 gfs2_ail_empty_tr(sdp, tr, &tr->tr_ail1_list);
                 gfs2_ail_empty_tr(sdp, tr, &tr->tr_ail2_list);
                 list_del(&tr->tr_list);
-                kfree(tr);
+                gfs2_trans_free(sdp, tr);
         }
         while (!list_empty(&sdp->sd_ail2_list)) {
                 tr = list_first_entry(&sdp->sd_ail2_list, struct gfs2_trans,
                                       tr_list);
                 gfs2_ail_empty_tr(sdp, tr, &tr->tr_ail2_list);
                 list_del(&tr->tr_list);
-                kfree(tr);
+                gfs2_trans_free(sdp, tr);
         }
         spin_unlock(&sdp->sd_ail_lock);
+}
+
+/**
+ * empty_ail1_list - try to start IO and empty the ail1 list
+ * @sdp: Pointer to GFS2 superblock
+ */
+static void empty_ail1_list(struct gfs2_sbd *sdp)
+{
+        unsigned long start = jiffies;
+
+        for (;;) {
+                if (time_after(jiffies, start + (HZ * 600))) {
+                        fs_err(sdp, "Error: In %s for 10 minutes! t=%d\n",
+                               __func__, current->journal_info ? 1 : 0);
+                        dump_ail_list(sdp);
+                        return;
+                }
+                gfs2_ail1_start(sdp);
+                gfs2_ail1_wait(sdp);
+                if (gfs2_ail1_empty(sdp, 0))
+                        return;
+        }
 }

 /**
···
         tr = sdp->sd_log_tr;
         if (tr) {
                 sdp->sd_log_tr = NULL;
-                INIT_LIST_HEAD(&tr->tr_ail1_list);
-                INIT_LIST_HEAD(&tr->tr_ail2_list);
                 tr->tr_first = sdp->sd_log_flush_head;
                 if (unlikely (state == SFS_FROZEN))
                         if (gfs2_assert_withdraw_delayed(sdp,
···
         if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
                 if (!sdp->sd_log_idle) {
-                        for (;;) {
-                                gfs2_ail1_start(sdp);
-                                gfs2_ail1_wait(sdp);
-                                if (gfs2_ail1_empty(sdp, 0))
-                                        break;
-                        }
+                        empty_ail1_list(sdp);
                         if (gfs2_withdrawn(sdp))
                                 goto out;
                         atomic_dec(&sdp->sd_log_blks_free); /* Adjust for unreserved buffer */
···
         trace_gfs2_log_flush(sdp, 0, flags);
         up_write(&sdp->sd_log_flush_lock);

-        kfree(tr);
+        gfs2_trans_free(sdp, tr);
 }

 /**
···
  * @new: New transaction to be merged
  */

-static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
+static void gfs2_merge_trans(struct gfs2_sbd *sdp, struct gfs2_trans *new)
 {
+        struct gfs2_trans *old = sdp->sd_log_tr;
+
         WARN_ON_ONCE(!test_bit(TR_ATTACHED, &old->tr_flags));

         old->tr_num_buf_new += new->tr_num_buf_new;
···
         list_splice_tail_init(&new->tr_databuf, &old->tr_databuf);
         list_splice_tail_init(&new->tr_buf, &old->tr_buf);
+
+        spin_lock(&sdp->sd_ail_lock);
+        list_splice_tail_init(&new->tr_ail1_list, &old->tr_ail1_list);
+        list_splice_tail_init(&new->tr_ail2_list, &old->tr_ail2_list);
+        spin_unlock(&sdp->sd_ail_lock);
 }

 static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
···
         gfs2_log_lock(sdp);

         if (sdp->sd_log_tr) {
-                gfs2_merge_trans(sdp->sd_log_tr, tr);
+                gfs2_merge_trans(sdp, tr);
         } else if (tr->tr_num_buf_new || tr->tr_num_databuf_new) {
                 gfs2_assert_withdraw(sdp, test_bit(TR_ALLOCED, &tr->tr_flags));
                 sdp->sd_log_tr = tr;
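The ten-minute watchdogs added above compare `jiffies` with `time_after()`, which stays correct even when the unsigned tick counter wraps around. A minimal userspace sketch of the same comparison (`ticks_after` is a hypothetical stand-in for the kernel macro, not kernel code):

```c
#include <assert.h>
#include <limits.h>

/* Wraparound-safe "has tick counter a passed b?", in the spirit of the
 * kernel's time_after() macro: computing the difference in unsigned
 * arithmetic and then interpreting it as signed keeps the comparison
 * correct even after the counter overflows. */
static int ticks_after(unsigned long a, unsigned long b)
{
        return (long)(b - a) < 0;
}
```

A naive `jiffies > deadline` comparison would misfire near the wraparound point; the signed-difference trick is why `time_after(jiffies, start + (HZ * 600))` is safe no matter where `jiffies` currently sits.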
+9
fs/gfs2/main.c
···
         if (!gfs2_qadata_cachep)
                 goto fail_cachep7;

+        gfs2_trans_cachep = kmem_cache_create("gfs2_trans",
+                                              sizeof(struct gfs2_trans),
+                                              0, 0, NULL);
+        if (!gfs2_trans_cachep)
+                goto fail_cachep8;
+
         error = register_shrinker(&gfs2_qd_shrinker);
         if (error)
                 goto fail_shrinker;
···
 fail_fs1:
         unregister_shrinker(&gfs2_qd_shrinker);
 fail_shrinker:
+        kmem_cache_destroy(gfs2_trans_cachep);
+fail_cachep8:
         kmem_cache_destroy(gfs2_qadata_cachep);
 fail_cachep7:
         kmem_cache_destroy(gfs2_quotad_cachep);
···
         rcu_barrier();

         mempool_destroy(gfs2_page_pool);
+        kmem_cache_destroy(gfs2_trans_cachep);
         kmem_cache_destroy(gfs2_qadata_cachep);
         kmem_cache_destroy(gfs2_quotad_cachep);
         kmem_cache_destroy(gfs2_rgrpd_cachep);
+1 -1
fs/gfs2/ops_fstype.c
···
 }

 static const match_table_t nolock_tokens = {
-        { Opt_jid, "jid=%d\n", },
+        { Opt_jid, "jid=%d", },
         { Opt_err, NULL },
 };
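The one-character fix above removes a stray `\n` from the `nolock_tokens` pattern. The kernel's `match_token()` treats every non-`%` byte in the pattern as a literal that must appear in the option string, so `"jid=%d\n"` could never match a mount option like `jid=0`, and lock_nolock mounts could not specify a jid. A toy userspace matcher (`match_pattern` is a hypothetical simplification, not the kernel implementation) illustrates why:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplification of match_token()-style pattern matching:
 * literal pattern bytes must match the option string exactly, and %d
 * consumes a decimal number.  A trailing '\n' in the pattern is a
 * literal byte the option string never contains, so the token cannot
 * match. */
static int match_pattern(const char *pattern, const char *opt, int *val)
{
        while (*pattern) {
                if (pattern[0] == '%' && pattern[1] == 'd') {
                        char *end;
                        long v = strtol(opt, &end, 10);

                        if (end == opt)
                                return 0;       /* no digits where %d expected */
                        *val = (int)v;
                        opt = end;
                        pattern += 2;
                } else if (*pattern == *opt) {
                        pattern++;
                        opt++;
                } else {
                        return 0;               /* literal byte mismatch */
                }
        }
        return *opt == '\0';                    /* option fully consumed */
}
```

With the old pattern the final `'\n'` literal never matches the end of the option string; with the corrected pattern `jid=0` parses cleanly.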
+1 -1
fs/gfs2/rgrp.c
···
          */
         ip = gl->gl_object;

-        if (ip || queue_work(gfs2_delete_workqueue, &gl->gl_delete) == 0)
+        if (ip || !gfs2_queue_delete_work(gl, 0))
                 gfs2_glock_put(gl);
         else
                 found++;
+62 -10
fs/gfs2/super.c
···
                 }
         }

-        flush_workqueue(gfs2_delete_workqueue);
+        gfs2_flush_delete_work(sdp);
         if (!log_write_allowed && current == sdp->sd_quotad_process)
                 fs_warn(sdp, "The quotad daemon is withdrawing.\n");
         else if (sdp->sd_quotad_process)
···
                 struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;

                 gfs2_glock_hold(gl);
-                if (queue_work(gfs2_delete_workqueue, &gl->gl_delete) == 0)
+                if (!gfs2_queue_delete_work(gl, 0))
                         gfs2_glock_queue_put(gl);
                 return false;
         }
···
         gfs2_glock_put(gl);
 }

+static bool gfs2_upgrade_iopen_glock(struct inode *inode)
+{
+        struct gfs2_inode *ip = GFS2_I(inode);
+        struct gfs2_sbd *sdp = GFS2_SB(inode);
+        struct gfs2_holder *gh = &ip->i_iopen_gh;
+        long timeout = 5 * HZ;
+        int error;
+
+        gh->gh_flags |= GL_NOCACHE;
+        gfs2_glock_dq_wait(gh);
+
+        /*
+         * If there are no other lock holders, we'll get the lock immediately.
+         * Otherwise, the other nodes holding the lock will be notified about
+         * our locking request.  If they don't have the inode open, they'll
+         * evict the cached inode and release the lock.  Otherwise, if they
+         * poke the inode glock, we'll take this as an indication that they
+         * still need the iopen glock and that they'll take care of deleting
+         * the inode when they're done.  As a last resort, if another node
+         * keeps holding the iopen glock without showing any activity on the
+         * inode glock, we'll eventually time out.
+         *
+         * Note that we're passing the LM_FLAG_TRY_1CB flag to the first
+         * locking request as an optimization to notify lock holders as soon
+         * as possible.  Without that flag, they'd be notified implicitly by
+         * the second locking request.
+         */
+
+        gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | GL_NOCACHE, gh);
+        error = gfs2_glock_nq(gh);
+        if (error != GLR_TRYFAILED)
+                return !error;
+
+        gfs2_holder_reinit(LM_ST_EXCLUSIVE, GL_ASYNC | GL_NOCACHE, gh);
+        error = gfs2_glock_nq(gh);
+        if (error)
+                return false;
+
+        timeout = wait_event_interruptible_timeout(sdp->sd_async_glock_wait,
+                !test_bit(HIF_WAIT, &gh->gh_iflags) ||
+                test_bit(GLF_DEMOTE, &ip->i_gl->gl_flags),
+                timeout);
+        if (!test_bit(HIF_HOLDER, &gh->gh_iflags)) {
+                gfs2_glock_dq(gh);
+                return false;
+        }
+        return true;
+}
+
 /**
  * gfs2_evict_inode - Remove an inode from cache
  * @inode: The inode to evict
···
         if (test_bit(GIF_ALLOC_FAILED, &ip->i_flags)) {
                 BUG_ON(!gfs2_glock_is_locked_by_me(ip->i_gl));
                 gfs2_holder_mark_uninitialized(&gh);
-                goto alloc_failed;
+                goto out_delete;
         }
+
+        if (test_bit(GIF_DEFERRED_DELETE, &ip->i_flags))
+                goto out;

         /* Deletes should never happen under memory pressure anymore. */
         if (WARN_ON_ONCE(current->flags & PF_MEMALLOC))
···
                 goto out;
         }

+        if (gfs2_inode_already_deleted(ip->i_gl, ip->i_no_formal_ino))
+                goto out_truncate;
         error = gfs2_check_blk_type(sdp, ip->i_no_addr, GFS2_BLKST_UNLINKED);
         if (error)
                 goto out_truncate;
···
         if (inode->i_nlink)
                 goto out_truncate;

-alloc_failed:
+out_delete:
         if (gfs2_holder_initialized(&ip->i_iopen_gh) &&
             test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
-                ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-                gfs2_glock_dq_wait(&ip->i_iopen_gh);
-                gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | GL_NOCACHE,
-                                   &ip->i_iopen_gh);
-                error = gfs2_glock_nq(&ip->i_iopen_gh);
-                if (error)
+                if (!gfs2_upgrade_iopen_glock(inode)) {
+                        gfs2_holder_uninit(&ip->i_iopen_gh);
                         goto out_truncate;
+                }
         }

         if (S_ISDIR(inode->i_mode) &&
···
            that subsequent inode creates don't see an old gl_object. */
         glock_clear_object(ip->i_gl, ip);
         error = gfs2_dinode_dealloc(ip);
+        gfs2_inode_remember_delete(ip->i_gl, ip->i_no_formal_ino);
         goto out_unlock;

 out_truncate:
+17 -4
fs/gfs2/trans.c
···
         if (!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))
                 return -EROFS;

-        tr = kzalloc(sizeof(struct gfs2_trans), GFP_NOFS);
+        tr = kmem_cache_zalloc(gfs2_trans_cachep, GFP_NOFS);
         if (!tr)
                 return -ENOMEM;
···
         tr->tr_reserved += gfs2_struct2blk(sdp, revokes);
         INIT_LIST_HEAD(&tr->tr_databuf);
         INIT_LIST_HEAD(&tr->tr_buf);
+        INIT_LIST_HEAD(&tr->tr_ail1_list);
+        INIT_LIST_HEAD(&tr->tr_ail2_list);

         sb_start_intwrite(sdp->sd_vfs);
···
 fail:
         sb_end_intwrite(sdp->sd_vfs);
-        kfree(tr);
+        kmem_cache_free(gfs2_trans_cachep, tr);

         return error;
 }
···
         if (!test_bit(TR_TOUCHED, &tr->tr_flags)) {
                 gfs2_log_release(sdp, tr->tr_reserved);
                 if (alloced) {
-                        kfree(tr);
+                        gfs2_trans_free(sdp, tr);
                         sb_end_intwrite(sdp->sd_vfs);
                 }
                 return;
···
         gfs2_log_commit(sdp, tr);
         if (alloced && !test_bit(TR_ATTACHED, &tr->tr_flags))
-                kfree(tr);
+                gfs2_trans_free(sdp, tr);
         up_read(&sdp->sd_log_flush_lock);

         if (sdp->sd_vfs->s_flags & SB_SYNCHRONOUS)
···
         gfs2_log_unlock(sdp);
 }

+void gfs2_trans_free(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
+{
+        if (tr == NULL)
+                return;
+
+        gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
+        gfs2_assert_warn(sdp, list_empty(&tr->tr_ail2_list));
+        gfs2_assert_warn(sdp, list_empty(&tr->tr_databuf));
+        gfs2_assert_warn(sdp, list_empty(&tr->tr_buf));
+        kmem_cache_free(gfs2_trans_cachep, tr);
+}
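The change above moves transactions from generic `kzalloc()`/`kfree()` onto a dedicated slab cache (`kmem_cache_zalloc()`/`kmem_cache_free()`), which recycles freed objects of one fixed size instead of round-tripping through the general allocator. A toy userspace sketch of that idea (`obj_cache` is hypothetical and single-threaded; the real slab allocator is far more sophisticated):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size object cache in the spirit of
 * kmem_cache_create()/kmem_cache_zalloc()/kmem_cache_free(): freed
 * objects go onto a freelist and are handed back on the next
 * allocation, zeroed.  obj_size must be at least sizeof(void *)
 * because the freelist link is stored inside the free object. */
struct obj_cache {
        size_t obj_size;
        void *freelist;                 /* most recently freed object, or NULL */
};

static void *cache_zalloc(struct obj_cache *c)
{
        void *obj = c->freelist;

        if (obj)
                c->freelist = *(void **)obj;    /* pop recycled object */
        else
                obj = malloc(c->obj_size);      /* freelist empty: allocate */
        return obj ? memset(obj, 0, c->obj_size) : NULL;
}

static void cache_free(struct obj_cache *c, void *obj)
{
        *(void **)obj = c->freelist;            /* push onto freelist */
        c->freelist = obj;
}
```

Note how `gfs2_trans_free()` asserts that all four lists are empty before returning the object to the cache; a recycled object with stale list membership is exactly the use-after-free class the "fix use-after-free on transaction ail lists" patch addresses.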
+1
fs/gfs2/trans.h
···
 extern void gfs2_trans_add_meta(struct gfs2_glock *gl, struct buffer_head *bh);
 extern void gfs2_trans_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd);
 extern void gfs2_trans_remove_revoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len);
+extern void gfs2_trans_free(struct gfs2_sbd *sdp, struct gfs2_trans *tr);

 #endif /* __TRANS_DOT_H__ */
+1
fs/gfs2/util.c
···
 struct kmem_cache *gfs2_rgrpd_cachep __read_mostly;
 struct kmem_cache *gfs2_quotad_cachep __read_mostly;
 struct kmem_cache *gfs2_qadata_cachep __read_mostly;
+struct kmem_cache *gfs2_trans_cachep __read_mostly;
 mempool_t *gfs2_page_pool __read_mostly;

 void gfs2_assert_i(struct gfs2_sbd *sdp)
+1
fs/gfs2/util.h
···
 extern struct kmem_cache *gfs2_rgrpd_cachep;
 extern struct kmem_cache *gfs2_quotad_cachep;
 extern struct kmem_cache *gfs2_qadata_cachep;
+extern struct kmem_cache *gfs2_trans_cachep;
 extern mempool_t *gfs2_page_pool;
 extern struct workqueue_struct *gfs2_control_wq;
+6
include/uapi/linux/gfs2_ondisk.h
···
 #define GFS2_RGF_NOALLOC        0x00000008
 #define GFS2_RGF_TRIMMED        0x00000010

+struct gfs2_inode_lvb {
+        __be32 ri_magic;
+        __be32 __pad;
+        __be64 ri_generation_deleted;
+};
+
 struct gfs2_rgrp_lvb {
         __be32 rl_magic;
         __be32 rl_flags;