
Merge tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Carlos Maiolino:
"There are no major changes in xfs. This contains mostly code
cleanups, a few bug fixes, and documentation updates. Highlights are:

- Quota locking cleanup

- Getting rid of old xlog_in_core_2_t type"

* tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (33 commits)
docs: remove obsolete links in the xfs online repair documentation
xfs: move some code out of xfs_iget_recycle
xfs: use zi more in xfs_zone_gc_mount
xfs: remove the unused bv field in struct xfs_gc_bio
xfs: remove xarray mark for reclaimable zones
xfs: remove the xlog_in_core_t typedef
xfs: remove l_iclog_heads
xfs: remove the xlog_rec_header_t typedef
xfs: remove xlog_in_core_2_t
xfs: remove a very outdated comment from xlog_alloc_log
xfs: cleanup xlog_alloc_log a bit
xfs: don't use xlog_in_core_2_t in struct xlog_in_core
xfs: add a on-disk log header cycle array accessor
xfs: add a XLOG_CYCLE_DATA_SIZE constant
xfs: reduce ilock roundtrips in xfs_qm_vop_dqalloc
xfs: move xfs_dquot_tree calls into xfs_qm_dqget_cache_{lookup,insert}
xfs: move quota locking into xrep_quota_item
xfs: move quota locking into xqcheck_commit_dquot
xfs: move q_qlock locking into xqcheck_compare_dquot
xfs: move q_qlock locking into xchk_quota_item
...

+364 -744
+6 -230
Documentation/filesystems/xfs/xfs-online-fsck-design.rst
··· 105 105 TLDR; Show Me the Code! 106 106 ----------------------- 107 107 108 - Code is posted to the kernel.org git trees as follows: 109 - `kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_, 110 - `userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and 111 - `QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_. 108 + Kernel and userspace code has been fully merged as of October 2025. 109 + 112 110 Each kernel patchset adding an online repair function will use the same branch 113 111 name across the kernel, xfsprogs, and fstests git repos. 114 112 ··· 762 764 and they enable XFS developers to find deficiencies in the code base. 763 765 764 766 Proposed patchsets include 765 - `general fuzzer improvements 766 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzzer-improvements>`_, 767 767 `fuzzing baselines 768 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_, 769 - and `improvements in fuzz testing comprehensiveness 770 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=more-fuzz-testing>`_. 768 + <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_. 771 769 772 770 Stress Testing 773 771 -------------- ··· 794 800 Success is defined by the ability to run all of these tests without observing 795 801 any unexpected filesystem shutdowns due to corrupted metadata, kernel hang 796 802 check warnings, or any other sort of mischief. 
797 - 798 - Proposed patchsets include `general stress testing 799 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_ 800 - and the `evolution of existing per-function stress testing 801 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_. 802 803 803 804 4. User Interface 804 805 ================= ··· 875 886 This measure was taken to minimize delays in the rest of the filesystem. 876 887 No such hardening has been performed for the cron job. 877 888 878 - Proposed patchset: 879 - `Enabling the xfs_scrub background service 880 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_. 881 - 882 889 Health Reporting 883 890 ---------------- 884 891 ··· 896 911 897 912 *Answer*: These questions remain unanswered, but should be a part of the 898 913 conversation with early adopters and potential downstream users of XFS. 899 - 900 - Proposed patchsets include 901 - `wiring up health reports to correction returns 902 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_ 903 - and 904 - `preservation of sickness info during memory reclaim 905 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_. 906 914 907 915 5. Kernel Algorithms and Data Structures 908 916 ======================================== ··· 1287 1309 records mapped to a particular space extent and ignoring the owner info), 1288 1310 are there the same number of reverse mapping records for each block as the 1289 1311 reference count record claims? 
1290 - 1291 - Proposed patchsets are the series to find gaps in 1292 - `refcount btree 1293 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-refcount-gaps>`_, 1294 - `inode btree 1295 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-inobt-gaps>`_, and 1296 - `rmap btree 1297 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-rmapbt-gaps>`_ records; 1298 - to find 1299 - `mergeable records 1300 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-mergeable-records>`_; 1301 - and to 1302 - `improve cross referencing with rmap 1303 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-strengthen-rmap-checking>`_ 1304 - before starting a repair. 1305 1312 1306 1313 Checking Extended Attributes 1307 1314 ```````````````````````````` ··· 1719 1756 To avoid polling in step 4, the drain provides a waitqueue for scrub threads to 1720 1757 be woken up whenever the intent count drops to zero. 1721 1758 1722 - The proposed patchset is the 1723 - `scrub intent drain series 1724 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-drain-intents>`_. 1725 - 1726 1759 .. _jump_labels: 1727 1760 1728 1761 Static Keys (aka Jump Label Patching) ··· 1995 2036 null record slot in the bag; and the ``xfarray_unset`` function removes a 1996 2037 record from the bag. 1997 2038 1998 - The proposed patchset is the 1999 - `big in-memory array 2000 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=big-array>`_. 2001 - 2002 2039 Iterating Array Elements 2003 2040 ^^^^^^^^^^^^^^^^^^^^^^^^ 2004 2041 ··· 2127 2172 to cache a small number of entries before adding them to a temporary ondisk 2128 2173 file, which is why compaction is not required. 
2129 2174 2130 - The proposed patchset is at the start of the 2131 - `extended attribute repair 2132 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_ series. 2133 - 2134 2175 .. _xfbtree: 2135 2176 2136 2177 In-Memory B+Trees ··· 2164 2213 xfiles enables reuse of the entire btree library. 2165 2214 Btrees built atop an xfile are collectively known as ``xfbtrees``. 2166 2215 The next few sections describe how they actually work. 2167 - 2168 - The proposed patchset is the 2169 - `in-memory btree 2170 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=in-memory-btrees>`_ 2171 - series. 2172 2216 2173 2217 Using xfiles as a Buffer Cache Target 2174 2218 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ··· 2405 2459 EFIs have a role to play during the commit and reaping phases; please see the 2406 2460 next section and the section about :ref:`reaping<reaping>` for more details. 2407 2461 2408 - Proposed patchsets are the 2409 - `bitmap rework 2410 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-bitmap-rework>`_ 2411 - and the 2412 - `preparation for bulk loading btrees 2413 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_. 2414 - 2415 - 2416 2462 Writing the New Tree 2417 2463 ```````````````````` 2418 2464 ··· 2561 2623 but the record count for the free inode btree has to be computed as inode chunk 2562 2624 records are stored in the xfarray. 2563 2625 2564 - The proposed patchset is the 2565 - `AG btree repair 2566 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_ 2567 - series. 2568 - 2569 2626 Case Study: Rebuilding the Space Reference Counts 2570 2627 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2571 2628 ··· 2649 2716 removed via ``xfarray_unset``. 2650 2717 Bag members are examined through ``xfarray_iter`` loops. 
2651 2718 2652 - The proposed patchset is the 2653 - `AG btree repair 2654 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_ 2655 - series. 2656 - 2657 2719 Case Study: Rebuilding File Fork Mapping Indices 2658 2720 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2659 2721 ··· 2684 2756 EXTENTS format instead of BMBT, which may require a conversion. 2685 2757 Third, the incore extent map must be reloaded carefully to avoid disturbing 2686 2758 any delayed allocation extents. 2687 - 2688 - The proposed patchset is the 2689 - `file mapping repair 2690 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-file-mappings>`_ 2691 - series. 2692 2759 2693 2760 .. _reaping: 2694 2761 ··· 2765 2842 blocks. 2766 2843 As stated earlier, online repair functions use very large transactions to 2767 2844 minimize the chances of this occurring. 2768 - 2769 - The proposed patchset is the 2770 - `preparation for bulk loading btrees 2771 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_ 2772 - series. 2773 2845 2774 2846 Case Study: Reaping After a Regular Btree Repair 2775 2847 ```````````````````````````````````````````````` ··· 2861 2943 btrees. 2862 2944 These blocks can then be reaped using the methods outlined above. 2863 2945 2864 - The proposed patchset is the 2865 - `AG btree repair 2866 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_ 2867 - series. 2868 - 2869 2946 .. _rmap_reap: 2870 2947 2871 2948 Case Study: Reaping After Repairing Reverse Mapping Btrees ··· 2884 2971 2885 2972 The rest of the process of rebuildng the reverse mapping btree is discussed 2886 2973 in a separate :ref:`case study<rmap_repair>`. 
2887 - 2888 - The proposed patchset is the 2889 - `AG btree repair 2890 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_ 2891 - series. 2892 2974 2893 2975 Case Study: Rebuilding the AGFL 2894 2976 ``````````````````````````````` ··· 2932 3024 forks, or if that fails, leaving the fields invalid and waiting for the fork 2933 3025 fsck functions to run. 2934 3026 2935 - The proposed patchset is the 2936 - `inode 2937 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-inodes>`_ 2938 - repair series. 2939 - 2940 3027 Quota Record Repairs 2941 3028 -------------------- 2942 3029 ··· 2947 3044 2948 3045 Quota usage counters are checked, repaired, and discussed separately in the 2949 3046 section about :ref:`live quotacheck <quotacheck>`. 2950 - 2951 - The proposed patchset is the 2952 - `quota 2953 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quota>`_ 2954 - repair series. 2955 3047 2956 3048 .. _fscounters: 2957 3049 ··· 3042 3144 | sync_filesystem fails to flush the filesystem and returns an error. | 3043 3145 | This bug was fixed in Linux 5.17. | 3044 3146 +--------------------------------------------------------------------------+ 3045 - 3046 - The proposed patchset is the 3047 - `summary counter cleanup 3048 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters>`_ 3049 - series. 3050 3147 3051 3148 Full Filesystem Scans 3052 3149 --------------------- ··· 3170 3277 coordinator must release the AGI and push the main filesystem to get the inode 3171 3278 back into a loadable state. 3172 3279 3173 - The proposed patches are the 3174 - `inode scanner 3175 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_ 3176 - series. 
3177 - The first user of the new functionality is the 3178 - `online quotacheck 3179 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_ 3180 - series. 3181 - 3182 3280 Inode Management 3183 3281 ```````````````` 3184 3282 ··· 3265 3381 function to set or clear the ``DONTCACHE`` flag to get the required release 3266 3382 behavior. 3267 3383 3268 - Proposed patchsets include fixing 3269 - `scrub iget usage 3270 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iget-fixes>`_ and 3271 - `dir iget usage 3272 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-dir-iget-fixes>`_. 3273 - 3274 3384 .. _ilocking: 3275 3385 3276 3386 Locking Inodes ··· 3320 3442 If the dotdot entry changes while the directory is unlocked, then a move or 3321 3443 rename operation must have changed the child's parentage, and the scan can 3322 3444 exit early. 3323 - 3324 - The proposed patchset is the 3325 - `directory repair 3326 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_ 3327 - series. 3328 3445 3329 3446 .. _fshooks: 3330 3447 ··· 3467 3594 3468 3595 - ``xchk_iscan_teardown`` to finish the scan 3469 3596 3470 - This functionality is also a part of the 3471 - `inode scanner 3472 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_ 3473 - series. 3474 - 3475 3597 .. _quotacheck: 3476 3598 3477 3599 Case Study: Quota Counter Checking ··· 3554 3686 If repairs are desired, the real and shadow dquots are locked and their 3555 3687 resource counts are set to the values in the shadow dquot. 3556 3688 3557 - The proposed patchset is the 3558 - `online quotacheck 3559 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_ 3560 - series. 3561 - 3562 3689 .. 
_nlinks: 3563 3690 3564 3691 Case Study: File Link Count Checking ··· 3606 3743 shadow information. 3607 3744 If no parents are found, the file must be :ref:`reparented <orphanage>` to the 3608 3745 orphanage to prevent the file from being lost forever. 3609 - 3610 - The proposed patchset is the 3611 - `file link count repair 3612 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-nlinks>`_ 3613 - series. 3614 3746 3615 3747 .. _rmap_repair: 3616 3748 ··· 3685 3827 to :ref:`reap after rmap btree repair <rmap_reap>`. 3686 3828 3687 3829 12. Free the xfbtree now that it not needed. 3688 - 3689 - The proposed patchset is the 3690 - `rmap repair 3691 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rmap-btree>`_ 3692 - series. 3693 3830 3694 3831 Staging Repairs with Temporary Files on Disk 3695 3832 -------------------------------------------- ··· 3824 3971 must be conveyed to the file being repaired, which is the topic of the next 3825 3972 section. 3826 3973 3827 - The proposed patches are in the 3828 - `repair temporary files 3829 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_ 3830 - series. 3831 - 3832 3974 Logged File Content Exchanges 3833 3975 ----------------------------- 3834 3976 ··· 3872 4024 The new ``XFS_SB_FEAT_INCOMPAT_EXCHRANGE`` incompatible feature flag 3873 4025 in the superblock protects these new log item records from being replayed on 3874 4026 old kernels. 3875 - 3876 - The proposed patchset is the 3877 - `file contents exchange 3878 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_ 3879 - series. 3880 4027 3881 4028 +--------------------------------------------------------------------------+ 3882 4029 | **Sidebar: Using Log-Incompatible Feature Flags** | ··· 4166 4323 and use atomic mapping exchange to commit the new contents. 4167 4324 The temporary file is then reaped. 
4168 4325 4169 - The proposed patchset is the 4170 - `realtime summary repair 4171 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rtsummary>`_ 4172 - series. 4173 - 4174 4326 Case Study: Salvaging Extended Attributes 4175 4327 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 4176 4328 ··· 4206 4368 The old attribute blocks are now attached to the temporary file. 4207 4369 4208 4370 4. Reap the temporary file. 4209 - 4210 - The proposed patchset is the 4211 - `extended attribute repair 4212 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_ 4213 - series. 4214 4371 4215 4372 Fixing Directories 4216 4373 ------------------ ··· 4280 4447 Unfortunately, the current dentry cache design doesn't provide a means to walk 4281 4448 every child dentry of a specific directory, which makes this a hard problem. 4282 4449 There is no known solution. 4283 - 4284 - The proposed patchset is the 4285 - `directory repair 4286 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_ 4287 - series. 4288 4450 4289 4451 Parent Pointers 4290 4452 ``````````````` ··· 4440 4612 4441 4613 7. Reap the temporary directory. 4442 4614 4443 - The proposed patchset is the 4444 - `parent pointers directory repair 4445 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_ 4446 - series. 4447 - 4448 4615 Case Study: Repairing Parent Pointers 4449 4616 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 4450 4617 ··· 4484 4661 The temporary file now contains the damaged extended attribute structure. 4485 4662 4486 4663 8. Reap the temporary file. 4487 - 4488 - The proposed patchset is the 4489 - `parent pointers repair 4490 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_ 4491 - series. 
4492 4664 4493 4665 Digression: Offline Checking of Parent Pointers 4494 4666 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ··· 4572 4754 Advance both cursors. 4573 4755 4574 4756 4. Move on to examining link counts, as we do today. 4575 - 4576 - The proposed patchset is the 4577 - `offline parent pointers repair 4578 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-fsck>`_ 4579 - series. 4580 4757 4581 4758 Rebuilding directories from parent pointers in offline repair would be very 4582 4759 challenging because xfs_repair currently uses two single-pass scans of the ··· 4716 4903 4717 4904 6. If the subdirectory has zero paths, attach it to the lost and found. 4718 4905 4719 - The proposed patches are in the 4720 - `directory tree repair 4721 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-directory-tree>`_ 4722 - series. 4723 - 4724 - 4725 4906 .. _orphanage: 4726 4907 4727 4908 The Orphanage ··· 4779 4972 4780 4973 7. If a runtime error happens, call ``xrep_adoption_cancel`` to release all 4781 4974 resources. 4782 - 4783 - The proposed patches are in the 4784 - `orphanage adoption 4785 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-orphanage>`_ 4786 - series. 4787 4975 4788 4976 6. Userspace Algorithms and Data Structures 4789 4977 =========================================== ··· 4893 5091 This doesn't completely solve the balancing problem, but reduces it enough to 4894 5092 move on to more pressing issues. 4895 5093 4896 - The proposed patchsets are the scrub 4897 - `performance tweaks 4898 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-performance-tweaks>`_ 4899 - and the 4900 - `inode scan rebalance 4901 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-iscan-rebalance>`_ 4902 - series. 4903 - 4904 5094 .. 
_scrubrepair: 4905 5095 4906 5096 Scheduling Repairs ··· 4972 5178 immediately. 4973 5179 Corrupt file data blocks reported by phase 6 cannot be recovered by the 4974 5180 filesystem. 4975 - 4976 - The proposed patchsets are the 4977 - `repair warning improvements 4978 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-better-repair-warnings>`_, 4979 - refactoring of the 4980 - `repair data dependency 4981 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-data-deps>`_ 4982 - and 4983 - `object tracking 4984 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-object-tracking>`_, 4985 - and the 4986 - `repair scheduling 4987 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-scheduling>`_ 4988 - improvement series. 4989 5181 4990 5182 Checking Names for Confusable Unicode Sequences 4991 5183 ----------------------------------------------- ··· 5152 5372 This emulates an atomic device write in software, and can support arbitrary 5153 5373 scattered writes. 5154 5374 5375 + (This functionality was merged into mainline as of 2025) 5376 + 5155 5377 Vectorized Scrub 5156 5378 ---------------- 5157 5379 ··· 5175 5393 online fsck can use that instead of adding a separate vectored scrub system 5176 5394 call to XFS. 5177 5395 5178 - The relevant patchsets are the 5179 - `kernel vectorized scrub 5180 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=vectorized-scrub>`_ 5181 - and 5182 - `userspace vectorized scrub 5183 - <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=vectorized-scrub>`_ 5184 - series. 5396 + (This functionality was merged into mainline as of 2025) 5185 5397 5186 5398 Quality of Service Targets for Scrub 5187 5399 ------------------------------------
+9
fs/xfs/libxfs/xfs_group.h
··· 98 98 return xg->xg_mount->m_groups[xg->xg_type].blocks; 99 99 } 100 100 101 + static inline xfs_rfsblock_t 102 + xfs_groups_to_rfsbs( 103 + struct xfs_mount *mp, 104 + uint32_t nr_groups, 105 + enum xfs_group_type type) 106 + { 107 + return (xfs_rfsblock_t)mp->m_groups[type].blocks * nr_groups; 108 + } 109 + 101 110 static inline xfs_fsblock_t 102 111 xfs_group_start_fsb( 103 112 struct xfs_group *xg)
+19 -19
fs/xfs/libxfs/xfs_log_format.h
··· 31 31 #define XLOG_BIG_RECORD_BSIZE (32*1024) /* 32k buffers */ 32 32 #define XLOG_MAX_RECORD_BSIZE (256*1024) 33 33 #define XLOG_HEADER_CYCLE_SIZE (32*1024) /* cycle data in header */ 34 + #define XLOG_CYCLE_DATA_SIZE (XLOG_HEADER_CYCLE_SIZE / BBSIZE) 34 35 #define XLOG_MIN_RECORD_BSHIFT 14 /* 16384 == 1 << 14 */ 35 36 #define XLOG_BIG_RECORD_BSHIFT 15 /* 32k == 1 << 15 */ 36 37 #define XLOG_MAX_RECORD_BSHIFT 18 /* 256k == 1 << 18 */ ··· 126 125 #define XLOG_FMT XLOG_FMT_LINUX_LE 127 126 #endif 128 127 129 - typedef struct xlog_rec_header { 128 + struct xlog_rec_ext_header { 129 + __be32 xh_cycle; /* write cycle of log */ 130 + __be32 xh_cycle_data[XLOG_CYCLE_DATA_SIZE]; 131 + __u8 xh_reserved[252]; 132 + }; 133 + 134 + /* actual ext header payload size for checksumming */ 135 + #define XLOG_REC_EXT_SIZE \ 136 + offsetofend(struct xlog_rec_ext_header, xh_cycle_data) 137 + 138 + struct xlog_rec_header { 130 139 __be32 h_magicno; /* log record (LR) identifier : 4 */ 131 140 __be32 h_cycle; /* write cycle of log : 4 */ 132 141 __be32 h_version; /* LR version : 4 */ ··· 146 135 __le32 h_crc; /* crc of log record : 4 */ 147 136 __be32 h_prev_block; /* block number to previous LR : 4 */ 148 137 __be32 h_num_logops; /* number of log operations in this LR : 4 */ 149 - __be32 h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; 138 + __be32 h_cycle_data[XLOG_CYCLE_DATA_SIZE]; 150 139 151 140 /* fields added by the Linux port: */ 152 141 __be32 h_fmt; /* format of log record : 4 */ ··· 171 160 * (little-endian) architectures. 
172 161 */ 173 162 __u32 h_pad0; 174 - } xlog_rec_header_t; 163 + 164 + __u8 h_reserved[184]; 165 + struct xlog_rec_ext_header h_ext[]; 166 + }; 175 167 176 168 #ifdef __i386__ 177 169 #define XLOG_REC_SIZE offsetofend(struct xlog_rec_header, h_size) 178 - #define XLOG_REC_SIZE_OTHER sizeof(struct xlog_rec_header) 170 + #define XLOG_REC_SIZE_OTHER offsetofend(struct xlog_rec_header, h_pad0) 179 171 #else 180 - #define XLOG_REC_SIZE sizeof(struct xlog_rec_header) 172 + #define XLOG_REC_SIZE offsetofend(struct xlog_rec_header, h_pad0) 181 173 #define XLOG_REC_SIZE_OTHER offsetofend(struct xlog_rec_header, h_size) 182 174 #endif /* __i386__ */ 183 - 184 - typedef struct xlog_rec_ext_header { 185 - __be32 xh_cycle; /* write cycle of log : 4 */ 186 - __be32 xh_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; /* : 256 */ 187 - } xlog_rec_ext_header_t; 188 - 189 - /* 190 - * Quite misnamed, because this union lays out the actual on-disk log buffer. 191 - */ 192 - typedef union xlog_in_core2 { 193 - xlog_rec_header_t hic_header; 194 - xlog_rec_ext_header_t hic_xheader; 195 - char hic_sector[XLOG_HEADER_SIZE]; 196 - } xlog_in_core_2_t; 197 175 198 176 /* not an on-disk structure, but needed by log recovery in userspace */ 199 177 struct xfs_log_iovec {
+4 -2
fs/xfs/libxfs/xfs_ondisk.h
··· 174 174 XFS_CHECK_STRUCT_SIZE(struct xfs_rud_log_format, 16); 175 175 XFS_CHECK_STRUCT_SIZE(struct xfs_map_extent, 32); 176 176 XFS_CHECK_STRUCT_SIZE(struct xfs_phys_extent, 16); 177 - XFS_CHECK_STRUCT_SIZE(struct xlog_rec_header, 328); 178 - XFS_CHECK_STRUCT_SIZE(struct xlog_rec_ext_header, 260); 177 + XFS_CHECK_STRUCT_SIZE(struct xlog_rec_header, 512); 178 + XFS_CHECK_STRUCT_SIZE(struct xlog_rec_ext_header, 512); 179 179 180 + XFS_CHECK_OFFSET(struct xlog_rec_header, h_reserved, 328); 181 + XFS_CHECK_OFFSET(struct xlog_rec_ext_header, xh_reserved, 260); 180 182 XFS_CHECK_OFFSET(struct xfs_bui_log_format, bui_extents, 16); 181 183 XFS_CHECK_OFFSET(struct xfs_cui_log_format, cui_extents, 16); 182 184 XFS_CHECK_OFFSET(struct xfs_rui_log_format, rui_extents, 16);
+1 -3
fs/xfs/libxfs/xfs_quota_defs.h
··· 29 29 * flags for q_flags field in the dquot. 30 30 */ 31 31 #define XFS_DQFLAG_DIRTY (1u << 0) /* dquot is dirty */ 32 - #define XFS_DQFLAG_FREEING (1u << 1) /* dquot is being torn down */ 33 32 34 33 #define XFS_DQFLAG_STRINGS \ 35 - { XFS_DQFLAG_DIRTY, "DIRTY" }, \ 36 - { XFS_DQFLAG_FREEING, "FREEING" } 34 + { XFS_DQFLAG_DIRTY, "DIRTY" } 37 35 38 36 /* 39 37 * We have the possibility of all three quota types being active at once, and
+8 -6
fs/xfs/libxfs/xfs_rtgroup.h
··· 64 64 */ 65 65 #define XFS_RTG_FREE XA_MARK_0 66 66 67 - /* 68 - * For zoned RT devices this is set on groups that are fully written and that 69 - * have unused blocks. Used by the garbage collection to pick targets. 70 - */ 71 - #define XFS_RTG_RECLAIMABLE XA_MARK_1 72 - 73 67 static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg) 74 68 { 75 69 return container_of(xg, struct xfs_rtgroup, rtg_group); ··· 364 370 # define xfs_log_rtsb(tp, sb_bp) (NULL) 365 371 # define xfs_rtgroup_get_geometry(rtg, rgeo) (-EOPNOTSUPP) 366 372 #endif /* CONFIG_XFS_RT */ 373 + 374 + static inline xfs_rfsblock_t 375 + xfs_rtgs_to_rfsbs( 376 + struct xfs_mount *mp, 377 + uint32_t nr_groups) 378 + { 379 + return xfs_groups_to_rfsbs(mp, nr_groups, XG_TYPE_RTG); 380 + } 367 381 368 382 #endif /* __LIBXFS_RTGROUP_H */
+3 -5
fs/xfs/scrub/quota.c
··· 155 155 * We want to validate the bmap record for the storage backing this 156 156 * dquot, so we need to lock the dquot and the quota file. For quota 157 157 * operations, the locking order is first the ILOCK and then the dquot. 158 - * However, dqiterate gave us a locked dquot, so drop the dquot lock to 159 - * get the ILOCK. 160 158 */ 161 - xfs_dqunlock(dq); 162 159 xchk_ilock(sc, XFS_ILOCK_SHARED); 163 - xfs_dqlock(dq); 160 + mutex_lock(&dq->q_qlock); 164 161 165 162 /* 166 163 * Except for the root dquot, the actual dquot we got must either have ··· 248 251 xchk_quota_item_timer(sc, offset, &dq->q_rtb); 249 252 250 253 out: 254 + mutex_unlock(&dq->q_qlock); 251 255 if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) 252 256 return -ECANCELED; 253 257 ··· 328 330 xchk_dqiter_init(&cursor, sc, dqtype); 329 331 while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { 330 332 error = xchk_quota_item(&sqi, dq); 331 - xfs_qm_dqput(dq); 333 + xfs_qm_dqrele(dq); 332 334 if (error) 333 335 break; 334 336 }
+8 -10
fs/xfs/scrub/quota_repair.c
··· 184 184 /* 185 185 * We might need to fix holes in the bmap record for the storage 186 186 * backing this dquot, so we need to lock the dquot and the quota file. 187 - * dqiterate gave us a locked dquot, so drop the dquot lock to get the 188 - * ILOCK_EXCL. 189 187 */ 190 - xfs_dqunlock(dq); 191 188 xchk_ilock(sc, XFS_ILOCK_EXCL); 192 - xfs_dqlock(dq); 193 - 189 + mutex_lock(&dq->q_qlock); 194 190 error = xrep_quota_item_bmap(sc, dq, &dirty); 195 191 xchk_iunlock(sc, XFS_ILOCK_EXCL); 196 192 if (error) 197 - return error; 193 + goto out_unlock_dquot; 198 194 199 195 /* Check the limits. */ 200 196 if (dq->q_blk.softlimit > dq->q_blk.hardlimit) { ··· 242 246 xrep_quota_item_timer(sc, &dq->q_rtb, &dirty); 243 247 244 248 if (!dirty) 245 - return 0; 249 + goto out_unlock_dquot; 246 250 247 251 trace_xrep_dquot_item(sc->mp, dq->q_type, dq->q_id); 248 252 ··· 253 257 xfs_qm_adjust_dqtimers(dq); 254 258 } 255 259 xfs_trans_log_dquot(sc->tp, dq); 256 - error = xfs_trans_roll(&sc->tp); 257 - xfs_dqlock(dq); 260 + return xfs_trans_roll(&sc->tp); 261 + 262 + out_unlock_dquot: 263 + mutex_unlock(&dq->q_qlock); 258 264 return error; 259 265 } 260 266 ··· 511 513 xchk_dqiter_init(&cursor, sc, dqtype); 512 514 while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { 513 515 error = xrep_quota_item(&rqi, dq); 514 - xfs_qm_dqput(dq); 516 + xfs_qm_dqrele(dq); 515 517 if (error) 516 518 break; 517 519 }
+5 -6
fs/xfs/scrub/quotacheck.c
··· 563 563 return -ECANCELED; 564 564 } 565 565 566 + mutex_lock(&dq->q_qlock); 566 567 mutex_lock(&xqc->lock); 567 568 error = xfarray_load_sparse(counts, dq->q_id, &xcdq); 568 569 if (error) ··· 590 589 xchk_set_incomplete(xqc->sc); 591 590 error = -ECANCELED; 592 591 } 592 + out_unlock: 593 593 mutex_unlock(&xqc->lock); 594 + mutex_unlock(&dq->q_qlock); 594 595 if (error) 595 596 return error; 596 597 ··· 600 597 return -ECANCELED; 601 598 602 599 return 0; 603 - 604 - out_unlock: 605 - mutex_unlock(&xqc->lock); 606 - return error; 607 600 } 608 601 609 602 /* ··· 635 636 return error; 636 637 637 638 error = xqcheck_compare_dquot(xqc, dqtype, dq); 638 - xfs_qm_dqput(dq); 639 + xfs_qm_dqrele(dq); 639 640 if (error) 640 641 return error; 641 642 ··· 673 674 xchk_dqiter_init(&cursor, sc, dqtype); 674 675 while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { 675 676 error = xqcheck_compare_dquot(xqc, dqtype, dq); 676 - xfs_qm_dqput(dq); 677 + xfs_qm_dqrele(dq); 677 678 if (error) 678 679 break; 679 680 }
+4 -17
fs/xfs/scrub/quotacheck_repair.c
··· 52 52 bool dirty = false; 53 53 int error = 0; 54 54 55 - /* Unlock the dquot just long enough to allocate a transaction. */ 56 - xfs_dqunlock(dq); 57 55 error = xchk_trans_alloc(xqc->sc, 0); 58 - xfs_dqlock(dq); 59 56 if (error) 60 57 return error; 61 58 59 + mutex_lock(&dq->q_qlock); 62 60 xfs_trans_dqjoin(xqc->sc->tp, dq); 63 61 64 62 if (xchk_iscan_aborted(&xqc->iscan)) { ··· 113 115 if (dq->q_id) 114 116 xfs_qm_adjust_dqtimers(dq); 115 117 xfs_trans_log_dquot(xqc->sc->tp, dq); 116 - 117 - /* 118 - * Transaction commit unlocks the dquot, so we must re-lock it so that 119 - * the caller can put the reference (which apparently requires a locked 120 - * dquot). 121 - */ 122 - error = xrep_trans_commit(xqc->sc); 123 - xfs_dqlock(dq); 124 - return error; 118 + return xrep_trans_commit(xqc->sc); 125 119 126 120 out_unlock: 127 121 mutex_unlock(&xqc->lock); 128 122 out_cancel: 129 123 xchk_trans_cancel(xqc->sc); 130 - 131 - /* Re-lock the dquot so the caller can put the reference. */ 132 - xfs_dqlock(dq); 133 124 return error; 134 125 } 135 126 ··· 143 156 xchk_dqiter_init(&cursor, sc, dqtype); 144 157 while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { 145 158 error = xqcheck_commit_dquot(xqc, dqtype, dq); 146 - xfs_qm_dqput(dq); 159 + xfs_qm_dqrele(dq); 147 160 if (error) 148 161 break; 149 162 } ··· 174 187 return error; 175 188 176 189 error = xqcheck_commit_dquot(xqc, dqtype, dq); 177 - xfs_qm_dqput(dq); 190 + xfs_qm_dqrele(dq); 178 191 if (error) 179 192 return error; 180 193
+60 -85
fs/xfs/xfs_dquot.c
··· 31 31 * 32 32 * ip->i_lock 33 33 * qi->qi_tree_lock 34 - * dquot->q_qlock (xfs_dqlock() and friends) 34 + * dquot->q_qlock 35 35 * dquot->q_flush (xfs_dqflock() and friends) 36 36 * qi->qi_lru_lock 37 37 * ··· 801 801 static struct xfs_dquot * 802 802 xfs_qm_dqget_cache_lookup( 803 803 struct xfs_mount *mp, 804 - struct xfs_quotainfo *qi, 805 - struct radix_tree_root *tree, 806 - xfs_dqid_t id) 804 + xfs_dqid_t id, 805 + xfs_dqtype_t type) 807 806 { 807 + struct xfs_quotainfo *qi = mp->m_quotainfo; 808 + struct radix_tree_root *tree = xfs_dquot_tree(qi, type); 808 809 struct xfs_dquot *dqp; 809 810 810 811 restart: ··· 817 816 return NULL; 818 817 } 819 818 820 - xfs_dqlock(dqp); 821 - if (dqp->q_flags & XFS_DQFLAG_FREEING) { 822 - xfs_dqunlock(dqp); 819 + if (!lockref_get_not_dead(&dqp->q_lockref)) { 823 820 mutex_unlock(&qi->qi_tree_lock); 824 821 trace_xfs_dqget_freeing(dqp); 825 822 delay(1); 826 823 goto restart; 827 824 } 828 - 829 - dqp->q_nrefs++; 830 825 mutex_unlock(&qi->qi_tree_lock); 831 826 832 827 trace_xfs_dqget_hit(dqp); ··· 833 836 /* 834 837 * Try to insert a new dquot into the in-core cache. If an error occurs the 835 838 * caller should throw away the dquot and start over. Otherwise, the dquot 836 - * is returned locked (and held by the cache) as if there had been a cache 837 - * hit. 839 + * is returned (and held by the cache) as if there had been a cache hit. 838 840 * 839 841 * The insert needs to be done under memalloc_nofs context because the radix 840 842 * tree can do memory allocation during insert. 
The qi->qi_tree_lock is taken in ··· 844 848 static int 845 849 xfs_qm_dqget_cache_insert( 846 850 struct xfs_mount *mp, 847 - struct xfs_quotainfo *qi, 848 - struct radix_tree_root *tree, 849 851 xfs_dqid_t id, 852 + xfs_dqtype_t type, 850 853 struct xfs_dquot *dqp) 851 854 { 855 + struct xfs_quotainfo *qi = mp->m_quotainfo; 856 + struct radix_tree_root *tree = xfs_dquot_tree(qi, type); 852 857 unsigned int nofs_flags; 853 858 int error; 854 859 ··· 857 860 mutex_lock(&qi->qi_tree_lock); 858 861 error = radix_tree_insert(tree, id, dqp); 859 862 if (unlikely(error)) { 860 - /* Duplicate found! Caller must try again. */ 861 863 trace_xfs_dqget_dup(dqp); 862 864 goto out_unlock; 863 865 } 864 866 865 - /* Return a locked dquot to the caller, with a reference taken. */ 866 - xfs_dqlock(dqp); 867 - dqp->q_nrefs = 1; 867 + lockref_init(&dqp->q_lockref); 868 868 qi->qi_dquots++; 869 869 870 870 out_unlock: ··· 897 903 898 904 /* 899 905 * Given the file system, id, and type (UDQUOT/GDQUOT/PDQUOT), return a 900 - * locked dquot, doing an allocation (if requested) as needed. 906 + * dquot, doing an allocation (if requested) as needed. 901 907 */ 902 908 int 903 909 xfs_qm_dqget( ··· 907 913 bool can_alloc, 908 914 struct xfs_dquot **O_dqpp) 909 915 { 910 - struct xfs_quotainfo *qi = mp->m_quotainfo; 911 - struct radix_tree_root *tree = xfs_dquot_tree(qi, type); 912 916 struct xfs_dquot *dqp; 913 917 int error; 914 918 ··· 915 923 return error; 916 924 917 925 restart: 918 - dqp = xfs_qm_dqget_cache_lookup(mp, qi, tree, id); 919 - if (dqp) { 920 - *O_dqpp = dqp; 921 - return 0; 922 - } 926 + dqp = xfs_qm_dqget_cache_lookup(mp, id, type); 927 + if (dqp) 928 + goto found; 923 929 924 930 error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp); 925 931 if (error) 926 932 return error; 927 933 928 - error = xfs_qm_dqget_cache_insert(mp, qi, tree, id, dqp); 934 + error = xfs_qm_dqget_cache_insert(mp, id, type, dqp); 929 935 if (error) { 930 - /* 931 - * Duplicate found. 
Just throw away the new dquot and start 932 - * over. 933 - */ 934 936 xfs_qm_dqdestroy(dqp); 935 - XFS_STATS_INC(mp, xs_qm_dquot_dups); 936 - goto restart; 937 + if (error == -EEXIST) { 938 + /* 939 + * Duplicate found. Just throw away the new dquot and 940 + * start over. 941 + */ 942 + XFS_STATS_INC(mp, xs_qm_dquot_dups); 943 + goto restart; 944 + } 945 + return error; 937 946 } 938 947 939 948 trace_xfs_dqget_miss(dqp); 949 + found: 940 950 *O_dqpp = dqp; 941 951 return 0; 942 952 } ··· 993 999 struct xfs_inode *ip, 994 1000 xfs_dqtype_t type, 995 1001 bool can_alloc, 996 - struct xfs_dquot **O_dqpp) 1002 + struct xfs_dquot **dqpp) 997 1003 { 998 1004 struct xfs_mount *mp = ip->i_mount; 999 - struct xfs_quotainfo *qi = mp->m_quotainfo; 1000 - struct radix_tree_root *tree = xfs_dquot_tree(qi, type); 1001 1005 struct xfs_dquot *dqp; 1002 1006 xfs_dqid_t id; 1003 1007 int error; 1008 + 1009 + ASSERT(!*dqpp); 1010 + xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); 1004 1011 1005 1012 error = xfs_qm_dqget_checks(mp, type); 1006 1013 if (error) ··· 1014 1019 id = xfs_qm_id_for_quotatype(ip, type); 1015 1020 1016 1021 restart: 1017 - dqp = xfs_qm_dqget_cache_lookup(mp, qi, tree, id); 1018 - if (dqp) { 1019 - *O_dqpp = dqp; 1020 - return 0; 1021 - } 1022 + dqp = xfs_qm_dqget_cache_lookup(mp, id, type); 1023 + if (dqp) 1024 + goto found; 1022 1025 1023 1026 /* 1024 1027 * Dquot cache miss. We don't want to keep the inode lock across ··· 1042 1049 if (dqp1) { 1043 1050 xfs_qm_dqdestroy(dqp); 1044 1051 dqp = dqp1; 1045 - xfs_dqlock(dqp); 1046 1052 goto dqret; 1047 1053 } 1048 1054 } else { ··· 1050 1058 return -ESRCH; 1051 1059 } 1052 1060 1053 - error = xfs_qm_dqget_cache_insert(mp, qi, tree, id, dqp); 1061 + error = xfs_qm_dqget_cache_insert(mp, id, type, dqp); 1054 1062 if (error) { 1055 - /* 1056 - * Duplicate found. Just throw away the new dquot and start 1057 - * over. 
1058 - */ 1059 1063 xfs_qm_dqdestroy(dqp); 1060 - XFS_STATS_INC(mp, xs_qm_dquot_dups); 1061 - goto restart; 1064 + if (error == -EEXIST) { 1065 + /* 1066 + * Duplicate found. Just throw away the new dquot and 1067 + * start over. 1068 + */ 1069 + XFS_STATS_INC(mp, xs_qm_dquot_dups); 1070 + goto restart; 1071 + } 1072 + return error; 1062 1073 } 1063 1074 1064 1075 dqret: 1065 1076 xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); 1066 1077 trace_xfs_dqget_miss(dqp); 1067 - *O_dqpp = dqp; 1078 + found: 1079 + trace_xfs_dqattach_get(dqp); 1080 + *dqpp = dqp; 1068 1081 return 0; 1069 1082 } 1070 1083 ··· 1095 1098 else if (error != 0) 1096 1099 break; 1097 1100 1101 + mutex_lock(&dqp->q_qlock); 1098 1102 if (!XFS_IS_DQUOT_UNINITIALIZED(dqp)) { 1099 1103 *dqpp = dqp; 1100 1104 return 0; 1101 1105 } 1102 1106 1103 - xfs_qm_dqput(dqp); 1107 + mutex_unlock(&dqp->q_qlock); 1108 + xfs_qm_dqrele(dqp); 1104 1109 } 1105 1110 1106 1111 return error; 1107 1112 } 1108 1113 1109 1114 /* 1110 - * Release a reference to the dquot (decrement ref-count) and unlock it. 1111 - * 1112 - * If there is a group quota attached to this dquot, carefully release that 1113 - * too without tripping over deadlocks'n'stuff. 1114 - */ 1115 - void 1116 - xfs_qm_dqput( 1117 - struct xfs_dquot *dqp) 1118 - { 1119 - ASSERT(dqp->q_nrefs > 0); 1120 - ASSERT(XFS_DQ_IS_LOCKED(dqp)); 1121 - 1122 - trace_xfs_dqput(dqp); 1123 - 1124 - if (--dqp->q_nrefs == 0) { 1125 - struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo; 1126 - trace_xfs_dqput_free(dqp); 1127 - 1128 - if (list_lru_add_obj(&qi->qi_lru, &dqp->q_lru)) 1129 - XFS_STATS_INC(dqp->q_mount, xs_qm_dquot_unused); 1130 - } 1131 - xfs_dqunlock(dqp); 1132 - } 1133 - 1134 - /* 1135 - * Release a dquot. Flush it if dirty, then dqput() it. 1136 - * dquot must not be locked. 1115 + * Release a reference to the dquot. 
1137 1116 */ 1138 1117 void 1139 1118 xfs_qm_dqrele( ··· 1120 1147 1121 1148 trace_xfs_dqrele(dqp); 1122 1149 1123 - xfs_dqlock(dqp); 1124 - /* 1125 - * We don't care to flush it if the dquot is dirty here. 1126 - * That will create stutters that we want to avoid. 1127 - * Instead we do a delayed write when we try to reclaim 1128 - * a dirty dquot. Also xfs_sync will take part of the burden... 1129 - */ 1130 - xfs_qm_dqput(dqp); 1150 + if (lockref_put_or_lock(&dqp->q_lockref)) 1151 + return; 1152 + if (!--dqp->q_lockref.count) { 1153 + struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo; 1154 + 1155 + trace_xfs_dqrele_free(dqp); 1156 + if (list_lru_add_obj(&qi->qi_lru, &dqp->q_lru)) 1157 + XFS_STATS_INC(dqp->q_mount, xs_qm_dquot_unused); 1158 + } 1159 + spin_unlock(&dqp->q_lockref.lock); 1131 1160 } 1132 1161 1133 1162 /*
+2 -20
fs/xfs/xfs_dquot.h
··· 71 71 xfs_dqtype_t q_type; 72 72 uint16_t q_flags; 73 73 xfs_dqid_t q_id; 74 - uint q_nrefs; 74 + struct lockref q_lockref; 75 75 int q_bufoffset; 76 76 xfs_daddr_t q_blkno; 77 77 xfs_fileoff_t q_fileoffset; ··· 119 119 static inline void xfs_dqfunlock(struct xfs_dquot *dqp) 120 120 { 121 121 complete(&dqp->q_flush); 122 - } 123 - 124 - static inline int xfs_dqlock_nowait(struct xfs_dquot *dqp) 125 - { 126 - return mutex_trylock(&dqp->q_qlock); 127 - } 128 - 129 - static inline void xfs_dqlock(struct xfs_dquot *dqp) 130 - { 131 - mutex_lock(&dqp->q_qlock); 132 - } 133 - 134 - static inline void xfs_dqunlock(struct xfs_dquot *dqp) 135 - { 136 - mutex_unlock(&dqp->q_qlock); 137 122 } 138 123 139 124 static inline int ··· 218 233 int xfs_qm_dqget_uncached(struct xfs_mount *mp, 219 234 xfs_dqid_t id, xfs_dqtype_t type, 220 235 struct xfs_dquot **dqpp); 221 - void xfs_qm_dqput(struct xfs_dquot *dqp); 222 236 223 237 void xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *); 224 238 void xfs_dqlockn(struct xfs_dqtrx *q); ··· 230 246 231 247 static inline struct xfs_dquot *xfs_qm_dqhold(struct xfs_dquot *dqp) 232 248 { 233 - xfs_dqlock(dqp); 234 - dqp->q_nrefs++; 235 - xfs_dqunlock(dqp); 249 + lockref_get(&dqp->q_lockref); 236 250 return dqp; 237 251 } 238 252
+3 -3
fs/xfs/xfs_dquot_item.c
··· 132 132 if (atomic_read(&dqp->q_pincount) > 0) 133 133 return XFS_ITEM_PINNED; 134 134 135 - if (!xfs_dqlock_nowait(dqp)) 135 + if (!mutex_trylock(&dqp->q_qlock)) 136 136 return XFS_ITEM_LOCKED; 137 137 138 138 /* ··· 177 177 out_relock_ail: 178 178 spin_lock(&lip->li_ailp->ail_lock); 179 179 out_unlock: 180 - xfs_dqunlock(dqp); 180 + mutex_unlock(&dqp->q_qlock); 181 181 return rval; 182 182 } 183 183 ··· 195 195 * transaction layer, within trans_commit. Hence, no LI_HOLD flag 196 196 * for the logitem. 197 197 */ 198 - xfs_dqunlock(dqp); 198 + mutex_unlock(&dqp->q_qlock); 199 199 } 200 200 201 201 STATIC void
+13 -18
fs/xfs/xfs_icache.c
··· 358 358 static int 359 359 xfs_iget_recycle( 360 360 struct xfs_perag *pag, 361 - struct xfs_inode *ip) __releases(&ip->i_flags_lock) 361 + struct xfs_inode *ip) 362 362 { 363 363 struct xfs_mount *mp = ip->i_mount; 364 364 struct inode *inode = VFS_I(ip); 365 365 int error; 366 366 367 367 trace_xfs_iget_recycle(ip); 368 - 369 - if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) 370 - return -EAGAIN; 371 - 372 - /* 373 - * We need to make it look like the inode is being reclaimed to prevent 374 - * the actual reclaim workers from stomping over us while we recycle 375 - * the inode. We can't clear the radix tree tag yet as it requires 376 - * pag_ici_lock to be held exclusive. 377 - */ 378 - ip->i_flags |= XFS_IRECLAIM; 379 - 380 - spin_unlock(&ip->i_flags_lock); 381 - rcu_read_unlock(); 382 368 383 369 ASSERT(!rwsem_is_locked(&inode->i_rwsem)); 384 370 error = xfs_reinit_inode(mp, inode); ··· 562 576 563 577 /* The inode fits the selection criteria; process it. */ 564 578 if (ip->i_flags & XFS_IRECLAIMABLE) { 565 - /* Drops i_flags_lock and RCU read lock. */ 566 - error = xfs_iget_recycle(pag, ip); 567 - if (error == -EAGAIN) 579 + /* 580 + * We need to make it look like the inode is being reclaimed to 581 + * prevent the actual reclaim workers from stomping over us 582 + * while we recycle the inode. We can't clear the radix tree 583 + * tag yet as it requires pag_ici_lock to be held exclusive. 584 + */ 585 + if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) 568 586 goto out_skip; 587 + ip->i_flags |= XFS_IRECLAIM; 588 + spin_unlock(&ip->i_flags_lock); 589 + rcu_read_unlock(); 590 + 591 + error = xfs_iget_recycle(pag, ip); 569 592 if (error) 570 593 return error; 571 594 } else {
+79 -127
fs/xfs/xfs_log.c
··· 534 534 */ 535 535 if ((iclog->ic_state == XLOG_STATE_WANT_SYNC || 536 536 (iclog->ic_flags & XLOG_ICL_NEED_FUA)) && 537 - !iclog->ic_header.h_tail_lsn) { 538 - iclog->ic_header.h_tail_lsn = 537 + !iclog->ic_header->h_tail_lsn) { 538 + iclog->ic_header->h_tail_lsn = 539 539 cpu_to_be64(atomic64_read(&log->l_tail_lsn)); 540 540 } 541 541 ··· 1279 1279 log->l_iclog_size = mp->m_logbsize; 1280 1280 1281 1281 /* 1282 - * # headers = size / 32k - one header holds cycles from 32k of data. 1282 + * Combined size of the log record headers. The first 32k cycles 1283 + * are stored directly in the xlog_rec_header, the rest in the 1284 + * variable number of xlog_rec_ext_headers at its end. 1283 1285 */ 1284 - log->l_iclog_heads = 1285 - DIV_ROUND_UP(mp->m_logbsize, XLOG_HEADER_CYCLE_SIZE); 1286 - log->l_iclog_hsize = log->l_iclog_heads << BBSHIFT; 1286 + log->l_iclog_hsize = struct_size(log->l_iclog->ic_header, h_ext, 1287 + DIV_ROUND_UP(mp->m_logbsize, XLOG_HEADER_CYCLE_SIZE) - 1); 1287 1288 } 1288 1289 1289 1290 void ··· 1368 1367 int num_bblks) 1369 1368 { 1370 1369 struct xlog *log; 1371 - xlog_rec_header_t *head; 1372 - xlog_in_core_t **iclogp; 1373 - xlog_in_core_t *iclog, *prev_iclog=NULL; 1370 + struct xlog_in_core **iclogp; 1371 + struct xlog_in_core *iclog, *prev_iclog = NULL; 1374 1372 int i; 1375 1373 int error = -ENOMEM; 1376 1374 uint log2_size = 0; ··· 1436 1436 init_waitqueue_head(&log->l_flush_wait); 1437 1437 1438 1438 iclogp = &log->l_iclog; 1439 - /* 1440 - * The amount of memory to allocate for the iclog structure is 1441 - * rather funky due to the way the structure is defined. It is 1442 - * done this way so that we can use different sizes for machines 1443 - * with different amounts of memory. See the definition of 1444 - * xlog_in_core_t in xfs_log_priv.h for details. 
1445 - */ 1446 1439 ASSERT(log->l_iclog_size >= 4096); 1447 1440 for (i = 0; i < log->l_iclog_bufs; i++) { 1448 1441 size_t bvec_size = howmany(log->l_iclog_size, PAGE_SIZE) * ··· 1450 1457 iclog->ic_prev = prev_iclog; 1451 1458 prev_iclog = iclog; 1452 1459 1453 - iclog->ic_data = kvzalloc(log->l_iclog_size, 1460 + iclog->ic_header = kvzalloc(log->l_iclog_size, 1454 1461 GFP_KERNEL | __GFP_RETRY_MAYFAIL); 1455 - if (!iclog->ic_data) 1462 + if (!iclog->ic_header) 1456 1463 goto out_free_iclog; 1457 - head = &iclog->ic_header; 1458 - memset(head, 0, sizeof(xlog_rec_header_t)); 1459 - head->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM); 1460 - head->h_version = cpu_to_be32( 1464 + iclog->ic_header->h_magicno = 1465 + cpu_to_be32(XLOG_HEADER_MAGIC_NUM); 1466 + iclog->ic_header->h_version = cpu_to_be32( 1461 1467 xfs_has_logv2(log->l_mp) ? 2 : 1); 1462 - head->h_size = cpu_to_be32(log->l_iclog_size); 1463 - /* new fields */ 1464 - head->h_fmt = cpu_to_be32(XLOG_FMT); 1465 - memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t)); 1468 + iclog->ic_header->h_size = cpu_to_be32(log->l_iclog_size); 1469 + iclog->ic_header->h_fmt = cpu_to_be32(XLOG_FMT); 1470 + memcpy(&iclog->ic_header->h_fs_uuid, &mp->m_sb.sb_uuid, 1471 + sizeof(iclog->ic_header->h_fs_uuid)); 1466 1472 1473 + iclog->ic_datap = (void *)iclog->ic_header + log->l_iclog_hsize; 1467 1474 iclog->ic_size = log->l_iclog_size - log->l_iclog_hsize; 1468 1475 iclog->ic_state = XLOG_STATE_ACTIVE; 1469 1476 iclog->ic_log = log; 1470 1477 atomic_set(&iclog->ic_refcnt, 0); 1471 1478 INIT_LIST_HEAD(&iclog->ic_callbacks); 1472 - iclog->ic_datap = (void *)iclog->ic_data + log->l_iclog_hsize; 1473 1479 1474 1480 init_waitqueue_head(&iclog->ic_force_wait); 1475 1481 init_waitqueue_head(&iclog->ic_write_wait); ··· 1496 1504 out_free_iclog: 1497 1505 for (iclog = log->l_iclog; iclog; iclog = prev_iclog) { 1498 1506 prev_iclog = iclog->ic_next; 1499 - kvfree(iclog->ic_data); 1507 + kvfree(iclog->ic_header); 1500 1508 
kfree(iclog); 1501 1509 if (prev_iclog == log->l_iclog) 1502 1510 break; ··· 1516 1524 struct xlog_in_core *iclog, 1517 1525 int roundoff) 1518 1526 { 1519 - int i, j, k; 1520 - int size = iclog->ic_offset + roundoff; 1521 - __be32 cycle_lsn; 1522 - char *dp; 1527 + struct xlog_rec_header *rhead = iclog->ic_header; 1528 + __be32 cycle_lsn = CYCLE_LSN_DISK(rhead->h_lsn); 1529 + char *dp = iclog->ic_datap; 1530 + int i; 1523 1531 1524 - cycle_lsn = CYCLE_LSN_DISK(iclog->ic_header.h_lsn); 1525 - 1526 - dp = iclog->ic_datap; 1527 - for (i = 0; i < BTOBB(size); i++) { 1528 - if (i >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) 1529 - break; 1530 - iclog->ic_header.h_cycle_data[i] = *(__be32 *)dp; 1532 + for (i = 0; i < BTOBB(iclog->ic_offset + roundoff); i++) { 1533 + *xlog_cycle_data(rhead, i) = *(__be32 *)dp; 1531 1534 *(__be32 *)dp = cycle_lsn; 1532 1535 dp += BBSIZE; 1533 1536 } 1534 1537 1535 - if (xfs_has_logv2(log->l_mp)) { 1536 - xlog_in_core_2_t *xhdr = iclog->ic_data; 1537 - 1538 - for ( ; i < BTOBB(size); i++) { 1539 - j = i / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); 1540 - k = i % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); 1541 - xhdr[j].hic_xheader.xh_cycle_data[k] = *(__be32 *)dp; 1542 - *(__be32 *)dp = cycle_lsn; 1543 - dp += BBSIZE; 1544 - } 1545 - 1546 - for (i = 1; i < log->l_iclog_heads; i++) 1547 - xhdr[i].hic_xheader.xh_cycle = cycle_lsn; 1548 - } 1538 + for (i = 0; i < (log->l_iclog_hsize >> BBSHIFT) - 1; i++) 1539 + rhead->h_ext[i].xh_cycle = cycle_lsn; 1549 1540 } 1550 1541 1551 1542 /* ··· 1553 1578 1554 1579 /* ... then for additional cycle data for v2 logs ... 
*/ 1555 1580 if (xfs_has_logv2(log->l_mp)) { 1556 - union xlog_in_core2 *xhdr = (union xlog_in_core2 *)rhead; 1557 - int i; 1558 - int xheads; 1581 + int xheads, i; 1559 1582 1560 - xheads = DIV_ROUND_UP(size, XLOG_HEADER_CYCLE_SIZE); 1561 - 1562 - for (i = 1; i < xheads; i++) { 1563 - crc = crc32c(crc, &xhdr[i].hic_xheader, 1564 - sizeof(struct xlog_rec_ext_header)); 1565 - } 1583 + xheads = DIV_ROUND_UP(size, XLOG_HEADER_CYCLE_SIZE) - 1; 1584 + for (i = 0; i < xheads; i++) 1585 + crc = crc32c(crc, &rhead->h_ext[i], XLOG_REC_EXT_SIZE); 1566 1586 } 1567 1587 1568 1588 /* ... and finally for the payload */ ··· 1641 1671 1642 1672 iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA); 1643 1673 1644 - if (is_vmalloc_addr(iclog->ic_data)) { 1645 - if (!bio_add_vmalloc(&iclog->ic_bio, iclog->ic_data, count)) 1674 + if (is_vmalloc_addr(iclog->ic_header)) { 1675 + if (!bio_add_vmalloc(&iclog->ic_bio, iclog->ic_header, count)) 1646 1676 goto shutdown; 1647 1677 } else { 1648 - bio_add_virt_nofail(&iclog->ic_bio, iclog->ic_data, count); 1678 + bio_add_virt_nofail(&iclog->ic_bio, iclog->ic_header, count); 1649 1679 } 1650 1680 1651 1681 /* ··· 1774 1804 size = iclog->ic_offset; 1775 1805 if (xfs_has_logv2(log->l_mp)) 1776 1806 size += roundoff; 1777 - iclog->ic_header.h_len = cpu_to_be32(size); 1807 + iclog->ic_header->h_len = cpu_to_be32(size); 1778 1808 1779 1809 XFS_STATS_INC(log->l_mp, xs_log_writes); 1780 1810 XFS_STATS_ADD(log->l_mp, xs_log_blocks, BTOBB(count)); 1781 1811 1782 - bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn)); 1812 + bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header->h_lsn)); 1783 1813 1784 1814 /* Do we need to split this write into 2 parts? 
*/ 1785 1815 if (bno + BTOBB(count) > log->l_logBBsize) 1786 - xlog_split_iclog(log, &iclog->ic_header, bno, count); 1816 + xlog_split_iclog(log, iclog->ic_header, bno, count); 1787 1817 1788 1818 /* calculcate the checksum */ 1789 - iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header, 1819 + iclog->ic_header->h_crc = xlog_cksum(log, iclog->ic_header, 1790 1820 iclog->ic_datap, XLOG_REC_SIZE, size); 1791 1821 /* 1792 1822 * Intentionally corrupt the log record CRC based on the error injection ··· 1797 1827 */ 1798 1828 #ifdef DEBUG 1799 1829 if (XFS_TEST_ERROR(log->l_mp, XFS_ERRTAG_LOG_BAD_CRC)) { 1800 - iclog->ic_header.h_crc &= cpu_to_le32(0xAAAAAAAA); 1830 + iclog->ic_header->h_crc &= cpu_to_le32(0xAAAAAAAA); 1801 1831 iclog->ic_fail_crc = true; 1802 1832 xfs_warn(log->l_mp, 1803 1833 "Intentionally corrupted log record at LSN 0x%llx. Shutdown imminent.", 1804 - be64_to_cpu(iclog->ic_header.h_lsn)); 1834 + be64_to_cpu(iclog->ic_header->h_lsn)); 1805 1835 } 1806 1836 #endif 1807 1837 xlog_verify_iclog(log, iclog, count); ··· 1813 1843 */ 1814 1844 STATIC void 1815 1845 xlog_dealloc_log( 1816 - struct xlog *log) 1846 + struct xlog *log) 1817 1847 { 1818 - xlog_in_core_t *iclog, *next_iclog; 1819 - int i; 1848 + struct xlog_in_core *iclog, *next_iclog; 1849 + int i; 1820 1850 1821 1851 /* 1822 1852 * Destroy the CIL after waiting for iclog IO completion because an ··· 1828 1858 iclog = log->l_iclog; 1829 1859 for (i = 0; i < log->l_iclog_bufs; i++) { 1830 1860 next_iclog = iclog->ic_next; 1831 - kvfree(iclog->ic_data); 1861 + kvfree(iclog->ic_header); 1832 1862 kfree(iclog); 1833 1863 iclog = next_iclog; 1834 1864 } ··· 1850 1880 { 1851 1881 lockdep_assert_held(&log->l_icloglock); 1852 1882 1853 - be32_add_cpu(&iclog->ic_header.h_num_logops, record_cnt); 1883 + be32_add_cpu(&iclog->ic_header->h_num_logops, record_cnt); 1854 1884 iclog->ic_offset += copy_bytes; 1855 1885 } 1856 1886 ··· 2273 2303 * We don't need to cover the dummy. 
2274 2304 */ 2275 2305 if (*iclogs_changed == 0 && 2276 - iclog->ic_header.h_num_logops == cpu_to_be32(XLOG_COVER_OPS)) { 2306 + iclog->ic_header->h_num_logops == cpu_to_be32(XLOG_COVER_OPS)) { 2277 2307 *iclogs_changed = 1; 2278 2308 } else { 2279 2309 /* ··· 2285 2315 2286 2316 iclog->ic_state = XLOG_STATE_ACTIVE; 2287 2317 iclog->ic_offset = 0; 2288 - iclog->ic_header.h_num_logops = 0; 2289 - memset(iclog->ic_header.h_cycle_data, 0, 2290 - sizeof(iclog->ic_header.h_cycle_data)); 2291 - iclog->ic_header.h_lsn = 0; 2292 - iclog->ic_header.h_tail_lsn = 0; 2318 + iclog->ic_header->h_num_logops = 0; 2319 + memset(iclog->ic_header->h_cycle_data, 0, 2320 + sizeof(iclog->ic_header->h_cycle_data)); 2321 + iclog->ic_header->h_lsn = 0; 2322 + iclog->ic_header->h_tail_lsn = 0; 2293 2323 } 2294 2324 2295 2325 /* ··· 2381 2411 iclog->ic_state == XLOG_STATE_DIRTY) 2382 2412 continue; 2383 2413 2384 - lsn = be64_to_cpu(iclog->ic_header.h_lsn); 2414 + lsn = be64_to_cpu(iclog->ic_header->h_lsn); 2385 2415 if ((lsn && !lowest_lsn) || XFS_LSN_CMP(lsn, lowest_lsn) < 0) 2386 2416 lowest_lsn = lsn; 2387 2417 } while ((iclog = iclog->ic_next) != log->l_iclog); ··· 2416 2446 * If this is not the lowest lsn iclog, then we will leave it 2417 2447 * for another completion to process. 
2418 2448 */ 2419 - header_lsn = be64_to_cpu(iclog->ic_header.h_lsn); 2449 + header_lsn = be64_to_cpu(iclog->ic_header->h_lsn); 2420 2450 lowest_lsn = xlog_get_lowest_lsn(log); 2421 2451 if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0) 2422 2452 return false; ··· 2579 2609 struct xlog_ticket *ticket, 2580 2610 int *logoffsetp) 2581 2611 { 2582 - int log_offset; 2583 - xlog_rec_header_t *head; 2584 - xlog_in_core_t *iclog; 2612 + int log_offset; 2613 + struct xlog_rec_header *head; 2614 + struct xlog_in_core *iclog; 2585 2615 2586 2616 restart: 2587 2617 spin_lock(&log->l_icloglock); ··· 2599 2629 goto restart; 2600 2630 } 2601 2631 2602 - head = &iclog->ic_header; 2632 + head = iclog->ic_header; 2603 2633 2604 2634 atomic_inc(&iclog->ic_refcnt); /* prevents sync */ 2605 2635 log_offset = iclog->ic_offset; ··· 2764 2794 if (!eventual_size) 2765 2795 eventual_size = iclog->ic_offset; 2766 2796 iclog->ic_state = XLOG_STATE_WANT_SYNC; 2767 - iclog->ic_header.h_prev_block = cpu_to_be32(log->l_prev_block); 2797 + iclog->ic_header->h_prev_block = cpu_to_be32(log->l_prev_block); 2768 2798 log->l_prev_block = log->l_curr_block; 2769 2799 log->l_prev_cycle = log->l_curr_cycle; 2770 2800 ··· 2808 2838 struct xlog_in_core *iclog, 2809 2839 bool *completed) 2810 2840 { 2811 - xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn); 2841 + xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header->h_lsn); 2812 2842 int error; 2813 2843 2814 2844 *completed = false; ··· 2820 2850 * If the iclog has already been completed and reused the header LSN 2821 2851 * will have been rewritten by completion 2822 2852 */ 2823 - if (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) 2853 + if (be64_to_cpu(iclog->ic_header->h_lsn) != lsn) 2824 2854 *completed = true; 2825 2855 return 0; 2826 2856 } ··· 2953 2983 goto out_error; 2954 2984 2955 2985 iclog = log->l_iclog; 2956 - while (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) { 2986 + while (be64_to_cpu(iclog->ic_header->h_lsn) != lsn) { 2957 2987 
trace_xlog_iclog_force_lsn(iclog, _RET_IP_); 2958 2988 iclog = iclog->ic_next; 2959 2989 if (iclog == log->l_iclog) ··· 3219 3249 { 3220 3250 xfs_alert(log->l_mp, 3221 3251 "ran out of log space tail 0x%llx/0x%llx, head lsn 0x%llx, head 0x%x/0x%x, prev head 0x%x/0x%x", 3222 - iclog ? be64_to_cpu(iclog->ic_header.h_tail_lsn) : -1, 3252 + iclog ? be64_to_cpu(iclog->ic_header->h_tail_lsn) : -1, 3223 3253 atomic64_read(&log->l_tail_lsn), 3224 3254 log->l_ailp->ail_head_lsn, 3225 3255 log->l_curr_cycle, log->l_curr_block, ··· 3238 3268 struct xlog *log, 3239 3269 struct xlog_in_core *iclog) 3240 3270 { 3241 - xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header.h_tail_lsn); 3271 + xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header->h_tail_lsn); 3242 3272 int blocks; 3243 3273 3244 3274 if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) { ··· 3292 3322 struct xlog_in_core *iclog, 3293 3323 int count) 3294 3324 { 3295 - struct xlog_op_header *ophead; 3296 - xlog_in_core_t *icptr; 3297 - xlog_in_core_2_t *xhdr; 3298 - void *base_ptr, *ptr, *p; 3325 + struct xlog_rec_header *rhead = iclog->ic_header; 3326 + struct xlog_in_core *icptr; 3327 + void *base_ptr, *ptr; 3299 3328 ptrdiff_t field_offset; 3300 3329 uint8_t clientid; 3301 - int len, i, j, k, op_len; 3330 + int len, i, op_len; 3302 3331 int idx; 3303 3332 3304 3333 /* check validity of iclog pointers */ ··· 3311 3342 spin_unlock(&log->l_icloglock); 3312 3343 3313 3344 /* check log magic numbers */ 3314 - if (iclog->ic_header.h_magicno != cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) 3345 + if (rhead->h_magicno != cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) 3315 3346 xfs_emerg(log->l_mp, "%s: invalid magic num", __func__); 3316 3347 3317 - base_ptr = ptr = &iclog->ic_header; 3318 - p = &iclog->ic_header; 3348 + base_ptr = ptr = rhead; 3319 3349 for (ptr += BBSIZE; ptr < base_ptr + count; ptr += BBSIZE) { 3320 3350 if (*(__be32 *)ptr == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) 3321 3351 xfs_emerg(log->l_mp, "%s: unexpected magic num", ··· 3322 3354 } 
3323 3355 3324 3356 /* check fields */ 3325 - len = be32_to_cpu(iclog->ic_header.h_num_logops); 3357 + len = be32_to_cpu(rhead->h_num_logops); 3326 3358 base_ptr = ptr = iclog->ic_datap; 3327 - ophead = ptr; 3328 - xhdr = iclog->ic_data; 3329 3359 for (i = 0; i < len; i++) { 3330 - ophead = ptr; 3360 + struct xlog_op_header *ophead = ptr; 3361 + void *p = &ophead->oh_clientid; 3331 3362 3332 3363 /* clientid is only 1 byte */ 3333 - p = &ophead->oh_clientid; 3334 3364 field_offset = p - base_ptr; 3335 3365 if (field_offset & 0x1ff) { 3336 3366 clientid = ophead->oh_clientid; 3337 3367 } else { 3338 3368 idx = BTOBBT((void *)&ophead->oh_clientid - iclog->ic_datap); 3339 - if (idx >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) { 3340 - j = idx / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); 3341 - k = idx % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); 3342 - clientid = xlog_get_client_id( 3343 - xhdr[j].hic_xheader.xh_cycle_data[k]); 3344 - } else { 3345 - clientid = xlog_get_client_id( 3346 - iclog->ic_header.h_cycle_data[idx]); 3347 - } 3369 + clientid = xlog_get_client_id(*xlog_cycle_data(rhead, idx)); 3348 3370 } 3349 3371 if (clientid != XFS_TRANSACTION && clientid != XFS_LOG) { 3350 3372 xfs_warn(log->l_mp, ··· 3350 3392 op_len = be32_to_cpu(ophead->oh_len); 3351 3393 } else { 3352 3394 idx = BTOBBT((void *)&ophead->oh_len - iclog->ic_datap); 3353 - if (idx >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) { 3354 - j = idx / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); 3355 - k = idx % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); 3356 - op_len = be32_to_cpu(xhdr[j].hic_xheader.xh_cycle_data[k]); 3357 - } else { 3358 - op_len = be32_to_cpu(iclog->ic_header.h_cycle_data[idx]); 3359 - } 3395 + op_len = be32_to_cpu(*xlog_cycle_data(rhead, idx)); 3360 3396 } 3361 3397 ptr += sizeof(struct xlog_op_header) + op_len; 3362 3398 } ··· 3481 3529 3482 3530 STATIC int 3483 3531 xlog_iclogs_empty( 3484 - struct xlog *log) 3532 + struct xlog *log) 3485 3533 { 3486 - xlog_in_core_t *iclog; 3534 + struct xlog_in_core *iclog = 
log->l_iclog; 3487 3535 3488 - iclog = log->l_iclog; 3489 3536 do { 3490 3537 /* endianness does not matter here, zero is zero in 3491 3538 * any language. 3492 3539 */ 3493 - if (iclog->ic_header.h_num_logops) 3540 + if (iclog->ic_header->h_num_logops) 3494 3541 return 0; 3495 3542 iclog = iclog->ic_next; 3496 3543 } while (iclog != log->l_iclog); 3544 + 3497 3545 return 1; 3498 3546 } 3499 3547
+3 -3
fs/xfs/xfs_log_cil.c
··· 940 940 struct xlog_in_core *iclog) 941 941 { 942 942 struct xfs_cil *cil = ctx->cil; 943 - xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn); 943 + xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header->h_lsn); 944 944 945 945 ASSERT(!ctx->commit_lsn); 946 946 if (!ctx->start_lsn) { ··· 1458 1458 */ 1459 1459 spin_lock(&log->l_icloglock); 1460 1460 if (ctx->start_lsn != ctx->commit_lsn) { 1461 - xfs_lsn_t plsn; 1461 + xfs_lsn_t plsn = be64_to_cpu( 1462 + ctx->commit_iclog->ic_prev->ic_header->h_lsn); 1462 1463 1463 - plsn = be64_to_cpu(ctx->commit_iclog->ic_prev->ic_header.h_lsn); 1464 1464 if (plsn && XFS_LSN_CMP(plsn, ctx->commit_lsn) < 0) { 1465 1465 /* 1466 1466 * Waiting on ic_force_wait orders the completion of
+23 -10
fs/xfs/xfs_log_priv.h
··· 158 158 }; 159 159 160 160 /* 161 - * - A log record header is 512 bytes. There is plenty of room to grow the 162 - * xlog_rec_header_t into the reserved space. 163 - * - ic_data follows, so a write to disk can start at the beginning of 164 - * the iclog. 161 + * In-core log structure. 162 + * 165 163 * - ic_forcewait is used to implement synchronous forcing of the iclog to disk. 166 164 * - ic_next is the pointer to the next iclog in the ring. 167 165 * - ic_log is a pointer back to the global log structure. ··· 181 183 * We'll put all the read-only and l_icloglock fields in the first cacheline, 182 184 * and move everything else out to subsequent cachelines. 183 185 */ 184 - typedef struct xlog_in_core { 186 + struct xlog_in_core { 185 187 wait_queue_head_t ic_force_wait; 186 188 wait_queue_head_t ic_write_wait; 187 189 struct xlog_in_core *ic_next; ··· 196 198 197 199 /* reference counts need their own cacheline */ 198 200 atomic_t ic_refcnt ____cacheline_aligned_in_smp; 199 - xlog_in_core_2_t *ic_data; 200 - #define ic_header ic_data->hic_header 201 + struct xlog_rec_header *ic_header; 201 202 #ifdef DEBUG 202 203 bool ic_fail_crc : 1; 203 204 #endif ··· 204 207 struct work_struct ic_end_io_work; 205 208 struct bio ic_bio; 206 209 struct bio_vec ic_bvec[]; 207 - } xlog_in_core_t; 210 + }; 208 211 209 212 /* 210 213 * The CIL context is used to aggregate per-transaction details as well be ··· 406 409 struct list_head *l_buf_cancel_table; 407 410 struct list_head r_dfops; /* recovered log intent items */ 408 411 int l_iclog_hsize; /* size of iclog header */ 409 - int l_iclog_heads; /* # of iclog header sectors */ 410 412 uint l_sectBBsize; /* sector size in BBs (2^n) */ 411 413 int l_iclog_size; /* size of log in bytes */ 412 414 int l_iclog_bufs; /* number of iclog buffers */ ··· 418 422 /* waiting for iclog flush */ 419 423 int l_covered_state;/* state of "covering disk 420 424 * log entries" */ 421 - xlog_in_core_t *l_iclog; /* head log queue */ 425 + 
struct xlog_in_core *l_iclog; /* head log queue */ 422 426 spinlock_t l_icloglock; /* grab to change iclog state */ 423 427 int l_curr_cycle; /* Cycle number of log writes */ 424 428 int l_prev_cycle; /* Cycle number before last ··· 705 709 { 706 710 nbytes += niovecs * (sizeof(uint64_t) + sizeof(struct xlog_op_header)); 707 711 return round_up(nbytes, sizeof(uint64_t)); 712 + } 713 + 714 + /* 715 + * Cycles over XLOG_CYCLE_DATA_SIZE overflow into the extended header that was 716 + * added for v2 logs. Addressing for the cycles array there is off by one, 717 + * because the first batch of cycles is in the original header. 718 + */ 719 + static inline __be32 *xlog_cycle_data(struct xlog_rec_header *rhead, unsigned i) 720 + { 721 + if (i >= XLOG_CYCLE_DATA_SIZE) { 722 + unsigned j = i / XLOG_CYCLE_DATA_SIZE; 723 + unsigned k = i % XLOG_CYCLE_DATA_SIZE; 724 + 725 + return &rhead->h_ext[j - 1].xh_cycle_data[k]; 726 + } 727 + 728 + return &rhead->h_cycle_data[i]; 708 729 } 709 730 710 731 #endif /* __XFS_LOG_PRIV_H__ */
+17 -28
fs/xfs/xfs_log_recover.c
··· 190 190 */ 191 191 STATIC void 192 192 xlog_header_check_dump( 193 - xfs_mount_t *mp, 194 - xlog_rec_header_t *head) 193 + struct xfs_mount *mp, 194 + struct xlog_rec_header *head) 195 195 { 196 196 xfs_debug(mp, "%s: SB : uuid = %pU, fmt = %d", 197 197 __func__, &mp->m_sb.sb_uuid, XLOG_FMT); ··· 207 207 */ 208 208 STATIC int 209 209 xlog_header_check_recover( 210 - xfs_mount_t *mp, 211 - xlog_rec_header_t *head) 210 + struct xfs_mount *mp, 211 + struct xlog_rec_header *head) 212 212 { 213 213 ASSERT(head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)); 214 214 ··· 238 238 */ 239 239 STATIC int 240 240 xlog_header_check_mount( 241 - xfs_mount_t *mp, 242 - xlog_rec_header_t *head) 241 + struct xfs_mount *mp, 242 + struct xlog_rec_header *head) 243 243 { 244 244 ASSERT(head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)); 245 245 ··· 400 400 xfs_daddr_t i; 401 401 char *buffer; 402 402 char *offset = NULL; 403 - xlog_rec_header_t *head = NULL; 403 + struct xlog_rec_header *head = NULL; 404 404 int error = 0; 405 405 int smallmem = 0; 406 406 int num_blks = *last_blk - start_blk; ··· 437 437 goto out; 438 438 } 439 439 440 - head = (xlog_rec_header_t *)offset; 440 + head = (struct xlog_rec_header *)offset; 441 441 442 442 if (head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) 443 443 break; ··· 1237 1237 xfs_daddr_t *head_blk, 1238 1238 xfs_daddr_t *tail_blk) 1239 1239 { 1240 - xlog_rec_header_t *rhead; 1240 + struct xlog_rec_header *rhead; 1241 1241 char *offset = NULL; 1242 1242 char *buffer; 1243 1243 int error; ··· 1487 1487 int tail_cycle, 1488 1488 int tail_block) 1489 1489 { 1490 - xlog_rec_header_t *recp = (xlog_rec_header_t *)buf; 1490 + struct xlog_rec_header *recp = (struct xlog_rec_header *)buf; 1491 1491 1492 1492 memset(buf, 0, BBSIZE); 1493 1493 recp->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM); ··· 2863 2863 char *dp, 2864 2864 struct xlog *log) 2865 2865 { 2866 - int i, j, k; 2866 + int i; 2867 2867 2868 - for (i = 0; i < 
BTOBB(be32_to_cpu(rhead->h_len)) && 2869 - i < (XLOG_HEADER_CYCLE_SIZE / BBSIZE); i++) { 2870 - *(__be32 *)dp = *(__be32 *)&rhead->h_cycle_data[i]; 2868 + for (i = 0; i < BTOBB(be32_to_cpu(rhead->h_len)); i++) { 2869 + *(__be32 *)dp = *xlog_cycle_data(rhead, i); 2871 2870 dp += BBSIZE; 2872 - } 2873 - 2874 - if (xfs_has_logv2(log->l_mp)) { 2875 - xlog_in_core_2_t *xhdr = (xlog_in_core_2_t *)rhead; 2876 - for ( ; i < BTOBB(be32_to_cpu(rhead->h_len)); i++) { 2877 - j = i / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); 2878 - k = i % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); 2879 - *(__be32 *)dp = xhdr[j].hic_xheader.xh_cycle_data[k]; 2880 - dp += BBSIZE; 2881 - } 2882 2871 } 2883 2872 } 2884 2873 ··· 2997 3008 int pass, 2998 3009 xfs_daddr_t *first_bad) /* out: first bad log rec */ 2999 3010 { 3000 - xlog_rec_header_t *rhead; 3011 + struct xlog_rec_header *rhead; 3001 3012 xfs_daddr_t blk_no, rblk_no; 3002 3013 xfs_daddr_t rhead_blk; 3003 3014 char *offset; ··· 3034 3045 if (error) 3035 3046 goto bread_err1; 3036 3047 3037 - rhead = (xlog_rec_header_t *)offset; 3048 + rhead = (struct xlog_rec_header *)offset; 3038 3049 3039 3050 /* 3040 3051 * xfsprogs has a bug where record length is based on lsunit but ··· 3141 3152 if (error) 3142 3153 goto bread_err2; 3143 3154 } 3144 - rhead = (xlog_rec_header_t *)offset; 3155 + rhead = (struct xlog_rec_header *)offset; 3145 3156 error = xlog_valid_rec_header(log, rhead, 3146 3157 split_hblks ? blk_no : 0, h_size); 3147 3158 if (error) ··· 3223 3234 if (error) 3224 3235 goto bread_err2; 3225 3236 3226 - rhead = (xlog_rec_header_t *)offset; 3237 + rhead = (struct xlog_rec_header *)offset; 3227 3238 error = xlog_valid_rec_header(log, rhead, blk_no, h_size); 3228 3239 if (error) 3229 3240 goto bread_err2;
+41 -113
fs/xfs/xfs_qm.c
··· 126 126 void *data) 127 127 { 128 128 struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo; 129 - int error = -EAGAIN; 130 129 131 - xfs_dqlock(dqp); 132 - if ((dqp->q_flags & XFS_DQFLAG_FREEING) || dqp->q_nrefs != 0) 133 - goto out_unlock; 130 + spin_lock(&dqp->q_lockref.lock); 131 + if (dqp->q_lockref.count > 0 || __lockref_is_dead(&dqp->q_lockref)) { 132 + spin_unlock(&dqp->q_lockref.lock); 133 + return -EAGAIN; 134 + } 135 + lockref_mark_dead(&dqp->q_lockref); 136 + spin_unlock(&dqp->q_lockref.lock); 134 137 135 - dqp->q_flags |= XFS_DQFLAG_FREEING; 136 - 138 + mutex_lock(&dqp->q_qlock); 137 139 xfs_qm_dqunpin_wait(dqp); 138 140 xfs_dqflock(dqp); 139 141 ··· 146 144 */ 147 145 if (XFS_DQ_IS_DIRTY(dqp)) { 148 146 struct xfs_buf *bp = NULL; 147 + int error; 149 148 150 149 /* 151 150 * We don't care about getting disk errors here. We need ··· 154 151 */ 155 152 error = xfs_dquot_use_attached_buf(dqp, &bp); 156 153 if (error == -EAGAIN) { 157 - xfs_dqfunlock(dqp); 158 - dqp->q_flags &= ~XFS_DQFLAG_FREEING; 159 - goto out_unlock; 154 + /* resurrect the refcount from the dead. 
*/ 155 + dqp->q_lockref.count = 0; 156 + goto out_funlock; 160 157 } 161 158 if (!bp) 162 159 goto out_funlock; ··· 180 177 !test_bit(XFS_LI_IN_AIL, &dqp->q_logitem.qli_item.li_flags)); 181 178 182 179 xfs_dqfunlock(dqp); 183 - xfs_dqunlock(dqp); 180 + mutex_unlock(&dqp->q_qlock); 184 181 185 182 radix_tree_delete(xfs_dquot_tree(qi, xfs_dquot_type(dqp)), dqp->q_id); 186 183 qi->qi_dquots--; ··· 195 192 196 193 xfs_qm_dqdestroy(dqp); 197 194 return 0; 198 - 199 - out_unlock: 200 - xfs_dqunlock(dqp); 201 - return error; 202 195 } 203 196 204 197 /* ··· 287 288 xfs_qm_destroy_quotainos(mp->m_quotainfo); 288 289 } 289 290 290 - STATIC int 291 - xfs_qm_dqattach_one( 292 - struct xfs_inode *ip, 293 - xfs_dqtype_t type, 294 - bool doalloc, 295 - struct xfs_dquot **IO_idqpp) 296 - { 297 - struct xfs_dquot *dqp; 298 - int error; 299 - 300 - xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); 301 - error = 0; 302 - 303 - /* 304 - * See if we already have it in the inode itself. IO_idqpp is &i_udquot 305 - * or &i_gdquot. This made the code look weird, but made the logic a lot 306 - * simpler. 307 - */ 308 - dqp = *IO_idqpp; 309 - if (dqp) { 310 - trace_xfs_dqattach_found(dqp); 311 - return 0; 312 - } 313 - 314 - /* 315 - * Find the dquot from somewhere. This bumps the reference count of 316 - * dquot and returns it locked. This can return ENOENT if dquot didn't 317 - * exist on disk and we didn't ask it to allocate; ESRCH if quotas got 318 - * turned off suddenly. 319 - */ 320 - error = xfs_qm_dqget_inode(ip, type, doalloc, &dqp); 321 - if (error) 322 - return error; 323 - 324 - trace_xfs_dqattach_get(dqp); 325 - 326 - /* 327 - * dqget may have dropped and re-acquired the ilock, but it guarantees 328 - * that the dquot returned is the one that should go in the inode. 
329 - */ 330 - *IO_idqpp = dqp; 331 - xfs_dqunlock(dqp); 332 - return 0; 333 - } 334 - 335 291 static bool 336 292 xfs_qm_need_dqattach( 337 293 struct xfs_inode *ip) ··· 326 372 ASSERT(!xfs_is_metadir_inode(ip)); 327 373 328 374 if (XFS_IS_UQUOTA_ON(mp) && !ip->i_udquot) { 329 - error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_USER, 375 + error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_USER, 330 376 doalloc, &ip->i_udquot); 331 377 if (error) 332 378 goto done; ··· 334 380 } 335 381 336 382 if (XFS_IS_GQUOTA_ON(mp) && !ip->i_gdquot) { 337 - error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_GROUP, 383 + error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_GROUP, 338 384 doalloc, &ip->i_gdquot); 339 385 if (error) 340 386 goto done; ··· 342 388 } 343 389 344 390 if (XFS_IS_PQUOTA_ON(mp) && !ip->i_pdquot) { 345 - error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_PROJ, 391 + error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_PROJ, 346 392 doalloc, &ip->i_pdquot); 347 393 if (error) 348 394 goto done; ··· 422 468 struct xfs_qm_isolate *isol = arg; 423 469 enum lru_status ret = LRU_SKIP; 424 470 425 - if (!xfs_dqlock_nowait(dqp)) 471 + if (!spin_trylock(&dqp->q_lockref.lock)) 426 472 goto out_miss_busy; 427 473 428 474 /* ··· 430 476 * from the LRU, leave it for the freeing task to complete the freeing 431 477 * process rather than risk it being free from under us here. 432 478 */ 433 - if (dqp->q_flags & XFS_DQFLAG_FREEING) 479 + if (__lockref_is_dead(&dqp->q_lockref)) 434 480 goto out_miss_unlock; 435 481 436 482 /* ··· 439 485 * again. 440 486 */ 441 487 ret = LRU_ROTATE; 442 - if (XFS_DQ_IS_DIRTY(dqp) || atomic_read(&dqp->q_pincount) > 0) { 488 + if (XFS_DQ_IS_DIRTY(dqp) || atomic_read(&dqp->q_pincount) > 0) 443 489 goto out_miss_unlock; 444 - } 445 490 446 491 /* 447 492 * This dquot has acquired a reference in the meantime remove it from 448 493 * the freelist and try again. 
449 494 */ 450 - if (dqp->q_nrefs) { 451 - xfs_dqunlock(dqp); 495 + if (dqp->q_lockref.count) { 496 + spin_unlock(&dqp->q_lockref.lock); 452 497 XFS_STATS_INC(dqp->q_mount, xs_qm_dqwants); 453 498 454 499 trace_xfs_dqreclaim_want(dqp); ··· 471 518 /* 472 519 * Prevent lookups now that we are past the point of no return. 473 520 */ 474 - dqp->q_flags |= XFS_DQFLAG_FREEING; 475 - xfs_dqunlock(dqp); 521 + lockref_mark_dead(&dqp->q_lockref); 522 + spin_unlock(&dqp->q_lockref.lock); 476 523 477 - ASSERT(dqp->q_nrefs == 0); 478 524 list_lru_isolate_move(lru, &dqp->q_lru, &isol->dispose); 479 525 XFS_STATS_DEC(dqp->q_mount, xs_qm_dquot_unused); 480 526 trace_xfs_dqreclaim_done(dqp); ··· 481 529 return LRU_REMOVED; 482 530 483 531 out_miss_unlock: 484 - xfs_dqunlock(dqp); 532 + spin_unlock(&dqp->q_lockref.lock); 485 533 out_miss_busy: 486 534 trace_xfs_dqreclaim_busy(dqp); 487 535 XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses); ··· 1268 1316 return error; 1269 1317 } 1270 1318 1319 + mutex_lock(&dqp->q_qlock); 1271 1320 error = xfs_dquot_attach_buf(NULL, dqp); 1272 1321 if (error) 1273 - return error; 1322 + goto out_unlock; 1274 1323 1275 1324 trace_xfs_dqadjust(dqp); 1276 1325 ··· 1301 1348 } 1302 1349 1303 1350 dqp->q_flags |= XFS_DQFLAG_DIRTY; 1304 - xfs_qm_dqput(dqp); 1305 - return 0; 1351 + out_unlock: 1352 + mutex_unlock(&dqp->q_qlock); 1353 + xfs_qm_dqrele(dqp); 1354 + return error; 1306 1355 } 1307 1356 1308 1357 /* ··· 1421 1466 struct xfs_buf *bp = NULL; 1422 1467 int error = 0; 1423 1468 1424 - xfs_dqlock(dqp); 1425 - if (dqp->q_flags & XFS_DQFLAG_FREEING) 1426 - goto out_unlock; 1469 + if (!lockref_get_not_dead(&dqp->q_lockref)) 1470 + return 0; 1471 + 1472 + mutex_lock(&dqp->q_qlock); 1427 1473 if (!XFS_DQ_IS_DIRTY(dqp)) 1428 1474 goto out_unlock; 1429 1475 ··· 1444 1488 xfs_buf_delwri_queue(bp, buffer_list); 1445 1489 xfs_buf_relse(bp); 1446 1490 out_unlock: 1447 - xfs_dqunlock(dqp); 1491 + mutex_unlock(&dqp->q_qlock); 1492 + xfs_qm_dqrele(dqp); 1448 
1493 return error; 1449 1494 } 1450 1495 ··· 1861 1904 struct xfs_dquot *gq = NULL; 1862 1905 struct xfs_dquot *pq = NULL; 1863 1906 int error; 1864 - uint lockflags; 1865 1907 1866 1908 if (!XFS_IS_QUOTA_ON(mp)) 1867 1909 return 0; 1868 1910 1869 1911 ASSERT(!xfs_is_metadir_inode(ip)); 1870 - 1871 - lockflags = XFS_ILOCK_EXCL; 1872 - xfs_ilock(ip, lockflags); 1873 1912 1874 1913 if ((flags & XFS_QMOPT_INHERIT) && XFS_INHERIT_GID(ip)) 1875 1914 gid = inode->i_gid; ··· 1875 1922 * if necessary. The dquot(s) will not be locked. 1876 1923 */ 1877 1924 if (XFS_NOT_DQATTACHED(mp, ip)) { 1925 + xfs_ilock(ip, XFS_ILOCK_EXCL); 1878 1926 error = xfs_qm_dqattach_locked(ip, true); 1879 - if (error) { 1880 - xfs_iunlock(ip, lockflags); 1927 + xfs_iunlock(ip, XFS_ILOCK_EXCL); 1928 + if (error) 1881 1929 return error; 1882 - } 1883 1930 } 1884 1931 1885 1932 if ((flags & XFS_QMOPT_UQUOTA) && XFS_IS_UQUOTA_ON(mp)) { 1886 1933 ASSERT(O_udqpp); 1887 1934 if (!uid_eq(inode->i_uid, uid)) { 1888 - /* 1889 - * What we need is the dquot that has this uid, and 1890 - * if we send the inode to dqget, the uid of the inode 1891 - * takes priority over what's sent in the uid argument. 1892 - * We must unlock inode here before calling dqget if 1893 - * we're not sending the inode, because otherwise 1894 - * we'll deadlock by doing trans_reserve while 1895 - * holding ilock. 1896 - */ 1897 - xfs_iunlock(ip, lockflags); 1898 1935 error = xfs_qm_dqget(mp, from_kuid(user_ns, uid), 1899 1936 XFS_DQTYPE_USER, true, &uq); 1900 1937 if (error) { 1901 1938 ASSERT(error != -ENOENT); 1902 1939 return error; 1903 1940 } 1904 - /* 1905 - * Get the ilock in the right order. 
1906 - */ 1907 - xfs_dqunlock(uq); 1908 - lockflags = XFS_ILOCK_SHARED; 1909 - xfs_ilock(ip, lockflags); 1910 1941 } else { 1911 1942 /* 1912 1943 * Take an extra reference, because we'll return ··· 1903 1966 if ((flags & XFS_QMOPT_GQUOTA) && XFS_IS_GQUOTA_ON(mp)) { 1904 1967 ASSERT(O_gdqpp); 1905 1968 if (!gid_eq(inode->i_gid, gid)) { 1906 - xfs_iunlock(ip, lockflags); 1907 1969 error = xfs_qm_dqget(mp, from_kgid(user_ns, gid), 1908 1970 XFS_DQTYPE_GROUP, true, &gq); 1909 1971 if (error) { 1910 1972 ASSERT(error != -ENOENT); 1911 1973 goto error_rele; 1912 1974 } 1913 - xfs_dqunlock(gq); 1914 - lockflags = XFS_ILOCK_SHARED; 1915 - xfs_ilock(ip, lockflags); 1916 1975 } else { 1917 1976 ASSERT(ip->i_gdquot); 1918 1977 gq = xfs_qm_dqhold(ip->i_gdquot); ··· 1917 1984 if ((flags & XFS_QMOPT_PQUOTA) && XFS_IS_PQUOTA_ON(mp)) { 1918 1985 ASSERT(O_pdqpp); 1919 1986 if (ip->i_projid != prid) { 1920 - xfs_iunlock(ip, lockflags); 1921 1987 error = xfs_qm_dqget(mp, prid, 1922 1988 XFS_DQTYPE_PROJ, true, &pq); 1923 1989 if (error) { 1924 1990 ASSERT(error != -ENOENT); 1925 1991 goto error_rele; 1926 1992 } 1927 - xfs_dqunlock(pq); 1928 - lockflags = XFS_ILOCK_SHARED; 1929 - xfs_ilock(ip, lockflags); 1930 1993 } else { 1931 1994 ASSERT(ip->i_pdquot); 1932 1995 pq = xfs_qm_dqhold(ip->i_pdquot); ··· 1930 2001 } 1931 2002 trace_xfs_dquot_dqalloc(ip); 1932 2003 1933 - xfs_iunlock(ip, lockflags); 1934 2004 if (O_udqpp) 1935 2005 *O_udqpp = uq; 1936 2006 else ··· 2006 2078 * back now. 
2007 2079 */ 2008 2080 tp->t_flags |= XFS_TRANS_DIRTY; 2009 - xfs_dqlock(prevdq); 2081 + mutex_lock(&prevdq->q_qlock); 2010 2082 if (isrt) { 2011 2083 ASSERT(prevdq->q_rtb.reserved >= ip->i_delayed_blks); 2012 2084 prevdq->q_rtb.reserved -= ip->i_delayed_blks; ··· 2014 2086 ASSERT(prevdq->q_blk.reserved >= ip->i_delayed_blks); 2015 2087 prevdq->q_blk.reserved -= ip->i_delayed_blks; 2016 2088 } 2017 - xfs_dqunlock(prevdq); 2089 + mutex_unlock(&prevdq->q_qlock); 2018 2090 2019 2091 /* 2020 2092 * Take an extra reference, because the inode is going to keep
+1 -1
fs/xfs/xfs_qm.h
··· 57 57 struct xfs_inode *qi_pquotaip; /* project quota inode */ 58 58 struct xfs_inode *qi_dirip; /* quota metadir */ 59 59 struct list_lru qi_lru; 60 - int qi_dquots; 60 + uint64_t qi_dquots; 61 61 struct mutex qi_quotaofflock;/* to serialize quotaoff */ 62 62 xfs_filblks_t qi_dqchunklen; /* # BBs in a chunk of dqs */ 63 63 uint qi_dqperchunk; /* # ondisk dq in above chunk */
+3 -1
fs/xfs/xfs_qm_bhv.c
··· 73 73 struct xfs_dquot *dqp; 74 74 75 75 if (!xfs_qm_dqget(mp, ip->i_projid, XFS_DQTYPE_PROJ, false, &dqp)) { 76 + mutex_lock(&dqp->q_qlock); 76 77 xfs_fill_statvfs_from_dquot(statp, ip, dqp); 77 - xfs_qm_dqput(dqp); 78 + mutex_unlock(&dqp->q_qlock); 79 + xfs_qm_dqrele(dqp); 78 80 } 79 81 } 80 82
+6 -4
fs/xfs/xfs_qm_syscalls.c
··· 303 303 } 304 304 305 305 defq = xfs_get_defquota(q, xfs_dquot_type(dqp)); 306 - xfs_dqunlock(dqp); 307 306 308 307 error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_setqlim, 0, 0, 0, &tp); 309 308 if (error) 310 309 goto out_rele; 311 310 312 - xfs_dqlock(dqp); 311 + mutex_lock(&dqp->q_qlock); 313 312 xfs_trans_dqjoin(tp, dqp); 314 313 315 314 /* ··· 458 459 * If everything's NULL, this dquot doesn't quite exist as far as 459 460 * our utility programs are concerned. 460 461 */ 462 + mutex_lock(&dqp->q_qlock); 461 463 if (XFS_IS_DQUOT_UNINITIALIZED(dqp)) { 462 464 error = -ENOENT; 463 465 goto out_put; ··· 467 467 xfs_qm_scall_getquota_fill_qc(mp, type, dqp, dst); 468 468 469 469 out_put: 470 - xfs_qm_dqput(dqp); 470 + mutex_unlock(&dqp->q_qlock); 471 + xfs_qm_dqrele(dqp); 471 472 return error; 472 473 } 473 474 ··· 498 497 *id = dqp->q_id; 499 498 500 499 xfs_qm_scall_getquota_fill_qc(mp, type, dqp, dst); 500 + mutex_unlock(&dqp->q_qlock); 501 501 502 - xfs_qm_dqput(dqp); 502 + xfs_qm_dqrele(dqp); 503 503 return error; 504 504 }
+1 -1
fs/xfs/xfs_quotaops.c
··· 65 65 memset(state, 0, sizeof(*state)); 66 66 if (!XFS_IS_QUOTA_ON(mp)) 67 67 return 0; 68 - state->s_incoredqs = q->qi_dquots; 68 + state->s_incoredqs = min_t(uint64_t, q->qi_dquots, UINT_MAX); 69 69 if (XFS_IS_UQUOTA_ON(mp)) 70 70 state->s_state[USRQUOTA].flags |= QCI_ACCT_ENABLED; 71 71 if (XFS_IS_UQUOTA_ENFORCED(mp))
+3 -5
fs/xfs/xfs_trace.h
··· 1350 1350 __entry->id = dqp->q_id; 1351 1351 __entry->type = dqp->q_type; 1352 1352 __entry->flags = dqp->q_flags; 1353 - __entry->nrefs = dqp->q_nrefs; 1353 + __entry->nrefs = data_race(dqp->q_lockref.count); 1354 1354 1355 1355 __entry->res_bcount = dqp->q_blk.reserved; 1356 1356 __entry->res_rtbcount = dqp->q_rtb.reserved; ··· 1399 1399 DEFINE_DQUOT_EVENT(xfs_dqreclaim_want); 1400 1400 DEFINE_DQUOT_EVENT(xfs_dqreclaim_busy); 1401 1401 DEFINE_DQUOT_EVENT(xfs_dqreclaim_done); 1402 - DEFINE_DQUOT_EVENT(xfs_dqattach_found); 1403 1402 DEFINE_DQUOT_EVENT(xfs_dqattach_get); 1404 1403 DEFINE_DQUOT_EVENT(xfs_dqalloc); 1405 1404 DEFINE_DQUOT_EVENT(xfs_dqtobp_read); ··· 1408 1409 DEFINE_DQUOT_EVENT(xfs_dqget_miss); 1409 1410 DEFINE_DQUOT_EVENT(xfs_dqget_freeing); 1410 1411 DEFINE_DQUOT_EVENT(xfs_dqget_dup); 1411 - DEFINE_DQUOT_EVENT(xfs_dqput); 1412 - DEFINE_DQUOT_EVENT(xfs_dqput_free); 1413 1412 DEFINE_DQUOT_EVENT(xfs_dqrele); 1413 + DEFINE_DQUOT_EVENT(xfs_dqrele_free); 1414 1414 DEFINE_DQUOT_EVENT(xfs_dqflush); 1415 1415 DEFINE_DQUOT_EVENT(xfs_dqflush_force); 1416 1416 DEFINE_DQUOT_EVENT(xfs_dqflush_done); ··· 4932 4934 __entry->refcount = atomic_read(&iclog->ic_refcnt); 4933 4935 __entry->offset = iclog->ic_offset; 4934 4936 __entry->flags = iclog->ic_flags; 4935 - __entry->lsn = be64_to_cpu(iclog->ic_header.h_lsn); 4937 + __entry->lsn = be64_to_cpu(iclog->ic_header->h_lsn); 4936 4938 __entry->caller_ip = caller_ip; 4937 4939 ), 4938 4940 TP_printk("dev %d:%d state %s refcnt %d offset %u lsn 0x%llx flags %s caller %pS",
+9 -9
fs/xfs/xfs_trans_dquot.c
··· 393 393 unsigned int i; 394 394 ASSERT(q[0].qt_dquot != NULL); 395 395 if (q[1].qt_dquot == NULL) { 396 - xfs_dqlock(q[0].qt_dquot); 396 + mutex_lock(&q[0].qt_dquot->q_qlock); 397 397 xfs_trans_dqjoin(tp, q[0].qt_dquot); 398 398 } else if (q[2].qt_dquot == NULL) { 399 399 xfs_dqlock2(q[0].qt_dquot, q[1].qt_dquot); ··· 693 693 locked = already_locked; 694 694 if (qtrx->qt_blk_res) { 695 695 if (!locked) { 696 - xfs_dqlock(dqp); 696 + mutex_lock(&dqp->q_qlock); 697 697 locked = true; 698 698 } 699 699 dqp->q_blk.reserved -= ··· 701 701 } 702 702 if (qtrx->qt_ino_res) { 703 703 if (!locked) { 704 - xfs_dqlock(dqp); 704 + mutex_lock(&dqp->q_qlock); 705 705 locked = true; 706 706 } 707 707 dqp->q_ino.reserved -= ··· 710 710 711 711 if (qtrx->qt_rtblk_res) { 712 712 if (!locked) { 713 - xfs_dqlock(dqp); 713 + mutex_lock(&dqp->q_qlock); 714 714 locked = true; 715 715 } 716 716 dqp->q_rtb.reserved -= 717 717 (xfs_qcnt_t)qtrx->qt_rtblk_res; 718 718 } 719 719 if (locked && !already_locked) 720 - xfs_dqunlock(dqp); 720 + mutex_unlock(&dqp->q_qlock); 721 721 722 722 } 723 723 } ··· 820 820 struct xfs_dquot_res *blkres; 821 821 struct xfs_quota_limits *qlim; 822 822 823 - xfs_dqlock(dqp); 823 + mutex_lock(&dqp->q_qlock); 824 824 825 825 defq = xfs_get_defquota(q, xfs_dquot_type(dqp)); 826 826 ··· 887 887 XFS_IS_CORRUPT(mp, dqp->q_ino.reserved < dqp->q_ino.count)) 888 888 goto error_corrupt; 889 889 890 - xfs_dqunlock(dqp); 890 + mutex_unlock(&dqp->q_qlock); 891 891 return 0; 892 892 893 893 error_return: 894 - xfs_dqunlock(dqp); 894 + mutex_unlock(&dqp->q_qlock); 895 895 if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ) 896 896 return -ENOSPC; 897 897 return -EDQUOT; 898 898 error_corrupt: 899 - xfs_dqunlock(dqp); 899 + mutex_unlock(&dqp->q_qlock); 900 900 xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); 901 901 xfs_fs_mark_sick(mp, XFS_SICK_FS_QUOTACHECK); 902 902 return -EFSCORRUPTED;
+22 -4
fs/xfs/xfs_zone_alloc.c
··· 103 103 */ 104 104 trace_xfs_zone_emptied(rtg); 105 105 106 - if (!was_full) 107 - xfs_group_clear_mark(xg, XFS_RTG_RECLAIMABLE); 108 - 109 106 spin_lock(&zi->zi_used_buckets_lock); 110 107 if (!was_full) 111 108 xfs_zone_remove_from_bucket(zi, rgno, from_bucket); ··· 124 127 xfs_zone_add_to_bucket(zi, rgno, to_bucket); 125 128 spin_unlock(&zi->zi_used_buckets_lock); 126 129 127 - xfs_group_set_mark(xg, XFS_RTG_RECLAIMABLE); 128 130 if (zi->zi_gc_thread && xfs_zoned_need_gc(mp)) 129 131 wake_up_process(zi->zi_gc_thread); 130 132 } else if (to_bucket != from_bucket) { ··· 136 140 xfs_zone_remove_from_bucket(zi, rgno, from_bucket); 137 141 spin_unlock(&zi->zi_used_buckets_lock); 138 142 } 143 + } 144 + 145 + /* 146 + * Check if we have any zones that can be reclaimed by looking at the entry 147 + * counters for the zone buckets. 148 + */ 149 + bool 150 + xfs_zoned_have_reclaimable( 151 + struct xfs_zone_info *zi) 152 + { 153 + int i; 154 + 155 + spin_lock(&zi->zi_used_buckets_lock); 156 + for (i = 0; i < XFS_ZONE_USED_BUCKETS; i++) { 157 + if (zi->zi_used_bucket_entries[i]) { 158 + spin_unlock(&zi->zi_used_buckets_lock); 159 + return true; 160 + } 161 + } 162 + spin_unlock(&zi->zi_used_buckets_lock); 163 + 164 + return false; 139 165 } 140 166 141 167 static void
+6 -8
fs/xfs/xfs_zone_gc.c
··· 117 117 struct xfs_rtgroup *victim_rtg; 118 118 119 119 /* Bio used for reads and writes, including the bvec used by it */ 120 - struct bio_vec bv; 121 120 struct bio bio; /* must be last */ 122 121 }; 123 122 ··· 174 175 s64 available, free, threshold; 175 176 s32 remainder; 176 177 177 - if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE)) 178 + if (!xfs_zoned_have_reclaimable(mp->m_zone_info)) 178 179 return false; 179 180 180 181 available = xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE); 181 182 182 183 if (available < 183 - mp->m_groups[XG_TYPE_RTG].blocks * 184 - (mp->m_max_open_zones - XFS_OPEN_GC_ZONES)) 184 + xfs_rtgs_to_rfsbs(mp, mp->m_max_open_zones - XFS_OPEN_GC_ZONES)) 185 185 return true; 186 186 187 187 free = xfs_estimate_freecounter(mp, XC_FREE_RTEXTENTS); ··· 1182 1184 goto out_put_gc_zone; 1183 1185 } 1184 1186 1185 - mp->m_zone_info->zi_gc_thread = kthread_create(xfs_zoned_gcd, data, 1187 + zi->zi_gc_thread = kthread_create(xfs_zoned_gcd, data, 1186 1188 "xfs-zone-gc/%s", mp->m_super->s_id); 1187 - if (IS_ERR(mp->m_zone_info->zi_gc_thread)) { 1189 + if (IS_ERR(zi->zi_gc_thread)) { 1188 1190 xfs_warn(mp, "unable to create zone gc thread"); 1189 - error = PTR_ERR(mp->m_zone_info->zi_gc_thread); 1191 + error = PTR_ERR(zi->zi_gc_thread); 1190 1192 goto out_free_gc_data; 1191 1193 } 1192 1194 1193 1195 /* xfs_zone_gc_start will unpark for rw mounts */ 1194 - kthread_park(mp->m_zone_info->zi_gc_thread); 1196 + kthread_park(zi->zi_gc_thread); 1195 1197 return 0; 1196 1198 1197 1199 out_free_gc_data:
+1
fs/xfs/xfs_zone_priv.h
··· 113 113 114 114 int xfs_zone_gc_reset_sync(struct xfs_rtgroup *rtg); 115 115 bool xfs_zoned_need_gc(struct xfs_mount *mp); 116 + bool xfs_zoned_have_reclaimable(struct xfs_zone_info *zi); 116 117 int xfs_zone_gc_mount(struct xfs_mount *mp); 117 118 void xfs_zone_gc_unmount(struct xfs_mount *mp); 118 119
+4 -6
fs/xfs/xfs_zone_space_resv.c
··· 54 54 { 55 55 switch (ctr) { 56 56 case XC_FREE_RTEXTENTS: 57 - return (uint64_t)XFS_RESERVED_ZONES * 58 - mp->m_groups[XG_TYPE_RTG].blocks + 59 - mp->m_sb.sb_rtreserved; 57 + return xfs_rtgs_to_rfsbs(mp, XFS_RESERVED_ZONES) + 58 + mp->m_sb.sb_rtreserved; 60 59 case XC_FREE_RTAVAILABLE: 61 - return (uint64_t)XFS_GC_ZONES * 62 - mp->m_groups[XG_TYPE_RTG].blocks; 60 + return xfs_rtgs_to_rfsbs(mp, XFS_GC_ZONES); 63 61 default: 64 62 ASSERT(0); 65 63 return 0; ··· 172 174 * processing a pending GC request give up as we're fully out 173 175 * of space. 174 176 */ 175 - if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE) && 177 + if (!xfs_zoned_have_reclaimable(mp->m_zone_info) && 176 178 !xfs_is_zonegc_running(mp)) 177 179 break; 178 180