Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
ext4: Rename ext4dev to ext4
ext4: Avoid double dirtying of super block in ext4_put_super()
Update ext4 MAINTAINERS file
Hook ext4 to the vfs fiemap interface.
generic block based fiemap implementation
ocfs2: fiemap support
vfs: vfs-level fiemap interface
ext4: fix xattr deadlock
jbd2: Fix buffer head leak when writing the commit block
ext4: Add debugging markers that can be used by systemtap
jbd2: abort instead of waiting for nonexistent transaction
ext4: fix initialization of UNINIT bitmap blocks
ext4: Remove old legacy block allocator
ext4: Use readahead when reading an inode from the inode table
ext4: Improve the documentation for ext4's /proc tunables
ext4: Combine proc file handling into a single set of functions
ext4: move /proc setup and teardown out of mballoc.c
ext4: Don't use 'struct dentry' for internal lookups
ext4/jbd2: Avoid WARN() messages when failing to write to the superblock
ext4: use percpu data structures for lg_prealloc_list
...

+2583 -2535
+10 -4
Documentation/filesystems/ext4.txt
··· 32 32 you will need to merge your changes with the version from e2fsprogs 33 33 1.41.x. 34 34 35 - - Create a new filesystem using the ext4dev filesystem type: 35 + - Create a new filesystem using the ext4 filesystem type: 36 36 37 - # mke2fs -t ext4dev /dev/hda1 37 + # mke2fs -t ext4 /dev/hda1 38 38 39 39 Or configure an existing ext3 filesystem to support extents and set 40 40 the test_fs flag to indicate that it's ok for an in-development ··· 47 47 48 48 # tune2fs -I 256 /dev/hda1 49 49 50 - (Note: we currently do not have tools to convert an ext4dev 50 + (Note: we currently do not have tools to convert an ext4 51 51 filesystem back to ext3; so please do not do try this on production 52 52 filesystems.) 53 53 54 54 - Mounting: 55 55 56 - # mount -t ext4dev /dev/hda1 /wherever 56 + # mount -t ext4 /dev/hda1 /wherever 57 57 58 58 - When comparing performance with other filesystems, remember that 59 59 ext3/4 by default offers higher data integrity guarantees than most. ··· 177 177 your disks are battery-backed in one way or another, 178 178 disabling barriers may safely improve performance. 179 179 180 + inode_readahead=n This tuning parameter controls the maximum 181 + number of inode table blocks that ext4's inode 182 + table readahead algorithm will pre-read into 183 + the buffer cache. The default value is 32 blocks. 184 + 180 185 orlov (*) This enables the new Orlov block allocator. It is 181 186 enabled by default. 182 187 ··· 257 252 delalloc (*) Deferring block allocation until write-out time. 258 253 nodelalloc Disable delayed allocation. Blocks are allocation 259 254 when data is copied from user to page cache. 255 + 260 256 Data Mode 261 257 ========= 262 258 There are 3 different data modes:
+228
Documentation/filesystems/fiemap.txt
··· 1 + ============ 2 + Fiemap Ioctl 3 + ============ 4 + 5 + The fiemap ioctl is an efficient method for userspace to get file 6 + extent mappings. Instead of block-by-block mapping (such as bmap), fiemap 7 + returns a list of extents. 8 + 9 + 10 + Request Basics 11 + -------------- 12 + 13 + A fiemap request is encoded within struct fiemap: 14 + 15 + struct fiemap { 16 + __u64 fm_start; /* logical offset (inclusive) at 17 + * which to start mapping (in) */ 18 + __u64 fm_length; /* logical length of mapping which 19 + * userspace cares about (in) */ 20 + __u32 fm_flags; /* FIEMAP_FLAG_* flags for request (in/out) */ 21 + __u32 fm_mapped_extents; /* number of extents that were 22 + * mapped (out) */ 23 + __u32 fm_extent_count; /* size of fm_extents array (in) */ 24 + __u32 fm_reserved; 25 + struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */ 26 + }; 27 + 28 + 29 + fm_start and fm_length specify the logical range within the file 30 + which the process would like mappings for. Extents returned mirror 31 + those on disk - that is, the logical offset of the 1st returned extent 32 + may start before fm_start, and the range covered by the last returned 33 + extent may end after fm_length. All offsets and lengths are in bytes. 34 + 35 + Certain flags to modify the way in which mappings are looked up can be 36 + set in fm_flags. If the kernel doesn't understand some particular 37 + flags, it will return EBADR and the contents of fm_flags will contain 38 + the set of flags which caused the error. If the kernel is compatible 39 + with all flags passed, the contents of fm_flags will be unmodified. 40 + It is up to userspace to determine whether rejection of a particular 41 + flag is fatal to its operation. This scheme is intended to allow the 42 + fiemap interface to grow in the future but without losing 43 + compatibility with old software. 
44 + 45 + fm_extent_count specifies the number of elements in the fm_extents[] array 46 + that can be used to return extents. If fm_extent_count is zero, then the 47 + fm_extents[] array is ignored (no extents will be returned), and the 48 + fm_mapped_extents count will hold the number of extents needed in 49 + fm_extents[] to hold the file's current mapping. Note that there is 50 + nothing to prevent the file from changing between calls to FIEMAP. 51 + 52 + The following flags can be set in fm_flags: 53 + 54 + * FIEMAP_FLAG_SYNC 55 + If this flag is set, the kernel will sync the file before mapping extents. 56 + 57 + * FIEMAP_FLAG_XATTR 58 + If this flag is set, the extents returned will describe the inode's 59 + extended attribute lookup tree, instead of its data tree. 60 + 61 + 62 + Extent Mapping 63 + -------------- 64 + 65 + Extent information is returned within the embedded fm_extents array 66 + which userspace must allocate along with the fiemap structure. The 67 + number of elements in the fiemap_extents[] array should be passed via 68 + fm_extent_count. The number of extents mapped by the kernel will be 69 + returned via fm_mapped_extents. If the number of fiemap_extents 70 + allocated is less than would be required to map the requested range, 71 + the maximum number of extents that can be mapped in the fm_extent[] 72 + array will be returned and fm_mapped_extents will be equal to 73 + fm_extent_count. In that case, the last extent in the array will not 74 + complete the requested range and will not have the FIEMAP_EXTENT_LAST 75 + flag set (see the next section on extent flags). 76 + 77 + Each extent is described by a single fiemap_extent structure as 78 + returned in fm_extents. 
79 + 80 + struct fiemap_extent { 81 + __u64 fe_logical; /* logical offset in bytes for the start of 82 + * the extent */ 83 + __u64 fe_physical; /* physical offset in bytes for the start 84 + * of the extent */ 85 + __u64 fe_length; /* length in bytes for the extent */ 86 + __u64 fe_reserved64[2]; 87 + __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */ 88 + __u32 fe_reserved[3]; 89 + }; 90 + 91 + All offsets and lengths are in bytes and mirror those on disk. It is valid 92 + for an extent's logical offset to start before the request or its logical 93 + length to extend past the request. Unless FIEMAP_EXTENT_NOT_ALIGNED is 94 + returned, fe_logical, fe_physical, and fe_length will be aligned to the 95 + block size of the file system. With the exception of extents flagged as 96 + FIEMAP_EXTENT_MERGED, adjacent extents will not be merged. 97 + 98 + The fe_flags field contains flags which describe the extent returned. 99 + A special flag, FIEMAP_EXTENT_LAST, is always set on the last extent in 100 + the file so that the process making fiemap calls can determine when no 101 + more extents are available, without having to call the ioctl again. 102 + 103 + Some flags are intentionally vague and will always be set in the 104 + presence of other more specific flags. This way a program looking for 105 + a general property does not have to know all existing and future flags 106 + which imply that property. 107 + 108 + For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL 109 + are set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking 110 + for inline or tail-packed data can key on the specific flag. Software 111 + which simply cares not to try operating on non-aligned extents, 112 + however, can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to 113 + worry about all present and future flags which might imply unaligned 114 + data. Note that the opposite is not true - it would be valid for 115 + FIEMAP_EXTENT_NOT_ALIGNED to appear alone. 
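The "key on the general flag" idea above can be sketched as a small userspace helper. This is an illustration, not code from this merge; it assumes only the FIEMAP_EXTENT_* constants from the <linux/fiemap.h> UAPI header, and the helper name is my own:

```c
#include <linux/fiemap.h>	/* FIEMAP_EXTENT_* flag definitions */

/*
 * Hypothetical helper: decide whether an extent is safe to operate on
 * directly.  Keying on the general flags FIEMAP_EXTENT_NOT_ALIGNED,
 * FIEMAP_EXTENT_ENCODED and FIEMAP_EXTENT_UNKNOWN means the caller
 * need not enumerate every specific flag (DATA_INLINE, DATA_TAIL,
 * DATA_ENCRYPTED, ...) that implies one of them.
 */
static inline int extent_directly_usable(__u32 fe_flags)
{
	return !(fe_flags & (FIEMAP_EXTENT_NOT_ALIGNED |
			     FIEMAP_EXTENT_ENCODED |
			     FIEMAP_EXTENT_UNKNOWN));
}
```

Because DATA_INLINE and DATA_TAIL always come with NOT_ALIGNED, the helper keeps working even if future kernels add new flags implying unaligned data.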
116 + 117 + * FIEMAP_EXTENT_LAST 118 + This is the last extent in the file. A mapping attempt past this 119 + extent will return nothing. 120 + 121 + * FIEMAP_EXTENT_UNKNOWN 122 + The location of this extent is currently unknown. This may indicate 123 + the data is stored on an inaccessible volume or that no storage has 124 + been allocated for the file yet. 125 + 126 + * FIEMAP_EXTENT_DELALLOC 127 + - This will also set FIEMAP_EXTENT_UNKNOWN. 128 + Delayed allocation - while there is data for this extent, its 129 + physical location has not been allocated yet. 130 + 131 + * FIEMAP_EXTENT_ENCODED 132 + This extent does not consist of plain filesystem blocks but is 133 + encoded (e.g. encrypted or compressed). Reading the data in this 134 + extent via I/O to the block device will have undefined results. 135 + 136 + Note that it is *always* undefined to try to update the data 137 + in-place by writing to the indicated location without the 138 + assistance of the filesystem, or to access the data using the 139 + information returned by the FIEMAP interface while the filesystem 140 + is mounted. In other words, user applications may only read the 141 + extent data via I/O to the block device while the filesystem is 142 + unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is 143 + clear; user applications must not try reading or writing to the 144 + filesystem via the block device under any other circumstances. 145 + 146 + * FIEMAP_EXTENT_DATA_ENCRYPTED 147 + - This will also set FIEMAP_EXTENT_ENCODED 148 + The data in this extent has been encrypted by the file system. 149 + 150 + * FIEMAP_EXTENT_NOT_ALIGNED 151 + Extent offsets and length are not guaranteed to be block aligned. 152 + 153 + * FIEMAP_EXTENT_DATA_INLINE 154 + This will also set FIEMAP_EXTENT_NOT_ALIGNED 155 + Data is located within a metadata block. 
156 + 157 + * FIEMAP_EXTENT_DATA_TAIL 158 + This will also set FIEMAP_EXTENT_NOT_ALIGNED 159 + Data is packed into a block with data from other files. 160 + 161 + * FIEMAP_EXTENT_UNWRITTEN 162 + Unwritten extent - the extent is allocated but its data has not been 163 + initialized. This indicates the extent's data will be all zero if read 164 + through the filesystem but the contents are undefined if read directly from 165 + the device. 166 + 167 + * FIEMAP_EXTENT_MERGED 168 + This will be set when a file does not support extents, i.e., it uses a block 169 + based addressing scheme. Since returning an extent for each block back to 170 + userspace would be highly inefficient, the kernel will try to merge most 171 + adjacent blocks into 'extents'. 172 + 173 + 174 + VFS -> File System Implementation 175 + --------------------------------- 176 + 177 + File systems wishing to support fiemap must implement a ->fiemap callback on 178 + their inode_operations structure. The fs ->fiemap call is responsible for 179 + defining its set of supported fiemap flags, and calling a helper function on 180 + each discovered extent: 181 + 182 + struct inode_operations { 183 + ... 184 + 185 + int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, 186 + u64 len); 187 + 188 + ->fiemap is passed struct fiemap_extent_info which describes the 189 + fiemap request: 190 + 191 + struct fiemap_extent_info { 192 + unsigned int fi_flags; /* Flags as passed from user */ 193 + unsigned int fi_extents_mapped; /* Number of mapped extents */ 194 + unsigned int fi_extents_max; /* Size of fiemap_extent array */ 195 + struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */ 196 + }; 197 + 198 + It is intended that the file system should not need to access any of this 199 + structure directly. 
200 + 201 + 202 + Flag checking should be done at the beginning of the ->fiemap callback via the 203 + fiemap_check_flags() helper: 204 + 205 + int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags); 206 + 207 + The struct fieinfo should be passed in as received from ioctl_fiemap(). The 208 + set of fiemap flags which the fs understands should be passed via fs_flags. If 209 + fiemap_check_flags finds invalid user flags, it will place the bad values in 210 + fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR from 211 + fiemap_check_flags(), it should immediately exit, returning that error back to 212 + ioctl_fiemap(). 213 + 214 + 215 + For each extent in the request range, the file system should call 216 + the helper function, fiemap_fill_next_extent(): 217 + 218 + int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical, 219 + u64 phys, u64 len, u32 flags, u32 dev); 220 + 221 + fiemap_fill_next_extent() will use the passed values to populate the 222 + next free extent in the fm_extents array. 'General' extent flags will 223 + automatically be set from specific flags on behalf of the calling file 224 + system so that the userspace API is not broken. 225 + 226 + fiemap_fill_next_extent() returns 0 on success, and 1 when the 227 + user-supplied fm_extents array is full. If an error is encountered 228 + while copying the extent to user memory, -EFAULT will be returned.
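Tying the request and extent structures together, here is a minimal userspace sketch of the two-call pattern the new document describes (probe with fm_extent_count == 0, then allocate and fetch). The helper name is my own invention; FS_IOC_FIEMAP comes from <linux/fs.h> and the structures from <linux/fiemap.h>:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, struct fiemap_extent */

/*
 * Hypothetical helper: map every extent of an open file.  The first
 * ioctl, with fm_extent_count == 0, only reports how many extents are
 * needed; the second fills them in.  Returns a malloc()ed struct
 * fiemap (caller frees) or NULL on error.  As the document notes,
 * nothing prevents the file from changing between the two calls.
 */
static struct fiemap *map_whole_file(int fd)
{
	struct fiemap probe, *fm;

	memset(&probe, 0, sizeof(probe));
	probe.fm_length = ~0ULL;		/* whole file */
	probe.fm_flags = FIEMAP_FLAG_SYNC;	/* sync before mapping */
	if (ioctl(fd, FS_IOC_FIEMAP, &probe) < 0)
		return NULL;

	fm = calloc(1, sizeof(*fm) + (size_t)probe.fm_mapped_extents *
		       sizeof(struct fiemap_extent));
	if (!fm)
		return NULL;
	fm->fm_length = ~0ULL;
	fm->fm_flags = FIEMAP_FLAG_SYNC;
	fm->fm_extent_count = probe.fm_mapped_extents;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		free(fm);
		return NULL;
	}
	return fm;	/* fm_mapped_extents entries in fm_extents[] */
}
```

On success the caller walks fm_extents[0 .. fm_mapped_extents-1], stopping early if it sees FIEMAP_EXTENT_LAST.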
+33 -34
Documentation/filesystems/proc.txt
··· 923 923 The "procs_blocked" line gives the number of processes currently blocked, 924 924 waiting for I/O to complete. 925 925 926 + 926 927 1.9 Ext4 file system parameters 927 928 ------------------------------ 928 - Ext4 file system have one directory per partition under /proc/fs/ext4/ 929 - # ls /proc/fs/ext4/hdc/ 930 - group_prealloc max_to_scan mb_groups mb_history min_to_scan order2_req 931 - stats stream_req 932 929 933 - mb_groups: 934 - This file gives the details of multiblock allocator buddy cache of free blocks 930 + Information about mounted ext4 file systems can be found in 931 + /proc/fs/ext4. Each mounted filesystem will have a directory in 932 + /proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or 933 + /proc/fs/ext4/dm-0). The files in each per-device directory are shown 934 + in Table 1-10, below. 935 935 936 - mb_history: 937 - Multiblock allocation history. 936 + Table 1-10: Files in /proc/fs/ext4/<devname> 937 + .............................................................................. 
938 + File Content 939 + mb_groups details of multiblock allocator buddy cache of free blocks 940 + mb_history multiblock allocation history 941 + stats controls whether the multiblock allocator should start 942 + collecting statistics, which are shown during the unmount 943 + group_prealloc the multiblock allocator will round up allocation 944 + requests to a multiple of this tuning parameter if the 945 + stripe size is not set in the ext4 superblock 946 + max_to_scan The maximum number of extents the multiblock allocator 947 + will search to find the best extent 948 + min_to_scan The minimum number of extents the multiblock allocator 949 + will search to find the best extent 950 + order2_req Tuning parameter which controls the minimum size for 951 + requests (as a power of 2) where the buddy cache is 952 + used 953 + stream_req Files which have fewer blocks than this tunable 954 + parameter will have their blocks allocated out of a 955 + block group specific preallocation pool, so that small 956 + files are packed closely together. Each large file 957 + will have its blocks allocated out of its own unique 958 + preallocation pool. 959 + inode_readahead Tuning parameter which controls the maximum number of 960 + inode table blocks that ext4's inode table readahead 961 + algorithm will pre-read into the buffer cache 962 + .............................................................................. 938 963 939 - stats: 940 - This file indicate whether the multiblock allocator should start collecting 941 - statistics. The statistics are shown during unmount 942 - 943 - group_prealloc: 944 - The multiblock allocator normalize the block allocation request to 945 - group_prealloc filesystem blocks if we don't have strip value set. 946 - The stripe value can be specified at mount time or during mke2fs. 
947 - 948 - max_to_scan: 949 - How long multiblock allocator can look for a best extent (in found extents) 950 - 951 - min_to_scan: 952 - How long multiblock allocator must look for a best extent 953 - 954 - order2_req: 955 - Multiblock allocator use 2^N search using buddies only for requests greater 956 - than or equal to order2_req. The request size is specfied in file system 957 - blocks. A value of 2 indicate only if the requests are greater than or equal 958 - to 4 blocks. 959 - 960 - stream_req: 961 - Files smaller than stream_req are served by the stream allocator, whose 962 - purpose is to pack requests as close each to other as possible to 963 - produce smooth I/O traffic. Avalue of 16 indicate that file smaller than 16 964 - filesystem block size will use group based preallocation. 965 964 966 965 ------------------------------------------------------------------------------ 967 966 Summary
+3 -2
MAINTAINERS
··· 1659 1659 S: Maintained 1660 1660 1661 1661 EXT4 FILE SYSTEM 1662 - P: Stephen Tweedie, Andrew Morton 1663 - M: sct@redhat.com, akpm@linux-foundation.org, adilger@sun.com 1662 + P: Theodore Ts'o 1663 + M: tytso@mit.edu, adilger@sun.com 1664 1664 L: linux-ext4@vger.kernel.org 1665 + W: http://ext4.wiki.kernel.org 1665 1666 S: Maintained 1666 1667 1667 1668 F71805F HARDWARE MONITORING DRIVER
+50 -36
fs/Kconfig
··· 136 136 If you are not using a security module that requires using 137 137 extended attributes for file security labels, say N. 138 138 139 - config EXT4DEV_FS 140 - tristate "Ext4dev/ext4 extended fs support development (EXPERIMENTAL)" 141 - depends on EXPERIMENTAL 139 + config EXT4_FS 140 + tristate "The Extended 4 (ext4) filesystem" 142 141 select JBD2 143 142 select CRC16 144 143 help 145 - Ext4dev is a predecessor filesystem of the next generation 146 - extended fs ext4, based on ext3 filesystem code. It will be 147 - renamed ext4 fs later, once ext4dev is mature and stabilized. 144 + This is the next generation of the ext3 filesystem. 148 145 149 146 Unlike the change from ext2 filesystem to ext3 filesystem, 150 - the on-disk format of ext4dev is not the same as ext3 any more: 151 - it is based on extent maps and it supports 48-bit physical block 152 - numbers. These combined on-disk format changes will allow 153 - ext4dev/ext4 to handle more than 16 TB filesystem volumes -- 154 - a hard limit that ext3 cannot overcome without changing the 155 - on-disk format. 147 + the on-disk format of ext4 is not forwards compatible with 148 + ext3; it is based on extent maps and it supports 48-bit 149 + physical block numbers. The ext4 filesystem also supports delayed 150 + allocation, persistent preallocation, high resolution time stamps, 151 + and a number of other features to improve performance and speed 152 + up fsck time. For more information, please see the web pages at 153 + http://ext4.wiki.kernel.org. 156 154 157 - Other than extent maps and 48-bit block numbers, ext4dev also is 158 - likely to have other new features such as persistent preallocation, 159 - high resolution time stamps, and larger file support etc. These 160 - features will be added to ext4dev gradually. 
155 + The ext4 filesystem will support mounting an ext3 156 + filesystem; while there will be some performance gains from 157 + the delayed allocation and inode table readahead, the best 158 + performance gains will require enabling ext4 features in the 159 + filesystem, or formatting a new filesystem as an ext4 160 + filesystem initially. 161 161 162 162 To compile this file system support as a module, choose M here. The 163 163 module will be called ext4dev. 164 164 165 165 If unsure, say N. 166 166 167 - config EXT4DEV_FS_XATTR 168 - bool "Ext4dev extended attributes" 169 - depends on EXT4DEV_FS 167 + config EXT4DEV_COMPAT 168 + bool "Enable ext4dev compatibility" 169 + depends on EXT4_FS 170 + help 171 + Starting with 2.6.28, the name of the ext4 filesystem was 172 + renamed from ext4dev to ext4. Unfortunately, some 173 + legacy userspace programs (such as klibc's fstype) have 174 + "ext4dev" hardcoded. 175 + 176 + To enable backwards compatibility, so that systems still 177 + expecting to mount ext4 filesystems using ext4dev keep working, 178 + choose Y here. This feature will go away by 2.6.31, so 179 + please arrange to get your userspace programs fixed! 180 + 181 + config EXT4_FS_XATTR 182 + bool "Ext4 extended attributes" 183 + depends on EXT4_FS 170 184 default y 171 185 help 172 186 Extended attributes are name:value pairs associated with inodes by ··· 189 175 190 176 If unsure, say N. 191 177 192 - You need this for POSIX ACL support on ext4dev/ext4. 178 + You need this for POSIX ACL support on ext4. 
193 179 194 - config EXT4DEV_FS_POSIX_ACL 195 - bool "Ext4dev POSIX Access Control Lists" 196 - depends on EXT4DEV_FS_XATTR 180 + config EXT4_FS_POSIX_ACL 181 + bool "Ext4 POSIX Access Control Lists" 182 + depends on EXT4_FS_XATTR 197 183 select FS_POSIX_ACL 198 184 help 199 185 POSIX Access Control Lists (ACLs) support permissions for users and ··· 204 190 205 191 If you don't know what Access Control Lists are, say N 206 192 207 - config EXT4DEV_FS_SECURITY 208 - bool "Ext4dev Security Labels" 209 - depends on EXT4DEV_FS_XATTR 193 + config EXT4_FS_SECURITY 194 + bool "Ext4 Security Labels" 195 + depends on EXT4_FS_XATTR 210 196 help 211 197 Security labels support alternative access control models 212 198 implemented by security modules like SELinux. This option 213 199 enables an extended attribute handler for file security 214 - labels in the ext4dev/ext4 filesystem. 200 + labels in the ext4 filesystem. 215 201 216 202 If you are not using a security module that requires using 217 203 extended attributes for file security labels, say N. ··· 254 240 help 255 241 This is a generic journaling layer for block devices that support 256 242 both 32-bit and 64-bit block numbers. It is currently used by 257 - the ext4dev/ext4 filesystem, but it could also be used to add 243 + the ext4 filesystem, but it could also be used to add 258 244 journal support to other file systems or block devices such 259 245 as RAID or LVM. 260 246 261 - If you are using ext4dev/ext4, you need to say Y here. If you are not 262 - using ext4dev/ext4 then you will probably want to say N. 247 + If you are using ext4, you need to say Y here. If you are not 248 + using ext4 then you will probably want to say N. 263 249 264 250 To compile this device as a module, choose M here. The module will be 265 - called jbd2. If you are compiling ext4dev/ext4 into the kernel, 251 + called jbd2. If you are compiling ext4 into the kernel, 266 252 you cannot compile this code as a module. 
267 253 268 254 config JBD2_DEBUG 269 - bool "JBD2 (ext4dev/ext4) debugging support" 255 + bool "JBD2 (ext4) debugging support" 270 256 depends on JBD2 && DEBUG_FS 271 257 help 272 - If you are using the ext4dev/ext4 journaled file system (or 258 + If you are using the ext4 journaled file system (or 273 259 potentially any other filesystem/device using JBD2), this option 274 260 allows you to enable debugging output while the system is running, 275 261 in order to help track down any problems you are having. ··· 284 270 config FS_MBCACHE 285 271 # Meta block cache for Extended Attributes (ext2/ext3/ext4) 286 272 tristate 287 - depends on EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4DEV_FS_XATTR 288 - default y if EXT2_FS=y || EXT3_FS=y || EXT4DEV_FS=y 289 - default m if EXT2_FS=m || EXT3_FS=m || EXT4DEV_FS=m 273 + depends on EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS_XATTR 274 + default y if EXT2_FS=y || EXT3_FS=y || EXT4_FS=y 275 + default m if EXT2_FS=m || EXT3_FS=m || EXT4_FS=m 290 276 291 277 config REISERFS_FS 292 278 tristate "Reiserfs support"
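After this rename, a kernel .config built against these Kconfig entries would carry the new option names (JBD2, CRC16 and FS_MBCACHE follow from the select/depends lines above). A hedged sketch of the relevant fragment, with illustrative values:

```
CONFIG_EXT4_FS=y
CONFIG_EXT4DEV_COMPAT=y
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_JBD2=y
CONFIG_CRC16=y
CONFIG_FS_MBCACHE=y
```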
+1 -1
fs/Makefile
··· 69 69 # Do not add any filesystems before this line 70 70 obj-$(CONFIG_REISERFS_FS) += reiserfs/ 71 71 obj-$(CONFIG_EXT3_FS) += ext3/ # Before ext2 so root fs can be ext3 72 - obj-$(CONFIG_EXT4DEV_FS) += ext4/ # Before ext2 so root fs can be ext4dev 72 + obj-$(CONFIG_EXT4_FS) += ext4/ # Before ext2 so root fs can be ext4dev 73 73 obj-$(CONFIG_JBD) += jbd/ 74 74 obj-$(CONFIG_JBD2) += jbd2/ 75 75 obj-$(CONFIG_EXT2_FS) += ext2/
+2
fs/ext2/ext2.h
··· 133 133 extern int ext2_setattr (struct dentry *, struct iattr *); 134 134 extern void ext2_set_inode_flags(struct inode *inode); 135 135 extern void ext2_get_inode_flags(struct ext2_inode_info *); 136 + extern int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 137 + u64 start, u64 len); 136 138 int __ext2_write_begin(struct file *file, struct address_space *mapping, 137 139 loff_t pos, unsigned len, unsigned flags, 138 140 struct page **pagep, void **fsdata);
+1
fs/ext2/file.c
··· 86 86 #endif 87 87 .setattr = ext2_setattr, 88 88 .permission = ext2_permission, 89 + .fiemap = ext2_fiemap, 89 90 };
+8
fs/ext2/inode.c
··· 31 31 #include <linux/writeback.h> 32 32 #include <linux/buffer_head.h> 33 33 #include <linux/mpage.h> 34 + #include <linux/fiemap.h> 34 35 #include "ext2.h" 35 36 #include "acl.h" 36 37 #include "xip.h" ··· 703 702 } 704 703 return ret; 705 704 705 + } 706 + 707 + int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 708 + u64 start, u64 len) 709 + { 710 + return generic_block_fiemap(inode, fieinfo, start, len, 711 + ext2_get_block); 706 712 } 707 713 708 714 static int ext2_writepage(struct page *page, struct writeback_control *wbc)
+1
fs/ext3/file.c
··· 134 134 .removexattr = generic_removexattr, 135 135 #endif 136 136 .permission = ext3_permission, 137 + .fiemap = ext3_fiemap, 137 138 }; 138 139
+8
fs/ext3/inode.c
··· 36 36 #include <linux/mpage.h> 37 37 #include <linux/uio.h> 38 38 #include <linux/bio.h> 39 + #include <linux/fiemap.h> 39 40 #include "xattr.h" 40 41 #include "acl.h" 41 42 ··· 980 979 ext3_journal_stop(handle); 981 980 out: 982 981 return ret; 982 + } 983 + 984 + int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 985 + u64 start, u64 len) 986 + { 987 + return generic_block_fiemap(inode, fieinfo, start, len, 988 + ext3_get_block); 983 989 } 984 990 985 991 /*
+5 -5
fs/ext4/Makefile
··· 2 2 # Makefile for the linux ext4-filesystem routines. 3 3 # 4 4 5 - obj-$(CONFIG_EXT4DEV_FS) += ext4dev.o 5 + obj-$(CONFIG_EXT4_FS) += ext4.o 6 6 7 - ext4dev-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \ 7 + ext4-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \ 8 8 ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \ 9 9 ext4_jbd2.o migrate.o mballoc.o 10 10 11 - ext4dev-$(CONFIG_EXT4DEV_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o 12 - ext4dev-$(CONFIG_EXT4DEV_FS_POSIX_ACL) += acl.o 13 - ext4dev-$(CONFIG_EXT4DEV_FS_SECURITY) += xattr_security.o 11 + ext4-$(CONFIG_EXT4_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o 12 + ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o 13 + ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
+6 -6
fs/ext4/acl.h
··· 51 51 } 52 52 } 53 53 54 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 54 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 55 55 56 56 /* Value for inode->u.ext4_i.i_acl and inode->u.ext4_i.i_default_acl 57 57 if the ACL has not been cached */ 58 58 #define EXT4_ACL_NOT_CACHED ((void *)-1) 59 59 60 60 /* acl.c */ 61 - extern int ext4_permission (struct inode *, int); 62 - extern int ext4_acl_chmod (struct inode *); 63 - extern int ext4_init_acl (handle_t *, struct inode *, struct inode *); 61 + extern int ext4_permission(struct inode *, int); 62 + extern int ext4_acl_chmod(struct inode *); 63 + extern int ext4_init_acl(handle_t *, struct inode *, struct inode *); 64 64 65 - #else /* CONFIG_EXT4DEV_FS_POSIX_ACL */ 65 + #else /* CONFIG_EXT4_FS_POSIX_ACL */ 66 66 #include <linux/sched.h> 67 67 #define ext4_permission NULL 68 68 ··· 77 77 { 78 78 return 0; 79 79 } 80 - #endif /* CONFIG_EXT4DEV_FS_POSIX_ACL */ 80 + #endif /* CONFIG_EXT4_FS_POSIX_ACL */ 81 81
+90 -1367
fs/ext4/balloc.c
··· 83 83 } 84 84 return used_blocks; 85 85 } 86 + 86 87 /* Initializes an uninitialized block bitmap if given, and returns the 87 88 * number of blocks free in the group. */ 88 89 unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh, ··· 133 132 */ 134 133 group_blocks = ext4_blocks_count(sbi->s_es) - 135 134 le32_to_cpu(sbi->s_es->s_first_data_block) - 136 - (EXT4_BLOCKS_PER_GROUP(sb) * (sbi->s_groups_count -1)); 135 + (EXT4_BLOCKS_PER_GROUP(sb) * (sbi->s_groups_count - 1)); 137 136 } else { 138 137 group_blocks = EXT4_BLOCKS_PER_GROUP(sb); 139 138 } ··· 201 200 * @bh: pointer to the buffer head to store the block 202 201 * group descriptor 203 202 */ 204 - struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb, 203 + struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb, 205 204 ext4_group_t block_group, 206 - struct buffer_head ** bh) 205 + struct buffer_head **bh) 207 206 { 208 207 unsigned long group_desc; 209 208 unsigned long offset; 210 - struct ext4_group_desc * desc; 209 + struct ext4_group_desc *desc; 211 210 struct ext4_sb_info *sbi = EXT4_SB(sb); 212 211 213 212 if (block_group >= sbi->s_groups_count) { 214 - ext4_error (sb, "ext4_get_group_desc", 215 - "block_group >= groups_count - " 216 - "block_group = %lu, groups_count = %lu", 217 - block_group, sbi->s_groups_count); 213 + ext4_error(sb, "ext4_get_group_desc", 214 + "block_group >= groups_count - " 215 + "block_group = %lu, groups_count = %lu", 216 + block_group, sbi->s_groups_count); 218 217 219 218 return NULL; 220 219 } ··· 223 222 group_desc = block_group >> EXT4_DESC_PER_BLOCK_BITS(sb); 224 223 offset = block_group & (EXT4_DESC_PER_BLOCK(sb) - 1); 225 224 if (!sbi->s_group_desc[group_desc]) { 226 - ext4_error (sb, "ext4_get_group_desc", 227 - "Group descriptor not loaded - " 228 - "block_group = %lu, group_desc = %lu, desc = %lu", 229 - block_group, group_desc, offset); 225 + ext4_error(sb, "ext4_get_group_desc", 226 + "Group descriptor 
not loaded - " 227 + "block_group = %lu, group_desc = %lu, desc = %lu", 228 + block_group, group_desc, offset); 230 229 return NULL; 231 230 } 232 231 ··· 303 302 struct buffer_head * 304 303 ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group) 305 304 { 306 - struct ext4_group_desc * desc; 307 - struct buffer_head * bh = NULL; 305 + struct ext4_group_desc *desc; 306 + struct buffer_head *bh = NULL; 308 307 ext4_fsblk_t bitmap_blk; 309 308 310 309 desc = ext4_get_group_desc(sb, block_group, NULL); ··· 319 318 block_group, bitmap_blk); 320 319 return NULL; 321 320 } 322 - if (bh_uptodate_or_lock(bh)) 321 + if (buffer_uptodate(bh) && 322 + !(desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) 323 323 return bh; 324 324 325 + lock_buffer(bh); 325 326 spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group)); 326 327 if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { 327 328 ext4_init_block_bitmap(sb, bh, block_group, desc); ··· 348 345 */ 349 346 return bh; 350 347 } 351 - /* 352 - * The reservation window structure operations 353 - * -------------------------------------------- 354 - * Operations include: 355 - * dump, find, add, remove, is_empty, find_next_reservable_window, etc. 356 - * 357 - * We use a red-black tree to represent per-filesystem reservation 358 - * windows. 359 - * 360 - */ 361 - 362 - /** 363 - * __rsv_window_dump() -- Dump the filesystem block allocation reservation map 364 - * @rb_root: root of per-filesystem reservation rb tree 365 - * @verbose: verbose mode 366 - * @fn: function which wishes to dump the reservation map 367 - * 368 - * If verbose is turned on, it will print the whole block reservation 369 - * windows(start, end). Otherwise, it will only print out the "bad" windows, 370 - * those windows that overlap with their immediate neighbors. 
371 - */ 372 - #if 1 373 - static void __rsv_window_dump(struct rb_root *root, int verbose, 374 - const char *fn) 375 - { 376 - struct rb_node *n; 377 - struct ext4_reserve_window_node *rsv, *prev; 378 - int bad; 379 - 380 - restart: 381 - n = rb_first(root); 382 - bad = 0; 383 - prev = NULL; 384 - 385 - printk("Block Allocation Reservation Windows Map (%s):\n", fn); 386 - while (n) { 387 - rsv = rb_entry(n, struct ext4_reserve_window_node, rsv_node); 388 - if (verbose) 389 - printk("reservation window 0x%p " 390 - "start: %llu, end: %llu\n", 391 - rsv, rsv->rsv_start, rsv->rsv_end); 392 - if (rsv->rsv_start && rsv->rsv_start >= rsv->rsv_end) { 393 - printk("Bad reservation %p (start >= end)\n", 394 - rsv); 395 - bad = 1; 396 - } 397 - if (prev && prev->rsv_end >= rsv->rsv_start) { 398 - printk("Bad reservation %p (prev->end >= start)\n", 399 - rsv); 400 - bad = 1; 401 - } 402 - if (bad) { 403 - if (!verbose) { 404 - printk("Restarting reservation walk in verbose mode\n"); 405 - verbose = 1; 406 - goto restart; 407 - } 408 - } 409 - n = rb_next(n); 410 - prev = rsv; 411 - } 412 - printk("Window map complete.\n"); 413 - BUG_ON(bad); 414 - } 415 - #define rsv_window_dump(root, verbose) \ 416 - __rsv_window_dump((root), (verbose), __func__) 417 - #else 418 - #define rsv_window_dump(root, verbose) do {} while (0) 419 - #endif 420 - 421 - /** 422 - * goal_in_my_reservation() 423 - * @rsv: inode's reservation window 424 - * @grp_goal: given goal block relative to the allocation block group 425 - * @group: the current allocation block group 426 - * @sb: filesystem super block 427 - * 428 - * Test if the given goal block (group relative) is within the file's 429 - * own block reservation window range. 430 - * 431 - * If the reservation window is outside the goal allocation group, return 0; 432 - * grp_goal (given goal block) could be -1, which means no specific 433 - * goal block. In this case, always return 1. 
434 - * If the goal block is within the reservation window, return 1; 435 - * otherwise, return 0; 436 - */ 437 - static int 438 - goal_in_my_reservation(struct ext4_reserve_window *rsv, ext4_grpblk_t grp_goal, 439 - ext4_group_t group, struct super_block *sb) 440 - { 441 - ext4_fsblk_t group_first_block, group_last_block; 442 - 443 - group_first_block = ext4_group_first_block_no(sb, group); 444 - group_last_block = group_first_block + (EXT4_BLOCKS_PER_GROUP(sb) - 1); 445 - 446 - if ((rsv->_rsv_start > group_last_block) || 447 - (rsv->_rsv_end < group_first_block)) 448 - return 0; 449 - if ((grp_goal >= 0) && ((grp_goal + group_first_block < rsv->_rsv_start) 450 - || (grp_goal + group_first_block > rsv->_rsv_end))) 451 - return 0; 452 - return 1; 453 - } 454 - 455 - /** 456 - * search_reserve_window() 457 - * @rb_root: root of reservation tree 458 - * @goal: target allocation block 459 - * 460 - * Find the reserved window which includes the goal, or the previous one 461 - * if the goal is not in any window. 462 - * Returns NULL if there are no windows or if all windows start after the goal. 463 - */ 464 - static struct ext4_reserve_window_node * 465 - search_reserve_window(struct rb_root *root, ext4_fsblk_t goal) 466 - { 467 - struct rb_node *n = root->rb_node; 468 - struct ext4_reserve_window_node *rsv; 469 - 470 - if (!n) 471 - return NULL; 472 - 473 - do { 474 - rsv = rb_entry(n, struct ext4_reserve_window_node, rsv_node); 475 - 476 - if (goal < rsv->rsv_start) 477 - n = n->rb_left; 478 - else if (goal > rsv->rsv_end) 479 - n = n->rb_right; 480 - else 481 - return rsv; 482 - } while (n); 483 - /* 484 - * We've fallen off the end of the tree: the goal wasn't inside 485 - * any particular node. OK, the previous node must be to one 486 - * side of the interval containing the goal. If it's the RHS, 487 - * we need to back up one. 
488 - */ 489 - if (rsv->rsv_start > goal) { 490 - n = rb_prev(&rsv->rsv_node); 491 - rsv = rb_entry(n, struct ext4_reserve_window_node, rsv_node); 492 - } 493 - return rsv; 494 - } 495 - 496 - /** 497 - * ext4_rsv_window_add() -- Insert a window to the block reservation rb tree. 498 - * @sb: super block 499 - * @rsv: reservation window to add 500 - * 501 - * Must be called with rsv_lock hold. 502 - */ 503 - void ext4_rsv_window_add(struct super_block *sb, 504 - struct ext4_reserve_window_node *rsv) 505 - { 506 - struct rb_root *root = &EXT4_SB(sb)->s_rsv_window_root; 507 - struct rb_node *node = &rsv->rsv_node; 508 - ext4_fsblk_t start = rsv->rsv_start; 509 - 510 - struct rb_node ** p = &root->rb_node; 511 - struct rb_node * parent = NULL; 512 - struct ext4_reserve_window_node *this; 513 - 514 - while (*p) 515 - { 516 - parent = *p; 517 - this = rb_entry(parent, struct ext4_reserve_window_node, rsv_node); 518 - 519 - if (start < this->rsv_start) 520 - p = &(*p)->rb_left; 521 - else if (start > this->rsv_end) 522 - p = &(*p)->rb_right; 523 - else { 524 - rsv_window_dump(root, 1); 525 - BUG(); 526 - } 527 - } 528 - 529 - rb_link_node(node, parent, p); 530 - rb_insert_color(node, root); 531 - } 532 - 533 - /** 534 - * ext4_rsv_window_remove() -- unlink a window from the reservation rb tree 535 - * @sb: super block 536 - * @rsv: reservation window to remove 537 - * 538 - * Mark the block reservation window as not allocated, and unlink it 539 - * from the filesystem reservation window rb tree. Must be called with 540 - * rsv_lock hold. 541 - */ 542 - static void rsv_window_remove(struct super_block *sb, 543 - struct ext4_reserve_window_node *rsv) 544 - { 545 - rsv->rsv_start = EXT4_RESERVE_WINDOW_NOT_ALLOCATED; 546 - rsv->rsv_end = EXT4_RESERVE_WINDOW_NOT_ALLOCATED; 547 - rsv->rsv_alloc_hit = 0; 548 - rb_erase(&rsv->rsv_node, &EXT4_SB(sb)->s_rsv_window_root); 549 - } 550 - 551 - /* 552 - * rsv_is_empty() -- Check if the reservation window is allocated. 
553 - * @rsv: given reservation window to check 554 - * 555 - * returns 1 if the end block is EXT4_RESERVE_WINDOW_NOT_ALLOCATED. 556 - */ 557 - static inline int rsv_is_empty(struct ext4_reserve_window *rsv) 558 - { 559 - /* a valid reservation end block could not be 0 */ 560 - return rsv->_rsv_end == EXT4_RESERVE_WINDOW_NOT_ALLOCATED; 561 - } 562 - 563 - /** 564 - * ext4_init_block_alloc_info() 565 - * @inode: file inode structure 566 - * 567 - * Allocate and initialize the reservation window structure, and 568 - * link the window to the ext4 inode structure at last 569 - * 570 - * The reservation window structure is only dynamically allocated 571 - * and linked to ext4 inode the first time the open file 572 - * needs a new block. So, before every ext4_new_block(s) call, for 573 - * regular files, we should check whether the reservation window 574 - * structure exists or not. In the latter case, this function is called. 575 - * Fail to do so will result in block reservation being turned off for that 576 - * open file. 577 - * 578 - * This function is called from ext4_get_blocks_handle(), also called 579 - * when setting the reservation window size through ioctl before the file 580 - * is open for write (needs block allocation). 581 - * 582 - * Needs down_write(i_data_sem) protection prior to call this function. 
583 - */ 584 - void ext4_init_block_alloc_info(struct inode *inode) 585 - { 586 - struct ext4_inode_info *ei = EXT4_I(inode); 587 - struct ext4_block_alloc_info *block_i = ei->i_block_alloc_info; 588 - struct super_block *sb = inode->i_sb; 589 - 590 - block_i = kmalloc(sizeof(*block_i), GFP_NOFS); 591 - if (block_i) { 592 - struct ext4_reserve_window_node *rsv = &block_i->rsv_window_node; 593 - 594 - rsv->rsv_start = EXT4_RESERVE_WINDOW_NOT_ALLOCATED; 595 - rsv->rsv_end = EXT4_RESERVE_WINDOW_NOT_ALLOCATED; 596 - 597 - /* 598 - * if filesystem is mounted with NORESERVATION, the goal 599 - * reservation window size is set to zero to indicate 600 - * block reservation is off 601 - */ 602 - if (!test_opt(sb, RESERVATION)) 603 - rsv->rsv_goal_size = 0; 604 - else 605 - rsv->rsv_goal_size = EXT4_DEFAULT_RESERVE_BLOCKS; 606 - rsv->rsv_alloc_hit = 0; 607 - block_i->last_alloc_logical_block = 0; 608 - block_i->last_alloc_physical_block = 0; 609 - } 610 - ei->i_block_alloc_info = block_i; 611 - } 612 - 613 - /** 614 - * ext4_discard_reservation() 615 - * @inode: inode 616 - * 617 - * Discard(free) block reservation window on last file close, or truncate 618 - * or at last iput(). 619 - * 620 - * It is being called in three cases: 621 - * ext4_release_file(): last writer close the file 622 - * ext4_clear_inode(): last iput(), when nobody link to this file. 623 - * ext4_truncate(): when the block indirect map is about to change. 
624 - *
625 - */
626 - void ext4_discard_reservation(struct inode *inode)
627 - {
628 - struct ext4_inode_info *ei = EXT4_I(inode);
629 - struct ext4_block_alloc_info *block_i = ei->i_block_alloc_info;
630 - struct ext4_reserve_window_node *rsv;
631 - spinlock_t *rsv_lock = &EXT4_SB(inode->i_sb)->s_rsv_window_lock;
632 -
633 - ext4_mb_discard_inode_preallocations(inode);
634 -
635 - if (!block_i)
636 - return;
637 -
638 - rsv = &block_i->rsv_window_node;
639 - if (!rsv_is_empty(&rsv->rsv_window)) {
640 - spin_lock(rsv_lock);
641 - if (!rsv_is_empty(&rsv->rsv_window))
642 - rsv_window_remove(inode->i_sb, rsv);
643 - spin_unlock(rsv_lock);
644 - }
645 - }
646 348
647 349 /**
648 350 * ext4_free_blocks_sb() -- Free given blocks and update quota
··· 356 648 * @block: start physical block to free
357 649 * @count: number of blocks to free
358 650 * @pdquot_freed_blocks: pointer to quota
651 + *
652 + * XXX This function is only used by the on-line resizing code, which
653 + * should probably be fixed up to call the mballoc variant. This
654 + * needs to be cleaned up later; in fact, I'm not convinced this
655 + * is 100% correct in the face of the mballoc code. The online resizing
656 + * code needs to be fixed up to more tightly (and correctly) interlock
657 + * with the mballoc code.
359 658 */ 360 659 void ext4_free_blocks_sb(handle_t *handle, struct super_block *sb, 361 660 ext4_fsblk_t block, unsigned long count, ··· 374 659 ext4_grpblk_t bit; 375 660 unsigned long i; 376 661 unsigned long overflow; 377 - struct ext4_group_desc * desc; 378 - struct ext4_super_block * es; 662 + struct ext4_group_desc *desc; 663 + struct ext4_super_block *es; 379 664 struct ext4_sb_info *sbi; 380 665 int err = 0, ret; 381 666 ext4_grpblk_t group_freed; ··· 386 671 if (block < le32_to_cpu(es->s_first_data_block) || 387 672 block + count < block || 388 673 block + count > ext4_blocks_count(es)) { 389 - ext4_error (sb, "ext4_free_blocks", 390 - "Freeing blocks not in datazone - " 391 - "block = %llu, count = %lu", block, count); 674 + ext4_error(sb, "ext4_free_blocks", 675 + "Freeing blocks not in datazone - " 676 + "block = %llu, count = %lu", block, count); 392 677 goto error_return; 393 678 } 394 679 395 - ext4_debug ("freeing block(s) %llu-%llu\n", block, block + count - 1); 680 + ext4_debug("freeing block(s) %llu-%llu\n", block, block + count - 1); 396 681 397 682 do_more: 398 683 overflow = 0; ··· 409 694 bitmap_bh = ext4_read_block_bitmap(sb, block_group); 410 695 if (!bitmap_bh) 411 696 goto error_return; 412 - desc = ext4_get_group_desc (sb, block_group, &gd_bh); 697 + desc = ext4_get_group_desc(sb, block_group, &gd_bh); 413 698 if (!desc) 414 699 goto error_return; 415 700 ··· 418 703 in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) || 419 704 in_range(block + count - 1, ext4_inode_table(sb, desc), 420 705 sbi->s_itb_per_group)) { 421 - ext4_error (sb, "ext4_free_blocks", 422 - "Freeing blocks in system zones - " 423 - "Block = %llu, count = %lu", 424 - block, count); 706 + ext4_error(sb, "ext4_free_blocks", 707 + "Freeing blocks in system zones - " 708 + "Block = %llu, count = %lu", 709 + block, count); 425 710 goto error_return; 426 711 } 427 712 ··· 563 848 ext4_fsblk_t block, unsigned long count, 564 849 int metadata) 565 850 { 566 
- struct super_block * sb; 851 + struct super_block *sb; 567 852 unsigned long dquot_freed_blocks; 568 853 569 854 /* this isn't the right place to decide whether block is metadata ··· 574 859 575 860 sb = inode->i_sb; 576 861 577 - if (!test_opt(sb, MBALLOC) || !EXT4_SB(sb)->s_group_info) 578 - ext4_free_blocks_sb(handle, sb, block, count, 579 - &dquot_freed_blocks); 580 - else 581 - ext4_mb_free_blocks(handle, inode, block, count, 582 - metadata, &dquot_freed_blocks); 862 + ext4_mb_free_blocks(handle, inode, block, count, 863 + metadata, &dquot_freed_blocks); 583 864 if (dquot_freed_blocks) 584 865 DQUOT_FREE_BLOCK(inode, dquot_freed_blocks); 585 866 return; 586 867 } 587 868 588 - /** 589 - * ext4_test_allocatable() 590 - * @nr: given allocation block group 591 - * @bh: bufferhead contains the bitmap of the given block group 592 - * 593 - * For ext4 allocations, we must not reuse any blocks which are 594 - * allocated in the bitmap buffer's "last committed data" copy. This 595 - * prevents deletes from freeing up the page for reuse until we have 596 - * committed the delete transaction. 597 - * 598 - * If we didn't do this, then deleting something and reallocating it as 599 - * data would allow the old block to be overwritten before the 600 - * transaction committed (because we force data to disk before commit). 601 - * This would lead to corruption if we crashed between overwriting the 602 - * data and committing the delete. 603 - * 604 - * @@@ We may want to make this allocation behaviour conditional on 605 - * data-writes at some point, and disable it for metadata allocations or 606 - * sync-data inodes. 
607 - */ 608 - static int ext4_test_allocatable(ext4_grpblk_t nr, struct buffer_head *bh) 869 + int ext4_claim_free_blocks(struct ext4_sb_info *sbi, 870 + s64 nblocks) 609 871 { 610 - int ret; 611 - struct journal_head *jh = bh2jh(bh); 872 + s64 free_blocks, dirty_blocks; 873 + s64 root_blocks = 0; 874 + struct percpu_counter *fbc = &sbi->s_freeblocks_counter; 875 + struct percpu_counter *dbc = &sbi->s_dirtyblocks_counter; 612 876 613 - if (ext4_test_bit(nr, bh->b_data)) 614 - return 0; 877 + free_blocks = percpu_counter_read_positive(fbc); 878 + dirty_blocks = percpu_counter_read_positive(dbc); 615 879 616 - jbd_lock_bh_state(bh); 617 - if (!jh->b_committed_data) 618 - ret = 1; 619 - else 620 - ret = !ext4_test_bit(nr, jh->b_committed_data); 621 - jbd_unlock_bh_state(bh); 622 - return ret; 623 - } 880 + if (!capable(CAP_SYS_RESOURCE) && 881 + sbi->s_resuid != current->fsuid && 882 + (sbi->s_resgid == 0 || !in_group_p(sbi->s_resgid))) 883 + root_blocks = ext4_r_blocks_count(sbi->s_es); 624 884 625 - /** 626 - * bitmap_search_next_usable_block() 627 - * @start: the starting block (group relative) of the search 628 - * @bh: bufferhead contains the block group bitmap 629 - * @maxblocks: the ending block (group relative) of the reservation 630 - * 631 - * The bitmap search --- search forward alternately through the actual 632 - * bitmap on disk and the last-committed copy in journal, until we find a 633 - * bit free in both bitmaps. 
634 - */ 635 - static ext4_grpblk_t 636 - bitmap_search_next_usable_block(ext4_grpblk_t start, struct buffer_head *bh, 637 - ext4_grpblk_t maxblocks) 638 - { 639 - ext4_grpblk_t next; 640 - struct journal_head *jh = bh2jh(bh); 641 - 642 - while (start < maxblocks) { 643 - next = ext4_find_next_zero_bit(bh->b_data, maxblocks, start); 644 - if (next >= maxblocks) 645 - return -1; 646 - if (ext4_test_allocatable(next, bh)) 647 - return next; 648 - jbd_lock_bh_state(bh); 649 - if (jh->b_committed_data) 650 - start = ext4_find_next_zero_bit(jh->b_committed_data, 651 - maxblocks, next); 652 - jbd_unlock_bh_state(bh); 653 - } 654 - return -1; 655 - } 656 - 657 - /** 658 - * find_next_usable_block() 659 - * @start: the starting block (group relative) to find next 660 - * allocatable block in bitmap. 661 - * @bh: bufferhead contains the block group bitmap 662 - * @maxblocks: the ending block (group relative) for the search 663 - * 664 - * Find an allocatable block in a bitmap. We honor both the bitmap and 665 - * its last-committed copy (if that exists), and perform the "most 666 - * appropriate allocation" algorithm of looking for a free block near 667 - * the initial goal; then for a free byte somewhere in the bitmap; then 668 - * for any free bit in the bitmap. 669 - */ 670 - static ext4_grpblk_t 671 - find_next_usable_block(ext4_grpblk_t start, struct buffer_head *bh, 672 - ext4_grpblk_t maxblocks) 673 - { 674 - ext4_grpblk_t here, next; 675 - char *p, *r; 676 - 677 - if (start > 0) { 678 - /* 679 - * The goal was occupied; search forward for a free 680 - * block within the next XX blocks. 681 - * 682 - * end_goal is more or less random, but it has to be 683 - * less than EXT4_BLOCKS_PER_GROUP. Aligning up to the 684 - * next 64-bit boundary is simple.. 
685 - */ 686 - ext4_grpblk_t end_goal = (start + 63) & ~63; 687 - if (end_goal > maxblocks) 688 - end_goal = maxblocks; 689 - here = ext4_find_next_zero_bit(bh->b_data, end_goal, start); 690 - if (here < end_goal && ext4_test_allocatable(here, bh)) 691 - return here; 692 - ext4_debug("Bit not found near goal\n"); 693 - } 694 - 695 - here = start; 696 - if (here < 0) 697 - here = 0; 698 - 699 - p = ((char *)bh->b_data) + (here >> 3); 700 - r = memscan(p, 0, ((maxblocks + 7) >> 3) - (here >> 3)); 701 - next = (r - ((char *)bh->b_data)) << 3; 702 - 703 - if (next < maxblocks && next >= start && ext4_test_allocatable(next, bh)) 704 - return next; 705 - 706 - /* 707 - * The bitmap search --- search forward alternately through the actual 708 - * bitmap and the last-committed copy until we find a bit free in 709 - * both 710 - */ 711 - here = bitmap_search_next_usable_block(here, bh, maxblocks); 712 - return here; 713 - } 714 - 715 - /** 716 - * claim_block() 717 - * @block: the free block (group relative) to allocate 718 - * @bh: the bufferhead containts the block group bitmap 719 - * 720 - * We think we can allocate this block in this bitmap. Try to set the bit. 721 - * If that succeeds then check that nobody has allocated and then freed the 722 - * block since we saw that is was not marked in b_committed_data. If it _was_ 723 - * allocated and freed then clear the bit in the bitmap again and return 724 - * zero (failure). 
725 - */ 726 - static inline int 727 - claim_block(spinlock_t *lock, ext4_grpblk_t block, struct buffer_head *bh) 728 - { 729 - struct journal_head *jh = bh2jh(bh); 730 - int ret; 731 - 732 - if (ext4_set_bit_atomic(lock, block, bh->b_data)) 733 - return 0; 734 - jbd_lock_bh_state(bh); 735 - if (jh->b_committed_data && ext4_test_bit(block,jh->b_committed_data)) { 736 - ext4_clear_bit_atomic(lock, block, bh->b_data); 737 - ret = 0; 738 - } else { 739 - ret = 1; 740 - } 741 - jbd_unlock_bh_state(bh); 742 - return ret; 743 - } 744 - 745 - /** 746 - * ext4_try_to_allocate() 747 - * @sb: superblock 748 - * @handle: handle to this transaction 749 - * @group: given allocation block group 750 - * @bitmap_bh: bufferhead holds the block bitmap 751 - * @grp_goal: given target block within the group 752 - * @count: target number of blocks to allocate 753 - * @my_rsv: reservation window 754 - * 755 - * Attempt to allocate blocks within a give range. Set the range of allocation 756 - * first, then find the first free bit(s) from the bitmap (within the range), 757 - * and at last, allocate the blocks by claiming the found free bit as allocated. 758 - * 759 - * To set the range of this allocation: 760 - * if there is a reservation window, only try to allocate block(s) from the 761 - * file's own reservation window; 762 - * Otherwise, the allocation range starts from the give goal block, ends at 763 - * the block group's last block. 764 - * 765 - * If we failed to allocate the desired block then we may end up crossing to a 766 - * new bitmap. In that case we must release write access to the old one via 767 - * ext4_journal_release_buffer(), else we'll run out of credits. 
768 - */ 769 - static ext4_grpblk_t 770 - ext4_try_to_allocate(struct super_block *sb, handle_t *handle, 771 - ext4_group_t group, struct buffer_head *bitmap_bh, 772 - ext4_grpblk_t grp_goal, unsigned long *count, 773 - struct ext4_reserve_window *my_rsv) 774 - { 775 - ext4_fsblk_t group_first_block; 776 - ext4_grpblk_t start, end; 777 - unsigned long num = 0; 778 - 779 - /* we do allocation within the reservation window if we have a window */ 780 - if (my_rsv) { 781 - group_first_block = ext4_group_first_block_no(sb, group); 782 - if (my_rsv->_rsv_start >= group_first_block) 783 - start = my_rsv->_rsv_start - group_first_block; 784 - else 785 - /* reservation window cross group boundary */ 786 - start = 0; 787 - end = my_rsv->_rsv_end - group_first_block + 1; 788 - if (end > EXT4_BLOCKS_PER_GROUP(sb)) 789 - /* reservation window crosses group boundary */ 790 - end = EXT4_BLOCKS_PER_GROUP(sb); 791 - if ((start <= grp_goal) && (grp_goal < end)) 792 - start = grp_goal; 793 - else 794 - grp_goal = -1; 795 - } else { 796 - if (grp_goal > 0) 797 - start = grp_goal; 798 - else 799 - start = 0; 800 - end = EXT4_BLOCKS_PER_GROUP(sb); 801 - } 802 - 803 - BUG_ON(start > EXT4_BLOCKS_PER_GROUP(sb)); 804 - 805 - repeat: 806 - if (grp_goal < 0 || !ext4_test_allocatable(grp_goal, bitmap_bh)) { 807 - grp_goal = find_next_usable_block(start, bitmap_bh, end); 808 - if (grp_goal < 0) 809 - goto fail_access; 810 - if (!my_rsv) { 811 - int i; 812 - 813 - for (i = 0; i < 7 && grp_goal > start && 814 - ext4_test_allocatable(grp_goal - 1, 815 - bitmap_bh); 816 - i++, grp_goal--) 817 - ; 885 + if (free_blocks - (nblocks + root_blocks + dirty_blocks) < 886 + EXT4_FREEBLOCKS_WATERMARK) { 887 + free_blocks = percpu_counter_sum(fbc); 888 + dirty_blocks = percpu_counter_sum(dbc); 889 + if (dirty_blocks < 0) { 890 + printk(KERN_CRIT "Dirty block accounting " 891 + "went wrong %lld\n", 892 + dirty_blocks); 818 893 } 819 894 } 820 - start = grp_goal; 821 - 822 - if 
(!claim_block(sb_bgl_lock(EXT4_SB(sb), group), 823 - grp_goal, bitmap_bh)) { 824 - /* 825 - * The block was allocated by another thread, or it was 826 - * allocated and then freed by another thread 827 - */ 828 - start++; 829 - grp_goal++; 830 - if (start >= end) 831 - goto fail_access; 832 - goto repeat; 833 - } 834 - num++; 835 - grp_goal++; 836 - while (num < *count && grp_goal < end 837 - && ext4_test_allocatable(grp_goal, bitmap_bh) 838 - && claim_block(sb_bgl_lock(EXT4_SB(sb), group), 839 - grp_goal, bitmap_bh)) { 840 - num++; 841 - grp_goal++; 842 - } 843 - *count = num; 844 - return grp_goal - num; 845 - fail_access: 846 - *count = num; 847 - return -1; 848 - } 849 - 850 - /** 851 - * find_next_reservable_window(): 852 - * find a reservable space within the given range. 853 - * It does not allocate the reservation window for now: 854 - * alloc_new_reservation() will do the work later. 855 - * 856 - * @search_head: the head of the searching list; 857 - * This is not necessarily the list head of the whole filesystem 858 - * 859 - * We have both head and start_block to assist the search 860 - * for the reservable space. The list starts from head, 861 - * but we will shift to the place where start_block is, 862 - * then start from there, when looking for a reservable space. 863 - * 864 - * @size: the target new reservation window size 865 - * 866 - * @group_first_block: the first block we consider to start 867 - * the real search from 868 - * 869 - * @last_block: 870 - * the maximum block number that our goal reservable space 871 - * could start from. This is normally the last block in this 872 - * group. The search will end when we found the start of next 873 - * possible reservable space is out of this boundary. 874 - * This could handle the cross boundary reservation window 875 - * request. 
876 - * 877 - * basically we search from the given range, rather than the whole 878 - * reservation double linked list, (start_block, last_block) 879 - * to find a free region that is of my size and has not 880 - * been reserved. 881 - * 882 - */ 883 - static int find_next_reservable_window( 884 - struct ext4_reserve_window_node *search_head, 885 - struct ext4_reserve_window_node *my_rsv, 886 - struct super_block * sb, 887 - ext4_fsblk_t start_block, 888 - ext4_fsblk_t last_block) 889 - { 890 - struct rb_node *next; 891 - struct ext4_reserve_window_node *rsv, *prev; 892 - ext4_fsblk_t cur; 893 - int size = my_rsv->rsv_goal_size; 894 - 895 - /* TODO: make the start of the reservation window byte-aligned */ 896 - /* cur = *start_block & ~7;*/ 897 - cur = start_block; 898 - rsv = search_head; 899 - if (!rsv) 900 - return -1; 901 - 902 - while (1) { 903 - if (cur <= rsv->rsv_end) 904 - cur = rsv->rsv_end + 1; 905 - 906 - /* TODO? 907 - * in the case we could not find a reservable space 908 - * that is what is expected, during the re-search, we could 909 - * remember what's the largest reservable space we could have 910 - * and return that one. 911 - * 912 - * For now it will fail if we could not find the reservable 913 - * space with expected-size (or more)... 914 - */ 915 - if (cur > last_block) 916 - return -1; /* fail */ 917 - 918 - prev = rsv; 919 - next = rb_next(&rsv->rsv_node); 920 - rsv = rb_entry(next,struct ext4_reserve_window_node,rsv_node); 921 - 922 - /* 923 - * Reached the last reservation, we can just append to the 924 - * previous one. 925 - */ 926 - if (!next) 927 - break; 928 - 929 - if (cur + size <= rsv->rsv_start) { 930 - /* 931 - * Found a reserveable space big enough. We could 932 - * have a reservation across the group boundary here 933 - */ 934 - break; 935 - } 936 - } 937 - /* 938 - * we come here either : 939 - * when we reach the end of the whole list, 940 - * and there is empty reservable space after last entry in the list. 
941 - * append it to the end of the list. 942 - * 943 - * or we found one reservable space in the middle of the list, 944 - * return the reservation window that we could append to. 945 - * succeed. 895 + /* Check whether we have space after 896 + * accounting for current dirty blocks 946 897 */ 898 + if (free_blocks < ((root_blocks + nblocks) + dirty_blocks)) 899 + /* we don't have free space */ 900 + return -ENOSPC; 947 901 948 - if ((prev != my_rsv) && (!rsv_is_empty(&my_rsv->rsv_window))) 949 - rsv_window_remove(sb, my_rsv); 950 - 951 - /* 952 - * Let's book the whole avaliable window for now. We will check the 953 - * disk bitmap later and then, if there are free blocks then we adjust 954 - * the window size if it's larger than requested. 955 - * Otherwise, we will remove this node from the tree next time 956 - * call find_next_reservable_window. 957 - */ 958 - my_rsv->rsv_start = cur; 959 - my_rsv->rsv_end = cur + size - 1; 960 - my_rsv->rsv_alloc_hit = 0; 961 - 962 - if (prev != my_rsv) 963 - ext4_rsv_window_add(sb, my_rsv); 964 - 902 + /* Add the blocks to nblocks */ 903 + percpu_counter_add(dbc, nblocks); 965 904 return 0; 966 - } 967 - 968 - /** 969 - * alloc_new_reservation()--allocate a new reservation window 970 - * 971 - * To make a new reservation, we search part of the filesystem 972 - * reservation list (the list that inside the group). We try to 973 - * allocate a new reservation window near the allocation goal, 974 - * or the beginning of the group, if there is no goal. 975 - * 976 - * We first find a reservable space after the goal, then from 977 - * there, we check the bitmap for the first free block after 978 - * it. If there is no free block until the end of group, then the 979 - * whole group is full, we failed. Otherwise, check if the free 980 - * block is inside the expected reservable space, if so, we 981 - * succeed. 
982 - * If the first free block is outside the reservable space, then 983 - * start from the first free block, we search for next available 984 - * space, and go on. 985 - * 986 - * on succeed, a new reservation will be found and inserted into the list 987 - * It contains at least one free block, and it does not overlap with other 988 - * reservation windows. 989 - * 990 - * failed: we failed to find a reservation window in this group 991 - * 992 - * @rsv: the reservation 993 - * 994 - * @grp_goal: The goal (group-relative). It is where the search for a 995 - * free reservable space should start from. 996 - * if we have a grp_goal(grp_goal >0 ), then start from there, 997 - * no grp_goal(grp_goal = -1), we start from the first block 998 - * of the group. 999 - * 1000 - * @sb: the super block 1001 - * @group: the group we are trying to allocate in 1002 - * @bitmap_bh: the block group block bitmap 1003 - * 1004 - */ 1005 - static int alloc_new_reservation(struct ext4_reserve_window_node *my_rsv, 1006 - ext4_grpblk_t grp_goal, struct super_block *sb, 1007 - ext4_group_t group, struct buffer_head *bitmap_bh) 1008 - { 1009 - struct ext4_reserve_window_node *search_head; 1010 - ext4_fsblk_t group_first_block, group_end_block, start_block; 1011 - ext4_grpblk_t first_free_block; 1012 - struct rb_root *fs_rsv_root = &EXT4_SB(sb)->s_rsv_window_root; 1013 - unsigned long size; 1014 - int ret; 1015 - spinlock_t *rsv_lock = &EXT4_SB(sb)->s_rsv_window_lock; 1016 - 1017 - group_first_block = ext4_group_first_block_no(sb, group); 1018 - group_end_block = group_first_block + (EXT4_BLOCKS_PER_GROUP(sb) - 1); 1019 - 1020 - if (grp_goal < 0) 1021 - start_block = group_first_block; 1022 - else 1023 - start_block = grp_goal + group_first_block; 1024 - 1025 - size = my_rsv->rsv_goal_size; 1026 - 1027 - if (!rsv_is_empty(&my_rsv->rsv_window)) { 1028 - /* 1029 - * if the old reservation is cross group boundary 1030 - * and if the goal is inside the old reservation window, 1031 - * we will 
come here when we just failed to allocate from 1032 - * the first part of the window. We still have another part 1033 - * that belongs to the next group. In this case, there is no 1034 - * point to discard our window and try to allocate a new one 1035 - * in this group(which will fail). we should 1036 - * keep the reservation window, just simply move on. 1037 - * 1038 - * Maybe we could shift the start block of the reservation 1039 - * window to the first block of next group. 1040 - */ 1041 - 1042 - if ((my_rsv->rsv_start <= group_end_block) && 1043 - (my_rsv->rsv_end > group_end_block) && 1044 - (start_block >= my_rsv->rsv_start)) 1045 - return -1; 1046 - 1047 - if ((my_rsv->rsv_alloc_hit > 1048 - (my_rsv->rsv_end - my_rsv->rsv_start + 1) / 2)) { 1049 - /* 1050 - * if the previously allocation hit ratio is 1051 - * greater than 1/2, then we double the size of 1052 - * the reservation window the next time, 1053 - * otherwise we keep the same size window 1054 - */ 1055 - size = size * 2; 1056 - if (size > EXT4_MAX_RESERVE_BLOCKS) 1057 - size = EXT4_MAX_RESERVE_BLOCKS; 1058 - my_rsv->rsv_goal_size= size; 1059 - } 1060 - } 1061 - 1062 - spin_lock(rsv_lock); 1063 - /* 1064 - * shift the search start to the window near the goal block 1065 - */ 1066 - search_head = search_reserve_window(fs_rsv_root, start_block); 1067 - 1068 - /* 1069 - * find_next_reservable_window() simply finds a reservable window 1070 - * inside the given range(start_block, group_end_block). 1071 - * 1072 - * To make sure the reservation window has a free bit inside it, we 1073 - * need to check the bitmap after we found a reservable window. 
1074 - */ 1075 - retry: 1076 - ret = find_next_reservable_window(search_head, my_rsv, sb, 1077 - start_block, group_end_block); 1078 - 1079 - if (ret == -1) { 1080 - if (!rsv_is_empty(&my_rsv->rsv_window)) 1081 - rsv_window_remove(sb, my_rsv); 1082 - spin_unlock(rsv_lock); 1083 - return -1; 1084 - } 1085 - 1086 - /* 1087 - * On success, find_next_reservable_window() returns the 1088 - * reservation window where there is a reservable space after it. 1089 - * Before we reserve this reservable space, we need 1090 - * to make sure there is at least a free block inside this region. 1091 - * 1092 - * searching the first free bit on the block bitmap and copy of 1093 - * last committed bitmap alternatively, until we found a allocatable 1094 - * block. Search start from the start block of the reservable space 1095 - * we just found. 1096 - */ 1097 - spin_unlock(rsv_lock); 1098 - first_free_block = bitmap_search_next_usable_block( 1099 - my_rsv->rsv_start - group_first_block, 1100 - bitmap_bh, group_end_block - group_first_block + 1); 1101 - 1102 - if (first_free_block < 0) { 1103 - /* 1104 - * no free block left on the bitmap, no point 1105 - * to reserve the space. return failed. 
1106 - */ 1107 - spin_lock(rsv_lock); 1108 - if (!rsv_is_empty(&my_rsv->rsv_window)) 1109 - rsv_window_remove(sb, my_rsv); 1110 - spin_unlock(rsv_lock); 1111 - return -1; /* failed */ 1112 - } 1113 - 1114 - start_block = first_free_block + group_first_block; 1115 - /* 1116 - * check if the first free block is within the 1117 - * free space we just reserved 1118 - */ 1119 - if (start_block >= my_rsv->rsv_start && start_block <= my_rsv->rsv_end) 1120 - return 0; /* success */ 1121 - /* 1122 - * if the first free bit we found is out of the reservable space 1123 - * continue search for next reservable space, 1124 - * start from where the free block is, 1125 - * we also shift the list head to where we stopped last time 1126 - */ 1127 - search_head = my_rsv; 1128 - spin_lock(rsv_lock); 1129 - goto retry; 1130 - } 1131 - 1132 - /** 1133 - * try_to_extend_reservation() 1134 - * @my_rsv: given reservation window 1135 - * @sb: super block 1136 - * @size: the delta to extend 1137 - * 1138 - * Attempt to expand the reservation window large enough to have 1139 - * required number of free blocks 1140 - * 1141 - * Since ext4_try_to_allocate() will always allocate blocks within 1142 - * the reservation window range, if the window size is too small, 1143 - * multiple blocks allocation has to stop at the end of the reservation 1144 - * window. 
To make this more efficient, given the total number of 1145 - * blocks needed and the current size of the window, we try to 1146 - * expand the reservation window size if necessary on a best-effort 1147 - * basis before ext4_new_blocks() tries to allocate blocks, 1148 - */ 1149 - static void try_to_extend_reservation(struct ext4_reserve_window_node *my_rsv, 1150 - struct super_block *sb, int size) 1151 - { 1152 - struct ext4_reserve_window_node *next_rsv; 1153 - struct rb_node *next; 1154 - spinlock_t *rsv_lock = &EXT4_SB(sb)->s_rsv_window_lock; 1155 - 1156 - if (!spin_trylock(rsv_lock)) 1157 - return; 1158 - 1159 - next = rb_next(&my_rsv->rsv_node); 1160 - 1161 - if (!next) 1162 - my_rsv->rsv_end += size; 1163 - else { 1164 - next_rsv = rb_entry(next, struct ext4_reserve_window_node, rsv_node); 1165 - 1166 - if ((next_rsv->rsv_start - my_rsv->rsv_end - 1) >= size) 1167 - my_rsv->rsv_end += size; 1168 - else 1169 - my_rsv->rsv_end = next_rsv->rsv_start - 1; 1170 - } 1171 - spin_unlock(rsv_lock); 1172 - } 1173 - 1174 - /** 1175 - * ext4_try_to_allocate_with_rsv() 1176 - * @sb: superblock 1177 - * @handle: handle to this transaction 1178 - * @group: given allocation block group 1179 - * @bitmap_bh: bufferhead holds the block bitmap 1180 - * @grp_goal: given target block within the group 1181 - * @count: target number of blocks to allocate 1182 - * @my_rsv: reservation window 1183 - * @errp: pointer to store the error code 1184 - * 1185 - * This is the main function used to allocate a new block and its reservation 1186 - * window. 1187 - * 1188 - * Each time when a new block allocation is need, first try to allocate from 1189 - * its own reservation. If it does not have a reservation window, instead of 1190 - * looking for a free bit on bitmap first, then look up the reservation list to 1191 - * see if it is inside somebody else's reservation window, we try to allocate a 1192 - * reservation window for it starting from the goal first. 
Then do the block 1193 - * allocation within the reservation window. 1194 - * 1195 - * This will avoid keeping on searching the reservation list again and 1196 - * again when somebody is looking for a free block (without 1197 - * reservation), and there are lots of free blocks, but they are all 1198 - * being reserved. 1199 - * 1200 - * We use a red-black tree for the per-filesystem reservation list. 1201 - * 1202 - */ 1203 - static ext4_grpblk_t 1204 - ext4_try_to_allocate_with_rsv(struct super_block *sb, handle_t *handle, 1205 - ext4_group_t group, struct buffer_head *bitmap_bh, 1206 - ext4_grpblk_t grp_goal, 1207 - struct ext4_reserve_window_node * my_rsv, 1208 - unsigned long *count, int *errp) 1209 - { 1210 - ext4_fsblk_t group_first_block, group_last_block; 1211 - ext4_grpblk_t ret = 0; 1212 - int fatal; 1213 - unsigned long num = *count; 1214 - 1215 - *errp = 0; 1216 - 1217 - /* 1218 - * Make sure we use undo access for the bitmap, because it is critical 1219 - * that we do the frozen_data COW on bitmap buffers in all cases even 1220 - * if the buffer is in BJ_Forget state in the committing transaction. 
1221 - */ 1222 - BUFFER_TRACE(bitmap_bh, "get undo access for new block"); 1223 - fatal = ext4_journal_get_undo_access(handle, bitmap_bh); 1224 - if (fatal) { 1225 - *errp = fatal; 1226 - return -1; 1227 - } 1228 - 1229 - /* 1230 - * we don't deal with reservation when 1231 - * filesystem is mounted without reservation 1232 - * or the file is not a regular file 1233 - * or last attempt to allocate a block with reservation turned on failed 1234 - */ 1235 - if (my_rsv == NULL ) { 1236 - ret = ext4_try_to_allocate(sb, handle, group, bitmap_bh, 1237 - grp_goal, count, NULL); 1238 - goto out; 1239 - } 1240 - /* 1241 - * grp_goal is a group relative block number (if there is a goal) 1242 - * 0 <= grp_goal < EXT4_BLOCKS_PER_GROUP(sb) 1243 - * first block is a filesystem wide block number 1244 - * first block is the block number of the first block in this group 1245 - */ 1246 - group_first_block = ext4_group_first_block_no(sb, group); 1247 - group_last_block = group_first_block + (EXT4_BLOCKS_PER_GROUP(sb) - 1); 1248 - 1249 - /* 1250 - * Basically we will allocate a new block from inode's reservation 1251 - * window. 1252 - * 1253 - * We need to allocate a new reservation window, if: 1254 - * a) inode does not have a reservation window; or 1255 - * b) last attempt to allocate a block from existing reservation 1256 - * failed; or 1257 - * c) we come here with a goal and with a reservation window 1258 - * 1259 - * We do not need to allocate a new reservation window if we come here 1260 - * at the beginning with a goal and the goal is inside the window, or 1261 - * we don't have a goal but already have a reservation window. 1262 - * then we could go to allocate from the reservation window directly. 
1263 - */ 1264 - while (1) { 1265 - if (rsv_is_empty(&my_rsv->rsv_window) || (ret < 0) || 1266 - !goal_in_my_reservation(&my_rsv->rsv_window, 1267 - grp_goal, group, sb)) { 1268 - if (my_rsv->rsv_goal_size < *count) 1269 - my_rsv->rsv_goal_size = *count; 1270 - ret = alloc_new_reservation(my_rsv, grp_goal, sb, 1271 - group, bitmap_bh); 1272 - if (ret < 0) 1273 - break; /* failed */ 1274 - 1275 - if (!goal_in_my_reservation(&my_rsv->rsv_window, 1276 - grp_goal, group, sb)) 1277 - grp_goal = -1; 1278 - } else if (grp_goal >= 0) { 1279 - int curr = my_rsv->rsv_end - 1280 - (grp_goal + group_first_block) + 1; 1281 - 1282 - if (curr < *count) 1283 - try_to_extend_reservation(my_rsv, sb, 1284 - *count - curr); 1285 - } 1286 - 1287 - if ((my_rsv->rsv_start > group_last_block) || 1288 - (my_rsv->rsv_end < group_first_block)) { 1289 - rsv_window_dump(&EXT4_SB(sb)->s_rsv_window_root, 1); 1290 - BUG(); 1291 - } 1292 - ret = ext4_try_to_allocate(sb, handle, group, bitmap_bh, 1293 - grp_goal, &num, &my_rsv->rsv_window); 1294 - if (ret >= 0) { 1295 - my_rsv->rsv_alloc_hit += num; 1296 - *count = num; 1297 - break; /* succeed */ 1298 - } 1299 - num = *count; 1300 - } 1301 - out: 1302 - if (ret >= 0) { 1303 - BUFFER_TRACE(bitmap_bh, "journal_dirty_metadata for " 1304 - "bitmap block"); 1305 - fatal = ext4_journal_dirty_metadata(handle, bitmap_bh); 1306 - if (fatal) { 1307 - *errp = fatal; 1308 - return -1; 1309 - } 1310 - return ret; 1311 - } 1312 - 1313 - BUFFER_TRACE(bitmap_bh, "journal_release_buffer"); 1314 - ext4_journal_release_buffer(handle, bitmap_bh); 1315 - return ret; 1316 905 } 1317 906 1318 907 /** ··· 629 1610 * On success, return nblocks 630 1611 */ 631 1612 ext4_fsblk_t ext4_has_free_blocks(struct ext4_sb_info *sbi, 632 - ext4_fsblk_t nblocks) 1613 + s64 nblocks) 633 1614 { 634 - ext4_fsblk_t free_blocks; 635 - ext4_fsblk_t root_blocks = 0; 1615 + s64 free_blocks, dirty_blocks; 1616 + s64 root_blocks = 0; 1617 + struct percpu_counter *fbc = 
&sbi->s_freeblocks_counter; 1618 + struct percpu_counter *dbc = &sbi->s_dirtyblocks_counter; 636 1619 637 - free_blocks = percpu_counter_read_positive(&sbi->s_freeblocks_counter); 1620 + free_blocks = percpu_counter_read_positive(fbc); 1621 + dirty_blocks = percpu_counter_read_positive(dbc); 638 1622 639 1623 if (!capable(CAP_SYS_RESOURCE) && 640 1624 sbi->s_resuid != current->fsuid && 641 1625 (sbi->s_resgid == 0 || !in_group_p(sbi->s_resgid))) 642 1626 root_blocks = ext4_r_blocks_count(sbi->s_es); 643 - #ifdef CONFIG_SMP 644 - if (free_blocks - root_blocks < FBC_BATCH) 645 - free_blocks = 646 - percpu_counter_sum_and_set(&sbi->s_freeblocks_counter); 647 - #endif 648 - if (free_blocks <= root_blocks) 1627 + 1628 + if (free_blocks - (nblocks + root_blocks + dirty_blocks) < 1629 + EXT4_FREEBLOCKS_WATERMARK) { 1630 + free_blocks = percpu_counter_sum(fbc); 1631 + dirty_blocks = percpu_counter_sum(dbc); 1632 + } 1633 + if (free_blocks <= (root_blocks + dirty_blocks)) 649 1634 /* we don't have free space */ 650 1635 return 0; 651 - if (free_blocks - root_blocks < nblocks) 652 - return free_blocks - root_blocks; 1636 + 1637 + if (free_blocks - (root_blocks + dirty_blocks) < nblocks) 1638 + return free_blocks - (root_blocks + dirty_blocks); 653 1639 return nblocks; 654 - } 1640 + } 655 1641 656 1642 657 1643 /** ··· 681 1657 return jbd2_journal_force_commit_nested(EXT4_SB(sb)->s_journal); 682 1658 } 683 1659 684 - /** 685 - * ext4_old_new_blocks() -- core block bitmap based block allocation function 686 - * 687 - * @handle: handle to this transaction 688 - * @inode: file inode 689 - * @goal: given target block(filesystem wide) 690 - * @count: target number of blocks to allocate 691 - * @errp: error code 692 - * 693 - * ext4_old_new_blocks uses a goal block to assist allocation and look up 694 - * the block bitmap directly to do block allocation. It tries to 695 - * allocate block(s) from the block group contains the goal block first. 
If 696 - * that fails, it will try to allocate block(s) from other block groups 697 - * without any specific goal block. 698 - * 699 - * This function is called when -o nomballoc mount option is enabled 700 - * 701 - */ 702 - ext4_fsblk_t ext4_old_new_blocks(handle_t *handle, struct inode *inode, 703 - ext4_fsblk_t goal, unsigned long *count, int *errp) 704 - { 705 - struct buffer_head *bitmap_bh = NULL; 706 - struct buffer_head *gdp_bh; 707 - ext4_group_t group_no; 708 - ext4_group_t goal_group; 709 - ext4_grpblk_t grp_target_blk; /* blockgroup relative goal block */ 710 - ext4_grpblk_t grp_alloc_blk; /* blockgroup-relative allocated block*/ 711 - ext4_fsblk_t ret_block; /* filesyetem-wide allocated block */ 712 - ext4_group_t bgi; /* blockgroup iteration index */ 713 - int fatal = 0, err; 714 - int performed_allocation = 0; 715 - ext4_grpblk_t free_blocks; /* number of free blocks in a group */ 716 - struct super_block *sb; 717 - struct ext4_group_desc *gdp; 718 - struct ext4_super_block *es; 719 - struct ext4_sb_info *sbi; 720 - struct ext4_reserve_window_node *my_rsv = NULL; 721 - struct ext4_block_alloc_info *block_i; 722 - unsigned short windowsz = 0; 723 - ext4_group_t ngroups; 724 - unsigned long num = *count; 725 - 726 - sb = inode->i_sb; 727 - if (!sb) { 728 - *errp = -ENODEV; 729 - printk("ext4_new_block: nonexistent device"); 730 - return 0; 731 - } 732 - 733 - sbi = EXT4_SB(sb); 734 - if (!EXT4_I(inode)->i_delalloc_reserved_flag) { 735 - /* 736 - * With delalloc we already reserved the blocks 737 - */ 738 - *count = ext4_has_free_blocks(sbi, *count); 739 - } 740 - if (*count == 0) { 741 - *errp = -ENOSPC; 742 - return 0; /*return with ENOSPC error */ 743 - } 744 - num = *count; 745 - 746 - /* 747 - * Check quota for allocation of this block. 
748 - */ 749 - if (DQUOT_ALLOC_BLOCK(inode, num)) { 750 - *errp = -EDQUOT; 751 - return 0; 752 - } 753 - 754 - sbi = EXT4_SB(sb); 755 - es = EXT4_SB(sb)->s_es; 756 - ext4_debug("goal=%llu.\n", goal); 757 - /* 758 - * Allocate a block from reservation only when 759 - * filesystem is mounted with reservation(default,-o reservation), and 760 - * it's a regular file, and 761 - * the desired window size is greater than 0 (One could use ioctl 762 - * command EXT4_IOC_SETRSVSZ to set the window size to 0 to turn off 763 - * reservation on that particular file) 764 - */ 765 - block_i = EXT4_I(inode)->i_block_alloc_info; 766 - if (block_i && ((windowsz = block_i->rsv_window_node.rsv_goal_size) > 0)) 767 - my_rsv = &block_i->rsv_window_node; 768 - 769 - /* 770 - * First, test whether the goal block is free. 771 - */ 772 - if (goal < le32_to_cpu(es->s_first_data_block) || 773 - goal >= ext4_blocks_count(es)) 774 - goal = le32_to_cpu(es->s_first_data_block); 775 - ext4_get_group_no_and_offset(sb, goal, &group_no, &grp_target_blk); 776 - goal_group = group_no; 777 - retry_alloc: 778 - gdp = ext4_get_group_desc(sb, group_no, &gdp_bh); 779 - if (!gdp) 780 - goto io_error; 781 - 782 - free_blocks = le16_to_cpu(gdp->bg_free_blocks_count); 783 - /* 784 - * if there is not enough free blocks to make a new resevation 785 - * turn off reservation for this allocation 786 - */ 787 - if (my_rsv && (free_blocks < windowsz) 788 - && (rsv_is_empty(&my_rsv->rsv_window))) 789 - my_rsv = NULL; 790 - 791 - if (free_blocks > 0) { 792 - bitmap_bh = ext4_read_block_bitmap(sb, group_no); 793 - if (!bitmap_bh) 794 - goto io_error; 795 - grp_alloc_blk = ext4_try_to_allocate_with_rsv(sb, handle, 796 - group_no, bitmap_bh, grp_target_blk, 797 - my_rsv, &num, &fatal); 798 - if (fatal) 799 - goto out; 800 - if (grp_alloc_blk >= 0) 801 - goto allocated; 802 - } 803 - 804 - ngroups = EXT4_SB(sb)->s_groups_count; 805 - smp_rmb(); 806 - 807 - /* 808 - * Now search the rest of the groups. 
We assume that 809 - * group_no and gdp correctly point to the last group visited. 810 - */ 811 - for (bgi = 0; bgi < ngroups; bgi++) { 812 - group_no++; 813 - if (group_no >= ngroups) 814 - group_no = 0; 815 - gdp = ext4_get_group_desc(sb, group_no, &gdp_bh); 816 - if (!gdp) 817 - goto io_error; 818 - free_blocks = le16_to_cpu(gdp->bg_free_blocks_count); 819 - /* 820 - * skip this group if the number of 821 - * free blocks is less than half of the reservation 822 - * window size. 823 - */ 824 - if (free_blocks <= (windowsz/2)) 825 - continue; 826 - 827 - brelse(bitmap_bh); 828 - bitmap_bh = ext4_read_block_bitmap(sb, group_no); 829 - if (!bitmap_bh) 830 - goto io_error; 831 - /* 832 - * try to allocate block(s) from this group, without a goal(-1). 833 - */ 834 - grp_alloc_blk = ext4_try_to_allocate_with_rsv(sb, handle, 835 - group_no, bitmap_bh, -1, my_rsv, 836 - &num, &fatal); 837 - if (fatal) 838 - goto out; 839 - if (grp_alloc_blk >= 0) 840 - goto allocated; 841 - } 842 - /* 843 - * We may end up a bogus ealier ENOSPC error due to 844 - * filesystem is "full" of reservations, but 845 - * there maybe indeed free blocks avaliable on disk 846 - * In this case, we just forget about the reservations 847 - * just do block allocation as without reservations. 
848 - */ 849 - if (my_rsv) { 850 - my_rsv = NULL; 851 - windowsz = 0; 852 - group_no = goal_group; 853 - goto retry_alloc; 854 - } 855 - /* No space left on the device */ 856 - *errp = -ENOSPC; 857 - goto out; 858 - 859 - allocated: 860 - 861 - ext4_debug("using block group %lu(%d)\n", 862 - group_no, gdp->bg_free_blocks_count); 863 - 864 - BUFFER_TRACE(gdp_bh, "get_write_access"); 865 - fatal = ext4_journal_get_write_access(handle, gdp_bh); 866 - if (fatal) 867 - goto out; 868 - 869 - ret_block = grp_alloc_blk + ext4_group_first_block_no(sb, group_no); 870 - 871 - if (in_range(ext4_block_bitmap(sb, gdp), ret_block, num) || 872 - in_range(ext4_inode_bitmap(sb, gdp), ret_block, num) || 873 - in_range(ret_block, ext4_inode_table(sb, gdp), 874 - EXT4_SB(sb)->s_itb_per_group) || 875 - in_range(ret_block + num - 1, ext4_inode_table(sb, gdp), 876 - EXT4_SB(sb)->s_itb_per_group)) { 877 - ext4_error(sb, "ext4_new_block", 878 - "Allocating block in system zone - " 879 - "blocks from %llu, length %lu", 880 - ret_block, num); 881 - /* 882 - * claim_block marked the blocks we allocated 883 - * as in use. 
So we may want to selectively 884 - * mark some of the blocks as free 885 - */ 886 - goto retry_alloc; 887 - } 888 - 889 - performed_allocation = 1; 890 - 891 - #ifdef CONFIG_JBD2_DEBUG 892 - { 893 - struct buffer_head *debug_bh; 894 - 895 - /* Record bitmap buffer state in the newly allocated block */ 896 - debug_bh = sb_find_get_block(sb, ret_block); 897 - if (debug_bh) { 898 - BUFFER_TRACE(debug_bh, "state when allocated"); 899 - BUFFER_TRACE2(debug_bh, bitmap_bh, "bitmap state"); 900 - brelse(debug_bh); 901 - } 902 - } 903 - jbd_lock_bh_state(bitmap_bh); 904 - spin_lock(sb_bgl_lock(sbi, group_no)); 905 - if (buffer_jbd(bitmap_bh) && bh2jh(bitmap_bh)->b_committed_data) { 906 - int i; 907 - 908 - for (i = 0; i < num; i++) { 909 - if (ext4_test_bit(grp_alloc_blk+i, 910 - bh2jh(bitmap_bh)->b_committed_data)) { 911 - printk("%s: block was unexpectedly set in " 912 - "b_committed_data\n", __func__); 913 - } 914 - } 915 - } 916 - ext4_debug("found bit %d\n", grp_alloc_blk); 917 - spin_unlock(sb_bgl_lock(sbi, group_no)); 918 - jbd_unlock_bh_state(bitmap_bh); 919 - #endif 920 - 921 - if (ret_block + num - 1 >= ext4_blocks_count(es)) { 922 - ext4_error(sb, "ext4_new_block", 923 - "block(%llu) >= blocks count(%llu) - " 924 - "block_group = %lu, es == %p ", ret_block, 925 - ext4_blocks_count(es), group_no, es); 926 - goto out; 927 - } 928 - 929 - /* 930 - * It is up to the caller to add the new buffer to a journal 931 - * list of some description. We don't know in advance whether 932 - * the caller wants to use it as metadata or data. 
933 - */ 934 - spin_lock(sb_bgl_lock(sbi, group_no)); 935 - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) 936 - gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); 937 - le16_add_cpu(&gdp->bg_free_blocks_count, -num); 938 - gdp->bg_checksum = ext4_group_desc_csum(sbi, group_no, gdp); 939 - spin_unlock(sb_bgl_lock(sbi, group_no)); 940 - if (!EXT4_I(inode)->i_delalloc_reserved_flag) 941 - percpu_counter_sub(&sbi->s_freeblocks_counter, num); 942 - 943 - if (sbi->s_log_groups_per_flex) { 944 - ext4_group_t flex_group = ext4_flex_group(sbi, group_no); 945 - spin_lock(sb_bgl_lock(sbi, flex_group)); 946 - sbi->s_flex_groups[flex_group].free_blocks -= num; 947 - spin_unlock(sb_bgl_lock(sbi, flex_group)); 948 - } 949 - 950 - BUFFER_TRACE(gdp_bh, "journal_dirty_metadata for group descriptor"); 951 - err = ext4_journal_dirty_metadata(handle, gdp_bh); 952 - if (!fatal) 953 - fatal = err; 954 - 955 - sb->s_dirt = 1; 956 - if (fatal) 957 - goto out; 958 - 959 - *errp = 0; 960 - brelse(bitmap_bh); 961 - DQUOT_FREE_BLOCK(inode, *count-num); 962 - *count = num; 963 - return ret_block; 964 - 965 - io_error: 966 - *errp = -EIO; 967 - out: 968 - if (fatal) { 969 - *errp = fatal; 970 - ext4_std_error(sb, fatal); 971 - } 972 - /* 973 - * Undo the block allocation 974 - */ 975 - if (!performed_allocation) 976 - DQUOT_FREE_BLOCK(inode, *count); 977 - brelse(bitmap_bh); 978 - return 0; 979 - } 980 - 981 1660 #define EXT4_META_BLOCK 0x1 982 1661 983 1662 static ext4_fsblk_t do_blk_alloc(handle_t *handle, struct inode *inode, ··· 689 1962 { 690 1963 struct ext4_allocation_request ar; 691 1964 ext4_fsblk_t ret; 692 - 693 - if (!test_opt(inode->i_sb, MBALLOC)) { 694 - return ext4_old_new_blocks(handle, inode, goal, count, errp); 695 - } 696 1965 697 1966 memset(&ar, 0, sizeof(ar)); 698 1967 /* Fill with neighbour allocated blocks */ ··· 731 2008 /* 732 2009 * Account for the allocated meta blocks 733 2010 */ 734 - if (!(*errp)) { 2011 + if (!(*errp) && 
EXT4_I(inode)->i_delalloc_reserved_flag) { 735 2012 spin_lock(&EXT4_I(inode)->i_block_reservation_lock); 736 2013 EXT4_I(inode)->i_allocated_meta_blocks += *count; 737 2014 spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); ··· 816 2093 bitmap_count += x; 817 2094 } 818 2095 brelse(bitmap_bh); 819 - printk("ext4_count_free_blocks: stored = %llu" 820 - ", computed = %llu, %llu\n", 821 - ext4_free_blocks_count(es), 822 - desc_count, bitmap_count); 2096 + printk(KERN_DEBUG "ext4_count_free_blocks: stored = %llu" 2097 + ", computed = %llu, %llu\n", ext4_free_blocks_count(es), 2098 + desc_count, bitmap_count); 823 2099 return bitmap_count; 824 2100 #else 825 2101 desc_count = 0; ··· 905 2183 906 2184 if (!EXT4_HAS_INCOMPAT_FEATURE(sb,EXT4_FEATURE_INCOMPAT_META_BG) || 907 2185 metagroup < first_meta_bg) 908 - return ext4_bg_num_gdb_nometa(sb,group); 2186 + return ext4_bg_num_gdb_nometa(sb, group); 909 2187 910 2188 return ext4_bg_num_gdb_meta(sb,group); 911 2189 912 2190 } 2191 +
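The rewritten ext4_has_free_blocks() above trades exactness for speed: it first uses the cheap, possibly stale percpu reads, and only pays for the expensive percpu_counter_sum() when the remaining margin falls under a watermark, now also subtracting delalloc-dirtied blocks. The sketch below models that flow with plain integers standing in for the percpu counters; `struct counter`, `has_free_blocks`, and the `WATERMARK` value are illustrative stand-ins, not kernel API.

```c
/* Simplified stand-in for a percpu counter: "approx" is the cheap,
 * possibly stale per-CPU read (percpu_counter_read_positive()),
 * "exact" is the expensive summed value (percpu_counter_sum()). */
struct counter {
        long long approx;
        long long exact;
};

#define WATERMARK 64    /* stand-in for EXT4_FREEBLOCKS_WATERMARK */

/* Returns how many of the requested nblocks can be granted, mirroring
 * the post-patch ext4_has_free_blocks() flow: use the cheap reads
 * first, and only fall back to the exact sums when the margin over
 * (request + reserved + dirty) is thinner than the watermark. */
static long long has_free_blocks(struct counter *free, struct counter *dirty,
                                 long long root_blocks, long long nblocks)
{
        long long free_blocks = free->approx;
        long long dirty_blocks = dirty->approx;

        if (free_blocks - (nblocks + root_blocks + dirty_blocks) < WATERMARK) {
                free_blocks = free->exact;      /* exact, but costly */
                dirty_blocks = dirty->exact;
        }
        if (free_blocks <= root_blocks + dirty_blocks)
                return 0;                       /* no usable space */
        if (free_blocks - (root_blocks + dirty_blocks) < nblocks)
                return free_blocks - (root_blocks + dirty_blocks);
        return nblocks;                         /* full request fits */
}
```

The design point is that the approximate read can be wrong by up to the per-CPU batch size times the CPU count, so the exact sum is only needed when a wrong answer could matter, i.e. near ENOSPC.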
+3 -3
fs/ext4/bitmap.c
··· 15 15 16 16 static const int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0}; 17 17 18 - unsigned long ext4_count_free (struct buffer_head * map, unsigned int numchars) 18 + unsigned long ext4_count_free(struct buffer_head *map, unsigned int numchars) 19 19 { 20 20 unsigned int i; 21 21 unsigned long sum = 0; 22 22 23 23 if (!map) 24 - return (0); 24 + return 0; 25 25 for (i = 0; i < numchars; i++) 26 26 sum += nibblemap[map->b_data[i] & 0xf] + 27 27 nibblemap[(map->b_data[i] >> 4) & 0xf]; 28 - return (sum); 28 + return sum; 29 29 } 30 30 31 31 #endif /* EXT4FS_DEBUG */
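The ext4_count_free() routine cleaned up in the bitmap.c hunk above counts zero bits (free blocks or inodes) a byte at a time via a 4-bit lookup table rather than bit by bit. A userspace sketch of the same technique — the nibblemap values are copied verbatim from the hunk, while `count_free` and its raw-buffer signature are hypothetical stand-ins for the buffer_head-based kernel function:

```c
#include <stddef.h>

/* Free (zero) bits in each possible 4-bit value, as in fs/ext4/bitmap.c:
 * nibblemap[0x0] = 4 free bits, nibblemap[0xf] = 0 free bits. */
static const int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0};

/* Count zero bits in a raw bitmap buffer: each byte costs two table
 * lookups (low nibble, then high nibble) instead of eight bit tests. */
static unsigned long count_free(const unsigned char *map, size_t numchars)
{
        unsigned long sum = 0;
        size_t i;

        if (!map)
                return 0;
        for (i = 0; i < numchars; i++)
                sum += nibblemap[map[i] & 0xf] +
                       nibblemap[(map[i] >> 4) & 0xf];
        return sum;
}
```

An all-zero byte (0x00) thus contributes 8 free bits, an all-ones byte (0xff) contributes none.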
+35 -29
fs/ext4/dir.c
··· 33 33 }; 34 34 35 35 static int ext4_readdir(struct file *, void *, filldir_t); 36 - static int ext4_dx_readdir(struct file * filp, 37 - void * dirent, filldir_t filldir); 38 - static int ext4_release_dir (struct inode * inode, 39 - struct file * filp); 36 + static int ext4_dx_readdir(struct file *filp, 37 + void *dirent, filldir_t filldir); 38 + static int ext4_release_dir(struct inode *inode, 39 + struct file *filp); 40 40 41 41 const struct file_operations ext4_dir_operations = { 42 42 .llseek = generic_file_llseek, ··· 61 61 } 62 62 63 63 64 - int ext4_check_dir_entry (const char * function, struct inode * dir, 65 - struct ext4_dir_entry_2 * de, 66 - struct buffer_head * bh, 67 - unsigned long offset) 64 + int ext4_check_dir_entry(const char *function, struct inode *dir, 65 + struct ext4_dir_entry_2 *de, 66 + struct buffer_head *bh, 67 + unsigned long offset) 68 68 { 69 - const char * error_msg = NULL; 69 + const char *error_msg = NULL; 70 70 const int rlen = ext4_rec_len_from_disk(de->rec_len); 71 71 72 72 if (rlen < EXT4_DIR_REC_LEN(1)) ··· 82 82 error_msg = "inode out of bounds"; 83 83 84 84 if (error_msg != NULL) 85 - ext4_error (dir->i_sb, function, 85 + ext4_error(dir->i_sb, function, 86 86 "bad entry in directory #%lu: %s - " 87 87 "offset=%lu, inode=%lu, rec_len=%d, name_len=%d", 88 88 dir->i_ino, error_msg, offset, ··· 91 91 return error_msg == NULL ? 
1 : 0; 92 92 } 93 93 94 - static int ext4_readdir(struct file * filp, 95 - void * dirent, filldir_t filldir) 94 + static int ext4_readdir(struct file *filp, 95 + void *dirent, filldir_t filldir) 96 96 { 97 97 int error = 0; 98 98 unsigned long offset; ··· 102 102 int err; 103 103 struct inode *inode = filp->f_path.dentry->d_inode; 104 104 int ret = 0; 105 + int dir_has_error = 0; 105 106 106 107 sb = inode->i_sb; 107 108 ··· 149 148 * of recovering data when there's a bad sector 150 149 */ 151 150 if (!bh) { 152 - ext4_error (sb, "ext4_readdir", 153 - "directory #%lu contains a hole at offset %lu", 154 - inode->i_ino, (unsigned long)filp->f_pos); 151 + if (!dir_has_error) { 152 + ext4_error(sb, __func__, "directory #%lu " 153 + "contains a hole at offset %Lu", 154 + inode->i_ino, 155 + (unsigned long long) filp->f_pos); 156 + dir_has_error = 1; 157 + } 155 158 /* corrupt size? Maybe no more blocks to read */ 156 159 if (filp->f_pos > inode->i_blocks << 9) 157 160 break; ··· 192 187 while (!error && filp->f_pos < inode->i_size 193 188 && offset < sb->s_blocksize) { 194 189 de = (struct ext4_dir_entry_2 *) (bh->b_data + offset); 195 - if (!ext4_check_dir_entry ("ext4_readdir", inode, de, 196 - bh, offset)) { 190 + if (!ext4_check_dir_entry("ext4_readdir", inode, de, 191 + bh, offset)) { 197 192 /* 198 193 * On error, skip the f_pos to the next block 199 194 */ 200 195 filp->f_pos = (filp->f_pos | 201 196 (sb->s_blocksize - 1)) + 1; 202 - brelse (bh); 197 + brelse(bh); 203 198 ret = stored; 204 199 goto out; 205 200 } ··· 223 218 break; 224 219 if (version != filp->f_version) 225 220 goto revalidate; 226 - stored ++; 221 + stored++; 227 222 } 228 223 filp->f_pos += ext4_rec_len_from_disk(de->rec_len); 229 224 } 230 225 offset = 0; 231 - brelse (bh); 226 + brelse(bh); 232 227 } 233 228 out: 234 229 return ret; ··· 295 290 parent = rb_parent(n); 296 291 fname = rb_entry(n, struct fname, rb_hash); 297 292 while (fname) { 298 - struct fname * old = fname; 293 + struct 
fname *old = fname; 299 294 fname = fname->next; 300 - kfree (old); 295 + kfree(old); 301 296 } 302 297 if (!parent) 303 298 root->rb_node = NULL; ··· 336 331 struct ext4_dir_entry_2 *dirent) 337 332 { 338 333 struct rb_node **p, *parent = NULL; 339 - struct fname * fname, *new_fn; 334 + struct fname *fname, *new_fn; 340 335 struct dir_private_info *info; 341 336 int len; 342 337 ··· 393 388 * for all entres on the fname linked list. (Normally there is only 394 389 * one entry on the linked list, unless there are 62 bit hash collisions.) 395 390 */ 396 - static int call_filldir(struct file * filp, void * dirent, 391 + static int call_filldir(struct file *filp, void *dirent, 397 392 filldir_t filldir, struct fname *fname) 398 393 { 399 394 struct dir_private_info *info = filp->private_data; 400 395 loff_t curr_pos; 401 396 struct inode *inode = filp->f_path.dentry->d_inode; 402 - struct super_block * sb; 397 + struct super_block *sb; 403 398 int error; 404 399 405 400 sb = inode->i_sb; 406 401 407 402 if (!fname) { 408 - printk("call_filldir: called with null fname?!?\n"); 403 + printk(KERN_ERR "ext4: call_filldir: called with " 404 + "null fname?!?\n"); 409 405 return 0; 410 406 } 411 407 curr_pos = hash2pos(fname->hash, fname->minor_hash); ··· 425 419 return 0; 426 420 } 427 421 428 - static int ext4_dx_readdir(struct file * filp, 429 - void * dirent, filldir_t filldir) 422 + static int ext4_dx_readdir(struct file *filp, 423 + void *dirent, filldir_t filldir) 430 424 { 431 425 struct dir_private_info *info = filp->private_data; 432 426 struct inode *inode = filp->f_path.dentry->d_inode; ··· 517 511 return 0; 518 512 } 519 513 520 - static int ext4_release_dir (struct inode * inode, struct file * filp) 514 + static int ext4_release_dir(struct inode *inode, struct file *filp) 521 515 { 522 516 if (filp->private_data) 523 517 ext4_htree_free_dir_info(filp->private_data);
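Besides the whitespace cleanups, the behavioral change in the dir.c hunk is the new `dir_has_error` flag in ext4_readdir(): a directory with missing blocks now logs one ext4_error() per readdir pass instead of one per hole, while still skipping each hole and scanning on. A minimal userspace model of that report-once pattern — `scan_dir`, the `blocks` array, and the error counter are illustrative stand-ins for the buffer-head reads and ext4_error():

```c
/* blocks[i] nonzero = directory block i is readable, zero = a hole.
 * Returns the number of readable blocks; *nerrors counts how many
 * times the (simulated) error report fired. */
static int scan_dir(const int *blocks, unsigned long nblocks, int *nerrors)
{
        int dir_has_error = 0;  /* mirrors the new per-readdir flag */
        int good = 0;
        unsigned long blk;

        for (blk = 0; blk < nblocks; blk++) {
                if (!blocks[blk]) {
                        if (!dir_has_error) {
                                (*nerrors)++;   /* ext4_error() fires once */
                                dir_has_error = 1;
                        }
                        continue;               /* skip hole, keep scanning */
                }
                good++;
        }
        return good;
}
```

With the flag scoped to one readdir call, a badly corrupted directory no longer floods the log, but each fresh readdir still reports the corruption once.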
+85 -46
fs/ext4/ext4.h
··· 44 44 #ifdef EXT4FS_DEBUG 45 45 #define ext4_debug(f, a...) \ 46 46 do { \ 47 - printk (KERN_DEBUG "EXT4-fs DEBUG (%s, %d): %s:", \ 47 + printk(KERN_DEBUG "EXT4-fs DEBUG (%s, %d): %s:", \ 48 48 __FILE__, __LINE__, __func__); \ 49 - printk (KERN_DEBUG f, ## a); \ 49 + printk(KERN_DEBUG f, ## a); \ 50 50 } while (0) 51 51 #else 52 52 #define ext4_debug(f, a...) do {} while (0) ··· 128 128 #else 129 129 # define EXT4_BLOCK_SIZE(s) (EXT4_MIN_BLOCK_SIZE << (s)->s_log_block_size) 130 130 #endif 131 - #define EXT4_ADDR_PER_BLOCK(s) (EXT4_BLOCK_SIZE(s) / sizeof (__u32)) 131 + #define EXT4_ADDR_PER_BLOCK(s) (EXT4_BLOCK_SIZE(s) / sizeof(__u32)) 132 132 #ifdef __KERNEL__ 133 133 # define EXT4_BLOCK_SIZE_BITS(s) ((s)->s_blocksize_bits) 134 134 #else ··· 245 245 #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ 246 246 247 247 #define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */ 248 - #define EXT4_FL_USER_MODIFIABLE 0x000380FF /* User modifiable flags */ 248 + #define EXT4_FL_USER_MODIFIABLE 0x000B80FF /* User modifiable flags */ 249 249 250 250 /* 251 251 * Inode dynamic state flags ··· 291 291 #define EXT4_IOC_SETFLAGS FS_IOC_SETFLAGS 292 292 #define EXT4_IOC_GETVERSION _IOR('f', 3, long) 293 293 #define EXT4_IOC_SETVERSION _IOW('f', 4, long) 294 - #define EXT4_IOC_GROUP_EXTEND _IOW('f', 7, unsigned long) 295 - #define EXT4_IOC_GROUP_ADD _IOW('f', 8,struct ext4_new_group_input) 296 294 #define EXT4_IOC_GETVERSION_OLD FS_IOC_GETVERSION 297 295 #define EXT4_IOC_SETVERSION_OLD FS_IOC_SETVERSION 298 296 #ifdef CONFIG_JBD2_DEBUG ··· 298 300 #endif 299 301 #define EXT4_IOC_GETRSVSZ _IOR('f', 5, long) 300 302 #define EXT4_IOC_SETRSVSZ _IOW('f', 6, long) 301 - #define EXT4_IOC_MIGRATE _IO('f', 7) 303 + #define EXT4_IOC_GROUP_EXTEND _IOW('f', 7, unsigned long) 304 + #define EXT4_IOC_GROUP_ADD _IOW('f', 8, struct ext4_new_group_input) 305 + #define EXT4_IOC_MIGRATE _IO('f', 9) 306 + /* note ioctl 11 reserved for filesystem-independent FIEMAP ioctl */ 302 
307 303 308 /* 304 309 * ioctl commands in 32 bit emulation ··· 539 538 #define EXT4_MOUNT_JOURNAL_CHECKSUM 0x800000 /* Journal checksums */ 540 539 #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT 0x1000000 /* Journal Async Commit */ 541 540 #define EXT4_MOUNT_I_VERSION 0x2000000 /* i_version support */ 542 - #define EXT4_MOUNT_MBALLOC 0x4000000 /* Buddy allocation support */ 543 541 #define EXT4_MOUNT_DELALLOC 0x8000000 /* Delalloc support */ 544 542 /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at once */ 545 543 #ifndef _LINUX_EXT2_FS_H ··· 667 667 }; 668 668 669 669 #ifdef __KERNEL__ 670 - static inline struct ext4_sb_info * EXT4_SB(struct super_block *sb) 670 + static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb) 671 671 { 672 672 return sb->s_fs_info; 673 673 } ··· 725 725 */ 726 726 727 727 #define EXT4_HAS_COMPAT_FEATURE(sb,mask) \ 728 - ( EXT4_SB(sb)->s_es->s_feature_compat & cpu_to_le32(mask) ) 728 + (EXT4_SB(sb)->s_es->s_feature_compat & cpu_to_le32(mask)) 729 729 #define EXT4_HAS_RO_COMPAT_FEATURE(sb,mask) \ 730 - ( EXT4_SB(sb)->s_es->s_feature_ro_compat & cpu_to_le32(mask) ) 730 + (EXT4_SB(sb)->s_es->s_feature_ro_compat & cpu_to_le32(mask)) 731 731 #define EXT4_HAS_INCOMPAT_FEATURE(sb,mask) \ 732 - ( EXT4_SB(sb)->s_es->s_feature_incompat & cpu_to_le32(mask) ) 732 + (EXT4_SB(sb)->s_es->s_feature_incompat & cpu_to_le32(mask)) 733 733 #define EXT4_SET_COMPAT_FEATURE(sb,mask) \ 734 734 EXT4_SB(sb)->s_es->s_feature_compat |= cpu_to_le32(mask) 735 735 #define EXT4_SET_RO_COMPAT_FEATURE(sb,mask) \ ··· 788 788 */ 789 789 #define EXT4_DEF_RESUID 0 790 790 #define EXT4_DEF_RESGID 0 791 + 792 + #define EXT4_DEF_INODE_READAHEAD_BLKS 32 791 793 792 794 /* 793 795 * Default mount options ··· 956 954 void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr, 957 955 unsigned long *blockgrpp, ext4_grpblk_t *offsetp); 958 956 957 + extern struct proc_dir_entry *ext4_proc_root; 958 + 959 + #ifdef CONFIG_PROC_FS 960 + 
extern const struct file_operations ext4_ui_proc_fops; 961 + 962 + #define EXT4_PROC_HANDLER(name, var) \ 963 + do { \ 964 + proc = proc_create_data(name, mode, sbi->s_proc, \ 965 + &ext4_ui_proc_fops, &sbi->s_##var); \ 966 + if (proc == NULL) { \ 967 + printk(KERN_ERR "EXT4-fs: can't create %s\n", name); \ 968 + goto err_out; \ 969 + } \ 970 + } while (0) 971 + #else 972 + #define EXT4_PROC_HANDLER(name, var) 973 + #endif 974 + 959 975 /* 960 976 * Function prototypes 961 977 */ ··· 1001 981 extern ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode, 1002 982 ext4_lblk_t iblock, ext4_fsblk_t goal, 1003 983 unsigned long *count, int *errp); 1004 - extern ext4_fsblk_t ext4_old_new_blocks(handle_t *handle, struct inode *inode, 1005 - ext4_fsblk_t goal, unsigned long *count, int *errp); 984 + extern int ext4_claim_free_blocks(struct ext4_sb_info *sbi, s64 nblocks); 1006 985 extern ext4_fsblk_t ext4_has_free_blocks(struct ext4_sb_info *sbi, 1007 - ext4_fsblk_t nblocks); 1008 - extern void ext4_free_blocks (handle_t *handle, struct inode *inode, 986 + s64 nblocks); 987 + extern void ext4_free_blocks(handle_t *handle, struct inode *inode, 1009 988 ext4_fsblk_t block, unsigned long count, int metadata); 1010 - extern void ext4_free_blocks_sb (handle_t *handle, struct super_block *sb, 1011 - ext4_fsblk_t block, unsigned long count, 989 + extern void ext4_free_blocks_sb(handle_t *handle, struct super_block *sb, 990 + ext4_fsblk_t block, unsigned long count, 1012 991 unsigned long *pdquot_freed_blocks); 1013 - extern ext4_fsblk_t ext4_count_free_blocks (struct super_block *); 1014 - extern void ext4_check_blocks_bitmap (struct super_block *); 992 + extern ext4_fsblk_t ext4_count_free_blocks(struct super_block *); 993 + extern void ext4_check_blocks_bitmap(struct super_block *); 1015 994 extern struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb, 1016 995 ext4_group_t block_group, 1017 996 struct buffer_head ** bh); 1018 997 extern int 
ext4_should_retry_alloc(struct super_block *sb, int *retries); 1019 - extern void ext4_init_block_alloc_info(struct inode *); 1020 - extern void ext4_rsv_window_add(struct super_block *sb, struct ext4_reserve_window_node *rsv); 1021 998 1022 999 /* dir.c */ 1023 1000 extern int ext4_check_dir_entry(const char *, struct inode *, ··· 1026 1009 extern void ext4_htree_free_dir_info(struct dir_private_info *p); 1027 1010 1028 1011 /* fsync.c */ 1029 - extern int ext4_sync_file (struct file *, struct dentry *, int); 1012 + extern int ext4_sync_file(struct file *, struct dentry *, int); 1030 1013 1031 1014 /* hash.c */ 1032 1015 extern int ext4fs_dirhash(const char *name, int len, struct 1033 1016 dx_hash_info *hinfo); 1034 1017 1035 1018 /* ialloc.c */ 1036 - extern struct inode * ext4_new_inode (handle_t *, struct inode *, int); 1037 - extern void ext4_free_inode (handle_t *, struct inode *); 1038 - extern struct inode * ext4_orphan_get (struct super_block *, unsigned long); 1039 - extern unsigned long ext4_count_free_inodes (struct super_block *); 1040 - extern unsigned long ext4_count_dirs (struct super_block *); 1041 - extern void ext4_check_inodes_bitmap (struct super_block *); 1042 - extern unsigned long ext4_count_free (struct buffer_head *, unsigned); 1019 + extern struct inode * ext4_new_inode(handle_t *, struct inode *, int); 1020 + extern void ext4_free_inode(handle_t *, struct inode *); 1021 + extern struct inode * ext4_orphan_get(struct super_block *, unsigned long); 1022 + extern unsigned long ext4_count_free_inodes(struct super_block *); 1023 + extern unsigned long ext4_count_dirs(struct super_block *); 1024 + extern void ext4_check_inodes_bitmap(struct super_block *); 1025 + extern unsigned long ext4_count_free(struct buffer_head *, unsigned); 1043 1026 1044 1027 /* mballoc.c */ 1045 1028 extern long ext4_mb_stats; ··· 1049 1032 extern ext4_fsblk_t ext4_mb_new_blocks(handle_t *, 1050 1033 struct ext4_allocation_request *, int *); 1051 1034 extern int 
ext4_mb_reserve_blocks(struct super_block *, int); 1052 - extern void ext4_mb_discard_inode_preallocations(struct inode *); 1035 + extern void ext4_discard_preallocations(struct inode *); 1053 1036 extern int __init init_ext4_mballoc(void); 1054 1037 extern void exit_ext4_mballoc(void); 1055 1038 extern void ext4_mb_free_blocks(handle_t *, struct inode *, ··· 1067 1050 ext4_lblk_t, int, int *); 1068 1051 struct buffer_head *ext4_bread(handle_t *, struct inode *, 1069 1052 ext4_lblk_t, int, int *); 1053 + int ext4_get_block(struct inode *inode, sector_t iblock, 1054 + struct buffer_head *bh_result, int create); 1070 1055 int ext4_get_blocks_handle(handle_t *handle, struct inode *inode, 1071 1056 ext4_lblk_t iblock, unsigned long maxblocks, 1072 1057 struct buffer_head *bh_result, 1073 1058 int create, int extend_disksize); 1074 1059 1075 1060 extern struct inode *ext4_iget(struct super_block *, unsigned long); 1076 - extern int ext4_write_inode (struct inode *, int); 1077 - extern int ext4_setattr (struct dentry *, struct iattr *); 1061 + extern int ext4_write_inode(struct inode *, int); 1062 + extern int ext4_setattr(struct dentry *, struct iattr *); 1078 1063 extern int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry, 1079 1064 struct kstat *stat); 1080 - extern void ext4_delete_inode (struct inode *); 1081 - extern int ext4_sync_inode (handle_t *, struct inode *); 1082 - extern void ext4_discard_reservation (struct inode *); 1065 + extern void ext4_delete_inode(struct inode *); 1066 + extern int ext4_sync_inode(handle_t *, struct inode *); 1083 1067 extern void ext4_dirty_inode(struct inode *); 1084 1068 extern int ext4_change_inode_journal_flag(struct inode *, int); 1085 1069 extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *); 1086 1070 extern int ext4_can_truncate(struct inode *inode); 1087 - extern void ext4_truncate (struct inode *); 1071 + extern void ext4_truncate(struct inode *); 1088 1072 extern void ext4_set_inode_flags(struct 
inode *); 1089 1073 extern void ext4_get_inode_flags(struct ext4_inode_info *); 1090 1074 extern void ext4_set_aops(struct inode *inode); ··· 1098 1080 1099 1081 /* ioctl.c */ 1100 1082 extern long ext4_ioctl(struct file *, unsigned int, unsigned long); 1101 - extern long ext4_compat_ioctl (struct file *, unsigned int, unsigned long); 1083 + extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long); 1102 1084 1103 1085 /* migrate.c */ 1104 - extern int ext4_ext_migrate(struct inode *, struct file *, unsigned int, 1105 - unsigned long); 1086 + extern int ext4_ext_migrate(struct inode *); 1106 1087 /* namei.c */ 1107 1088 extern int ext4_orphan_add(handle_t *, struct inode *); 1108 1089 extern int ext4_orphan_del(handle_t *, struct inode *); ··· 1116 1099 ext4_fsblk_t n_blocks_count); 1117 1100 1118 1101 /* super.c */ 1119 - extern void ext4_error (struct super_block *, const char *, const char *, ...) 1102 + extern void ext4_error(struct super_block *, const char *, const char *, ...) 1120 1103 __attribute__ ((format (printf, 3, 4))); 1121 - extern void __ext4_std_error (struct super_block *, const char *, int); 1122 - extern void ext4_abort (struct super_block *, const char *, const char *, ...) 1104 + extern void __ext4_std_error(struct super_block *, const char *, int); 1105 + extern void ext4_abort(struct super_block *, const char *, const char *, ...) 1123 1106 __attribute__ ((format (printf, 3, 4))); 1124 - extern void ext4_warning (struct super_block *, const char *, const char *, ...) 1107 + extern void ext4_warning(struct super_block *, const char *, const char *, ...) 
1125 1108 __attribute__ ((format (printf, 3, 4))); 1126 - extern void ext4_update_dynamic_rev (struct super_block *sb); 1109 + extern void ext4_update_dynamic_rev(struct super_block *sb); 1127 1110 extern int ext4_update_compat_feature(handle_t *handle, struct super_block *sb, 1128 1111 __u32 compat); 1129 1112 extern int ext4_update_rocompat_feature(handle_t *handle, ··· 1196 1179 1197 1180 static inline 1198 1181 struct ext4_group_info *ext4_get_group_info(struct super_block *sb, 1199 - ext4_group_t group) 1182 + ext4_group_t group) 1200 1183 { 1201 1184 struct ext4_group_info ***grp_info; 1202 1185 long indexv, indexh; ··· 1223 1206 if ((errno)) \ 1224 1207 __ext4_std_error((sb), __func__, (errno)); \ 1225 1208 } while (0) 1209 + 1210 + #ifdef CONFIG_SMP 1211 + /* Each CPU can accumulate FBC_BATCH blocks in their local 1212 + * counters. So we need to make sure we have free blocks more 1213 + * than FBC_BATCH * nr_cpu_ids. Also add a window of 4 times. 1214 + */ 1215 + #define EXT4_FREEBLOCKS_WATERMARK (4 * (FBC_BATCH * nr_cpu_ids)) 1216 + #else 1217 + #define EXT4_FREEBLOCKS_WATERMARK 0 1218 + #endif 1219 + 1220 + static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize) 1221 + { 1222 + /* 1223 + * XXX: replace with spinlock if seen contended -bzzz 1224 + */ 1225 + down_write(&EXT4_I(inode)->i_data_sem); 1226 + if (newsize > EXT4_I(inode)->i_disksize) 1227 + EXT4_I(inode)->i_disksize = newsize; 1228 + up_write(&EXT4_I(inode)->i_data_sem); 1229 + return ; 1230 + } 1226 1231 1227 1232 /* 1228 1233 * Inodes and files operations
+15
fs/ext4/ext4_extents.h
··· 124 124 #define EXT4_EXT_CACHE_GAP 1 125 125 #define EXT4_EXT_CACHE_EXTENT 2 126 126 127 + /* 128 + * to be called by ext4_ext_walk_space() 129 + * negative retcode - error 130 + * positive retcode - signal for ext4_ext_walk_space(), see below 131 + * callback must return valid extent (passed or newly created) 132 + */ 133 + typedef int (*ext_prepare_callback)(struct inode *, struct ext4_ext_path *, 134 + struct ext4_ext_cache *, 135 + struct ext4_extent *, void *); 136 + 137 + #define EXT_CONTINUE 0 138 + #define EXT_BREAK 1 139 + #define EXT_REPEAT 2 127 140 128 141 #define EXT_MAX_BLOCK 0xffffffff 129 142 ··· 237 224 struct ext4_extent *); 238 225 extern unsigned int ext4_ext_check_overlap(struct inode *, struct ext4_extent *, struct ext4_ext_path *); 239 226 extern int ext4_ext_insert_extent(handle_t *, struct inode *, struct ext4_ext_path *, struct ext4_extent *); 227 + extern int ext4_ext_walk_space(struct inode *, ext4_lblk_t, ext4_lblk_t, 228 + ext_prepare_callback, void *); 240 229 extern struct ext4_ext_path *ext4_ext_find_extent(struct inode *, ext4_lblk_t, 241 230 struct ext4_ext_path *); 242 231 extern int ext4_ext_search_left(struct inode *, struct ext4_ext_path *,
+2 -37
fs/ext4/ext4_i.h
··· 33 33 /* data type for block group number */ 34 34 typedef unsigned long ext4_group_t; 35 35 36 - struct ext4_reserve_window { 37 - ext4_fsblk_t _rsv_start; /* First byte reserved */ 38 - ext4_fsblk_t _rsv_end; /* Last byte reserved or 0 */ 39 - }; 40 - 41 - struct ext4_reserve_window_node { 42 - struct rb_node rsv_node; 43 - __u32 rsv_goal_size; 44 - __u32 rsv_alloc_hit; 45 - struct ext4_reserve_window rsv_window; 46 - }; 47 - 48 - struct ext4_block_alloc_info { 49 - /* information about reservation window */ 50 - struct ext4_reserve_window_node rsv_window_node; 51 - /* 52 - * was i_next_alloc_block in ext4_inode_info 53 - * is the logical (file-relative) number of the 54 - * most-recently-allocated block in this file. 55 - * We use this for detecting linearly ascending allocation requests. 56 - */ 57 - ext4_lblk_t last_alloc_logical_block; 58 - /* 59 - * Was i_next_alloc_goal in ext4_inode_info 60 - * is the *physical* companion to i_next_alloc_block. 61 - * it the physical block number of the block which was most-recentl 62 - * allocated to this file. This give us the goal (target) for the next 63 - * allocation when we detect linearly ascending requests. 64 - */ 65 - ext4_fsblk_t last_alloc_physical_block; 66 - }; 67 - 68 36 #define rsv_start rsv_window._rsv_start 69 37 #define rsv_end rsv_window._rsv_end 70 38 ··· 65 97 ext4_group_t i_block_group; 66 98 __u32 i_state; /* Dynamic state flags for ext4 */ 67 99 68 - /* block reservation info */ 69 - struct ext4_block_alloc_info *i_block_alloc_info; 70 - 71 100 ext4_lblk_t i_dir_start_lookup; 72 - #ifdef CONFIG_EXT4DEV_FS_XATTR 101 + #ifdef CONFIG_EXT4_FS_XATTR 73 102 /* 74 103 * Extended attributes can be read independently of the main file 75 104 * data. 
Taking i_mutex even when reading would cause contention ··· 76 111 */ 77 112 struct rw_semaphore xattr_sem; 78 113 #endif 79 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 114 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 80 115 struct posix_acl *i_acl; 81 116 struct posix_acl *i_default_acl; 82 117 #endif
+13 -12
fs/ext4/ext4_sb.h
··· 40 40 unsigned long s_blocks_last; /* Last seen block count */ 41 41 loff_t s_bitmap_maxbytes; /* max bytes for bitmap files */ 42 42 struct buffer_head * s_sbh; /* Buffer containing the super block */ 43 - struct ext4_super_block * s_es; /* Pointer to the super block in the buffer */ 44 - struct buffer_head ** s_group_desc; 43 + struct ext4_super_block *s_es; /* Pointer to the super block in the buffer */ 44 + struct buffer_head **s_group_desc; 45 45 unsigned long s_mount_opt; 46 46 ext4_fsblk_t s_sb_block; 47 47 uid_t s_resuid; ··· 52 52 int s_desc_per_block_bits; 53 53 int s_inode_size; 54 54 int s_first_ino; 55 + unsigned int s_inode_readahead_blks; 55 56 spinlock_t s_next_gen_lock; 56 57 u32 s_next_generation; 57 58 u32 s_hash_seed[4]; ··· 60 59 struct percpu_counter s_freeblocks_counter; 61 60 struct percpu_counter s_freeinodes_counter; 62 61 struct percpu_counter s_dirs_counter; 62 + struct percpu_counter s_dirtyblocks_counter; 63 63 struct blockgroup_lock s_blockgroup_lock; 64 + struct proc_dir_entry *s_proc; 64 65 65 66 /* root of the per fs reservation window tree */ 66 67 spinlock_t s_rsv_window_lock; 67 68 struct rb_root s_rsv_window_root; 68 - struct ext4_reserve_window_node s_rsv_window_head; 69 69 70 70 /* Journaling */ 71 - struct inode * s_journal_inode; 72 - struct journal_s * s_journal; 71 + struct inode *s_journal_inode; 72 + struct journal_s *s_journal; 73 73 struct list_head s_orphan; 74 74 unsigned long s_commit_interval; 75 75 struct block_device *journal_bdev; ··· 108 106 109 107 /* tunables */ 110 108 unsigned long s_stripe; 111 - unsigned long s_mb_stream_request; 112 - unsigned long s_mb_max_to_scan; 113 - unsigned long s_mb_min_to_scan; 114 - unsigned long s_mb_stats; 115 - unsigned long s_mb_order2_reqs; 116 - unsigned long s_mb_group_prealloc; 109 + unsigned int s_mb_stream_request; 110 + unsigned int s_mb_max_to_scan; 111 + unsigned int s_mb_min_to_scan; 112 + unsigned int s_mb_stats; 113 + unsigned int s_mb_order2_reqs; 114 + 
unsigned int s_mb_group_prealloc; 117 115 /* where last allocation was done - for stream allocation */ 118 116 unsigned long s_mb_last_group; 119 117 unsigned long s_mb_last_start; ··· 123 121 int s_mb_history_cur; 124 122 int s_mb_history_max; 125 123 int s_mb_history_num; 126 - struct proc_dir_entry *s_mb_proc; 127 124 spinlock_t s_mb_history_lock; 128 125 int s_mb_history_filter; 129 126
+264 -17
fs/ext4/extents.c
··· 40 40 #include <linux/slab.h> 41 41 #include <linux/falloc.h> 42 42 #include <asm/uaccess.h> 43 + #include <linux/fiemap.h> 43 44 #include "ext4_jbd2.h" 44 45 #include "ext4_extents.h" 45 46 ··· 384 383 ext_debug("\n"); 385 384 } 386 385 #else 387 - #define ext4_ext_show_path(inode,path) 388 - #define ext4_ext_show_leaf(inode,path) 386 + #define ext4_ext_show_path(inode, path) 387 + #define ext4_ext_show_leaf(inode, path) 389 388 #endif 390 389 391 390 void ext4_ext_drop_refs(struct ext4_ext_path *path) ··· 441 440 for (k = 0; k < le16_to_cpu(eh->eh_entries); k++, ix++) { 442 441 if (k != 0 && 443 442 le32_to_cpu(ix->ei_block) <= le32_to_cpu(ix[-1].ei_block)) { 444 - printk("k=%d, ix=0x%p, first=0x%p\n", k, 445 - ix, EXT_FIRST_INDEX(eh)); 446 - printk("%u <= %u\n", 443 + printk(KERN_DEBUG "k=%d, ix=0x%p, " 444 + "first=0x%p\n", k, 445 + ix, EXT_FIRST_INDEX(eh)); 446 + printk(KERN_DEBUG "%u <= %u\n", 447 447 le32_to_cpu(ix->ei_block), 448 448 le32_to_cpu(ix[-1].ei_block)); 449 449 } ··· 1477 1475 struct ext4_ext_path *path, 1478 1476 struct ext4_extent *newext) 1479 1477 { 1480 - struct ext4_extent_header * eh; 1478 + struct ext4_extent_header *eh; 1481 1479 struct ext4_extent *ex, *fex; 1482 1480 struct ext4_extent *nearex; /* nearest extent */ 1483 1481 struct ext4_ext_path *npath = NULL; ··· 1624 1622 } 1625 1623 ext4_ext_tree_changed(inode); 1626 1624 ext4_ext_invalidate_cache(inode); 1625 + return err; 1626 + } 1627 + 1628 + int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block, 1629 + ext4_lblk_t num, ext_prepare_callback func, 1630 + void *cbdata) 1631 + { 1632 + struct ext4_ext_path *path = NULL; 1633 + struct ext4_ext_cache cbex; 1634 + struct ext4_extent *ex; 1635 + ext4_lblk_t next, start = 0, end = 0; 1636 + ext4_lblk_t last = block + num; 1637 + int depth, exists, err = 0; 1638 + 1639 + BUG_ON(func == NULL); 1640 + BUG_ON(inode == NULL); 1641 + 1642 + while (block < last && block != EXT_MAX_BLOCK) { 1643 + num = last - block; 1644 + /* find 
extent for this block */ 1645 + path = ext4_ext_find_extent(inode, block, path); 1646 + if (IS_ERR(path)) { 1647 + err = PTR_ERR(path); 1648 + path = NULL; 1649 + break; 1650 + } 1651 + 1652 + depth = ext_depth(inode); 1653 + BUG_ON(path[depth].p_hdr == NULL); 1654 + ex = path[depth].p_ext; 1655 + next = ext4_ext_next_allocated_block(path); 1656 + 1657 + exists = 0; 1658 + if (!ex) { 1659 + /* there is no extent yet, so try to allocate 1660 + * all requested space */ 1661 + start = block; 1662 + end = block + num; 1663 + } else if (le32_to_cpu(ex->ee_block) > block) { 1664 + /* need to allocate space before found extent */ 1665 + start = block; 1666 + end = le32_to_cpu(ex->ee_block); 1667 + if (block + num < end) 1668 + end = block + num; 1669 + } else if (block >= le32_to_cpu(ex->ee_block) 1670 + + ext4_ext_get_actual_len(ex)) { 1671 + /* need to allocate space after found extent */ 1672 + start = block; 1673 + end = block + num; 1674 + if (end >= next) 1675 + end = next; 1676 + } else if (block >= le32_to_cpu(ex->ee_block)) { 1677 + /* 1678 + * some part of requested space is covered 1679 + * by found extent 1680 + */ 1681 + start = block; 1682 + end = le32_to_cpu(ex->ee_block) 1683 + + ext4_ext_get_actual_len(ex); 1684 + if (block + num < end) 1685 + end = block + num; 1686 + exists = 1; 1687 + } else { 1688 + BUG(); 1689 + } 1690 + BUG_ON(end <= start); 1691 + 1692 + if (!exists) { 1693 + cbex.ec_block = start; 1694 + cbex.ec_len = end - start; 1695 + cbex.ec_start = 0; 1696 + cbex.ec_type = EXT4_EXT_CACHE_GAP; 1697 + } else { 1698 + cbex.ec_block = le32_to_cpu(ex->ee_block); 1699 + cbex.ec_len = ext4_ext_get_actual_len(ex); 1700 + cbex.ec_start = ext_pblock(ex); 1701 + cbex.ec_type = EXT4_EXT_CACHE_EXTENT; 1702 + } 1703 + 1704 + BUG_ON(cbex.ec_len == 0); 1705 + err = func(inode, path, &cbex, ex, cbdata); 1706 + ext4_ext_drop_refs(path); 1707 + 1708 + if (err < 0) 1709 + break; 1710 + 1711 + if (err == EXT_REPEAT) 1712 + continue; 1713 + else if (err == 
EXT_BREAK) { 1714 + err = 0; 1715 + break; 1716 + } 1717 + 1718 + if (ext_depth(inode) != depth) { 1719 + /* depth was changed. we have to realloc path */ 1720 + kfree(path); 1721 + path = NULL; 1722 + } 1723 + 1724 + block = cbex.ec_block + cbex.ec_len; 1725 + } 1726 + 1727 + if (path) { 1728 + ext4_ext_drop_refs(path); 1729 + kfree(path); 1730 + } 1731 + 1627 1732 return err; 1628 1733 } 1629 1734 ··· 2251 2142 */ 2252 2143 2253 2144 if (test_opt(sb, EXTENTS)) { 2254 - printk("EXT4-fs: file extents enabled"); 2145 + printk(KERN_INFO "EXT4-fs: file extents enabled"); 2255 2146 #ifdef AGGRESSIVE_TEST 2256 2147 printk(", aggressive tests"); 2257 2148 #endif ··· 2805 2696 goto out2; 2806 2697 } 2807 2698 /* 2808 - * Okay, we need to do block allocation. Lazily initialize the block 2809 - * allocation info here if necessary. 2699 + * Okay, we need to do block allocation. 2810 2700 */ 2811 - if (S_ISREG(inode->i_mode) && (!EXT4_I(inode)->i_block_alloc_info)) 2812 - ext4_init_block_alloc_info(inode); 2813 2701 2814 2702 /* find neighbour allocated blocks */ 2815 2703 ar.lleft = iblock; ··· 2866 2760 /* free data blocks we just allocated */ 2867 2761 /* not a good idea to call discard here directly, 2868 2762 * but otherwise we'd need to call it every free() */ 2869 - ext4_mb_discard_inode_preallocations(inode); 2763 + ext4_discard_preallocations(inode); 2870 2764 ext4_free_blocks(handle, inode, ext_pblock(&newex), 2871 2765 ext4_ext_get_actual_len(&newex), 0); 2872 2766 goto out2; ··· 2930 2824 down_write(&EXT4_I(inode)->i_data_sem); 2931 2825 ext4_ext_invalidate_cache(inode); 2932 2826 2933 - ext4_discard_reservation(inode); 2827 + ext4_discard_preallocations(inode); 2934 2828 2935 2829 /* 2936 2830 * TODO: optimization is possible here. ··· 2983 2877 * Update only when preallocation was requested beyond 2984 2878 * the file size. 
2985 2879 */ 2986 - if (!(mode & FALLOC_FL_KEEP_SIZE) && 2987 - new_size > i_size_read(inode)) { 2988 - i_size_write(inode, new_size); 2989 - EXT4_I(inode)->i_disksize = new_size; 2880 + if (!(mode & FALLOC_FL_KEEP_SIZE)) { 2881 + if (new_size > i_size_read(inode)) 2882 + i_size_write(inode, new_size); 2883 + if (new_size > EXT4_I(inode)->i_disksize) 2884 + ext4_update_i_disksize(inode, new_size); 2990 2885 } 2991 2886 2992 2887 } ··· 3079 2972 mutex_unlock(&inode->i_mutex); 3080 2973 return ret > 0 ? ret2 : ret; 3081 2974 } 2975 + 2976 + /* 2977 + * Callback function called for each extent to gather FIEMAP information. 2978 + */ 2979 + int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path, 2980 + struct ext4_ext_cache *newex, struct ext4_extent *ex, 2981 + void *data) 2982 + { 2983 + struct fiemap_extent_info *fieinfo = data; 2984 + unsigned long blksize_bits = inode->i_sb->s_blocksize_bits; 2985 + __u64 logical; 2986 + __u64 physical; 2987 + __u64 length; 2988 + __u32 flags = 0; 2989 + int error; 2990 + 2991 + logical = (__u64)newex->ec_block << blksize_bits; 2992 + 2993 + if (newex->ec_type == EXT4_EXT_CACHE_GAP) { 2994 + pgoff_t offset; 2995 + struct page *page; 2996 + struct buffer_head *bh = NULL; 2997 + 2998 + offset = logical >> PAGE_SHIFT; 2999 + page = find_get_page(inode->i_mapping, offset); 3000 + if (!page || !page_has_buffers(page)) 3001 + return EXT_CONTINUE; 3002 + 3003 + bh = page_buffers(page); 3004 + 3005 + if (!bh) 3006 + return EXT_CONTINUE; 3007 + 3008 + if (buffer_delay(bh)) { 3009 + flags |= FIEMAP_EXTENT_DELALLOC; 3010 + page_cache_release(page); 3011 + } else { 3012 + page_cache_release(page); 3013 + return EXT_CONTINUE; 3014 + } 3015 + } 3016 + 3017 + physical = (__u64)newex->ec_start << blksize_bits; 3018 + length = (__u64)newex->ec_len << blksize_bits; 3019 + 3020 + if (ex && ext4_ext_is_uninitialized(ex)) 3021 + flags |= FIEMAP_EXTENT_UNWRITTEN; 3022 + 3023 + /* 3024 + * If this extent reaches EXT_MAX_BLOCK, it must be 
last. 3025 + * 3026 + * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCK, 3027 + * this also indicates no more allocated blocks. 3028 + * 3029 + * XXX this might miss a single-block extent at EXT_MAX_BLOCK 3030 + */ 3031 + if (logical + length - 1 == EXT_MAX_BLOCK || 3032 + ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK) 3033 + flags |= FIEMAP_EXTENT_LAST; 3034 + 3035 + error = fiemap_fill_next_extent(fieinfo, logical, physical, 3036 + length, flags); 3037 + if (error < 0) 3038 + return error; 3039 + if (error == 1) 3040 + return EXT_BREAK; 3041 + 3042 + return EXT_CONTINUE; 3043 + } 3044 + 3045 + /* fiemap flags we can handle specified here */ 3046 + #define EXT4_FIEMAP_FLAGS (FIEMAP_FLAG_SYNC|FIEMAP_FLAG_XATTR) 3047 + 3048 + int ext4_xattr_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo) 3049 + { 3050 + __u64 physical = 0; 3051 + __u64 length; 3052 + __u32 flags = FIEMAP_EXTENT_LAST; 3053 + int blockbits = inode->i_sb->s_blocksize_bits; 3054 + int error = 0; 3055 + 3056 + /* in-inode? */ 3057 + if (EXT4_I(inode)->i_state & EXT4_STATE_XATTR) { 3058 + struct ext4_iloc iloc; 3059 + int offset; /* offset of xattr in inode */ 3060 + 3061 + error = ext4_get_inode_loc(inode, &iloc); 3062 + if (error) 3063 + return error; 3064 + physical = iloc.bh->b_blocknr << blockbits; 3065 + offset = EXT4_GOOD_OLD_INODE_SIZE + 3066 + EXT4_I(inode)->i_extra_isize; 3067 + physical += offset; 3068 + length = EXT4_SB(inode->i_sb)->s_inode_size - offset; 3069 + flags |= FIEMAP_EXTENT_DATA_INLINE; 3070 + } else { /* external block */ 3071 + physical = EXT4_I(inode)->i_file_acl << blockbits; 3072 + length = inode->i_sb->s_blocksize; 3073 + } 3074 + 3075 + if (physical) 3076 + error = fiemap_fill_next_extent(fieinfo, 0, physical, 3077 + length, flags); 3078 + return (error < 0 ? 
error : 0); 3079 + } 3080 + 3081 + int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 3082 + __u64 start, __u64 len) 3083 + { 3084 + ext4_lblk_t start_blk; 3085 + ext4_lblk_t len_blks; 3086 + int error = 0; 3087 + 3088 + /* fallback to generic here if not in extents fmt */ 3089 + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) 3090 + return generic_block_fiemap(inode, fieinfo, start, len, 3091 + ext4_get_block); 3092 + 3093 + if (fiemap_check_flags(fieinfo, EXT4_FIEMAP_FLAGS)) 3094 + return -EBADR; 3095 + 3096 + if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) { 3097 + error = ext4_xattr_fiemap(inode, fieinfo); 3098 + } else { 3099 + start_blk = start >> inode->i_sb->s_blocksize_bits; 3100 + len_blks = len >> inode->i_sb->s_blocksize_bits; 3101 + 3102 + /* 3103 + * Walk the extent tree gathering extent information. 3104 + * ext4_ext_fiemap_cb will push extents back to user. 3105 + */ 3106 + down_write(&EXT4_I(inode)->i_data_sem); 3107 + error = ext4_ext_walk_space(inode, start_blk, len_blks, 3108 + ext4_ext_fiemap_cb, fieinfo); 3109 + up_write(&EXT4_I(inode)->i_data_sem); 3110 + } 3111 + 3112 + return error; 3113 + } 3114 +
+7 -3
fs/ext4/file.c
··· 31 31 * from ext4_file_open: open gets called at every open, but release 32 32 * gets called only when /all/ the files are closed. 33 33 */ 34 - static int ext4_release_file (struct inode * inode, struct file * filp) 34 + static int ext4_release_file(struct inode *inode, struct file *filp) 35 35 { 36 36 /* if we are the last writer on the inode, drop the block reservation */ 37 37 if ((filp->f_mode & FMODE_WRITE) && 38 38 (atomic_read(&inode->i_writecount) == 1)) 39 39 { 40 40 down_write(&EXT4_I(inode)->i_data_sem); 41 - ext4_discard_reservation(inode); 41 + ext4_discard_preallocations(inode); 42 42 up_write(&EXT4_I(inode)->i_data_sem); 43 43 } 44 44 if (is_dx(inode) && filp->private_data) ··· 140 140 return 0; 141 141 } 142 142 143 + extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 144 + __u64 start, __u64 len); 145 + 143 146 const struct file_operations ext4_file_operations = { 144 147 .llseek = generic_file_llseek, 145 148 .read = do_sync_read, ··· 165 162 .truncate = ext4_truncate, 166 163 .setattr = ext4_setattr, 167 164 .getattr = ext4_getattr, 168 - #ifdef CONFIG_EXT4DEV_FS_XATTR 165 + #ifdef CONFIG_EXT4_FS_XATTR 169 166 .setxattr = generic_setxattr, 170 167 .getxattr = generic_getxattr, 171 168 .listxattr = ext4_listxattr, ··· 173 170 #endif 174 171 .permission = ext4_permission, 175 172 .fallocate = ext4_fallocate, 173 + .fiemap = ext4_fiemap, 176 174 }; 177 175
+6 -1
fs/ext4/fsync.c
··· 28 28 #include <linux/writeback.h> 29 29 #include <linux/jbd2.h> 30 30 #include <linux/blkdev.h> 31 + #include <linux/marker.h> 31 32 #include "ext4.h" 32 33 #include "ext4_jbd2.h" 33 34 ··· 44 43 * inode to disk. 45 44 */ 46 45 47 - int ext4_sync_file(struct file * file, struct dentry *dentry, int datasync) 46 + int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) 48 47 { 49 48 struct inode *inode = dentry->d_inode; 50 49 journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; 51 50 int ret = 0; 52 51 53 52 J_ASSERT(ext4_journal_current_handle() == NULL); 53 + 54 + trace_mark(ext4_sync_file, "dev %s datasync %d ino %ld parent %ld", 55 + inode->i_sb->s_id, datasync, inode->i_ino, 56 + dentry->d_parent->d_inode->i_ino); 54 57 55 58 /* 56 59 * data=writeback:
+4 -4
fs/ext4/hash.c
··· 27 27 sum += DELTA; 28 28 b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b); 29 29 b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d); 30 - } while(--n); 30 + } while (--n); 31 31 32 32 buf[0] += b0; 33 33 buf[1] += b1; ··· 35 35 36 36 37 37 /* The old legacy hash */ 38 - static __u32 dx_hack_hash (const char *name, int len) 38 + static __u32 dx_hack_hash(const char *name, int len) 39 39 { 40 40 __u32 hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9; 41 41 while (len--) { ··· 59 59 val = pad; 60 60 if (len > num*4) 61 61 len = num * 4; 62 - for (i=0; i < len; i++) { 62 + for (i = 0; i < len; i++) { 63 63 if ((i % 4) == 0) 64 64 val = pad; 65 65 val = msg[i] + (val << 8); ··· 104 104 105 105 /* Check to see if the seed is all zero's */ 106 106 if (hinfo->seed) { 107 - for (i=0; i < 4; i++) { 107 + for (i = 0; i < 4; i++) { 108 108 if (hinfo->seed[i]) 109 109 break; 110 110 }
+37 -34
fs/ext4/ialloc.c
··· 115 115 block_group, bitmap_blk); 116 116 return NULL; 117 117 } 118 - if (bh_uptodate_or_lock(bh)) 118 + if (buffer_uptodate(bh) && 119 + !(desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT))) 119 120 return bh; 120 121 122 + lock_buffer(bh); 121 123 spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group)); 122 124 if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) { 123 125 ext4_init_inode_bitmap(sb, bh, block_group, desc); ··· 156 154 * though), and then we'd have two inodes sharing the 157 155 * same inode number and space on the harddisk. 158 156 */ 159 - void ext4_free_inode (handle_t *handle, struct inode * inode) 157 + void ext4_free_inode(handle_t *handle, struct inode *inode) 160 158 { 161 - struct super_block * sb = inode->i_sb; 159 + struct super_block *sb = inode->i_sb; 162 160 int is_directory; 163 161 unsigned long ino; 164 162 struct buffer_head *bitmap_bh = NULL; 165 163 struct buffer_head *bh2; 166 164 ext4_group_t block_group; 167 165 unsigned long bit; 168 - struct ext4_group_desc * gdp; 169 - struct ext4_super_block * es; 166 + struct ext4_group_desc *gdp; 167 + struct ext4_super_block *es; 170 168 struct ext4_sb_info *sbi; 171 169 int fatal = 0, err; 172 170 ext4_group_t flex_group; 173 171 174 172 if (atomic_read(&inode->i_count) > 1) { 175 - printk ("ext4_free_inode: inode has count=%d\n", 176 - atomic_read(&inode->i_count)); 173 + printk(KERN_ERR "ext4_free_inode: inode has count=%d\n", 174 + atomic_read(&inode->i_count)); 177 175 return; 178 176 } 179 177 if (inode->i_nlink) { 180 - printk ("ext4_free_inode: inode has nlink=%d\n", 181 - inode->i_nlink); 178 + printk(KERN_ERR "ext4_free_inode: inode has nlink=%d\n", 179 + inode->i_nlink); 182 180 return; 183 181 } 184 182 if (!sb) { 185 - printk("ext4_free_inode: inode on nonexistent device\n"); 183 + printk(KERN_ERR "ext4_free_inode: inode on " 184 + "nonexistent device\n"); 186 185 return; 187 186 } 188 187 sbi = EXT4_SB(sb); 189 188 190 189 ino = inode->i_ino; 191 - ext4_debug ("freeing 
inode %lu\n", ino); 190 + ext4_debug("freeing inode %lu\n", ino); 192 191 193 192 /* 194 193 * Note: we must free any quota before locking the superblock, ··· 203 200 is_directory = S_ISDIR(inode->i_mode); 204 201 205 202 /* Do this BEFORE marking the inode not in use or returning an error */ 206 - clear_inode (inode); 203 + clear_inode(inode); 207 204 208 205 es = EXT4_SB(sb)->s_es; 209 206 if (ino < EXT4_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) { 210 - ext4_error (sb, "ext4_free_inode", 211 - "reserved or nonexistent inode %lu", ino); 207 + ext4_error(sb, "ext4_free_inode", 208 + "reserved or nonexistent inode %lu", ino); 212 209 goto error_return; 213 210 } 214 211 block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb); ··· 225 222 /* Ok, now we can actually update the inode bitmaps.. */ 226 223 if (!ext4_clear_bit_atomic(sb_bgl_lock(sbi, block_group), 227 224 bit, bitmap_bh->b_data)) 228 - ext4_error (sb, "ext4_free_inode", 229 - "bit already cleared for inode %lu", ino); 225 + ext4_error(sb, "ext4_free_inode", 226 + "bit already cleared for inode %lu", ino); 230 227 else { 231 - gdp = ext4_get_group_desc (sb, block_group, &bh2); 228 + gdp = ext4_get_group_desc(sb, block_group, &bh2); 232 229 233 230 BUFFER_TRACE(bh2, "get_write_access"); 234 231 fatal = ext4_journal_get_write_access(handle, bh2); ··· 290 287 avefreei = freei / ngroups; 291 288 292 289 for (group = 0; group < ngroups; group++) { 293 - desc = ext4_get_group_desc (sb, group, NULL); 290 + desc = ext4_get_group_desc(sb, group, NULL); 294 291 if (!desc || !desc->bg_free_inodes_count) 295 292 continue; 296 293 if (le16_to_cpu(desc->bg_free_inodes_count) < avefreei) ··· 579 576 * For other inodes, search forward from the parent directory's block 580 577 * group to find a free inode. 
581 578 */ 582 - struct inode *ext4_new_inode(handle_t *handle, struct inode * dir, int mode) 579 + struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode) 583 580 { 584 581 struct super_block *sb; 585 582 struct buffer_head *bitmap_bh = NULL; 586 583 struct buffer_head *bh2; 587 584 ext4_group_t group = 0; 588 585 unsigned long ino = 0; 589 - struct inode * inode; 590 - struct ext4_group_desc * gdp = NULL; 591 - struct ext4_super_block * es; 586 + struct inode *inode; 587 + struct ext4_group_desc *gdp = NULL; 588 + struct ext4_super_block *es; 592 589 struct ext4_inode_info *ei; 593 590 struct ext4_sb_info *sbi; 594 591 int ret2, err = 0; ··· 616 613 } 617 614 618 615 if (S_ISDIR(mode)) { 619 - if (test_opt (sb, OLDALLOC)) 616 + if (test_opt(sb, OLDALLOC)) 620 617 ret2 = find_group_dir(sb, dir, &group); 621 618 else 622 619 ret2 = find_group_orlov(sb, dir, &group); ··· 786 783 } 787 784 788 785 inode->i_uid = current->fsuid; 789 - if (test_opt (sb, GRPID)) 786 + if (test_opt(sb, GRPID)) 790 787 inode->i_gid = dir->i_gid; 791 788 else if (dir->i_mode & S_ISGID) { 792 789 inode->i_gid = dir->i_gid; ··· 819 816 ei->i_flags &= ~EXT4_DIRSYNC_FL; 820 817 ei->i_file_acl = 0; 821 818 ei->i_dtime = 0; 822 - ei->i_block_alloc_info = NULL; 823 819 ei->i_block_group = group; 824 820 825 821 ext4_set_inode_flags(inode); ··· 834 832 ei->i_extra_isize = EXT4_SB(sb)->s_want_extra_isize; 835 833 836 834 ret = inode; 837 - if(DQUOT_ALLOC_INODE(inode)) { 835 + if (DQUOT_ALLOC_INODE(inode)) { 838 836 err = -EDQUOT; 839 837 goto fail_drop; 840 838 } ··· 843 841 if (err) 844 842 goto fail_free_drop; 845 843 846 - err = ext4_init_security(handle,inode, dir); 844 + err = ext4_init_security(handle, inode, dir); 847 845 if (err) 848 846 goto fail_free_drop; 849 847 ··· 961 959 return ERR_PTR(err); 962 960 } 963 961 964 - unsigned long ext4_count_free_inodes (struct super_block * sb) 962 + unsigned long ext4_count_free_inodes(struct super_block *sb) 965 963 { 966 964 
unsigned long desc_count; 967 965 struct ext4_group_desc *gdp; ··· 976 974 bitmap_count = 0; 977 975 gdp = NULL; 978 976 for (i = 0; i < EXT4_SB(sb)->s_groups_count; i++) { 979 - gdp = ext4_get_group_desc (sb, i, NULL); 977 + gdp = ext4_get_group_desc(sb, i, NULL); 980 978 if (!gdp) 981 979 continue; 982 980 desc_count += le16_to_cpu(gdp->bg_free_inodes_count); ··· 991 989 bitmap_count += x; 992 990 } 993 991 brelse(bitmap_bh); 994 - printk("ext4_count_free_inodes: stored = %u, computed = %lu, %lu\n", 995 - le32_to_cpu(es->s_free_inodes_count), desc_count, bitmap_count); 992 + printk(KERN_DEBUG "ext4_count_free_inodes: " 993 + "stored = %u, computed = %lu, %lu\n", 994 + le32_to_cpu(es->s_free_inodes_count), desc_count, bitmap_count); 996 995 return desc_count; 997 996 #else 998 997 desc_count = 0; 999 998 for (i = 0; i < EXT4_SB(sb)->s_groups_count; i++) { 1000 - gdp = ext4_get_group_desc (sb, i, NULL); 999 + gdp = ext4_get_group_desc(sb, i, NULL); 1001 1000 if (!gdp) 1002 1001 continue; 1003 1002 desc_count += le16_to_cpu(gdp->bg_free_inodes_count); ··· 1009 1006 } 1010 1007 1011 1008 /* Called at mount-time, super-block is locked */ 1012 - unsigned long ext4_count_dirs (struct super_block * sb) 1009 + unsigned long ext4_count_dirs(struct super_block * sb) 1013 1010 { 1014 1011 unsigned long count = 0; 1015 1012 ext4_group_t i; 1016 1013 1017 1014 for (i = 0; i < EXT4_SB(sb)->s_groups_count; i++) { 1018 - struct ext4_group_desc *gdp = ext4_get_group_desc (sb, i, NULL); 1015 + struct ext4_group_desc *gdp = ext4_get_group_desc(sb, i, NULL); 1019 1016 if (!gdp) 1020 1017 continue; 1021 1018 count += le16_to_cpu(gdp->bg_used_dirs_count);
+373 -245
fs/ext4/inode.c
··· 190 190 /* 191 191 * Called at the last iput() if i_nlink is zero. 192 192 */ 193 - void ext4_delete_inode (struct inode * inode) 193 + void ext4_delete_inode(struct inode *inode) 194 194 { 195 195 handle_t *handle; 196 196 int err; ··· 330 330 int final = 0; 331 331 332 332 if (i_block < 0) { 333 - ext4_warning (inode->i_sb, "ext4_block_to_path", "block < 0"); 333 + ext4_warning(inode->i_sb, "ext4_block_to_path", "block < 0"); 334 334 } else if (i_block < direct_blocks) { 335 335 offsets[n++] = i_block; 336 336 final = direct_blocks; 337 - } else if ( (i_block -= direct_blocks) < indirect_blocks) { 337 + } else if ((i_block -= direct_blocks) < indirect_blocks) { 338 338 offsets[n++] = EXT4_IND_BLOCK; 339 339 offsets[n++] = i_block; 340 340 final = ptrs; ··· 400 400 401 401 *err = 0; 402 402 /* i_data is not going away, no lock needed */ 403 - add_chain (chain, NULL, EXT4_I(inode)->i_data + *offsets); 403 + add_chain(chain, NULL, EXT4_I(inode)->i_data + *offsets); 404 404 if (!p->key) 405 405 goto no_block; 406 406 while (--depth) { 407 407 bh = sb_bread(sb, le32_to_cpu(p->key)); 408 408 if (!bh) 409 409 goto failure; 410 - add_chain(++p, bh, (__le32*)bh->b_data + *++offsets); 410 + add_chain(++p, bh, (__le32 *)bh->b_data + *++offsets); 411 411 /* Reader: end */ 412 412 if (!p->key) 413 413 goto no_block; ··· 443 443 static ext4_fsblk_t ext4_find_near(struct inode *inode, Indirect *ind) 444 444 { 445 445 struct ext4_inode_info *ei = EXT4_I(inode); 446 - __le32 *start = ind->bh ? (__le32*) ind->bh->b_data : ei->i_data; 446 + __le32 *start = ind->bh ? 
(__le32 *) ind->bh->b_data : ei->i_data; 447 447 __le32 *p; 448 448 ext4_fsblk_t bg_start; 449 449 ext4_fsblk_t last_block; ··· 486 486 static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block, 487 487 Indirect *partial) 488 488 { 489 - struct ext4_block_alloc_info *block_i; 490 - 491 - block_i = EXT4_I(inode)->i_block_alloc_info; 492 - 493 489 /* 494 - * try the heuristic for sequential allocation, 495 - * failing that at least try to get decent locality. 490 + * XXX need to get goal block from mballoc's data structures 496 491 */ 497 - if (block_i && (block == block_i->last_alloc_logical_block + 1) 498 - && (block_i->last_alloc_physical_block != 0)) { 499 - return block_i->last_alloc_physical_block + 1; 500 - } 501 492 502 493 return ext4_find_near(inode, partial); 503 494 } ··· 621 630 *err = 0; 622 631 return ret; 623 632 failed_out: 624 - for (i = 0; i <index; i++) 633 + for (i = 0; i < index; i++) 625 634 ext4_free_blocks(handle, inode, new_blocks[i], 1, 0); 626 635 return ret; 627 636 } ··· 694 703 branch[n].p = (__le32 *) bh->b_data + offsets[n]; 695 704 branch[n].key = cpu_to_le32(new_blocks[n]); 696 705 *branch[n].p = branch[n].key; 697 - if ( n == indirect_blks) { 706 + if (n == indirect_blks) { 698 707 current_block = new_blocks[n]; 699 708 /* 700 709 * End of chain, update the last new metablock of ··· 721 730 BUFFER_TRACE(branch[i].bh, "call jbd2_journal_forget"); 722 731 ext4_journal_forget(handle, branch[i].bh); 723 732 } 724 - for (i = 0; i <indirect_blks; i++) 733 + for (i = 0; i < indirect_blks; i++) 725 734 ext4_free_blocks(handle, inode, new_blocks[i], 1, 0); 726 735 727 736 ext4_free_blocks(handle, inode, new_blocks[i], num, 0); ··· 748 757 { 749 758 int i; 750 759 int err = 0; 751 - struct ext4_block_alloc_info *block_i; 752 760 ext4_fsblk_t current_block; 753 761 754 - block_i = EXT4_I(inode)->i_block_alloc_info; 755 762 /* 756 763 * If we're splicing into a [td]indirect block (as opposed to the 757 764 * inode) then we need 
to get write access to the [td]indirect block ··· 772 783 if (num == 0 && blks > 1) { 773 784 current_block = le32_to_cpu(where->key) + 1; 774 785 for (i = 1; i < blks; i++) 775 - *(where->p + i ) = cpu_to_le32(current_block++); 776 - } 777 - 778 - /* 779 - * update the most recently allocated logical & physical block 780 - * in i_block_alloc_info, to assist find the proper goal block for next 781 - * allocation 782 - */ 783 - if (block_i) { 784 - block_i->last_alloc_logical_block = block + blks - 1; 785 - block_i->last_alloc_physical_block = 786 - le32_to_cpu(where[num].key) + blks - 1; 786 + *(where->p + i) = cpu_to_le32(current_block++); 787 787 } 788 788 789 789 /* We are done with atomic stuff, now do the rest of housekeeping */ ··· 892 914 goto cleanup; 893 915 894 916 /* 895 - * Okay, we need to do block allocation. Lazily initialize the block 896 - * allocation info here if necessary 917 + * Okay, we need to do block allocation. 897 918 */ 898 - if (S_ISREG(inode->i_mode) && (!ei->i_block_alloc_info)) 899 - ext4_init_block_alloc_info(inode); 900 - 901 919 goal = ext4_find_goal(inode, iblock, partial); 902 920 903 921 /* the number of blocks need to allocate for [d,t]indirect blocks */ ··· 1004 1030 BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks); 1005 1031 mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb; 1006 1032 1007 - /* Account for allocated meta_blocks */ 1008 - mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks; 1033 + if (mdb_free) { 1034 + /* Account for allocated meta_blocks */ 1035 + mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks; 1009 1036 1010 - /* update fs free blocks counter for truncate case */ 1011 - percpu_counter_add(&sbi->s_freeblocks_counter, mdb_free); 1037 + /* update fs dirty blocks counter */ 1038 + percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free); 1039 + EXT4_I(inode)->i_allocated_meta_blocks = 0; 1040 + EXT4_I(inode)->i_reserved_meta_blocks = mdb; 1041 + } 1012 1042 1013 1043 /* update per-inode 
reservations */ 1014 1044 BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks); 1015 1045 EXT4_I(inode)->i_reserved_data_blocks -= used; 1016 1046 1017 - BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks); 1018 - EXT4_I(inode)->i_reserved_meta_blocks = mdb; 1019 - EXT4_I(inode)->i_allocated_meta_blocks = 0; 1020 1047 spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); 1021 1048 } 1022 1049 ··· 1135 1160 /* Maximum number of blocks we map for direct IO at once. */ 1136 1161 #define DIO_MAX_BLOCKS 4096 1137 1162 1138 - static int ext4_get_block(struct inode *inode, sector_t iblock, 1139 - struct buffer_head *bh_result, int create) 1163 + int ext4_get_block(struct inode *inode, sector_t iblock, 1164 + struct buffer_head *bh_result, int create) 1140 1165 { 1141 1166 handle_t *handle = ext4_journal_current_handle(); 1142 1167 int ret = 0, started = 0; ··· 1216 1241 BUFFER_TRACE(bh, "call get_create_access"); 1217 1242 fatal = ext4_journal_get_create_access(handle, bh); 1218 1243 if (!fatal && !buffer_uptodate(bh)) { 1219 - memset(bh->b_data,0,inode->i_sb->s_blocksize); 1244 + memset(bh->b_data, 0, inode->i_sb->s_blocksize); 1220 1245 set_buffer_uptodate(bh); 1221 1246 } 1222 1247 unlock_buffer(bh); ··· 1241 1266 struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode, 1242 1267 ext4_lblk_t block, int create, int *err) 1243 1268 { 1244 - struct buffer_head * bh; 1269 + struct buffer_head *bh; 1245 1270 1246 1271 bh = ext4_getblk(handle, inode, block, create, err); 1247 1272 if (!bh) ··· 1257 1282 return NULL; 1258 1283 } 1259 1284 1260 - static int walk_page_buffers( handle_t *handle, 1261 - struct buffer_head *head, 1262 - unsigned from, 1263 - unsigned to, 1264 - int *partial, 1265 - int (*fn)( handle_t *handle, 1266 - struct buffer_head *bh)) 1285 + static int walk_page_buffers(handle_t *handle, 1286 + struct buffer_head *head, 1287 + unsigned from, 1288 + unsigned to, 1289 + int *partial, 1290 + int (*fn)(handle_t *handle, 1291 + struct buffer_head 
*bh)) 1267 1292 { 1268 1293 struct buffer_head *bh; 1269 1294 unsigned block_start, block_end; ··· 1271 1296 int err, ret = 0; 1272 1297 struct buffer_head *next; 1273 1298 1274 - for ( bh = head, block_start = 0; 1275 - ret == 0 && (bh != head || !block_start); 1276 - block_start = block_end, bh = next) 1299 + for (bh = head, block_start = 0; 1300 + ret == 0 && (bh != head || !block_start); 1301 + block_start = block_end, bh = next) 1277 1302 { 1278 1303 next = bh->b_this_page; 1279 1304 block_end = block_start + blocksize; ··· 1326 1351 loff_t pos, unsigned len, unsigned flags, 1327 1352 struct page **pagep, void **fsdata) 1328 1353 { 1329 - struct inode *inode = mapping->host; 1354 + struct inode *inode = mapping->host; 1330 1355 int ret, needed_blocks = ext4_writepage_trans_blocks(inode); 1331 1356 handle_t *handle; 1332 1357 int retries = 0; 1333 - struct page *page; 1358 + struct page *page; 1334 1359 pgoff_t index; 1335 - unsigned from, to; 1360 + unsigned from, to; 1336 1361 1337 1362 index = pos >> PAGE_CACHE_SHIFT; 1338 - from = pos & (PAGE_CACHE_SIZE - 1); 1339 - to = from + len; 1363 + from = pos & (PAGE_CACHE_SIZE - 1); 1364 + to = from + len; 1340 1365 1341 1366 retry: 1342 - handle = ext4_journal_start(inode, needed_blocks); 1343 - if (IS_ERR(handle)) { 1344 - ret = PTR_ERR(handle); 1345 - goto out; 1367 + handle = ext4_journal_start(inode, needed_blocks); 1368 + if (IS_ERR(handle)) { 1369 + ret = PTR_ERR(handle); 1370 + goto out; 1346 1371 } 1347 1372 1348 1373 page = __grab_cache_page(mapping, index); ··· 1362 1387 } 1363 1388 1364 1389 if (ret) { 1365 - unlock_page(page); 1390 + unlock_page(page); 1366 1391 ext4_journal_stop(handle); 1367 - page_cache_release(page); 1392 + page_cache_release(page); 1393 + /* 1394 + * block_write_begin may have instantiated a few blocks 1395 + * outside i_size. Trim these off again. Don't need 1396 + * i_size_read because we hold i_mutex. 
1397 + */ 1398 + if (pos + len > inode->i_size) 1399 + vmtruncate(inode, inode->i_size); 1368 1400 } 1369 1401 1370 1402 if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) ··· 1408 1426 ret = ext4_jbd2_file_inode(handle, inode); 1409 1427 1410 1428 if (ret == 0) { 1411 - /* 1412 - * generic_write_end() will run mark_inode_dirty() if i_size 1413 - * changes. So let's piggyback the i_disksize mark_inode_dirty 1414 - * into that. 1415 - */ 1416 1429 loff_t new_i_size; 1417 1430 1418 1431 new_i_size = pos + copied; 1419 - if (new_i_size > EXT4_I(inode)->i_disksize) 1420 - EXT4_I(inode)->i_disksize = new_i_size; 1432 + if (new_i_size > EXT4_I(inode)->i_disksize) { 1433 + ext4_update_i_disksize(inode, new_i_size); 1434 + /* We need to mark inode dirty even if 1435 + * new_i_size is less that inode->i_size 1436 + * bu greater than i_disksize.(hint delalloc) 1437 + */ 1438 + ext4_mark_inode_dirty(handle, inode); 1439 + } 1440 + 1421 1441 ret2 = generic_write_end(file, mapping, pos, len, copied, 1422 1442 page, fsdata); 1423 1443 copied = ret2; ··· 1444 1460 loff_t new_i_size; 1445 1461 1446 1462 new_i_size = pos + copied; 1447 - if (new_i_size > EXT4_I(inode)->i_disksize) 1448 - EXT4_I(inode)->i_disksize = new_i_size; 1463 + if (new_i_size > EXT4_I(inode)->i_disksize) { 1464 + ext4_update_i_disksize(inode, new_i_size); 1465 + /* We need to mark inode dirty even if 1466 + * new_i_size is less that inode->i_size 1467 + * bu greater than i_disksize.(hint delalloc) 1468 + */ 1469 + ext4_mark_inode_dirty(handle, inode); 1470 + } 1449 1471 1450 1472 ret2 = generic_write_end(file, mapping, pos, len, copied, 1451 1473 page, fsdata); ··· 1476 1486 int ret = 0, ret2; 1477 1487 int partial = 0; 1478 1488 unsigned from, to; 1489 + loff_t new_i_size; 1479 1490 1480 1491 from = pos & (PAGE_CACHE_SIZE - 1); 1481 1492 to = from + len; ··· 1491 1500 to, &partial, write_end_fn); 1492 1501 if (!partial) 1493 1502 SetPageUptodate(page); 1494 - if (pos+copied > 
inode->i_size) 1503 + new_i_size = pos + copied; 1504 + if (new_i_size > inode->i_size) 1495 1505 i_size_write(inode, pos+copied); 1496 1506 EXT4_I(inode)->i_state |= EXT4_STATE_JDATA; 1497 - if (inode->i_size > EXT4_I(inode)->i_disksize) { 1498 - EXT4_I(inode)->i_disksize = inode->i_size; 1507 + if (new_i_size > EXT4_I(inode)->i_disksize) { 1508 + ext4_update_i_disksize(inode, new_i_size); 1499 1509 ret2 = ext4_mark_inode_dirty(handle, inode); 1500 1510 if (!ret) 1501 1511 ret = ret2; ··· 1513 1521 1514 1522 static int ext4_da_reserve_space(struct inode *inode, int nrblocks) 1515 1523 { 1524 + int retries = 0; 1516 1525 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); 1517 1526 unsigned long md_needed, mdblocks, total = 0; 1518 1527 ··· 1522 1529 * in order to allocate nrblocks 1523 1530 * worse case is one extent per block 1524 1531 */ 1532 + repeat: 1525 1533 spin_lock(&EXT4_I(inode)->i_block_reservation_lock); 1526 1534 total = EXT4_I(inode)->i_reserved_data_blocks + nrblocks; 1527 1535 mdblocks = ext4_calc_metadata_amount(inode, total); ··· 1531 1537 md_needed = mdblocks - EXT4_I(inode)->i_reserved_meta_blocks; 1532 1538 total = md_needed + nrblocks; 1533 1539 1534 - if (ext4_has_free_blocks(sbi, total) < total) { 1540 + if (ext4_claim_free_blocks(sbi, total)) { 1535 1541 spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); 1542 + if (ext4_should_retry_alloc(inode->i_sb, &retries)) { 1543 + yield(); 1544 + goto repeat; 1545 + } 1536 1546 return -ENOSPC; 1537 1547 } 1538 - /* reduce fs free blocks counter */ 1539 - percpu_counter_sub(&sbi->s_freeblocks_counter, total); 1540 - 1541 1548 EXT4_I(inode)->i_reserved_data_blocks += nrblocks; 1542 1549 EXT4_I(inode)->i_reserved_meta_blocks = mdblocks; 1543 1550 ··· 1580 1585 1581 1586 release = to_free + mdb_free; 1582 1587 1583 - /* update fs free blocks counter for truncate case */ 1584 - percpu_counter_add(&sbi->s_freeblocks_counter, release); 1588 + /* update fs dirty blocks counter for truncate case */ 1589 + 
percpu_counter_sub(&sbi->s_dirtyblocks_counter, release); 1585 1590 1586 1591 /* update per-inode reservations */ 1587 1592 BUG_ON(to_free > EXT4_I(inode)->i_reserved_data_blocks); ··· 1625 1630 struct writeback_control *wbc; 1626 1631 int io_done; 1627 1632 long pages_written; 1633 + int retval; 1628 1634 }; 1629 1635 1630 1636 /* ··· 1779 1783 unmap_underlying_metadata(bdev, bh->b_blocknr + i); 1780 1784 } 1781 1785 1786 + static void ext4_da_block_invalidatepages(struct mpage_da_data *mpd, 1787 + sector_t logical, long blk_cnt) 1788 + { 1789 + int nr_pages, i; 1790 + pgoff_t index, end; 1791 + struct pagevec pvec; 1792 + struct inode *inode = mpd->inode; 1793 + struct address_space *mapping = inode->i_mapping; 1794 + 1795 + index = logical >> (PAGE_CACHE_SHIFT - inode->i_blkbits); 1796 + end = (logical + blk_cnt - 1) >> 1797 + (PAGE_CACHE_SHIFT - inode->i_blkbits); 1798 + while (index <= end) { 1799 + nr_pages = pagevec_lookup(&pvec, mapping, index, PAGEVEC_SIZE); 1800 + if (nr_pages == 0) 1801 + break; 1802 + for (i = 0; i < nr_pages; i++) { 1803 + struct page *page = pvec.pages[i]; 1804 + index = page->index; 1805 + if (index > end) 1806 + break; 1807 + index++; 1808 + 1809 + BUG_ON(!PageLocked(page)); 1810 + BUG_ON(PageWriteback(page)); 1811 + block_invalidatepage(page, 0); 1812 + ClearPageUptodate(page); 1813 + unlock_page(page); 1814 + } 1815 + } 1816 + return; 1817 + } 1818 + 1819 + static void ext4_print_free_blocks(struct inode *inode) 1820 + { 1821 + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); 1822 + printk(KERN_EMERG "Total free blocks count %lld\n", 1823 + ext4_count_free_blocks(inode->i_sb)); 1824 + printk(KERN_EMERG "Free/Dirty block details\n"); 1825 + printk(KERN_EMERG "free_blocks=%lld\n", 1826 + percpu_counter_sum(&sbi->s_freeblocks_counter)); 1827 + printk(KERN_EMERG "dirty_blocks=%lld\n", 1828 + percpu_counter_sum(&sbi->s_dirtyblocks_counter)); 1829 + printk(KERN_EMERG "Block reservation details\n"); 1830 + printk(KERN_EMERG 
"i_reserved_data_blocks=%lu\n", 1831 + EXT4_I(inode)->i_reserved_data_blocks); 1832 + printk(KERN_EMERG "i_reserved_meta_blocks=%lu\n", 1833 + EXT4_I(inode)->i_reserved_meta_blocks); 1834 + return; 1835 + } 1836 + 1782 1837 /* 1783 1838 * mpage_da_map_blocks - go through given space 1784 1839 * ··· 1839 1792 * The function skips space we know is already mapped to disk blocks. 1840 1793 * 1841 1794 */ 1842 - static void mpage_da_map_blocks(struct mpage_da_data *mpd) 1795 + static int mpage_da_map_blocks(struct mpage_da_data *mpd) 1843 1796 { 1844 1797 int err = 0; 1845 - struct buffer_head *lbh = &mpd->lbh; 1846 - sector_t next = lbh->b_blocknr; 1847 1798 struct buffer_head new; 1799 + struct buffer_head *lbh = &mpd->lbh; 1800 + sector_t next; 1848 1801 1849 1802 /* 1850 1803 * We consider only non-mapped and non-allocated blocks 1851 1804 */ 1852 1805 if (buffer_mapped(lbh) && !buffer_delay(lbh)) 1853 - return; 1854 - 1806 + return 0; 1855 1807 new.b_state = lbh->b_state; 1856 1808 new.b_blocknr = 0; 1857 1809 new.b_size = lbh->b_size; 1858 - 1810 + next = lbh->b_blocknr; 1859 1811 /* 1860 1812 * If we didn't accumulate anything 1861 1813 * to write simply return 1862 1814 */ 1863 1815 if (!new.b_size) 1864 - return; 1816 + return 0; 1865 1817 err = mpd->get_block(mpd->inode, next, &new, 1); 1866 - if (err) 1867 - return; 1818 + if (err) { 1819 + 1820 + /* If get block returns with error 1821 + * we simply return. Later writepage 1822 + * will redirty the page and writepages 1823 + * will find the dirty page again 1824 + */ 1825 + if (err == -EAGAIN) 1826 + return 0; 1827 + 1828 + if (err == -ENOSPC && 1829 + ext4_count_free_blocks(mpd->inode->i_sb)) { 1830 + mpd->retval = err; 1831 + return 0; 1832 + } 1833 + 1834 + /* 1835 + * get block failure will cause us 1836 + * to loop in writepages. Because 1837 + * a_ops->writepage won't be able to 1838 + * make progress. 
The page will be redirtied 1839 + * by writepage and writepages will again 1840 + * try to write the same. 1841 + */ 1842 + printk(KERN_EMERG "%s block allocation failed for inode %lu " 1843 + "at logical offset %llu with max blocks " 1844 + "%zd with error %d\n", 1845 + __func__, mpd->inode->i_ino, 1846 + (unsigned long long)next, 1847 + lbh->b_size >> mpd->inode->i_blkbits, err); 1848 + printk(KERN_EMERG "This should not happen.!! " 1849 + "Data will be lost\n"); 1850 + if (err == -ENOSPC) { 1851 + ext4_print_free_blocks(mpd->inode); 1852 + } 1853 + /* invlaidate all the pages */ 1854 + ext4_da_block_invalidatepages(mpd, next, 1855 + lbh->b_size >> mpd->inode->i_blkbits); 1856 + return err; 1857 + } 1868 1858 BUG_ON(new.b_size == 0); 1869 1859 1870 1860 if (buffer_new(&new)) ··· 1914 1830 if (buffer_delay(lbh) || buffer_unwritten(lbh)) 1915 1831 mpage_put_bnr_to_bhs(mpd, next, &new); 1916 1832 1917 - return; 1833 + return 0; 1918 1834 } 1919 1835 1920 1836 #define BH_FLAGS ((1 << BH_Uptodate) | (1 << BH_Mapped) | \ ··· 1983 1899 * We couldn't merge the block to our extent, so we 1984 1900 * need to flush current extent and start new one 1985 1901 */ 1986 - mpage_da_map_blocks(mpd); 1987 - mpage_da_submit_io(mpd); 1902 + if (mpage_da_map_blocks(mpd) == 0) 1903 + mpage_da_submit_io(mpd); 1988 1904 mpd->io_done = 1; 1989 1905 return; 1990 1906 } ··· 2026 1942 * and start IO on them using writepage() 2027 1943 */ 2028 1944 if (mpd->next_page != mpd->first_page) { 2029 - mpage_da_map_blocks(mpd); 2030 - mpage_da_submit_io(mpd); 1945 + if (mpage_da_map_blocks(mpd) == 0) 1946 + mpage_da_submit_io(mpd); 2031 1947 /* 2032 1948 * skip rest of the page in the page_vec 2033 1949 */ ··· 2102 2018 */ 2103 2019 static int mpage_da_writepages(struct address_space *mapping, 2104 2020 struct writeback_control *wbc, 2105 - get_block_t get_block) 2021 + struct mpage_da_data *mpd) 2106 2022 { 2107 - struct mpage_da_data mpd; 2108 2023 long to_write; 2109 2024 int ret; 2110 2025 2111 
- if (!get_block) 2026 + if (!mpd->get_block) 2112 2027 return generic_writepages(mapping, wbc); 2113 2028 2114 - mpd.wbc = wbc; 2115 - mpd.inode = mapping->host; 2116 - mpd.lbh.b_size = 0; 2117 - mpd.lbh.b_state = 0; 2118 - mpd.lbh.b_blocknr = 0; 2119 - mpd.first_page = 0; 2120 - mpd.next_page = 0; 2121 - mpd.get_block = get_block; 2122 - mpd.io_done = 0; 2123 - mpd.pages_written = 0; 2029 + mpd->lbh.b_size = 0; 2030 + mpd->lbh.b_state = 0; 2031 + mpd->lbh.b_blocknr = 0; 2032 + mpd->first_page = 0; 2033 + mpd->next_page = 0; 2034 + mpd->io_done = 0; 2035 + mpd->pages_written = 0; 2036 + mpd->retval = 0; 2124 2037 2125 2038 to_write = wbc->nr_to_write; 2126 2039 2127 - ret = write_cache_pages(mapping, wbc, __mpage_da_writepage, &mpd); 2040 + ret = write_cache_pages(mapping, wbc, __mpage_da_writepage, mpd); 2128 2041 2129 2042 /* 2130 2043 * Handle last extent of pages 2131 2044 */ 2132 - if (!mpd.io_done && mpd.next_page != mpd.first_page) { 2133 - mpage_da_map_blocks(&mpd); 2134 - mpage_da_submit_io(&mpd); 2045 + if (!mpd->io_done && mpd->next_page != mpd->first_page) { 2046 + if (mpage_da_map_blocks(mpd) == 0) 2047 + mpage_da_submit_io(mpd); 2135 2048 } 2136 2049 2137 - wbc->nr_to_write = to_write - mpd.pages_written; 2050 + wbc->nr_to_write = to_write - mpd->pages_written; 2138 2051 return ret; 2139 2052 } 2140 2053 ··· 2184 2103 handle_t *handle = NULL; 2185 2104 2186 2105 handle = ext4_journal_current_handle(); 2187 - if (!handle) { 2188 - ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks, 2189 - bh_result, 0, 0, 0); 2190 - BUG_ON(!ret); 2191 - } else { 2192 - ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks, 2193 - bh_result, create, 0, EXT4_DELALLOC_RSVED); 2194 - } 2195 - 2106 + BUG_ON(!handle); 2107 + ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks, 2108 + bh_result, create, 0, EXT4_DELALLOC_RSVED); 2196 2109 if (ret > 0) { 2110 + 2197 2111 bh_result->b_size = (ret << inode->i_blkbits); 2112 + 2113 + if 
(ext4_should_order_data(inode)) { 2114 + int retval; 2115 + retval = ext4_jbd2_file_inode(handle, inode); 2116 + if (retval) 2117 + /* 2118 + * Failed to add inode for ordered 2119 + * mode. Don't update file size 2120 + */ 2121 + return retval; 2122 + } 2198 2123 2199 2124 /* 2200 2125 * Update on-disk size along with block allocation ··· 2211 2124 if (disksize > i_size_read(inode)) 2212 2125 disksize = i_size_read(inode); 2213 2126 if (disksize > EXT4_I(inode)->i_disksize) { 2214 - /* 2215 - * XXX: replace with spinlock if seen contended -bzzz 2216 - */ 2217 - down_write(&EXT4_I(inode)->i_data_sem); 2218 - if (disksize > EXT4_I(inode)->i_disksize) 2219 - EXT4_I(inode)->i_disksize = disksize; 2220 - up_write(&EXT4_I(inode)->i_data_sem); 2221 - 2222 - if (EXT4_I(inode)->i_disksize == disksize) { 2223 - ret = ext4_mark_inode_dirty(handle, inode); 2224 - return ret; 2225 - } 2127 + ext4_update_i_disksize(inode, disksize); 2128 + ret = ext4_mark_inode_dirty(handle, inode); 2129 + return ret; 2226 2130 } 2227 2131 ret = 0; 2228 2132 } ··· 2362 2284 { 2363 2285 handle_t *handle = NULL; 2364 2286 loff_t range_start = 0; 2287 + struct mpage_da_data mpd; 2365 2288 struct inode *inode = mapping->host; 2366 2289 int needed_blocks, ret = 0, nr_to_writebump = 0; 2367 2290 long to_write, pages_skipped = 0; ··· 2396 2317 range_start = wbc->range_start; 2397 2318 pages_skipped = wbc->pages_skipped; 2398 2319 2320 + mpd.wbc = wbc; 2321 + mpd.inode = mapping->host; 2322 + 2399 2323 restart_loop: 2400 2324 to_write = wbc->nr_to_write; 2401 2325 while (!ret && to_write > 0) { ··· 2422 2340 dump_stack(); 2423 2341 goto out_writepages; 2424 2342 } 2425 - if (ext4_should_order_data(inode)) { 2426 - /* 2427 - * With ordered mode we need to add 2428 - * the inode to the journal handl 2429 - * when we do block allocation. 
2430 - */ 2431 - ret = ext4_jbd2_file_inode(handle, inode); 2432 - if (ret) { 2433 - ext4_journal_stop(handle); 2434 - goto out_writepages; 2435 - } 2436 - } 2437 - 2438 2343 to_write -= wbc->nr_to_write; 2439 - ret = mpage_da_writepages(mapping, wbc, 2440 - ext4_da_get_block_write); 2344 + 2345 + mpd.get_block = ext4_da_get_block_write; 2346 + ret = mpage_da_writepages(mapping, wbc, &mpd); 2347 + 2441 2348 ext4_journal_stop(handle); 2349 + 2350 + if (mpd.retval == -ENOSPC) 2351 + jbd2_journal_force_commit_nested(sbi->s_journal); 2352 + 2353 + /* reset the retry count */ 2442 2354 if (ret == MPAGE_DA_EXTENT_TAIL) { 2443 2355 /* 2444 2356 * got one extent now try with ··· 2467 2391 return ret; 2468 2392 } 2469 2393 2394 + #define FALL_BACK_TO_NONDELALLOC 1 2395 + static int ext4_nonda_switch(struct super_block *sb) 2396 + { 2397 + s64 free_blocks, dirty_blocks; 2398 + struct ext4_sb_info *sbi = EXT4_SB(sb); 2399 + 2400 + /* 2401 + * switch to non delalloc mode if we are running low 2402 + * on free block. The free block accounting via percpu 2403 + * counters can get slightly wrong with FBC_BATCH getting 2404 + * accumulated on each CPU without updating global counters 2405 + * Delalloc need an accurate free block accounting. So switch 2406 + * to non delalloc when we are near to error range. 
2407 + */ 2408 + free_blocks = percpu_counter_read_positive(&sbi->s_freeblocks_counter); 2409 + dirty_blocks = percpu_counter_read_positive(&sbi->s_dirtyblocks_counter); 2410 + if (2 * free_blocks < 3 * dirty_blocks || 2411 + free_blocks < (dirty_blocks + EXT4_FREEBLOCKS_WATERMARK)) { 2412 + /* 2413 + * free block count is less that 150% of dirty blocks 2414 + * or free blocks is less that watermark 2415 + */ 2416 + return 1; 2417 + } 2418 + return 0; 2419 + } 2420 + 2470 2421 static int ext4_da_write_begin(struct file *file, struct address_space *mapping, 2471 2422 loff_t pos, unsigned len, unsigned flags, 2472 2423 struct page **pagep, void **fsdata) ··· 2509 2406 from = pos & (PAGE_CACHE_SIZE - 1); 2510 2407 to = from + len; 2511 2408 2409 + if (ext4_nonda_switch(inode->i_sb)) { 2410 + *fsdata = (void *)FALL_BACK_TO_NONDELALLOC; 2411 + return ext4_write_begin(file, mapping, pos, 2412 + len, flags, pagep, fsdata); 2413 + } 2414 + *fsdata = (void *)0; 2512 2415 retry: 2513 2416 /* 2514 2417 * With delayed allocation, we don't log the i_disksize update ··· 2542 2433 unlock_page(page); 2543 2434 ext4_journal_stop(handle); 2544 2435 page_cache_release(page); 2436 + /* 2437 + * block_write_begin may have instantiated a few blocks 2438 + * outside i_size. Trim these off again. Don't need 2439 + * i_size_read because we hold i_mutex. 
2440 + */ 2441 + if (pos + len > inode->i_size) 2442 + vmtruncate(inode, inode->i_size); 2545 2443 } 2546 2444 2547 2445 if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) ··· 2572 2456 bh = page_buffers(page); 2573 2457 idx = offset >> inode->i_blkbits; 2574 2458 2575 - for (i=0; i < idx; i++) 2459 + for (i = 0; i < idx; i++) 2576 2460 bh = bh->b_this_page; 2577 2461 2578 2462 if (!buffer_mapped(bh) || (buffer_delay(bh))) ··· 2590 2474 handle_t *handle = ext4_journal_current_handle(); 2591 2475 loff_t new_i_size; 2592 2476 unsigned long start, end; 2477 + int write_mode = (int)(unsigned long)fsdata; 2478 + 2479 + if (write_mode == FALL_BACK_TO_NONDELALLOC) { 2480 + if (ext4_should_order_data(inode)) { 2481 + return ext4_ordered_write_end(file, mapping, pos, 2482 + len, copied, page, fsdata); 2483 + } else if (ext4_should_writeback_data(inode)) { 2484 + return ext4_writeback_write_end(file, mapping, pos, 2485 + len, copied, page, fsdata); 2486 + } else { 2487 + BUG(); 2488 + } 2489 + } 2593 2490 2594 2491 start = pos & (PAGE_CACHE_SIZE - 1); 2595 - end = start + copied -1; 2492 + end = start + copied - 1; 2596 2493 2597 2494 /* 2598 2495 * generic_write_end() will run mark_inode_dirty() if i_size ··· 2629 2500 EXT4_I(inode)->i_disksize = new_i_size; 2630 2501 } 2631 2502 up_write(&EXT4_I(inode)->i_data_sem); 2503 + /* We need to mark inode dirty even if 2504 + * new_i_size is less that inode->i_size 2505 + * bu greater than i_disksize.(hint delalloc) 2506 + */ 2507 + ext4_mark_inode_dirty(handle, inode); 2632 2508 } 2633 2509 } 2634 2510 ret2 = generic_write_end(file, mapping, pos, len, copied, ··· 2725 2591 return 0; 2726 2592 } 2727 2593 2728 - return generic_block_bmap(mapping,block,ext4_get_block); 2594 + return generic_block_bmap(mapping, block, ext4_get_block); 2729 2595 } 2730 2596 2731 2597 static int bget_one(handle_t *handle, struct buffer_head *bh) ··· 3331 3197 if (!partial->key && *partial->p) 3332 3198 /* Writer: end */ 3333 3199 
goto no_top; 3334 - for (p=partial; p>chain && all_zeroes((__le32*)p->bh->b_data,p->p); p--) 3200 + for (p = partial; (p > chain) && all_zeroes((__le32 *) p->bh->b_data, p->p); p--) 3335 3201 ; 3336 3202 /* 3337 3203 * OK, we've found the last block that must survive. The rest of our ··· 3350 3216 } 3351 3217 /* Writer: end */ 3352 3218 3353 - while(partial > p) { 3219 + while (partial > p) { 3354 3220 brelse(partial->bh); 3355 3221 partial--; 3356 3222 } ··· 3542 3408 /* This zaps the entire block. Bottom up. */ 3543 3409 BUFFER_TRACE(bh, "free child branches"); 3544 3410 ext4_free_branches(handle, inode, bh, 3545 - (__le32*)bh->b_data, 3546 - (__le32*)bh->b_data + addr_per_block, 3547 - depth); 3411 + (__le32 *) bh->b_data, 3412 + (__le32 *) bh->b_data + addr_per_block, 3413 + depth); 3548 3414 3549 3415 /* 3550 3416 * We've probably journalled the indirect block several ··· 3712 3578 */ 3713 3579 down_write(&ei->i_data_sem); 3714 3580 3715 - ext4_discard_reservation(inode); 3581 + ext4_discard_preallocations(inode); 3716 3582 3717 3583 /* 3718 3584 * The orphan list entry will now protect us from any crash which ··· 3807 3673 ext4_journal_stop(handle); 3808 3674 } 3809 3675 3810 - static ext4_fsblk_t ext4_get_inode_block(struct super_block *sb, 3811 - unsigned long ino, struct ext4_iloc *iloc) 3812 - { 3813 - ext4_group_t block_group; 3814 - unsigned long offset; 3815 - ext4_fsblk_t block; 3816 - struct ext4_group_desc *gdp; 3817 - 3818 - if (!ext4_valid_inum(sb, ino)) { 3819 - /* 3820 - * This error is already checked for in namei.c unless we are 3821 - * looking at an NFS filehandle, in which case no error 3822 - * report is needed 3823 - */ 3824 - return 0; 3825 - } 3826 - 3827 - block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb); 3828 - gdp = ext4_get_group_desc(sb, block_group, NULL); 3829 - if (!gdp) 3830 - return 0; 3831 - 3832 - /* 3833 - * Figure out the offset within the block group inode table 3834 - */ 3835 - offset = ((ino - 1) % 
EXT4_INODES_PER_GROUP(sb)) * 3836 - EXT4_INODE_SIZE(sb); 3837 - block = ext4_inode_table(sb, gdp) + 3838 - (offset >> EXT4_BLOCK_SIZE_BITS(sb)); 3839 - 3840 - iloc->block_group = block_group; 3841 - iloc->offset = offset & (EXT4_BLOCK_SIZE(sb) - 1); 3842 - return block; 3843 - } 3844 - 3845 3676 /* 3846 3677 * ext4_get_inode_loc returns with an extra refcount against the inode's 3847 3678 * underlying buffer_head on success. If 'in_mem' is true, we have all ··· 3816 3717 static int __ext4_get_inode_loc(struct inode *inode, 3817 3718 struct ext4_iloc *iloc, int in_mem) 3818 3719 { 3819 - ext4_fsblk_t block; 3820 - struct buffer_head *bh; 3720 + struct ext4_group_desc *gdp; 3721 + struct buffer_head *bh; 3722 + struct super_block *sb = inode->i_sb; 3723 + ext4_fsblk_t block; 3724 + int inodes_per_block, inode_offset; 3821 3725 3822 - block = ext4_get_inode_block(inode->i_sb, inode->i_ino, iloc); 3823 - if (!block) 3726 + iloc->bh = 0; 3727 + if (!ext4_valid_inum(sb, inode->i_ino)) 3824 3728 return -EIO; 3825 3729 3826 - bh = sb_getblk(inode->i_sb, block); 3730 + iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb); 3731 + gdp = ext4_get_group_desc(sb, iloc->block_group, NULL); 3732 + if (!gdp) 3733 + return -EIO; 3734 + 3735 + /* 3736 + * Figure out the offset within the block group inode table 3737 + */ 3738 + inodes_per_block = (EXT4_BLOCK_SIZE(sb) / EXT4_INODE_SIZE(sb)); 3739 + inode_offset = ((inode->i_ino - 1) % 3740 + EXT4_INODES_PER_GROUP(sb)); 3741 + block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block); 3742 + iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb); 3743 + 3744 + bh = sb_getblk(sb, block); 3827 3745 if (!bh) { 3828 - ext4_error (inode->i_sb, "ext4_get_inode_loc", 3829 - "unable to read inode block - " 3830 - "inode=%lu, block=%llu", 3831 - inode->i_ino, block); 3746 + ext4_error(sb, "ext4_get_inode_loc", "unable to read " 3747 + "inode block - inode=%lu, block=%llu", 3748 + inode->i_ino, block); 
3832 3749 return -EIO; 3833 3750 } 3834 3751 if (!buffer_uptodate(bh)) { ··· 3872 3757 */ 3873 3758 if (in_mem) { 3874 3759 struct buffer_head *bitmap_bh; 3875 - struct ext4_group_desc *desc; 3876 - int inodes_per_buffer; 3877 - int inode_offset, i; 3878 - ext4_group_t block_group; 3879 - int start; 3760 + int i, start; 3880 3761 3881 - block_group = (inode->i_ino - 1) / 3882 - EXT4_INODES_PER_GROUP(inode->i_sb); 3883 - inodes_per_buffer = bh->b_size / 3884 - EXT4_INODE_SIZE(inode->i_sb); 3885 - inode_offset = ((inode->i_ino - 1) % 3886 - EXT4_INODES_PER_GROUP(inode->i_sb)); 3887 - start = inode_offset & ~(inodes_per_buffer - 1); 3762 + start = inode_offset & ~(inodes_per_block - 1); 3888 3763 3889 3764 /* Is the inode bitmap in cache? */ 3890 - desc = ext4_get_group_desc(inode->i_sb, 3891 - block_group, NULL); 3892 - if (!desc) 3893 - goto make_io; 3894 - 3895 - bitmap_bh = sb_getblk(inode->i_sb, 3896 - ext4_inode_bitmap(inode->i_sb, desc)); 3765 + bitmap_bh = sb_getblk(sb, ext4_inode_bitmap(sb, gdp)); 3897 3766 if (!bitmap_bh) 3898 3767 goto make_io; 3899 3768 ··· 3890 3791 brelse(bitmap_bh); 3891 3792 goto make_io; 3892 3793 } 3893 - for (i = start; i < start + inodes_per_buffer; i++) { 3794 + for (i = start; i < start + inodes_per_block; i++) { 3894 3795 if (i == inode_offset) 3895 3796 continue; 3896 3797 if (ext4_test_bit(i, bitmap_bh->b_data)) 3897 3798 break; 3898 3799 } 3899 3800 brelse(bitmap_bh); 3900 - if (i == start + inodes_per_buffer) { 3801 + if (i == start + inodes_per_block) { 3901 3802 /* all other inodes are free, so skip I/O */ 3902 3803 memset(bh->b_data, 0, bh->b_size); 3903 3804 set_buffer_uptodate(bh); ··· 3908 3809 3909 3810 make_io: 3910 3811 /* 3812 + * If we need to do any I/O, try to pre-readahead extra 3813 + * blocks from the inode table. 
3814 + */ 3815 + if (EXT4_SB(sb)->s_inode_readahead_blks) { 3816 + ext4_fsblk_t b, end, table; 3817 + unsigned num; 3818 + 3819 + table = ext4_inode_table(sb, gdp); 3820 + /* Make sure s_inode_readahead_blks is a power of 2 */ 3821 + while (EXT4_SB(sb)->s_inode_readahead_blks & 3822 + (EXT4_SB(sb)->s_inode_readahead_blks-1)) 3823 + EXT4_SB(sb)->s_inode_readahead_blks = 3824 + (EXT4_SB(sb)->s_inode_readahead_blks & 3825 + (EXT4_SB(sb)->s_inode_readahead_blks-1)); 3826 + b = block & ~(EXT4_SB(sb)->s_inode_readahead_blks-1); 3827 + if (table > b) 3828 + b = table; 3829 + end = b + EXT4_SB(sb)->s_inode_readahead_blks; 3830 + num = EXT4_INODES_PER_GROUP(sb); 3831 + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, 3832 + EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) 3833 + num -= le16_to_cpu(gdp->bg_itable_unused); 3834 + table += num / inodes_per_block; 3835 + if (end > table) 3836 + end = table; 3837 + while (b <= end) 3838 + sb_breadahead(sb, b++); 3839 + } 3840 + 3841 + /* 3911 3842 * There are other valid inodes in the buffer, this inode 3912 3843 * has in-inode xattrs, or we don't have this inode in memory. 3913 3844 * Read the block from disk. 
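The readahead sizing in the hunk above normalizes `s_inode_readahead_blks` to a power of two by repeatedly clearing its lowest set bit (`x & (x - 1)`), then aligns the starting block down to that boundary before issuing `sb_breadahead()` calls. A minimal sketch of those two bit tricks, outside any ext4 context (function names are mine, not the kernel's):

```c
/* Round v down to the nearest power of two by repeatedly clearing
 * the lowest set bit (v & (v - 1)) until a single bit remains.
 * v must be non-zero. */
static unsigned round_down_pow2(unsigned v)
{
	while (v & (v - 1))
		v &= v - 1;
	return v;
}

/* Align a block number down to a power-of-two boundary, as the
 * readahead code does with `block & ~(readahead_blks - 1)`. */
static unsigned long long align_down(unsigned long long block, unsigned ra)
{
	return block & ~((unsigned long long)ra - 1);
}
```

With these, a readahead window of 33 blocks collapses to 32, and block 1000 aligns down to 992 on a 32-block boundary, so the whole window can be read with sequential I/O.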
··· 3947 3818 submit_bh(READ_META, bh); 3948 3819 wait_on_buffer(bh); 3949 3820 if (!buffer_uptodate(bh)) { 3950 - ext4_error(inode->i_sb, "ext4_get_inode_loc", 3951 - "unable to read inode block - " 3952 - "inode=%lu, block=%llu", 3953 - inode->i_ino, block); 3821 + ext4_error(sb, __func__, 3822 + "unable to read inode block - inode=%lu, " 3823 + "block=%llu", inode->i_ino, block); 3954 3824 brelse(bh); 3955 3825 return -EIO; 3956 3826 } ··· 4041 3913 return inode; 4042 3914 4043 3915 ei = EXT4_I(inode); 4044 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 3916 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 4045 3917 ei->i_acl = EXT4_ACL_NOT_CACHED; 4046 3918 ei->i_default_acl = EXT4_ACL_NOT_CACHED; 4047 3919 #endif 4048 - ei->i_block_alloc_info = NULL; 4049 3920 4050 3921 ret = __ext4_get_inode_loc(inode, &iloc, 0); 4051 3922 if (ret < 0) ··· 4054 3927 inode->i_mode = le16_to_cpu(raw_inode->i_mode); 4055 3928 inode->i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low); 4056 3929 inode->i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low); 4057 - if(!(test_opt (inode->i_sb, NO_UID32))) { 3930 + if (!(test_opt(inode->i_sb, NO_UID32))) { 4058 3931 inode->i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16; 4059 3932 inode->i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16; 4060 3933 } ··· 4072 3945 if (inode->i_mode == 0 || 4073 3946 !(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS)) { 4074 3947 /* this inode is deleted */ 4075 - brelse (bh); 3948 + brelse(bh); 4076 3949 ret = -ESTALE; 4077 3950 goto bad_inode; 4078 3951 } ··· 4105 3978 ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize); 4106 3979 if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize > 4107 3980 EXT4_INODE_SIZE(inode->i_sb)) { 4108 - brelse (bh); 3981 + brelse(bh); 4109 3982 ret = -EIO; 4110 3983 goto bad_inode; 4111 3984 } ··· 4158 4031 init_special_inode(inode, inode->i_mode, 4159 4032 new_decode_dev(le32_to_cpu(raw_inode->i_block[1]))); 4160 4033 } 4161 - brelse (iloc.bh); 4034 + brelse(iloc.bh); 4162 4035 
ext4_set_inode_flags(inode); 4163 4036 unlock_new_inode(inode); 4164 4037 return inode; ··· 4240 4113 4241 4114 ext4_get_inode_flags(ei); 4242 4115 raw_inode->i_mode = cpu_to_le16(inode->i_mode); 4243 - if(!(test_opt(inode->i_sb, NO_UID32))) { 4116 + if (!(test_opt(inode->i_sb, NO_UID32))) { 4244 4117 raw_inode->i_uid_low = cpu_to_le16(low_16_bits(inode->i_uid)); 4245 4118 raw_inode->i_gid_low = cpu_to_le16(low_16_bits(inode->i_gid)); 4246 4119 /* 4247 4120 * Fix up interoperability with old kernels. Otherwise, old inodes get 4248 4121 * re-used with the upper 16 bits of the uid/gid intact 4249 4122 */ 4250 - if(!ei->i_dtime) { 4123 + if (!ei->i_dtime) { 4251 4124 raw_inode->i_uid_high = 4252 4125 cpu_to_le16(high_16_bits(inode->i_uid)); 4253 4126 raw_inode->i_gid_high = ··· 4335 4208 ei->i_state &= ~EXT4_STATE_NEW; 4336 4209 4337 4210 out_brelse: 4338 - brelse (bh); 4211 + brelse(bh); 4339 4212 ext4_std_error(inode->i_sb, err); 4340 4213 return err; 4341 4214 } ··· 4938 4811 loff_t size; 4939 4812 unsigned long len; 4940 4813 int ret = -EINVAL; 4814 + void *fsdata; 4941 4815 struct file *file = vma->vm_file; 4942 4816 struct inode *inode = file->f_path.dentry->d_inode; 4943 4817 struct address_space *mapping = inode->i_mapping; ··· 4977 4849 * on the same page though 4978 4850 */ 4979 4851 ret = mapping->a_ops->write_begin(file, mapping, page_offset(page), 4980 - len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL); 4852 + len, AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata); 4981 4853 if (ret < 0) 4982 4854 goto out_unlock; 4983 4855 ret = mapping->a_ops->write_end(file, mapping, page_offset(page), 4984 - len, len, page, NULL); 4856 + len, len, page, fsdata); 4985 4857 if (ret < 0) 4986 4858 goto out_unlock; 4987 4859 ret = 0;
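The rewritten `__ext4_get_inode_loc()` above computes an inode's on-disk position from first principles: the block group from the inode number, then the block and byte offset within that group's inode table. The same arithmetic can be sketched self-contained, with the geometry constants (4 KiB blocks, 256-byte inodes, 8192 inodes per group) chosen for illustration rather than read from a real superblock:

```c
/* Illustrative geometry; a real filesystem reads these from the
 * superblock and group descriptors. */
#define BLOCK_SIZE        4096u
#define INODE_SIZE        256u
#define INODES_PER_GROUP  8192u

struct inode_loc {
	unsigned group;            /* block group holding the inode */
	unsigned long long block;  /* block containing the inode    */
	unsigned offset;           /* byte offset inside that block */
};

/* Map an inode number to its location, given the start block of the
 * group's inode table (itable). Inode numbers count from 1. */
static struct inode_loc locate_inode(unsigned long ino,
				     unsigned long long itable)
{
	struct inode_loc loc;
	unsigned inodes_per_block = BLOCK_SIZE / INODE_SIZE;   /* 16 */
	unsigned index = (ino - 1) % INODES_PER_GROUP;

	loc.group  = (ino - 1) / INODES_PER_GROUP;
	loc.block  = itable + index / inodes_per_block;
	loc.offset = (index % inodes_per_block) * INODE_SIZE;
	return loc;
}
```

For example, with the table starting at block 100, inode 12 lands in block 100 at offset 2816, and inode 17 is the first inode of block 101.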
+37 -47
fs/ext4/ioctl.c
··· 23 23 struct inode *inode = filp->f_dentry->d_inode; 24 24 struct ext4_inode_info *ei = EXT4_I(inode); 25 25 unsigned int flags; 26 - unsigned short rsv_window_size; 27 26 28 - ext4_debug ("cmd = %u, arg = %lu\n", cmd, arg); 27 + ext4_debug("cmd = %u, arg = %lu\n", cmd, arg); 29 28 30 29 switch (cmd) { 31 30 case EXT4_IOC_GETFLAGS: ··· 33 34 return put_user(flags, (int __user *) arg); 34 35 case EXT4_IOC_SETFLAGS: { 35 36 handle_t *handle = NULL; 36 - int err; 37 + int err, migrate = 0; 37 38 struct ext4_iloc iloc; 38 39 unsigned int oldflags; 39 40 unsigned int jflag; ··· 81 82 if (!capable(CAP_SYS_RESOURCE)) 82 83 goto flags_out; 83 84 } 85 + if (oldflags & EXT4_EXTENTS_FL) { 86 + /* We don't support clearning extent flags */ 87 + if (!(flags & EXT4_EXTENTS_FL)) { 88 + err = -EOPNOTSUPP; 89 + goto flags_out; 90 + } 91 + } else if (flags & EXT4_EXTENTS_FL) { 92 + /* migrate the file */ 93 + migrate = 1; 94 + flags &= ~EXT4_EXTENTS_FL; 95 + } 84 96 85 97 handle = ext4_journal_start(inode, 1); 86 98 if (IS_ERR(handle)) { ··· 119 109 120 110 if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) 121 111 err = ext4_change_inode_journal_flag(inode, jflag); 112 + if (err) 113 + goto flags_out; 114 + if (migrate) 115 + err = ext4_ext_migrate(inode); 122 116 flags_out: 123 117 mutex_unlock(&inode->i_mutex); 124 118 mnt_drop_write(filp->f_path.mnt); ··· 189 175 return ret; 190 176 } 191 177 #endif 192 - case EXT4_IOC_GETRSVSZ: 193 - if (test_opt(inode->i_sb, RESERVATION) 194 - && S_ISREG(inode->i_mode) 195 - && ei->i_block_alloc_info) { 196 - rsv_window_size = ei->i_block_alloc_info->rsv_window_node.rsv_goal_size; 197 - return put_user(rsv_window_size, (int __user *)arg); 198 - } 199 - return -ENOTTY; 200 - case EXT4_IOC_SETRSVSZ: { 201 - int err; 202 - 203 - if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode)) 204 - return -ENOTTY; 205 - 206 - if (!is_owner_or_cap(inode)) 207 - return -EACCES; 208 - 209 - if (get_user(rsv_window_size, (int __user *)arg)) 210 - 
return -EFAULT; 211 - 212 - err = mnt_want_write(filp->f_path.mnt); 213 - if (err) 214 - return err; 215 - 216 - if (rsv_window_size > EXT4_MAX_RESERVE_BLOCKS) 217 - rsv_window_size = EXT4_MAX_RESERVE_BLOCKS; 218 - 219 - /* 220 - * need to allocate reservation structure for this inode 221 - * before set the window size 222 - */ 223 - down_write(&ei->i_data_sem); 224 - if (!ei->i_block_alloc_info) 225 - ext4_init_block_alloc_info(inode); 226 - 227 - if (ei->i_block_alloc_info){ 228 - struct ext4_reserve_window_node *rsv = &ei->i_block_alloc_info->rsv_window_node; 229 - rsv->rsv_goal_size = rsv_window_size; 230 - } 231 - up_write(&ei->i_data_sem); 232 - mnt_drop_write(filp->f_path.mnt); 233 - return 0; 234 - } 235 178 case EXT4_IOC_GROUP_EXTEND: { 236 179 ext4_fsblk_t n_blocks_count; 237 180 struct super_block *sb = inode->i_sb; ··· 238 267 } 239 268 240 269 case EXT4_IOC_MIGRATE: 241 - return ext4_ext_migrate(inode, filp, cmd, arg); 270 + { 271 + int err; 272 + if (!is_owner_or_cap(inode)) 273 + return -EACCES; 274 + 275 + err = mnt_want_write(filp->f_path.mnt); 276 + if (err) 277 + return err; 278 + /* 279 + * inode_mutex prevent write and truncate on the file. 280 + * Read still goes through. We take i_data_sem in 281 + * ext4_ext_swap_inode_data before we switch the 282 + * inode format to prevent read. 283 + */ 284 + mutex_lock(&(inode->i_mutex)); 285 + err = ext4_ext_migrate(inode); 286 + mutex_unlock(&(inode->i_mutex)); 287 + mnt_drop_write(filp->f_path.mnt); 288 + return err; 289 + } 242 290 243 291 default: 244 292 return -ENOTTY;
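The new `EXT4_IOC_SETFLAGS` logic above treats the extents flag as one-way: clearing it on an extents inode is rejected with `-EOPNOTSUPP`, while setting it on a non-extents inode triggers an inode-format migration after the other flags are applied. That decision table can be expressed as a small pure function (the flag value, errno constant, and return encoding here are illustrative, not the kernel's):

```c
#define FL_EXTENTS     0x80000u  /* stand-in for EXT4_EXTENTS_FL   */
#define EOPNOTSUPP_ERR (-95)     /* value of -EOPNOTSUPP on Linux  */

enum setflags_action { APPLY_ONLY = 0, APPLY_AND_MIGRATE = 1 };

/* Decide how to handle the extents bit when oldflags are replaced
 * by newflags: a negative errno on rejection, else an action. */
static int extents_transition(unsigned oldflags, unsigned newflags)
{
	if (oldflags & FL_EXTENTS) {
		if (!(newflags & FL_EXTENTS))
			return EOPNOTSUPP_ERR;  /* can't clear extents */
		return APPLY_ONLY;
	}
	if (newflags & FL_EXTENTS)
		return APPLY_AND_MIGRATE;       /* migrate the file */
	return APPLY_ONLY;
}
```

Making the transition explicit like this is why the handler strips `EXT4_EXTENTS_FL` from the flags it writes and defers the actual migration until after the journal handle is closed.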
+64 -156
fs/ext4/mballoc.c
··· 477 477 b2 = (unsigned char *) bitmap; 478 478 for (i = 0; i < e4b->bd_sb->s_blocksize; i++) { 479 479 if (b1[i] != b2[i]) { 480 - printk("corruption in group %lu at byte %u(%u):" 481 - " %x in copy != %x on disk/prealloc\n", 482 - e4b->bd_group, i, i * 8, b1[i], b2[i]); 480 + printk(KERN_ERR "corruption in group %lu " 481 + "at byte %u(%u): %x in copy != %x " 482 + "on disk/prealloc\n", 483 + e4b->bd_group, i, i * 8, b1[i], b2[i]); 483 484 BUG(); 484 485 } 485 486 } ··· 533 532 struct list_head *cur; 534 533 void *buddy; 535 534 void *buddy2; 536 - 537 - if (!test_opt(sb, MBALLOC)) 538 - return 0; 539 535 540 536 { 541 537 static int mb_check_counter; ··· 782 784 if (bh[i] == NULL) 783 785 goto out; 784 786 785 - if (bh_uptodate_or_lock(bh[i])) 787 + if (buffer_uptodate(bh[i]) && 788 + !(desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) 786 789 continue; 787 790 791 + lock_buffer(bh[i]); 788 792 spin_lock(sb_bgl_lock(EXT4_SB(sb), first_group + i)); 789 793 if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { 790 794 ext4_init_block_bitmap(sb, bh[i], ··· 2169 2169 { 2170 2170 struct ext4_sb_info *sbi = EXT4_SB(sb); 2171 2171 2172 - remove_proc_entry("mb_groups", sbi->s_mb_proc); 2173 - remove_proc_entry("mb_history", sbi->s_mb_proc); 2174 - 2172 + if (sbi->s_proc != NULL) { 2173 + remove_proc_entry("mb_groups", sbi->s_proc); 2174 + remove_proc_entry("mb_history", sbi->s_proc); 2175 + } 2175 2176 kfree(sbi->s_mb_history); 2176 2177 } 2177 2178 ··· 2181 2180 struct ext4_sb_info *sbi = EXT4_SB(sb); 2182 2181 int i; 2183 2182 2184 - if (sbi->s_mb_proc != NULL) { 2185 - proc_create_data("mb_history", S_IRUGO, sbi->s_mb_proc, 2183 + if (sbi->s_proc != NULL) { 2184 + proc_create_data("mb_history", S_IRUGO, sbi->s_proc, 2186 2185 &ext4_mb_seq_history_fops, sb); 2187 - proc_create_data("mb_groups", S_IRUGO, sbi->s_mb_proc, 2186 + proc_create_data("mb_groups", S_IRUGO, sbi->s_proc, 2188 2187 &ext4_mb_seq_groups_fops, sb); 2189 2188 } 2190 2189 ··· 2486 2485 
unsigned max; 2487 2486 int ret; 2488 2487 2489 - if (!test_opt(sb, MBALLOC)) 2490 - return 0; 2491 - 2492 2488 i = (sb->s_blocksize_bits + 2) * sizeof(unsigned short); 2493 2489 2494 2490 sbi->s_mb_offsets = kmalloc(i, GFP_KERNEL); 2495 2491 if (sbi->s_mb_offsets == NULL) { 2496 - clear_opt(sbi->s_mount_opt, MBALLOC); 2497 2492 return -ENOMEM; 2498 2493 } 2499 2494 sbi->s_mb_maxs = kmalloc(i, GFP_KERNEL); 2500 2495 if (sbi->s_mb_maxs == NULL) { 2501 - clear_opt(sbi->s_mount_opt, MBALLOC); 2502 2496 kfree(sbi->s_mb_maxs); 2503 2497 return -ENOMEM; 2504 2498 } ··· 2516 2520 /* init file for buddy data */ 2517 2521 ret = ext4_mb_init_backend(sb); 2518 2522 if (ret != 0) { 2519 - clear_opt(sbi->s_mount_opt, MBALLOC); 2520 2523 kfree(sbi->s_mb_offsets); 2521 2524 kfree(sbi->s_mb_maxs); 2522 2525 return ret; ··· 2535 2540 sbi->s_mb_history_filter = EXT4_MB_HISTORY_DEFAULT; 2536 2541 sbi->s_mb_group_prealloc = MB_DEFAULT_GROUP_PREALLOC; 2537 2542 2538 - i = sizeof(struct ext4_locality_group) * nr_cpu_ids; 2539 - sbi->s_locality_groups = kmalloc(i, GFP_KERNEL); 2543 + sbi->s_locality_groups = alloc_percpu(struct ext4_locality_group); 2540 2544 if (sbi->s_locality_groups == NULL) { 2541 - clear_opt(sbi->s_mount_opt, MBALLOC); 2542 2545 kfree(sbi->s_mb_offsets); 2543 2546 kfree(sbi->s_mb_maxs); 2544 2547 return -ENOMEM; 2545 2548 } 2546 - for (i = 0; i < nr_cpu_ids; i++) { 2549 + for_each_possible_cpu(i) { 2547 2550 struct ext4_locality_group *lg; 2548 - lg = &sbi->s_locality_groups[i]; 2551 + lg = per_cpu_ptr(sbi->s_locality_groups, i); 2549 2552 mutex_init(&lg->lg_mutex); 2550 2553 for (j = 0; j < PREALLOC_TB_SIZE; j++) 2551 2554 INIT_LIST_HEAD(&lg->lg_prealloc_list[j]); ··· 2553 2560 ext4_mb_init_per_dev_proc(sb); 2554 2561 ext4_mb_history_init(sb); 2555 2562 2556 - printk("EXT4-fs: mballoc enabled\n"); 2563 + printk(KERN_INFO "EXT4-fs: mballoc enabled\n"); 2557 2564 return 0; 2558 2565 } 2559 2566 ··· 2581 2588 int num_meta_group_infos; 2582 2589 struct ext4_group_info 
*grinfo; 2583 2590 struct ext4_sb_info *sbi = EXT4_SB(sb); 2584 - 2585 - if (!test_opt(sb, MBALLOC)) 2586 - return 0; 2587 2591 2588 2592 /* release freed, non-committed blocks */ 2589 2593 spin_lock(&sbi->s_md_lock); ··· 2637 2647 atomic_read(&sbi->s_mb_discarded)); 2638 2648 } 2639 2649 2640 - kfree(sbi->s_locality_groups); 2641 - 2650 + free_percpu(sbi->s_locality_groups); 2642 2651 ext4_mb_history_release(sb); 2643 2652 ext4_mb_destroy_per_dev_proc(sb); 2644 2653 ··· 2710 2721 #define EXT4_MB_STREAM_REQ "stream_req" 2711 2722 #define EXT4_MB_GROUP_PREALLOC "group_prealloc" 2712 2723 2713 - 2714 - 2715 - #define MB_PROC_FOPS(name) \ 2716 - static int ext4_mb_##name##_proc_show(struct seq_file *m, void *v) \ 2717 - { \ 2718 - struct ext4_sb_info *sbi = m->private; \ 2719 - \ 2720 - seq_printf(m, "%ld\n", sbi->s_mb_##name); \ 2721 - return 0; \ 2722 - } \ 2723 - \ 2724 - static int ext4_mb_##name##_proc_open(struct inode *inode, struct file *file)\ 2725 - { \ 2726 - return single_open(file, ext4_mb_##name##_proc_show, PDE(inode)->data);\ 2727 - } \ 2728 - \ 2729 - static ssize_t ext4_mb_##name##_proc_write(struct file *file, \ 2730 - const char __user *buf, size_t cnt, loff_t *ppos) \ 2731 - { \ 2732 - struct ext4_sb_info *sbi = PDE(file->f_path.dentry->d_inode)->data;\ 2733 - char str[32]; \ 2734 - long value; \ 2735 - if (cnt >= sizeof(str)) \ 2736 - return -EINVAL; \ 2737 - if (copy_from_user(str, buf, cnt)) \ 2738 - return -EFAULT; \ 2739 - value = simple_strtol(str, NULL, 0); \ 2740 - if (value <= 0) \ 2741 - return -ERANGE; \ 2742 - sbi->s_mb_##name = value; \ 2743 - return cnt; \ 2744 - } \ 2745 - \ 2746 - static const struct file_operations ext4_mb_##name##_proc_fops = { \ 2747 - .owner = THIS_MODULE, \ 2748 - .open = ext4_mb_##name##_proc_open, \ 2749 - .read = seq_read, \ 2750 - .llseek = seq_lseek, \ 2751 - .release = single_release, \ 2752 - .write = ext4_mb_##name##_proc_write, \ 2753 - }; 2754 - 2755 - MB_PROC_FOPS(stats); 2756 - 
MB_PROC_FOPS(max_to_scan); 2757 - MB_PROC_FOPS(min_to_scan); 2758 - MB_PROC_FOPS(order2_reqs); 2759 - MB_PROC_FOPS(stream_request); 2760 - MB_PROC_FOPS(group_prealloc); 2761 - 2762 - #define MB_PROC_HANDLER(name, var) \ 2763 - do { \ 2764 - proc = proc_create_data(name, mode, sbi->s_mb_proc, \ 2765 - &ext4_mb_##var##_proc_fops, sbi); \ 2766 - if (proc == NULL) { \ 2767 - printk(KERN_ERR "EXT4-fs: can't to create %s\n", name); \ 2768 - goto err_out; \ 2769 - } \ 2770 - } while (0) 2771 - 2772 2724 static int ext4_mb_init_per_dev_proc(struct super_block *sb) 2773 2725 { 2774 2726 mode_t mode = S_IFREG | S_IRUGO | S_IWUSR; 2775 2727 struct ext4_sb_info *sbi = EXT4_SB(sb); 2776 2728 struct proc_dir_entry *proc; 2777 - char devname[64]; 2778 2729 2779 - if (proc_root_ext4 == NULL) { 2780 - sbi->s_mb_proc = NULL; 2730 + if (sbi->s_proc == NULL) 2781 2731 return -EINVAL; 2782 - } 2783 - bdevname(sb->s_bdev, devname); 2784 - sbi->s_mb_proc = proc_mkdir(devname, proc_root_ext4); 2785 2732 2786 - MB_PROC_HANDLER(EXT4_MB_STATS_NAME, stats); 2787 - MB_PROC_HANDLER(EXT4_MB_MAX_TO_SCAN_NAME, max_to_scan); 2788 - MB_PROC_HANDLER(EXT4_MB_MIN_TO_SCAN_NAME, min_to_scan); 2789 - MB_PROC_HANDLER(EXT4_MB_ORDER2_REQ, order2_reqs); 2790 - MB_PROC_HANDLER(EXT4_MB_STREAM_REQ, stream_request); 2791 - MB_PROC_HANDLER(EXT4_MB_GROUP_PREALLOC, group_prealloc); 2792 - 2733 + EXT4_PROC_HANDLER(EXT4_MB_STATS_NAME, mb_stats); 2734 + EXT4_PROC_HANDLER(EXT4_MB_MAX_TO_SCAN_NAME, mb_max_to_scan); 2735 + EXT4_PROC_HANDLER(EXT4_MB_MIN_TO_SCAN_NAME, mb_min_to_scan); 2736 + EXT4_PROC_HANDLER(EXT4_MB_ORDER2_REQ, mb_order2_reqs); 2737 + EXT4_PROC_HANDLER(EXT4_MB_STREAM_REQ, mb_stream_request); 2738 + EXT4_PROC_HANDLER(EXT4_MB_GROUP_PREALLOC, mb_group_prealloc); 2793 2739 return 0; 2794 2740 2795 2741 err_out: 2796 - printk(KERN_ERR "EXT4-fs: Unable to create %s\n", devname); 2797 - remove_proc_entry(EXT4_MB_GROUP_PREALLOC, sbi->s_mb_proc); 2798 - remove_proc_entry(EXT4_MB_STREAM_REQ, sbi->s_mb_proc); 2799 - 
remove_proc_entry(EXT4_MB_ORDER2_REQ, sbi->s_mb_proc); 2800 - remove_proc_entry(EXT4_MB_MIN_TO_SCAN_NAME, sbi->s_mb_proc); 2801 - remove_proc_entry(EXT4_MB_MAX_TO_SCAN_NAME, sbi->s_mb_proc); 2802 - remove_proc_entry(EXT4_MB_STATS_NAME, sbi->s_mb_proc); 2803 - remove_proc_entry(devname, proc_root_ext4); 2804 - sbi->s_mb_proc = NULL; 2805 - 2742 + remove_proc_entry(EXT4_MB_GROUP_PREALLOC, sbi->s_proc); 2743 + remove_proc_entry(EXT4_MB_STREAM_REQ, sbi->s_proc); 2744 + remove_proc_entry(EXT4_MB_ORDER2_REQ, sbi->s_proc); 2745 + remove_proc_entry(EXT4_MB_MIN_TO_SCAN_NAME, sbi->s_proc); 2746 + remove_proc_entry(EXT4_MB_MAX_TO_SCAN_NAME, sbi->s_proc); 2747 + remove_proc_entry(EXT4_MB_STATS_NAME, sbi->s_proc); 2806 2748 return -ENOMEM; 2807 2749 } 2808 2750 2809 2751 static int ext4_mb_destroy_per_dev_proc(struct super_block *sb) 2810 2752 { 2811 2753 struct ext4_sb_info *sbi = EXT4_SB(sb); 2812 - char devname[64]; 2813 2754 2814 - if (sbi->s_mb_proc == NULL) 2755 + if (sbi->s_proc == NULL) 2815 2756 return -EINVAL; 2816 2757 2817 - bdevname(sb->s_bdev, devname); 2818 - remove_proc_entry(EXT4_MB_GROUP_PREALLOC, sbi->s_mb_proc); 2819 - remove_proc_entry(EXT4_MB_STREAM_REQ, sbi->s_mb_proc); 2820 - remove_proc_entry(EXT4_MB_ORDER2_REQ, sbi->s_mb_proc); 2821 - remove_proc_entry(EXT4_MB_MIN_TO_SCAN_NAME, sbi->s_mb_proc); 2822 - remove_proc_entry(EXT4_MB_MAX_TO_SCAN_NAME, sbi->s_mb_proc); 2823 - remove_proc_entry(EXT4_MB_STATS_NAME, sbi->s_mb_proc); 2824 - remove_proc_entry(devname, proc_root_ext4); 2758 + remove_proc_entry(EXT4_MB_GROUP_PREALLOC, sbi->s_proc); 2759 + remove_proc_entry(EXT4_MB_STREAM_REQ, sbi->s_proc); 2760 + remove_proc_entry(EXT4_MB_ORDER2_REQ, sbi->s_proc); 2761 + remove_proc_entry(EXT4_MB_MIN_TO_SCAN_NAME, sbi->s_proc); 2762 + remove_proc_entry(EXT4_MB_MAX_TO_SCAN_NAME, sbi->s_proc); 2763 + remove_proc_entry(EXT4_MB_STATS_NAME, sbi->s_proc); 2825 2764 2826 2765 return 0; 2827 2766 } ··· 2771 2854 kmem_cache_destroy(ext4_pspace_cachep); 2772 2855 return 
-ENOMEM; 2773 2856 } 2774 - #ifdef CONFIG_PROC_FS 2775 - proc_root_ext4 = proc_mkdir("fs/ext4", NULL); 2776 - if (proc_root_ext4 == NULL) 2777 - printk(KERN_ERR "EXT4-fs: Unable to create fs/ext4\n"); 2778 - #endif 2779 2857 return 0; 2780 2858 } 2781 2859 ··· 2779 2867 /* XXX: synchronize_rcu(); */ 2780 2868 kmem_cache_destroy(ext4_pspace_cachep); 2781 2869 kmem_cache_destroy(ext4_ac_cachep); 2782 - #ifdef CONFIG_PROC_FS 2783 - remove_proc_entry("fs/ext4", NULL); 2784 - #endif 2785 2870 } 2786 2871 2787 2872 ··· 2788 2879 */ 2789 2880 static noinline_for_stack int 2790 2881 ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, 2791 - handle_t *handle) 2882 + handle_t *handle, unsigned long reserv_blks) 2792 2883 { 2793 2884 struct buffer_head *bitmap_bh = NULL; 2794 2885 struct ext4_super_block *es; ··· 2877 2968 le16_add_cpu(&gdp->bg_free_blocks_count, -ac->ac_b_ex.fe_len); 2878 2969 gdp->bg_checksum = ext4_group_desc_csum(sbi, ac->ac_b_ex.fe_group, gdp); 2879 2970 spin_unlock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group)); 2880 - 2971 + percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len); 2881 2972 /* 2882 - * free blocks account has already be reduced/reserved 2883 - * at write_begin() time for delayed allocation 2884 - * do not double accounting 2973 + * Now reduce the dirty block count also. Should not go negative 2885 2974 */ 2886 2975 if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED)) 2887 - percpu_counter_sub(&sbi->s_freeblocks_counter, 2888 - ac->ac_b_ex.fe_len); 2976 + /* release all the reserved blocks if non delalloc */ 2977 + percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks); 2978 + else 2979 + percpu_counter_sub(&sbi->s_dirtyblocks_counter, 2980 + ac->ac_b_ex.fe_len); 2889 2981 2890 2982 if (sbi->s_log_groups_per_flex) { 2891 2983 ext4_group_t flex_group = ext4_flex_group(sbi, ··· 3794 3884 * 3795 3885 * FIXME!! 
Make sure it is valid at all the call sites 3796 3886 */ 3797 - void ext4_mb_discard_inode_preallocations(struct inode *inode) 3887 + void ext4_discard_preallocations(struct inode *inode) 3798 3888 { 3799 3889 struct ext4_inode_info *ei = EXT4_I(inode); 3800 3890 struct super_block *sb = inode->i_sb; ··· 3806 3896 struct ext4_buddy e4b; 3807 3897 int err; 3808 3898 3809 - if (!test_opt(sb, MBALLOC) || !S_ISREG(inode->i_mode)) { 3899 + if (!S_ISREG(inode->i_mode)) { 3810 3900 /*BUG_ON(!list_empty(&ei->i_prealloc_list));*/ 3811 3901 return; 3812 3902 } ··· 4004 4094 * per cpu locality group is to reduce the contention between block 4005 4095 * request from multiple CPUs. 4006 4096 */ 4007 - ac->ac_lg = &sbi->s_locality_groups[get_cpu()]; 4008 - put_cpu(); 4097 + ac->ac_lg = per_cpu_ptr(sbi->s_locality_groups, raw_smp_processor_id()); 4009 4098 4010 4099 /* we're going to use group allocation */ 4011 4100 ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC; ··· 4278 4369 ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, 4279 4370 struct ext4_allocation_request *ar, int *errp) 4280 4371 { 4372 + int freed; 4281 4373 struct ext4_allocation_context *ac = NULL; 4282 4374 struct ext4_sb_info *sbi; 4283 4375 struct super_block *sb; 4284 4376 ext4_fsblk_t block = 0; 4285 - int freed; 4286 - int inquota; 4377 + unsigned long inquota; 4378 + unsigned long reserv_blks = 0; 4287 4379 4288 4380 sb = ar->inode->i_sb; 4289 4381 sbi = EXT4_SB(sb); 4290 4382 4291 - if (!test_opt(sb, MBALLOC)) { 4292 - block = ext4_old_new_blocks(handle, ar->inode, ar->goal, 4293 - &(ar->len), errp); 4294 - return block; 4295 - } 4296 4383 if (!EXT4_I(ar->inode)->i_delalloc_reserved_flag) { 4297 4384 /* 4298 4385 * With delalloc we already reserved the blocks 4299 4386 */ 4300 - ar->len = ext4_has_free_blocks(sbi, ar->len); 4387 + while (ar->len && ext4_claim_free_blocks(sbi, ar->len)) { 4388 + /* let others to free the space */ 4389 + yield(); 4390 + ar->len = ar->len >> 1; 4391 + } 4392 + if (!ar->len) { 4393 
+ *errp = -ENOSPC; 4394 + return 0; 4395 + } 4396 + reserv_blks = ar->len; 4301 4397 } 4302 - 4303 - if (ar->len == 0) { 4304 - *errp = -ENOSPC; 4305 - return 0; 4306 - } 4307 - 4308 4398 while (ar->len && DQUOT_ALLOC_BLOCK(ar->inode, ar->len)) { 4309 4399 ar->flags |= EXT4_MB_HINT_NOPREALLOC; 4310 4400 ar->len--; ··· 4349 4441 } 4350 4442 4351 4443 if (likely(ac->ac_status == AC_STATUS_FOUND)) { 4352 - *errp = ext4_mb_mark_diskspace_used(ac, handle); 4444 + *errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_blks); 4353 4445 if (*errp == -EAGAIN) { 4354 4446 ac->ac_b_ex.fe_group = 0; 4355 4447 ac->ac_b_ex.fe_start = 0;
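In the `ext4_mb_new_blocks()` hunk above, the non-delalloc path now claims blocks against the free-space counters up front; each failed claim yields the CPU and halves the request until it either succeeds or reaches zero, which becomes `-ENOSPC`. The shape of that backoff loop, with a toy stateful pool standing in for `ext4_claim_free_blocks()` (all names and the errno value are illustrative):

```c
#define ENOSPC_ERR (-28)   /* value of -ENOSPC on Linux */

/* Toy stand-in for the free-blocks counter: a claim succeeds only
 * if len still fits in the pool, and consumes it when it does. */
static unsigned long pool = 100;

static int claim_free_blocks(unsigned long len)
{
	if (len > pool)
		return -1;      /* claim failed */
	pool -= len;
	return 0;
}

/* Halve the request on each failed claim, as the mballoc path does;
 * returns the length actually reserved, or -ENOSPC. The kernel also
 * calls yield() between attempts to let other tasks free space. */
static long reserve_with_backoff(unsigned long len)
{
	while (len && claim_free_blocks(len))
		len >>= 1;
	return len ? (long)len : ENOSPC_ERR;
}
```

Starting from a 100-block pool, a 150-block request backs off to 75, a following 200-block request backs off to the remaining 25, and any further request bottoms out at `-ENOSPC`.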
-1
fs/ext4/mballoc.h
··· 257 257 258 258 #define in_range(b, first, len) ((b) >= (first) && (b) <= (first) + (len) - 1) 259 259 260 - static struct proc_dir_entry *proc_root_ext4; 261 260 struct buffer_head *read_block_bitmap(struct super_block *, ext4_group_t); 262 261 263 262 static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
+1 -9
fs/ext4/migrate.c
··· 447 447 448 448 } 449 449 450 - int ext4_ext_migrate(struct inode *inode, struct file *filp, 451 - unsigned int cmd, unsigned long arg) 450 + int ext4_ext_migrate(struct inode *inode) 452 451 { 453 452 handle_t *handle; 454 453 int retval = 0, i; ··· 514 515 * trascation that created the inode. Later as and 515 516 * when we add extents we extent the journal 516 517 */ 517 - /* 518 - * inode_mutex prevent write and truncate on the file. Read still goes 519 - * through. We take i_data_sem in ext4_ext_swap_inode_data before we 520 - * switch the inode format to prevent read. 521 - */ 522 - mutex_lock(&(inode->i_mutex)); 523 518 /* 524 519 * Even though we take i_mutex we can still cause block allocation 525 520 * via mmap write to holes. If we have allocated new blocks we fail ··· 616 623 tmp_inode->i_nlink = 0; 617 624 618 625 ext4_journal_stop(handle); 619 - mutex_unlock(&(inode->i_mutex)); 620 626 621 627 if (tmp_inode) 622 628 iput(tmp_inode);
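The migrate.c hunk above removes the `i_mutex` handling from `ext4_ext_migrate()` itself; after this patch the caller (the `EXT4_IOC_MIGRATE` handler shown in the ioctl.c diff) takes the lock before calling in. A toy sketch of that caller-holds-lock split, using a plain flag in place of a real mutex (all names and the `-EINVAL`-style encoding are illustrative, not kernel API):

```c
#define EINVAL_ERR (-22)   /* stand-in for -EINVAL */

/* Toy "i_mutex": a flag is enough to express the protocol. */
static int inode_mutex_held;
static int uses_extents;   /* 0 = indirect blocks, 1 = extents */

/* Core migration; assumes the caller already serializes writers,
 * just as ext4_ext_migrate() does after this patch. */
static int migrate_locked(void)
{
	if (!inode_mutex_held)
		return EINVAL_ERR;      /* protocol violation */
	if (uses_extents)
		return EINVAL_ERR;      /* nothing to migrate */
	uses_extents = 1;
	return 0;
}

/* ioctl-side wrapper: takes the lock around the call, mirroring
 * the EXT4_IOC_MIGRATE handler in the ioctl.c hunk. */
static int migrate_ioctl(void)
{
	int err;

	inode_mutex_held = 1;
	err = migrate_locked();
	inode_mutex_held = 0;
	return err;
}
```

Hoisting the lock into the caller lets the ioctl pair it with `mnt_want_write()`/`mnt_drop_write()` and an ownership check, while the migration core stays free of locking policy.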
+202 -200
fs/ext4/namei.c
··· 151 151 152 152 static inline ext4_lblk_t dx_get_block(struct dx_entry *entry); 153 153 static void dx_set_block(struct dx_entry *entry, ext4_lblk_t value); 154 - static inline unsigned dx_get_hash (struct dx_entry *entry); 155 - static void dx_set_hash (struct dx_entry *entry, unsigned value); 156 - static unsigned dx_get_count (struct dx_entry *entries); 157 - static unsigned dx_get_limit (struct dx_entry *entries); 158 - static void dx_set_count (struct dx_entry *entries, unsigned value); 159 - static void dx_set_limit (struct dx_entry *entries, unsigned value); 160 - static unsigned dx_root_limit (struct inode *dir, unsigned infosize); 161 - static unsigned dx_node_limit (struct inode *dir); 162 - static struct dx_frame *dx_probe(struct dentry *dentry, 154 + static inline unsigned dx_get_hash(struct dx_entry *entry); 155 + static void dx_set_hash(struct dx_entry *entry, unsigned value); 156 + static unsigned dx_get_count(struct dx_entry *entries); 157 + static unsigned dx_get_limit(struct dx_entry *entries); 158 + static void dx_set_count(struct dx_entry *entries, unsigned value); 159 + static void dx_set_limit(struct dx_entry *entries, unsigned value); 160 + static unsigned dx_root_limit(struct inode *dir, unsigned infosize); 161 + static unsigned dx_node_limit(struct inode *dir); 162 + static struct dx_frame *dx_probe(const struct qstr *d_name, 163 163 struct inode *dir, 164 164 struct dx_hash_info *hinfo, 165 165 struct dx_frame *frame, 166 166 int *err); 167 - static void dx_release (struct dx_frame *frames); 168 - static int dx_make_map (struct ext4_dir_entry_2 *de, int size, 169 - struct dx_hash_info *hinfo, struct dx_map_entry map[]); 167 + static void dx_release(struct dx_frame *frames); 168 + static int dx_make_map(struct ext4_dir_entry_2 *de, int size, 169 + struct dx_hash_info *hinfo, struct dx_map_entry map[]); 170 170 static void dx_sort_map(struct dx_map_entry *map, unsigned count); 171 - static struct ext4_dir_entry_2 *dx_move_dirents (char 
*from, char *to, 171 + static struct ext4_dir_entry_2 *dx_move_dirents(char *from, char *to, 172 172 struct dx_map_entry *offsets, int count); 173 - static struct ext4_dir_entry_2* dx_pack_dirents (char *base, int size); 173 + static struct ext4_dir_entry_2* dx_pack_dirents(char *base, int size); 174 174 static void dx_insert_block(struct dx_frame *frame, 175 175 u32 hash, ext4_lblk_t block); 176 176 static int ext4_htree_next_block(struct inode *dir, __u32 hash, 177 177 struct dx_frame *frame, 178 178 struct dx_frame *frames, 179 179 __u32 *start_hash); 180 - static struct buffer_head * ext4_dx_find_entry(struct dentry *dentry, 181 - struct ext4_dir_entry_2 **res_dir, int *err); 180 + static struct buffer_head * ext4_dx_find_entry(struct inode *dir, 181 + const struct qstr *d_name, 182 + struct ext4_dir_entry_2 **res_dir, 183 + int *err); 182 184 static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry, 183 185 struct inode *inode); 184 186 ··· 209 207 entry->block = cpu_to_le32(value); 210 208 } 211 209 212 - static inline unsigned dx_get_hash (struct dx_entry *entry) 210 + static inline unsigned dx_get_hash(struct dx_entry *entry) 213 211 { 214 212 return le32_to_cpu(entry->hash); 215 213 } 216 214 217 - static inline void dx_set_hash (struct dx_entry *entry, unsigned value) 215 + static inline void dx_set_hash(struct dx_entry *entry, unsigned value) 218 216 { 219 217 entry->hash = cpu_to_le32(value); 220 218 } 221 219 222 - static inline unsigned dx_get_count (struct dx_entry *entries) 220 + static inline unsigned dx_get_count(struct dx_entry *entries) 223 221 { 224 222 return le16_to_cpu(((struct dx_countlimit *) entries)->count); 225 223 } 226 224 227 - static inline unsigned dx_get_limit (struct dx_entry *entries) 225 + static inline unsigned dx_get_limit(struct dx_entry *entries) 228 226 { 229 227 return le16_to_cpu(((struct dx_countlimit *) entries)->limit); 230 228 } 231 229 232 - static inline void dx_set_count (struct dx_entry *entries, 
unsigned value) 230 + static inline void dx_set_count(struct dx_entry *entries, unsigned value) 233 231 { 234 232 ((struct dx_countlimit *) entries)->count = cpu_to_le16(value); 235 233 } 236 234 237 - static inline void dx_set_limit (struct dx_entry *entries, unsigned value) 235 + static inline void dx_set_limit(struct dx_entry *entries, unsigned value) 238 236 { 239 237 ((struct dx_countlimit *) entries)->limit = cpu_to_le16(value); 240 238 } 241 239 242 - static inline unsigned dx_root_limit (struct inode *dir, unsigned infosize) 240 + static inline unsigned dx_root_limit(struct inode *dir, unsigned infosize) 243 241 { 244 242 unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(1) - 245 243 EXT4_DIR_REC_LEN(2) - infosize; 246 244 return entry_space / sizeof(struct dx_entry); 247 245 } 248 246 249 - static inline unsigned dx_node_limit (struct inode *dir) 247 + static inline unsigned dx_node_limit(struct inode *dir) 250 248 { 251 249 unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(0); 252 250 return entry_space / sizeof(struct dx_entry); ··· 256 254 * Debug 257 255 */ 258 256 #ifdef DX_DEBUG 259 - static void dx_show_index (char * label, struct dx_entry *entries) 257 + static void dx_show_index(char * label, struct dx_entry *entries) 260 258 { 261 259 int i, n = dx_get_count (entries); 262 - printk("%s index ", label); 260 + printk(KERN_DEBUG "%s index ", label); 263 261 for (i = 0; i < n; i++) { 264 - printk("%x->%lu ", i? dx_get_hash(entries + i) : 262 + printk("%x->%lu ", i ? 
dx_get_hash(entries + i) : 265 263 0, (unsigned long)dx_get_block(entries + i)); 266 264 } 267 265 printk("\n"); ··· 308 306 struct dx_entry *entries, int levels) 309 307 { 310 308 unsigned blocksize = dir->i_sb->s_blocksize; 311 - unsigned count = dx_get_count (entries), names = 0, space = 0, i; 309 + unsigned count = dx_get_count(entries), names = 0, space = 0, i; 312 310 unsigned bcount = 0; 313 311 struct buffer_head *bh; 314 312 int err; ··· 327 325 names += stats.names; 328 326 space += stats.space; 329 327 bcount += stats.bcount; 330 - brelse (bh); 328 + brelse(bh); 331 329 } 332 330 if (bcount) 333 - printk("%snames %u, fullness %u (%u%%)\n", levels?"":" ", 334 - names, space/bcount,(space/bcount)*100/blocksize); 331 + printk(KERN_DEBUG "%snames %u, fullness %u (%u%%)\n", 332 + levels ? "" : " ", names, space/bcount, 333 + (space/bcount)*100/blocksize); 335 334 return (struct stats) { names, space, bcount}; 336 335 } 337 336 #endif /* DX_DEBUG */ ··· 347 344 * back to userspace. 
348 345 */ 349 346 static struct dx_frame * 350 - dx_probe(struct dentry *dentry, struct inode *dir, 347 + dx_probe(const struct qstr *d_name, struct inode *dir, 351 348 struct dx_hash_info *hinfo, struct dx_frame *frame_in, int *err) 352 349 { 353 350 unsigned count, indirect; ··· 358 355 u32 hash; 359 356 360 357 frame->bh = NULL; 361 - if (dentry) 362 - dir = dentry->d_parent->d_inode; 363 358 if (!(bh = ext4_bread (NULL,dir, 0, 0, err))) 364 359 goto fail; 365 360 root = (struct dx_root *) bh->b_data; ··· 373 372 } 374 373 hinfo->hash_version = root->info.hash_version; 375 374 hinfo->seed = EXT4_SB(dir->i_sb)->s_hash_seed; 376 - if (dentry) 377 - ext4fs_dirhash(dentry->d_name.name, dentry->d_name.len, hinfo); 375 + if (d_name) 376 + ext4fs_dirhash(d_name->name, d_name->len, hinfo); 378 377 hash = hinfo->hash; 379 378 380 379 if (root->info.unused_flags & 1) { ··· 407 406 goto fail; 408 407 } 409 408 410 - dxtrace (printk("Look up %x", hash)); 409 + dxtrace(printk("Look up %x", hash)); 411 410 while (1) 412 411 { 413 412 count = dx_get_count(entries); ··· 556 555 0, &err))) 557 556 return err; /* Failure */ 558 557 p++; 559 - brelse (p->bh); 558 + brelse(p->bh); 560 559 p->bh = bh; 561 560 p->at = p->entries = ((struct dx_node *) bh->b_data)->entries; 562 561 } ··· 594 593 /* On error, skip the f_pos to the next block. 
*/ 595 594 dir_file->f_pos = (dir_file->f_pos | 596 595 (dir->i_sb->s_blocksize - 1)) + 1; 597 - brelse (bh); 596 + brelse(bh); 598 597 return count; 599 598 } 600 599 ext4fs_dirhash(de->name, de->name_len, hinfo); ··· 636 635 int ret, err; 637 636 __u32 hashval; 638 637 639 - dxtrace(printk("In htree_fill_tree, start hash: %x:%x\n", start_hash, 640 - start_minor_hash)); 638 + dxtrace(printk(KERN_DEBUG "In htree_fill_tree, start hash: %x:%x\n", 639 + start_hash, start_minor_hash)); 641 640 dir = dir_file->f_path.dentry->d_inode; 642 641 if (!(EXT4_I(dir)->i_flags & EXT4_INDEX_FL)) { 643 642 hinfo.hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version; ··· 649 648 } 650 649 hinfo.hash = start_hash; 651 650 hinfo.minor_hash = 0; 652 - frame = dx_probe(NULL, dir_file->f_path.dentry->d_inode, &hinfo, frames, &err); 651 + frame = dx_probe(NULL, dir, &hinfo, frames, &err); 653 652 if (!frame) 654 653 return err; 655 654 ··· 695 694 break; 696 695 } 697 696 dx_release(frames); 698 - dxtrace(printk("Fill tree: returned %d entries, next hash: %x\n", 699 - count, *next_hash)); 697 + dxtrace(printk(KERN_DEBUG "Fill tree: returned %d entries, " 698 + "next hash: %x\n", count, *next_hash)); 700 699 return count; 701 700 errout: 702 701 dx_release(frames); ··· 803 802 /* 804 803 * Returns 0 if not found, -1 on failure, and 1 on success 805 804 */ 806 - static inline int search_dirblock(struct buffer_head * bh, 805 + static inline int search_dirblock(struct buffer_head *bh, 807 806 struct inode *dir, 808 - struct dentry *dentry, 807 + const struct qstr *d_name, 809 808 unsigned long offset, 810 809 struct ext4_dir_entry_2 ** res_dir) 811 810 { 812 811 struct ext4_dir_entry_2 * de; 813 812 char * dlimit; 814 813 int de_len; 815 - const char *name = dentry->d_name.name; 816 - int namelen = dentry->d_name.len; 814 + const char *name = d_name->name; 815 + int namelen = d_name->len; 817 816 818 817 de = (struct ext4_dir_entry_2 *) bh->b_data; 819 818 dlimit = bh->b_data + 
dir->i_sb->s_blocksize; ··· 852 851 * The returned buffer_head has ->b_count elevated. The caller is expected 853 852 * to brelse() it when appropriate. 854 853 */ 855 - static struct buffer_head * ext4_find_entry (struct dentry *dentry, 854 + static struct buffer_head * ext4_find_entry (struct inode *dir, 855 + const struct qstr *d_name, 856 856 struct ext4_dir_entry_2 ** res_dir) 857 857 { 858 - struct super_block * sb; 859 - struct buffer_head * bh_use[NAMEI_RA_SIZE]; 860 - struct buffer_head * bh, *ret = NULL; 858 + struct super_block *sb; 859 + struct buffer_head *bh_use[NAMEI_RA_SIZE]; 860 + struct buffer_head *bh, *ret = NULL; 861 861 ext4_lblk_t start, block, b; 862 862 int ra_max = 0; /* Number of bh's in the readahead 863 863 buffer, bh_use[] */ ··· 867 865 int num = 0; 868 866 ext4_lblk_t nblocks; 869 867 int i, err; 870 - struct inode *dir = dentry->d_parent->d_inode; 871 868 int namelen; 872 869 873 870 *res_dir = NULL; 874 871 sb = dir->i_sb; 875 - namelen = dentry->d_name.len; 872 + namelen = d_name->len; 876 873 if (namelen > EXT4_NAME_LEN) 877 874 return NULL; 878 875 if (is_dx(dir)) { 879 - bh = ext4_dx_find_entry(dentry, res_dir, &err); 876 + bh = ext4_dx_find_entry(dir, d_name, res_dir, &err); 880 877 /* 881 878 * On success, or if the error was file not found, 882 879 * return. 
Otherwise, fall back to doing a search the ··· 883 882 */ 884 883 if (bh || (err != ERR_BAD_DX_DIR)) 885 884 return bh; 886 - dxtrace(printk("ext4_find_entry: dx failed, falling back\n")); 885 + dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " 886 + "falling back\n")); 887 887 } 888 888 nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb); 889 889 start = EXT4_I(dir)->i_dir_start_lookup; ··· 928 926 brelse(bh); 929 927 goto next; 930 928 } 931 - i = search_dirblock(bh, dir, dentry, 929 + i = search_dirblock(bh, dir, d_name, 932 930 block << EXT4_BLOCK_SIZE_BITS(sb), res_dir); 933 931 if (i == 1) { 934 932 EXT4_I(dir)->i_dir_start_lookup = block; ··· 958 956 cleanup_and_exit: 959 957 /* Clean up the read-ahead blocks */ 960 958 for (; ra_ptr < ra_max; ra_ptr++) 961 - brelse (bh_use[ra_ptr]); 959 + brelse(bh_use[ra_ptr]); 962 960 return ret; 963 961 } 964 962 965 - static struct buffer_head * ext4_dx_find_entry(struct dentry *dentry, 963 + static struct buffer_head * ext4_dx_find_entry(struct inode *dir, const struct qstr *d_name, 966 964 struct ext4_dir_entry_2 **res_dir, int *err) 967 965 { 968 966 struct super_block * sb; ··· 973 971 struct buffer_head *bh; 974 972 ext4_lblk_t block; 975 973 int retval; 976 - int namelen = dentry->d_name.len; 977 - const u8 *name = dentry->d_name.name; 978 - struct inode *dir = dentry->d_parent->d_inode; 974 + int namelen = d_name->len; 975 + const u8 *name = d_name->name; 979 976 980 977 sb = dir->i_sb; 981 978 /* NFS may look up ".." - look at dx_root directory block */ 982 979 if (namelen > 2 || name[0] != '.'||(name[1] != '.' 
&& name[1] != '\0')){ 983 - if (!(frame = dx_probe(dentry, NULL, &hinfo, frames, err))) 980 + if (!(frame = dx_probe(d_name, dir, &hinfo, frames, err))) 984 981 return NULL; 985 982 } else { 986 983 frame = frames; ··· 1011 1010 return bh; 1012 1011 } 1013 1012 } 1014 - brelse (bh); 1013 + brelse(bh); 1015 1014 /* Check to see if we should continue to search */ 1016 1015 retval = ext4_htree_next_block(dir, hash, frame, 1017 1016 frames, NULL); ··· 1026 1025 1027 1026 *err = -ENOENT; 1028 1027 errout: 1029 - dxtrace(printk("%s not found\n", name)); 1028 + dxtrace(printk(KERN_DEBUG "%s not found\n", name)); 1030 1029 dx_release (frames); 1031 1030 return NULL; 1032 1031 } 1033 1032 1034 - static struct dentry *ext4_lookup(struct inode * dir, struct dentry *dentry, struct nameidata *nd) 1033 + static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd) 1035 1034 { 1036 - struct inode * inode; 1037 - struct ext4_dir_entry_2 * de; 1038 - struct buffer_head * bh; 1035 + struct inode *inode; 1036 + struct ext4_dir_entry_2 *de; 1037 + struct buffer_head *bh; 1039 1038 1040 1039 if (dentry->d_name.len > EXT4_NAME_LEN) 1041 1040 return ERR_PTR(-ENAMETOOLONG); 1042 1041 1043 - bh = ext4_find_entry(dentry, &de); 1042 + bh = ext4_find_entry(dir, &dentry->d_name, &de); 1044 1043 inode = NULL; 1045 1044 if (bh) { 1046 1045 unsigned long ino = le32_to_cpu(de->inode); 1047 - brelse (bh); 1046 + brelse(bh); 1048 1047 if (!ext4_valid_inum(dir->i_sb, ino)) { 1049 1048 ext4_error(dir->i_sb, "ext4_lookup", 1050 1049 "bad inode number: %lu", ino); ··· 1063 1062 unsigned long ino; 1064 1063 struct dentry *parent; 1065 1064 struct inode *inode; 1066 - struct dentry dotdot; 1065 + static const struct qstr dotdot = { 1066 + .name = "..", 1067 + .len = 2, 1068 + }; 1067 1069 struct ext4_dir_entry_2 * de; 1068 1070 struct buffer_head *bh; 1069 1071 1070 - dotdot.d_name.name = ".."; 1071 - dotdot.d_name.len = 2; 1072 - dotdot.d_parent = child; /* confusing, 
isn't it! */ 1073 - 1074 - bh = ext4_find_entry(&dotdot, &de); 1072 + bh = ext4_find_entry(child->d_inode, &dotdot, &de); 1075 1073 inode = NULL; 1076 1074 if (!bh) 1077 1075 return ERR_PTR(-ENOENT); ··· 1201 1201 1202 1202 /* create map in the end of data2 block */ 1203 1203 map = (struct dx_map_entry *) (data2 + blocksize); 1204 - count = dx_make_map ((struct ext4_dir_entry_2 *) data1, 1204 + count = dx_make_map((struct ext4_dir_entry_2 *) data1, 1205 1205 blocksize, hinfo, map); 1206 1206 map -= count; 1207 - dx_sort_map (map, count); 1207 + dx_sort_map(map, count); 1208 1208 /* Split the existing block in the middle, size-wise */ 1209 1209 size = 0; 1210 1210 move = 0; ··· 1225 1225 1226 1226 /* Fancy dance to stay within two buffers */ 1227 1227 de2 = dx_move_dirents(data1, data2, map + split, count - split); 1228 - de = dx_pack_dirents(data1,blocksize); 1228 + de = dx_pack_dirents(data1, blocksize); 1229 1229 de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de); 1230 1230 de2->rec_len = ext4_rec_len_to_disk(data2 + blocksize - (char *) de2); 1231 1231 dxtrace(dx_show_leaf (hinfo, (struct ext4_dir_entry_2 *) data1, blocksize, 1)); ··· 1237 1237 swap(*bh, bh2); 1238 1238 de = de2; 1239 1239 } 1240 - dx_insert_block (frame, hash2 + continued, newblock); 1241 - err = ext4_journal_dirty_metadata (handle, bh2); 1240 + dx_insert_block(frame, hash2 + continued, newblock); 1241 + err = ext4_journal_dirty_metadata(handle, bh2); 1242 1242 if (err) 1243 1243 goto journal_error; 1244 - err = ext4_journal_dirty_metadata (handle, frame->bh); 1244 + err = ext4_journal_dirty_metadata(handle, frame->bh); 1245 1245 if (err) 1246 1246 goto journal_error; 1247 - brelse (bh2); 1248 - dxtrace(dx_show_index ("frame", frame->entries)); 1247 + brelse(bh2); 1248 + dxtrace(dx_show_index("frame", frame->entries)); 1249 1249 return de; 1250 1250 1251 1251 journal_error: ··· 1271 1271 */ 1272 1272 static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry, 1273 
1273 struct inode *inode, struct ext4_dir_entry_2 *de, 1274 - struct buffer_head * bh) 1274 + struct buffer_head *bh) 1275 1275 { 1276 1276 struct inode *dir = dentry->d_parent->d_inode; 1277 1277 const char *name = dentry->d_name.name; ··· 1288 1288 while ((char *) de <= top) { 1289 1289 if (!ext4_check_dir_entry("ext4_add_entry", dir, de, 1290 1290 bh, offset)) { 1291 - brelse (bh); 1291 + brelse(bh); 1292 1292 return -EIO; 1293 1293 } 1294 - if (ext4_match (namelen, name, de)) { 1295 - brelse (bh); 1294 + if (ext4_match(namelen, name, de)) { 1295 + brelse(bh); 1296 1296 return -EEXIST; 1297 1297 } 1298 1298 nlen = EXT4_DIR_REC_LEN(de->name_len); ··· 1329 1329 } else 1330 1330 de->inode = 0; 1331 1331 de->name_len = namelen; 1332 - memcpy (de->name, name, namelen); 1332 + memcpy(de->name, name, namelen); 1333 1333 /* 1334 1334 * XXX shouldn't update any times until successful 1335 1335 * completion of syscall, but too many callers depend ··· 1377 1377 struct fake_dirent *fde; 1378 1378 1379 1379 blocksize = dir->i_sb->s_blocksize; 1380 - dxtrace(printk("Creating index\n")); 1380 + dxtrace(printk(KERN_DEBUG "Creating index\n")); 1381 1381 retval = ext4_journal_get_write_access(handle, bh); 1382 1382 if (retval) { 1383 1383 ext4_std_error(dir->i_sb, retval); ··· 1386 1386 } 1387 1387 root = (struct dx_root *) bh->b_data; 1388 1388 1389 - bh2 = ext4_append (handle, dir, &block, &retval); 1389 + bh2 = ext4_append(handle, dir, &block, &retval); 1390 1390 if (!(bh2)) { 1391 1391 brelse(bh); 1392 1392 return retval; ··· 1412 1412 root->info.info_length = sizeof(root->info); 1413 1413 root->info.hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version; 1414 1414 entries = root->entries; 1415 - dx_set_block (entries, 1); 1416 - dx_set_count (entries, 1); 1417 - dx_set_limit (entries, dx_root_limit(dir, sizeof(root->info))); 1415 + dx_set_block(entries, 1); 1416 + dx_set_count(entries, 1); 1417 + dx_set_limit(entries, dx_root_limit(dir, sizeof(root->info))); 1418 1418 1419 
1419 /* Initialize as for dx_probe */ 1420 1420 hinfo.hash_version = root->info.hash_version; ··· 1443 1443 * may not sleep between calling this and putting something into 1444 1444 * the entry, as someone else might have used it while you slept. 1445 1445 */ 1446 - static int ext4_add_entry (handle_t *handle, struct dentry *dentry, 1447 - struct inode *inode) 1446 + static int ext4_add_entry(handle_t *handle, struct dentry *dentry, 1447 + struct inode *inode) 1448 1448 { 1449 1449 struct inode *dir = dentry->d_parent->d_inode; 1450 1450 unsigned long offset; 1451 - struct buffer_head * bh; 1451 + struct buffer_head *bh; 1452 1452 struct ext4_dir_entry_2 *de; 1453 - struct super_block * sb; 1453 + struct super_block *sb; 1454 1454 int retval; 1455 1455 int dx_fallback=0; 1456 1456 unsigned blocksize; ··· 1500 1500 struct dx_frame frames[2], *frame; 1501 1501 struct dx_entry *entries, *at; 1502 1502 struct dx_hash_info hinfo; 1503 - struct buffer_head * bh; 1503 + struct buffer_head *bh; 1504 1504 struct inode *dir = dentry->d_parent->d_inode; 1505 - struct super_block * sb = dir->i_sb; 1505 + struct super_block *sb = dir->i_sb; 1506 1506 struct ext4_dir_entry_2 *de; 1507 1507 int err; 1508 1508 1509 - frame = dx_probe(dentry, NULL, &hinfo, frames, &err); 1509 + frame = dx_probe(&dentry->d_name, dir, &hinfo, frames, &err); 1510 1510 if (!frame) 1511 1511 return err; 1512 1512 entries = frame->entries; ··· 1527 1527 } 1528 1528 1529 1529 /* Block full, should compress but for now just split */ 1530 - dxtrace(printk("using %u of %u node entries\n", 1530 + dxtrace(printk(KERN_DEBUG "using %u of %u node entries\n", 1531 1531 dx_get_count(entries), dx_get_limit(entries))); 1532 1532 /* Need to split index? 
*/ 1533 1533 if (dx_get_count(entries) == dx_get_limit(entries)) { ··· 1559 1559 if (levels) { 1560 1560 unsigned icount1 = icount/2, icount2 = icount - icount1; 1561 1561 unsigned hash2 = dx_get_hash(entries + icount1); 1562 - dxtrace(printk("Split index %i/%i\n", icount1, icount2)); 1562 + dxtrace(printk(KERN_DEBUG "Split index %i/%i\n", 1563 + icount1, icount2)); 1563 1564 1564 1565 BUFFER_TRACE(frame->bh, "get_write_access"); /* index root */ 1565 1566 err = ext4_journal_get_write_access(handle, ··· 1568 1567 if (err) 1569 1568 goto journal_error; 1570 1569 1571 - memcpy ((char *) entries2, (char *) (entries + icount1), 1572 - icount2 * sizeof(struct dx_entry)); 1573 - dx_set_count (entries, icount1); 1574 - dx_set_count (entries2, icount2); 1575 - dx_set_limit (entries2, dx_node_limit(dir)); 1570 + memcpy((char *) entries2, (char *) (entries + icount1), 1571 + icount2 * sizeof(struct dx_entry)); 1572 + dx_set_count(entries, icount1); 1573 + dx_set_count(entries2, icount2); 1574 + dx_set_limit(entries2, dx_node_limit(dir)); 1576 1575 1577 1576 /* Which index block gets the new entry? 
*/ 1578 1577 if (at - entries >= icount1) { ··· 1580 1579 frame->entries = entries = entries2; 1581 1580 swap(frame->bh, bh2); 1582 1581 } 1583 - dx_insert_block (frames + 0, hash2, newblock); 1584 - dxtrace(dx_show_index ("node", frames[1].entries)); 1585 - dxtrace(dx_show_index ("node", 1582 + dx_insert_block(frames + 0, hash2, newblock); 1583 + dxtrace(dx_show_index("node", frames[1].entries)); 1584 + dxtrace(dx_show_index("node", 1586 1585 ((struct dx_node *) bh2->b_data)->entries)); 1587 1586 err = ext4_journal_dirty_metadata(handle, bh2); 1588 1587 if (err) 1589 1588 goto journal_error; 1590 1589 brelse (bh2); 1591 1590 } else { 1592 - dxtrace(printk("Creating second level index...\n")); 1591 + dxtrace(printk(KERN_DEBUG 1592 + "Creating second level index...\n")); 1593 1593 memcpy((char *) entries2, (char *) entries, 1594 1594 icount * sizeof(struct dx_entry)); 1595 1595 dx_set_limit(entries2, dx_node_limit(dir)); ··· 1632 1630 * ext4_delete_entry deletes a directory entry by merging it with the 1633 1631 * previous entry 1634 1632 */ 1635 - static int ext4_delete_entry (handle_t *handle, 1636 - struct inode * dir, 1637 - struct ext4_dir_entry_2 * de_del, 1638 - struct buffer_head * bh) 1633 + static int ext4_delete_entry(handle_t *handle, 1634 + struct inode *dir, 1635 + struct ext4_dir_entry_2 *de_del, 1636 + struct buffer_head *bh) 1639 1637 { 1640 - struct ext4_dir_entry_2 * de, * pde; 1638 + struct ext4_dir_entry_2 *de, *pde; 1641 1639 int i; 1642 1640 1643 1641 i = 0; ··· 1718 1716 * If the create succeeds, we fill in the inode information 1719 1717 * with d_instantiate(). 
1720 1718 */ 1721 - static int ext4_create (struct inode * dir, struct dentry * dentry, int mode, 1722 - struct nameidata *nd) 1719 + static int ext4_create(struct inode *dir, struct dentry *dentry, int mode, 1720 + struct nameidata *nd) 1723 1721 { 1724 1722 handle_t *handle; 1725 - struct inode * inode; 1723 + struct inode *inode; 1726 1724 int err, retries = 0; 1727 1725 1728 1726 retry: ··· 1749 1747 return err; 1750 1748 } 1751 1749 1752 - static int ext4_mknod (struct inode * dir, struct dentry *dentry, 1753 - int mode, dev_t rdev) 1750 + static int ext4_mknod(struct inode *dir, struct dentry *dentry, 1751 + int mode, dev_t rdev) 1754 1752 { 1755 1753 handle_t *handle; 1756 1754 struct inode *inode; ··· 1769 1767 if (IS_DIRSYNC(dir)) 1770 1768 handle->h_sync = 1; 1771 1769 1772 - inode = ext4_new_inode (handle, dir, mode); 1770 + inode = ext4_new_inode(handle, dir, mode); 1773 1771 err = PTR_ERR(inode); 1774 1772 if (!IS_ERR(inode)) { 1775 1773 init_special_inode(inode, inode->i_mode, rdev); 1776 - #ifdef CONFIG_EXT4DEV_FS_XATTR 1774 + #ifdef CONFIG_EXT4_FS_XATTR 1777 1775 inode->i_op = &ext4_special_inode_operations; 1778 1776 #endif 1779 1777 err = ext4_add_nondir(handle, dentry, inode); ··· 1784 1782 return err; 1785 1783 } 1786 1784 1787 - static int ext4_mkdir(struct inode * dir, struct dentry * dentry, int mode) 1785 + static int ext4_mkdir(struct inode *dir, struct dentry *dentry, int mode) 1788 1786 { 1789 1787 handle_t *handle; 1790 - struct inode * inode; 1791 - struct buffer_head * dir_block; 1792 - struct ext4_dir_entry_2 * de; 1788 + struct inode *inode; 1789 + struct buffer_head *dir_block; 1790 + struct ext4_dir_entry_2 *de; 1793 1791 int err, retries = 0; 1794 1792 1795 1793 if (EXT4_DIR_LINK_MAX(dir)) ··· 1805 1803 if (IS_DIRSYNC(dir)) 1806 1804 handle->h_sync = 1; 1807 1805 1808 - inode = ext4_new_inode (handle, dir, S_IFDIR | mode); 1806 + inode = ext4_new_inode(handle, dir, S_IFDIR | mode); 1809 1807 err = PTR_ERR(inode); 1810 1808 if 
(IS_ERR(inode)) 1811 1809 goto out_stop; ··· 1813 1811 inode->i_op = &ext4_dir_inode_operations; 1814 1812 inode->i_fop = &ext4_dir_operations; 1815 1813 inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize; 1816 - dir_block = ext4_bread (handle, inode, 0, 1, &err); 1814 + dir_block = ext4_bread(handle, inode, 0, 1, &err); 1817 1815 if (!dir_block) 1818 1816 goto out_clear_inode; 1819 1817 BUFFER_TRACE(dir_block, "get_write_access"); ··· 1822 1820 de->inode = cpu_to_le32(inode->i_ino); 1823 1821 de->name_len = 1; 1824 1822 de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len)); 1825 - strcpy (de->name, "."); 1823 + strcpy(de->name, "."); 1826 1824 ext4_set_de_type(dir->i_sb, de, S_IFDIR); 1827 1825 de = ext4_next_entry(de); 1828 1826 de->inode = cpu_to_le32(dir->i_ino); 1829 1827 de->rec_len = ext4_rec_len_to_disk(inode->i_sb->s_blocksize - 1830 1828 EXT4_DIR_REC_LEN(1)); 1831 1829 de->name_len = 2; 1832 - strcpy (de->name, ".."); 1830 + strcpy(de->name, ".."); 1833 1831 ext4_set_de_type(dir->i_sb, de, S_IFDIR); 1834 1832 inode->i_nlink = 2; 1835 1833 BUFFER_TRACE(dir_block, "call ext4_journal_dirty_metadata"); 1836 1834 ext4_journal_dirty_metadata(handle, dir_block); 1837 - brelse (dir_block); 1835 + brelse(dir_block); 1838 1836 ext4_mark_inode_dirty(handle, inode); 1839 - err = ext4_add_entry (handle, dentry, inode); 1837 + err = ext4_add_entry(handle, dentry, inode); 1840 1838 if (err) { 1841 1839 out_clear_inode: 1842 1840 clear_nlink(inode); 1843 1841 ext4_mark_inode_dirty(handle, inode); 1844 - iput (inode); 1842 + iput(inode); 1845 1843 goto out_stop; 1846 1844 } 1847 1845 ext4_inc_count(handle, dir); ··· 1858 1856 /* 1859 1857 * routine to check that the specified directory is empty (for rmdir) 1860 1858 */ 1861 - static int empty_dir (struct inode * inode) 1859 + static int empty_dir(struct inode *inode) 1862 1860 { 1863 1861 unsigned long offset; 1864 - struct buffer_head * bh; 1865 - struct ext4_dir_entry_2 * de, * de1; 1866 - 
struct super_block * sb; 1862 + struct buffer_head *bh; 1863 + struct ext4_dir_entry_2 *de, *de1; 1864 + struct super_block *sb; 1867 1865 int err = 0; 1868 1866 1869 1867 sb = inode->i_sb; 1870 1868 if (inode->i_size < EXT4_DIR_REC_LEN(1) + EXT4_DIR_REC_LEN(2) || 1871 - !(bh = ext4_bread (NULL, inode, 0, 0, &err))) { 1869 + !(bh = ext4_bread(NULL, inode, 0, 0, &err))) { 1872 1870 if (err) 1873 1871 ext4_error(inode->i_sb, __func__, 1874 1872 "error %d reading directory #%lu offset 0", ··· 1883 1881 de1 = ext4_next_entry(de); 1884 1882 if (le32_to_cpu(de->inode) != inode->i_ino || 1885 1883 !le32_to_cpu(de1->inode) || 1886 - strcmp (".", de->name) || 1887 - strcmp ("..", de1->name)) { 1888 - ext4_warning (inode->i_sb, "empty_dir", 1889 - "bad directory (dir #%lu) - no `.' or `..'", 1890 - inode->i_ino); 1891 - brelse (bh); 1884 + strcmp(".", de->name) || 1885 + strcmp("..", de1->name)) { 1886 + ext4_warning(inode->i_sb, "empty_dir", 1887 + "bad directory (dir #%lu) - no `.' or `..'", 1888 + inode->i_ino); 1889 + brelse(bh); 1892 1890 return 1; 1893 1891 } 1894 1892 offset = ext4_rec_len_from_disk(de->rec_len) + 1895 1893 ext4_rec_len_from_disk(de1->rec_len); 1896 1894 de = ext4_next_entry(de1); 1897 - while (offset < inode->i_size ) { 1895 + while (offset < inode->i_size) { 1898 1896 if (!bh || 1899 1897 (void *) de >= (void *) (bh->b_data+sb->s_blocksize)) { 1900 1898 err = 0; 1901 - brelse (bh); 1902 - bh = ext4_bread (NULL, inode, 1899 + brelse(bh); 1900 + bh = ext4_bread(NULL, inode, 1903 1901 offset >> EXT4_BLOCK_SIZE_BITS(sb), 0, &err); 1904 1902 if (!bh) { 1905 1903 if (err) ··· 1919 1917 continue; 1920 1918 } 1921 1919 if (le32_to_cpu(de->inode)) { 1922 - brelse (bh); 1920 + brelse(bh); 1923 1921 return 0; 1924 1922 } 1925 1923 offset += ext4_rec_len_from_disk(de->rec_len); 1926 1924 de = ext4_next_entry(de); 1927 1925 } 1928 - brelse (bh); 1926 + brelse(bh); 1929 1927 return 1; 1930 1928 } 1931 1929 ··· 1956 1954 * ->i_nlink. For, say it, character device. 
Not a regular file, 1957 1955 * not a directory, not a symlink and ->i_nlink > 0. 1958 1956 */ 1959 - J_ASSERT ((S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || 1960 - S_ISLNK(inode->i_mode)) || inode->i_nlink == 0); 1957 + J_ASSERT((S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || 1958 + S_ISLNK(inode->i_mode)) || inode->i_nlink == 0); 1961 1959 1962 1960 BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get_write_access"); 1963 1961 err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh); ··· 2071 2069 goto out_err; 2072 2070 } 2073 2071 2074 - static int ext4_rmdir (struct inode * dir, struct dentry *dentry) 2072 + static int ext4_rmdir(struct inode *dir, struct dentry *dentry) 2075 2073 { 2076 2074 int retval; 2077 - struct inode * inode; 2078 - struct buffer_head * bh; 2079 - struct ext4_dir_entry_2 * de; 2075 + struct inode *inode; 2076 + struct buffer_head *bh; 2077 + struct ext4_dir_entry_2 *de; 2080 2078 handle_t *handle; 2081 2079 2082 2080 /* Initialize quotas before so that eventual writes go in ··· 2087 2085 return PTR_ERR(handle); 2088 2086 2089 2087 retval = -ENOENT; 2090 - bh = ext4_find_entry (dentry, &de); 2088 + bh = ext4_find_entry(dir, &dentry->d_name, &de); 2091 2089 if (!bh) 2092 2090 goto end_rmdir; 2093 2091 ··· 2101 2099 goto end_rmdir; 2102 2100 2103 2101 retval = -ENOTEMPTY; 2104 - if (!empty_dir (inode)) 2102 + if (!empty_dir(inode)) 2105 2103 goto end_rmdir; 2106 2104 2107 2105 retval = ext4_delete_entry(handle, dir, de, bh); 2108 2106 if (retval) 2109 2107 goto end_rmdir; 2110 2108 if (!EXT4_DIR_LINK_EMPTY(inode)) 2111 - ext4_warning (inode->i_sb, "ext4_rmdir", 2112 - "empty directory has too many links (%d)", 2113 - inode->i_nlink); 2109 + ext4_warning(inode->i_sb, "ext4_rmdir", 2110 + "empty directory has too many links (%d)", 2111 + inode->i_nlink); 2114 2112 inode->i_version++; 2115 2113 clear_nlink(inode); 2116 2114 /* There's no need to set i_disksize: the fact that i_nlink is ··· 2126 2124 2127 2125 end_rmdir: 2128 2126 
ext4_journal_stop(handle); 2129 - brelse (bh); 2127 + brelse(bh); 2130 2128 return retval; 2131 2129 } 2132 2130 2133 - static int ext4_unlink(struct inode * dir, struct dentry *dentry) 2131 + static int ext4_unlink(struct inode *dir, struct dentry *dentry) 2134 2132 { 2135 2133 int retval; 2136 - struct inode * inode; 2137 - struct buffer_head * bh; 2138 - struct ext4_dir_entry_2 * de; 2134 + struct inode *inode; 2135 + struct buffer_head *bh; 2136 + struct ext4_dir_entry_2 *de; 2139 2137 handle_t *handle; 2140 2138 2141 2139 /* Initialize quotas before so that eventual writes go ··· 2149 2147 handle->h_sync = 1; 2150 2148 2151 2149 retval = -ENOENT; 2152 - bh = ext4_find_entry (dentry, &de); 2150 + bh = ext4_find_entry(dir, &dentry->d_name, &de); 2153 2151 if (!bh) 2154 2152 goto end_unlink; 2155 2153 ··· 2160 2158 goto end_unlink; 2161 2159 2162 2160 if (!inode->i_nlink) { 2163 - ext4_warning (inode->i_sb, "ext4_unlink", 2164 - "Deleting nonexistent file (%lu), %d", 2165 - inode->i_ino, inode->i_nlink); 2161 + ext4_warning(inode->i_sb, "ext4_unlink", 2162 + "Deleting nonexistent file (%lu), %d", 2163 + inode->i_ino, inode->i_nlink); 2166 2164 inode->i_nlink = 1; 2167 2165 } 2168 2166 retval = ext4_delete_entry(handle, dir, de, bh); ··· 2180 2178 2181 2179 end_unlink: 2182 2180 ext4_journal_stop(handle); 2183 - brelse (bh); 2181 + brelse(bh); 2184 2182 return retval; 2185 2183 } 2186 2184 2187 - static int ext4_symlink (struct inode * dir, 2188 - struct dentry *dentry, const char * symname) 2185 + static int ext4_symlink(struct inode *dir, 2186 + struct dentry *dentry, const char *symname) 2189 2187 { 2190 2188 handle_t *handle; 2191 - struct inode * inode; 2189 + struct inode *inode; 2192 2190 int l, err, retries = 0; 2193 2191 2194 2192 l = strlen(symname)+1; ··· 2205 2203 if (IS_DIRSYNC(dir)) 2206 2204 handle->h_sync = 1; 2207 2205 2208 - inode = ext4_new_inode (handle, dir, S_IFLNK|S_IRWXUGO); 2206 + inode = ext4_new_inode(handle, dir, S_IFLNK|S_IRWXUGO); 
2209 2207 err = PTR_ERR(inode); 2210 2208 if (IS_ERR(inode)) 2211 2209 goto out_stop; 2212 2210 2213 - if (l > sizeof (EXT4_I(inode)->i_data)) { 2211 + if (l > sizeof(EXT4_I(inode)->i_data)) { 2214 2212 inode->i_op = &ext4_symlink_inode_operations; 2215 2213 ext4_set_aops(inode); 2216 2214 /* ··· 2223 2221 if (err) { 2224 2222 clear_nlink(inode); 2225 2223 ext4_mark_inode_dirty(handle, inode); 2226 - iput (inode); 2224 + iput(inode); 2227 2225 goto out_stop; 2228 2226 } 2229 2227 } else { 2230 2228 /* clear the extent format for fast symlink */ 2231 2229 EXT4_I(inode)->i_flags &= ~EXT4_EXTENTS_FL; 2232 2230 inode->i_op = &ext4_fast_symlink_inode_operations; 2233 - memcpy((char*)&EXT4_I(inode)->i_data,symname,l); 2231 + memcpy((char *)&EXT4_I(inode)->i_data, symname, l); 2234 2232 inode->i_size = l-1; 2235 2233 } 2236 2234 EXT4_I(inode)->i_disksize = inode->i_size; ··· 2242 2240 return err; 2243 2241 } 2244 2242 2245 - static int ext4_link (struct dentry * old_dentry, 2246 - struct inode * dir, struct dentry *dentry) 2243 + static int ext4_link(struct dentry *old_dentry, 2244 + struct inode *dir, struct dentry *dentry) 2247 2245 { 2248 2246 handle_t *handle; 2249 2247 struct inode *inode = old_dentry->d_inode; ··· 2286 2284 * Anybody can rename anything with this: the permission checks are left to the 2287 2285 * higher-level routines. 
2288 2286 */ 2289 - static int ext4_rename (struct inode * old_dir, struct dentry *old_dentry, 2290 - struct inode * new_dir,struct dentry *new_dentry) 2287 + static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry, 2288 + struct inode *new_dir, struct dentry *new_dentry) 2291 2289 { 2292 2290 handle_t *handle; 2293 - struct inode * old_inode, * new_inode; 2294 - struct buffer_head * old_bh, * new_bh, * dir_bh; 2295 - struct ext4_dir_entry_2 * old_de, * new_de; 2291 + struct inode *old_inode, *new_inode; 2292 + struct buffer_head *old_bh, *new_bh, *dir_bh; 2293 + struct ext4_dir_entry_2 *old_de, *new_de; 2296 2294 int retval; 2297 2295 2298 2296 old_bh = new_bh = dir_bh = NULL; ··· 2310 2308 if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir)) 2311 2309 handle->h_sync = 1; 2312 2310 2313 - old_bh = ext4_find_entry (old_dentry, &old_de); 2311 + old_bh = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de); 2314 2312 /* 2315 2313 * Check for inode number is _not_ due to possible IO errors. 
2316 2314 * We might rmdir the source, keep it as pwd of some process ··· 2323 2321 goto end_rename; 2324 2322 2325 2323 new_inode = new_dentry->d_inode; 2326 - new_bh = ext4_find_entry (new_dentry, &new_de); 2324 + new_bh = ext4_find_entry(new_dir, &new_dentry->d_name, &new_de); 2327 2325 if (new_bh) { 2328 2326 if (!new_inode) { 2329 - brelse (new_bh); 2327 + brelse(new_bh); 2330 2328 new_bh = NULL; 2331 2329 } 2332 2330 } 2333 2331 if (S_ISDIR(old_inode->i_mode)) { 2334 2332 if (new_inode) { 2335 2333 retval = -ENOTEMPTY; 2336 - if (!empty_dir (new_inode)) 2334 + if (!empty_dir(new_inode)) 2337 2335 goto end_rename; 2338 2336 } 2339 2337 retval = -EIO; 2340 - dir_bh = ext4_bread (handle, old_inode, 0, 0, &retval); 2338 + dir_bh = ext4_bread(handle, old_inode, 0, 0, &retval); 2341 2339 if (!dir_bh) 2342 2340 goto end_rename; 2343 2341 if (le32_to_cpu(PARENT_INO(dir_bh->b_data)) != old_dir->i_ino) 2344 2342 goto end_rename; 2345 2343 retval = -EMLINK; 2346 - if (!new_inode && new_dir!=old_dir && 2344 + if (!new_inode && new_dir != old_dir && 2347 2345 new_dir->i_nlink >= EXT4_LINK_MAX) 2348 2346 goto end_rename; 2349 2347 } 2350 2348 if (!new_bh) { 2351 - retval = ext4_add_entry (handle, new_dentry, old_inode); 2349 + retval = ext4_add_entry(handle, new_dentry, old_inode); 2352 2350 if (retval) 2353 2351 goto end_rename; 2354 2352 } else { ··· 2390 2388 struct buffer_head *old_bh2; 2391 2389 struct ext4_dir_entry_2 *old_de2; 2392 2390 2393 - old_bh2 = ext4_find_entry(old_dentry, &old_de2); 2391 + old_bh2 = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de2); 2394 2392 if (old_bh2) { 2395 2393 retval = ext4_delete_entry(handle, old_dir, 2396 2394 old_de2, old_bh2); ··· 2435 2433 retval = 0; 2436 2434 2437 2435 end_rename: 2438 - brelse (dir_bh); 2439 - brelse (old_bh); 2440 - brelse (new_bh); 2436 + brelse(dir_bh); 2437 + brelse(old_bh); 2438 + brelse(new_bh); 2441 2439 ext4_journal_stop(handle); 2442 2440 return retval; 2443 2441 } ··· 2456 2454 .mknod = 
ext4_mknod, 2457 2455 .rename = ext4_rename, 2458 2456 .setattr = ext4_setattr, 2459 - #ifdef CONFIG_EXT4DEV_FS_XATTR 2457 + #ifdef CONFIG_EXT4_FS_XATTR 2460 2458 .setxattr = generic_setxattr, 2461 2459 .getxattr = generic_getxattr, 2462 2460 .listxattr = ext4_listxattr, ··· 2467 2465 2468 2466 const struct inode_operations ext4_special_inode_operations = { 2469 2467 .setattr = ext4_setattr, 2470 - #ifdef CONFIG_EXT4DEV_FS_XATTR 2468 + #ifdef CONFIG_EXT4_FS_XATTR 2471 2469 .setxattr = generic_setxattr, 2472 2470 .getxattr = generic_getxattr, 2473 2471 .listxattr = ext4_listxattr,
+24 -9
fs/ext4/resize.c
··· 416 416 "EXT4-fs: ext4_add_new_gdb: adding group block %lu\n", 417 417 gdb_num); 418 418 419 - /* 420 - * If we are not using the primary superblock/GDT copy don't resize, 419 + /* 420 + * If we are not using the primary superblock/GDT copy don't resize, 421 421 * because the user tools have no way of handling this. Probably a 422 422 * bad time to do it anyways. 423 423 */ ··· 870 870 * We can allocate memory for mb_alloc based on the new group 871 871 * descriptor 872 872 */ 873 - if (test_opt(sb, MBALLOC)) { 874 - err = ext4_mb_add_more_groupinfo(sb, input->group, gdp); 875 - if (err) 876 - goto exit_journal; 877 - } 873 + err = ext4_mb_add_more_groupinfo(sb, input->group, gdp); 874 + if (err) 875 + goto exit_journal; 876 + 878 877 /* 879 878 * Make the new blocks and inodes valid next. We do this before 880 879 * increasing the group count so that once the group is enabled, ··· 928 929 percpu_counter_add(&sbi->s_freeinodes_counter, 929 930 EXT4_INODES_PER_GROUP(sb)); 930 931 932 + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) { 933 + ext4_group_t flex_group; 934 + flex_group = ext4_flex_group(sbi, input->group); 935 + sbi->s_flex_groups[flex_group].free_blocks += 936 + input->free_blocks_count; 937 + sbi->s_flex_groups[flex_group].free_inodes += 938 + EXT4_INODES_PER_GROUP(sb); 939 + } 940 + 931 941 ext4_journal_dirty_metadata(handle, sbi->s_sbh); 932 942 sb->s_dirt = 1; 933 943 ··· 972 964 ext4_group_t o_groups_count; 973 965 ext4_grpblk_t last; 974 966 ext4_grpblk_t add; 975 - struct buffer_head * bh; 967 + struct buffer_head *bh; 976 968 handle_t *handle; 977 969 int err; 978 970 unsigned long freed_blocks; ··· 1085 1077 /* 1086 1078 * Mark mballoc pages as not up to date so that they will be updated 1087 1079 * next time they are loaded by ext4_mb_load_buddy. 1080 + * 1081 + * XXX Bad, Bad, BAD!!! 
We should not be overloading the 1082 + Uptodate flag, particularly on the bitmap bh, as a way of 1083 + hinting to ext4_mb_load_buddy() that it needs to be 1084 + overloaded. A user could take an LVM snapshot, then do an 1085 + on-line fsck, and clear the uptodate flag, and this would 1086 + not be a bug in userspace, but a bug in the kernel. FIXME!!! 1088 1087 */ 1089 - if (test_opt(sb, MBALLOC)) { 1088 + { 1090 1089 struct ext4_sb_info *sbi = EXT4_SB(sb); 1091 1090 struct inode *inode = sbi->s_buddy_cache; 1092 1091 int blocks_per_page;
+193 -83
fs/ext4/super.c
··· 34 34 #include <linux/namei.h> 35 35 #include <linux/quotaops.h> 36 36 #include <linux/seq_file.h> 37 + #include <linux/proc_fs.h> 38 + #include <linux/marker.h> 37 39 #include <linux/log2.h> 38 40 #include <linux/crc16.h> 39 41 #include <asm/uaccess.h> ··· 46 44 #include "acl.h" 47 45 #include "namei.h" 48 46 #include "group.h" 47 + 48 + struct proc_dir_entry *ext4_proc_root; 49 49 50 50 static int ext4_load_journal(struct super_block *, struct ext4_super_block *, 51 51 unsigned long journal_devnum); ··· 512 508 if (!(sb->s_flags & MS_RDONLY)) { 513 509 EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER); 514 510 es->s_state = cpu_to_le16(sbi->s_mount_state); 515 - BUFFER_TRACE(sbi->s_sbh, "marking dirty"); 516 - mark_buffer_dirty(sbi->s_sbh); 517 511 ext4_commit_super(sb, es, 1); 512 + } 513 + if (sbi->s_proc) { 514 + remove_proc_entry("inode_readahead_blks", sbi->s_proc); 515 + remove_proc_entry(sb->s_id, ext4_proc_root); 518 516 } 519 517 520 518 for (i = 0; i < sbi->s_gdb_count; i++) ··· 526 520 percpu_counter_destroy(&sbi->s_freeblocks_counter); 527 521 percpu_counter_destroy(&sbi->s_freeinodes_counter); 528 522 percpu_counter_destroy(&sbi->s_dirs_counter); 523 + percpu_counter_destroy(&sbi->s_dirtyblocks_counter); 529 524 brelse(sbi->s_sbh); 530 525 #ifdef CONFIG_QUOTA 531 526 for (i = 0; i < MAXQUOTAS; i++) ··· 569 562 ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS); 570 563 if (!ei) 571 564 return NULL; 572 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 565 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 573 566 ei->i_acl = EXT4_ACL_NOT_CACHED; 574 567 ei->i_default_acl = EXT4_ACL_NOT_CACHED; 575 568 #endif 576 - ei->i_block_alloc_info = NULL; 577 569 ei->vfs_inode.i_version = 1; 578 570 ei->vfs_inode.i_data.writeback_index = 0; 579 571 memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache)); ··· 605 599 struct ext4_inode_info *ei = (struct ext4_inode_info *) foo; 606 600 607 601 INIT_LIST_HEAD(&ei->i_orphan); 608 - #ifdef CONFIG_EXT4DEV_FS_XATTR 602 + 
#ifdef CONFIG_EXT4_FS_XATTR 609 603 init_rwsem(&ei->xattr_sem); 610 604 #endif 611 605 init_rwsem(&ei->i_data_sem); ··· 631 625 632 626 static void ext4_clear_inode(struct inode *inode) 633 627 { 634 - struct ext4_block_alloc_info *rsv = EXT4_I(inode)->i_block_alloc_info; 635 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 628 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 636 629 if (EXT4_I(inode)->i_acl && 637 630 EXT4_I(inode)->i_acl != EXT4_ACL_NOT_CACHED) { 638 631 posix_acl_release(EXT4_I(inode)->i_acl); ··· 643 638 EXT4_I(inode)->i_default_acl = EXT4_ACL_NOT_CACHED; 644 639 } 645 640 #endif 646 - ext4_discard_reservation(inode); 647 - EXT4_I(inode)->i_block_alloc_info = NULL; 648 - if (unlikely(rsv)) 649 - kfree(rsv); 641 + ext4_discard_preallocations(inode); 650 642 jbd2_journal_release_jbd_inode(EXT4_SB(inode->i_sb)->s_journal, 651 643 &EXT4_I(inode)->jinode); 652 644 } ··· 656 654 657 655 if (sbi->s_jquota_fmt) 658 656 seq_printf(seq, ",jqfmt=%s", 659 - (sbi->s_jquota_fmt == QFMT_VFS_OLD) ? "vfsold": "vfsv0"); 657 + (sbi->s_jquota_fmt == QFMT_VFS_OLD) ? 
"vfsold" : "vfsv0"); 660 658 661 659 if (sbi->s_qf_names[USRQUOTA]) 662 660 seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]); ··· 720 718 seq_puts(seq, ",debug"); 721 719 if (test_opt(sb, OLDALLOC)) 722 720 seq_puts(seq, ",oldalloc"); 723 - #ifdef CONFIG_EXT4DEV_FS_XATTR 721 + #ifdef CONFIG_EXT4_FS_XATTR 724 722 if (test_opt(sb, XATTR_USER) && 725 723 !(def_mount_opts & EXT4_DEFM_XATTR_USER)) 726 724 seq_puts(seq, ",user_xattr"); ··· 729 727 seq_puts(seq, ",nouser_xattr"); 730 728 } 731 729 #endif 732 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 730 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 733 731 if (test_opt(sb, POSIX_ACL) && !(def_mount_opts & EXT4_DEFM_ACL)) 734 732 seq_puts(seq, ",acl"); 735 733 if (!test_opt(sb, POSIX_ACL) && (def_mount_opts & EXT4_DEFM_ACL)) ··· 754 752 seq_puts(seq, ",nobh"); 755 753 if (!test_opt(sb, EXTENTS)) 756 754 seq_puts(seq, ",noextents"); 757 - if (!test_opt(sb, MBALLOC)) 758 - seq_puts(seq, ",nomballoc"); 759 755 if (test_opt(sb, I_VERSION)) 760 756 seq_puts(seq, ",i_version"); 761 757 if (!test_opt(sb, DELALLOC)) ··· 772 772 seq_puts(seq, ",data=ordered"); 773 773 else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_WRITEBACK_DATA) 774 774 seq_puts(seq, ",data=writeback"); 775 + 776 + if (sbi->s_inode_readahead_blks != EXT4_DEF_INODE_READAHEAD_BLKS) 777 + seq_printf(seq, ",inode_readahead_blks=%u", 778 + sbi->s_inode_readahead_blks); 775 779 776 780 ext4_show_quota_options(seq, sb); 777 781 return 0; ··· 826 822 } 827 823 828 824 #ifdef CONFIG_QUOTA 829 - #define QTYPE2NAME(t) ((t) == USRQUOTA?"user":"group") 825 + #define QTYPE2NAME(t) ((t) == USRQUOTA ? 
"user" : "group") 830 826 #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA)) 831 827 832 828 static int ext4_dquot_initialize(struct inode *inode, int type); ··· 911 907 Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota, 912 908 Opt_grpquota, Opt_extents, Opt_noextents, Opt_i_version, 913 909 Opt_mballoc, Opt_nomballoc, Opt_stripe, Opt_delalloc, Opt_nodelalloc, 910 + Opt_inode_readahead_blks 914 911 }; 915 912 916 913 static match_table_t tokens = { ··· 972 967 {Opt_resize, "resize"}, 973 968 {Opt_delalloc, "delalloc"}, 974 969 {Opt_nodelalloc, "nodelalloc"}, 970 + {Opt_inode_readahead_blks, "inode_readahead_blks=%u"}, 975 971 {Opt_err, NULL}, 976 972 }; 977 973 ··· 987 981 /*todo: use simple_strtoll with >32bit ext4 */ 988 982 sb_block = simple_strtoul(options, &options, 0); 989 983 if (*options && *options != ',') { 990 - printk("EXT4-fs: Invalid sb specification: %s\n", 984 + printk(KERN_ERR "EXT4-fs: Invalid sb specification: %s\n", 991 985 (char *) *data); 992 986 return 1; 993 987 } ··· 1078 1072 case Opt_orlov: 1079 1073 clear_opt(sbi->s_mount_opt, OLDALLOC); 1080 1074 break; 1081 - #ifdef CONFIG_EXT4DEV_FS_XATTR 1075 + #ifdef CONFIG_EXT4_FS_XATTR 1082 1076 case Opt_user_xattr: 1083 1077 set_opt(sbi->s_mount_opt, XATTR_USER); 1084 1078 break; ··· 1088 1082 #else 1089 1083 case Opt_user_xattr: 1090 1084 case Opt_nouser_xattr: 1091 - printk("EXT4 (no)user_xattr options not supported\n"); 1085 + printk(KERN_ERR "EXT4 (no)user_xattr options " 1086 + "not supported\n"); 1092 1087 break; 1093 1088 #endif 1094 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 1089 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 1095 1090 case Opt_acl: 1096 1091 set_opt(sbi->s_mount_opt, POSIX_ACL); 1097 1092 break; ··· 1102 1095 #else 1103 1096 case Opt_acl: 1104 1097 case Opt_noacl: 1105 - printk("EXT4 (no)acl options not supported\n"); 1098 + printk(KERN_ERR "EXT4 (no)acl options " 1099 + "not supported\n"); 1106 1100 break; 1107 1101 #endif 1108 1102 case 
Opt_reservation: ··· 1197 1189 sb_any_quota_suspended(sb)) && 1198 1190 !sbi->s_qf_names[qtype]) { 1199 1191 printk(KERN_ERR 1200 - "EXT4-fs: Cannot change journaled " 1201 - "quota options when quota turned on.\n"); 1192 + "EXT4-fs: Cannot change journaled " 1193 + "quota options when quota turned on.\n"); 1202 1194 return 0; 1203 1195 } 1204 1196 qname = match_strdup(&args[0]); ··· 1365 1357 case Opt_nodelalloc: 1366 1358 clear_opt(sbi->s_mount_opt, DELALLOC); 1367 1359 break; 1368 - case Opt_mballoc: 1369 - set_opt(sbi->s_mount_opt, MBALLOC); 1370 - break; 1371 - case Opt_nomballoc: 1372 - clear_opt(sbi->s_mount_opt, MBALLOC); 1373 - break; 1374 1360 case Opt_stripe: 1375 1361 if (match_int(&args[0], &option)) 1376 1362 return 0; ··· 1374 1372 break; 1375 1373 case Opt_delalloc: 1376 1374 set_opt(sbi->s_mount_opt, DELALLOC); 1375 + break; 1376 + case Opt_inode_readahead_blks: 1377 + if (match_int(&args[0], &option)) 1378 + return 0; 1379 + if (option < 0 || option > (1 << 30)) 1380 + return 0; 1381 + sbi->s_inode_readahead_blks = option; 1377 1382 break; 1378 1383 default: 1379 1384 printk(KERN_ERR ··· 1482 1473 EXT4_INODES_PER_GROUP(sb), 1483 1474 sbi->s_mount_opt); 1484 1475 1485 - printk(KERN_INFO "EXT4 FS on %s, ", sb->s_id); 1486 - if (EXT4_SB(sb)->s_journal->j_inode == NULL) { 1487 - char b[BDEVNAME_SIZE]; 1488 - 1489 - printk("external journal on %s\n", 1490 - bdevname(EXT4_SB(sb)->s_journal->j_dev, b)); 1491 - } else { 1492 - printk("internal journal\n"); 1493 - } 1476 + printk(KERN_INFO "EXT4 FS on %s, %s journal on %s\n", 1477 + sb->s_id, EXT4_SB(sb)->s_journal->j_inode ? 
"internal" : 1478 + "external", EXT4_SB(sb)->s_journal->j_devname); 1494 1479 return res; 1495 1480 } 1496 1481 ··· 1507 1504 sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex; 1508 1505 groups_per_flex = 1 << sbi->s_log_groups_per_flex; 1509 1506 1510 - flex_group_count = (sbi->s_groups_count + groups_per_flex - 1) / 1511 - groups_per_flex; 1507 + /* We allocate both existing and potentially added groups */ 1508 + flex_group_count = ((sbi->s_groups_count + groups_per_flex - 1) + 1509 + ((sbi->s_es->s_reserved_gdt_blocks +1 ) << 1510 + EXT4_DESC_PER_BLOCK_BITS(sb))) / 1511 + groups_per_flex; 1512 1512 sbi->s_flex_groups = kzalloc(flex_group_count * 1513 1513 sizeof(struct flex_groups), GFP_KERNEL); 1514 1514 if (sbi->s_flex_groups == NULL) { ··· 1590 1584 if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) 1591 1585 flexbg_flag = 1; 1592 1586 1593 - ext4_debug ("Checking group descriptors"); 1587 + ext4_debug("Checking group descriptors"); 1594 1588 1595 1589 for (i = 0; i < sbi->s_groups_count; i++) { 1596 1590 struct ext4_group_desc *gdp = ext4_get_group_desc(sb, i, NULL); ··· 1629 1623 "Checksum for group %lu failed (%u!=%u)\n", 1630 1624 i, le16_to_cpu(ext4_group_desc_csum(sbi, i, 1631 1625 gdp)), le16_to_cpu(gdp->bg_checksum)); 1632 - if (!(sb->s_flags & MS_RDONLY)) 1626 + if (!(sb->s_flags & MS_RDONLY)) { 1627 + spin_unlock(sb_bgl_lock(sbi, i)); 1633 1628 return 0; 1629 + } 1634 1630 } 1635 1631 spin_unlock(sb_bgl_lock(sbi, i)); 1636 1632 if (!flexbg_flag) ··· 1722 1714 DQUOT_INIT(inode); 1723 1715 if (inode->i_nlink) { 1724 1716 printk(KERN_DEBUG 1725 - "%s: truncating inode %lu to %Ld bytes\n", 1717 + "%s: truncating inode %lu to %lld bytes\n", 1726 1718 __func__, inode->i_ino, inode->i_size); 1727 - jbd_debug(2, "truncating inode %lu to %Ld bytes\n", 1719 + jbd_debug(2, "truncating inode %lu to %lld bytes\n", 1728 1720 inode->i_ino, inode->i_size); 1729 1721 ext4_truncate(inode); 1730 1722 nr_truncates++; ··· 1922 1914 unsigned 
long journal_devnum = 0; 1923 1915 unsigned long def_mount_opts; 1924 1916 struct inode *root; 1917 + char *cp; 1925 1918 int ret = -EINVAL; 1926 1919 int blocksize; 1927 1920 int db_count; ··· 1939 1930 sbi->s_mount_opt = 0; 1940 1931 sbi->s_resuid = EXT4_DEF_RESUID; 1941 1932 sbi->s_resgid = EXT4_DEF_RESGID; 1933 + sbi->s_inode_readahead_blks = EXT4_DEF_INODE_READAHEAD_BLKS; 1942 1934 sbi->s_sb_block = sb_block; 1943 1935 1944 1936 unlock_kernel(); 1937 + 1938 + /* Cleanup superblock name */ 1939 + for (cp = sb->s_id; (cp = strchr(cp, '/'));) 1940 + *cp = '!'; 1945 1941 1946 1942 blocksize = sb_min_blocksize(sb, EXT4_MIN_BLOCK_SIZE); 1947 1943 if (!blocksize) { ··· 1987 1973 set_opt(sbi->s_mount_opt, GRPID); 1988 1974 if (def_mount_opts & EXT4_DEFM_UID16) 1989 1975 set_opt(sbi->s_mount_opt, NO_UID32); 1990 - #ifdef CONFIG_EXT4DEV_FS_XATTR 1976 + #ifdef CONFIG_EXT4_FS_XATTR 1991 1977 if (def_mount_opts & EXT4_DEFM_XATTR_USER) 1992 1978 set_opt(sbi->s_mount_opt, XATTR_USER); 1993 1979 #endif 1994 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 1980 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 1995 1981 if (def_mount_opts & EXT4_DEFM_ACL) 1996 1982 set_opt(sbi->s_mount_opt, POSIX_ACL); 1997 1983 #endif ··· 2026 2012 ext4_warning(sb, __func__, 2027 2013 "extents feature not enabled on this filesystem, " 2028 2014 "use tune2fs.\n"); 2029 - /* 2030 - * turn on mballoc code by default in ext4 filesystem 2031 - * Use -o nomballoc to turn it off 2032 - */ 2033 - set_opt(sbi->s_mount_opt, MBALLOC); 2034 2015 2035 2016 /* 2036 2017 * enable delayed allocation by default ··· 2048 2039 printk(KERN_WARNING 2049 2040 "EXT4-fs warning: feature flags set on rev 0 fs, " 2050 2041 "running e2fsck is recommended\n"); 2051 - 2052 - /* 2053 - * Since ext4 is still considered development code, we require 2054 - * that the TEST_FILESYS flag in s->flags be set. 
2055 - */ 2056 - if (!(le32_to_cpu(es->s_flags) & EXT2_FLAGS_TEST_FILESYS)) { 2057 - printk(KERN_WARNING "EXT4-fs: %s: not marked " 2058 - "OK to use with test code.\n", sb->s_id); 2059 - goto failed_mount; 2060 - } 2061 2042 2062 2043 /* 2063 2044 * Check feature flags regardless of the revision level, since we ··· 2218 2219 goto failed_mount; 2219 2220 } 2220 2221 2222 + if (ext4_proc_root) 2223 + sbi->s_proc = proc_mkdir(sb->s_id, ext4_proc_root); 2224 + 2225 + if (sbi->s_proc) 2226 + proc_create_data("inode_readahead_blks", 0644, sbi->s_proc, 2227 + &ext4_ui_proc_fops, 2228 + &sbi->s_inode_readahead_blks); 2229 + 2221 2230 bgl_lock_init(&sbi->s_blockgroup_lock); 2222 2231 2223 2232 for (i = 0; i < db_count; i++) { ··· 2264 2257 err = percpu_counter_init(&sbi->s_dirs_counter, 2265 2258 ext4_count_dirs(sb)); 2266 2259 } 2260 + if (!err) { 2261 + err = percpu_counter_init(&sbi->s_dirtyblocks_counter, 0); 2262 + } 2267 2263 if (err) { 2268 2264 printk(KERN_ERR "EXT4-fs: insufficient memory\n"); 2269 2265 goto failed_mount3; 2270 2266 } 2271 - 2272 - /* per fileystem reservation list head & lock */ 2273 - spin_lock_init(&sbi->s_rsv_window_lock); 2274 - sbi->s_rsv_window_root = RB_ROOT; 2275 - /* Add a single, static dummy reservation to the start of the 2276 - * reservation window list --- it gives us a placeholder for 2277 - * append-at-start-of-list which makes the allocation logic 2278 - * _much_ simpler. 
*/ 2279 - sbi->s_rsv_window_head.rsv_start = EXT4_RESERVE_WINDOW_NOT_ALLOCATED; 2280 - sbi->s_rsv_window_head.rsv_end = EXT4_RESERVE_WINDOW_NOT_ALLOCATED; 2281 - sbi->s_rsv_window_head.rsv_alloc_hit = 0; 2282 - sbi->s_rsv_window_head.rsv_goal_size = 0; 2283 - ext4_rsv_window_add(sb, &sbi->s_rsv_window_head); 2284 2267 2285 2268 sbi->s_stripe = ext4_get_stripe_size(sbi); 2286 2269 ··· 2468 2471 printk(KERN_INFO "EXT4-fs: delayed allocation enabled\n"); 2469 2472 2470 2473 ext4_ext_init(sb); 2471 - ext4_mb_init(sb, needs_recovery); 2474 + err = ext4_mb_init(sb, needs_recovery); 2475 + if (err) { 2476 + printk(KERN_ERR "EXT4-fs: failed to initalize mballoc (%d)\n", 2477 + err); 2478 + goto failed_mount4; 2479 + } 2472 2480 2473 2481 lock_kernel(); 2474 2482 return 0; ··· 2491 2489 percpu_counter_destroy(&sbi->s_freeblocks_counter); 2492 2490 percpu_counter_destroy(&sbi->s_freeinodes_counter); 2493 2491 percpu_counter_destroy(&sbi->s_dirs_counter); 2492 + percpu_counter_destroy(&sbi->s_dirtyblocks_counter); 2494 2493 failed_mount2: 2495 2494 for (i = 0; i < db_count; i++) 2496 2495 brelse(sbi->s_group_desc[i]); 2497 2496 kfree(sbi->s_group_desc); 2498 2497 failed_mount: 2498 + if (sbi->s_proc) { 2499 + remove_proc_entry("inode_readahead_blks", sbi->s_proc); 2500 + remove_proc_entry(sb->s_id, ext4_proc_root); 2501 + } 2499 2502 #ifdef CONFIG_QUOTA 2500 2503 for (i = 0; i < MAXQUOTAS; i++) 2501 2504 kfree(sbi->s_qf_names[i]); ··· 2559 2552 return NULL; 2560 2553 } 2561 2554 2562 - jbd_debug(2, "Journal inode found at %p: %Ld bytes\n", 2555 + jbd_debug(2, "Journal inode found at %p: %lld bytes\n", 2563 2556 journal_inode, journal_inode->i_size); 2564 2557 if (!S_ISREG(journal_inode->i_mode)) { 2565 2558 printk(KERN_ERR "EXT4-fs: invalid journal inode.\n"); ··· 2722 2715 return -EINVAL; 2723 2716 } 2724 2717 2718 + if (journal->j_flags & JBD2_BARRIER) 2719 + printk(KERN_INFO "EXT4-fs: barriers enabled\n"); 2720 + else 2721 + printk(KERN_INFO "EXT4-fs: barriers 
disabled\n"); 2722 + 2725 2723 if (!really_read_only && test_opt(sb, UPDATE_JOURNAL)) { 2726 2724 err = jbd2_journal_update_format(journal); 2727 2725 if (err) { ··· 2811 2799 2812 2800 if (!sbh) 2813 2801 return; 2802 + if (buffer_write_io_error(sbh)) { 2803 + /* 2804 + * Oh, dear. A previous attempt to write the 2805 + * superblock failed. This could happen because the 2806 + * USB device was yanked out. Or it could happen to 2807 + * be a transient write error and maybe the block will 2808 + * be remapped. Nothing we can do but to retry the 2809 + * write and hope for the best. 2810 + */ 2811 + printk(KERN_ERR "ext4: previous I/O error to " 2812 + "superblock detected for %s.\n", sb->s_id); 2813 + clear_buffer_write_io_error(sbh); 2814 + set_buffer_uptodate(sbh); 2815 + } 2814 2816 es->s_wtime = cpu_to_le32(get_seconds()); 2815 2817 ext4_free_blocks_count_set(es, ext4_count_free_blocks(sb)); 2816 2818 es->s_free_inodes_count = cpu_to_le32(ext4_count_free_inodes(sb)); 2817 2819 BUFFER_TRACE(sbh, "marking dirty"); 2818 2820 mark_buffer_dirty(sbh); 2819 - if (sync) 2821 + if (sync) { 2820 2822 sync_dirty_buffer(sbh); 2823 + if (buffer_write_io_error(sbh)) { 2824 + printk(KERN_ERR "ext4: I/O error while writing " 2825 + "superblock for %s.\n", sb->s_id); 2826 + clear_buffer_write_io_error(sbh); 2827 + set_buffer_uptodate(sbh); 2828 + } 2829 + } 2821 2830 } 2822 2831 2823 2832 ··· 2940 2907 { 2941 2908 tid_t target; 2942 2909 2910 + trace_mark(ext4_sync_fs, "dev %s wait %d", sb->s_id, wait); 2943 2911 sb->s_dirt = 0; 2944 2912 if (jbd2_journal_start_commit(EXT4_SB(sb)->s_journal, &target)) { 2945 2913 if (wait) ··· 3196 3162 buf->f_type = EXT4_SUPER_MAGIC; 3197 3163 buf->f_bsize = sb->s_blocksize; 3198 3164 buf->f_blocks = ext4_blocks_count(es) - sbi->s_overhead_last; 3199 - buf->f_bfree = percpu_counter_sum_positive(&sbi->s_freeblocks_counter); 3165 + buf->f_bfree = percpu_counter_sum_positive(&sbi->s_freeblocks_counter) - 3166 + 
percpu_counter_sum_positive(&sbi->s_dirtyblocks_counter); 3200 3167 ext4_free_blocks_count_set(es, buf->f_bfree); 3201 3168 buf->f_bavail = buf->f_bfree - ext4_r_blocks_count(es); 3202 3169 if (buf->f_bfree < ext4_r_blocks_count(es)) ··· 3467 3432 handle_t *handle = journal_current_handle(); 3468 3433 3469 3434 if (!handle) { 3470 - printk(KERN_WARNING "EXT4-fs: Quota write (off=%Lu, len=%Lu)" 3435 + printk(KERN_WARNING "EXT4-fs: Quota write (off=%llu, len=%llu)" 3471 3436 " cancelled because transaction is not started.\n", 3472 3437 (unsigned long long)off, (unsigned long long)len); 3473 3438 return -EIO; ··· 3528 3493 return get_sb_bdev(fs_type, flags, dev_name, data, ext4_fill_super, mnt); 3529 3494 } 3530 3495 3531 - static struct file_system_type ext4dev_fs_type = { 3496 + #ifdef CONFIG_PROC_FS 3497 + static int ext4_ui_proc_show(struct seq_file *m, void *v) 3498 + { 3499 + unsigned int *p = m->private; 3500 + 3501 + seq_printf(m, "%u\n", *p); 3502 + return 0; 3503 + } 3504 + 3505 + static int ext4_ui_proc_open(struct inode *inode, struct file *file) 3506 + { 3507 + return single_open(file, ext4_ui_proc_show, PDE(inode)->data); 3508 + } 3509 + 3510 + static ssize_t ext4_ui_proc_write(struct file *file, const char __user *buf, 3511 + size_t cnt, loff_t *ppos) 3512 + { 3513 + unsigned int *p = PDE(file->f_path.dentry->d_inode)->data; 3514 + char str[32]; 3515 + unsigned long value; 3516 + 3517 + if (cnt >= sizeof(str)) 3518 + return -EINVAL; 3519 + if (copy_from_user(str, buf, cnt)) 3520 + return -EFAULT; 3521 + value = simple_strtol(str, NULL, 0); 3522 + if (value < 0) 3523 + return -ERANGE; 3524 + *p = value; 3525 + return cnt; 3526 + } 3527 + 3528 + const struct file_operations ext4_ui_proc_fops = { 3532 3529 .owner = THIS_MODULE, 3533 - .name = "ext4dev", 3530 + .open = ext4_ui_proc_open, 3531 + .read = seq_read, 3532 + .llseek = seq_lseek, 3533 + .release = single_release, 3534 + .write = ext4_ui_proc_write, 3535 + }; 3536 + #endif 3537 + 3538 + static 
struct file_system_type ext4_fs_type = { 3539 + .owner = THIS_MODULE, 3540 + .name = "ext4", 3534 3541 .get_sb = ext4_get_sb, 3535 3542 .kill_sb = kill_block_super, 3536 3543 .fs_flags = FS_REQUIRES_DEV, 3537 3544 }; 3538 3545 3546 + #ifdef CONFIG_EXT4DEV_COMPAT 3547 + static int ext4dev_get_sb(struct file_system_type *fs_type, 3548 + int flags, const char *dev_name, void *data, struct vfsmount *mnt) 3549 + { 3550 + printk(KERN_WARNING "EXT4-fs: Update your userspace programs " 3551 + "to mount using ext4\n"); 3552 + printk(KERN_WARNING "EXT4-fs: ext4dev backwards compatibility " 3553 + "will go away by 2.6.31\n"); 3554 + return get_sb_bdev(fs_type, flags, dev_name, data, ext4_fill_super, mnt); 3555 + } 3556 + 3557 + static struct file_system_type ext4dev_fs_type = { 3558 + .owner = THIS_MODULE, 3559 + .name = "ext4dev", 3560 + .get_sb = ext4dev_get_sb, 3561 + .kill_sb = kill_block_super, 3562 + .fs_flags = FS_REQUIRES_DEV, 3563 + }; 3564 + MODULE_ALIAS("ext4dev"); 3565 + #endif 3566 + 3539 3567 static int __init init_ext4_fs(void) 3540 3568 { 3541 3569 int err; 3542 3570 3571 + ext4_proc_root = proc_mkdir("fs/ext4", NULL); 3543 3572 err = init_ext4_mballoc(); 3544 3573 if (err) 3545 3574 return err; ··· 3614 3515 err = init_inodecache(); 3615 3516 if (err) 3616 3517 goto out1; 3617 - err = register_filesystem(&ext4dev_fs_type); 3518 + err = register_filesystem(&ext4_fs_type); 3618 3519 if (err) 3619 3520 goto out; 3521 + #ifdef CONFIG_EXT4DEV_COMPAT 3522 + err = register_filesystem(&ext4dev_fs_type); 3523 + if (err) { 3524 + unregister_filesystem(&ext4_fs_type); 3525 + goto out; 3526 + } 3527 + #endif 3620 3528 return 0; 3621 3529 out: 3622 3530 destroy_inodecache(); ··· 3636 3530 3637 3531 static void __exit exit_ext4_fs(void) 3638 3532 { 3533 + unregister_filesystem(&ext4_fs_type); 3534 + #ifdef CONFIG_EXT4DEV_COMPAT 3639 3535 unregister_filesystem(&ext4dev_fs_type); 3536 + #endif 3640 3537 destroy_inodecache(); 3641 3538 exit_ext4_xattr(); 3642 3539 
exit_ext4_mballoc(); 3540 + remove_proc_entry("fs/ext4", NULL); 3643 3541 } 3644 3542 3645 3543 MODULE_AUTHOR("Remy Card, Stephen Tweedie, Andrew Morton, Andreas Dilger, Theodore Ts'o and others");
+4 -4
fs/ext4/symlink.c
··· 23 23 #include "ext4.h" 24 24 #include "xattr.h" 25 25 26 - static void * ext4_follow_link(struct dentry *dentry, struct nameidata *nd) 26 + static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd) 27 27 { 28 28 struct ext4_inode_info *ei = EXT4_I(dentry->d_inode); 29 - nd_set_link(nd, (char*)ei->i_data); 29 + nd_set_link(nd, (char *) ei->i_data); 30 30 return NULL; 31 31 } 32 32 ··· 34 34 .readlink = generic_readlink, 35 35 .follow_link = page_follow_link_light, 36 36 .put_link = page_put_link, 37 - #ifdef CONFIG_EXT4DEV_FS_XATTR 37 + #ifdef CONFIG_EXT4_FS_XATTR 38 38 .setxattr = generic_setxattr, 39 39 .getxattr = generic_getxattr, 40 40 .listxattr = ext4_listxattr, ··· 45 45 const struct inode_operations ext4_fast_symlink_inode_operations = { 46 46 .readlink = generic_readlink, 47 47 .follow_link = ext4_follow_link, 48 - #ifdef CONFIG_EXT4DEV_FS_XATTR 48 + #ifdef CONFIG_EXT4_FS_XATTR 49 49 .setxattr = generic_setxattr, 50 50 .getxattr = generic_getxattr, 51 51 .listxattr = ext4_listxattr,
+10 -4
fs/ext4/xattr.c
··· 99 99 100 100 static struct xattr_handler *ext4_xattr_handler_map[] = { 101 101 [EXT4_XATTR_INDEX_USER] = &ext4_xattr_user_handler, 102 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 102 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 103 103 [EXT4_XATTR_INDEX_POSIX_ACL_ACCESS] = &ext4_xattr_acl_access_handler, 104 104 [EXT4_XATTR_INDEX_POSIX_ACL_DEFAULT] = &ext4_xattr_acl_default_handler, 105 105 #endif 106 106 [EXT4_XATTR_INDEX_TRUSTED] = &ext4_xattr_trusted_handler, 107 - #ifdef CONFIG_EXT4DEV_FS_SECURITY 107 + #ifdef CONFIG_EXT4_FS_SECURITY 108 108 [EXT4_XATTR_INDEX_SECURITY] = &ext4_xattr_security_handler, 109 109 #endif 110 110 }; ··· 112 112 struct xattr_handler *ext4_xattr_handlers[] = { 113 113 &ext4_xattr_user_handler, 114 114 &ext4_xattr_trusted_handler, 115 - #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL 115 + #ifdef CONFIG_EXT4_FS_POSIX_ACL 116 116 &ext4_xattr_acl_access_handler, 117 117 &ext4_xattr_acl_default_handler, 118 118 #endif 119 - #ifdef CONFIG_EXT4DEV_FS_SECURITY 119 + #ifdef CONFIG_EXT4_FS_SECURITY 120 120 &ext4_xattr_security_handler, 121 121 #endif 122 122 NULL ··· 959 959 struct ext4_xattr_block_find bs = { 960 960 .s = { .not_found = -ENODATA, }, 961 961 }; 962 + unsigned long no_expand; 962 963 int error; 963 964 964 965 if (!name) ··· 967 966 if (strlen(name) > 255) 968 967 return -ERANGE; 969 968 down_write(&EXT4_I(inode)->xattr_sem); 969 + no_expand = EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND; 970 + EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND; 971 + 970 972 error = ext4_get_inode_loc(inode, &is.iloc); 971 973 if (error) 972 974 goto cleanup; ··· 1046 1042 cleanup: 1047 1043 brelse(is.iloc.bh); 1048 1044 brelse(bs.bh); 1045 + if (no_expand == 0) 1046 + EXT4_I(inode)->i_state &= ~EXT4_STATE_NO_EXPAND; 1049 1047 up_write(&EXT4_I(inode)->xattr_sem); 1050 1048 return error; 1051 1049 }
+6 -6
fs/ext4/xattr.h
··· 51 51 (((name_len) + EXT4_XATTR_ROUND + \ 52 52 sizeof(struct ext4_xattr_entry)) & ~EXT4_XATTR_ROUND) 53 53 #define EXT4_XATTR_NEXT(entry) \ 54 - ( (struct ext4_xattr_entry *)( \ 55 - (char *)(entry) + EXT4_XATTR_LEN((entry)->e_name_len)) ) 54 + ((struct ext4_xattr_entry *)( \ 55 + (char *)(entry) + EXT4_XATTR_LEN((entry)->e_name_len))) 56 56 #define EXT4_XATTR_SIZE(size) \ 57 57 (((size) + EXT4_XATTR_ROUND) & ~EXT4_XATTR_ROUND) 58 58 ··· 63 63 EXT4_I(inode)->i_extra_isize)) 64 64 #define IFIRST(hdr) ((struct ext4_xattr_entry *)((hdr)+1)) 65 65 66 - # ifdef CONFIG_EXT4DEV_FS_XATTR 66 + # ifdef CONFIG_EXT4_FS_XATTR 67 67 68 68 extern struct xattr_handler ext4_xattr_user_handler; 69 69 extern struct xattr_handler ext4_xattr_trusted_handler; ··· 88 88 89 89 extern struct xattr_handler *ext4_xattr_handlers[]; 90 90 91 - # else /* CONFIG_EXT4DEV_FS_XATTR */ 91 + # else /* CONFIG_EXT4_FS_XATTR */ 92 92 93 93 static inline int 94 94 ext4_xattr_get(struct inode *inode, int name_index, const char *name, ··· 141 141 142 142 #define ext4_xattr_handlers NULL 143 143 144 - # endif /* CONFIG_EXT4DEV_FS_XATTR */ 144 + # endif /* CONFIG_EXT4_FS_XATTR */ 145 145 146 - #ifdef CONFIG_EXT4DEV_FS_SECURITY 146 + #ifdef CONFIG_EXT4_FS_SECURITY 147 147 extern int ext4_init_security(handle_t *handle, struct inode *inode, 148 148 struct inode *dir); 149 149 #else
+273
fs/ioctl.c
··· 13 13 #include <linux/security.h> 14 14 #include <linux/module.h> 15 15 #include <linux/uaccess.h> 16 + #include <linux/writeback.h> 17 + #include <linux/buffer_head.h> 16 18 17 19 #include <asm/ioctls.h> 20 + 21 + /* So that the fiemap access checks can't overflow on 32 bit machines. */ 22 + #define FIEMAP_MAX_EXTENTS (UINT_MAX / sizeof(struct fiemap_extent)) 18 23 19 24 /** 20 25 * vfs_ioctl - call filesystem specific ioctl methods ··· 76 71 return put_user(res, p); 77 72 } 78 73 74 + /** 75 + * fiemap_fill_next_extent - Fiemap helper function 76 + * @fieinfo: Fiemap context passed into ->fiemap 77 + * @logical: Extent logical start offset, in bytes 78 + * @phys: Extent physical start offset, in bytes 79 + * @len: Extent length, in bytes 80 + * @flags: FIEMAP_EXTENT flags that describe this extent 81 + * 82 + * Called from file system ->fiemap callback. Will populate extent 83 + * info as passed in via arguments and copy to user memory. On 84 + * success, extent count on fieinfo is incremented. 85 + * 86 + * Returns 0 on success, -errno on error, 1 if this was the last 87 + * extent that will fit in user array. 88 + */ 89 + #define SET_UNKNOWN_FLAGS (FIEMAP_EXTENT_DELALLOC) 90 + #define SET_NO_UNMOUNTED_IO_FLAGS (FIEMAP_EXTENT_DATA_ENCRYPTED) 91 + #define SET_NOT_ALIGNED_FLAGS (FIEMAP_EXTENT_DATA_TAIL|FIEMAP_EXTENT_DATA_INLINE) 92 + int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical, 93 + u64 phys, u64 len, u32 flags) 94 + { 95 + struct fiemap_extent extent; 96 + struct fiemap_extent *dest = fieinfo->fi_extents_start; 97 + 98 + /* only count the extents */ 99 + if (fieinfo->fi_extents_max == 0) { 100 + fieinfo->fi_extents_mapped++; 101 + return (flags & FIEMAP_EXTENT_LAST) ? 
1 : 0; 102 + } 103 + 104 + if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max) 105 + return 1; 106 + 107 + if (flags & SET_UNKNOWN_FLAGS) 108 + flags |= FIEMAP_EXTENT_UNKNOWN; 109 + if (flags & SET_NO_UNMOUNTED_IO_FLAGS) 110 + flags |= FIEMAP_EXTENT_ENCODED; 111 + if (flags & SET_NOT_ALIGNED_FLAGS) 112 + flags |= FIEMAP_EXTENT_NOT_ALIGNED; 113 + 114 + memset(&extent, 0, sizeof(extent)); 115 + extent.fe_logical = logical; 116 + extent.fe_physical = phys; 117 + extent.fe_length = len; 118 + extent.fe_flags = flags; 119 + 120 + dest += fieinfo->fi_extents_mapped; 121 + if (copy_to_user(dest, &extent, sizeof(extent))) 122 + return -EFAULT; 123 + 124 + fieinfo->fi_extents_mapped++; 125 + if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max) 126 + return 1; 127 + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; 128 + } 129 + EXPORT_SYMBOL(fiemap_fill_next_extent); 130 + 131 + /** 132 + * fiemap_check_flags - check validity of requested flags for fiemap 133 + * @fieinfo: Fiemap context passed into ->fiemap 134 + * @fs_flags: Set of fiemap flags that the file system understands 135 + * 136 + * Called from file system ->fiemap callback. This will compute the 137 + * intersection of valid fiemap flags and those that the fs supports. That 138 + * value is then compared against the user supplied flags. In case of bad user 139 + * flags, the invalid values will be written into the fieinfo structure, and 140 + * -EBADR is returned, which tells ioctl_fiemap() to return those values to 141 + * userspace. For this reason, a return code of -EBADR should be preserved. 142 + * 143 + * Returns 0 on success, -EBADR on bad flags. 
144 + */ 145 + int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags) 146 + { 147 + u32 incompat_flags; 148 + 149 + incompat_flags = fieinfo->fi_flags & ~(FIEMAP_FLAGS_COMPAT & fs_flags); 150 + if (incompat_flags) { 151 + fieinfo->fi_flags = incompat_flags; 152 + return -EBADR; 153 + } 154 + return 0; 155 + } 156 + EXPORT_SYMBOL(fiemap_check_flags); 157 + 158 + static int fiemap_check_ranges(struct super_block *sb, 159 + u64 start, u64 len, u64 *new_len) 160 + { 161 + *new_len = len; 162 + 163 + if (len == 0) 164 + return -EINVAL; 165 + 166 + if (start > sb->s_maxbytes) 167 + return -EFBIG; 168 + 169 + /* 170 + * Shrink request scope to what the fs can actually handle. 171 + */ 172 + if ((len > sb->s_maxbytes) || 173 + (sb->s_maxbytes - len) < start) 174 + *new_len = sb->s_maxbytes - start; 175 + 176 + return 0; 177 + } 178 + 179 + static int ioctl_fiemap(struct file *filp, unsigned long arg) 180 + { 181 + struct fiemap fiemap; 182 + struct fiemap_extent_info fieinfo = { 0, }; 183 + struct inode *inode = filp->f_path.dentry->d_inode; 184 + struct super_block *sb = inode->i_sb; 185 + u64 len; 186 + int error; 187 + 188 + if (!inode->i_op->fiemap) 189 + return -EOPNOTSUPP; 190 + 191 + if (copy_from_user(&fiemap, (struct fiemap __user *)arg, 192 + sizeof(struct fiemap))) 193 + return -EFAULT; 194 + 195 + if (fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS) 196 + return -EINVAL; 197 + 198 + error = fiemap_check_ranges(sb, fiemap.fm_start, fiemap.fm_length, 199 + &len); 200 + if (error) 201 + return error; 202 + 203 + fieinfo.fi_flags = fiemap.fm_flags; 204 + fieinfo.fi_extents_max = fiemap.fm_extent_count; 205 + fieinfo.fi_extents_start = (struct fiemap_extent *)(arg + sizeof(fiemap)); 206 + 207 + if (fiemap.fm_extent_count != 0 && 208 + !access_ok(VERIFY_WRITE, fieinfo.fi_extents_start, 209 + fieinfo.fi_extents_max * sizeof(struct fiemap_extent))) 210 + return -EFAULT; 211 + 212 + if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC) 213 + 
filemap_write_and_wait(inode->i_mapping); 214 + 215 + error = inode->i_op->fiemap(inode, &fieinfo, fiemap.fm_start, len); 216 + fiemap.fm_flags = fieinfo.fi_flags; 217 + fiemap.fm_mapped_extents = fieinfo.fi_extents_mapped; 218 + if (copy_to_user((char *)arg, &fiemap, sizeof(fiemap))) 219 + error = -EFAULT; 220 + 221 + return error; 222 + } 223 + 224 + #define blk_to_logical(inode, blk) (blk << (inode)->i_blkbits) 225 + #define logical_to_blk(inode, offset) (offset >> (inode)->i_blkbits); 226 + 227 + /* 228 + * @inode - the inode to map 229 + * @arg - the pointer to userspace where we copy everything to 230 + * @get_block - the fs's get_block function 231 + * 232 + * This does FIEMAP for block based inodes. Basically it will just loop 233 + * through get_block until we hit the number of extents we want to map, or we 234 + * go past the end of the file and hit a hole. 235 + * 236 + * If it is possible to have data blocks beyond a hole past @inode->i_size, then 237 + * please do not use this function, it will stop at the first unmapped block 238 + * beyond i_size 239 + */ 240 + int generic_block_fiemap(struct inode *inode, 241 + struct fiemap_extent_info *fieinfo, u64 start, 242 + u64 len, get_block_t *get_block) 243 + { 244 + struct buffer_head tmp; 245 + unsigned int start_blk; 246 + long long length = 0, map_len = 0; 247 + u64 logical = 0, phys = 0, size = 0; 248 + u32 flags = FIEMAP_EXTENT_MERGED; 249 + int ret = 0; 250 + 251 + if ((ret = fiemap_check_flags(fieinfo, FIEMAP_FLAG_SYNC))) 252 + return ret; 253 + 254 + start_blk = logical_to_blk(inode, start); 255 + 256 + /* guard against change */ 257 + mutex_lock(&inode->i_mutex); 258 + 259 + length = (long long)min_t(u64, len, i_size_read(inode)); 260 + map_len = length; 261 + 262 + do { 263 + /* 264 + * we set b_size to the total size we want so it will map as 265 + * many contiguous blocks as possible at once 266 + */ 267 + memset(&tmp, 0, sizeof(struct buffer_head)); 268 + tmp.b_size = map_len; 269 + 270 + ret 
= get_block(inode, start_blk, &tmp, 0); 271 + if (ret) 272 + break; 273 + 274 + /* HOLE */ 275 + if (!buffer_mapped(&tmp)) { 276 + /* 277 + * first hole after going past the EOF, this is our 278 + * last extent 279 + */ 280 + if (length <= 0) { 281 + flags = FIEMAP_EXTENT_MERGED|FIEMAP_EXTENT_LAST; 282 + ret = fiemap_fill_next_extent(fieinfo, logical, 283 + phys, size, 284 + flags); 285 + break; 286 + } 287 + 288 + length -= blk_to_logical(inode, 1); 289 + 290 + /* if we have holes up to/past EOF then we're done */ 291 + if (length <= 0) 292 + break; 293 + 294 + start_blk++; 295 + } else { 296 + if (length <= 0 && size) { 297 + ret = fiemap_fill_next_extent(fieinfo, logical, 298 + phys, size, 299 + flags); 300 + if (ret) 301 + break; 302 + } 303 + 304 + logical = blk_to_logical(inode, start_blk); 305 + phys = blk_to_logical(inode, tmp.b_blocknr); 306 + size = tmp.b_size; 307 + flags = FIEMAP_EXTENT_MERGED; 308 + 309 + length -= tmp.b_size; 310 + start_blk += logical_to_blk(inode, size); 311 + 312 + /* 313 + * if we are past the EOF we need to loop again to see 314 + * if there is a hole so we can mark this extent as the 315 + * last one, and if not keep mapping things until we 316 + * find a hole, or we run out of slots in the extent 317 + * array 318 + */ 319 + if (length <= 0) 320 + continue; 321 + 322 + ret = fiemap_fill_next_extent(fieinfo, logical, phys, 323 + size, flags); 324 + if (ret) 325 + break; 326 + } 327 + cond_resched(); 328 + } while (1); 329 + 330 + mutex_unlock(&inode->i_mutex); 331 + 332 + /* if ret is 1 then we just hit the end of the extent array */ 333 + if (ret == 1) 334 + ret = 0; 335 + 336 + return ret; 337 + } 338 + EXPORT_SYMBOL(generic_block_fiemap); 339 + 79 340 static int file_ioctl(struct file *filp, unsigned int cmd, 80 341 unsigned long arg) 81 342 { ··· 351 80 switch (cmd) { 352 81 case FIBMAP: 353 82 return ioctl_fibmap(filp, p); 83 + case FS_IOC_FIEMAP: 84 + return ioctl_fiemap(filp, arg); 354 85 case FIGETBSZ: 355 86 return 
put_user(inode->i_sb->s_blocksize, p); 356 87 case FIONREAD:
+20 -2
fs/jbd2/checkpoint.c
··· 20 20 #include <linux/time.h> 21 21 #include <linux/fs.h> 22 22 #include <linux/jbd2.h> 23 + #include <linux/marker.h> 23 24 #include <linux/errno.h> 24 25 #include <linux/slab.h> 25 26 ··· 127 126 128 127 /* 129 128 * Test again, another process may have checkpointed while we 130 - * were waiting for the checkpoint lock 129 + * were waiting for the checkpoint lock. If there are no 130 + * outstanding transactions there is nothing to checkpoint and 131 + * we can't make progress. Abort the journal in this case. 131 132 */ 132 133 spin_lock(&journal->j_state_lock); 134 + spin_lock(&journal->j_list_lock); 133 135 nblocks = jbd_space_needed(journal); 134 136 if (__jbd2_log_space_left(journal) < nblocks) { 137 + int chkpt = journal->j_checkpoint_transactions != NULL; 138 + 139 + spin_unlock(&journal->j_list_lock); 135 140 spin_unlock(&journal->j_state_lock); 136 - jbd2_log_do_checkpoint(journal); 141 + if (chkpt) { 142 + jbd2_log_do_checkpoint(journal); 143 + } else { 144 + printk(KERN_ERR "%s: no transactions\n", 145 + __func__); 146 + jbd2_journal_abort(journal, 0); 147 + } 148 + 137 149 spin_lock(&journal->j_state_lock); 150 + } else { 151 + spin_unlock(&journal->j_list_lock); 138 152 } 139 153 mutex_unlock(&journal->j_checkpoint_mutex); 140 154 } ··· 329 313 * journal straight away. 330 314 */ 331 315 result = jbd2_cleanup_journal_tail(journal); 316 + trace_mark(jbd2_checkpoint, "dev %s need_checkpoint %d", 317 + journal->j_devname, result); 332 318 jbd_debug(1, "cleanup_journal_tail returned %d\n", result); 333 319 if (result <= 0) 334 320 return result;
+11 -11
fs/jbd2/commit.c
··· 16 16 #include <linux/time.h> 17 17 #include <linux/fs.h> 18 18 #include <linux/jbd2.h> 19 + #include <linux/marker.h> 19 20 #include <linux/errno.h> 20 21 #include <linux/slab.h> 21 22 #include <linux/mm.h> ··· 127 126 128 127 JBUFFER_TRACE(descriptor, "submit commit block"); 129 128 lock_buffer(bh); 130 - get_bh(bh); 131 - set_buffer_dirty(bh); 129 + clear_buffer_dirty(bh); 132 130 set_buffer_uptodate(bh); 133 131 bh->b_end_io = journal_end_buffer_io_sync; 134 132 ··· 147 147 * to remember if we sent a barrier request 148 148 */ 149 149 if (ret == -EOPNOTSUPP && barrier_done) { 150 - char b[BDEVNAME_SIZE]; 151 - 152 150 printk(KERN_WARNING 153 - "JBD: barrier-based sync failed on %s - " 154 - "disabling barriers\n", 155 - bdevname(journal->j_dev, b)); 151 + "JBD: barrier-based sync failed on %s - " 152 + "disabling barriers\n", journal->j_devname); 156 153 spin_lock(&journal->j_state_lock); 157 154 journal->j_flags &= ~JBD2_BARRIER; 158 155 spin_unlock(&journal->j_state_lock); ··· 157 160 /* And try again, without the barrier */ 158 161 lock_buffer(bh); 159 162 set_buffer_uptodate(bh); 160 - set_buffer_dirty(bh); 163 + clear_buffer_dirty(bh); 161 164 ret = submit_bh(WRITE, bh); 162 165 } 163 166 *cbh = bh; ··· 368 371 commit_transaction = journal->j_running_transaction; 369 372 J_ASSERT(commit_transaction->t_state == T_RUNNING); 370 373 374 + trace_mark(jbd2_start_commit, "dev %s transaction %d", 375 + journal->j_devname, commit_transaction->t_tid); 371 376 jbd_debug(1, "JBD: starting commit of transaction %d\n", 372 377 commit_transaction->t_tid); 373 378 ··· 680 681 */ 681 682 err = journal_finish_inode_data_buffers(journal, commit_transaction); 682 683 if (err) { 683 - char b[BDEVNAME_SIZE]; 684 - 685 684 printk(KERN_WARNING 686 685 "JBD2: Detected IO errors while flushing file data " 687 - "on %s\n", bdevname(journal->j_fs_dev, b)); 686 + "on %s\n", journal->j_devname); 688 687 err = 0; 689 688 } 690 689 ··· 987 990 } 988 991 
spin_unlock(&journal->j_list_lock); 989 992 993 + trace_mark(jbd2_end_commit, "dev %s transaction %d head %d", 994 + journal->j_devname, commit_transaction->t_tid, 995 + journal->j_tail_sequence); 990 996 jbd_debug(1, "JBD: commit %d complete, head %d\n", 991 997 journal->j_commit_sequence, journal->j_tail_sequence); 992 998
+41 -34
fs/jbd2/journal.c
··· 597 597 if (ret) 598 598 *retp = ret; 599 599 else { 600 - char b[BDEVNAME_SIZE]; 601 - 602 600 printk(KERN_ALERT "%s: journal block not found " 603 601 "at offset %lu on %s\n", 604 - __func__, 605 - blocknr, 606 - bdevname(journal->j_dev, b)); 602 + __func__, blocknr, journal->j_devname); 607 603 err = -EIO; 608 604 __journal_abort_soft(journal, err); 609 605 } ··· 897 901 898 902 static void jbd2_stats_proc_init(journal_t *journal) 899 903 { 900 - char name[BDEVNAME_SIZE]; 901 - 902 - bdevname(journal->j_dev, name); 903 - journal->j_proc_entry = proc_mkdir(name, proc_jbd2_stats); 904 + journal->j_proc_entry = proc_mkdir(journal->j_devname, proc_jbd2_stats); 904 905 if (journal->j_proc_entry) { 905 906 proc_create_data("history", S_IRUGO, journal->j_proc_entry, 906 907 &jbd2_seq_history_fops, journal); ··· 908 915 909 916 static void jbd2_stats_proc_exit(journal_t *journal) 910 917 { 911 - char name[BDEVNAME_SIZE]; 912 - 913 - bdevname(journal->j_dev, name); 914 918 remove_proc_entry("info", journal->j_proc_entry); 915 919 remove_proc_entry("history", journal->j_proc_entry); 916 - remove_proc_entry(name, proc_jbd2_stats); 920 + remove_proc_entry(journal->j_devname, proc_jbd2_stats); 917 921 } 918 922 919 923 static void journal_init_stats(journal_t *journal) ··· 1008 1018 { 1009 1019 journal_t *journal = journal_init_common(); 1010 1020 struct buffer_head *bh; 1021 + char *p; 1011 1022 int n; 1012 1023 1013 1024 if (!journal) ··· 1030 1039 journal->j_fs_dev = fs_dev; 1031 1040 journal->j_blk_offset = start; 1032 1041 journal->j_maxlen = len; 1042 + bdevname(journal->j_dev, journal->j_devname); 1043 + p = journal->j_devname; 1044 + while ((p = strchr(p, '/'))) 1045 + *p = '!'; 1033 1046 jbd2_stats_proc_init(journal); 1034 1047 1035 1048 bh = __getblk(journal->j_dev, start, journal->j_blocksize); ··· 1056 1061 { 1057 1062 struct buffer_head *bh; 1058 1063 journal_t *journal = journal_init_common(); 1064 + char *p; 1059 1065 int err; 1060 1066 int n; 1061 1067 
unsigned long long blocknr; ··· 1066 1070 1067 1071 journal->j_dev = journal->j_fs_dev = inode->i_sb->s_bdev; 1068 1072 journal->j_inode = inode; 1073 + bdevname(journal->j_dev, journal->j_devname); 1074 + p = journal->j_devname; 1075 + while ((p = strchr(p, '/'))) 1076 + *p = '!'; 1077 + p = journal->j_devname + strlen(journal->j_devname); 1078 + sprintf(p, ":%lu", journal->j_inode->i_ino); 1069 1079 jbd_debug(1, 1070 1080 "journal %p: inode %s/%ld, size %Ld, bits %d, blksize %ld\n", 1071 1081 journal, inode->i_sb->s_id, inode->i_ino, ··· 1255 1253 goto out; 1256 1254 } 1257 1255 1256 + if (buffer_write_io_error(bh)) { 1257 + /* 1258 + * Oh, dear. A previous attempt to write the journal 1259 + * superblock failed. This could happen because the 1260 + * USB device was yanked out. Or it could happen to 1261 + * be a transient write error and maybe the block will 1262 + * be remapped. Nothing we can do but to retry the 1263 + * write and hope for the best. 1264 + */ 1265 + printk(KERN_ERR "JBD2: previous I/O error detected " 1266 + "for journal superblock update for %s.\n", 1267 + journal->j_devname); 1268 + clear_buffer_write_io_error(bh); 1269 + set_buffer_uptodate(bh); 1270 + } 1271 + 1258 1272 spin_lock(&journal->j_state_lock); 1259 1273 jbd_debug(1,"JBD: updating superblock (start %ld, seq %d, errno %d)\n", 1260 1274 journal->j_tail, journal->j_tail_sequence, journal->j_errno); ··· 1282 1264 1283 1265 BUFFER_TRACE(bh, "marking dirty"); 1284 1266 mark_buffer_dirty(bh); 1285 - if (wait) 1267 + if (wait) { 1286 1268 sync_dirty_buffer(bh); 1287 - else 1269 + if (buffer_write_io_error(bh)) { 1270 + printk(KERN_ERR "JBD2: I/O error detected " 1271 + "when updating journal superblock for %s.\n", 1272 + journal->j_devname); 1273 + clear_buffer_write_io_error(bh); 1274 + set_buffer_uptodate(bh); 1275 + } 1276 + } else 1288 1277 ll_rw_block(SWRITE, 1, &bh); 1289 1278 1290 1279 out: ··· 1786 1761 } 1787 1762 1788 1763 /* 1789 - * journal_dev_name: format a character string 
to describe on what 1790 - * device this journal is present. 1791 - */ 1792 - 1793 - static const char *journal_dev_name(journal_t *journal, char *buffer) 1794 - { 1795 - struct block_device *bdev; 1796 - 1797 - if (journal->j_inode) 1798 - bdev = journal->j_inode->i_sb->s_bdev; 1799 - else 1800 - bdev = journal->j_dev; 1801 - 1802 - return bdevname(bdev, buffer); 1803 - } 1804 - 1805 - /* 1806 1764 * Journal abort has very specific semantics, which we describe 1807 1765 * for journal abort. 1808 1766 * ··· 1801 1793 void __jbd2_journal_abort_hard(journal_t *journal) 1802 1794 { 1803 1795 transaction_t *transaction; 1804 - char b[BDEVNAME_SIZE]; 1805 1796 1806 1797 if (journal->j_flags & JBD2_ABORT) 1807 1798 return; 1808 1799 1809 1800 printk(KERN_ERR "Aborting journal on device %s.\n", 1810 - journal_dev_name(journal, b)); 1801 + journal->j_devname); 1811 1802 1812 1803 spin_lock(&journal->j_state_lock); 1813 1804 journal->j_flags |= JBD2_ABORT;
-9
fs/ocfs2/alloc.c
··· 990 990 } 991 991 992 992 /* 993 - * This is only valid for leaf nodes, which are the only ones that can 994 - * have empty extents anyway. 995 - */ 996 - static inline int ocfs2_is_empty_extent(struct ocfs2_extent_rec *rec) 997 - { 998 - return !rec->e_leaf_clusters; 999 - } 1000 - 1001 - /* 1002 993 * This function will discard the rightmost extent record. 1003 994 */ 1004 995 static void ocfs2_shift_records_right(struct ocfs2_extent_list *el)
+9
fs/ocfs2/alloc.h
··· 146 146 return le16_to_cpu(rec->e_leaf_clusters); 147 147 } 148 148 149 + /* 150 + * This is only valid for leaf nodes, which are the only ones that can 151 + * have empty extents anyway. 152 + */ 153 + static inline int ocfs2_is_empty_extent(struct ocfs2_extent_rec *rec) 154 + { 155 + return !rec->e_leaf_clusters; 156 + } 157 + 149 158 #endif /* OCFS2_ALLOC_H */
+299 -59
fs/ocfs2/extent_map.c
··· 25 25 #include <linux/fs.h> 26 26 #include <linux/init.h> 27 27 #include <linux/types.h> 28 + #include <linux/fiemap.h> 28 29 29 30 #define MLOG_MASK_PREFIX ML_EXTENT_MAP 30 31 #include <cluster/masklog.h> ··· 33 32 #include "ocfs2.h" 34 33 35 34 #include "alloc.h" 35 + #include "dlmglue.h" 36 36 #include "extent_map.h" 37 37 #include "inode.h" 38 38 #include "super.h" ··· 284 282 kfree(new_emi); 285 283 } 286 284 285 + static int ocfs2_last_eb_is_empty(struct inode *inode, 286 + struct ocfs2_dinode *di) 287 + { 288 + int ret, next_free; 289 + u64 last_eb_blk = le64_to_cpu(di->i_last_eb_blk); 290 + struct buffer_head *eb_bh = NULL; 291 + struct ocfs2_extent_block *eb; 292 + struct ocfs2_extent_list *el; 293 + 294 + ret = ocfs2_read_block(OCFS2_SB(inode->i_sb), last_eb_blk, 295 + &eb_bh, OCFS2_BH_CACHED, inode); 296 + if (ret) { 297 + mlog_errno(ret); 298 + goto out; 299 + } 300 + 301 + eb = (struct ocfs2_extent_block *) eb_bh->b_data; 302 + el = &eb->h_list; 303 + 304 + if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) { 305 + ret = -EROFS; 306 + OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb); 307 + goto out; 308 + } 309 + 310 + if (el->l_tree_depth) { 311 + ocfs2_error(inode->i_sb, 312 + "Inode %lu has non zero tree depth in " 313 + "leaf block %llu\n", inode->i_ino, 314 + (unsigned long long)eb_bh->b_blocknr); 315 + ret = -EROFS; 316 + goto out; 317 + } 318 + 319 + next_free = le16_to_cpu(el->l_next_free_rec); 320 + 321 + if (next_free == 0 || 322 + (next_free == 1 && ocfs2_is_empty_extent(&el->l_recs[0]))) 323 + ret = 1; 324 + 325 + out: 326 + brelse(eb_bh); 327 + return ret; 328 + } 329 + 287 330 /* 288 331 * Return the 1st index within el which contains an extent start 289 332 * larger than v_cluster. 
··· 420 373 return ret; 421 374 } 422 375 423 - int ocfs2_get_clusters(struct inode *inode, u32 v_cluster, 424 - u32 *p_cluster, u32 *num_clusters, 425 - unsigned int *extent_flags) 376 + static int ocfs2_get_clusters_nocache(struct inode *inode, 377 + struct buffer_head *di_bh, 378 + u32 v_cluster, unsigned int *hole_len, 379 + struct ocfs2_extent_rec *ret_rec, 380 + unsigned int *is_last) 426 381 { 427 - int ret, i; 428 - unsigned int flags = 0; 429 - struct buffer_head *di_bh = NULL; 430 - struct buffer_head *eb_bh = NULL; 382 + int i, ret, tree_height, len; 431 383 struct ocfs2_dinode *di; 432 - struct ocfs2_extent_block *eb; 384 + struct ocfs2_extent_block *uninitialized_var(eb); 433 385 struct ocfs2_extent_list *el; 434 386 struct ocfs2_extent_rec *rec; 435 - u32 coff; 387 + struct buffer_head *eb_bh = NULL; 436 388 437 - if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) { 438 - ret = -ERANGE; 439 - mlog_errno(ret); 440 - goto out; 441 - } 442 - 443 - ret = ocfs2_extent_map_lookup(inode, v_cluster, p_cluster, 444 - num_clusters, extent_flags); 445 - if (ret == 0) 446 - goto out; 447 - 448 - ret = ocfs2_read_block(OCFS2_SB(inode->i_sb), OCFS2_I(inode)->ip_blkno, 449 - &di_bh, OCFS2_BH_CACHED, inode); 450 - if (ret) { 451 - mlog_errno(ret); 452 - goto out; 453 - } 389 + memset(ret_rec, 0, sizeof(*ret_rec)); 390 + if (is_last) 391 + *is_last = 0; 454 392 455 393 di = (struct ocfs2_dinode *) di_bh->b_data; 456 394 el = &di->id2.i_list; 395 + tree_height = le16_to_cpu(el->l_tree_depth); 457 396 458 - if (el->l_tree_depth) { 397 + if (tree_height > 0) { 459 398 ret = ocfs2_find_leaf(inode, el, v_cluster, &eb_bh); 460 399 if (ret) { 461 400 mlog_errno(ret); ··· 464 431 i = ocfs2_search_extent_list(el, v_cluster); 465 432 if (i == -1) { 466 433 /* 434 + * Holes can be larger than the maximum size of an 435 + * extent, so we return their lengths in a seperate 436 + * field. 
437 + */ 438 + if (hole_len) { 439 + ret = ocfs2_figure_hole_clusters(inode, el, eb_bh, 440 + v_cluster, &len); 441 + if (ret) { 442 + mlog_errno(ret); 443 + goto out; 444 + } 445 + 446 + *hole_len = len; 447 + } 448 + goto out_hole; 449 + } 450 + 451 + rec = &el->l_recs[i]; 452 + 453 + BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos)); 454 + 455 + if (!rec->e_blkno) { 456 + ocfs2_error(inode->i_sb, "Inode %lu has bad extent " 457 + "record (%u, %u, 0)", inode->i_ino, 458 + le32_to_cpu(rec->e_cpos), 459 + ocfs2_rec_clusters(el, rec)); 460 + ret = -EROFS; 461 + goto out; 462 + } 463 + 464 + *ret_rec = *rec; 465 + 466 + /* 467 + * Checking for last extent is potentially expensive - we 468 + * might have to look at the next leaf over to see if it's 469 + * empty. 470 + * 471 + * The first two checks are to see whether the caller even 472 + * cares for this information, and if the extent is at least 473 + * the last in it's list. 474 + * 475 + * If those hold true, then the extent is last if any of the 476 + * additional conditions hold true: 477 + * - Extent list is in-inode 478 + * - Extent list is right-most 479 + * - Extent list is 2nd to rightmost, with empty right-most 480 + */ 481 + if (is_last) { 482 + if (i == (le16_to_cpu(el->l_next_free_rec) - 1)) { 483 + if (tree_height == 0) 484 + *is_last = 1; 485 + else if (eb->h_blkno == di->i_last_eb_blk) 486 + *is_last = 1; 487 + else if (eb->h_next_leaf_blk == di->i_last_eb_blk) { 488 + ret = ocfs2_last_eb_is_empty(inode, di); 489 + if (ret < 0) { 490 + mlog_errno(ret); 491 + goto out; 492 + } 493 + if (ret == 1) 494 + *is_last = 1; 495 + } 496 + } 497 + } 498 + 499 + out_hole: 500 + ret = 0; 501 + out: 502 + brelse(eb_bh); 503 + return ret; 504 + } 505 + 506 + static void ocfs2_relative_extent_offsets(struct super_block *sb, 507 + u32 v_cluster, 508 + struct ocfs2_extent_rec *rec, 509 + u32 *p_cluster, u32 *num_clusters) 510 + 511 + { 512 + u32 coff = v_cluster - le32_to_cpu(rec->e_cpos); 513 + 514 + *p_cluster = 
ocfs2_blocks_to_clusters(sb, le64_to_cpu(rec->e_blkno)); 515 + *p_cluster = *p_cluster + coff; 516 + 517 + if (num_clusters) 518 + *num_clusters = le16_to_cpu(rec->e_leaf_clusters) - coff; 519 + } 520 + 521 + int ocfs2_get_clusters(struct inode *inode, u32 v_cluster, 522 + u32 *p_cluster, u32 *num_clusters, 523 + unsigned int *extent_flags) 524 + { 525 + int ret; 526 + unsigned int uninitialized_var(hole_len), flags = 0; 527 + struct buffer_head *di_bh = NULL; 528 + struct ocfs2_extent_rec rec; 529 + 530 + if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) { 531 + ret = -ERANGE; 532 + mlog_errno(ret); 533 + goto out; 534 + } 535 + 536 + ret = ocfs2_extent_map_lookup(inode, v_cluster, p_cluster, 537 + num_clusters, extent_flags); 538 + if (ret == 0) 539 + goto out; 540 + 541 + ret = ocfs2_read_block(OCFS2_SB(inode->i_sb), OCFS2_I(inode)->ip_blkno, 542 + &di_bh, OCFS2_BH_CACHED, inode); 543 + if (ret) { 544 + mlog_errno(ret); 545 + goto out; 546 + } 547 + 548 + ret = ocfs2_get_clusters_nocache(inode, di_bh, v_cluster, &hole_len, 549 + &rec, NULL); 550 + if (ret) { 551 + mlog_errno(ret); 552 + goto out; 553 + } 554 + 555 + if (rec.e_blkno == 0ULL) { 556 + /* 467 557 * A hole was found. Return some canned values that 468 558 * callers can key on. If asked for, num_clusters will 469 559 * be populated with the size of the hole. 
470 560 */ 471 561 *p_cluster = 0; 472 562 if (num_clusters) { 473 - ret = ocfs2_figure_hole_clusters(inode, el, eb_bh, 474 - v_cluster, 475 - num_clusters); 476 - if (ret) { 477 - mlog_errno(ret); 478 - goto out; 479 - } 563 + *num_clusters = hole_len; 480 564 } 481 565 } else { 482 - rec = &el->l_recs[i]; 566 + ocfs2_relative_extent_offsets(inode->i_sb, v_cluster, &rec, 567 + p_cluster, num_clusters); 568 + flags = rec.e_flags; 483 569 484 - BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos)); 485 - 486 - if (!rec->e_blkno) { 487 - ocfs2_error(inode->i_sb, "Inode %lu has bad extent " 488 - "record (%u, %u, 0)", inode->i_ino, 489 - le32_to_cpu(rec->e_cpos), 490 - ocfs2_rec_clusters(el, rec)); 491 - ret = -EROFS; 492 - goto out; 493 - } 494 - 495 - coff = v_cluster - le32_to_cpu(rec->e_cpos); 496 - 497 - *p_cluster = ocfs2_blocks_to_clusters(inode->i_sb, 498 - le64_to_cpu(rec->e_blkno)); 499 - *p_cluster = *p_cluster + coff; 500 - 501 - if (num_clusters) 502 - *num_clusters = ocfs2_rec_clusters(el, rec) - coff; 503 - 504 - flags = rec->e_flags; 505 - 506 - ocfs2_extent_map_insert_rec(inode, rec); 570 + ocfs2_extent_map_insert_rec(inode, &rec); 507 571 } 508 572 509 573 if (extent_flags) ··· 608 478 609 479 out: 610 480 brelse(di_bh); 611 - brelse(eb_bh); 612 481 return ret; 613 482 } 614 483 ··· 648 519 } 649 520 650 521 out: 522 + return ret; 523 + } 524 + 525 + static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh, 526 + struct fiemap_extent_info *fieinfo, 527 + u64 map_start) 528 + { 529 + int ret; 530 + unsigned int id_count; 531 + struct ocfs2_dinode *di; 532 + u64 phys; 533 + u32 flags = FIEMAP_EXTENT_DATA_INLINE|FIEMAP_EXTENT_LAST; 534 + struct ocfs2_inode_info *oi = OCFS2_I(inode); 535 + 536 + di = (struct ocfs2_dinode *)di_bh->b_data; 537 + id_count = le16_to_cpu(di->id2.i_data.id_count); 538 + 539 + if (map_start < id_count) { 540 + phys = oi->ip_blkno << inode->i_sb->s_blocksize_bits; 541 + phys += offsetof(struct ocfs2_dinode, 
id2.i_data.id_data); 542 + 543 + ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count, 544 + flags); 545 + if (ret < 0) 546 + return ret; 547 + } 548 + 549 + return 0; 550 + } 551 + 552 + #define OCFS2_FIEMAP_FLAGS (FIEMAP_FLAG_SYNC) 553 + 554 + int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 555 + u64 map_start, u64 map_len) 556 + { 557 + int ret, is_last; 558 + u32 mapping_end, cpos; 559 + unsigned int hole_size; 560 + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); 561 + u64 len_bytes, phys_bytes, virt_bytes; 562 + struct buffer_head *di_bh = NULL; 563 + struct ocfs2_extent_rec rec; 564 + 565 + ret = fiemap_check_flags(fieinfo, OCFS2_FIEMAP_FLAGS); 566 + if (ret) 567 + return ret; 568 + 569 + ret = ocfs2_inode_lock(inode, &di_bh, 0); 570 + if (ret) { 571 + mlog_errno(ret); 572 + goto out; 573 + } 574 + 575 + down_read(&OCFS2_I(inode)->ip_alloc_sem); 576 + 577 + /* 578 + * Handle inline-data separately. 579 + */ 580 + if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) { 581 + ret = ocfs2_fiemap_inline(inode, di_bh, fieinfo, map_start); 582 + goto out_unlock; 583 + } 584 + 585 + cpos = map_start >> osb->s_clustersize_bits; 586 + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, 587 + map_start + map_len); 588 + mapping_end -= cpos; 589 + is_last = 0; 590 + while (cpos < mapping_end && !is_last) { 591 + u32 fe_flags; 592 + 593 + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, 594 + &hole_size, &rec, &is_last); 595 + if (ret) { 596 + mlog_errno(ret); 597 + goto out; 598 + } 599 + 600 + if (rec.e_blkno == 0ULL) { 601 + cpos += hole_size; 602 + continue; 603 + } 604 + 605 + fe_flags = 0; 606 + if (rec.e_flags & OCFS2_EXT_UNWRITTEN) 607 + fe_flags |= FIEMAP_EXTENT_UNWRITTEN; 608 + if (is_last) 609 + fe_flags |= FIEMAP_EXTENT_LAST; 610 + len_bytes = (u64)le16_to_cpu(rec.e_leaf_clusters) << osb->s_clustersize_bits; 611 + phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb->s_blocksize_bits; 612 + virt_bytes = 
(u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits; 613 + 614 + ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes, 615 + len_bytes, fe_flags); 616 + if (ret) 617 + break; 618 + 619 + cpos = le32_to_cpu(rec.e_cpos)+ le16_to_cpu(rec.e_leaf_clusters); 620 + } 621 + 622 + if (ret > 0) 623 + ret = 0; 624 + 625 + out_unlock: 626 + brelse(di_bh); 627 + 628 + up_read(&OCFS2_I(inode)->ip_alloc_sem); 629 + 630 + ocfs2_inode_unlock(inode, 0); 631 + out: 632 + 651 633 return ret; 652 634 }
+3
fs/ocfs2/extent_map.h
··· 50 50 int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno, 51 51 u64 *ret_count, unsigned int *extent_flags); 52 52 53 + int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 54 + u64 map_start, u64 map_len); 55 + 53 56 #endif /* _EXTENT_MAP_H */
+1
fs/ocfs2/file.c
··· 2228 2228 .getattr = ocfs2_getattr, 2229 2229 .permission = ocfs2_permission, 2230 2230 .fallocate = ocfs2_fallocate, 2231 + .fiemap = ocfs2_fiemap, 2231 2232 }; 2232 2233 2233 2234 const struct inode_operations ocfs2_special_file_iops = {
+2
include/linux/ext3_fs.h
··· 837 837 extern void ext3_set_inode_flags(struct inode *); 838 838 extern void ext3_get_inode_flags(struct ext3_inode_info *); 839 839 extern void ext3_set_aops(struct inode *inode); 840 + extern int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 841 + u64 start, u64 len); 840 842 841 843 /* ioctl.c */ 842 844 extern int ext3_ioctl (struct inode *, struct file *, unsigned int,
+64
include/linux/fiemap.h
··· 1 + /* 2 + * FS_IOC_FIEMAP ioctl infrastructure. 3 + * 4 + * Some portions copyright (C) 2007 Cluster File Systems, Inc 5 + * 6 + * Authors: Mark Fasheh <mfasheh@suse.com> 7 + * Kalpak Shah <kalpak.shah@sun.com> 8 + * Andreas Dilger <adilger@sun.com> 9 + */ 10 + 11 + #ifndef _LINUX_FIEMAP_H 12 + #define _LINUX_FIEMAP_H 13 + 14 + struct fiemap_extent { 15 + __u64 fe_logical; /* logical offset in bytes for the start of 16 + * the extent from the beginning of the file */ 17 + __u64 fe_physical; /* physical offset in bytes for the start 18 + * of the extent from the beginning of the disk */ 19 + __u64 fe_length; /* length in bytes for this extent */ 20 + __u64 fe_reserved64[2]; 21 + __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */ 22 + __u32 fe_reserved[3]; 23 + }; 24 + 25 + struct fiemap { 26 + __u64 fm_start; /* logical offset (inclusive) at 27 + * which to start mapping (in) */ 28 + __u64 fm_length; /* logical length of mapping which 29 + * userspace wants (in) */ 30 + __u32 fm_flags; /* FIEMAP_FLAG_* flags for request (in/out) */ 31 + __u32 fm_mapped_extents;/* number of extents that were mapped (out) */ 32 + __u32 fm_extent_count; /* size of fm_extents array (in) */ 33 + __u32 fm_reserved; 34 + struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */ 35 + }; 36 + 37 + #define FIEMAP_MAX_OFFSET (~0ULL) 38 + 39 + #define FIEMAP_FLAG_SYNC 0x00000001 /* sync file data before map */ 40 + #define FIEMAP_FLAG_XATTR 0x00000002 /* map extended attribute tree */ 41 + 42 + #define FIEMAP_FLAGS_COMPAT (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR) 43 + 44 + #define FIEMAP_EXTENT_LAST 0x00000001 /* Last extent in file. */ 45 + #define FIEMAP_EXTENT_UNKNOWN 0x00000002 /* Data location unknown. */ 46 + #define FIEMAP_EXTENT_DELALLOC 0x00000004 /* Location still pending. 47 + * Sets EXTENT_UNKNOWN. 
*/ 48 + #define FIEMAP_EXTENT_ENCODED 0x00000008 /* Data can not be read 49 + * while fs is unmounted */ 50 + #define FIEMAP_EXTENT_DATA_ENCRYPTED 0x00000080 /* Data is encrypted by fs. 51 + * Sets EXTENT_NO_BYPASS. */ 52 + #define FIEMAP_EXTENT_NOT_ALIGNED 0x00000100 /* Extent offsets may not be 53 + * block aligned. */ 54 + #define FIEMAP_EXTENT_DATA_INLINE 0x00000200 /* Data mixed with metadata. 55 + * Sets EXTENT_NOT_ALIGNED.*/ 56 + #define FIEMAP_EXTENT_DATA_TAIL 0x00000400 /* Multiple files in block. 57 + * Sets EXTENT_NOT_ALIGNED.*/ 58 + #define FIEMAP_EXTENT_UNWRITTEN 0x00000800 /* Space allocated, but 59 + * no data (i.e. zero). */ 60 + #define FIEMAP_EXTENT_MERGED 0x00001000 /* File does not natively 61 + * support extents. Result 62 + * merged for efficiency. */ 63 + 64 + #endif /* _LINUX_FIEMAP_H */
+21
include/linux/fs.h
··· 234 234 #define FS_IOC_SETFLAGS _IOW('f', 2, long) 235 235 #define FS_IOC_GETVERSION _IOR('v', 1, long) 236 236 #define FS_IOC_SETVERSION _IOW('v', 2, long) 237 + #define FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap) 237 238 #define FS_IOC32_GETFLAGS _IOR('f', 1, int) 238 239 #define FS_IOC32_SETFLAGS _IOW('f', 2, int) 239 240 #define FS_IOC32_GETVERSION _IOR('v', 1, int) ··· 295 294 #include <linux/mutex.h> 296 295 #include <linux/capability.h> 297 296 #include <linux/semaphore.h> 297 + #include <linux/fiemap.h> 298 298 299 299 #include <asm/atomic.h> 300 300 #include <asm/byteorder.h> ··· 1184 1182 extern int file_permission(struct file *, int); 1185 1183 1186 1184 /* 1185 + * VFS FS_IOC_FIEMAP helper definitions. 1186 + */ 1187 + struct fiemap_extent_info { 1188 + unsigned int fi_flags; /* Flags as passed from user */ 1189 + unsigned int fi_extents_mapped; /* Number of mapped extents */ 1190 + unsigned int fi_extents_max; /* Size of fiemap_extent array */ 1191 + struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent 1192 + * array */ 1193 + }; 1194 + int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical, 1195 + u64 phys, u64 len, u32 flags); 1196 + int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags); 1197 + 1198 + /* 1187 1199 * File types 1188 1200 * 1189 1201 * NOTE! 
These match bits 12..15 of stat.st_mode ··· 1306 1290 void (*truncate_range)(struct inode *, loff_t, loff_t); 1307 1291 long (*fallocate)(struct inode *inode, int mode, loff_t offset, 1308 1292 loff_t len); 1293 + int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, 1294 + u64 len); 1309 1295 }; 1310 1296 1311 1297 struct seq_file; ··· 2005 1987 2006 1988 extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, 2007 1989 unsigned long arg); 1990 + extern int generic_block_fiemap(struct inode *inode, 1991 + struct fiemap_extent_info *fieinfo, u64 start, 1992 + u64 len, get_block_t *get_block); 2008 1993 2009 1994 extern void get_filesystem(struct file_system_type *fs); 2010 1995 extern void put_filesystem(struct file_system_type *fs);
+2 -1
include/linux/jbd2.h
··· 850 850 */ 851 851 struct block_device *j_dev; 852 852 int j_blocksize; 853 - unsigned long long j_blk_offset; 853 + unsigned long long j_blk_offset; 854 + char j_devname[BDEVNAME_SIZE+24]; 854 855 855 856 /* 856 857 * Device which holds the client fs. For internal journal this will be
+3 -9
include/linux/percpu_counter.h
··· 35 35 void percpu_counter_destroy(struct percpu_counter *fbc); 36 36 void percpu_counter_set(struct percpu_counter *fbc, s64 amount); 37 37 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch); 38 - s64 __percpu_counter_sum(struct percpu_counter *fbc, int set); 38 + s64 __percpu_counter_sum(struct percpu_counter *fbc); 39 39 40 40 static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount) 41 41 { ··· 44 44 45 45 static inline s64 percpu_counter_sum_positive(struct percpu_counter *fbc) 46 46 { 47 - s64 ret = __percpu_counter_sum(fbc, 0); 47 + s64 ret = __percpu_counter_sum(fbc); 48 48 return ret < 0 ? 0 : ret; 49 49 } 50 50 51 - static inline s64 percpu_counter_sum_and_set(struct percpu_counter *fbc) 52 - { 53 - return __percpu_counter_sum(fbc, 1); 54 - } 55 - 56 - 57 51 static inline s64 percpu_counter_sum(struct percpu_counter *fbc) 58 52 { 59 - return __percpu_counter_sum(fbc, 0); 53 + return __percpu_counter_sum(fbc); 60 54 } 61 55 62 56 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
+3 -5
lib/percpu_counter.c
··· 52 52 * Add up all the per-cpu counts, return the result. This is a more accurate 53 53 * but much slower version of percpu_counter_read_positive() 54 54 */ 55 - s64 __percpu_counter_sum(struct percpu_counter *fbc, int set) 55 + s64 __percpu_counter_sum(struct percpu_counter *fbc) 56 56 { 57 57 s64 ret; 58 58 int cpu; ··· 62 62 for_each_online_cpu(cpu) { 63 63 s32 *pcount = per_cpu_ptr(fbc->counters, cpu); 64 64 ret += *pcount; 65 - if (set) 66 - *pcount = 0; 65 + *pcount = 0; 67 66 } 68 - if (set) 69 - fbc->count = ret; 67 + fbc->count = ret; 70 68 71 69 spin_unlock(&fbc->lock); 72 70 return ret;