Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"This series introduces a device aliasing feature, where the user can
carve out partitions and later reclaim the space by deleting the
aliased file in the root directory.

In addition, there are numerous minor bug fixes in zoned device
support, checkpoint=disable, extent cache management, fiemap, and the
lazytime mount option. The full list of notable changes can be found
below.

Enhancements:
- introduce device aliasing file
- add stats in debugfs to show multiple devices
- add a sysfs node to limit max read extent count per-inode
 - modify f2fs_is_checkpoint_ready logic to allow more data to be
   written while checkpoint is disabled
- decrease spare area for pinned files for zoned devices

Fixes:
- Revert "f2fs: remove unreachable lazytime mount option parsing"
- adjust unusable cap before checkpoint=disable mode
- fix to drop all discards after creating snapshot on lvm device
- fix to shrink read extent node in batches
- fix changing cursegs if recovery fails on zoned device
- fix to adjust appropriate length for fiemap
- fix fiemap failure issue when page size is 16KB
- fix to avoid forcing direct write to use buffered IO on inline_data
inode
- fix to map blocks correctly for direct write
- fix to account dirty data in __get_secs_required()
- fix null-ptr-deref in f2fs_submit_page_bio()
- fix inconsistent update of i_blocks in release_compress_blocks and
reserve_compress_blocks"

* tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
f2fs: fix to drop all discards after creating snapshot on lvm device
f2fs: add a sysfs node to limit max read extent count per-inode
f2fs: fix to shrink read extent node in batches
f2fs: print message if fscorrupted was found in f2fs_new_node_page()
f2fs: clear SBI_POR_DOING before initing inmem curseg
f2fs: fix changing cursegs if recovery fails on zoned device
f2fs: adjust unusable cap before checkpoint=disable mode
f2fs: fix to requery extent which cross boundary of inquiry
f2fs: fix to adjust appropriate length for fiemap
f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BYTES_TO_BLK}
f2fs: fix to do cast in F2FS_{BLK_TO_BYTES,BYTES_TO_BLK} to avoid overflow
f2fs: replace deprecated strcpy with strscpy
Revert "f2fs: remove unreachable lazytime mount option parsing"
f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
f2fs: fix to map blocks correctly for direct write
f2fs: fix race in concurrent f2fs_stop_gc_thread
f2fs: fix fiemap failure issue when page size is 16KB
f2fs: remove redundant atomic file check in defragment
f2fs: fix to convert log type to segment data type correctly
f2fs: clean up the unused variable additional_reserved_segments
...

Total: +700 -255
Documentation/ABI/testing/sysfs-fs-f2fs (+11 -2)
···
 			GC approach and turns SSR mode on.
 			gc urgent low(2): lowers the bar of checking I/O idling in
 			order to process outstanding discard commands and GC a
-			little bit aggressively. uses cost benefit GC approach.
+			little bit aggressively. always uses cost benefit GC approach,
+			and will override age-threshold GC approach if ATGC is enabled
+			at the same time.
 			gc urgent mid(3): does GC forcibly in a period of given
 			gc_urgent_sleep_time and executes a mid level of I/O idling check.
-			uses cost benefit GC approach.
+			always uses cost benefit GC approach, and will override
+			age-threshold GC approach if ATGC is enabled at the same time.
 
 What:		/sys/fs/f2fs/<disk>/gc_urgent_sleep_time
 Date:		August 2017
···
 		for zoned deivces. The initial value of it is 95(%). F2FS will stop the
 		background GC thread from intiating GC for sections having valid blocks
 		exceeding the ratio.
+
+What:		/sys/fs/f2fs/<disk>/max_read_extent_count
+Date:		November 2024
+Contact:	"Chao Yu" <chao@kernel.org>
+Description:	It controls max read extent count for per-inode, the value of threshold
+		is 10240 by default.
Documentation/filesystems/f2fs.rst (+44)
···
 can start before the zone-capacity and span across zone-capacity boundary.
 Such spanning segments are also considered as usable segments. All blocks
 past the zone-capacity are considered unusable in these segments.
+
+Device aliasing feature
+-----------------------
+
+f2fs can utilize a special file called a "device aliasing file." This file allows
+the entire storage device to be mapped with a single, large extent, not using
+the usual f2fs node structures. This mapped area is pinned and primarily intended
+for holding the space.
+
+Essentially, this mechanism allows a portion of the f2fs area to be temporarily
+reserved and used by another filesystem or for different purposes. Once that
+external usage is complete, the device aliasing file can be deleted, releasing
+the reserved space back to F2FS for its own use.
+
+<use-case>
+
+# ls /dev/vd*
+/dev/vdb (32GB) /dev/vdc (32GB)
+# mkfs.ext4 /dev/vdc
+# mkfs.f2fs -c /dev/vdc@vdc.file /dev/vdb
+# mount /dev/vdb /mnt/f2fs
+# ls -l /mnt/f2fs
+vdc.file
+# df -h
+/dev/vdb        64G  33G  32G  52% /mnt/f2fs
+
+# mount -o loop /dev/vdc /mnt/ext4
+# df -h
+/dev/vdb        64G  33G  32G  52% /mnt/f2fs
+/dev/loop7      32G  24K  30G   1% /mnt/ext4
+# umount /mnt/ext4
+
+# f2fs_io getflags /mnt/f2fs/vdc.file
+get a flag on /mnt/f2fs/vdc.file ret=0, flags=nocow(pinned),immutable
+# f2fs_io setflags noimmutable /mnt/f2fs/vdc.file
+get a flag on noimmutable ret=0, flags=800010
+set a flag on /mnt/f2fs/vdc.file ret=0, flags=noimmutable
+# rm /mnt/f2fs/vdc.file
+# df -h
+/dev/vdb        64G 753M  64G   2% /mnt/f2fs
+
+So, the key idea is, user can do any file operations on /dev/vdc, and
+reclaim the space after the use, while the space is counted as /data.
+That doesn't require modifying partition size and filesystem format.
fs/f2fs/acl.c (+2 -3)
···
 	struct posix_acl *clone = NULL;
 
 	if (acl) {
-		int size = sizeof(struct posix_acl) + acl->a_count *
-			sizeof(struct posix_acl_entry);
-		clone = kmemdup(acl, size, flags);
+		clone = kmemdup(acl, struct_size(acl, a_entries, acl->a_count),
+				flags);
 		if (clone)
 			refcount_set(&clone->a_refcount, 1);
 	}
fs/f2fs/checkpoint.c (+1 -1)
···
 	f2fs_build_fault_attr(sbi, 0, 0);
 	if (!end_io)
 		f2fs_flush_merged_writes(sbi);
-	f2fs_handle_critical_error(sbi, reason, end_io);
+	f2fs_handle_critical_error(sbi, reason);
 }
 
 /*
fs/f2fs/data.c (+50 -64)
···
 	/* reserved delalloc block should be mapped for fiemap. */
 	if (blkaddr == NEW_ADDR)
 		map->m_flags |= F2FS_MAP_DELALLOC;
-	if (flag != F2FS_GET_BLOCK_DIO || !is_hole)
+	/* DIO READ and hole case, should not map the blocks. */
+	if (!(flag == F2FS_GET_BLOCK_DIO && is_hole && !map->m_may_create))
 		map->m_flags |= F2FS_MAP_MAPPED;
 
 	map->m_pblk = blkaddr;
···
 	return true;
 }
 
-static inline u64 bytes_to_blks(struct inode *inode, u64 bytes)
-{
-	return (bytes >> inode->i_blkbits);
-}
-
-static inline u64 blks_to_bytes(struct inode *inode, u64 blks)
-{
-	return (blks << inode->i_blkbits);
-}
-
 static int f2fs_xattr_fiemap(struct inode *inode,
 				struct fiemap_extent_info *fieinfo)
 {
···
 			return err;
 	}
 
-	phys = blks_to_bytes(inode, ni.blk_addr);
+	phys = F2FS_BLK_TO_BYTES(ni.blk_addr);
 	offset = offsetof(struct f2fs_inode, i_addr) +
 				sizeof(__le32) * (DEF_ADDRS_PER_INODE -
 					get_inline_xattr_addrs(inode));
···
 			return err;
 	}
 
-	phys = blks_to_bytes(inode, ni.blk_addr);
+	phys = F2FS_BLK_TO_BYTES(ni.blk_addr);
 	len = inode->i_sb->s_blocksize;
 
 	f2fs_put_page(page, 1);
···
 	return (err < 0 ? err : 0);
 }
 
-static loff_t max_inode_blocks(struct inode *inode)
-{
-	loff_t result = ADDRS_PER_INODE(inode);
-	loff_t leaf_count = ADDRS_PER_BLOCK(inode);
-
-	/* two direct node blocks */
-	result += (leaf_count * 2);
-
-	/* two indirect node blocks */
-	leaf_count *= NIDS_PER_BLOCK;
-	result += (leaf_count * 2);
-
-	/* one double indirect node block */
-	leaf_count *= NIDS_PER_BLOCK;
-	result += leaf_count;
-
-	return result;
-}
-
 int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		u64 start, u64 len)
 {
 	struct f2fs_map_blocks map;
-	sector_t start_blk, last_blk;
+	sector_t start_blk, last_blk, blk_len, max_len;
 	pgoff_t next_pgofs;
 	u64 logical = 0, phys = 0, size = 0;
 	u32 flags = 0;
···
 		goto out;
 	}
 
-	if (bytes_to_blks(inode, len) == 0)
-		len = blks_to_bytes(inode, 1);
-
-	start_blk = bytes_to_blks(inode, start);
-	last_blk = bytes_to_blks(inode, start + len - 1);
+	start_blk = F2FS_BYTES_TO_BLK(start);
+	last_blk = F2FS_BYTES_TO_BLK(start + len - 1);
+	blk_len = last_blk - start_blk + 1;
+	max_len = F2FS_BYTES_TO_BLK(maxbytes) - start_blk;
 
 next:
 	memset(&map, 0, sizeof(map));
 	map.m_lblk = start_blk;
-	map.m_len = bytes_to_blks(inode, len);
+	map.m_len = blk_len;
 	map.m_next_pgofs = &next_pgofs;
 	map.m_seg_type = NO_CHECK_TYPE;
···
 	if (!compr_cluster && !(map.m_flags & F2FS_MAP_FLAGS)) {
 		start_blk = next_pgofs;
 
-		if (blks_to_bytes(inode, start_blk) < blks_to_bytes(inode,
-					max_inode_blocks(inode)))
+		if (F2FS_BLK_TO_BYTES(start_blk) < maxbytes)
 			goto prep_next;
 
 		flags |= FIEMAP_EXTENT_LAST;
+	}
+
+	/*
+	 * current extent may cross boundary of inquiry, increase len to
+	 * requery.
+	 */
+	if (!compr_cluster && (map.m_flags & F2FS_MAP_MAPPED) &&
+			map.m_lblk + map.m_len - 1 == last_blk &&
+			blk_len != max_len) {
+		blk_len = max_len;
+		goto next;
 	}
 
 	compr_appended = false;
···
 	} else if (compr_appended) {
 		unsigned int appended_blks = cluster_size -
 					count_in_cluster + 1;
-		size += blks_to_bytes(inode, appended_blks);
+		size += F2FS_BLK_TO_BYTES(appended_blks);
 		start_blk += appended_blks;
 		compr_cluster = false;
 	} else {
-		logical = blks_to_bytes(inode, start_blk);
+		logical = F2FS_BLK_TO_BYTES(start_blk);
 		phys = __is_valid_data_blkaddr(map.m_pblk) ?
-			blks_to_bytes(inode, map.m_pblk) : 0;
-		size = blks_to_bytes(inode, map.m_len);
+			F2FS_BLK_TO_BYTES(map.m_pblk) : 0;
+		size = F2FS_BLK_TO_BYTES(map.m_len);
 		flags = 0;
 
 		if (compr_cluster) {
···
 			count_in_cluster += map.m_len;
 			if (count_in_cluster == cluster_size) {
 				compr_cluster = false;
-				size += blks_to_bytes(inode, 1);
+				size += F2FS_BLKSIZE;
 			}
 		} else if (map.m_flags & F2FS_MAP_DELALLOC) {
 			flags = FIEMAP_EXTENT_UNWRITTEN;
 		}
 
-		start_blk += bytes_to_blks(inode, size);
+		start_blk += F2FS_BYTES_TO_BLK(size);
 	}
 
 prep_next:
···
 			struct readahead_control *rac)
 {
 	struct bio *bio = *bio_ret;
-	const unsigned blocksize = blks_to_bytes(inode, 1);
+	const unsigned int blocksize = F2FS_BLKSIZE;
 	sector_t block_in_file;
 	sector_t last_block;
 	sector_t last_block_in_file;
···
 
 	block_in_file = (sector_t)index;
 	last_block = block_in_file + nr_pages;
-	last_block_in_file = bytes_to_blks(inode,
-			f2fs_readpage_limit(inode) + blocksize - 1);
+	last_block_in_file = F2FS_BYTES_TO_BLK(f2fs_readpage_limit(inode) +
+							blocksize - 1);
 	if (last_block > last_block_in_file)
 		last_block = last_block_in_file;
···
 	struct bio *bio = *bio_ret;
 	unsigned int start_idx = cc->cluster_idx << cc->log_cluster_size;
 	sector_t last_block_in_file;
-	const unsigned blocksize = blks_to_bytes(inode, 1);
+	const unsigned int blocksize = F2FS_BLKSIZE;
 	struct decompress_io_ctx *dic = NULL;
 	struct extent_info ei = {};
 	bool from_dnode = true;
···
 
 	f2fs_bug_on(sbi, f2fs_cluster_is_empty(cc));
 
-	last_block_in_file = bytes_to_blks(inode,
-			f2fs_readpage_limit(inode) + blocksize - 1);
+	last_block_in_file = F2FS_BYTES_TO_BLK(f2fs_readpage_limit(inode) +
+							blocksize - 1);
 
 	/* get rid of pages beyond EOF */
 	for (i = 0; i < cc->cluster_size; i++) {
···
 		.nr_cpages = 0,
 	};
 	pgoff_t nc_cluster_idx = NULL_CLUSTER;
+	pgoff_t index;
 #endif
 	unsigned nr_pages = rac ? readahead_count(rac) : 1;
 	unsigned max_nr_pages = nr_pages;
-	pgoff_t index;
 	int ret = 0;
 
 	map.m_pblk = 0;
···
 			prefetchw(&folio->flags);
 		}
 
+#ifdef CONFIG_F2FS_FS_COMPRESSION
 		index = folio_index(folio);
 
-#ifdef CONFIG_F2FS_FS_COMPRESSION
 		if (!f2fs_compressed_file(inode))
 			goto read_single_page;
···
 
 	if (!f2fs_lookup_read_extent_cache_block(inode, index,
 						&dn.data_blkaddr)) {
+		if (IS_DEVICE_ALIASING(inode)) {
+			err = -ENODATA;
+			goto out;
+		}
+
 		if (locked) {
 			err = f2fs_reserve_block(&dn, index);
 			goto out;
···
 	 * to be very smart.
 	 */
 	cur_lblock = 0;
-	last_lblock = bytes_to_blks(inode, i_size_read(inode));
+	last_lblock = F2FS_BYTES_TO_BLK(i_size_read(inode));
 
 	while (cur_lblock < last_lblock && cur_lblock < sis->max) {
 		struct f2fs_map_blocks map;
···
 	pgoff_t next_pgofs = 0;
 	int err;
 
-	map.m_lblk = bytes_to_blks(inode, offset);
-	map.m_len = bytes_to_blks(inode, offset + length - 1) - map.m_lblk + 1;
+	map.m_lblk = F2FS_BYTES_TO_BLK(offset);
+	map.m_len = F2FS_BYTES_TO_BLK(offset + length - 1) - map.m_lblk + 1;
 	map.m_next_pgofs = &next_pgofs;
 	map.m_seg_type = f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode),
 						inode->i_write_hint);
···
 	if (err)
 		return err;
 
-	iomap->offset = blks_to_bytes(inode, map.m_lblk);
+	iomap->offset = F2FS_BLK_TO_BYTES(map.m_lblk);
 
 	/*
 	 * When inline encryption is enabled, sometimes I/O to an encrypted file
···
 		if (WARN_ON_ONCE(map.m_pblk == NEW_ADDR))
 			return -EINVAL;
 
-		iomap->length = blks_to_bytes(inode, map.m_len);
+		iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
 		iomap->type = IOMAP_MAPPED;
 		iomap->flags |= IOMAP_F_MERGED;
 		iomap->bdev = map.m_bdev;
-		iomap->addr = blks_to_bytes(inode, map.m_pblk);
+		iomap->addr = F2FS_BLK_TO_BYTES(map.m_pblk);
 	} else {
 		if (flags & IOMAP_WRITE)
 			return -ENOTBLK;
 
 		if (map.m_pblk == NULL_ADDR) {
-			iomap->length = blks_to_bytes(inode, next_pgofs) -
-							iomap->offset;
+			iomap->length = F2FS_BLK_TO_BYTES(next_pgofs) -
+					iomap->offset;
 			iomap->type = IOMAP_HOLE;
 		} else if (map.m_pblk == NEW_ADDR) {
-			iomap->length = blks_to_bytes(inode, map.m_len);
+			iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
 			iomap->type = IOMAP_UNWRITTEN;
 		} else {
 			f2fs_bug_on(F2FS_I_SB(inode), 1);
fs/f2fs/debug.c (+109 -2)
···
 }
 
 #ifdef CONFIG_DEBUG_FS
+static void update_multidevice_stats(struct f2fs_sb_info *sbi)
+{
+	struct f2fs_stat_info *si = F2FS_STAT(sbi);
+	struct f2fs_dev_stats *dev_stats = si->dev_stats;
+	int i, j;
+
+	if (!f2fs_is_multi_device(sbi))
+		return;
+
+	memset(dev_stats, 0, sizeof(struct f2fs_dev_stats) * sbi->s_ndevs);
+	for (i = 0; i < sbi->s_ndevs; i++) {
+		unsigned int start_segno, end_segno;
+		block_t start_blk, end_blk;
+
+		if (i == 0) {
+			start_blk = MAIN_BLKADDR(sbi);
+			end_blk = FDEV(i).end_blk + 1 - SEG0_BLKADDR(sbi);
+		} else {
+			start_blk = FDEV(i).start_blk;
+			end_blk = FDEV(i).end_blk + 1;
+		}
+
+		start_segno = GET_SEGNO(sbi, start_blk);
+		end_segno = GET_SEGNO(sbi, end_blk);
+
+		for (j = start_segno; j < end_segno; j++) {
+			unsigned int seg_blks, sec_blks;
+
+			seg_blks = get_seg_entry(sbi, j)->valid_blocks;
+
+			/* update segment stats */
+			if (IS_CURSEG(sbi, j))
+				dev_stats[i].devstats[0][DEVSTAT_INUSE]++;
+			else if (seg_blks == BLKS_PER_SEG(sbi))
+				dev_stats[i].devstats[0][DEVSTAT_FULL]++;
+			else if (seg_blks != 0)
+				dev_stats[i].devstats[0][DEVSTAT_DIRTY]++;
+			else if (!test_bit(j, FREE_I(sbi)->free_segmap))
+				dev_stats[i].devstats[0][DEVSTAT_FREE]++;
+			else
+				dev_stats[i].devstats[0][DEVSTAT_PREFREE]++;
+
+			if (!__is_large_section(sbi) ||
+					(j % SEGS_PER_SEC(sbi)) != 0)
+				continue;
+
+			sec_blks = get_sec_entry(sbi, j)->valid_blocks;
+
+			/* update section stats */
+			if (IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, j)))
+				dev_stats[i].devstats[1][DEVSTAT_INUSE]++;
+			else if (sec_blks == BLKS_PER_SEC(sbi))
+				dev_stats[i].devstats[1][DEVSTAT_FULL]++;
+			else if (sec_blks != 0)
+				dev_stats[i].devstats[1][DEVSTAT_DIRTY]++;
+			else if (!test_bit(GET_SEC_FROM_SEG(sbi, j),
+					FREE_I(sbi)->free_secmap))
+				dev_stats[i].devstats[1][DEVSTAT_FREE]++;
+			else
+				dev_stats[i].devstats[1][DEVSTAT_PREFREE]++;
+		}
+	}
+}
+
 static void update_general_status(struct f2fs_sb_info *sbi)
 {
 	struct f2fs_stat_info *si = F2FS_STAT(sbi);
···
 		si->dirty_seg[type]++;
 		si->valid_blks[type] += blks;
 	}
+
+	update_multidevice_stats(sbi);
 
 	for (i = 0; i < MAX_CALL_TYPE; i++)
 		si->cp_call_count[i] = atomic_read(&sbi->cp_call_count[i]);
···
 			si->dirty_count);
 	seq_printf(s, "  - Prefree: %d\n  - Free: %d (%d)\n\n",
 			si->prefree_count, si->free_segs, si->free_secs);
+	if (f2fs_is_multi_device(sbi)) {
+		seq_puts(s, "Multidevice stats:\n");
+		seq_printf(s, "  [seg:   %8s %8s %8s %8s %8s]",
+			"inuse", "dirty", "full", "free", "prefree");
+		if (__is_large_section(sbi))
+			seq_printf(s, " [sec:   %8s %8s %8s %8s %8s]\n",
+				"inuse", "dirty", "full", "free", "prefree");
+		else
+			seq_puts(s, "\n");
+
+		for (i = 0; i < sbi->s_ndevs; i++) {
+			seq_printf(s, "  #%-2d     %8u %8u %8u %8u %8u", i,
+				si->dev_stats[i].devstats[0][DEVSTAT_INUSE],
+				si->dev_stats[i].devstats[0][DEVSTAT_DIRTY],
+				si->dev_stats[i].devstats[0][DEVSTAT_FULL],
+				si->dev_stats[i].devstats[0][DEVSTAT_FREE],
+				si->dev_stats[i].devstats[0][DEVSTAT_PREFREE]);
+			if (!__is_large_section(sbi)) {
+				seq_puts(s, "\n");
+				continue;
+			}
+			seq_printf(s, "         %8u %8u %8u %8u %8u\n",
+				si->dev_stats[i].devstats[1][DEVSTAT_INUSE],
+				si->dev_stats[i].devstats[1][DEVSTAT_DIRTY],
+				si->dev_stats[i].devstats[1][DEVSTAT_FULL],
+				si->dev_stats[i].devstats[1][DEVSTAT_FREE],
+				si->dev_stats[i].devstats[1][DEVSTAT_PREFREE]);
+		}
+		seq_puts(s, "\n");
+	}
 	seq_printf(s, "CP calls: %d (BG: %d)\n",
 			si->cp_call_count[TOTAL_CALL],
 			si->cp_call_count[BACKGROUND]);
···
 			si->ndirty_node, si->node_pages);
 	seq_printf(s, "  - dents: %4d in dirs:%4d (%4d)\n",
 			si->ndirty_dent, si->ndirty_dirs, si->ndirty_all);
-	seq_printf(s, "  - datas: %4d in files:%4d\n",
+	seq_printf(s, "  - data: %4d in files:%4d\n",
 			si->ndirty_data, si->ndirty_files);
-	seq_printf(s, "  - quota datas: %4d in quota files:%4d\n",
+	seq_printf(s, "  - quota data: %4d in quota files:%4d\n",
 			si->ndirty_qdata, si->nquota_files);
 	seq_printf(s, "  - meta: %4d in %4d\n",
 			si->ndirty_meta, si->meta_pages);
···
 {
 	struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
 	struct f2fs_stat_info *si;
+	struct f2fs_dev_stats *dev_stats;
 	unsigned long flags;
 	int i;
 
 	si = f2fs_kzalloc(sbi, sizeof(struct f2fs_stat_info), GFP_KERNEL);
 	if (!si)
 		return -ENOMEM;
+
+	dev_stats = f2fs_kzalloc(sbi, sizeof(struct f2fs_dev_stats) *
+						sbi->s_ndevs, GFP_KERNEL);
+	if (!dev_stats) {
+		kfree(si);
+		return -ENOMEM;
+	}
+
+	si->dev_stats = dev_stats;
 
 	si->all_area_segs = le32_to_cpu(raw_super->segment_count);
 	si->sit_area_segs = le32_to_cpu(raw_super->segment_count_sit);
···
 	list_del(&si->stat_list);
 	raw_spin_unlock_irqrestore(&f2fs_stat_lock, flags);
 
+	kfree(si->dev_stats);
 	kfree(si);
 }
 
fs/f2fs/extent_cache.c (+89 -30)
···
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct f2fs_extent *i_ext = &F2FS_INODE(ipage)->i_ext;
 	struct extent_info ei;
+	int devi;
 
 	get_read_extent_info(&ei, i_ext);
 
···
 			  ei.blk, ei.fofs, ei.len);
 		return false;
 	}
-	return true;
+
+	if (!IS_DEVICE_ALIASING(inode))
+		return true;
+
+	for (devi = 0; devi < sbi->s_ndevs; devi++) {
+		if (FDEV(devi).start_blk != ei.blk ||
+				FDEV(devi).end_blk != ei.blk + ei.len - 1)
+			continue;
+
+		if (devi == 0) {
+			f2fs_warn(sbi,
+			    "%s: inode (ino=%lx) is an alias of meta device",
+			    __func__, inode->i_ino);
+			return false;
+		}
+
+		if (bdev_is_zoned(FDEV(devi).bdev)) {
+			f2fs_warn(sbi,
+			    "%s: device alias inode (ino=%lx)'s extent info "
+			    "[%u, %u, %u] maps to zoned block device",
+			    __func__, inode->i_ino, ei.blk, ei.fofs, ei.len);
+			return false;
+		}
+		return true;
+	}
+
+	f2fs_warn(sbi, "%s: device alias inode (ino=%lx)'s extent info "
+		"[%u, %u, %u] is inconsistent w/ any devices",
+		__func__, inode->i_ino, ei.blk, ei.fofs, ei.len);
+	return false;
 }
 
 static void __set_extent_info(struct extent_info *ei,
···
 
 static bool __may_extent_tree(struct inode *inode, enum extent_type type)
 {
+	if (IS_DEVICE_ALIASING(inode) && type == EX_READ)
+		return true;
+
 	/*
 	 * for recovered files during mount do not create extents
 	 * if shrinker is not registered.
···
 }
 
 static unsigned int __free_extent_tree(struct f2fs_sb_info *sbi,
-				struct extent_tree *et)
+				struct extent_tree *et, unsigned int nr_shrink)
 {
 	struct rb_node *node, *next;
 	struct extent_node *en;
-	unsigned int count = atomic_read(&et->node_cnt);
+	unsigned int count;
 
 	node = rb_first_cached(&et->root);
-	while (node) {
+
+	for (count = 0; node && count < nr_shrink; count++) {
 		next = rb_next(node);
 		en = rb_entry(node, struct extent_node, rb_node);
 		__release_extent_node(sbi, et, en);
 		node = next;
 	}
 
-	return count - atomic_read(&et->node_cnt);
+	return count;
 }
 
 static void __drop_largest_extent(struct extent_tree *et,
···
 	write_lock(&et->lock);
 	if (atomic_read(&et->node_cnt) || !ei.len)
 		goto skip;
+
+	if (IS_DEVICE_ALIASING(inode)) {
+		et->largest = ei;
+		goto skip;
+	}
 
 	en = __attach_extent_node(sbi, et, &ei, NULL,
 				&et->root.rb_root.rb_node, true);
···
 		*ei = et->largest;
 		ret = true;
 		stat_inc_largest_node_hit(sbi);
+		goto out;
+	}
+
+	if (IS_DEVICE_ALIASING(inode)) {
+		ret = false;
 		goto out;
 	}
 
···
 	return en;
 }
 
+static unsigned int __destroy_extent_node(struct inode *inode,
+					enum extent_type type)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
+	unsigned int nr_shrink = type == EX_READ ?
+				READ_EXTENT_CACHE_SHRINK_NUMBER :
+				AGE_EXTENT_CACHE_SHRINK_NUMBER;
+	unsigned int node_cnt = 0;
+
+	if (!et || !atomic_read(&et->node_cnt))
+		return 0;
+
+	while (atomic_read(&et->node_cnt)) {
+		write_lock(&et->lock);
+		node_cnt += __free_extent_tree(sbi, et, nr_shrink);
+		write_unlock(&et->lock);
+	}
+
+	f2fs_bug_on(sbi, atomic_read(&et->node_cnt));
+
+	return node_cnt;
+}
+
 static void __update_extent_tree_range(struct inode *inode,
 				struct extent_info *tei, enum extent_type type)
 {
···
 	}
 
 	if (end < org_end && (type != EX_READ ||
-			org_end - end >= F2FS_MIN_EXTENT_LEN)) {
+			(org_end - end >= F2FS_MIN_EXTENT_LEN &&
+			atomic_read(&et->node_cnt) <
+					sbi->max_read_extent_count))) {
 		if (parts) {
 			__set_extent_info(&ei,
 				end, org_end - end,
···
 		}
 	}
 
-	if (is_inode_flag_set(inode, FI_NO_EXTENT))
-		__free_extent_tree(sbi, et);
-
 	if (et->largest_updated) {
 		et->largest_updated = false;
 		updated = true;
···
 			insert_p, insert_parent, leftmost);
 out_read_extent_cache:
 	write_unlock(&et->lock);
+
+	if (is_inode_flag_set(inode, FI_NO_EXTENT))
+		__destroy_extent_node(inode, EX_READ);
 
 	if (updated)
 		f2fs_mark_inode_dirty_sync(inode, true);
···
 	list_for_each_entry_safe(et, next, &eti->zombie_list, list) {
 		if (atomic_read(&et->node_cnt)) {
 			write_lock(&et->lock);
-			node_cnt += __free_extent_tree(sbi, et);
+			node_cnt += __free_extent_tree(sbi, et,
+					nr_shrink - node_cnt - tree_cnt);
 			write_unlock(&et->lock);
 		}
-		f2fs_bug_on(sbi, atomic_read(&et->node_cnt));
+
+		if (atomic_read(&et->node_cnt))
+			goto unlock_out;
+
 		list_del_init(&et->list);
 		radix_tree_delete(&eti->extent_tree_root, et->ino);
 		kmem_cache_free(extent_tree_slab, et);
···
 	return __shrink_extent_tree(sbi, nr_shrink, EX_BLOCK_AGE);
 }
 
-static unsigned int __destroy_extent_node(struct inode *inode,
-					enum extent_type type)
-{
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
-	unsigned int node_cnt = 0;
-
-	if (!et || !atomic_read(&et->node_cnt))
-		return 0;
-
-	write_lock(&et->lock);
-	node_cnt = __free_extent_tree(sbi, et);
-	write_unlock(&et->lock);
-
-	return node_cnt;
-}
-
 void f2fs_destroy_extent_node(struct inode *inode)
 {
 	__destroy_extent_node(inode, EX_READ);
···
 
 static void __drop_extent_tree(struct inode *inode, enum extent_type type)
 {
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
 	bool updated = false;
 
···
 		return;
 
 	write_lock(&et->lock);
-	__free_extent_tree(sbi, et);
 	if (type == EX_READ) {
 		set_inode_flag(inode, FI_NO_EXTENT);
 		if (et->largest.len) {
···
 		}
 	}
 	write_unlock(&et->lock);
+
+	__destroy_extent_node(inode, type);
+
 	if (updated)
 		f2fs_mark_inode_dirty_sync(inode, true);
 }
···
 	sbi->hot_data_age_threshold = DEF_HOT_DATA_AGE_THRESHOLD;
 	sbi->warm_data_age_threshold = DEF_WARM_DATA_AGE_THRESHOLD;
 	sbi->last_age_weight = LAST_AGE_WEIGHT;
+	sbi->max_read_extent_count = DEF_MAX_READ_EXTENT_COUNT;
 }
 
 int __init f2fs_create_extent_cache(void)
fs/f2fs/f2fs.h (+31 -7)
··· 213 213 #define F2FS_FEATURE_CASEFOLD 0x00001000 214 214 #define F2FS_FEATURE_COMPRESSION 0x00002000 215 215 #define F2FS_FEATURE_RO 0x00004000 216 + #define F2FS_FEATURE_DEVICE_ALIAS 0x00008000 216 217 217 218 #define __F2FS_HAS_FEATURE(raw_super, mask) \ 218 219 ((raw_super->feature & cpu_to_le32(mask)) != 0) ··· 635 634 #define DEF_HOT_DATA_AGE_THRESHOLD 262144 636 635 #define DEF_WARM_DATA_AGE_THRESHOLD 2621440 637 636 637 + /* default max read extent count per inode */ 638 + #define DEF_MAX_READ_EXTENT_COUNT 10240 639 + 638 640 /* extent cache type */ 639 641 enum extent_type { 640 642 EX_READ, ··· 1022 1018 #define NR_CURSEG_PERSIST_TYPE (NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE) 1023 1019 #define NR_CURSEG_TYPE (NR_CURSEG_INMEM_TYPE + NR_CURSEG_PERSIST_TYPE) 1024 1020 1025 - enum { 1021 + enum log_type { 1026 1022 CURSEG_HOT_DATA = 0, /* directory entry blocks */ 1027 1023 CURSEG_WARM_DATA, /* data blocks */ 1028 1024 CURSEG_COLD_DATA, /* multimedia or GCed data blocks */ ··· 1067 1063 unsigned int segment_count; /* total # of segments */ 1068 1064 unsigned int main_segments; /* # of segments in main area */ 1069 1065 unsigned int reserved_segments; /* # of reserved segments */ 1070 - unsigned int additional_reserved_segments;/* reserved segs for IO align feature */ 1071 1066 unsigned int ovp_segments; /* # of overprovision segments */ 1072 1067 1073 1068 /* a threshold to reclaim prefree segments */ ··· 1622 1619 /* for extent tree cache */ 1623 1620 struct extent_tree_info extent_tree[NR_EXTENT_CACHES]; 1624 1621 atomic64_t allocated_data_blocks; /* for block age extent_cache */ 1622 + unsigned int max_read_extent_count; /* max read extent count per inode */ 1625 1623 1626 1624 /* The threshold used for hot and warm data seperation*/ 1627 1625 unsigned int hot_data_age_threshold; ··· 1762 1758 unsigned int dirty_device; /* for checkpoint data flush */ 1763 1759 spinlock_t dev_lock; /* protect dirty_device */ 1764 1760 bool aligned_blksize; /* all 
devices has the same logical blksize */ 1761 + unsigned int first_zoned_segno; /* first zoned segno */ 1765 1762 1766 1763 /* For write statistics */ 1767 1764 u64 sectors_written_start; ··· 3051 3046 #define F2FS_DIRSYNC_FL 0x00010000 /* dirsync behaviour (directories only) */ 3052 3047 #define F2FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */ 3053 3048 #define F2FS_CASEFOLD_FL 0x40000000 /* Casefolded file */ 3049 + #define F2FS_DEVICE_ALIAS_FL 0x80000000 /* File for aliasing a device */ 3054 3050 3055 3051 #define F2FS_QUOTA_DEFAULT_FL (F2FS_NOATIME_FL | F2FS_IMMUTABLE_FL) 3056 3052 ··· 3066 3060 3067 3061 /* Flags that are appropriate for non-directories/regular files. */ 3068 3062 #define F2FS_OTHER_FLMASK (F2FS_NODUMP_FL | F2FS_NOATIME_FL) 3063 + 3064 + #define IS_DEVICE_ALIASING(inode) (F2FS_I(inode)->i_flags & F2FS_DEVICE_ALIAS_FL) 3069 3065 3070 3066 static inline __u32 f2fs_mask_flags(umode_t mode, __u32 flags) 3071 3067 { ··· 3640 3632 loff_t max_file_blocks(struct inode *inode); 3641 3633 void f2fs_quota_off_umount(struct super_block *sb); 3642 3634 void f2fs_save_errors(struct f2fs_sb_info *sbi, unsigned char flag); 3643 - void f2fs_handle_critical_error(struct f2fs_sb_info *sbi, unsigned char reason, 3644 - bool irq_context); 3635 + void f2fs_handle_critical_error(struct f2fs_sb_info *sbi, unsigned char reason); 3645 3636 void f2fs_handle_error(struct f2fs_sb_info *sbi, unsigned char error); 3646 3637 void f2fs_handle_error_async(struct f2fs_sb_info *sbi, unsigned char error); 3647 3638 int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover); ··· 3761 3754 block_t old_addr, block_t new_addr, 3762 3755 unsigned char version, bool recover_curseg, 3763 3756 bool recover_newaddr); 3764 - int f2fs_get_segment_temp(int seg_type); 3757 + enum temp_type f2fs_get_segment_temp(struct f2fs_sb_info *sbi, 3758 + enum log_type seg_type); 3765 3759 int f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, 3766 3760 block_t 
old_blkaddr, block_t *new_blkaddr, 3767 3761 struct f2fs_summary *sum, int type, ··· 3779 3771 int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, 3780 3772 unsigned int val, int alloc); 3781 3773 void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); 3782 - int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi); 3783 - int f2fs_check_write_pointer(struct f2fs_sb_info *sbi); 3774 + int f2fs_check_and_fix_write_pointer(struct f2fs_sb_info *sbi); 3784 3775 int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); 3785 3776 void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); 3786 3777 int __init f2fs_create_segment_manager_caches(void); ··· 3789 3782 enum page_type type, enum temp_type temp); 3790 3783 unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi); 3791 3784 unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi, 3785 + unsigned int segno); 3786 + unsigned long long f2fs_get_section_mtime(struct f2fs_sb_info *sbi, 3792 3787 unsigned int segno); 3793 3788 3794 3789 #define DEF_FRAGMENT_SIZE 4 ··· 3944 3935 * debug.c 3945 3936 */ 3946 3937 #ifdef CONFIG_F2FS_STAT_FS 3938 + enum { 3939 + DEVSTAT_INUSE, 3940 + DEVSTAT_DIRTY, 3941 + DEVSTAT_FULL, 3942 + DEVSTAT_FREE, 3943 + DEVSTAT_PREFREE, 3944 + DEVSTAT_MAX, 3945 + }; 3946 + 3947 + struct f2fs_dev_stats { 3948 + unsigned int devstats[2][DEVSTAT_MAX]; /* 0: segs, 1: secs */ 3949 + }; 3950 + 3947 3951 struct f2fs_stat_info { 3948 3952 struct list_head stat_list; 3949 3953 struct f2fs_sb_info *sbi; ··· 4020 3998 unsigned int block_count[2]; 4021 3999 unsigned int inplace_count; 4022 4000 unsigned long long base_mem, cache_mem, page_mem; 4001 + struct f2fs_dev_stats *dev_stats; 4023 4002 }; 4024 4003 4025 4004 static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi) ··· 4533 4510 F2FS_FEATURE_FUNCS(casefold, CASEFOLD); 4534 4511 F2FS_FEATURE_FUNCS(compression, COMPRESSION); 4535 4512 F2FS_FEATURE_FUNCS(readonly, RO); 4513 + 
F2FS_FEATURE_FUNCS(device_alias, DEVICE_ALIAS); 4536 4514 4537 4515 #ifdef CONFIG_BLK_DEV_ZONED 4538 4516 static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
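The f2fs.h hunks above introduce both a superblock feature bit (F2FS_FEATURE_DEVICE_ALIAS) and a per-inode i_flags bit (F2FS_DEVICE_ALIAS_FL) tested by the new IS_DEVICE_ALIASING() macro. A minimal userspace restatement of that per-inode test, using the constants from the patch (the helper name is illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

#define F2FS_FEATURE_DEVICE_ALIAS 0x00008000u /* superblock feature bit */
#define F2FS_DEVICE_ALIAS_FL      0x80000000u /* per-inode i_flags bit */

/* illustrative stand-in for the kernel's IS_DEVICE_ALIASING(inode),
 * which masks the inode's i_flags against F2FS_DEVICE_ALIAS_FL */
static int is_device_aliasing(uint32_t i_flags)
{
	return (i_flags & F2FS_DEVICE_ALIAS_FL) != 0;
}
```

The feature bit gates whether such inodes are accepted at all; the i_flags bit marks the individual aliasing file in the root directory.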
+58 -13
fs/f2fs/file.c
··· 725 725 726 726 trace_f2fs_truncate_blocks_enter(inode, from); 727 727 728 + if (IS_DEVICE_ALIASING(inode) && from) { 729 + err = -EINVAL; 730 + goto out_err; 731 + } 732 + 728 733 free_from = (pgoff_t)F2FS_BLK_ALIGN(from); 729 734 730 735 if (free_from >= max_file_blocks(inode)) ··· 741 736 ipage = f2fs_get_node_page(sbi, inode->i_ino); 742 737 if (IS_ERR(ipage)) { 743 738 err = PTR_ERR(ipage); 739 + goto out; 740 + } 741 + 742 + if (IS_DEVICE_ALIASING(inode)) { 743 + struct extent_tree *et = F2FS_I(inode)->extent_tree[EX_READ]; 744 + struct extent_info ei = et->largest; 745 + unsigned int i; 746 + 747 + for (i = 0; i < ei.len; i++) 748 + f2fs_invalidate_blocks(sbi, ei.blk + i); 749 + 750 + dec_valid_block_count(sbi, inode, ei.len); 751 + f2fs_update_time(sbi, REQ_TIME); 752 + 753 + f2fs_put_page(ipage, 1); 744 754 goto out; 745 755 } 746 756 ··· 794 774 /* lastly zero out the first data page */ 795 775 if (!err) 796 776 err = truncate_partial_data_page(inode, from, truncate_page); 797 - 777 + out_err: 798 778 trace_f2fs_truncate_blocks_exit(inode, err); 799 779 return err; 800 780 } ··· 883 863 return true; 884 864 if (f2fs_compressed_file(inode)) 885 865 return true; 886 - if (f2fs_has_inline_data(inode)) 866 + /* 867 + * only force direct read to use buffered IO, for direct write, 868 + * it expects inline data conversion before committing IO. 
869 + */ 870 + if (f2fs_has_inline_data(inode) && rw == READ) 887 871 return true; 888 872 889 873 /* disallow direct IO if any of devices has unaligned blksize */ ··· 1016 992 return -EPERM; 1017 993 1018 994 if ((attr->ia_valid & ATTR_SIZE)) { 1019 - if (!f2fs_is_compress_backend_ready(inode)) 995 + if (!f2fs_is_compress_backend_ready(inode) || 996 + IS_DEVICE_ALIASING(inode)) 1020 997 return -EOPNOTSUPP; 1021 998 if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED) && 1022 999 !IS_ALIGNED(attr->ia_size, ··· 1815 1790 1816 1791 map.m_len = sec_blks; 1817 1792 next_alloc: 1818 - if (has_not_enough_free_secs(sbi, 0, 1793 + if (has_not_enough_free_secs(sbi, 0, f2fs_sb_has_blkzoned(sbi) ? 1794 + ZONED_PIN_SEC_REQUIRED_COUNT : 1819 1795 GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi)))) { 1820 1796 f2fs_down_write(&sbi->gc_lock); 1821 1797 stat_inc_gc_call_count(sbi, FOREGROUND); ··· 1886 1860 return -EIO; 1887 1861 if (!f2fs_is_checkpoint_ready(F2FS_I_SB(inode))) 1888 1862 return -ENOSPC; 1889 - if (!f2fs_is_compress_backend_ready(inode)) 1863 + if (!f2fs_is_compress_backend_ready(inode) || IS_DEVICE_ALIASING(inode)) 1890 1864 return -EOPNOTSUPP; 1891 1865 1892 1866 /* f2fs only support ->fallocate for regular file */ ··· 2369 2343 if (readonly) 2370 2344 goto out; 2371 2345 2372 - /* grab sb->s_umount to avoid racing w/ remount() */ 2346 + /* 2347 + * grab sb->s_umount to avoid racing w/ remount() and other shutdown 2348 + * paths. 
2349 + */ 2373 2350 if (need_lock) 2374 - down_read(&sbi->sb->s_umount); 2351 + down_write(&sbi->sb->s_umount); 2375 2352 2376 2353 f2fs_stop_gc_thread(sbi); 2377 2354 f2fs_stop_discard_thread(sbi); ··· 2383 2354 clear_opt(sbi, DISCARD); 2384 2355 2385 2356 if (need_lock) 2386 - up_read(&sbi->sb->s_umount); 2357 + up_write(&sbi->sb->s_umount); 2387 2358 2388 2359 f2fs_update_time(sbi, REQ_TIME); 2389 2360 out: ··· 2890 2861 if (!capable(CAP_SYS_ADMIN)) 2891 2862 return -EPERM; 2892 2863 2893 - if (!S_ISREG(inode->i_mode) || f2fs_is_atomic_file(inode)) 2864 + if (!S_ISREG(inode->i_mode)) 2894 2865 return -EINVAL; 2895 2866 2896 2867 if (f2fs_readonly(sbi->sb)) ··· 3320 3291 struct f2fs_inode_info *fi = F2FS_I(inode); 3321 3292 struct f2fs_sb_info *sbi = F2FS_I_SB(inode); 3322 3293 3294 + if (IS_DEVICE_ALIASING(inode)) 3295 + return -EINVAL; 3296 + 3323 3297 if (fi->i_gc_failures >= sbi->gc_pin_file_threshold) { 3324 3298 f2fs_warn(sbi, "%s: Enable GC = ino %lx after %x GC trials", 3325 3299 __func__, inode->i_ino, fi->i_gc_failures); ··· 3352 3320 3353 3321 if (f2fs_readonly(sbi->sb)) 3354 3322 return -EROFS; 3323 + 3324 + if (!pin && IS_DEVICE_ALIASING(inode)) 3325 + return -EOPNOTSUPP; 3355 3326 3356 3327 ret = mnt_want_write_file(filp); 3357 3328 if (ret) ··· 3419 3384 if (is_inode_flag_set(inode, FI_PIN_FILE)) 3420 3385 pin = F2FS_I(inode)->i_gc_failures; 3421 3386 return put_user(pin, (u32 __user *)arg); 3387 + } 3388 + 3389 + static int f2fs_ioc_get_dev_alias_file(struct file *filp, unsigned long arg) 3390 + { 3391 + return put_user(IS_DEVICE_ALIASING(file_inode(filp)) ? 
1 : 0, 3392 + (u32 __user *)arg); 3422 3393 } 3423 3394 3424 3395 int f2fs_precache_extents(struct inode *inode) ··· 3828 3787 to_reserved = cluster_size - compr_blocks - reserved; 3829 3788 3830 3789 /* for the case all blocks in cluster were reserved */ 3831 - if (to_reserved == 1) { 3790 + if (reserved && to_reserved == 1) { 3832 3791 dn->ofs_in_node += cluster_size; 3833 3792 goto next; 3834 3793 } ··· 4526 4485 return f2fs_ioc_decompress_file(filp); 4527 4486 case F2FS_IOC_COMPRESS_FILE: 4528 4487 return f2fs_ioc_compress_file(filp); 4488 + case F2FS_IOC_GET_DEV_ALIAS_FILE: 4489 + return f2fs_ioc_get_dev_alias_file(filp, arg); 4529 4490 default: 4530 4491 return -ENOTTY; 4531 4492 } ··· 4803 4760 else 4804 4761 return 0; 4805 4762 4806 - map.m_may_create = true; 4763 + if (!IS_DEVICE_ALIASING(inode)) 4764 + map.m_may_create = true; 4807 4765 if (dio) { 4808 4766 map.m_seg_type = f2fs_rw_hint_to_seg_type(sbi, 4809 4767 inode->i_write_hint); ··· 4860 4816 { 4861 4817 struct inode *inode = iter->inode; 4862 4818 struct f2fs_sb_info *sbi = F2FS_I_SB(inode); 4863 - int seg_type = f2fs_rw_hint_to_seg_type(sbi, inode->i_write_hint); 4864 - enum temp_type temp = f2fs_get_segment_temp(seg_type); 4819 + enum log_type type = f2fs_rw_hint_to_seg_type(sbi, inode->i_write_hint); 4820 + enum temp_type temp = f2fs_get_segment_temp(sbi, type); 4865 4821 4866 4822 bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi, DATA, temp); 4867 4823 submit_bio(bio); ··· 5241 5197 case F2FS_IOC_SET_COMPRESS_OPTION: 5242 5198 case F2FS_IOC_DECOMPRESS_FILE: 5243 5199 case F2FS_IOC_COMPRESS_FILE: 5200 + case F2FS_IOC_GET_DEV_ALIAS_FILE: 5244 5201 break; 5245 5202 default: 5246 5203 return -ENOIOCTLCMD;
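The inline_data fix in file.c above narrows the forced buffered-IO fallback to reads only: a direct write of an inline_data inode converts the inline data before committing IO, so it no longer needs the fallback. A hedged sketch of just that decision (constants and the helper name are illustrative, not the kernel's f2fs_force_buffered_io()):

```c
#include <assert.h>

#define RW_READ  0
#define RW_WRITE 1

/* sketch of the patched rule: only a direct *read* of an inline_data
 * inode falls back to buffered IO; direct writes proceed because
 * inline data conversion happens before the IO is committed */
static int force_buffered_for_inline(int has_inline_data, int rw)
{
	return has_inline_data && rw == RW_READ;
}
```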
+6 -13
fs/f2fs/gc.c
··· 257 257 258 258 switch (sbi->gc_mode) { 259 259 case GC_IDLE_CB: 260 + case GC_URGENT_LOW: 261 + case GC_URGENT_MID: 260 262 gc_mode = GC_CB; 261 263 break; 262 264 case GC_IDLE_GREEDY: ··· 363 361 static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno) 364 362 { 365 363 struct sit_info *sit_i = SIT_I(sbi); 366 - unsigned int secno = GET_SEC_FROM_SEG(sbi, segno); 367 - unsigned int start = GET_SEG_FROM_SEC(sbi, secno); 368 364 unsigned long long mtime = 0; 369 365 unsigned int vblocks; 370 366 unsigned char age = 0; 371 367 unsigned char u; 372 - unsigned int i; 373 368 unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi); 374 369 375 - for (i = 0; i < usable_segs_per_sec; i++) 376 - mtime += get_seg_entry(sbi, start + i)->mtime; 370 + mtime = f2fs_get_section_mtime(sbi, segno); 371 + f2fs_bug_on(sbi, mtime == INVALID_MTIME); 377 372 vblocks = get_valid_blocks(sbi, segno, true); 378 - 379 - mtime = div_u64(mtime, usable_segs_per_sec); 380 373 vblocks = div_u64(vblocks, usable_segs_per_sec); 381 374 382 375 u = BLKS_TO_SEGS(sbi, vblocks * 100); ··· 516 519 struct victim_sel_policy *p, unsigned int segno) 517 520 { 518 521 struct sit_info *sit_i = SIT_I(sbi); 519 - unsigned int secno = GET_SEC_FROM_SEG(sbi, segno); 520 - unsigned int start = GET_SEG_FROM_SEC(sbi, secno); 521 522 unsigned long long mtime = 0; 522 - unsigned int i; 523 523 524 524 if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { 525 525 if (p->gc_mode == GC_AT && ··· 524 530 return; 525 531 } 526 532 527 - for (i = 0; i < SEGS_PER_SEC(sbi); i++) 528 - mtime += get_seg_entry(sbi, start + i)->mtime; 529 - mtime = div_u64(mtime, SEGS_PER_SEC(sbi)); 533 + mtime = f2fs_get_section_mtime(sbi, segno); 534 + f2fs_bug_on(sbi, mtime == INVALID_MTIME); 530 535 531 536 /* Handle if the system time has changed by the user */ 532 537 if (mtime < sit_i->min_mtime)
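The first gc.c hunk extends the victim-policy switch so that urgent-low and urgent-mid GC also use the cost-benefit algorithm rather than greedy. An illustrative restatement of that mapping (the enum values here are made up for the sketch; the kernel's gc_mode and policy enums live elsewhere):

```c
#include <assert.h>

/* illustrative enums; real values are defined in f2fs headers */
enum sbi_gc_mode { GC_NORMAL, GC_IDLE_CB, GC_IDLE_GREEDY,
		   GC_URGENT_LOW, GC_URGENT_MID };
enum victim_policy { GC_GREEDY, GC_CB };

/* restates the patched select_policy() switch: urgent-low/mid now
 * share the cost-benefit branch with idle-CB mode */
static enum victim_policy pick_policy(enum sbi_gc_mode mode)
{
	switch (mode) {
	case GC_IDLE_CB:
	case GC_URGENT_LOW:
	case GC_URGENT_MID:
		return GC_CB;
	default:
		return GC_GREEDY;
	}
}
```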
+1
fs/f2fs/gc.h
··· 35 35 #define LIMIT_BOOST_ZONED_GC 25 /* percentage over total user space of boosted gc for zoned devices */ 36 36 #define DEF_MIGRATION_WINDOW_GRANULARITY_ZONED 3 37 37 #define BOOST_GC_MULTIPLE 5 38 + #define ZONED_PIN_SEC_REQUIRED_COUNT 1 38 39 39 40 #define DEF_GC_FAILED_PINNED_FILES 2048 40 41 #define MAX_GC_FAILED_PINNED_FILES USHRT_MAX
+21 -2
fs/f2fs/inode.c
··· 372 372 return false; 373 373 } 374 374 375 + if (IS_DEVICE_ALIASING(inode)) { 376 + if (!f2fs_sb_has_device_alias(sbi)) { 377 + f2fs_warn(sbi, "%s: inode (ino=%lx) has device alias flag, but the feature is off", 378 + __func__, inode->i_ino); 379 + return false; 380 + } 381 + if (!f2fs_is_pinned_file(inode)) { 382 + f2fs_warn(sbi, "%s: inode (ino=%lx) has device alias flag, but is not pinned", 383 + __func__, inode->i_ino); 384 + return false; 385 + } 386 + } 387 + 375 388 return true; 376 389 } 377 390 ··· 788 775 !is_inode_flag_set(inode, FI_DIRTY_INODE)) 789 776 return 0; 790 777 791 - if (!f2fs_is_checkpoint_ready(sbi)) 778 + if (!f2fs_is_checkpoint_ready(sbi)) { 779 + f2fs_mark_inode_dirty_sync(inode, true); 792 780 return -ENOSPC; 781 + } 793 782 794 783 /* 795 784 * We need to balance fs here to prevent from producing dirty node pages ··· 838 823 f2fs_bug_on(sbi, get_dirty_pages(inode)); 839 824 f2fs_remove_dirty_inode(inode); 840 825 841 - f2fs_destroy_extent_tree(inode); 826 + if (!IS_DEVICE_ALIASING(inode)) 827 + f2fs_destroy_extent_tree(inode); 842 828 843 829 if (inode->i_nlink || is_bad_inode(inode)) 844 830 goto no_delete; ··· 894 878 err = 0; 895 879 goto retry; 896 880 } 881 + 882 + if (IS_DEVICE_ALIASING(inode)) 883 + f2fs_destroy_extent_tree(inode); 897 884 898 885 if (err) { 899 886 f2fs_update_inode_page(inode);
+20 -8
fs/f2fs/node.c
··· 905 905 if (err) 906 906 return err; 907 907 908 + if (ni.blk_addr != NEW_ADDR && 909 + !f2fs_is_valid_blkaddr(sbi, ni.blk_addr, DATA_GENERIC_ENHANCE)) { 910 + f2fs_err_ratelimited(sbi, 911 + "nat entry is corrupted, run fsck to fix it, ino:%u, " 912 + "nid:%u, blkaddr:%u", ni.ino, ni.nid, ni.blk_addr); 913 + set_sbi_flag(sbi, SBI_NEED_FSCK); 914 + f2fs_handle_error(sbi, ERROR_INCONSISTENT_NAT); 915 + return -EFSCORRUPTED; 916 + } 917 + 908 918 /* Deallocate node address */ 909 919 f2fs_invalidate_blocks(sbi, ni.blk_addr); 910 920 dec_valid_node_count(sbi, dn->inode, dn->nid == dn->inode->i_ino); ··· 1066 1056 int i; 1067 1057 int idx = depth - 2; 1068 1058 1069 - nid[0] = le32_to_cpu(ri->i_nid[offset[0] - NODE_DIR1_BLOCK]); 1059 + nid[0] = get_nid(dn->inode_page, offset[0], true); 1070 1060 if (!nid[0]) 1071 1061 return 0; 1072 1062 ··· 1177 1167 1178 1168 skip_partial: 1179 1169 while (cont) { 1180 - dn.nid = le32_to_cpu(ri->i_nid[offset[0] - NODE_DIR1_BLOCK]); 1170 + dn.nid = get_nid(page, offset[0], true); 1181 1171 switch (offset[0]) { 1182 1172 case NODE_DIR1_BLOCK: 1183 1173 case NODE_DIR2_BLOCK: ··· 1209 1199 } 1210 1200 if (err < 0) 1211 1201 goto fail; 1212 - if (offset[1] == 0 && 1213 - ri->i_nid[offset[0] - NODE_DIR1_BLOCK]) { 1202 + if (offset[1] == 0 && get_nid(page, offset[0], true)) { 1214 1203 lock_page(page); 1215 1204 BUG_ON(page->mapping != NODE_MAPPING(sbi)); 1216 - f2fs_wait_on_page_writeback(page, NODE, true, true); 1217 - ri->i_nid[offset[0] - NODE_DIR1_BLOCK] = 0; 1218 - set_page_dirty(page); 1205 + set_nid(page, offset[0], 0, true); 1219 1206 unlock_page(page); 1220 1207 } 1221 1208 offset[1] = 0; ··· 1338 1331 err = -EFSCORRUPTED; 1339 1332 dec_valid_node_count(sbi, dn->inode, !ofs); 1340 1333 set_sbi_flag(sbi, SBI_NEED_FSCK); 1341 - f2fs_handle_error(sbi, ERROR_INVALID_BLKADDR); 1334 + f2fs_warn_ratelimited(sbi, 1335 + "f2fs_new_node_page: inconsistent nat entry, " 1336 + "ino:%u, nid:%u, blkaddr:%u, ver:%u, flag:%u", 1337 + 
new_ni.ino, new_ni.nid, new_ni.blk_addr, 1338 + new_ni.version, new_ni.flag); 1339 + f2fs_handle_error(sbi, ERROR_INCONSISTENT_NAT); 1342 1340 goto fail; 1343 1341 } 1344 1342 #endif
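The node.c hunk above validates a NAT entry's block address before f2fs_invalidate_blocks() frees it; without the check, a corrupted NAT entry could free an arbitrary block. A simplified sketch of the range check (the constants below are assumptions for illustration; the real DATA_GENERIC_ENHANCE check consults the superblock layout):

```c
#include <assert.h>
#include <stdint.h>

#define NEW_ADDR    UINT32_MAX /* allocated but never written; sentinel */
#define MAIN_START  4096u      /* assumed first block of the main area */
#define MAX_BLKADDR 1048576u   /* assumed end of the volume */

/* a NAT block address is acceptable if it is NEW_ADDR (nothing on
 * disk to invalidate) or falls inside the main area */
static int nat_blkaddr_ok(uint32_t blkaddr)
{
	if (blkaddr == NEW_ADDR)
		return 1;
	return blkaddr >= MAIN_START && blkaddr < MAX_BLKADDR;
}
```

In the patch, a failing check sets SBI_NEED_FSCK, reports ERROR_INCONSISTENT_NAT, and returns -EFSCORRUPTED instead of invalidating.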
+2 -7
fs/f2fs/recovery.c
··· 899 899 * and the f2fs is not read only, check and fix zoned block devices' 900 900 * write pointer consistency. 901 901 */ 902 - if (f2fs_sb_has_blkzoned(sbi) && !f2fs_readonly(sbi->sb)) { 903 - int err2 = f2fs_fix_curseg_write_pointer(sbi); 904 - 905 - if (!err2) 906 - err2 = f2fs_check_write_pointer(sbi); 907 - if (err2) 908 - err = err2; 902 + if (!err) { 903 + err = f2fs_check_and_fix_write_pointer(sbi); 909 904 ret = err; 910 905 } 911 906
+118 -43
fs/f2fs/segment.c
··· 1290 1290 wait_list, issued); 1291 1291 return 0; 1292 1292 } 1293 - 1294 - /* 1295 - * Issue discard for conventional zones only if the device 1296 - * supports discard. 1297 - */ 1298 - if (!bdev_max_discard_sectors(bdev)) 1299 - return -EOPNOTSUPP; 1300 1293 } 1301 1294 #endif 1295 + 1296 + /* 1297 + * stop issuing discard for any of below cases: 1298 + * 1. device is conventional zone, but it doesn't support discard. 1299 + * 2. device is regulare device, after snapshot it doesn't support 1300 + * discard. 1301 + */ 1302 + if (!bdev_max_discard_sectors(bdev)) 1303 + return -EOPNOTSUPP; 1302 1304 1303 1305 trace_f2fs_issue_discard(bdev, dc->di.start, dc->di.len); 1304 1306 ··· 2713 2711 if (sbi->blkzone_alloc_policy == BLKZONE_ALLOC_PRIOR_CONV || pinning) 2714 2712 segno = 0; 2715 2713 else 2716 - segno = max(first_zoned_segno(sbi), *newseg); 2714 + segno = max(sbi->first_zoned_segno, *newseg); 2717 2715 hint = GET_SEC_FROM_SEG(sbi, segno); 2718 2716 } 2719 2717 #endif ··· 2725 2723 if (secno >= MAIN_SECS(sbi) && f2fs_sb_has_blkzoned(sbi)) { 2726 2724 /* Write only to sequential zones */ 2727 2725 if (sbi->blkzone_alloc_policy == BLKZONE_ALLOC_ONLY_SEQ) { 2728 - hint = GET_SEC_FROM_SEG(sbi, first_zoned_segno(sbi)); 2726 + hint = GET_SEC_FROM_SEG(sbi, sbi->first_zoned_segno); 2729 2727 secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint); 2730 2728 } else 2731 2729 secno = find_first_zero_bit(free_i->free_secmap, ··· 2928 2926 struct f2fs_summary_block *sum_node; 2929 2927 struct page *sum_page; 2930 2928 2931 - write_sum_page(sbi, curseg->sum_blk, GET_SUM_BLOCK(sbi, curseg->segno)); 2929 + if (curseg->inited) 2930 + write_sum_page(sbi, curseg->sum_blk, GET_SUM_BLOCK(sbi, curseg->segno)); 2932 2931 2933 2932 __set_test_and_inuse(sbi, new_segno); 2934 2933 ··· 3240 3237 3241 3238 if (f2fs_sb_has_blkzoned(sbi) && err == -EAGAIN && gc_required) { 3242 3239 f2fs_down_write(&sbi->gc_lock); 3243 - err = f2fs_gc_range(sbi, 0, GET_SEGNO(sbi, 
FDEV(0).end_blk), true, 1); 3240 + err = f2fs_gc_range(sbi, 0, GET_SEGNO(sbi, FDEV(0).end_blk), 3241 + true, ZONED_PIN_SEC_REQUIRED_COUNT); 3244 3242 f2fs_up_write(&sbi->gc_lock); 3245 3243 3246 3244 gc_required = false; ··· 3585 3581 } 3586 3582 } 3587 3583 3588 - int f2fs_get_segment_temp(int seg_type) 3584 + enum temp_type f2fs_get_segment_temp(struct f2fs_sb_info *sbi, 3585 + enum log_type type) 3589 3586 { 3590 - if (IS_HOT(seg_type)) 3591 - return HOT; 3592 - else if (IS_WARM(seg_type)) 3593 - return WARM; 3594 - return COLD; 3587 + struct curseg_info *curseg = CURSEG_I(sbi, type); 3588 + enum temp_type temp = COLD; 3589 + 3590 + switch (curseg->seg_type) { 3591 + case CURSEG_HOT_NODE: 3592 + case CURSEG_HOT_DATA: 3593 + temp = HOT; 3594 + break; 3595 + case CURSEG_WARM_NODE: 3596 + case CURSEG_WARM_DATA: 3597 + temp = WARM; 3598 + break; 3599 + case CURSEG_COLD_NODE: 3600 + case CURSEG_COLD_DATA: 3601 + temp = COLD; 3602 + break; 3603 + default: 3604 + f2fs_bug_on(sbi, 1); 3605 + } 3606 + 3607 + return temp; 3595 3608 } 3596 3609 3597 3610 static int __get_segment_type(struct f2fs_io_info *fio) 3598 3611 { 3599 - int type = 0; 3612 + enum log_type type = CURSEG_HOT_DATA; 3600 3613 3601 3614 switch (F2FS_OPTION(fio->sbi).active_logs) { 3602 3615 case 2: ··· 3629 3608 f2fs_bug_on(fio->sbi, true); 3630 3609 } 3631 3610 3632 - fio->temp = f2fs_get_segment_temp(type); 3611 + fio->temp = f2fs_get_segment_temp(fio->sbi, type); 3633 3612 3634 3613 return type; 3635 3614 } ··· 3814 3793 } 3815 3794 } 3816 3795 3796 + static int log_type_to_seg_type(enum log_type type) 3797 + { 3798 + int seg_type = CURSEG_COLD_DATA; 3799 + 3800 + switch (type) { 3801 + case CURSEG_HOT_DATA: 3802 + case CURSEG_WARM_DATA: 3803 + case CURSEG_COLD_DATA: 3804 + case CURSEG_HOT_NODE: 3805 + case CURSEG_WARM_NODE: 3806 + case CURSEG_COLD_NODE: 3807 + seg_type = (int)type; 3808 + break; 3809 + case CURSEG_COLD_DATA_PINNED: 3810 + case CURSEG_ALL_DATA_ATGC: 3811 + seg_type = CURSEG_COLD_DATA; 
3812 + break; 3813 + default: 3814 + break; 3815 + } 3816 + return seg_type; 3817 + } 3818 + 3817 3819 static void do_write_page(struct f2fs_summary *sum, struct f2fs_io_info *fio) 3818 3820 { 3819 - int type = __get_segment_type(fio); 3820 - bool keep_order = (f2fs_lfs_mode(fio->sbi) && type == CURSEG_COLD_DATA); 3821 + enum log_type type = __get_segment_type(fio); 3822 + int seg_type = log_type_to_seg_type(type); 3823 + bool keep_order = (f2fs_lfs_mode(fio->sbi) && 3824 + seg_type == CURSEG_COLD_DATA); 3821 3825 3822 3826 if (keep_order) 3823 3827 f2fs_down_read(&fio->sbi->io_order_lock); ··· 4023 3977 } 4024 3978 } 4025 3979 4026 - f2fs_bug_on(sbi, !IS_DATASEG(type)); 4027 3980 curseg = CURSEG_I(sbi, type); 3981 + f2fs_bug_on(sbi, !IS_DATASEG(curseg->seg_type)); 4028 3982 4029 3983 mutex_lock(&curseg->curseg_mutex); 4030 3984 down_write(&sit_i->sentry_lock); ··· 4824 4778 sizeof(struct f2fs_journal), GFP_KERNEL); 4825 4779 if (!array[i].journal) 4826 4780 return -ENOMEM; 4827 - if (i < NR_PERSISTENT_LOG) 4828 - array[i].seg_type = CURSEG_HOT_DATA + i; 4829 - else if (i == CURSEG_COLD_DATA_PINNED) 4830 - array[i].seg_type = CURSEG_COLD_DATA; 4831 - else if (i == CURSEG_ALL_DATA_ATGC) 4832 - array[i].seg_type = CURSEG_COLD_DATA; 4781 + array[i].seg_type = log_type_to_seg_type(i); 4833 4782 reset_curseg_fields(&array[i]); 4834 4783 } 4835 4784 return restore_curseg_summaries(sbi); ··· 5248 5207 return 0; 5249 5208 } 5250 5209 5251 - static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type) 5210 + static int do_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type) 5252 5211 { 5253 5212 struct curseg_info *cs = CURSEG_I(sbi, type); 5254 5213 struct f2fs_dev_info *zbd; ··· 5353 5312 return 0; 5354 5313 } 5355 5314 5356 - int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi) 5315 + static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi) 5357 5316 { 5358 5317 int i, ret; 5359 5318 5360 5319 for (i = 0; i < NR_PERSISTENT_LOG; i++) { 5361 
- ret = fix_curseg_write_pointer(sbi, i); 5320 + ret = do_fix_curseg_write_pointer(sbi, i); 5362 5321 if (ret) 5363 5322 return ret; 5364 5323 } ··· 5381 5340 return check_zone_write_pointer(args->sbi, args->fdev, zone); 5382 5341 } 5383 5342 5384 - int f2fs_check_write_pointer(struct f2fs_sb_info *sbi) 5343 + static int check_write_pointer(struct f2fs_sb_info *sbi) 5385 5344 { 5386 5345 int i, ret; 5387 5346 struct check_zone_write_pointer_args args; ··· 5399 5358 } 5400 5359 5401 5360 return 0; 5361 + } 5362 + 5363 + int f2fs_check_and_fix_write_pointer(struct f2fs_sb_info *sbi) 5364 + { 5365 + int ret; 5366 + 5367 + if (!f2fs_sb_has_blkzoned(sbi) || f2fs_readonly(sbi->sb)) 5368 + return 0; 5369 + 5370 + f2fs_notice(sbi, "Checking entire write pointers"); 5371 + ret = fix_curseg_write_pointer(sbi); 5372 + if (!ret) 5373 + ret = check_write_pointer(sbi); 5374 + return ret; 5402 5375 } 5403 5376 5404 5377 /* ··· 5451 5396 return BLKS_PER_SEG(sbi); 5452 5397 } 5453 5398 #else 5454 - int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi) 5455 - { 5456 - return 0; 5457 - } 5458 - 5459 - int f2fs_check_write_pointer(struct f2fs_sb_info *sbi) 5399 + int f2fs_check_and_fix_write_pointer(struct f2fs_sb_info *sbi) 5460 5400 { 5461 5401 return 0; 5462 5402 } ··· 5480 5430 return SEGS_PER_SEC(sbi); 5481 5431 } 5482 5432 5433 + unsigned long long f2fs_get_section_mtime(struct f2fs_sb_info *sbi, 5434 + unsigned int segno) 5435 + { 5436 + unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi); 5437 + unsigned int secno = 0, start = 0; 5438 + unsigned int total_valid_blocks = 0; 5439 + unsigned long long mtime = 0; 5440 + unsigned int i = 0; 5441 + 5442 + secno = GET_SEC_FROM_SEG(sbi, segno); 5443 + start = GET_SEG_FROM_SEC(sbi, secno); 5444 + 5445 + if (!__is_large_section(sbi)) 5446 + return get_seg_entry(sbi, start + i)->mtime; 5447 + 5448 + for (i = 0; i < usable_segs_per_sec; i++) { 5449 + /* for large section, only check the mtime of valid segments */ 5450 + 
struct seg_entry *se = get_seg_entry(sbi, start+i); 5451 + 5452 + mtime += se->mtime * se->valid_blocks; 5453 + total_valid_blocks += se->valid_blocks; 5454 + } 5455 + 5456 + if (total_valid_blocks == 0) 5457 + return INVALID_MTIME; 5458 + 5459 + return div_u64(mtime, total_valid_blocks); 5460 + } 5461 + 5483 5462 /* 5484 5463 * Update min, max modified time for cost-benefit GC algorithm 5485 5464 */ ··· 5522 5443 sit_i->min_mtime = ULLONG_MAX; 5523 5444 5524 5445 for (segno = 0; segno < MAIN_SEGS(sbi); segno += SEGS_PER_SEC(sbi)) { 5525 - unsigned int i; 5526 5446 unsigned long long mtime = 0; 5527 5447 5528 - for (i = 0; i < SEGS_PER_SEC(sbi); i++) 5529 - mtime += get_seg_entry(sbi, segno + i)->mtime; 5530 - 5531 - mtime = div_u64(mtime, SEGS_PER_SEC(sbi)); 5448 + mtime = f2fs_get_section_mtime(sbi, segno); 5532 5449 5533 5450 if (sit_i->min_mtime > mtime) 5534 5451 sit_i->min_mtime = mtime;
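The new f2fs_get_section_mtime() above replaces the old plain average with one weighted by each segment's valid block count, so empty segments in a large section no longer skew the section's age, and a fully empty section reports INVALID_MTIME instead of a misleading zero. A userspace restatement of just that arithmetic (struct and names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_MTIME UINT64_MAX /* matches the new segment.h define */

struct seg_sketch {
	uint64_t mtime;
	uint32_t valid_blocks;
};

/* weighted section mtime: segments with no valid blocks contribute
 * nothing to either the sum or the divisor */
static uint64_t section_mtime(const struct seg_sketch *segs,
			      unsigned int nsegs)
{
	uint64_t sum = 0, valid = 0;
	unsigned int i;

	for (i = 0; i < nsegs; i++) {
		sum += segs[i].mtime * segs[i].valid_blocks;
		valid += segs[i].valid_blocks;
	}
	return valid ? sum / valid : INVALID_MTIME;
}
```

For a non-large section the kernel short-circuits to the single segment's mtime; the weighting only matters when a section spans multiple segments.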
+46 -26
fs/f2fs/segment.h
··· 18 18 #define F2FS_MIN_SEGMENTS 9 /* SB + 2 (CP + SIT + NAT) + SSA + MAIN */ 19 19 #define F2FS_MIN_META_SEGMENTS 8 /* SB + 2 (CP + SIT + NAT) + SSA */ 20 20 21 + #define INVALID_MTIME ULLONG_MAX /* no valid blocks in a segment/section */ 22 + 21 23 /* L: Logical segment # in volume, R: Relative segment # in main area */ 22 24 #define GET_L2R_SEGNO(free_i, segno) ((segno) - (free_i)->start_segno) 23 25 #define GET_R2L_SEGNO(free_i, segno) ((segno) + (free_i)->start_segno) ··· 33 31 { 34 32 f2fs_bug_on(sbi, seg_type >= NR_PERSISTENT_LOG); 35 33 } 36 - 37 - #define IS_HOT(t) ((t) == CURSEG_HOT_NODE || (t) == CURSEG_HOT_DATA) 38 - #define IS_WARM(t) ((t) == CURSEG_WARM_NODE || (t) == CURSEG_WARM_DATA) 39 - #define IS_COLD(t) ((t) == CURSEG_COLD_NODE || (t) == CURSEG_COLD_DATA) 40 34 41 35 #define IS_CURSEG(sbi, seg) \ 42 36 (((seg) == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno) || \ ··· 522 524 523 525 static inline unsigned int reserved_segments(struct f2fs_sb_info *sbi) 524 526 { 525 - return SM_I(sbi)->reserved_segments + 526 - SM_I(sbi)->additional_reserved_segments; 527 + return SM_I(sbi)->reserved_segments; 527 528 } 528 529 529 530 static inline unsigned int free_sections(struct f2fs_sb_info *sbi) ··· 556 559 } 557 560 558 561 static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi, 559 - unsigned int node_blocks, unsigned int dent_blocks) 562 + unsigned int node_blocks, unsigned int data_blocks, 563 + unsigned int dent_blocks) 560 564 { 561 565 562 - unsigned segno, left_blocks; 566 + unsigned int segno, left_blocks, blocks; 563 567 int i; 564 568 565 - /* check current node sections in the worst case. */ 566 - for (i = CURSEG_HOT_NODE; i <= CURSEG_COLD_NODE; i++) { 569 + /* check current data/node sections in the worst case. 
*/ 570 + for (i = CURSEG_HOT_DATA; i < NR_PERSISTENT_LOG; i++) { 567 571 segno = CURSEG_I(sbi, i)->segno; 568 572 left_blocks = CAP_BLKS_PER_SEC(sbi) - 569 573 get_ckpt_valid_blocks(sbi, segno, true); 570 - if (node_blocks > left_blocks) 574 + 575 + blocks = i <= CURSEG_COLD_DATA ? data_blocks : node_blocks; 576 + if (blocks > left_blocks) 571 577 return false; 572 578 } 573 579 ··· 584 584 } 585 585 586 586 /* 587 - * calculate needed sections for dirty node/dentry 588 - * and call has_curseg_enough_space 587 + * calculate needed sections for dirty node/dentry and call 588 + * has_curseg_enough_space, please note that, it needs to account 589 + * dirty data as well in lfs mode when checkpoint is disabled. 589 590 */ 590 591 static inline void __get_secs_required(struct f2fs_sb_info *sbi, 591 592 unsigned int *lower_p, unsigned int *upper_p, bool *curseg_p) ··· 595 594 get_pages(sbi, F2FS_DIRTY_DENTS) + 596 595 get_pages(sbi, F2FS_DIRTY_IMETA); 597 596 unsigned int total_dent_blocks = get_pages(sbi, F2FS_DIRTY_DENTS); 597 + unsigned int total_data_blocks = 0; 598 598 unsigned int node_secs = total_node_blocks / CAP_BLKS_PER_SEC(sbi); 599 599 unsigned int dent_secs = total_dent_blocks / CAP_BLKS_PER_SEC(sbi); 600 + unsigned int data_secs = 0; 600 601 unsigned int node_blocks = total_node_blocks % CAP_BLKS_PER_SEC(sbi); 601 602 unsigned int dent_blocks = total_dent_blocks % CAP_BLKS_PER_SEC(sbi); 603 + unsigned int data_blocks = 0; 604 + 605 + if (f2fs_lfs_mode(sbi) && 606 + unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { 607 + total_data_blocks = get_pages(sbi, F2FS_DIRTY_DATA); 608 + data_secs = total_data_blocks / CAP_BLKS_PER_SEC(sbi); 609 + data_blocks = total_data_blocks % CAP_BLKS_PER_SEC(sbi); 610 + } 602 611 603 612 if (lower_p) 604 - *lower_p = node_secs + dent_secs; 613 + *lower_p = node_secs + dent_secs + data_secs; 605 614 if (upper_p) 606 615 *upper_p = node_secs + dent_secs + 607 - (node_blocks ? 1 : 0) + (dent_blocks ? 1 : 0); 616 + (node_blocks ? 
1 : 0) + (dent_blocks ? 1 : 0) + 617 + (data_blocks ? 1 : 0); 608 618 if (curseg_p) 609 619 *curseg_p = has_curseg_enough_space(sbi, 610 - node_blocks, dent_blocks); 620 + node_blocks, data_blocks, dent_blocks); 611 621 } 612 622 613 623 static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi, ··· 649 637 return !has_not_enough_free_secs(sbi, freed, needed); 650 638 } 651 639 640 + static inline bool has_enough_free_blks(struct f2fs_sb_info *sbi) 641 + { 642 + unsigned int total_free_blocks = 0; 643 + unsigned int avail_user_block_count; 644 + 645 + spin_lock(&sbi->stat_lock); 646 + 647 + avail_user_block_count = get_available_block_count(sbi, NULL, true); 648 + total_free_blocks = avail_user_block_count - (unsigned int)valid_user_blocks(sbi); 649 + 650 + spin_unlock(&sbi->stat_lock); 651 + 652 + return total_free_blocks > 0; 653 + } 654 + 652 655 static inline bool f2fs_is_checkpoint_ready(struct f2fs_sb_info *sbi) 653 656 { 654 657 if (likely(!is_sbi_flag_set(sbi, SBI_CP_DISABLED))) 655 658 return true; 656 659 if (likely(has_enough_free_secs(sbi, 0, 0))) 660 + return true; 661 + if (!f2fs_lfs_mode(sbi) && 662 + likely(has_enough_free_blks(sbi))) 657 663 return true; 658 664 return false; 659 665 } ··· 986 956 wake_up: 987 957 dcc->discard_wake = true; 988 958 wake_up_interruptible_all(&dcc->discard_wait_queue); 989 - } 990 - 991 - static inline unsigned int first_zoned_segno(struct f2fs_sb_info *sbi) 992 - { 993 - int devi; 994 - 995 - for (devi = 0; devi < sbi->s_ndevs; devi++) 996 - if (bdev_is_zoned(FDEV(devi).bdev)) 997 - return GET_SEGNO(sbi, FDEV(devi).start_blk); 998 - return 0; 999 959 }
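With checkpoint=disable in LFS mode, the segment.h hunks above make __get_secs_required() count dirty data pages alongside dirty node and dentry pages. A simplified sketch of that accounting; treating the upper bound as each stream rounded up to whole sections is my reading of the code, so take the exact rounding as an assumption:

```c
#include <assert.h>

/* sketch of __get_secs_required(): each dirty stream (node, dentry,
 * and, in LFS + checkpoint-disabled mode, data) needs its whole
 * sections plus possibly one partial section; cap plays the role of
 * CAP_BLKS_PER_SEC */
static void secs_required(unsigned int node_blks, unsigned int dent_blks,
			  unsigned int data_blks, unsigned int cap,
			  unsigned int *lower, unsigned int *upper)
{
	unsigned int node_rem = node_blks % cap;
	unsigned int dent_rem = dent_blks % cap;
	unsigned int data_rem = data_blks % cap;

	*lower = node_blks / cap + dent_blks / cap + data_blks / cap;
	*upper = *lower + (node_rem ? 1 : 0) + (dent_rem ? 1 : 0) +
		 (data_rem ? 1 : 0);
}
```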
+73 -28
fs/f2fs/super.c
··· 150 150 Opt_mode, 151 151 Opt_fault_injection, 152 152 Opt_fault_type, 153 + Opt_lazytime, 154 + Opt_nolazytime, 153 155 Opt_quota, 154 156 Opt_noquota, 155 157 Opt_usrquota, ··· 228 226 {Opt_mode, "mode=%s"}, 229 227 {Opt_fault_injection, "fault_injection=%u"}, 230 228 {Opt_fault_type, "fault_type=%u"}, 229 + {Opt_lazytime, "lazytime"}, 230 + {Opt_nolazytime, "nolazytime"}, 231 231 {Opt_quota, "quota"}, 232 232 {Opt_noquota, "noquota"}, 233 233 {Opt_usrquota, "usrquota"}, ··· 838 834 set_opt(sbi, READ_EXTENT_CACHE); 839 835 break; 840 836 case Opt_noextent_cache: 837 + if (F2FS_HAS_FEATURE(sbi, F2FS_FEATURE_DEVICE_ALIAS)) { 838 + f2fs_err(sbi, "device aliasing requires extent cache"); 839 + return -EINVAL; 840 + } 841 841 clear_opt(sbi, READ_EXTENT_CACHE); 842 842 break; 843 843 case Opt_noinline_data: ··· 926 918 f2fs_info(sbi, "fault_type options not supported"); 927 919 break; 928 920 #endif 921 + case Opt_lazytime: 922 + sb->s_flags |= SB_LAZYTIME; 923 + break; 924 + case Opt_nolazytime: 925 + sb->s_flags &= ~SB_LAZYTIME; 926 + break; 929 927 #ifdef CONFIG_QUOTA 930 928 case Opt_quota: 931 929 case Opt_usrquota: ··· 1172 1158 break; 1173 1159 } 1174 1160 1175 - strcpy(ext[ext_cnt], name); 1161 + ret = strscpy(ext[ext_cnt], name); 1162 + if (ret < 0) { 1163 + kfree(name); 1164 + return ret; 1165 + } 1176 1166 F2FS_OPTION(sbi).compress_ext_cnt++; 1177 1167 kfree(name); 1178 1168 break; ··· 1205 1187 break; 1206 1188 } 1207 1189 1208 - strcpy(noext[noext_cnt], name); 1190 + ret = strscpy(noext[noext_cnt], name); 1191 + if (ret < 0) { 1192 + kfree(name); 1193 + return ret; 1194 + } 1209 1195 F2FS_OPTION(sbi).nocompress_ext_cnt++; 1210 1196 kfree(name); 1211 1197 break; ··· 1760 1738 1761 1739 static int f2fs_unfreeze(struct super_block *sb) 1762 1740 { 1741 + struct f2fs_sb_info *sbi = F2FS_SB(sb); 1742 + 1743 + /* 1744 + * It will update discard_max_bytes of mounted lvm device to zero 1745 + * after creating snapshot on this lvm device, let's drop all 1746 + 
* remaining discards. 1747 + * We don't need to disable real-time discard because discard_max_bytes 1748 + * will recover after removal of snapshot. 1749 + */ 1750 + if (test_opt(sbi, DISCARD) && !f2fs_hw_support_discard(sbi)) 1751 + f2fs_issue_discard_timeout(sbi); 1752 + 1763 1753 clear_sbi_flag(F2FS_SB(sb), SBI_IS_FREEZING); 1764 1754 return 0; 1765 1755 }
4175 + */ 4176 + schedule_work(&sbi->s_error_work); 4208 4177 } 4209 4178 4210 4179 /* ··· 4250 4215 struct f2fs_sb_info, s_error_work); 4251 4216 4252 4217 f2fs_record_stop_reason(sbi); 4218 + } 4219 + 4220 + static inline unsigned int get_first_zoned_segno(struct f2fs_sb_info *sbi) 4221 + { 4222 + int devi; 4223 + 4224 + for (devi = 0; devi < sbi->s_ndevs; devi++) 4225 + if (bdev_is_zoned(FDEV(devi).bdev)) 4226 + return GET_SEGNO(sbi, FDEV(devi).start_blk); 4227 + return 0; 4253 4228 } 4254 4229 4255 4230 static int f2fs_scan_devices(struct f2fs_sb_info *sbi) ··· 4662 4617 /* For write statistics */ 4663 4618 sbi->sectors_written_start = f2fs_get_sectors_written(sbi); 4664 4619 4620 + /* get segno of first zoned block device */ 4621 + sbi->first_zoned_segno = get_first_zoned_segno(sbi); 4622 + 4665 4623 /* Read accumulated write IO statistics if exists */ 4666 4624 seg_i = CURSEG_I(sbi, CURSEG_HOT_NODE); 4667 4625 if (__exist_node_summaries(sbi)) ··· 4786 4738 reset_checkpoint: 4787 4739 /* 4788 4740 * If the f2fs is not readonly and fsync data recovery succeeds, 4789 - * check zoned block devices' write pointer consistency. 4741 + * write pointer consistency of cursegs and other zones are already 4742 + * checked and fixed during recovery. However, if recovery fails, 4743 + * write pointers are left untouched, and retry-mount should check 4744 + * them here. 
4790 4745 */ 4791 - if (f2fs_sb_has_blkzoned(sbi) && !f2fs_readonly(sb)) { 4792 - int err2; 4793 - 4794 - f2fs_notice(sbi, "Checking entire write pointers"); 4795 - err2 = f2fs_check_write_pointer(sbi); 4796 - if (err2) 4797 - err = err2; 4798 - } 4746 + if (skip_recovery) 4747 + err = f2fs_check_and_fix_write_pointer(sbi); 4799 4748 if (err) 4800 4749 goto free_meta; 4750 + 4751 + /* f2fs_recover_fsync_data() cleared this already */ 4752 + clear_sbi_flag(sbi, SBI_POR_DOING); 4801 4753 4802 4754 err = f2fs_init_inmem_curseg(sbi); 4803 4755 if (err) 4804 4756 goto sync_free_meta; 4805 - 4806 - /* f2fs_recover_fsync_data() cleared this already */ 4807 - clear_sbi_flag(sbi, SBI_POR_DOING); 4808 4757 4809 4758 if (test_opt(sbi, DISABLE_CHECKPOINT)) { 4810 4759 err = f2fs_disable_checkpoint(sbi); ··· 5036 4991 err = f2fs_init_shrinker(); 5037 4992 if (err) 5038 4993 goto free_sysfs; 5039 - err = register_filesystem(&f2fs_fs_type); 5040 - if (err) 5041 - goto free_shrinker; 5042 4994 f2fs_create_root_stats(); 5043 4995 err = f2fs_init_post_read_processing(); 5044 4996 if (err) ··· 5058 5016 err = f2fs_create_casefold_cache(); 5059 5017 if (err) 5060 5018 goto free_compress_cache; 5019 + err = register_filesystem(&f2fs_fs_type); 5020 + if (err) 5021 + goto free_casefold_cache; 5061 5022 return 0; 5023 + free_casefold_cache: 5024 + f2fs_destroy_casefold_cache(); 5062 5025 free_compress_cache: 5063 5026 f2fs_destroy_compress_cache(); 5064 5027 free_compress_mempool: ··· 5078 5031 f2fs_destroy_post_read_processing(); 5079 5032 free_root_stats: 5080 5033 f2fs_destroy_root_stats(); 5081 - unregister_filesystem(&f2fs_fs_type); 5082 - free_shrinker: 5083 5034 f2fs_exit_shrinker(); 5084 5035 free_sysfs: 5085 5036 f2fs_exit_sysfs(); ··· 5101 5056 5102 5057 static void __exit exit_f2fs_fs(void) 5103 5058 { 5059 + unregister_filesystem(&f2fs_fs_type); 5104 5060 f2fs_destroy_casefold_cache(); 5105 5061 f2fs_destroy_compress_cache(); 5106 5062 f2fs_destroy_compress_mempool(); ··· 5110 
5064 f2fs_destroy_iostat_processing(); 5111 5065 f2fs_destroy_post_read_processing(); 5112 5066 f2fs_destroy_root_stats(); 5113 - unregister_filesystem(&f2fs_fs_type); 5114 5067 f2fs_exit_shrinker(); 5115 5068 f2fs_exit_sysfs(); 5116 5069 f2fs_destroy_garbage_collection_cache();
+13 -3
fs/f2fs/sysfs.c
···
501 501 	if (a->struct_type == RESERVED_BLOCKS) {
502 502 		spin_lock(&sbi->stat_lock);
503 503 		if (t > (unsigned long)(sbi->user_block_count -
504      -				F2FS_OPTION(sbi).root_reserved_blocks -
505      -				SEGS_TO_BLKS(sbi,
506      -				SM_I(sbi)->additional_reserved_segments))) {
     504 +				F2FS_OPTION(sbi).root_reserved_blocks)) {
507 505 			spin_unlock(&sbi->stat_lock);
508 506 			return -EINVAL;
509 507 		}
···
787 789 		return count;
788 790 	}
789 791
     792 +	if (!strcmp(a->attr.name, "max_read_extent_count")) {
     793 +		if (t > UINT_MAX)
     794 +			return -EINVAL;
     795 +		*ui = (unsigned int)t;
     796 +		return count;
     797 +	}
     798 +
790 799 	if (!strcmp(a->attr.name, "ipu_policy")) {
791 800 		if (t >= BIT(F2FS_IPU_MAX))
792 801 			return -EINVAL;
···
1059 1054 F2FS_SBI_GENERAL_RW_ATTR(hot_data_age_threshold);
1060 1055 F2FS_SBI_GENERAL_RW_ATTR(warm_data_age_threshold);
1061 1056 F2FS_SBI_GENERAL_RW_ATTR(last_age_weight);
     1057 +/* read extent cache */
     1058 +F2FS_SBI_GENERAL_RW_ATTR(max_read_extent_count);
1062 1059 #ifdef CONFIG_BLK_DEV_ZONED
1063 1060 F2FS_SBI_GENERAL_RO_ATTR(unusable_blocks_per_sec);
1064 1061 F2FS_SBI_GENERAL_RW_ATTR(blkzone_alloc_policy);
···
1251 1244 	ATTR_LIST(hot_data_age_threshold),
1252 1245 	ATTR_LIST(warm_data_age_threshold),
1253 1246 	ATTR_LIST(last_age_weight),
     1247 +	ATTR_LIST(max_read_extent_count),
1254 1248 	NULL,
1255 1249 };
1256 1250 ATTRIBUTE_GROUPS(f2fs);
···
1321 1313 F2FS_SB_FEATURE_RO_ATTR(casefold, CASEFOLD);
1322 1314 F2FS_SB_FEATURE_RO_ATTR(compression, COMPRESSION);
1323 1315 F2FS_SB_FEATURE_RO_ATTR(readonly, RO);
     1316 +F2FS_SB_FEATURE_RO_ATTR(device_alias, DEVICE_ALIAS);
1324 1317
1325 1318 static struct attribute *f2fs_sb_feat_attrs[] = {
1326 1319 	ATTR_LIST(sb_encryption),
···
1338 1329 	ATTR_LIST(sb_casefold),
1339 1330 	ATTR_LIST(sb_compression),
1340 1331 	ATTR_LIST(sb_readonly),
     1332 +	ATTR_LIST(sb_device_alias),
1341 1333 	NULL,
1342 1334 };
1343 1335 ATTRIBUTE_GROUPS(f2fs_sb_feat);
+4 -3
include/linux/f2fs_fs.h
···
24 24 #define NEW_ADDR		((block_t)-1)	/* used as block_t addresses */
25 25 #define COMPRESS_ADDR		((block_t)-2)	/* used as compressed data flag */
26 26
27    -#define F2FS_BYTES_TO_BLK(bytes)	((bytes) >> F2FS_BLKSIZE_BITS)
28    -#define F2FS_BLK_TO_BYTES(blk)	((blk) << F2FS_BLKSIZE_BITS)
   27 +#define F2FS_BLKSIZE_MASK	(F2FS_BLKSIZE - 1)
   28 +#define F2FS_BYTES_TO_BLK(bytes)	((unsigned long long)(bytes) >> F2FS_BLKSIZE_BITS)
   29 +#define F2FS_BLK_TO_BYTES(blk)	((unsigned long long)(blk) << F2FS_BLKSIZE_BITS)
29 30 #define F2FS_BLK_END_BYTES(blk)	(F2FS_BLK_TO_BYTES(blk + 1) - 1)
30    -#define F2FS_BLK_ALIGN(x)	(F2FS_BYTES_TO_BLK((x) + F2FS_BLKSIZE - 1))
   31 +#define F2FS_BLK_ALIGN(x)	(F2FS_BYTES_TO_BLK((x) + F2FS_BLKSIZE - 1))
31 32
32 33 /* 0, 1(node nid), 2(meta nid) are reserved node id */
33 34 #define F2FS_RESERVED_NODE_NUM		3
+1
include/uapi/linux/f2fs.h
···
43 43 #define F2FS_IOC_DECOMPRESS_FILE	_IO(F2FS_IOCTL_MAGIC, 23)
44 44 #define F2FS_IOC_COMPRESS_FILE	_IO(F2FS_IOCTL_MAGIC, 24)
45 45 #define F2FS_IOC_START_ATOMIC_REPLACE	_IO(F2FS_IOCTL_MAGIC, 25)
   46 +#define F2FS_IOC_GET_DEV_ALIAS_FILE	_IOR(F2FS_IOCTL_MAGIC, 26, __u32)
46 47
47 48 /*
48 49  * should be same as XFS_IOC_GOINGDOWN.