Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

zram: idle writeback fixes and cleanup

This patch includes some fixes and cleanup for idle-page writeback.

1. writeback_limit interface

The writeback_limit interface is currently rather confusing. For example,
once the writeback limit budget is exhausted, the admin sees 0 in
/sys/block/zramX/writeback_limit, which has the same meaning as a disabled
writeback_limit. IOW, the admin cannot tell whether the zero came
from a disabled writeback limit or an exhausted one.

To make the interface clear, let's separate enabling the writeback limit
into another knob - /sys/block/zram0/writeback_limit_enable

* before:
while true :
# to re-enable writeback limit once previous one is used up
echo 0 > /sys/block/zram0/writeback_limit
echo $((200<<20)) > /sys/block/zram0/writeback_limit
..
.. # used up the writeback limit budget

* new
# To enable writeback limit, from the beginning, admin should
# enable it.
echo $((200<<20)) > /sys/block/zram0/writeback_limit
echo 1 > /sys/block/zram0/writeback_limit_enable
while true :
echo $((200<<20)) > /sys/block/zram0/writeback_limit
..
.. # used up the writeback limit budget

It's much more straightforward.
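The two-knob semantics described above can be modelled in a few lines of userspace C. This is only a sketch: the struct and function names (wb_limit, wb_allowed, wb_charge) are hypothetical, the locking from the patch is left out, and one page is assumed to cost one 4KB unit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the two sysfs knobs. */
struct wb_limit {
	bool enable;     /* models writeback_limit_enable */
	uint64_t budget; /* models writeback_limit, in 4KB units */
};

/* Writeback is refused only when the limit is enabled AND the budget is 0. */
static bool wb_allowed(const struct wb_limit *l)
{
	return !(l->enable && l->budget == 0);
}

/* Charge one 4KB unit after a page is written back (model only). */
static void wb_charge(struct wb_limit *l)
{
	if (l->enable && l->budget > 0)
		l->budget -= 1;
}
```

With the enable flag split out, a budget of 0 is unambiguous: it means "exhausted" when enable is set and is simply ignored when enable is clear.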

2. fix idle/huge writeback mode condition check

The mode in writeback_store is no longer a bitmask, so there is no need to
use bit operations. Furthermore, the current condition check is broken in
that it writes back every page regardless of huge/idle.
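To see why the old check never skipped a page, the two conditions can be compared side by side in userspace. This is a sketch, not the driver code: old_skip and new_skip are hypothetical helpers, and the two bool arguments stand in for the ZRAM_IDLE and ZRAM_HUGE flag tests.

```c
#include <stdbool.h>

#define HUGE_WRITEBACK 1
#define IDLE_WRITEBACK 2

/*
 * Pre-patch condition: both halves must be true to skip a page.
 * Since mode is exactly one of the two values, one half always
 * evaluates to false, so the page is never skipped.
 */
static bool old_skip(unsigned long mode, bool idle, bool huge)
{
	return ((mode & IDLE_WRITEBACK) && !idle) &&
	       ((mode & HUGE_WRITEBACK) && !huge);
}

/* Post-patch condition: skip when the page lacks the requested property. */
static bool new_skip(int mode, bool idle, bool huge)
{
	if (mode == IDLE_WRITEBACK && !idle)
		return true;
	if (mode == HUGE_WRITEBACK && !huge)
		return true;
	return false;
}
```

For example, old_skip(IDLE_WRITEBACK, false, false) is false (the non-idle page would be written back), while new_skip(IDLE_WRITEBACK, false, false) is true.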

3. clean up idle_store

No need to use goto.

[minchan@kernel.org: missed spin_lock_init]
Link: http://lkml.kernel.org/r/20190103001601.GA255139@google.com
Link: http://lkml.kernel.org/r/20181224033529.19450-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Suggested-by: John Dias <joaodias@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: John Dias <joaodias@google.com>
Cc: Srinivas Paladugu <srnvs@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Minchan Kim and committed by Linus Torvalds
1d69a3f8 3bd6e94b

+125 -55
+9 -2
Documentation/ABI/testing/sysfs-block-zram
···
 		statistics (bd_count, bd_reads, bd_writes) in a format
 		similar to block layer statistics file format.
 
+What:		/sys/block/zram<id>/writeback_limit_enable
+Date:		November 2018
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The writeback_limit_enable file is read-write and specifies
+		eanbe of writeback_limit feature. "1" means eable the feature.
+		No limit "0" is the initial state.
+
 What:		/sys/block/zram<id>/writeback_limit
 Date:		November 2018
 Contact:	Minchan Kim <minchan@kernel.org>
 Description:
 		The writeback_limit file is read-write and specifies the maximum
 		amount of writeback ZRAM can do. The limit could be changed
-		in run time and "0" means disable the limit.
-		No limit is the initial state.
+		in run time.
+47 -27
Documentation/blockdev/zram.txt
···
 A brief description of exported device attributes. For more details please
 read Documentation/ABI/testing/sysfs-block-zram.
 
-Name              access  description
-----              ------  -----------
-disksize          RW      show and set the device's disk size
-initstate         RO      shows the initialization state of the device
-reset             WO      trigger device reset
-mem_used_max      WO      reset the `mem_used_max' counter (see later)
-mem_limit         WO      specifies the maximum amount of memory ZRAM can use
-                          to store the compressed data
-writeback_limit   WO      specifies the maximum amount of write IO zram can
-                          write out to backing device as 4KB unit
-max_comp_streams  RW      the number of possible concurrent compress operations
-comp_algorithm    RW      show and change the compression algorithm
-compact           WO      trigger memory compaction
-debug_stat        RO      this file is used for zram debugging purposes
-backing_dev       RW      set up backend storage for zram to write out
-idle              WO      mark allocated slot as idle
+Name                    access  description
+----                    ------  -----------
+disksize                RW      show and set the device's disk size
+initstate               RO      shows the initialization state of the device
+reset                   WO      trigger device reset
+mem_used_max            WO      reset the `mem_used_max' counter (see later)
+mem_limit               WO      specifies the maximum amount of memory ZRAM can use
+                                to store the compressed data
+writeback_limit         WO      specifies the maximum amount of write IO zram can
+                                write out to backing device as 4KB unit
+writeback_limit_enable  RW      show and set writeback_limit feature
+max_comp_streams        RW      the number of possible concurrent compress operations
+comp_algorithm          RW      show and change the compression algorithm
+compact                 WO      trigger memory compaction
+debug_stat              RO      this file is used for zram debugging purposes
+backing_dev             RW      set up backend storage for zram to write out
+idle                    WO      mark allocated slot as idle
 
 
 User space is advised to use the following files to read the device statistics.
···
 If there are lots of write IO with flash device, potentially, it has
 flash wearout problem so that admin needs to design write limitation
 to guarantee storage health for entire product life.
-To overcome the concern, zram supports "writeback_limit".
-The "writeback_limit"'s default value is 0 so that it doesn't limit
-any writeback. If admin want to measure writeback count in a certain
-period, he could know it via /sys/block/zram0/bd_stat's 3rd column.
+
+To overcome the concern, zram supports "writeback_limit" feature.
+The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
+any writeback. IOW, if admin want to apply writeback budget, he should
+enable writeback_limit_enable via
+
+	$ echo 1 > /sys/block/zramX/writeback_limit_enable
+
+Once writeback_limit_enable is set, zram doesn't allow any writeback
+until admin set the budget via /sys/block/zramX/writeback_limit.
+
+(If admin doesn't enable writeback_limit_enable, writeback_limit's value
+assigned via /sys/block/zramX/writeback_limit is meaninless.)
 
 If admin want to limit writeback as per-day 400M, he could do it
 like below.
 
-	MB_SHIFT=20
-	4K_SHIFT=12
-	echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
-		/sys/block/zram0/writeback_limit.
+	$ MB_SHIFT=20
+	$ 4K_SHIFT=12
+	$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
+		/sys/block/zram0/writeback_limit.
+	$ echo 1 > /sys/block/zram0/writeback_limit_enable
 
-If admin want to allow further write again, he could do it like below
+If admin want to allow further write again once the bugdet is exausted,
+he could do it like below
 
-	echo 0 > /sys/block/zram0/writeback_limit
+	$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
+		/sys/block/zram0/writeback_limit
 
 If admin want to see remaining writeback budget since he set,
 
-	cat /sys/block/zram0/writeback_limit
+	$ cat /sys/block/zramX/writeback_limit
+
+If admin want to disable writeback limit, he could do
+
+	$ echo 0 > /sys/block/zramX/writeback_limit_enable
 
 The writeback_limit count will reset whenever you reset zram(e.g.,
 system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
 writeback happened until you reset the zram to allocate extra writeback
 budget in next setting is user's job.
+
+If admin want to measure writeback count in a certain period, he could
+know it via /sys/block/zram0/bd_stat's 3rd column.
 
 = memory tracking
 
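The 400M example in the documentation hunk above converts megabytes into 4KB units with two shifts. A worked version of that arithmetic, as a sketch: the helper name mb_to_wb_units and the macro UNIT_SHIFT are hypothetical (the documentation's 4K_SHIFT begins with a digit, so it cannot be used as an identifier in C, nor as a variable name in POSIX shell).

```c
#include <stdint.h>

#define MB_SHIFT   20 /* 1 MB  = 1 << 20 bytes */
#define UNIT_SHIFT 12 /* 1 unit = 4KB = 1 << 12 bytes; stands in for "4K_SHIFT" */

/* Convert a budget in megabytes to writeback_limit units of 4KB. */
static uint64_t mb_to_wb_units(uint64_t megabytes)
{
	return (megabytes << MB_SHIFT) >> UNIT_SHIFT;
}
```

For 400 MB this gives 400 * 256 = 102400 units. Note that the commit message's `$((200<<20))` omits the `>>12` step; if the unit is 4KB as the attribute table states, that value would stand for far more than 200 MB.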
+66 -24
drivers/block/zram/zram_drv.c
···
 		 * See the comment in writeback_store.
 		 */
 		zram_slot_lock(zram, index);
-		if (!zram_allocated(zram, index) ||
-				zram_test_flag(zram, index, ZRAM_UNDER_WB))
-			goto next;
-		zram_set_flag(zram, index, ZRAM_IDLE);
-next:
+		if (zram_allocated(zram, index) &&
+				!zram_test_flag(zram, index, ZRAM_UNDER_WB))
+			zram_set_flag(zram, index, ZRAM_IDLE);
 		zram_slot_unlock(zram, index);
 	}
 
···
 }
 
 #ifdef CONFIG_ZRAM_WRITEBACK
+static ssize_t writeback_limit_enable_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	struct zram *zram = dev_to_zram(dev);
+	u64 val;
+	ssize_t ret = -EINVAL;
+
+	if (kstrtoull(buf, 10, &val))
+		return ret;
+
+	down_read(&zram->init_lock);
+	spin_lock(&zram->wb_limit_lock);
+	zram->wb_limit_enable = val;
+	spin_unlock(&zram->wb_limit_lock);
+	up_read(&zram->init_lock);
+	ret = len;
+
+	return ret;
+}
+
+static ssize_t writeback_limit_enable_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	bool val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	spin_lock(&zram->wb_limit_lock);
+	val = zram->wb_limit_enable;
+	spin_unlock(&zram->wb_limit_lock);
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
 static ssize_t writeback_limit_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
···
 		return ret;
 
 	down_read(&zram->init_lock);
-	atomic64_set(&zram->stats.bd_wb_limit, val);
-	if (val == 0)
-		zram->stop_writeback = false;
+	spin_lock(&zram->wb_limit_lock);
+	zram->bd_wb_limit = val;
+	spin_unlock(&zram->wb_limit_lock);
 	up_read(&zram->init_lock);
 	ret = len;
 
···
 	struct zram *zram = dev_to_zram(dev);
 
 	down_read(&zram->init_lock);
-	val = atomic64_read(&zram->stats.bd_wb_limit);
+	spin_lock(&zram->wb_limit_lock);
+	val = zram->bd_wb_limit;
+	spin_unlock(&zram->wb_limit_lock);
 	up_read(&zram->init_lock);
 
 	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
···
 	return 1;
 }
 
-#define HUGE_WRITEBACK	0x1
-#define IDLE_WRITEBACK	0x2
+#define HUGE_WRITEBACK	1
+#define IDLE_WRITEBACK	2
 
 static ssize_t writeback_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
···
 	struct page *page;
 	ssize_t ret, sz;
 	char mode_buf[8];
-	unsigned long mode = -1UL;
+	int mode = -1;
 	unsigned long blk_idx = 0;
 
 	sz = strscpy(mode_buf, buf, sizeof(mode_buf));
···
 	else if (!strcmp(mode_buf, "huge"))
 		mode = HUGE_WRITEBACK;
 
-	if (mode == -1UL)
+	if (mode == -1)
 		return -EINVAL;
 
 	down_read(&zram->init_lock);
···
 		bvec.bv_len = PAGE_SIZE;
 		bvec.bv_offset = 0;
 
-		if (zram->stop_writeback) {
+		spin_lock(&zram->wb_limit_lock);
+		if (zram->wb_limit_enable && !zram->bd_wb_limit) {
+			spin_unlock(&zram->wb_limit_lock);
 			ret = -EIO;
 			break;
 		}
+		spin_unlock(&zram->wb_limit_lock);
 
 		if (!blk_idx) {
 			blk_idx = alloc_block_bdev(zram);
···
 				zram_test_flag(zram, index, ZRAM_UNDER_WB))
 			goto next;
 
-		if ((mode & IDLE_WRITEBACK &&
-			  !zram_test_flag(zram, index, ZRAM_IDLE)) &&
-		    (mode & HUGE_WRITEBACK &&
-			  !zram_test_flag(zram, index, ZRAM_HUGE)))
+		if (mode == IDLE_WRITEBACK &&
+			  !zram_test_flag(zram, index, ZRAM_IDLE))
+			goto next;
+		if (mode == HUGE_WRITEBACK &&
+			  !zram_test_flag(zram, index, ZRAM_HUGE))
 			goto next;
 		/*
 		 * Clearing ZRAM_UNDER_WB is duty of caller.
···
 		zram_set_element(zram, index, blk_idx);
 		blk_idx = 0;
 		atomic64_inc(&zram->stats.pages_stored);
-		if (atomic64_add_unless(&zram->stats.bd_wb_limit,
-					-1 << (PAGE_SHIFT - 12), 0)) {
-			if (atomic64_read(&zram->stats.bd_wb_limit) == 0)
-				zram->stop_writeback = true;
-		}
+		spin_lock(&zram->wb_limit_lock);
+		if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
+			zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
+		spin_unlock(&zram->wb_limit_lock);
 next:
 		zram_slot_unlock(zram, index);
 	}
···
 static DEVICE_ATTR_RW(backing_dev);
 static DEVICE_ATTR_WO(writeback);
 static DEVICE_ATTR_RW(writeback_limit);
+static DEVICE_ATTR_RW(writeback_limit_enable);
 #endif
 
 static struct attribute *zram_disk_attrs[] = {
···
 	&dev_attr_backing_dev.attr,
 	&dev_attr_writeback.attr,
 	&dev_attr_writeback_limit.attr,
+	&dev_attr_writeback_limit_enable.attr,
 #endif
 	&dev_attr_io_stat.attr,
 	&dev_attr_mm_stat.attr,
···
 	device_id = ret;
 
 	init_rwsem(&zram->init_lock);
-
+#ifdef CONFIG_ZRAM_WRITEBACK
+	spin_lock_init(&zram->wb_limit_lock);
+#endif
 	queue = blk_alloc_queue(GFP_KERNEL);
 	if (!queue) {
 		pr_err("Error allocating disk queue for device %d\n",
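The per-page charge in writeback_store, `1UL << (PAGE_SHIFT - 12)`, expresses one page as a count of 4KB units. The following userspace sketch mirrors that subtraction; units_per_page and charge_page are hypothetical names, the page shift is passed as a parameter rather than taken from the kernel's PAGE_SHIFT, and the limit is assumed to be enabled.

```c
#include <stdint.h>

/* Number of 4KB units in one page; requires page_shift >= 12. */
static uint64_t units_per_page(unsigned int page_shift)
{
	return UINT64_C(1) << (page_shift - 12);
}

/*
 * Mirror of the patch's charge with the limit enabled: subtract one
 * page's worth of units whenever the budget is non-zero. The budget is
 * unsigned, so in this model a budget smaller than one page's cost
 * wraps around instead of stopping at 0.
 */
static uint64_t charge_page(uint64_t budget, unsigned int page_shift)
{
	if (budget > 0)
		budget -= units_per_page(page_shift);
	return budget;
}
```

With 4KB pages (page_shift of 12) the cost is exactly 1, so the budget counts down to 0 one page at a time. With 64KB pages (page_shift of 16) the cost is 16, and a budget that is not a multiple of 16 would not land on 0 in this model.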
+3 -2
drivers/block/zram/zram_drv.h
···
 	atomic64_t bd_count;		/* no. of pages in backing device */
 	atomic64_t bd_reads;		/* no. of reads from backing device */
 	atomic64_t bd_writes;		/* no. of writes from backing device */
-	atomic64_t bd_wb_limit;	/* writeback limit of backing device */
 #endif
 };
 
···
 	 */
 	bool claim; /* Protected by bdev->bd_mutex */
 	struct file *backing_dev;
-	bool stop_writeback;
 #ifdef CONFIG_ZRAM_WRITEBACK
+	spinlock_t wb_limit_lock;
+	bool wb_limit_enable;
+	u64 bd_wb_limit;
 	struct block_device *bdev;
 	unsigned int old_block_size;
 	unsigned long *bitmap;