Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
- fix for a memory leak on certain unplug events
- a collection of bcache fixes from Kent and Nicolas
- a few null_blk fixes and updates from Matias
- a marking of functions as static in the stec pci-e driver

* 'for-linus' of git://git.kernel.dk/linux-block:
null_blk: support submit_queues on use_per_node_hctx
null_blk: set use_per_node_hctx param to false
null_blk: corrections to documentation
null_blk: warning on ignored submit_queues param
null_blk: refactor init and init errors code paths
null_blk: documentation
null_blk: mem garbage on NUMA systems during init
drivers: block: Mark the functions as static in skd_main.c
bcache: New writeback PD controller
bcache: bugfix for race between moving_gc and bucket_invalidate
bcache: fix for gc and writeback race
bcache: bugfix - moving_gc now moves only correct buckets
bcache: fix for gc crashing when no sectors are used
bcache: Fix heap_peek() macro
bcache: Fix for can_attach_cache()
bcache: Fix dirty_data accounting
bcache: Use uninterruptible sleep in writeback
bcache: kthread don't set writeback task to INTERUPTIBLE
block: fix memory leaks on unplugging block device
bcache: fix sparse non static symbol warning

+274 -94
+72
Documentation/block/null_blk.txt
··· 1 + Null block device driver 2 + ================================================================================ 3 + 4 + I. Overview 5 + 6 + The null block device (/dev/nullb*) is used for benchmarking the various 7 + block-layer implementations. It emulates a block device of X gigabytes in size. 8 + The following instances are possible: 9 + 10 + Single-queue block-layer 11 + - Request-based. 12 + - Single submission queue per device. 13 + - Implements IO scheduling algorithms (CFQ, Deadline, noop). 14 + Multi-queue block-layer 15 + - Request-based. 16 + - Configurable submission queues per device. 17 + No block-layer (Known as bio-based) 18 + - Bio-based. IO requests are submitted directly to the device driver. 19 + - Directly accepts bio data structure and returns them. 20 + 21 + All of them have a completion queue for each core in the system. 22 + 23 + II. Module parameters applicable for all instances: 24 + 25 + queue_mode=[0-2]: Default: 2-Multi-queue 26 + Selects which block-layer the module should instantiate with. 27 + 28 + 0: Bio-based. 29 + 1: Single-queue. 30 + 2: Multi-queue. 31 + 32 + home_node=[0--nr_nodes]: Default: NUMA_NO_NODE 33 + Selects what CPU node the data structures are allocated from. 34 + 35 + gb=[Size in GB]: Default: 250GB 36 + The size of the device reported to the system. 37 + 38 + bs=[Block size (in bytes)]: Default: 512 bytes 39 + The block size reported to the system. 40 + 41 + nr_devices=[Number of devices]: Default: 2 42 + Number of block devices instantiated. They are instantiated as /dev/nullb0, 43 + etc. 44 + 45 + irq_mode=[0-2]: Default: 1-Soft-irq 46 + The completion mode used for completing IOs to the block-layer. 47 + 48 + 0: None. 49 + 1: Soft-irq. Uses IPI to complete IOs across CPU nodes. Simulates the overhead 50 + when IOs are issued from another CPU node than the home the device is 51 + connected to. 52 + 2: Timer: Waits a specific period (completion_nsec) for each IO before 53 + completion. 
54 + 55 + completion_nsec=[ns]: Default: 10.000ns 56 + Combined with irq_mode=2 (timer). The time each completion event must wait. 57 + 58 + submit_queues=[0..nr_cpus]: 59 + The number of submission queues attached to the device driver. If unset, it 60 + defaults to 1 on single-queue and bio-based instances. For multi-queue, 61 + it is ignored when use_per_node_hctx module parameter is 1. 62 + 63 + hw_queue_depth=[0..qdepth]: Default: 64 64 + The hardware queue depth of the device. 65 + 66 + III: Multi-queue specific parameters 67 + 68 + use_per_node_hctx=[0/1]: Default: 0 69 + 0: The number of submit queues are set to the value of the submit_queues 70 + parameter. 71 + 1: The multi-queue block layer is instantiated with a hardware dispatch 72 + queue for each CPU node in the system.
+13
block/blk-mq-sysfs.c
··· 335 335 void blk_mq_unregister_disk(struct gendisk *disk) 336 336 { 337 337 struct request_queue *q = disk->queue; 338 + struct blk_mq_hw_ctx *hctx; 339 + struct blk_mq_ctx *ctx; 340 + int i, j; 341 + 342 + queue_for_each_hw_ctx(q, hctx, i) { 343 + hctx_for_each_ctx(hctx, ctx, j) { 344 + kobject_del(&ctx->kobj); 345 + kobject_put(&ctx->kobj); 346 + } 347 + kobject_del(&hctx->kobj); 348 + kobject_put(&hctx->kobj); 349 + } 338 350 339 351 kobject_uevent(&q->mq_kobj, KOBJ_REMOVE); 340 352 kobject_del(&q->mq_kobj); 353 + kobject_put(&q->mq_kobj); 341 354 342 355 kobject_put(&disk_to_dev(disk)->kobj); 343 356 }
+76 -26
drivers/block/null_blk.c
··· 1 1 #include <linux/module.h> 2 + 2 3 #include <linux/moduleparam.h> 3 4 #include <linux/sched.h> 4 5 #include <linux/fs.h> ··· 66 65 NULL_Q_MQ = 2, 67 66 }; 68 67 69 - static int submit_queues = 1; 68 + static int submit_queues; 70 69 module_param(submit_queues, int, S_IRUGO); 71 70 MODULE_PARM_DESC(submit_queues, "Number of submission queues"); 72 71 ··· 102 101 module_param(hw_queue_depth, int, S_IRUGO); 103 102 MODULE_PARM_DESC(hw_queue_depth, "Queue depth for each hardware queue. Default: 64"); 104 103 105 - static bool use_per_node_hctx = true; 104 + static bool use_per_node_hctx = false; 106 105 module_param(use_per_node_hctx, bool, S_IRUGO); 107 - MODULE_PARM_DESC(use_per_node_hctx, "Use per-node allocation for hardware context queues. Default: true"); 106 + MODULE_PARM_DESC(use_per_node_hctx, "Use per-node allocation for hardware context queues. Default: false"); 108 107 109 108 static void put_tag(struct nullb_queue *nq, unsigned int tag) 110 109 { ··· 347 346 348 347 static struct blk_mq_hw_ctx *null_alloc_hctx(struct blk_mq_reg *reg, unsigned int hctx_index) 349 348 { 350 - return kzalloc_node(sizeof(struct blk_mq_hw_ctx), GFP_KERNEL, 351 - hctx_index); 349 + int b_size = DIV_ROUND_UP(reg->nr_hw_queues, nr_online_nodes); 350 + int tip = (reg->nr_hw_queues % nr_online_nodes); 351 + int node = 0, i, n; 352 + 353 + /* 354 + * Split submit queues evenly wrt to the number of nodes. If uneven, 355 + * fill the first buckets with one extra, until the rest is filled with 356 + * no extra. 357 + */ 358 + for (i = 0, n = 1; i < hctx_index; i++, n++) { 359 + if (n % b_size == 0) { 360 + n = 0; 361 + node++; 362 + 363 + tip--; 364 + if (!tip) 365 + b_size = reg->nr_hw_queues / nr_online_nodes; 366 + } 367 + } 368 + 369 + /* 370 + * A node might not be online, therefore map the relative node id to the 371 + * real node id. 
372 + */ 373 + for_each_online_node(n) { 374 + if (!node) 375 + break; 376 + node--; 377 + } 378 + 379 + return kzalloc_node(sizeof(struct blk_mq_hw_ctx), GFP_KERNEL, n); 352 380 } 353 381 354 382 static void null_free_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_index) 355 383 { 356 384 kfree(hctx); 385 + } 386 + 387 + static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq) 388 + { 389 + BUG_ON(!nullb); 390 + BUG_ON(!nq); 391 + 392 + init_waitqueue_head(&nq->wait); 393 + nq->queue_depth = nullb->queue_depth; 357 394 } 358 395 359 396 static int null_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, ··· 400 361 struct nullb *nullb = data; 401 362 struct nullb_queue *nq = &nullb->queues[index]; 402 363 403 - init_waitqueue_head(&nq->wait); 404 - nq->queue_depth = nullb->queue_depth; 405 - nullb->nr_queues++; 406 364 hctx->driver_data = nq; 365 + null_init_queue(nullb, nq); 366 + nullb->nr_queues++; 407 367 408 368 return 0; 409 369 } ··· 455 417 456 418 nq->cmds = kzalloc(nq->queue_depth * sizeof(*cmd), GFP_KERNEL); 457 419 if (!nq->cmds) 458 - return 1; 420 + return -ENOMEM; 459 421 460 422 tag_size = ALIGN(nq->queue_depth, BITS_PER_LONG) / BITS_PER_LONG; 461 423 nq->tag_map = kzalloc(tag_size * sizeof(unsigned long), GFP_KERNEL); 462 424 if (!nq->tag_map) { 463 425 kfree(nq->cmds); 464 - return 1; 426 + return -ENOMEM; 465 427 } 466 428 467 429 for (i = 0; i < nq->queue_depth; i++) { ··· 492 454 493 455 static int setup_queues(struct nullb *nullb) 494 456 { 495 - struct nullb_queue *nq; 496 - int i; 497 - 498 - nullb->queues = kzalloc(submit_queues * sizeof(*nq), GFP_KERNEL); 457 + nullb->queues = kzalloc(submit_queues * sizeof(struct nullb_queue), 458 + GFP_KERNEL); 499 459 if (!nullb->queues) 500 - return 1; 460 + return -ENOMEM; 501 461 502 462 nullb->nr_queues = 0; 503 463 nullb->queue_depth = hw_queue_depth; 504 464 505 - if (queue_mode == NULL_Q_MQ) 506 - return 0; 465 + return 0; 466 + } 467 + 468 + static int 
init_driver_queues(struct nullb *nullb) 469 + { 470 + struct nullb_queue *nq; 471 + int i, ret = 0; 507 472 508 473 for (i = 0; i < submit_queues; i++) { 509 474 nq = &nullb->queues[i]; 510 - init_waitqueue_head(&nq->wait); 511 - nq->queue_depth = hw_queue_depth; 512 - if (setup_commands(nq)) 513 - break; 475 + 476 + null_init_queue(nullb, nq); 477 + 478 + ret = setup_commands(nq); 479 + if (ret) 480 + goto err_queue; 514 481 nullb->nr_queues++; 515 482 } 516 483 517 - if (i == submit_queues) 518 - return 0; 519 - 484 + return 0; 485 + err_queue: 520 486 cleanup_queues(nullb); 521 - return 1; 487 + return ret; 522 488 } 523 489 524 490 static int null_add_dev(void) ··· 560 518 } else if (queue_mode == NULL_Q_BIO) { 561 519 nullb->q = blk_alloc_queue_node(GFP_KERNEL, home_node); 562 520 blk_queue_make_request(nullb->q, null_queue_bio); 521 + init_driver_queues(nullb); 563 522 } else { 564 523 nullb->q = blk_init_queue_node(null_request_fn, &nullb->lock, home_node); 565 524 blk_queue_prep_rq(nullb->q, null_rq_prep_fn); 566 525 if (nullb->q) 567 526 blk_queue_softirq_done(nullb->q, null_softirq_done_fn); 527 + init_driver_queues(nullb); 568 528 } 569 529 570 530 if (!nullb->q) ··· 623 579 } 624 580 #endif 625 581 626 - if (submit_queues > nr_cpu_ids) 582 + if (queue_mode == NULL_Q_MQ && use_per_node_hctx) { 583 + if (submit_queues < nr_online_nodes) { 584 + pr_warn("null_blk: submit_queues param is set to %u.", 585 + nr_online_nodes); 586 + submit_queues = nr_online_nodes; 587 + } 588 + } else if (submit_queues > nr_cpu_ids) 627 589 submit_queues = nr_cpu_ids; 628 590 else if (!submit_queues) 629 591 submit_queues = 1;
+2 -2
drivers/block/skd_main.c
··· 5269 5269 } 5270 5270 } 5271 5271 5272 - const char *skd_skmsg_state_to_str(enum skd_fit_msg_state state) 5272 + static const char *skd_skmsg_state_to_str(enum skd_fit_msg_state state) 5273 5273 { 5274 5274 switch (state) { 5275 5275 case SKD_MSG_STATE_IDLE: ··· 5281 5281 } 5282 5282 } 5283 5283 5284 - const char *skd_skreq_state_to_str(enum skd_req_state state) 5284 + static const char *skd_skreq_state_to_str(enum skd_req_state state) 5285 5285 { 5286 5286 switch (state) { 5287 5287 case SKD_REQ_STATE_IDLE:
+2
drivers/md/bcache/alloc.c
··· 421 421 422 422 if (watermark <= WATERMARK_METADATA) { 423 423 SET_GC_MARK(b, GC_MARK_METADATA); 424 + SET_GC_MOVE(b, 0); 424 425 b->prio = BTREE_PRIO; 425 426 } else { 426 427 SET_GC_MARK(b, GC_MARK_RECLAIMABLE); 428 + SET_GC_MOVE(b, 0); 427 429 b->prio = INITIAL_PRIO; 428 430 } 429 431
+6 -6
drivers/md/bcache/bcache.h
··· 197 197 uint8_t disk_gen; 198 198 uint8_t last_gc; /* Most out of date gen in the btree */ 199 199 uint8_t gc_gen; 200 - uint16_t gc_mark; 200 + uint16_t gc_mark; /* Bitfield used by GC. See below for field */ 201 201 }; 202 202 203 203 /* ··· 209 209 #define GC_MARK_RECLAIMABLE 0 210 210 #define GC_MARK_DIRTY 1 211 211 #define GC_MARK_METADATA 2 212 - BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 14); 212 + BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13); 213 + BITMASK(GC_MOVE, struct bucket, gc_mark, 15, 1); 213 214 214 215 #include "journal.h" 215 216 #include "stats.h" ··· 373 372 unsigned char writeback_percent; 374 373 unsigned writeback_delay; 375 374 376 - int writeback_rate_change; 377 - int64_t writeback_rate_derivative; 378 375 uint64_t writeback_rate_target; 376 + int64_t writeback_rate_proportional; 377 + int64_t writeback_rate_derivative; 378 + int64_t writeback_rate_change; 379 379 380 380 unsigned writeback_rate_update_seconds; 381 381 unsigned writeback_rate_d_term; 382 382 unsigned writeback_rate_p_term_inverse; 383 - unsigned writeback_rate_d_smooth; 384 383 }; 385 384 386 385 enum alloc_watermarks { ··· 446 445 * call prio_write() to keep gens from wrapping. 447 446 */ 448 447 uint8_t need_save_prio; 449 - unsigned gc_move_threshold; 450 448 451 449 /* 452 450 * If nonzero, we know we aren't going to find any buckets to invalidate
+25 -2
drivers/md/bcache/btree.c
··· 1561 1561 SET_GC_MARK(PTR_BUCKET(c, &c->uuid_bucket, i), 1562 1562 GC_MARK_METADATA); 1563 1563 1564 + /* don't reclaim buckets to which writeback keys point */ 1565 + rcu_read_lock(); 1566 + for (i = 0; i < c->nr_uuids; i++) { 1567 + struct bcache_device *d = c->devices[i]; 1568 + struct cached_dev *dc; 1569 + struct keybuf_key *w, *n; 1570 + unsigned j; 1571 + 1572 + if (!d || UUID_FLASH_ONLY(&c->uuids[i])) 1573 + continue; 1574 + dc = container_of(d, struct cached_dev, disk); 1575 + 1576 + spin_lock(&dc->writeback_keys.lock); 1577 + rbtree_postorder_for_each_entry_safe(w, n, 1578 + &dc->writeback_keys.keys, node) 1579 + for (j = 0; j < KEY_PTRS(&w->key); j++) 1580 + SET_GC_MARK(PTR_BUCKET(c, &w->key, j), 1581 + GC_MARK_DIRTY); 1582 + spin_unlock(&dc->writeback_keys.lock); 1583 + } 1584 + rcu_read_unlock(); 1585 + 1564 1586 for_each_cache(ca, c, i) { 1565 1587 uint64_t *i; 1566 1588 ··· 1839 1817 if (KEY_START(k) > KEY_START(insert) + sectors_found) 1840 1818 goto check_failed; 1841 1819 1842 - if (KEY_PTRS(replace_key) != KEY_PTRS(k)) 1820 + if (KEY_PTRS(k) != KEY_PTRS(replace_key) || 1821 + KEY_DIRTY(k) != KEY_DIRTY(replace_key)) 1843 1822 goto check_failed; 1844 1823 1845 1824 /* skip past gen */ ··· 2240 2217 struct bkey *replace_key; 2241 2218 }; 2242 2219 2243 - int btree_insert_fn(struct btree_op *b_op, struct btree *b) 2220 + static int btree_insert_fn(struct btree_op *b_op, struct btree *b) 2244 2221 { 2245 2222 struct btree_insert_op *op = container_of(b_op, 2246 2223 struct btree_insert_op, op);
+15 -6
drivers/md/bcache/movinggc.c
··· 25 25 unsigned i; 26 26 27 27 for (i = 0; i < KEY_PTRS(k); i++) { 28 - struct cache *ca = PTR_CACHE(c, k, i); 29 28 struct bucket *g = PTR_BUCKET(c, k, i); 30 29 31 - if (GC_SECTORS_USED(g) < ca->gc_move_threshold) 30 + if (GC_MOVE(g)) 32 31 return true; 33 32 } 34 33 ··· 64 65 65 66 static void read_moving_endio(struct bio *bio, int error) 66 67 { 68 + struct bbio *b = container_of(bio, struct bbio, bio); 67 69 struct moving_io *io = container_of(bio->bi_private, 68 70 struct moving_io, cl); 69 71 70 72 if (error) 71 73 io->op.error = error; 74 + else if (!KEY_DIRTY(&b->key) && 75 + ptr_stale(io->op.c, &b->key, 0)) { 76 + io->op.error = -EINTR; 77 + } 72 78 73 79 bch_bbio_endio(io->op.c, bio, error, "reading data to move"); 74 80 } ··· 145 141 if (!w) 146 142 break; 147 143 144 + if (ptr_stale(c, &w->key, 0)) { 145 + bch_keybuf_del(&c->moving_gc_keys, w); 146 + continue; 147 + } 148 + 148 149 io = kzalloc(sizeof(struct moving_io) + sizeof(struct bio_vec) 149 150 * DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS), 150 151 GFP_KERNEL); ··· 193 184 194 185 static unsigned bucket_heap_top(struct cache *ca) 195 186 { 196 - return GC_SECTORS_USED(heap_peek(&ca->heap)); 187 + struct bucket *b; 188 + return (b = heap_peek(&ca->heap)) ? GC_SECTORS_USED(b) : 0; 197 189 } 198 190 199 191 void bch_moving_gc(struct cache_set *c) ··· 236 226 sectors_to_move -= GC_SECTORS_USED(b); 237 227 } 238 228 239 - ca->gc_move_threshold = bucket_heap_top(ca); 240 - 241 - pr_debug("threshold %u", ca->gc_move_threshold); 229 + while (heap_pop(&ca->heap, b, bucket_cmp)) 230 + SET_GC_MOVE(b, 1); 242 231 } 243 232 244 233 mutex_unlock(&c->bucket_lock);
+1 -1
drivers/md/bcache/super.c
··· 1676 1676 static bool can_attach_cache(struct cache *ca, struct cache_set *c) 1677 1677 { 1678 1678 return ca->sb.block_size == c->sb.block_size && 1679 - ca->sb.bucket_size == c->sb.block_size && 1679 + ca->sb.bucket_size == c->sb.bucket_size && 1680 1680 ca->sb.nr_in_set == c->sb.nr_in_set; 1681 1681 } 1682 1682
+29 -21
drivers/md/bcache/sysfs.c
··· 83 83 rw_attribute(writeback_rate_update_seconds); 84 84 rw_attribute(writeback_rate_d_term); 85 85 rw_attribute(writeback_rate_p_term_inverse); 86 - rw_attribute(writeback_rate_d_smooth); 87 86 read_attribute(writeback_rate_debug); 88 87 89 88 read_attribute(stripe_size); ··· 128 129 var_printf(writeback_running, "%i"); 129 130 var_print(writeback_delay); 130 131 var_print(writeback_percent); 131 - sysfs_print(writeback_rate, dc->writeback_rate.rate); 132 + sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9); 132 133 133 134 var_print(writeback_rate_update_seconds); 134 135 var_print(writeback_rate_d_term); 135 136 var_print(writeback_rate_p_term_inverse); 136 - var_print(writeback_rate_d_smooth); 137 137 138 138 if (attr == &sysfs_writeback_rate_debug) { 139 + char rate[20]; 139 140 char dirty[20]; 140 - char derivative[20]; 141 141 char target[20]; 142 - bch_hprint(dirty, 143 - bcache_dev_sectors_dirty(&dc->disk) << 9); 144 - bch_hprint(derivative, dc->writeback_rate_derivative << 9); 142 + char proportional[20]; 143 + char derivative[20]; 144 + char change[20]; 145 + s64 next_io; 146 + 147 + bch_hprint(rate, dc->writeback_rate.rate << 9); 148 + bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9); 145 149 bch_hprint(target, dc->writeback_rate_target << 9); 150 + bch_hprint(proportional,dc->writeback_rate_proportional << 9); 151 + bch_hprint(derivative, dc->writeback_rate_derivative << 9); 152 + bch_hprint(change, dc->writeback_rate_change << 9); 153 + 154 + next_io = div64_s64(dc->writeback_rate.next - local_clock(), 155 + NSEC_PER_MSEC); 146 156 147 157 return sprintf(buf, 148 - "rate:\t\t%u\n" 149 - "change:\t\t%i\n" 158 + "rate:\t\t%s/sec\n" 150 159 "dirty:\t\t%s\n" 160 + "target:\t\t%s\n" 161 + "proportional:\t%s\n" 151 162 "derivative:\t%s\n" 152 - "target:\t\t%s\n", 153 - dc->writeback_rate.rate, 154 - dc->writeback_rate_change, 155 - dirty, derivative, target); 163 + "change:\t\t%s/sec\n" 164 + "next io:\t%llims\n", 165 + rate, dirty, 
target, proportional, 166 + derivative, change, next_io); 156 167 } 157 168 158 169 sysfs_hprint(dirty_data, ··· 198 189 struct kobj_uevent_env *env; 199 190 200 191 #define d_strtoul(var) sysfs_strtoul(var, dc->var) 192 + #define d_strtoul_nonzero(var) sysfs_strtoul_clamp(var, dc->var, 1, INT_MAX) 201 193 #define d_strtoi_h(var) sysfs_hatoi(var, dc->var) 202 194 203 195 sysfs_strtoul(data_csum, dc->disk.data_csum); ··· 207 197 d_strtoul(writeback_metadata); 208 198 d_strtoul(writeback_running); 209 199 d_strtoul(writeback_delay); 210 - sysfs_strtoul_clamp(writeback_rate, 211 - dc->writeback_rate.rate, 1, 1000000); 200 + 212 201 sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent, 0, 40); 213 202 214 - d_strtoul(writeback_rate_update_seconds); 203 + sysfs_strtoul_clamp(writeback_rate, 204 + dc->writeback_rate.rate, 1, INT_MAX); 205 + 206 + d_strtoul_nonzero(writeback_rate_update_seconds); 215 207 d_strtoul(writeback_rate_d_term); 216 - d_strtoul(writeback_rate_p_term_inverse); 217 - sysfs_strtoul_clamp(writeback_rate_p_term_inverse, 218 - dc->writeback_rate_p_term_inverse, 1, INT_MAX); 219 - d_strtoul(writeback_rate_d_smooth); 208 + d_strtoul_nonzero(writeback_rate_p_term_inverse); 220 209 221 210 d_strtoi_h(sequential_cutoff); 222 211 d_strtoi_h(readahead); ··· 322 313 &sysfs_writeback_rate_update_seconds, 323 314 &sysfs_writeback_rate_d_term, 324 315 &sysfs_writeback_rate_p_term_inverse, 325 - &sysfs_writeback_rate_d_smooth, 326 316 &sysfs_writeback_rate_debug, 327 317 &sysfs_dirty_data, 328 318 &sysfs_stripe_size,
+7 -1
drivers/md/bcache/util.c
··· 209 209 { 210 210 uint64_t now = local_clock(); 211 211 212 - d->next += div_u64(done, d->rate); 212 + d->next += div_u64(done * NSEC_PER_SEC, d->rate); 213 + 214 + if (time_before64(now + NSEC_PER_SEC, d->next)) 215 + d->next = now + NSEC_PER_SEC; 216 + 217 + if (time_after64(now - NSEC_PER_SEC * 2, d->next)) 218 + d->next = now - NSEC_PER_SEC * 2; 213 219 214 220 return time_after64(d->next, now) 215 221 ? div_u64(d->next - now, NSEC_PER_SEC / HZ)
+1 -1
drivers/md/bcache/util.h
··· 110 110 _r; \ 111 111 }) 112 112 113 - #define heap_peek(h) ((h)->size ? (h)->data[0] : NULL) 113 + #define heap_peek(h) ((h)->used ? (h)->data[0] : NULL) 114 114 115 115 #define heap_full(h) ((h)->used == (h)->size) 116 116
+25 -28
drivers/md/bcache/writeback.c
··· 30 30 31 31 /* PD controller */ 32 32 33 - int change = 0; 34 - int64_t error; 35 33 int64_t dirty = bcache_dev_sectors_dirty(&dc->disk); 36 34 int64_t derivative = dirty - dc->disk.sectors_dirty_last; 35 + int64_t proportional = dirty - target; 36 + int64_t change; 37 37 38 38 dc->disk.sectors_dirty_last = dirty; 39 39 40 - derivative *= dc->writeback_rate_d_term; 41 - derivative = clamp(derivative, -dirty, dirty); 40 + /* Scale to sectors per second */ 41 + 42 + proportional *= dc->writeback_rate_update_seconds; 43 + proportional = div_s64(proportional, dc->writeback_rate_p_term_inverse); 44 + 45 + derivative = div_s64(derivative, dc->writeback_rate_update_seconds); 42 46 43 47 derivative = ewma_add(dc->disk.sectors_dirty_derivative, derivative, 44 - dc->writeback_rate_d_smooth, 0); 48 + (dc->writeback_rate_d_term / 49 + dc->writeback_rate_update_seconds) ?: 1, 0); 45 50 46 - /* Avoid divide by zero */ 47 - if (!target) 48 - goto out; 51 + derivative *= dc->writeback_rate_d_term; 52 + derivative = div_s64(derivative, dc->writeback_rate_p_term_inverse); 49 53 50 - error = div64_s64((dirty + derivative - target) << 8, target); 51 - 52 - change = div_s64((dc->writeback_rate.rate * error) >> 8, 53 - dc->writeback_rate_p_term_inverse); 54 + change = proportional + derivative; 54 55 55 56 /* Don't increase writeback rate if the device isn't keeping up */ 56 57 if (change > 0 && 57 58 time_after64(local_clock(), 58 - dc->writeback_rate.next + 10 * NSEC_PER_MSEC)) 59 + dc->writeback_rate.next + NSEC_PER_MSEC)) 59 60 change = 0; 60 61 61 62 dc->writeback_rate.rate = 62 - clamp_t(int64_t, dc->writeback_rate.rate + change, 63 + clamp_t(int64_t, (int64_t) dc->writeback_rate.rate + change, 63 64 1, NSEC_PER_MSEC); 64 - out: 65 + 66 + dc->writeback_rate_proportional = proportional; 65 67 dc->writeback_rate_derivative = derivative; 66 68 dc->writeback_rate_change = change; 67 69 dc->writeback_rate_target = target; ··· 89 87 90 88 static unsigned writeback_delay(struct 
cached_dev *dc, unsigned sectors) 91 89 { 92 - uint64_t ret; 93 - 94 90 if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) || 95 91 !dc->writeback_percent) 96 92 return 0; 97 93 98 - ret = bch_next_delay(&dc->writeback_rate, sectors * 10000000ULL); 99 - 100 - return min_t(uint64_t, ret, HZ); 94 + return bch_next_delay(&dc->writeback_rate, sectors); 101 95 } 102 96 103 97 struct dirty_io { ··· 239 241 if (KEY_START(&w->key) != dc->last_read || 240 242 jiffies_to_msecs(delay) > 50) 241 243 while (!kthread_should_stop() && delay) 242 - delay = schedule_timeout_interruptible(delay); 244 + delay = schedule_timeout_uninterruptible(delay); 243 245 244 246 dc->last_read = KEY_OFFSET(&w->key); 245 247 ··· 436 438 while (delay && 437 439 !kthread_should_stop() && 438 440 !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) 439 - delay = schedule_timeout_interruptible(delay); 441 + delay = schedule_timeout_uninterruptible(delay); 440 442 } 441 443 } 442 444 ··· 474 476 475 477 bch_btree_map_keys(&op.op, dc->disk.c, &KEY(op.inode, 0, 0), 476 478 sectors_dirty_init_fn, 0); 479 + 480 + dc->disk.sectors_dirty_last = bcache_dev_sectors_dirty(&dc->disk); 477 481 } 478 482 479 483 int bch_cached_dev_writeback_init(struct cached_dev *dc) ··· 490 490 dc->writeback_delay = 30; 491 491 dc->writeback_rate.rate = 1024; 492 492 493 - dc->writeback_rate_update_seconds = 30; 494 - dc->writeback_rate_d_term = 16; 495 - dc->writeback_rate_p_term_inverse = 64; 496 - dc->writeback_rate_d_smooth = 8; 493 + dc->writeback_rate_update_seconds = 5; 494 + dc->writeback_rate_d_term = 30; 495 + dc->writeback_rate_p_term_inverse = 6000; 497 496 498 497 dc->writeback_thread = kthread_create(bch_writeback_thread, dc, 499 498 "bcache_writeback"); 500 499 if (IS_ERR(dc->writeback_thread)) 501 500 return PTR_ERR(dc->writeback_thread); 502 - 503 - set_task_state(dc->writeback_thread, TASK_INTERRUPTIBLE); 504 501 505 502 INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate); 506 503 
schedule_delayed_work(&dc->writeback_rate_update,