Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...

Fix up conflicts in fs/{aio.c,super.c}

+1538 -2130
-5
Documentation/block/biodoc.txt
··· 963 963 964 964 elevator_add_req_fn* called to add a new request into the scheduler 965 965 966 - elevator_queue_empty_fn returns true if the merge queue is empty. 967 - Drivers shouldn't use this, but rather check 968 - if elv_next_request is NULL (without losing the 969 - request if one exists!) 970 - 971 966 elevator_former_req_fn 972 967 elevator_latter_req_fn These return the request before or after the 973 968 one specified in disk sort order. Used by the
+1 -29
Documentation/cgroups/blkio-controller.txt
··· 140 140 - Specifies per cgroup weight. This is default weight of the group 141 141 on all the devices until and unless overridden by per device rule. 142 142 (See blkio.weight_device). 143 - Currently allowed range of weights is from 100 to 1000. 143 + Currently allowed range of weights is from 10 to 1000. 144 144 145 145 - blkio.weight_device 146 146 - One can specify per cgroup per device rules using this interface. ··· 343 343 344 344 CFQ sysfs tunable 345 345 ================= 346 - /sys/block/<disk>/queue/iosched/group_isolation 347 - ----------------------------------------------- 348 - 349 - If group_isolation=1, it provides stronger isolation between groups at the 350 - expense of throughput. By default group_isolation is 0. In general that 351 - means that if group_isolation=0, expect fairness for sequential workload 352 - only. Set group_isolation=1 to see fairness for random IO workload also. 353 - 354 - Generally CFQ will put random seeky workload in sync-noidle category. CFQ 355 - will disable idling on these queues and it does a collective idling on group 356 - of such queues. Generally these are slow moving queues and if there is a 357 - sync-noidle service tree in each group, that group gets exclusive access to 358 - disk for certain period. That means it will bring the throughput down if 359 - group does not have enough IO to drive deeper queue depths and utilize disk 360 - capacity to the fullest in the slice allocated to it. But the flip side is 361 - that even a random reader should get better latencies and overall throughput 362 - if there are lots of sequential readers/sync-idle workload running in the 363 - system. 364 - 365 - If group_isolation=0, then CFQ automatically moves all the random seeky queues 366 - in the root group. That means there will be no service differentiation for 367 - that kind of workload. This leads to better throughput as we do collective 368 - idling on root sync-noidle tree. 369 - 370 - By default one should run with group_isolation=0. If that is not sufficient 371 - and one wants stronger isolation between groups, then set group_isolation=1 372 - but this will come at cost of reduced throughput. 373 - 374 346 /sys/block/<disk>/queue/iosched/slice_idle 375 347 ------------------------------------------ 376 348 On a faster hardware CFQ can be slow, especially with sequential workload.
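
The minimum per-cgroup weight drops from 100 to 10 here. Below is a minimal userspace sketch of setting a group's weight through the cgroup filesystem; the mount point and group name are assumptions for illustration, not taken from this merge.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical mount point and group; adjust to the local setup. */
	const char *path = "/cgroup/blkio/lowprio/blkio.weight";
	const char *weight = "10\n";	/* new minimum after this change */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, weight, strlen(weight)) != (ssize_t)strlen(weight))
		perror("write");
	close(fd);
	return 0;
}
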
+8 -9
Documentation/iostats.txt
··· 1 1 I/O statistics fields 2 2 --------------- 3 3 4 - Last modified Sep 30, 2003 5 - 6 4 Since 2.4.20 (and some versions before, with patches), and 2.5.45, 7 5 more extensive disk statistics have been introduced to help measure disk 8 6 activity. Tools such as sar and iostat typically interpret these and do ··· 44 46 By contrast, in 2.6 if you look at /sys/block/hda/stat, you'll 45 47 find just the eleven fields, beginning with 446216. If you look at 46 48 /proc/diskstats, the eleven fields will be preceded by the major and 47 - minor device numbers, and device name. Each of these formats provide 49 + minor device numbers, and device name. Each of these formats provides 48 50 eleven fields of statistics, each meaning exactly the same things. 49 51 All fields except field 9 are cumulative since boot. Field 9 should 50 - go to zero as I/Os complete; all others only increase. Yes, these are 51 - 32 bit unsigned numbers, and on a very busy or long-lived system they 52 + go to zero as I/Os complete; all others only increase (unless they 53 + overflow and wrap). Yes, these are (32-bit or 64-bit) unsigned long 54 + (native word size) numbers, and on a very busy or long-lived system they 52 55 may wrap. Applications should be prepared to deal with that; unless 53 56 your observations are measured in large numbers of minutes or hours, 54 57 they should not wrap twice before you notice them. ··· 95 96 read I/Os issued per partition should equal those made to the disks ... 96 97 but due to the lack of locking it may only be very close. 97 98 98 - In 2.6, there are counters for each cpu, which made the lack of locking 99 - almost a non-issue. When the statistics are read, the per-cpu counters 100 - are summed (possibly overflowing the unsigned 32-bit variable they are 99 + In 2.6, there are counters for each CPU, which make the lack of locking 100 + almost a non-issue. When the statistics are read, the per-CPU counters 101 + are summed (possibly overflowing the unsigned long variable they are 101 102 summed to) and the result given to the user. There is no convenient 102 - user interface for accessing the per-cpu counters themselves. 103 + user interface for accessing the per-CPU counters themselves. 103 104 104 105 Disks vs Partitions 105 106 -------------------
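
The reworded iostats text makes the overflow behaviour explicit: the counters are native-word-size unsigned values and may wrap. A small sketch of wrap-safe delta computation between two samples follows; it assumes the sampling program's unsigned long matches the kernel's word size.

#include <stdio.h>

/*
 * Delta between two samples of a cumulative /proc/diskstats field.
 * With unsigned arithmetic the subtraction is modulo 2^BITS_PER_LONG,
 * so a single wrap between samples still yields the correct delta.
 */
static unsigned long stat_delta(unsigned long prev, unsigned long now)
{
	return now - prev;
}

int main(void)
{
	unsigned long prev = 446216;	/* e.g. field 1: reads completed */
	unsigned long now = 446500;

	printf("reads since last sample: %lu\n", stat_delta(prev, now));
	return 0;
}
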
+15 -1
block/blk-cgroup.c
··· 371 371 } 372 372 EXPORT_SYMBOL_GPL(blkiocg_update_io_remove_stats); 373 373 374 - void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time) 374 + void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time, 375 + unsigned long unaccounted_time) 375 376 { 376 377 unsigned long flags; 377 378 378 379 spin_lock_irqsave(&blkg->stats_lock, flags); 379 380 blkg->stats.time += time; 381 + blkg->stats.unaccounted_time += unaccounted_time; 380 382 spin_unlock_irqrestore(&blkg->stats_lock, flags); 381 383 } 382 384 EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used); ··· 606 604 return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, 607 605 blkg->stats.sectors, cb, dev); 608 606 #ifdef CONFIG_DEBUG_BLK_CGROUP 607 + if (type == BLKIO_STAT_UNACCOUNTED_TIME) 608 + return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, 609 + blkg->stats.unaccounted_time, cb, dev); 609 610 if (type == BLKIO_STAT_AVG_QUEUE_SIZE) { 610 611 uint64_t sum = blkg->stats.avg_queue_size_sum; 611 612 uint64_t samples = blkg->stats.avg_queue_size_samples; ··· 1130 1125 return blkio_read_blkg_stats(blkcg, cft, cb, 1131 1126 BLKIO_STAT_QUEUED, 1); 1132 1127 #ifdef CONFIG_DEBUG_BLK_CGROUP 1128 + case BLKIO_PROP_unaccounted_time: 1129 + return blkio_read_blkg_stats(blkcg, cft, cb, 1130 + BLKIO_STAT_UNACCOUNTED_TIME, 0); 1133 1131 case BLKIO_PROP_dequeue: 1134 1132 return blkio_read_blkg_stats(blkcg, cft, cb, 1135 1133 BLKIO_STAT_DEQUEUE, 0); ··· 1388 1380 .name = "dequeue", 1389 1381 .private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP, 1390 1382 BLKIO_PROP_dequeue), 1383 + .read_map = blkiocg_file_read_map, 1384 + }, 1385 + { 1386 + .name = "unaccounted_time", 1387 + .private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP, 1388 + BLKIO_PROP_unaccounted_time), 1391 1389 .read_map = blkiocg_file_read_map, 1392 1390 }, 1393 1391 #endif
+11 -3
block/blk-cgroup.h
··· 49 49 /* All the single valued stats go below this */ 50 50 BLKIO_STAT_TIME, 51 51 BLKIO_STAT_SECTORS, 52 + /* Time not charged to this cgroup */ 53 + BLKIO_STAT_UNACCOUNTED_TIME, 52 54 #ifdef CONFIG_DEBUG_BLK_CGROUP 53 55 BLKIO_STAT_AVG_QUEUE_SIZE, 54 56 BLKIO_STAT_IDLE_TIME, ··· 83 81 BLKIO_PROP_io_serviced, 84 82 BLKIO_PROP_time, 85 83 BLKIO_PROP_sectors, 84 + BLKIO_PROP_unaccounted_time, 86 85 BLKIO_PROP_io_service_time, 87 86 BLKIO_PROP_io_wait_time, 88 87 BLKIO_PROP_io_merged, ··· 117 114 /* total disk time and nr sectors dispatched by this group */ 118 115 uint64_t time; 119 116 uint64_t sectors; 117 + /* Time not charged to this cgroup */ 118 + uint64_t unaccounted_time; 120 119 uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL]; 121 120 #ifdef CONFIG_DEBUG_BLK_CGROUP 122 121 /* Sum of number of IOs queued across all samples */ ··· 245 240 246 241 #endif 247 242 248 - #define BLKIO_WEIGHT_MIN 100 243 + #define BLKIO_WEIGHT_MIN 10 249 244 #define BLKIO_WEIGHT_MAX 1000 250 245 #define BLKIO_WEIGHT_DEFAULT 500 251 246 ··· 298 293 extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg, 299 294 void *key); 300 295 void blkiocg_update_timeslice_used(struct blkio_group *blkg, 301 - unsigned long time); 296 + unsigned long time, 297 + unsigned long unaccounted_time); 302 298 void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes, 303 299 bool direction, bool sync); 304 300 void blkiocg_update_completion_stats(struct blkio_group *blkg, ··· 325 319 static inline struct blkio_group * 326 320 blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; } 327 321 static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg, 328 - unsigned long time) {} 322 + unsigned long time, 323 + unsigned long unaccounted_time) 324 + {} 329 325 static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg, 330 326 uint64_t bytes, bool direction, bool sync) {} 331 327 static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
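
blkiocg_update_timeslice_used() now takes the unaccounted portion of a slice as a separate argument, and the group keeps it in a new unaccounted_time counter (exported as blkio.unaccounted_time under CONFIG_DEBUG_BLK_CGROUP). A stand-alone sketch of the split, using local stand-in types rather than the kernel's, is below.

#include <stdio.h>

/* Local stand-ins for the per-group counters touched above. */
struct group_stats {
	unsigned long long time;		/* slice time charged to the group */
	unsigned long long unaccounted_time;	/* slice time not charged (debug stat) */
};

static void update_timeslice_used(struct group_stats *stats,
				  unsigned long time,
				  unsigned long unaccounted_time)
{
	stats->time += time;
	stats->unaccounted_time += unaccounted_time;
}

int main(void)
{
	struct group_stats stats = { 0, 0 };

	update_timeslice_used(&stats, 8, 2);
	printf("time=%llu unaccounted=%llu\n",
	       stats.time, stats.unaccounted_time);
	return 0;
}
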
+380 -272
block/blk-core.c
··· 27 27 #include <linux/writeback.h> 28 28 #include <linux/task_io_accounting_ops.h> 29 29 #include <linux/fault-inject.h> 30 + #include <linux/list_sort.h> 30 31 31 32 #define CREATE_TRACE_POINTS 32 33 #include <trace/events/block.h> ··· 150 149 static void req_bio_endio(struct request *rq, struct bio *bio, 151 150 unsigned int nbytes, int error) 152 151 { 153 - struct request_queue *q = rq->q; 152 + if (error) 153 + clear_bit(BIO_UPTODATE, &bio->bi_flags); 154 + else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) 155 + error = -EIO; 154 156 155 - if (&q->flush_rq != rq) { 156 - if (error) 157 - clear_bit(BIO_UPTODATE, &bio->bi_flags); 158 - else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) 159 - error = -EIO; 160 - 161 - if (unlikely(nbytes > bio->bi_size)) { 162 - printk(KERN_ERR "%s: want %u bytes done, %u left\n", 163 - __func__, nbytes, bio->bi_size); 164 - nbytes = bio->bi_size; 165 - } 166 - 167 - if (unlikely(rq->cmd_flags & REQ_QUIET)) 168 - set_bit(BIO_QUIET, &bio->bi_flags); 169 - 170 - bio->bi_size -= nbytes; 171 - bio->bi_sector += (nbytes >> 9); 172 - 173 - if (bio_integrity(bio)) 174 - bio_integrity_advance(bio, nbytes); 175 - 176 - if (bio->bi_size == 0) 177 - bio_endio(bio, error); 178 - } else { 179 - /* 180 - * Okay, this is the sequenced flush request in 181 - * progress, just record the error; 182 - */ 183 - if (error && !q->flush_err) 184 - q->flush_err = error; 157 + if (unlikely(nbytes > bio->bi_size)) { 158 + printk(KERN_ERR "%s: want %u bytes done, %u left\n", 159 + __func__, nbytes, bio->bi_size); 160 + nbytes = bio->bi_size; 185 161 } 162 + 163 + if (unlikely(rq->cmd_flags & REQ_QUIET)) 164 + set_bit(BIO_QUIET, &bio->bi_flags); 165 + 166 + bio->bi_size -= nbytes; 167 + bio->bi_sector += (nbytes >> 9); 168 + 169 + if (bio_integrity(bio)) 170 + bio_integrity_advance(bio, nbytes); 171 + 172 + /* don't actually finish bio if it's part of flush sequence */ 173 + if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ)) 174 + bio_endio(bio, error); 186 175 } 187 176 188 177 void blk_dump_rq_flags(struct request *rq, char *msg) ··· 199 208 EXPORT_SYMBOL(blk_dump_rq_flags); 200 209 201 210 /* 202 - * "plug" the device if there are no outstanding requests: this will 203 - * force the transfer to start only after we have put all the requests 204 - * on the list. 205 - * 206 - * This is called with interrupts off and no requests on the queue and 207 - * with the queue lock held. 208 - */ 209 - void blk_plug_device(struct request_queue *q) 211 + * Make sure that plugs that were pending when this function was entered, 212 + * are now complete and requests pushed to the queue. 213 + */ 214 + static inline void queue_sync_plugs(struct request_queue *q) 210 215 { 211 - WARN_ON(!irqs_disabled()); 212 - 213 216 /* 214 - * don't plug a stopped queue, it must be paired with blk_start_queue() 215 - * which will restart the queueing 217 + * If the current process is plugged and has barriers submitted, 218 + * we will livelock if we don't unplug first. 
216 219 */ 217 - if (blk_queue_stopped(q)) 218 - return; 219 - 220 - if (!queue_flag_test_and_set(QUEUE_FLAG_PLUGGED, q)) { 221 - mod_timer(&q->unplug_timer, jiffies + q->unplug_delay); 222 - trace_block_plug(q); 223 - } 220 + blk_flush_plug(current); 224 221 } 225 - EXPORT_SYMBOL(blk_plug_device); 226 222 227 - /** 228 - * blk_plug_device_unlocked - plug a device without queue lock held 229 - * @q: The &struct request_queue to plug 230 - * 231 - * Description: 232 - * Like @blk_plug_device(), but grabs the queue lock and disables 233 - * interrupts. 234 - **/ 235 - void blk_plug_device_unlocked(struct request_queue *q) 223 + static void blk_delay_work(struct work_struct *work) 236 224 { 237 - unsigned long flags; 225 + struct request_queue *q; 238 226 239 - spin_lock_irqsave(q->queue_lock, flags); 240 - blk_plug_device(q); 241 - spin_unlock_irqrestore(q->queue_lock, flags); 242 - } 243 - EXPORT_SYMBOL(blk_plug_device_unlocked); 244 - 245 - /* 246 - * remove the queue from the plugged list, if present. called with 247 - * queue lock held and interrupts disabled. 248 - */ 249 - int blk_remove_plug(struct request_queue *q) 250 - { 251 - WARN_ON(!irqs_disabled()); 252 - 253 - if (!queue_flag_test_and_clear(QUEUE_FLAG_PLUGGED, q)) 254 - return 0; 255 - 256 - del_timer(&q->unplug_timer); 257 - return 1; 258 - } 259 - EXPORT_SYMBOL(blk_remove_plug); 260 - 261 - /* 262 - * remove the plug and let it rip.. 263 - */ 264 - void __generic_unplug_device(struct request_queue *q) 265 - { 266 - if (unlikely(blk_queue_stopped(q))) 267 - return; 268 - if (!blk_remove_plug(q) && !blk_queue_nonrot(q)) 269 - return; 270 - 271 - q->request_fn(q); 227 + q = container_of(work, struct request_queue, delay_work.work); 228 + spin_lock_irq(q->queue_lock); 229 + __blk_run_queue(q, false); 230 + spin_unlock_irq(q->queue_lock); 272 231 } 273 232 274 233 /** 275 - * generic_unplug_device - fire a request queue 276 - * @q: The &struct request_queue in question 234 + * blk_delay_queue - restart queueing after defined interval 235 + * @q: The &struct request_queue in question 236 + * @msecs: Delay in msecs 277 237 * 278 238 * Description: 279 - * Linux uses plugging to build bigger requests queues before letting 280 - * the device have at them. If a queue is plugged, the I/O scheduler 281 - * is still adding and merging requests on the queue. Once the queue 282 - * gets unplugged, the request_fn defined for the queue is invoked and 283 - * transfers started. 284 - **/ 285 - void generic_unplug_device(struct request_queue *q) 239 + * Sometimes queueing needs to be postponed for a little while, to allow 240 + * resources to come back. This function will make sure that queueing is 241 + * restarted around the specified time. 
242 + */ 243 + void blk_delay_queue(struct request_queue *q, unsigned long msecs) 286 244 { 287 - if (blk_queue_plugged(q)) { 288 - spin_lock_irq(q->queue_lock); 289 - __generic_unplug_device(q); 290 - spin_unlock_irq(q->queue_lock); 291 - } 245 + schedule_delayed_work(&q->delay_work, msecs_to_jiffies(msecs)); 292 246 } 293 - EXPORT_SYMBOL(generic_unplug_device); 294 - 295 - static void blk_backing_dev_unplug(struct backing_dev_info *bdi, 296 - struct page *page) 297 - { 298 - struct request_queue *q = bdi->unplug_io_data; 299 - 300 - blk_unplug(q); 301 - } 302 - 303 - void blk_unplug_work(struct work_struct *work) 304 - { 305 - struct request_queue *q = 306 - container_of(work, struct request_queue, unplug_work); 307 - 308 - trace_block_unplug_io(q); 309 - q->unplug_fn(q); 310 - } 311 - 312 - void blk_unplug_timeout(unsigned long data) 313 - { 314 - struct request_queue *q = (struct request_queue *)data; 315 - 316 - trace_block_unplug_timer(q); 317 - kblockd_schedule_work(q, &q->unplug_work); 318 - } 319 - 320 - void blk_unplug(struct request_queue *q) 321 - { 322 - /* 323 - * devices don't necessarily have an ->unplug_fn defined 324 - */ 325 - if (q->unplug_fn) { 326 - trace_block_unplug_io(q); 327 - q->unplug_fn(q); 328 - } 329 - } 330 - EXPORT_SYMBOL(blk_unplug); 247 + EXPORT_SYMBOL(blk_delay_queue); 331 248 332 249 /** 333 250 * blk_start_queue - restart a previously stopped queue ··· 271 372 **/ 272 373 void blk_stop_queue(struct request_queue *q) 273 374 { 274 - blk_remove_plug(q); 375 + cancel_delayed_work(&q->delay_work); 275 376 queue_flag_set(QUEUE_FLAG_STOPPED, q); 276 377 } 277 378 EXPORT_SYMBOL(blk_stop_queue); ··· 289 390 * that its ->make_request_fn will not re-add plugging prior to calling 290 391 * this function. 291 392 * 393 + * This function does not cancel any asynchronous activity arising 394 + * out of elevator or throttling code. That would require elevaotor_exit() 395 + * and blk_throtl_exit() to be called with queue lock initialized. 396 + * 292 397 */ 293 398 void blk_sync_queue(struct request_queue *q) 294 399 { 295 - del_timer_sync(&q->unplug_timer); 296 400 del_timer_sync(&q->timeout); 297 - cancel_work_sync(&q->unplug_work); 298 - throtl_shutdown_timer_wq(q); 401 + cancel_delayed_work_sync(&q->delay_work); 402 + queue_sync_plugs(q); 299 403 } 300 404 EXPORT_SYMBOL(blk_sync_queue); 301 405 ··· 314 412 */ 315 413 void __blk_run_queue(struct request_queue *q, bool force_kblockd) 316 414 { 317 - blk_remove_plug(q); 318 - 319 415 if (unlikely(blk_queue_stopped(q))) 320 - return; 321 - 322 - if (elv_queue_empty(q)) 323 416 return; 324 417 325 418 /* ··· 324 427 if (!force_kblockd && !queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) { 325 428 q->request_fn(q); 326 429 queue_flag_clear(QUEUE_FLAG_REENTER, q); 327 - } else { 328 - queue_flag_set(QUEUE_FLAG_PLUGGED, q); 329 - kblockd_schedule_work(q, &q->unplug_work); 330 - } 430 + } else 431 + queue_delayed_work(kblockd_workqueue, &q->delay_work, 0); 331 432 } 332 433 EXPORT_SYMBOL(__blk_run_queue); 333 434 ··· 352 457 kobject_put(&q->kobj); 353 458 } 354 459 460 + /* 461 + * Note: If a driver supplied the queue lock, it should not zap that lock 462 + * unexpectedly as some queue cleanup components like elevator_exit() and 463 + * blk_throtl_exit() need queue lock. 
464 + */ 355 465 void blk_cleanup_queue(struct request_queue *q) 356 466 { 357 467 /* ··· 374 474 375 475 if (q->elevator) 376 476 elevator_exit(q->elevator); 477 + 478 + blk_throtl_exit(q); 377 479 378 480 blk_put_queue(q); 379 481 } ··· 419 517 if (!q) 420 518 return NULL; 421 519 422 - q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug; 423 - q->backing_dev_info.unplug_io_data = q; 424 520 q->backing_dev_info.ra_pages = 425 521 (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; 426 522 q->backing_dev_info.state = 0; ··· 438 538 439 539 setup_timer(&q->backing_dev_info.laptop_mode_wb_timer, 440 540 laptop_mode_timer_fn, (unsigned long) q); 441 - init_timer(&q->unplug_timer); 442 541 setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q); 443 542 INIT_LIST_HEAD(&q->timeout_list); 444 - INIT_LIST_HEAD(&q->pending_flushes); 445 - INIT_WORK(&q->unplug_work, blk_unplug_work); 543 + INIT_LIST_HEAD(&q->flush_queue[0]); 544 + INIT_LIST_HEAD(&q->flush_queue[1]); 545 + INIT_LIST_HEAD(&q->flush_data_in_flight); 546 + INIT_DELAYED_WORK(&q->delay_work, blk_delay_work); 446 547 447 548 kobject_init(&q->kobj, &blk_queue_ktype); 448 549 449 550 mutex_init(&q->sysfs_lock); 450 551 spin_lock_init(&q->__queue_lock); 552 + 553 + /* 554 + * By default initialize queue_lock to internal lock and driver can 555 + * override it later if need be. 556 + */ 557 + q->queue_lock = &q->__queue_lock; 451 558 452 559 return q; 453 560 } ··· 538 631 q->request_fn = rfn; 539 632 q->prep_rq_fn = NULL; 540 633 q->unprep_rq_fn = NULL; 541 - q->unplug_fn = generic_unplug_device; 542 634 q->queue_flags = QUEUE_FLAG_DEFAULT; 543 - q->queue_lock = lock; 635 + 636 + /* Override internal queue lock with supplied lock pointer */ 637 + if (lock) 638 + q->queue_lock = lock; 544 639 545 640 /* 546 641 * This also sets hw/phys segments, boundary and size ··· 575 666 576 667 static inline void blk_free_request(struct request_queue *q, struct request *rq) 577 668 { 669 + BUG_ON(rq->cmd_flags & REQ_ON_PLUG); 670 + 578 671 if (rq->cmd_flags & REQ_ELVPRIV) 579 672 elv_put_request(q, rq); 580 673 mempool_free(rq, q->rq.rq_pool); ··· 673 762 } 674 763 675 764 /* 765 + * Determine if elevator data should be initialized when allocating the 766 + * request associated with @bio. 767 + */ 768 + static bool blk_rq_should_init_elevator(struct bio *bio) 769 + { 770 + if (!bio) 771 + return true; 772 + 773 + /* 774 + * Flush requests do not use the elevator so skip initialization. 775 + * This allows a request to share the flush and elevator data. 776 + */ 777 + if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) 778 + return false; 779 + 780 + return true; 781 + } 782 + 783 + /* 676 784 * Get a free request, queue_lock must be held. 677 785 * Returns NULL on failure, with queue_lock held. 678 786 * Returns !NULL on success, with queue_lock *not held*. 
··· 703 773 struct request_list *rl = &q->rq; 704 774 struct io_context *ioc = NULL; 705 775 const bool is_sync = rw_is_sync(rw_flags) != 0; 706 - int may_queue, priv; 776 + int may_queue, priv = 0; 707 777 708 778 may_queue = elv_may_queue(q, rw_flags); 709 779 if (may_queue == ELV_MQUEUE_NO) ··· 747 817 rl->count[is_sync]++; 748 818 rl->starved[is_sync] = 0; 749 819 750 - priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags); 751 - if (priv) 752 - rl->elvpriv++; 820 + if (blk_rq_should_init_elevator(bio)) { 821 + priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags); 822 + if (priv) 823 + rl->elvpriv++; 824 + } 753 825 754 826 if (blk_queue_io_stat(q)) 755 827 rw_flags |= REQ_IO_STAT; ··· 798 866 } 799 867 800 868 /* 801 - * No available requests for this queue, unplug the device and wait for some 802 - * requests to become available. 869 + * No available requests for this queue, wait for some requests to become 870 + * available. 803 871 * 804 872 * Called with q->queue_lock held, and returns with it unlocked. 805 873 */ ··· 820 888 821 889 trace_block_sleeprq(q, bio, rw_flags & 1); 822 890 823 - __generic_unplug_device(q); 824 891 spin_unlock_irq(q->queue_lock); 825 892 io_schedule(); 826 893 ··· 941 1010 } 942 1011 EXPORT_SYMBOL(blk_requeue_request); 943 1012 1013 + static void add_acct_request(struct request_queue *q, struct request *rq, 1014 + int where) 1015 + { 1016 + drive_stat_acct(rq, 1); 1017 + __elv_add_request(q, rq, where); 1018 + } 1019 + 944 1020 /** 945 1021 * blk_insert_request - insert a special request into a request queue 946 1022 * @q: request queue where request should be inserted ··· 990 1052 if (blk_rq_tagged(rq)) 991 1053 blk_queue_end_tag(q, rq); 992 1054 993 - drive_stat_acct(rq, 1); 994 - __elv_add_request(q, rq, where, 0); 1055 + add_acct_request(q, rq, where); 995 1056 __blk_run_queue(q, false); 996 1057 spin_unlock_irqrestore(q->queue_lock, flags); 997 1058 } ··· 1111 1174 } 1112 1175 EXPORT_SYMBOL_GPL(blk_add_request_payload); 1113 1176 1177 + static bool bio_attempt_back_merge(struct request_queue *q, struct request *req, 1178 + struct bio *bio) 1179 + { 1180 + const int ff = bio->bi_rw & REQ_FAILFAST_MASK; 1181 + 1182 + /* 1183 + * Debug stuff, kill later 1184 + */ 1185 + if (!rq_mergeable(req)) { 1186 + blk_dump_rq_flags(req, "back"); 1187 + return false; 1188 + } 1189 + 1190 + if (!ll_back_merge_fn(q, req, bio)) 1191 + return false; 1192 + 1193 + trace_block_bio_backmerge(q, bio); 1194 + 1195 + if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) 1196 + blk_rq_set_mixed_merge(req); 1197 + 1198 + req->biotail->bi_next = bio; 1199 + req->biotail = bio; 1200 + req->__data_len += bio->bi_size; 1201 + req->ioprio = ioprio_best(req->ioprio, bio_prio(bio)); 1202 + 1203 + drive_stat_acct(req, 0); 1204 + return true; 1205 + } 1206 + 1207 + static bool bio_attempt_front_merge(struct request_queue *q, 1208 + struct request *req, struct bio *bio) 1209 + { 1210 + const int ff = bio->bi_rw & REQ_FAILFAST_MASK; 1211 + sector_t sector; 1212 + 1213 + /* 1214 + * Debug stuff, kill later 1215 + */ 1216 + if (!rq_mergeable(req)) { 1217 + blk_dump_rq_flags(req, "front"); 1218 + return false; 1219 + } 1220 + 1221 + if (!ll_front_merge_fn(q, req, bio)) 1222 + return false; 1223 + 1224 + trace_block_bio_frontmerge(q, bio); 1225 + 1226 + if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) 1227 + blk_rq_set_mixed_merge(req); 1228 + 1229 + sector = bio->bi_sector; 1230 + 1231 + bio->bi_next = req->bio; 1232 + req->bio = bio; 1233 + 1234 + /* 1235 + * may not be valid. 
if the low level driver said 1236 + * it didn't need a bounce buffer then it better 1237 + * not touch req->buffer either... 1238 + */ 1239 + req->buffer = bio_data(bio); 1240 + req->__sector = bio->bi_sector; 1241 + req->__data_len += bio->bi_size; 1242 + req->ioprio = ioprio_best(req->ioprio, bio_prio(bio)); 1243 + 1244 + drive_stat_acct(req, 0); 1245 + return true; 1246 + } 1247 + 1248 + /* 1249 + * Attempts to merge with the plugged list in the current process. Returns 1250 + * true if merge was succesful, otherwise false. 1251 + */ 1252 + static bool attempt_plug_merge(struct task_struct *tsk, struct request_queue *q, 1253 + struct bio *bio) 1254 + { 1255 + struct blk_plug *plug; 1256 + struct request *rq; 1257 + bool ret = false; 1258 + 1259 + plug = tsk->plug; 1260 + if (!plug) 1261 + goto out; 1262 + 1263 + list_for_each_entry_reverse(rq, &plug->list, queuelist) { 1264 + int el_ret; 1265 + 1266 + if (rq->q != q) 1267 + continue; 1268 + 1269 + el_ret = elv_try_merge(rq, bio); 1270 + if (el_ret == ELEVATOR_BACK_MERGE) { 1271 + ret = bio_attempt_back_merge(q, rq, bio); 1272 + if (ret) 1273 + break; 1274 + } else if (el_ret == ELEVATOR_FRONT_MERGE) { 1275 + ret = bio_attempt_front_merge(q, rq, bio); 1276 + if (ret) 1277 + break; 1278 + } 1279 + } 1280 + out: 1281 + return ret; 1282 + } 1283 + 1114 1284 void init_request_from_bio(struct request *req, struct bio *bio) 1115 1285 { 1116 1286 req->cpu = bio->bi_comp_cpu; ··· 1233 1189 blk_rq_bio_prep(req->q, req, bio); 1234 1190 } 1235 1191 1236 - /* 1237 - * Only disabling plugging for non-rotational devices if it does tagging 1238 - * as well, otherwise we do need the proper merging 1239 - */ 1240 - static inline bool queue_should_plug(struct request_queue *q) 1241 - { 1242 - return !(blk_queue_nonrot(q) && blk_queue_tagged(q)); 1243 - } 1244 - 1245 1192 static int __make_request(struct request_queue *q, struct bio *bio) 1246 1193 { 1247 - struct request *req; 1248 - int el_ret; 1249 - unsigned int bytes = bio->bi_size; 1250 - const unsigned short prio = bio_prio(bio); 1251 1194 const bool sync = !!(bio->bi_rw & REQ_SYNC); 1252 - const bool unplug = !!(bio->bi_rw & REQ_UNPLUG); 1253 - const unsigned long ff = bio->bi_rw & REQ_FAILFAST_MASK; 1254 - int where = ELEVATOR_INSERT_SORT; 1255 - int rw_flags; 1195 + struct blk_plug *plug; 1196 + int el_ret, rw_flags, where = ELEVATOR_INSERT_SORT; 1197 + struct request *req; 1256 1198 1257 1199 /* 1258 1200 * low level driver can indicate that it wants pages above a ··· 1247 1217 */ 1248 1218 blk_queue_bounce(q, &bio); 1249 1219 1250 - spin_lock_irq(q->queue_lock); 1251 - 1252 1220 if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) { 1253 - where = ELEVATOR_INSERT_FRONT; 1221 + spin_lock_irq(q->queue_lock); 1222 + where = ELEVATOR_INSERT_FLUSH; 1254 1223 goto get_rq; 1255 1224 } 1256 1225 1257 - if (elv_queue_empty(q)) 1258 - goto get_rq; 1226 + /* 1227 + * Check if we can merge with the plugged list before grabbing 1228 + * any locks. 
1229 + */ 1230 + if (attempt_plug_merge(current, q, bio)) 1231 + goto out; 1232 + 1233 + spin_lock_irq(q->queue_lock); 1259 1234 1260 1235 el_ret = elv_merge(q, &req, bio); 1261 - switch (el_ret) { 1262 - case ELEVATOR_BACK_MERGE: 1263 - BUG_ON(!rq_mergeable(req)); 1264 - 1265 - if (!ll_back_merge_fn(q, req, bio)) 1266 - break; 1267 - 1268 - trace_block_bio_backmerge(q, bio); 1269 - 1270 - if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) 1271 - blk_rq_set_mixed_merge(req); 1272 - 1273 - req->biotail->bi_next = bio; 1274 - req->biotail = bio; 1275 - req->__data_len += bytes; 1276 - req->ioprio = ioprio_best(req->ioprio, prio); 1277 - if (!blk_rq_cpu_valid(req)) 1278 - req->cpu = bio->bi_comp_cpu; 1279 - drive_stat_acct(req, 0); 1280 - elv_bio_merged(q, req, bio); 1281 - if (!attempt_back_merge(q, req)) 1282 - elv_merged_request(q, req, el_ret); 1283 - goto out; 1284 - 1285 - case ELEVATOR_FRONT_MERGE: 1286 - BUG_ON(!rq_mergeable(req)); 1287 - 1288 - if (!ll_front_merge_fn(q, req, bio)) 1289 - break; 1290 - 1291 - trace_block_bio_frontmerge(q, bio); 1292 - 1293 - if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) { 1294 - blk_rq_set_mixed_merge(req); 1295 - req->cmd_flags &= ~REQ_FAILFAST_MASK; 1296 - req->cmd_flags |= ff; 1236 + if (el_ret == ELEVATOR_BACK_MERGE) { 1237 + BUG_ON(req->cmd_flags & REQ_ON_PLUG); 1238 + if (bio_attempt_back_merge(q, req, bio)) { 1239 + if (!attempt_back_merge(q, req)) 1240 + elv_merged_request(q, req, el_ret); 1241 + goto out_unlock; 1297 1242 } 1298 - 1299 - bio->bi_next = req->bio; 1300 - req->bio = bio; 1301 - 1302 - /* 1303 - * may not be valid. if the low level driver said 1304 - * it didn't need a bounce buffer then it better 1305 - * not touch req->buffer either... 1306 - */ 1307 - req->buffer = bio_data(bio); 1308 - req->__sector = bio->bi_sector; 1309 - req->__data_len += bytes; 1310 - req->ioprio = ioprio_best(req->ioprio, prio); 1311 - if (!blk_rq_cpu_valid(req)) 1312 - req->cpu = bio->bi_comp_cpu; 1313 - drive_stat_acct(req, 0); 1314 - elv_bio_merged(q, req, bio); 1315 - if (!attempt_front_merge(q, req)) 1316 - elv_merged_request(q, req, el_ret); 1317 - goto out; 1318 - 1319 - /* ELV_NO_MERGE: elevator says don't/can't merge. 
*/ 1320 - default: 1321 - ; 1243 + } else if (el_ret == ELEVATOR_FRONT_MERGE) { 1244 + BUG_ON(req->cmd_flags & REQ_ON_PLUG); 1245 + if (bio_attempt_front_merge(q, req, bio)) { 1246 + if (!attempt_front_merge(q, req)) 1247 + elv_merged_request(q, req, el_ret); 1248 + goto out_unlock; 1249 + } 1322 1250 } 1323 1251 1324 1252 get_rq: ··· 1303 1315 */ 1304 1316 init_request_from_bio(req, bio); 1305 1317 1306 - spin_lock_irq(q->queue_lock); 1307 1318 if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags) || 1308 - bio_flagged(bio, BIO_CPU_AFFINE)) 1309 - req->cpu = blk_cpu_to_group(smp_processor_id()); 1310 - if (queue_should_plug(q) && elv_queue_empty(q)) 1311 - blk_plug_device(q); 1319 + bio_flagged(bio, BIO_CPU_AFFINE)) { 1320 + req->cpu = blk_cpu_to_group(get_cpu()); 1321 + put_cpu(); 1322 + } 1312 1323 1313 - /* insert the request into the elevator */ 1314 - drive_stat_acct(req, 1); 1315 - __elv_add_request(q, req, where, 0); 1324 + plug = current->plug; 1325 + if (plug) { 1326 + if (!plug->should_sort && !list_empty(&plug->list)) { 1327 + struct request *__rq; 1328 + 1329 + __rq = list_entry_rq(plug->list.prev); 1330 + if (__rq->q != q) 1331 + plug->should_sort = 1; 1332 + } 1333 + /* 1334 + * Debug flag, kill later 1335 + */ 1336 + req->cmd_flags |= REQ_ON_PLUG; 1337 + list_add_tail(&req->queuelist, &plug->list); 1338 + drive_stat_acct(req, 1); 1339 + } else { 1340 + spin_lock_irq(q->queue_lock); 1341 + add_acct_request(q, req, where); 1342 + __blk_run_queue(q, false); 1343 + out_unlock: 1344 + spin_unlock_irq(q->queue_lock); 1345 + } 1316 1346 out: 1317 - if (unplug || !queue_should_plug(q)) 1318 - __generic_unplug_device(q); 1319 - spin_unlock_irq(q->queue_lock); 1320 1347 return 0; 1321 1348 } 1322 1349 ··· 1734 1731 */ 1735 1732 BUG_ON(blk_queued_rq(rq)); 1736 1733 1737 - drive_stat_acct(rq, 1); 1738 - __elv_add_request(q, rq, ELEVATOR_INSERT_BACK, 0); 1739 - 1734 + add_acct_request(q, rq, ELEVATOR_INSERT_BACK); 1740 1735 spin_unlock_irqrestore(q->queue_lock, flags); 1741 1736 1742 1737 return 0; ··· 1806 1805 * normal IO on queueing nor completion. Accounting the 1807 1806 * containing request is enough. 1808 1807 */ 1809 - if (blk_do_io_stat(req) && req != &req->q->flush_rq) { 1808 + if (blk_do_io_stat(req) && !(req->cmd_flags & REQ_FLUSH_SEQ)) { 1810 1809 unsigned long duration = jiffies - req->start_time; 1811 1810 const int rw = rq_data_dir(req); 1812 1811 struct hd_struct *part; ··· 2628 2627 return queue_work(kblockd_workqueue, work); 2629 2628 } 2630 2629 EXPORT_SYMBOL(kblockd_schedule_work); 2630 + 2631 + int kblockd_schedule_delayed_work(struct request_queue *q, 2632 + struct delayed_work *dwork, unsigned long delay) 2633 + { 2634 + return queue_delayed_work(kblockd_workqueue, dwork, delay); 2635 + } 2636 + EXPORT_SYMBOL(kblockd_schedule_delayed_work); 2637 + 2638 + #define PLUG_MAGIC 0x91827364 2639 + 2640 + void blk_start_plug(struct blk_plug *plug) 2641 + { 2642 + struct task_struct *tsk = current; 2643 + 2644 + plug->magic = PLUG_MAGIC; 2645 + INIT_LIST_HEAD(&plug->list); 2646 + plug->should_sort = 0; 2647 + 2648 + /* 2649 + * If this is a nested plug, don't actually assign it. It will be 2650 + * flushed on its own. 
2651 + */ 2652 + if (!tsk->plug) { 2653 + /* 2654 + * Store ordering should not be needed here, since a potential 2655 + * preempt will imply a full memory barrier 2656 + */ 2657 + tsk->plug = plug; 2658 + } 2659 + } 2660 + EXPORT_SYMBOL(blk_start_plug); 2661 + 2662 + static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b) 2663 + { 2664 + struct request *rqa = container_of(a, struct request, queuelist); 2665 + struct request *rqb = container_of(b, struct request, queuelist); 2666 + 2667 + return !(rqa->q == rqb->q); 2668 + } 2669 + 2670 + static void flush_plug_list(struct blk_plug *plug) 2671 + { 2672 + struct request_queue *q; 2673 + unsigned long flags; 2674 + struct request *rq; 2675 + 2676 + BUG_ON(plug->magic != PLUG_MAGIC); 2677 + 2678 + if (list_empty(&plug->list)) 2679 + return; 2680 + 2681 + if (plug->should_sort) 2682 + list_sort(NULL, &plug->list, plug_rq_cmp); 2683 + 2684 + q = NULL; 2685 + local_irq_save(flags); 2686 + while (!list_empty(&plug->list)) { 2687 + rq = list_entry_rq(plug->list.next); 2688 + list_del_init(&rq->queuelist); 2689 + BUG_ON(!(rq->cmd_flags & REQ_ON_PLUG)); 2690 + BUG_ON(!rq->q); 2691 + if (rq->q != q) { 2692 + if (q) { 2693 + __blk_run_queue(q, false); 2694 + spin_unlock(q->queue_lock); 2695 + } 2696 + q = rq->q; 2697 + spin_lock(q->queue_lock); 2698 + } 2699 + rq->cmd_flags &= ~REQ_ON_PLUG; 2700 + 2701 + /* 2702 + * rq is already accounted, so use raw insert 2703 + */ 2704 + __elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE); 2705 + } 2706 + 2707 + if (q) { 2708 + __blk_run_queue(q, false); 2709 + spin_unlock(q->queue_lock); 2710 + } 2711 + 2712 + BUG_ON(!list_empty(&plug->list)); 2713 + local_irq_restore(flags); 2714 + } 2715 + 2716 + static void __blk_finish_plug(struct task_struct *tsk, struct blk_plug *plug) 2717 + { 2718 + flush_plug_list(plug); 2719 + 2720 + if (plug == tsk->plug) 2721 + tsk->plug = NULL; 2722 + } 2723 + 2724 + void blk_finish_plug(struct blk_plug *plug) 2725 + { 2726 + if (plug) 2727 + __blk_finish_plug(current, plug); 2728 + } 2729 + EXPORT_SYMBOL(blk_finish_plug); 2730 + 2731 + void __blk_flush_plug(struct task_struct *tsk, struct blk_plug *plug) 2732 + { 2733 + __blk_finish_plug(tsk, plug); 2734 + tsk->plug = plug; 2735 + } 2736 + EXPORT_SYMBOL(__blk_flush_plug); 2631 2737 2632 2738 int __init blk_dev_init(void) 2633 2739 {
+2 -2
block/blk-exec.c
··· 54 54 rq->end_io = done; 55 55 WARN_ON(irqs_disabled()); 56 56 spin_lock_irq(q->queue_lock); 57 - __elv_add_request(q, rq, where, 1); 58 - __generic_unplug_device(q); 57 + __elv_add_request(q, rq, where); 58 + __blk_run_queue(q, false); 59 59 /* the queue is stopped so it won't be plugged+unplugged */ 60 60 if (rq->cmd_type == REQ_TYPE_PM_RESUME) 61 61 q->request_fn(q);
+319 -146
block/blk-flush.c
··· 1 1 /* 2 2 * Functions to sequence FLUSH and FUA writes. 3 + * 4 + * Copyright (C) 2011 Max Planck Institute for Gravitational Physics 5 + * Copyright (C) 2011 Tejun Heo <tj@kernel.org> 6 + * 7 + * This file is released under the GPLv2. 8 + * 9 + * REQ_{FLUSH|FUA} requests are decomposed to sequences consisted of three 10 + * optional steps - PREFLUSH, DATA and POSTFLUSH - according to the request 11 + * properties and hardware capability. 12 + * 13 + * If a request doesn't have data, only REQ_FLUSH makes sense, which 14 + * indicates a simple flush request. If there is data, REQ_FLUSH indicates 15 + * that the device cache should be flushed before the data is executed, and 16 + * REQ_FUA means that the data must be on non-volatile media on request 17 + * completion. 18 + * 19 + * If the device doesn't have writeback cache, FLUSH and FUA don't make any 20 + * difference. The requests are either completed immediately if there's no 21 + * data or executed as normal requests otherwise. 22 + * 23 + * If the device has writeback cache and supports FUA, REQ_FLUSH is 24 + * translated to PREFLUSH but REQ_FUA is passed down directly with DATA. 25 + * 26 + * If the device has writeback cache and doesn't support FUA, REQ_FLUSH is 27 + * translated to PREFLUSH and REQ_FUA to POSTFLUSH. 28 + * 29 + * The actual execution of flush is double buffered. Whenever a request 30 + * needs to execute PRE or POSTFLUSH, it queues at 31 + * q->flush_queue[q->flush_pending_idx]. Once certain criteria are met, a 32 + * flush is issued and the pending_idx is toggled. When the flush 33 + * completes, all the requests which were pending are proceeded to the next 34 + * step. This allows arbitrary merging of different types of FLUSH/FUA 35 + * requests. 36 + * 37 + * Currently, the following conditions are used to determine when to issue 38 + * flush. 39 + * 40 + * C1. At any given time, only one flush shall be in progress. This makes 41 + * double buffering sufficient. 42 + * 43 + * C2. Flush is deferred if any request is executing DATA of its sequence. 44 + * This avoids issuing separate POSTFLUSHes for requests which shared 45 + * PREFLUSH. 46 + * 47 + * C3. The second condition is ignored if there is a request which has 48 + * waited longer than FLUSH_PENDING_TIMEOUT. This is to avoid 49 + * starvation in the unlikely case where there are continuous stream of 50 + * FUA (without FLUSH) requests. 51 + * 52 + * For devices which support FUA, it isn't clear whether C2 (and thus C3) 53 + * is beneficial. 54 + * 55 + * Note that a sequenced FLUSH/FUA request with DATA is completed twice. 56 + * Once while executing DATA and again after the whole sequence is 57 + * complete. The first completion updates the contained bio but doesn't 58 + * finish it so that the bio submitter is notified only after the whole 59 + * sequence is complete. This is implemented by testing REQ_FLUSH_SEQ in 60 + * req_bio_endio(). 61 + * 62 + * The above peculiarity requires that each FLUSH/FUA request has only one 63 + * bio attached to it, which is guaranteed as they aren't allowed to be 64 + * merged in the usual way. 
3 65 */ 66 + 4 67 #include <linux/kernel.h> 5 68 #include <linux/module.h> 6 69 #include <linux/bio.h> ··· 74 11 75 12 /* FLUSH/FUA sequences */ 76 13 enum { 77 - QUEUE_FSEQ_STARTED = (1 << 0), /* flushing in progress */ 78 - QUEUE_FSEQ_PREFLUSH = (1 << 1), /* pre-flushing in progress */ 79 - QUEUE_FSEQ_DATA = (1 << 2), /* data write in progress */ 80 - QUEUE_FSEQ_POSTFLUSH = (1 << 3), /* post-flushing in progress */ 81 - QUEUE_FSEQ_DONE = (1 << 4), 14 + REQ_FSEQ_PREFLUSH = (1 << 0), /* pre-flushing in progress */ 15 + REQ_FSEQ_DATA = (1 << 1), /* data write in progress */ 16 + REQ_FSEQ_POSTFLUSH = (1 << 2), /* post-flushing in progress */ 17 + REQ_FSEQ_DONE = (1 << 3), 18 + 19 + REQ_FSEQ_ACTIONS = REQ_FSEQ_PREFLUSH | REQ_FSEQ_DATA | 20 + REQ_FSEQ_POSTFLUSH, 21 + 22 + /* 23 + * If flush has been pending longer than the following timeout, 24 + * it's issued even if flush_data requests are still in flight. 25 + */ 26 + FLUSH_PENDING_TIMEOUT = 5 * HZ, 82 27 }; 83 28 84 - static struct request *queue_next_fseq(struct request_queue *q); 29 + static bool blk_kick_flush(struct request_queue *q); 85 30 86 - unsigned blk_flush_cur_seq(struct request_queue *q) 31 + static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq) 87 32 { 88 - if (!q->flush_seq) 89 - return 0; 90 - return 1 << ffz(q->flush_seq); 91 - } 33 + unsigned int policy = 0; 92 34 93 - static struct request *blk_flush_complete_seq(struct request_queue *q, 94 - unsigned seq, int error) 95 - { 96 - struct request *next_rq = NULL; 97 - 98 - if (error && !q->flush_err) 99 - q->flush_err = error; 100 - 101 - BUG_ON(q->flush_seq & seq); 102 - q->flush_seq |= seq; 103 - 104 - if (blk_flush_cur_seq(q) != QUEUE_FSEQ_DONE) { 105 - /* not complete yet, queue the next flush sequence */ 106 - next_rq = queue_next_fseq(q); 107 - } else { 108 - /* complete this flush request */ 109 - __blk_end_request_all(q->orig_flush_rq, q->flush_err); 110 - q->orig_flush_rq = NULL; 111 - q->flush_seq = 0; 112 - 113 - /* dispatch the next flush if there's one */ 114 - if (!list_empty(&q->pending_flushes)) { 115 - next_rq = list_entry_rq(q->pending_flushes.next); 116 - list_move(&next_rq->queuelist, &q->queue_head); 117 - } 35 + if (fflags & REQ_FLUSH) { 36 + if (rq->cmd_flags & REQ_FLUSH) 37 + policy |= REQ_FSEQ_PREFLUSH; 38 + if (blk_rq_sectors(rq)) 39 + policy |= REQ_FSEQ_DATA; 40 + if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA)) 41 + policy |= REQ_FSEQ_POSTFLUSH; 118 42 } 119 - return next_rq; 43 + return policy; 120 44 } 121 45 122 - static void blk_flush_complete_seq_end_io(struct request_queue *q, 123 - unsigned seq, int error) 46 + static unsigned int blk_flush_cur_seq(struct request *rq) 124 47 { 125 - bool was_empty = elv_queue_empty(q); 126 - struct request *next_rq; 48 + return 1 << ffz(rq->flush.seq); 49 + } 127 50 128 - next_rq = blk_flush_complete_seq(q, seq, error); 51 + static void blk_flush_restore_request(struct request *rq) 52 + { 53 + /* 54 + * After flush data completion, @rq->bio is %NULL but we need to 55 + * complete the bio again. @rq->biotail is guaranteed to equal the 56 + * original @rq->bio. Restore it. 
57 + */ 58 + rq->bio = rq->biotail; 59 + 60 + /* make @rq a normal request */ 61 + rq->cmd_flags &= ~REQ_FLUSH_SEQ; 62 + rq->end_io = NULL; 63 + } 64 + 65 + /** 66 + * blk_flush_complete_seq - complete flush sequence 67 + * @rq: FLUSH/FUA request being sequenced 68 + * @seq: sequences to complete (mask of %REQ_FSEQ_*, can be zero) 69 + * @error: whether an error occurred 70 + * 71 + * @rq just completed @seq part of its flush sequence, record the 72 + * completion and trigger the next step. 73 + * 74 + * CONTEXT: 75 + * spin_lock_irq(q->queue_lock) 76 + * 77 + * RETURNS: 78 + * %true if requests were added to the dispatch queue, %false otherwise. 79 + */ 80 + static bool blk_flush_complete_seq(struct request *rq, unsigned int seq, 81 + int error) 82 + { 83 + struct request_queue *q = rq->q; 84 + struct list_head *pending = &q->flush_queue[q->flush_pending_idx]; 85 + bool queued = false; 86 + 87 + BUG_ON(rq->flush.seq & seq); 88 + rq->flush.seq |= seq; 89 + 90 + if (likely(!error)) 91 + seq = blk_flush_cur_seq(rq); 92 + else 93 + seq = REQ_FSEQ_DONE; 94 + 95 + switch (seq) { 96 + case REQ_FSEQ_PREFLUSH: 97 + case REQ_FSEQ_POSTFLUSH: 98 + /* queue for flush */ 99 + if (list_empty(pending)) 100 + q->flush_pending_since = jiffies; 101 + list_move_tail(&rq->flush.list, pending); 102 + break; 103 + 104 + case REQ_FSEQ_DATA: 105 + list_move_tail(&rq->flush.list, &q->flush_data_in_flight); 106 + list_add(&rq->queuelist, &q->queue_head); 107 + queued = true; 108 + break; 109 + 110 + case REQ_FSEQ_DONE: 111 + /* 112 + * @rq was previously adjusted by blk_flush_issue() for 113 + * flush sequencing and may already have gone through the 114 + * flush data request completion path. Restore @rq for 115 + * normal completion and end it. 116 + */ 117 + BUG_ON(!list_empty(&rq->queuelist)); 118 + list_del_init(&rq->flush.list); 119 + blk_flush_restore_request(rq); 120 + __blk_end_request_all(rq, error); 121 + break; 122 + 123 + default: 124 + BUG(); 125 + } 126 + 127 + return blk_kick_flush(q) | queued; 128 + } 129 + 130 + static void flush_end_io(struct request *flush_rq, int error) 131 + { 132 + struct request_queue *q = flush_rq->q; 133 + struct list_head *running = &q->flush_queue[q->flush_running_idx]; 134 + bool queued = false; 135 + struct request *rq, *n; 136 + 137 + BUG_ON(q->flush_pending_idx == q->flush_running_idx); 138 + 139 + /* account completion of the flush request */ 140 + q->flush_running_idx ^= 1; 141 + elv_completed_request(q, flush_rq); 142 + 143 + /* and push the waiting requests to the next stage */ 144 + list_for_each_entry_safe(rq, n, running, flush.list) { 145 + unsigned int seq = blk_flush_cur_seq(rq); 146 + 147 + BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH); 148 + queued |= blk_flush_complete_seq(rq, seq, error); 149 + } 129 150 130 151 /* 131 152 * Moving a request silently to empty queue_head may stall the ··· 217 70 * from request completion path and calling directly into 218 71 * request_fn may confuse the driver. Always use kblockd. 219 72 */ 220 - if (was_empty && next_rq) 73 + if (queued) 221 74 __blk_run_queue(q, true); 222 75 } 223 76 224 - static void pre_flush_end_io(struct request *rq, int error) 77 + /** 78 + * blk_kick_flush - consider issuing flush request 79 + * @q: request_queue being kicked 80 + * 81 + * Flush related states of @q have changed, consider issuing flush request. 82 + * Please read the comment at the top of this file for more info. 
83 + * 84 + * CONTEXT: 85 + * spin_lock_irq(q->queue_lock) 86 + * 87 + * RETURNS: 88 + * %true if flush was issued, %false otherwise. 89 + */ 90 + static bool blk_kick_flush(struct request_queue *q) 225 91 { 226 - elv_completed_request(rq->q, rq); 227 - blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_PREFLUSH, error); 92 + struct list_head *pending = &q->flush_queue[q->flush_pending_idx]; 93 + struct request *first_rq = 94 + list_first_entry(pending, struct request, flush.list); 95 + 96 + /* C1 described at the top of this file */ 97 + if (q->flush_pending_idx != q->flush_running_idx || list_empty(pending)) 98 + return false; 99 + 100 + /* C2 and C3 */ 101 + if (!list_empty(&q->flush_data_in_flight) && 102 + time_before(jiffies, 103 + q->flush_pending_since + FLUSH_PENDING_TIMEOUT)) 104 + return false; 105 + 106 + /* 107 + * Issue flush and toggle pending_idx. This makes pending_idx 108 + * different from running_idx, which means flush is in flight. 109 + */ 110 + blk_rq_init(q, &q->flush_rq); 111 + q->flush_rq.cmd_type = REQ_TYPE_FS; 112 + q->flush_rq.cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ; 113 + q->flush_rq.rq_disk = first_rq->rq_disk; 114 + q->flush_rq.end_io = flush_end_io; 115 + 116 + q->flush_pending_idx ^= 1; 117 + elv_insert(q, &q->flush_rq, ELEVATOR_INSERT_REQUEUE); 118 + return true; 228 119 } 229 120 230 121 static void flush_data_end_io(struct request *rq, int error) 231 122 { 232 - elv_completed_request(rq->q, rq); 233 - blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_DATA, error); 234 - } 235 - 236 - static void post_flush_end_io(struct request *rq, int error) 237 - { 238 - elv_completed_request(rq->q, rq); 239 - blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_POSTFLUSH, error); 240 - } 241 - 242 - static void init_flush_request(struct request *rq, struct gendisk *disk) 243 - { 244 - rq->cmd_type = REQ_TYPE_FS; 245 - rq->cmd_flags = WRITE_FLUSH; 246 - rq->rq_disk = disk; 247 - } 248 - 249 - static struct request *queue_next_fseq(struct request_queue *q) 250 - { 251 - struct request *orig_rq = q->orig_flush_rq; 252 - struct request *rq = &q->flush_rq; 253 - 254 - blk_rq_init(q, rq); 255 - 256 - switch (blk_flush_cur_seq(q)) { 257 - case QUEUE_FSEQ_PREFLUSH: 258 - init_flush_request(rq, orig_rq->rq_disk); 259 - rq->end_io = pre_flush_end_io; 260 - break; 261 - case QUEUE_FSEQ_DATA: 262 - init_request_from_bio(rq, orig_rq->bio); 263 - /* 264 - * orig_rq->rq_disk may be different from 265 - * bio->bi_bdev->bd_disk if orig_rq got here through 266 - * remapping drivers. Make sure rq->rq_disk points 267 - * to the same one as orig_rq. 268 - */ 269 - rq->rq_disk = orig_rq->rq_disk; 270 - rq->cmd_flags &= ~(REQ_FLUSH | REQ_FUA); 271 - rq->cmd_flags |= orig_rq->cmd_flags & (REQ_FLUSH | REQ_FUA); 272 - rq->end_io = flush_data_end_io; 273 - break; 274 - case QUEUE_FSEQ_POSTFLUSH: 275 - init_flush_request(rq, orig_rq->rq_disk); 276 - rq->end_io = post_flush_end_io; 277 - break; 278 - default: 279 - BUG(); 280 - } 281 - 282 - elv_insert(q, rq, ELEVATOR_INSERT_REQUEUE); 283 - return rq; 284 - } 285 - 286 - struct request *blk_do_flush(struct request_queue *q, struct request *rq) 287 - { 288 - unsigned int fflags = q->flush_flags; /* may change, cache it */ 289 - bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA; 290 - bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH); 291 - bool do_postflush = has_flush && !has_fua && (rq->cmd_flags & REQ_FUA); 292 - unsigned skip = 0; 123 + struct request_queue *q = rq->q; 293 124 294 125 /* 295 - * Special case. 
If there's data but flush is not necessary, 296 - * the request can be issued directly. 297 - * 298 - * Flush w/o data should be able to be issued directly too but 299 - * currently some drivers assume that rq->bio contains 300 - * non-zero data if it isn't NULL and empty FLUSH requests 301 - * getting here usually have bio's without data. 126 + * After populating an empty queue, kick it to avoid stall. Read 127 + * the comment in flush_end_io(). 302 128 */ 303 - if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) { 304 - rq->cmd_flags &= ~REQ_FLUSH; 305 - if (!has_fua) 306 - rq->cmd_flags &= ~REQ_FUA; 307 - return rq; 308 - } 129 + if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error)) 130 + __blk_run_queue(q, true); 131 + } 132 + 133 + /** 134 + * blk_insert_flush - insert a new FLUSH/FUA request 135 + * @rq: request to insert 136 + * 137 + * To be called from elv_insert() for %ELEVATOR_INSERT_FLUSH insertions. 138 + * @rq is being submitted. Analyze what needs to be done and put it on the 139 + * right queue. 140 + * 141 + * CONTEXT: 142 + * spin_lock_irq(q->queue_lock) 143 + */ 144 + void blk_insert_flush(struct request *rq) 145 + { 146 + struct request_queue *q = rq->q; 147 + unsigned int fflags = q->flush_flags; /* may change, cache */ 148 + unsigned int policy = blk_flush_policy(fflags, rq); 149 + 150 + BUG_ON(rq->end_io); 151 + BUG_ON(!rq->bio || rq->bio != rq->biotail); 309 152 310 153 /* 311 - * Sequenced flushes can't be processed in parallel. If 312 - * another one is already in progress, queue for later 313 - * processing. 154 + * @policy now records what operations need to be done. Adjust 155 + * REQ_FLUSH and FUA for the driver. 314 156 */ 315 - if (q->flush_seq) { 316 - list_move_tail(&rq->queuelist, &q->pending_flushes); 317 - return NULL; 318 - } 319 - 320 - /* 321 - * Start a new flush sequence 322 - */ 323 - q->flush_err = 0; 324 - q->flush_seq |= QUEUE_FSEQ_STARTED; 325 - 326 - /* adjust FLUSH/FUA of the original request and stash it away */ 327 157 rq->cmd_flags &= ~REQ_FLUSH; 328 - if (!has_fua) 158 + if (!(fflags & REQ_FUA)) 329 159 rq->cmd_flags &= ~REQ_FUA; 330 - blk_dequeue_request(rq); 331 - q->orig_flush_rq = rq; 332 160 333 - /* skip unneded sequences and return the first one */ 334 - if (!do_preflush) 335 - skip |= QUEUE_FSEQ_PREFLUSH; 336 - if (!blk_rq_sectors(rq)) 337 - skip |= QUEUE_FSEQ_DATA; 338 - if (!do_postflush) 339 - skip |= QUEUE_FSEQ_POSTFLUSH; 340 - return blk_flush_complete_seq(q, skip, 0); 161 + /* 162 + * If there's data but flush is not necessary, the request can be 163 + * processed directly without going through flush machinery. Queue 164 + * for normal execution. 165 + */ 166 + if ((policy & REQ_FSEQ_DATA) && 167 + !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) { 168 + list_add(&rq->queuelist, &q->queue_head); 169 + return; 170 + } 171 + 172 + /* 173 + * @rq should go through flush machinery. Mark it part of flush 174 + * sequence and submit for further processing. 175 + */ 176 + memset(&rq->flush, 0, sizeof(rq->flush)); 177 + INIT_LIST_HEAD(&rq->flush.list); 178 + rq->cmd_flags |= REQ_FLUSH_SEQ; 179 + rq->end_io = flush_data_end_io; 180 + 181 + blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0); 182 + } 183 + 184 + /** 185 + * blk_abort_flushes - @q is being aborted, abort flush requests 186 + * @q: request_queue being aborted 187 + * 188 + * To be called from elv_abort_queue(). @q is being aborted. Prepare all 189 + * FLUSH/FUA requests for abortion. 
190 + * 191 + * CONTEXT: 192 + * spin_lock_irq(q->queue_lock) 193 + */ 194 + void blk_abort_flushes(struct request_queue *q) 195 + { 196 + struct request *rq, *n; 197 + int i; 198 + 199 + /* 200 + * Requests in flight for data are already owned by the dispatch 201 + * queue or the device driver. Just restore for normal completion. 202 + */ 203 + list_for_each_entry_safe(rq, n, &q->flush_data_in_flight, flush.list) { 204 + list_del_init(&rq->flush.list); 205 + blk_flush_restore_request(rq); 206 + } 207 + 208 + /* 209 + * We need to give away requests on flush queues. Restore for 210 + * normal completion and put them on the dispatch queue. 211 + */ 212 + for (i = 0; i < ARRAY_SIZE(q->flush_queue); i++) { 213 + list_for_each_entry_safe(rq, n, &q->flush_queue[i], 214 + flush.list) { 215 + list_del_init(&rq->flush.list); 216 + blk_flush_restore_request(rq); 217 + list_add_tail(&rq->queuelist, &q->queue_head); 218 + } 219 + } 341 220 } 342 221 343 222 static void bio_end_flush(struct bio *bio, int err)
-2
block/blk-lib.c
··· 136 136 * 137 137 * Description: 138 138 * Generate and issue number of bios with zerofiled pages. 139 - * Send barrier at the beginning and at the end if requested. This guarantie 140 - * correct request ordering. Empty barrier allow us to avoid post queue flush. 141 139 */ 142 140 143 141 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+6
block/blk-merge.c
··· 465 465 466 466 return 0; 467 467 } 468 + 469 + int blk_attempt_req_merge(struct request_queue *q, struct request *rq, 470 + struct request *next) 471 + { 472 + return attempt_merge(q, rq, next); 473 + }
-15
block/blk-settings.c
··· 164 164 blk_queue_congestion_threshold(q); 165 165 q->nr_batching = BLK_BATCH_REQ; 166 166 167 - q->unplug_thresh = 4; /* hmm */ 168 - q->unplug_delay = msecs_to_jiffies(3); /* 3 milliseconds */ 169 - if (q->unplug_delay == 0) 170 - q->unplug_delay = 1; 171 - 172 - q->unplug_timer.function = blk_unplug_timeout; 173 - q->unplug_timer.data = (unsigned long)q; 174 - 175 167 blk_set_default_limits(&q->limits); 176 168 blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); 177 - 178 - /* 179 - * If the caller didn't supply a lock, fall back to our embedded 180 - * per-queue locks 181 - */ 182 - if (!q->queue_lock) 183 - q->queue_lock = &q->__queue_lock; 184 169 185 170 /* 186 171 * by default assume old behaviour and bounce for any highmem page
-2
block/blk-sysfs.c
··· 471 471 472 472 blk_sync_queue(q); 473 473 474 - blk_throtl_exit(q); 475 - 476 474 if (rl->rq_pool) 477 475 mempool_destroy(rl->rq_pool); 478 476
+72 -69
block/blk-throttle.c
··· 102 102 /* Work for dispatching throttled bios */ 103 103 struct delayed_work throtl_work; 104 104 105 - atomic_t limits_changed; 105 + bool limits_changed; 106 106 }; 107 107 108 108 enum tg_state_flags { ··· 201 201 RB_CLEAR_NODE(&tg->rb_node); 202 202 bio_list_init(&tg->bio_lists[0]); 203 203 bio_list_init(&tg->bio_lists[1]); 204 + td->limits_changed = false; 204 205 205 206 /* 206 207 * Take the initial reference that will be released on destroy ··· 738 737 struct throtl_grp *tg; 739 738 struct hlist_node *pos, *n; 740 739 741 - if (!atomic_read(&td->limits_changed)) 740 + if (!td->limits_changed) 742 741 return; 743 742 744 - throtl_log(td, "limit changed =%d", atomic_read(&td->limits_changed)); 743 + xchg(&td->limits_changed, false); 745 744 746 - /* 747 - * Make sure updates from throtl_update_blkio_group_read_bps() group 748 - * of functions to tg->limits_changed are visible. We do not 749 - * want update td->limits_changed to be visible but update to 750 - * tg->limits_changed not being visible yet on this cpu. Hence 751 - * the read barrier. 752 - */ 753 - smp_rmb(); 745 + throtl_log(td, "limits changed"); 754 746 755 747 hlist_for_each_entry_safe(tg, pos, n, &td->tg_list, tg_node) { 756 - if (throtl_tg_on_rr(tg) && tg->limits_changed) { 757 - throtl_log_tg(td, tg, "limit change rbps=%llu wbps=%llu" 758 - " riops=%u wiops=%u", tg->bps[READ], 759 - tg->bps[WRITE], tg->iops[READ], 760 - tg->iops[WRITE]); 761 - tg_update_disptime(td, tg); 762 - tg->limits_changed = false; 763 - } 764 - } 748 + if (!tg->limits_changed) 749 + continue; 765 750 766 - smp_mb__before_atomic_dec(); 767 - atomic_dec(&td->limits_changed); 768 - smp_mb__after_atomic_dec(); 751 + if (!xchg(&tg->limits_changed, false)) 752 + continue; 753 + 754 + throtl_log_tg(td, tg, "limit change rbps=%llu wbps=%llu" 755 + " riops=%u wiops=%u", tg->bps[READ], tg->bps[WRITE], 756 + tg->iops[READ], tg->iops[WRITE]); 757 + 758 + /* 759 + * Restart the slices for both READ and WRITES. It 760 + * might happen that a group's limit are dropped 761 + * suddenly and we don't want to account recently 762 + * dispatched IO with new low rate 763 + */ 764 + throtl_start_new_slice(td, tg, 0); 765 + throtl_start_new_slice(td, tg, 1); 766 + 767 + if (throtl_tg_on_rr(tg)) 768 + tg_update_disptime(td, tg); 769 + } 769 770 } 770 771 771 772 /* Dispatch throttled bios. Should be called without queue lock held. */ ··· 777 774 unsigned int nr_disp = 0; 778 775 struct bio_list bio_list_on_stack; 779 776 struct bio *bio; 777 + struct blk_plug plug; 780 778 781 779 spin_lock_irq(q->queue_lock); 782 780 ··· 806 802 * immediate dispatch 807 803 */ 808 804 if (nr_disp) { 805 + blk_start_plug(&plug); 809 806 while((bio = bio_list_pop(&bio_list_on_stack))) 810 807 generic_make_request(bio); 811 - blk_unplug(q); 808 + blk_finish_plug(&plug); 812 809 } 813 810 return nr_disp; 814 811 } ··· 830 825 831 826 struct delayed_work *dwork = &td->throtl_work; 832 827 833 - if (total_nr_queued(td) > 0) { 828 + /* schedule work if limits changed even if no bio is queued */ 829 + if (total_nr_queued(td) > 0 || td->limits_changed) { 834 830 /* 835 831 * We might have a work scheduled to be executed in future. 836 832 * Cancel that and schedule a new one. 
··· 904 898 spin_unlock_irqrestore(td->queue->queue_lock, flags); 905 899 } 906 900 901 + static void throtl_update_blkio_group_common(struct throtl_data *td, 902 + struct throtl_grp *tg) 903 + { 904 + xchg(&tg->limits_changed, true); 905 + xchg(&td->limits_changed, true); 906 + /* Schedule a work now to process the limit change */ 907 + throtl_schedule_delayed_work(td, 0); 908 + } 909 + 907 910 /* 908 911 * For all update functions, key should be a valid pointer because these 909 912 * update functions are called under blkcg_lock, that means, blkg is ··· 926 911 struct blkio_group *blkg, u64 read_bps) 927 912 { 928 913 struct throtl_data *td = key; 914 + struct throtl_grp *tg = tg_of_blkg(blkg); 929 915 930 - tg_of_blkg(blkg)->bps[READ] = read_bps; 931 - /* Make sure read_bps is updated before setting limits_changed */ 932 - smp_wmb(); 933 - tg_of_blkg(blkg)->limits_changed = true; 934 - 935 - /* Make sure tg->limits_changed is updated before td->limits_changed */ 936 - smp_mb__before_atomic_inc(); 937 - atomic_inc(&td->limits_changed); 938 - smp_mb__after_atomic_inc(); 939 - 940 - /* Schedule a work now to process the limit change */ 941 - throtl_schedule_delayed_work(td, 0); 916 + tg->bps[READ] = read_bps; 917 + throtl_update_blkio_group_common(td, tg); 942 918 } 943 919 944 920 static void throtl_update_blkio_group_write_bps(void *key, 945 921 struct blkio_group *blkg, u64 write_bps) 946 922 { 947 923 struct throtl_data *td = key; 924 + struct throtl_grp *tg = tg_of_blkg(blkg); 948 925 949 - tg_of_blkg(blkg)->bps[WRITE] = write_bps; 950 - smp_wmb(); 951 - tg_of_blkg(blkg)->limits_changed = true; 952 - smp_mb__before_atomic_inc(); 953 - atomic_inc(&td->limits_changed); 954 - smp_mb__after_atomic_inc(); 955 - throtl_schedule_delayed_work(td, 0); 926 + tg->bps[WRITE] = write_bps; 927 + throtl_update_blkio_group_common(td, tg); 956 928 } 957 929 958 930 static void throtl_update_blkio_group_read_iops(void *key, 959 931 struct blkio_group *blkg, unsigned int read_iops) 960 932 { 961 933 struct throtl_data *td = key; 934 + struct throtl_grp *tg = tg_of_blkg(blkg); 962 935 963 - tg_of_blkg(blkg)->iops[READ] = read_iops; 964 - smp_wmb(); 965 - tg_of_blkg(blkg)->limits_changed = true; 966 - smp_mb__before_atomic_inc(); 967 - atomic_inc(&td->limits_changed); 968 - smp_mb__after_atomic_inc(); 969 - throtl_schedule_delayed_work(td, 0); 936 + tg->iops[READ] = read_iops; 937 + throtl_update_blkio_group_common(td, tg); 970 938 } 971 939 972 940 static void throtl_update_blkio_group_write_iops(void *key, 973 941 struct blkio_group *blkg, unsigned int write_iops) 974 942 { 975 943 struct throtl_data *td = key; 944 + struct throtl_grp *tg = tg_of_blkg(blkg); 976 945 977 - tg_of_blkg(blkg)->iops[WRITE] = write_iops; 978 - smp_wmb(); 979 - tg_of_blkg(blkg)->limits_changed = true; 980 - smp_mb__before_atomic_inc(); 981 - atomic_inc(&td->limits_changed); 982 - smp_mb__after_atomic_inc(); 983 - throtl_schedule_delayed_work(td, 0); 946 + tg->iops[WRITE] = write_iops; 947 + throtl_update_blkio_group_common(td, tg); 984 948 } 985 949 986 - void throtl_shutdown_timer_wq(struct request_queue *q) 950 + static void throtl_shutdown_wq(struct request_queue *q) 987 951 { 988 952 struct throtl_data *td = q->td; 989 953 ··· 1003 1009 /* 1004 1010 * There is already another bio queued in same dir. No 1005 1011 * need to update dispatch time. 1006 - * Still update the disptime if rate limits on this group 1007 - * were changed. 
1008 1012 */ 1009 - if (!tg->limits_changed) 1010 - update_disptime = false; 1011 - else 1012 - tg->limits_changed = false; 1013 - 1013 + update_disptime = false; 1014 1014 goto queue_bio; 1015 + 1015 1016 } 1016 1017 1017 1018 /* Bio is with-in rate limit of group */ 1018 1019 if (tg_may_dispatch(td, tg, bio, NULL)) { 1019 1020 throtl_charge_bio(tg, bio); 1021 + 1022 + /* 1023 + * We need to trim slice even when bios are not being queued 1024 + * otherwise it might happen that a bio is not queued for 1025 + * a long time and slice keeps on extending and trim is not 1026 + * called for a long time. Now if limits are reduced suddenly 1027 + * we take into account all the IO dispatched so far at new 1028 + * low rate and * newly queued IO gets a really long dispatch 1029 + * time. 1030 + * 1031 + * So keep on trimming slice even if bio is not queued. 1032 + */ 1033 + throtl_trim_slice(td, tg, rw); 1020 1034 goto out; 1021 1035 } 1022 1036 ··· 1060 1058 1061 1059 INIT_HLIST_HEAD(&td->tg_list); 1062 1060 td->tg_service_tree = THROTL_RB_ROOT; 1063 - atomic_set(&td->limits_changed, 0); 1061 + td->limits_changed = false; 1064 1062 1065 1063 /* Init root group */ 1066 1064 tg = &td->root_tg; ··· 1072 1070 /* Practically unlimited BW */ 1073 1071 tg->bps[0] = tg->bps[1] = -1; 1074 1072 tg->iops[0] = tg->iops[1] = -1; 1073 + td->limits_changed = false; 1075 1074 1076 1075 /* 1077 1076 * Set root group reference to 2. One reference will be dropped when ··· 1105 1102 1106 1103 BUG_ON(!td); 1107 1104 1108 - throtl_shutdown_timer_wq(q); 1105 + throtl_shutdown_wq(q); 1109 1106 1110 1107 spin_lock_irq(q->queue_lock); 1111 1108 throtl_release_tgs(td); ··· 1135 1132 * update limits through cgroup and another work got queued, cancel 1136 1133 * it. 1137 1134 */ 1138 - throtl_shutdown_timer_wq(q); 1135 + throtl_shutdown_wq(q); 1139 1136 throtl_td_free(td); 1140 1137 } 1141 1138
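The atomic counter and memory-barrier dance are replaced by a plain flag handshake. A reduced sketch of the pattern used above, with illustrative updater()/worker() names; the xchg() calls mirror the throttle code, which publishes a change from the cgroup update path and consumes it exactly once in the delayed work:

static bool limits_changed;

static void updater(void)			/* cgroup limit-update path */
{
	xchg(&limits_changed, true);		/* publish the new limits */
	/* ...schedule the delayed work that will pick them up... */
}

static void worker(void)			/* dispatch work function */
{
	if (!xchg(&limits_changed, false))	/* consume, at most once */
		return;
	/* ...restart slices and recompute dispatch times... */
}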
+6 -10
block/blk.h
··· 18 18 void blk_dequeue_request(struct request *rq); 19 19 void __blk_queue_free_tags(struct request_queue *q); 20 20 21 - void blk_unplug_work(struct work_struct *work); 22 - void blk_unplug_timeout(unsigned long data); 23 21 void blk_rq_timed_out_timer(unsigned long data); 24 22 void blk_delete_timer(struct request *); 25 23 void blk_add_timer(struct request *); ··· 49 51 */ 50 52 #define ELV_ON_HASH(rq) (!hlist_unhashed(&(rq)->hash)) 51 53 52 - struct request *blk_do_flush(struct request_queue *q, struct request *rq); 54 + void blk_insert_flush(struct request *rq); 55 + void blk_abort_flushes(struct request_queue *q); 53 56 54 57 static inline struct request *__elv_next_request(struct request_queue *q) 55 58 { 56 59 struct request *rq; 57 60 58 61 while (1) { 59 - while (!list_empty(&q->queue_head)) { 62 + if (!list_empty(&q->queue_head)) { 60 63 rq = list_entry_rq(q->queue_head.next); 61 - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) || 62 - rq == &q->flush_rq) 63 - return rq; 64 - rq = blk_do_flush(q, rq); 65 - if (rq) 66 - return rq; 64 + return rq; 67 65 } 68 66 69 67 if (!q->elevator->ops->elevator_dispatch_fn(q, 0)) ··· 103 109 struct bio *bio); 104 110 int attempt_back_merge(struct request_queue *q, struct request *rq); 105 111 int attempt_front_merge(struct request_queue *q, struct request *rq); 112 + int blk_attempt_req_merge(struct request_queue *q, struct request *rq, 113 + struct request *next); 106 114 void blk_recalc_rq_segments(struct request *rq); 107 115 void blk_rq_set_mixed_merge(struct request *rq); 108 116
+86 -79
block/cfq-iosched.c
··· 54 54 #define CFQQ_SEEKY(cfqq) (hweight32(cfqq->seek_history) > 32/8) 55 55 56 56 #define RQ_CIC(rq) \ 57 - ((struct cfq_io_context *) (rq)->elevator_private) 58 - #define RQ_CFQQ(rq) (struct cfq_queue *) ((rq)->elevator_private2) 59 - #define RQ_CFQG(rq) (struct cfq_group *) ((rq)->elevator_private3) 57 + ((struct cfq_io_context *) (rq)->elevator_private[0]) 58 + #define RQ_CFQQ(rq) (struct cfq_queue *) ((rq)->elevator_private[1]) 59 + #define RQ_CFQG(rq) (struct cfq_group *) ((rq)->elevator_private[2]) 60 60 61 61 static struct kmem_cache *cfq_pool; 62 62 static struct kmem_cache *cfq_ioc_pool; ··· 146 146 struct cfq_rb_root *service_tree; 147 147 struct cfq_queue *new_cfqq; 148 148 struct cfq_group *cfqg; 149 - struct cfq_group *orig_cfqg; 150 149 /* Number of sectors dispatched from queue in single dispatch round */ 151 150 unsigned long nr_sectors; 152 151 }; ··· 178 179 /* group service_tree key */ 179 180 u64 vdisktime; 180 181 unsigned int weight; 182 + unsigned int new_weight; 183 + bool needs_update; 181 184 182 185 /* number of cfqq currently on this group */ 183 186 int nr_cfqq; ··· 239 238 struct rb_root prio_trees[CFQ_PRIO_LISTS]; 240 239 241 240 unsigned int busy_queues; 241 + unsigned int busy_sync_queues; 242 242 243 243 int rq_in_driver; 244 244 int rq_in_flight[2]; ··· 287 285 unsigned int cfq_slice_idle; 288 286 unsigned int cfq_group_idle; 289 287 unsigned int cfq_latency; 290 - unsigned int cfq_group_isolation; 291 288 292 289 unsigned int cic_index; 293 290 struct list_head cic_list; ··· 502 501 } 503 502 } 504 503 505 - static int cfq_queue_empty(struct request_queue *q) 506 - { 507 - struct cfq_data *cfqd = q->elevator->elevator_data; 508 - 509 - return !cfqd->rq_queued; 510 - } 511 - 512 504 /* 513 505 * Scale schedule slice based on io priority. Use the sync time slice only 514 506 * if a queue is marked sync and has sync io queued. 
A sync queue with async ··· 552 558 553 559 static void update_min_vdisktime(struct cfq_rb_root *st) 554 560 { 555 - u64 vdisktime = st->min_vdisktime; 556 561 struct cfq_group *cfqg; 557 562 558 563 if (st->left) { 559 564 cfqg = rb_entry_cfqg(st->left); 560 - vdisktime = min_vdisktime(vdisktime, cfqg->vdisktime); 565 + st->min_vdisktime = max_vdisktime(st->min_vdisktime, 566 + cfqg->vdisktime); 561 567 } 562 - 563 - st->min_vdisktime = max_vdisktime(st->min_vdisktime, vdisktime); 564 568 } 565 569 566 570 /* ··· 855 863 } 856 864 857 865 static void 858 - cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg) 866 + cfq_update_group_weight(struct cfq_group *cfqg) 867 + { 868 + BUG_ON(!RB_EMPTY_NODE(&cfqg->rb_node)); 869 + if (cfqg->needs_update) { 870 + cfqg->weight = cfqg->new_weight; 871 + cfqg->needs_update = false; 872 + } 873 + } 874 + 875 + static void 876 + cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg) 877 + { 878 + BUG_ON(!RB_EMPTY_NODE(&cfqg->rb_node)); 879 + 880 + cfq_update_group_weight(cfqg); 881 + __cfq_group_service_tree_add(st, cfqg); 882 + st->total_weight += cfqg->weight; 883 + } 884 + 885 + static void 886 + cfq_group_notify_queue_add(struct cfq_data *cfqd, struct cfq_group *cfqg) 859 887 { 860 888 struct cfq_rb_root *st = &cfqd->grp_service_tree; 861 889 struct cfq_group *__cfqg; ··· 896 884 cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY; 897 885 } else 898 886 cfqg->vdisktime = st->min_vdisktime; 899 - 900 - __cfq_group_service_tree_add(st, cfqg); 901 - st->total_weight += cfqg->weight; 887 + cfq_group_service_tree_add(st, cfqg); 902 888 } 903 889 904 890 static void 905 - cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg) 891 + cfq_group_service_tree_del(struct cfq_rb_root *st, struct cfq_group *cfqg) 892 + { 893 + st->total_weight -= cfqg->weight; 894 + if (!RB_EMPTY_NODE(&cfqg->rb_node)) 895 + cfq_rb_erase(&cfqg->rb_node, st); 896 + } 897 + 898 + static void 899 + cfq_group_notify_queue_del(struct cfq_data *cfqd, struct cfq_group *cfqg) 906 900 { 907 901 struct cfq_rb_root *st = &cfqd->grp_service_tree; 908 902 ··· 920 902 return; 921 903 922 904 cfq_log_cfqg(cfqd, cfqg, "del_from_rr group"); 923 - st->total_weight -= cfqg->weight; 924 - if (!RB_EMPTY_NODE(&cfqg->rb_node)) 925 - cfq_rb_erase(&cfqg->rb_node, st); 905 + cfq_group_service_tree_del(st, cfqg); 926 906 cfqg->saved_workload_slice = 0; 927 907 cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1); 928 908 } 929 909 930 - static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq) 910 + static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq, 911 + unsigned int *unaccounted_time) 931 912 { 932 913 unsigned int slice_used; 933 914 ··· 945 928 1); 946 929 } else { 947 930 slice_used = jiffies - cfqq->slice_start; 948 - if (slice_used > cfqq->allocated_slice) 931 + if (slice_used > cfqq->allocated_slice) { 932 + *unaccounted_time = slice_used - cfqq->allocated_slice; 949 933 slice_used = cfqq->allocated_slice; 934 + } 935 + if (time_after(cfqq->slice_start, cfqq->dispatch_start)) 936 + *unaccounted_time += cfqq->slice_start - 937 + cfqq->dispatch_start; 950 938 } 951 939 952 940 return slice_used; ··· 961 939 struct cfq_queue *cfqq) 962 940 { 963 941 struct cfq_rb_root *st = &cfqd->grp_service_tree; 964 - unsigned int used_sl, charge; 942 + unsigned int used_sl, charge, unaccounted_sl = 0; 965 943 int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg) 966 944 - cfqg->service_tree_idle.count; 967 945 968 
946 BUG_ON(nr_sync < 0); 969 - used_sl = charge = cfq_cfqq_slice_usage(cfqq); 947 + used_sl = charge = cfq_cfqq_slice_usage(cfqq, &unaccounted_sl); 970 948 971 949 if (iops_mode(cfqd)) 972 950 charge = cfqq->slice_dispatch; ··· 974 952 charge = cfqq->allocated_slice; 975 953 976 954 /* Can't update vdisktime while group is on service tree */ 977 - cfq_rb_erase(&cfqg->rb_node, st); 955 + cfq_group_service_tree_del(st, cfqg); 978 956 cfqg->vdisktime += cfq_scale_slice(charge, cfqg); 979 - __cfq_group_service_tree_add(st, cfqg); 957 + /* If a new weight was requested, update now, off tree */ 958 + cfq_group_service_tree_add(st, cfqg); 980 959 981 960 /* This group is being expired. Save the context */ 982 961 if (time_after(cfqd->workload_expires, jiffies)) { ··· 993 970 cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u" 994 971 " sect=%u", used_sl, cfqq->slice_dispatch, charge, 995 972 iops_mode(cfqd), cfqq->nr_sectors); 996 - cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl); 973 + cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl, 974 + unaccounted_sl); 997 975 cfq_blkiocg_set_start_empty_time(&cfqg->blkg); 998 976 } 999 977 ··· 1009 985 void cfq_update_blkio_group_weight(void *key, struct blkio_group *blkg, 1010 986 unsigned int weight) 1011 987 { 1012 - cfqg_of_blkg(blkg)->weight = weight; 988 + struct cfq_group *cfqg = cfqg_of_blkg(blkg); 989 + cfqg->new_weight = weight; 990 + cfqg->needs_update = true; 1013 991 } 1014 992 1015 993 static struct cfq_group * ··· 1213 1187 int new_cfqq = 1; 1214 1188 int group_changed = 0; 1215 1189 1216 - #ifdef CONFIG_CFQ_GROUP_IOSCHED 1217 - if (!cfqd->cfq_group_isolation 1218 - && cfqq_type(cfqq) == SYNC_NOIDLE_WORKLOAD 1219 - && cfqq->cfqg && cfqq->cfqg != &cfqd->root_group) { 1220 - /* Move this cfq to root group */ 1221 - cfq_log_cfqq(cfqd, cfqq, "moving to root group"); 1222 - if (!RB_EMPTY_NODE(&cfqq->rb_node)) 1223 - cfq_group_service_tree_del(cfqd, cfqq->cfqg); 1224 - cfqq->orig_cfqg = cfqq->cfqg; 1225 - cfqq->cfqg = &cfqd->root_group; 1226 - cfqd->root_group.ref++; 1227 - group_changed = 1; 1228 - } else if (!cfqd->cfq_group_isolation 1229 - && cfqq_type(cfqq) == SYNC_WORKLOAD && cfqq->orig_cfqg) { 1230 - /* cfqq is sequential now needs to go to its original group */ 1231 - BUG_ON(cfqq->cfqg != &cfqd->root_group); 1232 - if (!RB_EMPTY_NODE(&cfqq->rb_node)) 1233 - cfq_group_service_tree_del(cfqd, cfqq->cfqg); 1234 - cfq_put_cfqg(cfqq->cfqg); 1235 - cfqq->cfqg = cfqq->orig_cfqg; 1236 - cfqq->orig_cfqg = NULL; 1237 - group_changed = 1; 1238 - cfq_log_cfqq(cfqd, cfqq, "moved to origin group"); 1239 - } 1240 - #endif 1241 - 1242 1190 service_tree = service_tree_for(cfqq->cfqg, cfqq_prio(cfqq), 1243 1191 cfqq_type(cfqq)); 1244 1192 if (cfq_class_idle(cfqq)) { ··· 1284 1284 service_tree->count++; 1285 1285 if ((add_front || !new_cfqq) && !group_changed) 1286 1286 return; 1287 - cfq_group_service_tree_add(cfqd, cfqq->cfqg); 1287 + cfq_group_notify_queue_add(cfqd, cfqq->cfqg); 1288 1288 } 1289 1289 1290 1290 static struct cfq_queue * ··· 1372 1372 BUG_ON(cfq_cfqq_on_rr(cfqq)); 1373 1373 cfq_mark_cfqq_on_rr(cfqq); 1374 1374 cfqd->busy_queues++; 1375 + if (cfq_cfqq_sync(cfqq)) 1376 + cfqd->busy_sync_queues++; 1375 1377 1376 1378 cfq_resort_rr_list(cfqd, cfqq); 1377 1379 } ··· 1397 1395 cfqq->p_root = NULL; 1398 1396 } 1399 1397 1400 - cfq_group_service_tree_del(cfqd, cfqq->cfqg); 1398 + cfq_group_notify_queue_del(cfqd, cfqq->cfqg); 1401 1399 BUG_ON(!cfqd->busy_queues); 1402 1400 cfqd->busy_queues--; 1401 + if 
(cfq_cfqq_sync(cfqq)) 1402 + cfqd->busy_sync_queues--; 1403 1403 } 1404 1404 1405 1405 /* ··· 2409 2405 * Does this cfqq already have too much IO in flight? 2410 2406 */ 2411 2407 if (cfqq->dispatched >= max_dispatch) { 2408 + bool promote_sync = false; 2412 2409 /* 2413 2410 * idle queue must always only have a single IO in flight 2414 2411 */ ··· 2417 2412 return false; 2418 2413 2419 2414 /* 2415 + * If there is only one sync queue 2416 + * we can ignore async queue here and give the sync 2417 + * queue no dispatch limit. The reason is a sync queue can 2418 + * preempt async queue, limiting the sync queue doesn't make 2419 + * sense. This is useful for aiostress test. 2420 + */ 2421 + if (cfq_cfqq_sync(cfqq) && cfqd->busy_sync_queues == 1) 2422 + promote_sync = true; 2423 + 2424 + /* 2420 2425 * We have other queues, don't allow more IO from this one 2421 2426 */ 2422 - if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq)) 2427 + if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq) && 2428 + !promote_sync) 2423 2429 return false; 2424 2430 2425 2431 /* 2426 2432 * Sole queue user, no limit 2427 2433 */ 2428 - if (cfqd->busy_queues == 1) 2434 + if (cfqd->busy_queues == 1 || promote_sync) 2429 2435 max_dispatch = -1; 2430 2436 else 2431 2437 /* ··· 2558 2542 static void cfq_put_queue(struct cfq_queue *cfqq) 2559 2543 { 2560 2544 struct cfq_data *cfqd = cfqq->cfqd; 2561 - struct cfq_group *cfqg, *orig_cfqg; 2545 + struct cfq_group *cfqg; 2562 2546 2563 2547 BUG_ON(cfqq->ref <= 0); 2564 2548 ··· 2570 2554 BUG_ON(rb_first(&cfqq->sort_list)); 2571 2555 BUG_ON(cfqq->allocated[READ] + cfqq->allocated[WRITE]); 2572 2556 cfqg = cfqq->cfqg; 2573 - orig_cfqg = cfqq->orig_cfqg; 2574 2557 2575 2558 if (unlikely(cfqd->active_queue == cfqq)) { 2576 2559 __cfq_slice_expired(cfqd, cfqq, 0); ··· 2579 2564 BUG_ON(cfq_cfqq_on_rr(cfqq)); 2580 2565 kmem_cache_free(cfq_pool, cfqq); 2581 2566 cfq_put_cfqg(cfqg); 2582 - if (orig_cfqg) 2583 - cfq_put_cfqg(orig_cfqg); 2584 2567 } 2585 2568 2586 2569 /* ··· 3626 3613 3627 3614 put_io_context(RQ_CIC(rq)->ioc); 3628 3615 3629 - rq->elevator_private = NULL; 3630 - rq->elevator_private2 = NULL; 3616 + rq->elevator_private[0] = NULL; 3617 + rq->elevator_private[1] = NULL; 3631 3618 3632 3619 /* Put down rq reference on cfqg */ 3633 3620 cfq_put_cfqg(RQ_CFQG(rq)); 3634 - rq->elevator_private3 = NULL; 3621 + rq->elevator_private[2] = NULL; 3635 3622 3636 3623 cfq_put_queue(cfqq); 3637 3624 } ··· 3718 3705 } 3719 3706 3720 3707 cfqq->allocated[rw]++; 3708 + 3721 3709 cfqq->ref++; 3722 - rq->elevator_private = cic; 3723 - rq->elevator_private2 = cfqq; 3724 - rq->elevator_private3 = cfq_ref_get_cfqg(cfqq->cfqg); 3725 - 3710 + rq->elevator_private[0] = cic; 3711 + rq->elevator_private[1] = cfqq; 3712 + rq->elevator_private[2] = cfq_ref_get_cfqg(cfqq->cfqg); 3726 3713 spin_unlock_irqrestore(q->queue_lock, flags); 3727 - 3728 3714 return 0; 3729 3715 3730 3716 queue_fail: ··· 3965 3953 cfqd->cfq_slice_idle = cfq_slice_idle; 3966 3954 cfqd->cfq_group_idle = cfq_group_idle; 3967 3955 cfqd->cfq_latency = 1; 3968 - cfqd->cfq_group_isolation = 0; 3969 3956 cfqd->hw_tag = -1; 3970 3957 /* 3971 3958 * we optimistically start assuming sync ops weren't delayed in last ··· 4040 4029 SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1); 4041 4030 SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0); 4042 4031 SHOW_FUNCTION(cfq_low_latency_show, cfqd->cfq_latency, 0); 4043 - SHOW_FUNCTION(cfq_group_isolation_show, cfqd->cfq_group_isolation, 0); 4044 
4032 #undef SHOW_FUNCTION 4045 4033 4046 4034 #define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \ ··· 4073 4063 STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1, 4074 4064 UINT_MAX, 0); 4075 4065 STORE_FUNCTION(cfq_low_latency_store, &cfqd->cfq_latency, 0, 1, 0); 4076 - STORE_FUNCTION(cfq_group_isolation_store, &cfqd->cfq_group_isolation, 0, 1, 0); 4077 4066 #undef STORE_FUNCTION 4078 4067 4079 4068 #define CFQ_ATTR(name) \ ··· 4090 4081 CFQ_ATTR(slice_idle), 4091 4082 CFQ_ATTR(group_idle), 4092 4083 CFQ_ATTR(low_latency), 4093 - CFQ_ATTR(group_isolation), 4094 4084 __ATTR_NULL 4095 4085 }; 4096 4086 ··· 4104 4096 .elevator_add_req_fn = cfq_insert_request, 4105 4097 .elevator_activate_req_fn = cfq_activate_request, 4106 4098 .elevator_deactivate_req_fn = cfq_deactivate_request, 4107 - .elevator_queue_empty_fn = cfq_queue_empty, 4108 4099 .elevator_completed_req_fn = cfq_completed_request, 4109 4100 .elevator_former_req_fn = elv_rb_former_request, 4110 4101 .elevator_latter_req_fn = elv_rb_latter_request,
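A short sketch of the deferred weight update introduced above, using the cfq_group fields from this diff (the helper names are illustrative): the blkio callback may fire while the group is still on the service tree, so it only records the request, and the weight is folded in the next time the group is added to the tree.

static void record_new_weight(struct cfq_group *cfqg, unsigned int weight)
{
	cfqg->new_weight = weight;	/* value requested by the cgroup layer */
	cfqg->needs_update = true;	/* apply later, off the service tree */
}

static void apply_new_weight(struct cfq_group *cfqg)
{
	/* caller guarantees the group is off the tree, as in
	 * cfq_update_group_weight() above */
	if (cfqg->needs_update) {
		cfqg->weight = cfqg->new_weight;
		cfqg->needs_update = false;
	}
}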
+3 -3
block/cfq.h
··· 16 16 } 17 17 18 18 static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg, 19 - unsigned long time) 19 + unsigned long time, unsigned long unaccounted_time) 20 20 { 21 - blkiocg_update_timeslice_used(blkg, time); 21 + blkiocg_update_timeslice_used(blkg, time, unaccounted_time); 22 22 } 23 23 24 24 static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg) ··· 85 85 unsigned long dequeue) {} 86 86 87 87 static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg, 88 - unsigned long time) {} 88 + unsigned long time, unsigned long unaccounted_time) {} 89 89 static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg) {} 90 90 static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg, 91 91 bool direction, bool sync) {}
-9
block/deadline-iosched.c
··· 326 326 return 1; 327 327 } 328 328 329 - static int deadline_queue_empty(struct request_queue *q) 330 - { 331 - struct deadline_data *dd = q->elevator->elevator_data; 332 - 333 - return list_empty(&dd->fifo_list[WRITE]) 334 - && list_empty(&dd->fifo_list[READ]); 335 - } 336 - 337 329 static void deadline_exit_queue(struct elevator_queue *e) 338 330 { 339 331 struct deadline_data *dd = e->elevator_data; ··· 437 445 .elevator_merge_req_fn = deadline_merged_requests, 438 446 .elevator_dispatch_fn = deadline_dispatch_requests, 439 447 .elevator_add_req_fn = deadline_add_request, 440 - .elevator_queue_empty_fn = deadline_queue_empty, 441 448 .elevator_former_req_fn = elv_rb_former_request, 442 449 .elevator_latter_req_fn = elv_rb_latter_request, 443 450 .elevator_init_fn = deadline_init_queue,
+64 -44
block/elevator.c
··· 113 113 } 114 114 EXPORT_SYMBOL(elv_rq_merge_ok); 115 115 116 - static inline int elv_try_merge(struct request *__rq, struct bio *bio) 116 + int elv_try_merge(struct request *__rq, struct bio *bio) 117 117 { 118 118 int ret = ELEVATOR_NO_MERGE; 119 119 ··· 421 421 struct list_head *entry; 422 422 int stop_flags; 423 423 424 + BUG_ON(rq->cmd_flags & REQ_ON_PLUG); 425 + 424 426 if (q->last_merge == rq) 425 427 q->last_merge = NULL; 426 428 ··· 521 519 return ELEVATOR_NO_MERGE; 522 520 } 523 521 522 + /* 523 + * Attempt to do an insertion back merge. Only check for the case where 524 + * we can append 'rq' to an existing request, so we can throw 'rq' away 525 + * afterwards. 526 + * 527 + * Returns true if we merged, false otherwise 528 + */ 529 + static bool elv_attempt_insert_merge(struct request_queue *q, 530 + struct request *rq) 531 + { 532 + struct request *__rq; 533 + 534 + if (blk_queue_nomerges(q)) 535 + return false; 536 + 537 + /* 538 + * First try one-hit cache. 539 + */ 540 + if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) 541 + return true; 542 + 543 + if (blk_queue_noxmerges(q)) 544 + return false; 545 + 546 + /* 547 + * See if our hash lookup can find a potential backmerge. 548 + */ 549 + __rq = elv_rqhash_find(q, blk_rq_pos(rq)); 550 + if (__rq && blk_attempt_req_merge(q, __rq, rq)) 551 + return true; 552 + 553 + return false; 554 + } 555 + 524 556 void elv_merged_request(struct request_queue *q, struct request *rq, int type) 525 557 { 526 558 struct elevator_queue *e = q->elevator; ··· 572 536 struct request *next) 573 537 { 574 538 struct elevator_queue *e = q->elevator; 539 + const int next_sorted = next->cmd_flags & REQ_SORTED; 575 540 576 - if (e->ops->elevator_merge_req_fn) 541 + if (next_sorted && e->ops->elevator_merge_req_fn) 577 542 e->ops->elevator_merge_req_fn(q, rq, next); 578 543 579 544 elv_rqhash_reposition(q, rq); 580 - elv_rqhash_del(q, next); 581 545 582 - q->nr_sorted--; 546 + if (next_sorted) { 547 + elv_rqhash_del(q, next); 548 + q->nr_sorted--; 549 + } 550 + 583 551 q->last_merge = rq; 584 552 } 585 553 ··· 657 617 658 618 void elv_insert(struct request_queue *q, struct request *rq, int where) 659 619 { 660 - int unplug_it = 1; 661 - 662 620 trace_block_rq_insert(q, rq); 663 621 664 622 rq->q = q; 665 623 666 624 switch (where) { 667 625 case ELEVATOR_INSERT_REQUEUE: 668 - /* 669 - * Most requeues happen because of a busy condition, 670 - * don't force unplug of the queue for that case. 671 - * Clear unplug_it and fall through. 672 - */ 673 - unplug_it = 0; 674 - 675 626 case ELEVATOR_INSERT_FRONT: 676 627 rq->cmd_flags |= REQ_SOFTBARRIER; 677 628 list_add(&rq->queuelist, &q->queue_head); ··· 685 654 __blk_run_queue(q, false); 686 655 break; 687 656 657 + case ELEVATOR_INSERT_SORT_MERGE: 658 + /* 659 + * If we succeed in merging this request with one in the 660 + * queue already, we are done - rq has now been freed, 661 + * so no need to do anything further. 
662 + */ 663 + if (elv_attempt_insert_merge(q, rq)) 664 + break; 688 665 case ELEVATOR_INSERT_SORT: 689 666 BUG_ON(rq->cmd_type != REQ_TYPE_FS && 690 667 !(rq->cmd_flags & REQ_DISCARD)); ··· 712 673 q->elevator->ops->elevator_add_req_fn(q, rq); 713 674 break; 714 675 676 + case ELEVATOR_INSERT_FLUSH: 677 + rq->cmd_flags |= REQ_SOFTBARRIER; 678 + blk_insert_flush(rq); 679 + break; 715 680 default: 716 681 printk(KERN_ERR "%s: bad insertion point %d\n", 717 682 __func__, where); 718 683 BUG(); 719 684 } 720 - 721 - if (unplug_it && blk_queue_plugged(q)) { 722 - int nrq = q->rq.count[BLK_RW_SYNC] + q->rq.count[BLK_RW_ASYNC] 723 - - queue_in_flight(q); 724 - 725 - if (nrq >= q->unplug_thresh) 726 - __generic_unplug_device(q); 727 - } 728 685 } 729 686 730 - void __elv_add_request(struct request_queue *q, struct request *rq, int where, 731 - int plug) 687 + void __elv_add_request(struct request_queue *q, struct request *rq, int where) 732 688 { 689 + BUG_ON(rq->cmd_flags & REQ_ON_PLUG); 690 + 733 691 if (rq->cmd_flags & REQ_SOFTBARRIER) { 734 692 /* barriers are scheduling boundary, update end_sector */ 735 693 if (rq->cmd_type == REQ_TYPE_FS || ··· 738 702 where == ELEVATOR_INSERT_SORT) 739 703 where = ELEVATOR_INSERT_BACK; 740 704 741 - if (plug) 742 - blk_plug_device(q); 743 - 744 705 elv_insert(q, rq, where); 745 706 } 746 707 EXPORT_SYMBOL(__elv_add_request); 747 708 748 - void elv_add_request(struct request_queue *q, struct request *rq, int where, 749 - int plug) 709 + void elv_add_request(struct request_queue *q, struct request *rq, int where) 750 710 { 751 711 unsigned long flags; 752 712 753 713 spin_lock_irqsave(q->queue_lock, flags); 754 - __elv_add_request(q, rq, where, plug); 714 + __elv_add_request(q, rq, where); 755 715 spin_unlock_irqrestore(q->queue_lock, flags); 756 716 } 757 717 EXPORT_SYMBOL(elv_add_request); 758 - 759 - int elv_queue_empty(struct request_queue *q) 760 - { 761 - struct elevator_queue *e = q->elevator; 762 - 763 - if (!list_empty(&q->queue_head)) 764 - return 0; 765 - 766 - if (e->ops->elevator_queue_empty_fn) 767 - return e->ops->elevator_queue_empty_fn(q); 768 - 769 - return 1; 770 - } 771 - EXPORT_SYMBOL(elv_queue_empty); 772 718 773 719 struct request *elv_latter_request(struct request_queue *q, struct request *rq) 774 720 { ··· 777 759 if (e->ops->elevator_set_req_fn) 778 760 return e->ops->elevator_set_req_fn(q, rq, gfp_mask); 779 761 780 - rq->elevator_private = NULL; 762 + rq->elevator_private[0] = NULL; 781 763 return 0; 782 764 } 783 765 ··· 802 784 void elv_abort_queue(struct request_queue *q) 803 785 { 804 786 struct request *rq; 787 + 788 + blk_abort_flushes(q); 805 789 806 790 while (!list_empty(&q->queue_head)) { 807 791 rq = list_entry_rq(q->queue_head.next);
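For callers of the elevator interface, the change above means the plug argument simply disappears and the insertion point carries all the information. A hedged before/after sketch (requeue_request() is a hypothetical driver helper):

static void requeue_request(struct request_queue *q, struct request *rq)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	/* was: __elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE, 0); */
	__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
	spin_unlock_irqrestore(q->queue_lock, flags);
}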
+9 -9
block/genhd.c
··· 1158 1158 "%u %lu %lu %llu %u %u %u %u\n", 1159 1159 MAJOR(part_devt(hd)), MINOR(part_devt(hd)), 1160 1160 disk_name(gp, hd->partno, buf), 1161 - part_stat_read(hd, ios[0]), 1162 - part_stat_read(hd, merges[0]), 1163 - (unsigned long long)part_stat_read(hd, sectors[0]), 1164 - jiffies_to_msecs(part_stat_read(hd, ticks[0])), 1165 - part_stat_read(hd, ios[1]), 1166 - part_stat_read(hd, merges[1]), 1167 - (unsigned long long)part_stat_read(hd, sectors[1]), 1168 - jiffies_to_msecs(part_stat_read(hd, ticks[1])), 1161 + part_stat_read(hd, ios[READ]), 1162 + part_stat_read(hd, merges[READ]), 1163 + (unsigned long long)part_stat_read(hd, sectors[READ]), 1164 + jiffies_to_msecs(part_stat_read(hd, ticks[READ])), 1165 + part_stat_read(hd, ios[WRITE]), 1166 + part_stat_read(hd, merges[WRITE]), 1167 + (unsigned long long)part_stat_read(hd, sectors[WRITE]), 1168 + jiffies_to_msecs(part_stat_read(hd, ticks[WRITE])), 1169 1169 part_in_flight(hd), 1170 1170 jiffies_to_msecs(part_stat_read(hd, io_ticks)), 1171 1171 jiffies_to_msecs(part_stat_read(hd, time_in_queue)) ··· 1494 1494 void disk_unblock_events(struct gendisk *disk) 1495 1495 { 1496 1496 if (disk->ev) 1497 - __disk_unblock_events(disk, true); 1497 + __disk_unblock_events(disk, false); 1498 1498 } 1499 1499 1500 1500 /**
-8
block/noop-iosched.c
··· 39 39 list_add_tail(&rq->queuelist, &nd->queue); 40 40 } 41 41 42 - static int noop_queue_empty(struct request_queue *q) 43 - { 44 - struct noop_data *nd = q->elevator->elevator_data; 45 - 46 - return list_empty(&nd->queue); 47 - } 48 - 49 42 static struct request * 50 43 noop_former_request(struct request_queue *q, struct request *rq) 51 44 { ··· 83 90 .elevator_merge_req_fn = noop_merged_requests, 84 91 .elevator_dispatch_fn = noop_dispatch, 85 92 .elevator_add_req_fn = noop_add_request, 86 - .elevator_queue_empty_fn = noop_queue_empty, 87 93 .elevator_former_req_fn = noop_former_request, 88 94 .elevator_latter_req_fn = noop_latter_request, 89 95 .elevator_init_fn = noop_init_queue,
+5 -3
drivers/block/DAC960.c
··· 140 140 return 0; 141 141 } 142 142 143 - static int DAC960_media_changed(struct gendisk *disk) 143 + static unsigned int DAC960_check_events(struct gendisk *disk, 144 + unsigned int clearing) 144 145 { 145 146 DAC960_Controller_T *p = disk->queue->queuedata; 146 147 int drive_nr = (long)disk->private_data; 147 148 148 149 if (!p->LogicalDriveInitiallyAccessible[drive_nr]) 149 - return 1; 150 + return DISK_EVENT_MEDIA_CHANGE; 150 151 return 0; 151 152 } 152 153 ··· 164 163 .owner = THIS_MODULE, 165 164 .open = DAC960_open, 166 165 .getgeo = DAC960_getgeo, 167 - .media_changed = DAC960_media_changed, 166 + .check_events = DAC960_check_events, 168 167 .revalidate_disk = DAC960_revalidate_disk, 169 168 }; 170 169 ··· 2547 2546 disk->major = MajorNumber; 2548 2547 disk->first_minor = n << DAC960_MaxPartitionsBits; 2549 2548 disk->fops = &DAC960_BlockDeviceOperations; 2549 + disk->events = DISK_EVENT_MEDIA_CHANGE; 2550 2550 } 2551 2551 /* 2552 2552 Indicate the Block Device Registration completed successfully,
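The DAC960 change above is the template the remaining drivers in this series follow: .media_changed returning 0/1 becomes .check_events returning an event mask, and the disk declares up front which events it can generate. A minimal sketch assuming a simple removable-media driver (the mydrv_* names and structure are hypothetical):

#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/genhd.h>

struct mydrv {				/* hypothetical driver state */
	bool media_changed;
};

static unsigned int mydrv_check_events(struct gendisk *disk,
				       unsigned int clearing)
{
	struct mydrv *p = disk->private_data;

	/* return an event mask instead of the old boolean */
	return p->media_changed ? DISK_EVENT_MEDIA_CHANGE : 0;
}

static const struct block_device_operations mydrv_fops = {
	.owner		= THIS_MODULE,
	.check_events	= mydrv_check_events,	/* replaces .media_changed */
};

static void mydrv_setup_disk(struct gendisk *disk, struct mydrv *p)
{
	disk->private_data = p;
	disk->fops = &mydrv_fops;
	disk->events = DISK_EVENT_MEDIA_CHANGE;	/* events this disk can emit */
}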
+5 -4
drivers/block/amiflop.c
··· 1658 1658 } 1659 1659 1660 1660 /* 1661 - * floppy-change is never called from an interrupt, so we can relax a bit 1661 + * check_events is never called from an interrupt, so we can relax a bit 1662 1662 * here, sleep etc. Note that floppy-on tries to set current_DOR to point 1663 1663 * to the desired drive, but it will probably not survive the sleep if 1664 1664 * several floppies are used at the same time: thus the loop. 1665 1665 */ 1666 - static int amiga_floppy_change(struct gendisk *disk) 1666 + static unsigned amiga_check_events(struct gendisk *disk, unsigned int clearing) 1667 1667 { 1668 1668 struct amiga_floppy_struct *p = disk->private_data; 1669 1669 int drive = p - unit; ··· 1686 1686 p->dirty = 0; 1687 1687 writepending = 0; /* if this was true before, too bad! */ 1688 1688 writefromint = 0; 1689 - return 1; 1689 + return DISK_EVENT_MEDIA_CHANGE; 1690 1690 } 1691 1691 return 0; 1692 1692 } ··· 1697 1697 .release = floppy_release, 1698 1698 .ioctl = fd_ioctl, 1699 1699 .getgeo = fd_getgeo, 1700 - .media_changed = amiga_floppy_change, 1700 + .check_events = amiga_check_events, 1701 1701 }; 1702 1702 1703 1703 static int __init fd_probe_drives(void) ··· 1736 1736 disk->major = FLOPPY_MAJOR; 1737 1737 disk->first_minor = drive; 1738 1738 disk->fops = &floppy_fops; 1739 + disk->events = DISK_EVENT_MEDIA_CHANGE; 1739 1740 sprintf(disk->disk_name, "fd%d", drive); 1740 1741 disk->private_data = &unit[drive]; 1741 1742 set_capacity(disk, 880*2);
+8 -6
drivers/block/ataflop.c
··· 1324 1324 * due to unrecognised disk changes. 1325 1325 */ 1326 1326 1327 - static int check_floppy_change(struct gendisk *disk) 1327 + static unsigned int floppy_check_events(struct gendisk *disk, 1328 + unsigned int clearing) 1328 1329 { 1329 1330 struct atari_floppy_struct *p = disk->private_data; 1330 1331 unsigned int drive = p - unit; 1331 1332 if (test_bit (drive, &fake_change)) { 1332 1333 /* simulated change (e.g. after formatting) */ 1333 - return 1; 1334 + return DISK_EVENT_MEDIA_CHANGE; 1334 1335 } 1335 1336 if (test_bit (drive, &changed_floppies)) { 1336 1337 /* surely changed (the WP signal changed at least once) */ 1337 - return 1; 1338 + return DISK_EVENT_MEDIA_CHANGE; 1338 1339 } 1339 1340 if (UD.wpstat) { 1340 1341 /* WP is on -> could be changed: to be sure, buffers should be 1341 1342 * invalidated... 1342 1343 */ 1343 - return 1; 1344 + return DISK_EVENT_MEDIA_CHANGE; 1344 1345 } 1345 1346 1346 1347 return 0; ··· 1571 1570 * or the next access will revalidate - and clear UDT :-( 1572 1571 */ 1573 1572 1574 - if (check_floppy_change(disk)) 1573 + if (floppy_check_events(disk, 0)) 1575 1574 floppy_revalidate(disk); 1576 1575 1577 1576 if (UD.flags & FTD_MSG) ··· 1905 1904 .open = floppy_unlocked_open, 1906 1905 .release = floppy_release, 1907 1906 .ioctl = fd_ioctl, 1908 - .media_changed = check_floppy_change, 1907 + .check_events = floppy_check_events, 1909 1908 .revalidate_disk= floppy_revalidate, 1910 1909 }; 1911 1910 ··· 1964 1963 unit[i].disk->first_minor = i; 1965 1964 sprintf(unit[i].disk->disk_name, "fd%d", i); 1966 1965 unit[i].disk->fops = &floppy_fops; 1966 + unit[i].disk->events = DISK_EVENT_MEDIA_CHANGE; 1967 1967 unit[i].disk->private_data = &unit[i]; 1968 1968 unit[i].disk->queue = blk_init_queue(do_fd_request, 1969 1969 &ataflop_lock);
-6
drivers/block/cciss.c
··· 3170 3170 int sg_index = 0; 3171 3171 int chained = 0; 3172 3172 3173 - /* We call start_io here in case there is a command waiting on the 3174 - * queue that has not been sent. 3175 - */ 3176 - if (blk_queue_plugged(q)) 3177 - goto startio; 3178 - 3179 3173 queue: 3180 3174 creq = blk_peek_request(q); 3181 3175 if (!creq)
-3
drivers/block/cpqarray.c
··· 911 911 struct scatterlist tmp_sg[SG_MAX]; 912 912 int i, dir, seg; 913 913 914 - if (blk_queue_plugged(q)) 915 - goto startio; 916 - 917 914 queue_next: 918 915 creq = blk_peek_request(q); 919 916 if (!creq)
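With the queue-level plug gone, request functions such as the two above no longer test blk_queue_plugged() before draining the queue. A reduced sketch of the resulting loop (mydrv_request_fn() is hypothetical; error handling omitted):

static void mydrv_request_fn(struct request_queue *q)
{
	struct request *creq;

	while ((creq = blk_peek_request(q)) != NULL) {
		blk_start_request(creq);	/* dequeue and start it */
		/* ...map the request and issue it to the hardware... */
	}
}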
+1 -3
drivers/block/drbd/drbd_actlog.c
··· 80 80 81 81 if ((rw & WRITE) && !test_bit(MD_NO_FUA, &mdev->flags)) 82 82 rw |= REQ_FUA; 83 - rw |= REQ_UNPLUG | REQ_SYNC; 83 + rw |= REQ_SYNC; 84 84 85 85 bio = bio_alloc(GFP_NOIO, 1); 86 86 bio->bi_bdev = bdev->md_bdev; ··· 688 688 submit_bio(WRITE, bios[i]); 689 689 } 690 690 } 691 - 692 - drbd_blk_run_queue(bdev_get_queue(mdev->ldev->md_bdev)); 693 691 694 692 /* always (try to) flush bitmap to stable storage */ 695 693 drbd_md_flush(mdev);
-1
drivers/block/drbd/drbd_bitmap.c
··· 840 840 for (i = 0; i < num_pages; i++) 841 841 bm_page_io_async(mdev, b, i, rw); 842 842 843 - drbd_blk_run_queue(bdev_get_queue(mdev->ldev->md_bdev)); 844 843 wait_event(b->bm_io_wait, atomic_read(&b->bm_async_io) == 0); 845 844 846 845 if (test_bit(BM_MD_IO_ERROR, &b->bm_flags)) {
+1 -15
drivers/block/drbd/drbd_int.h
··· 377 377 #define DP_HARDBARRIER 1 /* depricated */ 378 378 #define DP_RW_SYNC 2 /* equals REQ_SYNC */ 379 379 #define DP_MAY_SET_IN_SYNC 4 380 - #define DP_UNPLUG 8 /* equals REQ_UNPLUG */ 380 + #define DP_UNPLUG 8 /* not used anymore */ 381 381 #define DP_FUA 16 /* equals REQ_FUA */ 382 382 #define DP_FLUSH 32 /* equals REQ_FLUSH */ 383 383 #define DP_DISCARD 64 /* equals REQ_DISCARD */ ··· 2380 2380 #define QUEUE_ORDERED_NONE 0 2381 2381 #endif 2382 2382 return QUEUE_ORDERED_NONE; 2383 - } 2384 - 2385 - static inline void drbd_blk_run_queue(struct request_queue *q) 2386 - { 2387 - if (q && q->unplug_fn) 2388 - q->unplug_fn(q); 2389 - } 2390 - 2391 - static inline void drbd_kick_lo(struct drbd_conf *mdev) 2392 - { 2393 - if (get_ldev(mdev)) { 2394 - drbd_blk_run_queue(bdev_get_queue(mdev->ldev->backing_bdev)); 2395 - put_ldev(mdev); 2396 - } 2397 2383 } 2398 2384 2399 2385 static inline void drbd_md_flush(struct drbd_conf *mdev)
+2 -34
drivers/block/drbd/drbd_main.c
··· 2477 2477 { 2478 2478 if (mdev->agreed_pro_version >= 95) 2479 2479 return (bi_rw & REQ_SYNC ? DP_RW_SYNC : 0) | 2480 - (bi_rw & REQ_UNPLUG ? DP_UNPLUG : 0) | 2481 2480 (bi_rw & REQ_FUA ? DP_FUA : 0) | 2482 2481 (bi_rw & REQ_FLUSH ? DP_FLUSH : 0) | 2483 2482 (bi_rw & REQ_DISCARD ? DP_DISCARD : 0); 2484 2483 else 2485 - return bi_rw & (REQ_SYNC | REQ_UNPLUG) ? DP_RW_SYNC : 0; 2484 + return bi_rw & REQ_SYNC ? DP_RW_SYNC : 0; 2486 2485 } 2487 2486 2488 2487 /* Used to send write requests ··· 2716 2717 mdev->open_cnt--; 2717 2718 mutex_unlock(&drbd_main_mutex); 2718 2719 return 0; 2719 - } 2720 - 2721 - static void drbd_unplug_fn(struct request_queue *q) 2722 - { 2723 - struct drbd_conf *mdev = q->queuedata; 2724 - 2725 - /* unplug FIRST */ 2726 - spin_lock_irq(q->queue_lock); 2727 - blk_remove_plug(q); 2728 - spin_unlock_irq(q->queue_lock); 2729 - 2730 - /* only if connected */ 2731 - spin_lock_irq(&mdev->req_lock); 2732 - if (mdev->state.pdsk >= D_INCONSISTENT && mdev->state.conn >= C_CONNECTED) { 2733 - D_ASSERT(mdev->state.role == R_PRIMARY); 2734 - if (test_and_clear_bit(UNPLUG_REMOTE, &mdev->flags)) { 2735 - /* add to the data.work queue, 2736 - * unless already queued. 2737 - * XXX this might be a good addition to drbd_queue_work 2738 - * anyways, to detect "double queuing" ... */ 2739 - if (list_empty(&mdev->unplug_work.list)) 2740 - drbd_queue_work(&mdev->data.work, 2741 - &mdev->unplug_work); 2742 - } 2743 - } 2744 - spin_unlock_irq(&mdev->req_lock); 2745 - 2746 - if (mdev->state.disk >= D_INCONSISTENT) 2747 - drbd_kick_lo(mdev); 2748 2720 } 2749 2721 2750 2722 static void drbd_set_defaults(struct drbd_conf *mdev) ··· 3192 3222 blk_queue_max_segment_size(q, DRBD_MAX_SEGMENT_SIZE); 3193 3223 blk_queue_bounce_limit(q, BLK_BOUNCE_ANY); 3194 3224 blk_queue_merge_bvec(q, drbd_merge_bvec); 3195 - q->queue_lock = &mdev->req_lock; /* needed since we use */ 3196 - /* plugging on a queue, that actually has no requests! */ 3197 - q->unplug_fn = drbd_unplug_fn; 3225 + q->queue_lock = &mdev->req_lock; 3198 3226 3199 3227 mdev->md_io_page = alloc_page(GFP_KERNEL); 3200 3228 if (!mdev->md_io_page)
+2 -27
drivers/block/drbd/drbd_receiver.c
··· 187 187 return NULL; 188 188 } 189 189 190 - /* kick lower level device, if we have more than (arbitrary number) 191 - * reference counts on it, which typically are locally submitted io 192 - * requests. don't use unacked_cnt, so we speed up proto A and B, too. */ 193 - static void maybe_kick_lo(struct drbd_conf *mdev) 194 - { 195 - if (atomic_read(&mdev->local_cnt) >= mdev->net_conf->unplug_watermark) 196 - drbd_kick_lo(mdev); 197 - } 198 - 199 190 static void reclaim_net_ee(struct drbd_conf *mdev, struct list_head *to_be_freed) 200 191 { 201 192 struct drbd_epoch_entry *e; ··· 210 219 LIST_HEAD(reclaimed); 211 220 struct drbd_epoch_entry *e, *t; 212 221 213 - maybe_kick_lo(mdev); 214 222 spin_lock_irq(&mdev->req_lock); 215 223 reclaim_net_ee(mdev, &reclaimed); 216 224 spin_unlock_irq(&mdev->req_lock); ··· 426 436 while (!list_empty(head)) { 427 437 prepare_to_wait(&mdev->ee_wait, &wait, TASK_UNINTERRUPTIBLE); 428 438 spin_unlock_irq(&mdev->req_lock); 429 - drbd_kick_lo(mdev); 430 - schedule(); 439 + io_schedule(); 431 440 finish_wait(&mdev->ee_wait, &wait); 432 441 spin_lock_irq(&mdev->req_lock); 433 442 } ··· 1100 1111 /* > e->sector, unless this is the first bio */ 1101 1112 bio->bi_sector = sector; 1102 1113 bio->bi_bdev = mdev->ldev->backing_bdev; 1103 - /* we special case some flags in the multi-bio case, see below 1104 - * (REQ_UNPLUG) */ 1105 1114 bio->bi_rw = rw; 1106 1115 bio->bi_private = e; 1107 1116 bio->bi_end_io = drbd_endio_sec; ··· 1128 1141 bios = bios->bi_next; 1129 1142 bio->bi_next = NULL; 1130 1143 1131 - /* strip off REQ_UNPLUG unless it is the last bio */ 1132 - if (bios) 1133 - bio->bi_rw &= ~REQ_UNPLUG; 1134 - 1135 1144 drbd_generic_make_request(mdev, fault_type, bio); 1136 1145 } while (bios); 1137 - maybe_kick_lo(mdev); 1138 1146 return 0; 1139 1147 1140 1148 fail: ··· 1148 1166 struct drbd_epoch *epoch; 1149 1167 1150 1168 inc_unacked(mdev); 1151 - 1152 - if (mdev->net_conf->wire_protocol != DRBD_PROT_C) 1153 - drbd_kick_lo(mdev); 1154 1169 1155 1170 mdev->current_epoch->barrier_nr = p->barrier; 1156 1171 rv = drbd_may_finish_epoch(mdev, mdev->current_epoch, EV_GOT_BARRIER_NR); ··· 1615 1636 { 1616 1637 if (mdev->agreed_pro_version >= 95) 1617 1638 return (dpf & DP_RW_SYNC ? REQ_SYNC : 0) | 1618 - (dpf & DP_UNPLUG ? REQ_UNPLUG : 0) | 1619 1639 (dpf & DP_FUA ? REQ_FUA : 0) | 1620 1640 (dpf & DP_FLUSH ? REQ_FUA : 0) | 1621 1641 (dpf & DP_DISCARD ? REQ_DISCARD : 0); 1622 1642 else 1623 - return dpf & DP_RW_SYNC ? (REQ_SYNC | REQ_UNPLUG) : 0; 1643 + return dpf & DP_RW_SYNC ? REQ_SYNC : 0; 1624 1644 } 1625 1645 1626 1646 /* mirrored write */ ··· 3534 3556 3535 3557 static int receive_UnplugRemote(struct drbd_conf *mdev, enum drbd_packets cmd, unsigned int data_size) 3536 3558 { 3537 - if (mdev->state.disk >= D_INCONSISTENT) 3538 - drbd_kick_lo(mdev); 3539 - 3540 3559 /* Make sure we've acked all the TCP data associated 3541 3560 * with the data requests being unplugged */ 3542 3561 drbd_tcp_quickack(mdev->data.socket);
-4
drivers/block/drbd/drbd_req.c
··· 960 960 bio_endio(req->private_bio, -EIO); 961 961 } 962 962 963 - /* we need to plug ALWAYS since we possibly need to kick lo_dev. 964 - * we plug after submit, so we won't miss an unplug event */ 965 - drbd_plug_device(mdev); 966 - 967 963 return 0; 968 964 969 965 fail_conflicting:
-1
drivers/block/drbd/drbd_worker.c
··· 792 792 * queue (or even the read operations for those packets 793 793 * is not finished by now). Retry in 100ms. */ 794 794 795 - drbd_kick_lo(mdev); 796 795 __set_current_state(TASK_INTERRUPTIBLE); 797 796 schedule_timeout(HZ / 10); 798 797 w = kmalloc(sizeof(struct drbd_work), GFP_ATOMIC);
-18
drivers/block/drbd/drbd_wrappers.h
··· 45 45 generic_make_request(bio); 46 46 } 47 47 48 - static inline void drbd_plug_device(struct drbd_conf *mdev) 49 - { 50 - struct request_queue *q; 51 - q = bdev_get_queue(mdev->this_bdev); 52 - 53 - spin_lock_irq(q->queue_lock); 54 - 55 - /* XXX the check on !blk_queue_plugged is redundant, 56 - * implicitly checked in blk_plug_device */ 57 - 58 - if (!blk_queue_plugged(q)) { 59 - blk_plug_device(q); 60 - del_timer(&q->unplug_timer); 61 - /* unplugging should not happen automatically... */ 62 - } 63 - spin_unlock_irq(q->queue_lock); 64 - } 65 - 66 48 static inline int drbd_crypto_is_hash(struct crypto_tfm *tfm) 67 49 { 68 50 return (crypto_tfm_alg_type(tfm) & CRYPTO_ALG_TYPE_HASH_MASK)
+6 -5
drivers/block/floppy.c
··· 3770 3770 /* 3771 3771 * Check if the disk has been changed or if a change has been faked. 3772 3772 */ 3773 - static int check_floppy_change(struct gendisk *disk) 3773 + static unsigned int floppy_check_events(struct gendisk *disk, 3774 + unsigned int clearing) 3774 3775 { 3775 3776 int drive = (long)disk->private_data; 3776 3777 3777 3778 if (test_bit(FD_DISK_CHANGED_BIT, &UDRS->flags) || 3778 3779 test_bit(FD_VERIFY_BIT, &UDRS->flags)) 3779 - return 1; 3780 + return DISK_EVENT_MEDIA_CHANGE; 3780 3781 3781 3782 if (time_after(jiffies, UDRS->last_checked + UDP->checkfreq)) { 3782 3783 lock_fdc(drive, false); ··· 3789 3788 test_bit(FD_VERIFY_BIT, &UDRS->flags) || 3790 3789 test_bit(drive, &fake_change) || 3791 3790 drive_no_geom(drive)) 3792 - return 1; 3791 + return DISK_EVENT_MEDIA_CHANGE; 3793 3792 return 0; 3794 3793 } 3795 3794 ··· 3838 3837 bio.bi_end_io = floppy_rb0_complete; 3839 3838 3840 3839 submit_bio(READ, &bio); 3841 - generic_unplug_device(bdev_get_queue(bdev)); 3842 3840 process_fd_request(); 3843 3841 wait_for_completion(&complete); 3844 3842 ··· 3898 3898 .release = floppy_release, 3899 3899 .ioctl = fd_ioctl, 3900 3900 .getgeo = fd_getgeo, 3901 - .media_changed = check_floppy_change, 3901 + .check_events = floppy_check_events, 3902 3902 .revalidate_disk = floppy_revalidate, 3903 3903 }; 3904 3904 ··· 4205 4205 disks[dr]->major = FLOPPY_MAJOR; 4206 4206 disks[dr]->first_minor = TOMINOR(dr); 4207 4207 disks[dr]->fops = &floppy_fops; 4208 + disks[dr]->events = DISK_EVENT_MEDIA_CHANGE; 4208 4209 sprintf(disks[dr]->disk_name, "fd%d", dr); 4209 4210 4210 4211 init_timer(&motor_off_timer[dr]);
-16
drivers/block/loop.c
··· 540 540 return 0; 541 541 } 542 542 543 - /* 544 - * kick off io on the underlying address space 545 - */ 546 - static void loop_unplug(struct request_queue *q) 547 - { 548 - struct loop_device *lo = q->queuedata; 549 - 550 - queue_flag_clear_unlocked(QUEUE_FLAG_PLUGGED, q); 551 - blk_run_address_space(lo->lo_backing_file->f_mapping); 552 - } 553 - 554 543 struct switch_request { 555 544 struct file *file; 556 545 struct completion wait; ··· 906 917 */ 907 918 blk_queue_make_request(lo->lo_queue, loop_make_request); 908 919 lo->lo_queue->queuedata = lo; 909 - lo->lo_queue->unplug_fn = loop_unplug; 910 920 911 921 if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync) 912 922 blk_queue_flush(lo->lo_queue, REQ_FLUSH); ··· 1007 1019 1008 1020 kthread_stop(lo->lo_thread); 1009 1021 1010 - lo->lo_queue->unplug_fn = NULL; 1011 1022 lo->lo_backing_file = NULL; 1012 1023 1013 1024 loop_release_xfer(lo); ··· 1623 1636 1624 1637 static void loop_free(struct loop_device *lo) 1625 1638 { 1626 - if (!lo->lo_queue->queue_lock) 1627 - lo->lo_queue->queue_lock = &lo->lo_queue->__queue_lock; 1628 - 1629 1639 blk_cleanup_queue(lo->lo_queue); 1630 1640 put_disk(lo->lo_disk); 1631 1641 list_del(&lo->lo_list);
+11 -7
drivers/block/paride/pcd.c
··· 172 172 static int pcd_open(struct cdrom_device_info *cdi, int purpose); 173 173 static void pcd_release(struct cdrom_device_info *cdi); 174 174 static int pcd_drive_status(struct cdrom_device_info *cdi, int slot_nr); 175 - static int pcd_media_changed(struct cdrom_device_info *cdi, int slot_nr); 175 + static unsigned int pcd_check_events(struct cdrom_device_info *cdi, 176 + unsigned int clearing, int slot_nr); 176 177 static int pcd_tray_move(struct cdrom_device_info *cdi, int position); 177 178 static int pcd_lock_door(struct cdrom_device_info *cdi, int lock); 178 179 static int pcd_drive_reset(struct cdrom_device_info *cdi); ··· 258 257 return ret; 259 258 } 260 259 261 - static int pcd_block_media_changed(struct gendisk *disk) 260 + static unsigned int pcd_block_check_events(struct gendisk *disk, 261 + unsigned int clearing) 262 262 { 263 263 struct pcd_unit *cd = disk->private_data; 264 - return cdrom_media_changed(&cd->info); 264 + return cdrom_check_events(&cd->info, clearing); 265 265 } 266 266 267 267 static const struct block_device_operations pcd_bdops = { ··· 270 268 .open = pcd_block_open, 271 269 .release = pcd_block_release, 272 270 .ioctl = pcd_block_ioctl, 273 - .media_changed = pcd_block_media_changed, 271 + .check_events = pcd_block_check_events, 274 272 }; 275 273 276 274 static struct cdrom_device_ops pcd_dops = { 277 275 .open = pcd_open, 278 276 .release = pcd_release, 279 277 .drive_status = pcd_drive_status, 280 - .media_changed = pcd_media_changed, 278 + .check_events = pcd_check_events, 281 279 .tray_move = pcd_tray_move, 282 280 .lock_door = pcd_lock_door, 283 281 .get_mcn = pcd_get_mcn, ··· 320 318 disk->first_minor = unit; 321 319 strcpy(disk->disk_name, cd->name); /* umm... */ 322 320 disk->fops = &pcd_bdops; 321 + disk->events = DISK_EVENT_MEDIA_CHANGE; 323 322 } 324 323 } 325 324 ··· 505 502 506 503 #define DBMSG(msg) ((verbose>1)?(msg):NULL) 507 504 508 - static int pcd_media_changed(struct cdrom_device_info *cdi, int slot_nr) 505 + static unsigned int pcd_check_events(struct cdrom_device_info *cdi, 506 + unsigned int clearing, int slot_nr) 509 507 { 510 508 struct pcd_unit *cd = cdi->handle; 511 509 int res = cd->changed; 512 510 if (res) 513 511 cd->changed = 0; 514 - return res; 512 + return res ? DISK_EVENT_MEDIA_CHANGE : 0; 515 513 } 516 514 517 515 static int pcd_lock_door(struct cdrom_device_info *cdi, int lock)
+4 -3
drivers/block/paride/pd.c
··· 794 794 return 0; 795 795 } 796 796 797 - static int pd_check_media(struct gendisk *p) 797 + static unsigned int pd_check_events(struct gendisk *p, unsigned int clearing) 798 798 { 799 799 struct pd_unit *disk = p->private_data; 800 800 int r; ··· 803 803 pd_special_command(disk, pd_media_check); 804 804 r = disk->changed; 805 805 disk->changed = 0; 806 - return r; 806 + return r ? DISK_EVENT_MEDIA_CHANGE : 0; 807 807 } 808 808 809 809 static int pd_revalidate(struct gendisk *p) ··· 822 822 .release = pd_release, 823 823 .ioctl = pd_ioctl, 824 824 .getgeo = pd_getgeo, 825 - .media_changed = pd_check_media, 825 + .check_events = pd_check_events, 826 826 .revalidate_disk= pd_revalidate 827 827 }; 828 828 ··· 837 837 p->fops = &pd_fops; 838 838 p->major = major; 839 839 p->first_minor = (disk - pd) << PD_BITS; 840 + p->events = DISK_EVENT_MEDIA_CHANGE; 840 841 disk->gd = p; 841 842 p->private_data = disk; 842 843 p->queue = pd_queue;
+6 -4
drivers/block/paride/pf.c
··· 243 243 static int pf_identify(struct pf_unit *pf); 244 244 static void pf_lock(struct pf_unit *pf, int func); 245 245 static void pf_eject(struct pf_unit *pf); 246 - static int pf_check_media(struct gendisk *disk); 246 + static unsigned int pf_check_events(struct gendisk *disk, 247 + unsigned int clearing); 247 248 248 249 static char pf_scratch[512]; /* scratch block buffer */ 249 250 ··· 271 270 .release = pf_release, 272 271 .ioctl = pf_ioctl, 273 272 .getgeo = pf_getgeo, 274 - .media_changed = pf_check_media, 273 + .check_events = pf_check_events, 275 274 }; 276 275 277 276 static void __init pf_init_units(void) ··· 294 293 disk->first_minor = unit; 295 294 strcpy(disk->disk_name, pf->name); 296 295 disk->fops = &pf_fops; 296 + disk->events = DISK_EVENT_MEDIA_CHANGE; 297 297 if (!(*drives[unit])[D_PRT]) 298 298 pf_drive_count++; 299 299 } ··· 379 377 380 378 } 381 379 382 - static int pf_check_media(struct gendisk *disk) 380 + static unsigned int pf_check_events(struct gendisk *disk, unsigned int clearing) 383 381 { 384 - return 1; 382 + return DISK_EVENT_MEDIA_CHANGE; 385 383 } 386 384 387 385 static inline int status_reg(struct pf_unit *pf)
+9 -6
drivers/block/pktcdvd.c
··· 1606 1606 min_sleep_time = pkt->sleep_time; 1607 1607 } 1608 1608 1609 - generic_unplug_device(bdev_get_queue(pd->bdev)); 1610 - 1611 1609 VPRINTK("kcdrwd: sleeping\n"); 1612 1610 residue = schedule_timeout(min_sleep_time); 1613 1611 VPRINTK("kcdrwd: wake up\n"); ··· 2794 2796 return ret; 2795 2797 } 2796 2798 2797 - static int pkt_media_changed(struct gendisk *disk) 2799 + static unsigned int pkt_check_events(struct gendisk *disk, 2800 + unsigned int clearing) 2798 2801 { 2799 2802 struct pktcdvd_device *pd = disk->private_data; 2800 2803 struct gendisk *attached_disk; ··· 2805 2806 if (!pd->bdev) 2806 2807 return 0; 2807 2808 attached_disk = pd->bdev->bd_disk; 2808 - if (!attached_disk) 2809 + if (!attached_disk || !attached_disk->fops->check_events) 2809 2810 return 0; 2810 - return attached_disk->fops->media_changed(attached_disk); 2811 + return attached_disk->fops->check_events(attached_disk, clearing); 2811 2812 } 2812 2813 2813 2814 static const struct block_device_operations pktcdvd_ops = { ··· 2815 2816 .open = pkt_open, 2816 2817 .release = pkt_close, 2817 2818 .ioctl = pkt_ioctl, 2818 - .media_changed = pkt_media_changed, 2819 + .check_events = pkt_check_events, 2819 2820 }; 2820 2821 2821 2822 static char *pktcdvd_devnode(struct gendisk *gd, mode_t *mode) ··· 2887 2888 ret = pkt_new_dev(pd, dev); 2888 2889 if (ret) 2889 2890 goto out_new_dev; 2891 + 2892 + /* inherit events of the host device */ 2893 + disk->events = pd->bdev->bd_disk->events; 2894 + disk->async_events = pd->bdev->bd_disk->async_events; 2890 2895 2891 2896 add_disk(disk); 2892 2897
+5 -3
drivers/block/swim.c
··· 741 741 return 0; 742 742 } 743 743 744 - static int floppy_check_change(struct gendisk *disk) 744 + static unsigned int floppy_check_events(struct gendisk *disk, 745 + unsigned int clearing) 745 746 { 746 747 struct floppy_state *fs = disk->private_data; 747 748 748 - return fs->ejected; 749 + return fs->ejected ? DISK_EVENT_MEDIA_CHANGE : 0; 749 750 } 750 751 751 752 static int floppy_revalidate(struct gendisk *disk) ··· 773 772 .release = floppy_release, 774 773 .ioctl = floppy_ioctl, 775 774 .getgeo = floppy_getgeo, 776 - .media_changed = floppy_check_change, 775 + .check_events = floppy_check_events, 777 776 .revalidate_disk = floppy_revalidate, 778 777 }; 779 778 ··· 858 857 swd->unit[drive].disk->first_minor = drive; 859 858 sprintf(swd->unit[drive].disk->disk_name, "fd%d", drive); 860 859 swd->unit[drive].disk->fops = &floppy_fops; 860 + swd->unit[drive].disk->events = DISK_EVENT_MEDIA_CHANGE; 861 861 swd->unit[drive].disk->private_data = &swd->unit[drive]; 862 862 swd->unit[drive].disk->queue = swd->queue; 863 863 set_capacity(swd->unit[drive].disk, 2880);
+7 -4
drivers/block/swim3.c
··· 250 250 unsigned int cmd, unsigned long param); 251 251 static int floppy_open(struct block_device *bdev, fmode_t mode); 252 252 static int floppy_release(struct gendisk *disk, fmode_t mode); 253 - static int floppy_check_change(struct gendisk *disk); 253 + static unsigned int floppy_check_events(struct gendisk *disk, 254 + unsigned int clearing); 254 255 static int floppy_revalidate(struct gendisk *disk); 255 256 256 257 static bool swim3_end_request(int err, unsigned int nr_bytes) ··· 976 975 return 0; 977 976 } 978 977 979 - static int floppy_check_change(struct gendisk *disk) 978 + static unsigned int floppy_check_events(struct gendisk *disk, 979 + unsigned int clearing) 980 980 { 981 981 struct floppy_state *fs = disk->private_data; 982 - return fs->ejected; 982 + return fs->ejected ? DISK_EVENT_MEDIA_CHANGE : 0; 983 983 } 984 984 985 985 static int floppy_revalidate(struct gendisk *disk) ··· 1027 1025 .open = floppy_unlocked_open, 1028 1026 .release = floppy_release, 1029 1027 .ioctl = floppy_ioctl, 1030 - .media_changed = floppy_check_change, 1028 + .check_events = floppy_check_events, 1031 1029 .revalidate_disk= floppy_revalidate, 1032 1030 }; 1033 1031 ··· 1163 1161 disk->major = FLOPPY_MAJOR; 1164 1162 disk->first_minor = i; 1165 1163 disk->fops = &floppy_fops; 1164 + disk->events = DISK_EVENT_MEDIA_CHANGE; 1166 1165 disk->private_data = &floppy_states[i]; 1167 1166 disk->queue = swim3_queue; 1168 1167 disk->flags |= GENHD_FL_REMOVABLE;
+6 -4
drivers/block/ub.c
··· 1788 1788 * 1789 1789 * The return code is bool! 1790 1790 */ 1791 - static int ub_bd_media_changed(struct gendisk *disk) 1791 + static unsigned int ub_bd_check_events(struct gendisk *disk, 1792 + unsigned int clearing) 1792 1793 { 1793 1794 struct ub_lun *lun = disk->private_data; 1794 1795 ··· 1807 1806 */ 1808 1807 if (ub_sync_tur(lun->udev, lun) != 0) { 1809 1808 lun->changed = 1; 1810 - return 1; 1809 + return DISK_EVENT_MEDIA_CHANGE; 1811 1810 } 1812 1811 1813 - return lun->changed; 1812 + return lun->changed ? DISK_EVENT_MEDIA_CHANGE : 0; 1814 1813 } 1815 1814 1816 1815 static const struct block_device_operations ub_bd_fops = { ··· 1818 1817 .open = ub_bd_unlocked_open, 1819 1818 .release = ub_bd_release, 1820 1819 .ioctl = ub_bd_ioctl, 1821 - .media_changed = ub_bd_media_changed, 1820 + .check_events = ub_bd_check_events, 1822 1821 .revalidate_disk = ub_bd_revalidate, 1823 1822 }; 1824 1823 ··· 2334 2333 disk->major = UB_MAJOR; 2335 2334 disk->first_minor = lun->id * UB_PARTS_PER_LUN; 2336 2335 disk->fops = &ub_bd_fops; 2336 + disk->events = DISK_EVENT_MEDIA_CHANGE; 2337 2337 disk->private_data = lun; 2338 2338 disk->driverfs_dev = &sc->intf->dev; 2339 2339
+1 -25
drivers/block/umem.c
··· 241 241 * 242 242 * Whenever IO on the active page completes, the Ready page is activated 243 243 * and the ex-Active page is clean out and made Ready. 244 - * Otherwise the Ready page is only activated when it becomes full, or 245 - * when mm_unplug_device is called via the unplug_io_fn. 244 + * Otherwise the Ready page is only activated when it becomes full. 246 245 * 247 246 * If a request arrives while both pages a full, it is queued, and b_rdev is 248 247 * overloaded to record whether it was a read or a write. ··· 330 331 page->headcnt = 0; 331 332 page->bio = NULL; 332 333 page->biotail = &page->bio; 333 - } 334 - 335 - static void mm_unplug_device(struct request_queue *q) 336 - { 337 - struct cardinfo *card = q->queuedata; 338 - unsigned long flags; 339 - 340 - spin_lock_irqsave(&card->lock, flags); 341 - if (blk_remove_plug(q)) 342 - activate(card); 343 - spin_unlock_irqrestore(&card->lock, flags); 344 334 } 345 335 346 336 /* ··· 523 535 *card->biotail = bio; 524 536 bio->bi_next = NULL; 525 537 card->biotail = &bio->bi_next; 526 - blk_plug_device(q); 527 538 spin_unlock_irq(&card->lock); 528 539 529 540 return 0; ··· 766 779 return 0; 767 780 } 768 781 769 - /* 770 - * Future support for removable devices 771 - */ 772 - static int mm_check_change(struct gendisk *disk) 773 - { 774 - /* struct cardinfo *dev = disk->private_data; */ 775 - return 0; 776 - } 777 - 778 782 static const struct block_device_operations mm_fops = { 779 783 .owner = THIS_MODULE, 780 784 .getgeo = mm_getgeo, 781 785 .revalidate_disk = mm_revalidate, 782 - .media_changed = mm_check_change, 783 786 }; 784 787 785 788 static int __devinit mm_pci_probe(struct pci_dev *dev, ··· 884 907 blk_queue_make_request(card->queue, mm_make_request); 885 908 card->queue->queue_lock = &card->lock; 886 909 card->queue->queuedata = card; 887 - card->queue->unplug_fn = mm_unplug_device; 888 910 889 911 tasklet_init(&card->tasklet, process_page, (unsigned long)card); 890 912
+5 -4
drivers/block/xsysace.c
··· 867 867 } 868 868 } 869 869 870 - static int ace_media_changed(struct gendisk *gd) 870 + static unsigned int ace_check_events(struct gendisk *gd, unsigned int clearing) 871 871 { 872 872 struct ace_device *ace = gd->private_data; 873 - dev_dbg(ace->dev, "ace_media_changed(): %i\n", ace->media_change); 873 + dev_dbg(ace->dev, "ace_check_events(): %i\n", ace->media_change); 874 874 875 - return ace->media_change; 875 + return ace->media_change ? DISK_EVENT_MEDIA_CHANGE : 0; 876 876 } 877 877 878 878 static int ace_revalidate_disk(struct gendisk *gd) ··· 953 953 .owner = THIS_MODULE, 954 954 .open = ace_open, 955 955 .release = ace_release, 956 - .media_changed = ace_media_changed, 956 + .check_events = ace_check_events, 957 957 .revalidate_disk = ace_revalidate_disk, 958 958 .getgeo = ace_getgeo, 959 959 }; ··· 1005 1005 ace->gd->major = ace_major; 1006 1006 ace->gd->first_minor = ace->id * ACE_NUM_MINORS; 1007 1007 ace->gd->fops = &ace_fops; 1008 + ace->gd->events = DISK_EVENT_MEDIA_CHANGE; 1008 1009 ace->gd->queue = ace->queue; 1009 1010 ace->gd->private_data = ace; 1010 1011 snprintf(ace->gd->disk_name, 32, "xs%c", ace->id + 'a');
+10 -6
drivers/cdrom/gdrom.c
··· 395 395 return CDS_NO_INFO; 396 396 } 397 397 398 - static int gdrom_mediachanged(struct cdrom_device_info *cd_info, int ignore) 398 + static unsigned int gdrom_check_events(struct cdrom_device_info *cd_info, 399 + unsigned int clearing, int ignore) 399 400 { 400 401 /* check the sense key */ 401 - return (__raw_readb(GDROM_ERROR_REG) & 0xF0) == 0x60; 402 + return (__raw_readb(GDROM_ERROR_REG) & 0xF0) == 0x60 ? 403 + DISK_EVENT_MEDIA_CHANGE : 0; 402 404 } 403 405 404 406 /* reset the G1 bus */ ··· 485 483 .open = gdrom_open, 486 484 .release = gdrom_release, 487 485 .drive_status = gdrom_drivestatus, 488 - .media_changed = gdrom_mediachanged, 486 + .check_events = gdrom_check_events, 489 487 .get_last_session = gdrom_get_last_session, 490 488 .reset = gdrom_hardreset, 491 489 .audio_ioctl = gdrom_audio_ioctl, ··· 511 509 return 0; 512 510 } 513 511 514 - static int gdrom_bdops_mediachanged(struct gendisk *disk) 512 + static unsigned int gdrom_bdops_check_events(struct gendisk *disk, 513 + unsigned int clearing) 515 514 { 516 - return cdrom_media_changed(gd.cd_info); 515 + return cdrom_check_events(gd.cd_info, clearing); 517 516 } 518 517 519 518 static int gdrom_bdops_ioctl(struct block_device *bdev, fmode_t mode, ··· 533 530 .owner = THIS_MODULE, 534 531 .open = gdrom_bdops_open, 535 532 .release = gdrom_bdops_release, 536 - .media_changed = gdrom_bdops_mediachanged, 533 + .check_events = gdrom_bdops_check_events, 537 534 .ioctl = gdrom_bdops_ioctl, 538 535 }; 539 536 ··· 803 800 goto probe_fail_cdrom_register; 804 801 } 805 802 gd.disk->fops = &gdrom_bdops; 803 + gd.disk->events = DISK_EVENT_MEDIA_CHANGE; 806 804 /* latch on to the interrupt */ 807 805 err = gdrom_set_interrupt_handlers(); 808 806 if (err)
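
For CD-ROM drivers the conversion happens at two levels, as gdrom shows: the block_device_operations hook on the gendisk forwards to cdrom_check_events(), and the cdrom_device_ops method underneath now also returns a DISK_EVENT_* mask. A sketch of the forwarding shim, with a hypothetical bar_* driver:

#include <linux/cdrom.h>
#include <linux/genhd.h>

struct bar_cd {
	struct cdrom_device_info cdi;	/* registered with the cdrom layer */
};

static unsigned int bar_bdops_check_events(struct gendisk *disk,
					   unsigned int clearing)
{
	struct bar_cd *cd = disk->private_data;

	/* the cdrom midlayer calls our cdrom_device_ops ->check_events */
	return cdrom_check_events(&cd->cdi, clearing);
}
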
+10 -7
drivers/cdrom/viocd.c
··· 186 186 return ret; 187 187 } 188 188 189 - static int viocd_blk_media_changed(struct gendisk *disk) 189 + static unsigned int viocd_blk_check_events(struct gendisk *disk, 190 + unsigned int clearing) 190 191 { 191 192 struct disk_info *di = disk->private_data; 192 - return cdrom_media_changed(&di->viocd_info); 193 + return cdrom_check_events(&di->viocd_info, clearing); 193 194 } 194 195 195 196 static const struct block_device_operations viocd_fops = { ··· 198 197 .open = viocd_blk_open, 199 198 .release = viocd_blk_release, 200 199 .ioctl = viocd_blk_ioctl, 201 - .media_changed = viocd_blk_media_changed, 200 + .check_events = viocd_blk_check_events, 202 201 }; 203 202 204 203 static int viocd_open(struct cdrom_device_info *cdi, int purpose) ··· 321 320 } 322 321 } 323 322 324 - static int viocd_media_changed(struct cdrom_device_info *cdi, int disc_nr) 323 + static unsigned int viocd_check_events(struct cdrom_device_info *cdi, 324 + unsigned int clearing, int disc_nr) 325 325 { 326 326 struct viocd_waitevent we; 327 327 HvLpEvent_Rc hvrc; ··· 342 340 if (hvrc != 0) { 343 341 pr_warning("bad rc on HvCallEvent_signalLpEventFast %d\n", 344 342 (int)hvrc); 345 - return -EIO; 343 + return 0; 346 344 } 347 345 348 346 wait_for_completion(&we.com); ··· 356 354 return 0; 357 355 } 358 356 359 - return we.changed; 357 + return we.changed ? DISK_EVENT_MEDIA_CHANGE : 0; 360 358 } 361 359 362 360 static int viocd_lock_door(struct cdrom_device_info *cdi, int locking) ··· 552 550 static struct cdrom_device_ops viocd_dops = { 553 551 .open = viocd_open, 554 552 .release = viocd_release, 555 - .media_changed = viocd_media_changed, 553 + .check_events = viocd_check_events, 556 554 .lock_door = viocd_lock_door, 557 555 .generic_packet = viocd_packet, 558 556 .audio_ioctl = viocd_audio_ioctl, ··· 626 624 gendisk->queue = q; 627 625 gendisk->fops = &viocd_fops; 628 626 gendisk->flags = GENHD_FL_CD|GENHD_FL_REMOVABLE; 627 + gendisk->events = DISK_EVENT_MEDIA_CHANGE; 629 628 set_capacity(gendisk, 0); 630 629 gendisk->private_data = d; 631 630 d->viocd_disk = gendisk;
+1 -2
drivers/ide/ide-atapi.c
··· 233 233 234 234 drive->hwif->rq = NULL; 235 235 236 - elv_add_request(drive->queue, &drive->sense_rq, 237 - ELEVATOR_INSERT_FRONT, 0); 236 + elv_add_request(drive->queue, &drive->sense_rq, ELEVATOR_INSERT_FRONT); 238 237 return 0; 239 238 } 240 239 EXPORT_SYMBOL_GPL(ide_queue_sense_rq);
+8 -15
drivers/ide/ide-cd.c
··· 258 258 if (time_after(jiffies, info->write_timeout)) 259 259 return 0; 260 260 else { 261 - struct request_queue *q = drive->queue; 262 - unsigned long flags; 263 - 264 261 /* 265 - * take a breather relying on the unplug timer to kick us again 262 + * take a breather 266 263 */ 267 - 268 - spin_lock_irqsave(q->queue_lock, flags); 269 - blk_plug_device(q); 270 - spin_unlock_irqrestore(q->queue_lock, flags); 271 - 264 + blk_delay_queue(drive->queue, 1); 272 265 return 1; 273 266 } 274 267 } ··· 1170 1177 .open = ide_cdrom_open_real, 1171 1178 .release = ide_cdrom_release_real, 1172 1179 .drive_status = ide_cdrom_drive_status, 1173 - .media_changed = ide_cdrom_check_media_change_real, 1180 + .check_events = ide_cdrom_check_events_real, 1174 1181 .tray_move = ide_cdrom_tray_move, 1175 1182 .lock_door = ide_cdrom_lock_door, 1176 1183 .select_speed = ide_cdrom_select_speed, ··· 1507 1514 blk_queue_dma_alignment(q, 31); 1508 1515 blk_queue_update_dma_pad(q, 15); 1509 1516 1510 - q->unplug_delay = max((1 * HZ) / 1000, 1); 1511 - 1512 1517 drive->dev_flags |= IDE_DFLAG_MEDIA_CHANGED; 1513 1518 drive->atapi_flags = IDE_AFLAG_NO_EJECT | ide_cd_flags(id); 1514 1519 ··· 1693 1702 } 1694 1703 1695 1704 1696 - static int idecd_media_changed(struct gendisk *disk) 1705 + static unsigned int idecd_check_events(struct gendisk *disk, 1706 + unsigned int clearing) 1697 1707 { 1698 1708 struct cdrom_info *info = ide_drv_g(disk, cdrom_info); 1699 - return cdrom_media_changed(&info->devinfo); 1709 + return cdrom_check_events(&info->devinfo, clearing); 1700 1710 } 1701 1711 1702 1712 static int idecd_revalidate_disk(struct gendisk *disk) ··· 1715 1723 .open = idecd_open, 1716 1724 .release = idecd_release, 1717 1725 .ioctl = idecd_ioctl, 1718 - .media_changed = idecd_media_changed, 1726 + .check_events = idecd_check_events, 1719 1727 .revalidate_disk = idecd_revalidate_disk 1720 1728 }; 1721 1729 ··· 1782 1790 ide_cd_read_toc(drive, &sense); 1783 1791 g->fops = &idecd_ops; 1784 1792 g->flags |= GENHD_FL_REMOVABLE; 1793 + g->events = DISK_EVENT_MEDIA_CHANGE; 1785 1794 add_disk(g); 1786 1795 return 0; 1787 1796
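
The ide-cd "breather" above is the template for the plug-based back-offs removed throughout this series: instead of plugging the queue and waiting for the unplug timer, a driver that temporarily cannot make progress asks the block layer to re-run the queue after a delay. A hedged sketch (example_* names are illustrative):

#include <linux/blkdev.h>

/* returns nonzero if the caller should give up for now; the queue will
 * be re-run for us roughly one millisecond later */
static int example_backoff(struct request_queue *q, bool hw_busy)
{
	if (hw_busy) {
		blk_delay_queue(q, 1);
		return 1;
	}
	return 0;
}
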
+2 -1
drivers/ide/ide-cd.h
··· 111 111 int ide_cdrom_open_real(struct cdrom_device_info *, int); 112 112 void ide_cdrom_release_real(struct cdrom_device_info *); 113 113 int ide_cdrom_drive_status(struct cdrom_device_info *, int); 114 - int ide_cdrom_check_media_change_real(struct cdrom_device_info *, int); 114 + unsigned int ide_cdrom_check_events_real(struct cdrom_device_info *, 115 + unsigned int clearing, int slot_nr); 115 116 int ide_cdrom_tray_move(struct cdrom_device_info *, int); 116 117 int ide_cdrom_lock_door(struct cdrom_device_info *, int); 117 118 int ide_cdrom_select_speed(struct cdrom_device_info *, int);
+4 -4
drivers/ide/ide-cd_ioctl.c
··· 79 79 return CDS_DRIVE_NOT_READY; 80 80 } 81 81 82 - int ide_cdrom_check_media_change_real(struct cdrom_device_info *cdi, 83 - int slot_nr) 82 + unsigned int ide_cdrom_check_events_real(struct cdrom_device_info *cdi, 83 + unsigned int clearing, int slot_nr) 84 84 { 85 85 ide_drive_t *drive = cdi->handle; 86 86 int retval; ··· 89 89 (void) cdrom_check_status(drive, NULL); 90 90 retval = (drive->dev_flags & IDE_DFLAG_MEDIA_CHANGED) ? 1 : 0; 91 91 drive->dev_flags &= ~IDE_DFLAG_MEDIA_CHANGED; 92 - return retval; 92 + return retval ? DISK_EVENT_MEDIA_CHANGE : 0; 93 93 } else { 94 - return -EINVAL; 94 + return 0; 95 95 } 96 96 } 97 97
+8 -6
drivers/ide/ide-gd.c
··· 285 285 return 0; 286 286 } 287 287 288 - static int ide_gd_media_changed(struct gendisk *disk) 288 + static unsigned int ide_gd_check_events(struct gendisk *disk, 289 + unsigned int clearing) 289 290 { 290 291 struct ide_disk_obj *idkp = ide_drv_g(disk, ide_disk_obj); 291 292 ide_drive_t *drive = idkp->drive; 292 - int ret; 293 + bool ret; 293 294 294 295 /* do not scan partitions twice if this is a removable device */ 295 296 if (drive->dev_flags & IDE_DFLAG_ATTACH) { ··· 298 297 return 0; 299 298 } 300 299 301 - ret = !!(drive->dev_flags & IDE_DFLAG_MEDIA_CHANGED); 300 + ret = drive->dev_flags & IDE_DFLAG_MEDIA_CHANGED; 302 301 drive->dev_flags &= ~IDE_DFLAG_MEDIA_CHANGED; 303 302 304 - return ret; 303 + return ret ? DISK_EVENT_MEDIA_CHANGE : 0; 305 304 } 306 305 307 306 static void ide_gd_unlock_native_capacity(struct gendisk *disk) ··· 319 318 struct ide_disk_obj *idkp = ide_drv_g(disk, ide_disk_obj); 320 319 ide_drive_t *drive = idkp->drive; 321 320 322 - if (ide_gd_media_changed(disk)) 321 + if (ide_gd_check_events(disk, 0)) 323 322 drive->disk_ops->get_capacity(drive); 324 323 325 324 set_capacity(disk, ide_gd_capacity(drive)); ··· 341 340 .release = ide_gd_release, 342 341 .ioctl = ide_gd_ioctl, 343 342 .getgeo = ide_gd_getgeo, 344 - .media_changed = ide_gd_media_changed, 343 + .check_events = ide_gd_check_events, 345 344 .unlock_native_capacity = ide_gd_unlock_native_capacity, 346 345 .revalidate_disk = ide_gd_revalidate_disk 347 346 }; ··· 413 412 if (drive->dev_flags & IDE_DFLAG_REMOVABLE) 414 413 g->flags = GENHD_FL_REMOVABLE; 415 414 g->fops = &ide_gd_ops; 415 + g->events = DISK_EVENT_MEDIA_CHANGE; 416 416 add_disk(g); 417 417 return 0; 418 418
-4
drivers/ide/ide-io.c
··· 549 549 550 550 if (rq) 551 551 blk_requeue_request(q, rq); 552 - if (!elv_queue_empty(q)) 553 - blk_plug_device(q); 554 552 } 555 553 556 554 void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq) ··· 560 562 561 563 if (rq) 562 564 blk_requeue_request(q, rq); 563 - if (!elv_queue_empty(q)) 564 - blk_plug_device(q); 565 565 566 566 spin_unlock_irqrestore(q->queue_lock, flags); 567 567 }
+1 -1
drivers/ide/ide-park.c
··· 52 52 rq->cmd[0] = REQ_UNPARK_HEADS; 53 53 rq->cmd_len = 1; 54 54 rq->cmd_type = REQ_TYPE_SPECIAL; 55 - elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 1); 55 + elv_add_request(q, rq, ELEVATOR_INSERT_FRONT); 56 56 57 57 out: 58 58 return;
+2 -3
drivers/md/bitmap.c
··· 347 347 atomic_inc(&bitmap->pending_writes); 348 348 set_buffer_locked(bh); 349 349 set_buffer_mapped(bh); 350 - submit_bh(WRITE | REQ_UNPLUG | REQ_SYNC, bh); 350 + submit_bh(WRITE | REQ_SYNC, bh); 351 351 bh = bh->b_this_page; 352 352 } 353 353 ··· 1339 1339 prepare_to_wait(&bitmap->overflow_wait, &__wait, 1340 1340 TASK_UNINTERRUPTIBLE); 1341 1341 spin_unlock_irq(&bitmap->lock); 1342 - md_unplug(bitmap->mddev); 1343 - schedule(); 1342 + io_schedule(); 1344 1343 finish_wait(&bitmap->overflow_wait, &__wait); 1345 1344 continue; 1346 1345 }
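
Dropping REQ_UNPLUG from submit_bh()/submit_bio() callers works because the submitter's on-stack plug is flushed automatically when the task goes to sleep, so waiting on the I/O needs no explicit unplug either. A minimal sketch with a hypothetical helper, assuming the buffer_head is already locked and mapped and that its b_end_io handler decrements *pending and wakes *wq:

#include <linux/buffer_head.h>
#include <linux/wait.h>

static void example_write_and_wait(struct buffer_head *bh,
				   atomic_t *pending,
				   wait_queue_head_t *wq)
{
	atomic_inc(pending);
	submit_bh(WRITE | REQ_SYNC, bh);	/* no REQ_UNPLUG needed */

	/* going to sleep flushes current->plug, so the write is on its
	 * way to the device before we block */
	wait_event(*wq, atomic_read(pending) == 0);
}
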
+1 -8
drivers/md/dm-crypt.c
··· 991 991 clone->bi_destructor = dm_crypt_bio_destructor; 992 992 } 993 993 994 - static void kcryptd_unplug(struct crypt_config *cc) 995 - { 996 - blk_unplug(bdev_get_queue(cc->dev->bdev)); 997 - } 998 - 999 994 static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp) 1000 995 { 1001 996 struct crypt_config *cc = io->target->private; ··· 1003 1008 * one in order to decrypt the whole bio data *afterwards*. 1004 1009 */ 1005 1010 clone = bio_alloc_bioset(gfp, bio_segments(base_bio), cc->bs); 1006 - if (!clone) { 1007 - kcryptd_unplug(cc); 1011 + if (!clone) 1008 1012 return 1; 1009 - } 1010 1013 1011 1014 crypt_inc_pending(io); 1012 1015
+1 -1
drivers/md/dm-io.c
··· 352 352 BUG_ON(num_regions > DM_IO_MAX_REGIONS); 353 353 354 354 if (sync) 355 - rw |= REQ_SYNC | REQ_UNPLUG; 355 + rw |= REQ_SYNC; 356 356 357 357 /* 358 358 * For multiple regions we need to be careful to rewind
+5 -50
drivers/md/dm-kcopyd.c
··· 37 37 unsigned int nr_pages; 38 38 unsigned int nr_free_pages; 39 39 40 - /* 41 - * Block devices to unplug. 42 - * Non-NULL pointer means that a block device has some pending requests 43 - * and needs to be unplugged. 44 - */ 45 - struct block_device *unplug[2]; 46 - 47 40 struct dm_io_client *io_client; 48 41 49 42 wait_queue_head_t destroyq; ··· 308 315 return 0; 309 316 } 310 317 311 - /* 312 - * Unplug the block device at the specified index. 313 - */ 314 - static void unplug(struct dm_kcopyd_client *kc, int rw) 315 - { 316 - if (kc->unplug[rw] != NULL) { 317 - blk_unplug(bdev_get_queue(kc->unplug[rw])); 318 - kc->unplug[rw] = NULL; 319 - } 320 - } 321 - 322 - /* 323 - * Prepare block device unplug. If there's another device 324 - * to be unplugged at the same array index, we unplug that 325 - * device first. 326 - */ 327 - static void prepare_unplug(struct dm_kcopyd_client *kc, int rw, 328 - struct block_device *bdev) 329 - { 330 - if (likely(kc->unplug[rw] == bdev)) 331 - return; 332 - unplug(kc, rw); 333 - kc->unplug[rw] = bdev; 334 - } 335 - 336 318 static void complete_io(unsigned long error, void *context) 337 319 { 338 320 struct kcopyd_job *job = (struct kcopyd_job *) context; ··· 354 386 .client = job->kc->io_client, 355 387 }; 356 388 357 - if (job->rw == READ) { 389 + if (job->rw == READ) 358 390 r = dm_io(&io_req, 1, &job->source, NULL); 359 - prepare_unplug(job->kc, READ, job->source.bdev); 360 - } else { 361 - if (job->num_dests > 1) 362 - io_req.bi_rw |= REQ_UNPLUG; 391 + else 363 392 r = dm_io(&io_req, job->num_dests, job->dests, NULL); 364 - if (!(io_req.bi_rw & REQ_UNPLUG)) 365 - prepare_unplug(job->kc, WRITE, job->dests[0].bdev); 366 - } 367 393 368 394 return r; 369 395 } ··· 428 466 { 429 467 struct dm_kcopyd_client *kc = container_of(work, 430 468 struct dm_kcopyd_client, kcopyd_work); 469 + struct blk_plug plug; 431 470 432 471 /* 433 472 * The order that these are called is *very* important. ··· 436 473 * Pages jobs when successful will jump onto the io jobs 437 474 * list. io jobs call wake when they complete and it all 438 475 * starts again. 439 - * 440 - * Note that io_jobs add block devices to the unplug array, 441 - * this array is cleared with "unplug" calls. It is thus 442 - * forbidden to run complete_jobs after io_jobs and before 443 - * unplug because the block device could be destroyed in 444 - * job completion callback. 445 476 */ 477 + blk_start_plug(&plug); 446 478 process_jobs(&kc->complete_jobs, kc, run_complete_job); 447 479 process_jobs(&kc->pages_jobs, kc, run_pages_job); 448 480 process_jobs(&kc->io_jobs, kc, run_io_job); 449 - unplug(kc, READ); 450 - unplug(kc, WRITE); 481 + blk_finish_plug(&plug); 451 482 } 452 483 453 484 /* ··· 621 664 INIT_LIST_HEAD(&kc->complete_jobs); 622 665 INIT_LIST_HEAD(&kc->io_jobs); 623 666 INIT_LIST_HEAD(&kc->pages_jobs); 624 - 625 - memset(kc->unplug, 0, sizeof(kc->unplug)); 626 667 627 668 kc->job_pool = mempool_create_slab_pool(MIN_JOBS, _job_cache); 628 669 if (!kc->job_pool)
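
The blk_start_plug()/blk_finish_plug() pair above is the canonical replacement for the per-device plug bookkeeping kcopyd used to do by hand: the submitter batches on an on-stack struct blk_plug, and the batch is dispatched at blk_finish_plug() or automatically if the task sleeps in between. A minimal sketch:

#include <linux/blkdev.h>
#include <linux/bio.h>

static void example_submit_batch(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);
	for (i = 0; i < nr; i++)
		generic_make_request(bios[i]);	/* request-based queues
						 * batch these on the plug */
	blk_finish_plug(&plug);			/* dispatch the whole batch */
}
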
+1 -1
drivers/md/dm-raid.c
··· 394 394 { 395 395 struct raid_set *rs = container_of(cb, struct raid_set, callbacks); 396 396 397 - md_raid5_unplug_device(rs->md.private); 397 + md_raid5_kick_device(rs->md.private); 398 398 } 399 399 400 400 /*
-2
drivers/md/dm-raid1.c
··· 842 842 do_reads(ms, &reads); 843 843 do_writes(ms, &writes); 844 844 do_failures(ms, &failures); 845 - 846 - dm_table_unplug_all(ms->ti->table); 847 845 } 848 846 849 847 /*-----------------------------------------------------------------
+5 -26
drivers/md/dm-table.c
··· 55 55 struct dm_target *targets; 56 56 57 57 unsigned discards_supported:1; 58 + unsigned integrity_supported:1; 58 59 59 60 /* 60 61 * Indicates the rw permissions for the new logical ··· 860 859 return -EINVAL; 861 860 } 862 861 863 - t->mempools = dm_alloc_md_mempools(type); 862 + t->mempools = dm_alloc_md_mempools(type, t->integrity_supported); 864 863 if (!t->mempools) 865 864 return -ENOMEM; 866 865 ··· 936 935 struct dm_dev_internal *dd; 937 936 938 937 list_for_each_entry(dd, devices, list) 939 - if (bdev_get_integrity(dd->dm_dev.bdev)) 938 + if (bdev_get_integrity(dd->dm_dev.bdev)) { 939 + t->integrity_supported = 1; 940 940 return blk_integrity_register(dm_disk(md), NULL); 941 + } 941 942 942 943 return 0; 943 944 } ··· 1278 1275 return 0; 1279 1276 } 1280 1277 1281 - void dm_table_unplug_all(struct dm_table *t) 1282 - { 1283 - struct dm_dev_internal *dd; 1284 - struct list_head *devices = dm_table_get_devices(t); 1285 - struct dm_target_callbacks *cb; 1286 - 1287 - list_for_each_entry(dd, devices, list) { 1288 - struct request_queue *q = bdev_get_queue(dd->dm_dev.bdev); 1289 - char b[BDEVNAME_SIZE]; 1290 - 1291 - if (likely(q)) 1292 - blk_unplug(q); 1293 - else 1294 - DMWARN_LIMIT("%s: Cannot unplug nonexistent device %s", 1295 - dm_device_name(t->md), 1296 - bdevname(dd->dm_dev.bdev, b)); 1297 - } 1298 - 1299 - list_for_each_entry(cb, &t->target_callbacks, list) 1300 - if (cb->unplug_fn) 1301 - cb->unplug_fn(cb); 1302 - } 1303 - 1304 1278 struct mapped_device *dm_table_get_md(struct dm_table *t) 1305 1279 { 1306 1280 return t->md; ··· 1325 1345 EXPORT_SYMBOL(dm_table_get_md); 1326 1346 EXPORT_SYMBOL(dm_table_put); 1327 1347 EXPORT_SYMBOL(dm_table_get); 1328 - EXPORT_SYMBOL(dm_table_unplug_all);
+18 -34
drivers/md/dm.c
··· 477 477 cpu = part_stat_lock(); 478 478 part_round_stats(cpu, &dm_disk(md)->part0); 479 479 part_stat_unlock(); 480 - dm_disk(md)->part0.in_flight[rw] = atomic_inc_return(&md->pending[rw]); 480 + atomic_set(&dm_disk(md)->part0.in_flight[rw], 481 + atomic_inc_return(&md->pending[rw])); 481 482 } 482 483 483 484 static void end_io_acct(struct dm_io *io) ··· 498 497 * After this is decremented the bio must not be touched if it is 499 498 * a flush. 500 499 */ 501 - dm_disk(md)->part0.in_flight[rw] = pending = 502 - atomic_dec_return(&md->pending[rw]); 500 + pending = atomic_dec_return(&md->pending[rw]); 501 + atomic_set(&dm_disk(md)->part0.in_flight[rw], pending); 503 502 pending += atomic_read(&md->pending[rw^0x1]); 504 503 505 504 /* nudge anyone waiting on suspend queue */ ··· 808 807 dm_unprep_request(rq); 809 808 810 809 spin_lock_irqsave(q->queue_lock, flags); 811 - if (elv_queue_empty(q)) 812 - blk_plug_device(q); 813 810 blk_requeue_request(q, rq); 814 811 spin_unlock_irqrestore(q->queue_lock, flags); 815 812 ··· 1612 1613 * number of in-flight I/Os after the queue is stopped in 1613 1614 * dm_suspend(). 1614 1615 */ 1615 - while (!blk_queue_plugged(q) && !blk_queue_stopped(q)) { 1616 + while (!blk_queue_stopped(q)) { 1616 1617 rq = blk_peek_request(q); 1617 1618 if (!rq) 1618 - goto plug_and_out; 1619 + goto delay_and_out; 1619 1620 1620 1621 /* always use block 0 to find the target for flushes for now */ 1621 1622 pos = 0; ··· 1626 1627 BUG_ON(!dm_target_is_valid(ti)); 1627 1628 1628 1629 if (ti->type->busy && ti->type->busy(ti)) 1629 - goto plug_and_out; 1630 + goto delay_and_out; 1630 1631 1631 1632 blk_start_request(rq); 1632 1633 clone = rq->special; ··· 1646 1647 BUG_ON(!irqs_disabled()); 1647 1648 spin_lock(q->queue_lock); 1648 1649 1649 - plug_and_out: 1650 - if (!elv_queue_empty(q)) 1651 - /* Some requests still remain, retry later */ 1652 - blk_plug_device(q); 1653 - 1650 + delay_and_out: 1651 + blk_delay_queue(q, HZ / 10); 1654 1652 out: 1655 1653 dm_table_put(map); 1656 1654 ··· 1674 1678 dm_table_put(map); 1675 1679 1676 1680 return r; 1677 - } 1678 - 1679 - static void dm_unplug_all(struct request_queue *q) 1680 - { 1681 - struct mapped_device *md = q->queuedata; 1682 - struct dm_table *map = dm_get_live_table(md); 1683 - 1684 - if (map) { 1685 - if (dm_request_based(md)) 1686 - generic_unplug_device(q); 1687 - 1688 - dm_table_unplug_all(map); 1689 - dm_table_put(map); 1690 - } 1691 1681 } 1692 1682 1693 1683 static int dm_any_congested(void *congested_data, int bdi_bits) ··· 1799 1817 md->queue->backing_dev_info.congested_data = md; 1800 1818 blk_queue_make_request(md->queue, dm_request); 1801 1819 blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY); 1802 - md->queue->unplug_fn = dm_unplug_all; 1803 1820 blk_queue_merge_bvec(md->queue, dm_merge_bvec); 1804 1821 blk_queue_flush(md->queue, REQ_FLUSH | REQ_FUA); 1805 1822 } ··· 2244 2263 int r = 0; 2245 2264 DECLARE_WAITQUEUE(wait, current); 2246 2265 2247 - dm_unplug_all(md->queue); 2248 - 2249 2266 add_wait_queue(&md->wait, &wait); 2250 2267 2251 2268 while (1) { ··· 2518 2539 2519 2540 clear_bit(DMF_SUSPENDED, &md->flags); 2520 2541 2521 - dm_table_unplug_all(map); 2522 2542 r = 0; 2523 2543 out: 2524 2544 dm_table_put(map); ··· 2621 2643 } 2622 2644 EXPORT_SYMBOL_GPL(dm_noflush_suspending); 2623 2645 2624 - struct dm_md_mempools *dm_alloc_md_mempools(unsigned type) 2646 + struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity) 2625 2647 { 2626 2648 struct dm_md_mempools *pools = 
kmalloc(sizeof(*pools), GFP_KERNEL); 2649 + unsigned int pool_size = (type == DM_TYPE_BIO_BASED) ? 16 : MIN_IOS; 2627 2650 2628 2651 if (!pools) 2629 2652 return NULL; ··· 2641 2662 if (!pools->tio_pool) 2642 2663 goto free_io_pool_and_out; 2643 2664 2644 - pools->bs = (type == DM_TYPE_BIO_BASED) ? 2645 - bioset_create(16, 0) : bioset_create(MIN_IOS, 0); 2665 + pools->bs = bioset_create(pool_size, 0); 2646 2666 if (!pools->bs) 2647 2667 goto free_tio_pool_and_out; 2648 2668 2669 + if (integrity && bioset_integrity_create(pools->bs, pool_size)) 2670 + goto free_bioset_and_out; 2671 + 2649 2672 return pools; 2673 + 2674 + free_bioset_and_out: 2675 + bioset_free(pools->bs); 2650 2676 2651 2677 free_tio_pool_and_out: 2652 2678 mempool_destroy(pools->tio_pool);
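
dm_alloc_md_mempools() growing an "integrity" argument reflects the new rule that a bio_set no longer gets an integrity mempool implicitly: a subsystem that wants to carry integrity payloads must create one explicitly and unwind on failure. A sketch of the allocation pattern (helper name is illustrative):

#include <linux/bio.h>

static struct bio_set *example_create_bs(unsigned int pool_size,
					 bool want_integrity)
{
	struct bio_set *bs = bioset_create(pool_size, 0);

	if (!bs)
		return NULL;

	/* the integrity mempool is now opt-in, sized like the bio pool */
	if (want_integrity && bioset_integrity_create(bs, pool_size)) {
		bioset_free(bs);
		return NULL;
	}

	return bs;
}
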
+1 -1
drivers/md/dm.h
··· 149 149 /* 150 150 * Mempool operations 151 151 */ 152 - struct dm_md_mempools *dm_alloc_md_mempools(unsigned type); 152 + struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity); 153 153 void dm_free_md_mempools(struct dm_md_mempools *pools); 154 154 155 155 #endif
+1 -19
drivers/md/linear.c
··· 87 87 return maxsectors << 9; 88 88 } 89 89 90 - static void linear_unplug(struct request_queue *q) 91 - { 92 - mddev_t *mddev = q->queuedata; 93 - linear_conf_t *conf; 94 - int i; 95 - 96 - rcu_read_lock(); 97 - conf = rcu_dereference(mddev->private); 98 - 99 - for (i=0; i < mddev->raid_disks; i++) { 100 - struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); 101 - blk_unplug(r_queue); 102 - } 103 - rcu_read_unlock(); 104 - } 105 - 106 90 static int linear_congested(void *data, int bits) 107 91 { 108 92 mddev_t *mddev = data; ··· 208 224 md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); 209 225 210 226 blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec); 211 - mddev->queue->unplug_fn = linear_unplug; 212 227 mddev->queue->backing_dev_info.congested_fn = linear_congested; 213 228 mddev->queue->backing_dev_info.congested_data = mddev; 214 - md_integrity_register(mddev); 215 - return 0; 229 + return md_integrity_register(mddev); 216 230 } 217 231 218 232 static void free_conf(struct rcu_head *head)
+8 -12
drivers/md/md.c
··· 780 780 bio->bi_end_io = super_written; 781 781 782 782 atomic_inc(&mddev->pending_writes); 783 - submit_bio(REQ_WRITE | REQ_SYNC | REQ_UNPLUG | REQ_FLUSH | REQ_FUA, 784 - bio); 783 + submit_bio(REQ_WRITE | REQ_SYNC | REQ_FLUSH | REQ_FUA, bio); 785 784 } 786 785 787 786 void md_super_wait(mddev_t *mddev) ··· 808 809 struct completion event; 809 810 int ret; 810 811 811 - rw |= REQ_SYNC | REQ_UNPLUG; 812 + rw |= REQ_SYNC; 812 813 813 814 bio->bi_bdev = (metadata_op && rdev->meta_bdev) ? 814 815 rdev->meta_bdev : rdev->bdev; ··· 1803 1804 mdname(mddev)); 1804 1805 return -EINVAL; 1805 1806 } 1806 - printk(KERN_NOTICE "md: data integrity on %s enabled\n", 1807 - mdname(mddev)); 1807 + printk(KERN_NOTICE "md: data integrity enabled on %s\n", mdname(mddev)); 1808 + if (bioset_integrity_create(mddev->bio_set, BIO_POOL_SIZE)) { 1809 + printk(KERN_ERR "md: failed to create integrity pool for %s\n", 1810 + mdname(mddev)); 1811 + return -EINVAL; 1812 + } 1808 1813 return 0; 1809 1814 } 1810 1815 EXPORT_SYMBOL(md_integrity_register); ··· 4820 4817 __md_stop_writes(mddev); 4821 4818 md_stop(mddev); 4822 4819 mddev->queue->merge_bvec_fn = NULL; 4823 - mddev->queue->unplug_fn = NULL; 4824 4820 mddev->queue->backing_dev_info.congested_fn = NULL; 4825 4821 4826 4822 /* tell userspace to handle 'inactive' */ ··· 6694 6692 6695 6693 void md_unplug(mddev_t *mddev) 6696 6694 { 6697 - if (mddev->queue) 6698 - blk_unplug(mddev->queue); 6699 6695 if (mddev->plug) 6700 6696 mddev->plug->unplug_fn(mddev->plug); 6701 6697 } ··· 6876 6876 >= mddev->resync_max - mddev->curr_resync_completed 6877 6877 )) { 6878 6878 /* time to update curr_resync_completed */ 6879 - md_unplug(mddev); 6880 6879 wait_event(mddev->recovery_wait, 6881 6880 atomic_read(&mddev->recovery_active) == 0); 6882 6881 mddev->curr_resync_completed = j; ··· 6951 6952 * about not overloading the IO subsystem. (things like an 6952 6953 * e2fsck being done on the RAID array should execute fast) 6953 6954 */ 6954 - md_unplug(mddev); 6955 6955 cond_resched(); 6956 6956 6957 6957 currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2 ··· 6969 6971 * this also signals 'finished resyncing' to md_stop 6970 6972 */ 6971 6973 out: 6972 - md_unplug(mddev); 6973 - 6974 6974 wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); 6975 6975 6976 6976 /* tell personality that we are finished */
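
md_integrity_register() now has a new way to fail (allocating the integrity mempool on mddev->bio_set), and the personalities below switch from ignoring its return value to propagating it, so a failed integrity setup aborts array start-up. Sketched against the md-internal API (drivers/md/md.h context assumed, example_* names are illustrative):

static int example_run(mddev_t *mddev)
{
	/* ... usual personality setup ... */

	/* failure here now stops the array from coming up */
	return md_integrity_register(mddev);
}
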
+5 -33
drivers/md/multipath.c
··· 106 106 rdev_dec_pending(rdev, conf->mddev); 107 107 } 108 108 109 - static void unplug_slaves(mddev_t *mddev) 110 - { 111 - multipath_conf_t *conf = mddev->private; 112 - int i; 113 - 114 - rcu_read_lock(); 115 - for (i=0; i<mddev->raid_disks; i++) { 116 - mdk_rdev_t *rdev = rcu_dereference(conf->multipaths[i].rdev); 117 - if (rdev && !test_bit(Faulty, &rdev->flags) 118 - && atomic_read(&rdev->nr_pending)) { 119 - struct request_queue *r_queue = bdev_get_queue(rdev->bdev); 120 - 121 - atomic_inc(&rdev->nr_pending); 122 - rcu_read_unlock(); 123 - 124 - blk_unplug(r_queue); 125 - 126 - rdev_dec_pending(rdev, mddev); 127 - rcu_read_lock(); 128 - } 129 - } 130 - rcu_read_unlock(); 131 - } 132 - 133 - static void multipath_unplug(struct request_queue *q) 134 - { 135 - unplug_slaves(q->queuedata); 136 - } 137 - 138 - 139 109 static int multipath_make_request(mddev_t *mddev, struct bio * bio) 140 110 { 141 111 multipath_conf_t *conf = mddev->private; ··· 315 345 p->rdev = rdev; 316 346 goto abort; 317 347 } 318 - md_integrity_register(mddev); 348 + err = md_integrity_register(mddev); 319 349 } 320 350 abort: 321 351 ··· 487 517 */ 488 518 md_set_array_sectors(mddev, multipath_size(mddev, 0, 0)); 489 519 490 - mddev->queue->unplug_fn = multipath_unplug; 491 520 mddev->queue->backing_dev_info.congested_fn = multipath_congested; 492 521 mddev->queue->backing_dev_info.congested_data = mddev; 493 - md_integrity_register(mddev); 522 + 523 + if (md_integrity_register(mddev)) 524 + goto out_free_conf; 525 + 494 526 return 0; 495 527 496 528 out_free_conf:
+1 -18
drivers/md/raid0.c
··· 25 25 #include "raid0.h" 26 26 #include "raid5.h" 27 27 28 - static void raid0_unplug(struct request_queue *q) 29 - { 30 - mddev_t *mddev = q->queuedata; 31 - raid0_conf_t *conf = mddev->private; 32 - mdk_rdev_t **devlist = conf->devlist; 33 - int raid_disks = conf->strip_zone[0].nb_dev; 34 - int i; 35 - 36 - for (i=0; i < raid_disks; i++) { 37 - struct request_queue *r_queue = bdev_get_queue(devlist[i]->bdev); 38 - 39 - blk_unplug(r_queue); 40 - } 41 - } 42 - 43 28 static int raid0_congested(void *data, int bits) 44 29 { 45 30 mddev_t *mddev = data; ··· 257 272 mdname(mddev), 258 273 (unsigned long long)smallest->sectors); 259 274 } 260 - mddev->queue->unplug_fn = raid0_unplug; 261 275 mddev->queue->backing_dev_info.congested_fn = raid0_congested; 262 276 mddev->queue->backing_dev_info.congested_data = mddev; 263 277 ··· 379 395 380 396 blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec); 381 397 dump_zones(mddev); 382 - md_integrity_register(mddev); 383 - return 0; 398 + return md_integrity_register(mddev); 384 399 } 385 400 386 401 static int raid0_stop(mddev_t *mddev)
+19 -72
drivers/md/raid1.c
··· 52 52 #define NR_RAID1_BIOS 256 53 53 54 54 55 - static void unplug_slaves(mddev_t *mddev); 56 - 57 55 static void allow_barrier(conf_t *conf); 58 56 static void lower_barrier(conf_t *conf); 59 57 60 58 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data) 61 59 { 62 60 struct pool_info *pi = data; 63 - r1bio_t *r1_bio; 64 61 int size = offsetof(r1bio_t, bios[pi->raid_disks]); 65 62 66 63 /* allocate a r1bio with room for raid_disks entries in the bios array */ 67 - r1_bio = kzalloc(size, gfp_flags); 68 - if (!r1_bio && pi->mddev) 69 - unplug_slaves(pi->mddev); 70 - 71 - return r1_bio; 64 + return kzalloc(size, gfp_flags); 72 65 } 73 66 74 67 static void r1bio_pool_free(void *r1_bio, void *data) ··· 84 91 int i, j; 85 92 86 93 r1_bio = r1bio_pool_alloc(gfp_flags, pi); 87 - if (!r1_bio) { 88 - unplug_slaves(pi->mddev); 94 + if (!r1_bio) 89 95 return NULL; 90 - } 91 96 92 97 /* 93 98 * Allocate bios : 1 for reading, n-1 for writing ··· 511 520 return new_disk; 512 521 } 513 522 514 - static void unplug_slaves(mddev_t *mddev) 515 - { 516 - conf_t *conf = mddev->private; 517 - int i; 518 - 519 - rcu_read_lock(); 520 - for (i=0; i<mddev->raid_disks; i++) { 521 - mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev); 522 - if (rdev && !test_bit(Faulty, &rdev->flags) && atomic_read(&rdev->nr_pending)) { 523 - struct request_queue *r_queue = bdev_get_queue(rdev->bdev); 524 - 525 - atomic_inc(&rdev->nr_pending); 526 - rcu_read_unlock(); 527 - 528 - blk_unplug(r_queue); 529 - 530 - rdev_dec_pending(rdev, mddev); 531 - rcu_read_lock(); 532 - } 533 - } 534 - rcu_read_unlock(); 535 - } 536 - 537 - static void raid1_unplug(struct request_queue *q) 538 - { 539 - mddev_t *mddev = q->queuedata; 540 - 541 - unplug_slaves(mddev); 542 - md_wakeup_thread(mddev->thread); 543 - } 544 - 545 523 static int raid1_congested(void *data, int bits) 546 524 { 547 525 mddev_t *mddev = data; ··· 540 580 } 541 581 542 582 543 - static int flush_pending_writes(conf_t *conf) 583 + static void flush_pending_writes(conf_t *conf) 544 584 { 545 585 /* Any writes that have been queued but are awaiting 546 586 * bitmap updates get flushed here. 547 - * We return 1 if any requests were actually submitted. 548 587 */ 549 - int rv = 0; 550 - 551 588 spin_lock_irq(&conf->device_lock); 552 589 553 590 if (conf->pending_bio_list.head) { 554 591 struct bio *bio; 555 592 bio = bio_list_get(&conf->pending_bio_list); 556 - /* Only take the spinlock to quiet a warning */ 557 - spin_lock(conf->mddev->queue->queue_lock); 558 - blk_remove_plug(conf->mddev->queue); 559 - spin_unlock(conf->mddev->queue->queue_lock); 560 593 spin_unlock_irq(&conf->device_lock); 561 594 /* flush any pending bitmap writes to 562 595 * disk before proceeding w/ I/O */ ··· 561 608 generic_make_request(bio); 562 609 bio = next; 563 610 } 564 - rv = 1; 565 611 } else 566 612 spin_unlock_irq(&conf->device_lock); 567 - return rv; 613 + } 614 + 615 + static void md_kick_device(mddev_t *mddev) 616 + { 617 + blk_flush_plug(current); 618 + md_wakeup_thread(mddev->thread); 568 619 } 569 620 570 621 /* Barriers.... 
··· 600 643 601 644 /* Wait until no block IO is waiting */ 602 645 wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting, 603 - conf->resync_lock, 604 - raid1_unplug(conf->mddev->queue)); 646 + conf->resync_lock, md_kick_device(conf->mddev)); 605 647 606 648 /* block any new IO from starting */ 607 649 conf->barrier++; ··· 608 652 /* Now wait for all pending IO to complete */ 609 653 wait_event_lock_irq(conf->wait_barrier, 610 654 !conf->nr_pending && conf->barrier < RESYNC_DEPTH, 611 - conf->resync_lock, 612 - raid1_unplug(conf->mddev->queue)); 655 + conf->resync_lock, md_kick_device(conf->mddev)); 613 656 614 657 spin_unlock_irq(&conf->resync_lock); 615 658 } ··· 630 675 conf->nr_waiting++; 631 676 wait_event_lock_irq(conf->wait_barrier, !conf->barrier, 632 677 conf->resync_lock, 633 - raid1_unplug(conf->mddev->queue)); 678 + md_kick_device(conf->mddev)); 634 679 conf->nr_waiting--; 635 680 } 636 681 conf->nr_pending++; ··· 667 712 conf->nr_pending == conf->nr_queued+1, 668 713 conf->resync_lock, 669 714 ({ flush_pending_writes(conf); 670 - raid1_unplug(conf->mddev->queue); })); 715 + md_kick_device(conf->mddev); })); 671 716 spin_unlock_irq(&conf->resync_lock); 672 717 } 673 718 static void unfreeze_array(conf_t *conf) ··· 917 962 atomic_inc(&r1_bio->remaining); 918 963 spin_lock_irqsave(&conf->device_lock, flags); 919 964 bio_list_add(&conf->pending_bio_list, mbio); 920 - blk_plug_device_unlocked(mddev->queue); 921 965 spin_unlock_irqrestore(&conf->device_lock, flags); 922 966 } 923 967 r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL); ··· 925 971 /* In case raid1d snuck in to freeze_array */ 926 972 wake_up(&conf->wait_barrier); 927 973 928 - if (do_sync) 974 + if (do_sync || !bitmap) 929 975 md_wakeup_thread(mddev->thread); 930 976 931 977 return 0; ··· 1132 1178 p->rdev = rdev; 1133 1179 goto abort; 1134 1180 } 1135 - md_integrity_register(mddev); 1181 + err = md_integrity_register(mddev); 1136 1182 } 1137 1183 abort: 1138 1184 ··· 1515 1561 unsigned long flags; 1516 1562 conf_t *conf = mddev->private; 1517 1563 struct list_head *head = &conf->retry_list; 1518 - int unplug=0; 1519 1564 mdk_rdev_t *rdev; 1520 1565 1521 1566 md_check_recovery(mddev); ··· 1522 1569 for (;;) { 1523 1570 char b[BDEVNAME_SIZE]; 1524 1571 1525 - unplug += flush_pending_writes(conf); 1572 + flush_pending_writes(conf); 1526 1573 1527 1574 spin_lock_irqsave(&conf->device_lock, flags); 1528 1575 if (list_empty(head)) { ··· 1536 1583 1537 1584 mddev = r1_bio->mddev; 1538 1585 conf = mddev->private; 1539 - if (test_bit(R1BIO_IsSync, &r1_bio->state)) { 1586 + if (test_bit(R1BIO_IsSync, &r1_bio->state)) 1540 1587 sync_request_write(mddev, r1_bio); 1541 - unplug = 1; 1542 - } else { 1588 + else { 1543 1589 int disk; 1544 1590 1545 1591 /* we got a read error. Maybe the drive is bad. 
Maybe just ··· 1588 1636 bio->bi_end_io = raid1_end_read_request; 1589 1637 bio->bi_rw = READ | do_sync; 1590 1638 bio->bi_private = r1_bio; 1591 - unplug = 1; 1592 1639 generic_make_request(bio); 1593 1640 } 1594 1641 } 1595 1642 cond_resched(); 1596 1643 } 1597 - if (unplug) 1598 - unplug_slaves(mddev); 1599 1644 } 1600 1645 1601 1646 ··· 2015 2066 2016 2067 md_set_array_sectors(mddev, raid1_size(mddev, 0, 0)); 2017 2068 2018 - mddev->queue->unplug_fn = raid1_unplug; 2019 2069 mddev->queue->backing_dev_info.congested_fn = raid1_congested; 2020 2070 mddev->queue->backing_dev_info.congested_data = mddev; 2021 - md_integrity_register(mddev); 2022 - return 0; 2071 + return md_integrity_register(mddev); 2023 2072 } 2024 2073 2025 2074 static int stop(mddev_t *mddev)
+24 -73
drivers/md/raid10.c
··· 57 57 */ 58 58 #define NR_RAID10_BIOS 256 59 59 60 - static void unplug_slaves(mddev_t *mddev); 61 - 62 60 static void allow_barrier(conf_t *conf); 63 61 static void lower_barrier(conf_t *conf); 64 62 65 63 static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data) 66 64 { 67 65 conf_t *conf = data; 68 - r10bio_t *r10_bio; 69 66 int size = offsetof(struct r10bio_s, devs[conf->copies]); 70 67 71 68 /* allocate a r10bio with room for raid_disks entries in the bios array */ 72 - r10_bio = kzalloc(size, gfp_flags); 73 - if (!r10_bio && conf->mddev) 74 - unplug_slaves(conf->mddev); 75 - 76 - return r10_bio; 69 + return kzalloc(size, gfp_flags); 77 70 } 78 71 79 72 static void r10bio_pool_free(void *r10_bio, void *data) ··· 99 106 int nalloc; 100 107 101 108 r10_bio = r10bio_pool_alloc(gfp_flags, conf); 102 - if (!r10_bio) { 103 - unplug_slaves(conf->mddev); 109 + if (!r10_bio) 104 110 return NULL; 105 - } 106 111 107 112 if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery)) 108 113 nalloc = conf->copies; /* resync */ ··· 588 597 return disk; 589 598 } 590 599 591 - static void unplug_slaves(mddev_t *mddev) 592 - { 593 - conf_t *conf = mddev->private; 594 - int i; 595 - 596 - rcu_read_lock(); 597 - for (i=0; i < conf->raid_disks; i++) { 598 - mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev); 599 - if (rdev && !test_bit(Faulty, &rdev->flags) && atomic_read(&rdev->nr_pending)) { 600 - struct request_queue *r_queue = bdev_get_queue(rdev->bdev); 601 - 602 - atomic_inc(&rdev->nr_pending); 603 - rcu_read_unlock(); 604 - 605 - blk_unplug(r_queue); 606 - 607 - rdev_dec_pending(rdev, mddev); 608 - rcu_read_lock(); 609 - } 610 - } 611 - rcu_read_unlock(); 612 - } 613 - 614 - static void raid10_unplug(struct request_queue *q) 615 - { 616 - mddev_t *mddev = q->queuedata; 617 - 618 - unplug_slaves(q->queuedata); 619 - md_wakeup_thread(mddev->thread); 620 - } 621 - 622 600 static int raid10_congested(void *data, int bits) 623 601 { 624 602 mddev_t *mddev = data; ··· 609 649 return ret; 610 650 } 611 651 612 - static int flush_pending_writes(conf_t *conf) 652 + static void flush_pending_writes(conf_t *conf) 613 653 { 614 654 /* Any writes that have been queued but are awaiting 615 655 * bitmap updates get flushed here. 616 - * We return 1 if any requests were actually submitted. 617 656 */ 618 - int rv = 0; 619 - 620 657 spin_lock_irq(&conf->device_lock); 621 658 622 659 if (conf->pending_bio_list.head) { 623 660 struct bio *bio; 624 661 bio = bio_list_get(&conf->pending_bio_list); 625 - /* Spinlock only taken to quiet a warning */ 626 - spin_lock(conf->mddev->queue->queue_lock); 627 - blk_remove_plug(conf->mddev->queue); 628 - spin_unlock(conf->mddev->queue->queue_lock); 629 662 spin_unlock_irq(&conf->device_lock); 630 663 /* flush any pending bitmap writes to disk 631 664 * before proceeding w/ I/O */ ··· 630 677 generic_make_request(bio); 631 678 bio = next; 632 679 } 633 - rv = 1; 634 680 } else 635 681 spin_unlock_irq(&conf->device_lock); 636 - return rv; 637 682 } 683 + 684 + static void md_kick_device(mddev_t *mddev) 685 + { 686 + blk_flush_plug(current); 687 + md_wakeup_thread(mddev->thread); 688 + } 689 + 638 690 /* Barriers.... 639 691 * Sometimes we need to suspend IO while we do something else, 640 692 * either some resync/recovery, or reconfigure the array. 
··· 669 711 670 712 /* Wait until no block IO is waiting (unless 'force') */ 671 713 wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting, 672 - conf->resync_lock, 673 - raid10_unplug(conf->mddev->queue)); 714 + conf->resync_lock, md_kick_device(conf->mddev)); 674 715 675 716 /* block any new IO from starting */ 676 717 conf->barrier++; ··· 677 720 /* No wait for all pending IO to complete */ 678 721 wait_event_lock_irq(conf->wait_barrier, 679 722 !conf->nr_pending && conf->barrier < RESYNC_DEPTH, 680 - conf->resync_lock, 681 - raid10_unplug(conf->mddev->queue)); 723 + conf->resync_lock, md_kick_device(conf->mddev)); 682 724 683 725 spin_unlock_irq(&conf->resync_lock); 684 726 } ··· 698 742 conf->nr_waiting++; 699 743 wait_event_lock_irq(conf->wait_barrier, !conf->barrier, 700 744 conf->resync_lock, 701 - raid10_unplug(conf->mddev->queue)); 745 + md_kick_device(conf->mddev)); 702 746 conf->nr_waiting--; 703 747 } 704 748 conf->nr_pending++; ··· 735 779 conf->nr_pending == conf->nr_queued+1, 736 780 conf->resync_lock, 737 781 ({ flush_pending_writes(conf); 738 - raid10_unplug(conf->mddev->queue); })); 782 + md_kick_device(conf->mddev); })); 739 783 spin_unlock_irq(&conf->resync_lock); 740 784 } 741 785 ··· 930 974 atomic_inc(&r10_bio->remaining); 931 975 spin_lock_irqsave(&conf->device_lock, flags); 932 976 bio_list_add(&conf->pending_bio_list, mbio); 933 - blk_plug_device_unlocked(mddev->queue); 934 977 spin_unlock_irqrestore(&conf->device_lock, flags); 935 978 } 936 979 ··· 946 991 /* In case raid10d snuck in to freeze_array */ 947 992 wake_up(&conf->wait_barrier); 948 993 949 - if (do_sync) 994 + if (do_sync || !mddev->bitmap) 950 995 md_wakeup_thread(mddev->thread); 951 996 952 997 return 0; ··· 1188 1233 p->rdev = rdev; 1189 1234 goto abort; 1190 1235 } 1191 - md_integrity_register(mddev); 1236 + err = md_integrity_register(mddev); 1192 1237 } 1193 1238 abort: 1194 1239 ··· 1639 1684 unsigned long flags; 1640 1685 conf_t *conf = mddev->private; 1641 1686 struct list_head *head = &conf->retry_list; 1642 - int unplug=0; 1643 1687 mdk_rdev_t *rdev; 1644 1688 1645 1689 md_check_recovery(mddev); ··· 1646 1692 for (;;) { 1647 1693 char b[BDEVNAME_SIZE]; 1648 1694 1649 - unplug += flush_pending_writes(conf); 1695 + flush_pending_writes(conf); 1650 1696 1651 1697 spin_lock_irqsave(&conf->device_lock, flags); 1652 1698 if (list_empty(head)) { ··· 1660 1706 1661 1707 mddev = r10_bio->mddev; 1662 1708 conf = mddev->private; 1663 - if (test_bit(R10BIO_IsSync, &r10_bio->state)) { 1709 + if (test_bit(R10BIO_IsSync, &r10_bio->state)) 1664 1710 sync_request_write(mddev, r10_bio); 1665 - unplug = 1; 1666 - } else if (test_bit(R10BIO_IsRecover, &r10_bio->state)) { 1711 + else if (test_bit(R10BIO_IsRecover, &r10_bio->state)) 1667 1712 recovery_request_write(mddev, r10_bio); 1668 - unplug = 1; 1669 - } else { 1713 + else { 1670 1714 int mirror; 1671 1715 /* we got a read error. Maybe the drive is bad. Maybe just 1672 1716 * the block and we can fix it. 
··· 1711 1759 bio->bi_rw = READ | do_sync; 1712 1760 bio->bi_private = r10_bio; 1713 1761 bio->bi_end_io = raid10_end_read_request; 1714 - unplug = 1; 1715 1762 generic_make_request(bio); 1716 1763 } 1717 1764 } 1718 1765 cond_resched(); 1719 1766 } 1720 - if (unplug) 1721 - unplug_slaves(mddev); 1722 1767 } 1723 1768 1724 1769 ··· 2326 2377 md_set_array_sectors(mddev, size); 2327 2378 mddev->resync_max_sectors = size; 2328 2379 2329 - mddev->queue->unplug_fn = raid10_unplug; 2330 2380 mddev->queue->backing_dev_info.congested_fn = raid10_congested; 2331 2381 mddev->queue->backing_dev_info.congested_data = mddev; 2332 2382 ··· 2343 2395 2344 2396 if (conf->near_copies < conf->raid_disks) 2345 2397 blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec); 2346 - md_integrity_register(mddev); 2398 + 2399 + if (md_integrity_register(mddev)) 2400 + goto out_free_conf; 2401 + 2347 2402 return 0; 2348 2403 2349 2404 out_free_conf:
+9 -54
drivers/md/raid5.c
··· 433 433 return 0; 434 434 } 435 435 436 - static void unplug_slaves(mddev_t *mddev); 437 - 438 436 static struct stripe_head * 439 437 get_active_stripe(raid5_conf_t *conf, sector_t sector, 440 438 int previous, int noblock, int noquiesce) ··· 461 463 < (conf->max_nr_stripes *3/4) 462 464 || !conf->inactive_blocked), 463 465 conf->device_lock, 464 - md_raid5_unplug_device(conf) 465 - ); 466 + md_raid5_kick_device(conf)); 466 467 conf->inactive_blocked = 0; 467 468 } else 468 469 init_stripe(sh, sector, previous); ··· 1470 1473 wait_event_lock_irq(conf->wait_for_stripe, 1471 1474 !list_empty(&conf->inactive_list), 1472 1475 conf->device_lock, 1473 - unplug_slaves(conf->mddev) 1474 - ); 1476 + blk_flush_plug(current)); 1475 1477 osh = get_free_stripe(conf); 1476 1478 spin_unlock_irq(&conf->device_lock); 1477 1479 atomic_set(&nsh->count, 1); ··· 3641 3645 } 3642 3646 } 3643 3647 3644 - static void unplug_slaves(mddev_t *mddev) 3648 + void md_raid5_kick_device(raid5_conf_t *conf) 3645 3649 { 3646 - raid5_conf_t *conf = mddev->private; 3647 - int i; 3648 - int devs = max(conf->raid_disks, conf->previous_raid_disks); 3649 - 3650 - rcu_read_lock(); 3651 - for (i = 0; i < devs; i++) { 3652 - mdk_rdev_t *rdev = rcu_dereference(conf->disks[i].rdev); 3653 - if (rdev && !test_bit(Faulty, &rdev->flags) && atomic_read(&rdev->nr_pending)) { 3654 - struct request_queue *r_queue = bdev_get_queue(rdev->bdev); 3655 - 3656 - atomic_inc(&rdev->nr_pending); 3657 - rcu_read_unlock(); 3658 - 3659 - blk_unplug(r_queue); 3660 - 3661 - rdev_dec_pending(rdev, mddev); 3662 - rcu_read_lock(); 3663 - } 3664 - } 3665 - rcu_read_unlock(); 3666 - } 3667 - 3668 - void md_raid5_unplug_device(raid5_conf_t *conf) 3669 - { 3670 - unsigned long flags; 3671 - 3672 - spin_lock_irqsave(&conf->device_lock, flags); 3673 - 3674 - if (plugger_remove_plug(&conf->plug)) { 3675 - conf->seq_flush++; 3676 - raid5_activate_delayed(conf); 3677 - } 3650 + blk_flush_plug(current); 3651 + raid5_activate_delayed(conf); 3678 3652 md_wakeup_thread(conf->mddev->thread); 3679 - 3680 - spin_unlock_irqrestore(&conf->device_lock, flags); 3681 - 3682 - unplug_slaves(conf->mddev); 3683 3653 } 3684 - EXPORT_SYMBOL_GPL(md_raid5_unplug_device); 3654 + EXPORT_SYMBOL_GPL(md_raid5_kick_device); 3685 3655 3686 3656 static void raid5_unplug(struct plug_handle *plug) 3687 3657 { 3688 3658 raid5_conf_t *conf = container_of(plug, raid5_conf_t, plug); 3689 - md_raid5_unplug_device(conf); 3690 - } 3691 3659 3692 - static void raid5_unplug_queue(struct request_queue *q) 3693 - { 3694 - mddev_t *mddev = q->queuedata; 3695 - md_raid5_unplug_device(mddev->private); 3660 + md_raid5_kick_device(conf); 3696 3661 } 3697 3662 3698 3663 int md_raid5_congested(mddev_t *mddev, int bits) ··· 4057 4100 * add failed due to overlap. Flush everything 4058 4101 * and wait a while 4059 4102 */ 4060 - md_raid5_unplug_device(conf); 4103 + md_raid5_kick_device(conf); 4061 4104 release_stripe(sh); 4062 4105 schedule(); 4063 4106 goto retry; ··· 4322 4365 4323 4366 if (sector_nr >= max_sector) { 4324 4367 /* just being told to finish up .. 
nothing much to do */ 4325 - unplug_slaves(mddev); 4326 4368 4327 4369 if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) { 4328 4370 end_reshape(conf); ··· 4525 4569 spin_unlock_irq(&conf->device_lock); 4526 4570 4527 4571 async_tx_issue_pending_all(); 4528 - unplug_slaves(mddev); 4529 4572 4530 4573 pr_debug("--- raid5d inactive\n"); 4531 4574 } ··· 5159 5204 5160 5205 mddev->queue->backing_dev_info.congested_data = mddev; 5161 5206 mddev->queue->backing_dev_info.congested_fn = raid5_congested; 5162 - mddev->queue->unplug_fn = raid5_unplug_queue; 5207 + mddev->queue->queue_lock = &conf->device_lock; 5163 5208 5164 5209 chunk_size = mddev->chunk_sectors << 9; 5165 5210 blk_queue_io_min(mddev->queue, chunk_size);
+1 -1
drivers/md/raid5.h
··· 503 503 } 504 504 505 505 extern int md_raid5_congested(mddev_t *mddev, int bits); 506 - extern void md_raid5_unplug_device(raid5_conf_t *conf); 506 + extern void md_raid5_kick_device(raid5_conf_t *conf); 507 507 extern int raid5_set_cache_size(mddev_t *mddev, int size); 508 508 #endif
+8 -9
drivers/message/i2o/i2o_block.c
··· 695 695 }; 696 696 697 697 /** 698 - * i2o_block_media_changed - Have we seen a media change? 698 + * i2o_block_check_events - Have we seen a media change? 699 699 * @disk: gendisk which should be verified 700 + * @clearing: events being cleared 700 701 * 701 702 * Verifies if the media has changed. 702 703 * 703 704 * Returns 1 if the media was changed or 0 otherwise. 704 705 */ 705 - static int i2o_block_media_changed(struct gendisk *disk) 706 + static unsigned int i2o_block_check_events(struct gendisk *disk, 707 + unsigned int clearing) 706 708 { 707 709 struct i2o_block_device *p = disk->private_data; 708 710 709 711 if (p->media_change_flag) { 710 712 p->media_change_flag = 0; 711 - return 1; 713 + return DISK_EVENT_MEDIA_CHANGE; 712 714 } 713 715 return 0; 714 716 } ··· 897 895 { 898 896 struct request *req; 899 897 900 - while (!blk_queue_plugged(q)) { 901 - req = blk_peek_request(q); 902 - if (!req) 903 - break; 904 - 898 + while ((req = blk_peek_request(q)) != NULL) { 905 899 if (req->cmd_type == REQ_TYPE_FS) { 906 900 struct i2o_block_delayed_request *dreq; 907 901 struct i2o_block_request *ireq = req->special; ··· 948 950 .ioctl = i2o_block_ioctl, 949 951 .compat_ioctl = i2o_block_ioctl, 950 952 .getgeo = i2o_block_getgeo, 951 - .media_changed = i2o_block_media_changed 953 + .check_events = i2o_block_check_events, 952 954 }; 953 955 954 956 /** ··· 1000 1002 gd->major = I2O_MAJOR; 1001 1003 gd->queue = queue; 1002 1004 gd->fops = &i2o_block_fops; 1005 + gd->events = DISK_EVENT_MEDIA_CHANGE; 1003 1006 gd->private_data = dev; 1004 1007 1005 1008 dev->gd = gd;
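
With the queue-level plug gone, blk_queue_plugged() tests like the one dropped from the i2o request function simply disappear and a request_fn reduces to draining the queue. Sketch with an illustrative driver that completes everything immediately:

#include <linux/blkdev.h>

static void example_request_fn(struct request_queue *q)
{
	struct request *req;

	/* called with the queue lock held; no plugged check any more,
	 * just pull requests until the queue is empty */
	while ((req = blk_fetch_request(q)) != NULL)
		__blk_end_request_all(req, 0);	/* pretend it completed */
}
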
+1 -2
drivers/mmc/card/queue.c
··· 55 55 56 56 spin_lock_irq(q->queue_lock); 57 57 set_current_state(TASK_INTERRUPTIBLE); 58 - if (!blk_queue_plugged(q)) 59 - req = blk_fetch_request(q); 58 + req = blk_fetch_request(q); 60 59 mq->req = req; 61 60 spin_unlock_irq(q->queue_lock); 62 61
+1 -1
drivers/s390/block/dasd.c
··· 1917 1917 return; 1918 1918 } 1919 1919 /* Now we try to fetch requests from the request queue */ 1920 - while (!blk_queue_plugged(queue) && (req = blk_peek_request(queue))) { 1920 + while ((req = blk_peek_request(queue))) { 1921 1921 if (basedev->features & DASD_FEATURE_READONLY && 1922 1922 rq_data_dir(req) == WRITE) { 1923 1923 DBF_DEV_EVENT(DBF_ERR, basedev,
+6 -6
drivers/s390/char/tape_block.c
··· 48 48 static DEFINE_MUTEX(tape_block_mutex); 49 49 static int tapeblock_open(struct block_device *, fmode_t); 50 50 static int tapeblock_release(struct gendisk *, fmode_t); 51 - static int tapeblock_medium_changed(struct gendisk *); 51 + static unsigned int tapeblock_check_events(struct gendisk *, unsigned int); 52 52 static int tapeblock_revalidate_disk(struct gendisk *); 53 53 54 54 static const struct block_device_operations tapeblock_fops = { 55 55 .owner = THIS_MODULE, 56 56 .open = tapeblock_open, 57 57 .release = tapeblock_release, 58 - .media_changed = tapeblock_medium_changed, 58 + .check_events = tapeblock_check_events, 59 59 .revalidate_disk = tapeblock_revalidate_disk, 60 60 }; 61 61 ··· 161 161 162 162 spin_lock_irq(&device->blk_data.request_queue_lock); 163 163 while ( 164 - !blk_queue_plugged(queue) && 165 164 blk_peek_request(queue) && 166 165 nr_queued < TAPEBLOCK_MIN_REQUEUE 167 166 ) { ··· 236 237 disk->major = tapeblock_major; 237 238 disk->first_minor = device->first_minor; 238 239 disk->fops = &tapeblock_fops; 240 + disk->events = DISK_EVENT_MEDIA_CHANGE; 239 241 disk->private_data = tape_get_device(device); 240 242 disk->queue = blkdat->request_queue; 241 243 set_capacity(disk, 0); ··· 340 340 return 0; 341 341 } 342 342 343 - static int 344 - tapeblock_medium_changed(struct gendisk *disk) 343 + static unsigned int 344 + tapeblock_check_events(struct gendisk *disk, unsigned int clearing) 345 345 { 346 346 struct tape_device *device; 347 347 ··· 349 349 DBF_LH(6, "tapeblock_medium_changed(%p) = %d\n", 350 350 device, device->blk_data.medium_changed); 351 351 352 - return device->blk_data.medium_changed; 352 + return device->blk_data.medium_changed ? DISK_EVENT_MEDIA_CHANGE : 0; 353 353 } 354 354 355 355 /*
+19 -25
drivers/scsi/scsi_lib.c
··· 67 67 68 68 struct kmem_cache *scsi_sdb_cache; 69 69 70 + /* 71 + * When to reinvoke queueing after a resource shortage. It's 3 msecs to 72 + * not change behaviour from the previous unplug mechanism, experimentation 73 + * may prove this needs changing. 74 + */ 75 + #define SCSI_QUEUE_DELAY 3 76 + 70 77 static void scsi_run_queue(struct request_queue *q); 71 78 72 79 /* ··· 156 149 /* 157 150 * Requeue this command. It will go before all other commands 158 151 * that are already in the queue. 159 - * 160 - * NOTE: there is magic here about the way the queue is plugged if 161 - * we have no outstanding commands. 162 - * 163 - * Although we *don't* plug the queue, we call the request 164 - * function. The SCSI request function detects the blocked condition 165 - * and plugs the queue appropriately. 166 - */ 152 + */ 167 153 spin_lock_irqsave(q->queue_lock, flags); 168 154 blk_requeue_request(q, cmd->request); 169 155 spin_unlock_irqrestore(q->queue_lock, flags); ··· 1226 1226 case BLKPREP_DEFER: 1227 1227 /* 1228 1228 * If we defer, the blk_peek_request() returns NULL, but the 1229 - * queue must be restarted, so we plug here if no returning 1230 - * command will automatically do that. 1229 + * queue must be restarted, so we schedule a callback to happen 1230 + * shortly. 1231 1231 */ 1232 1232 if (sdev->device_busy == 0) 1233 - blk_plug_device(q); 1233 + blk_delay_queue(q, SCSI_QUEUE_DELAY); 1234 1234 break; 1235 1235 default: 1236 1236 req->cmd_flags |= REQ_DONTPREP; ··· 1269 1269 sdev_printk(KERN_INFO, sdev, 1270 1270 "unblocking device at zero depth\n")); 1271 1271 } else { 1272 - blk_plug_device(q); 1272 + blk_delay_queue(q, SCSI_QUEUE_DELAY); 1273 1273 return 0; 1274 1274 } 1275 1275 } ··· 1499 1499 * the host is no longer able to accept any more requests. 1500 1500 */ 1501 1501 shost = sdev->host; 1502 - while (!blk_queue_plugged(q)) { 1502 + for (;;) { 1503 1503 int rtn; 1504 1504 /* 1505 1505 * get next queueable request. We do this early to make sure ··· 1578 1578 */ 1579 1579 rtn = scsi_dispatch_cmd(cmd); 1580 1580 spin_lock_irq(q->queue_lock); 1581 - if(rtn) { 1582 - /* we're refusing the command; because of 1583 - * the way locks get dropped, we need to 1584 - * check here if plugging is required */ 1585 - if(sdev->device_busy == 0) 1586 - blk_plug_device(q); 1587 - 1588 - break; 1589 - } 1581 + if (rtn) 1582 + goto out_delay; 1590 1583 } 1591 1584 1592 1585 goto out; ··· 1598 1605 spin_lock_irq(q->queue_lock); 1599 1606 blk_requeue_request(q, req); 1600 1607 sdev->device_busy--; 1601 - if(sdev->device_busy == 0) 1602 - blk_plug_device(q); 1603 - out: 1608 + out_delay: 1609 + if (sdev->device_busy == 0) 1610 + blk_delay_queue(q, SCSI_QUEUE_DELAY); 1611 + out: 1604 1612 /* must be careful here...if we trigger the ->remove() function 1605 1613 * we cannot be holding the q lock */ 1606 1614 spin_unlock_irq(q->queue_lock);
+1 -1
drivers/scsi/scsi_transport_fc.c
··· 3913 3913 if (!get_device(dev)) 3914 3914 return; 3915 3915 3916 - while (!blk_queue_plugged(q)) { 3916 + while (1) { 3917 3917 if (rport && (rport->port_state == FC_PORTSTATE_BLOCKED) && 3918 3918 !(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)) 3919 3919 break;
+1 -5
drivers/scsi/scsi_transport_sas.c
··· 173 173 int ret; 174 174 int (*handler)(struct Scsi_Host *, struct sas_rphy *, struct request *); 175 175 176 - while (!blk_queue_plugged(q)) { 177 - req = blk_fetch_request(q); 178 - if (!req) 179 - break; 180 - 176 + while ((req = blk_fetch_request(q)) != NULL) { 181 177 spin_unlock_irq(q->queue_lock); 182 178 183 179 handler = to_sas_internal(shost->transportt)->f->smp_handler;
+7 -4
drivers/staging/hv/blkvsc_drv.c
··· 124 124 125 125 static int blkvsc_open(struct block_device *bdev, fmode_t mode); 126 126 static int blkvsc_release(struct gendisk *disk, fmode_t mode); 127 - static int blkvsc_media_changed(struct gendisk *gd); 127 + static unsigned int blkvsc_check_events(struct gendisk *gd, 128 + unsigned int clearing); 128 129 static int blkvsc_revalidate_disk(struct gendisk *gd); 129 130 static int blkvsc_getgeo(struct block_device *bd, struct hd_geometry *hg); 130 131 static int blkvsc_ioctl(struct block_device *bd, fmode_t mode, ··· 156 155 .owner = THIS_MODULE, 157 156 .open = blkvsc_open, 158 157 .release = blkvsc_release, 159 - .media_changed = blkvsc_media_changed, 158 + .check_events = blkvsc_check_events, 160 159 .revalidate_disk = blkvsc_revalidate_disk, 161 160 .getgeo = blkvsc_getgeo, 162 161 .ioctl = blkvsc_ioctl, ··· 358 357 else 359 358 blkdev->gd->first_minor = 0; 360 359 blkdev->gd->fops = &block_ops; 360 + blkdev->gd->events = DISK_EVENT_MEDIA_CHANGE; 361 361 blkdev->gd->private_data = blkdev; 362 362 blkdev->gd->driverfs_dev = &(blkdev->device_ctx->device); 363 363 sprintf(blkdev->gd->disk_name, "hd%c", 'a' + devnum); ··· 1339 1337 return 0; 1340 1338 } 1341 1339 1342 - static int blkvsc_media_changed(struct gendisk *gd) 1340 + static unsigned int blkvsc_check_events(struct gendisk *gd, 1341 + unsigned int clearing) 1343 1342 { 1344 1343 DPRINT_DBG(BLKVSC_DRV, "- enter\n"); 1345 - return 1; 1344 + return DISK_EVENT_MEDIA_CHANGE; 1346 1345 } 1347 1346 1348 1347 static int blkvsc_revalidate_disk(struct gendisk *gd)
+7 -4
drivers/staging/westbridge/astoria/block/cyasblkdev_block.c
··· 381 381 return -ENOTTY; 382 382 } 383 383 384 - /* Media_changed block_device opp 384 + /* check_events block_device opp 385 385 * this one is called by kernel to confirm if the media really changed 386 386 * as we indicated by issuing check_disk_change() call */ 387 - int cyasblkdev_media_changed(struct gendisk *gd) 387 + unsigned int cyasblkdev_check_events(struct gendisk *gd, unsigned int clearing) 388 388 { 389 389 struct cyasblkdev_blk_data *bd; 390 390 ··· 402 402 #endif 403 403 } 404 404 405 - /* return media change state "1" yes, 0 no */ 405 + /* return media change state - DISK_EVENT_MEDIA_CHANGE yes, 0 no */ 406 406 return 0; 407 407 } 408 408 ··· 432 432 .ioctl = cyasblkdev_blk_ioctl, 433 433 /* .getgeo = cyasblkdev_blk_getgeo, */ 434 434 /* added to support media removal( real and simulated) media */ 435 - .media_changed = cyasblkdev_media_changed, 435 + .check_events = cyasblkdev_check_events, 436 436 /* added to support media removal( real and simulated) media */ 437 437 .revalidate_disk = cyasblkdev_revalidate_disk, 438 438 .owner = THIS_MODULE, ··· 1090 1090 bd->user_disk_0->first_minor = devidx << CYASBLKDEV_SHIFT; 1091 1091 bd->user_disk_0->minors = 8; 1092 1092 bd->user_disk_0->fops = &cyasblkdev_bdops; 1093 + bd->user_disk_0->events = DISK_EVENT_MEDIA_CHANGE; 1093 1094 bd->user_disk_0->private_data = bd; 1094 1095 bd->user_disk_0->queue = bd->queue.queue; 1095 1096 bd->dbgprn_flags = DBGPRN_RD_RQ; ··· 1191 1190 bd->user_disk_1->first_minor = (devidx + 1) << CYASBLKDEV_SHIFT; 1192 1191 bd->user_disk_1->minors = 8; 1193 1192 bd->user_disk_1->fops = &cyasblkdev_bdops; 1193 + bd->user_disk_0->events = DISK_EVENT_MEDIA_CHANGE; 1194 1194 bd->user_disk_1->private_data = bd; 1195 1195 bd->user_disk_1->queue = bd->queue.queue; 1196 1196 bd->dbgprn_flags = DBGPRN_RD_RQ; ··· 1280 1278 (devidx + 2) << CYASBLKDEV_SHIFT; 1281 1279 bd->system_disk->minors = 8; 1282 1280 bd->system_disk->fops = &cyasblkdev_bdops; 1281 + bd->system_disk->events = DISK_EVENT_MEDIA_CHANGE; 1283 1282 bd->system_disk->private_data = bd; 1284 1283 bd->system_disk->queue = bd->queue.queue; 1285 1284 /* don't search for vfat
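
One detail worth flagging in the cyasblkdev conversion: the hunk that initialises user_disk_1 adds a second assignment to bd->user_disk_0->events, so user_disk_1 never advertises its media-change event. The intended line is presumably:

	bd->user_disk_1->events = DISK_EVENT_MEDIA_CHANGE;
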
+3 -4
drivers/target/target_core_iblock.c
··· 391 391 { 392 392 struct se_device *dev = task->task_se_cmd->se_dev; 393 393 struct iblock_req *req = IBLOCK_REQ(task); 394 - struct iblock_dev *ibd = (struct iblock_dev *)req->ib_dev; 395 - struct request_queue *q = bdev_get_queue(ibd->ibd_bd); 396 394 struct bio *bio = req->ib_bio, *nbio = NULL; 395 + struct blk_plug plug; 397 396 int rw; 398 397 399 398 if (task->task_data_direction == DMA_TO_DEVICE) { ··· 410 411 rw = READ; 411 412 } 412 413 414 + blk_start_plug(&plug); 413 415 while (bio) { 414 416 nbio = bio->bi_next; 415 417 bio->bi_next = NULL; ··· 420 420 submit_bio(rw, bio); 421 421 bio = nbio; 422 422 } 423 + blk_finish_plug(&plug); 423 424 424 - if (q->unplug_fn) 425 - q->unplug_fn(q); 426 425 return PYX_TRANSPORT_SENT_TO_TRANSPORT; 427 426 } 428 427
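The target_core_iblock change above is the shape that recurs through the rest of this series: instead of poking q->unplug_fn after queuing a batch of bios, the submitter brackets the submission with an on-stack struct blk_plug. A hedged, generic sketch of that pattern; the bio-chain walk mirrors the loop above, and the function name is made up.

        #include <linux/bio.h>
        #include <linux/blkdev.h>

        /* walk a singly linked bio chain (via bi_next) under one plug */
        static void submit_bio_chain(int rw, struct bio *bio)
        {
                struct blk_plug plug;

                blk_start_plug(&plug);          /* batch requests on this task's plug */
                while (bio) {
                        struct bio *next = bio->bi_next;

                        bio->bi_next = NULL;
                        submit_bio(rw, bio);    /* queued/merged on the plug list */
                        bio = next;
                }
                blk_finish_plug(&plug);         /* flush the whole batch to the queue */
        }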
-1
fs/adfs/inode.c
··· 72 72 static const struct address_space_operations adfs_aops = { 73 73 .readpage = adfs_readpage, 74 74 .writepage = adfs_writepage, 75 - .sync_page = block_sync_page, 76 75 .write_begin = adfs_write_begin, 77 76 .write_end = generic_write_end, 78 77 .bmap = _adfs_bmap
-2
fs/affs/file.c
··· 429 429 const struct address_space_operations affs_aops = { 430 430 .readpage = affs_readpage, 431 431 .writepage = affs_writepage, 432 - .sync_page = block_sync_page, 433 432 .write_begin = affs_write_begin, 434 433 .write_end = generic_write_end, 435 434 .bmap = _affs_bmap ··· 785 786 const struct address_space_operations affs_aops_ofs = { 786 787 .readpage = affs_readpage_ofs, 787 788 //.writepage = affs_writepage_ofs, 788 - //.sync_page = affs_sync_page_ofs, 789 789 .write_begin = affs_write_begin_ofs, 790 790 .write_end = affs_write_end_ofs 791 791 };
+7 -70
fs/aio.c
··· 34 34 #include <linux/security.h> 35 35 #include <linux/eventfd.h> 36 36 #include <linux/blkdev.h> 37 - #include <linux/mempool.h> 38 - #include <linux/hash.h> 39 37 #include <linux/compat.h> 40 38 41 39 #include <asm/kmap_types.h> ··· 63 65 static DEFINE_SPINLOCK(fput_lock); 64 66 static LIST_HEAD(fput_head); 65 67 66 - #define AIO_BATCH_HASH_BITS 3 /* allocated on-stack, so don't go crazy */ 67 - #define AIO_BATCH_HASH_SIZE (1 << AIO_BATCH_HASH_BITS) 68 - struct aio_batch_entry { 69 - struct hlist_node list; 70 - struct address_space *mapping; 71 - }; 72 - mempool_t *abe_pool; 73 - 74 68 static void aio_kick_handler(struct work_struct *); 75 69 static void aio_queue_work(struct kioctx *); 76 70 ··· 76 86 kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); 77 87 78 88 aio_wq = alloc_workqueue("aio", 0, 1); /* used to limit concurrency */ 79 - abe_pool = mempool_create_kmalloc_pool(1, sizeof(struct aio_batch_entry)); 80 - BUG_ON(!aio_wq || !abe_pool); 89 + BUG_ON(!aio_wq); 81 90 82 91 pr_debug("aio_setup: sizeof(struct page) = %d\n", (int)sizeof(struct page)); 83 92 ··· 1514 1525 return 0; 1515 1526 } 1516 1527 1517 - static void aio_batch_add(struct address_space *mapping, 1518 - struct hlist_head *batch_hash) 1519 - { 1520 - struct aio_batch_entry *abe; 1521 - struct hlist_node *pos; 1522 - unsigned bucket; 1523 - 1524 - bucket = hash_ptr(mapping, AIO_BATCH_HASH_BITS); 1525 - hlist_for_each_entry(abe, pos, &batch_hash[bucket], list) { 1526 - if (abe->mapping == mapping) 1527 - return; 1528 - } 1529 - 1530 - abe = mempool_alloc(abe_pool, GFP_KERNEL); 1531 - 1532 - /* 1533 - * we should be using igrab here, but 1534 - * we don't want to hammer on the global 1535 - * inode spinlock just to take an extra 1536 - * reference on a file that we must already 1537 - * have a reference to. 1538 - * 1539 - * When we're called, we always have a reference 1540 - * on the file, so we must always have a reference 1541 - * on the inode, so ihold() is safe here. 
1542 - */ 1543 - ihold(mapping->host); 1544 - abe->mapping = mapping; 1545 - hlist_add_head(&abe->list, &batch_hash[bucket]); 1546 - return; 1547 - } 1548 - 1549 - static void aio_batch_free(struct hlist_head *batch_hash) 1550 - { 1551 - struct aio_batch_entry *abe; 1552 - struct hlist_node *pos, *n; 1553 - int i; 1554 - 1555 - for (i = 0; i < AIO_BATCH_HASH_SIZE; i++) { 1556 - hlist_for_each_entry_safe(abe, pos, n, &batch_hash[i], list) { 1557 - blk_run_address_space(abe->mapping); 1558 - iput(abe->mapping->host); 1559 - hlist_del(&abe->list); 1560 - mempool_free(abe, abe_pool); 1561 - } 1562 - } 1563 - } 1564 - 1565 1528 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, 1566 - struct iocb *iocb, struct hlist_head *batch_hash, 1567 - bool compat) 1529 + struct iocb *iocb, bool compat) 1568 1530 { 1569 1531 struct kiocb *req; 1570 1532 struct file *file; ··· 1606 1666 ; 1607 1667 } 1608 1668 spin_unlock_irq(&ctx->ctx_lock); 1609 - if (req->ki_opcode == IOCB_CMD_PREAD || 1610 - req->ki_opcode == IOCB_CMD_PREADV || 1611 - req->ki_opcode == IOCB_CMD_PWRITE || 1612 - req->ki_opcode == IOCB_CMD_PWRITEV) 1613 - aio_batch_add(file->f_mapping, batch_hash); 1614 1669 1615 1670 aio_put_req(req); /* drop extra ref to req */ 1616 1671 return 0; ··· 1622 1687 struct kioctx *ctx; 1623 1688 long ret = 0; 1624 1689 int i; 1625 - struct hlist_head batch_hash[AIO_BATCH_HASH_SIZE] = { { 0, }, }; 1690 + struct blk_plug plug; 1626 1691 1627 1692 if (unlikely(nr < 0)) 1628 1693 return -EINVAL; ··· 1638 1703 pr_debug("EINVAL: io_submit: invalid context id\n"); 1639 1704 return -EINVAL; 1640 1705 } 1706 + 1707 + blk_start_plug(&plug); 1641 1708 1642 1709 /* 1643 1710 * AKPM: should this return a partial result if some of the IOs were ··· 1659 1722 break; 1660 1723 } 1661 1724 1662 - ret = io_submit_one(ctx, user_iocb, &tmp, batch_hash, compat); 1725 + ret = io_submit_one(ctx, user_iocb, &tmp, compat); 1663 1726 if (ret) 1664 1727 break; 1665 1728 } 1666 - aio_batch_free(batch_hash); 1729 + blk_finish_plug(&plug); 1667 1730 1668 1731 put_ioctx(ctx); 1669 1732 return i ? i : ret;
-1
fs/befs/linuxvfs.c
··· 75 75 76 76 static const struct address_space_operations befs_aops = { 77 77 .readpage = befs_readpage, 78 - .sync_page = block_sync_page, 79 78 .bmap = befs_bmap, 80 79 }; 81 80
-1
fs/bfs/file.c
··· 186 186 const struct address_space_operations bfs_aops = { 187 187 .readpage = bfs_readpage, 188 188 .writepage = bfs_writepage, 189 - .sync_page = block_sync_page, 190 189 .write_begin = bfs_write_begin, 191 190 .write_end = generic_write_end, 192 191 .bmap = bfs_bmap,
+3
fs/bio-integrity.c
··· 761 761 { 762 762 unsigned int max_slab = vecs_to_idx(BIO_MAX_PAGES); 763 763 764 + if (bs->bio_integrity_pool) 765 + return 0; 766 + 764 767 bs->bio_integrity_pool = 765 768 mempool_create_slab_pool(pool_size, bip_slab[max_slab].slab); 766 769
+4 -6
fs/bio.c
··· 43 43 * unsigned short 44 44 */ 45 45 #define BV(x) { .nr_vecs = x, .name = "biovec-"__stringify(x) } 46 - struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = { 46 + static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = { 47 47 BV(1), BV(4), BV(16), BV(64), BV(128), BV(BIO_MAX_PAGES), 48 48 }; 49 49 #undef BV ··· 1636 1636 if (!bs->bio_pool) 1637 1637 goto bad; 1638 1638 1639 - if (bioset_integrity_create(bs, pool_size)) 1640 - goto bad; 1641 - 1642 1639 if (!biovec_create_pools(bs, pool_size)) 1643 1640 return bs; 1644 1641 ··· 1653 1656 int size; 1654 1657 struct biovec_slab *bvs = bvec_slabs + i; 1655 1658 1656 - #ifndef CONFIG_BLK_DEV_INTEGRITY 1657 1659 if (bvs->nr_vecs <= BIO_INLINE_VECS) { 1658 1660 bvs->slab = NULL; 1659 1661 continue; 1660 1662 } 1661 - #endif 1662 1663 1663 1664 size = bvs->nr_vecs * sizeof(struct bio_vec); 1664 1665 bvs->slab = kmem_cache_create(bvs->name, size, 0, ··· 1678 1683 fs_bio_set = bioset_create(BIO_POOL_SIZE, 0); 1679 1684 if (!fs_bio_set) 1680 1685 panic("bio: can't allocate bios\n"); 1686 + 1687 + if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE)) 1688 + panic("bio: can't create integrity pool\n"); 1681 1689 1682 1690 bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES, 1683 1691 sizeof(struct bio_pair));
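With the fs/bio.c and fs/bio-integrity.c hunks above, bioset_create() no longer sets up an integrity mempool implicitly; a subsystem that wants one asks for it, and the early return added to bioset_integrity_create() makes a repeated call harmless. A sketch of the opt-in, with my_/MY_ names and the pool size chosen purely for illustration:

        #include <linux/bio.h>

        #define MY_POOL_SIZE    4       /* illustrative pool size */

        static struct bio_set *my_bioset_create(void)
        {
                struct bio_set *bs = bioset_create(MY_POOL_SIZE, 0);

                if (!bs)
                        return NULL;

                /* explicit opt-in; per the hunk above, a no-op if already set up */
                if (bioset_integrity_create(bs, MY_POOL_SIZE)) {
                        bioset_free(bs);
                        return NULL;
                }
                return bs;
        }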
+14 -13
fs/block_dev.c
··· 1087 1087 if (!disk) 1088 1088 goto out; 1089 1089 1090 + disk_block_events(disk); 1090 1091 mutex_lock_nested(&bdev->bd_mutex, for_part); 1091 1092 if (!bdev->bd_openers) { 1092 1093 bdev->bd_disk = disk; ··· 1109 1108 */ 1110 1109 disk_put_part(bdev->bd_part); 1111 1110 bdev->bd_part = NULL; 1112 - module_put(disk->fops->owner); 1113 - put_disk(disk); 1114 1111 bdev->bd_disk = NULL; 1115 1112 mutex_unlock(&bdev->bd_mutex); 1113 + disk_unblock_events(disk); 1114 + module_put(disk->fops->owner); 1115 + put_disk(disk); 1116 1116 goto restart; 1117 1117 } 1118 1118 if (ret) ··· 1150 1148 bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9); 1151 1149 } 1152 1150 } else { 1153 - module_put(disk->fops->owner); 1154 - put_disk(disk); 1155 - disk = NULL; 1156 1151 if (bdev->bd_contains == bdev) { 1157 1152 if (bdev->bd_disk->fops->open) { 1158 1153 ret = bdev->bd_disk->fops->open(bdev, mode); ··· 1159 1160 if (bdev->bd_invalidated) 1160 1161 rescan_partitions(bdev->bd_disk, bdev); 1161 1162 } 1163 + /* only one opener holds refs to the module and disk */ 1164 + module_put(disk->fops->owner); 1165 + put_disk(disk); 1162 1166 } 1163 1167 bdev->bd_openers++; 1164 1168 if (for_part) 1165 1169 bdev->bd_part_count++; 1166 1170 mutex_unlock(&bdev->bd_mutex); 1171 + disk_unblock_events(disk); 1167 1172 return 0; 1168 1173 1169 1174 out_clear: ··· 1180 1177 bdev->bd_contains = NULL; 1181 1178 out_unlock_bdev: 1182 1179 mutex_unlock(&bdev->bd_mutex); 1183 - out: 1184 - if (disk) 1185 - module_put(disk->fops->owner); 1180 + disk_unblock_events(disk); 1181 + module_put(disk->fops->owner); 1186 1182 put_disk(disk); 1183 + out: 1187 1184 bdput(bdev); 1188 1185 1189 1186 return ret; ··· 1449 1446 if (bdev_free) { 1450 1447 if (bdev->bd_write_holder) { 1451 1448 disk_unblock_events(bdev->bd_disk); 1452 - bdev->bd_write_holder = false; 1453 - } else 1454 1449 disk_check_events(bdev->bd_disk); 1450 + bdev->bd_write_holder = false; 1451 + } 1455 1452 } 1456 1453 1457 1454 mutex_unlock(&bdev->bd_mutex); 1458 - } else 1459 - disk_check_events(bdev->bd_disk); 1455 + } 1460 1456 1461 1457 return __blkdev_put(bdev, mode, 0); 1462 1458 } ··· 1529 1527 static const struct address_space_operations def_blk_aops = { 1530 1528 .readpage = blkdev_readpage, 1531 1529 .writepage = blkdev_writepage, 1532 - .sync_page = block_sync_page, 1533 1530 .write_begin = blkdev_write_begin, 1534 1531 .write_end = blkdev_write_end, 1535 1532 .writepages = generic_writepages,
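In the __blkdev_get()/blkdev_put() hunks above, the open path is bracketed by disk_block_events() and disk_unblock_events() so that event polling cannot race with the open and any partition rescan, and the module/disk references are now dropped outside bd_mutex. Reduced to a sketch, with mydrv_do_open() standing in for the real first-opener work:

        #include <linux/fs.h>
        #include <linux/genhd.h>
        #include <linux/mutex.h>

        /* stand-in for the first-opener / partition work done in __blkdev_get() */
        static int mydrv_do_open(struct block_device *bdev, struct gendisk *disk,
                                 fmode_t mode)
        {
                return disk->fops->open ? disk->fops->open(bdev, mode) : 0;
        }

        static int open_with_events_blocked(struct block_device *bdev,
                                            struct gendisk *disk, fmode_t mode)
        {
                int ret;

                disk_block_events(disk);        /* quiesce media-change polling */
                mutex_lock(&bdev->bd_mutex);
                ret = mydrv_do_open(bdev, disk, mode);
                mutex_unlock(&bdev->bd_mutex);
                disk_unblock_events(disk);      /* resume polling once open finishes */

                return ret;
        }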
-79
fs/btrfs/disk-io.c
··· 847 847 .writepages = btree_writepages, 848 848 .releasepage = btree_releasepage, 849 849 .invalidatepage = btree_invalidatepage, 850 - .sync_page = block_sync_page, 851 850 #ifdef CONFIG_MIGRATION 852 851 .migratepage = btree_migratepage, 853 852 #endif ··· 1330 1331 } 1331 1332 1332 1333 /* 1333 - * this unplugs every device on the box, and it is only used when page 1334 - * is null 1335 - */ 1336 - static void __unplug_io_fn(struct backing_dev_info *bdi, struct page *page) 1337 - { 1338 - struct btrfs_device *device; 1339 - struct btrfs_fs_info *info; 1340 - 1341 - info = (struct btrfs_fs_info *)bdi->unplug_io_data; 1342 - list_for_each_entry(device, &info->fs_devices->devices, dev_list) { 1343 - if (!device->bdev) 1344 - continue; 1345 - 1346 - bdi = blk_get_backing_dev_info(device->bdev); 1347 - if (bdi->unplug_io_fn) 1348 - bdi->unplug_io_fn(bdi, page); 1349 - } 1350 - } 1351 - 1352 - static void btrfs_unplug_io_fn(struct backing_dev_info *bdi, struct page *page) 1353 - { 1354 - struct inode *inode; 1355 - struct extent_map_tree *em_tree; 1356 - struct extent_map *em; 1357 - struct address_space *mapping; 1358 - u64 offset; 1359 - 1360 - /* the generic O_DIRECT read code does this */ 1361 - if (1 || !page) { 1362 - __unplug_io_fn(bdi, page); 1363 - return; 1364 - } 1365 - 1366 - /* 1367 - * page->mapping may change at any time. Get a consistent copy 1368 - * and use that for everything below 1369 - */ 1370 - smp_mb(); 1371 - mapping = page->mapping; 1372 - if (!mapping) 1373 - return; 1374 - 1375 - inode = mapping->host; 1376 - 1377 - /* 1378 - * don't do the expensive searching for a small number of 1379 - * devices 1380 - */ 1381 - if (BTRFS_I(inode)->root->fs_info->fs_devices->open_devices <= 2) { 1382 - __unplug_io_fn(bdi, page); 1383 - return; 1384 - } 1385 - 1386 - offset = page_offset(page); 1387 - 1388 - em_tree = &BTRFS_I(inode)->extent_tree; 1389 - read_lock(&em_tree->lock); 1390 - em = lookup_extent_mapping(em_tree, offset, PAGE_CACHE_SIZE); 1391 - read_unlock(&em_tree->lock); 1392 - if (!em) { 1393 - __unplug_io_fn(bdi, page); 1394 - return; 1395 - } 1396 - 1397 - if (em->block_start >= EXTENT_MAP_LAST_BYTE) { 1398 - free_extent_map(em); 1399 - __unplug_io_fn(bdi, page); 1400 - return; 1401 - } 1402 - offset = offset - em->start; 1403 - btrfs_unplug_page(&BTRFS_I(inode)->root->fs_info->mapping_tree, 1404 - em->block_start + offset, page); 1405 - free_extent_map(em); 1406 - } 1407 - 1408 - /* 1409 1334 * If this fails, caller must call bdi_destroy() to get rid of the 1410 1335 * bdi again. 1411 1336 */ ··· 1343 1420 return err; 1344 1421 1345 1422 bdi->ra_pages = default_backing_dev_info.ra_pages; 1346 - bdi->unplug_io_fn = btrfs_unplug_io_fn; 1347 - bdi->unplug_io_data = info; 1348 1423 bdi->congested_fn = btrfs_congested_fn; 1349 1424 bdi->congested_data = info; 1350 1425 return 0;
+1 -1
fs/btrfs/extent_io.c
··· 2188 2188 unsigned long nr_written = 0; 2189 2189 2190 2190 if (wbc->sync_mode == WB_SYNC_ALL) 2191 - write_flags = WRITE_SYNC_PLUG; 2191 + write_flags = WRITE_SYNC; 2192 2192 else 2193 2193 write_flags = WRITE; 2194 2194
-1
fs/btrfs/inode.c
··· 7340 7340 .writepage = btrfs_writepage, 7341 7341 .writepages = btrfs_writepages, 7342 7342 .readpages = btrfs_readpages, 7343 - .sync_page = block_sync_page, 7344 7343 .direct_IO = btrfs_direct_IO, 7345 7344 .invalidatepage = btrfs_invalidatepage, 7346 7345 .releasepage = btrfs_releasepage,
+11 -80
fs/btrfs/volumes.c
··· 162 162 struct bio *cur; 163 163 int again = 0; 164 164 unsigned long num_run; 165 - unsigned long num_sync_run; 166 165 unsigned long batch_run = 0; 167 166 unsigned long limit; 168 167 unsigned long last_waited = 0; ··· 171 172 fs_info = device->dev_root->fs_info; 172 173 limit = btrfs_async_submit_limit(fs_info); 173 174 limit = limit * 2 / 3; 174 - 175 - /* we want to make sure that every time we switch from the sync 176 - * list to the normal list, we unplug 177 - */ 178 - num_sync_run = 0; 179 175 180 176 loop: 181 177 spin_lock(&device->io_lock); ··· 217 223 218 224 spin_unlock(&device->io_lock); 219 225 220 - /* 221 - * if we're doing the regular priority list, make sure we unplug 222 - * for any high prio bios we've sent down 223 - */ 224 - if (pending_bios == &device->pending_bios && num_sync_run > 0) { 225 - num_sync_run = 0; 226 - blk_run_backing_dev(bdi, NULL); 227 - } 228 - 229 226 while (pending) { 230 227 231 228 rmb(); ··· 244 259 245 260 BUG_ON(atomic_read(&cur->bi_cnt) == 0); 246 261 247 - if (cur->bi_rw & REQ_SYNC) 248 - num_sync_run++; 249 - 250 262 submit_bio(cur->bi_rw, cur); 251 263 num_run++; 252 264 batch_run++; 253 - if (need_resched()) { 254 - if (num_sync_run) { 255 - blk_run_backing_dev(bdi, NULL); 256 - num_sync_run = 0; 257 - } 265 + if (need_resched()) 258 266 cond_resched(); 259 - } 260 267 261 268 /* 262 269 * we made progress, there is more work to do and the bdi ··· 281 304 * against it before looping 282 305 */ 283 306 last_waited = ioc->last_waited; 284 - if (need_resched()) { 285 - if (num_sync_run) { 286 - blk_run_backing_dev(bdi, NULL); 287 - num_sync_run = 0; 288 - } 307 + if (need_resched()) 289 308 cond_resched(); 290 - } 291 309 continue; 292 310 } 293 311 spin_lock(&device->io_lock); ··· 294 322 goto done; 295 323 } 296 324 } 297 - 298 - if (num_sync_run) { 299 - num_sync_run = 0; 300 - blk_run_backing_dev(bdi, NULL); 301 - } 302 - /* 303 - * IO has already been through a long path to get here. Checksumming, 304 - * async helper threads, perhaps compression. We've done a pretty 305 - * good job of collecting a batch of IO and should just unplug 306 - * the device right away. 307 - * 308 - * This will help anyone who is waiting on the IO, they might have 309 - * already unplugged, but managed to do so before the bio they 310 - * cared about found its way down here. 
311 - */ 312 - blk_run_backing_dev(bdi, NULL); 313 325 314 326 cond_resched(); 315 327 if (again) ··· 2911 2955 static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw, 2912 2956 u64 logical, u64 *length, 2913 2957 struct btrfs_multi_bio **multi_ret, 2914 - int mirror_num, struct page *unplug_page) 2958 + int mirror_num) 2915 2959 { 2916 2960 struct extent_map *em; 2917 2961 struct map_lookup *map; ··· 2942 2986 read_lock(&em_tree->lock); 2943 2987 em = lookup_extent_mapping(em_tree, logical, *length); 2944 2988 read_unlock(&em_tree->lock); 2945 - 2946 - if (!em && unplug_page) { 2947 - kfree(multi); 2948 - return 0; 2949 - } 2950 2989 2951 2990 if (!em) { 2952 2991 printk(KERN_CRIT "unable to find logical %llu len %llu\n", ··· 2998 3047 *length = em->len - offset; 2999 3048 } 3000 3049 3001 - if (!multi_ret && !unplug_page) 3050 + if (!multi_ret) 3002 3051 goto out; 3003 3052 3004 3053 num_stripes = 1; 3005 3054 stripe_index = 0; 3006 3055 if (map->type & BTRFS_BLOCK_GROUP_RAID1) { 3007 - if (unplug_page || (rw & REQ_WRITE)) 3056 + if (rw & REQ_WRITE) 3008 3057 num_stripes = map->num_stripes; 3009 3058 else if (mirror_num) 3010 3059 stripe_index = mirror_num - 1; ··· 3026 3075 stripe_index = do_div(stripe_nr, factor); 3027 3076 stripe_index *= map->sub_stripes; 3028 3077 3029 - if (unplug_page || (rw & REQ_WRITE)) 3078 + if (rw & REQ_WRITE) 3030 3079 num_stripes = map->sub_stripes; 3031 3080 else if (mirror_num) 3032 3081 stripe_index += mirror_num - 1; ··· 3046 3095 BUG_ON(stripe_index >= map->num_stripes); 3047 3096 3048 3097 for (i = 0; i < num_stripes; i++) { 3049 - if (unplug_page) { 3050 - struct btrfs_device *device; 3051 - struct backing_dev_info *bdi; 3052 - 3053 - device = map->stripes[stripe_index].dev; 3054 - if (device->bdev) { 3055 - bdi = blk_get_backing_dev_info(device->bdev); 3056 - if (bdi->unplug_io_fn) 3057 - bdi->unplug_io_fn(bdi, unplug_page); 3058 - } 3059 - } else { 3060 - multi->stripes[i].physical = 3061 - map->stripes[stripe_index].physical + 3062 - stripe_offset + stripe_nr * map->stripe_len; 3063 - multi->stripes[i].dev = map->stripes[stripe_index].dev; 3064 - } 3098 + multi->stripes[i].physical = 3099 + map->stripes[stripe_index].physical + 3100 + stripe_offset + stripe_nr * map->stripe_len; 3101 + multi->stripes[i].dev = map->stripes[stripe_index].dev; 3065 3102 stripe_index++; 3066 3103 } 3067 3104 if (multi_ret) { ··· 3067 3128 struct btrfs_multi_bio **multi_ret, int mirror_num) 3068 3129 { 3069 3130 return __btrfs_map_block(map_tree, rw, logical, length, multi_ret, 3070 - mirror_num, NULL); 3131 + mirror_num); 3071 3132 } 3072 3133 3073 3134 int btrfs_rmap_block(struct btrfs_mapping_tree *map_tree, ··· 3133 3194 3134 3195 free_extent_map(em); 3135 3196 return 0; 3136 - } 3137 - 3138 - int btrfs_unplug_page(struct btrfs_mapping_tree *map_tree, 3139 - u64 logical, struct page *page) 3140 - { 3141 - u64 length = PAGE_CACHE_SIZE; 3142 - return __btrfs_map_block(map_tree, READ, logical, &length, 3143 - NULL, 0, page); 3144 3197 } 3145 3198 3146 3199 static void end_bio_multi_stripe(struct bio *bio, int err)
+14 -37
fs/buffer.c
··· 54 54 } 55 55 EXPORT_SYMBOL(init_buffer); 56 56 57 - static int sync_buffer(void *word) 57 + static int sleep_on_buffer(void *word) 58 58 { 59 - struct block_device *bd; 60 - struct buffer_head *bh 61 - = container_of(word, struct buffer_head, b_state); 62 - 63 - smp_mb(); 64 - bd = bh->b_bdev; 65 - if (bd) 66 - blk_run_address_space(bd->bd_inode->i_mapping); 67 59 io_schedule(); 68 60 return 0; 69 61 } 70 62 71 63 void __lock_buffer(struct buffer_head *bh) 72 64 { 73 - wait_on_bit_lock(&bh->b_state, BH_Lock, sync_buffer, 65 + wait_on_bit_lock(&bh->b_state, BH_Lock, sleep_on_buffer, 74 66 TASK_UNINTERRUPTIBLE); 75 67 } 76 68 EXPORT_SYMBOL(__lock_buffer); ··· 82 90 */ 83 91 void __wait_on_buffer(struct buffer_head * bh) 84 92 { 85 - wait_on_bit(&bh->b_state, BH_Lock, sync_buffer, TASK_UNINTERRUPTIBLE); 93 + wait_on_bit(&bh->b_state, BH_Lock, sleep_on_buffer, TASK_UNINTERRUPTIBLE); 86 94 } 87 95 EXPORT_SYMBOL(__wait_on_buffer); 88 96 ··· 741 749 { 742 750 struct buffer_head *bh; 743 751 struct list_head tmp; 744 - struct address_space *mapping, *prev_mapping = NULL; 752 + struct address_space *mapping; 745 753 int err = 0, err2; 754 + struct blk_plug plug; 746 755 747 756 INIT_LIST_HEAD(&tmp); 757 + blk_start_plug(&plug); 748 758 749 759 spin_lock(lock); 750 760 while (!list_empty(list)) { ··· 769 775 * still in flight on potentially older 770 776 * contents. 771 777 */ 772 - write_dirty_buffer(bh, WRITE_SYNC_PLUG); 778 + write_dirty_buffer(bh, WRITE_SYNC); 773 779 774 780 /* 775 781 * Kick off IO for the previous mapping. Note ··· 777 783 * wait_on_buffer() will do that for us 778 784 * through sync_buffer(). 779 785 */ 780 - if (prev_mapping && prev_mapping != mapping) 781 - blk_run_address_space(prev_mapping); 782 - prev_mapping = mapping; 783 - 784 786 brelse(bh); 785 787 spin_lock(lock); 786 788 } 787 789 } 788 790 } 791 + 792 + spin_unlock(lock); 793 + blk_finish_plug(&plug); 794 + spin_lock(lock); 789 795 790 796 while (!list_empty(&tmp)) { 791 797 bh = BH_ENTRY(tmp.prev); ··· 1608 1614 * prevents this contention from occurring. 1609 1615 * 1610 1616 * If block_write_full_page() is called with wbc->sync_mode == 1611 - * WB_SYNC_ALL, the writes are posted using WRITE_SYNC_PLUG; this 1612 - * causes the writes to be flagged as synchronous writes, but the 1613 - * block device queue will NOT be unplugged, since usually many pages 1614 - * will be pushed to the out before the higher-level caller actually 1615 - * waits for the writes to be completed. The various wait functions, 1616 - * such as wait_on_writeback_range() will ultimately call sync_page() 1617 - * which will ultimately call blk_run_backing_dev(), which will end up 1618 - * unplugging the device queue. 1617 + * WB_SYNC_ALL, the writes are posted using WRITE_SYNC; this 1618 + * causes the writes to be flagged as synchronous writes. 1619 1619 */ 1620 1620 static int __block_write_full_page(struct inode *inode, struct page *page, 1621 1621 get_block_t *get_block, struct writeback_control *wbc, ··· 1622 1634 const unsigned blocksize = 1 << inode->i_blkbits; 1623 1635 int nr_underway = 0; 1624 1636 int write_op = (wbc->sync_mode == WB_SYNC_ALL ? 
1625 - WRITE_SYNC_PLUG : WRITE); 1637 + WRITE_SYNC : WRITE); 1626 1638 1627 1639 BUG_ON(!PageLocked(page)); 1628 1640 ··· 3125 3137 return ret; 3126 3138 } 3127 3139 EXPORT_SYMBOL(try_to_free_buffers); 3128 - 3129 - void block_sync_page(struct page *page) 3130 - { 3131 - struct address_space *mapping; 3132 - 3133 - smp_mb(); 3134 - mapping = page_mapping(page); 3135 - if (mapping) 3136 - blk_run_backing_dev(mapping->backing_dev_info, page); 3137 - } 3138 - EXPORT_SYMBOL(block_sync_page); 3139 3140 3140 3141 /* 3141 3142 * There are no bdflush tunables left. But distributions are
-30
fs/cifs/file.c
··· 1569 1569 return rc; 1570 1570 } 1571 1571 1572 - /* static void cifs_sync_page(struct page *page) 1573 - { 1574 - struct address_space *mapping; 1575 - struct inode *inode; 1576 - unsigned long index = page->index; 1577 - unsigned int rpages = 0; 1578 - int rc = 0; 1579 - 1580 - cFYI(1, "sync page %p", page); 1581 - mapping = page->mapping; 1582 - if (!mapping) 1583 - return 0; 1584 - inode = mapping->host; 1585 - if (!inode) 1586 - return; */ 1587 - 1588 - /* fill in rpages then 1589 - result = cifs_pagein_inode(inode, index, rpages); */ /* BB finish */ 1590 - 1591 - /* cFYI(1, "rpages is %d for sync page of Index %ld", rpages, index); 1592 - 1593 - #if 0 1594 - if (rc < 0) 1595 - return rc; 1596 - return 0; 1597 - #endif 1598 - } */ 1599 - 1600 1572 /* 1601 1573 * As file closes, flush all cached write data for this inode checking 1602 1574 * for write behind errors. ··· 2482 2510 .set_page_dirty = __set_page_dirty_nobuffers, 2483 2511 .releasepage = cifs_release_page, 2484 2512 .invalidatepage = cifs_invalidate_page, 2485 - /* .sync_page = cifs_sync_page, */ 2486 2513 /* .direct_IO = */ 2487 2514 }; 2488 2515 ··· 2499 2528 .set_page_dirty = __set_page_dirty_nobuffers, 2500 2529 .releasepage = cifs_release_page, 2501 2530 .invalidatepage = cifs_invalidate_page, 2502 - /* .sync_page = cifs_sync_page, */ 2503 2531 /* .direct_IO = */ 2504 2532 };
+2 -5
fs/direct-io.c
··· 1110 1110 ((rw & READ) || (dio->result == dio->size))) 1111 1111 ret = -EIOCBQUEUED; 1112 1112 1113 - if (ret != -EIOCBQUEUED) { 1114 - /* All IO is now issued, send it on its way */ 1115 - blk_run_address_space(inode->i_mapping); 1113 + if (ret != -EIOCBQUEUED) 1116 1114 dio_await_completion(dio); 1117 - } 1118 1115 1119 1116 /* 1120 1117 * Sync will always be dropping the final ref and completing the ··· 1173 1176 struct dio *dio; 1174 1177 1175 1178 if (rw & WRITE) 1176 - rw = WRITE_ODIRECT_PLUG; 1179 + rw = WRITE_ODIRECT; 1177 1180 1178 1181 if (bdev) 1179 1182 bdev_blkbits = blksize_bits(bdev_logical_block_size(bdev));
-1
fs/efs/inode.c
··· 23 23 } 24 24 static const struct address_space_operations efs_aops = { 25 25 .readpage = efs_readpage, 26 - .sync_page = block_sync_page, 27 26 .bmap = _efs_bmap 28 27 }; 29 28
-1
fs/exofs/inode.c
··· 823 823 .direct_IO = NULL, /* TODO: Should be trivial to do */ 824 824 825 825 /* With these NULL has special meaning or default is not exported */ 826 - .sync_page = NULL, 827 826 .get_xip_mem = NULL, 828 827 .migratepage = NULL, 829 828 .launder_page = NULL,
-2
fs/ext2/inode.c
··· 860 860 .readpage = ext2_readpage, 861 861 .readpages = ext2_readpages, 862 862 .writepage = ext2_writepage, 863 - .sync_page = block_sync_page, 864 863 .write_begin = ext2_write_begin, 865 864 .write_end = ext2_write_end, 866 865 .bmap = ext2_bmap, ··· 879 880 .readpage = ext2_readpage, 880 881 .readpages = ext2_readpages, 881 882 .writepage = ext2_nobh_writepage, 882 - .sync_page = block_sync_page, 883 883 .write_begin = ext2_nobh_write_begin, 884 884 .write_end = nobh_write_end, 885 885 .bmap = ext2_bmap,
-3
fs/ext3/inode.c
··· 1894 1894 .readpage = ext3_readpage, 1895 1895 .readpages = ext3_readpages, 1896 1896 .writepage = ext3_ordered_writepage, 1897 - .sync_page = block_sync_page, 1898 1897 .write_begin = ext3_write_begin, 1899 1898 .write_end = ext3_ordered_write_end, 1900 1899 .bmap = ext3_bmap, ··· 1909 1910 .readpage = ext3_readpage, 1910 1911 .readpages = ext3_readpages, 1911 1912 .writepage = ext3_writeback_writepage, 1912 - .sync_page = block_sync_page, 1913 1913 .write_begin = ext3_write_begin, 1914 1914 .write_end = ext3_writeback_write_end, 1915 1915 .bmap = ext3_bmap, ··· 1924 1926 .readpage = ext3_readpage, 1925 1927 .readpages = ext3_readpages, 1926 1928 .writepage = ext3_journalled_writepage, 1927 - .sync_page = block_sync_page, 1928 1929 .write_begin = ext3_write_begin, 1929 1930 .write_end = ext3_journalled_write_end, 1930 1931 .set_page_dirty = ext3_journalled_set_page_dirty,
-4
fs/ext4/inode.c
··· 3903 3903 .readpage = ext4_readpage, 3904 3904 .readpages = ext4_readpages, 3905 3905 .writepage = ext4_writepage, 3906 - .sync_page = block_sync_page, 3907 3906 .write_begin = ext4_write_begin, 3908 3907 .write_end = ext4_ordered_write_end, 3909 3908 .bmap = ext4_bmap, ··· 3918 3919 .readpage = ext4_readpage, 3919 3920 .readpages = ext4_readpages, 3920 3921 .writepage = ext4_writepage, 3921 - .sync_page = block_sync_page, 3922 3922 .write_begin = ext4_write_begin, 3923 3923 .write_end = ext4_writeback_write_end, 3924 3924 .bmap = ext4_bmap, ··· 3933 3935 .readpage = ext4_readpage, 3934 3936 .readpages = ext4_readpages, 3935 3937 .writepage = ext4_writepage, 3936 - .sync_page = block_sync_page, 3937 3938 .write_begin = ext4_write_begin, 3938 3939 .write_end = ext4_journalled_write_end, 3939 3940 .set_page_dirty = ext4_journalled_set_page_dirty, ··· 3948 3951 .readpages = ext4_readpages, 3949 3952 .writepage = ext4_writepage, 3950 3953 .writepages = ext4_da_writepages, 3951 - .sync_page = block_sync_page, 3952 3954 .write_begin = ext4_da_write_begin, 3953 3955 .write_end = ext4_da_write_end, 3954 3956 .bmap = ext4_bmap,
+1 -2
fs/ext4/page-io.c
··· 310 310 io_end->offset = (page->index << PAGE_CACHE_SHIFT) + bh_offset(bh); 311 311 312 312 io->io_bio = bio; 313 - io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? 314 - WRITE_SYNC_PLUG : WRITE); 313 + io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); 315 314 io->io_next_block = bh->b_blocknr; 316 315 return 0; 317 316 }
-1
fs/fat/inode.c
··· 236 236 .readpages = fat_readpages, 237 237 .writepage = fat_writepage, 238 238 .writepages = fat_writepages, 239 - .sync_page = block_sync_page, 240 239 .write_begin = fat_write_begin, 241 240 .write_end = fat_write_end, 242 241 .direct_IO = fat_direct_IO,
-1
fs/freevxfs/vxfs_subr.c
··· 44 44 const struct address_space_operations vxfs_aops = { 45 45 .readpage = vxfs_readpage, 46 46 .bmap = vxfs_bmap, 47 - .sync_page = block_sync_page, 48 47 }; 49 48 50 49 inline void
-1
fs/fuse/inode.c
··· 870 870 871 871 fc->bdi.name = "fuse"; 872 872 fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; 873 - fc->bdi.unplug_io_fn = default_unplug_io_fn; 874 873 /* fuse does it's own writeback accounting */ 875 874 fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB; 876 875
-3
fs/gfs2/aops.c
··· 1117 1117 .writepages = gfs2_writeback_writepages, 1118 1118 .readpage = gfs2_readpage, 1119 1119 .readpages = gfs2_readpages, 1120 - .sync_page = block_sync_page, 1121 1120 .write_begin = gfs2_write_begin, 1122 1121 .write_end = gfs2_write_end, 1123 1122 .bmap = gfs2_bmap, ··· 1132 1133 .writepage = gfs2_ordered_writepage, 1133 1134 .readpage = gfs2_readpage, 1134 1135 .readpages = gfs2_readpages, 1135 - .sync_page = block_sync_page, 1136 1136 .write_begin = gfs2_write_begin, 1137 1137 .write_end = gfs2_write_end, 1138 1138 .set_page_dirty = gfs2_set_page_dirty, ··· 1149 1151 .writepages = gfs2_jdata_writepages, 1150 1152 .readpage = gfs2_readpage, 1151 1153 .readpages = gfs2_readpages, 1152 - .sync_page = block_sync_page, 1153 1154 .write_begin = gfs2_write_begin, 1154 1155 .write_end = gfs2_write_end, 1155 1156 .set_page_dirty = gfs2_set_page_dirty,
+2 -2
fs/gfs2/log.c
··· 121 121 lock_buffer(bh); 122 122 if (test_clear_buffer_dirty(bh)) { 123 123 bh->b_end_io = end_buffer_write_sync; 124 - submit_bh(WRITE_SYNC_PLUG, bh); 124 + submit_bh(WRITE_SYNC, bh); 125 125 } else { 126 126 unlock_buffer(bh); 127 127 brelse(bh); ··· 647 647 lock_buffer(bh); 648 648 if (buffer_mapped(bh) && test_clear_buffer_dirty(bh)) { 649 649 bh->b_end_io = end_buffer_write_sync; 650 - submit_bh(WRITE_SYNC_PLUG, bh); 650 + submit_bh(WRITE_SYNC, bh); 651 651 } else { 652 652 unlock_buffer(bh); 653 653 brelse(bh);
+6 -6
fs/gfs2/lops.c
··· 204 204 } 205 205 206 206 gfs2_log_unlock(sdp); 207 - submit_bh(WRITE_SYNC_PLUG, bh); 207 + submit_bh(WRITE_SYNC, bh); 208 208 gfs2_log_lock(sdp); 209 209 210 210 n = 0; ··· 214 214 gfs2_log_unlock(sdp); 215 215 lock_buffer(bd2->bd_bh); 216 216 bh = gfs2_log_fake_buf(sdp, bd2->bd_bh); 217 - submit_bh(WRITE_SYNC_PLUG, bh); 217 + submit_bh(WRITE_SYNC, bh); 218 218 gfs2_log_lock(sdp); 219 219 if (++n >= num) 220 220 break; ··· 356 356 sdp->sd_log_num_revoke--; 357 357 358 358 if (offset + sizeof(u64) > sdp->sd_sb.sb_bsize) { 359 - submit_bh(WRITE_SYNC_PLUG, bh); 359 + submit_bh(WRITE_SYNC, bh); 360 360 361 361 bh = gfs2_log_get_buf(sdp); 362 362 mh = (struct gfs2_meta_header *)bh->b_data; ··· 373 373 } 374 374 gfs2_assert_withdraw(sdp, !sdp->sd_log_num_revoke); 375 375 376 - submit_bh(WRITE_SYNC_PLUG, bh); 376 + submit_bh(WRITE_SYNC, bh); 377 377 } 378 378 379 379 static void revoke_lo_before_scan(struct gfs2_jdesc *jd, ··· 575 575 ptr = bh_log_ptr(bh); 576 576 577 577 get_bh(bh); 578 - submit_bh(WRITE_SYNC_PLUG, bh); 578 + submit_bh(WRITE_SYNC, bh); 579 579 gfs2_log_lock(sdp); 580 580 while(!list_empty(list)) { 581 581 bd = list_entry(list->next, struct gfs2_bufdata, bd_le.le_list); ··· 601 601 } else { 602 602 bh1 = gfs2_log_fake_buf(sdp, bd->bd_bh); 603 603 } 604 - submit_bh(WRITE_SYNC_PLUG, bh1); 604 + submit_bh(WRITE_SYNC, bh1); 605 605 gfs2_log_lock(sdp); 606 606 ptr += 2; 607 607 }
+1 -2
fs/gfs2/meta_io.c
··· 37 37 struct buffer_head *bh, *head; 38 38 int nr_underway = 0; 39 39 int write_op = REQ_META | 40 - (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC_PLUG : WRITE); 40 + (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); 41 41 42 42 BUG_ON(!PageLocked(page)); 43 43 BUG_ON(!page_has_buffers(page)); ··· 94 94 const struct address_space_operations gfs2_meta_aops = { 95 95 .writepage = gfs2_aspace_writepage, 96 96 .releasepage = gfs2_releasepage, 97 - .sync_page = block_sync_page, 98 97 }; 99 98 100 99 /**
-2
fs/hfs/inode.c
··· 150 150 const struct address_space_operations hfs_btree_aops = { 151 151 .readpage = hfs_readpage, 152 152 .writepage = hfs_writepage, 153 - .sync_page = block_sync_page, 154 153 .write_begin = hfs_write_begin, 155 154 .write_end = generic_write_end, 156 155 .bmap = hfs_bmap, ··· 159 160 const struct address_space_operations hfs_aops = { 160 161 .readpage = hfs_readpage, 161 162 .writepage = hfs_writepage, 162 - .sync_page = block_sync_page, 163 163 .write_begin = hfs_write_begin, 164 164 .write_end = generic_write_end, 165 165 .bmap = hfs_bmap,
-2
fs/hfsplus/inode.c
··· 146 146 const struct address_space_operations hfsplus_btree_aops = { 147 147 .readpage = hfsplus_readpage, 148 148 .writepage = hfsplus_writepage, 149 - .sync_page = block_sync_page, 150 149 .write_begin = hfsplus_write_begin, 151 150 .write_end = generic_write_end, 152 151 .bmap = hfsplus_bmap, ··· 155 156 const struct address_space_operations hfsplus_aops = { 156 157 .readpage = hfsplus_readpage, 157 158 .writepage = hfsplus_writepage, 158 - .sync_page = block_sync_page, 159 159 .write_begin = hfsplus_write_begin, 160 160 .write_end = generic_write_end, 161 161 .bmap = hfsplus_bmap,
-1
fs/hpfs/file.c
··· 119 119 const struct address_space_operations hpfs_aops = { 120 120 .readpage = hpfs_readpage, 121 121 .writepage = hpfs_writepage, 122 - .sync_page = block_sync_page, 123 122 .write_begin = hpfs_write_begin, 124 123 .write_end = generic_write_end, 125 124 .bmap = _hpfs_bmap
-1
fs/isofs/inode.c
··· 1158 1158 1159 1159 static const struct address_space_operations isofs_aops = { 1160 1160 .readpage = isofs_readpage, 1161 - .sync_page = block_sync_page, 1162 1161 .bmap = _isofs_bmap 1163 1162 }; 1164 1163
+11 -11
fs/jbd/commit.c
··· 20 20 #include <linux/mm.h> 21 21 #include <linux/pagemap.h> 22 22 #include <linux/bio.h> 23 + #include <linux/blkdev.h> 23 24 24 25 /* 25 26 * Default IO end handler for temporary BJ_IO buffer_heads. ··· 295 294 int first_tag = 0; 296 295 int tag_flag; 297 296 int i; 298 - int write_op = WRITE_SYNC; 297 + struct blk_plug plug; 299 298 300 299 /* 301 300 * First job: lock down the current transaction and wait for ··· 328 327 spin_lock(&journal->j_state_lock); 329 328 commit_transaction->t_state = T_LOCKED; 330 329 331 - /* 332 - * Use plugged writes here, since we want to submit several before 333 - * we unplug the device. We don't do explicit unplugging in here, 334 - * instead we rely on sync_buffer() doing the unplug for us. 335 - */ 336 - if (commit_transaction->t_synchronous_commit) 337 - write_op = WRITE_SYNC_PLUG; 338 330 spin_lock(&commit_transaction->t_handle_lock); 339 331 while (commit_transaction->t_updates) { 340 332 DEFINE_WAIT(wait); ··· 412 418 * Now start flushing things to disk, in the order they appear 413 419 * on the transaction lists. Data blocks go first. 414 420 */ 421 + blk_start_plug(&plug); 415 422 err = journal_submit_data_buffers(journal, commit_transaction, 416 - write_op); 423 + WRITE_SYNC); 424 + blk_finish_plug(&plug); 417 425 418 426 /* 419 427 * Wait for all previously submitted IO to complete. ··· 476 480 err = 0; 477 481 } 478 482 479 - journal_write_revoke_records(journal, commit_transaction, write_op); 483 + blk_start_plug(&plug); 484 + 485 + journal_write_revoke_records(journal, commit_transaction, WRITE_SYNC); 480 486 481 487 /* 482 488 * If we found any dirty or locked buffers, then we should have ··· 648 650 clear_buffer_dirty(bh); 649 651 set_buffer_uptodate(bh); 650 652 bh->b_end_io = journal_end_buffer_io_sync; 651 - submit_bh(write_op, bh); 653 + submit_bh(WRITE_SYNC, bh); 652 654 } 653 655 cond_resched(); 654 656 ··· 658 660 bufs = 0; 659 661 } 660 662 } 663 + 664 + blk_finish_plug(&plug); 661 665 662 666 /* Lo and behold: we have just managed to send a transaction to 663 667 the log. Before we can commit it, wait for the IO so far to
+10 -12
fs/jbd2/commit.c
··· 137 137 if (journal->j_flags & JBD2_BARRIER && 138 138 !JBD2_HAS_INCOMPAT_FEATURE(journal, 139 139 JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) 140 - ret = submit_bh(WRITE_SYNC_PLUG | WRITE_FLUSH_FUA, bh); 140 + ret = submit_bh(WRITE_SYNC | WRITE_FLUSH_FUA, bh); 141 141 else 142 - ret = submit_bh(WRITE_SYNC_PLUG, bh); 142 + ret = submit_bh(WRITE_SYNC, bh); 143 143 144 144 *cbh = bh; 145 145 return ret; ··· 329 329 int tag_bytes = journal_tag_bytes(journal); 330 330 struct buffer_head *cbh = NULL; /* For transactional checksums */ 331 331 __u32 crc32_sum = ~0; 332 - int write_op = WRITE_SYNC; 332 + struct blk_plug plug; 333 333 334 334 /* 335 335 * First job: lock down the current transaction and wait for ··· 363 363 write_lock(&journal->j_state_lock); 364 364 commit_transaction->t_state = T_LOCKED; 365 365 366 - /* 367 - * Use plugged writes here, since we want to submit several before 368 - * we unplug the device. We don't do explicit unplugging in here, 369 - * instead we rely on sync_buffer() doing the unplug for us. 370 - */ 371 - if (commit_transaction->t_synchronous_commit) 372 - write_op = WRITE_SYNC_PLUG; 373 366 trace_jbd2_commit_locking(journal, commit_transaction); 374 367 stats.run.rs_wait = commit_transaction->t_max_wait; 375 368 stats.run.rs_locked = jiffies; ··· 462 469 if (err) 463 470 jbd2_journal_abort(journal, err); 464 471 472 + blk_start_plug(&plug); 465 473 jbd2_journal_write_revoke_records(journal, commit_transaction, 466 - write_op); 474 + WRITE_SYNC); 475 + blk_finish_plug(&plug); 467 476 468 477 jbd_debug(3, "JBD: commit phase 2\n"); 469 478 ··· 492 497 err = 0; 493 498 descriptor = NULL; 494 499 bufs = 0; 500 + blk_start_plug(&plug); 495 501 while (commit_transaction->t_buffers) { 496 502 497 503 /* Find the next buffer to be journaled... */ ··· 654 658 clear_buffer_dirty(bh); 655 659 set_buffer_uptodate(bh); 656 660 bh->b_end_io = journal_end_buffer_io_sync; 657 - submit_bh(write_op, bh); 661 + submit_bh(WRITE_SYNC, bh); 658 662 } 659 663 cond_resched(); 660 664 stats.run.rs_blocks_logged += bufs; ··· 694 698 if (err) 695 699 __jbd2_journal_abort_hard(journal); 696 700 } 701 + 702 + blk_finish_plug(&plug); 697 703 698 704 /* Lo and behold: we have just managed to send a transaction to 699 705 the log. Before we can commit it, wait for the IO so far to
-1
fs/jfs/inode.c
··· 352 352 .readpages = jfs_readpages, 353 353 .writepage = jfs_writepage, 354 354 .writepages = jfs_writepages, 355 - .sync_page = block_sync_page, 356 355 .write_begin = jfs_write_begin, 357 356 .write_end = nobh_write_end, 358 357 .bmap = jfs_bmap,
-1
fs/jfs/jfs_metapage.c
··· 583 583 const struct address_space_operations jfs_metapage_aops = { 584 584 .readpage = metapage_readpage, 585 585 .writepage = metapage_writepage, 586 - .sync_page = block_sync_page, 587 586 .releasepage = metapage_releasepage, 588 587 .invalidatepage = metapage_invalidatepage, 589 588 .set_page_dirty = __set_page_dirty_nobuffers,
-2
fs/logfs/dev_bdev.c
··· 39 39 bio.bi_end_io = request_complete; 40 40 41 41 submit_bio(rw, &bio); 42 - generic_unplug_device(bdev_get_queue(bdev)); 43 42 wait_for_completion(&complete); 44 43 return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO; 45 44 } ··· 167 168 } 168 169 len = PAGE_ALIGN(len); 169 170 __bdev_writeseg(sb, ofs, ofs >> PAGE_SHIFT, len >> PAGE_SHIFT); 170 - generic_unplug_device(bdev_get_queue(logfs_super(sb)->s_bdev)); 171 171 } 172 172 173 173
-1
fs/minix/inode.c
··· 399 399 static const struct address_space_operations minix_aops = { 400 400 .readpage = minix_readpage, 401 401 .writepage = minix_writepage, 402 - .sync_page = block_sync_page, 403 402 .write_begin = minix_write_begin, 404 403 .write_end = generic_write_end, 405 404 .bmap = minix_bmap
+8
fs/mpage.c
··· 364 364 sector_t last_block_in_bio = 0; 365 365 struct buffer_head map_bh; 366 366 unsigned long first_logical_block = 0; 367 + struct blk_plug plug; 368 + 369 + blk_start_plug(&plug); 367 370 368 371 map_bh.b_state = 0; 369 372 map_bh.b_size = 0; ··· 388 385 BUG_ON(!list_empty(pages)); 389 386 if (bio) 390 387 mpage_bio_submit(READ, bio); 388 + blk_finish_plug(&plug); 391 389 return 0; 392 390 } 393 391 EXPORT_SYMBOL(mpage_readpages); ··· 670 666 mpage_writepages(struct address_space *mapping, 671 667 struct writeback_control *wbc, get_block_t get_block) 672 668 { 669 + struct blk_plug plug; 673 670 int ret; 671 + 672 + blk_start_plug(&plug); 674 673 675 674 if (!get_block) 676 675 ret = generic_writepages(mapping, wbc); ··· 689 682 if (mpd.bio) 690 683 mpage_bio_submit(WRITE, mpd.bio); 691 684 } 685 + blk_finish_plug(&plug); 692 686 return ret; 693 687 } 694 688 EXPORT_SYMBOL(mpage_writepages);
+1 -6
fs/nilfs2/btnode.c
··· 34 34 #include "page.h" 35 35 #include "btnode.h" 36 36 37 - 38 - static const struct address_space_operations def_btnode_aops = { 39 - .sync_page = block_sync_page, 40 - }; 41 - 42 37 void nilfs_btnode_cache_init(struct address_space *btnc, 43 38 struct backing_dev_info *bdi) 44 39 { 45 - nilfs_mapping_init(btnc, bdi, &def_btnode_aops); 40 + nilfs_mapping_init(btnc, bdi); 46 41 } 47 42 48 43 void nilfs_btnode_cache_clear(struct address_space *btnc)
-1
fs/nilfs2/gcinode.c
··· 49 49 #include "ifile.h" 50 50 51 51 static const struct address_space_operations def_gcinode_aops = { 52 - .sync_page = block_sync_page, 53 52 }; 54 53 55 54 /*
-1
fs/nilfs2/inode.c
··· 280 280 const struct address_space_operations nilfs_aops = { 281 281 .writepage = nilfs_writepage, 282 282 .readpage = nilfs_readpage, 283 - .sync_page = block_sync_page, 284 283 .writepages = nilfs_writepages, 285 284 .set_page_dirty = nilfs_set_page_dirty, 286 285 .readpages = nilfs_readpages,
+2 -7
fs/nilfs2/mdt.c
··· 399 399 400 400 static const struct address_space_operations def_mdt_aops = { 401 401 .writepage = nilfs_mdt_write_page, 402 - .sync_page = block_sync_page, 403 402 }; 404 403 405 404 static const struct inode_operations def_mdt_iops; ··· 437 438 mi->mi_first_entry_offset = DIV_ROUND_UP(header_size, entry_size); 438 439 } 439 440 440 - static const struct address_space_operations shadow_map_aops = { 441 - .sync_page = block_sync_page, 442 - }; 443 - 444 441 /** 445 442 * nilfs_mdt_setup_shadow_map - setup shadow map and bind it to metadata file 446 443 * @inode: inode of the metadata file ··· 450 455 451 456 INIT_LIST_HEAD(&shadow->frozen_buffers); 452 457 address_space_init_once(&shadow->frozen_data); 453 - nilfs_mapping_init(&shadow->frozen_data, bdi, &shadow_map_aops); 458 + nilfs_mapping_init(&shadow->frozen_data, bdi); 454 459 address_space_init_once(&shadow->frozen_btnodes); 455 - nilfs_mapping_init(&shadow->frozen_btnodes, bdi, &shadow_map_aops); 460 + nilfs_mapping_init(&shadow->frozen_btnodes, bdi); 456 461 mi->mi_shadow = shadow; 457 462 return 0; 458 463 }
+2 -3
fs/nilfs2/page.c
··· 493 493 } 494 494 495 495 void nilfs_mapping_init(struct address_space *mapping, 496 - struct backing_dev_info *bdi, 497 - const struct address_space_operations *aops) 496 + struct backing_dev_info *bdi) 498 497 { 499 498 mapping->host = NULL; 500 499 mapping->flags = 0; 501 500 mapping_set_gfp_mask(mapping, GFP_NOFS); 502 501 mapping->assoc_mapping = NULL; 503 502 mapping->backing_dev_info = bdi; 504 - mapping->a_ops = aops; 503 + mapping->a_ops = NULL; 505 504 } 506 505 507 506 /*
+1 -2
fs/nilfs2/page.h
··· 62 62 void nilfs_copy_back_pages(struct address_space *, struct address_space *); 63 63 void nilfs_clear_dirty_pages(struct address_space *); 64 64 void nilfs_mapping_init(struct address_space *mapping, 65 - struct backing_dev_info *bdi, 66 - const struct address_space_operations *aops); 65 + struct backing_dev_info *bdi); 67 66 unsigned nilfs_page_count_clean_buffers(struct page *, unsigned, unsigned); 68 67 unsigned long nilfs_find_uncommitted_extent(struct inode *inode, 69 68 sector_t start_blk,
+1 -1
fs/nilfs2/segbuf.c
··· 509 509 * Last BIO is always sent through the following 510 510 * submission. 511 511 */ 512 - rw |= REQ_SYNC | REQ_UNPLUG; 512 + rw |= REQ_SYNC; 513 513 res = nilfs_segbuf_submit_bio(segbuf, &wi, rw); 514 514 } 515 515
-4
fs/ntfs/aops.c
··· 1543 1543 */ 1544 1544 const struct address_space_operations ntfs_aops = { 1545 1545 .readpage = ntfs_readpage, /* Fill page with data. */ 1546 - .sync_page = block_sync_page, /* Currently, just unplugs the 1547 - disk request queue. */ 1548 1546 #ifdef NTFS_RW 1549 1547 .writepage = ntfs_writepage, /* Write dirty page to disk. */ 1550 1548 #endif /* NTFS_RW */ ··· 1558 1560 */ 1559 1561 const struct address_space_operations ntfs_mst_aops = { 1560 1562 .readpage = ntfs_readpage, /* Fill page with data. */ 1561 - .sync_page = block_sync_page, /* Currently, just unplugs the 1562 - disk request queue. */ 1563 1563 #ifdef NTFS_RW 1564 1564 .writepage = ntfs_writepage, /* Write dirty page to disk. */ 1565 1565 .set_page_dirty = __set_page_dirty_nobuffers, /* Set the page dirty
+1 -2
fs/ntfs/compress.c
··· 698 698 "uptodate! Unplugging the disk queue " 699 699 "and rescheduling."); 700 700 get_bh(tbh); 701 - blk_run_address_space(mapping); 702 - schedule(); 701 + io_schedule(); 703 702 put_bh(tbh); 704 703 if (unlikely(!buffer_uptodate(tbh))) 705 704 goto read_err;
-1
fs/ocfs2/aops.c
··· 2043 2043 .write_begin = ocfs2_write_begin, 2044 2044 .write_end = ocfs2_write_end, 2045 2045 .bmap = ocfs2_bmap, 2046 - .sync_page = block_sync_page, 2047 2046 .direct_IO = ocfs2_direct_IO, 2048 2047 .invalidatepage = ocfs2_invalidatepage, 2049 2048 .releasepage = ocfs2_releasepage,
-4
fs/ocfs2/cluster/heartbeat.c
··· 367 367 static void o2hb_wait_on_io(struct o2hb_region *reg, 368 368 struct o2hb_bio_wait_ctxt *wc) 369 369 { 370 - struct address_space *mapping = reg->hr_bdev->bd_inode->i_mapping; 371 - 372 - blk_run_address_space(mapping); 373 370 o2hb_bio_wait_dec(wc, 1); 374 - 375 371 wait_for_completion(&wc->wc_io_complete); 376 372 } 377 373
-1
fs/omfs/file.c
··· 372 372 .readpages = omfs_readpages, 373 373 .writepage = omfs_writepage, 374 374 .writepages = omfs_writepages, 375 - .sync_page = block_sync_page, 376 375 .write_begin = omfs_write_begin, 377 376 .write_end = generic_write_end, 378 377 .bmap = omfs_bmap,
+2 -1
fs/partitions/check.c
··· 290 290 { 291 291 struct hd_struct *p = dev_to_part(dev); 292 292 293 - return sprintf(buf, "%8u %8u\n", p->in_flight[0], p->in_flight[1]); 293 + return sprintf(buf, "%8u %8u\n", atomic_read(&p->in_flight[0]), 294 + atomic_read(&p->in_flight[1])); 294 295 } 295 296 296 297 #ifdef CONFIG_FAIL_MAKE_REQUEST
-1
fs/qnx4/inode.c
··· 335 335 static const struct address_space_operations qnx4_aops = { 336 336 .readpage = qnx4_readpage, 337 337 .writepage = qnx4_writepage, 338 - .sync_page = block_sync_page, 339 338 .write_begin = qnx4_write_begin, 340 339 .write_end = generic_write_end, 341 340 .bmap = qnx4_bmap
-1
fs/reiserfs/inode.c
··· 3217 3217 .readpages = reiserfs_readpages, 3218 3218 .releasepage = reiserfs_releasepage, 3219 3219 .invalidatepage = reiserfs_invalidatepage, 3220 - .sync_page = block_sync_page, 3221 3220 .write_begin = reiserfs_write_begin, 3222 3221 .write_end = reiserfs_write_end, 3223 3222 .bmap = reiserfs_aop_bmap,
+2
fs/super.c
··· 71 71 #else 72 72 INIT_LIST_HEAD(&s->s_files); 73 73 #endif 74 + s->s_bdi = &default_backing_dev_info; 74 75 INIT_LIST_HEAD(&s->s_instances); 75 76 INIT_HLIST_BL_HEAD(&s->s_anon); 76 77 INIT_LIST_HEAD(&s->s_inodes); ··· 937 936 sb = root->d_sb; 938 937 BUG_ON(!sb); 939 938 WARN_ON(!sb->s_bdi); 939 + WARN_ON(sb->s_bdi == &default_backing_dev_info); 940 940 sb->s_flags |= MS_BORN; 941 941 942 942 error = security_sb_kern_mount(sb, flags, secdata);
+2 -2
fs/sync.c
··· 34 34 * This should be safe, as we require bdi backing to actually 35 35 * write out data in the first place 36 36 */ 37 - if (!sb->s_bdi || sb->s_bdi == &noop_backing_dev_info) 37 + if (sb->s_bdi == &noop_backing_dev_info) 38 38 return 0; 39 39 40 40 if (sb->s_qcop && sb->s_qcop->quota_sync) ··· 80 80 81 81 static void sync_one_sb(struct super_block *sb, void *arg) 82 82 { 83 - if (!(sb->s_flags & MS_RDONLY) && sb->s_bdi) 83 + if (!(sb->s_flags & MS_RDONLY)) 84 84 __sync_filesystem(sb, *(int *)arg); 85 85 } 86 86 /*
-1
fs/sysv/itree.c
··· 488 488 const struct address_space_operations sysv_aops = { 489 489 .readpage = sysv_readpage, 490 490 .writepage = sysv_writepage, 491 - .sync_page = block_sync_page, 492 491 .write_begin = sysv_write_begin, 493 492 .write_end = generic_write_end, 494 493 .bmap = sysv_bmap
-1
fs/ubifs/super.c
··· 2011 2011 */ 2012 2012 c->bdi.name = "ubifs", 2013 2013 c->bdi.capabilities = BDI_CAP_MAP_COPY; 2014 - c->bdi.unplug_io_fn = default_unplug_io_fn; 2015 2014 err = bdi_init(&c->bdi); 2016 2015 if (err) 2017 2016 goto out_close;
-1
fs/udf/file.c
··· 98 98 const struct address_space_operations udf_adinicb_aops = { 99 99 .readpage = udf_adinicb_readpage, 100 100 .writepage = udf_adinicb_writepage, 101 - .sync_page = block_sync_page, 102 101 .write_begin = simple_write_begin, 103 102 .write_end = udf_adinicb_write_end, 104 103 };
-1
fs/udf/inode.c
··· 140 140 const struct address_space_operations udf_aops = { 141 141 .readpage = udf_readpage, 142 142 .writepage = udf_writepage, 143 - .sync_page = block_sync_page, 144 143 .write_begin = udf_write_begin, 145 144 .write_end = generic_write_end, 146 145 .bmap = udf_bmap,
-1
fs/ufs/inode.c
··· 552 552 const struct address_space_operations ufs_aops = { 553 553 .readpage = ufs_readpage, 554 554 .writepage = ufs_writepage, 555 - .sync_page = block_sync_page, 556 555 .write_begin = ufs_write_begin, 557 556 .write_end = generic_write_end, 558 557 .bmap = ufs_bmap
+1 -1
fs/ufs/truncate.c
··· 479 479 break; 480 480 if (IS_SYNC(inode) && (inode->i_state & I_DIRTY)) 481 481 ufs_sync_inode (inode); 482 - blk_run_address_space(inode->i_mapping); 482 + blk_flush_plug(current); 483 483 yield(); 484 484 } 485 485
+1 -3
fs/xfs/linux-2.6/xfs_aops.c
··· 413 413 if (xfs_ioend_new_eof(ioend)) 414 414 xfs_mark_inode_dirty(XFS_I(ioend->io_inode)); 415 415 416 - submit_bio(wbc->sync_mode == WB_SYNC_ALL ? 417 - WRITE_SYNC_PLUG : WRITE, bio); 416 + submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE, bio); 418 417 } 419 418 420 419 STATIC struct bio * ··· 1494 1495 .readpages = xfs_vm_readpages, 1495 1496 .writepage = xfs_vm_writepage, 1496 1497 .writepages = xfs_vm_writepages, 1497 - .sync_page = block_sync_page, 1498 1498 .releasepage = xfs_vm_releasepage, 1499 1499 .invalidatepage = xfs_vm_invalidatepage, 1500 1500 .write_begin = xfs_vm_write_begin,
+5 -8
fs/xfs/linux-2.6/xfs_buf.c
··· 990 990 if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE)) 991 991 xfs_log_force(bp->b_target->bt_mount, 0); 992 992 if (atomic_read(&bp->b_io_remaining)) 993 - blk_run_address_space(bp->b_target->bt_mapping); 993 + blk_flush_plug(current); 994 994 down(&bp->b_sema); 995 995 XB_SET_OWNER(bp); 996 996 ··· 1034 1034 set_current_state(TASK_UNINTERRUPTIBLE); 1035 1035 if (atomic_read(&bp->b_pin_count) == 0) 1036 1036 break; 1037 - if (atomic_read(&bp->b_io_remaining)) 1038 - blk_run_address_space(bp->b_target->bt_mapping); 1039 - schedule(); 1037 + io_schedule(); 1040 1038 } 1041 1039 remove_wait_queue(&bp->b_waiters, &wait); 1042 1040 set_current_state(TASK_RUNNING); ··· 1440 1442 trace_xfs_buf_iowait(bp, _RET_IP_); 1441 1443 1442 1444 if (atomic_read(&bp->b_io_remaining)) 1443 - blk_run_address_space(bp->b_target->bt_mapping); 1445 + blk_flush_plug(current); 1444 1446 wait_for_completion(&bp->b_iowait); 1445 1447 1446 1448 trace_xfs_buf_iowait_done(bp, _RET_IP_); ··· 1664 1666 struct inode *inode; 1665 1667 struct address_space *mapping; 1666 1668 static const struct address_space_operations mapping_aops = { 1667 - .sync_page = block_sync_page, 1668 1669 .migratepage = fail_migrate_page, 1669 1670 }; 1670 1671 ··· 1944 1947 count++; 1945 1948 } 1946 1949 if (count) 1947 - blk_run_address_space(target->bt_mapping); 1950 + blk_flush_plug(current); 1948 1951 1949 1952 } while (!kthread_should_stop()); 1950 1953 ··· 1992 1995 1993 1996 if (wait) { 1994 1997 /* Expedite and wait for IO to complete. */ 1995 - blk_run_address_space(target->bt_mapping); 1998 + blk_flush_plug(current); 1996 1999 while (!list_empty(&wait_list)) { 1997 2000 bp = list_first_entry(&wait_list, struct xfs_buf, b_list); 1998 2001
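The xfs_buf.c hunks above replace the old wait-side idiom of kicking the queue with blk_run_address_space() by flushing the caller's own plug before blocking. A small sketch of that shape, loosely mirroring the xfs_buf_iowait() change; my_iowait() and its arguments are illustrative:

        #include <linux/blkdev.h>
        #include <linux/completion.h>
        #include <linux/sched.h>

        static void my_iowait(struct completion *done, atomic_t *io_remaining)
        {
                /* issue anything this task queued on its plug before sleeping */
                if (atomic_read(io_remaining))
                        blk_flush_plug(current);
                wait_for_completion(done);
        }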
-16
include/linux/backing-dev.h
··· 66 66 unsigned int capabilities; /* Device capabilities */ 67 67 congested_fn *congested_fn; /* Function pointer if device is md/dm */ 68 68 void *congested_data; /* Pointer to aux data for congested func */ 69 - void (*unplug_io_fn)(struct backing_dev_info *, struct page *); 70 - void *unplug_io_data; 71 69 72 70 char *name; 73 71 ··· 249 251 250 252 extern struct backing_dev_info default_backing_dev_info; 251 253 extern struct backing_dev_info noop_backing_dev_info; 252 - void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page); 253 254 254 255 int writeback_in_progress(struct backing_dev_info *bdi); 255 256 ··· 331 334 { 332 335 schedule(); 333 336 return 0; 334 - } 335 - 336 - static inline void blk_run_backing_dev(struct backing_dev_info *bdi, 337 - struct page *page) 338 - { 339 - if (bdi && bdi->unplug_io_fn) 340 - bdi->unplug_io_fn(bdi, page); 341 - } 342 - 343 - static inline void blk_run_address_space(struct address_space *mapping) 344 - { 345 - if (mapping) 346 - blk_run_backing_dev(mapping->backing_dev_info, NULL); 347 337 } 348 338 349 339 #endif /* _LINUX_BACKING_DEV_H */
-1
include/linux/bio.h
··· 304 304 }; 305 305 306 306 extern struct bio_set *fs_bio_set; 307 - extern struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly; 308 307 309 308 /* 310 309 * a small number of entries is fine, not going to be performance critical.
+4 -2
include/linux/blk_types.h
··· 128 128 __REQ_NOIDLE, /* don't anticipate more IO after this one */ 129 129 130 130 /* bio only flags */ 131 - __REQ_UNPLUG, /* unplug the immediately after submission */ 132 131 __REQ_RAHEAD, /* read ahead, can fail anytime */ 133 132 __REQ_THROTTLED, /* This bio has already been subjected to 134 133 * throttling rules. Don't do it again. */ ··· 147 148 __REQ_ALLOCED, /* request came from our alloc pool */ 148 149 __REQ_COPY_USER, /* contains copies of user pages */ 149 150 __REQ_FLUSH, /* request for cache flush */ 151 + __REQ_FLUSH_SEQ, /* request for flush sequence */ 150 152 __REQ_IO_STAT, /* account I/O stat */ 151 153 __REQ_MIXED_MERGE, /* merge of different types, fail separately */ 152 154 __REQ_SECURE, /* secure discard (used with __REQ_DISCARD) */ 155 + __REQ_ON_PLUG, /* on plug list */ 153 156 __REQ_NR_BITS, /* stops here */ 154 157 }; 155 158 ··· 171 170 REQ_NOIDLE | REQ_FLUSH | REQ_FUA) 172 171 #define REQ_CLONE_MASK REQ_COMMON_MASK 173 172 174 - #define REQ_UNPLUG (1 << __REQ_UNPLUG) 175 173 #define REQ_RAHEAD (1 << __REQ_RAHEAD) 176 174 #define REQ_THROTTLED (1 << __REQ_THROTTLED) 177 175 ··· 188 188 #define REQ_ALLOCED (1 << __REQ_ALLOCED) 189 189 #define REQ_COPY_USER (1 << __REQ_COPY_USER) 190 190 #define REQ_FLUSH (1 << __REQ_FLUSH) 191 + #define REQ_FLUSH_SEQ (1 << __REQ_FLUSH_SEQ) 191 192 #define REQ_IO_STAT (1 << __REQ_IO_STAT) 192 193 #define REQ_MIXED_MERGE (1 << __REQ_MIXED_MERGE) 193 194 #define REQ_SECURE (1 << __REQ_SECURE) 195 + #define REQ_ON_PLUG (1 << __REQ_ON_PLUG) 194 196 195 197 #endif /* __LINUX_BLK_TYPES_H */
+70 -31
include/linux/blkdev.h
··· 108 108 109 109 /* 110 110 * Three pointers are available for the IO schedulers, if they need 111 - * more they have to dynamically allocate it. 111 + * more they have to dynamically allocate it. Flush requests are 112 + * never put on the IO scheduler. So let the flush fields share 113 + * space with the three elevator_private pointers. 112 114 */ 113 - void *elevator_private; 114 - void *elevator_private2; 115 - void *elevator_private3; 115 + union { 116 + void *elevator_private[3]; 117 + struct { 118 + unsigned int seq; 119 + struct list_head list; 120 + } flush; 121 + }; 116 122 117 123 struct gendisk *rq_disk; 118 124 struct hd_struct *part; ··· 196 190 typedef int (make_request_fn) (struct request_queue *q, struct bio *bio); 197 191 typedef int (prep_rq_fn) (struct request_queue *, struct request *); 198 192 typedef void (unprep_rq_fn) (struct request_queue *, struct request *); 199 - typedef void (unplug_fn) (struct request_queue *); 200 193 201 194 struct bio_vec; 202 195 struct bvec_merge_data { ··· 278 273 make_request_fn *make_request_fn; 279 274 prep_rq_fn *prep_rq_fn; 280 275 unprep_rq_fn *unprep_rq_fn; 281 - unplug_fn *unplug_fn; 282 276 merge_bvec_fn *merge_bvec_fn; 283 277 softirq_done_fn *softirq_done_fn; 284 278 rq_timed_out_fn *rq_timed_out_fn; ··· 291 287 struct request *boundary_rq; 292 288 293 289 /* 294 - * Auto-unplugging state 290 + * Delayed queue handling 295 291 */ 296 - struct timer_list unplug_timer; 297 - int unplug_thresh; /* After this many requests */ 298 - unsigned long unplug_delay; /* After this many jiffies */ 299 - struct work_struct unplug_work; 292 + struct delayed_work delay_work; 300 293 301 294 struct backing_dev_info backing_dev_info; 302 295 ··· 364 363 * for flush operations 365 364 */ 366 365 unsigned int flush_flags; 367 - unsigned int flush_seq; 368 - int flush_err; 366 + unsigned int flush_pending_idx:1; 367 + unsigned int flush_running_idx:1; 368 + unsigned long flush_pending_since; 369 + struct list_head flush_queue[2]; 370 + struct list_head flush_data_in_flight; 369 371 struct request flush_rq; 370 - struct request *orig_flush_rq; 371 - struct list_head pending_flushes; 372 372 373 373 struct mutex sysfs_lock; 374 374 ··· 389 387 #define QUEUE_FLAG_ASYNCFULL 4 /* write queue has been filled */ 390 388 #define QUEUE_FLAG_DEAD 5 /* queue being torn down */ 391 389 #define QUEUE_FLAG_REENTER 6 /* Re-entrancy avoidance */ 392 - #define QUEUE_FLAG_PLUGGED 7 /* queue is plugged */ 393 - #define QUEUE_FLAG_ELVSWITCH 8 /* don't use elevator, just do FIFO */ 394 - #define QUEUE_FLAG_BIDI 9 /* queue supports bidi requests */ 395 - #define QUEUE_FLAG_NOMERGES 10 /* disable merge attempts */ 396 - #define QUEUE_FLAG_SAME_COMP 11 /* force complete on same CPU */ 397 - #define QUEUE_FLAG_FAIL_IO 12 /* fake timeout */ 398 - #define QUEUE_FLAG_STACKABLE 13 /* supports request stacking */ 399 - #define QUEUE_FLAG_NONROT 14 /* non-rotational device (SSD) */ 390 + #define QUEUE_FLAG_ELVSWITCH 7 /* don't use elevator, just do FIFO */ 391 + #define QUEUE_FLAG_BIDI 8 /* queue supports bidi requests */ 392 + #define QUEUE_FLAG_NOMERGES 9 /* disable merge attempts */ 393 + #define QUEUE_FLAG_SAME_COMP 10 /* force complete on same CPU */ 394 + #define QUEUE_FLAG_FAIL_IO 11 /* fake timeout */ 395 + #define QUEUE_FLAG_STACKABLE 12 /* supports request stacking */ 396 + #define QUEUE_FLAG_NONROT 13 /* non-rotational device (SSD) */ 400 397 #define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */ 401 398 #define QUEUE_FLAG_IO_STAT 15 /* do IO stats */ 
402 399 #define QUEUE_FLAG_DISCARD 16 /* supports DISCARD */ ··· 473 472 __clear_bit(flag, &q->queue_flags); 474 473 } 475 474 476 - #define blk_queue_plugged(q) test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags) 477 475 #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags) 478 476 #define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) 479 477 #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) ··· 667 667 extern void blk_rq_unprep_clone(struct request *rq); 668 668 extern int blk_insert_cloned_request(struct request_queue *q, 669 669 struct request *rq); 670 - extern void blk_plug_device(struct request_queue *); 671 - extern void blk_plug_device_unlocked(struct request_queue *); 672 - extern int blk_remove_plug(struct request_queue *); 670 + extern void blk_delay_queue(struct request_queue *, unsigned long); 673 671 extern void blk_recount_segments(struct request_queue *, struct bio *); 674 672 extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t, 675 673 unsigned int, void __user *); ··· 711 713 struct request *, int); 712 714 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *, 713 715 struct request *, int, rq_end_io_fn *); 714 - extern void blk_unplug(struct request_queue *q); 715 716 716 717 static inline struct request_queue *bdev_get_queue(struct block_device *bdev) 717 718 { ··· 847 850 848 851 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *); 849 852 extern void blk_dump_rq_flags(struct request *, char *); 850 - extern void generic_unplug_device(struct request_queue *); 851 853 extern long nr_blockdev_pages(void); 852 854 853 855 int blk_get_queue(struct request_queue *); 854 856 struct request_queue *blk_alloc_queue(gfp_t); 855 857 struct request_queue *blk_alloc_queue_node(gfp_t, int); 856 858 extern void blk_put_queue(struct request_queue *); 859 + 860 + struct blk_plug { 861 + unsigned long magic; 862 + struct list_head list; 863 + unsigned int should_sort; 864 + }; 865 + 866 + extern void blk_start_plug(struct blk_plug *); 867 + extern void blk_finish_plug(struct blk_plug *); 868 + extern void __blk_flush_plug(struct task_struct *, struct blk_plug *); 869 + 870 + static inline void blk_flush_plug(struct task_struct *tsk) 871 + { 872 + struct blk_plug *plug = tsk->plug; 873 + 874 + if (unlikely(plug)) 875 + __blk_flush_plug(tsk, plug); 876 + } 877 + 878 + static inline bool blk_needs_flush_plug(struct task_struct *tsk) 879 + { 880 + struct blk_plug *plug = tsk->plug; 881 + 882 + return plug && !list_empty(&plug->list); 883 + } 857 884 858 885 /* 859 886 * tag stuff ··· 1156 1135 extern int blk_throtl_init(struct request_queue *q); 1157 1136 extern void blk_throtl_exit(struct request_queue *q); 1158 1137 extern int blk_throtl_bio(struct request_queue *q, struct bio **bio); 1159 - extern void throtl_shutdown_timer_wq(struct request_queue *q); 1160 1138 #else /* CONFIG_BLK_DEV_THROTTLING */ 1161 1139 static inline int blk_throtl_bio(struct request_queue *q, struct bio **bio) 1162 1140 { ··· 1164 1144 1165 1145 static inline int blk_throtl_init(struct request_queue *q) { return 0; } 1166 1146 static inline int blk_throtl_exit(struct request_queue *q) { return 0; } 1167 - static inline void throtl_shutdown_timer_wq(struct request_queue *q) {} 1168 1147 #endif /* CONFIG_BLK_DEV_THROTTLING */ 1169 1148 1170 1149 #define MODULE_ALIAS_BLOCKDEV(major,minor) \ ··· 1295 1276 static inline long nr_blockdev_pages(void) 1296 1277 { 1297 
1278 return 0; 1279 + } 1280 + 1281 + struct blk_plug { 1282 + }; 1283 + 1284 + static inline void blk_start_plug(struct blk_plug *plug) 1285 + { 1286 + } 1287 + 1288 + static inline void blk_finish_plug(struct blk_plug *plug) 1289 + { 1290 + } 1291 + 1292 + static inline void blk_flush_plug(struct task_struct *task) 1293 + { 1294 + } 1295 + 1296 + static inline bool blk_needs_flush_plug(struct task_struct *tsk) 1297 + { 1298 + return false; 1298 1299 } 1299 1300 1300 1301 #endif /* CONFIG_BLOCK */
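
The blkdev.h changes above replace the old per-queue plugging machinery (QUEUE_FLAG_PLUGGED, blk_plug_device(), blk_remove_plug(), blk_unplug()) with an explicit, on-stack per-task plug. A minimal sketch of the intended calling pattern, built only on the 2.6.39 interfaces visible in this hunk; example_read_batch() and its bio array are illustrative, not symbols from this series:

#include <linux/fs.h>
#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Sketch: batch a burst of submissions behind an on-stack plug.
 * Requests gather on the plug's list until blk_finish_plug(),
 * which hands the whole batch to the driver in one go.
 */
static void example_read_batch(struct bio **bios, int nr)
{
        struct blk_plug plug;
        int i;

        blk_start_plug(&plug);          /* plug the current task (if not already plugged) */
        for (i = 0; i < nr; i++)
                submit_bio(READ, bios[i]);      /* queued behind the plug, not yet dispatched */
        blk_finish_plug(&plug);         /* flush: dispatch everything queued above */
}

blk_flush_plug() and blk_needs_flush_plug() in the same hunk are what the kernel/sched.c hunk further down uses to force the same flush when a plugged task is about to sleep.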
-1
include/linux/buffer_head.h
··· 219 219 int block_commit_write(struct page *page, unsigned from, unsigned to); 220 220 int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, 221 221 get_block_t get_block); 222 - void block_sync_page(struct page *); 223 222 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *); 224 223 int block_truncate_page(struct address_space *, loff_t, get_block_t *); 225 224 int nobh_write_begin(struct address_space *, loff_t, unsigned, unsigned,
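
With block_sync_page() removed here (and the ->sync_page method dropped from address_space_operations in the fs.h hunk below), block-backed filesystems simply omit that initializer; waiters now rely on the submitter's plug rather than a per-page unplug callback. A hedged sketch with a made-up example_fs; example_readpage() and example_writepage() are placeholders, not functions from this series:

#include <linux/fs.h>
#include <linux/buffer_head.h>

/* Hypothetical filesystem methods, for illustration only. */
static int example_readpage(struct file *file, struct page *page);
static int example_writepage(struct page *page, struct writeback_control *wbc);

/* No .sync_page entry exists any more. */
static const struct address_space_operations example_aops = {
        .readpage       = example_readpage,
        .writepage      = example_writepage,
        .set_page_dirty = __set_page_dirty_buffers,
};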
-5
include/linux/device-mapper.h
··· 286 286 int dm_table_complete(struct dm_table *t); 287 287 288 288 /* 289 - * Unplug all devices in a table. 290 - */ 291 - void dm_table_unplug_all(struct dm_table *t); 292 - 293 - /* 294 289 * Table reference counting. 295 290 */ 296 291 struct dm_table *dm_get_live_table(struct mapped_device *md);
+5 -5
include/linux/elevator.h
··· 20 20 typedef int (elevator_dispatch_fn) (struct request_queue *, int); 21 21 22 22 typedef void (elevator_add_req_fn) (struct request_queue *, struct request *); 23 - typedef int (elevator_queue_empty_fn) (struct request_queue *); 24 23 typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *); 25 24 typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *); 26 25 typedef int (elevator_may_queue_fn) (struct request_queue *, int); ··· 45 46 elevator_activate_req_fn *elevator_activate_req_fn; 46 47 elevator_deactivate_req_fn *elevator_deactivate_req_fn; 47 48 48 - elevator_queue_empty_fn *elevator_queue_empty_fn; 49 49 elevator_completed_req_fn *elevator_completed_req_fn; 50 50 51 51 elevator_request_list_fn *elevator_former_req_fn; ··· 99 101 */ 100 102 extern void elv_dispatch_sort(struct request_queue *, struct request *); 101 103 extern void elv_dispatch_add_tail(struct request_queue *, struct request *); 102 - extern void elv_add_request(struct request_queue *, struct request *, int, int); 103 - extern void __elv_add_request(struct request_queue *, struct request *, int, int); 104 + extern void elv_add_request(struct request_queue *, struct request *, int); 105 + extern void __elv_add_request(struct request_queue *, struct request *, int); 104 106 extern void elv_insert(struct request_queue *, struct request *, int); 105 107 extern int elv_merge(struct request_queue *, struct request **, struct bio *); 108 + extern int elv_try_merge(struct request *, struct bio *); 106 109 extern void elv_merge_requests(struct request_queue *, struct request *, 107 110 struct request *); 108 111 extern void elv_merged_request(struct request_queue *, struct request *, int); 109 112 extern void elv_bio_merged(struct request_queue *q, struct request *, 110 113 struct bio *); 111 114 extern void elv_requeue_request(struct request_queue *, struct request *); 112 - extern int elv_queue_empty(struct request_queue *); 113 115 extern struct request *elv_former_request(struct request_queue *, struct request *); 114 116 extern struct request *elv_latter_request(struct request_queue *, struct request *); 115 117 extern int elv_register_queue(struct request_queue *q); ··· 165 167 #define ELEVATOR_INSERT_BACK 2 166 168 #define ELEVATOR_INSERT_SORT 3 167 169 #define ELEVATOR_INSERT_REQUEUE 4 170 + #define ELEVATOR_INSERT_FLUSH 5 171 + #define ELEVATOR_INSERT_SORT_MERGE 6 168 172 169 173 /* 170 174 * return values from elevator_may_queue_fn
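
elv_add_request() and __elv_add_request() lose their old unplug argument, and elevator_queue_empty_fn disappears, so callers now pass only the queue, the request and an ELEVATOR_INSERT_* position. A short sketch against the prototypes above; example_requeue() is illustrative and not part of this series:

#include <linux/blkdev.h>
#include <linux/elevator.h>

/* Illustrative: requeue a request using the three-argument API
 * (the queue lock must be held across __elv_add_request()). */
static void example_requeue(struct request_queue *q, struct request *rq)
{
        unsigned long flags;

        spin_lock_irqsave(q->queue_lock, flags);
        __elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
        spin_unlock_irqrestore(q->queue_lock, flags);
}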
+9 -20
include/linux/fs.h
··· 138 138 * block layer could (in theory) choose to ignore this 139 139 * request if it runs into resource problems. 140 140 * WRITE A normal async write. Device will be plugged. 141 - * WRITE_SYNC_PLUG Synchronous write. Identical to WRITE, but passes down 141 + * WRITE_SYNC Synchronous write. Identical to WRITE, but passes down 142 142 * the hint that someone will be waiting on this IO 143 - * shortly. The device must still be unplugged explicitly, 144 - * WRITE_SYNC_PLUG does not do this as we could be 145 - * submitting more writes before we actually wait on any 146 - * of them. 147 - * WRITE_SYNC Like WRITE_SYNC_PLUG, but also unplugs the device 148 - * immediately after submission. The write equivalent 149 - * of READ_SYNC. 150 - * WRITE_ODIRECT_PLUG Special case write for O_DIRECT only. 143 + * shortly. The write equivalent of READ_SYNC. 144 + * WRITE_ODIRECT Special case write for O_DIRECT only. 151 145 * WRITE_FLUSH Like WRITE_SYNC but with preceding cache flush. 152 146 * WRITE_FUA Like WRITE_SYNC but data is guaranteed to be on 153 147 * non-volatile media on completion. ··· 157 163 #define WRITE RW_MASK 158 164 #define READA RWA_MASK 159 165 160 - #define READ_SYNC (READ | REQ_SYNC | REQ_UNPLUG) 166 + #define READ_SYNC (READ | REQ_SYNC) 161 167 #define READ_META (READ | REQ_META) 162 - #define WRITE_SYNC_PLUG (WRITE | REQ_SYNC | REQ_NOIDLE) 163 - #define WRITE_SYNC (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG) 164 - #define WRITE_ODIRECT_PLUG (WRITE | REQ_SYNC) 168 + #define WRITE_SYNC (WRITE | REQ_SYNC | REQ_NOIDLE) 169 + #define WRITE_ODIRECT (WRITE | REQ_SYNC) 165 170 #define WRITE_META (WRITE | REQ_META) 166 - #define WRITE_FLUSH (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG | \ 167 - REQ_FLUSH) 168 - #define WRITE_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG | \ 169 - REQ_FUA) 170 - #define WRITE_FLUSH_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG | \ 171 - REQ_FLUSH | REQ_FUA) 171 + #define WRITE_FLUSH (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH) 172 + #define WRITE_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FUA) 173 + #define WRITE_FLUSH_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH | REQ_FUA) 172 174 173 175 #define SEL_IN 1 174 176 #define SEL_OUT 2 ··· 576 586 struct address_space_operations { 577 587 int (*writepage)(struct page *page, struct writeback_control *wbc); 578 588 int (*readpage)(struct file *, struct page *); 579 - void (*sync_page)(struct page *); 580 589 581 590 /* Write back some dirty pages from this mapping. */ 582 591 int (*writepages)(struct address_space *, struct writeback_control *);
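
REQ_UNPLUG disappears from READ_SYNC, WRITE_SYNC, WRITE_FLUSH and WRITE_FUA, and WRITE_SYNC_PLUG/WRITE_ODIRECT_PLUG collapse into WRITE_SYNC/WRITE_ODIRECT: a waiter no longer kicks the queue itself but depends on the submitter's plug (or on blk_flush_plug() at schedule time, see the kernel/sched.c hunk below). A hedged sketch of a synchronous buffer write under the new definitions; the function name is illustrative and the body mirrors the usual sync_dirty_buffer()-style pattern:

#include <linux/errno.h>
#include <linux/buffer_head.h>

/* Illustrative: write one buffer with WRITE_SYNC and wait for it.
 * WRITE_SYNC is now just WRITE | REQ_SYNC | REQ_NOIDLE (no REQ_UNPLUG). */
static int example_write_buffer_sync(struct buffer_head *bh)
{
        lock_buffer(bh);
        get_bh(bh);
        bh->b_end_io = end_buffer_write_sync;   /* drops the reference on completion */
        submit_bh(WRITE_SYNC, bh);
        wait_on_buffer(bh);
        return buffer_uptodate(bh) ? 0 : -EIO;
}

If the caller holds an active blk_plug, the schedule() change later in this merge flushes it before the task sleeps in wait_on_buffer(), so the wait cannot stall on its own unsubmitted I/O.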
+6 -6
include/linux/genhd.h
··· 109 109 int make_it_fail; 110 110 #endif 111 111 unsigned long stamp; 112 - int in_flight[2]; 112 + atomic_t in_flight[2]; 113 113 #ifdef CONFIG_SMP 114 114 struct disk_stats __percpu *dkstats; 115 115 #else ··· 370 370 371 371 static inline void part_inc_in_flight(struct hd_struct *part, int rw) 372 372 { 373 - part->in_flight[rw]++; 373 + atomic_inc(&part->in_flight[rw]); 374 374 if (part->partno) 375 - part_to_disk(part)->part0.in_flight[rw]++; 375 + atomic_inc(&part_to_disk(part)->part0.in_flight[rw]); 376 376 } 377 377 378 378 static inline void part_dec_in_flight(struct hd_struct *part, int rw) 379 379 { 380 - part->in_flight[rw]--; 380 + atomic_dec(&part->in_flight[rw]); 381 381 if (part->partno) 382 - part_to_disk(part)->part0.in_flight[rw]--; 382 + atomic_dec(&part_to_disk(part)->part0.in_flight[rw]); 383 383 } 384 384 385 385 static inline int part_in_flight(struct hd_struct *part) 386 386 { 387 - return part->in_flight[0] + part->in_flight[1]; 387 + return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]); 388 388 } 389 389 390 390 static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
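
part->in_flight[] becomes atomic_t, so the in-flight accounting helpers no longer depend on queue-lock serialization. A small illustrative reader built only on the fields and helpers shown above; the function name is made up:

#include <linux/fs.h>
#include <linux/genhd.h>

/* Illustrative: sample per-direction in-flight counts locklessly. */
static void example_sample_in_flight(struct hd_struct *part,
                                     int *reads, int *writes)
{
        *reads  = atomic_read(&part->in_flight[READ]);
        *writes = atomic_read(&part->in_flight[WRITE]);
}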
-12
include/linux/pagemap.h
··· 298 298 299 299 extern void __lock_page(struct page *page); 300 300 extern int __lock_page_killable(struct page *page); 301 - extern void __lock_page_nosync(struct page *page); 302 301 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm, 303 302 unsigned int flags); 304 303 extern void unlock_page(struct page *page); ··· 340 341 return 0; 341 342 } 342 343 343 - /* 344 - * lock_page_nosync should only be used if we can't pin the page's inode. 345 - * Doesn't play quite so well with block device plugging. 346 - */ 347 - static inline void lock_page_nosync(struct page *page) 348 - { 349 - might_sleep(); 350 - if (!trylock_page(page)) 351 - __lock_page_nosync(page); 352 - } 353 - 354 344 /* 355 345 * lock_page_or_retry - Lock the page, unless this would block and the 356 346 * caller indicated that it can handle a retry.
+6
include/linux/sched.h
··· 99 99 struct bio_list; 100 100 struct fs_struct; 101 101 struct perf_event_context; 102 + struct blk_plug; 102 103 103 104 /* 104 105 * List of flags we want to share for kernel threads, ··· 1428 1427 1429 1428 /* stacked block device info */ 1430 1429 struct bio_list *bio_list; 1430 + 1431 + #ifdef CONFIG_BLOCK 1432 + /* stack plugging */ 1433 + struct blk_plug *plug; 1434 + #endif 1431 1435 1432 1436 /* VM state */ 1433 1437 struct reclaim_state *reclaim_state;
-2
include/linux/swap.h
··· 309 309 struct page **pagep, swp_entry_t *ent); 310 310 #endif 311 311 312 - extern void swap_unplug_io_fn(struct backing_dev_info *, struct page *); 313 - 314 312 #ifdef CONFIG_SWAP 315 313 /* linux/mm/page_io.c */ 316 314 extern int swap_readpage(struct page *);
+1
kernel/exit.c
··· 908 908 profile_task_exit(tsk); 909 909 910 910 WARN_ON(atomic_read(&tsk->fs_excl)); 911 + WARN_ON(blk_needs_flush_plug(tsk)); 911 912 912 913 if (unlikely(in_interrupt())) 913 914 panic("Aiee, killing interrupt handler!");
+3
kernel/fork.c
··· 1205 1205 * Clear TID on mm_release()? 1206 1206 */ 1207 1207 p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr: NULL; 1208 + #ifdef CONFIG_BLOCK 1209 + p->plug = NULL; 1210 + #endif 1208 1211 #ifdef CONFIG_FUTEX 1209 1212 p->robust_list = NULL; 1210 1213 #ifdef CONFIG_COMPAT
+1 -1
kernel/power/block_io.c
··· 28 28 static int submit(int rw, struct block_device *bdev, sector_t sector, 29 29 struct page *page, struct bio **bio_chain) 30 30 { 31 - const int bio_rw = rw | REQ_SYNC | REQ_UNPLUG; 31 + const int bio_rw = rw | REQ_SYNC; 32 32 struct bio *bio; 33 33 34 34 bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
+12
kernel/sched.c
··· 4115 4115 switch_count = &prev->nvcsw; 4116 4116 } 4117 4117 4118 + /* 4119 + * If we are going to sleep and we have plugged IO queued, make 4120 + * sure to submit it to avoid deadlocks. 4121 + */ 4122 + if (prev->state != TASK_RUNNING && blk_needs_flush_plug(prev)) { 4123 + raw_spin_unlock(&rq->lock); 4124 + blk_flush_plug(prev); 4125 + raw_spin_lock(&rq->lock); 4126 + } 4127 + 4118 4128 pre_schedule(rq, prev); 4119 4129 4120 4130 if (unlikely(!rq->nr_running)) ··· 5538 5528 5539 5529 delayacct_blkio_start(); 5540 5530 atomic_inc(&rq->nr_iowait); 5531 + blk_flush_plug(current); 5541 5532 current->in_iowait = 1; 5542 5533 schedule(); 5543 5534 current->in_iowait = 0; ··· 5554 5543 5555 5544 delayacct_blkio_start(); 5556 5545 atomic_inc(&rq->nr_iowait); 5546 + blk_flush_plug(current); 5557 5547 current->in_iowait = 1; 5558 5548 ret = schedule_timeout(timeout); 5559 5549 current->in_iowait = 0;
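
The scheduler hunk above is the safety net for the per-task plug: any task that queued I/O behind its plug and then sleeps has that I/O flushed before the context switch, and the iowait paths flush explicitly as well. A hedged sketch of a waiter that benefits from this; example_wait_for_done(), the wait queue and the done flag are illustrative, not symbols from this series:

#include <linux/sched.h>
#include <linux/wait.h>

/* Illustrative: open-coded wait loop. With this change io_schedule()
 * (and schedule() when the task is not runnable) flushes current's
 * plug first, so I/O the waiter itself queued still gets issued. */
static void example_wait_for_done(wait_queue_head_t *wq, int *done)
{
        DEFINE_WAIT(wait);

        for (;;) {
                prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
                if (*done)
                        break;
                io_schedule();
        }
        finish_wait(wq, &wait);
}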
+4 -11
kernel/trace/blktrace.c
··· 703 703 * 704 704 **/ 705 705 static void blk_add_trace_rq(struct request_queue *q, struct request *rq, 706 - u32 what) 706 + u32 what) 707 707 { 708 708 struct blk_trace *bt = q->blk_trace; 709 - int rw = rq->cmd_flags & 0x03; 710 709 711 710 if (likely(!bt)) 712 711 return; 713 712 714 - if (rq->cmd_flags & REQ_DISCARD) 715 - rw |= REQ_DISCARD; 716 - 717 - if (rq->cmd_flags & REQ_SECURE) 718 - rw |= REQ_SECURE; 719 - 720 713 if (rq->cmd_type == REQ_TYPE_BLOCK_PC) { 721 714 what |= BLK_TC_ACT(BLK_TC_PC); 722 - __blk_add_trace(bt, 0, blk_rq_bytes(rq), rw, 715 + __blk_add_trace(bt, 0, blk_rq_bytes(rq), rq->cmd_flags, 723 716 what, rq->errors, rq->cmd_len, rq->cmd); 724 717 } else { 725 718 what |= BLK_TC_ACT(BLK_TC_FS); 726 - __blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq), rw, 727 - what, rq->errors, 0, NULL); 719 + __blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq), 720 + rq->cmd_flags, what, rq->errors, 0, NULL); 728 721 } 729 722 } 730 723
+1 -7
mm/backing-dev.c
··· 14 14 15 15 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0); 16 16 17 - void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page) 18 - { 19 - } 20 - EXPORT_SYMBOL(default_unplug_io_fn); 21 - 22 17 struct backing_dev_info default_backing_dev_info = { 23 18 .name = "default", 24 19 .ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE, 25 20 .state = 0, 26 21 .capabilities = BDI_CAP_MAP_COPY, 27 - .unplug_io_fn = default_unplug_io_fn, 28 22 }; 29 23 EXPORT_SYMBOL_GPL(default_backing_dev_info); 30 24 ··· 598 604 spin_lock(&sb_lock); 599 605 list_for_each_entry(sb, &super_blocks, s_list) { 600 606 if (sb->s_bdi == bdi) 601 - sb->s_bdi = NULL; 607 + sb->s_bdi = &default_backing_dev_info; 602 608 } 603 609 spin_unlock(&sb_lock); 604 610 }
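
backing_dev_info loses its unplug_io_fn member along with default_unplug_io_fn(), and a bdi being torn down now points its superblocks at default_backing_dev_info rather than NULL. A sketch of a minimal bdi initializer after this change; the name and values are illustrative and simply mirror the default bdi shown above:

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/backing-dev.h>

/* Illustrative: no .unplug_io_fn initializer exists any more. */
static struct backing_dev_info example_bdi = {
        .name           = "example",
        .ra_pages       = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
        .capabilities   = BDI_CAP_MAP_COPY,
};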
+13 -61
mm/filemap.c
··· 164 164 } 165 165 EXPORT_SYMBOL(delete_from_page_cache); 166 166 167 - static int sync_page(void *word) 167 + static int sleep_on_page(void *word) 168 168 { 169 - struct address_space *mapping; 170 - struct page *page; 171 - 172 - page = container_of((unsigned long *)word, struct page, flags); 173 - 174 - /* 175 - * page_mapping() is being called without PG_locked held. 176 - * Some knowledge of the state and use of the page is used to 177 - * reduce the requirements down to a memory barrier. 178 - * The danger here is of a stale page_mapping() return value 179 - * indicating a struct address_space different from the one it's 180 - * associated with when it is associated with one. 181 - * After smp_mb(), it's either the correct page_mapping() for 182 - * the page, or an old page_mapping() and the page's own 183 - * page_mapping() has gone NULL. 184 - * The ->sync_page() address_space operation must tolerate 185 - * page_mapping() going NULL. By an amazing coincidence, 186 - * this comes about because none of the users of the page 187 - * in the ->sync_page() methods make essential use of the 188 - * page_mapping(), merely passing the page down to the backing 189 - * device's unplug functions when it's non-NULL, which in turn 190 - * ignore it for all cases but swap, where only page_private(page) is 191 - * of interest. When page_mapping() does go NULL, the entire 192 - * call stack gracefully ignores the page and returns. 193 - * -- wli 194 - */ 195 - smp_mb(); 196 - mapping = page_mapping(page); 197 - if (mapping && mapping->a_ops && mapping->a_ops->sync_page) 198 - mapping->a_ops->sync_page(page); 199 169 io_schedule(); 200 170 return 0; 201 171 } 202 172 203 - static int sync_page_killable(void *word) 173 + static int sleep_on_page_killable(void *word) 204 174 { 205 - sync_page(word); 175 + sleep_on_page(word); 206 176 return fatal_signal_pending(current) ? -EINTR : 0; 207 177 } 208 178 ··· 528 558 EXPORT_SYMBOL(__page_cache_alloc); 529 559 #endif 530 560 531 - static int __sleep_on_page_lock(void *word) 532 - { 533 - io_schedule(); 534 - return 0; 535 - } 536 - 537 561 /* 538 562 * In order to wait for pages to become available there must be 539 563 * waitqueues associated with pages. By using a hash table of ··· 555 591 DEFINE_WAIT_BIT(wait, &page->flags, bit_nr); 556 592 557 593 if (test_bit(bit_nr, &page->flags)) 558 - __wait_on_bit(page_waitqueue(page), &wait, sync_page, 594 + __wait_on_bit(page_waitqueue(page), &wait, sleep_on_page, 559 595 TASK_UNINTERRUPTIBLE); 560 596 } 561 597 EXPORT_SYMBOL(wait_on_page_bit); ··· 619 655 /** 620 656 * __lock_page - get a lock on the page, assuming we need to sleep to get it 621 657 * @page: the page to lock 622 - * 623 - * Ugly. Running sync_page() in state TASK_UNINTERRUPTIBLE is scary. If some 624 - * random driver's requestfn sets TASK_RUNNING, we could busywait. However 625 - * chances are that on the second loop, the block layer's plug list is empty, 626 - * so sync_page() will then return in state TASK_UNINTERRUPTIBLE. 
627 658 */ 628 659 void __lock_page(struct page *page) 629 660 { 630 661 DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); 631 662 632 - __wait_on_bit_lock(page_waitqueue(page), &wait, sync_page, 663 + __wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page, 633 664 TASK_UNINTERRUPTIBLE); 634 665 } 635 666 EXPORT_SYMBOL(__lock_page); ··· 634 675 DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); 635 676 636 677 return __wait_on_bit_lock(page_waitqueue(page), &wait, 637 - sync_page_killable, TASK_KILLABLE); 678 + sleep_on_page_killable, TASK_KILLABLE); 638 679 } 639 680 EXPORT_SYMBOL_GPL(__lock_page_killable); 640 - 641 - /** 642 - * __lock_page_nosync - get a lock on the page, without calling sync_page() 643 - * @page: the page to lock 644 - * 645 - * Variant of lock_page that does not require the caller to hold a reference 646 - * on the page's mapping. 647 - */ 648 - void __lock_page_nosync(struct page *page) 649 - { 650 - DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); 651 - __wait_on_bit_lock(page_waitqueue(page), &wait, __sleep_on_page_lock, 652 - TASK_UNINTERRUPTIBLE); 653 - } 654 681 655 682 int __lock_page_or_retry(struct page *page, struct mm_struct *mm, 656 683 unsigned int flags) ··· 1352 1407 unsigned long seg = 0; 1353 1408 size_t count; 1354 1409 loff_t *ppos = &iocb->ki_pos; 1410 + struct blk_plug plug; 1355 1411 1356 1412 count = 0; 1357 1413 retval = generic_segment_checks(iov, &nr_segs, &count, VERIFY_WRITE); 1358 1414 if (retval) 1359 1415 return retval; 1416 + 1417 + blk_start_plug(&plug); 1360 1418 1361 1419 /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ 1362 1420 if (filp->f_flags & O_DIRECT) { ··· 1433 1485 break; 1434 1486 } 1435 1487 out: 1488 + blk_finish_plug(&plug); 1436 1489 return retval; 1437 1490 } 1438 1491 EXPORT_SYMBOL(generic_file_aio_read); ··· 2545 2596 { 2546 2597 struct file *file = iocb->ki_filp; 2547 2598 struct inode *inode = file->f_mapping->host; 2599 + struct blk_plug plug; 2548 2600 ssize_t ret; 2549 2601 2550 2602 BUG_ON(iocb->ki_pos != pos); 2551 2603 2552 2604 mutex_lock(&inode->i_mutex); 2605 + blk_start_plug(&plug); 2553 2606 ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos); 2554 2607 mutex_unlock(&inode->i_mutex); 2555 2608 ··· 2562 2611 if (err < 0 && ret > 0) 2563 2612 ret = err; 2564 2613 } 2614 + blk_finish_plug(&plug); 2565 2615 return ret; 2566 2616 } 2567 2617 EXPORT_SYMBOL(generic_file_aio_write);
+4 -4
mm/memory-failure.c
··· 945 945 collect_procs(ppage, &tokill); 946 946 947 947 if (hpage != ppage) 948 - lock_page_nosync(ppage); 948 + lock_page(ppage); 949 949 950 950 ret = try_to_unmap(ppage, ttu); 951 951 if (ret != SWAP_SUCCESS) ··· 1038 1038 * Check "just unpoisoned", "filter hit", and 1039 1039 * "race with other subpage." 1040 1040 */ 1041 - lock_page_nosync(hpage); 1041 + lock_page(hpage); 1042 1042 if (!PageHWPoison(hpage) 1043 1043 || (hwpoison_filter(p) && TestClearPageHWPoison(p)) 1044 1044 || (p != hpage && TestSetPageHWPoison(hpage))) { ··· 1088 1088 * It's very difficult to mess with pages currently under IO 1089 1089 * and in many cases impossible, so we just avoid it here. 1090 1090 */ 1091 - lock_page_nosync(hpage); 1091 + lock_page(hpage); 1092 1092 1093 1093 /* 1094 1094 * unpoison always clear PG_hwpoison inside page lock ··· 1231 1231 return 0; 1232 1232 } 1233 1233 1234 - lock_page_nosync(page); 1234 + lock_page(page); 1235 1235 /* 1236 1236 * This test is racy because PG_hwpoison is set outside of page lock. 1237 1237 * That's acceptable because that won't trigger kernel panic. Instead,
-4
mm/nommu.c
··· 1842 1842 } 1843 1843 EXPORT_SYMBOL(remap_vmalloc_range); 1844 1844 1845 - void swap_unplug_io_fn(struct backing_dev_info *bdi, struct page *page) 1846 - { 1847 - } 1848 - 1849 1845 unsigned long arch_get_unmapped_area(struct file *file, unsigned long addr, 1850 1846 unsigned long len, unsigned long pgoff, unsigned long flags) 1851 1847 {
+8 -2
mm/page-writeback.c
··· 1040 1040 int generic_writepages(struct address_space *mapping, 1041 1041 struct writeback_control *wbc) 1042 1042 { 1043 + struct blk_plug plug; 1044 + int ret; 1045 + 1043 1046 /* deal with chardevs and other special file */ 1044 1047 if (!mapping->a_ops->writepage) 1045 1048 return 0; 1046 1049 1047 - return write_cache_pages(mapping, wbc, __writepage, mapping); 1050 + blk_start_plug(&plug); 1051 + ret = write_cache_pages(mapping, wbc, __writepage, mapping); 1052 + blk_finish_plug(&plug); 1053 + return ret; 1048 1054 } 1049 1055 1050 1056 EXPORT_SYMBOL(generic_writepages); ··· 1257 1251 { 1258 1252 int ret; 1259 1253 1260 - lock_page_nosync(page); 1254 + lock_page(page); 1261 1255 ret = set_page_dirty(page); 1262 1256 unlock_page(page); 1263 1257 return ret;
+1 -1
mm/page_io.c
··· 106 106 goto out; 107 107 } 108 108 if (wbc->sync_mode == WB_SYNC_ALL) 109 - rw |= REQ_SYNC | REQ_UNPLUG; 109 + rw |= REQ_SYNC; 110 110 count_vm_event(PSWPOUT); 111 111 set_page_writeback(page); 112 112 unlock_page(page);
+6 -12
mm/readahead.c
··· 109 109 static int read_pages(struct address_space *mapping, struct file *filp, 110 110 struct list_head *pages, unsigned nr_pages) 111 111 { 112 + struct blk_plug plug; 112 113 unsigned page_idx; 113 114 int ret; 115 + 116 + blk_start_plug(&plug); 114 117 115 118 if (mapping->a_ops->readpages) { 116 119 ret = mapping->a_ops->readpages(filp, mapping, pages, nr_pages); ··· 132 129 page_cache_release(page); 133 130 } 134 131 ret = 0; 132 + 135 133 out: 134 + blk_finish_plug(&plug); 135 + 136 136 return ret; 137 137 } 138 138 ··· 560 554 561 555 /* do read-ahead */ 562 556 ondemand_readahead(mapping, ra, filp, true, offset, req_size); 563 - 564 - #ifdef CONFIG_BLOCK 565 - /* 566 - * Normally the current page is !uptodate and lock_page() will be 567 - * immediately called to implicitly unplug the device. However this 568 - * is not always true for RAID conifgurations, where data arrives 569 - * not strictly in their submission order. In this case we need to 570 - * explicitly kick off the IO. 571 - */ 572 - if (PageUptodate(page)) 573 - blk_run_backing_dev(mapping->backing_dev_info, NULL); 574 - #endif 575 557 } 576 558 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
-1
mm/shmem.c
··· 224 224 static struct backing_dev_info shmem_backing_dev_info __read_mostly = { 225 225 .ra_pages = 0, /* No readahead */ 226 226 .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_SWAP_BACKED, 227 - .unplug_io_fn = default_unplug_io_fn, 228 227 }; 229 228 230 229 static LIST_HEAD(shmem_swaplist);
+1 -4
mm/swap_state.c
··· 24 24 25 25 /* 26 26 * swapper_space is a fiction, retained to simplify the path through 27 - * vmscan's shrink_page_list, to make sync_page look nicer, and to allow 28 - * future use of radix_tree tags in the swap cache. 27 + * vmscan's shrink_page_list. 29 28 */ 30 29 static const struct address_space_operations swap_aops = { 31 30 .writepage = swap_writepage, 32 - .sync_page = block_sync_page, 33 31 .set_page_dirty = __set_page_dirty_nobuffers, 34 32 .migratepage = migrate_page, 35 33 }; ··· 35 37 static struct backing_dev_info swap_backing_dev_info = { 36 38 .name = "swap", 37 39 .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_SWAP_BACKED, 38 - .unplug_io_fn = swap_unplug_io_fn, 39 40 }; 40 41 41 42 struct address_space swapper_space = {
-37
mm/swapfile.c
··· 95 95 } 96 96 97 97 /* 98 - * We need this because the bdev->unplug_fn can sleep and we cannot 99 - * hold swap_lock while calling the unplug_fn. And swap_lock 100 - * cannot be turned into a mutex. 101 - */ 102 - static DECLARE_RWSEM(swap_unplug_sem); 103 - 104 - void swap_unplug_io_fn(struct backing_dev_info *unused_bdi, struct page *page) 105 - { 106 - swp_entry_t entry; 107 - 108 - down_read(&swap_unplug_sem); 109 - entry.val = page_private(page); 110 - if (PageSwapCache(page)) { 111 - struct block_device *bdev = swap_info[swp_type(entry)]->bdev; 112 - struct backing_dev_info *bdi; 113 - 114 - /* 115 - * If the page is removed from swapcache from under us (with a 116 - * racy try_to_unuse/swapoff) we need an additional reference 117 - * count to avoid reading garbage from page_private(page) above. 118 - * If the WARN_ON triggers during a swapoff it maybe the race 119 - * condition and it's harmless. However if it triggers without 120 - * swapoff it signals a problem. 121 - */ 122 - WARN_ON(page_count(page) <= 1); 123 - 124 - bdi = bdev->bd_inode->i_mapping->backing_dev_info; 125 - blk_run_backing_dev(bdi, page); 126 - } 127 - up_read(&swap_unplug_sem); 128 - } 129 - 130 - /* 131 98 * swapon tell device that all the old swap contents can be discarded, 132 99 * to allow the swap device to optimize its wear-levelling. 133 100 */ ··· 1628 1661 enable_swap_info(p, p->prio, p->swap_map); 1629 1662 goto out_dput; 1630 1663 } 1631 - 1632 - /* wait for any unplug function to finish */ 1633 - down_write(&swap_unplug_sem); 1634 - up_write(&swap_unplug_sem); 1635 1664 1636 1665 destroy_swap_extents(p); 1637 1666 if (p->flags & SWP_CONTINUED)
+1 -1
mm/vmscan.c
··· 358 358 static void handle_write_error(struct address_space *mapping, 359 359 struct page *page, int error) 360 360 { 361 - lock_page_nosync(page); 361 + lock_page(page); 362 362 if (page_mapping(page) == mapping) 363 363 mapping_set_error(mapping, error); 364 364 unlock_page(page);