Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...

Fix up conflicts in fs/{aio.c,super.c}

+1538 -2130
-5
Documentation/block/biodoc.txt
··· 963 964 elevator_add_req_fn* called to add a new request into the scheduler 965 966 - elevator_queue_empty_fn returns true if the merge queue is empty. 967 - Drivers shouldn't use this, but rather check 968 - if elv_next_request is NULL (without losing the 969 - request if one exists!) 970 - 971 elevator_former_req_fn 972 elevator_latter_req_fn These return the request before or after the 973 one specified in disk sort order. Used by the
··· 963 964 elevator_add_req_fn* called to add a new request into the scheduler 965 966 elevator_former_req_fn 967 elevator_latter_req_fn These return the request before or after the 968 one specified in disk sort order. Used by the
+1 -29
Documentation/cgroups/blkio-controller.txt
··· 140 - Specifies per cgroup weight. This is default weight of the group 141 on all the devices until and unless overridden by per device rule. 142 (See blkio.weight_device). 143 - Currently allowed range of weights is from 100 to 1000. 144 145 - blkio.weight_device 146 - One can specify per cgroup per device rules using this interface. ··· 343 344 CFQ sysfs tunable 345 ================= 346 - /sys/block/<disk>/queue/iosched/group_isolation 347 - ----------------------------------------------- 348 - 349 - If group_isolation=1, it provides stronger isolation between groups at the 350 - expense of throughput. By default group_isolation is 0. In general that 351 - means that if group_isolation=0, expect fairness for sequential workload 352 - only. Set group_isolation=1 to see fairness for random IO workload also. 353 - 354 - Generally CFQ will put random seeky workload in sync-noidle category. CFQ 355 - will disable idling on these queues and it does a collective idling on group 356 - of such queues. Generally these are slow moving queues and if there is a 357 - sync-noidle service tree in each group, that group gets exclusive access to 358 - disk for certain period. That means it will bring the throughput down if 359 - group does not have enough IO to drive deeper queue depths and utilize disk 360 - capacity to the fullest in the slice allocated to it. But the flip side is 361 - that even a random reader should get better latencies and overall throughput 362 - if there are lots of sequential readers/sync-idle workload running in the 363 - system. 364 - 365 - If group_isolation=0, then CFQ automatically moves all the random seeky queues 366 - in the root group. That means there will be no service differentiation for 367 - that kind of workload. This leads to better throughput as we do collective 368 - idling on root sync-noidle tree. 369 - 370 - By default one should run with group_isolation=0. If that is not sufficient 371 - and one wants stronger isolation between groups, then set group_isolation=1 372 - but this will come at cost of reduced throughput. 373 - 374 /sys/block/<disk>/queue/iosched/slice_idle 375 ------------------------------------------ 376 On a faster hardware CFQ can be slow, especially with sequential workload.
··· 140 - Specifies per cgroup weight. This is default weight of the group 141 on all the devices until and unless overridden by per device rule. 142 (See blkio.weight_device). 143 + Currently allowed range of weights is from 10 to 1000. 144 145 - blkio.weight_device 146 - One can specify per cgroup per device rules using this interface. ··· 343 344 CFQ sysfs tunable 345 ================= 346 /sys/block/<disk>/queue/iosched/slice_idle 347 ------------------------------------------ 348 On a faster hardware CFQ can be slow, especially with sequential workload.
+8 -9
Documentation/iostats.txt
··· 1 I/O statistics fields 2 --------------- 3 4 - Last modified Sep 30, 2003 5 - 6 Since 2.4.20 (and some versions before, with patches), and 2.5.45, 7 more extensive disk statistics have been introduced to help measure disk 8 activity. Tools such as sar and iostat typically interpret these and do ··· 44 By contrast, in 2.6 if you look at /sys/block/hda/stat, you'll 45 find just the eleven fields, beginning with 446216. If you look at 46 /proc/diskstats, the eleven fields will be preceded by the major and 47 - minor device numbers, and device name. Each of these formats provide 48 eleven fields of statistics, each meaning exactly the same things. 49 All fields except field 9 are cumulative since boot. Field 9 should 50 - go to zero as I/Os complete; all others only increase. Yes, these are 51 - 32 bit unsigned numbers, and on a very busy or long-lived system they 52 may wrap. Applications should be prepared to deal with that; unless 53 your observations are measured in large numbers of minutes or hours, 54 they should not wrap twice before you notice them. ··· 95 read I/Os issued per partition should equal those made to the disks ... 96 but due to the lack of locking it may only be very close. 97 98 - In 2.6, there are counters for each cpu, which made the lack of locking 99 - almost a non-issue. When the statistics are read, the per-cpu counters 100 - are summed (possibly overflowing the unsigned 32-bit variable they are 101 summed to) and the result given to the user. There is no convenient 102 - user interface for accessing the per-cpu counters themselves. 103 104 Disks vs Partitions 105 -------------------
··· 1 I/O statistics fields 2 --------------- 3 4 Since 2.4.20 (and some versions before, with patches), and 2.5.45, 5 more extensive disk statistics have been introduced to help measure disk 6 activity. Tools such as sar and iostat typically interpret these and do ··· 46 By contrast, in 2.6 if you look at /sys/block/hda/stat, you'll 47 find just the eleven fields, beginning with 446216. If you look at 48 /proc/diskstats, the eleven fields will be preceded by the major and 49 + minor device numbers, and device name. Each of these formats provides 50 eleven fields of statistics, each meaning exactly the same things. 51 All fields except field 9 are cumulative since boot. Field 9 should 52 + go to zero as I/Os complete; all others only increase (unless they 53 + overflow and wrap). Yes, these are (32-bit or 64-bit) unsigned long 54 + (native word size) numbers, and on a very busy or long-lived system they 55 may wrap. Applications should be prepared to deal with that; unless 56 your observations are measured in large numbers of minutes or hours, 57 they should not wrap twice before you notice them. ··· 96 read I/Os issued per partition should equal those made to the disks ... 97 but due to the lack of locking it may only be very close. 98 99 + In 2.6, there are counters for each CPU, which make the lack of locking 100 + almost a non-issue. When the statistics are read, the per-CPU counters 101 + are summed (possibly overflowing the unsigned long variable they are 102 summed to) and the result given to the user. There is no convenient 103 + user interface for accessing the per-CPU counters themselves. 104 105 Disks vs Partitions 106 -------------------
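The wrap behaviour described above is easy to get wrong in tools, so here is a small userspace sketch (an illustration, not part of the patch; the helper names are made up). It reads field 1 (reads completed) for a named device from /proc/diskstats and computes a wrap-safe delta: reading into unsigned long on the same machine matches the kernel's native counter width, and subtracting in that same unsigned width stays correct across a single wrap.

#include <stdio.h>
#include <string.h>

/* Return 0 and store field 1 (reads completed) for @dev, -1 if not found. */
static int read_rd_ios(const char *dev, unsigned long *rd_ios)
{
	char line[256], name[32];
	unsigned int major, minor;
	unsigned long v[11];
	FILE *f = fopen("/proc/diskstats", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%u %u %31s %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
			   &major, &minor, name, &v[0], &v[1], &v[2], &v[3],
			   &v[4], &v[5], &v[6], &v[7], &v[8], &v[9], &v[10]) == 14 &&
		    strcmp(name, dev) == 0) {
			*rd_ios = v[0];
			fclose(f);
			return 0;
		}
	}
	fclose(f);
	return -1;
}

/* Subtracting in the same unsigned width stays correct across one wrap. */
static unsigned long counter_delta(unsigned long newer, unsigned long older)
{
	return newer - older;
}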
+15 -1
block/blk-cgroup.c
··· 371 } 372 EXPORT_SYMBOL_GPL(blkiocg_update_io_remove_stats); 373 374 - void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time) 375 { 376 unsigned long flags; 377 378 spin_lock_irqsave(&blkg->stats_lock, flags); 379 blkg->stats.time += time; 380 spin_unlock_irqrestore(&blkg->stats_lock, flags); 381 } 382 EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used); ··· 606 return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, 607 blkg->stats.sectors, cb, dev); 608 #ifdef CONFIG_DEBUG_BLK_CGROUP 609 if (type == BLKIO_STAT_AVG_QUEUE_SIZE) { 610 uint64_t sum = blkg->stats.avg_queue_size_sum; 611 uint64_t samples = blkg->stats.avg_queue_size_samples; ··· 1130 return blkio_read_blkg_stats(blkcg, cft, cb, 1131 BLKIO_STAT_QUEUED, 1); 1132 #ifdef CONFIG_DEBUG_BLK_CGROUP 1133 case BLKIO_PROP_dequeue: 1134 return blkio_read_blkg_stats(blkcg, cft, cb, 1135 BLKIO_STAT_DEQUEUE, 0); ··· 1388 .name = "dequeue", 1389 .private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP, 1390 BLKIO_PROP_dequeue), 1391 .read_map = blkiocg_file_read_map, 1392 }, 1393 #endif
··· 371 } 372 EXPORT_SYMBOL_GPL(blkiocg_update_io_remove_stats); 373 374 + void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time, 375 + unsigned long unaccounted_time) 376 { 377 unsigned long flags; 378 379 spin_lock_irqsave(&blkg->stats_lock, flags); 380 blkg->stats.time += time; 381 + blkg->stats.unaccounted_time += unaccounted_time; 382 spin_unlock_irqrestore(&blkg->stats_lock, flags); 383 } 384 EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used); ··· 604 return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, 605 blkg->stats.sectors, cb, dev); 606 #ifdef CONFIG_DEBUG_BLK_CGROUP 607 + if (type == BLKIO_STAT_UNACCOUNTED_TIME) 608 + return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, 609 + blkg->stats.unaccounted_time, cb, dev); 610 if (type == BLKIO_STAT_AVG_QUEUE_SIZE) { 611 uint64_t sum = blkg->stats.avg_queue_size_sum; 612 uint64_t samples = blkg->stats.avg_queue_size_samples; ··· 1125 return blkio_read_blkg_stats(blkcg, cft, cb, 1126 BLKIO_STAT_QUEUED, 1); 1127 #ifdef CONFIG_DEBUG_BLK_CGROUP 1128 + case BLKIO_PROP_unaccounted_time: 1129 + return blkio_read_blkg_stats(blkcg, cft, cb, 1130 + BLKIO_STAT_UNACCOUNTED_TIME, 0); 1131 case BLKIO_PROP_dequeue: 1132 return blkio_read_blkg_stats(blkcg, cft, cb, 1133 BLKIO_STAT_DEQUEUE, 0); ··· 1380 .name = "dequeue", 1381 .private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP, 1382 BLKIO_PROP_dequeue), 1383 + .read_map = blkiocg_file_read_map, 1384 + }, 1385 + { 1386 + .name = "unaccounted_time", 1387 + .private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP, 1388 + BLKIO_PROP_unaccounted_time), 1389 .read_map = blkiocg_file_read_map, 1390 }, 1391 #endif
+11 -3
block/blk-cgroup.h
··· 49 /* All the single valued stats go below this */ 50 BLKIO_STAT_TIME, 51 BLKIO_STAT_SECTORS, 52 #ifdef CONFIG_DEBUG_BLK_CGROUP 53 BLKIO_STAT_AVG_QUEUE_SIZE, 54 BLKIO_STAT_IDLE_TIME, ··· 83 BLKIO_PROP_io_serviced, 84 BLKIO_PROP_time, 85 BLKIO_PROP_sectors, 86 BLKIO_PROP_io_service_time, 87 BLKIO_PROP_io_wait_time, 88 BLKIO_PROP_io_merged, ··· 117 /* total disk time and nr sectors dispatched by this group */ 118 uint64_t time; 119 uint64_t sectors; 120 uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL]; 121 #ifdef CONFIG_DEBUG_BLK_CGROUP 122 /* Sum of number of IOs queued across all samples */ ··· 245 246 #endif 247 248 - #define BLKIO_WEIGHT_MIN 100 249 #define BLKIO_WEIGHT_MAX 1000 250 #define BLKIO_WEIGHT_DEFAULT 500 251 ··· 298 extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg, 299 void *key); 300 void blkiocg_update_timeslice_used(struct blkio_group *blkg, 301 - unsigned long time); 302 void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes, 303 bool direction, bool sync); 304 void blkiocg_update_completion_stats(struct blkio_group *blkg, ··· 325 static inline struct blkio_group * 326 blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; } 327 static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg, 328 - unsigned long time) {} 329 static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg, 330 uint64_t bytes, bool direction, bool sync) {} 331 static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
··· 49 /* All the single valued stats go below this */ 50 BLKIO_STAT_TIME, 51 BLKIO_STAT_SECTORS, 52 + /* Time not charged to this cgroup */ 53 + BLKIO_STAT_UNACCOUNTED_TIME, 54 #ifdef CONFIG_DEBUG_BLK_CGROUP 55 BLKIO_STAT_AVG_QUEUE_SIZE, 56 BLKIO_STAT_IDLE_TIME, ··· 81 BLKIO_PROP_io_serviced, 82 BLKIO_PROP_time, 83 BLKIO_PROP_sectors, 84 + BLKIO_PROP_unaccounted_time, 85 BLKIO_PROP_io_service_time, 86 BLKIO_PROP_io_wait_time, 87 BLKIO_PROP_io_merged, ··· 114 /* total disk time and nr sectors dispatched by this group */ 115 uint64_t time; 116 uint64_t sectors; 117 + /* Time not charged to this cgroup */ 118 + uint64_t unaccounted_time; 119 uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL]; 120 #ifdef CONFIG_DEBUG_BLK_CGROUP 121 /* Sum of number of IOs queued across all samples */ ··· 240 241 #endif 242 243 + #define BLKIO_WEIGHT_MIN 10 244 #define BLKIO_WEIGHT_MAX 1000 245 #define BLKIO_WEIGHT_DEFAULT 500 246 ··· 293 extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg, 294 void *key); 295 void blkiocg_update_timeslice_used(struct blkio_group *blkg, 296 + unsigned long time, 297 + unsigned long unaccounted_time); 298 void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes, 299 bool direction, bool sync); 300 void blkiocg_update_completion_stats(struct blkio_group *blkg, ··· 319 static inline struct blkio_group * 320 blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; } 321 static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg, 322 + unsigned long time, 323 + unsigned long unaccounted_time) 324 + {} 325 static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg, 326 uint64_t bytes, bool direction, bool sync) {} 327 static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
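For context, a minimal sketch of how a caller is expected to use the widened blkiocg_update_timeslice_used() interface (illustrative only; the function and variable names below are not lifted from cfq-iosched.c). Time actually charged to the group and time that should not be charged to it are now reported separately; with CONFIG_DEBUG_BLK_CGROUP the accumulated value becomes visible through the new blkio.unaccounted_time file.

#include "blk-cgroup.h"

/* Hypothetical caller: report charged and uncharged slice time together. */
static void example_group_served(struct blkio_group *blkg,
				 unsigned long used_sl,
				 unsigned long unaccounted_sl)
{
	blkiocg_update_timeslice_used(blkg, used_sl, unaccounted_sl);
}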
+380 -272
block/blk-core.c
··· 27 #include <linux/writeback.h> 28 #include <linux/task_io_accounting_ops.h> 29 #include <linux/fault-inject.h> 30 31 #define CREATE_TRACE_POINTS 32 #include <trace/events/block.h> ··· 150 static void req_bio_endio(struct request *rq, struct bio *bio, 151 unsigned int nbytes, int error) 152 { 153 - struct request_queue *q = rq->q; 154 155 - if (&q->flush_rq != rq) { 156 - if (error) 157 - clear_bit(BIO_UPTODATE, &bio->bi_flags); 158 - else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) 159 - error = -EIO; 160 - 161 - if (unlikely(nbytes > bio->bi_size)) { 162 - printk(KERN_ERR "%s: want %u bytes done, %u left\n", 163 - __func__, nbytes, bio->bi_size); 164 - nbytes = bio->bi_size; 165 - } 166 - 167 - if (unlikely(rq->cmd_flags & REQ_QUIET)) 168 - set_bit(BIO_QUIET, &bio->bi_flags); 169 - 170 - bio->bi_size -= nbytes; 171 - bio->bi_sector += (nbytes >> 9); 172 - 173 - if (bio_integrity(bio)) 174 - bio_integrity_advance(bio, nbytes); 175 - 176 - if (bio->bi_size == 0) 177 - bio_endio(bio, error); 178 - } else { 179 - /* 180 - * Okay, this is the sequenced flush request in 181 - * progress, just record the error; 182 - */ 183 - if (error && !q->flush_err) 184 - q->flush_err = error; 185 } 186 } 187 188 void blk_dump_rq_flags(struct request *rq, char *msg) ··· 199 EXPORT_SYMBOL(blk_dump_rq_flags); 200 201 /* 202 - * "plug" the device if there are no outstanding requests: this will 203 - * force the transfer to start only after we have put all the requests 204 - * on the list. 205 - * 206 - * This is called with interrupts off and no requests on the queue and 207 - * with the queue lock held. 208 - */ 209 - void blk_plug_device(struct request_queue *q) 210 { 211 - WARN_ON(!irqs_disabled()); 212 - 213 /* 214 - * don't plug a stopped queue, it must be paired with blk_start_queue() 215 - * which will restart the queueing 216 */ 217 - if (blk_queue_stopped(q)) 218 - return; 219 - 220 - if (!queue_flag_test_and_set(QUEUE_FLAG_PLUGGED, q)) { 221 - mod_timer(&q->unplug_timer, jiffies + q->unplug_delay); 222 - trace_block_plug(q); 223 - } 224 } 225 - EXPORT_SYMBOL(blk_plug_device); 226 227 - /** 228 - * blk_plug_device_unlocked - plug a device without queue lock held 229 - * @q: The &struct request_queue to plug 230 - * 231 - * Description: 232 - * Like @blk_plug_device(), but grabs the queue lock and disables 233 - * interrupts. 234 - **/ 235 - void blk_plug_device_unlocked(struct request_queue *q) 236 { 237 - unsigned long flags; 238 239 - spin_lock_irqsave(q->queue_lock, flags); 240 - blk_plug_device(q); 241 - spin_unlock_irqrestore(q->queue_lock, flags); 242 - } 243 - EXPORT_SYMBOL(blk_plug_device_unlocked); 244 - 245 - /* 246 - * remove the queue from the plugged list, if present. called with 247 - * queue lock held and interrupts disabled. 248 - */ 249 - int blk_remove_plug(struct request_queue *q) 250 - { 251 - WARN_ON(!irqs_disabled()); 252 - 253 - if (!queue_flag_test_and_clear(QUEUE_FLAG_PLUGGED, q)) 254 - return 0; 255 - 256 - del_timer(&q->unplug_timer); 257 - return 1; 258 - } 259 - EXPORT_SYMBOL(blk_remove_plug); 260 - 261 - /* 262 - * remove the plug and let it rip.. 
263 - */ 264 - void __generic_unplug_device(struct request_queue *q) 265 - { 266 - if (unlikely(blk_queue_stopped(q))) 267 - return; 268 - if (!blk_remove_plug(q) && !blk_queue_nonrot(q)) 269 - return; 270 - 271 - q->request_fn(q); 272 } 273 274 /** 275 - * generic_unplug_device - fire a request queue 276 - * @q: The &struct request_queue in question 277 * 278 * Description: 279 - * Linux uses plugging to build bigger requests queues before letting 280 - * the device have at them. If a queue is plugged, the I/O scheduler 281 - * is still adding and merging requests on the queue. Once the queue 282 - * gets unplugged, the request_fn defined for the queue is invoked and 283 - * transfers started. 284 - **/ 285 - void generic_unplug_device(struct request_queue *q) 286 { 287 - if (blk_queue_plugged(q)) { 288 - spin_lock_irq(q->queue_lock); 289 - __generic_unplug_device(q); 290 - spin_unlock_irq(q->queue_lock); 291 - } 292 } 293 - EXPORT_SYMBOL(generic_unplug_device); 294 - 295 - static void blk_backing_dev_unplug(struct backing_dev_info *bdi, 296 - struct page *page) 297 - { 298 - struct request_queue *q = bdi->unplug_io_data; 299 - 300 - blk_unplug(q); 301 - } 302 - 303 - void blk_unplug_work(struct work_struct *work) 304 - { 305 - struct request_queue *q = 306 - container_of(work, struct request_queue, unplug_work); 307 - 308 - trace_block_unplug_io(q); 309 - q->unplug_fn(q); 310 - } 311 - 312 - void blk_unplug_timeout(unsigned long data) 313 - { 314 - struct request_queue *q = (struct request_queue *)data; 315 - 316 - trace_block_unplug_timer(q); 317 - kblockd_schedule_work(q, &q->unplug_work); 318 - } 319 - 320 - void blk_unplug(struct request_queue *q) 321 - { 322 - /* 323 - * devices don't necessarily have an ->unplug_fn defined 324 - */ 325 - if (q->unplug_fn) { 326 - trace_block_unplug_io(q); 327 - q->unplug_fn(q); 328 - } 329 - } 330 - EXPORT_SYMBOL(blk_unplug); 331 332 /** 333 * blk_start_queue - restart a previously stopped queue ··· 271 **/ 272 void blk_stop_queue(struct request_queue *q) 273 { 274 - blk_remove_plug(q); 275 queue_flag_set(QUEUE_FLAG_STOPPED, q); 276 } 277 EXPORT_SYMBOL(blk_stop_queue); ··· 289 * that its ->make_request_fn will not re-add plugging prior to calling 290 * this function. 
291 * 292 */ 293 void blk_sync_queue(struct request_queue *q) 294 { 295 - del_timer_sync(&q->unplug_timer); 296 del_timer_sync(&q->timeout); 297 - cancel_work_sync(&q->unplug_work); 298 - throtl_shutdown_timer_wq(q); 299 } 300 EXPORT_SYMBOL(blk_sync_queue); 301 ··· 314 */ 315 void __blk_run_queue(struct request_queue *q, bool force_kblockd) 316 { 317 - blk_remove_plug(q); 318 - 319 if (unlikely(blk_queue_stopped(q))) 320 - return; 321 - 322 - if (elv_queue_empty(q)) 323 return; 324 325 /* ··· 324 if (!force_kblockd && !queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) { 325 q->request_fn(q); 326 queue_flag_clear(QUEUE_FLAG_REENTER, q); 327 - } else { 328 - queue_flag_set(QUEUE_FLAG_PLUGGED, q); 329 - kblockd_schedule_work(q, &q->unplug_work); 330 - } 331 } 332 EXPORT_SYMBOL(__blk_run_queue); 333 ··· 352 kobject_put(&q->kobj); 353 } 354 355 void blk_cleanup_queue(struct request_queue *q) 356 { 357 /* ··· 374 375 if (q->elevator) 376 elevator_exit(q->elevator); 377 378 blk_put_queue(q); 379 } ··· 419 if (!q) 420 return NULL; 421 422 - q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug; 423 - q->backing_dev_info.unplug_io_data = q; 424 q->backing_dev_info.ra_pages = 425 (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; 426 q->backing_dev_info.state = 0; ··· 438 439 setup_timer(&q->backing_dev_info.laptop_mode_wb_timer, 440 laptop_mode_timer_fn, (unsigned long) q); 441 - init_timer(&q->unplug_timer); 442 setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q); 443 INIT_LIST_HEAD(&q->timeout_list); 444 - INIT_LIST_HEAD(&q->pending_flushes); 445 - INIT_WORK(&q->unplug_work, blk_unplug_work); 446 447 kobject_init(&q->kobj, &blk_queue_ktype); 448 449 mutex_init(&q->sysfs_lock); 450 spin_lock_init(&q->__queue_lock); 451 452 return q; 453 } ··· 538 q->request_fn = rfn; 539 q->prep_rq_fn = NULL; 540 q->unprep_rq_fn = NULL; 541 - q->unplug_fn = generic_unplug_device; 542 q->queue_flags = QUEUE_FLAG_DEFAULT; 543 - q->queue_lock = lock; 544 545 /* 546 * This also sets hw/phys segments, boundary and size ··· 575 576 static inline void blk_free_request(struct request_queue *q, struct request *rq) 577 { 578 if (rq->cmd_flags & REQ_ELVPRIV) 579 elv_put_request(q, rq); 580 mempool_free(rq, q->rq.rq_pool); ··· 673 } 674 675 /* 676 * Get a free request, queue_lock must be held. 677 * Returns NULL on failure, with queue_lock held. 678 * Returns !NULL on success, with queue_lock *not held*. ··· 703 struct request_list *rl = &q->rq; 704 struct io_context *ioc = NULL; 705 const bool is_sync = rw_is_sync(rw_flags) != 0; 706 - int may_queue, priv; 707 708 may_queue = elv_may_queue(q, rw_flags); 709 if (may_queue == ELV_MQUEUE_NO) ··· 747 rl->count[is_sync]++; 748 rl->starved[is_sync] = 0; 749 750 - priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags); 751 - if (priv) 752 - rl->elvpriv++; 753 754 if (blk_queue_io_stat(q)) 755 rw_flags |= REQ_IO_STAT; ··· 798 } 799 800 /* 801 - * No available requests for this queue, unplug the device and wait for some 802 - * requests to become available. 803 * 804 * Called with q->queue_lock held, and returns with it unlocked. 
805 */ ··· 820 821 trace_block_sleeprq(q, bio, rw_flags & 1); 822 823 - __generic_unplug_device(q); 824 spin_unlock_irq(q->queue_lock); 825 io_schedule(); 826 ··· 941 } 942 EXPORT_SYMBOL(blk_requeue_request); 943 944 /** 945 * blk_insert_request - insert a special request into a request queue 946 * @q: request queue where request should be inserted ··· 990 if (blk_rq_tagged(rq)) 991 blk_queue_end_tag(q, rq); 992 993 - drive_stat_acct(rq, 1); 994 - __elv_add_request(q, rq, where, 0); 995 __blk_run_queue(q, false); 996 spin_unlock_irqrestore(q->queue_lock, flags); 997 } ··· 1111 } 1112 EXPORT_SYMBOL_GPL(blk_add_request_payload); 1113 1114 void init_request_from_bio(struct request *req, struct bio *bio) 1115 { 1116 req->cpu = bio->bi_comp_cpu; ··· 1233 blk_rq_bio_prep(req->q, req, bio); 1234 } 1235 1236 - /* 1237 - * Only disabling plugging for non-rotational devices if it does tagging 1238 - * as well, otherwise we do need the proper merging 1239 - */ 1240 - static inline bool queue_should_plug(struct request_queue *q) 1241 - { 1242 - return !(blk_queue_nonrot(q) && blk_queue_tagged(q)); 1243 - } 1244 - 1245 static int __make_request(struct request_queue *q, struct bio *bio) 1246 { 1247 - struct request *req; 1248 - int el_ret; 1249 - unsigned int bytes = bio->bi_size; 1250 - const unsigned short prio = bio_prio(bio); 1251 const bool sync = !!(bio->bi_rw & REQ_SYNC); 1252 - const bool unplug = !!(bio->bi_rw & REQ_UNPLUG); 1253 - const unsigned long ff = bio->bi_rw & REQ_FAILFAST_MASK; 1254 - int where = ELEVATOR_INSERT_SORT; 1255 - int rw_flags; 1256 1257 /* 1258 * low level driver can indicate that it wants pages above a ··· 1247 */ 1248 blk_queue_bounce(q, &bio); 1249 1250 - spin_lock_irq(q->queue_lock); 1251 - 1252 if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) { 1253 - where = ELEVATOR_INSERT_FRONT; 1254 goto get_rq; 1255 } 1256 1257 - if (elv_queue_empty(q)) 1258 - goto get_rq; 1259 1260 el_ret = elv_merge(q, &req, bio); 1261 - switch (el_ret) { 1262 - case ELEVATOR_BACK_MERGE: 1263 - BUG_ON(!rq_mergeable(req)); 1264 - 1265 - if (!ll_back_merge_fn(q, req, bio)) 1266 - break; 1267 - 1268 - trace_block_bio_backmerge(q, bio); 1269 - 1270 - if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) 1271 - blk_rq_set_mixed_merge(req); 1272 - 1273 - req->biotail->bi_next = bio; 1274 - req->biotail = bio; 1275 - req->__data_len += bytes; 1276 - req->ioprio = ioprio_best(req->ioprio, prio); 1277 - if (!blk_rq_cpu_valid(req)) 1278 - req->cpu = bio->bi_comp_cpu; 1279 - drive_stat_acct(req, 0); 1280 - elv_bio_merged(q, req, bio); 1281 - if (!attempt_back_merge(q, req)) 1282 - elv_merged_request(q, req, el_ret); 1283 - goto out; 1284 - 1285 - case ELEVATOR_FRONT_MERGE: 1286 - BUG_ON(!rq_mergeable(req)); 1287 - 1288 - if (!ll_front_merge_fn(q, req, bio)) 1289 - break; 1290 - 1291 - trace_block_bio_frontmerge(q, bio); 1292 - 1293 - if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) { 1294 - blk_rq_set_mixed_merge(req); 1295 - req->cmd_flags &= ~REQ_FAILFAST_MASK; 1296 - req->cmd_flags |= ff; 1297 } 1298 - 1299 - bio->bi_next = req->bio; 1300 - req->bio = bio; 1301 - 1302 - /* 1303 - * may not be valid. if the low level driver said 1304 - * it didn't need a bounce buffer then it better 1305 - * not touch req->buffer either... 
1306 - */ 1307 - req->buffer = bio_data(bio); 1308 - req->__sector = bio->bi_sector; 1309 - req->__data_len += bytes; 1310 - req->ioprio = ioprio_best(req->ioprio, prio); 1311 - if (!blk_rq_cpu_valid(req)) 1312 - req->cpu = bio->bi_comp_cpu; 1313 - drive_stat_acct(req, 0); 1314 - elv_bio_merged(q, req, bio); 1315 - if (!attempt_front_merge(q, req)) 1316 - elv_merged_request(q, req, el_ret); 1317 - goto out; 1318 - 1319 - /* ELV_NO_MERGE: elevator says don't/can't merge. */ 1320 - default: 1321 - ; 1322 } 1323 1324 get_rq: ··· 1303 */ 1304 init_request_from_bio(req, bio); 1305 1306 - spin_lock_irq(q->queue_lock); 1307 if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags) || 1308 - bio_flagged(bio, BIO_CPU_AFFINE)) 1309 - req->cpu = blk_cpu_to_group(smp_processor_id()); 1310 - if (queue_should_plug(q) && elv_queue_empty(q)) 1311 - blk_plug_device(q); 1312 1313 - /* insert the request into the elevator */ 1314 - drive_stat_acct(req, 1); 1315 - __elv_add_request(q, req, where, 0); 1316 out: 1317 - if (unplug || !queue_should_plug(q)) 1318 - __generic_unplug_device(q); 1319 - spin_unlock_irq(q->queue_lock); 1320 return 0; 1321 } 1322 ··· 1734 */ 1735 BUG_ON(blk_queued_rq(rq)); 1736 1737 - drive_stat_acct(rq, 1); 1738 - __elv_add_request(q, rq, ELEVATOR_INSERT_BACK, 0); 1739 - 1740 spin_unlock_irqrestore(q->queue_lock, flags); 1741 1742 return 0; ··· 1806 * normal IO on queueing nor completion. Accounting the 1807 * containing request is enough. 1808 */ 1809 - if (blk_do_io_stat(req) && req != &req->q->flush_rq) { 1810 unsigned long duration = jiffies - req->start_time; 1811 const int rw = rq_data_dir(req); 1812 struct hd_struct *part; ··· 2628 return queue_work(kblockd_workqueue, work); 2629 } 2630 EXPORT_SYMBOL(kblockd_schedule_work); 2631 2632 int __init blk_dev_init(void) 2633 {
··· 27 #include <linux/writeback.h> 28 #include <linux/task_io_accounting_ops.h> 29 #include <linux/fault-inject.h> 30 + #include <linux/list_sort.h> 31 32 #define CREATE_TRACE_POINTS 33 #include <trace/events/block.h> ··· 149 static void req_bio_endio(struct request *rq, struct bio *bio, 150 unsigned int nbytes, int error) 151 { 152 + if (error) 153 + clear_bit(BIO_UPTODATE, &bio->bi_flags); 154 + else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) 155 + error = -EIO; 156 157 + if (unlikely(nbytes > bio->bi_size)) { 158 + printk(KERN_ERR "%s: want %u bytes done, %u left\n", 159 + __func__, nbytes, bio->bi_size); 160 + nbytes = bio->bi_size; 161 } 162 + 163 + if (unlikely(rq->cmd_flags & REQ_QUIET)) 164 + set_bit(BIO_QUIET, &bio->bi_flags); 165 + 166 + bio->bi_size -= nbytes; 167 + bio->bi_sector += (nbytes >> 9); 168 + 169 + if (bio_integrity(bio)) 170 + bio_integrity_advance(bio, nbytes); 171 + 172 + /* don't actually finish bio if it's part of flush sequence */ 173 + if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ)) 174 + bio_endio(bio, error); 175 } 176 177 void blk_dump_rq_flags(struct request *rq, char *msg) ··· 208 EXPORT_SYMBOL(blk_dump_rq_flags); 209 210 /* 211 + * Make sure that plugs that were pending when this function was entered, 212 + * are now complete and requests pushed to the queue. 213 + */ 214 + static inline void queue_sync_plugs(struct request_queue *q) 215 { 216 /* 217 + * If the current process is plugged and has barriers submitted, 218 + * we will livelock if we don't unplug first. 219 */ 220 + blk_flush_plug(current); 221 } 222 223 + static void blk_delay_work(struct work_struct *work) 224 { 225 + struct request_queue *q; 226 227 + q = container_of(work, struct request_queue, delay_work.work); 228 + spin_lock_irq(q->queue_lock); 229 + __blk_run_queue(q, false); 230 + spin_unlock_irq(q->queue_lock); 231 } 232 233 /** 234 + * blk_delay_queue - restart queueing after defined interval 235 + * @q: The &struct request_queue in question 236 + * @msecs: Delay in msecs 237 * 238 * Description: 239 + * Sometimes queueing needs to be postponed for a little while, to allow 240 + * resources to come back. This function will make sure that queueing is 241 + * restarted around the specified time. 242 + */ 243 + void blk_delay_queue(struct request_queue *q, unsigned long msecs) 244 { 245 + schedule_delayed_work(&q->delay_work, msecs_to_jiffies(msecs)); 246 } 247 + EXPORT_SYMBOL(blk_delay_queue); 248 249 /** 250 * blk_start_queue - restart a previously stopped queue ··· 372 **/ 373 void blk_stop_queue(struct request_queue *q) 374 { 375 + cancel_delayed_work(&q->delay_work); 376 queue_flag_set(QUEUE_FLAG_STOPPED, q); 377 } 378 EXPORT_SYMBOL(blk_stop_queue); ··· 390 * that its ->make_request_fn will not re-add plugging prior to calling 391 * this function. 392 * 393 + * This function does not cancel any asynchronous activity arising 394 + * out of elevator or throttling code. That would require elevaotor_exit() 395 + * and blk_throtl_exit() to be called with queue lock initialized. 
396 + * 397 */ 398 void blk_sync_queue(struct request_queue *q) 399 { 400 del_timer_sync(&q->timeout); 401 + cancel_delayed_work_sync(&q->delay_work); 402 + queue_sync_plugs(q); 403 } 404 EXPORT_SYMBOL(blk_sync_queue); 405 ··· 412 */ 413 void __blk_run_queue(struct request_queue *q, bool force_kblockd) 414 { 415 if (unlikely(blk_queue_stopped(q))) 416 return; 417 418 /* ··· 427 if (!force_kblockd && !queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) { 428 q->request_fn(q); 429 queue_flag_clear(QUEUE_FLAG_REENTER, q); 430 + } else 431 + queue_delayed_work(kblockd_workqueue, &q->delay_work, 0); 432 } 433 EXPORT_SYMBOL(__blk_run_queue); 434 ··· 457 kobject_put(&q->kobj); 458 } 459 460 + /* 461 + * Note: If a driver supplied the queue lock, it should not zap that lock 462 + * unexpectedly as some queue cleanup components like elevator_exit() and 463 + * blk_throtl_exit() need queue lock. 464 + */ 465 void blk_cleanup_queue(struct request_queue *q) 466 { 467 /* ··· 474 475 if (q->elevator) 476 elevator_exit(q->elevator); 477 + 478 + blk_throtl_exit(q); 479 480 blk_put_queue(q); 481 } ··· 517 if (!q) 518 return NULL; 519 520 q->backing_dev_info.ra_pages = 521 (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; 522 q->backing_dev_info.state = 0; ··· 538 539 setup_timer(&q->backing_dev_info.laptop_mode_wb_timer, 540 laptop_mode_timer_fn, (unsigned long) q); 541 setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q); 542 INIT_LIST_HEAD(&q->timeout_list); 543 + INIT_LIST_HEAD(&q->flush_queue[0]); 544 + INIT_LIST_HEAD(&q->flush_queue[1]); 545 + INIT_LIST_HEAD(&q->flush_data_in_flight); 546 + INIT_DELAYED_WORK(&q->delay_work, blk_delay_work); 547 548 kobject_init(&q->kobj, &blk_queue_ktype); 549 550 mutex_init(&q->sysfs_lock); 551 spin_lock_init(&q->__queue_lock); 552 + 553 + /* 554 + * By default initialize queue_lock to internal lock and driver can 555 + * override it later if need be. 556 + */ 557 + q->queue_lock = &q->__queue_lock; 558 559 return q; 560 } ··· 631 q->request_fn = rfn; 632 q->prep_rq_fn = NULL; 633 q->unprep_rq_fn = NULL; 634 q->queue_flags = QUEUE_FLAG_DEFAULT; 635 + 636 + /* Override internal queue lock with supplied lock pointer */ 637 + if (lock) 638 + q->queue_lock = lock; 639 640 /* 641 * This also sets hw/phys segments, boundary and size ··· 666 667 static inline void blk_free_request(struct request_queue *q, struct request *rq) 668 { 669 + BUG_ON(rq->cmd_flags & REQ_ON_PLUG); 670 + 671 if (rq->cmd_flags & REQ_ELVPRIV) 672 elv_put_request(q, rq); 673 mempool_free(rq, q->rq.rq_pool); ··· 762 } 763 764 /* 765 + * Determine if elevator data should be initialized when allocating the 766 + * request associated with @bio. 767 + */ 768 + static bool blk_rq_should_init_elevator(struct bio *bio) 769 + { 770 + if (!bio) 771 + return true; 772 + 773 + /* 774 + * Flush requests do not use the elevator so skip initialization. 775 + * This allows a request to share the flush and elevator data. 776 + */ 777 + if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) 778 + return false; 779 + 780 + return true; 781 + } 782 + 783 + /* 784 * Get a free request, queue_lock must be held. 785 * Returns NULL on failure, with queue_lock held. 786 * Returns !NULL on success, with queue_lock *not held*. 
··· 773 struct request_list *rl = &q->rq; 774 struct io_context *ioc = NULL; 775 const bool is_sync = rw_is_sync(rw_flags) != 0; 776 + int may_queue, priv = 0; 777 778 may_queue = elv_may_queue(q, rw_flags); 779 if (may_queue == ELV_MQUEUE_NO) ··· 817 rl->count[is_sync]++; 818 rl->starved[is_sync] = 0; 819 820 + if (blk_rq_should_init_elevator(bio)) { 821 + priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags); 822 + if (priv) 823 + rl->elvpriv++; 824 + } 825 826 if (blk_queue_io_stat(q)) 827 rw_flags |= REQ_IO_STAT; ··· 866 } 867 868 /* 869 + * No available requests for this queue, wait for some requests to become 870 + * available. 871 * 872 * Called with q->queue_lock held, and returns with it unlocked. 873 */ ··· 888 889 trace_block_sleeprq(q, bio, rw_flags & 1); 890 891 spin_unlock_irq(q->queue_lock); 892 io_schedule(); 893 ··· 1010 } 1011 EXPORT_SYMBOL(blk_requeue_request); 1012 1013 + static void add_acct_request(struct request_queue *q, struct request *rq, 1014 + int where) 1015 + { 1016 + drive_stat_acct(rq, 1); 1017 + __elv_add_request(q, rq, where); 1018 + } 1019 + 1020 /** 1021 * blk_insert_request - insert a special request into a request queue 1022 * @q: request queue where request should be inserted ··· 1052 if (blk_rq_tagged(rq)) 1053 blk_queue_end_tag(q, rq); 1054 1055 + add_acct_request(q, rq, where); 1056 __blk_run_queue(q, false); 1057 spin_unlock_irqrestore(q->queue_lock, flags); 1058 } ··· 1174 } 1175 EXPORT_SYMBOL_GPL(blk_add_request_payload); 1176 1177 + static bool bio_attempt_back_merge(struct request_queue *q, struct request *req, 1178 + struct bio *bio) 1179 + { 1180 + const int ff = bio->bi_rw & REQ_FAILFAST_MASK; 1181 + 1182 + /* 1183 + * Debug stuff, kill later 1184 + */ 1185 + if (!rq_mergeable(req)) { 1186 + blk_dump_rq_flags(req, "back"); 1187 + return false; 1188 + } 1189 + 1190 + if (!ll_back_merge_fn(q, req, bio)) 1191 + return false; 1192 + 1193 + trace_block_bio_backmerge(q, bio); 1194 + 1195 + if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) 1196 + blk_rq_set_mixed_merge(req); 1197 + 1198 + req->biotail->bi_next = bio; 1199 + req->biotail = bio; 1200 + req->__data_len += bio->bi_size; 1201 + req->ioprio = ioprio_best(req->ioprio, bio_prio(bio)); 1202 + 1203 + drive_stat_acct(req, 0); 1204 + return true; 1205 + } 1206 + 1207 + static bool bio_attempt_front_merge(struct request_queue *q, 1208 + struct request *req, struct bio *bio) 1209 + { 1210 + const int ff = bio->bi_rw & REQ_FAILFAST_MASK; 1211 + sector_t sector; 1212 + 1213 + /* 1214 + * Debug stuff, kill later 1215 + */ 1216 + if (!rq_mergeable(req)) { 1217 + blk_dump_rq_flags(req, "front"); 1218 + return false; 1219 + } 1220 + 1221 + if (!ll_front_merge_fn(q, req, bio)) 1222 + return false; 1223 + 1224 + trace_block_bio_frontmerge(q, bio); 1225 + 1226 + if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) 1227 + blk_rq_set_mixed_merge(req); 1228 + 1229 + sector = bio->bi_sector; 1230 + 1231 + bio->bi_next = req->bio; 1232 + req->bio = bio; 1233 + 1234 + /* 1235 + * may not be valid. if the low level driver said 1236 + * it didn't need a bounce buffer then it better 1237 + * not touch req->buffer either... 1238 + */ 1239 + req->buffer = bio_data(bio); 1240 + req->__sector = bio->bi_sector; 1241 + req->__data_len += bio->bi_size; 1242 + req->ioprio = ioprio_best(req->ioprio, bio_prio(bio)); 1243 + 1244 + drive_stat_acct(req, 0); 1245 + return true; 1246 + } 1247 + 1248 + /* 1249 + * Attempts to merge with the plugged list in the current process. 
Returns 1250 + * true if merge was succesful, otherwise false. 1251 + */ 1252 + static bool attempt_plug_merge(struct task_struct *tsk, struct request_queue *q, 1253 + struct bio *bio) 1254 + { 1255 + struct blk_plug *plug; 1256 + struct request *rq; 1257 + bool ret = false; 1258 + 1259 + plug = tsk->plug; 1260 + if (!plug) 1261 + goto out; 1262 + 1263 + list_for_each_entry_reverse(rq, &plug->list, queuelist) { 1264 + int el_ret; 1265 + 1266 + if (rq->q != q) 1267 + continue; 1268 + 1269 + el_ret = elv_try_merge(rq, bio); 1270 + if (el_ret == ELEVATOR_BACK_MERGE) { 1271 + ret = bio_attempt_back_merge(q, rq, bio); 1272 + if (ret) 1273 + break; 1274 + } else if (el_ret == ELEVATOR_FRONT_MERGE) { 1275 + ret = bio_attempt_front_merge(q, rq, bio); 1276 + if (ret) 1277 + break; 1278 + } 1279 + } 1280 + out: 1281 + return ret; 1282 + } 1283 + 1284 void init_request_from_bio(struct request *req, struct bio *bio) 1285 { 1286 req->cpu = bio->bi_comp_cpu; ··· 1189 blk_rq_bio_prep(req->q, req, bio); 1190 } 1191 1192 static int __make_request(struct request_queue *q, struct bio *bio) 1193 { 1194 const bool sync = !!(bio->bi_rw & REQ_SYNC); 1195 + struct blk_plug *plug; 1196 + int el_ret, rw_flags, where = ELEVATOR_INSERT_SORT; 1197 + struct request *req; 1198 1199 /* 1200 * low level driver can indicate that it wants pages above a ··· 1217 */ 1218 blk_queue_bounce(q, &bio); 1219 1220 if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) { 1221 + spin_lock_irq(q->queue_lock); 1222 + where = ELEVATOR_INSERT_FLUSH; 1223 goto get_rq; 1224 } 1225 1226 + /* 1227 + * Check if we can merge with the plugged list before grabbing 1228 + * any locks. 1229 + */ 1230 + if (attempt_plug_merge(current, q, bio)) 1231 + goto out; 1232 + 1233 + spin_lock_irq(q->queue_lock); 1234 1235 el_ret = elv_merge(q, &req, bio); 1236 + if (el_ret == ELEVATOR_BACK_MERGE) { 1237 + BUG_ON(req->cmd_flags & REQ_ON_PLUG); 1238 + if (bio_attempt_back_merge(q, req, bio)) { 1239 + if (!attempt_back_merge(q, req)) 1240 + elv_merged_request(q, req, el_ret); 1241 + goto out_unlock; 1242 } 1243 + } else if (el_ret == ELEVATOR_FRONT_MERGE) { 1244 + BUG_ON(req->cmd_flags & REQ_ON_PLUG); 1245 + if (bio_attempt_front_merge(q, req, bio)) { 1246 + if (!attempt_front_merge(q, req)) 1247 + elv_merged_request(q, req, el_ret); 1248 + goto out_unlock; 1249 + } 1250 } 1251 1252 get_rq: ··· 1315 */ 1316 init_request_from_bio(req, bio); 1317 1318 if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags) || 1319 + bio_flagged(bio, BIO_CPU_AFFINE)) { 1320 + req->cpu = blk_cpu_to_group(get_cpu()); 1321 + put_cpu(); 1322 + } 1323 1324 + plug = current->plug; 1325 + if (plug) { 1326 + if (!plug->should_sort && !list_empty(&plug->list)) { 1327 + struct request *__rq; 1328 + 1329 + __rq = list_entry_rq(plug->list.prev); 1330 + if (__rq->q != q) 1331 + plug->should_sort = 1; 1332 + } 1333 + /* 1334 + * Debug flag, kill later 1335 + */ 1336 + req->cmd_flags |= REQ_ON_PLUG; 1337 + list_add_tail(&req->queuelist, &plug->list); 1338 + drive_stat_acct(req, 1); 1339 + } else { 1340 + spin_lock_irq(q->queue_lock); 1341 + add_acct_request(q, req, where); 1342 + __blk_run_queue(q, false); 1343 + out_unlock: 1344 + spin_unlock_irq(q->queue_lock); 1345 + } 1346 out: 1347 return 0; 1348 } 1349 ··· 1731 */ 1732 BUG_ON(blk_queued_rq(rq)); 1733 1734 + add_acct_request(q, rq, ELEVATOR_INSERT_BACK); 1735 spin_unlock_irqrestore(q->queue_lock, flags); 1736 1737 return 0; ··· 1805 * normal IO on queueing nor completion. Accounting the 1806 * containing request is enough. 
1807 */ 1808 + if (blk_do_io_stat(req) && !(req->cmd_flags & REQ_FLUSH_SEQ)) { 1809 unsigned long duration = jiffies - req->start_time; 1810 const int rw = rq_data_dir(req); 1811 struct hd_struct *part; ··· 2627 return queue_work(kblockd_workqueue, work); 2628 } 2629 EXPORT_SYMBOL(kblockd_schedule_work); 2630 + 2631 + int kblockd_schedule_delayed_work(struct request_queue *q, 2632 + struct delayed_work *dwork, unsigned long delay) 2633 + { 2634 + return queue_delayed_work(kblockd_workqueue, dwork, delay); 2635 + } 2636 + EXPORT_SYMBOL(kblockd_schedule_delayed_work); 2637 + 2638 + #define PLUG_MAGIC 0x91827364 2639 + 2640 + void blk_start_plug(struct blk_plug *plug) 2641 + { 2642 + struct task_struct *tsk = current; 2643 + 2644 + plug->magic = PLUG_MAGIC; 2645 + INIT_LIST_HEAD(&plug->list); 2646 + plug->should_sort = 0; 2647 + 2648 + /* 2649 + * If this is a nested plug, don't actually assign it. It will be 2650 + * flushed on its own. 2651 + */ 2652 + if (!tsk->plug) { 2653 + /* 2654 + * Store ordering should not be needed here, since a potential 2655 + * preempt will imply a full memory barrier 2656 + */ 2657 + tsk->plug = plug; 2658 + } 2659 + } 2660 + EXPORT_SYMBOL(blk_start_plug); 2661 + 2662 + static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b) 2663 + { 2664 + struct request *rqa = container_of(a, struct request, queuelist); 2665 + struct request *rqb = container_of(b, struct request, queuelist); 2666 + 2667 + return !(rqa->q == rqb->q); 2668 + } 2669 + 2670 + static void flush_plug_list(struct blk_plug *plug) 2671 + { 2672 + struct request_queue *q; 2673 + unsigned long flags; 2674 + struct request *rq; 2675 + 2676 + BUG_ON(plug->magic != PLUG_MAGIC); 2677 + 2678 + if (list_empty(&plug->list)) 2679 + return; 2680 + 2681 + if (plug->should_sort) 2682 + list_sort(NULL, &plug->list, plug_rq_cmp); 2683 + 2684 + q = NULL; 2685 + local_irq_save(flags); 2686 + while (!list_empty(&plug->list)) { 2687 + rq = list_entry_rq(plug->list.next); 2688 + list_del_init(&rq->queuelist); 2689 + BUG_ON(!(rq->cmd_flags & REQ_ON_PLUG)); 2690 + BUG_ON(!rq->q); 2691 + if (rq->q != q) { 2692 + if (q) { 2693 + __blk_run_queue(q, false); 2694 + spin_unlock(q->queue_lock); 2695 + } 2696 + q = rq->q; 2697 + spin_lock(q->queue_lock); 2698 + } 2699 + rq->cmd_flags &= ~REQ_ON_PLUG; 2700 + 2701 + /* 2702 + * rq is already accounted, so use raw insert 2703 + */ 2704 + __elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE); 2705 + } 2706 + 2707 + if (q) { 2708 + __blk_run_queue(q, false); 2709 + spin_unlock(q->queue_lock); 2710 + } 2711 + 2712 + BUG_ON(!list_empty(&plug->list)); 2713 + local_irq_restore(flags); 2714 + } 2715 + 2716 + static void __blk_finish_plug(struct task_struct *tsk, struct blk_plug *plug) 2717 + { 2718 + flush_plug_list(plug); 2719 + 2720 + if (plug == tsk->plug) 2721 + tsk->plug = NULL; 2722 + } 2723 + 2724 + void blk_finish_plug(struct blk_plug *plug) 2725 + { 2726 + if (plug) 2727 + __blk_finish_plug(current, plug); 2728 + } 2729 + EXPORT_SYMBOL(blk_finish_plug); 2730 + 2731 + void __blk_flush_plug(struct task_struct *tsk, struct blk_plug *plug) 2732 + { 2733 + __blk_finish_plug(tsk, plug); 2734 + tsk->plug = plug; 2735 + } 2736 + EXPORT_SYMBOL(__blk_flush_plug); 2737 2738 int __init blk_dev_init(void) 2739 {
+2 -2
block/blk-exec.c
··· 54 rq->end_io = done; 55 WARN_ON(irqs_disabled()); 56 spin_lock_irq(q->queue_lock); 57 - __elv_add_request(q, rq, where, 1); 58 - __generic_unplug_device(q); 59 /* the queue is stopped so it won't be plugged+unplugged */ 60 if (rq->cmd_type == REQ_TYPE_PM_RESUME) 61 q->request_fn(q);
··· 54 rq->end_io = done; 55 WARN_ON(irqs_disabled()); 56 spin_lock_irq(q->queue_lock); 57 + __elv_add_request(q, rq, where); 58 + __blk_run_queue(q, false); 59 /* the queue is stopped so it won't be plugged+unplugged */ 60 if (rq->cmd_type == REQ_TYPE_PM_RESUME) 61 q->request_fn(q);
+319 -146
block/blk-flush.c
··· 1 /* 2 * Functions to sequence FLUSH and FUA writes. 3 */ 4 #include <linux/kernel.h> 5 #include <linux/module.h> 6 #include <linux/bio.h> ··· 74 75 /* FLUSH/FUA sequences */ 76 enum { 77 - QUEUE_FSEQ_STARTED = (1 << 0), /* flushing in progress */ 78 - QUEUE_FSEQ_PREFLUSH = (1 << 1), /* pre-flushing in progress */ 79 - QUEUE_FSEQ_DATA = (1 << 2), /* data write in progress */ 80 - QUEUE_FSEQ_POSTFLUSH = (1 << 3), /* post-flushing in progress */ 81 - QUEUE_FSEQ_DONE = (1 << 4), 82 }; 83 84 - static struct request *queue_next_fseq(struct request_queue *q); 85 86 - unsigned blk_flush_cur_seq(struct request_queue *q) 87 { 88 - if (!q->flush_seq) 89 - return 0; 90 - return 1 << ffz(q->flush_seq); 91 - } 92 93 - static struct request *blk_flush_complete_seq(struct request_queue *q, 94 - unsigned seq, int error) 95 - { 96 - struct request *next_rq = NULL; 97 - 98 - if (error && !q->flush_err) 99 - q->flush_err = error; 100 - 101 - BUG_ON(q->flush_seq & seq); 102 - q->flush_seq |= seq; 103 - 104 - if (blk_flush_cur_seq(q) != QUEUE_FSEQ_DONE) { 105 - /* not complete yet, queue the next flush sequence */ 106 - next_rq = queue_next_fseq(q); 107 - } else { 108 - /* complete this flush request */ 109 - __blk_end_request_all(q->orig_flush_rq, q->flush_err); 110 - q->orig_flush_rq = NULL; 111 - q->flush_seq = 0; 112 - 113 - /* dispatch the next flush if there's one */ 114 - if (!list_empty(&q->pending_flushes)) { 115 - next_rq = list_entry_rq(q->pending_flushes.next); 116 - list_move(&next_rq->queuelist, &q->queue_head); 117 - } 118 } 119 - return next_rq; 120 } 121 122 - static void blk_flush_complete_seq_end_io(struct request_queue *q, 123 - unsigned seq, int error) 124 { 125 - bool was_empty = elv_queue_empty(q); 126 - struct request *next_rq; 127 128 - next_rq = blk_flush_complete_seq(q, seq, error); 129 130 /* 131 * Moving a request silently to empty queue_head may stall the ··· 217 * from request completion path and calling directly into 218 * request_fn may confuse the driver. Always use kblockd. 219 */ 220 - if (was_empty && next_rq) 221 __blk_run_queue(q, true); 222 } 223 224 - static void pre_flush_end_io(struct request *rq, int error) 225 { 226 - elv_completed_request(rq->q, rq); 227 - blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_PREFLUSH, error); 228 } 229 230 static void flush_data_end_io(struct request *rq, int error) 231 { 232 - elv_completed_request(rq->q, rq); 233 - blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_DATA, error); 234 - } 235 - 236 - static void post_flush_end_io(struct request *rq, int error) 237 - { 238 - elv_completed_request(rq->q, rq); 239 - blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_POSTFLUSH, error); 240 - } 241 - 242 - static void init_flush_request(struct request *rq, struct gendisk *disk) 243 - { 244 - rq->cmd_type = REQ_TYPE_FS; 245 - rq->cmd_flags = WRITE_FLUSH; 246 - rq->rq_disk = disk; 247 - } 248 - 249 - static struct request *queue_next_fseq(struct request_queue *q) 250 - { 251 - struct request *orig_rq = q->orig_flush_rq; 252 - struct request *rq = &q->flush_rq; 253 - 254 - blk_rq_init(q, rq); 255 - 256 - switch (blk_flush_cur_seq(q)) { 257 - case QUEUE_FSEQ_PREFLUSH: 258 - init_flush_request(rq, orig_rq->rq_disk); 259 - rq->end_io = pre_flush_end_io; 260 - break; 261 - case QUEUE_FSEQ_DATA: 262 - init_request_from_bio(rq, orig_rq->bio); 263 - /* 264 - * orig_rq->rq_disk may be different from 265 - * bio->bi_bdev->bd_disk if orig_rq got here through 266 - * remapping drivers. Make sure rq->rq_disk points 267 - * to the same one as orig_rq. 
268 - */ 269 - rq->rq_disk = orig_rq->rq_disk; 270 - rq->cmd_flags &= ~(REQ_FLUSH | REQ_FUA); 271 - rq->cmd_flags |= orig_rq->cmd_flags & (REQ_FLUSH | REQ_FUA); 272 - rq->end_io = flush_data_end_io; 273 - break; 274 - case QUEUE_FSEQ_POSTFLUSH: 275 - init_flush_request(rq, orig_rq->rq_disk); 276 - rq->end_io = post_flush_end_io; 277 - break; 278 - default: 279 - BUG(); 280 - } 281 - 282 - elv_insert(q, rq, ELEVATOR_INSERT_REQUEUE); 283 - return rq; 284 - } 285 - 286 - struct request *blk_do_flush(struct request_queue *q, struct request *rq) 287 - { 288 - unsigned int fflags = q->flush_flags; /* may change, cache it */ 289 - bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA; 290 - bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH); 291 - bool do_postflush = has_flush && !has_fua && (rq->cmd_flags & REQ_FUA); 292 - unsigned skip = 0; 293 294 /* 295 - * Special case. If there's data but flush is not necessary, 296 - * the request can be issued directly. 297 - * 298 - * Flush w/o data should be able to be issued directly too but 299 - * currently some drivers assume that rq->bio contains 300 - * non-zero data if it isn't NULL and empty FLUSH requests 301 - * getting here usually have bio's without data. 302 */ 303 - if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) { 304 - rq->cmd_flags &= ~REQ_FLUSH; 305 - if (!has_fua) 306 - rq->cmd_flags &= ~REQ_FUA; 307 - return rq; 308 - } 309 310 /* 311 - * Sequenced flushes can't be processed in parallel. If 312 - * another one is already in progress, queue for later 313 - * processing. 314 */ 315 - if (q->flush_seq) { 316 - list_move_tail(&rq->queuelist, &q->pending_flushes); 317 - return NULL; 318 - } 319 - 320 - /* 321 - * Start a new flush sequence 322 - */ 323 - q->flush_err = 0; 324 - q->flush_seq |= QUEUE_FSEQ_STARTED; 325 - 326 - /* adjust FLUSH/FUA of the original request and stash it away */ 327 rq->cmd_flags &= ~REQ_FLUSH; 328 - if (!has_fua) 329 rq->cmd_flags &= ~REQ_FUA; 330 - blk_dequeue_request(rq); 331 - q->orig_flush_rq = rq; 332 333 - /* skip unneded sequences and return the first one */ 334 - if (!do_preflush) 335 - skip |= QUEUE_FSEQ_PREFLUSH; 336 - if (!blk_rq_sectors(rq)) 337 - skip |= QUEUE_FSEQ_DATA; 338 - if (!do_postflush) 339 - skip |= QUEUE_FSEQ_POSTFLUSH; 340 - return blk_flush_complete_seq(q, skip, 0); 341 } 342 343 static void bio_end_flush(struct bio *bio, int err)
··· 1 /* 2 * Functions to sequence FLUSH and FUA writes. 3 + * 4 + * Copyright (C) 2011 Max Planck Institute for Gravitational Physics 5 + * Copyright (C) 2011 Tejun Heo <tj@kernel.org> 6 + * 7 + * This file is released under the GPLv2. 8 + * 9 + * REQ_{FLUSH|FUA} requests are decomposed to sequences consisted of three 10 + * optional steps - PREFLUSH, DATA and POSTFLUSH - according to the request 11 + * properties and hardware capability. 12 + * 13 + * If a request doesn't have data, only REQ_FLUSH makes sense, which 14 + * indicates a simple flush request. If there is data, REQ_FLUSH indicates 15 + * that the device cache should be flushed before the data is executed, and 16 + * REQ_FUA means that the data must be on non-volatile media on request 17 + * completion. 18 + * 19 + * If the device doesn't have writeback cache, FLUSH and FUA don't make any 20 + * difference. The requests are either completed immediately if there's no 21 + * data or executed as normal requests otherwise. 22 + * 23 + * If the device has writeback cache and supports FUA, REQ_FLUSH is 24 + * translated to PREFLUSH but REQ_FUA is passed down directly with DATA. 25 + * 26 + * If the device has writeback cache and doesn't support FUA, REQ_FLUSH is 27 + * translated to PREFLUSH and REQ_FUA to POSTFLUSH. 28 + * 29 + * The actual execution of flush is double buffered. Whenever a request 30 + * needs to execute PRE or POSTFLUSH, it queues at 31 + * q->flush_queue[q->flush_pending_idx]. Once certain criteria are met, a 32 + * flush is issued and the pending_idx is toggled. When the flush 33 + * completes, all the requests which were pending are proceeded to the next 34 + * step. This allows arbitrary merging of different types of FLUSH/FUA 35 + * requests. 36 + * 37 + * Currently, the following conditions are used to determine when to issue 38 + * flush. 39 + * 40 + * C1. At any given time, only one flush shall be in progress. This makes 41 + * double buffering sufficient. 42 + * 43 + * C2. Flush is deferred if any request is executing DATA of its sequence. 44 + * This avoids issuing separate POSTFLUSHes for requests which shared 45 + * PREFLUSH. 46 + * 47 + * C3. The second condition is ignored if there is a request which has 48 + * waited longer than FLUSH_PENDING_TIMEOUT. This is to avoid 49 + * starvation in the unlikely case where there are continuous stream of 50 + * FUA (without FLUSH) requests. 51 + * 52 + * For devices which support FUA, it isn't clear whether C2 (and thus C3) 53 + * is beneficial. 54 + * 55 + * Note that a sequenced FLUSH/FUA request with DATA is completed twice. 56 + * Once while executing DATA and again after the whole sequence is 57 + * complete. The first completion updates the contained bio but doesn't 58 + * finish it so that the bio submitter is notified only after the whole 59 + * sequence is complete. This is implemented by testing REQ_FLUSH_SEQ in 60 + * req_bio_endio(). 61 + * 62 + * The above peculiarity requires that each FLUSH/FUA request has only one 63 + * bio attached to it, which is guaranteed as they aren't allowed to be 64 + * merged in the usual way. 
65 */ 66 + 67 #include <linux/kernel.h> 68 #include <linux/module.h> 69 #include <linux/bio.h> ··· 11 12 /* FLUSH/FUA sequences */ 13 enum { 14 + REQ_FSEQ_PREFLUSH = (1 << 0), /* pre-flushing in progress */ 15 + REQ_FSEQ_DATA = (1 << 1), /* data write in progress */ 16 + REQ_FSEQ_POSTFLUSH = (1 << 2), /* post-flushing in progress */ 17 + REQ_FSEQ_DONE = (1 << 3), 18 + 19 + REQ_FSEQ_ACTIONS = REQ_FSEQ_PREFLUSH | REQ_FSEQ_DATA | 20 + REQ_FSEQ_POSTFLUSH, 21 + 22 + /* 23 + * If flush has been pending longer than the following timeout, 24 + * it's issued even if flush_data requests are still in flight. 25 + */ 26 + FLUSH_PENDING_TIMEOUT = 5 * HZ, 27 }; 28 29 + static bool blk_kick_flush(struct request_queue *q); 30 31 + static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq) 32 { 33 + unsigned int policy = 0; 34 35 + if (fflags & REQ_FLUSH) { 36 + if (rq->cmd_flags & REQ_FLUSH) 37 + policy |= REQ_FSEQ_PREFLUSH; 38 + if (blk_rq_sectors(rq)) 39 + policy |= REQ_FSEQ_DATA; 40 + if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA)) 41 + policy |= REQ_FSEQ_POSTFLUSH; 42 } 43 + return policy; 44 } 45 46 + static unsigned int blk_flush_cur_seq(struct request *rq) 47 { 48 + return 1 << ffz(rq->flush.seq); 49 + } 50 51 + static void blk_flush_restore_request(struct request *rq) 52 + { 53 + /* 54 + * After flush data completion, @rq->bio is %NULL but we need to 55 + * complete the bio again. @rq->biotail is guaranteed to equal the 56 + * original @rq->bio. Restore it. 57 + */ 58 + rq->bio = rq->biotail; 59 + 60 + /* make @rq a normal request */ 61 + rq->cmd_flags &= ~REQ_FLUSH_SEQ; 62 + rq->end_io = NULL; 63 + } 64 + 65 + /** 66 + * blk_flush_complete_seq - complete flush sequence 67 + * @rq: FLUSH/FUA request being sequenced 68 + * @seq: sequences to complete (mask of %REQ_FSEQ_*, can be zero) 69 + * @error: whether an error occurred 70 + * 71 + * @rq just completed @seq part of its flush sequence, record the 72 + * completion and trigger the next step. 73 + * 74 + * CONTEXT: 75 + * spin_lock_irq(q->queue_lock) 76 + * 77 + * RETURNS: 78 + * %true if requests were added to the dispatch queue, %false otherwise. 79 + */ 80 + static bool blk_flush_complete_seq(struct request *rq, unsigned int seq, 81 + int error) 82 + { 83 + struct request_queue *q = rq->q; 84 + struct list_head *pending = &q->flush_queue[q->flush_pending_idx]; 85 + bool queued = false; 86 + 87 + BUG_ON(rq->flush.seq & seq); 88 + rq->flush.seq |= seq; 89 + 90 + if (likely(!error)) 91 + seq = blk_flush_cur_seq(rq); 92 + else 93 + seq = REQ_FSEQ_DONE; 94 + 95 + switch (seq) { 96 + case REQ_FSEQ_PREFLUSH: 97 + case REQ_FSEQ_POSTFLUSH: 98 + /* queue for flush */ 99 + if (list_empty(pending)) 100 + q->flush_pending_since = jiffies; 101 + list_move_tail(&rq->flush.list, pending); 102 + break; 103 + 104 + case REQ_FSEQ_DATA: 105 + list_move_tail(&rq->flush.list, &q->flush_data_in_flight); 106 + list_add(&rq->queuelist, &q->queue_head); 107 + queued = true; 108 + break; 109 + 110 + case REQ_FSEQ_DONE: 111 + /* 112 + * @rq was previously adjusted by blk_flush_issue() for 113 + * flush sequencing and may already have gone through the 114 + * flush data request completion path. Restore @rq for 115 + * normal completion and end it. 
116 + */ 117 + BUG_ON(!list_empty(&rq->queuelist)); 118 + list_del_init(&rq->flush.list); 119 + blk_flush_restore_request(rq); 120 + __blk_end_request_all(rq, error); 121 + break; 122 + 123 + default: 124 + BUG(); 125 + } 126 + 127 + return blk_kick_flush(q) | queued; 128 + } 129 + 130 + static void flush_end_io(struct request *flush_rq, int error) 131 + { 132 + struct request_queue *q = flush_rq->q; 133 + struct list_head *running = &q->flush_queue[q->flush_running_idx]; 134 + bool queued = false; 135 + struct request *rq, *n; 136 + 137 + BUG_ON(q->flush_pending_idx == q->flush_running_idx); 138 + 139 + /* account completion of the flush request */ 140 + q->flush_running_idx ^= 1; 141 + elv_completed_request(q, flush_rq); 142 + 143 + /* and push the waiting requests to the next stage */ 144 + list_for_each_entry_safe(rq, n, running, flush.list) { 145 + unsigned int seq = blk_flush_cur_seq(rq); 146 + 147 + BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH); 148 + queued |= blk_flush_complete_seq(rq, seq, error); 149 + } 150 151 /* 152 * Moving a request silently to empty queue_head may stall the ··· 70 * from request completion path and calling directly into 71 * request_fn may confuse the driver. Always use kblockd. 72 */ 73 + if (queued) 74 __blk_run_queue(q, true); 75 } 76 77 + /** 78 + * blk_kick_flush - consider issuing flush request 79 + * @q: request_queue being kicked 80 + * 81 + * Flush related states of @q have changed, consider issuing flush request. 82 + * Please read the comment at the top of this file for more info. 83 + * 84 + * CONTEXT: 85 + * spin_lock_irq(q->queue_lock) 86 + * 87 + * RETURNS: 88 + * %true if flush was issued, %false otherwise. 89 + */ 90 + static bool blk_kick_flush(struct request_queue *q) 91 { 92 + struct list_head *pending = &q->flush_queue[q->flush_pending_idx]; 93 + struct request *first_rq = 94 + list_first_entry(pending, struct request, flush.list); 95 + 96 + /* C1 described at the top of this file */ 97 + if (q->flush_pending_idx != q->flush_running_idx || list_empty(pending)) 98 + return false; 99 + 100 + /* C2 and C3 */ 101 + if (!list_empty(&q->flush_data_in_flight) && 102 + time_before(jiffies, 103 + q->flush_pending_since + FLUSH_PENDING_TIMEOUT)) 104 + return false; 105 + 106 + /* 107 + * Issue flush and toggle pending_idx. This makes pending_idx 108 + * different from running_idx, which means flush is in flight. 109 + */ 110 + blk_rq_init(q, &q->flush_rq); 111 + q->flush_rq.cmd_type = REQ_TYPE_FS; 112 + q->flush_rq.cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ; 113 + q->flush_rq.rq_disk = first_rq->rq_disk; 114 + q->flush_rq.end_io = flush_end_io; 115 + 116 + q->flush_pending_idx ^= 1; 117 + elv_insert(q, &q->flush_rq, ELEVATOR_INSERT_REQUEUE); 118 + return true; 119 } 120 121 static void flush_data_end_io(struct request *rq, int error) 122 { 123 + struct request_queue *q = rq->q; 124 125 /* 126 + * After populating an empty queue, kick it to avoid stall. Read 127 + * the comment in flush_end_io(). 128 */ 129 + if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error)) 130 + __blk_run_queue(q, true); 131 + } 132 + 133 + /** 134 + * blk_insert_flush - insert a new FLUSH/FUA request 135 + * @rq: request to insert 136 + * 137 + * To be called from elv_insert() for %ELEVATOR_INSERT_FLUSH insertions. 138 + * @rq is being submitted. Analyze what needs to be done and put it on the 139 + * right queue. 
140 + * 141 + * CONTEXT: 142 + * spin_lock_irq(q->queue_lock) 143 + */ 144 + void blk_insert_flush(struct request *rq) 145 + { 146 + struct request_queue *q = rq->q; 147 + unsigned int fflags = q->flush_flags; /* may change, cache */ 148 + unsigned int policy = blk_flush_policy(fflags, rq); 149 + 150 + BUG_ON(rq->end_io); 151 + BUG_ON(!rq->bio || rq->bio != rq->biotail); 152 153 /* 154 + * @policy now records what operations need to be done. Adjust 155 + * REQ_FLUSH and FUA for the driver. 156 */ 157 rq->cmd_flags &= ~REQ_FLUSH; 158 + if (!(fflags & REQ_FUA)) 159 rq->cmd_flags &= ~REQ_FUA; 160 161 + /* 162 + * If there's data but flush is not necessary, the request can be 163 + * processed directly without going through flush machinery. Queue 164 + * for normal execution. 165 + */ 166 + if ((policy & REQ_FSEQ_DATA) && 167 + !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) { 168 + list_add(&rq->queuelist, &q->queue_head); 169 + return; 170 + } 171 + 172 + /* 173 + * @rq should go through flush machinery. Mark it part of flush 174 + * sequence and submit for further processing. 175 + */ 176 + memset(&rq->flush, 0, sizeof(rq->flush)); 177 + INIT_LIST_HEAD(&rq->flush.list); 178 + rq->cmd_flags |= REQ_FLUSH_SEQ; 179 + rq->end_io = flush_data_end_io; 180 + 181 + blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0); 182 + } 183 + 184 + /** 185 + * blk_abort_flushes - @q is being aborted, abort flush requests 186 + * @q: request_queue being aborted 187 + * 188 + * To be called from elv_abort_queue(). @q is being aborted. Prepare all 189 + * FLUSH/FUA requests for abortion. 190 + * 191 + * CONTEXT: 192 + * spin_lock_irq(q->queue_lock) 193 + */ 194 + void blk_abort_flushes(struct request_queue *q) 195 + { 196 + struct request *rq, *n; 197 + int i; 198 + 199 + /* 200 + * Requests in flight for data are already owned by the dispatch 201 + * queue or the device driver. Just restore for normal completion. 202 + */ 203 + list_for_each_entry_safe(rq, n, &q->flush_data_in_flight, flush.list) { 204 + list_del_init(&rq->flush.list); 205 + blk_flush_restore_request(rq); 206 + } 207 + 208 + /* 209 + * We need to give away requests on flush queues. Restore for 210 + * normal completion and put them on the dispatch queue. 211 + */ 212 + for (i = 0; i < ARRAY_SIZE(q->flush_queue); i++) { 213 + list_for_each_entry_safe(rq, n, &q->flush_queue[i], 214 + flush.list) { 215 + list_del_init(&rq->flush.list); 216 + blk_flush_restore_request(rq); 217 + list_add_tail(&rq->queuelist, &q->queue_head); 218 + } 219 + } 220 } 221 222 static void bio_end_flush(struct bio *bio, int err)
-2
block/blk-lib.c
··· 136 * 137 * Description: 138 * Generate and issue number of bios with zerofiled pages. 139 - * Send barrier at the beginning and at the end if requested. This guarantie 140 - * correct request ordering. Empty barrier allow us to avoid post queue flush. 141 */ 142 143 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
··· 136 * 137 * Description: 138 * Generate and issue a number of bios with zero-filled pages. 139 */ 140 141 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
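The trimmed description leaves the semantics as a plain synchronous zero-fill of the given range. A hedged usage sketch, assuming the parameters beyond those visible in the hunk are the usual sector count and gfp mask (bdev and start are placeholders):

        /*
         * Zero 2048 sectors (1 MiB with 512-byte sectors) starting at
         * 'start'; returns 0 on success or a negative errno.
         */
        int err = blkdev_issue_zeroout(bdev, start, 2048, GFP_KERNEL);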
+6
block/blk-merge.c
··· 465 466 return 0; 467 }
··· 465 466 return 0; 467 } 468 + 469 + int blk_attempt_req_merge(struct request_queue *q, struct request *rq, 470 + struct request *next) 471 + { 472 + return attempt_merge(q, rq, next); 473 + }
-15
block/blk-settings.c
··· 164 blk_queue_congestion_threshold(q); 165 q->nr_batching = BLK_BATCH_REQ; 166 167 - q->unplug_thresh = 4; /* hmm */ 168 - q->unplug_delay = msecs_to_jiffies(3); /* 3 milliseconds */ 169 - if (q->unplug_delay == 0) 170 - q->unplug_delay = 1; 171 - 172 - q->unplug_timer.function = blk_unplug_timeout; 173 - q->unplug_timer.data = (unsigned long)q; 174 - 175 blk_set_default_limits(&q->limits); 176 blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); 177 - 178 - /* 179 - * If the caller didn't supply a lock, fall back to our embedded 180 - * per-queue locks 181 - */ 182 - if (!q->queue_lock) 183 - q->queue_lock = &q->__queue_lock; 184 185 /* 186 * by default assume old behaviour and bounce for any highmem page
··· 164 blk_queue_congestion_threshold(q); 165 q->nr_batching = BLK_BATCH_REQ; 166 167 blk_set_default_limits(&q->limits); 168 blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); 169 170 /* 171 * by default assume old behaviour and bounce for any highmem page
-2
block/blk-sysfs.c
··· 471 472 blk_sync_queue(q); 473 474 - blk_throtl_exit(q); 475 - 476 if (rl->rq_pool) 477 mempool_destroy(rl->rq_pool); 478
··· 471 472 blk_sync_queue(q); 473 474 if (rl->rq_pool) 475 mempool_destroy(rl->rq_pool); 476
+72 -69
block/blk-throttle.c
··· 102 /* Work for dispatching throttled bios */ 103 struct delayed_work throtl_work; 104 105 - atomic_t limits_changed; 106 }; 107 108 enum tg_state_flags { ··· 201 RB_CLEAR_NODE(&tg->rb_node); 202 bio_list_init(&tg->bio_lists[0]); 203 bio_list_init(&tg->bio_lists[1]); 204 205 /* 206 * Take the initial reference that will be released on destroy ··· 738 struct throtl_grp *tg; 739 struct hlist_node *pos, *n; 740 741 - if (!atomic_read(&td->limits_changed)) 742 return; 743 744 - throtl_log(td, "limit changed =%d", atomic_read(&td->limits_changed)); 745 746 - /* 747 - * Make sure updates from throtl_update_blkio_group_read_bps() group 748 - * of functions to tg->limits_changed are visible. We do not 749 - * want update td->limits_changed to be visible but update to 750 - * tg->limits_changed not being visible yet on this cpu. Hence 751 - * the read barrier. 752 - */ 753 - smp_rmb(); 754 755 hlist_for_each_entry_safe(tg, pos, n, &td->tg_list, tg_node) { 756 - if (throtl_tg_on_rr(tg) && tg->limits_changed) { 757 - throtl_log_tg(td, tg, "limit change rbps=%llu wbps=%llu" 758 - " riops=%u wiops=%u", tg->bps[READ], 759 - tg->bps[WRITE], tg->iops[READ], 760 - tg->iops[WRITE]); 761 - tg_update_disptime(td, tg); 762 - tg->limits_changed = false; 763 - } 764 - } 765 766 - smp_mb__before_atomic_dec(); 767 - atomic_dec(&td->limits_changed); 768 - smp_mb__after_atomic_dec(); 769 } 770 771 /* Dispatch throttled bios. Should be called without queue lock held. */ ··· 777 unsigned int nr_disp = 0; 778 struct bio_list bio_list_on_stack; 779 struct bio *bio; 780 781 spin_lock_irq(q->queue_lock); 782 ··· 806 * immediate dispatch 807 */ 808 if (nr_disp) { 809 while((bio = bio_list_pop(&bio_list_on_stack))) 810 generic_make_request(bio); 811 - blk_unplug(q); 812 } 813 return nr_disp; 814 } ··· 830 831 struct delayed_work *dwork = &td->throtl_work; 832 833 - if (total_nr_queued(td) > 0) { 834 /* 835 * We might have a work scheduled to be executed in future. 836 * Cancel that and schedule a new one. 
··· 904 spin_unlock_irqrestore(td->queue->queue_lock, flags); 905 } 906 907 /* 908 * For all update functions, key should be a valid pointer because these 909 * update functions are called under blkcg_lock, that means, blkg is ··· 926 struct blkio_group *blkg, u64 read_bps) 927 { 928 struct throtl_data *td = key; 929 930 - tg_of_blkg(blkg)->bps[READ] = read_bps; 931 - /* Make sure read_bps is updated before setting limits_changed */ 932 - smp_wmb(); 933 - tg_of_blkg(blkg)->limits_changed = true; 934 - 935 - /* Make sure tg->limits_changed is updated before td->limits_changed */ 936 - smp_mb__before_atomic_inc(); 937 - atomic_inc(&td->limits_changed); 938 - smp_mb__after_atomic_inc(); 939 - 940 - /* Schedule a work now to process the limit change */ 941 - throtl_schedule_delayed_work(td, 0); 942 } 943 944 static void throtl_update_blkio_group_write_bps(void *key, 945 struct blkio_group *blkg, u64 write_bps) 946 { 947 struct throtl_data *td = key; 948 949 - tg_of_blkg(blkg)->bps[WRITE] = write_bps; 950 - smp_wmb(); 951 - tg_of_blkg(blkg)->limits_changed = true; 952 - smp_mb__before_atomic_inc(); 953 - atomic_inc(&td->limits_changed); 954 - smp_mb__after_atomic_inc(); 955 - throtl_schedule_delayed_work(td, 0); 956 } 957 958 static void throtl_update_blkio_group_read_iops(void *key, 959 struct blkio_group *blkg, unsigned int read_iops) 960 { 961 struct throtl_data *td = key; 962 963 - tg_of_blkg(blkg)->iops[READ] = read_iops; 964 - smp_wmb(); 965 - tg_of_blkg(blkg)->limits_changed = true; 966 - smp_mb__before_atomic_inc(); 967 - atomic_inc(&td->limits_changed); 968 - smp_mb__after_atomic_inc(); 969 - throtl_schedule_delayed_work(td, 0); 970 } 971 972 static void throtl_update_blkio_group_write_iops(void *key, 973 struct blkio_group *blkg, unsigned int write_iops) 974 { 975 struct throtl_data *td = key; 976 977 - tg_of_blkg(blkg)->iops[WRITE] = write_iops; 978 - smp_wmb(); 979 - tg_of_blkg(blkg)->limits_changed = true; 980 - smp_mb__before_atomic_inc(); 981 - atomic_inc(&td->limits_changed); 982 - smp_mb__after_atomic_inc(); 983 - throtl_schedule_delayed_work(td, 0); 984 } 985 986 - void throtl_shutdown_timer_wq(struct request_queue *q) 987 { 988 struct throtl_data *td = q->td; 989 ··· 1003 /* 1004 * There is already another bio queued in same dir. No 1005 * need to update dispatch time. 1006 - * Still update the disptime if rate limits on this group 1007 - * were changed. 1008 */ 1009 - if (!tg->limits_changed) 1010 - update_disptime = false; 1011 - else 1012 - tg->limits_changed = false; 1013 - 1014 goto queue_bio; 1015 } 1016 1017 /* Bio is with-in rate limit of group */ 1018 if (tg_may_dispatch(td, tg, bio, NULL)) { 1019 throtl_charge_bio(tg, bio); 1020 goto out; 1021 } 1022 ··· 1060 1061 INIT_HLIST_HEAD(&td->tg_list); 1062 td->tg_service_tree = THROTL_RB_ROOT; 1063 - atomic_set(&td->limits_changed, 0); 1064 1065 /* Init root group */ 1066 tg = &td->root_tg; ··· 1072 /* Practically unlimited BW */ 1073 tg->bps[0] = tg->bps[1] = -1; 1074 tg->iops[0] = tg->iops[1] = -1; 1075 1076 /* 1077 * Set root group reference to 2. One reference will be dropped when ··· 1105 1106 BUG_ON(!td); 1107 1108 - throtl_shutdown_timer_wq(q); 1109 1110 spin_lock_irq(q->queue_lock); 1111 throtl_release_tgs(td); ··· 1135 * update limits through cgroup and another work got queued, cancel 1136 * it. 1137 */ 1138 - throtl_shutdown_timer_wq(q); 1139 throtl_td_free(td); 1140 } 1141
··· 102 /* Work for dispatching throttled bios */ 103 struct delayed_work throtl_work; 104 105 + bool limits_changed; 106 }; 107 108 enum tg_state_flags { ··· 201 RB_CLEAR_NODE(&tg->rb_node); 202 bio_list_init(&tg->bio_lists[0]); 203 bio_list_init(&tg->bio_lists[1]); 204 + td->limits_changed = false; 205 206 /* 207 * Take the initial reference that will be released on destroy ··· 737 struct throtl_grp *tg; 738 struct hlist_node *pos, *n; 739 740 + if (!td->limits_changed) 741 return; 742 743 + xchg(&td->limits_changed, false); 744 745 + throtl_log(td, "limits changed"); 746 747 hlist_for_each_entry_safe(tg, pos, n, &td->tg_list, tg_node) { 748 + if (!tg->limits_changed) 749 + continue; 750 751 + if (!xchg(&tg->limits_changed, false)) 752 + continue; 753 + 754 + throtl_log_tg(td, tg, "limit change rbps=%llu wbps=%llu" 755 + " riops=%u wiops=%u", tg->bps[READ], tg->bps[WRITE], 756 + tg->iops[READ], tg->iops[WRITE]); 757 + 758 + /* 759 + * Restart the slices for both READ and WRITES. It 760 + * might happen that a group's limit are dropped 761 + * suddenly and we don't want to account recently 762 + * dispatched IO with new low rate 763 + */ 764 + throtl_start_new_slice(td, tg, 0); 765 + throtl_start_new_slice(td, tg, 1); 766 + 767 + if (throtl_tg_on_rr(tg)) 768 + tg_update_disptime(td, tg); 769 + } 770 } 771 772 /* Dispatch throttled bios. Should be called without queue lock held. */ ··· 774 unsigned int nr_disp = 0; 775 struct bio_list bio_list_on_stack; 776 struct bio *bio; 777 + struct blk_plug plug; 778 779 spin_lock_irq(q->queue_lock); 780 ··· 802 * immediate dispatch 803 */ 804 if (nr_disp) { 805 + blk_start_plug(&plug); 806 while((bio = bio_list_pop(&bio_list_on_stack))) 807 generic_make_request(bio); 808 + blk_finish_plug(&plug); 809 } 810 return nr_disp; 811 } ··· 825 826 struct delayed_work *dwork = &td->throtl_work; 827 828 + /* schedule work if limits changed even if no bio is queued */ 829 + if (total_nr_queued(td) > 0 || td->limits_changed) { 830 /* 831 * We might have a work scheduled to be executed in future. 832 * Cancel that and schedule a new one. 
··· 898 spin_unlock_irqrestore(td->queue->queue_lock, flags); 899 } 900 901 + static void throtl_update_blkio_group_common(struct throtl_data *td, 902 + struct throtl_grp *tg) 903 + { 904 + xchg(&tg->limits_changed, true); 905 + xchg(&td->limits_changed, true); 906 + /* Schedule a work now to process the limit change */ 907 + throtl_schedule_delayed_work(td, 0); 908 + } 909 + 910 /* 911 * For all update functions, key should be a valid pointer because these 912 * update functions are called under blkcg_lock, that means, blkg is ··· 911 struct blkio_group *blkg, u64 read_bps) 912 { 913 struct throtl_data *td = key; 914 + struct throtl_grp *tg = tg_of_blkg(blkg); 915 916 + tg->bps[READ] = read_bps; 917 + throtl_update_blkio_group_common(td, tg); 918 } 919 920 static void throtl_update_blkio_group_write_bps(void *key, 921 struct blkio_group *blkg, u64 write_bps) 922 { 923 struct throtl_data *td = key; 924 + struct throtl_grp *tg = tg_of_blkg(blkg); 925 926 + tg->bps[WRITE] = write_bps; 927 + throtl_update_blkio_group_common(td, tg); 928 } 929 930 static void throtl_update_blkio_group_read_iops(void *key, 931 struct blkio_group *blkg, unsigned int read_iops) 932 { 933 struct throtl_data *td = key; 934 + struct throtl_grp *tg = tg_of_blkg(blkg); 935 936 + tg->iops[READ] = read_iops; 937 + throtl_update_blkio_group_common(td, tg); 938 } 939 940 static void throtl_update_blkio_group_write_iops(void *key, 941 struct blkio_group *blkg, unsigned int write_iops) 942 { 943 struct throtl_data *td = key; 944 + struct throtl_grp *tg = tg_of_blkg(blkg); 945 946 + tg->iops[WRITE] = write_iops; 947 + throtl_update_blkio_group_common(td, tg); 948 } 949 950 + static void throtl_shutdown_wq(struct request_queue *q) 951 { 952 struct throtl_data *td = q->td; 953 ··· 1009 /* 1010 * There is already another bio queued in same dir. No 1011 * need to update dispatch time. 1012 */ 1013 + update_disptime = false; 1014 goto queue_bio; 1015 + 1016 } 1017 1018 /* Bio is with-in rate limit of group */ 1019 if (tg_may_dispatch(td, tg, bio, NULL)) { 1020 throtl_charge_bio(tg, bio); 1021 + 1022 + /* 1023 + * We need to trim slice even when bios are not being queued 1024 + * otherwise it might happen that a bio is not queued for 1025 + * a long time and slice keeps on extending and trim is not 1026 + * called for a long time. Now if limits are reduced suddenly 1027 + * we take into account all the IO dispatched so far at new 1028 + * low rate and * newly queued IO gets a really long dispatch 1029 + * time. 1030 + * 1031 + * So keep on trimming slice even if bio is not queued. 1032 + */ 1033 + throtl_trim_slice(td, tg, rw); 1034 goto out; 1035 } 1036 ··· 1058 1059 INIT_HLIST_HEAD(&td->tg_list); 1060 td->tg_service_tree = THROTL_RB_ROOT; 1061 + td->limits_changed = false; 1062 1063 /* Init root group */ 1064 tg = &td->root_tg; ··· 1070 /* Practically unlimited BW */ 1071 tg->bps[0] = tg->bps[1] = -1; 1072 tg->iops[0] = tg->iops[1] = -1; 1073 + td->limits_changed = false; 1074 1075 /* 1076 * Set root group reference to 2. One reference will be dropped when ··· 1102 1103 BUG_ON(!td); 1104 1105 + throtl_shutdown_wq(q); 1106 1107 spin_lock_irq(q->queue_lock); 1108 throtl_release_tgs(td); ··· 1132 * update limits through cgroup and another work got queued, cancel 1133 * it. 1134 */ 1135 + throtl_shutdown_wq(q); 1136 throtl_td_free(td); 1137 } 1138
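The bio_list dispatch above also shows the idiom that replaces the deleted blk_unplug()/unplug-timer machinery throughout this series: batching now lives in an on-stack plug owned by the submitting task. In generic form (list stands in for whatever bio_list the caller has filled):

        struct blk_plug plug;
        struct bio *bio;

        blk_start_plug(&plug);
        while ((bio = bio_list_pop(&list)))
                generic_make_request(bio);
        blk_finish_plug(&plug);         /* submits everything queued on the plug */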
+6 -10
block/blk.h
··· 18 void blk_dequeue_request(struct request *rq); 19 void __blk_queue_free_tags(struct request_queue *q); 20 21 - void blk_unplug_work(struct work_struct *work); 22 - void blk_unplug_timeout(unsigned long data); 23 void blk_rq_timed_out_timer(unsigned long data); 24 void blk_delete_timer(struct request *); 25 void blk_add_timer(struct request *); ··· 49 */ 50 #define ELV_ON_HASH(rq) (!hlist_unhashed(&(rq)->hash)) 51 52 - struct request *blk_do_flush(struct request_queue *q, struct request *rq); 53 54 static inline struct request *__elv_next_request(struct request_queue *q) 55 { 56 struct request *rq; 57 58 while (1) { 59 - while (!list_empty(&q->queue_head)) { 60 rq = list_entry_rq(q->queue_head.next); 61 - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) || 62 - rq == &q->flush_rq) 63 - return rq; 64 - rq = blk_do_flush(q, rq); 65 - if (rq) 66 - return rq; 67 } 68 69 if (!q->elevator->ops->elevator_dispatch_fn(q, 0)) ··· 103 struct bio *bio); 104 int attempt_back_merge(struct request_queue *q, struct request *rq); 105 int attempt_front_merge(struct request_queue *q, struct request *rq); 106 void blk_recalc_rq_segments(struct request *rq); 107 void blk_rq_set_mixed_merge(struct request *rq); 108
··· 18 void blk_dequeue_request(struct request *rq); 19 void __blk_queue_free_tags(struct request_queue *q); 20 21 void blk_rq_timed_out_timer(unsigned long data); 22 void blk_delete_timer(struct request *); 23 void blk_add_timer(struct request *); ··· 51 */ 52 #define ELV_ON_HASH(rq) (!hlist_unhashed(&(rq)->hash)) 53 54 + void blk_insert_flush(struct request *rq); 55 + void blk_abort_flushes(struct request_queue *q); 56 57 static inline struct request *__elv_next_request(struct request_queue *q) 58 { 59 struct request *rq; 60 61 while (1) { 62 + if (!list_empty(&q->queue_head)) { 63 rq = list_entry_rq(q->queue_head.next); 64 + return rq; 65 } 66 67 if (!q->elevator->ops->elevator_dispatch_fn(q, 0)) ··· 109 struct bio *bio); 110 int attempt_back_merge(struct request_queue *q, struct request *rq); 111 int attempt_front_merge(struct request_queue *q, struct request *rq); 112 + int blk_attempt_req_merge(struct request_queue *q, struct request *rq, 113 + struct request *next); 114 void blk_recalc_rq_segments(struct request *rq); 115 void blk_rq_set_mixed_merge(struct request *rq); 116
+86 -79
block/cfq-iosched.c
··· 54 #define CFQQ_SEEKY(cfqq) (hweight32(cfqq->seek_history) > 32/8) 55 56 #define RQ_CIC(rq) \ 57 - ((struct cfq_io_context *) (rq)->elevator_private) 58 - #define RQ_CFQQ(rq) (struct cfq_queue *) ((rq)->elevator_private2) 59 - #define RQ_CFQG(rq) (struct cfq_group *) ((rq)->elevator_private3) 60 61 static struct kmem_cache *cfq_pool; 62 static struct kmem_cache *cfq_ioc_pool; ··· 146 struct cfq_rb_root *service_tree; 147 struct cfq_queue *new_cfqq; 148 struct cfq_group *cfqg; 149 - struct cfq_group *orig_cfqg; 150 /* Number of sectors dispatched from queue in single dispatch round */ 151 unsigned long nr_sectors; 152 }; ··· 178 /* group service_tree key */ 179 u64 vdisktime; 180 unsigned int weight; 181 182 /* number of cfqq currently on this group */ 183 int nr_cfqq; ··· 239 struct rb_root prio_trees[CFQ_PRIO_LISTS]; 240 241 unsigned int busy_queues; 242 243 int rq_in_driver; 244 int rq_in_flight[2]; ··· 287 unsigned int cfq_slice_idle; 288 unsigned int cfq_group_idle; 289 unsigned int cfq_latency; 290 - unsigned int cfq_group_isolation; 291 292 unsigned int cic_index; 293 struct list_head cic_list; ··· 502 } 503 } 504 505 - static int cfq_queue_empty(struct request_queue *q) 506 - { 507 - struct cfq_data *cfqd = q->elevator->elevator_data; 508 - 509 - return !cfqd->rq_queued; 510 - } 511 - 512 /* 513 * Scale schedule slice based on io priority. Use the sync time slice only 514 * if a queue is marked sync and has sync io queued. A sync queue with async ··· 552 553 static void update_min_vdisktime(struct cfq_rb_root *st) 554 { 555 - u64 vdisktime = st->min_vdisktime; 556 struct cfq_group *cfqg; 557 558 if (st->left) { 559 cfqg = rb_entry_cfqg(st->left); 560 - vdisktime = min_vdisktime(vdisktime, cfqg->vdisktime); 561 } 562 - 563 - st->min_vdisktime = max_vdisktime(st->min_vdisktime, vdisktime); 564 } 565 566 /* ··· 855 } 856 857 static void 858 - cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg) 859 { 860 struct cfq_rb_root *st = &cfqd->grp_service_tree; 861 struct cfq_group *__cfqg; ··· 896 cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY; 897 } else 898 cfqg->vdisktime = st->min_vdisktime; 899 - 900 - __cfq_group_service_tree_add(st, cfqg); 901 - st->total_weight += cfqg->weight; 902 } 903 904 static void 905 - cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg) 906 { 907 struct cfq_rb_root *st = &cfqd->grp_service_tree; 908 ··· 920 return; 921 922 cfq_log_cfqg(cfqd, cfqg, "del_from_rr group"); 923 - st->total_weight -= cfqg->weight; 924 - if (!RB_EMPTY_NODE(&cfqg->rb_node)) 925 - cfq_rb_erase(&cfqg->rb_node, st); 926 cfqg->saved_workload_slice = 0; 927 cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1); 928 } 929 930 - static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq) 931 { 932 unsigned int slice_used; 933 ··· 945 1); 946 } else { 947 slice_used = jiffies - cfqq->slice_start; 948 - if (slice_used > cfqq->allocated_slice) 949 slice_used = cfqq->allocated_slice; 950 } 951 952 return slice_used; ··· 961 struct cfq_queue *cfqq) 962 { 963 struct cfq_rb_root *st = &cfqd->grp_service_tree; 964 - unsigned int used_sl, charge; 965 int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg) 966 - cfqg->service_tree_idle.count; 967 968 BUG_ON(nr_sync < 0); 969 - used_sl = charge = cfq_cfqq_slice_usage(cfqq); 970 971 if (iops_mode(cfqd)) 972 charge = cfqq->slice_dispatch; ··· 974 charge = cfqq->allocated_slice; 975 976 /* Can't update vdisktime while group is on service tree */ 977 - cfq_rb_erase(&cfqg->rb_node, st); 978 
cfqg->vdisktime += cfq_scale_slice(charge, cfqg); 979 - __cfq_group_service_tree_add(st, cfqg); 980 981 /* This group is being expired. Save the context */ 982 if (time_after(cfqd->workload_expires, jiffies)) { ··· 993 cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u" 994 " sect=%u", used_sl, cfqq->slice_dispatch, charge, 995 iops_mode(cfqd), cfqq->nr_sectors); 996 - cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl); 997 cfq_blkiocg_set_start_empty_time(&cfqg->blkg); 998 } 999 ··· 1009 void cfq_update_blkio_group_weight(void *key, struct blkio_group *blkg, 1010 unsigned int weight) 1011 { 1012 - cfqg_of_blkg(blkg)->weight = weight; 1013 } 1014 1015 static struct cfq_group * ··· 1213 int new_cfqq = 1; 1214 int group_changed = 0; 1215 1216 - #ifdef CONFIG_CFQ_GROUP_IOSCHED 1217 - if (!cfqd->cfq_group_isolation 1218 - && cfqq_type(cfqq) == SYNC_NOIDLE_WORKLOAD 1219 - && cfqq->cfqg && cfqq->cfqg != &cfqd->root_group) { 1220 - /* Move this cfq to root group */ 1221 - cfq_log_cfqq(cfqd, cfqq, "moving to root group"); 1222 - if (!RB_EMPTY_NODE(&cfqq->rb_node)) 1223 - cfq_group_service_tree_del(cfqd, cfqq->cfqg); 1224 - cfqq->orig_cfqg = cfqq->cfqg; 1225 - cfqq->cfqg = &cfqd->root_group; 1226 - cfqd->root_group.ref++; 1227 - group_changed = 1; 1228 - } else if (!cfqd->cfq_group_isolation 1229 - && cfqq_type(cfqq) == SYNC_WORKLOAD && cfqq->orig_cfqg) { 1230 - /* cfqq is sequential now needs to go to its original group */ 1231 - BUG_ON(cfqq->cfqg != &cfqd->root_group); 1232 - if (!RB_EMPTY_NODE(&cfqq->rb_node)) 1233 - cfq_group_service_tree_del(cfqd, cfqq->cfqg); 1234 - cfq_put_cfqg(cfqq->cfqg); 1235 - cfqq->cfqg = cfqq->orig_cfqg; 1236 - cfqq->orig_cfqg = NULL; 1237 - group_changed = 1; 1238 - cfq_log_cfqq(cfqd, cfqq, "moved to origin group"); 1239 - } 1240 - #endif 1241 - 1242 service_tree = service_tree_for(cfqq->cfqg, cfqq_prio(cfqq), 1243 cfqq_type(cfqq)); 1244 if (cfq_class_idle(cfqq)) { ··· 1284 service_tree->count++; 1285 if ((add_front || !new_cfqq) && !group_changed) 1286 return; 1287 - cfq_group_service_tree_add(cfqd, cfqq->cfqg); 1288 } 1289 1290 static struct cfq_queue * ··· 1372 BUG_ON(cfq_cfqq_on_rr(cfqq)); 1373 cfq_mark_cfqq_on_rr(cfqq); 1374 cfqd->busy_queues++; 1375 1376 cfq_resort_rr_list(cfqd, cfqq); 1377 } ··· 1397 cfqq->p_root = NULL; 1398 } 1399 1400 - cfq_group_service_tree_del(cfqd, cfqq->cfqg); 1401 BUG_ON(!cfqd->busy_queues); 1402 cfqd->busy_queues--; 1403 } 1404 1405 /* ··· 2409 * Does this cfqq already have too much IO in flight? 
2410 */ 2411 if (cfqq->dispatched >= max_dispatch) { 2412 /* 2413 * idle queue must always only have a single IO in flight 2414 */ ··· 2417 return false; 2418 2419 /* 2420 * We have other queues, don't allow more IO from this one 2421 */ 2422 - if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq)) 2423 return false; 2424 2425 /* 2426 * Sole queue user, no limit 2427 */ 2428 - if (cfqd->busy_queues == 1) 2429 max_dispatch = -1; 2430 else 2431 /* ··· 2558 static void cfq_put_queue(struct cfq_queue *cfqq) 2559 { 2560 struct cfq_data *cfqd = cfqq->cfqd; 2561 - struct cfq_group *cfqg, *orig_cfqg; 2562 2563 BUG_ON(cfqq->ref <= 0); 2564 ··· 2570 BUG_ON(rb_first(&cfqq->sort_list)); 2571 BUG_ON(cfqq->allocated[READ] + cfqq->allocated[WRITE]); 2572 cfqg = cfqq->cfqg; 2573 - orig_cfqg = cfqq->orig_cfqg; 2574 2575 if (unlikely(cfqd->active_queue == cfqq)) { 2576 __cfq_slice_expired(cfqd, cfqq, 0); ··· 2579 BUG_ON(cfq_cfqq_on_rr(cfqq)); 2580 kmem_cache_free(cfq_pool, cfqq); 2581 cfq_put_cfqg(cfqg); 2582 - if (orig_cfqg) 2583 - cfq_put_cfqg(orig_cfqg); 2584 } 2585 2586 /* ··· 3626 3627 put_io_context(RQ_CIC(rq)->ioc); 3628 3629 - rq->elevator_private = NULL; 3630 - rq->elevator_private2 = NULL; 3631 3632 /* Put down rq reference on cfqg */ 3633 cfq_put_cfqg(RQ_CFQG(rq)); 3634 - rq->elevator_private3 = NULL; 3635 3636 cfq_put_queue(cfqq); 3637 } ··· 3718 } 3719 3720 cfqq->allocated[rw]++; 3721 cfqq->ref++; 3722 - rq->elevator_private = cic; 3723 - rq->elevator_private2 = cfqq; 3724 - rq->elevator_private3 = cfq_ref_get_cfqg(cfqq->cfqg); 3725 - 3726 spin_unlock_irqrestore(q->queue_lock, flags); 3727 - 3728 return 0; 3729 3730 queue_fail: ··· 3965 cfqd->cfq_slice_idle = cfq_slice_idle; 3966 cfqd->cfq_group_idle = cfq_group_idle; 3967 cfqd->cfq_latency = 1; 3968 - cfqd->cfq_group_isolation = 0; 3969 cfqd->hw_tag = -1; 3970 /* 3971 * we optimistically start assuming sync ops weren't delayed in last ··· 4040 SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1); 4041 SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0); 4042 SHOW_FUNCTION(cfq_low_latency_show, cfqd->cfq_latency, 0); 4043 - SHOW_FUNCTION(cfq_group_isolation_show, cfqd->cfq_group_isolation, 0); 4044 #undef SHOW_FUNCTION 4045 4046 #define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \ ··· 4073 STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1, 4074 UINT_MAX, 0); 4075 STORE_FUNCTION(cfq_low_latency_store, &cfqd->cfq_latency, 0, 1, 0); 4076 - STORE_FUNCTION(cfq_group_isolation_store, &cfqd->cfq_group_isolation, 0, 1, 0); 4077 #undef STORE_FUNCTION 4078 4079 #define CFQ_ATTR(name) \ ··· 4090 CFQ_ATTR(slice_idle), 4091 CFQ_ATTR(group_idle), 4092 CFQ_ATTR(low_latency), 4093 - CFQ_ATTR(group_isolation), 4094 __ATTR_NULL 4095 }; 4096 ··· 4104 .elevator_add_req_fn = cfq_insert_request, 4105 .elevator_activate_req_fn = cfq_activate_request, 4106 .elevator_deactivate_req_fn = cfq_deactivate_request, 4107 - .elevator_queue_empty_fn = cfq_queue_empty, 4108 .elevator_completed_req_fn = cfq_completed_request, 4109 .elevator_former_req_fn = elv_rb_former_request, 4110 .elevator_latter_req_fn = elv_rb_latter_request,
··· 54 #define CFQQ_SEEKY(cfqq) (hweight32(cfqq->seek_history) > 32/8) 55 56 #define RQ_CIC(rq) \ 57 + ((struct cfq_io_context *) (rq)->elevator_private[0]) 58 + #define RQ_CFQQ(rq) (struct cfq_queue *) ((rq)->elevator_private[1]) 59 + #define RQ_CFQG(rq) (struct cfq_group *) ((rq)->elevator_private[2]) 60 61 static struct kmem_cache *cfq_pool; 62 static struct kmem_cache *cfq_ioc_pool; ··· 146 struct cfq_rb_root *service_tree; 147 struct cfq_queue *new_cfqq; 148 struct cfq_group *cfqg; 149 /* Number of sectors dispatched from queue in single dispatch round */ 150 unsigned long nr_sectors; 151 }; ··· 179 /* group service_tree key */ 180 u64 vdisktime; 181 unsigned int weight; 182 + unsigned int new_weight; 183 + bool needs_update; 184 185 /* number of cfqq currently on this group */ 186 int nr_cfqq; ··· 238 struct rb_root prio_trees[CFQ_PRIO_LISTS]; 239 240 unsigned int busy_queues; 241 + unsigned int busy_sync_queues; 242 243 int rq_in_driver; 244 int rq_in_flight[2]; ··· 285 unsigned int cfq_slice_idle; 286 unsigned int cfq_group_idle; 287 unsigned int cfq_latency; 288 289 unsigned int cic_index; 290 struct list_head cic_list; ··· 501 } 502 } 503 504 /* 505 * Scale schedule slice based on io priority. Use the sync time slice only 506 * if a queue is marked sync and has sync io queued. A sync queue with async ··· 558 559 static void update_min_vdisktime(struct cfq_rb_root *st) 560 { 561 struct cfq_group *cfqg; 562 563 if (st->left) { 564 cfqg = rb_entry_cfqg(st->left); 565 + st->min_vdisktime = max_vdisktime(st->min_vdisktime, 566 + cfqg->vdisktime); 567 } 568 } 569 570 /* ··· 863 } 864 865 static void 866 + cfq_update_group_weight(struct cfq_group *cfqg) 867 + { 868 + BUG_ON(!RB_EMPTY_NODE(&cfqg->rb_node)); 869 + if (cfqg->needs_update) { 870 + cfqg->weight = cfqg->new_weight; 871 + cfqg->needs_update = false; 872 + } 873 + } 874 + 875 + static void 876 + cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg) 877 + { 878 + BUG_ON(!RB_EMPTY_NODE(&cfqg->rb_node)); 879 + 880 + cfq_update_group_weight(cfqg); 881 + __cfq_group_service_tree_add(st, cfqg); 882 + st->total_weight += cfqg->weight; 883 + } 884 + 885 + static void 886 + cfq_group_notify_queue_add(struct cfq_data *cfqd, struct cfq_group *cfqg) 887 { 888 struct cfq_rb_root *st = &cfqd->grp_service_tree; 889 struct cfq_group *__cfqg; ··· 884 cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY; 885 } else 886 cfqg->vdisktime = st->min_vdisktime; 887 + cfq_group_service_tree_add(st, cfqg); 888 } 889 890 static void 891 + cfq_group_service_tree_del(struct cfq_rb_root *st, struct cfq_group *cfqg) 892 + { 893 + st->total_weight -= cfqg->weight; 894 + if (!RB_EMPTY_NODE(&cfqg->rb_node)) 895 + cfq_rb_erase(&cfqg->rb_node, st); 896 + } 897 + 898 + static void 899 + cfq_group_notify_queue_del(struct cfq_data *cfqd, struct cfq_group *cfqg) 900 { 901 struct cfq_rb_root *st = &cfqd->grp_service_tree; 902 ··· 902 return; 903 904 cfq_log_cfqg(cfqd, cfqg, "del_from_rr group"); 905 + cfq_group_service_tree_del(st, cfqg); 906 cfqg->saved_workload_slice = 0; 907 cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1); 908 } 909 910 + static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq, 911 + unsigned int *unaccounted_time) 912 { 913 unsigned int slice_used; 914 ··· 928 1); 929 } else { 930 slice_used = jiffies - cfqq->slice_start; 931 + if (slice_used > cfqq->allocated_slice) { 932 + *unaccounted_time = slice_used - cfqq->allocated_slice; 933 slice_used = cfqq->allocated_slice; 934 + } 935 + if 
(time_after(cfqq->slice_start, cfqq->dispatch_start)) 936 + *unaccounted_time += cfqq->slice_start - 937 + cfqq->dispatch_start; 938 } 939 940 return slice_used; ··· 939 struct cfq_queue *cfqq) 940 { 941 struct cfq_rb_root *st = &cfqd->grp_service_tree; 942 + unsigned int used_sl, charge, unaccounted_sl = 0; 943 int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg) 944 - cfqg->service_tree_idle.count; 945 946 BUG_ON(nr_sync < 0); 947 + used_sl = charge = cfq_cfqq_slice_usage(cfqq, &unaccounted_sl); 948 949 if (iops_mode(cfqd)) 950 charge = cfqq->slice_dispatch; ··· 952 charge = cfqq->allocated_slice; 953 954 /* Can't update vdisktime while group is on service tree */ 955 + cfq_group_service_tree_del(st, cfqg); 956 cfqg->vdisktime += cfq_scale_slice(charge, cfqg); 957 + /* If a new weight was requested, update now, off tree */ 958 + cfq_group_service_tree_add(st, cfqg); 959 960 /* This group is being expired. Save the context */ 961 if (time_after(cfqd->workload_expires, jiffies)) { ··· 970 cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u" 971 " sect=%u", used_sl, cfqq->slice_dispatch, charge, 972 iops_mode(cfqd), cfqq->nr_sectors); 973 + cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl, 974 + unaccounted_sl); 975 cfq_blkiocg_set_start_empty_time(&cfqg->blkg); 976 } 977 ··· 985 void cfq_update_blkio_group_weight(void *key, struct blkio_group *blkg, 986 unsigned int weight) 987 { 988 + struct cfq_group *cfqg = cfqg_of_blkg(blkg); 989 + cfqg->new_weight = weight; 990 + cfqg->needs_update = true; 991 } 992 993 static struct cfq_group * ··· 1187 int new_cfqq = 1; 1188 int group_changed = 0; 1189 1190 service_tree = service_tree_for(cfqq->cfqg, cfqq_prio(cfqq), 1191 cfqq_type(cfqq)); 1192 if (cfq_class_idle(cfqq)) { ··· 1284 service_tree->count++; 1285 if ((add_front || !new_cfqq) && !group_changed) 1286 return; 1287 + cfq_group_notify_queue_add(cfqd, cfqq->cfqg); 1288 } 1289 1290 static struct cfq_queue * ··· 1372 BUG_ON(cfq_cfqq_on_rr(cfqq)); 1373 cfq_mark_cfqq_on_rr(cfqq); 1374 cfqd->busy_queues++; 1375 + if (cfq_cfqq_sync(cfqq)) 1376 + cfqd->busy_sync_queues++; 1377 1378 cfq_resort_rr_list(cfqd, cfqq); 1379 } ··· 1395 cfqq->p_root = NULL; 1396 } 1397 1398 + cfq_group_notify_queue_del(cfqd, cfqq->cfqg); 1399 BUG_ON(!cfqd->busy_queues); 1400 cfqd->busy_queues--; 1401 + if (cfq_cfqq_sync(cfqq)) 1402 + cfqd->busy_sync_queues--; 1403 } 1404 1405 /* ··· 2405 * Does this cfqq already have too much IO in flight? 2406 */ 2407 if (cfqq->dispatched >= max_dispatch) { 2408 + bool promote_sync = false; 2409 /* 2410 * idle queue must always only have a single IO in flight 2411 */ ··· 2412 return false; 2413 2414 /* 2415 + * If there is only one sync queue 2416 + * we can ignore async queue here and give the sync 2417 + * queue no dispatch limit. The reason is a sync queue can 2418 + * preempt async queue, limiting the sync queue doesn't make 2419 + * sense. This is useful for aiostress test. 
2420 + */ 2421 + if (cfq_cfqq_sync(cfqq) && cfqd->busy_sync_queues == 1) 2422 + promote_sync = true; 2423 + 2424 + /* 2425 * We have other queues, don't allow more IO from this one 2426 */ 2427 + if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq) && 2428 + !promote_sync) 2429 return false; 2430 2431 /* 2432 * Sole queue user, no limit 2433 */ 2434 + if (cfqd->busy_queues == 1 || promote_sync) 2435 max_dispatch = -1; 2436 else 2437 /* ··· 2542 static void cfq_put_queue(struct cfq_queue *cfqq) 2543 { 2544 struct cfq_data *cfqd = cfqq->cfqd; 2545 + struct cfq_group *cfqg; 2546 2547 BUG_ON(cfqq->ref <= 0); 2548 ··· 2554 BUG_ON(rb_first(&cfqq->sort_list)); 2555 BUG_ON(cfqq->allocated[READ] + cfqq->allocated[WRITE]); 2556 cfqg = cfqq->cfqg; 2557 2558 if (unlikely(cfqd->active_queue == cfqq)) { 2559 __cfq_slice_expired(cfqd, cfqq, 0); ··· 2564 BUG_ON(cfq_cfqq_on_rr(cfqq)); 2565 kmem_cache_free(cfq_pool, cfqq); 2566 cfq_put_cfqg(cfqg); 2567 } 2568 2569 /* ··· 3613 3614 put_io_context(RQ_CIC(rq)->ioc); 3615 3616 + rq->elevator_private[0] = NULL; 3617 + rq->elevator_private[1] = NULL; 3618 3619 /* Put down rq reference on cfqg */ 3620 cfq_put_cfqg(RQ_CFQG(rq)); 3621 + rq->elevator_private[2] = NULL; 3622 3623 cfq_put_queue(cfqq); 3624 } ··· 3705 } 3706 3707 cfqq->allocated[rw]++; 3708 + 3709 cfqq->ref++; 3710 + rq->elevator_private[0] = cic; 3711 + rq->elevator_private[1] = cfqq; 3712 + rq->elevator_private[2] = cfq_ref_get_cfqg(cfqq->cfqg); 3713 spin_unlock_irqrestore(q->queue_lock, flags); 3714 return 0; 3715 3716 queue_fail: ··· 3953 cfqd->cfq_slice_idle = cfq_slice_idle; 3954 cfqd->cfq_group_idle = cfq_group_idle; 3955 cfqd->cfq_latency = 1; 3956 cfqd->hw_tag = -1; 3957 /* 3958 * we optimistically start assuming sync ops weren't delayed in last ··· 4029 SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1); 4030 SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0); 4031 SHOW_FUNCTION(cfq_low_latency_show, cfqd->cfq_latency, 0); 4032 #undef SHOW_FUNCTION 4033 4034 #define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \ ··· 4063 STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1, 4064 UINT_MAX, 0); 4065 STORE_FUNCTION(cfq_low_latency_store, &cfqd->cfq_latency, 0, 1, 0); 4066 #undef STORE_FUNCTION 4067 4068 #define CFQ_ATTR(name) \ ··· 4081 CFQ_ATTR(slice_idle), 4082 CFQ_ATTR(group_idle), 4083 CFQ_ATTR(low_latency), 4084 __ATTR_NULL 4085 }; 4086 ··· 4096 .elevator_add_req_fn = cfq_insert_request, 4097 .elevator_activate_req_fn = cfq_activate_request, 4098 .elevator_deactivate_req_fn = cfq_deactivate_request, 4099 .elevator_completed_req_fn = cfq_completed_request, 4100 .elevator_former_req_fn = elv_rb_former_request, 4101 .elevator_latter_req_fn = elv_rb_latter_request,
+3 -3
block/cfq.h
··· 16 } 17 18 static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg, 19 - unsigned long time) 20 { 21 - blkiocg_update_timeslice_used(blkg, time); 22 } 23 24 static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg) ··· 85 unsigned long dequeue) {} 86 87 static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg, 88 - unsigned long time) {} 89 static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg) {} 90 static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg, 91 bool direction, bool sync) {}
··· 16 } 17 18 static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg, 19 + unsigned long time, unsigned long unaccounted_time) 20 { 21 + blkiocg_update_timeslice_used(blkg, time, unaccounted_time); 22 } 23 24 static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg) ··· 85 unsigned long dequeue) {} 86 87 static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg, 88 + unsigned long time, unsigned long unaccounted_time) {} 89 static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg) {} 90 static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg, 91 bool direction, bool sync) {}
-9
block/deadline-iosched.c
··· 326 return 1; 327 } 328 329 - static int deadline_queue_empty(struct request_queue *q) 330 - { 331 - struct deadline_data *dd = q->elevator->elevator_data; 332 - 333 - return list_empty(&dd->fifo_list[WRITE]) 334 - && list_empty(&dd->fifo_list[READ]); 335 - } 336 - 337 static void deadline_exit_queue(struct elevator_queue *e) 338 { 339 struct deadline_data *dd = e->elevator_data; ··· 437 .elevator_merge_req_fn = deadline_merged_requests, 438 .elevator_dispatch_fn = deadline_dispatch_requests, 439 .elevator_add_req_fn = deadline_add_request, 440 - .elevator_queue_empty_fn = deadline_queue_empty, 441 .elevator_former_req_fn = elv_rb_former_request, 442 .elevator_latter_req_fn = elv_rb_latter_request, 443 .elevator_init_fn = deadline_init_queue,
··· 326 return 1; 327 } 328 329 static void deadline_exit_queue(struct elevator_queue *e) 330 { 331 struct deadline_data *dd = e->elevator_data; ··· 445 .elevator_merge_req_fn = deadline_merged_requests, 446 .elevator_dispatch_fn = deadline_dispatch_requests, 447 .elevator_add_req_fn = deadline_add_request, 448 .elevator_former_req_fn = elv_rb_former_request, 449 .elevator_latter_req_fn = elv_rb_latter_request, 450 .elevator_init_fn = deadline_init_queue,
+64 -44
block/elevator.c
··· 113 } 114 EXPORT_SYMBOL(elv_rq_merge_ok); 115 116 - static inline int elv_try_merge(struct request *__rq, struct bio *bio) 117 { 118 int ret = ELEVATOR_NO_MERGE; 119 ··· 421 struct list_head *entry; 422 int stop_flags; 423 424 if (q->last_merge == rq) 425 q->last_merge = NULL; 426 ··· 521 return ELEVATOR_NO_MERGE; 522 } 523 524 void elv_merged_request(struct request_queue *q, struct request *rq, int type) 525 { 526 struct elevator_queue *e = q->elevator; ··· 572 struct request *next) 573 { 574 struct elevator_queue *e = q->elevator; 575 576 - if (e->ops->elevator_merge_req_fn) 577 e->ops->elevator_merge_req_fn(q, rq, next); 578 579 elv_rqhash_reposition(q, rq); 580 - elv_rqhash_del(q, next); 581 582 - q->nr_sorted--; 583 q->last_merge = rq; 584 } 585 ··· 657 658 void elv_insert(struct request_queue *q, struct request *rq, int where) 659 { 660 - int unplug_it = 1; 661 - 662 trace_block_rq_insert(q, rq); 663 664 rq->q = q; 665 666 switch (where) { 667 case ELEVATOR_INSERT_REQUEUE: 668 - /* 669 - * Most requeues happen because of a busy condition, 670 - * don't force unplug of the queue for that case. 671 - * Clear unplug_it and fall through. 672 - */ 673 - unplug_it = 0; 674 - 675 case ELEVATOR_INSERT_FRONT: 676 rq->cmd_flags |= REQ_SOFTBARRIER; 677 list_add(&rq->queuelist, &q->queue_head); ··· 685 __blk_run_queue(q, false); 686 break; 687 688 case ELEVATOR_INSERT_SORT: 689 BUG_ON(rq->cmd_type != REQ_TYPE_FS && 690 !(rq->cmd_flags & REQ_DISCARD)); ··· 712 q->elevator->ops->elevator_add_req_fn(q, rq); 713 break; 714 715 default: 716 printk(KERN_ERR "%s: bad insertion point %d\n", 717 __func__, where); 718 BUG(); 719 } 720 - 721 - if (unplug_it && blk_queue_plugged(q)) { 722 - int nrq = q->rq.count[BLK_RW_SYNC] + q->rq.count[BLK_RW_ASYNC] 723 - - queue_in_flight(q); 724 - 725 - if (nrq >= q->unplug_thresh) 726 - __generic_unplug_device(q); 727 - } 728 } 729 730 - void __elv_add_request(struct request_queue *q, struct request *rq, int where, 731 - int plug) 732 { 733 if (rq->cmd_flags & REQ_SOFTBARRIER) { 734 /* barriers are scheduling boundary, update end_sector */ 735 if (rq->cmd_type == REQ_TYPE_FS || ··· 738 where == ELEVATOR_INSERT_SORT) 739 where = ELEVATOR_INSERT_BACK; 740 741 - if (plug) 742 - blk_plug_device(q); 743 - 744 elv_insert(q, rq, where); 745 } 746 EXPORT_SYMBOL(__elv_add_request); 747 748 - void elv_add_request(struct request_queue *q, struct request *rq, int where, 749 - int plug) 750 { 751 unsigned long flags; 752 753 spin_lock_irqsave(q->queue_lock, flags); 754 - __elv_add_request(q, rq, where, plug); 755 spin_unlock_irqrestore(q->queue_lock, flags); 756 } 757 EXPORT_SYMBOL(elv_add_request); 758 - 759 - int elv_queue_empty(struct request_queue *q) 760 - { 761 - struct elevator_queue *e = q->elevator; 762 - 763 - if (!list_empty(&q->queue_head)) 764 - return 0; 765 - 766 - if (e->ops->elevator_queue_empty_fn) 767 - return e->ops->elevator_queue_empty_fn(q); 768 - 769 - return 1; 770 - } 771 - EXPORT_SYMBOL(elv_queue_empty); 772 773 struct request *elv_latter_request(struct request_queue *q, struct request *rq) 774 { ··· 777 if (e->ops->elevator_set_req_fn) 778 return e->ops->elevator_set_req_fn(q, rq, gfp_mask); 779 780 - rq->elevator_private = NULL; 781 return 0; 782 } 783 ··· 802 void elv_abort_queue(struct request_queue *q) 803 { 804 struct request *rq; 805 806 while (!list_empty(&q->queue_head)) { 807 rq = list_entry_rq(q->queue_head.next);
··· 113 } 114 EXPORT_SYMBOL(elv_rq_merge_ok); 115 116 + int elv_try_merge(struct request *__rq, struct bio *bio) 117 { 118 int ret = ELEVATOR_NO_MERGE; 119 ··· 421 struct list_head *entry; 422 int stop_flags; 423 424 + BUG_ON(rq->cmd_flags & REQ_ON_PLUG); 425 + 426 if (q->last_merge == rq) 427 q->last_merge = NULL; 428 ··· 519 return ELEVATOR_NO_MERGE; 520 } 521 522 + /* 523 + * Attempt to do an insertion back merge. Only check for the case where 524 + * we can append 'rq' to an existing request, so we can throw 'rq' away 525 + * afterwards. 526 + * 527 + * Returns true if we merged, false otherwise 528 + */ 529 + static bool elv_attempt_insert_merge(struct request_queue *q, 530 + struct request *rq) 531 + { 532 + struct request *__rq; 533 + 534 + if (blk_queue_nomerges(q)) 535 + return false; 536 + 537 + /* 538 + * First try one-hit cache. 539 + */ 540 + if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq)) 541 + return true; 542 + 543 + if (blk_queue_noxmerges(q)) 544 + return false; 545 + 546 + /* 547 + * See if our hash lookup can find a potential backmerge. 548 + */ 549 + __rq = elv_rqhash_find(q, blk_rq_pos(rq)); 550 + if (__rq && blk_attempt_req_merge(q, __rq, rq)) 551 + return true; 552 + 553 + return false; 554 + } 555 + 556 void elv_merged_request(struct request_queue *q, struct request *rq, int type) 557 { 558 struct elevator_queue *e = q->elevator; ··· 536 struct request *next) 537 { 538 struct elevator_queue *e = q->elevator; 539 + const int next_sorted = next->cmd_flags & REQ_SORTED; 540 541 + if (next_sorted && e->ops->elevator_merge_req_fn) 542 e->ops->elevator_merge_req_fn(q, rq, next); 543 544 elv_rqhash_reposition(q, rq); 545 546 + if (next_sorted) { 547 + elv_rqhash_del(q, next); 548 + q->nr_sorted--; 549 + } 550 + 551 q->last_merge = rq; 552 } 553 ··· 617 618 void elv_insert(struct request_queue *q, struct request *rq, int where) 619 { 620 trace_block_rq_insert(q, rq); 621 622 rq->q = q; 623 624 switch (where) { 625 case ELEVATOR_INSERT_REQUEUE: 626 case ELEVATOR_INSERT_FRONT: 627 rq->cmd_flags |= REQ_SOFTBARRIER; 628 list_add(&rq->queuelist, &q->queue_head); ··· 654 __blk_run_queue(q, false); 655 break; 656 657 + case ELEVATOR_INSERT_SORT_MERGE: 658 + /* 659 + * If we succeed in merging this request with one in the 660 + * queue already, we are done - rq has now been freed, 661 + * so no need to do anything further. 
662 + */ 663 + if (elv_attempt_insert_merge(q, rq)) 664 + break; 665 case ELEVATOR_INSERT_SORT: 666 BUG_ON(rq->cmd_type != REQ_TYPE_FS && 667 !(rq->cmd_flags & REQ_DISCARD)); ··· 673 q->elevator->ops->elevator_add_req_fn(q, rq); 674 break; 675 676 + case ELEVATOR_INSERT_FLUSH: 677 + rq->cmd_flags |= REQ_SOFTBARRIER; 678 + blk_insert_flush(rq); 679 + break; 680 default: 681 printk(KERN_ERR "%s: bad insertion point %d\n", 682 __func__, where); 683 BUG(); 684 } 685 } 686 687 + void __elv_add_request(struct request_queue *q, struct request *rq, int where) 688 { 689 + BUG_ON(rq->cmd_flags & REQ_ON_PLUG); 690 + 691 if (rq->cmd_flags & REQ_SOFTBARRIER) { 692 /* barriers are scheduling boundary, update end_sector */ 693 if (rq->cmd_type == REQ_TYPE_FS || ··· 702 where == ELEVATOR_INSERT_SORT) 703 where = ELEVATOR_INSERT_BACK; 704 705 elv_insert(q, rq, where); 706 } 707 EXPORT_SYMBOL(__elv_add_request); 708 709 + void elv_add_request(struct request_queue *q, struct request *rq, int where) 710 { 711 unsigned long flags; 712 713 spin_lock_irqsave(q->queue_lock, flags); 714 + __elv_add_request(q, rq, where); 715 spin_unlock_irqrestore(q->queue_lock, flags); 716 } 717 EXPORT_SYMBOL(elv_add_request); 718 719 struct request *elv_latter_request(struct request_queue *q, struct request *rq) 720 { ··· 759 if (e->ops->elevator_set_req_fn) 760 return e->ops->elevator_set_req_fn(q, rq, gfp_mask); 761 762 + rq->elevator_private[0] = NULL; 763 return 0; 764 } 765 ··· 784 void elv_abort_queue(struct request_queue *q) 785 { 786 struct request *rq; 787 + 788 + blk_abort_flushes(q); 789 790 while (!list_empty(&q->queue_head)) { 791 rq = list_entry_rq(q->queue_head.next);
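For callers outside this file the visible change in the hunk above is the dropped plug argument (queue-level plugging no longer exists), plus the two new insertion types handled in elv_insert(). A conversion sketch, with q and rq as placeholders:

        /* 2.6.38 and earlier: callers passed a plug hint as the 4th argument */
        /*      elv_add_request(q, rq, ELEVATOR_INSERT_BACK, 0);       */

        /* 2.6.39: the argument is gone; batching happens in the per-task blk_plug */
        elv_add_request(q, rq, ELEVATOR_INSERT_BACK);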
+9 -9
block/genhd.c
··· 1158 "%u %lu %lu %llu %u %u %u %u\n", 1159 MAJOR(part_devt(hd)), MINOR(part_devt(hd)), 1160 disk_name(gp, hd->partno, buf), 1161 - part_stat_read(hd, ios[0]), 1162 - part_stat_read(hd, merges[0]), 1163 - (unsigned long long)part_stat_read(hd, sectors[0]), 1164 - jiffies_to_msecs(part_stat_read(hd, ticks[0])), 1165 - part_stat_read(hd, ios[1]), 1166 - part_stat_read(hd, merges[1]), 1167 - (unsigned long long)part_stat_read(hd, sectors[1]), 1168 - jiffies_to_msecs(part_stat_read(hd, ticks[1])), 1169 part_in_flight(hd), 1170 jiffies_to_msecs(part_stat_read(hd, io_ticks)), 1171 jiffies_to_msecs(part_stat_read(hd, time_in_queue)) ··· 1494 void disk_unblock_events(struct gendisk *disk) 1495 { 1496 if (disk->ev) 1497 - __disk_unblock_events(disk, true); 1498 } 1499 1500 /**
··· 1158 "%u %lu %lu %llu %u %u %u %u\n", 1159 MAJOR(part_devt(hd)), MINOR(part_devt(hd)), 1160 disk_name(gp, hd->partno, buf), 1161 + part_stat_read(hd, ios[READ]), 1162 + part_stat_read(hd, merges[READ]), 1163 + (unsigned long long)part_stat_read(hd, sectors[READ]), 1164 + jiffies_to_msecs(part_stat_read(hd, ticks[READ])), 1165 + part_stat_read(hd, ios[WRITE]), 1166 + part_stat_read(hd, merges[WRITE]), 1167 + (unsigned long long)part_stat_read(hd, sectors[WRITE]), 1168 + jiffies_to_msecs(part_stat_read(hd, ticks[WRITE])), 1169 part_in_flight(hd), 1170 jiffies_to_msecs(part_stat_read(hd, io_ticks)), 1171 jiffies_to_msecs(part_stat_read(hd, time_in_queue)) ··· 1494 void disk_unblock_events(struct gendisk *disk) 1495 { 1496 if (disk->ev) 1497 + __disk_unblock_events(disk, false); 1498 } 1499 1500 /**
-8
block/noop-iosched.c
··· 39 list_add_tail(&rq->queuelist, &nd->queue); 40 } 41 42 - static int noop_queue_empty(struct request_queue *q) 43 - { 44 - struct noop_data *nd = q->elevator->elevator_data; 45 - 46 - return list_empty(&nd->queue); 47 - } 48 - 49 static struct request * 50 noop_former_request(struct request_queue *q, struct request *rq) 51 { ··· 83 .elevator_merge_req_fn = noop_merged_requests, 84 .elevator_dispatch_fn = noop_dispatch, 85 .elevator_add_req_fn = noop_add_request, 86 - .elevator_queue_empty_fn = noop_queue_empty, 87 .elevator_former_req_fn = noop_former_request, 88 .elevator_latter_req_fn = noop_latter_request, 89 .elevator_init_fn = noop_init_queue,
··· 39 list_add_tail(&rq->queuelist, &nd->queue); 40 } 41 42 static struct request * 43 noop_former_request(struct request_queue *q, struct request *rq) 44 { ··· 90 .elevator_merge_req_fn = noop_merged_requests, 91 .elevator_dispatch_fn = noop_dispatch, 92 .elevator_add_req_fn = noop_add_request, 93 .elevator_former_req_fn = noop_former_request, 94 .elevator_latter_req_fn = noop_latter_request, 95 .elevator_init_fn = noop_init_queue,
+5 -3
drivers/block/DAC960.c
··· 140 return 0; 141 } 142 143 - static int DAC960_media_changed(struct gendisk *disk) 144 { 145 DAC960_Controller_T *p = disk->queue->queuedata; 146 int drive_nr = (long)disk->private_data; 147 148 if (!p->LogicalDriveInitiallyAccessible[drive_nr]) 149 - return 1; 150 return 0; 151 } 152 ··· 164 .owner = THIS_MODULE, 165 .open = DAC960_open, 166 .getgeo = DAC960_getgeo, 167 - .media_changed = DAC960_media_changed, 168 .revalidate_disk = DAC960_revalidate_disk, 169 }; 170 ··· 2547 disk->major = MajorNumber; 2548 disk->first_minor = n << DAC960_MaxPartitionsBits; 2549 disk->fops = &DAC960_BlockDeviceOperations; 2550 } 2551 /* 2552 Indicate the Block Device Registration completed successfully,
··· 140 return 0; 141 } 142 143 + static unsigned int DAC960_check_events(struct gendisk *disk, 144 + unsigned int clearing) 145 { 146 DAC960_Controller_T *p = disk->queue->queuedata; 147 int drive_nr = (long)disk->private_data; 148 149 if (!p->LogicalDriveInitiallyAccessible[drive_nr]) 150 + return DISK_EVENT_MEDIA_CHANGE; 151 return 0; 152 } 153 ··· 163 .owner = THIS_MODULE, 164 .open = DAC960_open, 165 .getgeo = DAC960_getgeo, 166 + .check_events = DAC960_check_events, 167 .revalidate_disk = DAC960_revalidate_disk, 168 }; 169 ··· 2546 disk->major = MajorNumber; 2547 disk->first_minor = n << DAC960_MaxPartitionsBits; 2548 disk->fops = &DAC960_BlockDeviceOperations; 2549 + disk->events = DISK_EVENT_MEDIA_CHANGE; 2550 } 2551 /* 2552 Indicate the Block Device Registration completed successfully,
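The DAC960 hunk is the template for the media-change conversions that follow (the amiflop and ataflop hunks below take the same shape): .media_changed becomes .check_events returning a DISK_EVENT_* mask, and the gendisk advertises which events it can report. For a hypothetical driver foo (foo_media_gone() stands in for the hardware check):

        static unsigned int foo_check_events(struct gendisk *disk,
                                             unsigned int clearing)
        {
                return foo_media_gone(disk) ? DISK_EVENT_MEDIA_CHANGE : 0;
        }

        static const struct block_device_operations foo_fops = {
                .owner          = THIS_MODULE,
                .check_events   = foo_check_events,
        };

        /* at disk setup time */
        disk->fops = &foo_fops;
        disk->events = DISK_EVENT_MEDIA_CHANGE;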
+5 -4
drivers/block/amiflop.c
··· 1658 } 1659 1660 /* 1661 - * floppy-change is never called from an interrupt, so we can relax a bit 1662 * here, sleep etc. Note that floppy-on tries to set current_DOR to point 1663 * to the desired drive, but it will probably not survive the sleep if 1664 * several floppies are used at the same time: thus the loop. 1665 */ 1666 - static int amiga_floppy_change(struct gendisk *disk) 1667 { 1668 struct amiga_floppy_struct *p = disk->private_data; 1669 int drive = p - unit; ··· 1686 p->dirty = 0; 1687 writepending = 0; /* if this was true before, too bad! */ 1688 writefromint = 0; 1689 - return 1; 1690 } 1691 return 0; 1692 } ··· 1697 .release = floppy_release, 1698 .ioctl = fd_ioctl, 1699 .getgeo = fd_getgeo, 1700 - .media_changed = amiga_floppy_change, 1701 }; 1702 1703 static int __init fd_probe_drives(void) ··· 1736 disk->major = FLOPPY_MAJOR; 1737 disk->first_minor = drive; 1738 disk->fops = &floppy_fops; 1739 sprintf(disk->disk_name, "fd%d", drive); 1740 disk->private_data = &unit[drive]; 1741 set_capacity(disk, 880*2);
··· 1658 } 1659 1660 /* 1661 + * check_events is never called from an interrupt, so we can relax a bit 1662 * here, sleep etc. Note that floppy-on tries to set current_DOR to point 1663 * to the desired drive, but it will probably not survive the sleep if 1664 * several floppies are used at the same time: thus the loop. 1665 */ 1666 + static unsigned amiga_check_events(struct gendisk *disk, unsigned int clearing) 1667 { 1668 struct amiga_floppy_struct *p = disk->private_data; 1669 int drive = p - unit; ··· 1686 p->dirty = 0; 1687 writepending = 0; /* if this was true before, too bad! */ 1688 writefromint = 0; 1689 + return DISK_EVENT_MEDIA_CHANGE; 1690 } 1691 return 0; 1692 } ··· 1697 .release = floppy_release, 1698 .ioctl = fd_ioctl, 1699 .getgeo = fd_getgeo, 1700 + .check_events = amiga_check_events, 1701 }; 1702 1703 static int __init fd_probe_drives(void) ··· 1736 disk->major = FLOPPY_MAJOR; 1737 disk->first_minor = drive; 1738 disk->fops = &floppy_fops; 1739 + disk->events = DISK_EVENT_MEDIA_CHANGE; 1740 sprintf(disk->disk_name, "fd%d", drive); 1741 disk->private_data = &unit[drive]; 1742 set_capacity(disk, 880*2);
+8 -6
drivers/block/ataflop.c
··· 1324 * due to unrecognised disk changes. 1325 */ 1326 1327 - static int check_floppy_change(struct gendisk *disk) 1328 { 1329 struct atari_floppy_struct *p = disk->private_data; 1330 unsigned int drive = p - unit; 1331 if (test_bit (drive, &fake_change)) { 1332 /* simulated change (e.g. after formatting) */ 1333 - return 1; 1334 } 1335 if (test_bit (drive, &changed_floppies)) { 1336 /* surely changed (the WP signal changed at least once) */ 1337 - return 1; 1338 } 1339 if (UD.wpstat) { 1340 /* WP is on -> could be changed: to be sure, buffers should be 1341 * invalidated... 1342 */ 1343 - return 1; 1344 } 1345 1346 return 0; ··· 1571 * or the next access will revalidate - and clear UDT :-( 1572 */ 1573 1574 - if (check_floppy_change(disk)) 1575 floppy_revalidate(disk); 1576 1577 if (UD.flags & FTD_MSG) ··· 1905 .open = floppy_unlocked_open, 1906 .release = floppy_release, 1907 .ioctl = fd_ioctl, 1908 - .media_changed = check_floppy_change, 1909 .revalidate_disk= floppy_revalidate, 1910 }; 1911 ··· 1964 unit[i].disk->first_minor = i; 1965 sprintf(unit[i].disk->disk_name, "fd%d", i); 1966 unit[i].disk->fops = &floppy_fops; 1967 unit[i].disk->private_data = &unit[i]; 1968 unit[i].disk->queue = blk_init_queue(do_fd_request, 1969 &ataflop_lock);
··· 1324 * due to unrecognised disk changes. 1325 */ 1326 1327 + static unsigned int floppy_check_events(struct gendisk *disk, 1328 + unsigned int clearing) 1329 { 1330 struct atari_floppy_struct *p = disk->private_data; 1331 unsigned int drive = p - unit; 1332 if (test_bit (drive, &fake_change)) { 1333 /* simulated change (e.g. after formatting) */ 1334 + return DISK_EVENT_MEDIA_CHANGE; 1335 } 1336 if (test_bit (drive, &changed_floppies)) { 1337 /* surely changed (the WP signal changed at least once) */ 1338 + return DISK_EVENT_MEDIA_CHANGE; 1339 } 1340 if (UD.wpstat) { 1341 /* WP is on -> could be changed: to be sure, buffers should be 1342 * invalidated... 1343 */ 1344 + return DISK_EVENT_MEDIA_CHANGE; 1345 } 1346 1347 return 0; ··· 1570 * or the next access will revalidate - and clear UDT :-( 1571 */ 1572 1573 + if (floppy_check_events(disk, 0)) 1574 floppy_revalidate(disk); 1575 1576 if (UD.flags & FTD_MSG) ··· 1904 .open = floppy_unlocked_open, 1905 .release = floppy_release, 1906 .ioctl = fd_ioctl, 1907 + .check_events = floppy_check_events, 1908 .revalidate_disk= floppy_revalidate, 1909 }; 1910 ··· 1963 unit[i].disk->first_minor = i; 1964 sprintf(unit[i].disk->disk_name, "fd%d", i); 1965 unit[i].disk->fops = &floppy_fops; 1966 + unit[i].disk->events = DISK_EVENT_MEDIA_CHANGE; 1967 unit[i].disk->private_data = &unit[i]; 1968 unit[i].disk->queue = blk_init_queue(do_fd_request, 1969 &ataflop_lock);
-6
drivers/block/cciss.c
··· 3170 int sg_index = 0; 3171 int chained = 0; 3172 3173 - /* We call start_io here in case there is a command waiting on the 3174 - * queue that has not been sent. 3175 - */ 3176 - if (blk_queue_plugged(q)) 3177 - goto startio; 3178 - 3179 queue: 3180 creq = blk_peek_request(q); 3181 if (!creq)
··· 3170 int sg_index = 0; 3171 int chained = 0; 3172 3173 queue: 3174 creq = blk_peek_request(q); 3175 if (!creq)
-3
drivers/block/cpqarray.c
··· 911 struct scatterlist tmp_sg[SG_MAX]; 912 int i, dir, seg; 913 914 - if (blk_queue_plugged(q)) 915 - goto startio; 916 - 917 queue_next: 918 creq = blk_peek_request(q); 919 if (!creq)
··· 911 struct scatterlist tmp_sg[SG_MAX]; 912 int i, dir, seg; 913 914 queue_next: 915 creq = blk_peek_request(q); 916 if (!creq)
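The cciss and cpqarray hunks are the request-function side of the plugging removal: blk_queue_plugged() is gone from this series, so the "if plugged, skip straight to startio" shortcut at the top of the request function has nothing left to test, and the function simply drains whatever blk_peek_request() returns. Roughly, the resulting shape looks like this (my_submit_to_hw is a placeholder, not a real helper):

    static void my_request_fn(struct request_queue *q)
    {
            struct request *rq;

            /* No blk_queue_plugged() check any more: just drain the queue. */
            while ((rq = blk_peek_request(q)) != NULL) {
                    blk_start_request(rq);
                    my_submit_to_hw(rq);  /* placeholder for the driver's submit path */
            }
    }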
+1 -3
drivers/block/drbd/drbd_actlog.c
··· 80 81 if ((rw & WRITE) && !test_bit(MD_NO_FUA, &mdev->flags)) 82 rw |= REQ_FUA; 83 - rw |= REQ_UNPLUG | REQ_SYNC; 84 85 bio = bio_alloc(GFP_NOIO, 1); 86 bio->bi_bdev = bdev->md_bdev; ··· 688 submit_bio(WRITE, bios[i]); 689 } 690 } 691 - 692 - drbd_blk_run_queue(bdev_get_queue(mdev->ldev->md_bdev)); 693 694 /* always (try to) flush bitmap to stable storage */ 695 drbd_md_flush(mdev);
··· 80 81 if ((rw & WRITE) && !test_bit(MD_NO_FUA, &mdev->flags)) 82 rw |= REQ_FUA; 83 + rw |= REQ_SYNC; 84 85 bio = bio_alloc(GFP_NOIO, 1); 86 bio->bi_bdev = bdev->md_bdev; ··· 688 submit_bio(WRITE, bios[i]); 689 } 690 } 691 692 /* always (try to) flush bitmap to stable storage */ 693 drbd_md_flush(mdev);
-1
drivers/block/drbd/drbd_bitmap.c
··· 840 for (i = 0; i < num_pages; i++) 841 bm_page_io_async(mdev, b, i, rw); 842 843 - drbd_blk_run_queue(bdev_get_queue(mdev->ldev->md_bdev)); 844 wait_event(b->bm_io_wait, atomic_read(&b->bm_async_io) == 0); 845 846 if (test_bit(BM_MD_IO_ERROR, &b->bm_flags)) {
··· 840 for (i = 0; i < num_pages; i++) 841 bm_page_io_async(mdev, b, i, rw); 842 843 wait_event(b->bm_io_wait, atomic_read(&b->bm_async_io) == 0); 844 845 if (test_bit(BM_MD_IO_ERROR, &b->bm_flags)) {
+1 -15
drivers/block/drbd/drbd_int.h
··· 377 #define DP_HARDBARRIER 1 /* depricated */ 378 #define DP_RW_SYNC 2 /* equals REQ_SYNC */ 379 #define DP_MAY_SET_IN_SYNC 4 380 - #define DP_UNPLUG 8 /* equals REQ_UNPLUG */ 381 #define DP_FUA 16 /* equals REQ_FUA */ 382 #define DP_FLUSH 32 /* equals REQ_FLUSH */ 383 #define DP_DISCARD 64 /* equals REQ_DISCARD */ ··· 2380 #define QUEUE_ORDERED_NONE 0 2381 #endif 2382 return QUEUE_ORDERED_NONE; 2383 - } 2384 - 2385 - static inline void drbd_blk_run_queue(struct request_queue *q) 2386 - { 2387 - if (q && q->unplug_fn) 2388 - q->unplug_fn(q); 2389 - } 2390 - 2391 - static inline void drbd_kick_lo(struct drbd_conf *mdev) 2392 - { 2393 - if (get_ldev(mdev)) { 2394 - drbd_blk_run_queue(bdev_get_queue(mdev->ldev->backing_bdev)); 2395 - put_ldev(mdev); 2396 - } 2397 } 2398 2399 static inline void drbd_md_flush(struct drbd_conf *mdev)
··· 377 #define DP_HARDBARRIER 1 /* depricated */ 378 #define DP_RW_SYNC 2 /* equals REQ_SYNC */ 379 #define DP_MAY_SET_IN_SYNC 4 380 + #define DP_UNPLUG 8 /* not used anymore */ 381 #define DP_FUA 16 /* equals REQ_FUA */ 382 #define DP_FLUSH 32 /* equals REQ_FLUSH */ 383 #define DP_DISCARD 64 /* equals REQ_DISCARD */ ··· 2380 #define QUEUE_ORDERED_NONE 0 2381 #endif 2382 return QUEUE_ORDERED_NONE; 2383 } 2384 2385 static inline void drbd_md_flush(struct drbd_conf *mdev)
+2 -34
drivers/block/drbd/drbd_main.c
··· 2477 { 2478 if (mdev->agreed_pro_version >= 95) 2479 return (bi_rw & REQ_SYNC ? DP_RW_SYNC : 0) | 2480 - (bi_rw & REQ_UNPLUG ? DP_UNPLUG : 0) | 2481 (bi_rw & REQ_FUA ? DP_FUA : 0) | 2482 (bi_rw & REQ_FLUSH ? DP_FLUSH : 0) | 2483 (bi_rw & REQ_DISCARD ? DP_DISCARD : 0); 2484 else 2485 - return bi_rw & (REQ_SYNC | REQ_UNPLUG) ? DP_RW_SYNC : 0; 2486 } 2487 2488 /* Used to send write requests ··· 2716 mdev->open_cnt--; 2717 mutex_unlock(&drbd_main_mutex); 2718 return 0; 2719 - } 2720 - 2721 - static void drbd_unplug_fn(struct request_queue *q) 2722 - { 2723 - struct drbd_conf *mdev = q->queuedata; 2724 - 2725 - /* unplug FIRST */ 2726 - spin_lock_irq(q->queue_lock); 2727 - blk_remove_plug(q); 2728 - spin_unlock_irq(q->queue_lock); 2729 - 2730 - /* only if connected */ 2731 - spin_lock_irq(&mdev->req_lock); 2732 - if (mdev->state.pdsk >= D_INCONSISTENT && mdev->state.conn >= C_CONNECTED) { 2733 - D_ASSERT(mdev->state.role == R_PRIMARY); 2734 - if (test_and_clear_bit(UNPLUG_REMOTE, &mdev->flags)) { 2735 - /* add to the data.work queue, 2736 - * unless already queued. 2737 - * XXX this might be a good addition to drbd_queue_work 2738 - * anyways, to detect "double queuing" ... */ 2739 - if (list_empty(&mdev->unplug_work.list)) 2740 - drbd_queue_work(&mdev->data.work, 2741 - &mdev->unplug_work); 2742 - } 2743 - } 2744 - spin_unlock_irq(&mdev->req_lock); 2745 - 2746 - if (mdev->state.disk >= D_INCONSISTENT) 2747 - drbd_kick_lo(mdev); 2748 } 2749 2750 static void drbd_set_defaults(struct drbd_conf *mdev) ··· 3192 blk_queue_max_segment_size(q, DRBD_MAX_SEGMENT_SIZE); 3193 blk_queue_bounce_limit(q, BLK_BOUNCE_ANY); 3194 blk_queue_merge_bvec(q, drbd_merge_bvec); 3195 - q->queue_lock = &mdev->req_lock; /* needed since we use */ 3196 - /* plugging on a queue, that actually has no requests! */ 3197 - q->unplug_fn = drbd_unplug_fn; 3198 3199 mdev->md_io_page = alloc_page(GFP_KERNEL); 3200 if (!mdev->md_io_page)
··· 2477 { 2478 if (mdev->agreed_pro_version >= 95) 2479 return (bi_rw & REQ_SYNC ? DP_RW_SYNC : 0) | 2480 (bi_rw & REQ_FUA ? DP_FUA : 0) | 2481 (bi_rw & REQ_FLUSH ? DP_FLUSH : 0) | 2482 (bi_rw & REQ_DISCARD ? DP_DISCARD : 0); 2483 else 2484 + return bi_rw & REQ_SYNC ? DP_RW_SYNC : 0; 2485 } 2486 2487 /* Used to send write requests ··· 2717 mdev->open_cnt--; 2718 mutex_unlock(&drbd_main_mutex); 2719 return 0; 2720 } 2721 2722 static void drbd_set_defaults(struct drbd_conf *mdev) ··· 3222 blk_queue_max_segment_size(q, DRBD_MAX_SEGMENT_SIZE); 3223 blk_queue_bounce_limit(q, BLK_BOUNCE_ANY); 3224 blk_queue_merge_bvec(q, drbd_merge_bvec); 3225 + q->queue_lock = &mdev->req_lock; 3226 3227 mdev->md_io_page = alloc_page(GFP_KERNEL); 3228 if (!mdev->md_io_page)
+2 -27
drivers/block/drbd/drbd_receiver.c
··· 187 return NULL; 188 } 189 190 - /* kick lower level device, if we have more than (arbitrary number) 191 - * reference counts on it, which typically are locally submitted io 192 - * requests. don't use unacked_cnt, so we speed up proto A and B, too. */ 193 - static void maybe_kick_lo(struct drbd_conf *mdev) 194 - { 195 - if (atomic_read(&mdev->local_cnt) >= mdev->net_conf->unplug_watermark) 196 - drbd_kick_lo(mdev); 197 - } 198 - 199 static void reclaim_net_ee(struct drbd_conf *mdev, struct list_head *to_be_freed) 200 { 201 struct drbd_epoch_entry *e; ··· 210 LIST_HEAD(reclaimed); 211 struct drbd_epoch_entry *e, *t; 212 213 - maybe_kick_lo(mdev); 214 spin_lock_irq(&mdev->req_lock); 215 reclaim_net_ee(mdev, &reclaimed); 216 spin_unlock_irq(&mdev->req_lock); ··· 426 while (!list_empty(head)) { 427 prepare_to_wait(&mdev->ee_wait, &wait, TASK_UNINTERRUPTIBLE); 428 spin_unlock_irq(&mdev->req_lock); 429 - drbd_kick_lo(mdev); 430 - schedule(); 431 finish_wait(&mdev->ee_wait, &wait); 432 spin_lock_irq(&mdev->req_lock); 433 } ··· 1100 /* > e->sector, unless this is the first bio */ 1101 bio->bi_sector = sector; 1102 bio->bi_bdev = mdev->ldev->backing_bdev; 1103 - /* we special case some flags in the multi-bio case, see below 1104 - * (REQ_UNPLUG) */ 1105 bio->bi_rw = rw; 1106 bio->bi_private = e; 1107 bio->bi_end_io = drbd_endio_sec; ··· 1128 bios = bios->bi_next; 1129 bio->bi_next = NULL; 1130 1131 - /* strip off REQ_UNPLUG unless it is the last bio */ 1132 - if (bios) 1133 - bio->bi_rw &= ~REQ_UNPLUG; 1134 - 1135 drbd_generic_make_request(mdev, fault_type, bio); 1136 } while (bios); 1137 - maybe_kick_lo(mdev); 1138 return 0; 1139 1140 fail: ··· 1148 struct drbd_epoch *epoch; 1149 1150 inc_unacked(mdev); 1151 - 1152 - if (mdev->net_conf->wire_protocol != DRBD_PROT_C) 1153 - drbd_kick_lo(mdev); 1154 1155 mdev->current_epoch->barrier_nr = p->barrier; 1156 rv = drbd_may_finish_epoch(mdev, mdev->current_epoch, EV_GOT_BARRIER_NR); ··· 1615 { 1616 if (mdev->agreed_pro_version >= 95) 1617 return (dpf & DP_RW_SYNC ? REQ_SYNC : 0) | 1618 - (dpf & DP_UNPLUG ? REQ_UNPLUG : 0) | 1619 (dpf & DP_FUA ? REQ_FUA : 0) | 1620 (dpf & DP_FLUSH ? REQ_FUA : 0) | 1621 (dpf & DP_DISCARD ? REQ_DISCARD : 0); 1622 else 1623 - return dpf & DP_RW_SYNC ? (REQ_SYNC | REQ_UNPLUG) : 0; 1624 } 1625 1626 /* mirrored write */ ··· 3534 3535 static int receive_UnplugRemote(struct drbd_conf *mdev, enum drbd_packets cmd, unsigned int data_size) 3536 { 3537 - if (mdev->state.disk >= D_INCONSISTENT) 3538 - drbd_kick_lo(mdev); 3539 - 3540 /* Make sure we've acked all the TCP data associated 3541 * with the data requests being unplugged */ 3542 drbd_tcp_quickack(mdev->data.socket);
··· 187 return NULL; 188 } 189 190 static void reclaim_net_ee(struct drbd_conf *mdev, struct list_head *to_be_freed) 191 { 192 struct drbd_epoch_entry *e; ··· 219 LIST_HEAD(reclaimed); 220 struct drbd_epoch_entry *e, *t; 221 222 spin_lock_irq(&mdev->req_lock); 223 reclaim_net_ee(mdev, &reclaimed); 224 spin_unlock_irq(&mdev->req_lock); ··· 436 while (!list_empty(head)) { 437 prepare_to_wait(&mdev->ee_wait, &wait, TASK_UNINTERRUPTIBLE); 438 spin_unlock_irq(&mdev->req_lock); 439 + io_schedule(); 440 finish_wait(&mdev->ee_wait, &wait); 441 spin_lock_irq(&mdev->req_lock); 442 } ··· 1111 /* > e->sector, unless this is the first bio */ 1112 bio->bi_sector = sector; 1113 bio->bi_bdev = mdev->ldev->backing_bdev; 1114 bio->bi_rw = rw; 1115 bio->bi_private = e; 1116 bio->bi_end_io = drbd_endio_sec; ··· 1141 bios = bios->bi_next; 1142 bio->bi_next = NULL; 1143 1144 drbd_generic_make_request(mdev, fault_type, bio); 1145 } while (bios); 1146 return 0; 1147 1148 fail: ··· 1166 struct drbd_epoch *epoch; 1167 1168 inc_unacked(mdev); 1169 1170 mdev->current_epoch->barrier_nr = p->barrier; 1171 rv = drbd_may_finish_epoch(mdev, mdev->current_epoch, EV_GOT_BARRIER_NR); ··· 1636 { 1637 if (mdev->agreed_pro_version >= 95) 1638 return (dpf & DP_RW_SYNC ? REQ_SYNC : 0) | 1639 (dpf & DP_FUA ? REQ_FUA : 0) | 1640 (dpf & DP_FLUSH ? REQ_FUA : 0) | 1641 (dpf & DP_DISCARD ? REQ_DISCARD : 0); 1642 else 1643 + return dpf & DP_RW_SYNC ? REQ_SYNC : 0; 1644 } 1645 1646 /* mirrored write */ ··· 3556 3557 static int receive_UnplugRemote(struct drbd_conf *mdev, enum drbd_packets cmd, unsigned int data_size) 3558 { 3559 /* Make sure we've acked all the TCP data associated 3560 * with the data requests being unplugged */ 3561 drbd_tcp_quickack(mdev->data.socket);
-4
drivers/block/drbd/drbd_req.c
··· 960 bio_endio(req->private_bio, -EIO); 961 } 962 963 - /* we need to plug ALWAYS since we possibly need to kick lo_dev. 964 - * we plug after submit, so we won't miss an unplug event */ 965 - drbd_plug_device(mdev); 966 - 967 return 0; 968 969 fail_conflicting:
··· 960 bio_endio(req->private_bio, -EIO); 961 } 962 963 return 0; 964 965 fail_conflicting:
-1
drivers/block/drbd/drbd_worker.c
··· 792 * queue (or even the read operations for those packets 793 * is not finished by now). Retry in 100ms. */ 794 795 - drbd_kick_lo(mdev); 796 __set_current_state(TASK_INTERRUPTIBLE); 797 schedule_timeout(HZ / 10); 798 w = kmalloc(sizeof(struct drbd_work), GFP_ATOMIC);
··· 792 * queue (or even the read operations for those packets 793 * is not finished by now). Retry in 100ms. */ 794 795 __set_current_state(TASK_INTERRUPTIBLE); 796 schedule_timeout(HZ / 10); 797 w = kmalloc(sizeof(struct drbd_work), GFP_ATOMIC);
-18
drivers/block/drbd/drbd_wrappers.h
··· 45 generic_make_request(bio); 46 } 47 48 - static inline void drbd_plug_device(struct drbd_conf *mdev) 49 - { 50 - struct request_queue *q; 51 - q = bdev_get_queue(mdev->this_bdev); 52 - 53 - spin_lock_irq(q->queue_lock); 54 - 55 - /* XXX the check on !blk_queue_plugged is redundant, 56 - * implicitly checked in blk_plug_device */ 57 - 58 - if (!blk_queue_plugged(q)) { 59 - blk_plug_device(q); 60 - del_timer(&q->unplug_timer); 61 - /* unplugging should not happen automatically... */ 62 - } 63 - spin_unlock_irq(q->queue_lock); 64 - } 65 - 66 static inline int drbd_crypto_is_hash(struct crypto_tfm *tfm) 67 { 68 return (crypto_tfm_alg_type(tfm) & CRYPTO_ALG_TYPE_HASH_MASK)
··· 45 generic_make_request(bio); 46 } 47 48 static inline int drbd_crypto_is_hash(struct crypto_tfm *tfm) 49 { 50 return (crypto_tfm_alg_type(tfm) & CRYPTO_ALG_TYPE_HASH_MASK)
+6 -5
drivers/block/floppy.c
··· 3770 /* 3771 * Check if the disk has been changed or if a change has been faked. 3772 */ 3773 - static int check_floppy_change(struct gendisk *disk) 3774 { 3775 int drive = (long)disk->private_data; 3776 3777 if (test_bit(FD_DISK_CHANGED_BIT, &UDRS->flags) || 3778 test_bit(FD_VERIFY_BIT, &UDRS->flags)) 3779 - return 1; 3780 3781 if (time_after(jiffies, UDRS->last_checked + UDP->checkfreq)) { 3782 lock_fdc(drive, false); ··· 3789 test_bit(FD_VERIFY_BIT, &UDRS->flags) || 3790 test_bit(drive, &fake_change) || 3791 drive_no_geom(drive)) 3792 - return 1; 3793 return 0; 3794 } 3795 ··· 3838 bio.bi_end_io = floppy_rb0_complete; 3839 3840 submit_bio(READ, &bio); 3841 - generic_unplug_device(bdev_get_queue(bdev)); 3842 process_fd_request(); 3843 wait_for_completion(&complete); 3844 ··· 3898 .release = floppy_release, 3899 .ioctl = fd_ioctl, 3900 .getgeo = fd_getgeo, 3901 - .media_changed = check_floppy_change, 3902 .revalidate_disk = floppy_revalidate, 3903 }; 3904 ··· 4205 disks[dr]->major = FLOPPY_MAJOR; 4206 disks[dr]->first_minor = TOMINOR(dr); 4207 disks[dr]->fops = &floppy_fops; 4208 sprintf(disks[dr]->disk_name, "fd%d", dr); 4209 4210 init_timer(&motor_off_timer[dr]);
··· 3770 /* 3771 * Check if the disk has been changed or if a change has been faked. 3772 */ 3773 + static unsigned int floppy_check_events(struct gendisk *disk, 3774 + unsigned int clearing) 3775 { 3776 int drive = (long)disk->private_data; 3777 3778 if (test_bit(FD_DISK_CHANGED_BIT, &UDRS->flags) || 3779 test_bit(FD_VERIFY_BIT, &UDRS->flags)) 3780 + return DISK_EVENT_MEDIA_CHANGE; 3781 3782 if (time_after(jiffies, UDRS->last_checked + UDP->checkfreq)) { 3783 lock_fdc(drive, false); ··· 3788 test_bit(FD_VERIFY_BIT, &UDRS->flags) || 3789 test_bit(drive, &fake_change) || 3790 drive_no_geom(drive)) 3791 + return DISK_EVENT_MEDIA_CHANGE; 3792 return 0; 3793 } 3794 ··· 3837 bio.bi_end_io = floppy_rb0_complete; 3838 3839 submit_bio(READ, &bio); 3840 process_fd_request(); 3841 wait_for_completion(&complete); 3842 ··· 3898 .release = floppy_release, 3899 .ioctl = fd_ioctl, 3900 .getgeo = fd_getgeo, 3901 + .check_events = floppy_check_events, 3902 .revalidate_disk = floppy_revalidate, 3903 }; 3904 ··· 4205 disks[dr]->major = FLOPPY_MAJOR; 4206 disks[dr]->first_minor = TOMINOR(dr); 4207 disks[dr]->fops = &floppy_fops; 4208 + disks[dr]->events = DISK_EVENT_MEDIA_CHANGE; 4209 sprintf(disks[dr]->disk_name, "fd%d", dr); 4210 4211 init_timer(&motor_off_timer[dr]);
-16
drivers/block/loop.c
··· 540 return 0; 541 } 542 543 - /* 544 - * kick off io on the underlying address space 545 - */ 546 - static void loop_unplug(struct request_queue *q) 547 - { 548 - struct loop_device *lo = q->queuedata; 549 - 550 - queue_flag_clear_unlocked(QUEUE_FLAG_PLUGGED, q); 551 - blk_run_address_space(lo->lo_backing_file->f_mapping); 552 - } 553 - 554 struct switch_request { 555 struct file *file; 556 struct completion wait; ··· 906 */ 907 blk_queue_make_request(lo->lo_queue, loop_make_request); 908 lo->lo_queue->queuedata = lo; 909 - lo->lo_queue->unplug_fn = loop_unplug; 910 911 if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync) 912 blk_queue_flush(lo->lo_queue, REQ_FLUSH); ··· 1007 1008 kthread_stop(lo->lo_thread); 1009 1010 - lo->lo_queue->unplug_fn = NULL; 1011 lo->lo_backing_file = NULL; 1012 1013 loop_release_xfer(lo); ··· 1623 1624 static void loop_free(struct loop_device *lo) 1625 { 1626 - if (!lo->lo_queue->queue_lock) 1627 - lo->lo_queue->queue_lock = &lo->lo_queue->__queue_lock; 1628 - 1629 blk_cleanup_queue(lo->lo_queue); 1630 put_disk(lo->lo_disk); 1631 list_del(&lo->lo_list);
··· 540 return 0; 541 } 542 543 struct switch_request { 544 struct file *file; 545 struct completion wait; ··· 917 */ 918 blk_queue_make_request(lo->lo_queue, loop_make_request); 919 lo->lo_queue->queuedata = lo; 920 921 if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync) 922 blk_queue_flush(lo->lo_queue, REQ_FLUSH); ··· 1019 1020 kthread_stop(lo->lo_thread); 1021 1022 lo->lo_backing_file = NULL; 1023 1024 loop_release_xfer(lo); ··· 1636 1637 static void loop_free(struct loop_device *lo) 1638 { 1639 blk_cleanup_queue(lo->lo_queue); 1640 put_disk(lo->lo_disk); 1641 list_del(&lo->lo_list);
+11 -7
drivers/block/paride/pcd.c
··· 172 static int pcd_open(struct cdrom_device_info *cdi, int purpose); 173 static void pcd_release(struct cdrom_device_info *cdi); 174 static int pcd_drive_status(struct cdrom_device_info *cdi, int slot_nr); 175 - static int pcd_media_changed(struct cdrom_device_info *cdi, int slot_nr); 176 static int pcd_tray_move(struct cdrom_device_info *cdi, int position); 177 static int pcd_lock_door(struct cdrom_device_info *cdi, int lock); 178 static int pcd_drive_reset(struct cdrom_device_info *cdi); ··· 258 return ret; 259 } 260 261 - static int pcd_block_media_changed(struct gendisk *disk) 262 { 263 struct pcd_unit *cd = disk->private_data; 264 - return cdrom_media_changed(&cd->info); 265 } 266 267 static const struct block_device_operations pcd_bdops = { ··· 270 .open = pcd_block_open, 271 .release = pcd_block_release, 272 .ioctl = pcd_block_ioctl, 273 - .media_changed = pcd_block_media_changed, 274 }; 275 276 static struct cdrom_device_ops pcd_dops = { 277 .open = pcd_open, 278 .release = pcd_release, 279 .drive_status = pcd_drive_status, 280 - .media_changed = pcd_media_changed, 281 .tray_move = pcd_tray_move, 282 .lock_door = pcd_lock_door, 283 .get_mcn = pcd_get_mcn, ··· 320 disk->first_minor = unit; 321 strcpy(disk->disk_name, cd->name); /* umm... */ 322 disk->fops = &pcd_bdops; 323 } 324 } 325 ··· 505 506 #define DBMSG(msg) ((verbose>1)?(msg):NULL) 507 508 - static int pcd_media_changed(struct cdrom_device_info *cdi, int slot_nr) 509 { 510 struct pcd_unit *cd = cdi->handle; 511 int res = cd->changed; 512 if (res) 513 cd->changed = 0; 514 - return res; 515 } 516 517 static int pcd_lock_door(struct cdrom_device_info *cdi, int lock)
··· 172 static int pcd_open(struct cdrom_device_info *cdi, int purpose); 173 static void pcd_release(struct cdrom_device_info *cdi); 174 static int pcd_drive_status(struct cdrom_device_info *cdi, int slot_nr); 175 + static unsigned int pcd_check_events(struct cdrom_device_info *cdi, 176 + unsigned int clearing, int slot_nr); 177 static int pcd_tray_move(struct cdrom_device_info *cdi, int position); 178 static int pcd_lock_door(struct cdrom_device_info *cdi, int lock); 179 static int pcd_drive_reset(struct cdrom_device_info *cdi); ··· 257 return ret; 258 } 259 260 + static unsigned int pcd_block_check_events(struct gendisk *disk, 261 + unsigned int clearing) 262 { 263 struct pcd_unit *cd = disk->private_data; 264 + return cdrom_check_events(&cd->info, clearing); 265 } 266 267 static const struct block_device_operations pcd_bdops = { ··· 268 .open = pcd_block_open, 269 .release = pcd_block_release, 270 .ioctl = pcd_block_ioctl, 271 + .check_events = pcd_block_check_events, 272 }; 273 274 static struct cdrom_device_ops pcd_dops = { 275 .open = pcd_open, 276 .release = pcd_release, 277 .drive_status = pcd_drive_status, 278 + .check_events = pcd_check_events, 279 .tray_move = pcd_tray_move, 280 .lock_door = pcd_lock_door, 281 .get_mcn = pcd_get_mcn, ··· 318 disk->first_minor = unit; 319 strcpy(disk->disk_name, cd->name); /* umm... */ 320 disk->fops = &pcd_bdops; 321 + disk->events = DISK_EVENT_MEDIA_CHANGE; 322 } 323 } 324 ··· 502 503 #define DBMSG(msg) ((verbose>1)?(msg):NULL) 504 505 + static unsigned int pcd_check_events(struct cdrom_device_info *cdi, 506 + unsigned int clearing, int slot_nr) 507 { 508 struct pcd_unit *cd = cdi->handle; 509 int res = cd->changed; 510 if (res) 511 cd->changed = 0; 512 + return res ? DISK_EVENT_MEDIA_CHANGE : 0; 513 } 514 515 static int pcd_lock_door(struct cdrom_device_info *cdi, int lock)
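For the CD-ROM style drivers (pcd here, gdrom and viocd further down) the conversion happens at two layers: the block_device_operations hook forwards to cdrom_check_events(), and the cdrom_device_ops hook itself now takes a clearing mask plus slot number and returns DISK_EVENT_* bits instead of a 0/1 media-changed flag. A condensed sketch modeled on the pcd hunk, with my_cd standing in for the driver's per-drive structure:

    /* cdrom_device_ops level: poll the hardware-specific changed flag. */
    static unsigned int my_cd_check_events(struct cdrom_device_info *cdi,
                                           unsigned int clearing, int slot_nr)
    {
            struct my_cd *cd = cdi->handle;  /* hypothetical per-drive state */
            int changed = cd->changed;

            cd->changed = 0;
            return changed ? DISK_EVENT_MEDIA_CHANGE : 0;
    }

    /* block_device_operations level: defer to the cdrom midlayer. */
    static unsigned int my_block_check_events(struct gendisk *disk,
                                              unsigned int clearing)
    {
            struct my_cd *cd = disk->private_data;

            return cdrom_check_events(&cd->info, clearing);
    }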
+4 -3
drivers/block/paride/pd.c
··· 794 return 0; 795 } 796 797 - static int pd_check_media(struct gendisk *p) 798 { 799 struct pd_unit *disk = p->private_data; 800 int r; ··· 803 pd_special_command(disk, pd_media_check); 804 r = disk->changed; 805 disk->changed = 0; 806 - return r; 807 } 808 809 static int pd_revalidate(struct gendisk *p) ··· 822 .release = pd_release, 823 .ioctl = pd_ioctl, 824 .getgeo = pd_getgeo, 825 - .media_changed = pd_check_media, 826 .revalidate_disk= pd_revalidate 827 }; 828 ··· 837 p->fops = &pd_fops; 838 p->major = major; 839 p->first_minor = (disk - pd) << PD_BITS; 840 disk->gd = p; 841 p->private_data = disk; 842 p->queue = pd_queue;
··· 794 return 0; 795 } 796 797 + static unsigned int pd_check_events(struct gendisk *p, unsigned int clearing) 798 { 799 struct pd_unit *disk = p->private_data; 800 int r; ··· 803 pd_special_command(disk, pd_media_check); 804 r = disk->changed; 805 disk->changed = 0; 806 + return r ? DISK_EVENT_MEDIA_CHANGE : 0; 807 } 808 809 static int pd_revalidate(struct gendisk *p) ··· 822 .release = pd_release, 823 .ioctl = pd_ioctl, 824 .getgeo = pd_getgeo, 825 + .check_events = pd_check_events, 826 .revalidate_disk= pd_revalidate 827 }; 828 ··· 837 p->fops = &pd_fops; 838 p->major = major; 839 p->first_minor = (disk - pd) << PD_BITS; 840 + p->events = DISK_EVENT_MEDIA_CHANGE; 841 disk->gd = p; 842 p->private_data = disk; 843 p->queue = pd_queue;
+6 -4
drivers/block/paride/pf.c
··· 243 static int pf_identify(struct pf_unit *pf); 244 static void pf_lock(struct pf_unit *pf, int func); 245 static void pf_eject(struct pf_unit *pf); 246 - static int pf_check_media(struct gendisk *disk); 247 248 static char pf_scratch[512]; /* scratch block buffer */ 249 ··· 271 .release = pf_release, 272 .ioctl = pf_ioctl, 273 .getgeo = pf_getgeo, 274 - .media_changed = pf_check_media, 275 }; 276 277 static void __init pf_init_units(void) ··· 294 disk->first_minor = unit; 295 strcpy(disk->disk_name, pf->name); 296 disk->fops = &pf_fops; 297 if (!(*drives[unit])[D_PRT]) 298 pf_drive_count++; 299 } ··· 379 380 } 381 382 - static int pf_check_media(struct gendisk *disk) 383 { 384 - return 1; 385 } 386 387 static inline int status_reg(struct pf_unit *pf)
··· 243 static int pf_identify(struct pf_unit *pf); 244 static void pf_lock(struct pf_unit *pf, int func); 245 static void pf_eject(struct pf_unit *pf); 246 + static unsigned int pf_check_events(struct gendisk *disk, 247 + unsigned int clearing); 248 249 static char pf_scratch[512]; /* scratch block buffer */ 250 ··· 270 .release = pf_release, 271 .ioctl = pf_ioctl, 272 .getgeo = pf_getgeo, 273 + .check_events = pf_check_events, 274 }; 275 276 static void __init pf_init_units(void) ··· 293 disk->first_minor = unit; 294 strcpy(disk->disk_name, pf->name); 295 disk->fops = &pf_fops; 296 + disk->events = DISK_EVENT_MEDIA_CHANGE; 297 if (!(*drives[unit])[D_PRT]) 298 pf_drive_count++; 299 } ··· 377 378 } 379 380 + static unsigned int pf_check_events(struct gendisk *disk, unsigned int clearing) 381 { 382 + return DISK_EVENT_MEDIA_CHANGE; 383 } 384 385 static inline int status_reg(struct pf_unit *pf)
+9 -6
drivers/block/pktcdvd.c
··· 1606 min_sleep_time = pkt->sleep_time; 1607 } 1608 1609 - generic_unplug_device(bdev_get_queue(pd->bdev)); 1610 - 1611 VPRINTK("kcdrwd: sleeping\n"); 1612 residue = schedule_timeout(min_sleep_time); 1613 VPRINTK("kcdrwd: wake up\n"); ··· 2794 return ret; 2795 } 2796 2797 - static int pkt_media_changed(struct gendisk *disk) 2798 { 2799 struct pktcdvd_device *pd = disk->private_data; 2800 struct gendisk *attached_disk; ··· 2805 if (!pd->bdev) 2806 return 0; 2807 attached_disk = pd->bdev->bd_disk; 2808 - if (!attached_disk) 2809 return 0; 2810 - return attached_disk->fops->media_changed(attached_disk); 2811 } 2812 2813 static const struct block_device_operations pktcdvd_ops = { ··· 2815 .open = pkt_open, 2816 .release = pkt_close, 2817 .ioctl = pkt_ioctl, 2818 - .media_changed = pkt_media_changed, 2819 }; 2820 2821 static char *pktcdvd_devnode(struct gendisk *gd, mode_t *mode) ··· 2887 ret = pkt_new_dev(pd, dev); 2888 if (ret) 2889 goto out_new_dev; 2890 2891 add_disk(disk); 2892
··· 1606 min_sleep_time = pkt->sleep_time; 1607 } 1608 1609 VPRINTK("kcdrwd: sleeping\n"); 1610 residue = schedule_timeout(min_sleep_time); 1611 VPRINTK("kcdrwd: wake up\n"); ··· 2796 return ret; 2797 } 2798 2799 + static unsigned int pkt_check_events(struct gendisk *disk, 2800 + unsigned int clearing) 2801 { 2802 struct pktcdvd_device *pd = disk->private_data; 2803 struct gendisk *attached_disk; ··· 2806 if (!pd->bdev) 2807 return 0; 2808 attached_disk = pd->bdev->bd_disk; 2809 + if (!attached_disk || !attached_disk->fops->check_events) 2810 return 0; 2811 + return attached_disk->fops->check_events(attached_disk, clearing); 2812 } 2813 2814 static const struct block_device_operations pktcdvd_ops = { ··· 2816 .open = pkt_open, 2817 .release = pkt_close, 2818 .ioctl = pkt_ioctl, 2819 + .check_events = pkt_check_events, 2820 }; 2821 2822 static char *pktcdvd_devnode(struct gendisk *gd, mode_t *mode) ··· 2888 ret = pkt_new_dev(pd, dev); 2889 if (ret) 2890 goto out_new_dev; 2891 + 2892 + /* inherit events of the host device */ 2893 + disk->events = pd->bdev->bd_disk->events; 2894 + disk->async_events = pd->bdev->bd_disk->async_events; 2895 2896 add_disk(disk); 2897
+5 -3
drivers/block/swim.c
··· 741 return 0; 742 } 743 744 - static int floppy_check_change(struct gendisk *disk) 745 { 746 struct floppy_state *fs = disk->private_data; 747 748 - return fs->ejected; 749 } 750 751 static int floppy_revalidate(struct gendisk *disk) ··· 773 .release = floppy_release, 774 .ioctl = floppy_ioctl, 775 .getgeo = floppy_getgeo, 776 - .media_changed = floppy_check_change, 777 .revalidate_disk = floppy_revalidate, 778 }; 779 ··· 858 swd->unit[drive].disk->first_minor = drive; 859 sprintf(swd->unit[drive].disk->disk_name, "fd%d", drive); 860 swd->unit[drive].disk->fops = &floppy_fops; 861 swd->unit[drive].disk->private_data = &swd->unit[drive]; 862 swd->unit[drive].disk->queue = swd->queue; 863 set_capacity(swd->unit[drive].disk, 2880);
··· 741 return 0; 742 } 743 744 + static unsigned int floppy_check_events(struct gendisk *disk, 745 + unsigned int clearing) 746 { 747 struct floppy_state *fs = disk->private_data; 748 749 + return fs->ejected ? DISK_EVENT_MEDIA_CHANGE : 0; 750 } 751 752 static int floppy_revalidate(struct gendisk *disk) ··· 772 .release = floppy_release, 773 .ioctl = floppy_ioctl, 774 .getgeo = floppy_getgeo, 775 + .check_events = floppy_check_events, 776 .revalidate_disk = floppy_revalidate, 777 }; 778 ··· 857 swd->unit[drive].disk->first_minor = drive; 858 sprintf(swd->unit[drive].disk->disk_name, "fd%d", drive); 859 swd->unit[drive].disk->fops = &floppy_fops; 860 + swd->unit[drive].disk->events = DISK_EVENT_MEDIA_CHANGE; 861 swd->unit[drive].disk->private_data = &swd->unit[drive]; 862 swd->unit[drive].disk->queue = swd->queue; 863 set_capacity(swd->unit[drive].disk, 2880);
+7 -4
drivers/block/swim3.c
··· 250 unsigned int cmd, unsigned long param); 251 static int floppy_open(struct block_device *bdev, fmode_t mode); 252 static int floppy_release(struct gendisk *disk, fmode_t mode); 253 - static int floppy_check_change(struct gendisk *disk); 254 static int floppy_revalidate(struct gendisk *disk); 255 256 static bool swim3_end_request(int err, unsigned int nr_bytes) ··· 976 return 0; 977 } 978 979 - static int floppy_check_change(struct gendisk *disk) 980 { 981 struct floppy_state *fs = disk->private_data; 982 - return fs->ejected; 983 } 984 985 static int floppy_revalidate(struct gendisk *disk) ··· 1027 .open = floppy_unlocked_open, 1028 .release = floppy_release, 1029 .ioctl = floppy_ioctl, 1030 - .media_changed = floppy_check_change, 1031 .revalidate_disk= floppy_revalidate, 1032 }; 1033 ··· 1163 disk->major = FLOPPY_MAJOR; 1164 disk->first_minor = i; 1165 disk->fops = &floppy_fops; 1166 disk->private_data = &floppy_states[i]; 1167 disk->queue = swim3_queue; 1168 disk->flags |= GENHD_FL_REMOVABLE;
··· 250 unsigned int cmd, unsigned long param); 251 static int floppy_open(struct block_device *bdev, fmode_t mode); 252 static int floppy_release(struct gendisk *disk, fmode_t mode); 253 + static unsigned int floppy_check_events(struct gendisk *disk, 254 + unsigned int clearing); 255 static int floppy_revalidate(struct gendisk *disk); 256 257 static bool swim3_end_request(int err, unsigned int nr_bytes) ··· 975 return 0; 976 } 977 978 + static unsigned int floppy_check_events(struct gendisk *disk, 979 + unsigned int clearing) 980 { 981 struct floppy_state *fs = disk->private_data; 982 + return fs->ejected ? DISK_EVENT_MEDIA_CHANGE : 0; 983 } 984 985 static int floppy_revalidate(struct gendisk *disk) ··· 1025 .open = floppy_unlocked_open, 1026 .release = floppy_release, 1027 .ioctl = floppy_ioctl, 1028 + .check_events = floppy_check_events, 1029 .revalidate_disk= floppy_revalidate, 1030 }; 1031 ··· 1161 disk->major = FLOPPY_MAJOR; 1162 disk->first_minor = i; 1163 disk->fops = &floppy_fops; 1164 + disk->events = DISK_EVENT_MEDIA_CHANGE; 1165 disk->private_data = &floppy_states[i]; 1166 disk->queue = swim3_queue; 1167 disk->flags |= GENHD_FL_REMOVABLE;
+6 -4
drivers/block/ub.c
··· 1788 * 1789 * The return code is bool! 1790 */ 1791 - static int ub_bd_media_changed(struct gendisk *disk) 1792 { 1793 struct ub_lun *lun = disk->private_data; 1794 ··· 1807 */ 1808 if (ub_sync_tur(lun->udev, lun) != 0) { 1809 lun->changed = 1; 1810 - return 1; 1811 } 1812 1813 - return lun->changed; 1814 } 1815 1816 static const struct block_device_operations ub_bd_fops = { ··· 1818 .open = ub_bd_unlocked_open, 1819 .release = ub_bd_release, 1820 .ioctl = ub_bd_ioctl, 1821 - .media_changed = ub_bd_media_changed, 1822 .revalidate_disk = ub_bd_revalidate, 1823 }; 1824 ··· 2334 disk->major = UB_MAJOR; 2335 disk->first_minor = lun->id * UB_PARTS_PER_LUN; 2336 disk->fops = &ub_bd_fops; 2337 disk->private_data = lun; 2338 disk->driverfs_dev = &sc->intf->dev; 2339
··· 1788 * 1789 * The return code is bool! 1790 */ 1791 + static unsigned int ub_bd_check_events(struct gendisk *disk, 1792 + unsigned int clearing) 1793 { 1794 struct ub_lun *lun = disk->private_data; 1795 ··· 1806 */ 1807 if (ub_sync_tur(lun->udev, lun) != 0) { 1808 lun->changed = 1; 1809 + return DISK_EVENT_MEDIA_CHANGE; 1810 } 1811 1812 + return lun->changed ? DISK_EVENT_MEDIA_CHANGE : 0; 1813 } 1814 1815 static const struct block_device_operations ub_bd_fops = { ··· 1817 .open = ub_bd_unlocked_open, 1818 .release = ub_bd_release, 1819 .ioctl = ub_bd_ioctl, 1820 + .check_events = ub_bd_check_events, 1821 .revalidate_disk = ub_bd_revalidate, 1822 }; 1823 ··· 2333 disk->major = UB_MAJOR; 2334 disk->first_minor = lun->id * UB_PARTS_PER_LUN; 2335 disk->fops = &ub_bd_fops; 2336 + disk->events = DISK_EVENT_MEDIA_CHANGE; 2337 disk->private_data = lun; 2338 disk->driverfs_dev = &sc->intf->dev; 2339
+1 -25
drivers/block/umem.c
··· 241 * 242 * Whenever IO on the active page completes, the Ready page is activated 243 * and the ex-Active page is clean out and made Ready. 244 - * Otherwise the Ready page is only activated when it becomes full, or 245 - * when mm_unplug_device is called via the unplug_io_fn. 246 * 247 * If a request arrives while both pages a full, it is queued, and b_rdev is 248 * overloaded to record whether it was a read or a write. ··· 330 page->headcnt = 0; 331 page->bio = NULL; 332 page->biotail = &page->bio; 333 - } 334 - 335 - static void mm_unplug_device(struct request_queue *q) 336 - { 337 - struct cardinfo *card = q->queuedata; 338 - unsigned long flags; 339 - 340 - spin_lock_irqsave(&card->lock, flags); 341 - if (blk_remove_plug(q)) 342 - activate(card); 343 - spin_unlock_irqrestore(&card->lock, flags); 344 } 345 346 /* ··· 523 *card->biotail = bio; 524 bio->bi_next = NULL; 525 card->biotail = &bio->bi_next; 526 - blk_plug_device(q); 527 spin_unlock_irq(&card->lock); 528 529 return 0; ··· 766 return 0; 767 } 768 769 - /* 770 - * Future support for removable devices 771 - */ 772 - static int mm_check_change(struct gendisk *disk) 773 - { 774 - /* struct cardinfo *dev = disk->private_data; */ 775 - return 0; 776 - } 777 - 778 static const struct block_device_operations mm_fops = { 779 .owner = THIS_MODULE, 780 .getgeo = mm_getgeo, 781 .revalidate_disk = mm_revalidate, 782 - .media_changed = mm_check_change, 783 }; 784 785 static int __devinit mm_pci_probe(struct pci_dev *dev, ··· 884 blk_queue_make_request(card->queue, mm_make_request); 885 card->queue->queue_lock = &card->lock; 886 card->queue->queuedata = card; 887 - card->queue->unplug_fn = mm_unplug_device; 888 889 tasklet_init(&card->tasklet, process_page, (unsigned long)card); 890
··· 241 * 242 * Whenever IO on the active page completes, the Ready page is activated 243 * and the ex-Active page is clean out and made Ready. 244 + * Otherwise the Ready page is only activated when it becomes full. 245 * 246 * If a request arrives while both pages a full, it is queued, and b_rdev is 247 * overloaded to record whether it was a read or a write. ··· 331 page->headcnt = 0; 332 page->bio = NULL; 333 page->biotail = &page->bio; 334 } 335 336 /* ··· 535 *card->biotail = bio; 536 bio->bi_next = NULL; 537 card->biotail = &bio->bi_next; 538 spin_unlock_irq(&card->lock); 539 540 return 0; ··· 779 return 0; 780 } 781 782 static const struct block_device_operations mm_fops = { 783 .owner = THIS_MODULE, 784 .getgeo = mm_getgeo, 785 .revalidate_disk = mm_revalidate, 786 }; 787 788 static int __devinit mm_pci_probe(struct pci_dev *dev, ··· 907 blk_queue_make_request(card->queue, mm_make_request); 908 card->queue->queue_lock = &card->lock; 909 card->queue->queuedata = card; 910 911 tasklet_init(&card->tasklet, process_page, (unsigned long)card); 912
+5 -4
drivers/block/xsysace.c
··· 867 } 868 } 869 870 - static int ace_media_changed(struct gendisk *gd) 871 { 872 struct ace_device *ace = gd->private_data; 873 - dev_dbg(ace->dev, "ace_media_changed(): %i\n", ace->media_change); 874 875 - return ace->media_change; 876 } 877 878 static int ace_revalidate_disk(struct gendisk *gd) ··· 953 .owner = THIS_MODULE, 954 .open = ace_open, 955 .release = ace_release, 956 - .media_changed = ace_media_changed, 957 .revalidate_disk = ace_revalidate_disk, 958 .getgeo = ace_getgeo, 959 }; ··· 1005 ace->gd->major = ace_major; 1006 ace->gd->first_minor = ace->id * ACE_NUM_MINORS; 1007 ace->gd->fops = &ace_fops; 1008 ace->gd->queue = ace->queue; 1009 ace->gd->private_data = ace; 1010 snprintf(ace->gd->disk_name, 32, "xs%c", ace->id + 'a');
··· 867 } 868 } 869 870 + static unsigned int ace_check_events(struct gendisk *gd, unsigned int clearing) 871 { 872 struct ace_device *ace = gd->private_data; 873 + dev_dbg(ace->dev, "ace_check_events(): %i\n", ace->media_change); 874 875 + return ace->media_change ? DISK_EVENT_MEDIA_CHANGE : 0; 876 } 877 878 static int ace_revalidate_disk(struct gendisk *gd) ··· 953 .owner = THIS_MODULE, 954 .open = ace_open, 955 .release = ace_release, 956 + .check_events = ace_check_events, 957 .revalidate_disk = ace_revalidate_disk, 958 .getgeo = ace_getgeo, 959 }; ··· 1005 ace->gd->major = ace_major; 1006 ace->gd->first_minor = ace->id * ACE_NUM_MINORS; 1007 ace->gd->fops = &ace_fops; 1008 + ace->gd->events = DISK_EVENT_MEDIA_CHANGE; 1009 ace->gd->queue = ace->queue; 1010 ace->gd->private_data = ace; 1011 snprintf(ace->gd->disk_name, 32, "xs%c", ace->id + 'a');
+10 -6
drivers/cdrom/gdrom.c
··· 395 return CDS_NO_INFO; 396 } 397 398 - static int gdrom_mediachanged(struct cdrom_device_info *cd_info, int ignore) 399 { 400 /* check the sense key */ 401 - return (__raw_readb(GDROM_ERROR_REG) & 0xF0) == 0x60; 402 } 403 404 /* reset the G1 bus */ ··· 485 .open = gdrom_open, 486 .release = gdrom_release, 487 .drive_status = gdrom_drivestatus, 488 - .media_changed = gdrom_mediachanged, 489 .get_last_session = gdrom_get_last_session, 490 .reset = gdrom_hardreset, 491 .audio_ioctl = gdrom_audio_ioctl, ··· 511 return 0; 512 } 513 514 - static int gdrom_bdops_mediachanged(struct gendisk *disk) 515 { 516 - return cdrom_media_changed(gd.cd_info); 517 } 518 519 static int gdrom_bdops_ioctl(struct block_device *bdev, fmode_t mode, ··· 533 .owner = THIS_MODULE, 534 .open = gdrom_bdops_open, 535 .release = gdrom_bdops_release, 536 - .media_changed = gdrom_bdops_mediachanged, 537 .ioctl = gdrom_bdops_ioctl, 538 }; 539 ··· 803 goto probe_fail_cdrom_register; 804 } 805 gd.disk->fops = &gdrom_bdops; 806 /* latch on to the interrupt */ 807 err = gdrom_set_interrupt_handlers(); 808 if (err)
··· 395 return CDS_NO_INFO; 396 } 397 398 + static unsigned int gdrom_check_events(struct cdrom_device_info *cd_info, 399 + unsigned int clearing, int ignore) 400 { 401 /* check the sense key */ 402 + return (__raw_readb(GDROM_ERROR_REG) & 0xF0) == 0x60 ? 403 + DISK_EVENT_MEDIA_CHANGE : 0; 404 } 405 406 /* reset the G1 bus */ ··· 483 .open = gdrom_open, 484 .release = gdrom_release, 485 .drive_status = gdrom_drivestatus, 486 + .check_events = gdrom_check_events, 487 .get_last_session = gdrom_get_last_session, 488 .reset = gdrom_hardreset, 489 .audio_ioctl = gdrom_audio_ioctl, ··· 509 return 0; 510 } 511 512 + static unsigned int gdrom_bdops_check_events(struct gendisk *disk, 513 + unsigned int clearing) 514 { 515 + return cdrom_check_events(gd.cd_info, clearing); 516 } 517 518 static int gdrom_bdops_ioctl(struct block_device *bdev, fmode_t mode, ··· 530 .owner = THIS_MODULE, 531 .open = gdrom_bdops_open, 532 .release = gdrom_bdops_release, 533 + .check_events = gdrom_bdops_check_events, 534 .ioctl = gdrom_bdops_ioctl, 535 }; 536 ··· 800 goto probe_fail_cdrom_register; 801 } 802 gd.disk->fops = &gdrom_bdops; 803 + gd.disk->events = DISK_EVENT_MEDIA_CHANGE; 804 /* latch on to the interrupt */ 805 err = gdrom_set_interrupt_handlers(); 806 if (err)
+10 -7
drivers/cdrom/viocd.c
··· 186 return ret; 187 } 188 189 - static int viocd_blk_media_changed(struct gendisk *disk) 190 { 191 struct disk_info *di = disk->private_data; 192 - return cdrom_media_changed(&di->viocd_info); 193 } 194 195 static const struct block_device_operations viocd_fops = { ··· 198 .open = viocd_blk_open, 199 .release = viocd_blk_release, 200 .ioctl = viocd_blk_ioctl, 201 - .media_changed = viocd_blk_media_changed, 202 }; 203 204 static int viocd_open(struct cdrom_device_info *cdi, int purpose) ··· 321 } 322 } 323 324 - static int viocd_media_changed(struct cdrom_device_info *cdi, int disc_nr) 325 { 326 struct viocd_waitevent we; 327 HvLpEvent_Rc hvrc; ··· 342 if (hvrc != 0) { 343 pr_warning("bad rc on HvCallEvent_signalLpEventFast %d\n", 344 (int)hvrc); 345 - return -EIO; 346 } 347 348 wait_for_completion(&we.com); ··· 356 return 0; 357 } 358 359 - return we.changed; 360 } 361 362 static int viocd_lock_door(struct cdrom_device_info *cdi, int locking) ··· 552 static struct cdrom_device_ops viocd_dops = { 553 .open = viocd_open, 554 .release = viocd_release, 555 - .media_changed = viocd_media_changed, 556 .lock_door = viocd_lock_door, 557 .generic_packet = viocd_packet, 558 .audio_ioctl = viocd_audio_ioctl, ··· 626 gendisk->queue = q; 627 gendisk->fops = &viocd_fops; 628 gendisk->flags = GENHD_FL_CD|GENHD_FL_REMOVABLE; 629 set_capacity(gendisk, 0); 630 gendisk->private_data = d; 631 d->viocd_disk = gendisk;
··· 186 return ret; 187 } 188 189 + static unsigned int viocd_blk_check_events(struct gendisk *disk, 190 + unsigned int clearing) 191 { 192 struct disk_info *di = disk->private_data; 193 + return cdrom_check_events(&di->viocd_info, clearing); 194 } 195 196 static const struct block_device_operations viocd_fops = { ··· 197 .open = viocd_blk_open, 198 .release = viocd_blk_release, 199 .ioctl = viocd_blk_ioctl, 200 + .check_events = viocd_blk_check_events, 201 }; 202 203 static int viocd_open(struct cdrom_device_info *cdi, int purpose) ··· 320 } 321 } 322 323 + static unsigned int viocd_check_events(struct cdrom_device_info *cdi, 324 + unsigned int clearing, int disc_nr) 325 { 326 struct viocd_waitevent we; 327 HvLpEvent_Rc hvrc; ··· 340 if (hvrc != 0) { 341 pr_warning("bad rc on HvCallEvent_signalLpEventFast %d\n", 342 (int)hvrc); 343 + return 0; 344 } 345 346 wait_for_completion(&we.com); ··· 354 return 0; 355 } 356 357 + return we.changed ? DISK_EVENT_MEDIA_CHANGE : 0; 358 } 359 360 static int viocd_lock_door(struct cdrom_device_info *cdi, int locking) ··· 550 static struct cdrom_device_ops viocd_dops = { 551 .open = viocd_open, 552 .release = viocd_release, 553 + .check_events = viocd_check_events, 554 .lock_door = viocd_lock_door, 555 .generic_packet = viocd_packet, 556 .audio_ioctl = viocd_audio_ioctl, ··· 624 gendisk->queue = q; 625 gendisk->fops = &viocd_fops; 626 gendisk->flags = GENHD_FL_CD|GENHD_FL_REMOVABLE; 627 + gendisk->events = DISK_EVENT_MEDIA_CHANGE; 628 set_capacity(gendisk, 0); 629 gendisk->private_data = d; 630 d->viocd_disk = gendisk;
+1 -2
drivers/ide/ide-atapi.c
··· 233 234 drive->hwif->rq = NULL; 235 236 - elv_add_request(drive->queue, &drive->sense_rq, 237 - ELEVATOR_INSERT_FRONT, 0); 238 return 0; 239 } 240 EXPORT_SYMBOL_GPL(ide_queue_sense_rq);
··· 233 234 drive->hwif->rq = NULL; 235 236 + elv_add_request(drive->queue, &drive->sense_rq, ELEVATOR_INSERT_FRONT); 237 return 0; 238 } 239 EXPORT_SYMBOL_GPL(ide_queue_sense_rq);
+8 -15
drivers/ide/ide-cd.c
··· 258 if (time_after(jiffies, info->write_timeout)) 259 return 0; 260 else { 261 - struct request_queue *q = drive->queue; 262 - unsigned long flags; 263 - 264 /* 265 - * take a breather relying on the unplug timer to kick us again 266 */ 267 - 268 - spin_lock_irqsave(q->queue_lock, flags); 269 - blk_plug_device(q); 270 - spin_unlock_irqrestore(q->queue_lock, flags); 271 - 272 return 1; 273 } 274 } ··· 1170 .open = ide_cdrom_open_real, 1171 .release = ide_cdrom_release_real, 1172 .drive_status = ide_cdrom_drive_status, 1173 - .media_changed = ide_cdrom_check_media_change_real, 1174 .tray_move = ide_cdrom_tray_move, 1175 .lock_door = ide_cdrom_lock_door, 1176 .select_speed = ide_cdrom_select_speed, ··· 1507 blk_queue_dma_alignment(q, 31); 1508 blk_queue_update_dma_pad(q, 15); 1509 1510 - q->unplug_delay = max((1 * HZ) / 1000, 1); 1511 - 1512 drive->dev_flags |= IDE_DFLAG_MEDIA_CHANGED; 1513 drive->atapi_flags = IDE_AFLAG_NO_EJECT | ide_cd_flags(id); 1514 ··· 1693 } 1694 1695 1696 - static int idecd_media_changed(struct gendisk *disk) 1697 { 1698 struct cdrom_info *info = ide_drv_g(disk, cdrom_info); 1699 - return cdrom_media_changed(&info->devinfo); 1700 } 1701 1702 static int idecd_revalidate_disk(struct gendisk *disk) ··· 1715 .open = idecd_open, 1716 .release = idecd_release, 1717 .ioctl = idecd_ioctl, 1718 - .media_changed = idecd_media_changed, 1719 .revalidate_disk = idecd_revalidate_disk 1720 }; 1721 ··· 1782 ide_cd_read_toc(drive, &sense); 1783 g->fops = &idecd_ops; 1784 g->flags |= GENHD_FL_REMOVABLE; 1785 add_disk(g); 1786 return 0; 1787
··· 258 if (time_after(jiffies, info->write_timeout)) 259 return 0; 260 else { 261 /* 262 + * take a breather 263 */ 264 + blk_delay_queue(drive->queue, 1); 265 return 1; 266 } 267 } ··· 1177 .open = ide_cdrom_open_real, 1178 .release = ide_cdrom_release_real, 1179 .drive_status = ide_cdrom_drive_status, 1180 + .check_events = ide_cdrom_check_events_real, 1181 .tray_move = ide_cdrom_tray_move, 1182 .lock_door = ide_cdrom_lock_door, 1183 .select_speed = ide_cdrom_select_speed, ··· 1514 blk_queue_dma_alignment(q, 31); 1515 blk_queue_update_dma_pad(q, 15); 1516 1517 drive->dev_flags |= IDE_DFLAG_MEDIA_CHANGED; 1518 drive->atapi_flags = IDE_AFLAG_NO_EJECT | ide_cd_flags(id); 1519 ··· 1702 } 1703 1704 1705 + static unsigned int idecd_check_events(struct gendisk *disk, 1706 + unsigned int clearing) 1707 { 1708 struct cdrom_info *info = ide_drv_g(disk, cdrom_info); 1709 + return cdrom_check_events(&info->devinfo, clearing); 1710 } 1711 1712 static int idecd_revalidate_disk(struct gendisk *disk) ··· 1723 .open = idecd_open, 1724 .release = idecd_release, 1725 .ioctl = idecd_ioctl, 1726 + .check_events = idecd_check_events, 1727 .revalidate_disk = idecd_revalidate_disk 1728 }; 1729 ··· 1790 ide_cd_read_toc(drive, &sense); 1791 g->fops = &idecd_ops; 1792 g->flags |= GENHD_FL_REMOVABLE; 1793 + g->events = DISK_EVENT_MEDIA_CHANGE; 1794 add_disk(g); 1795 return 0; 1796
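The interesting piece in the ide-cd hunk is the write-timeout path: instead of plugging the queue and relying on the unplug timer to restart it, the driver now calls blk_delay_queue(), which re-runs the queue after a short delay. In isolation the replacement is just:

    /* Back off briefly; the block layer re-runs the queue after the delay.
     * The '1' is carried over verbatim from the hunk above. */
    blk_delay_queue(drive->queue, 1);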
+2 -1
drivers/ide/ide-cd.h
··· 111 int ide_cdrom_open_real(struct cdrom_device_info *, int); 112 void ide_cdrom_release_real(struct cdrom_device_info *); 113 int ide_cdrom_drive_status(struct cdrom_device_info *, int); 114 - int ide_cdrom_check_media_change_real(struct cdrom_device_info *, int); 115 int ide_cdrom_tray_move(struct cdrom_device_info *, int); 116 int ide_cdrom_lock_door(struct cdrom_device_info *, int); 117 int ide_cdrom_select_speed(struct cdrom_device_info *, int);
··· 111 int ide_cdrom_open_real(struct cdrom_device_info *, int); 112 void ide_cdrom_release_real(struct cdrom_device_info *); 113 int ide_cdrom_drive_status(struct cdrom_device_info *, int); 114 + unsigned int ide_cdrom_check_events_real(struct cdrom_device_info *, 115 + unsigned int clearing, int slot_nr); 116 int ide_cdrom_tray_move(struct cdrom_device_info *, int); 117 int ide_cdrom_lock_door(struct cdrom_device_info *, int); 118 int ide_cdrom_select_speed(struct cdrom_device_info *, int);
+4 -4
drivers/ide/ide-cd_ioctl.c
··· 79 return CDS_DRIVE_NOT_READY; 80 } 81 82 - int ide_cdrom_check_media_change_real(struct cdrom_device_info *cdi, 83 - int slot_nr) 84 { 85 ide_drive_t *drive = cdi->handle; 86 int retval; ··· 89 (void) cdrom_check_status(drive, NULL); 90 retval = (drive->dev_flags & IDE_DFLAG_MEDIA_CHANGED) ? 1 : 0; 91 drive->dev_flags &= ~IDE_DFLAG_MEDIA_CHANGED; 92 - return retval; 93 } else { 94 - return -EINVAL; 95 } 96 } 97
··· 79 return CDS_DRIVE_NOT_READY; 80 } 81 82 + unsigned int ide_cdrom_check_events_real(struct cdrom_device_info *cdi, 83 + unsigned int clearing, int slot_nr) 84 { 85 ide_drive_t *drive = cdi->handle; 86 int retval; ··· 89 (void) cdrom_check_status(drive, NULL); 90 retval = (drive->dev_flags & IDE_DFLAG_MEDIA_CHANGED) ? 1 : 0; 91 drive->dev_flags &= ~IDE_DFLAG_MEDIA_CHANGED; 92 + return retval ? DISK_EVENT_MEDIA_CHANGE : 0; 93 } else { 94 + return 0; 95 } 96 } 97
+8 -6
drivers/ide/ide-gd.c
··· 285 return 0; 286 } 287 288 - static int ide_gd_media_changed(struct gendisk *disk) 289 { 290 struct ide_disk_obj *idkp = ide_drv_g(disk, ide_disk_obj); 291 ide_drive_t *drive = idkp->drive; 292 - int ret; 293 294 /* do not scan partitions twice if this is a removable device */ 295 if (drive->dev_flags & IDE_DFLAG_ATTACH) { ··· 298 return 0; 299 } 300 301 - ret = !!(drive->dev_flags & IDE_DFLAG_MEDIA_CHANGED); 302 drive->dev_flags &= ~IDE_DFLAG_MEDIA_CHANGED; 303 304 - return ret; 305 } 306 307 static void ide_gd_unlock_native_capacity(struct gendisk *disk) ··· 319 struct ide_disk_obj *idkp = ide_drv_g(disk, ide_disk_obj); 320 ide_drive_t *drive = idkp->drive; 321 322 - if (ide_gd_media_changed(disk)) 323 drive->disk_ops->get_capacity(drive); 324 325 set_capacity(disk, ide_gd_capacity(drive)); ··· 341 .release = ide_gd_release, 342 .ioctl = ide_gd_ioctl, 343 .getgeo = ide_gd_getgeo, 344 - .media_changed = ide_gd_media_changed, 345 .unlock_native_capacity = ide_gd_unlock_native_capacity, 346 .revalidate_disk = ide_gd_revalidate_disk 347 }; ··· 413 if (drive->dev_flags & IDE_DFLAG_REMOVABLE) 414 g->flags = GENHD_FL_REMOVABLE; 415 g->fops = &ide_gd_ops; 416 add_disk(g); 417 return 0; 418
··· 285 return 0; 286 } 287 288 + static unsigned int ide_gd_check_events(struct gendisk *disk, 289 + unsigned int clearing) 290 { 291 struct ide_disk_obj *idkp = ide_drv_g(disk, ide_disk_obj); 292 ide_drive_t *drive = idkp->drive; 293 + bool ret; 294 295 /* do not scan partitions twice if this is a removable device */ 296 if (drive->dev_flags & IDE_DFLAG_ATTACH) { ··· 297 return 0; 298 } 299 300 + ret = drive->dev_flags & IDE_DFLAG_MEDIA_CHANGED; 301 drive->dev_flags &= ~IDE_DFLAG_MEDIA_CHANGED; 302 303 + return ret ? DISK_EVENT_MEDIA_CHANGE : 0; 304 } 305 306 static void ide_gd_unlock_native_capacity(struct gendisk *disk) ··· 318 struct ide_disk_obj *idkp = ide_drv_g(disk, ide_disk_obj); 319 ide_drive_t *drive = idkp->drive; 320 321 + if (ide_gd_check_events(disk, 0)) 322 drive->disk_ops->get_capacity(drive); 323 324 set_capacity(disk, ide_gd_capacity(drive)); ··· 340 .release = ide_gd_release, 341 .ioctl = ide_gd_ioctl, 342 .getgeo = ide_gd_getgeo, 343 + .check_events = ide_gd_check_events, 344 .unlock_native_capacity = ide_gd_unlock_native_capacity, 345 .revalidate_disk = ide_gd_revalidate_disk 346 }; ··· 412 if (drive->dev_flags & IDE_DFLAG_REMOVABLE) 413 g->flags = GENHD_FL_REMOVABLE; 414 g->fops = &ide_gd_ops; 415 + g->events = DISK_EVENT_MEDIA_CHANGE; 416 add_disk(g); 417 return 0; 418
-4
drivers/ide/ide-io.c
··· 549 550 if (rq) 551 blk_requeue_request(q, rq); 552 - if (!elv_queue_empty(q)) 553 - blk_plug_device(q); 554 } 555 556 void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq) ··· 560 561 if (rq) 562 blk_requeue_request(q, rq); 563 - if (!elv_queue_empty(q)) 564 - blk_plug_device(q); 565 566 spin_unlock_irqrestore(q->queue_lock, flags); 567 }
··· 549 550 if (rq) 551 blk_requeue_request(q, rq); 552 } 553 554 void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq) ··· 562 563 if (rq) 564 blk_requeue_request(q, rq); 565 566 spin_unlock_irqrestore(q->queue_lock, flags); 567 }
+1 -1
drivers/ide/ide-park.c
··· 52 rq->cmd[0] = REQ_UNPARK_HEADS; 53 rq->cmd_len = 1; 54 rq->cmd_type = REQ_TYPE_SPECIAL; 55 - elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 1); 56 57 out: 58 return;
··· 52 rq->cmd[0] = REQ_UNPARK_HEADS; 53 rq->cmd_len = 1; 54 rq->cmd_type = REQ_TYPE_SPECIAL; 55 + elv_add_request(q, rq, ELEVATOR_INSERT_FRONT); 56 57 out: 58 return;
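Both this ide-park hunk and the ide-atapi one above are purely mechanical: elv_add_request() has lost its trailing plug argument, so callers now pass only the queue, the request and the insertion position:

    /* old: elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);  (or 1) */
    elv_add_request(q, rq, ELEVATOR_INSERT_FRONT);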
+2 -3
drivers/md/bitmap.c
··· 347 atomic_inc(&bitmap->pending_writes); 348 set_buffer_locked(bh); 349 set_buffer_mapped(bh); 350 - submit_bh(WRITE | REQ_UNPLUG | REQ_SYNC, bh); 351 bh = bh->b_this_page; 352 } 353 ··· 1339 prepare_to_wait(&bitmap->overflow_wait, &__wait, 1340 TASK_UNINTERRUPTIBLE); 1341 spin_unlock_irq(&bitmap->lock); 1342 - md_unplug(bitmap->mddev); 1343 - schedule(); 1344 finish_wait(&bitmap->overflow_wait, &__wait); 1345 continue; 1346 }
··· 347 atomic_inc(&bitmap->pending_writes); 348 set_buffer_locked(bh); 349 set_buffer_mapped(bh); 350 + submit_bh(WRITE | REQ_SYNC, bh); 351 bh = bh->b_this_page; 352 } 353 ··· 1339 prepare_to_wait(&bitmap->overflow_wait, &__wait, 1340 TASK_UNINTERRUPTIBLE); 1341 spin_unlock_irq(&bitmap->lock); 1342 + io_schedule(); 1343 finish_wait(&bitmap->overflow_wait, &__wait); 1344 continue; 1345 }
+1 -8
drivers/md/dm-crypt.c
··· 991 clone->bi_destructor = dm_crypt_bio_destructor; 992 } 993 994 - static void kcryptd_unplug(struct crypt_config *cc) 995 - { 996 - blk_unplug(bdev_get_queue(cc->dev->bdev)); 997 - } 998 - 999 static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp) 1000 { 1001 struct crypt_config *cc = io->target->private; ··· 1003 * one in order to decrypt the whole bio data *afterwards*. 1004 */ 1005 clone = bio_alloc_bioset(gfp, bio_segments(base_bio), cc->bs); 1006 - if (!clone) { 1007 - kcryptd_unplug(cc); 1008 return 1; 1009 - } 1010 1011 crypt_inc_pending(io); 1012
··· 991 clone->bi_destructor = dm_crypt_bio_destructor; 992 } 993 994 static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp) 995 { 996 struct crypt_config *cc = io->target->private; ··· 1008 * one in order to decrypt the whole bio data *afterwards*. 1009 */ 1010 clone = bio_alloc_bioset(gfp, bio_segments(base_bio), cc->bs); 1011 + if (!clone) 1012 return 1; 1013 1014 crypt_inc_pending(io); 1015
+1 -1
drivers/md/dm-io.c
··· 352 BUG_ON(num_regions > DM_IO_MAX_REGIONS); 353 354 if (sync) 355 - rw |= REQ_SYNC | REQ_UNPLUG; 356 357 /* 358 * For multiple regions we need to be careful to rewind
··· 352 BUG_ON(num_regions > DM_IO_MAX_REGIONS); 353 354 if (sync) 355 + rw |= REQ_SYNC; 356 357 /* 358 * For multiple regions we need to be careful to rewind
+5 -50
drivers/md/dm-kcopyd.c
··· 37 unsigned int nr_pages; 38 unsigned int nr_free_pages; 39 40 - /* 41 - * Block devices to unplug. 42 - * Non-NULL pointer means that a block device has some pending requests 43 - * and needs to be unplugged. 44 - */ 45 - struct block_device *unplug[2]; 46 - 47 struct dm_io_client *io_client; 48 49 wait_queue_head_t destroyq; ··· 308 return 0; 309 } 310 311 - /* 312 - * Unplug the block device at the specified index. 313 - */ 314 - static void unplug(struct dm_kcopyd_client *kc, int rw) 315 - { 316 - if (kc->unplug[rw] != NULL) { 317 - blk_unplug(bdev_get_queue(kc->unplug[rw])); 318 - kc->unplug[rw] = NULL; 319 - } 320 - } 321 - 322 - /* 323 - * Prepare block device unplug. If there's another device 324 - * to be unplugged at the same array index, we unplug that 325 - * device first. 326 - */ 327 - static void prepare_unplug(struct dm_kcopyd_client *kc, int rw, 328 - struct block_device *bdev) 329 - { 330 - if (likely(kc->unplug[rw] == bdev)) 331 - return; 332 - unplug(kc, rw); 333 - kc->unplug[rw] = bdev; 334 - } 335 - 336 static void complete_io(unsigned long error, void *context) 337 { 338 struct kcopyd_job *job = (struct kcopyd_job *) context; ··· 354 .client = job->kc->io_client, 355 }; 356 357 - if (job->rw == READ) { 358 r = dm_io(&io_req, 1, &job->source, NULL); 359 - prepare_unplug(job->kc, READ, job->source.bdev); 360 - } else { 361 - if (job->num_dests > 1) 362 - io_req.bi_rw |= REQ_UNPLUG; 363 r = dm_io(&io_req, job->num_dests, job->dests, NULL); 364 - if (!(io_req.bi_rw & REQ_UNPLUG)) 365 - prepare_unplug(job->kc, WRITE, job->dests[0].bdev); 366 - } 367 368 return r; 369 } ··· 428 { 429 struct dm_kcopyd_client *kc = container_of(work, 430 struct dm_kcopyd_client, kcopyd_work); 431 432 /* 433 * The order that these are called is *very* important. ··· 436 * Pages jobs when successful will jump onto the io jobs 437 * list. io jobs call wake when they complete and it all 438 * starts again. 439 - * 440 - * Note that io_jobs add block devices to the unplug array, 441 - * this array is cleared with "unplug" calls. It is thus 442 - * forbidden to run complete_jobs after io_jobs and before 443 - * unplug because the block device could be destroyed in 444 - * job completion callback. 445 */ 446 process_jobs(&kc->complete_jobs, kc, run_complete_job); 447 process_jobs(&kc->pages_jobs, kc, run_pages_job); 448 process_jobs(&kc->io_jobs, kc, run_io_job); 449 - unplug(kc, READ); 450 - unplug(kc, WRITE); 451 } 452 453 /* ··· 621 INIT_LIST_HEAD(&kc->complete_jobs); 622 INIT_LIST_HEAD(&kc->io_jobs); 623 INIT_LIST_HEAD(&kc->pages_jobs); 624 - 625 - memset(kc->unplug, 0, sizeof(kc->unplug)); 626 627 kc->job_pool = mempool_create_slab_pool(MIN_JOBS, _job_cache); 628 if (!kc->job_pool)
··· 37 unsigned int nr_pages; 38 unsigned int nr_free_pages; 39 40 struct dm_io_client *io_client; 41 42 wait_queue_head_t destroyq; ··· 315 return 0; 316 } 317 318 static void complete_io(unsigned long error, void *context) 319 { 320 struct kcopyd_job *job = (struct kcopyd_job *) context; ··· 386 .client = job->kc->io_client, 387 }; 388 389 + if (job->rw == READ) 390 r = dm_io(&io_req, 1, &job->source, NULL); 391 + else 392 r = dm_io(&io_req, job->num_dests, job->dests, NULL); 393 394 return r; 395 } ··· 466 { 467 struct dm_kcopyd_client *kc = container_of(work, 468 struct dm_kcopyd_client, kcopyd_work); 469 + struct blk_plug plug; 470 471 /* 472 * The order that these are called is *very* important. ··· 473 * Pages jobs when successful will jump onto the io jobs 474 * list. io jobs call wake when they complete and it all 475 * starts again. 476 */ 477 + blk_start_plug(&plug); 478 process_jobs(&kc->complete_jobs, kc, run_complete_job); 479 process_jobs(&kc->pages_jobs, kc, run_pages_job); 480 process_jobs(&kc->io_jobs, kc, run_io_job); 481 + blk_finish_plug(&plug); 482 } 483 484 /* ··· 664 INIT_LIST_HEAD(&kc->complete_jobs); 665 INIT_LIST_HEAD(&kc->io_jobs); 666 INIT_LIST_HEAD(&kc->pages_jobs); 667 668 kc->job_pool = mempool_create_slab_pool(MIN_JOBS, _job_cache); 669 if (!kc->job_pool)
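dm-kcopyd shows what replaces the old device-level plug/unplug dance: the worker wraps its submission passes in an on-stack struct blk_plug, so I/O queued from process_jobs() is batched per task and flushed in one go by blk_finish_plug(). The core of the pattern, with submit_pending_io() as a placeholder for whatever actually issues the I/O:

    struct blk_plug plug;

    blk_start_plug(&plug);
    submit_pending_io();     /* placeholder: queue however many bios this pass generates */
    blk_finish_plug(&plug);  /* flush everything batched while the task was plugged */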
+1 -1
drivers/md/dm-raid.c
··· 394 { 395 struct raid_set *rs = container_of(cb, struct raid_set, callbacks); 396 397 - md_raid5_unplug_device(rs->md.private); 398 } 399 400 /*
··· 394 { 395 struct raid_set *rs = container_of(cb, struct raid_set, callbacks); 396 397 + md_raid5_kick_device(rs->md.private); 398 } 399 400 /*
-2
drivers/md/dm-raid1.c
··· 842 do_reads(ms, &reads); 843 do_writes(ms, &writes); 844 do_failures(ms, &failures); 845 - 846 - dm_table_unplug_all(ms->ti->table); 847 } 848 849 /*-----------------------------------------------------------------
··· 842 do_reads(ms, &reads); 843 do_writes(ms, &writes); 844 do_failures(ms, &failures); 845 } 846 847 /*-----------------------------------------------------------------
+5 -26
drivers/md/dm-table.c
··· 55 struct dm_target *targets; 56 57 unsigned discards_supported:1; 58 59 /* 60 * Indicates the rw permissions for the new logical ··· 860 return -EINVAL; 861 } 862 863 - t->mempools = dm_alloc_md_mempools(type); 864 if (!t->mempools) 865 return -ENOMEM; 866 ··· 936 struct dm_dev_internal *dd; 937 938 list_for_each_entry(dd, devices, list) 939 - if (bdev_get_integrity(dd->dm_dev.bdev)) 940 return blk_integrity_register(dm_disk(md), NULL); 941 942 return 0; 943 } ··· 1278 return 0; 1279 } 1280 1281 - void dm_table_unplug_all(struct dm_table *t) 1282 - { 1283 - struct dm_dev_internal *dd; 1284 - struct list_head *devices = dm_table_get_devices(t); 1285 - struct dm_target_callbacks *cb; 1286 - 1287 - list_for_each_entry(dd, devices, list) { 1288 - struct request_queue *q = bdev_get_queue(dd->dm_dev.bdev); 1289 - char b[BDEVNAME_SIZE]; 1290 - 1291 - if (likely(q)) 1292 - blk_unplug(q); 1293 - else 1294 - DMWARN_LIMIT("%s: Cannot unplug nonexistent device %s", 1295 - dm_device_name(t->md), 1296 - bdevname(dd->dm_dev.bdev, b)); 1297 - } 1298 - 1299 - list_for_each_entry(cb, &t->target_callbacks, list) 1300 - if (cb->unplug_fn) 1301 - cb->unplug_fn(cb); 1302 - } 1303 - 1304 struct mapped_device *dm_table_get_md(struct dm_table *t) 1305 { 1306 return t->md; ··· 1325 EXPORT_SYMBOL(dm_table_get_md); 1326 EXPORT_SYMBOL(dm_table_put); 1327 EXPORT_SYMBOL(dm_table_get); 1328 - EXPORT_SYMBOL(dm_table_unplug_all);
··· 55 struct dm_target *targets; 56 57 unsigned discards_supported:1; 58 + unsigned integrity_supported:1; 59 60 /* 61 * Indicates the rw permissions for the new logical ··· 859 return -EINVAL; 860 } 861 862 + t->mempools = dm_alloc_md_mempools(type, t->integrity_supported); 863 if (!t->mempools) 864 return -ENOMEM; 865 ··· 935 struct dm_dev_internal *dd; 936 937 list_for_each_entry(dd, devices, list) 938 + if (bdev_get_integrity(dd->dm_dev.bdev)) { 939 + t->integrity_supported = 1; 940 return blk_integrity_register(dm_disk(md), NULL); 941 + } 942 943 return 0; 944 } ··· 1275 return 0; 1276 } 1277 1278 struct mapped_device *dm_table_get_md(struct dm_table *t) 1279 { 1280 return t->md; ··· 1345 EXPORT_SYMBOL(dm_table_get_md); 1346 EXPORT_SYMBOL(dm_table_put); 1347 EXPORT_SYMBOL(dm_table_get);
+18 -34
drivers/md/dm.c
··· 477 cpu = part_stat_lock(); 478 part_round_stats(cpu, &dm_disk(md)->part0); 479 part_stat_unlock(); 480 - dm_disk(md)->part0.in_flight[rw] = atomic_inc_return(&md->pending[rw]); 481 } 482 483 static void end_io_acct(struct dm_io *io) ··· 498 * After this is decremented the bio must not be touched if it is 499 * a flush. 500 */ 501 - dm_disk(md)->part0.in_flight[rw] = pending = 502 - atomic_dec_return(&md->pending[rw]); 503 pending += atomic_read(&md->pending[rw^0x1]); 504 505 /* nudge anyone waiting on suspend queue */ ··· 808 dm_unprep_request(rq); 809 810 spin_lock_irqsave(q->queue_lock, flags); 811 - if (elv_queue_empty(q)) 812 - blk_plug_device(q); 813 blk_requeue_request(q, rq); 814 spin_unlock_irqrestore(q->queue_lock, flags); 815 ··· 1612 * number of in-flight I/Os after the queue is stopped in 1613 * dm_suspend(). 1614 */ 1615 - while (!blk_queue_plugged(q) && !blk_queue_stopped(q)) { 1616 rq = blk_peek_request(q); 1617 if (!rq) 1618 - goto plug_and_out; 1619 1620 /* always use block 0 to find the target for flushes for now */ 1621 pos = 0; ··· 1626 BUG_ON(!dm_target_is_valid(ti)); 1627 1628 if (ti->type->busy && ti->type->busy(ti)) 1629 - goto plug_and_out; 1630 1631 blk_start_request(rq); 1632 clone = rq->special; ··· 1646 BUG_ON(!irqs_disabled()); 1647 spin_lock(q->queue_lock); 1648 1649 - plug_and_out: 1650 - if (!elv_queue_empty(q)) 1651 - /* Some requests still remain, retry later */ 1652 - blk_plug_device(q); 1653 - 1654 out: 1655 dm_table_put(map); 1656 ··· 1674 dm_table_put(map); 1675 1676 return r; 1677 - } 1678 - 1679 - static void dm_unplug_all(struct request_queue *q) 1680 - { 1681 - struct mapped_device *md = q->queuedata; 1682 - struct dm_table *map = dm_get_live_table(md); 1683 - 1684 - if (map) { 1685 - if (dm_request_based(md)) 1686 - generic_unplug_device(q); 1687 - 1688 - dm_table_unplug_all(map); 1689 - dm_table_put(map); 1690 - } 1691 } 1692 1693 static int dm_any_congested(void *congested_data, int bdi_bits) ··· 1799 md->queue->backing_dev_info.congested_data = md; 1800 blk_queue_make_request(md->queue, dm_request); 1801 blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY); 1802 - md->queue->unplug_fn = dm_unplug_all; 1803 blk_queue_merge_bvec(md->queue, dm_merge_bvec); 1804 blk_queue_flush(md->queue, REQ_FLUSH | REQ_FUA); 1805 } ··· 2244 int r = 0; 2245 DECLARE_WAITQUEUE(wait, current); 2246 2247 - dm_unplug_all(md->queue); 2248 - 2249 add_wait_queue(&md->wait, &wait); 2250 2251 while (1) { ··· 2518 2519 clear_bit(DMF_SUSPENDED, &md->flags); 2520 2521 - dm_table_unplug_all(map); 2522 r = 0; 2523 out: 2524 dm_table_put(map); ··· 2621 } 2622 EXPORT_SYMBOL_GPL(dm_noflush_suspending); 2623 2624 - struct dm_md_mempools *dm_alloc_md_mempools(unsigned type) 2625 { 2626 struct dm_md_mempools *pools = kmalloc(sizeof(*pools), GFP_KERNEL); 2627 2628 if (!pools) 2629 return NULL; ··· 2641 if (!pools->tio_pool) 2642 goto free_io_pool_and_out; 2643 2644 - pools->bs = (type == DM_TYPE_BIO_BASED) ? 2645 - bioset_create(16, 0) : bioset_create(MIN_IOS, 0); 2646 if (!pools->bs) 2647 goto free_tio_pool_and_out; 2648 2649 return pools; 2650 2651 free_tio_pool_and_out: 2652 mempool_destroy(pools->tio_pool);
··· 477 cpu = part_stat_lock(); 478 part_round_stats(cpu, &dm_disk(md)->part0); 479 part_stat_unlock(); 480 + atomic_set(&dm_disk(md)->part0.in_flight[rw], 481 + atomic_inc_return(&md->pending[rw])); 482 } 483 484 static void end_io_acct(struct dm_io *io) ··· 497 * After this is decremented the bio must not be touched if it is 498 * a flush. 499 */ 500 + pending = atomic_dec_return(&md->pending[rw]); 501 + atomic_set(&dm_disk(md)->part0.in_flight[rw], pending); 502 pending += atomic_read(&md->pending[rw^0x1]); 503 504 /* nudge anyone waiting on suspend queue */ ··· 807 dm_unprep_request(rq); 808 809 spin_lock_irqsave(q->queue_lock, flags); 810 blk_requeue_request(q, rq); 811 spin_unlock_irqrestore(q->queue_lock, flags); 812 ··· 1613 * number of in-flight I/Os after the queue is stopped in 1614 * dm_suspend(). 1615 */ 1616 + while (!blk_queue_stopped(q)) { 1617 rq = blk_peek_request(q); 1618 if (!rq) 1619 + goto delay_and_out; 1620 1621 /* always use block 0 to find the target for flushes for now */ 1622 pos = 0; ··· 1627 BUG_ON(!dm_target_is_valid(ti)); 1628 1629 if (ti->type->busy && ti->type->busy(ti)) 1630 + goto delay_and_out; 1631 1632 blk_start_request(rq); 1633 clone = rq->special; ··· 1647 BUG_ON(!irqs_disabled()); 1648 spin_lock(q->queue_lock); 1649 1650 + delay_and_out: 1651 + blk_delay_queue(q, HZ / 10); 1652 out: 1653 dm_table_put(map); 1654 ··· 1678 dm_table_put(map); 1679 1680 return r; 1681 } 1682 1683 static int dm_any_congested(void *congested_data, int bdi_bits) ··· 1817 md->queue->backing_dev_info.congested_data = md; 1818 blk_queue_make_request(md->queue, dm_request); 1819 blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY); 1820 blk_queue_merge_bvec(md->queue, dm_merge_bvec); 1821 blk_queue_flush(md->queue, REQ_FLUSH | REQ_FUA); 1822 } ··· 2263 int r = 0; 2264 DECLARE_WAITQUEUE(wait, current); 2265 2266 add_wait_queue(&md->wait, &wait); 2267 2268 while (1) { ··· 2539 2540 clear_bit(DMF_SUSPENDED, &md->flags); 2541 2542 r = 0; 2543 out: 2544 dm_table_put(map); ··· 2643 } 2644 EXPORT_SYMBOL_GPL(dm_noflush_suspending); 2645 2646 + struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity) 2647 { 2648 struct dm_md_mempools *pools = kmalloc(sizeof(*pools), GFP_KERNEL); 2649 + unsigned int pool_size = (type == DM_TYPE_BIO_BASED) ? 16 : MIN_IOS; 2650 2651 if (!pools) 2652 return NULL; ··· 2662 if (!pools->tio_pool) 2663 goto free_io_pool_and_out; 2664 2665 + pools->bs = bioset_create(pool_size, 0); 2666 if (!pools->bs) 2667 goto free_tio_pool_and_out; 2668 2669 + if (integrity && bioset_integrity_create(pools->bs, pool_size)) 2670 + goto free_bioset_and_out; 2671 + 2672 return pools; 2673 + 2674 + free_bioset_and_out: 2675 + bioset_free(pools->bs); 2676 2677 free_tio_pool_and_out: 2678 mempool_destroy(pools->tio_pool);
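The dm_alloc_md_mempools() change above follows the new rule that a subsystem wanting bio integrity support must allocate the bio_set's integrity mempool itself. A hedged sketch of that allocation pattern (bioset_create(), bioset_integrity_create() and bioset_free() are the real calls; the helper name, pool sizing and error path are assumptions for illustration):

#include <linux/bio.h>

/*
 * Illustrative helper: create a bio_set and, if the caller knows the
 * underlying devices carry integrity metadata, its integrity pool too.
 */
static struct bio_set *create_bs(unsigned int pool_size, bool want_integrity)
{
        struct bio_set *bs;

        bs = bioset_create(pool_size, 0);       /* no front padding */
        if (!bs)
                return NULL;

        if (want_integrity && bioset_integrity_create(bs, pool_size)) {
                bioset_free(bs);                /* undo on failure */
                return NULL;
        }

        return bs;
}

This mirrors how dm now only creates the integrity pool when dm-table has actually seen an integrity-capable device (the new integrity_supported flag), rather than having the bio layer allocate it unconditionally.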
+1 -1
drivers/md/dm.h
··· 149 /* 150 * Mempool operations 151 */ 152 - struct dm_md_mempools *dm_alloc_md_mempools(unsigned type); 153 void dm_free_md_mempools(struct dm_md_mempools *pools); 154 155 #endif
··· 149 /* 150 * Mempool operations 151 */ 152 + struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity); 153 void dm_free_md_mempools(struct dm_md_mempools *pools); 154 155 #endif
+1 -19
drivers/md/linear.c
··· 87 return maxsectors << 9; 88 } 89 90 - static void linear_unplug(struct request_queue *q) 91 - { 92 - mddev_t *mddev = q->queuedata; 93 - linear_conf_t *conf; 94 - int i; 95 - 96 - rcu_read_lock(); 97 - conf = rcu_dereference(mddev->private); 98 - 99 - for (i=0; i < mddev->raid_disks; i++) { 100 - struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); 101 - blk_unplug(r_queue); 102 - } 103 - rcu_read_unlock(); 104 - } 105 - 106 static int linear_congested(void *data, int bits) 107 { 108 mddev_t *mddev = data; ··· 208 md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); 209 210 blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec); 211 - mddev->queue->unplug_fn = linear_unplug; 212 mddev->queue->backing_dev_info.congested_fn = linear_congested; 213 mddev->queue->backing_dev_info.congested_data = mddev; 214 - md_integrity_register(mddev); 215 - return 0; 216 } 217 218 static void free_conf(struct rcu_head *head)
··· 87 return maxsectors << 9; 88 } 89 90 static int linear_congested(void *data, int bits) 91 { 92 mddev_t *mddev = data; ··· 224 md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); 225 226 blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec); 227 mddev->queue->backing_dev_info.congested_fn = linear_congested; 228 mddev->queue->backing_dev_info.congested_data = mddev; 229 + return md_integrity_register(mddev); 230 } 231 232 static void free_conf(struct rcu_head *head)
+8 -12
drivers/md/md.c
··· 780 bio->bi_end_io = super_written; 781 782 atomic_inc(&mddev->pending_writes); 783 - submit_bio(REQ_WRITE | REQ_SYNC | REQ_UNPLUG | REQ_FLUSH | REQ_FUA, 784 - bio); 785 } 786 787 void md_super_wait(mddev_t *mddev) ··· 808 struct completion event; 809 int ret; 810 811 - rw |= REQ_SYNC | REQ_UNPLUG; 812 813 bio->bi_bdev = (metadata_op && rdev->meta_bdev) ? 814 rdev->meta_bdev : rdev->bdev; ··· 1803 mdname(mddev)); 1804 return -EINVAL; 1805 } 1806 - printk(KERN_NOTICE "md: data integrity on %s enabled\n", 1807 - mdname(mddev)); 1808 return 0; 1809 } 1810 EXPORT_SYMBOL(md_integrity_register); ··· 4820 __md_stop_writes(mddev); 4821 md_stop(mddev); 4822 mddev->queue->merge_bvec_fn = NULL; 4823 - mddev->queue->unplug_fn = NULL; 4824 mddev->queue->backing_dev_info.congested_fn = NULL; 4825 4826 /* tell userspace to handle 'inactive' */ ··· 6694 6695 void md_unplug(mddev_t *mddev) 6696 { 6697 - if (mddev->queue) 6698 - blk_unplug(mddev->queue); 6699 if (mddev->plug) 6700 mddev->plug->unplug_fn(mddev->plug); 6701 } ··· 6876 >= mddev->resync_max - mddev->curr_resync_completed 6877 )) { 6878 /* time to update curr_resync_completed */ 6879 - md_unplug(mddev); 6880 wait_event(mddev->recovery_wait, 6881 atomic_read(&mddev->recovery_active) == 0); 6882 mddev->curr_resync_completed = j; ··· 6951 * about not overloading the IO subsystem. (things like an 6952 * e2fsck being done on the RAID array should execute fast) 6953 */ 6954 - md_unplug(mddev); 6955 cond_resched(); 6956 6957 currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2 ··· 6969 * this also signals 'finished resyncing' to md_stop 6970 */ 6971 out: 6972 - md_unplug(mddev); 6973 - 6974 wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); 6975 6976 /* tell personality that we are finished */
··· 780 bio->bi_end_io = super_written; 781 782 atomic_inc(&mddev->pending_writes); 783 + submit_bio(REQ_WRITE | REQ_SYNC | REQ_FLUSH | REQ_FUA, bio); 784 } 785 786 void md_super_wait(mddev_t *mddev) ··· 809 struct completion event; 810 int ret; 811 812 + rw |= REQ_SYNC; 813 814 bio->bi_bdev = (metadata_op && rdev->meta_bdev) ? 815 rdev->meta_bdev : rdev->bdev; ··· 1804 mdname(mddev)); 1805 return -EINVAL; 1806 } 1807 + printk(KERN_NOTICE "md: data integrity enabled on %s\n", mdname(mddev)); 1808 + if (bioset_integrity_create(mddev->bio_set, BIO_POOL_SIZE)) { 1809 + printk(KERN_ERR "md: failed to create integrity pool for %s\n", 1810 + mdname(mddev)); 1811 + return -EINVAL; 1812 + } 1813 return 0; 1814 } 1815 EXPORT_SYMBOL(md_integrity_register); ··· 4817 __md_stop_writes(mddev); 4818 md_stop(mddev); 4819 mddev->queue->merge_bvec_fn = NULL; 4820 mddev->queue->backing_dev_info.congested_fn = NULL; 4821 4822 /* tell userspace to handle 'inactive' */ ··· 6692 6693 void md_unplug(mddev_t *mddev) 6694 { 6695 if (mddev->plug) 6696 mddev->plug->unplug_fn(mddev->plug); 6697 } ··· 6876 >= mddev->resync_max - mddev->curr_resync_completed 6877 )) { 6878 /* time to update curr_resync_completed */ 6879 wait_event(mddev->recovery_wait, 6880 atomic_read(&mddev->recovery_active) == 0); 6881 mddev->curr_resync_completed = j; ··· 6952 * about not overloading the IO subsystem. (things like an 6953 * e2fsck being done on the RAID array should execute fast) 6954 */ 6955 cond_resched(); 6956 6957 currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2 ··· 6971 * this also signals 'finished resyncing' to md_stop 6972 */ 6973 out: 6974 wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); 6975 6976 /* tell personality that we are finished */
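Because md_integrity_register() now also creates the integrity bio pool, it can fail, and the personality conversions in this series stop ignoring its return value. A condensed sketch of the expected ->run() tail (md_integrity_register() and mddev_t come from the md code above; the personality name is a placeholder and the driver-internal "md.h" include is assumed):

#include "md.h"         /* md-internal header: mddev_t, md_integrity_register() */

static int example_run(mddev_t *mddev)
{
        /* ... personality-specific setup elided ... */

        /*
         * md_integrity_register() may fail now that it allocates the
         * integrity pool, so its result becomes the result of ->run(),
         * as the linear and raid0 conversions do.
         */
        return md_integrity_register(mddev);
}

Personalities with more to undo (multipath and raid10 above) instead branch to their existing error path when the call fails.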
+5 -33
drivers/md/multipath.c
··· 106 rdev_dec_pending(rdev, conf->mddev); 107 } 108 109 - static void unplug_slaves(mddev_t *mddev) 110 - { 111 - multipath_conf_t *conf = mddev->private; 112 - int i; 113 - 114 - rcu_read_lock(); 115 - for (i=0; i<mddev->raid_disks; i++) { 116 - mdk_rdev_t *rdev = rcu_dereference(conf->multipaths[i].rdev); 117 - if (rdev && !test_bit(Faulty, &rdev->flags) 118 - && atomic_read(&rdev->nr_pending)) { 119 - struct request_queue *r_queue = bdev_get_queue(rdev->bdev); 120 - 121 - atomic_inc(&rdev->nr_pending); 122 - rcu_read_unlock(); 123 - 124 - blk_unplug(r_queue); 125 - 126 - rdev_dec_pending(rdev, mddev); 127 - rcu_read_lock(); 128 - } 129 - } 130 - rcu_read_unlock(); 131 - } 132 - 133 - static void multipath_unplug(struct request_queue *q) 134 - { 135 - unplug_slaves(q->queuedata); 136 - } 137 - 138 - 139 static int multipath_make_request(mddev_t *mddev, struct bio * bio) 140 { 141 multipath_conf_t *conf = mddev->private; ··· 315 p->rdev = rdev; 316 goto abort; 317 } 318 - md_integrity_register(mddev); 319 } 320 abort: 321 ··· 487 */ 488 md_set_array_sectors(mddev, multipath_size(mddev, 0, 0)); 489 490 - mddev->queue->unplug_fn = multipath_unplug; 491 mddev->queue->backing_dev_info.congested_fn = multipath_congested; 492 mddev->queue->backing_dev_info.congested_data = mddev; 493 - md_integrity_register(mddev); 494 return 0; 495 496 out_free_conf:
··· 106 rdev_dec_pending(rdev, conf->mddev); 107 } 108 109 static int multipath_make_request(mddev_t *mddev, struct bio * bio) 110 { 111 multipath_conf_t *conf = mddev->private; ··· 345 p->rdev = rdev; 346 goto abort; 347 } 348 + err = md_integrity_register(mddev); 349 } 350 abort: 351 ··· 517 */ 518 md_set_array_sectors(mddev, multipath_size(mddev, 0, 0)); 519 520 mddev->queue->backing_dev_info.congested_fn = multipath_congested; 521 mddev->queue->backing_dev_info.congested_data = mddev; 522 + 523 + if (md_integrity_register(mddev)) 524 + goto out_free_conf; 525 + 526 return 0; 527 528 out_free_conf:
+1 -18
drivers/md/raid0.c
··· 25 #include "raid0.h" 26 #include "raid5.h" 27 28 - static void raid0_unplug(struct request_queue *q) 29 - { 30 - mddev_t *mddev = q->queuedata; 31 - raid0_conf_t *conf = mddev->private; 32 - mdk_rdev_t **devlist = conf->devlist; 33 - int raid_disks = conf->strip_zone[0].nb_dev; 34 - int i; 35 - 36 - for (i=0; i < raid_disks; i++) { 37 - struct request_queue *r_queue = bdev_get_queue(devlist[i]->bdev); 38 - 39 - blk_unplug(r_queue); 40 - } 41 - } 42 - 43 static int raid0_congested(void *data, int bits) 44 { 45 mddev_t *mddev = data; ··· 257 mdname(mddev), 258 (unsigned long long)smallest->sectors); 259 } 260 - mddev->queue->unplug_fn = raid0_unplug; 261 mddev->queue->backing_dev_info.congested_fn = raid0_congested; 262 mddev->queue->backing_dev_info.congested_data = mddev; 263 ··· 379 380 blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec); 381 dump_zones(mddev); 382 - md_integrity_register(mddev); 383 - return 0; 384 } 385 386 static int raid0_stop(mddev_t *mddev)
··· 25 #include "raid0.h" 26 #include "raid5.h" 27 28 static int raid0_congested(void *data, int bits) 29 { 30 mddev_t *mddev = data; ··· 272 mdname(mddev), 273 (unsigned long long)smallest->sectors); 274 } 275 mddev->queue->backing_dev_info.congested_fn = raid0_congested; 276 mddev->queue->backing_dev_info.congested_data = mddev; 277 ··· 395 396 blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec); 397 dump_zones(mddev); 398 + return md_integrity_register(mddev); 399 } 400 401 static int raid0_stop(mddev_t *mddev)
+19 -72
drivers/md/raid1.c
··· 52 #define NR_RAID1_BIOS 256 53 54 55 - static void unplug_slaves(mddev_t *mddev); 56 - 57 static void allow_barrier(conf_t *conf); 58 static void lower_barrier(conf_t *conf); 59 60 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data) 61 { 62 struct pool_info *pi = data; 63 - r1bio_t *r1_bio; 64 int size = offsetof(r1bio_t, bios[pi->raid_disks]); 65 66 /* allocate a r1bio with room for raid_disks entries in the bios array */ 67 - r1_bio = kzalloc(size, gfp_flags); 68 - if (!r1_bio && pi->mddev) 69 - unplug_slaves(pi->mddev); 70 - 71 - return r1_bio; 72 } 73 74 static void r1bio_pool_free(void *r1_bio, void *data) ··· 84 int i, j; 85 86 r1_bio = r1bio_pool_alloc(gfp_flags, pi); 87 - if (!r1_bio) { 88 - unplug_slaves(pi->mddev); 89 return NULL; 90 - } 91 92 /* 93 * Allocate bios : 1 for reading, n-1 for writing ··· 511 return new_disk; 512 } 513 514 - static void unplug_slaves(mddev_t *mddev) 515 - { 516 - conf_t *conf = mddev->private; 517 - int i; 518 - 519 - rcu_read_lock(); 520 - for (i=0; i<mddev->raid_disks; i++) { 521 - mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev); 522 - if (rdev && !test_bit(Faulty, &rdev->flags) && atomic_read(&rdev->nr_pending)) { 523 - struct request_queue *r_queue = bdev_get_queue(rdev->bdev); 524 - 525 - atomic_inc(&rdev->nr_pending); 526 - rcu_read_unlock(); 527 - 528 - blk_unplug(r_queue); 529 - 530 - rdev_dec_pending(rdev, mddev); 531 - rcu_read_lock(); 532 - } 533 - } 534 - rcu_read_unlock(); 535 - } 536 - 537 - static void raid1_unplug(struct request_queue *q) 538 - { 539 - mddev_t *mddev = q->queuedata; 540 - 541 - unplug_slaves(mddev); 542 - md_wakeup_thread(mddev->thread); 543 - } 544 - 545 static int raid1_congested(void *data, int bits) 546 { 547 mddev_t *mddev = data; ··· 540 } 541 542 543 - static int flush_pending_writes(conf_t *conf) 544 { 545 /* Any writes that have been queued but are awaiting 546 * bitmap updates get flushed here. 547 - * We return 1 if any requests were actually submitted. 548 */ 549 - int rv = 0; 550 - 551 spin_lock_irq(&conf->device_lock); 552 553 if (conf->pending_bio_list.head) { 554 struct bio *bio; 555 bio = bio_list_get(&conf->pending_bio_list); 556 - /* Only take the spinlock to quiet a warning */ 557 - spin_lock(conf->mddev->queue->queue_lock); 558 - blk_remove_plug(conf->mddev->queue); 559 - spin_unlock(conf->mddev->queue->queue_lock); 560 spin_unlock_irq(&conf->device_lock); 561 /* flush any pending bitmap writes to 562 * disk before proceeding w/ I/O */ ··· 561 generic_make_request(bio); 562 bio = next; 563 } 564 - rv = 1; 565 } else 566 spin_unlock_irq(&conf->device_lock); 567 - return rv; 568 } 569 570 /* Barriers.... 
··· 600 601 /* Wait until no block IO is waiting */ 602 wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting, 603 - conf->resync_lock, 604 - raid1_unplug(conf->mddev->queue)); 605 606 /* block any new IO from starting */ 607 conf->barrier++; ··· 608 /* Now wait for all pending IO to complete */ 609 wait_event_lock_irq(conf->wait_barrier, 610 !conf->nr_pending && conf->barrier < RESYNC_DEPTH, 611 - conf->resync_lock, 612 - raid1_unplug(conf->mddev->queue)); 613 614 spin_unlock_irq(&conf->resync_lock); 615 } ··· 630 conf->nr_waiting++; 631 wait_event_lock_irq(conf->wait_barrier, !conf->barrier, 632 conf->resync_lock, 633 - raid1_unplug(conf->mddev->queue)); 634 conf->nr_waiting--; 635 } 636 conf->nr_pending++; ··· 667 conf->nr_pending == conf->nr_queued+1, 668 conf->resync_lock, 669 ({ flush_pending_writes(conf); 670 - raid1_unplug(conf->mddev->queue); })); 671 spin_unlock_irq(&conf->resync_lock); 672 } 673 static void unfreeze_array(conf_t *conf) ··· 917 atomic_inc(&r1_bio->remaining); 918 spin_lock_irqsave(&conf->device_lock, flags); 919 bio_list_add(&conf->pending_bio_list, mbio); 920 - blk_plug_device_unlocked(mddev->queue); 921 spin_unlock_irqrestore(&conf->device_lock, flags); 922 } 923 r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL); ··· 925 /* In case raid1d snuck in to freeze_array */ 926 wake_up(&conf->wait_barrier); 927 928 - if (do_sync) 929 md_wakeup_thread(mddev->thread); 930 931 return 0; ··· 1132 p->rdev = rdev; 1133 goto abort; 1134 } 1135 - md_integrity_register(mddev); 1136 } 1137 abort: 1138 ··· 1515 unsigned long flags; 1516 conf_t *conf = mddev->private; 1517 struct list_head *head = &conf->retry_list; 1518 - int unplug=0; 1519 mdk_rdev_t *rdev; 1520 1521 md_check_recovery(mddev); ··· 1522 for (;;) { 1523 char b[BDEVNAME_SIZE]; 1524 1525 - unplug += flush_pending_writes(conf); 1526 1527 spin_lock_irqsave(&conf->device_lock, flags); 1528 if (list_empty(head)) { ··· 1536 1537 mddev = r1_bio->mddev; 1538 conf = mddev->private; 1539 - if (test_bit(R1BIO_IsSync, &r1_bio->state)) { 1540 sync_request_write(mddev, r1_bio); 1541 - unplug = 1; 1542 - } else { 1543 int disk; 1544 1545 /* we got a read error. Maybe the drive is bad. Maybe just ··· 1588 bio->bi_end_io = raid1_end_read_request; 1589 bio->bi_rw = READ | do_sync; 1590 bio->bi_private = r1_bio; 1591 - unplug = 1; 1592 generic_make_request(bio); 1593 } 1594 } 1595 cond_resched(); 1596 } 1597 - if (unplug) 1598 - unplug_slaves(mddev); 1599 } 1600 1601 ··· 2015 2016 md_set_array_sectors(mddev, raid1_size(mddev, 0, 0)); 2017 2018 - mddev->queue->unplug_fn = raid1_unplug; 2019 mddev->queue->backing_dev_info.congested_fn = raid1_congested; 2020 mddev->queue->backing_dev_info.congested_data = mddev; 2021 - md_integrity_register(mddev); 2022 - return 0; 2023 } 2024 2025 static int stop(mddev_t *mddev)
··· 52 #define NR_RAID1_BIOS 256 53 54 55 static void allow_barrier(conf_t *conf); 56 static void lower_barrier(conf_t *conf); 57 58 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data) 59 { 60 struct pool_info *pi = data; 61 int size = offsetof(r1bio_t, bios[pi->raid_disks]); 62 63 /* allocate a r1bio with room for raid_disks entries in the bios array */ 64 + return kzalloc(size, gfp_flags); 65 } 66 67 static void r1bio_pool_free(void *r1_bio, void *data) ··· 91 int i, j; 92 93 r1_bio = r1bio_pool_alloc(gfp_flags, pi); 94 + if (!r1_bio) 95 return NULL; 96 97 /* 98 * Allocate bios : 1 for reading, n-1 for writing ··· 520 return new_disk; 521 } 522 523 static int raid1_congested(void *data, int bits) 524 { 525 mddev_t *mddev = data; ··· 580 } 581 582 583 + static void flush_pending_writes(conf_t *conf) 584 { 585 /* Any writes that have been queued but are awaiting 586 * bitmap updates get flushed here. 587 */ 588 spin_lock_irq(&conf->device_lock); 589 590 if (conf->pending_bio_list.head) { 591 struct bio *bio; 592 bio = bio_list_get(&conf->pending_bio_list); 593 spin_unlock_irq(&conf->device_lock); 594 /* flush any pending bitmap writes to 595 * disk before proceeding w/ I/O */ ··· 608 generic_make_request(bio); 609 bio = next; 610 } 611 } else 612 spin_unlock_irq(&conf->device_lock); 613 + } 614 + 615 + static void md_kick_device(mddev_t *mddev) 616 + { 617 + blk_flush_plug(current); 618 + md_wakeup_thread(mddev->thread); 619 } 620 621 /* Barriers.... ··· 643 644 /* Wait until no block IO is waiting */ 645 wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting, 646 + conf->resync_lock, md_kick_device(conf->mddev)); 647 648 /* block any new IO from starting */ 649 conf->barrier++; ··· 652 /* Now wait for all pending IO to complete */ 653 wait_event_lock_irq(conf->wait_barrier, 654 !conf->nr_pending && conf->barrier < RESYNC_DEPTH, 655 + conf->resync_lock, md_kick_device(conf->mddev)); 656 657 spin_unlock_irq(&conf->resync_lock); 658 } ··· 675 conf->nr_waiting++; 676 wait_event_lock_irq(conf->wait_barrier, !conf->barrier, 677 conf->resync_lock, 678 + md_kick_device(conf->mddev)); 679 conf->nr_waiting--; 680 } 681 conf->nr_pending++; ··· 712 conf->nr_pending == conf->nr_queued+1, 713 conf->resync_lock, 714 ({ flush_pending_writes(conf); 715 + md_kick_device(conf->mddev); })); 716 spin_unlock_irq(&conf->resync_lock); 717 } 718 static void unfreeze_array(conf_t *conf) ··· 962 atomic_inc(&r1_bio->remaining); 963 spin_lock_irqsave(&conf->device_lock, flags); 964 bio_list_add(&conf->pending_bio_list, mbio); 965 spin_unlock_irqrestore(&conf->device_lock, flags); 966 } 967 r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL); ··· 971 /* In case raid1d snuck in to freeze_array */ 972 wake_up(&conf->wait_barrier); 973 974 + if (do_sync || !bitmap) 975 md_wakeup_thread(mddev->thread); 976 977 return 0; ··· 1178 p->rdev = rdev; 1179 goto abort; 1180 } 1181 + err = md_integrity_register(mddev); 1182 } 1183 abort: 1184 ··· 1561 unsigned long flags; 1562 conf_t *conf = mddev->private; 1563 struct list_head *head = &conf->retry_list; 1564 mdk_rdev_t *rdev; 1565 1566 md_check_recovery(mddev); ··· 1569 for (;;) { 1570 char b[BDEVNAME_SIZE]; 1571 1572 + flush_pending_writes(conf); 1573 1574 spin_lock_irqsave(&conf->device_lock, flags); 1575 if (list_empty(head)) { ··· 1583 1584 mddev = r1_bio->mddev; 1585 conf = mddev->private; 1586 + if (test_bit(R1BIO_IsSync, &r1_bio->state)) 1587 sync_request_write(mddev, r1_bio); 1588 + else { 1589 int disk; 1590 1591 /* we got a read 
error. Maybe the drive is bad. Maybe just ··· 1636 bio->bi_end_io = raid1_end_read_request; 1637 bio->bi_rw = READ | do_sync; 1638 bio->bi_private = r1_bio; 1639 generic_make_request(bio); 1640 } 1641 } 1642 cond_resched(); 1643 } 1644 } 1645 1646 ··· 2066 2067 md_set_array_sectors(mddev, raid1_size(mddev, 0, 0)); 2068 2069 mddev->queue->backing_dev_info.congested_fn = raid1_congested; 2070 mddev->queue->backing_dev_info.congested_data = mddev; 2071 + return md_integrity_register(mddev); 2072 } 2073 2074 static int stop(mddev_t *mddev)
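The raid1 hunks replace the old "unplug every slave queue" loop with md_kick_device(), which flushes the caller's own plug before the task sleeps. A minimal sketch of that idea in isolation (blk_flush_plug() and the per-task plug are the real 2.6.39-era interface; the wait queue and counter are illustrative assumptions):

#include <linux/blkdev.h>
#include <linux/sched.h>
#include <linux/wait.h>

/*
 * Illustrative only: before sleeping on completion of I/O this task may
 * itself have queued, push out whatever sits on its plug so the requests
 * can reach the driver and complete.
 */
static void wait_for_pending_io(wait_queue_head_t *wq, atomic_t *pending)
{
        blk_flush_plug(current);
        wait_event(*wq, atomic_read(pending) == 0);
}

Flushing only the current task's plug is sufficient because other tasks' plugs are flushed automatically when they schedule; there is no longer a shared, per-queue plug for the waiter to kick.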
+24 -73
drivers/md/raid10.c
··· 57 */ 58 #define NR_RAID10_BIOS 256 59 60 - static void unplug_slaves(mddev_t *mddev); 61 - 62 static void allow_barrier(conf_t *conf); 63 static void lower_barrier(conf_t *conf); 64 65 static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data) 66 { 67 conf_t *conf = data; 68 - r10bio_t *r10_bio; 69 int size = offsetof(struct r10bio_s, devs[conf->copies]); 70 71 /* allocate a r10bio with room for raid_disks entries in the bios array */ 72 - r10_bio = kzalloc(size, gfp_flags); 73 - if (!r10_bio && conf->mddev) 74 - unplug_slaves(conf->mddev); 75 - 76 - return r10_bio; 77 } 78 79 static void r10bio_pool_free(void *r10_bio, void *data) ··· 99 int nalloc; 100 101 r10_bio = r10bio_pool_alloc(gfp_flags, conf); 102 - if (!r10_bio) { 103 - unplug_slaves(conf->mddev); 104 return NULL; 105 - } 106 107 if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery)) 108 nalloc = conf->copies; /* resync */ ··· 588 return disk; 589 } 590 591 - static void unplug_slaves(mddev_t *mddev) 592 - { 593 - conf_t *conf = mddev->private; 594 - int i; 595 - 596 - rcu_read_lock(); 597 - for (i=0; i < conf->raid_disks; i++) { 598 - mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev); 599 - if (rdev && !test_bit(Faulty, &rdev->flags) && atomic_read(&rdev->nr_pending)) { 600 - struct request_queue *r_queue = bdev_get_queue(rdev->bdev); 601 - 602 - atomic_inc(&rdev->nr_pending); 603 - rcu_read_unlock(); 604 - 605 - blk_unplug(r_queue); 606 - 607 - rdev_dec_pending(rdev, mddev); 608 - rcu_read_lock(); 609 - } 610 - } 611 - rcu_read_unlock(); 612 - } 613 - 614 - static void raid10_unplug(struct request_queue *q) 615 - { 616 - mddev_t *mddev = q->queuedata; 617 - 618 - unplug_slaves(q->queuedata); 619 - md_wakeup_thread(mddev->thread); 620 - } 621 - 622 static int raid10_congested(void *data, int bits) 623 { 624 mddev_t *mddev = data; ··· 609 return ret; 610 } 611 612 - static int flush_pending_writes(conf_t *conf) 613 { 614 /* Any writes that have been queued but are awaiting 615 * bitmap updates get flushed here. 616 - * We return 1 if any requests were actually submitted. 617 */ 618 - int rv = 0; 619 - 620 spin_lock_irq(&conf->device_lock); 621 622 if (conf->pending_bio_list.head) { 623 struct bio *bio; 624 bio = bio_list_get(&conf->pending_bio_list); 625 - /* Spinlock only taken to quiet a warning */ 626 - spin_lock(conf->mddev->queue->queue_lock); 627 - blk_remove_plug(conf->mddev->queue); 628 - spin_unlock(conf->mddev->queue->queue_lock); 629 spin_unlock_irq(&conf->device_lock); 630 /* flush any pending bitmap writes to disk 631 * before proceeding w/ I/O */ ··· 630 generic_make_request(bio); 631 bio = next; 632 } 633 - rv = 1; 634 } else 635 spin_unlock_irq(&conf->device_lock); 636 - return rv; 637 } 638 /* Barriers.... 639 * Sometimes we need to suspend IO while we do something else, 640 * either some resync/recovery, or reconfigure the array. 
··· 669 670 /* Wait until no block IO is waiting (unless 'force') */ 671 wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting, 672 - conf->resync_lock, 673 - raid10_unplug(conf->mddev->queue)); 674 675 /* block any new IO from starting */ 676 conf->barrier++; ··· 677 /* No wait for all pending IO to complete */ 678 wait_event_lock_irq(conf->wait_barrier, 679 !conf->nr_pending && conf->barrier < RESYNC_DEPTH, 680 - conf->resync_lock, 681 - raid10_unplug(conf->mddev->queue)); 682 683 spin_unlock_irq(&conf->resync_lock); 684 } ··· 698 conf->nr_waiting++; 699 wait_event_lock_irq(conf->wait_barrier, !conf->barrier, 700 conf->resync_lock, 701 - raid10_unplug(conf->mddev->queue)); 702 conf->nr_waiting--; 703 } 704 conf->nr_pending++; ··· 735 conf->nr_pending == conf->nr_queued+1, 736 conf->resync_lock, 737 ({ flush_pending_writes(conf); 738 - raid10_unplug(conf->mddev->queue); })); 739 spin_unlock_irq(&conf->resync_lock); 740 } 741 ··· 930 atomic_inc(&r10_bio->remaining); 931 spin_lock_irqsave(&conf->device_lock, flags); 932 bio_list_add(&conf->pending_bio_list, mbio); 933 - blk_plug_device_unlocked(mddev->queue); 934 spin_unlock_irqrestore(&conf->device_lock, flags); 935 } 936 ··· 946 /* In case raid10d snuck in to freeze_array */ 947 wake_up(&conf->wait_barrier); 948 949 - if (do_sync) 950 md_wakeup_thread(mddev->thread); 951 952 return 0; ··· 1188 p->rdev = rdev; 1189 goto abort; 1190 } 1191 - md_integrity_register(mddev); 1192 } 1193 abort: 1194 ··· 1639 unsigned long flags; 1640 conf_t *conf = mddev->private; 1641 struct list_head *head = &conf->retry_list; 1642 - int unplug=0; 1643 mdk_rdev_t *rdev; 1644 1645 md_check_recovery(mddev); ··· 1646 for (;;) { 1647 char b[BDEVNAME_SIZE]; 1648 1649 - unplug += flush_pending_writes(conf); 1650 1651 spin_lock_irqsave(&conf->device_lock, flags); 1652 if (list_empty(head)) { ··· 1660 1661 mddev = r10_bio->mddev; 1662 conf = mddev->private; 1663 - if (test_bit(R10BIO_IsSync, &r10_bio->state)) { 1664 sync_request_write(mddev, r10_bio); 1665 - unplug = 1; 1666 - } else if (test_bit(R10BIO_IsRecover, &r10_bio->state)) { 1667 recovery_request_write(mddev, r10_bio); 1668 - unplug = 1; 1669 - } else { 1670 int mirror; 1671 /* we got a read error. Maybe the drive is bad. Maybe just 1672 * the block and we can fix it. ··· 1711 bio->bi_rw = READ | do_sync; 1712 bio->bi_private = r10_bio; 1713 bio->bi_end_io = raid10_end_read_request; 1714 - unplug = 1; 1715 generic_make_request(bio); 1716 } 1717 } 1718 cond_resched(); 1719 } 1720 - if (unplug) 1721 - unplug_slaves(mddev); 1722 } 1723 1724 ··· 2326 md_set_array_sectors(mddev, size); 2327 mddev->resync_max_sectors = size; 2328 2329 - mddev->queue->unplug_fn = raid10_unplug; 2330 mddev->queue->backing_dev_info.congested_fn = raid10_congested; 2331 mddev->queue->backing_dev_info.congested_data = mddev; 2332 ··· 2343 2344 if (conf->near_copies < conf->raid_disks) 2345 blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec); 2346 - md_integrity_register(mddev); 2347 return 0; 2348 2349 out_free_conf:
··· 57 */ 58 #define NR_RAID10_BIOS 256 59 60 static void allow_barrier(conf_t *conf); 61 static void lower_barrier(conf_t *conf); 62 63 static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data) 64 { 65 conf_t *conf = data; 66 int size = offsetof(struct r10bio_s, devs[conf->copies]); 67 68 /* allocate a r10bio with room for raid_disks entries in the bios array */ 69 + return kzalloc(size, gfp_flags); 70 } 71 72 static void r10bio_pool_free(void *r10_bio, void *data) ··· 106 int nalloc; 107 108 r10_bio = r10bio_pool_alloc(gfp_flags, conf); 109 + if (!r10_bio) 110 return NULL; 111 112 if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery)) 113 nalloc = conf->copies; /* resync */ ··· 597 return disk; 598 } 599 600 static int raid10_congested(void *data, int bits) 601 { 602 mddev_t *mddev = data; ··· 649 return ret; 650 } 651 652 + static void flush_pending_writes(conf_t *conf) 653 { 654 /* Any writes that have been queued but are awaiting 655 * bitmap updates get flushed here. 656 */ 657 spin_lock_irq(&conf->device_lock); 658 659 if (conf->pending_bio_list.head) { 660 struct bio *bio; 661 bio = bio_list_get(&conf->pending_bio_list); 662 spin_unlock_irq(&conf->device_lock); 663 /* flush any pending bitmap writes to disk 664 * before proceeding w/ I/O */ ··· 677 generic_make_request(bio); 678 bio = next; 679 } 680 } else 681 spin_unlock_irq(&conf->device_lock); 682 } 683 + 684 + static void md_kick_device(mddev_t *mddev) 685 + { 686 + blk_flush_plug(current); 687 + md_wakeup_thread(mddev->thread); 688 + } 689 + 690 /* Barriers.... 691 * Sometimes we need to suspend IO while we do something else, 692 * either some resync/recovery, or reconfigure the array. ··· 711 712 /* Wait until no block IO is waiting (unless 'force') */ 713 wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting, 714 + conf->resync_lock, md_kick_device(conf->mddev)); 715 716 /* block any new IO from starting */ 717 conf->barrier++; ··· 720 /* No wait for all pending IO to complete */ 721 wait_event_lock_irq(conf->wait_barrier, 722 !conf->nr_pending && conf->barrier < RESYNC_DEPTH, 723 + conf->resync_lock, md_kick_device(conf->mddev)); 724 725 spin_unlock_irq(&conf->resync_lock); 726 } ··· 742 conf->nr_waiting++; 743 wait_event_lock_irq(conf->wait_barrier, !conf->barrier, 744 conf->resync_lock, 745 + md_kick_device(conf->mddev)); 746 conf->nr_waiting--; 747 } 748 conf->nr_pending++; ··· 779 conf->nr_pending == conf->nr_queued+1, 780 conf->resync_lock, 781 ({ flush_pending_writes(conf); 782 + md_kick_device(conf->mddev); })); 783 spin_unlock_irq(&conf->resync_lock); 784 } 785 ··· 974 atomic_inc(&r10_bio->remaining); 975 spin_lock_irqsave(&conf->device_lock, flags); 976 bio_list_add(&conf->pending_bio_list, mbio); 977 spin_unlock_irqrestore(&conf->device_lock, flags); 978 } 979 ··· 991 /* In case raid10d snuck in to freeze_array */ 992 wake_up(&conf->wait_barrier); 993 994 + if (do_sync || !mddev->bitmap) 995 md_wakeup_thread(mddev->thread); 996 997 return 0; ··· 1233 p->rdev = rdev; 1234 goto abort; 1235 } 1236 + err = md_integrity_register(mddev); 1237 } 1238 abort: 1239 ··· 1684 unsigned long flags; 1685 conf_t *conf = mddev->private; 1686 struct list_head *head = &conf->retry_list; 1687 mdk_rdev_t *rdev; 1688 1689 md_check_recovery(mddev); ··· 1692 for (;;) { 1693 char b[BDEVNAME_SIZE]; 1694 1695 + flush_pending_writes(conf); 1696 1697 spin_lock_irqsave(&conf->device_lock, flags); 1698 if (list_empty(head)) { ··· 1706 1707 mddev = r10_bio->mddev; 1708 conf = mddev->private; 1709 + if (test_bit(R10BIO_IsSync, 
&r10_bio->state)) 1710 sync_request_write(mddev, r10_bio); 1711 + else if (test_bit(R10BIO_IsRecover, &r10_bio->state)) 1712 recovery_request_write(mddev, r10_bio); 1713 + else { 1714 int mirror; 1715 /* we got a read error. Maybe the drive is bad. Maybe just 1716 * the block and we can fix it. ··· 1759 bio->bi_rw = READ | do_sync; 1760 bio->bi_private = r10_bio; 1761 bio->bi_end_io = raid10_end_read_request; 1762 generic_make_request(bio); 1763 } 1764 } 1765 cond_resched(); 1766 } 1767 } 1768 1769 ··· 2377 md_set_array_sectors(mddev, size); 2378 mddev->resync_max_sectors = size; 2379 2380 mddev->queue->backing_dev_info.congested_fn = raid10_congested; 2381 mddev->queue->backing_dev_info.congested_data = mddev; 2382 ··· 2395 2396 if (conf->near_copies < conf->raid_disks) 2397 blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec); 2398 + 2399 + if (md_integrity_register(mddev)) 2400 + goto out_free_conf; 2401 + 2402 return 0; 2403 2404 out_free_conf:
+9 -54
drivers/md/raid5.c
··· 433 return 0; 434 } 435 436 - static void unplug_slaves(mddev_t *mddev); 437 - 438 static struct stripe_head * 439 get_active_stripe(raid5_conf_t *conf, sector_t sector, 440 int previous, int noblock, int noquiesce) ··· 461 < (conf->max_nr_stripes *3/4) 462 || !conf->inactive_blocked), 463 conf->device_lock, 464 - md_raid5_unplug_device(conf) 465 - ); 466 conf->inactive_blocked = 0; 467 } else 468 init_stripe(sh, sector, previous); ··· 1470 wait_event_lock_irq(conf->wait_for_stripe, 1471 !list_empty(&conf->inactive_list), 1472 conf->device_lock, 1473 - unplug_slaves(conf->mddev) 1474 - ); 1475 osh = get_free_stripe(conf); 1476 spin_unlock_irq(&conf->device_lock); 1477 atomic_set(&nsh->count, 1); ··· 3641 } 3642 } 3643 3644 - static void unplug_slaves(mddev_t *mddev) 3645 { 3646 - raid5_conf_t *conf = mddev->private; 3647 - int i; 3648 - int devs = max(conf->raid_disks, conf->previous_raid_disks); 3649 - 3650 - rcu_read_lock(); 3651 - for (i = 0; i < devs; i++) { 3652 - mdk_rdev_t *rdev = rcu_dereference(conf->disks[i].rdev); 3653 - if (rdev && !test_bit(Faulty, &rdev->flags) && atomic_read(&rdev->nr_pending)) { 3654 - struct request_queue *r_queue = bdev_get_queue(rdev->bdev); 3655 - 3656 - atomic_inc(&rdev->nr_pending); 3657 - rcu_read_unlock(); 3658 - 3659 - blk_unplug(r_queue); 3660 - 3661 - rdev_dec_pending(rdev, mddev); 3662 - rcu_read_lock(); 3663 - } 3664 - } 3665 - rcu_read_unlock(); 3666 - } 3667 - 3668 - void md_raid5_unplug_device(raid5_conf_t *conf) 3669 - { 3670 - unsigned long flags; 3671 - 3672 - spin_lock_irqsave(&conf->device_lock, flags); 3673 - 3674 - if (plugger_remove_plug(&conf->plug)) { 3675 - conf->seq_flush++; 3676 - raid5_activate_delayed(conf); 3677 - } 3678 md_wakeup_thread(conf->mddev->thread); 3679 - 3680 - spin_unlock_irqrestore(&conf->device_lock, flags); 3681 - 3682 - unplug_slaves(conf->mddev); 3683 } 3684 - EXPORT_SYMBOL_GPL(md_raid5_unplug_device); 3685 3686 static void raid5_unplug(struct plug_handle *plug) 3687 { 3688 raid5_conf_t *conf = container_of(plug, raid5_conf_t, plug); 3689 - md_raid5_unplug_device(conf); 3690 - } 3691 3692 - static void raid5_unplug_queue(struct request_queue *q) 3693 - { 3694 - mddev_t *mddev = q->queuedata; 3695 - md_raid5_unplug_device(mddev->private); 3696 } 3697 3698 int md_raid5_congested(mddev_t *mddev, int bits) ··· 4057 * add failed due to overlap. Flush everything 4058 * and wait a while 4059 */ 4060 - md_raid5_unplug_device(conf); 4061 release_stripe(sh); 4062 schedule(); 4063 goto retry; ··· 4322 4323 if (sector_nr >= max_sector) { 4324 /* just being told to finish up .. nothing much to do */ 4325 - unplug_slaves(mddev); 4326 4327 if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) { 4328 end_reshape(conf); ··· 4525 spin_unlock_irq(&conf->device_lock); 4526 4527 async_tx_issue_pending_all(); 4528 - unplug_slaves(mddev); 4529 4530 pr_debug("--- raid5d inactive\n"); 4531 } ··· 5159 5160 mddev->queue->backing_dev_info.congested_data = mddev; 5161 mddev->queue->backing_dev_info.congested_fn = raid5_congested; 5162 - mddev->queue->unplug_fn = raid5_unplug_queue; 5163 5164 chunk_size = mddev->chunk_sectors << 9; 5165 blk_queue_io_min(mddev->queue, chunk_size);
··· 433 return 0; 434 } 435 436 static struct stripe_head * 437 get_active_stripe(raid5_conf_t *conf, sector_t sector, 438 int previous, int noblock, int noquiesce) ··· 463 < (conf->max_nr_stripes *3/4) 464 || !conf->inactive_blocked), 465 conf->device_lock, 466 + md_raid5_kick_device(conf)); 467 conf->inactive_blocked = 0; 468 } else 469 init_stripe(sh, sector, previous); ··· 1473 wait_event_lock_irq(conf->wait_for_stripe, 1474 !list_empty(&conf->inactive_list), 1475 conf->device_lock, 1476 + blk_flush_plug(current)); 1477 osh = get_free_stripe(conf); 1478 spin_unlock_irq(&conf->device_lock); 1479 atomic_set(&nsh->count, 1); ··· 3645 } 3646 } 3647 3648 + void md_raid5_kick_device(raid5_conf_t *conf) 3649 { 3650 + blk_flush_plug(current); 3651 + raid5_activate_delayed(conf); 3652 md_wakeup_thread(conf->mddev->thread); 3653 } 3654 + EXPORT_SYMBOL_GPL(md_raid5_kick_device); 3655 3656 static void raid5_unplug(struct plug_handle *plug) 3657 { 3658 raid5_conf_t *conf = container_of(plug, raid5_conf_t, plug); 3659 3660 + md_raid5_kick_device(conf); 3661 } 3662 3663 int md_raid5_congested(mddev_t *mddev, int bits) ··· 4100 * add failed due to overlap. Flush everything 4101 * and wait a while 4102 */ 4103 + md_raid5_kick_device(conf); 4104 release_stripe(sh); 4105 schedule(); 4106 goto retry; ··· 4365 4366 if (sector_nr >= max_sector) { 4367 /* just being told to finish up .. nothing much to do */ 4368 4369 if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) { 4370 end_reshape(conf); ··· 4569 spin_unlock_irq(&conf->device_lock); 4570 4571 async_tx_issue_pending_all(); 4572 4573 pr_debug("--- raid5d inactive\n"); 4574 } ··· 5204 5205 mddev->queue->backing_dev_info.congested_data = mddev; 5206 mddev->queue->backing_dev_info.congested_fn = raid5_congested; 5207 + mddev->queue->queue_lock = &conf->device_lock; 5208 5209 chunk_size = mddev->chunk_sectors << 9; 5210 blk_queue_io_min(mddev->queue, chunk_size);
+1 -1
drivers/md/raid5.h
··· 503 } 504 505 extern int md_raid5_congested(mddev_t *mddev, int bits); 506 - extern void md_raid5_unplug_device(raid5_conf_t *conf); 507 extern int raid5_set_cache_size(mddev_t *mddev, int size); 508 #endif
··· 503 } 504 505 extern int md_raid5_congested(mddev_t *mddev, int bits); 506 + extern void md_raid5_kick_device(raid5_conf_t *conf); 507 extern int raid5_set_cache_size(mddev_t *mddev, int size); 508 #endif
+8 -9
drivers/message/i2o/i2o_block.c
··· 695 }; 696 697 /** 698 - * i2o_block_media_changed - Have we seen a media change? 699 * @disk: gendisk which should be verified 700 * 701 * Verifies if the media has changed. 702 * 703 * Returns 1 if the media was changed or 0 otherwise. 704 */ 705 - static int i2o_block_media_changed(struct gendisk *disk) 706 { 707 struct i2o_block_device *p = disk->private_data; 708 709 if (p->media_change_flag) { 710 p->media_change_flag = 0; 711 - return 1; 712 } 713 return 0; 714 } ··· 897 { 898 struct request *req; 899 900 - while (!blk_queue_plugged(q)) { 901 - req = blk_peek_request(q); 902 - if (!req) 903 - break; 904 - 905 if (req->cmd_type == REQ_TYPE_FS) { 906 struct i2o_block_delayed_request *dreq; 907 struct i2o_block_request *ireq = req->special; ··· 948 .ioctl = i2o_block_ioctl, 949 .compat_ioctl = i2o_block_ioctl, 950 .getgeo = i2o_block_getgeo, 951 - .media_changed = i2o_block_media_changed 952 }; 953 954 /** ··· 1000 gd->major = I2O_MAJOR; 1001 gd->queue = queue; 1002 gd->fops = &i2o_block_fops; 1003 gd->private_data = dev; 1004 1005 dev->gd = gd;
··· 695 }; 696 697 /** 698 + * i2o_block_check_events - Have we seen a media change? 699 * @disk: gendisk which should be verified 700 + * @clearing: events being cleared 701 * 702 * Verifies if the media has changed. 703 * 704 * Returns 1 if the media was changed or 0 otherwise. 705 */ 706 + static unsigned int i2o_block_check_events(struct gendisk *disk, 707 + unsigned int clearing) 708 { 709 struct i2o_block_device *p = disk->private_data; 710 711 if (p->media_change_flag) { 712 p->media_change_flag = 0; 713 + return DISK_EVENT_MEDIA_CHANGE; 714 } 715 return 0; 716 } ··· 895 { 896 struct request *req; 897 898 + while ((req = blk_peek_request(q)) != NULL) { 899 if (req->cmd_type == REQ_TYPE_FS) { 900 struct i2o_block_delayed_request *dreq; 901 struct i2o_block_request *ireq = req->special; ··· 950 .ioctl = i2o_block_ioctl, 951 .compat_ioctl = i2o_block_ioctl, 952 .getgeo = i2o_block_getgeo, 953 + .check_events = i2o_block_check_events, 954 }; 955 956 /** ··· 1002 gd->major = I2O_MAJOR; 1003 gd->queue = queue; 1004 gd->fops = &i2o_block_fops; 1005 + gd->events = DISK_EVENT_MEDIA_CHANGE; 1006 gd->private_data = dev; 1007 1008 dev->gd = gd;
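The i2o_block change above is the recurring ->media_changed to ->check_events conversion in this series (the s390 tape, hv blkvsc and westbridge hunks below have the same shape): advertise the supported events in gendisk->events and return a DISK_EVENT_* mask instead of 0/1. A hedged sketch for a hypothetical driver (check_events, DISK_EVENT_MEDIA_CHANGE and gendisk->events are the real interface; the driver state and fops names are assumptions):

#include <linux/module.h>
#include <linux/genhd.h>
#include <linux/blkdev.h>

struct my_disk {                        /* hypothetical driver state */
        int media_change_flag;
};

static unsigned int my_check_events(struct gendisk *disk, unsigned int clearing)
{
        struct my_disk *p = disk->private_data;

        if (p->media_change_flag) {
                p->media_change_flag = 0;
                return DISK_EVENT_MEDIA_CHANGE; /* an event mask, not a bool */
        }
        return 0;
}

static const struct block_device_operations my_fops = {
        .owner          = THIS_MODULE,
        .check_events   = my_check_events,
};

/* At gendisk setup time the driver also declares what it can report:
 *      gd->fops   = &my_fops;
 *      gd->events = DISK_EVENT_MEDIA_CHANGE;
 */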
+1 -2
drivers/mmc/card/queue.c
··· 55 56 spin_lock_irq(q->queue_lock); 57 set_current_state(TASK_INTERRUPTIBLE); 58 - if (!blk_queue_plugged(q)) 59 - req = blk_fetch_request(q); 60 mq->req = req; 61 spin_unlock_irq(q->queue_lock); 62
··· 55 56 spin_lock_irq(q->queue_lock); 57 set_current_state(TASK_INTERRUPTIBLE); 58 + req = blk_fetch_request(q); 59 mq->req = req; 60 spin_unlock_irq(q->queue_lock); 61
+1 -1
drivers/s390/block/dasd.c
··· 1917 return; 1918 } 1919 /* Now we try to fetch requests from the request queue */ 1920 - while (!blk_queue_plugged(queue) && (req = blk_peek_request(queue))) { 1921 if (basedev->features & DASD_FEATURE_READONLY && 1922 rq_data_dir(req) == WRITE) { 1923 DBF_DEV_EVENT(DBF_ERR, basedev,
··· 1917 return; 1918 } 1919 /* Now we try to fetch requests from the request queue */ 1920 + while ((req = blk_peek_request(queue))) { 1921 if (basedev->features & DASD_FEATURE_READONLY && 1922 rq_data_dir(req) == WRITE) { 1923 DBF_DEV_EVENT(DBF_ERR, basedev,
+6 -6
drivers/s390/char/tape_block.c
··· 48 static DEFINE_MUTEX(tape_block_mutex); 49 static int tapeblock_open(struct block_device *, fmode_t); 50 static int tapeblock_release(struct gendisk *, fmode_t); 51 - static int tapeblock_medium_changed(struct gendisk *); 52 static int tapeblock_revalidate_disk(struct gendisk *); 53 54 static const struct block_device_operations tapeblock_fops = { 55 .owner = THIS_MODULE, 56 .open = tapeblock_open, 57 .release = tapeblock_release, 58 - .media_changed = tapeblock_medium_changed, 59 .revalidate_disk = tapeblock_revalidate_disk, 60 }; 61 ··· 161 162 spin_lock_irq(&device->blk_data.request_queue_lock); 163 while ( 164 - !blk_queue_plugged(queue) && 165 blk_peek_request(queue) && 166 nr_queued < TAPEBLOCK_MIN_REQUEUE 167 ) { ··· 236 disk->major = tapeblock_major; 237 disk->first_minor = device->first_minor; 238 disk->fops = &tapeblock_fops; 239 disk->private_data = tape_get_device(device); 240 disk->queue = blkdat->request_queue; 241 set_capacity(disk, 0); ··· 340 return 0; 341 } 342 343 - static int 344 - tapeblock_medium_changed(struct gendisk *disk) 345 { 346 struct tape_device *device; 347 ··· 349 DBF_LH(6, "tapeblock_medium_changed(%p) = %d\n", 350 device, device->blk_data.medium_changed); 351 352 - return device->blk_data.medium_changed; 353 } 354 355 /*
··· 48 static DEFINE_MUTEX(tape_block_mutex); 49 static int tapeblock_open(struct block_device *, fmode_t); 50 static int tapeblock_release(struct gendisk *, fmode_t); 51 + static unsigned int tapeblock_check_events(struct gendisk *, unsigned int); 52 static int tapeblock_revalidate_disk(struct gendisk *); 53 54 static const struct block_device_operations tapeblock_fops = { 55 .owner = THIS_MODULE, 56 .open = tapeblock_open, 57 .release = tapeblock_release, 58 + .check_events = tapeblock_check_events, 59 .revalidate_disk = tapeblock_revalidate_disk, 60 }; 61 ··· 161 162 spin_lock_irq(&device->blk_data.request_queue_lock); 163 while ( 164 blk_peek_request(queue) && 165 nr_queued < TAPEBLOCK_MIN_REQUEUE 166 ) { ··· 237 disk->major = tapeblock_major; 238 disk->first_minor = device->first_minor; 239 disk->fops = &tapeblock_fops; 240 + disk->events = DISK_EVENT_MEDIA_CHANGE; 241 disk->private_data = tape_get_device(device); 242 disk->queue = blkdat->request_queue; 243 set_capacity(disk, 0); ··· 340 return 0; 341 } 342 343 + static unsigned int 344 + tapeblock_check_events(struct gendisk *disk, unsigned int clearing) 345 { 346 struct tape_device *device; 347 ··· 349 DBF_LH(6, "tapeblock_medium_changed(%p) = %d\n", 350 device, device->blk_data.medium_changed); 351 352 + return device->blk_data.medium_changed ? DISK_EVENT_MEDIA_CHANGE : 0; 353 } 354 355 /*
+19 -25
drivers/scsi/scsi_lib.c
··· 67 68 struct kmem_cache *scsi_sdb_cache; 69 70 static void scsi_run_queue(struct request_queue *q); 71 72 /* ··· 156 /* 157 * Requeue this command. It will go before all other commands 158 * that are already in the queue. 159 - * 160 - * NOTE: there is magic here about the way the queue is plugged if 161 - * we have no outstanding commands. 162 - * 163 - * Although we *don't* plug the queue, we call the request 164 - * function. The SCSI request function detects the blocked condition 165 - * and plugs the queue appropriately. 166 - */ 167 spin_lock_irqsave(q->queue_lock, flags); 168 blk_requeue_request(q, cmd->request); 169 spin_unlock_irqrestore(q->queue_lock, flags); ··· 1226 case BLKPREP_DEFER: 1227 /* 1228 * If we defer, the blk_peek_request() returns NULL, but the 1229 - * queue must be restarted, so we plug here if no returning 1230 - * command will automatically do that. 1231 */ 1232 if (sdev->device_busy == 0) 1233 - blk_plug_device(q); 1234 break; 1235 default: 1236 req->cmd_flags |= REQ_DONTPREP; ··· 1269 sdev_printk(KERN_INFO, sdev, 1270 "unblocking device at zero depth\n")); 1271 } else { 1272 - blk_plug_device(q); 1273 return 0; 1274 } 1275 } ··· 1499 * the host is no longer able to accept any more requests. 1500 */ 1501 shost = sdev->host; 1502 - while (!blk_queue_plugged(q)) { 1503 int rtn; 1504 /* 1505 * get next queueable request. We do this early to make sure ··· 1578 */ 1579 rtn = scsi_dispatch_cmd(cmd); 1580 spin_lock_irq(q->queue_lock); 1581 - if(rtn) { 1582 - /* we're refusing the command; because of 1583 - * the way locks get dropped, we need to 1584 - * check here if plugging is required */ 1585 - if(sdev->device_busy == 0) 1586 - blk_plug_device(q); 1587 - 1588 - break; 1589 - } 1590 } 1591 1592 goto out; ··· 1598 spin_lock_irq(q->queue_lock); 1599 blk_requeue_request(q, req); 1600 sdev->device_busy--; 1601 - if(sdev->device_busy == 0) 1602 - blk_plug_device(q); 1603 - out: 1604 /* must be careful here...if we trigger the ->remove() function 1605 * we cannot be holding the q lock */ 1606 spin_unlock_irq(q->queue_lock);
··· 67 68 struct kmem_cache *scsi_sdb_cache; 69 70 + /* 71 + * When to reinvoke queueing after a resource shortage. It's 3 msecs to 72 + * not change behaviour from the previous unplug mechanism, experimentation 73 + * may prove this needs changing. 74 + */ 75 + #define SCSI_QUEUE_DELAY 3 76 + 77 static void scsi_run_queue(struct request_queue *q); 78 79 /* ··· 149 /* 150 * Requeue this command. It will go before all other commands 151 * that are already in the queue. 152 + */ 153 spin_lock_irqsave(q->queue_lock, flags); 154 blk_requeue_request(q, cmd->request); 155 spin_unlock_irqrestore(q->queue_lock, flags); ··· 1226 case BLKPREP_DEFER: 1227 /* 1228 * If we defer, the blk_peek_request() returns NULL, but the 1229 + * queue must be restarted, so we schedule a callback to happen 1230 + * shortly. 1231 */ 1232 if (sdev->device_busy == 0) 1233 + blk_delay_queue(q, SCSI_QUEUE_DELAY); 1234 break; 1235 default: 1236 req->cmd_flags |= REQ_DONTPREP; ··· 1269 sdev_printk(KERN_INFO, sdev, 1270 "unblocking device at zero depth\n")); 1271 } else { 1272 + blk_delay_queue(q, SCSI_QUEUE_DELAY); 1273 return 0; 1274 } 1275 } ··· 1499 * the host is no longer able to accept any more requests. 1500 */ 1501 shost = sdev->host; 1502 + for (;;) { 1503 int rtn; 1504 /* 1505 * get next queueable request. We do this early to make sure ··· 1578 */ 1579 rtn = scsi_dispatch_cmd(cmd); 1580 spin_lock_irq(q->queue_lock); 1581 + if (rtn) 1582 + goto out_delay; 1583 } 1584 1585 goto out; ··· 1605 spin_lock_irq(q->queue_lock); 1606 blk_requeue_request(q, req); 1607 sdev->device_busy--; 1608 + out_delay: 1609 + if (sdev->device_busy == 0) 1610 + blk_delay_queue(q, SCSI_QUEUE_DELAY); 1611 + out: 1612 /* must be careful here...if we trigger the ->remove() function 1613 * we cannot be holding the q lock */ 1614 spin_unlock_irq(q->queue_lock);
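With queue plugging gone, a request_fn that must back off because the device or a resource is busy now asks the block layer to re-run the queue a little later instead of plugging it. A minimal sketch of that pattern, modelled on the SCSI conversion above (blk_peek_request(), blk_start_request() and blk_delay_queue() are the real helpers; the 3 ms value mirrors SCSI_QUEUE_DELAY, and the driver skeleton including the busy test is an assumption):

#include <linux/blkdev.h>

#define EXAMPLE_QUEUE_DELAY     3       /* msecs, the same value SCSI chose */

/* Placeholder: a real driver would test controller or host state here. */
static bool example_device_busy(struct request_queue *q)
{
        return false;
}

/* Hypothetical ->request_fn, called with q->queue_lock held. */
static void example_request_fn(struct request_queue *q)
{
        struct request *req;

        while ((req = blk_peek_request(q)) != NULL) {
                if (example_device_busy(q)) {
                        /* no plugging anymore: schedule a delayed re-run */
                        blk_delay_queue(q, EXAMPLE_QUEUE_DELAY);
                        return;
                }
                blk_start_request(req);
                /* ... hand the request off to the hardware ... */
        }
}

The dropped blk_queue_plugged() tests in the dasd, mmc, SAS and FC hunks are the other half of the same change: nothing plugs the queue anymore, so those loops simply run until blk_peek_request() or blk_fetch_request() returns NULL.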
+1 -1
drivers/scsi/scsi_transport_fc.c
··· 3913 if (!get_device(dev)) 3914 return; 3915 3916 - while (!blk_queue_plugged(q)) { 3917 if (rport && (rport->port_state == FC_PORTSTATE_BLOCKED) && 3918 !(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)) 3919 break;
··· 3913 if (!get_device(dev)) 3914 return; 3915 3916 + while (1) { 3917 if (rport && (rport->port_state == FC_PORTSTATE_BLOCKED) && 3918 !(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)) 3919 break;
+1 -5
drivers/scsi/scsi_transport_sas.c
··· 173 int ret; 174 int (*handler)(struct Scsi_Host *, struct sas_rphy *, struct request *); 175 176 - while (!blk_queue_plugged(q)) { 177 - req = blk_fetch_request(q); 178 - if (!req) 179 - break; 180 - 181 spin_unlock_irq(q->queue_lock); 182 183 handler = to_sas_internal(shost->transportt)->f->smp_handler;
··· 173 int ret; 174 int (*handler)(struct Scsi_Host *, struct sas_rphy *, struct request *); 175 176 + while ((req = blk_fetch_request(q)) != NULL) { 177 spin_unlock_irq(q->queue_lock); 178 179 handler = to_sas_internal(shost->transportt)->f->smp_handler;
+7 -4
drivers/staging/hv/blkvsc_drv.c
··· 124 125 static int blkvsc_open(struct block_device *bdev, fmode_t mode); 126 static int blkvsc_release(struct gendisk *disk, fmode_t mode); 127 - static int blkvsc_media_changed(struct gendisk *gd); 128 static int blkvsc_revalidate_disk(struct gendisk *gd); 129 static int blkvsc_getgeo(struct block_device *bd, struct hd_geometry *hg); 130 static int blkvsc_ioctl(struct block_device *bd, fmode_t mode, ··· 156 .owner = THIS_MODULE, 157 .open = blkvsc_open, 158 .release = blkvsc_release, 159 - .media_changed = blkvsc_media_changed, 160 .revalidate_disk = blkvsc_revalidate_disk, 161 .getgeo = blkvsc_getgeo, 162 .ioctl = blkvsc_ioctl, ··· 358 else 359 blkdev->gd->first_minor = 0; 360 blkdev->gd->fops = &block_ops; 361 blkdev->gd->private_data = blkdev; 362 blkdev->gd->driverfs_dev = &(blkdev->device_ctx->device); 363 sprintf(blkdev->gd->disk_name, "hd%c", 'a' + devnum); ··· 1339 return 0; 1340 } 1341 1342 - static int blkvsc_media_changed(struct gendisk *gd) 1343 { 1344 DPRINT_DBG(BLKVSC_DRV, "- enter\n"); 1345 - return 1; 1346 } 1347 1348 static int blkvsc_revalidate_disk(struct gendisk *gd)
··· 124 125 static int blkvsc_open(struct block_device *bdev, fmode_t mode); 126 static int blkvsc_release(struct gendisk *disk, fmode_t mode); 127 + static unsigned int blkvsc_check_events(struct gendisk *gd, 128 + unsigned int clearing); 129 static int blkvsc_revalidate_disk(struct gendisk *gd); 130 static int blkvsc_getgeo(struct block_device *bd, struct hd_geometry *hg); 131 static int blkvsc_ioctl(struct block_device *bd, fmode_t mode, ··· 155 .owner = THIS_MODULE, 156 .open = blkvsc_open, 157 .release = blkvsc_release, 158 + .check_events = blkvsc_check_events, 159 .revalidate_disk = blkvsc_revalidate_disk, 160 .getgeo = blkvsc_getgeo, 161 .ioctl = blkvsc_ioctl, ··· 357 else 358 blkdev->gd->first_minor = 0; 359 blkdev->gd->fops = &block_ops; 360 + blkdev->gd->events = DISK_EVENT_MEDIA_CHANGE; 361 blkdev->gd->private_data = blkdev; 362 blkdev->gd->driverfs_dev = &(blkdev->device_ctx->device); 363 sprintf(blkdev->gd->disk_name, "hd%c", 'a' + devnum); ··· 1337 return 0; 1338 } 1339 1340 + static unsigned int blkvsc_check_events(struct gendisk *gd, 1341 + unsigned int clearing) 1342 { 1343 DPRINT_DBG(BLKVSC_DRV, "- enter\n"); 1344 + return DISK_EVENT_MEDIA_CHANGE; 1345 } 1346 1347 static int blkvsc_revalidate_disk(struct gendisk *gd)
+7 -4
drivers/staging/westbridge/astoria/block/cyasblkdev_block.c
··· 381 return -ENOTTY; 382 } 383 384 - /* Media_changed block_device opp 385 * this one is called by kernel to confirm if the media really changed 386 * as we indicated by issuing check_disk_change() call */ 387 - int cyasblkdev_media_changed(struct gendisk *gd) 388 { 389 struct cyasblkdev_blk_data *bd; 390 ··· 402 #endif 403 } 404 405 - /* return media change state "1" yes, 0 no */ 406 return 0; 407 } 408 ··· 432 .ioctl = cyasblkdev_blk_ioctl, 433 /* .getgeo = cyasblkdev_blk_getgeo, */ 434 /* added to support media removal( real and simulated) media */ 435 - .media_changed = cyasblkdev_media_changed, 436 /* added to support media removal( real and simulated) media */ 437 .revalidate_disk = cyasblkdev_revalidate_disk, 438 .owner = THIS_MODULE, ··· 1090 bd->user_disk_0->first_minor = devidx << CYASBLKDEV_SHIFT; 1091 bd->user_disk_0->minors = 8; 1092 bd->user_disk_0->fops = &cyasblkdev_bdops; 1093 bd->user_disk_0->private_data = bd; 1094 bd->user_disk_0->queue = bd->queue.queue; 1095 bd->dbgprn_flags = DBGPRN_RD_RQ; ··· 1191 bd->user_disk_1->first_minor = (devidx + 1) << CYASBLKDEV_SHIFT; 1192 bd->user_disk_1->minors = 8; 1193 bd->user_disk_1->fops = &cyasblkdev_bdops; 1194 bd->user_disk_1->private_data = bd; 1195 bd->user_disk_1->queue = bd->queue.queue; 1196 bd->dbgprn_flags = DBGPRN_RD_RQ; ··· 1280 (devidx + 2) << CYASBLKDEV_SHIFT; 1281 bd->system_disk->minors = 8; 1282 bd->system_disk->fops = &cyasblkdev_bdops; 1283 bd->system_disk->private_data = bd; 1284 bd->system_disk->queue = bd->queue.queue; 1285 /* don't search for vfat
··· 381 return -ENOTTY; 382 } 383 384 + /* check_events block_device opp 385 * this one is called by kernel to confirm if the media really changed 386 * as we indicated by issuing check_disk_change() call */ 387 + unsigned int cyasblkdev_check_events(struct gendisk *gd, unsigned int clearing) 388 { 389 struct cyasblkdev_blk_data *bd; 390 ··· 402 #endif 403 } 404 405 + /* return media change state - DISK_EVENT_MEDIA_CHANGE yes, 0 no */ 406 return 0; 407 } 408 ··· 432 .ioctl = cyasblkdev_blk_ioctl, 433 /* .getgeo = cyasblkdev_blk_getgeo, */ 434 /* added to support media removal( real and simulated) media */ 435 + .check_events = cyasblkdev_check_events, 436 /* added to support media removal( real and simulated) media */ 437 .revalidate_disk = cyasblkdev_revalidate_disk, 438 .owner = THIS_MODULE, ··· 1090 bd->user_disk_0->first_minor = devidx << CYASBLKDEV_SHIFT; 1091 bd->user_disk_0->minors = 8; 1092 bd->user_disk_0->fops = &cyasblkdev_bdops; 1093 + bd->user_disk_0->events = DISK_EVENT_MEDIA_CHANGE; 1094 bd->user_disk_0->private_data = bd; 1095 bd->user_disk_0->queue = bd->queue.queue; 1096 bd->dbgprn_flags = DBGPRN_RD_RQ; ··· 1190 bd->user_disk_1->first_minor = (devidx + 1) << CYASBLKDEV_SHIFT; 1191 bd->user_disk_1->minors = 8; 1192 bd->user_disk_1->fops = &cyasblkdev_bdops; 1193 + bd->user_disk_0->events = DISK_EVENT_MEDIA_CHANGE; 1194 bd->user_disk_1->private_data = bd; 1195 bd->user_disk_1->queue = bd->queue.queue; 1196 bd->dbgprn_flags = DBGPRN_RD_RQ; ··· 1278 (devidx + 2) << CYASBLKDEV_SHIFT; 1279 bd->system_disk->minors = 8; 1280 bd->system_disk->fops = &cyasblkdev_bdops; 1281 + bd->system_disk->events = DISK_EVENT_MEDIA_CHANGE; 1282 bd->system_disk->private_data = bd; 1283 bd->system_disk->queue = bd->queue.queue; 1284 /* don't search for vfat
+3 -4
drivers/target/target_core_iblock.c
··· 391 { 392 struct se_device *dev = task->task_se_cmd->se_dev; 393 struct iblock_req *req = IBLOCK_REQ(task); 394 - struct iblock_dev *ibd = (struct iblock_dev *)req->ib_dev; 395 - struct request_queue *q = bdev_get_queue(ibd->ibd_bd); 396 struct bio *bio = req->ib_bio, *nbio = NULL; 397 int rw; 398 399 if (task->task_data_direction == DMA_TO_DEVICE) { ··· 410 rw = READ; 411 } 412 413 while (bio) { 414 nbio = bio->bi_next; 415 bio->bi_next = NULL; ··· 420 submit_bio(rw, bio); 421 bio = nbio; 422 } 423 424 - if (q->unplug_fn) 425 - q->unplug_fn(q); 426 return PYX_TRANSPORT_SENT_TO_TRANSPORT; 427 } 428
··· 391 { 392 struct se_device *dev = task->task_se_cmd->se_dev; 393 struct iblock_req *req = IBLOCK_REQ(task); 394 struct bio *bio = req->ib_bio, *nbio = NULL; 395 + struct blk_plug plug; 396 int rw; 397 398 if (task->task_data_direction == DMA_TO_DEVICE) { ··· 411 rw = READ; 412 } 413 414 + blk_start_plug(&plug); 415 while (bio) { 416 nbio = bio->bi_next; 417 bio->bi_next = NULL; ··· 420 submit_bio(rw, bio); 421 bio = nbio; 422 } 423 + blk_finish_plug(&plug); 424 425 return PYX_TRANSPORT_SENT_TO_TRANSPORT; 426 } 427
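The target_core_iblock.c hunk above drops the explicit q->unplug_fn() kick in favour of the new on-stack plugging. The general shape of that pattern, sketched here with an assumed helper name (submit_bio_list() is not from this merge), is:

    /* Hedged sketch: batching a chain of bios under an on-stack plug. */
    #include <linux/blkdev.h>
    #include <linux/bio.h>

    static void submit_bio_list(int rw, struct bio *bio)
    {
    	struct blk_plug plug;

    	blk_start_plug(&plug);
    	while (bio) {
    		struct bio *next = bio->bi_next;

    		bio->bi_next = NULL;
    		submit_bio(rw, bio);
    		bio = next;
    	}
    	/* Flushes everything queued on this plug out to the devices. */
    	blk_finish_plug(&plug);
    }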
-1
fs/adfs/inode.c
··· 72 static const struct address_space_operations adfs_aops = { 73 .readpage = adfs_readpage, 74 .writepage = adfs_writepage, 75 - .sync_page = block_sync_page, 76 .write_begin = adfs_write_begin, 77 .write_end = generic_write_end, 78 .bmap = _adfs_bmap
··· 72 static const struct address_space_operations adfs_aops = { 73 .readpage = adfs_readpage, 74 .writepage = adfs_writepage, 75 .write_begin = adfs_write_begin, 76 .write_end = generic_write_end, 77 .bmap = _adfs_bmap
-2
fs/affs/file.c
··· 429 const struct address_space_operations affs_aops = { 430 .readpage = affs_readpage, 431 .writepage = affs_writepage, 432 - .sync_page = block_sync_page, 433 .write_begin = affs_write_begin, 434 .write_end = generic_write_end, 435 .bmap = _affs_bmap ··· 785 const struct address_space_operations affs_aops_ofs = { 786 .readpage = affs_readpage_ofs, 787 //.writepage = affs_writepage_ofs, 788 - //.sync_page = affs_sync_page_ofs, 789 .write_begin = affs_write_begin_ofs, 790 .write_end = affs_write_end_ofs 791 };
··· 429 const struct address_space_operations affs_aops = { 430 .readpage = affs_readpage, 431 .writepage = affs_writepage, 432 .write_begin = affs_write_begin, 433 .write_end = generic_write_end, 434 .bmap = _affs_bmap ··· 786 const struct address_space_operations affs_aops_ofs = { 787 .readpage = affs_readpage_ofs, 788 //.writepage = affs_writepage_ofs, 789 .write_begin = affs_write_begin_ofs, 790 .write_end = affs_write_end_ofs 791 };
+7 -70
fs/aio.c
··· 34 #include <linux/security.h> 35 #include <linux/eventfd.h> 36 #include <linux/blkdev.h> 37 - #include <linux/mempool.h> 38 - #include <linux/hash.h> 39 #include <linux/compat.h> 40 41 #include <asm/kmap_types.h> ··· 63 static DEFINE_SPINLOCK(fput_lock); 64 static LIST_HEAD(fput_head); 65 66 - #define AIO_BATCH_HASH_BITS 3 /* allocated on-stack, so don't go crazy */ 67 - #define AIO_BATCH_HASH_SIZE (1 << AIO_BATCH_HASH_BITS) 68 - struct aio_batch_entry { 69 - struct hlist_node list; 70 - struct address_space *mapping; 71 - }; 72 - mempool_t *abe_pool; 73 - 74 static void aio_kick_handler(struct work_struct *); 75 static void aio_queue_work(struct kioctx *); 76 ··· 76 kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); 77 78 aio_wq = alloc_workqueue("aio", 0, 1); /* used to limit concurrency */ 79 - abe_pool = mempool_create_kmalloc_pool(1, sizeof(struct aio_batch_entry)); 80 - BUG_ON(!aio_wq || !abe_pool); 81 82 pr_debug("aio_setup: sizeof(struct page) = %d\n", (int)sizeof(struct page)); 83 ··· 1514 return 0; 1515 } 1516 1517 - static void aio_batch_add(struct address_space *mapping, 1518 - struct hlist_head *batch_hash) 1519 - { 1520 - struct aio_batch_entry *abe; 1521 - struct hlist_node *pos; 1522 - unsigned bucket; 1523 - 1524 - bucket = hash_ptr(mapping, AIO_BATCH_HASH_BITS); 1525 - hlist_for_each_entry(abe, pos, &batch_hash[bucket], list) { 1526 - if (abe->mapping == mapping) 1527 - return; 1528 - } 1529 - 1530 - abe = mempool_alloc(abe_pool, GFP_KERNEL); 1531 - 1532 - /* 1533 - * we should be using igrab here, but 1534 - * we don't want to hammer on the global 1535 - * inode spinlock just to take an extra 1536 - * reference on a file that we must already 1537 - * have a reference to. 1538 - * 1539 - * When we're called, we always have a reference 1540 - * on the file, so we must always have a reference 1541 - * on the inode, so ihold() is safe here. 
1542 - */ 1543 - ihold(mapping->host); 1544 - abe->mapping = mapping; 1545 - hlist_add_head(&abe->list, &batch_hash[bucket]); 1546 - return; 1547 - } 1548 - 1549 - static void aio_batch_free(struct hlist_head *batch_hash) 1550 - { 1551 - struct aio_batch_entry *abe; 1552 - struct hlist_node *pos, *n; 1553 - int i; 1554 - 1555 - for (i = 0; i < AIO_BATCH_HASH_SIZE; i++) { 1556 - hlist_for_each_entry_safe(abe, pos, n, &batch_hash[i], list) { 1557 - blk_run_address_space(abe->mapping); 1558 - iput(abe->mapping->host); 1559 - hlist_del(&abe->list); 1560 - mempool_free(abe, abe_pool); 1561 - } 1562 - } 1563 - } 1564 - 1565 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, 1566 - struct iocb *iocb, struct hlist_head *batch_hash, 1567 - bool compat) 1568 { 1569 struct kiocb *req; 1570 struct file *file; ··· 1606 ; 1607 } 1608 spin_unlock_irq(&ctx->ctx_lock); 1609 - if (req->ki_opcode == IOCB_CMD_PREAD || 1610 - req->ki_opcode == IOCB_CMD_PREADV || 1611 - req->ki_opcode == IOCB_CMD_PWRITE || 1612 - req->ki_opcode == IOCB_CMD_PWRITEV) 1613 - aio_batch_add(file->f_mapping, batch_hash); 1614 1615 aio_put_req(req); /* drop extra ref to req */ 1616 return 0; ··· 1622 struct kioctx *ctx; 1623 long ret = 0; 1624 int i; 1625 - struct hlist_head batch_hash[AIO_BATCH_HASH_SIZE] = { { 0, }, }; 1626 1627 if (unlikely(nr < 0)) 1628 return -EINVAL; ··· 1638 pr_debug("EINVAL: io_submit: invalid context id\n"); 1639 return -EINVAL; 1640 } 1641 1642 /* 1643 * AKPM: should this return a partial result if some of the IOs were ··· 1659 break; 1660 } 1661 1662 - ret = io_submit_one(ctx, user_iocb, &tmp, batch_hash, compat); 1663 if (ret) 1664 break; 1665 } 1666 - aio_batch_free(batch_hash); 1667 1668 put_ioctx(ctx); 1669 return i ? i : ret;
··· 34 #include <linux/security.h> 35 #include <linux/eventfd.h> 36 #include <linux/blkdev.h> 37 #include <linux/compat.h> 38 39 #include <asm/kmap_types.h> ··· 65 static DEFINE_SPINLOCK(fput_lock); 66 static LIST_HEAD(fput_head); 67 68 static void aio_kick_handler(struct work_struct *); 69 static void aio_queue_work(struct kioctx *); 70 ··· 86 kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); 87 88 aio_wq = alloc_workqueue("aio", 0, 1); /* used to limit concurrency */ 89 + BUG_ON(!aio_wq); 90 91 pr_debug("aio_setup: sizeof(struct page) = %d\n", (int)sizeof(struct page)); 92 ··· 1525 return 0; 1526 } 1527 1528 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, 1529 + struct iocb *iocb, bool compat) 1530 { 1531 struct kiocb *req; 1532 struct file *file; ··· 1666 ; 1667 } 1668 spin_unlock_irq(&ctx->ctx_lock); 1669 1670 aio_put_req(req); /* drop extra ref to req */ 1671 return 0; ··· 1687 struct kioctx *ctx; 1688 long ret = 0; 1689 int i; 1690 + struct blk_plug plug; 1691 1692 if (unlikely(nr < 0)) 1693 return -EINVAL; ··· 1703 pr_debug("EINVAL: io_submit: invalid context id\n"); 1704 return -EINVAL; 1705 } 1706 + 1707 + blk_start_plug(&plug); 1708 1709 /* 1710 * AKPM: should this return a partial result if some of the IOs were ··· 1722 break; 1723 } 1724 1725 + ret = io_submit_one(ctx, user_iocb, &tmp, compat); 1726 if (ret) 1727 break; 1728 } 1729 + blk_finish_plug(&plug); 1730 1731 put_ioctx(ctx); 1732 return i ? i : ret;
-1
fs/befs/linuxvfs.c
··· 75 76 static const struct address_space_operations befs_aops = { 77 .readpage = befs_readpage, 78 - .sync_page = block_sync_page, 79 .bmap = befs_bmap, 80 }; 81
··· 75 76 static const struct address_space_operations befs_aops = { 77 .readpage = befs_readpage, 78 .bmap = befs_bmap, 79 }; 80
-1
fs/bfs/file.c
··· 186 const struct address_space_operations bfs_aops = { 187 .readpage = bfs_readpage, 188 .writepage = bfs_writepage, 189 - .sync_page = block_sync_page, 190 .write_begin = bfs_write_begin, 191 .write_end = generic_write_end, 192 .bmap = bfs_bmap,
··· 186 const struct address_space_operations bfs_aops = { 187 .readpage = bfs_readpage, 188 .writepage = bfs_writepage, 189 .write_begin = bfs_write_begin, 190 .write_end = generic_write_end, 191 .bmap = bfs_bmap,
+3
fs/bio-integrity.c
··· 761 { 762 unsigned int max_slab = vecs_to_idx(BIO_MAX_PAGES); 763 764 bs->bio_integrity_pool = 765 mempool_create_slab_pool(pool_size, bip_slab[max_slab].slab); 766
··· 761 { 762 unsigned int max_slab = vecs_to_idx(BIO_MAX_PAGES); 763 764 + if (bs->bio_integrity_pool) 765 + return 0; 766 + 767 bs->bio_integrity_pool = 768 mempool_create_slab_pool(pool_size, bip_slab[max_slab].slab); 769
+4 -6
fs/bio.c
··· 43 * unsigned short 44 */ 45 #define BV(x) { .nr_vecs = x, .name = "biovec-"__stringify(x) } 46 - struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = { 47 BV(1), BV(4), BV(16), BV(64), BV(128), BV(BIO_MAX_PAGES), 48 }; 49 #undef BV ··· 1636 if (!bs->bio_pool) 1637 goto bad; 1638 1639 - if (bioset_integrity_create(bs, pool_size)) 1640 - goto bad; 1641 - 1642 if (!biovec_create_pools(bs, pool_size)) 1643 return bs; 1644 ··· 1653 int size; 1654 struct biovec_slab *bvs = bvec_slabs + i; 1655 1656 - #ifndef CONFIG_BLK_DEV_INTEGRITY 1657 if (bvs->nr_vecs <= BIO_INLINE_VECS) { 1658 bvs->slab = NULL; 1659 continue; 1660 } 1661 - #endif 1662 1663 size = bvs->nr_vecs * sizeof(struct bio_vec); 1664 bvs->slab = kmem_cache_create(bvs->name, size, 0, ··· 1678 fs_bio_set = bioset_create(BIO_POOL_SIZE, 0); 1679 if (!fs_bio_set) 1680 panic("bio: can't allocate bios\n"); 1681 1682 bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES, 1683 sizeof(struct bio_pair));
··· 43 * unsigned short 44 */ 45 #define BV(x) { .nr_vecs = x, .name = "biovec-"__stringify(x) } 46 + static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = { 47 BV(1), BV(4), BV(16), BV(64), BV(128), BV(BIO_MAX_PAGES), 48 }; 49 #undef BV ··· 1636 if (!bs->bio_pool) 1637 goto bad; 1638 1639 if (!biovec_create_pools(bs, pool_size)) 1640 return bs; 1641 ··· 1656 int size; 1657 struct biovec_slab *bvs = bvec_slabs + i; 1658 1659 if (bvs->nr_vecs <= BIO_INLINE_VECS) { 1660 bvs->slab = NULL; 1661 continue; 1662 } 1663 1664 size = bvs->nr_vecs * sizeof(struct bio_vec); 1665 bvs->slab = kmem_cache_create(bvs->name, size, 0, ··· 1683 fs_bio_set = bioset_create(BIO_POOL_SIZE, 0); 1684 if (!fs_bio_set) 1685 panic("bio: can't allocate bios\n"); 1686 + 1687 + if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE)) 1688 + panic("bio: can't create integrity pool\n"); 1689 1690 bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES, 1691 sizeof(struct bio_pair));
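With the fs/bio.c change above, bioset_create() no longer allocates the integrity mempool for every bio_set; fs_bio_set now requests it explicitly. A subsystem that maintains its own bio_set and attaches integrity data would do the same, roughly as sketched below (my_bio_set and MY_POOL_SIZE are illustrative names):

    /* Hedged sketch: explicitly allocating the integrity mempool for a
     * private bio_set, as subsystems must now do themselves. */
    #include <linux/init.h>
    #include <linux/bio.h>

    #define MY_POOL_SIZE	4

    static struct bio_set *my_bio_set;

    static int __init my_bioset_init(void)
    {
    	my_bio_set = bioset_create(MY_POOL_SIZE, 0);
    	if (!my_bio_set)
    		return -ENOMEM;

    	/* Only needed if this subsystem actually attaches integrity data;
    	 * plain bio users can skip this call entirely now. */
    	if (bioset_integrity_create(my_bio_set, MY_POOL_SIZE)) {
    		bioset_free(my_bio_set);
    		return -ENOMEM;
    	}
    	return 0;
    }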
+14 -13
fs/block_dev.c
··· 1087 if (!disk) 1088 goto out; 1089 1090 mutex_lock_nested(&bdev->bd_mutex, for_part); 1091 if (!bdev->bd_openers) { 1092 bdev->bd_disk = disk; ··· 1109 */ 1110 disk_put_part(bdev->bd_part); 1111 bdev->bd_part = NULL; 1112 - module_put(disk->fops->owner); 1113 - put_disk(disk); 1114 bdev->bd_disk = NULL; 1115 mutex_unlock(&bdev->bd_mutex); 1116 goto restart; 1117 } 1118 if (ret) ··· 1150 bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9); 1151 } 1152 } else { 1153 - module_put(disk->fops->owner); 1154 - put_disk(disk); 1155 - disk = NULL; 1156 if (bdev->bd_contains == bdev) { 1157 if (bdev->bd_disk->fops->open) { 1158 ret = bdev->bd_disk->fops->open(bdev, mode); ··· 1159 if (bdev->bd_invalidated) 1160 rescan_partitions(bdev->bd_disk, bdev); 1161 } 1162 } 1163 bdev->bd_openers++; 1164 if (for_part) 1165 bdev->bd_part_count++; 1166 mutex_unlock(&bdev->bd_mutex); 1167 return 0; 1168 1169 out_clear: ··· 1180 bdev->bd_contains = NULL; 1181 out_unlock_bdev: 1182 mutex_unlock(&bdev->bd_mutex); 1183 - out: 1184 - if (disk) 1185 - module_put(disk->fops->owner); 1186 put_disk(disk); 1187 bdput(bdev); 1188 1189 return ret; ··· 1449 if (bdev_free) { 1450 if (bdev->bd_write_holder) { 1451 disk_unblock_events(bdev->bd_disk); 1452 - bdev->bd_write_holder = false; 1453 - } else 1454 disk_check_events(bdev->bd_disk); 1455 } 1456 1457 mutex_unlock(&bdev->bd_mutex); 1458 - } else 1459 - disk_check_events(bdev->bd_disk); 1460 1461 return __blkdev_put(bdev, mode, 0); 1462 } ··· 1529 static const struct address_space_operations def_blk_aops = { 1530 .readpage = blkdev_readpage, 1531 .writepage = blkdev_writepage, 1532 - .sync_page = block_sync_page, 1533 .write_begin = blkdev_write_begin, 1534 .write_end = blkdev_write_end, 1535 .writepages = generic_writepages,
··· 1087 if (!disk) 1088 goto out; 1089 1090 + disk_block_events(disk); 1091 mutex_lock_nested(&bdev->bd_mutex, for_part); 1092 if (!bdev->bd_openers) { 1093 bdev->bd_disk = disk; ··· 1108 */ 1109 disk_put_part(bdev->bd_part); 1110 bdev->bd_part = NULL; 1111 bdev->bd_disk = NULL; 1112 mutex_unlock(&bdev->bd_mutex); 1113 + disk_unblock_events(disk); 1114 + module_put(disk->fops->owner); 1115 + put_disk(disk); 1116 goto restart; 1117 } 1118 if (ret) ··· 1148 bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9); 1149 } 1150 } else { 1151 if (bdev->bd_contains == bdev) { 1152 if (bdev->bd_disk->fops->open) { 1153 ret = bdev->bd_disk->fops->open(bdev, mode); ··· 1160 if (bdev->bd_invalidated) 1161 rescan_partitions(bdev->bd_disk, bdev); 1162 } 1163 + /* only one opener holds refs to the module and disk */ 1164 + module_put(disk->fops->owner); 1165 + put_disk(disk); 1166 } 1167 bdev->bd_openers++; 1168 if (for_part) 1169 bdev->bd_part_count++; 1170 mutex_unlock(&bdev->bd_mutex); 1171 + disk_unblock_events(disk); 1172 return 0; 1173 1174 out_clear: ··· 1177 bdev->bd_contains = NULL; 1178 out_unlock_bdev: 1179 mutex_unlock(&bdev->bd_mutex); 1180 + disk_unblock_events(disk); 1181 + module_put(disk->fops->owner); 1182 put_disk(disk); 1183 + out: 1184 bdput(bdev); 1185 1186 return ret; ··· 1446 if (bdev_free) { 1447 if (bdev->bd_write_holder) { 1448 disk_unblock_events(bdev->bd_disk); 1449 disk_check_events(bdev->bd_disk); 1450 + bdev->bd_write_holder = false; 1451 + } 1452 } 1453 1454 mutex_unlock(&bdev->bd_mutex); 1455 + } 1456 1457 return __blkdev_put(bdev, mode, 0); 1458 } ··· 1527 static const struct address_space_operations def_blk_aops = { 1528 .readpage = blkdev_readpage, 1529 .writepage = blkdev_writepage, 1530 .write_begin = blkdev_write_begin, 1531 .write_end = blkdev_write_end, 1532 .writepages = generic_writepages,
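The fs/block_dev.c hunk above brackets the open path with disk_block_events()/disk_unblock_events() so that event polling cannot race with partition rescanning. The pairing discipline, sketched in isolation (do_exclusive_setup() below is a placeholder, not a real kernel function):

    /* Hedged sketch: suppressing disk event polling around work that would
     * otherwise trigger spurious media-change checks. */
    #include <linux/genhd.h>

    /* placeholder for whatever work must not race with event polling */
    static int do_exclusive_setup(struct gendisk *disk)
    {
    	return 0;
    }

    static int my_setup(struct gendisk *disk)
    {
    	int ret;

    	disk_block_events(disk);	/* pause ->check_events() polling */
    	ret = do_exclusive_setup(disk);
    	disk_unblock_events(disk);	/* every block must be paired */

    	if (ret == 0)
    		disk_check_events(disk);	/* pick up anything missed */
    	return ret;
    }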
-79
fs/btrfs/disk-io.c
··· 847 .writepages = btree_writepages, 848 .releasepage = btree_releasepage, 849 .invalidatepage = btree_invalidatepage, 850 - .sync_page = block_sync_page, 851 #ifdef CONFIG_MIGRATION 852 .migratepage = btree_migratepage, 853 #endif ··· 1330 } 1331 1332 /* 1333 - * this unplugs every device on the box, and it is only used when page 1334 - * is null 1335 - */ 1336 - static void __unplug_io_fn(struct backing_dev_info *bdi, struct page *page) 1337 - { 1338 - struct btrfs_device *device; 1339 - struct btrfs_fs_info *info; 1340 - 1341 - info = (struct btrfs_fs_info *)bdi->unplug_io_data; 1342 - list_for_each_entry(device, &info->fs_devices->devices, dev_list) { 1343 - if (!device->bdev) 1344 - continue; 1345 - 1346 - bdi = blk_get_backing_dev_info(device->bdev); 1347 - if (bdi->unplug_io_fn) 1348 - bdi->unplug_io_fn(bdi, page); 1349 - } 1350 - } 1351 - 1352 - static void btrfs_unplug_io_fn(struct backing_dev_info *bdi, struct page *page) 1353 - { 1354 - struct inode *inode; 1355 - struct extent_map_tree *em_tree; 1356 - struct extent_map *em; 1357 - struct address_space *mapping; 1358 - u64 offset; 1359 - 1360 - /* the generic O_DIRECT read code does this */ 1361 - if (1 || !page) { 1362 - __unplug_io_fn(bdi, page); 1363 - return; 1364 - } 1365 - 1366 - /* 1367 - * page->mapping may change at any time. Get a consistent copy 1368 - * and use that for everything below 1369 - */ 1370 - smp_mb(); 1371 - mapping = page->mapping; 1372 - if (!mapping) 1373 - return; 1374 - 1375 - inode = mapping->host; 1376 - 1377 - /* 1378 - * don't do the expensive searching for a small number of 1379 - * devices 1380 - */ 1381 - if (BTRFS_I(inode)->root->fs_info->fs_devices->open_devices <= 2) { 1382 - __unplug_io_fn(bdi, page); 1383 - return; 1384 - } 1385 - 1386 - offset = page_offset(page); 1387 - 1388 - em_tree = &BTRFS_I(inode)->extent_tree; 1389 - read_lock(&em_tree->lock); 1390 - em = lookup_extent_mapping(em_tree, offset, PAGE_CACHE_SIZE); 1391 - read_unlock(&em_tree->lock); 1392 - if (!em) { 1393 - __unplug_io_fn(bdi, page); 1394 - return; 1395 - } 1396 - 1397 - if (em->block_start >= EXTENT_MAP_LAST_BYTE) { 1398 - free_extent_map(em); 1399 - __unplug_io_fn(bdi, page); 1400 - return; 1401 - } 1402 - offset = offset - em->start; 1403 - btrfs_unplug_page(&BTRFS_I(inode)->root->fs_info->mapping_tree, 1404 - em->block_start + offset, page); 1405 - free_extent_map(em); 1406 - } 1407 - 1408 - /* 1409 * If this fails, caller must call bdi_destroy() to get rid of the 1410 * bdi again. 1411 */ ··· 1343 return err; 1344 1345 bdi->ra_pages = default_backing_dev_info.ra_pages; 1346 - bdi->unplug_io_fn = btrfs_unplug_io_fn; 1347 - bdi->unplug_io_data = info; 1348 bdi->congested_fn = btrfs_congested_fn; 1349 bdi->congested_data = info; 1350 return 0;
··· 847 .writepages = btree_writepages, 848 .releasepage = btree_releasepage, 849 .invalidatepage = btree_invalidatepage, 850 #ifdef CONFIG_MIGRATION 851 .migratepage = btree_migratepage, 852 #endif ··· 1331 } 1332 1333 /* 1334 * If this fails, caller must call bdi_destroy() to get rid of the 1335 * bdi again. 1336 */ ··· 1420 return err; 1421 1422 bdi->ra_pages = default_backing_dev_info.ra_pages; 1423 bdi->congested_fn = btrfs_congested_fn; 1424 bdi->congested_data = info; 1425 return 0;
+1 -1
fs/btrfs/extent_io.c
··· 2188 unsigned long nr_written = 0; 2189 2190 if (wbc->sync_mode == WB_SYNC_ALL) 2191 - write_flags = WRITE_SYNC_PLUG; 2192 else 2193 write_flags = WRITE; 2194
··· 2188 unsigned long nr_written = 0; 2189 2190 if (wbc->sync_mode == WB_SYNC_ALL) 2191 + write_flags = WRITE_SYNC; 2192 else 2193 write_flags = WRITE; 2194
-1
fs/btrfs/inode.c
··· 7340 .writepage = btrfs_writepage, 7341 .writepages = btrfs_writepages, 7342 .readpages = btrfs_readpages, 7343 - .sync_page = block_sync_page, 7344 .direct_IO = btrfs_direct_IO, 7345 .invalidatepage = btrfs_invalidatepage, 7346 .releasepage = btrfs_releasepage,
··· 7340 .writepage = btrfs_writepage, 7341 .writepages = btrfs_writepages, 7342 .readpages = btrfs_readpages, 7343 .direct_IO = btrfs_direct_IO, 7344 .invalidatepage = btrfs_invalidatepage, 7345 .releasepage = btrfs_releasepage,
+11 -80
fs/btrfs/volumes.c
··· 162 struct bio *cur; 163 int again = 0; 164 unsigned long num_run; 165 - unsigned long num_sync_run; 166 unsigned long batch_run = 0; 167 unsigned long limit; 168 unsigned long last_waited = 0; ··· 171 fs_info = device->dev_root->fs_info; 172 limit = btrfs_async_submit_limit(fs_info); 173 limit = limit * 2 / 3; 174 - 175 - /* we want to make sure that every time we switch from the sync 176 - * list to the normal list, we unplug 177 - */ 178 - num_sync_run = 0; 179 180 loop: 181 spin_lock(&device->io_lock); ··· 217 218 spin_unlock(&device->io_lock); 219 220 - /* 221 - * if we're doing the regular priority list, make sure we unplug 222 - * for any high prio bios we've sent down 223 - */ 224 - if (pending_bios == &device->pending_bios && num_sync_run > 0) { 225 - num_sync_run = 0; 226 - blk_run_backing_dev(bdi, NULL); 227 - } 228 - 229 while (pending) { 230 231 rmb(); ··· 244 245 BUG_ON(atomic_read(&cur->bi_cnt) == 0); 246 247 - if (cur->bi_rw & REQ_SYNC) 248 - num_sync_run++; 249 - 250 submit_bio(cur->bi_rw, cur); 251 num_run++; 252 batch_run++; 253 - if (need_resched()) { 254 - if (num_sync_run) { 255 - blk_run_backing_dev(bdi, NULL); 256 - num_sync_run = 0; 257 - } 258 cond_resched(); 259 - } 260 261 /* 262 * we made progress, there is more work to do and the bdi ··· 281 * against it before looping 282 */ 283 last_waited = ioc->last_waited; 284 - if (need_resched()) { 285 - if (num_sync_run) { 286 - blk_run_backing_dev(bdi, NULL); 287 - num_sync_run = 0; 288 - } 289 cond_resched(); 290 - } 291 continue; 292 } 293 spin_lock(&device->io_lock); ··· 294 goto done; 295 } 296 } 297 - 298 - if (num_sync_run) { 299 - num_sync_run = 0; 300 - blk_run_backing_dev(bdi, NULL); 301 - } 302 - /* 303 - * IO has already been through a long path to get here. Checksumming, 304 - * async helper threads, perhaps compression. We've done a pretty 305 - * good job of collecting a batch of IO and should just unplug 306 - * the device right away. 307 - * 308 - * This will help anyone who is waiting on the IO, they might have 309 - * already unplugged, but managed to do so before the bio they 310 - * cared about found its way down here. 
311 - */ 312 - blk_run_backing_dev(bdi, NULL); 313 314 cond_resched(); 315 if (again) ··· 2911 static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw, 2912 u64 logical, u64 *length, 2913 struct btrfs_multi_bio **multi_ret, 2914 - int mirror_num, struct page *unplug_page) 2915 { 2916 struct extent_map *em; 2917 struct map_lookup *map; ··· 2942 read_lock(&em_tree->lock); 2943 em = lookup_extent_mapping(em_tree, logical, *length); 2944 read_unlock(&em_tree->lock); 2945 - 2946 - if (!em && unplug_page) { 2947 - kfree(multi); 2948 - return 0; 2949 - } 2950 2951 if (!em) { 2952 printk(KERN_CRIT "unable to find logical %llu len %llu\n", ··· 2998 *length = em->len - offset; 2999 } 3000 3001 - if (!multi_ret && !unplug_page) 3002 goto out; 3003 3004 num_stripes = 1; 3005 stripe_index = 0; 3006 if (map->type & BTRFS_BLOCK_GROUP_RAID1) { 3007 - if (unplug_page || (rw & REQ_WRITE)) 3008 num_stripes = map->num_stripes; 3009 else if (mirror_num) 3010 stripe_index = mirror_num - 1; ··· 3026 stripe_index = do_div(stripe_nr, factor); 3027 stripe_index *= map->sub_stripes; 3028 3029 - if (unplug_page || (rw & REQ_WRITE)) 3030 num_stripes = map->sub_stripes; 3031 else if (mirror_num) 3032 stripe_index += mirror_num - 1; ··· 3046 BUG_ON(stripe_index >= map->num_stripes); 3047 3048 for (i = 0; i < num_stripes; i++) { 3049 - if (unplug_page) { 3050 - struct btrfs_device *device; 3051 - struct backing_dev_info *bdi; 3052 - 3053 - device = map->stripes[stripe_index].dev; 3054 - if (device->bdev) { 3055 - bdi = blk_get_backing_dev_info(device->bdev); 3056 - if (bdi->unplug_io_fn) 3057 - bdi->unplug_io_fn(bdi, unplug_page); 3058 - } 3059 - } else { 3060 - multi->stripes[i].physical = 3061 - map->stripes[stripe_index].physical + 3062 - stripe_offset + stripe_nr * map->stripe_len; 3063 - multi->stripes[i].dev = map->stripes[stripe_index].dev; 3064 - } 3065 stripe_index++; 3066 } 3067 if (multi_ret) { ··· 3067 struct btrfs_multi_bio **multi_ret, int mirror_num) 3068 { 3069 return __btrfs_map_block(map_tree, rw, logical, length, multi_ret, 3070 - mirror_num, NULL); 3071 } 3072 3073 int btrfs_rmap_block(struct btrfs_mapping_tree *map_tree, ··· 3133 3134 free_extent_map(em); 3135 return 0; 3136 - } 3137 - 3138 - int btrfs_unplug_page(struct btrfs_mapping_tree *map_tree, 3139 - u64 logical, struct page *page) 3140 - { 3141 - u64 length = PAGE_CACHE_SIZE; 3142 - return __btrfs_map_block(map_tree, READ, logical, &length, 3143 - NULL, 0, page); 3144 } 3145 3146 static void end_bio_multi_stripe(struct bio *bio, int err)
··· 162 struct bio *cur; 163 int again = 0; 164 unsigned long num_run; 165 unsigned long batch_run = 0; 166 unsigned long limit; 167 unsigned long last_waited = 0; ··· 172 fs_info = device->dev_root->fs_info; 173 limit = btrfs_async_submit_limit(fs_info); 174 limit = limit * 2 / 3; 175 176 loop: 177 spin_lock(&device->io_lock); ··· 223 224 spin_unlock(&device->io_lock); 225 226 while (pending) { 227 228 rmb(); ··· 259 260 BUG_ON(atomic_read(&cur->bi_cnt) == 0); 261 262 submit_bio(cur->bi_rw, cur); 263 num_run++; 264 batch_run++; 265 + if (need_resched()) 266 cond_resched(); 267 268 /* 269 * we made progress, there is more work to do and the bdi ··· 304 * against it before looping 305 */ 306 last_waited = ioc->last_waited; 307 + if (need_resched()) 308 cond_resched(); 309 continue; 310 } 311 spin_lock(&device->io_lock); ··· 322 goto done; 323 } 324 } 325 326 cond_resched(); 327 if (again) ··· 2955 static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw, 2956 u64 logical, u64 *length, 2957 struct btrfs_multi_bio **multi_ret, 2958 + int mirror_num) 2959 { 2960 struct extent_map *em; 2961 struct map_lookup *map; ··· 2986 read_lock(&em_tree->lock); 2987 em = lookup_extent_mapping(em_tree, logical, *length); 2988 read_unlock(&em_tree->lock); 2989 2990 if (!em) { 2991 printk(KERN_CRIT "unable to find logical %llu len %llu\n", ··· 3047 *length = em->len - offset; 3048 } 3049 3050 + if (!multi_ret) 3051 goto out; 3052 3053 num_stripes = 1; 3054 stripe_index = 0; 3055 if (map->type & BTRFS_BLOCK_GROUP_RAID1) { 3056 + if (rw & REQ_WRITE) 3057 num_stripes = map->num_stripes; 3058 else if (mirror_num) 3059 stripe_index = mirror_num - 1; ··· 3075 stripe_index = do_div(stripe_nr, factor); 3076 stripe_index *= map->sub_stripes; 3077 3078 + if (rw & REQ_WRITE) 3079 num_stripes = map->sub_stripes; 3080 else if (mirror_num) 3081 stripe_index += mirror_num - 1; ··· 3095 BUG_ON(stripe_index >= map->num_stripes); 3096 3097 for (i = 0; i < num_stripes; i++) { 3098 + multi->stripes[i].physical = 3099 + map->stripes[stripe_index].physical + 3100 + stripe_offset + stripe_nr * map->stripe_len; 3101 + multi->stripes[i].dev = map->stripes[stripe_index].dev; 3102 stripe_index++; 3103 } 3104 if (multi_ret) { ··· 3128 struct btrfs_multi_bio **multi_ret, int mirror_num) 3129 { 3130 return __btrfs_map_block(map_tree, rw, logical, length, multi_ret, 3131 + mirror_num); 3132 } 3133 3134 int btrfs_rmap_block(struct btrfs_mapping_tree *map_tree, ··· 3194 3195 free_extent_map(em); 3196 return 0; 3197 } 3198 3199 static void end_bio_multi_stripe(struct bio *bio, int err)
+14 -37
fs/buffer.c
··· 54 } 55 EXPORT_SYMBOL(init_buffer); 56 57 - static int sync_buffer(void *word) 58 { 59 - struct block_device *bd; 60 - struct buffer_head *bh 61 - = container_of(word, struct buffer_head, b_state); 62 - 63 - smp_mb(); 64 - bd = bh->b_bdev; 65 - if (bd) 66 - blk_run_address_space(bd->bd_inode->i_mapping); 67 io_schedule(); 68 return 0; 69 } 70 71 void __lock_buffer(struct buffer_head *bh) 72 { 73 - wait_on_bit_lock(&bh->b_state, BH_Lock, sync_buffer, 74 TASK_UNINTERRUPTIBLE); 75 } 76 EXPORT_SYMBOL(__lock_buffer); ··· 82 */ 83 void __wait_on_buffer(struct buffer_head * bh) 84 { 85 - wait_on_bit(&bh->b_state, BH_Lock, sync_buffer, TASK_UNINTERRUPTIBLE); 86 } 87 EXPORT_SYMBOL(__wait_on_buffer); 88 ··· 741 { 742 struct buffer_head *bh; 743 struct list_head tmp; 744 - struct address_space *mapping, *prev_mapping = NULL; 745 int err = 0, err2; 746 747 INIT_LIST_HEAD(&tmp); 748 749 spin_lock(lock); 750 while (!list_empty(list)) { ··· 769 * still in flight on potentially older 770 * contents. 771 */ 772 - write_dirty_buffer(bh, WRITE_SYNC_PLUG); 773 774 /* 775 * Kick off IO for the previous mapping. Note ··· 777 * wait_on_buffer() will do that for us 778 * through sync_buffer(). 779 */ 780 - if (prev_mapping && prev_mapping != mapping) 781 - blk_run_address_space(prev_mapping); 782 - prev_mapping = mapping; 783 - 784 brelse(bh); 785 spin_lock(lock); 786 } 787 } 788 } 789 790 while (!list_empty(&tmp)) { 791 bh = BH_ENTRY(tmp.prev); ··· 1608 * prevents this contention from occurring. 1609 * 1610 * If block_write_full_page() is called with wbc->sync_mode == 1611 - * WB_SYNC_ALL, the writes are posted using WRITE_SYNC_PLUG; this 1612 - * causes the writes to be flagged as synchronous writes, but the 1613 - * block device queue will NOT be unplugged, since usually many pages 1614 - * will be pushed to the out before the higher-level caller actually 1615 - * waits for the writes to be completed. The various wait functions, 1616 - * such as wait_on_writeback_range() will ultimately call sync_page() 1617 - * which will ultimately call blk_run_backing_dev(), which will end up 1618 - * unplugging the device queue. 1619 */ 1620 static int __block_write_full_page(struct inode *inode, struct page *page, 1621 get_block_t *get_block, struct writeback_control *wbc, ··· 1622 const unsigned blocksize = 1 << inode->i_blkbits; 1623 int nr_underway = 0; 1624 int write_op = (wbc->sync_mode == WB_SYNC_ALL ? 1625 - WRITE_SYNC_PLUG : WRITE); 1626 1627 BUG_ON(!PageLocked(page)); 1628 ··· 3125 return ret; 3126 } 3127 EXPORT_SYMBOL(try_to_free_buffers); 3128 - 3129 - void block_sync_page(struct page *page) 3130 - { 3131 - struct address_space *mapping; 3132 - 3133 - smp_mb(); 3134 - mapping = page_mapping(page); 3135 - if (mapping) 3136 - blk_run_backing_dev(mapping->backing_dev_info, page); 3137 - } 3138 - EXPORT_SYMBOL(block_sync_page); 3139 3140 /* 3141 * There are no bdflush tunables left. But distributions are
··· 54 } 55 EXPORT_SYMBOL(init_buffer); 56 57 + static int sleep_on_buffer(void *word) 58 { 59 io_schedule(); 60 return 0; 61 } 62 63 void __lock_buffer(struct buffer_head *bh) 64 { 65 + wait_on_bit_lock(&bh->b_state, BH_Lock, sleep_on_buffer, 66 TASK_UNINTERRUPTIBLE); 67 } 68 EXPORT_SYMBOL(__lock_buffer); ··· 90 */ 91 void __wait_on_buffer(struct buffer_head * bh) 92 { 93 + wait_on_bit(&bh->b_state, BH_Lock, sleep_on_buffer, TASK_UNINTERRUPTIBLE); 94 } 95 EXPORT_SYMBOL(__wait_on_buffer); 96 ··· 749 { 750 struct buffer_head *bh; 751 struct list_head tmp; 752 + struct address_space *mapping; 753 int err = 0, err2; 754 + struct blk_plug plug; 755 756 INIT_LIST_HEAD(&tmp); 757 + blk_start_plug(&plug); 758 759 spin_lock(lock); 760 while (!list_empty(list)) { ··· 775 * still in flight on potentially older 776 * contents. 777 */ 778 + write_dirty_buffer(bh, WRITE_SYNC); 779 780 /* 781 * Kick off IO for the previous mapping. Note ··· 783 * wait_on_buffer() will do that for us 784 * through sync_buffer(). 785 */ 786 brelse(bh); 787 spin_lock(lock); 788 } 789 } 790 } 791 + 792 + spin_unlock(lock); 793 + blk_finish_plug(&plug); 794 + spin_lock(lock); 795 796 while (!list_empty(&tmp)) { 797 bh = BH_ENTRY(tmp.prev); ··· 1614 * prevents this contention from occurring. 1615 * 1616 * If block_write_full_page() is called with wbc->sync_mode == 1617 + * WB_SYNC_ALL, the writes are posted using WRITE_SYNC; this 1618 + * causes the writes to be flagged as synchronous writes. 1619 */ 1620 static int __block_write_full_page(struct inode *inode, struct page *page, 1621 get_block_t *get_block, struct writeback_control *wbc, ··· 1634 const unsigned blocksize = 1 << inode->i_blkbits; 1635 int nr_underway = 0; 1636 int write_op = (wbc->sync_mode == WB_SYNC_ALL ? 1637 + WRITE_SYNC : WRITE); 1638 1639 BUG_ON(!PageLocked(page)); 1640 ··· 3137 return ret; 3138 } 3139 EXPORT_SYMBOL(try_to_free_buffers); 3140 3141 /* 3142 * There are no bdflush tunables left. But distributions are
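In the fs/buffer.c hunk above the wait action no longer kicks the queue; it simply sleeps, since submitted IO is dispatched by the plug machinery rather than by an unplug from the waiter. The underlying wait_on_bit() idiom, sketched with assumed names (my_* and MY_BUSY_BIT are illustrative):

    /* Hedged sketch: waiting on a state bit with an action routine that
     * only sleeps, in the style of sleep_on_buffer() above. */
    #include <linux/wait.h>
    #include <linux/sched.h>

    #define MY_BUSY_BIT	0

    static int my_wait_action(void *word)
    {
    	io_schedule();		/* just sleep; IO is kicked off elsewhere */
    	return 0;		/* 0 = keep waiting until the bit clears */
    }

    static void my_wait_for_idle(unsigned long *flags)
    {
    	wait_on_bit(flags, MY_BUSY_BIT, my_wait_action, TASK_UNINTERRUPTIBLE);
    }

    static void my_lock(unsigned long *flags)
    {
    	wait_on_bit_lock(flags, MY_BUSY_BIT, my_wait_action,
    			 TASK_UNINTERRUPTIBLE);
    }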
-30
fs/cifs/file.c
··· 1569 return rc; 1570 } 1571 1572 - /* static void cifs_sync_page(struct page *page) 1573 - { 1574 - struct address_space *mapping; 1575 - struct inode *inode; 1576 - unsigned long index = page->index; 1577 - unsigned int rpages = 0; 1578 - int rc = 0; 1579 - 1580 - cFYI(1, "sync page %p", page); 1581 - mapping = page->mapping; 1582 - if (!mapping) 1583 - return 0; 1584 - inode = mapping->host; 1585 - if (!inode) 1586 - return; */ 1587 - 1588 - /* fill in rpages then 1589 - result = cifs_pagein_inode(inode, index, rpages); */ /* BB finish */ 1590 - 1591 - /* cFYI(1, "rpages is %d for sync page of Index %ld", rpages, index); 1592 - 1593 - #if 0 1594 - if (rc < 0) 1595 - return rc; 1596 - return 0; 1597 - #endif 1598 - } */ 1599 - 1600 /* 1601 * As file closes, flush all cached write data for this inode checking 1602 * for write behind errors. ··· 2482 .set_page_dirty = __set_page_dirty_nobuffers, 2483 .releasepage = cifs_release_page, 2484 .invalidatepage = cifs_invalidate_page, 2485 - /* .sync_page = cifs_sync_page, */ 2486 /* .direct_IO = */ 2487 }; 2488 ··· 2499 .set_page_dirty = __set_page_dirty_nobuffers, 2500 .releasepage = cifs_release_page, 2501 .invalidatepage = cifs_invalidate_page, 2502 - /* .sync_page = cifs_sync_page, */ 2503 /* .direct_IO = */ 2504 };
··· 1569 return rc; 1570 } 1571 1572 /* 1573 * As file closes, flush all cached write data for this inode checking 1574 * for write behind errors. ··· 2510 .set_page_dirty = __set_page_dirty_nobuffers, 2511 .releasepage = cifs_release_page, 2512 .invalidatepage = cifs_invalidate_page, 2513 /* .direct_IO = */ 2514 }; 2515 ··· 2528 .set_page_dirty = __set_page_dirty_nobuffers, 2529 .releasepage = cifs_release_page, 2530 .invalidatepage = cifs_invalidate_page, 2531 /* .direct_IO = */ 2532 };
+2 -5
fs/direct-io.c
··· 1110 ((rw & READ) || (dio->result == dio->size))) 1111 ret = -EIOCBQUEUED; 1112 1113 - if (ret != -EIOCBQUEUED) { 1114 - /* All IO is now issued, send it on its way */ 1115 - blk_run_address_space(inode->i_mapping); 1116 dio_await_completion(dio); 1117 - } 1118 1119 /* 1120 * Sync will always be dropping the final ref and completing the ··· 1173 struct dio *dio; 1174 1175 if (rw & WRITE) 1176 - rw = WRITE_ODIRECT_PLUG; 1177 1178 if (bdev) 1179 bdev_blkbits = blksize_bits(bdev_logical_block_size(bdev));
··· 1110 ((rw & READ) || (dio->result == dio->size))) 1111 ret = -EIOCBQUEUED; 1112 1113 + if (ret != -EIOCBQUEUED) 1114 dio_await_completion(dio); 1115 1116 /* 1117 * Sync will always be dropping the final ref and completing the ··· 1176 struct dio *dio; 1177 1178 if (rw & WRITE) 1179 + rw = WRITE_ODIRECT; 1180 1181 if (bdev) 1182 bdev_blkbits = blksize_bits(bdev_logical_block_size(bdev));
-1
fs/efs/inode.c
··· 23 } 24 static const struct address_space_operations efs_aops = { 25 .readpage = efs_readpage, 26 - .sync_page = block_sync_page, 27 .bmap = _efs_bmap 28 }; 29
··· 23 } 24 static const struct address_space_operations efs_aops = { 25 .readpage = efs_readpage, 26 .bmap = _efs_bmap 27 }; 28
-1
fs/exofs/inode.c
··· 823 .direct_IO = NULL, /* TODO: Should be trivial to do */ 824 825 /* With these NULL has special meaning or default is not exported */ 826 - .sync_page = NULL, 827 .get_xip_mem = NULL, 828 .migratepage = NULL, 829 .launder_page = NULL,
··· 823 .direct_IO = NULL, /* TODO: Should be trivial to do */ 824 825 /* With these NULL has special meaning or default is not exported */ 826 .get_xip_mem = NULL, 827 .migratepage = NULL, 828 .launder_page = NULL,
-2
fs/ext2/inode.c
··· 860 .readpage = ext2_readpage, 861 .readpages = ext2_readpages, 862 .writepage = ext2_writepage, 863 - .sync_page = block_sync_page, 864 .write_begin = ext2_write_begin, 865 .write_end = ext2_write_end, 866 .bmap = ext2_bmap, ··· 879 .readpage = ext2_readpage, 880 .readpages = ext2_readpages, 881 .writepage = ext2_nobh_writepage, 882 - .sync_page = block_sync_page, 883 .write_begin = ext2_nobh_write_begin, 884 .write_end = nobh_write_end, 885 .bmap = ext2_bmap,
··· 860 .readpage = ext2_readpage, 861 .readpages = ext2_readpages, 862 .writepage = ext2_writepage, 863 .write_begin = ext2_write_begin, 864 .write_end = ext2_write_end, 865 .bmap = ext2_bmap, ··· 880 .readpage = ext2_readpage, 881 .readpages = ext2_readpages, 882 .writepage = ext2_nobh_writepage, 883 .write_begin = ext2_nobh_write_begin, 884 .write_end = nobh_write_end, 885 .bmap = ext2_bmap,
-3
fs/ext3/inode.c
··· 1894 .readpage = ext3_readpage, 1895 .readpages = ext3_readpages, 1896 .writepage = ext3_ordered_writepage, 1897 - .sync_page = block_sync_page, 1898 .write_begin = ext3_write_begin, 1899 .write_end = ext3_ordered_write_end, 1900 .bmap = ext3_bmap, ··· 1909 .readpage = ext3_readpage, 1910 .readpages = ext3_readpages, 1911 .writepage = ext3_writeback_writepage, 1912 - .sync_page = block_sync_page, 1913 .write_begin = ext3_write_begin, 1914 .write_end = ext3_writeback_write_end, 1915 .bmap = ext3_bmap, ··· 1924 .readpage = ext3_readpage, 1925 .readpages = ext3_readpages, 1926 .writepage = ext3_journalled_writepage, 1927 - .sync_page = block_sync_page, 1928 .write_begin = ext3_write_begin, 1929 .write_end = ext3_journalled_write_end, 1930 .set_page_dirty = ext3_journalled_set_page_dirty,
··· 1894 .readpage = ext3_readpage, 1895 .readpages = ext3_readpages, 1896 .writepage = ext3_ordered_writepage, 1897 .write_begin = ext3_write_begin, 1898 .write_end = ext3_ordered_write_end, 1899 .bmap = ext3_bmap, ··· 1910 .readpage = ext3_readpage, 1911 .readpages = ext3_readpages, 1912 .writepage = ext3_writeback_writepage, 1913 .write_begin = ext3_write_begin, 1914 .write_end = ext3_writeback_write_end, 1915 .bmap = ext3_bmap, ··· 1926 .readpage = ext3_readpage, 1927 .readpages = ext3_readpages, 1928 .writepage = ext3_journalled_writepage, 1929 .write_begin = ext3_write_begin, 1930 .write_end = ext3_journalled_write_end, 1931 .set_page_dirty = ext3_journalled_set_page_dirty,
-4
fs/ext4/inode.c
··· 3903 .readpage = ext4_readpage, 3904 .readpages = ext4_readpages, 3905 .writepage = ext4_writepage, 3906 - .sync_page = block_sync_page, 3907 .write_begin = ext4_write_begin, 3908 .write_end = ext4_ordered_write_end, 3909 .bmap = ext4_bmap, ··· 3918 .readpage = ext4_readpage, 3919 .readpages = ext4_readpages, 3920 .writepage = ext4_writepage, 3921 - .sync_page = block_sync_page, 3922 .write_begin = ext4_write_begin, 3923 .write_end = ext4_writeback_write_end, 3924 .bmap = ext4_bmap, ··· 3933 .readpage = ext4_readpage, 3934 .readpages = ext4_readpages, 3935 .writepage = ext4_writepage, 3936 - .sync_page = block_sync_page, 3937 .write_begin = ext4_write_begin, 3938 .write_end = ext4_journalled_write_end, 3939 .set_page_dirty = ext4_journalled_set_page_dirty, ··· 3948 .readpages = ext4_readpages, 3949 .writepage = ext4_writepage, 3950 .writepages = ext4_da_writepages, 3951 - .sync_page = block_sync_page, 3952 .write_begin = ext4_da_write_begin, 3953 .write_end = ext4_da_write_end, 3954 .bmap = ext4_bmap,
··· 3903 .readpage = ext4_readpage, 3904 .readpages = ext4_readpages, 3905 .writepage = ext4_writepage, 3906 .write_begin = ext4_write_begin, 3907 .write_end = ext4_ordered_write_end, 3908 .bmap = ext4_bmap, ··· 3919 .readpage = ext4_readpage, 3920 .readpages = ext4_readpages, 3921 .writepage = ext4_writepage, 3922 .write_begin = ext4_write_begin, 3923 .write_end = ext4_writeback_write_end, 3924 .bmap = ext4_bmap, ··· 3935 .readpage = ext4_readpage, 3936 .readpages = ext4_readpages, 3937 .writepage = ext4_writepage, 3938 .write_begin = ext4_write_begin, 3939 .write_end = ext4_journalled_write_end, 3940 .set_page_dirty = ext4_journalled_set_page_dirty, ··· 3951 .readpages = ext4_readpages, 3952 .writepage = ext4_writepage, 3953 .writepages = ext4_da_writepages, 3954 .write_begin = ext4_da_write_begin, 3955 .write_end = ext4_da_write_end, 3956 .bmap = ext4_bmap,
+1 -2
fs/ext4/page-io.c
··· 310 io_end->offset = (page->index << PAGE_CACHE_SHIFT) + bh_offset(bh); 311 312 io->io_bio = bio; 313 - io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? 314 - WRITE_SYNC_PLUG : WRITE); 315 io->io_next_block = bh->b_blocknr; 316 return 0; 317 }
··· 310 io_end->offset = (page->index << PAGE_CACHE_SHIFT) + bh_offset(bh); 311 312 io->io_bio = bio; 313 + io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); 314 io->io_next_block = bh->b_blocknr; 315 return 0; 316 }
-1
fs/fat/inode.c
··· 236 .readpages = fat_readpages, 237 .writepage = fat_writepage, 238 .writepages = fat_writepages, 239 - .sync_page = block_sync_page, 240 .write_begin = fat_write_begin, 241 .write_end = fat_write_end, 242 .direct_IO = fat_direct_IO,
··· 236 .readpages = fat_readpages, 237 .writepage = fat_writepage, 238 .writepages = fat_writepages, 239 .write_begin = fat_write_begin, 240 .write_end = fat_write_end, 241 .direct_IO = fat_direct_IO,
-1
fs/freevxfs/vxfs_subr.c
··· 44 const struct address_space_operations vxfs_aops = { 45 .readpage = vxfs_readpage, 46 .bmap = vxfs_bmap, 47 - .sync_page = block_sync_page, 48 }; 49 50 inline void
··· 44 const struct address_space_operations vxfs_aops = { 45 .readpage = vxfs_readpage, 46 .bmap = vxfs_bmap, 47 }; 48 49 inline void
-1
fs/fuse/inode.c
··· 870 871 fc->bdi.name = "fuse"; 872 fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; 873 - fc->bdi.unplug_io_fn = default_unplug_io_fn; 874 /* fuse does it's own writeback accounting */ 875 fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB; 876
··· 870 871 fc->bdi.name = "fuse"; 872 fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; 873 /* fuse does it's own writeback accounting */ 874 fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB; 875
-3
fs/gfs2/aops.c
··· 1117 .writepages = gfs2_writeback_writepages, 1118 .readpage = gfs2_readpage, 1119 .readpages = gfs2_readpages, 1120 - .sync_page = block_sync_page, 1121 .write_begin = gfs2_write_begin, 1122 .write_end = gfs2_write_end, 1123 .bmap = gfs2_bmap, ··· 1132 .writepage = gfs2_ordered_writepage, 1133 .readpage = gfs2_readpage, 1134 .readpages = gfs2_readpages, 1135 - .sync_page = block_sync_page, 1136 .write_begin = gfs2_write_begin, 1137 .write_end = gfs2_write_end, 1138 .set_page_dirty = gfs2_set_page_dirty, ··· 1149 .writepages = gfs2_jdata_writepages, 1150 .readpage = gfs2_readpage, 1151 .readpages = gfs2_readpages, 1152 - .sync_page = block_sync_page, 1153 .write_begin = gfs2_write_begin, 1154 .write_end = gfs2_write_end, 1155 .set_page_dirty = gfs2_set_page_dirty,
··· 1117 .writepages = gfs2_writeback_writepages, 1118 .readpage = gfs2_readpage, 1119 .readpages = gfs2_readpages, 1120 .write_begin = gfs2_write_begin, 1121 .write_end = gfs2_write_end, 1122 .bmap = gfs2_bmap, ··· 1133 .writepage = gfs2_ordered_writepage, 1134 .readpage = gfs2_readpage, 1135 .readpages = gfs2_readpages, 1136 .write_begin = gfs2_write_begin, 1137 .write_end = gfs2_write_end, 1138 .set_page_dirty = gfs2_set_page_dirty, ··· 1151 .writepages = gfs2_jdata_writepages, 1152 .readpage = gfs2_readpage, 1153 .readpages = gfs2_readpages, 1154 .write_begin = gfs2_write_begin, 1155 .write_end = gfs2_write_end, 1156 .set_page_dirty = gfs2_set_page_dirty,
+2 -2
fs/gfs2/log.c
··· 121 lock_buffer(bh); 122 if (test_clear_buffer_dirty(bh)) { 123 bh->b_end_io = end_buffer_write_sync; 124 - submit_bh(WRITE_SYNC_PLUG, bh); 125 } else { 126 unlock_buffer(bh); 127 brelse(bh); ··· 647 lock_buffer(bh); 648 if (buffer_mapped(bh) && test_clear_buffer_dirty(bh)) { 649 bh->b_end_io = end_buffer_write_sync; 650 - submit_bh(WRITE_SYNC_PLUG, bh); 651 } else { 652 unlock_buffer(bh); 653 brelse(bh);
··· 121 lock_buffer(bh); 122 if (test_clear_buffer_dirty(bh)) { 123 bh->b_end_io = end_buffer_write_sync; 124 + submit_bh(WRITE_SYNC, bh); 125 } else { 126 unlock_buffer(bh); 127 brelse(bh); ··· 647 lock_buffer(bh); 648 if (buffer_mapped(bh) && test_clear_buffer_dirty(bh)) { 649 bh->b_end_io = end_buffer_write_sync; 650 + submit_bh(WRITE_SYNC, bh); 651 } else { 652 unlock_buffer(bh); 653 brelse(bh);
+6 -6
fs/gfs2/lops.c
··· 204 } 205 206 gfs2_log_unlock(sdp); 207 - submit_bh(WRITE_SYNC_PLUG, bh); 208 gfs2_log_lock(sdp); 209 210 n = 0; ··· 214 gfs2_log_unlock(sdp); 215 lock_buffer(bd2->bd_bh); 216 bh = gfs2_log_fake_buf(sdp, bd2->bd_bh); 217 - submit_bh(WRITE_SYNC_PLUG, bh); 218 gfs2_log_lock(sdp); 219 if (++n >= num) 220 break; ··· 356 sdp->sd_log_num_revoke--; 357 358 if (offset + sizeof(u64) > sdp->sd_sb.sb_bsize) { 359 - submit_bh(WRITE_SYNC_PLUG, bh); 360 361 bh = gfs2_log_get_buf(sdp); 362 mh = (struct gfs2_meta_header *)bh->b_data; ··· 373 } 374 gfs2_assert_withdraw(sdp, !sdp->sd_log_num_revoke); 375 376 - submit_bh(WRITE_SYNC_PLUG, bh); 377 } 378 379 static void revoke_lo_before_scan(struct gfs2_jdesc *jd, ··· 575 ptr = bh_log_ptr(bh); 576 577 get_bh(bh); 578 - submit_bh(WRITE_SYNC_PLUG, bh); 579 gfs2_log_lock(sdp); 580 while(!list_empty(list)) { 581 bd = list_entry(list->next, struct gfs2_bufdata, bd_le.le_list); ··· 601 } else { 602 bh1 = gfs2_log_fake_buf(sdp, bd->bd_bh); 603 } 604 - submit_bh(WRITE_SYNC_PLUG, bh1); 605 gfs2_log_lock(sdp); 606 ptr += 2; 607 }
··· 204 } 205 206 gfs2_log_unlock(sdp); 207 + submit_bh(WRITE_SYNC, bh); 208 gfs2_log_lock(sdp); 209 210 n = 0; ··· 214 gfs2_log_unlock(sdp); 215 lock_buffer(bd2->bd_bh); 216 bh = gfs2_log_fake_buf(sdp, bd2->bd_bh); 217 + submit_bh(WRITE_SYNC, bh); 218 gfs2_log_lock(sdp); 219 if (++n >= num) 220 break; ··· 356 sdp->sd_log_num_revoke--; 357 358 if (offset + sizeof(u64) > sdp->sd_sb.sb_bsize) { 359 + submit_bh(WRITE_SYNC, bh); 360 361 bh = gfs2_log_get_buf(sdp); 362 mh = (struct gfs2_meta_header *)bh->b_data; ··· 373 } 374 gfs2_assert_withdraw(sdp, !sdp->sd_log_num_revoke); 375 376 + submit_bh(WRITE_SYNC, bh); 377 } 378 379 static void revoke_lo_before_scan(struct gfs2_jdesc *jd, ··· 575 ptr = bh_log_ptr(bh); 576 577 get_bh(bh); 578 + submit_bh(WRITE_SYNC, bh); 579 gfs2_log_lock(sdp); 580 while(!list_empty(list)) { 581 bd = list_entry(list->next, struct gfs2_bufdata, bd_le.le_list); ··· 601 } else { 602 bh1 = gfs2_log_fake_buf(sdp, bd->bd_bh); 603 } 604 + submit_bh(WRITE_SYNC, bh1); 605 gfs2_log_lock(sdp); 606 ptr += 2; 607 }
+1 -2
fs/gfs2/meta_io.c
··· 37 struct buffer_head *bh, *head; 38 int nr_underway = 0; 39 int write_op = REQ_META | 40 - (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC_PLUG : WRITE); 41 42 BUG_ON(!PageLocked(page)); 43 BUG_ON(!page_has_buffers(page)); ··· 94 const struct address_space_operations gfs2_meta_aops = { 95 .writepage = gfs2_aspace_writepage, 96 .releasepage = gfs2_releasepage, 97 - .sync_page = block_sync_page, 98 }; 99 100 /**
··· 37 struct buffer_head *bh, *head; 38 int nr_underway = 0; 39 int write_op = REQ_META | 40 + (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); 41 42 BUG_ON(!PageLocked(page)); 43 BUG_ON(!page_has_buffers(page)); ··· 94 const struct address_space_operations gfs2_meta_aops = { 95 .writepage = gfs2_aspace_writepage, 96 .releasepage = gfs2_releasepage, 97 }; 98 99 /**
-2
fs/hfs/inode.c
··· 150 const struct address_space_operations hfs_btree_aops = { 151 .readpage = hfs_readpage, 152 .writepage = hfs_writepage, 153 - .sync_page = block_sync_page, 154 .write_begin = hfs_write_begin, 155 .write_end = generic_write_end, 156 .bmap = hfs_bmap, ··· 159 const struct address_space_operations hfs_aops = { 160 .readpage = hfs_readpage, 161 .writepage = hfs_writepage, 162 - .sync_page = block_sync_page, 163 .write_begin = hfs_write_begin, 164 .write_end = generic_write_end, 165 .bmap = hfs_bmap,
··· 150 const struct address_space_operations hfs_btree_aops = { 151 .readpage = hfs_readpage, 152 .writepage = hfs_writepage, 153 .write_begin = hfs_write_begin, 154 .write_end = generic_write_end, 155 .bmap = hfs_bmap, ··· 160 const struct address_space_operations hfs_aops = { 161 .readpage = hfs_readpage, 162 .writepage = hfs_writepage, 163 .write_begin = hfs_write_begin, 164 .write_end = generic_write_end, 165 .bmap = hfs_bmap,
-2
fs/hfsplus/inode.c
··· 146 const struct address_space_operations hfsplus_btree_aops = { 147 .readpage = hfsplus_readpage, 148 .writepage = hfsplus_writepage, 149 - .sync_page = block_sync_page, 150 .write_begin = hfsplus_write_begin, 151 .write_end = generic_write_end, 152 .bmap = hfsplus_bmap, ··· 155 const struct address_space_operations hfsplus_aops = { 156 .readpage = hfsplus_readpage, 157 .writepage = hfsplus_writepage, 158 - .sync_page = block_sync_page, 159 .write_begin = hfsplus_write_begin, 160 .write_end = generic_write_end, 161 .bmap = hfsplus_bmap,
··· 146 const struct address_space_operations hfsplus_btree_aops = { 147 .readpage = hfsplus_readpage, 148 .writepage = hfsplus_writepage, 149 .write_begin = hfsplus_write_begin, 150 .write_end = generic_write_end, 151 .bmap = hfsplus_bmap, ··· 156 const struct address_space_operations hfsplus_aops = { 157 .readpage = hfsplus_readpage, 158 .writepage = hfsplus_writepage, 159 .write_begin = hfsplus_write_begin, 160 .write_end = generic_write_end, 161 .bmap = hfsplus_bmap,
-1
fs/hpfs/file.c
··· 119 const struct address_space_operations hpfs_aops = { 120 .readpage = hpfs_readpage, 121 .writepage = hpfs_writepage, 122 - .sync_page = block_sync_page, 123 .write_begin = hpfs_write_begin, 124 .write_end = generic_write_end, 125 .bmap = _hpfs_bmap
··· 119 const struct address_space_operations hpfs_aops = { 120 .readpage = hpfs_readpage, 121 .writepage = hpfs_writepage, 122 .write_begin = hpfs_write_begin, 123 .write_end = generic_write_end, 124 .bmap = _hpfs_bmap
-1
fs/isofs/inode.c
··· 1158 1159 static const struct address_space_operations isofs_aops = { 1160 .readpage = isofs_readpage, 1161 - .sync_page = block_sync_page, 1162 .bmap = _isofs_bmap 1163 }; 1164
··· 1158 1159 static const struct address_space_operations isofs_aops = { 1160 .readpage = isofs_readpage, 1161 .bmap = _isofs_bmap 1162 }; 1163
+11 -11
fs/jbd/commit.c
··· 20 #include <linux/mm.h> 21 #include <linux/pagemap.h> 22 #include <linux/bio.h> 23 24 /* 25 * Default IO end handler for temporary BJ_IO buffer_heads. ··· 295 int first_tag = 0; 296 int tag_flag; 297 int i; 298 - int write_op = WRITE_SYNC; 299 300 /* 301 * First job: lock down the current transaction and wait for ··· 328 spin_lock(&journal->j_state_lock); 329 commit_transaction->t_state = T_LOCKED; 330 331 - /* 332 - * Use plugged writes here, since we want to submit several before 333 - * we unplug the device. We don't do explicit unplugging in here, 334 - * instead we rely on sync_buffer() doing the unplug for us. 335 - */ 336 - if (commit_transaction->t_synchronous_commit) 337 - write_op = WRITE_SYNC_PLUG; 338 spin_lock(&commit_transaction->t_handle_lock); 339 while (commit_transaction->t_updates) { 340 DEFINE_WAIT(wait); ··· 412 * Now start flushing things to disk, in the order they appear 413 * on the transaction lists. Data blocks go first. 414 */ 415 err = journal_submit_data_buffers(journal, commit_transaction, 416 - write_op); 417 418 /* 419 * Wait for all previously submitted IO to complete. ··· 476 err = 0; 477 } 478 479 - journal_write_revoke_records(journal, commit_transaction, write_op); 480 481 /* 482 * If we found any dirty or locked buffers, then we should have ··· 648 clear_buffer_dirty(bh); 649 set_buffer_uptodate(bh); 650 bh->b_end_io = journal_end_buffer_io_sync; 651 - submit_bh(write_op, bh); 652 } 653 cond_resched(); 654 ··· 658 bufs = 0; 659 } 660 } 661 662 /* Lo and behold: we have just managed to send a transaction to 663 the log. Before we can commit it, wait for the IO so far to
··· 20 #include <linux/mm.h> 21 #include <linux/pagemap.h> 22 #include <linux/bio.h> 23 + #include <linux/blkdev.h> 24 25 /* 26 * Default IO end handler for temporary BJ_IO buffer_heads. ··· 294 int first_tag = 0; 295 int tag_flag; 296 int i; 297 + struct blk_plug plug; 298 299 /* 300 * First job: lock down the current transaction and wait for ··· 327 spin_lock(&journal->j_state_lock); 328 commit_transaction->t_state = T_LOCKED; 329 330 spin_lock(&commit_transaction->t_handle_lock); 331 while (commit_transaction->t_updates) { 332 DEFINE_WAIT(wait); ··· 418 * Now start flushing things to disk, in the order they appear 419 * on the transaction lists. Data blocks go first. 420 */ 421 + blk_start_plug(&plug); 422 err = journal_submit_data_buffers(journal, commit_transaction, 423 + WRITE_SYNC); 424 + blk_finish_plug(&plug); 425 426 /* 427 * Wait for all previously submitted IO to complete. ··· 480 err = 0; 481 } 482 483 + blk_start_plug(&plug); 484 + 485 + journal_write_revoke_records(journal, commit_transaction, WRITE_SYNC); 486 487 /* 488 * If we found any dirty or locked buffers, then we should have ··· 650 clear_buffer_dirty(bh); 651 set_buffer_uptodate(bh); 652 bh->b_end_io = journal_end_buffer_io_sync; 653 + submit_bh(WRITE_SYNC, bh); 654 } 655 cond_resched(); 656 ··· 660 bufs = 0; 661 } 662 } 663 + 664 + blk_finish_plug(&plug); 665 666 /* Lo and behold: we have just managed to send a transaction to 667 the log. Before we can commit it, wait for the IO so far to
+10 -12
fs/jbd2/commit.c
··· 137 if (journal->j_flags & JBD2_BARRIER && 138 !JBD2_HAS_INCOMPAT_FEATURE(journal, 139 JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) 140 - ret = submit_bh(WRITE_SYNC_PLUG | WRITE_FLUSH_FUA, bh); 141 else 142 - ret = submit_bh(WRITE_SYNC_PLUG, bh); 143 144 *cbh = bh; 145 return ret; ··· 329 int tag_bytes = journal_tag_bytes(journal); 330 struct buffer_head *cbh = NULL; /* For transactional checksums */ 331 __u32 crc32_sum = ~0; 332 - int write_op = WRITE_SYNC; 333 334 /* 335 * First job: lock down the current transaction and wait for ··· 363 write_lock(&journal->j_state_lock); 364 commit_transaction->t_state = T_LOCKED; 365 366 - /* 367 - * Use plugged writes here, since we want to submit several before 368 - * we unplug the device. We don't do explicit unplugging in here, 369 - * instead we rely on sync_buffer() doing the unplug for us. 370 - */ 371 - if (commit_transaction->t_synchronous_commit) 372 - write_op = WRITE_SYNC_PLUG; 373 trace_jbd2_commit_locking(journal, commit_transaction); 374 stats.run.rs_wait = commit_transaction->t_max_wait; 375 stats.run.rs_locked = jiffies; ··· 462 if (err) 463 jbd2_journal_abort(journal, err); 464 465 jbd2_journal_write_revoke_records(journal, commit_transaction, 466 - write_op); 467 468 jbd_debug(3, "JBD: commit phase 2\n"); 469 ··· 492 err = 0; 493 descriptor = NULL; 494 bufs = 0; 495 while (commit_transaction->t_buffers) { 496 497 /* Find the next buffer to be journaled... */ ··· 654 clear_buffer_dirty(bh); 655 set_buffer_uptodate(bh); 656 bh->b_end_io = journal_end_buffer_io_sync; 657 - submit_bh(write_op, bh); 658 } 659 cond_resched(); 660 stats.run.rs_blocks_logged += bufs; ··· 694 if (err) 695 __jbd2_journal_abort_hard(journal); 696 } 697 698 /* Lo and behold: we have just managed to send a transaction to 699 the log. Before we can commit it, wait for the IO so far to
··· 137 if (journal->j_flags & JBD2_BARRIER && 138 !JBD2_HAS_INCOMPAT_FEATURE(journal, 139 JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) 140 + ret = submit_bh(WRITE_SYNC | WRITE_FLUSH_FUA, bh); 141 else 142 + ret = submit_bh(WRITE_SYNC, bh); 143 144 *cbh = bh; 145 return ret; ··· 329 int tag_bytes = journal_tag_bytes(journal); 330 struct buffer_head *cbh = NULL; /* For transactional checksums */ 331 __u32 crc32_sum = ~0; 332 + struct blk_plug plug; 333 334 /* 335 * First job: lock down the current transaction and wait for ··· 363 write_lock(&journal->j_state_lock); 364 commit_transaction->t_state = T_LOCKED; 365 366 trace_jbd2_commit_locking(journal, commit_transaction); 367 stats.run.rs_wait = commit_transaction->t_max_wait; 368 stats.run.rs_locked = jiffies; ··· 469 if (err) 470 jbd2_journal_abort(journal, err); 471 472 + blk_start_plug(&plug); 473 jbd2_journal_write_revoke_records(journal, commit_transaction, 474 + WRITE_SYNC); 475 + blk_finish_plug(&plug); 476 477 jbd_debug(3, "JBD: commit phase 2\n"); 478 ··· 497 err = 0; 498 descriptor = NULL; 499 bufs = 0; 500 + blk_start_plug(&plug); 501 while (commit_transaction->t_buffers) { 502 503 /* Find the next buffer to be journaled... */ ··· 658 clear_buffer_dirty(bh); 659 set_buffer_uptodate(bh); 660 bh->b_end_io = journal_end_buffer_io_sync; 661 + submit_bh(WRITE_SYNC, bh); 662 } 663 cond_resched(); 664 stats.run.rs_blocks_logged += bufs; ··· 698 if (err) 699 __jbd2_journal_abort_hard(journal); 700 } 701 + 702 + blk_finish_plug(&plug); 703 704 /* Lo and behold: we have just managed to send a transaction to 705 the log. Before we can commit it, wait for the IO so far to
-1
fs/jfs/inode.c
··· 352 .readpages = jfs_readpages, 353 .writepage = jfs_writepage, 354 .writepages = jfs_writepages, 355 - .sync_page = block_sync_page, 356 .write_begin = jfs_write_begin, 357 .write_end = nobh_write_end, 358 .bmap = jfs_bmap,
··· 352 .readpages = jfs_readpages, 353 .writepage = jfs_writepage, 354 .writepages = jfs_writepages, 355 .write_begin = jfs_write_begin, 356 .write_end = nobh_write_end, 357 .bmap = jfs_bmap,
-1
fs/jfs/jfs_metapage.c
··· 583 const struct address_space_operations jfs_metapage_aops = { 584 .readpage = metapage_readpage, 585 .writepage = metapage_writepage, 586 - .sync_page = block_sync_page, 587 .releasepage = metapage_releasepage, 588 .invalidatepage = metapage_invalidatepage, 589 .set_page_dirty = __set_page_dirty_nobuffers,
··· 583 const struct address_space_operations jfs_metapage_aops = { 584 .readpage = metapage_readpage, 585 .writepage = metapage_writepage, 586 .releasepage = metapage_releasepage, 587 .invalidatepage = metapage_invalidatepage, 588 .set_page_dirty = __set_page_dirty_nobuffers,
-2
fs/logfs/dev_bdev.c
··· 39 bio.bi_end_io = request_complete; 40 41 submit_bio(rw, &bio); 42 - generic_unplug_device(bdev_get_queue(bdev)); 43 wait_for_completion(&complete); 44 return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO; 45 } ··· 167 } 168 len = PAGE_ALIGN(len); 169 __bdev_writeseg(sb, ofs, ofs >> PAGE_SHIFT, len >> PAGE_SHIFT); 170 - generic_unplug_device(bdev_get_queue(logfs_super(sb)->s_bdev)); 171 } 172 173
··· 39 bio.bi_end_io = request_complete; 40 41 submit_bio(rw, &bio); 42 wait_for_completion(&complete); 43 return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO; 44 } ··· 168 } 169 len = PAGE_ALIGN(len); 170 __bdev_writeseg(sb, ofs, ofs >> PAGE_SHIFT, len >> PAGE_SHIFT); 171 } 172 173
-1
fs/minix/inode.c
··· 399 static const struct address_space_operations minix_aops = { 400 .readpage = minix_readpage, 401 .writepage = minix_writepage, 402 - .sync_page = block_sync_page, 403 .write_begin = minix_write_begin, 404 .write_end = generic_write_end, 405 .bmap = minix_bmap
··· 399 static const struct address_space_operations minix_aops = { 400 .readpage = minix_readpage, 401 .writepage = minix_writepage, 402 .write_begin = minix_write_begin, 403 .write_end = generic_write_end, 404 .bmap = minix_bmap
+8
fs/mpage.c
··· 364 sector_t last_block_in_bio = 0; 365 struct buffer_head map_bh; 366 unsigned long first_logical_block = 0; 367 368 map_bh.b_state = 0; 369 map_bh.b_size = 0; ··· 388 BUG_ON(!list_empty(pages)); 389 if (bio) 390 mpage_bio_submit(READ, bio); 391 return 0; 392 } 393 EXPORT_SYMBOL(mpage_readpages); ··· 670 mpage_writepages(struct address_space *mapping, 671 struct writeback_control *wbc, get_block_t get_block) 672 { 673 int ret; 674 675 if (!get_block) 676 ret = generic_writepages(mapping, wbc); ··· 689 if (mpd.bio) 690 mpage_bio_submit(WRITE, mpd.bio); 691 } 692 return ret; 693 } 694 EXPORT_SYMBOL(mpage_writepages);
··· 364 sector_t last_block_in_bio = 0; 365 struct buffer_head map_bh; 366 unsigned long first_logical_block = 0; 367 + struct blk_plug plug; 368 + 369 + blk_start_plug(&plug); 370 371 map_bh.b_state = 0; 372 map_bh.b_size = 0; ··· 385 BUG_ON(!list_empty(pages)); 386 if (bio) 387 mpage_bio_submit(READ, bio); 388 + blk_finish_plug(&plug); 389 return 0; 390 } 391 EXPORT_SYMBOL(mpage_readpages); ··· 666 mpage_writepages(struct address_space *mapping, 667 struct writeback_control *wbc, get_block_t get_block) 668 { 669 + struct blk_plug plug; 670 int ret; 671 + 672 + blk_start_plug(&plug); 673 674 if (!get_block) 675 ret = generic_writepages(mapping, wbc); ··· 682 if (mpd.bio) 683 mpage_bio_submit(WRITE, mpd.bio); 684 } 685 + blk_finish_plug(&plug); 686 return ret; 687 } 688 EXPORT_SYMBOL(mpage_writepages);
+1 -6
fs/nilfs2/btnode.c
··· 34 #include "page.h" 35 #include "btnode.h" 36 37 - 38 - static const struct address_space_operations def_btnode_aops = { 39 - .sync_page = block_sync_page, 40 - }; 41 - 42 void nilfs_btnode_cache_init(struct address_space *btnc, 43 struct backing_dev_info *bdi) 44 { 45 - nilfs_mapping_init(btnc, bdi, &def_btnode_aops); 46 } 47 48 void nilfs_btnode_cache_clear(struct address_space *btnc)
··· 34 #include "page.h" 35 #include "btnode.h" 36 37 void nilfs_btnode_cache_init(struct address_space *btnc, 38 struct backing_dev_info *bdi) 39 { 40 + nilfs_mapping_init(btnc, bdi); 41 } 42 43 void nilfs_btnode_cache_clear(struct address_space *btnc)
-1
fs/nilfs2/gcinode.c
··· 49 #include "ifile.h" 50 51 static const struct address_space_operations def_gcinode_aops = { 52 - .sync_page = block_sync_page, 53 }; 54 55 /*
··· 49 #include "ifile.h" 50 51 static const struct address_space_operations def_gcinode_aops = { 52 }; 53 54 /*
-1
fs/nilfs2/inode.c
··· 280 const struct address_space_operations nilfs_aops = { 281 .writepage = nilfs_writepage, 282 .readpage = nilfs_readpage, 283 - .sync_page = block_sync_page, 284 .writepages = nilfs_writepages, 285 .set_page_dirty = nilfs_set_page_dirty, 286 .readpages = nilfs_readpages,
··· 280 const struct address_space_operations nilfs_aops = { 281 .writepage = nilfs_writepage, 282 .readpage = nilfs_readpage, 283 .writepages = nilfs_writepages, 284 .set_page_dirty = nilfs_set_page_dirty, 285 .readpages = nilfs_readpages,
+2 -7
fs/nilfs2/mdt.c
··· 399 400 static const struct address_space_operations def_mdt_aops = { 401 .writepage = nilfs_mdt_write_page, 402 - .sync_page = block_sync_page, 403 }; 404 405 static const struct inode_operations def_mdt_iops; ··· 437 mi->mi_first_entry_offset = DIV_ROUND_UP(header_size, entry_size); 438 } 439 440 - static const struct address_space_operations shadow_map_aops = { 441 - .sync_page = block_sync_page, 442 - }; 443 - 444 /** 445 * nilfs_mdt_setup_shadow_map - setup shadow map and bind it to metadata file 446 * @inode: inode of the metadata file ··· 450 451 INIT_LIST_HEAD(&shadow->frozen_buffers); 452 address_space_init_once(&shadow->frozen_data); 453 - nilfs_mapping_init(&shadow->frozen_data, bdi, &shadow_map_aops); 454 address_space_init_once(&shadow->frozen_btnodes); 455 - nilfs_mapping_init(&shadow->frozen_btnodes, bdi, &shadow_map_aops); 456 mi->mi_shadow = shadow; 457 return 0; 458 }
··· 399 400 static const struct address_space_operations def_mdt_aops = { 401 .writepage = nilfs_mdt_write_page, 402 }; 403 404 static const struct inode_operations def_mdt_iops; ··· 438 mi->mi_first_entry_offset = DIV_ROUND_UP(header_size, entry_size); 439 } 440 441 /** 442 * nilfs_mdt_setup_shadow_map - setup shadow map and bind it to metadata file 443 * @inode: inode of the metadata file ··· 455 456 INIT_LIST_HEAD(&shadow->frozen_buffers); 457 address_space_init_once(&shadow->frozen_data); 458 + nilfs_mapping_init(&shadow->frozen_data, bdi); 459 address_space_init_once(&shadow->frozen_btnodes); 460 + nilfs_mapping_init(&shadow->frozen_btnodes, bdi); 461 mi->mi_shadow = shadow; 462 return 0; 463 }
+2 -3
fs/nilfs2/page.c
··· 493 } 494 495 void nilfs_mapping_init(struct address_space *mapping, 496 - struct backing_dev_info *bdi, 497 - const struct address_space_operations *aops) 498 { 499 mapping->host = NULL; 500 mapping->flags = 0; 501 mapping_set_gfp_mask(mapping, GFP_NOFS); 502 mapping->assoc_mapping = NULL; 503 mapping->backing_dev_info = bdi; 504 - mapping->a_ops = aops; 505 } 506 507 /*
··· 493 } 494 495 void nilfs_mapping_init(struct address_space *mapping, 496 + struct backing_dev_info *bdi) 497 { 498 mapping->host = NULL; 499 mapping->flags = 0; 500 mapping_set_gfp_mask(mapping, GFP_NOFS); 501 mapping->assoc_mapping = NULL; 502 mapping->backing_dev_info = bdi; 503 + mapping->a_ops = NULL; 504 } 505 506 /*
+1 -2
fs/nilfs2/page.h
··· 62 void nilfs_copy_back_pages(struct address_space *, struct address_space *); 63 void nilfs_clear_dirty_pages(struct address_space *); 64 void nilfs_mapping_init(struct address_space *mapping, 65 - struct backing_dev_info *bdi, 66 - const struct address_space_operations *aops); 67 unsigned nilfs_page_count_clean_buffers(struct page *, unsigned, unsigned); 68 unsigned long nilfs_find_uncommitted_extent(struct inode *inode, 69 sector_t start_blk,
··· 62 void nilfs_copy_back_pages(struct address_space *, struct address_space *); 63 void nilfs_clear_dirty_pages(struct address_space *); 64 void nilfs_mapping_init(struct address_space *mapping, 65 + struct backing_dev_info *bdi); 66 unsigned nilfs_page_count_clean_buffers(struct page *, unsigned, unsigned); 67 unsigned long nilfs_find_uncommitted_extent(struct inode *inode, 68 sector_t start_blk,
+1 -1
fs/nilfs2/segbuf.c
··· 509 * Last BIO is always sent through the following 510 * submission. 511 */ 512 - rw |= REQ_SYNC | REQ_UNPLUG; 513 res = nilfs_segbuf_submit_bio(segbuf, &wi, rw); 514 } 515
··· 509 * Last BIO is always sent through the following 510 * submission. 511 */ 512 + rw |= REQ_SYNC; 513 res = nilfs_segbuf_submit_bio(segbuf, &wi, rw); 514 } 515
-4
fs/ntfs/aops.c
··· 1543 */ 1544 const struct address_space_operations ntfs_aops = { 1545 .readpage = ntfs_readpage, /* Fill page with data. */ 1546 - .sync_page = block_sync_page, /* Currently, just unplugs the 1547 - disk request queue. */ 1548 #ifdef NTFS_RW 1549 .writepage = ntfs_writepage, /* Write dirty page to disk. */ 1550 #endif /* NTFS_RW */ ··· 1558 */ 1559 const struct address_space_operations ntfs_mst_aops = { 1560 .readpage = ntfs_readpage, /* Fill page with data. */ 1561 - .sync_page = block_sync_page, /* Currently, just unplugs the 1562 - disk request queue. */ 1563 #ifdef NTFS_RW 1564 .writepage = ntfs_writepage, /* Write dirty page to disk. */ 1565 .set_page_dirty = __set_page_dirty_nobuffers, /* Set the page dirty
··· 1543 */ 1544 const struct address_space_operations ntfs_aops = { 1545 .readpage = ntfs_readpage, /* Fill page with data. */ 1546 #ifdef NTFS_RW 1547 .writepage = ntfs_writepage, /* Write dirty page to disk. */ 1548 #endif /* NTFS_RW */ ··· 1560 */ 1561 const struct address_space_operations ntfs_mst_aops = { 1562 .readpage = ntfs_readpage, /* Fill page with data. */ 1563 #ifdef NTFS_RW 1564 .writepage = ntfs_writepage, /* Write dirty page to disk. */ 1565 .set_page_dirty = __set_page_dirty_nobuffers, /* Set the page dirty
+1 -2
fs/ntfs/compress.c
··· 698 "uptodate! Unplugging the disk queue " 699 "and rescheduling."); 700 get_bh(tbh); 701 - blk_run_address_space(mapping); 702 - schedule(); 703 put_bh(tbh); 704 if (unlikely(!buffer_uptodate(tbh))) 705 goto read_err;
··· 698 "uptodate! Unplugging the disk queue " 699 "and rescheduling."); 700 get_bh(tbh); 701 + io_schedule(); 702 put_bh(tbh); 703 if (unlikely(!buffer_uptodate(tbh))) 704 goto read_err;
-1
fs/ocfs2/aops.c
··· 2043 .write_begin = ocfs2_write_begin, 2044 .write_end = ocfs2_write_end, 2045 .bmap = ocfs2_bmap, 2046 - .sync_page = block_sync_page, 2047 .direct_IO = ocfs2_direct_IO, 2048 .invalidatepage = ocfs2_invalidatepage, 2049 .releasepage = ocfs2_releasepage,
··· 2043 .write_begin = ocfs2_write_begin, 2044 .write_end = ocfs2_write_end, 2045 .bmap = ocfs2_bmap, 2046 .direct_IO = ocfs2_direct_IO, 2047 .invalidatepage = ocfs2_invalidatepage, 2048 .releasepage = ocfs2_releasepage,
-4
fs/ocfs2/cluster/heartbeat.c
··· 367 static void o2hb_wait_on_io(struct o2hb_region *reg, 368 struct o2hb_bio_wait_ctxt *wc) 369 { 370 - struct address_space *mapping = reg->hr_bdev->bd_inode->i_mapping; 371 - 372 - blk_run_address_space(mapping); 373 o2hb_bio_wait_dec(wc, 1); 374 - 375 wait_for_completion(&wc->wc_io_complete); 376 } 377
··· 367 static void o2hb_wait_on_io(struct o2hb_region *reg, 368 struct o2hb_bio_wait_ctxt *wc) 369 { 370 o2hb_bio_wait_dec(wc, 1); 371 wait_for_completion(&wc->wc_io_complete); 372 } 373
-1
fs/omfs/file.c
··· 372 .readpages = omfs_readpages, 373 .writepage = omfs_writepage, 374 .writepages = omfs_writepages, 375 - .sync_page = block_sync_page, 376 .write_begin = omfs_write_begin, 377 .write_end = generic_write_end, 378 .bmap = omfs_bmap,
··· 372 .readpages = omfs_readpages, 373 .writepage = omfs_writepage, 374 .writepages = omfs_writepages, 375 .write_begin = omfs_write_begin, 376 .write_end = generic_write_end, 377 .bmap = omfs_bmap,
+2 -1
fs/partitions/check.c
··· 290 { 291 struct hd_struct *p = dev_to_part(dev); 292 293 - return sprintf(buf, "%8u %8u\n", p->in_flight[0], p->in_flight[1]); 294 } 295 296 #ifdef CONFIG_FAIL_MAKE_REQUEST
··· 290 { 291 struct hd_struct *p = dev_to_part(dev); 292 293 + return sprintf(buf, "%8u %8u\n", atomic_read(&p->in_flight[0]), 294 + atomic_read(&p->in_flight[1])); 295 } 296 297 #ifdef CONFIG_FAIL_MAKE_REQUEST
-1
fs/qnx4/inode.c
··· 335 static const struct address_space_operations qnx4_aops = { 336 .readpage = qnx4_readpage, 337 .writepage = qnx4_writepage, 338 - .sync_page = block_sync_page, 339 .write_begin = qnx4_write_begin, 340 .write_end = generic_write_end, 341 .bmap = qnx4_bmap
··· 335 static const struct address_space_operations qnx4_aops = { 336 .readpage = qnx4_readpage, 337 .writepage = qnx4_writepage, 338 .write_begin = qnx4_write_begin, 339 .write_end = generic_write_end, 340 .bmap = qnx4_bmap
-1
fs/reiserfs/inode.c
··· 3217 .readpages = reiserfs_readpages, 3218 .releasepage = reiserfs_releasepage, 3219 .invalidatepage = reiserfs_invalidatepage, 3220 - .sync_page = block_sync_page, 3221 .write_begin = reiserfs_write_begin, 3222 .write_end = reiserfs_write_end, 3223 .bmap = reiserfs_aop_bmap,
··· 3217 .readpages = reiserfs_readpages, 3218 .releasepage = reiserfs_releasepage, 3219 .invalidatepage = reiserfs_invalidatepage, 3220 .write_begin = reiserfs_write_begin, 3221 .write_end = reiserfs_write_end, 3222 .bmap = reiserfs_aop_bmap,
+2
fs/super.c
··· 71 #else 72 INIT_LIST_HEAD(&s->s_files); 73 #endif 74 INIT_LIST_HEAD(&s->s_instances); 75 INIT_HLIST_BL_HEAD(&s->s_anon); 76 INIT_LIST_HEAD(&s->s_inodes); ··· 937 sb = root->d_sb; 938 BUG_ON(!sb); 939 WARN_ON(!sb->s_bdi); 940 sb->s_flags |= MS_BORN; 941 942 error = security_sb_kern_mount(sb, flags, secdata);
··· 71 #else 72 INIT_LIST_HEAD(&s->s_files); 73 #endif 74 + s->s_bdi = &default_backing_dev_info; 75 INIT_LIST_HEAD(&s->s_instances); 76 INIT_HLIST_BL_HEAD(&s->s_anon); 77 INIT_LIST_HEAD(&s->s_inodes); ··· 936 sb = root->d_sb; 937 BUG_ON(!sb); 938 WARN_ON(!sb->s_bdi); 939 + WARN_ON(sb->s_bdi == &default_backing_dev_info); 940 sb->s_flags |= MS_BORN; 941 942 error = security_sb_kern_mount(sb, flags, secdata);
+2 -2
fs/sync.c
··· 34 * This should be safe, as we require bdi backing to actually 35 * write out data in the first place 36 */ 37 - if (!sb->s_bdi || sb->s_bdi == &noop_backing_dev_info) 38 return 0; 39 40 if (sb->s_qcop && sb->s_qcop->quota_sync) ··· 80 81 static void sync_one_sb(struct super_block *sb, void *arg) 82 { 83 - if (!(sb->s_flags & MS_RDONLY) && sb->s_bdi) 84 __sync_filesystem(sb, *(int *)arg); 85 } 86 /*
··· 34 * This should be safe, as we require bdi backing to actually 35 * write out data in the first place 36 */ 37 + if (sb->s_bdi == &noop_backing_dev_info) 38 return 0; 39 40 if (sb->s_qcop && sb->s_qcop->quota_sync) ··· 80 81 static void sync_one_sb(struct super_block *sb, void *arg) 82 { 83 + if (!(sb->s_flags & MS_RDONLY)) 84 __sync_filesystem(sb, *(int *)arg); 85 } 86 /*
-1
fs/sysv/itree.c
··· 488 const struct address_space_operations sysv_aops = { 489 .readpage = sysv_readpage, 490 .writepage = sysv_writepage, 491 - .sync_page = block_sync_page, 492 .write_begin = sysv_write_begin, 493 .write_end = generic_write_end, 494 .bmap = sysv_bmap
··· 488 const struct address_space_operations sysv_aops = { 489 .readpage = sysv_readpage, 490 .writepage = sysv_writepage, 491 .write_begin = sysv_write_begin, 492 .write_end = generic_write_end, 493 .bmap = sysv_bmap
-1
fs/ubifs/super.c
··· 2011 */ 2012 c->bdi.name = "ubifs", 2013 c->bdi.capabilities = BDI_CAP_MAP_COPY; 2014 - c->bdi.unplug_io_fn = default_unplug_io_fn; 2015 err = bdi_init(&c->bdi); 2016 if (err) 2017 goto out_close;
··· 2011 */ 2012 c->bdi.name = "ubifs", 2013 c->bdi.capabilities = BDI_CAP_MAP_COPY; 2014 err = bdi_init(&c->bdi); 2015 if (err) 2016 goto out_close;
-1
fs/udf/file.c
··· 98 const struct address_space_operations udf_adinicb_aops = { 99 .readpage = udf_adinicb_readpage, 100 .writepage = udf_adinicb_writepage, 101 - .sync_page = block_sync_page, 102 .write_begin = simple_write_begin, 103 .write_end = udf_adinicb_write_end, 104 };
··· 98 const struct address_space_operations udf_adinicb_aops = { 99 .readpage = udf_adinicb_readpage, 100 .writepage = udf_adinicb_writepage, 101 .write_begin = simple_write_begin, 102 .write_end = udf_adinicb_write_end, 103 };
-1
fs/udf/inode.c
··· 140 const struct address_space_operations udf_aops = { 141 .readpage = udf_readpage, 142 .writepage = udf_writepage, 143 - .sync_page = block_sync_page, 144 .write_begin = udf_write_begin, 145 .write_end = generic_write_end, 146 .bmap = udf_bmap,
··· 140 const struct address_space_operations udf_aops = { 141 .readpage = udf_readpage, 142 .writepage = udf_writepage, 143 .write_begin = udf_write_begin, 144 .write_end = generic_write_end, 145 .bmap = udf_bmap,
-1
fs/ufs/inode.c
··· 552 const struct address_space_operations ufs_aops = { 553 .readpage = ufs_readpage, 554 .writepage = ufs_writepage, 555 - .sync_page = block_sync_page, 556 .write_begin = ufs_write_begin, 557 .write_end = generic_write_end, 558 .bmap = ufs_bmap
··· 552 const struct address_space_operations ufs_aops = { 553 .readpage = ufs_readpage, 554 .writepage = ufs_writepage, 555 .write_begin = ufs_write_begin, 556 .write_end = generic_write_end, 557 .bmap = ufs_bmap
+1 -1
fs/ufs/truncate.c
··· 479 break; 480 if (IS_SYNC(inode) && (inode->i_state & I_DIRTY)) 481 ufs_sync_inode (inode); 482 - blk_run_address_space(inode->i_mapping); 483 yield(); 484 } 485
··· 479 break; 480 if (IS_SYNC(inode) && (inode->i_state & I_DIRTY)) 481 ufs_sync_inode (inode); 482 + blk_flush_plug(current); 483 yield(); 484 } 485
+1 -3
fs/xfs/linux-2.6/xfs_aops.c
··· 413 if (xfs_ioend_new_eof(ioend)) 414 xfs_mark_inode_dirty(XFS_I(ioend->io_inode)); 415 416 - submit_bio(wbc->sync_mode == WB_SYNC_ALL ? 417 - WRITE_SYNC_PLUG : WRITE, bio); 418 } 419 420 STATIC struct bio * ··· 1494 .readpages = xfs_vm_readpages, 1495 .writepage = xfs_vm_writepage, 1496 .writepages = xfs_vm_writepages, 1497 - .sync_page = block_sync_page, 1498 .releasepage = xfs_vm_releasepage, 1499 .invalidatepage = xfs_vm_invalidatepage, 1500 .write_begin = xfs_vm_write_begin,
··· 413 if (xfs_ioend_new_eof(ioend)) 414 xfs_mark_inode_dirty(XFS_I(ioend->io_inode)); 415 416 + submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE, bio); 417 } 418 419 STATIC struct bio * ··· 1495 .readpages = xfs_vm_readpages, 1496 .writepage = xfs_vm_writepage, 1497 .writepages = xfs_vm_writepages, 1498 .releasepage = xfs_vm_releasepage, 1499 .invalidatepage = xfs_vm_invalidatepage, 1500 .write_begin = xfs_vm_write_begin,
+5 -8
fs/xfs/linux-2.6/xfs_buf.c
··· 990 if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE)) 991 xfs_log_force(bp->b_target->bt_mount, 0); 992 if (atomic_read(&bp->b_io_remaining)) 993 - blk_run_address_space(bp->b_target->bt_mapping); 994 down(&bp->b_sema); 995 XB_SET_OWNER(bp); 996 ··· 1034 set_current_state(TASK_UNINTERRUPTIBLE); 1035 if (atomic_read(&bp->b_pin_count) == 0) 1036 break; 1037 - if (atomic_read(&bp->b_io_remaining)) 1038 - blk_run_address_space(bp->b_target->bt_mapping); 1039 - schedule(); 1040 } 1041 remove_wait_queue(&bp->b_waiters, &wait); 1042 set_current_state(TASK_RUNNING); ··· 1440 trace_xfs_buf_iowait(bp, _RET_IP_); 1441 1442 if (atomic_read(&bp->b_io_remaining)) 1443 - blk_run_address_space(bp->b_target->bt_mapping); 1444 wait_for_completion(&bp->b_iowait); 1445 1446 trace_xfs_buf_iowait_done(bp, _RET_IP_); ··· 1664 struct inode *inode; 1665 struct address_space *mapping; 1666 static const struct address_space_operations mapping_aops = { 1667 - .sync_page = block_sync_page, 1668 .migratepage = fail_migrate_page, 1669 }; 1670 ··· 1944 count++; 1945 } 1946 if (count) 1947 - blk_run_address_space(target->bt_mapping); 1948 1949 } while (!kthread_should_stop()); 1950 ··· 1992 1993 if (wait) { 1994 /* Expedite and wait for IO to complete. */ 1995 - blk_run_address_space(target->bt_mapping); 1996 while (!list_empty(&wait_list)) { 1997 bp = list_first_entry(&wait_list, struct xfs_buf, b_list); 1998
··· 990 if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE)) 991 xfs_log_force(bp->b_target->bt_mount, 0); 992 if (atomic_read(&bp->b_io_remaining)) 993 + blk_flush_plug(current); 994 down(&bp->b_sema); 995 XB_SET_OWNER(bp); 996 ··· 1034 set_current_state(TASK_UNINTERRUPTIBLE); 1035 if (atomic_read(&bp->b_pin_count) == 0) 1036 break; 1037 + io_schedule(); 1038 } 1039 remove_wait_queue(&bp->b_waiters, &wait); 1040 set_current_state(TASK_RUNNING); ··· 1442 trace_xfs_buf_iowait(bp, _RET_IP_); 1443 1444 if (atomic_read(&bp->b_io_remaining)) 1445 + blk_flush_plug(current); 1446 wait_for_completion(&bp->b_iowait); 1447 1448 trace_xfs_buf_iowait_done(bp, _RET_IP_); ··· 1666 struct inode *inode; 1667 struct address_space *mapping; 1668 static const struct address_space_operations mapping_aops = { 1669 .migratepage = fail_migrate_page, 1670 }; 1671 ··· 1947 count++; 1948 } 1949 if (count) 1950 + blk_flush_plug(current); 1951 1952 } while (!kthread_should_stop()); 1953 ··· 1995 1996 if (wait) { 1997 /* Expedite and wait for IO to complete. */ 1998 + blk_flush_plug(current); 1999 while (!list_empty(&wait_list)) { 2000 bp = list_first_entry(&wait_list, struct xfs_buf, b_list); 2001
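The xfs_buf hunks above show the waiter side of the conversion: blk_run_address_space() is gone, so code that used to kick the backing device before sleeping now calls blk_flush_plug(current) to push out its own plugged requests, or simply uses io_schedule() and lets the scheduler hook (added later in this merge in kernel/sched.c) flush the plug when the task blocks. A minimal sketch of that wait idiom, assuming a hypothetical descriptor with an outstanding-I/O count and a completion:

    #include <linux/blkdev.h>
    #include <linux/completion.h>
    #include <linux/sched.h>

    /* Hypothetical I/O descriptor; only the wait pattern matters here. */
    struct my_iodesc {
            atomic_t                io_remaining;
            struct completion       io_done;
    };

    static void wait_for_my_io(struct my_iodesc *iod)
    {
            /* push out anything this task still has plugged */
            if (atomic_read(&iod->io_remaining))
                    blk_flush_plug(current);
            wait_for_completion(&iod->io_done);
    }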
-16
include/linux/backing-dev.h
··· 66 unsigned int capabilities; /* Device capabilities */ 67 congested_fn *congested_fn; /* Function pointer if device is md/dm */ 68 void *congested_data; /* Pointer to aux data for congested func */ 69 - void (*unplug_io_fn)(struct backing_dev_info *, struct page *); 70 - void *unplug_io_data; 71 72 char *name; 73 ··· 249 250 extern struct backing_dev_info default_backing_dev_info; 251 extern struct backing_dev_info noop_backing_dev_info; 252 - void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page); 253 254 int writeback_in_progress(struct backing_dev_info *bdi); 255 ··· 331 { 332 schedule(); 333 return 0; 334 - } 335 - 336 - static inline void blk_run_backing_dev(struct backing_dev_info *bdi, 337 - struct page *page) 338 - { 339 - if (bdi && bdi->unplug_io_fn) 340 - bdi->unplug_io_fn(bdi, page); 341 - } 342 - 343 - static inline void blk_run_address_space(struct address_space *mapping) 344 - { 345 - if (mapping) 346 - blk_run_backing_dev(mapping->backing_dev_info, NULL); 347 } 348 349 #endif /* _LINUX_BACKING_DEV_H */
··· 66 unsigned int capabilities; /* Device capabilities */ 67 congested_fn *congested_fn; /* Function pointer if device is md/dm */ 68 void *congested_data; /* Pointer to aux data for congested func */ 69 70 char *name; 71 ··· 251 252 extern struct backing_dev_info default_backing_dev_info; 253 extern struct backing_dev_info noop_backing_dev_info; 254 255 int writeback_in_progress(struct backing_dev_info *bdi); 256 ··· 334 { 335 schedule(); 336 return 0; 337 } 338 339 #endif /* _LINUX_BACKING_DEV_H */
-1
include/linux/bio.h
··· 304 }; 305 306 extern struct bio_set *fs_bio_set; 307 - extern struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly; 308 309 /* 310 * a small number of entries is fine, not going to be performance critical.
··· 304 }; 305 306 extern struct bio_set *fs_bio_set; 307 308 /* 309 * a small number of entries is fine, not going to be performance critical.
+4 -2
include/linux/blk_types.h
··· 128 __REQ_NOIDLE, /* don't anticipate more IO after this one */ 129 130 /* bio only flags */ 131 - __REQ_UNPLUG, /* unplug the immediately after submission */ 132 __REQ_RAHEAD, /* read ahead, can fail anytime */ 133 __REQ_THROTTLED, /* This bio has already been subjected to 134 * throttling rules. Don't do it again. */ ··· 147 __REQ_ALLOCED, /* request came from our alloc pool */ 148 __REQ_COPY_USER, /* contains copies of user pages */ 149 __REQ_FLUSH, /* request for cache flush */ 150 __REQ_IO_STAT, /* account I/O stat */ 151 __REQ_MIXED_MERGE, /* merge of different types, fail separately */ 152 __REQ_SECURE, /* secure discard (used with __REQ_DISCARD) */ 153 __REQ_NR_BITS, /* stops here */ 154 }; 155 ··· 171 REQ_NOIDLE | REQ_FLUSH | REQ_FUA) 172 #define REQ_CLONE_MASK REQ_COMMON_MASK 173 174 - #define REQ_UNPLUG (1 << __REQ_UNPLUG) 175 #define REQ_RAHEAD (1 << __REQ_RAHEAD) 176 #define REQ_THROTTLED (1 << __REQ_THROTTLED) 177 ··· 188 #define REQ_ALLOCED (1 << __REQ_ALLOCED) 189 #define REQ_COPY_USER (1 << __REQ_COPY_USER) 190 #define REQ_FLUSH (1 << __REQ_FLUSH) 191 #define REQ_IO_STAT (1 << __REQ_IO_STAT) 192 #define REQ_MIXED_MERGE (1 << __REQ_MIXED_MERGE) 193 #define REQ_SECURE (1 << __REQ_SECURE) 194 195 #endif /* __LINUX_BLK_TYPES_H */
··· 128 __REQ_NOIDLE, /* don't anticipate more IO after this one */ 129 130 /* bio only flags */ 131 __REQ_RAHEAD, /* read ahead, can fail anytime */ 132 __REQ_THROTTLED, /* This bio has already been subjected to 133 * throttling rules. Don't do it again. */ ··· 148 __REQ_ALLOCED, /* request came from our alloc pool */ 149 __REQ_COPY_USER, /* contains copies of user pages */ 150 __REQ_FLUSH, /* request for cache flush */ 151 + __REQ_FLUSH_SEQ, /* request for flush sequence */ 152 __REQ_IO_STAT, /* account I/O stat */ 153 __REQ_MIXED_MERGE, /* merge of different types, fail separately */ 154 __REQ_SECURE, /* secure discard (used with __REQ_DISCARD) */ 155 + __REQ_ON_PLUG, /* on plug list */ 156 __REQ_NR_BITS, /* stops here */ 157 }; 158 ··· 170 REQ_NOIDLE | REQ_FLUSH | REQ_FUA) 171 #define REQ_CLONE_MASK REQ_COMMON_MASK 172 173 #define REQ_RAHEAD (1 << __REQ_RAHEAD) 174 #define REQ_THROTTLED (1 << __REQ_THROTTLED) 175 ··· 188 #define REQ_ALLOCED (1 << __REQ_ALLOCED) 189 #define REQ_COPY_USER (1 << __REQ_COPY_USER) 190 #define REQ_FLUSH (1 << __REQ_FLUSH) 191 + #define REQ_FLUSH_SEQ (1 << __REQ_FLUSH_SEQ) 192 #define REQ_IO_STAT (1 << __REQ_IO_STAT) 193 #define REQ_MIXED_MERGE (1 << __REQ_MIXED_MERGE) 194 #define REQ_SECURE (1 << __REQ_SECURE) 195 + #define REQ_ON_PLUG (1 << __REQ_ON_PLUG) 196 197 #endif /* __LINUX_BLK_TYPES_H */
+70 -31
include/linux/blkdev.h
··· 108 109 /* 110 * Three pointers are available for the IO schedulers, if they need 111 - * more they have to dynamically allocate it. 112 */ 113 - void *elevator_private; 114 - void *elevator_private2; 115 - void *elevator_private3; 116 117 struct gendisk *rq_disk; 118 struct hd_struct *part; ··· 196 typedef int (make_request_fn) (struct request_queue *q, struct bio *bio); 197 typedef int (prep_rq_fn) (struct request_queue *, struct request *); 198 typedef void (unprep_rq_fn) (struct request_queue *, struct request *); 199 - typedef void (unplug_fn) (struct request_queue *); 200 201 struct bio_vec; 202 struct bvec_merge_data { ··· 278 make_request_fn *make_request_fn; 279 prep_rq_fn *prep_rq_fn; 280 unprep_rq_fn *unprep_rq_fn; 281 - unplug_fn *unplug_fn; 282 merge_bvec_fn *merge_bvec_fn; 283 softirq_done_fn *softirq_done_fn; 284 rq_timed_out_fn *rq_timed_out_fn; ··· 291 struct request *boundary_rq; 292 293 /* 294 - * Auto-unplugging state 295 */ 296 - struct timer_list unplug_timer; 297 - int unplug_thresh; /* After this many requests */ 298 - unsigned long unplug_delay; /* After this many jiffies */ 299 - struct work_struct unplug_work; 300 301 struct backing_dev_info backing_dev_info; 302 ··· 364 * for flush operations 365 */ 366 unsigned int flush_flags; 367 - unsigned int flush_seq; 368 - int flush_err; 369 struct request flush_rq; 370 - struct request *orig_flush_rq; 371 - struct list_head pending_flushes; 372 373 struct mutex sysfs_lock; 374 ··· 389 #define QUEUE_FLAG_ASYNCFULL 4 /* write queue has been filled */ 390 #define QUEUE_FLAG_DEAD 5 /* queue being torn down */ 391 #define QUEUE_FLAG_REENTER 6 /* Re-entrancy avoidance */ 392 - #define QUEUE_FLAG_PLUGGED 7 /* queue is plugged */ 393 - #define QUEUE_FLAG_ELVSWITCH 8 /* don't use elevator, just do FIFO */ 394 - #define QUEUE_FLAG_BIDI 9 /* queue supports bidi requests */ 395 - #define QUEUE_FLAG_NOMERGES 10 /* disable merge attempts */ 396 - #define QUEUE_FLAG_SAME_COMP 11 /* force complete on same CPU */ 397 - #define QUEUE_FLAG_FAIL_IO 12 /* fake timeout */ 398 - #define QUEUE_FLAG_STACKABLE 13 /* supports request stacking */ 399 - #define QUEUE_FLAG_NONROT 14 /* non-rotational device (SSD) */ 400 #define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */ 401 #define QUEUE_FLAG_IO_STAT 15 /* do IO stats */ 402 #define QUEUE_FLAG_DISCARD 16 /* supports DISCARD */ ··· 473 __clear_bit(flag, &q->queue_flags); 474 } 475 476 - #define blk_queue_plugged(q) test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags) 477 #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags) 478 #define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) 479 #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) ··· 667 extern void blk_rq_unprep_clone(struct request *rq); 668 extern int blk_insert_cloned_request(struct request_queue *q, 669 struct request *rq); 670 - extern void blk_plug_device(struct request_queue *); 671 - extern void blk_plug_device_unlocked(struct request_queue *); 672 - extern int blk_remove_plug(struct request_queue *); 673 extern void blk_recount_segments(struct request_queue *, struct bio *); 674 extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t, 675 unsigned int, void __user *); ··· 711 struct request *, int); 712 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *, 713 struct request *, int, rq_end_io_fn *); 714 - extern void blk_unplug(struct request_queue *q); 715 716 static inline struct request_queue 
*bdev_get_queue(struct block_device *bdev) 717 { ··· 847 848 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *); 849 extern void blk_dump_rq_flags(struct request *, char *); 850 - extern void generic_unplug_device(struct request_queue *); 851 extern long nr_blockdev_pages(void); 852 853 int blk_get_queue(struct request_queue *); 854 struct request_queue *blk_alloc_queue(gfp_t); 855 struct request_queue *blk_alloc_queue_node(gfp_t, int); 856 extern void blk_put_queue(struct request_queue *); 857 858 /* 859 * tag stuff ··· 1156 extern int blk_throtl_init(struct request_queue *q); 1157 extern void blk_throtl_exit(struct request_queue *q); 1158 extern int blk_throtl_bio(struct request_queue *q, struct bio **bio); 1159 - extern void throtl_shutdown_timer_wq(struct request_queue *q); 1160 #else /* CONFIG_BLK_DEV_THROTTLING */ 1161 static inline int blk_throtl_bio(struct request_queue *q, struct bio **bio) 1162 { ··· 1164 1165 static inline int blk_throtl_init(struct request_queue *q) { return 0; } 1166 static inline int blk_throtl_exit(struct request_queue *q) { return 0; } 1167 - static inline void throtl_shutdown_timer_wq(struct request_queue *q) {} 1168 #endif /* CONFIG_BLK_DEV_THROTTLING */ 1169 1170 #define MODULE_ALIAS_BLOCKDEV(major,minor) \ ··· 1295 static inline long nr_blockdev_pages(void) 1296 { 1297 return 0; 1298 } 1299 1300 #endif /* CONFIG_BLOCK */
··· 108 109 /* 110 * Three pointers are available for the IO schedulers, if they need 111 + * more they have to dynamically allocate it. Flush requests are 112 + * never put on the IO scheduler. So let the flush fields share 113 + * space with the three elevator_private pointers. 114 */ 115 + union { 116 + void *elevator_private[3]; 117 + struct { 118 + unsigned int seq; 119 + struct list_head list; 120 + } flush; 121 + }; 122 123 struct gendisk *rq_disk; 124 struct hd_struct *part; ··· 190 typedef int (make_request_fn) (struct request_queue *q, struct bio *bio); 191 typedef int (prep_rq_fn) (struct request_queue *, struct request *); 192 typedef void (unprep_rq_fn) (struct request_queue *, struct request *); 193 194 struct bio_vec; 195 struct bvec_merge_data { ··· 273 make_request_fn *make_request_fn; 274 prep_rq_fn *prep_rq_fn; 275 unprep_rq_fn *unprep_rq_fn; 276 merge_bvec_fn *merge_bvec_fn; 277 softirq_done_fn *softirq_done_fn; 278 rq_timed_out_fn *rq_timed_out_fn; ··· 287 struct request *boundary_rq; 288 289 /* 290 + * Delayed queue handling 291 */ 292 + struct delayed_work delay_work; 293 294 struct backing_dev_info backing_dev_info; 295 ··· 363 * for flush operations 364 */ 365 unsigned int flush_flags; 366 + unsigned int flush_pending_idx:1; 367 + unsigned int flush_running_idx:1; 368 + unsigned long flush_pending_since; 369 + struct list_head flush_queue[2]; 370 + struct list_head flush_data_in_flight; 371 struct request flush_rq; 372 373 struct mutex sysfs_lock; 374 ··· 387 #define QUEUE_FLAG_ASYNCFULL 4 /* write queue has been filled */ 388 #define QUEUE_FLAG_DEAD 5 /* queue being torn down */ 389 #define QUEUE_FLAG_REENTER 6 /* Re-entrancy avoidance */ 390 + #define QUEUE_FLAG_ELVSWITCH 7 /* don't use elevator, just do FIFO */ 391 + #define QUEUE_FLAG_BIDI 8 /* queue supports bidi requests */ 392 + #define QUEUE_FLAG_NOMERGES 9 /* disable merge attempts */ 393 + #define QUEUE_FLAG_SAME_COMP 10 /* force complete on same CPU */ 394 + #define QUEUE_FLAG_FAIL_IO 11 /* fake timeout */ 395 + #define QUEUE_FLAG_STACKABLE 12 /* supports request stacking */ 396 + #define QUEUE_FLAG_NONROT 13 /* non-rotational device (SSD) */ 397 #define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */ 398 #define QUEUE_FLAG_IO_STAT 15 /* do IO stats */ 399 #define QUEUE_FLAG_DISCARD 16 /* supports DISCARD */ ··· 472 __clear_bit(flag, &q->queue_flags); 473 } 474 475 #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags) 476 #define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) 477 #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) ··· 667 extern void blk_rq_unprep_clone(struct request *rq); 668 extern int blk_insert_cloned_request(struct request_queue *q, 669 struct request *rq); 670 + extern void blk_delay_queue(struct request_queue *, unsigned long); 671 extern void blk_recount_segments(struct request_queue *, struct bio *); 672 extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t, 673 unsigned int, void __user *); ··· 713 struct request *, int); 714 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *, 715 struct request *, int, rq_end_io_fn *); 716 717 static inline struct request_queue *bdev_get_queue(struct block_device *bdev) 718 { ··· 850 851 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *); 852 extern void blk_dump_rq_flags(struct request *, char *); 853 extern long nr_blockdev_pages(void); 854 855 int blk_get_queue(struct 
request_queue *); 856 struct request_queue *blk_alloc_queue(gfp_t); 857 struct request_queue *blk_alloc_queue_node(gfp_t, int); 858 extern void blk_put_queue(struct request_queue *); 859 + 860 + struct blk_plug { 861 + unsigned long magic; 862 + struct list_head list; 863 + unsigned int should_sort; 864 + }; 865 + 866 + extern void blk_start_plug(struct blk_plug *); 867 + extern void blk_finish_plug(struct blk_plug *); 868 + extern void __blk_flush_plug(struct task_struct *, struct blk_plug *); 869 + 870 + static inline void blk_flush_plug(struct task_struct *tsk) 871 + { 872 + struct blk_plug *plug = tsk->plug; 873 + 874 + if (unlikely(plug)) 875 + __blk_flush_plug(tsk, plug); 876 + } 877 + 878 + static inline bool blk_needs_flush_plug(struct task_struct *tsk) 879 + { 880 + struct blk_plug *plug = tsk->plug; 881 + 882 + return plug && !list_empty(&plug->list); 883 + } 884 885 /* 886 * tag stuff ··· 1135 extern int blk_throtl_init(struct request_queue *q); 1136 extern void blk_throtl_exit(struct request_queue *q); 1137 extern int blk_throtl_bio(struct request_queue *q, struct bio **bio); 1138 #else /* CONFIG_BLK_DEV_THROTTLING */ 1139 static inline int blk_throtl_bio(struct request_queue *q, struct bio **bio) 1140 { ··· 1144 1145 static inline int blk_throtl_init(struct request_queue *q) { return 0; } 1146 static inline int blk_throtl_exit(struct request_queue *q) { return 0; } 1147 #endif /* CONFIG_BLK_DEV_THROTTLING */ 1148 1149 #define MODULE_ALIAS_BLOCKDEV(major,minor) \ ··· 1276 static inline long nr_blockdev_pages(void) 1277 { 1278 return 0; 1279 + } 1280 + 1281 + struct blk_plug { 1282 + }; 1283 + 1284 + static inline void blk_start_plug(struct blk_plug *plug) 1285 + { 1286 + } 1287 + 1288 + static inline void blk_finish_plug(struct blk_plug *plug) 1289 + { 1290 + } 1291 + 1292 + static inline void blk_flush_plug(struct task_struct *task) 1293 + { 1294 + } 1295 + 1296 + static inline bool blk_needs_flush_plug(struct task_struct *tsk) 1297 + { 1298 + return false; 1299 } 1300 1301 #endif /* CONFIG_BLOCK */
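With the queue-level unplug machinery (QUEUE_FLAG_PLUGGED, blk_plug_device(), unplug_fn and the unplug timer) removed above, batching moves to the per-task struct blk_plug on the submission side, and a driver whose request_fn temporarily cannot make progress is expected to ask for a delayed re-run via blk_delay_queue(). A rough sketch of that driver-side pattern; mydev_can_issue()/mydev_issue() are hypothetical stubs, and the delay argument is assumed to be milliseconds since this hunk only shows the prototype:

    #include <linux/blkdev.h>

    /* hypothetical driver hooks, stubbed so the sketch stands alone */
    static bool mydev_can_issue(struct request *rq) { return true; }
    static void mydev_issue(struct request *rq) { }

    static void mydev_request_fn(struct request_queue *q)
    {
            struct request *rq;

            while ((rq = blk_peek_request(q)) != NULL) {
                    if (!mydev_can_issue(rq)) {
                            /* out of resources: re-run shortly instead of plugging */
                            blk_delay_queue(q, 3);  /* delay assumed in msecs */
                            return;
                    }
                    blk_start_request(rq);
                    mydev_issue(rq);
            }
    }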
-1
include/linux/buffer_head.h
··· 219 int block_commit_write(struct page *page, unsigned from, unsigned to); 220 int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, 221 get_block_t get_block); 222 - void block_sync_page(struct page *); 223 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *); 224 int block_truncate_page(struct address_space *, loff_t, get_block_t *); 225 int nobh_write_begin(struct address_space *, loff_t, unsigned, unsigned,
··· 219 int block_commit_write(struct page *page, unsigned from, unsigned to); 220 int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, 221 get_block_t get_block); 222 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *); 223 int block_truncate_page(struct address_space *, loff_t, get_block_t *); 224 int nobh_write_begin(struct address_space *, loff_t, unsigned, unsigned,
-5
include/linux/device-mapper.h
··· 286 int dm_table_complete(struct dm_table *t); 287 288 /* 289 - * Unplug all devices in a table. 290 - */ 291 - void dm_table_unplug_all(struct dm_table *t); 292 - 293 - /* 294 * Table reference counting. 295 */ 296 struct dm_table *dm_get_live_table(struct mapped_device *md);
··· 286 int dm_table_complete(struct dm_table *t); 287 288 /* 289 * Table reference counting. 290 */ 291 struct dm_table *dm_get_live_table(struct mapped_device *md);
+5 -5
include/linux/elevator.h
··· 20 typedef int (elevator_dispatch_fn) (struct request_queue *, int); 21 22 typedef void (elevator_add_req_fn) (struct request_queue *, struct request *); 23 - typedef int (elevator_queue_empty_fn) (struct request_queue *); 24 typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *); 25 typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *); 26 typedef int (elevator_may_queue_fn) (struct request_queue *, int); ··· 45 elevator_activate_req_fn *elevator_activate_req_fn; 46 elevator_deactivate_req_fn *elevator_deactivate_req_fn; 47 48 - elevator_queue_empty_fn *elevator_queue_empty_fn; 49 elevator_completed_req_fn *elevator_completed_req_fn; 50 51 elevator_request_list_fn *elevator_former_req_fn; ··· 99 */ 100 extern void elv_dispatch_sort(struct request_queue *, struct request *); 101 extern void elv_dispatch_add_tail(struct request_queue *, struct request *); 102 - extern void elv_add_request(struct request_queue *, struct request *, int, int); 103 - extern void __elv_add_request(struct request_queue *, struct request *, int, int); 104 extern void elv_insert(struct request_queue *, struct request *, int); 105 extern int elv_merge(struct request_queue *, struct request **, struct bio *); 106 extern void elv_merge_requests(struct request_queue *, struct request *, 107 struct request *); 108 extern void elv_merged_request(struct request_queue *, struct request *, int); 109 extern void elv_bio_merged(struct request_queue *q, struct request *, 110 struct bio *); 111 extern void elv_requeue_request(struct request_queue *, struct request *); 112 - extern int elv_queue_empty(struct request_queue *); 113 extern struct request *elv_former_request(struct request_queue *, struct request *); 114 extern struct request *elv_latter_request(struct request_queue *, struct request *); 115 extern int elv_register_queue(struct request_queue *q); ··· 165 #define ELEVATOR_INSERT_BACK 2 166 #define ELEVATOR_INSERT_SORT 3 167 #define ELEVATOR_INSERT_REQUEUE 4 168 169 /* 170 * return values from elevator_may_queue_fn
··· 20 typedef int (elevator_dispatch_fn) (struct request_queue *, int); 21 22 typedef void (elevator_add_req_fn) (struct request_queue *, struct request *); 23 typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *); 24 typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *); 25 typedef int (elevator_may_queue_fn) (struct request_queue *, int); ··· 46 elevator_activate_req_fn *elevator_activate_req_fn; 47 elevator_deactivate_req_fn *elevator_deactivate_req_fn; 48 49 elevator_completed_req_fn *elevator_completed_req_fn; 50 51 elevator_request_list_fn *elevator_former_req_fn; ··· 101 */ 102 extern void elv_dispatch_sort(struct request_queue *, struct request *); 103 extern void elv_dispatch_add_tail(struct request_queue *, struct request *); 104 + extern void elv_add_request(struct request_queue *, struct request *, int); 105 + extern void __elv_add_request(struct request_queue *, struct request *, int); 106 extern void elv_insert(struct request_queue *, struct request *, int); 107 extern int elv_merge(struct request_queue *, struct request **, struct bio *); 108 + extern int elv_try_merge(struct request *, struct bio *); 109 extern void elv_merge_requests(struct request_queue *, struct request *, 110 struct request *); 111 extern void elv_merged_request(struct request_queue *, struct request *, int); 112 extern void elv_bio_merged(struct request_queue *q, struct request *, 113 struct bio *); 114 extern void elv_requeue_request(struct request_queue *, struct request *); 115 extern struct request *elv_former_request(struct request_queue *, struct request *); 116 extern struct request *elv_latter_request(struct request_queue *, struct request *); 117 extern int elv_register_queue(struct request_queue *q); ··· 167 #define ELEVATOR_INSERT_BACK 2 168 #define ELEVATOR_INSERT_SORT 3 169 #define ELEVATOR_INSERT_REQUEUE 4 170 + #define ELEVATOR_INSERT_FLUSH 5 171 + #define ELEVATOR_INSERT_SORT_MERGE 6 172 173 /* 174 * return values from elevator_may_queue_fn
+9 -20
include/linux/fs.h
··· 138 * block layer could (in theory) choose to ignore this 139 * request if it runs into resource problems. 140 * WRITE A normal async write. Device will be plugged. 141 - * WRITE_SYNC_PLUG Synchronous write. Identical to WRITE, but passes down 142 * the hint that someone will be waiting on this IO 143 - * shortly. The device must still be unplugged explicitly, 144 - * WRITE_SYNC_PLUG does not do this as we could be 145 - * submitting more writes before we actually wait on any 146 - * of them. 147 - * WRITE_SYNC Like WRITE_SYNC_PLUG, but also unplugs the device 148 - * immediately after submission. The write equivalent 149 - * of READ_SYNC. 150 - * WRITE_ODIRECT_PLUG Special case write for O_DIRECT only. 151 * WRITE_FLUSH Like WRITE_SYNC but with preceding cache flush. 152 * WRITE_FUA Like WRITE_SYNC but data is guaranteed to be on 153 * non-volatile media on completion. ··· 157 #define WRITE RW_MASK 158 #define READA RWA_MASK 159 160 - #define READ_SYNC (READ | REQ_SYNC | REQ_UNPLUG) 161 #define READ_META (READ | REQ_META) 162 - #define WRITE_SYNC_PLUG (WRITE | REQ_SYNC | REQ_NOIDLE) 163 - #define WRITE_SYNC (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG) 164 - #define WRITE_ODIRECT_PLUG (WRITE | REQ_SYNC) 165 #define WRITE_META (WRITE | REQ_META) 166 - #define WRITE_FLUSH (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG | \ 167 - REQ_FLUSH) 168 - #define WRITE_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG | \ 169 - REQ_FUA) 170 - #define WRITE_FLUSH_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG | \ 171 - REQ_FLUSH | REQ_FUA) 172 173 #define SEL_IN 1 174 #define SEL_OUT 2 ··· 576 struct address_space_operations { 577 int (*writepage)(struct page *page, struct writeback_control *wbc); 578 int (*readpage)(struct file *, struct page *); 579 - void (*sync_page)(struct page *); 580 581 /* Write back some dirty pages from this mapping. */ 582 int (*writepages)(struct address_space *, struct writeback_control *);
··· 138 * block layer could (in theory) choose to ignore this 139 * request if it runs into resource problems. 140 * WRITE A normal async write. Device will be plugged. 141 + * WRITE_SYNC Synchronous write. Identical to WRITE, but passes down 142 * the hint that someone will be waiting on this IO 143 + * shortly. The write equivalent of READ_SYNC. 144 + * WRITE_ODIRECT Special case write for O_DIRECT only. 145 * WRITE_FLUSH Like WRITE_SYNC but with preceding cache flush. 146 * WRITE_FUA Like WRITE_SYNC but data is guaranteed to be on 147 * non-volatile media on completion. ··· 163 #define WRITE RW_MASK 164 #define READA RWA_MASK 165 166 + #define READ_SYNC (READ | REQ_SYNC) 167 #define READ_META (READ | REQ_META) 168 + #define WRITE_SYNC (WRITE | REQ_SYNC | REQ_NOIDLE) 169 + #define WRITE_ODIRECT (WRITE | REQ_SYNC) 170 #define WRITE_META (WRITE | REQ_META) 171 + #define WRITE_FLUSH (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH) 172 + #define WRITE_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FUA) 173 + #define WRITE_FLUSH_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH | REQ_FUA) 174 175 #define SEL_IN 1 176 #define SEL_OUT 2 ··· 586 struct address_space_operations { 587 int (*writepage)(struct page *page, struct writeback_control *wbc); 588 int (*readpage)(struct file *, struct page *); 589 590 /* Write back some dirty pages from this mapping. */ 591 int (*writepages)(struct address_space *, struct writeback_control *);
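The fs.h hunk above collapses the plugged and unplugged write variants into a single table: WRITE_SYNC_PLUG and REQ_UNPLUG are gone, WRITE_ODIRECT loses its _PLUG suffix, and the FLUSH/FUA combinations keep their meaning minus the unplug bit. A short sketch of how a caller now picks a flag set, mirroring the xfs and jbd2 hunks elsewhere in this merge; the function and its commit_record argument are hypothetical:

    #include <linux/fs.h>
    #include <linux/bio.h>
    #include <linux/writeback.h>

    /* Hypothetical writeback submit path, showing flag selection only. */
    static void my_submit_wb_bio(struct writeback_control *wbc,
                                 struct bio *bio, bool commit_record)
    {
            int rw = (wbc->sync_mode == WB_SYNC_ALL) ? WRITE_SYNC : WRITE;

            if (commit_record)
                    rw = WRITE_FLUSH_FUA;   /* preflush, then FUA write */

            submit_bio(rw, bio);
    }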
+6 -6
include/linux/genhd.h
··· 109 int make_it_fail; 110 #endif 111 unsigned long stamp; 112 - int in_flight[2]; 113 #ifdef CONFIG_SMP 114 struct disk_stats __percpu *dkstats; 115 #else ··· 370 371 static inline void part_inc_in_flight(struct hd_struct *part, int rw) 372 { 373 - part->in_flight[rw]++; 374 if (part->partno) 375 - part_to_disk(part)->part0.in_flight[rw]++; 376 } 377 378 static inline void part_dec_in_flight(struct hd_struct *part, int rw) 379 { 380 - part->in_flight[rw]--; 381 if (part->partno) 382 - part_to_disk(part)->part0.in_flight[rw]--; 383 } 384 385 static inline int part_in_flight(struct hd_struct *part) 386 { 387 - return part->in_flight[0] + part->in_flight[1]; 388 } 389 390 static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
··· 109 int make_it_fail; 110 #endif 111 unsigned long stamp; 112 + atomic_t in_flight[2]; 113 #ifdef CONFIG_SMP 114 struct disk_stats __percpu *dkstats; 115 #else ··· 370 371 static inline void part_inc_in_flight(struct hd_struct *part, int rw) 372 { 373 + atomic_inc(&part->in_flight[rw]); 374 if (part->partno) 375 + atomic_inc(&part_to_disk(part)->part0.in_flight[rw]); 376 } 377 378 static inline void part_dec_in_flight(struct hd_struct *part, int rw) 379 { 380 + atomic_dec(&part->in_flight[rw]); 381 if (part->partno) 382 + atomic_dec(&part_to_disk(part)->part0.in_flight[rw]); 383 } 384 385 static inline int part_in_flight(struct hd_struct *part) 386 { 387 + return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]); 388 } 389 390 static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
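in_flight becomes an atomic_t pair, so the inc/dec helpers above and readers such as the sysfs hunk in fs/partitions/check.c earlier in this merge can update and sample the counters without any lock. A small sketch of a lock-free reader over those counters; the function name and buffer handling are hypothetical:

    #include <linux/genhd.h>
    #include <linux/kernel.h>

    /* Hypothetical lock-free stats dump over the atomic in_flight counters. */
    static int my_show_in_flight(struct hd_struct *part, char *buf, size_t len)
    {
            return scnprintf(buf, len, "%u reads, %u writes in flight\n",
                             atomic_read(&part->in_flight[0]),   /* reads  */
                             atomic_read(&part->in_flight[1]));  /* writes */
    }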
-12
include/linux/pagemap.h
··· 298 299 extern void __lock_page(struct page *page); 300 extern int __lock_page_killable(struct page *page); 301 - extern void __lock_page_nosync(struct page *page); 302 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm, 303 unsigned int flags); 304 extern void unlock_page(struct page *page); ··· 340 return 0; 341 } 342 343 - /* 344 - * lock_page_nosync should only be used if we can't pin the page's inode. 345 - * Doesn't play quite so well with block device plugging. 346 - */ 347 - static inline void lock_page_nosync(struct page *page) 348 - { 349 - might_sleep(); 350 - if (!trylock_page(page)) 351 - __lock_page_nosync(page); 352 - } 353 - 354 /* 355 * lock_page_or_retry - Lock the page, unless this would block and the 356 * caller indicated that it can handle a retry.
··· 298 299 extern void __lock_page(struct page *page); 300 extern int __lock_page_killable(struct page *page); 301 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm, 302 unsigned int flags); 303 extern void unlock_page(struct page *page); ··· 341 return 0; 342 } 343 344 /* 345 * lock_page_or_retry - Lock the page, unless this would block and the 346 * caller indicated that it can handle a retry.
+6
include/linux/sched.h
··· 99 struct bio_list; 100 struct fs_struct; 101 struct perf_event_context; 102 103 /* 104 * List of flags we want to share for kernel threads, ··· 1428 1429 /* stacked block device info */ 1430 struct bio_list *bio_list; 1431 1432 /* VM state */ 1433 struct reclaim_state *reclaim_state;
··· 99 struct bio_list; 100 struct fs_struct; 101 struct perf_event_context; 102 + struct blk_plug; 103 104 /* 105 * List of flags we want to share for kernel threads, ··· 1427 1428 /* stacked block device info */ 1429 struct bio_list *bio_list; 1430 + 1431 + #ifdef CONFIG_BLOCK 1432 + /* stack plugging */ 1433 + struct blk_plug *plug; 1434 + #endif 1435 1436 /* VM state */ 1437 struct reclaim_state *reclaim_state;
-2
include/linux/swap.h
··· 309 struct page **pagep, swp_entry_t *ent); 310 #endif 311 312 - extern void swap_unplug_io_fn(struct backing_dev_info *, struct page *); 313 - 314 #ifdef CONFIG_SWAP 315 /* linux/mm/page_io.c */ 316 extern int swap_readpage(struct page *);
··· 309 struct page **pagep, swp_entry_t *ent); 310 #endif 311 312 #ifdef CONFIG_SWAP 313 /* linux/mm/page_io.c */ 314 extern int swap_readpage(struct page *);
+1
kernel/exit.c
··· 908 profile_task_exit(tsk); 909 910 WARN_ON(atomic_read(&tsk->fs_excl)); 911 912 if (unlikely(in_interrupt())) 913 panic("Aiee, killing interrupt handler!");
··· 908 profile_task_exit(tsk); 909 910 WARN_ON(atomic_read(&tsk->fs_excl)); 911 + WARN_ON(blk_needs_flush_plug(tsk)); 912 913 if (unlikely(in_interrupt())) 914 panic("Aiee, killing interrupt handler!");
+3
kernel/fork.c
··· 1205 * Clear TID on mm_release()? 1206 */ 1207 p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr: NULL; 1208 #ifdef CONFIG_FUTEX 1209 p->robust_list = NULL; 1210 #ifdef CONFIG_COMPAT
··· 1205 * Clear TID on mm_release()? 1206 */ 1207 p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr: NULL; 1208 + #ifdef CONFIG_BLOCK 1209 + p->plug = NULL; 1210 + #endif 1211 #ifdef CONFIG_FUTEX 1212 p->robust_list = NULL; 1213 #ifdef CONFIG_COMPAT
+1 -1
kernel/power/block_io.c
··· 28 static int submit(int rw, struct block_device *bdev, sector_t sector, 29 struct page *page, struct bio **bio_chain) 30 { 31 - const int bio_rw = rw | REQ_SYNC | REQ_UNPLUG; 32 struct bio *bio; 33 34 bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
··· 28 static int submit(int rw, struct block_device *bdev, sector_t sector, 29 struct page *page, struct bio **bio_chain) 30 { 31 + const int bio_rw = rw | REQ_SYNC; 32 struct bio *bio; 33 34 bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
+12
kernel/sched.c
··· 4115 switch_count = &prev->nvcsw; 4116 } 4117 4118 pre_schedule(rq, prev); 4119 4120 if (unlikely(!rq->nr_running)) ··· 5538 5539 delayacct_blkio_start(); 5540 atomic_inc(&rq->nr_iowait); 5541 current->in_iowait = 1; 5542 schedule(); 5543 current->in_iowait = 0; ··· 5554 5555 delayacct_blkio_start(); 5556 atomic_inc(&rq->nr_iowait); 5557 current->in_iowait = 1; 5558 ret = schedule_timeout(timeout); 5559 current->in_iowait = 0;
··· 4115 switch_count = &prev->nvcsw; 4116 } 4117 4118 + /* 4119 + * If we are going to sleep and we have plugged IO queued, make 4120 + * sure to submit it to avoid deadlocks. 4121 + */ 4122 + if (prev->state != TASK_RUNNING && blk_needs_flush_plug(prev)) { 4123 + raw_spin_unlock(&rq->lock); 4124 + blk_flush_plug(prev); 4125 + raw_spin_lock(&rq->lock); 4126 + } 4127 + 4128 pre_schedule(rq, prev); 4129 4130 if (unlikely(!rq->nr_running)) ··· 5528 5529 delayacct_blkio_start(); 5530 atomic_inc(&rq->nr_iowait); 5531 + blk_flush_plug(current); 5532 current->in_iowait = 1; 5533 schedule(); 5534 current->in_iowait = 0; ··· 5543 5544 delayacct_blkio_start(); 5545 atomic_inc(&rq->nr_iowait); 5546 + blk_flush_plug(current); 5547 current->in_iowait = 1; 5548 ret = schedule_timeout(timeout); 5549 current->in_iowait = 0;
+4 -11
kernel/trace/blktrace.c
··· 703 * 704 **/ 705 static void blk_add_trace_rq(struct request_queue *q, struct request *rq, 706 - u32 what) 707 { 708 struct blk_trace *bt = q->blk_trace; 709 - int rw = rq->cmd_flags & 0x03; 710 711 if (likely(!bt)) 712 return; 713 714 - if (rq->cmd_flags & REQ_DISCARD) 715 - rw |= REQ_DISCARD; 716 - 717 - if (rq->cmd_flags & REQ_SECURE) 718 - rw |= REQ_SECURE; 719 - 720 if (rq->cmd_type == REQ_TYPE_BLOCK_PC) { 721 what |= BLK_TC_ACT(BLK_TC_PC); 722 - __blk_add_trace(bt, 0, blk_rq_bytes(rq), rw, 723 what, rq->errors, rq->cmd_len, rq->cmd); 724 } else { 725 what |= BLK_TC_ACT(BLK_TC_FS); 726 - __blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq), rw, 727 - what, rq->errors, 0, NULL); 728 } 729 } 730
··· 703 * 704 **/ 705 static void blk_add_trace_rq(struct request_queue *q, struct request *rq, 706 + u32 what) 707 { 708 struct blk_trace *bt = q->blk_trace; 709 710 if (likely(!bt)) 711 return; 712 713 if (rq->cmd_type == REQ_TYPE_BLOCK_PC) { 714 what |= BLK_TC_ACT(BLK_TC_PC); 715 + __blk_add_trace(bt, 0, blk_rq_bytes(rq), rq->cmd_flags, 716 what, rq->errors, rq->cmd_len, rq->cmd); 717 } else { 718 what |= BLK_TC_ACT(BLK_TC_FS); 719 + __blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq), 720 + rq->cmd_flags, what, rq->errors, 0, NULL); 721 } 722 } 723
+1 -7
mm/backing-dev.c
··· 14 15 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0); 16 17 - void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page) 18 - { 19 - } 20 - EXPORT_SYMBOL(default_unplug_io_fn); 21 - 22 struct backing_dev_info default_backing_dev_info = { 23 .name = "default", 24 .ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE, 25 .state = 0, 26 .capabilities = BDI_CAP_MAP_COPY, 27 - .unplug_io_fn = default_unplug_io_fn, 28 }; 29 EXPORT_SYMBOL_GPL(default_backing_dev_info); 30 ··· 598 spin_lock(&sb_lock); 599 list_for_each_entry(sb, &super_blocks, s_list) { 600 if (sb->s_bdi == bdi) 601 - sb->s_bdi = NULL; 602 } 603 spin_unlock(&sb_lock); 604 }
··· 14 15 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0); 16 17 struct backing_dev_info default_backing_dev_info = { 18 .name = "default", 19 .ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE, 20 .state = 0, 21 .capabilities = BDI_CAP_MAP_COPY, 22 }; 23 EXPORT_SYMBOL_GPL(default_backing_dev_info); 24 ··· 604 spin_lock(&sb_lock); 605 list_for_each_entry(sb, &super_blocks, s_list) { 606 if (sb->s_bdi == bdi) 607 + sb->s_bdi = &default_backing_dev_info; 608 } 609 spin_unlock(&sb_lock); 610 }
+13 -61
mm/filemap.c
··· 164 } 165 EXPORT_SYMBOL(delete_from_page_cache); 166 167 - static int sync_page(void *word) 168 { 169 - struct address_space *mapping; 170 - struct page *page; 171 - 172 - page = container_of((unsigned long *)word, struct page, flags); 173 - 174 - /* 175 - * page_mapping() is being called without PG_locked held. 176 - * Some knowledge of the state and use of the page is used to 177 - * reduce the requirements down to a memory barrier. 178 - * The danger here is of a stale page_mapping() return value 179 - * indicating a struct address_space different from the one it's 180 - * associated with when it is associated with one. 181 - * After smp_mb(), it's either the correct page_mapping() for 182 - * the page, or an old page_mapping() and the page's own 183 - * page_mapping() has gone NULL. 184 - * The ->sync_page() address_space operation must tolerate 185 - * page_mapping() going NULL. By an amazing coincidence, 186 - * this comes about because none of the users of the page 187 - * in the ->sync_page() methods make essential use of the 188 - * page_mapping(), merely passing the page down to the backing 189 - * device's unplug functions when it's non-NULL, which in turn 190 - * ignore it for all cases but swap, where only page_private(page) is 191 - * of interest. When page_mapping() does go NULL, the entire 192 - * call stack gracefully ignores the page and returns. 193 - * -- wli 194 - */ 195 - smp_mb(); 196 - mapping = page_mapping(page); 197 - if (mapping && mapping->a_ops && mapping->a_ops->sync_page) 198 - mapping->a_ops->sync_page(page); 199 io_schedule(); 200 return 0; 201 } 202 203 - static int sync_page_killable(void *word) 204 { 205 - sync_page(word); 206 return fatal_signal_pending(current) ? -EINTR : 0; 207 } 208 ··· 528 EXPORT_SYMBOL(__page_cache_alloc); 529 #endif 530 531 - static int __sleep_on_page_lock(void *word) 532 - { 533 - io_schedule(); 534 - return 0; 535 - } 536 - 537 /* 538 * In order to wait for pages to become available there must be 539 * waitqueues associated with pages. By using a hash table of ··· 555 DEFINE_WAIT_BIT(wait, &page->flags, bit_nr); 556 557 if (test_bit(bit_nr, &page->flags)) 558 - __wait_on_bit(page_waitqueue(page), &wait, sync_page, 559 TASK_UNINTERRUPTIBLE); 560 } 561 EXPORT_SYMBOL(wait_on_page_bit); ··· 619 /** 620 * __lock_page - get a lock on the page, assuming we need to sleep to get it 621 * @page: the page to lock 622 - * 623 - * Ugly. Running sync_page() in state TASK_UNINTERRUPTIBLE is scary. If some 624 - * random driver's requestfn sets TASK_RUNNING, we could busywait. However 625 - * chances are that on the second loop, the block layer's plug list is empty, 626 - * so sync_page() will then return in state TASK_UNINTERRUPTIBLE. 627 */ 628 void __lock_page(struct page *page) 629 { 630 DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); 631 632 - __wait_on_bit_lock(page_waitqueue(page), &wait, sync_page, 633 TASK_UNINTERRUPTIBLE); 634 } 635 EXPORT_SYMBOL(__lock_page); ··· 634 DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); 635 636 return __wait_on_bit_lock(page_waitqueue(page), &wait, 637 - sync_page_killable, TASK_KILLABLE); 638 } 639 EXPORT_SYMBOL_GPL(__lock_page_killable); 640 - 641 - /** 642 - * __lock_page_nosync - get a lock on the page, without calling sync_page() 643 - * @page: the page to lock 644 - * 645 - * Variant of lock_page that does not require the caller to hold a reference 646 - * on the page's mapping. 
647 - */ 648 - void __lock_page_nosync(struct page *page) 649 - { 650 - DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); 651 - __wait_on_bit_lock(page_waitqueue(page), &wait, __sleep_on_page_lock, 652 - TASK_UNINTERRUPTIBLE); 653 - } 654 655 int __lock_page_or_retry(struct page *page, struct mm_struct *mm, 656 unsigned int flags) ··· 1352 unsigned long seg = 0; 1353 size_t count; 1354 loff_t *ppos = &iocb->ki_pos; 1355 1356 count = 0; 1357 retval = generic_segment_checks(iov, &nr_segs, &count, VERIFY_WRITE); 1358 if (retval) 1359 return retval; 1360 1361 /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ 1362 if (filp->f_flags & O_DIRECT) { ··· 1433 break; 1434 } 1435 out: 1436 return retval; 1437 } 1438 EXPORT_SYMBOL(generic_file_aio_read); ··· 2545 { 2546 struct file *file = iocb->ki_filp; 2547 struct inode *inode = file->f_mapping->host; 2548 ssize_t ret; 2549 2550 BUG_ON(iocb->ki_pos != pos); 2551 2552 mutex_lock(&inode->i_mutex); 2553 ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos); 2554 mutex_unlock(&inode->i_mutex); 2555 ··· 2562 if (err < 0 && ret > 0) 2563 ret = err; 2564 } 2565 return ret; 2566 } 2567 EXPORT_SYMBOL(generic_file_aio_write);
··· 164 } 165 EXPORT_SYMBOL(delete_from_page_cache); 166 167 + static int sleep_on_page(void *word) 168 { 169 io_schedule(); 170 return 0; 171 } 172 173 + static int sleep_on_page_killable(void *word) 174 { 175 + sleep_on_page(word); 176 return fatal_signal_pending(current) ? -EINTR : 0; 177 } 178 ··· 558 EXPORT_SYMBOL(__page_cache_alloc); 559 #endif 560 561 /* 562 * In order to wait for pages to become available there must be 563 * waitqueues associated with pages. By using a hash table of ··· 591 DEFINE_WAIT_BIT(wait, &page->flags, bit_nr); 592 593 if (test_bit(bit_nr, &page->flags)) 594 + __wait_on_bit(page_waitqueue(page), &wait, sleep_on_page, 595 TASK_UNINTERRUPTIBLE); 596 } 597 EXPORT_SYMBOL(wait_on_page_bit); ··· 655 /** 656 * __lock_page - get a lock on the page, assuming we need to sleep to get it 657 * @page: the page to lock 658 */ 659 void __lock_page(struct page *page) 660 { 661 DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); 662 663 + __wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page, 664 TASK_UNINTERRUPTIBLE); 665 } 666 EXPORT_SYMBOL(__lock_page); ··· 675 DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); 676 677 return __wait_on_bit_lock(page_waitqueue(page), &wait, 678 + sleep_on_page_killable, TASK_KILLABLE); 679 } 680 EXPORT_SYMBOL_GPL(__lock_page_killable); 681 682 int __lock_page_or_retry(struct page *page, struct mm_struct *mm, 683 unsigned int flags) ··· 1407 unsigned long seg = 0; 1408 size_t count; 1409 loff_t *ppos = &iocb->ki_pos; 1410 + struct blk_plug plug; 1411 1412 count = 0; 1413 retval = generic_segment_checks(iov, &nr_segs, &count, VERIFY_WRITE); 1414 if (retval) 1415 return retval; 1416 + 1417 + blk_start_plug(&plug); 1418 1419 /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ 1420 if (filp->f_flags & O_DIRECT) { ··· 1485 break; 1486 } 1487 out: 1488 + blk_finish_plug(&plug); 1489 return retval; 1490 } 1491 EXPORT_SYMBOL(generic_file_aio_read); ··· 2596 { 2597 struct file *file = iocb->ki_filp; 2598 struct inode *inode = file->f_mapping->host; 2599 + struct blk_plug plug; 2600 ssize_t ret; 2601 2602 BUG_ON(iocb->ki_pos != pos); 2603 2604 mutex_lock(&inode->i_mutex); 2605 + blk_start_plug(&plug); 2606 ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos); 2607 mutex_unlock(&inode->i_mutex); 2608 ··· 2611 if (err < 0 && ret > 0) 2612 ret = err; 2613 } 2614 + blk_finish_plug(&plug); 2615 return ret; 2616 } 2617 EXPORT_SYMBOL(generic_file_aio_write);
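The filemap.c conversion above is the heart of the change: the old ->sync_page()/unplug machinery is gone, and the read and write paths instead keep an on-stack struct blk_plug around the window in which they queue I/O, letting the block layer merge and dispatch the batch when the plug is finished (or when the task sleeps). A minimal sketch of that pattern, assuming only struct blk_plug, blk_start_plug() and blk_finish_plug() as the real interfaces; submit_one_page() is a made-up stand-in for whatever queues the bios:

#include <linux/blkdev.h>       /* struct blk_plug, blk_start_plug(), blk_finish_plug() */
#include <linux/mm.h>           /* struct page */

int submit_one_page(struct page *page);  /* hypothetical bio-submitting helper */

/*
 * Illustrative only: queue a batch of I/O behind one on-stack plug so the
 * block layer can merge and sort the requests before handing them to the
 * driver.  submit_one_page() is hypothetical; the plug calls are the API
 * this series introduces.
 */
static int submit_batch(struct page **pages, int nr)
{
        struct blk_plug plug;
        int i, ret = 0;

        blk_start_plug(&plug);          /* requests now collect on this task's plug */

        for (i = 0; i < nr; i++) {
                ret = submit_one_page(pages[i]);
                if (ret)
                        break;
        }

        /*
         * Dispatch everything queued while plugged.  If the task blocked in
         * the meantime, the scheduler already flushed the plug, so there is
         * no risk of sleeping on our own unsubmitted requests.
         */
        blk_finish_plug(&plug);
        return ret;
}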
+4 -4
mm/memory-failure.c
··· 945 collect_procs(ppage, &tokill); 946 947 if (hpage != ppage) 948 - lock_page_nosync(ppage); 949 950 ret = try_to_unmap(ppage, ttu); 951 if (ret != SWAP_SUCCESS) ··· 1038 * Check "just unpoisoned", "filter hit", and 1039 * "race with other subpage." 1040 */ 1041 - lock_page_nosync(hpage); 1042 if (!PageHWPoison(hpage) 1043 || (hwpoison_filter(p) && TestClearPageHWPoison(p)) 1044 || (p != hpage && TestSetPageHWPoison(hpage))) { ··· 1088 * It's very difficult to mess with pages currently under IO 1089 * and in many cases impossible, so we just avoid it here. 1090 */ 1091 - lock_page_nosync(hpage); 1092 1093 /* 1094 * unpoison always clear PG_hwpoison inside page lock ··· 1231 return 0; 1232 } 1233 1234 - lock_page_nosync(page); 1235 /* 1236 * This test is racy because PG_hwpoison is set outside of page lock. 1237 * That's acceptable because that won't trigger kernel panic. Instead,
··· 945 collect_procs(ppage, &tokill); 946 947 if (hpage != ppage) 948 + lock_page(ppage); 949 950 ret = try_to_unmap(ppage, ttu); 951 if (ret != SWAP_SUCCESS) ··· 1038 * Check "just unpoisoned", "filter hit", and 1039 * "race with other subpage." 1040 */ 1041 + lock_page(hpage); 1042 if (!PageHWPoison(hpage) 1043 || (hwpoison_filter(p) && TestClearPageHWPoison(p)) 1044 || (p != hpage && TestSetPageHWPoison(hpage))) { ··· 1088 * It's very difficult to mess with pages currently under IO 1089 * and in many cases impossible, so we just avoid it here. 1090 */ 1091 + lock_page(hpage); 1092 1093 /* 1094 * unpoison always clear PG_hwpoison inside page lock ··· 1231 return 0; 1232 } 1233 1234 + lock_page(page); 1235 /* 1236 * This test is racy because PG_hwpoison is set outside of page lock. 1237 * That's acceptable because that won't trigger kernel panic. Instead,
-4
mm/nommu.c
··· 1842 } 1843 EXPORT_SYMBOL(remap_vmalloc_range); 1844 1845 - void swap_unplug_io_fn(struct backing_dev_info *bdi, struct page *page) 1846 - { 1847 - } 1848 - 1849 unsigned long arch_get_unmapped_area(struct file *file, unsigned long addr, 1850 unsigned long len, unsigned long pgoff, unsigned long flags) 1851 {
··· 1842 } 1843 EXPORT_SYMBOL(remap_vmalloc_range); 1844 1845 unsigned long arch_get_unmapped_area(struct file *file, unsigned long addr, 1846 unsigned long len, unsigned long pgoff, unsigned long flags) 1847 {
+8 -2
mm/page-writeback.c
··· 1040 int generic_writepages(struct address_space *mapping, 1041 struct writeback_control *wbc) 1042 { 1043 /* deal with chardevs and other special file */ 1044 if (!mapping->a_ops->writepage) 1045 return 0; 1046 1047 - return write_cache_pages(mapping, wbc, __writepage, mapping); 1048 } 1049 1050 EXPORT_SYMBOL(generic_writepages); ··· 1257 { 1258 int ret; 1259 1260 - lock_page_nosync(page); 1261 ret = set_page_dirty(page); 1262 unlock_page(page); 1263 return ret;
··· 1040 int generic_writepages(struct address_space *mapping, 1041 struct writeback_control *wbc) 1042 { 1043 + struct blk_plug plug; 1044 + int ret; 1045 + 1046 /* deal with chardevs and other special file */ 1047 if (!mapping->a_ops->writepage) 1048 return 0; 1049 1050 + blk_start_plug(&plug); 1051 + ret = write_cache_pages(mapping, wbc, __writepage, mapping); 1052 + blk_finish_plug(&plug); 1053 + return ret; 1054 } 1055 1056 EXPORT_SYMBOL(generic_writepages); ··· 1251 { 1252 int ret; 1253 1254 + lock_page(page); 1255 ret = set_page_dirty(page); 1256 unlock_page(page); 1257 return ret;
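generic_writepages() now wraps the whole write_cache_pages() walk in a plug, so writeback through the generic path is submitted as one batch. A filesystem that rolls its own ->writepages() would follow the same shape; in the sketch below only write_cache_pages(), the writepage_t callback signature and the plug calls are real kernel interfaces, while the foo_* names are invented:

#include <linux/blkdev.h>
#include <linux/fs.h>
#include <linux/writeback.h>

/*
 * Hypothetical per-page callback; a real one would also record write errors
 * against the mapping, as the generic __writepage() helper does.
 */
static int foo_writepage(struct page *page, struct writeback_control *wbc,
                         void *data)
{
        struct address_space *mapping = data;

        return mapping->a_ops->writepage(page, wbc);
}

/*
 * Sketch of a filesystem ->writepages() after the plugging conversion: the
 * page walk runs under one on-stack plug, mirroring the generic_writepages()
 * hunk above.
 */
static int foo_writepages(struct address_space *mapping,
                          struct writeback_control *wbc)
{
        struct blk_plug plug;
        int ret;

        blk_start_plug(&plug);
        ret = write_cache_pages(mapping, wbc, foo_writepage, mapping);
        blk_finish_plug(&plug);

        return ret;
}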
+1 -1
mm/page_io.c
··· 106 goto out; 107 } 108 if (wbc->sync_mode == WB_SYNC_ALL) 109 - rw |= REQ_SYNC | REQ_UNPLUG; 110 count_vm_event(PSWPOUT); 111 set_page_writeback(page); 112 unlock_page(page);
··· 106 goto out; 107 } 108 if (wbc->sync_mode == WB_SYNC_ALL) 109 + rw |= REQ_SYNC; 110 count_vm_event(PSWPOUT); 111 set_page_writeback(page); 112 unlock_page(page);
+6 -12
mm/readahead.c
··· 109 static int read_pages(struct address_space *mapping, struct file *filp, 110 struct list_head *pages, unsigned nr_pages) 111 { 112 unsigned page_idx; 113 int ret; 114 115 if (mapping->a_ops->readpages) { 116 ret = mapping->a_ops->readpages(filp, mapping, pages, nr_pages); ··· 132 page_cache_release(page); 133 } 134 ret = 0; 135 out: 136 return ret; 137 } 138 ··· 560 561 /* do read-ahead */ 562 ondemand_readahead(mapping, ra, filp, true, offset, req_size); 563 - 564 - #ifdef CONFIG_BLOCK 565 - /* 566 - * Normally the current page is !uptodate and lock_page() will be 567 - * immediately called to implicitly unplug the device. However this 568 - * is not always true for RAID conifgurations, where data arrives 569 - * not strictly in their submission order. In this case we need to 570 - * explicitly kick off the IO. 571 - */ 572 - if (PageUptodate(page)) 573 - blk_run_backing_dev(mapping->backing_dev_info, NULL); 574 - #endif 575 } 576 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
··· 109 static int read_pages(struct address_space *mapping, struct file *filp, 110 struct list_head *pages, unsigned nr_pages) 111 { 112 + struct blk_plug plug; 113 unsigned page_idx; 114 int ret; 115 + 116 + blk_start_plug(&plug); 117 118 if (mapping->a_ops->readpages) { 119 ret = mapping->a_ops->readpages(filp, mapping, pages, nr_pages); ··· 129 page_cache_release(page); 130 } 131 ret = 0; 132 + 133 out: 134 + blk_finish_plug(&plug); 135 + 136 return ret; 137 } 138 ··· 554 555 /* do read-ahead */ 556 ondemand_readahead(mapping, ra, filp, true, offset, req_size); 557 } 558 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
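read_pages() now opens its own plug around the ->readpages()/readpage submissions, and the old PageUptodate()/blk_run_backing_dev() kick in page_cache_async_readahead() is no longer needed because finishing the plug dispatches whatever was queued. Readahead is usually entered from generic_file_aio_read(), which already holds a plug at that point; the API is written so that only the outermost plug is installed on the task, so nesting like this is harmless. A rough sketch of that nesting, assuming only the blk_*_plug() calls are real and queue_some_io() is a stand-in for any bio-submitting helper:

#include <linux/blkdev.h>

void queue_some_io(void);       /* hypothetical; submits one or more bios */

/*
 * Illustrative nesting sketch.  inner() opens a plug even though outer()
 * already has one active; blk_start_plug() only installs a plug on the task
 * when none is active, so the requests queued in inner() accumulate on
 * outer()'s plug and are dispatched by the outermost blk_finish_plug().
 */
static void inner(void)
{
        struct blk_plug plug;

        blk_start_plug(&plug);          /* nested: nothing lands on this plug */
        queue_some_io();
        blk_finish_plug(&plug);         /* effectively a no-op for a nested plug */
}

static void outer(void)
{
        struct blk_plug plug;

        blk_start_plug(&plug);
        inner();
        queue_some_io();
        blk_finish_plug(&plug);         /* the whole batch goes to the driver here */
}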
-1
mm/shmem.c
··· 224 static struct backing_dev_info shmem_backing_dev_info __read_mostly = { 225 .ra_pages = 0, /* No readahead */ 226 .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_SWAP_BACKED, 227 - .unplug_io_fn = default_unplug_io_fn, 228 }; 229 230 static LIST_HEAD(shmem_swaplist);
··· 224 static struct backing_dev_info shmem_backing_dev_info __read_mostly = { 225 .ra_pages = 0, /* No readahead */ 226 .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_SWAP_BACKED, 227 }; 228 229 static LIST_HEAD(shmem_swaplist);
+1 -4
mm/swap_state.c
··· 24 25 /* 26 * swapper_space is a fiction, retained to simplify the path through 27 - * vmscan's shrink_page_list, to make sync_page look nicer, and to allow 28 - * future use of radix_tree tags in the swap cache. 29 */ 30 static const struct address_space_operations swap_aops = { 31 .writepage = swap_writepage, 32 - .sync_page = block_sync_page, 33 .set_page_dirty = __set_page_dirty_nobuffers, 34 .migratepage = migrate_page, 35 }; ··· 35 static struct backing_dev_info swap_backing_dev_info = { 36 .name = "swap", 37 .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_SWAP_BACKED, 38 - .unplug_io_fn = swap_unplug_io_fn, 39 }; 40 41 struct address_space swapper_space = {
··· 24 25 /* 26 * swapper_space is a fiction, retained to simplify the path through 27 + * vmscan's shrink_page_list. 28 */ 29 static const struct address_space_operations swap_aops = { 30 .writepage = swap_writepage, 31 .set_page_dirty = __set_page_dirty_nobuffers, 32 .migratepage = migrate_page, 33 }; ··· 37 static struct backing_dev_info swap_backing_dev_info = { 38 .name = "swap", 39 .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_SWAP_BACKED, 40 }; 41 42 struct address_space swapper_space = {
-37
mm/swapfile.c
··· 95 } 96 97 /* 98 - * We need this because the bdev->unplug_fn can sleep and we cannot 99 - * hold swap_lock while calling the unplug_fn. And swap_lock 100 - * cannot be turned into a mutex. 101 - */ 102 - static DECLARE_RWSEM(swap_unplug_sem); 103 - 104 - void swap_unplug_io_fn(struct backing_dev_info *unused_bdi, struct page *page) 105 - { 106 - swp_entry_t entry; 107 - 108 - down_read(&swap_unplug_sem); 109 - entry.val = page_private(page); 110 - if (PageSwapCache(page)) { 111 - struct block_device *bdev = swap_info[swp_type(entry)]->bdev; 112 - struct backing_dev_info *bdi; 113 - 114 - /* 115 - * If the page is removed from swapcache from under us (with a 116 - * racy try_to_unuse/swapoff) we need an additional reference 117 - * count to avoid reading garbage from page_private(page) above. 118 - * If the WARN_ON triggers during a swapoff it maybe the race 119 - * condition and it's harmless. However if it triggers without 120 - * swapoff it signals a problem. 121 - */ 122 - WARN_ON(page_count(page) <= 1); 123 - 124 - bdi = bdev->bd_inode->i_mapping->backing_dev_info; 125 - blk_run_backing_dev(bdi, page); 126 - } 127 - up_read(&swap_unplug_sem); 128 - } 129 - 130 - /* 131 * swapon tell device that all the old swap contents can be discarded, 132 * to allow the swap device to optimize its wear-levelling. 133 */ ··· 1628 enable_swap_info(p, p->prio, p->swap_map); 1629 goto out_dput; 1630 } 1631 - 1632 - /* wait for any unplug function to finish */ 1633 - down_write(&swap_unplug_sem); 1634 - up_write(&swap_unplug_sem); 1635 1636 destroy_swap_extents(p); 1637 if (p->flags & SWP_CONTINUED)
··· 95 } 96 97 /* 98 * swapon tell device that all the old swap contents can be discarded, 99 * to allow the swap device to optimize its wear-levelling. 100 */ ··· 1661 enable_swap_info(p, p->prio, p->swap_map); 1662 goto out_dput; 1663 } 1664 1665 destroy_swap_extents(p); 1666 if (p->flags & SWP_CONTINUED)
+1 -1
mm/vmscan.c
··· 358 static void handle_write_error(struct address_space *mapping, 359 struct page *page, int error) 360 { 361 - lock_page_nosync(page); 362 if (page_mapping(page) == mapping) 363 mapping_set_error(mapping, error); 364 unlock_page(page);
··· 358 static void handle_write_error(struct address_space *mapping, 359 struct page *page, int error) 360 { 361 + lock_page(page); 362 if (page_mapping(page) == mapping) 363 mapping_set_error(mapping, error); 364 unlock_page(page);