Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: Range check cpu in blk_cpu_to_group
scatterlist: prevent invalid free when alloc fails
writeback: Fix lost wake-up shutting down writeback thread
writeback: do not lose wakeup events when forking bdi threads
cciss: fix reporting of max queue depth since init
block: switch s390 tape_block and mg_disk to elevator_change()
block: add function call to switch the IO scheduler from a driver
fs/bio-integrity.c: return -ENOMEM on kmalloc failure
bio-integrity.c: remove dependency on __GFP_NOFAIL
BLOCK: fix bio.bi_rw handling
block: put dev->kobj in blk_register_queue fail path
cciss: handle allocation failure
cfq-iosched: Documentation help for new tunables
cfq-iosched: blktrace print per slice sector stats
cfq-iosched: Implement tunable group_idle
cfq-iosched: Do group share accounting in IOPS when slice_idle=0
cfq-iosched: Do not idle if slice_idle=0
cciss: disable doorbell reset on reset_devices
blkio: Fix return code for mkdir calls

+238 -46
+45
Documentation/block/cfq-iosched.txt
···
+CFQ ioscheduler tunables
+========================
+
+slice_idle
+----------
+This specifies how long CFQ should idle for next request on certain cfq queues
+(for sequential workloads) and service trees (for random workloads) before
+queue is expired and CFQ selects next queue to dispatch from.
+
+By default slice_idle is a non-zero value. That means by default we idle on
+queues/service trees. This can be very helpful on highly seeky media like
+single spindle SATA/SAS disks where we can cut down on overall number of
+seeks and see improved throughput.
+
+Setting slice_idle to 0 will remove all the idling on queues/service tree
+level and one should see an overall improved throughput on faster storage
+devices like multiple SATA/SAS disks in hardware RAID configuration. The down
+side is that isolation provided from WRITES also goes down and notion of
+IO priority becomes weaker.
+
+So depending on storage and workload, it might be useful to set slice_idle=0.
+In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
+keeping slice_idle enabled should be useful. For any configurations where
+there are multiple spindles behind single LUN (Host based hardware RAID
+controller or for storage arrays), setting slice_idle=0 might end up in better
+throughput and acceptable latencies.
+
+CFQ IOPS Mode for group scheduling
+==================================
+Basic CFQ design is to provide priority based time slices. Higher priority
+process gets bigger time slice and lower priority process gets smaller time
+slice. Measuring time becomes harder if storage is fast and supports NCQ and
+it would be better to dispatch multiple requests from multiple cfq queues in
+request queue at a time. In such scenario, it is not possible to measure time
+consumed by single queue accurately.
+
+What is possible though is to measure number of requests dispatched from a
+single queue and also allow dispatch from multiple cfq queue at the same time.
+This effectively becomes the fairness in terms of IOPS (IO operations per
+second).
+
+If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
+to IOPS mode and starts providing fairness in terms of number of requests
+dispatched. Note that this mode switching takes effect only for group
+scheduling. For non-cgroup users nothing should change.
+28
Documentation/cgroups/blkio-controller.txt
···
 CFQ sysfs tunable
 =================
 /sys/block/<disk>/queue/iosched/group_isolation
+-----------------------------------------------
 
 If group_isolation=1, it provides stronger isolation between groups at the
 expense of throughput. By default group_isolation is 0. In general that
···
 By default one should run with group_isolation=0. If that is not sufficient
 and one wants stronger isolation between groups, then set group_isolation=1
 but this will come at cost of reduced throughput.
+
+/sys/block/<disk>/queue/iosched/slice_idle
+------------------------------------------
+On faster hardware CFQ can be slow, especially with sequential workload.
+This happens because CFQ idles on a single queue and single queue might not
+drive deeper request queue depths to keep the storage busy. In such scenarios
+one can try setting slice_idle=0 and that would switch CFQ to IOPS
+(IO operations per second) mode on NCQ supporting hardware.
+
+That means CFQ will not idle between cfq queues of a cfq group and hence be
+able to drive higher queue depth and achieve better throughput. That also
+means that cfq provides fairness among groups in terms of IOPS and not in
+terms of disk time.
+
+/sys/block/<disk>/queue/iosched/group_idle
+------------------------------------------
+If one disables idling on individual cfq queues and cfq service trees by
+setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
+on the group in an attempt to provide fairness among groups.
+
+By default group_idle is same as slice_idle and does not do anything if
+slice_idle is enabled.
+
+One can experience an overall throughput drop if you have created multiple
+groups and put applications in that group which are not driving enough
+IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
+on individual groups and throughput should improve.
 
 What works
 ==========
+1 -1
block/blk-cgroup.c
···
 
 	/* Currently we do not support hierarchy deeper than two level (0,1) */
 	if (parent != cgroup->top_cgroup)
-		return ERR_PTR(-EINVAL);
+		return ERR_PTR(-EPERM);
 
 	blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
 	if (!blkcg)
+3 -3
block/blk-core.c
···
 	int el_ret;
 	unsigned int bytes = bio->bi_size;
 	const unsigned short prio = bio_prio(bio);
-	const bool sync = (bio->bi_rw & REQ_SYNC);
-	const bool unplug = (bio->bi_rw & REQ_UNPLUG);
-	const unsigned int ff = bio->bi_rw & REQ_FAILFAST_MASK;
+	const bool sync = !!(bio->bi_rw & REQ_SYNC);
+	const bool unplug = !!(bio->bi_rw & REQ_UNPLUG);
+	const unsigned long ff = bio->bi_rw & REQ_FAILFAST_MASK;
 	int rw_flags;
 
 	if ((bio->bi_rw & REQ_HARDBARRIER) &&
+1
block/blk-sysfs.c
···
 	kobject_uevent(&q->kobj, KOBJ_REMOVE);
 	kobject_del(&q->kobj);
 	blk_trace_remove_sysfs(disk_to_dev(disk));
+	kobject_put(&dev->kobj);
 	return ret;
 }
 
+6 -2
block/blk.h
···
 
 static inline int blk_cpu_to_group(int cpu)
 {
+	int group = NR_CPUS;
 #ifdef CONFIG_SCHED_MC
 	const struct cpumask *mask = cpu_coregroup_mask(cpu);
-	return cpumask_first(mask);
+	group = cpumask_first(mask);
 #elif defined(CONFIG_SCHED_SMT)
-	return cpumask_first(topology_thread_cpumask(cpu));
+	group = cpumask_first(topology_thread_cpumask(cpu));
 #else
 	return cpu;
 #endif
+	if (likely(group < NR_CPUS))
+		return group;
+	return cpu;
 }
 
 /*
+88 -15
block/cfq-iosched.c
···
 static int cfq_slice_async = HZ / 25;
 static const int cfq_slice_async_rq = 2;
 static int cfq_slice_idle = HZ / 125;
+static int cfq_group_idle = HZ / 125;
 static const int cfq_target_latency = HZ * 3/10; /* 300 ms */
 static const int cfq_hist_divisor = 4;
···
 	struct cfq_queue *new_cfqq;
 	struct cfq_group *cfqg;
 	struct cfq_group *orig_cfqg;
+	/* Number of sectors dispatched from queue in single dispatch round */
+	unsigned long nr_sectors;
 };
 
 /*
···
 	struct hlist_node cfqd_node;
 	atomic_t ref;
 #endif
+	/* number of requests that are on the dispatch list or inside driver */
+	int dispatched;
 };
 
 /*
···
 	unsigned int cfq_slice[2];
 	unsigned int cfq_slice_async_rq;
 	unsigned int cfq_slice_idle;
+	unsigned int cfq_group_idle;
 	unsigned int cfq_latency;
 	unsigned int cfq_group_isolation;
···
 		j++, st = i < IDLE_WORKLOAD ? \
 			&cfqg->service_trees[i][j]: NULL) \
 
+
+static inline bool iops_mode(struct cfq_data *cfqd)
+{
+	/*
+	 * If we are not idling on queues and it is a NCQ drive, parallel
+	 * execution of requests is on and measuring time is not possible
+	 * in most of the cases until and unless we drive shallower queue
+	 * depths and that becomes a performance bottleneck. In such cases
+	 * switch to start providing fairness in terms of number of IOs.
+	 */
+	if (!cfqd->cfq_slice_idle && cfqd->hw_tag)
+		return true;
+	else
+		return false;
+}
 
 static inline enum wl_prio_t cfqq_prio(struct cfq_queue *cfqq)
 {
···
 		slice_used = cfqq->allocated_slice;
 	}
 
-	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
 	return slice_used;
 }
···
 				struct cfq_queue *cfqq)
 {
 	struct cfq_rb_root *st = &cfqd->grp_service_tree;
-	unsigned int used_sl, charge_sl;
+	unsigned int used_sl, charge;
 	int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg)
 			- cfqg->service_tree_idle.count;
 
 	BUG_ON(nr_sync < 0);
-	used_sl = charge_sl = cfq_cfqq_slice_usage(cfqq);
+	used_sl = charge = cfq_cfqq_slice_usage(cfqq);
 
-	if (!cfq_cfqq_sync(cfqq) && !nr_sync)
-		charge_sl = cfqq->allocated_slice;
+	if (iops_mode(cfqd))
+		charge = cfqq->slice_dispatch;
+	else if (!cfq_cfqq_sync(cfqq) && !nr_sync)
+		charge = cfqq->allocated_slice;
 
 	/* Can't update vdisktime while group is on service tree */
 	cfq_rb_erase(&cfqg->rb_node, st);
-	cfqg->vdisktime += cfq_scale_slice(charge_sl, cfqg);
+	cfqg->vdisktime += cfq_scale_slice(charge, cfqg);
 	__cfq_group_service_tree_add(st, cfqg);
 
 	/* This group is being expired. Save the context */
···
 
 	cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
 					st->min_vdisktime);
+	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u"
+			" sect=%u", used_sl, cfqq->slice_dispatch, charge,
+			iops_mode(cfqd), cfqq->nr_sectors);
 	cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
 	cfq_blkiocg_set_start_empty_time(&cfqg->blkg);
 }
···
 	cfqq->allocated_slice = 0;
 	cfqq->slice_end = 0;
 	cfqq->slice_dispatch = 0;
+	cfqq->nr_sectors = 0;
 
 	cfq_clear_cfqq_wait_request(cfqq);
 	cfq_clear_cfqq_must_dispatch(cfqq);
···
 	BUG_ON(!service_tree);
 	BUG_ON(!service_tree->count);
 
+	if (!cfqd->cfq_slice_idle)
+		return false;
+
 	/* We never do for idle class queues. */
 	if (prio == IDLE_WORKLOAD)
 		return false;
···
 {
 	struct cfq_queue *cfqq = cfqd->active_queue;
 	struct cfq_io_context *cic;
-	unsigned long sl;
+	unsigned long sl, group_idle = 0;
 
 	/*
 	 * SSD device without seek penalty, disable idling. But only do so
···
 	/*
 	 * idle is disabled, either manually or by past process history
 	 */
-	if (!cfqd->cfq_slice_idle || !cfq_should_idle(cfqd, cfqq))
-		return;
+	if (!cfq_should_idle(cfqd, cfqq)) {
+		/* no queue idling. Check for group idling */
+		if (cfqd->cfq_group_idle)
+			group_idle = cfqd->cfq_group_idle;
+		else
+			return;
+	}
 
 	/*
 	 * still active requests from this queue, don't idle
···
 		return;
 	}
 
+	/* There are other queues in the group, don't do group idle */
+	if (group_idle && cfqq->cfqg->nr_cfqq > 1)
+		return;
+
 	cfq_mark_cfqq_wait_request(cfqq);
 
-	sl = cfqd->cfq_slice_idle;
+	if (group_idle)
+		sl = cfqd->cfq_group_idle;
+	else
+		sl = cfqd->cfq_slice_idle;
 
 	mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
 	cfq_blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg);
-	cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl);
+	cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu group_idle: %d", sl,
+			group_idle ? 1 : 0);
 }
 
 /*
···
 	cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq);
 	cfq_remove_request(rq);
 	cfqq->dispatched++;
+	(RQ_CFQG(rq))->dispatched++;
 	elv_dispatch_sort(q, rq);
 
 	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
+	cfqq->nr_sectors += blk_rq_sectors(rq);
 	cfq_blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
 					rq_data_dir(rq), rq_is_sync(rq));
 }
···
 			cfqq = NULL;
 			goto keep_queue;
 		} else
-			goto expire;
+			goto check_group_idle;
 	}
 
 	/*
···
 	 * flight or is idling for a new request, allow either of these
 	 * conditions to happen (or time out) before selecting a new queue.
 	 */
-	if (timer_pending(&cfqd->idle_slice_timer) ||
-	    (cfqq->dispatched && cfq_should_idle(cfqd, cfqq))) {
+	if (timer_pending(&cfqd->idle_slice_timer)) {
+		cfqq = NULL;
+		goto keep_queue;
+	}
+
+	if (cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
+		cfqq = NULL;
+		goto keep_queue;
+	}
+
+	/*
+	 * If group idle is enabled and there are requests dispatched from
+	 * this group, wait for requests to complete.
+	 */
+check_group_idle:
+	if (cfqd->cfq_group_idle && cfqq->cfqg->nr_cfqq == 1
+	    && cfqq->cfqg->dispatched) {
 		cfqq = NULL;
 		goto keep_queue;
 	}
···
 	WARN_ON(!cfqq->dispatched);
 	cfqd->rq_in_driver--;
 	cfqq->dispatched--;
+	(RQ_CFQG(rq))->dispatched--;
 	cfq_blkiocg_update_completion_stats(&cfqq->cfqg->blkg,
 			rq_start_time_ns(rq), rq_io_start_time_ns(rq),
 			rq_data_dir(rq), rq_is_sync(rq));
···
 	 * the queue.
 	 */
 	if (cfq_should_wait_busy(cfqd, cfqq)) {
-		cfqq->slice_end = jiffies + cfqd->cfq_slice_idle;
+		unsigned long extend_sl = cfqd->cfq_slice_idle;
+		if (!cfqd->cfq_slice_idle)
+			extend_sl = cfqd->cfq_group_idle;
+		cfqq->slice_end = jiffies + extend_sl;
 		cfq_mark_cfqq_wait_busy(cfqq);
 		cfq_log_cfqq(cfqd, cfqq, "will busy wait");
 	}
···
 	cfqd->cfq_slice[1] = cfq_slice_sync;
 	cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
 	cfqd->cfq_slice_idle = cfq_slice_idle;
+	cfqd->cfq_group_idle = cfq_group_idle;
 	cfqd->cfq_latency = 1;
 	cfqd->cfq_group_isolation = 0;
 	cfqd->hw_tag = -1;
···
 SHOW_FUNCTION(cfq_back_seek_max_show, cfqd->cfq_back_max, 0);
 SHOW_FUNCTION(cfq_back_seek_penalty_show, cfqd->cfq_back_penalty, 0);
 SHOW_FUNCTION(cfq_slice_idle_show, cfqd->cfq_slice_idle, 1);
+SHOW_FUNCTION(cfq_group_idle_show, cfqd->cfq_group_idle, 1);
 SHOW_FUNCTION(cfq_slice_sync_show, cfqd->cfq_slice[1], 1);
 SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1);
 SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0);
···
 STORE_FUNCTION(cfq_back_seek_penalty_store, &cfqd->cfq_back_penalty, 1,
 		UINT_MAX, 0);
 STORE_FUNCTION(cfq_slice_idle_store, &cfqd->cfq_slice_idle, 0, UINT_MAX, 1);
+STORE_FUNCTION(cfq_group_idle_store, &cfqd->cfq_group_idle, 0, UINT_MAX, 1);
 STORE_FUNCTION(cfq_slice_sync_store, &cfqd->cfq_slice[1], 1, UINT_MAX, 1);
 STORE_FUNCTION(cfq_slice_async_store, &cfqd->cfq_slice[0], 1, UINT_MAX, 1);
 STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1,
···
 	CFQ_ATTR(slice_async),
 	CFQ_ATTR(slice_async_rq),
 	CFQ_ATTR(slice_idle),
+	CFQ_ATTR(group_idle),
 	CFQ_ATTR(low_latency),
 	CFQ_ATTR(group_isolation),
 	__ATTR_NULL
···
 	if (!cfq_slice_idle)
 		cfq_slice_idle = 1;
 
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
+	if (!cfq_group_idle)
+		cfq_group_idle = 1;
+#else
+	cfq_group_idle = 0;
+#endif
 	if (cfq_slab_setup())
 		return -ENOMEM;
 
+31 -13
block/elevator.c
···
 {
 	struct elevator_queue *old_elevator, *e;
 	void *data;
+	int err;
 
 	/*
 	 * Allocate new elevator
 	 */
 	e = elevator_alloc(q, new_e);
 	if (!e)
-		return 0;
+		return -ENOMEM;
 
 	data = elevator_init_queue(q, e);
 	if (!data) {
 		kobject_put(&e->kobj);
-		return 0;
+		return -ENOMEM;
 	}
 
 	/*
···
 
 	__elv_unregister_queue(old_elevator);
 
-	if (elv_register_queue(q))
+	err = elv_register_queue(q);
+	if (err)
 		goto fail_register;
 
 	/*
···
 
 	blk_add_trace_msg(q, "elv switch: %s", e->elevator_type->elevator_name);
 
-	return 1;
+	return 0;
 
 fail_register:
 	/*
···
 	queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
 	spin_unlock_irq(q->queue_lock);
 
-	return 0;
+	return err;
 }
 
-ssize_t elv_iosched_store(struct request_queue *q, const char *name,
-			  size_t count)
+/*
+ * Switch this queue to the given IO scheduler.
+ */
+int elevator_change(struct request_queue *q, const char *name)
 {
 	char elevator_name[ELV_NAME_MAX];
 	struct elevator_type *e;
 
 	if (!q->elevator)
-		return count;
+		return -ENXIO;
 
 	strlcpy(elevator_name, name, sizeof(elevator_name));
 	e = elevator_get(strstrip(elevator_name));
···
 
 	if (!strcmp(elevator_name, q->elevator->elevator_type->elevator_name)) {
 		elevator_put(e);
-		return count;
+		return 0;
 	}
 
-	if (!elevator_switch(q, e))
-		printk(KERN_ERR "elevator: switch to %s failed\n",
-							elevator_name);
-	return count;
+	return elevator_switch(q, e);
+}
+EXPORT_SYMBOL(elevator_change);
+
+ssize_t elv_iosched_store(struct request_queue *q, const char *name,
+			  size_t count)
+{
+	int ret;
+
+	if (!q->elevator)
+		return count;
+
+	ret = elevator_change(q, name);
+	if (!ret)
+		return count;
+
+	printk(KERN_ERR "elevator: switch to %s failed\n", name);
+	return ret;
 }
 
 ssize_t elv_iosched_show(struct request_queue *q, char *name)
+11
drivers/block/cciss.c
···
 	spin_lock_irqsave(&h->lock, flags);
 	addQ(&h->reqQ, c);
 	h->Qdepth++;
+	if (h->Qdepth > h->maxQsinceinit)
+		h->maxQsinceinit = h->Qdepth;
 	start_io(h);
 	spin_unlock_irqrestore(&h->lock, flags);
 }
···
 	misc_fw_support = readl(&cfgtable->misc_fw_support);
 	use_doorbell = misc_fw_support & MISC_FW_DOORBELL_RESET;
 
+	/* The doorbell reset seems to cause lockups on some Smart
+	 * Arrays (e.g. P410, P410i, maybe others). Until this is
+	 * fixed or at least isolated, avoid the doorbell reset.
+	 */
+	use_doorbell = 0;
+
 	rc = cciss_controller_hard_reset(pdev, vaddr, use_doorbell);
 	if (rc)
 		goto unmap_cfgtable;
···
 	h->scatter_list = kmalloc(h->max_commands *
 						sizeof(struct scatterlist *),
 						GFP_KERNEL);
+	if (!h->scatter_list)
+		goto clean4;
+
 	for (k = 0; k < h->nr_cmds; k++) {
 		h->scatter_list[k] = kmalloc(sizeof(struct scatterlist) *
 							h->maxsgentries,
+1 -1
drivers/block/loop.c
···
 	pos = ((loff_t) bio->bi_sector << 9) + lo->lo_offset;
 
 	if (bio_rw(bio) == WRITE) {
-		bool barrier = (bio->bi_rw & REQ_HARDBARRIER);
+		bool barrier = !!(bio->bi_rw & REQ_HARDBARRIER);
 		struct file *file = lo->lo_backing_file;
 
 		if (barrier) {
+1 -2
drivers/block/mg_disk.c
···
 	host->breq->queuedata = host;
 
 	/* mflash is random device, thanx for the noop */
-	elevator_exit(host->breq->elevator);
-	err = elevator_init(host->breq, "noop");
+	err = elevator_change(host->breq, "noop");
 	if (err) {
 		printk(KERN_ERR "%s:%d (elevator_init) fail\n",
 				__func__, __LINE__);
+1 -2
drivers/s390/char/tape_block.c
···
 	if (!blkdat->request_queue)
 		return -ENOMEM;
 
-	elevator_exit(blkdat->request_queue->elevator);
-	rc = elevator_init(blkdat->request_queue, "noop");
+	rc = elevator_change(blkdat->request_queue, "noop");
 	if (rc)
 		goto cleanup_queue;
 
+2 -2
fs/bio-integrity.c
···
 
 	/* Allocate kernel buffer for protection data */
 	len = sectors * blk_integrity_tuple_size(bi);
-	buf = kmalloc(len, GFP_NOIO | __GFP_NOFAIL | q->bounce_gfp);
+	buf = kmalloc(len, GFP_NOIO | q->bounce_gfp);
 	if (unlikely(buf == NULL)) {
 		printk(KERN_ERR "could not allocate integrity buffer\n");
-		return -EIO;
+		return -ENOMEM;
 	}
 
 	end = (((unsigned long) buf) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+1 -1
fs/fs-writeback.c
···
 		wb->last_active = jiffies;
 
 		set_current_state(TASK_INTERRUPTIBLE);
-		if (!list_empty(&bdi->work_list)) {
+		if (!list_empty(&bdi->work_list) || kthread_should_stop()) {
 			__set_current_state(TASK_RUNNING);
 			continue;
 		}
+1
include/linux/elevator.h
···
 
 extern int elevator_init(struct request_queue *, char *);
 extern void elevator_exit(struct elevator_queue *);
+extern int elevator_change(struct request_queue *, const char *);
 extern int elv_rq_merge_ok(struct request *, struct bio *);
 
 /*
+12 -2
lib/scatterlist.c
···
 		left -= sg_size;
 
 		sg = alloc_fn(alloc_size, gfp_mask);
-		if (unlikely(!sg))
-			return -ENOMEM;
+		if (unlikely(!sg)) {
+			/*
+			 * Adjust entry count to reflect that the last
+			 * entry of the previous table won't be used for
+			 * linkage. Without this, sg_kfree() may get
+			 * confused.
+			 */
+			if (prv)
+				table->nents = ++table->orig_nents;
+
+			return -ENOMEM;
+		}
 
 		sg_init_table(sg, alloc_size);
 		table->nents = table->orig_nents += sg_size;
+5 -2
mm/backing-dev.c
···
 	switch (action) {
 	case FORK_THREAD:
 		__set_current_state(TASK_RUNNING);
-		task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
-				   dev_name(bdi->dev));
+		task = kthread_create(bdi_writeback_thread, &bdi->wb,
+				      "flush-%s", dev_name(bdi->dev));
 		if (IS_ERR(task)) {
 			/*
 			 * If thread creation fails, force writeout of
···
 			/*
 			 * The spinlock makes sure we do not lose
 			 * wake-ups when racing with 'bdi_queue_work()'.
			 * And as soon as the bdi thread is visible, we
			 * can start it.
 			 */
 			spin_lock_bh(&bdi->wb_lock);
 			bdi->wb.task = task;
 			spin_unlock_bh(&bdi->wb_lock);
+			wake_up_process(task);
 		}
 		break;