Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'for-5.18/block' into for-5.18/write-streams

* for-5.18/block: (96 commits)
block: remove bio_devname
ext4: stop using bio_devname
raid5-ppl: stop using bio_devname
raid1: stop using bio_devname
md-multipath: stop using bio_devname
dm-integrity: stop using bio_devname
dm-crypt: stop using bio_devname
pktcdvd: remove a pointless debug check in pkt_submit_bio
block: remove handle_bad_sector
block: fix and cleanup bio_check_ro
bfq: fix use-after-free in bfq_dispatch_request
blk-crypto: show crypto capabilities in sysfs
block: don't delete queue kobject before its children
block: simplify calling convention of elv_unregister_queue()
block: remove redundant semicolon
block: default BLOCK_LEGACY_AUTOLOAD to y
block: update io_ticks when io hang
block, bfq: don't move oom_bfqq
block, bfq: avoid moving bfqq to it's parent bfqg
block, bfq: cleanup bfq_bfqq_to_bfqg()
...

+2005 -3592
Documentation/ABI/stable/sysfs-block (+49)
 		last zone of the device which may be smaller.
 
 
+What:		/sys/block/<disk>/queue/crypto/
+Date:		February 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		The presence of this subdirectory of /sys/block/<disk>/queue/
+		indicates that the device supports inline encryption. This
+		subdirectory contains files which describe the inline encryption
+		capabilities of the device. For more information about inline
+		encryption, refer to Documentation/block/inline-encryption.rst.
+
+
+What:		/sys/block/<disk>/queue/crypto/max_dun_bits
+Date:		February 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This file shows the maximum length, in bits, of data unit
+		numbers accepted by the device in inline encryption requests.
+
+
+What:		/sys/block/<disk>/queue/crypto/modes/<mode>
+Date:		February 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] For each crypto mode (i.e., encryption/decryption
+		algorithm) the device supports with inline encryption, a file
+		will exist at this location. It will contain a hexadecimal
+		number that is a bitmask of the supported data unit sizes, in
+		bytes, for that crypto mode.
+
+		Currently, the crypto modes that may be supported are:
+
+		* AES-256-XTS
+		* AES-128-CBC-ESSIV
+		* Adiantum
+
+		For example, if a device supports AES-256-XTS inline encryption
+		with data unit sizes of 512 and 4096 bytes, the file
+		/sys/block/<disk>/queue/crypto/modes/AES-256-XTS will exist and
+		will contain "0x1200".
+
+
+What:		/sys/block/<disk>/queue/crypto/num_keyslots
+Date:		February 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This file shows the number of keyslots the device has for
+		use with inline encryption.
+
+
 What:		/sys/block/<disk>/queue/dax
 Date:		June 2016
 Contact:	linux-block@vger.kernel.org
Documentation/block/biodoc.rst (-1164)
··· 1 - ===================================================== 2 - Notes on the Generic Block Layer Rewrite in Linux 2.5 3 - ===================================================== 4 - 5 - .. note:: 6 - 7 - It seems that there are lot of outdated stuff here. This seems 8 - to be written somewhat as a task list. Yet, eventually, something 9 - here might still be useful. 10 - 11 - Notes Written on Jan 15, 2002: 12 - 13 - - Jens Axboe <jens.axboe@oracle.com> 14 - - Suparna Bhattacharya <suparna@in.ibm.com> 15 - 16 - Last Updated May 2, 2002 17 - 18 - September 2003: Updated I/O Scheduler portions 19 - - Nick Piggin <npiggin@kernel.dk> 20 - 21 - Introduction 22 - ============ 23 - 24 - These are some notes describing some aspects of the 2.5 block layer in the 25 - context of the bio rewrite. The idea is to bring out some of the key 26 - changes and a glimpse of the rationale behind those changes. 27 - 28 - Please mail corrections & suggestions to suparna@in.ibm.com. 29 - 30 - Credits 31 - ======= 32 - 33 - 2.5 bio rewrite: 34 - - Jens Axboe <jens.axboe@oracle.com> 35 - 36 - Many aspects of the generic block layer redesign were driven by and evolved 37 - over discussions, prior patches and the collective experience of several 38 - people. See sections 8 and 9 for a list of some related references. 39 - 40 - The following people helped with review comments and inputs for this 41 - document: 42 - 43 - - Christoph Hellwig <hch@infradead.org> 44 - - Arjan van de Ven <arjanv@redhat.com> 45 - - Randy Dunlap <rdunlap@xenotime.net> 46 - - Andre Hedrick <andre@linux-ide.org> 47 - 48 - The following people helped with fixes/contributions to the bio patches 49 - while it was still work-in-progress: 50 - 51 - - David S. Miller <davem@redhat.com> 52 - 53 - 54 - .. Description of Contents: 55 - 56 - 1. 
Scope for tuning of logic to various needs 57 - 1.1 Tuning based on device or low level driver capabilities 58 - - Per-queue parameters 59 - - Highmem I/O support 60 - - I/O scheduler modularization 61 - 1.2 Tuning based on high level requirements/capabilities 62 - 1.2.1 Request Priority/Latency 63 - 1.3 Direct access/bypass to lower layers for diagnostics and special 64 - device operations 65 - 1.3.1 Pre-built commands 66 - 2. New flexible and generic but minimalist i/o structure or descriptor 67 - (instead of using buffer heads at the i/o layer) 68 - 2.1 Requirements/Goals addressed 69 - 2.2 The bio struct in detail (multi-page io unit) 70 - 2.3 Changes in the request structure 71 - 3. Using bios 72 - 3.1 Setup/teardown (allocation, splitting) 73 - 3.2 Generic bio helper routines 74 - 3.2.1 Traversing segments and completion units in a request 75 - 3.2.2 Setting up DMA scatterlists 76 - 3.2.3 I/O completion 77 - 3.2.4 Implications for drivers that do not interpret bios (don't handle 78 - multiple segments) 79 - 3.3 I/O submission 80 - 4. The I/O scheduler 81 - 5. Scalability related changes 82 - 5.1 Granular locking: Removal of io_request_lock 83 - 5.2 Prepare for transition to 64 bit sector_t 84 - 6. Other Changes/Implications 85 - 6.1 Partition re-mapping handled by the generic block layer 86 - 7. A few tips on migration of older drivers 87 - 8. A list of prior/related/impacted patches/ideas 88 - 9. Other References/Discussion Threads 89 - 90 - 91 - Bio Notes 92 - ========= 93 - 94 - Let us discuss the changes in the context of how some overall goals for the 95 - block layer are addressed. 96 - 97 - 1. 
Scope for tuning the generic logic to satisfy various requirements 98 - ===================================================================== 99 - 100 - The block layer design supports adaptable abstractions to handle common 101 - processing with the ability to tune the logic to an appropriate extent 102 - depending on the nature of the device and the requirements of the caller. 103 - One of the objectives of the rewrite was to increase the degree of tunability 104 - and to enable higher level code to utilize underlying device/driver 105 - capabilities to the maximum extent for better i/o performance. This is 106 - important especially in the light of ever improving hardware capabilities 107 - and application/middleware software designed to take advantage of these 108 - capabilities. 109 - 110 - 1.1 Tuning based on low level device / driver capabilities 111 - ---------------------------------------------------------- 112 - 113 - Sophisticated devices with large built-in caches, intelligent i/o scheduling 114 - optimizations, high memory DMA support, etc may find some of the 115 - generic processing an overhead, while for less capable devices the 116 - generic functionality is essential for performance or correctness reasons. 117 - Knowledge of some of the capabilities or parameters of the device should be 118 - used at the generic block layer to take the right decisions on 119 - behalf of the driver. 120 - 121 - How is this achieved ? 122 - 123 - Tuning at a per-queue level: 124 - 125 - i. Per-queue limits/values exported to the generic layer by the driver 126 - 127 - Various parameters that the generic i/o scheduler logic uses are set at 128 - a per-queue level (e.g maximum request size, maximum number of segments in 129 - a scatter-gather list, logical block size) 130 - 131 - Some parameters that were earlier available as global arrays indexed by 132 - major/minor are now directly associated with the queue. 
Some of these may 133 - move into the block device structure in the future. Some characteristics 134 - have been incorporated into a queue flags field rather than separate fields 135 - in themselves. There are blk_queue_xxx functions to set the parameters, 136 - rather than update the fields directly 137 - 138 - Some new queue property settings: 139 - 140 - blk_queue_bounce_limit(q, u64 dma_address) 141 - Enable I/O to highmem pages, dma_address being the 142 - limit. No highmem default. 143 - 144 - blk_queue_max_sectors(q, max_sectors) 145 - Sets two variables that limit the size of the request. 146 - 147 - - The request queue's max_sectors, which is a soft size in 148 - units of 512 byte sectors, and could be dynamically varied 149 - by the core kernel. 150 - 151 - - The request queue's max_hw_sectors, which is a hard limit 152 - and reflects the maximum size request a driver can handle 153 - in units of 512 byte sectors. 154 - 155 - The default for both max_sectors and max_hw_sectors is 156 - 255. The upper limit of max_sectors is 1024. 157 - 158 - blk_queue_max_phys_segments(q, max_segments) 159 - Maximum physical segments you can handle in a request. 128 160 - default (driver limit). (See 3.2.2) 161 - 162 - blk_queue_max_hw_segments(q, max_segments) 163 - Maximum dma segments the hardware can handle in a request. 128 164 - default (host adapter limit, after dma remapping). 165 - (See 3.2.2) 166 - 167 - blk_queue_max_segment_size(q, max_seg_size) 168 - Maximum size of a clustered segment, 64kB default. 169 - 170 - blk_queue_logical_block_size(q, logical_block_size) 171 - Lowest possible sector size that the hardware can operate 172 - on, 512 bytes default. 173 - 174 - New queue flags: 175 - 176 - - QUEUE_FLAG_CLUSTER (see 3.2.2) 177 - - QUEUE_FLAG_QUEUED (see 3.2.4) 178 - 179 - 180 - ii. 
High-mem i/o capabilities are now considered the default 181 - 182 - The generic bounce buffer logic, present in 2.4, where the block layer would 183 - by default copyin/out i/o requests on high-memory buffers to low-memory buffers 184 - assuming that the driver wouldn't be able to handle it directly, has been 185 - changed in 2.5. The bounce logic is now applied only for memory ranges 186 - for which the device cannot handle i/o. A driver can specify this by 187 - setting the queue bounce limit for the request queue for the device 188 - (blk_queue_bounce_limit()). This avoids the inefficiencies of the copyin/out 189 - where a device is capable of handling high memory i/o. 190 - 191 - In order to enable high-memory i/o where the device is capable of supporting 192 - it, the pci dma mapping routines and associated data structures have now been 193 - modified to accomplish a direct page -> bus translation, without requiring 194 - a virtual address mapping (unlike the earlier scheme of virtual address 195 - -> bus translation). So this works uniformly for high-memory pages (which 196 - do not have a corresponding kernel virtual address space mapping) and 197 - low-memory pages. 198 - 199 - Note: Please refer to Documentation/core-api/dma-api-howto.rst for a discussion 200 - on PCI high mem DMA aspects and mapping of scatter gather lists, and support 201 - for 64 bit PCI. 202 - 203 - Special handling is required only for cases where i/o needs to happen on 204 - pages at physical memory addresses beyond what the device can support. In these 205 - cases, a bounce bio representing a buffer from the supported memory range 206 - is used for performing the i/o with copyin/copyout as needed depending on 207 - the type of the operation. 
For example, in case of a read operation, the 208 - data read has to be copied to the original buffer on i/o completion, so a 209 - callback routine is set up to do this, while for write, the data is copied 210 - from the original buffer to the bounce buffer prior to issuing the 211 - operation. Since an original buffer may be in a high memory area that's not 212 - mapped in kernel virtual addr, a kmap operation may be required for 213 - performing the copy, and special care may be needed in the completion path 214 - as it may not be in irq context. Special care is also required (by way of 215 - GFP flags) when allocating bounce buffers, to avoid certain highmem 216 - deadlock possibilities. 217 - 218 - It is also possible that a bounce buffer may be allocated from high-memory 219 - area that's not mapped in kernel virtual addr, but within the range that the 220 - device can use directly; so the bounce page may need to be kmapped during 221 - copy operations. [Note: This does not hold in the current implementation, 222 - though] 223 - 224 - There are some situations when pages from high memory may need to 225 - be kmapped, even if bounce buffers are not necessary. For example a device 226 - may need to abort DMA operations and revert to PIO for the transfer, in 227 - which case a virtual mapping of the page is required. For SCSI it is also 228 - done in some scenarios where the low level driver cannot be trusted to 229 - handle a single sg entry correctly. The driver is expected to perform the 230 - kmaps as needed on such occasions as appropriate. A driver could also use 231 - the blk_queue_bounce() routine on its own to bounce highmem i/o to low 232 - memory for specific requests if so desired. 233 - 234 - iii. 
The i/o scheduler algorithm itself can be replaced/set as appropriate 235 - 236 - As in 2.4, it is possible to plugin a brand new i/o scheduler for a particular 237 - queue or pick from (copy) existing generic schedulers and replace/override 238 - certain portions of it. The 2.5 rewrite provides improved modularization 239 - of the i/o scheduler. There are more pluggable callbacks, e.g for init, 240 - add request, extract request, which makes it possible to abstract specific 241 - i/o scheduling algorithm aspects and details outside of the generic loop. 242 - It also makes it possible to completely hide the implementation details of 243 - the i/o scheduler from block drivers. 244 - 245 - I/O scheduler wrappers are to be used instead of accessing the queue directly. 246 - See section 4. The I/O scheduler for details. 247 - 248 - 1.2 Tuning Based on High level code capabilities 249 - ------------------------------------------------ 250 - 251 - i. Application capabilities for raw i/o 252 - 253 - This comes from some of the high-performance database/middleware 254 - requirements where an application prefers to make its own i/o scheduling 255 - decisions based on an understanding of the access patterns and i/o 256 - characteristics 257 - 258 - ii. High performance filesystems or other higher level kernel code's 259 - capabilities 260 - 261 - Kernel components like filesystems could also take their own i/o scheduling 262 - decisions for optimizing performance. Journalling filesystems may need 263 - some control over i/o ordering. 264 - 265 - What kind of support exists at the generic block layer for this ? 266 - 267 - The flags and rw fields in the bio structure can be used for some tuning 268 - from above e.g indicating that an i/o is just a readahead request, or priority 269 - settings (currently unused). 
As far as user applications are concerned they 270 - would need an additional mechanism either via open flags or ioctls, or some 271 - other upper level mechanism to communicate such settings to block. 272 - 273 - 1.2.1 Request Priority/Latency 274 - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 275 - 276 - Todo/Under discussion:: 277 - 278 - Arjan's proposed request priority scheme allows higher levels some broad 279 - control (high/med/low) over the priority of an i/o request vs other pending 280 - requests in the queue. For example it allows reads for bringing in an 281 - executable page on demand to be given a higher priority over pending write 282 - requests which haven't aged too much on the queue. Potentially this priority 283 - could even be exposed to applications in some manner, providing higher level 284 - tunability. Time based aging avoids starvation of lower priority 285 - requests. Some bits in the bi_opf flags field in the bio structure are 286 - intended to be used for this priority information. 287 - 288 - 289 - 1.3 Direct Access to Low level Device/Driver Capabilities (Bypass mode) 290 - ----------------------------------------------------------------------- 291 - 292 - (e.g Diagnostics, Systems Management) 293 - 294 - There are situations where high-level code needs to have direct access to 295 - the low level device capabilities or requires the ability to issue commands 296 - to the device bypassing some of the intermediate i/o layers. 297 - These could, for example, be special control commands issued through ioctl 298 - interfaces, or could be raw read/write commands that stress the drive's 299 - capabilities for certain kinds of fitness tests. Having direct interfaces at 300 - multiple levels without having to pass through upper layers makes 301 - it possible to perform bottom up validation of the i/o path, layer by 302 - layer, starting from the media. 
303 - 304 - The normal i/o submission interfaces, e.g submit_bio, could be bypassed 305 - for specially crafted requests which such ioctl or diagnostics 306 - interfaces would typically use, and the elevator add_request routine 307 - can instead be used to directly insert such requests in the queue or preferably 308 - the blk_do_rq routine can be used to place the request on the queue and 309 - wait for completion. Alternatively, sometimes the caller might just 310 - invoke a lower level driver specific interface with the request as a 311 - parameter. 312 - 313 - If the request is a means for passing on special information associated with 314 - the command, then such information is associated with the request->special 315 - field (rather than misuse the request->buffer field which is meant for the 316 - request data buffer's virtual mapping). 317 - 318 - For passing request data, the caller must build up a bio descriptor 319 - representing the concerned memory buffer if the underlying driver interprets 320 - bio segments or uses the block layer end*request* functions for i/o 321 - completion. Alternatively one could directly use the request->buffer field to 322 - specify the virtual address of the buffer, if the driver expects buffer 323 - addresses passed in this way and ignores bio entries for the request type 324 - involved. In the latter case, the driver would modify and manage the 325 - request->buffer, request->sector and request->nr_sectors or 326 - request->current_nr_sectors fields itself rather than using the block layer 327 - end_request or end_that_request_first completion interfaces. 
328 - (See 2.3 or Documentation/block/request.rst for a brief explanation of 329 - the request structure fields) 330 - 331 - :: 332 - 333 - [TBD: end_that_request_last should be usable even in this case; 334 - Perhaps an end_that_direct_request_first routine could be implemented to make 335 - handling direct requests easier for such drivers; Also for drivers that 336 - expect bios, a helper function could be provided for setting up a bio 337 - corresponding to a data buffer] 338 - 339 - <JENS: I dont understand the above, why is end_that_request_first() not 340 - usable? Or _last for that matter. I must be missing something> 341 - 342 - <SUP: What I meant here was that if the request doesn't have a bio, then 343 - end_that_request_first doesn't modify nr_sectors or current_nr_sectors, 344 - and hence can't be used for advancing request state settings on the 345 - completion of partial transfers. The driver has to modify these fields 346 - directly by hand. 347 - This is because end_that_request_first only iterates over the bio list, 348 - and always returns 0 if there are none associated with the request. 349 - _last works OK in this case, and is not a problem, as I mentioned earlier 350 - > 351 - 352 - 1.3.1 Pre-built Commands 353 - ^^^^^^^^^^^^^^^^^^^^^^^^ 354 - 355 - A request can be created with a pre-built custom command to be sent directly 356 - to the device. The cmd block in the request structure has room for filling 357 - in the command bytes. (i.e rq->cmd is now 16 bytes in size, and meant for 358 - command pre-building, and the type of the request is now indicated 359 - through rq->flags instead of via rq->cmd) 360 - 361 - The request structure flags can be set up to indicate the type of request 362 - in such cases (REQ_PC: direct packet command passed to driver, REQ_BLOCK_PC: 363 - packet command issued via blk_do_rq, REQ_SPECIAL: special request). 364 - 365 - It can help to pre-build device commands for requests in advance. 
366 - Drivers can now specify a request prepare function (q->prep_rq_fn) that the 367 - block layer would invoke to pre-build device commands for a given request, 368 - or perform other preparatory processing for the request. This is routine is 369 - called by elv_next_request(), i.e. typically just before servicing a request. 370 - (The prepare function would not be called for requests that have RQF_DONTPREP 371 - enabled) 372 - 373 - Aside: 374 - Pre-building could possibly even be done early, i.e before placing the 375 - request on the queue, rather than construct the command on the fly in the 376 - driver while servicing the request queue when it may affect latencies in 377 - interrupt context or responsiveness in general. One way to add early 378 - pre-building would be to do it whenever we fail to merge on a request. 379 - Now REQ_NOMERGE is set in the request flags to skip this one in the future, 380 - which means that it will not change before we feed it to the device. So 381 - the pre-builder hook can be invoked there. 382 - 383 - 384 - 2. Flexible and generic but minimalist i/o structure/descriptor 385 - =============================================================== 386 - 387 - 2.1 Reason for a new structure and requirements addressed 388 - --------------------------------------------------------- 389 - 390 - Prior to 2.5, buffer heads were used as the unit of i/o at the generic block 391 - layer, and the low level request structure was associated with a chain of 392 - buffer heads for a contiguous i/o request. This led to certain inefficiencies 393 - when it came to large i/o requests and readv/writev style operations, as it 394 - forced such requests to be broken up into small chunks before being passed 395 - on to the generic block layer, only to be merged by the i/o scheduler 396 - when the underlying device was capable of handling the i/o in one shot. 
397 - Also, using the buffer head as an i/o structure for i/os that didn't originate 398 - from the buffer cache unnecessarily added to the weight of the descriptors 399 - which were generated for each such chunk. 400 - 401 - The following were some of the goals and expectations considered in the 402 - redesign of the block i/o data structure in 2.5. 403 - 404 - 1. Should be appropriate as a descriptor for both raw and buffered i/o - 405 - avoid cache related fields which are irrelevant in the direct/page i/o path, 406 - or filesystem block size alignment restrictions which may not be relevant 407 - for raw i/o. 408 - 2. Ability to represent high-memory buffers (which do not have a virtual 409 - address mapping in kernel address space). 410 - 3. Ability to represent large i/os w/o unnecessarily breaking them up (i.e 411 - greater than PAGE_SIZE chunks in one shot) 412 - 4. At the same time, ability to retain independent identity of i/os from 413 - different sources or i/o units requiring individual completion (e.g. for 414 - latency reasons) 415 - 5. Ability to represent an i/o involving multiple physical memory segments 416 - (including non-page aligned page fragments, as specified via readv/writev) 417 - without unnecessarily breaking it up, if the underlying device is capable of 418 - handling it. 419 - 6. Preferably should be based on a memory descriptor structure that can be 420 - passed around different types of subsystems or layers, maybe even 421 - networking, without duplication or extra copies of data/descriptor fields 422 - themselves in the process 423 - 7. Ability to handle the possibility of splits/merges as the structure passes 424 - through layered drivers (lvm, md, evms), with minimal overhead. 425 - 426 - The solution was to define a new structure (bio) for the block layer, 427 - instead of using the buffer head structure (bh) directly, the idea being 428 - avoidance of some associated baggage and limitations. 
The bio structure 429 - is uniformly used for all i/o at the block layer ; it forms a part of the 430 - bh structure for buffered i/o, and in the case of raw/direct i/o kiobufs are 431 - mapped to bio structures. 432 - 433 - 2.2 The bio struct 434 - ------------------ 435 - 436 - The bio structure uses a vector representation pointing to an array of tuples 437 - of <page, offset, len> to describe the i/o buffer, and has various other 438 - fields describing i/o parameters and state that needs to be maintained for 439 - performing the i/o. 440 - 441 - Notice that this representation means that a bio has no virtual address 442 - mapping at all (unlike buffer heads). 443 - 444 - :: 445 - 446 - struct bio_vec { 447 - struct page *bv_page; 448 - unsigned short bv_len; 449 - unsigned short bv_offset; 450 - }; 451 - 452 - /* 453 - * main unit of I/O for the block layer and lower layers (ie drivers) 454 - */ 455 - struct bio { 456 - struct bio *bi_next; /* request queue link */ 457 - struct block_device *bi_bdev; /* target device */ 458 - unsigned long bi_flags; /* status, command, etc */ 459 - unsigned long bi_opf; /* low bits: r/w, high: priority */ 460 - 461 - unsigned int bi_vcnt; /* how may bio_vec's */ 462 - struct bvec_iter bi_iter; /* current index into bio_vec array */ 463 - 464 - unsigned int bi_size; /* total size in bytes */ 465 - unsigned short bi_hw_segments; /* segments after DMA remapping */ 466 - unsigned int bi_max; /* max bio_vecs we can hold 467 - used as index into pool */ 468 - struct bio_vec *bi_io_vec; /* the actual vec list */ 469 - bio_end_io_t *bi_end_io; /* bi_end_io (bio) */ 470 - atomic_t bi_cnt; /* pin count: free when it hits zero */ 471 - void *bi_private; 472 - }; 473 - 474 - With this multipage bio design: 475 - 476 - - Large i/os can be sent down in one go using a bio_vec list consisting 477 - of an array of <page, offset, len> fragments (similar to the way fragments 478 - are represented in the zero-copy network code) 479 - - Splitting 
of an i/o request across multiple devices (as in the case of 480 - lvm or raid) is achieved by cloning the bio (where the clone points to 481 - the same bi_io_vec array, but with the index and size accordingly modified) 482 - - A linked list of bios is used as before for unrelated merges [#]_ - this 483 - avoids reallocs and makes independent completions easier to handle. 484 - - Code that traverses the req list can find all the segments of a bio 485 - by using rq_for_each_segment. This handles the fact that a request 486 - has multiple bios, each of which can have multiple segments. 487 - - Drivers which can't process a large bio in one shot can use the bi_iter 488 - field to keep track of the next bio_vec entry to process. 489 - (e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE) 490 - [TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying 491 - bi_offset an len fields] 492 - 493 - .. [#] 494 - 495 - unrelated merges -- a request ends up containing two or more bios that 496 - didn't originate from the same place. 497 - 498 - bi_end_io() i/o callback gets called on i/o completion of the entire bio. 499 - 500 - At a lower level, drivers build a scatter gather list from the merged bios. 501 - The scatter gather list is in the form of an array of <page, offset, len> 502 - entries with their corresponding dma address mappings filled in at the 503 - appropriate time. As an optimization, contiguous physical pages can be 504 - covered by a single entry where <page> refers to the first page and <len> 505 - covers the range of pages (up to 16 contiguous pages could be covered this 506 - way). There is a helper routine (blk_rq_map_sg) which drivers can use to build 507 - the sg list. 508 - 509 - Note: Right now the only user of bios with more than one page is ll_rw_kio, 510 - which in turn means that only raw I/O uses it (direct i/o may not work 511 - right now). 
The intent however is to enable clustering of pages etc to 512 - become possible. The pagebuf abstraction layer from SGI also uses multi-page 513 - bios, but that is currently not included in the stock development kernels. 514 - The same is true of Andrew Morton's work-in-progress multipage bio writeout 515 - and readahead patches. 516 - 517 - 2.3 Changes in the Request Structure 518 - ------------------------------------ 519 - 520 - The request structure is the structure that gets passed down to low level 521 - drivers. The block layer make_request function builds up a request structure, 522 - places it on the queue and invokes the drivers request_fn. The driver makes 523 - use of block layer helper routine elv_next_request to pull the next request 524 - off the queue. Control or diagnostic functions might bypass block and directly 525 - invoke underlying driver entry points passing in a specially constructed 526 - request structure. 527 - 528 - Only some relevant fields (mainly those which changed or may be referred 529 - to in some of the discussion here) are listed below, not necessarily in 530 - the order in which they occur in the structure (see include/linux/blkdev.h) 531 - Refer to Documentation/block/request.rst for details about all the request 532 - structure fields and a quick reference about the layers which are 533 - supposed to use or modify those fields:: 534 - 535 - struct request { 536 - struct list_head queuelist; /* Not meant to be directly accessed by 537 - the driver. 538 - Used by q->elv_next_request_fn 539 - rq->queue is gone 540 - */ 541 - . 542 - . 543 - unsigned char cmd[16]; /* prebuilt command data block */ 544 - unsigned long flags; /* also includes earlier rq->cmd settings */ 545 - . 546 - . 547 - sector_t sector; /* this field is now of type sector_t instead of int 548 - preparation for 64 bit sectors */ 549 - . 550 - . 551 - 552 - /* Number of scatter-gather DMA addr+len pairs after 553 - * physical address coalescing is performed. 
554 - */ 555 - unsigned short nr_phys_segments; 556 - 557 - /* Number of scatter-gather addr+len pairs after 558 - * physical and DMA remapping hardware coalescing is performed. 559 - * This is the number of scatter-gather entries the driver 560 - * will actually have to deal with after DMA mapping is done. 561 - */ 562 - unsigned short nr_hw_segments; 563 - 564 - /* Various sector counts */ 565 - unsigned long nr_sectors; /* no. of sectors left: driver modifiable */ 566 - unsigned long hard_nr_sectors; /* block internal copy of above */ 567 - unsigned int current_nr_sectors; /* no. of sectors left in the 568 - current segment:driver modifiable */ 569 - unsigned long hard_cur_sectors; /* block internal copy of the above */ 570 - . 571 - . 572 - int tag; /* command tag associated with request */ 573 - void *special; /* same as before */ 574 - char *buffer; /* valid only for low memory buffers up to 575 - current_nr_sectors */ 576 - . 577 - . 578 - struct bio *bio, *biotail; /* bio list instead of bh */ 579 - struct request_list *rl; 580 - } 581 - 582 - See the req_ops and req_flag_bits definitions for an explanation of the various 583 - flags available. Some bits are used by the block layer or i/o scheduler. 584 - 585 - The behaviour of the various sector counts are almost the same as before, 586 - except that since we have multi-segment bios, current_nr_sectors refers 587 - to the numbers of sectors in the current segment being processed which could 588 - be one of the many segments in the current bio (i.e i/o completion unit). 589 - The nr_sectors value refers to the total number of sectors in the whole 590 - request that remain to be transferred (no change). The purpose of the 591 - hard_xxx values is for block to remember these counts every time it hands 592 - over the request to the driver. These values are updated by block on 593 - end_that_request_first, i.e. 
every time the driver completes a part of the
594 - transfer and invokes block end*request helpers to mark this. The
595 - driver should not modify these values. The block layer sets up the
596 - nr_sectors and current_nr_sectors fields (based on the corresponding
597 - hard_xxx values and the number of bytes transferred) and updates it on
598 - every transfer that invokes end_that_request_first. It does the same for the
599 - buffer, bio, bio->bi_iter fields too.
600 -
601 - The buffer field is just a virtual address mapping of the current segment
602 - of the i/o buffer in cases where the buffer resides in low-memory. For high
603 - memory i/o, this field is not valid and must not be used by drivers.
604 -
605 - Code that sets up its own request structures and passes them down to
606 - a driver needs to be careful about interoperation with the block layer helper
607 - functions which the driver uses. (Section 1.3)
608 -
609 - 3. Using bios
610 - =============
611 -
612 - 3.1 Setup/Teardown
613 - ------------------
614 -
615 - There are routines for managing the allocation, and reference counting, and
616 - freeing of bios (bio_alloc, bio_get, bio_put).
617 -
618 - This makes use of Ingo Molnar's mempool implementation, which enables
619 - subsystems like bio to maintain their own reserve memory pools for guaranteed
620 - deadlock-free allocations during extreme VM load. For example, the VM
621 - subsystem makes use of the block layer to writeout dirty pages in order to be
622 - able to free up memory space, a case which needs careful handling. The
623 - allocation logic draws from the preallocated emergency reserve in situations
624 - where it cannot allocate through normal means. If the pool is empty and it
625 - can wait, then it would trigger action that would help free up memory or
626 - replenish the pool (without deadlocking) and wait for availability in the pool.
627 - If it is in IRQ context, and hence not in a position to do this, allocation
628 - could fail if the pool is empty. In general mempool always first tries to
629 - perform allocation without having to wait, even if it means digging into the
630 - pool as long it is not less that 50% full.
631 -
632 - On a free, memory is released to the pool or directly freed depending on
633 - the current availability in the pool. The mempool interface lets the
634 - subsystem specify the routines to be used for normal alloc and free. In the
635 - case of bio, these routines make use of the standard slab allocator.
636 -
637 - The caller of bio_alloc is expected to taken certain steps to avoid
638 - deadlocks, e.g. avoid trying to allocate more memory from the pool while
639 - already holding memory obtained from the pool.
640 -
641 - ::
642 -
643 - [TBD: This is a potential issue, though a rare possibility
644 - in the bounce bio allocation that happens in the current code, since
645 - it ends up allocating a second bio from the same pool while
646 - holding the original bio ]
647 -
648 - Memory allocated from the pool should be released back within a limited
649 - amount of time (in the case of bio, that would be after the i/o is completed).
650 - This ensures that if part of the pool has been used up, some work (in this
651 - case i/o) must already be in progress and memory would be available when it
652 - is over. If allocating from multiple pools in the same code path, the order
653 - or hierarchy of allocation needs to be consistent, just the way one deals
654 - with multiple locks.
655 -
656 - The bio_alloc routine also needs to allocate the bio_vec_list (bvec_alloc())
657 - for a non-clone bio. There are the 6 pools setup for different size biovecs,
658 - so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
659 - given size from these slabs.
660 -
661 - The bio_get() routine may be used to hold an extra reference on a bio prior
662 - to i/o submission, if the bio fields are likely to be accessed after the
663 - i/o is issued (since the bio may otherwise get freed in case i/o completion
664 - happens in the meantime).
665 -
666 - The bio_clone_fast() routine may be used to duplicate a bio, where the clone
667 - shares the bio_vec_list with the original bio (i.e. both point to the
668 - same bio_vec_list). This would typically be used for splitting i/o requests
669 - in lvm or md.
670 -
671 - 3.2 Generic bio helper Routines
672 - -------------------------------
673 -
674 - 3.2.1 Traversing segments and completion units in a request
675 - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
676 -
677 - The macro rq_for_each_segment() should be used for traversing the bios
678 - in the request list (drivers should avoid directly trying to do it
679 - themselves). Using these helpers should also make it easier to cope
680 - with block changes in the future.
681 -
682 - ::
683 -
684 - struct req_iterator iter;
685 - rq_for_each_segment(bio_vec, rq, iter)
686 - /* bio_vec is now current segment */
687 -
688 - I/O completion callbacks are per-bio rather than per-segment, so drivers
689 - that traverse bio chains on completion need to keep that in mind. Drivers
690 - which don't make a distinction between segments and completion units would
691 - need to be reorganized to support multi-segment bios.
692 -
693 - 3.2.2 Setting up DMA scatterlists
694 - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
695 -
696 - The blk_rq_map_sg() helper routine would be used for setting up scatter
697 - gather lists from a request, so a driver need not do it on its own.
698 -
699 - nr_segments = blk_rq_map_sg(q, rq, scatterlist);
700 -
701 - The helper routine provides a level of abstraction which makes it easier
702 - to modify the internals of request to scatterlist conversion down the line
703 - without breaking drivers.
The blk_rq_map_sg routine takes care of several
704 - things like collapsing physically contiguous segments (if QUEUE_FLAG_CLUSTER
705 - is set) and correct segment accounting to avoid exceeding the limits which
706 - the i/o hardware can handle, based on various queue properties.
707 -
708 - - Prevents a clustered segment from crossing a 4GB mem boundary
709 - - Avoids building segments that would exceed the number of physical
710 - memory segments that the driver can handle (phys_segments) and the
711 - number that the underlying hardware can handle at once, accounting for
712 - DMA remapping (hw_segments) (i.e. IOMMU aware limits).
713 -
714 - Routines which the low level driver can use to set up the segment limits:
715 -
716 - blk_queue_max_hw_segments() : Sets an upper limit of the maximum number of
717 - hw data segments in a request (i.e. the maximum number of address/length
718 - pairs the host adapter can actually hand to the device at once)
719 -
720 - blk_queue_max_phys_segments() : Sets an upper limit on the maximum number
721 - of physical data segments in a request (i.e. the largest sized scatter list
722 - a driver could handle)
723 -
724 - 3.2.3 I/O completion
725 - ^^^^^^^^^^^^^^^^^^^^
726 -
727 - The existing generic block layer helper routines end_request,
728 - end_that_request_first and end_that_request_last can be used for i/o
729 - completion (and setting things up so the rest of the i/o or the next
730 - request can be kicked of) as before. With the introduction of multi-page
731 - bio support, end_that_request_first requires an additional argument indicating
732 - the number of sectors completed.
733 -
734 - 3.2.4 Implications for drivers that do not interpret bios
735 - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
736 -
737 - (don't handle multiple segments)
738 -
739 - Drivers that do not interpret bios e.g those which do not handle multiple
740 - segments and do not support i/o into high memory addresses (require bounce
741 - buffers) and expect only virtually mapped buffers, can access the rq->buffer
742 - field. As before the driver should use current_nr_sectors to determine the
743 - size of remaining data in the current segment (that is the maximum it can
744 - transfer in one go unless it interprets segments), and rely on the block layer
745 - end_request, or end_that_request_first/last to take care of all accounting
746 - and transparent mapping of the next bio segment when a segment boundary
747 - is crossed on completion of a transfer. (The end*request* functions should
748 - be used if only if the request has come down from block/bio path, not for
749 - direct access requests which only specify rq->buffer without a valid rq->bio)
750 -
751 - 3.3 I/O Submission
752 - ------------------
753 -
754 - The routine submit_bio() is used to submit a single io. Higher level i/o
755 - routines make use of this:
756 -
757 - (a) Buffered i/o:
758 -
759 - The routine submit_bh() invokes submit_bio() on a bio corresponding to the
760 - bh, allocating the bio if required. ll_rw_block() uses submit_bh() as before.
761 -
762 - (b) Kiobuf i/o (for raw/direct i/o):
763 -
764 - The ll_rw_kio() routine breaks up the kiobuf into page sized chunks and
765 - maps the array to one or more multi-page bios, issuing submit_bio() to
766 - perform the i/o on each of these.
767 -
768 - The embedded bh array in the kiobuf structure has been removed and no
769 - preallocation of bios is done for kiobufs. [The intent is to remove the
770 - blocks array as well, but it's currently in there to kludge around direct i/o.]
771 - Thus kiobuf allocation has switched back to using kmalloc rather than vmalloc.
772 -
773 - Todo/Observation:
774 -
775 - A single kiobuf structure is assumed to correspond to a contiguous range
776 - of data, so brw_kiovec() invokes ll_rw_kio for each kiobuf in a kiovec.
777 - So right now it wouldn't work for direct i/o on non-contiguous blocks.
778 - This is to be resolved. The eventual direction is to replace kiobuf
779 - by kvec's.
780 -
781 - Badari Pulavarty has a patch to implement direct i/o correctly using
782 - bio and kvec.
783 -
784 -
785 - (c) Page i/o:
786 -
787 - Todo/Under discussion:
788 -
789 - Andrew Morton's multi-page bio patches attempt to issue multi-page
790 - writeouts (and reads) from the page cache, by directly building up
791 - large bios for submission completely bypassing the usage of buffer
792 - heads. This work is still in progress.
793 -
794 - Christoph Hellwig had some code that uses bios for page-io (rather than
795 - bh). This isn't included in bio as yet. Christoph was also working on a
796 - design for representing virtual/real extents as an entity and modifying
797 - some of the address space ops interfaces to utilize this abstraction rather
798 - than buffer_heads. (This is somewhat along the lines of the SGI XFS pagebuf
799 - abstraction, but intended to be as lightweight as possible).
800 -
801 - (d) Direct access i/o:
802 -
803 - Direct access requests that do not contain bios would be submitted differently
804 - as discussed earlier in section 1.3.
805 -
806 - Aside:
807 -
808 - Kvec i/o:
809 -
810 - Ben LaHaise's aio code uses a slightly different structure instead
811 - of kiobufs, called a kvec_cb. This contains an array of <page, offset, len>
812 - tuples (very much like the networking code), together with a callback function
813 - and data pointer. This is embedded into a brw_cb structure when passed
814 - to brw_kvec_async().
815 -
816 - Now it should be possible to directly map these kvecs to a bio.
Just as while
817 - cloning, in this case rather than PRE_BUILT bio_vecs, we set the bi_io_vec
818 - array pointer to point to the veclet array in kvecs.
819 -
820 - TBD: In order for this to work, some changes are needed in the way multi-page
821 - bios are handled today. The values of the tuples in such a vector passed in
822 - from higher level code should not be modified by the block layer in the course
823 - of its request processing, since that would make it hard for the higher layer
824 - to continue to use the vector descriptor (kvec) after i/o completes. Instead,
825 - all such transient state should either be maintained in the request structure,
826 - and passed on in some way to the endio completion routine.
827 -
828 -
829 - 4. The I/O scheduler
830 - ====================
831 -
832 - I/O scheduler, a.k.a. elevator, is implemented in two layers. Generic dispatch
833 - queue and specific I/O schedulers. Unless stated otherwise, elevator is used
834 - to refer to both parts and I/O scheduler to specific I/O schedulers.
835 -
836 - Block layer implements generic dispatch queue in `block/*.c`.
837 - The generic dispatch queue is responsible for requeueing, handling non-fs
838 - requests and all other subtleties.
839 -
840 - Specific I/O schedulers are responsible for ordering normal filesystem
841 - requests. They can also choose to delay certain requests to improve
842 - throughput or whatever purpose. As the plural form indicates, there are
843 - multiple I/O schedulers. They can be built as modules but at least one should
844 - be built inside the kernel. Each queue can choose different one and can also
845 - change to another one dynamically.
846 -
847 - A block layer call to the i/o scheduler follows the convention elv_xxx(). This
848 - calls elevator_xxx_fn in the elevator switch (block/elevator.c). Oh, xxx
849 - and xxx might not match exactly, but use your imagination.
If an elevator
850 - doesn't implement a function, the switch does nothing or some minimal house
851 - keeping work.
852 -
853 - 4.1. I/O scheduler API
854 - ----------------------
855 -
856 - The functions an elevator may implement are: (* are mandatory)
857 -
858 - =============================== ================================================
859 - elevator_merge_fn called to query requests for merge with a bio
860 -
861 - elevator_merge_req_fn called when two requests get merged. the one
862 - which gets merged into the other one will be
863 - never seen by I/O scheduler again. IOW, after
864 - being merged, the request is gone.
865 -
866 - elevator_merged_fn called when a request in the scheduler has been
867 - involved in a merge. It is used in the deadline
868 - scheduler for example, to reposition the request
869 - if its sorting order has changed.
870 -
871 - elevator_allow_merge_fn called whenever the block layer determines
872 - that a bio can be merged into an existing
873 - request safely. The io scheduler may still
874 - want to stop a merge at this point if it
875 - results in some sort of conflict internally,
876 - this hook allows it to do that. Note however
877 - that two *requests* can still be merged at later
878 - time. Currently the io scheduler has no way to
879 - prevent that. It can only learn about the fact
880 - from elevator_merge_req_fn callback.
881 -
882 - elevator_dispatch_fn* fills the dispatch queue with ready requests.
883 - I/O schedulers are free to postpone requests by
884 - not filling the dispatch queue unless @force
885 - is non-zero. Once dispatched, I/O schedulers
886 - are not allowed to manipulate the requests -
887 - they belong to generic dispatch queue.
888 -
889 - elevator_add_req_fn* called to add a new request into the scheduler
890 -
891 - elevator_former_req_fn
892 - elevator_latter_req_fn These return the request before or after the
893 - one specified in disk sort order.
Used by the
894 - block layer to find merge possibilities.
895 -
896 - elevator_completed_req_fn called when a request is completed.
897 -
898 - elevator_set_req_fn
899 - elevator_put_req_fn Must be used to allocate and free any elevator
900 - specific storage for a request.
901 -
902 - elevator_activate_req_fn Called when device driver first sees a request.
903 - I/O schedulers can use this callback to
904 - determine when actual execution of a request
905 - starts.
906 - elevator_deactivate_req_fn Called when device driver decides to delay
907 - a request by requeueing it.
908 -
909 - elevator_init_fn*
910 - elevator_exit_fn Allocate and free any elevator specific storage
911 - for a queue.
912 - =============================== ================================================
913 -
914 - 4.2 Request flows seen by I/O schedulers
915 - ----------------------------------------
916 -
917 - All requests seen by I/O schedulers strictly follow one of the following three
918 - flows.
919 -
920 - set_req_fn ->
921 -
922 - i. add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn ->
923 - (deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn
924 - ii. add_req_fn -> (merged_fn ->)* -> merge_req_fn
925 - iii. [none]
926 -
927 - -> put_req_fn
928 -
929 - 4.3 I/O scheduler implementation
930 - --------------------------------
931 -
932 - The generic i/o scheduler algorithm attempts to sort/merge/batch requests for
933 - optimal disk scan and request servicing performance (based on generic
934 - principles and device capabilities), optimized for:
935 -
936 - i. improved throughput
937 - ii. improved latency
938 - iii. better utilization of h/w & CPU time
939 -
940 - Characteristics:
941 -
942 - i. Binary tree
943 - AS and deadline i/o schedulers use red black binary trees for disk position
944 - sorting and searching, and a fifo linked list for time-based searching. This
945 - gives good scalability and good availability of information.
Requests are
946 - almost always dispatched in disk sort order, so a cache is kept of the next
947 - request in sort order to prevent binary tree lookups.
948 -
949 - This arrangement is not a generic block layer characteristic however, so
950 - elevators may implement queues as they please.
951 -
952 - ii. Merge hash
953 - AS and deadline use a hash table indexed by the last sector of a request. This
954 - enables merging code to quickly look up "back merge" candidates, even when
955 - multiple I/O streams are being performed at once on one disk.
956 -
957 - "Front merges", a new request being merged at the front of an existing request,
958 - are far less common than "back merges" due to the nature of most I/O patterns.
959 - Front merges are handled by the binary trees in AS and deadline schedulers.
960 -
961 - iii. Plugging the queue to batch requests in anticipation of opportunities for
962 - merge/sort optimizations
963 -
964 - Plugging is an approach that the current i/o scheduling algorithm resorts to so
965 - that it collects up enough requests in the queue to be able to take
966 - advantage of the sorting/merging logic in the elevator. If the
967 - queue is empty when a request comes in, then it plugs the request queue
968 - (sort of like plugging the bath tub of a vessel to get fluid to build up)
969 - till it fills up with a few more requests, before starting to service
970 - the requests. This provides an opportunity to merge/sort the requests before
971 - passing them down to the device. There are various conditions when the queue is
972 - unplugged (to open up the flow again), either through a scheduled task or
973 - could be on demand. For example wait_on_buffer sets the unplugging going
974 - through sync_buffer() running blk_run_address_space(mapping). Or the caller
975 - can do it explicity through blk_unplug(bdev). So in the read case,
976 - the queue gets explicitly unplugged as part of waiting for completion on that
977 - buffer.
978 -
979 - Aside:
980 - This is kind of controversial territory, as it's not clear if plugging is
981 - always the right thing to do. Devices typically have their own queues,
982 - and allowing a big queue to build up in software, while letting the device be
983 - idle for a while may not always make sense. The trick is to handle the fine
984 - balance between when to plug and when to open up. Also now that we have
985 - multi-page bios being queued in one shot, we may not need to wait to merge
986 - a big request from the broken up pieces coming by.
987 -
988 - 4.4 I/O contexts
989 - ----------------
990 -
991 - I/O contexts provide a dynamically allocated per process data area. They may
992 - be used in I/O schedulers, and in the block layer (could be used for IO statis,
993 - priorities for example). See `*io_context` in block/ll_rw_blk.c, and as-iosched.c
994 - for an example of usage in an i/o scheduler.
995 -
996 -
997 - 5. Scalability related changes
998 - ==============================
999 -
1000 - 5.1 Granular Locking: io_request_lock replaced by a per-queue lock
1001 - ------------------------------------------------------------------
1002 -
1003 - The global io_request_lock has been removed as of 2.5, to avoid
1004 - the scalability bottleneck it was causing, and has been replaced by more
1005 - granular locking. The request queue structure has a pointer to the
1006 - lock to be used for that queue. As a result, locking can now be
1007 - per-queue, with a provision for sharing a lock across queues if
1008 - necessary (e.g the scsi layer sets the queue lock pointers to the
1009 - corresponding adapter lock, which results in a per host locking
1010 - granularity). The locking semantics are the same, i.e. locking is
1011 - still imposed by the block layer, grabbing the lock before
1012 - request_fn execution which it means that lots of older drivers
1013 - should still be SMP safe. Drivers are free to drop the queue
1014 - lock themselves, if required.
Drivers that explicitly used the
1015 - io_request_lock for serialization need to be modified accordingly.
1016 - Usually it's as easy as adding a global lock::
1017 -
1018 - static DEFINE_SPINLOCK(my_driver_lock);
1019 -
1020 - and passing the address to that lock to blk_init_queue().
1021 -
1022 - 5.2 64 bit sector numbers (sector_t prepares for 64 bit support)
1023 - ----------------------------------------------------------------
1024 -
1025 - The sector number used in the bio structure has been changed to sector_t,
1026 - which could be defined as 64 bit in preparation for 64 bit sector support.
1027 -
1028 - 6. Other Changes/Implications
1029 - =============================
1030 -
1031 - 6.1 Partition re-mapping handled by the generic block layer
1032 - -----------------------------------------------------------
1033 -
1034 - In 2.5 some of the gendisk/partition related code has been reorganized.
1035 - Now the generic block layer performs partition-remapping early and thus
1036 - provides drivers with a sector number relative to whole device, rather than
1037 - having to take partition number into account in order to arrive at the true
1038 - sector number. The routine blk_partition_remap() is invoked by
1039 - submit_bio_noacct even before invoking the queue specific ->submit_bio,
1040 - so the i/o scheduler also gets to operate on whole disk sector numbers. This
1041 - should typically not require changes to block drivers, it just never gets
1042 - to invoke its own partition sector offset calculations since all bios
1043 - sent are offset from the beginning of the device.
1044 -
1045 -
1046 - 7. A Few Tips on Migration of older drivers
1047 - ===========================================
1048 -
1049 - Old-style drivers that just use CURRENT and ignores clustered requests,
1050 - may not need much change. The generic layer will automatically handle
1051 - clustered requests, multi-page bios, etc for the driver.
1052 -
1053 - For a low performance driver or hardware that is PIO driven or just doesn't
1054 - support scatter-gather changes should be minimal too.
1055 -
1056 - The following are some points to keep in mind when converting old drivers
1057 - to bio.
1058 -
1059 - Drivers should use elv_next_request to pick up requests and are no longer
1060 - supposed to handle looping directly over the request list.
1061 - (struct request->queue has been removed)
1062 -
1063 - Now end_that_request_first takes an additional number_of_sectors argument.
1064 - It used to handle always just the first buffer_head in a request, now
1065 - it will loop and handle as many sectors (on a bio-segment granularity)
1066 - as specified.
1067 -
1068 - Now bh->b_end_io is replaced by bio->bi_end_io, but most of the time the
1069 - right thing to use is bio_endio(bio) instead.
1070 -
1071 - If the driver is dropping the io_request_lock from its request_fn strategy,
1072 - then it just needs to replace that with q->queue_lock instead.
1073 -
1074 - As described in Sec 1.1, drivers can set max sector size, max segment size
1075 - etc per queue now. Drivers that used to define their own merge functions i
1076 - to handle things like this can now just use the blk_queue_* functions at
1077 - blk_init_queue time.
1078 -
1079 - Drivers no longer have to map a {partition, sector offset} into the
1080 - correct absolute location anymore, this is done by the block layer, so
1081 - where a driver received a request ala this before::
1082 -
1083 - rq->rq_dev = mk_kdev(3, 5); /* /dev/hda5 */
1084 - rq->sector = 0; /* first sector on hda5 */
1085 -
1086 - it will now see::
1087 -
1088 - rq->rq_dev = mk_kdev(3, 0); /* /dev/hda */
1089 - rq->sector = 123128; /* offset from start of disk */
1090 -
1091 - As mentioned, there is no virtual mapping of a bio. For DMA, this is
1092 - not a problem as the driver probably never will need a virtual mapping.
1093 - Instead it needs a bus mapping (dma_map_page for a single segment or
1094 - use dma_map_sg for scatter gather) to be able to ship it to the driver. For
1095 - PIO drivers (or drivers that need to revert to PIO transfer once in a
1096 - while (IDE for example)), where the CPU is doing the actual data
1097 - transfer a virtual mapping is needed. If the driver supports highmem I/O,
1098 - (Sec 1.1, (ii) ) it needs to use kmap_atomic or similar to temporarily map
1099 - a bio into the virtual address space.
1100 -
1101 -
1102 - 8. Prior/Related/Impacted patches
1103 - =================================
1104 -
1105 - 8.1. Earlier kiobuf patches (sct/axboe/chait/hch/mkp)
1106 - -----------------------------------------------------
1107 -
1108 - - orig kiobuf & raw i/o patches (now in 2.4 tree)
1109 - - direct kiobuf based i/o to devices (no intermediate bh's)
1110 - - page i/o using kiobuf
1111 - - kiobuf splitting for lvm (mkp)
1112 - - elevator support for kiobuf request merging (axboe)
1113 -
1114 - 8.2. Zero-copy networking (Dave Miller)
1115 - ---------------------------------------
1116 -
1117 - 8.3. SGI XFS - pagebuf patches - use of kiobufs
1118 - -----------------------------------------------
1119 - 8.4. Multi-page pioent patch for bio (Christoph Hellwig)
1120 - --------------------------------------------------------
1121 - 8.5. Direct i/o implementation (Andrea Arcangeli) since 2.4.10-pre11
1122 - --------------------------------------------------------------------
1123 - 8.6. Async i/o implementation patch (Ben LaHaise)
1124 - -------------------------------------------------
1125 - 8.7. EVMS layering design (IBM EVMS team)
1126 - -----------------------------------------
1127 - 8.8. Larger page cache size patch (Ben LaHaise) and Large page size (Daniel Phillips)
1128 - -------------------------------------------------------------------------------------
1129 -
1130 - => larger contiguous physical memory buffers
1131 -
1132 - 8.9.
VM reservations patch (Ben LaHaise)
1133 - ----------------------------------------
1134 - 8.10. Write clustering patches ? (Marcelo/Quintela/Riel ?)
1135 - ----------------------------------------------------------
1136 - 8.11. Block device in page cache patch (Andrea Archangeli) - now in 2.4.10+
1137 - ---------------------------------------------------------------------------
1138 - 8.12. Multiple block-size transfers for faster raw i/o (Shailabh Nagar, Badari)
1139 - -------------------------------------------------------------------------------
1140 - 8.13 Priority based i/o scheduler - prepatches (Arjan van de Ven)
1141 - ------------------------------------------------------------------
1142 - 8.14 IDE Taskfile i/o patch (Andre Hedrick)
1143 - --------------------------------------------
1144 - 8.15 Multi-page writeout and readahead patches (Andrew Morton)
1145 - ---------------------------------------------------------------
1146 - 8.16 Direct i/o patches for 2.5 using kvec and bio (Badari Pulavarthy)
1147 - -----------------------------------------------------------------------
1148 -
1149 - 9. Other References
1150 - ===================
1151 -
1152 - 9.1 The Splice I/O Model
1153 - ------------------------
1154 -
1155 - Larry McVoy (and subsequent discussions on lkml, and Linus' comments - Jan 2001
1156 -
1157 - 9.2 Discussions about kiobuf and bh design
1158 - ------------------------------------------
1159 -
1160 - On lkml between sct, linus, alan et al - Feb-March 2001 (many of the
1161 - initial thoughts that led to bio were brought up in this discussion thread)
1162 -
1163 - 9.3 Discussions on mempool on lkml - Dec 2001.
1164 - ----------------------------------------------
+1 -1
Documentation/block/capability.rst
···
7 7 ``capability`` is a bitfield, printed in hexadecimal, indicating which
8 8 capabilities a specific block device supports:
9 9
10 - .. kernel-doc:: include/linux/genhd.h
10 + .. kernel-doc:: include/linux/blkdev.h
-1
Documentation/block/index.rst
···
8 8 :maxdepth: 1
9 9
10 10 bfq-iosched
11 - biodoc
12 11 biovecs
13 12 blk-mq
14 13 capability
+1
MAINTAINERS
···
3440 3440 F: Documentation/block/
3441 3441 F: block/
3442 3442 F: drivers/block/
3443 + F: include/linux/bio.h
3443 3444 F: include/linux/blk*
3444 3445 F: kernel/trace/blktrace.c
3445 3446 F: lib/sbitmap.c
-1
arch/m68k/atari/stdma.c
···
30 30
31 31 #include <linux/types.h>
32 32 #include <linux/kdev_t.h>
33 - #include <linux/genhd.h>
34 33 #include <linux/sched.h>
35 34 #include <linux/init.h>
36 35 #include <linux/interrupt.h>
-1
arch/m68k/bvme6000/config.c
···
23 23 #include <linux/linkage.h>
24 24 #include <linux/init.h>
25 25 #include <linux/major.h>
26 - #include <linux/genhd.h>
27 26 #include <linux/rtc.h>
28 27 #include <linux/interrupt.h>
29 28 #include <linux/bcd.h>
-1
arch/m68k/emu/nfblock.c
···
13 13 #include <linux/kernel.h>
14 14 #include <linux/errno.h>
15 15 #include <linux/types.h>
16 - #include <linux/genhd.h>
17 16 #include <linux/blkdev.h>
18 17 #include <linux/hdreg.h>
19 18 #include <linux/slab.h>
-1
arch/m68k/kernel/setup_mm.c
···
16 16 #include <linux/interrupt.h>
17 17 #include <linux/fs.h>
18 18 #include <linux/console.h>
19 - #include <linux/genhd.h>
20 19 #include <linux/errno.h>
21 20 #include <linux/string.h>
22 21 #include <linux/init.h>
-1
arch/m68k/mvme147/config.c
···
22 22 #include <linux/linkage.h>
23 23 #include <linux/init.h>
24 24 #include <linux/major.h>
25 - #include <linux/genhd.h>
26 25 #include <linux/rtc.h>
27 26 #include <linux/interrupt.h>
28 27
-1
arch/m68k/mvme16x/config.c
···
24 24 #include <linux/linkage.h>
25 25 #include <linux/init.h>
26 26 #include <linux/major.h>
27 - #include <linux/genhd.h>
28 27 #include <linux/rtc.h>
29 28 #include <linux/interrupt.h>
30 29 #include <linux/module.h>
+13
block/Kconfig
···
26 26
27 27 if BLOCK
28 28
29 + config BLOCK_LEGACY_AUTOLOAD
30 + bool "Legacy autoloading support"
31 + default y
32 + help
33 + Enable loading modules and creating block device instances based on
34 + accesses through their device special file. This is a historic Linux
35 + feature and makes no sense in a udev world where device files are
36 + created on demand, but scripts that manually create device nodes and
37 + then call losetup might rely on this behavior.
38 +
29 39 config BLK_RQ_ALLOC_TIME
30 40 bool
31 41
···
226 216
227 217 # do not use in new code
228 218 config BLOCK_HOLDER_DEPRECATED
219 + bool
220 +
221 + config BLK_MQ_STACKING
229 222 bool
230 223
231 224 source "block/Kconfig.iosched"
+2 -1
block/Makefile
···
36 36 obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
37 37 obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o
38 38 obj-$(CONFIG_BLK_PM) += blk-pm.o
39 - obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += blk-crypto.o blk-crypto-profile.o
39 + obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += blk-crypto.o blk-crypto-profile.o \
40 + blk-crypto-sysfs.o
40 41 obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o
41 42 obj-$(CONFIG_BLOCK_HOLDER_DEPRECATED) += holder.o
+7 -4
block/bdev.c
··· 678 678 if (test_bit(GD_NEED_PART_SCAN, &disk->state)) 679 679 bdev_disk_changed(disk, false); 680 680 bdev->bd_openers++; 681 - return 0;; 681 + return 0; 682 682 } 683 683 684 684 static void blkdev_put_whole(struct block_device *bdev, fmode_t mode) ··· 733 733 struct inode *inode; 734 734 735 735 inode = ilookup(blockdev_superblock, dev); 736 - if (!inode) { 736 + if (!inode && IS_ENABLED(CONFIG_BLOCK_LEGACY_AUTOLOAD)) { 737 737 blk_request_module(dev); 738 738 inode = ilookup(blockdev_superblock, dev); 739 - if (!inode) 740 - return NULL; 739 + if (inode) 740 + pr_warn_ratelimited( 741 + "block device autoloading is deprecated and will be removed.\n"); 741 742 } 743 + if (!inode) 744 + return NULL; 742 745 743 746 /* switch from the inode reference to a device mode one: */ 744 747 bdev = &BDEV_I(inode)->bdev;
+15 -1
block/bfq-cgroup.c
··· 645 645 struct bfq_group *bfqg) 646 646 { 647 647 struct bfq_entity *entity = &bfqq->entity; 648 + struct bfq_group *old_parent = bfqq_group(bfqq); 648 649 650 + /* 651 + * No point to move bfqq to the same group, which can happen when 652 + * root group is offlined 653 + */ 654 + if (old_parent == bfqg) 655 + return; 656 + 657 + /* 658 + * oom_bfqq is not allowed to move, oom_bfqq will hold ref to root_group 659 + * until elevator exit. 660 + */ 661 + if (bfqq == &bfqd->oom_bfqq) 662 + return; 649 663 /* 650 664 * Get extra reference to prevent bfqq from being freed in 651 665 * next possible expire or deactivate. ··· 680 666 bfq_deactivate_bfqq(bfqd, bfqq, false, false); 681 667 else if (entity->on_st_or_in_serv) 682 668 bfq_put_idle_entity(bfq_entity_service_tree(entity), entity); 683 - bfqg_and_blkg_put(bfqq_group(bfqq)); 669 + bfqg_and_blkg_put(old_parent); 684 670 685 671 if (entity->parent && 686 672 entity->parent->last_bfqq_created == bfqq)
+10 -9
block/bfq-iosched.c
··· 774 774 if (!bfqq->next_rq) 775 775 return; 776 776 777 - bfqq->pos_root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree; 777 + bfqq->pos_root = &bfqq_group(bfqq)->rq_pos_tree; 778 778 __bfqq = bfq_rq_pos_tree_lookup(bfqd, bfqq->pos_root, 779 779 blk_rq_pos(bfqq->next_rq), &parent, &p); 780 780 if (!__bfqq) { ··· 2669 2669 struct bfq_queue *bfqq, 2670 2670 sector_t sector) 2671 2671 { 2672 - struct rb_root *root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree; 2672 + struct rb_root *root = &bfqq_group(bfqq)->rq_pos_tree; 2673 2673 struct rb_node *parent, *node; 2674 2674 struct bfq_queue *__bfqq; 2675 2675 ··· 5181 5181 struct bfq_data *bfqd = hctx->queue->elevator->elevator_data; 5182 5182 struct request *rq; 5183 5183 struct bfq_queue *in_serv_queue; 5184 - bool waiting_rq, idle_timer_disabled; 5184 + bool waiting_rq, idle_timer_disabled = false; 5185 5185 5186 5186 spin_lock_irq(&bfqd->lock); 5187 5187 ··· 5189 5189 waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue); 5190 5190 5191 5191 rq = __bfq_dispatch_request(hctx); 5192 - 5193 - idle_timer_disabled = 5194 - waiting_rq && !bfq_bfqq_wait_request(in_serv_queue); 5192 + if (in_serv_queue == bfqd->in_service_queue) { 5193 + idle_timer_disabled = 5194 + waiting_rq && !bfq_bfqq_wait_request(in_serv_queue); 5195 + } 5195 5196 5196 5197 spin_unlock_irq(&bfqd->lock); 5197 - 5198 - bfq_update_dispatch_stats(hctx->queue, rq, in_serv_queue, 5199 - idle_timer_disabled); 5198 + bfq_update_dispatch_stats(hctx->queue, rq, 5199 + idle_timer_disabled ? in_serv_queue : NULL, 5200 + idle_timer_disabled); 5200 5201 5201 5202 return rq; 5202 5203 }
-2
block/bfq-iosched.h
··· 8 8 9 9 #include <linux/blktrace_api.h> 10 10 #include <linux/hrtimer.h> 11 - #include <linux/blk-cgroup.h> 12 11 13 12 #include "blk-cgroup-rwstat.h" 14 13 ··· 1050 1051 for (parent = NULL; entity ; entity = parent) 1051 1052 #endif /* CONFIG_BFQ_GROUP_IOSCHED */ 1052 1053 1053 - struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq); 1054 1054 struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity); 1055 1055 unsigned int bfq_tot_busy_queues(struct bfq_data *bfqd); 1056 1056 struct bfq_service_tree *bfq_entity_service_tree(struct bfq_entity *entity);
+1 -16
block/bfq-wf2q.c
··· 142 142 143 143 #ifdef CONFIG_BFQ_GROUP_IOSCHED 144 144 145 - struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq) 146 - { 147 - struct bfq_entity *group_entity = bfqq->entity.parent; 148 - 149 - if (!group_entity) 150 - group_entity = &bfqq->bfqd->root_group->entity; 151 - 152 - return container_of(group_entity, struct bfq_group, entity); 153 - } 154 - 155 145 /* 156 146 * Returns true if this budget changes may let next_in_service->parent 157 147 * become the next_in_service entity for its parent entity. ··· 219 229 } 220 230 221 231 #else /* CONFIG_BFQ_GROUP_IOSCHED */ 222 - 223 - struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq) 224 - { 225 - return bfqq->bfqd->root_group; 226 - } 227 232 228 233 static bool bfq_update_parent_budget(struct bfq_entity *next_in_service) 229 234 { ··· 504 519 static unsigned short bfq_weight_to_ioprio(int weight) 505 520 { 506 521 return max_t(int, 0, 507 - IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF - weight); 522 + IOPRIO_NR_LEVELS - weight / BFQ_WEIGHT_CONVERSION_COEFF); 508 523 } 509 524 510 525 static void bfq_get_entity(struct bfq_entity *entity)
-1
block/bio-integrity.c
··· 420 420 421 421 return 0; 422 422 } 423 - EXPORT_SYMBOL(bio_integrity_clone); 424 423 425 424 int bioset_integrity_create(struct bio_set *bs, int pool_size) 426 425 {
+110 -77
block/bio.c
··· 15 15 #include <linux/mempool.h> 16 16 #include <linux/workqueue.h> 17 17 #include <linux/cgroup.h> 18 - #include <linux/blk-cgroup.h> 19 18 #include <linux/highmem.h> 20 19 #include <linux/sched/sysctl.h> 21 20 #include <linux/blk-crypto.h> ··· 23 24 #include <trace/events/block.h> 24 25 #include "blk.h" 25 26 #include "blk-rq-qos.h" 27 + #include "blk-cgroup.h" 26 28 27 29 struct bio_alloc_cache { 28 30 struct bio *free_list; ··· 249 249 * they must remember to pair any call to bio_init() with bio_uninit() 250 250 * when IO has completed, or when the bio is released. 251 251 */ 252 - void bio_init(struct bio *bio, struct bio_vec *table, 253 - unsigned short max_vecs) 252 + void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, 253 + unsigned short max_vecs, unsigned int opf) 254 254 { 255 255 bio->bi_next = NULL; 256 - bio->bi_bdev = NULL; 257 - bio->bi_opf = 0; 256 + bio->bi_bdev = bdev; 257 + bio->bi_opf = opf; 258 258 bio->bi_flags = 0; 259 259 bio->bi_ioprio = 0; 260 260 bio->bi_write_hint = 0; ··· 268 268 #ifdef CONFIG_BLK_CGROUP 269 269 bio->bi_blkg = NULL; 270 270 bio->bi_issue.value = 0; 271 + if (bdev) 272 + bio_associate_blkg(bio); 271 273 #ifdef CONFIG_BLK_CGROUP_IOCOST 272 274 bio->bi_iocost_cost = 0; 273 275 #endif ··· 295 293 /** 296 294 * bio_reset - reinitialize a bio 297 295 * @bio: bio to reset 296 + * @bdev: block device to use the bio for 297 + * @opf: operation and flags for bio 298 298 * 299 299 * Description: 300 300 * After calling bio_reset(), @bio will be in the same state as a freshly ··· 304 300 * preserved are the ones that are initialized by bio_alloc_bioset(). See 305 301 * comment in struct bio. 
306 302 */ 307 - void bio_reset(struct bio *bio) 303 + void bio_reset(struct bio *bio, struct block_device *bdev, unsigned int opf) 308 304 { 309 305 bio_uninit(bio); 310 306 memset(bio, 0, BIO_RESET_BYTES); 311 307 atomic_set(&bio->__bi_remaining, 1); 308 + bio->bi_bdev = bdev; 309 + if (bio->bi_bdev) 310 + bio_associate_blkg(bio); 311 + bio->bi_opf = opf; 312 312 } 313 313 EXPORT_SYMBOL(bio_reset); 314 314 ··· 351 343 bio_inc_remaining(parent); 352 344 } 353 345 EXPORT_SYMBOL(bio_chain); 346 + 347 + struct bio *blk_next_bio(struct bio *bio, struct block_device *bdev, 348 + unsigned int nr_pages, unsigned int opf, gfp_t gfp) 349 + { 350 + struct bio *new = bio_alloc(bdev, nr_pages, opf, gfp); 351 + 352 + if (bio) { 353 + bio_chain(bio, new); 354 + submit_bio(bio); 355 + } 356 + 357 + return new; 358 + } 359 + EXPORT_SYMBOL_GPL(blk_next_bio); 354 360 355 361 static void bio_alloc_rescue(struct work_struct *work) 356 362 { ··· 422 400 423 401 /** 424 402 * bio_alloc_bioset - allocate a bio for I/O 403 + * @bdev: block device to allocate the bio for (can be %NULL) 404 + * @nr_vecs: number of bvecs to pre-allocate 405 + * @opf: operation and flags for bio 425 406 * @gfp_mask: the GFP_* mask given to the slab allocator 426 - * @nr_iovecs: number of iovecs to pre-allocate 427 407 * @bs: the bio_set to allocate from. 428 408 * 429 409 * Allocate a bio from the mempools in @bs. ··· 454 430 * 455 431 * Returns: Pointer to new bio on success, NULL on failure. 
456 432 */ 457 - struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned short nr_iovecs, 433 + struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, 434 + unsigned int opf, gfp_t gfp_mask, 458 435 struct bio_set *bs) 459 436 { 460 437 gfp_t saved_gfp = gfp_mask; 461 438 struct bio *bio; 462 439 void *p; 463 440 464 - /* should not use nobvec bioset for nr_iovecs > 0 */ 465 - if (WARN_ON_ONCE(!mempool_initialized(&bs->bvec_pool) && nr_iovecs > 0)) 441 + /* should not use nobvec bioset for nr_vecs > 0 */ 442 + if (WARN_ON_ONCE(!mempool_initialized(&bs->bvec_pool) && nr_vecs > 0)) 466 443 return NULL; 467 444 468 445 /* ··· 500 475 return NULL; 501 476 502 477 bio = p + bs->front_pad; 503 - if (nr_iovecs > BIO_INLINE_VECS) { 478 + if (nr_vecs > BIO_INLINE_VECS) { 504 479 struct bio_vec *bvl = NULL; 505 480 506 - bvl = bvec_alloc(&bs->bvec_pool, &nr_iovecs, gfp_mask); 481 + bvl = bvec_alloc(&bs->bvec_pool, &nr_vecs, gfp_mask); 507 482 if (!bvl && gfp_mask != saved_gfp) { 508 483 punt_bios_to_rescuer(bs); 509 484 gfp_mask = saved_gfp; 510 - bvl = bvec_alloc(&bs->bvec_pool, &nr_iovecs, gfp_mask); 485 + bvl = bvec_alloc(&bs->bvec_pool, &nr_vecs, gfp_mask); 511 486 } 512 487 if (unlikely(!bvl)) 513 488 goto err_free; 514 489 515 - bio_init(bio, bvl, nr_iovecs); 516 - } else if (nr_iovecs) { 517 - bio_init(bio, bio->bi_inline_vecs, BIO_INLINE_VECS); 490 + bio_init(bio, bdev, bvl, nr_vecs, opf); 491 + } else if (nr_vecs) { 492 + bio_init(bio, bdev, bio->bi_inline_vecs, BIO_INLINE_VECS, opf); 518 493 } else { 519 - bio_init(bio, NULL, 0); 494 + bio_init(bio, bdev, NULL, 0, opf); 520 495 } 521 496 522 497 bio->bi_pool = bs; ··· 547 522 bio = kmalloc(struct_size(bio, bi_inline_vecs, nr_iovecs), gfp_mask); 548 523 if (unlikely(!bio)) 549 524 return NULL; 550 - bio_init(bio, nr_iovecs ? bio->bi_inline_vecs : NULL, nr_iovecs); 525 + bio_init(bio, NULL, nr_iovecs ? 
bio->bi_inline_vecs : NULL, nr_iovecs, 526 + 0); 551 527 bio->bi_pool = NULL; 552 528 return bio; 553 529 } ··· 728 702 } 729 703 EXPORT_SYMBOL(bio_put); 730 704 731 - /** 732 - * __bio_clone_fast - clone a bio that shares the original bio's biovec 733 - * @bio: destination bio 734 - * @bio_src: bio to clone 735 - * 736 - * Clone a &bio. Caller will own the returned bio, but not 737 - * the actual data it points to. Reference count of returned 738 - * bio will be one. 739 - * 740 - * Caller must ensure that @bio_src is not freed before @bio. 741 - */ 742 - void __bio_clone_fast(struct bio *bio, struct bio *bio_src) 705 + static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp) 743 706 { 744 - WARN_ON_ONCE(bio->bi_pool && bio->bi_max_vecs); 745 - 746 - /* 747 - * most users will be overriding ->bi_bdev with a new target, 748 - * so we don't set nor calculate new physical/hw segment counts here 749 - */ 750 - bio->bi_bdev = bio_src->bi_bdev; 751 707 bio_set_flag(bio, BIO_CLONED); 752 708 if (bio_flagged(bio_src, BIO_THROTTLED)) 753 709 bio_set_flag(bio, BIO_THROTTLED); 754 - if (bio_flagged(bio_src, BIO_REMAPPED)) 710 + if (bio->bi_bdev == bio_src->bi_bdev && 711 + bio_flagged(bio_src, BIO_REMAPPED)) 755 712 bio_set_flag(bio, BIO_REMAPPED); 756 - bio->bi_opf = bio_src->bi_opf; 757 713 bio->bi_ioprio = bio_src->bi_ioprio; 758 714 bio->bi_write_hint = bio_src->bi_write_hint; 759 715 bio->bi_iter = bio_src->bi_iter; 760 - bio->bi_io_vec = bio_src->bi_io_vec; 761 716 762 717 bio_clone_blkg_association(bio, bio_src); 763 718 blkcg_bio_issue_init(bio); 719 + 720 + if (bio_crypt_clone(bio, bio_src, gfp) < 0) 721 + return -ENOMEM; 722 + if (bio_integrity(bio_src) && 723 + bio_integrity_clone(bio, bio_src, gfp) < 0) 724 + return -ENOMEM; 725 + return 0; 764 726 } 765 - EXPORT_SYMBOL(__bio_clone_fast); 766 727 767 728 /** 768 - * bio_clone_fast - clone a bio that shares the original bio's biovec 769 - * @bio: bio to clone 770 - * @gfp_mask: allocation priority 
771 - * @bs: bio_set to allocate from 729 + * bio_alloc_clone - clone a bio that shares the original bio's biovec 730 + * @bdev: block_device to clone onto 731 + * @bio_src: bio to clone from 732 + * @gfp: allocation priority 733 + * @bs: bio_set to allocate from 772 734 * 773 - * Like __bio_clone_fast, only also allocates the returned bio 735 + * Allocate a new bio that is a clone of @bio_src. The caller owns the returned 736 + * bio, but not the actual data it points to. 737 + * 738 + * The caller must ensure that the return bio is not freed before @bio_src. 774 739 */ 775 - struct bio *bio_clone_fast(struct bio *bio, gfp_t gfp_mask, struct bio_set *bs) 740 + struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src, 741 + gfp_t gfp, struct bio_set *bs) 776 742 { 777 - struct bio *b; 743 + struct bio *bio; 778 744 779 - b = bio_alloc_bioset(gfp_mask, 0, bs); 780 - if (!b) 745 + bio = bio_alloc_bioset(bdev, 0, bio_src->bi_opf, gfp, bs); 746 + if (!bio) 781 747 return NULL; 782 748 783 - __bio_clone_fast(b, bio); 749 + if (__bio_clone(bio, bio_src, gfp) < 0) { 750 + bio_put(bio); 751 + return NULL; 752 + } 753 + bio->bi_io_vec = bio_src->bi_io_vec; 784 754 785 - if (bio_crypt_clone(b, bio, gfp_mask) < 0) 786 - goto err_put; 787 - 788 - if (bio_integrity(bio) && 789 - bio_integrity_clone(b, bio, gfp_mask) < 0) 790 - goto err_put; 791 - 792 - return b; 793 - 794 - err_put: 795 - bio_put(b); 796 - return NULL; 755 + return bio; 797 756 } 798 - EXPORT_SYMBOL(bio_clone_fast); 757 + EXPORT_SYMBOL(bio_alloc_clone); 799 758 800 - const char *bio_devname(struct bio *bio, char *buf) 759 + /** 760 + * bio_init_clone - clone a bio that shares the original bio's biovec 761 + * @bdev: block_device to clone onto 762 + * @bio: bio to clone into 763 + * @bio_src: bio to clone from 764 + * @gfp: allocation priority 765 + * 766 + * Initialize a new bio in caller provided memory that is a clone of @bio_src. 
767 + * The caller owns the returned bio, but not the actual data it points to. 768 + * 769 + * The caller must ensure that @bio_src is not freed before @bio. 770 + */ 771 + int bio_init_clone(struct block_device *bdev, struct bio *bio, 772 + struct bio *bio_src, gfp_t gfp) 801 773 { 802 - return bdevname(bio->bi_bdev, buf); 774 + int ret; 775 + 776 + bio_init(bio, bdev, bio_src->bi_io_vec, 0, bio_src->bi_opf); 777 + ret = __bio_clone(bio, bio_src, gfp); 778 + if (ret) 779 + bio_uninit(bio); 780 + return ret; 803 781 } 804 - EXPORT_SYMBOL(bio_devname); 782 + EXPORT_SYMBOL(bio_init_clone); 805 783 806 784 /** 807 785 * bio_full - check if the bio is full ··· 1084 1054 size_t off) 1085 1055 { 1086 1056 if (len > UINT_MAX || off > UINT_MAX) 1087 - return 0; 1057 + return false; 1088 1058 return bio_add_page(bio, &folio->page, len, off) > 0; 1089 1059 } 1090 1060 ··· 1571 1541 if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND)) 1572 1542 return NULL; 1573 1543 1574 - split = bio_clone_fast(bio, gfp, bs); 1544 + split = bio_alloc_clone(bio->bi_bdev, bio, gfp, bs); 1575 1545 if (!split) 1576 1546 return NULL; 1577 1547 ··· 1666 1636 * Note that the bio must be embedded at the END of that structure always, 1667 1637 * or things will break badly. 1668 1638 * If %BIOSET_NEED_BVECS is set in @flags, a separate pool will be allocated 1669 - * for allocating iovecs. This pool is not needed e.g. for bio_clone_fast(). 1670 - * If %BIOSET_NEED_RESCUER is set, a workqueue is created which can be used to 1671 - * dispatch queued requests when the mempool runs out of space. 1639 + * for allocating iovecs. This pool is not needed e.g. for bio_init_clone(). 1640 + * If %BIOSET_NEED_RESCUER is set, a workqueue is created which can be used 1641 + * to dispatch queued requests when the mempool runs out of space. 
1672 1642 * 1673 1643 */ 1674 1644 int bioset_init(struct bio_set *bs, ··· 1738 1708 /** 1739 1709 * bio_alloc_kiocb - Allocate a bio from bio_set based on kiocb 1740 1710 * @kiocb: kiocb describing the IO 1711 + * @bdev: block device to allocate the bio for (can be %NULL) 1741 1712 * @nr_vecs: number of iovecs to pre-allocate 1713 + * @opf: operation and flags for bio 1742 1714 * @bs: bio_set to allocate from 1743 1715 * 1744 1716 * Description: ··· 1751 1719 * MUST be done from process context, not hard/soft IRQ. 1752 1720 * 1753 1721 */ 1754 - struct bio *bio_alloc_kiocb(struct kiocb *kiocb, unsigned short nr_vecs, 1755 - struct bio_set *bs) 1722 + struct bio *bio_alloc_kiocb(struct kiocb *kiocb, struct block_device *bdev, 1723 + unsigned short nr_vecs, unsigned int opf, struct bio_set *bs) 1756 1724 { 1757 1725 struct bio_alloc_cache *cache; 1758 1726 struct bio *bio; 1759 1727 1760 1728 if (!(kiocb->ki_flags & IOCB_ALLOC_CACHE) || nr_vecs > BIO_INLINE_VECS) 1761 - return bio_alloc_bioset(GFP_KERNEL, nr_vecs, bs); 1729 + return bio_alloc_bioset(bdev, nr_vecs, opf, GFP_KERNEL, bs); 1762 1730 1763 1731 cache = per_cpu_ptr(bs->cache, get_cpu()); 1764 1732 if (cache->free_list) { ··· 1766 1734 cache->free_list = bio->bi_next; 1767 1735 cache->nr--; 1768 1736 put_cpu(); 1769 - bio_init(bio, nr_vecs ? bio->bi_inline_vecs : NULL, nr_vecs); 1737 + bio_init(bio, bdev, nr_vecs ? bio->bi_inline_vecs : NULL, 1738 + nr_vecs, opf); 1770 1739 bio->bi_pool = bs; 1771 1740 bio_set_flag(bio, BIO_PERCPU_CACHE); 1772 1741 return bio; 1773 1742 } 1774 1743 put_cpu(); 1775 - bio = bio_alloc_bioset(GFP_KERNEL, nr_vecs, bs); 1744 + bio = bio_alloc_bioset(bdev, nr_vecs, opf, GFP_KERNEL, bs); 1776 1745 bio_set_flag(bio, BIO_PERCPU_CACHE); 1777 1746 return bio; 1778 1747 }
+1 -1
block/blk-cgroup-rwstat.h
··· 6 6 #ifndef _BLK_CGROUP_RWSTAT_H 7 7 #define _BLK_CGROUP_RWSTAT_H 8 8 9 - #include <linux/blk-cgroup.h> 9 + #include "blk-cgroup.h" 10 10 11 11 enum blkg_rwstat_type { 12 12 BLKG_RWSTAT_READ,
+8 -7
block/blk-cgroup.c
··· 23 23 #include <linux/blkdev.h> 24 24 #include <linux/backing-dev.h> 25 25 #include <linux/slab.h> 26 - #include <linux/genhd.h> 27 26 #include <linux/delay.h> 28 27 #include <linux/atomic.h> 29 28 #include <linux/ctype.h> 30 - #include <linux/blk-cgroup.h> 31 29 #include <linux/tracehook.h> 32 30 #include <linux/psi.h> 33 31 #include <linux/part_stat.h> 34 32 #include "blk.h" 33 + #include "blk-cgroup.h" 35 34 #include "blk-ioprio.h" 36 35 #include "blk-throttle.h" 37 36 ··· 856 857 blk_queue_root_blkg(bdev_get_queue(bdev)); 857 858 struct blkg_iostat tmp; 858 859 int cpu; 860 + unsigned long flags; 859 861 860 862 memset(&tmp, 0, sizeof(tmp)); 861 863 for_each_possible_cpu(cpu) { 862 864 struct disk_stats *cpu_dkstats; 863 - unsigned long flags; 864 865 865 866 cpu_dkstats = per_cpu_ptr(bdev->bd_stats, cpu); 866 867 tmp.ios[BLKG_IOSTAT_READ] += ··· 876 877 cpu_dkstats->sectors[STAT_WRITE] << 9; 877 878 tmp.bytes[BLKG_IOSTAT_DISCARD] += 878 879 cpu_dkstats->sectors[STAT_DISCARD] << 9; 879 - 880 - flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync); 881 - blkg_iostat_set(&blkg->iostat.cur, &tmp); 882 - u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags); 883 880 } 881 + 882 + flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync); 883 + blkg_iostat_set(&blkg->iostat.cur, &tmp); 884 + u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags); 884 885 } 885 886 } 886 887 ··· 1174 1175 struct blkcg_gq *new_blkg, *blkg; 1175 1176 bool preloaded; 1176 1177 int ret; 1178 + 1179 + INIT_LIST_HEAD(&q->blkg_list); 1177 1180 1178 1181 new_blkg = blkg_alloc(&blkcg_root, q, GFP_KERNEL); 1179 1182 if (!new_blkg)
+477
block/blk-cgroup.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _BLK_CGROUP_PRIVATE_H 3 + #define _BLK_CGROUP_PRIVATE_H 4 + /* 5 + * block cgroup private header 6 + * 7 + * Based on ideas and code from CFQ, CFS and BFQ: 8 + * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk> 9 + * 10 + * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it> 11 + * Paolo Valente <paolo.valente@unimore.it> 12 + * 13 + * Copyright (C) 2009 Vivek Goyal <vgoyal@redhat.com> 14 + * Nauman Rafique <nauman@google.com> 15 + */ 16 + 17 + #include <linux/blk-cgroup.h> 18 + 19 + /* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */ 20 + #define BLKG_STAT_CPU_BATCH (INT_MAX / 2) 21 + 22 + #ifdef CONFIG_BLK_CGROUP 23 + 24 + /* 25 + * A blkcg_gq (blkg) is association between a block cgroup (blkcg) and a 26 + * request_queue (q). This is used by blkcg policies which need to track 27 + * information per blkcg - q pair. 28 + * 29 + * There can be multiple active blkcg policies and each blkg:policy pair is 30 + * represented by a blkg_policy_data which is allocated and freed by each 31 + * policy's pd_alloc/free_fn() methods. A policy can allocate private data 32 + * area by allocating larger data structure which embeds blkg_policy_data 33 + * at the beginning. 34 + */ 35 + struct blkg_policy_data { 36 + /* the blkg and policy id this per-policy data belongs to */ 37 + struct blkcg_gq *blkg; 38 + int plid; 39 + }; 40 + 41 + /* 42 + * Policies that need to keep per-blkcg data which is independent from any 43 + * request_queue associated to it should implement cpd_alloc/free_fn() 44 + * methods. A policy can allocate private data area by allocating larger 45 + * data structure which embeds blkcg_policy_data at the beginning. 46 + * cpd_init() is invoked to let each policy handle per-blkcg data. 
47 + */ 48 + struct blkcg_policy_data { 49 + /* the blkcg and policy id this per-policy data belongs to */ 50 + struct blkcg *blkcg; 51 + int plid; 52 + }; 53 + 54 + typedef struct blkcg_policy_data *(blkcg_pol_alloc_cpd_fn)(gfp_t gfp); 55 + typedef void (blkcg_pol_init_cpd_fn)(struct blkcg_policy_data *cpd); 56 + typedef void (blkcg_pol_free_cpd_fn)(struct blkcg_policy_data *cpd); 57 + typedef void (blkcg_pol_bind_cpd_fn)(struct blkcg_policy_data *cpd); 58 + typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp, 59 + struct request_queue *q, struct blkcg *blkcg); 60 + typedef void (blkcg_pol_init_pd_fn)(struct blkg_policy_data *pd); 61 + typedef void (blkcg_pol_online_pd_fn)(struct blkg_policy_data *pd); 62 + typedef void (blkcg_pol_offline_pd_fn)(struct blkg_policy_data *pd); 63 + typedef void (blkcg_pol_free_pd_fn)(struct blkg_policy_data *pd); 64 + typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd); 65 + typedef bool (blkcg_pol_stat_pd_fn)(struct blkg_policy_data *pd, 66 + struct seq_file *s); 67 + 68 + struct blkcg_policy { 69 + int plid; 70 + /* cgroup files for the policy */ 71 + struct cftype *dfl_cftypes; 72 + struct cftype *legacy_cftypes; 73 + 74 + /* operations */ 75 + blkcg_pol_alloc_cpd_fn *cpd_alloc_fn; 76 + blkcg_pol_init_cpd_fn *cpd_init_fn; 77 + blkcg_pol_free_cpd_fn *cpd_free_fn; 78 + blkcg_pol_bind_cpd_fn *cpd_bind_fn; 79 + 80 + blkcg_pol_alloc_pd_fn *pd_alloc_fn; 81 + blkcg_pol_init_pd_fn *pd_init_fn; 82 + blkcg_pol_online_pd_fn *pd_online_fn; 83 + blkcg_pol_offline_pd_fn *pd_offline_fn; 84 + blkcg_pol_free_pd_fn *pd_free_fn; 85 + blkcg_pol_reset_pd_stats_fn *pd_reset_stats_fn; 86 + blkcg_pol_stat_pd_fn *pd_stat_fn; 87 + }; 88 + 89 + extern struct blkcg blkcg_root; 90 + extern bool blkcg_debug_stats; 91 + 92 + struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg, 93 + struct request_queue *q, bool update_hint); 94 + int blkcg_init_queue(struct request_queue *q); 95 + void blkcg_exit_queue(struct 
request_queue *q); 96 + 97 + /* Blkio controller policy registration */ 98 + int blkcg_policy_register(struct blkcg_policy *pol); 99 + void blkcg_policy_unregister(struct blkcg_policy *pol); 100 + int blkcg_activate_policy(struct request_queue *q, 101 + const struct blkcg_policy *pol); 102 + void blkcg_deactivate_policy(struct request_queue *q, 103 + const struct blkcg_policy *pol); 104 + 105 + const char *blkg_dev_name(struct blkcg_gq *blkg); 106 + void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg, 107 + u64 (*prfill)(struct seq_file *, 108 + struct blkg_policy_data *, int), 109 + const struct blkcg_policy *pol, int data, 110 + bool show_total); 111 + u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v); 112 + 113 + struct blkg_conf_ctx { 114 + struct block_device *bdev; 115 + struct blkcg_gq *blkg; 116 + char *body; 117 + }; 118 + 119 + struct block_device *blkcg_conf_open_bdev(char **inputp); 120 + int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol, 121 + char *input, struct blkg_conf_ctx *ctx); 122 + void blkg_conf_finish(struct blkg_conf_ctx *ctx); 123 + 124 + /** 125 + * blkcg_css - find the current css 126 + * 127 + * Find the css associated with either the kthread or the current task. 128 + * This may return a dying css, so it is up to the caller to use tryget logic 129 + * to confirm it is alive and well. 130 + */ 131 + static inline struct cgroup_subsys_state *blkcg_css(void) 132 + { 133 + struct cgroup_subsys_state *css; 134 + 135 + css = kthread_blkcg(); 136 + if (css) 137 + return css; 138 + return task_css(current, io_cgrp_id); 139 + } 140 + 141 + /** 142 + * __bio_blkcg - internal, inconsistent version to get blkcg 143 + * 144 + * DO NOT USE. 145 + * This function is inconsistent and consequently is dangerous to use. The 146 + * first part of the function returns a blkcg where a reference is owned by the 147 + * bio. 
This means it does not need to be rcu protected as it cannot go away 148 + * with the bio owning a reference to it. However, the latter potentially gets 149 + * it from task_css(). This can race against task migration and the cgroup 150 + * dying. It is also semantically different as it must be called rcu protected 151 + * and is susceptible to failure when trying to get a reference to it. 152 + * Therefore, it is not ok to assume that *_get() will always succeed on the 153 + * blkcg returned here. 154 + */ 155 + static inline struct blkcg *__bio_blkcg(struct bio *bio) 156 + { 157 + if (bio && bio->bi_blkg) 158 + return bio->bi_blkg->blkcg; 159 + return css_to_blkcg(blkcg_css()); 160 + } 161 + 162 + /** 163 + * bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg 164 + * @return: true if this bio needs to be submitted with the root blkg context. 165 + * 166 + * In order to avoid priority inversions we sometimes need to issue a bio as if 167 + * it were attached to the root blkg, and then backcharge to the actual owning 168 + * blkg. The idea is we do bio_blkcg() to look up the actual context for the 169 + * bio and attach the appropriate blkg to the bio. Then we call this helper and 170 + * if it is true run with the root blkg for that queue and then do any 171 + * backcharging to the originating cgroup once the io is complete. 172 + */ 173 + static inline bool bio_issue_as_root_blkg(struct bio *bio) 174 + { 175 + return (bio->bi_opf & (REQ_META | REQ_SWAP)) != 0; 176 + } 177 + 178 + /** 179 + * __blkg_lookup - internal version of blkg_lookup() 180 + * @blkcg: blkcg of interest 181 + * @q: request_queue of interest 182 + * @update_hint: whether to update lookup hint with the result or not 183 + * 184 + * This is internal version and shouldn't be used by policy 185 + * implementations. Looks up blkgs for the @blkcg - @q pair regardless of 186 + * @q's bypass state. 
If @update_hint is %true, the caller should be 187 + * holding @q->queue_lock and lookup hint is updated on success. 188 + */ 189 + static inline struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, 190 + struct request_queue *q, 191 + bool update_hint) 192 + { 193 + struct blkcg_gq *blkg; 194 + 195 + if (blkcg == &blkcg_root) 196 + return q->root_blkg; 197 + 198 + blkg = rcu_dereference(blkcg->blkg_hint); 199 + if (blkg && blkg->q == q) 200 + return blkg; 201 + 202 + return blkg_lookup_slowpath(blkcg, q, update_hint); 203 + } 204 + 205 + /** 206 + * blkg_lookup - lookup blkg for the specified blkcg - q pair 207 + * @blkcg: blkcg of interest 208 + * @q: request_queue of interest 209 + * 210 + * Lookup blkg for the @blkcg - @q pair. This function should be called 211 + * under RCU read lock. 212 + */ 213 + static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, 214 + struct request_queue *q) 215 + { 216 + WARN_ON_ONCE(!rcu_read_lock_held()); 217 + return __blkg_lookup(blkcg, q, false); 218 + } 219 + 220 + /** 221 + * blk_queue_root_blkg - return blkg for the (blkcg_root, @q) pair 222 + * @q: request_queue of interest 223 + * 224 + * Lookup blkg for @q at the root level. See also blkg_lookup(). 225 + */ 226 + static inline struct blkcg_gq *blk_queue_root_blkg(struct request_queue *q) 227 + { 228 + return q->root_blkg; 229 + } 230 + 231 + /** 232 + * blkg_to_pdata - get policy private data 233 + * @blkg: blkg of interest 234 + * @pol: policy of interest 235 + * 236 + * Return pointer to private data associated with the @blkg-@pol pair. 237 + */ 238 + static inline struct blkg_policy_data *blkg_to_pd(struct blkcg_gq *blkg, 239 + struct blkcg_policy *pol) 240 + { 241 + return blkg ? blkg->pd[pol->plid] : NULL; 242 + } 243 + 244 + static inline struct blkcg_policy_data *blkcg_to_cpd(struct blkcg *blkcg, 245 + struct blkcg_policy *pol) 246 + { 247 + return blkcg ? 
blkcg->cpd[pol->plid] : NULL; 248 + } 249 + 250 + /** 251 + * pdata_to_blkg - get blkg associated with policy private data 252 + * @pd: policy private data of interest 253 + * 254 + * @pd is policy private data. Determine the blkg it's associated with. 255 + */ 256 + static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd) 257 + { 258 + return pd ? pd->blkg : NULL; 259 + } 260 + 261 + static inline struct blkcg *cpd_to_blkcg(struct blkcg_policy_data *cpd) 262 + { 263 + return cpd ? cpd->blkcg : NULL; 264 + } 265 + 266 + /** 267 + * blkg_path - format cgroup path of blkg 268 + * @blkg: blkg of interest 269 + * @buf: target buffer 270 + * @buflen: target buffer length 271 + * 272 + * Format the path of the cgroup of @blkg into @buf. 273 + */ 274 + static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen) 275 + { 276 + return cgroup_path(blkg->blkcg->css.cgroup, buf, buflen); 277 + } 278 + 279 + /** 280 + * blkg_get - get a blkg reference 281 + * @blkg: blkg to get 282 + * 283 + * The caller should be holding an existing reference. 284 + */ 285 + static inline void blkg_get(struct blkcg_gq *blkg) 286 + { 287 + percpu_ref_get(&blkg->refcnt); 288 + } 289 + 290 + /** 291 + * blkg_tryget - try and get a blkg reference 292 + * @blkg: blkg to get 293 + * 294 + * This is for use when doing an RCU lookup of the blkg. We may be in the midst 295 + * of freeing this blkg, so we can only use it if the refcnt is not zero. 
296 + */ 297 + static inline bool blkg_tryget(struct blkcg_gq *blkg) 298 + { 299 + return blkg && percpu_ref_tryget(&blkg->refcnt); 300 + } 301 + 302 + /** 303 + * blkg_put - put a blkg reference 304 + * @blkg: blkg to put 305 + */ 306 + static inline void blkg_put(struct blkcg_gq *blkg) 307 + { 308 + percpu_ref_put(&blkg->refcnt); 309 + } 310 + 311 + /** 312 + * blkg_for_each_descendant_pre - pre-order walk of a blkg's descendants 313 + * @d_blkg: loop cursor pointing to the current descendant 314 + * @pos_css: used for iteration 315 + * @p_blkg: target blkg to walk descendants of 316 + * 317 + * Walk @c_blkg through the descendants of @p_blkg. Must be used with RCU 318 + * read locked. If called under either blkcg or queue lock, the iteration 319 + * is guaranteed to include all and only online blkgs. The caller may 320 + * update @pos_css by calling css_rightmost_descendant() to skip subtree. 321 + * @p_blkg is included in the iteration and the first node to be visited. 322 + */ 323 + #define blkg_for_each_descendant_pre(d_blkg, pos_css, p_blkg) \ 324 + css_for_each_descendant_pre((pos_css), &(p_blkg)->blkcg->css) \ 325 + if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ 326 + (p_blkg)->q, false))) 327 + 328 + /** 329 + * blkg_for_each_descendant_post - post-order walk of a blkg's descendants 330 + * @d_blkg: loop cursor pointing to the current descendant 331 + * @pos_css: used for iteration 332 + * @p_blkg: target blkg to walk descendants of 333 + * 334 + * Similar to blkg_for_each_descendant_pre() but performs post-order 335 + * traversal instead. Synchronization rules are the same. @p_blkg is 336 + * included in the iteration and the last node to be visited. 
337 + */ 338 + #define blkg_for_each_descendant_post(d_blkg, pos_css, p_blkg) \ 339 + css_for_each_descendant_post((pos_css), &(p_blkg)->blkcg->css) \ 340 + if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ 341 + (p_blkg)->q, false))) 342 + 343 + bool __blkcg_punt_bio_submit(struct bio *bio); 344 + 345 + static inline bool blkcg_punt_bio_submit(struct bio *bio) 346 + { 347 + if (bio->bi_opf & REQ_CGROUP_PUNT) 348 + return __blkcg_punt_bio_submit(bio); 349 + else 350 + return false; 351 + } 352 + 353 + static inline void blkcg_bio_issue_init(struct bio *bio) 354 + { 355 + bio_issue_init(&bio->bi_issue, bio_sectors(bio)); 356 + } 357 + 358 + static inline void blkcg_use_delay(struct blkcg_gq *blkg) 359 + { 360 + if (WARN_ON_ONCE(atomic_read(&blkg->use_delay) < 0)) 361 + return; 362 + if (atomic_add_return(1, &blkg->use_delay) == 1) 363 + atomic_inc(&blkg->blkcg->css.cgroup->congestion_count); 364 + } 365 + 366 + static inline int blkcg_unuse_delay(struct blkcg_gq *blkg) 367 + { 368 + int old = atomic_read(&blkg->use_delay); 369 + 370 + if (WARN_ON_ONCE(old < 0)) 371 + return 0; 372 + if (old == 0) 373 + return 0; 374 + 375 + /* 376 + * We do this song and dance because we can race with somebody else 377 + * adding or removing delay. If we just did an atomic_dec we'd end up 378 + * negative and we'd already be in trouble. We need to subtract 1 and 379 + * then check to see if we were the last delay so we can drop the 380 + * congestion count on the cgroup. 
381 + */ 382 + while (old) { 383 + int cur = atomic_cmpxchg(&blkg->use_delay, old, old - 1); 384 + if (cur == old) 385 + break; 386 + old = cur; 387 + } 388 + 389 + if (old == 0) 390 + return 0; 391 + if (old == 1) 392 + atomic_dec(&blkg->blkcg->css.cgroup->congestion_count); 393 + return 1; 394 + } 395 + 396 + /** 397 + * blkcg_set_delay - Enable allocator delay mechanism with the specified delay amount 398 + * @blkg: target blkg 399 + * @delay: delay duration in nsecs 400 + * 401 + * When enabled with this function, the delay is not decayed and must be 402 + * explicitly cleared with blkcg_clear_delay(). Must not be mixed with 403 + * blkcg_[un]use_delay() and blkcg_add_delay() usages. 404 + */ 405 + static inline void blkcg_set_delay(struct blkcg_gq *blkg, u64 delay) 406 + { 407 + int old = atomic_read(&blkg->use_delay); 408 + 409 + /* We only want 1 person setting the congestion count for this blkg. */ 410 + if (!old && atomic_cmpxchg(&blkg->use_delay, old, -1) == old) 411 + atomic_inc(&blkg->blkcg->css.cgroup->congestion_count); 412 + 413 + atomic64_set(&blkg->delay_nsec, delay); 414 + } 415 + 416 + /** 417 + * blkcg_clear_delay - Disable allocator delay mechanism 418 + * @blkg: target blkg 419 + * 420 + * Disable use_delay mechanism. See blkcg_set_delay(). 421 + */ 422 + static inline void blkcg_clear_delay(struct blkcg_gq *blkg) 423 + { 424 + int old = atomic_read(&blkg->use_delay); 425 + 426 + /* We only want 1 person clearing the congestion count for this blkg. 
*/ 427 + if (old && atomic_cmpxchg(&blkg->use_delay, old, 0) == old) 428 + atomic_dec(&blkg->blkcg->css.cgroup->congestion_count); 429 + } 430 + 431 + void blk_cgroup_bio_start(struct bio *bio); 432 + void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta); 433 + #else /* CONFIG_BLK_CGROUP */ 434 + 435 + struct blkg_policy_data { 436 + }; 437 + 438 + struct blkcg_policy_data { 439 + }; 440 + 441 + struct blkcg_policy { 442 + }; 443 + 444 + #ifdef CONFIG_BLOCK 445 + 446 + static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; } 447 + static inline struct blkcg_gq *blk_queue_root_blkg(struct request_queue *q) 448 + { return NULL; } 449 + static inline int blkcg_init_queue(struct request_queue *q) { return 0; } 450 + static inline void blkcg_exit_queue(struct request_queue *q) { } 451 + static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; } 452 + static inline void blkcg_policy_unregister(struct blkcg_policy *pol) { } 453 + static inline int blkcg_activate_policy(struct request_queue *q, 454 + const struct blkcg_policy *pol) { return 0; } 455 + static inline void blkcg_deactivate_policy(struct request_queue *q, 456 + const struct blkcg_policy *pol) { } 457 + 458 + static inline struct blkcg *__bio_blkcg(struct bio *bio) { return NULL; } 459 + 460 + static inline struct blkg_policy_data *blkg_to_pd(struct blkcg_gq *blkg, 461 + struct blkcg_policy *pol) { return NULL; } 462 + static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd) { return NULL; } 463 + static inline char *blkg_path(struct blkcg_gq *blkg) { return NULL; } 464 + static inline void blkg_get(struct blkcg_gq *blkg) { } 465 + static inline void blkg_put(struct blkcg_gq *blkg) { } 466 + 467 + static inline bool blkcg_punt_bio_submit(struct bio *bio) { return false; } 468 + static inline void blkcg_bio_issue_init(struct bio *bio) { } 469 + static inline void blk_cgroup_bio_start(struct bio *bio) { } 470 + 471 + #define 
blk_queue_for_each_rl(rl, q) \ 472 + for ((rl) = &(q)->root_rl; (rl); (rl) = NULL) 473 + 474 + #endif /* CONFIG_BLOCK */ 475 + #endif /* CONFIG_BLK_CGROUP */ 476 + 477 + #endif /* _BLK_CGROUP_PRIVATE_H */
+132 -154
block/blk-core.c
··· 34 34 #include <linux/delay.h> 35 35 #include <linux/ratelimit.h> 36 36 #include <linux/pm_runtime.h> 37 - #include <linux/blk-cgroup.h> 38 37 #include <linux/t10-pi.h> 39 38 #include <linux/debugfs.h> 40 39 #include <linux/bpf.h> ··· 48 49 #include "blk.h" 49 50 #include "blk-mq-sched.h" 50 51 #include "blk-pm.h" 52 + #include "blk-cgroup.h" 51 53 #include "blk-throttle.h" 52 54 53 55 struct dentry *blk_debugfs_root; ··· 164 164 [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, 165 165 [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, 166 166 [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, 167 + [BLK_STS_OFFLINE] = { -ENODEV, "device offline" }, 167 168 168 169 /* device mapper special case, should not leak out: */ 169 170 [BLK_STS_DM_REQUEUE] = { -EREMCHG, "dm internal retry" }, ··· 470 469 timer_setup(&q->timeout, blk_rq_timed_out_timer, 0); 471 470 INIT_WORK(&q->timeout_work, blk_timeout_work); 472 471 INIT_LIST_HEAD(&q->icq_list); 473 - #ifdef CONFIG_BLK_CGROUP 474 - INIT_LIST_HEAD(&q->blkg_list); 475 - #endif 476 472 477 473 kobject_init(&q->kobj, &blk_queue_ktype); 478 474 ··· 534 536 } 535 537 EXPORT_SYMBOL(blk_get_queue); 536 538 537 - static void handle_bad_sector(struct bio *bio, sector_t maxsector) 538 - { 539 - char b[BDEVNAME_SIZE]; 540 - 541 - pr_info_ratelimited("%s: attempt to access beyond end of device\n" 542 - "%s: rw=%d, want=%llu, limit=%llu\n", 543 - current->comm, 544 - bio_devname(bio, b), bio->bi_opf, 545 - bio_end_sector(bio), maxsector); 546 - } 547 - 548 539 #ifdef CONFIG_FAIL_MAKE_REQUEST 549 540 550 541 static DECLARE_FAULT_ATTR(fail_make_request); ··· 563 576 static inline bool bio_check_ro(struct bio *bio) 564 577 { 565 578 if (op_is_write(bio_op(bio)) && bdev_read_only(bio->bi_bdev)) { 566 - char b[BDEVNAME_SIZE]; 567 - 568 579 if (op_is_flush(bio->bi_opf) && !bio_sectors(bio)) 569 580 return false; 570 - 571 - WARN_ONCE(1, 572 - "Trying to write to read-only block-device %s (partno %d)\n", 573 - bio_devname(bio, 
b), bio->bi_bdev->bd_partno); 581 + pr_warn("Trying to write to read-only block-device %pg\n", 582 + bio->bi_bdev); 574 583 /* Older lvm-tools actually trigger this */ 575 584 return false; 576 585 } ··· 595 612 if (nr_sectors && maxsector && 596 613 (nr_sectors > maxsector || 597 614 bio->bi_iter.bi_sector > maxsector - nr_sectors)) { 598 - handle_bad_sector(bio, maxsector); 615 + pr_info_ratelimited("%s: attempt to access beyond end of device\n" 616 + "%pg: rw=%d, want=%llu, limit=%llu\n", 617 + current->comm, 618 + bio->bi_bdev, bio->bi_opf, 619 + bio_end_sector(bio), maxsector); 599 620 return -EIO; 600 621 } 601 622 return 0; ··· 659 672 return BLK_STS_OK; 660 673 } 661 674 662 - noinline_for_stack bool submit_bio_checks(struct bio *bio) 675 + static void __submit_bio(struct bio *bio) 676 + { 677 + struct gendisk *disk = bio->bi_bdev->bd_disk; 678 + 679 + if (unlikely(!blk_crypto_bio_prep(&bio))) 680 + return; 681 + 682 + if (!disk->fops->submit_bio) { 683 + blk_mq_submit_bio(bio); 684 + } else if (likely(bio_queue_enter(bio) == 0)) { 685 + disk->fops->submit_bio(bio); 686 + blk_queue_exit(disk->queue); 687 + } 688 + } 689 + 690 + /* 691 + * The loop in this function may be a bit non-obvious, and so deserves some 692 + * explanation: 693 + * 694 + * - Before entering the loop, bio->bi_next is NULL (as all callers ensure 695 + * that), so we have a list with a single bio. 696 + * - We pretend that we have just taken it off a longer list, so we assign 697 + * bio_list to a pointer to the bio_list_on_stack, thus initialising the 698 + * bio_list of new bios to be added. ->submit_bio() may indeed add some more 699 + * bios through a recursive call to submit_bio_noacct. If it did, we find a 700 + * non-NULL value in bio_list and re-enter the loop from the top. 701 + * - In this case we really did just take the bio of the top of the list (no 702 + * pretending) and so remove it from bio_list, and call into ->submit_bio() 703 + * again. 
704 + * 705 + * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio. 706 + * bio_list_on_stack[1] contains bios that were submitted before the current 707 + * ->submit_bio, but that haven't been processed yet. 708 + */ 709 + static void __submit_bio_noacct(struct bio *bio) 710 + { 711 + struct bio_list bio_list_on_stack[2]; 712 + 713 + BUG_ON(bio->bi_next); 714 + 715 + bio_list_init(&bio_list_on_stack[0]); 716 + current->bio_list = bio_list_on_stack; 717 + 718 + do { 719 + struct request_queue *q = bdev_get_queue(bio->bi_bdev); 720 + struct bio_list lower, same; 721 + 722 + /* 723 + * Create a fresh bio_list for all subordinate requests. 724 + */ 725 + bio_list_on_stack[1] = bio_list_on_stack[0]; 726 + bio_list_init(&bio_list_on_stack[0]); 727 + 728 + __submit_bio(bio); 729 + 730 + /* 731 + * Sort new bios into those for a lower level and those for the 732 + * same level. 733 + */ 734 + bio_list_init(&lower); 735 + bio_list_init(&same); 736 + while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL) 737 + if (q == bdev_get_queue(bio->bi_bdev)) 738 + bio_list_add(&same, bio); 739 + else 740 + bio_list_add(&lower, bio); 741 + 742 + /* 743 + * Now assemble so we handle the lowest level first. 
744 + */ 745 + bio_list_merge(&bio_list_on_stack[0], &lower); 746 + bio_list_merge(&bio_list_on_stack[0], &same); 747 + bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]); 748 + } while ((bio = bio_list_pop(&bio_list_on_stack[0]))); 749 + 750 + current->bio_list = NULL; 751 + } 752 + 753 + static void __submit_bio_noacct_mq(struct bio *bio) 754 + { 755 + struct bio_list bio_list[2] = { }; 756 + 757 + current->bio_list = bio_list; 758 + 759 + do { 760 + __submit_bio(bio); 761 + } while ((bio = bio_list_pop(&bio_list[0]))); 762 + 763 + current->bio_list = NULL; 764 + } 765 + 766 + void submit_bio_noacct_nocheck(struct bio *bio) 767 + { 768 + /* 769 + * We only want one ->submit_bio to be active at a time, else stack 770 + * usage with stacked devices could be a problem. Use current->bio_list 771 + * to collect a list of requests submitted by a ->submit_bio method while 772 + * it is active, and then process them after it has returned. 773 + */ 774 + if (current->bio_list) 775 + bio_list_add(&current->bio_list[0], bio); 776 + else if (!bio->bi_bdev->bd_disk->fops->submit_bio) 777 + __submit_bio_noacct_mq(bio); 778 + else 779 + __submit_bio_noacct(bio); 780 + } 781 + 782 + /** 783 + * submit_bio_noacct - re-submit a bio to the block device layer for I/O 784 + * @bio: The bio describing the location in memory and on the device. 785 + * 786 + * This is a version of submit_bio() that shall only be used for I/O that is 787 + * resubmitted to lower level drivers by stacking block drivers. All file 788 + * systems and other upper level users of the block layer should use 789 + * submit_bio() instead. 
790 + */ 791 + void submit_bio_noacct(struct bio *bio) 663 792 { 664 793 struct block_device *bdev = bio->bi_bdev; 665 794 struct request_queue *q = bdev_get_queue(bdev); ··· 860 757 } 861 758 862 759 if (blk_throtl_bio(bio)) 863 - return false; 760 + return; 864 761 865 762 blk_cgroup_bio_start(bio); 866 763 blkcg_bio_issue_init(bio); ··· 872 769 */ 873 770 bio_set_flag(bio, BIO_TRACE_COMPLETION); 874 771 } 875 - return true; 772 + submit_bio_noacct_nocheck(bio); 773 + return; 876 774 877 775 not_supported: 878 776 status = BLK_STS_NOTSUPP; 879 777 end_io: 880 778 bio->bi_status = status; 881 779 bio_endio(bio); 882 - return false; 883 - } 884 - 885 - static void __submit_bio_fops(struct gendisk *disk, struct bio *bio) 886 - { 887 - if (blk_crypto_bio_prep(&bio)) { 888 - if (likely(bio_queue_enter(bio) == 0)) { 889 - disk->fops->submit_bio(bio); 890 - blk_queue_exit(disk->queue); 891 - } 892 - } 893 - } 894 - 895 - static void __submit_bio(struct bio *bio) 896 - { 897 - struct gendisk *disk = bio->bi_bdev->bd_disk; 898 - 899 - if (unlikely(!submit_bio_checks(bio))) 900 - return; 901 - 902 - if (!disk->fops->submit_bio) 903 - blk_mq_submit_bio(bio); 904 - else 905 - __submit_bio_fops(disk, bio); 906 - } 907 - 908 - /* 909 - * The loop in this function may be a bit non-obvious, and so deserves some 910 - * explanation: 911 - * 912 - * - Before entering the loop, bio->bi_next is NULL (as all callers ensure 913 - * that), so we have a list with a single bio. 914 - * - We pretend that we have just taken it off a longer list, so we assign 915 - * bio_list to a pointer to the bio_list_on_stack, thus initialising the 916 - * bio_list of new bios to be added. ->submit_bio() may indeed add some more 917 - * bios through a recursive call to submit_bio_noacct. If it did, we find a 918 - * non-NULL value in bio_list and re-enter the loop from the top. 
919 - * - In this case we really did just take the bio of the top of the list (no 920 - * pretending) and so remove it from bio_list, and call into ->submit_bio() 921 - * again. 922 - * 923 - * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio. 924 - * bio_list_on_stack[1] contains bios that were submitted before the current 925 - * ->submit_bio_bio, but that haven't been processed yet. 926 - */ 927 - static void __submit_bio_noacct(struct bio *bio) 928 - { 929 - struct bio_list bio_list_on_stack[2]; 930 - 931 - BUG_ON(bio->bi_next); 932 - 933 - bio_list_init(&bio_list_on_stack[0]); 934 - current->bio_list = bio_list_on_stack; 935 - 936 - do { 937 - struct request_queue *q = bdev_get_queue(bio->bi_bdev); 938 - struct bio_list lower, same; 939 - 940 - /* 941 - * Create a fresh bio_list for all subordinate requests. 942 - */ 943 - bio_list_on_stack[1] = bio_list_on_stack[0]; 944 - bio_list_init(&bio_list_on_stack[0]); 945 - 946 - __submit_bio(bio); 947 - 948 - /* 949 - * Sort new bios into those for a lower level and those for the 950 - * same level. 951 - */ 952 - bio_list_init(&lower); 953 - bio_list_init(&same); 954 - while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL) 955 - if (q == bdev_get_queue(bio->bi_bdev)) 956 - bio_list_add(&same, bio); 957 - else 958 - bio_list_add(&lower, bio); 959 - 960 - /* 961 - * Now assemble so we handle the lowest level first. 
962 - */ 963 - bio_list_merge(&bio_list_on_stack[0], &lower); 964 - bio_list_merge(&bio_list_on_stack[0], &same); 965 - bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]); 966 - } while ((bio = bio_list_pop(&bio_list_on_stack[0]))); 967 - 968 - current->bio_list = NULL; 969 - } 970 - 971 - static void __submit_bio_noacct_mq(struct bio *bio) 972 - { 973 - struct bio_list bio_list[2] = { }; 974 - 975 - current->bio_list = bio_list; 976 - 977 - do { 978 - __submit_bio(bio); 979 - } while ((bio = bio_list_pop(&bio_list[0]))); 980 - 981 - current->bio_list = NULL; 982 - } 983 - 984 - /** 985 - * submit_bio_noacct - re-submit a bio to the block device layer for I/O 986 - * @bio: The bio describing the location in memory and on the device. 987 - * 988 - * This is a version of submit_bio() that shall only be used for I/O that is 989 - * resubmitted to lower level drivers by stacking block drivers. All file 990 - * systems and other upper level users of the block layer should use 991 - * submit_bio() instead. 992 - */ 993 - void submit_bio_noacct(struct bio *bio) 994 - { 995 - /* 996 - * We only want one ->submit_bio to be active at a time, else stack 997 - * usage with stacked devices could be a problem. Use current->bio_list 998 - * to collect a list of requests submited by a ->submit_bio method while 999 - * it is active, and then process them after it returned. 
1000 - */ 1001 - if (current->bio_list) 1002 - bio_list_add(&current->bio_list[0], bio); 1003 - else if (!bio->bi_bdev->bd_disk->fops->submit_bio) 1004 - __submit_bio_noacct_mq(bio); 1005 - else 1006 - __submit_bio_noacct(bio); 1007 780 } 1008 781 EXPORT_SYMBOL(submit_bio_noacct); 1009 782 ··· 964 985 !test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) 965 986 return 0; 966 987 967 - if (current->plug) 968 - blk_flush_plug(current->plug, false); 988 + blk_flush_plug(current->plug, false); 969 989 970 990 if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT)) 971 991 return 0; ··· 1246 1268 } 1247 1269 EXPORT_SYMBOL(blk_check_plugged); 1248 1270 1249 - void blk_flush_plug(struct blk_plug *plug, bool from_schedule) 1271 + void __blk_flush_plug(struct blk_plug *plug, bool from_schedule) 1250 1272 { 1251 1273 if (!list_empty(&plug->cb_list)) 1252 1274 flush_plug_callbacks(plug, from_schedule); ··· 1275 1297 void blk_finish_plug(struct blk_plug *plug) 1276 1298 { 1277 1299 if (plug == current->plug) { 1278 - blk_flush_plug(plug, false); 1300 + __blk_flush_plug(plug, false); 1279 1301 current->plug = NULL; 1280 1302 } 1281 1303 }
+1 -1
block/blk-crypto-fallback.c
··· 10 10 #define pr_fmt(fmt) "blk-crypto-fallback: " fmt 11 11 12 12 #include <crypto/skcipher.h> 13 - #include <linux/blk-cgroup.h> 14 13 #include <linux/blk-crypto.h> 15 14 #include <linux/blk-crypto-profile.h> 16 15 #include <linux/blkdev.h> ··· 19 20 #include <linux/random.h> 20 21 #include <linux/scatterlist.h> 21 22 23 + #include "blk-cgroup.h" 22 24 #include "blk-crypto-internal.h" 23 25 24 26 static unsigned int num_prealloc_bounce_pg = 32;
+12
block/blk-crypto-internal.h
··· 11 11 12 12 /* Represents a crypto mode supported by blk-crypto */ 13 13 struct blk_crypto_mode { 14 + const char *name; /* name of this mode, shown in sysfs */ 14 15 const char *cipher_str; /* crypto API name (for fallback case) */ 15 16 unsigned int keysize; /* key size in bytes */ 16 17 unsigned int ivsize; /* iv size in bytes */ ··· 20 19 extern const struct blk_crypto_mode blk_crypto_modes[]; 21 20 22 21 #ifdef CONFIG_BLK_INLINE_ENCRYPTION 22 + 23 + int blk_crypto_sysfs_register(struct request_queue *q); 24 + 25 + void blk_crypto_sysfs_unregister(struct request_queue *q); 23 26 24 27 void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE], 25 28 unsigned int inc); ··· 66 61 } 67 62 68 63 #else /* CONFIG_BLK_INLINE_ENCRYPTION */ 64 + 65 + static inline int blk_crypto_sysfs_register(struct request_queue *q) 66 + { 67 + return 0; 68 + } 69 + 70 + static inline void blk_crypto_sysfs_unregister(struct request_queue *q) { } 69 71 70 72 static inline bool bio_crypt_rq_ctx_compatible(struct request *rq, 71 73 struct bio *bio)
+172
block/blk-crypto-sysfs.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright 2021 Google LLC 4 + * 5 + * sysfs support for blk-crypto. This file contains the code which exports the 6 + * crypto capabilities of devices via /sys/block/$disk/queue/crypto/. 7 + */ 8 + 9 + #include <linux/blk-crypto-profile.h> 10 + 11 + #include "blk-crypto-internal.h" 12 + 13 + struct blk_crypto_kobj { 14 + struct kobject kobj; 15 + struct blk_crypto_profile *profile; 16 + }; 17 + 18 + struct blk_crypto_attr { 19 + struct attribute attr; 20 + ssize_t (*show)(struct blk_crypto_profile *profile, 21 + struct blk_crypto_attr *attr, char *page); 22 + }; 23 + 24 + static struct blk_crypto_profile *kobj_to_crypto_profile(struct kobject *kobj) 25 + { 26 + return container_of(kobj, struct blk_crypto_kobj, kobj)->profile; 27 + } 28 + 29 + static struct blk_crypto_attr *attr_to_crypto_attr(struct attribute *attr) 30 + { 31 + return container_of(attr, struct blk_crypto_attr, attr); 32 + } 33 + 34 + static ssize_t max_dun_bits_show(struct blk_crypto_profile *profile, 35 + struct blk_crypto_attr *attr, char *page) 36 + { 37 + return sysfs_emit(page, "%u\n", 8 * profile->max_dun_bytes_supported); 38 + } 39 + 40 + static ssize_t num_keyslots_show(struct blk_crypto_profile *profile, 41 + struct blk_crypto_attr *attr, char *page) 42 + { 43 + return sysfs_emit(page, "%u\n", profile->num_slots); 44 + } 45 + 46 + #define BLK_CRYPTO_RO_ATTR(_name) \ 47 + static struct blk_crypto_attr _name##_attr = __ATTR_RO(_name) 48 + 49 + BLK_CRYPTO_RO_ATTR(max_dun_bits); 50 + BLK_CRYPTO_RO_ATTR(num_keyslots); 51 + 52 + static struct attribute *blk_crypto_attrs[] = { 53 + &max_dun_bits_attr.attr, 54 + &num_keyslots_attr.attr, 55 + NULL, 56 + }; 57 + 58 + static const struct attribute_group blk_crypto_attr_group = { 59 + .attrs = blk_crypto_attrs, 60 + }; 61 + 62 + /* 63 + * The encryption mode attributes. To avoid hard-coding the list of encryption 64 + * modes, these are initialized at boot time by blk_crypto_sysfs_init(). 
65 + */ 66 + static struct blk_crypto_attr __blk_crypto_mode_attrs[BLK_ENCRYPTION_MODE_MAX]; 67 + static struct attribute *blk_crypto_mode_attrs[BLK_ENCRYPTION_MODE_MAX + 1]; 68 + 69 + static umode_t blk_crypto_mode_is_visible(struct kobject *kobj, 70 + struct attribute *attr, int n) 71 + { 72 + struct blk_crypto_profile *profile = kobj_to_crypto_profile(kobj); 73 + struct blk_crypto_attr *a = attr_to_crypto_attr(attr); 74 + int mode_num = a - __blk_crypto_mode_attrs; 75 + 76 + if (profile->modes_supported[mode_num]) 77 + return 0444; 78 + return 0; 79 + } 80 + 81 + static ssize_t blk_crypto_mode_show(struct blk_crypto_profile *profile, 82 + struct blk_crypto_attr *attr, char *page) 83 + { 84 + int mode_num = attr - __blk_crypto_mode_attrs; 85 + 86 + return sysfs_emit(page, "0x%x\n", profile->modes_supported[mode_num]); 87 + } 88 + 89 + static const struct attribute_group blk_crypto_modes_attr_group = { 90 + .name = "modes", 91 + .attrs = blk_crypto_mode_attrs, 92 + .is_visible = blk_crypto_mode_is_visible, 93 + }; 94 + 95 + static const struct attribute_group *blk_crypto_attr_groups[] = { 96 + &blk_crypto_attr_group, 97 + &blk_crypto_modes_attr_group, 98 + NULL, 99 + }; 100 + 101 + static ssize_t blk_crypto_attr_show(struct kobject *kobj, 102 + struct attribute *attr, char *page) 103 + { 104 + struct blk_crypto_profile *profile = kobj_to_crypto_profile(kobj); 105 + struct blk_crypto_attr *a = attr_to_crypto_attr(attr); 106 + 107 + return a->show(profile, a, page); 108 + } 109 + 110 + static const struct sysfs_ops blk_crypto_attr_ops = { 111 + .show = blk_crypto_attr_show, 112 + }; 113 + 114 + static void blk_crypto_release(struct kobject *kobj) 115 + { 116 + kfree(container_of(kobj, struct blk_crypto_kobj, kobj)); 117 + } 118 + 119 + static struct kobj_type blk_crypto_ktype = { 120 + .default_groups = blk_crypto_attr_groups, 121 + .sysfs_ops = &blk_crypto_attr_ops, 122 + .release = blk_crypto_release, 123 + }; 124 + 125 + /* 126 + * If the request_queue has a 
blk_crypto_profile, create the "crypto" 127 + * subdirectory in sysfs (/sys/block/$disk/queue/crypto/). 128 + */ 129 + int blk_crypto_sysfs_register(struct request_queue *q) 130 + { 131 + struct blk_crypto_kobj *obj; 132 + int err; 133 + 134 + if (!q->crypto_profile) 135 + return 0; 136 + 137 + obj = kzalloc(sizeof(*obj), GFP_KERNEL); 138 + if (!obj) 139 + return -ENOMEM; 140 + obj->profile = q->crypto_profile; 141 + 142 + err = kobject_init_and_add(&obj->kobj, &blk_crypto_ktype, &q->kobj, 143 + "crypto"); 144 + if (err) { 145 + kobject_put(&obj->kobj); 146 + return err; 147 + } 148 + q->crypto_kobject = &obj->kobj; 149 + return 0; 150 + } 151 + 152 + void blk_crypto_sysfs_unregister(struct request_queue *q) 153 + { 154 + kobject_put(q->crypto_kobject); 155 + } 156 + 157 + static int __init blk_crypto_sysfs_init(void) 158 + { 159 + int i; 160 + 161 + BUILD_BUG_ON(BLK_ENCRYPTION_MODE_INVALID != 0); 162 + for (i = 1; i < BLK_ENCRYPTION_MODE_MAX; i++) { 163 + struct blk_crypto_attr *attr = &__blk_crypto_mode_attrs[i]; 164 + 165 + attr->attr.name = blk_crypto_modes[i].name; 166 + attr->attr.mode = 0444; 167 + attr->show = blk_crypto_mode_show; 168 + blk_crypto_mode_attrs[i - 1] = &attr->attr; 169 + } 170 + return 0; 171 + } 172 + subsys_initcall(blk_crypto_sysfs_init);
+3 -1
block/blk-crypto.c
··· 19 19 20 20 const struct blk_crypto_mode blk_crypto_modes[] = { 21 21 [BLK_ENCRYPTION_MODE_AES_256_XTS] = { 22 + .name = "AES-256-XTS", 22 23 .cipher_str = "xts(aes)", 23 24 .keysize = 64, 24 25 .ivsize = 16, 25 26 }, 26 27 [BLK_ENCRYPTION_MODE_AES_128_CBC_ESSIV] = { 28 + .name = "AES-128-CBC-ESSIV", 27 29 .cipher_str = "essiv(cbc(aes),sha256)", 28 30 .keysize = 16, 29 31 .ivsize = 16, 30 32 }, 31 33 [BLK_ENCRYPTION_MODE_ADIANTUM] = { 34 + .name = "Adiantum", 32 35 .cipher_str = "adiantum(xchacha12,aes)", 33 36 .keysize = 32, 34 37 .ivsize = 32, ··· 114 111 *dst->bi_crypt_context = *src->bi_crypt_context; 115 112 return 0; 116 113 } 117 - EXPORT_SYMBOL_GPL(__bio_crypt_clone); 118 114 119 115 /* Increments @dun by @inc, treating @dun as a multi-limb integer. */ 120 116 void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
+1 -3
block/blk-flush.c
··· 460 460 { 461 461 struct bio bio; 462 462 463 - bio_init(&bio, NULL, 0); 464 - bio_set_dev(&bio, bdev); 465 - bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 463 + bio_init(&bio, bdev, NULL, 0, REQ_OP_WRITE | REQ_PREFLUSH); 466 464 return submit_bio_wait(&bio); 467 465 } 468 466 EXPORT_SYMBOL(blkdev_issue_flush);
+1 -1
block/blk-iocost.c
··· 178 178 #include <linux/time64.h> 179 179 #include <linux/parser.h> 180 180 #include <linux/sched/signal.h> 181 - #include <linux/blk-cgroup.h> 182 181 #include <asm/local.h> 183 182 #include <asm/local64.h> 184 183 #include "blk-rq-qos.h" 185 184 #include "blk-stat.h" 186 185 #include "blk-wbt.h" 186 + #include "blk-cgroup.h" 187 187 188 188 #ifdef CONFIG_TRACEPOINTS 189 189
+1 -1
block/blk-iolatency.c
··· 74 74 #include <linux/sched/signal.h> 75 75 #include <trace/events/block.h> 76 76 #include <linux/blk-mq.h> 77 - #include <linux/blk-cgroup.h> 78 77 #include "blk-rq-qos.h" 79 78 #include "blk-stat.h" 79 + #include "blk-cgroup.h" 80 80 #include "blk.h" 81 81 82 82 #define DEFAULT_SCALE_COOKIE 1000000U
+1 -1
block/blk-ioprio.c
··· 12 12 * Documentation/admin-guide/cgroup-v2.rst. 13 13 */ 14 14 15 - #include <linux/blk-cgroup.h> 16 15 #include <linux/blk-mq.h> 17 16 #include <linux/blk_types.h> 18 17 #include <linux/kernel.h> 19 18 #include <linux/module.h> 19 + #include "blk-cgroup.h" 20 20 #include "blk-ioprio.h" 21 21 #include "blk-rq-qos.h" 22 22
+5 -41
block/blk-lib.c
··· 10 10 11 11 #include "blk.h" 12 12 13 - struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp) 14 - { 15 - struct bio *new = bio_alloc(gfp, nr_pages); 16 - 17 - if (bio) { 18 - bio_chain(bio, new); 19 - submit_bio(bio); 20 - } 21 - 22 - return new; 23 - } 24 - EXPORT_SYMBOL_GPL(blk_next_bio); 25 - 26 13 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, 27 14 sector_t nr_sects, gfp_t gfp_mask, int flags, 28 15 struct bio **biop) ··· 18 31 struct bio *bio = *biop; 19 32 unsigned int op; 20 33 sector_t bs_mask, part_offset = 0; 21 - 22 - if (!q) 23 - return -ENXIO; 24 34 25 35 if (bdev_read_only(bdev)) 26 36 return -EPERM; ··· 79 95 80 96 WARN_ON_ONCE((req_sects << 9) > UINT_MAX); 81 97 82 - bio = blk_next_bio(bio, 0, gfp_mask); 98 + bio = blk_next_bio(bio, bdev, 0, op, gfp_mask); 83 99 bio->bi_iter.bi_sector = sector; 84 - bio_set_dev(bio, bdev); 85 - bio_set_op_attrs(bio, op, 0); 86 - 87 100 bio->bi_iter.bi_size = req_sects << 9; 88 101 sector += req_sects; 89 102 nr_sects -= req_sects; ··· 153 172 struct bio *bio = *biop; 154 173 sector_t bs_mask; 155 174 156 - if (!q) 157 - return -ENXIO; 158 - 159 175 if (bdev_read_only(bdev)) 160 176 return -EPERM; 161 177 ··· 167 189 max_write_same_sectors = bio_allowed_max_sectors(q); 168 190 169 191 while (nr_sects) { 170 - bio = blk_next_bio(bio, 1, gfp_mask); 192 + bio = blk_next_bio(bio, bdev, 1, REQ_OP_WRITE_SAME, gfp_mask); 171 193 bio->bi_iter.bi_sector = sector; 172 - bio_set_dev(bio, bdev); 173 194 bio->bi_vcnt = 1; 174 195 bio->bi_io_vec->bv_page = page; 175 196 bio->bi_io_vec->bv_offset = 0; 176 197 bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev); 177 - bio_set_op_attrs(bio, REQ_OP_WRITE_SAME, 0); 178 198 179 199 if (nr_sects > max_write_same_sectors) { 180 200 bio->bi_iter.bi_size = max_write_same_sectors << 9; ··· 226 250 { 227 251 struct bio *bio = *biop; 228 252 unsigned int max_write_zeroes_sectors; 229 - struct request_queue *q = bdev_get_queue(bdev); 230 - 
231 - if (!q) 232 - return -ENXIO; 233 253 234 254 if (bdev_read_only(bdev)) 235 255 return -EPERM; ··· 237 265 return -EOPNOTSUPP; 238 266 239 267 while (nr_sects) { 240 - bio = blk_next_bio(bio, 0, gfp_mask); 268 + bio = blk_next_bio(bio, bdev, 0, REQ_OP_WRITE_ZEROES, gfp_mask); 241 269 bio->bi_iter.bi_sector = sector; 242 - bio_set_dev(bio, bdev); 243 - bio->bi_opf = REQ_OP_WRITE_ZEROES; 244 270 if (flags & BLKDEV_ZERO_NOUNMAP) 245 271 bio->bi_opf |= REQ_NOUNMAP; 246 272 ··· 274 304 sector_t sector, sector_t nr_sects, gfp_t gfp_mask, 275 305 struct bio **biop) 276 306 { 277 - struct request_queue *q = bdev_get_queue(bdev); 278 307 struct bio *bio = *biop; 279 308 int bi_size = 0; 280 309 unsigned int sz; 281 - 282 - if (!q) 283 - return -ENXIO; 284 310 285 311 if (bdev_read_only(bdev)) 286 312 return -EPERM; 287 313 288 314 while (nr_sects != 0) { 289 - bio = blk_next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects), 290 - gfp_mask); 315 + bio = blk_next_bio(bio, bdev, __blkdev_sectors_to_bio_pages(nr_sects), 316 + REQ_OP_WRITE, gfp_mask); 291 317 bio->bi_iter.bi_sector = sector; 292 - bio_set_dev(bio, bdev); 293 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 294 318 295 319 while (nr_sects != 0) { 296 320 sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
-2
block/blk-merge.c
··· 368 368 trace_block_split(split, (*bio)->bi_iter.bi_sector); 369 369 submit_bio_noacct(*bio); 370 370 *bio = split; 371 - 372 - blk_throtl_charge_bio_split(*bio); 373 371 } 374 372 } 375 373
+1 -1
block/blk-mq-tag.c
··· 107 107 return BLK_MQ_NO_TAG; 108 108 109 109 if (data->shallow_depth) 110 - return __sbitmap_queue_get_shallow(bt, data->shallow_depth); 110 + return sbitmap_queue_get_shallow(bt, data->shallow_depth); 111 111 else 112 112 return __sbitmap_queue_get(bt); 113 113 }
+22 -42
block/blk-mq.c
··· 793 793 #endif 794 794 795 795 if (unlikely(error && !blk_rq_is_passthrough(req) && 796 - !(req->rq_flags & RQF_QUIET))) 796 + !(req->rq_flags & RQF_QUIET))) { 797 797 blk_print_req_error(req, error); 798 + trace_block_rq_error(req, error, nr_bytes); 799 + } 798 800 799 801 blk_account_io_completion(req, nr_bytes); 800 802 ··· 2184 2182 if (blk_mq_hctx_stopped(hctx)) 2185 2183 continue; 2186 2184 /* 2185 + * If there is already a run_work pending, leave the 2186 + * pending delay untouched. Otherwise, a hctx can stall 2187 + * if another hctx is re-delaying the other's work 2188 + * before the work executes. 2189 + */ 2190 + if (delayed_work_pending(&hctx->run_work)) 2191 + continue; 2192 + /* 2187 2193 * Dispatch from this hctx either if there's no hctx preferred 2188 2194 * by IO scheduler or if it has requests that bypass the 2189 2195 * scheduler. ··· 2800 2790 unsigned int nr_segs = 1; 2801 2791 blk_status_t ret; 2802 2792 2803 - if (unlikely(!blk_crypto_bio_prep(&bio))) 2804 - return; 2805 - 2806 2793 blk_queue_bounce(q, &bio); 2807 2794 if (blk_may_split(q, bio)) 2808 2795 __blk_queue_split(q, &bio, &nr_segs); ··· 2849 2842 blk_mq_try_issue_directly(rq->mq_hctx, rq)); 2850 2843 } 2851 2844 2845 + #ifdef CONFIG_BLK_MQ_STACKING 2852 2846 /** 2853 - * blk_cloned_rq_check_limits - Helper function to check a cloned request 2854 - * for the new queue limits 2855 - * @q: the queue 2856 - * @rq: the request being checked 2857 - * 2858 - * Description: 2859 - * @rq may have been made based on weaker limitations of upper-level queues 2860 - * in request stacking drivers, and it may violate the limitation of @q. 2861 - * Since the block layer and the underlying device driver trust @rq 2862 - * after it is inserted to @q, it should be checked against @q before 2863 - * the insertion using this generic function. 2864 - * 2865 - * Request stacking drivers like request-based dm may change the queue 2866 - * limits when retrying requests on other queues. 
Those requests need 2867 - * to be checked against the new queue limits again during dispatch. 2847 + * blk_insert_cloned_request - Helper for stacking drivers to submit a request 2848 + * @rq: the request being queued 2868 2849 */ 2869 - static blk_status_t blk_cloned_rq_check_limits(struct request_queue *q, 2870 - struct request *rq) 2850 + blk_status_t blk_insert_cloned_request(struct request *rq) 2871 2851 { 2852 + struct request_queue *q = rq->q; 2872 2853 unsigned int max_sectors = blk_queue_get_max_sectors(q, req_op(rq)); 2854 + blk_status_t ret; 2873 2855 2874 2856 if (blk_rq_sectors(rq) > max_sectors) { 2875 2857 /* ··· 2890 2894 return BLK_STS_IOERR; 2891 2895 } 2892 2896 2893 - return BLK_STS_OK; 2894 - } 2895 - 2896 - /** 2897 - * blk_insert_cloned_request - Helper for stacking drivers to submit a request 2898 - * @q: the queue to submit the request 2899 - * @rq: the request being queued 2900 - */ 2901 - blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *rq) 2902 - { 2903 - blk_status_t ret; 2904 - 2905 - ret = blk_cloned_rq_check_limits(q, rq); 2906 - if (ret != BLK_STS_OK) 2907 - return ret; 2908 - 2909 - if (rq->q->disk && 2910 - should_fail_request(rq->q->disk->part0, blk_rq_bytes(rq))) 2897 + if (q->disk && should_fail_request(q->disk->part0, blk_rq_bytes(rq))) 2911 2898 return BLK_STS_IOERR; 2912 2899 2913 2900 if (blk_crypto_insert_cloned_request(rq)) ··· 2903 2924 * bypass a potential scheduler on the bottom device for 2904 2925 * insert. 
2905 2926 */ 2906 - blk_mq_run_dispatch_ops(rq->q, 2927 + blk_mq_run_dispatch_ops(q, 2907 2928 ret = blk_mq_request_issue_directly(rq, true)); 2908 2929 if (ret) 2909 2930 blk_account_io_done(rq, ktime_get_ns()); ··· 2958 2979 bs = &fs_bio_set; 2959 2980 2960 2981 __rq_for_each_bio(bio_src, rq_src) { 2961 - bio = bio_clone_fast(bio_src, gfp_mask, bs); 2982 + bio = bio_alloc_clone(rq->q->disk->part0, bio_src, gfp_mask, 2983 + bs); 2962 2984 if (!bio) 2963 2985 goto free_and_out; 2964 - bio->bi_bdev = rq->q->disk->part0; 2965 2986 2966 2987 if (bio_ctr && bio_ctr(bio, bio_src, data)) 2967 2988 goto free_and_out; ··· 2998 3019 return -ENOMEM; 2999 3020 } 3000 3021 EXPORT_SYMBOL_GPL(blk_rq_prep_clone); 3022 + #endif /* CONFIG_BLK_MQ_STACKING */ 3001 3023 3002 3024 /* 3003 3025 * Steal bios from a request and add them to a bio list.
+13 -6
block/blk-sysfs.c
··· 10 10 #include <linux/backing-dev.h> 11 11 #include <linux/blktrace_api.h> 12 12 #include <linux/blk-mq.h> 13 - #include <linux/blk-cgroup.h> 14 13 #include <linux/debugfs.h> 15 14 16 15 #include "blk.h" ··· 17 18 #include "blk-mq-debugfs.h" 18 19 #include "blk-mq-sched.h" 19 20 #include "blk-wbt.h" 21 + #include "blk-cgroup.h" 20 22 #include "blk-throttle.h" 21 23 22 24 struct queue_sysfs_entry { ··· 880 880 goto put_dev; 881 881 } 882 882 883 + ret = blk_crypto_sysfs_register(q); 884 + if (ret) 885 + goto put_dev; 886 + 883 887 blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q); 884 888 wbt_enable_default(q); 885 889 blk_throtl_register_queue(q); ··· 914 910 return ret; 915 911 916 912 put_dev: 913 + elv_unregister_queue(q); 917 914 disk_unregister_independent_access_ranges(disk); 918 915 mutex_unlock(&q->sysfs_lock); 919 916 mutex_unlock(&q->sysfs_dir_lock); ··· 959 954 */ 960 955 if (queue_is_mq(q)) 961 956 blk_mq_unregister_dev(disk_to_dev(disk), q); 962 - 963 - kobject_uevent(&q->kobj, KOBJ_REMOVE); 964 - kobject_del(&q->kobj); 957 + blk_crypto_sysfs_unregister(q); 965 958 blk_trace_remove_sysfs(disk_to_dev(disk)); 966 959 967 960 mutex_lock(&q->sysfs_lock); 968 - if (q->elevator) 969 - elv_unregister_queue(q); 961 + elv_unregister_queue(q); 970 962 disk_unregister_independent_access_ranges(disk); 971 963 mutex_unlock(&q->sysfs_lock); 964 + 965 + /* Now that we've deleted all child objects, we can delete the queue. */ 966 + kobject_uevent(&q->kobj, KOBJ_REMOVE); 967 + kobject_del(&q->kobj); 968 + 972 969 mutex_unlock(&q->sysfs_dir_lock); 973 970 974 971 kobject_put(&disk_to_dev(disk)->kobj);
+22 -40
block/blk-throttle.c
··· 10 10 #include <linux/blkdev.h> 11 11 #include <linux/bio.h> 12 12 #include <linux/blktrace_api.h> 13 - #include <linux/blk-cgroup.h> 14 13 #include "blk.h" 15 14 #include "blk-cgroup-rwstat.h" 16 15 #include "blk-stat.h" ··· 40 41 41 42 /* A workqueue to queue throttle related work */ 42 43 static struct workqueue_struct *kthrotld_workqueue; 43 - 44 - enum tg_state_flags { 45 - THROTL_TG_PENDING = 1 << 0, /* on parent's pending tree */ 46 - THROTL_TG_WAS_EMPTY = 1 << 1, /* bio_lists[] became non-empty */ 47 - }; 48 44 49 45 #define rb_entry_tg(node) rb_entry((node), struct throtl_grp, rb_node) 50 46 ··· 420 426 struct throtl_grp *parent_tg = sq_to_tg(tg->service_queue.parent_sq); 421 427 struct throtl_data *td = tg->td; 422 428 int rw; 429 + int has_iops_limit = 0; 423 430 424 - for (rw = READ; rw <= WRITE; rw++) 431 + for (rw = READ; rw <= WRITE; rw++) { 432 + unsigned int iops_limit = tg_iops_limit(tg, rw); 433 + 425 434 tg->has_rules[rw] = (parent_tg && parent_tg->has_rules[rw]) || 426 435 (td->limit_valid[td->limit_index] && 427 436 (tg_bps_limit(tg, rw) != U64_MAX || 428 - tg_iops_limit(tg, rw) != UINT_MAX)); 437 + iops_limit != UINT_MAX)); 438 + 439 + if (iops_limit != UINT_MAX) 440 + has_iops_limit = 1; 441 + } 442 + 443 + if (has_iops_limit) 444 + tg->flags |= THROTL_TG_HAS_IOPS_LIMIT; 445 + else 446 + tg->flags &= ~THROTL_TG_HAS_IOPS_LIMIT; 429 447 } 430 448 431 449 static void throtl_pd_online(struct blkg_policy_data *pd) ··· 640 634 tg->bytes_disp[rw] = 0; 641 635 tg->io_disp[rw] = 0; 642 636 643 - atomic_set(&tg->io_split_cnt[rw], 0); 644 - 645 637 /* 646 638 * Previous slice has expired. We must have trimmed it after last 647 639 * bio dispatch. 
That means since start of last slice, we never used ··· 662 658 tg->io_disp[rw] = 0; 663 659 tg->slice_start[rw] = jiffies; 664 660 tg->slice_end[rw] = jiffies + tg->td->throtl_slice; 665 - 666 - atomic_set(&tg->io_split_cnt[rw], 0); 667 661 668 662 throtl_log(&tg->service_queue, 669 663 "[%c] new slice start=%lu end=%lu jiffies=%lu", ··· 810 808 unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd; 811 809 unsigned int bio_size = throtl_bio_data_size(bio); 812 810 813 - if (bps_limit == U64_MAX) { 811 + /* no need to throttle if this bio's bytes have been accounted */ 812 + if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) { 814 813 if (wait) 815 814 *wait = 0; 816 815 return true; ··· 896 893 jiffies + tg->td->throtl_slice); 897 894 } 898 895 899 - if (iops_limit != UINT_MAX) 900 - tg->io_disp[rw] += atomic_xchg(&tg->io_split_cnt[rw], 0); 901 - 902 896 if (tg_with_in_bps_limit(tg, bio, bps_limit, &bps_wait) && 903 897 tg_with_in_iops_limit(tg, bio, iops_limit, &iops_wait)) { 904 898 if (wait) ··· 920 920 unsigned int bio_size = throtl_bio_data_size(bio); 921 921 922 922 /* Charge the bio to the group */ 923 - tg->bytes_disp[rw] += bio_size; 923 + if (!bio_flagged(bio, BIO_THROTTLED)) { 924 + tg->bytes_disp[rw] += bio_size; 925 + tg->last_bytes_disp[rw] += bio_size; 926 + } 927 + 924 928 tg->io_disp[rw]++; 925 - tg->last_bytes_disp[rw] += bio_size; 926 929 tg->last_io_disp[rw]++; 927 930 928 931 /* ··· 1222 1219 if (!bio_list_empty(&bio_list_on_stack)) { 1223 1220 blk_start_plug(&plug); 1224 1221 while ((bio = bio_list_pop(&bio_list_on_stack))) 1225 - submit_bio_noacct(bio); 1222 + submit_bio_noacct_nocheck(bio); 1226 1223 blk_finish_plug(&plug); 1227 1224 } 1228 1225 } ··· 1920 1917 } 1921 1918 1922 1919 if (tg->iops[READ][LIMIT_LOW]) { 1923 - tg->last_io_disp[READ] += atomic_xchg(&tg->last_io_split_cnt[READ], 0); 1924 1920 iops = tg->last_io_disp[READ] * HZ / elapsed_time; 1925 1921 if (iops >= tg->iops[READ][LIMIT_LOW]) 1926 1922 
tg->last_low_overflow_time[READ] = now; 1927 1923 } 1928 1924 1929 1925 if (tg->iops[WRITE][LIMIT_LOW]) { 1930 - tg->last_io_disp[WRITE] += atomic_xchg(&tg->last_io_split_cnt[WRITE], 0); 1931 1926 iops = tg->last_io_disp[WRITE] * HZ / elapsed_time; 1932 1927 if (iops >= tg->iops[WRITE][LIMIT_LOW]) 1933 1928 tg->last_low_overflow_time[WRITE] = now; ··· 2043 2042 { 2044 2043 } 2045 2044 #endif 2046 - 2047 - void blk_throtl_charge_bio_split(struct bio *bio) 2048 - { 2049 - struct blkcg_gq *blkg = bio->bi_blkg; 2050 - struct throtl_grp *parent = blkg_to_tg(blkg); 2051 - struct throtl_service_queue *parent_sq; 2052 - bool rw = bio_data_dir(bio); 2053 - 2054 - do { 2055 - if (!parent->has_rules[rw]) 2056 - break; 2057 - 2058 - atomic_inc(&parent->io_split_cnt[rw]); 2059 - atomic_inc(&parent->last_io_split_cnt[rw]); 2060 - 2061 - parent_sq = parent->service_queue.parent_sq; 2062 - parent = sq_to_tg(parent_sq); 2063 - } while (parent); 2064 - } 2065 2045 2066 2046 bool __blk_throtl_bio(struct bio *bio) 2067 2047 {
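The throtl_charge_bio change above skips the bytes charge for a bio that already carries BIO_THROTTLED (it was charged before being split/resubmitted), while still charging it against the iops budget. A minimal userspace sketch of that rule, with toy stand-in names rather than the kernel's types:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the charging rule in the hunk above: a bio already
 * flagged as throttled must not be charged bytes a second time,
 * but it still consumes one unit of the iops budget.
 * "toy_tg"/"toy_charge_bio" are illustrative stand-ins. */
struct toy_tg {
	unsigned long long bytes_disp;
	unsigned long long io_disp;
};

static void toy_charge_bio(struct toy_tg *tg, unsigned int bio_size,
			   bool already_throttled)
{
	if (!already_throttled)
		tg->bytes_disp += bio_size;
	tg->io_disp++;		/* iops is charged unconditionally */
}
```

Charging the same 4 KiB bio twice, the second time flagged, grows bytes_disp once but io_disp twice, matching the split-accounting intent of the series.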
+10 -6
block/blk-throttle.h
··· 52 52 struct timer_list pending_timer; /* fires on first_pending_disptime */ 53 53 }; 54 54 55 + enum tg_state_flags { 56 + THROTL_TG_PENDING = 1 << 0, /* on parent's pending tree */ 57 + THROTL_TG_WAS_EMPTY = 1 << 1, /* bio_lists[] became non-empty */ 58 + THROTL_TG_HAS_IOPS_LIMIT = 1 << 2, /* tg has iops limit */ 59 + }; 60 + 55 61 enum { 56 62 LIMIT_LOW, 57 63 LIMIT_MAX, ··· 138 132 unsigned int bad_bio_cnt; /* bios exceeding latency threshold */ 139 133 unsigned long bio_cnt_reset_time; 140 134 141 - atomic_t io_split_cnt[2]; 142 - atomic_t last_io_split_cnt[2]; 143 - 144 135 struct blkg_rwstat stat_bytes; 145 136 struct blkg_rwstat stat_ios; 146 137 }; ··· 161 158 static inline int blk_throtl_init(struct request_queue *q) { return 0; } 162 159 static inline void blk_throtl_exit(struct request_queue *q) { } 163 160 static inline void blk_throtl_register_queue(struct request_queue *q) { } 164 - static inline void blk_throtl_charge_bio_split(struct bio *bio) { } 165 161 static inline bool blk_throtl_bio(struct bio *bio) { return false; } 166 162 #else /* CONFIG_BLK_DEV_THROTTLING */ 167 163 int blk_throtl_init(struct request_queue *q); 168 164 void blk_throtl_exit(struct request_queue *q); 169 165 void blk_throtl_register_queue(struct request_queue *q); 170 - void blk_throtl_charge_bio_split(struct bio *bio); 171 166 bool __blk_throtl_bio(struct bio *bio); 172 167 static inline bool blk_throtl_bio(struct bio *bio) 173 168 { 174 169 struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg); 175 170 176 - if (bio_flagged(bio, BIO_THROTTLED)) 171 + /* no need to throttle bps any more if the bio has been throttled */ 172 + if (bio_flagged(bio, BIO_THROTTLED) && 173 + !(tg->flags & THROTL_TG_HAS_IOPS_LIMIT)) 177 174 return false; 175 + 178 176 if (!tg->has_rules[bio_data_dir(bio)]) 179 177 return false; 180 178
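The new THROTL_TG_HAS_IOPS_LIMIT flag declared above is derived in tg_update_has_rules: scan both I/O directions and set the flag iff any direction has a finite iops limit, else clear it. A userspace sketch of that derivation, assuming simplified stand-in names:

```c
#include <assert.h>
#include <limits.h>

/* Sketch of how the series derives THROTL_TG_HAS_IOPS_LIMIT:
 * the flag is set iff either direction (READ=0, WRITE=1) has an
 * iops limit other than the UINT_MAX "unlimited" sentinel.
 * The enum value mirrors the header above; the rest is a toy. */
enum { TOY_THROTL_TG_HAS_IOPS_LIMIT = 1 << 2 };

static void toy_update_iops_flag(unsigned long *flags,
				 const unsigned int iops_limit[2])
{
	int rw, has_iops_limit = 0;

	for (rw = 0; rw < 2; rw++)
		if (iops_limit[rw] != UINT_MAX)
			has_iops_limit = 1;

	if (has_iops_limit)
		*flags |= TOY_THROTL_TG_HAS_IOPS_LIMIT;
	else
		*flags &= ~TOY_THROTL_TG_HAS_IOPS_LIMIT;
}
```

Caching the result as a group flag lets the blk_throtl_bio fast path above skip already-throttled bios with one bit test instead of re-reading both per-direction limits.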
+4 -10
block/blk-zoned.c
··· 215 215 continue; 216 216 } 217 217 218 - bio = blk_next_bio(bio, 0, gfp_mask); 219 - bio_set_dev(bio, bdev); 220 - bio->bi_opf = REQ_OP_ZONE_RESET | REQ_SYNC; 218 + bio = blk_next_bio(bio, bdev, 0, REQ_OP_ZONE_RESET | REQ_SYNC, 219 + gfp_mask); 221 220 bio->bi_iter.bi_sector = sector; 222 221 sector += zone_sectors; 223 222 ··· 238 239 { 239 240 struct bio bio; 240 241 241 - bio_init(&bio, NULL, 0); 242 - bio_set_dev(&bio, bdev); 243 - bio.bi_opf = REQ_OP_ZONE_RESET_ALL | REQ_SYNC; 244 - 242 + bio_init(&bio, bdev, NULL, 0, REQ_OP_ZONE_RESET_ALL | REQ_SYNC); 245 243 return submit_bio_wait(&bio); 246 244 } 247 245 ··· 302 306 } 303 307 304 308 while (sector < end_sector) { 305 - bio = blk_next_bio(bio, 0, gfp_mask); 306 - bio_set_dev(bio, bdev); 307 - bio->bi_opf = op | REQ_SYNC; 309 + bio = blk_next_bio(bio, bdev, 0, op | REQ_SYNC, gfp_mask); 308 310 bio->bi_iter.bi_sector = sector; 309 311 sector += zone_sectors; 310 312
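The blk-zoned.c hunks above show the pattern that runs through most of this series: the block device and operation flags move from post-allocation fixups (bio_set_dev, bio_set_op_attrs) into the allocation/init call itself. A toy model of the before/after shapes, using illustrative struct names rather than the kernel's bio types:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the API consolidation: both styles end in the same
 * bio state, but the new style cannot leave bi_bdev/bi_opf unset.
 * "toy_bio"/"toy_bdev" are stand-ins, not kernel types. */
struct toy_bdev { int id; };

struct toy_bio {
	struct toy_bdev *bi_bdev;
	unsigned int bi_opf;
};

/* new style: device and opf are parameters of init */
static void toy_bio_init(struct toy_bio *bio, struct toy_bdev *bdev,
			 unsigned int opf)
{
	bio->bi_bdev = bdev;
	bio->bi_opf = opf;
}

/* old style being removed: init, then patch fields afterwards */
static void toy_bio_init_old(struct toy_bio *bio, struct toy_bdev *bdev,
			     unsigned int opf)
{
	bio->bi_bdev = NULL;
	bio->bi_opf = 0;
	bio->bi_bdev = bdev;	/* was bio_set_dev() */
	bio->bi_opf = opf;	/* was bio_set_op_attrs() */
}
```

The two init paths yield identical state; folding the assignments into one call removes a window where a bio exists without a device or opf.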
+5 -3
block/blk.h
··· 46 46 void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic); 47 47 void blk_queue_start_drain(struct request_queue *q); 48 48 int __bio_queue_enter(struct request_queue *q, struct bio *bio); 49 - bool submit_bio_checks(struct bio *bio); 49 + void submit_bio_noacct_nocheck(struct bio *bio); 50 50 51 51 static inline bool blk_try_enter_queue(struct request_queue *q, bool pm) 52 52 { ··· 406 406 static inline int blk_iolatency_init(struct request_queue *q) { return 0; } 407 407 #endif 408 408 409 - struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp); 410 - 411 409 #ifdef CONFIG_BLK_DEV_ZONED 412 410 void blk_queue_free_zone_bitmaps(struct request_queue *q); 413 411 void blk_queue_clear_zone_settings(struct request_queue *q); ··· 424 426 int bdev_del_partition(struct gendisk *disk, int partno); 425 427 int bdev_resize_partition(struct gendisk *disk, int partno, sector_t start, 426 428 sector_t length); 429 + void blk_drop_partitions(struct gendisk *disk); 427 430 428 431 int bio_add_hw_page(struct request_queue *q, struct bio *bio, 429 432 struct page *page, unsigned int len, unsigned int offset, ··· 444 445 void disk_add_events(struct gendisk *disk); 445 446 void disk_del_events(struct gendisk *disk); 446 447 void disk_release_events(struct gendisk *disk); 448 + void disk_block_events(struct gendisk *disk); 449 + void disk_unblock_events(struct gendisk *disk); 450 + void disk_flush_events(struct gendisk *disk, unsigned int mask); 447 451 extern struct device_attribute dev_attr_events; 448 452 extern struct device_attribute dev_attr_events_async; 449 453 extern struct device_attribute dev_attr_events_poll_msecs;
+4 -7
block/bounce.c
··· 14 14 #include <linux/pagemap.h> 15 15 #include <linux/mempool.h> 16 16 #include <linux/blkdev.h> 17 - #include <linux/blk-cgroup.h> 18 17 #include <linux/backing-dev.h> 19 18 #include <linux/init.h> 20 19 #include <linux/hash.h> ··· 23 24 24 25 #include <trace/events/block.h> 25 26 #include "blk.h" 27 + #include "blk-cgroup.h" 26 28 27 29 #define POOL_SIZE 64 28 30 #define ISA_POOL_SIZE 16 ··· 162 162 * that does not own the bio - reason being drivers don't use it for 163 163 * iterating over the biovec anymore, so expecting it to be kept up 164 164 * to date (i.e. for clones that share the parent biovec) is just 165 - * asking for trouble and would force extra work on 166 - * __bio_clone_fast() anyways. 165 + * asking for trouble and would force extra work. 167 166 */ 168 - bio = bio_alloc_bioset(GFP_NOIO, bio_segments(bio_src), 169 - &bounce_bio_set); 170 - bio->bi_bdev = bio_src->bi_bdev; 167 + bio = bio_alloc_bioset(bio_src->bi_bdev, bio_segments(bio_src), 168 + bio_src->bi_opf, GFP_NOIO, &bounce_bio_set); 171 169 if (bio_flagged(bio_src, BIO_REMAPPED)) 172 170 bio_set_flag(bio, BIO_REMAPPED); 173 - bio->bi_opf = bio_src->bi_opf; 174 171 bio->bi_ioprio = bio_src->bi_ioprio; 175 172 bio->bi_write_hint = bio_src->bi_write_hint; 176 173 bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
+1 -1
block/disk-events.c
··· 4 4 */ 5 5 #include <linux/export.h> 6 6 #include <linux/moduleparam.h> 7 - #include <linux/genhd.h> 7 + #include <linux/blkdev.h> 8 8 #include "blk.h" 9 9 10 10 struct disk_events {
+5 -5
block/elevator.c
··· 35 35 #include <linux/hash.h> 36 36 #include <linux/uaccess.h> 37 37 #include <linux/pm_runtime.h> 38 - #include <linux/blk-cgroup.h> 39 38 40 39 #include <trace/events/block.h> 41 40 ··· 43 44 #include "blk-mq-sched.h" 44 45 #include "blk-pm.h" 45 46 #include "blk-wbt.h" 47 + #include "blk-cgroup.h" 46 48 47 49 static DEFINE_SPINLOCK(elv_list_lock); 48 50 static LIST_HEAD(elv_list); ··· 516 516 517 517 void elv_unregister_queue(struct request_queue *q) 518 518 { 519 + struct elevator_queue *e = q->elevator; 520 + 519 521 lockdep_assert_held(&q->sysfs_lock); 520 522 521 - if (q) { 523 + if (e && e->registered) { 522 524 struct elevator_queue *e = q->elevator; 523 525 524 526 kobject_uevent(&e->kobj, KOBJ_REMOVE); ··· 593 591 lockdep_assert_held(&q->sysfs_lock); 594 592 595 593 if (q->elevator) { 596 - if (q->elevator->registered) 597 - elv_unregister_queue(q); 598 - 594 + elv_unregister_queue(q); 599 595 ioc_clear_queue(q); 600 596 blk_mq_sched_free_rqs(q); 601 597 elevator_exit(q);
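The elevator.c change above moves the "is it registered?" check inside elv_unregister_queue, so callers (including the error-unwind path added in blk-sysfs.c) can invoke it unconditionally. A userspace sketch of this idempotent-unregister convention, with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the simplified calling convention: the unregister helper
 * owns the registered check, so calling it twice, or on a queue with
 * no elevator, is harmless. Names are illustrative stand-ins. */
struct toy_elevator {
	bool registered;
	int removals;		/* counts KOBJ_REMOVE-style teardowns */
};

static void toy_elv_unregister(struct toy_elevator *e)
{
	if (e && e->registered) {
		e->removals++;	/* kobject_uevent + kobject_del here */
		e->registered = false;
	}
}
```

Making teardown helpers safe to call from any state is what lets the blk-sysfs.c error path above simply add `elv_unregister_queue(q)` without tracking whether registration succeeded.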
+16 -19
block/fops.c
··· 75 75 return -ENOMEM; 76 76 } 77 77 78 - bio_init(&bio, vecs, nr_pages); 79 - bio_set_dev(&bio, bdev); 78 + if (iov_iter_rw(iter) == READ) { 79 + bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ); 80 + if (iter_is_iovec(iter)) 81 + should_dirty = true; 82 + } else { 83 + bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb)); 84 + } 80 85 bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT; 81 86 bio.bi_write_hint = iocb->ki_hint; 82 87 bio.bi_private = current; ··· 93 88 goto out; 94 89 ret = bio.bi_iter.bi_size; 95 90 96 - if (iov_iter_rw(iter) == READ) { 97 - bio.bi_opf = REQ_OP_READ; 98 - if (iter_is_iovec(iter)) 99 - should_dirty = true; 100 - } else { 101 - bio.bi_opf = dio_bio_write_op(iocb); 91 + if (iov_iter_rw(iter) == WRITE) 102 92 task_io_account_write(ret); 103 - } 93 + 104 94 if (iocb->ki_flags & IOCB_NOWAIT) 105 95 bio.bi_opf |= REQ_NOWAIT; 106 96 if (iocb->ki_flags & IOCB_HIPRI) ··· 190 190 struct blkdev_dio *dio; 191 191 struct bio *bio; 192 192 bool is_read = (iov_iter_rw(iter) == READ), is_sync; 193 + unsigned int opf = is_read ? 
REQ_OP_READ : dio_bio_write_op(iocb); 193 194 loff_t pos = iocb->ki_pos; 194 195 int ret = 0; 195 196 ··· 198 197 (bdev_logical_block_size(bdev) - 1)) 199 198 return -EINVAL; 200 199 201 - bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool); 200 + bio = bio_alloc_kiocb(iocb, bdev, nr_pages, opf, &blkdev_dio_pool); 202 201 203 202 dio = container_of(bio, struct blkdev_dio, bio); 204 203 atomic_set(&dio->ref, 1); ··· 224 223 blk_start_plug(&plug); 225 224 226 225 for (;;) { 227 - bio_set_dev(bio, bdev); 228 226 bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT; 229 227 bio->bi_write_hint = iocb->ki_hint; 230 228 bio->bi_private = dio; ··· 238 238 } 239 239 240 240 if (is_read) { 241 - bio->bi_opf = REQ_OP_READ; 242 241 if (dio->flags & DIO_SHOULD_DIRTY) 243 242 bio_set_pages_dirty(bio); 244 243 } else { 245 - bio->bi_opf = dio_bio_write_op(iocb); 246 244 task_io_account_write(bio->bi_iter.bi_size); 247 245 } 248 246 if (iocb->ki_flags & IOCB_NOWAIT) ··· 256 258 } 257 259 atomic_inc(&dio->ref); 258 260 submit_bio(bio); 259 - bio = bio_alloc(GFP_KERNEL, nr_pages); 261 + bio = bio_alloc(bdev, nr_pages, opf, GFP_KERNEL); 260 262 } 261 263 262 264 blk_finish_plug(&plug); ··· 311 313 unsigned int nr_pages) 312 314 { 313 315 struct block_device *bdev = iocb->ki_filp->private_data; 316 + bool is_read = iov_iter_rw(iter) == READ; 317 + unsigned int opf = is_read ? 
REQ_OP_READ : dio_bio_write_op(iocb); 314 318 struct blkdev_dio *dio; 315 319 struct bio *bio; 316 320 loff_t pos = iocb->ki_pos; ··· 322 322 (bdev_logical_block_size(bdev) - 1)) 323 323 return -EINVAL; 324 324 325 - bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool); 325 + bio = bio_alloc_kiocb(iocb, bdev, nr_pages, opf, &blkdev_dio_pool); 326 326 dio = container_of(bio, struct blkdev_dio, bio); 327 327 dio->flags = 0; 328 328 dio->iocb = iocb; 329 - bio_set_dev(bio, bdev); 330 329 bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT; 331 330 bio->bi_write_hint = iocb->ki_hint; 332 331 bio->bi_end_io = blkdev_bio_end_io_async; ··· 348 349 } 349 350 dio->size = bio->bi_iter.bi_size; 350 351 351 - if (iov_iter_rw(iter) == READ) { 352 - bio->bi_opf = REQ_OP_READ; 352 + if (is_read) { 353 353 if (iter_is_iovec(iter)) { 354 354 dio->flags |= DIO_SHOULD_DIRTY; 355 355 bio_set_pages_dirty(bio); 356 356 } 357 357 } else { 358 - bio->bi_opf = dio_bio_write_op(iocb); 359 358 task_io_account_write(bio->bi_iter.bi_size); 360 359 } 361 360
+23 -3
block/genhd.c
··· 8 8 #include <linux/module.h> 9 9 #include <linux/ctype.h> 10 10 #include <linux/fs.h> 11 - #include <linux/genhd.h> 12 11 #include <linux/kdev_t.h> 13 12 #include <linux/kernel.h> 14 13 #include <linux/blkdev.h> ··· 184 185 struct blk_major_name *next; 185 186 int major; 186 187 char name[16]; 188 + #ifdef CONFIG_BLOCK_LEGACY_AUTOLOAD 187 189 void (*probe)(dev_t devt); 190 + #endif 188 191 } *major_names[BLKDEV_MAJOR_HASH_SIZE]; 189 192 static DEFINE_MUTEX(major_names_lock); 190 193 static DEFINE_SPINLOCK(major_names_spinlock); ··· 276 275 } 277 276 278 277 p->major = major; 278 + #ifdef CONFIG_BLOCK_LEGACY_AUTOLOAD 279 279 p->probe = probe; 280 + #endif 280 281 strlcpy(p->name, name, sizeof(p->name)); 281 282 p->next = NULL; 282 283 index = major_to_index(major); ··· 526 523 527 524 disk_update_readahead(disk); 528 525 disk_add_events(disk); 526 + set_bit(GD_ADDED, &disk->state); 529 527 return 0; 530 528 531 529 out_unregister_bdi: ··· 697 693 return badblocks_store(disk->bb, page, len, 0); 698 694 } 699 695 696 + #ifdef CONFIG_BLOCK_LEGACY_AUTOLOAD 700 697 void blk_request_module(dev_t devt) 701 698 { 702 699 unsigned int major = MAJOR(devt); ··· 717 712 /* Make old-style 2.4 aliases work */ 718 713 request_module("block-major-%d", MAJOR(devt)); 719 714 } 715 + #endif /* CONFIG_BLOCK_LEGACY_AUTOLOAD */ 720 716 721 717 /* 722 718 * print a full list of all partitions - intended for places where the root ··· 933 927 struct disk_stats stat; 934 928 unsigned int inflight; 935 929 936 - part_stat_read_all(bdev, &stat); 937 930 if (queue_is_mq(q)) 938 931 inflight = blk_mq_in_flight(q, bdev); 939 932 else 940 933 inflight = part_in_flight(bdev); 941 934 935 + if (inflight) { 936 + part_stat_lock(); 937 + update_io_ticks(bdev, jiffies, true); 938 + part_stat_unlock(); 939 + } 940 + part_stat_read_all(bdev, &stat); 942 941 return sprintf(buf, 943 942 "%8lu %8lu %8llu %8u " 944 943 "%8lu %8lu %8llu %8u " ··· 1139 1128 xa_destroy(&disk->part_tbl); 1140 1129 
disk->queue->disk = NULL; 1141 1130 blk_put_queue(disk->queue); 1131 + 1132 + if (test_bit(GD_ADDED, &disk->state) && disk->fops->free_disk) 1133 + disk->fops->free_disk(disk); 1134 + 1142 1135 iput(disk->part0->bd_inode); /* frees the disk */ 1143 1136 } 1144 1137 ··· 1203 1188 xa_for_each(&gp->part_tbl, idx, hd) { 1204 1189 if (bdev_is_partition(hd) && !bdev_nr_sectors(hd)) 1205 1190 continue; 1206 - part_stat_read_all(hd, &stat); 1207 1191 if (queue_is_mq(gp->queue)) 1208 1192 inflight = blk_mq_in_flight(gp->queue, hd); 1209 1193 else 1210 1194 inflight = part_in_flight(hd); 1211 1195 1196 + if (inflight) { 1197 + part_stat_lock(); 1198 + update_io_ticks(hd, jiffies, true); 1199 + part_stat_unlock(); 1200 + } 1201 + part_stat_read_all(hd, &stat); 1212 1202 seq_printf(seqf, "%4d %7d %pg " 1213 1203 "%lu %lu %lu %u " 1214 1204 "%lu %lu %lu %u "
+1 -1
block/holder.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0-only 2 - #include <linux/genhd.h> 2 + #include <linux/blkdev.h> 3 3 #include <linux/slab.h> 4 4 5 5 struct bd_holder_disk {
-1
block/partitions/check.h
··· 1 1 /* SPDX-License-Identifier: GPL-2.0 */ 2 2 #include <linux/pagemap.h> 3 3 #include <linux/blkdev.h> 4 - #include <linux/genhd.h> 5 4 #include "../blk.h" 6 5 7 6 /*
-1
block/partitions/core.c
··· 8 8 #include <linux/major.h> 9 9 #include <linux/slab.h> 10 10 #include <linux/ctype.h> 11 - #include <linux/genhd.h> 12 11 #include <linux/vmalloc.h> 13 12 #include <linux/blktrace_api.h> 14 13 #include <linux/raid/detect.h>
-1
block/partitions/efi.h
··· 13 13 14 14 #include <linux/types.h> 15 15 #include <linux/fs.h> 16 - #include <linux/genhd.h> 17 16 #include <linux/kernel.h> 18 17 #include <linux/major.h> 19 18 #include <linux/string.h>
-1
block/partitions/ldm.h
··· 14 14 15 15 #include <linux/types.h> 16 16 #include <linux/list.h> 17 - #include <linux/genhd.h> 18 17 #include <linux/fs.h> 19 18 #include <asm/unaligned.h> 20 19 #include <asm/byteorder.h>
+1 -1
block/sed-opal.c
··· 13 13 #include <linux/device.h> 14 14 #include <linux/kernel.h> 15 15 #include <linux/list.h> 16 - #include <linux/genhd.h> 16 + #include <linux/blkdev.h> 17 17 #include <linux/slab.h> 18 18 #include <linux/uaccess.h> 19 19 #include <uapi/linux/sed-opal.h>
+1 -1
drivers/base/class.c
··· 16 16 #include <linux/kdev_t.h> 17 17 #include <linux/err.h> 18 18 #include <linux/slab.h> 19 - #include <linux/genhd.h> 19 + #include <linux/blkdev.h> 20 20 #include <linux/mutex.h> 21 21 #include "base.h" 22 22
+1 -1
drivers/base/core.c
··· 21 21 #include <linux/notifier.h> 22 22 #include <linux/of.h> 23 23 #include <linux/of_device.h> 24 - #include <linux/genhd.h> 24 + #include <linux/blkdev.h> 25 25 #include <linux/mutex.h> 26 26 #include <linux/pm_runtime.h> 27 27 #include <linux/netdevice.h>
+1 -1
drivers/base/devtmpfs.c
··· 17 17 #include <linux/syscalls.h> 18 18 #include <linux/mount.h> 19 19 #include <linux/device.h> 20 - #include <linux/genhd.h> 20 + #include <linux/blkdev.h> 21 21 #include <linux/namei.h> 22 22 #include <linux/fs.h> 23 23 #include <linux/shmem_fs.h>
-1
drivers/block/aoe/aoeblk.c
··· 12 12 #include <linux/ioctl.h> 13 13 #include <linux/slab.h> 14 14 #include <linux/ratelimit.h> 15 - #include <linux/genhd.h> 16 15 #include <linux/netdevice.h> 17 16 #include <linux/mutex.h> 18 17 #include <linux/export.h>
-1
drivers/block/aoe/aoecmd.c
··· 10 10 #include <linux/blk-mq.h> 11 11 #include <linux/skbuff.h> 12 12 #include <linux/netdevice.h> 13 - #include <linux/genhd.h> 14 13 #include <linux/moduleparam.h> 15 14 #include <linux/workqueue.h> 16 15 #include <linux/kthread.h>
+2 -3
drivers/block/drbd/drbd_actlog.c
··· 138 138 op_flags |= REQ_FUA | REQ_PREFLUSH; 139 139 op_flags |= REQ_SYNC; 140 140 141 - bio = bio_alloc_bioset(GFP_NOIO, 1, &drbd_md_io_bio_set); 142 - bio_set_dev(bio, bdev->md_bdev); 141 + bio = bio_alloc_bioset(bdev->md_bdev, 1, op | op_flags, GFP_NOIO, 142 + &drbd_md_io_bio_set); 143 143 bio->bi_iter.bi_sector = sector; 144 144 err = -EIO; 145 145 if (bio_add_page(bio, device->md_io.page, size, 0) != size) 146 146 goto out; 147 147 bio->bi_private = device; 148 148 bio->bi_end_io = drbd_md_endio; 149 - bio_set_op_attrs(bio, op, op_flags); 150 149 151 150 if (op != REQ_OP_WRITE && device->state.disk == D_DISKLESS && device->ldev == NULL) 152 151 /* special case, drbd_md_read() during drbd_adm_attach(): no get_ldev */
+3 -4
drivers/block/drbd/drbd_bitmap.c
··· 976 976 977 977 static void bm_page_io_async(struct drbd_bm_aio_ctx *ctx, int page_nr) __must_hold(local) 978 978 { 979 - struct bio *bio = bio_alloc_bioset(GFP_NOIO, 1, &drbd_md_io_bio_set); 980 979 struct drbd_device *device = ctx->device; 980 + unsigned int op = (ctx->flags & BM_AIO_READ) ? REQ_OP_READ : REQ_OP_WRITE; 981 + struct bio *bio = bio_alloc_bioset(device->ldev->md_bdev, 1, op, 982 + GFP_NOIO, &drbd_md_io_bio_set); 981 983 struct drbd_bitmap *b = device->bitmap; 982 984 struct page *page; 983 985 unsigned int len; 984 - unsigned int op = (ctx->flags & BM_AIO_READ) ? REQ_OP_READ : REQ_OP_WRITE; 985 986 986 987 sector_t on_disk_sector = 987 988 device->ldev->md.md_offset + device->ldev->md.bm_offset; ··· 1007 1006 bm_store_page_idx(page, page_nr); 1008 1007 } else 1009 1008 page = b->bm_pages[page_nr]; 1010 - bio_set_dev(bio, device->ldev->md_bdev); 1011 1009 bio->bi_iter.bi_sector = on_disk_sector; 1012 1010 /* bio_add_page of a single page to an empty bio will always succeed, 1013 1011 * according to api. Do we want to assert that? */ 1014 1012 bio_add_page(bio, page, len, 0); 1015 1013 bio->bi_private = ctx; 1016 1014 bio->bi_end_io = drbd_bm_endio; 1017 - bio_set_op_attrs(bio, op, 0); 1018 1015 1019 1016 if (drbd_insert_fault(device, (op == REQ_OP_WRITE) ? DRBD_FAULT_MD_WR : DRBD_FAULT_MD_RD)) { 1020 1017 bio_io_error(bio);
-1
drivers/block/drbd/drbd_int.h
··· 27 27 #include <linux/major.h> 28 28 #include <linux/blkdev.h> 29 29 #include <linux/backing-dev.h> 30 - #include <linux/genhd.h> 31 30 #include <linux/idr.h> 32 31 #include <linux/dynamic_debug.h> 33 32 #include <net/tcp.h>
+8 -24
drivers/block/drbd/drbd_receiver.c
··· 1279 1279 1280 1280 static void submit_one_flush(struct drbd_device *device, struct issue_flush_context *ctx) 1281 1281 { 1282 - struct bio *bio = bio_alloc(GFP_NOIO, 0); 1282 + struct bio *bio = bio_alloc(device->ldev->backing_bdev, 0, 1283 + REQ_OP_FLUSH | REQ_PREFLUSH, GFP_NOIO); 1283 1284 struct one_flush_context *octx = kmalloc(sizeof(*octx), GFP_NOIO); 1284 - if (!bio || !octx) { 1285 - drbd_warn(device, "Could not allocate a bio, CANNOT ISSUE FLUSH\n"); 1285 + 1286 + if (!octx) { 1287 + drbd_warn(device, "Could not allocate a octx, CANNOT ISSUE FLUSH\n"); 1286 1288 /* FIXME: what else can I do now? disconnecting or detaching 1287 1289 * really does not help to improve the state of the world, either. 1288 1290 */ 1289 - kfree(octx); 1290 - if (bio) 1291 - bio_put(bio); 1291 + bio_put(bio); 1292 1292 1293 1293 ctx->error = -ENOMEM; 1294 1294 put_ldev(device); ··· 1298 1298 1299 1299 octx->device = device; 1300 1300 octx->ctx = ctx; 1301 - bio_set_dev(bio, device->ldev->backing_bdev); 1302 1301 bio->bi_private = octx; 1303 1302 bio->bi_end_io = one_flush_endio; 1304 - bio->bi_opf = REQ_OP_FLUSH | REQ_PREFLUSH; 1305 1303 1306 1304 device->flush_jif = jiffies; 1307 1305 set_bit(FLUSH_PENDING, &device->flags); ··· 1644 1646 unsigned data_size = peer_req->i.size; 1645 1647 unsigned n_bios = 0; 1646 1648 unsigned nr_pages = (data_size + PAGE_SIZE -1) >> PAGE_SHIFT; 1647 - int err = -ENOMEM; 1648 1649 1649 1650 /* TRIM/DISCARD: for now, always use the helper function 1650 1651 * blkdev_issue_zeroout(..., discard=true). ··· 1684 1687 * generated bio, but a bio allocated on behalf of the peer. 
1685 1688 */ 1686 1689 next_bio: 1687 - bio = bio_alloc(GFP_NOIO, nr_pages); 1688 - if (!bio) { 1689 - drbd_err(device, "submit_ee: Allocation of a bio failed (nr_pages=%u)\n", nr_pages); 1690 - goto fail; 1691 - } 1690 + bio = bio_alloc(device->ldev->backing_bdev, nr_pages, op | op_flags, 1691 + GFP_NOIO); 1692 1692 /* > peer_req->i.sector, unless this is the first bio */ 1693 1693 bio->bi_iter.bi_sector = sector; 1694 - bio_set_dev(bio, device->ldev->backing_bdev); 1695 - bio_set_op_attrs(bio, op, op_flags); 1696 1694 bio->bi_private = peer_req; 1697 1695 bio->bi_end_io = drbd_peer_request_endio; 1698 1696 ··· 1718 1726 drbd_submit_bio_noacct(device, fault_type, bio); 1719 1727 } while (bios); 1720 1728 return 0; 1721 - 1722 - fail: 1723 - while (bios) { 1724 - bio = bios; 1725 - bios = bios->bi_next; 1726 - bio_put(bio); 1727 - } 1728 - return err; 1729 1729 } 1730 1730 1731 1731 static void drbd_remove_epoch_entry_interval(struct drbd_device *device,
+2 -3
drivers/block/drbd/drbd_req.c
··· 30 30 return NULL; 31 31 memset(req, 0, sizeof(*req)); 32 32 33 - req->private_bio = bio_clone_fast(bio_src, GFP_NOIO, &drbd_io_bio_set); 33 + req->private_bio = bio_alloc_clone(device->ldev->backing_bdev, bio_src, 34 + GFP_NOIO, &drbd_io_bio_set); 34 35 req->private_bio->bi_private = req; 35 36 req->private_bio->bi_end_io = drbd_request_endio; 36 37 ··· 1151 1150 type = DRBD_FAULT_DT_RA; 1152 1151 else 1153 1152 type = DRBD_FAULT_DT_RD; 1154 - 1155 - bio_set_dev(bio, device->ldev->backing_bdev); 1156 1153 1157 1154 /* State may have changed since we grabbed our reference on the 1158 1155 * ->ldev member. Double check, and short-circuit to endio.
+2 -2
drivers/block/drbd/drbd_worker.c
··· 1523 1523 if (bio_data_dir(req->master_bio) == WRITE && req->rq_state & RQ_IN_ACT_LOG) 1524 1524 drbd_al_begin_io(device, &req->i); 1525 1525 1526 - req->private_bio = bio_clone_fast(req->master_bio, GFP_NOIO, 1526 + req->private_bio = bio_alloc_clone(device->ldev->backing_bdev, 1527 + req->master_bio, GFP_NOIO, 1527 1528 &drbd_io_bio_set); 1528 - bio_set_dev(req->private_bio, device->ldev->backing_bdev); 1529 1529 req->private_bio->bi_private = req; 1530 1530 req->private_bio->bi_end_io = drbd_request_endio; 1531 1531 submit_bio_noacct(req->private_bio);
+1 -3
drivers/block/floppy.c
··· 4129 4129 4130 4130 cbdata.drive = drive; 4131 4131 4132 - bio_init(&bio, &bio_vec, 1); 4133 - bio_set_dev(&bio, bdev); 4132 + bio_init(&bio, bdev, &bio_vec, 1, REQ_OP_READ); 4134 4133 bio_add_page(&bio, page, block_size(bdev), 0); 4135 4134 4136 4135 bio.bi_iter.bi_sector = 0; 4137 4136 bio.bi_flags |= (1 << BIO_QUIET); 4138 4137 bio.bi_private = &cbdata; 4139 4138 bio.bi_end_io = floppy_rb0_cb; 4140 - bio_set_op_attrs(&bio, REQ_OP_READ, 0); 4141 4139 4142 4140 init_completion(&cbdata.complete); 4143 4141
-1
drivers/block/mtip32xx/mtip32xx.c
··· 19 19 #include <linux/compat.h> 20 20 #include <linux/fs.h> 21 21 #include <linux/module.h> 22 - #include <linux/genhd.h> 23 22 #include <linux/blkdev.h> 24 23 #include <linux/blk-mq.h> 25 24 #include <linux/bio.h>
-1
drivers/block/mtip32xx/mtip32xx.h
··· 15 15 #include <linux/rwsem.h> 16 16 #include <linux/ata.h> 17 17 #include <linux/interrupt.h> 18 - #include <linux/genhd.h> 19 18 20 19 /* Offset of Subsystem Device ID in pci confoguration space */ 21 20 #define PCI_SUBSYSTEM_DEVICEID 0x2E
+5 -16
drivers/block/pktcdvd.c
··· 1020 1020 continue; 1021 1021 1022 1022 bio = pkt->r_bios[f]; 1023 - bio_reset(bio); 1023 + bio_reset(bio, pd->bdev, REQ_OP_READ); 1024 1024 bio->bi_iter.bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9); 1025 - bio_set_dev(bio, pd->bdev); 1026 1025 bio->bi_end_io = pkt_end_io_read; 1027 1026 bio->bi_private = pkt; 1028 1027 ··· 1033 1034 BUG(); 1034 1035 1035 1036 atomic_inc(&pkt->io_wait); 1036 - bio_set_op_attrs(bio, REQ_OP_READ, 0); 1037 1037 pkt_queue_bio(pd, bio); 1038 1038 frames_read++; 1039 1039 } ··· 1233 1235 { 1234 1236 int f; 1235 1237 1236 - bio_reset(pkt->w_bio); 1238 + bio_reset(pkt->w_bio, pd->bdev, REQ_OP_WRITE); 1237 1239 pkt->w_bio->bi_iter.bi_sector = pkt->sector; 1238 - bio_set_dev(pkt->w_bio, pd->bdev); 1239 1240 pkt->w_bio->bi_end_io = pkt_end_io_packet_write; 1240 1241 pkt->w_bio->bi_private = pkt; 1241 1242 ··· 1267 1270 1268 1271 /* Start the write request */ 1269 1272 atomic_set(&pkt->io_wait, 1); 1270 - bio_set_op_attrs(pkt->w_bio, REQ_OP_WRITE, 0); 1271 1273 pkt_queue_bio(pd, pkt->w_bio); 1272 1274 } 1273 1275 ··· 2294 2298 2295 2299 static void pkt_make_request_read(struct pktcdvd_device *pd, struct bio *bio) 2296 2300 { 2297 - struct bio *cloned_bio = bio_clone_fast(bio, GFP_NOIO, &pkt_bio_set); 2301 + struct bio *cloned_bio = 2302 + bio_alloc_clone(pd->bdev, bio, GFP_NOIO, &pkt_bio_set); 2298 2303 struct packet_stacked_data *psd = mempool_alloc(&psd_pool, GFP_NOIO); 2299 2304 2300 2305 psd->pd = pd; 2301 2306 psd->bio = bio; 2302 - bio_set_dev(cloned_bio, pd->bdev); 2303 2307 cloned_bio->bi_private = psd; 2304 2308 cloned_bio->bi_end_io = pkt_end_io_read_cloned; 2305 2309 pd->stats.secs_r += bio_sectors(bio); ··· 2400 2404 2401 2405 static void pkt_submit_bio(struct bio *bio) 2402 2406 { 2403 - struct pktcdvd_device *pd; 2404 - char b[BDEVNAME_SIZE]; 2407 + struct pktcdvd_device *pd = bio->bi_bdev->bd_disk->queue->queuedata; 2405 2408 struct bio *split; 2406 2409 2407 2410 blk_queue_split(&bio); 2408 - 2409 - pd = bio->bi_bdev->bd_disk->queue->queuedata; 2410 - if (!pd) { 2411 - pr_err("%s incorrect request queue\n", bio_devname(bio, b)); 2412 - goto end_io; 2413 - } 2414 2411 2415 2412 pkt_dbg(2, pd, "start = %6llx stop = %6llx\n", 2416 2413 (unsigned long long)bio->bi_iter.bi_sector,
+1 -60
drivers/block/rnbd/rnbd-srv-dev.c
··· 12 12 #include "rnbd-srv-dev.h" 13 13 #include "rnbd-log.h" 14 14 15 - struct rnbd_dev *rnbd_dev_open(const char *path, fmode_t flags, 16 - struct bio_set *bs) 15 + struct rnbd_dev *rnbd_dev_open(const char *path, fmode_t flags) 17 16 { 18 17 struct rnbd_dev *dev; 19 18 int ret; ··· 29 30 30 31 dev->blk_open_flags = flags; 31 32 bdevname(dev->bdev, dev->name); 32 - dev->ibd_bio_set = bs; 33 33 34 34 return dev; 35 35 ··· 41 43 { 42 44 blkdev_put(dev->bdev, dev->blk_open_flags); 43 45 kfree(dev); 44 - } 45 - 46 - void rnbd_dev_bi_end_io(struct bio *bio) 47 - { 48 - struct rnbd_dev_blk_io *io = bio->bi_private; 49 - 50 - rnbd_endio(io->priv, blk_status_to_errno(bio->bi_status)); 51 - bio_put(bio); 52 - } 53 - 54 - /** 55 - * rnbd_bio_map_kern - map kernel address into bio 56 - * @data: pointer to buffer to map 57 - * @bs: bio_set to use. 58 - * @len: length in bytes 59 - * @gfp_mask: allocation flags for bio allocation 60 - * 61 - * Map the kernel address into a bio suitable for io to a block 62 - * device. Returns an error pointer in case of error. 
63 - */ 64 - struct bio *rnbd_bio_map_kern(void *data, struct bio_set *bs, 65 - unsigned int len, gfp_t gfp_mask) 66 - { 67 - unsigned long kaddr = (unsigned long)data; 68 - unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; 69 - unsigned long start = kaddr >> PAGE_SHIFT; 70 - const int nr_pages = end - start; 71 - int offset, i; 72 - struct bio *bio; 73 - 74 - bio = bio_alloc_bioset(gfp_mask, nr_pages, bs); 75 - if (!bio) 76 - return ERR_PTR(-ENOMEM); 77 - 78 - offset = offset_in_page(kaddr); 79 - for (i = 0; i < nr_pages; i++) { 80 - unsigned int bytes = PAGE_SIZE - offset; 81 - 82 - if (len <= 0) 83 - break; 84 - 85 - if (bytes > len) 86 - bytes = len; 87 - 88 - if (bio_add_page(bio, virt_to_page(data), bytes, 89 - offset) < bytes) { 90 - /* we don't support partial mappings */ 91 - bio_put(bio); 92 - return ERR_PTR(-EINVAL); 93 - } 94 - 95 - data += bytes; 96 - len -= bytes; 97 - offset = 0; 98 - } 99 - 100 - return bio; 101 46 }
+2 -16
drivers/block/rnbd/rnbd-srv-dev.h
··· 14 14 15 15 struct rnbd_dev { 16 16 struct block_device *bdev; 17 - struct bio_set *ibd_bio_set; 18 17 fmode_t blk_open_flags; 19 18 char name[BDEVNAME_SIZE]; 20 19 }; 21 20 22 - struct rnbd_dev_blk_io { 23 - struct rnbd_dev *dev; 24 - void *priv; 25 - /* have to be last member for front_pad usage of bioset_init */ 26 - struct bio bio; 27 - }; 28 - 29 21 /** 30 22 * rnbd_dev_open() - Open a device 23 + * @path: path to open 31 24 * @flags: open flags 32 - * @bs: bio_set to use during block io, 33 25 */ 34 - struct rnbd_dev *rnbd_dev_open(const char *path, fmode_t flags, 35 - struct bio_set *bs); 26 + struct rnbd_dev *rnbd_dev_open(const char *path, fmode_t flags); 36 27 37 28 /** 38 29 * rnbd_dev_close() - Close a device ··· 31 40 void rnbd_dev_close(struct rnbd_dev *dev); 32 41 33 42 void rnbd_endio(void *priv, int error); 34 - 35 - void rnbd_dev_bi_end_io(struct bio *bio); 36 - 37 - struct bio *rnbd_bio_map_kern(void *data, struct bio_set *bs, 38 - unsigned int len, gfp_t gfp_mask); 39 43 40 44 static inline int rnbd_dev_get_max_segs(const struct rnbd_dev *dev) 41 45 {
-1
drivers/block/rnbd/rnbd-srv-sysfs.c
··· 13 13 #include <linux/kobject.h> 14 14 #include <linux/sysfs.h> 15 15 #include <linux/stat.h> 16 - #include <linux/genhd.h> 17 16 #include <linux/list.h> 18 17 #include <linux/moduleparam.h> 19 18 #include <linux/device.h>
+17 -28
drivers/block/rnbd/rnbd-srv.c
··· 114 114 return sess_dev; 115 115 } 116 116 117 + static void rnbd_dev_bi_end_io(struct bio *bio) 118 + { 119 + rnbd_endio(bio->bi_private, blk_status_to_errno(bio->bi_status)); 120 + bio_put(bio); 121 + } 122 + 117 123 static int process_rdma(struct rnbd_srv_session *srv_sess, 118 124 struct rtrs_srv_op *id, void *data, u32 datalen, 119 125 const void *usr, size_t usrlen) ··· 129 123 struct rnbd_srv_sess_dev *sess_dev; 130 124 u32 dev_id; 131 125 int err; 132 - struct rnbd_dev_blk_io *io; 133 126 struct bio *bio; 134 127 short prio; 135 128 ··· 149 144 priv->sess_dev = sess_dev; 150 145 priv->id = id; 151 146 152 - /* Generate bio with pages pointing to the rdma buffer */ 153 - bio = rnbd_bio_map_kern(data, sess_dev->rnbd_dev->ibd_bio_set, datalen, GFP_KERNEL); 154 - if (IS_ERR(bio)) { 155 - err = PTR_ERR(bio); 156 - rnbd_srv_err(sess_dev, "Failed to generate bio, err: %d\n", err); 157 - goto sess_dev_put; 147 + bio = bio_alloc(sess_dev->rnbd_dev->bdev, 1, 148 + rnbd_to_bio_flags(le32_to_cpu(msg->rw)), GFP_KERNEL); 149 + if (bio_add_page(bio, virt_to_page(data), datalen, 150 + offset_in_page(data)) != datalen) { 151 + rnbd_srv_err(sess_dev, "Failed to map data to bio\n"); 152 + err = -EINVAL; 153 + goto bio_put; 158 154 } 159 155 160 - io = container_of(bio, struct rnbd_dev_blk_io, bio); 161 - io->dev = sess_dev->rnbd_dev; 162 - io->priv = priv; 163 - 164 156 bio->bi_end_io = rnbd_dev_bi_end_io; 165 - bio->bi_private = io; 166 - bio->bi_opf = rnbd_to_bio_flags(le32_to_cpu(msg->rw)); 157 + bio->bi_private = priv; 167 158 bio->bi_iter.bi_sector = le64_to_cpu(msg->sector); 168 159 bio->bi_iter.bi_size = le32_to_cpu(msg->bi_size); 169 160 prio = srv_sess->ver < RNBD_PROTO_VER_MAJOR || 170 161 usrlen < sizeof(*msg) ? 0 : le16_to_cpu(msg->prio); 171 162 bio_set_prio(bio, prio); 172 - bio_set_dev(bio, sess_dev->rnbd_dev->bdev); 173 163 174 164 submit_bio(bio); 175 165 176 166 return 0; 177 167 178 - sess_dev_put: 168 + bio_put: 169 + bio_put(bio); 179 170 rnbd_put_sess_dev(sess_dev); 180 171 err: 181 172 kfree(priv); ··· 252 251 253 252 out: 254 253 xa_destroy(&srv_sess->index_idr); 255 - bioset_exit(&srv_sess->sess_bio_set); 256 254 257 255 pr_info("RTRS Session %s disconnected\n", srv_sess->sessname); 258 256 ··· 280 280 return -ENOMEM; 281 281 282 282 srv_sess->queue_depth = rtrs_srv_get_queue_depth(rtrs); 283 - err = bioset_init(&srv_sess->sess_bio_set, srv_sess->queue_depth, 284 - offsetof(struct rnbd_dev_blk_io, bio), 285 - BIOSET_NEED_BVECS); 286 - if (err) { 287 - pr_err("Allocating srv_session for path %s failed\n", 288 - pathname); 289 - kfree(srv_sess); 290 - return err; 291 - } 292 - 293 283 xa_init_flags(&srv_sess->index_idr, XA_FLAGS_ALLOC); 294 284 INIT_LIST_HEAD(&srv_sess->sess_dev_list); 295 285 mutex_init(&srv_sess->lock); ··· 728 738 goto reject; 729 739 } 730 740 731 - rnbd_dev = rnbd_dev_open(full_path, open_flags, 732 - &srv_sess->sess_bio_set); 741 + rnbd_dev = rnbd_dev_open(full_path, open_flags); 733 742 if (IS_ERR(rnbd_dev)) { 734 743 pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %ld\n", 735 744 full_path, srv_sess->sessname, PTR_ERR(rnbd_dev));
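With the bdev and op flags passed to bio_alloc() up front, the hand-rolled rnbd_bio_map_kern() helper and the per-session bio_set become unnecessary: the RDMA buffer here is a single contiguous kernel buffer, so one bio_add_page() maps it. The essential shape of the new server path (kernel-style C sketch, labels and error handling simplified from the hunk above):

```c
/* Allocate a one-segment bio already bound to the backing bdev and op. */
bio = bio_alloc(bdev, 1, opf, GFP_KERNEL);
if (bio_add_page(bio, virt_to_page(data), datalen,
		 offset_in_page(data)) != datalen) {
	bio_put(bio);		/* partial mappings are not supported */
	return -EINVAL;
}
bio->bi_end_io = rnbd_dev_bi_end_io;
submit_bio(bio);
```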
-1
drivers/block/rnbd/rnbd-srv.h
··· 23 23 struct rtrs_srv_sess *rtrs; 24 24 char sessname[NAME_MAX]; 25 25 int queue_depth; 26 - struct bio_set sess_bio_set; 27 26 28 27 struct xarray index_idr; 29 28 /* List of struct rnbd_srv_sess_dev */
-1
drivers/block/sunvdc.c
··· 9 9 #include <linux/types.h> 10 10 #include <linux/blk-mq.h> 11 11 #include <linux/hdreg.h> 12 - #include <linux/genhd.h> 13 12 #include <linux/cdrom.h> 14 13 #include <linux/slab.h> 15 14 #include <linux/spinlock.h>
+14 -52
drivers/block/virtio_blk.c
··· 69 69 /* Process context for config space updates */ 70 70 struct work_struct config_work; 71 71 72 - /* 73 - * Tracks references from block_device_operations open/release and 74 - * virtio_driver probe/remove so this object can be freed once no 75 - * longer in use. 76 - */ 77 - refcount_t refs; 78 - 79 72 /* What host tells us, plus 2 for header & tailer. */ 80 73 unsigned int sg_elems; 81 74 ··· 384 391 return err; 385 392 } 386 393 387 - static void virtblk_get(struct virtio_blk *vblk) 388 - { 389 - refcount_inc(&vblk->refs); 390 - } 391 - 392 - static void virtblk_put(struct virtio_blk *vblk) 393 - { 394 - if (refcount_dec_and_test(&vblk->refs)) { 395 - ida_simple_remove(&vd_index_ida, vblk->index); 396 - mutex_destroy(&vblk->vdev_mutex); 397 - kfree(vblk); 398 - } 399 - } 400 - 401 - static int virtblk_open(struct block_device *bd, fmode_t mode) 402 - { 403 - struct virtio_blk *vblk = bd->bd_disk->private_data; 404 - int ret = 0; 405 - 406 - mutex_lock(&vblk->vdev_mutex); 407 - 408 - if (vblk->vdev) 409 - virtblk_get(vblk); 410 - else 411 - ret = -ENXIO; 412 - 413 - mutex_unlock(&vblk->vdev_mutex); 414 - return ret; 415 - } 416 - 417 - static void virtblk_release(struct gendisk *disk, fmode_t mode) 418 - { 419 - struct virtio_blk *vblk = disk->private_data; 420 - 421 - virtblk_put(vblk); 422 - } 423 - 424 394 /* We provide getgeo only to please some old bootloader/partitioning tools */ 425 395 static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo) 426 396 { ··· 416 460 return ret; 417 461 } 418 462 463 + static void virtblk_free_disk(struct gendisk *disk) 464 + { 465 + struct virtio_blk *vblk = disk->private_data; 466 + 467 + ida_simple_remove(&vd_index_ida, vblk->index); 468 + mutex_destroy(&vblk->vdev_mutex); 469 + kfree(vblk); 470 + } 471 + 419 472 static const struct block_device_operations virtblk_fops = { 420 - .owner = THIS_MODULE, 421 - .open = virtblk_open, 422 - .release = virtblk_release, 423 - .getgeo = virtblk_getgeo, 473 + .owner = THIS_MODULE, 474 + .getgeo = virtblk_getgeo, 475 + .free_disk = virtblk_free_disk, 424 476 }; 425 477 426 478 static int index_to_minor(int index) ··· 755 791 goto out_free_index; 756 792 } 757 793 758 - /* This reference is dropped in virtblk_remove(). */ 759 - refcount_set(&vblk->refs, 1); 760 794 mutex_init(&vblk->vdev_mutex); 761 795 762 796 vblk->vdev = vdev; ··· 932 970 flush_work(&vblk->config_work); 933 971 934 972 del_gendisk(vblk->disk); 935 - blk_cleanup_disk(vblk->disk); 973 + blk_cleanup_queue(vblk->disk->queue); 936 974 blk_mq_free_tag_set(&vblk->tag_set); 937 975 938 976 mutex_lock(&vblk->vdev_mutex); ··· 948 986 949 987 mutex_unlock(&vblk->vdev_mutex); 950 988 951 - virtblk_put(vblk); 989 + put_disk(vblk->disk); 952 990 } 953 991 954 992 #ifdef CONFIG_PM_SLEEP
+5 -20
drivers/block/xen-blkback/blkback.c
··· 1326 1326 pages[i]->page, 1327 1327 seg[i].nsec << 9, 1328 1328 seg[i].offset) == 0)) { 1329 - bio = bio_alloc(GFP_KERNEL, bio_max_segs(nseg - i)); 1330 - if (unlikely(bio == NULL)) 1331 - goto fail_put_bio; 1332 - 1329 + bio = bio_alloc(preq.bdev, bio_max_segs(nseg - i), 1330 + operation | operation_flags, 1331 + GFP_KERNEL); 1333 1332 biolist[nbio++] = bio; 1334 - bio_set_dev(bio, preq.bdev); 1335 1333 bio->bi_private = pending_req; 1336 1334 bio->bi_end_io = end_block_io_op; 1337 1335 bio->bi_iter.bi_sector = preq.sector_number; 1338 - bio_set_op_attrs(bio, operation, operation_flags); 1339 1336 } 1340 1337 1341 1338 preq.sector_number += seg[i].nsec; ··· 1342 1345 if (!bio) { 1343 1346 BUG_ON(operation_flags != REQ_PREFLUSH); 1344 1347 1345 - bio = bio_alloc(GFP_KERNEL, 0); 1346 - if (unlikely(bio == NULL)) 1347 - goto fail_put_bio; 1348 - 1348 + bio = bio_alloc(preq.bdev, 0, operation | operation_flags, 1349 + GFP_KERNEL); 1349 1350 biolist[nbio++] = bio; 1350 - bio_set_dev(bio, preq.bdev); 1351 1351 bio->bi_private = pending_req; 1352 1352 bio->bi_end_io = end_block_io_op; 1353 - bio_set_op_attrs(bio, operation, operation_flags); 1354 1353 } 1355 1354 1356 1355 atomic_set(&pending_req->pendcnt, nbio); ··· 1372 1379 /* Haven't submitted any bio's yet. */ 1373 1380 make_response(ring, req->u.rw.id, req_operation, BLKIF_RSP_ERROR); 1374 1381 free_req(ring, pending_req); 1375 - msleep(1); /* back off a bit */ 1376 - return -EIO; 1377 - 1378 - fail_put_bio: 1379 - for (i = 0; i < nbio; i++) 1380 - bio_put(biolist[i]); 1381 - atomic_set(&pending_req->pendcnt, 1); 1382 - __end_block_io_op(pending_req, BLK_STS_RESOURCE); 1383 1382 msleep(1); /* back off a bit */ 1384 1383 return -EIO; 1385 1384 }
+6 -11
drivers/block/zram/zram_drv.c
··· 22 22 #include <linux/blkdev.h> 23 23 #include <linux/buffer_head.h> 24 24 #include <linux/device.h> 25 - #include <linux/genhd.h> 26 25 #include <linux/highmem.h> 27 26 #include <linux/slab.h> 28 27 #include <linux/backing-dev.h> ··· 616 617 { 617 618 struct bio *bio; 618 619 619 - bio = bio_alloc(GFP_NOIO, 1); 620 + bio = bio_alloc(zram->bdev, 1, parent ? parent->bi_opf : REQ_OP_READ, 621 + GFP_NOIO); 620 622 if (!bio) 621 623 return -ENOMEM; 622 624 623 625 bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9); 624 - bio_set_dev(bio, zram->bdev); 625 626 if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len, bvec->bv_offset)) { 626 627 bio_put(bio); 627 628 return -EIO; 628 629 } 629 630 630 - if (!parent) { 631 - bio->bi_opf = REQ_OP_READ; 631 + if (!parent) 632 632 bio->bi_end_io = zram_page_end_io; 633 - } else { 634 - bio->bi_opf = parent->bi_opf; 633 + else 635 634 bio_chain(bio, parent); 636 - } 637 635 638 636 submit_bio(bio); 639 637 return 1; ··· 743 747 continue; 744 748 } 745 749 746 - bio_init(&bio, &bio_vec, 1); 747 - bio_set_dev(&bio, zram->bdev); 750 + bio_init(&bio, zram->bdev, &bio_vec, 1, 751 + REQ_OP_WRITE | REQ_SYNC); 748 752 bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9); 749 - bio.bi_opf = REQ_OP_WRITE | REQ_SYNC; 750 753 751 754 bio_add_page(&bio, bvec.bv_page, bvec.bv_len, 752 755 bvec.bv_offset);
-1
drivers/cdrom/gdrom.c
··· 15 15 #include <linux/slab.h> 16 16 #include <linux/dma-mapping.h> 17 17 #include <linux/cdrom.h> 18 - #include <linux/genhd.h> 19 18 #include <linux/bio.h> 20 19 #include <linux/blk-mq.h> 21 20 #include <linux/interrupt.h>
+1 -1
drivers/char/random.c
··· 330 330 #include <linux/poll.h> 331 331 #include <linux/init.h> 332 332 #include <linux/fs.h> 333 - #include <linux/genhd.h> 333 + #include <linux/blkdev.h> 334 334 #include <linux/interrupt.h> 335 335 #include <linux/mm.h> 336 336 #include <linux/nodemask.h>
+1
drivers/md/Kconfig
··· 204 204 tristate "Device mapper support" 205 205 select BLOCK_HOLDER_DEPRECATED if SYSFS 206 206 select BLK_DEV_DM_BUILTIN 207 + select BLK_MQ_STACKING 207 208 depends on DAX || DAX=n 208 209 help 209 210 Device-mapper is a low level volume manager. It works by allowing
+2 -1
drivers/md/bcache/io.c
··· 26 26 struct bbio *b = mempool_alloc(&c->bio_meta, GFP_NOIO); 27 27 struct bio *bio = &b->bio; 28 28 29 - bio_init(bio, bio->bi_inline_vecs, meta_bucket_pages(&c->cache->sb)); 29 + bio_init(bio, NULL, bio->bi_inline_vecs, 30 + meta_bucket_pages(&c->cache->sb), 0); 30 31 31 32 return bio; 32 33 }
+5 -11
drivers/md/bcache/journal.c
··· 53 53 reread: left = ca->sb.bucket_size - offset; 54 54 len = min_t(unsigned int, left, PAGE_SECTORS << JSET_BITS); 55 55 56 - bio_reset(bio); 56 + bio_reset(bio, ca->bdev, REQ_OP_READ); 57 57 bio->bi_iter.bi_sector = bucket + offset; 58 - bio_set_dev(bio, ca->bdev); 59 58 bio->bi_iter.bi_size = len << 9; 60 59 61 60 bio->bi_end_io = journal_read_endio; 62 61 bio->bi_private = &cl; 63 - bio_set_op_attrs(bio, REQ_OP_READ, 0); 64 62 bch_bio_map(bio, data); 65 63 66 64 closure_bio_submit(ca->set, bio, &cl); ··· 609 611 610 612 atomic_set(&ja->discard_in_flight, DISCARD_IN_FLIGHT); 611 613 612 - bio_init(bio, bio->bi_inline_vecs, 1); 613 - bio_set_op_attrs(bio, REQ_OP_DISCARD, 0); 614 + bio_init(bio, ca->bdev, bio->bi_inline_vecs, 1, REQ_OP_DISCARD); 614 615 bio->bi_iter.bi_sector = bucket_to_sector(ca->set, 615 616 ca->sb.d[ja->discard_idx]); 616 - bio_set_dev(bio, ca->bdev); 617 617 bio->bi_iter.bi_size = bucket_bytes(ca); 618 618 bio->bi_end_io = journal_discard_endio; 619 619 ··· 769 773 770 774 atomic_long_add(sectors, &ca->meta_sectors_written); 771 775 772 - bio_reset(bio); 776 + bio_reset(bio, ca->bdev, REQ_OP_WRITE | 777 + REQ_SYNC | REQ_META | REQ_PREFLUSH | REQ_FUA); 778 + bch_bio_map(bio, w->data); 773 779 bio->bi_iter.bi_sector = PTR_OFFSET(k, i); 774 - bio_set_dev(bio, ca->bdev); 775 780 bio->bi_iter.bi_size = sectors << 9; 776 781 777 782 bio->bi_end_io = journal_write_endio; 778 783 bio->bi_private = w; 779 - bio_set_op_attrs(bio, REQ_OP_WRITE, 780 - REQ_SYNC|REQ_META|REQ_PREFLUSH|REQ_FUA); 781 - bch_bio_map(bio, w->data); 782 784 783 785 trace_bcache_journal_write(bio, w->data->keys); 784 786 bio_list_add(&list, bio);
+2 -2
drivers/md/bcache/movinggc.c
··· 79 79 { 80 80 struct bio *bio = &io->bio.bio; 81 81 82 - bio_init(bio, bio->bi_inline_vecs, 83 - DIV_ROUND_UP(KEY_SIZE(&io->w->key), PAGE_SECTORS)); 82 + bio_init(bio, NULL, bio->bi_inline_vecs, 83 + DIV_ROUND_UP(KEY_SIZE(&io->w->key), PAGE_SECTORS), 0); 84 84 bio_get(bio); 85 85 bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); 86 86
+10 -12
drivers/md/bcache/request.c
··· 685 685 { 686 686 struct bio *bio = &s->bio.bio; 687 687 688 - bio_init(bio, NULL, 0); 689 - __bio_clone_fast(bio, orig_bio); 688 + bio_init_clone(bio->bi_bdev, bio, orig_bio, GFP_NOIO); 690 689 /* 691 690 * bi_end_io can be set separately somewhere else, e.g. the 692 691 * variants in, ··· 830 831 */ 831 832 832 833 if (s->iop.bio) { 833 - bio_reset(s->iop.bio); 834 + bio_reset(s->iop.bio, s->cache_miss->bi_bdev, REQ_OP_READ); 834 835 s->iop.bio->bi_iter.bi_sector = 835 836 s->cache_miss->bi_iter.bi_sector; 836 - bio_copy_dev(s->iop.bio, s->cache_miss); 837 837 s->iop.bio->bi_iter.bi_size = s->insert_bio_sectors << 9; 838 + bio_clone_blkg_association(s->iop.bio, s->cache_miss); 838 839 bch_bio_map(s->iop.bio, NULL); 839 840 840 841 bio_copy_data(s->cache_miss, s->iop.bio); ··· 912 913 /* btree_search_recurse()'s btree iterator is no good anymore */ 913 914 ret = miss == bio ? MAP_DONE : -EINTR; 914 915 915 - cache_bio = bio_alloc_bioset(GFP_NOWAIT, 916 + cache_bio = bio_alloc_bioset(miss->bi_bdev, 916 917 DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS), 917 - &dc->disk.bio_split); 918 + 0, GFP_NOWAIT, &dc->disk.bio_split); 918 919 if (!cache_bio) 919 920 goto out_submit; 920 921 921 922 cache_bio->bi_iter.bi_sector = miss->bi_iter.bi_sector; 922 - bio_copy_dev(cache_bio, miss); 923 923 cache_bio->bi_iter.bi_size = s->insert_bio_sectors << 9; 924 924 925 925 cache_bio->bi_end_io = backing_request_endio; ··· 1023 1025 */ 1024 1026 struct bio *flush; 1025 1027 1026 - flush = bio_alloc_bioset(GFP_NOIO, 0, 1027 - &dc->disk.bio_split); 1028 + flush = bio_alloc_bioset(bio->bi_bdev, 0, 1029 + REQ_OP_WRITE | REQ_PREFLUSH, 1030 + GFP_NOIO, &dc->disk.bio_split); 1028 1031 if (!flush) { 1029 1032 s->iop.status = BLK_STS_RESOURCE; 1030 1033 goto insert_data; 1031 1034 } 1032 - bio_copy_dev(flush, bio); 1033 1035 flush->bi_end_io = backing_request_endio; 1034 1036 flush->bi_private = cl; 1035 - flush->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 1036 1037 /* I/O request sent to backing device */ 1037 1038 closure_bio_submit(s->iop.c, flush, cl); 1038 1039 } 1039 1040 } else { 1040 - s->iop.bio = bio_clone_fast(bio, GFP_NOIO, &dc->disk.bio_split); 1041 + s->iop.bio = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO, 1042 + &dc->disk.bio_split); 1041 1043 /* I/O request sent to backing device */ 1042 1044 bio->bi_end_io = backing_request_endio; 1043 1045 closure_bio_submit(s->iop.c, bio, cl);
+3 -6
drivers/md/bcache/super.c
··· 18 18 #include <linux/blkdev.h> 19 19 #include <linux/pagemap.h> 20 20 #include <linux/debugfs.h> 21 - #include <linux/genhd.h> 22 21 #include <linux/idr.h> 23 22 #include <linux/kthread.h> 24 23 #include <linux/workqueue.h> ··· 342 343 down(&dc->sb_write_mutex); 343 344 closure_init(cl, parent); 344 345 345 - bio_init(bio, dc->sb_bv, 1); 346 - bio_set_dev(bio, dc->bdev); 346 + bio_init(bio, dc->bdev, dc->sb_bv, 1, 0); 347 347 bio->bi_end_io = write_bdev_super_endio; 348 348 bio->bi_private = dc; 349 349 ··· 385 387 if (ca->sb.version < version) 386 388 ca->sb.version = version; 387 389 388 - bio_init(bio, ca->sb_bv, 1); 389 - bio_set_dev(bio, ca->bdev); 390 + bio_init(bio, ca->bdev, ca->sb_bv, 1, 0); 390 391 bio->bi_end_io = write_super_endio; 391 392 bio->bi_private = ca; 392 393 ··· 2237 2240 __module_get(THIS_MODULE); 2238 2241 kobject_init(&ca->kobj, &bch_cache_ktype); 2239 2242 2240 - bio_init(&ca->journal.bio, ca->journal.bio.bi_inline_vecs, 8); 2243 + bio_init(&ca->journal.bio, NULL, ca->journal.bio.bi_inline_vecs, 8, 0); 2241 2244 2242 2245 /* 2243 2246 * when ca->sb.njournal_buckets is not zero, journal exists,
+2 -2
drivers/md/bcache/writeback.c
··· 292 292 struct dirty_io *io = w->private; 293 293 struct bio *bio = &io->bio; 294 294 295 - bio_init(bio, bio->bi_inline_vecs, 296 - DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS)); 295 + bio_init(bio, NULL, bio->bi_inline_vecs, 296 + DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS), 0); 297 297 if (!io->dc->writeback_percent) 298 298 bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); 299 299
+9 -17
drivers/md/dm-cache-target.c
··· 744 744 spin_unlock_irq(&cache->lock); 745 745 } 746 746 747 - static void __remap_to_origin_clear_discard(struct cache *cache, struct bio *bio, 748 - dm_oblock_t oblock, bool bio_has_pbd) 749 - { 750 - if (bio_has_pbd) 751 - check_if_tick_bio_needed(cache, bio); 752 - remap_to_origin(cache, bio); 753 - if (bio_data_dir(bio) == WRITE) 754 - clear_discard(cache, oblock_to_dblock(cache, oblock)); 755 - } 756 - 757 747 static void remap_to_origin_clear_discard(struct cache *cache, struct bio *bio, 758 748 dm_oblock_t oblock) 759 749 { 760 750 // FIXME: check_if_tick_bio_needed() is called way too much through this interface 761 - __remap_to_origin_clear_discard(cache, bio, oblock, true); 751 + check_if_tick_bio_needed(cache, bio); 752 + remap_to_origin(cache, bio); 753 + if (bio_data_dir(bio) == WRITE) 754 + clear_discard(cache, oblock_to_dblock(cache, oblock)); 762 755 } 763 756 764 757 static void remap_to_cache_dirty(struct cache *cache, struct bio *bio, ··· 819 826 static void remap_to_origin_and_cache(struct cache *cache, struct bio *bio, 820 827 dm_oblock_t oblock, dm_cblock_t cblock) 821 828 { 822 - struct bio *origin_bio = bio_clone_fast(bio, GFP_NOIO, &cache->bs); 829 + struct bio *origin_bio = bio_alloc_clone(cache->origin_dev->bdev, bio, 830 + GFP_NOIO, &cache->bs); 823 831 824 832 BUG_ON(!origin_bio); 825 833 826 834 bio_chain(origin_bio, bio); 827 - /* 828 - * Passing false to __remap_to_origin_clear_discard() skips 829 - * all code that might use per_bio_data (since clone doesn't have it) 830 - */ 831 - __remap_to_origin_clear_discard(cache, origin_bio, oblock, false); 835 + 836 + if (bio_data_dir(origin_bio) == WRITE) 837 + clear_discard(cache, oblock_to_dblock(cache, oblock)); 832 838 submit_bio(origin_bio); 833 839 834 840 remap_to_cache(cache, bio, cblock);
-1
drivers/md/dm-core.h
··· 11 11 12 12 #include <linux/kthread.h> 13 13 #include <linux/ktime.h> 14 - #include <linux/genhd.h> 15 14 #include <linux/blk-mq.h> 16 15 #include <linux/blk-crypto-profile.h> 17 16
+17 -29
drivers/md/dm-crypt.c
··· 234 234 #define DM_CRYPT_MEMORY_PERCENT 2 235 235 #define DM_CRYPT_MIN_PAGES_PER_CLIENT (BIO_MAX_VECS * 16) 236 236 237 - static void clone_init(struct dm_crypt_io *, struct bio *); 237 + static void crypt_endio(struct bio *clone); 238 238 static void kcryptd_queue_crypt(struct dm_crypt_io *io); 239 239 static struct scatterlist *crypt_get_sg_data(struct crypt_config *cc, 240 240 struct scatterlist *sg); ··· 1364 1364 } 1365 1365 1366 1366 if (r == -EBADMSG) { 1367 - char b[BDEVNAME_SIZE]; 1368 1367 sector_t s = le64_to_cpu(*sector); 1369 1368 1370 - DMERR_LIMIT("%s: INTEGRITY AEAD ERROR, sector %llu", 1371 - bio_devname(ctx->bio_in, b), s); 1369 + DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu", 1370 + ctx->bio_in->bi_bdev, s); 1372 1371 dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead", 1373 1372 ctx->bio_in, s, 0); 1374 1373 } ··· 1671 1672 if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM)) 1672 1673 mutex_lock(&cc->bio_alloc_lock); 1673 1674 1674 - clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, &cc->bs); 1675 - if (!clone) 1676 - goto out; 1677 - 1678 - clone_init(io, clone); 1675 + clone = bio_alloc_bioset(cc->dev->bdev, nr_iovecs, io->base_bio->bi_opf, 1676 + GFP_NOIO, &cc->bs); 1677 + clone->bi_private = io; 1678 + clone->bi_end_io = crypt_endio; 1679 1679 1680 1680 remaining_size = size; 1681 1681 ··· 1700 1702 bio_put(clone); 1701 1703 clone = NULL; 1702 1704 } 1703 - out: 1705 + 1704 1706 if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM)) 1705 1707 mutex_unlock(&cc->bio_alloc_lock); 1706 1708 ··· 1827 1829 crypt_dec_pending(io); 1828 1830 } 1829 1831 1830 - static void clone_init(struct dm_crypt_io *io, struct bio *clone) 1831 - { 1832 - struct crypt_config *cc = io->cc; 1833 - 1834 - clone->bi_private = io; 1835 - clone->bi_end_io = crypt_endio; 1836 - bio_set_dev(clone, cc->dev->bdev); 1837 - clone->bi_opf = io->base_bio->bi_opf; 1838 - } 1839 - 1840 1832 static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp) 1841 1833 { 1842 1834 struct crypt_config *cc = io->cc; 1843 1835 struct bio *clone; 1844 1836 1845 1837 /* 1846 - * We need the original biovec array in order to decrypt 1847 - * the whole bio data *afterwards* -- thanks to immutable 1848 - * biovecs we don't need to worry about the block layer 1849 - * modifying the biovec array; so leverage bio_clone_fast(). 1838 + * We need the original biovec array in order to decrypt the whole bio 1839 + * data *afterwards* -- thanks to immutable biovecs we don't need to 1840 + * worry about the block layer modifying the biovec array; so leverage 1841 + * bio_alloc_clone(). 1850 1842 */ 1851 - clone = bio_clone_fast(io->base_bio, gfp, &cc->bs); 1843 + clone = bio_alloc_clone(cc->dev->bdev, io->base_bio, gfp, &cc->bs); 1852 1844 if (!clone) 1853 1845 return 1; 1846 + clone->bi_private = io; 1847 + clone->bi_end_io = crypt_endio; 1854 1848 1855 1849 crypt_inc_pending(io); 1856 1850 1857 - clone_init(io, clone); 1858 1851 clone->bi_iter.bi_sector = cc->start + io->sector; 1859 1852 1860 1853 if (dm_crypt_integrity_io_alloc(io, clone)) { ··· 2168 2179 error = cc->iv_gen_ops->post(cc, org_iv_of_dmreq(cc, dmreq), dmreq); 2169 2180 2170 2181 if (error == -EBADMSG) { 2171 - char b[BDEVNAME_SIZE]; 2172 2182 sector_t s = le64_to_cpu(*org_sector_of_dmreq(cc, dmreq)); 2173 2183 2174 - DMERR_LIMIT("%s: INTEGRITY AEAD ERROR, sector %llu", 2175 - bio_devname(ctx->bio_in, b), s); 2184 + DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu", 2185 + ctx->bio_in->bi_bdev, s); 2176 2186 dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead", 2177 2187 ctx->bio_in, s, 0); 2178 2188 io->error = BLK_STS_PROTECTION;
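The bio_devname() conversions in this and the following hunks work because printk's %pg specifier formats a struct block_device pointer directly, so callers no longer need an on-stack BDEVNAME_SIZE buffer. Sketch of the pattern (kernel-style C, illustrative only):

```c
/* Before: caller supplies a scratch buffer for the device name. */
char b[BDEVNAME_SIZE];
DMERR_LIMIT("%s: INTEGRITY AEAD ERROR, sector %llu",
	    bio_devname(bio, b), s);

/* After: vsprintf formats the bdev name itself via %pg. */
DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
	    bio->bi_bdev, s);
```

Removing every such caller is what allows the series to delete bio_devname() from the block layer entirely (the "remove bio_devname" commits in the merge log).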
+2 -3
drivers/md/dm-integrity.c
··· 1788 1788 checksums_ptr - checksums, dio->op == REQ_OP_READ ? TAG_CMP : TAG_WRITE); 1789 1789 if (unlikely(r)) { 1790 1790 if (r > 0) { 1791 - char b[BDEVNAME_SIZE]; 1792 1791 sector_t s; 1793 1792 1794 1793 s = sector - ((r + ic->tag_size - 1) / ic->tag_size); 1795 - DMERR_LIMIT("%s: Checksum failed at sector 0x%llx", 1796 - bio_devname(bio, b), s); 1794 + DMERR_LIMIT("%pg: Checksum failed at sector 0x%llx", 1795 + bio->bi_bdev, s); 1797 1796 r = -EILSEQ; 1798 1797 atomic64_inc(&ic->number_of_mismatches); 1799 1798 dm_audit_log_bio(DM_MSG_PREFIX, "integrity-checksum",
+2 -3
drivers/md/dm-io.c
··· 345 345 (PAGE_SIZE >> SECTOR_SHIFT))); 346 346 } 347 347 348 - bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, &io->client->bios); 348 + bio = bio_alloc_bioset(where->bdev, num_bvecs, op | op_flags, 349 + GFP_NOIO, &io->client->bios); 349 350 bio->bi_iter.bi_sector = where->sector + (where->count - remaining); 350 - bio_set_dev(bio, where->bdev); 351 351 bio->bi_end_io = endio; 352 - bio_set_op_attrs(bio, op, op_flags); 353 352 store_io_and_region_in_bio(bio, io, region); 354 353 355 354 if (op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES) {
+8 -31
drivers/md/dm-log-writes.c
··· 217 217 void *ptr; 218 218 size_t ret; 219 219 220 - bio = bio_alloc(GFP_KERNEL, 1); 221 - if (!bio) { 222 - DMERR("Couldn't alloc log bio"); 223 - goto error; 224 - } 220 + bio = bio_alloc(lc->logdev->bdev, 1, REQ_OP_WRITE, GFP_KERNEL); 225 221 bio->bi_iter.bi_size = 0; 226 222 bio->bi_iter.bi_sector = sector; 227 - bio_set_dev(bio, lc->logdev->bdev); 228 223 bio->bi_end_io = (sector == WRITE_LOG_SUPER_SECTOR) ? 229 224 log_end_super : log_end_io; 230 225 bio->bi_private = lc; 231 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 232 226 233 227 page = alloc_page(GFP_KERNEL); 234 228 if (!page) { ··· 269 275 270 276 atomic_inc(&lc->io_blocks); 271 277 272 - bio = bio_alloc(GFP_KERNEL, bio_pages); 273 - if (!bio) { 274 - DMERR("Couldn't alloc inline data bio"); 275 - goto error; 276 - } 277 - 278 + bio = bio_alloc(lc->logdev->bdev, bio_pages, REQ_OP_WRITE, 279 + GFP_KERNEL); 278 280 bio->bi_iter.bi_size = 0; 279 281 bio->bi_iter.bi_sector = sector; 280 - bio_set_dev(bio, lc->logdev->bdev); 281 282 bio->bi_end_io = log_end_io; 282 283 bio->bi_private = lc; 283 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 284 284 285 285 for (i = 0; i < bio_pages; i++) { 286 286 pg_datalen = min_t(int, datalen, PAGE_SIZE); ··· 310 322 error_bio: 311 323 bio_free_pages(bio); 312 324 bio_put(bio); 313 - error: 314 325 put_io_block(lc); 315 326 return -1; 316 327 } ··· 350 363 goto out; 351 364 352 365 atomic_inc(&lc->io_blocks); 353 - bio = bio_alloc(GFP_KERNEL, bio_max_segs(block->vec_cnt)); 354 - if (!bio) { 355 - DMERR("Couldn't alloc log bio"); 356 - goto error; 357 - } 366 + bio = bio_alloc(lc->logdev->bdev, bio_max_segs(block->vec_cnt), 367 + REQ_OP_WRITE, GFP_KERNEL); 358 368 bio->bi_iter.bi_size = 0; 359 369 bio->bi_iter.bi_sector = sector; 360 - bio_set_dev(bio, lc->logdev->bdev); 361 370 bio->bi_end_io = log_end_io; 362 371 bio->bi_private = lc; 363 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 364 372 365 373 for (i = 0; i < block->vec_cnt; i++) { 366 374 /* ··· 367 385 if (ret != block->vecs[i].bv_len) { 368 386 atomic_inc(&lc->io_blocks); 369 387 submit_bio(bio); 370 - bio = bio_alloc(GFP_KERNEL, 371 - bio_max_segs(block->vec_cnt - i)); 372 - if (!bio) { 373 - DMERR("Couldn't alloc log bio"); 374 - goto error; 375 - } 388 + bio = bio_alloc(lc->logdev->bdev, 389 + bio_max_segs(block->vec_cnt - i), 390 + REQ_OP_WRITE, GFP_KERNEL); 376 391 bio->bi_iter.bi_size = 0; 377 392 bio->bi_iter.bi_sector = sector; 378 - bio_set_dev(bio, lc->logdev->bdev); 379 393 bio->bi_end_io = log_end_io; 380 394 bio->bi_private = lc; 381 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 382 395 383 396 ret = bio_add_page(bio, block->vecs[i].bv_page, 384 397 block->vecs[i].bv_len, 0);
+9 -17
drivers/md/dm-rq.c
··· 303 303 dm_complete_request(tio->orig, error); 304 304 } 305 305 306 - static blk_status_t dm_dispatch_clone_request(struct request *clone, struct request *rq) 307 - { 308 - blk_status_t r; 309 - 310 - if (blk_queue_io_stat(clone->q)) 311 - clone->rq_flags |= RQF_IO_STAT; 312 - 313 - clone->start_time_ns = ktime_get_ns(); 314 - r = blk_insert_cloned_request(clone->q, clone); 315 - if (r != BLK_STS_OK && r != BLK_STS_RESOURCE && r != BLK_STS_DEV_RESOURCE) 316 - /* must complete clone in terms of original request */ 317 - dm_complete_request(rq, r); 318 - return r; 319 - } 320 - 321 306 static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig, 322 307 void *data) 323 308 { ··· 383 398 /* The target has remapped the I/O so dispatch it */ 384 399 trace_block_rq_remap(clone, disk_devt(dm_disk(md)), 385 400 blk_rq_pos(rq)); 386 - ret = dm_dispatch_clone_request(clone, rq); 387 - if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { 401 + ret = blk_insert_cloned_request(clone); 402 + switch (ret) { 403 + case BLK_STS_OK: 404 + break; 405 + case BLK_STS_RESOURCE: 406 + case BLK_STS_DEV_RESOURCE: 388 407 blk_rq_unprep_clone(clone); 389 408 blk_mq_cleanup_rq(clone); 390 409 tio->ti->type->release_clone_rq(clone, &tio->info); 391 410 tio->clone = NULL; 392 411 return DM_MAPIO_REQUEUE; 412 + default: 413 + /* must complete clone in terms of original request */ 414 + dm_complete_request(rq, ret); 393 415 } 394 416 break; 395 417 case DM_MAPIO_REQUEUE:
+1 -20
drivers/md/dm-snap.c
··· 141 141 * for them to be committed. 142 142 */ 143 143 struct bio_list bios_queued_during_merge; 144 - 145 - /* 146 - * Flush data after merge. 147 - */ 148 - struct bio flush_bio; 149 144 }; 150 145 151 146 /* ··· 1122 1127 1123 1128 static void error_bios(struct bio *bio); 1124 1129 1125 - static int flush_data(struct dm_snapshot *s) 1126 - { 1127 - struct bio *flush_bio = &s->flush_bio; 1128 - 1129 - bio_reset(flush_bio); 1130 - bio_set_dev(flush_bio, s->origin->bdev); 1131 - flush_bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 1132 - 1133 - return submit_bio_wait(flush_bio); 1134 - } 1135 - 1136 1130 static void merge_callback(int read_err, unsigned long write_err, void *context) 1137 1131 { 1138 1132 struct dm_snapshot *s = context; ··· 1135 1151 goto shut; 1136 1152 } 1137 1153 1138 - if (flush_data(s) < 0) { 1154 + if (blkdev_issue_flush(s->origin->bdev) < 0) { 1139 1155 DMERR("Flush after merge failed: shutting down merge"); 1140 1156 goto shut; 1141 1157 } ··· 1324 1340 s->first_merging_chunk = 0; 1325 1341 s->num_merging_chunks = 0; 1326 1342 bio_list_init(&s->bios_queued_during_merge); 1327 - bio_init(&s->flush_bio, NULL, 0); 1328 1343 1329 1344 /* Allocate hash table for COW data */ 1330 1345 if (init_hash_tables(s)) { ··· 1510 1527 mempool_exit(&s->pending_pool); 1511 1528 1512 1529 dm_exception_store_destroy(s->store); 1513 - 1514 - bio_uninit(&s->flush_bio); 1515 1530 1516 1531 dm_put_device(ti, s->cow); 1517 1532
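The dm-snap change drops the flush bio embedded in struct dm_snapshot in favor of the generic block-layer helper, which builds, submits, and waits on an empty preflush itself. Roughly (a kernel fragment, not buildable standalone):

```c
/* Before: a long-lived bio embedded in the snapshot, reset per flush,
 * plus matching bio_init()/bio_uninit() calls in ctr/dtr. */
bio_reset(&s->flush_bio);
bio_set_dev(&s->flush_bio, s->origin->bdev);
s->flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
r = submit_bio_wait(&s->flush_bio);

/* After: one call, no per-snapshot flush state at all. */
r = blkdev_issue_flush(s->origin->bdev);
```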
+11 -28
drivers/md/dm-thin.c
··· 282 282 struct dm_bio_prison_cell **cell_sort_array; 283 283 284 284 mempool_t mapping_pool; 285 - 286 - struct bio flush_bio; 287 285 }; 288 286 289 287 static void metadata_operation_failed(struct pool *pool, const char *op, int r); ··· 1177 1179 return; 1178 1180 } 1179 1181 1180 - discard_parent = bio_alloc(GFP_NOIO, 1); 1181 - if (!discard_parent) { 1182 - DMWARN("%s: unable to allocate top level discard bio for passdown. Skipping passdown.", 1183 - dm_device_name(tc->pool->pool_md)); 1184 - queue_passdown_pt2(m); 1182 + discard_parent = bio_alloc(NULL, 1, 0, GFP_NOIO); 1183 + discard_parent->bi_end_io = passdown_endio; 1184 + discard_parent->bi_private = m; 1185 + if (m->maybe_shared) 1186 + passdown_double_checking_shared_status(m, discard_parent); 1187 + else { 1188 + struct discard_op op; 1185 1189 1186 - } else { 1187 - discard_parent->bi_end_io = passdown_endio; 1188 - discard_parent->bi_private = m; 1189 - 1190 - if (m->maybe_shared) 1191 - passdown_double_checking_shared_status(m, discard_parent); 1192 - else { 1193 - struct discard_op op; 1194 - 1195 - begin_discard(&op, tc, discard_parent); 1196 - r = issue_discard(&op, m->data_block, data_end); 1197 - end_discard(&op, r); 1198 - } 1190 + begin_discard(&op, tc, discard_parent); 1191 + r = issue_discard(&op, m->data_block, data_end); 1192 + end_discard(&op, r); 1199 1193 } 1200 1194 } 1201 1195 ··· 2903 2913 if (pool->next_mapping) 2904 2914 mempool_free(pool->next_mapping, &pool->mapping_pool); 2905 2915 mempool_exit(&pool->mapping_pool); 2906 - bio_uninit(&pool->flush_bio); 2907 2916 dm_deferred_set_destroy(pool->shared_read_ds); 2908 2917 dm_deferred_set_destroy(pool->all_io_ds); 2909 2918 kfree(pool); ··· 2983 2994 pool->low_water_triggered = false; 2984 2995 pool->suspended = true; 2985 2996 pool->out_of_data_space = false; 2986 - bio_init(&pool->flush_bio, NULL, 0); 2987 2997 2988 2998 pool->shared_read_ds = dm_deferred_set_create(); 2989 2999 if (!pool->shared_read_ds) { ··· 3189 3201 
static int metadata_pre_commit_callback(void *context) 3190 3202 { 3191 3203 struct pool *pool = context; 3192 - struct bio *flush_bio = &pool->flush_bio; 3193 3204 3194 - bio_reset(flush_bio); 3195 - bio_set_dev(flush_bio, pool->data_dev); 3196 - flush_bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 3197 - 3198 - return submit_bio_wait(flush_bio); 3205 + return blkdev_issue_flush(pool->data_dev); 3199 3206 } 3200 3207 3201 3208 static sector_t get_dev_size(struct block_device *bdev)
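The dm-thin hunk also shows the other half of the new calling convention: when the eventual target device is not known at allocation time, NULL is passed for the bdev and 0 for the opf, both to be filled in later. And because a GFP_NOIO single-vector allocation cannot fail, the "Skipping passdown" fallback disappears and the surviving branch un-indents. A sketch of the assumption being relied on:

```c
/* bdev/opf unknown yet: allocate detached, complete the bio later.
 * GFP_NOIO still permits direct reclaim, so for a small nr_vecs
 * bio_alloc() will not return NULL and no error path is needed. */
discard_parent = bio_alloc(NULL, 1, 0, GFP_NOIO);
discard_parent->bi_end_io = passdown_endio;
discard_parent->bi_private = m;
```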
+4 -3
drivers/md/dm-writecache.c
··· 1821 1821 1822 1822 max_pages = e->wc_list_contiguous; 1823 1823 1824 - bio = bio_alloc_bioset(GFP_NOIO, max_pages, &wc->bio_set); 1824 + bio = bio_alloc_bioset(wc->dev->bdev, max_pages, REQ_OP_WRITE, 1825 + GFP_NOIO, &wc->bio_set); 1825 1826 wb = container_of(bio, struct writeback_struct, bio); 1826 1827 wb->wc = wc; 1827 1828 bio->bi_end_io = writecache_writeback_endio; 1828 - bio_set_dev(bio, wc->dev->bdev); 1829 1829 bio->bi_iter.bi_sector = read_original_sector(wc, e); 1830 1830 if (max_pages <= WB_LIST_INLINE || 1831 1831 unlikely(!(wb->wc_list = kmalloc_array(max_pages, sizeof(struct wc_entry *), ··· 1852 1852 wb->wc_list[wb->wc_list_n++] = f; 1853 1853 e = f; 1854 1854 } 1855 - bio_set_op_attrs(bio, REQ_OP_WRITE, WC_MODE_FUA(wc) * REQ_FUA); 1855 + if (WC_MODE_FUA(wc)) 1856 + bio->bi_opf |= REQ_FUA; 1856 1857 if (writecache_has_error(wc)) { 1857 1858 bio->bi_status = BLK_STS_IOERR; 1858 1859 bio_endio(bio);
+6 -20
drivers/md/dm-zoned-metadata.c
··· 550 550 if (!mblk) 551 551 return ERR_PTR(-ENOMEM); 552 552 553 - bio = bio_alloc(GFP_NOIO, 1); 554 - if (!bio) { 555 - dmz_free_mblock(zmd, mblk); 556 - return ERR_PTR(-ENOMEM); 557 - } 553 + bio = bio_alloc(dev->bdev, 1, REQ_OP_READ | REQ_META | REQ_PRIO, 554 + GFP_NOIO); 558 555 559 556 spin_lock(&zmd->mblk_lock); 560 557 ··· 575 578 576 579 /* Submit read BIO */ 577 580 bio->bi_iter.bi_sector = dmz_blk2sect(block); 578 - bio_set_dev(bio, dev->bdev); 579 581 bio->bi_private = mblk; 580 582 bio->bi_end_io = dmz_mblock_bio_end_io; 581 - bio_set_op_attrs(bio, REQ_OP_READ, REQ_META | REQ_PRIO); 582 583 bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0); 583 584 submit_bio(bio); 584 585 ··· 720 725 if (dmz_bdev_is_dying(dev)) 721 726 return -EIO; 722 727 723 - bio = bio_alloc(GFP_NOIO, 1); 724 - if (!bio) { 725 - set_bit(DMZ_META_ERROR, &mblk->state); 726 - return -ENOMEM; 727 - } 728 + bio = bio_alloc(dev->bdev, 1, REQ_OP_WRITE | REQ_META | REQ_PRIO, 729 + GFP_NOIO); 728 730 729 731 set_bit(DMZ_META_WRITING, &mblk->state); 730 732 731 733 bio->bi_iter.bi_sector = dmz_blk2sect(block); 732 - bio_set_dev(bio, dev->bdev); 733 734 bio->bi_private = mblk; 734 735 bio->bi_end_io = dmz_mblock_bio_end_io; 735 - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); 736 736 bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0); 737 737 submit_bio(bio); 738 738 ··· 749 759 if (dmz_bdev_is_dying(dev)) 750 760 return -EIO; 751 761 752 - bio = bio_alloc(GFP_NOIO, 1); 753 - if (!bio) 754 - return -ENOMEM; 755 - 762 + bio = bio_alloc(dev->bdev, 1, op | REQ_SYNC | REQ_META | REQ_PRIO, 763 + GFP_NOIO); 756 764 bio->bi_iter.bi_sector = dmz_blk2sect(block); 757 - bio_set_dev(bio, dev->bdev); 758 - bio_set_op_attrs(bio, op, REQ_SYNC | REQ_META | REQ_PRIO); 759 765 bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0); 760 766 ret = submit_bio_wait(bio); 761 767 bio_put(bio);
+1 -2
drivers/md/dm-zoned-target.c
··· 125 125 if (dev->flags & DMZ_BDEV_DYING) 126 126 return -EIO; 127 127 128 - clone = bio_clone_fast(bio, GFP_NOIO, &dmz->bio_set); 128 + clone = bio_alloc_clone(dev->bdev, bio, GFP_NOIO, &dmz->bio_set); 129 129 if (!clone) 130 130 return -ENOMEM; 131 131 132 - bio_set_dev(clone, dev->bdev); 133 132 bioctx->dev = dev; 134 133 clone->bi_iter.bi_sector = 135 134 dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
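dm-zoned, md-faulty, raid1, and raid10 all make the matching clone-side switch: bio_clone_fast() followed by a separate bio_set_dev() becomes a single bio_alloc_clone() that takes the new device as its first argument. Sketch (kernel fragment):

```c
/* Before: clone first, repoint at the target device second. */
clone = bio_clone_fast(bio, GFP_NOIO, &dmz->bio_set);
if (!clone)
	return -ENOMEM;
bio_set_dev(clone, dev->bdev);

/* After: one allocation, already bound to dev->bdev. */
clone = bio_alloc_clone(dev->bdev, bio, GFP_NOIO, &dmz->bio_set);
if (!clone)
	return -ENOMEM;
```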
+61 -111
drivers/md/dm.c
··· 79 79 #define DM_IO_BIO_OFFSET \ 80 80 (offsetof(struct dm_target_io, clone) + offsetof(struct dm_io, tio)) 81 81 82 + static inline struct dm_target_io *clone_to_tio(struct bio *clone) 83 + { 84 + return container_of(clone, struct dm_target_io, clone); 85 + } 86 + 82 87 void *dm_per_bio_data(struct bio *bio, size_t data_size) 83 88 { 84 - struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone); 85 - if (!tio->inside_dm_io) 89 + if (!clone_to_tio(bio)->inside_dm_io) 86 90 return (char *)bio - DM_TARGET_IO_BIO_OFFSET - data_size; 87 91 return (char *)bio - DM_IO_BIO_OFFSET - data_size; 88 92 } ··· 481 477 482 478 u64 dm_start_time_ns_from_clone(struct bio *bio) 483 479 { 484 - struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone); 485 - struct dm_io *io = tio->io; 486 - 487 - return jiffies_to_nsecs(io->start_time); 480 + return jiffies_to_nsecs(clone_to_tio(bio)->io->start_time); 488 481 } 489 482 EXPORT_SYMBOL_GPL(dm_start_time_ns_from_clone); 490 483 ··· 520 519 struct dm_target_io *tio; 521 520 struct bio *clone; 522 521 523 - clone = bio_alloc_bioset(GFP_NOIO, 0, &md->io_bs); 524 - if (!clone) 525 - return NULL; 522 + clone = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO, &md->io_bs); 526 523 527 - tio = container_of(clone, struct dm_target_io, clone); 524 + tio = clone_to_tio(clone); 528 525 tio->inside_dm_io = true; 529 526 tio->io = NULL; 530 527 ··· 544 545 bio_put(&io->tio.clone); 545 546 } 546 547 547 - static struct dm_target_io *alloc_tio(struct clone_info *ci, struct dm_target *ti, 548 - unsigned target_bio_nr, gfp_t gfp_mask) 548 + static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti, 549 + unsigned target_bio_nr, unsigned *len, gfp_t gfp_mask) 549 550 { 550 551 struct dm_target_io *tio; 551 552 ··· 553 554 /* the dm_target_io embedded in ci->io is available */ 554 555 tio = &ci->io->tio; 555 556 } else { 556 - struct bio *clone = bio_alloc_bioset(gfp_mask, 0, &ci->io->md->bs); 557 + struct bio 
*clone = bio_alloc_clone(ci->bio->bi_bdev, ci->bio, 558 + gfp_mask, &ci->io->md->bs); 557 559 if (!clone) 558 560 return NULL; 559 561 560 - tio = container_of(clone, struct dm_target_io, clone); 562 + tio = clone_to_tio(clone); 561 563 tio->inside_dm_io = false; 562 564 } 563 565 ··· 566 566 tio->io = ci->io; 567 567 tio->ti = ti; 568 568 tio->target_bio_nr = target_bio_nr; 569 + tio->len_ptr = len; 569 570 570 - return tio; 571 + return &tio->clone; 571 572 } 572 573 573 - static void free_tio(struct dm_target_io *tio) 574 + static void free_tio(struct bio *clone) 574 575 { 575 - if (tio->inside_dm_io) 576 + if (clone_to_tio(clone)->inside_dm_io) 576 577 return; 577 - bio_put(&tio->clone); 578 + bio_put(clone); 578 579 } 579 580 580 581 /* ··· 880 879 static void clone_endio(struct bio *bio) 881 880 { 882 881 blk_status_t error = bio->bi_status; 883 - struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone); 882 + struct dm_target_io *tio = clone_to_tio(bio); 884 883 struct dm_io *io = tio->io; 885 884 struct mapped_device *md = tio->io->md; 886 885 dm_endio_fn endio = tio->ti->type->end_io; ··· 931 930 up(&md->swap_bios_semaphore); 932 931 } 933 932 934 - free_tio(tio); 933 + free_tio(bio); 935 934 dm_io_dec_pending(io, error); 936 935 } 937 936 ··· 1086 1085 */ 1087 1086 void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors) 1088 1087 { 1089 - struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone); 1088 + struct dm_target_io *tio = clone_to_tio(bio); 1090 1089 unsigned bi_size = bio->bi_iter.bi_size >> SECTOR_SHIFT; 1091 1090 1092 1091 BUG_ON(bio->bi_opf & REQ_PREFLUSH); ··· 1116 1115 mutex_unlock(&md->swap_bios_lock); 1117 1116 } 1118 1117 1119 - static void __map_bio(struct dm_target_io *tio) 1118 + static void __map_bio(struct bio *clone) 1120 1119 { 1120 + struct dm_target_io *tio = clone_to_tio(clone); 1121 1121 int r; 1122 1122 sector_t sector; 1123 - struct bio *clone = &tio->clone; 1124 1123 struct dm_io *io = 
tio->io; 1125 1124 struct dm_target *ti = tio->ti; 1126 1125 ··· 1165 1164 struct mapped_device *md = io->md; 1166 1165 up(&md->swap_bios_semaphore); 1167 1166 } 1168 - free_tio(tio); 1167 + free_tio(clone); 1169 1168 dm_io_dec_pending(io, BLK_STS_IOERR); 1170 1169 break; 1171 1170 case DM_MAPIO_REQUEUE: ··· 1173 1172 struct mapped_device *md = io->md; 1174 1173 up(&md->swap_bios_semaphore); 1175 1174 } 1176 - free_tio(tio); 1175 + free_tio(clone); 1177 1176 dm_io_dec_pending(io, BLK_STS_DM_REQUEUE); 1178 1177 break; 1179 1178 default: ··· 1191 1190 /* 1192 1191 * Creates a bio that consists of range of complete bvecs. 1193 1192 */ 1194 - static int clone_bio(struct dm_target_io *tio, struct bio *bio, 1195 - sector_t sector, unsigned len) 1193 + static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti, 1194 + sector_t sector, unsigned *len) 1196 1195 { 1197 - struct bio *clone = &tio->clone; 1198 - int r; 1196 + struct bio *bio = ci->bio, *clone; 1199 1197 1200 - __bio_clone_fast(clone, bio); 1201 - 1202 - r = bio_crypt_clone(clone, bio, GFP_NOIO); 1203 - if (r < 0) 1204 - return r; 1205 - 1206 - if (bio_integrity(bio)) { 1207 - if (unlikely(!dm_target_has_integrity(tio->ti->type) && 1208 - !dm_target_passes_integrity(tio->ti->type))) { 1209 - DMWARN("%s: the target %s doesn't support integrity data.", 1210 - dm_device_name(tio->io->md), 1211 - tio->ti->type->name); 1212 - return -EIO; 1213 - } 1214 - 1215 - r = bio_integrity_clone(clone, bio, GFP_NOIO); 1216 - if (r < 0) 1217 - return r; 1218 - } 1219 - 1198 + clone = alloc_tio(ci, ti, 0, len, GFP_NOIO); 1220 1199 bio_advance(clone, to_bytes(sector - clone->bi_iter.bi_sector)); 1221 - clone->bi_iter.bi_size = to_bytes(len); 1200 + clone->bi_iter.bi_size = to_bytes(*len); 1222 1201 1223 1202 if (bio_integrity(bio)) 1224 1203 bio_integrity_trim(clone); 1225 1204 1205 + __map_bio(clone); 1226 1206 return 0; 1227 1207 } 1228 1208 1229 1209 static void alloc_multiple_bios(struct bio_list *blist, 
struct clone_info *ci, 1230 - struct dm_target *ti, unsigned num_bios) 1210 + struct dm_target *ti, unsigned num_bios, 1211 + unsigned *len) 1231 1212 { 1232 - struct dm_target_io *tio; 1213 + struct bio *bio; 1233 1214 int try; 1234 - 1235 - if (!num_bios) 1236 - return; 1237 - 1238 - if (num_bios == 1) { 1239 - tio = alloc_tio(ci, ti, 0, GFP_NOIO); 1240 - bio_list_add(blist, &tio->clone); 1241 - return; 1242 - } 1243 1215 1244 1216 for (try = 0; try < 2; try++) { 1245 1217 int bio_nr; 1246 - struct bio *bio; 1247 1218 1248 1219 if (try) 1249 1220 mutex_lock(&ci->io->md->table_devices_lock); 1250 1221 for (bio_nr = 0; bio_nr < num_bios; bio_nr++) { 1251 - tio = alloc_tio(ci, ti, bio_nr, try ? GFP_NOIO : GFP_NOWAIT); 1252 - if (!tio) 1222 + bio = alloc_tio(ci, ti, bio_nr, len, 1223 + try ? GFP_NOIO : GFP_NOWAIT); 1224 + if (!bio) 1253 1225 break; 1254 1226 1255 - bio_list_add(blist, &tio->clone); 1227 + bio_list_add(blist, bio); 1256 1228 } 1257 1229 if (try) 1258 1230 mutex_unlock(&ci->io->md->table_devices_lock); 1259 1231 if (bio_nr == num_bios) 1260 1232 return; 1261 1233 1262 - while ((bio = bio_list_pop(blist))) { 1263 - tio = container_of(bio, struct dm_target_io, clone); 1264 - free_tio(tio); 1265 - } 1234 + while ((bio = bio_list_pop(blist))) 1235 + free_tio(bio); 1266 1236 } 1267 - } 1268 - 1269 - static void __clone_and_map_simple_bio(struct clone_info *ci, 1270 - struct dm_target_io *tio, unsigned *len) 1271 - { 1272 - struct bio *clone = &tio->clone; 1273 - 1274 - tio->len_ptr = len; 1275 - 1276 - __bio_clone_fast(clone, ci->bio); 1277 - if (len) 1278 - bio_setup_sector(clone, ci->sector, *len); 1279 - __map_bio(tio); 1280 1237 } 1281 1238 1282 1239 static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti, 1283 1240 unsigned num_bios, unsigned *len) 1284 1241 { 1285 1242 struct bio_list blist = BIO_EMPTY_LIST; 1286 - struct bio *bio; 1287 - struct dm_target_io *tio; 1243 + struct bio *clone; 1288 1244 1289 - 
alloc_multiple_bios(&blist, ci, ti, num_bios); 1290 - 1291 - while ((bio = bio_list_pop(&blist))) { 1292 - tio = container_of(bio, struct dm_target_io, clone); 1293 - __clone_and_map_simple_bio(ci, tio, len); 1245 + switch (num_bios) { 1246 + case 0: 1247 + break; 1248 + case 1: 1249 + clone = alloc_tio(ci, ti, 0, len, GFP_NOIO); 1250 + if (len) 1251 + bio_setup_sector(clone, ci->sector, *len); 1252 + __map_bio(clone); 1253 + break; 1254 + default: 1255 + alloc_multiple_bios(&blist, ci, ti, num_bios, len); 1256 + while ((clone = bio_list_pop(&blist))) { 1257 + if (len) 1258 + bio_setup_sector(clone, ci->sector, *len); 1259 + __map_bio(clone); 1260 + } 1261 + break; 1294 1262 } 1295 1263 } 1296 1264 ··· 1274 1304 * need to reference it after submit. It's just used as 1275 1305 * the basis for the clone(s). 1276 1306 */ 1277 - bio_init(&flush_bio, NULL, 0); 1278 - flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; 1279 - bio_set_dev(&flush_bio, ci->io->md->disk->part0); 1307 + bio_init(&flush_bio, ci->io->md->disk->part0, NULL, 0, 1308 + REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC); 1280 1309 1281 1310 ci->bio = &flush_bio; 1282 1311 ci->sector_count = 0; ··· 1285 1316 __send_duplicate_bios(ci, ti, ti->num_flush_bios, NULL); 1286 1317 1287 1318 bio_uninit(ci->bio); 1288 - return 0; 1289 - } 1290 - 1291 - static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti, 1292 - sector_t sector, unsigned *len) 1293 - { 1294 - struct bio *bio = ci->bio; 1295 - struct dm_target_io *tio; 1296 - int r; 1297 - 1298 - tio = alloc_tio(ci, ti, 0, GFP_NOIO); 1299 - tio->len_ptr = len; 1300 - r = clone_bio(tio, bio, sector, *len); 1301 - if (r < 0) { 1302 - free_tio(tio); 1303 - return r; 1304 - } 1305 - __map_bio(tio); 1306 - 1307 1319 return 0; 1308 1320 } 1309 1321
+2 -2
drivers/md/md-faulty.c
··· 205 205 } 206 206 } 207 207 if (failit) { 208 - struct bio *b = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); 208 + struct bio *b = bio_alloc_clone(conf->rdev->bdev, bio, GFP_NOIO, 209 + &mddev->bio_set); 209 210 210 - bio_set_dev(b, conf->rdev->bdev); 211 211 b->bi_private = bio; 212 212 b->bi_end_io = faulty_fail; 213 213 bio = b;
+5 -8
drivers/md/md-multipath.c
··· 121 121 } 122 122 multipath = conf->multipaths + mp_bh->path; 123 123 124 - bio_init(&mp_bh->bio, NULL, 0); 125 - __bio_clone_fast(&mp_bh->bio, bio); 124 + bio_init_clone(multipath->rdev->bdev, &mp_bh->bio, bio, GFP_NOIO); 126 125 127 126 mp_bh->bio.bi_iter.bi_sector += multipath->rdev->data_offset; 128 - bio_set_dev(&mp_bh->bio, multipath->rdev->bdev); 129 127 mp_bh->bio.bi_opf |= REQ_FAILFAST_TRANSPORT; 130 128 mp_bh->bio.bi_end_io = multipath_end_request; 131 129 mp_bh->bio.bi_private = mp_bh; ··· 297 299 298 300 md_check_recovery(mddev); 299 301 for (;;) { 300 - char b[BDEVNAME_SIZE]; 301 302 spin_lock_irqsave(&conf->device_lock, flags); 302 303 if (list_empty(head)) 303 304 break; ··· 308 311 bio->bi_iter.bi_sector = mp_bh->master_bio->bi_iter.bi_sector; 309 312 310 313 if ((mp_bh->path = multipath_map (conf))<0) { 311 - pr_err("multipath: %s: unrecoverable IO read error for block %llu\n", 312 - bio_devname(bio, b), 314 + pr_err("multipath: %pg: unrecoverable IO read error for block %llu\n", 315 + bio->bi_bdev, 313 316 (unsigned long long)bio->bi_iter.bi_sector); 314 317 multipath_end_bh_io(mp_bh, BLK_STS_IOERR); 315 318 } else { 316 - pr_err("multipath: %s: redirecting sector %llu to another IO path\n", 317 - bio_devname(bio, b), 319 + pr_err("multipath: %pg: redirecting sector %llu to another IO path\n", 320 + bio->bi_bdev, 318 321 (unsigned long long)bio->bi_iter.bi_sector); 319 322 *bio = *(mp_bh->master_bio); 320 323 bio->bi_iter.bi_sector +=
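The bio_devname() removal works because the kernel's vsprintf already has a %pg extension that formats a block_device name directly, eliminating the on-stack BDEVNAME_SIZE scratch buffer. Sketch of the pattern (kernel fragment):

```c
/* Before: caller supplies a scratch buffer for the name. */
char b[BDEVNAME_SIZE];
pr_err("multipath: %s: unrecoverable IO read error for block %llu\n",
       bio_devname(bio, b),
       (unsigned long long)bio->bi_iter.bi_sector);

/* After: %pg consumes the struct block_device pointer itself. */
pr_err("multipath: %pg: unrecoverable IO read error for block %llu\n",
       bio->bi_bdev,
       (unsigned long long)bio->bi_iter.bi_sector);
```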
+14 -15
drivers/md/md.c
··· 562 562 atomic_inc(&rdev->nr_pending); 563 563 atomic_inc(&rdev->nr_pending); 564 564 rcu_read_unlock(); 565 - bi = bio_alloc_bioset(GFP_NOIO, 0, &mddev->bio_set); 565 + bi = bio_alloc_bioset(rdev->bdev, 0, 566 + REQ_OP_WRITE | REQ_PREFLUSH, 567 + GFP_NOIO, &mddev->bio_set); 566 568 bi->bi_end_io = md_end_flush; 567 569 bi->bi_private = rdev; 568 - bio_set_dev(bi, rdev->bdev); 569 - bi->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 570 570 atomic_inc(&mddev->flush_pending); 571 571 submit_bio(bi); 572 572 rcu_read_lock(); ··· 955 955 * If an error occurred, call md_error 956 956 */ 957 957 struct bio *bio; 958 - int ff = 0; 959 958 960 959 if (!page) 961 960 return; ··· 962 963 if (test_bit(Faulty, &rdev->flags)) 963 964 return; 964 965 965 - bio = bio_alloc_bioset(GFP_NOIO, 1, &mddev->sync_set); 966 + bio = bio_alloc_bioset(rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev, 967 + 1, 968 + REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH | REQ_FUA, 969 + GFP_NOIO, &mddev->sync_set); 966 970 967 971 atomic_inc(&rdev->nr_pending); 968 972 969 - bio_set_dev(bio, rdev->meta_bdev ? 
rdev->meta_bdev : rdev->bdev); 970 973 bio->bi_iter.bi_sector = sector; 971 974 bio_add_page(bio, page, size, 0); 972 975 bio->bi_private = rdev; ··· 977 976 if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) && 978 977 test_bit(FailFast, &rdev->flags) && 979 978 !test_bit(LastDev, &rdev->flags)) 980 - ff = MD_FAILFAST; 981 - bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH | REQ_FUA | ff; 979 + bio->bi_opf |= MD_FAILFAST; 982 980 983 981 atomic_inc(&mddev->pending_writes); 984 982 submit_bio(bio); ··· 998 998 struct bio bio; 999 999 struct bio_vec bvec; 1000 1000 1001 - bio_init(&bio, &bvec, 1); 1002 - 1003 1001 if (metadata_op && rdev->meta_bdev) 1004 - bio_set_dev(&bio, rdev->meta_bdev); 1002 + bio_init(&bio, rdev->meta_bdev, &bvec, 1, op | op_flags); 1005 1003 else 1006 - bio_set_dev(&bio, rdev->bdev); 1007 - bio.bi_opf = op | op_flags; 1004 + bio_init(&bio, rdev->bdev, &bvec, 1, op | op_flags); 1005 + 1008 1006 if (metadata_op) 1009 1007 bio.bi_iter.bi_sector = sector + rdev->sb_start; 1010 1008 else if (rdev->mddev->reshape_position != MaxSector && ··· 8634 8636 */ 8635 8637 void md_account_bio(struct mddev *mddev, struct bio **bio) 8636 8638 { 8639 + struct block_device *bdev = (*bio)->bi_bdev; 8637 8640 struct md_io_acct *md_io_acct; 8638 8641 struct bio *clone; 8639 8642 8640 - if (!blk_queue_io_stat((*bio)->bi_bdev->bd_disk->queue)) 8643 + if (!blk_queue_io_stat(bdev->bd_disk->queue)) 8641 8644 return; 8642 8645 8643 - clone = bio_clone_fast(*bio, GFP_NOIO, &mddev->io_acct_set); 8646 + clone = bio_alloc_clone(bdev, *bio, GFP_NOIO, &mddev->io_acct_set); 8644 8647 md_io_acct = container_of(clone, struct md_io_acct, bio_clone); 8645 8648 md_io_acct->orig_bio = *bio; 8646 8649 md_io_acct->start_time = bio_start_io_acct(*bio);
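The same signature change reaches on-stack bios: bio_init() now takes the bdev, the caller-provided bio_vec table and its size, and the opf in one call, which is why sync_page_io()'s device selection turns into two complete bio_init() calls instead of one init plus a conditional bio_set_dev(). Sketch (kernel fragment):

```c
/* Before: init, then pick the device and set the op separately. */
bio_init(&bio, &bvec, 1);
bio_set_dev(&bio, rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev);
bio.bi_opf = op | op_flags;

/* After: everything is established at init time. */
bio_init(&bio, rdev->bdev, &bvec, 1, op | op_flags);
```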
+23 -24
drivers/md/raid1.c
··· 1126 1126 int i = 0; 1127 1127 struct bio *behind_bio = NULL; 1128 1128 1129 - behind_bio = bio_alloc_bioset(GFP_NOIO, vcnt, &r1_bio->mddev->bio_set); 1129 + behind_bio = bio_alloc_bioset(NULL, vcnt, 0, GFP_NOIO, 1130 + &r1_bio->mddev->bio_set); 1130 1131 if (!behind_bio) 1131 1132 return; 1132 1133 ··· 1320 1319 if (!r1bio_existed && blk_queue_io_stat(bio->bi_bdev->bd_disk->queue)) 1321 1320 r1_bio->start_time = bio_start_io_acct(bio); 1322 1321 1323 - read_bio = bio_clone_fast(bio, gfp, &mddev->bio_set); 1322 + read_bio = bio_alloc_clone(mirror->rdev->bdev, bio, gfp, 1323 + &mddev->bio_set); 1324 1324 1325 1325 r1_bio->bios[rdisk] = read_bio; 1326 1326 1327 1327 read_bio->bi_iter.bi_sector = r1_bio->sector + 1328 1328 mirror->rdev->data_offset; 1329 - bio_set_dev(read_bio, mirror->rdev->bdev); 1330 1329 read_bio->bi_end_io = raid1_end_read_request; 1331 1330 bio_set_op_attrs(read_bio, op, do_sync); 1332 1331 if (test_bit(FailFast, &mirror->rdev->flags) && ··· 1546 1545 first_clone = 0; 1547 1546 } 1548 1547 1549 - if (r1_bio->behind_master_bio) 1550 - mbio = bio_clone_fast(r1_bio->behind_master_bio, 1551 - GFP_NOIO, &mddev->bio_set); 1552 - else 1553 - mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); 1554 - 1555 1548 if (r1_bio->behind_master_bio) { 1549 + mbio = bio_alloc_clone(rdev->bdev, 1550 + r1_bio->behind_master_bio, 1551 + GFP_NOIO, &mddev->bio_set); 1556 1552 if (test_bit(CollisionCheck, &rdev->flags)) 1557 1553 wait_for_serialization(rdev, r1_bio); 1558 1554 if (test_bit(WriteMostly, &rdev->flags)) 1559 1555 atomic_inc(&r1_bio->behind_remaining); 1560 - } else if (mddev->serialize_policy) 1561 - wait_for_serialization(rdev, r1_bio); 1556 + } else { 1557 + mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO, 1558 + &mddev->bio_set); 1559 + 1560 + if (mddev->serialize_policy) 1561 + wait_for_serialization(rdev, r1_bio); 1562 + } 1562 1563 1563 1564 r1_bio->bios[i] = mbio; 1564 1565 1565 1566 mbio->bi_iter.bi_sector = (r1_bio->sector + 
rdev->data_offset); 1566 - bio_set_dev(mbio, rdev->bdev); 1567 1567 mbio->bi_end_io = raid1_end_write_request; 1568 1568 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA)); 1569 1569 if (test_bit(FailFast, &rdev->flags) && ··· 2072 2070 } while (!success && d != r1_bio->read_disk); 2073 2071 2074 2072 if (!success) { 2075 - char b[BDEVNAME_SIZE]; 2076 2073 int abort = 0; 2077 2074 /* Cannot read from anywhere, this block is lost. 2078 2075 * Record a bad block on each device. If that doesn't 2079 2076 * work just disable and interrupt the recovery. 2080 2077 * Don't fail devices as that won't really help. 2081 2078 */ 2082 - pr_crit_ratelimited("md/raid1:%s: %s: unrecoverable I/O read error for block %llu\n", 2083 - mdname(mddev), bio_devname(bio, b), 2079 + pr_crit_ratelimited("md/raid1:%s: %pg: unrecoverable I/O read error for block %llu\n", 2080 + mdname(mddev), bio->bi_bdev, 2084 2081 (unsigned long long)r1_bio->sector); 2085 2082 for (d = 0; d < conf->raid_disks * 2; d++) { 2086 2083 rdev = conf->mirrors[d].rdev; ··· 2166 2165 continue; 2167 2166 /* fixup the bio for reuse, but preserve errno */ 2168 2167 status = b->bi_status; 2169 - bio_reset(b); 2168 + bio_reset(b, conf->mirrors[i].rdev->bdev, REQ_OP_READ); 2170 2169 b->bi_status = status; 2171 2170 b->bi_iter.bi_sector = r1_bio->sector + 2172 2171 conf->mirrors[i].rdev->data_offset; 2173 - bio_set_dev(b, conf->mirrors[i].rdev->bdev); 2174 2172 b->bi_end_io = end_sync_read; 2175 2173 rp->raid_bio = r1_bio; 2176 2174 b->bi_private = rp; ··· 2416 2416 /* Write at 'sector' for 'sectors'*/ 2417 2417 2418 2418 if (test_bit(R1BIO_BehindIO, &r1_bio->state)) { 2419 - wbio = bio_clone_fast(r1_bio->behind_master_bio, 2420 - GFP_NOIO, 2421 - &mddev->bio_set); 2419 + wbio = bio_alloc_clone(rdev->bdev, 2420 + r1_bio->behind_master_bio, 2421 + GFP_NOIO, &mddev->bio_set); 2422 2422 } else { 2423 - wbio = bio_clone_fast(r1_bio->master_bio, GFP_NOIO, 2424 - &mddev->bio_set); 2423 + wbio = 
bio_alloc_clone(rdev->bdev, r1_bio->master_bio, 2424 + GFP_NOIO, &mddev->bio_set); 2425 2425 } 2426 2426 2427 2427 bio_set_op_attrs(wbio, REQ_OP_WRITE, 0); ··· 2430 2430 2431 2431 bio_trim(wbio, sector - r1_bio->sector, sectors); 2432 2432 wbio->bi_iter.bi_sector += rdev->data_offset; 2433 - bio_set_dev(wbio, rdev->bdev); 2434 2433 2435 2434 if (submit_bio_wait(wbio) < 0) 2436 2435 /* failure! */ ··· 2649 2650 for (i = conf->poolinfo->raid_disks; i--; ) { 2650 2651 bio = r1bio->bios[i]; 2651 2652 rps = bio->bi_private; 2652 - bio_reset(bio); 2653 + bio_reset(bio, NULL, 0); 2653 2654 bio->bi_private = rps; 2654 2655 } 2655 2656 r1bio->master_bio = NULL;
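bio_reset() follows suit: a reset bio gets its bdev and opf back as arguments instead of being wiped clean and repopulated with bio_set_dev()/bio_set_op_attrs(), while callers that genuinely want a blank bio (the raid1/raid10 resync pool paths) now say so explicitly with NULL and 0. Sketch (kernel fragment):

```c
/* Before: reset wipes everything; device and op restored by hand. */
bio_reset(b);
bio_set_dev(b, conf->mirrors[i].rdev->bdev);
bio_set_op_attrs(b, REQ_OP_READ, 0);

/* After: */
bio_reset(b, conf->mirrors[i].rdev->bdev, REQ_OP_READ);

/* A deliberately blank reset is spelled out rather than implied: */
bio_reset(bio, NULL, 0);
```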
+13 -17
drivers/md/raid10.c
··· 1208 1208 1209 1209 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue)) 1210 1210 r10_bio->start_time = bio_start_io_acct(bio); 1211 - read_bio = bio_clone_fast(bio, gfp, &mddev->bio_set); 1211 + read_bio = bio_alloc_clone(rdev->bdev, bio, gfp, &mddev->bio_set); 1212 1212 1213 1213 r10_bio->devs[slot].bio = read_bio; 1214 1214 r10_bio->devs[slot].rdev = rdev; 1215 1215 1216 1216 read_bio->bi_iter.bi_sector = r10_bio->devs[slot].addr + 1217 1217 choose_data_offset(r10_bio, rdev); 1218 - bio_set_dev(read_bio, rdev->bdev); 1219 1218 read_bio->bi_end_io = raid10_end_read_request; 1220 1219 bio_set_op_attrs(read_bio, op, do_sync); 1221 1220 if (test_bit(FailFast, &rdev->flags) && ··· 1254 1255 } else 1255 1256 rdev = conf->mirrors[devnum].rdev; 1256 1257 1257 - mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); 1258 + mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO, &mddev->bio_set); 1258 1259 if (replacement) 1259 1260 r10_bio->devs[n_copy].repl_bio = mbio; 1260 1261 else ··· 1262 1263 1263 1264 mbio->bi_iter.bi_sector = (r10_bio->devs[n_copy].addr + 1264 1265 choose_data_offset(r10_bio, rdev)); 1265 - bio_set_dev(mbio, rdev->bdev); 1266 1266 mbio->bi_end_io = raid10_end_write_request; 1267 1267 bio_set_op_attrs(mbio, op, do_sync | do_fua); 1268 1268 if (!replacement && test_bit(FailFast, ··· 1810 1812 */ 1811 1813 if (r10_bio->devs[disk].bio) { 1812 1814 struct md_rdev *rdev = conf->mirrors[disk].rdev; 1813 - mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); 1815 + mbio = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO, 1816 + &mddev->bio_set); 1814 1817 mbio->bi_end_io = raid10_end_discard_request; 1815 1818 mbio->bi_private = r10_bio; 1816 1819 r10_bio->devs[disk].bio = mbio; ··· 1824 1825 } 1825 1826 if (r10_bio->devs[disk].repl_bio) { 1826 1827 struct md_rdev *rrdev = conf->mirrors[disk].replacement; 1827 - rbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); 1828 + rbio = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO, 1829 + &mddev->bio_set); 1828 1830 
rbio->bi_end_io = raid10_end_discard_request; 1829 1831 rbio->bi_private = r10_bio; 1830 1832 r10_bio->devs[disk].repl_bio = rbio; ··· 2422 2422 * bi_vecs, as the read request might have corrupted these 2423 2423 */ 2424 2424 rp = get_resync_pages(tbio); 2425 - bio_reset(tbio); 2425 + bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE); 2426 2426 2427 2427 md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size); 2428 2428 ··· 2430 2430 tbio->bi_private = rp; 2431 2431 tbio->bi_iter.bi_sector = r10_bio->devs[i].addr; 2432 2432 tbio->bi_end_io = end_sync_write; 2433 - bio_set_op_attrs(tbio, REQ_OP_WRITE, 0); 2434 2433 2435 2434 bio_copy_data(tbio, fbio); 2436 2435 ··· 2440 2441 if (test_bit(FailFast, &conf->mirrors[d].rdev->flags)) 2441 2442 tbio->bi_opf |= MD_FAILFAST; 2442 2443 tbio->bi_iter.bi_sector += conf->mirrors[d].rdev->data_offset; 2443 - bio_set_dev(tbio, conf->mirrors[d].rdev->bdev); 2444 2444 submit_bio_noacct(tbio); 2445 2445 } 2446 2446 ··· 2892 2894 if (sectors > sect_to_write) 2893 2895 sectors = sect_to_write; 2894 2896 /* Write at 'sector' for 'sectors' */ 2895 - wbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); 2897 + wbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO, 2898 + &mddev->bio_set); 2896 2899 bio_trim(wbio, sector - bio->bi_iter.bi_sector, sectors); 2897 2900 wsector = r10_bio->devs[i].addr + (sector - r10_bio->sector); 2898 2901 wbio->bi_iter.bi_sector = wsector + 2899 2902 choose_data_offset(r10_bio, rdev); 2900 - bio_set_dev(wbio, rdev->bdev); 2901 2903 bio_set_op_attrs(wbio, REQ_OP_WRITE, 0); 2902 2904 2903 2905 if (submit_bio_wait(wbio) < 0) ··· 3158 3160 for (i = 0; i < nalloc; i++) { 3159 3161 bio = r10bio->devs[i].bio; 3160 3162 rp = bio->bi_private; 3161 - bio_reset(bio); 3163 + bio_reset(bio, NULL, 0); 3162 3164 bio->bi_private = rp; 3163 3165 bio = r10bio->devs[i].repl_bio; 3164 3166 if (bio) { 3165 3167 rp = bio->bi_private; 3166 - bio_reset(bio); 3168 + bio_reset(bio, NULL, 0); 3167 3169 bio->bi_private = rp; 
3168 3170 } 3169 3171 } ··· 4890 4892 return sectors_done; 4891 4893 } 4892 4894 4893 - read_bio = bio_alloc_bioset(GFP_KERNEL, RESYNC_PAGES, &mddev->bio_set); 4894 - 4895 - bio_set_dev(read_bio, rdev->bdev); 4895 + read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ, 4896 + GFP_KERNEL, &mddev->bio_set); 4896 4897 read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr 4897 4898 + rdev->data_offset); 4898 4899 read_bio->bi_private = r10_bio; 4899 4900 read_bio->bi_end_io = end_reshape_read; 4900 - bio_set_op_attrs(read_bio, REQ_OP_READ, 0); 4901 4901 r10_bio->master_bio = read_bio; 4902 4902 r10_bio->read_slot = r10_bio->devs[r10_bio->read_slot].devnum; 4903 4903
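The raid10 hunks above all apply one pattern from this series: the bio_clone_fast() + bio_set_dev() pair becomes a single bio_alloc_clone() call that takes the destination block device up front. A minimal before/after sketch using names from the surrounding hunk (kernel code, not compilable standalone):

```c
/* Before: clone first, then retarget the clone. */
read_bio = bio_clone_fast(bio, gfp, &mddev->bio_set);
bio_set_dev(read_bio, rdev->bdev);

/* After: the clone is created already pointing at rdev->bdev,
 * so the device (and its cgroup association) is never left unset. */
read_bio = bio_alloc_clone(rdev->bdev, bio, gfp, &mddev->bio_set);
```

The discard hunks clone from the parent's own bi_bdev instead, presumably because the real per-device target is assigned later when each discard is submitted.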
+8 -11
drivers/md/raid5-cache.c
··· 735 735 736 736 static struct bio *r5l_bio_alloc(struct r5l_log *log) 737 737 { 738 - struct bio *bio = bio_alloc_bioset(GFP_NOIO, BIO_MAX_VECS, &log->bs); 738 + struct bio *bio = bio_alloc_bioset(log->rdev->bdev, BIO_MAX_VECS, 739 + REQ_OP_WRITE, GFP_NOIO, &log->bs); 739 740 740 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 741 - bio_set_dev(bio, log->rdev->bdev); 742 741 bio->bi_iter.bi_sector = log->rdev->data_offset + log->log_start; 743 742 744 743 return bio; ··· 1301 1302 1302 1303 if (!do_flush) 1303 1304 return; 1304 - bio_reset(&log->flush_bio); 1305 - bio_set_dev(&log->flush_bio, log->rdev->bdev); 1305 + bio_reset(&log->flush_bio, log->rdev->bdev, 1306 + REQ_OP_WRITE | REQ_PREFLUSH); 1306 1307 log->flush_bio.bi_end_io = r5l_log_flush_endio; 1307 - log->flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 1308 1308 submit_bio(&log->flush_bio); 1309 1309 } 1310 1310 ··· 1632 1634 { 1633 1635 struct page *page; 1634 1636 1635 - ctx->ra_bio = bio_alloc_bioset(GFP_KERNEL, BIO_MAX_VECS, &log->bs); 1637 + ctx->ra_bio = bio_alloc_bioset(NULL, BIO_MAX_VECS, 0, GFP_KERNEL, 1638 + &log->bs); 1636 1639 if (!ctx->ra_bio) 1637 1640 return -ENOMEM; 1638 1641 ··· 1677 1678 struct r5l_recovery_ctx *ctx, 1678 1679 sector_t offset) 1679 1680 { 1680 - bio_reset(ctx->ra_bio); 1681 - bio_set_dev(ctx->ra_bio, log->rdev->bdev); 1682 - bio_set_op_attrs(ctx->ra_bio, REQ_OP_READ, 0); 1681 + bio_reset(ctx->ra_bio, log->rdev->bdev, REQ_OP_READ); 1683 1682 ctx->ra_bio->bi_iter.bi_sector = log->rdev->data_offset + offset; 1684 1683 1685 1684 ctx->valid_pages = 0; ··· 3105 3108 INIT_LIST_HEAD(&log->io_end_ios); 3106 3109 INIT_LIST_HEAD(&log->flushing_ios); 3107 3110 INIT_LIST_HEAD(&log->finished_ios); 3108 - bio_init(&log->flush_bio, NULL, 0); 3111 + bio_init(&log->flush_bio, NULL, NULL, 0, 0); 3109 3112 3110 3113 log->io_kc = KMEM_CACHE(r5l_io_unit, 0); 3111 3114 if (!log->io_kc)
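Besides the allocation helpers, the series widens bio_init() and bio_reset() to carry the block device and the operation, collapsing the old three-call setup seen in the removed lines above. Sketch of the pattern (NULL and 0 remain valid placeholders when the device or op is only known later):

```c
/* Before: reset, then patch in the device and the operation. */
bio_reset(&log->flush_bio);
bio_set_dev(&log->flush_bio, log->rdev->bdev);
log->flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;

/* After: one call carries all three. */
bio_reset(&log->flush_bio, log->rdev->bdev,
	  REQ_OP_WRITE | REQ_PREFLUSH);
```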
+10 -16
drivers/md/raid5-ppl.c
··· 250 250 INIT_LIST_HEAD(&io->stripe_list); 251 251 atomic_set(&io->pending_stripes, 0); 252 252 atomic_set(&io->pending_flushes, 0); 253 - bio_init(&io->bio, io->biovec, PPL_IO_INLINE_BVECS); 253 + bio_init(&io->bio, NULL, io->biovec, PPL_IO_INLINE_BVECS, 0); 254 254 255 255 pplhdr = page_address(io->header_page); 256 256 clear_page(pplhdr); ··· 416 416 417 417 static void ppl_submit_iounit_bio(struct ppl_io_unit *io, struct bio *bio) 418 418 { 419 - char b[BDEVNAME_SIZE]; 420 - 421 - pr_debug("%s: seq: %llu size: %u sector: %llu dev: %s\n", 419 + pr_debug("%s: seq: %llu size: %u sector: %llu dev: %pg\n", 422 420 __func__, io->seq, bio->bi_iter.bi_size, 423 421 (unsigned long long)bio->bi_iter.bi_sector, 424 - bio_devname(bio, b)); 422 + bio->bi_bdev); 425 423 426 424 submit_bio(bio); 427 425 } ··· 494 496 if (!bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0)) { 495 497 struct bio *prev = bio; 496 498 497 - bio = bio_alloc_bioset(GFP_NOIO, BIO_MAX_VECS, 499 + bio = bio_alloc_bioset(prev->bi_bdev, BIO_MAX_VECS, 500 + prev->bi_opf, GFP_NOIO, 498 501 &ppl_conf->bs); 499 - bio->bi_opf = prev->bi_opf; 500 502 bio->bi_write_hint = prev->bi_write_hint; 501 - bio_copy_dev(bio, prev); 502 503 bio->bi_iter.bi_sector = bio_end_sector(prev); 503 504 bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0); 504 505 ··· 587 590 struct ppl_log *log = io->log; 588 591 struct ppl_conf *ppl_conf = log->ppl_conf; 589 592 struct r5conf *conf = ppl_conf->mddev->private; 590 - char b[BDEVNAME_SIZE]; 591 593 592 - pr_debug("%s: dev: %s\n", __func__, bio_devname(bio, b)); 594 + pr_debug("%s: dev: %pg\n", __func__, bio->bi_bdev); 593 595 594 596 if (bio->bi_status) { 595 597 struct md_rdev *rdev; ··· 631 635 632 636 if (bdev) { 633 637 struct bio *bio; 634 - char b[BDEVNAME_SIZE]; 635 638 636 - bio = bio_alloc_bioset(GFP_NOIO, 0, &ppl_conf->flush_bs); 637 - bio_set_dev(bio, bdev); 639 + bio = bio_alloc_bioset(bdev, 0, REQ_OP_WRITE | REQ_PREFLUSH, 640 + GFP_NOIO, 641 + &ppl_conf->flush_bs); 638 642
bio->bi_private = io; 639 - bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 640 643 bio->bi_end_io = ppl_flush_endio; 641 644 642 - pr_debug("%s: dev: %s\n", __func__, 643 - bio_devname(bio, b)); 645 + pr_debug("%s: dev: %pg\n", __func__, bio->bi_bdev); 644 646 645 647 submit_bio(bio); 646 648 flushed_disks++;
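The bio_devname() removals throughout this merge rely on the kernel's %pg printk specifier, which formats a struct block_device name directly and makes the on-stack BDEVNAME_SIZE buffer unnecessary:

```c
/* Before: copy the name into a caller-provided buffer. */
char b[BDEVNAME_SIZE];
pr_debug("%s: dev: %s\n", __func__, bio_devname(bio, b));

/* After: vsprintf derives the name from bi_bdev itself. */
pr_debug("%s: dev: %pg\n", __func__, bio->bi_bdev);
```

Note that %pg expects a struct block_device pointer; the similar-looking %ps specifier prints a kernel symbol name and is not interchangeable.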
+8 -8
drivers/md/raid5.c
··· 2310 2310 for (i = 0; i < disks; i++) { 2311 2311 struct r5dev *dev = &sh->dev[i]; 2312 2312 2313 - bio_init(&dev->req, &dev->vec, 1); 2314 - bio_init(&dev->rreq, &dev->rvec, 1); 2313 + bio_init(&dev->req, NULL, &dev->vec, 1, 0); 2314 + bio_init(&dev->rreq, NULL, &dev->rvec, 1, 0); 2315 2315 } 2316 2316 2317 2317 if (raid5_has_ppl(conf)) { ··· 2677 2677 (unsigned long long)sh->sector, i, atomic_read(&sh->count), 2678 2678 bi->bi_status); 2679 2679 if (i == disks) { 2680 - bio_reset(bi); 2680 + bio_reset(bi, NULL, 0); 2681 2681 BUG(); 2682 2682 return; 2683 2683 } ··· 2785 2785 } 2786 2786 } 2787 2787 rdev_dec_pending(rdev, conf->mddev); 2788 - bio_reset(bi); 2788 + bio_reset(bi, NULL, 0); 2789 2789 clear_bit(R5_LOCKED, &sh->dev[i].flags); 2790 2790 set_bit(STRIPE_HANDLE, &sh->state); 2791 2791 raid5_release_stripe(sh); ··· 2823 2823 (unsigned long long)sh->sector, i, atomic_read(&sh->count), 2824 2824 bi->bi_status); 2825 2825 if (i == disks) { 2826 - bio_reset(bi); 2826 + bio_reset(bi, NULL, 0); 2827 2827 BUG(); 2828 2828 return; 2829 2829 } ··· 2860 2860 if (sh->batch_head && bi->bi_status && !replacement) 2861 2861 set_bit(STRIPE_BATCH_ERR, &sh->batch_head->state); 2862 2862 2863 - bio_reset(bi); 2863 + bio_reset(bi, NULL, 0); 2864 2864 if (!test_and_clear_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags)) 2865 2865 clear_bit(R5_LOCKED, &sh->dev[i].flags); 2866 2866 set_bit(STRIPE_HANDLE, &sh->state); ··· 5438 5438 return 0; 5439 5439 } 5440 5440 5441 - align_bio = bio_clone_fast(raid_bio, GFP_NOIO, &mddev->io_acct_set); 5441 + align_bio = bio_alloc_clone(rdev->bdev, raid_bio, GFP_NOIO, 5442 + &mddev->io_acct_set); 5442 5443 md_io_acct = container_of(align_bio, struct md_io_acct, bio_clone); 5443 5444 raid_bio->bi_next = (void *)rdev; 5444 5445 if (blk_queue_io_stat(raid_bio->bi_bdev->bd_disk->queue)) 5445 5446 md_io_acct->start_time = bio_start_io_acct(raid_bio); 5446 5447 md_io_acct->orig_bio = raid_bio; 5447 5448 5448 - bio_set_dev(align_bio, rdev->bdev); 5449 
5449 align_bio->bi_end_io = raid5_align_endio; 5450 5450 align_bio->bi_private = md_io_acct; 5451 5451 align_bio->bi_iter.bi_sector = sector;
+15 -49
drivers/memstick/core/ms_block.c
··· 1943 1943 static DEFINE_IDR(msb_disk_idr); /*set of used disk numbers */ 1944 1944 static DEFINE_MUTEX(msb_disk_lock); /* protects against races in open/release */ 1945 1945 1946 - static int msb_bd_open(struct block_device *bdev, fmode_t mode) 1947 - { 1948 - struct gendisk *disk = bdev->bd_disk; 1949 - struct msb_data *msb = disk->private_data; 1950 - 1951 - dbg_verbose("block device open"); 1952 - 1953 - mutex_lock(&msb_disk_lock); 1954 - 1955 - if (msb && msb->card) 1956 - msb->usage_count++; 1957 - 1958 - mutex_unlock(&msb_disk_lock); 1959 - return 0; 1960 - } 1961 - 1962 1946 static void msb_data_clear(struct msb_data *msb) 1963 1947 { 1964 1948 kfree(msb->boot_page); ··· 1952 1968 msb->card = NULL; 1953 1969 } 1954 1970 1955 - static int msb_disk_release(struct gendisk *disk) 1956 - { 1957 - struct msb_data *msb = disk->private_data; 1958 - 1959 - dbg_verbose("block device release"); 1960 - mutex_lock(&msb_disk_lock); 1961 - 1962 - if (msb) { 1963 - if (msb->usage_count) 1964 - msb->usage_count--; 1965 - 1966 - if (!msb->usage_count) { 1967 - disk->private_data = NULL; 1968 - idr_remove(&msb_disk_idr, msb->disk_id); 1969 - put_disk(disk); 1970 - kfree(msb); 1971 - } 1972 - } 1973 - mutex_unlock(&msb_disk_lock); 1974 - return 0; 1975 - } 1976 - 1977 - static void msb_bd_release(struct gendisk *disk, fmode_t mode) 1978 - { 1979 - msb_disk_release(disk); 1980 - } 1981 - 1982 1971 static int msb_bd_getgeo(struct block_device *bdev, 1983 1972 struct hd_geometry *geo) 1984 1973 { 1985 1974 struct msb_data *msb = bdev->bd_disk->private_data; 1986 1975 *geo = msb->geometry; 1987 1976 return 0; 1977 + } 1978 + 1979 + static void msb_bd_free_disk(struct gendisk *disk) 1980 + { 1981 + struct msb_data *msb = disk->private_data; 1982 + 1983 + mutex_lock(&msb_disk_lock); 1984 + idr_remove(&msb_disk_idr, msb->disk_id); 1985 + mutex_unlock(&msb_disk_lock); 1986 + 1987 + kfree(msb); 1988 1988 } 1989 1989 1990 1990 static blk_status_t msb_queue_rq(struct blk_mq_hw_ctx 
*hctx, ··· 2064 2096 } 2065 2097 2066 2098 static const struct block_device_operations msb_bdops = { 2067 - .open = msb_bd_open, 2068 - .release = msb_bd_release, 2069 - .getgeo = msb_bd_getgeo, 2070 - .owner = THIS_MODULE 2099 + .owner = THIS_MODULE, 2100 + .getgeo = msb_bd_getgeo, 2101 + .free_disk = msb_bd_free_disk, 2071 2102 }; 2072 2103 2073 2104 static const struct blk_mq_ops msb_mq_ops = { ··· 2114 2147 set_capacity(msb->disk, capacity); 2115 2148 dbg("Set total disk size to %lu sectors", capacity); 2116 2149 2117 - msb->usage_count = 1; 2118 2150 msb->io_queue = alloc_ordered_workqueue("ms_block", WQ_MEM_RECLAIM); 2119 2151 INIT_WORK(&msb->io_work, msb_io_work); 2120 2152 sg_init_table(msb->prealloc_sg, MS_BLOCK_MAX_SEGS+1); ··· 2195 2229 msb_data_clear(msb); 2196 2230 mutex_unlock(&msb_disk_lock); 2197 2231 2198 - msb_disk_release(msb->disk); 2232 + put_disk(msb->disk); 2199 2233 memstick_set_drvdata(card, NULL); 2200 2234 } 2201 2235
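The ms_block conversion above drops the driver's manual usage_count and its ->open/->release pair in favor of the block core's ->free_disk hook, which is invoked exactly once after the last reference to the gendisk is dropped; driver removal then needs only put_disk(). The resulting ownership contract, in short:

```c
static const struct block_device_operations msb_bdops = {
	.owner		= THIS_MODULE,
	.getgeo		= msb_bd_getgeo,
	/* Runs once the final gendisk reference is put, so no
	 * opener can still race with the kfree() inside it. */
	.free_disk	= msb_bd_free_disk,
};

/* driver removal path: just drop our reference */
put_disk(msb->disk);
```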
-1
drivers/memstick/core/ms_block.h
··· 143 143 } __packed; 144 144 145 145 struct msb_data { 146 - unsigned int usage_count; 147 146 struct memstick_dev *card; 148 147 struct gendisk *disk; 149 148 struct request_queue *queue;
+10 -47
drivers/memstick/core/mspro_block.c
··· 133 133 134 134 struct mspro_block_data { 135 135 struct memstick_dev *card; 136 - unsigned int usage_count; 137 136 unsigned int caps; 138 137 struct gendisk *disk; 139 138 struct request_queue *queue; ··· 177 178 178 179 /*** Block device ***/ 179 180 180 - static int mspro_block_bd_open(struct block_device *bdev, fmode_t mode) 181 - { 182 - struct gendisk *disk = bdev->bd_disk; 183 - struct mspro_block_data *msb = disk->private_data; 184 - int rc = -ENXIO; 185 - 186 - mutex_lock(&mspro_block_disk_lock); 187 - 188 - if (msb && msb->card) { 189 - msb->usage_count++; 190 - if ((mode & FMODE_WRITE) && msb->read_only) 191 - rc = -EROFS; 192 - else 193 - rc = 0; 194 - } 195 - 196 - mutex_unlock(&mspro_block_disk_lock); 197 - 198 - return rc; 199 - } 200 - 201 - 202 - static void mspro_block_disk_release(struct gendisk *disk) 181 + static void mspro_block_bd_free_disk(struct gendisk *disk) 203 182 { 204 183 struct mspro_block_data *msb = disk->private_data; 205 184 int disk_id = MINOR(disk_devt(disk)) >> MSPRO_BLOCK_PART_SHIFT; 206 185 207 186 mutex_lock(&mspro_block_disk_lock); 208 - 209 - if (msb) { 210 - if (msb->usage_count) 211 - msb->usage_count--; 212 - 213 - if (!msb->usage_count) { 214 - kfree(msb); 215 - disk->private_data = NULL; 216 - idr_remove(&mspro_block_disk_idr, disk_id); 217 - put_disk(disk); 218 - } 219 - } 220 - 187 + idr_remove(&mspro_block_disk_idr, disk_id); 221 188 mutex_unlock(&mspro_block_disk_lock); 222 - } 223 189 224 - static void mspro_block_bd_release(struct gendisk *disk, fmode_t mode) 225 - { 226 - mspro_block_disk_release(disk); 190 + kfree(msb); 227 191 } 228 192 229 193 static int mspro_block_bd_getgeo(struct block_device *bdev, ··· 202 240 } 203 241 204 242 static const struct block_device_operations ms_block_bdops = { 205 - .open = mspro_block_bd_open, 206 - .release = mspro_block_bd_release, 207 - .getgeo = mspro_block_bd_getgeo, 208 - .owner = THIS_MODULE 243 + .owner = THIS_MODULE, 244 + .getgeo = mspro_block_bd_getgeo, 245 
+ .free_disk = mspro_block_bd_free_disk, 209 246 }; 210 247 211 248 /*** Information ***/ ··· 1187 1226 msb->disk->first_minor = disk_id << MSPRO_BLOCK_PART_SHIFT; 1188 1227 msb->disk->minors = 1 << MSPRO_BLOCK_PART_SHIFT; 1189 1228 msb->disk->fops = &ms_block_bdops; 1190 - msb->usage_count = 1; 1191 1229 msb->disk->private_data = msb; 1192 1230 1193 1231 sprintf(msb->disk->disk_name, "mspblk%d", disk_id); ··· 1198 1238 capacity *= msb->page_size >> 9; 1199 1239 set_capacity(msb->disk, capacity); 1200 1240 dev_dbg(&card->dev, "capacity set %ld\n", capacity); 1241 + 1242 + if (msb->read_only) 1243 + set_disk_ro(msb->disk, true); 1201 1244 1202 1245 rc = device_add_disk(&card->dev, msb->disk, NULL); 1203 1246 if (rc) ··· 1304 1341 mspro_block_data_clear(msb); 1305 1342 mutex_unlock(&mspro_block_disk_lock); 1306 1343 1307 - mspro_block_disk_release(msb->disk); 1344 + put_disk(msb->disk); 1308 1345 memstick_set_drvdata(card, NULL); 1309 1346 } 1310 1347
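mspro_block gets the same ->free_disk conversion, with one extra consequence: without an ->open hook there is nowhere to return -EROFS for write opens, so the read-only policy moves to probe time and is enforced by the block core instead:

```c
/* Replaces the former open-time check:
 *	if ((mode & FMODE_WRITE) && msb->read_only)
 *		rc = -EROFS;
 * set_disk_ro() makes the core reject write opens itself. */
if (msb->read_only)
	set_disk_ro(msb->disk, true);
```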
+1 -1
drivers/mtd/mtdswap.c
··· 19 19 #include <linux/sched.h> 20 20 #include <linux/slab.h> 21 21 #include <linux/vmalloc.h> 22 - #include <linux/genhd.h> 22 + #include <linux/blkdev.h> 23 23 #include <linux/swap.h> 24 24 #include <linux/debugfs.h> 25 25 #include <linux/seq_file.h>
-1
drivers/mtd/nand/raw/sharpsl.c
··· 6 6 * Based on Sharp's NAND driver sharp_sl.c 7 7 */ 8 8 9 - #include <linux/genhd.h> 10 9 #include <linux/slab.h> 11 10 #include <linux/module.h> 12 11 #include <linux/delay.h>
-1
drivers/nvdimm/blk.c
··· 6 6 7 7 #include <linux/blkdev.h> 8 8 #include <linux/fs.h> 9 - #include <linux/genhd.h> 10 9 #include <linux/module.h> 11 10 #include <linux/moduleparam.h> 12 11 #include <linux/nd.h>
-1
drivers/nvdimm/btt.c
··· 11 11 #include <linux/device.h> 12 12 #include <linux/mutex.h> 13 13 #include <linux/hdreg.h> 14 - #include <linux/genhd.h> 15 14 #include <linux/sizes.h> 16 15 #include <linux/ndctl.h> 17 16 #include <linux/fs.h>
-1
drivers/nvdimm/btt_devs.c
··· 4 4 */ 5 5 #include <linux/blkdev.h> 6 6 #include <linux/device.h> 7 - #include <linux/genhd.h> 8 7 #include <linux/sizes.h> 9 8 #include <linux/slab.h> 10 9 #include <linux/fs.h>
-1
drivers/nvdimm/bus.c
··· 11 11 #include <linux/blkdev.h> 12 12 #include <linux/fcntl.h> 13 13 #include <linux/async.h> 14 - #include <linux/genhd.h> 15 14 #include <linux/ndctl.h> 16 15 #include <linux/sched.h> 17 16 #include <linux/slab.h>
+3 -3
drivers/nvdimm/nd_virtio.c
··· 105 105 * parent bio. Otherwise directly call nd_region flush. 106 106 */ 107 107 if (bio && bio->bi_iter.bi_sector != -1) { 108 - struct bio *child = bio_alloc(GFP_ATOMIC, 0); 108 + struct bio *child = bio_alloc(bio->bi_bdev, 0, REQ_PREFLUSH, 109 + GFP_ATOMIC); 109 110 110 111 if (!child) 111 112 return -ENOMEM; 112 - bio_copy_dev(child, bio); 113 - child->bi_opf = REQ_PREFLUSH; 113 + bio_clone_blkg_association(child, bio); 114 114 child->bi_iter.bi_sector = -1; 115 115 bio_chain(child, bio); 116 116 submit_bio(child);
-1
drivers/nvdimm/pfn_devs.c
··· 5 5 #include <linux/memremap.h> 6 6 #include <linux/blkdev.h> 7 7 #include <linux/device.h> 8 - #include <linux/genhd.h> 9 8 #include <linux/sizes.h> 10 9 #include <linux/slab.h> 11 10 #include <linux/fs.h>
+8 -10
drivers/nvme/target/io-cmd-bdev.c
··· 267 267 268 268 if (nvmet_use_inline_bvec(req)) { 269 269 bio = &req->b.inline_bio; 270 - bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); 270 + bio_init(bio, req->ns->bdev, req->inline_bvec, 271 + ARRAY_SIZE(req->inline_bvec), op); 271 272 } else { 272 - bio = bio_alloc(GFP_KERNEL, bio_max_segs(sg_cnt)); 273 + bio = bio_alloc(req->ns->bdev, bio_max_segs(sg_cnt), op, 274 + GFP_KERNEL); 273 275 } 274 - bio_set_dev(bio, req->ns->bdev); 275 276 bio->bi_iter.bi_sector = sector; 276 277 bio->bi_private = req; 277 278 bio->bi_end_io = nvmet_bio_done; 278 - bio->bi_opf = op; 279 279 280 280 blk_start_plug(&plug); 281 281 if (req->metadata_len) ··· 296 296 } 297 297 } 298 298 299 - bio = bio_alloc(GFP_KERNEL, bio_max_segs(sg_cnt)); 300 - bio_set_dev(bio, req->ns->bdev); 299 + bio = bio_alloc(req->ns->bdev, bio_max_segs(sg_cnt), 300 + op, GFP_KERNEL); 301 301 bio->bi_iter.bi_sector = sector; 302 - bio->bi_opf = op; 303 302 304 303 bio_chain(bio, prev); 305 304 submit_bio(prev); ··· 327 328 if (!nvmet_check_transfer_len(req, 0)) 328 329 return; 329 330 330 - bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); 331 - bio_set_dev(bio, req->ns->bdev); 331 + bio_init(bio, req->ns->bdev, req->inline_bvec, 332 + ARRAY_SIZE(req->inline_bvec), REQ_OP_WRITE | REQ_PREFLUSH); 332 333 bio->bi_private = req; 333 334 bio->bi_end_io = nvmet_bio_done; 334 - bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 335 335 336 336 submit_bio(bio); 337 337 }
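nvmet keeps its small-I/O fast path here: requests that fit in the inline bvec array use a bio embedded in the request, larger ones allocate. After this series both branches receive the bdev and the opf at setup time, which is what lets the later bio_set_dev() and bi_opf assignments disappear:

```c
if (nvmet_use_inline_bvec(req)) {
	/* no allocation: bio and bvecs live inside the request */
	bio = &req->b.inline_bio;
	bio_init(bio, req->ns->bdev, req->inline_bvec,
		 ARRAY_SIZE(req->inline_bvec), op);
} else {
	bio = bio_alloc(req->ns->bdev, bio_max_segs(sg_cnt), op,
			GFP_KERNEL);
}
```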
+4 -3
drivers/nvme/target/passthru.c
··· 206 206 207 207 if (nvmet_use_inline_bvec(req)) { 208 208 bio = &req->p.inline_bio; 209 - bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); 209 + bio_init(bio, NULL, req->inline_bvec, 210 + ARRAY_SIZE(req->inline_bvec), req_op(rq)); 210 211 } else { 211 - bio = bio_alloc(GFP_KERNEL, bio_max_segs(req->sg_cnt)); 212 + bio = bio_alloc(NULL, bio_max_segs(req->sg_cnt), req_op(rq), 213 + GFP_KERNEL); 212 214 bio->bi_end_io = bio_put; 213 215 } 214 - bio->bi_opf = req_op(rq); 215 216 216 217 for_each_sg(req->sg, sg, req->sg_cnt, i) { 217 218 if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length,
+7 -7
drivers/nvme/target/zns.c
··· 412 412 413 413 while (sector < get_capacity(bdev->bd_disk)) { 414 414 if (test_bit(blk_queue_zone_no(q, sector), d.zbitmap)) { 415 - bio = blk_next_bio(bio, 0, GFP_KERNEL); 416 - bio->bi_opf = zsa_req_op(req->cmd->zms.zsa) | REQ_SYNC; 415 + bio = blk_next_bio(bio, bdev, 0, 416 + zsa_req_op(req->cmd->zms.zsa) | REQ_SYNC, 417 + GFP_KERNEL); 417 418 bio->bi_iter.bi_sector = sector; 418 - bio_set_dev(bio, bdev); 419 419 /* This may take a while, so be nice to others */ 420 420 cond_resched(); 421 421 } ··· 522 522 void nvmet_bdev_execute_zone_append(struct nvmet_req *req) 523 523 { 524 524 sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba); 525 + const unsigned int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; 525 526 u16 status = NVME_SC_SUCCESS; 526 527 unsigned int total_len = 0; 527 528 struct scatterlist *sg; ··· 552 551 553 552 if (nvmet_use_inline_bvec(req)) { 554 553 bio = &req->z.inline_bio; 555 - bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); 554 + bio_init(bio, req->ns->bdev, req->inline_bvec, 555 + ARRAY_SIZE(req->inline_bvec), op); 556 556 } else { 557 - bio = bio_alloc(GFP_KERNEL, req->sg_cnt); 557 + bio = bio_alloc(req->ns->bdev, req->sg_cnt, op, GFP_KERNEL); 558 558 } 559 559 560 - bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; 561 560 bio->bi_end_io = nvmet_bdev_zone_append_bio_done; 562 - bio_set_dev(bio, req->ns->bdev); 563 561 bio->bi_iter.bi_sector = sect; 564 562 bio->bi_private = req; 565 563 if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
-1
drivers/s390/block/dasd_int.h
··· 47 47 #include <linux/module.h> 48 48 #include <linux/wait.h> 49 49 #include <linux/blkdev.h> 50 - #include <linux/genhd.h> 51 50 #include <linux/hdreg.h> 52 51 #include <linux/interrupt.h> 53 52 #include <linux/log2.h>
-1
drivers/s390/block/scm_blk.c
··· 15 15 #include <linux/module.h> 16 16 #include <linux/blkdev.h> 17 17 #include <linux/blk-mq.h> 18 - #include <linux/genhd.h> 19 18 #include <linux/slab.h> 20 19 #include <linux/list.h> 21 20 #include <asm/eadm.h>
-1
drivers/s390/block/scm_blk.h
··· 6 6 #include <linux/spinlock.h> 7 7 #include <linux/blkdev.h> 8 8 #include <linux/blk-mq.h> 9 - #include <linux/genhd.h> 10 9 #include <linux/list.h> 11 10 12 11 #include <asm/debug.h>
-1
drivers/scsi/scsi_debug.c
··· 23 23 #include <linux/slab.h> 24 24 #include <linux/types.h> 25 25 #include <linux/string.h> 26 - #include <linux/genhd.h> 27 26 #include <linux/fs.h> 28 27 #include <linux/init.h> 29 28 #include <linux/proc_fs.h>
+1 -1
drivers/scsi/scsi_lib.c
··· 1276 1276 * power management commands. 1277 1277 */ 1278 1278 if (req && !(req->rq_flags & RQF_PM)) 1279 - return BLK_STS_IOERR; 1279 + return BLK_STS_OFFLINE; 1280 1280 return BLK_STS_OK; 1281 1281 } 1282 1282 }
-1
drivers/scsi/scsicam.c
··· 14 14 #include <linux/module.h> 15 15 #include <linux/slab.h> 16 16 #include <linux/fs.h> 17 - #include <linux/genhd.h> 18 17 #include <linux/kernel.h> 19 18 #include <linux/blkdev.h> 20 19 #include <linux/pagemap.h>
-1
drivers/scsi/sd.c
··· 38 38 #include <linux/kernel.h> 39 39 #include <linux/mm.h> 40 40 #include <linux/bio.h> 41 - #include <linux/genhd.h> 42 41 #include <linux/hdreg.h> 43 42 #include <linux/errno.h> 44 43 #include <linux/idr.h>
-1
drivers/scsi/sr.h
··· 18 18 #ifndef _SR_H 19 19 #define _SR_H 20 20 21 - #include <linux/genhd.h> 22 21 #include <linux/kref.h> 23 22 #include <linux/mutex.h> 24 23
+2 -2
drivers/scsi/ufs/ufshpb.c
··· 494 494 if (!map_req) 495 495 return NULL; 496 496 497 - bio = bio_alloc(GFP_KERNEL, hpb->pages_per_srgn); 497 + bio = bio_alloc(NULL, hpb->pages_per_srgn, 0, GFP_KERNEL); 498 498 if (!bio) { 499 499 ufshpb_put_req(hpb, map_req); 500 500 return NULL; ··· 2050 2050 INIT_LIST_HEAD(&pre_req->list_req); 2051 2051 pre_req->req = NULL; 2052 2052 2053 - pre_req->bio = bio_alloc(GFP_KERNEL, 1); 2053 + pre_req->bio = bio_alloc(NULL, 1, 0, GFP_KERNEL); 2054 2054 if (!pre_req->bio) 2055 2055 goto release_mem; 2056 2056
+4 -8
drivers/target/target_core_iblock.c
··· 20 20 #include <linux/slab.h> 21 21 #include <linux/spinlock.h> 22 22 #include <linux/bio.h> 23 - #include <linux/genhd.h> 24 23 #include <linux/file.h> 25 24 #include <linux/module.h> 26 25 #include <linux/scatterlist.h> ··· 352 353 * Only allocate as many vector entries as the bio code allows us to, 353 354 * we'll loop later on until we have handled the whole request. 354 355 */ 355 - bio = bio_alloc_bioset(GFP_NOIO, bio_max_segs(sg_num), 356 - &ib_dev->ibd_bio_set); 356 + bio = bio_alloc_bioset(ib_dev->ibd_bd, bio_max_segs(sg_num), opf, 357 + GFP_NOIO, &ib_dev->ibd_bio_set); 357 358 if (!bio) { 358 359 pr_err("Unable to allocate memory for bio\n"); 359 360 return NULL; 360 361 } 361 362 362 - bio_set_dev(bio, ib_dev->ibd_bd); 363 363 bio->bi_private = cmd; 364 364 bio->bi_end_io = &iblock_bio_done; 365 365 bio->bi_iter.bi_sector = lba; 366 - bio->bi_opf = opf; 367 366 368 367 return bio; 369 368 } ··· 415 418 if (immed) 416 419 target_complete_cmd(cmd, SAM_STAT_GOOD); 417 420 418 - bio = bio_alloc(GFP_KERNEL, 0); 421 + bio = bio_alloc(ib_dev->ibd_bd, 0, REQ_OP_WRITE | REQ_PREFLUSH, 422 + GFP_KERNEL); 419 423 bio->bi_end_io = iblock_end_io_flush; 420 - bio_set_dev(bio, ib_dev->ibd_bd); 421 - bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; 422 424 if (!immed) 423 425 bio->bi_private = cmd; 424 426 submit_bio(bio);
-1
drivers/target/target_core_pscsi.c
··· 17 17 #include <linux/blk_types.h> 18 18 #include <linux/slab.h> 19 19 #include <linux/spinlock.h> 20 - #include <linux/genhd.h> 21 20 #include <linux/cdrom.h> 22 21 #include <linux/ratelimit.h> 23 22 #include <linux/module.h>
-1
fs/btrfs/check-integrity.c
··· 78 78 #include <linux/sched.h> 79 79 #include <linux/slab.h> 80 80 #include <linux/mutex.h> 81 - #include <linux/genhd.h> 82 81 #include <linux/blkdev.h> 83 82 #include <linux/mm.h> 84 83 #include <linux/string.h>
+4 -6
fs/btrfs/disk-io.c
··· 4033 4033 * to do I/O, so we don't lose the ability to do integrity 4034 4034 * checking. 4035 4035 */ 4036 - bio = bio_alloc(GFP_NOFS, 1); 4037 - bio_set_dev(bio, device->bdev); 4036 + bio = bio_alloc(device->bdev, 1, 4037 + REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO, 4038 + GFP_NOFS); 4038 4039 bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT; 4039 4040 bio->bi_private = device; 4040 4041 bio->bi_end_io = btrfs_end_super_write; ··· 4047 4046 * go down lazy and there's a short window where the on-disk 4048 4047 * copies might still contain the older version. 4049 4048 */ 4050 - bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO; 4051 4049 if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER)) 4052 4050 bio->bi_opf |= REQ_FUA; 4053 4051 ··· 4158 4158 return; 4159 4159 #endif 4160 4160 4161 - bio_reset(bio); 4161 + bio_reset(bio, device->bdev, REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH); 4162 4162 bio->bi_end_io = btrfs_end_empty_barrier; 4163 - bio_set_dev(bio, device->bdev); 4164 - bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH; 4165 4163 init_completion(&device->flush_wait); 4166 4164 bio->bi_private = &device->flush_wait; 4167 4165
+3 -3
fs/btrfs/extent_io.c
··· 3143 3143 struct bio *bio; 3144 3144 3145 3145 ASSERT(0 < nr_iovecs && nr_iovecs <= BIO_MAX_VECS); 3146 - bio = bio_alloc_bioset(GFP_NOFS, nr_iovecs, &btrfs_bioset); 3146 + bio = bio_alloc_bioset(NULL, nr_iovecs, 0, GFP_NOFS, &btrfs_bioset); 3147 3147 btrfs_bio_init(btrfs_bio(bio)); 3148 3148 return bio; 3149 3149 } ··· 3154 3154 struct bio *new; 3155 3155 3156 3156 /* Bio allocation backed by a bioset does not fail */ 3157 - new = bio_clone_fast(bio, GFP_NOFS, &btrfs_bioset); 3157 + new = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOFS, &btrfs_bioset); 3158 3158 bbio = btrfs_bio(new); 3159 3159 btrfs_bio_init(bbio); 3160 3160 bbio->iter = bio->bi_iter; ··· 3169 3169 ASSERT(offset <= UINT_MAX && size <= UINT_MAX); 3170 3170 3171 3171 /* this will never fail when it's backed by a bioset */ 3172 - bio = bio_clone_fast(orig, GFP_NOFS, &btrfs_bioset); 3172 + bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset); 3173 3173 ASSERT(bio); 3174 3174 3175 3175 bbio = btrfs_bio(bio);
+6 -8
fs/buffer.c
··· 3024 3024 if (test_set_buffer_req(bh) && (op == REQ_OP_WRITE)) 3025 3025 clear_buffer_write_io_error(bh); 3026 3026 3027 - bio = bio_alloc(GFP_NOIO, 1); 3027 + if (buffer_meta(bh)) 3028 + op_flags |= REQ_META; 3029 + if (buffer_prio(bh)) 3030 + op_flags |= REQ_PRIO; 3031 + 3032 + bio = bio_alloc(bh->b_bdev, 1, op | op_flags, GFP_NOIO); 3028 3033 3029 3034 fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO); 3030 3035 3031 3036 bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9); 3032 - bio_set_dev(bio, bh->b_bdev); 3033 3037 bio->bi_write_hint = write_hint; 3034 3038 3035 3039 bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh)); ··· 3041 3037 3042 3038 bio->bi_end_io = end_bio_bh_io_sync; 3043 3039 bio->bi_private = bh; 3044 - 3045 - if (buffer_meta(bh)) 3046 - op_flags |= REQ_META; 3047 - if (buffer_prio(bh)) 3048 - op_flags |= REQ_PRIO; 3049 - bio_set_op_attrs(bio, op, op_flags); 3050 3040 3051 3041 /* Take care of bh's that straddle the end of the device */ 3052 3042 guard_bio_eod(bio);
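One consequence of passing the operation to bio_alloc(): flags that used to be OR'd in after allocation must now be computed first. That is why the buffer_meta()/buffer_prio() checks move above the allocation in the hunk here:

```c
/* Derive the complete opf before the bio exists... */
if (buffer_meta(bh))
	op_flags |= REQ_META;
if (buffer_prio(bh))
	op_flags |= REQ_PRIO;

/* ...so allocation, device, and operation happen in one step. */
bio = bio_alloc(bh->b_bdev, 1, op | op_flags, GFP_NOIO);
```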
+5 -8
fs/crypto/bio.c
··· 54 54 int num_pages = 0; 55 55 56 56 /* This always succeeds since __GFP_DIRECT_RECLAIM is set. */ 57 - bio = bio_alloc(GFP_NOFS, BIO_MAX_VECS); 57 + bio = bio_alloc(inode->i_sb->s_bdev, BIO_MAX_VECS, REQ_OP_WRITE, 58 + GFP_NOFS); 58 59 59 60 while (len) { 60 61 unsigned int blocks_this_page = min(len, blocks_per_page); ··· 63 62 64 63 if (num_pages == 0) { 65 64 fscrypt_set_bio_crypt_ctx(bio, inode, lblk, GFP_NOFS); 66 - bio_set_dev(bio, inode->i_sb->s_bdev); 67 65 bio->bi_iter.bi_sector = 68 66 pblk << (blockbits - SECTOR_SHIFT); 69 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 70 67 } 71 68 ret = bio_add_page(bio, ZERO_PAGE(0), bytes_this_page, 0); 72 69 if (WARN_ON(ret != bytes_this_page)) { ··· 80 81 err = submit_bio_wait(bio); 81 82 if (err) 82 83 goto out; 83 - bio_reset(bio); 84 + bio_reset(bio, inode->i_sb->s_bdev, REQ_OP_WRITE); 84 85 num_pages = 0; 85 86 } 86 87 } ··· 149 150 return -EINVAL; 150 151 151 152 /* This always succeeds since __GFP_DIRECT_RECLAIM is set. */ 152 - bio = bio_alloc(GFP_NOFS, nr_pages); 153 + bio = bio_alloc(inode->i_sb->s_bdev, nr_pages, REQ_OP_WRITE, GFP_NOFS); 153 154 154 155 do { 155 - bio_set_dev(bio, inode->i_sb->s_bdev); 156 156 bio->bi_iter.bi_sector = pblk << (blockbits - 9); 157 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 158 157 159 158 i = 0; 160 159 offset = 0; ··· 179 182 err = submit_bio_wait(bio); 180 183 if (err) 181 184 goto out; 182 - bio_reset(bio); 185 + bio_reset(bio, inode->i_sb->s_bdev, REQ_OP_WRITE); 183 186 } while (len != 0); 184 187 err = 0; 185 188 out:
-1
fs/dax.c
··· 11 11 #include <linux/buffer_head.h> 12 12 #include <linux/dax.h> 13 13 #include <linux/fs.h> 14 - #include <linux/genhd.h> 15 14 #include <linux/highmem.h> 16 15 #include <linux/memcontrol.h> 17 16 #include <linux/mm.h>
+1 -4
fs/direct-io.c
··· 396 396 * bio_alloc() is guaranteed to return a bio when allowed to sleep and 397 397 * we request a valid number of vectors. 398 398 */ 399 - bio = bio_alloc(GFP_KERNEL, nr_vecs); 400 - 401 - bio_set_dev(bio, bdev); 399 + bio = bio_alloc(bdev, nr_vecs, dio->op | dio->op_flags, GFP_KERNEL); 402 400 bio->bi_iter.bi_sector = first_sector; 403 - bio_set_op_attrs(bio, dio->op, dio->op_flags); 404 401 if (dio->is_async) 405 402 bio->bi_end_io = dio_bio_end_aio; 406 403 else
+2 -3
fs/erofs/zdata.c
··· 1370 1370 } 1371 1371 1372 1372 if (!bio) { 1373 - bio = bio_alloc(GFP_NOIO, BIO_MAX_VECS); 1373 + bio = bio_alloc(mdev.m_bdev, BIO_MAX_VECS, 1374 + REQ_OP_READ, GFP_NOIO); 1374 1375 bio->bi_end_io = z_erofs_decompressqueue_endio; 1375 1376 1376 - bio_set_dev(bio, mdev.m_bdev); 1377 1377 last_bdev = mdev.m_bdev; 1378 1378 bio->bi_iter.bi_sector = (sector_t)cur << 1379 1379 LOG_SECTORS_PER_BLOCK; 1380 1380 bio->bi_private = bi_private; 1381 - bio->bi_opf = REQ_OP_READ; 1382 1381 if (f->readahead) 1383 1382 bio->bi_opf |= REQ_RAHEAD; 1384 1383 ++nr_bios;
+3 -5
fs/ext4/page-io.c
··· 323 323 { 324 324 ext4_io_end_t *io_end = bio->bi_private; 325 325 sector_t bi_sector = bio->bi_iter.bi_sector; 326 - char b[BDEVNAME_SIZE]; 327 326 328 - if (WARN_ONCE(!io_end, "io_end is NULL: %s: sector %Lu len %u err %d\n", 329 - bio_devname(bio, b), 327 + if (WARN_ONCE(!io_end, "io_end is NULL: %pg: sector %Lu len %u err %d\n", 328 + bio->bi_bdev, 330 329 (long long) bio->bi_iter.bi_sector, 331 330 (unsigned) bio_sectors(bio), 332 331 bio->bi_status)) { ··· 397 398 * bio_alloc will _always_ be able to allocate a bio if 398 399 * __GFP_DIRECT_RECLAIM is set, see comments for bio_alloc_bioset(). 399 400 */ 400 - bio = bio_alloc(GFP_NOIO, BIO_MAX_VECS); 401 + bio = bio_alloc(bh->b_bdev, BIO_MAX_VECS, 0, GFP_NOIO); 401 402 fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO); 402 403 bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9); 403 - bio_set_dev(bio, bh->b_bdev); 404 404 bio->bi_end_io = ext4_end_bio; 405 405 bio->bi_private = ext4_get_io_end(io->io_end); 406 406 io->io_bio = bio;
+4 -4
fs/ext4/readpage.c
··· 365 365 * bio_alloc will _always_ be able to allocate a bio if 366 366 * __GFP_DIRECT_RECLAIM is set, see bio_alloc_bioset(). 367 367 */ 368 - bio = bio_alloc(GFP_KERNEL, bio_max_segs(nr_pages)); 368 + bio = bio_alloc(bdev, bio_max_segs(nr_pages), 369 + REQ_OP_READ, GFP_KERNEL); 369 370 fscrypt_set_bio_crypt_ctx(bio, inode, next_block, 370 371 GFP_KERNEL); 371 372 ext4_set_bio_post_read_ctx(bio, inode, page->index); 372 - bio_set_dev(bio, bdev); 373 373 bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9); 374 374 bio->bi_end_io = mpage_end_io; 375 - bio_set_op_attrs(bio, REQ_OP_READ, 376 - rac ? REQ_RAHEAD : 0); 375 + if (rac) 376 + bio->bi_opf |= REQ_RAHEAD; 377 377 } 378 378 379 379 length = first_hole << blkbits;
+3 -4
fs/f2fs/data.c
··· 394 394 struct f2fs_sb_info *sbi = fio->sbi; 395 395 struct bio *bio; 396 396 397 - bio = bio_alloc_bioset(GFP_NOIO, npages, &f2fs_bioset); 397 + bio = bio_alloc_bioset(NULL, npages, 0, GFP_NOIO, &f2fs_bioset); 398 398 399 399 f2fs_target_device(sbi, fio->new_blkaddr, bio); 400 400 if (is_read_io(fio->op)) { ··· 985 985 struct bio_post_read_ctx *ctx = NULL; 986 986 unsigned int post_read_steps = 0; 987 987 988 - bio = bio_alloc_bioset(for_write ? GFP_NOIO : GFP_KERNEL, 989 - bio_max_segs(nr_pages), &f2fs_bioset); 988 + bio = bio_alloc_bioset(NULL, bio_max_segs(nr_pages), REQ_OP_READ, 989 + for_write ? GFP_NOIO : GFP_KERNEL, &f2fs_bioset); 990 990 if (!bio) 991 991 return ERR_PTR(-ENOMEM); 992 992 ··· 994 994 995 995 f2fs_target_device(sbi, blkaddr, bio); 996 996 bio->bi_end_io = f2fs_read_end_io; 997 - bio_set_op_attrs(bio, REQ_OP_READ, op_flag); 998 997 999 998 if (fscrypt_inode_uses_fs_layer_crypto(inode)) 1000 999 post_read_steps |= STEP_DECRYPT;
+2 -4
fs/fs-writeback.c
··· 1903 1903 * unplug, so get our IOs out the door before we 1904 1904 * give up the CPU. 1905 1905 */ 1906 - if (current->plug) 1907 - blk_flush_plug(current->plug, false); 1906 + blk_flush_plug(current->plug, false); 1908 1907 cond_resched(); 1909 1908 } 1910 1909 ··· 2300 2301 /* 2301 2302 * If we are expecting writeback progress we must submit plugged IO. 2302 2303 */ 2303 - if (blk_needs_flush_plug(current)) 2304 - blk_flush_plug(current->plug, true); 2304 + blk_flush_plug(current->plug, true); 2305 2305 2306 2306 rcu_read_lock(); 2307 2307 list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
+3 -5
fs/gfs2/lops.c
··· 265 265 bio_end_io_t *end_io) 266 266 { 267 267 struct super_block *sb = sdp->sd_vfs; 268 - struct bio *bio = bio_alloc(GFP_NOIO, BIO_MAX_VECS); 268 + struct bio *bio = bio_alloc(sb->s_bdev, BIO_MAX_VECS, 0, GFP_NOIO); 269 269 270 270 bio->bi_iter.bi_sector = blkno << sdp->sd_fsb2bb_shift; 271 - bio_set_dev(bio, sb->s_bdev); 272 271 bio->bi_end_io = end_io; 273 272 bio->bi_private = sdp; 274 273 ··· 488 489 { 489 490 struct bio *new; 490 491 491 - new = bio_alloc(GFP_NOIO, nr_iovecs); 492 - bio_copy_dev(new, prev); 492 + new = bio_alloc(prev->bi_bdev, nr_iovecs, prev->bi_opf, GFP_NOIO); 493 + bio_clone_blkg_association(new, prev); 493 494 new->bi_iter.bi_sector = bio_end_sector(prev); 494 - new->bi_opf = prev->bi_opf; 495 495 new->bi_write_hint = prev->bi_write_hint; 496 496 bio_chain(new, prev); 497 497 submit_bio(prev);
+1 -3
fs/gfs2/meta_io.c
··· 222 222 struct buffer_head *bh = *bhs; 223 223 struct bio *bio; 224 224 225 - bio = bio_alloc(GFP_NOIO, num); 225 + bio = bio_alloc(bh->b_bdev, num, op | op_flags, GFP_NOIO); 226 226 bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9); 227 - bio_set_dev(bio, bh->b_bdev); 228 227 while (num > 0) { 229 228 bh = *bhs; 230 229 if (!bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh))) { ··· 234 235 num--; 235 236 } 236 237 bio->bi_end_io = gfs2_meta_read_endio; 237 - bio_set_op_attrs(bio, op, op_flags); 238 238 submit_bio(bio); 239 239 } 240 240 }
+1 -3
fs/gfs2/ops_fstype.c
··· 251 251 ClearPageDirty(page); 252 252 lock_page(page); 253 253 254 - bio = bio_alloc(GFP_NOFS, 1); 254 + bio = bio_alloc(sb->s_bdev, 1, REQ_OP_READ | REQ_META, GFP_NOFS); 255 255 bio->bi_iter.bi_sector = sector * (sb->s_blocksize >> 9); 256 - bio_set_dev(bio, sb->s_bdev); 257 256 bio_add_page(bio, page, PAGE_SIZE, 0); 258 257 259 258 bio->bi_end_io = end_bio_io_page; 260 259 bio->bi_private = page; 261 - bio_set_op_attrs(bio, REQ_OP_READ, REQ_META); 262 260 submit_bio(bio); 263 261 wait_on_page_locked(page); 264 262 bio_put(bio);
+1 -1
fs/gfs2/sys.c
··· 15 15 #include <linux/kobject.h> 16 16 #include <linux/uaccess.h> 17 17 #include <linux/gfs2_ondisk.h> 18 - #include <linux/genhd.h> 18 + #include <linux/blkdev.h> 19 19 20 20 #include "gfs2.h" 21 21 #include "incore.h"
+1 -1
fs/hfs/mdb.c
··· 9 9 */ 10 10 11 11 #include <linux/cdrom.h> 12 - #include <linux/genhd.h> 12 + #include <linux/blkdev.h> 13 13 #include <linux/nls.h> 14 14 #include <linux/slab.h> 15 15
+1 -4
fs/hfsplus/wrapper.c
··· 12 12 #include <linux/fs.h> 13 13 #include <linux/blkdev.h> 14 14 #include <linux/cdrom.h> 15 - #include <linux/genhd.h> 16 15 #include <asm/unaligned.h> 17 16 18 17 #include "hfsplus_fs.h" ··· 63 64 offset = start & (io_size - 1); 64 65 sector &= ~((io_size >> HFSPLUS_SECTOR_SHIFT) - 1); 65 66 66 - bio = bio_alloc(GFP_NOIO, 1); 67 + bio = bio_alloc(sb->s_bdev, 1, op | op_flags, GFP_NOIO); 67 68 bio->bi_iter.bi_sector = sector; 68 - bio_set_dev(bio, sb->s_bdev); 69 - bio_set_op_attrs(bio, op, op_flags); 70 69 71 70 if (op != WRITE && data) 72 71 *data = (u8 *)buf + offset;
+12 -14
fs/iomap/buffered-io.c
··· 292 292 293 293 if (ctx->rac) /* same as readahead_gfp_mask */ 294 294 gfp |= __GFP_NORETRY | __GFP_NOWARN; 295 - ctx->bio = bio_alloc(gfp, bio_max_segs(nr_vecs)); 295 + ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), 296 297 + REQ_OP_READ, gfp); 296 298 /* 297 299 * If the bio_alloc fails, try it again for a single page to 298 300 * avoid having to deal with partial page reads. This emulates 299 301 * what do_mpage_readpage does. 300 302 */ 301 - if (!ctx->bio) 302 - ctx->bio = bio_alloc(orig_gfp, 1); 303 - ctx->bio->bi_opf = REQ_OP_READ; 302 + if (!ctx->bio) { 303 + ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, 304 + orig_gfp); 305 + } 304 306 if (ctx->rac) 305 307 ctx->bio->bi_opf |= REQ_RAHEAD; 306 308 ctx->bio->bi_iter.bi_sector = sector; 307 - bio_set_dev(ctx->bio, iomap->bdev); 308 309 ctx->bio->bi_end_io = iomap_read_end_io; 309 310 bio_add_folio(ctx->bio, folio, plen, poff); 310 311 } ··· 551 550 struct bio_vec bvec; 552 551 struct bio bio; 553 552 554 - bio_init(&bio, &bvec, 1); 555 - bio.bi_opf = REQ_OP_READ; 553 + bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ); 556 554 bio.bi_iter.bi_sector = iomap_sector(iomap, block_start); 557 - bio_set_dev(&bio, iomap->bdev); 558 555 bio_add_folio(&bio, folio, plen, poff); 559 556 return submit_bio_wait(&bio); 560 557 } ··· 1228 1229 struct iomap_ioend *ioend; 1229 1230 struct bio *bio; 1230 1231 1231 - bio = bio_alloc_bioset(GFP_NOFS, BIO_MAX_VECS, &iomap_ioend_bioset); 1232 - bio_set_dev(bio, wpc->iomap.bdev); 1232 + bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS, 1233 + REQ_OP_WRITE | wbc_to_write_flags(wbc), 1234 + GFP_NOFS, &iomap_ioend_bioset); 1233 1235 bio->bi_iter.bi_sector = sector; 1234 - bio->bi_opf = REQ_OP_WRITE | wbc_to_write_flags(wbc); 1235 1236 bio->bi_write_hint = inode->i_write_hint; 1236 1237 wbc_init_bio(wbc, bio); 1237 1238 ··· 1260 1261 { 1261 1262 struct bio *new; 1262 1263 1263 - new = bio_alloc(GFP_NOFS, BIO_MAX_VECS); 1264 - bio_copy_dev(new, prev);/* also copies over blkcg information */
1264 + new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS); 1265 + bio_clone_blkg_association(new, prev); 1265 1266 new->bi_iter.bi_sector = bio_end_sector(prev); 1266 - new->bi_opf = prev->bi_opf; 1267 1267 new->bi_write_hint = prev->bi_write_hint; 1268 1268 1269 1269 bio_chain(prev, new);
+2 -6
fs/iomap/direct-io.c
··· 183 183 int flags = REQ_SYNC | REQ_IDLE; 184 184 struct bio *bio; 185 185 186 - bio = bio_alloc(GFP_KERNEL, 1); 187 - bio_set_dev(bio, iter->iomap.bdev); 186 + bio = bio_alloc(iter->iomap.bdev, 1, REQ_OP_WRITE | flags, GFP_KERNEL); 188 187 bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); 189 188 bio->bi_private = dio; 190 189 bio->bi_end_io = iomap_dio_bio_end_io; 191 190 192 191 get_page(page); 193 192 __bio_add_page(bio, page, len, 0); 194 - bio_set_op_attrs(bio, REQ_OP_WRITE, flags); 195 193 iomap_dio_submit_bio(iter, dio, bio, pos); 196 194 } 197 195 ··· 307 309 goto out; 308 310 } 309 311 310 - bio = bio_alloc(GFP_KERNEL, nr_pages); 311 - bio_set_dev(bio, iomap->bdev); 312 + bio = bio_alloc(iomap->bdev, nr_pages, bio_opf, GFP_KERNEL); 312 313 bio->bi_iter.bi_sector = iomap_sector(iomap, pos); 313 314 bio->bi_write_hint = dio->iocb->ki_hint; 314 315 bio->bi_ioprio = dio->iocb->ki_ioprio; 315 316 bio->bi_private = dio; 316 317 bio->bi_end_io = iomap_dio_bio_end_io; 317 - bio->bi_opf = bio_opf; 318 318 319 319 ret = bio_iov_iter_get_pages(bio, dio->submit.iter); 320 320 if (unlikely(ret)) {
+2 -9
fs/jfs/jfs_logmgr.c
··· 1980 1980 1981 1981 bp->l_flag |= lbmREAD; 1982 1982 1983 - bio = bio_alloc(GFP_NOFS, 1); 1984 - 1983 + bio = bio_alloc(log->bdev, 1, REQ_OP_READ, GFP_NOFS); 1985 1984 bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9); 1986 - bio_set_dev(bio, log->bdev); 1987 - 1988 1985 bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset); 1989 1986 BUG_ON(bio->bi_iter.bi_size != LOGPSIZE); 1990 1987 1991 1988 bio->bi_end_io = lbmIODone; 1992 1989 bio->bi_private = bp; 1993 - bio->bi_opf = REQ_OP_READ; 1994 1990 /*check if journaling to disk has been disabled*/ 1995 1991 if (log->no_integrity) { 1996 1992 bio->bi_iter.bi_size = 0; ··· 2121 2125 2122 2126 jfs_info("lbmStartIO"); 2123 2127 2124 - bio = bio_alloc(GFP_NOFS, 1); 2128 + bio = bio_alloc(log->bdev, 1, REQ_OP_WRITE | REQ_SYNC, GFP_NOFS); 2125 2129 bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9); 2126 - bio_set_dev(bio, log->bdev); 2127 - 2128 2130 bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset); 2129 2131 BUG_ON(bio->bi_iter.bi_size != LOGPSIZE); 2130 2132 2131 2133 bio->bi_end_io = lbmIODone; 2132 2134 bio->bi_private = bp; 2133 - bio->bi_opf = REQ_OP_WRITE | REQ_SYNC; 2134 2135 2135 2136 /* check if journaling to disk has been disabled */ 2136 2137 if (log->no_integrity) {
+3 -6
fs/jfs/jfs_metapage.c
··· 417 417 } 418 418 len = min(xlen, (int)JFS_SBI(inode->i_sb)->nbperpage); 419 419 420 - bio = bio_alloc(GFP_NOFS, 1); 421 - bio_set_dev(bio, inode->i_sb->s_bdev); 420 + bio = bio_alloc(inode->i_sb->s_bdev, 1, REQ_OP_WRITE, GFP_NOFS); 422 421 bio->bi_iter.bi_sector = pblock << (inode->i_blkbits - 9); 423 422 bio->bi_end_io = metapage_write_end_io; 424 423 bio->bi_private = page; 425 - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); 426 424 427 425 /* Don't call bio_add_page yet, we may add to this vec */ 428 426 bio_offset = offset; ··· 495 497 if (bio) 496 498 submit_bio(bio); 497 499 498 - bio = bio_alloc(GFP_NOFS, 1); 499 - bio_set_dev(bio, inode->i_sb->s_bdev); 500 + bio = bio_alloc(inode->i_sb->s_bdev, 1, REQ_OP_READ, 501 + GFP_NOFS); 500 502 bio->bi_iter.bi_sector = 501 503 pblock << (inode->i_blkbits - 9); 502 504 bio->bi_end_io = metapage_read_end_io; 503 505 bio->bi_private = page; 504 - bio_set_op_attrs(bio, REQ_OP_READ, 0); 505 506 len = xlen << inode->i_blkbits; 506 507 offset = block_offset << inode->i_blkbits; 507 508 if (bio_add_page(bio, page, len, offset) < len)
-1
fs/ksmbd/vfs.c
··· 11 11 #include <linux/writeback.h> 12 12 #include <linux/xattr.h> 13 13 #include <linux/falloc.h> 14 - #include <linux/genhd.h> 15 14 #include <linux/fsnotify.h> 16 15 #include <linux/dcache.h> 17 16 #include <linux/slab.h>
+5 -29
fs/mpage.c
··· 66 66 return NULL; 67 67 } 68 68 69 - static struct bio * 70 - mpage_alloc(struct block_device *bdev, 71 - sector_t first_sector, int nr_vecs, 72 - gfp_t gfp_flags) 73 - { 74 - struct bio *bio; 75 - 76 - /* Restrict the given (page cache) mask for slab allocations */ 77 - gfp_flags &= GFP_KERNEL; 78 - bio = bio_alloc(gfp_flags, nr_vecs); 79 - 80 - if (bio == NULL && (current->flags & PF_MEMALLOC)) { 81 - while (!bio && (nr_vecs /= 2)) 82 - bio = bio_alloc(gfp_flags, nr_vecs); 83 - } 84 - 85 - if (bio) { 86 - bio_set_dev(bio, bdev); 87 - bio->bi_iter.bi_sector = first_sector; 88 - } 89 - return bio; 90 - } 91 - 92 69 /* 93 70 * support function for mpage_readahead. The fs supplied get_block might 94 71 * return an up to date buffer. This is used to map that buffer into ··· 273 296 page)) 274 297 goto out; 275 298 } 276 - args->bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9), 277 - bio_max_segs(args->nr_pages), gfp); 299 + args->bio = bio_alloc(bdev, bio_max_segs(args->nr_pages), 0, 300 + gfp); 278 301 if (args->bio == NULL) 279 302 goto confused; 303 + args->bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9); 280 304 } 281 305 282 306 length = first_hole << blkbits; ··· 586 608 page, wbc)) 587 609 goto out; 588 610 } 589 - bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9), 590 - BIO_MAX_VECS, GFP_NOFS|__GFP_HIGH); 591 - if (bio == NULL) 592 - goto confused; 611 + bio = bio_alloc(bdev, BIO_MAX_VECS, 0, GFP_NOFS); 612 + bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9); 593 613 594 614 wbc_init_bio(wbc, bio); 595 615 bio->bi_write_hint = inode->i_write_hint;
+4 -22
fs/nfs/blocklayout/blocklayout.c
··· 115 115 return NULL; 116 116 } 117 117 118 - static struct bio *bl_alloc_init_bio(unsigned int npg, 119 - struct block_device *bdev, sector_t disk_sector, 120 - bio_end_io_t end_io, struct parallel_io *par) 121 - { 122 - struct bio *bio; 123 - 124 - npg = bio_max_segs(npg); 125 - bio = bio_alloc(GFP_NOIO, npg); 126 - if (bio) { 127 - bio->bi_iter.bi_sector = disk_sector; 128 - bio_set_dev(bio, bdev); 129 - bio->bi_end_io = end_io; 130 - bio->bi_private = par; 131 - } 132 - return bio; 133 - } 134 - 135 118 static bool offset_in_map(u64 offset, struct pnfs_block_dev_map *map) 136 119 { 137 120 return offset >= map->start && offset < map->start + map->len; ··· 154 171 155 172 retry: 156 173 if (!bio) { 157 - bio = bl_alloc_init_bio(npg, map->bdev, 158 - disk_addr >> SECTOR_SHIFT, end_io, par); 159 - if (!bio) 160 - return ERR_PTR(-ENOMEM); 161 - bio_set_op_attrs(bio, rw, 0); 174 + bio = bio_alloc(map->bdev, bio_max_segs(npg), rw, GFP_NOIO); 175 + bio->bi_iter.bi_sector = disk_addr >> SECTOR_SHIFT; 176 + bio->bi_end_io = end_io; 177 + bio->bi_private = par; 162 178 } 163 179 if (bio_add_page(bio, page, *len, offset) < *len) { 164 180 bio = bl_submit_bio(bio);
-1
fs/nfs/blocklayout/rpc_pipefs.c
··· 27 27 */ 28 28 29 29 #include <linux/module.h> 30 - #include <linux/genhd.h> 31 30 #include <linux/blkdev.h> 32 31 33 32 #include "blocklayout.h"
-1
fs/nfsd/blocklayout.c
··· 4 4 */ 5 5 #include <linux/exportfs.h> 6 6 #include <linux/iomap.h> 7 - #include <linux/genhd.h> 8 7 #include <linux/slab.h> 9 8 #include <linux/pr.h> 10 9
+4 -27
fs/nilfs2/segbuf.c
··· 371 371 return err; 372 372 } 373 373 374 - /** 375 - * nilfs_alloc_seg_bio - allocate a new bio for writing log 376 - * @nilfs: nilfs object 377 - * @start: start block number of the bio 378 - * @nr_vecs: request size of page vector. 379 - * 380 - * Return Value: On success, pointer to the struct bio is returned. 381 - * On error, NULL is returned. 382 - */ 383 - static struct bio *nilfs_alloc_seg_bio(struct the_nilfs *nilfs, sector_t start, 384 - int nr_vecs) 385 - { 386 - struct bio *bio; 387 - 388 - bio = bio_alloc(GFP_NOIO, nr_vecs); 389 - if (likely(bio)) { 390 - bio_set_dev(bio, nilfs->ns_bdev); 391 - bio->bi_iter.bi_sector = 392 - start << (nilfs->ns_blocksize_bits - 9); 393 - } 394 - return bio; 395 - } 396 - 397 374 static void nilfs_segbuf_prepare_write(struct nilfs_segment_buffer *segbuf, 398 375 struct nilfs_write_info *wi) 399 376 { ··· 391 414 BUG_ON(wi->nr_vecs <= 0); 392 415 repeat: 393 416 if (!wi->bio) { 394 - wi->bio = nilfs_alloc_seg_bio(wi->nilfs, wi->blocknr + wi->end, 395 - wi->nr_vecs); 396 - if (unlikely(!wi->bio)) 397 - return -ENOMEM; 417 + wi->bio = bio_alloc(wi->nilfs->ns_bdev, wi->nr_vecs, 0, 418 + GFP_NOIO); 419 + wi->bio->bi_iter.bi_sector = (wi->blocknr + wi->end) << 420 + (wi->nilfs->ns_blocksize_bits - 9); 398 421 } 399 422 400 423 len = bio_add_page(wi->bio, bh->b_page, bh->b_size, bh_offset(bh));
+6 -30
fs/ntfs3/fsntfs.c
··· 1443 1443 return err; 1444 1444 } 1445 1445 1446 - static inline struct bio *ntfs_alloc_bio(u32 nr_vecs) 1447 - { 1448 - struct bio *bio = bio_alloc(GFP_NOFS | __GFP_HIGH, nr_vecs); 1449 - 1450 - if (!bio && (current->flags & PF_MEMALLOC)) { 1451 - while (!bio && (nr_vecs /= 2)) 1452 - bio = bio_alloc(GFP_NOFS | __GFP_HIGH, nr_vecs); 1453 - } 1454 - return bio; 1455 - } 1456 - 1457 1446 /* 1458 1447 * ntfs_bio_pages - Read/write pages from/to disk. 1459 1448 */ ··· 1485 1496 lbo = ((u64)lcn << cluster_bits) + off; 1486 1497 len = ((u64)clen << cluster_bits) - off; 1487 1498 new_bio: 1488 - new = ntfs_alloc_bio(nr_pages - page_idx); 1489 - if (!new) { 1490 - err = -ENOMEM; 1491 - goto out; 1492 - } 1499 + new = bio_alloc(bdev, nr_pages - page_idx, op, GFP_NOFS); 1493 1500 if (bio) { 1494 1501 bio_chain(bio, new); 1495 1502 submit_bio(bio); 1496 1503 } 1497 1504 bio = new; 1498 - bio_set_dev(bio, bdev); 1499 1505 bio->bi_iter.bi_sector = lbo >> 9; 1500 - bio->bi_opf = op; 1501 1506 1502 1507 while (len) { 1503 1508 off = vbo & (PAGE_SIZE - 1); ··· 1582 1599 lbo = (u64)lcn << cluster_bits; 1583 1600 len = (u64)clen << cluster_bits; 1584 1601 new_bio: 1585 - new = ntfs_alloc_bio(BIO_MAX_VECS); 1586 - if (!new) { 1587 - err = -ENOMEM; 1588 - break; 1589 - } 1602 + new = bio_alloc(bdev, BIO_MAX_VECS, REQ_OP_WRITE, GFP_NOFS); 1590 1603 if (bio) { 1591 1604 bio_chain(bio, new); 1592 1605 submit_bio(bio); 1593 1606 } 1594 1607 bio = new; 1595 - bio_set_dev(bio, bdev); 1596 - bio->bi_opf = REQ_OP_WRITE; 1597 1608 bio->bi_iter.bi_sector = lbo >> 9; 1598 1609 1599 1610 for (;;) { ··· 1603 1626 } 1604 1627 } while (run_get_entry(run, ++run_idx, NULL, &lcn, &clen)); 1605 1628 1606 - if (bio) { 1607 - if (!err) 1608 - err = submit_bio_wait(bio); 1609 - bio_put(bio); 1610 - } 1629 + if (!err) 1630 + err = submit_bio_wait(bio); 1631 + bio_put(bio); 1632 + 1611 1633 blk_finish_plug(&plug); 1612 1634 out: 1613 1635 unlock_page(fill);
+1 -3
fs/ocfs2/cluster/heartbeat.c
··· 518 518 * GFP_KERNEL that the local node can get fenced. It would be 519 519 * nicest if we could pre-allocate these bios and avoid this 520 520 * all together. */ 521 - bio = bio_alloc(GFP_ATOMIC, 16); 521 + bio = bio_alloc(reg->hr_bdev, 16, op | op_flags, GFP_ATOMIC); 522 522 if (!bio) { 523 523 mlog(ML_ERROR, "Could not alloc slots BIO!\n"); 524 524 bio = ERR_PTR(-ENOMEM); ··· 527 527 528 528 /* Must put everything in 512 byte sectors for the bio... */ 529 529 bio->bi_iter.bi_sector = (reg->hr_start_block + cs) << (bits - 9); 530 - bio_set_dev(bio, reg->hr_bdev); 531 530 bio->bi_private = wc; 532 531 bio->bi_end_io = o2hb_bio_end_io; 533 - bio_set_op_attrs(bio, op, op_flags); 534 532 535 533 vec_start = (cs << bits) % PAGE_SIZE; 536 534 while(cs < max_slots) {
+6 -5
fs/squashfs/block.c
··· 86 86 int error, i; 87 87 struct bio *bio; 88 88 89 - if (page_count <= BIO_MAX_VECS) 90 - bio = bio_alloc(GFP_NOIO, page_count); 91 - else 89 + if (page_count <= BIO_MAX_VECS) { 90 + bio = bio_alloc(sb->s_bdev, page_count, REQ_OP_READ, GFP_NOIO); 91 + } else { 92 92 bio = bio_kmalloc(GFP_NOIO, page_count); 93 + bio_set_dev(bio, sb->s_bdev); 94 + bio->bi_opf = REQ_OP_READ; 95 + } 93 96 94 97 if (!bio) 95 98 return -ENOMEM; 96 99 97 - bio_set_dev(bio, sb->s_bdev); 98 - bio->bi_opf = READ; 99 100 bio->bi_iter.bi_sector = block * (msblk->devblksize >> SECTOR_SHIFT); 100 101 101 102 for (i = 0; i < page_count; ++i) {
+5 -9
fs/xfs/xfs_bio_io.c
··· 36 36 return; 37 37 } 38 38 39 - bio_init(bio, NULL, 0); 40 - bio_set_dev(bio, bdev); 41 - bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; 39 + bio_init(bio, bdev, NULL, 0, REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC); 42 40 bio->bi_private = done; 43 41 bio->bi_end_io = xfs_flush_bdev_async_endio; 44 42 ··· 59 61 if (is_vmalloc && op == REQ_OP_WRITE) 60 62 flush_kernel_vmap_range(data, count); 61 63 62 - bio = bio_alloc(GFP_KERNEL, bio_max_vecs(left)); 63 - bio_set_dev(bio, bdev); 64 + bio = bio_alloc(bdev, bio_max_vecs(left), op | REQ_META | REQ_SYNC, 65 + GFP_KERNEL); 64 66 bio->bi_iter.bi_sector = sector; 65 - bio->bi_opf = op | REQ_META | REQ_SYNC; 66 67 67 68 do { 68 69 struct page *page = kmem_to_page(data); ··· 71 74 while (bio_add_page(bio, page, len, off) != len) { 72 75 struct bio *prev = bio; 73 76 74 - bio = bio_alloc(GFP_KERNEL, bio_max_vecs(left)); 75 - bio_copy_dev(bio, prev); 77 + bio = bio_alloc(prev->bi_bdev, bio_max_vecs(left), 78 + prev->bi_opf, GFP_KERNEL); 76 79 bio->bi_iter.bi_sector = bio_end_sector(prev); 77 - bio->bi_opf = prev->bi_opf; 78 80 bio_chain(prev, bio); 79 81 80 82 submit_bio(prev);
+1 -3
fs/xfs/xfs_buf.c
··· 1440 1440 atomic_inc(&bp->b_io_remaining); 1441 1441 nr_pages = bio_max_segs(total_nr_pages); 1442 1442 1443 - bio = bio_alloc(GFP_NOIO, nr_pages); 1444 - bio_set_dev(bio, bp->b_target->bt_bdev); 1443 + bio = bio_alloc(bp->b_target->bt_bdev, nr_pages, op, GFP_NOIO); 1445 1444 bio->bi_iter.bi_sector = sector; 1446 1445 bio->bi_end_io = xfs_buf_bio_end_io; 1447 1446 bio->bi_private = bp; 1448 - bio->bi_opf = op; 1449 1447 1450 1448 for (; size && nr_pages; nr_pages--, page_index++) { 1451 1449 int rbytes, nbytes = PAGE_SIZE - offset;
+7 -7
fs/xfs/xfs_log.c
··· 1883 1883 return; 1884 1884 } 1885 1885 1886 - bio_init(&iclog->ic_bio, iclog->ic_bvec, howmany(count, PAGE_SIZE)); 1887 - bio_set_dev(&iclog->ic_bio, log->l_targ->bt_bdev); 1888 - iclog->ic_bio.bi_iter.bi_sector = log->l_logBBstart + bno; 1889 - iclog->ic_bio.bi_end_io = xlog_bio_end_io; 1890 - iclog->ic_bio.bi_private = iclog; 1891 - 1892 1886 /* 1893 1887 * We use REQ_SYNC | REQ_IDLE here to tell the block layer the are more 1894 1888 * IOs coming immediately after this one. This prevents the block layer 1895 1889 * writeback throttle from throttling log writes behind background 1896 1890 * metadata writeback and causing priority inversions. 1897 1891 */ 1898 - iclog->ic_bio.bi_opf = REQ_OP_WRITE | REQ_META | REQ_SYNC | REQ_IDLE; 1892 + bio_init(&iclog->ic_bio, log->l_targ->bt_bdev, iclog->ic_bvec, 1893 + howmany(count, PAGE_SIZE), 1894 + REQ_OP_WRITE | REQ_META | REQ_SYNC | REQ_IDLE); 1895 + iclog->ic_bio.bi_iter.bi_sector = log->l_logBBstart + bno; 1896 + iclog->ic_bio.bi_end_io = xlog_bio_end_io; 1897 + iclog->ic_bio.bi_private = iclog; 1898 + 1899 1899 if (iclog->ic_flags & XLOG_ICL_NEED_FLUSH) { 1900 1900 iclog->ic_bio.bi_opf |= REQ_PREFLUSH; 1901 1901 /*
+3 -6
fs/zonefs/super.c
··· 692 692 if (!nr_pages) 693 693 return 0; 694 694 695 - bio = bio_alloc(GFP_NOFS, nr_pages); 696 - bio_set_dev(bio, bdev); 695 + bio = bio_alloc(bdev, nr_pages, 696 + REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE, GFP_NOFS); 697 697 bio->bi_iter.bi_sector = zi->i_zsector; 698 698 bio->bi_write_hint = iocb->ki_hint; 699 699 bio->bi_ioprio = iocb->ki_ioprio; 700 - bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; 701 700 if (iocb->ki_flags & IOCB_DSYNC) 702 701 bio->bi_opf |= REQ_FUA; 703 702 ··· 1540 1541 if (!page) 1541 1542 return -ENOMEM; 1542 1543 1543 - bio_init(&bio, &bio_vec, 1); 1544 + bio_init(&bio, sb->s_bdev, &bio_vec, 1, REQ_OP_READ); 1544 1545 bio.bi_iter.bi_sector = 0; 1545 - bio.bi_opf = REQ_OP_READ; 1546 - bio_set_dev(&bio, sb->s_bdev); 1547 1546 bio_add_page(&bio, page, PAGE_SIZE, 0); 1548 1547 1549 1548 ret = submit_bio_wait(&bio);
+17 -21
include/linux/bio.h
··· 405 405 extern int biovec_init_pool(mempool_t *pool, int pool_entries); 406 406 extern int bioset_init_from_src(struct bio_set *bs, struct bio_set *src); 407 407 408 - struct bio *bio_alloc_bioset(gfp_t gfp, unsigned short nr_iovecs, 409 - struct bio_set *bs); 410 - struct bio *bio_alloc_kiocb(struct kiocb *kiocb, unsigned short nr_vecs, 411 - struct bio_set *bs); 408 + struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, 409 + unsigned int opf, gfp_t gfp_mask, 410 + struct bio_set *bs); 411 + struct bio *bio_alloc_kiocb(struct kiocb *kiocb, struct block_device *bdev, 412 + unsigned short nr_vecs, unsigned int opf, struct bio_set *bs); 412 413 struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned short nr_iovecs); 413 414 extern void bio_put(struct bio *); 414 415 415 - extern void __bio_clone_fast(struct bio *, struct bio *); 416 - extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *); 416 + struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src, 417 + gfp_t gfp, struct bio_set *bs); 418 + int bio_init_clone(struct block_device *bdev, struct bio *bio, 419 + struct bio *bio_src, gfp_t gfp); 417 420 418 421 extern struct bio_set fs_bio_set; 419 422 420 - static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned short nr_iovecs) 423 + static inline struct bio *bio_alloc(struct block_device *bdev, 424 + unsigned short nr_vecs, unsigned int opf, gfp_t gfp_mask) 421 425 { 422 - return bio_alloc_bioset(gfp_mask, nr_iovecs, &fs_bio_set); 426 + return bio_alloc_bioset(bdev, nr_vecs, opf, gfp_mask, &fs_bio_set); 423 427 } 424 428 425 429 void submit_bio(struct bio *bio); ··· 458 454 struct request_queue; 459 455 460 456 extern int submit_bio_wait(struct bio *bio); 461 - extern void bio_init(struct bio *bio, struct bio_vec *table, 462 - unsigned short max_vecs); 457 + void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, 458 + unsigned short max_vecs, unsigned int opf); 463 459 
extern void bio_uninit(struct bio *); 464 - extern void bio_reset(struct bio *); 460 + void bio_reset(struct bio *bio, struct block_device *bdev, unsigned int opf); 465 461 void bio_chain(struct bio *, struct bio *); 466 462 467 463 int bio_add_page(struct bio *, struct page *, unsigned len, unsigned off); ··· 491 487 __bio_release_pages(bio, mark_dirty); 492 488 } 493 489 494 - extern const char *bio_devname(struct bio *bio, char *buffer); 495 - 496 490 #define bio_dev(bio) \ 497 491 disk_devt((bio)->bi_bdev->bd_disk) 498 492 ··· 515 513 bio_clear_flag(bio, BIO_THROTTLED); 516 514 bio->bi_bdev = bdev; 517 515 bio_associate_blkg(bio); 518 - } 519 - 520 - static inline void bio_copy_dev(struct bio *dst, struct bio *src) 521 - { 522 - bio_clear_flag(dst, BIO_REMAPPED); 523 - dst->bi_bdev = src->bi_bdev; 524 - bio_clone_blkg_association(dst, src); 525 516 } 526 517 527 518 /* ··· 785 790 bio->bi_opf |= REQ_NOWAIT; 786 791 } 787 792 788 - struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp); 793 + struct bio *blk_next_bio(struct bio *bio, struct block_device *bdev, 794 + unsigned int nr_pages, unsigned int opf, gfp_t gfp); 789 795 790 796 #endif /* __LINUX_BIO_H */
+5 -456
include/linux/blk-cgroup.h
··· 25 25 #include <linux/kthread.h> 26 26 #include <linux/fs.h> 27 27 28 - /* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */ 29 - #define BLKG_STAT_CPU_BATCH (INT_MAX / 2) 30 - 31 - /* Max limits for throttle policy */ 32 - #define THROTL_IOPS_MAX UINT_MAX 33 28 #define FC_APPID_LEN 129 34 - 35 29 36 30 #ifdef CONFIG_BLK_CGROUP 37 31 ··· 38 44 }; 39 45 40 46 struct blkcg_gq; 47 + struct blkg_policy_data; 41 48 42 49 struct blkcg { 43 50 struct cgroup_subsys_state css; ··· 69 74 struct u64_stats_sync sync; 70 75 struct blkg_iostat cur; 71 76 struct blkg_iostat last; 72 - }; 73 - 74 - /* 75 - * A blkcg_gq (blkg) is association between a block cgroup (blkcg) and a 76 - * request_queue (q). This is used by blkcg policies which need to track 77 - * information per blkcg - q pair. 78 - * 79 - * There can be multiple active blkcg policies and each blkg:policy pair is 80 - * represented by a blkg_policy_data which is allocated and freed by each 81 - * policy's pd_alloc/free_fn() methods. A policy can allocate private data 82 - * area by allocating larger data structure which embeds blkg_policy_data 83 - * at the beginning. 84 - */ 85 - struct blkg_policy_data { 86 - /* the blkg and policy id this per-policy data belongs to */ 87 - struct blkcg_gq *blkg; 88 - int plid; 89 - }; 90 - 91 - /* 92 - * Policies that need to keep per-blkcg data which is independent from any 93 - * request_queue associated to it should implement cpd_alloc/free_fn() 94 - * methods. A policy can allocate private data area by allocating larger 95 - * data structure which embeds blkcg_policy_data at the beginning. 96 - * cpd_init() is invoked to let each policy handle per-blkcg data. 
97 - */ 98 - struct blkcg_policy_data { 99 - /* the blkcg and policy id this per-policy data belongs to */ 100 - struct blkcg *blkcg; 101 - int plid; 102 77 }; 103 78 104 79 /* association between a blk cgroup and a request queue */ ··· 106 141 struct rcu_head rcu_head; 107 142 }; 108 143 109 - typedef struct blkcg_policy_data *(blkcg_pol_alloc_cpd_fn)(gfp_t gfp); 110 - typedef void (blkcg_pol_init_cpd_fn)(struct blkcg_policy_data *cpd); 111 - typedef void (blkcg_pol_free_cpd_fn)(struct blkcg_policy_data *cpd); 112 - typedef void (blkcg_pol_bind_cpd_fn)(struct blkcg_policy_data *cpd); 113 - typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp, 114 - struct request_queue *q, struct blkcg *blkcg); 115 - typedef void (blkcg_pol_init_pd_fn)(struct blkg_policy_data *pd); 116 - typedef void (blkcg_pol_online_pd_fn)(struct blkg_policy_data *pd); 117 - typedef void (blkcg_pol_offline_pd_fn)(struct blkg_policy_data *pd); 118 - typedef void (blkcg_pol_free_pd_fn)(struct blkg_policy_data *pd); 119 - typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd); 120 - typedef bool (blkcg_pol_stat_pd_fn)(struct blkg_policy_data *pd, 121 - struct seq_file *s); 122 - 123 - struct blkcg_policy { 124 - int plid; 125 - /* cgroup files for the policy */ 126 - struct cftype *dfl_cftypes; 127 - struct cftype *legacy_cftypes; 128 - 129 - /* operations */ 130 - blkcg_pol_alloc_cpd_fn *cpd_alloc_fn; 131 - blkcg_pol_init_cpd_fn *cpd_init_fn; 132 - blkcg_pol_free_cpd_fn *cpd_free_fn; 133 - blkcg_pol_bind_cpd_fn *cpd_bind_fn; 134 - 135 - blkcg_pol_alloc_pd_fn *pd_alloc_fn; 136 - blkcg_pol_init_pd_fn *pd_init_fn; 137 - blkcg_pol_online_pd_fn *pd_online_fn; 138 - blkcg_pol_offline_pd_fn *pd_offline_fn; 139 - blkcg_pol_free_pd_fn *pd_free_fn; 140 - blkcg_pol_reset_pd_stats_fn *pd_reset_stats_fn; 141 - blkcg_pol_stat_pd_fn *pd_stat_fn; 142 - }; 143 - 144 - extern struct blkcg blkcg_root; 145 144 extern struct cgroup_subsys_state * const blkcg_root_css; 146 - extern bool 
blkcg_debug_stats; 147 145 148 - struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg, 149 - struct request_queue *q, bool update_hint); 150 - int blkcg_init_queue(struct request_queue *q); 151 - void blkcg_exit_queue(struct request_queue *q); 152 - 153 - /* Blkio controller policy registration */ 154 - int blkcg_policy_register(struct blkcg_policy *pol); 155 - void blkcg_policy_unregister(struct blkcg_policy *pol); 156 - int blkcg_activate_policy(struct request_queue *q, 157 - const struct blkcg_policy *pol); 158 - void blkcg_deactivate_policy(struct request_queue *q, 159 - const struct blkcg_policy *pol); 160 - 161 - const char *blkg_dev_name(struct blkcg_gq *blkg); 162 - void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg, 163 - u64 (*prfill)(struct seq_file *, 164 - struct blkg_policy_data *, int), 165 - const struct blkcg_policy *pol, int data, 166 - bool show_total); 167 - u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v); 168 - 169 - struct blkg_conf_ctx { 170 - struct block_device *bdev; 171 - struct blkcg_gq *blkg; 172 - char *body; 173 - }; 174 - 175 - struct block_device *blkcg_conf_open_bdev(char **inputp); 176 - int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol, 177 - char *input, struct blkg_conf_ctx *ctx); 178 - void blkg_conf_finish(struct blkg_conf_ctx *ctx); 179 - 180 - /** 181 - * blkcg_css - find the current css 182 - * 183 - * Find the css associated with either the kthread or the current task. 184 - * This may return a dying css, so it is up to the caller to use tryget logic 185 - * to confirm it is alive and well. 
186 - */ 187 - static inline struct cgroup_subsys_state *blkcg_css(void) 188 - { 189 - struct cgroup_subsys_state *css; 190 - 191 - css = kthread_blkcg(); 192 - if (css) 193 - return css; 194 - return task_css(current, io_cgrp_id); 195 - } 146 + void blkcg_destroy_blkgs(struct blkcg *blkcg); 147 + void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); 148 + void blkcg_maybe_throttle_current(void); 196 149 197 150 static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css) 198 151 { 199 152 return css ? container_of(css, struct blkcg, css) : NULL; 200 - } 201 - 202 - /** 203 - * __bio_blkcg - internal, inconsistent version to get blkcg 204 - * 205 - * DO NOT USE. 206 - * This function is inconsistent and consequently is dangerous to use. The 207 - * first part of the function returns a blkcg where a reference is owned by the 208 - * bio. This means it does not need to be rcu protected as it cannot go away 209 - * with the bio owning a reference to it. However, the latter potentially gets 210 - * it from task_css(). This can race against task migration and the cgroup 211 - * dying. It is also semantically different as it must be called rcu protected 212 - * and is susceptible to failure when trying to get a reference to it. 213 - * Therefore, it is not ok to assume that *_get() will always succeed on the 214 - * blkcg returned here. 215 - */ 216 - static inline struct blkcg *__bio_blkcg(struct bio *bio) 217 - { 218 - if (bio && bio->bi_blkg) 219 - return bio->bi_blkg->blkcg; 220 - return css_to_blkcg(blkcg_css()); 221 153 } 222 154 223 155 /** ··· 153 291 } 154 292 155 293 /** 156 - * bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg 157 - * @return: true if this bio needs to be submitted with the root blkg context. 158 - * 159 - * In order to avoid priority inversions we sometimes need to issue a bio as if 160 - * it were attached to the root blkg, and then backcharge to the actual owning 161 - * blkg. 
The idea is we do bio_blkcg() to look up the actual context for the 162 - * bio and attach the appropriate blkg to the bio. Then we call this helper and 163 - * if it is true run with the root blkg for that queue and then do any 164 - * backcharging to the originating cgroup once the io is complete. 165 - */ 166 - static inline bool bio_issue_as_root_blkg(struct bio *bio) 167 - { 168 - return (bio->bi_opf & (REQ_META | REQ_SWAP)) != 0; 169 - } 170 - 171 - /** 172 294 * blkcg_parent - get the parent of a blkcg 173 295 * @blkcg: blkcg of interest 174 296 * ··· 162 316 { 163 317 return css_to_blkcg(blkcg->css.parent); 164 318 } 165 - 166 - /** 167 - * __blkg_lookup - internal version of blkg_lookup() 168 - * @blkcg: blkcg of interest 169 - * @q: request_queue of interest 170 - * @update_hint: whether to update lookup hint with the result or not 171 - * 172 - * This is internal version and shouldn't be used by policy 173 - * implementations. Looks up blkgs for the @blkcg - @q pair regardless of 174 - * @q's bypass state. If @update_hint is %true, the caller should be 175 - * holding @q->queue_lock and lookup hint is updated on success. 176 - */ 177 - static inline struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, 178 - struct request_queue *q, 179 - bool update_hint) 180 - { 181 - struct blkcg_gq *blkg; 182 - 183 - if (blkcg == &blkcg_root) 184 - return q->root_blkg; 185 - 186 - blkg = rcu_dereference(blkcg->blkg_hint); 187 - if (blkg && blkg->q == q) 188 - return blkg; 189 - 190 - return blkg_lookup_slowpath(blkcg, q, update_hint); 191 - } 192 - 193 - /** 194 - * blkg_lookup - lookup blkg for the specified blkcg - q pair 195 - * @blkcg: blkcg of interest 196 - * @q: request_queue of interest 197 - * 198 - * Lookup blkg for the @blkcg - @q pair. This function should be called 199 - * under RCU read lock. 
200 - */ 201 - static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, 202 - struct request_queue *q) 203 - { 204 - WARN_ON_ONCE(!rcu_read_lock_held()); 205 - return __blkg_lookup(blkcg, q, false); 206 - } 207 - 208 - /** 209 - * blk_queue_root_blkg - return blkg for the (blkcg_root, @q) pair 210 - * @q: request_queue of interest 211 - * 212 - * Lookup blkg for @q at the root level. See also blkg_lookup(). 213 - */ 214 - static inline struct blkcg_gq *blk_queue_root_blkg(struct request_queue *q) 215 - { 216 - return q->root_blkg; 217 - } 218 - 219 - /** 220 - * blkg_to_pdata - get policy private data 221 - * @blkg: blkg of interest 222 - * @pol: policy of interest 223 - * 224 - * Return pointer to private data associated with the @blkg-@pol pair. 225 - */ 226 - static inline struct blkg_policy_data *blkg_to_pd(struct blkcg_gq *blkg, 227 - struct blkcg_policy *pol) 228 - { 229 - return blkg ? blkg->pd[pol->plid] : NULL; 230 - } 231 - 232 - static inline struct blkcg_policy_data *blkcg_to_cpd(struct blkcg *blkcg, 233 - struct blkcg_policy *pol) 234 - { 235 - return blkcg ? blkcg->cpd[pol->plid] : NULL; 236 - } 237 - 238 - /** 239 - * pdata_to_blkg - get blkg associated with policy private data 240 - * @pd: policy private data of interest 241 - * 242 - * @pd is policy private data. Determine the blkg it's associated with. 243 - */ 244 - static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd) 245 - { 246 - return pd ? pd->blkg : NULL; 247 - } 248 - 249 - static inline struct blkcg *cpd_to_blkcg(struct blkcg_policy_data *cpd) 250 - { 251 - return cpd ? 
cpd->blkcg : NULL; 252 - } 253 - 254 - extern void blkcg_destroy_blkgs(struct blkcg *blkcg); 255 319 256 320 /** 257 321 * blkcg_pin_online - pin online state ··· 195 439 } while (blkcg); 196 440 } 197 441 198 - /** 199 - * blkg_path - format cgroup path of blkg 200 - * @blkg: blkg of interest 201 - * @buf: target buffer 202 - * @buflen: target buffer length 203 - * 204 - * Format the path of the cgroup of @blkg into @buf. 205 - */ 206 - static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen) 207 - { 208 - return cgroup_path(blkg->blkcg->css.cgroup, buf, buflen); 209 - } 210 - 211 - /** 212 - * blkg_get - get a blkg reference 213 - * @blkg: blkg to get 214 - * 215 - * The caller should be holding an existing reference. 216 - */ 217 - static inline void blkg_get(struct blkcg_gq *blkg) 218 - { 219 - percpu_ref_get(&blkg->refcnt); 220 - } 221 - 222 - /** 223 - * blkg_tryget - try and get a blkg reference 224 - * @blkg: blkg to get 225 - * 226 - * This is for use when doing an RCU lookup of the blkg. We may be in the midst 227 - * of freeing this blkg, so we can only use it if the refcnt is not zero. 228 - */ 229 - static inline bool blkg_tryget(struct blkcg_gq *blkg) 230 - { 231 - return blkg && percpu_ref_tryget(&blkg->refcnt); 232 - } 233 - 234 - /** 235 - * blkg_put - put a blkg reference 236 - * @blkg: blkg to put 237 - */ 238 - static inline void blkg_put(struct blkcg_gq *blkg) 239 - { 240 - percpu_ref_put(&blkg->refcnt); 241 - } 242 - 243 - /** 244 - * blkg_for_each_descendant_pre - pre-order walk of a blkg's descendants 245 - * @d_blkg: loop cursor pointing to the current descendant 246 - * @pos_css: used for iteration 247 - * @p_blkg: target blkg to walk descendants of 248 - * 249 - * Walk @c_blkg through the descendants of @p_blkg. Must be used with RCU 250 - * read locked. If called under either blkcg or queue lock, the iteration 251 - * is guaranteed to include all and only online blkgs. 
The caller may 252 - * update @pos_css by calling css_rightmost_descendant() to skip subtree. 253 - * @p_blkg is included in the iteration and the first node to be visited. 254 - */ 255 - #define blkg_for_each_descendant_pre(d_blkg, pos_css, p_blkg) \ 256 - css_for_each_descendant_pre((pos_css), &(p_blkg)->blkcg->css) \ 257 - if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ 258 - (p_blkg)->q, false))) 259 - 260 - /** 261 - * blkg_for_each_descendant_post - post-order walk of a blkg's descendants 262 - * @d_blkg: loop cursor pointing to the current descendant 263 - * @pos_css: used for iteration 264 - * @p_blkg: target blkg to walk descendants of 265 - * 266 - * Similar to blkg_for_each_descendant_pre() but performs post-order 267 - * traversal instead. Synchronization rules are the same. @p_blkg is 268 - * included in the iteration and the last node to be visited. 269 - */ 270 - #define blkg_for_each_descendant_post(d_blkg, pos_css, p_blkg) \ 271 - css_for_each_descendant_post((pos_css), &(p_blkg)->blkcg->css) \ 272 - if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ 273 - (p_blkg)->q, false))) 274 - 275 - bool __blkcg_punt_bio_submit(struct bio *bio); 276 - 277 - static inline bool blkcg_punt_bio_submit(struct bio *bio) 278 - { 279 - if (bio->bi_opf & REQ_CGROUP_PUNT) 280 - return __blkcg_punt_bio_submit(bio); 281 - else 282 - return false; 283 - } 284 - 285 - static inline void blkcg_bio_issue_init(struct bio *bio) 286 - { 287 - bio_issue_init(&bio->bi_issue, bio_sectors(bio)); 288 - } 289 - 290 - static inline void blkcg_use_delay(struct blkcg_gq *blkg) 291 - { 292 - if (WARN_ON_ONCE(atomic_read(&blkg->use_delay) < 0)) 293 - return; 294 - if (atomic_add_return(1, &blkg->use_delay) == 1) 295 - atomic_inc(&blkg->blkcg->css.cgroup->congestion_count); 296 - } 297 - 298 - static inline int blkcg_unuse_delay(struct blkcg_gq *blkg) 299 - { 300 - int old = atomic_read(&blkg->use_delay); 301 - 302 - if (WARN_ON_ONCE(old < 0)) 303 - return 0; 304 - if (old == 
0) 305 - return 0; 306 - 307 - /* 308 - * We do this song and dance because we can race with somebody else 309 - * adding or removing delay. If we just did an atomic_dec we'd end up 310 - * negative and we'd already be in trouble. We need to subtract 1 and 311 - * then check to see if we were the last delay so we can drop the 312 - * congestion count on the cgroup. 313 - */ 314 - while (old) { 315 - int cur = atomic_cmpxchg(&blkg->use_delay, old, old - 1); 316 - if (cur == old) 317 - break; 318 - old = cur; 319 - } 320 - 321 - if (old == 0) 322 - return 0; 323 - if (old == 1) 324 - atomic_dec(&blkg->blkcg->css.cgroup->congestion_count); 325 - return 1; 326 - } 327 - 328 - /** 329 - * blkcg_set_delay - Enable allocator delay mechanism with the specified delay amount 330 - * @blkg: target blkg 331 - * @delay: delay duration in nsecs 332 - * 333 - * When enabled with this function, the delay is not decayed and must be 334 - * explicitly cleared with blkcg_clear_delay(). Must not be mixed with 335 - * blkcg_[un]use_delay() and blkcg_add_delay() usages. 336 - */ 337 - static inline void blkcg_set_delay(struct blkcg_gq *blkg, u64 delay) 338 - { 339 - int old = atomic_read(&blkg->use_delay); 340 - 341 - /* We only want 1 person setting the congestion count for this blkg. */ 342 - if (!old && atomic_cmpxchg(&blkg->use_delay, old, -1) == old) 343 - atomic_inc(&blkg->blkcg->css.cgroup->congestion_count); 344 - 345 - atomic64_set(&blkg->delay_nsec, delay); 346 - } 347 - 348 - /** 349 - * blkcg_clear_delay - Disable allocator delay mechanism 350 - * @blkg: target blkg 351 - * 352 - * Disable use_delay mechanism. See blkcg_set_delay(). 353 - */ 354 - static inline void blkcg_clear_delay(struct blkcg_gq *blkg) 355 - { 356 - int old = atomic_read(&blkg->use_delay); 357 - 358 - /* We only want 1 person clearing the congestion count for this blkg. 
*/ 359 - if (old && atomic_cmpxchg(&blkg->use_delay, old, 0) == old) 360 - atomic_dec(&blkg->blkcg->css.cgroup->congestion_count); 361 - } 362 - 363 - void blk_cgroup_bio_start(struct bio *bio); 364 - void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta); 365 - void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); 366 - void blkcg_maybe_throttle_current(void); 367 442 #else /* CONFIG_BLK_CGROUP */ 368 443 369 444 struct blkcg { 370 445 }; 371 446 372 - struct blkg_policy_data { 373 - }; 374 - 375 - struct blkcg_policy_data { 376 - }; 377 - 378 447 struct blkcg_gq { 379 - }; 380 - 381 - struct blkcg_policy { 382 448 }; 383 449 384 450 #define blkcg_root_css ((struct cgroup_subsys_state *)ERR_PTR(-EINVAL)) ··· 209 631 static inline bool blk_cgroup_congested(void) { return false; } 210 632 211 633 #ifdef CONFIG_BLOCK 212 - 213 634 static inline void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay) { } 214 - 215 - static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; } 216 - static inline struct blkcg_gq *blk_queue_root_blkg(struct request_queue *q) 217 - { return NULL; } 218 - static inline int blkcg_init_queue(struct request_queue *q) { return 0; } 219 - static inline void blkcg_exit_queue(struct request_queue *q) { } 220 - static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; } 221 - static inline void blkcg_policy_unregister(struct blkcg_policy *pol) { } 222 - static inline int blkcg_activate_policy(struct request_queue *q, 223 - const struct blkcg_policy *pol) { return 0; } 224 - static inline void blkcg_deactivate_policy(struct request_queue *q, 225 - const struct blkcg_policy *pol) { } 226 - 227 - static inline struct blkcg *__bio_blkcg(struct bio *bio) { return NULL; } 228 635 static inline struct blkcg *bio_blkcg(struct bio *bio) { return NULL; } 636 + #endif /* CONFIG_BLOCK */ 229 637 230 - static inline struct blkg_policy_data *blkg_to_pd(struct 
blkcg_gq *blkg, 231 - struct blkcg_policy *pol) { return NULL; } 232 - static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd) { return NULL; } 233 - static inline char *blkg_path(struct blkcg_gq *blkg) { return NULL; } 234 - static inline void blkg_get(struct blkcg_gq *blkg) { } 235 - static inline void blkg_put(struct blkcg_gq *blkg) { } 236 - 237 - static inline bool blkcg_punt_bio_submit(struct bio *bio) { return false; } 238 - static inline void blkcg_bio_issue_init(struct bio *bio) { } 239 - static inline void blk_cgroup_bio_start(struct bio *bio) { } 240 - 241 - #define blk_queue_for_each_rl(rl, q) \ 242 - for ((rl) = &(q)->root_rl; (rl); (rl) = NULL) 243 - 244 - #endif /* CONFIG_BLOCK */ 245 638 #endif /* CONFIG_BLK_CGROUP */ 246 639 247 640 #ifdef CONFIG_BLK_CGROUP_FC_APPID
+1 -2
include/linux/blk-mq.h
···
952 952 		struct bio_set *bs, gfp_t gfp_mask,
953 953 		int (*bio_ctr)(struct bio *, struct bio *, void *), void *data);
954 954 	void blk_rq_unprep_clone(struct request *rq);
955     - 	blk_status_t blk_insert_cloned_request(struct request_queue *q,
956     - 			struct request *rq);
    955 + 	blk_status_t blk_insert_cloned_request(struct request *rq);
957 956 
958 957 	struct rq_map_data {
959 958 		struct page **pages;
+7
include/linux/blk_types.h
···
153 153  */
154 154 #define BLK_STS_ZONE_ACTIVE_RESOURCE	((__force blk_status_t)16)
155 155 
156     + /*
157     +  * BLK_STS_OFFLINE is returned from the driver when the target device is offline
158     +  * or is being taken offline. This could help differentiate the case where a
159     +  * device is intentionally being shut down from a real I/O error.
160     +  */
161     + #define BLK_STS_OFFLINE	((__force blk_status_t)17)
162     + 
156 163 /**
157 164  * blk_path_error - returns true if error may be path related
158 165  * @error: status the request was completed with
+278 -16
include/linux/blkdev.h
··· 1 1 /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * Portions Copyright (C) 1992 Drew Eckhardt 4 + */ 2 5 #ifndef _LINUX_BLKDEV_H 3 6 #define _LINUX_BLKDEV_H 4 7 5 - #include <linux/sched.h> 6 - #include <linux/genhd.h> 8 + #include <linux/types.h> 9 + #include <linux/blk_types.h> 10 + #include <linux/device.h> 7 11 #include <linux/list.h> 8 12 #include <linux/llist.h> 9 13 #include <linux/minmax.h> ··· 16 12 #include <linux/wait.h> 17 13 #include <linux/bio.h> 18 14 #include <linux/gfp.h> 15 + #include <linux/kdev_t.h> 19 16 #include <linux/rcupdate.h> 20 17 #include <linux/percpu-refcount.h> 21 18 #include <linux/blkzoned.h> 19 + #include <linux/sched.h> 22 20 #include <linux/sbitmap.h> 23 21 #include <linux/srcu.h> 22 + #include <linux/uuid.h> 23 + #include <linux/xarray.h> 24 24 25 25 struct module; 26 26 struct request_queue; ··· 41 33 struct blk_stat_callback; 42 34 struct blk_crypto_profile; 43 35 36 + extern const struct device_type disk_type; 37 + extern struct device_type part_type; 38 + extern struct class block_class; 39 + 44 40 /* Must be consistent with blk_mq_poll_stats_bkt() */ 45 41 #define BLK_MQ_POLL_STATS_BKTS 16 46 42 ··· 56 44 * Defined here to simplify include dependency. 57 45 */ 58 46 #define BLKCG_MAX_POLS 6 47 + 48 + #define DISK_MAX_PARTS 256 49 + #define DISK_NAME_LEN 32 50 + 51 + #define PARTITION_META_INFO_VOLNAMELTH 64 52 + /* 53 + * Enough for the string representation of any kind of UUID plus NULL. 54 + * EFI UUID is 36 characters. MSDOS UUID is 11 characters. 55 + */ 56 + #define PARTITION_META_INFO_UUIDLTH (UUID_STRING_LEN + 1) 57 + 58 + struct partition_meta_info { 59 + char uuid[PARTITION_META_INFO_UUIDLTH]; 60 + u8 volname[PARTITION_META_INFO_VOLNAMELTH]; 61 + }; 62 + 63 + /** 64 + * DOC: genhd capability flags 65 + * 66 + * ``GENHD_FL_REMOVABLE``: indicates that the block device gives access to 67 + * removable media. When set, the device remains present even when media is not 68 + * inserted. 
Shall not be set for devices which are removed entirely when the 69 + * media is removed. 70 + * 71 + * ``GENHD_FL_HIDDEN``: the block device is hidden; it doesn't produce events, 72 + * doesn't appear in sysfs, and can't be opened from userspace or using 73 + * blkdev_get*. Used for the underlying components of multipath devices. 74 + * 75 + * ``GENHD_FL_NO_PART``: partition support is disabled. The kernel will not 76 + * scan for partitions from add_disk, and users can't add partitions manually. 77 + * 78 + */ 79 + enum { 80 + GENHD_FL_REMOVABLE = 1 << 0, 81 + GENHD_FL_HIDDEN = 1 << 1, 82 + GENHD_FL_NO_PART = 1 << 2, 83 + }; 84 + 85 + enum { 86 + DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */ 87 + DISK_EVENT_EJECT_REQUEST = 1 << 1, /* eject requested */ 88 + }; 89 + 90 + enum { 91 + /* Poll even if events_poll_msecs is unset */ 92 + DISK_EVENT_FLAG_POLL = 1 << 0, 93 + /* Forward events to udev */ 94 + DISK_EVENT_FLAG_UEVENT = 1 << 1, 95 + /* Block event polling when open for exclusive write */ 96 + DISK_EVENT_FLAG_BLOCK_ON_EXCL_WRITE = 1 << 2, 97 + }; 98 + 99 + struct disk_events; 100 + struct badblocks; 101 + 102 + struct blk_integrity { 103 + const struct blk_integrity_profile *profile; 104 + unsigned char flags; 105 + unsigned char tuple_size; 106 + unsigned char interval_exp; 107 + unsigned char tag_size; 108 + }; 109 + 110 + struct gendisk { 111 + /* 112 + * major/first_minor/minors should not be set by any new driver, the 113 + * block core will take care of allocating them automatically. 
114 + */ 115 + int major; 116 + int first_minor; 117 + int minors; 118 + 119 + char disk_name[DISK_NAME_LEN]; /* name of major driver */ 120 + 121 + unsigned short events; /* supported events */ 122 + unsigned short event_flags; /* flags related to event processing */ 123 + 124 + struct xarray part_tbl; 125 + struct block_device *part0; 126 + 127 + const struct block_device_operations *fops; 128 + struct request_queue *queue; 129 + void *private_data; 130 + 131 + int flags; 132 + unsigned long state; 133 + #define GD_NEED_PART_SCAN 0 134 + #define GD_READ_ONLY 1 135 + #define GD_DEAD 2 136 + #define GD_NATIVE_CAPACITY 3 137 + #define GD_ADDED 4 138 + 139 + struct mutex open_mutex; /* open/close mutex */ 140 + unsigned open_partitions; /* number of open partitions */ 141 + 142 + struct backing_dev_info *bdi; 143 + struct kobject *slave_dir; 144 + #ifdef CONFIG_BLOCK_HOLDER_DEPRECATED 145 + struct list_head slave_bdevs; 146 + #endif 147 + struct timer_rand_state *random; 148 + atomic_t sync_io; /* RAID */ 149 + struct disk_events *ev; 150 + #ifdef CONFIG_BLK_DEV_INTEGRITY 151 + struct kobject integrity_kobj; 152 + #endif /* CONFIG_BLK_DEV_INTEGRITY */ 153 + #if IS_ENABLED(CONFIG_CDROM) 154 + struct cdrom_device_info *cdi; 155 + #endif 156 + int node_id; 157 + struct badblocks *bb; 158 + struct lockdep_map lockdep_map; 159 + u64 diskseq; 160 + }; 161 + 162 + static inline bool disk_live(struct gendisk *disk) 163 + { 164 + return !inode_unhashed(disk->part0->bd_inode); 165 + } 166 + 167 + /* 168 + * The gendisk is refcounted by the part0 block_device, and the bd_device 169 + * therein is also used for device model presentation in sysfs. 
170 + */ 171 + #define dev_to_disk(device) \ 172 + (dev_to_bdev(device)->bd_disk) 173 + #define disk_to_dev(disk) \ 174 + (&((disk)->part0->bd_device)) 175 + 176 + #if IS_REACHABLE(CONFIG_CDROM) 177 + #define disk_to_cdi(disk) ((disk)->cdi) 178 + #else 179 + #define disk_to_cdi(disk) NULL 180 + #endif 181 + 182 + static inline dev_t disk_devt(struct gendisk *disk) 183 + { 184 + return MKDEV(disk->major, disk->first_minor); 185 + } 59 186 60 187 static inline int blk_validate_block_size(unsigned long bsize) 61 188 { ··· 413 262 414 263 #ifdef CONFIG_BLK_INLINE_ENCRYPTION 415 264 struct blk_crypto_profile *crypto_profile; 265 + struct kobject *crypto_kobject; 416 266 #endif 417 267 418 268 unsigned int rq_timeout; ··· 748 596 #define for_each_bio(_bio) \ 749 597 for (; _bio; _bio = _bio->bi_next) 750 598 599 + int __must_check device_add_disk(struct device *parent, struct gendisk *disk, 600 + const struct attribute_group **groups); 601 + static inline int __must_check add_disk(struct gendisk *disk) 602 + { 603 + return device_add_disk(NULL, disk, NULL); 604 + } 605 + void del_gendisk(struct gendisk *gp); 606 + void invalidate_disk(struct gendisk *disk); 607 + void set_disk_ro(struct gendisk *disk, bool read_only); 608 + void disk_uevent(struct gendisk *disk, enum kobject_action action); 609 + 610 + static inline int get_disk_ro(struct gendisk *disk) 611 + { 612 + return disk->part0->bd_read_only || 613 + test_bit(GD_READ_ONLY, &disk->state); 614 + } 615 + 616 + static inline int bdev_read_only(struct block_device *bdev) 617 + { 618 + return bdev->bd_read_only || get_disk_ro(bdev->bd_disk); 619 + } 620 + 621 + bool set_capacity_and_notify(struct gendisk *disk, sector_t size); 622 + bool disk_force_media_change(struct gendisk *disk, unsigned int events); 623 + 624 + void add_disk_randomness(struct gendisk *disk) __latent_entropy; 625 + void rand_initialize_disk(struct gendisk *disk); 626 + 627 + static inline sector_t get_start_sect(struct block_device *bdev) 628 + { 
629 + return bdev->bd_start_sect; 630 + } 631 + 632 + static inline sector_t bdev_nr_sectors(struct block_device *bdev) 633 + { 634 + return bdev->bd_nr_sectors; 635 + } 636 + 637 + static inline loff_t bdev_nr_bytes(struct block_device *bdev) 638 + { 639 + return (loff_t)bdev_nr_sectors(bdev) << SECTOR_SHIFT; 640 + } 641 + 642 + static inline sector_t get_capacity(struct gendisk *disk) 643 + { 644 + return bdev_nr_sectors(disk->part0); 645 + } 646 + 647 + static inline u64 sb_bdev_nr_blocks(struct super_block *sb) 648 + { 649 + return bdev_nr_sectors(sb->s_bdev) >> 650 + (sb->s_blocksize_bits - SECTOR_SHIFT); 651 + } 652 + 653 + int bdev_disk_changed(struct gendisk *disk, bool invalidate); 654 + 655 + struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id, 656 + struct lock_class_key *lkclass); 657 + void put_disk(struct gendisk *disk); 658 + struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass); 659 + 660 + /** 661 + * blk_alloc_disk - allocate a gendisk structure 662 + * @node_id: numa node to allocate on 663 + * 664 + * Allocate and pre-initialize a gendisk structure for use with BIO based 665 + * drivers. 
666 + * 667 + * Context: can sleep 668 + */ 669 + #define blk_alloc_disk(node_id) \ 670 + ({ \ 671 + static struct lock_class_key __key; \ 672 + \ 673 + __blk_alloc_disk(node_id, &__key); \ 674 + }) 675 + void blk_cleanup_disk(struct gendisk *disk); 676 + 677 + int __register_blkdev(unsigned int major, const char *name, 678 + void (*probe)(dev_t devt)); 679 + #define register_blkdev(major, name) \ 680 + __register_blkdev(major, name, NULL) 681 + void unregister_blkdev(unsigned int major, const char *name); 682 + 683 + bool bdev_check_media_change(struct block_device *bdev); 684 + int __invalidate_device(struct block_device *bdev, bool kill_dirty); 685 + void set_capacity(struct gendisk *disk, sector_t size); 686 + 687 + #ifdef CONFIG_BLOCK_HOLDER_DEPRECATED 688 + int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk); 689 + void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk); 690 + int bd_register_pending_holders(struct gendisk *disk); 691 + #else 692 + static inline int bd_link_disk_holder(struct block_device *bdev, 693 + struct gendisk *disk) 694 + { 695 + return 0; 696 + } 697 + static inline void bd_unlink_disk_holder(struct block_device *bdev, 698 + struct gendisk *disk) 699 + { 700 + } 701 + static inline int bd_register_pending_holders(struct gendisk *disk) 702 + { 703 + return 0; 704 + } 705 + #endif /* CONFIG_BLOCK_HOLDER_DEPRECATED */ 706 + 707 + dev_t part_devt(struct gendisk *disk, u8 partno); 708 + void inc_diskseq(struct gendisk *disk); 709 + dev_t blk_lookup_devt(const char *name, int partno); 710 + void blk_request_module(dev_t devt); 751 711 752 712 extern int blk_register_queue(struct gendisk *disk); 753 713 extern void blk_unregister_queue(struct gendisk *disk); ··· 1056 792 extern void blk_start_plug_nr_ios(struct blk_plug *, unsigned short); 1057 793 extern void blk_finish_plug(struct blk_plug *); 1058 794 1059 - void blk_flush_plug(struct blk_plug *plug, bool from_schedule); 1060 - 1061 - static inline 
bool blk_needs_flush_plug(struct task_struct *tsk) 795 + void __blk_flush_plug(struct blk_plug *plug, bool from_schedule); 796 + static inline void blk_flush_plug(struct blk_plug *plug, bool async) 1062 797 { 1063 - struct blk_plug *plug = tsk->plug; 1064 - 1065 - return plug && 1066 - (plug->mq_list || !list_empty(&plug->cb_list)); 798 + if (plug) 799 + __blk_flush_plug(plug, async); 1067 800 } 1068 801 1069 802 int blkdev_issue_flush(struct block_device *bdev); ··· 1084 823 1085 824 static inline void blk_flush_plug(struct blk_plug *plug, bool async) 1086 825 { 1087 - } 1088 - 1089 - static inline bool blk_needs_flush_plug(struct task_struct *tsk) 1090 - { 1091 - return false; 1092 826 } 1093 827 1094 828 static inline int blkdev_issue_flush(struct block_device *bdev) ··· 1467 1211 void (*unlock_native_capacity) (struct gendisk *); 1468 1212 int (*getgeo)(struct block_device *, struct hd_geometry *); 1469 1213 int (*set_read_only)(struct block_device *bdev, bool ro); 1214 + void (*free_disk)(struct gendisk *disk); 1470 1215 /* this callback is with swap_lock and sometimes page table lock held */ 1471 1216 void (*swap_slot_free_notify) (struct block_device *, unsigned long); 1472 1217 int (*report_zones)(struct gendisk *, sector_t sector, ··· 1524 1267 /** 1525 1268 * bio_end_io_acct - end I/O accounting for bio based drivers 1526 1269 * @bio: bio to end account for 1527 - * @start: start time returned by bio_start_io_acct() 1270 + * @start_time: start time returned by bio_start_io_acct() 1528 1271 */ 1529 1272 static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time) 1530 1273 { ··· 1569 1312 int sync_blockdev(struct block_device *bdev); 1570 1313 int sync_blockdev_nowait(struct block_device *bdev); 1571 1314 void sync_bdevs(bool wait); 1315 + void printk_all_partitions(void); 1572 1316 #else 1573 1317 static inline void invalidate_bdev(struct block_device *bdev) 1574 1318 { ··· 1585 1327 static inline void sync_bdevs(bool wait) 1586 1328 { 
1587 1329 } 1588 - #endif 1330 + static inline void printk_all_partitions(void) 1331 + { 1332 + } 1333 + #endif /* CONFIG_BLOCK */ 1334 + 1589 1335 int fsync_bdev(struct block_device *bdev); 1590 1336 1591 1337 int freeze_bdev(struct block_device *bdev);
-291
include/linux/genhd.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _LINUX_GENHD_H 3 - #define _LINUX_GENHD_H 4 - 5 - /* 6 - * genhd.h Copyright (C) 1992 Drew Eckhardt 7 - * Generic hard disk header file by 8 - * Drew Eckhardt 9 - * 10 - * <drew@colorado.edu> 11 - */ 12 - 13 - #include <linux/types.h> 14 - #include <linux/kdev_t.h> 15 - #include <linux/uuid.h> 16 - #include <linux/blk_types.h> 17 - #include <linux/device.h> 18 - #include <linux/xarray.h> 19 - 20 - extern const struct device_type disk_type; 21 - extern struct device_type part_type; 22 - extern struct class block_class; 23 - 24 - #define DISK_MAX_PARTS 256 25 - #define DISK_NAME_LEN 32 26 - 27 - #define PARTITION_META_INFO_VOLNAMELTH 64 28 - /* 29 - * Enough for the string representation of any kind of UUID plus NULL. 30 - * EFI UUID is 36 characters. MSDOS UUID is 11 characters. 31 - */ 32 - #define PARTITION_META_INFO_UUIDLTH (UUID_STRING_LEN + 1) 33 - 34 - struct partition_meta_info { 35 - char uuid[PARTITION_META_INFO_UUIDLTH]; 36 - u8 volname[PARTITION_META_INFO_VOLNAMELTH]; 37 - }; 38 - 39 - /** 40 - * DOC: genhd capability flags 41 - * 42 - * ``GENHD_FL_REMOVABLE``: indicates that the block device gives access to 43 - * removable media. When set, the device remains present even when media is not 44 - * inserted. Shall not be set for devices which are removed entirely when the 45 - * media is removed. 46 - * 47 - * ``GENHD_FL_HIDDEN``: the block device is hidden; it doesn't produce events, 48 - * doesn't appear in sysfs, and can't be opened from userspace or using 49 - * blkdev_get*. Used for the underlying components of multipath devices. 50 - * 51 - * ``GENHD_FL_NO_PART``: partition support is disabled. The kernel will not 52 - * scan for partitions from add_disk, and users can't add partitions manually. 
53 - * 54 - */ 55 - enum { 56 - GENHD_FL_REMOVABLE = 1 << 0, 57 - GENHD_FL_HIDDEN = 1 << 1, 58 - GENHD_FL_NO_PART = 1 << 2, 59 - }; 60 - 61 - enum { 62 - DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */ 63 - DISK_EVENT_EJECT_REQUEST = 1 << 1, /* eject requested */ 64 - }; 65 - 66 - enum { 67 - /* Poll even if events_poll_msecs is unset */ 68 - DISK_EVENT_FLAG_POLL = 1 << 0, 69 - /* Forward events to udev */ 70 - DISK_EVENT_FLAG_UEVENT = 1 << 1, 71 - /* Block event polling when open for exclusive write */ 72 - DISK_EVENT_FLAG_BLOCK_ON_EXCL_WRITE = 1 << 2, 73 - }; 74 - 75 - struct disk_events; 76 - struct badblocks; 77 - 78 - struct blk_integrity { 79 - const struct blk_integrity_profile *profile; 80 - unsigned char flags; 81 - unsigned char tuple_size; 82 - unsigned char interval_exp; 83 - unsigned char tag_size; 84 - }; 85 - 86 - struct gendisk { 87 - /* 88 - * major/first_minor/minors should not be set by any new driver, the 89 - * block core will take care of allocating them automatically. 
90 - */ 91 - int major; 92 - int first_minor; 93 - int minors; 94 - 95 - char disk_name[DISK_NAME_LEN]; /* name of major driver */ 96 - 97 - unsigned short events; /* supported events */ 98 - unsigned short event_flags; /* flags related to event processing */ 99 - 100 - struct xarray part_tbl; 101 - struct block_device *part0; 102 - 103 - const struct block_device_operations *fops; 104 - struct request_queue *queue; 105 - void *private_data; 106 - 107 - int flags; 108 - unsigned long state; 109 - #define GD_NEED_PART_SCAN 0 110 - #define GD_READ_ONLY 1 111 - #define GD_DEAD 2 112 - #define GD_NATIVE_CAPACITY 3 113 - 114 - struct mutex open_mutex; /* open/close mutex */ 115 - unsigned open_partitions; /* number of open partitions */ 116 - 117 - struct backing_dev_info *bdi; 118 - struct kobject *slave_dir; 119 - #ifdef CONFIG_BLOCK_HOLDER_DEPRECATED 120 - struct list_head slave_bdevs; 121 - #endif 122 - struct timer_rand_state *random; 123 - atomic_t sync_io; /* RAID */ 124 - struct disk_events *ev; 125 - #ifdef CONFIG_BLK_DEV_INTEGRITY 126 - struct kobject integrity_kobj; 127 - #endif /* CONFIG_BLK_DEV_INTEGRITY */ 128 - #if IS_ENABLED(CONFIG_CDROM) 129 - struct cdrom_device_info *cdi; 130 - #endif 131 - int node_id; 132 - struct badblocks *bb; 133 - struct lockdep_map lockdep_map; 134 - u64 diskseq; 135 - }; 136 - 137 - static inline bool disk_live(struct gendisk *disk) 138 - { 139 - return !inode_unhashed(disk->part0->bd_inode); 140 - } 141 - 142 - /* 143 - * The gendisk is refcounted by the part0 block_device, and the bd_device 144 - * therein is also used for device model presentation in sysfs. 
145 - */
146 - #define dev_to_disk(device) \
147 - (dev_to_bdev(device)->bd_disk)
148 - #define disk_to_dev(disk) \
149 - (&((disk)->part0->bd_device))
150 -
151 - #if IS_REACHABLE(CONFIG_CDROM)
152 - #define disk_to_cdi(disk) ((disk)->cdi)
153 - #else
154 - #define disk_to_cdi(disk) NULL
155 - #endif
156 -
157 - static inline dev_t disk_devt(struct gendisk *disk)
158 - {
159 - return MKDEV(disk->major, disk->first_minor);
160 - }
161 -
162 - void disk_uevent(struct gendisk *disk, enum kobject_action action);
163 -
164 - /* block/genhd.c */
165 - int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
166 - const struct attribute_group **groups);
167 - static inline int __must_check add_disk(struct gendisk *disk)
168 - {
169 - return device_add_disk(NULL, disk, NULL);
170 - }
171 - extern void del_gendisk(struct gendisk *gp);
172 -
173 - void invalidate_disk(struct gendisk *disk);
174 -
175 - void set_disk_ro(struct gendisk *disk, bool read_only);
176 -
177 - static inline int get_disk_ro(struct gendisk *disk)
178 - {
179 - return disk->part0->bd_read_only ||
180 - test_bit(GD_READ_ONLY, &disk->state);
181 - }
182 -
183 - static inline int bdev_read_only(struct block_device *bdev)
184 - {
185 - return bdev->bd_read_only || get_disk_ro(bdev->bd_disk);
186 - }
187 -
188 - extern void disk_block_events(struct gendisk *disk);
189 - extern void disk_unblock_events(struct gendisk *disk);
190 - extern void disk_flush_events(struct gendisk *disk, unsigned int mask);
191 - bool set_capacity_and_notify(struct gendisk *disk, sector_t size);
192 - bool disk_force_media_change(struct gendisk *disk, unsigned int events);
193 -
194 - /* drivers/char/random.c */
195 - extern void add_disk_randomness(struct gendisk *disk) __latent_entropy;
196 - extern void rand_initialize_disk(struct gendisk *disk);
197 -
198 - static inline sector_t get_start_sect(struct block_device *bdev)
199 - {
200 - return bdev->bd_start_sect;
201 - }
202 -
203 - static inline sector_t bdev_nr_sectors(struct block_device *bdev)
204 - {
205 - return bdev->bd_nr_sectors;
206 - }
207 -
208 - static inline loff_t bdev_nr_bytes(struct block_device *bdev)
209 - {
210 - return (loff_t)bdev_nr_sectors(bdev) << SECTOR_SHIFT;
211 - }
212 -
213 - static inline sector_t get_capacity(struct gendisk *disk)
214 - {
215 - return bdev_nr_sectors(disk->part0);
216 - }
217 -
218 - static inline u64 sb_bdev_nr_blocks(struct super_block *sb)
219 - {
220 - return bdev_nr_sectors(sb->s_bdev) >>
221 - (sb->s_blocksize_bits - SECTOR_SHIFT);
222 - }
223 -
224 - int bdev_disk_changed(struct gendisk *disk, bool invalidate);
225 - void blk_drop_partitions(struct gendisk *disk);
226 -
227 - struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
228 - struct lock_class_key *lkclass);
229 - extern void put_disk(struct gendisk *disk);
230 - struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass);
231 -
232 - /**
233 - * blk_alloc_disk - allocate a gendisk structure
234 - * @node_id: numa node to allocate on
235 - *
236 - * Allocate and pre-initialize a gendisk structure for use with BIO based
237 - * drivers.
238 - *
239 - * Context: can sleep
240 - */
241 - #define blk_alloc_disk(node_id) \
242 - ({ \
243 - static struct lock_class_key __key; \
244 - \
245 - __blk_alloc_disk(node_id, &__key); \
246 - })
247 - void blk_cleanup_disk(struct gendisk *disk);
248 -
249 - int __register_blkdev(unsigned int major, const char *name,
250 - void (*probe)(dev_t devt));
251 - #define register_blkdev(major, name) \
252 - __register_blkdev(major, name, NULL)
253 - void unregister_blkdev(unsigned int major, const char *name);
254 -
255 - bool bdev_check_media_change(struct block_device *bdev);
256 - int __invalidate_device(struct block_device *bdev, bool kill_dirty);
257 - void set_capacity(struct gendisk *disk, sector_t size);
258 -
259 - #ifdef CONFIG_BLOCK_HOLDER_DEPRECATED
260 - int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
261 - void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk);
262 - int bd_register_pending_holders(struct gendisk *disk);
263 - #else
264 - static inline int bd_link_disk_holder(struct block_device *bdev,
265 - struct gendisk *disk)
266 - {
267 - return 0;
268 - }
269 - static inline void bd_unlink_disk_holder(struct block_device *bdev,
270 - struct gendisk *disk)
271 - {
272 - }
273 - static inline int bd_register_pending_holders(struct gendisk *disk)
274 - {
275 - return 0;
276 - }
277 - #endif /* CONFIG_BLOCK_HOLDER_DEPRECATED */
278 -
279 - dev_t part_devt(struct gendisk *disk, u8 partno);
280 - void inc_diskseq(struct gendisk *disk);
281 - dev_t blk_lookup_devt(const char *name, int partno);
282 - void blk_request_module(dev_t devt);
283 - #ifdef CONFIG_BLOCK
284 - void printk_all_partitions(void);
285 - #else /* CONFIG_BLOCK */
286 - static inline void printk_all_partitions(void)
287 - {
288 - }
289 - #endif /* CONFIG_BLOCK */
290 -
291 - #endif /* _LINUX_GENHD_H */
+1 -1
include/linux/part_stat.h
···
2 2 #ifndef _LINUX_PART_STAT_H
3 3 #define _LINUX_PART_STAT_H
4 4
5 - #include <linux/genhd.h>
5 + #include <linux/blkdev.h>
6 6 #include <asm/local.h>
7 7
8 8 struct disk_stats {
+14 -37
include/linux/sbitmap.h
···
28 28 */
29 29 struct sbitmap_word {
30 30 /**
31 - * @depth: Number of bits being used in @word/@cleared
32 - */
33 - unsigned long depth;
34 -
35 - /**
36 31 * @word: word holding free bits
37 32 */
38 - unsigned long word ____cacheline_aligned_in_smp;
33 + unsigned long word;
39 34
40 35 /**
41 36 * @cleared: word holding cleared bits
···
135 140
136 141 /**
137 142 * @min_shallow_depth: The minimum shallow depth which may be passed to
138 - * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow().
143 + * sbitmap_queue_get_shallow()
139 144 */
140 145 unsigned int min_shallow_depth;
141 146 };
···
158 163 */
159 164 int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
160 165 gfp_t flags, int node, bool round_robin, bool alloc_hint);
166 +
167 + /* sbitmap internal helper */
168 + static inline unsigned int __map_depth(const struct sbitmap *sb, int index)
169 + {
170 + if (index == sb->map_nr - 1)
171 + return sb->depth - (index << sb->shift);
172 + return 1U << sb->shift;
173 + }
161 174
162 175 /**
163 176 * sbitmap_free() - Free memory used by a &struct sbitmap.
···
254 251 while (scanned < sb->depth) {
255 252 unsigned long word;
256 253 unsigned int depth = min_t(unsigned int,
257 - sb->map[index].depth - nr,
254 + __map_depth(sb, index) - nr,
258 255 sb->depth - scanned);
259 256
260 257 scanned += depth;
···
463 460 unsigned int *offset);
464 461
465 462 /**
466 - * __sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct
463 + * sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct
467 464 * sbitmap_queue, limiting the depth used from each word, with preemption
468 465 * already disabled.
469 466 * @sbq: Bitmap queue to allocate from.
···
475 472 *
476 473 * Return: Non-negative allocated bit number if successful, -1 otherwise.
477 474 */
478 - int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
479 - unsigned int shallow_depth);
475 + int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
476 + unsigned int shallow_depth);
480 477
481 478 /**
482 479 * sbitmap_queue_get() - Try to allocate a free bit from a &struct
···
494 491
495 492 *cpu = get_cpu();
496 493 nr = __sbitmap_queue_get(sbq);
497 - put_cpu();
498 - return nr;
499 - }
500 -
501 - /**
502 - * sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct
503 - * sbitmap_queue, limiting the depth used from each word.
504 - * @sbq: Bitmap queue to allocate from.
505 - * @cpu: Output parameter; will contain the CPU we ran on (e.g., to be passed to
506 - * sbitmap_queue_clear()).
507 - * @shallow_depth: The maximum number of bits to allocate from a single word.
508 - * See sbitmap_get_shallow().
509 - *
510 - * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after
511 - * initializing @sbq.
512 - *
513 - * Return: Non-negative allocated bit number if successful, -1 otherwise.
514 - */
515 - static inline int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
516 - unsigned int *cpu,
517 - unsigned int shallow_depth)
518 - {
519 - int nr;
520 -
521 - *cpu = get_cpu();
522 - nr = __sbitmap_queue_get_shallow(sbq, shallow_depth);
523 494 put_cpu();
524 495 return nr;
525 496 }
+36 -13
include/trace/events/block.h
···
100 100 __entry->nr_sector, 0)
101 101 );
102 102
103 - /**
104 - * block_rq_complete - block IO operation completed by device driver
105 - * @rq: block operations request
106 - * @error: status code
107 - * @nr_bytes: number of completed bytes
108 - *
109 - * The block_rq_complete tracepoint event indicates that some portion
110 - * of operation request has been completed by the device driver. If
111 - * the @rq->bio is %NULL, then there is absolutely no additional work to
112 - * do for the request. If @rq->bio is non-NULL then there is
113 - * additional work required to complete the request.
114 - */
115 - TRACE_EVENT(block_rq_complete,
103 + DECLARE_EVENT_CLASS(block_rq_completion,
116 104
117 105 TP_PROTO(struct request *rq, blk_status_t error, unsigned int nr_bytes),
118 106
···
130 142 __entry->rwbs, __get_str(cmd),
131 143 (unsigned long long)__entry->sector,
132 144 __entry->nr_sector, __entry->error)
145 + );
146 +
147 + /**
148 + * block_rq_complete - block IO operation completed by device driver
149 + * @rq: block operations request
150 + * @error: status code
151 + * @nr_bytes: number of completed bytes
152 + *
153 + * The block_rq_complete tracepoint event indicates that some portion
154 + * of operation request has been completed by the device driver. If
155 + * the @rq->bio is %NULL, then there is absolutely no additional work to
156 + * do for the request. If @rq->bio is non-NULL then there is
157 + * additional work required to complete the request.
158 + */
159 + DEFINE_EVENT(block_rq_completion, block_rq_complete,
160 +
161 + TP_PROTO(struct request *rq, blk_status_t error, unsigned int nr_bytes),
162 +
163 + TP_ARGS(rq, error, nr_bytes)
164 + );
165 +
166 + /**
167 + * block_rq_error - block IO operation error reported by device driver
168 + * @rq: block operations request
169 + * @error: status code
170 + * @nr_bytes: number of completed bytes
171 + *
172 + * The block_rq_error tracepoint event indicates that some portion
173 + * of operation request has failed as reported by the device driver.
174 + */
175 + DEFINE_EVENT(block_rq_completion, block_rq_error,
176 +
177 + TP_PROTO(struct request *rq, blk_status_t error, unsigned int nr_bytes),
178 +
179 + TP_ARGS(rq, error, nr_bytes)
133 180 );
134 181
135 182 DECLARE_EVENT_CLASS(block_rq,
-1
init/do_mounts.c
···
8 8 #include <linux/root_dev.h>
9 9 #include <linux/security.h>
10 10 #include <linux/delay.h>
11 - #include <linux/genhd.h>
12 11 #include <linux/mount.h>
13 12 #include <linux/device.h>
14 13 #include <linux/init.h>
+1 -1
kernel/exit.c
···
735 735 struct task_struct *tsk = current;
736 736 int group_dead;
737 737
738 - WARN_ON(blk_needs_flush_plug(tsk));
738 + WARN_ON(tsk->plug);
739 739
740 740 /*
741 741 * If do_dead is called because this processes oopsed, it's possible
-1
kernel/power/hibernate.c
···
28 28 #include <linux/gfp.h>
29 29 #include <linux/syscore_ops.h>
30 30 #include <linux/ctype.h>
31 - #include <linux/genhd.h>
32 31 #include <linux/ktime.h>
33 32 #include <linux/security.h>
34 33 #include <linux/secretmem.h>
+2 -4
kernel/power/swap.c
···
16 16 #include <linux/file.h>
17 17 #include <linux/delay.h>
18 18 #include <linux/bitops.h>
19 - #include <linux/genhd.h>
20 19 #include <linux/device.h>
21 20 #include <linux/bio.h>
22 21 #include <linux/blkdev.h>
···
276 277 struct bio *bio;
277 278 int error = 0;
278 279
279 - bio = bio_alloc(GFP_NOIO | __GFP_HIGH, 1);
280 + bio = bio_alloc(hib_resume_bdev, 1, op | op_flags,
281 + GFP_NOIO | __GFP_HIGH);
280 282 bio->bi_iter.bi_sector = page_off * (PAGE_SIZE >> 9);
281 - bio_set_dev(bio, hib_resume_bdev);
282 - bio_set_op_attrs(bio, op, op_flags);
283 283
284 284 if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
285 285 pr_err("Adding page to bio failed at %llu\n",
+2 -5
kernel/sched/core.c
···
6353 6353 * If we are going to sleep and we have plugged IO queued,
6354 6354 * make sure to submit it to avoid deadlocks.
6355 6355 */
6356 - if (blk_needs_flush_plug(tsk))
6357 - blk_flush_plug(tsk->plug, true);
6356 + blk_flush_plug(tsk->plug, true);
6358 6357 }
6359 6358
6360 6359 static void sched_update_worker(struct task_struct *tsk)
···
8379 8380 int old_iowait = current->in_iowait;
8380 8381
8381 8382 current->in_iowait = 1;
8382 - if (current->plug)
8383 - blk_flush_plug(current->plug, true);
8384 -
8383 + blk_flush_plug(current->plug, true);
8385 8384 return old_iowait;
8386 8385 }
8387 8386
+17 -23
lib/sbitmap.c
···
85 85 bool alloc_hint)
86 86 {
87 87 unsigned int bits_per_word;
88 - unsigned int i;
89 88
90 89 if (shift < 0)
91 90 shift = sbitmap_calculate_shift(depth);
···
116 117 return -ENOMEM;
117 118 }
118 119
119 - for (i = 0; i < sb->map_nr; i++) {
120 - sb->map[i].depth = min(depth, bits_per_word);
121 - depth -= sb->map[i].depth;
122 - }
123 120 return 0;
124 121 }
125 122 EXPORT_SYMBOL_GPL(sbitmap_init_node);
···
130 135
131 136 sb->depth = depth;
132 137 sb->map_nr = DIV_ROUND_UP(sb->depth, bits_per_word);
133 -
134 - for (i = 0; i < sb->map_nr; i++) {
135 - sb->map[i].depth = min(depth, bits_per_word);
136 - depth -= sb->map[i].depth;
137 - }
138 138 }
139 139 EXPORT_SYMBOL_GPL(sbitmap_resize);
140 140
···
174 184 int nr;
175 185
176 186 do {
177 - nr = __sbitmap_get_word(&map->word, map->depth, alloc_hint,
178 - !sb->round_robin);
187 + nr = __sbitmap_get_word(&map->word, __map_depth(sb, index),
188 + alloc_hint, !sb->round_robin);
179 189 if (nr != -1)
180 190 break;
181 191 if (!sbitmap_deferred_clear(map))
···
247 257 for (i = 0; i < sb->map_nr; i++) {
248 258 again:
249 259 nr = __sbitmap_get_word(&sb->map[index].word,
250 - min(sb->map[index].depth, shallow_depth),
260 + min_t(unsigned int,
261 + __map_depth(sb, index),
262 + shallow_depth),
251 263 SB_NR_TO_BIT(sb, alloc_hint), true);
252 264 if (nr != -1) {
253 265 nr += index << sb->shift;
···
307 315
308 316 for (i = 0; i < sb->map_nr; i++) {
309 317 const struct sbitmap_word *word = &sb->map[i];
318 + unsigned int word_depth = __map_depth(sb, i);
310 319
311 320 if (set)
312 - weight += bitmap_weight(&word->word, word->depth);
321 + weight += bitmap_weight(&word->word, word_depth);
313 322 else
314 - weight += bitmap_weight(&word->cleared, word->depth);
323 + weight += bitmap_weight(&word->cleared, word_depth);
315 324 }
316 325 return weight;
317 326 }
···
360 367 for (i = 0; i < sb->map_nr; i++) {
361 368 unsigned long word = READ_ONCE(sb->map[i].word);
362 369 unsigned long cleared = READ_ONCE(sb->map[i].cleared);
363 - unsigned int word_bits = READ_ONCE(sb->map[i].depth);
370 + unsigned int word_bits = __map_depth(sb, i);
364 371
365 372 word &= ~cleared;
366 373
···
524 531 for (i = 0; i < sb->map_nr; i++) {
525 532 struct sbitmap_word *map = &sb->map[index];
526 533 unsigned long get_mask;
534 + unsigned int map_depth = __map_depth(sb, index);
527 535
528 536 sbitmap_deferred_clear(map);
529 - if (map->word == (1UL << (map->depth - 1)) - 1)
537 + if (map->word == (1UL << (map_depth - 1)) - 1)
530 538 continue;
531 539
532 - nr = find_first_zero_bit(&map->word, map->depth);
540 + nr = find_first_zero_bit(&map->word, map_depth);
533 - if (nr + nr_tags <= map->depth) {
541 + if (nr + nr_tags <= map_depth) {
534 542 atomic_long_t *ptr = (atomic_long_t *) &map->word;
535 - int map_tags = min_t(int, nr_tags, map->depth);
543 + int map_tags = min_t(int, nr_tags, map_depth);
536 544 unsigned long val, ret;
537 545
538 546 get_mask = ((1UL << map_tags) - 1) << nr;
···
557 563 return 0;
558 564 }
559 565
560 - int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
561 - unsigned int shallow_depth)
566 + int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
567 + unsigned int shallow_depth)
562 568 {
563 569 WARN_ON_ONCE(shallow_depth < sbq->min_shallow_depth);
564 570
565 571 return sbitmap_get_shallow(&sbq->sb, shallow_depth);
566 572 }
567 - EXPORT_SYMBOL_GPL(__sbitmap_queue_get_shallow);
573 + EXPORT_SYMBOL_GPL(sbitmap_queue_get_shallow);
568 574
569 575 void sbitmap_queue_min_shallow_depth(struct sbitmap_queue *sbq,
570 576 unsigned int min_shallow_depth)
+4 -6
mm/page_io.c
···
338 338 return 0;
339 339 }
340 340
341 - bio = bio_alloc(GFP_NOIO, 1);
342 - bio_set_dev(bio, sis->bdev);
341 + bio = bio_alloc(sis->bdev, 1,
342 + REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc),
343 + GFP_NOIO);
343 344 bio->bi_iter.bi_sector = swap_page_sector(page);
344 - bio->bi_opf = REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc);
345 345 bio->bi_end_io = end_write_func;
346 346 bio_add_page(bio, page, thp_size(page), 0);
···
403 403 }
404 404
405 405 ret = 0;
406 - bio = bio_alloc(GFP_KERNEL, 1);
407 - bio_set_dev(bio, sis->bdev);
408 - bio->bi_opf = REQ_OP_READ;
406 + bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
407 407 bio->bi_iter.bi_sector = swap_page_sector(page);
408 408 bio->bi_end_io = end_swap_bio_read;
409 409 bio_add_page(bio, page, thp_size(page), 0);
-1
security/integrity/ima/ima_policy.c
···
16 16 #include <linux/parser.h>
17 17 #include <linux/slab.h>
18 18 #include <linux/rculist.h>
19 - #include <linux/genhd.h>
20 19 #include <linux/seq_file.h>
21 20 #include <linux/ima.h>
22 21