
dm-pcache: add persistent cache target in device-mapper

This patch introduces dm-pcache, a new DM target that places a DAX-
capable persistent-memory device in front of any slower block device and
uses it as a high-throughput, low-latency cache.

Design highlights
-----------------
- DAX data path – data is copied directly between DRAM and the pmem
mapping, bypassing the block layer’s overhead.

- Segmented, crash-consistent layout
  - all layout metadata is dual-replicated and CRC-protected.
  - kset flushes are atomic; key replay on attach guarantees cache
    integrity even after power loss.

- Striped multi-tree index
  - index trees are sharded by logical address for high parallelism.
  - overlap-resolution logic keeps cached extents non-intersecting.

- Background services
  - a write-back worker flushes dirty keys in order, preserving
    backing-device crash consistency; this matters for checkpoint
    workloads in cloud storage.
  - a garbage collector reclaims clean segments when utilisation exceeds
    a tunable threshold.

- Data integrity – optional CRC32 on the cached payload; metadata is always CRC-protected.
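
As a rough userspace illustration of the dual-replicated, CRC-protected
metadata scheme above (structure names, the toy checksum, and the
slot-rotation details are simplified stand-ins, not the kernel code):

```c
#include <stdint.h>
#include <string.h>

/* Two copies per metadata structure, as in PCACHE_META_INDEX_MAX == 2. */
#define META_INDEX_MAX 2

struct meta_header {
	uint32_t crc;	/* checksum over seq + payload */
	uint64_t seq;	/* bumped on every update */
};

struct meta {
	struct meta_header header;
	uint64_t payload;
};

/* toy checksum standing in for the kernel's CRC32 */
static uint32_t meta_crc(const struct meta *m)
{
	return (uint32_t)(m->header.seq * 2654435761u ^ m->payload);
}

/* return the newest valid copy, or NULL if both replicas are corrupt */
static const struct meta *meta_find_latest(const struct meta copies[META_INDEX_MAX])
{
	const struct meta *latest = NULL;
	int i;

	for (i = 0; i < META_INDEX_MAX; i++) {
		const struct meta *m = &copies[i];

		if (meta_crc(m) != m->header.crc)
			continue;	/* torn or corrupt replica */
		if (!latest || m->header.seq > latest->header.seq)
			latest = m;
	}
	return latest;
}

/* writers alternate slots, so one intact replica always survives a crash */
static void meta_write(struct meta copies[META_INDEX_MAX], int *index,
		       uint64_t payload)
{
	const struct meta *latest = meta_find_latest(copies);
	struct meta m = { .payload = payload };

	m.header.seq = latest ? latest->header.seq + 1 : 1;
	m.header.crc = meta_crc(&m);
	memcpy(&copies[*index], &m, sizeof(m));	/* memcpy_flushcache() in-kernel */
	*index = (*index + 1) % META_INDEX_MAX;
}
```

If the most recent replica is torn mid-write, its CRC no longer matches
and the reader falls back to the older, still-consistent copy.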

Comparison with existing block-level caches
---------------------------------------------------------------------------------------------------------------------------------
| Feature | pcache (this patch) | bcache | dm-writecache |
|----------------------------------|---------------------------------|------------------------------|---------------------------|
| pmem access method | DAX | bio (block I/O) | DAX |
| Write latency (4 K rand-write) | ~5 µs | ~20 µs | ~5 µs |
| Concurrency                      | multi-subtree index             | global index tree            | single tree + wc_lock     |
| IOPS (4K randwrite, 32 numjobs) | 2.1 M | 352 K | 283 K |
| Read-cache support | YES | YES | NO |
| Deployment | no re-format of backend | backend devices must be | no re-format of backend |
| | | reformatted | |
| Write-back ordering | log-structured; | no ordering guarantee | no ordering guarantee |
| | preserves app-IO-order | | |
| Data integrity checks | metadata + data CRC(optional) | metadata CRC only | none |
---------------------------------------------------------------------------------------------------------------------------------
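
The gc_percent tunable and the cache_flags bits documented in the new
dm-pcache.rst can be sketched in C as follows (macro and helper names
are illustrative, not the kernel's):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * cache_flags layout per dm-pcache.rst:
 *   bit 0    - DATA_CRC enabled
 *   bit 1    - INIT_DONE (cache initialised)
 *   bits 2-5 - cache mode (0 == write-back)
 */
#define CACHE_FLAGS_DATA_CRC	(1u << 0)
#define CACHE_FLAGS_INIT_DONE	(1u << 1)
#define CACHE_FLAGS_MODE_MASK	(0xfu << 2)

static unsigned int cache_mode(uint32_t cache_flags)
{
	return (cache_flags & CACHE_FLAGS_MODE_MASK) >> 2;
}

/* GC starts once used segments reach gc_percent of the cache segments */
static bool gc_should_run(uint32_t segs_used, uint32_t cache_segs,
			  uint8_t gc_percent)
{
	return segs_used >= cache_segs * gc_percent / 100;
}
```

For example, a status line reporting cache_flags of 0x3 describes a
write-back cache with data CRC enabled and initialisation complete.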

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Authored by Dongsheng Yang, committed by Mikulas Patocka
1d57628f 499cbe0f

+5450
+202
Documentation/admin-guide/device-mapper/dm-pcache.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ================================= 4 + dm-pcache — Persistent Cache 5 + ================================= 6 + 7 + *Author: Dongsheng Yang <dongsheng.yang@linux.dev>* 8 + 9 + This document describes *dm-pcache*, a Device-Mapper target that lets a 10 + byte-addressable *DAX* (persistent-memory, “pmem”) region act as a 11 + high-performance, crash-persistent cache in front of a slower block 12 + device. The code lives in `drivers/md/dm-pcache/`. 13 + 14 + Quick feature summary 15 + ===================== 16 + 17 + * *Write-back* caching (only mode currently supported). 18 + * *16 MiB segments* allocated on the pmem device. 19 + * *Data CRC32* verification (optional, per cache). 20 + * Crash-safe: every metadata structure is duplicated (`PCACHE_META_INDEX_MAX 21 + == 2`) and protected with CRC+sequence numbers. 22 + * *Multi-tree indexing* (indexing trees sharded by logical address) for high PMem parallelism 23 + * Pure *DAX path* I/O – no extra BIO round-trips 24 + * *Log-structured write-back* that preserves backend crash-consistency 25 + 26 + 27 + Constructor 28 + =========== 29 + 30 + :: 31 + 32 + pcache <cache_dev> <backing_dev> [<number_of_optional_arguments> <cache_mode writeback> <data_crc true|false>] 33 + 34 + ========================= ==================================================== 35 + ``cache_dev`` Any DAX-capable block device (``/dev/pmem0``…). 36 + All metadata *and* cached blocks are stored here. 37 + 38 + ``backing_dev`` The slow block device to be cached. 39 + 40 + ``cache_mode`` Optional, Only ``writeback`` is accepted at the 41 + moment. 42 + 43 + ``data_crc`` Optional, default to ``false`` 44 + 45 + * ``true`` – store CRC32 for every cached entry 46 + and verify on reads 47 + * ``false`` – skip CRC (faster) 48 + ========================= ==================================================== 49 + 50 + Example 51 + ------- 52 + 53 + .. 
code-block:: shell 54 + 55 + dmsetup create pcache_sdb --table \ 56 + "0 $(blockdev --getsz /dev/sdb) pcache /dev/pmem0 /dev/sdb 4 cache_mode writeback data_crc true" 57 + 58 + The first time a pmem device is used, dm-pcache formats it automatically 59 + (super-block, cache_info, etc.). 60 + 61 + 62 + Status line 63 + =========== 64 + 65 + ``dmsetup status <device>`` (``STATUSTYPE_INFO``) prints: 66 + 67 + :: 68 + 69 + <sb_flags> <seg_total> <cache_segs> <segs_used> \ 70 + <gc_percent> <cache_flags> \ 71 + <key_head_seg>:<key_head_off> \ 72 + <dirty_tail_seg>:<dirty_tail_off> \ 73 + <key_tail_seg>:<key_tail_off> 74 + 75 + Field meanings 76 + -------------- 77 + 78 + =============================== ============================================= 79 + ``sb_flags`` Super-block flags (e.g. endian marker). 80 + 81 + ``seg_total`` Number of physical *pmem* segments. 82 + 83 + ``cache_segs`` Number of segments used for cache. 84 + 85 + ``segs_used`` Segments currently allocated (bitmap weight). 86 + 87 + ``gc_percent`` Current GC high-water mark (0-90). 88 + 89 + ``cache_flags`` Bit 0 – DATA_CRC enabled 90 + Bit 1 – INIT_DONE (cache initialised) 91 + Bits 2-5 – cache mode (0 == WB). 92 + 93 + ``key_head`` Where new key-sets are being written. 94 + 95 + ``dirty_tail`` First dirty key-set that still needs 96 + write-back to the backing device. 97 + 98 + ``key_tail`` First key-set that may be reclaimed by GC. 99 + =============================== ============================================= 100 + 101 + 102 + Messages 103 + ======== 104 + 105 + *Change GC trigger* 106 + 107 + :: 108 + 109 + dmsetup message <dev> 0 gc_percent <0-90> 110 + 111 + 112 + Theory of operation 113 + =================== 114 + 115 + Sub-devices 116 + ----------- 117 + 118 + ==================== ========================================================= 119 + backing_dev Any block device (SSD/HDD/loop/LVM, etc.). 120 + cache_dev DAX device; must expose direct-access memory. 
121 + ==================== ========================================================= 122 + 123 + Segments and key-sets 124 + --------------------- 125 + 126 + * The pmem space is divided into *16 MiB segments*. 127 + * Each write allocates space from a per-CPU *data_head* inside a segment. 128 + * A *cache-key* records a logical range on the origin and where it lives 129 + inside pmem (segment + offset + generation). 130 + * 128 keys form a *key-set* (kset); ksets are written sequentially in pmem 131 + and are themselves crash-safe (CRC). 132 + * The pair *(key_tail, dirty_tail)* delimit clean/dirty and live/dead ksets. 133 + 134 + Write-back 135 + ---------- 136 + 137 + Dirty keys are queued into a tree; a background worker copies data 138 + back to the backing_dev and advances *dirty_tail*. A FLUSH/FUA bio from the 139 + upper layers forces an immediate metadata commit. 140 + 141 + Garbage collection 142 + ------------------ 143 + 144 + GC starts when ``segs_used >= seg_total * gc_percent / 100``. It walks 145 + from *key_tail*, frees segments whose every key has been invalidated, and 146 + advances *key_tail*. 147 + 148 + CRC verification 149 + ---------------- 150 + 151 + If ``data_crc is enabled`` dm-pcache computes a CRC32 over every cached data 152 + range when it is inserted and stores it in the on-media key. Reads 153 + validate the CRC before copying to the caller. 154 + 155 + 156 + Failure handling 157 + ================ 158 + 159 + * *pmem media errors* – all metadata copies are read with 160 + ``copy_mc_to_kernel``; an uncorrectable error logs and aborts initialisation. 161 + * *Cache full* – if no free segment can be found, writes return ``-EBUSY``; 162 + dm-pcache retries internally (request deferral). 163 + * *System crash* – on attach, the driver replays ksets from *key_tail* to 164 + rebuild the in-core trees; every segment’s generation guards against 165 + use-after-free keys. 
166 + 167 + 168 + Limitations & TODO 169 + ================== 170 + 171 + * Only *write-back* mode; other modes planned. 172 + * Only FIFO cache invalidate; other (LRU, ARC...) planned. 173 + * Table reload is not supported currently. 174 + * Discard planned. 175 + 176 + 177 + Example workflow 178 + ================ 179 + 180 + .. code-block:: shell 181 + 182 + # 1. Create devices 183 + dmsetup create pcache_sdb --table \ 184 + "0 $(blockdev --getsz /dev/sdb) pcache /dev/pmem0 /dev/sdb 4 cache_mode writeback data_crc true" 185 + 186 + # 2. Put a filesystem on top 187 + mkfs.ext4 /dev/mapper/pcache_sdb 188 + mount /dev/mapper/pcache_sdb /mnt 189 + 190 + # 3. Tune GC threshold to 80 % 191 + dmsetup message pcache_sdb 0 gc_percent 80 192 + 193 + # 4. Observe status 194 + watch -n1 'dmsetup status pcache_sdb' 195 + 196 + # 5. Shutdown 197 + umount /mnt 198 + dmsetup remove pcache_sdb 199 + 200 + 201 + ``dm-pcache`` is under active development; feedback, bug reports and patches 202 + are very welcome!
+1
Documentation/admin-guide/device-mapper/index.rst
··· 18 18 dm-integrity 19 19 dm-io 20 20 dm-log 21 + dm-pcache 21 22 dm-queue-length 22 23 dm-raid 23 24 dm-service-time
+8
MAINTAINERS
··· 7051 7051 F: Documentation/admin-guide/device-mapper/vdo*.rst 7052 7052 F: drivers/md/dm-vdo/ 7053 7053 7054 + DEVICE-MAPPER PCACHE TARGET 7055 + M: Dongsheng Yang <dongsheng.yang@linux.dev> 7056 + M: Zheng Gu <cengku@gmail.com> 7057 + L: dm-devel@lists.linux.dev 7058 + S: Maintained 7059 + F: Documentation/admin-guide/device-mapper/dm-pcache.rst 7060 + F: drivers/md/dm-pcache/ 7061 + 7054 7062 DEVLINK 7055 7063 M: Jiri Pirko <jiri@resnulli.us> 7056 7064 L: netdev@vger.kernel.org
+2
drivers/md/Kconfig
··· 659 659 660 660 source "drivers/md/dm-vdo/Kconfig" 661 661 662 + source "drivers/md/dm-pcache/Kconfig" 663 + 662 664 endif # MD
+1
drivers/md/Makefile
··· 71 71 obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o 72 72 obj-$(CONFIG_DM_VERITY) += dm-verity.o 73 73 obj-$(CONFIG_DM_VDO) += dm-vdo/ 74 + obj-$(CONFIG_DM_PCACHE) += dm-pcache/ 74 75 obj-$(CONFIG_DM_CACHE) += dm-cache.o 75 76 obj-$(CONFIG_DM_CACHE_SMQ) += dm-cache-smq.o 76 77 obj-$(CONFIG_DM_EBS) += dm-ebs.o
+17
drivers/md/dm-pcache/Kconfig
··· 1 + config DM_PCACHE 2 + tristate "Persistent cache for Block Device (Experimental)" 3 + depends on BLK_DEV_DM 4 + depends on DEV_DAX 5 + help 6 + PCACHE provides a mechanism to use persistent memory (e.g., CXL persistent memory, 7 + DAX-enabled devices) as a high-performance cache layer in front of 8 + traditional block devices such as SSDs or HDDs. 9 + 10 + PCACHE is implemented as a kernel module that integrates with the block 11 + layer and supports direct access (DAX) to persistent memory for low-latency, 12 + byte-addressable caching. 13 + 14 + Note: This feature is experimental and should be tested thoroughly 15 + before use in production environments. 16 + 17 + If unsure, say 'N'.
+3
drivers/md/dm-pcache/Makefile
··· 1 + dm-pcache-y := dm_pcache.o cache_dev.o segment.o backing_dev.o cache.o cache_gc.o cache_writeback.o cache_segment.o cache_key.o cache_req.o 2 + 3 + obj-m += dm-pcache.o
+374
drivers/md/dm-pcache/backing_dev.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + #include <linux/blkdev.h> 3 + 4 + #include "../dm-core.h" 5 + #include "pcache_internal.h" 6 + #include "cache_dev.h" 7 + #include "backing_dev.h" 8 + #include "cache.h" 9 + #include "dm_pcache.h" 10 + 11 + static struct kmem_cache *backing_req_cache; 12 + static struct kmem_cache *backing_bvec_cache; 13 + 14 + static void backing_dev_exit(struct pcache_backing_dev *backing_dev) 15 + { 16 + mempool_exit(&backing_dev->req_pool); 17 + mempool_exit(&backing_dev->bvec_pool); 18 + } 19 + 20 + static void req_submit_fn(struct work_struct *work); 21 + static void req_complete_fn(struct work_struct *work); 22 + static int backing_dev_init(struct dm_pcache *pcache) 23 + { 24 + struct pcache_backing_dev *backing_dev = &pcache->backing_dev; 25 + int ret; 26 + 27 + ret = mempool_init_slab_pool(&backing_dev->req_pool, 128, backing_req_cache); 28 + if (ret) 29 + goto err; 30 + 31 + ret = mempool_init_slab_pool(&backing_dev->bvec_pool, 128, backing_bvec_cache); 32 + if (ret) 33 + goto req_pool_exit; 34 + 35 + INIT_LIST_HEAD(&backing_dev->submit_list); 36 + INIT_LIST_HEAD(&backing_dev->complete_list); 37 + spin_lock_init(&backing_dev->submit_lock); 38 + spin_lock_init(&backing_dev->complete_lock); 39 + INIT_WORK(&backing_dev->req_submit_work, req_submit_fn); 40 + INIT_WORK(&backing_dev->req_complete_work, req_complete_fn); 41 + atomic_set(&backing_dev->inflight_reqs, 0); 42 + init_waitqueue_head(&backing_dev->inflight_wq); 43 + 44 + return 0; 45 + 46 + req_pool_exit: 47 + mempool_exit(&backing_dev->req_pool); 48 + err: 49 + return ret; 50 + } 51 + 52 + int backing_dev_start(struct dm_pcache *pcache) 53 + { 54 + struct pcache_backing_dev *backing_dev = &pcache->backing_dev; 55 + int ret; 56 + 57 + ret = backing_dev_init(pcache); 58 + if (ret) 59 + return ret; 60 + 61 + backing_dev->dev_size = bdev_nr_sectors(backing_dev->dm_dev->bdev); 62 + 63 + return 0; 64 + } 65 + 66 + void backing_dev_stop(struct dm_pcache *pcache) 67 + { 
68 + struct pcache_backing_dev *backing_dev = &pcache->backing_dev; 69 + 70 + /* 71 + * There should not be any new request comming, just wait 72 + * inflight requests done. 73 + */ 74 + wait_event(backing_dev->inflight_wq, 75 + atomic_read(&backing_dev->inflight_reqs) == 0); 76 + 77 + flush_work(&backing_dev->req_submit_work); 78 + flush_work(&backing_dev->req_complete_work); 79 + 80 + backing_dev_exit(backing_dev); 81 + } 82 + 83 + /* pcache_backing_dev_req functions */ 84 + void backing_dev_req_end(struct pcache_backing_dev_req *backing_req) 85 + { 86 + struct pcache_backing_dev *backing_dev = backing_req->backing_dev; 87 + 88 + if (backing_req->end_req) 89 + backing_req->end_req(backing_req, backing_req->ret); 90 + 91 + switch (backing_req->type) { 92 + case BACKING_DEV_REQ_TYPE_REQ: 93 + if (backing_req->req.upper_req) 94 + pcache_req_put(backing_req->req.upper_req, backing_req->ret); 95 + break; 96 + case BACKING_DEV_REQ_TYPE_KMEM: 97 + if (backing_req->kmem.bvecs != backing_req->kmem.inline_bvecs) 98 + mempool_free(backing_req->kmem.bvecs, &backing_dev->bvec_pool); 99 + break; 100 + default: 101 + BUG(); 102 + } 103 + 104 + mempool_free(backing_req, &backing_dev->req_pool); 105 + 106 + if (atomic_dec_and_test(&backing_dev->inflight_reqs)) 107 + wake_up(&backing_dev->inflight_wq); 108 + } 109 + 110 + static void req_complete_fn(struct work_struct *work) 111 + { 112 + struct pcache_backing_dev *backing_dev = container_of(work, struct pcache_backing_dev, req_complete_work); 113 + struct pcache_backing_dev_req *backing_req; 114 + LIST_HEAD(tmp_list); 115 + 116 + spin_lock_irq(&backing_dev->complete_lock); 117 + list_splice_init(&backing_dev->complete_list, &tmp_list); 118 + spin_unlock_irq(&backing_dev->complete_lock); 119 + 120 + while (!list_empty(&tmp_list)) { 121 + backing_req = list_first_entry(&tmp_list, 122 + struct pcache_backing_dev_req, node); 123 + list_del_init(&backing_req->node); 124 + backing_dev_req_end(backing_req); 125 + } 126 + } 127 + 128 + 
static void backing_dev_bio_end(struct bio *bio) 129 + { 130 + struct pcache_backing_dev_req *backing_req = bio->bi_private; 131 + struct pcache_backing_dev *backing_dev = backing_req->backing_dev; 132 + unsigned long flags; 133 + 134 + backing_req->ret = blk_status_to_errno(bio->bi_status); 135 + 136 + spin_lock_irqsave(&backing_dev->complete_lock, flags); 137 + list_move_tail(&backing_req->node, &backing_dev->complete_list); 138 + queue_work(BACKING_DEV_TO_PCACHE(backing_dev)->task_wq, &backing_dev->req_complete_work); 139 + spin_unlock_irqrestore(&backing_dev->complete_lock, flags); 140 + } 141 + 142 + static void req_submit_fn(struct work_struct *work) 143 + { 144 + struct pcache_backing_dev *backing_dev = container_of(work, struct pcache_backing_dev, req_submit_work); 145 + struct pcache_backing_dev_req *backing_req; 146 + LIST_HEAD(tmp_list); 147 + 148 + spin_lock(&backing_dev->submit_lock); 149 + list_splice_init(&backing_dev->submit_list, &tmp_list); 150 + spin_unlock(&backing_dev->submit_lock); 151 + 152 + while (!list_empty(&tmp_list)) { 153 + backing_req = list_first_entry(&tmp_list, 154 + struct pcache_backing_dev_req, node); 155 + list_del_init(&backing_req->node); 156 + submit_bio_noacct(&backing_req->bio); 157 + } 158 + } 159 + 160 + void backing_dev_req_submit(struct pcache_backing_dev_req *backing_req, bool direct) 161 + { 162 + struct pcache_backing_dev *backing_dev = backing_req->backing_dev; 163 + 164 + if (direct) { 165 + submit_bio_noacct(&backing_req->bio); 166 + return; 167 + } 168 + 169 + spin_lock(&backing_dev->submit_lock); 170 + list_add_tail(&backing_req->node, &backing_dev->submit_list); 171 + queue_work(BACKING_DEV_TO_PCACHE(backing_dev)->task_wq, &backing_dev->req_submit_work); 172 + spin_unlock(&backing_dev->submit_lock); 173 + } 174 + 175 + static void bio_map(struct bio *bio, void *base, size_t size) 176 + { 177 + struct page *page; 178 + unsigned int offset; 179 + unsigned int len; 180 + 181 + if (!is_vmalloc_addr(base)) { 182 + 
page = virt_to_page(base); 183 + offset = offset_in_page(base); 184 + 185 + BUG_ON(!bio_add_page(bio, page, size, offset)); 186 + return; 187 + } 188 + 189 + flush_kernel_vmap_range(base, size); 190 + while (size) { 191 + page = vmalloc_to_page(base); 192 + offset = offset_in_page(base); 193 + len = min_t(size_t, PAGE_SIZE - offset, size); 194 + 195 + BUG_ON(!bio_add_page(bio, page, len, offset)); 196 + size -= len; 197 + base += len; 198 + } 199 + } 200 + 201 + static struct pcache_backing_dev_req *req_type_req_alloc(struct pcache_backing_dev *backing_dev, 202 + struct pcache_backing_dev_req_opts *opts) 203 + { 204 + struct pcache_request *pcache_req = opts->req.upper_req; 205 + struct pcache_backing_dev_req *backing_req; 206 + struct bio *orig = pcache_req->bio; 207 + 208 + backing_req = mempool_alloc(&backing_dev->req_pool, opts->gfp_mask); 209 + if (!backing_req) 210 + return NULL; 211 + 212 + memset(backing_req, 0, sizeof(struct pcache_backing_dev_req)); 213 + 214 + bio_init_clone(backing_dev->dm_dev->bdev, &backing_req->bio, orig, opts->gfp_mask); 215 + 216 + backing_req->type = BACKING_DEV_REQ_TYPE_REQ; 217 + backing_req->backing_dev = backing_dev; 218 + atomic_inc(&backing_dev->inflight_reqs); 219 + 220 + return backing_req; 221 + } 222 + 223 + static struct pcache_backing_dev_req *kmem_type_req_alloc(struct pcache_backing_dev *backing_dev, 224 + struct pcache_backing_dev_req_opts *opts) 225 + { 226 + struct pcache_backing_dev_req *backing_req; 227 + u32 n_vecs = bio_add_max_vecs(opts->kmem.data, opts->kmem.len); 228 + 229 + backing_req = mempool_alloc(&backing_dev->req_pool, opts->gfp_mask); 230 + if (!backing_req) 231 + return NULL; 232 + 233 + memset(backing_req, 0, sizeof(struct pcache_backing_dev_req)); 234 + 235 + if (n_vecs > BACKING_DEV_REQ_INLINE_BVECS) { 236 + backing_req->kmem.bvecs = mempool_alloc(&backing_dev->bvec_pool, opts->gfp_mask); 237 + if (!backing_req->kmem.bvecs) 238 + goto free_backing_req; 239 + } else { 240 + 
backing_req->kmem.bvecs = backing_req->kmem.inline_bvecs; 241 + } 242 + 243 + backing_req->kmem.n_vecs = n_vecs; 244 + backing_req->type = BACKING_DEV_REQ_TYPE_KMEM; 245 + backing_req->backing_dev = backing_dev; 246 + atomic_inc(&backing_dev->inflight_reqs); 247 + 248 + return backing_req; 249 + 250 + free_backing_req: 251 + mempool_free(backing_req, &backing_dev->req_pool); 252 + return NULL; 253 + } 254 + 255 + struct pcache_backing_dev_req *backing_dev_req_alloc(struct pcache_backing_dev *backing_dev, 256 + struct pcache_backing_dev_req_opts *opts) 257 + { 258 + if (opts->type == BACKING_DEV_REQ_TYPE_REQ) 259 + return req_type_req_alloc(backing_dev, opts); 260 + 261 + if (opts->type == BACKING_DEV_REQ_TYPE_KMEM) 262 + return kmem_type_req_alloc(backing_dev, opts); 263 + 264 + BUG(); 265 + } 266 + 267 + static void req_type_req_init(struct pcache_backing_dev_req *backing_req, 268 + struct pcache_backing_dev_req_opts *opts) 269 + { 270 + struct pcache_request *pcache_req = opts->req.upper_req; 271 + struct bio *clone; 272 + u32 off = opts->req.req_off; 273 + u32 len = opts->req.len; 274 + 275 + clone = &backing_req->bio; 276 + BUG_ON(off & SECTOR_MASK); 277 + BUG_ON(len & SECTOR_MASK); 278 + bio_trim(clone, off >> SECTOR_SHIFT, len >> SECTOR_SHIFT); 279 + 280 + clone->bi_iter.bi_sector = (pcache_req->off + off) >> SECTOR_SHIFT; 281 + clone->bi_private = backing_req; 282 + clone->bi_end_io = backing_dev_bio_end; 283 + 284 + INIT_LIST_HEAD(&backing_req->node); 285 + backing_req->end_req = opts->end_fn; 286 + 287 + pcache_req_get(pcache_req); 288 + backing_req->req.upper_req = pcache_req; 289 + backing_req->req.bio_off = off; 290 + } 291 + 292 + static void kmem_type_req_init(struct pcache_backing_dev_req *backing_req, 293 + struct pcache_backing_dev_req_opts *opts) 294 + { 295 + struct pcache_backing_dev *backing_dev = backing_req->backing_dev; 296 + struct bio *backing_bio; 297 + 298 + bio_init(&backing_req->bio, backing_dev->dm_dev->bdev, backing_req->kmem.bvecs, 
299 + backing_req->kmem.n_vecs, opts->kmem.opf); 300 + 301 + backing_bio = &backing_req->bio; 302 + bio_map(backing_bio, opts->kmem.data, opts->kmem.len); 303 + 304 + backing_bio->bi_iter.bi_sector = (opts->kmem.backing_off) >> SECTOR_SHIFT; 305 + backing_bio->bi_private = backing_req; 306 + backing_bio->bi_end_io = backing_dev_bio_end; 307 + 308 + INIT_LIST_HEAD(&backing_req->node); 309 + backing_req->end_req = opts->end_fn; 310 + backing_req->priv_data = opts->priv_data; 311 + } 312 + 313 + void backing_dev_req_init(struct pcache_backing_dev_req *backing_req, 314 + struct pcache_backing_dev_req_opts *opts) 315 + { 316 + if (opts->type == BACKING_DEV_REQ_TYPE_REQ) 317 + return req_type_req_init(backing_req, opts); 318 + 319 + if (opts->type == BACKING_DEV_REQ_TYPE_KMEM) 320 + return kmem_type_req_init(backing_req, opts); 321 + 322 + BUG(); 323 + } 324 + 325 + struct pcache_backing_dev_req *backing_dev_req_create(struct pcache_backing_dev *backing_dev, 326 + struct pcache_backing_dev_req_opts *opts) 327 + { 328 + struct pcache_backing_dev_req *backing_req; 329 + 330 + backing_req = backing_dev_req_alloc(backing_dev, opts); 331 + if (!backing_req) 332 + return NULL; 333 + 334 + backing_dev_req_init(backing_req, opts); 335 + 336 + return backing_req; 337 + } 338 + 339 + void backing_dev_flush(struct pcache_backing_dev *backing_dev) 340 + { 341 + blkdev_issue_flush(backing_dev->dm_dev->bdev); 342 + } 343 + 344 + int pcache_backing_init(void) 345 + { 346 + u32 max_bvecs = (PCACHE_CACHE_SUBTREE_SIZE >> PAGE_SHIFT) + 1; 347 + int ret; 348 + 349 + backing_req_cache = KMEM_CACHE(pcache_backing_dev_req, 0); 350 + if (!backing_req_cache) { 351 + ret = -ENOMEM; 352 + goto err; 353 + } 354 + 355 + backing_bvec_cache = kmem_cache_create("pcache-bvec-slab", 356 + max_bvecs * sizeof(struct bio_vec), 357 + 0, 0, NULL); 358 + if (!backing_bvec_cache) { 359 + ret = -ENOMEM; 360 + goto destroy_req_cache; 361 + } 362 + 363 + return 0; 364 + destroy_req_cache: 365 + 
kmem_cache_destroy(backing_req_cache); 366 + err: 367 + return ret; 368 + } 369 + 370 + void pcache_backing_exit(void) 371 + { 372 + kmem_cache_destroy(backing_bvec_cache); 373 + kmem_cache_destroy(backing_req_cache); 374 + }
+127
drivers/md/dm-pcache/backing_dev.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 + #ifndef _BACKING_DEV_H 3 + #define _BACKING_DEV_H 4 + 5 + #include <linux/device-mapper.h> 6 + 7 + #include "pcache_internal.h" 8 + 9 + struct pcache_backing_dev_req; 10 + typedef void (*backing_req_end_fn_t)(struct pcache_backing_dev_req *backing_req, int ret); 11 + 12 + #define BACKING_DEV_REQ_TYPE_REQ 1 13 + #define BACKING_DEV_REQ_TYPE_KMEM 2 14 + 15 + #define BACKING_DEV_REQ_INLINE_BVECS 4 16 + 17 + struct pcache_request; 18 + struct pcache_backing_dev_req { 19 + u8 type; 20 + struct bio bio; 21 + struct pcache_backing_dev *backing_dev; 22 + 23 + void *priv_data; 24 + backing_req_end_fn_t end_req; 25 + 26 + struct list_head node; 27 + int ret; 28 + 29 + union { 30 + struct { 31 + struct pcache_request *upper_req; 32 + u32 bio_off; 33 + } req; 34 + struct { 35 + struct bio_vec inline_bvecs[BACKING_DEV_REQ_INLINE_BVECS]; 36 + struct bio_vec *bvecs; 37 + u32 n_vecs; 38 + } kmem; 39 + }; 40 + }; 41 + 42 + struct pcache_backing_dev { 43 + struct pcache_cache *cache; 44 + 45 + struct dm_dev *dm_dev; 46 + mempool_t req_pool; 47 + mempool_t bvec_pool; 48 + 49 + struct list_head submit_list; 50 + spinlock_t submit_lock; 51 + struct work_struct req_submit_work; 52 + 53 + struct list_head complete_list; 54 + spinlock_t complete_lock; 55 + struct work_struct req_complete_work; 56 + 57 + atomic_t inflight_reqs; 58 + wait_queue_head_t inflight_wq; 59 + 60 + u64 dev_size; 61 + }; 62 + 63 + struct dm_pcache; 64 + int backing_dev_start(struct dm_pcache *pcache); 65 + void backing_dev_stop(struct dm_pcache *pcache); 66 + 67 + struct pcache_backing_dev_req_opts { 68 + u32 type; 69 + union { 70 + struct { 71 + struct pcache_request *upper_req; 72 + u32 req_off; 73 + u32 len; 74 + } req; 75 + struct { 76 + void *data; 77 + blk_opf_t opf; 78 + u32 len; 79 + u64 backing_off; 80 + } kmem; 81 + }; 82 + 83 + gfp_t gfp_mask; 84 + backing_req_end_fn_t end_fn; 85 + void *priv_data; 86 + }; 87 + 88 + static inline u32 
backing_dev_req_coalesced_max_len(const void *data, u32 len) 89 + { 90 + const void *p = data; 91 + u32 done = 0, in_page, to_advance; 92 + struct page *first_page, *next_page; 93 + 94 + if (!is_vmalloc_addr(data)) 95 + return len; 96 + 97 + first_page = vmalloc_to_page(p); 98 + advance: 99 + in_page = PAGE_SIZE - offset_in_page(p); 100 + to_advance = min_t(u32, in_page, len - done); 101 + 102 + done += to_advance; 103 + p += to_advance; 104 + 105 + if (done == len) 106 + return done; 107 + 108 + next_page = vmalloc_to_page(p); 109 + if (zone_device_pages_have_same_pgmap(first_page, next_page)) 110 + goto advance; 111 + 112 + return done; 113 + } 114 + 115 + void backing_dev_req_submit(struct pcache_backing_dev_req *backing_req, bool direct); 116 + void backing_dev_req_end(struct pcache_backing_dev_req *backing_req); 117 + struct pcache_backing_dev_req *backing_dev_req_create(struct pcache_backing_dev *backing_dev, 118 + struct pcache_backing_dev_req_opts *opts); 119 + struct pcache_backing_dev_req *backing_dev_req_alloc(struct pcache_backing_dev *backing_dev, 120 + struct pcache_backing_dev_req_opts *opts); 121 + void backing_dev_req_init(struct pcache_backing_dev_req *backing_req, 122 + struct pcache_backing_dev_req_opts *opts); 123 + void backing_dev_flush(struct pcache_backing_dev *backing_dev); 124 + 125 + int pcache_backing_init(void); 126 + void pcache_backing_exit(void); 127 + #endif /* _BACKING_DEV_H */
+445
drivers/md/dm-pcache/cache.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + #include <linux/blk_types.h> 3 + 4 + #include "cache.h" 5 + #include "cache_dev.h" 6 + #include "backing_dev.h" 7 + #include "dm_pcache.h" 8 + 9 + struct kmem_cache *key_cache; 10 + 11 + static inline struct pcache_cache_info *get_cache_info_addr(struct pcache_cache *cache) 12 + { 13 + return cache->cache_info_addr + cache->info_index; 14 + } 15 + 16 + static void cache_info_write(struct pcache_cache *cache) 17 + { 18 + struct pcache_cache_info *cache_info = &cache->cache_info; 19 + 20 + cache_info->header.seq++; 21 + cache_info->header.crc = pcache_meta_crc(&cache_info->header, 22 + sizeof(struct pcache_cache_info)); 23 + 24 + memcpy_flushcache(get_cache_info_addr(cache), cache_info, 25 + sizeof(struct pcache_cache_info)); 26 + 27 + cache->info_index = (cache->info_index + 1) % PCACHE_META_INDEX_MAX; 28 + } 29 + 30 + static void cache_info_init_default(struct pcache_cache *cache); 31 + static int cache_info_init(struct pcache_cache *cache, struct pcache_cache_options *opts) 32 + { 33 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 34 + struct pcache_cache_info *cache_info_addr; 35 + 36 + cache_info_addr = pcache_meta_find_latest(&cache->cache_info_addr->header, 37 + sizeof(struct pcache_cache_info), 38 + PCACHE_CACHE_INFO_SIZE, 39 + &cache->cache_info); 40 + if (IS_ERR(cache_info_addr)) 41 + return PTR_ERR(cache_info_addr); 42 + 43 + if (cache_info_addr) { 44 + if (opts->data_crc != 45 + (cache->cache_info.flags & PCACHE_CACHE_FLAGS_DATA_CRC)) { 46 + pcache_dev_err(pcache, "invalid option for data_crc: %s, expected: %s", 47 + opts->data_crc ? "true" : "false", 48 + cache->cache_info.flags & PCACHE_CACHE_FLAGS_DATA_CRC ? 
"true" : "false"); 49 + return -EINVAL; 50 + } 51 + 52 + return 0; 53 + } 54 + 55 + /* init cache_info for new cache */ 56 + cache_info_init_default(cache); 57 + cache_mode_set(cache, opts->cache_mode); 58 + if (opts->data_crc) 59 + cache->cache_info.flags |= PCACHE_CACHE_FLAGS_DATA_CRC; 60 + 61 + return 0; 62 + } 63 + 64 + static void cache_info_set_gc_percent(struct pcache_cache_info *cache_info, u8 percent) 65 + { 66 + cache_info->flags &= ~PCACHE_CACHE_FLAGS_GC_PERCENT_MASK; 67 + cache_info->flags |= FIELD_PREP(PCACHE_CACHE_FLAGS_GC_PERCENT_MASK, percent); 68 + } 69 + 70 + int pcache_cache_set_gc_percent(struct pcache_cache *cache, u8 percent) 71 + { 72 + if (percent > PCACHE_CACHE_GC_PERCENT_MAX || percent < PCACHE_CACHE_GC_PERCENT_MIN) 73 + return -EINVAL; 74 + 75 + mutex_lock(&cache->cache_info_lock); 76 + cache_info_set_gc_percent(&cache->cache_info, percent); 77 + 78 + cache_info_write(cache); 79 + mutex_unlock(&cache->cache_info_lock); 80 + 81 + return 0; 82 + } 83 + 84 + void cache_pos_encode(struct pcache_cache *cache, 85 + struct pcache_cache_pos_onmedia *pos_onmedia_base, 86 + struct pcache_cache_pos *pos, u64 seq, u32 *index) 87 + { 88 + struct pcache_cache_pos_onmedia pos_onmedia; 89 + struct pcache_cache_pos_onmedia *pos_onmedia_addr = pos_onmedia_base + *index; 90 + 91 + pos_onmedia.cache_seg_id = pos->cache_seg->cache_seg_id; 92 + pos_onmedia.seg_off = pos->seg_off; 93 + pos_onmedia.header.seq = seq; 94 + pos_onmedia.header.crc = cache_pos_onmedia_crc(&pos_onmedia); 95 + 96 + memcpy_flushcache(pos_onmedia_addr, &pos_onmedia, sizeof(struct pcache_cache_pos_onmedia)); 97 + pmem_wmb(); 98 + 99 + *index = (*index + 1) % PCACHE_META_INDEX_MAX; 100 + } 101 + 102 + int cache_pos_decode(struct pcache_cache *cache, 103 + struct pcache_cache_pos_onmedia *pos_onmedia, 104 + struct pcache_cache_pos *pos, u64 *seq, u32 *index) 105 + { 106 + struct pcache_cache_pos_onmedia latest, *latest_addr; 107 + 108 + latest_addr = 
pcache_meta_find_latest(&pos_onmedia->header, 109 + sizeof(struct pcache_cache_pos_onmedia), 110 + sizeof(struct pcache_cache_pos_onmedia), 111 + &latest); 112 + if (IS_ERR(latest_addr)) 113 + return PTR_ERR(latest_addr); 114 + 115 + if (!latest_addr) 116 + return -EIO; 117 + 118 + pos->cache_seg = &cache->segments[latest.cache_seg_id]; 119 + pos->seg_off = latest.seg_off; 120 + *seq = latest.header.seq; 121 + *index = (latest_addr - pos_onmedia); 122 + 123 + return 0; 124 + } 125 + 126 + static inline void cache_info_set_seg_id(struct pcache_cache *cache, u32 seg_id) 127 + { 128 + cache->cache_info.seg_id = seg_id; 129 + } 130 + 131 + static int cache_init(struct dm_pcache *pcache) 132 + { 133 + struct pcache_cache *cache = &pcache->cache; 134 + struct pcache_backing_dev *backing_dev = &pcache->backing_dev; 135 + struct pcache_cache_dev *cache_dev = &pcache->cache_dev; 136 + int ret; 137 + 138 + cache->segments = kvcalloc(cache_dev->seg_num, sizeof(struct pcache_cache_segment), GFP_KERNEL); 139 + if (!cache->segments) { 140 + ret = -ENOMEM; 141 + goto err; 142 + } 143 + 144 + cache->seg_map = kvcalloc(BITS_TO_LONGS(cache_dev->seg_num), sizeof(unsigned long), GFP_KERNEL); 145 + if (!cache->seg_map) { 146 + ret = -ENOMEM; 147 + goto free_segments; 148 + } 149 + 150 + cache->backing_dev = backing_dev; 151 + cache->cache_dev = &pcache->cache_dev; 152 + cache->n_segs = cache_dev->seg_num; 153 + atomic_set(&cache->gc_errors, 0); 154 + spin_lock_init(&cache->seg_map_lock); 155 + spin_lock_init(&cache->key_head_lock); 156 + 157 + mutex_init(&cache->cache_info_lock); 158 + mutex_init(&cache->key_tail_lock); 159 + mutex_init(&cache->dirty_tail_lock); 160 + mutex_init(&cache->writeback_lock); 161 + 162 + INIT_DELAYED_WORK(&cache->writeback_work, cache_writeback_fn); 163 + INIT_DELAYED_WORK(&cache->gc_work, pcache_cache_gc_fn); 164 + INIT_WORK(&cache->clean_work, clean_fn); 165 + 166 + return 0; 167 + 168 + free_segments: 169 + kvfree(cache->segments); 170 + err: 171 + return 
ret; 172 + } 173 + 174 + static void cache_exit(struct pcache_cache *cache) 175 + { 176 + kvfree(cache->seg_map); 177 + kvfree(cache->segments); 178 + } 179 + 180 + static void cache_info_init_default(struct pcache_cache *cache) 181 + { 182 + struct pcache_cache_info *cache_info = &cache->cache_info; 183 + 184 + cache_info->header.seq = 0; 185 + cache_info->n_segs = cache->cache_dev->seg_num; 186 + cache_info_set_gc_percent(cache_info, PCACHE_CACHE_GC_PERCENT_DEFAULT); 187 + } 188 + 189 + static int cache_tail_init(struct pcache_cache *cache) 190 + { 191 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 192 + bool new_cache = !(cache->cache_info.flags & PCACHE_CACHE_FLAGS_INIT_DONE); 193 + 194 + if (new_cache) { 195 + __set_bit(0, cache->seg_map); 196 + 197 + cache->key_head.cache_seg = &cache->segments[0]; 198 + cache->key_head.seg_off = 0; 199 + cache_pos_copy(&cache->key_tail, &cache->key_head); 200 + cache_pos_copy(&cache->dirty_tail, &cache->key_head); 201 + 202 + cache_encode_dirty_tail(cache); 203 + cache_encode_key_tail(cache); 204 + } else { 205 + if (cache_decode_key_tail(cache) || cache_decode_dirty_tail(cache)) { 206 + pcache_dev_err(pcache, "Corrupted key tail or dirty tail.\n"); 207 + return -EIO; 208 + } 209 + } 210 + 211 + return 0; 212 + } 213 + 214 + static int get_seg_id(struct pcache_cache *cache, 215 + struct pcache_cache_segment *prev_cache_seg, 216 + bool new_cache, u32 *seg_id) 217 + { 218 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 219 + struct pcache_cache_dev *cache_dev = cache->cache_dev; 220 + int ret; 221 + 222 + if (new_cache) { 223 + ret = cache_dev_get_empty_segment_id(cache_dev, seg_id); 224 + if (ret) { 225 + pcache_dev_err(pcache, "no available segment\n"); 226 + goto err; 227 + } 228 + 229 + if (prev_cache_seg) 230 + cache_seg_set_next_seg(prev_cache_seg, *seg_id); 231 + else 232 + cache_info_set_seg_id(cache, *seg_id); 233 + } else { 234 + if (prev_cache_seg) { 235 + struct pcache_segment_info *prev_seg_info; 236 + 
237 + prev_seg_info = &prev_cache_seg->cache_seg_info; 238 + if (!segment_info_has_next(prev_seg_info)) { 239 + ret = -EFAULT; 240 + goto err; 241 + } 242 + *seg_id = prev_cache_seg->cache_seg_info.next_seg; 243 + } else { 244 + *seg_id = cache->cache_info.seg_id; 245 + } 246 + } 247 + return 0; 248 + err: 249 + return ret; 250 + } 251 + 252 + static int cache_segs_init(struct pcache_cache *cache) 253 + { 254 + struct pcache_cache_segment *prev_cache_seg = NULL; 255 + struct pcache_cache_info *cache_info = &cache->cache_info; 256 + bool new_cache = !(cache->cache_info.flags & PCACHE_CACHE_FLAGS_INIT_DONE); 257 + u32 seg_id; 258 + int ret; 259 + u32 i; 260 + 261 + for (i = 0; i < cache_info->n_segs; i++) { 262 + ret = get_seg_id(cache, prev_cache_seg, new_cache, &seg_id); 263 + if (ret) 264 + goto err; 265 + 266 + ret = cache_seg_init(cache, seg_id, i, new_cache); 267 + if (ret) 268 + goto err; 269 + 270 + prev_cache_seg = &cache->segments[i]; 271 + } 272 + return 0; 273 + err: 274 + return ret; 275 + } 276 + 277 + static int cache_init_req_keys(struct pcache_cache *cache, u32 n_paral) 278 + { 279 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 280 + u32 n_subtrees; 281 + int ret; 282 + u32 i, cpu; 283 + 284 + /* Calculate number of cache trees based on the device size */ 285 + n_subtrees = DIV_ROUND_UP(cache->dev_size << SECTOR_SHIFT, PCACHE_CACHE_SUBTREE_SIZE); 286 + ret = cache_tree_init(cache, &cache->req_key_tree, n_subtrees); 287 + if (ret) 288 + goto err; 289 + 290 + cache->n_ksets = n_paral; 291 + cache->ksets = kvcalloc(cache->n_ksets, PCACHE_KSET_SIZE, GFP_KERNEL); 292 + if (!cache->ksets) { 293 + ret = -ENOMEM; 294 + goto req_tree_exit; 295 + } 296 + 297 + /* 298 + * Initialize each kset with a spinlock and delayed work for flushing. 299 + * Each kset is associated with one queue to ensure independent handling 300 + * of cache keys across multiple queues, maximizing multiqueue concurrency. 
301 + */ 302 + for (i = 0; i < cache->n_ksets; i++) { 303 + struct pcache_cache_kset *kset = get_kset(cache, i); 304 + 305 + kset->cache = cache; 306 + spin_lock_init(&kset->kset_lock); 307 + INIT_DELAYED_WORK(&kset->flush_work, kset_flush_fn); 308 + } 309 + 310 + cache->data_heads = alloc_percpu(struct pcache_cache_data_head); 311 + if (!cache->data_heads) { 312 + ret = -ENOMEM; 313 + goto free_kset; 314 + } 315 + 316 + for_each_possible_cpu(cpu) { 317 + struct pcache_cache_data_head *h = 318 + per_cpu_ptr(cache->data_heads, cpu); 319 + h->head_pos.cache_seg = NULL; 320 + } 321 + 322 + /* 323 + * Replay persisted cache keys using cache_replay. 324 + * This function loads and replays cache keys from previously stored 325 + * ksets, allowing the cache to restore its state after a restart. 326 + */ 327 + ret = cache_replay(cache); 328 + if (ret) { 329 + pcache_dev_err(pcache, "failed to replay keys\n"); 330 + goto free_heads; 331 + } 332 + 333 + return 0; 334 + 335 + free_heads: 336 + free_percpu(cache->data_heads); 337 + free_kset: 338 + kvfree(cache->ksets); 339 + req_tree_exit: 340 + cache_tree_exit(&cache->req_key_tree); 341 + err: 342 + return ret; 343 + } 344 + 345 + static void cache_destroy_req_keys(struct pcache_cache *cache) 346 + { 347 + u32 i; 348 + 349 + for (i = 0; i < cache->n_ksets; i++) { 350 + struct pcache_cache_kset *kset = get_kset(cache, i); 351 + 352 + cancel_delayed_work_sync(&kset->flush_work); 353 + } 354 + 355 + free_percpu(cache->data_heads); 356 + kvfree(cache->ksets); 357 + cache_tree_exit(&cache->req_key_tree); 358 + } 359 + 360 + int pcache_cache_start(struct dm_pcache *pcache) 361 + { 362 + struct pcache_backing_dev *backing_dev = &pcache->backing_dev; 363 + struct pcache_cache *cache = &pcache->cache; 364 + struct pcache_cache_options *opts = &pcache->opts; 365 + int ret; 366 + 367 + ret = cache_init(pcache); 368 + if (ret) 369 + return ret; 370 + 371 + cache->cache_info_addr = CACHE_DEV_CACHE_INFO(cache->cache_dev); 372 + 
cache->cache_ctrl = CACHE_DEV_CACHE_CTRL(cache->cache_dev); 373 + backing_dev->cache = cache; 374 + cache->dev_size = backing_dev->dev_size; 375 + 376 + ret = cache_info_init(cache, opts); 377 + if (ret) 378 + goto cache_exit; 379 + 380 + ret = cache_segs_init(cache); 381 + if (ret) 382 + goto cache_exit; 383 + 384 + ret = cache_tail_init(cache); 385 + if (ret) 386 + goto cache_exit; 387 + 388 + ret = cache_init_req_keys(cache, num_online_cpus()); 389 + if (ret) 390 + goto cache_exit; 391 + 392 + ret = cache_writeback_init(cache); 393 + if (ret) 394 + goto destroy_keys; 395 + 396 + cache->cache_info.flags |= PCACHE_CACHE_FLAGS_INIT_DONE; 397 + cache_info_write(cache); 398 + queue_delayed_work(cache_get_wq(cache), &cache->gc_work, 0); 399 + 400 + return 0; 401 + 402 + destroy_keys: 403 + cache_destroy_req_keys(cache); 404 + cache_exit: 405 + cache_exit(cache); 406 + 407 + return ret; 408 + } 409 + 410 + void pcache_cache_stop(struct dm_pcache *pcache) 411 + { 412 + struct pcache_cache *cache = &pcache->cache; 413 + 414 + cache_flush(cache); 415 + 416 + cancel_delayed_work_sync(&cache->gc_work); 417 + flush_work(&cache->clean_work); 418 + cache_writeback_exit(cache); 419 + 420 + if (cache->req_key_tree.n_subtrees) 421 + cache_destroy_req_keys(cache); 422 + 423 + cache_exit(cache); 424 + } 425 + 426 + struct workqueue_struct *cache_get_wq(struct pcache_cache *cache) 427 + { 428 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 429 + 430 + return pcache->task_wq; 431 + } 432 + 433 + int pcache_cache_init(void) 434 + { 435 + key_cache = KMEM_CACHE(pcache_cache_key, 0); 436 + if (!key_cache) 437 + return -ENOMEM; 438 + 439 + return 0; 440 + } 441 + 442 + void pcache_cache_exit(void) 443 + { 444 + kmem_cache_destroy(key_cache); 445 + }
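cache.c above resolves each piece of layout metadata from its PCACHE_META_INDEX_MAX on-media replicas, keeping the replica with the highest sequence number whose CRC still verifies (the pcache_meta_find_latest()/cache_pos_decode() path). A minimal userspace sketch of that selection rule follows; the structure layout, toy_crc() checksum, and all names here are illustrative stand-ins, not the kernel's actual helpers:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define META_INDEX_MAX 2	/* dual-replicated metadata, as in the patch */

struct meta_header {
	uint32_t crc;		/* covers everything after these 4 bytes */
	uint64_t seq;		/* monotonically increasing update sequence */
};

/* Toy checksum standing in for crc32c(); only determinism matters here. */
static uint32_t toy_crc(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0x12345678u;

	while (len--)
		crc = (crc << 5) + crc + *p++;
	return crc;
}

/* Pick the valid replica with the highest seq; -1 if all are corrupt. */
static int meta_find_latest(struct meta_header *replicas, size_t item_size)
{
	uint64_t best_seq = 0;
	int latest = -1;
	size_t i;

	for (i = 0; i < META_INDEX_MAX; i++) {
		struct meta_header *h =
			(void *)((char *)replicas + i * item_size);
		uint32_t crc = toy_crc((char *)h + 4, item_size - 4);

		if (crc != h->crc)
			continue;	/* torn or corrupt replica: skip it */
		if (latest < 0 || h->seq > best_seq) {
			latest = (int)i;
			best_seq = h->seq;
		}
	}
	return latest;
}

/* Illustrative analogue of pcache_cache_pos_onmedia. */
struct pos_onmedia {
	struct meta_header hdr;
	uint32_t seg_id;
	uint32_t seg_off;
};

static void pos_write(struct pos_onmedia *p, uint64_t seq, uint32_t seg_id)
{
	memset(p, 0, sizeof(*p));	/* zero padding so the CRC is stable */
	p->hdr.seq = seq;
	p->seg_id = seg_id;
	p->hdr.crc = toy_crc((char *)p + 4, sizeof(*p) - 4);
}
```

Because a writer always updates the replica slots alternately with a bumped sequence number, a crash mid-write corrupts at most the replica being written, and this read path silently falls back to the older intact copy.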
drivers/md/dm-pcache/cache.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 + #ifndef _PCACHE_CACHE_H 3 + #define _PCACHE_CACHE_H 4 + 5 + #include "segment.h" 6 + 7 + /* Garbage collection thresholds */ 8 + #define PCACHE_CACHE_GC_PERCENT_MIN 0 /* Minimum GC percentage */ 9 + #define PCACHE_CACHE_GC_PERCENT_MAX 90 /* Maximum GC percentage */ 10 + #define PCACHE_CACHE_GC_PERCENT_DEFAULT 70 /* Default GC percentage */ 11 + 12 + #define PCACHE_CACHE_SUBTREE_SIZE (4 * PCACHE_MB) /* 4MB total tree size */ 13 + #define PCACHE_CACHE_SUBTREE_SIZE_MASK 0x3FFFFF /* Mask for tree size */ 14 + #define PCACHE_CACHE_SUBTREE_SIZE_SHIFT 22 /* Bit shift for tree size */ 15 + 16 + /* Maximum number of keys per key set */ 17 + #define PCACHE_KSET_KEYS_MAX 128 18 + #define PCACHE_CACHE_SEGS_MAX (1024 * 1024) /* maximum cache size for each device is 16T */ 19 + #define PCACHE_KSET_ONMEDIA_SIZE_MAX struct_size_t(struct pcache_cache_kset_onmedia, data, PCACHE_KSET_KEYS_MAX) 20 + #define PCACHE_KSET_SIZE (sizeof(struct pcache_cache_kset) + sizeof(struct pcache_cache_key_onmedia) * PCACHE_KSET_KEYS_MAX) 21 + 22 + /* Maximum number of keys to clean in one round of clean_work */ 23 + #define PCACHE_CLEAN_KEYS_MAX 10 24 + 25 + /* Writeback and garbage collection intervals in jiffies */ 26 + #define PCACHE_CACHE_WRITEBACK_INTERVAL (5 * HZ) 27 + #define PCACHE_CACHE_GC_INTERVAL (5 * HZ) 28 + 29 + /* Macro to get the cache key structure from an rb_node pointer */ 30 + #define CACHE_KEY(node) (container_of(node, struct pcache_cache_key, rb_node)) 31 + 32 + struct pcache_cache_pos_onmedia { 33 + struct pcache_meta_header header; 34 + __u32 cache_seg_id; 35 + __u32 seg_off; 36 + }; 37 + 38 + /* Offset and size definitions for cache segment control */ 39 + #define PCACHE_CACHE_SEG_CTRL_OFF (PCACHE_SEG_INFO_SIZE * PCACHE_META_INDEX_MAX) 40 + #define PCACHE_CACHE_SEG_CTRL_SIZE (4 * PCACHE_KB) 41 + 42 + struct pcache_cache_seg_gen { 43 + struct pcache_meta_header header; 44 + __u64 gen; 45 + }; 46 + 47 + /* Control 
structure for cache segments */ 48 + struct pcache_cache_seg_ctrl { 49 + struct pcache_cache_seg_gen gen[PCACHE_META_INDEX_MAX]; 50 + __u64 res[64]; 51 + }; 52 + 53 + #define PCACHE_CACHE_FLAGS_DATA_CRC BIT(0) 54 + #define PCACHE_CACHE_FLAGS_INIT_DONE BIT(1) 55 + 56 + #define PCACHE_CACHE_FLAGS_CACHE_MODE_MASK GENMASK(5, 2) 57 + #define PCACHE_CACHE_MODE_WRITEBACK 0 58 + #define PCACHE_CACHE_MODE_WRITETHROUGH 1 59 + #define PCACHE_CACHE_MODE_WRITEAROUND 2 60 + #define PCACHE_CACHE_MODE_WRITEONLY 3 61 + 62 + #define PCACHE_CACHE_FLAGS_GC_PERCENT_MASK GENMASK(12, 6) 63 + 64 + struct pcache_cache_info { 65 + struct pcache_meta_header header; 66 + __u32 seg_id; 67 + __u32 n_segs; 68 + __u32 flags; 69 + __u32 reserved; 70 + }; 71 + 72 + struct pcache_cache_pos { 73 + struct pcache_cache_segment *cache_seg; 74 + u32 seg_off; 75 + }; 76 + 77 + struct pcache_cache_segment { 78 + struct pcache_cache *cache; 79 + u32 cache_seg_id; /* Index in cache->segments */ 80 + struct pcache_segment segment; 81 + atomic_t refs; 82 + 83 + struct pcache_segment_info cache_seg_info; 84 + struct mutex info_lock; 85 + u32 info_index; 86 + 87 + spinlock_t gen_lock; 88 + u64 gen; 89 + u64 gen_seq; 90 + u32 gen_index; 91 + 92 + struct pcache_cache_seg_ctrl *cache_seg_ctrl; 93 + struct mutex ctrl_lock; 94 + }; 95 + 96 + /* rbtree for cache entries */ 97 + struct pcache_cache_subtree { 98 + struct rb_root root; 99 + spinlock_t tree_lock; 100 + }; 101 + 102 + struct pcache_cache_tree { 103 + struct pcache_cache *cache; 104 + u32 n_subtrees; 105 + mempool_t key_pool; 106 + struct pcache_cache_subtree *subtrees; 107 + }; 108 + 109 + extern struct kmem_cache *key_cache; 110 + 111 + struct pcache_cache_key { 112 + struct pcache_cache_tree *cache_tree; 113 + struct pcache_cache_subtree *cache_subtree; 114 + struct kref ref; 115 + struct rb_node rb_node; 116 + struct list_head list_node; 117 + u64 off; 118 + u32 len; 119 + u32 flags; 120 + struct pcache_cache_pos cache_pos; 121 + u64 seg_gen; 122 + }; 
123 + 124 + #define PCACHE_CACHE_KEY_FLAGS_EMPTY BIT(0) 125 + #define PCACHE_CACHE_KEY_FLAGS_CLEAN BIT(1) 126 + 127 + struct pcache_cache_key_onmedia { 128 + __u64 off; 129 + __u32 len; 130 + __u32 flags; 131 + __u32 cache_seg_id; 132 + __u32 cache_seg_off; 133 + __u64 seg_gen; 134 + __u32 data_crc; 135 + __u32 reserved; 136 + }; 137 + 138 + struct pcache_cache_kset_onmedia { 139 + __u32 crc; 140 + union { 141 + __u32 key_num; 142 + __u32 next_cache_seg_id; 143 + }; 144 + __u64 magic; 145 + __u64 flags; 146 + struct pcache_cache_key_onmedia data[]; 147 + }; 148 + 149 + struct pcache_cache { 150 + struct pcache_backing_dev *backing_dev; 151 + struct pcache_cache_dev *cache_dev; 152 + struct pcache_cache_ctrl *cache_ctrl; 153 + u64 dev_size; 154 + 155 + struct pcache_cache_data_head __percpu *data_heads; 156 + 157 + spinlock_t key_head_lock; 158 + struct pcache_cache_pos key_head; 159 + u32 n_ksets; 160 + struct pcache_cache_kset *ksets; 161 + 162 + struct mutex key_tail_lock; 163 + struct pcache_cache_pos key_tail; 164 + u64 key_tail_seq; 165 + u32 key_tail_index; 166 + 167 + struct mutex dirty_tail_lock; 168 + struct pcache_cache_pos dirty_tail; 169 + u64 dirty_tail_seq; 170 + u32 dirty_tail_index; 171 + 172 + struct pcache_cache_tree req_key_tree; 173 + struct work_struct clean_work; 174 + 175 + struct mutex writeback_lock; 176 + char wb_kset_onmedia_buf[PCACHE_KSET_ONMEDIA_SIZE_MAX]; 177 + struct pcache_cache_tree writeback_key_tree; 178 + struct delayed_work writeback_work; 179 + struct { 180 + atomic_t pending; 181 + u32 advance; 182 + int ret; 183 + } writeback_ctx; 184 + 185 + char gc_kset_onmedia_buf[PCACHE_KSET_ONMEDIA_SIZE_MAX]; 186 + struct delayed_work gc_work; 187 + atomic_t gc_errors; 188 + 189 + struct mutex cache_info_lock; 190 + struct pcache_cache_info cache_info; 191 + struct pcache_cache_info *cache_info_addr; 192 + u32 info_index; 193 + 194 + u32 n_segs; 195 + unsigned long *seg_map; 196 + u32 last_cache_seg; 197 + bool cache_full; 198 + 
spinlock_t seg_map_lock; 199 + struct pcache_cache_segment *segments; 200 + }; 201 + 202 + struct workqueue_struct *cache_get_wq(struct pcache_cache *cache); 203 + 204 + struct dm_pcache; 205 + struct pcache_cache_options { 206 + u32 cache_mode:4; 207 + u32 data_crc:1; 208 + }; 209 + int pcache_cache_start(struct dm_pcache *pcache); 210 + void pcache_cache_stop(struct dm_pcache *pcache); 211 + 212 + struct pcache_cache_ctrl { 213 + /* Updated by gc_thread */ 214 + struct pcache_cache_pos_onmedia key_tail_pos[PCACHE_META_INDEX_MAX]; 215 + 216 + /* Updated by writeback_thread */ 217 + struct pcache_cache_pos_onmedia dirty_tail_pos[PCACHE_META_INDEX_MAX]; 218 + }; 219 + 220 + struct pcache_cache_data_head { 221 + struct pcache_cache_pos head_pos; 222 + }; 223 + 224 + static inline u16 pcache_cache_get_gc_percent(struct pcache_cache *cache) 225 + { 226 + return FIELD_GET(PCACHE_CACHE_FLAGS_GC_PERCENT_MASK, cache->cache_info.flags); 227 + } 228 + 229 + int pcache_cache_set_gc_percent(struct pcache_cache *cache, u8 percent); 230 + 231 + /* cache key */ 232 + struct pcache_cache_key *cache_key_alloc(struct pcache_cache_tree *cache_tree, gfp_t gfp_mask); 233 + void cache_key_init(struct pcache_cache_tree *cache_tree, struct pcache_cache_key *key); 234 + void cache_key_get(struct pcache_cache_key *key); 235 + void cache_key_put(struct pcache_cache_key *key); 236 + int cache_key_append(struct pcache_cache *cache, struct pcache_cache_key *key, bool force_close); 237 + void cache_key_insert(struct pcache_cache_tree *cache_tree, struct pcache_cache_key *key, bool fixup); 238 + int cache_key_decode(struct pcache_cache *cache, 239 + struct pcache_cache_key_onmedia *key_onmedia, 240 + struct pcache_cache_key *key); 241 + void cache_pos_advance(struct pcache_cache_pos *pos, u32 len); 242 + 243 + #define PCACHE_KSET_FLAGS_LAST BIT(0) 244 + #define PCACHE_KSET_MAGIC 0x676894a64e164f1aULL 245 + 246 + struct pcache_cache_kset { 247 + struct pcache_cache *cache; 248 + spinlock_t 
kset_lock; 249 + struct delayed_work flush_work; 250 + struct pcache_cache_kset_onmedia kset_onmedia; 251 + }; 252 + 253 + extern struct pcache_cache_kset_onmedia pcache_empty_kset; 254 + 255 + #define SUBTREE_WALK_RET_OK 0 256 + #define SUBTREE_WALK_RET_ERR 1 257 + #define SUBTREE_WALK_RET_NEED_KEY 2 258 + #define SUBTREE_WALK_RET_NEED_REQ 3 259 + #define SUBTREE_WALK_RET_RESEARCH 4 260 + 261 + struct pcache_cache_subtree_walk_ctx { 262 + struct pcache_cache_tree *cache_tree; 263 + struct rb_node *start_node; 264 + struct pcache_request *pcache_req; 265 + struct pcache_cache_key *key; 266 + u32 req_done; 267 + int ret; 268 + 269 + /* pre-allocated key and backing_dev_req */ 270 + struct pcache_cache_key *pre_alloc_key; 271 + struct pcache_backing_dev_req *pre_alloc_req; 272 + 273 + struct list_head *delete_key_list; 274 + struct list_head *submit_req_list; 275 + 276 + /* 277 + * |--------| key_tmp 278 + * |====| key 279 + */ 280 + int (*before)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp, 281 + struct pcache_cache_subtree_walk_ctx *ctx); 282 + 283 + /* 284 + * |----------| key_tmp 285 + * |=====| key 286 + */ 287 + int (*after)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp, 288 + struct pcache_cache_subtree_walk_ctx *ctx); 289 + 290 + /* 291 + * |----------------| key_tmp 292 + * |===========| key 293 + */ 294 + int (*overlap_tail)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp, 295 + struct pcache_cache_subtree_walk_ctx *ctx); 296 + 297 + /* 298 + * |--------| key_tmp 299 + * |==========| key 300 + */ 301 + int (*overlap_head)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp, 302 + struct pcache_cache_subtree_walk_ctx *ctx); 303 + 304 + /* 305 + * |----| key_tmp 306 + * |==========| key 307 + */ 308 + int (*overlap_contain)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp, 309 + struct pcache_cache_subtree_walk_ctx *ctx); 310 + 311 + /* 312 + * |-----------| key_tmp 313 + * |====| key 
314 + */ 315 + int (*overlap_contained)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp, 316 + struct pcache_cache_subtree_walk_ctx *ctx); 317 + 318 + int (*walk_finally)(struct pcache_cache_subtree_walk_ctx *ctx, int ret); 319 + bool (*walk_done)(struct pcache_cache_subtree_walk_ctx *ctx); 320 + }; 321 + 322 + int cache_subtree_walk(struct pcache_cache_subtree_walk_ctx *ctx); 323 + struct rb_node *cache_subtree_search(struct pcache_cache_subtree *cache_subtree, struct pcache_cache_key *key, 324 + struct rb_node **parentp, struct rb_node ***newp, 325 + struct list_head *delete_key_list); 326 + int cache_kset_close(struct pcache_cache *cache, struct pcache_cache_kset *kset); 327 + void clean_fn(struct work_struct *work); 328 + void kset_flush_fn(struct work_struct *work); 329 + int cache_replay(struct pcache_cache *cache); 330 + int cache_tree_init(struct pcache_cache *cache, struct pcache_cache_tree *cache_tree, u32 n_subtrees); 331 + void cache_tree_clear(struct pcache_cache_tree *cache_tree); 332 + void cache_tree_exit(struct pcache_cache_tree *cache_tree); 333 + 334 + /* cache segments */ 335 + struct pcache_cache_segment *get_cache_segment(struct pcache_cache *cache); 336 + int cache_seg_init(struct pcache_cache *cache, u32 seg_id, u32 cache_seg_id, 337 + bool new_cache); 338 + void cache_seg_get(struct pcache_cache_segment *cache_seg); 339 + void cache_seg_put(struct pcache_cache_segment *cache_seg); 340 + void cache_seg_set_next_seg(struct pcache_cache_segment *cache_seg, u32 seg_id); 341 + 342 + /* cache request*/ 343 + int cache_flush(struct pcache_cache *cache); 344 + void miss_read_end_work_fn(struct work_struct *work); 345 + int pcache_cache_handle_req(struct pcache_cache *cache, struct pcache_request *pcache_req); 346 + 347 + /* gc */ 348 + void pcache_cache_gc_fn(struct work_struct *work); 349 + 350 + /* writeback */ 351 + void cache_writeback_exit(struct pcache_cache *cache); 352 + int cache_writeback_init(struct pcache_cache *cache); 
353 + void cache_writeback_fn(struct work_struct *work); 354 + 355 + /* inline functions */ 356 + static inline struct pcache_cache_subtree *get_subtree(struct pcache_cache_tree *cache_tree, u64 off) 357 + { 358 + if (cache_tree->n_subtrees == 1) 359 + return &cache_tree->subtrees[0]; 360 + 361 + return &cache_tree->subtrees[off >> PCACHE_CACHE_SUBTREE_SIZE_SHIFT]; 362 + } 363 + 364 + static inline void *cache_pos_addr(struct pcache_cache_pos *pos) 365 + { 366 + return (pos->cache_seg->segment.data + pos->seg_off); 367 + } 368 + 369 + static inline void *get_key_head_addr(struct pcache_cache *cache) 370 + { 371 + return cache_pos_addr(&cache->key_head); 372 + } 373 + 374 + static inline u32 get_kset_id(struct pcache_cache *cache, u64 off) 375 + { 376 + u32 rem; 377 + div_u64_rem(off >> PCACHE_CACHE_SUBTREE_SIZE_SHIFT, cache->n_ksets, &rem); 378 + return rem; 379 + } 380 + 381 + static inline struct pcache_cache_kset *get_kset(struct pcache_cache *cache, u32 kset_id) 382 + { 383 + return (void *)cache->ksets + PCACHE_KSET_SIZE * kset_id; 384 + } 385 + 386 + static inline struct pcache_cache_data_head *get_data_head(struct pcache_cache *cache) 387 + { 388 + return this_cpu_ptr(cache->data_heads); 389 + } 390 + 391 + static inline bool cache_key_empty(struct pcache_cache_key *key) 392 + { 393 + return key->flags & PCACHE_CACHE_KEY_FLAGS_EMPTY; 394 + } 395 + 396 + static inline bool cache_key_clean(struct pcache_cache_key *key) 397 + { 398 + return key->flags & PCACHE_CACHE_KEY_FLAGS_CLEAN; 399 + } 400 + 401 + static inline void cache_pos_copy(struct pcache_cache_pos *dst, struct pcache_cache_pos *src) 402 + { 403 + memcpy(dst, src, sizeof(struct pcache_cache_pos)); 404 + } 405 + 406 + /** 407 + * cache_seg_is_ctrl_seg - Checks if a cache segment is a cache ctrl segment. 408 + * @cache_seg_id: ID of the cache segment. 409 + * 410 + * Returns true if the cache segment ID corresponds to a cache ctrl segment. 
411 + * 412 + * Note: We extend the segment control of the first cache segment 413 + * (cache segment ID 0) to serve as the cache control (pcache_cache_ctrl) 414 + * for the entire PCACHE cache. This function determines whether the given 415 + * cache segment is the one storing the pcache_cache_ctrl information. 416 + */ 417 + static inline bool cache_seg_is_ctrl_seg(u32 cache_seg_id) 418 + { 419 + return (cache_seg_id == 0); 420 + } 421 + 422 + /** 423 + * cache_key_cutfront - Cuts a specified length from the front of a cache key. 424 + * @key: Pointer to pcache_cache_key structure. 425 + * @cut_len: Length to cut from the front. 426 + * 427 + * Advances the cache key position by cut_len and adjusts offset and length accordingly. 428 + */ 429 + static inline void cache_key_cutfront(struct pcache_cache_key *key, u32 cut_len) 430 + { 431 + if (key->cache_pos.cache_seg) 432 + cache_pos_advance(&key->cache_pos, cut_len); 433 + 434 + key->off += cut_len; 435 + key->len -= cut_len; 436 + } 437 + 438 + /** 439 + * cache_key_cutback - Cuts a specified length from the back of a cache key. 440 + * @key: Pointer to pcache_cache_key structure. 441 + * @cut_len: Length to cut from the back. 442 + * 443 + * Reduces the length of the cache key by cut_len. 
444 + */ 445 + static inline void cache_key_cutback(struct pcache_cache_key *key, u32 cut_len) 446 + { 447 + key->len -= cut_len; 448 + } 449 + 450 + static inline void cache_key_delete(struct pcache_cache_key *key) 451 + { 452 + struct pcache_cache_subtree *cache_subtree; 453 + 454 + cache_subtree = key->cache_subtree; 455 + BUG_ON(!cache_subtree); 456 + 457 + rb_erase(&key->rb_node, &cache_subtree->root); 458 + key->flags = 0; 459 + cache_key_put(key); 460 + } 461 + 462 + static inline bool cache_data_crc_on(struct pcache_cache *cache) 463 + { 464 + return (cache->cache_info.flags & PCACHE_CACHE_FLAGS_DATA_CRC); 465 + } 466 + 467 + static inline u32 cache_mode_get(struct pcache_cache *cache) 468 + { 469 + return FIELD_GET(PCACHE_CACHE_FLAGS_CACHE_MODE_MASK, cache->cache_info.flags); 470 + } 471 + 472 + static inline void cache_mode_set(struct pcache_cache *cache, u32 cache_mode) 473 + { 474 + cache->cache_info.flags &= ~PCACHE_CACHE_FLAGS_CACHE_MODE_MASK; 475 + cache->cache_info.flags |= FIELD_PREP(PCACHE_CACHE_FLAGS_CACHE_MODE_MASK, cache_mode); 476 + } 477 + 478 + /** 479 + * cache_key_data_crc - Calculates CRC for data in a cache key. 480 + * @key: Pointer to the pcache_cache_key structure. 481 + * 482 + * Returns the CRC-32 checksum of the data within the cache key's position. 
483 + */ 484 + static inline u32 cache_key_data_crc(struct pcache_cache_key *key) 485 + { 486 + void *data; 487 + 488 + data = cache_pos_addr(&key->cache_pos); 489 + 490 + return crc32c(PCACHE_CRC_SEED, data, key->len); 491 + } 492 + 493 + static inline u32 cache_kset_crc(struct pcache_cache_kset_onmedia *kset_onmedia) 494 + { 495 + u32 crc_size; 496 + 497 + if (kset_onmedia->flags & PCACHE_KSET_FLAGS_LAST) 498 + crc_size = sizeof(struct pcache_cache_kset_onmedia) - 4; 499 + else 500 + crc_size = struct_size(kset_onmedia, data, kset_onmedia->key_num) - 4; 501 + 502 + return crc32c(PCACHE_CRC_SEED, (void *)kset_onmedia + 4, crc_size); 503 + } 504 + 505 + static inline u32 get_kset_onmedia_size(struct pcache_cache_kset_onmedia *kset_onmedia) 506 + { 507 + return struct_size_t(struct pcache_cache_kset_onmedia, data, kset_onmedia->key_num); 508 + } 509 + 510 + /** 511 + * cache_seg_remain - Computes remaining space in a cache segment. 512 + * @pos: Pointer to pcache_cache_pos structure. 513 + * 514 + * Returns the amount of remaining space in the segment data starting from 515 + * the current position offset. 516 + */ 517 + static inline u32 cache_seg_remain(struct pcache_cache_pos *pos) 518 + { 519 + struct pcache_cache_segment *cache_seg; 520 + struct pcache_segment *segment; 521 + u32 seg_remain; 522 + 523 + cache_seg = pos->cache_seg; 524 + segment = &cache_seg->segment; 525 + seg_remain = segment->data_size - pos->seg_off; 526 + 527 + return seg_remain; 528 + } 529 + 530 + /** 531 + * cache_key_invalid - Checks if a cache key is invalid. 532 + * @key: Pointer to pcache_cache_key structure. 533 + * 534 + * Returns true if the cache key is invalid due to its generation being 535 + * less than the generation of its segment; otherwise returns false. 536 + * 537 + * When the GC (garbage collection) thread identifies a segment 538 + * as reclaimable, it increments the segment's generation (gen). However, 539 + * it does not immediately remove all related cache keys. 
When accessing 540 + * such a cache key, this function can be used to determine if the cache 541 + * key has already become invalid. 542 + */ 543 + static inline bool cache_key_invalid(struct pcache_cache_key *key) 544 + { 545 + if (cache_key_empty(key)) 546 + return false; 547 + 548 + return (key->seg_gen < key->cache_pos.cache_seg->gen); 549 + } 550 + 551 + /** 552 + * cache_key_lstart - Retrieves the logical start offset of a cache key. 553 + * @key: Pointer to pcache_cache_key structure. 554 + * 555 + * Returns the logical start offset for the cache key. 556 + */ 557 + static inline u64 cache_key_lstart(struct pcache_cache_key *key) 558 + { 559 + return key->off; 560 + } 561 + 562 + /** 563 + * cache_key_lend - Retrieves the logical end offset of a cache key. 564 + * @key: Pointer to pcache_cache_key structure. 565 + * 566 + * Returns the logical end offset for the cache key. 567 + */ 568 + static inline u64 cache_key_lend(struct pcache_cache_key *key) 569 + { 570 + return key->off + key->len; 571 + } 572 + 573 + static inline void cache_key_copy(struct pcache_cache_key *key_dst, struct pcache_cache_key *key_src) 574 + { 575 + key_dst->off = key_src->off; 576 + key_dst->len = key_src->len; 577 + key_dst->seg_gen = key_src->seg_gen; 578 + key_dst->cache_tree = key_src->cache_tree; 579 + key_dst->cache_subtree = key_src->cache_subtree; 580 + key_dst->flags = key_src->flags; 581 + 582 + cache_pos_copy(&key_dst->cache_pos, &key_src->cache_pos); 583 + } 584 + 585 + /** 586 + * cache_pos_onmedia_crc - Calculates the CRC for an on-media cache position. 587 + * @pos_om: Pointer to pcache_cache_pos_onmedia structure. 588 + * 589 + * Calculates the CRC-32 checksum of the position, excluding the first 4 bytes. 590 + * Returns the computed CRC value. 
591 + */ 592 + static inline u32 cache_pos_onmedia_crc(struct pcache_cache_pos_onmedia *pos_om) 593 + { 594 + return pcache_meta_crc(&pos_om->header, sizeof(struct pcache_cache_pos_onmedia)); 595 + } 596 + 597 + void cache_pos_encode(struct pcache_cache *cache, 598 + struct pcache_cache_pos_onmedia *pos_onmedia, 599 + struct pcache_cache_pos *pos, u64 seq, u32 *index); 600 + int cache_pos_decode(struct pcache_cache *cache, 601 + struct pcache_cache_pos_onmedia *pos_onmedia, 602 + struct pcache_cache_pos *pos, u64 *seq, u32 *index); 603 + 604 + static inline void cache_encode_key_tail(struct pcache_cache *cache) 605 + { 606 + cache_pos_encode(cache, cache->cache_ctrl->key_tail_pos, 607 + &cache->key_tail, ++cache->key_tail_seq, 608 + &cache->key_tail_index); 609 + } 610 + 611 + static inline int cache_decode_key_tail(struct pcache_cache *cache) 612 + { 613 + return cache_pos_decode(cache, cache->cache_ctrl->key_tail_pos, 614 + &cache->key_tail, &cache->key_tail_seq, 615 + &cache->key_tail_index); 616 + } 617 + 618 + static inline void cache_encode_dirty_tail(struct pcache_cache *cache) 619 + { 620 + cache_pos_encode(cache, cache->cache_ctrl->dirty_tail_pos, 621 + &cache->dirty_tail, ++cache->dirty_tail_seq, 622 + &cache->dirty_tail_index); 623 + } 624 + 625 + static inline int cache_decode_dirty_tail(struct pcache_cache *cache) 626 + { 627 + return cache_pos_decode(cache, cache->cache_ctrl->dirty_tail_pos, 628 + &cache->dirty_tail, &cache->dirty_tail_seq, 629 + &cache->dirty_tail_index); 630 + } 631 + 632 + int pcache_cache_init(void); 633 + void pcache_cache_exit(void); 634 + #endif /* _PCACHE_CACHE_H */
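cache.h packs the cache mode (bits 5:2) and the GC percentage (bits 12:6) into cache_info.flags via GENMASK()/FIELD_PREP()/FIELD_GET(). The same packing can be sketched in plain C; the MODE_/GCP_ macro and function names below are illustrative, only the bit positions come from the header above:

```c
#include <assert.h>
#include <stdint.h>

/* Expanded equivalents of the header's field masks:
 *   GENMASK(5, 2)  -> 0x0000003c  (cache mode)
 *   GENMASK(12, 6) -> 0x00001fc0  (GC percent, 7 bits, 0..90 used)
 */
#define MODE_MASK	0x0000003cu
#define MODE_SHIFT	2
#define GCP_MASK	0x00001fc0u
#define GCP_SHIFT	6

/* FIELD_PREP-style setter: clear the field, then insert the new value. */
static uint32_t mode_set(uint32_t flags, uint32_t mode)
{
	flags &= ~MODE_MASK;
	flags |= (mode << MODE_SHIFT) & MODE_MASK;
	return flags;
}

/* FIELD_GET-style getter: mask out the field and shift it down. */
static uint32_t mode_get(uint32_t flags)
{
	return (flags & MODE_MASK) >> MODE_SHIFT;
}

static uint32_t gc_percent_set(uint32_t flags, uint32_t percent)
{
	flags &= ~GCP_MASK;
	flags |= (percent << GCP_SHIFT) & GCP_MASK;
	return flags;
}

static uint32_t gc_percent_get(uint32_t flags)
{
	return (flags & GCP_MASK) >> GCP_SHIFT;
}
```

Updating one field leaves the other fields (including the DATA_CRC and INIT_DONE bits below them) untouched, which is why cache_mode_set() above clears only its own mask before OR-ing in the new value.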
drivers/md/dm-pcache/cache_dev.c
// SPDX-License-Identifier: GPL-2.0-or-later

#include <linux/blkdev.h>
#include <linux/dax.h>
#include <linux/vmalloc.h>
#include <linux/parser.h>

#include "cache_dev.h"
#include "backing_dev.h"
#include "cache.h"
#include "dm_pcache.h"

static void cache_dev_dax_exit(struct pcache_cache_dev *cache_dev)
{
	if (cache_dev->use_vmap)
		vunmap(cache_dev->mapping);
}

static int build_vmap(struct dax_device *dax_dev, long total_pages, void **vaddr)
{
	struct page **pages;
	long i = 0, chunk;
	unsigned long pfn;
	int ret;

	pages = vmalloc_array(total_pages, sizeof(struct page *));
	if (!pages)
		return -ENOMEM;

	do {
		chunk = dax_direct_access(dax_dev, i, total_pages - i,
					  DAX_ACCESS, NULL, &pfn);
		if (chunk <= 0) {
			ret = chunk ? chunk : -EINVAL;
			goto out_free;
		}

		if (!pfn_valid(pfn)) {
			ret = -EOPNOTSUPP;
			goto out_free;
		}

		while (chunk-- && i < total_pages) {
			pages[i++] = pfn_to_page(pfn);
			pfn++;
			if (!(i & 15))
				cond_resched();
		}
	} while (i < total_pages);

	*vaddr = vmap(pages, total_pages, VM_MAP, PAGE_KERNEL);
	if (!*vaddr) {
		ret = -ENOMEM;
		goto out_free;
	}

	ret = 0;

out_free:
	vfree(pages);
	return ret;
}

static int cache_dev_dax_init(struct pcache_cache_dev *cache_dev)
{
	struct dm_pcache *pcache = CACHE_DEV_TO_PCACHE(cache_dev);
	struct dax_device *dax_dev;
	long total_pages, mapped_pages;
	u64 bdev_size;
	void *vaddr;
	int ret;
	int id;
	unsigned long pfn;

	dax_dev = cache_dev->dm_dev->dax_dev;
	/* total size check */
	bdev_size = bdev_nr_bytes(cache_dev->dm_dev->bdev);
	if (bdev_size < PCACHE_CACHE_DEV_SIZE_MIN) {
		pcache_dev_err(pcache, "dax device is too small, requires at least %llu",
			       PCACHE_CACHE_DEV_SIZE_MIN);
		ret = -ENOSPC;
		goto out;
	}

	total_pages = bdev_size >> PAGE_SHIFT;
	/* attempt: direct-map the whole range */
	id = dax_read_lock();
	mapped_pages = dax_direct_access(dax_dev, 0, total_pages,
					 DAX_ACCESS, &vaddr, &pfn);
	if (mapped_pages < 0) {
		pcache_dev_err(pcache, "dax_direct_access failed: %ld\n", mapped_pages);
		ret = mapped_pages;
		goto unlock;
	}

	if (!pfn_valid(pfn)) {
		ret = -EOPNOTSUPP;
		goto unlock;
	}

	if (mapped_pages == total_pages) {
		/* success: contiguous direct mapping */
		cache_dev->mapping = vaddr;
	} else {
		/* need vmap fallback */
		ret = build_vmap(dax_dev, total_pages, &vaddr);
		if (ret) {
			pcache_dev_err(pcache, "vmap fallback failed: %d\n", ret);
			goto unlock;
		}

		cache_dev->mapping = vaddr;
		cache_dev->use_vmap = true;
	}
	dax_read_unlock(id);

	return 0;
unlock:
	dax_read_unlock(id);
out:
	return ret;
}

void cache_dev_zero_range(struct pcache_cache_dev *cache_dev, void *pos, u32 size)
{
	memset(pos, 0, size);
	dax_flush(cache_dev->dm_dev->dax_dev, pos, size);
}

static int sb_read(struct pcache_cache_dev *cache_dev, struct pcache_sb *sb)
{
	struct pcache_sb *sb_addr = CACHE_DEV_SB(cache_dev);

	if (copy_mc_to_kernel(sb, sb_addr, sizeof(struct pcache_sb)))
		return -EIO;

	return 0;
}

static void sb_write(struct pcache_cache_dev *cache_dev, struct pcache_sb *sb)
{
	struct pcache_sb *sb_addr = CACHE_DEV_SB(cache_dev);

	memcpy_flushcache(sb_addr, sb, sizeof(struct pcache_sb));
	pmem_wmb();
}

static int sb_init(struct pcache_cache_dev *cache_dev, struct pcache_sb *sb)
{
	struct dm_pcache *pcache = CACHE_DEV_TO_PCACHE(cache_dev);
	u64 nr_segs;
	u64 cache_dev_size;
	u64 magic;
	u32 flags = 0;

	magic = le64_to_cpu(sb->magic);
	if (magic)
		return -EEXIST;

	cache_dev_size = bdev_nr_bytes(file_bdev(cache_dev->dm_dev->bdev_file));
	if (cache_dev_size < PCACHE_CACHE_DEV_SIZE_MIN) {
		pcache_dev_err(pcache, "dax device is too small, requires at least %llu",
			       PCACHE_CACHE_DEV_SIZE_MIN);
		return -ENOSPC;
	}

	nr_segs = (cache_dev_size - PCACHE_SEGMENTS_OFF) / PCACHE_SEG_SIZE;

#if defined(__BYTE_ORDER) ? (__BIG_ENDIAN == __BYTE_ORDER) : defined(__BIG_ENDIAN)
	flags |= PCACHE_SB_F_BIGENDIAN;
#endif
	sb->flags = cpu_to_le32(flags);
	sb->magic = cpu_to_le64(PCACHE_MAGIC);
	sb->seg_num = cpu_to_le32(nr_segs);
	sb->crc = cpu_to_le32(crc32c(PCACHE_CRC_SEED, (void *)(sb) + 4, sizeof(struct pcache_sb) - 4));

	cache_dev_zero_range(cache_dev, CACHE_DEV_CACHE_INFO(cache_dev),
			     PCACHE_CACHE_INFO_SIZE * PCACHE_META_INDEX_MAX +
			     PCACHE_CACHE_CTRL_SIZE);

	return 0;
}

static int sb_validate(struct pcache_cache_dev *cache_dev, struct pcache_sb *sb)
{
	struct dm_pcache *pcache = CACHE_DEV_TO_PCACHE(cache_dev);
	u32 flags;
	u32 crc;

	if (le64_to_cpu(sb->magic) != PCACHE_MAGIC) {
		pcache_dev_err(pcache, "unexpected magic: %llx\n",
			       le64_to_cpu(sb->magic));
		return -EINVAL;
	}

	crc = crc32c(PCACHE_CRC_SEED, (void *)(sb) + 4, sizeof(struct pcache_sb) - 4);
	if (crc != le32_to_cpu(sb->crc)) {
		pcache_dev_err(pcache, "corrupted sb: %u, expected: %u\n", crc, le32_to_cpu(sb->crc));
		return -EINVAL;
	}

	flags = le32_to_cpu(sb->flags);
#if defined(__BYTE_ORDER) ? (__BIG_ENDIAN == __BYTE_ORDER) : defined(__BIG_ENDIAN)
	if (!(flags & PCACHE_SB_F_BIGENDIAN)) {
		pcache_dev_err(pcache, "cache_dev is not big endian\n");
		return -EINVAL;
	}
#else
	if (flags & PCACHE_SB_F_BIGENDIAN) {
		pcache_dev_err(pcache, "cache_dev is big endian\n");
		return -EINVAL;
	}
#endif
	return 0;
}

static int cache_dev_init(struct pcache_cache_dev *cache_dev, u32 seg_num)
{
	cache_dev->seg_num = seg_num;
	cache_dev->seg_bitmap = kvcalloc(BITS_TO_LONGS(cache_dev->seg_num), sizeof(unsigned long), GFP_KERNEL);
	if (!cache_dev->seg_bitmap)
		return -ENOMEM;

	return 0;
}

static void cache_dev_exit(struct pcache_cache_dev *cache_dev)
{
	kvfree(cache_dev->seg_bitmap);
}

void cache_dev_stop(struct dm_pcache *pcache)
{
	struct pcache_cache_dev *cache_dev = &pcache->cache_dev;

	cache_dev_exit(cache_dev);
	cache_dev_dax_exit(cache_dev);
}

int cache_dev_start(struct dm_pcache *pcache)
{
	struct pcache_cache_dev *cache_dev = &pcache->cache_dev;
	struct pcache_sb sb;
	bool format = false;
	int ret;

	mutex_init(&cache_dev->seg_lock);

	ret = cache_dev_dax_init(cache_dev);
	if (ret) {
		pcache_dev_err(pcache, "failed to init cache_dev %s via dax: %d.",
			       cache_dev->dm_dev->name, ret);
		goto err;
	}

	ret = sb_read(cache_dev, &sb);
	if (ret)
		goto dax_release;

	if (le64_to_cpu(sb.magic) == 0) {
		format = true;
		ret = sb_init(cache_dev, &sb);
		if (ret < 0)
			goto dax_release;
	}

	ret = sb_validate(cache_dev, &sb);
	if (ret)
		goto dax_release;

	cache_dev->sb_flags = le32_to_cpu(sb.flags);
	ret = cache_dev_init(cache_dev, le32_to_cpu(sb.seg_num));
	if (ret)
		goto dax_release;

	if (format)
		sb_write(cache_dev, &sb);

	return 0;

dax_release:
	cache_dev_dax_exit(cache_dev);
err:
	return ret;
}

int cache_dev_get_empty_segment_id(struct pcache_cache_dev *cache_dev, u32 *seg_id)
{
	int ret;

	mutex_lock(&cache_dev->seg_lock);
	*seg_id = find_next_zero_bit(cache_dev->seg_bitmap, cache_dev->seg_num, 0);
	if (*seg_id == cache_dev->seg_num) {
		ret = -ENOSPC;
		goto unlock;
	}

	__set_bit(*seg_id, cache_dev->seg_bitmap);
	ret = 0;
unlock:
	mutex_unlock(&cache_dev->seg_lock);
	return ret;
}
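The segment allocator at the end of cache_dev.c is a first-fit bitmap scan serialized by `seg_lock`. A minimal userspace sketch of the same logic follows; the helper name and the open-coded bit scan are mine (the kernel code uses `find_next_zero_bit()` and `__set_bit()` instead), so treat this as an illustration, not the patch's implementation:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define BITS_PER_LONG_ (CHAR_BIT * (int)sizeof(unsigned long))

/* Hypothetical stand-in for cache_dev_get_empty_segment_id(): scan the
 * bitmap for the first clear bit, mark it used, return its index.
 * Returns -ENOSPC when every segment is already allocated. */
static int get_empty_segment_id(unsigned long *bitmap, unsigned int seg_num,
				unsigned int *seg_id)
{
	for (unsigned int i = 0; i < seg_num; i++) {
		unsigned long *word = &bitmap[i / BITS_PER_LONG_];
		unsigned long mask = 1UL << (i % BITS_PER_LONG_);

		if (!(*word & mask)) {
			*word |= mask;	/* claim the segment */
			*seg_id = i;
			return 0;
		}
	}
	return -ENOSPC;
}
```

The real target additionally holds `cache_dev->seg_lock` across the scan-and-set so concurrent callers cannot claim the same segment.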
drivers/md/dm-pcache/cache_dev.h (+70 lines)
/* SPDX-License-Identifier: GPL-2.0-or-later */
#ifndef _PCACHE_CACHE_DEV_H
#define _PCACHE_CACHE_DEV_H

#include <linux/device.h>
#include <linux/device-mapper.h>

#include "pcache_internal.h"

#define PCACHE_MAGIC				0x65B05EFA96C596EFULL

#define PCACHE_SB_OFF				(4 * PCACHE_KB)
#define PCACHE_SB_SIZE				(4 * PCACHE_KB)

#define PCACHE_CACHE_INFO_OFF			(PCACHE_SB_OFF + PCACHE_SB_SIZE)
#define PCACHE_CACHE_INFO_SIZE			(4 * PCACHE_KB)

#define PCACHE_CACHE_CTRL_OFF			(PCACHE_CACHE_INFO_OFF + (PCACHE_CACHE_INFO_SIZE * PCACHE_META_INDEX_MAX))
#define PCACHE_CACHE_CTRL_SIZE			(4 * PCACHE_KB)

#define PCACHE_SEGMENTS_OFF			(PCACHE_CACHE_CTRL_OFF + PCACHE_CACHE_CTRL_SIZE)
#define PCACHE_SEG_INFO_SIZE			(4 * PCACHE_KB)

#define PCACHE_CACHE_DEV_SIZE_MIN		(512 * PCACHE_MB)	/* 512 MB */
#define PCACHE_SEG_SIZE				(16 * PCACHE_MB)	/* Size of each PCACHE segment (16 MB) */

#define CACHE_DEV_SB(cache_dev)			((struct pcache_sb *)(cache_dev->mapping + PCACHE_SB_OFF))
#define CACHE_DEV_CACHE_INFO(cache_dev)		((void *)cache_dev->mapping + PCACHE_CACHE_INFO_OFF)
#define CACHE_DEV_CACHE_CTRL(cache_dev)		((void *)cache_dev->mapping + PCACHE_CACHE_CTRL_OFF)
#define CACHE_DEV_SEGMENTS(cache_dev)		((void *)cache_dev->mapping + PCACHE_SEGMENTS_OFF)
#define CACHE_DEV_SEGMENT(cache_dev, id)	((void *)CACHE_DEV_SEGMENTS(cache_dev) + (u64)id * PCACHE_SEG_SIZE)

/*
 * PCACHE SB flags configured during formatting
 *
 * The PCACHE_SB_F_xxx flags define registration requirements based on cache_dev
 * formatting. For a machine to register a cache_dev:
 * - PCACHE_SB_F_BIGENDIAN: Requires a big-endian machine.
 */
#define PCACHE_SB_F_BIGENDIAN			BIT(0)

struct pcache_sb {
	__le32 crc;
	__le32 flags;
	__le64 magic;

	__le32 seg_num;
};

struct pcache_cache_dev {
	u32 sb_flags;
	u32 seg_num;
	void *mapping;
	bool use_vmap;

	struct dm_dev *dm_dev;

	struct mutex seg_lock;
	unsigned long *seg_bitmap;
};

struct dm_pcache;
int cache_dev_start(struct dm_pcache *pcache);
void cache_dev_stop(struct dm_pcache *pcache);

void cache_dev_zero_range(struct pcache_cache_dev *cache_dev, void *pos, u32 size);

int cache_dev_get_empty_segment_id(struct pcache_cache_dev *cache_dev, u32 *seg_id);

#endif /* _PCACHE_CACHE_DEV_H */
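The offset macros above fully determine the on-pmem layout. The arithmetic can be sketched as below; `PCACHE_META_INDEX_MAX` is defined elsewhere in the patch, so the value 2 used here is an assumption, chosen to match the dual-replicated metadata described in the cover letter:

```c
#include <assert.h>
#include <stdint.h>

#define KB (1024ULL)
#define MB (1024ULL * KB)

/* Assumed value; the real constant lives in pcache_internal.h. */
#define META_INDEX_MAX 2

/* Mirror of the layout macros in cache_dev.h. */
#define SB_OFF		(4 * KB)
#define SB_SIZE		(4 * KB)
#define INFO_OFF	(SB_OFF + SB_SIZE)
#define INFO_SIZE	(4 * KB)
#define CTRL_OFF	(INFO_OFF + INFO_SIZE * META_INDEX_MAX)
#define CTRL_SIZE	(4 * KB)
#define SEGMENTS_OFF	(CTRL_OFF + CTRL_SIZE)
#define SEG_SIZE	(16 * MB)

/* Segment count computed by sb_init(): whole 16 MB segments that fit
 * after the metadata area. */
static uint64_t nr_segments(uint64_t cache_dev_bytes)
{
	return (cache_dev_bytes - SEGMENTS_OFF) / SEG_SIZE;
}
```

Under these assumptions the segment area starts 20 KB into the device, so a minimum-size 512 MB cache device holds 31 full 16 MB segments.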
drivers/md/dm-pcache/cache_gc.c (+170 lines)
// SPDX-License-Identifier: GPL-2.0-or-later
#include "cache.h"
#include "backing_dev.h"
#include "cache_dev.h"
#include "dm_pcache.h"

/**
 * cache_key_gc - Releases the reference of a cache key segment.
 * @cache: Pointer to the pcache_cache structure.
 * @key: Pointer to the cache key to be garbage collected.
 *
 * This function decrements the reference count of the cache segment
 * associated with the given key. If the reference count drops to zero,
 * the segment may be invalidated and reused.
 */
static void cache_key_gc(struct pcache_cache *cache, struct pcache_cache_key *key)
{
	cache_seg_put(key->cache_pos.cache_seg);
}

static bool need_gc(struct pcache_cache *cache, struct pcache_cache_pos *dirty_tail, struct pcache_cache_pos *key_tail)
{
	struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
	struct pcache_cache_kset_onmedia *kset_onmedia;
	void *dirty_addr, *key_addr;
	u32 segs_used, segs_gc_threshold, to_copy;
	int ret;

	dirty_addr = cache_pos_addr(dirty_tail);
	key_addr = cache_pos_addr(key_tail);
	if (dirty_addr == key_addr) {
		pcache_dev_debug(pcache, "key tail is equal to dirty tail: %u:%u\n",
				 dirty_tail->cache_seg->cache_seg_id,
				 dirty_tail->seg_off);
		return false;
	}

	kset_onmedia = (struct pcache_cache_kset_onmedia *)cache->gc_kset_onmedia_buf;

	to_copy = min(PCACHE_KSET_ONMEDIA_SIZE_MAX, PCACHE_SEG_SIZE - key_tail->seg_off);
	ret = copy_mc_to_kernel(kset_onmedia, key_addr, to_copy);
	if (ret) {
		pcache_dev_err(pcache, "failed to read kset: %d", ret);
		return false;
	}

	/* Check if kset_onmedia is corrupted */
	if (kset_onmedia->magic != PCACHE_KSET_MAGIC) {
		pcache_dev_debug(pcache, "gc error: magic is not as expected. key_tail: %u:%u magic: %llx, expected: %llx\n",
				 key_tail->cache_seg->cache_seg_id, key_tail->seg_off,
				 kset_onmedia->magic, PCACHE_KSET_MAGIC);
		return false;
	}

	/* Verify the CRC of the kset_onmedia */
	if (kset_onmedia->crc != cache_kset_crc(kset_onmedia)) {
		pcache_dev_debug(pcache, "gc error: crc is not as expected. crc: %x, expected: %x\n",
				 cache_kset_crc(kset_onmedia), kset_onmedia->crc);
		return false;
	}

	segs_used = bitmap_weight(cache->seg_map, cache->n_segs);
	segs_gc_threshold = cache->n_segs * pcache_cache_get_gc_percent(cache) / 100;
	if (segs_used < segs_gc_threshold) {
		pcache_dev_debug(pcache, "segs_used: %u, segs_gc_threshold: %u\n", segs_used, segs_gc_threshold);
		return false;
	}

	return true;
}

/**
 * last_kset_gc - Advances the garbage collection for the last kset.
 * @cache: Pointer to the pcache_cache structure.
 * @kset_onmedia: Pointer to the kset_onmedia structure for the last kset.
 */
static void last_kset_gc(struct pcache_cache *cache, struct pcache_cache_kset_onmedia *kset_onmedia)
{
	struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
	struct pcache_cache_segment *cur_seg, *next_seg;

	cur_seg = cache->key_tail.cache_seg;

	next_seg = &cache->segments[kset_onmedia->next_cache_seg_id];

	mutex_lock(&cache->key_tail_lock);
	cache->key_tail.cache_seg = next_seg;
	cache->key_tail.seg_off = 0;
	cache_encode_key_tail(cache);
	mutex_unlock(&cache->key_tail_lock);

	pcache_dev_debug(pcache, "gc advance kset seg: %u\n", cur_seg->cache_seg_id);

	spin_lock(&cache->seg_map_lock);
	__clear_bit(cur_seg->cache_seg_id, cache->seg_map);
	spin_unlock(&cache->seg_map_lock);
}

void pcache_cache_gc_fn(struct work_struct *work)
{
	struct pcache_cache *cache = container_of(work, struct pcache_cache, gc_work.work);
	struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
	struct pcache_cache_pos dirty_tail, key_tail;
	struct pcache_cache_kset_onmedia *kset_onmedia;
	struct pcache_cache_key_onmedia *key_onmedia;
	struct pcache_cache_key *key;
	int ret;
	int i;

	kset_onmedia = (struct pcache_cache_kset_onmedia *)cache->gc_kset_onmedia_buf;

	while (true) {
		if (pcache_is_stopping(pcache) || atomic_read(&cache->gc_errors))
			return;

		/* Get new tail positions */
		mutex_lock(&cache->dirty_tail_lock);
		cache_pos_copy(&dirty_tail, &cache->dirty_tail);
		mutex_unlock(&cache->dirty_tail_lock);

		mutex_lock(&cache->key_tail_lock);
		cache_pos_copy(&key_tail, &cache->key_tail);
		mutex_unlock(&cache->key_tail_lock);

		if (!need_gc(cache, &dirty_tail, &key_tail))
			break;

		if (kset_onmedia->flags & PCACHE_KSET_FLAGS_LAST) {
			/* Don't move to the next segment if dirty_tail has not moved */
			if (dirty_tail.cache_seg == key_tail.cache_seg)
				break;

			last_kset_gc(cache, kset_onmedia);
			continue;
		}

		for (i = 0; i < kset_onmedia->key_num; i++) {
			struct pcache_cache_key key_tmp = { 0 };

			key_onmedia = &kset_onmedia->data[i];

			key = &key_tmp;
			cache_key_init(&cache->req_key_tree, key);

			ret = cache_key_decode(cache, key_onmedia, key);
			if (ret) {
				/* return without re-arming the gc work, and prevent
				 * future gc, because we can't retry the
				 * partially-gc-ed kset
				 */
				atomic_inc(&cache->gc_errors);
				pcache_dev_err(pcache, "failed to decode cache key in gc\n");
				return;
			}

			cache_key_gc(cache, key);
		}

		pcache_dev_debug(pcache, "gc advance: %u:%u %u\n",
				 key_tail.cache_seg->cache_seg_id,
				 key_tail.seg_off,
				 get_kset_onmedia_size(kset_onmedia));

		mutex_lock(&cache->key_tail_lock);
		cache_pos_advance(&cache->key_tail, get_kset_onmedia_size(kset_onmedia));
		cache_encode_key_tail(cache);
		mutex_unlock(&cache->key_tail_lock);
	}

	queue_delayed_work(cache_get_wq(cache), &cache->gc_work, PCACHE_CACHE_GC_INTERVAL);
}
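The GC trigger inside need_gc() reduces to a single integer comparison against a percentage-based threshold. Isolated as a hypothetical helper (not part of the patch), note that the integer division makes the threshold round down:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the threshold check in need_gc(): start reclaiming once the
 * number of in-use segments reaches n_segs * gc_percent / 100. */
static bool gc_threshold_reached(unsigned int segs_used, unsigned int n_segs,
				 unsigned int gc_percent)
{
	return segs_used >= n_segs * gc_percent / 100;
}
```

For example, with n_segs = 31 and gc_percent = 70 the threshold is 21 (31 * 70 / 100 rounds down from 21.7), so GC begins once 21 segments are in use.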
drivers/md/dm-pcache/cache_key.c (+888 lines)
// SPDX-License-Identifier: GPL-2.0-or-later
#include "cache.h"
#include "backing_dev.h"
#include "cache_dev.h"
#include "dm_pcache.h"

struct pcache_cache_kset_onmedia pcache_empty_kset = { 0 };

void cache_key_init(struct pcache_cache_tree *cache_tree, struct pcache_cache_key *key)
{
	kref_init(&key->ref);
	key->cache_tree = cache_tree;
	INIT_LIST_HEAD(&key->list_node);
	RB_CLEAR_NODE(&key->rb_node);
}

struct pcache_cache_key *cache_key_alloc(struct pcache_cache_tree *cache_tree, gfp_t gfp_mask)
{
	struct pcache_cache_key *key;

	key = mempool_alloc(&cache_tree->key_pool, gfp_mask);
	if (!key)
		return NULL;

	memset(key, 0, sizeof(struct pcache_cache_key));
	cache_key_init(cache_tree, key);

	return key;
}

/**
 * cache_key_get - Increment the reference count of a cache key.
 * @key: Pointer to the pcache_cache_key structure.
 *
 * This function increments the reference count of the specified cache key,
 * ensuring that it is not freed while still in use.
 */
void cache_key_get(struct pcache_cache_key *key)
{
	kref_get(&key->ref);
}

/**
 * cache_key_destroy - Free a cache key structure when its reference count drops to zero.
 * @ref: Pointer to the kref structure.
 *
 * This function is called when the reference count of the cache key reaches zero.
 * It frees the allocated cache key back to the mempool.
 */
static void cache_key_destroy(struct kref *ref)
{
	struct pcache_cache_key *key = container_of(ref, struct pcache_cache_key, ref);
	struct pcache_cache_tree *cache_tree = key->cache_tree;

	mempool_free(key, &cache_tree->key_pool);
}

void cache_key_put(struct pcache_cache_key *key)
{
	kref_put(&key->ref, cache_key_destroy);
}

void cache_pos_advance(struct pcache_cache_pos *pos, u32 len)
{
	/* Ensure enough space remains in the current segment */
	BUG_ON(cache_seg_remain(pos) < len);

	pos->seg_off += len;
}

static void cache_key_encode(struct pcache_cache *cache,
			     struct pcache_cache_key_onmedia *key_onmedia,
			     struct pcache_cache_key *key)
{
	key_onmedia->off = key->off;
	key_onmedia->len = key->len;

	key_onmedia->cache_seg_id = key->cache_pos.cache_seg->cache_seg_id;
	key_onmedia->cache_seg_off = key->cache_pos.seg_off;

	key_onmedia->seg_gen = key->seg_gen;
	key_onmedia->flags = key->flags;

	if (cache_data_crc_on(cache))
		key_onmedia->data_crc = cache_key_data_crc(key);
}

int cache_key_decode(struct pcache_cache *cache,
		     struct pcache_cache_key_onmedia *key_onmedia,
		     struct pcache_cache_key *key)
{
	struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);

	key->off = key_onmedia->off;
	key->len = key_onmedia->len;

	key->cache_pos.cache_seg = &cache->segments[key_onmedia->cache_seg_id];
	key->cache_pos.seg_off = key_onmedia->cache_seg_off;

	key->seg_gen = key_onmedia->seg_gen;
	key->flags = key_onmedia->flags;

	if (cache_data_crc_on(cache) &&
	    key_onmedia->data_crc != cache_key_data_crc(key)) {
		pcache_dev_err(pcache, "key: %llu:%u seg %u:%u data_crc error: %x, expected: %x\n",
			       key->off, key->len, key->cache_pos.cache_seg->cache_seg_id,
			       key->cache_pos.seg_off, cache_key_data_crc(key), key_onmedia->data_crc);
		return -EIO;
	}

	return 0;
}

static void append_last_kset(struct pcache_cache *cache, u32 next_seg)
{
	struct pcache_cache_kset_onmedia kset_onmedia = { 0 };

	kset_onmedia.flags |= PCACHE_KSET_FLAGS_LAST;
	kset_onmedia.next_cache_seg_id = next_seg;
	kset_onmedia.magic = PCACHE_KSET_MAGIC;
	kset_onmedia.crc = cache_kset_crc(&kset_onmedia);

	memcpy_flushcache(get_key_head_addr(cache), &kset_onmedia, sizeof(struct pcache_cache_kset_onmedia));
	pmem_wmb();
	cache_pos_advance(&cache->key_head, sizeof(struct pcache_cache_kset_onmedia));
}

int cache_kset_close(struct pcache_cache *cache, struct pcache_cache_kset *kset)
{
	struct pcache_cache_kset_onmedia *kset_onmedia;
	u32 kset_onmedia_size;
	int ret;

	kset_onmedia = &kset->kset_onmedia;

	if (!kset_onmedia->key_num)
		return 0;

	kset_onmedia_size = struct_size(kset_onmedia, data, kset_onmedia->key_num);

	spin_lock(&cache->key_head_lock);
again:
	/* Reserve space for the last kset */
	if (cache_seg_remain(&cache->key_head) < kset_onmedia_size + sizeof(struct pcache_cache_kset_onmedia)) {
		struct pcache_cache_segment *next_seg;

		next_seg = get_cache_segment(cache);
		if (!next_seg) {
			ret = -EBUSY;
			goto out;
		}

		/* clear outdated kset in next seg */
		memcpy_flushcache(next_seg->segment.data, &pcache_empty_kset,
				  sizeof(struct pcache_cache_kset_onmedia));
		append_last_kset(cache, next_seg->cache_seg_id);
		cache->key_head.cache_seg = next_seg;
		cache->key_head.seg_off = 0;
		goto again;
	}

	kset_onmedia->magic = PCACHE_KSET_MAGIC;
	kset_onmedia->crc = cache_kset_crc(kset_onmedia);

	/* clear outdated kset after current kset */
	memcpy_flushcache(get_key_head_addr(cache) + kset_onmedia_size, &pcache_empty_kset,
			  sizeof(struct pcache_cache_kset_onmedia));
	/* write current kset into segment */
	memcpy_flushcache(get_key_head_addr(cache), kset_onmedia, kset_onmedia_size);
	pmem_wmb();

	/* reset kset_onmedia */
	memset(kset_onmedia, 0, sizeof(struct pcache_cache_kset_onmedia));
	cache_pos_advance(&cache->key_head, kset_onmedia_size);

	ret = 0;
out:
	spin_unlock(&cache->key_head_lock);

	return ret;
}

/**
 * cache_key_append - Append a cache key to the related kset.
 * @cache: Pointer to the pcache_cache structure.
 * @key: Pointer to the cache key structure to append.
 * @force_close: Close the current kset if true.
 *
 * This function appends a cache key to the appropriate kset. If the kset
 * is full, it closes the kset. If not, it queues a flush work to write
 * the kset to media.
 *
 * Returns 0 on success, or a negative error code on failure.
 */
int cache_key_append(struct pcache_cache *cache, struct pcache_cache_key *key, bool force_close)
{
	struct pcache_cache_kset *kset;
	struct pcache_cache_kset_onmedia *kset_onmedia;
	struct pcache_cache_key_onmedia *key_onmedia;
	u32 kset_id = get_kset_id(cache, key->off);
	int ret = 0;

	kset = get_kset(cache, kset_id);
	kset_onmedia = &kset->kset_onmedia;

	spin_lock(&kset->kset_lock);
	key_onmedia = &kset_onmedia->data[kset_onmedia->key_num];
	cache_key_encode(cache, key_onmedia, key);

	/* Check if the current kset has reached the maximum number of keys */
	if (++kset_onmedia->key_num == PCACHE_KSET_KEYS_MAX || force_close) {
		/* If full, close the kset */
		ret = cache_kset_close(cache, kset);
		if (ret) {
			kset_onmedia->key_num--;
			goto out;
		}
	} else {
		/* If not full, queue a delayed work to flush the kset */
		queue_delayed_work(cache_get_wq(cache), &kset->flush_work, 1 * HZ);
	}
out:
	spin_unlock(&kset->kset_lock);

	return ret;
}

/**
 * cache_subtree_walk - Traverse the cache tree.
 * @ctx: Pointer to the context structure for traversal.
 *
 * This function traverses the cache tree starting from the specified node.
 * It calls the appropriate callback functions based on the relationships
 * between the keys in the cache tree.
 *
 * Returns 0 on success, or a negative error code on failure.
 */
int cache_subtree_walk(struct pcache_cache_subtree_walk_ctx *ctx)
{
	struct pcache_cache_key *key_tmp, *key;
	struct rb_node *node_tmp;
	int ret = SUBTREE_WALK_RET_OK;

	key = ctx->key;
	node_tmp = ctx->start_node;

	while (node_tmp) {
		if (ctx->walk_done && ctx->walk_done(ctx))
			break;

		key_tmp = CACHE_KEY(node_tmp);
		/*
		 * If key_tmp ends before the start of key, continue to the next node.
		 * |----------|
		 *               |=====|
		 */
		if (cache_key_lend(key_tmp) <= cache_key_lstart(key)) {
			if (ctx->after) {
				ret = ctx->after(key, key_tmp, ctx);
				if (ret)
					goto out;
			}
			goto next;
		}

		/*
		 * If key_tmp starts after the end of key, stop traversing.
		 *          |--------|
		 * |====|
		 */
		if (cache_key_lstart(key_tmp) >= cache_key_lend(key)) {
			if (ctx->before) {
				ret = ctx->before(key, key_tmp, ctx);
				if (ret)
					goto out;
			}
			break;
		}

		/* Handle overlapping keys */
		if (cache_key_lstart(key_tmp) >= cache_key_lstart(key)) {
			/*
			 * If key_tmp encompasses the tail of key.
			 *     |----------------|	key_tmp
			 * |===========|		key
			 */
			if (cache_key_lend(key_tmp) >= cache_key_lend(key)) {
				if (ctx->overlap_tail) {
					ret = ctx->overlap_tail(key, key_tmp, ctx);
					if (ret)
						goto out;
				}
				break;
			}

			/*
			 * If key_tmp is contained within key.
			 *     |----|		key_tmp
			 * |==========|		key
			 */
			if (ctx->overlap_contain) {
				ret = ctx->overlap_contain(key, key_tmp, ctx);
				if (ret)
					goto out;
			}

			goto next;
		}

		/*
		 * If key_tmp starts before key and ends after it.
		 * |-----------|	key_tmp
		 *     |====|		key
		 */
		if (cache_key_lend(key_tmp) > cache_key_lend(key)) {
			if (ctx->overlap_contained) {
				ret = ctx->overlap_contained(key, key_tmp, ctx);
				if (ret)
					goto out;
			}
			break;
		}

		/*
		 * If key_tmp starts before key and ends within key.
		 * |--------|		key_tmp
		 *     |==========|	key
		 */
		if (ctx->overlap_head) {
			ret = ctx->overlap_head(key, key_tmp, ctx);
			if (ret)
				goto out;
		}
next:
		node_tmp = rb_next(node_tmp);
	}

out:
	if (ctx->walk_finally)
		ret = ctx->walk_finally(ctx, ret);

	return ret;
}

/**
 * cache_subtree_search - Search for a key in the cache tree.
 * @cache_subtree: Pointer to the cache tree structure.
 * @key: Pointer to the cache key to search for.
 * @parentp: Pointer to store the parent node of the found node.
 * @newp: Pointer to store the location where the new node should be inserted.
 * @delete_key_list: List to collect invalid keys for deletion.
 *
 * This function searches the cache tree for a specific key and returns
 * the node that is the predecessor of the key, or the first node if the key is
 * less than all keys in the tree. If any invalid keys are found during
 * the search, they are added to the delete_key_list for later cleanup.
 *
 * Returns a pointer to the previous node.
 */
struct rb_node *cache_subtree_search(struct pcache_cache_subtree *cache_subtree, struct pcache_cache_key *key,
				     struct rb_node **parentp, struct rb_node ***newp,
				     struct list_head *delete_key_list)
{
	struct rb_node **new, *parent = NULL;
	struct pcache_cache_key *key_tmp;
	struct rb_node *prev_node = NULL;

	new = &(cache_subtree->root.rb_node);
	while (*new) {
		key_tmp = container_of(*new, struct pcache_cache_key, rb_node);
		if (cache_key_invalid(key_tmp))
			list_add(&key_tmp->list_node, delete_key_list);

		parent = *new;
		if (key_tmp->off >= key->off) {
			new = &((*new)->rb_left);
		} else {
			prev_node = *new;
			new = &((*new)->rb_right);
		}
	}

	if (!prev_node)
		prev_node = rb_first(&cache_subtree->root);

	if (parentp)
		*parentp = parent;

	if (newp)
		*newp = new;

	return prev_node;
}

static struct pcache_cache_key *get_pre_alloc_key(struct pcache_cache_subtree_walk_ctx *ctx)
{
	struct pcache_cache_key *key;

	if (ctx->pre_alloc_key) {
		key = ctx->pre_alloc_key;
		ctx->pre_alloc_key = NULL;

		return key;
	}

	return cache_key_alloc(ctx->cache_tree, GFP_NOWAIT);
}

/**
 * fixup_overlap_tail - Adjust the key when it overlaps at the tail.
 * @key: Pointer to the new cache key being inserted.
 * @key_tmp: Pointer to the existing key that overlaps.
 * @ctx: Pointer to the context for walking the cache tree.
 *
 * This function modifies the existing key (key_tmp) when there is an
 * overlap at the tail with the new key. If the modified key becomes
 * empty, it is deleted.
 */
static int fixup_overlap_tail(struct pcache_cache_key *key,
			      struct pcache_cache_key *key_tmp,
			      struct pcache_cache_subtree_walk_ctx *ctx)
{
	/*
	 *     |----------------|	key_tmp
	 * |===========|		key
	 */
	BUG_ON(cache_key_empty(key));
	if (cache_key_empty(key_tmp)) {
		cache_key_delete(key_tmp);
		return SUBTREE_WALK_RET_RESEARCH;
	}

	cache_key_cutfront(key_tmp, cache_key_lend(key) - cache_key_lstart(key_tmp));
	if (key_tmp->len == 0) {
		cache_key_delete(key_tmp);
		return SUBTREE_WALK_RET_RESEARCH;
	}

	return SUBTREE_WALK_RET_OK;
}

/**
 * fixup_overlap_contain - Handle case where new key completely contains an existing key.
 * @key: Pointer to the new cache key being inserted.
 * @key_tmp: Pointer to the existing key that is being contained.
 * @ctx: Pointer to the context for walking the cache tree.
 *
 * This function deletes the existing key (key_tmp) when the new key
 * completely contains it. It returns SUBTREE_WALK_RET_RESEARCH to indicate that the
 * tree structure may have changed, necessitating a re-insertion of
 * the new key.
 */
static int fixup_overlap_contain(struct pcache_cache_key *key,
				 struct pcache_cache_key *key_tmp,
				 struct pcache_cache_subtree_walk_ctx *ctx)
{
	/*
	 *     |----|		key_tmp
	 * |==========|		key
	 */
	BUG_ON(cache_key_empty(key));
	cache_key_delete(key_tmp);

	return SUBTREE_WALK_RET_RESEARCH;
}

/**
 * fixup_overlap_contained - Handle overlap when a new key is contained in an existing key.
 * @key: The new cache key being inserted.
 * @key_tmp: The existing cache key that overlaps with the new key.
 * @ctx: Context for the cache tree walk.
 *
 * This function adjusts the existing key if the new key is contained
 * within it. If the existing key is empty, it indicates a placeholder key
 * that was inserted during a miss read. This placeholder will later be
 * updated with real data from the backing_dev, making it no longer an empty key.
 *
 * If we delete a key or insert a key, the structure of the entire cache tree
 * may change, requiring a full re-search of the tree to find a new insertion point.
 */
static int fixup_overlap_contained(struct pcache_cache_key *key,
				   struct pcache_cache_key *key_tmp, struct pcache_cache_subtree_walk_ctx *ctx)
{
	struct pcache_cache_tree *cache_tree = ctx->cache_tree;

	/*
	 * |-----------|	key_tmp
	 *     |====|		key
	 */
	BUG_ON(cache_key_empty(key));
	if (cache_key_empty(key_tmp)) {
		/*
		 * If key_tmp is empty, don't split it;
		 * it's a placeholder key for miss reads that will be updated later.
		 */
		cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key));
		if (key_tmp->len == 0) {
			cache_key_delete(key_tmp);
			return SUBTREE_WALK_RET_RESEARCH;
		}
	} else {
		struct pcache_cache_key *key_fixup;
		bool need_research = false;

		key_fixup = get_pre_alloc_key(ctx);
		if (!key_fixup)
			return SUBTREE_WALK_RET_NEED_KEY;

		cache_key_copy(key_fixup, key_tmp);

		/* Split key_tmp based on the new key's range */
		cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key));
		if (key_tmp->len == 0) {
			cache_key_delete(key_tmp);
			need_research = true;
		}

		/* Create a new portion for key_fixup */
		cache_key_cutfront(key_fixup, cache_key_lend(key) - cache_key_lstart(key_tmp));
		if (key_fixup->len == 0) {
			cache_key_put(key_fixup);
		} else {
			/* Insert the new key into the cache */
			cache_key_insert(cache_tree, key_fixup, false);
			need_research = true;
		}

		if (need_research)
			return SUBTREE_WALK_RET_RESEARCH;
	}

	return SUBTREE_WALK_RET_OK;
}

/**
 * fixup_overlap_head - Handle overlap when a new key overlaps with the head of an existing key.
 * @key: The new cache key being inserted.
 * @key_tmp: The existing cache key that overlaps with the new key.
 * @ctx: Context for the cache tree walk.
 *
 * This function adjusts the existing key if the new key overlaps
 * with the beginning of it. If the resulting key length is zero
 * after the adjustment, the key is deleted. This indicates that
 * the key no longer holds valid data and requires the tree to be
 * re-searched for a new insertion point.
 */
static int fixup_overlap_head(struct pcache_cache_key *key,
			      struct pcache_cache_key *key_tmp, struct pcache_cache_subtree_walk_ctx *ctx)
{
	/*
	 * |--------|		key_tmp
	 *     |==========|	key
	 */
	BUG_ON(cache_key_empty(key));
	/* Adjust key_tmp by cutting back based on the new key's start */
	cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key));
	if (key_tmp->len == 0) {
		/* If the adjusted key_tmp length is zero, delete it */
		cache_key_delete(key_tmp);
		return SUBTREE_WALK_RET_RESEARCH;
	}

	return SUBTREE_WALK_RET_OK;
}

/**
 * cache_key_insert - Insert a new cache key into the cache tree.
 * @cache_tree: Pointer to the cache_tree structure.
 * @key: The cache key to insert.
 * @fixup: Indicates whether overlapping keys should be fixed up during insertion.
 *
 * This function searches for the appropriate location to insert
 * a new cache key into the cache tree. It handles key overlaps
 * and ensures any invalid keys are removed before insertion.
 */
void cache_key_insert(struct pcache_cache_tree *cache_tree, struct pcache_cache_key *key, bool fixup)
{
	struct pcache_cache *cache = cache_tree->cache;
	struct pcache_cache_subtree_walk_ctx walk_ctx = { 0 };
	struct rb_node **new, *parent = NULL;
	struct pcache_cache_subtree *cache_subtree;
	struct pcache_cache_key *key_tmp = NULL, *key_next;
	struct rb_node *prev_node = NULL;
	LIST_HEAD(delete_key_list);
	int ret;

	cache_subtree = get_subtree(cache_tree, key->off);
	key->cache_subtree = cache_subtree;
search:
	prev_node = cache_subtree_search(cache_subtree, key, &parent, &new, &delete_key_list);
	if (!list_empty(&delete_key_list)) {
		/* Remove invalid keys from the delete list */
		list_for_each_entry_safe(key_tmp, key_next, &delete_key_list, list_node) {
			list_del_init(&key_tmp->list_node);
			cache_key_delete(key_tmp);
		}
		goto search;
	}

	if (fixup) {
		/* Set up the context with the cache, start node, and new key */
		walk_ctx.cache_tree = cache_tree;
		walk_ctx.start_node = prev_node;
		walk_ctx.key = key;

		/* Assign overlap handling functions for different scenarios */
		walk_ctx.overlap_tail = fixup_overlap_tail;
		walk_ctx.overlap_head = fixup_overlap_head;
		walk_ctx.overlap_contain = fixup_overlap_contain;
		walk_ctx.overlap_contained = fixup_overlap_contained;

		ret = cache_subtree_walk(&walk_ctx);
		switch (ret) {
		case SUBTREE_WALK_RET_OK:
			break;
		case SUBTREE_WALK_RET_RESEARCH:
			goto search;
		case SUBTREE_WALK_RET_NEED_KEY:
			spin_unlock(&cache_subtree->tree_lock);
			pcache_dev_debug(CACHE_TO_PCACHE(cache), "allocate pre_alloc_key with GFP_NOIO");
			walk_ctx.pre_alloc_key = cache_key_alloc(cache_tree, GFP_NOIO);
			spin_lock(&cache_subtree->tree_lock);
			goto search;
		default:
			BUG();
		}
	}
628 + if (walk_ctx.pre_alloc_key) 629 + cache_key_put(walk_ctx.pre_alloc_key); 630 + 631 + /* Link and insert the new key into the red-black tree */ 632 + rb_link_node(&key->rb_node, parent, new); 633 + rb_insert_color(&key->rb_node, &cache_subtree->root); 634 + } 635 + 636 + /** 637 + * clean_fn - Cleanup function to remove invalid keys from the cache tree. 638 + * @work: Pointer to the work_struct associated with the cleanup. 639 + * 640 + * This function cleans up invalid keys from the cache tree in the background 641 + * after a cache segment has been invalidated during cache garbage collection. 642 + * It processes a maximum of PCACHE_CLEAN_KEYS_MAX keys per iteration and holds 643 + * the tree lock to ensure thread safety. 644 + */ 645 + void clean_fn(struct work_struct *work) 646 + { 647 + struct pcache_cache *cache = container_of(work, struct pcache_cache, clean_work); 648 + struct pcache_cache_subtree *cache_subtree; 649 + struct rb_node *node; 650 + struct pcache_cache_key *key; 651 + int i, count; 652 + 653 + for (i = 0; i < cache->req_key_tree.n_subtrees; i++) { 654 + cache_subtree = &cache->req_key_tree.subtrees[i]; 655 + 656 + again: 657 + if (pcache_is_stopping(CACHE_TO_PCACHE(cache))) 658 + return; 659 + 660 + /* Delete up to PCACHE_CLEAN_KEYS_MAX keys in one iteration */ 661 + count = 0; 662 + spin_lock(&cache_subtree->tree_lock); 663 + node = rb_first(&cache_subtree->root); 664 + while (node) { 665 + key = CACHE_KEY(node); 666 + node = rb_next(node); 667 + if (cache_key_invalid(key)) { 668 + count++; 669 + cache_key_delete(key); 670 + } 671 + 672 + if (count >= PCACHE_CLEAN_KEYS_MAX) { 673 + /* Unlock and pause before continuing cleanup */ 674 + spin_unlock(&cache_subtree->tree_lock); 675 + usleep_range(1000, 2000); 676 + goto again; 677 + } 678 + } 679 + spin_unlock(&cache_subtree->tree_lock); 680 + } 681 + } 682 + 683 + /* 684 + * kset_flush_fn - Flush work for a cache kset. 
685 + * 686 + * This function is called when a kset flush work is queued from 687 + * cache_key_append(). If the kset is full, it will be closed 688 + * immediately. If not, the flush work will be queued for later closure. 689 + * 690 + * If cache_kset_close detects that a new segment is required to store 691 + * the kset and there are no available segments, it will return an error. 692 + * In this scenario, a retry will be attempted. 693 + */ 694 + void kset_flush_fn(struct work_struct *work) 695 + { 696 + struct pcache_cache_kset *kset = container_of(work, struct pcache_cache_kset, flush_work.work); 697 + struct pcache_cache *cache = kset->cache; 698 + int ret; 699 + 700 + if (pcache_is_stopping(CACHE_TO_PCACHE(cache))) 701 + return; 702 + 703 + spin_lock(&kset->kset_lock); 704 + ret = cache_kset_close(cache, kset); 705 + spin_unlock(&kset->kset_lock); 706 + 707 + if (ret) { 708 + /* Failed to flush kset, schedule a retry. */ 709 + queue_delayed_work(cache_get_wq(cache), &kset->flush_work, msecs_to_jiffies(100)); 710 + } 711 + } 712 + 713 + static int kset_replay(struct pcache_cache *cache, struct pcache_cache_kset_onmedia *kset_onmedia) 714 + { 715 + struct pcache_cache_key_onmedia *key_onmedia; 716 + struct pcache_cache_subtree *cache_subtree; 717 + struct pcache_cache_key *key; 718 + int ret; 719 + int i; 720 + 721 + for (i = 0; i < kset_onmedia->key_num; i++) { 722 + key_onmedia = &kset_onmedia->data[i]; 723 + 724 + key = cache_key_alloc(&cache->req_key_tree, GFP_NOIO); 725 + ret = cache_key_decode(cache, key_onmedia, key); 726 + if (ret) { 727 + cache_key_put(key); 728 + goto err; 729 + } 730 + 731 + __set_bit(key->cache_pos.cache_seg->cache_seg_id, cache->seg_map); 732 + 733 + /* Check if the segment generation is valid for insertion. 
*/ 734 + if (key->seg_gen < key->cache_pos.cache_seg->gen) { 735 + cache_key_put(key); 736 + } else { 737 + cache_subtree = get_subtree(&cache->req_key_tree, key->off); 738 + spin_lock(&cache_subtree->tree_lock); 739 + cache_key_insert(&cache->req_key_tree, key, true); 740 + spin_unlock(&cache_subtree->tree_lock); 741 + } 742 + 743 + cache_seg_get(key->cache_pos.cache_seg); 744 + } 745 + 746 + return 0; 747 + err: 748 + return ret; 749 + } 750 + 751 + int cache_replay(struct pcache_cache *cache) 752 + { 753 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 754 + struct pcache_cache_pos pos_tail; 755 + struct pcache_cache_pos *pos; 756 + struct pcache_cache_kset_onmedia *kset_onmedia; 757 + u32 to_copy, count = 0; 758 + int ret = 0; 759 + 760 + kset_onmedia = kzalloc(PCACHE_KSET_ONMEDIA_SIZE_MAX, GFP_KERNEL); 761 + if (!kset_onmedia) 762 + return -ENOMEM; 763 + 764 + cache_pos_copy(&pos_tail, &cache->key_tail); 765 + pos = &pos_tail; 766 + 767 + /* 768 + * In cache replaying stage, there is no other one will access 769 + * cache->seg_map, so we can set bit here without cache->seg_map_lock. 770 + */ 771 + __set_bit(pos->cache_seg->cache_seg_id, cache->seg_map); 772 + 773 + while (true) { 774 + to_copy = min(PCACHE_KSET_ONMEDIA_SIZE_MAX, PCACHE_SEG_SIZE - pos->seg_off); 775 + ret = copy_mc_to_kernel(kset_onmedia, cache_pos_addr(pos), to_copy); 776 + if (ret) { 777 + ret = -EIO; 778 + goto out; 779 + } 780 + 781 + if (kset_onmedia->magic != PCACHE_KSET_MAGIC || 782 + kset_onmedia->crc != cache_kset_crc(kset_onmedia)) { 783 + break; 784 + } 785 + 786 + /* Process the last kset and prepare for the next segment. 
*/ 787 + if (kset_onmedia->flags & PCACHE_KSET_FLAGS_LAST) { 788 + struct pcache_cache_segment *next_seg; 789 + 790 + pcache_dev_debug(pcache, "last kset replay, next: %u\n", kset_onmedia->next_cache_seg_id); 791 + 792 + next_seg = &cache->segments[kset_onmedia->next_cache_seg_id]; 793 + 794 + pos->cache_seg = next_seg; 795 + pos->seg_off = 0; 796 + 797 + __set_bit(pos->cache_seg->cache_seg_id, cache->seg_map); 798 + continue; 799 + } 800 + 801 + /* Replay the kset and check for errors. */ 802 + ret = kset_replay(cache, kset_onmedia); 803 + if (ret) 804 + goto out; 805 + 806 + /* Advance the position after processing the kset. */ 807 + cache_pos_advance(pos, get_kset_onmedia_size(kset_onmedia)); 808 + if (++count > 512) { 809 + cond_resched(); 810 + count = 0; 811 + } 812 + } 813 + 814 + /* Update the key_head position after replaying. */ 815 + spin_lock(&cache->key_head_lock); 816 + cache_pos_copy(&cache->key_head, pos); 817 + spin_unlock(&cache->key_head_lock); 818 + out: 819 + kfree(kset_onmedia); 820 + return ret; 821 + } 822 + 823 + int cache_tree_init(struct pcache_cache *cache, struct pcache_cache_tree *cache_tree, u32 n_subtrees) 824 + { 825 + int ret; 826 + u32 i; 827 + 828 + cache_tree->cache = cache; 829 + cache_tree->n_subtrees = n_subtrees; 830 + 831 + ret = mempool_init_slab_pool(&cache_tree->key_pool, 1024, key_cache); 832 + if (ret) 833 + goto err; 834 + 835 + /* 836 + * Allocate and initialize the subtrees array. 837 + * Each element is a cache tree structure that contains 838 + * an RB tree root and a spinlock for protecting its contents. 
839 + */ 840 + cache_tree->subtrees = kvcalloc(cache_tree->n_subtrees, sizeof(struct pcache_cache_subtree), GFP_KERNEL); 841 + if (!cache_tree->subtrees) { 842 + ret = -ENOMEM; 843 + goto key_pool_exit; 844 + } 845 + 846 + for (i = 0; i < cache_tree->n_subtrees; i++) { 847 + struct pcache_cache_subtree *cache_subtree = &cache_tree->subtrees[i]; 848 + 849 + cache_subtree->root = RB_ROOT; 850 + spin_lock_init(&cache_subtree->tree_lock); 851 + } 852 + 853 + return 0; 854 + 855 + key_pool_exit: 856 + mempool_exit(&cache_tree->key_pool); 857 + err: 858 + return ret; 859 + } 860 + 861 + void cache_tree_clear(struct pcache_cache_tree *cache_tree) 862 + { 863 + struct pcache_cache_subtree *cache_subtree; 864 + struct rb_node *node; 865 + struct pcache_cache_key *key; 866 + u32 i; 867 + 868 + for (i = 0; i < cache_tree->n_subtrees; i++) { 869 + cache_subtree = &cache_tree->subtrees[i]; 870 + 871 + spin_lock(&cache_subtree->tree_lock); 872 + node = rb_first(&cache_subtree->root); 873 + while (node) { 874 + key = CACHE_KEY(node); 875 + node = rb_next(node); 876 + 877 + cache_key_delete(key); 878 + } 879 + spin_unlock(&cache_subtree->tree_lock); 880 + } 881 + } 882 + 883 + void cache_tree_exit(struct pcache_cache_tree *cache_tree) 884 + { 885 + cache_tree_clear(cache_tree); 886 + kvfree(cache_tree->subtrees); 887 + mempool_exit(&cache_tree->key_pool); 888 + }
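[Reviewer note] The four fixup_overlap_* callbacks above (plus the before/after cases of the walk) partition every possible relationship between a new key and an existing node. As a toy sketch of that dispatch logic only (plain Python, half-open ranges, not the kernel code; equal ranges fall into the "contained" case because it is tested first):

```python
def classify(key_start, key_end, tmp_start, tmp_end):
    """Name the walk callback dm-pcache would dispatch for a new key
    [key_start, key_end) versus an existing node [tmp_start, tmp_end).
    The head/tail names refer to which end of the NEW key overlaps."""
    if key_end <= tmp_start:
        return "before"              # key entirely precedes the node
    if key_start >= tmp_end:
        return "after"               # walk moves on to the next node
    if key_start >= tmp_start and key_end <= tmp_end:
        return "overlap_contained"   # key fully inside the node
    if key_start <= tmp_start and key_end >= tmp_end:
        return "overlap_contain"     # key fully covers the node
    if key_start < tmp_start:
        return "overlap_tail"        # key's tail overlaps the node's head
    return "overlap_head"            # key's head overlaps the node's tail
```

During insertion these cases trim or split the existing node (cutfront/cutback) so that cached extents never intersect; during reads the same cases decide which byte ranges are copied from the cache and which become miss requests.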
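[Reviewer note] cache_replay() above scans forward from key_tail, accepting each kset only if both its magic and CRC check out, and treats the first invalid record as the end of the log. A minimal sketch of that replay pattern (the 12-byte "magic, crc, len" record layout here is an assumption for illustration, not the on-media kset format):

```python
import struct
import zlib

MAGIC = 0x70434845  # arbitrary value for this sketch


def encode_record(payload: bytes) -> bytes:
    """Prepend a little-endian (magic, crc32(payload), len) header."""
    return struct.pack("<III", MAGIC, zlib.crc32(payload), len(payload)) + payload


def replay(log: bytes):
    """Yield payloads of valid records, stopping at the first record whose
    magic or CRC fails -- a torn write or unwritten area ends the log."""
    off = 0
    while off + 12 <= len(log):
        magic, crc, ln = struct.unpack_from("<III", log, off)
        payload = log[off + 12 : off + 12 + ln]
        if magic != MAGIC or len(payload) != ln or zlib.crc32(payload) != crc:
            break  # replay ends here, as cache_replay() breaks the loop
        yield payload
        off += 12 + ln
```

Because each record carries its own checksum, a crash mid-append simply leaves a record that fails validation, and replay stops cleanly at the last fully written kset.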
drivers/md/dm-pcache/cache_req.c
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "cache.h"
+#include "backing_dev.h"
+#include "cache_dev.h"
+#include "dm_pcache.h"
+
+static int cache_data_head_init(struct pcache_cache *cache)
+{
+	struct pcache_cache_segment *next_seg;
+	struct pcache_cache_data_head *data_head;
+
+	data_head = get_data_head(cache);
+	next_seg = get_cache_segment(cache);
+	if (!next_seg)
+		return -EBUSY;
+
+	cache_seg_get(next_seg);
+	data_head->head_pos.cache_seg = next_seg;
+	data_head->head_pos.seg_off = 0;
+
+	return 0;
+}
+
+/**
+ * cache_data_alloc - Allocate data for a cache key.
+ * @cache: Pointer to the cache structure.
+ * @key: Pointer to the cache key to allocate data for.
+ *
+ * This function tries to allocate space from the cache segment specified by the
+ * data head. If the remaining space in the segment is insufficient to allocate
+ * the requested length for the cache key, it will allocate whatever is available
+ * and adjust the key's length accordingly. This function does not allocate
+ * space that crosses segment boundaries.
+ */
+static int cache_data_alloc(struct pcache_cache *cache, struct pcache_cache_key *key)
+{
+	struct pcache_cache_data_head *data_head;
+	struct pcache_cache_pos *head_pos;
+	struct pcache_cache_segment *cache_seg;
+	u32 seg_remain;
+	u32 allocated = 0, to_alloc;
+	int ret = 0;
+
+	preempt_disable();
+	data_head = get_data_head(cache);
+again:
+	to_alloc = key->len - allocated;
+	if (!data_head->head_pos.cache_seg) {
+		seg_remain = 0;
+	} else {
+		cache_pos_copy(&key->cache_pos, &data_head->head_pos);
+		key->seg_gen = key->cache_pos.cache_seg->gen;
+
+		head_pos = &data_head->head_pos;
+		cache_seg = head_pos->cache_seg;
+		seg_remain = cache_seg_remain(head_pos);
+	}
+
+	if (seg_remain > to_alloc) {
+		/* If the remaining space in the segment is sufficient for the cache key, allocate it. */
+		cache_pos_advance(head_pos, to_alloc);
+		allocated += to_alloc;
+		cache_seg_get(cache_seg);
+	} else if (seg_remain) {
+		/* If the remaining space is not enough, allocate what is available and adjust the key length. */
+		cache_pos_advance(head_pos, seg_remain);
+		key->len = seg_remain;
+
+		/* Get for key: obtain a reference to the cache segment for the key. */
+		cache_seg_get(cache_seg);
+		/* Put for head_pos->cache_seg: release the reference for the current head's segment. */
+		cache_seg_put(head_pos->cache_seg);
+		head_pos->cache_seg = NULL;
+	} else {
+		/* Initialize a new data head if no segment is available. */
+		ret = cache_data_head_init(cache);
+		if (ret)
+			goto out;
+
+		goto again;
+	}
+
+out:
+	preempt_enable();
+
+	return ret;
+}
+
+static int cache_copy_from_req_bio(struct pcache_cache *cache, struct pcache_cache_key *key,
+		struct pcache_request *pcache_req, u32 bio_off)
+{
+	struct pcache_cache_pos *pos = &key->cache_pos;
+	struct pcache_segment *segment;
+
+	segment = &pos->cache_seg->segment;
+
+	return segment_copy_from_bio(segment, pos->seg_off, key->len, pcache_req->bio, bio_off);
+}
+
+static int cache_copy_to_req_bio(struct pcache_cache *cache, struct pcache_request *pcache_req,
+		u32 bio_off, u32 len, struct pcache_cache_pos *pos, u64 key_gen)
+{
+	struct pcache_cache_segment *cache_seg = pos->cache_seg;
+	struct pcache_segment *segment = &cache_seg->segment;
+	int ret;
+
+	spin_lock(&cache_seg->gen_lock);
+	if (key_gen < cache_seg->gen) {
+		spin_unlock(&cache_seg->gen_lock);
+		return -EINVAL;
+	}
+
+	ret = segment_copy_to_bio(segment, pos->seg_off, len, pcache_req->bio, bio_off);
+	spin_unlock(&cache_seg->gen_lock);
+
+	return ret;
+}
+
+/**
+ * miss_read_end_req - Handle the end of a miss read request.
+ * @backing_req: Pointer to the request structure.
+ * @read_ret: Return value of the read.
+ *
+ * This function is called when a backing request to read data from
+ * the backing_dev has completed. If the key associated with the request
+ * is still empty (a placeholder), it allocates cache space for the key,
+ * copies the data read from the bio into the cache, and updates
+ * the key's status. If the key has been overwritten by a write
+ * request in the meantime, it is deleted from the cache tree and
+ * no further action is taken.
+ */
+static void miss_read_end_req(struct pcache_backing_dev_req *backing_req, int read_ret)
+{
+	void *priv_data = backing_req->priv_data;
+	struct pcache_request *pcache_req = backing_req->req.upper_req;
+	struct pcache_cache *cache = backing_req->backing_dev->cache;
+	int ret;
+
+	if (priv_data) {
+		struct pcache_cache_key *key;
+		struct pcache_cache_subtree *cache_subtree;
+
+		key = (struct pcache_cache_key *)priv_data;
+		cache_subtree = key->cache_subtree;
+
+		/*
+		 * If this key had been deleted from the cache_subtree by a write,
+		 * its EMPTY flag would have been cleared; so if cache_key_empty()
+		 * returns true, the key is still in the cache_subtree.
+		 */
+		spin_lock(&cache_subtree->tree_lock);
+		if (cache_key_empty(key)) {
+			/* Check if the backing request was successful. */
+			if (read_ret) {
+				cache_key_delete(key);
+				goto unlock;
+			}
+
+			/* Allocate cache space for the key and copy data from the backing_dev. */
+			ret = cache_data_alloc(cache, key);
+			if (ret) {
+				cache_key_delete(key);
+				goto unlock;
+			}
+
+			ret = cache_copy_from_req_bio(cache, key, pcache_req, backing_req->req.bio_off);
+			if (ret) {
+				cache_seg_put(key->cache_pos.cache_seg);
+				cache_key_delete(key);
+				goto unlock;
+			}
+			key->flags &= ~PCACHE_CACHE_KEY_FLAGS_EMPTY;
+			key->flags |= PCACHE_CACHE_KEY_FLAGS_CLEAN;
+
+			/* Append the key to the cache. */
+			ret = cache_key_append(cache, key, false);
+			if (ret) {
+				cache_seg_put(key->cache_pos.cache_seg);
+				cache_key_delete(key);
+				goto unlock;
+			}
+		}
+unlock:
+		spin_unlock(&cache_subtree->tree_lock);
+		cache_key_put(key);
+	}
+}
+
+/**
+ * submit_cache_miss_req - Submit a backing request when cache data is missing
+ * @cache: The cache context that manages cache operations
+ * @backing_req: The cache request containing information about the read request
+ *
+ * This function is used to handle cases where a cache read request cannot locate
+ * the required data in the cache. When such a miss occurs during cache_subtree_walk,
+ * it triggers a backing read request to fetch data from the backing storage.
+ *
+ * If backing_req->priv_data is set, it points to a pcache_cache_key representing
+ * a new placeholder key; the function calls cache_key_insert to add that key to
+ * the cache tree before the request is submitted.
+ */
+static void submit_cache_miss_req(struct pcache_cache *cache, struct pcache_backing_dev_req *backing_req)
+{
+	if (backing_req->priv_data) {
+		struct pcache_cache_key *key;
+
+		/* Insert the key into the cache if priv_data is set */
+		key = (struct pcache_cache_key *)backing_req->priv_data;
+		cache_key_insert(&cache->req_key_tree, key, true);
+	}
+	backing_dev_req_submit(backing_req, false);
+}
+
+static void cache_miss_req_free(struct pcache_backing_dev_req *backing_req)
+{
+	struct pcache_cache_key *key;
+
+	if (backing_req->priv_data) {
+		key = backing_req->priv_data;
+		backing_req->priv_data = NULL;
+		cache_key_put(key);	/* for ->priv_data */
+		cache_key_put(key);	/* for the initial ref from allocation */
+	}
+
+	backing_dev_req_end(backing_req);
+}
+
+static struct pcache_backing_dev_req *cache_miss_req_alloc(struct pcache_cache *cache,
+		struct pcache_request *parent,
+		gfp_t gfp_mask)
+{
+	struct pcache_backing_dev *backing_dev = cache->backing_dev;
+	struct pcache_backing_dev_req *backing_req;
+	struct pcache_cache_key *key = NULL;
+	struct pcache_backing_dev_req_opts req_opts = { 0 };
+
+	req_opts.type = BACKING_DEV_REQ_TYPE_REQ;
+	req_opts.gfp_mask = gfp_mask;
+	req_opts.req.upper_req = parent;
+
+	backing_req = backing_dev_req_alloc(backing_dev, &req_opts);
+	if (!backing_req)
+		return NULL;
+
+	key = cache_key_alloc(&cache->req_key_tree, gfp_mask);
+	if (!key)
+		goto free_backing_req;
+
+	cache_key_get(key);
+	backing_req->priv_data = key;
+
+	return backing_req;
+
+free_backing_req:
+	cache_miss_req_free(backing_req);
+	return NULL;
+}
+
+static void cache_miss_req_init(struct pcache_cache *cache,
+		struct pcache_backing_dev_req *backing_req,
+		struct pcache_request *parent,
+		u32 off, u32 len, bool insert_key)
+{
+	struct pcache_cache_key *key;
+	struct pcache_backing_dev_req_opts req_opts = { 0 };
+
+	req_opts.type = BACKING_DEV_REQ_TYPE_REQ;
+	req_opts.req.upper_req = parent;
+	req_opts.req.req_off = off;
+	req_opts.req.len = len;
+	req_opts.end_fn = miss_read_end_req;
+
+	backing_dev_req_init(backing_req, &req_opts);
+
+	if (insert_key) {
+		key = backing_req->priv_data;
+		key->off = parent->off + off;
+		key->len = len;
+		key->flags |= PCACHE_CACHE_KEY_FLAGS_EMPTY;
+	} else {
+		key = backing_req->priv_data;
+		backing_req->priv_data = NULL;
+		cache_key_put(key);
+		cache_key_put(key);
+	}
+}
+
+static struct pcache_backing_dev_req *get_pre_alloc_req(struct pcache_cache_subtree_walk_ctx *ctx)
+{
+	struct pcache_cache *cache = ctx->cache_tree->cache;
+	struct pcache_request *pcache_req = ctx->pcache_req;
+	struct pcache_backing_dev_req *backing_req;
+
+	if (ctx->pre_alloc_req) {
+		backing_req = ctx->pre_alloc_req;
+		ctx->pre_alloc_req = NULL;
+
+		return backing_req;
+	}
+
+	return cache_miss_req_alloc(cache, pcache_req, GFP_NOWAIT);
+}
+
+/*
+ * While walking the cache tree to locate cached data, this function handles
+ * the case where the requested data range lies entirely before an existing
+ * cache node (key_tmp). That means the target data is absent from the cache
+ * (a cache miss).
+ *
+ * To fulfill this portion of the read request, the function creates a
+ * backing request (backing_req) for the missing data range represented
+ * by key. It then appends this request to the submission list in the
+ * ctx, which will later be processed to retrieve the data from backing
+ * storage. After setting up the backing request, req_done in ctx is
+ * updated to reflect the length of the handled range, and the range
+ * in key is adjusted by trimming off the portion that is now handled.
+ *
+ * The scenario handled here:
+ *
+ *	         |--------|		key_tmp (existing cached range)
+ *	|====|				key (requested range, preceding key_tmp)
+ *
+ * Since key is before key_tmp, the requested data range is missing from
+ * the cache (cache miss) and must be retrieved from backing storage.
+ */
+static int read_before(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+		struct pcache_cache_subtree_walk_ctx *ctx)
+{
+	struct pcache_backing_dev_req *backing_req;
+	struct pcache_cache *cache = ctx->cache_tree->cache;
+
+	/*
+	 * In this scenario, key represents a range that precedes key_tmp,
+	 * meaning the requested data range is missing from the cache tree
+	 * and must be retrieved from the backing_dev.
+	 */
+	backing_req = get_pre_alloc_req(ctx);
+	if (!backing_req)
+		return SUBTREE_WALK_RET_NEED_REQ;
+
+	cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, key->len, true);
+
+	list_add(&backing_req->node, ctx->submit_req_list);
+	ctx->req_done += key->len;
+	cache_key_cutfront(key, key->len);
+
+	return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ * During cache_subtree_walk, this function manages the scenario where the
+ * requested data range partially overlaps an existing cache node (key_tmp).
+ *
+ *	    |----------------|		key_tmp (existing cached range)
+ *	|===========|			key (requested range, tail overlapping key_tmp)
+ */
+static int read_overlap_tail(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+		struct pcache_cache_subtree_walk_ctx *ctx)
+{
+	struct pcache_cache *cache = ctx->cache_tree->cache;
+	struct pcache_backing_dev_req *backing_req;
+	u32 io_len;
+	int ret;
+
+	/*
+	 * Calculate the length of the non-overlapping portion of key
+	 * before key_tmp, representing the data missing from the cache.
+	 */
+	io_len = cache_key_lstart(key_tmp) - cache_key_lstart(key);
+	if (io_len) {
+		backing_req = get_pre_alloc_req(ctx);
+		if (!backing_req)
+			return SUBTREE_WALK_RET_NEED_REQ;
+
+		cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, true);
+
+		list_add(&backing_req->node, ctx->submit_req_list);
+		ctx->req_done += io_len;
+		cache_key_cutfront(key, io_len);
+	}
+
+	/*
+	 * Handle the overlapping portion by calculating the length of
+	 * the remaining data in key that coincides with key_tmp.
+	 */
+	io_len = cache_key_lend(key) - cache_key_lstart(key_tmp);
+	if (cache_key_empty(key_tmp)) {
+		backing_req = get_pre_alloc_req(ctx);
+		if (!backing_req)
+			return SUBTREE_WALK_RET_NEED_REQ;
+
+		cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, false);
+		submit_cache_miss_req(cache, backing_req);
+	} else {
+		ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->pcache_req, ctx->req_done,
+				io_len, &key_tmp->cache_pos, key_tmp->seg_gen);
+		if (ret) {
+			if (ret == -EINVAL) {
+				cache_key_delete(key_tmp);
+				return SUBTREE_WALK_RET_RESEARCH;
+			}
+
+			ctx->ret = ret;
+			return SUBTREE_WALK_RET_ERR;
+		}
+	}
+
+	ctx->req_done += io_len;
+	cache_key_cutfront(key, io_len);
+
+	return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ *	    |----|			key_tmp (existing cached range)
+ *	|==========|			key (requested range, fully covering key_tmp)
+ */
+static int read_overlap_contain(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+		struct pcache_cache_subtree_walk_ctx *ctx)
+{
+	struct pcache_cache *cache = ctx->cache_tree->cache;
+	struct pcache_backing_dev_req *backing_req;
+	u32 io_len;
+	int ret;
+
+	/*
+	 * Calculate the non-overlapping part of key before key_tmp
+	 * to identify the missing data length.
+	 */
+	io_len = cache_key_lstart(key_tmp) - cache_key_lstart(key);
+	if (io_len) {
+		backing_req = get_pre_alloc_req(ctx);
+		if (!backing_req)
+			return SUBTREE_WALK_RET_NEED_REQ;
+
+		cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, true);
+
+		list_add(&backing_req->node, ctx->submit_req_list);
+
+		ctx->req_done += io_len;
+		cache_key_cutfront(key, io_len);
+	}
+
+	/* Handle the overlapping portion between key and key_tmp. */
+	io_len = key_tmp->len;
+	if (cache_key_empty(key_tmp)) {
+		backing_req = get_pre_alloc_req(ctx);
+		if (!backing_req)
+			return SUBTREE_WALK_RET_NEED_REQ;
+
+		cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, false);
+		submit_cache_miss_req(cache, backing_req);
+	} else {
+		ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->pcache_req, ctx->req_done,
+				io_len, &key_tmp->cache_pos, key_tmp->seg_gen);
+		if (ret) {
+			if (ret == -EINVAL) {
+				cache_key_delete(key_tmp);
+				return SUBTREE_WALK_RET_RESEARCH;
+			}
+
+			ctx->ret = ret;
+			return SUBTREE_WALK_RET_ERR;
+		}
+	}
+
+	ctx->req_done += io_len;
+	cache_key_cutfront(key, io_len);
+
+	return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ *	|-----------|			key_tmp (existing cached range)
+ *	    |====|			key (requested range, fully within key_tmp)
+ *
+ * If key_tmp contains valid cached data, this function copies the relevant
+ * portion to the request's bio. Otherwise, it sends a backing request to
+ * fetch the required data range.
+ */
+static int read_overlap_contained(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+		struct pcache_cache_subtree_walk_ctx *ctx)
+{
+	struct pcache_cache *cache = ctx->cache_tree->cache;
+	struct pcache_backing_dev_req *backing_req;
+	struct pcache_cache_pos pos;
+	int ret;
+
+	/*
+	 * Check if key_tmp is empty, indicating a miss. If so, initiate
+	 * a backing request to fetch the required data for key.
+	 */
+	if (cache_key_empty(key_tmp)) {
+		backing_req = get_pre_alloc_req(ctx);
+		if (!backing_req)
+			return SUBTREE_WALK_RET_NEED_REQ;
+
+		cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, key->len, false);
+		submit_cache_miss_req(cache, backing_req);
+	} else {
+		cache_pos_copy(&pos, &key_tmp->cache_pos);
+		cache_pos_advance(&pos, cache_key_lstart(key) - cache_key_lstart(key_tmp));
+
+		ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->pcache_req, ctx->req_done,
+				key->len, &pos, key_tmp->seg_gen);
+		if (ret) {
+			if (ret == -EINVAL) {
+				cache_key_delete(key_tmp);
+				return SUBTREE_WALK_RET_RESEARCH;
+			}
+
+			ctx->ret = ret;
+			return SUBTREE_WALK_RET_ERR;
+		}
+	}
+
+	ctx->req_done += key->len;
+	cache_key_cutfront(key, key->len);
+
+	return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ *	|--------|			key_tmp (existing cached range)
+ *	    |==========|		key (requested range, head overlapping key_tmp)
+ */
+static int read_overlap_head(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+		struct pcache_cache_subtree_walk_ctx *ctx)
+{
+	struct pcache_cache *cache = ctx->cache_tree->cache;
+	struct pcache_backing_dev_req *backing_req;
+	struct pcache_cache_pos pos;
+	u32 io_len;
+	int ret;
+
+	io_len = cache_key_lend(key_tmp) - cache_key_lstart(key);
+
+	if (cache_key_empty(key_tmp)) {
+		backing_req = get_pre_alloc_req(ctx);
+		if (!backing_req)
+			return SUBTREE_WALK_RET_NEED_REQ;
+
+		cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, false);
+		submit_cache_miss_req(cache, backing_req);
+	} else {
+		cache_pos_copy(&pos, &key_tmp->cache_pos);
+		cache_pos_advance(&pos, cache_key_lstart(key) - cache_key_lstart(key_tmp));
+
+		ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->pcache_req, ctx->req_done,
+				io_len, &pos, key_tmp->seg_gen);
+		if (ret) {
+			if (ret == -EINVAL) {
+				cache_key_delete(key_tmp);
+				return SUBTREE_WALK_RET_RESEARCH;
+			}
+
+			ctx->ret = ret;
+			return SUBTREE_WALK_RET_ERR;
+		}
+	}
+
+	ctx->req_done += io_len;
+	cache_key_cutfront(key, io_len);
+
+	return SUBTREE_WALK_RET_OK;
+}
+
+/**
+ * read_walk_finally - Finalize the cache read tree walk by submitting any
+ *	remaining backing requests
+ * @ctx: Context structure holding information about the cache,
+ *	read request, and submission list
+ * @ret: The return value of the walk so far.
+ *
+ * This function is called at the end of cache_subtree_walk during a
+ * cache read operation. It first iterates through the submission list of
+ * backing requests created during the walk, removing each request from
+ * the list and submitting it. Then, if any part of the range requested
+ * by key was not covered by existing cache entries, it sends one more
+ * backing request for the remaining length.
+ *
+ * This ensures all necessary backing requests for cache misses are submitted
+ * to the backing storage to retrieve any data that could not be found in
+ * the cache.
+ */
+static int read_walk_finally(struct pcache_cache_subtree_walk_ctx *ctx, int ret)
+{
+	struct pcache_cache *cache = ctx->cache_tree->cache;
+	struct pcache_backing_dev_req *backing_req, *next_req;
+	struct pcache_cache_key *key = ctx->key;
+
+	list_for_each_entry_safe(backing_req, next_req, ctx->submit_req_list, node) {
+		list_del_init(&backing_req->node);
+		submit_cache_miss_req(ctx->cache_tree->cache, backing_req);
+	}
+
+	if (ret != SUBTREE_WALK_RET_OK)
+		return ret;
+
+	if (key->len) {
+		backing_req = get_pre_alloc_req(ctx);
+		if (!backing_req)
+			return SUBTREE_WALK_RET_NEED_REQ;
+
+		cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, key->len, true);
+		submit_cache_miss_req(cache, backing_req);
+		ctx->req_done += key->len;
+	}
+
+	return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ * This function is used within cache_subtree_walk to determine whether the
+ * read operation has covered the requested data length. It compares the
+ * amount of data processed (ctx->req_done) with the total data length
+ * specified in the original request (ctx->pcache_req->data_len).
+ *
+ * If req_done meets or exceeds the required data length, the function
+ * returns true, indicating the walk is complete. Otherwise, it returns
+ * false, signaling that additional data processing is needed to fulfill
+ * the request.
+ */
+static bool read_walk_done(struct pcache_cache_subtree_walk_ctx *ctx)
+{
+	return (ctx->req_done >= ctx->pcache_req->data_len);
+}
+
+/**
+ * cache_read - Process a read request by traversing the cache tree
+ * @cache: Cache structure holding cache trees and related configurations
+ * @pcache_req: Request structure with information about the data to read
+ *
+ * This function attempts to fulfill a read request by traversing the cache
+ * subtrees to locate cached data for the requested range. If parts of the
+ * data are missing from the cache, backing requests are generated to
+ * retrieve the required segments.
+ *
+ * The function operates by initializing a key for the requested data range
+ * and preparing a context (walk_ctx) to manage the subtree traversal. The
+ * context includes pointers to functions (e.g., read_before,
+ * read_overlap_tail) that handle the specific overlap conditions
+ * encountered during the traversal. The walk_finally and walk_done
+ * callbacks manage the end stages of the traversal, while the
+ * delete_key_list and submit_req_list lists track any keys to be deleted
+ * or requests to be submitted.
+ *
+ * The function first calculates the requested range and clamps it so that
+ * it does not cross a subtree boundary. It then locks the subtree and
+ * performs a search to locate any matching keys. If there are outdated
+ * keys, these are deleted, and the search is restarted to ensure accurate
+ * data retrieval.
+ *
+ * If the requested range spans multiple subtrees, the function moves on to
+ * the next subtree once the current range has been processed. This
+ * continues until the entire requested data length has been handled.
+ */
+static int cache_read(struct pcache_cache *cache, struct pcache_request *pcache_req)
+{
+	struct pcache_cache_key key_data = { .off = pcache_req->off, .len = pcache_req->data_len };
+	struct pcache_cache_subtree *cache_subtree;
+	struct pcache_cache_key *key_tmp = NULL, *key_next;
+	struct rb_node *prev_node = NULL;
+	struct pcache_cache_key *key = &key_data;
+	struct pcache_cache_subtree_walk_ctx walk_ctx = { 0 };
+	struct pcache_backing_dev_req *backing_req, *next_req;
+	LIST_HEAD(delete_key_list);
+	LIST_HEAD(submit_req_list);
+	int ret;
+
+	walk_ctx.cache_tree = &cache->req_key_tree;
+	walk_ctx.req_done = 0;
+	walk_ctx.pcache_req = pcache_req;
+	walk_ctx.before = read_before;
+	walk_ctx.overlap_tail = read_overlap_tail;
+	walk_ctx.overlap_head = read_overlap_head;
+	walk_ctx.overlap_contain = read_overlap_contain;
+	walk_ctx.overlap_contained = read_overlap_contained;
+	walk_ctx.walk_finally = read_walk_finally;
+	walk_ctx.walk_done = read_walk_done;
+	walk_ctx.delete_key_list = &delete_key_list;
+	walk_ctx.submit_req_list = &submit_req_list;
+
+next:
+	key->off = pcache_req->off + walk_ctx.req_done;
+	key->len = pcache_req->data_len - walk_ctx.req_done;
+	if (key->len > PCACHE_CACHE_SUBTREE_SIZE - (key->off & PCACHE_CACHE_SUBTREE_SIZE_MASK))
+		key->len = PCACHE_CACHE_SUBTREE_SIZE - (key->off & PCACHE_CACHE_SUBTREE_SIZE_MASK);
+
+	cache_subtree = get_subtree(&cache->req_key_tree, key->off);
+	spin_lock(&cache_subtree->tree_lock);
+search:
+	prev_node = cache_subtree_search(cache_subtree, key, NULL, NULL, &delete_key_list);
+	if (!list_empty(&delete_key_list)) {
+		list_for_each_entry_safe(key_tmp, key_next, &delete_key_list, list_node) {
+			list_del_init(&key_tmp->list_node);
+			cache_key_delete(key_tmp);
+		}
+		goto search;
+	}
+
+	walk_ctx.start_node = prev_node;
walk_ctx.key = key; 708 + 709 + ret = cache_subtree_walk(&walk_ctx); 710 + if (ret == SUBTREE_WALK_RET_RESEARCH) 711 + goto search; 712 + spin_unlock(&cache_subtree->tree_lock); 713 + 714 + if (ret == SUBTREE_WALK_RET_ERR) { 715 + ret = walk_ctx.ret; 716 + goto out; 717 + } 718 + 719 + if (ret == SUBTREE_WALK_RET_NEED_REQ) { 720 + walk_ctx.pre_alloc_req = cache_miss_req_alloc(cache, pcache_req, GFP_NOIO); 721 + pcache_dev_debug(CACHE_TO_PCACHE(cache), "allocate pre_alloc_req with GFP_NOIO"); 722 + } 723 + 724 + if (walk_ctx.req_done < pcache_req->data_len) 725 + goto next; 726 + ret = 0; 727 + out: 728 + if (walk_ctx.pre_alloc_req) 729 + cache_miss_req_free(walk_ctx.pre_alloc_req); 730 + 731 + list_for_each_entry_safe(backing_req, next_req, &submit_req_list, node) { 732 + list_del_init(&backing_req->node); 733 + backing_dev_req_end(backing_req); 734 + } 735 + 736 + return ret; 737 + } 738 + 739 + static int cache_write(struct pcache_cache *cache, struct pcache_request *pcache_req) 740 + { 741 + struct pcache_cache_subtree *cache_subtree; 742 + struct pcache_cache_key *key; 743 + u64 offset = pcache_req->off; 744 + u32 length = pcache_req->data_len; 745 + u32 io_done = 0; 746 + int ret; 747 + 748 + while (true) { 749 + if (io_done >= length) 750 + break; 751 + 752 + key = cache_key_alloc(&cache->req_key_tree, GFP_NOIO); 753 + key->off = offset + io_done; 754 + key->len = length - io_done; 755 + if (key->len > PCACHE_CACHE_SUBTREE_SIZE - (key->off & PCACHE_CACHE_SUBTREE_SIZE_MASK)) 756 + key->len = PCACHE_CACHE_SUBTREE_SIZE - (key->off & PCACHE_CACHE_SUBTREE_SIZE_MASK); 757 + 758 + ret = cache_data_alloc(cache, key); 759 + if (ret) { 760 + cache_key_put(key); 761 + goto err; 762 + } 763 + 764 + ret = cache_copy_from_req_bio(cache, key, pcache_req, io_done); 765 + if (ret) { 766 + cache_seg_put(key->cache_pos.cache_seg); 767 + cache_key_put(key); 768 + goto err; 769 + } 770 + 771 + cache_subtree = get_subtree(&cache->req_key_tree, key->off); 772 + 
spin_lock(&cache_subtree->tree_lock); 773 + cache_key_insert(&cache->req_key_tree, key, true); 774 + ret = cache_key_append(cache, key, pcache_req->bio->bi_opf & REQ_FUA); 775 + if (ret) { 776 + cache_seg_put(key->cache_pos.cache_seg); 777 + cache_key_delete(key); 778 + goto unlock; 779 + } 780 + 781 + io_done += key->len; 782 + spin_unlock(&cache_subtree->tree_lock); 783 + } 784 + 785 + return 0; 786 + unlock: 787 + spin_unlock(&cache_subtree->tree_lock); 788 + err: 789 + return ret; 790 + } 791 + 792 + /** 793 + * cache_flush - Flush all ksets to persist any pending cache data 794 + * @cache: Pointer to the cache structure 795 + * 796 + * This function iterates through all ksets associated with the provided `cache` 797 + * and ensures that any data marked for persistence is written to media. For each 798 + * kset, it acquires the kset lock, then invokes `cache_kset_close`, which handles 799 + * the persistence logic for that kset. 800 + * 801 + * If `cache_kset_close` encounters an error, the function exits immediately with 802 + * the respective error code, preventing the flush operation from proceeding to 803 + * subsequent ksets. 804 + */ 805 + int cache_flush(struct pcache_cache *cache) 806 + { 807 + struct pcache_cache_kset *kset; 808 + u32 i, ret; 809 + 810 + for (i = 0; i < cache->n_ksets; i++) { 811 + kset = get_kset(cache, i); 812 + 813 + spin_lock(&kset->kset_lock); 814 + ret = cache_kset_close(cache, kset); 815 + spin_unlock(&kset->kset_lock); 816 + 817 + if (ret) 818 + return ret; 819 + } 820 + 821 + return 0; 822 + } 823 + 824 + int pcache_cache_handle_req(struct pcache_cache *cache, struct pcache_request *pcache_req) 825 + { 826 + struct bio *bio = pcache_req->bio; 827 + 828 + if (unlikely(bio->bi_opf & REQ_PREFLUSH)) 829 + return cache_flush(cache); 830 + 831 + if (bio_data_dir(bio) == READ) 832 + return cache_read(cache, pcache_req); 833 + 834 + return cache_write(cache, pcache_req); 835 + }
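Both cache_read() and cache_write() above clamp each key so it never crosses a PCACHE_CACHE_SUBTREE_SIZE boundary, which is what lets every chunk be handled under a single subtree's tree_lock. A minimal userspace sketch of that splitting arithmetic (the 4 MiB subtree size here is an assumed value for illustration, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: the real PCACHE_CACHE_SUBTREE_SIZE is defined
 * in the pcache headers; 4 MiB is an assumption for illustration. */
#define SUBTREE_SIZE      (4UL << 20)
#define SUBTREE_SIZE_MASK (SUBTREE_SIZE - 1)

/* Clamp one chunk of an (off, len) request so it never crosses a
 * subtree boundary -- the same arithmetic cache_read()/cache_write()
 * apply to key->len before taking a single subtree's tree_lock. */
static uint32_t subtree_chunk_len(uint64_t off, uint32_t len)
{
	uint64_t remain = SUBTREE_SIZE - (off & SUBTREE_SIZE_MASK);

	return len > remain ? (uint32_t)remain : len;
}

/* Count how many per-subtree chunks a request is split into. */
static uint32_t subtree_chunks(uint64_t off, uint32_t len)
{
	uint32_t n = 0;

	while (len) {
		uint32_t step = subtree_chunk_len(off, len);

		off += step;
		len -= step;
		n++;
	}
	return n;
}
```

A request that straddles a boundary simply becomes two keys in two subtrees, each inserted and appended under its own lock, which is the basis of the multi-tree concurrency claimed in the cover letter.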
+293
drivers/md/dm-pcache/cache_segment.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + 3 + #include "cache_dev.h" 4 + #include "cache.h" 5 + #include "backing_dev.h" 6 + #include "dm_pcache.h" 7 + 8 + static inline struct pcache_segment_info *get_seg_info_addr(struct pcache_cache_segment *cache_seg) 9 + { 10 + struct pcache_segment_info *seg_info_addr; 11 + u32 seg_id = cache_seg->segment.seg_id; 12 + void *seg_addr; 13 + 14 + seg_addr = CACHE_DEV_SEGMENT(cache_seg->cache->cache_dev, seg_id); 15 + seg_info_addr = seg_addr + PCACHE_SEG_INFO_SIZE * cache_seg->info_index; 16 + 17 + return seg_info_addr; 18 + } 19 + 20 + static void cache_seg_info_write(struct pcache_cache_segment *cache_seg) 21 + { 22 + struct pcache_segment_info *seg_info_addr; 23 + struct pcache_segment_info *seg_info = &cache_seg->cache_seg_info; 24 + 25 + mutex_lock(&cache_seg->info_lock); 26 + seg_info->header.seq++; 27 + seg_info->header.crc = pcache_meta_crc(&seg_info->header, sizeof(struct pcache_segment_info)); 28 + 29 + seg_info_addr = get_seg_info_addr(cache_seg); 30 + memcpy_flushcache(seg_info_addr, seg_info, sizeof(struct pcache_segment_info)); 31 + pmem_wmb(); 32 + 33 + cache_seg->info_index = (cache_seg->info_index + 1) % PCACHE_META_INDEX_MAX; 34 + mutex_unlock(&cache_seg->info_lock); 35 + } 36 + 37 + static int cache_seg_info_load(struct pcache_cache_segment *cache_seg) 38 + { 39 + struct pcache_segment_info *cache_seg_info_addr_base, *cache_seg_info_addr; 40 + struct pcache_cache_dev *cache_dev = cache_seg->cache->cache_dev; 41 + struct dm_pcache *pcache = CACHE_DEV_TO_PCACHE(cache_dev); 42 + u32 seg_id = cache_seg->segment.seg_id; 43 + int ret = 0; 44 + 45 + cache_seg_info_addr_base = CACHE_DEV_SEGMENT(cache_dev, seg_id); 46 + 47 + mutex_lock(&cache_seg->info_lock); 48 + cache_seg_info_addr = pcache_meta_find_latest(&cache_seg_info_addr_base->header, 49 + sizeof(struct pcache_segment_info), 50 + PCACHE_SEG_INFO_SIZE, 51 + &cache_seg->cache_seg_info); 52 + if (IS_ERR(cache_seg_info_addr)) { 53 + ret = 
PTR_ERR(cache_seg_info_addr); 54 + goto out; 55 + } else if (!cache_seg_info_addr) { 56 + ret = -EIO; 57 + goto out; 58 + } 59 + cache_seg->info_index = cache_seg_info_addr - cache_seg_info_addr_base; 60 + out: 61 + mutex_unlock(&cache_seg->info_lock); 62 + 63 + if (ret) 64 + pcache_dev_err(pcache, "can't read segment info of segment: %u, ret: %d\n", 65 + cache_seg->segment.seg_id, ret); 66 + return ret; 67 + } 68 + 69 + static int cache_seg_ctrl_load(struct pcache_cache_segment *cache_seg) 70 + { 71 + struct pcache_cache_seg_ctrl *cache_seg_ctrl = cache_seg->cache_seg_ctrl; 72 + struct pcache_cache_seg_gen cache_seg_gen, *cache_seg_gen_addr; 73 + int ret = 0; 74 + 75 + mutex_lock(&cache_seg->ctrl_lock); 76 + cache_seg_gen_addr = pcache_meta_find_latest(&cache_seg_ctrl->gen->header, 77 + sizeof(struct pcache_cache_seg_gen), 78 + sizeof(struct pcache_cache_seg_gen), 79 + &cache_seg_gen); 80 + if (IS_ERR(cache_seg_gen_addr)) { 81 + ret = PTR_ERR(cache_seg_gen_addr); 82 + goto out; 83 + } 84 + 85 + if (!cache_seg_gen_addr) { 86 + cache_seg->gen = 0; 87 + cache_seg->gen_seq = 0; 88 + cache_seg->gen_index = 0; 89 + goto out; 90 + } 91 + 92 + cache_seg->gen = cache_seg_gen.gen; 93 + cache_seg->gen_seq = cache_seg_gen.header.seq; 94 + cache_seg->gen_index = (cache_seg_gen_addr - cache_seg_ctrl->gen); 95 + out: 96 + mutex_unlock(&cache_seg->ctrl_lock); 97 + 98 + return ret; 99 + } 100 + 101 + static inline struct pcache_cache_seg_gen *get_cache_seg_gen_addr(struct pcache_cache_segment *cache_seg) 102 + { 103 + struct pcache_cache_seg_ctrl *cache_seg_ctrl = cache_seg->cache_seg_ctrl; 104 + 105 + return (cache_seg_ctrl->gen + cache_seg->gen_index); 106 + } 107 + 108 + static void cache_seg_ctrl_write(struct pcache_cache_segment *cache_seg) 109 + { 110 + struct pcache_cache_seg_gen cache_seg_gen; 111 + 112 + mutex_lock(&cache_seg->ctrl_lock); 113 + cache_seg_gen.gen = cache_seg->gen; 114 + cache_seg_gen.header.seq = ++cache_seg->gen_seq; 115 + cache_seg_gen.header.crc = 
pcache_meta_crc(&cache_seg_gen.header, 116 + sizeof(struct pcache_cache_seg_gen)); 117 + 118 + memcpy_flushcache(get_cache_seg_gen_addr(cache_seg), &cache_seg_gen, sizeof(struct pcache_cache_seg_gen)); 119 + pmem_wmb(); 120 + 121 + cache_seg->gen_index = (cache_seg->gen_index + 1) % PCACHE_META_INDEX_MAX; 122 + mutex_unlock(&cache_seg->ctrl_lock); 123 + } 124 + 125 + static void cache_seg_ctrl_init(struct pcache_cache_segment *cache_seg) 126 + { 127 + cache_seg->gen = 0; 128 + cache_seg->gen_seq = 0; 129 + cache_seg->gen_index = 0; 130 + cache_seg_ctrl_write(cache_seg); 131 + } 132 + 133 + static int cache_seg_meta_load(struct pcache_cache_segment *cache_seg) 134 + { 135 + int ret; 136 + 137 + ret = cache_seg_info_load(cache_seg); 138 + if (ret) 139 + goto err; 140 + 141 + ret = cache_seg_ctrl_load(cache_seg); 142 + if (ret) 143 + goto err; 144 + 145 + return 0; 146 + err: 147 + return ret; 148 + } 149 + 150 + /** 151 + * cache_seg_set_next_seg - Sets the ID of the next segment 152 + * @cache_seg: Pointer to the cache segment structure. 153 + * @seg_id: The segment ID to set as the next segment. 154 + * 155 + * A pcache_cache allocates multiple cache segments, which are linked together 156 + * through next_seg. When loading a pcache_cache, the first cache segment can 157 + * be found using cache->seg_id, which allows access to all the cache segments. 
158 + */ 159 + void cache_seg_set_next_seg(struct pcache_cache_segment *cache_seg, u32 seg_id) 160 + { 161 + cache_seg->cache_seg_info.flags |= PCACHE_SEG_INFO_FLAGS_HAS_NEXT; 162 + cache_seg->cache_seg_info.next_seg = seg_id; 163 + cache_seg_info_write(cache_seg); 164 + } 165 + 166 + int cache_seg_init(struct pcache_cache *cache, u32 seg_id, u32 cache_seg_id, 167 + bool new_cache) 168 + { 169 + struct pcache_cache_dev *cache_dev = cache->cache_dev; 170 + struct pcache_cache_segment *cache_seg = &cache->segments[cache_seg_id]; 171 + struct pcache_segment_init_options seg_options = { 0 }; 172 + struct pcache_segment *segment = &cache_seg->segment; 173 + int ret; 174 + 175 + cache_seg->cache = cache; 176 + cache_seg->cache_seg_id = cache_seg_id; 177 + spin_lock_init(&cache_seg->gen_lock); 178 + atomic_set(&cache_seg->refs, 0); 179 + mutex_init(&cache_seg->info_lock); 180 + mutex_init(&cache_seg->ctrl_lock); 181 + 182 + /* init pcache_segment */ 183 + seg_options.type = PCACHE_SEGMENT_TYPE_CACHE_DATA; 184 + seg_options.data_off = PCACHE_CACHE_SEG_CTRL_OFF + PCACHE_CACHE_SEG_CTRL_SIZE; 185 + seg_options.seg_id = seg_id; 186 + seg_options.seg_info = &cache_seg->cache_seg_info; 187 + pcache_segment_init(cache_dev, segment, &seg_options); 188 + 189 + cache_seg->cache_seg_ctrl = CACHE_DEV_SEGMENT(cache_dev, seg_id) + PCACHE_CACHE_SEG_CTRL_OFF; 190 + 191 + if (new_cache) { 192 + cache_dev_zero_range(cache_dev, CACHE_DEV_SEGMENT(cache_dev, seg_id), 193 + PCACHE_SEG_INFO_SIZE * PCACHE_META_INDEX_MAX + 194 + PCACHE_CACHE_SEG_CTRL_SIZE); 195 + 196 + cache_seg_ctrl_init(cache_seg); 197 + 198 + cache_seg->info_index = 0; 199 + cache_seg_info_write(cache_seg); 200 + 201 + /* clear outdated kset in segment */ 202 + memcpy_flushcache(segment->data, &pcache_empty_kset, sizeof(struct pcache_cache_kset_onmedia)); 203 + pmem_wmb(); 204 + } else { 205 + ret = cache_seg_meta_load(cache_seg); 206 + if (ret) 207 + goto err; 208 + } 209 + 210 + return 0; 211 + err: 212 + return ret; 213 + } 
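cache_seg_info_write() and cache_seg_info_load() above keep PCACHE_META_INDEX_MAX replicas of each metadata record and rotate info_index after every update, so a torn write can only damage the stale copy. A simplified sketch of the "pick the valid replica with the highest seq" selection rule (the struct layout and toy_crc() checksum are stand-ins for the real pcache_meta_header and pcache_meta_crc(), which uses crc32):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct pcache_meta_header: every on-pmem
 * metadata record carries a sequence number and a CRC over the record. */
struct meta_header {
	uint32_t seq;
	uint32_t crc;
};

#define META_INDEX_MAX 2 /* two replicas, as PCACHE_META_INDEX_MAX suggests */

/* Toy checksum standing in for pcache_meta_crc(). */
static uint32_t toy_crc(const struct meta_header *h)
{
	return h->seq * 2654435761u;
}

/*
 * Pick the valid replica with the highest seq, or NULL if all copies
 * are corrupt -- the selection rule cache_seg_info_load() relies on.
 * Writers always overwrite the *older* slot and then rotate the index,
 * so a crash mid-write can only hit the copy being replaced.
 */
static const struct meta_header *find_latest(const struct meta_header *replicas)
{
	const struct meta_header *latest = NULL;
	int i;

	for (i = 0; i < META_INDEX_MAX; i++) {
		const struct meta_header *h = &replicas[i];

		if (h->crc != toy_crc(h))
			continue; /* torn or corrupt copy: skip it */
		if (!latest || h->seq > latest->seq)
			latest = h;
	}
	return latest;
}
```

This is the mechanism behind the cover letter's "all layout metadata are dual-replicated CRC-protected" claim: validity is decided by CRC, recency by seq.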
214 + 215 + /** 216 + * get_cache_segment - Retrieves a free cache segment from the cache. 217 + * @cache: Pointer to the cache structure. 218 + * 219 + * This function attempts to find a free cache segment that can be used. 220 + * It locks the segment map and checks for the next available segment ID. 221 + * If a free segment is found, it initializes it and returns a pointer to the 222 + * cache segment structure. Returns NULL if no segments are available. 223 + */ 224 + struct pcache_cache_segment *get_cache_segment(struct pcache_cache *cache) 225 + { 226 + struct pcache_cache_segment *cache_seg; 227 + u32 seg_id; 228 + 229 + spin_lock(&cache->seg_map_lock); 230 + again: 231 + seg_id = find_next_zero_bit(cache->seg_map, cache->n_segs, cache->last_cache_seg); 232 + if (seg_id == cache->n_segs) { 233 + /* reset the hint of ->last_cache_seg and retry */ 234 + if (cache->last_cache_seg) { 235 + cache->last_cache_seg = 0; 236 + goto again; 237 + } 238 + cache->cache_full = true; 239 + spin_unlock(&cache->seg_map_lock); 240 + return NULL; 241 + } 242 + 243 + /* 244 + * found an available cache_seg, mark it used in seg_map 245 + * and update the search hint ->last_cache_seg 246 + */ 247 + __set_bit(seg_id, cache->seg_map); 248 + cache->last_cache_seg = seg_id; 249 + spin_unlock(&cache->seg_map_lock); 250 + 251 + cache_seg = &cache->segments[seg_id]; 252 + cache_seg->cache_seg_id = seg_id; 253 + 254 + return cache_seg; 255 + } 256 + 257 + static void cache_seg_gen_increase(struct pcache_cache_segment *cache_seg) 258 + { 259 + spin_lock(&cache_seg->gen_lock); 260 + cache_seg->gen++; 261 + spin_unlock(&cache_seg->gen_lock); 262 + 263 + cache_seg_ctrl_write(cache_seg); 264 + } 265 + 266 + void cache_seg_get(struct pcache_cache_segment *cache_seg) 267 + { 268 + atomic_inc(&cache_seg->refs); 269 + } 270 + 271 + static void cache_seg_invalidate(struct pcache_cache_segment *cache_seg) 272 + { 273 + struct pcache_cache *cache; 274 + 275 + cache = cache_seg->cache; 276 + 
cache_seg_gen_increase(cache_seg); 277 + 278 + spin_lock(&cache->seg_map_lock); 279 + if (cache->cache_full) 280 + cache->cache_full = false; 281 + __clear_bit(cache_seg->cache_seg_id, cache->seg_map); 282 + spin_unlock(&cache->seg_map_lock); 283 + 284 + pcache_defer_reqs_kick(CACHE_TO_PCACHE(cache)); 285 + /* clean_work will clean up the invalidated keys in key_tree */ 286 + queue_work(cache_get_wq(cache), &cache->clean_work); 287 + } 288 + 289 + void cache_seg_put(struct pcache_cache_segment *cache_seg) 290 + { 291 + if (atomic_dec_and_test(&cache_seg->refs)) 292 + cache_seg_invalidate(cache_seg); 293 + }
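get_cache_segment() above scans the segment bitmap starting from the last successful position and wraps around to bit 0 once before reporting the cache full. A userspace sketch of that hint-plus-retry policy, with find_next_zero_bit() emulated over a plain bool array:

```c
#include <assert.h>
#include <stdbool.h>

#define N_SEGS 8

/* Userspace stand-in for find_next_zero_bit() over a small bool map. */
static int next_zero(const bool *map, int n, int start)
{
	int i;

	for (i = start; i < n; i++)
		if (!map[i])
			return i;
	return n; /* no free bit in [start, n) */
}

/*
 * Allocate a free segment the way get_cache_segment() does: search from
 * the last successful position (the hint), and on failure retry once
 * from 0 before declaring the cache full.  The hint turns the bitmap
 * into a rough ring, spreading allocations across all segments.
 */
static int alloc_seg(bool *map, int *last_hint)
{
	int id = next_zero(map, N_SEGS, *last_hint);

	if (id == N_SEGS && *last_hint) {
		*last_hint = 0;
		id = next_zero(map, N_SEGS, 0);
	}
	if (id == N_SEGS)
		return -1; /* cache full */

	map[id] = true;
	*last_hint = id;
	return id;
}
```

When a segment's refcount drops to zero, cache_seg_invalidate() clears its bit again, so a subsequent allocation can wrap back and reuse it.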
+261
drivers/md/dm-pcache/cache_writeback.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + 3 + #include <linux/bio.h> 4 + 5 + #include "cache.h" 6 + #include "backing_dev.h" 7 + #include "cache_dev.h" 8 + #include "dm_pcache.h" 9 + 10 + static void writeback_ctx_end(struct pcache_cache *cache, int ret) 11 + { 12 + if (ret && !cache->writeback_ctx.ret) { 13 + pcache_dev_err(CACHE_TO_PCACHE(cache), "writeback error: %d", ret); 14 + cache->writeback_ctx.ret = ret; 15 + } 16 + 17 + if (!atomic_dec_and_test(&cache->writeback_ctx.pending)) 18 + return; 19 + 20 + if (!cache->writeback_ctx.ret) { 21 + backing_dev_flush(cache->backing_dev); 22 + 23 + mutex_lock(&cache->dirty_tail_lock); 24 + cache_pos_advance(&cache->dirty_tail, cache->writeback_ctx.advance); 25 + cache_encode_dirty_tail(cache); 26 + mutex_unlock(&cache->dirty_tail_lock); 27 + } 28 + queue_delayed_work(cache_get_wq(cache), &cache->writeback_work, 0); 29 + } 30 + 31 + static void writeback_end_req(struct pcache_backing_dev_req *backing_req, int ret) 32 + { 33 + struct pcache_cache *cache = backing_req->priv_data; 34 + 35 + mutex_lock(&cache->writeback_lock); 36 + writeback_ctx_end(cache, ret); 37 + mutex_unlock(&cache->writeback_lock); 38 + } 39 + 40 + static inline bool is_cache_clean(struct pcache_cache *cache, struct pcache_cache_pos *dirty_tail) 41 + { 42 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 43 + struct pcache_cache_kset_onmedia *kset_onmedia; 44 + u32 to_copy; 45 + void *addr; 46 + int ret; 47 + 48 + addr = cache_pos_addr(dirty_tail); 49 + kset_onmedia = (struct pcache_cache_kset_onmedia *)cache->wb_kset_onmedia_buf; 50 + 51 + to_copy = min(PCACHE_KSET_ONMEDIA_SIZE_MAX, PCACHE_SEG_SIZE - dirty_tail->seg_off); 52 + ret = copy_mc_to_kernel(kset_onmedia, addr, to_copy); 53 + if (ret) { 54 + pcache_dev_err(pcache, "error to read kset: %d", ret); 55 + return true; 56 + } 57 + 58 + /* Check if the magic number matches the expected value */ 59 + if (kset_onmedia->magic != PCACHE_KSET_MAGIC) { 60 + pcache_dev_debug(pcache, 
"dirty_tail: %u:%u magic: %llx, not expected: %llx\n", 61 + dirty_tail->cache_seg->cache_seg_id, dirty_tail->seg_off, 62 + kset_onmedia->magic, PCACHE_KSET_MAGIC); 63 + return true; 64 + } 65 + 66 + /* Verify the CRC checksum for data integrity */ 67 + if (kset_onmedia->crc != cache_kset_crc(kset_onmedia)) { 68 + pcache_dev_debug(pcache, "dirty_tail: %u:%u crc: %x, not expected: %x\n", 69 + dirty_tail->cache_seg->cache_seg_id, dirty_tail->seg_off, 70 + cache_kset_crc(kset_onmedia), kset_onmedia->crc); 71 + return true; 72 + } 73 + 74 + return false; 75 + } 76 + 77 + void cache_writeback_exit(struct pcache_cache *cache) 78 + { 79 + cancel_delayed_work_sync(&cache->writeback_work); 80 + backing_dev_flush(cache->backing_dev); 81 + cache_tree_exit(&cache->writeback_key_tree); 82 + } 83 + 84 + int cache_writeback_init(struct pcache_cache *cache) 85 + { 86 + int ret; 87 + 88 + ret = cache_tree_init(cache, &cache->writeback_key_tree, 1); 89 + if (ret) 90 + goto err; 91 + 92 + atomic_set(&cache->writeback_ctx.pending, 0); 93 + 94 + /* Queue delayed work to start writeback handling */ 95 + queue_delayed_work(cache_get_wq(cache), &cache->writeback_work, 0); 96 + 97 + return 0; 98 + err: 99 + return ret; 100 + } 101 + 102 + static void cache_key_writeback(struct pcache_cache *cache, struct pcache_cache_key *key) 103 + { 104 + struct pcache_backing_dev_req *writeback_req; 105 + struct pcache_backing_dev_req_opts writeback_req_opts = { 0 }; 106 + struct pcache_cache_pos *pos; 107 + void *addr; 108 + u32 seg_remain, req_len, done = 0; 109 + 110 + if (cache_key_clean(key)) 111 + return; 112 + 113 + pos = &key->cache_pos; 114 + 115 + seg_remain = cache_seg_remain(pos); 116 + BUG_ON(seg_remain < key->len); 117 + next_req: 118 + addr = cache_pos_addr(pos) + done; 119 + req_len = backing_dev_req_coalesced_max_len(addr, key->len - done); 120 + 121 + writeback_req_opts.type = BACKING_DEV_REQ_TYPE_KMEM; 122 + writeback_req_opts.gfp_mask = GFP_NOIO; 123 + writeback_req_opts.end_fn = 
writeback_end_req; 124 + writeback_req_opts.priv_data = cache; 125 + 126 + writeback_req_opts.kmem.data = addr; 127 + writeback_req_opts.kmem.opf = REQ_OP_WRITE; 128 + writeback_req_opts.kmem.len = req_len; 129 + writeback_req_opts.kmem.backing_off = key->off + done; 130 + 131 + writeback_req = backing_dev_req_create(cache->backing_dev, &writeback_req_opts); 132 + 133 + atomic_inc(&cache->writeback_ctx.pending); 134 + backing_dev_req_submit(writeback_req, true); 135 + 136 + done += req_len; 137 + if (done < key->len) 138 + goto next_req; 139 + } 140 + 141 + static void cache_wb_tree_writeback(struct pcache_cache *cache, u32 advance) 142 + { 143 + struct pcache_cache_tree *cache_tree = &cache->writeback_key_tree; 144 + struct pcache_cache_subtree *cache_subtree; 145 + struct rb_node *node; 146 + struct pcache_cache_key *key; 147 + u32 i; 148 + 149 + cache->writeback_ctx.ret = 0; 150 + cache->writeback_ctx.advance = advance; 151 + atomic_set(&cache->writeback_ctx.pending, 1); 152 + 153 + for (i = 0; i < cache_tree->n_subtrees; i++) { 154 + cache_subtree = &cache_tree->subtrees[i]; 155 + 156 + node = rb_first(&cache_subtree->root); 157 + while (node) { 158 + key = CACHE_KEY(node); 159 + node = rb_next(node); 160 + 161 + cache_key_writeback(cache, key); 162 + cache_key_delete(key); 163 + } 164 + } 165 + writeback_ctx_end(cache, 0); 166 + } 167 + 168 + static int cache_kset_insert_tree(struct pcache_cache *cache, struct pcache_cache_kset_onmedia *kset_onmedia) 169 + { 170 + struct pcache_cache_key_onmedia *key_onmedia; 171 + struct pcache_cache_subtree *cache_subtree; 172 + struct pcache_cache_key *key; 173 + int ret; 174 + u32 i; 175 + 176 + /* Iterate through all keys in the kset and write each back to storage */ 177 + for (i = 0; i < kset_onmedia->key_num; i++) { 178 + key_onmedia = &kset_onmedia->data[i]; 179 + 180 + key = cache_key_alloc(&cache->writeback_key_tree, GFP_NOIO); 181 + ret = cache_key_decode(cache, key_onmedia, key); 182 + if (ret) { 183 + 
cache_key_put(key); 184 + goto clear_tree; 185 + } 186 + 187 + cache_subtree = get_subtree(&cache->writeback_key_tree, key->off); 188 + spin_lock(&cache_subtree->tree_lock); 189 + cache_key_insert(&cache->writeback_key_tree, key, true); 190 + spin_unlock(&cache_subtree->tree_lock); 191 + } 192 + 193 + return 0; 194 + clear_tree: 195 + cache_tree_clear(&cache->writeback_key_tree); 196 + return ret; 197 + } 198 + 199 + static void last_kset_writeback(struct pcache_cache *cache, 200 + struct pcache_cache_kset_onmedia *last_kset_onmedia) 201 + { 202 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 203 + struct pcache_cache_segment *next_seg; 204 + 205 + pcache_dev_debug(pcache, "last kset, next: %u\n", last_kset_onmedia->next_cache_seg_id); 206 + 207 + next_seg = &cache->segments[last_kset_onmedia->next_cache_seg_id]; 208 + 209 + mutex_lock(&cache->dirty_tail_lock); 210 + cache->dirty_tail.cache_seg = next_seg; 211 + cache->dirty_tail.seg_off = 0; 212 + cache_encode_dirty_tail(cache); 213 + mutex_unlock(&cache->dirty_tail_lock); 214 + } 215 + 216 + void cache_writeback_fn(struct work_struct *work) 217 + { 218 + struct pcache_cache *cache = container_of(work, struct pcache_cache, writeback_work.work); 219 + struct dm_pcache *pcache = CACHE_TO_PCACHE(cache); 220 + struct pcache_cache_pos dirty_tail; 221 + struct pcache_cache_kset_onmedia *kset_onmedia; 222 + u32 delay; 223 + int ret; 224 + 225 + mutex_lock(&cache->writeback_lock); 226 + if (atomic_read(&cache->writeback_ctx.pending)) 227 + goto unlock; 228 + 229 + if (pcache_is_stopping(pcache)) 230 + goto unlock; 231 + 232 + kset_onmedia = (struct pcache_cache_kset_onmedia *)cache->wb_kset_onmedia_buf; 233 + 234 + mutex_lock(&cache->dirty_tail_lock); 235 + cache_pos_copy(&dirty_tail, &cache->dirty_tail); 236 + mutex_unlock(&cache->dirty_tail_lock); 237 + 238 + if (is_cache_clean(cache, &dirty_tail)) { 239 + delay = PCACHE_CACHE_WRITEBACK_INTERVAL; 240 + goto queue_work; 241 + } 242 + 243 + if (kset_onmedia->flags & 
PCACHE_KSET_FLAGS_LAST) { 244 + last_kset_writeback(cache, kset_onmedia); 245 + delay = 0; 246 + goto queue_work; 247 + } 248 + 249 + ret = cache_kset_insert_tree(cache, kset_onmedia); 250 + if (ret) { 251 + delay = PCACHE_CACHE_WRITEBACK_INTERVAL; 252 + goto queue_work; 253 + } 254 + 255 + cache_wb_tree_writeback(cache, get_kset_onmedia_size(kset_onmedia)); 256 + delay = 0; 257 + queue_work: 258 + queue_delayed_work(cache_get_wq(cache), &cache->writeback_work, delay); 259 + unlock: 260 + mutex_unlock(&cache->writeback_lock); 261 + }
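The writeback worker above only consumes the kset sitting at dirty_tail after is_cache_clean() has validated both its magic and its CRC; the same test is what bounds replay after a power loss, since a never-written or torn kset fails one of the checks. A simplified model of that validity test (the struct fields, magic value, and checksum below are illustrative stand-ins, not the real on-media format):

```c
#include <stdint.h>

/* Trimmed stand-in for struct pcache_cache_kset_onmedia. */
struct kset_onmedia {
	uint64_t magic;
	uint32_t crc;
	uint32_t key_num;
};

#define KSET_MAGIC 0x706361636865ULL /* illustrative value, not the real one */

/* Toy checksum over the fields the CRC covers; the real code computes
 * cache_kset_crc() over the kset body. */
static uint32_t kset_crc(const struct kset_onmedia *k)
{
	return (uint32_t)(k->magic ^ k->key_num);
}

/*
 * A kset at dirty_tail is only consumed when both checks pass:
 * wrong magic means "nothing was ever written here", a CRC mismatch
 * means "a write tore here".  Either way, writeback (and crash
 * replay) stops at this position.
 */
static int kset_valid(const struct kset_onmedia *k)
{
	if (k->magic != KSET_MAGIC)
		return 0;
	if (k->crc != kset_crc(k))
		return 0;
	return 1;
}
```

Note also the ordering in writeback_ctx_end(): the backing device is flushed before dirty_tail is advanced, so the tail never moves past data that is not yet durable on the backing device.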
+497
drivers/md/dm-pcache/dm_pcache.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + #include <linux/module.h> 3 + #include <linux/blkdev.h> 4 + #include <linux/bio.h> 5 + 6 + #include "../dm-core.h" 7 + #include "cache_dev.h" 8 + #include "backing_dev.h" 9 + #include "cache.h" 10 + #include "dm_pcache.h" 11 + 12 + void pcache_defer_reqs_kick(struct dm_pcache *pcache) 13 + { 14 + struct pcache_cache *cache = &pcache->cache; 15 + 16 + spin_lock(&cache->seg_map_lock); 17 + if (!cache->cache_full) 18 + queue_work(pcache->task_wq, &pcache->defered_req_work); 19 + spin_unlock(&cache->seg_map_lock); 20 + } 21 + 22 + static void defer_req(struct pcache_request *pcache_req) 23 + { 24 + struct dm_pcache *pcache = pcache_req->pcache; 25 + 26 + BUG_ON(!list_empty(&pcache_req->list_node)); 27 + 28 + spin_lock(&pcache->defered_req_list_lock); 29 + list_add(&pcache_req->list_node, &pcache->defered_req_list); 30 + pcache_defer_reqs_kick(pcache); 31 + spin_unlock(&pcache->defered_req_list_lock); 32 + } 33 + 34 + static void defered_req_fn(struct work_struct *work) 35 + { 36 + struct dm_pcache *pcache = container_of(work, struct dm_pcache, defered_req_work); 37 + struct pcache_request *pcache_req; 38 + LIST_HEAD(tmp_list); 39 + int ret; 40 + 41 + if (pcache_is_stopping(pcache)) 42 + return; 43 + 44 + spin_lock(&pcache->defered_req_list_lock); 45 + list_splice_init(&pcache->defered_req_list, &tmp_list); 46 + spin_unlock(&pcache->defered_req_list_lock); 47 + 48 + while (!list_empty(&tmp_list)) { 49 + pcache_req = list_first_entry(&tmp_list, 50 + struct pcache_request, list_node); 51 + list_del_init(&pcache_req->list_node); 52 + pcache_req->ret = 0; 53 + ret = pcache_cache_handle_req(&pcache->cache, pcache_req); 54 + if (ret == -EBUSY) 55 + defer_req(pcache_req); 56 + else 57 + pcache_req_put(pcache_req, ret); 58 + } 59 + } 60 + 61 + void pcache_req_get(struct pcache_request *pcache_req) 62 + { 63 + kref_get(&pcache_req->ref); 64 + } 65 + 66 + static void end_req(struct kref *ref) 67 + { 68 + struct 
pcache_request *pcache_req = container_of(ref, struct pcache_request, ref); 69 + struct dm_pcache *pcache = pcache_req->pcache; 70 + struct bio *bio = pcache_req->bio; 71 + int ret = pcache_req->ret; 72 + 73 + if (ret == -EBUSY) { 74 + pcache_req_get(pcache_req); 75 + defer_req(pcache_req); 76 + } else { 77 + bio->bi_status = errno_to_blk_status(ret); 78 + bio_endio(bio); 79 + 80 + if (atomic_dec_and_test(&pcache->inflight_reqs)) 81 + wake_up(&pcache->inflight_wq); 82 + } 83 + } 84 + 85 + void pcache_req_put(struct pcache_request *pcache_req, int ret) 86 + { 87 + /* Set the return status if it is not already set */ 88 + if (ret && !pcache_req->ret) 89 + pcache_req->ret = ret; 90 + 91 + kref_put(&pcache_req->ref, end_req); 92 + } 93 + 94 + static bool at_least_one_arg(struct dm_arg_set *as, char **error) 95 + { 96 + if (!as->argc) { 97 + *error = "Insufficient args"; 98 + return false; 99 + } 100 + 101 + return true; 102 + } 103 + 104 + static int parse_cache_dev(struct dm_pcache *pcache, struct dm_arg_set *as, 105 + char **error) 106 + { 107 + int ret; 108 + 109 + if (!at_least_one_arg(as, error)) 110 + return -EINVAL; 111 + ret = dm_get_device(pcache->ti, dm_shift_arg(as), 112 + BLK_OPEN_READ | BLK_OPEN_WRITE, 113 + &pcache->cache_dev.dm_dev); 114 + if (ret) { 115 + *error = "Error opening cache device"; 116 + return ret; 117 + } 118 + 119 + return 0; 120 + } 121 + 122 + static int parse_backing_dev(struct dm_pcache *pcache, struct dm_arg_set *as, 123 + char **error) 124 + { 125 + int ret; 126 + 127 + if (!at_least_one_arg(as, error)) 128 + return -EINVAL; 129 + 130 + ret = dm_get_device(pcache->ti, dm_shift_arg(as), 131 + BLK_OPEN_READ | BLK_OPEN_WRITE, 132 + &pcache->backing_dev.dm_dev); 133 + if (ret) { 134 + *error = "Error opening backing device"; 135 + return ret; 136 + } 137 + 138 + return 0; 139 + } 140 + 141 + static void pcache_init_opts(struct pcache_cache_options *opts) 142 + { 143 + opts->cache_mode = PCACHE_CACHE_MODE_WRITEBACK; 144 + opts->data_crc 
= false; 145 + } 146 + 147 + static int parse_cache_opts(struct dm_pcache *pcache, struct dm_arg_set *as, 148 + char **error) 149 + { 150 + struct pcache_cache_options *opts = &pcache->opts; 151 + static const struct dm_arg _args[] = { 152 + {0, 4, "Invalid number of cache option arguments"}, 153 + }; 154 + unsigned int argc; 155 + const char *arg; 156 + int ret; 157 + 158 + pcache_init_opts(opts); 159 + if (!as->argc) 160 + return 0; 161 + 162 + ret = dm_read_arg_group(_args, as, &argc, error); 163 + if (ret) 164 + return -EINVAL; 165 + 166 + while (argc) { 167 + arg = dm_shift_arg(as); 168 + argc--; 169 + 170 + if (!strcmp(arg, "cache_mode")) { 171 + arg = dm_shift_arg(as); 172 + if (!strcmp(arg, "writeback")) { 173 + opts->cache_mode = PCACHE_CACHE_MODE_WRITEBACK; 174 + } else { 175 + *error = "Invalid cache mode parameter"; 176 + return -EINVAL; 177 + } 178 + argc--; 179 + } else if (!strcmp(arg, "data_crc")) { 180 + arg = dm_shift_arg(as); 181 + if (!strcmp(arg, "true")) { 182 + opts->data_crc = true; 183 + } else if (!strcmp(arg, "false")) { 184 + opts->data_crc = false; 185 + } else { 186 + *error = "Invalid data crc parameter"; 187 + return -EINVAL; 188 + } 189 + argc--; 190 + } else { 191 + *error = "Unrecognised cache option requested"; 192 + return -EINVAL; 193 + } 194 + } 195 + 196 + return 0; 197 + } 198 + 199 + static int pcache_start(struct dm_pcache *pcache, char **error) 200 + { 201 + int ret; 202 + 203 + ret = cache_dev_start(pcache); 204 + if (ret) { 205 + *error = "Failed to start cache dev"; 206 + return ret; 207 + } 208 + 209 + ret = backing_dev_start(pcache); 210 + if (ret) { 211 + *error = "Failed to start backing dev"; 212 + goto stop_cache; 213 + } 214 + 215 + ret = pcache_cache_start(pcache); 216 + if (ret) { 217 + *error = "Failed to start pcache"; 218 + goto stop_backing; 219 + } 220 + 221 + return 0; 222 + stop_backing: 223 + backing_dev_stop(pcache); 224 + stop_cache: 225 + cache_dev_stop(pcache); 226 + 227 + return ret; 228 + } 229 + 
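parse_cache_opts() above consumes the optional table arguments as key/value pairs, with defaults of writeback mode and data_crc off from pcache_init_opts(). A userspace sketch of the same loop (simplified: it assumes well-paired argv entries and collapses the dm_arg_set bookkeeping into a plain index):

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors struct pcache_cache_options for the two documented options. */
struct cache_opts {
	int  writeback;  /* only supported cache_mode today */
	bool data_crc;
};

/*
 * Userspace sketch of parse_cache_opts(): consume "key value" pairs
 * until argc runs out; an unknown key or a bad value fails the whole
 * parse, just as the target constructor rejects the table.
 */
static int parse_opts(int argc, const char **argv, struct cache_opts *opts)
{
	int i;

	opts->writeback = 1;     /* defaults from pcache_init_opts() */
	opts->data_crc = false;

	for (i = 0; i + 1 < argc; i += 2) {
		if (!strcmp(argv[i], "cache_mode")) {
			if (strcmp(argv[i + 1], "writeback"))
				return -1; /* only writeback is valid */
		} else if (!strcmp(argv[i], "data_crc")) {
			if (!strcmp(argv[i + 1], "true"))
				opts->data_crc = true;
			else if (!strcmp(argv[i + 1], "false"))
				opts->data_crc = false;
			else
				return -1;
		} else {
			return -1; /* unrecognised option */
		}
	}
	return 0;
}
```

Assuming the argument order established by pcache_parse_args() (cache device, backing device, then an option-count group), a table line would look something like `<start> <len> pcache <cache_dev> <backing_dev> 4 cache_mode writeback data_crc true`; this exact syntax is an inference from the parsing code, not quoted from documentation.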
230 + static void pcache_destroy_args(struct dm_pcache *pcache) 231 + { 232 + if (pcache->cache_dev.dm_dev) 233 + dm_put_device(pcache->ti, pcache->cache_dev.dm_dev); 234 + if (pcache->backing_dev.dm_dev) 235 + dm_put_device(pcache->ti, pcache->backing_dev.dm_dev); 236 + } 237 + 238 + static int pcache_parse_args(struct dm_pcache *pcache, unsigned int argc, char **argv, 239 + char **error) 240 + { 241 + struct dm_arg_set as; 242 + int ret; 243 + 244 + as.argc = argc; 245 + as.argv = argv; 246 + 247 + /* 248 + * Parse cache device 249 + */ 250 + ret = parse_cache_dev(pcache, &as, error); 251 + if (ret) 252 + return ret; 253 + /* 254 + * Parse backing device 255 + */ 256 + ret = parse_backing_dev(pcache, &as, error); 257 + if (ret) 258 + goto out; 259 + /* 260 + * Parse optional arguments 261 + */ 262 + ret = parse_cache_opts(pcache, &as, error); 263 + if (ret) 264 + goto out; 265 + 266 + return 0; 267 + out: 268 + pcache_destroy_args(pcache); 269 + return ret; 270 + } 271 + 272 + static int dm_pcache_ctr(struct dm_target *ti, unsigned int argc, char **argv) 273 + { 274 + struct mapped_device *md = ti->table->md; 275 + struct dm_pcache *pcache; 276 + int ret; 277 + 278 + if (md->map) { 279 + ti->error = "Don't support table loading for live md"; 280 + return -EOPNOTSUPP; 281 + } 282 + 283 + /* Allocate memory for the cache structure */ 284 + pcache = kzalloc(sizeof(struct dm_pcache), GFP_KERNEL); 285 + if (!pcache) 286 + return -ENOMEM; 287 + 288 + pcache->task_wq = alloc_workqueue("pcache-%s-wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 289 + 0, md->name); 290 + if (!pcache->task_wq) { 291 + ret = -ENOMEM; 292 + goto free_pcache; 293 + } 294 + 295 + spin_lock_init(&pcache->defered_req_list_lock); 296 + INIT_LIST_HEAD(&pcache->defered_req_list); 297 + INIT_WORK(&pcache->defered_req_work, defered_req_fn); 298 + pcache->ti = ti; 299 + 300 + ret = pcache_parse_args(pcache, argc, argv, &ti->error); 301 + if (ret) 302 + goto destroy_wq; 303 + 304 + ret = pcache_start(pcache, 
&ti->error); 305 + if (ret) 306 + goto destroy_args; 307 + 308 + ti->num_flush_bios = 1; 309 + ti->flush_supported = true; 310 + ti->per_io_data_size = sizeof(struct pcache_request); 311 + ti->private = pcache; 312 + atomic_set(&pcache->inflight_reqs, 0); 313 + atomic_set(&pcache->state, PCACHE_STATE_RUNNING); 314 + init_waitqueue_head(&pcache->inflight_wq); 315 + 316 + return 0; 317 + destroy_args: 318 + pcache_destroy_args(pcache); 319 + destroy_wq: 320 + destroy_workqueue(pcache->task_wq); 321 + free_pcache: 322 + kfree(pcache); 323 + 324 + return ret; 325 + } 326 + 327 + static void defer_req_stop(struct dm_pcache *pcache) 328 + { 329 + struct pcache_request *pcache_req; 330 + LIST_HEAD(tmp_list); 331 + 332 + flush_work(&pcache->defered_req_work); 333 + 334 + spin_lock(&pcache->defered_req_list_lock); 335 + list_splice_init(&pcache->defered_req_list, &tmp_list); 336 + spin_unlock(&pcache->defered_req_list_lock); 337 + 338 + while (!list_empty(&tmp_list)) { 339 + pcache_req = list_first_entry(&tmp_list, 340 + struct pcache_request, list_node); 341 + list_del_init(&pcache_req->list_node); 342 + pcache_req_put(pcache_req, -EIO); 343 + } 344 + } 345 + 346 + static void dm_pcache_dtr(struct dm_target *ti) 347 + { 348 + struct dm_pcache *pcache; 349 + 350 + pcache = ti->private; 351 + atomic_set(&pcache->state, PCACHE_STATE_STOPPING); 352 + defer_req_stop(pcache); 353 + 354 + wait_event(pcache->inflight_wq, 355 + atomic_read(&pcache->inflight_reqs) == 0); 356 + 357 + pcache_cache_stop(pcache); 358 + backing_dev_stop(pcache); 359 + cache_dev_stop(pcache); 360 + 361 + pcache_destroy_args(pcache); 362 + drain_workqueue(pcache->task_wq); 363 + destroy_workqueue(pcache->task_wq); 364 + 365 + kfree(pcache); 366 + } 367 + 368 + static int dm_pcache_map_bio(struct dm_target *ti, struct bio *bio) 369 + { 370 + struct pcache_request *pcache_req = dm_per_bio_data(bio, sizeof(struct pcache_request)); 371 + struct dm_pcache *pcache = ti->private; 372 + int ret; 373 + 374 + 
pcache_req->pcache = pcache; 375 + kref_init(&pcache_req->ref); 376 + pcache_req->ret = 0; 377 + pcache_req->bio = bio; 378 + pcache_req->off = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; 379 + pcache_req->data_len = bio->bi_iter.bi_size; 380 + INIT_LIST_HEAD(&pcache_req->list_node); 381 + atomic_inc(&pcache->inflight_reqs); 382 + 383 + ret = pcache_cache_handle_req(&pcache->cache, pcache_req); 384 + if (ret == -EBUSY) 385 + defer_req(pcache_req); 386 + else 387 + pcache_req_put(pcache_req, ret); 388 + 389 + return DM_MAPIO_SUBMITTED; 390 + } 391 + 392 + static void dm_pcache_status(struct dm_target *ti, status_type_t type, 393 + unsigned int status_flags, char *result, 394 + unsigned int maxlen) 395 + { 396 + struct dm_pcache *pcache = ti->private; 397 + struct pcache_cache_dev *cache_dev = &pcache->cache_dev; 398 + struct pcache_backing_dev *backing_dev = &pcache->backing_dev; 399 + struct pcache_cache *cache = &pcache->cache; 400 + unsigned int sz = 0; 401 + 402 + switch (type) { 403 + case STATUSTYPE_INFO: 404 + DMEMIT("%x %u %u %u %u %x %u:%u %u:%u %u:%u", 405 + cache_dev->sb_flags, 406 + cache_dev->seg_num, 407 + cache->n_segs, 408 + bitmap_weight(cache->seg_map, cache->n_segs), 409 + pcache_cache_get_gc_percent(cache), 410 + cache->cache_info.flags, 411 + cache->key_head.cache_seg->cache_seg_id, 412 + cache->key_head.seg_off, 413 + cache->dirty_tail.cache_seg->cache_seg_id, 414 + cache->dirty_tail.seg_off, 415 + cache->key_tail.cache_seg->cache_seg_id, 416 + cache->key_tail.seg_off); 417 + break; 418 + case STATUSTYPE_TABLE: 419 + DMEMIT("%s %s 4 cache_mode writeback crc %s", 420 + cache_dev->dm_dev->name, 421 + backing_dev->dm_dev->name, 422 + cache_data_crc_on(cache) ? 
"true" : "false"); 423 + break; 424 + case STATUSTYPE_IMA: 425 + *result = '\0'; 426 + break; 427 + } 428 + } 429 + 430 + static int dm_pcache_message(struct dm_target *ti, unsigned int argc, 431 + char **argv, char *result, unsigned int maxlen) 432 + { 433 + struct dm_pcache *pcache = ti->private; 434 + unsigned long val; 435 + 436 + if (argc != 2) 437 + goto err; 438 + 439 + if (!strcasecmp(argv[0], "gc_percent")) { 440 + if (kstrtoul(argv[1], 10, &val)) 441 + goto err; 442 + 443 + return pcache_cache_set_gc_percent(&pcache->cache, val); 444 + } 445 + err: 446 + return -EINVAL; 447 + } 448 + 449 + static struct target_type dm_pcache_target = { 450 + .name = "pcache", 451 + .version = {0, 1, 0}, 452 + .module = THIS_MODULE, 453 + .features = DM_TARGET_SINGLETON, 454 + .ctr = dm_pcache_ctr, 455 + .dtr = dm_pcache_dtr, 456 + .map = dm_pcache_map_bio, 457 + .status = dm_pcache_status, 458 + .message = dm_pcache_message, 459 + }; 460 + 461 + static int __init dm_pcache_init(void) 462 + { 463 + int ret; 464 + 465 + ret = pcache_backing_init(); 466 + if (ret) 467 + goto err; 468 + 469 + ret = pcache_cache_init(); 470 + if (ret) 471 + goto backing_exit; 472 + 473 + ret = dm_register_target(&dm_pcache_target); 474 + if (ret) 475 + goto cache_exit; 476 + return 0; 477 + 478 + cache_exit: 479 + pcache_cache_exit(); 480 + backing_exit: 481 + pcache_backing_exit(); 482 + err: 483 + return ret; 484 + } 485 + module_init(dm_pcache_init); 486 + 487 + static void __exit dm_pcache_exit(void) 488 + { 489 + dm_unregister_target(&dm_pcache_target); 490 + pcache_cache_exit(); 491 + pcache_backing_exit(); 492 + } 493 + module_exit(dm_pcache_exit); 494 + 495 + MODULE_DESCRIPTION("dm-pcache Persistent Cache for block device"); 496 + MODULE_AUTHOR("Dongsheng Yang <dongsheng.yang@linux.dev>"); 497 + MODULE_LICENSE("GPL");
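Given the constructor and option parsing above, a table line for this target takes the cache (pmem) device, the backing device, and an optional dm arg group (a count followed by key/value pairs, as consumed by dm_read_arg_group()). The sketch below builds such a line; the device paths and sector count are placeholders, not values from this patch:

```shell
# Hypothetical devices; the target length is in 512-byte sectors.
CACHE_DEV=/dev/pmem0
BACKING_DEV=/dev/sdb
SECTORS=2097152   # 1 GiB backing device, for illustration

# Optional args: "4" is the arg-group count, then two key/value pairs.
TABLE="0 $SECTORS pcache $CACHE_DEV $BACKING_DEV 4 cache_mode writeback data_crc true"
echo "$TABLE"

# With the module loaded, the table would be used as:
#   dmsetup create pcache0 --table "$TABLE"
#   dmsetup message pcache0 0 gc_percent 70   # handled by dm_pcache_message()
```

Note that `gc_percent` is currently the only message understood by dm_pcache_message(), and any malformed value returns -EINVAL.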
drivers/md/dm-pcache/dm_pcache.h (+67 lines)
/* SPDX-License-Identifier: GPL-2.0-or-later */
#ifndef _DM_PCACHE_H
#define _DM_PCACHE_H
#include <linux/device-mapper.h>

#include "../dm-core.h"

#define CACHE_DEV_TO_PCACHE(cache_dev)		(container_of(cache_dev, struct dm_pcache, cache_dev))
#define BACKING_DEV_TO_PCACHE(backing_dev)	(container_of(backing_dev, struct dm_pcache, backing_dev))
#define CACHE_TO_PCACHE(cache)			(container_of(cache, struct dm_pcache, cache))

#define PCACHE_STATE_RUNNING			1
#define PCACHE_STATE_STOPPING			2

struct pcache_cache_dev;
struct pcache_backing_dev;
struct pcache_cache;
struct pcache_cache_options;
struct dm_pcache {
	struct dm_target *ti;
	struct pcache_cache_dev cache_dev;
	struct pcache_backing_dev backing_dev;
	struct pcache_cache cache;
	struct pcache_cache_options opts;

	spinlock_t defered_req_list_lock;
	struct list_head defered_req_list;
	struct workqueue_struct *task_wq;

	struct work_struct defered_req_work;

	atomic_t state;
	atomic_t inflight_reqs;
	wait_queue_head_t inflight_wq;
};

static inline bool pcache_is_stopping(struct dm_pcache *pcache)
{
	return (atomic_read(&pcache->state) == PCACHE_STATE_STOPPING);
}

#define pcache_dev_err(pcache, fmt, ...)					\
	pcache_err("%s " fmt, pcache->ti->table->md->name, ##__VA_ARGS__)
#define pcache_dev_info(pcache, fmt, ...)					\
	pcache_info("%s " fmt, pcache->ti->table->md->name, ##__VA_ARGS__)
#define pcache_dev_debug(pcache, fmt, ...)					\
	pcache_debug("%s " fmt, pcache->ti->table->md->name, ##__VA_ARGS__)

struct pcache_request {
	struct dm_pcache *pcache;
	struct bio *bio;

	u64 off;
	u32 data_len;

	struct kref ref;
	int ret;

	struct list_head list_node;
};

void pcache_req_get(struct pcache_request *pcache_req);
void pcache_req_put(struct pcache_request *pcache_req, int ret);

void pcache_defer_reqs_kick(struct dm_pcache *pcache);

#endif /* _DM_PCACHE_H */
drivers/md/dm-pcache/pcache_internal.h (+117 lines)
/* SPDX-License-Identifier: GPL-2.0-or-later */
#ifndef _PCACHE_INTERNAL_H
#define _PCACHE_INTERNAL_H

#include <linux/delay.h>
#include <linux/crc32c.h>

#define pcache_err(fmt, ...)						\
	pr_err("dm-pcache: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)
#define pcache_info(fmt, ...)						\
	pr_info("dm-pcache: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)
#define pcache_debug(fmt, ...)						\
	pr_debug("dm-pcache: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)

#define PCACHE_KB			(1024ULL)
#define PCACHE_MB			(1024 * PCACHE_KB)

/* Maximum number of metadata indices */
#define PCACHE_META_INDEX_MAX		2

#define PCACHE_CRC_SEED			0x3B15A
/*
 * struct pcache_meta_header - PCACHE metadata header structure
 * @crc: CRC checksum for validating metadata integrity.
 * @seq: Sequence number to track metadata updates.
 * @version: Metadata version.
 * @res: Reserved space for future use.
 */
struct pcache_meta_header {
	__u32 crc;
	__u8  seq;
	__u8  version;
	__u16 res;
};

/*
 * pcache_meta_crc - Calculate CRC for the given metadata header.
 * @header: Pointer to the metadata header.
 * @meta_size: Size of the metadata structure.
 *
 * Returns the CRC checksum calculated by excluding the CRC field itself.
 */
static inline u32 pcache_meta_crc(struct pcache_meta_header *header, u32 meta_size)
{
	return crc32c(PCACHE_CRC_SEED, (void *)header + 4, meta_size - 4);
}

/*
 * pcache_meta_seq_after - Check if a sequence number is more recent, accounting for overflow.
 * @seq1: First sequence number.
 * @seq2: Second sequence number.
 *
 * Determines if @seq1 is more recent than @seq2 by calculating the signed
 * difference between them. This approach allows handling sequence number
 * overflow correctly because the difference wraps naturally, and any value
 * greater than zero indicates that @seq1 is "after" @seq2. This method
 * assumes 8-bit unsigned sequence numbers, where the difference wraps
 * around if seq1 overflows past seq2.
 *
 * Returns:
 * - true if @seq1 is more recent than @seq2, indicating it comes "after"
 * - false otherwise.
 */
static inline bool pcache_meta_seq_after(u8 seq1, u8 seq2)
{
	return (s8)(seq1 - seq2) > 0;
}

/*
 * pcache_meta_find_latest - Find the latest valid metadata.
 * @header: Pointer to the metadata header.
 * @meta_size: Size of each metadata block.
 *
 * Finds the latest valid metadata by checking sequence numbers. If a
 * valid entry with the highest sequence number is found, its pointer
 * is returned. Returns NULL if no valid metadata is found.
 */
static inline void __must_check *pcache_meta_find_latest(struct pcache_meta_header *header,
							 u32 meta_size, u32 meta_max_size,
							 void *meta_ret)
{
	struct pcache_meta_header *meta, *latest = NULL;
	u32 i, seq_latest = 0;
	void *meta_addr;

	meta = meta_ret;

	for (i = 0; i < PCACHE_META_INDEX_MAX; i++) {
		meta_addr = (void *)header + (i * meta_max_size);
		if (copy_mc_to_kernel(meta, meta_addr, meta_size)) {
			pcache_err("hardware memory error when copy meta");
			return ERR_PTR(-EIO);
		}

		/* Skip if CRC check fails, which means corrupted */
		if (meta->crc != pcache_meta_crc(meta, meta_size))
			continue;

		/* Update latest if a more recent sequence is found */
		if (!latest || pcache_meta_seq_after(meta->seq, seq_latest)) {
			seq_latest = meta->seq;
			latest = (void *)header + (i * meta_max_size);
		}
	}

	if (!latest)
		return NULL;

	if (copy_mc_to_kernel(meta_ret, latest, meta_size)) {
		pcache_err("hardware memory error");
		return ERR_PTR(-EIO);
	}

	return latest;
}

#endif /* _PCACHE_INTERNAL_H */
drivers/md/dm-pcache/segment.c (+61 lines)
// SPDX-License-Identifier: GPL-2.0-or-later
#include <linux/dax.h>

#include "pcache_internal.h"
#include "cache_dev.h"
#include "segment.h"

int segment_copy_to_bio(struct pcache_segment *segment,
			u32 data_off, u32 data_len, struct bio *bio, u32 bio_off)
{
	struct iov_iter iter;
	size_t copied;
	void *src;

	iov_iter_bvec(&iter, ITER_DEST, &bio->bi_io_vec[bio->bi_iter.bi_idx],
		      bio_segments(bio), bio->bi_iter.bi_size);
	iter.iov_offset = bio->bi_iter.bi_bvec_done;
	if (bio_off)
		iov_iter_advance(&iter, bio_off);

	src = segment->data + data_off;
	copied = _copy_mc_to_iter(src, data_len, &iter);
	if (copied != data_len)
		return -EIO;

	return 0;
}

int segment_copy_from_bio(struct pcache_segment *segment,
			  u32 data_off, u32 data_len, struct bio *bio, u32 bio_off)
{
	struct iov_iter iter;
	size_t copied;
	void *dst;

	iov_iter_bvec(&iter, ITER_SOURCE, &bio->bi_io_vec[bio->bi_iter.bi_idx],
		      bio_segments(bio), bio->bi_iter.bi_size);
	iter.iov_offset = bio->bi_iter.bi_bvec_done;
	if (bio_off)
		iov_iter_advance(&iter, bio_off);

	dst = segment->data + data_off;
	copied = _copy_from_iter_flushcache(dst, data_len, &iter);
	if (copied != data_len)
		return -EIO;
	pmem_wmb();

	return 0;
}

void pcache_segment_init(struct pcache_cache_dev *cache_dev, struct pcache_segment *segment,
			 struct pcache_segment_init_options *options)
{
	segment->seg_info = options->seg_info;
	segment_info_set_type(segment->seg_info, options->type);

	segment->cache_dev = cache_dev;
	segment->seg_id = options->seg_id;
	segment->data_size = PCACHE_SEG_SIZE - options->data_off;
	segment->data = CACHE_DEV_SEGMENT(cache_dev, options->seg_id) + options->data_off;
}
drivers/md/dm-pcache/segment.h (+74 lines)
/* SPDX-License-Identifier: GPL-2.0-or-later */
#ifndef _PCACHE_SEGMENT_H
#define _PCACHE_SEGMENT_H

#include <linux/bio.h>
#include <linux/bitfield.h>

#include "pcache_internal.h"

struct pcache_segment_info {
	struct pcache_meta_header header;
	__u32 flags;
	__u32 next_seg;
};

#define PCACHE_SEG_INFO_FLAGS_HAS_NEXT		BIT(0)

#define PCACHE_SEG_INFO_FLAGS_TYPE_MASK		GENMASK(4, 1)
#define PCACHE_SEGMENT_TYPE_CACHE_DATA		1

static inline bool segment_info_has_next(struct pcache_segment_info *seg_info)
{
	return (seg_info->flags & PCACHE_SEG_INFO_FLAGS_HAS_NEXT);
}

static inline void segment_info_set_type(struct pcache_segment_info *seg_info, u8 type)
{
	seg_info->flags &= ~PCACHE_SEG_INFO_FLAGS_TYPE_MASK;
	seg_info->flags |= FIELD_PREP(PCACHE_SEG_INFO_FLAGS_TYPE_MASK, type);
}

static inline u8 segment_info_get_type(struct pcache_segment_info *seg_info)
{
	return FIELD_GET(PCACHE_SEG_INFO_FLAGS_TYPE_MASK, seg_info->flags);
}

struct pcache_segment_pos {
	struct pcache_segment *segment;	/* Segment associated with the position */
	u32 off;			/* Offset within the segment */
};

struct pcache_segment_init_options {
	u8 type;
	u32 seg_id;
	u32 data_off;

	struct pcache_segment_info *seg_info;
};

struct pcache_segment {
	struct pcache_cache_dev *cache_dev;

	void *data;
	u32 data_size;
	u32 seg_id;

	struct pcache_segment_info *seg_info;
};

int segment_copy_to_bio(struct pcache_segment *segment,
			u32 data_off, u32 data_len, struct bio *bio, u32 bio_off);
int segment_copy_from_bio(struct pcache_segment *segment,
			  u32 data_off, u32 data_len, struct bio *bio, u32 bio_off);

static inline void segment_pos_advance(struct pcache_segment_pos *seg_pos, u32 len)
{
	BUG_ON(seg_pos->off + len > seg_pos->segment->data_size);

	seg_pos->off += len;
}

void pcache_segment_init(struct pcache_cache_dev *cache_dev, struct pcache_segment *segment,
			 struct pcache_segment_init_options *options);
#endif /* _PCACHE_SEGMENT_H */