z3fold: the 3-fold allocator for compressed pages

This patch introduces z3fold, a special purpose allocator for storing
compressed pages. It is designed to store up to three compressed pages
per physical page. It is a ZBUD derivative which allows for a higher
compression ratio while keeping the simplicity and determinism of its
predecessor.

This patch comes as a follow-up to the discussions at the Embedded Linux
Conference in San Diego related to the talk [1]. The outcome of these
discussions was that it would be good to have a compressed page
allocator as stable and deterministic as zbud but with a higher
compression ratio.

To keep the determinism and simplicity, z3fold, just like zbud, always
stores an integral number of compressed pages per page, but it can store
up to 3 pages unlike zbud which can store at most 2. Therefore the
compression ratio goes to around 2.7x while zbud's is around 1.7x.

The patch is based on the latest linux.git tree.

This version has been updated after testing on various simulators (e.g.
ARM Versatile Express, MIPS Malta, x86_64/Haswell) and based on
comments from Dan Streetman [3].

[1] https://openiotelc2016.sched.org/event/6DAC/swapping-and-embedded-compression-relieves-the-pressure-vitaly-wool-softprise-consulting-ou
[2] https://lkml.org/lkml/2016/4/21/799
[3] https://lkml.org/lkml/2016/5/4/852

Link: http://lkml.kernel.org/r/20160509151753.ec3f9fda3c9898d31ff52a32@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Vitaly Wool and committed by Linus Torvalds 10 years ago (9a001fc1 d5ee7c3b)

+830 -1

4 changed files

+26

Documentation/vm/z3fold.txt

···
+z3fold
+------
+
+z3fold is a special purpose allocator for storing compressed pages.
+It is designed to store up to three compressed pages per physical page.
+It is a zbud derivative which allows for a higher compression
+ratio while keeping the simplicity and determinism of its predecessor.
+
+The main differences between z3fold and zbud are:
+* unlike zbud, z3fold allows for up to PAGE_SIZE allocations
+* z3fold can hold up to 3 compressed pages in its page
+* z3fold doesn't export any API itself and is thus intended to be used
+  via the zpool API.
+
+To keep the determinism and simplicity, z3fold, just like zbud, always
+stores an integral number of compressed pages per page, but it can store
+up to 3 pages unlike zbud which can store at most 2. Therefore the
+compression ratio goes to around 2.7x while zbud's is around 1.7x.
+
+Unlike zbud (but like zsmalloc for that matter) z3fold_alloc() does not
+return a dereferenceable pointer. Instead, it returns an unsigned long
+handle which encodes the actual location of the allocated object.
+
+While keeping the effective compression ratio close to zsmalloc's, z3fold
+doesn't depend on the MMU being enabled and provides more predictable reclaim
+behavior, which makes it a better fit for small and response-critical systems.
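The handle described in the documentation above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel code itself: it assumes 4 KiB pages and an NCHUNKS_ORDER of 6, and every `sketch_`/`SKETCH_` name is hypothetical. It mirrors the idea that a page-aligned address leaves its low bits free to carry the buddy slot, offset by the header's first_num.

```c
#include <assert.h>

/* Assumptions for illustration only: 4 KiB pages, NCHUNKS_ORDER of 6. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_BUDDY_MASK	((1UL << 6) - 1)

enum sketch_buddy { SKETCH_HEADLESS = 0, SKETCH_FIRST, SKETCH_MIDDLE, SKETCH_LAST };

/* Pack a page-aligned address and a buddy slot into a single handle. */
static unsigned long sketch_encode(unsigned long page_addr,
				   enum sketch_buddy bud,
				   unsigned int first_num)
{
	unsigned long handle = page_addr;

	/* The low bits are only free if the address is page aligned. */
	assert((page_addr & ~SKETCH_PAGE_MASK) == 0);
	if (bud != SKETCH_HEADLESS)
		handle += (bud + first_num) & SKETCH_BUDDY_MASK;
	return handle;
}

/* Recover the page address by masking off the low bits. */
static unsigned long sketch_handle_to_page(unsigned long handle)
{
	return handle & SKETCH_PAGE_MASK;
}

/* Recover the buddy slot; first_num must match the value used to encode. */
static enum sketch_buddy sketch_handle_to_buddy(unsigned long handle,
						unsigned int first_num)
{
	return (enum sketch_buddy)((handle - first_num) & SKETCH_BUDDY_MASK);
}
```

With these assumptions, `sketch_encode(0x12345000UL, SKETCH_MIDDLE, 5)` yields `0x12345007`, and decoding that value with the same `first_num` gives back `SKETCH_MIDDLE`. The handle is not a usable pointer on its own, which is why callers go through a map step before touching the data.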

+11 -1

mm/Kconfig

···
 	  zsmalloc.
 
 config ZBUD
-	tristate "Low density storage for compressed pages"
+	tristate "Low (Up to 2x) density storage for compressed pages"
 	default n
 	help
 	  A special purpose allocator for storing compressed pages.
···
 	  page.  While this design limits storage density, it has simple and
 	  deterministic reclaim properties that make it preferable to a higher
 	  density approach when reclaim will be used.
+
+config Z3FOLD
+	tristate "Up to 3x density storage for compressed pages"
+	depends on ZPOOL
+	default n
+	help
+	  A special purpose allocator for storing compressed pages.
+	  It is designed to store up to three compressed pages per physical
+	  page. It is a ZBUD derivative so the simplicity and determinism are
+	  still there.
 
 config ZSMALLOC
 	tristate "Memory allocator for compressed pages"

mm/Makefile

···
 obj-$(CONFIG_ZPOOL)	+= zpool.o
 obj-$(CONFIG_ZBUD)	+= zbud.o
 obj-$(CONFIG_ZSMALLOC)	+= zsmalloc.o
+obj-$(CONFIG_Z3FOLD)	+= z3fold.o
 obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
 obj-$(CONFIG_CMA)	+= cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o

+792

mm/z3fold.c

···
+/*
+ * z3fold.c
+ *
+ * Author: Vitaly Wool <vitaly.wool@konsulko.com>
+ * Copyright (C) 2016, Sony Mobile Communications Inc.
+ *
+ * This implementation is based on zbud written by Seth Jennings.
+ *
+ * z3fold is a special purpose allocator for storing compressed pages. It
+ * can store up to three compressed pages per page which improves the
+ * compression ratio of zbud while retaining its main concepts (e.g. always
+ * storing an integral number of objects per page) and simplicity.
+ * It still has simple and deterministic reclaim properties that make it
+ * preferable to a higher density approach (with no requirement on integral
+ * number of objects per page) when reclaim is used.
+ *
+ * As in zbud, pages are divided into "chunks".  The size of the chunks is
+ * fixed at compile time and is determined by NCHUNKS_ORDER below.
+ *
+ * z3fold doesn't export any API and is meant to be used via zpool API.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/atomic.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/preempt.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/zpool.h>
+
+/*****************
+ * Structures
+*****************/
+/*
+ * NCHUNKS_ORDER determines the internal allocation granularity, effectively
+ * adjusting internal fragmentation.  It also determines the number of
+ * freelists maintained in each pool. NCHUNKS_ORDER of 6 means that the
+ * allocation granularity will be in chunks of size PAGE_SIZE/64. As one chunk
+ * in an allocated page is occupied by the z3fold header, NCHUNKS works out
+ * to 63, which is the maximum number of free chunks in a z3fold page; there
+ * will also be 63 freelists per pool.
+ */
+#define NCHUNKS_ORDER	6
+
+#define CHUNK_SHIFT	(PAGE_SHIFT - NCHUNKS_ORDER)
+#define CHUNK_SIZE	(1 << CHUNK_SHIFT)
+#define ZHDR_SIZE_ALIGNED CHUNK_SIZE
+#define NCHUNKS		((PAGE_SIZE - ZHDR_SIZE_ALIGNED) >> CHUNK_SHIFT)
+
+#define BUDDY_MASK	((1 << NCHUNKS_ORDER) - 1)
+
+struct z3fold_pool;
+struct z3fold_ops {
+	int (*evict)(struct z3fold_pool *pool, unsigned long handle);
+};
+
+/**
+ * struct z3fold_pool - stores metadata for each z3fold pool
+ * @lock:	protects all pool fields and first|middle|last_chunks fields of
+ *		any z3fold page in the pool
+ * @unbuddied:	array of lists tracking z3fold pages that contain 2 or fewer
+ *		buddies; the list each z3fold page is added to depends on
+ *		the size of its free region.
+ * @buddied:	list tracking the z3fold pages that contain 3 buddies;
+ *		these z3fold pages are full
+ * @lru:	list tracking the z3fold pages in LRU order by most recently
+ *		added buddy.
+ * @pages_nr:	number of z3fold pages in the pool.
+ * @ops:	pointer to a structure of user defined operations specified at
+ *		pool creation time.
+ *
+ * This structure is allocated at pool creation time and maintains metadata
+ * pertaining to a particular z3fold pool.
+ */
+struct z3fold_pool {
+	spinlock_t lock;
+	struct list_head unbuddied[NCHUNKS];
+	struct list_head buddied;
+	struct list_head lru;
+	u64 pages_nr;
+	const struct z3fold_ops *ops;
+	struct zpool *zpool;
+	const struct zpool_ops *zpool_ops;
+};
+
+enum buddy {
+	HEADLESS = 0,
+	FIRST,
+	MIDDLE,
+	LAST,
+	BUDDIES_MAX
+};
+
+/*
+ * struct z3fold_header - z3fold page metadata occupying the first chunk of each
+ *			z3fold page, except for HEADLESS pages
+ * @buddy:	links the z3fold page into the relevant list in the pool
+ * @first_chunks:	the size of the first buddy in chunks, 0 if free
+ * @middle_chunks:	the size of the middle buddy in chunks, 0 if free
+ * @last_chunks:	the size of the last buddy in chunks, 0 if free
+ * @start_middle:	index of the chunk where the middle buddy starts, 0 if free
+ * @first_num:		the starting number (for the first handle)
+ */
+struct z3fold_header {
+	struct list_head buddy;
+	unsigned short first_chunks;
+	unsigned short middle_chunks;
+	unsigned short last_chunks;
+	unsigned short start_middle;
+	unsigned short first_num:NCHUNKS_ORDER;
+};
+
+/*
+ * Internal z3fold page flags
+ */
+enum z3fold_page_flags {
+	UNDER_RECLAIM = 0,
+	PAGE_HEADLESS,
+	MIDDLE_CHUNK_MAPPED,
+};
+
+/*****************
+ * Helpers
+*****************/
+
+/* Converts an allocation size in bytes to size in z3fold chunks */
+static int size_to_chunks(size_t size)
+{
+	return (size + CHUNK_SIZE - 1) >> CHUNK_SHIFT;
+}
+
+#define for_each_unbuddied_list(_iter, _begin) \
+	for ((_iter) = (_begin); (_iter) < NCHUNKS; (_iter)++)
+
+/* Initializes the z3fold header of a newly allocated z3fold page */
+static struct z3fold_header *init_z3fold_page(struct page *page)
+{
+	struct z3fold_header *zhdr = page_address(page);
+
+	INIT_LIST_HEAD(&page->lru);
+	clear_bit(UNDER_RECLAIM, &page->private);
+	clear_bit(PAGE_HEADLESS, &page->private);
+	clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
+
+	zhdr->first_chunks = 0;
+	zhdr->middle_chunks = 0;
+	zhdr->last_chunks = 0;
+	zhdr->first_num = 0;
+	zhdr->start_middle = 0;
+	INIT_LIST_HEAD(&zhdr->buddy);
+	return zhdr;
+}
+
+/* Frees the page that backs the given z3fold header */
+static void free_z3fold_page(struct z3fold_header *zhdr)
+{
+	__free_page(virt_to_page(zhdr));
+}
+
+/*
+ * Encodes the handle of a particular buddy within a z3fold page
+ * Pool lock should be held as this function accesses first_num
+ */
+static unsigned long encode_handle(struct z3fold_header *zhdr, enum buddy bud)
+{
+	unsigned long handle;
+
+	handle = (unsigned long)zhdr;
+	if (bud != HEADLESS)
+		handle += (bud + zhdr->first_num) & BUDDY_MASK;
+	return handle;
+}
+
+/* Returns the z3fold page where a given handle is stored */
+static struct z3fold_header *handle_to_z3fold_header(unsigned long handle)
+{
+	return (struct z3fold_header *)(handle & PAGE_MASK);
+}
+
+/* Returns buddy number */
+static enum buddy handle_to_buddy(unsigned long handle)
+{
+	struct z3fold_header *zhdr = handle_to_z3fold_header(handle);
+	return (handle - zhdr->first_num) & BUDDY_MASK;
+}
+
+/*
+ * Returns the number of free chunks in a z3fold page.
+ * NB: can't be used with HEADLESS pages.
+ */
+static int num_free_chunks(struct z3fold_header *zhdr)
+{
+	int nfree;
+	/*
+	 * If there is a middle object, pick up the bigger free space
+	 * either before or after it. Otherwise just subtract the number
+	 * of chunks occupied by the first and the last objects.
+	 */
+	if (zhdr->middle_chunks != 0) {
+		int nfree_before = zhdr->first_chunks ?
+			0 : zhdr->start_middle - 1;
+		int nfree_after = zhdr->last_chunks ?
+			0 : NCHUNKS - zhdr->start_middle - zhdr->middle_chunks;
+		nfree = max(nfree_before, nfree_after);
+	} else
+		nfree = NCHUNKS - zhdr->first_chunks - zhdr->last_chunks;
+	return nfree;
+}
+
+/*****************
+ * API Functions
+*****************/
+/**
+ * z3fold_create_pool() - create a new z3fold pool
+ * @gfp:	gfp flags when allocating the z3fold pool structure
+ * @ops:	user-defined operations for the z3fold pool
+ *
+ * Return: pointer to the new z3fold pool or NULL if the metadata allocation
+ * failed.
+ */
+static struct z3fold_pool *z3fold_create_pool(gfp_t gfp,
+		const struct z3fold_ops *ops)
+{
+	struct z3fold_pool *pool;
+	int i;
+
+	pool = kzalloc(sizeof(struct z3fold_pool), gfp);
+	if (!pool)
+		return NULL;
+	spin_lock_init(&pool->lock);
+	for_each_unbuddied_list(i, 0)
+		INIT_LIST_HEAD(&pool->unbuddied[i]);
+	INIT_LIST_HEAD(&pool->buddied);
+	INIT_LIST_HEAD(&pool->lru);
+	pool->pages_nr = 0;
+	pool->ops = ops;
+	return pool;
+}
+
+/**
+ * z3fold_destroy_pool() - destroys an existing z3fold pool
+ * @pool:	the z3fold pool to be destroyed
+ *
+ * The pool should be emptied before this function is called.
+ */
+static void z3fold_destroy_pool(struct z3fold_pool *pool)
+{
+	kfree(pool);
+}
+
+/* Has to be called with lock held */
+static int z3fold_compact_page(struct z3fold_header *zhdr)
+{
+	struct page *page = virt_to_page(zhdr);
+	void *beg = zhdr;
+
+	if (!test_bit(MIDDLE_CHUNK_MAPPED, &page->private) &&
+	    zhdr->middle_chunks != 0 &&
+	    zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
+		memmove(beg + ZHDR_SIZE_ALIGNED,
+			beg + (zhdr->start_middle << CHUNK_SHIFT),
+			zhdr->middle_chunks << CHUNK_SHIFT);
+		zhdr->first_chunks = zhdr->middle_chunks;
+		zhdr->middle_chunks = 0;
+		zhdr->start_middle = 0;
+		zhdr->first_num++;
+		return 1;
+	}
+	return 0;
+}
+
+/**
+ * z3fold_alloc() - allocates a region of a given size
+ * @pool:	z3fold pool from which to allocate
+ * @size:	size in bytes of the desired allocation
+ * @gfp:	gfp flags used if the pool needs to grow
+ * @handle:	handle of the new allocation
+ *
+ * This function will attempt to find a free region in the pool large enough to
+ * satisfy the allocation request.  A search of the unbuddied lists is
+ * performed first. If no suitable free region is found, then a new page is
+ * allocated and added to the pool to satisfy the request.
+ *
+ * gfp should not set __GFP_HIGHMEM as highmem pages cannot be used
+ * as z3fold pool pages.
+ *
+ * Return: 0 if success and handle is set, otherwise -EINVAL if the size or
+ * gfp arguments are invalid, -ENOSPC if the size exceeds PAGE_SIZE, or -ENOMEM
+ * if the pool was unable to allocate a new page.
+ */
+static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
+			unsigned long *handle)
+{
+	int chunks = 0, i, freechunks;
+	struct z3fold_header *zhdr = NULL;
+	enum buddy bud;
+	struct page *page;
+
+	if (!size || (gfp & __GFP_HIGHMEM))
+		return -EINVAL;
+
+	if (size > PAGE_SIZE)
+		return -ENOSPC;
+
+	if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE)
+		bud = HEADLESS;
+	else {
+		chunks = size_to_chunks(size);
+		spin_lock(&pool->lock);
+
+		/* First, try to find an unbuddied z3fold page. */
+		zhdr = NULL;
+		for_each_unbuddied_list(i, chunks) {
+			if (!list_empty(&pool->unbuddied[i])) {
+				zhdr = list_first_entry(&pool->unbuddied[i],
+						struct z3fold_header, buddy);
+				page = virt_to_page(zhdr);
+				if (zhdr->first_chunks == 0) {
+					if (zhdr->middle_chunks != 0 &&
+					    chunks >= zhdr->start_middle)
+						bud = LAST;
+					else
+						bud = FIRST;
+				} else if (zhdr->last_chunks == 0)
+					bud = LAST;
+				else if (zhdr->middle_chunks == 0)
+					bud = MIDDLE;
+				else {
+					pr_err("No free chunks in unbuddied\n");
+					WARN_ON(1);
+					continue;
+				}
+				list_del(&zhdr->buddy);
+				goto found;
+			}
+		}
+		bud = FIRST;
+		spin_unlock(&pool->lock);
+	}
+
+	/* Couldn't find unbuddied z3fold page, create new one */
+	page = alloc_page(gfp);
+	if (!page)
+		return -ENOMEM;
+	spin_lock(&pool->lock);
+	pool->pages_nr++;
+	zhdr = init_z3fold_page(page);
+
+	if (bud == HEADLESS) {
+		set_bit(PAGE_HEADLESS, &page->private);
+		goto headless;
+	}
+
+found:
+	if (bud == FIRST)
+		zhdr->first_chunks = chunks;
+	else if (bud == LAST)
+		zhdr->last_chunks = chunks;
+	else {
+		zhdr->middle_chunks = chunks;
+		zhdr->start_middle = zhdr->first_chunks + 1;
+	}
+
+	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
+			zhdr->middle_chunks == 0) {
+		/* Add to unbuddied list */
+		freechunks = num_free_chunks(zhdr);
+		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
+	} else {
+		/* Add to buddied list */
+		list_add(&zhdr->buddy, &pool->buddied);
+	}
+
+headless:
+	/* Add/move z3fold page to beginning of LRU */
+	if (!list_empty(&page->lru))
+		list_del(&page->lru);
+
+	list_add(&page->lru, &pool->lru);
+
+	*handle = encode_handle(zhdr, bud);
+	spin_unlock(&pool->lock);
+
+	return 0;
+}
+
+/**
+ * z3fold_free() - frees the allocation associated with the given handle
+ * @pool:	pool in which the allocation resided
+ * @handle:	handle associated with the allocation returned by z3fold_alloc()
+ *
+ * In the case that the z3fold page in which the allocation resides is under
+ * reclaim, as indicated by the UNDER_RECLAIM flag being set, this function
+ * only sets the first|middle|last_chunks to 0.  The page is actually freed
+ * once all buddies are evicted (see z3fold_reclaim_page() below).
+ */
+static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
+{
+	struct z3fold_header *zhdr;
+	int freechunks;
+	struct page *page;
+	enum buddy bud;
+
+	spin_lock(&pool->lock);
+	zhdr = handle_to_z3fold_header(handle);
+	page = virt_to_page(zhdr);
+
+	if (test_bit(PAGE_HEADLESS, &page->private)) {
+		/* HEADLESS page stored */
+		bud = HEADLESS;
+	} else {
+		bud = (handle - zhdr->first_num) & BUDDY_MASK;
+
+		switch (bud) {
+		case FIRST:
+			zhdr->first_chunks = 0;
+			break;
+		case MIDDLE:
+			zhdr->middle_chunks = 0;
+			zhdr->start_middle = 0;
+			break;
+		case LAST:
+			zhdr->last_chunks = 0;
+			break;
+		default:
+			pr_err("%s: unknown bud %d\n", __func__, bud);
+			WARN_ON(1);
+			spin_unlock(&pool->lock);
+			return;
+		}
+	}
+
+	if (test_bit(UNDER_RECLAIM, &page->private)) {
+		/* z3fold page is under reclaim, reclaim will free */
+		spin_unlock(&pool->lock);
+		return;
+	}
+
+	if (bud != HEADLESS) {
+		/* Remove from existing buddy list */
+		list_del(&zhdr->buddy);
+	}
+
+	if (bud == HEADLESS ||
+	    (zhdr->first_chunks == 0 && zhdr->middle_chunks == 0 &&
+			zhdr->last_chunks == 0)) {
+		/* z3fold page is empty, free */
+		list_del(&page->lru);
+		clear_bit(PAGE_HEADLESS, &page->private);
+		free_z3fold_page(zhdr);
+		pool->pages_nr--;
+	} else {
+		z3fold_compact_page(zhdr);
+		/* Add to the unbuddied list */
+		freechunks = num_free_chunks(zhdr);
+		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
+	}
+
+	spin_unlock(&pool->lock);
+}
+
+/**
+ * z3fold_reclaim_page() - evicts allocations from a pool page and frees it
+ * @pool:	pool from which a page will attempt to be evicted
+ * @retries:	number of pages on the LRU list for which eviction will
+ *		be attempted before failing
+ *
+ * z3fold reclaim is different from normal system reclaim in that it is done
+ * from the bottom, up. This is because only the bottom layer, z3fold, has
+ * information on how the allocations are organized within each z3fold page.
+ * This has the potential to create interesting locking situations between
+ * z3fold and the user, however.
+ *
+ * To avoid these, this is how z3fold_reclaim_page() should be called:
+ *
+ * The user detects a page should be reclaimed and calls z3fold_reclaim_page().
+ * z3fold_reclaim_page() will remove a z3fold page from the pool LRU list and
+ * call the user-defined eviction handler with the pool and handle as
+ * arguments.
+ *
+ * If the handle cannot be evicted, the eviction handler should return
+ * non-zero. z3fold_reclaim_page() will add the z3fold page back to the
+ * appropriate list and try the next z3fold page on the LRU up to
+ * a user defined number of retries.
+ *
+ * If the handle is successfully evicted, the eviction handler should
+ * return 0 _and_ should have called z3fold_free() on the handle. z3fold_free()
+ * contains logic to delay freeing the page if the page is under reclaim,
+ * as indicated by the setting of the UNDER_RECLAIM flag on the underlying page.
+ *
+ * If all buddies in the z3fold page are successfully evicted, then the
+ * z3fold page can be freed.
+ *
+ * Returns: 0 if page is successfully freed, otherwise -EINVAL if there are
+ * no pages to evict or an eviction handler is not registered, -EAGAIN if
+ * the retry limit was hit.
+ */
+static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
+{
+	int i, ret = 0, freechunks;
+	struct z3fold_header *zhdr;
+	struct page *page;
+	unsigned long first_handle = 0, middle_handle = 0, last_handle = 0;
+
+	spin_lock(&pool->lock);
+	if (!pool->ops || !pool->ops->evict || list_empty(&pool->lru) ||
+			retries == 0) {
+		spin_unlock(&pool->lock);
+		return -EINVAL;
+	}
+	for (i = 0; i < retries; i++) {
+		page = list_last_entry(&pool->lru, struct page, lru);
+		list_del(&page->lru);
+
+		/* Protect z3fold page against free */
+		set_bit(UNDER_RECLAIM, &page->private);
+		zhdr = page_address(page);
+		if (!test_bit(PAGE_HEADLESS, &page->private)) {
+			list_del(&zhdr->buddy);
+			/*
+			 * We need to encode the handles before unlocking,
+			 * since we can race with free that will set
+			 * (first|middle|last)_chunks to 0
+			 */
+			first_handle = 0;
+			last_handle = 0;
+			middle_handle = 0;
+			if (zhdr->first_chunks)
+				first_handle = encode_handle(zhdr, FIRST);
+			if (zhdr->middle_chunks)
+				middle_handle = encode_handle(zhdr, MIDDLE);
+			if (zhdr->last_chunks)
+				last_handle = encode_handle(zhdr, LAST);
+		} else {
+			first_handle = encode_handle(zhdr, HEADLESS);
+			last_handle = middle_handle = 0;
+		}
+
+		spin_unlock(&pool->lock);
+
+		/* Issue the eviction callback(s) */
+		if (middle_handle) {
+			ret = pool->ops->evict(pool, middle_handle);
+			if (ret)
+				goto next;
+		}
+		if (first_handle) {
+			ret = pool->ops->evict(pool, first_handle);
+			if (ret)
+				goto next;
+		}
+		if (last_handle) {
+			ret = pool->ops->evict(pool, last_handle);
+			if (ret)
+				goto next;
+		}
+next:
+		spin_lock(&pool->lock);
+		clear_bit(UNDER_RECLAIM, &page->private);
+		if ((test_bit(PAGE_HEADLESS, &page->private) && ret == 0) ||
+		    (zhdr->first_chunks == 0 && zhdr->last_chunks == 0 &&
+		     zhdr->middle_chunks == 0)) {
+			/*
+			 * All buddies are now free, free the z3fold page and
+			 * return success.
+			 */
+			clear_bit(PAGE_HEADLESS, &page->private);
+			free_z3fold_page(zhdr);
+			pool->pages_nr--;
+			spin_unlock(&pool->lock);
+			return 0;
+		} else if (zhdr->first_chunks != 0 &&
+			   zhdr->last_chunks != 0 && zhdr->middle_chunks != 0) {
+			/* Full, add to buddied list */
+			list_add(&zhdr->buddy, &pool->buddied);
+		} else if (!test_bit(PAGE_HEADLESS, &page->private)) {
+			z3fold_compact_page(zhdr);
+			/* add to unbuddied list */
+			freechunks = num_free_chunks(zhdr);
+			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
+		}
+
+		/* add to beginning of LRU */
+		list_add(&page->lru, &pool->lru);
+	}
+	spin_unlock(&pool->lock);
+	return -EAGAIN;
+}
+
+/**
+ * z3fold_map() - maps the allocation associated with the given handle
+ * @pool:	pool in which the allocation resides
+ * @handle:	handle associated with the allocation to be mapped
+ *
+ * Extracts the buddy number from handle and constructs the pointer to the
+ * correct starting chunk within the page.
+ *
+ * Returns: a pointer to the mapped allocation
+ */
+static void *z3fold_map(struct z3fold_pool *pool, unsigned long handle)
+{
+	struct z3fold_header *zhdr;
+	struct page *page;
+	void *addr;
+	enum buddy buddy;
+
+	spin_lock(&pool->lock);
+	zhdr = handle_to_z3fold_header(handle);
+	addr = zhdr;
+	page = virt_to_page(zhdr);
+
+	if (test_bit(PAGE_HEADLESS, &page->private))
+		goto out;
+
+	buddy = handle_to_buddy(handle);
+	switch (buddy) {
+	case FIRST:
+		addr += ZHDR_SIZE_ALIGNED;
+		break;
+	case MIDDLE:
+		addr += zhdr->start_middle << CHUNK_SHIFT;
+		set_bit(MIDDLE_CHUNK_MAPPED, &page->private);
+		break;
+	case LAST:
+		addr += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT);
+		break;
+	default:
+		pr_err("unknown buddy id %d\n", buddy);
+		WARN_ON(1);
+		addr = NULL;
+		break;
+	}
+out:
+	spin_unlock(&pool->lock);
+	return addr;
+}
+
+/**
+ * z3fold_unmap() - unmaps the allocation associated with the given handle
+ * @pool:	pool in which the allocation resides
+ * @handle:	handle associated with the allocation to be unmapped
+ */
+static void z3fold_unmap(struct z3fold_pool *pool, unsigned long handle)
+{
+	struct z3fold_header *zhdr;
+	struct page *page;
+	enum buddy buddy;
+
+	spin_lock(&pool->lock);
+	zhdr = handle_to_z3fold_header(handle);
+	page = virt_to_page(zhdr);
+
+	if (test_bit(PAGE_HEADLESS, &page->private)) {
+		spin_unlock(&pool->lock);
+		return;
+	}
+
+	buddy = handle_to_buddy(handle);
+	if (buddy == MIDDLE)
+		clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
+	spin_unlock(&pool->lock);
+}
+
+/**
+ * z3fold_get_pool_size() - gets the z3fold pool size in pages
+ * @pool:	pool whose size is being queried
+ *
+ * Returns: size in pages of the given pool.  The pool lock need not be
+ * taken to access pages_nr.
+ */
+static u64 z3fold_get_pool_size(struct z3fold_pool *pool)
+{
+	return pool->pages_nr;
+}
+
+/*****************
+ * zpool
+ ****************/
+
+static int z3fold_zpool_evict(struct z3fold_pool *pool, unsigned long handle)
+{
+	if (pool->zpool && pool->zpool_ops && pool->zpool_ops->evict)
+		return pool->zpool_ops->evict(pool->zpool, handle);
+	else
+		return -ENOENT;
+}
+
+static const struct z3fold_ops z3fold_zpool_ops = {
+	.evict =	z3fold_zpool_evict
+};
+
+static void *z3fold_zpool_create(const char *name, gfp_t gfp,
+			       const struct zpool_ops *zpool_ops,
+			       struct zpool *zpool)
+{
+	struct z3fold_pool *pool;
+
+	pool = z3fold_create_pool(gfp, zpool_ops ? &z3fold_zpool_ops : NULL);
+	if (pool) {
+		pool->zpool = zpool;
+		pool->zpool_ops = zpool_ops;
+	}
+	return pool;
+}
+
+static void z3fold_zpool_destroy(void *pool)
+{
+	z3fold_destroy_pool(pool);
+}
+
+static int z3fold_zpool_malloc(void *pool, size_t size, gfp_t gfp,
+			unsigned long *handle)
+{
+	return z3fold_alloc(pool, size, gfp, handle);
+}
+static void z3fold_zpool_free(void *pool, unsigned long handle)
+{
+	z3fold_free(pool, handle);
+}
+
+static int z3fold_zpool_shrink(void *pool, unsigned int pages,
+			unsigned int *reclaimed)
+{
+	unsigned int total = 0;
+	int ret = -EINVAL;
+
+	while (total < pages) {
+		ret = z3fold_reclaim_page(pool, 8);
+		if (ret < 0)
+			break;
+		total++;
+	}
+
+	if (reclaimed)
+		*reclaimed = total;
+
+	return ret;
+}
+
+static void *z3fold_zpool_map(void *pool, unsigned long handle,
+			enum zpool_mapmode mm)
+{
+	return z3fold_map(pool, handle);
+}
+static void z3fold_zpool_unmap(void *pool, unsigned long handle)
+{
+	z3fold_unmap(pool, handle);
+}
+
+static u64 z3fold_zpool_total_size(void *pool)
+{
+	return z3fold_get_pool_size(pool) * PAGE_SIZE;
+}
+
+static struct zpool_driver z3fold_zpool_driver = {
+	.type =		"z3fold",
+	.owner =	THIS_MODULE,
+	.create =	z3fold_zpool_create,
+	.destroy =	z3fold_zpool_destroy,
+	.malloc =	z3fold_zpool_malloc,
+	.free =		z3fold_zpool_free,
+	.shrink =	z3fold_zpool_shrink,
+	.map =		z3fold_zpool_map,
+	.unmap =	z3fold_zpool_unmap,
+	.total_size =	z3fold_zpool_total_size,
+};
+
+MODULE_ALIAS("zpool-z3fold");
+
+static int __init init_z3fold(void)
+{
+	/* Make sure the z3fold header will fit in one chunk */
+	BUILD_BUG_ON(sizeof(struct z3fold_header) > ZHDR_SIZE_ALIGNED);
+	zpool_register_driver(&z3fold_zpool_driver);
+
+	return 0;
+}
+
+static void __exit exit_z3fold(void)
+{
+	zpool_unregister_driver(&z3fold_zpool_driver);
+}
+
+module_init(init_z3fold);
+module_exit(exit_z3fold);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Vitaly Wool <vitalywool@gmail.com>");
+MODULE_DESCRIPTION("3-Fold Allocator for Compressed Pages");
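The chunk arithmetic used throughout mm/z3fold.c can be tried in userspace. The sketch below is an illustration under stated assumptions, not part of the patch: it fixes the page size at 4 KiB (so a chunk is 64 bytes and NCHUNKS is 63), uses a reduced header struct, and all `sketch_`/`SKETCH_` names are hypothetical. It mirrors size_to_chunks(), num_free_chunks() and the byte offsets that z3fold_map() computes for each buddy.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 4 KiB pages; the kernel uses PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_NCHUNKS_ORDER	6
#define SKETCH_CHUNK_SHIFT	(SKETCH_PAGE_SHIFT - SKETCH_NCHUNKS_ORDER)
#define SKETCH_CHUNK_SIZE	(1UL << SKETCH_CHUNK_SHIFT)
#define SKETCH_NCHUNKS \
	((int)((SKETCH_PAGE_SIZE - SKETCH_CHUNK_SIZE) >> SKETCH_CHUNK_SHIFT))

/* Reduced stand-in for struct z3fold_header (no list linkage, no first_num). */
struct sketch_hdr {
	unsigned short first_chunks;
	unsigned short middle_chunks;
	unsigned short last_chunks;
	unsigned short start_middle;
};

/* Round a byte size up to whole chunks. */
static int sketch_size_to_chunks(size_t size)
{
	assert(size <= SKETCH_PAGE_SIZE);
	return (int)((size + SKETCH_CHUNK_SIZE - 1) >> SKETCH_CHUNK_SHIFT);
}

/* Same rule as num_free_chunks(): largest usable gap, in chunks. */
static int sketch_num_free_chunks(const struct sketch_hdr *h)
{
	if (h->middle_chunks != 0) {
		int before = h->first_chunks ? 0 : h->start_middle - 1;
		int after = h->last_chunks ?
			0 : SKETCH_NCHUNKS - h->start_middle - h->middle_chunks;
		return before > after ? before : after;
	}
	return SKETCH_NCHUNKS - h->first_chunks - h->last_chunks;
}

/* Byte offsets within the page, as computed in z3fold_map(). */
static size_t sketch_first_offset(void)
{
	return SKETCH_CHUNK_SIZE;	/* right after the header chunk */
}

static size_t sketch_middle_offset(const struct sketch_hdr *h)
{
	return (size_t)h->start_middle << SKETCH_CHUNK_SHIFT;
}

static size_t sketch_last_offset(const struct sketch_hdr *h)
{
	return SKETCH_PAGE_SIZE - ((size_t)h->last_chunks << SKETCH_CHUNK_SHIFT);
}
```

Under these assumptions a 1000-byte object needs 16 chunks. A page holding that object as the first buddy plus a 10-chunk middle buddy starting at chunk 17 reports 36 free chunks, and a last buddy of 8 chunks starts at byte offset 3584.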