Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax

Pull XArray conversion from Matthew Wilcox:
"The XArray provides an improved interface to the radix tree data
structure, providing locking as part of the API, specifying GFP flags
at allocation time, eliminating preloading, reducing re-walking of the
tree, making iterations more efficient, and not exposing RCU-protected
pointers to
its users.

This patch set:

1. Introduces the XArray implementation

2. Converts the pagecache to use it

3. Converts memremap to use it

The page cache is the most complex and important user of the radix
tree, so converting it was most important. Converting the memremap
code removes the only other user of the multiorder code, which allows
us to remove the radix tree code that supported it.

I have 40+ followup patches to convert many other users of the radix
tree over to the XArray, but I'd like to get this part in first. The
other conversions haven't been in linux-next and aren't suitable for
applying yet, but you can see them in the xarray-conv branch if you're
interested"

* 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
radix tree: Remove multiorder support
radix tree test: Convert multiorder tests to XArray
radix tree tests: Convert item_delete_rcu to XArray
radix tree tests: Convert item_kill_tree to XArray
radix tree tests: Move item_insert_order
radix tree test suite: Remove multiorder benchmarking
radix tree test suite: Remove __item_insert
memremap: Convert to XArray
xarray: Add range store functionality
xarray: Move multiorder_check to in-kernel tests
xarray: Move multiorder_shrink to kernel tests
xarray: Move multiorder account test in-kernel
radix tree test suite: Convert iteration test to XArray
radix tree test suite: Convert tag_tagged_items to XArray
radix tree: Remove radix_tree_clear_tags
radix tree: Remove radix_tree_maybe_preload_order
radix tree: Remove split/join code
radix tree: Remove radix_tree_update_node_t
page cache: Finish XArray conversion
dax: Convert page fault handlers to XArray
...

+7056 -3825
-1
.clang-format
··· 323 323 - 'protocol_for_each_card' 324 324 - 'protocol_for_each_dev' 325 325 - 'queue_for_each_hw_ctx' 326 - - 'radix_tree_for_each_contig' 327 326 - 'radix_tree_for_each_slot' 328 327 - 'radix_tree_for_each_tagged' 329 328 - 'rbtree_postorder_for_each_entry_safe'
+7
.mailmap
··· 119 119 Mark Yao <markyao0591@gmail.com> <mark.yao@rock-chips.com> 120 120 Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com> 121 121 Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com> 122 + Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com> 123 + Matthew Wilcox <willy@infradead.org> <matthew@wil.cx> 124 + Matthew Wilcox <willy@infradead.org> <mawilcox@linuxonhyperv.com> 125 + Matthew Wilcox <willy@infradead.org> <mawilcox@microsoft.com> 126 + Matthew Wilcox <willy@infradead.org> <willy@debian.org> 127 + Matthew Wilcox <willy@infradead.org> <willy@linux.intel.com> 128 + Matthew Wilcox <willy@infradead.org> <willy@parisc-linux.org> 122 129 Matthieu CASTET <castet.matthieu@free.fr> 123 130 Mauro Carvalho Chehab <mchehab@kernel.org> <mchehab@brturbo.com.br> 124 131 Mauro Carvalho Chehab <mchehab@kernel.org> <maurochehab@gmail.com>
+1
Documentation/core-api/index.rst
··· 21 21 local_ops 22 22 workqueue 23 23 genericirq 24 + xarray 24 25 flexible-arrays 25 26 librs 26 27 genalloc
+435
Documentation/core-api/xarray.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0+ 2 + 3 + ====== 4 + XArray 5 + ====== 6 + 7 + :Author: Matthew Wilcox 8 + 9 + Overview 10 + ======== 11 + 12 + The XArray is an abstract data type which behaves like a very large array 13 + of pointers. It meets many of the same needs as a hash or a conventional 14 + resizable array. Unlike a hash, it allows you to sensibly go to the 15 + next or previous entry in a cache-efficient manner. In contrast to a 16 + resizable array, there is no need to copy data or change MMU mappings in 17 + order to grow the array. It is more memory-efficient, parallelisable 18 + and cache friendly than a doubly-linked list. It takes advantage of 19 + RCU to perform lookups without locking. 20 + 21 + The XArray implementation is efficient when the indices used are densely 22 + clustered; hashing the object and using the hash as the index will not 23 + perform well. The XArray is optimised for small indices, but still has 24 + good performance with large indices. If your index can be larger than 25 + ``ULONG_MAX`` then the XArray is not the data type for you. The most 26 + important user of the XArray is the page cache. 27 + 28 + Each non-``NULL`` entry in the array has three bits associated with 29 + it called marks. Each mark may be set or cleared independently of 30 + the others. You can iterate over entries which are marked. 31 + 32 + Normal pointers may be stored in the XArray directly. They must be 4-byte 33 + aligned, which is true for any pointer returned from :c:func:`kmalloc` and 34 + :c:func:`alloc_page`. It isn't true for arbitrary user-space pointers, 35 + nor for function pointers. You can store pointers to statically allocated 36 + objects, as long as those objects have an alignment of at least 4. 37 + 38 + You can also store integers between 0 and ``LONG_MAX`` in the XArray. 39 + You must first convert them into an entry using :c:func:`xa_mk_value`. 
40 + When you retrieve an entry from the XArray, you can check whether it is 41 + a value entry by calling :c:func:`xa_is_value`, and convert it back to 42 + an integer by calling :c:func:`xa_to_value`. 43 + 44 + Some users want to store tagged pointers instead of using the marks 45 + described above. They can call :c:func:`xa_tag_pointer` to create an 46 + entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry 47 + back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve 48 + the tag of an entry. Tagged pointers use the same bits that are used 49 + to distinguish value entries from normal pointers, so each user must 50 + decide whether they want to store value entries or tagged pointers in 51 + any particular XArray. 52 + 53 + The XArray does not support storing :c:func:`IS_ERR` pointers as some 54 + conflict with value entries or internal entries. 55 + 56 + An unusual feature of the XArray is the ability to create entries which 57 + occupy a range of indices. Once stored to, looking up any index in 58 + the range will return the same entry as looking up any other index in 59 + the range. Setting a mark on one index will set it on all of them. 60 + Storing to any index will store to all of them. Multi-index entries can 61 + be explicitly split into smaller entries, or storing ``NULL`` into any 62 + entry will cause the XArray to forget about the range. 63 + 64 + Normal API 65 + ========== 66 + 67 + Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY` 68 + for statically allocated XArrays or :c:func:`xa_init` for dynamically 69 + allocated ones. A freshly-initialised XArray contains a ``NULL`` 70 + pointer at every index. 71 + 72 + You can then set entries using :c:func:`xa_store` and get entries 73 + using :c:func:`xa_load`. xa_store will overwrite any entry with the 74 + new entry and return the previous entry stored at that index. 
You can 75 + use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a 76 + ``NULL`` entry. There is no difference between an entry that has never 77 + been stored to and one that has most recently had ``NULL`` stored to it. 78 + 79 + You can conditionally replace an entry at an index by using 80 + :c:func:`xa_cmpxchg`. Like :c:func:`cmpxchg`, it will only succeed if 81 + the entry at that index has the 'old' value. It also returns the entry 82 + which was at that index; if it returns the same entry which was passed as 83 + 'old', then :c:func:`xa_cmpxchg` succeeded. 84 + 85 + If you want to only store a new entry to an index if the current entry 86 + at that index is ``NULL``, you can use :c:func:`xa_insert` which 87 + returns ``-EEXIST`` if the entry is not empty. 88 + 89 + You can enquire whether a mark is set on an entry by using 90 + :c:func:`xa_get_mark`. If the entry is not ``NULL``, you can set a mark 91 + on it by using :c:func:`xa_set_mark` and remove the mark from an entry by 92 + calling :c:func:`xa_clear_mark`. You can ask whether any entry in the 93 + XArray has a particular mark set by calling :c:func:`xa_marked`. 94 + 95 + You can copy entries out of the XArray into a plain array by calling 96 + :c:func:`xa_extract`. Or you can iterate over the present entries in 97 + the XArray by calling :c:func:`xa_for_each`. You may prefer to use 98 + :c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present 99 + entry in the XArray. 100 + 101 + Calling :c:func:`xa_store_range` stores the same entry in a range 102 + of indices. If you do this, some of the other operations will behave 103 + in a slightly odd way. For example, marking the entry at one index 104 + may result in the entry being marked at some, but not all of the other 105 + indices. Storing into one index may result in the entry retrieved by 106 + some, but not all of the other indices changing. 
107 + 108 + Finally, you can remove all entries from an XArray by calling 109 + :c:func:`xa_destroy`. If the XArray entries are pointers, you may wish 110 + to free the entries first. You can do this by iterating over all present 111 + entries in the XArray using the :c:func:`xa_for_each` iterator. 112 + 113 + ID assignment 114 + ------------- 115 + 116 + You can call :c:func:`xa_alloc` to store the entry at any unused index 117 + in the XArray. If you need to modify the array from interrupt context, 118 + you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable 119 + interrupts while allocating the ID. Unlike :c:func:`xa_store`, allocating 120 + a ``NULL`` pointer does not delete an entry. Instead it reserves an 121 + entry like :c:func:`xa_reserve` and you can release it using either 122 + :c:func:`xa_erase` or :c:func:`xa_release`. To use ID assignment, the 123 + XArray must be defined with :c:func:`DEFINE_XARRAY_ALLOC`, or initialised 124 + by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`. 125 + 126 + Memory allocation 127 + ----------------- 128 + 129 + The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`, 130 + :c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t 131 + parameter in case the XArray needs to allocate memory to store this entry. 132 + If the entry is being deleted, no memory allocation needs to be performed, 133 + and the GFP flags specified will be ignored. 134 + 135 + It is possible for no memory to be allocatable, particularly if you pass 136 + a restrictive set of GFP flags. In that case, the functions return a 137 + special value which can be turned into an errno using :c:func:`xa_err`. 138 + If you don't need to know exactly which error occurred, using 139 + :c:func:`xa_is_err` is slightly more efficient. 140 + 141 + Locking 142 + ------- 143 + 144 + When using the Normal API, you do not have to worry about locking. 
145 + The XArray uses RCU and an internal spinlock to synchronise access: 146 + 147 + No lock needed: 148 + * :c:func:`xa_empty` 149 + * :c:func:`xa_marked` 150 + 151 + Takes RCU read lock: 152 + * :c:func:`xa_load` 153 + * :c:func:`xa_for_each` 154 + * :c:func:`xa_find` 155 + * :c:func:`xa_find_after` 156 + * :c:func:`xa_extract` 157 + * :c:func:`xa_get_mark` 158 + 159 + Takes xa_lock internally: 160 + * :c:func:`xa_store` 161 + * :c:func:`xa_insert` 162 + * :c:func:`xa_erase` 163 + * :c:func:`xa_erase_bh` 164 + * :c:func:`xa_erase_irq` 165 + * :c:func:`xa_cmpxchg` 166 + * :c:func:`xa_store_range` 167 + * :c:func:`xa_alloc` 168 + * :c:func:`xa_alloc_bh` 169 + * :c:func:`xa_alloc_irq` 170 + * :c:func:`xa_destroy` 171 + * :c:func:`xa_set_mark` 172 + * :c:func:`xa_clear_mark` 173 + 174 + Assumes xa_lock held on entry: 175 + * :c:func:`__xa_store` 176 + * :c:func:`__xa_insert` 177 + * :c:func:`__xa_erase` 178 + * :c:func:`__xa_cmpxchg` 179 + * :c:func:`__xa_alloc` 180 + * :c:func:`__xa_set_mark` 181 + * :c:func:`__xa_clear_mark` 182 + 183 + If you want to take advantage of the lock to protect the data structures 184 + that you are storing in the XArray, you can call :c:func:`xa_lock` 185 + before calling :c:func:`xa_load`, then take a reference count on the 186 + object you have found before calling :c:func:`xa_unlock`. This will 187 + prevent stores from removing the object from the array between looking 188 + up the object and incrementing the refcount. You can also use RCU to 189 + avoid dereferencing freed memory, but an explanation of that is beyond 190 + the scope of this document. 191 + 192 + The XArray does not disable interrupts or softirqs while modifying 193 + the array. It is safe to read the XArray from interrupt or softirq 194 + context as the RCU lock provides enough protection. 
195 + 196 + If, for example, you want to store entries in the XArray in process 197 + context and then erase them in softirq context, you can do that this way:: 198 + 199 + void foo_init(struct foo *foo) 200 + { 201 + xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH); 202 + } 203 + 204 + int foo_store(struct foo *foo, unsigned long index, void *entry) 205 + { 206 + int err; 207 + 208 + xa_lock_bh(&foo->array); 209 + err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL)); 210 + if (!err) 211 + foo->count++; 212 + xa_unlock_bh(&foo->array); 213 + return err; 214 + } 215 + 216 + /* foo_erase() is only called from softirq context */ 217 + void foo_erase(struct foo *foo, unsigned long index) 218 + { 219 + xa_lock(&foo->array); 220 + __xa_erase(&foo->array, index); 221 + foo->count--; 222 + xa_unlock(&foo->array); 223 + } 224 + 225 + If you are going to modify the XArray from interrupt or softirq context, 226 + you need to initialise the array using :c:func:`xa_init_flags`, passing 227 + ``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``. 228 + 229 + The above example also shows a common pattern of wanting to extend the 230 + coverage of the xa_lock on the store side to protect some statistics 231 + associated with the array. 232 + 233 + Sharing the XArray with interrupt context is also possible, either 234 + using :c:func:`xa_lock_irqsave` in both the interrupt handler and process 235 + context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock` 236 + in the interrupt handler. Some of the more common patterns have helper 237 + functions such as :c:func:`xa_erase_bh` and :c:func:`xa_erase_irq`. 238 + 239 + Sometimes you need to protect access to the XArray with a mutex because 240 + that lock sits above another mutex in the locking hierarchy. That does 241 + not entitle you to use functions like :c:func:`__xa_erase` without taking 242 + the xa_lock; the xa_lock is used for lockdep validation and will be used 243 + for other purposes in the future. 
244 + 245 + The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also 246 + available for situations where you look up an entry and want to atomically 247 + set or clear a mark. It may be more efficient to use the advanced API 248 + in this case, as it will save you from walking the tree twice. 249 + 250 + Advanced API 251 + ============ 252 + 253 + The advanced API offers more flexibility and better performance at the 254 + cost of an interface which can be harder to use and has fewer safeguards. 255 + No locking is done for you by the advanced API, and you are required 256 + to use the xa_lock while modifying the array. You can choose whether 257 + to use the xa_lock or the RCU lock while doing read-only operations on 258 + the array. You can mix advanced and normal operations on the same array; 259 + indeed the normal API is implemented in terms of the advanced API. The 260 + advanced API is only available to modules with a GPL-compatible license. 261 + 262 + The advanced API is based around the xa_state. This is an opaque data 263 + structure which you declare on the stack using the :c:func:`XA_STATE` 264 + macro. This macro initialises the xa_state ready to start walking 265 + around the XArray. It is used as a cursor to maintain the position 266 + in the XArray and let you compose various operations together without 267 + having to restart from the top every time. 268 + 269 + The xa_state is also used to store errors. You can call 270 + :c:func:`xas_error` to retrieve the error. All operations check whether 271 + the xa_state is in an error state before proceeding, so there's no need 272 + for you to check for an error after each call; you can make multiple 273 + calls in succession and only check at a convenient point. The only 274 + errors currently generated by the XArray code itself are ``ENOMEM`` and 275 + ``EINVAL``, but it supports arbitrary errors in case you want to call 276 + :c:func:`xas_set_err` yourself. 
277 + 278 + If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem` 279 + will attempt to allocate more memory using the specified gfp flags and 280 + cache it in the xa_state for the next attempt. The idea is that you take 281 + the xa_lock, attempt the operation and drop the lock. The operation 282 + attempts to allocate memory while holding the lock, but it is more 283 + likely to fail. Once you have dropped the lock, :c:func:`xas_nomem` 284 + can try harder to allocate more memory. It will return ``true`` if it 285 + is worth retrying the operation (i.e. that there was a memory error *and* 286 + more memory was allocated). If it has previously allocated memory, and 287 + that memory wasn't used, and there is no error (or some error that isn't 288 + ``ENOMEM``), then it will free the memory previously allocated. 289 + 290 + Internal Entries 291 + ---------------- 292 + 293 + The XArray reserves some entries for its own purposes. These are never 294 + exposed through the normal API, but when using the advanced API, it's 295 + possible to see them. Usually the best way to handle them is to pass them 296 + to :c:func:`xas_retry`, and retry the operation if it returns ``true``. 297 + 298 + .. flat-table:: 299 + :widths: 1 1 6 300 + 301 + * - Name 302 + - Test 303 + - Usage 304 + 305 + * - Node 306 + - :c:func:`xa_is_node` 307 + - An XArray node. May be visible when using a multi-index xa_state. 308 + 309 + * - Sibling 310 + - :c:func:`xa_is_sibling` 311 + - A non-canonical entry for a multi-index entry. The value indicates 312 + which slot in this node has the canonical entry. 313 + 314 + * - Retry 315 + - :c:func:`xa_is_retry` 316 + - This entry is currently being modified by a thread which has the 317 + xa_lock. The node containing this entry may be freed at the end 318 + of this RCU period. You should restart the lookup from the head 319 + of the array. 
320 + 321 + * - Zero 322 + - :c:func:`xa_is_zero` 323 + - Zero entries appear as ``NULL`` through the Normal API, but occupy 324 + an entry in the XArray which can be used to reserve the index for 325 + future use. 326 + 327 + Other internal entries may be added in the future. As far as possible, they 328 + will be handled by :c:func:`xas_retry`. 329 + 330 + Additional functionality 331 + ------------------------ 332 + 333 + The :c:func:`xas_create_range` function allocates all the necessary memory 334 + to store every entry in a range. It will set ENOMEM in the xa_state if 335 + it cannot allocate memory. 336 + 337 + You can use :c:func:`xas_init_marks` to reset the marks on an entry 338 + to their default state. This is usually all marks clear, unless the 339 + XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set 340 + and all other marks are clear. Replacing one entry with another using 341 + :c:func:`xas_store` will not reset the marks on that entry; if you want 342 + the marks reset, you should do that explicitly. 343 + 344 + The :c:func:`xas_load` will walk the xa_state as close to the entry 345 + as it can. If you know the xa_state has already been walked to the 346 + entry and need to check that the entry hasn't changed, you can use 347 + :c:func:`xas_reload` to save a function call. 348 + 349 + If you need to move to a different index in the XArray, call 350 + :c:func:`xas_set`. This resets the cursor to the top of the tree, which 351 + will generally make the next operation walk the cursor to the desired 352 + spot in the tree. If you want to move to the next or previous index, 353 + call :c:func:`xas_next` or :c:func:`xas_prev`. Setting the index does 354 + not walk the cursor around the array so does not require a lock to be 355 + held, while moving to the next or previous index does. 356 + 357 + You can search for the next present entry using :c:func:`xas_find`. 
This 358 + is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`; 359 + if the cursor has been walked to an entry, then it will find the next 360 + entry after the one currently referenced. If not, it will return the 361 + entry at the index of the xa_state. Using :c:func:`xas_next_entry` to 362 + move to the next present entry instead of :c:func:`xas_find` will save 363 + a function call in the majority of cases at the expense of emitting more 364 + inline code. 365 + 366 + The :c:func:`xas_find_marked` function is similar. If the xa_state has 367 + not been walked, it will return the entry at the index of the xa_state, 368 + if it is marked. Otherwise, it will return the first marked entry after 369 + the entry referenced by the xa_state. The :c:func:`xas_next_marked` 370 + function is the equivalent of :c:func:`xas_next_entry`. 371 + 372 + When iterating over a range of the XArray using :c:func:`xas_for_each` 373 + or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop 374 + the iteration. The :c:func:`xas_pause` function exists for this purpose. 375 + After you have done the necessary work and wish to resume, the xa_state 376 + is in an appropriate state to continue the iteration after the entry 377 + you last processed. If you have interrupts disabled while iterating, 378 + then it is good manners to pause the iteration and reenable interrupts 379 + every ``XA_CHECK_SCHED`` entries. 380 + 381 + The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and 382 + :c:func:`xas_clear_mark` functions require the xa_state cursor to have 383 + been moved to the appropriate location in the xarray; they will do 384 + nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set` 385 + immediately before. 386 + 387 + You can call :c:func:`xas_set_update` to have a callback function 388 + called each time the XArray updates a node. 
This is used by the page 389 + cache workingset code to maintain its list of nodes which contain only 390 + shadow entries. 391 + 392 + Multi-Index Entries 393 + ------------------- 394 + 395 + The XArray has the ability to tie multiple indices together so that 396 + operations on one index affect all indices. For example, storing into 397 + any index will change the value of the entry retrieved from any index. 398 + Setting or clearing a mark on any index will set or clear the mark 399 + on every index that is tied together. The current implementation 400 + only allows tying ranges which are aligned powers of two together; 401 + eg indices 64-127 may be tied together, but 2-6 may not be. This may 402 + save substantial quantities of memory; for example tying 512 entries 403 + together will save over 4kB. 404 + 405 + You can create a multi-index entry by using :c:func:`XA_STATE_ORDER` 406 + or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`. 407 + Calling :c:func:`xas_load` with a multi-index xa_state will walk the 408 + xa_state to the right location in the tree, but the return value is not 409 + meaningful, potentially being an internal entry or ``NULL`` even when there 410 + is an entry stored within the range. Calling :c:func:`xas_find_conflict` 411 + will return the first entry within the range or ``NULL`` if there are no 412 + entries in the range. The :c:func:`xas_for_each_conflict` iterator will 413 + iterate over every entry which overlaps the specified range. 414 + 415 + If :c:func:`xas_load` encounters a multi-index entry, the xa_index 416 + in the xa_state will not be changed. When iterating over an XArray 417 + or calling :c:func:`xas_find`, if the initial index is in the middle 418 + of a multi-index entry, it will not be altered. Subsequent calls 419 + or iterations will move the index to the first index in the range. 420 + Each entry will only be returned once, no matter how many indices it 421 + occupies. 
422 + 423 + Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state 424 + is not supported. Using either of these functions on a multi-index entry 425 + will reveal sibling entries; these should be skipped over by the caller. 426 + 427 + Storing ``NULL`` into any index of a multi-index entry will set the entry 428 + at every index to ``NULL`` and dissolve the tie. Splitting a multi-index 429 + entry into entries occupying smaller ranges is not yet supported. 430 + 431 + Functions and structures 432 + ======================== 433 + 434 + .. kernel-doc:: include/linux/xarray.h 435 + .. kernel-doc:: lib/xarray.c
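The value-entry encoding that the new xarray.rst describes can be illustrated with a small userspace sketch. These helpers are illustrative re-implementations, not the kernel's own code; they assume (as the documentation implies) that bit 0 of an entry distinguishes an integer value from a 4-byte-aligned pointer, which always has bit 0 clear:

```c
#include <assert.h>

/* Userspace sketch of the XArray value-entry encoding: an integer v is
 * stored as (v << 1) | 1, so bit 0 marks it as a value rather than a
 * pointer. Names mirror xa_mk_value()/xa_is_value()/xa_to_value() but
 * these are illustrative stand-ins, not the kernel implementations. */
static void *mk_value(unsigned long v)
{
	return (void *)((v << 1) | 1);
}

static int is_value(const void *entry)
{
	return (unsigned long)entry & 1;
}

static unsigned long to_value(const void *entry)
{
	return (unsigned long)entry >> 1;
}
```

Because :c:func:`kmalloc` and :c:func:`alloc_page` return pointers aligned to at least 4 bytes, the low bits are guaranteed free for this encoding; that is also why arbitrary user-space and function pointers may not be stored directly.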
+14 -3
MAINTAINERS
··· 535 535 F: drivers/hwmon/adt7475.c 536 536 537 537 ADVANSYS SCSI DRIVER 538 - M: Matthew Wilcox <matthew@wil.cx> 538 + M: Matthew Wilcox <willy@infradead.org> 539 539 M: Hannes Reinecke <hare@suse.com> 540 540 L: linux-scsi@vger.kernel.org 541 541 S: Maintained ··· 4393 4393 F: drivers/i2c/busses/i2c-diolan-u2c.c 4394 4394 4395 4395 FILESYSTEM DIRECT ACCESS (DAX) 4396 - M: Matthew Wilcox <mawilcox@microsoft.com> 4396 + M: Matthew Wilcox <willy@infradead.org> 4397 4397 M: Ross Zwisler <zwisler@kernel.org> 4398 4398 M: Jan Kara <jack@suse.cz> 4399 4399 L: linux-fsdevel@vger.kernel.org ··· 8697 8697 F: drivers/scsi/mpt3sas/ 8698 8698 8699 8699 LSILOGIC/SYMBIOS/NCR 53C8XX and 53C1010 PCI-SCSI drivers 8700 - M: Matthew Wilcox <matthew@wil.cx> 8700 + M: Matthew Wilcox <willy@infradead.org> 8701 8701 L: linux-scsi@vger.kernel.org 8702 8702 S: Maintained 8703 8703 F: drivers/scsi/sym53c8xx_2/ ··· 16136 16136 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/vdso 16137 16137 S: Maintained 16138 16138 F: arch/x86/entry/vdso/ 16139 + 16140 + XARRAY 16141 + M: Matthew Wilcox <willy@infradead.org> 16142 + L: linux-fsdevel@vger.kernel.org 16143 + S: Supported 16144 + F: Documentation/core-api/xarray.rst 16145 + F: lib/idr.c 16146 + F: lib/xarray.c 16147 + F: include/linux/idr.h 16148 + F: include/linux/xarray.h 16149 + F: tools/testing/radix-tree 16139 16150 16140 16151 XC2028/3028 TUNER DRIVER 16141 16152 M: Mauro Carvalho Chehab <mchehab@kernel.org>
+1 -1
arch/parisc/kernel/syscall.S
··· 2 2 * Linux/PA-RISC Project (http://www.parisc-linux.org/) 3 3 * 4 4 * System call entry code / Linux gateway page 5 - * Copyright (c) Matthew Wilcox 1999 <willy@bofh.ai> 5 + * Copyright (c) Matthew Wilcox 1999 <willy@infradead.org> 6 6 * Licensed under the GNU GPL. 7 7 * thanks to Philipp Rumpf, Mike Shaver and various others 8 8 * sorry about the wall, puffin..
+1 -3
arch/powerpc/include/asm/book3s/64/pgtable.h
··· 716 716 BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \ 717 717 BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY); \ 718 718 } while (0) 719 - /* 720 - * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT; 721 - */ 719 + 722 720 #define SWP_TYPE_BITS 5 723 721 #define __swp_type(x) (((x).val >> _PAGE_BIT_SWAP_TYPE) \ 724 722 & ((1UL << SWP_TYPE_BITS) - 1))
+1 -3
arch/powerpc/include/asm/nohash/64/pgtable.h
··· 350 350 #define MAX_SWAPFILES_CHECK() do { \ 351 351 BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \ 352 352 } while (0) 353 - /* 354 - * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT; 355 - */ 353 + 356 354 #define SWP_TYPE_BITS 5 357 355 #define __swp_type(x) (((x).val >> _PAGE_BIT_SWAP_TYPE) \ 358 356 & ((1UL << SWP_TYPE_BITS) - 1))
+7 -10
drivers/gpu/drm/i915/i915_gem.c
··· 5996 5996 count = __sg_page_count(sg); 5997 5997 5998 5998 while (idx + count <= n) { 5999 - unsigned long exception, i; 5999 + void *entry; 6000 + unsigned long i; 6000 6001 int ret; 6001 6002 6002 6003 /* If we cannot allocate and insert this entry, or the ··· 6012 6011 if (ret && ret != -EEXIST) 6013 6012 goto scan; 6014 6013 6015 - exception = 6016 - RADIX_TREE_EXCEPTIONAL_ENTRY | 6017 - idx << RADIX_TREE_EXCEPTIONAL_SHIFT; 6014 + entry = xa_mk_value(idx); 6018 6015 for (i = 1; i < count; i++) { 6019 - ret = radix_tree_insert(&iter->radix, idx + i, 6020 - (void *)exception); 6016 + ret = radix_tree_insert(&iter->radix, idx + i, entry); 6021 6017 if (ret && ret != -EEXIST) 6022 6018 goto scan; 6023 6019 } ··· 6052 6054 GEM_BUG_ON(!sg); 6053 6055 6054 6056 /* If this index is in the middle of multi-page sg entry, 6055 - * the radixtree will contain an exceptional entry that points 6057 + * the radix tree will contain a value entry that points 6056 6058 * to the start of that range. We will return the pointer to 6057 6059 * the base page and the offset of this page within the 6058 6060 * sg entry's range. 6059 6061 */ 6060 6062 *offset = 0; 6061 - if (unlikely(radix_tree_exception(sg))) { 6062 - unsigned long base = 6063 - (unsigned long)sg >> RADIX_TREE_EXCEPTIONAL_SHIFT; 6063 + if (unlikely(xa_is_value(sg))) { 6064 + unsigned long base = xa_to_value(sg); 6064 6065 6065 6066 sg = radix_tree_lookup(&iter->radix, base); 6066 6067 GEM_BUG_ON(!sg);
+1 -1
drivers/input/keyboard/hilkbd.c
··· 2 2 * linux/drivers/hil/hilkbd.c 3 3 * 4 4 * Copyright (C) 1998 Philip Blundell <philb@gnu.org> 5 - * Copyright (C) 1999 Matthew Wilcox <willy@bofh.ai> 5 + * Copyright (C) 1999 Matthew Wilcox <willy@infradead.org> 6 6 * Copyright (C) 1999-2007 Helge Deller <deller@gmx.de> 7 7 * 8 8 * Very basic HP Human Interface Loop (HIL) driver.
+1 -1
drivers/pci/hotplug/acpiphp.h
··· 8 8 * Copyright (C) 2002 Hiroshi Aono (h-aono@ap.jp.nec.com) 9 9 * Copyright (C) 2002,2003 Takayoshi Kochi (t-kochi@bq.jp.nec.com) 10 10 * Copyright (C) 2002,2003 NEC Corporation 11 - * Copyright (C) 2003-2005 Matthew Wilcox (matthew.wilcox@hp.com) 11 + * Copyright (C) 2003-2005 Matthew Wilcox (willy@infradead.org) 12 12 * Copyright (C) 2003-2005 Hewlett Packard 13 13 * 14 14 * All rights reserved.
+2 -2
drivers/pci/hotplug/acpiphp_core.c
··· 8 8 * Copyright (C) 2002 Hiroshi Aono (h-aono@ap.jp.nec.com) 9 9 * Copyright (C) 2002,2003 Takayoshi Kochi (t-kochi@bq.jp.nec.com) 10 10 * Copyright (C) 2002,2003 NEC Corporation 11 - * Copyright (C) 2003-2005 Matthew Wilcox (matthew.wilcox@hp.com) 11 + * Copyright (C) 2003-2005 Matthew Wilcox (willy@infradead.org) 12 12 * Copyright (C) 2003-2005 Hewlett Packard 13 13 * 14 14 * All rights reserved. ··· 40 40 static struct acpiphp_attention_info *attention_info; 41 41 42 42 #define DRIVER_VERSION "0.5" 43 - #define DRIVER_AUTHOR "Greg Kroah-Hartman <gregkh@us.ibm.com>, Takayoshi Kochi <t-kochi@bq.jp.nec.com>, Matthew Wilcox <willy@hp.com>" 43 + #define DRIVER_AUTHOR "Greg Kroah-Hartman <gregkh@us.ibm.com>, Takayoshi Kochi <t-kochi@bq.jp.nec.com>, Matthew Wilcox <willy@infradead.org>" 44 44 #define DRIVER_DESC "ACPI Hot Plug PCI Controller Driver" 45 45 46 46 MODULE_AUTHOR(DRIVER_AUTHOR);
+1 -1
drivers/pci/hotplug/acpiphp_glue.c
··· 5 5 * Copyright (C) 2002,2003 Takayoshi Kochi (t-kochi@bq.jp.nec.com) 6 6 * Copyright (C) 2002 Hiroshi Aono (h-aono@ap.jp.nec.com) 7 7 * Copyright (C) 2002,2003 NEC Corporation 8 - * Copyright (C) 2003-2005 Matthew Wilcox (matthew.wilcox@hp.com) 8 + * Copyright (C) 2003-2005 Matthew Wilcox (willy@infradead.org) 9 9 * Copyright (C) 2003-2005 Hewlett Packard 10 10 * Copyright (C) 2005 Rajesh Shah (rajesh.shah@intel.com) 11 11 * Copyright (C) 2005 Intel Corporation
+6 -12
drivers/staging/erofs/utils.c
··· 35 35 36 36 #ifdef CONFIG_EROFS_FS_ZIP 37 37 38 - /* radix_tree and the future XArray both don't use tagptr_t yet */ 39 38 struct erofs_workgroup *erofs_find_workgroup( 40 39 struct super_block *sb, pgoff_t index, bool *tag) 41 40 { ··· 46 47 rcu_read_lock(); 47 48 grp = radix_tree_lookup(&sbi->workstn_tree, index); 48 49 if (grp != NULL) { 49 - *tag = radix_tree_exceptional_entry(grp); 50 - grp = (void *)((unsigned long)grp & 51 - ~RADIX_TREE_EXCEPTIONAL_ENTRY); 50 + *tag = xa_pointer_tag(grp); 51 + grp = xa_untag_pointer(grp); 52 52 53 53 if (erofs_workgroup_get(grp, &oldcount)) { 54 54 /* prefer to relax rcu read side */ ··· 81 83 sbi = EROFS_SB(sb); 82 84 erofs_workstn_lock(sbi); 83 85 84 - if (tag) 85 - grp = (void *)((unsigned long)grp | 86 - 1UL << RADIX_TREE_EXCEPTIONAL_SHIFT); 86 + grp = xa_tag_pointer(grp, tag); 87 87 88 88 err = radix_tree_insert(&sbi->workstn_tree, 89 89 grp->index, grp); ··· 127 131 128 132 for (i = 0; i < found; ++i) { 129 133 int cnt; 130 - struct erofs_workgroup *grp = (void *) 131 - ((unsigned long)batch[i] & 132 - ~RADIX_TREE_EXCEPTIONAL_ENTRY); 134 + struct erofs_workgroup *grp = xa_untag_pointer(batch[i]); 133 135 134 136 first_index = grp->index + 1; 135 137 ··· 144 150 #endif 145 151 continue; 146 152 147 - if (radix_tree_delete(&sbi->workstn_tree, 148 - grp->index) != grp) { 153 + if (xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, 154 + grp->index)) != grp) { 149 155 #ifdef EROFS_FS_HAS_MANAGED_CACHE 150 156 skip: 151 157 erofs_workgroup_unfreeze(grp, 1);
+2 -4
fs/btrfs/compression.c
···
 		if (pg_index > end_index)
 			break;
 
-		rcu_read_lock();
-		page = radix_tree_lookup(&mapping->i_pages, pg_index);
-		rcu_read_unlock();
-		if (page && !radix_tree_exceptional_entry(page)) {
+		page = xa_load(&mapping->i_pages, pg_index);
+		if (page && !xa_is_value(page)) {
 			misses++;
 			if (misses > 4)
 				break;
+5 -7
fs/btrfs/extent_io.c
···
 	pgoff_t index;
 	pgoff_t end;		/* Inclusive */
 	int scanned = 0;
-	int tag;
+	xa_mark_t tag;
 
 	pagevec_init(&pvec);
 	if (wbc->range_cyclic) {
···
 	pgoff_t done_index;
 	int range_whole = 0;
 	int scanned = 0;
-	int tag;
+	xa_mark_t tag;
 
 	/*
 	 * We have to hold onto the inode so that ordered extents can do their
···
 
 		clear_page_dirty_for_io(page);
 		xa_lock_irq(&page->mapping->i_pages);
-		if (!PageDirty(page)) {
-			radix_tree_tag_clear(&page->mapping->i_pages,
-						page_index(page),
-						PAGECACHE_TAG_DIRTY);
-		}
+		if (!PageDirty(page))
+			__xa_clear_mark(&page->mapping->i_pages,
+					page_index(page), PAGECACHE_TAG_DIRTY);
 		xa_unlock_irq(&page->mapping->i_pages);
 		ClearPageError(page);
 		unlock_page(page);
+7 -7
fs/buffer.c
···
 EXPORT_SYMBOL(mark_buffer_dirty_inode);
 
 /*
- * Mark the page dirty, and set it dirty in the radix tree, and mark the inode
+ * Mark the page dirty, and set it dirty in the page cache, and mark the inode
  * dirty.
  *
  * If warn is true, then emit a warning if the page is not uptodate and has
···
 	if (page->mapping) {	/* Race with truncate? */
 		WARN_ON_ONCE(warn && !PageUptodate(page));
 		account_page_dirtied(page, mapping);
-		radix_tree_tag_set(&mapping->i_pages,
-				page_index(page), PAGECACHE_TAG_DIRTY);
+		__xa_set_mark(&mapping->i_pages, page_index(page),
+				PAGECACHE_TAG_DIRTY);
 	}
 	xa_unlock_irqrestore(&mapping->i_pages, flags);
 }
···
  * The relationship between dirty buffers and dirty pages:
  *
  * Whenever a page has any dirty buffers, the page's dirty bit is set, and
- * the page is tagged dirty in its radix tree.
+ * the page is tagged dirty in the page cache.
  *
  * At all times, the dirtiness of the buffers represents the dirtiness of
  * subsections of the page.  If the page has buffers, the page dirty bit is
···
  * mark_buffer_dirty - mark a buffer_head as needing writeout
  * @bh: the buffer_head to mark dirty
  *
- * mark_buffer_dirty() will set the dirty bit against the buffer, then set its
- * backing page dirty, then tag the page as dirty in its address_space's radix
- * tree and then attach the address_space's inode to its superblock's dirty
+ * mark_buffer_dirty() will set the dirty bit against the buffer, then set
+ * its backing page dirty, then tag the page as dirty in the page cache
+ * and then attach the address_space's inode to its superblock's dirty
  * inode list.
  *
  * mark_buffer_dirty() is atomic.  It takes bh->b_page->mapping->private_lock,
+392 -533
fs/dax.c
··· 38 38 #define CREATE_TRACE_POINTS 39 39 #include <trace/events/fs_dax.h> 40 40 41 + static inline unsigned int pe_order(enum page_entry_size pe_size) 42 + { 43 + if (pe_size == PE_SIZE_PTE) 44 + return PAGE_SHIFT - PAGE_SHIFT; 45 + if (pe_size == PE_SIZE_PMD) 46 + return PMD_SHIFT - PAGE_SHIFT; 47 + if (pe_size == PE_SIZE_PUD) 48 + return PUD_SHIFT - PAGE_SHIFT; 49 + return ~0; 50 + } 51 + 41 52 /* We choose 4096 entries - same as per-zone page wait tables */ 42 53 #define DAX_WAIT_TABLE_BITS 12 43 54 #define DAX_WAIT_TABLE_ENTRIES (1 << DAX_WAIT_TABLE_BITS) ··· 56 45 /* The 'colour' (ie low bits) within a PMD of a page offset. */ 57 46 #define PG_PMD_COLOUR ((PMD_SIZE >> PAGE_SHIFT) - 1) 58 47 #define PG_PMD_NR (PMD_SIZE >> PAGE_SHIFT) 48 + 49 + /* The order of a PMD entry */ 50 + #define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) 59 51 60 52 static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES]; 61 53 ··· 73 59 fs_initcall(init_dax_wait_table); 74 60 75 61 /* 76 - * We use lowest available bit in exceptional entry for locking, one bit for 77 - * the entry size (PMD) and two more to tell us if the entry is a zero page or 78 - * an empty entry that is just used for locking. In total four special bits. 62 + * DAX pagecache entries use XArray value entries so they can't be mistaken 63 + * for pages. We use one bit for locking, one bit for the entry size (PMD) 64 + * and two more to tell us if the entry is a zero page or an empty entry that 65 + * is just used for locking. In total four special bits. 79 66 * 80 67 * If the PMD bit isn't set the entry has size PAGE_SIZE, and if the ZERO_PAGE 81 68 * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem 82 69 * block allocation. 
83 70 */ 84 - #define RADIX_DAX_SHIFT (RADIX_TREE_EXCEPTIONAL_SHIFT + 4) 85 - #define RADIX_DAX_ENTRY_LOCK (1 << RADIX_TREE_EXCEPTIONAL_SHIFT) 86 - #define RADIX_DAX_PMD (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 1)) 87 - #define RADIX_DAX_ZERO_PAGE (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 2)) 88 - #define RADIX_DAX_EMPTY (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 3)) 71 + #define DAX_SHIFT (4) 72 + #define DAX_LOCKED (1UL << 0) 73 + #define DAX_PMD (1UL << 1) 74 + #define DAX_ZERO_PAGE (1UL << 2) 75 + #define DAX_EMPTY (1UL << 3) 89 76 90 - static unsigned long dax_radix_pfn(void *entry) 77 + static unsigned long dax_to_pfn(void *entry) 91 78 { 92 - return (unsigned long)entry >> RADIX_DAX_SHIFT; 79 + return xa_to_value(entry) >> DAX_SHIFT; 93 80 } 94 81 95 - static void *dax_radix_locked_entry(unsigned long pfn, unsigned long flags) 82 + static void *dax_make_entry(pfn_t pfn, unsigned long flags) 96 83 { 97 - return (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY | flags | 98 - (pfn << RADIX_DAX_SHIFT) | RADIX_DAX_ENTRY_LOCK); 84 + return xa_mk_value(flags | (pfn_t_to_pfn(pfn) << DAX_SHIFT)); 99 85 } 100 86 101 - static unsigned int dax_radix_order(void *entry) 87 + static void *dax_make_page_entry(struct page *page) 102 88 { 103 - if ((unsigned long)entry & RADIX_DAX_PMD) 104 - return PMD_SHIFT - PAGE_SHIFT; 89 + pfn_t pfn = page_to_pfn_t(page); 90 + return dax_make_entry(pfn, PageHead(page) ? 
DAX_PMD : 0); 91 + } 92 + 93 + static bool dax_is_locked(void *entry) 94 + { 95 + return xa_to_value(entry) & DAX_LOCKED; 96 + } 97 + 98 + static unsigned int dax_entry_order(void *entry) 99 + { 100 + if (xa_to_value(entry) & DAX_PMD) 101 + return PMD_ORDER; 105 102 return 0; 106 103 } 107 104 108 105 static int dax_is_pmd_entry(void *entry) 109 106 { 110 - return (unsigned long)entry & RADIX_DAX_PMD; 107 + return xa_to_value(entry) & DAX_PMD; 111 108 } 112 109 113 110 static int dax_is_pte_entry(void *entry) 114 111 { 115 - return !((unsigned long)entry & RADIX_DAX_PMD); 112 + return !(xa_to_value(entry) & DAX_PMD); 116 113 } 117 114 118 115 static int dax_is_zero_entry(void *entry) 119 116 { 120 - return (unsigned long)entry & RADIX_DAX_ZERO_PAGE; 117 + return xa_to_value(entry) & DAX_ZERO_PAGE; 121 118 } 122 119 123 120 static int dax_is_empty_entry(void *entry) 124 121 { 125 - return (unsigned long)entry & RADIX_DAX_EMPTY; 122 + return xa_to_value(entry) & DAX_EMPTY; 126 123 } 127 124 128 125 /* 129 - * DAX radix tree locking 126 + * DAX page cache entry locking 130 127 */ 131 128 struct exceptional_entry_key { 132 - struct address_space *mapping; 129 + struct xarray *xa; 133 130 pgoff_t entry_start; 134 131 }; 135 132 ··· 149 124 struct exceptional_entry_key key; 150 125 }; 151 126 152 - static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping, 153 - pgoff_t index, void *entry, struct exceptional_entry_key *key) 127 + static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas, 128 + void *entry, struct exceptional_entry_key *key) 154 129 { 155 130 unsigned long hash; 131 + unsigned long index = xas->xa_index; 156 132 157 133 /* 158 134 * If 'entry' is a PMD, align the 'index' that we use for the wait ··· 162 136 */ 163 137 if (dax_is_pmd_entry(entry)) 164 138 index &= ~PG_PMD_COLOUR; 165 - 166 - key->mapping = mapping; 139 + key->xa = xas->xa; 167 140 key->entry_start = index; 168 141 169 - hash = hash_long((unsigned long)mapping ^ 
index, DAX_WAIT_TABLE_BITS); 142 + hash = hash_long((unsigned long)xas->xa ^ index, DAX_WAIT_TABLE_BITS); 170 143 return wait_table + hash; 171 144 } 172 145 173 - static int wake_exceptional_entry_func(wait_queue_entry_t *wait, unsigned int mode, 174 - int sync, void *keyp) 146 + static int wake_exceptional_entry_func(wait_queue_entry_t *wait, 147 + unsigned int mode, int sync, void *keyp) 175 148 { 176 149 struct exceptional_entry_key *key = keyp; 177 150 struct wait_exceptional_entry_queue *ewait = 178 151 container_of(wait, struct wait_exceptional_entry_queue, wait); 179 152 180 - if (key->mapping != ewait->key.mapping || 153 + if (key->xa != ewait->key.xa || 181 154 key->entry_start != ewait->key.entry_start) 182 155 return 0; 183 156 return autoremove_wake_function(wait, mode, sync, NULL); ··· 187 162 * The important information it's conveying is whether the entry at 188 163 * this index used to be a PMD entry. 189 164 */ 190 - static void dax_wake_mapping_entry_waiter(struct address_space *mapping, 191 - pgoff_t index, void *entry, bool wake_all) 165 + static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all) 192 166 { 193 167 struct exceptional_entry_key key; 194 168 wait_queue_head_t *wq; 195 169 196 - wq = dax_entry_waitqueue(mapping, index, entry, &key); 170 + wq = dax_entry_waitqueue(xas, entry, &key); 197 171 198 172 /* 199 173 * Checking for locked entry and prepare_to_wait_exclusive() happens ··· 205 181 } 206 182 207 183 /* 208 - * Check whether the given slot is locked. Must be called with the i_pages 209 - * lock held. 210 - */ 211 - static inline int slot_locked(struct address_space *mapping, void **slot) 212 - { 213 - unsigned long entry = (unsigned long) 214 - radix_tree_deref_slot_protected(slot, &mapping->i_pages.xa_lock); 215 - return entry & RADIX_DAX_ENTRY_LOCK; 216 - } 217 - 218 - /* 219 - * Mark the given slot as locked. Must be called with the i_pages lock held. 
220 - */ 221 - static inline void *lock_slot(struct address_space *mapping, void **slot) 222 - { 223 - unsigned long entry = (unsigned long) 224 - radix_tree_deref_slot_protected(slot, &mapping->i_pages.xa_lock); 225 - 226 - entry |= RADIX_DAX_ENTRY_LOCK; 227 - radix_tree_replace_slot(&mapping->i_pages, slot, (void *)entry); 228 - return (void *)entry; 229 - } 230 - 231 - /* 232 - * Mark the given slot as unlocked. Must be called with the i_pages lock held. 233 - */ 234 - static inline void *unlock_slot(struct address_space *mapping, void **slot) 235 - { 236 - unsigned long entry = (unsigned long) 237 - radix_tree_deref_slot_protected(slot, &mapping->i_pages.xa_lock); 238 - 239 - entry &= ~(unsigned long)RADIX_DAX_ENTRY_LOCK; 240 - radix_tree_replace_slot(&mapping->i_pages, slot, (void *)entry); 241 - return (void *)entry; 242 - } 243 - 244 - /* 245 - * Lookup entry in radix tree, wait for it to become unlocked if it is 246 - * exceptional entry and return it. The caller must call 247 - * put_unlocked_mapping_entry() when he decided not to lock the entry or 248 - * put_locked_mapping_entry() when he locked the entry and now wants to 249 - * unlock it. 184 + * Look up entry in page cache, wait for it to become unlocked if it 185 + * is a DAX entry and return it. The caller must subsequently call 186 + * put_unlocked_entry() if it did not lock the entry or dax_unlock_entry() 187 + * if it did. 250 188 * 251 189 * Must be called with the i_pages lock held. 
252 190 */ 253 - static void *__get_unlocked_mapping_entry(struct address_space *mapping, 254 - pgoff_t index, void ***slotp, bool (*wait_fn)(void)) 191 + static void *get_unlocked_entry(struct xa_state *xas) 255 192 { 256 - void *entry, **slot; 193 + void *entry; 257 194 struct wait_exceptional_entry_queue ewait; 258 195 wait_queue_head_t *wq; 259 196 ··· 222 237 ewait.wait.func = wake_exceptional_entry_func; 223 238 224 239 for (;;) { 225 - bool revalidate; 226 - 227 - entry = __radix_tree_lookup(&mapping->i_pages, index, NULL, 228 - &slot); 229 - if (!entry || 230 - WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)) || 231 - !slot_locked(mapping, slot)) { 232 - if (slotp) 233 - *slotp = slot; 240 + entry = xas_load(xas); 241 + if (!entry || xa_is_internal(entry) || 242 + WARN_ON_ONCE(!xa_is_value(entry)) || 243 + !dax_is_locked(entry)) 234 244 return entry; 235 - } 236 245 237 - wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key); 246 + wq = dax_entry_waitqueue(xas, entry, &ewait.key); 238 247 prepare_to_wait_exclusive(wq, &ewait.wait, 239 248 TASK_UNINTERRUPTIBLE); 240 - xa_unlock_irq(&mapping->i_pages); 241 - revalidate = wait_fn(); 249 + xas_unlock_irq(xas); 250 + xas_reset(xas); 251 + schedule(); 242 252 finish_wait(wq, &ewait.wait); 243 - xa_lock_irq(&mapping->i_pages); 244 - if (revalidate) 245 - return ERR_PTR(-EAGAIN); 253 + xas_lock_irq(xas); 246 254 } 247 255 } 248 256 249 - static bool entry_wait(void) 257 + static void put_unlocked_entry(struct xa_state *xas, void *entry) 250 258 { 251 - schedule(); 252 - /* 253 - * Never return an ERR_PTR() from 254 - * __get_unlocked_mapping_entry(), just keep looping. 
255 - */ 256 - return false; 257 - } 258 - 259 - static void *get_unlocked_mapping_entry(struct address_space *mapping, 260 - pgoff_t index, void ***slotp) 261 - { 262 - return __get_unlocked_mapping_entry(mapping, index, slotp, entry_wait); 263 - } 264 - 265 - static void unlock_mapping_entry(struct address_space *mapping, pgoff_t index) 266 - { 267 - void *entry, **slot; 268 - 269 - xa_lock_irq(&mapping->i_pages); 270 - entry = __radix_tree_lookup(&mapping->i_pages, index, NULL, &slot); 271 - if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) || 272 - !slot_locked(mapping, slot))) { 273 - xa_unlock_irq(&mapping->i_pages); 274 - return; 275 - } 276 - unlock_slot(mapping, slot); 277 - xa_unlock_irq(&mapping->i_pages); 278 - dax_wake_mapping_entry_waiter(mapping, index, entry, false); 279 - } 280 - 281 - static void put_locked_mapping_entry(struct address_space *mapping, 282 - pgoff_t index) 283 - { 284 - unlock_mapping_entry(mapping, index); 259 + /* If we were the only waiter woken, wake the next one */ 260 + if (entry) 261 + dax_wake_entry(xas, entry, false); 285 262 } 286 263 287 264 /* 288 - * Called when we are done with radix tree entry we looked up via 289 - * get_unlocked_mapping_entry() and which we didn't lock in the end. 265 + * We used the xa_state to get the entry, but then we locked the entry and 266 + * dropped the xa_lock, so we know the xa_state is stale and must be reset 267 + * before use. 
290 268 */ 291 - static void put_unlocked_mapping_entry(struct address_space *mapping, 292 - pgoff_t index, void *entry) 269 + static void dax_unlock_entry(struct xa_state *xas, void *entry) 293 270 { 294 - if (!entry) 295 - return; 271 + void *old; 296 272 297 - /* We have to wake up next waiter for the radix tree entry lock */ 298 - dax_wake_mapping_entry_waiter(mapping, index, entry, false); 273 + xas_reset(xas); 274 + xas_lock_irq(xas); 275 + old = xas_store(xas, entry); 276 + xas_unlock_irq(xas); 277 + BUG_ON(!dax_is_locked(old)); 278 + dax_wake_entry(xas, entry, false); 279 + } 280 + 281 + /* 282 + * Return: The entry stored at this location before it was locked. 283 + */ 284 + static void *dax_lock_entry(struct xa_state *xas, void *entry) 285 + { 286 + unsigned long v = xa_to_value(entry); 287 + return xas_store(xas, xa_mk_value(v | DAX_LOCKED)); 299 288 } 300 289 301 290 static unsigned long dax_entry_size(void *entry) ··· 284 325 return PAGE_SIZE; 285 326 } 286 327 287 - static unsigned long dax_radix_end_pfn(void *entry) 328 + static unsigned long dax_end_pfn(void *entry) 288 329 { 289 - return dax_radix_pfn(entry) + dax_entry_size(entry) / PAGE_SIZE; 330 + return dax_to_pfn(entry) + dax_entry_size(entry) / PAGE_SIZE; 290 331 } 291 332 292 333 /* ··· 294 335 * 'empty' and 'zero' entries. 
295 336 */ 296 337 #define for_each_mapped_pfn(entry, pfn) \ 297 - for (pfn = dax_radix_pfn(entry); \ 298 - pfn < dax_radix_end_pfn(entry); pfn++) 338 + for (pfn = dax_to_pfn(entry); \ 339 + pfn < dax_end_pfn(entry); pfn++) 299 340 300 341 /* 301 342 * TODO: for reflink+dax we need a way to associate a single page with ··· 352 393 return NULL; 353 394 } 354 395 355 - static bool entry_wait_revalidate(void) 356 - { 357 - rcu_read_unlock(); 358 - schedule(); 359 - rcu_read_lock(); 360 - 361 - /* 362 - * Tell __get_unlocked_mapping_entry() to take a break, we need 363 - * to revalidate page->mapping after dropping locks 364 - */ 365 - return true; 366 - } 367 - 368 396 bool dax_lock_mapping_entry(struct page *page) 369 397 { 370 - pgoff_t index; 371 - struct inode *inode; 372 - bool did_lock = false; 373 - void *entry = NULL, **slot; 374 - struct address_space *mapping; 398 + XA_STATE(xas, NULL, 0); 399 + void *entry; 375 400 376 - rcu_read_lock(); 377 401 for (;;) { 378 - mapping = READ_ONCE(page->mapping); 402 + struct address_space *mapping = READ_ONCE(page->mapping); 379 403 380 404 if (!dax_mapping(mapping)) 381 - break; 405 + return false; 382 406 383 407 /* 384 408 * In the device-dax case there's no need to lock, a ··· 370 428 * otherwise we would not have a valid pfn_to_page() 371 429 * translation. 
372 430 */ 373 - inode = mapping->host; 374 - if (S_ISCHR(inode->i_mode)) { 375 - did_lock = true; 376 - break; 377 - } 431 + if (S_ISCHR(mapping->host->i_mode)) 432 + return true; 378 433 379 - xa_lock_irq(&mapping->i_pages); 434 + xas.xa = &mapping->i_pages; 435 + xas_lock_irq(&xas); 380 436 if (mapping != page->mapping) { 381 - xa_unlock_irq(&mapping->i_pages); 437 + xas_unlock_irq(&xas); 382 438 continue; 383 439 } 384 - index = page->index; 385 - 386 - entry = __get_unlocked_mapping_entry(mapping, index, &slot, 387 - entry_wait_revalidate); 388 - if (!entry) { 389 - xa_unlock_irq(&mapping->i_pages); 390 - break; 391 - } else if (IS_ERR(entry)) { 392 - xa_unlock_irq(&mapping->i_pages); 393 - WARN_ON_ONCE(PTR_ERR(entry) != -EAGAIN); 394 - continue; 440 + xas_set(&xas, page->index); 441 + entry = xas_load(&xas); 442 + if (dax_is_locked(entry)) { 443 + entry = get_unlocked_entry(&xas); 444 + /* Did the page move while we slept? */ 445 + if (dax_to_pfn(entry) != page_to_pfn(page)) { 446 + xas_unlock_irq(&xas); 447 + continue; 448 + } 395 449 } 396 - lock_slot(mapping, slot); 397 - did_lock = true; 398 - xa_unlock_irq(&mapping->i_pages); 399 - break; 450 + dax_lock_entry(&xas, entry); 451 + xas_unlock_irq(&xas); 452 + return true; 400 453 } 401 - rcu_read_unlock(); 402 - 403 - return did_lock; 404 454 } 405 455 406 456 void dax_unlock_mapping_entry(struct page *page) 407 457 { 408 458 struct address_space *mapping = page->mapping; 409 - struct inode *inode = mapping->host; 459 + XA_STATE(xas, &mapping->i_pages, page->index); 410 460 411 - if (S_ISCHR(inode->i_mode)) 461 + if (S_ISCHR(mapping->host->i_mode)) 412 462 return; 413 463 414 - unlock_mapping_entry(mapping, page->index); 464 + dax_unlock_entry(&xas, dax_make_page_entry(page)); 415 465 } 416 466 417 467 /* 418 - * Find radix tree entry at given index. If it points to an exceptional entry, 419 - * return it with the radix tree entry locked. 
If the radix tree doesn't 420 - * contain given index, create an empty exceptional entry for the index and 421 - * return with it locked. 468 + * Find page cache entry at given index. If it is a DAX entry, return it 469 + * with the entry locked. If the page cache doesn't contain an entry at 470 + * that index, add a locked empty entry. 422 471 * 423 - * When requesting an entry with size RADIX_DAX_PMD, grab_mapping_entry() will 424 - * either return that locked entry or will return an error. This error will 425 - * happen if there are any 4k entries within the 2MiB range that we are 426 - * requesting. 472 + * When requesting an entry with size DAX_PMD, grab_mapping_entry() will 473 + * either return that locked entry or will return VM_FAULT_FALLBACK. 474 + * This will happen if there are any PTE entries within the PMD range 475 + * that we are requesting. 427 476 * 428 - * We always favor 4k entries over 2MiB entries. There isn't a flow where we 429 - * evict 4k entries in order to 'upgrade' them to a 2MiB entry. A 2MiB 430 - * insertion will fail if it finds any 4k entries already in the tree, and a 431 - * 4k insertion will cause an existing 2MiB entry to be unmapped and 432 - * downgraded to 4k entries. This happens for both 2MiB huge zero pages as 433 - * well as 2MiB empty entries. 477 + * We always favor PTE entries over PMD entries. There isn't a flow where we 478 + * evict PTE entries in order to 'upgrade' them to a PMD entry. A PMD 479 + * insertion will fail if it finds any PTE entries already in the tree, and a 480 + * PTE insertion will cause an existing PMD entry to be unmapped and 481 + * downgraded to PTE entries. This happens for both PMD zero pages as 482 + * well as PMD empty entries. 434 483 * 435 - * The exception to this downgrade path is for 2MiB DAX PMD entries that have 436 - * real storage backing them. We will leave these real 2MiB DAX entries in 437 - * the tree, and PTE writes will simply dirty the entire 2MiB DAX entry. 
484 + * The exception to this downgrade path is for PMD entries that have 485 + * real storage backing them. We will leave these real PMD entries in 486 + * the tree, and PTE writes will simply dirty the entire PMD entry. 438 487 * 439 488 * Note: Unlike filemap_fault() we don't honor FAULT_FLAG_RETRY flags. For 440 489 * persistent memory the benefit is doubtful. We can add that later if we can 441 490 * show it helps. 491 + * 492 + * On error, this function does not return an ERR_PTR. Instead it returns 493 + * a VM_FAULT code, encoded as an xarray internal entry. The ERR_PTR values 494 + * overlap with xarray value entries. 442 495 */ 443 - static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index, 444 - unsigned long size_flag) 496 + static void *grab_mapping_entry(struct xa_state *xas, 497 + struct address_space *mapping, unsigned long size_flag) 445 498 { 446 - bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */ 447 - void *entry, **slot; 499 + unsigned long index = xas->xa_index; 500 + bool pmd_downgrade = false; /* splitting PMD entry into PTE entries? 
*/ 501 + void *entry; 448 502 449 - restart: 450 - xa_lock_irq(&mapping->i_pages); 451 - entry = get_unlocked_mapping_entry(mapping, index, &slot); 452 - 453 - if (WARN_ON_ONCE(entry && !radix_tree_exceptional_entry(entry))) { 454 - entry = ERR_PTR(-EIO); 455 - goto out_unlock; 456 - } 503 + retry: 504 + xas_lock_irq(xas); 505 + entry = get_unlocked_entry(xas); 506 + if (xa_is_internal(entry)) 507 + goto fallback; 457 508 458 509 if (entry) { 459 - if (size_flag & RADIX_DAX_PMD) { 510 + if (WARN_ON_ONCE(!xa_is_value(entry))) { 511 + xas_set_err(xas, EIO); 512 + goto out_unlock; 513 + } 514 + 515 + if (size_flag & DAX_PMD) { 460 516 if (dax_is_pte_entry(entry)) { 461 - put_unlocked_mapping_entry(mapping, index, 462 - entry); 463 - entry = ERR_PTR(-EEXIST); 464 - goto out_unlock; 517 + put_unlocked_entry(xas, entry); 518 + goto fallback; 465 519 } 466 520 } else { /* trying to grab a PTE entry */ 467 521 if (dax_is_pmd_entry(entry) && ··· 468 530 } 469 531 } 470 532 471 - /* No entry for given index? Make sure radix tree is big enough. */ 472 - if (!entry || pmd_downgrade) { 473 - int err; 533 + if (pmd_downgrade) { 534 + /* 535 + * Make sure 'entry' remains valid while we drop 536 + * the i_pages lock. 537 + */ 538 + dax_lock_entry(xas, entry); 474 539 475 - if (pmd_downgrade) { 476 - /* 477 - * Make sure 'entry' remains valid while we drop 478 - * the i_pages lock. 479 - */ 480 - entry = lock_slot(mapping, slot); 481 - } 482 - 483 - xa_unlock_irq(&mapping->i_pages); 484 540 /* 485 541 * Besides huge zero pages the only other thing that gets 486 542 * downgraded are empty entries which don't need to be 487 543 * unmapped. 
488 544 */ 489 - if (pmd_downgrade && dax_is_zero_entry(entry)) 490 - unmap_mapping_pages(mapping, index & ~PG_PMD_COLOUR, 491 - PG_PMD_NR, false); 492 - 493 - err = radix_tree_preload( 494 - mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM); 495 - if (err) { 496 - if (pmd_downgrade) 497 - put_locked_mapping_entry(mapping, index); 498 - return ERR_PTR(err); 499 - } 500 - xa_lock_irq(&mapping->i_pages); 501 - 502 - if (!entry) { 503 - /* 504 - * We needed to drop the i_pages lock while calling 505 - * radix_tree_preload() and we didn't have an entry to 506 - * lock. See if another thread inserted an entry at 507 - * our index during this time. 508 - */ 509 - entry = __radix_tree_lookup(&mapping->i_pages, index, 510 - NULL, &slot); 511 - if (entry) { 512 - radix_tree_preload_end(); 513 - xa_unlock_irq(&mapping->i_pages); 514 - goto restart; 515 - } 545 + if (dax_is_zero_entry(entry)) { 546 + xas_unlock_irq(xas); 547 + unmap_mapping_pages(mapping, 548 + xas->xa_index & ~PG_PMD_COLOUR, 549 + PG_PMD_NR, false); 550 + xas_reset(xas); 551 + xas_lock_irq(xas); 516 552 } 517 553 518 - if (pmd_downgrade) { 519 - dax_disassociate_entry(entry, mapping, false); 520 - radix_tree_delete(&mapping->i_pages, index); 521 - mapping->nrexceptional--; 522 - dax_wake_mapping_entry_waiter(mapping, index, entry, 523 - true); 524 - } 525 - 526 - entry = dax_radix_locked_entry(0, size_flag | RADIX_DAX_EMPTY); 527 - 528 - err = __radix_tree_insert(&mapping->i_pages, index, 529 - dax_radix_order(entry), entry); 530 - radix_tree_preload_end(); 531 - if (err) { 532 - xa_unlock_irq(&mapping->i_pages); 533 - /* 534 - * Our insertion of a DAX entry failed, most likely 535 - * because we were inserting a PMD entry and it 536 - * collided with a PTE sized entry at a different 537 - * index in the PMD range. We haven't inserted 538 - * anything into the radix tree and have no waiters to 539 - * wake. 
540 - */ 541 - return ERR_PTR(err); 542 - } 543 - /* Good, we have inserted empty locked entry into the tree. */ 544 - mapping->nrexceptional++; 545 - xa_unlock_irq(&mapping->i_pages); 546 - return entry; 554 + dax_disassociate_entry(entry, mapping, false); 555 + xas_store(xas, NULL); /* undo the PMD join */ 556 + dax_wake_entry(xas, entry, true); 557 + mapping->nrexceptional--; 558 + entry = NULL; 559 + xas_set(xas, index); 547 560 } 548 - entry = lock_slot(mapping, slot); 549 - out_unlock: 550 - xa_unlock_irq(&mapping->i_pages); 561 + 562 + if (entry) { 563 + dax_lock_entry(xas, entry); 564 + } else { 565 + entry = dax_make_entry(pfn_to_pfn_t(0), size_flag | DAX_EMPTY); 566 + dax_lock_entry(xas, entry); 567 + if (xas_error(xas)) 568 + goto out_unlock; 569 + mapping->nrexceptional++; 570 + } 571 + 572 + out_unlock: 573 + xas_unlock_irq(xas); 574 + if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM)) 575 + goto retry; 576 + if (xas->xa_node == XA_ERROR(-ENOMEM)) 577 + return xa_mk_internal(VM_FAULT_OOM); 578 + if (xas_error(xas)) 579 + return xa_mk_internal(VM_FAULT_SIGBUS); 551 580 return entry; 581 + fallback: 582 + xas_unlock_irq(xas); 583 + return xa_mk_internal(VM_FAULT_FALLBACK); 552 584 } 553 585 554 586 /** ··· 538 630 */ 539 631 struct page *dax_layout_busy_page(struct address_space *mapping) 540 632 { 541 - pgoff_t indices[PAGEVEC_SIZE]; 633 + XA_STATE(xas, &mapping->i_pages, 0); 634 + void *entry; 635 + unsigned int scanned = 0; 542 636 struct page *page = NULL; 543 - struct pagevec pvec; 544 - pgoff_t index, end; 545 - unsigned i; 546 637 547 638 /* 548 639 * In the 'limited' case get_user_pages() for dax is disabled. 
··· 552 645 if (!dax_mapping(mapping) || !mapping_mapped(mapping)) 553 646 return NULL; 554 647 555 - pagevec_init(&pvec); 556 - index = 0; 557 - end = -1; 558 - 559 648 /* 560 649 * If we race get_user_pages_fast() here either we'll see the 561 - * elevated page count in the pagevec_lookup and wait, or 650 + * elevated page count in the iteration and wait, or 562 651 * get_user_pages_fast() will see that the page it took a reference 563 652 * against is no longer mapped in the page tables and bail to the 564 653 * get_user_pages() slow path. The slow path is protected by ··· 566 663 */ 567 664 unmap_mapping_range(mapping, 0, 0, 1); 568 665 569 - while (index < end && pagevec_lookup_entries(&pvec, mapping, index, 570 - min(end - index, (pgoff_t)PAGEVEC_SIZE), 571 - indices)) { 572 - pgoff_t nr_pages = 1; 573 - 574 - for (i = 0; i < pagevec_count(&pvec); i++) { 575 - struct page *pvec_ent = pvec.pages[i]; 576 - void *entry; 577 - 578 - index = indices[i]; 579 - if (index >= end) 580 - break; 581 - 582 - if (WARN_ON_ONCE( 583 - !radix_tree_exceptional_entry(pvec_ent))) 584 - continue; 585 - 586 - xa_lock_irq(&mapping->i_pages); 587 - entry = get_unlocked_mapping_entry(mapping, index, NULL); 588 - if (entry) { 589 - page = dax_busy_page(entry); 590 - /* 591 - * Account for multi-order entries at 592 - * the end of the pagevec. 593 - */ 594 - if (i + 1 >= pagevec_count(&pvec)) 595 - nr_pages = 1UL << dax_radix_order(entry); 596 - } 597 - put_unlocked_mapping_entry(mapping, index, entry); 598 - xa_unlock_irq(&mapping->i_pages); 599 - if (page) 600 - break; 601 - } 602 - 603 - /* 604 - * We don't expect normal struct page entries to exist in our 605 - * tree, but we keep these pagevec calls so that this code is 606 - * consistent with the common pattern for handling pagevecs 607 - * throughout the kernel. 
-         */
-        pagevec_remove_exceptionals(&pvec);
-        pagevec_release(&pvec);
-        index += nr_pages;
-
+        xas_lock_irq(&xas);
+        xas_for_each(&xas, entry, ULONG_MAX) {
+                if (WARN_ON_ONCE(!xa_is_value(entry)))
+                        continue;
+                if (unlikely(dax_is_locked(entry)))
+                        entry = get_unlocked_entry(&xas);
+                if (entry)
+                        page = dax_busy_page(entry);
+                put_unlocked_entry(&xas, entry);
                 if (page)
                         break;
+                if (++scanned % XA_CHECK_SCHED)
+                        continue;
+
+                xas_pause(&xas);
+                xas_unlock_irq(&xas);
+                cond_resched();
+                xas_lock_irq(&xas);
         }
+        xas_unlock_irq(&xas);
         return page;
 }
 EXPORT_SYMBOL_GPL(dax_layout_busy_page);
 
-static int __dax_invalidate_mapping_entry(struct address_space *mapping,
+static int __dax_invalidate_entry(struct address_space *mapping,
                 pgoff_t index, bool trunc)
 {
+        XA_STATE(xas, &mapping->i_pages, index);
         int ret = 0;
         void *entry;
-        struct radix_tree_root *pages = &mapping->i_pages;
 
-        xa_lock_irq(pages);
-        entry = get_unlocked_mapping_entry(mapping, index, NULL);
-        if (!entry || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)))
+        xas_lock_irq(&xas);
+        entry = get_unlocked_entry(&xas);
+        if (!entry || WARN_ON_ONCE(!xa_is_value(entry)))
                 goto out;
         if (!trunc &&
-            (radix_tree_tag_get(pages, index, PAGECACHE_TAG_DIRTY) ||
-             radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE)))
+            (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY) ||
+             xas_get_mark(&xas, PAGECACHE_TAG_TOWRITE)))
                 goto out;
         dax_disassociate_entry(entry, mapping, trunc);
-        radix_tree_delete(pages, index);
+        xas_store(&xas, NULL);
         mapping->nrexceptional--;
         ret = 1;
 out:
-        put_unlocked_mapping_entry(mapping, index, entry);
-        xa_unlock_irq(pages);
+        put_unlocked_entry(&xas, entry);
+        xas_unlock_irq(&xas);
         return ret;
 }
+
 /*
- * Delete exceptional DAX entry at @index from @mapping. Wait for radix tree
- * entry to get unlocked before deleting it.
+ * Delete DAX entry at @index from @mapping.  Wait for it
+ * to be unlocked before deleting it.
  */
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
 {
-        int ret = __dax_invalidate_mapping_entry(mapping, index, true);
+        int ret = __dax_invalidate_entry(mapping, index, true);
 
         /*
          * This gets called from truncate / punch_hole path. As such, the caller
          * must hold locks protecting against concurrent modifications of the
-         * radix tree (usually fs-private i_mmap_sem for writing). Since the
-         * caller has seen exceptional entry for this index, we better find it
+         * page cache (usually fs-private i_mmap_sem for writing). Since the
+         * caller has seen a DAX entry for this index, we better find it
          * at that index as well...
          */
         WARN_ON_ONCE(!ret);
···
 }
 
 /*
- * Invalidate exceptional DAX entry if it is clean.
+ * Invalidate DAX entry if it is clean.
  */
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
                                       pgoff_t index)
 {
-        return __dax_invalidate_mapping_entry(mapping, index, false);
+        return __dax_invalidate_entry(mapping, index, false);
 }
 
 static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev,
···
  * already in the tree, we will skip the insertion and just dirty the PMD as
  * appropriate.
  */
-static void *dax_insert_mapping_entry(struct address_space *mapping,
-                                      struct vm_fault *vmf,
-                                      void *entry, pfn_t pfn_t,
-                                      unsigned long flags, bool dirty)
+static void *dax_insert_entry(struct xa_state *xas,
+                struct address_space *mapping, struct vm_fault *vmf,
+                void *entry, pfn_t pfn, unsigned long flags, bool dirty)
 {
-        struct radix_tree_root *pages = &mapping->i_pages;
-        unsigned long pfn = pfn_t_to_pfn(pfn_t);
-        pgoff_t index = vmf->pgoff;
-        void *new_entry;
+        void *new_entry = dax_make_entry(pfn, flags);
 
         if (dirty)
                 __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 
-        if (dax_is_zero_entry(entry) && !(flags & RADIX_DAX_ZERO_PAGE)) {
+        if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
+                unsigned long index = xas->xa_index;
                 /* we are replacing a zero page with block mapping */
                 if (dax_is_pmd_entry(entry))
                         unmap_mapping_pages(mapping, index & ~PG_PMD_COLOUR,
-                                        PG_PMD_NR, false);
+                                        PG_PMD_NR, false);
                 else /* pte entry */
-                        unmap_mapping_pages(mapping, vmf->pgoff, 1, false);
+                        unmap_mapping_pages(mapping, index, 1, false);
         }
 
-        xa_lock_irq(pages);
-        new_entry = dax_radix_locked_entry(pfn, flags);
+        xas_reset(xas);
+        xas_lock_irq(xas);
         if (dax_entry_size(entry) != dax_entry_size(new_entry)) {
                 dax_disassociate_entry(entry, mapping, false);
                 dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address);
···
 
         if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
                 /*
-                 * Only swap our new entry into the radix tree if the current
+                 * Only swap our new entry into the page cache if the current
                  * entry is a zero page or an empty entry.  If a normal PTE or
-                 * PMD entry is already in the tree, we leave it alone.  This
+                 * PMD entry is already in the cache, we leave it alone.  This
                  * means that if we are trying to insert a PTE and the
                  * existing entry is a PMD, we will just leave the PMD in the
                  * tree and dirty it if necessary.
                  */
-                struct radix_tree_node *node;
-                void **slot;
-                void *ret;
-
-                ret = __radix_tree_lookup(pages, index, &node, &slot);
-                WARN_ON_ONCE(ret != entry);
-                __radix_tree_replace(pages, node, slot,
-                                     new_entry, NULL);
+                void *old = dax_lock_entry(xas, new_entry);
+                WARN_ON_ONCE(old != xa_mk_value(xa_to_value(entry) |
+                                        DAX_LOCKED));
                 entry = new_entry;
+        } else {
+                xas_load(xas);  /* Walk the xa_state */
         }
 
         if (dirty)
-                radix_tree_tag_set(pages, index, PAGECACHE_TAG_DIRTY);
+                xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
 
-        xa_unlock_irq(pages);
+        xas_unlock_irq(xas);
         return entry;
 }
 
-static inline unsigned long
-pgoff_address(pgoff_t pgoff, struct vm_area_struct *vma)
+static inline
+unsigned long pgoff_address(pgoff_t pgoff, struct vm_area_struct *vma)
 {
         unsigned long address;
 
···
 }
 
 /* Walk all mappings of a given index of a file and writeprotect them */
-static void dax_mapping_entry_mkclean(struct address_space *mapping,
-                                      pgoff_t index, unsigned long pfn)
+static void dax_entry_mkclean(struct address_space *mapping, pgoff_t index,
+                unsigned long pfn)
 {
         struct vm_area_struct *vma;
         pte_t pte, *ptep = NULL;
···
         i_mmap_unlock_read(mapping);
 }
 
-static int dax_writeback_one(struct dax_device *dax_dev,
-                struct address_space *mapping, pgoff_t index, void *entry)
+static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
+                struct address_space *mapping, void *entry)
 {
-        struct radix_tree_root *pages = &mapping->i_pages;
-        void *entry2, **slot;
         unsigned long pfn;
         long ret = 0;
         size_t size;
···
          * A page got tagged dirty in DAX mapping? Something is seriously
          * wrong.
          */
-        if (WARN_ON(!radix_tree_exceptional_entry(entry)))
+        if (WARN_ON(!xa_is_value(entry)))
                 return -EIO;
 
-        xa_lock_irq(pages);
-        entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
-        /* Entry got punched out / reallocated? */
-        if (!entry2 || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry2)))
-                goto put_unlocked;
-        /*
-         * Entry got reallocated elsewhere? No need to writeback. We have to
-         * compare pfns as we must not bail out due to difference in lockbit
-         * or entry type.
-         */
-        if (dax_radix_pfn(entry2) != dax_radix_pfn(entry))
-                goto put_unlocked;
-        if (WARN_ON_ONCE(dax_is_empty_entry(entry) ||
-                                dax_is_zero_entry(entry))) {
-                ret = -EIO;
-                goto put_unlocked;
+        if (unlikely(dax_is_locked(entry))) {
+                void *old_entry = entry;
+
+                entry = get_unlocked_entry(xas);
+
+                /* Entry got punched out / reallocated? */
+                if (!entry || WARN_ON_ONCE(!xa_is_value(entry)))
+                        goto put_unlocked;
+                /*
+                 * Entry got reallocated elsewhere? No need to writeback.
+                 * We have to compare pfns as we must not bail out due to
+                 * difference in lockbit or entry type.
+                 */
+                if (dax_to_pfn(old_entry) != dax_to_pfn(entry))
+                        goto put_unlocked;
+                if (WARN_ON_ONCE(dax_is_empty_entry(entry) ||
+                                        dax_is_zero_entry(entry))) {
+                        ret = -EIO;
+                        goto put_unlocked;
+                }
+
+                /* Another fsync thread may have already done this entry */
+                if (!xas_get_mark(xas, PAGECACHE_TAG_TOWRITE))
+                        goto put_unlocked;
         }
 
-        /* Another fsync thread may have already written back this entry */
-        if (!radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE))
-                goto put_unlocked;
         /* Lock the entry to serialize with page faults */
-        entry = lock_slot(mapping, slot);
+        dax_lock_entry(xas, entry);
+
         /*
          * We can clear the tag now but we have to be careful so that concurrent
          * dax_writeback_one() calls for the same index cannot finish before we
···
          * at the entry only under the i_pages lock and once they do that
          * they will see the entry locked and wait for it to unlock.
          */
-        radix_tree_tag_clear(pages, index, PAGECACHE_TAG_TOWRITE);
-        xa_unlock_irq(pages);
+        xas_clear_mark(xas, PAGECACHE_TAG_TOWRITE);
+        xas_unlock_irq(xas);
 
         /*
          * Even if dax_writeback_mapping_range() was given a wbc->range_start
···
          * This allows us to flush for PMD_SIZE and not have to worry about
          * partial PMD writebacks.
          */
-        pfn = dax_radix_pfn(entry);
-        size = PAGE_SIZE << dax_radix_order(entry);
+        pfn = dax_to_pfn(entry);
+        size = PAGE_SIZE << dax_entry_order(entry);
 
-        dax_mapping_entry_mkclean(mapping, index, pfn);
+        dax_entry_mkclean(mapping, xas->xa_index, pfn);
         dax_flush(dax_dev, page_address(pfn_to_page(pfn)), size);
         /*
          * After we have flushed the cache, we can clear the dirty tag. There
···
          * the pfn mappings are writeprotected and fault waits for mapping
          * entry lock.
          */
-        xa_lock_irq(pages);
-        radix_tree_tag_clear(pages, index, PAGECACHE_TAG_DIRTY);
-        xa_unlock_irq(pages);
-        trace_dax_writeback_one(mapping->host, index, size >> PAGE_SHIFT);
-        put_locked_mapping_entry(mapping, index);
+        xas_reset(xas);
+        xas_lock_irq(xas);
+        xas_store(xas, entry);
+        xas_clear_mark(xas, PAGECACHE_TAG_DIRTY);
+        dax_wake_entry(xas, entry, false);
+
+        trace_dax_writeback_one(mapping->host, xas->xa_index,
+                        size >> PAGE_SHIFT);
         return ret;
 
 put_unlocked:
-        put_unlocked_mapping_entry(mapping, index, entry2);
-        xa_unlock_irq(pages);
+        put_unlocked_entry(xas, entry);
         return ret;
 }
 
···
 int dax_writeback_mapping_range(struct address_space *mapping,
                 struct block_device *bdev, struct writeback_control *wbc)
 {
+        XA_STATE(xas, &mapping->i_pages, wbc->range_start >> PAGE_SHIFT);
         struct inode *inode = mapping->host;
-        pgoff_t start_index, end_index;
-        pgoff_t indices[PAGEVEC_SIZE];
+        pgoff_t end_index = wbc->range_end >> PAGE_SHIFT;
         struct dax_device *dax_dev;
-        struct pagevec pvec;
-        bool done = false;
-        int i, ret = 0;
+        void *entry;
+        int ret = 0;
+        unsigned int scanned = 0;
 
         if (WARN_ON_ONCE(inode->i_blkbits != PAGE_SHIFT))
                 return -EIO;
···
         if (!dax_dev)
                 return -EIO;
 
-        start_index = wbc->range_start >> PAGE_SHIFT;
-        end_index = wbc->range_end >> PAGE_SHIFT;
+        trace_dax_writeback_range(inode, xas.xa_index, end_index);
 
-        trace_dax_writeback_range(inode, start_index, end_index);
+        tag_pages_for_writeback(mapping, xas.xa_index, end_index);
 
-        tag_pages_for_writeback(mapping, start_index, end_index);
-
-        pagevec_init(&pvec);
-        while (!done) {
-                pvec.nr = find_get_entries_tag(mapping, start_index,
-                                PAGECACHE_TAG_TOWRITE, PAGEVEC_SIZE,
-                                pvec.pages, indices);
-
-                if (pvec.nr == 0)
+        xas_lock_irq(&xas);
+        xas_for_each_marked(&xas, entry, end_index, PAGECACHE_TAG_TOWRITE) {
+                ret = dax_writeback_one(&xas, dax_dev, mapping, entry);
+                if (ret < 0) {
+                        mapping_set_error(mapping, ret);
                         break;
-
-                for (i = 0; i < pvec.nr; i++) {
-                        if (indices[i] > end_index) {
-                                done = true;
-                                break;
-                        }
-
-                        ret = dax_writeback_one(dax_dev, mapping, indices[i],
-                                        pvec.pages[i]);
-                        if (ret < 0) {
-                                mapping_set_error(mapping, ret);
-                                goto out;
-                        }
                 }
-                start_index = indices[pvec.nr - 1] + 1;
+                if (++scanned % XA_CHECK_SCHED)
+                        continue;
+
+                xas_pause(&xas);
+                xas_unlock_irq(&xas);
+                cond_resched();
+                xas_lock_irq(&xas);
         }
-out:
+        xas_unlock_irq(&xas);
         put_dax(dax_dev);
-        trace_dax_writeback_range_done(inode, start_index, end_index);
-        return (ret < 0 ? ret : 0);
+        trace_dax_writeback_range_done(inode, xas.xa_index, end_index);
+        return ret;
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 
···
  * If this page is ever written to we will re-fault and change the mapping to
  * point to real DAX storage instead.
  */
-static vm_fault_t dax_load_hole(struct address_space *mapping, void *entry,
-                         struct vm_fault *vmf)
+static vm_fault_t dax_load_hole(struct xa_state *xas,
+                struct address_space *mapping, void **entry,
+                struct vm_fault *vmf)
 {
         struct inode *inode = mapping->host;
         unsigned long vaddr = vmf->address;
         pfn_t pfn = pfn_to_pfn_t(my_zero_pfn(vaddr));
         vm_fault_t ret;
 
-        dax_insert_mapping_entry(mapping, vmf, entry, pfn, RADIX_DAX_ZERO_PAGE,
-                        false);
+        *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
+                        DAX_ZERO_PAGE, false);
+
         ret = vmf_insert_mixed(vmf->vma, vaddr, pfn);
         trace_dax_load_hole(inode, vmf, ret);
         return ret;
···
 {
         struct vm_area_struct *vma = vmf->vma;
         struct address_space *mapping = vma->vm_file->f_mapping;
+        XA_STATE(xas, &mapping->i_pages, vmf->pgoff);
         struct inode *inode = mapping->host;
         unsigned long vaddr = vmf->address;
         loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
···
         if (write && !vmf->cow_page)
                 flags |= IOMAP_WRITE;
 
-        entry = grab_mapping_entry(mapping, vmf->pgoff, 0);
-        if (IS_ERR(entry)) {
-                ret = dax_fault_return(PTR_ERR(entry));
+        entry = grab_mapping_entry(&xas, mapping, 0);
+        if (xa_is_internal(entry)) {
+                ret = xa_to_internal(entry);
                 goto out;
         }
 
···
                 if (error < 0)
                         goto error_finish_iomap;
 
-                entry = dax_insert_mapping_entry(mapping, vmf, entry, pfn,
+                entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
                                          0, write && !sync);
 
                 /*
···
         case IOMAP_UNWRITTEN:
         case IOMAP_HOLE:
                 if (!write) {
-                        ret = dax_load_hole(mapping, entry, vmf);
+                        ret = dax_load_hole(&xas, mapping, &entry, vmf);
                         goto finish_iomap;
                 }
                 /*FALLTHRU*/
···
                 ops->iomap_end(inode, pos, PAGE_SIZE, copied, flags, &iomap);
         }
 unlock_entry:
-        put_locked_mapping_entry(mapping, vmf->pgoff);
+        dax_unlock_entry(&xas, entry);
 out:
         trace_dax_pte_fault_done(inode, vmf, ret);
         return ret | major;
 }
 
 #ifdef CONFIG_FS_DAX_PMD
-static vm_fault_t dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
-                void *entry)
+static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
+                struct iomap *iomap, void **entry)
 {
         struct address_space *mapping = vmf->vma->vm_file->f_mapping;
         unsigned long pmd_addr = vmf->address & PMD_MASK;
         struct inode *inode = mapping->host;
         struct page *zero_page;
-        void *ret = NULL;
         spinlock_t *ptl;
         pmd_t pmd_entry;
         pfn_t pfn;
···
                 goto fallback;
 
         pfn = page_to_pfn_t(zero_page);
-        ret = dax_insert_mapping_entry(mapping, vmf, entry, pfn,
-                        RADIX_DAX_PMD | RADIX_DAX_ZERO_PAGE, false);
+        *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
+                        DAX_PMD | DAX_ZERO_PAGE, false);
 
         ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
         if (!pmd_none(*(vmf->pmd))) {
···
         pmd_entry = pmd_mkhuge(pmd_entry);
         set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
         spin_unlock(ptl);
-        trace_dax_pmd_load_hole(inode, vmf, zero_page, ret);
+        trace_dax_pmd_load_hole(inode, vmf, zero_page, *entry);
         return VM_FAULT_NOPAGE;
 
 fallback:
-        trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, ret);
+        trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, *entry);
         return VM_FAULT_FALLBACK;
 }
 
···
 {
         struct vm_area_struct *vma = vmf->vma;
         struct address_space *mapping = vma->vm_file->f_mapping;
+        XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, PMD_ORDER);
         unsigned long pmd_addr = vmf->address & PMD_MASK;
         bool write = vmf->flags & FAULT_FLAG_WRITE;
         bool sync;
···
         struct inode *inode = mapping->host;
         vm_fault_t result = VM_FAULT_FALLBACK;
         struct iomap iomap = { 0 };
-        pgoff_t max_pgoff, pgoff;
+        pgoff_t max_pgoff;
         void *entry;
         loff_t pos;
         int error;
···
          * supposed to hold locks serializing us with truncate / punch hole so
          * this is a reliable test.
          */
-        pgoff = linear_page_index(vma, pmd_addr);
         max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
 
         trace_dax_pmd_fault(inode, vmf, max_pgoff, 0);
···
          * Make sure that the faulting address's PMD offset (color) matches
          * the PMD offset from the start of the file.  This is necessary so
          * that a PMD range in the page table overlaps exactly with a PMD
-         * range in the radix tree.
+         * range in the page cache.
          */
         if ((vmf->pgoff & PG_PMD_COLOUR) !=
             ((vmf->address >> PAGE_SHIFT) & PG_PMD_COLOUR))
···
         if ((pmd_addr + PMD_SIZE) > vma->vm_end)
                 goto fallback;
 
-        if (pgoff >= max_pgoff) {
+        if (xas.xa_index >= max_pgoff) {
                 result = VM_FAULT_SIGBUS;
                 goto out;
         }
 
         /* If the PMD would extend beyond the file size */
-        if ((pgoff | PG_PMD_COLOUR) >= max_pgoff)
+        if ((xas.xa_index | PG_PMD_COLOUR) >= max_pgoff)
                 goto fallback;
 
         /*
-         * grab_mapping_entry() will make sure we get a 2MiB empty entry, a
-         * 2MiB zero page entry or a DAX PMD.  If it can't (because a 4k page
-         * is already in the tree, for instance), it will return -EEXIST and
-         * we just fall back to 4k entries.
+         * grab_mapping_entry() will make sure we get an empty PMD entry,
+         * a zero PMD entry or a DAX PMD.  If it can't (because a PTE
+         * entry is already in the array, for instance), it will return
+         * VM_FAULT_FALLBACK.
          */
-        entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
-        if (IS_ERR(entry))
+        entry = grab_mapping_entry(&xas, mapping, DAX_PMD);
+        if (xa_is_internal(entry)) {
+                result = xa_to_internal(entry);
                 goto fallback;
+        }
 
         /*
          * It is possible, particularly with mixed reads & writes to private
···
          * setting up a mapping, so really we're using iomap_begin() as a way
          * to look up our filesystem block.
          */
-        pos = (loff_t)pgoff << PAGE_SHIFT;
+        pos = (loff_t)xas.xa_index << PAGE_SHIFT;
         error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
         if (error)
                 goto unlock_entry;
···
                 if (error < 0)
                         goto finish_iomap;
 
-                entry = dax_insert_mapping_entry(mapping, vmf, entry, pfn,
-                                                RADIX_DAX_PMD, write && !sync);
+                entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
+                                                DAX_PMD, write && !sync);
 
                 /*
···
         case IOMAP_HOLE:
                 if (WARN_ON_ONCE(write))
                         break;
-                result = dax_pmd_load_hole(vmf, &iomap, entry);
+                result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
                 break;
         default:
                 WARN_ON_ONCE(1);
···
                                 &iomap);
         }
 unlock_entry:
-        put_locked_mapping_entry(mapping, pgoff);
+        dax_unlock_entry(&xas, entry);
 fallback:
         if (result == VM_FAULT_FALLBACK) {
                 split_huge_pmd(vma, vmf->pmd, vmf->address);
···
 }
 EXPORT_SYMBOL_GPL(dax_iomap_fault);
 
-/**
+/*
  * dax_insert_pfn_mkwrite - insert PTE or PMD entry into page tables
  * @vmf: The description of the fault
- * @pe_size: Size of entry to be inserted
  * @pfn: PFN to insert
+ * @order: Order of entry to insert.
  *
- * This function inserts writeable PTE or PMD entry into page tables for mmaped
- * DAX file. It takes care of marking corresponding radix tree entry as dirty
- * as well.
+ * This function inserts a writeable PTE or PMD entry into the page tables
+ * for an mmaped DAX file.  It also marks the page cache entry as dirty.
  */
-static vm_fault_t dax_insert_pfn_mkwrite(struct vm_fault *vmf,
-                                  enum page_entry_size pe_size,
-                                  pfn_t pfn)
+static vm_fault_t
+dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, unsigned int order)
 {
         struct address_space *mapping = vmf->vma->vm_file->f_mapping;
-        void *entry, **slot;
-        pgoff_t index = vmf->pgoff;
+        XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, order);
+        void *entry;
         vm_fault_t ret;
 
-        xa_lock_irq(&mapping->i_pages);
-        entry = get_unlocked_mapping_entry(mapping, index, &slot);
+        xas_lock_irq(&xas);
+        entry = get_unlocked_entry(&xas);
         /* Did we race with someone splitting entry or so? */
         if (!entry ||
-            (pe_size == PE_SIZE_PTE && !dax_is_pte_entry(entry)) ||
-            (pe_size == PE_SIZE_PMD && !dax_is_pmd_entry(entry))) {
-                put_unlocked_mapping_entry(mapping, index, entry);
-                xa_unlock_irq(&mapping->i_pages);
+            (order == 0 && !dax_is_pte_entry(entry)) ||
+            (order == PMD_ORDER && (xa_is_internal(entry) ||
+                                    !dax_is_pmd_entry(entry)))) {
+                put_unlocked_entry(&xas, entry);
+                xas_unlock_irq(&xas);
                 trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf,
                                                       VM_FAULT_NOPAGE);
                 return VM_FAULT_NOPAGE;
         }
-        radix_tree_tag_set(&mapping->i_pages, index, PAGECACHE_TAG_DIRTY);
-        entry = lock_slot(mapping, slot);
-        xa_unlock_irq(&mapping->i_pages);
-        switch (pe_size) {
-        case PE_SIZE_PTE:
+        xas_set_mark(&xas, PAGECACHE_TAG_DIRTY);
+        dax_lock_entry(&xas, entry);
+        xas_unlock_irq(&xas);
+        if (order == 0)
                 ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
-                break;
 #ifdef CONFIG_FS_DAX_PMD
-        case PE_SIZE_PMD:
+        else if (order == PMD_ORDER)
                 ret = vmf_insert_pfn_pmd(vmf->vma, vmf->address, vmf->pmd,
                                 pfn, true);
-                break;
 #endif
-        default:
+        else
                 ret = VM_FAULT_FALLBACK;
-        }
-        put_locked_mapping_entry(mapping, index);
+        dax_unlock_entry(&xas, entry);
         trace_dax_insert_pfn_mkwrite(mapping->host, vmf, ret);
         return ret;
 }
···
 {
         int err;
         loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
-        size_t len = 0;
+        unsigned int order = pe_order(pe_size);
+        size_t len = PAGE_SIZE << order;
 
-        if (pe_size == PE_SIZE_PTE)
-                len = PAGE_SIZE;
-        else if (pe_size == PE_SIZE_PMD)
-                len = PMD_SIZE;
-        else
-                WARN_ON_ONCE(1);
         err = vfs_fsync_range(vmf->vma->vm_file, start, start + len - 1, 1);
         if (err)
                 return VM_FAULT_SIGBUS;
-        return dax_insert_pfn_mkwrite(vmf, pe_size, pfn);
+        return dax_insert_pfn_mkwrite(vmf, pfn, order);
 }
 EXPORT_SYMBOL_GPL(dax_finish_sync_fault);
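The idiom that recurs throughout the dax.c conversion above — an on-stack `xa_state` cursor plus a marked iteration that periodically drops the lock — is worth isolating. The following is an illustrative sketch only (kernel-context code, not a standalone program; `start`, `end` and the mark are placeholders, and `XA_CHECK_SCHED` comes from the patched dax.c):

        /*
         * Sketch of the new iteration idiom.  Locking is part of the
         * XArray API: xas_lock_irq() takes the array's internal
         * spinlock, and xas_pause() records the position so the walk
         * can safely resume after the lock has been dropped.
         */
        XA_STATE(xas, &mapping->i_pages, start);        /* cursor over i_pages */
        void *entry;
        unsigned int scanned = 0;

        xas_lock_irq(&xas);
        xas_for_each_marked(&xas, entry, end, PAGECACHE_TAG_TOWRITE) {
                /* ... operate on entry under the lock ... */
                if (++scanned % XA_CHECK_SCHED)
                        continue;
                xas_pause(&xas);                /* remember where we are */
                xas_unlock_irq(&xas);
                cond_resched();                 /* let others run */
                xas_lock_irq(&xas);             /* walk resumes cleanly */
        }
        xas_unlock_irq(&xas);

Compared with the old `find_get_entries_tag()` loop over a pagevec, the cursor does not re-walk the tree from the root on every batch, which is the "less re-walking" benefit called out in the merge message.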
+1 -1
fs/ext4/inode.c
···
         long left = mpd->wbc->nr_to_write;
         pgoff_t index = mpd->first_page;
         pgoff_t end = mpd->last_page;
-        int tag;
+        xa_mark_t tag;
         int i, err = 0;
         int blkbits = mpd->inode->i_blkbits;
         ext4_lblk_t lblk;
+3 -3
fs/f2fs/data.c
···
         pgoff_t done_index;
         int cycled;
         int range_whole = 0;
-        int tag;
+        xa_mark_t tag;
         int nwritten = 0;
 
         pagevec_init(&pvec);
···
 #endif
 };
 
-void f2fs_clear_radix_tree_dirty_tag(struct page *page)
+void f2fs_clear_page_cache_dirty_tag(struct page *page)
 {
         struct address_space *mapping = page_mapping(page);
         unsigned long flags;
 
         xa_lock_irqsave(&mapping->i_pages, flags);
-        radix_tree_tag_clear(&mapping->i_pages, page_index(page),
+        __xa_clear_mark(&mapping->i_pages, page_index(page),
                         PAGECACHE_TAG_DIRTY);
         xa_unlock_irqrestore(&mapping->i_pages, flags);
 }
+1 -1
fs/f2fs/dir.c
···
 
         if (bit_pos == NR_DENTRY_IN_BLOCK &&
             !f2fs_truncate_hole(dir, page->index, page->index + 1)) {
-                f2fs_clear_radix_tree_dirty_tag(page);
+                f2fs_clear_page_cache_dirty_tag(page);
                 clear_page_dirty_for_io(page);
                 ClearPagePrivate(page);
                 ClearPageUptodate(page);
+1 -1
fs/f2fs/f2fs.h
···
                         struct page *page, enum migrate_mode mode);
 #endif
 bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
-void f2fs_clear_radix_tree_dirty_tag(struct page *page);
+void f2fs_clear_page_cache_dirty_tag(struct page *page);
 
 /*
  * gc.c
+1 -1
fs/f2fs/inline.c
···
         kunmap_atomic(src_addr);
         set_page_dirty(dn.inode_page);
 
-        f2fs_clear_radix_tree_dirty_tag(page);
+        f2fs_clear_page_cache_dirty_tag(page);
 
         set_inode_flag(inode, FI_APPEND_WRITE);
         set_inode_flag(inode, FI_DATA_EXIST);
+2 -4
fs/f2fs/node.c
···
 static void clear_node_page_dirty(struct page *page)
 {
         if (PageDirty(page)) {
-                f2fs_clear_radix_tree_dirty_tag(page);
+                f2fs_clear_page_cache_dirty_tag(page);
                 clear_page_dirty_for_io(page);
                 dec_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES);
         }
···
         if (f2fs_check_nid_range(sbi, nid))
                 return;
 
-        rcu_read_lock();
-        apage = radix_tree_lookup(&NODE_MAPPING(sbi)->i_pages, nid);
-        rcu_read_unlock();
+        apage = xa_load(&NODE_MAPPING(sbi)->i_pages, nid);
         if (apage)
                 return;
 
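The second f2fs hunk above is typical of the simplest class of conversion, and it illustrates the "not exposing RCU-protected pointers to its users" point from the merge message: `xa_load()` enters and leaves an RCU read-side critical section internally and returns a stable value, so the caller's explicit bracketing disappears. A minimal before/after sketch (kernel-context code, illustrative only):

        /* Before: the caller must supply RCU protection for the lookup,
         * and the returned pointer is only valid inside that section. */
        rcu_read_lock();
        apage = radix_tree_lookup(&NODE_MAPPING(sbi)->i_pages, nid);
        rcu_read_unlock();

        /* After: xa_load() handles RCU itself; the caller just gets the
         * entry (here it is only tested for presence, so no reference
         * needs to be taken). */
        apage = xa_load(&NODE_MAPPING(sbi)->i_pages, nid);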
+9 -16
fs/fs-writeback.c
···
         struct address_space *mapping = inode->i_mapping;
         struct bdi_writeback *old_wb = inode->i_wb;
         struct bdi_writeback *new_wb = isw->new_wb;
-        struct radix_tree_iter iter;
+        XA_STATE(xas, &mapping->i_pages, 0);
+        struct page *page;
         bool switched = false;
-        void **slot;
 
         /*
          * By the time control reaches here, RCU grace period has passed
···
          * to possibly dirty pages while PAGECACHE_TAG_WRITEBACK points to
          * pages actually under writeback.
          */
-        radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, 0,
-                                   PAGECACHE_TAG_DIRTY) {
-                struct page *page = radix_tree_deref_slot_protected(slot,
-                                                &mapping->i_pages.xa_lock);
-                if (likely(page) && PageDirty(page)) {
+        xas_for_each_marked(&xas, page, ULONG_MAX, PAGECACHE_TAG_DIRTY) {
+                if (PageDirty(page)) {
                         dec_wb_stat(old_wb, WB_RECLAIMABLE);
                         inc_wb_stat(new_wb, WB_RECLAIMABLE);
                 }
         }
 
-        radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, 0,
-                                   PAGECACHE_TAG_WRITEBACK) {
-                struct page *page = radix_tree_deref_slot_protected(slot,
-                                                &mapping->i_pages.xa_lock);
-                if (likely(page)) {
-                        WARN_ON_ONCE(!PageWriteback(page));
-                        dec_wb_stat(old_wb, WB_WRITEBACK);
-                        inc_wb_stat(new_wb, WB_WRITEBACK);
-                }
+        xas_set(&xas, 0);
+        xas_for_each_marked(&xas, page, ULONG_MAX, PAGECACHE_TAG_WRITEBACK) {
+                WARN_ON_ONCE(!PageWriteback(page));
+                dec_wb_stat(old_wb, WB_WRITEBACK);
+                inc_wb_stat(new_wb, WB_WRITEBACK);
         }
 
         wb_get(new_wb);
+1 -1
fs/gfs2/aops.c
···
         pgoff_t done_index;
         int cycled;
         int range_whole = 0;
-        int tag;
+        xa_mark_t tag;
 
         pagevec_init(&pvec);
         if (wbc->range_cyclic) {
+1 -1
fs/inode.c
···
 
 static void __address_space_init_once(struct address_space *mapping)
 {
-        INIT_RADIX_TREE(&mapping->i_pages, GFP_ATOMIC | __GFP_ACCOUNT);
+        xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ);
         init_rwsem(&mapping->i_mmap_rwsem);
         INIT_LIST_HEAD(&mapping->private_list);
         spin_lock_init(&mapping->private_lock);
+1 -1
fs/isofs/dir.c
···
         return i;
 }
 
-/* Acorn extensions written by Matthew Wilcox <willy@bofh.ai> 1998 */
+/* Acorn extensions written by Matthew Wilcox <willy@infradead.org> 1998 */
 int get_acorn_filename(struct iso_directory_record *de,
                 char *retname, struct inode *inode)
 {
+1 -1
fs/nfs/blocklayout/blocklayout.c
···
         end = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
         if (end != inode->i_mapping->nrpages) {
                 rcu_read_lock();
-                end = page_cache_next_hole(mapping, idx + 1, ULONG_MAX);
+                end = page_cache_next_miss(mapping, idx + 1, ULONG_MAX);
                 rcu_read_unlock();
         }
 
+9 -17
fs/nilfs2/btnode.c
···
         ctxt->newbh = NULL;
 
         if (inode->i_blkbits == PAGE_SHIFT) {
-                lock_page(obh->b_page);
-                /*
-                 * We cannot call radix_tree_preload for the kernels older
-                 * than 2.6.23, because it is not exported for modules.
-                 */
+                struct page *opage = obh->b_page;
+                lock_page(opage);
 retry:
-                err = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
-                if (err)
-                        goto failed_unlock;
                 /* BUG_ON(oldkey != obh->b_page->index); */
-                if (unlikely(oldkey != obh->b_page->index))
-                        NILFS_PAGE_BUG(obh->b_page,
+                if (unlikely(oldkey != opage->index))
+                        NILFS_PAGE_BUG(opage,
                                        "invalid oldkey %lld (newkey=%lld)",
                                        (unsigned long long)oldkey,
                                        (unsigned long long)newkey);
 
                 xa_lock_irq(&btnc->i_pages);
-                err = radix_tree_insert(&btnc->i_pages, newkey, obh->b_page);
+                err = __xa_insert(&btnc->i_pages, newkey, opage, GFP_NOFS);
                 xa_unlock_irq(&btnc->i_pages);
                 /*
                  * Note: page->index will not change to newkey until
···
                  * To protect the page in intermediate state, the page lock
                  * is held.
                  */
-                radix_tree_preload_end();
                 if (!err)
                         return 0;
                 else if (err != -EEXIST)
···
                 if (!err)
                         goto retry;
                 /* fallback to copy mode */
-                unlock_page(obh->b_page);
+                unlock_page(opage);
         }
 
         nbh = nilfs_btnode_create_block(btnc, newkey);
···
         mark_buffer_dirty(obh);
 
         xa_lock_irq(&btnc->i_pages);
-        radix_tree_delete(&btnc->i_pages, oldkey);
-        radix_tree_tag_set(&btnc->i_pages, newkey,
-                           PAGECACHE_TAG_DIRTY);
+        __xa_erase(&btnc->i_pages, oldkey);
+        __xa_set_mark(&btnc->i_pages, newkey, PAGECACHE_TAG_DIRTY);
         xa_unlock_irq(&btnc->i_pages);
 
         opage->index = obh->b_blocknr = newkey;
···
 
         if (nbh == NULL) {        /* blocksize == pagesize */
                 xa_lock_irq(&btnc->i_pages);
-                radix_tree_delete(&btnc->i_pages, newkey);
+                __xa_erase(&btnc->i_pages, newkey);
                 xa_unlock_irq(&btnc->i_pages);
                 unlock_page(ctxt->bh->b_page);
         } else
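The nilfs2 hunk above shows the "eliminating preloading" point from the merge message in practice: instead of reserving tree nodes with `radix_tree_preload()` before taking the lock (because no allocation was allowed under it), the GFP flags now travel with the store operation itself. A condensed before/after sketch (kernel-context code, illustrative only; error handling trimmed):

        /* Before: reserve nodes up front, since the radix tree could not
         * allocate while its lock was held. */
        err = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
        if (err)
                goto failed_unlock;
        xa_lock_irq(&btnc->i_pages);
        err = radix_tree_insert(&btnc->i_pages, newkey, opage);
        xa_unlock_irq(&btnc->i_pages);
        radix_tree_preload_end();

        /* After: the GFP flags are specified at the call site, and
         * __xa_insert() may drop and retake the already-held i_pages
         * lock internally if it needs to allocate. */
        xa_lock_irq(&btnc->i_pages);
        err = __xa_insert(&btnc->i_pages, newkey, opage, GFP_NOFS);
        xa_unlock_irq(&btnc->i_pages);

This is also why the stale comment about `radix_tree_preload` not being exported before 2.6.23 could simply be deleted.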
+13 -16
fs/nilfs2/page.c
··· 289 289 * @dmap: destination page cache 290 290 * @smap: source page cache 291 291 * 292 - * No pages must no be added to the cache during this process. 292 + * No pages must be added to the cache during this process. 293 293 * This must be ensured by the caller. 294 294 */ 295 295 void nilfs_copy_back_pages(struct address_space *dmap, ··· 298 298 struct pagevec pvec; 299 299 unsigned int i, n; 300 300 pgoff_t index = 0; 301 - int err; 302 301 303 302 pagevec_init(&pvec); 304 303 repeat: ··· 312 313 lock_page(page); 313 314 dpage = find_lock_page(dmap, offset); 314 315 if (dpage) { 315 - /* override existing page on the destination cache */ 316 + /* overwrite existing page in the destination cache */ 316 317 WARN_ON(PageDirty(dpage)); 317 318 nilfs_copy_page(dpage, page, 0); 318 319 unlock_page(dpage); 319 320 put_page(dpage); 321 + /* Do we not need to remove page from smap here? */ 320 322 } else { 321 - struct page *page2; 323 + struct page *p; 322 324 323 325 /* move the page to the destination cache */ 324 326 xa_lock_irq(&smap->i_pages); 325 - page2 = radix_tree_delete(&smap->i_pages, offset); 326 - WARN_ON(page2 != page); 327 - 327 + p = __xa_erase(&smap->i_pages, offset); 328 + WARN_ON(page != p); 328 329 smap->nrpages--; 329 330 xa_unlock_irq(&smap->i_pages); 330 331 331 332 xa_lock_irq(&dmap->i_pages); 332 - err = radix_tree_insert(&dmap->i_pages, offset, page); 333 - if (unlikely(err < 0)) { 334 - WARN_ON(err == -EEXIST); 333 + p = __xa_store(&dmap->i_pages, offset, page, GFP_NOFS); 334 + if (unlikely(p)) { 335 + /* Probably -ENOMEM */ 335 336 page->mapping = NULL; 336 - put_page(page); /* for cache */ 337 + put_page(page); 337 338 } else { 338 339 page->mapping = dmap; 339 340 dmap->nrpages++; 340 341 if (PageDirty(page)) 341 - radix_tree_tag_set(&dmap->i_pages, 342 - offset, 343 - PAGECACHE_TAG_DIRTY); 342 + __xa_set_mark(&dmap->i_pages, offset, 343 + PAGECACHE_TAG_DIRTY); 344 344 } 345 345 xa_unlock_irq(&dmap->i_pages); 346 346 } ··· 465 467 if 
(mapping) { 466 468 xa_lock_irq(&mapping->i_pages); 467 469 if (test_bit(PG_dirty, &page->flags)) { 468 - radix_tree_tag_clear(&mapping->i_pages, 469 - page_index(page), 470 + __xa_clear_mark(&mapping->i_pages, page_index(page), 470 471 PAGECACHE_TAG_DIRTY); 471 472 xa_unlock_irq(&mapping->i_pages); 472 473 return clear_page_dirty_for_io(page);
+1 -1
fs/proc/task_mmu.c
···
 	if (!page)
 		return;
 
-	if (radix_tree_exceptional_entry(page))
+	if (xa_is_value(page))
 		mss->swap += PAGE_SIZE;
 	else
 		put_page(page);
+42 -23
include/linux/fs.h
···
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata);
 
+/**
+ * struct address_space - Contents of a cacheable, mappable object.
+ * @host: Owner, either the inode or the block_device.
+ * @i_pages: Cached pages.
+ * @gfp_mask: Memory allocation flags to use for allocating pages.
+ * @i_mmap_writable: Number of VM_SHARED mappings.
+ * @i_mmap: Tree of private and shared mappings.
+ * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
+ * @nrpages: Number of page entries, protected by the i_pages lock.
+ * @nrexceptional: Shadow or DAX entries, protected by the i_pages lock.
+ * @writeback_index: Writeback starts here.
+ * @a_ops: Methods.
+ * @flags: Error bits and flags (AS_*).
+ * @wb_err: The most recent error which has occurred.
+ * @private_lock: For use by the owner of the address_space.
+ * @private_list: For use by the owner of the address_space.
+ * @private_data: For use by the owner of the address_space.
+ */
 struct address_space {
-	struct inode		*host;		/* owner: inode, block_device */
-	struct radix_tree_root	i_pages;	/* cached pages */
-	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
-	struct rb_root_cached	i_mmap;		/* tree of private and shared mappings */
-	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
-	/* Protected by the i_pages lock */
-	unsigned long		nrpages;	/* number of total pages */
-	/* number of shadow or DAX exceptional entries */
+	struct inode		*host;
+	struct xarray		i_pages;
+	gfp_t			gfp_mask;
+	atomic_t		i_mmap_writable;
+	struct rb_root_cached	i_mmap;
+	struct rw_semaphore	i_mmap_rwsem;
+	unsigned long		nrpages;
 	unsigned long		nrexceptional;
-	pgoff_t			writeback_index;/* writeback starts here */
-	const struct address_space_operations *a_ops;	/* methods */
-	unsigned long		flags;		/* error bits */
-	spinlock_t		private_lock;	/* for use by the address_space */
-	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
-	struct list_head	private_list;	/* for use by the address_space */
-	void			*private_data;	/* ditto */
+	pgoff_t			writeback_index;
+	const struct address_space_operations *a_ops;
+	unsigned long		flags;
 	errseq_t		wb_err;
+	spinlock_t		private_lock;
+	struct list_head	private_list;
+	void			*private_data;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
 	/*
 	 * On most architectures that alignment is already the case; but
···
 	struct mutex		bd_fsfreeze_mutex;
 } __randomize_layout;
 
-/*
- * Radix-tree tags, for tagging dirty and writeback pages within the pagecache
- * radix trees
- */
-#define PAGECACHE_TAG_DIRTY	0
-#define PAGECACHE_TAG_WRITEBACK	1
-#define PAGECACHE_TAG_TOWRITE	2
+/* XArray tags, for tagging dirty and writeback pages in the pagecache. */
+#define PAGECACHE_TAG_DIRTY	XA_MARK_0
+#define PAGECACHE_TAG_WRITEBACK	XA_MARK_1
+#define PAGECACHE_TAG_TOWRITE	XA_MARK_2
 
-int mapping_tagged(struct address_space *mapping, int tag);
+/*
+ * Returns true if any of the pages in the mapping are marked with the tag.
+ */
+static inline bool mapping_tagged(struct address_space *mapping, xa_mark_t tag)
+{
+	return xa_marked(&mapping->i_pages, tag);
+}
 
 static inline void i_mmap_lock_write(struct address_space *mapping)
 {
+7 -11
include/linux/idr.h
···
 	     ++id, (entry) = idr_get_next((idr), &(id)))
 
 /*
- * IDA - IDR based id allocator, use when translation from id to
- * pointer isn't necessary.
+ * IDA - ID Allocator, use when translation from id to pointer isn't necessary.
  */
 #define IDA_CHUNK_SIZE		128	/* 128 bytes per chunk */
 #define IDA_BITMAP_LONGS	(IDA_CHUNK_SIZE / sizeof(long))
···
 	unsigned long		bitmap[IDA_BITMAP_LONGS];
 };
 
-DECLARE_PER_CPU(struct ida_bitmap *, ida_bitmap);
-
 struct ida {
-	struct radix_tree_root	ida_rt;
+	struct xarray xa;
 };
 
+#define IDA_INIT_FLAGS	(XA_FLAGS_LOCK_IRQ | XA_FLAGS_ALLOC)
+
 #define IDA_INIT(name)	{						\
-	.ida_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER | GFP_NOWAIT),	\
+	.xa = XARRAY_INIT(name, IDA_INIT_FLAGS)				\
 }
 #define DEFINE_IDA(name)	struct ida name = IDA_INIT(name)
 
···
 
 static inline void ida_init(struct ida *ida)
 {
-	INIT_RADIX_TREE(&ida->ida_rt, IDR_RT_MARKER | GFP_NOWAIT);
+	xa_init_flags(&ida->xa, IDA_INIT_FLAGS);
 }
 
 #define ida_simple_get(ida, start, end, gfp)	\
···
 
 static inline bool ida_is_empty(const struct ida *ida)
 {
-	return radix_tree_empty(&ida->ida_rt);
+	return xa_empty(&ida->xa);
 }
-
-/* in lib/radix-tree.c */
-int ida_pre_get(struct ida *ida, gfp_t gfp_mask);
 #endif /* __IDR_H__ */
+5 -5
include/linux/pagemap.h
···
 
 typedef int filler_t(void *, struct page *);
 
-pgoff_t page_cache_next_hole(struct address_space *mapping,
+pgoff_t page_cache_next_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan);
-pgoff_t page_cache_prev_hole(struct address_space *mapping,
+pgoff_t page_cache_prev_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan);
 
 #define FGP_ACCESSED		0x00000001
···
 unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start,
 			       unsigned int nr_pages, struct page **pages);
 unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
-			pgoff_t end, int tag, unsigned int nr_pages,
+			pgoff_t end, xa_mark_t tag, unsigned int nr_pages,
 			struct page **pages);
 static inline unsigned find_get_pages_tag(struct address_space *mapping,
-			pgoff_t *index, int tag, unsigned int nr_pages,
+			pgoff_t *index, xa_mark_t tag, unsigned int nr_pages,
 			struct page **pages)
 {
 	return find_get_pages_range_tag(mapping, index, (pgoff_t)-1, tag,
 					nr_pages, pages);
 }
 unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
-			int tag, unsigned int nr_entries,
+			xa_mark_t tag, unsigned int nr_entries,
 			struct page **entries, pgoff_t *indices);
 
 struct page *grab_cache_page_write_begin(struct address_space *mapping,
+5 -3
include/linux/pagevec.h
···
 #ifndef _LINUX_PAGEVEC_H
 #define _LINUX_PAGEVEC_H
 
+#include <linux/xarray.h>
+
 /* 15 pointers + header align the pagevec structure to a power of two */
 #define PAGEVEC_SIZE	15
 
···
 
 unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t *index, pgoff_t end,
-		int tag);
+		xa_mark_t tag);
 unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t *index, pgoff_t end,
-		int tag, unsigned max_pages);
+		xa_mark_t tag, unsigned max_pages);
 static inline unsigned pagevec_lookup_tag(struct pagevec *pvec,
-		struct address_space *mapping, pgoff_t *index, int tag)
+		struct address_space *mapping, pgoff_t *index, xa_mark_t tag)
 {
 	return pagevec_lookup_range_tag(pvec, mapping, index, (pgoff_t)-1, tag);
 }
+24 -154
include/linux/radix-tree.h
···
 #include <linux/rcupdate.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
+#include <linux/xarray.h>
+
+/* Keep unconverted code working */
+#define radix_tree_root		xarray
+#define radix_tree_node		xa_node
 
 /*
  * The bottom two bits of the slot determine how the remaining bits in the
  * slot are interpreted:
  *
  * 00 - data pointer
- * 01 - internal entry
- * 10 - exceptional entry
- * 11 - this bit combination is currently unused/reserved
+ * 10 - internal entry
+ * x1 - value entry
  *
  * The internal entry may be a pointer to the next level in the tree, a
  * sibling entry, or an indicator that the entry in this slot has been moved
  * to another location in the tree and the lookup should be restarted.  While
  * NULL fits the 'data pointer' pattern, it means that there is no entry in
  * the tree for this index (no matter what level of the tree it is found at).
- * This means that you cannot store NULL in the tree as a value for the index.
+ * This means that storing a NULL entry in the tree is the same as deleting
+ * the entry from the tree.
  */
 #define RADIX_TREE_ENTRY_MASK		3UL
-#define RADIX_TREE_INTERNAL_NODE	1UL
-
-/*
- * Most users of the radix tree store pointers but shmem/tmpfs stores swap
- * entries in the same tree.  They are marked as exceptional entries to
- * distinguish them from pointers to struct page.
- * EXCEPTIONAL_ENTRY tests the bit, EXCEPTIONAL_SHIFT shifts content past it.
- */
-#define RADIX_TREE_EXCEPTIONAL_ENTRY	2
-#define RADIX_TREE_EXCEPTIONAL_SHIFT	2
+#define RADIX_TREE_INTERNAL_NODE	2UL
 
 static inline bool radix_tree_is_internal_node(void *ptr)
 {
···
 
 /*** radix-tree API starts here ***/
 
-#define RADIX_TREE_MAX_TAGS 3
-
-#ifndef RADIX_TREE_MAP_SHIFT
-#define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
-#endif
-
+#define RADIX_TREE_MAP_SHIFT	XA_CHUNK_SHIFT
 #define RADIX_TREE_MAP_SIZE	(1UL << RADIX_TREE_MAP_SHIFT)
 #define RADIX_TREE_MAP_MASK	(RADIX_TREE_MAP_SIZE-1)
 
-#define RADIX_TREE_TAG_LONGS	\
-	((RADIX_TREE_MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)
+#define RADIX_TREE_MAX_TAGS	XA_MAX_MARKS
+#define RADIX_TREE_TAG_LONGS	XA_MARK_LONGS
 
 #define RADIX_TREE_INDEX_BITS  (8 /* CHAR_BIT */ * sizeof(unsigned long))
 #define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \
 					  RADIX_TREE_MAP_SHIFT))
 
-/*
- * @count is the count of every non-NULL element in the ->slots array
- * whether that is an exceptional entry, a retry entry, a user pointer,
- * a sibling entry or a pointer to the next level of the tree.
- * @exceptional is the count of every element in ->slots which is
- * either radix_tree_exceptional_entry() or is a sibling entry for an
- * exceptional entry.
- */
-struct radix_tree_node {
-	unsigned char	shift;		/* Bits remaining in each slot */
-	unsigned char	offset;		/* Slot offset in parent */
-	unsigned char	count;		/* Total entry count */
-	unsigned char	exceptional;	/* Exceptional entry count */
-	struct radix_tree_node *parent;		/* Used when ascending tree */
-	struct radix_tree_root *root;		/* The tree we belong to */
-	union {
-		struct list_head private_list;	/* For tree user */
-		struct rcu_head	rcu_head;	/* Used when freeing node */
-	};
-	void __rcu	*slots[RADIX_TREE_MAP_SIZE];
-	unsigned long	tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
-};
-
-/* The IDR tag is stored in the low bits of the GFP flags */
+/* The IDR tag is stored in the low bits of xa_flags */
 #define ROOT_IS_IDR	((__force gfp_t)4)
-/* The top bits of gfp_mask are used to store the root tags */
+/* The top bits of xa_flags are used to store the root tags */
 #define ROOT_TAG_SHIFT	(__GFP_BITS_SHIFT)
 
-struct radix_tree_root {
-	spinlock_t		xa_lock;
-	gfp_t			gfp_mask;
-	struct radix_tree_node	__rcu *rnode;
-};
-
-#define RADIX_TREE_INIT(name, mask)	{				\
-	.xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock),			\
-	.gfp_mask = (mask),						\
-	.rnode = NULL,							\
-}
+#define RADIX_TREE_INIT(name, mask)	XARRAY_INIT(name, mask)
 
 #define RADIX_TREE(name, mask) \
 	struct radix_tree_root name = RADIX_TREE_INIT(name, mask)
 
-#define INIT_RADIX_TREE(root, mask)					\
-do {									\
-	spin_lock_init(&(root)->xa_lock);				\
-	(root)->gfp_mask = (mask);					\
-	(root)->rnode = NULL;						\
-} while (0)
+#define INIT_RADIX_TREE(root, mask) xa_init_flags(root, mask)
 
 static inline bool radix_tree_empty(const struct radix_tree_root *root)
 {
-	return root->rnode == NULL;
+	return root->xa_head == NULL;
 }
 
 /**
···
  * @next_index:	one beyond the last index for this chunk
  * @tags:	bit-mask for tag-iterating
  * @node:	node that contains current slot
- * @shift:	shift for the node that holds our slots
  *
  * This radix tree iterator works in terms of "chunks" of slots.  A chunk is a
  * subinterval of slots contained within one radix tree leaf node.  It is
···
 	unsigned long	next_index;
 	unsigned long	tags;
 	struct radix_tree_node *node;
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	unsigned int	shift;
-#endif
 };
-
-static inline unsigned int iter_shift(const struct radix_tree_iter *iter)
-{
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	return iter->shift;
-#else
-	return 0;
-#endif
-}
 
 /**
  * Radix-tree synchronization
···
  * radix_tree_lookup_slot
  * radix_tree_tag_get
  * radix_tree_gang_lookup
- * radix_tree_gang_lookup_slot
  * radix_tree_gang_lookup_tag
  * radix_tree_gang_lookup_tag_slot
  * radix_tree_tagged
  *
- * The first 8 functions are able to be called locklessly, using RCU. The
+ * The first 7 functions are able to be called locklessly, using RCU. The
  * caller must ensure calls to these functions are made within rcu_read_lock()
  * regions. Other readers (lock-free or otherwise) and modifications may be
  * running concurrently.
···
 }
 
 /**
- * radix_tree_exceptional_entry	- radix_tree_deref_slot gave exceptional entry?
- * @arg:	value returned by radix_tree_deref_slot
- * Returns:	0 if well-aligned pointer, non-0 if exceptional entry.
- */
-static inline int radix_tree_exceptional_entry(void *arg)
-{
-	/* Not unlikely because radix_tree_exception often tested first */
-	return (unsigned long)arg & RADIX_TREE_EXCEPTIONAL_ENTRY;
-}
-
-/**
  * radix_tree_exception - radix_tree_deref_slot returned either exception?
  * @arg:	value returned by radix_tree_deref_slot
  * Returns:	0 if well-aligned pointer, non-0 if either kind of exception.
···
 	return unlikely((unsigned long)arg & RADIX_TREE_ENTRY_MASK);
 }
 
-int __radix_tree_create(struct radix_tree_root *, unsigned long index,
-			unsigned order, struct radix_tree_node **nodep,
-			void __rcu ***slotp);
-int __radix_tree_insert(struct radix_tree_root *, unsigned long index,
-			unsigned order, void *);
-static inline int radix_tree_insert(struct radix_tree_root *root,
-			unsigned long index, void *entry)
-{
-	return __radix_tree_insert(root, index, 0, entry);
-}
+int radix_tree_insert(struct radix_tree_root *, unsigned long index,
+			void *);
 void *__radix_tree_lookup(const struct radix_tree_root *, unsigned long index,
 			  struct radix_tree_node **nodep, void __rcu ***slotp);
 void *radix_tree_lookup(const struct radix_tree_root *, unsigned long);
 void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *,
 					unsigned long index);
-typedef void (*radix_tree_update_node_t)(struct radix_tree_node *);
 void __radix_tree_replace(struct radix_tree_root *, struct radix_tree_node *,
-			  void __rcu **slot, void *entry,
-			  radix_tree_update_node_t update_node);
+			  void __rcu **slot, void *entry);
 void radix_tree_iter_replace(struct radix_tree_root *,
 		const struct radix_tree_iter *, void __rcu **slot, void *entry);
 void radix_tree_replace_slot(struct radix_tree_root *,
 			     void __rcu **slot, void *entry);
-void __radix_tree_delete_node(struct radix_tree_root *,
-			      struct radix_tree_node *,
-			      radix_tree_update_node_t update_node);
 void radix_tree_iter_delete(struct radix_tree_root *,
 			struct radix_tree_iter *iter, void __rcu **slot);
 void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
 void *radix_tree_delete(struct radix_tree_root *, unsigned long);
-void radix_tree_clear_tags(struct radix_tree_root *, struct radix_tree_node *,
-			   void __rcu **slot);
 unsigned int radix_tree_gang_lookup(const struct radix_tree_root *,
 			void **results, unsigned long first_index,
 			unsigned int max_items);
-unsigned int radix_tree_gang_lookup_slot(const struct radix_tree_root *,
-			void __rcu ***results, unsigned long *indices,
-			unsigned long first_index, unsigned int max_items);
 int radix_tree_preload(gfp_t gfp_mask);
 int radix_tree_maybe_preload(gfp_t gfp_mask);
-int radix_tree_maybe_preload_order(gfp_t gfp_mask, int order);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *,
 			unsigned long index, unsigned int tag);
···
 			unsigned long index, unsigned int tag);
 int radix_tree_tag_get(const struct radix_tree_root *,
 			unsigned long index, unsigned int tag);
-void radix_tree_iter_tag_set(struct radix_tree_root *,
-		const struct radix_tree_iter *iter, unsigned int tag);
 void radix_tree_iter_tag_clear(struct radix_tree_root *,
 		const struct radix_tree_iter *iter, unsigned int tag);
 unsigned int radix_tree_gang_lookup_tag(const struct radix_tree_root *,
···
 {
 	preempt_enable();
 }
-
-int radix_tree_split_preload(unsigned old_order, unsigned new_order, gfp_t);
-int radix_tree_split(struct radix_tree_root *, unsigned long index,
-			unsigned new_order);
-int radix_tree_join(struct radix_tree_root *, unsigned long index,
-			unsigned new_order, void *);
 
 void __rcu **idr_get_free(struct radix_tree_root *root,
 			      struct radix_tree_iter *iter, gfp_t gfp,
···
 static inline unsigned long
 __radix_tree_iter_add(struct radix_tree_iter *iter, unsigned long slots)
 {
-	return iter->index + (slots << iter_shift(iter));
+	return iter->index + slots;
 }
 
 /**
···
 static __always_inline long
 radix_tree_chunk_size(struct radix_tree_iter *iter)
 {
-	return (iter->next_index - iter->index) >> iter_shift(iter);
+	return iter->next_index - iter->index;
 }
-
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-void __rcu **__radix_tree_next_slot(void __rcu **slot,
-				struct radix_tree_iter *iter, unsigned flags);
-#else
-/* Can't happen without sibling entries, but the compiler can't tell that */
-static inline void __rcu **__radix_tree_next_slot(void __rcu **slot,
-				struct radix_tree_iter *iter, unsigned flags)
-{
-	return slot;
-}
-#endif
 
 /**
  * radix_tree_next_slot - find next slot in chunk
···
 		return NULL;
 
  found:
-	if (unlikely(radix_tree_is_internal_node(rcu_dereference_raw(*slot))))
-		return __radix_tree_next_slot(slot, iter, flags);
 	return slot;
 }
 
···
 	for (slot = radix_tree_iter_init(iter, start) ;			\
 	     slot || (slot = radix_tree_next_chunk(root, iter, 0)) ;	\
 	     slot = radix_tree_next_slot(slot, iter, 0))
-
-/**
- * radix_tree_for_each_contig - iterate over contiguous slots
- *
- * @slot:	the void** variable for pointer to slot
- * @root:	the struct radix_tree_root pointer
- * @iter:	the struct radix_tree_iter pointer
- * @start:	iteration starting index
- *
- * @slot points to radix tree slot, @iter->index contains its index.
- */
-#define radix_tree_for_each_contig(slot, root, iter, start)		\
-	for (slot = radix_tree_iter_init(iter, start) ;			\
-	     slot || (slot = radix_tree_next_chunk(root, iter,		\
-				RADIX_TREE_ITER_CONTIG)) ;		\
-	     slot = radix_tree_next_slot(slot, iter,			\
-				RADIX_TREE_ITER_CONTIG))
 
 /**
  * radix_tree_for_each_tagged - iterate over tagged slots
+9 -13
include/linux/swap.h
···
 void workingset_refault(struct page *page, void *shadow);
 void workingset_activation(struct page *page);
 
-/* Do not use directly, use workingset_lookup_update */
-void workingset_update_node(struct radix_tree_node *node);
-
-/* Returns workingset_update_node() if the mapping has shadow entries. */
-#define workingset_lookup_update(mapping)				\
-({									\
-	radix_tree_update_node_t __helper = workingset_update_node;	\
-	if (dax_mapping(mapping) || shmem_mapping(mapping))		\
-		__helper = NULL;					\
-	__helper;							\
-})
+/* Only track the nodes of mappings with shadow entries */
+void workingset_update_node(struct xa_node *node);
+#define mapping_set_update(xas, mapping) do {				\
+	if (!dax_mapping(mapping) && !shmem_mapping(mapping))		\
+		xas_set_update(xas, workingset_update_node);		\
+} while (0)
 
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
···
 extern int add_to_swap(struct page *page);
 extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
 extern int __add_to_swap_cache(struct page *page, swp_entry_t entry);
-extern void __delete_from_swap_cache(struct page *);
+extern void __delete_from_swap_cache(struct page *, swp_entry_t entry);
 extern void delete_from_swap_cache(struct page *);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
···
 	return -1;
 }
 
-static inline void __delete_from_swap_cache(struct page *page)
+static inline void __delete_from_swap_cache(struct page *page,
+					    swp_entry_t entry)
 {
 }
+7 -12
include/linux/swapops.h
···
  *
  * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
  */
-#define SWP_TYPE_SHIFT(e)	((sizeof(e.val) * 8) - \
-			(MAX_SWAPFILES_SHIFT + RADIX_TREE_EXCEPTIONAL_SHIFT))
-#define SWP_OFFSET_MASK(e)	((1UL << SWP_TYPE_SHIFT(e)) - 1)
+#define SWP_TYPE_SHIFT	(BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)
+#define SWP_OFFSET_MASK	((1UL << SWP_TYPE_SHIFT) - 1)
 
 /*
  * Store a type+offset into a swp_entry_t in an arch-independent format
···
 {
 	swp_entry_t ret;
 
-	ret.val = (type << SWP_TYPE_SHIFT(ret)) |
-			(offset & SWP_OFFSET_MASK(ret));
+	ret.val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK);
 	return ret;
 }
 
···
  */
 static inline unsigned swp_type(swp_entry_t entry)
 {
-	return (entry.val >> SWP_TYPE_SHIFT(entry));
+	return (entry.val >> SWP_TYPE_SHIFT);
 }
 
 /*
···
  */
 static inline pgoff_t swp_offset(swp_entry_t entry)
 {
-	return entry.val & SWP_OFFSET_MASK(entry);
+	return entry.val & SWP_OFFSET_MASK;
 }
 
 #ifdef CONFIG_MMU
···
 {
 	swp_entry_t entry;
 
-	entry.val = (unsigned long)arg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
+	entry.val = xa_to_value(arg);
 	return entry;
 }
 
 static inline void *swp_to_radix_entry(swp_entry_t entry)
 {
-	unsigned long value;
-
-	value = entry.val << RADIX_TREE_EXCEPTIONAL_SHIFT;
-	return (void *)(value | RADIX_TREE_EXCEPTIONAL_ENTRY);
+	return xa_mk_value(entry.val);
 }
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
+1292 -1
include/linux/xarray.h
··· 4 4 /* 5 5 * eXtensible Arrays 6 6 * Copyright (c) 2017 Microsoft Corporation 7 - * Author: Matthew Wilcox <mawilcox@microsoft.com> 7 + * Author: Matthew Wilcox <willy@infradead.org> 8 + * 9 + * See Documentation/core-api/xarray.rst for how to use the XArray. 8 10 */ 9 11 12 + #include <linux/bug.h> 13 + #include <linux/compiler.h> 14 + #include <linux/gfp.h> 15 + #include <linux/kconfig.h> 16 + #include <linux/kernel.h> 17 + #include <linux/rcupdate.h> 10 18 #include <linux/spinlock.h> 19 + #include <linux/types.h> 20 + 21 + /* 22 + * The bottom two bits of the entry determine how the XArray interprets 23 + * the contents: 24 + * 25 + * 00: Pointer entry 26 + * 10: Internal entry 27 + * x1: Value entry or tagged pointer 28 + * 29 + * Attempting to store internal entries in the XArray is a bug. 30 + * 31 + * Most internal entries are pointers to the next node in the tree. 32 + * The following internal entries have a special meaning: 33 + * 34 + * 0-62: Sibling entries 35 + * 256: Zero entry 36 + * 257: Retry entry 37 + * 38 + * Errors are also represented as internal entries, but use the negative 39 + * space (-4094 to -2). They're never stored in the slots array; only 40 + * returned by the normal API. 41 + */ 42 + 43 + #define BITS_PER_XA_VALUE (BITS_PER_LONG - 1) 44 + 45 + /** 46 + * xa_mk_value() - Create an XArray entry from an integer. 47 + * @v: Value to store in XArray. 48 + * 49 + * Context: Any context. 50 + * Return: An entry suitable for storing in the XArray. 51 + */ 52 + static inline void *xa_mk_value(unsigned long v) 53 + { 54 + WARN_ON((long)v < 0); 55 + return (void *)((v << 1) | 1); 56 + } 57 + 58 + /** 59 + * xa_to_value() - Get value stored in an XArray entry. 60 + * @entry: XArray entry. 61 + * 62 + * Context: Any context. 63 + * Return: The value stored in the XArray entry. 
64 + */ 65 + static inline unsigned long xa_to_value(const void *entry) 66 + { 67 + return (unsigned long)entry >> 1; 68 + } 69 + 70 + /** 71 + * xa_is_value() - Determine if an entry is a value. 72 + * @entry: XArray entry. 73 + * 74 + * Context: Any context. 75 + * Return: True if the entry is a value, false if it is a pointer. 76 + */ 77 + static inline bool xa_is_value(const void *entry) 78 + { 79 + return (unsigned long)entry & 1; 80 + } 81 + 82 + /** 83 + * xa_tag_pointer() - Create an XArray entry for a tagged pointer. 84 + * @p: Plain pointer. 85 + * @tag: Tag value (0, 1 or 3). 86 + * 87 + * If the user of the XArray prefers, they can tag their pointers instead 88 + * of storing value entries. Three tags are available (0, 1 and 3). 89 + * These are distinct from the xa_mark_t as they are not replicated up 90 + * through the array and cannot be searched for. 91 + * 92 + * Context: Any context. 93 + * Return: An XArray entry. 94 + */ 95 + static inline void *xa_tag_pointer(void *p, unsigned long tag) 96 + { 97 + return (void *)((unsigned long)p | tag); 98 + } 99 + 100 + /** 101 + * xa_untag_pointer() - Turn an XArray entry into a plain pointer. 102 + * @entry: XArray entry. 103 + * 104 + * If you have stored a tagged pointer in the XArray, call this function 105 + * to get the untagged version of the pointer. 106 + * 107 + * Context: Any context. 108 + * Return: A pointer. 109 + */ 110 + static inline void *xa_untag_pointer(void *entry) 111 + { 112 + return (void *)((unsigned long)entry & ~3UL); 113 + } 114 + 115 + /** 116 + * xa_pointer_tag() - Get the tag stored in an XArray entry. 117 + * @entry: XArray entry. 118 + * 119 + * If you have stored a tagged pointer in the XArray, call this function 120 + * to get the tag of that pointer. 121 + * 122 + * Context: Any context. 123 + * Return: A tag. 
124 + */ 125 + static inline unsigned int xa_pointer_tag(void *entry) 126 + { 127 + return (unsigned long)entry & 3UL; 128 + } 129 + 130 + /* 131 + * xa_mk_internal() - Create an internal entry. 132 + * @v: Value to turn into an internal entry. 133 + * 134 + * Context: Any context. 135 + * Return: An XArray internal entry corresponding to this value. 136 + */ 137 + static inline void *xa_mk_internal(unsigned long v) 138 + { 139 + return (void *)((v << 2) | 2); 140 + } 141 + 142 + /* 143 + * xa_to_internal() - Extract the value from an internal entry. 144 + * @entry: XArray entry. 145 + * 146 + * Context: Any context. 147 + * Return: The value which was stored in the internal entry. 148 + */ 149 + static inline unsigned long xa_to_internal(const void *entry) 150 + { 151 + return (unsigned long)entry >> 2; 152 + } 153 + 154 + /* 155 + * xa_is_internal() - Is the entry an internal entry? 156 + * @entry: XArray entry. 157 + * 158 + * Context: Any context. 159 + * Return: %true if the entry is an internal entry. 160 + */ 161 + static inline bool xa_is_internal(const void *entry) 162 + { 163 + return ((unsigned long)entry & 3) == 2; 164 + } 165 + 166 + /** 167 + * xa_is_err() - Report whether an XArray operation returned an error 168 + * @entry: Result from calling an XArray function 169 + * 170 + * If an XArray operation cannot complete an operation, it will return 171 + * a special value indicating an error. This function tells you 172 + * whether an error occurred; xa_err() tells you which error occurred. 173 + * 174 + * Context: Any context. 175 + * Return: %true if the entry indicates an error. 176 + */ 177 + static inline bool xa_is_err(const void *entry) 178 + { 179 + return unlikely(xa_is_internal(entry)); 180 + } 181 + 182 + /** 183 + * xa_err() - Turn an XArray result into an errno. 184 + * @entry: Result from calling an XArray function. 
185 + * 186 + * If an XArray operation cannot complete an operation, it will return 187 + * a special pointer value which encodes an errno. This function extracts 188 + * the errno from the pointer value, or returns 0 if the pointer does not 189 + * represent an errno. 190 + * 191 + * Context: Any context. 192 + * Return: A negative errno or 0. 193 + */ 194 + static inline int xa_err(void *entry) 195 + { 196 + /* xa_to_internal() would not do sign extension. */ 197 + if (xa_is_err(entry)) 198 + return (long)entry >> 2; 199 + return 0; 200 + } 201 + 202 + typedef unsigned __bitwise xa_mark_t; 203 + #define XA_MARK_0 ((__force xa_mark_t)0U) 204 + #define XA_MARK_1 ((__force xa_mark_t)1U) 205 + #define XA_MARK_2 ((__force xa_mark_t)2U) 206 + #define XA_PRESENT ((__force xa_mark_t)8U) 207 + #define XA_MARK_MAX XA_MARK_2 208 + #define XA_FREE_MARK XA_MARK_0 209 + 210 + enum xa_lock_type { 211 + XA_LOCK_IRQ = 1, 212 + XA_LOCK_BH = 2, 213 + }; 214 + 215 + /* 216 + * Values for xa_flags. The radix tree stores its GFP flags in the xa_flags, 217 + * and we remain compatible with that. 218 + */ 219 + #define XA_FLAGS_LOCK_IRQ ((__force gfp_t)XA_LOCK_IRQ) 220 + #define XA_FLAGS_LOCK_BH ((__force gfp_t)XA_LOCK_BH) 221 + #define XA_FLAGS_TRACK_FREE ((__force gfp_t)4U) 222 + #define XA_FLAGS_MARK(mark) ((__force gfp_t)((1U << __GFP_BITS_SHIFT) << \ 223 + (__force unsigned)(mark))) 224 + 225 + #define XA_FLAGS_ALLOC (XA_FLAGS_TRACK_FREE | XA_FLAGS_MARK(XA_FREE_MARK)) 226 + 227 + /** 228 + * struct xarray - The anchor of the XArray. 229 + * @xa_lock: Lock that protects the contents of the XArray. 230 + * 231 + * To use the xarray, define it statically or embed it in your data structure. 232 + * It is a very small data structure, so it does not usually make sense to 233 + * allocate it separately and keep a pointer to it in your data structure. 234 + * 235 + * You may use the xa_lock to protect your own data structures as well. 
236 + */ 237 + /* 238 + * If all of the entries in the array are NULL, @xa_head is a NULL pointer. 239 + * If the only non-NULL entry in the array is at index 0, @xa_head is that 240 + * entry. If any other entry in the array is non-NULL, @xa_head points 241 + * to an @xa_node. 242 + */ 243 + struct xarray { 244 + spinlock_t xa_lock; 245 + /* private: The rest of the data structure is not to be used directly. */ 246 + gfp_t xa_flags; 247 + void __rcu * xa_head; 248 + }; 249 + 250 + #define XARRAY_INIT(name, flags) { \ 251 + .xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock), \ 252 + .xa_flags = flags, \ 253 + .xa_head = NULL, \ 254 + } 255 + 256 + /** 257 + * DEFINE_XARRAY_FLAGS() - Define an XArray with custom flags. 258 + * @name: A string that names your XArray. 259 + * @flags: XA_FLAG values. 260 + * 261 + * This is intended for file scope definitions of XArrays. It declares 262 + * and initialises an empty XArray with the chosen name and flags. It is 263 + * equivalent to calling xa_init_flags() on the array, but it does the 264 + * initialisation at compiletime instead of runtime. 265 + */ 266 + #define DEFINE_XARRAY_FLAGS(name, flags) \ 267 + struct xarray name = XARRAY_INIT(name, flags) 268 + 269 + /** 270 + * DEFINE_XARRAY() - Define an XArray. 271 + * @name: A string that names your XArray. 272 + * 273 + * This is intended for file scope definitions of XArrays. It declares 274 + * and initialises an empty XArray with the chosen name. It is equivalent 275 + * to calling xa_init() on the array, but it does the initialisation at 276 + * compiletime instead of runtime. 277 + */ 278 + #define DEFINE_XARRAY(name) DEFINE_XARRAY_FLAGS(name, 0) 279 + 280 + /** 281 + * DEFINE_XARRAY_ALLOC() - Define an XArray which can allocate IDs. 282 + * @name: A string that names your XArray. 283 + * 284 + * This is intended for file scope definitions of allocating XArrays. 285 + * See also DEFINE_XARRAY(). 
 */
#define DEFINE_XARRAY_ALLOC(name) DEFINE_XARRAY_FLAGS(name, XA_FLAGS_ALLOC)

void xa_init_flags(struct xarray *, gfp_t flags);
void *xa_load(struct xarray *, unsigned long index);
void *xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
void *xa_cmpxchg(struct xarray *, unsigned long index,
			void *old, void *entry, gfp_t);
int xa_reserve(struct xarray *, unsigned long index, gfp_t);
void *xa_store_range(struct xarray *, unsigned long first, unsigned long last,
			void *entry, gfp_t);
bool xa_get_mark(struct xarray *, unsigned long index, xa_mark_t);
void xa_set_mark(struct xarray *, unsigned long index, xa_mark_t);
void xa_clear_mark(struct xarray *, unsigned long index, xa_mark_t);
void *xa_find(struct xarray *xa, unsigned long *index,
		unsigned long max, xa_mark_t) __attribute__((nonnull(2)));
void *xa_find_after(struct xarray *xa, unsigned long *index,
		unsigned long max, xa_mark_t) __attribute__((nonnull(2)));
unsigned int xa_extract(struct xarray *, void **dst, unsigned long start,
			unsigned long max, unsigned int n, xa_mark_t);
void xa_destroy(struct xarray *);

/**
 * xa_init() - Initialise an empty XArray.
 * @xa: XArray.
 *
 * An empty XArray is full of NULL entries.
 *
 * Context: Any context.
 */
static inline void xa_init(struct xarray *xa)
{
	xa_init_flags(xa, 0);
}

/**
 * xa_empty() - Determine if an array has any present entries.
 * @xa: XArray.
 *
 * Context: Any context.
 * Return: %true if the array contains only NULL pointers.
 */
static inline bool xa_empty(const struct xarray *xa)
{
	return xa->xa_head == NULL;
}

/**
 * xa_marked() - Inquire whether any entry in this array has a mark set
 * @xa: Array
 * @mark: Mark value
 *
 * Context: Any context.
 * Return: %true if any entry has this mark set.
 */
static inline bool xa_marked(const struct xarray *xa, xa_mark_t mark)
{
	return xa->xa_flags & XA_FLAGS_MARK(mark);
}

/**
 * xa_erase() - Erase this entry from the XArray.
 * @xa: XArray.
 * @index: Index of entry.
 *
 * This function is the equivalent of calling xa_store() with %NULL as
 * the third argument. The XArray does not need to allocate memory, so
 * the user does not need to provide GFP flags.
 *
 * Context: Process context. Takes and releases the xa_lock.
 * Return: The entry which used to be at this index.
 */
static inline void *xa_erase(struct xarray *xa, unsigned long index)
{
	return xa_store(xa, index, NULL, 0);
}

/**
 * xa_insert() - Store this entry in the XArray unless another entry is
 * already present.
 * @xa: XArray.
 * @index: Index into array.
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * If you would rather see the existing entry in the array, use xa_cmpxchg().
 * This function is for users who don't care what the entry is, only that
 * one is present.
 *
 * Context: Process context. Takes and releases the xa_lock.
 * May sleep if the @gfp flags permit.
 * Return: 0 if the store succeeded. -EEXIST if another entry was present.
 * -ENOMEM if memory could not be allocated.
 */
static inline int xa_insert(struct xarray *xa, unsigned long index,
		void *entry, gfp_t gfp)
{
	void *curr = xa_cmpxchg(xa, index, NULL, entry, gfp);
	if (!curr)
		return 0;
	if (xa_is_err(curr))
		return xa_err(curr);
	return -EEXIST;
}

/**
 * xa_release() - Release a reserved entry.
 * @xa: XArray.
 * @index: Index of entry.
 *
 * After calling xa_reserve(), you can call this function to release the
 * reservation. If the entry at @index has been stored to, this function
 * will do nothing.
 */
static inline void xa_release(struct xarray *xa, unsigned long index)
{
	xa_cmpxchg(xa, index, NULL, NULL, 0);
}

/**
 * xa_for_each() - Iterate over a portion of an XArray.
 * @xa: XArray.
 * @entry: Entry retrieved from array.
 * @index: Index of @entry.
 * @max: Maximum index to retrieve from array.
 * @filter: Selection criterion.
 *
 * Initialise @index to the lowest index you want to retrieve from the
 * array. During the iteration, @entry will have the value of the entry
 * stored in @xa at @index. The iteration will skip all entries in the
 * array which do not match @filter. You may modify @index during the
 * iteration if you want to skip or reprocess indices. It is safe to modify
 * the array during the iteration. At the end of the iteration, @entry will
 * be set to NULL and @index will have a value less than or equal to @max.
 *
 * xa_for_each() is O(n.log(n)) while xas_for_each() is O(n). You have
 * to handle your own locking with xas_for_each(), and if you have to unlock
 * after each iteration, it will also end up being O(n.log(n)). xa_for_each()
 * will spin if it hits a retry entry; if you intend to see retry entries,
 * you should use the xas_for_each() iterator instead.
 * The xas_for_each() iterator will expand into more inline code than
 * xa_for_each().
 *
 * Context: Any context. Takes and releases the RCU lock.
 */
#define xa_for_each(xa, entry, index, max, filter) \
	for (entry = xa_find(xa, &index, max, filter); entry; \
	     entry = xa_find_after(xa, &index, max, filter))

#define xa_trylock(xa)		spin_trylock(&(xa)->xa_lock)
#define xa_lock(xa)		spin_lock(&(xa)->xa_lock)
···
	spin_lock_irqsave(&(xa)->xa_lock, flags)
#define xa_unlock_irqrestore(xa, flags) \
	spin_unlock_irqrestore(&(xa)->xa_lock, flags)

/*
 * Versions of the normal API which require the caller to hold the
 * xa_lock. If the GFP flags allow it, they will drop the lock to
 * allocate memory, then reacquire it afterwards. These functions
 * may also re-enable interrupts if the XArray flags indicate the
 * locking should be interrupt safe.
 */
void *__xa_erase(struct xarray *, unsigned long index);
void *__xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
void *__xa_cmpxchg(struct xarray *, unsigned long index, void *old,
		void *entry, gfp_t);
int __xa_alloc(struct xarray *, u32 *id, u32 max, void *entry, gfp_t);
void __xa_set_mark(struct xarray *, unsigned long index, xa_mark_t);
void __xa_clear_mark(struct xarray *, unsigned long index, xa_mark_t);

/**
 * __xa_insert() - Store this entry in the XArray unless another entry is
 * already present.
 * @xa: XArray.
 * @index: Index into array.
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * If you would rather see the existing entry in the array, use __xa_cmpxchg().
 * This function is for users who don't care what the entry is, only that
 * one is present.
 *
 * Context: Any context. Expects xa_lock to be held on entry. May
 * release and reacquire xa_lock if the @gfp flags permit.
 * Return: 0 if the store succeeded. -EEXIST if another entry was present.
 * -ENOMEM if memory could not be allocated.
 */
static inline int __xa_insert(struct xarray *xa, unsigned long index,
		void *entry, gfp_t gfp)
{
	void *curr = __xa_cmpxchg(xa, index, NULL, entry, gfp);
	if (!curr)
		return 0;
	if (xa_is_err(curr))
		return xa_err(curr);
	return -EEXIST;
}

/**
 * xa_erase_bh() - Erase this entry from the XArray.
 * @xa: XArray.
 * @index: Index of entry.
 *
 * This function is the equivalent of calling xa_store() with %NULL as
 * the third argument. The XArray does not need to allocate memory, so
 * the user does not need to provide GFP flags.
 *
 * Context: Process context. Takes and releases the xa_lock while
 * disabling softirqs.
 * Return: The entry which used to be at this index.
 */
static inline void *xa_erase_bh(struct xarray *xa, unsigned long index)
{
	void *entry;

	xa_lock_bh(xa);
	entry = __xa_erase(xa, index);
	xa_unlock_bh(xa);

	return entry;
}

/**
 * xa_erase_irq() - Erase this entry from the XArray.
 * @xa: XArray.
 * @index: Index of entry.
 *
 * This function is the equivalent of calling xa_store() with %NULL as
 * the third argument. The XArray does not need to allocate memory, so
 * the user does not need to provide GFP flags.
 *
 * Context: Process context. Takes and releases the xa_lock while
 * disabling interrupts.
 * Return: The entry which used to be at this index.
 */
static inline void *xa_erase_irq(struct xarray *xa, unsigned long index)
{
	void *entry;

	xa_lock_irq(xa);
	entry = __xa_erase(xa, index);
	xa_unlock_irq(xa);

	return entry;
}

/**
 * xa_alloc() - Find somewhere to store this entry in the XArray.
 * @xa: XArray.
 * @id: Pointer to ID.
 * @max: Maximum ID to allocate (inclusive).
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * Allocates an unused ID in the range specified by @id and @max.
 * Updates the @id pointer with the index, then stores the entry at that
 * index. A concurrent lookup will not see an uninitialised @id.
 *
 * Context: Process context. Takes and releases the xa_lock. May sleep if
 * the @gfp flags permit.
 * Return: 0 on success, -ENOMEM if memory allocation fails or -ENOSPC if
 * there is no more space in the XArray.
 */
static inline int xa_alloc(struct xarray *xa, u32 *id, u32 max, void *entry,
		gfp_t gfp)
{
	int err;

	xa_lock(xa);
	err = __xa_alloc(xa, id, max, entry, gfp);
	xa_unlock(xa);

	return err;
}

/**
 * xa_alloc_bh() - Find somewhere to store this entry in the XArray.
 * @xa: XArray.
 * @id: Pointer to ID.
 * @max: Maximum ID to allocate (inclusive).
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * Allocates an unused ID in the range specified by @id and @max.
 * Updates the @id pointer with the index, then stores the entry at that
 * index. A concurrent lookup will not see an uninitialised @id.
 *
 * Context: Process context. Takes and releases the xa_lock while
 * disabling softirqs. May sleep if the @gfp flags permit.
 * Return: 0 on success, -ENOMEM if memory allocation fails or -ENOSPC if
 * there is no more space in the XArray.
 */
static inline int xa_alloc_bh(struct xarray *xa, u32 *id, u32 max, void *entry,
		gfp_t gfp)
{
	int err;

	xa_lock_bh(xa);
	err = __xa_alloc(xa, id, max, entry, gfp);
	xa_unlock_bh(xa);

	return err;
}

/**
 * xa_alloc_irq() - Find somewhere to store this entry in the XArray.
 * @xa: XArray.
 * @id: Pointer to ID.
 * @max: Maximum ID to allocate (inclusive).
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * Allocates an unused ID in the range specified by @id and @max.
 * Updates the @id pointer with the index, then stores the entry at that
 * index. A concurrent lookup will not see an uninitialised @id.
 *
 * Context: Process context. Takes and releases the xa_lock while
 * disabling interrupts. May sleep if the @gfp flags permit.
 * Return: 0 on success, -ENOMEM if memory allocation fails or -ENOSPC if
 * there is no more space in the XArray.
 */
static inline int xa_alloc_irq(struct xarray *xa, u32 *id, u32 max, void *entry,
		gfp_t gfp)
{
	int err;

	xa_lock_irq(xa);
	err = __xa_alloc(xa, id, max, entry, gfp);
	xa_unlock_irq(xa);

	return err;
}

/* Everything below here is the Advanced API. Proceed with caution. */

/*
 * The xarray is constructed out of a set of 'chunks' of pointers. Choosing
 * the best chunk size requires some tradeoffs. A power of two recommends
 * itself so that we can walk the tree based purely on shifts and masks.
 * Generally, the larger the better; as the number of slots per level of the
 * tree increases, the less tall the tree needs to be. But that needs to be
 * balanced against the memory consumption of each node. On a 64-bit system,
 * xa_node is currently 576 bytes, and we get 7 of them per 4kB page. If we
 * doubled the number of slots per node, we'd get only 3 nodes per 4kB page.
 */
#ifndef XA_CHUNK_SHIFT
#define XA_CHUNK_SHIFT		(CONFIG_BASE_SMALL ? 4 : 6)
#endif
#define XA_CHUNK_SIZE		(1UL << XA_CHUNK_SHIFT)
#define XA_CHUNK_MASK		(XA_CHUNK_SIZE - 1)
#define XA_MAX_MARKS		3
#define XA_MARK_LONGS		DIV_ROUND_UP(XA_CHUNK_SIZE, BITS_PER_LONG)

/*
 * @count is the count of every non-NULL element in the ->slots array
 * whether that is a value entry, a retry entry, a user pointer,
 * a sibling entry or a pointer to the next level of the tree.
 * @nr_values is the count of every element in ->slots which is
 * either a value entry or a sibling of a value entry.
 */
struct xa_node {
	unsigned char	shift;		/* Bits remaining in each slot */
	unsigned char	offset;		/* Slot offset in parent */
	unsigned char	count;		/* Total entry count */
	unsigned char	nr_values;	/* Value entry count */
	struct xa_node __rcu *parent;	/* NULL at top of tree */
	struct xarray	*array;		/* The array we belong to */
	union {
		struct list_head private_list;	/* For tree user */
		struct rcu_head	rcu_head;	/* Used when freeing node */
	};
	void __rcu	*slots[XA_CHUNK_SIZE];
	union {
		unsigned long	tags[XA_MAX_MARKS][XA_MARK_LONGS];
		unsigned long	marks[XA_MAX_MARKS][XA_MARK_LONGS];
	};
};

void xa_dump(const struct xarray *);
void xa_dump_node(const struct xa_node *);

#ifdef XA_DEBUG
#define XA_BUG_ON(xa, x) do {					\
		if (x) {					\
			xa_dump(xa);				\
			BUG();					\
		}						\
	} while (0)
#define XA_NODE_BUG_ON(node, x) do {				\
		if (x) {					\
			if (node) xa_dump_node(node);		\
			BUG();					\
		}						\
	} while (0)
#else
#define XA_BUG_ON(xa, x)	do { } while (0)
#define XA_NODE_BUG_ON(node, x)	do { } while (0)
#endif

/* Private */
static inline void *xa_head(const struct xarray *xa)
{
	return rcu_dereference_check(xa->xa_head,
						lockdep_is_held(&xa->xa_lock));
}

/* Private */
static inline void *xa_head_locked(const struct xarray *xa)
{
	return rcu_dereference_protected(xa->xa_head,
						lockdep_is_held(&xa->xa_lock));
}

/* Private */
static inline void *xa_entry(const struct xarray *xa,
				const struct xa_node *node, unsigned int offset)
{
	XA_NODE_BUG_ON(node, offset >= XA_CHUNK_SIZE);
	return rcu_dereference_check(node->slots[offset],
						lockdep_is_held(&xa->xa_lock));
}

/* Private */
static inline void *xa_entry_locked(const struct xarray *xa,
				const struct xa_node *node, unsigned int offset)
{
	XA_NODE_BUG_ON(node, offset >= XA_CHUNK_SIZE);
	return rcu_dereference_protected(node->slots[offset],
						lockdep_is_held(&xa->xa_lock));
}

/* Private */
static inline struct xa_node *xa_parent(const struct xarray *xa,
					const struct xa_node *node)
{
	return rcu_dereference_check(node->parent,
						lockdep_is_held(&xa->xa_lock));
}

/* Private */
static inline struct xa_node *xa_parent_locked(const struct xarray *xa,
					const struct xa_node *node)
{
	return rcu_dereference_protected(node->parent,
						lockdep_is_held(&xa->xa_lock));
}

/* Private */
static inline void *xa_mk_node(const struct xa_node *node)
{
	return (void *)((unsigned long)node | 2);
}

/* Private */
static inline struct xa_node *xa_to_node(const void *entry)
{
	return (struct xa_node *)((unsigned long)entry - 2);
}

/* Private */
static inline bool xa_is_node(const void *entry)
{
	return xa_is_internal(entry) && (unsigned long)entry > 4096;
}

/* Private */
static inline void *xa_mk_sibling(unsigned int offset)
{
	return xa_mk_internal(offset);
}

/* Private */
static inline unsigned long xa_to_sibling(const void *entry)
{
	return xa_to_internal(entry);
}

/**
 * xa_is_sibling() - Is the entry a sibling entry?
 * @entry: Entry retrieved from the XArray
 *
 * Return: %true if the entry is a sibling entry.
 */
static inline bool xa_is_sibling(const void *entry)
{
	return IS_ENABLED(CONFIG_XARRAY_MULTI) && xa_is_internal(entry) &&
		(entry < xa_mk_sibling(XA_CHUNK_SIZE - 1));
}

#define XA_ZERO_ENTRY		xa_mk_internal(256)
#define XA_RETRY_ENTRY		xa_mk_internal(257)

/**
 * xa_is_zero() - Is the entry a zero entry?
 * @entry: Entry retrieved from the XArray
 *
 * Return: %true if the entry is a zero entry.
 */
static inline bool xa_is_zero(const void *entry)
{
	return unlikely(entry == XA_ZERO_ENTRY);
}

/**
 * xa_is_retry() - Is the entry a retry entry?
 * @entry: Entry retrieved from the XArray
 *
 * Return: %true if the entry is a retry entry.
 */
static inline bool xa_is_retry(const void *entry)
{
	return unlikely(entry == XA_RETRY_ENTRY);
}

/**
 * typedef xa_update_node_t - A callback function from the XArray.
 * @node: The node which is being processed
 *
 * This function is called every time the XArray updates the count of
 * present and value entries in a node. It allows advanced users to
 * maintain the private_list in the node.
 *
 * Context: The xa_lock is held and interrupts may be disabled.
 * Implementations should not drop the xa_lock, nor re-enable
 * interrupts.
 */
typedef void (*xa_update_node_t)(struct xa_node *node);

/*
 * The xa_state is opaque to its users. It contains various different pieces
 * of state involved in the current operation on the XArray. It should be
 * declared on the stack and passed between the various internal routines.
 * The various elements in it should not be accessed directly, but only
 * through the provided accessor functions. The below documentation is for
 * the benefit of those working on the code, not for users of the XArray.
 *
 * @xa_node usually points to the xa_node containing the slot we're operating
 * on (and @xa_offset is the offset in the slots array). If there is a
 * single entry in the array at index 0, there are no allocated xa_nodes to
 * point to, and so we store %NULL in @xa_node. @xa_node is set to
 * the value %XAS_RESTART if the xa_state is not walked to the correct
 * position in the tree of nodes for this operation. If an error occurs
 * during an operation, it is set to an %XAS_ERROR value. If we run off the
 * end of the allocated nodes, it is set to %XAS_BOUNDS.
 */
struct xa_state {
	struct xarray *xa;
	unsigned long xa_index;
	unsigned char xa_shift;
	unsigned char xa_sibs;
	unsigned char xa_offset;
	unsigned char xa_pad;		/* Helps gcc generate better code */
	struct xa_node *xa_node;
	struct xa_node *xa_alloc;
	xa_update_node_t xa_update;
};

/*
 * We encode errnos in the xas->xa_node. If an error has happened, we need to
 * drop the lock to fix it, and once we've done so the xa_state is invalid.
 */
#define XA_ERROR(errno) ((struct xa_node *)(((unsigned long)errno << 2) | 2UL))
#define XAS_BOUNDS	((struct xa_node *)1UL)
#define XAS_RESTART	((struct xa_node *)3UL)

#define __XA_STATE(array, index, shift, sibs)  {	\
	.xa = array,					\
	.xa_index = index,				\
	.xa_shift = shift,				\
	.xa_sibs = sibs,				\
	.xa_offset = 0,					\
	.xa_pad = 0,					\
	.xa_node = XAS_RESTART,				\
	.xa_alloc = NULL,				\
	.xa_update = NULL				\
}

/**
 * XA_STATE() - Declare an XArray operation state.
 * @name: Name of this operation state (usually xas).
 * @array: Array to operate on.
 * @index: Initial index of interest.
 *
 * Declare and initialise an xa_state on the stack.
 */
#define XA_STATE(name, array, index)				\
	struct xa_state name = __XA_STATE(array, index, 0, 0)

/**
 * XA_STATE_ORDER() - Declare an XArray operation state.
 * @name: Name of this operation state (usually xas).
 * @array: Array to operate on.
 * @index: Initial index of interest.
 * @order: Order of entry.
 *
 * Declare and initialise an xa_state on the stack. This variant of
 * XA_STATE() allows you to specify the 'order' of the element you
 * want to operate on.
 */
#define XA_STATE_ORDER(name, array, index, order)		\
	struct xa_state name = __XA_STATE(array,		\
			(index >> order) << order,		\
			order - (order % XA_CHUNK_SHIFT),	\
			(1U << (order % XA_CHUNK_SHIFT)) - 1)

#define xas_marked(xas, mark)	xa_marked((xas)->xa, (mark))
#define xas_trylock(xas)	xa_trylock((xas)->xa)
#define xas_lock(xas)		xa_lock((xas)->xa)
#define xas_unlock(xas)		xa_unlock((xas)->xa)
#define xas_lock_bh(xas)	xa_lock_bh((xas)->xa)
#define xas_unlock_bh(xas)	xa_unlock_bh((xas)->xa)
#define xas_lock_irq(xas)	xa_lock_irq((xas)->xa)
#define xas_unlock_irq(xas)	xa_unlock_irq((xas)->xa)
#define xas_lock_irqsave(xas, flags) \
				xa_lock_irqsave((xas)->xa, flags)
#define xas_unlock_irqrestore(xas, flags) \
				xa_unlock_irqrestore((xas)->xa, flags)

/**
 * xas_error() - Return an errno stored in the xa_state.
 * @xas: XArray operation state.
 *
 * Return: 0 if no error has been noted. A negative errno if one has.
 */
static inline int xas_error(const struct xa_state *xas)
{
	return xa_err(xas->xa_node);
}

/**
 * xas_set_err() - Note an error in the xa_state.
 * @xas: XArray operation state.
 * @err: Negative error number.
 *
 * Only call this function with a negative @err; zero or positive errors
 * will probably not behave the way you think they should. If you want
 * to clear the error from an xa_state, use xas_reset().
 */
static inline void xas_set_err(struct xa_state *xas, long err)
{
	xas->xa_node = XA_ERROR(err);
}

/**
 * xas_invalid() - Is the xas in a retry or error state?
 * @xas: XArray operation state.
 *
 * Return: %true if the xas cannot be used for operations.
 */
static inline bool xas_invalid(const struct xa_state *xas)
{
	return (unsigned long)xas->xa_node & 3;
}

/**
 * xas_valid() - Is the xas a valid cursor into the array?
 * @xas: XArray operation state.
 *
 * Return: %true if the xas can be used for operations.
 */
static inline bool xas_valid(const struct xa_state *xas)
{
	return !xas_invalid(xas);
}

/**
 * xas_is_node() - Does the xas point to a node?
 * @xas: XArray operation state.
 *
 * Return: %true if the xas currently references a node.
 */
static inline bool xas_is_node(const struct xa_state *xas)
{
	return xas_valid(xas) && xas->xa_node;
}

/* True if the pointer is something other than a node */
static inline bool xas_not_node(struct xa_node *node)
{
	return ((unsigned long)node & 3) || !node;
}

/* True if the node represents RESTART or an error */
static inline bool xas_frozen(struct xa_node *node)
{
	return (unsigned long)node & 2;
}

/* True if the node represents head-of-tree, RESTART or BOUNDS */
static inline bool xas_top(struct xa_node *node)
{
	return node <= XAS_RESTART;
}

/**
 * xas_reset() - Reset an XArray operation state.
 * @xas: XArray operation state.
 *
 * Resets the error or walk state of the @xas so future walks of the
 * array will start from the root. Use this if you have dropped the
 * xarray lock and want to reuse the xa_state.
 *
 * Context: Any context.
 */
static inline void xas_reset(struct xa_state *xas)
{
	xas->xa_node = XAS_RESTART;
}

/**
 * xas_retry() - Retry the operation if appropriate.
 * @xas: XArray operation state.
 * @entry: Entry from xarray.
 *
 * The advanced functions may sometimes return an internal entry, such as
 * a retry entry or a zero entry. This function sets up the @xas to restart
 * the walk from the head of the array if needed.
 *
 * Context: Any context.
 * Return: true if the operation needs to be retried.
 */
static inline bool xas_retry(struct xa_state *xas, const void *entry)
{
	if (xa_is_zero(entry))
		return true;
	if (!xa_is_retry(entry))
		return false;
	xas_reset(xas);
	return true;
}

void *xas_load(struct xa_state *);
void *xas_store(struct xa_state *, void *entry);
void *xas_find(struct xa_state *, unsigned long max);
void *xas_find_conflict(struct xa_state *);

bool xas_get_mark(const struct xa_state *, xa_mark_t);
void xas_set_mark(const struct xa_state *, xa_mark_t);
void xas_clear_mark(const struct xa_state *, xa_mark_t);
void *xas_find_marked(struct xa_state *, unsigned long max, xa_mark_t);
void xas_init_marks(const struct xa_state *);

bool xas_nomem(struct xa_state *, gfp_t);
void xas_pause(struct xa_state *);

void xas_create_range(struct xa_state *);

/**
 * xas_reload() - Refetch an entry from the xarray.
 * @xas: XArray operation state.
 *
 * Use this function to check that a previously loaded entry still has
 * the same value. This is useful for the lockless pagecache lookup where
 * we walk the array with only the RCU lock to protect us, lock the page,
 * then check that the page hasn't moved since we looked it up.
 *
 * The caller guarantees that @xas is still valid. If it may be in an
 * error or restart state, call xas_load() instead.
 *
 * Return: The entry at this location in the xarray.
 */
static inline void *xas_reload(struct xa_state *xas)
{
	struct xa_node *node = xas->xa_node;

	if (node)
		return xa_entry(xas->xa, node, xas->xa_offset);
	return xa_head(xas->xa);
}

/**
 * xas_set() - Set up XArray operation state for a different index.
 * @xas: XArray operation state.
 * @index: New index into the XArray.
 *
 * Move the operation state to refer to a different index. This will
 * have the effect of starting a walk from the top; see xas_next()
 * to move to an adjacent index.
 */
static inline void xas_set(struct xa_state *xas, unsigned long index)
{
	xas->xa_index = index;
	xas->xa_node = XAS_RESTART;
}

/**
 * xas_set_order() - Set up XArray operation state for a multislot entry.
 * @xas: XArray operation state.
 * @index: Target of the operation.
 * @order: Entry occupies 2^@order indices.
 */
static inline void xas_set_order(struct xa_state *xas, unsigned long index,
					unsigned int order)
{
#ifdef CONFIG_XARRAY_MULTI
	xas->xa_index = order < BITS_PER_LONG ? (index >> order) << order : 0;
	xas->xa_shift = order - (order % XA_CHUNK_SHIFT);
	xas->xa_sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
	xas->xa_node = XAS_RESTART;
#else
	BUG_ON(order > 0);
	xas_set(xas, index);
#endif
}

/**
 * xas_set_update() - Set up XArray operation state for a callback.
 * @xas: XArray operation state.
 * @update: Function to call when updating a node.
 *
 * The XArray can notify a caller after it has updated an xa_node.
 * This is advanced functionality and is only needed by the page cache.
 */
static inline void xas_set_update(struct xa_state *xas, xa_update_node_t update)
{
	xas->xa_update = update;
}

/**
 * xas_next_entry() - Advance iterator to next present entry.
 * @xas: XArray operation state.
 * @max: Highest index to return.
 *
 * xas_next_entry() is an inline function to optimise xarray traversal for
 * speed. It is equivalent to calling xas_find(), and will call xas_find()
 * for all the hard cases.
 *
 * Return: The next present entry after the one currently referred to by @xas.
 */
static inline void *xas_next_entry(struct xa_state *xas, unsigned long max)
{
	struct xa_node *node = xas->xa_node;
	void *entry;

	if (unlikely(xas_not_node(node) || node->shift ||
			xas->xa_offset != (xas->xa_index & XA_CHUNK_MASK)))
		return xas_find(xas, max);

	do {
		if (unlikely(xas->xa_index >= max))
			return xas_find(xas, max);
		if (unlikely(xas->xa_offset == XA_CHUNK_MASK))
			return xas_find(xas, max);
		entry = xa_entry(xas->xa, node, xas->xa_offset + 1);
		if (unlikely(xa_is_internal(entry)))
			return xas_find(xas, max);
		xas->xa_offset++;
		xas->xa_index++;
	} while (!entry);

	return entry;
}

/* Private */
static inline unsigned int xas_find_chunk(struct xa_state *xas, bool advance,
		xa_mark_t mark)
{
	unsigned long *addr = xas->xa_node->marks[(__force unsigned)mark];
	unsigned int offset = xas->xa_offset;

	if (advance)
		offset++;
	if (XA_CHUNK_SIZE == BITS_PER_LONG) {
		if (offset < XA_CHUNK_SIZE) {
			unsigned long data = *addr & (~0UL << offset);
			if (data)
				return __ffs(data);
		}
		return XA_CHUNK_SIZE;
	}

	return find_next_bit(addr, XA_CHUNK_SIZE, offset);
}

/**
 * xas_next_marked() - Advance iterator to next marked entry.
 * @xas: XArray operation state.
 * @max: Highest index to return.
 * @mark: Mark to search for.
 *
 * xas_next_marked() is an inline function to optimise xarray traversal for
 * speed. It is equivalent to calling xas_find_marked(), and will call
 * xas_find_marked() for all the hard cases.
 *
 * Return: The next marked entry after the one currently referred to by @xas.
 */
static inline void *xas_next_marked(struct xa_state *xas, unsigned long max,
								xa_mark_t mark)
{
	struct xa_node *node = xas->xa_node;
	unsigned int offset;

	if (unlikely(xas_not_node(node) || node->shift))
		return xas_find_marked(xas, max, mark);
	offset = xas_find_chunk(xas, true, mark);
	xas->xa_offset = offset;
	xas->xa_index = (xas->xa_index & ~XA_CHUNK_MASK) + offset;
	if (xas->xa_index > max)
		return NULL;
	if (offset == XA_CHUNK_SIZE)
		return xas_find_marked(xas, max, mark);
	return xa_entry(xas->xa, node, offset);
}

/*
 * If iterating while holding a lock, drop the lock and reschedule
 * every %XA_CHECK_SCHED loops.
 */
enum {
	XA_CHECK_SCHED = 4096,
};

/**
 * xas_for_each() - Iterate over a range of an XArray.
 * @xas: XArray operation state.
 * @entry: Entry retrieved from the array.
 * @max: Maximum index to retrieve from array.
 *
 * The loop body will be executed for each entry present in the xarray
 * between the current xas position and @max. @entry will be set to
 * the entry retrieved from the xarray. It is safe to delete entries
 * from the array in the loop body. You should hold either the RCU lock
 * or the xa_lock while iterating. If you need to drop the lock, call
 * xas_pause() first.
 */
#define xas_for_each(xas, entry, max) \
	for (entry = xas_find(xas, max); entry; \
	     entry = xas_next_entry(xas, max))

/**
 * xas_for_each_marked() - Iterate over a range of an XArray.
 * @xas: XArray operation state.
 * @entry: Entry retrieved from the array.
 * @max: Maximum index to retrieve from array.
 * @mark: Mark to search for.
 *
 * The loop body will be executed for each marked entry in the xarray
 * between the current xas position and @max.
@entry will be set to 807 + * the entry retrieved from the xarray. It is safe to delete entries 808 + * from the array in the loop body. You should hold either the RCU lock 809 + * or the xa_lock while iterating. If you need to drop the lock, call 810 + * xas_pause() first. 811 + */ 812 + #define xas_for_each_marked(xas, entry, max, mark) \ 813 + for (entry = xas_find_marked(xas, max, mark); entry; \ 814 + entry = xas_next_marked(xas, max, mark)) 815 + 816 + /** 817 + * xas_for_each_conflict() - Iterate over a range of an XArray. 818 + * @xas: XArray operation state. 819 + * @entry: Entry retrieved from the array. 820 + * 821 + * The loop body will be executed for each entry in the XArray that lies 822 + * within the range specified by @xas. If the loop completes successfully, 823 + * any entries that lie in this range will be replaced by @entry. The caller 824 + * may break out of the loop; if they do so, the contents of the XArray will 825 + * be unchanged. The operation may fail due to an out of memory condition. 826 + * The caller may also call xa_set_err() to exit the loop while setting an 827 + * error to record the reason. 828 + */ 829 + #define xas_for_each_conflict(xas, entry) \ 830 + while ((entry = xas_find_conflict(xas))) 831 + 832 + void *__xas_next(struct xa_state *); 833 + void *__xas_prev(struct xa_state *); 834 + 835 + /** 836 + * xas_prev() - Move iterator to previous index. 837 + * @xas: XArray operation state. 838 + * 839 + * If the @xas was in an error state, it will remain in an error state 840 + * and this function will return %NULL. If the @xas has never been walked, 841 + * it will have the effect of calling xas_load(). Otherwise one will be 842 + * subtracted from the index and the state will be walked to the correct 843 + * location in the array for the next operation. 844 + * 845 + * If the iterator was referencing index 0, this function wraps 846 + * around to %ULONG_MAX. 847 + * 848 + * Return: The entry at the new index. 
This may be %NULL or an internal 849 + * entry. 850 + */ 851 + static inline void *xas_prev(struct xa_state *xas) 852 + { 853 + struct xa_node *node = xas->xa_node; 854 + 855 + if (unlikely(xas_not_node(node) || node->shift || 856 + xas->xa_offset == 0)) 857 + return __xas_prev(xas); 858 + 859 + xas->xa_index--; 860 + xas->xa_offset--; 861 + return xa_entry(xas->xa, node, xas->xa_offset); 862 + } 863 + 864 + /** 865 + * xas_next() - Move state to next index. 866 + * @xas: XArray operation state. 867 + * 868 + * If the @xas was in an error state, it will remain in an error state 869 + * and this function will return %NULL. If the @xas has never been walked, 870 + * it will have the effect of calling xas_load(). Otherwise one will be 871 + * added to the index and the state will be walked to the correct 872 + * location in the array for the next operation. 873 + * 874 + * If the iterator was referencing index %ULONG_MAX, this function wraps 875 + * around to 0. 876 + * 877 + * Return: The entry at the new index. This may be %NULL or an internal 878 + * entry. 879 + */ 880 + static inline void *xas_next(struct xa_state *xas) 881 + { 882 + struct xa_node *node = xas->xa_node; 883 + 884 + if (unlikely(xas_not_node(node) || node->shift || 885 + xas->xa_offset == XA_CHUNK_MASK)) 886 + return __xas_next(xas); 887 + 888 + xas->xa_index++; 889 + xas->xa_offset++; 890 + return xa_entry(xas->xa, node, xas->xa_offset); 891 + } 445 892 446 893 #endif /* _LINUX_XARRAY_H */
+15 -60
kernel/memremap.c
···
  1    1  /* SPDX-License-Identifier: GPL-2.0 */
  2    2  /* Copyright(c) 2015 Intel Corporation. All rights reserved. */
  3       - #include <linux/radix-tree.h>
  4    3  #include <linux/device.h>
  5       - #include <linux/types.h>
  6       - #include <linux/pfn_t.h>
  7    4  #include <linux/io.h>
  8    5  #include <linux/kasan.h>
  9       - #include <linux/mm.h>
 10    6  #include <linux/memory_hotplug.h>
       7  + #include <linux/mm.h>
       8  + #include <linux/pfn_t.h>
 11    9  #include <linux/swap.h>
 12   10  #include <linux/swapops.h>
      11  + #include <linux/types.h>
 13   12  #include <linux/wait_bit.h>
      13  + #include <linux/xarray.h>
 14   14
 15       - static DEFINE_MUTEX(pgmap_lock);
 16       - static RADIX_TREE(pgmap_radix, GFP_KERNEL);
      15  + static DEFINE_XARRAY(pgmap_array);
 17   16  #define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
 18   17  #define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
 19       -
 20       - static unsigned long order_at(struct resource *res, unsigned long pgoff)
 21       - {
 22       - 	unsigned long phys_pgoff = PHYS_PFN(res->start) + pgoff;
 23       - 	unsigned long nr_pages, mask;
 24       -
 25       - 	nr_pages = PHYS_PFN(resource_size(res));
 26       - 	if (nr_pages == pgoff)
 27       - 		return ULONG_MAX;
 28       -
 29       - 	/*
 30       - 	 * What is the largest aligned power-of-2 range available from
 31       - 	 * this resource pgoff to the end of the resource range,
 32       - 	 * considering the alignment of the current pgoff?
 33       - 	 */
 34       - 	mask = phys_pgoff | rounddown_pow_of_two(nr_pages - pgoff);
 35       - 	if (!mask)
 36       - 		return ULONG_MAX;
 37       -
 38       - 	return find_first_bit(&mask, BITS_PER_LONG);
 39       - }
 40       -
 41       - #define foreach_order_pgoff(res, order, pgoff) \
 42       - 	for (pgoff = 0, order = order_at((res), pgoff); order < ULONG_MAX; \
 43       - 		pgoff += 1UL << order, order = order_at((res), pgoff))
 44   18
 45   19  #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
 46   20  vm_fault_t device_private_entry_fault(struct vm_area_struct *vma,
···
 44   70  EXPORT_SYMBOL(device_private_entry_fault);
 45   71  #endif /* CONFIG_DEVICE_PRIVATE */
 46   72
 47       - static void pgmap_radix_release(struct resource *res, unsigned long end_pgoff)
      73  + static void pgmap_array_delete(struct resource *res)
 48   74  {
 49       - 	unsigned long pgoff, order;
 50       -
 51       - 	mutex_lock(&pgmap_lock);
 52       - 	foreach_order_pgoff(res, order, pgoff) {
 53       - 		if (pgoff >= end_pgoff)
 54       - 			break;
 55       - 		radix_tree_delete(&pgmap_radix, PHYS_PFN(res->start) + pgoff);
 56       - 	}
 57       - 	mutex_unlock(&pgmap_lock);
 58       -
      75  + 	xa_store_range(&pgmap_array, PHYS_PFN(res->start), PHYS_PFN(res->end),
      76  + 			NULL, GFP_KERNEL);
 59   77  	synchronize_rcu();
 60   78  }
···
108  142  	mem_hotplug_done();
109  143
110  144  	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
111       - 	pgmap_radix_release(res, -1);
     145  + 	pgmap_array_delete(res);
112  146  	dev_WARN_ONCE(dev, pgmap->altmap.alloc,
113  147  			"%s: failed to free all reserved pages\n", __func__);
114  148  }
···
143  177  	struct resource *res = &pgmap->res;
144  178  	struct dev_pagemap *conflict_pgmap;
145  179  	pgprot_t pgprot = PAGE_KERNEL;
146       - 	unsigned long pgoff, order;
147  180  	int error, nid, is_ram;
148  181
149  182  	align_start = res->start & ~(SECTION_SIZE - 1);
···
181  216
182  217  	pgmap->dev = dev;
183  218
184       - 	mutex_lock(&pgmap_lock);
185       - 	error = 0;
186       -
187       - 	foreach_order_pgoff(res, order, pgoff) {
188       - 		error = __radix_tree_insert(&pgmap_radix,
189       - 				PHYS_PFN(res->start) + pgoff, order, pgmap);
190       - 		if (error) {
191       - 			dev_err(dev, "%s: failed: %d\n", __func__, error);
192       - 			break;
193       - 		}
194       - 	}
195       - 	mutex_unlock(&pgmap_lock);
     219  + 	error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(res->start),
     220  + 			PHYS_PFN(res->end), pgmap, GFP_KERNEL));
196  221  	if (error)
197       - 		goto err_radix;
     222  + 		goto err_array;
198  223
199  224  	nid = dev_to_node(dev);
200  225  	if (nid < 0)
···
229  274   err_kasan:
230  275  	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
231  276   err_pfn_remap:
232       -  err_radix:
233       - 	pgmap_radix_release(res, pgoff);
     277  + 	pgmap_array_delete(res);
     278  +  err_array:
234  279  	return ERR_PTR(error);
235  280  }
236  281  EXPORT_SYMBOL(devm_memremap_pages);
···
270  315
271  316  	/* fall back to slow path lookup */
272  317  	rcu_read_lock();
273       - 	pgmap = radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
     318  + 	pgmap = xa_load(&pgmap_array, PHYS_PFN(phys));
274  319  	if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
275  320  		pgmap = NULL;
276  321  	rcu_read_unlock();
+4 -1
lib/Kconfig
···
 399  399
 400  400  	  for more information.
 401  401
 402       - config RADIX_TREE_MULTIORDER
      402  + config XARRAY_MULTI
 403  403  	bool
      404  + 	help
      405  + 	  Support entries which occupy multiple consecutive indices in the
      406  + 	  XArray.
 404  407
 405  408  config ASSOCIATIVE_ARRAY
 406  409  	bool
+3
lib/Kconfig.debug
···
 1813  1813  config TEST_UUID
 1814  1814  	tristate "Test functions located in the uuid module at runtime"
 1815  1815
       1816  + config TEST_XARRAY
       1817  + 	tristate "Test the XArray code at runtime"
       1818  +
 1816  1819  config TEST_OVERFLOW
 1817  1820  	tristate "Test check_*_overflow() functions at runtime"
 1818  1821
+2 -1
lib/Makefile
···
 18  18  KCOV_INSTRUMENT_dynamic_debug.o := n
 19  19
 20  20  lib-y := ctype.o string.o vsprintf.o cmdline.o \
 21      - 	 rbtree.o radix-tree.o timerqueue.o\
     21  + 	 rbtree.o radix-tree.o timerqueue.o xarray.o \
 22  22  	 idr.o int_sqrt.o extable.o \
 23  23  	 sha1.o chacha20.o irq_regs.o argv_split.o \
 24  24  	 flex_proportions.o ratelimit.o show_mem.o \
···
 68  68  obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
 69  69  obj-$(CONFIG_TEST_BITFIELD) += test_bitfield.o
 70  70  obj-$(CONFIG_TEST_UUID) += test_uuid.o
     71  + obj-$(CONFIG_TEST_XARRAY) += test_xarray.o
 71  72  obj-$(CONFIG_TEST_PARMAN) += test_parman.o
 72  73  obj-$(CONFIG_TEST_KMOD) += test_kmod.o
 73  74  obj-$(CONFIG_TEST_DEBUG_VIRTUAL) += test_debug_virtual.o
+212 -199
lib/idr.c
··· 6 6 #include <linux/spinlock.h> 7 7 #include <linux/xarray.h> 8 8 9 - DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap); 10 - 11 9 /** 12 10 * idr_alloc_u32() - Allocate an ID. 13 11 * @idr: IDR handle. ··· 37 39 unsigned int base = idr->idr_base; 38 40 unsigned int id = *nextid; 39 41 40 - if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr))) 41 - return -EINVAL; 42 - if (WARN_ON_ONCE(!(idr->idr_rt.gfp_mask & ROOT_IS_IDR))) 43 - idr->idr_rt.gfp_mask |= IDR_RT_MARKER; 42 + if (WARN_ON_ONCE(!(idr->idr_rt.xa_flags & ROOT_IS_IDR))) 43 + idr->idr_rt.xa_flags |= IDR_RT_MARKER; 44 44 45 45 id = (id < base) ? 0 : id - base; 46 46 radix_tree_iter_init(&iter, id); ··· 291 295 void __rcu **slot = NULL; 292 296 void *entry; 293 297 294 - if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr))) 295 - return ERR_PTR(-EINVAL); 296 298 id -= idr->idr_base; 297 299 298 300 entry = __radix_tree_lookup(&idr->idr_rt, id, &node, &slot); 299 301 if (!slot || radix_tree_tag_get(&idr->idr_rt, id, IDR_FREE)) 300 302 return ERR_PTR(-ENOENT); 301 303 302 - __radix_tree_replace(&idr->idr_rt, node, slot, ptr, NULL); 304 + __radix_tree_replace(&idr->idr_rt, node, slot, ptr); 303 305 304 306 return entry; 305 307 } ··· 318 324 * free the individual IDs in it. You can use ida_is_empty() to find 319 325 * out whether the IDA has any IDs currently allocated. 320 326 * 327 + * The IDA handles its own locking. It is safe to call any of the IDA 328 + * functions without synchronisation in your code. 329 + * 321 330 * IDs are currently limited to the range [0-INT_MAX]. If this is an awkward 322 331 * limitation, it should be quite straightforward to raise the maximum. 323 332 */ ··· 328 331 /* 329 332 * Developer's notes: 330 333 * 331 - * The IDA uses the functionality provided by the IDR & radix tree to store 332 - * bitmaps in each entry. The IDR_FREE tag means there is at least one bit 333 - * free, unlike the IDR where it means at least one entry is free. 
334 + * The IDA uses the functionality provided by the XArray to store bitmaps in 335 + * each entry. The XA_FREE_MARK is only cleared when all bits in the bitmap 336 + * have been set. 334 337 * 335 - * I considered telling the radix tree that each slot is an order-10 node 336 - * and storing the bit numbers in the radix tree, but the radix tree can't 337 - * allow a single multiorder entry at index 0, which would significantly 338 - * increase memory consumption for the IDA. So instead we divide the index 339 - * by the number of bits in the leaf bitmap before doing a radix tree lookup. 338 + * I considered telling the XArray that each slot is an order-10 node 339 + * and indexing by bit number, but the XArray can't allow a single multi-index 340 + * entry in the head, which would significantly increase memory consumption 341 + * for the IDA. So instead we divide the index by the number of bits in the 342 + * leaf bitmap before doing a radix tree lookup. 340 343 * 341 344 * As an optimisation, if there are only a few low bits set in any given 342 - * leaf, instead of allocating a 128-byte bitmap, we use the 'exceptional 343 - * entry' functionality of the radix tree to store BITS_PER_LONG - 2 bits 344 - * directly in the entry. By being really tricksy, we could store 345 - * BITS_PER_LONG - 1 bits, but there're diminishing returns after optimising 346 - * for 0-3 allocated IDs. 345 + * leaf, instead of allocating a 128-byte bitmap, we store the bits 346 + * as a value entry. Value entries never have the XA_FREE_MARK cleared 347 + * because we can always convert them into a bitmap entry. 347 348 * 348 - * We allow the radix tree 'exceptional' count to get out of date. Nothing 349 - * in the IDA nor the radix tree code checks it. If it becomes important 350 - * to maintain an accurate exceptional count, switch the rcu_assign_pointer() 351 - * calls to radix_tree_iter_replace() which will correct the exceptional 352 - * count. 
349 + * It would be possible to optimise further; once we've run out of a 350 + * single 128-byte bitmap, we currently switch to a 576-byte node, put 351 + * the 128-byte bitmap in the first entry and then start allocating extra 352 + * 128-byte entries. We could instead use the 512 bytes of the node's 353 + * data as a bitmap before moving to that scheme. I do not believe this 354 + * is a worthwhile optimisation; Rasmus Villemoes surveyed the current 355 + * users of the IDA and almost none of them use more than 1024 entries. 356 + * Those that do use more than the 8192 IDs that the 512 bytes would 357 + * provide. 353 358 * 354 - * The IDA always requires a lock to alloc/free. If we add a 'test_bit' 359 + * The IDA always uses a lock to alloc/free. If we add a 'test_bit' 355 360 * equivalent, it will still need locking. Going to RCU lookup would require 356 361 * using RCU to free bitmaps, and that's not trivial without embedding an 357 362 * RCU head in the bitmap, which adds a 2-pointer overhead to each 128-byte 358 363 * bitmap, which is excessive. 
359 364 */ 360 - 361 - #define IDA_MAX (0x80000000U / IDA_BITMAP_BITS - 1) 362 - 363 - static int ida_get_new_above(struct ida *ida, int start) 364 - { 365 - struct radix_tree_root *root = &ida->ida_rt; 366 - void __rcu **slot; 367 - struct radix_tree_iter iter; 368 - struct ida_bitmap *bitmap; 369 - unsigned long index; 370 - unsigned bit, ebit; 371 - int new; 372 - 373 - index = start / IDA_BITMAP_BITS; 374 - bit = start % IDA_BITMAP_BITS; 375 - ebit = bit + RADIX_TREE_EXCEPTIONAL_SHIFT; 376 - 377 - slot = radix_tree_iter_init(&iter, index); 378 - for (;;) { 379 - if (slot) 380 - slot = radix_tree_next_slot(slot, &iter, 381 - RADIX_TREE_ITER_TAGGED); 382 - if (!slot) { 383 - slot = idr_get_free(root, &iter, GFP_NOWAIT, IDA_MAX); 384 - if (IS_ERR(slot)) { 385 - if (slot == ERR_PTR(-ENOMEM)) 386 - return -EAGAIN; 387 - return PTR_ERR(slot); 388 - } 389 - } 390 - if (iter.index > index) { 391 - bit = 0; 392 - ebit = RADIX_TREE_EXCEPTIONAL_SHIFT; 393 - } 394 - new = iter.index * IDA_BITMAP_BITS; 395 - bitmap = rcu_dereference_raw(*slot); 396 - if (radix_tree_exception(bitmap)) { 397 - unsigned long tmp = (unsigned long)bitmap; 398 - ebit = find_next_zero_bit(&tmp, BITS_PER_LONG, ebit); 399 - if (ebit < BITS_PER_LONG) { 400 - tmp |= 1UL << ebit; 401 - rcu_assign_pointer(*slot, (void *)tmp); 402 - return new + ebit - 403 - RADIX_TREE_EXCEPTIONAL_SHIFT; 404 - } 405 - bitmap = this_cpu_xchg(ida_bitmap, NULL); 406 - if (!bitmap) 407 - return -EAGAIN; 408 - bitmap->bitmap[0] = tmp >> RADIX_TREE_EXCEPTIONAL_SHIFT; 409 - rcu_assign_pointer(*slot, bitmap); 410 - } 411 - 412 - if (bitmap) { 413 - bit = find_next_zero_bit(bitmap->bitmap, 414 - IDA_BITMAP_BITS, bit); 415 - new += bit; 416 - if (new < 0) 417 - return -ENOSPC; 418 - if (bit == IDA_BITMAP_BITS) 419 - continue; 420 - 421 - __set_bit(bit, bitmap->bitmap); 422 - if (bitmap_full(bitmap->bitmap, IDA_BITMAP_BITS)) 423 - radix_tree_iter_tag_clear(root, &iter, 424 - IDR_FREE); 425 - } else { 426 - new += bit; 427 - if (new 
< 0) 428 - return -ENOSPC; 429 - if (ebit < BITS_PER_LONG) { 430 - bitmap = (void *)((1UL << ebit) | 431 - RADIX_TREE_EXCEPTIONAL_ENTRY); 432 - radix_tree_iter_replace(root, &iter, slot, 433 - bitmap); 434 - return new; 435 - } 436 - bitmap = this_cpu_xchg(ida_bitmap, NULL); 437 - if (!bitmap) 438 - return -EAGAIN; 439 - __set_bit(bit, bitmap->bitmap); 440 - radix_tree_iter_replace(root, &iter, slot, bitmap); 441 - } 442 - 443 - return new; 444 - } 445 - } 446 - 447 - static void ida_remove(struct ida *ida, int id) 448 - { 449 - unsigned long index = id / IDA_BITMAP_BITS; 450 - unsigned offset = id % IDA_BITMAP_BITS; 451 - struct ida_bitmap *bitmap; 452 - unsigned long *btmp; 453 - struct radix_tree_iter iter; 454 - void __rcu **slot; 455 - 456 - slot = radix_tree_iter_lookup(&ida->ida_rt, &iter, index); 457 - if (!slot) 458 - goto err; 459 - 460 - bitmap = rcu_dereference_raw(*slot); 461 - if (radix_tree_exception(bitmap)) { 462 - btmp = (unsigned long *)slot; 463 - offset += RADIX_TREE_EXCEPTIONAL_SHIFT; 464 - if (offset >= BITS_PER_LONG) 465 - goto err; 466 - } else { 467 - btmp = bitmap->bitmap; 468 - } 469 - if (!test_bit(offset, btmp)) 470 - goto err; 471 - 472 - __clear_bit(offset, btmp); 473 - radix_tree_iter_tag_set(&ida->ida_rt, &iter, IDR_FREE); 474 - if (radix_tree_exception(bitmap)) { 475 - if (rcu_dereference_raw(*slot) == 476 - (void *)RADIX_TREE_EXCEPTIONAL_ENTRY) 477 - radix_tree_iter_delete(&ida->ida_rt, &iter, slot); 478 - } else if (bitmap_empty(btmp, IDA_BITMAP_BITS)) { 479 - kfree(bitmap); 480 - radix_tree_iter_delete(&ida->ida_rt, &iter, slot); 481 - } 482 - return; 483 - err: 484 - WARN(1, "ida_free called for id=%d which is not allocated.\n", id); 485 - } 486 - 487 - /** 488 - * ida_destroy() - Free all IDs. 489 - * @ida: IDA handle. 490 - * 491 - * Calling this function frees all IDs and releases all resources used 492 - * by an IDA. When this call returns, the IDA is empty and can be reused 493 - * or freed. 
If the IDA is already empty, there is no need to call this 494 - * function. 495 - * 496 - * Context: Any context. 497 - */ 498 - void ida_destroy(struct ida *ida) 499 - { 500 - unsigned long flags; 501 - struct radix_tree_iter iter; 502 - void __rcu **slot; 503 - 504 - xa_lock_irqsave(&ida->ida_rt, flags); 505 - radix_tree_for_each_slot(slot, &ida->ida_rt, &iter, 0) { 506 - struct ida_bitmap *bitmap = rcu_dereference_raw(*slot); 507 - if (!radix_tree_exception(bitmap)) 508 - kfree(bitmap); 509 - radix_tree_iter_delete(&ida->ida_rt, &iter, slot); 510 - } 511 - xa_unlock_irqrestore(&ida->ida_rt, flags); 512 - } 513 - EXPORT_SYMBOL(ida_destroy); 514 365 515 366 /** 516 367 * ida_alloc_range() - Allocate an unused ID. ··· 377 532 int ida_alloc_range(struct ida *ida, unsigned int min, unsigned int max, 378 533 gfp_t gfp) 379 534 { 380 - int id = 0; 535 + XA_STATE(xas, &ida->xa, min / IDA_BITMAP_BITS); 536 + unsigned bit = min % IDA_BITMAP_BITS; 381 537 unsigned long flags; 538 + struct ida_bitmap *bitmap, *alloc = NULL; 382 539 383 540 if ((int)min < 0) 384 541 return -ENOSPC; ··· 388 541 if ((int)max < 0) 389 542 max = INT_MAX; 390 543 391 - again: 392 - xa_lock_irqsave(&ida->ida_rt, flags); 393 - id = ida_get_new_above(ida, min); 394 - if (id > (int)max) { 395 - ida_remove(ida, id); 396 - id = -ENOSPC; 397 - } 398 - xa_unlock_irqrestore(&ida->ida_rt, flags); 544 + retry: 545 + xas_lock_irqsave(&xas, flags); 546 + next: 547 + bitmap = xas_find_marked(&xas, max / IDA_BITMAP_BITS, XA_FREE_MARK); 548 + if (xas.xa_index > min / IDA_BITMAP_BITS) 549 + bit = 0; 550 + if (xas.xa_index * IDA_BITMAP_BITS + bit > max) 551 + goto nospc; 399 552 400 - if (unlikely(id == -EAGAIN)) { 401 - if (!ida_pre_get(ida, gfp)) 402 - return -ENOMEM; 403 - goto again; 553 + if (xa_is_value(bitmap)) { 554 + unsigned long tmp = xa_to_value(bitmap); 555 + 556 + if (bit < BITS_PER_XA_VALUE) { 557 + bit = find_next_zero_bit(&tmp, BITS_PER_XA_VALUE, bit); 558 + if (xas.xa_index * IDA_BITMAP_BITS + 
bit > max) 559 + goto nospc; 560 + if (bit < BITS_PER_XA_VALUE) { 561 + tmp |= 1UL << bit; 562 + xas_store(&xas, xa_mk_value(tmp)); 563 + goto out; 564 + } 565 + } 566 + bitmap = alloc; 567 + if (!bitmap) 568 + bitmap = kzalloc(sizeof(*bitmap), GFP_NOWAIT); 569 + if (!bitmap) 570 + goto alloc; 571 + bitmap->bitmap[0] = tmp; 572 + xas_store(&xas, bitmap); 573 + if (xas_error(&xas)) { 574 + bitmap->bitmap[0] = 0; 575 + goto out; 576 + } 404 577 } 405 578 406 - return id; 579 + if (bitmap) { 580 + bit = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, bit); 581 + if (xas.xa_index * IDA_BITMAP_BITS + bit > max) 582 + goto nospc; 583 + if (bit == IDA_BITMAP_BITS) 584 + goto next; 585 + 586 + __set_bit(bit, bitmap->bitmap); 587 + if (bitmap_full(bitmap->bitmap, IDA_BITMAP_BITS)) 588 + xas_clear_mark(&xas, XA_FREE_MARK); 589 + } else { 590 + if (bit < BITS_PER_XA_VALUE) { 591 + bitmap = xa_mk_value(1UL << bit); 592 + } else { 593 + bitmap = alloc; 594 + if (!bitmap) 595 + bitmap = kzalloc(sizeof(*bitmap), GFP_NOWAIT); 596 + if (!bitmap) 597 + goto alloc; 598 + __set_bit(bit, bitmap->bitmap); 599 + } 600 + xas_store(&xas, bitmap); 601 + } 602 + out: 603 + xas_unlock_irqrestore(&xas, flags); 604 + if (xas_nomem(&xas, gfp)) { 605 + xas.xa_index = min / IDA_BITMAP_BITS; 606 + bit = min % IDA_BITMAP_BITS; 607 + goto retry; 608 + } 609 + if (bitmap != alloc) 610 + kfree(alloc); 611 + if (xas_error(&xas)) 612 + return xas_error(&xas); 613 + return xas.xa_index * IDA_BITMAP_BITS + bit; 614 + alloc: 615 + xas_unlock_irqrestore(&xas, flags); 616 + alloc = kzalloc(sizeof(*bitmap), gfp); 617 + if (!alloc) 618 + return -ENOMEM; 619 + xas_set(&xas, min / IDA_BITMAP_BITS); 620 + bit = min % IDA_BITMAP_BITS; 621 + goto retry; 622 + nospc: 623 + xas_unlock_irqrestore(&xas, flags); 624 + return -ENOSPC; 407 625 } 408 626 EXPORT_SYMBOL(ida_alloc_range); 409 627 ··· 481 569 */ 482 570 void ida_free(struct ida *ida, unsigned int id) 483 571 { 572 + XA_STATE(xas, &ida->xa, id / 
IDA_BITMAP_BITS); 573 + unsigned bit = id % IDA_BITMAP_BITS; 574 + struct ida_bitmap *bitmap; 484 575 unsigned long flags; 485 576 486 577 BUG_ON((int)id < 0); 487 - xa_lock_irqsave(&ida->ida_rt, flags); 488 - ida_remove(ida, id); 489 - xa_unlock_irqrestore(&ida->ida_rt, flags); 578 + 579 + xas_lock_irqsave(&xas, flags); 580 + bitmap = xas_load(&xas); 581 + 582 + if (xa_is_value(bitmap)) { 583 + unsigned long v = xa_to_value(bitmap); 584 + if (bit >= BITS_PER_XA_VALUE) 585 + goto err; 586 + if (!(v & (1UL << bit))) 587 + goto err; 588 + v &= ~(1UL << bit); 589 + if (!v) 590 + goto delete; 591 + xas_store(&xas, xa_mk_value(v)); 592 + } else { 593 + if (!test_bit(bit, bitmap->bitmap)) 594 + goto err; 595 + __clear_bit(bit, bitmap->bitmap); 596 + xas_set_mark(&xas, XA_FREE_MARK); 597 + if (bitmap_empty(bitmap->bitmap, IDA_BITMAP_BITS)) { 598 + kfree(bitmap); 599 + delete: 600 + xas_store(&xas, NULL); 601 + } 602 + } 603 + xas_unlock_irqrestore(&xas, flags); 604 + return; 605 + err: 606 + xas_unlock_irqrestore(&xas, flags); 607 + WARN(1, "ida_free called for id=%d which is not allocated.\n", id); 490 608 } 491 609 EXPORT_SYMBOL(ida_free); 610 + 611 + /** 612 + * ida_destroy() - Free all IDs. 613 + * @ida: IDA handle. 614 + * 615 + * Calling this function frees all IDs and releases all resources used 616 + * by an IDA. When this call returns, the IDA is empty and can be reused 617 + * or freed. If the IDA is already empty, there is no need to call this 618 + * function. 619 + * 620 + * Context: Any context. 
621 + */ 622 + void ida_destroy(struct ida *ida) 623 + { 624 + XA_STATE(xas, &ida->xa, 0); 625 + struct ida_bitmap *bitmap; 626 + unsigned long flags; 627 + 628 + xas_lock_irqsave(&xas, flags); 629 + xas_for_each(&xas, bitmap, ULONG_MAX) { 630 + if (!xa_is_value(bitmap)) 631 + kfree(bitmap); 632 + xas_store(&xas, NULL); 633 + } 634 + xas_unlock_irqrestore(&xas, flags); 635 + } 636 + EXPORT_SYMBOL(ida_destroy); 637 + 638 + #ifndef __KERNEL__ 639 + extern void xa_dump_index(unsigned long index, unsigned int shift); 640 + #define IDA_CHUNK_SHIFT ilog2(IDA_BITMAP_BITS) 641 + 642 + static void ida_dump_entry(void *entry, unsigned long index) 643 + { 644 + unsigned long i; 645 + 646 + if (!entry) 647 + return; 648 + 649 + if (xa_is_node(entry)) { 650 + struct xa_node *node = xa_to_node(entry); 651 + unsigned int shift = node->shift + IDA_CHUNK_SHIFT + 652 + XA_CHUNK_SHIFT; 653 + 654 + xa_dump_index(index * IDA_BITMAP_BITS, shift); 655 + xa_dump_node(node); 656 + for (i = 0; i < XA_CHUNK_SIZE; i++) 657 + ida_dump_entry(node->slots[i], 658 + index | (i << node->shift)); 659 + } else if (xa_is_value(entry)) { 660 + xa_dump_index(index * IDA_BITMAP_BITS, ilog2(BITS_PER_LONG)); 661 + pr_cont("value: data %lx [%px]\n", xa_to_value(entry), entry); 662 + } else { 663 + struct ida_bitmap *bitmap = entry; 664 + 665 + xa_dump_index(index * IDA_BITMAP_BITS, IDA_CHUNK_SHIFT); 666 + pr_cont("bitmap: %p data", bitmap); 667 + for (i = 0; i < IDA_BITMAP_LONGS; i++) 668 + pr_cont(" %lx", bitmap->bitmap[i]); 669 + pr_cont("\n"); 670 + } 671 + } 672 + 673 + static void ida_dump(struct ida *ida) 674 + { 675 + struct xarray *xa = &ida->xa; 676 + pr_debug("ida: %p node %p free %d\n", ida, xa->xa_head, 677 + xa->xa_flags >> ROOT_TAG_SHIFT); 678 + ida_dump_entry(xa->xa_head, 0); 679 + } 680 + #endif
+88 -746
lib/radix-tree.c
··· 38 38 #include <linux/rcupdate.h> 39 39 #include <linux/slab.h> 40 40 #include <linux/string.h> 41 + #include <linux/xarray.h> 41 42 42 - 43 - /* Number of nodes in fully populated tree of given height */ 44 - static unsigned long height_to_maxnodes[RADIX_TREE_MAX_PATH + 1] __read_mostly; 45 43 46 44 /* 47 45 * Radix tree node cache. 48 46 */ 49 - static struct kmem_cache *radix_tree_node_cachep; 47 + struct kmem_cache *radix_tree_node_cachep; 50 48 51 49 /* 52 50 * The radix tree is variable-height, so an insert operation not only has ··· 96 98 return (void *)((unsigned long)ptr | RADIX_TREE_INTERNAL_NODE); 97 99 } 98 100 99 - #define RADIX_TREE_RETRY node_to_entry(NULL) 100 - 101 - #ifdef CONFIG_RADIX_TREE_MULTIORDER 102 - /* Sibling slots point directly to another slot in the same node */ 103 - static inline 104 - bool is_sibling_entry(const struct radix_tree_node *parent, void *node) 105 - { 106 - void __rcu **ptr = node; 107 - return (parent->slots <= ptr) && 108 - (ptr < parent->slots + RADIX_TREE_MAP_SIZE); 109 - } 110 - #else 111 - static inline 112 - bool is_sibling_entry(const struct radix_tree_node *parent, void *node) 113 - { 114 - return false; 115 - } 116 - #endif 101 + #define RADIX_TREE_RETRY XA_RETRY_ENTRY 117 102 118 103 static inline unsigned long 119 104 get_slot_offset(const struct radix_tree_node *parent, void __rcu **slot) ··· 110 129 unsigned int offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK; 111 130 void __rcu **entry = rcu_dereference_raw(parent->slots[offset]); 112 131 113 - #ifdef CONFIG_RADIX_TREE_MULTIORDER 114 - if (radix_tree_is_internal_node(entry)) { 115 - if (is_sibling_entry(parent, entry)) { 116 - void __rcu **sibentry; 117 - sibentry = (void __rcu **) entry_to_node(entry); 118 - offset = get_slot_offset(parent, sibentry); 119 - entry = rcu_dereference_raw(*sibentry); 120 - } 121 - } 122 - #endif 123 - 124 132 *nodep = (void *)entry; 125 133 return offset; 126 134 } 127 135 128 136 static inline gfp_t 
root_gfp_mask(const struct radix_tree_root *root) 129 137 { 130 - return root->gfp_mask & (__GFP_BITS_MASK & ~GFP_ZONEMASK); 138 + return root->xa_flags & (__GFP_BITS_MASK & ~GFP_ZONEMASK); 131 139 } 132 140 133 141 static inline void tag_set(struct radix_tree_node *node, unsigned int tag, ··· 139 169 140 170 static inline void root_tag_set(struct radix_tree_root *root, unsigned tag) 141 171 { 142 - root->gfp_mask |= (__force gfp_t)(1 << (tag + ROOT_TAG_SHIFT)); 172 + root->xa_flags |= (__force gfp_t)(1 << (tag + ROOT_TAG_SHIFT)); 143 173 } 144 174 145 175 static inline void root_tag_clear(struct radix_tree_root *root, unsigned tag) 146 176 { 147 - root->gfp_mask &= (__force gfp_t)~(1 << (tag + ROOT_TAG_SHIFT)); 177 + root->xa_flags &= (__force gfp_t)~(1 << (tag + ROOT_TAG_SHIFT)); 148 178 } 149 179 150 180 static inline void root_tag_clear_all(struct radix_tree_root *root) 151 181 { 152 - root->gfp_mask &= (1 << ROOT_TAG_SHIFT) - 1; 182 + root->xa_flags &= (__force gfp_t)((1 << ROOT_TAG_SHIFT) - 1); 153 183 } 154 184 155 185 static inline int root_tag_get(const struct radix_tree_root *root, unsigned tag) 156 186 { 157 - return (__force int)root->gfp_mask & (1 << (tag + ROOT_TAG_SHIFT)); 187 + return (__force int)root->xa_flags & (1 << (tag + ROOT_TAG_SHIFT)); 158 188 } 159 189 160 190 static inline unsigned root_tags_get(const struct radix_tree_root *root) 161 191 { 162 - return (__force unsigned)root->gfp_mask >> ROOT_TAG_SHIFT; 192 + return (__force unsigned)root->xa_flags >> ROOT_TAG_SHIFT; 163 193 } 164 194 165 195 static inline bool is_idr(const struct radix_tree_root *root) 166 196 { 167 - return !!(root->gfp_mask & ROOT_IS_IDR); 197 + return !!(root->xa_flags & ROOT_IS_IDR); 168 198 } 169 199 170 200 /* ··· 224 254 225 255 static unsigned int iter_offset(const struct radix_tree_iter *iter) 226 256 { 227 - return (iter->index >> iter_shift(iter)) & RADIX_TREE_MAP_MASK; 257 + return iter->index & RADIX_TREE_MAP_MASK; 228 258 } 229 259 230 260 /* ··· 247 277 
return (index & ~node_maxindex(node)) + (offset << node->shift); 248 278 } 249 279 250 - #ifndef __KERNEL__ 251 - static void dump_node(struct radix_tree_node *node, unsigned long index) 252 - { 253 - unsigned long i; 254 - 255 - pr_debug("radix node: %p offset %d indices %lu-%lu parent %p tags %lx %lx %lx shift %d count %d exceptional %d\n", 256 - node, node->offset, index, index | node_maxindex(node), 257 - node->parent, 258 - node->tags[0][0], node->tags[1][0], node->tags[2][0], 259 - node->shift, node->count, node->exceptional); 260 - 261 - for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) { 262 - unsigned long first = index | (i << node->shift); 263 - unsigned long last = first | ((1UL << node->shift) - 1); 264 - void *entry = node->slots[i]; 265 - if (!entry) 266 - continue; 267 - if (entry == RADIX_TREE_RETRY) { 268 - pr_debug("radix retry offset %ld indices %lu-%lu parent %p\n", 269 - i, first, last, node); 270 - } else if (!radix_tree_is_internal_node(entry)) { 271 - pr_debug("radix entry %p offset %ld indices %lu-%lu parent %p\n", 272 - entry, i, first, last, node); 273 - } else if (is_sibling_entry(node, entry)) { 274 - pr_debug("radix sblng %p offset %ld indices %lu-%lu parent %p val %p\n", 275 - entry, i, first, last, node, 276 - *(void **)entry_to_node(entry)); 277 - } else { 278 - dump_node(entry_to_node(entry), first); 279 - } 280 - } 281 - } 282 - 283 - /* For debug */ 284 - static void radix_tree_dump(struct radix_tree_root *root) 285 - { 286 - pr_debug("radix root: %p rnode %p tags %x\n", 287 - root, root->rnode, 288 - root->gfp_mask >> ROOT_TAG_SHIFT); 289 - if (!radix_tree_is_internal_node(root->rnode)) 290 - return; 291 - dump_node(entry_to_node(root->rnode), 0); 292 - } 293 - 294 - static void dump_ida_node(void *entry, unsigned long index) 295 - { 296 - unsigned long i; 297 - 298 - if (!entry) 299 - return; 300 - 301 - if (radix_tree_is_internal_node(entry)) { 302 - struct radix_tree_node *node = entry_to_node(entry); 303 - 304 - pr_debug("ida node: 
%p offset %d indices %lu-%lu parent %p free %lx shift %d count %d\n", 305 - node, node->offset, index * IDA_BITMAP_BITS, 306 - ((index | node_maxindex(node)) + 1) * 307 - IDA_BITMAP_BITS - 1, 308 - node->parent, node->tags[0][0], node->shift, 309 - node->count); 310 - for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) 311 - dump_ida_node(node->slots[i], 312 - index | (i << node->shift)); 313 - } else if (radix_tree_exceptional_entry(entry)) { 314 - pr_debug("ida excp: %p offset %d indices %lu-%lu data %lx\n", 315 - entry, (int)(index & RADIX_TREE_MAP_MASK), 316 - index * IDA_BITMAP_BITS, 317 - index * IDA_BITMAP_BITS + BITS_PER_LONG - 318 - RADIX_TREE_EXCEPTIONAL_SHIFT, 319 - (unsigned long)entry >> 320 - RADIX_TREE_EXCEPTIONAL_SHIFT); 321 - } else { 322 - struct ida_bitmap *bitmap = entry; 323 - 324 - pr_debug("ida btmp: %p offset %d indices %lu-%lu data", bitmap, 325 - (int)(index & RADIX_TREE_MAP_MASK), 326 - index * IDA_BITMAP_BITS, 327 - (index + 1) * IDA_BITMAP_BITS - 1); 328 - for (i = 0; i < IDA_BITMAP_LONGS; i++) 329 - pr_cont(" %lx", bitmap->bitmap[i]); 330 - pr_cont("\n"); 331 - } 332 - } 333 - 334 - static void ida_dump(struct ida *ida) 335 - { 336 - struct radix_tree_root *root = &ida->ida_rt; 337 - pr_debug("ida: %p node %p free %d\n", ida, root->rnode, 338 - root->gfp_mask >> ROOT_TAG_SHIFT); 339 - dump_ida_node(root->rnode, 0); 340 - } 341 - #endif 342 - 343 280 /* 344 281 * This assumes that the caller has performed appropriate preallocation, and 345 282 * that the caller has pinned this thread of control to the current CPU. 
··· 255 378 radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent, 256 379 struct radix_tree_root *root, 257 380 unsigned int shift, unsigned int offset, 258 - unsigned int count, unsigned int exceptional) 381 + unsigned int count, unsigned int nr_values) 259 382 { 260 383 struct radix_tree_node *ret = NULL; 261 384 ··· 302 425 ret->shift = shift; 303 426 ret->offset = offset; 304 427 ret->count = count; 305 - ret->exceptional = exceptional; 428 + ret->nr_values = nr_values; 306 429 ret->parent = parent; 307 - ret->root = root; 430 + ret->array = root; 308 431 } 309 432 return ret; 310 433 } 311 434 312 - static void radix_tree_node_rcu_free(struct rcu_head *head) 435 + void radix_tree_node_rcu_free(struct rcu_head *head) 313 436 { 314 437 struct radix_tree_node *node = 315 438 container_of(head, struct radix_tree_node, rcu_head); ··· 407 530 } 408 531 EXPORT_SYMBOL(radix_tree_maybe_preload); 409 532 410 - #ifdef CONFIG_RADIX_TREE_MULTIORDER 411 - /* 412 - * Preload with enough objects to ensure that we can split a single entry 413 - * of order @old_order into many entries of size @new_order 414 - */ 415 - int radix_tree_split_preload(unsigned int old_order, unsigned int new_order, 416 - gfp_t gfp_mask) 417 - { 418 - unsigned top = 1 << (old_order % RADIX_TREE_MAP_SHIFT); 419 - unsigned layers = (old_order / RADIX_TREE_MAP_SHIFT) - 420 - (new_order / RADIX_TREE_MAP_SHIFT); 421 - unsigned nr = 0; 422 - 423 - WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask)); 424 - BUG_ON(new_order >= old_order); 425 - 426 - while (layers--) 427 - nr = nr * RADIX_TREE_MAP_SIZE + 1; 428 - return __radix_tree_preload(gfp_mask, top * nr); 429 - } 430 - #endif 431 - 432 - /* 433 - * The same as function above, but preload number of nodes required to insert 434 - * (1 << order) continuous naturally-aligned elements. 
435 - */ 436 - int radix_tree_maybe_preload_order(gfp_t gfp_mask, int order) 437 - { 438 - unsigned long nr_subtrees; 439 - int nr_nodes, subtree_height; 440 - 441 - /* Preloading doesn't help anything with this gfp mask, skip it */ 442 - if (!gfpflags_allow_blocking(gfp_mask)) { 443 - preempt_disable(); 444 - return 0; 445 - } 446 - 447 - /* 448 - * Calculate number and height of fully populated subtrees it takes to 449 - * store (1 << order) elements. 450 - */ 451 - nr_subtrees = 1 << order; 452 - for (subtree_height = 0; nr_subtrees > RADIX_TREE_MAP_SIZE; 453 - subtree_height++) 454 - nr_subtrees >>= RADIX_TREE_MAP_SHIFT; 455 - 456 - /* 457 - * The worst case is zero height tree with a single item at index 0 and 458 - * then inserting items starting at ULONG_MAX - (1 << order). 459 - * 460 - * This requires RADIX_TREE_MAX_PATH nodes to build branch from root to 461 - * 0-index item. 462 - */ 463 - nr_nodes = RADIX_TREE_MAX_PATH; 464 - 465 - /* Plus branch to fully populated subtrees. */ 466 - nr_nodes += RADIX_TREE_MAX_PATH - subtree_height; 467 - 468 - /* Root node is shared. */ 469 - nr_nodes--; 470 - 471 - /* Plus nodes required to build subtrees. 
*/ 472 - nr_nodes += nr_subtrees * height_to_maxnodes[subtree_height]; 473 - 474 - return __radix_tree_preload(gfp_mask, nr_nodes); 475 - } 476 - 477 533 static unsigned radix_tree_load_root(const struct radix_tree_root *root, 478 534 struct radix_tree_node **nodep, unsigned long *maxindex) 479 535 { 480 - struct radix_tree_node *node = rcu_dereference_raw(root->rnode); 536 + struct radix_tree_node *node = rcu_dereference_raw(root->xa_head); 481 537 482 538 *nodep = node; 483 539 ··· 439 629 while (index > shift_maxindex(maxshift)) 440 630 maxshift += RADIX_TREE_MAP_SHIFT; 441 631 442 - entry = rcu_dereference_raw(root->rnode); 632 + entry = rcu_dereference_raw(root->xa_head); 443 633 if (!entry && (!is_idr(root) || root_tag_get(root, IDR_FREE))) 444 634 goto out; 445 635 ··· 466 656 BUG_ON(shift > BITS_PER_LONG); 467 657 if (radix_tree_is_internal_node(entry)) { 468 658 entry_to_node(entry)->parent = node; 469 - } else if (radix_tree_exceptional_entry(entry)) { 470 - /* Moving an exceptional root->rnode to a node */ 471 - node->exceptional = 1; 659 + } else if (xa_is_value(entry)) { 660 + /* Moving a value entry root->xa_head to a node */ 661 + node->nr_values = 1; 472 662 } 473 663 /* 474 664 * entry was already in the radix tree, so we do not need ··· 476 666 */ 477 667 node->slots[0] = (void __rcu *)entry; 478 668 entry = node_to_entry(node); 479 - rcu_assign_pointer(root->rnode, entry); 669 + rcu_assign_pointer(root->xa_head, entry); 480 670 shift += RADIX_TREE_MAP_SHIFT; 481 671 } while (shift <= maxshift); 482 672 out: ··· 487 677 * radix_tree_shrink - shrink radix tree to minimum height 488 678 * @root radix tree root 489 679 */ 490 - static inline bool radix_tree_shrink(struct radix_tree_root *root, 491 - radix_tree_update_node_t update_node) 680 + static inline bool radix_tree_shrink(struct radix_tree_root *root) 492 681 { 493 682 bool shrunk = false; 494 683 495 684 for (;;) { 496 - struct radix_tree_node *node = rcu_dereference_raw(root->rnode); 685 + 
struct radix_tree_node *node = rcu_dereference_raw(root->xa_head); 497 686 struct radix_tree_node *child; 498 687 499 688 if (!radix_tree_is_internal_node(node)) ··· 501 692 502 693 /* 503 694 * The candidate node has more than one child, or its child 504 - * is not at the leftmost slot, or the child is a multiorder 505 - * entry, we cannot shrink. 695 + * is not at the leftmost slot, we cannot shrink. 506 696 */ 507 697 if (node->count != 1) 508 698 break; 509 699 child = rcu_dereference_raw(node->slots[0]); 510 700 if (!child) 511 701 break; 512 - if (!radix_tree_is_internal_node(child) && node->shift) 702 + 703 + /* 704 + * For an IDR, we must not shrink entry 0 into the root in 705 + * case somebody calls idr_replace() with a pointer that 706 + * appears to be an internal entry 707 + */ 708 + if (!node->shift && is_idr(root)) 513 709 break; 514 710 515 711 if (radix_tree_is_internal_node(child)) ··· 525 711 * moving the node from one part of the tree to another: if it 526 712 * was safe to dereference the old pointer to it 527 713 * (node->slots[0]), it will be safe to dereference the new 528 - * one (root->rnode) as far as dependent read barriers go. 714 + * one (root->xa_head) as far as dependent read barriers go. 
529 715 */ 530 - root->rnode = (void __rcu *)child; 716 + root->xa_head = (void __rcu *)child; 531 717 if (is_idr(root) && !tag_get(node, IDR_FREE, 0)) 532 718 root_tag_clear(root, IDR_FREE); 533 719 ··· 552 738 node->count = 0; 553 739 if (!radix_tree_is_internal_node(child)) { 554 740 node->slots[0] = (void __rcu *)RADIX_TREE_RETRY; 555 - if (update_node) 556 - update_node(node); 557 741 } 558 742 559 743 WARN_ON_ONCE(!list_empty(&node->private_list)); ··· 563 751 } 564 752 565 753 static bool delete_node(struct radix_tree_root *root, 566 - struct radix_tree_node *node, 567 - radix_tree_update_node_t update_node) 754 + struct radix_tree_node *node) 568 755 { 569 756 bool deleted = false; 570 757 ··· 572 761 573 762 if (node->count) { 574 763 if (node_to_entry(node) == 575 - rcu_dereference_raw(root->rnode)) 576 - deleted |= radix_tree_shrink(root, 577 - update_node); 764 + rcu_dereference_raw(root->xa_head)) 765 + deleted |= radix_tree_shrink(root); 578 766 return deleted; 579 767 } 580 768 ··· 588 778 */ 589 779 if (!is_idr(root)) 590 780 root_tag_clear_all(root); 591 - root->rnode = NULL; 781 + root->xa_head = NULL; 592 782 } 593 783 594 784 WARN_ON_ONCE(!list_empty(&node->private_list)); ··· 605 795 * __radix_tree_create - create a slot in a radix tree 606 796 * @root: radix tree root 607 797 * @index: index key 608 - * @order: index occupies 2^order aligned slots 609 798 * @nodep: returns node 610 799 * @slotp: returns slot 611 800 * ··· 612 803 * at position @index in the radix tree @root. 613 804 * 614 805 * Until there is more than one item in the tree, no nodes are 615 - * allocated and @root->rnode is used as a direct slot instead of 806 + * allocated and @root->xa_head is used as a direct slot instead of 616 807 * pointing to a node, in which case *@nodep will be NULL. 617 808 * 618 809 * Returns -ENOMEM, or 0 for success. 
619 810 */ 620 - int __radix_tree_create(struct radix_tree_root *root, unsigned long index, 621 - unsigned order, struct radix_tree_node **nodep, 622 - void __rcu ***slotp) 811 + static int __radix_tree_create(struct radix_tree_root *root, 812 + unsigned long index, struct radix_tree_node **nodep, 813 + void __rcu ***slotp) 623 814 { 624 815 struct radix_tree_node *node = NULL, *child; 625 - void __rcu **slot = (void __rcu **)&root->rnode; 816 + void __rcu **slot = (void __rcu **)&root->xa_head; 626 817 unsigned long maxindex; 627 818 unsigned int shift, offset = 0; 628 - unsigned long max = index | ((1UL << order) - 1); 819 + unsigned long max = index; 629 820 gfp_t gfp = root_gfp_mask(root); 630 821 631 822 shift = radix_tree_load_root(root, &child, &maxindex); 632 823 633 824 /* Make sure the tree is high enough. */ 634 - if (order > 0 && max == ((1UL << order) - 1)) 635 - max++; 636 825 if (max > maxindex) { 637 826 int error = radix_tree_extend(root, gfp, max, shift); 638 827 if (error < 0) 639 828 return error; 640 829 shift = error; 641 - child = rcu_dereference_raw(root->rnode); 830 + child = rcu_dereference_raw(root->xa_head); 642 831 } 643 832 644 - while (shift > order) { 833 + while (shift > 0) { 645 834 shift -= RADIX_TREE_MAP_SHIFT; 646 835 if (child == NULL) { 647 836 /* Have to add a child node. 
*/ ··· 682 875 683 876 for (;;) { 684 877 void *entry = rcu_dereference_raw(child->slots[offset]); 685 - if (radix_tree_is_internal_node(entry) && 686 - !is_sibling_entry(child, entry)) { 878 + if (xa_is_node(entry) && child->shift) { 687 879 child = entry_to_node(entry); 688 880 offset = 0; 689 881 continue; ··· 700 894 } 701 895 } 702 896 703 - #ifdef CONFIG_RADIX_TREE_MULTIORDER 704 897 static inline int insert_entries(struct radix_tree_node *node, 705 - void __rcu **slot, void *item, unsigned order, bool replace) 706 - { 707 - struct radix_tree_node *child; 708 - unsigned i, n, tag, offset, tags = 0; 709 - 710 - if (node) { 711 - if (order > node->shift) 712 - n = 1 << (order - node->shift); 713 - else 714 - n = 1; 715 - offset = get_slot_offset(node, slot); 716 - } else { 717 - n = 1; 718 - offset = 0; 719 - } 720 - 721 - if (n > 1) { 722 - offset = offset & ~(n - 1); 723 - slot = &node->slots[offset]; 724 - } 725 - child = node_to_entry(slot); 726 - 727 - for (i = 0; i < n; i++) { 728 - if (slot[i]) { 729 - if (replace) { 730 - node->count--; 731 - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 732 - if (tag_get(node, tag, offset + i)) 733 - tags |= 1 << tag; 734 - } else 735 - return -EEXIST; 736 - } 737 - } 738 - 739 - for (i = 0; i < n; i++) { 740 - struct radix_tree_node *old = rcu_dereference_raw(slot[i]); 741 - if (i) { 742 - rcu_assign_pointer(slot[i], child); 743 - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 744 - if (tags & (1 << tag)) 745 - tag_clear(node, tag, offset + i); 746 - } else { 747 - rcu_assign_pointer(slot[i], item); 748 - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 749 - if (tags & (1 << tag)) 750 - tag_set(node, tag, offset); 751 - } 752 - if (radix_tree_is_internal_node(old) && 753 - !is_sibling_entry(node, old) && 754 - (old != RADIX_TREE_RETRY)) 755 - radix_tree_free_nodes(old); 756 - if (radix_tree_exceptional_entry(old)) 757 - node->exceptional--; 758 - } 759 - if (node) { 760 - node->count += n; 761 - if 
(radix_tree_exceptional_entry(item)) 762 - node->exceptional += n; 763 - } 764 - return n; 765 - } 766 - #else 767 - static inline int insert_entries(struct radix_tree_node *node, 768 - void __rcu **slot, void *item, unsigned order, bool replace) 898 + void __rcu **slot, void *item, bool replace) 769 899 { 770 900 if (*slot) 771 901 return -EEXIST; 772 902 rcu_assign_pointer(*slot, item); 773 903 if (node) { 774 904 node->count++; 775 - if (radix_tree_exceptional_entry(item)) 776 - node->exceptional++; 905 + if (xa_is_value(item)) 906 + node->nr_values++; 777 907 } 778 908 return 1; 779 909 } 780 - #endif 781 910 782 911 /** 783 912 * __radix_tree_insert - insert into a radix tree 784 913 * @root: radix tree root 785 914 * @index: index key 786 - * @order: key covers the 2^order indices around index 787 915 * @item: item to insert 788 916 * 789 917 * Insert an item into the radix tree at position @index. 790 918 */ 791 - int __radix_tree_insert(struct radix_tree_root *root, unsigned long index, 792 - unsigned order, void *item) 919 + int radix_tree_insert(struct radix_tree_root *root, unsigned long index, 920 + void *item) 793 921 { 794 922 struct radix_tree_node *node; 795 923 void __rcu **slot; ··· 731 991 732 992 BUG_ON(radix_tree_is_internal_node(item)); 733 993 734 - error = __radix_tree_create(root, index, order, &node, &slot); 994 + error = __radix_tree_create(root, index, &node, &slot); 735 995 if (error) 736 996 return error; 737 997 738 - error = insert_entries(node, slot, item, order, false); 998 + error = insert_entries(node, slot, item, false); 739 999 if (error < 0) 740 1000 return error; 741 1001 ··· 750 1010 751 1011 return 0; 752 1012 } 753 - EXPORT_SYMBOL(__radix_tree_insert); 1013 + EXPORT_SYMBOL(radix_tree_insert); 754 1014 755 1015 /** 756 1016 * __radix_tree_lookup - lookup an item in a radix tree ··· 763 1023 * tree @root. 
764 1024 * 765 1025 * Until there is more than one item in the tree, no nodes are 766 - * allocated and @root->rnode is used as a direct slot instead of 1026 + * allocated and @root->xa_head is used as a direct slot instead of 767 1027 * pointing to a node, in which case *@nodep will be NULL. 768 1028 */ 769 1029 void *__radix_tree_lookup(const struct radix_tree_root *root, ··· 776 1036 777 1037 restart: 778 1038 parent = NULL; 779 - slot = (void __rcu **)&root->rnode; 1039 + slot = (void __rcu **)&root->xa_head; 780 1040 radix_tree_load_root(root, &node, &maxindex); 781 1041 if (index > maxindex) 782 1042 return NULL; ··· 789 1049 parent = entry_to_node(node); 790 1050 offset = radix_tree_descend(parent, &node, index); 791 1051 slot = parent->slots + offset; 1052 + if (parent->shift == 0) 1053 + break; 792 1054 } 793 1055 794 1056 if (nodep) ··· 842 1100 } 843 1101 EXPORT_SYMBOL(radix_tree_lookup); 844 1102 845 - static inline void replace_sibling_entries(struct radix_tree_node *node, 846 - void __rcu **slot, int count, int exceptional) 847 - { 848 - #ifdef CONFIG_RADIX_TREE_MULTIORDER 849 - void *ptr = node_to_entry(slot); 850 - unsigned offset = get_slot_offset(node, slot) + 1; 851 - 852 - while (offset < RADIX_TREE_MAP_SIZE) { 853 - if (rcu_dereference_raw(node->slots[offset]) != ptr) 854 - break; 855 - if (count < 0) { 856 - node->slots[offset] = NULL; 857 - node->count--; 858 - } 859 - node->exceptional += exceptional; 860 - offset++; 861 - } 862 - #endif 863 - } 864 - 865 1103 static void replace_slot(void __rcu **slot, void *item, 866 - struct radix_tree_node *node, int count, int exceptional) 1104 + struct radix_tree_node *node, int count, int values) 867 1105 { 868 - if (WARN_ON_ONCE(radix_tree_is_internal_node(item))) 869 - return; 870 - 871 - if (node && (count || exceptional)) { 1106 + if (node && (count || values)) { 872 1107 node->count += count; 873 - node->exceptional += exceptional; 874 - replace_sibling_entries(node, slot, count, exceptional); 
1108 + node->nr_values += values; 875 1109 } 876 1110 877 1111 rcu_assign_pointer(*slot, item); ··· 890 1172 * @node: pointer to tree node 891 1173 * @slot: pointer to slot in @node 892 1174 * @item: new item to store in the slot. 893 - * @update_node: callback for changing leaf nodes 894 1175 * 895 1176 * For use with __radix_tree_lookup(). Caller must hold tree write locked 896 1177 * across slot lookup and replacement. 897 1178 */ 898 1179 void __radix_tree_replace(struct radix_tree_root *root, 899 1180 struct radix_tree_node *node, 900 - void __rcu **slot, void *item, 901 - radix_tree_update_node_t update_node) 1181 + void __rcu **slot, void *item) 902 1182 { 903 1183 void *old = rcu_dereference_raw(*slot); 904 - int exceptional = !!radix_tree_exceptional_entry(item) - 905 - !!radix_tree_exceptional_entry(old); 1184 + int values = !!xa_is_value(item) - !!xa_is_value(old); 906 1185 int count = calculate_count(root, node, slot, item, old); 907 1186 908 1187 /* 909 - * This function supports replacing exceptional entries and 1188 + * This function supports replacing value entries and 910 1189 * deleting entries, but that needs accounting against the 911 - * node unless the slot is root->rnode. 1190 + * node unless the slot is root->xa_head. 912 1191 */ 913 - WARN_ON_ONCE(!node && (slot != (void __rcu **)&root->rnode) && 914 - (count || exceptional)); 915 - replace_slot(slot, item, node, count, exceptional); 1192 + WARN_ON_ONCE(!node && (slot != (void __rcu **)&root->xa_head) && 1193 + (count || values)); 1194 + replace_slot(slot, item, node, count, values); 916 1195 917 1196 if (!node) 918 1197 return; 919 1198 920 - if (update_node) 921 - update_node(node); 922 - 923 - delete_node(root, node, update_node); 1199 + delete_node(root, node); 924 1200 } 925 1201 926 1202 /** ··· 923 1211 * @slot: pointer to slot 924 1212 * @item: new item to store in the slot. 
925 1213 * 926 - * For use with radix_tree_lookup_slot(), radix_tree_gang_lookup_slot(), 1214 + * For use with radix_tree_lookup_slot() and 927 1215 * radix_tree_gang_lookup_tag_slot(). Caller must hold tree write locked 928 1216 * across slot lookup and replacement. 929 1217 * 930 1218 * NOTE: This cannot be used to switch between non-entries (empty slots), 931 - * regular entries, and exceptional entries, as that requires accounting 1219 + * regular entries, and value entries, as that requires accounting 932 1220 * inside the radix tree node. When switching from one type of entry or 933 1221 * deleting, use __radix_tree_lookup() and __radix_tree_replace() or 934 1222 * radix_tree_iter_replace(). ··· 936 1224 void radix_tree_replace_slot(struct radix_tree_root *root, 937 1225 void __rcu **slot, void *item) 938 1226 { 939 - __radix_tree_replace(root, NULL, slot, item, NULL); 1227 + __radix_tree_replace(root, NULL, slot, item); 940 1228 } 941 1229 EXPORT_SYMBOL(radix_tree_replace_slot); 942 1230 ··· 946 1234 * @slot: pointer to slot 947 1235 * @item: new item to store in the slot. 948 1236 * 949 - * For use with radix_tree_split() and radix_tree_for_each_slot(). 950 - * Caller must hold tree write locked across split and replacement. 1237 + * For use with radix_tree_for_each_slot(). 1238 + * Caller must hold tree write locked. 
951 1239 */ 952 1240 void radix_tree_iter_replace(struct radix_tree_root *root, 953 1241 const struct radix_tree_iter *iter, 954 1242 void __rcu **slot, void *item) 955 1243 { 956 - __radix_tree_replace(root, iter->node, slot, item, NULL); 1244 + __radix_tree_replace(root, iter->node, slot, item); 957 1245 } 958 - 959 - #ifdef CONFIG_RADIX_TREE_MULTIORDER 960 - /** 961 - * radix_tree_join - replace multiple entries with one multiorder entry 962 - * @root: radix tree root 963 - * @index: an index inside the new entry 964 - * @order: order of the new entry 965 - * @item: new entry 966 - * 967 - * Call this function to replace several entries with one larger entry. 968 - * The existing entries are presumed to not need freeing as a result of 969 - * this call. 970 - * 971 - * The replacement entry will have all the tags set on it that were set 972 - * on any of the entries it is replacing. 973 - */ 974 - int radix_tree_join(struct radix_tree_root *root, unsigned long index, 975 - unsigned order, void *item) 976 - { 977 - struct radix_tree_node *node; 978 - void __rcu **slot; 979 - int error; 980 - 981 - BUG_ON(radix_tree_is_internal_node(item)); 982 - 983 - error = __radix_tree_create(root, index, order, &node, &slot); 984 - if (!error) 985 - error = insert_entries(node, slot, item, order, true); 986 - if (error > 0) 987 - error = 0; 988 - 989 - return error; 990 - } 991 - 992 - /** 993 - * radix_tree_split - Split an entry into smaller entries 994 - * @root: radix tree root 995 - * @index: An index within the large entry 996 - * @order: Order of new entries 997 - * 998 - * Call this function as the first step in replacing a multiorder entry 999 - * with several entries of lower order. After this function returns, 1000 - * loop over the relevant portion of the tree using radix_tree_for_each_slot() 1001 - * and call radix_tree_iter_replace() to set up each new entry. 1002 - * 1003 - * The tags from this entry are replicated to all the new entries. 
1004 - * 1005 - * The radix tree should be locked against modification during the entire 1006 - * replacement operation. Lock-free lookups will see RADIX_TREE_RETRY which 1007 - * should prompt RCU walkers to restart the lookup from the root. 1008 - */ 1009 - int radix_tree_split(struct radix_tree_root *root, unsigned long index, 1010 - unsigned order) 1011 - { 1012 - struct radix_tree_node *parent, *node, *child; 1013 - void __rcu **slot; 1014 - unsigned int offset, end; 1015 - unsigned n, tag, tags = 0; 1016 - gfp_t gfp = root_gfp_mask(root); 1017 - 1018 - if (!__radix_tree_lookup(root, index, &parent, &slot)) 1019 - return -ENOENT; 1020 - if (!parent) 1021 - return -ENOENT; 1022 - 1023 - offset = get_slot_offset(parent, slot); 1024 - 1025 - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 1026 - if (tag_get(parent, tag, offset)) 1027 - tags |= 1 << tag; 1028 - 1029 - for (end = offset + 1; end < RADIX_TREE_MAP_SIZE; end++) { 1030 - if (!is_sibling_entry(parent, 1031 - rcu_dereference_raw(parent->slots[end]))) 1032 - break; 1033 - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 1034 - if (tags & (1 << tag)) 1035 - tag_set(parent, tag, end); 1036 - /* rcu_assign_pointer ensures tags are set before RETRY */ 1037 - rcu_assign_pointer(parent->slots[end], RADIX_TREE_RETRY); 1038 - } 1039 - rcu_assign_pointer(parent->slots[offset], RADIX_TREE_RETRY); 1040 - parent->exceptional -= (end - offset); 1041 - 1042 - if (order == parent->shift) 1043 - return 0; 1044 - if (order > parent->shift) { 1045 - while (offset < end) 1046 - offset += insert_entries(parent, &parent->slots[offset], 1047 - RADIX_TREE_RETRY, order, true); 1048 - return 0; 1049 - } 1050 - 1051 - node = parent; 1052 - 1053 - for (;;) { 1054 - if (node->shift > order) { 1055 - child = radix_tree_node_alloc(gfp, node, root, 1056 - node->shift - RADIX_TREE_MAP_SHIFT, 1057 - offset, 0, 0); 1058 - if (!child) 1059 - goto nomem; 1060 - if (node != parent) { 1061 - node->count++; 1062 - 
rcu_assign_pointer(node->slots[offset], 1063 - node_to_entry(child)); 1064 - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 1065 - if (tags & (1 << tag)) 1066 - tag_set(node, tag, offset); 1067 - } 1068 - 1069 - node = child; 1070 - offset = 0; 1071 - continue; 1072 - } 1073 - 1074 - n = insert_entries(node, &node->slots[offset], 1075 - RADIX_TREE_RETRY, order, false); 1076 - BUG_ON(n > RADIX_TREE_MAP_SIZE); 1077 - 1078 - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 1079 - if (tags & (1 << tag)) 1080 - tag_set(node, tag, offset); 1081 - offset += n; 1082 - 1083 - while (offset == RADIX_TREE_MAP_SIZE) { 1084 - if (node == parent) 1085 - break; 1086 - offset = node->offset; 1087 - child = node; 1088 - node = node->parent; 1089 - rcu_assign_pointer(node->slots[offset], 1090 - node_to_entry(child)); 1091 - offset++; 1092 - } 1093 - if ((node == parent) && (offset == end)) 1094 - return 0; 1095 - } 1096 - 1097 - nomem: 1098 - /* Shouldn't happen; did user forget to preload? */ 1099 - /* TODO: free all the allocated nodes */ 1100 - WARN_ON(1); 1101 - return -ENOMEM; 1102 - } 1103 - #endif 1104 1246 1105 1247 static void node_tag_set(struct radix_tree_root *root, 1106 1248 struct radix_tree_node *node, ··· 1012 1446 return node; 1013 1447 } 1014 1448 EXPORT_SYMBOL(radix_tree_tag_set); 1015 - 1016 - /** 1017 - * radix_tree_iter_tag_set - set a tag on the current iterator entry 1018 - * @root: radix tree root 1019 - * @iter: iterator state 1020 - * @tag: tag to set 1021 - */ 1022 - void radix_tree_iter_tag_set(struct radix_tree_root *root, 1023 - const struct radix_tree_iter *iter, unsigned int tag) 1024 - { 1025 - node_tag_set(root, iter->node, tag, iter_offset(iter)); 1026 - } 1027 1449 1028 1450 static void node_tag_clear(struct radix_tree_root *root, 1029 1451 struct radix_tree_node *node, ··· 1128 1574 } 1129 1575 EXPORT_SYMBOL(radix_tree_tag_get); 1130 1576 1131 - static inline void __set_iter_shift(struct radix_tree_iter *iter, 1132 - unsigned int shift) 1133 - { 
1134 - #ifdef CONFIG_RADIX_TREE_MULTIORDER 1135 - iter->shift = shift; 1136 - #endif 1137 - } 1138 - 1139 1577 /* Construct iter->tags bit-mask from node->tags[tag] array */ 1140 1578 static void set_iter_tags(struct radix_tree_iter *iter, 1141 1579 struct radix_tree_node *node, unsigned offset, ··· 1154 1608 } 1155 1609 } 1156 1610 1157 - #ifdef CONFIG_RADIX_TREE_MULTIORDER 1158 - static void __rcu **skip_siblings(struct radix_tree_node **nodep, 1159 - void __rcu **slot, struct radix_tree_iter *iter) 1160 - { 1161 - while (iter->index < iter->next_index) { 1162 - *nodep = rcu_dereference_raw(*slot); 1163 - if (*nodep && !is_sibling_entry(iter->node, *nodep)) 1164 - return slot; 1165 - slot++; 1166 - iter->index = __radix_tree_iter_add(iter, 1); 1167 - iter->tags >>= 1; 1168 - } 1169 - 1170 - *nodep = NULL; 1171 - return NULL; 1172 - } 1173 - 1174 - void __rcu **__radix_tree_next_slot(void __rcu **slot, 1175 - struct radix_tree_iter *iter, unsigned flags) 1176 - { 1177 - unsigned tag = flags & RADIX_TREE_ITER_TAG_MASK; 1178 - struct radix_tree_node *node; 1179 - 1180 - slot = skip_siblings(&node, slot, iter); 1181 - 1182 - while (radix_tree_is_internal_node(node)) { 1183 - unsigned offset; 1184 - unsigned long next_index; 1185 - 1186 - if (node == RADIX_TREE_RETRY) 1187 - return slot; 1188 - node = entry_to_node(node); 1189 - iter->node = node; 1190 - iter->shift = node->shift; 1191 - 1192 - if (flags & RADIX_TREE_ITER_TAGGED) { 1193 - offset = radix_tree_find_next_bit(node, tag, 0); 1194 - if (offset == RADIX_TREE_MAP_SIZE) 1195 - return NULL; 1196 - slot = &node->slots[offset]; 1197 - iter->index = __radix_tree_iter_add(iter, offset); 1198 - set_iter_tags(iter, node, offset, tag); 1199 - node = rcu_dereference_raw(*slot); 1200 - } else { 1201 - offset = 0; 1202 - slot = &node->slots[0]; 1203 - for (;;) { 1204 - node = rcu_dereference_raw(*slot); 1205 - if (node) 1206 - break; 1207 - slot++; 1208 - offset++; 1209 - if (offset == RADIX_TREE_MAP_SIZE) 1210 - return 
NULL; 1211 - } 1212 - iter->index = __radix_tree_iter_add(iter, offset); 1213 - } 1214 - if ((flags & RADIX_TREE_ITER_CONTIG) && (offset > 0)) 1215 - goto none; 1216 - next_index = (iter->index | shift_maxindex(iter->shift)) + 1; 1217 - if (next_index < iter->next_index) 1218 - iter->next_index = next_index; 1219 - } 1220 - 1221 - return slot; 1222 - none: 1223 - iter->next_index = 0; 1224 - return NULL; 1225 - } 1226 - EXPORT_SYMBOL(__radix_tree_next_slot); 1227 - #else 1228 - static void __rcu **skip_siblings(struct radix_tree_node **nodep, 1229 - void __rcu **slot, struct radix_tree_iter *iter) 1230 - { 1231 - return slot; 1232 - } 1233 - #endif 1234 - 1235 1611 void __rcu **radix_tree_iter_resume(void __rcu **slot, 1236 1612 struct radix_tree_iter *iter) 1237 1613 { 1238 - struct radix_tree_node *node; 1239 - 1240 1614 slot++; 1241 1615 iter->index = __radix_tree_iter_add(iter, 1); 1242 - skip_siblings(&node, slot, iter); 1243 1616 iter->next_index = iter->index; 1244 1617 iter->tags = 0; 1245 1618 return NULL; ··· 1209 1744 iter->next_index = maxindex + 1; 1210 1745 iter->tags = 1; 1211 1746 iter->node = NULL; 1212 - __set_iter_shift(iter, 0); 1213 - return (void __rcu **)&root->rnode; 1747 + return (void __rcu **)&root->xa_head; 1214 1748 } 1215 1749 1216 1750 do { ··· 1229 1765 while (++offset < RADIX_TREE_MAP_SIZE) { 1230 1766 void *slot = rcu_dereference_raw( 1231 1767 node->slots[offset]); 1232 - if (is_sibling_entry(node, slot)) 1233 - continue; 1234 1768 if (slot) 1235 1769 break; 1236 1770 } ··· 1246 1784 goto restart; 1247 1785 if (child == RADIX_TREE_RETRY) 1248 1786 break; 1249 - } while (radix_tree_is_internal_node(child)); 1787 + } while (node->shift && radix_tree_is_internal_node(child)); 1250 1788 1251 1789 /* Update the iterator state */ 1252 - iter->index = (index &~ node_maxindex(node)) | (offset << node->shift); 1790 + iter->index = (index &~ node_maxindex(node)) | offset; 1253 1791 iter->next_index = (index | node_maxindex(node)) + 1; 1254 
1792 iter->node = node; 1255 - __set_iter_shift(iter, node->shift); 1256 1793 1257 1794 if (flags & RADIX_TREE_ITER_TAGGED) 1258 1795 set_iter_tags(iter, node, offset, tag); ··· 1306 1845 return ret; 1307 1846 } 1308 1847 EXPORT_SYMBOL(radix_tree_gang_lookup); 1309 - 1310 - /** 1311 - * radix_tree_gang_lookup_slot - perform multiple slot lookup on radix tree 1312 - * @root: radix tree root 1313 - * @results: where the results of the lookup are placed 1314 - * @indices: where their indices should be placed (but usually NULL) 1315 - * @first_index: start the lookup from this key 1316 - * @max_items: place up to this many items at *results 1317 - * 1318 - * Performs an index-ascending scan of the tree for present items. Places 1319 - * their slots at *@results and returns the number of items which were 1320 - * placed at *@results. 1321 - * 1322 - * The implementation is naive. 1323 - * 1324 - * Like radix_tree_gang_lookup as far as RCU and locking goes. Slots must 1325 - * be dereferenced with radix_tree_deref_slot, and if using only RCU 1326 - * protection, radix_tree_deref_slot may fail requiring a retry. 
1327 - */ 1328 - unsigned int 1329 - radix_tree_gang_lookup_slot(const struct radix_tree_root *root, 1330 - void __rcu ***results, unsigned long *indices, 1331 - unsigned long first_index, unsigned int max_items) 1332 - { 1333 - struct radix_tree_iter iter; 1334 - void __rcu **slot; 1335 - unsigned int ret = 0; 1336 - 1337 - if (unlikely(!max_items)) 1338 - return 0; 1339 - 1340 - radix_tree_for_each_slot(slot, root, &iter, first_index) { 1341 - results[ret] = slot; 1342 - if (indices) 1343 - indices[ret] = iter.index; 1344 - if (++ret == max_items) 1345 - break; 1346 - } 1347 - 1348 - return ret; 1349 - } 1350 - EXPORT_SYMBOL(radix_tree_gang_lookup_slot); 1351 1848 1352 1849 /** 1353 1850 * radix_tree_gang_lookup_tag - perform multiple lookup on a radix tree ··· 1383 1964 } 1384 1965 EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot); 1385 1966 1386 - /** 1387 - * __radix_tree_delete_node - try to free node after clearing a slot 1388 - * @root: radix tree root 1389 - * @node: node containing @index 1390 - * @update_node: callback for changing leaf nodes 1391 - * 1392 - * After clearing the slot at @index in @node from radix tree 1393 - * rooted at @root, call this function to attempt freeing the 1394 - * node and shrinking the tree. 1395 - */ 1396 - void __radix_tree_delete_node(struct radix_tree_root *root, 1397 - struct radix_tree_node *node, 1398 - radix_tree_update_node_t update_node) 1399 - { 1400 - delete_node(root, node, update_node); 1401 - } 1402 - 1403 1967 static bool __radix_tree_delete(struct radix_tree_root *root, 1404 1968 struct radix_tree_node *node, void __rcu **slot) 1405 1969 { 1406 1970 void *old = rcu_dereference_raw(*slot); 1407 - int exceptional = radix_tree_exceptional_entry(old) ? -1 : 0; 1971 + int values = xa_is_value(old) ? 
-1 : 0; 1408 1972 unsigned offset = get_slot_offset(node, slot); 1409 1973 int tag; 1410 1974 ··· 1397 1995 for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 1398 1996 node_tag_clear(root, node, tag, offset); 1399 1997 1400 - replace_slot(slot, NULL, node, -1, exceptional); 1401 - return node && delete_node(root, node, NULL); 1998 + replace_slot(slot, NULL, node, -1, values); 1999 + return node && delete_node(root, node); 1402 2000 } 1403 2001 1404 2002 /** ··· 1470 2068 } 1471 2069 EXPORT_SYMBOL(radix_tree_delete); 1472 2070 1473 - void radix_tree_clear_tags(struct radix_tree_root *root, 1474 - struct radix_tree_node *node, 1475 - void __rcu **slot) 1476 - { 1477 - if (node) { 1478 - unsigned int tag, offset = get_slot_offset(node, slot); 1479 - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) 1480 - node_tag_clear(root, node, tag, offset); 1481 - } else { 1482 - root_tag_clear_all(root); 1483 - } 1484 - } 1485 - 1486 2071 /** 1487 2072 * radix_tree_tagged - test whether any items in the tree are tagged 1488 2073 * @root: radix tree root ··· 1495 2106 } 1496 2107 EXPORT_SYMBOL(idr_preload); 1497 2108 1498 - int ida_pre_get(struct ida *ida, gfp_t gfp) 1499 - { 1500 - /* 1501 - * The IDA API has no preload_end() equivalent. Instead, 1502 - * ida_get_new() can return -EAGAIN, prompting the caller 1503 - * to return to the ida_pre_get() step. 
1504 - */ 1505 - if (!__radix_tree_preload(gfp, IDA_PRELOAD_SIZE)) 1506 - preempt_enable(); 1507 - 1508 - if (!this_cpu_read(ida_bitmap)) { 1509 - struct ida_bitmap *bitmap = kzalloc(sizeof(*bitmap), gfp); 1510 - if (!bitmap) 1511 - return 0; 1512 - if (this_cpu_cmpxchg(ida_bitmap, NULL, bitmap)) 1513 - kfree(bitmap); 1514 - } 1515 - 1516 - return 1; 1517 - } 1518 - 1519 2109 void __rcu **idr_get_free(struct radix_tree_root *root, 1520 2110 struct radix_tree_iter *iter, gfp_t gfp, 1521 2111 unsigned long max) 1522 2112 { 1523 2113 struct radix_tree_node *node = NULL, *child; 1524 - void __rcu **slot = (void __rcu **)&root->rnode; 2114 + void __rcu **slot = (void __rcu **)&root->xa_head; 1525 2115 unsigned long maxindex, start = iter->next_index; 1526 2116 unsigned int shift, offset = 0; 1527 2117 ··· 1516 2148 if (error < 0) 1517 2149 return ERR_PTR(error); 1518 2150 shift = error; 1519 - child = rcu_dereference_raw(root->rnode); 2151 + child = rcu_dereference_raw(root->xa_head); 1520 2152 } 2153 + if (start == 0 && shift == 0) 2154 + shift = RADIX_TREE_MAP_SHIFT; 1521 2155 1522 2156 while (shift) { 1523 2157 shift -= RADIX_TREE_MAP_SHIFT; ··· 1562 2192 else 1563 2193 iter->next_index = 1; 1564 2194 iter->node = node; 1565 - __set_iter_shift(iter, shift); 1566 2195 set_iter_tags(iter, node, offset, IDR_FREE); 1567 2196 1568 2197 return slot; ··· 1580 2211 */ 1581 2212 void idr_destroy(struct idr *idr) 1582 2213 { 1583 - struct radix_tree_node *node = rcu_dereference_raw(idr->idr_rt.rnode); 2214 + struct radix_tree_node *node = rcu_dereference_raw(idr->idr_rt.xa_head); 1584 2215 if (radix_tree_is_internal_node(node)) 1585 2216 radix_tree_free_nodes(node); 1586 - idr->idr_rt.rnode = NULL; 2217 + idr->idr_rt.xa_head = NULL; 1587 2218 root_tag_set(&idr->idr_rt, IDR_FREE); 1588 2219 } 1589 2220 EXPORT_SYMBOL(idr_destroy); ··· 1595 2226 1596 2227 memset(node, 0, sizeof(*node)); 1597 2228 INIT_LIST_HEAD(&node->private_list); 1598 - } 1599 - 1600 - static __init unsigned 
long __maxindex(unsigned int height) 1601 - { 1602 - unsigned int width = height * RADIX_TREE_MAP_SHIFT; 1603 - int shift = RADIX_TREE_INDEX_BITS - width; 1604 - 1605 - if (shift < 0) 1606 - return ~0UL; 1607 - if (shift >= BITS_PER_LONG) 1608 - return 0UL; 1609 - return ~0UL >> shift; 1610 - } 1611 - 1612 - static __init void radix_tree_init_maxnodes(void) 1613 - { 1614 - unsigned long height_to_maxindex[RADIX_TREE_MAX_PATH + 1]; 1615 - unsigned int i, j; 1616 - 1617 - for (i = 0; i < ARRAY_SIZE(height_to_maxindex); i++) 1618 - height_to_maxindex[i] = __maxindex(i); 1619 - for (i = 0; i < ARRAY_SIZE(height_to_maxnodes); i++) { 1620 - for (j = i; j > 0; j--) 1621 - height_to_maxnodes[i] += height_to_maxindex[j - 1] + 1; 1622 - } 1623 2229 } 1624 2230 1625 2231 static int radix_tree_cpu_dead(unsigned int cpu) ··· 1610 2266 kmem_cache_free(radix_tree_node_cachep, node); 1611 2267 rtp->nr--; 1612 2268 } 1613 - kfree(per_cpu(ida_bitmap, cpu)); 1614 - per_cpu(ida_bitmap, cpu) = NULL; 1615 2269 return 0; 1616 2270 } 1617 2271 ··· 1619 2277 1620 2278 BUILD_BUG_ON(RADIX_TREE_MAX_TAGS + __GFP_BITS_SHIFT > 32); 1621 2279 BUILD_BUG_ON(ROOT_IS_IDR & ~GFP_ZONEMASK); 2280 + BUILD_BUG_ON(XA_CHUNK_SIZE > 255); 1622 2281 radix_tree_node_cachep = kmem_cache_create("radix_tree_node", 1623 2282 sizeof(struct radix_tree_node), 0, 1624 2283 SLAB_PANIC | SLAB_RECLAIM_ACCOUNT, 1625 2284 radix_tree_node_ctor); 1626 - radix_tree_init_maxnodes(); 1627 2285 ret = cpuhp_setup_state_nocalls(CPUHP_RADIX_DEAD, "lib/radix:dead", 1628 2286 NULL, radix_tree_cpu_dead); 1629 2287 WARN_ON(ret < 0);
lib/test_xarray.c (new file, +1238 additions)
··· 1 + // SPDX-License-Identifier: GPL-2.0+ 2 + /* 3 + * test_xarray.c: Test the XArray API 4 + * Copyright (c) 2017-2018 Microsoft Corporation 5 + * Author: Matthew Wilcox <willy@infradead.org> 6 + */ 7 + 8 + #include <linux/xarray.h> 9 + #include <linux/module.h> 10 + 11 + static unsigned int tests_run; 12 + static unsigned int tests_passed; 13 + 14 + #ifndef XA_DEBUG 15 + # ifdef __KERNEL__ 16 + void xa_dump(const struct xarray *xa) { } 17 + # endif 18 + #undef XA_BUG_ON 19 + #define XA_BUG_ON(xa, x) do { \ 20 + tests_run++; \ 21 + if (x) { \ 22 + printk("BUG at %s:%d\n", __func__, __LINE__); \ 23 + xa_dump(xa); \ 24 + dump_stack(); \ 25 + } else { \ 26 + tests_passed++; \ 27 + } \ 28 + } while (0) 29 + #endif 30 + 31 + static void *xa_store_index(struct xarray *xa, unsigned long index, gfp_t gfp) 32 + { 33 + return xa_store(xa, index, xa_mk_value(index & LONG_MAX), gfp); 34 + } 35 + 36 + static void xa_alloc_index(struct xarray *xa, unsigned long index, gfp_t gfp) 37 + { 38 + u32 id = 0; 39 + 40 + XA_BUG_ON(xa, xa_alloc(xa, &id, UINT_MAX, xa_mk_value(index & LONG_MAX), 41 + gfp) != 0); 42 + XA_BUG_ON(xa, id != index); 43 + } 44 + 45 + static void xa_erase_index(struct xarray *xa, unsigned long index) 46 + { 47 + XA_BUG_ON(xa, xa_erase(xa, index) != xa_mk_value(index & LONG_MAX)); 48 + XA_BUG_ON(xa, xa_load(xa, index) != NULL); 49 + } 50 + 51 + /* 52 + * If anyone needs this, please move it to xarray.c. We have no current 53 + * users outside the test suite because all current multislot users want 54 + * to use the advanced API. 
55 + */ 56 + static void *xa_store_order(struct xarray *xa, unsigned long index, 57 + unsigned order, void *entry, gfp_t gfp) 58 + { 59 + XA_STATE_ORDER(xas, xa, index, order); 60 + void *curr; 61 + 62 + do { 63 + xas_lock(&xas); 64 + curr = xas_store(&xas, entry); 65 + xas_unlock(&xas); 66 + } while (xas_nomem(&xas, gfp)); 67 + 68 + return curr; 69 + } 70 + 71 + static noinline void check_xa_err(struct xarray *xa) 72 + { 73 + XA_BUG_ON(xa, xa_err(xa_store_index(xa, 0, GFP_NOWAIT)) != 0); 74 + XA_BUG_ON(xa, xa_err(xa_erase(xa, 0)) != 0); 75 + #ifndef __KERNEL__ 76 + /* The kernel does not fail GFP_NOWAIT allocations */ 77 + XA_BUG_ON(xa, xa_err(xa_store_index(xa, 1, GFP_NOWAIT)) != -ENOMEM); 78 + XA_BUG_ON(xa, xa_err(xa_store_index(xa, 1, GFP_NOWAIT)) != -ENOMEM); 79 + #endif 80 + XA_BUG_ON(xa, xa_err(xa_store_index(xa, 1, GFP_KERNEL)) != 0); 81 + XA_BUG_ON(xa, xa_err(xa_store(xa, 1, xa_mk_value(0), GFP_KERNEL)) != 0); 82 + XA_BUG_ON(xa, xa_err(xa_erase(xa, 1)) != 0); 83 + // kills the test-suite :-( 84 + // XA_BUG_ON(xa, xa_err(xa_store(xa, 0, xa_mk_internal(0), 0)) != -EINVAL); 85 + } 86 + 87 + static noinline void check_xas_retry(struct xarray *xa) 88 + { 89 + XA_STATE(xas, xa, 0); 90 + void *entry; 91 + 92 + xa_store_index(xa, 0, GFP_KERNEL); 93 + xa_store_index(xa, 1, GFP_KERNEL); 94 + 95 + rcu_read_lock(); 96 + XA_BUG_ON(xa, xas_find(&xas, ULONG_MAX) != xa_mk_value(0)); 97 + xa_erase_index(xa, 1); 98 + XA_BUG_ON(xa, !xa_is_retry(xas_reload(&xas))); 99 + XA_BUG_ON(xa, xas_retry(&xas, NULL)); 100 + XA_BUG_ON(xa, xas_retry(&xas, xa_mk_value(0))); 101 + xas_reset(&xas); 102 + XA_BUG_ON(xa, xas.xa_node != XAS_RESTART); 103 + XA_BUG_ON(xa, xas_next_entry(&xas, ULONG_MAX) != xa_mk_value(0)); 104 + XA_BUG_ON(xa, xas.xa_node != NULL); 105 + 106 + XA_BUG_ON(xa, xa_store_index(xa, 1, GFP_KERNEL) != NULL); 107 + XA_BUG_ON(xa, !xa_is_internal(xas_reload(&xas))); 108 + xas.xa_node = XAS_RESTART; 109 + XA_BUG_ON(xa, xas_next_entry(&xas, ULONG_MAX) != xa_mk_value(0)); 110 + 
rcu_read_unlock(); 111 + 112 + /* Make sure we can iterate through retry entries */ 113 + xas_lock(&xas); 114 + xas_set(&xas, 0); 115 + xas_store(&xas, XA_RETRY_ENTRY); 116 + xas_set(&xas, 1); 117 + xas_store(&xas, XA_RETRY_ENTRY); 118 + 119 + xas_set(&xas, 0); 120 + xas_for_each(&xas, entry, ULONG_MAX) { 121 + xas_store(&xas, xa_mk_value(xas.xa_index)); 122 + } 123 + xas_unlock(&xas); 124 + 125 + xa_erase_index(xa, 0); 126 + xa_erase_index(xa, 1); 127 + } 128 + 129 + static noinline void check_xa_load(struct xarray *xa) 130 + { 131 + unsigned long i, j; 132 + 133 + for (i = 0; i < 1024; i++) { 134 + for (j = 0; j < 1024; j++) { 135 + void *entry = xa_load(xa, j); 136 + if (j < i) 137 + XA_BUG_ON(xa, xa_to_value(entry) != j); 138 + else 139 + XA_BUG_ON(xa, entry); 140 + } 141 + XA_BUG_ON(xa, xa_store_index(xa, i, GFP_KERNEL) != NULL); 142 + } 143 + 144 + for (i = 0; i < 1024; i++) { 145 + for (j = 0; j < 1024; j++) { 146 + void *entry = xa_load(xa, j); 147 + if (j >= i) 148 + XA_BUG_ON(xa, xa_to_value(entry) != j); 149 + else 150 + XA_BUG_ON(xa, entry); 151 + } 152 + xa_erase_index(xa, i); 153 + } 154 + XA_BUG_ON(xa, !xa_empty(xa)); 155 + } 156 + 157 + static noinline void check_xa_mark_1(struct xarray *xa, unsigned long index) 158 + { 159 + unsigned int order; 160 + unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 
8 : 1; 161 + 162 + /* NULL elements have no marks set */ 163 + XA_BUG_ON(xa, xa_get_mark(xa, index, XA_MARK_0)); 164 + xa_set_mark(xa, index, XA_MARK_0); 165 + XA_BUG_ON(xa, xa_get_mark(xa, index, XA_MARK_0)); 166 + 167 + /* Storing a pointer will not make a mark appear */ 168 + XA_BUG_ON(xa, xa_store_index(xa, index, GFP_KERNEL) != NULL); 169 + XA_BUG_ON(xa, xa_get_mark(xa, index, XA_MARK_0)); 170 + xa_set_mark(xa, index, XA_MARK_0); 171 + XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0)); 172 + 173 + /* Setting one mark will not set another mark */ 174 + XA_BUG_ON(xa, xa_get_mark(xa, index + 1, XA_MARK_0)); 175 + XA_BUG_ON(xa, xa_get_mark(xa, index, XA_MARK_1)); 176 + 177 + /* Storing NULL clears marks, and they can't be set again */ 178 + xa_erase_index(xa, index); 179 + XA_BUG_ON(xa, !xa_empty(xa)); 180 + XA_BUG_ON(xa, xa_get_mark(xa, index, XA_MARK_0)); 181 + xa_set_mark(xa, index, XA_MARK_0); 182 + XA_BUG_ON(xa, xa_get_mark(xa, index, XA_MARK_0)); 183 + 184 + /* 185 + * Storing a multi-index entry over entries with marks gives the 186 + * entire entry the union of the marks 187 + */ 188 + BUG_ON((index % 4) != 0); 189 + for (order = 2; order < max_order; order++) { 190 + unsigned long base = round_down(index, 1UL << order); 191 + unsigned long next = base + (1UL << order); 192 + unsigned long i; 193 + 194 + XA_BUG_ON(xa, xa_store_index(xa, index + 1, GFP_KERNEL)); 195 + xa_set_mark(xa, index + 1, XA_MARK_0); 196 + XA_BUG_ON(xa, xa_store_index(xa, index + 2, GFP_KERNEL)); 197 + xa_set_mark(xa, index + 2, XA_MARK_1); 198 + XA_BUG_ON(xa, xa_store_index(xa, next, GFP_KERNEL)); 199 + xa_store_order(xa, index, order, xa_mk_value(index), 200 + GFP_KERNEL); 201 + for (i = base; i < next; i++) { 202 + XA_STATE(xas, xa, i); 203 + unsigned int seen = 0; 204 + void *entry; 205 + 206 + XA_BUG_ON(xa, !xa_get_mark(xa, i, XA_MARK_0)); 207 + XA_BUG_ON(xa, !xa_get_mark(xa, i, XA_MARK_1)); 208 + XA_BUG_ON(xa, xa_get_mark(xa, i, XA_MARK_2)); 209 + 210 + /* We should see two 
elements in the array */ 211 + xas_for_each(&xas, entry, ULONG_MAX) 212 + seen++; 213 + XA_BUG_ON(xa, seen != 2); 214 + 215 + /* One of which is marked */ 216 + xas_set(&xas, 0); 217 + seen = 0; 218 + xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_0) 219 + seen++; 220 + XA_BUG_ON(xa, seen != 1); 221 + } 222 + XA_BUG_ON(xa, xa_get_mark(xa, next, XA_MARK_0)); 223 + XA_BUG_ON(xa, xa_get_mark(xa, next, XA_MARK_1)); 224 + XA_BUG_ON(xa, xa_get_mark(xa, next, XA_MARK_2)); 225 + xa_erase_index(xa, index); 226 + xa_erase_index(xa, next); 227 + XA_BUG_ON(xa, !xa_empty(xa)); 228 + } 229 + XA_BUG_ON(xa, !xa_empty(xa)); 230 + } 231 + 232 + static noinline void check_xa_mark_2(struct xarray *xa) 233 + { 234 + XA_STATE(xas, xa, 0); 235 + unsigned long index; 236 + unsigned int count = 0; 237 + void *entry; 238 + 239 + xa_store_index(xa, 0, GFP_KERNEL); 240 + xa_set_mark(xa, 0, XA_MARK_0); 241 + xas_lock(&xas); 242 + xas_load(&xas); 243 + xas_init_marks(&xas); 244 + xas_unlock(&xas); 245 + XA_BUG_ON(xa, !xa_get_mark(xa, 0, XA_MARK_0) == 0); 246 + 247 + for (index = 3500; index < 4500; index++) { 248 + xa_store_index(xa, index, GFP_KERNEL); 249 + xa_set_mark(xa, index, XA_MARK_0); 250 + } 251 + 252 + xas_reset(&xas); 253 + rcu_read_lock(); 254 + xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_0) 255 + count++; 256 + rcu_read_unlock(); 257 + XA_BUG_ON(xa, count != 1000); 258 + 259 + xas_lock(&xas); 260 + xas_for_each(&xas, entry, ULONG_MAX) { 261 + xas_init_marks(&xas); 262 + XA_BUG_ON(xa, !xa_get_mark(xa, xas.xa_index, XA_MARK_0)); 263 + XA_BUG_ON(xa, !xas_get_mark(&xas, XA_MARK_0)); 264 + } 265 + xas_unlock(&xas); 266 + 267 + xa_destroy(xa); 268 + } 269 + 270 + static noinline void check_xa_mark(struct xarray *xa) 271 + { 272 + unsigned long index; 273 + 274 + for (index = 0; index < 16384; index += 4) 275 + check_xa_mark_1(xa, index); 276 + 277 + check_xa_mark_2(xa); 278 + } 279 + 280 + static noinline void check_xa_shrink(struct xarray *xa) 281 + { 282 + XA_STATE(xas, 
xa, 1); 283 + struct xa_node *node; 284 + unsigned int order; 285 + unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 15 : 1; 286 + 287 + XA_BUG_ON(xa, !xa_empty(xa)); 288 + XA_BUG_ON(xa, xa_store_index(xa, 0, GFP_KERNEL) != NULL); 289 + XA_BUG_ON(xa, xa_store_index(xa, 1, GFP_KERNEL) != NULL); 290 + 291 + /* 292 + * Check that erasing the entry at 1 shrinks the tree and properly 293 + * marks the node as being deleted. 294 + */ 295 + xas_lock(&xas); 296 + XA_BUG_ON(xa, xas_load(&xas) != xa_mk_value(1)); 297 + node = xas.xa_node; 298 + XA_BUG_ON(xa, xa_entry_locked(xa, node, 0) != xa_mk_value(0)); 299 + XA_BUG_ON(xa, xas_store(&xas, NULL) != xa_mk_value(1)); 300 + XA_BUG_ON(xa, xa_load(xa, 1) != NULL); 301 + XA_BUG_ON(xa, xas.xa_node != XAS_BOUNDS); 302 + XA_BUG_ON(xa, xa_entry_locked(xa, node, 0) != XA_RETRY_ENTRY); 303 + XA_BUG_ON(xa, xas_load(&xas) != NULL); 304 + xas_unlock(&xas); 305 + XA_BUG_ON(xa, xa_load(xa, 0) != xa_mk_value(0)); 306 + xa_erase_index(xa, 0); 307 + XA_BUG_ON(xa, !xa_empty(xa)); 308 + 309 + for (order = 0; order < max_order; order++) { 310 + unsigned long max = (1UL << order) - 1; 311 + xa_store_order(xa, 0, order, xa_mk_value(0), GFP_KERNEL); 312 + XA_BUG_ON(xa, xa_load(xa, max) != xa_mk_value(0)); 313 + XA_BUG_ON(xa, xa_load(xa, max + 1) != NULL); 314 + rcu_read_lock(); 315 + node = xa_head(xa); 316 + rcu_read_unlock(); 317 + XA_BUG_ON(xa, xa_store_index(xa, ULONG_MAX, GFP_KERNEL) != 318 + NULL); 319 + rcu_read_lock(); 320 + XA_BUG_ON(xa, xa_head(xa) == node); 321 + rcu_read_unlock(); 322 + XA_BUG_ON(xa, xa_load(xa, max + 1) != NULL); 323 + xa_erase_index(xa, ULONG_MAX); 324 + XA_BUG_ON(xa, xa->xa_head != node); 325 + xa_erase_index(xa, 0); 326 + } 327 + } 328 + 329 + static noinline void check_cmpxchg(struct xarray *xa) 330 + { 331 + void *FIVE = xa_mk_value(5); 332 + void *SIX = xa_mk_value(6); 333 + void *LOTS = xa_mk_value(12345678); 334 + 335 + XA_BUG_ON(xa, !xa_empty(xa)); 336 + XA_BUG_ON(xa, xa_store_index(xa, 12345678, 
GFP_KERNEL) != NULL); 337 + XA_BUG_ON(xa, xa_insert(xa, 12345678, xa, GFP_KERNEL) != -EEXIST); 338 + XA_BUG_ON(xa, xa_cmpxchg(xa, 12345678, SIX, FIVE, GFP_KERNEL) != LOTS); 339 + XA_BUG_ON(xa, xa_cmpxchg(xa, 12345678, LOTS, FIVE, GFP_KERNEL) != LOTS); 340 + XA_BUG_ON(xa, xa_cmpxchg(xa, 12345678, FIVE, LOTS, GFP_KERNEL) != FIVE); 341 + XA_BUG_ON(xa, xa_cmpxchg(xa, 5, FIVE, NULL, GFP_KERNEL) != NULL); 342 + XA_BUG_ON(xa, xa_cmpxchg(xa, 5, NULL, FIVE, GFP_KERNEL) != NULL); 343 + xa_erase_index(xa, 12345678); 344 + xa_erase_index(xa, 5); 345 + XA_BUG_ON(xa, !xa_empty(xa)); 346 + } 347 + 348 + static noinline void check_reserve(struct xarray *xa) 349 + { 350 + void *entry; 351 + unsigned long index = 0; 352 + 353 + /* An array with a reserved entry is not empty */ 354 + XA_BUG_ON(xa, !xa_empty(xa)); 355 + xa_reserve(xa, 12345678, GFP_KERNEL); 356 + XA_BUG_ON(xa, xa_empty(xa)); 357 + XA_BUG_ON(xa, xa_load(xa, 12345678)); 358 + xa_release(xa, 12345678); 359 + XA_BUG_ON(xa, !xa_empty(xa)); 360 + 361 + /* Releasing a used entry does nothing */ 362 + xa_reserve(xa, 12345678, GFP_KERNEL); 363 + XA_BUG_ON(xa, xa_store_index(xa, 12345678, GFP_NOWAIT) != NULL); 364 + xa_release(xa, 12345678); 365 + xa_erase_index(xa, 12345678); 366 + XA_BUG_ON(xa, !xa_empty(xa)); 367 + 368 + /* cmpxchg sees a reserved entry as NULL */ 369 + xa_reserve(xa, 12345678, GFP_KERNEL); 370 + XA_BUG_ON(xa, xa_cmpxchg(xa, 12345678, NULL, xa_mk_value(12345678), 371 + GFP_NOWAIT) != NULL); 372 + xa_release(xa, 12345678); 373 + xa_erase_index(xa, 12345678); 374 + XA_BUG_ON(xa, !xa_empty(xa)); 375 + 376 + /* Can iterate through a reserved entry */ 377 + xa_store_index(xa, 5, GFP_KERNEL); 378 + xa_reserve(xa, 6, GFP_KERNEL); 379 + xa_store_index(xa, 7, GFP_KERNEL); 380 + 381 + xa_for_each(xa, entry, index, ULONG_MAX, XA_PRESENT) { 382 + XA_BUG_ON(xa, index != 5 && index != 7); 383 + } 384 + xa_destroy(xa); 385 + } 386 + 387 + static noinline void check_xas_erase(struct xarray *xa) 388 + { 389 + XA_STATE(xas, 
xa, 0); 390 + void *entry; 391 + unsigned long i, j; 392 + 393 + for (i = 0; i < 200; i++) { 394 + for (j = i; j < 2 * i + 17; j++) { 395 + xas_set(&xas, j); 396 + do { 397 + xas_lock(&xas); 398 + xas_store(&xas, xa_mk_value(j)); 399 + xas_unlock(&xas); 400 + } while (xas_nomem(&xas, GFP_KERNEL)); 401 + } 402 + 403 + xas_set(&xas, ULONG_MAX); 404 + do { 405 + xas_lock(&xas); 406 + xas_store(&xas, xa_mk_value(0)); 407 + xas_unlock(&xas); 408 + } while (xas_nomem(&xas, GFP_KERNEL)); 409 + 410 + xas_lock(&xas); 411 + xas_store(&xas, NULL); 412 + 413 + xas_set(&xas, 0); 414 + j = i; 415 + xas_for_each(&xas, entry, ULONG_MAX) { 416 + XA_BUG_ON(xa, entry != xa_mk_value(j)); 417 + xas_store(&xas, NULL); 418 + j++; 419 + } 420 + xas_unlock(&xas); 421 + XA_BUG_ON(xa, !xa_empty(xa)); 422 + } 423 + } 424 + 425 + #ifdef CONFIG_XARRAY_MULTI 426 + static noinline void check_multi_store_1(struct xarray *xa, unsigned long index, 427 + unsigned int order) 428 + { 429 + XA_STATE(xas, xa, index); 430 + unsigned long min = index & ~((1UL << order) - 1); 431 + unsigned long max = min + (1UL << order); 432 + 433 + xa_store_order(xa, index, order, xa_mk_value(index), GFP_KERNEL); 434 + XA_BUG_ON(xa, xa_load(xa, min) != xa_mk_value(index)); 435 + XA_BUG_ON(xa, xa_load(xa, max - 1) != xa_mk_value(index)); 436 + XA_BUG_ON(xa, xa_load(xa, max) != NULL); 437 + XA_BUG_ON(xa, xa_load(xa, min - 1) != NULL); 438 + 439 + XA_BUG_ON(xa, xas_store(&xas, xa_mk_value(min)) != xa_mk_value(index)); 440 + XA_BUG_ON(xa, xa_load(xa, min) != xa_mk_value(min)); 441 + XA_BUG_ON(xa, xa_load(xa, max - 1) != xa_mk_value(min)); 442 + XA_BUG_ON(xa, xa_load(xa, max) != NULL); 443 + XA_BUG_ON(xa, xa_load(xa, min - 1) != NULL); 444 + 445 + xa_erase_index(xa, min); 446 + XA_BUG_ON(xa, !xa_empty(xa)); 447 + } 448 + 449 + static noinline void check_multi_store_2(struct xarray *xa, unsigned long index, 450 + unsigned int order) 451 + { 452 + XA_STATE(xas, xa, index); 453 + xa_store_order(xa, index, order, xa_mk_value(0), 
GFP_KERNEL); 454 + 455 + XA_BUG_ON(xa, xas_store(&xas, xa_mk_value(1)) != xa_mk_value(0)); 456 + XA_BUG_ON(xa, xas.xa_index != index); 457 + XA_BUG_ON(xa, xas_store(&xas, NULL) != xa_mk_value(1)); 458 + XA_BUG_ON(xa, !xa_empty(xa)); 459 + } 460 + #endif 461 + 462 + static noinline void check_multi_store(struct xarray *xa) 463 + { 464 + #ifdef CONFIG_XARRAY_MULTI 465 + unsigned long i, j, k; 466 + unsigned int max_order = (sizeof(long) == 4) ? 30 : 60; 467 + 468 + /* Loading from any position returns the same value */ 469 + xa_store_order(xa, 0, 1, xa_mk_value(0), GFP_KERNEL); 470 + XA_BUG_ON(xa, xa_load(xa, 0) != xa_mk_value(0)); 471 + XA_BUG_ON(xa, xa_load(xa, 1) != xa_mk_value(0)); 472 + XA_BUG_ON(xa, xa_load(xa, 2) != NULL); 473 + rcu_read_lock(); 474 + XA_BUG_ON(xa, xa_to_node(xa_head(xa))->count != 2); 475 + XA_BUG_ON(xa, xa_to_node(xa_head(xa))->nr_values != 2); 476 + rcu_read_unlock(); 477 + 478 + /* Storing adjacent to the value does not alter the value */ 479 + xa_store(xa, 3, xa, GFP_KERNEL); 480 + XA_BUG_ON(xa, xa_load(xa, 0) != xa_mk_value(0)); 481 + XA_BUG_ON(xa, xa_load(xa, 1) != xa_mk_value(0)); 482 + XA_BUG_ON(xa, xa_load(xa, 2) != NULL); 483 + rcu_read_lock(); 484 + XA_BUG_ON(xa, xa_to_node(xa_head(xa))->count != 3); 485 + XA_BUG_ON(xa, xa_to_node(xa_head(xa))->nr_values != 2); 486 + rcu_read_unlock(); 487 + 488 + /* Overwriting multiple indexes works */ 489 + xa_store_order(xa, 0, 2, xa_mk_value(1), GFP_KERNEL); 490 + XA_BUG_ON(xa, xa_load(xa, 0) != xa_mk_value(1)); 491 + XA_BUG_ON(xa, xa_load(xa, 1) != xa_mk_value(1)); 492 + XA_BUG_ON(xa, xa_load(xa, 2) != xa_mk_value(1)); 493 + XA_BUG_ON(xa, xa_load(xa, 3) != xa_mk_value(1)); 494 + XA_BUG_ON(xa, xa_load(xa, 4) != NULL); 495 + rcu_read_lock(); 496 + XA_BUG_ON(xa, xa_to_node(xa_head(xa))->count != 4); 497 + XA_BUG_ON(xa, xa_to_node(xa_head(xa))->nr_values != 4); 498 + rcu_read_unlock(); 499 + 500 + /* We can erase multiple values with a single store */ 501 + xa_store_order(xa, 0, 63, NULL, 
GFP_KERNEL); 502 + XA_BUG_ON(xa, !xa_empty(xa)); 503 + 504 + /* Even when the first slot is empty but the others aren't */ 505 + xa_store_index(xa, 1, GFP_KERNEL); 506 + xa_store_index(xa, 2, GFP_KERNEL); 507 + xa_store_order(xa, 0, 2, NULL, GFP_KERNEL); 508 + XA_BUG_ON(xa, !xa_empty(xa)); 509 + 510 + for (i = 0; i < max_order; i++) { 511 + for (j = 0; j < max_order; j++) { 512 + xa_store_order(xa, 0, i, xa_mk_value(i), GFP_KERNEL); 513 + xa_store_order(xa, 0, j, xa_mk_value(j), GFP_KERNEL); 514 + 515 + for (k = 0; k < max_order; k++) { 516 + void *entry = xa_load(xa, (1UL << k) - 1); 517 + if ((i < k) && (j < k)) 518 + XA_BUG_ON(xa, entry != NULL); 519 + else 520 + XA_BUG_ON(xa, entry != xa_mk_value(j)); 521 + } 522 + 523 + xa_erase(xa, 0); 524 + XA_BUG_ON(xa, !xa_empty(xa)); 525 + } 526 + } 527 + 528 + for (i = 0; i < 20; i++) { 529 + check_multi_store_1(xa, 200, i); 530 + check_multi_store_1(xa, 0, i); 531 + check_multi_store_1(xa, (1UL << i) + 1, i); 532 + } 533 + check_multi_store_2(xa, 4095, 9); 534 + #endif 535 + } 536 + 537 + static DEFINE_XARRAY_ALLOC(xa0); 538 + 539 + static noinline void check_xa_alloc(void) 540 + { 541 + int i; 542 + u32 id; 543 + 544 + /* An empty array should assign 0 to the first alloc */ 545 + xa_alloc_index(&xa0, 0, GFP_KERNEL); 546 + 547 + /* Erasing it should make the array empty again */ 548 + xa_erase_index(&xa0, 0); 549 + XA_BUG_ON(&xa0, !xa_empty(&xa0)); 550 + 551 + /* And it should assign 0 again */ 552 + xa_alloc_index(&xa0, 0, GFP_KERNEL); 553 + 554 + /* The next assigned ID should be 1 */ 555 + xa_alloc_index(&xa0, 1, GFP_KERNEL); 556 + xa_erase_index(&xa0, 1); 557 + 558 + /* Storing a value should mark it used */ 559 + xa_store_index(&xa0, 1, GFP_KERNEL); 560 + xa_alloc_index(&xa0, 2, GFP_KERNEL); 561 + 562 + /* If we then erase 0, it should be free */ 563 + xa_erase_index(&xa0, 0); 564 + xa_alloc_index(&xa0, 0, GFP_KERNEL); 565 + 566 + xa_erase_index(&xa0, 1); 567 + xa_erase_index(&xa0, 2); 568 + 569 + for (i = 1; i < 
5000; i++) { 570 + xa_alloc_index(&xa0, i, GFP_KERNEL); 571 + } 572 + 573 + xa_destroy(&xa0); 574 + 575 + id = 0xfffffffeU; 576 + XA_BUG_ON(&xa0, xa_alloc(&xa0, &id, UINT_MAX, xa_mk_value(0), 577 + GFP_KERNEL) != 0); 578 + XA_BUG_ON(&xa0, id != 0xfffffffeU); 579 + XA_BUG_ON(&xa0, xa_alloc(&xa0, &id, UINT_MAX, xa_mk_value(0), 580 + GFP_KERNEL) != 0); 581 + XA_BUG_ON(&xa0, id != 0xffffffffU); 582 + XA_BUG_ON(&xa0, xa_alloc(&xa0, &id, UINT_MAX, xa_mk_value(0), 583 + GFP_KERNEL) != -ENOSPC); 584 + XA_BUG_ON(&xa0, id != 0xffffffffU); 585 + xa_destroy(&xa0); 586 + } 587 + 588 + static noinline void __check_store_iter(struct xarray *xa, unsigned long start, 589 + unsigned int order, unsigned int present) 590 + { 591 + XA_STATE_ORDER(xas, xa, start, order); 592 + void *entry; 593 + unsigned int count = 0; 594 + 595 + retry: 596 + xas_lock(&xas); 597 + xas_for_each_conflict(&xas, entry) { 598 + XA_BUG_ON(xa, !xa_is_value(entry)); 599 + XA_BUG_ON(xa, entry < xa_mk_value(start)); 600 + XA_BUG_ON(xa, entry > xa_mk_value(start + (1UL << order) - 1)); 601 + count++; 602 + } 603 + xas_store(&xas, xa_mk_value(start)); 604 + xas_unlock(&xas); 605 + if (xas_nomem(&xas, GFP_KERNEL)) { 606 + count = 0; 607 + goto retry; 608 + } 609 + XA_BUG_ON(xa, xas_error(&xas)); 610 + XA_BUG_ON(xa, count != present); 611 + XA_BUG_ON(xa, xa_load(xa, start) != xa_mk_value(start)); 612 + XA_BUG_ON(xa, xa_load(xa, start + (1UL << order) - 1) != 613 + xa_mk_value(start)); 614 + xa_erase_index(xa, start); 615 + } 616 + 617 + static noinline void check_store_iter(struct xarray *xa) 618 + { 619 + unsigned int i, j; 620 + unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 
20 : 1; 621 + 622 + for (i = 0; i < max_order; i++) { 623 + unsigned int min = 1 << i; 624 + unsigned int max = (2 << i) - 1; 625 + __check_store_iter(xa, 0, i, 0); 626 + XA_BUG_ON(xa, !xa_empty(xa)); 627 + __check_store_iter(xa, min, i, 0); 628 + XA_BUG_ON(xa, !xa_empty(xa)); 629 + 630 + xa_store_index(xa, min, GFP_KERNEL); 631 + __check_store_iter(xa, min, i, 1); 632 + XA_BUG_ON(xa, !xa_empty(xa)); 633 + xa_store_index(xa, max, GFP_KERNEL); 634 + __check_store_iter(xa, min, i, 1); 635 + XA_BUG_ON(xa, !xa_empty(xa)); 636 + 637 + for (j = 0; j < min; j++) 638 + xa_store_index(xa, j, GFP_KERNEL); 639 + __check_store_iter(xa, 0, i, min); 640 + XA_BUG_ON(xa, !xa_empty(xa)); 641 + for (j = 0; j < min; j++) 642 + xa_store_index(xa, min + j, GFP_KERNEL); 643 + __check_store_iter(xa, min, i, min); 644 + XA_BUG_ON(xa, !xa_empty(xa)); 645 + } 646 + #ifdef CONFIG_XARRAY_MULTI 647 + xa_store_index(xa, 63, GFP_KERNEL); 648 + xa_store_index(xa, 65, GFP_KERNEL); 649 + __check_store_iter(xa, 64, 2, 1); 650 + xa_erase_index(xa, 63); 651 + #endif 652 + XA_BUG_ON(xa, !xa_empty(xa)); 653 + } 654 + 655 + static noinline void check_multi_find(struct xarray *xa) 656 + { 657 + #ifdef CONFIG_XARRAY_MULTI 658 + unsigned long index; 659 + 660 + xa_store_order(xa, 12, 2, xa_mk_value(12), GFP_KERNEL); 661 + XA_BUG_ON(xa, xa_store_index(xa, 16, GFP_KERNEL) != NULL); 662 + 663 + index = 0; 664 + XA_BUG_ON(xa, xa_find(xa, &index, ULONG_MAX, XA_PRESENT) != 665 + xa_mk_value(12)); 666 + XA_BUG_ON(xa, index != 12); 667 + index = 13; 668 + XA_BUG_ON(xa, xa_find(xa, &index, ULONG_MAX, XA_PRESENT) != 669 + xa_mk_value(12)); 670 + XA_BUG_ON(xa, (index < 12) || (index >= 16)); 671 + XA_BUG_ON(xa, xa_find_after(xa, &index, ULONG_MAX, XA_PRESENT) != 672 + xa_mk_value(16)); 673 + XA_BUG_ON(xa, index != 16); 674 + 675 + xa_erase_index(xa, 12); 676 + xa_erase_index(xa, 16); 677 + XA_BUG_ON(xa, !xa_empty(xa)); 678 + #endif 679 + } 680 + 681 + static noinline void check_multi_find_2(struct xarray *xa) 682 + { 
683 + unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 10 : 1; 684 + unsigned int i, j; 685 + void *entry; 686 + 687 + for (i = 0; i < max_order; i++) { 688 + unsigned long index = 1UL << i; 689 + for (j = 0; j < index; j++) { 690 + XA_STATE(xas, xa, j + index); 691 + xa_store_index(xa, index - 1, GFP_KERNEL); 692 + xa_store_order(xa, index, i, xa_mk_value(index), 693 + GFP_KERNEL); 694 + rcu_read_lock(); 695 + xas_for_each(&xas, entry, ULONG_MAX) { 696 + xa_erase_index(xa, index); 697 + } 698 + rcu_read_unlock(); 699 + xa_erase_index(xa, index - 1); 700 + XA_BUG_ON(xa, !xa_empty(xa)); 701 + } 702 + } 703 + } 704 + 705 + static noinline void check_find(struct xarray *xa) 706 + { 707 + unsigned long i, j, k; 708 + 709 + XA_BUG_ON(xa, !xa_empty(xa)); 710 + 711 + /* 712 + * Check xa_find with all pairs between 0 and 99 inclusive, 713 + * starting at every index between 0 and 99 714 + */ 715 + for (i = 0; i < 100; i++) { 716 + XA_BUG_ON(xa, xa_store_index(xa, i, GFP_KERNEL) != NULL); 717 + xa_set_mark(xa, i, XA_MARK_0); 718 + for (j = 0; j < i; j++) { 719 + XA_BUG_ON(xa, xa_store_index(xa, j, GFP_KERNEL) != 720 + NULL); 721 + xa_set_mark(xa, j, XA_MARK_0); 722 + for (k = 0; k < 100; k++) { 723 + unsigned long index = k; 724 + void *entry = xa_find(xa, &index, ULONG_MAX, 725 + XA_PRESENT); 726 + if (k <= j) 727 + XA_BUG_ON(xa, index != j); 728 + else if (k <= i) 729 + XA_BUG_ON(xa, index != i); 730 + else 731 + XA_BUG_ON(xa, entry != NULL); 732 + 733 + index = k; 734 + entry = xa_find(xa, &index, ULONG_MAX, 735 + XA_MARK_0); 736 + if (k <= j) 737 + XA_BUG_ON(xa, index != j); 738 + else if (k <= i) 739 + XA_BUG_ON(xa, index != i); 740 + else 741 + XA_BUG_ON(xa, entry != NULL); 742 + } 743 + xa_erase_index(xa, j); 744 + XA_BUG_ON(xa, xa_get_mark(xa, j, XA_MARK_0)); 745 + XA_BUG_ON(xa, !xa_get_mark(xa, i, XA_MARK_0)); 746 + } 747 + xa_erase_index(xa, i); 748 + XA_BUG_ON(xa, xa_get_mark(xa, i, XA_MARK_0)); 749 + } 750 + XA_BUG_ON(xa, !xa_empty(xa)); 751 + 
check_multi_find(xa); 752 + check_multi_find_2(xa); 753 + } 754 + 755 + /* See find_swap_entry() in mm/shmem.c */ 756 + static noinline unsigned long xa_find_entry(struct xarray *xa, void *item) 757 + { 758 + XA_STATE(xas, xa, 0); 759 + unsigned int checked = 0; 760 + void *entry; 761 + 762 + rcu_read_lock(); 763 + xas_for_each(&xas, entry, ULONG_MAX) { 764 + if (xas_retry(&xas, entry)) 765 + continue; 766 + if (entry == item) 767 + break; 768 + checked++; 769 + if ((checked % 4) != 0) 770 + continue; 771 + xas_pause(&xas); 772 + } 773 + rcu_read_unlock(); 774 + 775 + return entry ? xas.xa_index : -1; 776 + } 777 + 778 + static noinline void check_find_entry(struct xarray *xa) 779 + { 780 + #ifdef CONFIG_XARRAY_MULTI 781 + unsigned int order; 782 + unsigned long offset, index; 783 + 784 + for (order = 0; order < 20; order++) { 785 + for (offset = 0; offset < (1UL << (order + 3)); 786 + offset += (1UL << order)) { 787 + for (index = 0; index < (1UL << (order + 5)); 788 + index += (1UL << order)) { 789 + xa_store_order(xa, index, order, 790 + xa_mk_value(index), GFP_KERNEL); 791 + XA_BUG_ON(xa, xa_load(xa, index) != 792 + xa_mk_value(index)); 793 + XA_BUG_ON(xa, xa_find_entry(xa, 794 + xa_mk_value(index)) != index); 795 + } 796 + XA_BUG_ON(xa, xa_find_entry(xa, xa) != -1); 797 + xa_destroy(xa); 798 + } 799 + } 800 + #endif 801 + 802 + XA_BUG_ON(xa, xa_find_entry(xa, xa) != -1); 803 + xa_store_index(xa, ULONG_MAX, GFP_KERNEL); 804 + XA_BUG_ON(xa, xa_find_entry(xa, xa) != -1); 805 + XA_BUG_ON(xa, xa_find_entry(xa, xa_mk_value(LONG_MAX)) != -1); 806 + xa_erase_index(xa, ULONG_MAX); 807 + XA_BUG_ON(xa, !xa_empty(xa)); 808 + } 809 + 810 + static noinline void check_move_small(struct xarray *xa, unsigned long idx) 811 + { 812 + XA_STATE(xas, xa, 0); 813 + unsigned long i; 814 + 815 + xa_store_index(xa, 0, GFP_KERNEL); 816 + xa_store_index(xa, idx, GFP_KERNEL); 817 + 818 + rcu_read_lock(); 819 + for (i = 0; i < idx * 4; i++) { 820 + void *entry = xas_next(&xas); 821 + if (i 
<= idx) 822 + XA_BUG_ON(xa, xas.xa_node == XAS_RESTART); 823 + XA_BUG_ON(xa, xas.xa_index != i); 824 + if (i == 0 || i == idx) 825 + XA_BUG_ON(xa, entry != xa_mk_value(i)); 826 + else 827 + XA_BUG_ON(xa, entry != NULL); 828 + } 829 + xas_next(&xas); 830 + XA_BUG_ON(xa, xas.xa_index != i); 831 + 832 + do { 833 + void *entry = xas_prev(&xas); 834 + i--; 835 + if (i <= idx) 836 + XA_BUG_ON(xa, xas.xa_node == XAS_RESTART); 837 + XA_BUG_ON(xa, xas.xa_index != i); 838 + if (i == 0 || i == idx) 839 + XA_BUG_ON(xa, entry != xa_mk_value(i)); 840 + else 841 + XA_BUG_ON(xa, entry != NULL); 842 + } while (i > 0); 843 + 844 + xas_set(&xas, ULONG_MAX); 845 + XA_BUG_ON(xa, xas_next(&xas) != NULL); 846 + XA_BUG_ON(xa, xas.xa_index != ULONG_MAX); 847 + XA_BUG_ON(xa, xas_next(&xas) != xa_mk_value(0)); 848 + XA_BUG_ON(xa, xas.xa_index != 0); 849 + XA_BUG_ON(xa, xas_prev(&xas) != NULL); 850 + XA_BUG_ON(xa, xas.xa_index != ULONG_MAX); 851 + rcu_read_unlock(); 852 + 853 + xa_erase_index(xa, 0); 854 + xa_erase_index(xa, idx); 855 + XA_BUG_ON(xa, !xa_empty(xa)); 856 + } 857 + 858 + static noinline void check_move(struct xarray *xa) 859 + { 860 + XA_STATE(xas, xa, (1 << 16) - 1); 861 + unsigned long i; 862 + 863 + for (i = 0; i < (1 << 16); i++) 864 + XA_BUG_ON(xa, xa_store_index(xa, i, GFP_KERNEL) != NULL); 865 + 866 + rcu_read_lock(); 867 + do { 868 + void *entry = xas_prev(&xas); 869 + i--; 870 + XA_BUG_ON(xa, entry != xa_mk_value(i)); 871 + XA_BUG_ON(xa, i != xas.xa_index); 872 + } while (i != 0); 873 + 874 + XA_BUG_ON(xa, xas_prev(&xas) != NULL); 875 + XA_BUG_ON(xa, xas.xa_index != ULONG_MAX); 876 + 877 + do { 878 + void *entry = xas_next(&xas); 879 + XA_BUG_ON(xa, entry != xa_mk_value(i)); 880 + XA_BUG_ON(xa, i != xas.xa_index); 881 + i++; 882 + } while (i < (1 << 16)); 883 + rcu_read_unlock(); 884 + 885 + for (i = (1 << 8); i < (1 << 15); i++) 886 + xa_erase_index(xa, i); 887 + 888 + i = xas.xa_index; 889 + 890 + rcu_read_lock(); 891 + do { 892 + void *entry = xas_prev(&xas); 893 + 
		i--;
		if ((i < (1 << 8)) || (i >= (1 << 15)))
			XA_BUG_ON(xa, entry != xa_mk_value(i));
		else
			XA_BUG_ON(xa, entry != NULL);
		XA_BUG_ON(xa, i != xas.xa_index);
	} while (i != 0);

	XA_BUG_ON(xa, xas_prev(&xas) != NULL);
	XA_BUG_ON(xa, xas.xa_index != ULONG_MAX);

	do {
		void *entry = xas_next(&xas);
		if ((i < (1 << 8)) || (i >= (1 << 15)))
			XA_BUG_ON(xa, entry != xa_mk_value(i));
		else
			XA_BUG_ON(xa, entry != NULL);
		XA_BUG_ON(xa, i != xas.xa_index);
		i++;
	} while (i < (1 << 16));
	rcu_read_unlock();

	xa_destroy(xa);

	for (i = 0; i < 16; i++)
		check_move_small(xa, 1UL << i);

	for (i = 2; i < 16; i++)
		check_move_small(xa, (1UL << i) - 1);
}

static noinline void xa_store_many_order(struct xarray *xa,
		unsigned long index, unsigned order)
{
	XA_STATE_ORDER(xas, xa, index, order);
	unsigned int i = 0;

	do {
		xas_lock(&xas);
		XA_BUG_ON(xa, xas_find_conflict(&xas));
		xas_create_range(&xas);
		if (xas_error(&xas))
			goto unlock;
		for (i = 0; i < (1U << order); i++) {
			XA_BUG_ON(xa, xas_store(&xas, xa_mk_value(index + i)));
			xas_next(&xas);
		}
unlock:
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	XA_BUG_ON(xa, xas_error(&xas));
}

static noinline void check_create_range_1(struct xarray *xa,
		unsigned long index, unsigned order)
{
	unsigned long i;

	xa_store_many_order(xa, index, order);
	for (i = index; i < index + (1UL << order); i++)
		xa_erase_index(xa, i);
	XA_BUG_ON(xa, !xa_empty(xa));
}

static noinline void check_create_range_2(struct xarray *xa, unsigned order)
{
	unsigned long i;
	unsigned long nr = 1UL << order;

	for (i = 0; i < nr * nr; i += nr)
		xa_store_many_order(xa, i, order);
	for (i = 0; i < nr * nr; i++)
		xa_erase_index(xa, i);
	XA_BUG_ON(xa, !xa_empty(xa));
}

static noinline void check_create_range_3(void)
{
	XA_STATE(xas, NULL, 0);
	xas_set_err(&xas, -EEXIST);
	xas_create_range(&xas);
	XA_BUG_ON(NULL, xas_error(&xas) != -EEXIST);
}

static noinline void check_create_range_4(struct xarray *xa,
		unsigned long index, unsigned order)
{
	XA_STATE_ORDER(xas, xa, index, order);
	unsigned long base = xas.xa_index;
	unsigned long i = 0;

	xa_store_index(xa, index, GFP_KERNEL);
	do {
		xas_lock(&xas);
		xas_create_range(&xas);
		if (xas_error(&xas))
			goto unlock;
		for (i = 0; i < (1UL << order); i++) {
			void *old = xas_store(&xas, xa_mk_value(base + i));
			if (xas.xa_index == index)
				XA_BUG_ON(xa, old != xa_mk_value(base + i));
			else
				XA_BUG_ON(xa, old != NULL);
			xas_next(&xas);
		}
unlock:
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	XA_BUG_ON(xa, xas_error(&xas));

	for (i = base; i < base + (1UL << order); i++)
		xa_erase_index(xa, i);
	XA_BUG_ON(xa, !xa_empty(xa));
}

static noinline void check_create_range(struct xarray *xa)
{
	unsigned int order;
	unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ?
			12 : 1;

	for (order = 0; order < max_order; order++) {
		check_create_range_1(xa, 0, order);
		check_create_range_1(xa, 1U << order, order);
		check_create_range_1(xa, 2U << order, order);
		check_create_range_1(xa, 3U << order, order);
		check_create_range_1(xa, 1U << 24, order);
		if (order < 10)
			check_create_range_2(xa, order);

		check_create_range_4(xa, 0, order);
		check_create_range_4(xa, 1U << order, order);
		check_create_range_4(xa, 2U << order, order);
		check_create_range_4(xa, 3U << order, order);
		check_create_range_4(xa, 1U << 24, order);

		check_create_range_4(xa, 1, order);
		check_create_range_4(xa, (1U << order) + 1, order);
		check_create_range_4(xa, (2U << order) + 1, order);
		check_create_range_4(xa, (2U << order) - 1, order);
		check_create_range_4(xa, (3U << order) + 1, order);
		check_create_range_4(xa, (3U << order) - 1, order);
		check_create_range_4(xa, (1U << 24) + 1, order);
	}

	check_create_range_3();
}

static noinline void __check_store_range(struct xarray *xa, unsigned long first,
		unsigned long last)
{
#ifdef CONFIG_XARRAY_MULTI
	xa_store_range(xa, first, last, xa_mk_value(first), GFP_KERNEL);

	XA_BUG_ON(xa, xa_load(xa, first) != xa_mk_value(first));
	XA_BUG_ON(xa, xa_load(xa, last) != xa_mk_value(first));
	XA_BUG_ON(xa, xa_load(xa, first - 1) != NULL);
	XA_BUG_ON(xa, xa_load(xa, last + 1) != NULL);

	xa_store_range(xa, first, last, NULL, GFP_KERNEL);
#endif

	XA_BUG_ON(xa, !xa_empty(xa));
}

static noinline void check_store_range(struct xarray *xa)
{
	unsigned long i, j;

	for (i = 0; i < 128; i++) {
		for (j = i; j < 128; j++) {
			__check_store_range(xa, i, j);
			__check_store_range(xa, 128 + i, 128 + j);
			__check_store_range(xa, 4095 + i, 4095 + j);
			__check_store_range(xa, 4096 + i, 4096 + j);
			__check_store_range(xa, 123456 + i, 123456 + j);
			__check_store_range(xa, UINT_MAX + i, UINT_MAX + j);
		}
	}
}

static LIST_HEAD(shadow_nodes);

static void test_update_node(struct xa_node *node)
{
	if (node->count && node->count == node->nr_values) {
		if (list_empty(&node->private_list))
			list_add(&shadow_nodes, &node->private_list);
	} else {
		if (!list_empty(&node->private_list))
			list_del_init(&node->private_list);
	}
}

static noinline void shadow_remove(struct xarray *xa)
{
	struct xa_node *node;

	xa_lock(xa);
	while ((node = list_first_entry_or_null(&shadow_nodes,
					struct xa_node, private_list))) {
		XA_STATE(xas, node->array, 0);
		XA_BUG_ON(xa, node->array != xa);
		list_del_init(&node->private_list);
		xas.xa_node = xa_parent_locked(node->array, node);
		xas.xa_offset = node->offset;
		xas.xa_shift = node->shift + XA_CHUNK_SHIFT;
		xas_set_update(&xas, test_update_node);
		xas_store(&xas, NULL);
	}
	xa_unlock(xa);
}

static noinline void check_workingset(struct xarray *xa, unsigned long index)
{
	XA_STATE(xas, xa, index);
	xas_set_update(&xas, test_update_node);

	do {
		xas_lock(&xas);
		xas_store(&xas, xa_mk_value(0));
		xas_next(&xas);
		xas_store(&xas, xa_mk_value(1));
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	XA_BUG_ON(xa, list_empty(&shadow_nodes));

	xas_lock(&xas);
	xas_next(&xas);
	xas_store(&xas, &xas);
	XA_BUG_ON(xa, !list_empty(&shadow_nodes));

	xas_store(&xas, xa_mk_value(2));
	xas_unlock(&xas);
	XA_BUG_ON(xa, list_empty(&shadow_nodes));

	shadow_remove(xa);
	XA_BUG_ON(xa, !list_empty(&shadow_nodes));
	XA_BUG_ON(xa, !xa_empty(xa));
}

/*
 * Check that the pointer / value / sibling entries are accounted the
 * way we expect them to be.
 */
static noinline void check_account(struct xarray *xa)
{
#ifdef CONFIG_XARRAY_MULTI
	unsigned int order;

	for (order = 1; order < 12; order++) {
		XA_STATE(xas, xa, 1 << order);

		xa_store_order(xa, 0, order, xa, GFP_KERNEL);
		xas_load(&xas);
		XA_BUG_ON(xa, xas.xa_node->count == 0);
		XA_BUG_ON(xa, xas.xa_node->count > (1 << order));
		XA_BUG_ON(xa, xas.xa_node->nr_values != 0);

		xa_store_order(xa, 1 << order, order, xa_mk_value(1 << order),
				GFP_KERNEL);
		XA_BUG_ON(xa, xas.xa_node->count != xas.xa_node->nr_values * 2);

		xa_erase(xa, 1 << order);
		XA_BUG_ON(xa, xas.xa_node->nr_values != 0);

		xa_erase(xa, 0);
		XA_BUG_ON(xa, !xa_empty(xa));
	}
#endif
}

static noinline void check_destroy(struct xarray *xa)
{
	unsigned long index;

	XA_BUG_ON(xa, !xa_empty(xa));

	/* Destroying an empty array is a no-op */
	xa_destroy(xa);
	XA_BUG_ON(xa, !xa_empty(xa));

	/* Destroying an array with a single entry */
	for (index = 0; index < 1000; index++) {
		xa_store_index(xa, index, GFP_KERNEL);
		XA_BUG_ON(xa, xa_empty(xa));
		xa_destroy(xa);
		XA_BUG_ON(xa, !xa_empty(xa));
	}

	/* Destroying an array with a single entry at ULONG_MAX */
	xa_store(xa, ULONG_MAX, xa, GFP_KERNEL);
	XA_BUG_ON(xa, xa_empty(xa));
	xa_destroy(xa);
	XA_BUG_ON(xa, !xa_empty(xa));

#ifdef CONFIG_XARRAY_MULTI
	/* Destroying an array with a multi-index entry */
	xa_store_order(xa, 1 << 11, 11, xa, GFP_KERNEL);
	XA_BUG_ON(xa, xa_empty(xa));
	xa_destroy(xa);
	XA_BUG_ON(xa, !xa_empty(xa));
#endif
}

static DEFINE_XARRAY(array);

static int xarray_checks(void)
{
	check_xa_err(&array);
	check_xas_retry(&array);
	check_xa_load(&array);
	check_xa_mark(&array);
	check_xa_shrink(&array);
	check_xas_erase(&array);
	check_cmpxchg(&array);
	check_reserve(&array);
	check_multi_store(&array);
	check_xa_alloc();
	check_find(&array);
	check_find_entry(&array);
	check_account(&array);
	check_destroy(&array);
	check_move(&array);
	check_create_range(&array);
	check_store_range(&array);
	check_store_iter(&array);

	check_workingset(&array, 0);
	check_workingset(&array, 64);
	check_workingset(&array, 4096);

	printk("XArray: %u of %u tests passed\n", tests_passed, tests_run);
	return (tests_run == tests_passed) ? 0 : -EINVAL;
}

static void xarray_exit(void)
{
}

module_init(xarray_checks);
module_exit(xarray_exit);
MODULE_AUTHOR("Matthew Wilcox <willy@infradead.org>");
MODULE_LICENSE("GPL");
lib/xarray.c
// SPDX-License-Identifier: GPL-2.0+
/*
 * XArray implementation
 * Copyright (c) 2017 Microsoft Corporation
 * Author: Matthew Wilcox <willy@infradead.org>
 */

#include <linux/bitmap.h>
#include <linux/export.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/xarray.h>

/*
 * Coding conventions in this file:
 *
 * @xa is used to refer to the entire xarray.
 * @xas is the 'xarray operation state'.  It may be either a pointer to
 * an xa_state, or an xa_state stored on the stack.  This is an unfortunate
 * ambiguity.
 * @index is the index of the entry being operated on
 * @mark is an xa_mark_t; a small number indicating one of the mark bits.
 * @node refers to an xa_node; usually the primary one being operated on by
 * this function.
 * @offset is the index into the slots array inside an xa_node.
 * @parent refers to the @xa_node closer to the head than @node.
 * @entry refers to something stored in a slot in the xarray
 */

static inline unsigned int xa_lock_type(const struct xarray *xa)
{
	return (__force unsigned int)xa->xa_flags & 3;
}

static inline void xas_lock_type(struct xa_state *xas, unsigned int lock_type)
{
	if (lock_type == XA_LOCK_IRQ)
		xas_lock_irq(xas);
	else if (lock_type == XA_LOCK_BH)
		xas_lock_bh(xas);
	else
		xas_lock(xas);
}

static inline void xas_unlock_type(struct xa_state *xas, unsigned int lock_type)
{
	if (lock_type == XA_LOCK_IRQ)
		xas_unlock_irq(xas);
	else if (lock_type == XA_LOCK_BH)
		xas_unlock_bh(xas);
	else
		xas_unlock(xas);
}

static inline bool xa_track_free(const struct xarray *xa)
{
	return xa->xa_flags & XA_FLAGS_TRACK_FREE;
}

static inline void xa_mark_set(struct xarray *xa, xa_mark_t mark)
{
	if (!(xa->xa_flags & XA_FLAGS_MARK(mark)))
		xa->xa_flags |= XA_FLAGS_MARK(mark);
}

static inline void xa_mark_clear(struct xarray *xa, xa_mark_t mark)
{
	if (xa->xa_flags & XA_FLAGS_MARK(mark))
		xa->xa_flags &= ~(XA_FLAGS_MARK(mark));
}

static inline unsigned long *node_marks(struct xa_node *node, xa_mark_t mark)
{
	return node->marks[(__force unsigned)mark];
}

static inline bool node_get_mark(struct xa_node *node,
		unsigned int offset, xa_mark_t mark)
{
	return test_bit(offset, node_marks(node, mark));
}

/* returns true if the bit was set */
static inline bool node_set_mark(struct xa_node *node, unsigned int offset,
				xa_mark_t mark)
{
	return __test_and_set_bit(offset, node_marks(node, mark));
}

/* returns true if the bit was set */
static inline bool node_clear_mark(struct xa_node *node, unsigned int offset,
				xa_mark_t mark)
{
	return __test_and_clear_bit(offset, node_marks(node, mark));
}

static inline bool node_any_mark(struct xa_node *node, xa_mark_t mark)
{
	return !bitmap_empty(node_marks(node, mark), XA_CHUNK_SIZE);
}

static inline void node_mark_all(struct xa_node *node, xa_mark_t mark)
{
	bitmap_fill(node_marks(node, mark), XA_CHUNK_SIZE);
}

#define mark_inc(mark) do { \
	mark = (__force xa_mark_t)((__force unsigned)(mark) + 1); \
} while (0)

/*
 * xas_squash_marks() - Merge all marks to the first entry
 * @xas: Array operation state.
 *
 * Set a mark on the first entry if any entry has it set.  Clear marks on
 * all sibling entries.
 */
static void xas_squash_marks(const struct xa_state *xas)
{
	unsigned int mark = 0;
	unsigned int limit = xas->xa_offset + xas->xa_sibs + 1;

	if (!xas->xa_sibs)
		return;

	do {
		unsigned long *marks = xas->xa_node->marks[mark];
		if (find_next_bit(marks, limit, xas->xa_offset + 1) == limit)
			continue;
		__set_bit(xas->xa_offset, marks);
		bitmap_clear(marks, xas->xa_offset + 1, xas->xa_sibs);
	} while (mark++ != (__force unsigned)XA_MARK_MAX);
}

/* extracts the offset within this node from the index */
static unsigned int get_offset(unsigned long index, struct xa_node *node)
{
	return (index >> node->shift) & XA_CHUNK_MASK;
}

static void xas_set_offset(struct xa_state *xas)
{
	xas->xa_offset = get_offset(xas->xa_index, xas->xa_node);
}

/* move the index either forwards (find) or backwards (sibling slot) */
static void xas_move_index(struct xa_state *xas, unsigned long offset)
{
	unsigned int shift = xas->xa_node->shift;
	xas->xa_index &= ~XA_CHUNK_MASK << shift;
	xas->xa_index += offset << shift;
}

static void xas_advance(struct xa_state *xas)
{
	xas->xa_offset++;
	xas_move_index(xas, xas->xa_offset);
}

static void *set_bounds(struct xa_state *xas)
{
	xas->xa_node = XAS_BOUNDS;
	return NULL;
}

/*
 * Starts a walk.  If the @xas is already valid, we assume that it's on
 * the right path and just return where we've got to.  If we're in an
 * error state, return NULL.  If the index is outside the current scope
 * of the xarray, return NULL without changing @xas->xa_node.  Otherwise
 * set @xas->xa_node to NULL and return the current head of the array.
 */
static void *xas_start(struct xa_state *xas)
{
	void *entry;

	if (xas_valid(xas))
		return xas_reload(xas);
	if (xas_error(xas))
		return NULL;

	entry = xa_head(xas->xa);
	if (!xa_is_node(entry)) {
		if (xas->xa_index)
			return set_bounds(xas);
	} else {
		if ((xas->xa_index >> xa_to_node(entry)->shift) > XA_CHUNK_MASK)
			return set_bounds(xas);
	}

	xas->xa_node = NULL;
	return entry;
}

static void *xas_descend(struct xa_state *xas, struct xa_node *node)
{
	unsigned int offset = get_offset(xas->xa_index, node);
	void *entry = xa_entry(xas->xa, node, offset);

	xas->xa_node = node;
	if (xa_is_sibling(entry)) {
		offset = xa_to_sibling(entry);
		entry = xa_entry(xas->xa, node, offset);
	}

	xas->xa_offset = offset;
	return entry;
}

/**
 * xas_load() - Load an entry from the XArray (advanced).
 * @xas: XArray operation state.
 *
 * Usually walks the @xas to the appropriate state to load the entry
 * stored at xa_index.  However, it will do nothing and return %NULL if
 * @xas is in an error state.  xas_load() will never expand the tree.
 *
 * If the xa_state is set up to operate on a multi-index entry, xas_load()
 * may return %NULL or an internal entry, even if there are entries
 * present within the range specified by @xas.
 *
 * Context: Any context.  The caller should hold the xa_lock or the RCU lock.
 * Return: Usually an entry in the XArray, but see description for exceptions.
 */
void *xas_load(struct xa_state *xas)
{
	void *entry = xas_start(xas);

	while (xa_is_node(entry)) {
		struct xa_node *node = xa_to_node(entry);

		if (xas->xa_shift > node->shift)
			break;
		entry = xas_descend(xas, node);
	}
	return entry;
}
EXPORT_SYMBOL_GPL(xas_load);

/* Move the radix tree node cache here */
extern struct kmem_cache *radix_tree_node_cachep;
extern void radix_tree_node_rcu_free(struct rcu_head *head);

#define XA_RCU_FREE	((struct xarray *)1)

static void xa_node_free(struct xa_node *node)
{
	XA_NODE_BUG_ON(node, !list_empty(&node->private_list));
	node->array = XA_RCU_FREE;
	call_rcu(&node->rcu_head, radix_tree_node_rcu_free);
}

/*
 * xas_destroy() - Free any resources allocated during the XArray operation.
 * @xas: XArray operation state.
 *
 * This function is now internal-only.
 */
static void xas_destroy(struct xa_state *xas)
{
	struct xa_node *node = xas->xa_alloc;

	if (!node)
		return;
	XA_NODE_BUG_ON(node, !list_empty(&node->private_list));
	kmem_cache_free(radix_tree_node_cachep, node);
	xas->xa_alloc = NULL;
}

/**
 * xas_nomem() - Allocate memory if needed.
 * @xas: XArray operation state.
 * @gfp: Memory allocation flags.
 *
 * If we need to add new nodes to the XArray, we try to allocate memory
 * with GFP_NOWAIT while holding the lock, which will usually succeed.
 * If it fails, @xas is flagged as needing memory to continue.  The caller
 * should drop the lock and call xas_nomem().  If xas_nomem() succeeds,
 * the caller should retry the operation.
 *
 * Forward progress is guaranteed as one node is allocated here and
 * stored in the xa_state where it will be found by xas_alloc().  More
 * nodes will likely be found in the slab allocator, but we do not tie
 * them up here.
 *
 * Return: true if memory was needed, and was successfully allocated.
 */
bool xas_nomem(struct xa_state *xas, gfp_t gfp)
{
	if (xas->xa_node != XA_ERROR(-ENOMEM)) {
		xas_destroy(xas);
		return false;
	}
	xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
	if (!xas->xa_alloc)
		return false;
	XA_NODE_BUG_ON(xas->xa_alloc, !list_empty(&xas->xa_alloc->private_list));
	xas->xa_node = XAS_RESTART;
	return true;
}
EXPORT_SYMBOL_GPL(xas_nomem);

/*
 * __xas_nomem() - Drop locks and allocate memory if needed.
 * @xas: XArray operation state.
 * @gfp: Memory allocation flags.
 *
 * Internal variant of xas_nomem().
 *
 * Return: true if memory was needed, and was successfully allocated.
 */
static bool __xas_nomem(struct xa_state *xas, gfp_t gfp)
	__must_hold(xas->xa->xa_lock)
{
	unsigned int lock_type = xa_lock_type(xas->xa);

	if (xas->xa_node != XA_ERROR(-ENOMEM)) {
		xas_destroy(xas);
		return false;
	}
	if (gfpflags_allow_blocking(gfp)) {
		xas_unlock_type(xas, lock_type);
		xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
		xas_lock_type(xas, lock_type);
	} else {
		xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
	}
	if (!xas->xa_alloc)
		return false;
	XA_NODE_BUG_ON(xas->xa_alloc, !list_empty(&xas->xa_alloc->private_list));
	xas->xa_node = XAS_RESTART;
	return true;
}

static void xas_update(struct xa_state *xas, struct xa_node *node)
{
	if (xas->xa_update)
		xas->xa_update(node);
	else
		XA_NODE_BUG_ON(node, !list_empty(&node->private_list));
}

static void *xas_alloc(struct xa_state *xas, unsigned int shift)
{
	struct xa_node *parent = xas->xa_node;
	struct xa_node *node = xas->xa_alloc;

	if (xas_invalid(xas))
		return NULL;

	if (node) {
		xas->xa_alloc = NULL;
	} else {
		node = kmem_cache_alloc(radix_tree_node_cachep,
					GFP_NOWAIT | __GFP_NOWARN);
		if (!node) {
			xas_set_err(xas, -ENOMEM);
			return NULL;
		}
	}

	if (parent) {
		node->offset = xas->xa_offset;
		parent->count++;
		XA_NODE_BUG_ON(node, parent->count > XA_CHUNK_SIZE);
		xas_update(xas, parent);
	}
	XA_NODE_BUG_ON(node, shift > BITS_PER_LONG);
	XA_NODE_BUG_ON(node, !list_empty(&node->private_list));
	node->shift = shift;
	node->count = 0;
	node->nr_values = 0;
	RCU_INIT_POINTER(node->parent, xas->xa_node);
	node->array = xas->xa;

	return node;
}

#ifdef CONFIG_XARRAY_MULTI
/* Returns the number of indices covered by a given xa_state */
static unsigned long xas_size(const struct xa_state *xas)
{
	return (xas->xa_sibs + 1UL) << xas->xa_shift;
}
#endif

/*
 * Use this to calculate the maximum index that will need to be created
 * in order to add the entry described by @xas.  Because we cannot store a
 * multiple-index entry at index 0, the calculation is a little more complex
 * than you might expect.
 */
static unsigned long xas_max(struct xa_state *xas)
{
	unsigned long max = xas->xa_index;

#ifdef CONFIG_XARRAY_MULTI
	if (xas->xa_shift || xas->xa_sibs) {
		unsigned long mask = xas_size(xas) - 1;
		max |= mask;
		if (mask == max)
			max++;
	}
#endif

	return max;
}

/* The maximum index that can be contained in the array without expanding it */
static unsigned long max_index(void *entry)
{
	if (!xa_is_node(entry))
		return 0;
	return (XA_CHUNK_SIZE << xa_to_node(entry)->shift) - 1;
}

static void xas_shrink(struct xa_state *xas)
{
	struct xarray *xa = xas->xa;
	struct xa_node *node = xas->xa_node;

	for (;;) {
		void *entry;

		XA_NODE_BUG_ON(node, node->count > XA_CHUNK_SIZE);
		if (node->count != 1)
			break;
		entry = xa_entry_locked(xa, node, 0);
		if (!entry)
			break;
		if (!xa_is_node(entry) && node->shift)
			break;
		xas->xa_node = XAS_BOUNDS;

		RCU_INIT_POINTER(xa->xa_head, entry);
		if (xa_track_free(xa) && !node_get_mark(node, 0, XA_FREE_MARK))
			xa_mark_clear(xa, XA_FREE_MARK);

		node->count = 0;
		node->nr_values = 0;
		if (!xa_is_node(entry))
			RCU_INIT_POINTER(node->slots[0], XA_RETRY_ENTRY);
		xas_update(xas, node);
		xa_node_free(node);
		if (!xa_is_node(entry))
			break;
		node = xa_to_node(entry);
		node->parent = NULL;
	}
}

/*
 * xas_delete_node() - Attempt to delete an xa_node
 * @xas: Array operation state.
 *
 * Attempts to delete the @xas->xa_node.  This will fail if xa->node has
 * a non-zero reference count.
 */
static void xas_delete_node(struct xa_state *xas)
{
	struct xa_node *node = xas->xa_node;

	for (;;) {
		struct xa_node *parent;

		XA_NODE_BUG_ON(node, node->count > XA_CHUNK_SIZE);
		if (node->count)
			break;

		parent = xa_parent_locked(xas->xa, node);
		xas->xa_node = parent;
		xas->xa_offset = node->offset;
		xa_node_free(node);

		if (!parent) {
			xas->xa->xa_head = NULL;
			xas->xa_node = XAS_BOUNDS;
			return;
		}

		parent->slots[xas->xa_offset] = NULL;
		parent->count--;
		XA_NODE_BUG_ON(parent, parent->count > XA_CHUNK_SIZE);
		node = parent;
		xas_update(xas, node);
	}

	if (!node->parent)
		xas_shrink(xas);
}

/**
 * xas_free_nodes() - Free this node and all nodes that it references
 * @xas: Array operation state.
 * @top: Node to free
 *
 * This node has been removed from the tree.  We must now free it and all
 * of its subnodes.  There may be RCU walkers with references into the tree,
 * so we must replace all entries with retry markers.
 */
static void xas_free_nodes(struct xa_state *xas, struct xa_node *top)
{
	unsigned int offset = 0;
	struct xa_node *node = top;

	for (;;) {
		void *entry = xa_entry_locked(xas->xa, node, offset);

		if (xa_is_node(entry)) {
			node = xa_to_node(entry);
			offset = 0;
			continue;
		}
		if (entry)
			RCU_INIT_POINTER(node->slots[offset], XA_RETRY_ENTRY);
		offset++;
		while (offset == XA_CHUNK_SIZE) {
			struct xa_node *parent;

			parent = xa_parent_locked(xas->xa, node);
			offset = node->offset + 1;
			node->count = 0;
			node->nr_values = 0;
			xas_update(xas, node);
			xa_node_free(node);
			if (node == top)
				return;
			node = parent;
		}
	}
}

/*
 * xas_expand adds nodes to the head of the tree until it has reached
 * sufficient height to be able to contain @xas->xa_index
 */
static int xas_expand(struct xa_state *xas, void *head)
{
	struct xarray *xa = xas->xa;
	struct xa_node *node = NULL;
	unsigned int shift = 0;
	unsigned long max = xas_max(xas);

	if (!head) {
		if (max == 0)
			return 0;
		while ((max >> shift) >= XA_CHUNK_SIZE)
			shift += XA_CHUNK_SHIFT;
		return shift + XA_CHUNK_SHIFT;
	} else if (xa_is_node(head)) {
		node = xa_to_node(head);
		shift = node->shift + XA_CHUNK_SHIFT;
	}
	xas->xa_node = NULL;

	while (max > max_index(head)) {
		xa_mark_t mark = 0;

		XA_NODE_BUG_ON(node, shift > BITS_PER_LONG);
		node = xas_alloc(xas, shift);
		if (!node)
			return -ENOMEM;

		node->count = 1;
		if (xa_is_value(head))
			node->nr_values = 1;
		RCU_INIT_POINTER(node->slots[0], head);

		/* Propagate the aggregated mark info to the new child */
		for (;;) {
			if (xa_track_free(xa) && mark == XA_FREE_MARK) {
				node_mark_all(node, XA_FREE_MARK);
				if (!xa_marked(xa, XA_FREE_MARK)) {
					node_clear_mark(node, 0, XA_FREE_MARK);
					xa_mark_set(xa, XA_FREE_MARK);
				}
			} else if (xa_marked(xa, mark)) {
				node_set_mark(node, 0, mark);
			}
			if (mark == XA_MARK_MAX)
				break;
			mark_inc(mark);
		}

		/*
		 * Now that the new node is fully initialised, we can add
		 * it to the tree
		 */
		if (xa_is_node(head)) {
			xa_to_node(head)->offset = 0;
			rcu_assign_pointer(xa_to_node(head)->parent, node);
		}
		head = xa_mk_node(node);
		rcu_assign_pointer(xa->xa_head, head);
		xas_update(xas, node);

		shift += XA_CHUNK_SHIFT;
	}

	xas->xa_node = node;
	return shift;
}

/*
 * xas_create() - Create a slot to store an entry in.
 * @xas: XArray operation state.
 *
 * Most users will not need to call this function directly, as it is called
 * by xas_store().  It is useful for doing conditional store operations
 * (see the xa_cmpxchg() implementation for an example).
 *
 * Return: If the slot already existed, returns the contents of this slot.
 * If the slot was newly created, returns NULL.  If it failed to create the
 * slot, returns NULL and indicates the error in @xas.
 */
static void *xas_create(struct xa_state *xas)
{
	struct xarray *xa = xas->xa;
	void *entry;
	void __rcu **slot;
	struct xa_node *node = xas->xa_node;
	int shift;
	unsigned int order = xas->xa_shift;

	if (xas_top(node)) {
		entry = xa_head_locked(xa);
		xas->xa_node = NULL;
		shift = xas_expand(xas, entry);
		if (shift < 0)
			return NULL;
		entry = xa_head_locked(xa);
		slot = &xa->xa_head;
	} else if (xas_error(xas)) {
		return NULL;
	} else if (node) {
		unsigned int offset = xas->xa_offset;

		shift = node->shift;
		entry = xa_entry_locked(xa, node, offset);
		slot = &node->slots[offset];
	} else {
		shift = 0;
		entry = xa_head_locked(xa);
		slot = &xa->xa_head;
	}

	while (shift > order) {
		shift -= XA_CHUNK_SHIFT;
		if (!entry) {
			node = xas_alloc(xas, shift);
			if (!node)
				break;
			if (xa_track_free(xa))
				node_mark_all(node, XA_FREE_MARK);
			rcu_assign_pointer(*slot, xa_mk_node(node));
		} else if (xa_is_node(entry)) {
			node = xa_to_node(entry);
		} else {
			break;
		}
		entry = xas_descend(xas, node);
		slot = &node->slots[xas->xa_offset];
	}

	return entry;
}

/**
 * xas_create_range() - Ensure that stores to this range will succeed
 * @xas: XArray operation state.
 *
 * Creates all of the slots in the range covered by @xas.  Sets @xas to
 * create single-index entries and positions it at the beginning of the
 * range.  This is for the benefit of users which have not yet been
 * converted to use multi-index entries.
 */
void xas_create_range(struct xa_state *xas)
{
	unsigned long index = xas->xa_index;
	unsigned char shift = xas->xa_shift;
	unsigned char sibs = xas->xa_sibs;

	xas->xa_index |= ((sibs + 1) << shift) - 1;
	if (xas_is_node(xas) && xas->xa_node->shift == xas->xa_shift)
		xas->xa_offset |= sibs;
	xas->xa_shift = 0;
	xas->xa_sibs = 0;

	for (;;) {
		xas_create(xas);
		if (xas_error(xas))
			goto restore;
		if (xas->xa_index <= (index | XA_CHUNK_MASK))
			goto success;
		xas->xa_index -= XA_CHUNK_SIZE;

		for (;;) {
			struct xa_node *node = xas->xa_node;
			xas->xa_node = xa_parent_locked(xas->xa, node);
			xas->xa_offset = node->offset - 1;
			if (node->offset != 0)
				break;
		}
	}

restore:
	xas->xa_shift = shift;
	xas->xa_sibs = sibs;
	xas->xa_index = index;
	return;
success:
	xas->xa_index = index;
	if (xas->xa_node)
		xas_set_offset(xas);
}
EXPORT_SYMBOL_GPL(xas_create_range);

static void update_node(struct xa_state *xas, struct xa_node *node,
		int count, int values)
{
	if (!node || (!count && !values))
		return;

	node->count += count;
	node->nr_values += values;
	XA_NODE_BUG_ON(node, node->count > XA_CHUNK_SIZE);
	XA_NODE_BUG_ON(node, node->nr_values > XA_CHUNK_SIZE);
	xas_update(xas, node);
	if (count < 0)
		xas_delete_node(xas);
}

/**
 * xas_store() - Store this entry in the XArray.
 * @xas: XArray operation state.
 * @entry: New entry.
 *
 * If @xas is operating on a multi-index entry, the entry returned by this
 * function is essentially meaningless (it may be an internal entry or it
 * may be %NULL, even if there are non-NULL entries at some of the indices
 * covered by the range).  This is not a problem for any current users,
 * and can be changed if needed.
 *
 * Return: The old entry at this index.
 */
void *xas_store(struct xa_state *xas, void *entry)
{
	struct xa_node *node;
	void __rcu **slot = &xas->xa->xa_head;
	unsigned int offset, max;
	int count = 0;
	int values = 0;
	void *first, *next;
	bool value = xa_is_value(entry);

	if (entry)
		first = xas_create(xas);
	else
		first = xas_load(xas);

	if (xas_invalid(xas))
		return first;
	node = xas->xa_node;
	if (node && (xas->xa_shift < node->shift))
		xas->xa_sibs = 0;
	if ((first == entry) && !xas->xa_sibs)
		return first;

	next = first;
	offset = xas->xa_offset;
	max = xas->xa_offset + xas->xa_sibs;
	if (node) {
		slot = &node->slots[offset];
		if (xas->xa_sibs)
			xas_squash_marks(xas);
	}
	if (!entry)
		xas_init_marks(xas);

	for (;;) {
		/*
		 * Must clear the marks before setting the entry to NULL,
		 * otherwise xas_for_each_marked may find a NULL entry and
		 * stop early.  rcu_assign_pointer contains a release barrier
		 * so the mark clearing will appear to happen before the
		 * entry is set to NULL.
787 + */ 788 + rcu_assign_pointer(*slot, entry); 789 + if (xa_is_node(next)) 790 + xas_free_nodes(xas, xa_to_node(next)); 791 + if (!node) 792 + break; 793 + count += !next - !entry; 794 + values += !xa_is_value(first) - !value; 795 + if (entry) { 796 + if (offset == max) 797 + break; 798 + if (!xa_is_sibling(entry)) 799 + entry = xa_mk_sibling(xas->xa_offset); 800 + } else { 801 + if (offset == XA_CHUNK_MASK) 802 + break; 803 + } 804 + next = xa_entry_locked(xas->xa, node, ++offset); 805 + if (!xa_is_sibling(next)) { 806 + if (!entry && (offset > max)) 807 + break; 808 + first = next; 809 + } 810 + slot++; 811 + } 812 + 813 + update_node(xas, node, count, values); 814 + return first; 815 + } 816 + EXPORT_SYMBOL_GPL(xas_store); 817 + 818 + /** 819 + * xas_get_mark() - Returns the state of this mark. 820 + * @xas: XArray operation state. 821 + * @mark: Mark number. 822 + * 823 + * Return: true if the mark is set, false if the mark is clear or @xas 824 + * is in an error state. 825 + */ 826 + bool xas_get_mark(const struct xa_state *xas, xa_mark_t mark) 827 + { 828 + if (xas_invalid(xas)) 829 + return false; 830 + if (!xas->xa_node) 831 + return xa_marked(xas->xa, mark); 832 + return node_get_mark(xas->xa_node, xas->xa_offset, mark); 833 + } 834 + EXPORT_SYMBOL_GPL(xas_get_mark); 835 + 836 + /** 837 + * xas_set_mark() - Sets the mark on this entry and its parents. 838 + * @xas: XArray operation state. 839 + * @mark: Mark number. 840 + * 841 + * Sets the specified mark on this entry, and walks up the tree setting it 842 + * on all the ancestor entries. Does nothing if @xas has not been walked to 843 + * an entry, or is in an error state. 
 */
void xas_set_mark(const struct xa_state *xas, xa_mark_t mark)
{
	struct xa_node *node = xas->xa_node;
	unsigned int offset = xas->xa_offset;

	if (xas_invalid(xas))
		return;

	while (node) {
		if (node_set_mark(node, offset, mark))
			return;
		offset = node->offset;
		node = xa_parent_locked(xas->xa, node);
	}

	if (!xa_marked(xas->xa, mark))
		xa_mark_set(xas->xa, mark);
}
EXPORT_SYMBOL_GPL(xas_set_mark);

/**
 * xas_clear_mark() - Clears the mark on this entry and its parents.
 * @xas: XArray operation state.
 * @mark: Mark number.
 *
 * Clears the specified mark on this entry, and walks back to the head
 * attempting to clear it on all the ancestor entries. Does nothing if
 * @xas has not been walked to an entry, or is in an error state.
 */
void xas_clear_mark(const struct xa_state *xas, xa_mark_t mark)
{
	struct xa_node *node = xas->xa_node;
	unsigned int offset = xas->xa_offset;

	if (xas_invalid(xas))
		return;

	while (node) {
		if (!node_clear_mark(node, offset, mark))
			return;
		if (node_any_mark(node, mark))
			return;

		offset = node->offset;
		node = xa_parent_locked(xas->xa, node);
	}

	if (xa_marked(xas->xa, mark))
		xa_mark_clear(xas->xa, mark);
}
EXPORT_SYMBOL_GPL(xas_clear_mark);

/**
 * xas_init_marks() - Initialise all marks for the entry
 * @xas: Array operations state.
 *
 * Initialise all marks for the entry specified by @xas. If we're tracking
 * free entries with a mark, we need to set it on all entries. All other
 * marks are cleared.
 *
 * This implementation is not as efficient as it could be; we may walk
 * up the tree multiple times.
 */
void xas_init_marks(const struct xa_state *xas)
{
	xa_mark_t mark = 0;

	for (;;) {
		if (xa_track_free(xas->xa) && mark == XA_FREE_MARK)
			xas_set_mark(xas, mark);
		else
			xas_clear_mark(xas, mark);
		if (mark == XA_MARK_MAX)
			break;
		mark_inc(mark);
	}
}
EXPORT_SYMBOL_GPL(xas_init_marks);

/**
 * xas_pause() - Pause a walk to drop a lock.
 * @xas: XArray operation state.
 *
 * Some users need to pause a walk and drop the lock they're holding in
 * order to yield to a higher priority thread or carry out an operation
 * on an entry. Those users should call this function before they drop
 * the lock. It resets the @xas to be suitable for the next iteration
 * of the loop after the user has reacquired the lock. If most entries
 * found during a walk require you to call xas_pause(), the xa_for_each()
 * iterator may be more appropriate.
 *
 * Note that xas_pause() only works for forward iteration. If a user needs
 * to pause a reverse iteration, we will need a xas_pause_rev().
 */
void xas_pause(struct xa_state *xas)
{
	struct xa_node *node = xas->xa_node;

	if (xas_invalid(xas))
		return;

	if (node) {
		unsigned int offset = xas->xa_offset;
		while (++offset < XA_CHUNK_SIZE) {
			if (!xa_is_sibling(xa_entry(xas->xa, node, offset)))
				break;
		}
		xas->xa_index += (offset - xas->xa_offset) << node->shift;
	} else {
		xas->xa_index++;
	}
	xas->xa_node = XAS_RESTART;
}
EXPORT_SYMBOL_GPL(xas_pause);

/*
 * __xas_prev() - Find the previous entry in the XArray.
 * @xas: XArray operation state.
 *
 * Helper function for xas_prev() which handles all the complex cases
 * out of line.
 */
void *__xas_prev(struct xa_state *xas)
{
	void *entry;

	if (!xas_frozen(xas->xa_node))
		xas->xa_index--;
	if (xas_not_node(xas->xa_node))
		return xas_load(xas);

	if (xas->xa_offset != get_offset(xas->xa_index, xas->xa_node))
		xas->xa_offset--;

	while (xas->xa_offset == 255) {
		xas->xa_offset = xas->xa_node->offset - 1;
		xas->xa_node = xa_parent(xas->xa, xas->xa_node);
		if (!xas->xa_node)
			return set_bounds(xas);
	}

	for (;;) {
		entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
		if (!xa_is_node(entry))
			return entry;

		xas->xa_node = xa_to_node(entry);
		xas_set_offset(xas);
	}
}
EXPORT_SYMBOL_GPL(__xas_prev);

/*
 * __xas_next() - Find the next entry in the XArray.
 * @xas: XArray operation state.
 *
 * Helper function for xas_next() which handles all the complex cases
 * out of line.
 */
void *__xas_next(struct xa_state *xas)
{
	void *entry;

	if (!xas_frozen(xas->xa_node))
		xas->xa_index++;
	if (xas_not_node(xas->xa_node))
		return xas_load(xas);

	if (xas->xa_offset != get_offset(xas->xa_index, xas->xa_node))
		xas->xa_offset++;

	while (xas->xa_offset == XA_CHUNK_SIZE) {
		xas->xa_offset = xas->xa_node->offset + 1;
		xas->xa_node = xa_parent(xas->xa, xas->xa_node);
		if (!xas->xa_node)
			return set_bounds(xas);
	}

	for (;;) {
		entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
		if (!xa_is_node(entry))
			return entry;

		xas->xa_node = xa_to_node(entry);
		xas_set_offset(xas);
	}
}
EXPORT_SYMBOL_GPL(__xas_next);

/**
 * xas_find() - Find the next present entry in the XArray.
 * @xas: XArray operation state.
 * @max: Highest index to return.
 *
 * If the @xas has not yet been walked to an entry, return the entry
 * which has an index >= xas.xa_index. If it has been walked, the entry
 * currently being pointed at has been processed, and so we move to the
 * next entry.
 *
 * If no entry is found and the array is smaller than @max, the iterator
 * is set to the smallest index not yet in the array. This allows @xas
 * to be immediately passed to xas_store().
 *
 * Return: The entry, if found, otherwise %NULL.
 */
void *xas_find(struct xa_state *xas, unsigned long max)
{
	void *entry;

	if (xas_error(xas))
		return NULL;

	if (!xas->xa_node) {
		xas->xa_index = 1;
		return set_bounds(xas);
	} else if (xas_top(xas->xa_node)) {
		entry = xas_load(xas);
		if (entry || xas_not_node(xas->xa_node))
			return entry;
	} else if (!xas->xa_node->shift &&
		    xas->xa_offset != (xas->xa_index & XA_CHUNK_MASK)) {
		xas->xa_offset = ((xas->xa_index - 1) & XA_CHUNK_MASK) + 1;
	}

	xas_advance(xas);

	while (xas->xa_node && (xas->xa_index <= max)) {
		if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
			xas->xa_offset = xas->xa_node->offset + 1;
			xas->xa_node = xa_parent(xas->xa, xas->xa_node);
			continue;
		}

		entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
		if (xa_is_node(entry)) {
			xas->xa_node = xa_to_node(entry);
			xas->xa_offset = 0;
			continue;
		}
		if (entry && !xa_is_sibling(entry))
			return entry;

		xas_advance(xas);
	}

	if (!xas->xa_node)
		xas->xa_node = XAS_BOUNDS;
	return NULL;
}
EXPORT_SYMBOL_GPL(xas_find);

/**
 * xas_find_marked() - Find the next marked entry in the XArray.
 * @xas: XArray operation state.
 * @max: Highest index to return.
 * @mark: Mark number to search for.
 *
 * If the @xas has not yet been walked to an entry, return the marked entry
 * which has an index >= xas.xa_index. If it has been walked, the entry
 * currently being pointed at has been processed, and so we return the
 * first marked entry with an index > xas.xa_index.
 *
 * If no marked entry is found and the array is smaller than @max, @xas is
 * set to the bounds state and xas->xa_index is set to the smallest index
 * not yet in the array. This allows @xas to be immediately passed to
 * xas_store().
 *
 * If no entry is found before @max is reached, @xas is set to the restart
 * state.
 *
 * Return: The entry, if found, otherwise %NULL.
 */
void *xas_find_marked(struct xa_state *xas, unsigned long max, xa_mark_t mark)
{
	bool advance = true;
	unsigned int offset;
	void *entry;

	if (xas_error(xas))
		return NULL;

	if (!xas->xa_node) {
		xas->xa_index = 1;
		goto out;
	} else if (xas_top(xas->xa_node)) {
		advance = false;
		entry = xa_head(xas->xa);
		xas->xa_node = NULL;
		if (xas->xa_index > max_index(entry))
			goto bounds;
		if (!xa_is_node(entry)) {
			if (xa_marked(xas->xa, mark))
				return entry;
			xas->xa_index = 1;
			goto out;
		}
		xas->xa_node = xa_to_node(entry);
		xas->xa_offset = xas->xa_index >> xas->xa_node->shift;
	}

	while (xas->xa_index <= max) {
		if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
			xas->xa_offset = xas->xa_node->offset + 1;
			xas->xa_node = xa_parent(xas->xa, xas->xa_node);
			if (!xas->xa_node)
				break;
			advance = false;
			continue;
		}

		if (!advance) {
			entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
			if (xa_is_sibling(entry)) {
				xas->xa_offset = xa_to_sibling(entry);
				xas_move_index(xas, xas->xa_offset);
			}
		}

		offset = xas_find_chunk(xas, advance, mark);
		if (offset > xas->xa_offset) {
			advance = false;
			xas_move_index(xas, offset);
			/* Mind the wrap */
			if ((xas->xa_index - 1) >= max)
				goto max;
			xas->xa_offset = offset;
			if (offset == XA_CHUNK_SIZE)
				continue;
		}

		entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
		if (!xa_is_node(entry))
			return entry;
		xas->xa_node = xa_to_node(entry);
		xas_set_offset(xas);
	}

out:
	if (!max)
		goto max;
bounds:
	xas->xa_node = XAS_BOUNDS;
	return NULL;
max:
	xas->xa_node = XAS_RESTART;
	return NULL;
}
EXPORT_SYMBOL_GPL(xas_find_marked);

/**
 * xas_find_conflict() - Find the next present entry in a range.
 * @xas: XArray operation state.
 *
 * The @xas describes both a range and a position within that range.
 *
 * Context: Any context. Expects xa_lock to be held.
 * Return: The next entry in the range covered by @xas or %NULL.
 */
void *xas_find_conflict(struct xa_state *xas)
{
	void *curr;

	if (xas_error(xas))
		return NULL;

	if (!xas->xa_node)
		return NULL;

	if (xas_top(xas->xa_node)) {
		curr = xas_start(xas);
		if (!curr)
			return NULL;
		while (xa_is_node(curr)) {
			struct xa_node *node = xa_to_node(curr);
			curr = xas_descend(xas, node);
		}
		if (curr)
			return curr;
	}

	if (xas->xa_node->shift > xas->xa_shift)
		return NULL;

	for (;;) {
		if (xas->xa_node->shift == xas->xa_shift) {
			if ((xas->xa_offset & xas->xa_sibs) == xas->xa_sibs)
				break;
		} else if (xas->xa_offset == XA_CHUNK_MASK) {
			xas->xa_offset = xas->xa_node->offset;
			xas->xa_node = xa_parent_locked(xas->xa, xas->xa_node);
			if (!xas->xa_node)
				break;
			continue;
		}
		curr = xa_entry_locked(xas->xa, xas->xa_node, ++xas->xa_offset);
		if (xa_is_sibling(curr))
			continue;
		while (xa_is_node(curr)) {
			xas->xa_node = xa_to_node(curr);
			xas->xa_offset = 0;
			curr = xa_entry_locked(xas->xa, xas->xa_node, 0);
		}
		if (curr)
			return curr;
	}
	xas->xa_offset -= xas->xa_sibs;
	return NULL;
}
EXPORT_SYMBOL_GPL(xas_find_conflict);

/**
 * xa_init_flags() - Initialise an empty XArray with flags.
 * @xa: XArray.
 * @flags: XA_FLAG values.
 *
 * If you need to initialise an XArray with special flags (e.g. you need
 * to take the lock from interrupt context), use this function instead
 * of xa_init().
 *
 * Context: Any context.
 */
void xa_init_flags(struct xarray *xa, gfp_t flags)
{
	unsigned int lock_type;
	static struct lock_class_key xa_lock_irq;
	static struct lock_class_key xa_lock_bh;

	spin_lock_init(&xa->xa_lock);
	xa->xa_flags = flags;
	xa->xa_head = NULL;

	lock_type = xa_lock_type(xa);
	if (lock_type == XA_LOCK_IRQ)
		lockdep_set_class(&xa->xa_lock, &xa_lock_irq);
	else if (lock_type == XA_LOCK_BH)
		lockdep_set_class(&xa->xa_lock, &xa_lock_bh);
}
EXPORT_SYMBOL(xa_init_flags);

/**
 * xa_load() - Load an entry from an XArray.
 * @xa: XArray.
 * @index: index into array.
 *
 * Context: Any context. Takes and releases the RCU lock.
 * Return: The entry at @index in @xa.
 */
void *xa_load(struct xarray *xa, unsigned long index)
{
	XA_STATE(xas, xa, index);
	void *entry;

	rcu_read_lock();
	do {
		entry = xas_load(&xas);
		if (xa_is_zero(entry))
			entry = NULL;
	} while (xas_retry(&xas, entry));
	rcu_read_unlock();

	return entry;
}
EXPORT_SYMBOL(xa_load);

static void *xas_result(struct xa_state *xas, void *curr)
{
	if (xa_is_zero(curr))
		return NULL;
	XA_NODE_BUG_ON(xas->xa_node, xa_is_internal(curr));
	if (xas_error(xas))
		curr = xas->xa_node;
	return curr;
}

/**
 * __xa_erase() - Erase this entry from the XArray while locked.
 * @xa: XArray.
 * @index: Index into array.
 *
 * If the entry at this index is a multi-index entry then all indices will
 * be erased, and the entry will no longer be a multi-index entry.
 * This function expects the xa_lock to be held on entry.
 *
 * Context: Any context. Expects xa_lock to be held on entry.
 * Return: The old entry at this index.
 */
void *__xa_erase(struct xarray *xa, unsigned long index)
{
	XA_STATE(xas, xa, index);
	return xas_result(&xas, xas_store(&xas, NULL));
}
EXPORT_SYMBOL_GPL(__xa_erase);

/**
 * xa_store() - Store this entry in the XArray.
 * @xa: XArray.
 * @index: Index into array.
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * After this function returns, loads from this index will return @entry.
 * Storing into an existing multislot entry updates the entry of every index.
 * The marks associated with @index are unaffected unless @entry is %NULL.
 *
 * Context: Process context. Takes and releases the xa_lock. May sleep
 * if the @gfp flags permit.
 * Return: The old entry at this index on success, xa_err(-EINVAL) if @entry
 * cannot be stored in an XArray, or xa_err(-ENOMEM) if memory allocation
 * failed.
 */
void *xa_store(struct xarray *xa, unsigned long index, void *entry, gfp_t gfp)
{
	XA_STATE(xas, xa, index);
	void *curr;

	if (WARN_ON_ONCE(xa_is_internal(entry)))
		return XA_ERROR(-EINVAL);

	do {
		xas_lock(&xas);
		curr = xas_store(&xas, entry);
		if (xa_track_free(xa) && entry)
			xas_clear_mark(&xas, XA_FREE_MARK);
		xas_unlock(&xas);
	} while (xas_nomem(&xas, gfp));

	return xas_result(&xas, curr);
}
EXPORT_SYMBOL(xa_store);

/**
 * __xa_store() - Store this entry in the XArray.
 * @xa: XArray.
 * @index: Index into array.
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * You must already be holding the xa_lock when calling this function.
 * It will drop the lock if needed to allocate memory, and then reacquire
 * it afterwards.
 *
 * Context: Any context. Expects xa_lock to be held on entry. May
 * release and reacquire xa_lock if @gfp flags permit.
 * Return: The old entry at this index or xa_err() if an error happened.
 */
void *__xa_store(struct xarray *xa, unsigned long index, void *entry, gfp_t gfp)
{
	XA_STATE(xas, xa, index);
	void *curr;

	if (WARN_ON_ONCE(xa_is_internal(entry)))
		return XA_ERROR(-EINVAL);

	do {
		curr = xas_store(&xas, entry);
		if (xa_track_free(xa) && entry)
			xas_clear_mark(&xas, XA_FREE_MARK);
	} while (__xas_nomem(&xas, gfp));

	return xas_result(&xas, curr);
}
EXPORT_SYMBOL(__xa_store);

/**
 * xa_cmpxchg() - Conditionally replace an entry in the XArray.
 * @xa: XArray.
 * @index: Index into array.
 * @old: Old value to test against.
 * @entry: New value to place in array.
 * @gfp: Memory allocation flags.
 *
 * If the entry at @index is the same as @old, replace it with @entry.
 * If the return value is equal to @old, then the exchange was successful.
 *
 * Context: Process context. Takes and releases the xa_lock. May sleep
 * if the @gfp flags permit.
 * Return: The old value at this index or xa_err() if an error happened.
 */
void *xa_cmpxchg(struct xarray *xa, unsigned long index,
			void *old, void *entry, gfp_t gfp)
{
	XA_STATE(xas, xa, index);
	void *curr;

	if (WARN_ON_ONCE(xa_is_internal(entry)))
		return XA_ERROR(-EINVAL);

	do {
		xas_lock(&xas);
		curr = xas_load(&xas);
		if (curr == XA_ZERO_ENTRY)
			curr = NULL;
		if (curr == old) {
			xas_store(&xas, entry);
			if (xa_track_free(xa) && entry)
				xas_clear_mark(&xas, XA_FREE_MARK);
		}
		xas_unlock(&xas);
	} while (xas_nomem(&xas, gfp));

	return xas_result(&xas, curr);
}
EXPORT_SYMBOL(xa_cmpxchg);

/**
 * __xa_cmpxchg() - Store this entry in the XArray.
 * @xa: XArray.
 * @index: Index into array.
 * @old: Old value to test against.
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * You must already be holding the xa_lock when calling this function.
 * It will drop the lock if needed to allocate memory, and then reacquire
 * it afterwards.
 *
 * Context: Any context. Expects xa_lock to be held on entry. May
 * release and reacquire xa_lock if @gfp flags permit.
 * Return: The old entry at this index or xa_err() if an error happened.
 */
void *__xa_cmpxchg(struct xarray *xa, unsigned long index,
			void *old, void *entry, gfp_t gfp)
{
	XA_STATE(xas, xa, index);
	void *curr;

	if (WARN_ON_ONCE(xa_is_internal(entry)))
		return XA_ERROR(-EINVAL);

	do {
		curr = xas_load(&xas);
		if (curr == XA_ZERO_ENTRY)
			curr = NULL;
		if (curr == old) {
			xas_store(&xas, entry);
			if (xa_track_free(xa) && entry)
				xas_clear_mark(&xas, XA_FREE_MARK);
		}
	} while (__xas_nomem(&xas, gfp));

	return xas_result(&xas, curr);
}
EXPORT_SYMBOL(__xa_cmpxchg);

/**
 * xa_reserve() - Reserve this index in the XArray.
 * @xa: XArray.
 * @index: Index into array.
 * @gfp: Memory allocation flags.
 *
 * Ensures there is somewhere to store an entry at @index in the array.
 * If there is already something stored at @index, this function does
 * nothing. If there was nothing there, the entry is marked as reserved.
 * Loads from @index will continue to see a %NULL pointer until a
 * subsequent store to @index.
 *
 * If you do not use the entry that you have reserved, call xa_release()
 * or xa_erase() to free any unnecessary memory.
 *
 * Context: Process context. Takes and releases the xa_lock, IRQ or BH safe
 * if specified in XArray flags. May sleep if the @gfp flags permit.
 * Return: 0 if the reservation succeeded or -ENOMEM if it failed.
 */
int xa_reserve(struct xarray *xa, unsigned long index, gfp_t gfp)
{
	XA_STATE(xas, xa, index);
	unsigned int lock_type = xa_lock_type(xa);
	void *curr;

	do {
		xas_lock_type(&xas, lock_type);
		curr = xas_load(&xas);
		if (!curr)
			xas_store(&xas, XA_ZERO_ENTRY);
		xas_unlock_type(&xas, lock_type);
	} while (xas_nomem(&xas, gfp));

	return xas_error(&xas);
}
EXPORT_SYMBOL(xa_reserve);

#ifdef CONFIG_XARRAY_MULTI
static void xas_set_range(struct xa_state *xas, unsigned long first,
		unsigned long last)
{
	unsigned int shift = 0;
	unsigned long sibs = last - first;
	unsigned int offset = XA_CHUNK_MASK;

	xas_set(xas, first);

	while ((first & XA_CHUNK_MASK) == 0) {
		if (sibs < XA_CHUNK_MASK)
			break;
		if ((sibs == XA_CHUNK_MASK) && (offset < XA_CHUNK_MASK))
			break;
		shift += XA_CHUNK_SHIFT;
		if (offset == XA_CHUNK_MASK)
			offset = sibs & XA_CHUNK_MASK;
		sibs >>= XA_CHUNK_SHIFT;
		first >>= XA_CHUNK_SHIFT;
	}

	offset = first & XA_CHUNK_MASK;
	if (offset + sibs > XA_CHUNK_MASK)
		sibs = XA_CHUNK_MASK - offset;
	if ((((first + sibs + 1) << shift) - 1) > last)
		sibs -= 1;

	xas->xa_shift = shift;
	xas->xa_sibs = sibs;
}

/**
 * xa_store_range() - Store this entry at a range of indices in the XArray.
 * @xa: XArray.
 * @first: First index to affect.
 * @last: Last index to affect.
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * After this function returns, loads from any index between @first and @last,
 * inclusive will return @entry.
 * Storing into an existing multislot entry updates the entry of every index.
 * The marks associated with the stored indices are unaffected unless @entry
 * is %NULL.
 *
 * Context: Process context. Takes and releases the xa_lock. May sleep
 * if the @gfp flags permit.
 * Return: %NULL on success, xa_err(-EINVAL) if @entry cannot be stored in
 * an XArray, or xa_err(-ENOMEM) if memory allocation failed.
 */
void *xa_store_range(struct xarray *xa, unsigned long first,
		unsigned long last, void *entry, gfp_t gfp)
{
	XA_STATE(xas, xa, 0);

	if (WARN_ON_ONCE(xa_is_internal(entry)))
		return XA_ERROR(-EINVAL);
	if (last < first)
		return XA_ERROR(-EINVAL);

	do {
		xas_lock(&xas);
		if (entry) {
			unsigned int order = (last == ~0UL) ? 64 :
						ilog2(last + 1);
			xas_set_order(&xas, last, order);
			xas_create(&xas);
			if (xas_error(&xas))
				goto unlock;
		}
		do {
			xas_set_range(&xas, first, last);
			xas_store(&xas, entry);
			if (xas_error(&xas))
				goto unlock;
			first += xas_size(&xas);
		} while (first <= last);
unlock:
		xas_unlock(&xas);
	} while (xas_nomem(&xas, gfp));

	return xas_result(&xas, NULL);
}
EXPORT_SYMBOL(xa_store_range);
#endif /* CONFIG_XARRAY_MULTI */

/**
 * __xa_alloc() - Find somewhere to store this entry in the XArray.
 * @xa: XArray.
 * @id: Pointer to ID.
 * @max: Maximum ID to allocate (inclusive).
 * @entry: New entry.
 * @gfp: Memory allocation flags.
 *
 * Allocates an unused ID in the range specified by @id and @max.
 * Updates the @id pointer with the index, then stores the entry at that
 * index. A concurrent lookup will not see an uninitialised @id.
 *
 * Context: Any context. Expects xa_lock to be held on entry. May
 * release and reacquire xa_lock if @gfp flags permit.
 * Return: 0 on success, -ENOMEM if memory allocation fails or -ENOSPC if
 * there is no more space in the XArray.
 */
int __xa_alloc(struct xarray *xa, u32 *id, u32 max, void *entry, gfp_t gfp)
{
	XA_STATE(xas, xa, 0);
	int err;

	if (WARN_ON_ONCE(xa_is_internal(entry)))
		return -EINVAL;
	if (WARN_ON_ONCE(!xa_track_free(xa)))
		return -EINVAL;

	if (!entry)
		entry = XA_ZERO_ENTRY;

	do {
		xas.xa_index = *id;
		xas_find_marked(&xas, max, XA_FREE_MARK);
		if (xas.xa_node == XAS_RESTART)
			xas_set_err(&xas, -ENOSPC);
		xas_store(&xas, entry);
		xas_clear_mark(&xas, XA_FREE_MARK);
	} while (__xas_nomem(&xas, gfp));

	err = xas_error(&xas);
	if (!err)
		*id = xas.xa_index;
	return err;
}
EXPORT_SYMBOL(__xa_alloc);

/**
 * __xa_set_mark() - Set this mark on this entry while locked.
 * @xa: XArray.
 * @index: Index of entry.
 * @mark: Mark number.
 *
 * Attempting to set a mark on a %NULL entry does not succeed.
 *
 * Context: Any context. Expects xa_lock to be held on entry.
 */
void __xa_set_mark(struct xarray *xa, unsigned long index, xa_mark_t mark)
{
	XA_STATE(xas, xa, index);
	void *entry = xas_load(&xas);

	if (entry)
		xas_set_mark(&xas, mark);
}
EXPORT_SYMBOL_GPL(__xa_set_mark);

/**
 * __xa_clear_mark() - Clear this mark on this entry while locked.
 * @xa: XArray.
 * @index: Index of entry.
 * @mark: Mark number.
 *
 * Context: Any context. Expects xa_lock to be held on entry.
 */
void __xa_clear_mark(struct xarray *xa, unsigned long index, xa_mark_t mark)
{
	XA_STATE(xas, xa, index);
	void *entry = xas_load(&xas);

	if (entry)
		xas_clear_mark(&xas, mark);
}
EXPORT_SYMBOL_GPL(__xa_clear_mark);

/**
 * xa_get_mark() - Inquire whether this mark is set on this entry.
 * @xa: XArray.
 * @index: Index of entry.
 * @mark: Mark number.
 *
 * This function uses the RCU read lock, so the result may be out of date
 * by the time it returns. If you need the result to be stable, use a lock.
 *
 * Context: Any context. Takes and releases the RCU lock.
 * Return: True if the entry at @index has this mark set, false if it doesn't.
 */
bool xa_get_mark(struct xarray *xa, unsigned long index, xa_mark_t mark)
{
	XA_STATE(xas, xa, index);
	void *entry;

	rcu_read_lock();
	entry = xas_start(&xas);
	while (xas_get_mark(&xas, mark)) {
		if (!xa_is_node(entry))
			goto found;
		entry = xas_descend(&xas, xa_to_node(entry));
	}
	rcu_read_unlock();
	return false;
found:
	rcu_read_unlock();
	return true;
}
EXPORT_SYMBOL(xa_get_mark);

/**
 * xa_set_mark() - Set this mark on this entry.
 * @xa: XArray.
 * @index: Index of entry.
 * @mark: Mark number.
 *
 * Attempting to set a mark on a NULL entry does not succeed.
 *
 * Context: Process context. Takes and releases the xa_lock.
 */
void xa_set_mark(struct xarray *xa, unsigned long index, xa_mark_t mark)
{
	xa_lock(xa);
	__xa_set_mark(xa, index, mark);
	xa_unlock(xa);
}
EXPORT_SYMBOL(xa_set_mark);

/**
 * xa_clear_mark() - Clear this mark on this entry.
 * @xa: XArray.
 * @index: Index of entry.
 * @mark: Mark number.
 *
 * Clearing a mark always succeeds.
 *
 * Context: Process context. Takes and releases the xa_lock.
 */
void xa_clear_mark(struct xarray *xa, unsigned long index, xa_mark_t mark)
{
	xa_lock(xa);
	__xa_clear_mark(xa, index, mark);
	xa_unlock(xa);
}
EXPORT_SYMBOL(xa_clear_mark);

/**
 * xa_find() - Search the XArray for an entry.
 * @xa: XArray.
 * @indexp: Pointer to an index.
 * @max: Maximum index to search to.
 * @filter: Selection criterion.
 *
 * Finds the entry in @xa which matches the @filter, and has the lowest
 * index that is at least @indexp and no more than @max.
 * If an entry is found, @indexp is updated to be the index of the entry.
 * This function is protected by the RCU read lock, so it may not find
 * entries which are being simultaneously added. It will not return an
 * %XA_RETRY_ENTRY; if you need to see retry entries, use xas_find().
 *
 * Context: Any context. Takes and releases the RCU lock.
 * Return: The entry, if found, otherwise %NULL.
 */
void *xa_find(struct xarray *xa, unsigned long *indexp,
			unsigned long max, xa_mark_t filter)
{
	XA_STATE(xas, xa, *indexp);
	void *entry;

	rcu_read_lock();
	do {
		if ((__force unsigned int)filter < XA_MAX_MARKS)
			entry = xas_find_marked(&xas, max, filter);
		else
			entry = xas_find(&xas, max);
	} while (xas_retry(&xas, entry));
	rcu_read_unlock();

	if (entry)
		*indexp = xas.xa_index;
	return entry;
}
EXPORT_SYMBOL(xa_find);

/**
 * xa_find_after() - Search the XArray for a present entry.
 * @xa: XArray.
 * @indexp: Pointer to an index.
 * @max: Maximum index to search to.
 * @filter: Selection criterion.
 *
 * Finds the entry in @xa which matches the @filter and has the lowest
 * index that is above @indexp and no more than @max.
 * If an entry is found, @indexp is updated to be the index of the entry.
 * This function is protected by the RCU read lock, so it may miss entries
 * which are being simultaneously added. It will not return an
 * %XA_RETRY_ENTRY; if you need to see retry entries, use xas_find().
 *
 * Context: Any context. Takes and releases the RCU lock.
 * Return: The pointer, if found, otherwise %NULL.
 */
void *xa_find_after(struct xarray *xa, unsigned long *indexp,
			unsigned long max, xa_mark_t filter)
{
	XA_STATE(xas, xa, *indexp + 1);
	void *entry;

	rcu_read_lock();
	for (;;) {
		if ((__force unsigned int)filter < XA_MAX_MARKS)
			entry = xas_find_marked(&xas, max, filter);
		else
			entry = xas_find(&xas, max);
		if (xas.xa_shift) {
			if (xas.xa_index & ((1UL << xas.xa_shift) - 1))
				continue;
		} else {
			if (xas.xa_offset < (xas.xa_index & XA_CHUNK_MASK))
				continue;
		}
		if (!xas_retry(&xas, entry))
			break;
	}
	rcu_read_unlock();

	if (entry)
		*indexp = xas.xa_index;
	return entry;
}
EXPORT_SYMBOL(xa_find_after);

static unsigned int xas_extract_present(struct xa_state *xas, void **dst,
		unsigned long max, unsigned int n)
{
	void *entry;
	unsigned int i = 0;

	rcu_read_lock();
	xas_for_each(xas, entry, max) {
		if (xas_retry(xas, entry))
			continue;
		dst[i++] = entry;
		if (i == n)
			break;
	}
	rcu_read_unlock();

	return i;
}

static unsigned int xas_extract_marked(struct xa_state *xas, void **dst,
		unsigned long max, unsigned int n, xa_mark_t mark)
{
	void *entry;
unsigned int i = 0; 1874 + 1875 + rcu_read_lock(); 1876 + xas_for_each_marked(xas, entry, max, mark) { 1877 + if (xas_retry(xas, entry)) 1878 + continue; 1879 + dst[i++] = entry; 1880 + if (i == n) 1881 + break; 1882 + } 1883 + rcu_read_unlock(); 1884 + 1885 + return i; 1886 + } 1887 + 1888 + /** 1889 + * xa_extract() - Copy selected entries from the XArray into a normal array. 1890 + * @xa: The source XArray to copy from. 1891 + * @dst: The buffer to copy entries into. 1892 + * @start: The first index in the XArray eligible to be selected. 1893 + * @max: The last index in the XArray eligible to be selected. 1894 + * @n: The maximum number of entries to copy. 1895 + * @filter: Selection criterion. 1896 + * 1897 + * Copies up to @n entries that match @filter from the XArray. The 1898 + * copied entries will have indices between @start and @max, inclusive. 1899 + * 1900 + * The @filter may be an XArray mark value, in which case entries which are 1901 + * marked with that mark will be copied. It may also be %XA_PRESENT, in 1902 + * which case all entries which are not NULL will be copied. 1903 + * 1904 + * The entries returned may not represent a snapshot of the XArray at a 1905 + * moment in time. For example, if another thread stores to index 5, then 1906 + * index 10, calling xa_extract() may return the old contents of index 5 1907 + * and the new contents of index 10. Indices not modified while this 1908 + * function is running will not be skipped. 1909 + * 1910 + * If you need stronger guarantees, holding the xa_lock across calls to this 1911 + * function will prevent concurrent modification. 1912 + * 1913 + * Context: Any context. Takes and releases the RCU lock. 1914 + * Return: The number of entries copied. 
1915 + */ 1916 + unsigned int xa_extract(struct xarray *xa, void **dst, unsigned long start, 1917 + unsigned long max, unsigned int n, xa_mark_t filter) 1918 + { 1919 + XA_STATE(xas, xa, start); 1920 + 1921 + if (!n) 1922 + return 0; 1923 + 1924 + if ((__force unsigned int)filter < XA_MAX_MARKS) 1925 + return xas_extract_marked(&xas, dst, max, n, filter); 1926 + return xas_extract_present(&xas, dst, max, n); 1927 + } 1928 + EXPORT_SYMBOL(xa_extract); 1929 + 1930 + /** 1931 + * xa_destroy() - Free all internal data structures. 1932 + * @xa: XArray. 1933 + * 1934 + * After calling this function, the XArray is empty and has freed all memory 1935 + * allocated for its internal data structures. You are responsible for 1936 + * freeing the objects referenced by the XArray. 1937 + * 1938 + * Context: Any context. Takes and releases the xa_lock, interrupt-safe. 1939 + */ 1940 + void xa_destroy(struct xarray *xa) 1941 + { 1942 + XA_STATE(xas, xa, 0); 1943 + unsigned long flags; 1944 + void *entry; 1945 + 1946 + xas.xa_node = NULL; 1947 + xas_lock_irqsave(&xas, flags); 1948 + entry = xa_head_locked(xa); 1949 + RCU_INIT_POINTER(xa->xa_head, NULL); 1950 + xas_init_marks(&xas); 1951 + /* lockdep checks we're still holding the lock in xas_free_nodes() */ 1952 + if (xa_is_node(entry)) 1953 + xas_free_nodes(&xas, xa_to_node(entry)); 1954 + xas_unlock_irqrestore(&xas, flags); 1955 + } 1956 + EXPORT_SYMBOL(xa_destroy); 1957 + 1958 + #ifdef XA_DEBUG 1959 + void xa_dump_node(const struct xa_node *node) 1960 + { 1961 + unsigned i, j; 1962 + 1963 + if (!node) 1964 + return; 1965 + if ((unsigned long)node & 3) { 1966 + pr_cont("node %px\n", node); 1967 + return; 1968 + } 1969 + 1970 + pr_cont("node %px %s %d parent %px shift %d count %d values %d " 1971 + "array %px list %px %px marks", 1972 + node, node->parent ? 
"offset" : "max", node->offset, 1973 + node->parent, node->shift, node->count, node->nr_values, 1974 + node->array, node->private_list.prev, node->private_list.next); 1975 + for (i = 0; i < XA_MAX_MARKS; i++) 1976 + for (j = 0; j < XA_MARK_LONGS; j++) 1977 + pr_cont(" %lx", node->marks[i][j]); 1978 + pr_cont("\n"); 1979 + } 1980 + 1981 + void xa_dump_index(unsigned long index, unsigned int shift) 1982 + { 1983 + if (!shift) 1984 + pr_info("%lu: ", index); 1985 + else if (shift >= BITS_PER_LONG) 1986 + pr_info("0-%lu: ", ~0UL); 1987 + else 1988 + pr_info("%lu-%lu: ", index, index | ((1UL << shift) - 1)); 1989 + } 1990 + 1991 + void xa_dump_entry(const void *entry, unsigned long index, unsigned long shift) 1992 + { 1993 + if (!entry) 1994 + return; 1995 + 1996 + xa_dump_index(index, shift); 1997 + 1998 + if (xa_is_node(entry)) { 1999 + if (shift == 0) { 2000 + pr_cont("%px\n", entry); 2001 + } else { 2002 + unsigned long i; 2003 + struct xa_node *node = xa_to_node(entry); 2004 + xa_dump_node(node); 2005 + for (i = 0; i < XA_CHUNK_SIZE; i++) 2006 + xa_dump_entry(node->slots[i], 2007 + index + (i << node->shift), node->shift); 2008 + } 2009 + } else if (xa_is_value(entry)) 2010 + pr_cont("value %ld (0x%lx) [%px]\n", xa_to_value(entry), 2011 + xa_to_value(entry), entry); 2012 + else if (!xa_is_internal(entry)) 2013 + pr_cont("%px\n", entry); 2014 + else if (xa_is_retry(entry)) 2015 + pr_cont("retry (%ld)\n", xa_to_internal(entry)); 2016 + else if (xa_is_sibling(entry)) 2017 + pr_cont("sibling (slot %ld)\n", xa_to_sibling(entry)); 2018 + else if (xa_is_zero(entry)) 2019 + pr_cont("zero (%ld)\n", xa_to_internal(entry)); 2020 + else 2021 + pr_cont("UNKNOWN ENTRY (%px)\n", entry); 2022 + } 2023 + 2024 + void xa_dump(const struct xarray *xa) 2025 + { 2026 + void *entry = xa->xa_head; 2027 + unsigned int shift = 0; 2028 + 2029 + pr_info("xarray: %px head %px flags %x marks %d %d %d\n", xa, entry, 2030 + xa->xa_flags, xa_marked(xa, XA_MARK_0), 2031 + xa_marked(xa, XA_MARK_1), 
xa_marked(xa, XA_MARK_2)); 2032 + if (xa_is_node(entry)) 2033 + shift = xa_to_node(entry)->shift + XA_CHUNK_SHIFT; 2034 + xa_dump_entry(entry, 0, shift); 2035 + } 2036 + #endif
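Taken together, the functions above form the XArray's "normal" API: stores take GFP flags directly, marks replace the radix tree's tags, and lookups never expose RCU-protected slot pointers. A rough sketch of how a kernel-side caller might combine them (illustrative only, not part of this patch; `page1` and `page2` stand in for real entries, and error handling is elided):

```c
/* Illustrative kernel-context sketch, not part of the patch. */
static void xarray_example(void *page1, void *page2)
{
	DEFINE_XARRAY(array);
	unsigned long index = 0;
	void *dst[4];
	void *entry;

	/* Store two entries; GFP flags are passed at store time,
	 * so no preloading is needed. */
	xa_store(&array, 1, page1, GFP_KERNEL);
	xa_store(&array, 3, page2, GFP_KERNEL);

	/* Mark one entry, then use that mark as a search filter. */
	xa_set_mark(&array, 3, XA_MARK_0);
	entry = xa_find(&array, &index, ULONG_MAX, XA_MARK_0);
	/* On success, entry is page2 and index has been updated to 3. */

	/* Copy every present entry into a flat array instead. */
	xa_extract(&array, dst, 0, ULONG_MAX, 4, XA_PRESENT);

	/* Free the internal nodes; the stored objects remain ours. */
	xa_destroy(&array);
}
```

Note how `xa_find()` accepts either a mark or `XA_PRESENT` through the same `filter` argument; internally it branches on whether the value is below `XA_MAX_MARKS`, exactly as the implementation above does.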
+2 -2
mm/Kconfig
···
 	bool "Transparent Hugepage Support"
 	depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select COMPACTION
-	select RADIX_TREE_MULTIORDER
+	select XARRAY_MULTI
 	help
 	  Transparent Hugepages allows the kernel to use huge pages and
 	  huge tlb transparently to the applications whenever possible.
···
 	depends on MEMORY_HOTREMOVE
 	depends on SPARSEMEM_VMEMMAP
 	depends on ARCH_HAS_ZONE_DEVICE
-	select RADIX_TREE_MULTIORDER
+	select XARRAY_MULTI
 
 	help
 	  Device memory hotplug support allows for establishing pmem,
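The page cache conversion that follows leans on the XArray's "advanced" API: the caller declares an `XA_STATE`, takes the lock itself, and retries through `xas_nomem()` when a store needs memory. The core pattern, sketched here with hypothetical names (`store_entry` and `my_xa` are not from the patch), looks like:

```c
/* Illustrative sketch of the advanced xas_* pattern, not part of the patch. */
static int store_entry(struct xarray *my_xa, unsigned long index,
		       void *entry, gfp_t gfp)
{
	XA_STATE(xas, my_xa, index);

	do {
		xas_lock_irq(&xas);
		/* May record -ENOMEM in the xa_state instead of failing. */
		xas_store(&xas, entry);
		xas_unlock_irq(&xas);
		/*
		 * xas_nomem() allocates outside the lock and returns true
		 * if the operation should be retried.
		 */
	} while (xas_nomem(&xas, gfp));

	return xas_error(&xas);
}
```

This drop-the-lock-allocate-retry loop is what lets the conversions delete `radix_tree_preload()` and its error paths; `__add_to_page_cache_locked()` below is a direct instance of it.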
+304 -408
mm/filemap.c
···
  *    ->tasklist_lock            (memory_failure, collect_procs_ao)
  */
 
-static int page_cache_tree_insert(struct address_space *mapping,
-		struct page *page, void **shadowp)
-{
-	struct radix_tree_node *node;
-	void **slot;
-	int error;
-
-	error = __radix_tree_create(&mapping->i_pages, page->index, 0,
-					&node, &slot);
-	if (error)
-		return error;
-	if (*slot) {
-		void *p;
-
-		p = radix_tree_deref_slot_protected(slot,
-						    &mapping->i_pages.xa_lock);
-		if (!radix_tree_exceptional_entry(p))
-			return -EEXIST;
-
-		mapping->nrexceptional--;
-		if (shadowp)
-			*shadowp = p;
-	}
-	__radix_tree_replace(&mapping->i_pages, node, slot, page,
-			     workingset_lookup_update(mapping));
-	mapping->nrpages++;
-	return 0;
-}
-
-static void page_cache_tree_delete(struct address_space *mapping,
+static void page_cache_delete(struct address_space *mapping,
 				   struct page *page, void *shadow)
 {
-	int i, nr;
+	XA_STATE(xas, &mapping->i_pages, page->index);
+	unsigned int nr = 1;
 
-	/* hugetlb pages are represented by one entry in the radix tree */
-	nr = PageHuge(page) ? 1 : hpage_nr_pages(page);
+	mapping_set_update(&xas, mapping);
+
+	/* hugetlb pages are represented by a single entry in the xarray */
+	if (!PageHuge(page)) {
+		xas_set_order(&xas, page->index, compound_order(page));
+		nr = 1U << compound_order(page);
+	}
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageTail(page), page);
 	VM_BUG_ON_PAGE(nr != 1 && shadow, page);
 
-	for (i = 0; i < nr; i++) {
-		struct radix_tree_node *node;
-		void **slot;
-
-		__radix_tree_lookup(&mapping->i_pages, page->index + i,
-				    &node, &slot);
-
-		VM_BUG_ON_PAGE(!node && nr != 1, page);
-
-		radix_tree_clear_tags(&mapping->i_pages, node, slot);
-		__radix_tree_replace(&mapping->i_pages, node, slot, shadow,
-				workingset_lookup_update(mapping));
-	}
+	xas_store(&xas, shadow);
+	xas_init_marks(&xas);
 
 	page->mapping = NULL;
 	/* Leave page->index set: truncation lookup relies upon it */
···
 	trace_mm_filemap_delete_from_page_cache(page);
 
 	unaccount_page_cache_page(mapping, page);
-	page_cache_tree_delete(mapping, page, shadow);
+	page_cache_delete(mapping, page, shadow);
 }
 
 static void page_cache_free_page(struct address_space *mapping,
···
 EXPORT_SYMBOL(delete_from_page_cache);
 
 /*
- * page_cache_tree_delete_batch - delete several pages from page cache
+ * page_cache_delete_batch - delete several pages from page cache
  * @mapping: the mapping to which pages belong
  * @pvec: pagevec with pages to delete
  *
···
  *
  * The function expects the i_pages lock to be held.
  */
-static void
-page_cache_tree_delete_batch(struct address_space *mapping,
+static void page_cache_delete_batch(struct address_space *mapping,
 			     struct pagevec *pvec)
 {
-	struct radix_tree_iter iter;
-	void **slot;
+	XA_STATE(xas, &mapping->i_pages, pvec->pages[0]->index);
 	int total_pages = 0;
 	int i = 0, tail_pages = 0;
 	struct page *page;
-	pgoff_t start;
 
-	start = pvec->pages[0]->index;
-	radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
+	mapping_set_update(&xas, mapping);
+	xas_for_each(&xas, page, ULONG_MAX) {
 		if (i >= pagevec_count(pvec) && !tail_pages)
 			break;
-		page = radix_tree_deref_slot_protected(slot,
-						       &mapping->i_pages.xa_lock);
-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			continue;
 		if (!tail_pages) {
 			/*
···
 			 * have our pages locked so they are protected from
 			 * being removed.
 			 */
-			if (page != pvec->pages[i])
+			if (page != pvec->pages[i]) {
+				VM_BUG_ON_PAGE(page->index >
+						pvec->pages[i]->index, page);
 				continue;
+			}
 			WARN_ON_ONCE(!PageLocked(page));
 			if (PageTransHuge(page) && !PageHuge(page))
 				tail_pages = HPAGE_PMD_NR - 1;
···
 			 */
 			i++;
 		} else {
+			VM_BUG_ON_PAGE(page->index + HPAGE_PMD_NR - tail_pages
+					!= pvec->pages[i]->index, page);
 			tail_pages--;
 		}
-		radix_tree_clear_tags(&mapping->i_pages, iter.node, slot);
-		__radix_tree_replace(&mapping->i_pages, iter.node, slot, NULL,
-				workingset_lookup_update(mapping));
+		xas_store(&xas, NULL);
 		total_pages++;
 	}
 	mapping->nrpages -= total_pages;
···
 
 		unaccount_page_cache_page(mapping, pvec->pages[i]);
 	}
-	page_cache_tree_delete_batch(mapping, pvec);
+	page_cache_delete_batch(mapping, pvec);
 	xa_unlock_irqrestore(&mapping->i_pages, flags);
 
 	for (i = 0; i < pagevec_count(pvec); i++)
···
 bool filemap_range_has_page(struct address_space *mapping,
 			   loff_t start_byte, loff_t end_byte)
 {
-	pgoff_t index = start_byte >> PAGE_SHIFT;
-	pgoff_t end = end_byte >> PAGE_SHIFT;
 	struct page *page;
+	XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
+	pgoff_t max = end_byte >> PAGE_SHIFT;
 
 	if (end_byte < start_byte)
 		return false;
 
-	if (mapping->nrpages == 0)
-		return false;
+	rcu_read_lock();
+	for (;;) {
+		page = xas_find(&xas, max);
+		if (xas_retry(&xas, page))
+			continue;
+		/* Shadow entries don't count */
+		if (xa_is_value(page))
+			continue;
+		/*
+		 * We don't need to try to pin this page; we're about to
+		 * release the RCU lock anyway. It is enough to know that
+		 * there was a page here recently.
+		 */
+		break;
+	}
+	rcu_read_unlock();
 
-	if (!find_get_pages_range(mapping, &index, end, 1, &page))
-		return false;
-	put_page(page);
-	return true;
+	return page != NULL;
 }
 EXPORT_SYMBOL(filemap_range_has_page);
···
  * locked. This function does not add the new page to the LRU, the
  * caller must do that.
  *
- * The remove + add is atomic. The only way this function can fail is
- * memory allocation failure.
+ * The remove + add is atomic. This function cannot fail.
  */
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 {
-	int error;
+	struct address_space *mapping = old->mapping;
+	void (*freepage)(struct page *) = mapping->a_ops->freepage;
+	pgoff_t offset = old->index;
+	XA_STATE(xas, &mapping->i_pages, offset);
+	unsigned long flags;
 
 	VM_BUG_ON_PAGE(!PageLocked(old), old);
 	VM_BUG_ON_PAGE(!PageLocked(new), new);
 	VM_BUG_ON_PAGE(new->mapping, new);
 
-	error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
-	if (!error) {
-		struct address_space *mapping = old->mapping;
-		void (*freepage)(struct page *);
-		unsigned long flags;
+	get_page(new);
+	new->mapping = mapping;
+	new->index = offset;
 
-		pgoff_t offset = old->index;
-		freepage = mapping->a_ops->freepage;
+	xas_lock_irqsave(&xas, flags);
+	xas_store(&xas, new);
 
-		get_page(new);
-		new->mapping = mapping;
-		new->index = offset;
+	old->mapping = NULL;
+	/* hugetlb pages do not participate in page cache accounting. */
+	if (!PageHuge(old))
+		__dec_node_page_state(new, NR_FILE_PAGES);
+	if (!PageHuge(new))
+		__inc_node_page_state(new, NR_FILE_PAGES);
+	if (PageSwapBacked(old))
+		__dec_node_page_state(new, NR_SHMEM);
+	if (PageSwapBacked(new))
+		__inc_node_page_state(new, NR_SHMEM);
+	xas_unlock_irqrestore(&xas, flags);
+	mem_cgroup_migrate(old, new);
+	if (freepage)
+		freepage(old);
+	put_page(old);
 
-		xa_lock_irqsave(&mapping->i_pages, flags);
-		__delete_from_page_cache(old, NULL);
-		error = page_cache_tree_insert(mapping, new, NULL);
-		BUG_ON(error);
-
-		/*
-		 * hugetlb pages do not participate in page cache accounting.
-		 */
-		if (!PageHuge(new))
-			__inc_node_page_state(new, NR_FILE_PAGES);
-		if (PageSwapBacked(new))
-			__inc_node_page_state(new, NR_SHMEM);
-		xa_unlock_irqrestore(&mapping->i_pages, flags);
-		mem_cgroup_migrate(old, new);
-		radix_tree_preload_end();
-		if (freepage)
-			freepage(old);
-		put_page(old);
-	}
-
-	return error;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_page);
···
 		pgoff_t offset, gfp_t gfp_mask,
 		void **shadowp)
 {
+	XA_STATE(xas, &mapping->i_pages, offset);
 	int huge = PageHuge(page);
 	struct mem_cgroup *memcg;
 	int error;
+	void *old;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
+	mapping_set_update(&xas, mapping);
 
 	if (!huge) {
 		error = mem_cgroup_try_charge(page, current->mm,
···
 			return error;
 	}
 
-	error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
-	if (error) {
-		if (!huge)
-			mem_cgroup_cancel_charge(page, memcg, false);
-		return error;
-	}
-
 	get_page(page);
 	page->mapping = mapping;
 	page->index = offset;
 
-	xa_lock_irq(&mapping->i_pages);
-	error = page_cache_tree_insert(mapping, page, shadowp);
-	radix_tree_preload_end();
-	if (unlikely(error))
-		goto err_insert;
+	do {
+		xas_lock_irq(&xas);
+		old = xas_load(&xas);
+		if (old && !xa_is_value(old))
+			xas_set_err(&xas, -EEXIST);
+		xas_store(&xas, page);
+		if (xas_error(&xas))
+			goto unlock;
 
-	/* hugetlb pages do not participate in page cache accounting. */
-	if (!huge)
-		__inc_node_page_state(page, NR_FILE_PAGES);
-	xa_unlock_irq(&mapping->i_pages);
+		if (xa_is_value(old)) {
+			mapping->nrexceptional--;
+			if (shadowp)
+				*shadowp = old;
+		}
+		mapping->nrpages++;
+
+		/* hugetlb pages do not participate in page cache accounting */
+		if (!huge)
+			__inc_node_page_state(page, NR_FILE_PAGES);
+unlock:
+		xas_unlock_irq(&xas);
+	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
+
+	if (xas_error(&xas))
+		goto error;
+
 	if (!huge)
 		mem_cgroup_commit_charge(page, memcg, false, false);
 	trace_mm_filemap_add_to_page_cache(page);
 	return 0;
-err_insert:
+error:
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
-	xa_unlock_irq(&mapping->i_pages);
 	if (!huge)
 		mem_cgroup_cancel_charge(page, memcg, false);
 	put_page(page);
-	return error;
+	return xas_error(&xas);
 }
 
 /**
···
 }
 
 /**
- * page_cache_next_hole - find the next hole (not-present entry)
- * @mapping: mapping
- * @index: index
- * @max_scan: maximum range to search
+ * page_cache_next_miss() - Find the next gap in the page cache.
+ * @mapping: Mapping.
+ * @index: Index.
+ * @max_scan: Maximum range to search.
  *
- * Search the set [index, min(index+max_scan-1, MAX_INDEX)] for the
- * lowest indexed hole.
+ * Search the range [index, min(index + max_scan - 1, ULONG_MAX)] for the
+ * gap with the lowest index.
  *
- * Returns: the index of the hole if found, otherwise returns an index
- * outside of the set specified (in which case 'return - index >=
- * max_scan' will be true). In rare cases of index wrap-around, 0 will
- * be returned.
+ * This function may be called under the rcu_read_lock. However, this will
+ * not atomically search a snapshot of the cache at a single point in time.
+ * For example, if a gap is created at index 5, then subsequently a gap is
+ * created at index 10, page_cache_next_miss covering both indices may
+ * return 10 if called under the rcu_read_lock.
  *
- * page_cache_next_hole may be called under rcu_read_lock. However,
- * like radix_tree_gang_lookup, this will not atomically search a
- * snapshot of the tree at a single point in time. For example, if a
- * hole is created at index 5, then subsequently a hole is created at
- * index 10, page_cache_next_hole covering both indexes may return 10
- * if called under rcu_read_lock.
+ * Return: The index of the gap if found, otherwise an index outside the
+ * range specified (in which case 'return - index >= max_scan' will be true).
+ * In the rare case of index wrap-around, 0 will be returned.
  */
-pgoff_t page_cache_next_hole(struct address_space *mapping,
+pgoff_t page_cache_next_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
 {
-	unsigned long i;
+	XA_STATE(xas, &mapping->i_pages, index);
 
-	for (i = 0; i < max_scan; i++) {
-		struct page *page;
-
-		page = radix_tree_lookup(&mapping->i_pages, index);
-		if (!page || radix_tree_exceptional_entry(page))
+	while (max_scan--) {
+		void *entry = xas_next(&xas);
+		if (!entry || xa_is_value(entry))
 			break;
-		index++;
-		if (index == 0)
+		if (xas.xa_index == 0)
 			break;
 	}
 
-	return index;
+	return xas.xa_index;
 }
-EXPORT_SYMBOL(page_cache_next_hole);
+EXPORT_SYMBOL(page_cache_next_miss);
 
 /**
- * page_cache_prev_hole - find the prev hole (not-present entry)
- * @mapping: mapping
- * @index: index
- * @max_scan: maximum range to search
+ * page_cache_prev_miss() - Find the previous gap in the page cache.
+ * @mapping: Mapping.
+ * @index: Index.
+ * @max_scan: Maximum range to search.
  *
- * Search backwards in the range [max(index-max_scan+1, 0), index] for
- * the first hole.
+ * Search the range [max(index - max_scan + 1, 0), index] for the
+ * gap with the highest index.
  *
- * Returns: the index of the hole if found, otherwise returns an index
- * outside of the set specified (in which case 'index - return >=
- * max_scan' will be true). In rare cases of wrap-around, ULONG_MAX
- * will be returned.
+ * This function may be called under the rcu_read_lock. However, this will
+ * not atomically search a snapshot of the cache at a single point in time.
+ * For example, if a gap is created at index 10, then subsequently a gap is
+ * created at index 5, page_cache_prev_miss() covering both indices may
+ * return 5 if called under the rcu_read_lock.
  *
- * page_cache_prev_hole may be called under rcu_read_lock. However,
- * like radix_tree_gang_lookup, this will not atomically search a
- * snapshot of the tree at a single point in time. For example, if a
- * hole is created at index 10, then subsequently a hole is created at
- * index 5, page_cache_prev_hole covering both indexes may return 5 if
- * called under rcu_read_lock.
+ * Return: The index of the gap if found, otherwise an index outside the
+ * range specified (in which case 'index - return >= max_scan' will be true).
+ * In the rare case of wrap-around, ULONG_MAX will be returned.
  */
-pgoff_t page_cache_prev_hole(struct address_space *mapping,
+pgoff_t page_cache_prev_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
 {
-	unsigned long i;
+	XA_STATE(xas, &mapping->i_pages, index);
 
-	for (i = 0; i < max_scan; i++) {
-		struct page *page;
-
-		page = radix_tree_lookup(&mapping->i_pages, index);
-		if (!page || radix_tree_exceptional_entry(page))
+	while (max_scan--) {
+		void *entry = xas_prev(&xas);
+		if (!entry || xa_is_value(entry))
 			break;
-		index--;
-		if (index == ULONG_MAX)
+		if (xas.xa_index == ULONG_MAX)
 			break;
 	}
 
-	return index;
+	return xas.xa_index;
 }
-EXPORT_SYMBOL(page_cache_prev_hole);
+EXPORT_SYMBOL(page_cache_prev_miss);
 
 /**
  * find_get_entry - find and get a page cache entry
···
  */
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 {
-	void **pagep;
+	XA_STATE(xas, &mapping->i_pages, offset);
 	struct page *head, *page;
 
 	rcu_read_lock();
 repeat:
-	page = NULL;
-	pagep = radix_tree_lookup_slot(&mapping->i_pages, offset);
-	if (pagep) {
-		page = radix_tree_deref_slot(pagep);
-		if (unlikely(!page))
-			goto out;
-		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				goto repeat;
-			/*
-			 * A shadow entry of a recently evicted page,
-			 * or a swap entry from shmem/tmpfs. Return
-			 * it without attempting to raise page count.
-			 */
-			goto out;
-		}
+	xas_reset(&xas);
+	page = xas_load(&xas);
+	if (xas_retry(&xas, page))
+		goto repeat;
+	/*
+	 * A shadow entry of a recently evicted page, or a swap entry from
+	 * shmem/tmpfs. Return it without attempting to raise page count.
+	 */
+	if (!page || xa_is_value(page))
+		goto out;
 
-		head = compound_head(page);
-		if (!page_cache_get_speculative(head))
-			goto repeat;
+	head = compound_head(page);
+	if (!page_cache_get_speculative(head))
+		goto repeat;
 
-		/* The page was split under us? */
-		if (compound_head(page) != head) {
-			put_page(head);
-			goto repeat;
-		}
+	/* The page was split under us? */
+	if (compound_head(page) != head) {
+		put_page(head);
+		goto repeat;
+	}
 
-		/*
-		 * Has the page moved?
-		 * This is part of the lockless pagecache protocol. See
-		 * include/linux/pagemap.h for details.
-		 */
-		if (unlikely(page != *pagep)) {
-			put_page(head);
-			goto repeat;
-		}
+	/*
+	 * Has the page moved?
+	 * This is part of the lockless pagecache protocol. See
+	 * include/linux/pagemap.h for details.
+	 */
+	if (unlikely(page != xas_reload(&xas))) {
+		put_page(head);
+		goto repeat;
 	}
 out:
 	rcu_read_unlock();
···
 
 repeat:
 	page = find_get_entry(mapping, offset);
-	if (page && !radix_tree_exception(page)) {
+	if (page && !xa_is_value(page)) {
 		lock_page(page);
 		/* Has the page been truncated? */
 		if (unlikely(page_mapping(page) != mapping)) {
···
 
 repeat:
 	page = find_get_entry(mapping, offset);
-	if (radix_tree_exceptional_entry(page))
+	if (xa_is_value(page))
 		page = NULL;
 	if (!page)
 		goto no_page;
···
 		pgoff_t start, unsigned int nr_entries,
 		struct page **entries, pgoff_t *indices)
 {
-	void **slot;
+	XA_STATE(xas, &mapping->i_pages, start);
+	struct page *page;
 	unsigned int ret = 0;
-	struct radix_tree_iter iter;
 
 	if (!nr_entries)
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
-		struct page *head, *page;
-repeat:
-		page = radix_tree_deref_slot(slot);
-		if (unlikely(!page))
+	xas_for_each(&xas, page, ULONG_MAX) {
+		struct page *head;
+		if (xas_retry(&xas, page))
 			continue;
-		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page)) {
-				slot = radix_tree_iter_retry(&iter);
-				continue;
-			}
-			/*
-			 * A shadow entry of a recently evicted page, a swap
-			 * entry from shmem/tmpfs or a DAX entry. Return it
-			 * without attempting to raise page count.
-			 */
+		/*
+		 * A shadow entry of a recently evicted page, a swap
+		 * entry from shmem/tmpfs or a DAX entry. Return it
+		 * without attempting to raise page count.
+		 */
+		if (xa_is_value(page))
 			goto export;
-		}
 
 		head = compound_head(page);
 		if (!page_cache_get_speculative(head))
-			goto repeat;
+			goto retry;
 
 		/* The page was split under us? */
-		if (compound_head(page) != head) {
-			put_page(head);
-			goto repeat;
-		}
+		if (compound_head(page) != head)
+			goto put_page;
 
 		/* Has the page moved? */
-		if (unlikely(page != *slot)) {
-			put_page(head);
-			goto repeat;
-		}
+		if (unlikely(page != xas_reload(&xas)))
+			goto put_page;
+
 export:
-		indices[ret] = iter.index;
+		indices[ret] = xas.xa_index;
 		entries[ret] = page;
 		if (++ret == nr_entries)
 			break;
+		continue;
+put_page:
+		put_page(head);
+retry:
+		xas_reset(&xas);
 	}
 	rcu_read_unlock();
 	return ret;
···
 		pgoff_t end, unsigned int nr_pages,
 		struct page **pages)
 {
-	struct radix_tree_iter iter;
-	void **slot;
+	XA_STATE(xas, &mapping->i_pages, *start);
+	struct page *page;
 	unsigned ret = 0;
 
 	if (unlikely(!nr_pages))
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, *start) {
-		struct page *head, *page;
-
-		if (iter.index > end)
-			break;
-repeat:
-		page = radix_tree_deref_slot(slot);
-		if (unlikely(!page))
+	xas_for_each(&xas, page, end) {
+		struct page *head;
+		if (xas_retry(&xas, page))
 			continue;
-
-		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page)) {
-				slot = radix_tree_iter_retry(&iter);
-				continue;
-			}
-			/*
-			 * A shadow entry of a recently evicted page,
-			 * or a swap entry from shmem/tmpfs. Skip
-			 * over it.
-			 */
+		/* Skip over shadow, swap and DAX entries */
+		if (xa_is_value(page))
 			continue;
-		}
 
 		head = compound_head(page);
 		if (!page_cache_get_speculative(head))
-			goto repeat;
+			goto retry;
 
 		/* The page was split under us? */
-		if (compound_head(page) != head) {
-			put_page(head);
-			goto repeat;
-		}
+		if (compound_head(page) != head)
+			goto put_page;
 
 		/* Has the page moved? */
-		if (unlikely(page != *slot)) {
-			put_page(head);
-			goto repeat;
-		}
+		if (unlikely(page != xas_reload(&xas)))
+			goto put_page;
 
 		pages[ret] = page;
 		if (++ret == nr_pages) {
-			*start = pages[ret - 1]->index + 1;
+			*start = page->index + 1;
 			goto out;
 		}
+		continue;
+put_page:
+		put_page(head);
+retry:
+		xas_reset(&xas);
 	}
 
 	/*
 	 * We come here when there is no page beyond @end. We take care to not
 	 * overflow the index @start as it confuses some of the callers. This
-	 * breaks the iteration when there is page at index -1 but that is
+	 * breaks the iteration when there is a page at index -1 but that is
 	 * already broken anyway.
 	 */
 	if (end == (pgoff_t)-1)
···
 unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
 			       unsigned int nr_pages, struct page **pages)
 {
-	struct radix_tree_iter iter;
-	void **slot;
+	XA_STATE(xas, &mapping->i_pages, index);
+	struct page *page;
 	unsigned int ret = 0;
 
 	if (unlikely(!nr_pages))
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_contig(slot, &mapping->i_pages, &iter, index) {
-		struct page *head, *page;
-repeat:
-		page = radix_tree_deref_slot(slot);
-		/* The hole, there no reason to continue */
-		if (unlikely(!page))
+	for (page = xas_load(&xas); page; page = xas_next(&xas)) {
+		struct page *head;
+		if (xas_retry(&xas, page))
+			continue;
+		/*
+		 * If the entry has been swapped out, we can stop looking.
+		 * No current caller is looking for DAX entries.
+		 */
+		if (xa_is_value(page))
 			break;
-
-		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page)) {
-				slot = radix_tree_iter_retry(&iter);
-				continue;
-			}
-			/*
-			 * A shadow entry of a recently evicted page,
-			 * or a swap entry from shmem/tmpfs. Stop
-			 * looking for contiguous pages.
-			 */
-			break;
-		}
 
 		head = compound_head(page);
 		if (!page_cache_get_speculative(head))
-			goto repeat;
+			goto retry;
 
 		/* The page was split under us? */
-		if (compound_head(page) != head) {
-			put_page(head);
-			goto repeat;
-		}
+		if (compound_head(page) != head)
+			goto put_page;
 
 		/* Has the page moved? */
-		if (unlikely(page != *slot)) {
-			put_page(head);
-			goto repeat;
-		}
+		if (unlikely(page != xas_reload(&xas)))
+			goto put_page;
 
 		/*
 		 * must check mapping and index after taking the ref.
 		 * otherwise we can get both false positives and false
 		 * negatives, which is just confusing to the caller.
 		 */
-		if (page->mapping == NULL || page_to_pgoff(page) != iter.index) {
+		if (!page->mapping || page_to_pgoff(page) != xas.xa_index) {
 			put_page(page);
 			break;
 		}
···
 		pages[ret] = page;
 		if (++ret == nr_pages)
 			break;
+		continue;
+put_page:
+		put_page(head);
+retry:
+		xas_reset(&xas);
 	}
 	rcu_read_unlock();
 	return ret;
···
  * @tag. We update @index to index the next page for the traversal.
  */
 unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
-			pgoff_t end, int tag, unsigned int nr_pages,
+			pgoff_t end, xa_mark_t tag, unsigned int nr_pages,
 			struct page **pages)
 {
-	struct radix_tree_iter iter;
-	void **slot;
+	XA_STATE(xas, &mapping->i_pages, *index);
+	struct page *page;
 	unsigned ret = 0;
 
 	if (unlikely(!nr_pages))
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, *index, tag) {
-		struct page *head, *page;
-
-		if (iter.index > end)
-			break;
-repeat:
-		page = radix_tree_deref_slot(slot);
-		if (unlikely(!page))
+	xas_for_each_marked(&xas, page, end, tag) {
+		struct page *head;
+		if (xas_retry(&xas, page))
 			continue;
-
-		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page)) {
-				slot = radix_tree_iter_retry(&iter);
-				continue;
-			}
-			/*
-			 * A shadow entry
of a recently evicted page. 1844 - * 1845 - * Those entries should never be tagged, but 1846 - * this tree walk is lockless and the tags are 1847 - * looked up in bulk, one radix tree node at a 1848 - * time, so there is a sizable window for page 1849 - * reclaim to evict a page we saw tagged. 1850 - * 1851 - * Skip over it. 1852 - */ 1897 + /* 1898 + * Shadow entries should never be tagged, but this iteration 1899 + * is lockless so there is a window for page reclaim to evict 1900 + * a page we saw tagged. Skip over it. 1901 + */ 1902 + if (xa_is_value(page)) 1853 1903 continue; 1854 - } 1855 1904 1856 1905 head = compound_head(page); 1857 1906 if (!page_cache_get_speculative(head)) 1858 - goto repeat; 1907 + goto retry; 1859 1908 1860 1909 /* The page was split under us? */ 1861 - if (compound_head(page) != head) { 1862 - put_page(head); 1863 - goto repeat; 1864 - } 1910 + if (compound_head(page) != head) 1911 + goto put_page; 1865 1912 1866 1913 /* Has the page moved? */ 1867 - if (unlikely(page != *slot)) { 1868 - put_page(head); 1869 - goto repeat; 1870 - } 1914 + if (unlikely(page != xas_reload(&xas))) 1915 + goto put_page; 1871 1916 1872 1917 pages[ret] = page; 1873 1918 if (++ret == nr_pages) { 1874 - *index = pages[ret - 1]->index + 1; 1919 + *index = page->index + 1; 1875 1920 goto out; 1876 1921 } 1922 + continue; 1923 + put_page: 1924 + put_page(head); 1925 + retry: 1926 + xas_reset(&xas); 1877 1927 } 1878 1928 1879 1929 /* 1880 - * We come here when we got at @end. We take care to not overflow the 1930 + * We come here when we got to @end. We take care to not overflow the 1881 1931 * index @index as it confuses some of the callers. This breaks the 1882 - * iteration when there is page at index -1 but that is already broken 1883 - * anyway. 1932 + * iteration when there is a page at index -1 but that is already 1933 + * broken anyway. 1884 1934 */ 1885 1935 if (end == (pgoff_t)-1) 1886 1936 *index = (pgoff_t)-1; ··· 1890 1972 * @tag. 
1891 1973 */ 1892 1974 unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start, 1893 - int tag, unsigned int nr_entries, 1975 + xa_mark_t tag, unsigned int nr_entries, 1894 1976 struct page **entries, pgoff_t *indices) 1895 1977 { 1896 - void **slot; 1978 + XA_STATE(xas, &mapping->i_pages, start); 1979 + struct page *page; 1897 1980 unsigned int ret = 0; 1898 - struct radix_tree_iter iter; 1899 1981 1900 1982 if (!nr_entries) 1901 1983 return 0; 1902 1984 1903 1985 rcu_read_lock(); 1904 - radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, start, tag) { 1905 - struct page *head, *page; 1906 - repeat: 1907 - page = radix_tree_deref_slot(slot); 1908 - if (unlikely(!page)) 1986 + xas_for_each_marked(&xas, page, ULONG_MAX, tag) { 1987 + struct page *head; 1988 + if (xas_retry(&xas, page)) 1909 1989 continue; 1910 - if (radix_tree_exception(page)) { 1911 - if (radix_tree_deref_retry(page)) { 1912 - slot = radix_tree_iter_retry(&iter); 1913 - continue; 1914 - } 1915 - 1916 - /* 1917 - * A shadow entry of a recently evicted page, a swap 1918 - * entry from shmem/tmpfs or a DAX entry. Return it 1919 - * without attempting to raise page count. 1920 - */ 1990 + /* 1991 + * A shadow entry of a recently evicted page, a swap 1992 + * entry from shmem/tmpfs or a DAX entry. Return it 1993 + * without attempting to raise page count. 1994 + */ 1995 + if (xa_is_value(page)) 1921 1996 goto export; 1922 - } 1923 1997 1924 1998 head = compound_head(page); 1925 1999 if (!page_cache_get_speculative(head)) 1926 - goto repeat; 2000 + goto retry; 1927 2001 1928 2002 /* The page was split under us? */ 1929 - if (compound_head(page) != head) { 1930 - put_page(head); 1931 - goto repeat; 1932 - } 2003 + if (compound_head(page) != head) 2004 + goto put_page; 1933 2005 1934 2006 /* Has the page moved? 
*/ 1935 - if (unlikely(page != *slot)) { 1936 - put_page(head); 1937 - goto repeat; 1938 - } 2007 + if (unlikely(page != xas_reload(&xas))) 2008 + goto put_page; 2009 + 1939 2010 export: 1940 - indices[ret] = iter.index; 2011 + indices[ret] = xas.xa_index; 1941 2012 entries[ret] = page; 1942 2013 if (++ret == nr_entries) 1943 2014 break; 2015 + continue; 2016 + put_page: 2017 + put_page(head); 2018 + retry: 2019 + xas_reset(&xas); 1944 2020 } 1945 2021 rcu_read_unlock(); 1946 2022 return ret; ··· 2538 2626 void filemap_map_pages(struct vm_fault *vmf, 2539 2627 pgoff_t start_pgoff, pgoff_t end_pgoff) 2540 2628 { 2541 - struct radix_tree_iter iter; 2542 - void **slot; 2543 2629 struct file *file = vmf->vma->vm_file; 2544 2630 struct address_space *mapping = file->f_mapping; 2545 2631 pgoff_t last_pgoff = start_pgoff; 2546 2632 unsigned long max_idx; 2633 + XA_STATE(xas, &mapping->i_pages, start_pgoff); 2547 2634 struct page *head, *page; 2548 2635 2549 2636 rcu_read_lock(); 2550 - radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start_pgoff) { 2551 - if (iter.index > end_pgoff) 2552 - break; 2553 - repeat: 2554 - page = radix_tree_deref_slot(slot); 2555 - if (unlikely(!page)) 2637 + xas_for_each(&xas, page, end_pgoff) { 2638 + if (xas_retry(&xas, page)) 2639 + continue; 2640 + if (xa_is_value(page)) 2556 2641 goto next; 2557 - if (radix_tree_exception(page)) { 2558 - if (radix_tree_deref_retry(page)) { 2559 - slot = radix_tree_iter_retry(&iter); 2560 - continue; 2561 - } 2562 - goto next; 2563 - } 2564 2642 2565 2643 head = compound_head(page); 2566 2644 if (!page_cache_get_speculative(head)) 2567 - goto repeat; 2645 + goto next; 2568 2646 2569 2647 /* The page was split under us? */ 2570 - if (compound_head(page) != head) { 2571 - put_page(head); 2572 - goto repeat; 2573 - } 2648 + if (compound_head(page) != head) 2649 + goto skip; 2574 2650 2575 2651 /* Has the page moved? 
*/ 2576 - if (unlikely(page != *slot)) { 2577 - put_page(head); 2578 - goto repeat; 2579 - } 2652 + if (unlikely(page != xas_reload(&xas))) 2653 + goto skip; 2580 2654 2581 2655 if (!PageUptodate(page) || 2582 2656 PageReadahead(page) || ··· 2581 2683 if (file->f_ra.mmap_miss > 0) 2582 2684 file->f_ra.mmap_miss--; 2583 2685 2584 - vmf->address += (iter.index - last_pgoff) << PAGE_SHIFT; 2686 + vmf->address += (xas.xa_index - last_pgoff) << PAGE_SHIFT; 2585 2687 if (vmf->pte) 2586 - vmf->pte += iter.index - last_pgoff; 2587 - last_pgoff = iter.index; 2688 + vmf->pte += xas.xa_index - last_pgoff; 2689 + last_pgoff = xas.xa_index; 2588 2690 if (alloc_set_pte(vmf, NULL, page)) 2589 2691 goto unlock; 2590 2692 unlock_page(page); ··· 2596 2698 next: 2597 2699 /* Huge page is mapped? No need to proceed. */ 2598 2700 if (pmd_trans_huge(*vmf->pmd)) 2599 - break; 2600 - if (iter.index == end_pgoff) 2601 2701 break; 2602 2702 } 2603 2703 rcu_read_unlock(); ··· 2706 2810 put_page(page); 2707 2811 if (err == -EEXIST) 2708 2812 goto repeat; 2709 - /* Presumably ENOMEM for radix tree node */ 2813 + /* Presumably ENOMEM for xarray node */ 2710 2814 return ERR_PTR(err); 2711 2815 } 2712 2816
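The mm/filemap.c hunks above all apply one conversion pattern: the old backward `goto repeat` becomes a loop tail with `put_page:` and `retry:` labels, so the success path falls through to `continue` while a lost race drops the speculative reference and resets the walk with `xas_reset()`. A minimal user-space sketch of that control flow (the `struct entry`, `slots[]` array and helper names are illustrative stand-ins, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a page: a refcount plus a flag meaning the entry changed
 * under us (truncation/migration) after we took our reference. */
struct entry { int refcount; int moved; };

#define NSLOTS 4
static struct entry *slots[NSLOTS];

static int try_get(struct entry *e)	/* page_cache_get_speculative() */
{
	e->refcount++;
	return 1;
}

/* Gather stable entries, mimicking the new loop tail: success falls
 * through to `continue`; a lost race drops the ref and restarts. */
static size_t gather(struct entry **out, size_t max)
{
	size_t ret = 0;

	for (size_t i = 0; i < NSLOTS; i++) {
		struct entry *e = slots[i];

		if (!e)			/* hole at this index */
			continue;
		if (!try_get(e))
			goto retry;
		if (e->moved)		/* xas_reload() saw a new entry */
			goto put_entry;
		out[ret] = e;
		if (++ret == max)
			break;
		continue;
put_entry:
		e->refcount--;		/* put_page(head) */
retry:
		;			/* xas_reset(&xas) in the real code */
	}
	return ret;
}
```

The forward jumps keep the hot path straight-line while still releasing the reference exactly once on every failure path, which is what the repeated `put_page(head); goto repeat;` blocks used to do by hand.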
+7 -10
mm/huge_memory.c
··· 2450 2450 ClearPageCompound(head); 2451 2451 /* See comment in __split_huge_page_tail() */ 2452 2452 if (PageAnon(head)) { 2453 - /* Additional pin to radix tree of swap cache */ 2453 + /* Additional pin to swap cache */ 2454 2454 if (PageSwapCache(head)) 2455 2455 page_ref_add(head, 2); 2456 2456 else 2457 2457 page_ref_inc(head); 2458 2458 } else { 2459 - /* Additional pin to radix tree */ 2459 + /* Additional pin to page cache */ 2460 2460 page_ref_add(head, 2); 2461 2461 xa_unlock(&head->mapping->i_pages); 2462 2462 } ··· 2568 2568 { 2569 2569 int extra_pins; 2570 2570 2571 - /* Additional pins from radix tree */ 2571 + /* Additional pins from page cache */ 2572 2572 if (PageAnon(page)) 2573 2573 extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0; 2574 2574 else ··· 2664 2664 spin_lock_irqsave(zone_lru_lock(page_zone(head)), flags); 2665 2665 2666 2666 if (mapping) { 2667 - void **pslot; 2667 + XA_STATE(xas, &mapping->i_pages, page_index(head)); 2668 2668 2669 - xa_lock(&mapping->i_pages); 2670 - pslot = radix_tree_lookup_slot(&mapping->i_pages, 2671 - page_index(head)); 2672 2669 /* 2673 - * Check if the head page is present in radix tree. 2670 + * Check if the head page is present in page cache. 2674 2671 * We assume all tail are present too, if head is there. 2675 2672 */ 2676 - if (radix_tree_deref_slot_protected(pslot, 2677 - &mapping->i_pages.xa_lock) != head) 2673 + xa_lock(&mapping->i_pages); 2674 + if (xas_load(&xas) != head) 2678 2675 goto fail; 2679 2676 } 2680 2677
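The mm/huge_memory.c hunk replaces `radix_tree_lookup_slot()` plus `radix_tree_deref_slot_protected()` with an `XA_STATE` and a single `xas_load()` under the `i_pages` lock. The shape is "load and verify under the lock"; a tiny sketch of that shape (the flag "lock", `cache[]` array and function name are illustrative stand-ins, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8
static int tree_locked;		/* stands in for xa_lock(&mapping->i_pages) */
static void *cache[NSLOTS];	/* stands in for the mapping's XArray */

/* Like the new split-huge-page check: with the lock held, load the entry
 * at the head page's index and fail if it is not the page we expect. */
static int head_still_present(size_t index, void *head)
{
	int ok;

	tree_locked = 1;		/* xa_lock(&mapping->i_pages) */
	ok = (cache[index] == head);	/* xas_load(&xas) != head -> fail */
	tree_locked = 0;
	return ok;
}
```

Because the XA_STATE carries the index, the lookup needs no slot pointer: the diff also reorders the code so the comment sits before the lock is taken, which reads more naturally.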
+72 -106
mm/khugepaged.c
··· 1288 1288 * 1289 1289 * Basic scheme is simple, details are more complex: 1290 1290 * - allocate and freeze a new huge page; 1291 - * - scan over radix tree replacing old pages the new one 1291 + * - scan page cache replacing old pages with the new one 1292 1292 * + swap in pages if necessary; 1293 1293 * + fill in gaps; 1294 - * + keep old pages around in case if rollback is required; 1295 - * - if replacing succeed: 1294 + * + keep old pages around in case rollback is required; 1295 + * - if replacing succeeds: 1296 1296 * + copy data over; 1297 1297 * + free old pages; 1298 1298 * + unfreeze huge page; 1299 1299 * - if replacing failed; 1300 1300 * + put all pages back and unfreeze them; 1301 - * + restore gaps in the radix-tree; 1301 + * + restore gaps in the page cache; 1302 1302 * + free huge page; 1303 1303 */ 1304 1304 static void collapse_shmem(struct mm_struct *mm, ··· 1306 1306 struct page **hpage, int node) 1307 1307 { 1308 1308 gfp_t gfp; 1309 - struct page *page, *new_page, *tmp; 1309 + struct page *new_page; 1310 1310 struct mem_cgroup *memcg; 1311 1311 pgoff_t index, end = start + HPAGE_PMD_NR; 1312 1312 LIST_HEAD(pagelist); 1313 - struct radix_tree_iter iter; 1314 - void **slot; 1313 + XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER); 1315 1314 int nr_none = 0, result = SCAN_SUCCEED; 1316 1315 1317 1316 VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); ··· 1335 1336 __SetPageLocked(new_page); 1336 1337 BUG_ON(!page_ref_freeze(new_page, 1)); 1337 1338 1338 - 1339 1339 /* 1340 - * At this point the new_page is 'frozen' (page_count() is zero), locked 1341 - * and not up-to-date. It's safe to insert it into radix tree, because 1342 - * nobody would be able to map it or use it in other way until we 1343 - * unfreeze it. 1340 + * At this point the new_page is 'frozen' (page_count() is zero), 1341 + * locked and not up-to-date. 
It's safe to insert it into the page 1342 + * cache, because nobody would be able to map it or use it in other 1343 + * way until we unfreeze it. 1344 1344 */ 1345 1345 1346 - index = start; 1347 - xa_lock_irq(&mapping->i_pages); 1348 - radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) { 1349 - int n = min(iter.index, end) - index; 1350 - 1351 - /* 1352 - * Handle holes in the radix tree: charge it from shmem and 1353 - * insert relevant subpage of new_page into the radix-tree. 1354 - */ 1355 - if (n && !shmem_charge(mapping->host, n)) { 1356 - result = SCAN_FAIL; 1346 + /* This will be less messy when we use multi-index entries */ 1347 + do { 1348 + xas_lock_irq(&xas); 1349 + xas_create_range(&xas); 1350 + if (!xas_error(&xas)) 1357 1351 break; 1358 - } 1359 - nr_none += n; 1360 - for (; index < min(iter.index, end); index++) { 1361 - radix_tree_insert(&mapping->i_pages, index, 1362 - new_page + (index % HPAGE_PMD_NR)); 1352 + xas_unlock_irq(&xas); 1353 + if (!xas_nomem(&xas, GFP_KERNEL)) 1354 + goto out; 1355 + } while (1); 1356 + 1357 + xas_set(&xas, start); 1358 + for (index = start; index < end; index++) { 1359 + struct page *page = xas_next(&xas); 1360 + 1361 + VM_BUG_ON(index != xas.xa_index); 1362 + if (!page) { 1363 + if (!shmem_charge(mapping->host, 1)) { 1364 + result = SCAN_FAIL; 1365 + break; 1366 + } 1367 + xas_store(&xas, new_page + (index % HPAGE_PMD_NR)); 1368 + nr_none++; 1369 + continue; 1363 1370 } 1364 1371 1365 - /* We are done. 
*/ 1366 - if (index >= end) 1367 - break; 1368 - 1369 - page = radix_tree_deref_slot_protected(slot, 1370 - &mapping->i_pages.xa_lock); 1371 - if (radix_tree_exceptional_entry(page) || !PageUptodate(page)) { 1372 - xa_unlock_irq(&mapping->i_pages); 1372 + if (xa_is_value(page) || !PageUptodate(page)) { 1373 + xas_unlock_irq(&xas); 1373 1374 /* swap in or instantiate fallocated page */ 1374 1375 if (shmem_getpage(mapping->host, index, &page, 1375 1376 SGP_NOHUGE)) { 1376 1377 result = SCAN_FAIL; 1377 - goto tree_unlocked; 1378 + goto xa_unlocked; 1378 1379 } 1379 - xa_lock_irq(&mapping->i_pages); 1380 + xas_lock_irq(&xas); 1381 + xas_set(&xas, index); 1380 1382 } else if (trylock_page(page)) { 1381 1383 get_page(page); 1382 1384 } else { ··· 1397 1397 result = SCAN_TRUNCATED; 1398 1398 goto out_unlock; 1399 1399 } 1400 - xa_unlock_irq(&mapping->i_pages); 1400 + xas_unlock_irq(&xas); 1401 1401 1402 1402 if (isolate_lru_page(page)) { 1403 1403 result = SCAN_DEL_PAGE_LRU; ··· 1407 1407 if (page_mapped(page)) 1408 1408 unmap_mapping_pages(mapping, index, 1, false); 1409 1409 1410 - xa_lock_irq(&mapping->i_pages); 1410 + xas_lock_irq(&xas); 1411 + xas_set(&xas, index); 1411 1412 1412 - slot = radix_tree_lookup_slot(&mapping->i_pages, index); 1413 - VM_BUG_ON_PAGE(page != radix_tree_deref_slot_protected(slot, 1414 - &mapping->i_pages.xa_lock), page); 1413 + VM_BUG_ON_PAGE(page != xas_load(&xas), page); 1415 1414 VM_BUG_ON_PAGE(page_mapped(page), page); 1416 1415 1417 1416 /* 1418 1417 * The page is expected to have page_count() == 3: 1419 1418 * - we hold a pin on it; 1420 - * - one reference from radix tree; 1419 + * - one reference from page cache; 1421 1420 * - one from isolate_lru_page; 1422 1421 */ 1423 1422 if (!page_ref_freeze(page, 3)) { ··· 1431 1432 list_add_tail(&page->lru, &pagelist); 1432 1433 1433 1434 /* Finally, replace with the new page. 
*/ 1434 - radix_tree_replace_slot(&mapping->i_pages, slot, 1435 - new_page + (index % HPAGE_PMD_NR)); 1436 - 1437 - slot = radix_tree_iter_resume(slot, &iter); 1438 - index++; 1435 + xas_store(&xas, new_page + (index % HPAGE_PMD_NR)); 1439 1436 continue; 1440 1437 out_lru: 1441 - xa_unlock_irq(&mapping->i_pages); 1438 + xas_unlock_irq(&xas); 1442 1439 putback_lru_page(page); 1443 1440 out_isolate_failed: 1444 1441 unlock_page(page); 1445 1442 put_page(page); 1446 - goto tree_unlocked; 1443 + goto xa_unlocked; 1447 1444 out_unlock: 1448 1445 unlock_page(page); 1449 1446 put_page(page); 1450 1447 break; 1451 1448 } 1449 + xas_unlock_irq(&xas); 1452 1450 1453 - /* 1454 - * Handle hole in radix tree at the end of the range. 1455 - * This code only triggers if there's nothing in radix tree 1456 - * beyond 'end'. 1457 - */ 1458 - if (result == SCAN_SUCCEED && index < end) { 1459 - int n = end - index; 1460 - 1461 - if (!shmem_charge(mapping->host, n)) { 1462 - result = SCAN_FAIL; 1463 - goto tree_locked; 1464 - } 1465 - 1466 - for (; index < end; index++) { 1467 - radix_tree_insert(&mapping->i_pages, index, 1468 - new_page + (index % HPAGE_PMD_NR)); 1469 - } 1470 - nr_none += n; 1471 - } 1472 - 1473 - tree_locked: 1474 - xa_unlock_irq(&mapping->i_pages); 1475 - tree_unlocked: 1476 - 1451 + xa_unlocked: 1477 1452 if (result == SCAN_SUCCEED) { 1478 - unsigned long flags; 1453 + struct page *page, *tmp; 1479 1454 struct zone *zone = page_zone(new_page); 1480 1455 1481 1456 /* 1482 - * Replacing old pages with new one has succeed, now we need to 1483 - * copy the content and free old pages. 1457 + * Replacing old pages with new one has succeeded, now we 1458 + * need to copy the content and free the old pages. 
1484 1459 */ 1485 1460 list_for_each_entry_safe(page, tmp, &pagelist, lru) { 1486 1461 copy_highpage(new_page + (page->index % HPAGE_PMD_NR), ··· 1468 1495 put_page(page); 1469 1496 } 1470 1497 1471 - local_irq_save(flags); 1498 + local_irq_disable(); 1472 1499 __inc_node_page_state(new_page, NR_SHMEM_THPS); 1473 1500 if (nr_none) { 1474 1501 __mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none); 1475 1502 __mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none); 1476 1503 } 1477 - local_irq_restore(flags); 1504 + local_irq_enable(); 1478 1505 1479 1506 /* 1480 - * Remove pte page tables, so we can re-faulti 1507 + * Remove pte page tables, so we can re-fault 1481 1508 * the page as huge. 1482 1509 */ 1483 1510 retract_page_tables(mapping, start); ··· 1494 1521 1495 1522 khugepaged_pages_collapsed++; 1496 1523 } else { 1497 - /* Something went wrong: rollback changes to the radix-tree */ 1524 + struct page *page; 1525 + /* Something went wrong: roll back page cache changes */ 1498 1526 shmem_uncharge(mapping->host, nr_none); 1499 - xa_lock_irq(&mapping->i_pages); 1500 - radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) { 1501 - if (iter.index >= end) 1502 - break; 1527 + xas_lock_irq(&xas); 1528 + xas_set(&xas, start); 1529 + xas_for_each(&xas, page, end - 1) { 1503 1530 page = list_first_entry_or_null(&pagelist, 1504 1531 struct page, lru); 1505 - if (!page || iter.index < page->index) { 1532 + if (!page || xas.xa_index < page->index) { 1506 1533 if (!nr_none) 1507 1534 break; 1508 1535 nr_none--; 1509 1536 /* Put holes back where they were */ 1510 - radix_tree_delete(&mapping->i_pages, iter.index); 1537 + xas_store(&xas, NULL); 1511 1538 continue; 1512 1539 } 1513 1540 1514 - VM_BUG_ON_PAGE(page->index != iter.index, page); 1541 + VM_BUG_ON_PAGE(page->index != xas.xa_index, page); 1515 1542 1516 1543 /* Unfreeze the page. 
*/ 1517 1544 list_del(&page->lru); 1518 1545 page_ref_unfreeze(page, 2); 1519 - radix_tree_replace_slot(&mapping->i_pages, slot, page); 1520 - slot = radix_tree_iter_resume(slot, &iter); 1521 - xa_unlock_irq(&mapping->i_pages); 1546 + xas_store(&xas, page); 1547 + xas_pause(&xas); 1548 + xas_unlock_irq(&xas); 1522 1549 putback_lru_page(page); 1523 1550 unlock_page(page); 1524 - xa_lock_irq(&mapping->i_pages); 1551 + xas_lock_irq(&xas); 1525 1552 } 1526 1553 VM_BUG_ON(nr_none); 1527 - xa_unlock_irq(&mapping->i_pages); 1554 + xas_unlock_irq(&xas); 1528 1555 1529 1556 /* Unfreeze new_page, caller would take care about freeing it */ 1530 1557 page_ref_unfreeze(new_page, 1); ··· 1542 1569 pgoff_t start, struct page **hpage) 1543 1570 { 1544 1571 struct page *page = NULL; 1545 - struct radix_tree_iter iter; 1546 - void **slot; 1572 + XA_STATE(xas, &mapping->i_pages, start); 1547 1573 int present, swap; 1548 1574 int node = NUMA_NO_NODE; 1549 1575 int result = SCAN_SUCCEED; ··· 1551 1579 swap = 0; 1552 1580 memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load)); 1553 1581 rcu_read_lock(); 1554 - radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) { 1555 - if (iter.index >= start + HPAGE_PMD_NR) 1556 - break; 1557 - 1558 - page = radix_tree_deref_slot(slot); 1559 - if (radix_tree_deref_retry(page)) { 1560 - slot = radix_tree_iter_retry(&iter); 1582 + xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) { 1583 + if (xas_retry(&xas, page)) 1561 1584 continue; 1562 - } 1563 1585 1564 - if (radix_tree_exception(page)) { 1586 + if (xa_is_value(page)) { 1565 1587 if (++swap > khugepaged_max_ptes_swap) { 1566 1588 result = SCAN_EXCEED_SWAP_PTE; 1567 1589 break; ··· 1594 1628 present++; 1595 1629 1596 1630 if (need_resched()) { 1597 - slot = radix_tree_iter_resume(slot, &iter); 1631 + xas_pause(&xas); 1598 1632 cond_resched_rcu(); 1599 1633 } 1600 1634 }
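In the collapse_shmem() conversion above, both filling holes and replacing old pages become plain `xas_store()` calls, and the rollback path stores the old page back (or NULL to restore a hole) instead of mixing `radix_tree_insert`, `radix_tree_delete` and `radix_tree_replace_slot`. A sketch of that replace-and-rollback idea over a plain array (names and the range size are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define NR 4
static void *slots[NR];		/* stand-in for the page cache range */

/* Replace every entry in the range with a subpage of @new_page, saving the
 * old contents so a failure can be undone (cf. the collapse loop above). */
static void collapse(void *old[NR], char *new_page)
{
	for (size_t i = 0; i < NR; i++) {
		old[i] = slots[i];
		slots[i] = new_page + i; /* xas_store(&xas, new_page + i) */
	}
}

/* Roll back: storing the old page restores it; storing NULL restores a
 * hole, as xas_store(&xas, NULL) does in the failure path. */
static void rollback(void *old[NR])
{
	for (size_t i = 0; i < NR; i++)
		slots[i] = old[i];
}
```

Having one store primitive for insert, replace and delete is what lets the diff drop the separate end-of-range hole-filling block entirely.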
+1 -1
mm/madvise.c
··· 251 251 index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; 252 252 253 253 page = find_get_entry(mapping, index); 254 - if (!radix_tree_exceptional_entry(page)) { 254 + if (!xa_is_value(page)) { 255 255 if (page) 256 256 put_page(page); 257 257 continue;
+1 -1
mm/memcontrol.c
··· 4728 4728 /* shmem/tmpfs may report page out on swap: account for that too. */ 4729 4729 if (shmem_mapping(mapping)) { 4730 4730 page = find_get_entry(mapping, pgoff); 4731 - if (radix_tree_exceptional_entry(page)) { 4731 + if (xa_is_value(page)) { 4732 4732 swp_entry_t swp = radix_to_swp_entry(page); 4733 4733 if (do_memsw_account()) 4734 4734 *entry = swp;
+42 -61
mm/memfd.c
··· 21 21 #include <uapi/linux/memfd.h> 22 22 23 23 /* 24 - * We need a tag: a new tag would expand every radix_tree_node by 8 bytes, 24 + * We need a tag: a new tag would expand every xa_node by 8 bytes, 25 25 * so reuse a tag which we firmly believe is never set or cleared on tmpfs 26 26 * or hugetlbfs because they are memory only filesystems. 27 27 */ 28 28 #define MEMFD_TAG_PINNED PAGECACHE_TAG_TOWRITE 29 29 #define LAST_SCAN 4 /* about 150ms max */ 30 30 31 - static void memfd_tag_pins(struct address_space *mapping) 31 + static void memfd_tag_pins(struct xa_state *xas) 32 32 { 33 - struct radix_tree_iter iter; 34 - void __rcu **slot; 35 - pgoff_t start; 36 33 struct page *page; 34 + unsigned int tagged = 0; 37 35 38 36 lru_add_drain(); 39 - start = 0; 40 - rcu_read_lock(); 41 37 42 - radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) { 43 - page = radix_tree_deref_slot(slot); 44 - if (!page || radix_tree_exception(page)) { 45 - if (radix_tree_deref_retry(page)) { 46 - slot = radix_tree_iter_retry(&iter); 47 - continue; 48 - } 49 - } else if (page_count(page) - page_mapcount(page) > 1) { 50 - xa_lock_irq(&mapping->i_pages); 51 - radix_tree_tag_set(&mapping->i_pages, iter.index, 52 - MEMFD_TAG_PINNED); 53 - xa_unlock_irq(&mapping->i_pages); 54 - } 38 + xas_lock_irq(xas); 39 + xas_for_each(xas, page, ULONG_MAX) { 40 + if (xa_is_value(page)) 41 + continue; 42 + if (page_count(page) - page_mapcount(page) > 1) 43 + xas_set_mark(xas, MEMFD_TAG_PINNED); 55 44 56 - if (need_resched()) { 57 - slot = radix_tree_iter_resume(slot, &iter); 58 - cond_resched_rcu(); 59 - } 45 + if (++tagged % XA_CHECK_SCHED) 46 + continue; 47 + 48 + xas_pause(xas); 49 + xas_unlock_irq(xas); 50 + cond_resched(); 51 + xas_lock_irq(xas); 60 52 } 61 - rcu_read_unlock(); 53 + xas_unlock_irq(xas); 62 54 } 63 55 64 56 /* ··· 64 72 */ 65 73 static int memfd_wait_for_pins(struct address_space *mapping) 66 74 { 67 - struct radix_tree_iter iter; 68 - void __rcu **slot; 69 - pgoff_t start; 
75 + XA_STATE(xas, &mapping->i_pages, 0); 70 76 struct page *page; 71 77 int error, scan; 72 78 73 - memfd_tag_pins(mapping); 79 + memfd_tag_pins(&xas); 74 80 75 81 error = 0; 76 82 for (scan = 0; scan <= LAST_SCAN; scan++) { 77 - if (!radix_tree_tagged(&mapping->i_pages, MEMFD_TAG_PINNED)) 83 + unsigned int tagged = 0; 84 + 85 + if (!xas_marked(&xas, MEMFD_TAG_PINNED)) 78 86 break; 79 87 80 88 if (!scan) ··· 82 90 else if (schedule_timeout_killable((HZ << scan) / 200)) 83 91 scan = LAST_SCAN; 84 92 85 - start = 0; 86 - rcu_read_lock(); 87 - radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, 88 - start, MEMFD_TAG_PINNED) { 89 - 90 - page = radix_tree_deref_slot(slot); 91 - if (radix_tree_exception(page)) { 92 - if (radix_tree_deref_retry(page)) { 93 - slot = radix_tree_iter_retry(&iter); 94 - continue; 95 - } 96 - 97 - page = NULL; 98 - } 99 - 100 - if (page && 101 - page_count(page) - page_mapcount(page) != 1) { 102 - if (scan < LAST_SCAN) 103 - goto continue_resched; 104 - 93 + xas_set(&xas, 0); 94 + xas_lock_irq(&xas); 95 + xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) { 96 + bool clear = true; 97 + if (xa_is_value(page)) 98 + continue; 99 + if (page_count(page) - page_mapcount(page) != 1) { 105 100 /* 106 101 * On the last scan, we clean up all those tags 107 102 * we inserted; but make a note that we still 108 103 * found pages pinned. 
109 104 */ 110 - error = -EBUSY; 105 + if (scan == LAST_SCAN) 106 + error = -EBUSY; 107 + else 108 + clear = false; 111 109 } 110 + if (clear) 111 + xas_clear_mark(&xas, MEMFD_TAG_PINNED); 112 + if (++tagged % XA_CHECK_SCHED) 113 + continue; 112 114 113 - xa_lock_irq(&mapping->i_pages); 114 - radix_tree_tag_clear(&mapping->i_pages, 115 - iter.index, MEMFD_TAG_PINNED); 116 - xa_unlock_irq(&mapping->i_pages); 117 - continue_resched: 118 - if (need_resched()) { 119 - slot = radix_tree_iter_resume(slot, &iter); 120 - cond_resched_rcu(); 121 - } 115 + xas_pause(&xas); 116 + xas_unlock_irq(&xas); 117 + cond_resched(); 118 + xas_lock_irq(&xas); 122 119 } 123 - rcu_read_unlock(); 120 + xas_unlock_irq(&xas); 124 121 } 125 122 126 123 return error;
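Both memfd_tag_pins() and memfd_wait_for_pins() now hold the lock across the walk and drop it only every `XA_CHECK_SCHED` entries via `xas_pause()`/`cond_resched()`, rather than testing `need_resched()` on every slot. The batching hinges on the `if (++tagged % XA_CHECK_SCHED) continue;` idiom; a sketch of just that counting logic (the constant's value here is an assumption; the kernel defines it in its xarray headers):

```c
#include <assert.h>
#include <stddef.h>

#define XA_CHECK_SCHED 4096	/* assumed value for illustration */

/* Count how often a scan over @n entries would pause: the `continue`
 * skips the pause except on every XA_CHECK_SCHED-th processed entry. */
static unsigned int count_pauses(size_t n)
{
	unsigned int tagged = 0, pauses = 0;

	for (size_t i = 0; i < n; i++) {
		/* ... set or clear the mark on entry i here ... */
		if (++tagged % XA_CHECK_SCHED)
			continue;
		pauses++;	/* xas_pause(); unlock; cond_resched(); lock */
	}
	return pauses;
}
```

Note the inverted test: the modulo is nonzero (truthy) for all but the batch boundary, so the common case takes the `continue`.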
+18 -30
mm/migrate.c
··· 326 326 page = migration_entry_to_page(entry); 327 327 328 328 /* 329 - * Once radix-tree replacement of page migration started, page_count 329 + * Once page cache replacement of page migration started, page_count 330 330 * *must* be zero. And, we don't want to call wait_on_page_locked() 331 331 * against a page without get_page(). 332 332 * So, we use get_page_unless_zero(), here. Even failed, page fault ··· 441 441 struct buffer_head *head, enum migrate_mode mode, 442 442 int extra_count) 443 443 { 444 + XA_STATE(xas, &mapping->i_pages, page_index(page)); 444 445 struct zone *oldzone, *newzone; 445 446 int dirty; 446 447 int expected_count = 1 + extra_count; 447 - void **pslot; 448 448 449 449 /* 450 450 * Device public or private pages have an extra refcount as they are ··· 470 470 oldzone = page_zone(page); 471 471 newzone = page_zone(newpage); 472 472 473 - xa_lock_irq(&mapping->i_pages); 474 - 475 - pslot = radix_tree_lookup_slot(&mapping->i_pages, 476 - page_index(page)); 473 + xas_lock_irq(&xas); 477 474 478 475 expected_count += hpage_nr_pages(page) + page_has_private(page); 479 - if (page_count(page) != expected_count || 480 - radix_tree_deref_slot_protected(pslot, 481 - &mapping->i_pages.xa_lock) != page) { 482 - xa_unlock_irq(&mapping->i_pages); 476 + if (page_count(page) != expected_count || xas_load(&xas) != page) { 477 + xas_unlock_irq(&xas); 483 478 return -EAGAIN; 484 479 } 485 480 486 481 if (!page_ref_freeze(page, expected_count)) { 487 - xa_unlock_irq(&mapping->i_pages); 482 + xas_unlock_irq(&xas); 488 483 return -EAGAIN; 489 484 } 490 485 ··· 493 498 if (mode == MIGRATE_ASYNC && head && 494 499 !buffer_migrate_lock_buffers(head, mode)) { 495 500 page_ref_unfreeze(page, expected_count); 496 - xa_unlock_irq(&mapping->i_pages); 501 + xas_unlock_irq(&xas); 497 502 return -EAGAIN; 498 503 } 499 504 ··· 521 526 SetPageDirty(newpage); 522 527 } 523 528 524 - radix_tree_replace_slot(&mapping->i_pages, pslot, newpage); 529 + xas_store(&xas, 
newpage); 525 530 if (PageTransHuge(page)) { 526 531 int i; 527 - int index = page_index(page); 528 532 529 533 for (i = 1; i < HPAGE_PMD_NR; i++) { 530 - pslot = radix_tree_lookup_slot(&mapping->i_pages, 531 - index + i); 532 - radix_tree_replace_slot(&mapping->i_pages, pslot, 533 - newpage + i); 534 + xas_next(&xas); 535 + xas_store(&xas, newpage + i); 534 536 } 535 537 } 536 538 ··· 538 546 */ 539 547 page_ref_unfreeze(page, expected_count - hpage_nr_pages(page)); 540 548 541 - xa_unlock(&mapping->i_pages); 549 + xas_unlock(&xas); 542 550 /* Leave irq disabled to prevent preemption while updating stats */ 543 551 544 552 /* ··· 578 586 int migrate_huge_page_move_mapping(struct address_space *mapping, 579 587 struct page *newpage, struct page *page) 580 588 { 589 + XA_STATE(xas, &mapping->i_pages, page_index(page)); 581 590 int expected_count; 582 - void **pslot; 583 591 584 - xa_lock_irq(&mapping->i_pages); 585 - 586 - pslot = radix_tree_lookup_slot(&mapping->i_pages, page_index(page)); 587 - 592 + xas_lock_irq(&xas); 588 593 expected_count = 2 + page_has_private(page); 589 - if (page_count(page) != expected_count || 590 - radix_tree_deref_slot_protected(pslot, &mapping->i_pages.xa_lock) != page) { 591 - xa_unlock_irq(&mapping->i_pages); 594 + if (page_count(page) != expected_count || xas_load(&xas) != page) { 595 + xas_unlock_irq(&xas); 592 596 return -EAGAIN; 593 597 } 594 598 595 599 if (!page_ref_freeze(page, expected_count)) { 596 - xa_unlock_irq(&mapping->i_pages); 600 + xas_unlock_irq(&xas); 597 601 return -EAGAIN; 598 602 } 599 603 ··· 598 610 599 611 get_page(newpage); 600 612 601 - radix_tree_replace_slot(&mapping->i_pages, pslot, newpage); 613 + xas_store(&xas, newpage); 602 614 603 615 page_ref_unfreeze(page, expected_count - 1); 604 616 605 - xa_unlock_irq(&mapping->i_pages); 617 + xas_unlock_irq(&xas); 606 618 607 619 return MIGRATEPAGE_SUCCESS; 608 620 }
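The mm/migrate.c hunks keep the original refcount-freeze protocol but collapse the slot bookkeeping: verify the entry under the lock, freeze the old page's count, store the new page, unfreeze. A sketch of that sequence for the single-entry case (the `struct page`, global `slot` and error value are illustrative stand-ins, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

struct page { int refcount; };

static void *slot;	/* the cache entry being migrated */

/* Sketch of the move-mapping shape above: verify the slot still holds
 * @old at the expected refcount, freeze it, store @newpage, unfreeze. */
static int move_mapping(struct page *old, struct page *newpage, int expected)
{
	if (slot != old || old->refcount != expected)
		return -1;		/* -EAGAIN in the kernel */
	old->refcount = 0;		/* page_ref_freeze(page, expected) */
	newpage->refcount++;		/* get_page(newpage) */
	slot = newpage;			/* xas_store(&xas, newpage) */
	old->refcount = expected - 1;	/* page_ref_unfreeze() */
	return 0;
}
```

For transparent huge pages the diff extends the store step with `xas_next()`/`xas_store()` over the tail entries, avoiding a fresh lookup per subpage as the old `radix_tree_lookup_slot()` loop required.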
+1 -1
mm/mincore.c
··· 66 66 * shmem/tmpfs may return swap: account for swapcache 67 67 * page too. 68 68 */ 69 - if (radix_tree_exceptional_entry(page)) { 69 + if (xa_is_value(page)) { 70 70 swp_entry_t swp = radix_to_swp_entry(page); 71 71 page = find_get_page(swap_address_space(swp), 72 72 swp_offset(swp));
+26 -46
mm/page-writeback.c
···
  * dirty pages in the file (thus it is important for this function to be quick
  * so that it can tag pages faster than a dirtying process can create them).
  */
-/*
- * We tag pages in batches of WRITEBACK_TAG_BATCH to reduce the i_pages lock
- * latency.
- */
 void tag_pages_for_writeback(struct address_space *mapping,
 			     pgoff_t start, pgoff_t end)
 {
-#define WRITEBACK_TAG_BATCH 4096
-	unsigned long tagged = 0;
-	struct radix_tree_iter iter;
-	void **slot;
+	XA_STATE(xas, &mapping->i_pages, start);
+	unsigned int tagged = 0;
+	void *page;

-	xa_lock_irq(&mapping->i_pages);
-	radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, start,
-							PAGECACHE_TAG_DIRTY) {
-		if (iter.index > end)
-			break;
-		radix_tree_iter_tag_set(&mapping->i_pages, &iter,
-							PAGECACHE_TAG_TOWRITE);
-		tagged++;
-		if ((tagged % WRITEBACK_TAG_BATCH) != 0)
+	xas_lock_irq(&xas);
+	xas_for_each_marked(&xas, page, end, PAGECACHE_TAG_DIRTY) {
+		xas_set_mark(&xas, PAGECACHE_TAG_TOWRITE);
+		if (++tagged % XA_CHECK_SCHED)
 			continue;
-		slot = radix_tree_iter_resume(slot, &iter);
-		xa_unlock_irq(&mapping->i_pages);
+
+		xas_pause(&xas);
+		xas_unlock_irq(&xas);
 		cond_resched();
-		xa_lock_irq(&mapping->i_pages);
+		xas_lock_irq(&xas);
 	}
-	xa_unlock_irq(&mapping->i_pages);
+	xas_unlock_irq(&xas);
 }
 EXPORT_SYMBOL(tag_pages_for_writeback);
···
 	pgoff_t end;		/* Inclusive */
 	pgoff_t done_index;
 	int range_whole = 0;
-	int tag;
+	xa_mark_t tag;

 	pagevec_init(&pvec);
 	if (wbc->range_cyclic) {
···
 /*
  * For address_spaces which do not use buffers. Just tag the page as dirty in
- * its radix tree.
+ * the xarray.
  *
  * This is also used when a single buffer is being dirtied: we want to set the
  * page dirty in that case, but not all the buffers. This is a "bottom-up"
···
 	BUG_ON(page_mapping(page) != mapping);
 	WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
 	account_page_dirtied(page, mapping);
-	radix_tree_tag_set(&mapping->i_pages, page_index(page),
+	__xa_set_mark(&mapping->i_pages, page_index(page),
 			   PAGECACHE_TAG_DIRTY);
 	xa_unlock_irqrestore(&mapping->i_pages, flags);
 	unlock_page_memcg(page);
···
  * Returns true if the page was previously dirty.
  *
  * This is for preparing to put the page under writeout. We leave the page
- * tagged as dirty in the radix tree so that a concurrent write-for-sync
+ * tagged as dirty in the xarray so that a concurrent write-for-sync
  * can discover it via a PAGECACHE_TAG_DIRTY walk. The ->writepage
  * implementation will run either set_page_writeback() or set_page_dirty(),
- * at which stage we bring the page's dirty flag and radix-tree dirty tag
+ * at which stage we bring the page's dirty flag and xarray dirty tag
  * back into sync.
  *
- * This incoherency between the page's dirty flag and radix-tree tag is
+ * This incoherency between the page's dirty flag and xarray tag is
  * unfortunate, but it only exists while the page is locked.
  */
 int clear_page_dirty_for_io(struct page *page)
···
 	xa_lock_irqsave(&mapping->i_pages, flags);
 	ret = TestClearPageWriteback(page);
 	if (ret) {
-		radix_tree_tag_clear(&mapping->i_pages, page_index(page),
+		__xa_clear_mark(&mapping->i_pages, page_index(page),
 					PAGECACHE_TAG_WRITEBACK);
 		if (bdi_cap_account_writeback(bdi)) {
 			struct bdi_writeback *wb = inode_to_wb(inode);
···
 	lock_page_memcg(page);
 	if (mapping && mapping_use_writeback_tags(mapping)) {
+		XA_STATE(xas, &mapping->i_pages, page_index(page));
 		struct inode *inode = mapping->host;
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;

-		xa_lock_irqsave(&mapping->i_pages, flags);
+		xas_lock_irqsave(&xas, flags);
+		xas_load(&xas);
 		ret = TestSetPageWriteback(page);
 		if (!ret) {
 			bool on_wblist;
···
 			on_wblist = mapping_tagged(mapping,
 						   PAGECACHE_TAG_WRITEBACK);

-			radix_tree_tag_set(&mapping->i_pages, page_index(page),
-						PAGECACHE_TAG_WRITEBACK);
+			xas_set_mark(&xas, PAGECACHE_TAG_WRITEBACK);
 			if (bdi_cap_account_writeback(bdi))
 				inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);
···
 				sb_mark_inode_writeback(mapping->host);
 		}
 		if (!PageDirty(page))
-			radix_tree_tag_clear(&mapping->i_pages, page_index(page),
-						PAGECACHE_TAG_DIRTY);
+			xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY);
 		if (!keep_write)
-			radix_tree_tag_clear(&mapping->i_pages, page_index(page),
-						PAGECACHE_TAG_TOWRITE);
-		xa_unlock_irqrestore(&mapping->i_pages, flags);
+			xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
+		xas_unlock_irqrestore(&xas, flags);
 	} else {
 		ret = TestSetPageWriteback(page);
 	}
···
 }

 EXPORT_SYMBOL(__test_set_page_writeback);
-
-/*
- * Return true if any of the pages in the mapping are marked with the
- * passed tag.
- */
-int mapping_tagged(struct address_space *mapping, int tag)
-{
-	return radix_tree_tagged(&mapping->i_pages, tag);
-}
-EXPORT_SYMBOL(mapping_tagged);

 /**
  * wait_for_stable_page() - wait for writeback to finish, if necessary.
+4 -6
mm/readahead.c
···
 		if (page_offset > end_index)
 			break;

-		rcu_read_lock();
-		page = radix_tree_lookup(&mapping->i_pages, page_offset);
-		rcu_read_unlock();
-		if (page && !radix_tree_exceptional_entry(page)) {
+		page = xa_load(&mapping->i_pages, page_offset);
+		if (page && !xa_is_value(page)) {
 			/*
 			 * Page already present? Kick off the current batch of
 			 * contiguous pages before continuing with the next
···
 	pgoff_t head;

 	rcu_read_lock();
-	head = page_cache_prev_hole(mapping, offset - 1, max);
+	head = page_cache_prev_miss(mapping, offset - 1, max);
 	rcu_read_unlock();

 	return offset - 1 - head;
···
 	pgoff_t start;

 	rcu_read_lock();
-	start = page_cache_next_hole(mapping, offset + 1, max_pages);
+	start = page_cache_next_miss(mapping, offset + 1, max_pages);
 	rcu_read_unlock();

 	if (!start || start - offset > max_pages)
+74 -121
mm/shmem.c
···
 }

 /*
- * Replace item expected in radix tree by a new item, while holding tree lock.
+ * Replace item expected in xarray by a new item, while holding xa_lock.
  */
-static int shmem_radix_tree_replace(struct address_space *mapping,
+static int shmem_replace_entry(struct address_space *mapping,
 			pgoff_t index, void *expected, void *replacement)
 {
-	struct radix_tree_node *node;
-	void __rcu **pslot;
+	XA_STATE(xas, &mapping->i_pages, index);
 	void *item;

 	VM_BUG_ON(!expected);
 	VM_BUG_ON(!replacement);
-	item = __radix_tree_lookup(&mapping->i_pages, index, &node, &pslot);
-	if (!item)
-		return -ENOENT;
+	item = xas_load(&xas);
 	if (item != expected)
 		return -ENOENT;
-	__radix_tree_replace(&mapping->i_pages, node, pslot,
-			     replacement, NULL);
+	xas_store(&xas, replacement);
 	return 0;
 }
···
 static bool shmem_confirm_swap(struct address_space *mapping,
 			       pgoff_t index, swp_entry_t swap)
 {
-	void *item;
-
-	rcu_read_lock();
-	item = radix_tree_lookup(&mapping->i_pages, index);
-	rcu_read_unlock();
-	return item == swp_to_radix_entry(swap);
+	return xa_load(&mapping->i_pages, index) == swp_to_radix_entry(swap);
 }

 /*
···
  */
 static int shmem_add_to_page_cache(struct page *page,
 				   struct address_space *mapping,
-				   pgoff_t index, void *expected)
+				   pgoff_t index, void *expected, gfp_t gfp)
 {
-	int error, nr = hpage_nr_pages(page);
+	XA_STATE_ORDER(xas, &mapping->i_pages, index, compound_order(page));
+	unsigned long i = 0;
+	unsigned long nr = 1UL << compound_order(page);

 	VM_BUG_ON_PAGE(PageTail(page), page);
 	VM_BUG_ON_PAGE(index != round_down(index, nr), page);
···
 	page->mapping = mapping;
 	page->index = index;

-	xa_lock_irq(&mapping->i_pages);
-	if (PageTransHuge(page)) {
-		void __rcu **results;
-		pgoff_t idx;
-		int i;
-
-		error = 0;
-		if (radix_tree_gang_lookup_slot(&mapping->i_pages,
-					&results, &idx, index, 1) &&
-		    idx < index + HPAGE_PMD_NR) {
-			error = -EEXIST;
+	do {
+		void *entry;
+		xas_lock_irq(&xas);
+		entry = xas_find_conflict(&xas);
+		if (entry != expected)
+			xas_set_err(&xas, -EEXIST);
+		xas_create_range(&xas);
+		if (xas_error(&xas))
+			goto unlock;
+next:
+		xas_store(&xas, page + i);
+		if (++i < nr) {
+			xas_next(&xas);
+			goto next;
 		}
-
-		if (!error) {
-			for (i = 0; i < HPAGE_PMD_NR; i++) {
-				error = radix_tree_insert(&mapping->i_pages,
-						index + i, page + i);
-				VM_BUG_ON(error);
-			}
+		if (PageTransHuge(page)) {
 			count_vm_event(THP_FILE_ALLOC);
-		}
-	} else if (!expected) {
-		error = radix_tree_insert(&mapping->i_pages, index, page);
-	} else {
-		error = shmem_radix_tree_replace(mapping, index, expected,
-								 page);
-	}
-
-	if (!error) {
-		mapping->nrpages += nr;
-		if (PageTransHuge(page))
 			__inc_node_page_state(page, NR_SHMEM_THPS);
+		}
+		mapping->nrpages += nr;
 		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
 		__mod_node_page_state(page_pgdat(page), NR_SHMEM, nr);
-		xa_unlock_irq(&mapping->i_pages);
-	} else {
+unlock:
+		xas_unlock_irq(&xas);
+	} while (xas_nomem(&xas, gfp));
+
+	if (xas_error(&xas)) {
 		page->mapping = NULL;
-		xa_unlock_irq(&mapping->i_pages);
 		page_ref_sub(page, nr);
+		return xas_error(&xas);
 	}
-	return error;
+
+	return 0;
 }

 /*
···
 	VM_BUG_ON_PAGE(PageCompound(page), page);

 	xa_lock_irq(&mapping->i_pages);
-	error = shmem_radix_tree_replace(mapping, page->index, page, radswap);
+	error = shmem_replace_entry(mapping, page->index, page, radswap);
 	page->mapping = NULL;
 	mapping->nrpages--;
 	__dec_node_page_state(page, NR_FILE_PAGES);
···
 }

 /*
- * Remove swap entry from radix tree, free the swap and its page cache.
+ * Remove swap entry from page cache, free the swap and its page cache.
  */
 static int shmem_free_swap(struct address_space *mapping,
 			   pgoff_t index, void *radswap)
···
 	void *old;

 	xa_lock_irq(&mapping->i_pages);
-	old = radix_tree_delete_item(&mapping->i_pages, index, radswap);
+	old = __xa_cmpxchg(&mapping->i_pages, index, radswap, NULL, 0);
 	xa_unlock_irq(&mapping->i_pages);
 	if (old != radswap)
 		return -ENOENT;
···
 unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 						pgoff_t start, pgoff_t end)
 {
-	struct radix_tree_iter iter;
-	void __rcu **slot;
+	XA_STATE(xas, &mapping->i_pages, start);
 	struct page *page;
 	unsigned long swapped = 0;

 	rcu_read_lock();
-
-	radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
-		if (iter.index >= end)
-			break;
-
-		page = radix_tree_deref_slot(slot);
-
-		if (radix_tree_deref_retry(page)) {
-			slot = radix_tree_iter_retry(&iter);
+	xas_for_each(&xas, page, end - 1) {
+		if (xas_retry(&xas, page))
 			continue;
-		}
-
-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			swapped++;

 		if (need_resched()) {
-			slot = radix_tree_iter_resume(slot, &iter);
+			xas_pause(&xas);
 			cond_resched_rcu();
 		}
 	}
···
 }

 /*
- * Remove range of pages and swap entries from radix tree, and free them.
+ * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
  */
 static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
···
 		if (index >= end)
 			break;

-		if (radix_tree_exceptional_entry(page)) {
+		if (xa_is_value(page)) {
 			if (unfalloc)
 				continue;
 			nr_swaps_freed += !shmem_free_swap(mapping,
···
 		if (index >= end)
 			break;

-		if (radix_tree_exceptional_entry(page)) {
+		if (xa_is_value(page)) {
 			if (unfalloc)
 				continue;
 			if (shmem_free_swap(mapping, index, page)) {
···
 	clear_inode(inode);
 }

-static unsigned long find_swap_entry(struct radix_tree_root *root, void *item)
+static unsigned long find_swap_entry(struct xarray *xa, void *item)
 {
-	struct radix_tree_iter iter;
-	void __rcu **slot;
-	unsigned long found = -1;
+	XA_STATE(xas, xa, 0);
 	unsigned int checked = 0;
+	void *entry;

 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, root, &iter, 0) {
-		void *entry = radix_tree_deref_slot(slot);
-
-		if (radix_tree_deref_retry(entry)) {
-			slot = radix_tree_iter_retry(&iter);
+	xas_for_each(&xas, entry, ULONG_MAX) {
+		if (xas_retry(&xas, entry))
 			continue;
-		}
-		if (entry == item) {
-			found = iter.index;
+		if (entry == item)
 			break;
-		}
 		checked++;
-		if ((checked % 4096) != 0)
+		if ((checked % XA_CHECK_SCHED) != 0)
 			continue;
-		slot = radix_tree_iter_resume(slot, &iter);
+		xas_pause(&xas);
 		cond_resched_rcu();
 	}
-
 	rcu_read_unlock();
-	return found;
+
+	return entry ? xas.xa_index : -1;
 }

 /*
···
 	 * We needed to drop mutex to make that restrictive page
 	 * allocation, but the inode might have been freed while we
 	 * dropped it: although a racing shmem_evict_inode() cannot
-	 * complete without emptying the radix_tree, our page lock
+	 * complete without emptying the page cache, our page lock
 	 * on this swapcache page is not enough to prevent that -
 	 * free_swap_and_cache() of our swap entry will only
-	 * trylock_page(), removing swap from radix_tree whatever.
+	 * trylock_page(), removing swap from page cache whatever.
 	 *
 	 * We must not proceed to shmem_add_to_page_cache() if the
 	 * inode has been freed, but of course we cannot rely on
···
 	 */
 	if (!error)
 		error = shmem_add_to_page_cache(*pagep, mapping, index,
-						radswap);
+						radswap, gfp);
 	if (error != -ENOMEM) {
 		/*
 		 * Truncation and eviction use free_swap_and_cache(), which
···
 					    &memcg, false);
 	if (error)
 		goto out;
-	/* No radix_tree_preload: swap entry keeps a place for page in tree */
+	/* No memory allocation: swap entry occupies the slot for the page */
 	error = -EAGAIN;

 	mutex_lock(&shmem_swaplist_mutex);
···
 		struct shmem_inode_info *info, pgoff_t index)
 {
 	struct vm_area_struct pvma;
-	struct inode *inode = &info->vfs_inode;
-	struct address_space *mapping = inode->i_mapping;
-	pgoff_t idx, hindex;
-	void __rcu **results;
+	struct address_space *mapping = info->vfs_inode.i_mapping;
+	pgoff_t hindex;
 	struct page *page;

 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE))
 		return NULL;

 	hindex = round_down(index, HPAGE_PMD_NR);
-	rcu_read_lock();
-	if (radix_tree_gang_lookup_slot(&mapping->i_pages, &results, &idx,
-				hindex, 1) && idx < hindex + HPAGE_PMD_NR) {
-		rcu_read_unlock();
+	if (xa_find(&mapping->i_pages, &hindex, hindex + HPAGE_PMD_NR - 1,
+								XA_PRESENT))
 		return NULL;
-	}
-	rcu_read_unlock();

 	shmem_pseudo_vma_init(&pvma, info, hindex);
 	page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
···
 	 * a nice clean interface for us to replace oldpage by newpage there.
 	 */
 	xa_lock_irq(&swap_mapping->i_pages);
-	error = shmem_radix_tree_replace(swap_mapping, swap_index, oldpage,
-								   newpage);
+	error = shmem_replace_entry(swap_mapping, swap_index, oldpage, newpage);
 	if (!error) {
 		__inc_node_page_state(newpage, NR_FILE_PAGES);
 		__dec_node_page_state(oldpage, NR_FILE_PAGES);
···
 repeat:
 	swap.val = 0;
 	page = find_lock_entry(mapping, index);
-	if (radix_tree_exceptional_entry(page)) {
+	if (xa_is_value(page)) {
 		swap = radix_to_swp_entry(page);
 		page = NULL;
 	}
···
 						false);
 	if (!error) {
 		error = shmem_add_to_page_cache(page, mapping, index,
-						swp_to_radix_entry(swap));
+						swp_to_radix_entry(swap), gfp);
 		/*
 		 * We already confirmed swap under page lock, and make
 		 * no memory allocation here, so usually no possibility
···
 							    PageTransHuge(page));
 		if (error)
 			goto unacct;
-		error = radix_tree_maybe_preload_order(gfp & GFP_RECLAIM_MASK,
-						compound_order(page));
-		if (!error) {
-			error = shmem_add_to_page_cache(page, mapping, hindex,
-							NULL);
-			radix_tree_preload_end();
-		}
+		error = shmem_add_to_page_cache(page, mapping, hindex,
+						NULL, gfp & GFP_RECLAIM_MASK);
 		if (error) {
 			mem_cgroup_cancel_charge(page, memcg,
 						 PageTransHuge(page));
···
 		spin_unlock_irq(&info->lock);
 		goto repeat;
 	}
-	if (error == -EEXIST)	/* from above or from radix_tree_insert */
+	if (error == -EEXIST)
 		goto repeat;
 	return error;
 }
···
 	if (ret)
 		goto out_release;

-	ret = radix_tree_maybe_preload(gfp & GFP_RECLAIM_MASK);
-	if (!ret) {
-		ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL);
-		radix_tree_preload_end();
-	}
+	ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
+				      gfp & GFP_RECLAIM_MASK);
 	if (ret)
 		goto out_release_uncharge;
···
 }

 /*
- * llseek SEEK_DATA or SEEK_HOLE through the radix_tree.
+ * llseek SEEK_DATA or SEEK_HOLE through the page cache.
  */
 static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
 					pgoff_t index, pgoff_t end, int whence)
···
 			index = indices[i];
 		}
 		page = pvec.pages[i];
-		if (page && !radix_tree_exceptional_entry(page)) {
+		if (page && !xa_is_value(page)) {
 			if (!PageUptodate(page))
 				page = NULL;
 		}
+3 -3
mm/swap.c
···
 	for (i = 0, j = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
-		if (!radix_tree_exceptional_entry(page))
+		if (!xa_is_value(page))
 			pvec->pages[j++] = page;
 	}
 	pvec->nr = j;
···
 unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t *index, pgoff_t end,
-		int tag)
+		xa_mark_t tag)
 {
 	pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
 					PAGEVEC_SIZE, pvec->pages);
···
 unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t *index, pgoff_t end,
-		int tag, unsigned max_pages)
+		xa_mark_t tag, unsigned max_pages)
 {
 	pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
 		min_t(unsigned int, max_pages, PAGEVEC_SIZE), pvec->pages);
+40 -79
mm/swap_state.c
···
 }

 /*
- * __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
+ * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
  * but sets SwapCache flag and private instead of mapping and index.
  */
-int __add_to_swap_cache(struct page *page, swp_entry_t entry)
+int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
 {
-	int error, i, nr = hpage_nr_pages(page);
-	struct address_space *address_space;
+	struct address_space *address_space = swap_address_space(entry);
 	pgoff_t idx = swp_offset(entry);
+	XA_STATE_ORDER(xas, &address_space->i_pages, idx, compound_order(page));
+	unsigned long i, nr = 1UL << compound_order(page);

 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapCache(page), page);
···
 	page_ref_add(page, nr);
 	SetPageSwapCache(page);

-	address_space = swap_address_space(entry);
-	xa_lock_irq(&address_space->i_pages);
-	for (i = 0; i < nr; i++) {
-		set_page_private(page + i, entry.val + i);
-		error = radix_tree_insert(&address_space->i_pages,
-					  idx + i, page + i);
-		if (unlikely(error))
-			break;
-	}
-	if (likely(!error)) {
+	do {
+		xas_lock_irq(&xas);
+		xas_create_range(&xas);
+		if (xas_error(&xas))
+			goto unlock;
+		for (i = 0; i < nr; i++) {
+			VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
+			set_page_private(page + i, entry.val + i);
+			xas_store(&xas, page + i);
+			xas_next(&xas);
+		}
 		address_space->nrpages += nr;
 		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
 		ADD_CACHE_INFO(add_total, nr);
-	} else {
-		/*
-		 * Only the context which have set SWAP_HAS_CACHE flag
-		 * would call add_to_swap_cache().
-		 * So add_to_swap_cache() doesn't returns -EEXIST.
-		 */
-		VM_BUG_ON(error == -EEXIST);
-		set_page_private(page + i, 0UL);
-		while (i--) {
-			radix_tree_delete(&address_space->i_pages, idx + i);
-			set_page_private(page + i, 0UL);
-		}
-		ClearPageSwapCache(page);
-		page_ref_sub(page, nr);
-	}
-	xa_unlock_irq(&address_space->i_pages);
+unlock:
+		xas_unlock_irq(&xas);
+	} while (xas_nomem(&xas, gfp));

-	return error;
-}
+	if (!xas_error(&xas))
+		return 0;

-
-int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask)
-{
-	int error;
-
-	error = radix_tree_maybe_preload_order(gfp_mask, compound_order(page));
-	if (!error) {
-		error = __add_to_swap_cache(page, entry);
-		radix_tree_preload_end();
-	}
-	return error;
+	ClearPageSwapCache(page);
+	page_ref_sub(page, nr);
+	return xas_error(&xas);
 }

 /*
  * This must be called only on pages that have
  * been verified to be in the swap cache.
  */
-void __delete_from_swap_cache(struct page *page)
+void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
 {
-	struct address_space *address_space;
+	struct address_space *address_space = swap_address_space(entry);
 	int i, nr = hpage_nr_pages(page);
-	swp_entry_t entry;
-	pgoff_t idx;
+	pgoff_t idx = swp_offset(entry);
+	XA_STATE(xas, &address_space->i_pages, idx);

 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	VM_BUG_ON_PAGE(PageWriteback(page), page);

-	entry.val = page_private(page);
-	address_space = swap_address_space(entry);
-	idx = swp_offset(entry);
 	for (i = 0; i < nr; i++) {
-		radix_tree_delete(&address_space->i_pages, idx + i);
+		void *entry = xas_store(&xas, NULL);
+		VM_BUG_ON_PAGE(entry != page + i, entry);
 		set_page_private(page + i, 0);
+		xas_next(&xas);
 	}
 	ClearPageSwapCache(page);
 	address_space->nrpages -= nr;
···
 		return 0;

 	/*
-	 * Radix-tree node allocations from PF_MEMALLOC contexts could
+	 * XArray node allocations from PF_MEMALLOC contexts could
 	 * completely exhaust the page allocator. __GFP_NOMEMALLOC
 	 * stops emergency reserves from being allocated.
 	 *
···
 	 */
 	err = add_to_swap_cache(page, entry,
 			__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
-	/* -ENOMEM radix-tree allocation failure */
 	if (err)
 		/*
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
···
  */
 void delete_from_swap_cache(struct page *page)
 {
-	swp_entry_t entry;
-	struct address_space *address_space;
+	swp_entry_t entry = { .val = page_private(page) };
+	struct address_space *address_space = swap_address_space(entry);

-	entry.val = page_private(page);
-
-	address_space = swap_address_space(entry);
 	xa_lock_irq(&address_space->i_pages);
-	__delete_from_swap_cache(page);
+	__delete_from_swap_cache(page, entry);
 	xa_unlock_irq(&address_space->i_pages);

 	put_swap_page(page, entry);
···
 	}

 	/*
-	 * call radix_tree_preload() while we can wait.
-	 */
-	err = radix_tree_maybe_preload(gfp_mask & GFP_KERNEL);
-	if (err)
-		break;
-
-	/*
 	 * Swap entry may have been freed since our caller observed it.
 	 */
 	err = swapcache_prepare(entry);
 	if (err == -EEXIST) {
-		radix_tree_preload_end();
 		/*
 		 * We might race against get_swap_page() and stumble
 		 * across a SWAP_HAS_CACHE swap_map entry whose page
···
 		 */
 		cond_resched();
 		continue;
-	}
-	if (err) {		/* swp entry is obsolete ? */
-		radix_tree_preload_end();
+	} else if (err)		/* swp entry is obsolete ? */
 		break;
-	}

-	/* May fail (-ENOMEM) if radix-tree node allocation failed. */
+	/* May fail (-ENOMEM) if XArray node allocation failed. */
 	__SetPageLocked(new_page);
 	__SetPageSwapBacked(new_page);
-	err = __add_to_swap_cache(new_page, entry);
+	err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
 	if (likely(!err)) {
-		radix_tree_preload_end();
-		/*
-		 * Initiate read into locked page and return.
-		 */
+		/* Initiate read into locked page */
 		SetPageWorkingset(new_page);
 		lru_cache_add_anon(new_page);
 		*new_page_allocated = true;
 		return new_page;
 	}
-	radix_tree_preload_end();
 	__ClearPageLocked(new_page);
 	/*
 	 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
···
 		return -ENOMEM;
 	for (i = 0; i < nr; i++) {
 		space = spaces + i;
-		INIT_RADIX_TREE(&space->i_pages, GFP_ATOMIC|__GFP_NOWARN);
+		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ);
 		atomic_set(&space->i_mmap_writable, 0);
 		space->a_ops = &swap_aops;
 		/* swap cache doesn't use writeback related tags */
+12 -15
mm/truncate.c
···
 static inline void __clear_shadow_entry(struct address_space *mapping,
 				pgoff_t index, void *entry)
 {
-	struct radix_tree_node *node;
-	void **slot;
+	XA_STATE(xas, &mapping->i_pages, index);

-	if (!__radix_tree_lookup(&mapping->i_pages, index, &node, &slot))
+	xas_set_update(&xas, workingset_update_node);
+	if (xas_load(&xas) != entry)
 		return;
-	if (*slot != entry)
-		return;
-	__radix_tree_replace(&mapping->i_pages, node, slot, NULL,
-			     workingset_update_node);
+	xas_store(&xas, NULL);
 	mapping->nrexceptional--;
 }
···
 		return;

 	for (j = 0; j < pagevec_count(pvec); j++)
-		if (radix_tree_exceptional_entry(pvec->pages[j]))
+		if (xa_is_value(pvec->pages[j]))
 			break;

 	if (j == pagevec_count(pvec))
···
 		struct page *page = pvec->pages[i];
 		pgoff_t index = indices[i];

-		if (!radix_tree_exceptional_entry(page)) {
+		if (!xa_is_value(page)) {
 			pvec->pages[j++] = page;
 			continue;
 		}
···
 		if (index >= end)
 			break;

-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			continue;

 		if (!trylock_page(page))
···
 			break;
 		}

-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			continue;

 		lock_page(page);
···
 		if (index > end)
 			break;

-		if (radix_tree_exceptional_entry(page)) {
+		if (xa_is_value(page)) {
 			invalidate_exceptional_entry(mapping, index,
 						     page);
 			continue;
···
 		if (index > end)
 			break;

-		if (radix_tree_exceptional_entry(page)) {
+		if (xa_is_value(page)) {
 			if (!invalidate_exceptional_entry2(mapping,
 							   index, page))
 				ret = -EBUSY;
···
 		index++;
 	}
 	/*
-	 * For DAX we invalidate page tables after invalidating radix tree. We
+	 * For DAX we invalidate page tables after invalidating page cache. We
 	 * could invalidate page tables while invalidating each entry however
 	 * that would be expensive. And doing range unmapping before doesn't
-	 * work as we have no cheap way to find whether radix tree entry didn't
+	 * work as we have no cheap way to find whether page cache entry didn't
 	 * get remapped later.
 	 */
 	if (dax_mapping(mapping)) {
+5 -5
mm/vmscan.c
···
 {
 	/*
 	 * A freeable page cache page is referenced only by the caller
-	 * that isolated the page, the page cache radix tree and
-	 * optional buffer heads at page->private.
+	 * that isolated the page, the page cache and optional buffer
+	 * heads at page->private.
 	 */
-	int radix_pins = PageTransHuge(page) && PageSwapCache(page) ?
+	int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
 		HPAGE_PMD_NR : 1;
-	return page_count(page) - page_has_private(page) == 1 + radix_pins;
+	return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
 }

 static int may_write_to_inode(struct inode *inode, struct scan_control *sc)
···
 	if (PageSwapCache(page)) {
 		swp_entry_t swap = { .val = page_private(page) };
 		mem_cgroup_swapout(page, swap);
-		__delete_from_swap_cache(page);
+		__delete_from_swap_cache(page, swap);
 		xa_unlock_irqrestore(&mapping->i_pages, flags);
 		put_swap_page(page, swap);
 	} else {
+29 -39
mm/workingset.c
···
  * and activations is maintained (node->inactive_age).
  *
  * On eviction, a snapshot of this counter (along with some bits to
- * identify the node) is stored in the now empty page cache radix tree
+ * identify the node) is stored in the now empty page cache
  * slot of the evicted page. This is called a shadow entry.
  *
  * On cache misses for which there are shadow entries, an eligible
  * refault distance will immediately activate the refaulting page.
  */

-#define EVICTION_SHIFT	(RADIX_TREE_EXCEPTIONAL_ENTRY + \
+#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) + \
 			 1 + NODES_SHIFT + MEM_CGROUP_ID_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)

 /*
  * Eviction timestamps need to be able to cover the full range of
- * actionable refaults. However, bits are tight in the radix tree
+ * actionable refaults. However, bits are tight in the xarray
  * entry, and after storing the identifier for the lruvec there might
  * not be enough left to represent every single actionable refault. In
  * that case, we have to sacrifice granularity for distance, and group
···
 			 bool workingset)
 {
 	eviction >>= bucket_order;
+	eviction &= EVICTION_MASK;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
 	eviction = (eviction << 1) | workingset;
-	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);

-	return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
+	return xa_mk_value(eviction);
 }

 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 			  unsigned long *evictionp, bool *workingsetp)
 {
-	unsigned long entry = (unsigned long)shadow;
+	unsigned long entry = xa_to_value(shadow);
 	int memcgid, nid;
 	bool workingset;

-	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
 	workingset = entry & 1;
 	entry >>= 1;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
···
 static struct list_lru shadow_nodes;

-void workingset_update_node(struct radix_tree_node *node)
+void workingset_update_node(struct xa_node *node)
 {
 	/*
 	 * Track non-empty nodes that contain only shadow entries;
···
 	 */
 	VM_WARN_ON_ONCE(!irqs_disabled());  /* For __inc_lruvec_page_state */

-	if (node->count && node->count == node->exceptional) {
+	if (node->count && node->count == node->nr_values) {
 		if (list_empty(&node->private_list)) {
 			list_lru_add(&shadow_nodes, &node->private_list);
 			__inc_lruvec_page_state(virt_to_page(node),
···
 	nodes = list_lru_shrink_count(&shadow_nodes, sc);

 	/*
-	 * Approximate a reasonable limit for the radix tree nodes
+	 * Approximate a reasonable limit for the nodes
 	 * containing shadow entries. We don't need to keep more
 	 * shadow entries than possible pages on the active list,
 	 * since refault distances bigger than that are dismissed.
···
 	 * worst-case density of 1/8th. Below that, not all eligible
 	 * refaults can be detected anymore.
 	 *
-	 * On 64-bit with 7 radix_tree_nodes per page and 64 slots
+	 * On 64-bit with 7 xa_nodes per page and 64 slots
 	 * each, this will reclaim shadow entries when they consume
 	 * ~1.8% of available memory:
 	 *
-	 * PAGE_SIZE / radix_tree_nodes / node_entries * 8 / PAGE_SIZE
+	 * PAGE_SIZE / xa_nodes / node_entries * 8 / PAGE_SIZE
 	 */
 #ifdef CONFIG_MEMCG
 	if (sc->memcg) {
···
 #endif
 	pages = node_present_pages(sc->nid);

-	max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
+	max_nodes = pages >> (XA_CHUNK_SHIFT - 3);

 	if (!nodes)
 		return SHRINK_EMPTY;
···
 static enum lru_status shadow_lru_isolate(struct list_head *item,
 					  struct list_lru_one *lru,
 					  spinlock_t *lru_lock,
-					  void *arg)
+					  void *arg) __must_hold(lru_lock)
 {
+	struct xa_node *node = container_of(item, struct xa_node, private_list);
+	XA_STATE(xas, node->array, 0);
 	struct address_space *mapping;
-	struct radix_tree_node *node;
-	unsigned int i;
 	int ret;

 	/*
···
 	 * the shadow node LRU under the i_pages lock and the
 	 * lru_lock. Because the page cache tree is emptied before
 	 * the inode can be destroyed, holding the lru_lock pins any
-	 * address_space that has radix tree nodes on the LRU.
+	 * address_space that has nodes on the LRU.
 	 *
 	 * We can then safely transition to the i_pages lock to
 	 * pin only the address_space of the particular node we want
 	 * to reclaim, take the node off-LRU, and drop the lru_lock.
470 471 */ 471 472 472 - node = container_of(item, struct radix_tree_node, private_list); 473 - mapping = container_of(node->root, struct address_space, i_pages); 473 + mapping = container_of(node->array, struct address_space, i_pages); 474 474 475 475 /* Coming from the list, invert the lock order */ 476 476 if (!xa_trylock(&mapping->i_pages)) { ··· 488 490 * no pages, so we expect to be able to remove them all and 489 491 * delete and free the empty node afterwards. 490 492 */ 491 - if (WARN_ON_ONCE(!node->exceptional)) 493 + if (WARN_ON_ONCE(!node->nr_values)) 492 494 goto out_invalid; 493 - if (WARN_ON_ONCE(node->count != node->exceptional)) 495 + if (WARN_ON_ONCE(node->count != node->nr_values)) 494 496 goto out_invalid; 495 - for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) { 496 - if (node->slots[i]) { 497 - if (WARN_ON_ONCE(!radix_tree_exceptional_entry(node->slots[i]))) 498 - goto out_invalid; 499 - if (WARN_ON_ONCE(!node->exceptional)) 500 - goto out_invalid; 501 - if (WARN_ON_ONCE(!mapping->nrexceptional)) 502 - goto out_invalid; 503 - node->slots[i] = NULL; 504 - node->exceptional--; 505 - node->count--; 506 - mapping->nrexceptional--; 507 - } 508 - } 509 - if (WARN_ON_ONCE(node->exceptional)) 510 - goto out_invalid; 497 + mapping->nrexceptional -= node->nr_values; 498 + xas.xa_node = xa_parent_locked(&mapping->i_pages, node); 499 + xas.xa_offset = node->offset; 500 + xas.xa_shift = node->shift + XA_CHUNK_SHIFT; 501 + xas_set_update(&xas, workingset_update_node); 502 + /* 503 + * We could store a shadow entry here which was the minimum of the 504 + * shadow entries we were tracking ... 505 + */ 506 + xas_store(&xas, NULL); 511 507 __inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM); 512 - __radix_tree_delete_node(&mapping->i_pages, node, 513 - workingset_lookup_update(mapping)); 514 508 515 509 out_invalid: 516 510 xa_unlock_irq(&mapping->i_pages);
+1
tools/include/asm-generic/bitops.h
··· 27 27 #include <asm-generic/bitops/hweight.h> 28 28 29 29 #include <asm-generic/bitops/atomic.h> 30 + #include <asm-generic/bitops/non-atomic.h> 30 31 31 32 #endif /* __TOOLS_ASM_GENERIC_BITOPS_H */
-9
tools/include/asm-generic/bitops/atomic.h
··· 15 15 addr[nr / __BITS_PER_LONG] &= ~(1UL << (nr % __BITS_PER_LONG)); 16 16 } 17 17 18 - static __always_inline int test_bit(unsigned int nr, const unsigned long *addr) 19 - { 20 - return ((1UL << (nr % __BITS_PER_LONG)) & 21 - (((unsigned long *)addr)[nr / __BITS_PER_LONG])) != 0; 22 - } 23 - 24 - #define __set_bit(nr, addr) set_bit(nr, addr) 25 - #define __clear_bit(nr, addr) clear_bit(nr, addr) 26 - 27 18 #endif /* _TOOLS_LINUX_ASM_GENERIC_BITOPS_ATOMIC_H_ */
+109
tools/include/asm-generic/bitops/non-atomic.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ 3 + #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ 4 + 5 + #include <asm/types.h> 6 + 7 + /** 8 + * __set_bit - Set a bit in memory 9 + * @nr: the bit to set 10 + * @addr: the address to start counting from 11 + * 12 + * Unlike set_bit(), this function is non-atomic and may be reordered. 13 + * If it's called on the same region of memory simultaneously, the effect 14 + * may be that only one operation succeeds. 15 + */ 16 + static inline void __set_bit(int nr, volatile unsigned long *addr) 17 + { 18 + unsigned long mask = BIT_MASK(nr); 19 + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); 20 + 21 + *p |= mask; 22 + } 23 + 24 + static inline void __clear_bit(int nr, volatile unsigned long *addr) 25 + { 26 + unsigned long mask = BIT_MASK(nr); 27 + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); 28 + 29 + *p &= ~mask; 30 + } 31 + 32 + /** 33 + * __change_bit - Toggle a bit in memory 34 + * @nr: the bit to change 35 + * @addr: the address to start counting from 36 + * 37 + * Unlike change_bit(), this function is non-atomic and may be reordered. 38 + * If it's called on the same region of memory simultaneously, the effect 39 + * may be that only one operation succeeds. 40 + */ 41 + static inline void __change_bit(int nr, volatile unsigned long *addr) 42 + { 43 + unsigned long mask = BIT_MASK(nr); 44 + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); 45 + 46 + *p ^= mask; 47 + } 48 + 49 + /** 50 + * __test_and_set_bit - Set a bit and return its old value 51 + * @nr: Bit to set 52 + * @addr: Address to count from 53 + * 54 + * This operation is non-atomic and can be reordered. 55 + * If two examples of this operation race, one can appear to succeed 56 + * but actually fail. You must protect multiple accesses with a lock. 
57 + */ 58 + static inline int __test_and_set_bit(int nr, volatile unsigned long *addr) 59 + { 60 + unsigned long mask = BIT_MASK(nr); 61 + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); 62 + unsigned long old = *p; 63 + 64 + *p = old | mask; 65 + return (old & mask) != 0; 66 + } 67 + 68 + /** 69 + * __test_and_clear_bit - Clear a bit and return its old value 70 + * @nr: Bit to clear 71 + * @addr: Address to count from 72 + * 73 + * This operation is non-atomic and can be reordered. 74 + * If two examples of this operation race, one can appear to succeed 75 + * but actually fail. You must protect multiple accesses with a lock. 76 + */ 77 + static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr) 78 + { 79 + unsigned long mask = BIT_MASK(nr); 80 + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); 81 + unsigned long old = *p; 82 + 83 + *p = old & ~mask; 84 + return (old & mask) != 0; 85 + } 86 + 87 + /* WARNING: non atomic and it can be reordered! */ 88 + static inline int __test_and_change_bit(int nr, 89 + volatile unsigned long *addr) 90 + { 91 + unsigned long mask = BIT_MASK(nr); 92 + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); 93 + unsigned long old = *p; 94 + 95 + *p = old ^ mask; 96 + return (old & mask) != 0; 97 + } 98 + 99 + /** 100 + * test_bit - Determine whether a bit is set 101 + * @nr: bit number to test 102 + * @addr: Address to start counting from 103 + */ 104 + static inline int test_bit(int nr, const volatile unsigned long *addr) 105 + { 106 + return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1))); 107 + } 108 + 109 + #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
+1
tools/include/linux/bitmap.h
··· 15 15 const unsigned long *bitmap2, int bits); 16 16 int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1, 17 17 const unsigned long *bitmap2, unsigned int bits); 18 + void bitmap_clear(unsigned long *map, unsigned int start, int len); 18 19 19 20 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1))) 20 21
+1
tools/include/linux/kernel.h
··· 70 70 #define BUG_ON(cond) assert(!(cond)) 71 71 #endif 72 72 #endif 73 + #define BUG() BUG_ON(1) 73 74 74 75 #if __BYTE_ORDER == __BIG_ENDIAN 75 76 #define cpu_to_le16 bswap_16
+9 -1
tools/include/linux/spinlock.h
··· 8 8 #define spinlock_t pthread_mutex_t 9 9 #define DEFINE_SPINLOCK(x) pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER 10 10 #define __SPIN_LOCK_UNLOCKED(x) (pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER 11 - #define spin_lock_init(x) pthread_mutex_init(x, NULL) 11 + #define spin_lock_init(x) pthread_mutex_init(x, NULL) 12 12 13 + #define spin_lock(x) pthread_mutex_lock(x) 14 + #define spin_unlock(x) pthread_mutex_unlock(x) 15 + #define spin_lock_bh(x) pthread_mutex_lock(x) 16 + #define spin_unlock_bh(x) pthread_mutex_unlock(x) 17 + #define spin_lock_irq(x) pthread_mutex_lock(x) 18 + #define spin_unlock_irq(x) pthread_mutex_unlock(x) 13 19 #define spin_lock_irqsave(x, f) (void)f, pthread_mutex_lock(x) 14 20 #define spin_unlock_irqrestore(x, f) (void)f, pthread_mutex_unlock(x) 15 21 ··· 36 30 { 37 31 return true; 38 32 } 33 + 34 + #include <linux/lockdep.h> 39 35 40 36 #endif
+1
tools/testing/radix-tree/.gitignore
··· 4 4 main 5 5 multiorder 6 6 radix-tree.c 7 + xarray
+8 -3
tools/testing/radix-tree/Makefile
··· 4 4 -fsanitize=undefined 5 5 LDFLAGS += -fsanitize=address -fsanitize=undefined 6 6 LDLIBS+= -lpthread -lurcu 7 - TARGETS = main idr-test multiorder 8 - CORE_OFILES := radix-tree.o idr.o linux.o test.o find_bit.o 7 + TARGETS = main idr-test multiorder xarray 8 + CORE_OFILES := xarray.o radix-tree.o idr.o linux.o test.o find_bit.o bitmap.o 9 9 OFILES = main.o $(CORE_OFILES) regression1.o regression2.o regression3.o \ 10 10 tag_check.o multiorder.o idr-test.o iteration_check.o benchmark.o 11 11 ··· 25 25 idr-test.o: ../../../lib/test_ida.c 26 26 idr-test: idr-test.o $(CORE_OFILES) 27 27 28 + xarray: $(CORE_OFILES) 29 + 28 30 multiorder: multiorder.o $(CORE_OFILES) 29 31 30 32 clean: ··· 37 35 $(OFILES): Makefile *.h */*.h generated/map-shift.h \ 38 36 ../../include/linux/*.h \ 39 37 ../../include/asm/*.h \ 38 + ../../../include/linux/xarray.h \ 40 39 ../../../include/linux/radix-tree.h \ 41 40 ../../../include/linux/idr.h 42 41 ··· 47 44 idr.c: ../../../lib/idr.c 48 45 sed -e 's/^static //' -e 's/__always_inline //' -e 's/inline //' < $< > $@ 49 46 47 + xarray.o: ../../../lib/xarray.c ../../../lib/test_xarray.c 48 + 50 49 generated/map-shift.h: 51 50 @if ! grep -qws $(SHIFT) generated/map-shift.h; then \ 52 - echo "#define RADIX_TREE_MAP_SHIFT $(SHIFT)" > \ 51 + echo "#define XA_CHUNK_SHIFT $(SHIFT)" > \ 53 52 generated/map-shift.h; \ 54 53 fi
+21 -120
tools/testing/radix-tree/benchmark.c
··· 17 17 #include <time.h> 18 18 #include "test.h" 19 19 20 - #define for_each_index(i, base, order) \ 21 - for (i = base; i < base + (1 << order); i++) 22 - 23 20 #define NSEC_PER_SEC 1000000000L 24 21 25 22 static long long benchmark_iter(struct radix_tree_root *root, bool tagged) ··· 58 61 } 59 62 60 63 static void benchmark_insert(struct radix_tree_root *root, 61 - unsigned long size, unsigned long step, int order) 64 + unsigned long size, unsigned long step) 62 65 { 63 66 struct timespec start, finish; 64 67 unsigned long index; ··· 67 70 clock_gettime(CLOCK_MONOTONIC, &start); 68 71 69 72 for (index = 0 ; index < size ; index += step) 70 - item_insert_order(root, index, order); 73 + item_insert(root, index); 71 74 72 75 clock_gettime(CLOCK_MONOTONIC, &finish); 73 76 74 77 nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC + 75 78 (finish.tv_nsec - start.tv_nsec); 76 79 77 - printv(2, "Size: %8ld, step: %8ld, order: %d, insertion: %15lld ns\n", 78 - size, step, order, nsec); 80 + printv(2, "Size: %8ld, step: %8ld, insertion: %15lld ns\n", 81 + size, step, nsec); 79 82 } 80 83 81 84 static void benchmark_tagging(struct radix_tree_root *root, 82 - unsigned long size, unsigned long step, int order) 85 + unsigned long size, unsigned long step) 83 86 { 84 87 struct timespec start, finish; 85 88 unsigned long index; ··· 95 98 nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC + 96 99 (finish.tv_nsec - start.tv_nsec); 97 100 98 - printv(2, "Size: %8ld, step: %8ld, order: %d, tagging: %17lld ns\n", 99 - size, step, order, nsec); 101 + printv(2, "Size: %8ld, step: %8ld, tagging: %17lld ns\n", 102 + size, step, nsec); 100 103 } 101 104 102 105 static void benchmark_delete(struct radix_tree_root *root, 103 - unsigned long size, unsigned long step, int order) 106 + unsigned long size, unsigned long step) 104 107 { 105 108 struct timespec start, finish; 106 - unsigned long index, i; 109 + unsigned long index; 107 110 long long nsec; 108 111 109 112 
clock_gettime(CLOCK_MONOTONIC, &start); 110 113 111 114 for (index = 0 ; index < size ; index += step) 112 - for_each_index(i, index, order) 113 - item_delete(root, i); 115 + item_delete(root, index); 114 116 115 117 clock_gettime(CLOCK_MONOTONIC, &finish); 116 118 117 119 nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC + 118 120 (finish.tv_nsec - start.tv_nsec); 119 121 120 - printv(2, "Size: %8ld, step: %8ld, order: %d, deletion: %16lld ns\n", 121 - size, step, order, nsec); 122 + printv(2, "Size: %8ld, step: %8ld, deletion: %16lld ns\n", 123 + size, step, nsec); 122 124 } 123 125 124 - static void benchmark_size(unsigned long size, unsigned long step, int order) 126 + static void benchmark_size(unsigned long size, unsigned long step) 125 127 { 126 128 RADIX_TREE(tree, GFP_KERNEL); 127 129 long long normal, tagged; 128 130 129 - benchmark_insert(&tree, size, step, order); 130 - benchmark_tagging(&tree, size, step, order); 131 + benchmark_insert(&tree, size, step); 132 + benchmark_tagging(&tree, size, step); 131 133 132 134 tagged = benchmark_iter(&tree, true); 133 135 normal = benchmark_iter(&tree, false); 134 136 135 - printv(2, "Size: %8ld, step: %8ld, order: %d, tagged iteration: %8lld ns\n", 136 - size, step, order, tagged); 137 - printv(2, "Size: %8ld, step: %8ld, order: %d, normal iteration: %8lld ns\n", 138 - size, step, order, normal); 137 + printv(2, "Size: %8ld, step: %8ld, tagged iteration: %8lld ns\n", 138 + size, step, tagged); 139 + printv(2, "Size: %8ld, step: %8ld, normal iteration: %8lld ns\n", 140 + size, step, normal); 139 141 140 - benchmark_delete(&tree, size, step, order); 142 + benchmark_delete(&tree, size, step); 141 143 142 144 item_kill_tree(&tree); 143 145 rcu_barrier(); 144 - } 145 - 146 - static long long __benchmark_split(unsigned long index, 147 - int old_order, int new_order) 148 - { 149 - struct timespec start, finish; 150 - long long nsec; 151 - RADIX_TREE(tree, GFP_ATOMIC); 152 - 153 - item_insert_order(&tree, index, 
old_order); 154 - 155 - clock_gettime(CLOCK_MONOTONIC, &start); 156 - radix_tree_split(&tree, index, new_order); 157 - clock_gettime(CLOCK_MONOTONIC, &finish); 158 - nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC + 159 - (finish.tv_nsec - start.tv_nsec); 160 - 161 - item_kill_tree(&tree); 162 - 163 - return nsec; 164 - 165 - } 166 - 167 - static void benchmark_split(unsigned long size, unsigned long step) 168 - { 169 - int i, j, idx; 170 - long long nsec = 0; 171 - 172 - 173 - for (idx = 0; idx < size; idx += step) { 174 - for (i = 3; i < 11; i++) { 175 - for (j = 0; j < i; j++) { 176 - nsec += __benchmark_split(idx, i, j); 177 - } 178 - } 179 - } 180 - 181 - printv(2, "Size %8ld, step %8ld, split time %10lld ns\n", 182 - size, step, nsec); 183 - 184 - } 185 - 186 - static long long __benchmark_join(unsigned long index, 187 - unsigned order1, unsigned order2) 188 - { 189 - unsigned long loc; 190 - struct timespec start, finish; 191 - long long nsec; 192 - void *item, *item2 = item_create(index + 1, order1); 193 - RADIX_TREE(tree, GFP_KERNEL); 194 - 195 - item_insert_order(&tree, index, order2); 196 - item = radix_tree_lookup(&tree, index); 197 - 198 - clock_gettime(CLOCK_MONOTONIC, &start); 199 - radix_tree_join(&tree, index + 1, order1, item2); 200 - clock_gettime(CLOCK_MONOTONIC, &finish); 201 - nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC + 202 - (finish.tv_nsec - start.tv_nsec); 203 - 204 - loc = find_item(&tree, item); 205 - if (loc == -1) 206 - free(item); 207 - 208 - item_kill_tree(&tree); 209 - 210 - return nsec; 211 - } 212 - 213 - static void benchmark_join(unsigned long step) 214 - { 215 - int i, j, idx; 216 - long long nsec = 0; 217 - 218 - for (idx = 0; idx < 1 << 10; idx += step) { 219 - for (i = 1; i < 15; i++) { 220 - for (j = 0; j < i; j++) { 221 - nsec += __benchmark_join(idx, i, j); 222 - } 223 - } 224 - } 225 - 226 - printv(2, "Size %8d, step %8ld, join time %10lld ns\n", 227 - 1 << 10, step, nsec); 228 146 } 229 147 230 148 void 
benchmark(void) ··· 154 242 155 243 for (c = 0; size[c]; c++) 156 244 for (s = 0; step[s]; s++) 157 - benchmark_size(size[c], step[s], 0); 158 - 159 - for (c = 0; size[c]; c++) 160 - for (s = 0; step[s]; s++) 161 - benchmark_size(size[c], step[s] << 9, 9); 162 - 163 - for (c = 0; size[c]; c++) 164 - for (s = 0; step[s]; s++) 165 - benchmark_split(size[c], step[s]); 166 - 167 - for (s = 0; step[s]; s++) 168 - benchmark_join(step[s]); 245 + benchmark_size(size[c], step[s]); 169 246 }
+23
tools/testing/radix-tree/bitmap.c
··· 1 + /* lib/bitmap.c pulls in at least two other files. */ 2 + 3 + #include <linux/bitmap.h> 4 + 5 + void bitmap_clear(unsigned long *map, unsigned int start, int len) 6 + { 7 + unsigned long *p = map + BIT_WORD(start); 8 + const unsigned int size = start + len; 9 + int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG); 10 + unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start); 11 + 12 + while (len - bits_to_clear >= 0) { 13 + *p &= ~mask_to_clear; 14 + len -= bits_to_clear; 15 + bits_to_clear = BITS_PER_LONG; 16 + mask_to_clear = ~0UL; 17 + p++; 18 + } 19 + if (len) { 20 + mask_to_clear &= BITMAP_LAST_WORD_MASK(size); 21 + *p &= ~mask_to_clear; 22 + } 23 + }
+1 -1
tools/testing/radix-tree/generated/autoconf.h
··· 1 - #define CONFIG_RADIX_TREE_MULTIORDER 1 1 + #define CONFIG_XARRAY_MULTI 1
+66 -5
tools/testing/radix-tree/idr-test.c
··· 19 19 20 20 #include "test.h" 21 21 22 - #define DUMMY_PTR ((void *)0x12) 22 + #define DUMMY_PTR ((void *)0x10) 23 23 24 24 int item_idr_free(int id, void *p, void *data) 25 25 { ··· 227 227 idr_u32_test1(&idr, 0xffffffff); 228 228 } 229 229 230 + static void idr_align_test(struct idr *idr) 231 + { 232 + char name[] = "Motorola 68000"; 233 + int i, id; 234 + void *entry; 235 + 236 + for (i = 0; i < 9; i++) { 237 + BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != i); 238 + idr_for_each_entry(idr, entry, id); 239 + } 240 + idr_destroy(idr); 241 + 242 + for (i = 1; i < 10; i++) { 243 + BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != i - 1); 244 + idr_for_each_entry(idr, entry, id); 245 + } 246 + idr_destroy(idr); 247 + 248 + for (i = 2; i < 11; i++) { 249 + BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != i - 2); 250 + idr_for_each_entry(idr, entry, id); 251 + } 252 + idr_destroy(idr); 253 + 254 + for (i = 3; i < 12; i++) { 255 + BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != i - 3); 256 + idr_for_each_entry(idr, entry, id); 257 + } 258 + idr_destroy(idr); 259 + 260 + for (i = 0; i < 8; i++) { 261 + BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != 0); 262 + BUG_ON(idr_alloc(idr, &name[i + 1], 0, 0, GFP_KERNEL) != 1); 263 + idr_for_each_entry(idr, entry, id); 264 + idr_remove(idr, 1); 265 + idr_for_each_entry(idr, entry, id); 266 + idr_remove(idr, 0); 267 + BUG_ON(!idr_is_empty(idr)); 268 + } 269 + 270 + for (i = 0; i < 8; i++) { 271 + BUG_ON(idr_alloc(idr, NULL, 0, 0, GFP_KERNEL) != 0); 272 + idr_for_each_entry(idr, entry, id); 273 + idr_replace(idr, &name[i], 0); 274 + idr_for_each_entry(idr, entry, id); 275 + BUG_ON(idr_find(idr, 0) != &name[i]); 276 + idr_remove(idr, 0); 277 + } 278 + 279 + for (i = 0; i < 8; i++) { 280 + BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != 0); 281 + BUG_ON(idr_alloc(idr, NULL, 0, 0, GFP_KERNEL) != 1); 282 + idr_remove(idr, 1); 283 + idr_for_each_entry(idr, entry, id); 284 + idr_replace(idr, &name[i + 1], 
0); 285 + idr_for_each_entry(idr, entry, id); 286 + idr_remove(idr, 0); 287 + } 288 + } 289 + 230 290 void idr_checks(void) 231 291 { 232 292 unsigned long i; ··· 367 307 idr_u32_test(4); 368 308 idr_u32_test(1); 369 309 idr_u32_test(0); 310 + idr_align_test(&idr); 370 311 } 371 312 372 313 #define module_init(x) ··· 405 344 DEFINE_IDA(ida); 406 345 unsigned long i; 407 346 408 - radix_tree_cpu_dead(1); 409 347 for (i = 0; i < 1000000; i++) { 410 348 int id = ida_alloc(&ida, GFP_NOWAIT); 411 349 if (id == -ENOMEM) { 412 - IDA_BUG_ON(&ida, (i % IDA_BITMAP_BITS) != 413 - BITS_PER_LONG - 2); 350 + IDA_BUG_ON(&ida, ((i % IDA_BITMAP_BITS) != 351 + BITS_PER_XA_VALUE) && 352 + ((i % IDA_BITMAP_BITS) != 0)); 414 353 id = ida_alloc(&ida, GFP_KERNEL); 415 354 } else { 416 355 IDA_BUG_ON(&ida, (i % IDA_BITMAP_BITS) == 417 - BITS_PER_LONG - 2); 356 + BITS_PER_XA_VALUE); 418 357 } 419 358 IDA_BUG_ON(&ida, id != i); 420 359 }
+53 -56
tools/testing/radix-tree/iteration_check.c
··· 1 1 /* 2 - * iteration_check.c: test races having to do with radix tree iteration 2 + * iteration_check.c: test races having to do with xarray iteration 3 3 * Copyright (c) 2016 Intel Corporation 4 4 * Author: Ross Zwisler <ross.zwisler@linux.intel.com> 5 5 * ··· 12 12 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for 13 13 * more details. 14 14 */ 15 - #include <linux/radix-tree.h> 16 15 #include <pthread.h> 17 16 #include "test.h" 18 17 19 18 #define NUM_THREADS 5 20 19 #define MAX_IDX 100 21 - #define TAG 0 22 - #define NEW_TAG 1 20 + #define TAG XA_MARK_0 21 + #define NEW_TAG XA_MARK_1 23 22 24 - static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER; 25 23 static pthread_t threads[NUM_THREADS]; 26 24 static unsigned int seeds[3]; 27 - static RADIX_TREE(tree, GFP_KERNEL); 25 + static DEFINE_XARRAY(array); 28 26 static bool test_complete; 29 27 static int max_order; 30 28 31 - /* relentlessly fill the tree with tagged entries */ 29 + void my_item_insert(struct xarray *xa, unsigned long index) 30 + { 31 + XA_STATE(xas, xa, index); 32 + struct item *item = item_create(index, 0); 33 + int order; 34 + 35 + retry: 36 + xas_lock(&xas); 37 + for (order = max_order; order >= 0; order--) { 38 + xas_set_order(&xas, index, order); 39 + item->order = order; 40 + if (xas_find_conflict(&xas)) 41 + continue; 42 + xas_store(&xas, item); 43 + xas_set_mark(&xas, TAG); 44 + break; 45 + } 46 + xas_unlock(&xas); 47 + if (xas_nomem(&xas, GFP_KERNEL)) 48 + goto retry; 49 + if (order < 0) 50 + free(item); 51 + } 52 + 53 + /* relentlessly fill the array with tagged entries */ 32 54 static void *add_entries_fn(void *arg) 33 55 { 34 56 rcu_register_thread(); 35 57 36 58 while (!test_complete) { 37 59 unsigned long pgoff; 38 - int order; 39 60 40 61 for (pgoff = 0; pgoff < MAX_IDX; pgoff++) { 41 - pthread_mutex_lock(&tree_lock); 42 - for (order = max_order; order >= 0; order--) { 43 - if (item_insert_order(&tree, pgoff, order) 44 - == 0) { 45 - 
item_tag_set(&tree, pgoff, TAG); 46 - break; 47 - } 48 - } 49 - pthread_mutex_unlock(&tree_lock); 62 + my_item_insert(&array, pgoff); 50 63 } 51 64 } 52 65 ··· 69 56 } 70 57 71 58 /* 72 - * Iterate over the tagged entries, doing a radix_tree_iter_retry() as we find 73 - * things that have been removed and randomly resetting our iteration to the 74 - * next chunk with radix_tree_iter_resume(). Both radix_tree_iter_retry() and 75 - * radix_tree_iter_resume() cause radix_tree_next_slot() to be called with a 76 - * NULL 'slot' variable. 59 + * Iterate over tagged entries, retrying when we find ourselves in a deleted 60 + * node and randomly pausing the iteration. 77 61 */ 78 62 static void *tagged_iteration_fn(void *arg) 79 63 { 80 - struct radix_tree_iter iter; 81 - void **slot; 64 + XA_STATE(xas, &array, 0); 65 + void *entry; 82 66 83 67 rcu_register_thread(); 84 68 85 69 while (!test_complete) { 70 + xas_set(&xas, 0); 86 71 rcu_read_lock(); 87 - radix_tree_for_each_tagged(slot, &tree, &iter, 0, TAG) { 88 - void *entry = radix_tree_deref_slot(slot); 89 - if (unlikely(!entry)) 72 + xas_for_each_marked(&xas, entry, ULONG_MAX, TAG) { 73 + if (xas_retry(&xas, entry)) 90 74 continue; 91 - 92 - if (radix_tree_deref_retry(entry)) { 93 - slot = radix_tree_iter_retry(&iter); 94 - continue; 95 - } 96 75 97 76 if (rand_r(&seeds[0]) % 50 == 0) { 98 - slot = radix_tree_iter_resume(slot, &iter); 77 + xas_pause(&xas); 99 78 rcu_read_unlock(); 100 79 rcu_barrier(); 101 80 rcu_read_lock(); ··· 102 97 } 103 98 104 99 /* 105 - * Iterate over the entries, doing a radix_tree_iter_retry() as we find things 106 - * that have been removed and randomly resetting our iteration to the next 107 - * chunk with radix_tree_iter_resume(). Both radix_tree_iter_retry() and 108 - * radix_tree_iter_resume() cause radix_tree_next_slot() to be called with a 109 - * NULL 'slot' variable. 
100 + * Iterate over the entries, retrying when we find ourselves in a deleted 101 + * node and randomly pausing the iteration. 110 102 */ 111 103 static void *untagged_iteration_fn(void *arg) 112 104 { 113 - struct radix_tree_iter iter; 114 - void **slot; 105 + XA_STATE(xas, &array, 0); 106 + void *entry; 115 107 116 108 rcu_register_thread(); 117 109 118 110 while (!test_complete) { 111 + xas_set(&xas, 0); 119 112 rcu_read_lock(); 120 - radix_tree_for_each_slot(slot, &tree, &iter, 0) { 121 - void *entry = radix_tree_deref_slot(slot); 122 - if (unlikely(!entry)) 113 + xas_for_each(&xas, entry, ULONG_MAX) { 114 + if (xas_retry(&xas, entry)) 123 115 continue; 124 - 125 - if (radix_tree_deref_retry(entry)) { 126 - slot = radix_tree_iter_retry(&iter); 127 - continue; 128 - } 129 116 130 117 if (rand_r(&seeds[1]) % 50 == 0) { 131 - slot = radix_tree_iter_resume(slot, &iter); 118 + xas_pause(&xas); 132 119 rcu_read_unlock(); 133 120 rcu_barrier(); 134 121 rcu_read_lock(); ··· 135 138 } 136 139 137 140 /* 138 - * Randomly remove entries to help induce radix_tree_iter_retry() calls in the 141 + * Randomly remove entries to help induce retries in the 139 142 * two iteration functions. 
140 143 */ 141 144 static void *remove_entries_fn(void *arg) ··· 144 147 145 148 while (!test_complete) { 146 149 int pgoff; 150 + struct item *item; 147 151 148 152 pgoff = rand_r(&seeds[2]) % MAX_IDX; 149 153 150 - pthread_mutex_lock(&tree_lock); 151 - item_delete(&tree, pgoff); 152 - pthread_mutex_unlock(&tree_lock); 154 + item = xa_erase(&array, pgoff); 155 + if (item) 156 + item_free(item, pgoff); 153 157 } 154 158 155 159 rcu_unregister_thread(); ··· 163 165 rcu_register_thread(); 164 166 165 167 while (!test_complete) { 166 - tag_tagged_items(&tree, &tree_lock, 0, MAX_IDX, 10, TAG, 167 - NEW_TAG); 168 + tag_tagged_items(&array, 0, MAX_IDX, 10, TAG, NEW_TAG); 168 169 } 169 170 rcu_unregister_thread(); 170 171 return NULL; ··· 214 217 } 215 218 } 216 219 217 - item_kill_tree(&tree); 220 + item_kill_tree(&array); 218 221 }
+1
tools/testing/radix-tree/linux/bug.h
··· 1 + #include <stdio.h> 1 2 #include "asm/bug.h"
+1
tools/testing/radix-tree/linux/kconfig.h
··· 1 + #include "../../../../include/linux/kconfig.h"
+5
tools/testing/radix-tree/linux/kernel.h
··· 14 14 #include "../../../include/linux/kconfig.h" 15 15 16 16 #define printk printf 17 + #define pr_info printk 17 18 #define pr_debug printk 18 19 #define pr_cont printk 20 + 21 + #define __acquires(x) 22 + #define __releases(x) 23 + #define __must_hold(x) 19 24 20 25 #endif /* _KERNEL_H */
+11
tools/testing/radix-tree/linux/lockdep.h
··· 1 + #ifndef _LINUX_LOCKDEP_H 2 + #define _LINUX_LOCKDEP_H 3 + struct lock_class_key { 4 + unsigned int a; 5 + }; 6 + 7 + static inline void lockdep_set_class(spinlock_t *lock, 8 + struct lock_class_key *key) 9 + { 10 + } 11 + #endif /* _LINUX_LOCKDEP_H */
-1
tools/testing/radix-tree/linux/radix-tree.h
··· 2 2 #ifndef _TEST_RADIX_TREE_H 3 3 #define _TEST_RADIX_TREE_H 4 4 5 - #include "generated/map-shift.h" 6 5 #include "../../../../include/linux/radix-tree.h" 7 6 8 7 extern int kmalloc_verbose;
+2
tools/testing/radix-tree/linux/rcupdate.h
··· 6 6 7 7 #define rcu_dereference_raw(p) rcu_dereference(p) 8 8 #define rcu_dereference_protected(p, cond) rcu_dereference(p) 9 + #define rcu_dereference_check(p, cond) rcu_dereference(p) 10 + #define RCU_INIT_POINTER(p, v) (p) = (v) 9 11 10 12 #endif
+3 -63
tools/testing/radix-tree/main.c
··· 214 214 } 215 215 216 216 // printf("\ncopying tags...\n"); 217 - tagged = tag_tagged_items(&tree, NULL, start, end, ITEMS, 0, 1); 217 + tagged = tag_tagged_items(&tree, start, end, ITEMS, XA_MARK_0, XA_MARK_1); 218 218 219 219 // printf("checking copied tags\n"); 220 220 assert(tagged == count); ··· 223 223 /* Copy tags in several rounds */ 224 224 // printf("\ncopying tags...\n"); 225 225 tmp = rand() % (count / 10 + 2); 226 - tagged = tag_tagged_items(&tree, NULL, start, end, tmp, 0, 2); 226 + tagged = tag_tagged_items(&tree, start, end, tmp, XA_MARK_0, XA_MARK_2); 227 227 assert(tagged == count); 228 228 229 229 // printf("%lu %lu %lu\n", tagged, tmp, count); ··· 236 236 item_kill_tree(&tree); 237 237 } 238 238 239 - static void __locate_check(struct radix_tree_root *tree, unsigned long index, 240 - unsigned order) 241 - { 242 - struct item *item; 243 - unsigned long index2; 244 - 245 - item_insert_order(tree, index, order); 246 - item = item_lookup(tree, index); 247 - index2 = find_item(tree, item); 248 - if (index != index2) { 249 - printv(2, "index %ld order %d inserted; found %ld\n", 250 - index, order, index2); 251 - abort(); 252 - } 253 - } 254 - 255 - static void __order_0_locate_check(void) 256 - { 257 - RADIX_TREE(tree, GFP_KERNEL); 258 - int i; 259 - 260 - for (i = 0; i < 50; i++) 261 - __locate_check(&tree, rand() % INT_MAX, 0); 262 - 263 - item_kill_tree(&tree); 264 - } 265 - 266 - static void locate_check(void) 267 - { 268 - RADIX_TREE(tree, GFP_KERNEL); 269 - unsigned order; 270 - unsigned long offset, index; 271 - 272 - __order_0_locate_check(); 273 - 274 - for (order = 0; order < 20; order++) { 275 - for (offset = 0; offset < (1 << (order + 3)); 276 - offset += (1UL << order)) { 277 - for (index = 0; index < (1UL << (order + 5)); 278 - index += (1UL << order)) { 279 - __locate_check(&tree, index + offset, order); 280 - } 281 - if (find_item(&tree, &tree) != -1) 282 - abort(); 283 - 284 - item_kill_tree(&tree); 285 - } 286 - } 287 - 288 - if 
(find_item(&tree, &tree) != -1) 289 - abort(); 290 - __locate_check(&tree, -1, 0); 291 - if (find_item(&tree, &tree) != -1) 292 - abort(); 293 - item_kill_tree(&tree); 294 - } 295 - 296 239 static void single_thread_tests(bool long_run) 297 240 { 298 241 int i; ··· 245 302 multiorder_checks(); 246 303 rcu_barrier(); 247 304 printv(2, "after multiorder_check: %d allocated, preempt %d\n", 248 - nr_allocated, preempt_count); 249 - locate_check(); 250 - rcu_barrier(); 251 - printv(2, "after locate_check: %d allocated, preempt %d\n", 252 305 nr_allocated, preempt_count); 253 306 tag_check(); 254 307 rcu_barrier(); ··· 304 365 rcu_register_thread(); 305 366 radix_tree_init(); 306 367 368 + xarray_tests(); 307 369 regression1_test(); 308 370 regression2_test(); 309 371 regression3_test();
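The main.c hunk above swaps the old numeric tag arguments for typed `XA_MARK_0`/`XA_MARK_1`/`XA_MARK_2` values in the `tag_tagged_items()` calls. For readers following the conversion, a minimal sketch of the mark half of the xa_* convenience API these tests now exercise (kernel context only; `my_xa` and `entry` are hypothetical placeholders, not names from the patch):

```c
/* Sketch of the XArray mark API used by the converted tests.
 * Kernel context; "my_xa" is a hypothetical array for illustration. */
#include <linux/xarray.h>

static DEFINE_XARRAY(my_xa);

static void mark_sketch(void *entry)
{
	/* GFP flags are supplied at store time - no preloading step. */
	xa_store(&my_xa, 5, entry, GFP_KERNEL);

	xa_set_mark(&my_xa, 5, XA_MARK_0);	/* replaces radix_tree_tag_set() */

	/* xa_marked() asks whether any entry in the array carries the mark;
	 * xa_get_mark() asks about one index. */
	WARN_ON(!xa_marked(&my_xa, XA_MARK_0));
	WARN_ON(!xa_get_mark(&my_xa, 5, XA_MARK_0));

	xa_clear_mark(&my_xa, 5, XA_MARK_0);
}
```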
+65 -546
tools/testing/radix-tree/multiorder.c
··· 20 20 21 21 #include "test.h" 22 22 23 - #define for_each_index(i, base, order) \ 24 - for (i = base; i < base + (1 << order); i++) 25 - 26 - static void __multiorder_tag_test(int index, int order) 23 + static int item_insert_order(struct xarray *xa, unsigned long index, 24 + unsigned order) 27 25 { 28 - RADIX_TREE(tree, GFP_KERNEL); 29 - int base, err, i; 26 + XA_STATE_ORDER(xas, xa, index, order); 27 + struct item *item = item_create(index, order); 30 28 31 - /* our canonical entry */ 32 - base = index & ~((1 << order) - 1); 29 + do { 30 + xas_lock(&xas); 31 + xas_store(&xas, item); 32 + xas_unlock(&xas); 33 + } while (xas_nomem(&xas, GFP_KERNEL)); 33 34 34 - printv(2, "Multiorder tag test with index %d, canonical entry %d\n", 35 - index, base); 35 + if (!xas_error(&xas)) 36 + return 0; 36 37 37 - err = item_insert_order(&tree, index, order); 38 - assert(!err); 39 - 40 - /* 41 - * Verify we get collisions for covered indices. We try and fail to 42 - * insert an exceptional entry so we don't leak memory via 43 - * item_insert_order(). 
44 - */ 45 - for_each_index(i, base, order) { 46 - err = __radix_tree_insert(&tree, i, order, 47 - (void *)(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY)); 48 - assert(err == -EEXIST); 49 - } 50 - 51 - for_each_index(i, base, order) { 52 - assert(!radix_tree_tag_get(&tree, i, 0)); 53 - assert(!radix_tree_tag_get(&tree, i, 1)); 54 - } 55 - 56 - assert(radix_tree_tag_set(&tree, index, 0)); 57 - 58 - for_each_index(i, base, order) { 59 - assert(radix_tree_tag_get(&tree, i, 0)); 60 - assert(!radix_tree_tag_get(&tree, i, 1)); 61 - } 62 - 63 - assert(tag_tagged_items(&tree, NULL, 0, ~0UL, 10, 0, 1) == 1); 64 - assert(radix_tree_tag_clear(&tree, index, 0)); 65 - 66 - for_each_index(i, base, order) { 67 - assert(!radix_tree_tag_get(&tree, i, 0)); 68 - assert(radix_tree_tag_get(&tree, i, 1)); 69 - } 70 - 71 - assert(radix_tree_tag_clear(&tree, index, 1)); 72 - 73 - assert(!radix_tree_tagged(&tree, 0)); 74 - assert(!radix_tree_tagged(&tree, 1)); 75 - 76 - item_kill_tree(&tree); 38 + free(item); 39 + return xas_error(&xas); 77 40 } 78 41 79 - static void __multiorder_tag_test2(unsigned order, unsigned long index2) 42 + void multiorder_iteration(struct xarray *xa) 80 43 { 81 - RADIX_TREE(tree, GFP_KERNEL); 82 - unsigned long index = (1 << order); 83 - index2 += index; 84 - 85 - assert(item_insert_order(&tree, 0, order) == 0); 86 - assert(item_insert(&tree, index2) == 0); 87 - 88 - assert(radix_tree_tag_set(&tree, 0, 0)); 89 - assert(radix_tree_tag_set(&tree, index2, 0)); 90 - 91 - assert(tag_tagged_items(&tree, NULL, 0, ~0UL, 10, 0, 1) == 2); 92 - 93 - item_kill_tree(&tree); 94 - } 95 - 96 - static void multiorder_tag_tests(void) 97 - { 98 - int i, j; 99 - 100 - /* test multi-order entry for indices 0-7 with no sibling pointers */ 101 - __multiorder_tag_test(0, 3); 102 - __multiorder_tag_test(5, 3); 103 - 104 - /* test multi-order entry for indices 8-15 with no sibling pointers */ 105 - __multiorder_tag_test(8, 3); 106 - __multiorder_tag_test(15, 3); 107 - 108 - /* 109 - * Our order 5 
entry covers indices 0-31 in a tree with height=2. 110 - * This is broken up as follows: 111 - * 0-7: canonical entry 112 - * 8-15: sibling 1 113 - * 16-23: sibling 2 114 - * 24-31: sibling 3 115 - */ 116 - __multiorder_tag_test(0, 5); 117 - __multiorder_tag_test(29, 5); 118 - 119 - /* same test, but with indices 32-63 */ 120 - __multiorder_tag_test(32, 5); 121 - __multiorder_tag_test(44, 5); 122 - 123 - /* 124 - * Our order 8 entry covers indices 0-255 in a tree with height=3. 125 - * This is broken up as follows: 126 - * 0-63: canonical entry 127 - * 64-127: sibling 1 128 - * 128-191: sibling 2 129 - * 192-255: sibling 3 130 - */ 131 - __multiorder_tag_test(0, 8); 132 - __multiorder_tag_test(190, 8); 133 - 134 - /* same test, but with indices 256-511 */ 135 - __multiorder_tag_test(256, 8); 136 - __multiorder_tag_test(300, 8); 137 - 138 - __multiorder_tag_test(0x12345678UL, 8); 139 - 140 - for (i = 1; i < 10; i++) 141 - for (j = 0; j < (10 << i); j++) 142 - __multiorder_tag_test2(i, j); 143 - } 144 - 145 - static void multiorder_check(unsigned long index, int order) 146 - { 147 - unsigned long i; 148 - unsigned long min = index & ~((1UL << order) - 1); 149 - unsigned long max = min + (1UL << order); 150 - void **slot; 151 - struct item *item2 = item_create(min, order); 152 - RADIX_TREE(tree, GFP_KERNEL); 153 - 154 - printv(2, "Multiorder index %ld, order %d\n", index, order); 155 - 156 - assert(item_insert_order(&tree, index, order) == 0); 157 - 158 - for (i = min; i < max; i++) { 159 - struct item *item = item_lookup(&tree, i); 160 - assert(item != 0); 161 - assert(item->index == index); 162 - } 163 - for (i = 0; i < min; i++) 164 - item_check_absent(&tree, i); 165 - for (i = max; i < 2*max; i++) 166 - item_check_absent(&tree, i); 167 - for (i = min; i < max; i++) 168 - assert(radix_tree_insert(&tree, i, item2) == -EEXIST); 169 - 170 - slot = radix_tree_lookup_slot(&tree, index); 171 - free(*slot); 172 - radix_tree_replace_slot(&tree, slot, item2); 173 - for (i = 
min; i < max; i++) { 174 - struct item *item = item_lookup(&tree, i); 175 - assert(item != 0); 176 - assert(item->index == min); 177 - } 178 - 179 - assert(item_delete(&tree, min) != 0); 180 - 181 - for (i = 0; i < 2*max; i++) 182 - item_check_absent(&tree, i); 183 - } 184 - 185 - static void multiorder_shrink(unsigned long index, int order) 186 - { 187 - unsigned long i; 188 - unsigned long max = 1 << order; 189 - RADIX_TREE(tree, GFP_KERNEL); 190 - struct radix_tree_node *node; 191 - 192 - printv(2, "Multiorder shrink index %ld, order %d\n", index, order); 193 - 194 - assert(item_insert_order(&tree, 0, order) == 0); 195 - 196 - node = tree.rnode; 197 - 198 - assert(item_insert(&tree, index) == 0); 199 - assert(node != tree.rnode); 200 - 201 - assert(item_delete(&tree, index) != 0); 202 - assert(node == tree.rnode); 203 - 204 - for (i = 0; i < max; i++) { 205 - struct item *item = item_lookup(&tree, i); 206 - assert(item != 0); 207 - assert(item->index == 0); 208 - } 209 - for (i = max; i < 2*max; i++) 210 - item_check_absent(&tree, i); 211 - 212 - if (!item_delete(&tree, 0)) { 213 - printv(2, "failed to delete index %ld (order %d)\n", index, order); 214 - abort(); 215 - } 216 - 217 - for (i = 0; i < 2*max; i++) 218 - item_check_absent(&tree, i); 219 - } 220 - 221 - static void multiorder_insert_bug(void) 222 - { 223 - RADIX_TREE(tree, GFP_KERNEL); 224 - 225 - item_insert(&tree, 0); 226 - radix_tree_tag_set(&tree, 0, 0); 227 - item_insert_order(&tree, 3 << 6, 6); 228 - 229 - item_kill_tree(&tree); 230 - } 231 - 232 - void multiorder_iteration(void) 233 - { 234 - RADIX_TREE(tree, GFP_KERNEL); 235 - struct radix_tree_iter iter; 236 - void **slot; 44 + XA_STATE(xas, xa, 0); 45 + struct item *item; 237 46 int i, j, err; 238 - 239 - printv(1, "Multiorder iteration test\n"); 240 47 241 48 #define NUM_ENTRIES 11 242 49 int index[NUM_ENTRIES] = {0, 2, 4, 8, 16, 32, 34, 36, 64, 72, 128}; 243 50 int order[NUM_ENTRIES] = {1, 1, 2, 3, 4, 1, 0, 1, 3, 0, 7}; 244 51 52 + 
printv(1, "Multiorder iteration test\n"); 53 + 245 54 for (i = 0; i < NUM_ENTRIES; i++) { 246 - err = item_insert_order(&tree, index[i], order[i]); 55 + err = item_insert_order(xa, index[i], order[i]); 247 56 assert(!err); 248 57 } 249 58 ··· 61 252 if (j <= (index[i] | ((1 << order[i]) - 1))) 62 253 break; 63 254 64 - radix_tree_for_each_slot(slot, &tree, &iter, j) { 65 - int height = order[i] / RADIX_TREE_MAP_SHIFT; 66 - int shift = height * RADIX_TREE_MAP_SHIFT; 255 + xas_set(&xas, j); 256 + xas_for_each(&xas, item, ULONG_MAX) { 257 + int height = order[i] / XA_CHUNK_SHIFT; 258 + int shift = height * XA_CHUNK_SHIFT; 67 259 unsigned long mask = (1UL << order[i]) - 1; 68 - struct item *item = *slot; 69 260 70 - assert((iter.index | mask) == (index[i] | mask)); 71 - assert(iter.shift == shift); 261 + assert((xas.xa_index | mask) == (index[i] | mask)); 262 + assert(xas.xa_node->shift == shift); 72 263 assert(!radix_tree_is_internal_node(item)); 73 264 assert((item->index | mask) == (index[i] | mask)); 74 265 assert(item->order == order[i]); ··· 76 267 } 77 268 } 78 269 79 - item_kill_tree(&tree); 270 + item_kill_tree(xa); 80 271 } 81 272 82 - void multiorder_tagged_iteration(void) 273 + void multiorder_tagged_iteration(struct xarray *xa) 83 274 { 84 - RADIX_TREE(tree, GFP_KERNEL); 85 - struct radix_tree_iter iter; 86 - void **slot; 275 + XA_STATE(xas, xa, 0); 276 + struct item *item; 87 277 int i, j; 88 - 89 - printv(1, "Multiorder tagged iteration test\n"); 90 278 91 279 #define MT_NUM_ENTRIES 9 92 280 int index[MT_NUM_ENTRIES] = {0, 2, 4, 16, 32, 40, 64, 72, 128}; ··· 92 286 #define TAG_ENTRIES 7 93 287 int tag_index[TAG_ENTRIES] = {0, 4, 16, 40, 64, 72, 128}; 94 288 95 - for (i = 0; i < MT_NUM_ENTRIES; i++) 96 - assert(!item_insert_order(&tree, index[i], order[i])); 289 + printv(1, "Multiorder tagged iteration test\n"); 97 290 98 - assert(!radix_tree_tagged(&tree, 1)); 291 + for (i = 0; i < MT_NUM_ENTRIES; i++) 292 + assert(!item_insert_order(xa, index[i], 
order[i])); 293 + 294 + assert(!xa_marked(xa, XA_MARK_1)); 99 295 100 296 for (i = 0; i < TAG_ENTRIES; i++) 101 - assert(radix_tree_tag_set(&tree, tag_index[i], 1)); 297 + xa_set_mark(xa, tag_index[i], XA_MARK_1); 102 298 103 299 for (j = 0; j < 256; j++) { 104 300 int k; ··· 112 304 break; 113 305 } 114 306 115 - radix_tree_for_each_tagged(slot, &tree, &iter, j, 1) { 307 + xas_set(&xas, j); 308 + xas_for_each_marked(&xas, item, ULONG_MAX, XA_MARK_1) { 116 309 unsigned long mask; 117 - struct item *item = *slot; 118 310 for (k = i; index[k] < tag_index[i]; k++) 119 311 ; 120 312 mask = (1UL << order[k]) - 1; 121 313 122 - assert((iter.index | mask) == (tag_index[i] | mask)); 123 - assert(!radix_tree_is_internal_node(item)); 314 + assert((xas.xa_index | mask) == (tag_index[i] | mask)); 315 + assert(!xa_is_internal(item)); 124 316 assert((item->index | mask) == (tag_index[i] | mask)); 125 317 assert(item->order == order[k]); 126 318 i++; 127 319 } 128 320 } 129 321 130 - assert(tag_tagged_items(&tree, NULL, 0, ~0UL, TAG_ENTRIES, 1, 2) == 131 - TAG_ENTRIES); 322 + assert(tag_tagged_items(xa, 0, ULONG_MAX, TAG_ENTRIES, XA_MARK_1, 323 + XA_MARK_2) == TAG_ENTRIES); 132 324 133 325 for (j = 0; j < 256; j++) { 134 326 int mask, k; ··· 140 332 break; 141 333 } 142 334 143 - radix_tree_for_each_tagged(slot, &tree, &iter, j, 2) { 144 - struct item *item = *slot; 335 + xas_set(&xas, j); 336 + xas_for_each_marked(&xas, item, ULONG_MAX, XA_MARK_2) { 145 337 for (k = i; index[k] < tag_index[i]; k++) 146 338 ; 147 339 mask = (1 << order[k]) - 1; 148 340 149 - assert((iter.index | mask) == (tag_index[i] | mask)); 150 - assert(!radix_tree_is_internal_node(item)); 341 + assert((xas.xa_index | mask) == (tag_index[i] | mask)); 342 + assert(!xa_is_internal(item)); 151 343 assert((item->index | mask) == (tag_index[i] | mask)); 152 344 assert(item->order == order[k]); 153 345 i++; 154 346 } 155 347 } 156 348 157 - assert(tag_tagged_items(&tree, NULL, 1, ~0UL, MT_NUM_ENTRIES * 2, 1, 0) 158 
- == TAG_ENTRIES); 349 + assert(tag_tagged_items(xa, 1, ULONG_MAX, MT_NUM_ENTRIES * 2, XA_MARK_1, 350 + XA_MARK_0) == TAG_ENTRIES); 159 351 i = 0; 160 - radix_tree_for_each_tagged(slot, &tree, &iter, 0, 0) { 161 - assert(iter.index == tag_index[i]); 352 + xas_set(&xas, 0); 353 + xas_for_each_marked(&xas, item, ULONG_MAX, XA_MARK_0) { 354 + assert(xas.xa_index == tag_index[i]); 162 355 i++; 163 356 } 357 + assert(i == TAG_ENTRIES); 164 358 165 - item_kill_tree(&tree); 166 - } 167 - 168 - /* 169 - * Basic join checks: make sure we can't find an entry in the tree after 170 - * a larger entry has replaced it 171 - */ 172 - static void multiorder_join1(unsigned long index, 173 - unsigned order1, unsigned order2) 174 - { 175 - unsigned long loc; 176 - void *item, *item2 = item_create(index + 1, order1); 177 - RADIX_TREE(tree, GFP_KERNEL); 178 - 179 - item_insert_order(&tree, index, order2); 180 - item = radix_tree_lookup(&tree, index); 181 - radix_tree_join(&tree, index + 1, order1, item2); 182 - loc = find_item(&tree, item); 183 - if (loc == -1) 184 - free(item); 185 - item = radix_tree_lookup(&tree, index + 1); 186 - assert(item == item2); 187 - item_kill_tree(&tree); 188 - } 189 - 190 - /* 191 - * Check that the accounting of exceptional entries is handled correctly 192 - * by joining an exceptional entry to a normal pointer. 
193 - */ 194 - static void multiorder_join2(unsigned order1, unsigned order2) 195 - { 196 - RADIX_TREE(tree, GFP_KERNEL); 197 - struct radix_tree_node *node; 198 - void *item1 = item_create(0, order1); 199 - void *item2; 200 - 201 - item_insert_order(&tree, 0, order2); 202 - radix_tree_insert(&tree, 1 << order2, (void *)0x12UL); 203 - item2 = __radix_tree_lookup(&tree, 1 << order2, &node, NULL); 204 - assert(item2 == (void *)0x12UL); 205 - assert(node->exceptional == 1); 206 - 207 - item2 = radix_tree_lookup(&tree, 0); 208 - free(item2); 209 - 210 - radix_tree_join(&tree, 0, order1, item1); 211 - item2 = __radix_tree_lookup(&tree, 1 << order2, &node, NULL); 212 - assert(item2 == item1); 213 - assert(node->exceptional == 0); 214 - item_kill_tree(&tree); 215 - } 216 - 217 - /* 218 - * This test revealed an accounting bug for exceptional entries at one point. 219 - * Nodes were being freed back into the pool with an elevated exception count 220 - * by radix_tree_join() and then radix_tree_split() was failing to zero the 221 - * count of exceptional entries. 
222 - */ 223 - static void multiorder_join3(unsigned int order) 224 - { 225 - RADIX_TREE(tree, GFP_KERNEL); 226 - struct radix_tree_node *node; 227 - void **slot; 228 - struct radix_tree_iter iter; 229 - unsigned long i; 230 - 231 - for (i = 0; i < (1 << order); i++) { 232 - radix_tree_insert(&tree, i, (void *)0x12UL); 233 - } 234 - 235 - radix_tree_join(&tree, 0, order, (void *)0x16UL); 236 - rcu_barrier(); 237 - 238 - radix_tree_split(&tree, 0, 0); 239 - 240 - radix_tree_for_each_slot(slot, &tree, &iter, 0) { 241 - radix_tree_iter_replace(&tree, &iter, slot, (void *)0x12UL); 242 - } 243 - 244 - __radix_tree_lookup(&tree, 0, &node, NULL); 245 - assert(node->exceptional == node->count); 246 - 247 - item_kill_tree(&tree); 248 - } 249 - 250 - static void multiorder_join(void) 251 - { 252 - int i, j, idx; 253 - 254 - for (idx = 0; idx < 1024; idx = idx * 2 + 3) { 255 - for (i = 1; i < 15; i++) { 256 - for (j = 0; j < i; j++) { 257 - multiorder_join1(idx, i, j); 258 - } 259 - } 260 - } 261 - 262 - for (i = 1; i < 15; i++) { 263 - for (j = 0; j < i; j++) { 264 - multiorder_join2(i, j); 265 - } 266 - } 267 - 268 - for (i = 3; i < 10; i++) { 269 - multiorder_join3(i); 270 - } 271 - } 272 - 273 - static void check_mem(unsigned old_order, unsigned new_order, unsigned alloc) 274 - { 275 - struct radix_tree_preload *rtp = &radix_tree_preloads; 276 - if (rtp->nr != 0) 277 - printv(2, "split(%u %u) remaining %u\n", old_order, new_order, 278 - rtp->nr); 279 - /* 280 - * Can't check for equality here as some nodes may have been 281 - * RCU-freed while we ran. But we should never finish with more 282 - * nodes allocated since they should have all been preloaded. 
283 - */ 284 - if (nr_allocated > alloc) 285 - printv(2, "split(%u %u) allocated %u %u\n", old_order, new_order, 286 - alloc, nr_allocated); 287 - } 288 - 289 - static void __multiorder_split(int old_order, int new_order) 290 - { 291 - RADIX_TREE(tree, GFP_ATOMIC); 292 - void **slot; 293 - struct radix_tree_iter iter; 294 - unsigned alloc; 295 - struct item *item; 296 - 297 - radix_tree_preload(GFP_KERNEL); 298 - assert(item_insert_order(&tree, 0, old_order) == 0); 299 - radix_tree_preload_end(); 300 - 301 - /* Wipe out the preloaded cache or it'll confuse check_mem() */ 302 - radix_tree_cpu_dead(0); 303 - 304 - item = radix_tree_tag_set(&tree, 0, 2); 305 - 306 - radix_tree_split_preload(old_order, new_order, GFP_KERNEL); 307 - alloc = nr_allocated; 308 - radix_tree_split(&tree, 0, new_order); 309 - check_mem(old_order, new_order, alloc); 310 - radix_tree_for_each_slot(slot, &tree, &iter, 0) { 311 - radix_tree_iter_replace(&tree, &iter, slot, 312 - item_create(iter.index, new_order)); 313 - } 314 - radix_tree_preload_end(); 315 - 316 - item_kill_tree(&tree); 317 - free(item); 318 - } 319 - 320 - static void __multiorder_split2(int old_order, int new_order) 321 - { 322 - RADIX_TREE(tree, GFP_KERNEL); 323 - void **slot; 324 - struct radix_tree_iter iter; 325 - struct radix_tree_node *node; 326 - void *item; 327 - 328 - __radix_tree_insert(&tree, 0, old_order, (void *)0x12); 329 - 330 - item = __radix_tree_lookup(&tree, 0, &node, NULL); 331 - assert(item == (void *)0x12); 332 - assert(node->exceptional > 0); 333 - 334 - radix_tree_split(&tree, 0, new_order); 335 - radix_tree_for_each_slot(slot, &tree, &iter, 0) { 336 - radix_tree_iter_replace(&tree, &iter, slot, 337 - item_create(iter.index, new_order)); 338 - } 339 - 340 - item = __radix_tree_lookup(&tree, 0, &node, NULL); 341 - assert(item != (void *)0x12); 342 - assert(node->exceptional == 0); 343 - 344 - item_kill_tree(&tree); 345 - } 346 - 347 - static void __multiorder_split3(int old_order, int new_order) 348 - 
{ 349 - RADIX_TREE(tree, GFP_KERNEL); 350 - void **slot; 351 - struct radix_tree_iter iter; 352 - struct radix_tree_node *node; 353 - void *item; 354 - 355 - __radix_tree_insert(&tree, 0, old_order, (void *)0x12); 356 - 357 - item = __radix_tree_lookup(&tree, 0, &node, NULL); 358 - assert(item == (void *)0x12); 359 - assert(node->exceptional > 0); 360 - 361 - radix_tree_split(&tree, 0, new_order); 362 - radix_tree_for_each_slot(slot, &tree, &iter, 0) { 363 - radix_tree_iter_replace(&tree, &iter, slot, (void *)0x16); 364 - } 365 - 366 - item = __radix_tree_lookup(&tree, 0, &node, NULL); 367 - assert(item == (void *)0x16); 368 - assert(node->exceptional > 0); 369 - 370 - item_kill_tree(&tree); 371 - 372 - __radix_tree_insert(&tree, 0, old_order, (void *)0x12); 373 - 374 - item = __radix_tree_lookup(&tree, 0, &node, NULL); 375 - assert(item == (void *)0x12); 376 - assert(node->exceptional > 0); 377 - 378 - radix_tree_split(&tree, 0, new_order); 379 - radix_tree_for_each_slot(slot, &tree, &iter, 0) { 380 - if (iter.index == (1 << new_order)) 381 - radix_tree_iter_replace(&tree, &iter, slot, 382 - (void *)0x16); 383 - else 384 - radix_tree_iter_replace(&tree, &iter, slot, NULL); 385 - } 386 - 387 - item = __radix_tree_lookup(&tree, 1 << new_order, &node, NULL); 388 - assert(item == (void *)0x16); 389 - assert(node->count == node->exceptional); 390 - do { 391 - node = node->parent; 392 - if (!node) 393 - break; 394 - assert(node->count == 1); 395 - assert(node->exceptional == 0); 396 - } while (1); 397 - 398 - item_kill_tree(&tree); 399 - } 400 - 401 - static void multiorder_split(void) 402 - { 403 - int i, j; 404 - 405 - for (i = 3; i < 11; i++) 406 - for (j = 0; j < i; j++) { 407 - __multiorder_split(i, j); 408 - __multiorder_split2(i, j); 409 - __multiorder_split3(i, j); 410 - } 411 - } 412 - 413 - static void multiorder_account(void) 414 - { 415 - RADIX_TREE(tree, GFP_KERNEL); 416 - struct radix_tree_node *node; 417 - void **slot; 418 - 419 - item_insert_order(&tree, 
0, 5); 420 - 421 - __radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12); 422 - __radix_tree_lookup(&tree, 0, &node, NULL); 423 - assert(node->count == node->exceptional * 2); 424 - radix_tree_delete(&tree, 1 << 5); 425 - assert(node->exceptional == 0); 426 - 427 - __radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12); 428 - __radix_tree_lookup(&tree, 1 << 5, &node, &slot); 429 - assert(node->count == node->exceptional * 2); 430 - __radix_tree_replace(&tree, node, slot, NULL, NULL); 431 - assert(node->exceptional == 0); 432 - 433 - item_kill_tree(&tree); 359 + item_kill_tree(xa); 434 360 } 435 361 436 362 bool stop_iteration = false; ··· 187 645 188 646 static void *iterator_func(void *ptr) 189 647 { 190 - struct radix_tree_root *tree = ptr; 191 - struct radix_tree_iter iter; 648 + XA_STATE(xas, ptr, 0); 192 649 struct item *item; 193 - void **slot; 194 650 195 651 while (!stop_iteration) { 196 652 rcu_read_lock(); 197 - radix_tree_for_each_slot(slot, tree, &iter, 0) { 198 - item = radix_tree_deref_slot(slot); 199 - 200 - if (!item) 653 + xas_for_each(&xas, item, ULONG_MAX) { 654 + if (xas_retry(&xas, item)) 201 655 continue; 202 - if (radix_tree_deref_retry(item)) { 203 - slot = radix_tree_iter_retry(&iter); 204 - continue; 205 - } 206 656 207 - item_sanity(item, iter.index); 657 + item_sanity(item, xas.xa_index); 208 658 } 209 659 rcu_read_unlock(); 210 660 } 211 661 return NULL; 212 662 } 213 663 214 - static void multiorder_iteration_race(void) 664 + static void multiorder_iteration_race(struct xarray *xa) 215 665 { 216 666 const int num_threads = sysconf(_SC_NPROCESSORS_ONLN); 217 667 pthread_t worker_thread[num_threads]; 218 - RADIX_TREE(tree, GFP_KERNEL); 219 668 int i; 220 669 221 - pthread_create(&worker_thread[0], NULL, &creator_func, &tree); 670 + pthread_create(&worker_thread[0], NULL, &creator_func, xa); 222 671 for (i = 1; i < num_threads; i++) 223 - pthread_create(&worker_thread[i], NULL, &iterator_func, &tree); 672 + pthread_create(&worker_thread[i], 
NULL, &iterator_func, xa); 224 673 225 674 for (i = 0; i < num_threads; i++) 226 675 pthread_join(worker_thread[i], NULL); 227 676 228 - item_kill_tree(&tree); 677 + item_kill_tree(xa); 229 678 } 679 + 680 + static DEFINE_XARRAY(array); 230 681 231 682 void multiorder_checks(void) 232 683 { 233 - int i; 234 - 235 - for (i = 0; i < 20; i++) { 236 - multiorder_check(200, i); 237 - multiorder_check(0, i); 238 - multiorder_check((1UL << i) + 1, i); 239 - } 240 - 241 - for (i = 0; i < 15; i++) 242 - multiorder_shrink((1UL << (i + RADIX_TREE_MAP_SHIFT)), i); 243 - 244 - multiorder_insert_bug(); 245 - multiorder_tag_tests(); 246 - multiorder_iteration(); 247 - multiorder_tagged_iteration(); 248 - multiorder_join(); 249 - multiorder_split(); 250 - multiorder_account(); 251 - multiorder_iteration_race(); 684 + multiorder_iteration(&array); 685 + multiorder_tagged_iteration(&array); 686 + multiorder_iteration_race(&array); 252 687 253 688 radix_tree_cpu_dead(0); 254 689 }
+27 -48
tools/testing/radix-tree/regression1.c
··· 44 44 #include "regression.h" 45 45 46 46 static RADIX_TREE(mt_tree, GFP_KERNEL); 47 - static pthread_mutex_t mt_lock = PTHREAD_MUTEX_INITIALIZER; 48 47 49 48 struct page { 50 49 pthread_mutex_t lock; ··· 52 53 unsigned long index; 53 54 }; 54 55 55 - static struct page *page_alloc(void) 56 + static struct page *page_alloc(int index) 56 57 { 57 58 struct page *p; 58 59 p = malloc(sizeof(struct page)); 59 60 p->count = 1; 60 - p->index = 1; 61 + p->index = index; 61 62 pthread_mutex_init(&p->lock, NULL); 62 63 63 64 return p; ··· 79 80 static unsigned find_get_pages(unsigned long start, 80 81 unsigned int nr_pages, struct page **pages) 81 82 { 82 - unsigned int i; 83 - unsigned int ret; 84 - unsigned int nr_found; 83 + XA_STATE(xas, &mt_tree, start); 84 + struct page *page; 85 + unsigned int ret = 0; 85 86 86 87 rcu_read_lock(); 87 - restart: 88 - nr_found = radix_tree_gang_lookup_slot(&mt_tree, 89 - (void ***)pages, NULL, start, nr_pages); 90 - ret = 0; 91 - for (i = 0; i < nr_found; i++) { 92 - struct page *page; 93 - repeat: 94 - page = radix_tree_deref_slot((void **)pages[i]); 95 - if (unlikely(!page)) 88 + xas_for_each(&xas, page, ULONG_MAX) { 89 + if (xas_retry(&xas, page)) 96 90 continue; 97 91 98 - if (radix_tree_exception(page)) { 99 - if (radix_tree_deref_retry(page)) { 100 - /* 101 - * Transient condition which can only trigger 102 - * when entry at index 0 moves out of or back 103 - * to root: none yet gotten, safe to restart. 104 - */ 105 - assert((start | i) == 0); 106 - goto restart; 107 - } 108 - /* 109 - * No exceptional entries are inserted in this test. 110 - */ 111 - assert(0); 112 - } 113 - 114 92 pthread_mutex_lock(&page->lock); 115 - if (!page->count) { 116 - pthread_mutex_unlock(&page->lock); 117 - goto repeat; 118 - } 93 + if (!page->count) 94 + goto unlock; 95 + 119 96 /* don't actually update page refcount */ 120 97 pthread_mutex_unlock(&page->lock); 121 98 122 99 /* Has the page moved? 
*/ 123 - if (unlikely(page != *((void **)pages[i]))) { 124 - goto repeat; 125 - } 100 + if (unlikely(page != xas_reload(&xas))) 101 + goto put_page; 126 102 127 103 pages[ret] = page; 128 104 ret++; 105 + continue; 106 + unlock: 107 + pthread_mutex_unlock(&page->lock); 108 + put_page: 109 + xas_reset(&xas); 129 110 } 130 111 rcu_read_unlock(); 131 112 return ret; ··· 124 145 for (j = 0; j < 1000000; j++) { 125 146 struct page *p; 126 147 127 - p = page_alloc(); 128 - pthread_mutex_lock(&mt_lock); 148 + p = page_alloc(0); 149 + xa_lock(&mt_tree); 129 150 radix_tree_insert(&mt_tree, 0, p); 130 - pthread_mutex_unlock(&mt_lock); 151 + xa_unlock(&mt_tree); 131 152 132 - p = page_alloc(); 133 - pthread_mutex_lock(&mt_lock); 153 + p = page_alloc(1); 154 + xa_lock(&mt_tree); 134 155 radix_tree_insert(&mt_tree, 1, p); 135 - pthread_mutex_unlock(&mt_lock); 156 + xa_unlock(&mt_tree); 136 157 137 - pthread_mutex_lock(&mt_lock); 158 + xa_lock(&mt_tree); 138 159 p = radix_tree_delete(&mt_tree, 1); 139 160 pthread_mutex_lock(&p->lock); 140 161 p->count--; 141 162 pthread_mutex_unlock(&p->lock); 142 - pthread_mutex_unlock(&mt_lock); 163 + xa_unlock(&mt_tree); 143 164 page_free(p); 144 165 145 - pthread_mutex_lock(&mt_lock); 166 + xa_lock(&mt_tree); 146 167 p = radix_tree_delete(&mt_tree, 0); 147 168 pthread_mutex_lock(&p->lock); 148 169 p->count--; 149 170 pthread_mutex_unlock(&p->lock); 150 - pthread_mutex_unlock(&mt_lock); 171 + xa_unlock(&mt_tree); 151 172 page_free(p); 152 173 } 153 174 } else {
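The `find_get_pages()` conversion above replaces the old `radix_tree_deref_slot()`/`radix_tree_deref_retry()` dance with `xas_retry()` and `xas_reload()`. The general shape of that lockless walk, separated from the page-refcount details (kernel context; `walk` and `results` are illustrative names, and the "take a reference" step is caller-specific):

```c
#include <linux/xarray.h>

/* Lockless RCU iteration: skip internal retry entries, then re-check the
 * slot with xas_reload() after stabilising the entry, as the converted
 * find_get_pages() does above. */
static unsigned int walk(struct xarray *xa, void **results, unsigned int max)
{
	XA_STATE(xas, xa, 0);
	void *entry;
	unsigned int ret = 0;

	rcu_read_lock();
	xas_for_each(&xas, entry, ULONG_MAX) {
		if (xas_retry(&xas, entry))	/* raced with node split/shrink */
			continue;
		/* ...take a speculative reference on "entry" here... */
		if (entry != xas_reload(&xas)) {
			/* Entry moved under us: drop it and re-walk from
			 * the current xa_index. */
			xas_reset(&xas);
			continue;
		}
		results[ret++] = entry;
		if (ret == max)
			break;
	}
	rcu_read_unlock();
	return ret;
}
```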
+4 -4
tools/testing/radix-tree/regression2.c
··· 53 53 #include "regression.h" 54 54 #include "test.h" 55 55 56 - #define PAGECACHE_TAG_DIRTY 0 57 - #define PAGECACHE_TAG_WRITEBACK 1 58 - #define PAGECACHE_TAG_TOWRITE 2 56 + #define PAGECACHE_TAG_DIRTY XA_MARK_0 57 + #define PAGECACHE_TAG_WRITEBACK XA_MARK_1 58 + #define PAGECACHE_TAG_TOWRITE XA_MARK_2 59 59 60 60 static RADIX_TREE(mt_tree, GFP_KERNEL); 61 61 unsigned long page_count = 0; ··· 92 92 /* 1. */ 93 93 start = 0; 94 94 end = max_slots - 2; 95 - tag_tagged_items(&mt_tree, NULL, start, end, 1, 95 + tag_tagged_items(&mt_tree, start, end, 1, 96 96 PAGECACHE_TAG_DIRTY, PAGECACHE_TAG_TOWRITE); 97 97 98 98 /* 2. */
-23
tools/testing/radix-tree/regression3.c
··· 69 69 continue; 70 70 } 71 71 } 72 - radix_tree_delete(&root, 1); 73 - 74 - first = true; 75 - radix_tree_for_each_contig(slot, &root, &iter, 0) { 76 - printv(2, "contig %ld %p\n", iter.index, *slot); 77 - if (first) { 78 - radix_tree_insert(&root, 1, ptr); 79 - first = false; 80 - } 81 - if (radix_tree_deref_retry(*slot)) { 82 - printv(2, "retry at %ld\n", iter.index); 83 - slot = radix_tree_iter_retry(&iter); 84 - continue; 85 - } 86 - } 87 72 88 73 radix_tree_for_each_slot(slot, &root, &iter, 0) { 89 74 printv(2, "slot %ld %p\n", iter.index, *slot); 90 - if (!iter.index) { 91 - printv(2, "next at %ld\n", iter.index); 92 - slot = radix_tree_iter_resume(slot, &iter); 93 - } 94 - } 95 - 96 - radix_tree_for_each_contig(slot, &root, &iter, 0) { 97 - printv(2, "contig %ld %p\n", iter.index, *slot); 98 75 if (!iter.index) { 99 76 printv(2, "next at %ld\n", iter.index); 100 77 slot = radix_tree_iter_resume(slot, &iter);
+2 -31
tools/testing/radix-tree/tag_check.c
··· 24 24 item_tag_set(tree, index, tag); 25 25 ret = item_tag_get(tree, index, tag); 26 26 assert(ret != 0); 27 - ret = tag_tagged_items(tree, NULL, first, ~0UL, 10, tag, !tag); 27 + ret = tag_tagged_items(tree, first, ~0UL, 10, tag, !tag); 28 28 assert(ret == 1); 29 29 ret = item_tag_get(tree, index, !tag); 30 30 assert(ret != 0); ··· 321 321 assert(ret == 0); 322 322 verify_tag_consistency(&tree, 0); 323 323 verify_tag_consistency(&tree, 1); 324 - ret = tag_tagged_items(&tree, NULL, first, 10, 10, 0, 1); 324 + ret = tag_tagged_items(&tree, first, 10, 10, XA_MARK_0, XA_MARK_1); 325 325 assert(ret == 1); 326 326 ret = radix_tree_gang_lookup_tag(&tree, (void **)items, 0, BATCH, 1); 327 327 assert(ret == 1); 328 328 item_tag_clear(&tree, 0, 0); 329 329 ret = radix_tree_gang_lookup_tag(&tree, (void **)items, 0, BATCH, 0); 330 330 assert(ret == 0); 331 - item_kill_tree(&tree); 332 - } 333 - 334 - void radix_tree_clear_tags_test(void) 335 - { 336 - unsigned long index; 337 - struct radix_tree_node *node; 338 - struct radix_tree_iter iter; 339 - void **slot; 340 - 341 - RADIX_TREE(tree, GFP_KERNEL); 342 - 343 - item_insert(&tree, 0); 344 - item_tag_set(&tree, 0, 0); 345 - __radix_tree_lookup(&tree, 0, &node, &slot); 346 - radix_tree_clear_tags(&tree, node, slot); 347 - assert(item_tag_get(&tree, 0, 0) == 0); 348 - 349 - for (index = 0; index < 1000; index++) { 350 - item_insert(&tree, index); 351 - item_tag_set(&tree, index, 0); 352 - } 353 - 354 - radix_tree_for_each_slot(slot, &tree, &iter, 0) { 355 - radix_tree_clear_tags(&tree, iter.node, slot); 356 - assert(item_tag_get(&tree, iter.index, 0) == 0); 357 - } 358 - 359 331 item_kill_tree(&tree); 360 332 } 361 333 ··· 348 376 thrash_tags(); 349 377 rcu_barrier(); 350 378 printv(2, "after thrash_tags: %d allocated\n", nr_allocated); 351 - radix_tree_clear_tags_test(); 352 379 }
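The `tag_tagged_items()` calls in this file lose their `pthread_mutex_t *` argument because the converted implementation (see test.c below) takes the XArray's own lock and uses `xas_pause()` to drop it between batches. A sketch of that batched mark-propagation pattern, following tag_pages_for_writeback() as the original comment notes (kernel context; `copy_marks` is an illustrative name):

```c
#include <linux/xarray.h>

/* Copy "iftag" marks to "thentag" over [0, end], releasing the lock every
 * "batch" entries so a long range does not hold it continuously. */
static unsigned int copy_marks(struct xarray *xa, unsigned long end,
			       unsigned int batch,
			       xa_mark_t iftag, xa_mark_t thentag)
{
	XA_STATE(xas, xa, 0);
	void *entry;
	unsigned int tagged = 0;

	xas_lock_irq(&xas);
	xas_for_each_marked(&xas, entry, end, iftag) {
		xas_set_mark(&xas, thentag);
		if (++tagged % batch)
			continue;
		/* xas_pause() records the position so iteration resumes
		 * safely after the lock is dropped and retaken. */
		xas_pause(&xas);
		xas_unlock_irq(&xas);
		xas_lock_irq(&xas);
	}
	xas_unlock_irq(&xas);
	return tagged;
}
```

Locking as part of the API is the other headline change from the merge message: callers no longer pair an external mutex with the tree, they use `xa_lock()`/`xas_lock_irq()` on the array itself.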
+42 -89
tools/testing/radix-tree/test.c
···
 		return radix_tree_tag_get(root, index, tag);
 }
 
-int __item_insert(struct radix_tree_root *root, struct item *item)
-{
-	return __radix_tree_insert(root, item->index, item->order, item);
-}
-
 struct item *item_create(unsigned long index, unsigned int order)
 {
 	struct item *ret = malloc(sizeof(*ret));
···
 	return ret;
 }
 
-int item_insert_order(struct radix_tree_root *root, unsigned long index,
-		unsigned order)
+int item_insert(struct radix_tree_root *root, unsigned long index)
 {
-	struct item *item = item_create(index, order);
-	int err = __item_insert(root, item);
+	struct item *item = item_create(index, 0);
+	int err = radix_tree_insert(root, item->index, item);
 	if (err)
 		free(item);
 	return err;
-}
-
-int item_insert(struct radix_tree_root *root, unsigned long index)
-{
-	return item_insert_order(root, index, 0);
 }
 
 void item_sanity(struct item *item, unsigned long index)
···
 	assert((item->index | mask) == (index | mask));
 }
 
+void item_free(struct item *item, unsigned long index)
+{
+	item_sanity(item, index);
+	free(item);
+}
+
 int item_delete(struct radix_tree_root *root, unsigned long index)
 {
 	struct item *item = radix_tree_delete(root, index);
 
-	if (item) {
-		item_sanity(item, index);
-		free(item);
-		return 1;
-	}
-	return 0;
+	if (!item)
+		return 0;
+
+	item_free(item, index);
+	return 1;
 }
 
 static void item_free_rcu(struct rcu_head *head)
···
 	free(item);
 }
 
-int item_delete_rcu(struct radix_tree_root *root, unsigned long index)
+int item_delete_rcu(struct xarray *xa, unsigned long index)
 {
-	struct item *item = radix_tree_delete(root, index);
+	struct item *item = xa_erase(xa, index);
 
 	if (item) {
 		item_sanity(item, index);
···
 }
 
 /* Use the same pattern as tag_pages_for_writeback() in mm/page-writeback.c */
-int tag_tagged_items(struct radix_tree_root *root, pthread_mutex_t *lock,
-		unsigned long start, unsigned long end, unsigned batch,
-		unsigned iftag, unsigned thentag)
+int tag_tagged_items(struct xarray *xa, unsigned long start, unsigned long end,
+		unsigned batch, xa_mark_t iftag, xa_mark_t thentag)
 {
-	unsigned long tagged = 0;
-	struct radix_tree_iter iter;
-	void **slot;
+	XA_STATE(xas, xa, start);
+	unsigned int tagged = 0;
+	struct item *item;
 
 	if (batch == 0)
 		batch = 1;
 
-	if (lock)
-		pthread_mutex_lock(lock);
-	radix_tree_for_each_tagged(slot, root, &iter, start, iftag) {
-		if (iter.index > end)
-			break;
-		radix_tree_iter_tag_set(root, &iter, thentag);
-		tagged++;
-		if ((tagged % batch) != 0)
+	xas_lock_irq(&xas);
+	xas_for_each_marked(&xas, item, end, iftag) {
+		xas_set_mark(&xas, thentag);
+		if (++tagged % batch)
 			continue;
-		slot = radix_tree_iter_resume(slot, &iter);
-		if (lock) {
-			pthread_mutex_unlock(lock);
-			rcu_barrier();
-			pthread_mutex_lock(lock);
-		}
+
+		xas_pause(&xas);
+		xas_unlock_irq(&xas);
+		rcu_barrier();
+		xas_lock_irq(&xas);
 	}
-	if (lock)
-		pthread_mutex_unlock(lock);
+	xas_unlock_irq(&xas);
 
 	return tagged;
-}
-
-/* Use the same pattern as find_swap_entry() in mm/shmem.c */
-unsigned long find_item(struct radix_tree_root *root, void *item)
-{
-	struct radix_tree_iter iter;
-	void **slot;
-	unsigned long found = -1;
-	unsigned long checked = 0;
-
-	radix_tree_for_each_slot(slot, root, &iter, 0) {
-		if (*slot == item) {
-			found = iter.index;
-			break;
-		}
-		checked++;
-		if ((checked % 4) != 0)
-			continue;
-		slot = radix_tree_iter_resume(slot, &iter);
-	}
-
-	return found;
 }
 
 static int verify_node(struct radix_tree_node *slot, unsigned int tag,
···
 
 void verify_tag_consistency(struct radix_tree_root *root, unsigned int tag)
 {
-	struct radix_tree_node *node = root->rnode;
+	struct radix_tree_node *node = root->xa_head;
 	if (!radix_tree_is_internal_node(node))
 		return;
 	verify_node(node, tag, !!root_tag_get(root, tag));
 }
 
-void item_kill_tree(struct radix_tree_root *root)
+void item_kill_tree(struct xarray *xa)
 {
-	struct radix_tree_iter iter;
-	void **slot;
-	struct item *items[32];
-	int nfound;
+	XA_STATE(xas, xa, 0);
+	void *entry;
 
-	radix_tree_for_each_slot(slot, root, &iter, 0) {
-		if (radix_tree_exceptional_entry(*slot))
-			radix_tree_delete(root, iter.index);
-	}
-
-	while ((nfound = radix_tree_gang_lookup(root, (void **)items, 0, 32))) {
-		int i;
-
-		for (i = 0; i < nfound; i++) {
-			void *ret;
-
-			ret = radix_tree_delete(root, items[i]->index);
-			assert(ret == items[i]);
-			free(items[i]);
+	xas_for_each(&xas, entry, ULONG_MAX) {
+		if (!xa_is_value(entry)) {
+			item_free(entry, xas.xa_index);
 		}
+		xas_store(&xas, NULL);
 	}
-	assert(radix_tree_gang_lookup(root, (void **)items, 0, 32) == 0);
-	assert(root->rnode == NULL);
+
+	assert(xa_empty(xa));
 }
 
 void tree_verify_min_height(struct radix_tree_root *root, int maxindex)
 {
 	unsigned shift;
-	struct radix_tree_node *node = root->rnode;
+	struct radix_tree_node *node = root->xa_head;
 	if (!radix_tree_is_internal_node(node)) {
 		assert(maxindex == 0);
 		return;
tools/testing/radix-tree/test.h (+5 −8)
···
 };
 
 struct item *item_create(unsigned long index, unsigned int order);
-int __item_insert(struct radix_tree_root *root, struct item *item);
 int item_insert(struct radix_tree_root *root, unsigned long index);
 void item_sanity(struct item *item, unsigned long index);
-int item_insert_order(struct radix_tree_root *root, unsigned long index,
-		unsigned order);
+void item_free(struct item *item, unsigned long index);
 int item_delete(struct radix_tree_root *root, unsigned long index);
-int item_delete_rcu(struct radix_tree_root *root, unsigned long index);
+int item_delete_rcu(struct xarray *xa, unsigned long index);
 struct item *item_lookup(struct radix_tree_root *root, unsigned long index);
 
 void item_check_present(struct radix_tree_root *root, unsigned long index);
···
 		unsigned long nr, int chunk);
 void item_kill_tree(struct radix_tree_root *root);
 
-int tag_tagged_items(struct radix_tree_root *, pthread_mutex_t *,
-		unsigned long start, unsigned long end, unsigned batch,
-		unsigned iftag, unsigned thentag);
-unsigned long find_item(struct radix_tree_root *, void *item);
+int tag_tagged_items(struct xarray *, unsigned long start, unsigned long end,
+		unsigned batch, xa_mark_t iftag, xa_mark_t thentag);
 
+void xarray_tests(void);
 void tag_check(void);
 void multiorder_checks(void);
 void iteration_test(unsigned order, unsigned duration);
tools/testing/radix-tree/xarray.c (new file, +35)
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * xarray.c: Userspace shim for XArray test-suite
+ * Copyright (c) 2018 Matthew Wilcox <willy@infradead.org>
+ */
+
+#define XA_DEBUG
+#include "test.h"
+
+#define module_init(x)
+#define module_exit(x)
+#define MODULE_AUTHOR(x)
+#define MODULE_LICENSE(x)
+#define dump_stack()	assert(0)
+
+#include "../../../lib/xarray.c"
+#undef XA_DEBUG
+#include "../../../lib/test_xarray.c"
+
+void xarray_tests(void)
+{
+	xarray_checks();
+	xarray_exit();
+}
+
+int __weak main(void)
+{
+	radix_tree_init();
+	xarray_tests();
+	radix_tree_cpu_dead(1);
+	rcu_barrier();
+	if (nr_allocated)
+		printf("nr_allocated = %d\n", nr_allocated);
+	return 0;
+}