Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

rhashtable: use bit_spin_locks to protect hash bucket.

This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
bucket pointer to lock the hash chain for that bucket.

The benefits of a bit spin_lock are:
- no need to allocate a separate array of locks.
- no need to have a configuration option to guide the
choice of the size of this array
- locking cost is often a single test-and-set in a cache line
that will have to be loaded anyway. When inserting at, or removing
from, the head of the chain, the unlock is free - writing the new
address in the bucket head implicitly clears the lock bit.
For __rhashtable_insert_fast() we ensure this always happens
when adding a new key.
- even when locking costs two updates (lock and unlock), they are
in a cacheline that needs to be read anyway.

The cost of using a bit spin_lock is a little bit of code complexity,
which I think is quite manageable.

Bit spin_locks are sometimes inappropriate because they are not fair -
if multiple CPUs repeatedly contend for the same lock, one CPU can
easily be starved. This is not a credible situation with rhashtable.
Multiple CPUs may want to repeatedly add or remove objects, but they
will typically do so at different buckets, so they will attempt to
acquire different locks.

As we have more bit-locks than we previously had spinlocks (by at
least a factor of two) we can expect slightly less contention to
go with the slightly better cache behavior and reduced memory
consumption.

To enhance type checking, a new struct is introduced to represent the
pointer-plus-lock-bit that is stored in the bucket table. This is
"struct rhash_lock_head" and is empty. A pointer to this needs to be
cast to either an unsigned long, or a "struct rhash_head *", to be
useful. Variables of this type are most often called "bkt".

Previously "pprev" would sometimes point to a bucket, and sometimes to a
->next pointer in an rhash_head. As these are now different types,
pprev is NULL when it would have pointed to the bucket. In that case,
'bkt' is used, together with the correct locking protocol.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by NeilBrown, committed by David S. Miller
8f0db018 ff302db9

+235 -177
-2
include/linux/rhashtable-types.h
···
  * @head_offset: Offset of rhash_head in struct to be hashed
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
- * @locks_mul: Number of bucket locks to allocate per cpu (default: 32)
  * @automatic_shrinking: Enable automatic shrinking of tables
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
···
 	unsigned int max_size;
 	u16 min_size;
 	bool automatic_shrinking;
-	u8 locks_mul;
 	rht_hashfn_t hashfn;
 	rht_obj_hashfn_t obj_hashfn;
 	rht_obj_cmpfn_t obj_cmpfn;
+165 -96
include/linux/rhashtable.h
···
 #include <linux/list_nulls.h>
 #include <linux/workqueue.h>
 #include <linux/rculist.h>
+#include <linux/bit_spinlock.h>

 #include <linux/rhashtable-types.h>
 /*
+ * Objects in an rhashtable have an embedded struct rhash_head
+ * which is linked into a hash chain from the hash table - or one
+ * of two or more hash tables when the rhashtable is being resized.
  * The end of the chain is marked with a special nulls marker which has
- * the least significant bit set.
+ * the least significant bit set but otherwise stores the address of
+ * the hash bucket.  This allows us to be sure we've found the end
+ * of the right list.
+ * The value stored in the hash bucket has BIT(1) used as a lock bit.
+ * This bit must be atomically set before any changes are made to
+ * the chain.  To avoid dereferencing this pointer without clearing
+ * the bit first, we use an opaque 'struct rhash_lock_head *' for the
+ * pointer stored in the bucket.  This struct needs to be defined so
+ * that rcu_dereference() works on it, but it has no content so a
+ * cast is needed for it to be useful.  This ensures it isn't
+ * used by mistake without clearing the lock bit first.
  */
+struct rhash_lock_head {};

 /* Maximum chain length before rehash
  *
···
  * @nest: Number of bits of first-level nested table.
  * @rehash: Current bucket being rehashed
  * @hash_rnd: Random seed to fold into hash
- * @locks_mask: Mask to apply before accessing locks[]
- * @locks: Array of spinlocks protecting individual buckets
  * @walkers: List of active walkers
  * @rcu: RCU structure for freeing the table
  * @future_tbl: Table under construction during rehashing
···
 	unsigned int size;
 	unsigned int nest;
 	u32 hash_rnd;
-	unsigned int locks_mask;
-	spinlock_t *locks;
 	struct list_head walkers;
 	struct rcu_head rcu;

 	struct bucket_table __rcu *future_tbl;

-	struct rhash_head __rcu *buckets[] ____cacheline_aligned_in_smp;
+	struct rhash_lock_head __rcu *buckets[] ____cacheline_aligned_in_smp;
 };
+
+/*
+ * We lock a bucket by setting BIT(1) in the pointer - this is always
+ * zero in real pointers and in the nulls marker.
+ * bit_spin_locks do not handle contention well, but the whole point
+ * of the hashtable design is to achieve minimum per-bucket contention.
+ * A nested hash table might not have a bucket pointer.  In that case
+ * we cannot get a lock.  For remove and replace the bucket cannot be
+ * interesting and doesn't need locking.
+ * For insert we allocate the bucket if this is the last bucket_table,
+ * and then take the lock.
+ * Sometimes we unlock a bucket by writing a new pointer there.  In that
+ * case we don't need to unlock, but we do need to reset state such as
+ * local_bh.  For that we have rht_assign_unlock().  As rcu_assign_pointer()
+ * provides the same release semantics that bit_spin_unlock() provides,
+ * this is safe.
+ */
+
+static inline void rht_lock(struct rhash_lock_head **bkt)
+{
+	local_bh_disable();
+	bit_spin_lock(1, (unsigned long *)bkt);
+}
+
+static inline void rht_unlock(struct rhash_lock_head **bkt)
+{
+	bit_spin_unlock(1, (unsigned long *)bkt);
+	local_bh_enable();
+}
+
+static inline void rht_assign_unlock(struct rhash_lock_head **bkt,
+				     struct rhash_head *obj)
+{
+	struct rhash_head **p = (struct rhash_head **)bkt;
+
+	rcu_assign_pointer(*p, obj);
+	preempt_enable();
+	__release(bitlock);
+	local_bh_enable();
+}
+
+/*
+ * If 'p' is a bucket head and might be locked:
+ *   rht_ptr() returns the address without the lock bit.
+ *   rht_ptr_locked() returns the address WITH the lock bit.
+ */
+static inline struct rhash_head __rcu *rht_ptr(const struct rhash_lock_head *p)
+{
+	return (void *)(((unsigned long)p) & ~BIT(1));
+}
+
+static inline struct rhash_lock_head __rcu *rht_ptr_locked(const struct rhash_head *p)
+{
+	return (void *)(((unsigned long)p) | BIT(1));
+}

 /*
  * NULLS_MARKER() expects a hash value with the low
···
 	return atomic_read(&ht->nelems) >= ht->max_elems;
 }

-/* The bucket lock is selected based on the hash and protects mutations
- * on a group of hash buckets.
- *
- * A maximum of tbl->size/2 bucket locks is allocated. This ensures that
- * a single lock always covers both buckets which may both contains
- * entries which link to the same bucket of the old table during resizing.
- * This allows to simplify the locking as locking the bucket in both
- * tables during resize always guarantee protection.
- *
- * IMPORTANT: When holding the bucket lock of both the old and new table
- * during expansions and shrinking, the old bucket lock must always be
- * acquired first.
- */
-static inline spinlock_t *rht_bucket_lock(const struct bucket_table *tbl,
-					  unsigned int hash)
-{
-	return &tbl->locks[hash & tbl->locks_mask];
-}
-
 #ifdef CONFIG_PROVE_LOCKING
 int lockdep_rht_mutex_is_held(struct rhashtable *ht);
 int lockdep_rht_bucket_is_held(const struct bucket_table *tbl, u32 hash);
···
 			      void *arg);
 void rhashtable_destroy(struct rhashtable *ht);

-struct rhash_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
-					    unsigned int hash);
-struct rhash_head __rcu **__rht_bucket_nested(const struct bucket_table *tbl,
-					      unsigned int hash);
-struct rhash_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
-						   struct bucket_table *tbl,
-						   unsigned int hash);
+struct rhash_lock_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
+						 unsigned int hash);
+struct rhash_lock_head __rcu **__rht_bucket_nested(const struct bucket_table *tbl,
+						   unsigned int hash);
+struct rhash_lock_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
+							struct bucket_table *tbl,
+							unsigned int hash);

 #define rht_dereference(p, ht) \
 	rcu_dereference_protected(p, lockdep_rht_mutex_is_held(ht))
···
 #define rht_entry(tpos, pos, member) \
 	({ tpos = container_of(pos, typeof(*tpos), member); 1; })

-static inline struct rhash_head __rcu *const *rht_bucket(
+static inline struct rhash_lock_head __rcu *const *rht_bucket(
 	const struct bucket_table *tbl, unsigned int hash)
 {
 	return unlikely(tbl->nest) ? rht_bucket_nested(tbl, hash) :
 				     &tbl->buckets[hash];
 }

-static inline struct rhash_head __rcu **rht_bucket_var(
+static inline struct rhash_lock_head __rcu **rht_bucket_var(
 	struct bucket_table *tbl, unsigned int hash)
 {
 	return unlikely(tbl->nest) ? __rht_bucket_nested(tbl, hash) :
 				     &tbl->buckets[hash];
 }

-static inline struct rhash_head __rcu **rht_bucket_insert(
+static inline struct rhash_lock_head __rcu **rht_bucket_insert(
 	struct rhashtable *ht, struct bucket_table *tbl, unsigned int hash)
 {
 	return unlikely(tbl->nest) ? rht_bucket_nested_insert(ht, tbl, hash) :
···
  * @hash: the hash value / bucket index
  */
 #define rht_for_each(pos, tbl, hash) \
-	rht_for_each_from(pos, *rht_bucket(tbl, hash), tbl, hash)
+	rht_for_each_from(pos, rht_ptr(*rht_bucket(tbl, hash)), tbl, hash)

 /**
  * rht_for_each_entry_from - iterate over hash chain from given head
···
  * @member: name of the &struct rhash_head within the hashable struct.
  */
 #define rht_for_each_entry(tpos, pos, tbl, hash, member) \
-	rht_for_each_entry_from(tpos, pos, *rht_bucket(tbl, hash), \
+	rht_for_each_entry_from(tpos, pos, rht_ptr(*rht_bucket(tbl, hash)), \
 				tbl, hash, member)

 /**
···
  * remove the loop cursor from the list.
  */
 #define rht_for_each_entry_safe(tpos, pos, next, tbl, hash, member) \
-	for (pos = rht_dereference_bucket(*rht_bucket(tbl, hash), tbl, hash), \
+	for (pos = rht_dereference_bucket(rht_ptr(*rht_bucket(tbl, hash)), \
+					  tbl, hash), \
 	     next = !rht_is_a_nulls(pos) ? \
 		       rht_dereference_bucket(pos->next, tbl, hash) : NULL; \
 	     (!rht_is_a_nulls(pos)) && rht_entry(tpos, pos, member); \
···
  * the _rcu mutation primitives such as rhashtable_insert() as long as the
  * traversal is guarded by rcu_read_lock().
  */
-#define rht_for_each_rcu(pos, tbl, hash) \
-	rht_for_each_rcu_from(pos, *rht_bucket(tbl, hash), tbl, hash)
+#define rht_for_each_rcu(pos, tbl, hash) \
+	for (({barrier(); }), \
+	     pos = rht_ptr(rht_dereference_bucket_rcu( \
+			   *rht_bucket(tbl, hash), tbl, hash)); \
+	     !rht_is_a_nulls(pos); \
+	     pos = rcu_dereference_raw(pos->next))

 /**
  * rht_for_each_entry_rcu_from - iterated over rcu hash chain from given head
···
  * traversal is guarded by rcu_read_lock().
  */
 #define rht_for_each_entry_rcu(tpos, pos, tbl, hash, member) \
-	rht_for_each_entry_rcu_from(tpos, pos, *rht_bucket(tbl, hash), \
+	rht_for_each_entry_rcu_from(tpos, pos, \
+				    rht_ptr(*rht_bucket(tbl, hash)), \
 				    tbl, hash, member)

 /**
···
 		.ht = ht,
 		.key = key,
 	};
-	struct rhash_head __rcu * const *head;
+	struct rhash_lock_head __rcu * const *bkt;
 	struct bucket_table *tbl;
 	struct rhash_head *he;
 	unsigned int hash;
···
 	tbl = rht_dereference_rcu(ht->tbl, ht);
restart:
 	hash = rht_key_hashfn(ht, tbl, key, params);
-	head = rht_bucket(tbl, hash);
+	bkt = rht_bucket(tbl, hash);
 	do {
-		rht_for_each_rcu_from(he, *head, tbl, hash) {
+		he = rht_ptr(rht_dereference_bucket_rcu(*bkt, tbl, hash));
+		rht_for_each_rcu_from(he, he, tbl, hash) {
 			if (params.obj_cmpfn ?
 			    params.obj_cmpfn(&arg, rht_obj(ht, he)) :
 			    rhashtable_compare(&arg, rht_obj(ht, he)))
···
 		/* An object might have been moved to a different hash chain,
 		 * while we walk along it - better check and retry.
 		 */
-	} while (he != RHT_NULLS_MARKER(head));
+	} while (he != RHT_NULLS_MARKER(bkt));

 	/* Ensure we see any new tables. */
 	smp_rmb();
···
 		.ht = ht,
 		.key = key,
 	};
+	struct rhash_lock_head __rcu **bkt;
 	struct rhash_head __rcu **pprev;
 	struct bucket_table *tbl;
 	struct rhash_head *head;
-	spinlock_t *lock;
 	unsigned int hash;
 	int elasticity;
 	void *data;
···
 	tbl = rht_dereference_rcu(ht->tbl, ht);
 	hash = rht_head_hashfn(ht, tbl, obj, params);
-	lock = rht_bucket_lock(tbl, hash);
-	spin_lock_bh(lock);
+	elasticity = RHT_ELASTICITY;
+	bkt = rht_bucket_insert(ht, tbl, hash);
+	data = ERR_PTR(-ENOMEM);
+	if (!bkt)
+		goto out;
+	pprev = NULL;
+	rht_lock(bkt);

 	if (unlikely(rcu_access_pointer(tbl->future_tbl))) {
slow_path:
-		spin_unlock_bh(lock);
+		rht_unlock(bkt);
 		rcu_read_unlock();
 		return rhashtable_insert_slow(ht, key, obj);
 	}

-	elasticity = RHT_ELASTICITY;
-	pprev = rht_bucket_insert(ht, tbl, hash);
-	data = ERR_PTR(-ENOMEM);
-	if (!pprev)
-		goto out;
-
-	rht_for_each_from(head, *pprev, tbl, hash) {
+	rht_for_each_from(head, rht_ptr(*bkt), tbl, hash) {
 		struct rhlist_head *plist;
 		struct rhlist_head *list;
···
 		data = rht_obj(ht, head);

 		if (!rhlist)
-			goto out;
+			goto out_unlock;

 		list = container_of(obj, struct rhlist_head, rhead);
···
 		RCU_INIT_POINTER(list->next, plist);
 		head = rht_dereference_bucket(head->next, tbl, hash);
 		RCU_INIT_POINTER(list->rhead.next, head);
-		rcu_assign_pointer(*pprev, obj);
-
-		goto good;
+		if (pprev) {
+			rcu_assign_pointer(*pprev, obj);
+			rht_unlock(bkt);
+		} else
+			rht_assign_unlock(bkt, obj);
+		data = NULL;
+		goto out;
 	}

 	if (elasticity <= 0)
···
 	data = ERR_PTR(-E2BIG);
 	if (unlikely(rht_grow_above_max(ht, tbl)))
-		goto out;
+		goto out_unlock;

 	if (unlikely(rht_grow_above_100(ht, tbl)))
 		goto slow_path;

-	head = rht_dereference_bucket(*pprev, tbl, hash);
+	/* Inserting at head of list makes unlocking free. */
+	head = rht_ptr(rht_dereference_bucket(*bkt, tbl, hash));

 	RCU_INIT_POINTER(obj->next, head);
 	if (rhlist) {
···
 		RCU_INIT_POINTER(list->next, NULL);
 	}

-	rcu_assign_pointer(*pprev, obj);
-
 	atomic_inc(&ht->nelems);
+	rht_assign_unlock(bkt, obj);
+
 	if (rht_grow_above_75(ht, tbl))
 		schedule_work(&ht->run_work);

-good:
 	data = NULL;
-
out:
-	spin_unlock_bh(lock);
 	rcu_read_unlock();

 	return data;
+
+out_unlock:
+	rht_unlock(bkt);
+	goto out;
 }

 /**
···
  * @obj: pointer to hash head inside object
  * @params: hash table parameters
  *
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
  * on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
  *
  * It is safe to call this function from atomic context.
  *
···
  * @list: pointer to hash list head inside object
  * @params: hash table parameters
  *
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
  * on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
  *
  * It is safe to call this function from atomic context.
  *
···
  * @list: pointer to hash list head inside object
  * @params: hash table parameters
  *
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
  * on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
  *
  * It is safe to call this function from atomic context.
  *
···
 		struct rhash_head *obj, const struct rhashtable_params params,
 		bool rhlist)
 {
+	struct rhash_lock_head __rcu **bkt;
 	struct rhash_head __rcu **pprev;
 	struct rhash_head *he;
-	spinlock_t * lock;
 	unsigned int hash;
 	int err = -ENOENT;

 	hash = rht_head_hashfn(ht, tbl, obj, params);
-	lock = rht_bucket_lock(tbl, hash);
+	bkt = rht_bucket_var(tbl, hash);
+	if (!bkt)
+		return -ENOENT;
+	pprev = NULL;
+	rht_lock(bkt);

-	spin_lock_bh(lock);
-
-	pprev = rht_bucket_var(tbl, hash);
-	if (!pprev)
-		goto out;
-	rht_for_each_from(he, *pprev, tbl, hash) {
+	rht_for_each_from(he, rht_ptr(*bkt), tbl, hash) {
 		struct rhlist_head *list;

 		list = container_of(he, struct rhlist_head, rhead);
···
 			}
 		}

-		rcu_assign_pointer(*pprev, obj);
-		break;
+		if (pprev) {
+			rcu_assign_pointer(*pprev, obj);
+			rht_unlock(bkt);
+		} else {
+			rht_assign_unlock(bkt, obj);
+		}
+		goto unlocked;
 	}

-out:
-	spin_unlock_bh(lock);
-
+	rht_unlock(bkt);
+unlocked:
 	if (err > 0) {
 		atomic_dec(&ht->nelems);
 		if (unlikely(ht->p.automatic_shrinking &&
···
 		struct rhash_head *obj_old, struct rhash_head *obj_new,
 		const struct rhashtable_params params)
 {
+	struct rhash_lock_head __rcu **bkt;
 	struct rhash_head __rcu **pprev;
 	struct rhash_head *he;
-	spinlock_t *lock;
 	unsigned int hash;
 	int err = -ENOENT;
···
 	if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
 		return -EINVAL;

-	lock = rht_bucket_lock(tbl, hash);
+	bkt = rht_bucket_var(tbl, hash);
+	if (!bkt)
+		return -ENOENT;

-	spin_lock_bh(lock);
+	pprev = NULL;
+	rht_lock(bkt);

-	pprev = rht_bucket_var(tbl, hash);
-	if (!pprev)
-		goto out;
-	rht_for_each_from(he, *pprev, tbl, hash) {
+	rht_for_each_from(he, rht_ptr(*bkt), tbl, hash) {
 		if (he != obj_old) {
 			pprev = &he->next;
 			continue;
 		}

 		rcu_assign_pointer(obj_new->next, obj_old->next);
-		rcu_assign_pointer(*pprev, obj_new);
+		if (pprev) {
+			rcu_assign_pointer(*pprev, obj_new);
+			rht_unlock(bkt);
+		} else {
+			rht_assign_unlock(bkt, obj_new);
+		}
 		err = 0;
-		break;
+		goto unlocked;
 	}
-out:
-	spin_unlock_bh(lock);

+	rht_unlock(bkt);
+
+unlocked:
 	return err;
 }
-1
ipc/util.c
···
 	.head_offset = offsetof(struct kern_ipc_perm, khtnode),
 	.key_offset = offsetof(struct kern_ipc_perm, key),
 	.key_len = FIELD_SIZEOF(struct kern_ipc_perm, key),
-	.locks_mul = 1,
 	.automatic_shrinking = true,
 };
+69 -70
lib/rhashtable.c
···

 #define HASH_DEFAULT_SIZE	64UL
 #define HASH_MIN_SIZE		4U
-#define BUCKET_LOCKS_PER_CPU	32UL

 union nested_table {
 	union nested_table __rcu *table;
-	struct rhash_head __rcu *bucket;
+	struct rhash_lock_head __rcu *bucket;
 };

 static u32 head_hashfn(struct rhashtable *ht,
···
 int lockdep_rht_bucket_is_held(const struct bucket_table *tbl, u32 hash)
 {
-	spinlock_t *lock = rht_bucket_lock(tbl, hash);
-
-	return (debug_locks) ? lockdep_is_held(lock) : 1;
+	if (!debug_locks)
+		return 1;
+	if (unlikely(tbl->nest))
+		return 1;
+	return bit_spin_is_locked(1, (unsigned long *)&tbl->buckets[hash]);
 }
 EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
 #else
···
 	if (tbl->nest)
 		nested_bucket_table_free(tbl);

-	free_bucket_spinlocks(tbl->locks);
 	kvfree(tbl);
 }
···
 					       gfp_t gfp)
 {
 	struct bucket_table *tbl = NULL;
-	size_t size, max_locks;
+	size_t size;
 	int i;

 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
···
 		return NULL;

 	tbl->size = size;
-
-	max_locks = size >> 1;
-	if (tbl->nest)
-		max_locks = min_t(size_t, max_locks, 1U << tbl->nest);
-
-	if (alloc_bucket_spinlocks(&tbl->locks, &tbl->locks_mask, max_locks,
-				   ht->p.locks_mul, gfp) < 0) {
-		bucket_table_free(tbl);
-		return NULL;
-	}

 	rcu_head_init(&tbl->rcu);
 	INIT_LIST_HEAD(&tbl->walkers);
···
 	return new_tbl;
 }

-static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
+static int rhashtable_rehash_one(struct rhashtable *ht,
+				 struct rhash_lock_head __rcu **bkt,
+				 unsigned int old_hash)
 {
 	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
 	struct bucket_table *new_tbl = rhashtable_last_table(ht, old_tbl);
-	struct rhash_head __rcu **pprev = rht_bucket_var(old_tbl, old_hash);
 	int err = -EAGAIN;
 	struct rhash_head *head, *next, *entry;
-	spinlock_t *new_bucket_lock;
+	struct rhash_head **pprev = NULL;
 	unsigned int new_hash;

 	if (new_tbl->nest)
 		goto out;

 	err = -ENOENT;
-	if (!pprev)
-		goto out;

-	rht_for_each_from(entry, *pprev, old_tbl, old_hash) {
+	rht_for_each_from(entry, rht_ptr(*bkt), old_tbl, old_hash) {
 		err = 0;
 		next = rht_dereference_bucket(entry->next, old_tbl, old_hash);
···
 	new_hash = head_hashfn(ht, new_tbl, entry);

-	new_bucket_lock = rht_bucket_lock(new_tbl, new_hash);
+	rht_lock(&new_tbl->buckets[new_hash]);

-	spin_lock_nested(new_bucket_lock, SINGLE_DEPTH_NESTING);
-	head = rht_dereference_bucket(new_tbl->buckets[new_hash],
-				      new_tbl, new_hash);
+	head = rht_ptr(rht_dereference_bucket(new_tbl->buckets[new_hash],
+					      new_tbl, new_hash));

 	RCU_INIT_POINTER(entry->next, head);

-	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
-	spin_unlock(new_bucket_lock);
+	rht_assign_unlock(&new_tbl->buckets[new_hash], entry);

-	rcu_assign_pointer(*pprev, next);
+	if (pprev)
+		rcu_assign_pointer(*pprev, next);
+	else
+		/* Need to preserve the bit lock. */
+		rcu_assign_pointer(*bkt, rht_ptr_locked(next));

out:
 	return err;
···
 				    unsigned int old_hash)
 {
 	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	spinlock_t *old_bucket_lock;
+	struct rhash_lock_head __rcu **bkt = rht_bucket_var(old_tbl, old_hash);
 	int err;

-	old_bucket_lock = rht_bucket_lock(old_tbl, old_hash);
+	if (!bkt)
+		return 0;
+	rht_lock(bkt);

-	spin_lock_bh(old_bucket_lock);
-	while (!(err = rhashtable_rehash_one(ht, old_hash)))
+	while (!(err = rhashtable_rehash_one(ht, bkt, old_hash)))
 		;

 	if (err == -ENOENT)
 		err = 0;
-
-	spin_unlock_bh(old_bucket_lock);
+	rht_unlock(bkt);

 	return err;
 }
···
 }

 static void *rhashtable_lookup_one(struct rhashtable *ht,
+				   struct rhash_lock_head __rcu **bkt,
 				   struct bucket_table *tbl, unsigned int hash,
 				   const void *key, struct rhash_head *obj)
 {
···
 		.ht = ht,
 		.key = key,
 	};
-	struct rhash_head __rcu **pprev;
+	struct rhash_head **pprev = NULL;
 	struct rhash_head *head;
 	int elasticity;

 	elasticity = RHT_ELASTICITY;
-	pprev = rht_bucket_var(tbl, hash);
-	if (!pprev)
-		return ERR_PTR(-ENOENT);
-	rht_for_each_from(head, *pprev, tbl, hash) {
+	rht_for_each_from(head, rht_ptr(*bkt), tbl, hash) {
 		struct rhlist_head *list;
 		struct rhlist_head *plist;
···
 		RCU_INIT_POINTER(list->next, plist);
 		head = rht_dereference_bucket(head->next, tbl, hash);
 		RCU_INIT_POINTER(list->rhead.next, head);
-		rcu_assign_pointer(*pprev, obj);
+		if (pprev)
+			rcu_assign_pointer(*pprev, obj);
+		else
+			/* Need to preserve the bit lock */
+			rcu_assign_pointer(*bkt, rht_ptr_locked(obj));

 		return NULL;
 	}
···
 }

 static struct bucket_table *rhashtable_insert_one(struct rhashtable *ht,
+						  struct rhash_lock_head __rcu **bkt,
 						  struct bucket_table *tbl,
 						  unsigned int hash,
 						  struct rhash_head *obj,
 						  void *data)
 {
-	struct rhash_head __rcu **pprev;
 	struct bucket_table *new_tbl;
 	struct rhash_head *head;
···
 	if (unlikely(rht_grow_above_100(ht, tbl)))
 		return ERR_PTR(-EAGAIN);

-	pprev = rht_bucket_insert(ht, tbl, hash);
-	if (!pprev)
-		return ERR_PTR(-ENOMEM);
-
-	head = rht_dereference_bucket(*pprev, tbl, hash);
+	head = rht_ptr(rht_dereference_bucket(*bkt, tbl, hash));

 	RCU_INIT_POINTER(obj->next, head);
 	if (ht->rhlist) {
···
 		RCU_INIT_POINTER(list->next, NULL);
 	}

-	rcu_assign_pointer(*pprev, obj);
+	/* bkt is always the head of the list, so it holds
+	 * the lock, which we need to preserve
+	 */
+	rcu_assign_pointer(*bkt, rht_ptr_locked(obj));

 	atomic_inc(&ht->nelems);
 	if (rht_grow_above_75(ht, tbl))
···
 {
 	struct bucket_table *new_tbl;
 	struct bucket_table *tbl;
+	struct rhash_lock_head __rcu **bkt;
 	unsigned int hash;
 	void *data;
···
 	do {
 		tbl = new_tbl;
 		hash = rht_head_hashfn(ht, tbl, obj, ht->p);
-		spin_lock_bh(rht_bucket_lock(tbl, hash));
-
-		data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
-		new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
-		if (PTR_ERR(new_tbl) != -EEXIST)
-			data = ERR_CAST(new_tbl);
-
-		spin_unlock_bh(rht_bucket_lock(tbl, hash));
+		if (rcu_access_pointer(tbl->future_tbl))
+			/* Failure is OK */
+			bkt = rht_bucket_var(tbl, hash);
+		else
+			bkt = rht_bucket_insert(ht, tbl, hash);
+		if (bkt == NULL) {
+			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+			data = ERR_PTR(-EAGAIN);
+		} else {
+			rht_lock(bkt);
+			data = rhashtable_lookup_one(ht, bkt, tbl,
+						     hash, key, obj);
+			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
+							hash, obj, data);
+			if (PTR_ERR(new_tbl) != -EEXIST)
+				data = ERR_CAST(new_tbl);
+
+			rht_unlock(bkt);
+		}
 	} while (!IS_ERR_OR_NULL(new_tbl));

 	if (PTR_ERR(data) == -EAGAIN)
···
 	size = rounded_hashtable_size(&ht->p);

-	if (params->locks_mul)
-		ht->p.locks_mul = roundup_pow_of_two(params->locks_mul);
-	else
-		ht->p.locks_mul = BUCKET_LOCKS_PER_CPU;
-
 	ht->key_len = ht->p.key_len;
 	if (!params->hashfn) {
 		ht->p.hashfn = jhash;
···
 		struct rhash_head *pos, *next;

 		cond_resched();
-		for (pos = rht_dereference(*rht_bucket(tbl, i), ht),
+		for (pos = rht_ptr(rht_dereference(*rht_bucket(tbl, i), ht)),
 		     next = !rht_is_a_nulls(pos) ?
 				rht_dereference(pos->next, ht) : NULL;
 		     !rht_is_a_nulls(pos);
···
 }
 EXPORT_SYMBOL_GPL(rhashtable_destroy);

-struct rhash_head __rcu **__rht_bucket_nested(const struct bucket_table *tbl,
-					      unsigned int hash)
+struct rhash_lock_head __rcu **__rht_bucket_nested(const struct bucket_table *tbl,
+						   unsigned int hash)
 {
 	const unsigned int shift = PAGE_SHIFT - ilog2(sizeof(void *));
 	unsigned int index = hash & ((1 << tbl->nest) - 1);
···
 }
 EXPORT_SYMBOL_GPL(__rht_bucket_nested);

-struct rhash_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
-					    unsigned int hash)
+struct rhash_lock_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
+						 unsigned int hash)
 {
-	static struct rhash_head __rcu *rhnull;
+	static struct rhash_lock_head __rcu *rhnull;

 	if (!rhnull)
 		INIT_RHT_NULLS_HEAD(rhnull);
···
 }
 EXPORT_SYMBOL_GPL(rht_bucket_nested);

-struct rhash_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
-						   struct bucket_table *tbl,
-						   unsigned int hash)
+struct rhash_lock_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
+							struct bucket_table *tbl,
+							unsigned int hash)
 {
 	const unsigned int shift = PAGE_SHIFT - ilog2(sizeof(void *));
 	unsigned int index = hash & ((1 << tbl->nest) - 1);
+1 -1
lib/test_rhashtable.c
···
 		struct rhash_head *pos, *next;
 		struct test_obj_rhl *p;

-		pos = rht_dereference(tbl->buckets[i], ht);
+		pos = rht_ptr(rht_dereference(tbl->buckets[i], ht));
 		next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : NULL;

 		if (!rht_is_a_nulls(pos)) {
-1
net/bridge/br_fdb.c
···
 	.key_offset = offsetof(struct net_bridge_fdb_entry, key),
 	.key_len = sizeof(struct net_bridge_fdb_key),
 	.automatic_shrinking = true,
-	.locks_mul = 1,
 };

 static struct kmem_cache *br_fdb_cache __read_mostly;
-1
net/bridge/br_multicast.c
···
 	.key_offset = offsetof(struct net_bridge_mdb_entry, addr),
 	.key_len = sizeof(struct br_ip),
 	.automatic_shrinking = true,
-	.locks_mul = 1,
 };

 static void br_multicast_start_querier(struct net_bridge *br,
-1
net/bridge/br_vlan.c
···
 	.key_offset = offsetof(struct net_bridge_vlan, vid),
 	.key_len = sizeof(u16),
 	.nelem_hint = 3,
-	.locks_mul = 1,
 	.max_size = VLAN_N_VID,
 	.obj_cmpfn = br_vlan_cmp,
 	.automatic_shrinking = true,
-1
net/bridge/br_vlan_tunnel.c
···
 	.key_offset = offsetof(struct net_bridge_vlan, tinfo.tunnel_id),
 	.key_len = sizeof(__be64),
 	.nelem_hint = 3,
-	.locks_mul = 1,
 	.obj_cmpfn = br_vlan_tunid_cmp,
 	.automatic_shrinking = true,
 };
-1
net/ipv4/ipmr.c
···
 	.key_offset = offsetof(struct mfc_cache, cmparg),
 	.key_len = sizeof(struct mfc_cache_cmp_arg),
 	.nelem_hint = 3,
-	.locks_mul = 1,
 	.obj_cmpfn = ipmr_hash_cmp,
 	.automatic_shrinking = true,
 };
-1
net/ipv6/ip6mr.c
···
 	.key_offset = offsetof(struct mfc6_cache, cmparg),
 	.key_len = sizeof(struct mfc6_cache_cmp_arg),
 	.nelem_hint = 3,
-	.locks_mul = 1,
 	.obj_cmpfn = ip6mr_hash_cmp,
 	.automatic_shrinking = true,
 };
-1
net/netfilter/nf_tables_api.c
···
 	.hashfn = nft_chain_hash,
 	.obj_hashfn = nft_chain_hash_obj,
 	.obj_cmpfn = nft_chain_hash_cmp,
-	.locks_mul = 1,
 	.automatic_shrinking = true,
 };