Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: fix restart in wait_requeue_pi
futex: fix restart for early wakeup in futex_wait_requeue_pi()
futex: cleanup error exit
futex: remove the wait queue
futex: add requeue-pi documentation
futex: remove FUTEX_REQUEUE_PI (non CMP)
futex: fix futex_wait_setup key handling
sparc64: extend TI_RESTART_BLOCK space by 8 bytes
futex: fixup unlocked requeue pi case
futex: add requeue_pi functionality
futex: split out futex value validation code
futex: distangle futex_requeue()
futex: add FUTEX_HAS_TIMEOUT flag to restart.futex.flags
rt_mutex: add proxy lock routines
futex: split out fixup owner logic from futex_lock_pi()
futex: split out atomic logic from futex_lock_pi()
futex: add helper to find the top prio waiter of a futex
futex: separate futex_wait_queue_me() logic from futex_wait()

+1254 -384
+131
Documentation/futex-requeue-pi.txt
··· 1 + Futex Requeue PI 2 + ---------------- 3 + 4 + Requeueing of tasks from a non-PI futex to a PI futex requires 5 + special handling in order to ensure the underlying rt_mutex is never 6 + left without an owner if it has waiters; doing so would break the PI 7 + boosting logic [see rt-mutex-design.txt]. For the purposes of 8 + brevity, this action will be referred to as "requeue_pi" throughout 9 + this document. Priority inheritance is abbreviated throughout as 10 + "PI". 11 + 12 + Motivation 13 + ---------- 14 + 15 + Without requeue_pi, the glibc implementation of 16 + pthread_cond_broadcast() must resort to waking all the tasks waiting 17 + on a pthread_condvar and letting them try to sort out which task 18 + gets to run first in classic thundering-herd formation. An ideal 19 + implementation would wake the highest-priority waiter, and leave the 20 + rest to the natural wakeup inherent in unlocking the mutex 21 + associated with the condvar. 22 + 23 + Consider the simplified glibc calls: 24 + 25 + /* caller must lock mutex */ 26 + pthread_cond_wait(cond, mutex) 27 + { 28 + lock(cond->__data.__lock); 29 + unlock(mutex); 30 + do { 31 + unlock(cond->__data.__lock); 32 + futex_wait(cond->__data.__futex); 33 + lock(cond->__data.__lock); 34 + } while(...) 35 + unlock(cond->__data.__lock); 36 + lock(mutex); 37 + } 38 + 39 + pthread_cond_broadcast(cond) 40 + { 41 + lock(cond->__data.__lock); 42 + unlock(cond->__data.__lock); 43 + futex_requeue(cond->__data.__futex, cond->mutex); 44 + } 45 + 46 + Once pthread_cond_broadcast() requeues the tasks, the cond->mutex 47 + has waiters. Note that pthread_cond_wait() attempts to lock the 48 + mutex only after it has returned to user space. This will leave the 49 + underlying rt_mutex with waiters, and no owner, breaking the 50 + previously mentioned PI-boosting algorithms. 51 + 52 + In order to support PI-aware pthread_condvar's, the kernel needs to 53 + be able to requeue tasks to PI futexes. 
This support implies that 54 + upon a successful futex_wait system call, the caller would return to 55 + user space already holding the PI futex. The glibc implementation 56 + would be modified as follows: 57 + 58 + 59 + /* caller must lock mutex */ 60 + pthread_cond_wait_pi(cond, mutex) 61 + { 62 + lock(cond->__data.__lock); 63 + unlock(mutex); 64 + do { 65 + unlock(cond->__data.__lock); 66 + futex_wait_requeue_pi(cond->__data.__futex); 67 + lock(cond->__data.__lock); 68 + } while(...) 69 + unlock(cond->__data.__lock); 70 + /* the kernel acquired the mutex for us */ 71 + } 72 + 73 + pthread_cond_broadcast_pi(cond) 74 + { 75 + lock(cond->__data.__lock); 76 + unlock(cond->__data.__lock); 77 + futex_requeue_pi(cond->__data.__futex, cond->mutex); 78 + } 79 + 80 + The actual glibc implementation will likely test for PI and make the 81 + necessary changes inside the existing calls rather than creating new 82 + calls for the PI cases. Similar changes are needed for 83 + pthread_cond_timedwait() and pthread_cond_signal(). 84 + 85 + Implementation 86 + -------------- 87 + 88 + In order to ensure the rt_mutex has an owner if it has waiters, it 89 + is necessary for both the requeue code, as well as the waiting code, 90 + to be able to acquire the rt_mutex before returning to user space. 91 + The requeue code cannot simply wake the waiter and leave it to 92 + acquire the rt_mutex as it would open a race window between the 93 + requeue call returning to user space and the waiter waking and 94 + starting to run. This is especially true in the uncontended case. 95 + 96 + The solution involves two new rt_mutex helper routines, 97 + rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which 98 + allow the requeue code to acquire an uncontended rt_mutex on behalf 99 + of the waiter and to enqueue the waiter on a contended rt_mutex. 100 + Two new system calls provide the kernel<->user interface to 101 + requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI. 
102 + 103 + FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait() 104 + and pthread_cond_timedwait()) to block on the initial futex and wait 105 + to be requeued to a PI-aware futex. The implementation is the 106 + result of a high-speed collision between futex_wait() and 107 + futex_lock_pi(), with some extra logic to check for the additional 108 + wake-up scenarios. 109 + 110 + FUTEX_CMP_REQUEUE_PI is called by the waker 111 + (pthread_cond_broadcast() and pthread_cond_signal()) to requeue and 112 + possibly wake the waiting tasks. Internally, this system call is 113 + still handled by futex_requeue (by passing requeue_pi=1). Before 114 + requeueing, futex_requeue() attempts to acquire the requeue target 115 + PI futex on behalf of the top waiter. If it can, this waiter is 116 + woken. futex_requeue() then proceeds to requeue the remaining 117 + nr_wake+nr_requeue tasks to the PI futex, calling 118 + rt_mutex_start_proxy_lock() prior to each requeue to prepare the 119 + task as a waiter on the underlying rt_mutex. It is possible that 120 + the lock can be acquired at this stage as well; if so, the next 121 + waiter is woken to finish the acquisition of the lock. 122 + 123 + FUTEX_CMP_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but 124 + their sum is all that really matters. futex_requeue() will wake or 125 + requeue up to nr_wake + nr_requeue tasks. It will wake only as many 126 + tasks as it can acquire the lock for, which in the majority of cases 127 + should be 0 as good programming practice dictates that the caller of 128 + either pthread_cond_broadcast() or pthread_cond_signal() acquire the 129 + mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that 130 + nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for 131 + signal.
+2 -2
arch/sparc/include/asm/thread_info_64.h
··· 102 102 #define TI_KERN_CNTD1 0x00000488 103 103 #define TI_PCR 0x00000490 104 104 #define TI_RESTART_BLOCK 0x00000498 105 - #define TI_KUNA_REGS 0x000004c0 106 - #define TI_KUNA_INSN 0x000004c8 105 + #define TI_KUNA_REGS 0x000004c8 106 + #define TI_KUNA_INSN 0x000004d0 107 107 #define TI_FPREGS 0x00000500 108 108 109 109 /* We embed this in the uppermost byte of thread_info->flags */
+6
include/linux/futex.h
··· 23 23 #define FUTEX_TRYLOCK_PI 8 24 24 #define FUTEX_WAIT_BITSET 9 25 25 #define FUTEX_WAKE_BITSET 10 26 + #define FUTEX_WAIT_REQUEUE_PI 11 27 + #define FUTEX_CMP_REQUEUE_PI 12 26 28 27 29 #define FUTEX_PRIVATE_FLAG 128 28 30 #define FUTEX_CLOCK_REALTIME 256 ··· 40 38 #define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG) 41 39 #define FUTEX_WAIT_BITSET_PRIVATE (FUTEX_WAIT_BITS | FUTEX_PRIVATE_FLAG) 42 40 #define FUTEX_WAKE_BITSET_PRIVATE (FUTEX_WAKE_BITS | FUTEX_PRIVATE_FLAG) 41 + #define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \ 42 + FUTEX_PRIVATE_FLAG) 43 + #define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \ 44 + FUTEX_PRIVATE_FLAG) 43 45 44 46 /* 45 47 * Support for robust futexes: the kernel cleans up held futexes at
+2 -1
include/linux/thread_info.h
··· 21 21 struct { 22 22 unsigned long arg0, arg1, arg2, arg3; 23 23 }; 24 - /* For futex_wait */ 24 + /* For futex_wait and futex_wait_requeue_pi */ 25 25 struct { 26 26 u32 *uaddr; 27 27 u32 val; 28 28 u32 flags; 29 29 u32 bitset; 30 30 u64 time; 31 + u32 *uaddr2; 31 32 } futex; 32 33 /* For nanosleep */ 33 34 struct {
+894 -304
kernel/futex.c
··· 19 19 * PRIVATE futexes by Eric Dumazet 20 20 * Copyright (C) 2007 Eric Dumazet <dada1@cosmosbay.com> 21 21 * 22 + * Requeue-PI support by Darren Hart <dvhltc@us.ibm.com> 23 + * Copyright (C) IBM Corporation, 2009 24 + * Thanks to Thomas Gleixner for conceptual design and careful reviews. 25 + * 22 26 * Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly 23 27 * enough at me, Linus for the original (flawed) idea, Matthew 24 28 * Kirkwood for proof-of-concept implementation. ··· 100 96 */ 101 97 struct futex_q { 102 98 struct plist_node list; 103 - /* There can only be a single waiter */ 104 - wait_queue_head_t waiter; 99 + /* Waiter reference */ 100 + struct task_struct *task; 105 101 106 102 /* Which hash list lock to use: */ 107 103 spinlock_t *lock_ptr; ··· 111 107 112 108 /* Optional priority inheritance state: */ 113 109 struct futex_pi_state *pi_state; 114 - struct task_struct *task; 110 + 111 + /* rt_waiter storage for requeue_pi: */ 112 + struct rt_mutex_waiter *rt_waiter; 115 113 116 114 /* Bitset for the optional bitmasked wakeup */ 117 115 u32 bitset; ··· 282 276 void put_futex_key(int fshared, union futex_key *key) 283 277 { 284 278 drop_futex_key_refs(key); 279 + } 280 + 281 + /** 282 + * futex_top_waiter() - Return the highest priority waiter on a futex 283 + * @hb: the hash bucket the futex_q's reside in 284 + * @key: the futex key (to distinguish it from other futex futex_q's) 285 + * 286 + * Must be called with the hb lock held. 
287 + */ 288 + static struct futex_q *futex_top_waiter(struct futex_hash_bucket *hb, 289 + union futex_key *key) 290 + { 291 + struct futex_q *this; 292 + 293 + plist_for_each_entry(this, &hb->chain, list) { 294 + if (match_futex(&this->key, key)) 295 + return this; 296 + } 297 + return NULL; 285 298 } 286 299 287 300 static u32 cmpxchg_futex_value_locked(u32 __user *uaddr, u32 uval, u32 newval) ··· 564 539 return 0; 565 540 } 566 541 542 + /** 543 + * futex_lock_pi_atomic() - atomic work required to acquire a pi aware futex 544 + * @uaddr: the pi futex user address 545 + * @hb: the pi futex hash bucket 546 + * @key: the futex key associated with uaddr and hb 547 + * @ps: the pi_state pointer where we store the result of the 548 + * lookup 549 + * @task: the task to perform the atomic lock work for. This will 550 + * be "current" except in the case of requeue pi. 551 + * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0) 552 + * 553 + * Returns: 554 + * 0 - ready to wait 555 + * 1 - acquired the lock 556 + * <0 - error 557 + * 558 + * The hb->lock and futex_key refs shall be held by the caller. 559 + */ 560 + static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb, 561 + union futex_key *key, 562 + struct futex_pi_state **ps, 563 + struct task_struct *task, int set_waiters) 564 + { 565 + int lock_taken, ret, ownerdied = 0; 566 + u32 uval, newval, curval; 567 + 568 + retry: 569 + ret = lock_taken = 0; 570 + 571 + /* 572 + * To avoid races, we attempt to take the lock here again 573 + * (by doing a 0 -> TID atomic cmpxchg), while holding all 574 + * the locks. It will most likely not succeed. 575 + */ 576 + newval = task_pid_vnr(task); 577 + if (set_waiters) 578 + newval |= FUTEX_WAITERS; 579 + 580 + curval = cmpxchg_futex_value_locked(uaddr, 0, newval); 581 + 582 + if (unlikely(curval == -EFAULT)) 583 + return -EFAULT; 584 + 585 + /* 586 + * Detect deadlocks. 
587 + */ 588 + if ((unlikely((curval & FUTEX_TID_MASK) == task_pid_vnr(task)))) 589 + return -EDEADLK; 590 + 591 + /* 592 + * Surprise - we got the lock. Just return to userspace: 593 + */ 594 + if (unlikely(!curval)) 595 + return 1; 596 + 597 + uval = curval; 598 + 599 + /* 600 + * Set the FUTEX_WAITERS flag, so the owner will know it has someone 601 + * to wake at the next unlock. 602 + */ 603 + newval = curval | FUTEX_WAITERS; 604 + 605 + /* 606 + * There are two cases, where a futex might have no owner (the 607 + * owner TID is 0): OWNER_DIED. We take over the futex in this 608 + * case. We also do an unconditional take over, when the owner 609 + * of the futex died. 610 + * 611 + * This is safe as we are protected by the hash bucket lock ! 612 + */ 613 + if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { 614 + /* Keep the OWNER_DIED bit */ 615 + newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task); 616 + ownerdied = 0; 617 + lock_taken = 1; 618 + } 619 + 620 + curval = cmpxchg_futex_value_locked(uaddr, uval, newval); 621 + 622 + if (unlikely(curval == -EFAULT)) 623 + return -EFAULT; 624 + if (unlikely(curval != uval)) 625 + goto retry; 626 + 627 + /* 628 + * We took the lock due to owner died take over. 629 + */ 630 + if (unlikely(lock_taken)) 631 + return 1; 632 + 633 + /* 634 + * We don't have the lock. Look up the PI state (or create it if 635 + * we are the first waiter): 636 + */ 637 + ret = lookup_pi_state(uval, hb, key, ps); 638 + 639 + if (unlikely(ret)) { 640 + switch (ret) { 641 + case -ESRCH: 642 + /* 643 + * No owner found for this futex. Check if the 644 + * OWNER_DIED bit is set to figure out whether 645 + * this is a robust futex or not. 646 + */ 647 + if (get_futex_value_locked(&curval, uaddr)) 648 + return -EFAULT; 649 + 650 + /* 651 + * We simply start over in case of a robust 652 + * futex. The code above will take the futex 653 + * and return happy. 
654 + */ 655 + if (curval & FUTEX_OWNER_DIED) { 656 + ownerdied = 1; 657 + goto retry; 658 + } 659 + default: 660 + break; 661 + } 662 + } 663 + 664 + return ret; 665 + } 666 + 567 667 /* 568 668 * The hash bucket lock must be held when this is called. 569 669 * Afterwards, the futex_q must not be accessed. 570 670 */ 571 671 static void wake_futex(struct futex_q *q) 572 672 { 673 + struct task_struct *p = q->task; 674 + 675 + /* 676 + * We set q->lock_ptr = NULL _before_ we wake up the task. If 677 + * a non futex wake up happens on another CPU then the task 678 + * might exit and p would dereference a non existing task 679 + * struct. Prevent this by holding a reference on p across the 680 + * wake up. 681 + */ 682 + get_task_struct(p); 683 + 573 684 plist_del(&q->list, &q->list.plist); 574 685 /* 575 - * The lock in wake_up_all() is a crucial memory barrier after the 576 - * plist_del() and also before assigning to q->lock_ptr. 577 - */ 578 - wake_up(&q->waiter); 579 - /* 580 - * The waiting task can free the futex_q as soon as this is written, 581 - * without taking any locks. This must come last. 582 - * 583 - * A memory barrier is required here to prevent the following store to 584 - * lock_ptr from getting ahead of the wakeup. Clearing the lock at the 585 - * end of wake_up() does not prevent this store from moving. 686 + * The waiting task can free the futex_q as soon as 687 + * q->lock_ptr = NULL is written, without taking any locks. A 688 + * memory barrier is required here to prevent the following 689 + * store to lock_ptr from getting ahead of the plist_del. 
586 690 */ 587 691 smp_wmb(); 588 692 q->lock_ptr = NULL; 693 + 694 + wake_up_state(p, TASK_NORMAL); 695 + put_task_struct(p); 589 696 } 590 697 591 698 static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) ··· 846 689 847 690 plist_for_each_entry_safe(this, next, head, list) { 848 691 if (match_futex (&this->key, &key)) { 849 - if (this->pi_state) { 692 + if (this->pi_state || this->rt_waiter) { 850 693 ret = -EINVAL; 851 694 break; 852 695 } ··· 959 802 return ret; 960 803 } 961 804 962 - /* 963 - * Requeue all waiters hashed on one physical page to another 964 - * physical page. 805 + /** 806 + * requeue_futex() - Requeue a futex_q from one hb to another 807 + * @q: the futex_q to requeue 808 + * @hb1: the source hash_bucket 809 + * @hb2: the target hash_bucket 810 + * @key2: the new key for the requeued futex_q 811 + */ 812 + static inline 813 + void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1, 814 + struct futex_hash_bucket *hb2, union futex_key *key2) 815 + { 816 + 817 + /* 818 + * If key1 and key2 hash to the same bucket, no need to 819 + * requeue. 820 + */ 821 + if (likely(&hb1->chain != &hb2->chain)) { 822 + plist_del(&q->list, &hb1->chain); 823 + plist_add(&q->list, &hb2->chain); 824 + q->lock_ptr = &hb2->lock; 825 + #ifdef CONFIG_DEBUG_PI_LIST 826 + q->list.plist.lock = &hb2->lock; 827 + #endif 828 + } 829 + get_futex_key_refs(key2); 830 + q->key = *key2; 831 + } 832 + 833 + /** 834 + * requeue_pi_wake_futex() - Wake a task that acquired the lock during requeue 835 + * q: the futex_q 836 + * key: the key of the requeue target futex 837 + * 838 + * During futex_requeue, with requeue_pi=1, it is possible to acquire the 839 + * target futex if it is uncontended or via a lock steal. Set the futex_q key 840 + * to the requeue target futex so the waiter can detect the wakeup on the right 841 + * futex, but remove it from the hb and NULL the rt_waiter so it can detect 842 + * atomic lock acquisition. 
Must be called with the q->lock_ptr held. 843 + */ 844 + static inline 845 + void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key) 846 + { 847 + drop_futex_key_refs(&q->key); 848 + get_futex_key_refs(key); 849 + q->key = *key; 850 + 851 + WARN_ON(plist_node_empty(&q->list)); 852 + plist_del(&q->list, &q->list.plist); 853 + 854 + WARN_ON(!q->rt_waiter); 855 + q->rt_waiter = NULL; 856 + 857 + wake_up_state(q->task, TASK_NORMAL); 858 + } 859 + 860 + /** 861 + * futex_proxy_trylock_atomic() - Attempt an atomic lock for the top waiter 862 + * @pifutex: the user address of the to futex 863 + * @hb1: the from futex hash bucket, must be locked by the caller 864 + * @hb2: the to futex hash bucket, must be locked by the caller 865 + * @key1: the from futex key 866 + * @key2: the to futex key 867 + * @ps: address to store the pi_state pointer 868 + * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0) 869 + * 870 + * Try and get the lock on behalf of the top waiter if we can do it atomically. 871 + * Wake the top waiter if we succeed. If the caller specified set_waiters, 872 + * then direct futex_lock_pi_atomic() to force setting the FUTEX_WAITERS bit. 873 + * hb1 and hb2 must be held by the caller. 874 + * 875 + * Returns: 876 + * 0 - failed to acquire the lock atomically 877 + * 1 - acquired the lock 878 + * <0 - error 879 + */ 880 + static int futex_proxy_trylock_atomic(u32 __user *pifutex, 881 + struct futex_hash_bucket *hb1, 882 + struct futex_hash_bucket *hb2, 883 + union futex_key *key1, union futex_key *key2, 884 + struct futex_pi_state **ps, int set_waiters) 885 + { 886 + struct futex_q *top_waiter = NULL; 887 + u32 curval; 888 + int ret; 889 + 890 + if (get_futex_value_locked(&curval, pifutex)) 891 + return -EFAULT; 892 + 893 + /* 894 + * Find the top_waiter and determine if there are additional waiters. 
895 + * If the caller intends to requeue more than 1 waiter to pifutex, 896 + * force futex_lock_pi_atomic() to set the FUTEX_WAITERS bit now, 897 + * as we have means to handle the possible fault. If not, don't set 898 + * the bit unnecessarily as it will force the subsequent unlock to enter 899 + * the kernel. 900 + */ 901 + top_waiter = futex_top_waiter(hb1, key1); 902 + 903 + /* There are no waiters, nothing for us to do. */ 904 + if (!top_waiter) 905 + return 0; 906 + 907 + /* 908 + * Try to take the lock for top_waiter. Set the FUTEX_WAITERS bit in 909 + * the contended case or if set_waiters is 1. The pi_state is returned 910 + * in ps in contended cases. 911 + */ 912 + ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task, 913 + set_waiters); 914 + if (ret == 1) 915 + requeue_pi_wake_futex(top_waiter, key2); 916 + 917 + return ret; 918 + } 919 + 920 + /** 921 + * futex_requeue() - Requeue waiters from uaddr1 to uaddr2 922 + * @uaddr1: source futex user address 923 + * @uaddr2: target futex user address 924 + * @nr_wake: number of waiters to wake (must be 1 for requeue_pi) 925 + * @nr_requeue: number of waiters to requeue (0-INT_MAX) 926 + * @requeue_pi: if we are attempting to requeue from a non-pi futex to a 927 + * pi futex (pi to pi requeue is not supported) 928 + * 929 + * Requeue waiters on uaddr1 to uaddr2. In the requeue_pi case, try to acquire 930 + * uaddr2 atomically on behalf of the top waiter. 
931 + * 932 + * Returns: 933 + * >=0 - on success, the number of tasks requeued or woken 934 + * <0 - on error 965 935 */ 966 936 static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2, 967 - int nr_wake, int nr_requeue, u32 *cmpval) 937 + int nr_wake, int nr_requeue, u32 *cmpval, 938 + int requeue_pi) 968 939 { 969 940 union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT; 941 + int drop_count = 0, task_count = 0, ret; 942 + struct futex_pi_state *pi_state = NULL; 970 943 struct futex_hash_bucket *hb1, *hb2; 971 944 struct plist_head *head1; 972 945 struct futex_q *this, *next; 973 - int ret, drop_count = 0; 946 + u32 curval2; 947 + 948 + if (requeue_pi) { 949 + /* 950 + * requeue_pi requires a pi_state, try to allocate it now 951 + * without any locks in case it fails. 952 + */ 953 + if (refill_pi_state_cache()) 954 + return -ENOMEM; 955 + /* 956 + * requeue_pi must wake as many tasks as it can, up to nr_wake 957 + * + nr_requeue, since it acquires the rt_mutex prior to 958 + * returning to userspace, so as to not leave the rt_mutex with 959 + * waiters and no owner. However, second and third wake-ups 960 + * cannot be predicted as they involve race conditions with the 961 + * first wake and a fault while looking up the pi_state. Both 962 + * pthread_cond_signal() and pthread_cond_broadcast() should 963 + * use nr_wake=1. 964 + */ 965 + if (nr_wake != 1) 966 + return -EINVAL; 967 + } 974 968 975 969 retry: 970 + if (pi_state != NULL) { 971 + /* 972 + * We will have to lookup the pi_state again, so free this one 973 + * to keep the accounting correct. 974 + */ 975 + free_pi_state(pi_state); 976 + pi_state = NULL; 977 + } 978 + 976 979 ret = get_futex_key(uaddr1, fshared, &key1, VERIFY_READ); 977 980 if (unlikely(ret != 0)) 978 981 goto out; 979 - ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_READ); 982 + ret = get_futex_key(uaddr2, fshared, &key2, 983 + requeue_pi ? 
VERIFY_WRITE : VERIFY_READ); 980 984 if (unlikely(ret != 0)) 981 985 goto out_put_key1; 982 986 ··· 1172 854 } 1173 855 } 1174 856 857 + if (requeue_pi && (task_count - nr_wake < nr_requeue)) { 858 + /* 859 + * Attempt to acquire uaddr2 and wake the top waiter. If we 860 + * intend to requeue waiters, force setting the FUTEX_WAITERS 861 + * bit. We force this here where we are able to easily handle 862 + * faults rather than in the requeue loop below. 863 + */ 864 + ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1, 865 + &key2, &pi_state, nr_requeue); 866 + 867 + /* 868 + * At this point the top_waiter has either taken uaddr2 or is 869 + * waiting on it. If the former, then the pi_state will not 870 + * exist yet; look it up one more time to ensure we have a 871 + * reference to it. 872 + */ 873 + if (ret == 1) { 874 + WARN_ON(pi_state); 875 + task_count++; 876 + ret = get_futex_value_locked(&curval2, uaddr2); 877 + if (!ret) 878 + ret = lookup_pi_state(curval2, hb2, &key2, 879 + &pi_state); 880 + } 881 + 882 + switch (ret) { 883 + case 0: 884 + break; 885 + case -EFAULT: 886 + double_unlock_hb(hb1, hb2); 887 + put_futex_key(fshared, &key2); 888 + put_futex_key(fshared, &key1); 889 + ret = get_user(curval2, uaddr2); 890 + if (!ret) 891 + goto retry; 892 + goto out; 893 + case -EAGAIN: 894 + /* The owner was exiting, try again. */ 895 + double_unlock_hb(hb1, hb2); 896 + put_futex_key(fshared, &key2); 897 + put_futex_key(fshared, &key1); 898 + cond_resched(); 899 + goto retry; 900 + default: 901 + goto out_unlock; 902 + } 903 + } 904 + 1175 905 head1 = &hb1->chain; 1176 906 plist_for_each_entry_safe(this, next, head1, list) { 1177 - if (!match_futex (&this->key, &key1)) 1178 - continue; 1179 - if (++ret <= nr_wake) { 1180 - wake_futex(this); 1181 - } else { 1182 - /* 1183 - * If key1 and key2 hash to the same bucket, no need to
1185 - */ 1186 - if (likely(head1 != &hb2->chain)) { 1187 - plist_del(&this->list, &hb1->chain); 1188 - plist_add(&this->list, &hb2->chain); 1189 - this->lock_ptr = &hb2->lock; 1190 - #ifdef CONFIG_DEBUG_PI_LIST 1191 - this->list.plist.lock = &hb2->lock; 1192 - #endif 1193 - } 1194 - this->key = key2; 1195 - get_futex_key_refs(&key2); 1196 - drop_count++; 907 + if (task_count - nr_wake >= nr_requeue) 908 + break; 1197 909 1198 - if (ret - nr_wake >= nr_requeue) 1199 - break; 910 + if (!match_futex(&this->key, &key1)) 911 + continue; 912 + 913 + WARN_ON(!requeue_pi && this->rt_waiter); 914 + WARN_ON(requeue_pi && !this->rt_waiter); 915 + 916 + /* 917 + * Wake nr_wake waiters. For requeue_pi, if we acquired the 918 + * lock, we already woke the top_waiter. If not, it will be 919 + * woken by futex_unlock_pi(). 920 + */ 921 + if (++task_count <= nr_wake && !requeue_pi) { 922 + wake_futex(this); 923 + continue; 1200 924 } 925 + 926 + /* 927 + * Requeue nr_requeue waiters and possibly one more in the case 928 + * of requeue_pi if we couldn't acquire the lock atomically. 929 + */ 930 + if (requeue_pi) { 931 + /* Prepare the waiter to take the rt_mutex. */ 932 + atomic_inc(&pi_state->refcount); 933 + this->pi_state = pi_state; 934 + ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex, 935 + this->rt_waiter, 936 + this->task, 1); 937 + if (ret == 1) { 938 + /* We got the lock. */ 939 + requeue_pi_wake_futex(this, &key2); 940 + continue; 941 + } else if (ret) { 942 + /* -EDEADLK */ 943 + this->pi_state = NULL; 944 + free_pi_state(pi_state); 945 + goto out_unlock; 946 + } 947 + } 948 + requeue_futex(this, hb1, hb2, &key2); 949 + drop_count++; 1201 950 } 1202 951 1203 952 out_unlock: ··· 1284 899 out_put_key1: 1285 900 put_futex_key(fshared, &key1); 1286 901 out: 1287 - return ret; 902 + if (pi_state != NULL) 903 + free_pi_state(pi_state); 904 + return ret ? ret : task_count; 1288 905 } 1289 906 1290 907 /* The key must be already stored in q->key. 
*/ 1291 908 static inline struct futex_hash_bucket *queue_lock(struct futex_q *q) 1292 909 { 1293 910 struct futex_hash_bucket *hb; 1294 - 1295 - init_waitqueue_head(&q->waiter); 1296 911 1297 912 get_futex_key_refs(&q->key); 1298 913 hb = hash_futex(&q->key); ··· 1504 1119 */ 1505 1120 #define FLAGS_SHARED 0x01 1506 1121 #define FLAGS_CLOCKRT 0x02 1122 + #define FLAGS_HAS_TIMEOUT 0x04 1507 1123 1508 1124 static long futex_wait_restart(struct restart_block *restart); 1509 1125 1510 - static int futex_wait(u32 __user *uaddr, int fshared, 1511 - u32 val, ktime_t *abs_time, u32 bitset, int clockrt) 1126 + /** 1127 + * fixup_owner() - Post lock pi_state and corner case management 1128 + * @uaddr: user address of the futex 1129 + * @fshared: whether the futex is shared (1) or not (0) 1130 + * @q: futex_q (contains pi_state and access to the rt_mutex) 1131 + * @locked: if the attempt to take the rt_mutex succeeded (1) or not (0) 1132 + * 1133 + * After attempting to lock an rt_mutex, this function is called to cleanup 1134 + * the pi_state owner as well as handle race conditions that may allow us to 1135 + * acquire the lock. Must be called with the hb lock held. 1136 + * 1137 + * Returns: 1138 + * 1 - success, lock taken 1139 + * 0 - success, lock not taken 1140 + * <0 - on error (-EFAULT) 1141 + */ 1142 + static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q, 1143 + int locked) 1512 1144 { 1513 - struct task_struct *curr = current; 1514 - struct restart_block *restart; 1515 - DECLARE_WAITQUEUE(wait, curr); 1516 - struct futex_hash_bucket *hb; 1517 - struct futex_q q; 1145 + struct task_struct *owner; 1146 + int ret = 0; 1147 + 1148 + if (locked) { 1149 + /* 1150 + * Got the lock. 
We might not be the anticipated owner if we 1151 + * did a lock-steal - fix up the PI-state in that case: 1152 + */ 1153 + if (q->pi_state->owner != current) 1154 + ret = fixup_pi_state_owner(uaddr, q, current, fshared); 1155 + goto out; 1156 + } 1157 + 1158 + /* 1159 + * Catch the rare case, where the lock was released when we were on the 1160 + * way back before we locked the hash bucket. 1161 + */ 1162 + if (q->pi_state->owner == current) { 1163 + /* 1164 + * Try to get the rt_mutex now. This might fail as some other 1165 + * task acquired the rt_mutex after we removed ourself from the 1166 + * rt_mutex waiters list. 1167 + */ 1168 + if (rt_mutex_trylock(&q->pi_state->pi_mutex)) { 1169 + locked = 1; 1170 + goto out; 1171 + } 1172 + 1173 + /* 1174 + * pi_state is incorrect, some other task did a lock steal and 1175 + * we returned due to timeout or signal without taking the 1176 + * rt_mutex. Too late. We can access the rt_mutex_owner without 1177 + * locking, as the other task is now blocked on the hash bucket 1178 + * lock. Fix the state up. 1179 + */ 1180 + owner = rt_mutex_owner(&q->pi_state->pi_mutex); 1181 + ret = fixup_pi_state_owner(uaddr, q, owner, fshared); 1182 + goto out; 1183 + } 1184 + 1185 + /* 1186 + * Paranoia check. If we did not take the lock, then we should not be 1187 + * the owner, nor the pending owner, of the rt_mutex. 1188 + */ 1189 + if (rt_mutex_owner(&q->pi_state->pi_mutex) == current) 1190 + printk(KERN_ERR "fixup_owner: ret = %d pi-mutex: %p " 1191 + "pi-state %p\n", ret, 1192 + q->pi_state->pi_mutex.owner, 1193 + q->pi_state->owner); 1194 + 1195 + out: 1196 + return ret ? 
ret : locked; 1197 + } 1198 + 1199 + /** 1200 + * futex_wait_queue_me() - queue_me() and wait for wakeup, timeout, or signal 1201 + * @hb: the futex hash bucket, must be locked by the caller 1202 + * @q: the futex_q to queue up on 1203 + * @timeout: the prepared hrtimer_sleeper, or null for no timeout 1204 + */ 1205 + static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q, 1206 + struct hrtimer_sleeper *timeout) 1207 + { 1208 + queue_me(q, hb); 1209 + 1210 + /* 1211 + * There might have been scheduling since the queue_me(), as we 1212 + * cannot hold a spinlock across the get_user() in case it 1213 + * faults, and we cannot just set TASK_INTERRUPTIBLE state when 1214 + * queueing ourselves into the futex hash. This code thus has to 1215 + * rely on the futex_wake() code removing us from hash when it 1216 + * wakes us up. 1217 + */ 1218 + set_current_state(TASK_INTERRUPTIBLE); 1219 + 1220 + /* Arm the timer */ 1221 + if (timeout) { 1222 + hrtimer_start_expires(&timeout->timer, HRTIMER_MODE_ABS); 1223 + if (!hrtimer_active(&timeout->timer)) 1224 + timeout->task = NULL; 1225 + } 1226 + 1227 + /* 1228 + * !plist_node_empty() is safe here without any lock. 1229 + * q.lock_ptr != 0 is not safe, because of ordering against wakeup. 1230 + */ 1231 + if (likely(!plist_node_empty(&q->list))) { 1232 + /* 1233 + * If the timer has already expired, current will already be 1234 + * flagged for rescheduling. Only call schedule if there 1235 + * is no timeout, or if it has yet to expire. 
1236 + */ 1237 + if (!timeout || timeout->task) 1238 + schedule(); 1239 + } 1240 + __set_current_state(TASK_RUNNING); 1241 + } 1242 + 1243 + /** 1244 + * futex_wait_setup() - Prepare to wait on a futex 1245 + * @uaddr: the futex userspace address 1246 + * @val: the expected value 1247 + * @fshared: whether the futex is shared (1) or not (0) 1248 + * @q: the associated futex_q 1249 + * @hb: storage for hash_bucket pointer to be returned to caller 1250 + * 1251 + * Setup the futex_q and locate the hash_bucket. Get the futex value and 1252 + * compare it with the expected value. Handle atomic faults internally. 1253 + * Return with the hb lock held and a q.key reference on success, and unlocked 1254 + * with no q.key reference on failure. 1255 + * 1256 + * Returns: 1257 + * 0 - uaddr contains val and hb has been locked 1258 + * <0 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked 1259 + */ 1260 + static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared, 1261 + struct futex_q *q, struct futex_hash_bucket **hb) 1262 + { 1518 1263 u32 uval; 1519 1264 int ret; 1520 - struct hrtimer_sleeper t; 1521 - int rem = 0; 1522 - 1523 - if (!bitset) 1524 - return -EINVAL; 1525 - 1526 - q.pi_state = NULL; 1527 - q.bitset = bitset; 1528 - retry: 1529 - q.key = FUTEX_KEY_INIT; 1530 - ret = get_futex_key(uaddr, fshared, &q.key, VERIFY_READ); 1531 - if (unlikely(ret != 0)) 1532 - goto out; 1533 - 1534 - retry_private: 1535 - hb = queue_lock(&q); 1536 1265 1537 1266 /* 1538 1267 * Access the page AFTER the hash-bucket is locked. ··· 1664 1165 * A consequence is that futex_wait() can return zero and absorb 1665 1166 * a wakeup when *uaddr != val on entry to the syscall. This is 1666 1167 * rare, but normal. 1667 - * 1668 - * For shared futexes, we hold the mmap semaphore, so the mapping 1669 - * cannot have changed since we looked it up in get_futex_key. 
1670 1168 */ 1169 + retry: 1170 + q->key = FUTEX_KEY_INIT; 1171 + ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ); 1172 + if (unlikely(ret != 0)) 1173 + return ret; 1174 + 1175 + retry_private: 1176 + *hb = queue_lock(q); 1177 + 1671 1178 ret = get_futex_value_locked(&uval, uaddr); 1672 1179 1673 - if (unlikely(ret)) { 1674 - queue_unlock(&q, hb); 1180 + if (ret) { 1181 + queue_unlock(q, *hb); 1675 1182 1676 1183 ret = get_user(uval, uaddr); 1677 1184 if (ret) 1678 - goto out_put_key; 1185 + goto out; 1679 1186 1680 1187 if (!fshared) 1681 1188 goto retry_private; 1682 1189 1683 - put_futex_key(fshared, &q.key); 1190 + put_futex_key(fshared, &q->key); 1684 1191 goto retry; 1685 1192 } 1686 - ret = -EWOULDBLOCK; 1687 - if (unlikely(uval != val)) { 1688 - queue_unlock(&q, hb); 1689 - goto out_put_key; 1193 + 1194 + if (uval != val) { 1195 + queue_unlock(q, *hb); 1196 + ret = -EWOULDBLOCK; 1690 1197 } 1691 1198 1692 - /* Only actually queue if *uaddr contained val. */ 1693 - queue_me(&q, hb); 1199 + out: 1200 + if (ret) 1201 + put_futex_key(fshared, &q->key); 1202 + return ret; 1203 + } 1694 1204 1695 - /* 1696 - * There might have been scheduling since the queue_me(), as we 1697 - * cannot hold a spinlock across the get_user() in case it 1698 - * faults, and we cannot just set TASK_INTERRUPTIBLE state when 1699 - * queueing ourselves into the futex hash. This code thus has to 1700 - * rely on the futex_wake() code removing us from hash when it 1701 - * wakes us up. 1702 - */ 1205 + static int futex_wait(u32 __user *uaddr, int fshared, 1206 + u32 val, ktime_t *abs_time, u32 bitset, int clockrt) 1207 + { 1208 + struct hrtimer_sleeper timeout, *to = NULL; 1209 + struct restart_block *restart; 1210 + struct futex_hash_bucket *hb; 1211 + struct futex_q q; 1212 + int ret; 1703 1213 1704 - /* add_wait_queue is the barrier after __set_current_state. 
*/ 1705 - __set_current_state(TASK_INTERRUPTIBLE); 1706 - add_wait_queue(&q.waiter, &wait); 1707 - /* 1708 - * !plist_node_empty() is safe here without any lock. 1709 - * q.lock_ptr != 0 is not safe, because of ordering against wakeup. 1710 - */ 1711 - if (likely(!plist_node_empty(&q.list))) { 1712 - if (!abs_time) 1713 - schedule(); 1714 - else { 1715 - hrtimer_init_on_stack(&t.timer, 1716 - clockrt ? CLOCK_REALTIME : 1717 - CLOCK_MONOTONIC, 1718 - HRTIMER_MODE_ABS); 1719 - hrtimer_init_sleeper(&t, current); 1720 - hrtimer_set_expires_range_ns(&t.timer, *abs_time, 1721 - current->timer_slack_ns); 1214 + if (!bitset) 1215 + return -EINVAL; 1722 1216 1723 - hrtimer_start_expires(&t.timer, HRTIMER_MODE_ABS); 1724 - if (!hrtimer_active(&t.timer)) 1725 - t.task = NULL; 1217 + q.pi_state = NULL; 1218 + q.bitset = bitset; 1219 + q.rt_waiter = NULL; 1726 1220 1727 - /* 1728 - * the timer could have already expired, in which 1729 - * case current would be flagged for rescheduling. 1730 - * Don't bother calling schedule. 1731 - */ 1732 - if (likely(t.task)) 1733 - schedule(); 1221 + if (abs_time) { 1222 + to = &timeout; 1734 1223 1735 - hrtimer_cancel(&t.timer); 1736 - 1737 - /* Flag if a timeout occured */ 1738 - rem = (t.task == NULL); 1739 - 1740 - destroy_hrtimer_on_stack(&t.timer); 1741 - } 1224 + hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME : 1225 + CLOCK_MONOTONIC, HRTIMER_MODE_ABS); 1226 + hrtimer_init_sleeper(to, current); 1227 + hrtimer_set_expires_range_ns(&to->timer, *abs_time, 1228 + current->timer_slack_ns); 1742 1229 } 1743 - __set_current_state(TASK_RUNNING); 1744 1230 1745 - /* 1746 - * NOTE: we don't remove ourselves from the waitqueue because 1747 - * we are the only user of it. 1748 - */ 1231 + /* Prepare to wait on uaddr. */ 1232 + ret = futex_wait_setup(uaddr, val, fshared, &q, &hb); 1233 + if (ret) 1234 + goto out; 1235 + 1236 + /* queue_me and wait for wakeup, timeout, or a signal. 
*/ 1237 + futex_wait_queue_me(hb, &q, to); 1749 1238 1750 1239 /* If we were woken (and unqueued), we succeeded, whatever. */ 1751 1240 ret = 0; 1752 1241 if (!unqueue_me(&q)) 1753 1242 goto out_put_key; 1754 1243 ret = -ETIMEDOUT; 1755 - if (rem) 1244 + if (to && !to->task) 1756 1245 goto out_put_key; 1757 1246 1758 1247 /* ··· 1757 1270 restart->futex.val = val; 1758 1271 restart->futex.time = abs_time->tv64; 1759 1272 restart->futex.bitset = bitset; 1760 - restart->futex.flags = 0; 1273 + restart->futex.flags = FLAGS_HAS_TIMEOUT; 1761 1274 1762 1275 if (fshared) 1763 1276 restart->futex.flags |= FLAGS_SHARED; ··· 1769 1282 out_put_key: 1770 1283 put_futex_key(fshared, &q.key); 1771 1284 out: 1285 + if (to) { 1286 + hrtimer_cancel(&to->timer); 1287 + destroy_hrtimer_on_stack(&to->timer); 1288 + } 1772 1289 return ret; 1773 1290 } 1774 1291 ··· 1781 1290 { 1782 1291 u32 __user *uaddr = (u32 __user *)restart->futex.uaddr; 1783 1292 int fshared = 0; 1784 - ktime_t t; 1293 + ktime_t t, *tp = NULL; 1785 1294 1786 - t.tv64 = restart->futex.time; 1295 + if (restart->futex.flags & FLAGS_HAS_TIMEOUT) { 1296 + t.tv64 = restart->futex.time; 1297 + tp = &t; 1298 + } 1787 1299 restart->fn = do_no_restart_syscall; 1788 1300 if (restart->futex.flags & FLAGS_SHARED) 1789 1301 fshared = 1; 1790 - return (long)futex_wait(uaddr, fshared, restart->futex.val, &t, 1302 + return (long)futex_wait(uaddr, fshared, restart->futex.val, tp, 1791 1303 restart->futex.bitset, 1792 1304 restart->futex.flags & FLAGS_CLOCKRT); 1793 1305 } ··· 1806 1312 int detect, ktime_t *time, int trylock) 1807 1313 { 1808 1314 struct hrtimer_sleeper timeout, *to = NULL; 1809 - struct task_struct *curr = current; 1810 1315 struct futex_hash_bucket *hb; 1811 - u32 uval, newval, curval; 1316 + u32 uval; 1812 1317 struct futex_q q; 1813 - int ret, lock_taken, ownerdied = 0; 1318 + int res, ret; 1814 1319 1815 1320 if (refill_pi_state_cache()) 1816 1321 return -ENOMEM; ··· 1823 1330 } 1824 1331 1825 1332 q.pi_state 
= NULL; 1333 + q.rt_waiter = NULL; 1826 1334 retry: 1827 1335 q.key = FUTEX_KEY_INIT; 1828 1336 ret = get_futex_key(uaddr, fshared, &q.key, VERIFY_WRITE); ··· 1833 1339 retry_private: 1834 1340 hb = queue_lock(&q); 1835 1341 1836 - retry_locked: 1837 - ret = lock_taken = 0; 1838 - 1839 - /* 1840 - * To avoid races, we attempt to take the lock here again 1841 - * (by doing a 0 -> TID atomic cmpxchg), while holding all 1842 - * the locks. It will most likely not succeed. 1843 - */ 1844 - newval = task_pid_vnr(current); 1845 - 1846 - curval = cmpxchg_futex_value_locked(uaddr, 0, newval); 1847 - 1848 - if (unlikely(curval == -EFAULT)) 1849 - goto uaddr_faulted; 1850 - 1851 - /* 1852 - * Detect deadlocks. In case of REQUEUE_PI this is a valid 1853 - * situation and we return success to user space. 1854 - */ 1855 - if (unlikely((curval & FUTEX_TID_MASK) == task_pid_vnr(current))) { 1856 - ret = -EDEADLK; 1857 - goto out_unlock_put_key; 1858 - } 1859 - 1860 - /* 1861 - * Surprise - we got the lock. Just return to userspace: 1862 - */ 1863 - if (unlikely(!curval)) 1864 - goto out_unlock_put_key; 1865 - 1866 - uval = curval; 1867 - 1868 - /* 1869 - * Set the WAITERS flag, so the owner will know it has someone 1870 - * to wake at next unlock 1871 - */ 1872 - newval = curval | FUTEX_WAITERS; 1873 - 1874 - /* 1875 - * There are two cases, where a futex might have no owner (the 1876 - * owner TID is 0): OWNER_DIED. We take over the futex in this 1877 - * case. We also do an unconditional take over, when the owner 1878 - * of the futex died. 1879 - * 1880 - * This is safe as we are protected by the hash bucket lock ! 
1881 - */ 1882 - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { 1883 - /* Keep the OWNER_DIED bit */ 1884 - newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(current); 1885 - ownerdied = 0; 1886 - lock_taken = 1; 1887 - } 1888 - 1889 - curval = cmpxchg_futex_value_locked(uaddr, uval, newval); 1890 - 1891 - if (unlikely(curval == -EFAULT)) 1892 - goto uaddr_faulted; 1893 - if (unlikely(curval != uval)) 1894 - goto retry_locked; 1895 - 1896 - /* 1897 - * We took the lock due to owner died take over. 1898 - */ 1899 - if (unlikely(lock_taken)) 1900 - goto out_unlock_put_key; 1901 - 1902 - /* 1903 - * We dont have the lock. Look up the PI state (or create it if 1904 - * we are the first waiter): 1905 - */ 1906 - ret = lookup_pi_state(uval, hb, &q.key, &q.pi_state); 1907 - 1342 + ret = futex_lock_pi_atomic(uaddr, hb, &q.key, &q.pi_state, current, 0); 1908 1343 if (unlikely(ret)) { 1909 1344 switch (ret) { 1910 - 1345 + case 1: 1346 + /* We got the lock. */ 1347 + ret = 0; 1348 + goto out_unlock_put_key; 1349 + case -EFAULT: 1350 + goto uaddr_faulted; 1911 1351 case -EAGAIN: 1912 1352 /* 1913 1353 * Task is exiting and we just wait for the ··· 1851 1423 put_futex_key(fshared, &q.key); 1852 1424 cond_resched(); 1853 1425 goto retry; 1854 - 1855 - case -ESRCH: 1856 - /* 1857 - * No owner found for this futex. Check if the 1858 - * OWNER_DIED bit is set to figure out whether 1859 - * this is a robust futex or not. 1860 - */ 1861 - if (get_futex_value_locked(&curval, uaddr)) 1862 - goto uaddr_faulted; 1863 - 1864 - /* 1865 - * We simply start over in case of a robust 1866 - * futex. The code above will take the futex 1867 - * and return happy. 1868 - */ 1869 - if (curval & FUTEX_OWNER_DIED) { 1870 - ownerdied = 1; 1871 - goto retry_locked; 1872 - } 1873 1426 default: 1874 1427 goto out_unlock_put_key; 1875 1428 } ··· 1874 1465 } 1875 1466 1876 1467 spin_lock(q.lock_ptr); 1877 - 1878 - if (!ret) { 1879 - /* 1880 - * Got the lock. 
We might not be the anticipated owner 1881 - * if we did a lock-steal - fix up the PI-state in 1882 - * that case: 1883 - */ 1884 - if (q.pi_state->owner != curr) 1885 - ret = fixup_pi_state_owner(uaddr, &q, curr, fshared); 1886 - } else { 1887 - /* 1888 - * Catch the rare case, where the lock was released 1889 - * when we were on the way back before we locked the 1890 - * hash bucket. 1891 - */ 1892 - if (q.pi_state->owner == curr) { 1893 - /* 1894 - * Try to get the rt_mutex now. This might 1895 - * fail as some other task acquired the 1896 - * rt_mutex after we removed ourself from the 1897 - * rt_mutex waiters list. 1898 - */ 1899 - if (rt_mutex_trylock(&q.pi_state->pi_mutex)) 1900 - ret = 0; 1901 - else { 1902 - /* 1903 - * pi_state is incorrect, some other 1904 - * task did a lock steal and we 1905 - * returned due to timeout or signal 1906 - * without taking the rt_mutex. Too 1907 - * late. We can access the 1908 - * rt_mutex_owner without locking, as 1909 - * the other task is now blocked on 1910 - * the hash bucket lock. Fix the state 1911 - * up. 1912 - */ 1913 - struct task_struct *owner; 1914 - int res; 1915 - 1916 - owner = rt_mutex_owner(&q.pi_state->pi_mutex); 1917 - res = fixup_pi_state_owner(uaddr, &q, owner, 1918 - fshared); 1919 - 1920 - /* propagate -EFAULT, if the fixup failed */ 1921 - if (res) 1922 - ret = res; 1923 - } 1924 - } else { 1925 - /* 1926 - * Paranoia check. If we did not take the lock 1927 - * in the trylock above, then we should not be 1928 - * the owner of the rtmutex, neither the real 1929 - * nor the pending one: 1930 - */ 1931 - if (rt_mutex_owner(&q.pi_state->pi_mutex) == curr) 1932 - printk(KERN_ERR "futex_lock_pi: ret = %d " 1933 - "pi-mutex: %p pi-state %p\n", ret, 1934 - q.pi_state->pi_mutex.owner, 1935 - q.pi_state->owner); 1936 - } 1937 - } 1468 + /* 1469 + * Fixup the pi_state owner and possibly acquire the lock if we 1470 + * haven't already. 
1471 + */ 1472 + res = fixup_owner(uaddr, fshared, &q, !ret); 1473 + /* 1474 + * If fixup_owner() returned an error, propagate that. If it acquired 1475 + * the lock, clear our -ETIMEDOUT or -EINTR. 1476 + */ 1477 + if (res) 1478 + ret = (res < 0) ? res : 0; 1938 1479 1939 1480 /* 1940 - * If fixup_pi_state_owner() faulted and was unable to handle the 1941 - * fault, unlock it and return the fault to userspace. 1481 + * If fixup_owner() faulted and was unable to handle the fault, unlock 1482 + * it and return the fault to userspace. 1942 1483 */ 1943 1484 if (ret && (rt_mutex_owner(&q.pi_state->pi_mutex) == current)) 1944 1485 rt_mutex_unlock(&q.pi_state->pi_mutex); ··· 1896 1537 /* Unqueue and drop the lock */ 1897 1538 unqueue_me_pi(&q); 1898 1539 1899 - if (to) 1900 - destroy_hrtimer_on_stack(&to->timer); 1901 - return ret != -EINTR ? ret : -ERESTARTNOINTR; 1540 + goto out; 1902 1541 1903 1542 out_unlock_put_key: 1904 1543 queue_unlock(&q, hb); ··· 1906 1549 out: 1907 1550 if (to) 1908 1551 destroy_hrtimer_on_stack(&to->timer); 1909 - return ret; 1552 + return ret != -EINTR ? ret : -ERESTARTNOINTR; 1910 1553 1911 1554 uaddr_faulted: 1912 1555 /* ··· 1928 1571 put_futex_key(fshared, &q.key); 1929 1572 goto retry; 1930 1573 } 1931 - 1932 1574 1933 1575 /* 1934 1576 * Userspace attempted a TID -> 0 atomic transition, and failed. ··· 2027 1671 if (!ret) 2028 1672 goto retry; 2029 1673 1674 + return ret; 1675 + } 1676 + 1677 + /** 1678 + * handle_early_requeue_pi_wakeup() - Detect early wakeup on the initial futex 1679 + * @hb: the hash_bucket futex_q was originally enqueued on 1680 + * @q: the futex_q woken while waiting to be requeued 1681 + * @key2: the futex_key of the requeue target futex 1682 + * @timeout: the timeout associated with the wait (NULL if none) 1683 + * 1684 + * Detect if the task was woken on the initial futex as opposed to the requeue 1685 + * target futex. If so, determine if it was a timeout or a signal that caused 1686 + * the wakeup and return the appropriate error code to the caller. Must be 1687 + * called with the hb lock held. 1688 + * 1689 + * Returns: 1690 + * 0 - no early wakeup detected 1691 + * <0 - -ETIMEDOUT or -ERESTARTNOINTR 1692 + */ 1693 + static inline 1694 + int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb, 1695 + struct futex_q *q, union futex_key *key2, 1696 + struct hrtimer_sleeper *timeout) 1697 + { 1698 + int ret = 0; 1699 + 1700 + /* 1701 + * With the hb lock held, we avoid races while we process the wakeup. 1702 + * We only need to hold hb (and not hb2) to ensure atomicity as the 1703 + * wakeup code can't change q.key from uaddr to uaddr2 if we hold hb. 1704 + * It can't be requeued from uaddr2 to something else since we don't 1705 + * support a PI aware source futex for requeue. 1706 + */ 1707 + if (!match_futex(&q->key, key2)) { 1708 + WARN_ON(q->lock_ptr && (&hb->lock != q->lock_ptr)); 1709 + /* 1710 + * We were woken prior to requeue by a timeout or a signal. 1711 + * Unqueue the futex_q and determine which it was. 1712 + */ 1713 + plist_del(&q->list, &q->list.plist); 1714 + drop_futex_key_refs(&q->key); 1715 + 1716 + if (timeout && !timeout->task) 1717 + ret = -ETIMEDOUT; 1718 + else 1719 + ret = -ERESTARTNOINTR; 1720 + } 1721 + return ret; 1722 + } 1723 + 1724 + /** 1725 + * futex_wait_requeue_pi() - Wait on uaddr and take uaddr2 1726 + * @uaddr: the futex we initially wait on (non-pi) 1727 + * @fshared: whether the futexes are shared (1) or not (0). They must be 1728 + * the same type, no requeueing from private to shared, etc. 1729 + * @val: the expected value of uaddr 1730 + * @abs_time: absolute timeout 1731 + * @bitset: 32 bit wakeup bitset set by userspace, defaults to all.
1732 + * @clockrt: whether to use CLOCK_REALTIME (1) or CLOCK_MONOTONIC (0) 1733 + * @uaddr2: the pi futex we will take prior to returning to user-space 1734 + * 1735 + * The caller will wait on uaddr and will be requeued by futex_requeue() to 1736 + * uaddr2 which must be PI aware. Normal wakeup will wake on uaddr2 and 1737 + * complete the acquisition of the rt_mutex prior to returning to userspace. 1738 + * This ensures the rt_mutex maintains an owner when it has waiters; without 1739 + * one, the pi logic wouldn't know which task to boost/deboost, if there was a 1740 + * need to. 1741 + * 1742 + * We call schedule in futex_wait_queue_me() when we enqueue and return there 1743 + * via the following: 1744 + * 1) wakeup on uaddr2 after an atomic lock acquisition by futex_requeue() 1745 + * 2) wakeup on uaddr2 after a requeue and subsequent unlock 1746 + * 3) signal (before or after requeue) 1747 + * 4) timeout (before or after requeue) 1748 + * 1749 + * If 3, we setup a restart_block with futex_wait_requeue_pi() as the function. 1750 + * 1751 + * If 2, we may then block on trying to take the rt_mutex and return via: 1752 + * 5) successful lock 1753 + * 6) signal 1754 + * 7) timeout 1755 + * 8) other lock acquisition failure 1756 + * 1757 + * If 6, we setup a restart_block with futex_lock_pi() as the function. 1758 + * 1759 + * If 4 or 7, we cleanup and return with -ETIMEDOUT. 
1760 + * 1761 + * Returns: 1762 + * 0 - On success 1763 + * <0 - On error 1764 + */ 1765 + static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, 1766 + u32 val, ktime_t *abs_time, u32 bitset, 1767 + int clockrt, u32 __user *uaddr2) 1768 + { 1769 + struct hrtimer_sleeper timeout, *to = NULL; 1770 + struct rt_mutex_waiter rt_waiter; 1771 + struct rt_mutex *pi_mutex = NULL; 1772 + struct futex_hash_bucket *hb; 1773 + union futex_key key2; 1774 + struct futex_q q; 1775 + int res, ret; 1776 + 1777 + if (!bitset) 1778 + return -EINVAL; 1779 + 1780 + if (abs_time) { 1781 + to = &timeout; 1782 + hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME : 1783 + CLOCK_MONOTONIC, HRTIMER_MODE_ABS); 1784 + hrtimer_init_sleeper(to, current); 1785 + hrtimer_set_expires_range_ns(&to->timer, *abs_time, 1786 + current->timer_slack_ns); 1787 + } 1788 + 1789 + /* 1790 + * The waiter is allocated on our stack, manipulated by the requeue 1791 + * code while we sleep on uaddr. 1792 + */ 1793 + debug_rt_mutex_init_waiter(&rt_waiter); 1794 + rt_waiter.task = NULL; 1795 + 1796 + q.pi_state = NULL; 1797 + q.bitset = bitset; 1798 + q.rt_waiter = &rt_waiter; 1799 + 1800 + key2 = FUTEX_KEY_INIT; 1801 + ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_WRITE); 1802 + if (unlikely(ret != 0)) 1803 + goto out; 1804 + 1805 + /* Prepare to wait on uaddr. */ 1806 + ret = futex_wait_setup(uaddr, val, fshared, &q, &hb); 1807 + if (ret) 1808 + goto out_key2; 1809 + 1810 + /* Queue the futex_q, drop the hb lock, wait for wakeup. 
*/ 1811 + futex_wait_queue_me(hb, &q, to); 1812 + 1813 + spin_lock(&hb->lock); 1814 + ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to); 1815 + spin_unlock(&hb->lock); 1816 + if (ret) 1817 + goto out_put_keys; 1818 + 1819 + /* 1820 + * In order for us to be here, we know our q.key == key2, and since 1821 + * we took the hb->lock above, we also know that futex_requeue() has 1822 + * completed and we no longer have to concern ourselves with a wakeup 1823 + * race with the atomic proxy lock acquisition by the requeue code. 1824 + */ 1825 + 1826 + /* Check if the requeue code acquired the second futex for us. */ 1827 + if (!q.rt_waiter) { 1828 + /* 1829 + * Got the lock. We might not be the anticipated owner if we 1830 + * did a lock-steal - fix up the PI-state in that case. 1831 + */ 1832 + if (q.pi_state && (q.pi_state->owner != current)) { 1833 + spin_lock(q.lock_ptr); 1834 + ret = fixup_pi_state_owner(uaddr2, &q, current, 1835 + fshared); 1836 + spin_unlock(q.lock_ptr); 1837 + } 1838 + } else { 1839 + /* 1840 + * We have been woken up by futex_unlock_pi(), a timeout, or a 1841 + * signal. futex_unlock_pi() will not destroy the lock_ptr nor 1842 + * the pi_state. 1843 + */ 1844 + WARN_ON(!q.pi_state); 1845 + pi_mutex = &q.pi_state->pi_mutex; 1846 + ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1); 1847 + debug_rt_mutex_free_waiter(&rt_waiter); 1848 + 1849 + spin_lock(q.lock_ptr); 1850 + /* 1851 + * Fixup the pi_state owner and possibly acquire the lock if we 1852 + * haven't already. 1853 + */ 1854 + res = fixup_owner(uaddr2, fshared, &q, !ret); 1855 + /* 1856 + * If fixup_owner() returned an error, propagate that. If it 1857 + * acquired the lock, clear our -ETIMEDOUT or -EINTR. 1858 + */ 1859 + if (res) 1860 + ret = (res < 0) ? res : 0; 1861 + 1862 + /* Unqueue and drop the lock.
*/ 1863 + unqueue_me_pi(&q); 1864 + } 1865 + 1866 + /* 1867 + * If fixup_pi_state_owner() faulted and was unable to handle the 1868 + * fault, unlock the rt_mutex and return the fault to userspace. 1869 + */ 1870 + if (ret == -EFAULT) { 1871 + if (rt_mutex_owner(pi_mutex) == current) 1872 + rt_mutex_unlock(pi_mutex); 1873 + } else if (ret == -EINTR) { 1874 + /* 1875 + * We've already been requeued, but we have no way to 1876 + * restart by calling futex_lock_pi() directly. We 1877 + * could restart the syscall, but that will look at 1878 + * the user space value and return right away. So we 1879 + * drop back with EWOULDBLOCK to tell user space that 1880 + * "val" has been changed. That's the same as what the 1881 + * restart of the syscall would do in 1882 + * futex_wait_setup(). 1883 + */ 1884 + ret = -EWOULDBLOCK; 1885 + } 1886 + 1887 + out_put_keys: 1888 + put_futex_key(fshared, &q.key); 1889 + out_key2: 1890 + put_futex_key(fshared, &key2); 1891 + 1892 + out: 1893 + if (to) { 1894 + hrtimer_cancel(&to->timer); 1895 + destroy_hrtimer_on_stack(&to->timer); 1896 + } 2030 1897 return ret; 2031 1898 } 2032 1899 ··· 2475 1896 fshared = 1; 2476 1897 2477 1898 clockrt = op & FUTEX_CLOCK_REALTIME; 2478 - if (clockrt && cmd != FUTEX_WAIT_BITSET) 1899 + if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI) 2479 1900 return -ENOSYS; 2480 1901 2481 1902 switch (cmd) { ··· 2490 1911 ret = futex_wake(uaddr, fshared, val, val3); 2491 1912 break; 2492 1913 case FUTEX_REQUEUE: 2493 - ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL); 1914 + ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0); 2494 1915 break; 2495 1916 case FUTEX_CMP_REQUEUE: 2496 - ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3); 1917 + ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3, 1918 + 0); 2497 1919 break; 2498 1920 case FUTEX_WAKE_OP: 2499 1921 ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3); ··· 2510 1930 case
FUTEX_TRYLOCK_PI: 2511 1931 if (futex_cmpxchg_enabled) 2512 1932 ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1); 1933 + break; 1934 + case FUTEX_WAIT_REQUEUE_PI: 1935 + val3 = FUTEX_BITSET_MATCH_ANY; 1936 + ret = futex_wait_requeue_pi(uaddr, fshared, val, timeout, val3, 1937 + clockrt, uaddr2); 1938 + break; 1939 + case FUTEX_CMP_REQUEUE_PI: 1940 + ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3, 1941 + 1); 2513 1942 break; 2514 1943 default: 2515 1944 ret = -ENOSYS; ··· 2537 1948 int cmd = op & FUTEX_CMD_MASK; 2538 1949 2539 1950 if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI || 2540 - cmd == FUTEX_WAIT_BITSET)) { 1951 + cmd == FUTEX_WAIT_BITSET || 1952 + cmd == FUTEX_WAIT_REQUEUE_PI)) { 2541 1953 if (copy_from_user(&ts, utime, sizeof(ts)) != 0) 2542 1954 return -EFAULT; 2543 1955 if (!timespec_valid(&ts)) ··· 2550 1960 tp = &t; 2551 1961 } 2552 1962 /* 2553 - * requeue parameter in 'utime' if cmd == FUTEX_REQUEUE. 1963 + * requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*. 2554 1964 * number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP. 2555 1965 */ 2556 1966 if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE || 2557 - cmd == FUTEX_WAKE_OP) 1967 + cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP) 2558 1968 val2 = (u32) (unsigned long) utime; 2559 1969 2560 1970 return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
+211 -77
kernel/rtmutex.c
··· 300 300 * assigned pending owner [which might not have taken the 301 301 * lock yet]: 302 302 */ 303 - static inline int try_to_steal_lock(struct rt_mutex *lock) 303 + static inline int try_to_steal_lock(struct rt_mutex *lock, 304 + struct task_struct *task) 304 305 { 305 306 struct task_struct *pendowner = rt_mutex_owner(lock); 306 307 struct rt_mutex_waiter *next; ··· 310 309 if (!rt_mutex_owner_pending(lock)) 311 310 return 0; 312 311 313 - if (pendowner == current) 312 + if (pendowner == task) 314 313 return 1; 315 314 316 315 spin_lock_irqsave(&pendowner->pi_lock, flags); 317 - if (current->prio >= pendowner->prio) { 316 + if (task->prio >= pendowner->prio) { 318 317 spin_unlock_irqrestore(&pendowner->pi_lock, flags); 319 318 return 0; 320 319 } ··· 339 338 * We are going to steal the lock and a waiter was 340 339 * enqueued on the pending owners pi_waiters queue. So 341 340 * we have to enqueue this waiter into 342 - * current->pi_waiters list. This covers the case, 343 - * where current is boosted because it holds another 341 + * task->pi_waiters list. This covers the case, 342 + * where task is boosted because it holds another 344 343 * lock and gets unboosted because the booster is 345 344 * interrupted, so we would delay a waiter with higher 346 - * priority as current->normal_prio. 345 + * priority as task->normal_prio. 
347 346 * 348 347 * Note: in the rare case of a SCHED_OTHER task changing 349 348 * its priority and thus stealing the lock, next->task 350 - * might be current: 349 + * might be task: 351 350 */ 352 - if (likely(next->task != current)) { 353 - spin_lock_irqsave(&current->pi_lock, flags); 354 - plist_add(&next->pi_list_entry, &current->pi_waiters); 355 - __rt_mutex_adjust_prio(current); 356 - spin_unlock_irqrestore(&current->pi_lock, flags); 351 + if (likely(next->task != task)) { 352 + spin_lock_irqsave(&task->pi_lock, flags); 353 + plist_add(&next->pi_list_entry, &task->pi_waiters); 354 + __rt_mutex_adjust_prio(task); 355 + spin_unlock_irqrestore(&task->pi_lock, flags); 357 356 } 358 357 return 1; 359 358 } ··· 390 389 */ 391 390 mark_rt_mutex_waiters(lock); 392 391 393 - if (rt_mutex_owner(lock) && !try_to_steal_lock(lock)) 392 + if (rt_mutex_owner(lock) && !try_to_steal_lock(lock, current)) 394 393 return 0; 395 394 396 395 /* We got the lock. */ ··· 412 411 */ 413 412 static int task_blocks_on_rt_mutex(struct rt_mutex *lock, 414 413 struct rt_mutex_waiter *waiter, 414 + struct task_struct *task, 415 415 int detect_deadlock) 416 416 { 417 417 struct task_struct *owner = rt_mutex_owner(lock); ··· 420 418 unsigned long flags; 421 419 int chain_walk = 0, res; 422 420 423 - spin_lock_irqsave(&current->pi_lock, flags); 424 - __rt_mutex_adjust_prio(current); 425 - waiter->task = current; 421 + spin_lock_irqsave(&task->pi_lock, flags); 422 + __rt_mutex_adjust_prio(task); 423 + waiter->task = task; 426 424 waiter->lock = lock; 427 - plist_node_init(&waiter->list_entry, current->prio); 428 - plist_node_init(&waiter->pi_list_entry, current->prio); 425 + plist_node_init(&waiter->list_entry, task->prio); 426 + plist_node_init(&waiter->pi_list_entry, task->prio); 429 427 430 428 /* Get the top priority waiter on the lock */ 431 429 if (rt_mutex_has_waiters(lock)) 432 430 top_waiter = rt_mutex_top_waiter(lock); 433 431 plist_add(&waiter->list_entry, &lock->wait_list); 434 
432 435 - current->pi_blocked_on = waiter; 433 + task->pi_blocked_on = waiter; 436 434 437 - spin_unlock_irqrestore(&current->pi_lock, flags); 435 + spin_unlock_irqrestore(&task->pi_lock, flags); 438 436 439 437 if (waiter == rt_mutex_top_waiter(lock)) { 440 438 spin_lock_irqsave(&owner->pi_lock, flags); ··· 462 460 spin_unlock(&lock->wait_lock); 463 461 464 462 res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter, 465 - current); 463 + task); 466 464 467 465 spin_lock(&lock->wait_lock); 468 466 ··· 607 605 rt_mutex_adjust_prio_chain(task, 0, NULL, NULL, task); 608 606 } 609 607 608 + /** 609 + * __rt_mutex_slowlock() - Perform the wait-wake-try-to-take loop 610 + * @lock: the rt_mutex to take 611 + * @state: the state the task should block in (TASK_INTERRUPTIBLE 612 + * or TASK_UNINTERRUPTIBLE) 613 + * @timeout: the pre-initialized and started timer, or NULL for none 614 + * @waiter: the pre-initialized rt_mutex_waiter 615 + * @detect_deadlock: passed to task_blocks_on_rt_mutex 616 + * 617 + * lock->wait_lock must be held by the caller. 618 + */ 619 + static int __sched 620 + __rt_mutex_slowlock(struct rt_mutex *lock, int state, 621 + struct hrtimer_sleeper *timeout, 622 + struct rt_mutex_waiter *waiter, 623 + int detect_deadlock) 624 + { 625 + int ret = 0; 626 + 627 + for (;;) { 628 + /* Try to acquire the lock: */ 629 + if (try_to_take_rt_mutex(lock)) 630 + break; 631 + 632 + /* 633 + * TASK_INTERRUPTIBLE checks for signals and 634 + * timeout. Ignored otherwise. 635 + */ 636 + if (unlikely(state == TASK_INTERRUPTIBLE)) { 637 + /* Signal pending? */ 638 + if (signal_pending(current)) 639 + ret = -EINTR; 640 + if (timeout && !timeout->task) 641 + ret = -ETIMEDOUT; 642 + if (ret) 643 + break; 644 + } 645 + 646 + /* 647 + * waiter->task is NULL the first time we come here and 648 + * when we have been woken up by the previous owner 649 + * but the lock got stolen by a higher prio task. 
650 +		 */
651 +		if (!waiter->task) {
652 +			ret = task_blocks_on_rt_mutex(lock, waiter, current,
653 +						      detect_deadlock);
654 +			/*
655 +			 * If we got woken up by the owner then start loop
656 +			 * all over without going into schedule to try
657 +			 * to get the lock now:
658 +			 */
659 +			if (unlikely(!waiter->task)) {
660 +				/*
661 +				 * Reset the return value. We might
662 +				 * have returned with -EDEADLK and the
663 +				 * owner released the lock while we
664 +				 * were walking the pi chain.
665 +				 */
666 +				ret = 0;
667 +				continue;
668 +			}
669 +			if (unlikely(ret))
670 +				break;
671 +		}
672 +
673 +		spin_unlock(&lock->wait_lock);
674 +
675 +		debug_rt_mutex_print_deadlock(waiter);
676 +
677 +		if (waiter->task)
678 +			schedule_rt_mutex(lock);
679 +
680 +		spin_lock(&lock->wait_lock);
681 +		set_current_state(state);
682 +	}
683 +
684 +	return ret;
685 + }
686 +
 610  687 /*
 611  688  * Slow path lock function:
 612  689  */
···
 717  636 		timeout->task = NULL;
 718  637 	}
 719  638
 720      -	for (;;) {
 721      -		/* Try to acquire the lock: */
 722      -		if (try_to_take_rt_mutex(lock))
 723      -			break;
 724      -
 725      -		/*
 726      -		 * TASK_INTERRUPTIBLE checks for signals and
 727      -		 * timeout. Ignored otherwise.
 728      -		 */
 729      -		if (unlikely(state == TASK_INTERRUPTIBLE)) {
 730      -			/* Signal pending? */
 731      -			if (signal_pending(current))
 732      -				ret = -EINTR;
 733      -			if (timeout && !timeout->task)
 734      -				ret = -ETIMEDOUT;
 735      -			if (ret)
 736      -				break;
 737      -		}
 738      -
 739      -		/*
 740      -		 * waiter.task is NULL the first time we come here and
 741      -		 * when we have been woken up by the previous owner
 742      -		 * but the lock got stolen by a higher prio task.
 743      -		 */
 744      -		if (!waiter.task) {
 745      -			ret = task_blocks_on_rt_mutex(lock, &waiter,
 746      -						      detect_deadlock);
 747      -			/*
 748      -			 * If we got woken up by the owner then start loop
 749      -			 * all over without going into schedule to try
 750      -			 * to get the lock now:
 751      -			 */
 752      -			if (unlikely(!waiter.task)) {
 753      -				/*
 754      -				 * Reset the return value. We might
 755      -				 * have returned with -EDEADLK and the
 756      -				 * owner released the lock while we
 757      -				 * were walking the pi chain.
 758      -				 */
 759      -				ret = 0;
 760      -				continue;
 761      -			}
 762      -			if (unlikely(ret))
 763      -				break;
 764      -		}
 765      -
 766      -		spin_unlock(&lock->wait_lock);
 767      -
 768      -		debug_rt_mutex_print_deadlock(&waiter);
 769      -
 770      -		if (waiter.task)
 771      -			schedule_rt_mutex(lock);
 772      -
 773      -		spin_lock(&lock->wait_lock);
 774      -		set_current_state(state);
 775      -	}
      639 +	ret = __rt_mutex_slowlock(lock, state, timeout, &waiter,
      640 +				  detect_deadlock);
 776  641
 777  642 	set_current_state(TASK_RUNNING);
 778  643
···
1013  986 }
1014  987
1015  988 /**
      989 + * rt_mutex_start_proxy_lock() - Start lock acquisition for another task
      990 + * @lock:		the rt_mutex to take
      991 + * @waiter:		the pre-initialized rt_mutex_waiter
      992 + * @task:		the task to prepare
      993 + * @detect_deadlock:	perform deadlock detection (1) or not (0)
      994 + *
      995 + * Returns:
      996 + *  0 - task blocked on lock
      997 + *  1 - acquired the lock for task, caller should wake it up
      998 + * <0 - error
      999 + *
     1000 + * Special API call for FUTEX_REQUEUE_PI support.
     1001 + */
     1002 + int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
     1003 +			      struct rt_mutex_waiter *waiter,
     1004 +			      struct task_struct *task, int detect_deadlock)
     1005 + {
     1006 +	int ret;
     1007 +
     1008 +	spin_lock(&lock->wait_lock);
     1009 +
     1010 +	mark_rt_mutex_waiters(lock);
     1011 +
     1012 +	if (!rt_mutex_owner(lock) || try_to_steal_lock(lock, task)) {
     1013 +		/* We got the lock for task. */
     1014 +		debug_rt_mutex_lock(lock);
     1015 +
     1016 +		rt_mutex_set_owner(lock, task, 0);
     1017 +
     1018 +		rt_mutex_deadlock_account_lock(lock, task);
     1019 +		return 1;
     1020 +	}
     1021 +
     1022 +	ret = task_blocks_on_rt_mutex(lock, waiter, task, detect_deadlock);
     1023 +
     1024 +
     1025 +	if (ret && !waiter->task) {
     1026 +		/*
     1027 +		 * Reset the return value. We might have
     1028 +		 * returned with -EDEADLK and the owner
     1029 +		 * released the lock while we were walking the
     1030 +		 * pi chain. Let the waiter sort it out.
     1031 +		 */
     1032 +		ret = 0;
     1033 +	}
     1034 +	spin_unlock(&lock->wait_lock);
     1035 +
     1036 +	debug_rt_mutex_print_deadlock(waiter);
     1037 +
     1038 +	return ret;
     1039 + }
     1040 +
     1041 + /**
1016 1042  * rt_mutex_next_owner - return the next owner of the lock
1017 1043  *
1018 1044  * @lock: the rt lock query
···
1083 1003 		return NULL;
1084 1004
1085 1005 	return rt_mutex_top_waiter(lock)->task;
     1006 + }
     1007 +
     1008 + /**
     1009 + * rt_mutex_finish_proxy_lock() - Complete lock acquisition
     1010 + * @lock:		the rt_mutex we were woken on
     1011 + * @to:			the timeout, null if none. hrtimer should already have
     1012 + *			been started.
     1013 + * @waiter:		the pre-initialized rt_mutex_waiter
     1014 + * @detect_deadlock:	perform deadlock detection (1) or not (0)
     1015 + *
     1016 + * Complete the lock acquisition started on our behalf by another thread.
     1017 + *
     1018 + * Returns:
     1019 + *  0 - success
     1020 + * <0 - error, one of -EINTR, -ETIMEDOUT, or -EDEADLK
     1021 + *
     1022 + * Special API call for PI-futex requeue support
     1023 + */
     1024 + int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
     1025 +			       struct hrtimer_sleeper *to,
     1026 +			       struct rt_mutex_waiter *waiter,
     1027 +			       int detect_deadlock)
     1028 + {
     1029 +	int ret;
     1030 +
     1031 +	spin_lock(&lock->wait_lock);
     1032 +
     1033 +	set_current_state(TASK_INTERRUPTIBLE);
     1034 +
     1035 +	ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter,
     1036 +				  detect_deadlock);
     1037 +
     1038 +	set_current_state(TASK_RUNNING);
     1039 +
     1040 +	if (unlikely(waiter->task))
     1041 +		remove_waiter(lock, waiter);
     1042 +
     1043 +	/*
     1044 +	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
     1045 +	 * have to fix that up.
     1046 +	 */
     1047 +	fixup_rt_mutex_waiters(lock);
     1048 +
     1049 +	spin_unlock(&lock->wait_lock);
     1050 +
     1051 +	/*
     1052 +	 * Readjust priority, when we did not get the lock. We might have been
     1053 +	 * the pending owner and boosted. Since we did not take the lock, the
     1054 +	 * PI boost has to go.
     1055 +	 */
     1056 +	if (unlikely(ret))
     1057 +		rt_mutex_adjust_prio(current);
     1058 +
     1059 +	return ret;
1086 1060 }
+8
kernel/rtmutex_common.h
···
 120  120 				       struct task_struct *proxy_owner);
 121  121 extern void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 122  122 				  struct task_struct *proxy_owner);
      123 + extern int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
      124 +				     struct rt_mutex_waiter *waiter,
      125 +				     struct task_struct *task,
      126 +				     int detect_deadlock);
      127 + extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
      128 +				      struct hrtimer_sleeper *to,
      129 +				      struct rt_mutex_waiter *waiter,
      130 +				      int detect_deadlock);
 123  131
 124  132 #ifdef CONFIG_DEBUG_RT_MUTEXES
 125  133 # include "rtmutex-debug.h"