Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

binder: fix freeze race

Currently cgroup freezer is used to freeze the application threads, and
BINDER_FREEZE is used to freeze the corresponding binder interface.
There's already a mechanism in ioctl(BINDER_FREEZE) to wait for any
existing transactions to drain out before actually freezing the binder
interface.

But freezing an app requires two steps: freezing the binder interface with
ioctl(BINDER_FREEZE) and then freezing the application main threads with
cgroupfs. This is not an atomic operation, so the following race can
occur.

1) Binder interface is frozen by ioctl(BINDER_FREEZE);
2) Main thread A initiates a new sync binder transaction to process B;
3) Main thread A is frozen by "echo 1 > cgroup.freeze";
4) The response from process B reaches the frozen thread, which will
unexpectedly fail.

This patch provides a mechanism to check if there's any new pending
transaction happening between ioctl(BINDER_FREEZE) and freezing the
main thread. If there's any, the main thread freezing operation can
be rolled back to finish the pending transaction.
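In userspace, the two-step freeze with the rollback described above could be sequenced roughly as follows. This is a minimal sketch under stated assumptions: the `freeze_ops` struct, mock state, and all function names here are hypothetical; a real freezer would issue ioctl(BINDER_FREEZE), write to cgroup.freeze, and read bit 1 of sync_recv via ioctl(BINDER_GET_FROZEN_INFO).

```c
#include <stdbool.h>

/* Hypothetical userspace freezer sketch: the side-effecting steps are
 * injected through freeze_ops so the sequencing logic is testable. */
struct freeze_ops {
	int  (*binder_freeze)(void *ctx, bool freeze); /* ioctl(BINDER_FREEZE)         */
	int  (*cgroup_freeze)(void *ctx, bool freeze); /* "echo 1 > cgroup.freeze"     */
	bool (*txns_raced)(void *ctx);                 /* BINDER_GET_FROZEN_INFO bit 1 */
};

/* Freeze the binder interface first, then the cgroup; if a sync
 * transaction raced in between, roll both steps back so the pending
 * transaction can finish.
 * Returns 0 on success, 1 if the caller should retry later, -1 on error. */
static int freeze_process(const struct freeze_ops *ops, void *ctx)
{
	if (ops->binder_freeze(ctx, true) < 0)
		return -1;
	if (ops->cgroup_freeze(ctx, true) < 0) {
		ops->binder_freeze(ctx, false);
		return -1;
	}
	if (ops->txns_raced(ctx)) {
		ops->cgroup_freeze(ctx, false);
		ops->binder_freeze(ctx, false);
		return 1;
	}
	return 0;
}

/* In-memory mock standing in for the kernel side. */
struct mock_state { bool binder_frozen, cgroup_frozen, raced; };

static int mock_binder(void *ctx, bool f)
{
	((struct mock_state *)ctx)->binder_frozen = f;
	return 0;
}

static int mock_cgroup(void *ctx, bool f)
{
	((struct mock_state *)ctx)->cgroup_frozen = f;
	return 0;
}

static bool mock_raced(void *ctx)
{
	return ((struct mock_state *)ctx)->raced;
}

static const struct freeze_ops mock_ops = {
	mock_binder, mock_cgroup, mock_raced
};

static int simulate_freeze(bool raced, struct mock_state *st)
{
	st->binder_frozen = false;
	st->cgroup_frozen = false;
	st->raced = raced;
	return freeze_process(&mock_ops, st);
}
```

On a retry result, the caller would typically wait for the pending transaction to drain and then attempt the whole two-step sequence again.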

Furthermore, the response might reach the binder driver before the
rollback actually happens, which would still cause the transaction to
fail.

As the other process doesn't wait for a response to its response, the
reply transaction failure can be fixed by treating the reply like a
oneway/async transaction, allowing it to reach the frozen thread. It
will then be consumed when the thread is unfrozen later.

NOTE: This patch reuses the existing definition of struct
binder_frozen_status_info but expands the bit assignments of __u32
member sync_recv.

To ensure backward compatibility, bit 0 of sync_recv still indicates
there's an outstanding sync binder transaction. This patch adds new
information in bit 1 of sync_recv, indicating that the binder
transaction happened exactly during a race.

If an existing userspace app runs on a new kernel, a sync binder call
will set bit 0 of sync_recv, so ioctl(BINDER_GET_FROZEN_INFO) still
returns the expected value (true). The app simply doesn't check bit 1,
so it cannot tell whether there was a race. This behavior matches that
of an old kernel, which doesn't set bit 1 at all.

A new userspace app can 1) check bit 0 to know whether a sync binder
transaction happened while the process was frozen - same as before; and
2) check bit 1 to know whether that sync binder transaction happened
exactly during a race - new information for the rollback decision.
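Concretely, a new userspace app might decode sync_recv like this. A sketch only: the macro and function names are hypothetical; only the bit positions come from this patch.

```c
#include <stdbool.h>

/* Bit layout of binder_frozen_status_info.sync_recv after this patch. */
#define SYNC_RECV_TXN_BIT   (1u << 0) /* sync transaction received while frozen */
#define SYNC_RECV_RACE_BIT  (1u << 1) /* that transaction raced with the freeze */

/* Old apps only test bit 0 (a nonzero check still works since bit 0 is
 * unchanged); new apps can additionally test bit 1 to decide on rollback. */
static bool had_sync_txn(unsigned int sync_recv)
{
	return sync_recv & SYNC_RECV_TXN_BIT;
}

static bool should_rollback(unsigned int sync_recv)
{
	return sync_recv & SYNC_RECV_RACE_BIT;
}
```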

Fixes: 432ff1e91694 ("binder: BINDER_FREEZE ioctl")
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Li Li <dualli@google.com>
Test: stress test with apps being frozen and initiating binder calls at
      the same time; confirmed the pending transactions succeeded.
Link: https://lore.kernel.org/r/20210910164210.2282716-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Authored by Li Li, committed by Greg Kroah-Hartman
b564171a 6880fa6c

+38 -6
+29 -6
drivers/android/binder.c
···
 	if (reply) {
 		binder_enqueue_thread_work(thread, tcomplete);
 		binder_inner_proc_lock(target_proc);
-		if (target_thread->is_dead || target_proc->is_frozen) {
-			return_error = target_thread->is_dead ?
-				BR_DEAD_REPLY : BR_FROZEN_REPLY;
+		if (target_thread->is_dead) {
+			return_error = BR_DEAD_REPLY;
 			binder_inner_proc_unlock(target_proc);
 			goto err_dead_proc_or_thread;
 		}
···
 	return 0;
 }
 
+static bool binder_txns_pending_ilocked(struct binder_proc *proc)
+{
+	struct rb_node *n;
+	struct binder_thread *thread;
+
+	if (proc->outstanding_txns > 0)
+		return true;
+
+	for (n = rb_first(&proc->threads); n; n = rb_next(n)) {
+		thread = rb_entry(n, struct binder_thread, rb_node);
+		if (thread->transaction_stack)
+			return true;
+	}
+	return false;
+}
+
 static int binder_ioctl_freeze(struct binder_freeze_info *info,
 			       struct binder_proc *target_proc)
 {
···
 			(!target_proc->outstanding_txns),
 			msecs_to_jiffies(info->timeout_ms));
 
-	if (!ret && target_proc->outstanding_txns)
-		ret = -EAGAIN;
+	/* Check pending transactions that wait for reply */
+	if (ret >= 0) {
+		binder_inner_proc_lock(target_proc);
+		if (binder_txns_pending_ilocked(target_proc))
+			ret = -EAGAIN;
+		binder_inner_proc_unlock(target_proc);
+	}
 
 	if (ret < 0) {
 		binder_inner_proc_lock(target_proc);
···
 {
 	struct binder_proc *target_proc;
 	bool found = false;
+	__u32 txns_pending;
 
 	info->sync_recv = 0;
 	info->async_recv = 0;
···
 		if (target_proc->pid == info->pid) {
 			found = true;
 			binder_inner_proc_lock(target_proc);
-			info->sync_recv |= target_proc->sync_recv;
+			txns_pending = binder_txns_pending_ilocked(target_proc);
+			info->sync_recv |= target_proc->sync_recv |
+					(txns_pending << 1);
 			info->async_recv |= target_proc->async_recv;
 			binder_inner_proc_unlock(target_proc);
 		}
+2
drivers/android/binder_internal.h
···
 *                        binder transactions
 *                        (protected by @inner_lock)
 * @sync_recv:            process received sync transactions since last frozen
+ *                        bit 0: received sync transaction after being frozen
+ *                        bit 1: new pending sync transaction during freezing
 *                        (protected by @inner_lock)
 * @async_recv:           process received async transactions since last frozen
 *                        (protected by @inner_lock)
+7
include/uapi/linux/android/binder.h
···
 
 struct binder_frozen_status_info {
 	__u32 pid;
+
+	/* process received sync transactions since last frozen
+	 * bit 0: received sync transaction after being frozen
+	 * bit 1: new pending sync transaction during freezing
+	 */
 	__u32 sync_recv;
+
+	/* process received async transactions since last frozen */
 	__u32 async_recv;
 };
 