Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
context, other process context code should disable irq or bottom-half
before acquire the same lock, otherwise deadlock could happen if the
timer preempt the execution while the lock is held in process context
on the same CPU.

Possible deadlock scenario
bcm_vk_open()
-> bcm_vk_get_ctx()
-> spin_lock(&vk->ctx_lock)
<timer iterrupt>
-> bcm_vk_hb_poll()
-> bcm_vk_blk_drv_access()
-> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)

This flaw was found using an experimental static analysis tool we are
developing for irq-related deadlock, which reported the following
warning when analyzing the linux kernel 6.4-rc7 release.

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_ioctl
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_ioctl
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_open
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216

[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_release
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306

As suggested by Arnd, the tentative patch fix the potential deadlocks
by replacing the timer with delay workqueue. x86_64 allyesconfig using
GCC shows no new warning. Note that no runtime testing was performed
due to no device on hand.

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Tested-by: Desmond Yan <desmond.branden@broadcom.com>
Tested-by: Desmond Yan <desmond.yan@broadcom.com>
Link: https://lore.kernel.org/r/20230629182941.13045-1-dg573847474@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

authored by

Chengfeng Ye and committed by
Greg Kroah-Hartman
1bae5c0e acdbfa04

+8 -8
+1 -1
drivers/misc/bcm-vk/bcm_vk.h
··· 340 340 }; 341 341 342 342 struct bcm_vk_hb_ctrl { 343 - struct timer_list timer; 343 + struct delayed_work work; 344 344 u32 last_uptime; 345 345 u32 lost_cnt; 346 346 };
+7 -7
drivers/misc/bcm-vk/bcm_vk_msg.c
··· 137 137 #define BCM_VK_HB_TIMER_VALUE (BCM_VK_HB_TIMER_S * HZ) 138 138 #define BCM_VK_HB_LOST_MAX (27 / BCM_VK_HB_TIMER_S) 139 139 140 - static void bcm_vk_hb_poll(struct timer_list *t) 140 + static void bcm_vk_hb_poll(struct work_struct *work) 141 141 { 142 142 u32 uptime_s; 143 - struct bcm_vk_hb_ctrl *hb = container_of(t, struct bcm_vk_hb_ctrl, 144 - timer); 143 + struct bcm_vk_hb_ctrl *hb = container_of(to_delayed_work(work), struct bcm_vk_hb_ctrl, 144 + work); 145 145 struct bcm_vk *vk = container_of(hb, struct bcm_vk, hb_ctrl); 146 146 147 147 if (bcm_vk_drv_access_ok(vk) && hb_mon_is_on()) { ··· 177 177 bcm_vk_set_host_alert(vk, ERR_LOG_HOST_HB_FAIL); 178 178 } 179 179 /* re-arm timer */ 180 - mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE); 180 + schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE); 181 181 } 182 182 183 183 void bcm_vk_hb_init(struct bcm_vk *vk) 184 184 { 185 185 struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl; 186 186 187 - timer_setup(&hb->timer, bcm_vk_hb_poll, 0); 188 - mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE); 187 + INIT_DELAYED_WORK(&hb->work, bcm_vk_hb_poll); 188 + schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE); 189 189 } 190 190 191 191 void bcm_vk_hb_deinit(struct bcm_vk *vk) 192 192 { 193 193 struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl; 194 194 195 - del_timer(&hb->timer); 195 + cancel_delayed_work_sync(&hb->work); 196 196 } 197 197 198 198 static void bcm_vk_msgid_bitmap_clear(struct bcm_vk *vk,