Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

wifi: mac80211: fix an expired vs. cancel race in roc

When the remain on channel is removed at the time it should
have expired, we have a race: the driver could be handling
the flow of the expiration while mac80211 is cancelling
that very same remain on channel request.

This wouldn't be a problem in itself, but since mac80211
can send the next request to the driver in the cancellation
flow, we can get to the following situation:

CPU0                                    CPU1
expiration of roc in driver
ieee80211_remain_on_channel_expired()
                                        Cancellation of the roc
schedules a worker (hw_roc_done)
                                        Add next roc
hw_roc_done_wk runs and ends
the second roc prematurely.

Since, by design, there is only one single request sent to the
driver at a time, we can safely assume that after the cancel()
request returns from the driver, we should not handle any worker
that handles the expiration of the request.

Cancel the hw_roc_done worker after the cancellation to make
sure we start the next one with a clean slate.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.4e4469be20ac.Iab0525f5cc4698acf23eab98b8b1eec02099cde0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Authored by Emmanuel Grumbach and committed by Johannes Berg
9ad08fb1 271d14b3

+17
net/mac80211/offchannel.c
@@ -717,6 +717,23 @@
 		return ret;
 	}
 
+	/*
+	 * We could be racing against the notification from the driver:
+	 *  + driver is handling the notification on CPU0
+	 *  + user space is cancelling the remain on channel and
+	 *    schedules the hw_roc_done worker.
+	 *
+	 * Now hw_roc_done might start to run after the next roc will
+	 * start and mac80211 will think that this second roc has
+	 * ended prematurely.
+	 * Cancel the work to make sure that all the pending workers
+	 * have completed execution.
+	 * Note that this assumes that by the time the driver returns
+	 * from drv_cancel_remain_on_channel, it has completed all
+	 * the processing of related notifications.
+	 */
+	wiphy_work_cancel(local->hw.wiphy, &local->hw_roc_done);
+
 	/* TODO:
 	 * if multiple items were combined here then we really shouldn't
 	 * cancel them all - we should wait for as much time as needed