Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

firmware: fix batched requests - send wake up on failure on direct lookups

Fix batched requests from waiting forever on failure.

The firmware API batched requests feature has been broken since the API call
request_firmware_direct() was introduced in commit bba3a87e982ad ("firmware:
Introduce request_firmware_direct()"), added in v3.14, *iff* the firmware
being requested was not present in *certain kernel builds* [0].

When no firmware is found, the worker that goes on to finish never informs
the queued waiters, so any batched request stalls for what seems to be
forever (MAX_SCHEDULE_TIMEOUT). Sadly, a reboot will also stall, as the
reboot notifier was only designed to kill custom fallback workers. To the
user the issue looks like a type of soft lockup; what *actually* happens
under the hood is a wait call that never completes because we failed to
issue a completion on error.
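
The stall can be illustrated with a small userspace model. This is only a
sketch under assumptions: every `model_*` name is hypothetical, a boolean
stands in for the kernel completion, and `-EAGAIN` is used here merely to mark
"the waiter would still be asleep". It is not the code in firmware_class.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the firmware state; not kernel code. */
enum model_status {
	MODEL_STATUS_UNKNOWN,
	MODEL_STATUS_LOADING,
	MODEL_STATUS_DONE,
	MODEL_STATUS_ABORTED,
};

struct model_state {
	enum model_status status;
	bool completed;	/* stands in for the completion waiters sleep on */
};

/* Record a new status; only the terminal states release the waiters. */
static void model_state_set(struct model_state *st, enum model_status status)
{
	assert(st != NULL);
	st->status = status;
	if (status == MODEL_STATUS_DONE || status == MODEL_STATUS_ABORTED)
		st->completed = true;
}

/*
 * What a batched waiter would observe: -EAGAIN while it would still be
 * asleep, -ENOENT once the load was aborted, 0 once firmware is available.
 */
static int model_state_poll(const struct model_state *st)
{
	assert(st != NULL);
	if (!st->completed)
		return -EAGAIN;
	return st->status == MODEL_STATUS_ABORTED ? -ENOENT : 0;
}
```

In this model a failed lookup that never calls
`model_state_set(st, MODEL_STATUS_ABORTED)` leaves every poll at `-EAGAIN`,
which with an unbounded timeout corresponds to the stall described above.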

For device drivers with optional firmware schemes (e.g., Intel iwlwifi, or
Netronome -- even though it uses request_firmware() and not
request_firmware_direct()), this could mean that when you boot a system with
multiple cards the firmware will seem to never load, or that the card is
just not responsive even after driver initialization. Due to possible
differences in scheduling this will not always trigger -- one would need
to ensure that multiple requests are in place at the right time for this
to happen, and release_firmware() must not be called prior to any other
incoming request. The complexity may not be worth supporting batched
requests in the future, given that the wait mechanism is otherwise only
used for the fallback mechanism. We'll keep it for now and just fix it.
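
The timing condition for batching can be sketched with a toy cache. This is
an illustration under assumptions, not the kernel's data structure: the
`sketch_*` names, the fixed slot count and the file name used in any caller
are hypothetical. The return convention of 1 for "need to load" follows the
`return 1; /* need to load */` context line in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CACHE_SLOTS 4

struct sketch_buf {
	const char *name;	/* firmware file name, used as the lookup key */
	int refs;		/* one reference per outstanding request; 0 = free */
};

struct sketch_cache {
	struct sketch_buf slots[SKETCH_CACHE_SLOTS];
};

/*
 * Returns 1 if the caller created the entry and must load the firmware,
 * 0 if it joined a live entry (a batched request that would wait),
 * -1 if no slot is free.
 */
static int sketch_lookup(struct sketch_cache *c, const char *name,
			 struct sketch_buf **out)
{
	struct sketch_buf *free_slot = NULL;
	size_t i;

	assert(c && name && out);
	for (i = 0; i < SKETCH_CACHE_SLOTS; i++) {
		struct sketch_buf *b = &c->slots[i];

		if (b->refs == 0) {
			if (!free_slot)
				free_slot = b;
			continue;
		}
		if (strcmp(b->name, name) == 0) {
			b->refs++;	/* share the in-flight buffer */
			*out = b;
			return 0;
		}
	}
	if (!free_slot)
		return -1;
	free_slot->name = name;
	free_slot->refs = 1;
	*out = free_slot;
	return 1;
}

/* Drop one reference; the slot becomes reusable when refs reaches 0. */
static void sketch_release(struct sketch_buf *b)
{
	assert(b && b->refs > 0);
	b->refs--;
}
```

A second lookup only joins the batch while the first request still holds its
reference. Once every holder has released, the next lookup starts a fresh
load, which is why the stall depends on when the requests arrive.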

It's reported that, at least with the Intel WiFi cards on one system, this
issue was cropping up on 50% of the boots [0].

Before this commit, batched request testing revealed:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y

Most common Linux distribution setup.

API-type                              no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware()                    FAIL              OK
request_firmware_direct()             FAIL              OK
request_firmware_nowait(uevent=true)  FAIL              OK
request_firmware_nowait(uevent=false) FAIL              OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n

Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.

API-type                              no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware()                    FAIL              OK
request_firmware_direct()             FAIL              OK
request_firmware_nowait(uevent=true)  FAIL              OK
request_firmware_nowait(uevent=false) FAIL              OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y

Google Android setup.

API-type                              no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware()                    OK                OK
request_firmware_direct()             FAIL              OK
request_firmware_nowait(uevent=true)  OK                OK
request_firmware_nowait(uevent=false) OK                OK
============================================================================

After this commit, batched request testing results:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y

Most common Linux distribution setup.

API-type                              no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware()                    OK                OK
request_firmware_direct()             OK                OK
request_firmware_nowait(uevent=true)  OK                OK
request_firmware_nowait(uevent=false) OK                OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n

Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.

API-type                              no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware()                    OK                OK
request_firmware_direct()             OK                OK
request_firmware_nowait(uevent=true)  OK                OK
request_firmware_nowait(uevent=false) OK                OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y

Google Android setup.

API-type                              no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware()                    OK                OK
request_firmware_direct()             OK                OK
request_firmware_nowait(uevent=true)  OK                OK
request_firmware_nowait(uevent=false) OK                OK
============================================================================

[0] https://bugzilla.kernel.org/show_bug.cgi?id=195477

Cc: stable <stable@vger.kernel.org> # v3.14
Fixes: bba3a87e982ad ("firmware: Introduce request_firmware_direct()")
Reported-by: Nicolas <nbroeking@me.com>
Reported-by: John Ewalt <jewalt@lgsinnovations.com>
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Authored by Luis R. Rodriguez and committed by Greg Kroah-Hartman
90d41e74 e44565f6

+30 -8
drivers/base/firmware_class.c
···
 	__fw_state_set(fw_st, FW_STATUS_LOADING)
 #define fw_state_done(fw_st) \
 	__fw_state_set(fw_st, FW_STATUS_DONE)
+#define fw_state_aborted(fw_st) \
+	__fw_state_set(fw_st, FW_STATUS_ABORTED)
 #define fw_state_wait(fw_st) \
 	__fw_state_wait_common(fw_st, MAX_SCHEDULE_TIMEOUT)
-
-#ifndef CONFIG_FW_LOADER_USER_HELPER
-
-#define fw_state_is_aborted(fw_st) false
-
-#else /* CONFIG_FW_LOADER_USER_HELPER */
 
 static int __fw_state_check(struct fw_state *fw_st, enum fw_status status)
 {
 	return fw_st->status == status;
 }
+
+#define fw_state_is_aborted(fw_st) \
+	__fw_state_check(fw_st, FW_STATUS_ABORTED)
+
+#ifdef CONFIG_FW_LOADER_USER_HELPER
 
 #define fw_state_aborted(fw_st) \
 	__fw_state_set(fw_st, FW_STATUS_ABORTED)
···
 	__fw_state_check(fw_st, FW_STATUS_DONE)
 #define fw_state_is_loading(fw_st) \
 	__fw_state_check(fw_st, FW_STATUS_LOADING)
-#define fw_state_is_aborted(fw_st) \
-	__fw_state_check(fw_st, FW_STATUS_ABORTED)
 #define fw_state_wait_timeout(fw_st, timeout) \
 	__fw_state_wait_common(fw_st, timeout)
 
···
 	return 1; /* need to load */
 }
 
+/*
+ * Batched requests need only one wake, we need to do this step last due to the
+ * fallback mechanism. The buf is protected with kref_get(), and it won't be
+ * released until the last user calls release_firmware().
+ *
+ * Failed batched requests are possible as well, in such cases we just share
+ * the struct firmware_buf and won't release it until all requests are woken
+ * and have gone through this same path.
+ */
+static void fw_abort_batch_reqs(struct firmware *fw)
+{
+	struct firmware_buf *buf;
+
+	/* Loaded directly? */
+	if (!fw || !fw->priv)
+		return;
+
+	buf = fw->priv;
+	if (!fw_state_is_aborted(&buf->fw_st))
+		fw_state_aborted(&buf->fw_st);
+}
+
 /* called from request_firmware() and request_firmware_work_func() */
 static int
 _request_firmware(const struct firmware **firmware_p, const char *name,
···
 
 out:
 	if (ret < 0) {
+		fw_abort_batch_reqs(fw);
 		release_firmware(fw);
 		fw = NULL;
 	}
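
The "only one wake" guard in the new helper can be modelled outside the
kernel. The sketch below is hypothetical throughout: the `guard_*` names, the
wake-up counter and the plain integer reference count are stand-ins chosen for
illustration, and it assumes each failing request runs the same error path of
"abort, then drop the reference".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct guard_buf {
	bool aborted;
	int wakeups;	/* how many times waiters were signalled */
	int refs;	/* outstanding requests sharing this buffer */
};

struct guard_fw {
	struct guard_buf *priv;	/* NULL when no shared buffer is attached */
};

/* Signal failure to batched waiters at most once; tolerate missing pieces. */
static void guard_abort_batch_reqs(struct guard_fw *fw)
{
	struct guard_buf *buf;

	if (!fw || !fw->priv)
		return;

	buf = fw->priv;
	if (!buf->aborted) {
		buf->aborted = true;
		buf->wakeups++;
	}
}

/* Error path of one request: wake the batch first, then drop the reference. */
static void guard_fail_request(struct guard_fw *fw)
{
	guard_abort_batch_reqs(fw);
	if (fw && fw->priv) {
		assert(fw->priv->refs > 0);
		fw->priv->refs--;
	}
}
```

With two failing requests sharing one buffer, the first call signals the
waiters and the second finds the buffer already aborted, so the counter stays
at one while both references are dropped.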