Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

thermal: core: Do not fail cdev registration because of invalid initial state

It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
device state to thermal_debug_cdev_add()") causes the ACPI fan driver
to fail probing on some systems, which turns out to be due to the _FST
control method returning an invalid value until _FSL is first evaluated
for the given fan. If this happens, the .get_cur_state() cooling device
callback returns an error and __thermal_cooling_device_register() fails
because it uses that callback after commit 31a0fa0019b0.

Arguably, _FST should not return an invalid value even if it is
evaluated before _FSL, so this may be regarded as a platform firmware
issue, but at the same time it is not a good enough reason for failing
the cooling device registration where the initial cooling device state
is only needed to initialize a thermal debug facility.

Accordingly, modify __thermal_cooling_device_register() to avoid
calling thermal_debug_cdev_add() instead of returning an error if the
initial .get_cur_state() callback invocation fails.

Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com
Reported-by: Laura Nao <laura.nao@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Laura Nao <laura.nao@collabora.com>

+11 -2
drivers/thermal/thermal_core.c
···
 	if (ret)
 		goto out_cdev_type;
 
+	/*
+	 * The cooling device's current state is only needed for debug
+	 * initialization below, so a failure to get it does not cause
+	 * the entire cooling device initialization to fail. However,
+	 * the debug will not work for the device if its initial state
+	 * cannot be determined and drivers are responsible for ensuring
+	 * that this will not happen.
+	 */
 	ret = cdev->ops->get_cur_state(cdev, &current_state);
 	if (ret)
-		goto out_cdev_type;
+		current_state = ULONG_MAX;
 
 	thermal_cooling_device_setup_sysfs(cdev);
 
···
 		return ERR_PTR(ret);
 	}
 
-	thermal_debug_cdev_add(cdev, current_state);
+	if (current_state <= cdev->max_state)
+		thermal_debug_cdev_add(cdev, current_state);
 
 	/* Add 'this' new cdev to the global cdev list */
 	mutex_lock(&thermal_list_lock);
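
The behavioral difference in this change can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct mock_cdev, register_old(), register_new(), the two callbacks and self_check() are hypothetical names invented for illustration, and the debug_added flag stands in for the call to thermal_debug_cdev_add(). It assumes that a failing .get_cur_state() is the only thing that differs between the two code paths.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct thermal_cooling_device (not the kernel type). */
struct mock_cdev {
	unsigned long max_state;
	int (*get_cur_state)(struct mock_cdev *cdev, unsigned long *state);
	bool debug_added;	/* models thermal_debug_cdev_add() having run */
};

/* Models a fan whose state cannot be read yet (e.g. _FST before _FSL). */
static int broken_get_cur_state(struct mock_cdev *cdev, unsigned long *state)
{
	(void)cdev;
	(void)state;
	return -EINVAL;
}

/* Models a well-behaved cooling device that reports state 1. */
static int working_get_cur_state(struct mock_cdev *cdev, unsigned long *state)
{
	(void)cdev;
	*state = 1;
	return 0;
}

/* Before the fix: a .get_cur_state() error aborts the whole registration. */
static int register_old(struct mock_cdev *cdev)
{
	unsigned long current_state;
	int ret = cdev->get_cur_state(cdev, &current_state);

	if (ret)
		return ret;

	cdev->debug_added = true;
	return 0;
}

/* After the fix: the error only disables debug setup for this device. */
static int register_new(struct mock_cdev *cdev)
{
	unsigned long current_state;

	/* ULONG_MAX acts as an "unknown state" sentinel on failure. */
	if (cdev->get_cur_state(cdev, &current_state))
		current_state = ULONG_MAX;

	/* Only a state within [0, max_state] is passed to the debug code. */
	if (current_state <= cdev->max_state)
		cdev->debug_added = true;

	return 0;
}

/* Exercises both paths with a broken and a working device. */
static void self_check(void)
{
	struct mock_cdev bad = { .max_state = 3, .get_cur_state = broken_get_cur_state };
	struct mock_cdev good = { .max_state = 3, .get_cur_state = working_get_cur_state };

	assert(register_old(&bad) == -EINVAL);	/* old code: registration fails */
	assert(register_new(&bad) == 0);	/* new code: registration succeeds */
	assert(!bad.debug_added);		/* ...but without debug support */

	assert(register_new(&good) == 0);
	assert(good.debug_added);		/* valid state: debug is set up */
}
```

Calling self_check() from a test program exercises both paths. The sentinel keeps the later check to a single comparison, since ULONG_MAX is larger than any max_state a driver would realistically report.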