Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

thermal: power_allocator: fix one race condition issue for thermal_instances list

When invoking allow_maximum_power and traverse tz->thermal_instances,
we should grab thermal_zone_device->lock to avoid race condition. For
example, during the system reboot, if the mali GPU device implements
device shutdown callback and unregister GPU devfreq cooling device,
the deleted list head may be accessed to cause panic, as the following
log shows:

[ 33.551070] c3 25 (kworker/3:0) Unable to handle kernel paging request at virtual address dead000000000070
[ 33.566708] c3 25 (kworker/3:0) pgd = ffffffc0ed290000
[ 33.572071] c3 25 (kworker/3:0) [dead000000000070] *pgd=00000001ed292003, *pud=00000001ed292003, *pmd=0000000000000000
[ 33.581515] c3 25 (kworker/3:0) Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 33.599761] c3 25 (kworker/3:0) CPU: 3 PID: 25 Comm: kworker/3:0 Not tainted 4.4.35+ #912
[ 33.614137] c3 25 (kworker/3:0) Workqueue: events_freezable thermal_zone_device_check
[ 33.620245] c3 25 (kworker/3:0) task: ffffffc0f32e4200 ti: ffffffc0f32f0000 task.ti: ffffffc0f32f0000
[ 33.629466] c3 25 (kworker/3:0) PC is at power_allocator_throttle+0x7c8/0x8a4
[ 33.636609] c3 25 (kworker/3:0) LR is at power_allocator_throttle+0x808/0x8a4
[ 33.643742] c3 25 (kworker/3:0) pc : [<ffffff8008683dd0>] lr : [<ffffff8008683e10>] pstate: 20000145
[ 33.652874] c3 25 (kworker/3:0) sp : ffffffc0f32f3bb0
[ 34.468519] c3 25 (kworker/3:0) Process kworker/3:0 (pid: 25, stack limit = 0xffffffc0f32f0020)
[ 34.477220] c3 25 (kworker/3:0) Stack: (0xffffffc0f32f3bb0 to 0xffffffc0f32f4000)
[ 34.819822] c3 25 (kworker/3:0) Call trace:
[ 34.824021] c3 25 (kworker/3:0) Exception stack(0xffffffc0f32f39c0 to 0xffffffc0f32f3af0)
[ 34.924993] c3 25 (kworker/3:0) [<ffffff8008683dd0>] power_allocator_throttle+0x7c8/0x8a4
[ 34.933184] c3 25 (kworker/3:0) [<ffffff80086807f4>] handle_thermal_trip.part.25+0x70/0x224
[ 34.941545] c3 25 (kworker/3:0) [<ffffff8008680a68>] thermal_zone_device_update+0xc0/0x20c
[ 34.949818] c3 25 (kworker/3:0) [<ffffff8008680bd4>] thermal_zone_device_check+0x20/0x2c
[ 34.957924] c3 25 (kworker/3:0) [<ffffff80080b93a4>] process_one_work+0x168/0x458
[ 34.965414] c3 25 (kworker/3:0) [<ffffff80080ba068>] worker_thread+0x13c/0x4b4
[ 34.972650] c3 25 (kworker/3:0) [<ffffff80080c0a4c>] kthread+0xe8/0xfc
[ 34.979187] c3 25 (kworker/3:0) [<ffffff8008084e90>] ret_from_fork+0x10/0x40
[ 34.986244] c3 25 (kworker/3:0) Code: f9405e73 eb1302bf d102e273 54ffc460 (b9402a61)
[ 34.994339] c3 25 (kworker/3:0) ---[ end trace 32057901e3b7e1db ]---

Signed-off-by: Yi Zeng <yizeng@asrmicro.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>

authored by

Yi Zeng and committed by
Zhang Rui
a5de11d6 eb42612f

+2
+2
drivers/thermal/power_allocator.c
··· 523 523 struct thermal_instance *instance; 524 524 struct power_allocator_params *params = tz->governor_data; 525 525 526 + mutex_lock(&tz->lock); 526 527 list_for_each_entry(instance, &tz->thermal_instances, tz_node) { 527 528 if ((instance->trip != params->trip_max_desired_temperature) || 528 529 (!cdev_is_power_actor(instance->cdev))) ··· 535 534 mutex_unlock(&instance->cdev->lock); 536 535 thermal_cdev_update(instance->cdev); 537 536 } 537 + mutex_unlock(&tz->lock); 538 538 } 539 539 540 540 /**