Merge branch 'backup-thermal-shutdown' into next

+101 -1
+21
Documentation/thermal/sysfs-api.txt
···
 This function serves as an arbitrator to set the state of a cooling
 device. It sets the cooling device to the deepest cooling state if
 possible.
+
+6. thermal_emergency_poweroff:
+
+On an event of critical trip temperature crossing, the thermal framework
+allows the system to shut down gracefully by calling orderly_poweroff().
+In the event of a failure of orderly_poweroff() to shut down the system,
+we are in danger of keeping the system alive at undesirably high
+temperatures. To mitigate this high-risk scenario, we program a work
+queue to fire after a pre-determined number of milliseconds to start
+an emergency shutdown of the device using the kernel_power_off()
+function. If kernel_power_off() also fails, emergency_restart() is
+called as the last resort.
+
+The delay should be carefully profiled so as to give adequate time for
+orderly_poweroff(). In case of failure of an orderly_poweroff(), the
+emergency poweroff kicks in after the delay has elapsed and shuts down
+the system.
+
+If set to 0, emergency poweroff will not be supported. So a carefully
+profiled non-zero positive value is a must for emergency poweroff to be
+triggered.
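The escalation order described in the documentation hunk above can be sketched in user space as an ordered list of attempts. This is only an illustration of the fallback idea, not kernel code: `struct shutdown_step`, `run_shutdown_chain()`, `step_fails()` and `step_succeeds()` are hypothetical names, and the real patch separates the first and second step by a delayed work item rather than a loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shutdown call: returns true if the
 * simulated machine actually went down. */
typedef bool (*shutdown_step_fn)(void);

struct shutdown_step {
	const char *name;         /* label of the step, for reporting */
	shutdown_step_fn attempt; /* the simulated shutdown attempt */
};

/* Try each step in order and return the name of the first one that
 * succeeds, or NULL if every step failed. */
static const char *run_shutdown_chain(const struct shutdown_step *steps,
				      size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (steps[i].attempt())
			return steps[i].name;
	}
	return NULL;
}

/* Simulated outcomes used to exercise the chain. */
static bool step_fails(void) { return false; }
static bool step_succeeds(void) { return true; }
```

With three steps labelled after `orderly_poweroff`, `kernel_power_off` and `emergency_restart`, where only the last one succeeds, `run_shutdown_chain()` returns the third label, which mirrors the "worst case" path in the text.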
+17
drivers/thermal/Kconfig
···

 if THERMAL

+config THERMAL_EMERGENCY_POWEROFF_DELAY_MS
+	int "Emergency poweroff delay in milli-seconds"
+	depends on THERMAL
+	default 0
+	help
+	  Thermal subsystem will issue a graceful shutdown when
+	  critical temperatures are reached using orderly_poweroff(). In
+	  case of failure of an orderly_poweroff(), the thermal emergency
+	  poweroff kicks in after a delay has elapsed and shuts down the system.
+	  This config is the number of milliseconds to delay before emergency
+	  poweroff kicks in. Similarly to the critical trip point,
+	  the delay should be carefully profiled so as to give adequate
+	  time for orderly_poweroff() to finish on regular execution.
+	  If set to 0, emergency poweroff will not be supported.
+
+	  If in doubt, leave as 0.
+
 config THERMAL_HWMON
 	bool
 	prompt "Expose thermal sensors as hwmon device"
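The "0 disables the feature" rule from the help text can be shown with a small user-space sketch. The macro name matches the Kconfig symbol above, but the fallback `#define` and the helper `emergency_poweroff_enabled()` are assumptions made for illustration; in a real kernel build the value comes from the generated configuration.

```c
#include <stdbool.h>

/* Fallback for this stand-alone sketch only; it mirrors the Kconfig
 * default of 0 (feature disabled). */
#ifndef CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS
#define CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS 0
#endif

/* Hypothetical helper: a backup poweroff would only be scheduled for a
 * strictly positive delay, matching the `<= 0` check in the patch. */
static bool emergency_poweroff_enabled(int delay_ms)
{
	return delay_ms > 0;
}
```

Calling `emergency_poweroff_enabled(CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS)` with the default configuration yields false, so a platform has to opt in with a profiled positive value.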
+63 -1
drivers/thermal/thermal_core.c
···

 static DEFINE_MUTEX(thermal_list_lock);
 static DEFINE_MUTEX(thermal_governor_lock);
+static DEFINE_MUTEX(poweroff_lock);

 static atomic_t in_suspend;
+static bool power_off_triggered;

 static struct thermal_governor *def_governor;

···
 		def_governor->throttle(tz, trip);
 }

+/**
+ * thermal_emergency_poweroff_func - emergency poweroff work after a known delay
+ * @work: work_struct associated with the emergency poweroff function
+ *
+ * This function is called in very critical situations to force
+ * a kernel poweroff after a configurable timeout value.
+ */
+static void thermal_emergency_poweroff_func(struct work_struct *work)
+{
+	/*
+	 * We have reached here after the emergency thermal shutdown
+	 * waiting period has expired. This means orderly_poweroff has
+	 * not been able to shut off the system for some reason.
+	 * Try to shut down the system immediately using kernel_power_off
+	 * if populated
+	 */
+	WARN(1, "Attempting kernel_power_off: Temperature too high\n");
+	kernel_power_off();
+
+	/*
+	 * Worst of the worst case: trigger emergency restart
+	 */
+	WARN(1, "Attempting emergency_restart: Temperature too high\n");
+	emergency_restart();
+}
+
+static DECLARE_DELAYED_WORK(thermal_emergency_poweroff_work,
+			    thermal_emergency_poweroff_func);
+
+/**
+ * thermal_emergency_poweroff - Trigger an emergency system poweroff
+ *
+ * This may be called from any critical situation to trigger a system shutdown
+ * after a known period of time. By default this is not scheduled.
+ */
+void thermal_emergency_poweroff(void)
+{
+	int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS;
+	/*
+	 * poweroff_delay_ms must be a carefully profiled positive value.
+	 * It's a must for thermal_emergency_poweroff_work to be scheduled
+	 */
+	if (poweroff_delay_ms <= 0)
+		return;
+	schedule_delayed_work(&thermal_emergency_poweroff_work,
+			      msecs_to_jiffies(poweroff_delay_ms));
+}
+
 static void handle_critical_trips(struct thermal_zone_device *tz,
 				  int trip, enum thermal_trip_type trip_type)
 {
···
 		dev_emerg(&tz->device,
 			  "critical temperature reached(%d C),shutting down\n",
 			  tz->temperature / 1000);
-		orderly_poweroff(true);
+		mutex_lock(&poweroff_lock);
+		if (!power_off_triggered) {
+			/*
+			 * Queue a backup emergency shutdown in the event of
+			 * orderly_poweroff failure
+			 */
+			thermal_emergency_poweroff();
+			orderly_poweroff(true);
+			power_off_triggered = true;
+		}
+		mutex_unlock(&poweroff_lock);
 	}
 }

···
 {
 	int result;

+	mutex_init(&poweroff_lock);
 	result = thermal_register_governors();
 	if (result)
 		goto error;
···
 	ida_destroy(&thermal_cdev_ida);
 	mutex_destroy(&thermal_list_lock);
 	mutex_destroy(&thermal_governor_lock);
+	mutex_destroy(&poweroff_lock);
 	return result;
 }

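The `poweroff_lock` / `power_off_triggered` pair in `handle_critical_trips()` makes sure the shutdown sequence is started only once, even if several thermal zones cross their critical trip. A rough user-space analogue of that guard, using a pthread mutex in place of the kernel mutex, might look like the sketch below. `trigger_poweroff_once()` and `shutdown_requests` are hypothetical names introduced for the example; the counter merely stands in for scheduling the backup work and calling `orderly_poweroff()`.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t poweroff_lock = PTHREAD_MUTEX_INITIALIZER;
static bool power_off_triggered;
static int shutdown_requests; /* how many times the guarded body ran */

/* Returns true only for the caller that actually started the shutdown;
 * every later caller finds the flag set and does nothing. */
static bool trigger_poweroff_once(void)
{
	bool started = false;

	pthread_mutex_lock(&poweroff_lock);
	if (!power_off_triggered) {
		/* Stand-in for queuing the backup poweroff and then
		 * requesting the orderly poweroff. */
		shutdown_requests++;
		power_off_triggered = true;
		started = true;
	}
	pthread_mutex_unlock(&poweroff_lock);
	return started;
}
```

Calling `trigger_poweroff_once()` twice runs the guarded body a single time: the first call returns true and the second returns false, with `shutdown_requests` left at 1.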