[PATCH] hwmon: abituguru timeout fixes

This patch contains 2 sets of fixes for the abituguru:
1) Much improved timeout handling, drastically reducing the number of
timeout errors on some motherboards
2) Fix the exit paths in the bank1 sensor type detect code to always
restore the original settings even on an error. Without this, our
special test settings could remain in place, seriously confusing the
system BIOS's setup menu.

Both are very much related and are must haves, to avoid messing up the
uguru CMOS settings.

Detailed changes:
- Much improved timeout / wait for status handling. Many thanks to Sunil
Kumar, for all his testing, ideas and patches! The code now first busy
waits, polling the uguru for the expected status as this usually
succeeds pretty quickly (within 90 reads). To avoid unnecessary CPU burn
in timeout conditions, the amount of busy waiting has been halved from
previous versions (120 tries instead of 250). This is not a problem,
because this version goes to sleep for 1 jiffy after 120 attempts and
then tries again; it repeats this sleep-and-retry 5 times before
finally giving up. This (almost?) completely removes the timeout errors
some people have seen regularly. Apparently some older uguru versions
are sometimes distracted for a (relatively) long time; this change
handles that.
- These timeout errors not only occur in the sending address part of
reading the uguru but also in the wait for read state, so errors in
this state are now handled as retryable just like send address state
errors and are only logged and reported to userspace if 3 consecutive
tries fail.
- Fix a very nasty bug in the bank1 sensor type detection code, where it
would not restore the original settings in any of the error paths!
- Since not successfully restoring the original settings can seriously
confuse the system BIOS (hang when entering the relevant setup menu),
we now try restoring them 3 times before giving up.
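
As a rough illustration of the wait strategy described in the first item
above, here is a minimal userspace sketch. It is not the driver code: struct
fake_dev, fake_status_ok() and pretend_sleep() are hypothetical stand-ins for
the ISA status read and for the kernel sleep, and the constants only mirror
the values named in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define WAIT_TIMEOUT		125	/* total number of status reads to try */
#define WAIT_TIMEOUT_SLEEP	5	/* sleep before each of the last N tries */

/* Hypothetical device model: the expected status shows up on read number
   ready_after; reads and sleeps are counted so the behaviour is visible. */
struct fake_dev {
	int reads;
	int ready_after;
	int sleeps;
};

/* Stand-in for polling the status port once. */
static int fake_status_ok(struct fake_dev *d)
{
	d->reads++;
	return d->reads >= d->ready_after;
}

/* Stand-in for the short sleep; a real driver would yield the CPU here. */
static void pretend_sleep(struct fake_dev *d)
{
	d->sleeps++;
}

/* Busy wait first, then sleep before each of the last few tries. */
static int wait_for_status(struct fake_dev *d)
{
	int timeout = WAIT_TIMEOUT;

	assert(d != NULL);
	while (!fake_status_ok(d)) {
		timeout--;
		if (timeout == 0)
			return -EBUSY;
		if (timeout <= WAIT_TIMEOUT_SLEEP)
			pretend_sleep(d);
	}
	return 0;
}
```

With these values a device that answers within the usual 30 - 90 reads never
triggers a sleep, and only a device that stays silent for 120 reads pays for
the sleeps.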

Signed-off-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

authored by Hans de Goede and committed by Greg Kroah-Hartman faf9b616 4801bc25

+61 -38
drivers/hwmon/abituguru.c
···
 #include <linux/jiffies.h>
 #include <linux/mutex.h>
 #include <linux/err.h>
+#include <linux/delay.h>
 #include <linux/platform_device.h>
 #include <linux/hwmon.h>
 #include <linux/hwmon-sysfs.h>
···
 #define ABIT_UGURU_IN_SENSOR 0
 #define ABIT_UGURU_TEMP_SENSOR 1
 #define ABIT_UGURU_NC 2
-/* Timeouts / Retries, if these turn out to need a lot of fiddling we could
-   convert them to params. */
-/* 250 was determined by trial and error, 200 works most of the time, but not
-   always. I assume this is cpu-speed independent, since the ISA-bus and not
-   the CPU should be the bottleneck. Note that 250 sometimes is still not
-   enough (only reported on AN7 mb) this is handled by a higher layer. */
-#define ABIT_UGURU_WAIT_TIMEOUT 250
+/* In many cases we need to wait for the uGuru to reach a certain status, most
+   of the time it will reach this status within 30 - 90 ISA reads, and thus we
+   can best busy wait. This define gives the total amount of reads to try. */
+#define ABIT_UGURU_WAIT_TIMEOUT 125
+/* However sometimes older versions of the uGuru seem to be distracted and they
+   do not respond for a long time. To handle this we sleep before each of the
+   last ABIT_UGURU_WAIT_TIMEOUT_SLEEP tries. */
+#define ABIT_UGURU_WAIT_TIMEOUT_SLEEP 5
 /* Normally all expected status in abituguru_ready, are reported after the
-   first read, but sometimes not and we need to poll, 5 polls was not enough
-   50 sofar is. */
-#define ABIT_UGURU_READY_TIMEOUT 50
+   first read, but sometimes not and we need to poll. */
+#define ABIT_UGURU_READY_TIMEOUT 5
 /* Maximum 3 retries on timedout reads/writes, delay 200 ms before retrying */
 #define ABIT_UGURU_MAX_RETRIES 3
 #define ABIT_UGURU_RETRY_DELAY (HZ/5)
···
 		timeout--;
 		if (timeout == 0)
 			return -EBUSY;
+		/* sleep a bit before our last few tries, see the comment on
+		   this where ABIT_UGURU_WAIT_TIMEOUT_SLEEP is defined. */
+		if (timeout <= ABIT_UGURU_WAIT_TIMEOUT_SLEEP)
+			msleep(0);
 	}
 	return 0;
 }
···
 			   "CMD reg does not hold 0xAC after ready command\n");
 			return -EIO;
 		}
+		msleep(0);
 	}

 	/* After this the ABIT_UGURU_DATA port should contain
···
 				"state != more input after ready command\n");
 			return -EIO;
 		}
+		msleep(0);
 	}

 	data->uguru_ready = 1;
···
 	/* And read the data */
 	for (i = 0; i < count; i++) {
 		if (abituguru_wait(data, ABIT_UGURU_STATUS_READ)) {
-			ABIT_UGURU_DEBUG(1, "timeout exceeded waiting for "
+			ABIT_UGURU_DEBUG(retries ? 1 : 3,
+				"timeout exceeded waiting for "
 				"read state (bank: %d, sensor: %d)\n",
 				(int)bank_addr, (int)sensor_addr);
 			break;
···
 static int abituguru_write(struct abituguru_data *data,
 	u8 bank_addr, u8 sensor_addr, u8 *buf, int count)
 {
-	int i;
+	/* We use the ready timeout as we have to wait for 0xAC just like the
+	   ready function */
+	int i, timeout = ABIT_UGURU_READY_TIMEOUT;

 	/* Send the address */
 	i = abituguru_send_address(data, bank_addr, sensor_addr,
···
 	}

 	/* Now we need to wait till the chip is ready to be read again,
-	   don't ask why */
+	   so that we can read 0xAC as confirmation that our write has
+	   succeeded. */
 	if (abituguru_wait(data, ABIT_UGURU_STATUS_READ)) {
 		ABIT_UGURU_DEBUG(1, "timeout exceeded waiting for read state "
 			"after write (bank: %d, sensor: %d)\n", (int)bank_addr,
···
 	}

 	/* Cmd port MUST be read now and should contain 0xAC */
-	if (inb_p(data->addr + ABIT_UGURU_CMD) != 0xAC) {
-		ABIT_UGURU_DEBUG(1, "CMD reg does not hold 0xAC after write "
-			"(bank: %d, sensor: %d)\n", (int)bank_addr,
-			(int)sensor_addr);
-		return -EIO;
+	while (inb_p(data->addr + ABIT_UGURU_CMD) != 0xAC) {
+		timeout--;
+		if (timeout == 0) {
+			ABIT_UGURU_DEBUG(1, "CMD reg does not hold 0xAC after "
+				"write (bank: %d, sensor: %d)\n",
+				(int)bank_addr, (int)sensor_addr);
+			return -EIO;
+		}
+		msleep(0);
 	}

 	/* Last put the chip back in ready state */
···
 	u8 sensor_addr)
 {
 	u8 val, buf[3];
-	int ret = ABIT_UGURU_NC;
+	int i, ret = -ENODEV; /* error is the most common used retval :| */

 	/* If overriden by the user return the user selected type */
 	if (bank1_types[sensor_addr] >= ABIT_UGURU_IN_SENSOR &&
···
 	buf[2] = 250;
 	if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2, sensor_addr,
 			buf, 3) != 3)
-		return -ENODEV;
+		goto abituguru_detect_bank1_sensor_type_exit;
 	/* Now we need 20 ms to give the uguru time to read the sensors
 	   and raise a voltage alarm */
 	set_current_state(TASK_UNINTERRUPTIBLE);
···
 	/* Check for alarm and check the alarm is a volt low alarm. */
 	if (abituguru_read(data, ABIT_UGURU_ALARM_BANK, 0, buf, 3,
 			ABIT_UGURU_MAX_RETRIES) != 3)
-		return -ENODEV;
+		goto abituguru_detect_bank1_sensor_type_exit;
 	if (buf[sensor_addr/8] & (0x01 << (sensor_addr % 8))) {
 		if (abituguru_read(data, ABIT_UGURU_SENSOR_BANK1 + 1,
 				sensor_addr, buf, 3,
 				ABIT_UGURU_MAX_RETRIES) != 3)
-			return -ENODEV;
+			goto abituguru_detect_bank1_sensor_type_exit;
 		if (buf[0] & ABIT_UGURU_VOLT_LOW_ALARM_FLAG) {
-			/* Restore original settings */
-			if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2,
-					sensor_addr,
-					data->bank1_settings[sensor_addr],
-					3) != 3)
-				return -ENODEV;
 			ABIT_UGURU_DEBUG(2, " found volt sensor\n");
-			return ABIT_UGURU_IN_SENSOR;
+			ret = ABIT_UGURU_IN_SENSOR;
+			goto abituguru_detect_bank1_sensor_type_exit;
 		} else
 			ABIT_UGURU_DEBUG(2, " alarm raised during volt "
 				"sensor test, but volt low flag not set\n");
···
 	buf[2] = 10;
 	if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2, sensor_addr,
 			buf, 3) != 3)
-		return -ENODEV;
+		goto abituguru_detect_bank1_sensor_type_exit;
 	/* Now we need 50 ms to give the uguru time to read the sensors
 	   and raise a temp alarm */
 	set_current_state(TASK_UNINTERRUPTIBLE);
···
 	/* Check for alarm and check the alarm is a temp high alarm. */
 	if (abituguru_read(data, ABIT_UGURU_ALARM_BANK, 0, buf, 3,
 			ABIT_UGURU_MAX_RETRIES) != 3)
-		return -ENODEV;
+		goto abituguru_detect_bank1_sensor_type_exit;
 	if (buf[sensor_addr/8] & (0x01 << (sensor_addr % 8))) {
 		if (abituguru_read(data, ABIT_UGURU_SENSOR_BANK1 + 1,
 				sensor_addr, buf, 3,
 				ABIT_UGURU_MAX_RETRIES) != 3)
-			return -ENODEV;
+			goto abituguru_detect_bank1_sensor_type_exit;
 		if (buf[0] & ABIT_UGURU_TEMP_HIGH_ALARM_FLAG) {
-			ret = ABIT_UGURU_TEMP_SENSOR;
 			ABIT_UGURU_DEBUG(2, " found temp sensor\n");
+			ret = ABIT_UGURU_TEMP_SENSOR;
+			goto abituguru_detect_bank1_sensor_type_exit;
 		} else
 			ABIT_UGURU_DEBUG(2, " alarm raised during temp "
 				"sensor test, but temp high flag not set\n");
···
 		ABIT_UGURU_DEBUG(2, " alarm not raised during temp sensor "
 			"test\n");

-	/* Restore original settings */
-	if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2, sensor_addr,
-			data->bank1_settings[sensor_addr], 3) != 3)
+	ret = ABIT_UGURU_NC;
+abituguru_detect_bank1_sensor_type_exit:
+	/* Restore original settings, failing here is really BAD, it has been
+	   reported that some BIOS-es hang when entering the uGuru menu with
+	   invalid settings present in the uGuru, so we try this 3 times. */
+	for (i = 0; i < 3; i++)
+		if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2,
+				sensor_addr, data->bank1_settings[sensor_addr],
+				3) == 3)
+			break;
+	if (i == 3) {
+		printk(KERN_ERR ABIT_UGURU_NAME
+			": Fatal error could not restore original settings. "
+			"This should never happen please report this to the "
+			"abituguru maintainer (see MAINTAINERS)\n");
 		return -ENODEV;
-
+	}
 	return ret;
 }

···
 	data->update_timeouts = 0;
 LEAVE_UPDATE:
 	/* handle timeout condition */
-	if (err == -EBUSY) {
+	if (!success && (err == -EBUSY || err >= 0)) {
 		/* No overflow please */
 		if (data->update_timeouts < 255u)
 			data->update_timeouts++;
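
The single-exit restore logic in the bank1 detection hunks can be tried out
in userspace with a sketch along these lines. Everything here is assumed for
illustration: struct fake_chip, chip_write() and detect_with_restore() are
hypothetical names, and the fake chip simply fails a configurable range of
writes so that the retry loop can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define RESTORE_TRIES 3

/* Hypothetical chip model: three settings bytes, a running write counter,
   and an inclusive range of write numbers (1-based) that will fail. */
struct fake_chip {
	unsigned char settings[3];
	int writes;
	int fail_first;
	int fail_last;
};

/* Returns the number of bytes written (3) or a negative errno. */
static int chip_write(struct fake_chip *c, const unsigned char *buf)
{
	c->writes++;
	if (c->writes >= c->fail_first && c->writes <= c->fail_last)
		return -EIO;
	memcpy(c->settings, buf, 3);
	return 3;
}

/* Write test settings, run a probe (modelled by probe_ok), and restore the
   saved settings on every exit path, trying the restore up to 3 times. */
static int detect_with_restore(struct fake_chip *c, int probe_ok)
{
	unsigned char saved[3], test[3] = { 0, 0, 250 };
	int i, ret = -ENODEV;	/* error is the default return value */

	assert(c != NULL);
	memcpy(saved, c->settings, 3);

	if (chip_write(c, test) != 3)
		goto restore;
	if (!probe_ok)
		goto restore;
	ret = 0;
restore:
	for (i = 0; i < RESTORE_TRIES; i++)
		if (chip_write(c, saved) == 3)
			break;
	if (i == RESTORE_TRIES)
		return -ENODEV;	/* settings could not be restored */
	return ret;
}
```

Defaulting ret to the error value and funnelling every failure through one
label keeps the restore code in a single place, which is the point of the
change to the detection function.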