Merge branch 'fixes-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

* 'fixes-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix deadlock in worker_maybe_bind_and_lock()
  workqueue: Document debugging tricks

Fix up trivial spelling conflict in kernel/workqueue.c

 Documentation/workqueue.txt |   40 ++++++++++++++++++++++++++++++++++++++++
 kernel/workqueue.c          |    8 +++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/Documentation/workqueue.txt b/Documentation/workqueue.txt
--- a/Documentation/workqueue.txt
+++ b/Documentation/workqueue.txt
@@ -12,6 +12,7 @@
 4. Application Programming Interface (API)
 5. Example Execution Scenarios
 6. Guidelines
+7. Debugging


 1. Introduction
@@ -380,3 +381,42 @@
 * Unless work items are expected to consume a huge amount of CPU
   cycles, using a bound wq is usually beneficial due to the increased
   level of locality in wq operations and work item execution.
+
+
+7. Debugging
+
+Because the work functions are executed by generic worker threads
+there are a few tricks needed to shed some light on misbehaving
+workqueue users.
+
+Worker threads show up in the process list as:
+
+root      5671  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/0:1]
+root      5672  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/1:2]
+root      5673  0.0  0.0      0     0 ?        S    12:12   0:00 [kworker/0:0]
+root      5674  0.0  0.0      0     0 ?        S    12:13   0:00 [kworker/1:0]
+
+If kworkers are going crazy (using too much cpu), there are two types
+of possible problems:
+
+	1. Something being scheduled in rapid succession
+	2. A single work item that consumes lots of cpu cycles
+
+The first one can be tracked using tracing:
+
+	$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
+	$ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
+	(wait a few secs)
+	^C
+
+If something is busy looping on work queueing, it would be dominating
+the output and the offender can be determined with the work item
+function.
+
+For the second type of problem it should be possible to just check
+the stack trace of the offending worker thread:
+
+	$ cat /proc/THE_OFFENDING_KWORKER/stack
+
+The work item's function should be trivially visible in the stack
+trace.
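As a concrete illustration of failure mode 1 above, here is a minimal,
hypothetical module (not part of this patch) whose work function
requeues itself as fast as it can.  Running the tracing commands from
the new section while this module is loaded would show busy_work_fn
dominating the workqueue:workqueue_queue_work events in trace_pipe:

	#include <linux/module.h>
	#include <linux/workqueue.h>

	static struct work_struct busy_work;

	/*
	 * Requeues itself immediately: every execution generates
	 * another workqueue:workqueue_queue_work event, flooding
	 * the trace.
	 */
	static void busy_work_fn(struct work_struct *work)
	{
		schedule_work(&busy_work);
	}

	static int __init busy_init(void)
	{
		INIT_WORK(&busy_work, busy_work_fn);
		schedule_work(&busy_work);
		return 0;
	}

	static void __exit busy_exit(void)
	{
		/* cancel_work_sync() copes with self-requeueing work */
		cancel_work_sync(&busy_work);
	}

	module_init(busy_init);
	module_exit(busy_exit);
	MODULE_LICENSE("GPL");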
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1291,8 +1291,14 @@
 			return true;
 		spin_unlock_irq(&gcwq->lock);
 
-		/* CPU has come up in between, retry migration */
+		/*
+		 * We've raced with CPU hot[un]plug.  Give it a breather
+		 * and retry migration.  cond_resched() is required here;
+		 * otherwise, we might deadlock against cpu_stop trying to
+		 * bring down the CPU on non-preemptive kernel.
+		 */
 		cpu_relax();
+		cond_resched();
 	}
 }
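Why cond_resched() and not cpu_relax() alone: cpu_relax() is only a
hardware hint (PAUSE/"rep; nop" on x86) and never enters the
scheduler, so on a CONFIG_PREEMPT=n kernel a worker spinning in this
loop holds its CPU in kernel mode indefinitely.  The cpu_stop stopper
task carrying out the hotunplug then never runs on that CPU, the
unplug never completes, and the condition the loop is polling never
changes.  A minimal sketch of the corrected pattern (hypothetical
helper, not the kernel function itself):

	#include <linux/sched.h>	/* cond_resched() */
	#include <linux/types.h>	/* bool */
	#include <asm/processor.h>	/* cpu_relax() */

	/* Poll until @done reports true without starving other tasks. */
	static void poll_until(bool (*done)(void))
	{
		while (!done()) {
			cpu_relax();	/* pipeline hint; not a scheduling point */
			cond_resched();	/* voluntary preemption point: lets the
					 * cpu_stop stopper run even with PREEMPT=n */
		}
	}

With the cond_resched() in place every iteration is a scheduling
point, so the stopper can make progress and the polled condition
eventually changes.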