perf script: task-analyzer add csv support

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kernel os linux

This patch adds the possibility to write the trace and the summary as csv files
to a user specified file. A format as such simplifies further data processing.
This is achieved by having ";" as separators instead of spaces and solely one
header per file.

Additional parameters are being considered, like in the normal usage of the
script. Colors are turned off in the case of a csv output, thus the highlight
option is also being ignored.

Usage:

Write standard task to csv file:

$ perf script report tasks-analyzer --csv <file>

write limited output to csv file in nanoseconds:

$ perf script report tasks-analyzer --csv <file> --ns --limit-to-tasks 1337

Write summary to a csv file:

$ perf script report tasks-analyzer --csv-summary <file>

Write summary to csv file with additional schedule information:

$ perf script report tasks-analyzer --csv-summary <file> --summary-extended

Write both summary and standard task to a csv file:

$ perf script report tasks-analyzer --csv --csv-summary

The following examples illustrate what is possible with the CSV output. The
first command sequence will record all scheduler switch events for 10 seconds,
the task-analyzer calculates task information like runtimes as CSV. A small
python snippet using pandas and matplotlib will visualize the most frequent
task (e.g. kworker/1:1) runtimes - each runtime as a bar in a bar chart:

$ perf record -e sched:sched_switch -a -- sleep 10
$ perf script report tasks-analyzer --ns --csv tasks.csv
$ cat << EOF > /tmp/freq-comm-runtimes-bar.py
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("tasks.csv", sep=';')
most_freq_comm = df["COMM"].value_counts().idxmax()
most_freq_runtimes = df[df["COMM"]==most_freq_comm]["Runtime"]
plt.title(f"Runtimes for Task {most_freq_comm} in Nanoseconds")
plt.bar(range(len(most_freq_runtimes)), most_freq_runtimes)
plt.show()
$ python3 /tmp/freq-comm-runtimes-bar.py

As a seconds example, the subsequent script generates a pie chart of all
accumulated tasks runtimes for 10 seconds of system recordings:

$ perf record -e sched:sched_switch -a -- sleep 10
$ perf script report tasks-analyzer --csv-summary task-summary.csv
$ cat << EOF > /tmp/accumulated-task-pie.py
import pandas as pd
from matplotlib.pyplot import pie, axis, show

df = pd.read_csv("task-summary.csv", sep=';')
sums = df.groupby(df["Comm"])["Accumulated"].sum()
axis("equal")
pie(sums, labels=sums.index);
show()
EOF
$ python3 /tmp/accumulated-task-pie.py

A variety of other visualizations are possible in matplotlib and other
environments. Of course, pandas, numpy and co. also allow easy
statistical analysis of the data!

Signed-off-by: Petar Gligoric <petar.gligoric@rohde-schwarz.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221206154406.41941-3-petar.gligor@gmail.com
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

authored by

Petar Gligoric and committed by

Arnaldo Carvalho de Melo 3 years ago fdd0f81f e76aff05

+187 -91

1 changed file

expand all

tools

perf

scripts

python

task-analyzer.py

+187 -91

tools/perf/scripts/python/task-analyzer.py

··· 156 156 help="always, never or auto, allowing configuring color output" 157 157 " via the command line", 158 158 ) 159 + parser.add_argument( 160 + "--csv", 161 + default="", 162 + help="Write trace to file selected by user. Options, like --ns or --extended" 163 + "-times are used.", 164 + ) 165 + parser.add_argument( 166 + "--csv-summary", 167 + default="", 168 + help="Write summary to file selected by user. Options, like --ns or" 169 + " --summary-extended are used.", 170 + ) 159 171 args = parser.parse_args() 160 172 args.tid_renames = dict() 161 173 ··· 287 275 288 276 289 277 290 - 291 278 class Summary(object): 292 279 """ 293 280 Primary instance for calculating the summary output. Processes the whole trace to ··· 338 327 sum(db["inter_times"].values()) - 4 * decimal_precision 339 328 ) 340 329 _header += ("Max Inter Task Times",) 341 - print(fmt.format(*_header)) 330 + fd_sum.write(fmt.format(*_header) + "\n") 342 331 343 332 def _column_titles(self): 344 333 """ ··· 347 336 values are being displayed in grey. Thus in their format two additional {}, 348 337 are placed for color set and reset. 349 338 """ 339 + separator, fix_csv_align = _prepare_fmt_sep() 350 340 decimal_precision, time_precision = _prepare_fmt_precision() 351 - fmt = " {{:>{}}}".format(db["task_info"]["pid"]) 352 - fmt += " {{:>{}}}".format(db["task_info"]["tid"]) 353 - fmt += " {{:>{}}}".format(db["task_info"]["comm"]) 354 - fmt += " {{:>{}}}".format(db["runtime_info"]["runs"]) 355 - fmt += " {{:>{}}}".format(db["runtime_info"]["acc"]) 356 - fmt += " {{:>{}}}".format(db["runtime_info"]["mean"]) 357 - fmt += " {{:>{}}}".format(db["runtime_info"]["median"]) 358 - fmt += " {{:>{}}}".format(db["runtime_info"]["min"] - decimal_precision) 359 - fmt += " {{:>{}}}".format(db["runtime_info"]["max"] - decimal_precision) 360 - fmt += " {{}}{{:>{}}}{{}}".format(db["runtime_info"]["max_at"] - time_precision) 341 + fmt = "{{:>{}}}".format(db["task_info"]["pid"] * fix_csv_align) 342 + fmt += "{}{{:>{}}}".format(separator, db["task_info"]["tid"] * fix_csv_align) 343 + fmt += "{}{{:>{}}}".format(separator, db["task_info"]["comm"] * fix_csv_align) 344 + fmt += "{}{{:>{}}}".format(separator, db["runtime_info"]["runs"] * fix_csv_align) 345 + fmt += "{}{{:>{}}}".format(separator, db["runtime_info"]["acc"] * fix_csv_align) 346 + fmt += "{}{{:>{}}}".format(separator, db["runtime_info"]["mean"] * fix_csv_align) 347 + fmt += "{}{{:>{}}}".format( 348 + separator, db["runtime_info"]["median"] * fix_csv_align 349 + ) 350 + fmt += "{}{{:>{}}}".format( 351 + separator, (db["runtime_info"]["min"] - decimal_precision) * fix_csv_align 352 + ) 353 + fmt += "{}{{:>{}}}".format( 354 + separator, (db["runtime_info"]["max"] - decimal_precision) * fix_csv_align 355 + ) 356 + fmt += "{}{{}}{{:>{}}}{{}}".format( 357 + separator, (db["runtime_info"]["max_at"] - time_precision) * fix_csv_align 358 + ) 361 359 362 360 column_titles = ("PID", "TID", "Comm") 363 361 column_titles += ("Runs", "Accumulated", "Mean", "Median", "Min", "Max") 364 - column_titles += (_COLORS["grey"], "At", _COLORS["reset"]) 362 + column_titles += (_COLORS["grey"], "Max At", _COLORS["reset"]) 365 363 366 364 if args.summary_extended: 367 - fmt += " {{:>{}}}".format(db["inter_times"]["out_in"] - decimal_precision) 368 - fmt += " {{}}{{:>{}}}{{}}".format( 369 - db["inter_times"]["inter_at"] - time_precision 365 + fmt += "{}{{:>{}}}".format( 366 + separator, 367 + (db["inter_times"]["out_in"] - decimal_precision) * fix_csv_align 370 368 ) 371 - fmt += " {{:>{}}}".format(db["inter_times"]["out_out"] - decimal_precision) 372 - fmt += " {{:>{}}}".format(db["inter_times"]["in_in"] - decimal_precision) 373 - fmt += " {{:>{}}}".format(db["inter_times"]["in_out"] - decimal_precision) 369 + fmt += "{}{{}}{{:>{}}}{{}}".format( 370 + separator, 371 + (db["inter_times"]["inter_at"] - time_precision) * fix_csv_align 372 + ) 373 + fmt += "{}{{:>{}}}".format( 374 + separator, 375 + (db["inter_times"]["out_out"] - decimal_precision) * fix_csv_align 376 + ) 377 + fmt += "{}{{:>{}}}".format( 378 + separator, 379 + (db["inter_times"]["in_in"] - decimal_precision) * fix_csv_align 380 + ) 381 + fmt += "{}{{:>{}}}".format( 382 + separator, 383 + (db["inter_times"]["in_out"] - decimal_precision) * fix_csv_align 384 + ) 374 385 375 386 column_titles += ("Out-In", _COLORS["grey"], "Max At", _COLORS["reset"], 376 387 "Out-Out", "In-In", "In-Out") 377 - print(fmt.format(*column_titles)) 388 + 389 + fd_sum.write(fmt.format(*column_titles) + "\n") 390 + 378 391 379 392 def _task_stats(self): 380 393 """calculates the stats of every task and constructs the printable summary""" ··· 449 414 self._calc_alignments_summary(align_helper) 450 415 451 416 def _format_stats(self): 417 + separator, fix_csv_align = _prepare_fmt_sep() 452 418 decimal_precision, time_precision = _prepare_fmt_precision() 453 - fmt = " {{:>{}d}}".format(db["task_info"]["pid"]) 454 - fmt += " {{:>{}d}}".format(db["task_info"]["tid"]) 455 - fmt += " {{:>{}}}".format(db["task_info"]["comm"]) 456 - fmt += " {{:>{}d}}".format(db["runtime_info"]["runs"]) 457 - fmt += " {{:>{}.{}f}}".format(db["runtime_info"]["acc"], time_precision) 458 - fmt += " {{}}{{:>{}.{}f}}".format(db["runtime_info"]["mean"], time_precision) 459 - fmt += " {{:>{}.{}f}}".format(db["runtime_info"]["median"], time_precision) 460 - fmt += " {{:>{}.{}f}}".format( 461 - db["runtime_info"]["min"] - decimal_precision, time_precision 462 - ) 463 - fmt += " {{:>{}.{}f}}".format( 464 - db["runtime_info"]["max"] - decimal_precision, time_precision 465 - ) 466 - fmt += " {{}}{{:>{}.{}f}}{{}}{{}}".format( 467 - db["runtime_info"]["max_at"] - time_precision, decimal_precision 419 + len_pid = db["task_info"]["pid"] * fix_csv_align 420 + len_tid = db["task_info"]["tid"] * fix_csv_align 421 + len_comm = db["task_info"]["comm"] * fix_csv_align 422 + len_runs = db["runtime_info"]["runs"] * fix_csv_align 423 + len_acc = db["runtime_info"]["acc"] * fix_csv_align 424 + len_mean = db["runtime_info"]["mean"] * fix_csv_align 425 + len_median = db["runtime_info"]["median"] * fix_csv_align 426 + len_min = (db["runtime_info"]["min"] - decimal_precision) * fix_csv_align 427 + len_max = (db["runtime_info"]["max"] - decimal_precision) * fix_csv_align 428 + len_max_at = (db["runtime_info"]["max_at"] - time_precision) * fix_csv_align 429 + if args.summary_extended: 430 + len_out_in = ( 431 + db["inter_times"]["out_in"] - decimal_precision 432 + ) * fix_csv_align 433 + len_inter_at = ( 434 + db["inter_times"]["inter_at"] - time_precision 435 + ) * fix_csv_align 436 + len_out_out = ( 437 + db["inter_times"]["out_out"] - decimal_precision 438 + ) * fix_csv_align 439 + len_in_in = (db["inter_times"]["in_in"] - decimal_precision) * fix_csv_align 440 + len_in_out = ( 441 + db["inter_times"]["in_out"] - decimal_precision 442 + ) * fix_csv_align 443 + 444 + fmt = "{{:{}d}}".format(len_pid) 445 + fmt += "{}{{:{}d}}".format(separator, len_tid) 446 + fmt += "{}{{:>{}}}".format(separator, len_comm) 447 + fmt += "{}{{:{}d}}".format(separator, len_runs) 448 + fmt += "{}{{:{}.{}f}}".format(separator, len_acc, time_precision) 449 + fmt += "{}{{}}{{:{}.{}f}}".format(separator, len_mean, time_precision) 450 + fmt += "{}{{:{}.{}f}}".format(separator, len_median, time_precision) 451 + fmt += "{}{{:{}.{}f}}".format(separator, len_min, time_precision) 452 + fmt += "{}{{:{}.{}f}}".format(separator, len_max, time_precision) 453 + fmt += "{}{{}}{{:{}.{}f}}{{}}{{}}".format( 454 + separator, len_max_at, decimal_precision 468 455 ) 469 456 if args.summary_extended: 470 - fmt += " {{:>{}.{}f}}".format( 471 - db["inter_times"]["out_in"] - decimal_precision, time_precision 457 + fmt += "{}{{:{}.{}f}}".format(separator, len_out_in, time_precision) 458 + fmt += "{}{{}}{{:{}.{}f}}{{}}".format( 459 + separator, len_inter_at, decimal_precision 472 460 ) 473 - fmt += " {{}}{{:>{}.{}f}}{{}}".format( 474 - db["inter_times"]["inter_at"] - time_precision, decimal_precision 475 - ) 476 - fmt += " {{:>{}.{}f}}".format( 477 - db["inter_times"]["out_out"] - decimal_precision, time_precision 478 - ) 479 - fmt += " {{:>{}.{}f}}".format( 480 - db["inter_times"]["in_in"] - decimal_precision, time_precision 481 - ) 482 - fmt += " {{:>{}.{}f}}".format( 483 - db["inter_times"]["in_out"] - decimal_precision, time_precision 484 - ) 461 + fmt += "{}{{:{}.{}f}}".format(separator, len_out_out, time_precision) 462 + fmt += "{}{{:{}.{}f}}".format(separator, len_in_in, time_precision) 463 + fmt += "{}{{:{}.{}f}}".format(separator, len_in_out, time_precision) 485 464 return fmt 486 465 487 466 ··· 516 467 517 468 518 469 def print(self): 519 - print("\nSummary") 520 470 self._task_stats() 521 - self._print_header() 522 - self._column_titles() 523 471 fmt = self._format_stats() 472 + 473 + if not args.csv_summary: 474 + print("\nSummary") 475 + self._print_header() 476 + self._column_titles() 524 477 for i in range(len(self._body)): 525 - print(fmt.format(*tuple(self._body[i]))) 478 + fd_sum.write(fmt.format(*tuple(self._body[i])) + "\n") 526 479 527 480 528 481 ··· 582 531 583 532 584 533 def _fmt_header(): 585 - fmt = "{{:>{}}}".format(LEN_SWITCHED_IN) 586 - fmt += " {{:>{}}}".format(LEN_SWITCHED_OUT) 587 - fmt += " {{:>{}}}".format(LEN_CPU) 588 - fmt += " {{:>{}}}".format(LEN_PID) 589 - fmt += " {{:>{}}}".format(LEN_TID) 590 - fmt += " {{:>{}}}".format(LEN_COMM) 591 - fmt += " {{:>{}}}".format(LEN_RUNTIME) 592 - fmt += " {{:>{}}}".format(LEN_OUT_IN) 534 + separator, fix_csv_align = _prepare_fmt_sep() 535 + fmt = "{{:>{}}}".format(LEN_SWITCHED_IN*fix_csv_align) 536 + fmt += "{}{{:>{}}}".format(separator, LEN_SWITCHED_OUT*fix_csv_align) 537 + fmt += "{}{{:>{}}}".format(separator, LEN_CPU*fix_csv_align) 538 + fmt += "{}{{:>{}}}".format(separator, LEN_PID*fix_csv_align) 539 + fmt += "{}{{:>{}}}".format(separator, LEN_TID*fix_csv_align) 540 + fmt += "{}{{:>{}}}".format(separator, LEN_COMM*fix_csv_align) 541 + fmt += "{}{{:>{}}}".format(separator, LEN_RUNTIME*fix_csv_align) 542 + fmt += "{}{{:>{}}}".format(separator, LEN_OUT_IN*fix_csv_align) 593 543 if args.extended_times: 594 - fmt += " {{:>{}}}".format(LEN_OUT_OUT) 595 - fmt += " {{:>{}}}".format(LEN_IN_IN) 596 - fmt += " {{:>{}}}".format(LEN_IN_OUT) 544 + fmt += "{}{{:>{}}}".format(separator, LEN_OUT_OUT*fix_csv_align) 545 + fmt += "{}{{:>{}}}".format(separator, LEN_IN_IN*fix_csv_align) 546 + fmt += "{}{{:>{}}}".format(separator, LEN_IN_OUT*fix_csv_align) 597 547 return fmt 598 548 599 549 600 550 def _fmt_body(): 551 + separator, fix_csv_align = _prepare_fmt_sep() 601 552 decimal_precision, time_precision = _prepare_fmt_precision() 602 - fmt = "{{}}{{:{}.{}f}}".format(LEN_SWITCHED_IN, decimal_precision) 603 - fmt += " {{:{}.{}f}}".format(LEN_SWITCHED_OUT, decimal_precision) 604 - fmt += " {{:{}d}}".format(LEN_CPU) 605 - fmt += " {{:{}d}}".format(LEN_PID) 606 - fmt += " {{}}{{:{}d}}{{}}".format(LEN_TID) 607 - fmt += " {{}}{{:>{}}}".format(LEN_COMM) 608 - fmt += " {{:{}.{}f}}".format(LEN_RUNTIME, time_precision) 553 + fmt = "{{}}{{:{}.{}f}}".format(LEN_SWITCHED_IN*fix_csv_align, decimal_precision) 554 + fmt += "{}{{:{}.{}f}}".format( 555 + separator, LEN_SWITCHED_OUT*fix_csv_align, decimal_precision 556 + ) 557 + fmt += "{}{{:{}d}}".format(separator, LEN_CPU*fix_csv_align) 558 + fmt += "{}{{:{}d}}".format(separator, LEN_PID*fix_csv_align) 559 + fmt += "{}{{}}{{:{}d}}{{}}".format(separator, LEN_TID*fix_csv_align) 560 + fmt += "{}{{}}{{:>{}}}".format(separator, LEN_COMM*fix_csv_align) 561 + fmt += "{}{{:{}.{}f}}".format(separator, LEN_RUNTIME*fix_csv_align, time_precision) 609 562 if args.extended_times: 610 - fmt += " {{:{}.{}f}}".format(LEN_OUT_IN, time_precision) 611 - fmt += " {{:{}.{}f}}".format(LEN_OUT_OUT, time_precision) 612 - fmt += " {{:{}.{}f}}".format(LEN_IN_IN, time_precision) 613 - fmt += " {{:{}.{}f}}{{}}".format(LEN_IN_OUT, time_precision) 563 + fmt += "{}{{:{}.{}f}}".format(separator, LEN_OUT_IN*fix_csv_align, time_precision) 564 + fmt += "{}{{:{}.{}f}}".format(separator, LEN_OUT_OUT*fix_csv_align, time_precision) 565 + fmt += "{}{{:{}.{}f}}".format(separator, LEN_IN_IN*fix_csv_align, time_precision) 566 + fmt += "{}{{:{}.{}f}}{{}}".format( 567 + separator, LEN_IN_OUT*fix_csv_align, time_precision 568 + ) 614 569 else: 615 - fmt += " {{:{}.{}f}}{{}}".format(LEN_OUT_IN, time_precision) 570 + fmt += "{}{{:{}.{}f}}{{}}".format( 571 + separator, LEN_OUT_IN*fix_csv_align, time_precision 572 + ) 616 573 return fmt 617 574 618 575 ··· 630 571 "Time Out-In") 631 572 if args.extended_times: 632 573 header += ("Time Out-Out", "Time In-In", "Time In-Out") 633 - print(fmt.format(*header)) 574 + fd_task.write(fmt.format(*header) + "\n") 575 + 634 576 635 577 636 578 def _print_task_finish(task): ··· 643 583 in_in = -1 644 584 in_out = -1 645 585 fmt = _fmt_body() 646 - 647 586 # depending on user provided highlight option we change the color 648 587 # for particular tasks 649 588 if str(task.tid) in args.highlight_tasks_map: ··· 671 612 out_out = timespan_gap_tid.out_out 672 613 in_in = timespan_gap_tid.in_in 673 614 in_out = timespan_gap_tid.in_out 674 - if args.extended_times: 675 - print(fmt.format(c_row_set, task.time_in(), task.time_out(), task.cpu, task.pid, 676 - c_tid_set, task.tid, c_tid_reset, c_row_set, task.comm, 677 - task.runtime(time_unit), out_in, out_out, in_in, in_out, 678 - c_row_reset)) 679 - else: 680 - print(fmt.format(c_row_set, task.time_in(), task.time_out(), task.cpu, task.pid, 681 - c_tid_set, task.tid, c_tid_reset, c_row_set, task.comm, 682 - task.runtime(time_unit), out_in, c_row_reset)) 683 615 616 + 617 + if args.extended_times: 618 + line_out = fmt.format(c_row_set, task.time_in(), task.time_out(), task.cpu, 619 + task.pid, c_tid_set, task.tid, c_tid_reset, c_row_set, task.comm, 620 + task.runtime(time_unit), out_in, out_out, in_in, in_out, 621 + c_row_reset) + "\n" 622 + else: 623 + line_out = fmt.format(c_row_set, task.time_in(), task.time_out(), task.cpu, 624 + task.pid, c_tid_set, task.tid, c_tid_reset, c_row_set, task.comm, 625 + task.runtime(time_unit), out_in, c_row_reset) + "\n" 626 + try: 627 + fd_task.write(line_out) 628 + except(IOError): 629 + # don't mangle the output if user SIGINT this script 630 + sys.exit() 684 631 685 632 def _record_cleanup(_list): 686 633 """ ··· 798 733 ) 799 734 if args.time_limit and (args.summary or args.summary_only or args.summary_extended): 800 735 sys.exit("Error: Cannot set time limit and print summary") 801 - 736 + if args.csv_summary: 737 + args.summary = True 738 + if args.csv == args.csv_summary: 739 + sys.exit("Error: Chosen files for csv and csv summary are the same") 740 + if args.csv and (args.summary_extended or args.summary) and not args.csv_summary: 741 + sys.exit("Error: No file chosen to write summary to. Choose with --csv-summary " 742 + "<file>") 743 + if args.csv and args.summary_only: 744 + sys.exit("Error: --csv chosen and --summary-only. Standard task would not be" 745 + "written to csv file.") 802 746 803 747 def _argument_prepare_check(): 804 - global time_unit 748 + global time_unit, fd_task, fd_sum 805 749 if args.filter_tasks: 806 750 args.filter_tasks = args.filter_tasks.split(",") 807 751 if args.limit_to_tasks: ··· 841 767 time_unit = "ns" 842 768 elif args.ms: 843 769 time_unit = "ms" 770 + 771 + 772 + fd_task = sys.stdout 773 + if args.csv: 774 + args.stdio_color = "never" 775 + fd_task = open(args.csv, "w") 776 + print("generating csv at",args.csv,) 777 + 778 + fd_sum = sys.stdout 779 + if args.csv_summary: 780 + args.stdio_color = "never" 781 + fd_sum = open(args.csv_summary, "w") 782 + print("generating csv summary at",args.csv_summary) 783 + if not args.csv: 784 + args.summary_only = True 844 785 845 786 846 787 def _is_within_timelimit(time): ··· 890 801 decimal_precision = 6 891 802 time_precision = 3 892 803 if args.ns: 893 - decimal_precision = 9 894 - time_precision = 0 804 + decimal_precision = 9 805 + time_precision = 0 895 806 return decimal_precision, time_precision 896 807 808 + def _prepare_fmt_sep(): 809 + separator = " " 810 + fix_csv_align = 1 811 + if args.csv or args.csv_summary: 812 + separator = ";" 813 + fix_csv_align = 0 814 + return separator, fix_csv_align 897 815 898 816 def trace_unhandled(event_name, context, event_fields_dict, perf_sample_dict): 899 817 pass