
perf record: Extend --threads command line option

Extend the --threads option in the perf record command line interface.
The option accepts a value in the form of masks that specify the CPUs
to be monitored by data streaming threads and the layout of those
threads in the system topology. The masks can be filtered using the
CPU mask provided via the -C option.

The specification value can be a user-defined list of masks. Masks
separated by a colon define the CPUs to be monitored by one thread,
and the affinity mask of that thread is separated by a slash. For
example:
  <cpus mask 1>/<affinity mask 1>:<cpu mask 2>/<affinity mask 2>
specifies a parallel threads layout consisting of two threads, each
with its own set of CPUs to monitor.

The specification value can also be a string, e.g. "cpu", "core" or
"package", meaning that a data streaming thread is created for every
CPU, core or package, to monitor distinct CPUs or CPUs grouped by
core or package.

When the option is provided with no value or an empty value, it
defaults to the per-cpu parallel threads layout, creating a data
streaming thread for every CPU being monitored.

Document --threads option syntax and parallel data streaming modes
in Documentation/perf-record.txt.

Suggested-by: Jiri Olsa <jolsa@kernel.org>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Riccardo Mancini <rickyman7@gmail.com>
Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Riccardo Mancini <rickyman7@gmail.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/079e2619be70c465317cf7c9fdaf5fa069728c32.1642440724.git.alexey.v.bayduraev@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Alexey Bayduraev, committed by Arnaldo Carvalho de Melo
f466e5ed 06380a84
+349 -4 total

tools/perf/Documentation/perf-record.txt (+32 -2)
···
 	wait -n ${perf_pid}
 	exit $?
 
---threads::
+--threads=<spec>::
 Write collected trace data into several data files using parallel threads.
-The option creates a data streaming thread for each CPU in the system.
+<spec> value can be user defined list of masks. Masks separated by colon
+define CPUs to be monitored by a thread and affinity mask of that thread
+is separated by slash:
+
+	<cpus mask 1>/<affinity mask 1>:<cpus mask 2>/<affinity mask 2>:...
+
+CPUs or affinity masks must not overlap with other corresponding masks.
+Invalid CPUs are ignored, but masks containing only invalid CPUs are not
+allowed.
+
+For example user specification like the following:
+
+	0,2-4/2-4:1,5-7/5-7
+
+specifies parallel threads layout that consists of two threads,
+the first thread monitors CPUs 0 and 2-4 with the affinity mask 2-4,
+the second monitors CPUs 1 and 5-7 with the affinity mask 5-7.
+
+<spec> value can also be a string meaning predefined parallel threads
+layout:
+
+	cpu     - create new data streaming thread for every monitored cpu
+	core    - create new thread to monitor CPUs grouped by a core
+	package - create new thread to monitor CPUs grouped by a package
+	numa    - create new thread to monitor CPUs grouped by a NUMA domain
+
+Predefined layouts can be used on systems with large number of CPUs in
+order not to spawn multiple per-cpu streaming threads but still avoid LOST
+events in data directory files. Option specified with no or empty value
+defaults to CPU layout. Masks defined or provided by the option value are
+filtered through the mask provided by -C option.
 
 include::intel-hybrid.txt[]
 
tools/perf/builtin-record.c (+316 -2)
···
 #include "util/evlist-hybrid.h"
 #include "asm/bug.h"
 #include "perf.h"
+#include "cputopo.h"
 
 #include <errno.h>
 #include <inttypes.h>
···
 enum thread_spec {
 	THREAD_SPEC__UNDEFINED = 0,
 	THREAD_SPEC__CPU,
+	THREAD_SPEC__CORE,
+	THREAD_SPEC__PACKAGE,
+	THREAD_SPEC__NUMA,
+	THREAD_SPEC__USER,
+	THREAD_SPEC__MAX,
+};
+
+static const char *thread_spec_tags[THREAD_SPEC__MAX] = {
+	"undefined", "cpu", "core", "package", "numa", "user"
 };
 
 struct record {
···
 
 static int record__parse_threads(const struct option *opt, const char *str, int unset)
 {
+	int s;
 	struct record_opts *opts = opt->value;
 
-	if (unset || !str || !strlen(str))
+	if (unset || !str || !strlen(str)) {
 		opts->threads_spec = THREAD_SPEC__CPU;
+	} else {
+		for (s = 1; s < THREAD_SPEC__MAX; s++) {
+			if (s == THREAD_SPEC__USER) {
+				opts->threads_user_spec = strdup(str);
+				if (!opts->threads_user_spec)
+					return -ENOMEM;
+				opts->threads_spec = THREAD_SPEC__USER;
+				break;
+			}
+			if (!strncasecmp(str, thread_spec_tags[s], strlen(thread_spec_tags[s]))) {
+				opts->threads_spec = s;
+				break;
+			}
+		}
+	}
+
+	if (opts->threads_spec == THREAD_SPEC__USER)
+		pr_debug("threads_spec: %s\n", opts->threads_user_spec);
+	else
+		pr_debug("threads_spec: %s\n", thread_spec_tags[opts->threads_spec]);
 
 	return 0;
 }
···
 		set_bit(cpus->map[c].cpu, mask->bits);
 }
 
+static int record__mmap_cpu_mask_init_spec(struct mmap_cpu_mask *mask, const char *mask_spec)
+{
+	struct perf_cpu_map *cpus;
+
+	cpus = perf_cpu_map__new(mask_spec);
+	if (!cpus)
+		return -ENOMEM;
+
+	bitmap_zero(mask->bits, mask->nbits);
+	record__mmap_cpu_mask_init(mask, cpus);
+	perf_cpu_map__put(cpus);
+
+	return 0;
+}
+
 static void record__free_thread_masks(struct record *rec, int nr_threads)
 {
 	int t;
···
 	return 0;
 }
 
+static int record__init_thread_masks_spec(struct record *rec, struct perf_cpu_map *cpus,
+					  const char **maps_spec, const char **affinity_spec,
+					  u32 nr_spec)
+{
+	u32 s;
+	int ret = 0, t = 0;
+	struct mmap_cpu_mask cpus_mask;
+	struct thread_mask thread_mask, full_mask, *thread_masks;
+
+	ret = record__mmap_cpu_mask_alloc(&cpus_mask, cpu__max_cpu().cpu);
+	if (ret) {
+		pr_err("Failed to allocate CPUs mask\n");
+		return ret;
+	}
+	record__mmap_cpu_mask_init(&cpus_mask, cpus);
+
+	ret = record__thread_mask_alloc(&full_mask, cpu__max_cpu().cpu);
+	if (ret) {
+		pr_err("Failed to allocate full mask\n");
+		goto out_free_cpu_mask;
+	}
+
+	ret = record__thread_mask_alloc(&thread_mask, cpu__max_cpu().cpu);
+	if (ret) {
+		pr_err("Failed to allocate thread mask\n");
+		goto out_free_full_and_cpu_masks;
+	}
+
+	for (s = 0; s < nr_spec; s++) {
+		ret = record__mmap_cpu_mask_init_spec(&thread_mask.maps, maps_spec[s]);
+		if (ret) {
+			pr_err("Failed to initialize maps thread mask\n");
+			goto out_free;
+		}
+		ret = record__mmap_cpu_mask_init_spec(&thread_mask.affinity, affinity_spec[s]);
+		if (ret) {
+			pr_err("Failed to initialize affinity thread mask\n");
+			goto out_free;
+		}
+
+		/* ignore invalid CPUs but do not allow empty masks */
+		if (!bitmap_and(thread_mask.maps.bits, thread_mask.maps.bits,
+				cpus_mask.bits, thread_mask.maps.nbits)) {
+			pr_err("Empty maps mask: %s\n", maps_spec[s]);
+			ret = -EINVAL;
+			goto out_free;
+		}
+		if (!bitmap_and(thread_mask.affinity.bits, thread_mask.affinity.bits,
+				cpus_mask.bits, thread_mask.affinity.nbits)) {
+			pr_err("Empty affinity mask: %s\n", affinity_spec[s]);
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		/* do not allow intersection with other masks (full_mask) */
+		if (bitmap_intersects(thread_mask.maps.bits, full_mask.maps.bits,
+				      thread_mask.maps.nbits)) {
+			pr_err("Intersecting maps mask: %s\n", maps_spec[s]);
+			ret = -EINVAL;
+			goto out_free;
+		}
+		if (bitmap_intersects(thread_mask.affinity.bits, full_mask.affinity.bits,
+				      thread_mask.affinity.nbits)) {
+			pr_err("Intersecting affinity mask: %s\n", affinity_spec[s]);
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		bitmap_or(full_mask.maps.bits, full_mask.maps.bits,
+			  thread_mask.maps.bits, full_mask.maps.nbits);
+		bitmap_or(full_mask.affinity.bits, full_mask.affinity.bits,
+			  thread_mask.affinity.bits, full_mask.maps.nbits);
+
+		thread_masks = realloc(rec->thread_masks, (t + 1) * sizeof(struct thread_mask));
+		if (!thread_masks) {
+			pr_err("Failed to reallocate thread masks\n");
+			ret = -ENOMEM;
+			goto out_free;
+		}
+		rec->thread_masks = thread_masks;
+		rec->thread_masks[t] = thread_mask;
+		if (verbose) {
+			pr_debug("thread_masks[%d]: ", t);
+			mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
+			pr_debug("thread_masks[%d]: ", t);
+			mmap_cpu_mask__scnprintf(&rec->thread_masks[t].affinity, "affinity");
+		}
+		t++;
+		ret = record__thread_mask_alloc(&thread_mask, cpu__max_cpu().cpu);
+		if (ret) {
+			pr_err("Failed to allocate thread mask\n");
+			goto out_free_full_and_cpu_masks;
+		}
+	}
+	rec->nr_threads = t;
+	pr_debug("nr_threads: %d\n", rec->nr_threads);
+	if (!rec->nr_threads)
+		ret = -EINVAL;
+
+out_free:
+	record__thread_mask_free(&thread_mask);
+out_free_full_and_cpu_masks:
+	record__thread_mask_free(&full_mask);
+out_free_cpu_mask:
+	record__mmap_cpu_mask_free(&cpus_mask);
+
+	return ret;
+}
+
+static int record__init_thread_core_masks(struct record *rec, struct perf_cpu_map *cpus)
+{
+	int ret;
+	struct cpu_topology *topo;
+
+	topo = cpu_topology__new();
+	if (!topo) {
+		pr_err("Failed to allocate CPU topology\n");
+		return -ENOMEM;
+	}
+
+	ret = record__init_thread_masks_spec(rec, cpus, topo->core_cpus_list,
+					     topo->core_cpus_list, topo->core_cpus_lists);
+	cpu_topology__delete(topo);
+
+	return ret;
+}
+
+static int record__init_thread_package_masks(struct record *rec, struct perf_cpu_map *cpus)
+{
+	int ret;
+	struct cpu_topology *topo;
+
+	topo = cpu_topology__new();
+	if (!topo) {
+		pr_err("Failed to allocate CPU topology\n");
+		return -ENOMEM;
+	}
+
+	ret = record__init_thread_masks_spec(rec, cpus, topo->package_cpus_list,
+					     topo->package_cpus_list, topo->package_cpus_lists);
+	cpu_topology__delete(topo);
+
+	return ret;
+}
+
+static int record__init_thread_numa_masks(struct record *rec, struct perf_cpu_map *cpus)
+{
+	u32 s;
+	int ret;
+	const char **spec;
+	struct numa_topology *topo;
+
+	topo = numa_topology__new();
+	if (!topo) {
+		pr_err("Failed to allocate NUMA topology\n");
+		return -ENOMEM;
+	}
+
+	spec = zalloc(topo->nr * sizeof(char *));
+	if (!spec) {
+		pr_err("Failed to allocate NUMA spec\n");
+		ret = -ENOMEM;
+		goto out_delete_topo;
+	}
+	for (s = 0; s < topo->nr; s++)
+		spec[s] = topo->nodes[s].cpus;
+
+	ret = record__init_thread_masks_spec(rec, cpus, spec, spec, topo->nr);
+
+	zfree(&spec);
+
+out_delete_topo:
+	numa_topology__delete(topo);
+
+	return ret;
+}
+
+static int record__init_thread_user_masks(struct record *rec, struct perf_cpu_map *cpus)
+{
+	int t, ret;
+	u32 s, nr_spec = 0;
+	char **maps_spec = NULL, **affinity_spec = NULL, **tmp_spec;
+	char *user_spec, *spec, *spec_ptr, *mask, *mask_ptr, *dup_mask = NULL;
+
+	for (t = 0, user_spec = (char *)rec->opts.threads_user_spec; ; t++, user_spec = NULL) {
+		spec = strtok_r(user_spec, ":", &spec_ptr);
+		if (spec == NULL)
+			break;
+		pr_debug2("threads_spec[%d]: %s\n", t, spec);
+		mask = strtok_r(spec, "/", &mask_ptr);
+		if (mask == NULL)
+			break;
+		pr_debug2("  maps mask: %s\n", mask);
+		tmp_spec = realloc(maps_spec, (nr_spec + 1) * sizeof(char *));
+		if (!tmp_spec) {
+			pr_err("Failed to reallocate maps spec\n");
+			ret = -ENOMEM;
+			goto out_free;
+		}
+		maps_spec = tmp_spec;
+		maps_spec[nr_spec] = dup_mask = strdup(mask);
+		if (!maps_spec[nr_spec]) {
+			pr_err("Failed to allocate maps spec[%d]\n", nr_spec);
+			ret = -ENOMEM;
+			goto out_free;
+		}
+		mask = strtok_r(NULL, "/", &mask_ptr);
+		if (mask == NULL) {
+			pr_err("Invalid thread maps or affinity specs\n");
+			ret = -EINVAL;
+			goto out_free;
+		}
+		pr_debug2("  affinity mask: %s\n", mask);
+		tmp_spec = realloc(affinity_spec, (nr_spec + 1) * sizeof(char *));
+		if (!tmp_spec) {
+			pr_err("Failed to reallocate affinity spec\n");
+			ret = -ENOMEM;
+			goto out_free;
+		}
+		affinity_spec = tmp_spec;
+		affinity_spec[nr_spec] = strdup(mask);
+		if (!affinity_spec[nr_spec]) {
+			pr_err("Failed to allocate affinity spec[%d]\n", nr_spec);
+			ret = -ENOMEM;
+			goto out_free;
+		}
+		dup_mask = NULL;
+		nr_spec++;
+	}
+
+	ret = record__init_thread_masks_spec(rec, cpus, (const char **)maps_spec,
+					     (const char **)affinity_spec, nr_spec);
+
+out_free:
+	free(dup_mask);
+	for (s = 0; s < nr_spec; s++) {
+		if (maps_spec)
+			free(maps_spec[s]);
+		if (affinity_spec)
+			free(affinity_spec[s]);
+	}
+	free(affinity_spec);
+	free(maps_spec);
+
+	return ret;
+}
+
 static int record__init_thread_default_masks(struct record *rec, struct perf_cpu_map *cpus)
 {
 	int ret;
···
 
 static int record__init_thread_masks(struct record *rec)
 {
+	int ret = 0;
 	struct perf_cpu_map *cpus = rec->evlist->core.cpus;
 
 	if (!record__threads_enabled(rec))
 		return record__init_thread_default_masks(rec, cpus);
 
-	return record__init_thread_cpu_masks(rec, cpus);
+	switch (rec->opts.threads_spec) {
+	case THREAD_SPEC__CPU:
+		ret = record__init_thread_cpu_masks(rec, cpus);
+		break;
+	case THREAD_SPEC__CORE:
+		ret = record__init_thread_core_masks(rec, cpus);
+		break;
+	case THREAD_SPEC__PACKAGE:
+		ret = record__init_thread_package_masks(rec, cpus);
+		break;
+	case THREAD_SPEC__NUMA:
+		ret = record__init_thread_numa_masks(rec, cpus);
+		break;
+	case THREAD_SPEC__USER:
+		ret = record__init_thread_user_masks(rec, cpus);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
 }
 
 int cmd_record(int argc, const char **argv)
tools/perf/util/record.h (+1)
···
 	bool ctl_fd_close;
 	int synth;
 	int threads_spec;
+	const char *threads_user_spec;
 };
 
 extern const char * const *record_usage;