Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf metric: JSON flag to default metric group

For the default output, the default metric group can vary across
platforms. For example, on SPR both the TopdownL1 and TopdownL2 metrics
should be displayed in the default mode, while on ICL only the TopdownL1
metrics should be displayed.

Add a flag so the default metric group can be tagged per platform in the
event JSON rather than hard-coded in the perf tool.

The flag is added to the Intel TopdownL1 metrics since ICL and ADL, and
to the TopdownL2 metrics since SPR.

Add a new field, DefaultMetricgroupName, in the JSON files to indicate
the real metric group name.
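As an illustration, here is an abridged entry after the change (field
values taken from the sapphirerapids hunks in this commit): the new
DefaultMetricgroupName field records the real group, while "Default" is
added to MetricGroup and MetricgroupNoGroup:

```json
{
    "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction",
    "DefaultMetricgroupName": "TopdownL2",
    "MetricGroup": "BadSpec;BrMispredicts;Default;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
    "MetricName": "tma_branch_mispredicts",
    "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
    "MetricgroupNoGroup": "TopdownL2;Default"
}
```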

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230615135315.3662428-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Kan Liang, committed by Arnaldo Carvalho de Melo
969a4661 e15e4a3d

+114 -76
+27 -18
tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
@@ -129,33 +129,36 @@
     },
     {
         "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_core_slots",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.1",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "tma_backend_bound",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound_aux",
         "MetricThreshold": "tma_backend_bound_aux > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "(tma_info_core_slots - (cpu_atom@TOPDOWN_FE_BOUND.ALL@ + cpu_atom@TOPDOWN_BE_BOUND.ALL@ + cpu_atom@TOPDOWN_RETIRING.ALL@)) / tma_info_core_slots",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
@@ -298,11 +295,12 @@
     },
     {
         "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to frontend stalls.",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_core_slots",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
@@ -726,11 +722,12 @@
     },
     {
         "BriefDescription": "Counts the numer of issue slots that result in retirement slots.",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_core_slots",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.75",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
@@ -837,22 +832,24 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
@@ -1119,11 +1112,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) - cpu_core@INT_MISC.UOP_DROPPING@ / tma_info_thread_slots",
-        "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
@@ -2324,11 +2316,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
+15 -10
tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
@@ -94,31 +94,34 @@
     },
     {
         "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_core_slots",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.1",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "tma_backend_bound",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound_aux",
         "MetricThreshold": "tma_backend_bound_aux > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "(tma_info_core_slots - (TOPDOWN_FE_BOUND.ALL + TOPDOWN_BE_BOUND.ALL + TOPDOWN_RETIRING.ALL)) / tma_info_core_slots",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
         "ScaleUnit": "100%"
     },
@@ -246,11 +243,12 @@
     },
     {
         "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to frontend stalls.",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_core_slots",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "ScaleUnit": "100%"
     },
     {
@@ -616,11 +612,12 @@
     },
     {
         "BriefDescription": "Counts the numer of issue slots that result in retirement slots.",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_core_slots",
-        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.75",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "ScaleUnit": "100%"
     },
     {
+12 -8
tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
@@ -111,21 +111,23 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
@@ -374,11 +372,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
-        "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
@@ -1381,11 +1378,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
+12 -8
tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@@ -315,21 +315,23 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
@@ -578,11 +576,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
-        "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
@@ -1677,11 +1674,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
+36 -24
tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
@@ -340,31 +340,34 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction",
+        "DefaultMetricgroupName": "TopdownL2",
         "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
-        "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
+        "MetricGroup": "BadSpec;BrMispredicts;Default;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL2",
+        "MetricgroupNoGroup": "TopdownL2;Default",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
@@ -410,11 +407,12 @@
     },
     {
         "BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck",
+        "DefaultMetricgroupName": "TopdownL2",
         "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)",
-        "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
+        "MetricGroup": "Backend;Compute;Default;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
        "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL2",
+        "MetricgroupNoGroup": "TopdownL2;Default",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
@@ -513,11 +509,12 @@
     },
     {
         "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues",
+        "DefaultMetricgroupName": "TopdownL2",
         "MetricExpr": "max(0, tma_frontend_bound - tma_fetch_latency)",
-        "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
+        "MetricGroup": "Default;FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / 6 > 0.35",
-        "MetricgroupNoGroup": "TopdownL2",
+        "MetricgroupNoGroup": "TopdownL2;Default",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. 
Related metrics: tma_dsb_switches, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 522 519 "ScaleUnit": "100%" 523 520 }, 524 521 { 525 522 "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues", 523 + "DefaultMetricgroupName": "TopdownL2", 526 524 "MetricExpr": "topdown\\-fetch\\-lat / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots", 527 - "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group", 525 + "MetricGroup": "Default;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group", 528 526 "MetricName": "tma_fetch_latency", 529 527 "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15", 530 - "MetricgroupNoGroup": "TopdownL2", 528 + "MetricgroupNoGroup": "TopdownL2;Default", 531 529 "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. 
Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", 532 530 "ScaleUnit": "100%" 533 531 }, ··· 617 611 }, 618 612 { 619 613 "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", 614 + "DefaultMetricgroupName": "TopdownL1", 620 615 "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots", 621 - "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", 616 + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", 622 617 "MetricName": "tma_frontend_bound", 623 618 "MetricThreshold": "tma_frontend_bound > 0.15", 624 - "MetricgroupNoGroup": "TopdownL1", 619 + "MetricgroupNoGroup": "TopdownL1;Default", 625 620 "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. 
Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS", 626 621 "ScaleUnit": "100%" 627 622 }, ··· 637 630 }, 638 631 { 639 632 "BriefDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences", 633 + "DefaultMetricgroupName": "TopdownL2", 640 634 "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots", 641 - "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group", 635 + "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group", 642 636 "MetricName": "tma_heavy_operations", 643 637 "MetricThreshold": "tma_heavy_operations > 0.1", 644 - "MetricgroupNoGroup": "TopdownL2", 638 + "MetricgroupNoGroup": "TopdownL2;Default", 645 639 "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences. 
Sample with: UOPS_RETIRED.HEAVY", 646 640 "ScaleUnit": "100%" 647 641 }, ··· 1494 1486 }, 1495 1487 { 1496 1488 "BriefDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)", 1489 + "DefaultMetricgroupName": "TopdownL2", 1497 1490 "MetricExpr": "max(0, tma_retiring - tma_heavy_operations)", 1498 - "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group", 1491 + "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group", 1499 1492 "MetricName": "tma_light_operations", 1500 1493 "MetricThreshold": "tma_light_operations > 0.6", 1501 - "MetricgroupNoGroup": "TopdownL2", 1494 + "MetricgroupNoGroup": "TopdownL2;Default", 1502 1495 "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. 
Sample with: INST_RETIRED.PREC_DIST", 1503 1496 "ScaleUnit": "100%" 1504 1497 }, ··· 1549 1540 }, 1550 1541 { 1551 1542 "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears", 1543 + "DefaultMetricgroupName": "TopdownL2", 1552 1544 "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts)", 1553 - "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", 1545 + "MetricGroup": "BadSpec;Default;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", 1554 1546 "MetricName": "tma_machine_clears", 1555 1547 "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15", 1556 - "MetricgroupNoGroup": "TopdownL2", 1548 + "MetricgroupNoGroup": "TopdownL2;Default", 1557 1549 "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. 
Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache", 1558 1550 "ScaleUnit": "100%" 1559 1551 }, ··· 1586 1576 }, 1587 1577 { 1588 1578 "BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck", 1579 + "DefaultMetricgroupName": "TopdownL2", 1589 1580 "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots", 1590 - "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group", 1581 + "MetricGroup": "Backend;Default;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group", 1591 1582 "MetricName": "tma_memory_bound", 1592 1583 "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2", 1593 - "MetricgroupNoGroup": "TopdownL2", 1584 + "MetricgroupNoGroup": "TopdownL2;Default", 1594 1585 "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).", 1595 1586 "ScaleUnit": "100%" 1596 1587 }, ··· 1795 1784 }, 1796 1785 { 1797 1786 "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. 
issued uops that eventually get retired", 1787 + "DefaultMetricgroupName": "TopdownL1", 1798 1788 "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots", 1799 - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", 1789 + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", 1800 1790 "MetricName": "tma_retiring", 1801 1791 "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1", 1802 - "MetricgroupNoGroup": "TopdownL1", 1792 + "MetricgroupNoGroup": "TopdownL1;Default", 1803 1793 "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS", 1804 1794 "ScaleUnit": "100%" 1805 1795 },
+12 -8
tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
···
      {
          "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+         "DefaultMetricgroupName": "TopdownL1",
          "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots",
-         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+         "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
          "MetricName": "tma_backend_bound",
          "MetricThreshold": "tma_backend_bound > 0.2",
-         "MetricgroupNoGroup": "TopdownL1",
+         "MetricgroupNoGroup": "TopdownL1;Default",
          "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
          "ScaleUnit": "100%"
      },
      {
          "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+         "DefaultMetricgroupName": "TopdownL1",
          "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
-         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+         "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
          "MetricName": "tma_bad_speculation",
          "MetricThreshold": "tma_bad_speculation > 0.15",
-         "MetricgroupNoGroup": "TopdownL1",
+         "MetricgroupNoGroup": "TopdownL1;Default",
          "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
          "ScaleUnit": "100%"
      },
···
      {
          "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+         "DefaultMetricgroupName": "TopdownL1",
          "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
-         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+         "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
          "MetricName": "tma_frontend_bound",
          "MetricThreshold": "tma_frontend_bound > 0.15",
-         "MetricgroupNoGroup": "TopdownL1",
+         "MetricgroupNoGroup": "TopdownL1;Default",
          "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
          "ScaleUnit": "100%"
      },
···
      {
          "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+         "DefaultMetricgroupName": "TopdownL1",
          "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
-         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+         "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
          "MetricName": "tma_retiring",
          "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
-         "MetricgroupNoGroup": "TopdownL1",
+         "MetricgroupNoGroup": "TopdownL1;Default",
          "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
          "ScaleUnit": "100%"
      },
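The pattern in these hunks is uniform: a metric opts into the default view by adding `Default` to its semicolon-separated `MetricGroup` list (and to `MetricgroupNoGroup`), while `DefaultMetricgroupName` records which real group (TopdownL1 or TopdownL2) it belongs to. A small, hypothetical Python helper (not part of the perf source; the function name and sample entries are made up for illustration) can sketch how a consumer of these JSON files might collect the per-platform default groups:

```python
def default_metric_groups(metrics):
    """Group metrics flagged 'Default' by their DefaultMetricgroupName.

    `metrics` mimics entries from a perf pmu-events JSON file; this
    helper is illustrative only, not the actual perf implementation.
    """
    defaults = {}
    for m in metrics:
        # MetricGroup is a semicolon-separated list of group tags.
        groups = m.get("MetricGroup", "").split(";")
        if "Default" in groups:
            # The real group name (e.g. TopdownL1) lives in a separate field.
            name = m.get("DefaultMetricgroupName")
            defaults.setdefault(name, []).append(m["MetricName"])
    return defaults

# Sample entries mirroring the shape of the patched JSON files.
sample = [
    {"MetricName": "tma_retiring",
     "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
     "DefaultMetricgroupName": "TopdownL1",
     "MetricgroupNoGroup": "TopdownL1;Default"},
    {"MetricName": "tma_fetch_latency",
     "MetricGroup": "Default;Frontend;TmaL2;TopdownL2;tma_L2_group",
     "DefaultMetricgroupName": "TopdownL2",
     "MetricgroupNoGroup": "TopdownL2;Default"},
    # Not flagged Default: excluded from the default view.
    {"MetricName": "tma_dram_bound",
     "MetricGroup": "MemoryBound;TmaL3"},
]

print(default_metric_groups(sample))
# → {'TopdownL1': ['tma_retiring'], 'TopdownL2': ['tma_fetch_latency']}
```

On a platform such as TGL/ICL only TopdownL1 metrics carry the flag, so only that group appears; on SPR both TopdownL1 and TopdownL2 would, which is exactly how the flag lets the default output vary per platform without hacking the perf code.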