Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf mem: Add statistics for peer snooping

The flag PERF_MEM_SNOOPX_PEER was added to report that a load was
served by snooping a peer cache line; the data can come from a peer
core, a peer cluster, or a remote NUMA node.

This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER. Note that
PERF_MEM_SNOOPX_PEER is treated as supplementary information that works
together with the cache level statistics. Therefore, when the flag is
set, a load operation is accounted in both the cache level's metrics
(e.g. ld_l2hit, ld_llchit) and the peer related metrics.
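
The dual accounting described above can be sketched in isolation as
follows. This is a minimal illustration, not the code from the patch:
the DEMO_* flag values, struct demo_stats and demo_account_load() are
hypothetical stand-ins for the perf definitions, and the HITM handling
of the L3 path is left out for brevity.

```c
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real perf UAPI values. */
#define DEMO_LVL_L2		(1u << 0)
#define DEMO_LVL_L3		(1u << 1)
#define DEMO_SNOOPX_PEER	(1u << 0)

/* Hypothetical subset of the load counters. */
struct demo_stats {
	uint32_t ld_l2hit;
	uint32_t ld_llchit;
	uint32_t lcl_peer;
	uint32_t tot_peer;
};

/*
 * Account one load: the cache level counter is always bumped, and the
 * peer counters are bumped in addition when the peer flag is set.
 */
static void demo_account_load(struct demo_stats *stats,
			      uint64_t lvl, uint64_t snoopx)
{
	if (lvl & DEMO_LVL_L2) {
		stats->ld_l2hit++;
		if (snoopx & DEMO_SNOOPX_PEER) {
			stats->lcl_peer++;
			stats->tot_peer++;
		}
	}

	if (lvl & DEMO_LVL_L3) {
		stats->ld_llchit++;	/* HITM case omitted in this sketch */
		if (snoopx & DEMO_SNOOPX_PEER) {
			stats->lcl_peer++;
			stats->tot_peer++;
		}
	}
}
```

With this shape, an L2 load with the peer flag set shows up once in
ld_l2hit and once in lcl_peer, so the cache level totals stay
comparable to a run without peer information.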

Three new metrics are introduced: 'lcl_peer' counts local cache
accesses, 'rmt_peer' counts remote accesses (remote DRAM and any cache
in a remote node), and 'tot_peer' is the sum of 'lcl_peer' and
'rmt_peer'.
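
The relationship between the three metrics can be sketched as below.
This is an illustration under stated assumptions, not the patch itself:
struct demo_peer_stats and the demo_peer_*() helpers are hypothetical
names. Every increment goes through one helper that also bumps the
total, so 'tot_peer' stays equal to 'lcl_peer' + 'rmt_peer', including
after two sets of counters are merged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three peer metrics. */
struct demo_peer_stats {
	uint32_t lcl_peer;	/* loads served by a local peer cache */
	uint32_t rmt_peer;	/* loads served by a remote peer */
	uint32_t tot_peer;	/* sum of the two counters above */
};

/* Bump the local or remote counter together with the total. */
static void demo_peer_inc(struct demo_peer_stats *s, bool remote)
{
	if (remote)
		s->rmt_peer++;
	else
		s->lcl_peer++;
	s->tot_peer++;
}

/* Merge 'add' into 's' field by field. */
static void demo_peer_add(struct demo_peer_stats *s,
			  const struct demo_peer_stats *add)
{
	s->lcl_peer += add->lcl_peer;
	s->rmt_peer += add->rmt_peer;
	s->tot_peer += add->tot_peer;
}

/* Check that the total matches the sum of its parts. */
static bool demo_peer_consistent(const struct demo_peer_stats *s)
{
	return s->tot_peer == s->lcl_peer + s->rmt_peer;
}
```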

Reviewed-by: Ali Saidi <alisaidi@amazon.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Like Xu <likexu@tencent.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Timothy Hayes <timothy.hayes@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220811062451.435810-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Leo Yan and committed by Arnaldo Carvalho de Melo
e843dec5 4e6430cb

+28 -3
tools/perf/util/mem-events.c (+25 -3)

···
 	u64 op = data_src->mem_op;
 	u64 lvl = data_src->mem_lvl;
 	u64 snoop = data_src->mem_snoop;
+	u64 snoopx = data_src->mem_snoopx;
 	u64 lock = data_src->mem_lock;
 	u64 blk = data_src->mem_blk;
 	/*
···
 do {				\
 	stats->__f++;		\
 	stats->tot_hitm++;	\
+} while (0)
+
+#define PEER_INC(__f)		\
+do {				\
+	stats->__f++;		\
+	stats->tot_peer++;	\
 } while (0)
 
 #define P(a, b) PERF_MEM_##a##_##b
···
 		if (lvl & P(LVL, IO)) stats->ld_io++;
 		if (lvl & P(LVL, LFB)) stats->ld_fbhit++;
 		if (lvl & P(LVL, L1 )) stats->ld_l1hit++;
-		if (lvl & P(LVL, L2 )) stats->ld_l2hit++;
+		if (lvl & P(LVL, L2)) {
+			stats->ld_l2hit++;
+
+			if (snoopx & P(SNOOPX, PEER))
+				PEER_INC(lcl_peer);
+		}
 		if (lvl & P(LVL, L3 )) {
 			if (snoop & P(SNOOP, HITM))
 				HITM_INC(lcl_hitm);
 			else
 				stats->ld_llchit++;
+
+			if (snoopx & P(SNOOPX, PEER))
+				PEER_INC(lcl_peer);
 		}
 
 		if (lvl & P(LVL, LOC_RAM)) {
···
 		if ((lvl & P(LVL, REM_CCE1)) ||
 		    (lvl & P(LVL, REM_CCE2)) ||
 		     mrem) {
-			if (snoop & P(SNOOP, HIT))
+			if (snoop & P(SNOOP, HIT)) {
 				stats->rmt_hit++;
-			else if (snoop & P(SNOOP, HITM))
+			} else if (snoop & P(SNOOP, HITM)) {
 				HITM_INC(rmt_hitm);
+			} else if (snoopx & P(SNOOPX, PEER)) {
+				stats->rmt_hit++;
+				PEER_INC(rmt_peer);
+			}
 		}
 
 		if ((lvl & P(LVL, MISS)))
···
 	stats->lcl_hitm += add->lcl_hitm;
 	stats->rmt_hitm += add->rmt_hitm;
 	stats->tot_hitm += add->tot_hitm;
+	stats->lcl_peer += add->lcl_peer;
+	stats->rmt_peer += add->rmt_peer;
+	stats->tot_peer += add->tot_peer;
 	stats->rmt_hit += add->rmt_hit;
 	stats->lcl_dram += add->lcl_dram;
 	stats->rmt_dram += add->rmt_dram;
tools/perf/util/mem-events.h (+3)

···
 	u32 lcl_hitm; /* count of loads with local HITM */
 	u32 rmt_hitm; /* count of loads with remote HITM */
 	u32 tot_hitm; /* count of loads with local and remote HITM */
+	u32 lcl_peer; /* count of loads with local peer cache */
+	u32 rmt_peer; /* count of loads with remote peer cache */
+	u32 tot_peer; /* count of loads with local and remote peer cache */
 	u32 rmt_hit; /* count of loads with remote hit clean; */
 	u32 lcl_dram; /* count of loads miss to local DRAM */
 	u32 rmt_dram; /* count of loads miss to remote DRAM */