Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc

Currently, the perf tool infrastructure uses the disasm_line__parse()
function to parse a disassembled line.

Example snippet from objdump:

objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>

c0000000010224b4: lwz r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract the instruction name,
register names and offset.
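
As a rough illustration of that text-based extraction, the sketch below
pulls the mnemonic, target register, offset and base register out of a
D-form style line with sscanf(). It is not perf's disasm_line__parse();
"struct dform_insn" and "parse_dform_line" are hypothetical names, and
only the "name rT,off(rA)" shape is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the parsed pieces; not a perf structure. */
struct dform_insn {
	char name[16];	/* mnemonic, e.g. "lwz" */
	int  rt;	/* target register number */
	int  offset;	/* displacement in bytes */
	int  ra;	/* base register number */
};

/*
 * Parse a line shaped like "lwz r10,0(r9)".
 * Returns 0 on success, -1 if the line does not match that shape.
 */
static int parse_dform_line(const char *line, struct dform_insn *out)
{
	assert(line != NULL && out != NULL);

	memset(out, 0, sizeof(*out));
	/* %15s leaves room for the terminating NUL in name[16]. */
	if (sscanf(line, " %15s r%d,%d(r%d)", out->name,
		   &out->rt, &out->offset, &out->ra) != 4)
		return -1;
	return 0;
}
```

For "lwz r10,0(r9)" this yields name "lwz", rt 10, offset 0 and ra 9.
A line without operands, such as "nop", is rejected with -1.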

On powerpc, data type profiling uses the raw instruction instead of
the objdump result to identify the instruction category and extract
the source/target registers.

Example: 38 01 81 e8 ld r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of the raw instruction.
Also update "struct disasm_line" to save the binary code.
With this change, the function captures:

line -> "38 01 81 e8 ld r4,312(r1)"
raw instruction "38 01 81 e8"

The raw instruction is used later to extract the reg/offset fields.
Macros are added to extract the opcode and register fields.
"struct disasm_line" is updated to carry a 32-bit union of "bytes" and
"raw_insn" that holds the raw code (raw).

Function "disasm_line__parse_powerpc" fills in the raw instruction hex
value, and the macros can then be used to get the opcode. There are no
changes in the existing code paths, which parse the disassembled code.
The size of the raw instruction depends on the architecture.
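
The conversion from the textual bytes to a 32-bit value can be sketched
as follows. This is a simplified standalone version, not the function in
the patch. "parse_raw_insn", "strip_spaces" and "swap32" are
hypothetical names, and swap32() stands in for be32_to_cpu() under the
assumption of a little-endian host. As in the patch, the swap is applied
only when a mnemonic follows the raw bytes (objdump-style line).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RAW_BYTES 11	/* strlen("38 01 81 e8"): 8 hex digits + 3 spaces */

/* Drop ' ' characters in place, in the style of remove_spaces(). */
static void strip_spaces(char *s)
{
	char *d = s;

	do {
		while (*d == ' ')
			++d;
	} while ((*s++ = *d++));
}

/* Unconditional byte swap (assumption: little-endian host). */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Parse the leading "xx xx xx xx" of a line into *insn.
 * *has_mnemonic is set when text follows the raw bytes.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_raw_insn(const char *line, uint32_t *insn, int *has_mnemonic)
{
	char buf[RAW_BYTES + 1];
	unsigned int val;

	assert(line != NULL && insn != NULL && has_mnemonic != NULL);

	while (*line == ' ' || *line == '\t')
		line++;
	if (strlen(line) < RAW_BYTES)
		return -1;

	memcpy(buf, line, RAW_BYTES);
	buf[RAW_BYTES] = '\0';
	strip_spaces(buf);		/* "38 01 81 e8" -> "380181e8" */

	if (sscanf(buf, "%x", &val) != 1)
		return -1;

	*has_mnemonic = strlen(line) > RAW_BYTES;
	*insn = *has_mnemonic ? swap32(val) : val;
	return 0;
}
```

For the line "38 01 81 e8 ld r4,312(r1)" the helper reports a mnemonic
and stores 0xe8810138. Input shorter than the raw byte field is
rejected.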

In the case of powerpc, parsing the disasm line needs to handle both
reading binary code directly from the DSO and parsing the objdump
result. Hence the logic is added in a separate function instead of
updating "disasm_line__parse". Architectures that use the instruction
name and the present approach are not altered. Since this approach
targets powerpc, the macro implementation is added only for powerpc
for now.

Since disasm_line__parse is used in other cases (perf annotate) and
not only data type profiling, the powerpc callback includes changes to
work with binary code as well as the mnemonic representation.

Also, if the DSO read fails and libcapstone is not supported, the
approach falls back to using objdump. Hence the patch also includes
changes to ensure the objdump option works well.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-5-atrajeev@linux.vnet.ibm.com
[ Add check for strndup() result ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Athira Rajeev and committed by Arnaldo Carvalho de Melo
06dd4c5a b1d8d968

6 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
--- a/tools/include/linux/string.h
+++ b/tools/include/linux/string.h
@@ -46,5 +46,7 @@
 
 extern char *strim(char *);
 
+extern void remove_spaces(char *s);
+
 extern void *memchr_inv(const void *start, int c, size_t bytes);
 #endif /* _TOOLS_LINUX_STRING_H_ */
diff --git a/tools/lib/string.c b/tools/lib/string.c
--- a/tools/lib/string.c
+++ b/tools/lib/string.c
@@ -153,6 +153,19 @@
 	return skip_spaces(s);
 }
 
+/*
+ * remove_spaces - Removes whitespaces from @s
+ */
+void remove_spaces(char *s)
+{
+	char *d = s;
+
+	do {
+		while (*d == ' ')
+			++d;
+	} while ((*s++ = *d++));
+}
+
 /**
  * strreplace - Replace all occurrences of character in string.
  * @s: The string to operate on.
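diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -55,6 +55,7 @@
 		arch->initialized = true;
 		arch->associate_instruction_ops = powerpc__associate_instruction_ops;
 		arch->objdump.comment_char = '#';
+		annotate_opts.show_asm_raw = true;
 	}
 
 	return 0;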
diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -98,3 +98,12 @@
 			return roff->ptregs_offset;
 	return -EINVAL;
 }
+
+#define PPC_OP(op)	(((op) >> 26) & 0x3F)
+#define PPC_RA(a)	(((a) >> 16) & 0x1f)
+#define PPC_RT(t)	(((t) >> 21) & 0x1f)
+#define PPC_RB(b)	(((b) >> 11) & 0x1f)
+#define PPC_D(D)	((D) & 0xfffe)
+#define PPC_DS(DS)	((DS) & 0xfffc)
+#define OP_LD	58
+#define OP_STD	62
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -113,7 +113,10 @@
 struct disasm_line {
 	struct ins		 ins;
 	struct ins_operands	 ops;
-
+	union {
+		u8 bytes[4];
+		u32 raw_insn;
+	} raw;
 	/* This needs to be at the end. */
 	struct annotation_line	 al;
 };
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -44,6 +44,7 @@
 
 static void ins__sort(struct arch *arch);
 static int disasm_line__parse(char *line, const char **namep, char **rawp);
+static int disasm_line__parse_powerpc(struct disasm_line *dl);
 
 static __attribute__((constructor)) void symbol__init_regexpr(void)
 {
@@ -845,6 +846,51 @@
 	return -1;
 }
 
+/*
+ * Parses the result captured from symbol__disassemble_*
+ * Example, line read from DSO file in powerpc:
+ * line:    38 01 81 e8
+ * opcode: fetched from arch specific get_opcode_insn
+ * rawp_insn: e8810138
+ *
+ * rawp_insn is used later to extract the reg/offset fields
+ */
+#define	PPC_OP(op)	(((op) >> 26) & 0x3F)
+#define	RAW_BYTES	11
+
+static int disasm_line__parse_powerpc(struct disasm_line *dl)
+{
+	char *line = dl->al.line;
+	const char **namep = &dl->ins.name;
+	char **rawp = &dl->ops.raw;
+	char *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
+	char *name = skip_spaces(name_raw_insn + RAW_BYTES);
+	int objdump = 0;
+
+	if (strlen(line) > RAW_BYTES)
+		objdump = 1;
+
+	if (name_raw_insn[0] == '\0')
+		return -1;
+
+	if (objdump) {
+		disasm_line__parse(name, namep, rawp);
+	} else
+		*namep = "";
+
+	tmp_raw_insn = strndup(name_raw_insn, 11);
+	if (tmp_raw_insn == NULL)
+		return -1;
+
+	remove_spaces(tmp_raw_insn);
+
+	sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn);
+	if (objdump)
+		dl->raw.raw_insn = be32_to_cpu(dl->raw.raw_insn);
+
+	return 0;
+}
+
 static void annotation_line__init(struct annotation_line *al,
 				  struct annotate_args *args,
 				  int nr)
@@ -898,7 +944,10 @@
 			goto out_delete;
 
 		if (args->offset != -1) {
-			if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
+			if (arch__is(args->arch, "powerpc")) {
+				if (disasm_line__parse_powerpc(dl) < 0)
+					goto out_free_line;
+			} else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
 				goto out_free_line;
 
 			disasm_line__init_ins(dl, args->arch, &args->ms);