Coresight - HW Assisted Tracing on ARM
======================================

   Author:   Mathieu Poirier <mathieu.poirier@linaro.org>
   Date:     September 11th, 2014

Introduction
------------

Coresight is an umbrella of technologies allowing for the debugging of ARM
based SoCs.  It includes solutions for JTAG and HW assisted tracing.  This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines.  ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users.  From there the stream
flows through the coresight system (via the ATB bus) using links that connect
the emanating source to one or more sinks.  Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.
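As an illustration of the source, link and sink roles described above, the following toy C model follows one trace path from a source to a sink. It is a hypothetical sketch: `enum cs_kind`, `struct cs_node` and `cs_trace_path()` are invented for this example and are not the coresight framework's data structures. Each component is also given a single downstream connection, whereas a real replicator feeds several outputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of trace components; not the framework's types. */
enum cs_kind { CS_SOURCE, CS_LINK, CS_SINK };

struct cs_node {
	const char *name;
	enum cs_kind kind;
	const struct cs_node *out;	/* downstream hop on the ATB, or NULL */
};

/*
 * Walk from a source through any links until a sink is reached.
 * The names of the visited components are stored in path[] (at most max
 * entries).  Returns the number of components on the path, or -1 if the
 * walk does not start at a source or does not end in a sink within max hops.
 */
int cs_trace_path(const struct cs_node *src, const char **path, int max)
{
	const struct cs_node *cur;
	int n = 0;

	assert(path != NULL && max > 0);
	if (src == NULL || src->kind != CS_SOURCE)
		return -1;

	for (cur = src; cur != NULL && n < max; cur = cur->out) {
		path[n++] = cur->name;
		if (cur->kind == CS_SINK)
			return n;	/* the trace data ends up here */
	}
	return -1;	/* dangling link, or path longer than max */
}
```

For an ETM feeding a funnel that feeds an ETB, calling `cs_trace_path()` on the ETM node returns 3, with `path[]` holding the three component names in order from source to sink.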

A typical coresight system would look like this:

   [Figure: block diagram of a typical CoreSight system (Source: ARM Ltd.).
    CPUs with their ETM/PTM tracers and CTIs, an STM, system memory and a
    DAP (SWD/JTAG) are connected through the AMBA AXI, the AMBA Debug APB,
    the Cross Trigger Matrix (CTM) and the AMBA Advanced Trace Bus (ATB);
    trace flows through a funnel and a replicator (REP) to an ETB and to a
    TPIU leading to the trace port.]

     DAP  = Debug Access Port
     ETM  = Embedded Trace Macrocell
     PTM  = Program Trace Macrocell
     CTI  = Cross Trigger Interface
     ETB  = Embedded Trace Buffer
     TPIU = Trace Port Interface Unit
     SWD  = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus.  The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform.  This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB.  Future work will enable more
intricate IP blocks such as STM and CTI.


Acronyms and Classification
---------------------------

Acronyms:

PTM:     Program Trace Macrocell
ETM:     Embedded Trace Macrocell
STM:     System Trace Macrocell
ETB:     Embedded Trace Buffer
ITM:     Instrumentation Trace Macrocell
TPIU:    Trace Port Interface Unit
TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO
CTI:     Cross Trigger Interface

Classification:

Source:
   ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Link:
   Funnel, replicator (intelligent or not), TMC-ETF (used as a FIFO)
Sinks:
   ETBv1.0, ETBv1.1, TPIU, TMC-ETR, TMC-ETF (used as a buffer)
Misc:
   CTI


Device Tree Bindings
--------------------

See Documentation/devicetree/bindings/arm/coresight.txt for details.

As of this writing drivers for ITM, STMs and CTIs are not provided but are
expected to be added as the solution matures.


Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform.
Any coresight compliant device can register with the framework as long as it
uses the right APIs:

struct coresight_device *coresight_register(struct coresight_desc *desc);
void coresight_unregister(struct coresight_device *csdev);

The registering function takes a "struct coresight_desc *desc" and registers
the device with the core framework.  The unregister function takes a reference
to a "struct coresight_device", obtained at registration time.

If everything goes well during the registration process the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform:

root:~# ls /sys/bus/coresight/devices/
replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
root:~#

The "struct coresight_desc" looks like this:

struct coresight_desc {
	enum coresight_dev_type type;
	struct coresight_dev_subtype subtype;
	const struct coresight_ops *ops;
	struct coresight_platform_data *pdata;
	struct device *dev;
	const struct attribute_group **groups;
};


The "coresight_dev_type" identifies what the device is, i.e., source, link or
sink, while the "coresight_dev_subtype" characterises that type further.

The "struct coresight_ops" is mandatory and tells the framework how to
perform base operations related to the components, each component having
a different set of requirements.  For that purpose "struct coresight_ops_sink",
"struct coresight_ops_link" and "struct coresight_ops_source" have been
provided.

The next field, "struct coresight_platform_data *pdata", is acquired by calling
"of_get_coresight_platform_data()" as part of the driver's _probe routine, and
"struct device *dev" gets the device reference embedded in the "amba_device":

static int etm_probe(struct amba_device *adev, const struct amba_id *id)
{
	...
	...
	drvdata->dev = &adev->dev;
	...
}

Specific classes of device (source, link, or sink) have generic operations
that can be performed on them (see "struct coresight_ops").  The "**groups"
field is a list of sysfs entries pertaining to operations specific to that
component only.  "Implementation defined" customisations are expected to be
accessed and controlled using those entries.


Device Naming scheme
--------------------

The devices that appear on the "coresight" bus were named the same as their
parent devices, i.e., the real devices that appear on the AMBA bus or the
platform bus.  Thus the names were based on the Linux Open Firmware layer
naming convention, which follows the base physical address of the device
followed by the device type, e.g.:

root:~# ls /sys/bus/coresight/devices/
 20010000.etf  20040000.funnel      20100000.stm     22040000.etm
 22140000.etm  230c0000.funnel      23240000.etm     20030000.tpiu
 20070000.etr  20120000.replicator  220c0000.funnel
 23040000.etm  23140000.etm         23340000.etm

However, with the introduction of ACPI support, the names of the real
devices are a bit cryptic and non-obvious.  Thus, a new naming scheme was
introduced to use more generic names based on the type of the device.  The
following rules apply:

  1) Devices that are bound to CPUs are named based on the CPU logical
     number.

     e.g., the ETM bound to CPU0 is named "etm0"

  2) All other devices follow the pattern "<device_type_prefix>N", where:

       <device_type_prefix> - A prefix specific to the type of the device
       N                    - A sequential number assigned based on the
                              order of probing.

     e.g., tmc_etf0, tmc_etr0, funnel0, funnel1

Thus, with the new scheme the devices could appear as:

root:~# ls /sys/bus/coresight/devices/
 etm0     etm1     etm2         etm3  etm4      etm5      funnel0
 funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0

Some of the examples below might refer to the old naming scheme and some
to the newer scheme, so that whatever you see on your system is not
unexpected.  One must use the "names" as they appear on the system under
the specified locations.

How to use the tracer modules
-----------------------------

There are two ways to use the Coresight framework: 1) using the perf cmd line
tools and 2) interacting directly with the Coresight devices using the sysFS
interface.  Preference is given to the former as using the sysFS interface
requires a deep understanding of the Coresight HW.  The following sections
provide details on using both methods.

1) Using the sysFS interface:

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment.  As a generic operation, all devices pertaining to the sink
class have an "enable_sink" entry in sysfs:

root:/sys/bus/coresight/devices# ls
replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
root:/sys/bus/coresight/devices# ls 20010000.etb
enable_sink  status  trigger_cntr
root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
1
root:/sys/bus/coresight/devices#

At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range.
As such, "enabling" a source will immediately trigger a trace capture:

root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
1
root:/sys/bus/coresight/devices# cat 20010000.etb/status
Depth:          0x2000
Status:         0x1
RAM read ptr:   0x0
RAM wrt ptr:    0x19d3   <----- The write pointer is moving
Trigger cnt:    0x0
Control:        0x1
Flush status:   0x0
Flush ctrl:     0x2001
root:/sys/bus/coresight/devices#

Trace collection is stopped the same way:

root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
root:/sys/bus/coresight/devices#

The content of the ETB buffer can be harvested directly from /dev:

root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
of=~/cstrace.bin

64+0 records in
64+0 records out
32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
root:/sys/bus/coresight/devices#

The file cstrace.bin can be decoded using "ptm2human", DS-5 or Trace32.

The following is DS-5 output for an experimental loop that increments a
variable up to a certain value.  The example is simple and yet provides a
glimpse of the wealth of possibilities that coresight provides.

Info                            Tracing enabled
Instruction     106378866       0x8026B53C      E52DE004        false   PUSH     {lr}
Instruction     0               0x8026B540      E24DD00C        false   SUB      sp,sp,#0xc
Instruction     0               0x8026B544      E3A03000        false   MOV      r3,#0
Instruction     0               0x8026B548      E58D3004        false   STR      r3,[sp,#4]
Instruction     0               0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
Instruction     0               0x8026B550      E3530004        false   CMP      r3,#4
Instruction     0               0x8026B554      E2833001        false   ADD      r3,r3,#1
Instruction     0               0x8026B558      E58D3004        false   STR      r3,[sp,#4]
Instruction     0               0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
Timestamp                       Timestamp: 17106715833
Instruction     319             0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
Instruction     0               0x8026B550      E3530004        false   CMP      r3,#4
Instruction     0               0x8026B554      E2833001        false   ADD      r3,r3,#1
Instruction     0               0x8026B558      E58D3004        false   STR      r3,[sp,#4]
Instruction     0               0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
Instruction     9               0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
Instruction     0               0x8026B550      E3530004        false   CMP      r3,#4
Instruction     0               0x8026B554      E2833001        false   ADD      r3,r3,#1
Instruction     0               0x8026B558      E58D3004        false   STR      r3,[sp,#4]
Instruction     0               0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
Instruction     7               0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
Instruction     0               0x8026B550      E3530004        false   CMP      r3,#4
Instruction     0               0x8026B554      E2833001        false   ADD      r3,r3,#1
Instruction     0               0x8026B558      E58D3004        false   STR      r3,[sp,#4]
Instruction     0               0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
Instruction     7               0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
Instruction     0               0x8026B550      E3530004        false   CMP      r3,#4
Instruction     0               0x8026B554      E2833001        false   ADD      r3,r3,#1
Instruction     0               0x8026B558      E58D3004        false   STR      r3,[sp,#4]
Instruction     0               0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
Instruction     10              0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
Instruction     0               0x8026B550      E3530004        false   CMP      r3,#4
Instruction     0               0x8026B554      E2833001        false   ADD      r3,r3,#1
Instruction     0               0x8026B558      E58D3004        false   STR      r3,[sp,#4]
Instruction     0               0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
Instruction     6               0x8026B560      EE1D3F30        false   MRC      p15,#0x0,r3,c13,c0,#1
Instruction     0               0x8026B564      E1A0100D        false   MOV      r1,sp
Instruction     0               0x8026B568      E3C12D7F        false   BIC      r2,r1,#0x1fc0
Instruction     0               0x8026B56C      E3C2203F        false   BIC      r2,r2,#0x3f
Instruction     0               0x8026B570      E59D1004        false   LDR      r1,[sp,#4]
Instruction     0               0x8026B574      E59F0010        false   LDR      r0,[pc,#16] ; [0x8026B58C] = 0x80550368
Instruction     0               0x8026B578      E592200C        false   LDR      r2,[r2,#0xc]
Instruction     0               0x8026B57C      E59221D0        false   LDR      r2,[r2,#0x1d0]
Instruction     0               0x8026B580      EB07A4CF        true    BL       {pc}+0x1e9344 ; 0x804548c4
Info                            Tracing enabled
Instruction     13570831        0x8026B584      E28DD00C        false   ADD      sp,sp,#0xc
Instruction     0               0x8026B588      E8BD8000        true    LDM      sp!,{pc}
Timestamp                       Timestamp: 17107041535

2) Using the perf framework:

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction.  As such the perf framework takes charge of
controlling when tracing gets enabled based on when the process of interest is
scheduled.  When configured in a system, Coresight PMUs will be listed when
queried by the perf command line tool:

	linaro@linaro-nano:~$ ./perf list pmu

		List of pre-defined events (to be used in -e):

		cs_etm//                                    [Kernel PMU event]

	linaro@linaro-nano:~$

Regardless of the number of tracers available in a system (usually equal to the
number of processor cores), the "cs_etm" PMU will be listed only once.

A Coresight PMU works the same way as any other PMU, i.e., the name of the PMU
is listed along with configuration options within forward slashes '/'.  Since a
Coresight system will typically have more than one sink, the name of the sink to
work with needs to be specified as an event option.
On newer kernels the available sinks are listed in sysFS under
($SYSFS)/bus/event_source/devices/cs_etm/sinks/:

	root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
	tmc_etf0  tmc_etr0  tpiu0

On older kernels, this may need to be found from the list of coresight devices,
available under ($SYSFS)/bus/coresight/devices/:

	root:~# ls /sys/bus/coresight/devices/
	 etm0     etm1     etm2         etm3  etm4      etm5      funnel0
	 funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0

	root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program

As mentioned above in section "Device Naming scheme", the names of the devices
could look different from what is used in the example above.  One must use the
device names as they appear in sysFS.

The syntax within the forward slashes '/' is important.  The '@' character
tells the parser that a sink is about to be specified and that this is the sink
to use for the trace session.

More information on the above and other examples of how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the openCSD GitHub
repository [3].

2.1) AutoFDO analysis using the perf tools:

perf can be used to record and analyze traces of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g.:

	perf record -e cs_etm/@tmc_etr0/u --per-thread

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see perf documentation).

Note that only 64-bit programs are currently supported; further work is
required to support instruction decode of 32-bit Arm programs.
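The sink discovery and the '@' sink syntax described above lend themselves to a small helper. The C sketch below lists the entries of the cs_etm sinks directory and builds the "cs_etm/@<sink>/u" event string used in the examples. It assumes a kernel that exposes that directory, and the function names are hypothetical. The trailing 'u' is the standard perf modifier restricting the event to user space.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Location of the sinks directory on newer kernels. */
#define CS_ETM_SINKS_DIR "/sys/bus/event_source/devices/cs_etm/sinks"

/*
 * Build the perf event specification "cs_etm/@<sink>/u".  The '@' tells
 * the perf parser that a sink name follows; the trailing 'u' limits
 * tracing to user space.  Returns 0 on success, -1 if buf is too small.
 */
int cs_etm_event_spec(char *buf, size_t len, const char *sink)
{
	int n;

	assert(buf != NULL && sink != NULL);
	n = snprintf(buf, len, "cs_etm/@%s/u", sink);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Print the event specification for every sink found in dir (normally
 * CS_ETM_SINKS_DIR).  Returns the number of sinks, or -1 if the directory
 * cannot be opened, e.g. on an older kernel or when cs_etm is absent.
 */
int cs_etm_list_sinks(const char *dir)
{
	char spec[256];
	struct dirent *de;
	int count = 0;
	DIR *d = opendir(dir);

	if (d == NULL)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		if (cs_etm_event_spec(spec, sizeof(spec), de->d_name) == 0)
			printf("%s\n", spec);
		count++;
	}
	closedir(d);
	return count;
}
```

On the system shown above, `cs_etm_list_sinks(CS_ETM_SINKS_DIR)` would print one specification per sink (tmc_etf0, tmc_etr0, tpiu0) in directory order. A return value of -1 suggests falling back to /sys/bus/coresight/devices/, as described for older kernels.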


Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option, in which case tracing data is
removed and replaced with the synthesized events, e.g.:

	perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for AutoFDO.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).

	$ gcc-5 -O3 sort.c -o sort
	$ taskset -c 2 ./sort
	Bubble sorting array of 30000 elements
	5910 ms

	$ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
	Bubble sorting array of 30000 elements
	12543 ms
	[ perf record: Woken up 35 times to write data ]
	[ perf record: Captured and wrote 69.640 MB perf.data ]

	$ perf inject -i perf.data -o inj.data --itrace=il64 --strip
	$ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ taskset -c 2 ./sort_autofdo
	Bubble sorting array of 30000 elements
	5806 ms


How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as using the tracers; the
only difference is that clients, rather than the program flow through the
code, drive the trace capture.
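In practice, a client driving STM capture can be as simple as a process that writes to the STM character device once a sink and the STM source have been enabled through sysfs. Both steps are shown later in this section. The C sketch below assumes plain POSIX `open()`/`write()` and the device names used in this section's examples (tmc_etf0, stm0, /dev/stm0), which will differ between platforms. The helper names are hypothetical. The sketch does not handle how writes are mapped to STM masters and channels; [2] covers that.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write the whole of s to path, retrying on short writes and EINTR.
 * Works for sysfs attributes ("1" to enable) as well as for the STM
 * character device.  Returns 0 on success, -1 on error (errno is set).
 */
int write_str(const char *path, const char *s)
{
	size_t len, off = 0;
	int fd;

	assert(path != NULL && s != NULL);
	len = strlen(s);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	while (off < len) {
		ssize_t n = write(fd, s + off, len - off);

		if (n < 0) {
			int saved = errno;

			if (saved == EINTR)
				continue;
			close(fd);
			errno = saved;	/* report the write error, not close() */
			return -1;
		}
		off += (size_t)n;
	}
	return close(fd);
}

/*
 * Hypothetical helper following this section's sequence: enable a sink,
 * enable the STM source, then emit one message.  Must run as root.
 */
int stm_trace_message(const char *msg)
{
	if (write_str("/sys/bus/coresight/devices/tmc_etf0/enable_sink", "1") ||
	    write_str("/sys/bus/coresight/devices/stm0/enable_source", "1"))
		return -1;
	return write_str("/dev/stm0", msg);
}
```

Calling `stm_trace_message("hello\n")` as root performs the three steps in order. If either sysfs write fails, the message is not sent.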

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs, with more information on each entry in [1]:

root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
enable_source   hwevent_select  port_enable  subsystem  uevent
hwevent_enable  mgmt            port_select  traceid
root@genericarmv8:~#

As with any other source, a sink needs to be identified and the STM enabled
before use:

root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source

From there, user space applications can request and use channels through the
character device interface provided for that purpose by the generic STM API:

root@genericarmv8:~# ls -l /dev/stm0
crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0
root@genericarmv8:~#

Details on how to use the generic STM API can be found in [2].

[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.rst
[3]. https://github.com/Linaro/perf-opencsd