
Merge tag 'trace-ring-buffer-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

- tracing/ring-buffer: persistent buffer across reboots

This allows for the tracing instance ring buffer to stay persistent
across reboots. The way this is done is by adding to the kernel
command line:

trace_instance=boot_map@0x285400000:12M

This will reserve 12 megabytes at the address 0x285400000, and then
map the tracing instance "boot_map" ring buffer to that memory. This
will appear as a normal instance in the tracefs system:

/sys/kernel/tracing/instances/boot_map
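For illustration, here is a small user-space sketch of parsing that name@address:size form. The helper is hypothetical (the kernel's real option parsing uses memparse() for the size suffix):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's trace_instance= parsing:
 * split "name@addr:size" into its parts. The K/M/G suffix handling
 * mimics what memparse() does in the kernel.
 */
static int parse_boot_instance(const char *arg, char *name, size_t name_len,
			       unsigned long long *addr,
			       unsigned long long *size)
{
	const char *at = strchr(arg, '@');
	const char *colon;
	char *end;

	if (!at)
		return -1;
	colon = strchr(at, ':');
	if (!colon)
		return -1;

	/* Instance name is everything before the '@' */
	snprintf(name, name_len, "%.*s", (int)(at - arg), arg);

	/* Base 0 accepts the "0x" prefix used for physical addresses */
	*addr = strtoull(at + 1, &end, 0);
	if (end != colon)
		return -1;

	*size = strtoull(colon + 1, &end, 0);
	switch (*end) {
	case 'G': *size <<= 10; /* fall through */
	case 'M': *size <<= 10; /* fall through */
	case 'K': *size <<= 10;
		break;
	case '\0':
		break;
	default:
		return -1;
	}
	return 0;
}
```

With the example above, "boot_map@0x285400000:12M" yields the instance name "boot_map", the physical address 0x285400000, and a 12 megabyte size.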

A user could enable tracing in that instance, and on reboot or kernel
crash, if the memory is not wiped by the firmware, it will recreate
the trace in that instance. For example, if one were debugging the
shutdown path of a kernel reboot:

# cd /sys/kernel/tracing
# echo function > instances/boot_map/current_tracer
# reboot
[..]
# cd /sys/kernel/tracing
# tail instances/boot_map/trace
swapper/0-1 [000] d..1. 164.549800: restore_boot_irq_mode <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549801: native_restore_boot_irq_mode <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549802: disconnect_bsp_APIC <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549811: hpet_disable <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549812: iommu_shutdown_noop <-native_machine_restart
swapper/0-1 [000] d..1. 164.549813: native_machine_emergency_restart <-__do_sys_reboot
swapper/0-1 [000] d..1. 164.549813: tboot_shutdown <-native_machine_emergency_restart
swapper/0-1 [000] d..1. 164.549820: acpi_reboot <-native_machine_emergency_restart
swapper/0-1 [000] d..1. 164.549821: acpi_reset <-acpi_reboot
swapper/0-1 [000] d..1. 164.549822: acpi_os_write_port <-acpi_reboot

On reboot, the buffer is examined to make sure it is valid. The
validation check even steps through every event to make sure the meta
data of the event is correct. If any test fails, it will simply reset
the buffer, and the buffer will be empty on boot.
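A simplified user-space sketch of that per-event validation walk follows. The event layout here is made up; the real check in rb_validate_buffer() also verifies timestamps and event types:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy event record: just a length field. The real ring buffer events
 * carry type and timestamp information as well.
 */
struct fake_event {
	unsigned int len;	/* total length of this event in bytes */
};

/*
 * Walk every event in a sub-buffer and check that the recorded
 * lengths never run past the sub-buffer's commit offset. Any
 * inconsistency means the buffer must be reset and left empty.
 */
static int validate_subbuf(const unsigned char *data, size_t commit)
{
	size_t off = 0;

	while (off < commit) {
		const struct fake_event *ev =
			(const struct fake_event *)(data + off);

		if (ev->len == 0 || off + ev->len > commit)
			return 0;	/* corrupt: caller resets buffer */
		off += ev->len;
	}
	return off == commit;		/* must land exactly on commit */
}
```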

- Allow the tracing persistent boot buffer to use the "reserve_mem"
option

Instead of having the admin find a physical address to store the
persistent buffer, which can be very tedious if they have to
administer several different machines, allow them to use the
"reserve_mem" option that will find a location for them. It is not as
reliable because of KASLR, as loading the kernel at different
locations can cause the allocated memory to differ between boots.
Booting with "nokaslr" can make reserve_mem more reliable.

- Have the function graph tracer handle offsets from a previous boot

The ring buffer output from a previous boot may have different
addresses due to KASLR. Have the function graph tracer handle these
by applying the delta between the previous boot's address space and
that of the new boot.
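A minimal sketch of that delta adjustment (assumed helper name, not the kernel's actual fgraph code): the meta data records the old boot's text base, and every function address read out of the old buffer is shifted before symbol lookup.

```c
#include <assert.h>

/*
 * Shift a function address recorded under the previous boot's KASLR
 * offset into the current boot's address space. The delta may be
 * negative, so it is computed as a signed value.
 */
static unsigned long adjust_func_addr(unsigned long old_addr,
				      unsigned long old_text_base,
				      unsigned long new_text_base)
{
	long delta = (long)(new_text_base - old_text_base);

	return old_addr + delta;
}
```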

- Only reset the saved meta offset when the buffer is started or reset

The persistent memory meta data holds the previous boot's address
space information so that the delta needed to make function tracing
work can be calculated. This information used to be overwritten with
the current address space as soon as it was read. But if the buffer
was not used during that boot, then on the next reboot the delta
would be calculated from that intermediate boot rather than from the
boot that actually recorded the data in the ring buffer, causing the
functions not to be shown. Do not save the address space information
of the current kernel until the buffer is being recorded to.

- Add a magic value to validate the meta data

Add a magic value in the meta data that can also be used for
validation. The validator of the previous buffer does not need this
magic value, but it guards against the meta data being changed by a
new kernel into a format that happens to pass the validator while
being used differently. This magic number can also be used as a
"versioning" of the meta data.

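A minimal sketch of such a magic/struct-size check (the 0xBADFEED value appears in the pulled code as RING_BUFFER_META_MAGIC; the struct here is abbreviated to just those two fields):

```c
#include <assert.h>

#define RB_META_MAGIC	0xBADFEED	/* value from the pulled code */

/* Abbreviated meta data header; the real struct has more fields */
struct meta_hdr {
	int magic;
	int struct_size;
};

/*
 * Reject a meta block whose magic or recorded struct size does not
 * match this kernel's layout, before any deeper validation runs.
 */
static int meta_hdr_valid(const struct meta_hdr *m)
{
	return m->magic == RB_META_MAGIC &&
	       m->struct_size == (int)sizeof(struct meta_hdr);
}
```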
- Align user space mapped ring buffer sub buffers to improve TLB
entries

Linus mentioned that the user space mapped ring buffer sub-buffers
were misaligned relative to the meta page, so if the sub-buffers were
bigger than PAGE_SIZE, the TLB could not use bigger entries.
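One way to picture the fix, as a hedged user-space sketch: round the first sub-buffer up to the sub-buffer size so each one is naturally aligned. ALIGN_UP mirrors the kernel's ALIGN() macro; the sizes are illustrative.

```c
#include <assert.h>

/* Round x up to the next multiple of a (a must be a power of two) */
#define ALIGN_UP(x, a)	(((x) + ((a) - 1)) & ~((unsigned long)(a) - 1))

/*
 * If the meta page were followed immediately by the sub-buffers, a
 * sub-buffer larger than PAGE_SIZE could start at an unaligned
 * address, preventing the TLB from using large entries. Aligning the
 * first sub-buffer to the sub-buffer size keeps them all aligned.
 */
static unsigned long first_subbuf_offset(unsigned long meta_size,
					 unsigned long subbuf_size)
{
	return ALIGN_UP(meta_size, subbuf_size);
}
```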

- Add new kernel command line "traceoff" to disable tracing on boot for
instances

If tracing is enabled for a boot instance, there needs to be a way to
disable it at boot so that new events are not written into the ring
buffer and mixed with events from a previous boot, as that can be
confusing.

- Allow trace_printk() to go to other instances

Currently, trace_printk() can only go to the top level instance. When
debugging with a persistent buffer, it is really useful to have
trace_printk() go to that buffer instead, so that the messages are
still accessible after a crash.

- Do not use "bin_printk()" for traces to a boot instance

bin_printk() saves only a pointer to the printk format in the ring
buffer, since a reader of the buffer in the same boot still has
access to the format string. That is not the case when the buffer is
from a previous boot. If trace_printk() is going to a "persistent"
buffer, it will use the slower version that writes the printk format
itself into the buffer.
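A toy illustration of the difference (made-up structs, not the kernel's): the binary path stores only a format pointer, which is meaningless after a reboot, while the persistent path must copy the format text itself into the record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Binary path: only a pointer, valid solely within the same boot */
struct bin_record {
	const char *fmt;
};

/* Persistent path: the format text itself lives in the record */
struct full_record {
	char fmt[64];
};

/*
 * Copy the format string into the record so a later boot reading the
 * preserved buffer can still interpret it.
 */
static void record_persistent(struct full_record *r, const char *fmt)
{
	snprintf(r->fmt, sizeof(r->fmt), "%s", fmt);
}
```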

- Add command line option to allow trace_printk() to go to an instance

Allow the kernel command line to define which instance the
trace_printk() goes to, instead of forcing the admin to set it for
every boot via the tracefs options.

- Start a document that explains how to use tracefs to debug the kernel

- Add some more kernel selftests to test user mapped ring buffer

* tag 'trace-ring-buffer-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (28 commits)
selftests/ring-buffer: Handle meta-page bigger than the system
selftests/ring-buffer: Verify the entire meta-page padding
tracing/Documentation: Start a document on how to debug with tracing
tracing: Add option to set an instance to be the trace_printk destination
tracing: Have trace_printk not use binary prints if boot buffer
tracing: Allow trace_printk() to go to other instance buffers
tracing: Add "traceoff" flag to boot time tracing instances
ring-buffer: Align meta-page to sub-buffers for improved TLB usage
ring-buffer: Add magic and struct size to boot up meta data
ring-buffer: Don't reset persistent ring-buffer meta saved addresses
tracing/fgraph: Have fgraph handle previous boot function addresses
tracing: Allow boot instances to use reserve_mem boot memory
tracing: Fix ifdef of snapshots to not prevent last_boot_info file
ring-buffer: Use vma_pages() helper function
tracing: Fix NULL vs IS_ERR() check in enable_instances()
tracing: Add last boot delta offset for stack traces
tracing: Update function tracing output for previous boot buffer
tracing: Handle old buffer mappings for event strings and functions
tracing/ring-buffer: Add last_boot_info file to boot instance
ring-buffer: Save text and data locations in mapped meta data
...

 Documentation/admin-guide/kernel-parameters.txt |  45 +
 Documentation/trace/debugging.rst (new file)    | 159 +
 Documentation/trace/ftrace.rst                  |  12 +
 include/linux/ring_buffer.h                     |  20 +
 kernel/trace/ring_buffer.c                      | 842 + 105 -
 ...
 1482 insertions(+), 151 deletions(-)
return ret; 1932 + 1933 + m = file->private_data; 1934 + m->private = buffer->buffers[cpu]; 1935 + 1936 + return 0; 1937 + } 1938 + 1939 + /* Map the buffer_pages to the previous head and commit pages */ 1940 + static void rb_meta_buffer_update(struct ring_buffer_per_cpu *cpu_buffer, 1941 + struct buffer_page *bpage) 1942 + { 1943 + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; 1944 + 1945 + if (meta->head_buffer == (unsigned long)bpage->page) 1946 + cpu_buffer->head_page = bpage; 1947 + 1948 + if (meta->commit_buffer == (unsigned long)bpage->page) { 1949 + cpu_buffer->commit_page = bpage; 1950 + cpu_buffer->tail_page = bpage; 1951 + } 1952 + } 1953 + 1514 1954 static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, 1515 1955 long nr_pages, struct list_head *pages) 1516 1956 { 1957 + struct trace_buffer *buffer = cpu_buffer->buffer; 1958 + struct ring_buffer_meta *meta = NULL; 1517 1959 struct buffer_page *bpage, *tmp; 1518 1960 bool user_thread = current->mm != NULL; 1519 1961 gfp_t mflags; ··· 2023 1515 */ 2024 1516 if (user_thread) 2025 1517 set_current_oom_origin(); 1518 + 1519 + if (buffer->range_addr_start) 1520 + meta = rb_range_meta(buffer, nr_pages, cpu_buffer->cpu); 1521 + 2026 1522 for (i = 0; i < nr_pages; i++) { 2027 1523 struct page *page; 2028 1524 ··· 2037 1525 2038 1526 rb_check_bpage(cpu_buffer, bpage); 2039 1527 2040 - list_add(&bpage->list, pages); 1528 + /* 1529 + * Append the pages as for mapped buffers we want to keep 1530 + * the order 1531 + */ 1532 + list_add_tail(&bpage->list, pages); 2041 1533 2042 - page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), 2043 - mflags | __GFP_COMP | __GFP_ZERO, 2044 - cpu_buffer->buffer->subbuf_order); 2045 - if (!page) 2046 - goto free_pages; 2047 - bpage->page = page_address(page); 1534 + if (meta) { 1535 + /* A range was given. 
Use that for the buffer page */ 1536 + bpage->page = rb_range_buffer(cpu_buffer, i + 1); 1537 + if (!bpage->page) 1538 + goto free_pages; 1539 + /* If this is valid from a previous boot */ 1540 + if (meta->head_buffer) 1541 + rb_meta_buffer_update(cpu_buffer, bpage); 1542 + bpage->range = 1; 1543 + bpage->id = i + 1; 1544 + } else { 1545 + page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), 1546 + mflags | __GFP_COMP | __GFP_ZERO, 1547 + cpu_buffer->buffer->subbuf_order); 1548 + if (!page) 1549 + goto free_pages; 1550 + bpage->page = page_address(page); 1551 + rb_init_page(bpage->page); 1552 + } 2048 1553 bpage->order = cpu_buffer->buffer->subbuf_order; 2049 - rb_init_page(bpage->page); 2050 1554 2051 1555 if (user_thread && fatal_signal_pending(current)) 2052 1556 goto free_pages; ··· 2112 1584 rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) 2113 1585 { 2114 1586 struct ring_buffer_per_cpu *cpu_buffer; 1587 + struct ring_buffer_meta *meta; 2115 1588 struct buffer_page *bpage; 2116 1589 struct page *page; 2117 1590 int ret; ··· 2143 1614 2144 1615 cpu_buffer->reader_page = bpage; 2145 1616 2146 - page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_COMP | __GFP_ZERO, 2147 - cpu_buffer->buffer->subbuf_order); 2148 - if (!page) 2149 - goto fail_free_reader; 2150 - bpage->page = page_address(page); 2151 - rb_init_page(bpage->page); 1617 + if (buffer->range_addr_start) { 1618 + /* 1619 + * Range mapped buffers have the same restrictions as memory 1620 + * mapped ones do. 
1621 + */ 1622 + cpu_buffer->mapped = 1; 1623 + cpu_buffer->ring_meta = rb_range_meta(buffer, nr_pages, cpu); 1624 + bpage->page = rb_range_buffer(cpu_buffer, 0); 1625 + if (!bpage->page) 1626 + goto fail_free_reader; 1627 + if (cpu_buffer->ring_meta->head_buffer) 1628 + rb_meta_buffer_update(cpu_buffer, bpage); 1629 + bpage->range = 1; 1630 + } else { 1631 + page = alloc_pages_node(cpu_to_node(cpu), 1632 + GFP_KERNEL | __GFP_COMP | __GFP_ZERO, 1633 + cpu_buffer->buffer->subbuf_order); 1634 + if (!page) 1635 + goto fail_free_reader; 1636 + bpage->page = page_address(page); 1637 + rb_init_page(bpage->page); 1638 + } 2152 1639 2153 1640 INIT_LIST_HEAD(&cpu_buffer->reader_page->list); 2154 1641 INIT_LIST_HEAD(&cpu_buffer->new_pages); ··· 2173 1628 if (ret < 0) 2174 1629 goto fail_free_reader; 2175 1630 2176 - cpu_buffer->head_page 2177 - = list_entry(cpu_buffer->pages, struct buffer_page, list); 2178 - cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page; 1631 + rb_meta_validate_events(cpu_buffer); 2179 1632 2180 - rb_head_page_activate(cpu_buffer); 1633 + /* If the boot meta was valid then this has already been updated */ 1634 + meta = cpu_buffer->ring_meta; 1635 + if (!meta || !meta->head_buffer || 1636 + !cpu_buffer->head_page || !cpu_buffer->commit_page || !cpu_buffer->tail_page) { 1637 + if (meta && meta->head_buffer && 1638 + (cpu_buffer->head_page || cpu_buffer->commit_page || cpu_buffer->tail_page)) { 1639 + pr_warn("Ring buffer meta buffers not all mapped\n"); 1640 + if (!cpu_buffer->head_page) 1641 + pr_warn(" Missing head_page\n"); 1642 + if (!cpu_buffer->commit_page) 1643 + pr_warn(" Missing commit_page\n"); 1644 + if (!cpu_buffer->tail_page) 1645 + pr_warn(" Missing tail_page\n"); 1646 + } 1647 + 1648 + cpu_buffer->head_page 1649 + = list_entry(cpu_buffer->pages, struct buffer_page, list); 1650 + cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page; 1651 + 1652 + rb_head_page_activate(cpu_buffer); 1653 + 1654 + if 
(cpu_buffer->ring_meta) 1655 + meta->commit_buffer = meta->head_buffer; 1656 + } else { 1657 + /* The valid meta buffer still needs to activate the head page */ 1658 + rb_head_page_activate(cpu_buffer); 1659 + } 2181 1660 2182 1661 return cpu_buffer; 2183 1662 ··· 2238 1669 kfree(cpu_buffer); 2239 1670 } 2240 1671 2241 - /** 2242 - * __ring_buffer_alloc - allocate a new ring_buffer 2243 - * @size: the size in bytes per cpu that is needed. 2244 - * @flags: attributes to set for the ring buffer. 2245 - * @key: ring buffer reader_lock_key. 2246 - * 2247 - * Currently the only flag that is available is the RB_FL_OVERWRITE 2248 - * flag. This flag means that the buffer will overwrite old data 2249 - * when the buffer wraps. If this flag is not set, the buffer will 2250 - * drop data when the tail hits the head. 2251 - */ 2252 - struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, 2253 - struct lock_class_key *key) 1672 + static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags, 1673 + int order, unsigned long start, 1674 + unsigned long end, 1675 + struct lock_class_key *key) 2254 1676 { 2255 1677 struct trace_buffer *buffer; 2256 1678 long nr_pages; 1679 + int subbuf_size; 2257 1680 int bsize; 2258 1681 int cpu; 2259 1682 int ret; ··· 2259 1698 if (!zalloc_cpumask_var(&buffer->cpumask, GFP_KERNEL)) 2260 1699 goto fail_free_buffer; 2261 1700 2262 - /* Default buffer page size - one system page */ 2263 - buffer->subbuf_order = 0; 2264 - buffer->subbuf_size = PAGE_SIZE - BUF_PAGE_HDR_SIZE; 1701 + buffer->subbuf_order = order; 1702 + subbuf_size = (PAGE_SIZE << order); 1703 + buffer->subbuf_size = subbuf_size - BUF_PAGE_HDR_SIZE; 2265 1704 2266 1705 /* Max payload is buffer page size - header (8bytes) */ 2267 1706 buffer->max_data_size = buffer->subbuf_size - (sizeof(u32) * 2); 2268 1707 2269 - nr_pages = DIV_ROUND_UP(size, buffer->subbuf_size); 2270 1708 buffer->flags = flags; 2271 1709 buffer->clock = trace_clock_local; 2272 
1710 buffer->reader_lock_key = key; 2273 1711 2274 1712 init_irq_work(&buffer->irq_work.work, rb_wake_up_waiters); 2275 1713 init_waitqueue_head(&buffer->irq_work.waiters); 2276 - 2277 - /* need at least two pages */ 2278 - if (nr_pages < 2) 2279 - nr_pages = 2; 2280 1714 2281 1715 buffer->cpus = nr_cpu_ids; 2282 1716 ··· 2280 1724 GFP_KERNEL); 2281 1725 if (!buffer->buffers) 2282 1726 goto fail_free_cpumask; 1727 + 1728 + /* If start/end are specified, then that overrides size */ 1729 + if (start && end) { 1730 + unsigned long ptr; 1731 + int n; 1732 + 1733 + size = end - start; 1734 + size = size / nr_cpu_ids; 1735 + 1736 + /* 1737 + * The number of sub-buffers (nr_pages) is determined by the 1738 + * total size allocated minus the meta data size. 1739 + * Then that is divided by the number of per CPU buffers 1740 + * needed, plus account for the integer array index that 1741 + * will be appended to the meta data. 1742 + */ 1743 + nr_pages = (size - sizeof(struct ring_buffer_meta)) / 1744 + (subbuf_size + sizeof(int)); 1745 + /* Need at least two pages plus the reader page */ 1746 + if (nr_pages < 3) 1747 + goto fail_free_buffers; 1748 + 1749 + again: 1750 + /* Make sure that the size fits aligned */ 1751 + for (n = 0, ptr = start; n < nr_cpu_ids; n++) { 1752 + ptr += sizeof(struct ring_buffer_meta) + 1753 + sizeof(int) * nr_pages; 1754 + ptr = ALIGN(ptr, subbuf_size); 1755 + ptr += subbuf_size * nr_pages; 1756 + } 1757 + if (ptr > end) { 1758 + if (nr_pages <= 3) 1759 + goto fail_free_buffers; 1760 + nr_pages--; 1761 + goto again; 1762 + } 1763 + 1764 + /* nr_pages should not count the reader page */ 1765 + nr_pages--; 1766 + buffer->range_addr_start = start; 1767 + buffer->range_addr_end = end; 1768 + 1769 + rb_range_meta_init(buffer, nr_pages); 1770 + } else { 1771 + 1772 + /* need at least two pages */ 1773 + nr_pages = DIV_ROUND_UP(size, buffer->subbuf_size); 1774 + if (nr_pages < 2) 1775 + nr_pages = 2; 1776 + } 2283 1777 2284 1778 cpu = 
raw_smp_processor_id(); 2285 1779 cpumask_set_cpu(cpu, buffer->cpumask); ··· 2359 1753 kfree(buffer); 2360 1754 return NULL; 2361 1755 } 1756 + 1757 + /** 1758 + * __ring_buffer_alloc - allocate a new ring_buffer 1759 + * @size: the size in bytes per cpu that is needed. 1760 + * @flags: attributes to set for the ring buffer. 1761 + * @key: ring buffer reader_lock_key. 1762 + * 1763 + * Currently the only flag that is available is the RB_FL_OVERWRITE 1764 + * flag. This flag means that the buffer will overwrite old data 1765 + * when the buffer wraps. If this flag is not set, the buffer will 1766 + * drop data when the tail hits the head. 1767 + */ 1768 + struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, 1769 + struct lock_class_key *key) 1770 + { 1771 + /* Default buffer page size - one system page */ 1772 + return alloc_buffer(size, flags, 0, 0, 0, key); 1773 + 1774 + } 1775 + EXPORT_SYMBOL_GPL(__ring_buffer_alloc); 1776 + 1777 + /** 1778 + * __ring_buffer_alloc_range - allocate a new ring_buffer from existing memory 1779 + * @size: the size in bytes per cpu that is needed. 1780 + * @flags: attributes to set for the ring buffer. 1781 + * @start: start of allocated range 1782 + * @range_size: size of allocated range 1783 + * @order: sub-buffer order 1784 + * @key: ring buffer reader_lock_key. 1785 + * 1786 + * Currently the only flag that is available is the RB_FL_OVERWRITE 1787 + * flag. This flag means that the buffer will overwrite old data 1788 + * when the buffer wraps. If this flag is not set, the buffer will 1789 + * drop data when the tail hits the head. 
1790 + */ 1791 + struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flags, 1792 + int order, unsigned long start, 1793 + unsigned long range_size, 1794 + struct lock_class_key *key) 1795 + { 1796 + return alloc_buffer(size, flags, order, start, start + range_size, key); 1797 + } 1798 + 1799 + /** 1800 + * ring_buffer_last_boot_delta - return the delta offset from last boot 1801 + * @buffer: The buffer to return the delta from 1802 + * @text: Return text delta 1803 + * @data: Return data delta 1804 + * 1805 + * Returns: true if the delta is non-zero 1806 + */ 1807 + bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text, 1808 + long *data) 1809 + { 1810 + if (!buffer) 1811 + return false; 1812 + 1813 + if (!buffer->last_text_delta) 1814 + return false; 1815 + 1816 + *text = buffer->last_text_delta; 1817 + *data = buffer->last_data_delta; 1818 + 1819 + return true; 1820 + } 2363 1821 2364 1822 /** 2365 1823 * ring_buffer_free - free a ring buffer. 
··· 3034 2364 iter->next_event = 0; 3035 2365 } 3036 2366 2367 + /* Return the index into the sub-buffers for a given sub-buffer */ 2368 + static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf) 2369 + { 2370 + void *subbuf_array; 2371 + 2372 + subbuf_array = (void *)meta + sizeof(int) * meta->nr_subbufs; 2373 + subbuf_array = (void *)ALIGN((unsigned long)subbuf_array, meta->subbuf_size); 2374 + return (subbuf - subbuf_array) / meta->subbuf_size; 2375 + } 2376 + 2377 + static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer, 2378 + struct buffer_page *next_page) 2379 + { 2380 + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; 2381 + unsigned long old_head = (unsigned long)next_page->page; 2382 + unsigned long new_head; 2383 + 2384 + rb_inc_page(&next_page); 2385 + new_head = (unsigned long)next_page->page; 2386 + 2387 + /* 2388 + * Only move it forward once, if something else came in and 2389 + * moved it forward, then we don't want to touch it. 2390 + */ 2391 + (void)cmpxchg(&meta->head_buffer, old_head, new_head); 2392 + } 2393 + 2394 + static void rb_update_meta_reader(struct ring_buffer_per_cpu *cpu_buffer, 2395 + struct buffer_page *reader) 2396 + { 2397 + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; 2398 + void *old_reader = cpu_buffer->reader_page->page; 2399 + void *new_reader = reader->page; 2400 + int id; 2401 + 2402 + id = reader->id; 2403 + cpu_buffer->reader_page->id = id; 2404 + reader->id = 0; 2405 + 2406 + meta->buffers[0] = rb_meta_subbuf_idx(meta, new_reader); 2407 + meta->buffers[id] = rb_meta_subbuf_idx(meta, old_reader); 2408 + 2409 + /* The head pointer is the one after the reader */ 2410 + rb_update_meta_head(cpu_buffer, reader); 2411 + } 2412 + 3037 2413 /* 3038 2414 * rb_handle_head_page - writer hit the head page 3039 2415 * ··· 3129 2413 local_sub(rb_page_commit(next_page), &cpu_buffer->entries_bytes); 3130 2414 local_inc(&cpu_buffer->pages_lost); 3131 2415 2416 + if 
(cpu_buffer->ring_meta) 2417 + rb_update_meta_head(cpu_buffer, next_page); 3132 2418 /* 3133 2419 * The entries will be zeroed out when we move the 3134 2420 * tail page. ··· 3692 2974 local_set(&cpu_buffer->commit_page->page->commit, 3693 2975 rb_page_write(cpu_buffer->commit_page)); 3694 2976 rb_inc_page(&cpu_buffer->commit_page); 2977 + if (cpu_buffer->ring_meta) { 2978 + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; 2979 + meta->commit_buffer = (unsigned long)cpu_buffer->commit_page->page; 2980 + } 3695 2981 /* add barrier to keep gcc from optimizing too much */ 3696 2982 barrier(); 3697 2983 } ··· 4142 3420 struct rb_event_info *info, 4143 3421 unsigned long tail) 4144 3422 { 4145 - struct ring_buffer_event *event; 4146 3423 struct buffer_data_page *bpage; 4147 3424 u64 ts, delta; 4148 3425 bool full = false; 4149 - int e; 3426 + int ret; 4150 3427 4151 3428 bpage = info->tail_page->page; 4152 3429 ··· 4171 3450 if (atomic_inc_return(this_cpu_ptr(&checking)) != 1) 4172 3451 goto out; 4173 3452 4174 - ts = bpage->time_stamp; 4175 - 4176 - for (e = 0; e < tail; e += rb_event_length(event)) { 4177 - 4178 - event = (struct ring_buffer_event *)(bpage->data + e); 4179 - 4180 - switch (event->type_len) { 4181 - 4182 - case RINGBUF_TYPE_TIME_EXTEND: 4183 - delta = rb_event_time_stamp(event); 4184 - ts += delta; 4185 - break; 4186 - 4187 - case RINGBUF_TYPE_TIME_STAMP: 4188 - delta = rb_event_time_stamp(event); 4189 - delta = rb_fix_abs_ts(delta, ts); 4190 - if (delta < ts) { 4191 - buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n", 4192 - cpu_buffer->cpu, ts, delta); 4193 - } 4194 - ts = delta; 4195 - break; 4196 - 4197 - case RINGBUF_TYPE_PADDING: 4198 - if (event->time_delta == 1) 4199 - break; 4200 - fallthrough; 4201 - case RINGBUF_TYPE_DATA: 4202 - ts += event->time_delta; 4203 - break; 4204 - 4205 - default: 4206 - RB_WARN_ON(cpu_buffer, 1); 3453 + ret = rb_read_data_buffer(bpage, tail, cpu_buffer->cpu, &ts, 
&delta); 3454 + if (ret < 0) { 3455 + if (delta < ts) { 3456 + buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n", 3457 + cpu_buffer->cpu, ts, delta); 3458 + goto out; 4207 3459 } 4208 3460 } 4209 3461 if ((full && ts > info->ts) || ··· 5285 4591 if (!ret) 5286 4592 goto spin; 5287 4593 4594 + if (cpu_buffer->ring_meta) 4595 + rb_update_meta_reader(cpu_buffer, reader); 4596 + 5288 4597 /* 5289 4598 * Yay! We succeeded in replacing the page. 5290 4599 * ··· 5909 5212 { 5910 5213 struct trace_buffer_meta *meta = cpu_buffer->meta_page; 5911 5214 5215 + if (!meta) 5216 + return; 5217 + 5912 5218 meta->reader.read = cpu_buffer->reader_page->read; 5913 5219 meta->reader.id = cpu_buffer->reader_page->id; 5914 5220 meta->reader.lost_events = cpu_buffer->lost_events; ··· 5968 5268 cpu_buffer->lost_events = 0; 5969 5269 cpu_buffer->last_overrun = 0; 5970 5270 5971 - if (cpu_buffer->mapped) 5972 - rb_update_meta_page(cpu_buffer); 5973 - 5974 5271 rb_head_page_activate(cpu_buffer); 5975 5272 cpu_buffer->pages_removed = 0; 5273 + 5274 + if (cpu_buffer->mapped) { 5275 + rb_update_meta_page(cpu_buffer); 5276 + if (cpu_buffer->ring_meta) { 5277 + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; 5278 + meta->commit_buffer = meta->head_buffer; 5279 + } 5280 + } 5976 5281 } 5977 5282 5978 5283 /* Must have disabled the cpu buffer then done a synchronize_rcu */ ··· 6008 5303 void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu) 6009 5304 { 6010 5305 struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu]; 5306 + struct ring_buffer_meta *meta; 6011 5307 6012 5308 if (!cpumask_test_cpu(cpu, buffer->cpumask)) 6013 5309 return; ··· 6027 5321 atomic_dec(&cpu_buffer->record_disabled); 6028 5322 atomic_dec(&cpu_buffer->resize_disabled); 6029 5323 5324 + /* Make sure persistent meta now uses this buffer's addresses */ 5325 + meta = rb_range_meta(buffer, 0, cpu_buffer->cpu); 5326 + if (meta) 5327 + rb_meta_init_text_addr(meta); 
5328 + 6030 5329 mutex_unlock(&buffer->mutex); 6031 5330 } 6032 5331 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu); ··· 6046 5335 void ring_buffer_reset_online_cpus(struct trace_buffer *buffer) 6047 5336 { 6048 5337 struct ring_buffer_per_cpu *cpu_buffer; 5338 + struct ring_buffer_meta *meta; 6049 5339 int cpu; 6050 5340 6051 5341 /* prevent another thread from changing buffer sizes */ ··· 6073 5361 continue; 6074 5362 6075 5363 reset_disabled_cpu_buffer(cpu_buffer); 5364 + 5365 + /* Make sure persistent meta now uses this buffer's addresses */ 5366 + meta = rb_range_meta(buffer, 0, cpu_buffer->cpu); 5367 + if (meta) 5368 + rb_meta_init_text_addr(meta); 6076 5369 6077 5370 atomic_dec(&cpu_buffer->record_disabled); 6078 5371 atomic_sub(RESET_BIT, &cpu_buffer->resize_disabled); ··· 6852 6135 /* install subbuf ID to kern VA translation */ 6853 6136 cpu_buffer->subbuf_ids = subbuf_ids; 6854 6137 6855 - meta->meta_page_size = PAGE_SIZE; 6856 6138 meta->meta_struct_len = sizeof(*meta); 6857 6139 meta->nr_subbufs = nr_subbufs; 6858 6140 meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE; 6141 + meta->meta_page_size = meta->subbuf_size; 6859 6142 6860 6143 rb_update_meta_page(cpu_buffer); 6861 6144 } ··· 6872 6155 6873 6156 mutex_lock(&cpu_buffer->mapping_lock); 6874 6157 6875 - if (!cpu_buffer->mapped) { 6158 + if (!cpu_buffer->user_mapped) { 6876 6159 mutex_unlock(&cpu_buffer->mapping_lock); 6877 6160 return ERR_PTR(-ENODEV); 6878 6161 } ··· 6896 6179 6897 6180 lockdep_assert_held(&cpu_buffer->mapping_lock); 6898 6181 6182 + /* mapped is always greater or equal to user_mapped */ 6183 + if (WARN_ON(cpu_buffer->mapped < cpu_buffer->user_mapped)) 6184 + return -EINVAL; 6185 + 6899 6186 if (inc && cpu_buffer->mapped == UINT_MAX) 6900 6187 return -EBUSY; 6901 6188 6902 - if (WARN_ON(!inc && cpu_buffer->mapped == 0)) 6189 + if (WARN_ON(!inc && cpu_buffer->user_mapped == 0)) 6903 6190 return -EINVAL; 6904 6191 6905 6192 mutex_lock(&cpu_buffer->buffer->mutex); 
6906 6193 raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); 6907 6194 6908 - if (inc) 6195 + if (inc) { 6196 + cpu_buffer->user_mapped++; 6909 6197 cpu_buffer->mapped++; 6910 - else 6198 + } else { 6199 + cpu_buffer->user_mapped--; 6911 6200 cpu_buffer->mapped--; 6201 + } 6912 6202 6913 6203 raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); 6914 6204 mutex_unlock(&cpu_buffer->buffer->mutex); ··· 6938 6214 static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer, 6939 6215 struct vm_area_struct *vma) 6940 6216 { 6941 - unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff; 6217 + unsigned long nr_subbufs, nr_pages, nr_vma_pages, pgoff = vma->vm_pgoff; 6942 6218 unsigned int subbuf_pages, subbuf_order; 6943 6219 struct page **pages; 6944 6220 int p = 0, s = 0; ··· 6949 6225 !(vma->vm_flags & VM_MAYSHARE)) 6950 6226 return -EPERM; 6951 6227 6228 + subbuf_order = cpu_buffer->buffer->subbuf_order; 6229 + subbuf_pages = 1 << subbuf_order; 6230 + 6231 + if (subbuf_order && pgoff % subbuf_pages) 6232 + return -EINVAL; 6233 + 6952 6234 /* 6953 6235 * Make sure the mapping cannot become writable later. Also tell the VM 6954 6236 * to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND). 
··· 6964 6234 6965 6235 lockdep_assert_held(&cpu_buffer->mapping_lock); 6966 6236 6967 - subbuf_order = cpu_buffer->buffer->subbuf_order; 6968 - subbuf_pages = 1 << subbuf_order; 6969 - 6970 6237 nr_subbufs = cpu_buffer->nr_pages + 1; /* + reader-subbuf */ 6971 - nr_pages = ((nr_subbufs) << subbuf_order) - pgoff + 1; /* + meta-page */ 6238 + nr_pages = ((nr_subbufs + 1) << subbuf_order) - pgoff; /* + meta-page */ 6972 6239 6973 - vma_pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; 6974 - if (!vma_pages || vma_pages > nr_pages) 6240 + nr_vma_pages = vma_pages(vma); 6241 + if (!nr_vma_pages || nr_vma_pages > nr_pages) 6975 6242 return -EINVAL; 6976 6243 6977 - nr_pages = vma_pages; 6244 + nr_pages = nr_vma_pages; 6978 6245 6979 6246 pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL); 6980 6247 if (!pages) 6981 6248 return -ENOMEM; 6982 6249 6983 6250 if (!pgoff) { 6251 + unsigned long meta_page_padding; 6252 + 6984 6253 pages[p++] = virt_to_page(cpu_buffer->meta_page); 6985 6254 6986 6255 /* 6987 - * TODO: Align sub-buffers on their size, once 6988 - * vm_insert_pages() supports the zero-page. 6256 + * Pad with the zero-page to align the meta-page with the 6257 + * sub-buffers. 
6989 6258 */ 6259 + meta_page_padding = subbuf_pages - 1; 6260 + while (meta_page_padding-- && p < nr_pages) { 6261 + unsigned long __maybe_unused zero_addr = 6262 + vma->vm_start + (PAGE_SIZE * p); 6263 + 6264 + pages[p++] = ZERO_PAGE(zero_addr); 6265 + } 6990 6266 } else { 6991 6267 /* Skip the meta-page */ 6992 - pgoff--; 6993 - 6994 - if (pgoff % subbuf_pages) { 6995 - err = -EINVAL; 6996 - goto out; 6997 - } 6268 + pgoff -= subbuf_pages; 6998 6269 6999 6270 s += pgoff / subbuf_pages; 7000 6271 } ··· 7047 6316 7048 6317 mutex_lock(&cpu_buffer->mapping_lock); 7049 6318 7050 - if (cpu_buffer->mapped) { 6319 + if (cpu_buffer->user_mapped) { 7051 6320 err = __rb_map_vma(cpu_buffer, vma); 7052 6321 if (!err) 7053 6322 err = __rb_inc_dec_mapped(cpu_buffer, true); ··· 7078 6347 */ 7079 6348 raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); 7080 6349 rb_setup_ids_meta_page(cpu_buffer, subbuf_ids); 6350 + 7081 6351 raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); 7082 6352 7083 6353 err = __rb_map_vma(cpu_buffer, vma); 7084 6354 if (!err) { 7085 6355 raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); 7086 - cpu_buffer->mapped = 1; 6356 + /* This is the first time it is mapped by user */ 6357 + cpu_buffer->mapped++; 6358 + cpu_buffer->user_mapped = 1; 7087 6359 raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); 7088 6360 } else { 7089 6361 kfree(cpu_buffer->subbuf_ids); ··· 7114 6380 7115 6381 mutex_lock(&cpu_buffer->mapping_lock); 7116 6382 7117 - if (!cpu_buffer->mapped) { 6383 + if (!cpu_buffer->user_mapped) { 7118 6384 err = -ENODEV; 7119 6385 goto out; 7120 - } else if (cpu_buffer->mapped > 1) { 6386 + } else if (cpu_buffer->user_mapped > 1) { 7121 6387 __rb_inc_dec_mapped(cpu_buffer, false); 7122 6388 goto out; 7123 6389 } ··· 7125 6391 mutex_lock(&buffer->mutex); 7126 6392 raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); 7127 6393 7128 - cpu_buffer->mapped = 0; 6394 + /* This is the last user space mapping */ 6395 + if 
(!WARN_ON_ONCE(cpu_buffer->mapped < cpu_buffer->user_mapped)) 6396 + cpu_buffer->mapped--; 6397 + cpu_buffer->user_mapped = 0; 7129 6398 7130 6399 raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); 7131 6400
+338 -34
kernel/trace/trace.c
··· 482 482 TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | \ 483 483 TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE | \ 484 484 TRACE_ITER_IRQ_INFO | TRACE_ITER_MARKERS | \ 485 - TRACE_ITER_HASH_PTR) 485 + TRACE_ITER_HASH_PTR | TRACE_ITER_TRACE_PRINTK) 486 486 487 487 /* trace_options that are only supported by global_trace */ 488 488 #define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER_PRINTK | \ ··· 490 490 491 491 /* trace_flags that are default zero for instances */ 492 492 #define ZEROED_TRACE_FLAGS \ 493 - (TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK) 493 + (TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK | TRACE_ITER_TRACE_PRINTK) 494 494 495 495 /* 496 496 * The global_trace is the descriptor that holds the top-level tracing ··· 499 499 static struct trace_array global_trace = { 500 500 .trace_flags = TRACE_DEFAULT_FLAGS, 501 501 }; 502 + 503 + static struct trace_array *printk_trace = &global_trace; 504 + 505 + static __always_inline bool printk_binsafe(struct trace_array *tr) 506 + { 507 + /* 508 + * The binary format of traceprintk can cause a crash if used 509 + * by a buffer from another boot. Force the use of the 510 + * non binary version of trace_printk if the trace_printk 511 + * buffer is a boot mapped ring buffer. 
512 + */ 513 + return !(tr->flags & TRACE_ARRAY_FL_BOOT); 514 + } 515 + 516 + static void update_printk_trace(struct trace_array *tr) 517 + { 518 + if (printk_trace == tr) 519 + return; 520 + 521 + printk_trace->trace_flags &= ~TRACE_ITER_TRACE_PRINTK; 522 + printk_trace = tr; 523 + tr->trace_flags |= TRACE_ITER_TRACE_PRINTK; 524 + } 502 525 503 526 void trace_set_ring_buffer_expanded(struct trace_array *tr) 504 527 { ··· 1140 1117 */ 1141 1118 int __trace_puts(unsigned long ip, const char *str, int size) 1142 1119 { 1143 - return __trace_array_puts(&global_trace, ip, str, size); 1120 + return __trace_array_puts(printk_trace, ip, str, size); 1144 1121 } 1145 1122 EXPORT_SYMBOL_GPL(__trace_puts); 1146 1123 ··· 1151 1128 */ 1152 1129 int __trace_bputs(unsigned long ip, const char *str) 1153 1130 { 1131 + struct trace_array *tr = READ_ONCE(printk_trace); 1154 1132 struct ring_buffer_event *event; 1155 1133 struct trace_buffer *buffer; 1156 1134 struct bputs_entry *entry; ··· 1159 1135 int size = sizeof(struct bputs_entry); 1160 1136 int ret = 0; 1161 1137 1162 - if (!(global_trace.trace_flags & TRACE_ITER_PRINTK)) 1138 + if (!printk_binsafe(tr)) 1139 + return __trace_puts(ip, str, strlen(str)); 1140 + 1141 + if (!(tr->trace_flags & TRACE_ITER_PRINTK)) 1163 1142 return 0; 1164 1143 1165 1144 if (unlikely(tracing_selftest_running || tracing_disabled)) 1166 1145 return 0; 1167 1146 1168 1147 trace_ctx = tracing_gen_ctx(); 1169 - buffer = global_trace.array_buffer.buffer; 1148 + buffer = tr->array_buffer.buffer; 1170 1149 1171 1150 ring_buffer_nest_start(buffer); 1172 1151 event = __trace_buffer_lock_reserve(buffer, TRACE_BPUTS, size, ··· 1182 1155 entry->str = str; 1183 1156 1184 1157 __buffer_unlock_commit(buffer, event); 1185 - ftrace_trace_stack(&global_trace, buffer, trace_ctx, 4, NULL); 1158 + ftrace_trace_stack(tr, buffer, trace_ctx, 4, NULL); 1186 1159 1187 1160 ret = 1; 1188 1161 out: ··· 3048 3021 /* Skip 1 to skip this function. 
*/ 3049 3022 skip++; 3050 3023 #endif 3051 - __ftrace_trace_stack(global_trace.array_buffer.buffer, 3024 + __ftrace_trace_stack(printk_trace->array_buffer.buffer, 3052 3025 tracing_gen_ctx(), skip, NULL); 3053 3026 } 3054 3027 EXPORT_SYMBOL_GPL(trace_dump_stack); ··· 3267 3240 struct trace_event_call *call = &event_bprint; 3268 3241 struct ring_buffer_event *event; 3269 3242 struct trace_buffer *buffer; 3270 - struct trace_array *tr = &global_trace; 3243 + struct trace_array *tr = READ_ONCE(printk_trace); 3271 3244 struct bprint_entry *entry; 3272 3245 unsigned int trace_ctx; 3273 3246 char *tbuffer; 3274 3247 int len = 0, size; 3248 + 3249 + if (!printk_binsafe(tr)) 3250 + return trace_vprintk(ip, fmt, args); 3275 3251 3276 3252 if (unlikely(tracing_selftest_running || tracing_disabled)) 3277 3253 return 0; ··· 3368 3338 memcpy(&entry->buf, tbuffer, len + 1); 3369 3339 if (!call_filter_check_discard(call, entry, buffer, event)) { 3370 3340 __buffer_unlock_commit(buffer, event); 3371 - ftrace_trace_stack(&global_trace, buffer, trace_ctx, 6, NULL); 3341 + ftrace_trace_stack(printk_trace, buffer, trace_ctx, 6, NULL); 3372 3342 } 3373 3343 3374 3344 out: ··· 3464 3434 int ret; 3465 3435 va_list ap; 3466 3436 3467 - if (!(global_trace.trace_flags & TRACE_ITER_PRINTK)) 3437 + if (!(printk_trace->trace_flags & TRACE_ITER_PRINTK)) 3468 3438 return 0; 3469 3439 3470 3440 va_start(ap, fmt); ··· 3476 3446 __printf(2, 0) 3477 3447 int trace_vprintk(unsigned long ip, const char *fmt, va_list args) 3478 3448 { 3479 - return trace_array_vprintk(&global_trace, ip, fmt, args); 3449 + return trace_array_vprintk(printk_trace, ip, fmt, args); 3480 3450 } 3481 3451 EXPORT_SYMBOL_GPL(trace_vprintk); 3482 3452 ··· 3697 3667 void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, 3698 3668 va_list ap) 3699 3669 { 3670 + long text_delta = iter->tr->text_delta; 3671 + long data_delta = iter->tr->data_delta; 3700 3672 const char *p = fmt; 3701 3673 const char *str; 3674 + 
bool good; 3702 3675 int i, j; 3703 3676 3704 3677 if (WARN_ON_ONCE(!fmt)) ··· 3720 3687 3721 3688 j = 0; 3722 3689 3723 - /* We only care about %s and variants */ 3690 + /* 3691 + * We only care about %s and variants 3692 + * as well as %p[sS] if delta is non-zero 3693 + */ 3724 3694 for (i = 0; p[i]; i++) { 3725 3695 if (i + 1 >= iter->fmt_size) { 3726 3696 /* ··· 3752 3716 } 3753 3717 if (p[i+j] == 's') 3754 3718 break; 3719 + 3720 + if (text_delta && p[i+1] == 'p' && 3721 + ((p[i+2] == 's' || p[i+2] == 'S'))) 3722 + break; 3723 + 3755 3724 star = false; 3756 3725 } 3757 3726 j = 0; ··· 3769 3728 strncpy(iter->fmt, p, i); 3770 3729 iter->fmt[i] = '\0'; 3771 3730 trace_seq_vprintf(&iter->seq, iter->fmt, ap); 3731 + 3732 + /* Add delta to %pS pointers */ 3733 + if (p[i+1] == 'p') { 3734 + unsigned long addr; 3735 + char fmt[4]; 3736 + 3737 + fmt[0] = '%'; 3738 + fmt[1] = 'p'; 3739 + fmt[2] = p[i+2]; /* Either %ps or %pS */ 3740 + fmt[3] = '\0'; 3741 + 3742 + addr = va_arg(ap, unsigned long); 3743 + addr += text_delta; 3744 + trace_seq_printf(&iter->seq, fmt, (void *)addr); 3745 + 3746 + p += i + 3; 3747 + continue; 3748 + } 3772 3749 3773 3750 /* 3774 3751 * If iter->seq is full, the above call no longer guarantees ··· 3806 3747 /* The ap now points to the string data of the %s */ 3807 3748 str = va_arg(ap, const char *); 3808 3749 3750 + good = trace_safe_str(iter, str, star, len); 3751 + 3752 + /* Could be from the last boot */ 3753 + if (data_delta && !good) { 3754 + str += data_delta; 3755 + good = trace_safe_str(iter, str, star, len); 3756 + } 3757 + 3809 3758 /* 3810 3759 * If you hit this warning, it is likely that the 3811 3760 * trace event in question used %s on a string that ··· 3823 3756 * instead. See samples/trace_events/trace-events-sample.h 3824 3757 * for reference. 
3825 3758 */ 3826 - if (WARN_ONCE(!trace_safe_str(iter, str, star, len), 3827 - "fmt: '%s' current_buffer: '%s'", 3759 + if (WARN_ONCE(!good, "fmt: '%s' current_buffer: '%s'", 3828 3760 fmt, seq_buf_str(&iter->seq.seq))) { 3829 3761 int ret; 3830 3762 ··· 4985 4919 static bool 4986 4920 trace_ok_for_array(struct tracer *t, struct trace_array *tr) 4987 4921 { 4922 + #ifdef CONFIG_TRACER_SNAPSHOT 4923 + /* arrays with mapped buffer range do not have snapshots */ 4924 + if (tr->range_addr_start && t->use_max_tr) 4925 + return false; 4926 + #endif 4988 4927 return (tr->flags & TRACE_ARRAY_FL_GLOBAL) || t->allow_instances; 4989 4928 } 4990 4929 ··· 5082 5011 return 0; 5083 5012 } 5084 5013 5085 - static int show_traces_release(struct inode *inode, struct file *file) 5014 + static int tracing_seq_release(struct inode *inode, struct file *file) 5086 5015 { 5087 5016 struct trace_array *tr = inode->i_private; 5088 5017 ··· 5123 5052 .open = show_traces_open, 5124 5053 .read = seq_read, 5125 5054 .llseek = seq_lseek, 5126 - .release = show_traces_release, 5055 + .release = tracing_seq_release, 5127 5056 }; 5128 5057 5129 5058 static ssize_t ··· 5308 5237 int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled) 5309 5238 { 5310 5239 if ((mask == TRACE_ITER_RECORD_TGID) || 5311 - (mask == TRACE_ITER_RECORD_CMD)) 5240 + (mask == TRACE_ITER_RECORD_CMD) || 5241 + (mask == TRACE_ITER_TRACE_PRINTK)) 5312 5242 lockdep_assert_held(&event_mutex); 5313 5243 5314 5244 /* do nothing if flag is already set */ ··· 5320 5248 if (tr->current_trace->flag_changed) 5321 5249 if (tr->current_trace->flag_changed(tr, mask, !!enabled)) 5322 5250 return -EINVAL; 5251 + 5252 + if (mask == TRACE_ITER_TRACE_PRINTK) { 5253 + if (enabled) { 5254 + update_printk_trace(tr); 5255 + } else { 5256 + /* 5257 + * The global_trace cannot clear this. 5258 + * Its flag only gets cleared if another instance sets it.
5259 + */ 5260 + if (printk_trace == &global_trace) 5261 + return -EINVAL; 5262 + /* 5263 + * An instance must always have it set. 5264 + * By default, that's the global_trace instance. 5265 + */ 5266 + if (printk_trace == tr) 5267 + update_printk_trace(&global_trace); 5268 + } 5269 + } 5323 5270 5324 5271 if (enabled) 5325 5272 tr->trace_flags |= mask; ··· 6125 6034 return ret; 6126 6035 } 6127 6036 6037 + static void update_last_data(struct trace_array *tr) 6038 + { 6039 + if (!tr->text_delta && !tr->data_delta) 6040 + return; 6041 + 6042 + /* Clear old data */ 6043 + tracing_reset_online_cpus(&tr->array_buffer); 6044 + 6045 + /* Using current data now */ 6046 + tr->text_delta = 0; 6047 + tr->data_delta = 0; 6048 + } 6128 6049 6129 6050 /** 6130 6051 * tracing_update_buffers - used by tracing facility to expand ring buffers ··· 6154 6051 int ret = 0; 6155 6052 6156 6053 mutex_lock(&trace_types_lock); 6054 + 6055 + update_last_data(tr); 6056 + 6157 6057 if (!tr->ring_buffer_expanded) 6158 6058 ret = __tracing_resize_ring_buffer(tr, trace_buf_size, 6159 6059 RING_BUFFER_ALL_CPUS); ··· 6211 6105 int ret = 0; 6212 6106 6213 6107 mutex_lock(&trace_types_lock); 6108 + 6109 + update_last_data(tr); 6214 6110 6215 6111 if (!tr->ring_buffer_expanded) { 6216 6112 ret = __tracing_resize_ring_buffer(tr, trace_buf_size, ··· 6962 6854 } 6963 6855 6964 6856 static ssize_t 6857 + tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) 6858 + { 6859 + struct trace_array *tr = filp->private_data; 6860 + struct seq_buf seq; 6861 + char buf[64]; 6862 + 6863 + seq_buf_init(&seq, buf, 64); 6864 + 6865 + seq_buf_printf(&seq, "text delta:\t%ld\n", tr->text_delta); 6866 + seq_buf_printf(&seq, "data delta:\t%ld\n", tr->data_delta); 6867 + 6868 + return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq)); 6869 + } 6870 + 6871 + static int tracing_buffer_meta_open(struct inode *inode, struct file *filp) 6872 + { 6873 + struct trace_array *tr =
inode->i_private; 6874 + int cpu = tracing_get_cpu(inode); 6875 + int ret; 6876 + 6877 + ret = tracing_check_open_get_tr(tr); 6878 + if (ret) 6879 + return ret; 6880 + 6881 + ret = ring_buffer_meta_seq_init(filp, tr->array_buffer.buffer, cpu); 6882 + if (ret < 0) 6883 + __trace_array_put(tr); 6884 + return ret; 6885 + } 6886 + 6887 + static ssize_t 6965 6888 tracing_free_buffer_write(struct file *filp, const char __user *ubuf, 6966 6889 size_t cnt, loff_t *ppos) 6967 6890 { ··· 7568 7429 .release = tracing_release_generic_tr, 7569 7430 }; 7570 7431 7432 + static const struct file_operations tracing_buffer_meta_fops = { 7433 + .open = tracing_buffer_meta_open, 7434 + .read = seq_read, 7435 + .llseek = seq_lseek, 7436 + .release = tracing_seq_release, 7437 + }; 7438 + 7571 7439 static const struct file_operations tracing_total_entries_fops = { 7572 7440 .open = tracing_open_generic_tr, 7573 7441 .read = tracing_total_entries_read, ··· 7613 7467 .read = seq_read, 7614 7468 .llseek = seq_lseek, 7615 7469 .release = tracing_single_release_tr, 7470 + }; 7471 + 7472 + static const struct file_operations last_boot_fops = { 7473 + .open = tracing_open_generic_tr, 7474 + .read = tracing_last_boot_read, 7475 + .llseek = generic_file_llseek, 7476 + .release = tracing_release_generic_tr, 7616 7477 }; 7617 7478 7618 7479 #ifdef CONFIG_TRACER_SNAPSHOT ··· 8814 8661 trace_create_cpu_file("buffer_size_kb", TRACE_MODE_READ, d_cpu, 8815 8662 tr, cpu, &tracing_entries_fops); 8816 8663 8664 + if (tr->range_addr_start) 8665 + trace_create_cpu_file("buffer_meta", TRACE_MODE_READ, d_cpu, 8666 + tr, cpu, &tracing_buffer_meta_fops); 8817 8667 #ifdef CONFIG_TRACER_SNAPSHOT 8818 - trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, 8819 - tr, cpu, &snapshot_fops); 8668 + if (!tr->range_addr_start) { 8669 + trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, 8670 + tr, cpu, &snapshot_fops); 8820 8671 8821 - trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu, 8822 - tr, 
cpu, &snapshot_raw_fops); 8672 + trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu, 8673 + tr, cpu, &snapshot_raw_fops); 8674 + } 8823 8675 #endif 8824 8676 } 8825 8677 ··· 9361 9203 9362 9204 buf->tr = tr; 9363 9205 9364 - buf->buffer = ring_buffer_alloc(size, rb_flags); 9206 + if (tr->range_addr_start && tr->range_addr_size) { 9207 + buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0, 9208 + tr->range_addr_start, 9209 + tr->range_addr_size); 9210 + 9211 + ring_buffer_last_boot_delta(buf->buffer, 9212 + &tr->text_delta, &tr->data_delta); 9213 + /* 9214 + * This is basically the same as a mapped buffer, 9215 + * with the same restrictions. 9216 + */ 9217 + tr->mapped++; 9218 + } else { 9219 + buf->buffer = ring_buffer_alloc(size, rb_flags); 9220 + } 9365 9221 if (!buf->buffer) 9366 9222 return -ENOMEM; 9367 9223 ··· 9412 9240 return ret; 9413 9241 9414 9242 #ifdef CONFIG_TRACER_MAX_TRACE 9243 + /* Fix mapped buffer trace arrays do not have snapshot buffers */ 9244 + if (tr->range_addr_start) 9245 + return 0; 9246 + 9415 9247 ret = allocate_trace_buffer(tr, &tr->max_buffer, 9416 9248 allocate_snapshot ? 
size : 1); 9417 9249 if (MEM_FAIL(ret, "Failed to allocate trace buffer\n")) { ··· 9516 9340 } 9517 9341 9518 9342 static struct trace_array * 9519 - trace_array_create_systems(const char *name, const char *systems) 9343 + trace_array_create_systems(const char *name, const char *systems, 9344 + unsigned long range_addr_start, 9345 + unsigned long range_addr_size) 9520 9346 { 9521 9347 struct trace_array *tr; 9522 9348 int ret; ··· 9543 9365 if (!tr->system_names) 9544 9366 goto out_free_tr; 9545 9367 } 9368 + 9369 + /* Only for boot up memory mapped ring buffers */ 9370 + tr->range_addr_start = range_addr_start; 9371 + tr->range_addr_size = range_addr_size; 9546 9372 9547 9373 tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS; 9548 9374 ··· 9605 9423 9606 9424 static struct trace_array *trace_array_create(const char *name) 9607 9425 { 9608 - return trace_array_create_systems(name, NULL); 9426 + return trace_array_create_systems(name, NULL, 0, 0); 9609 9427 } 9610 9428 9611 9429 static int instance_mkdir(const char *name) ··· 9628 9446 mutex_unlock(&trace_types_lock); 9629 9447 mutex_unlock(&event_mutex); 9630 9448 return ret; 9449 + } 9450 + 9451 + static u64 map_pages(u64 start, u64 size) 9452 + { 9453 + struct page **pages; 9454 + phys_addr_t page_start; 9455 + unsigned int page_count; 9456 + unsigned int i; 9457 + void *vaddr; 9458 + 9459 + page_count = DIV_ROUND_UP(size, PAGE_SIZE); 9460 + 9461 + page_start = start; 9462 + pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL); 9463 + if (!pages) 9464 + return 0; 9465 + 9466 + for (i = 0; i < page_count; i++) { 9467 + phys_addr_t addr = page_start + i * PAGE_SIZE; 9468 + pages[i] = pfn_to_page(addr >> PAGE_SHIFT); 9469 + } 9470 + vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL); 9471 + kfree(pages); 9472 + 9473 + return (u64)(unsigned long)vaddr; 9631 9474 } 9632 9475 9633 9476 /** ··· 9684 9477 goto out_unlock; 9685 9478 } 9686 9479 9687 - tr = trace_array_create_systems(name, 
systems); 9480 + tr = trace_array_create_systems(name, systems, 0, 0); 9688 9481 9689 9482 if (IS_ERR(tr)) 9690 9483 tr = NULL; ··· 9713 9506 if ((1 << i) & ZEROED_TRACE_FLAGS) 9714 9507 set_tracer_flag(tr, 1 << i, 0); 9715 9508 } 9509 + 9510 + if (printk_trace == tr) 9511 + update_printk_trace(&global_trace); 9716 9512 9717 9513 tracing_set_nop(tr); 9718 9514 clear_ftrace_function_probes(tr); ··· 9879 9669 if (ftrace_create_function_files(tr, d_tracer)) 9880 9670 MEM_FAIL(1, "Could not allocate function filter files"); 9881 9671 9672 + if (tr->range_addr_start) { 9673 + trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer, 9674 + tr, &last_boot_fops); 9882 9675 #ifdef CONFIG_TRACER_SNAPSHOT 9883 - trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer, 9884 - tr, &snapshot_fops); 9676 + } else { 9677 + trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer, 9678 + tr, &snapshot_fops); 9885 9679 #endif 9680 + } 9886 9681 9887 9682 trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer, 9888 9683 tr, &tracing_err_log_fops); ··· 10507 10292 { 10508 10293 struct trace_array *tr; 10509 10294 char *curr_str; 10295 + char *name; 10510 10296 char *str; 10511 10297 char *tok; 10512 10298 ··· 10516 10300 str = boot_instance_info; 10517 10301 10518 10302 while ((curr_str = strsep(&str, "\t"))) { 10303 + phys_addr_t start = 0; 10304 + phys_addr_t size = 0; 10305 + unsigned long addr = 0; 10306 + bool traceprintk = false; 10307 + bool traceoff = false; 10308 + char *flag_delim; 10309 + char *addr_delim; 10519 10310 10520 10311 tok = strsep(&curr_str, ","); 10521 10312 10522 - if (IS_ENABLED(CONFIG_TRACER_MAX_TRACE)) 10523 - do_allocate_snapshot(tok); 10313 + flag_delim = strchr(tok, '^'); 10314 + addr_delim = strchr(tok, '@'); 10524 10315 10525 - tr = trace_array_get_by_name(tok, NULL); 10526 - if (!tr) { 10527 - pr_warn("Failed to create instance buffer %s\n", curr_str); 10316 + if (addr_delim) 10317 + *addr_delim++ = '\0'; 10318 + 10319 + if (flag_delim) 
10320 + *flag_delim++ = '\0'; 10321 + 10322 + name = tok; 10323 + 10324 + if (flag_delim) { 10325 + char *flag; 10326 + 10327 + while ((flag = strsep(&flag_delim, "^"))) { 10328 + if (strcmp(flag, "traceoff") == 0) { 10329 + traceoff = true; 10330 + } else if ((strcmp(flag, "printk") == 0) || 10331 + (strcmp(flag, "traceprintk") == 0) || 10332 + (strcmp(flag, "trace_printk") == 0)) { 10333 + traceprintk = true; 10334 + } else { 10335 + pr_info("Tracing: Invalid instance flag '%s' for %s\n", 10336 + flag, name); 10337 + } 10338 + } 10339 + } 10340 + 10341 + tok = addr_delim; 10342 + if (tok && isdigit(*tok)) { 10343 + start = memparse(tok, &tok); 10344 + if (!start) { 10345 + pr_warn("Tracing: Invalid boot instance address for %s\n", 10346 + name); 10347 + continue; 10348 + } 10349 + if (*tok != ':') { 10350 + pr_warn("Tracing: No size specified for instance %s\n", name); 10351 + continue; 10352 + } 10353 + tok++; 10354 + size = memparse(tok, &tok); 10355 + if (!size) { 10356 + pr_warn("Tracing: Invalid boot instance size for %s\n", 10357 + name); 10358 + continue; 10359 + } 10360 + } else if (tok) { 10361 + if (!reserve_mem_find_by_name(tok, &start, &size)) { 10362 + start = 0; 10363 + pr_warn("Failed to map boot instance %s to %s\n", name, tok); 10364 + continue; 10365 + } 10366 + } 10367 + 10368 + if (start) { 10369 + addr = map_pages(start, size); 10370 + if (addr) { 10371 + pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n", 10372 + name, &start, (unsigned long)size); 10373 + } else { 10374 + pr_warn("Tracing: Failed to map boot instance %s\n", name); 10375 + continue; 10376 + } 10377 + } else { 10378 + /* Only non mapped buffers have snapshot buffers */ 10379 + if (IS_ENABLED(CONFIG_TRACER_MAX_TRACE)) 10380 + do_allocate_snapshot(name); 10381 + } 10382 + 10383 + tr = trace_array_create_systems(name, NULL, addr, size); 10384 + if (IS_ERR(tr)) { 10385 + pr_warn("Tracing: Failed to create instance buffer %s\n", curr_str); 10528 
10386 continue; 10529 10387 } 10530 - /* Allow user space to delete it */ 10531 - trace_array_put(tr); 10388 + 10389 + if (traceoff) 10390 + tracer_tracing_off(tr); 10391 + 10392 + if (traceprintk) 10393 + update_printk_trace(tr); 10394 + 10395 + /* 10396 + * If start is set, then this is a mapped buffer, and 10397 + * cannot be deleted by user space, so keep the reference 10398 + * to it. 10399 + */ 10400 + if (start) 10401 + tr->flags |= TRACE_ARRAY_FL_BOOT; 10402 + else 10403 + trace_array_put(tr); 10532 10404 10533 10405 while ((tok = strsep(&curr_str, ","))) { 10534 10406 early_enable_events(tr, tok, true);
+12 -2
kernel/trace/trace.h
··· 336 336 bool allocated_snapshot; 337 337 spinlock_t snapshot_trigger_lock; 338 338 unsigned int snapshot; 339 - unsigned int mapped; 340 339 unsigned long max_latency; 341 340 #ifdef CONFIG_FSNOTIFY 342 341 struct dentry *d_max_latency; ··· 343 344 struct irq_work fsnotify_irqwork; 344 345 #endif 345 346 #endif 347 + /* The below is for memory mapped ring buffer */ 348 + unsigned int mapped; 349 + unsigned long range_addr_start; 350 + unsigned long range_addr_size; 351 + long text_delta; 352 + long data_delta; 353 + 346 354 struct trace_pid_list __rcu *filtered_pids; 347 355 struct trace_pid_list __rcu *filtered_no_pids; 348 356 /* ··· 429 423 }; 430 424 431 425 enum { 432 - TRACE_ARRAY_FL_GLOBAL = (1 << 0) 426 + TRACE_ARRAY_FL_GLOBAL = BIT(0), 427 + TRACE_ARRAY_FL_BOOT = BIT(1), 433 428 }; 434 429 435 430 extern struct list_head ftrace_trace_arrays; ··· 650 643 int type, 651 644 unsigned long len, 652 645 unsigned int trace_ctx); 646 + 647 + int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, int cpu); 653 648 654 649 struct trace_entry *tracing_get_trace_entry(struct trace_array *tr, 655 650 struct trace_array_cpu *data); ··· 1321 1312 C(IRQ_INFO, "irq-info"), \ 1322 1313 C(MARKERS, "markers"), \ 1323 1314 C(EVENT_FORK, "event-fork"), \ 1315 + C(TRACE_PRINTK, "trace_printk_dest"), \ 1324 1316 C(PAUSE_ON_TRACE, "pause-on-trace"), \ 1325 1317 C(HASH_PTR, "hash-ptr"), /* Print hashed pointer */ \ 1326 1318 FUNCTION_FLAGS \
+18 -5
kernel/trace/trace_functions_graph.c
··· 544 544 struct trace_seq *s = &iter->seq; 545 545 struct trace_entry *ent = iter->ent; 546 546 547 + addr += iter->tr->text_delta; 548 + 547 549 if (addr < (unsigned long)__irqentry_text_start || 548 550 addr >= (unsigned long)__irqentry_text_end) 549 551 return; ··· 712 710 struct ftrace_graph_ret *graph_ret; 713 711 struct ftrace_graph_ent *call; 714 712 unsigned long long duration; 713 + unsigned long func; 715 714 int cpu = iter->cpu; 716 715 int i; 717 716 718 717 graph_ret = &ret_entry->ret; 719 718 call = &entry->graph_ent; 720 719 duration = graph_ret->rettime - graph_ret->calltime; 720 + 721 + func = call->func + iter->tr->text_delta; 721 722 722 723 if (data) { 723 724 struct fgraph_cpu_data *cpu_data; ··· 752 747 * enabled. 753 748 */ 754 749 if (flags & __TRACE_GRAPH_PRINT_RETVAL) 755 - print_graph_retval(s, graph_ret->retval, true, (void *)call->func, 750 + print_graph_retval(s, graph_ret->retval, true, (void *)func, 756 751 !!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX)); 757 752 else 758 - trace_seq_printf(s, "%ps();\n", (void *)call->func); 753 + trace_seq_printf(s, "%ps();\n", (void *)func); 759 754 760 755 print_graph_irq(iter, graph_ret->func, TRACE_GRAPH_RET, 761 756 cpu, iter->ent->pid, flags); ··· 771 766 struct ftrace_graph_ent *call = &entry->graph_ent; 772 767 struct fgraph_data *data = iter->private; 773 768 struct trace_array *tr = iter->tr; 769 + unsigned long func; 774 770 int i; 775 771 776 772 if (data) { ··· 794 788 for (i = 0; i < call->depth * TRACE_GRAPH_INDENT; i++) 795 789 trace_seq_putc(s, ' '); 796 790 797 - trace_seq_printf(s, "%ps() {\n", (void *)call->func); 791 + func = call->func + iter->tr->text_delta; 792 + 793 + trace_seq_printf(s, "%ps() {\n", (void *)func); 798 794 799 795 if (trace_seq_has_overflowed(s)) 800 796 return TRACE_TYPE_PARTIAL_LINE; ··· 870 862 int cpu = iter->cpu; 871 863 int *depth_irq; 872 864 struct fgraph_data *data = iter->private; 865 + 866 + addr += iter->tr->text_delta; 873 867 874 868 /* 875 869 * 
If we are either displaying irqs, or we got called as ··· 1000 990 unsigned long long duration = trace->rettime - trace->calltime; 1001 991 struct fgraph_data *data = iter->private; 1002 992 struct trace_array *tr = iter->tr; 993 + unsigned long func; 1003 994 pid_t pid = ent->pid; 1004 995 int cpu = iter->cpu; 1005 996 int func_match = 1; 1006 997 int i; 998 + 999 + func = trace->func + iter->tr->text_delta; 1007 1000 1008 1001 if (check_irq_return(iter, flags, trace->depth)) 1009 1002 return TRACE_TYPE_HANDLED; ··· 1046 1033 * function-retval option is enabled. 1047 1034 */ 1048 1035 if (flags & __TRACE_GRAPH_PRINT_RETVAL) { 1049 - print_graph_retval(s, trace->retval, false, (void *)trace->func, 1036 + print_graph_retval(s, trace->retval, false, (void *)func, 1050 1037 !!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX)); 1051 1038 } else { 1052 1039 /* ··· 1059 1046 if (func_match && !(flags & TRACE_GRAPH_PRINT_TAIL)) 1060 1047 trace_seq_puts(s, "}\n"); 1061 1048 else 1062 - trace_seq_printf(s, "} /* %ps */\n", (void *)trace->func); 1049 + trace_seq_printf(s, "} /* %ps */\n", (void *)func); 1063 1050 } 1064 1051 1065 1052 /* Overrun */
+12 -5
kernel/trace/trace_output.c
··· 990 990 } 991 991 992 992 static void print_fn_trace(struct trace_seq *s, unsigned long ip, 993 - unsigned long parent_ip, int flags) 993 + unsigned long parent_ip, long delta, int flags) 994 994 { 995 + ip += delta; 996 + parent_ip += delta; 997 + 995 998 seq_print_ip_sym(s, ip, flags); 996 999 997 1000 if ((flags & TRACE_ITER_PRINT_PARENT) && parent_ip) { ··· 1012 1009 1013 1010 trace_assign_type(field, iter->ent); 1014 1011 1015 - print_fn_trace(s, field->ip, field->parent_ip, flags); 1012 + print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags); 1016 1013 trace_seq_putc(s, '\n'); 1017 1014 1018 1015 return trace_handle_return(s); ··· 1233 1230 struct trace_seq *s = &iter->seq; 1234 1231 unsigned long *p; 1235 1232 unsigned long *end; 1233 + long delta = iter->tr->text_delta; 1236 1234 1237 1235 trace_assign_type(field, iter->ent); 1238 1236 end = (unsigned long *)((long)iter->ent + iter->ent_size); ··· 1246 1242 break; 1247 1243 1248 1244 trace_seq_puts(s, " => "); 1249 - seq_print_ip_sym(s, *p, flags); 1245 + seq_print_ip_sym(s, (*p) + delta, flags); 1250 1246 trace_seq_putc(s, '\n'); 1251 1247 } 1252 1248 ··· 1591 1587 { 1592 1588 struct print_entry *field; 1593 1589 struct trace_seq *s = &iter->seq; 1590 + unsigned long ip; 1594 1591 1595 1592 trace_assign_type(field, iter->ent); 1596 1593 1597 - seq_print_ip_sym(s, field->ip, flags); 1594 + ip = field->ip + iter->tr->text_delta; 1595 + 1596 + seq_print_ip_sym(s, ip, flags); 1598 1597 trace_seq_printf(s, ": %s", field->buf); 1599 1598 1600 1599 return trace_handle_return(s); ··· 1681 1674 1682 1675 trace_assign_type(field, iter->ent); 1683 1676 1684 - print_fn_trace(s, field->ip, field->parent_ip, flags); 1677 + print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags); 1685 1678 trace_seq_printf(s, " (repeats: %u, last_ts:", field->count); 1686 1679 trace_print_time(s, iter, 1687 1680 iter->ts - FUNC_REPEATS_GET_DELTA_TS(field));
+24
tools/testing/selftests/ring-buffer/map_test.c
··· 92 92 if (desc->cpu_fd < 0) 93 93 return -ENODEV; 94 94 95 + again: 95 96 map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, desc->cpu_fd, 0); 96 97 if (map == MAP_FAILED) 97 98 return -errno; 98 99 99 100 desc->meta = (struct trace_buffer_meta *)map; 101 + 102 + /* the meta-page is bigger than the original mapping */ 103 + if (page_size < desc->meta->meta_struct_len) { 104 + int meta_page_size = desc->meta->meta_page_size; 105 + 106 + munmap(desc->meta, page_size); 107 + page_size = meta_page_size; 108 + goto again; 109 + } 100 110 101 111 return 0; 102 112 } ··· 238 228 data = mmap(NULL, data_len, PROT_READ, MAP_SHARED, 239 229 desc->cpu_fd, meta_len); 240 230 ASSERT_EQ(data, MAP_FAILED); 231 + 232 + /* Verify meta-page padding */ 233 + if (desc->meta->meta_page_size > getpagesize()) { 234 + data_len = desc->meta->meta_page_size; 235 + data = mmap(NULL, data_len, 236 + PROT_READ, MAP_SHARED, desc->cpu_fd, 0); 237 + ASSERT_NE(data, MAP_FAILED); 238 + 239 + for (int i = desc->meta->meta_struct_len; 240 + i < desc->meta->meta_page_size; i += sizeof(int)) 241 + ASSERT_EQ(*(int *)(data + i), 0); 242 + 243 + munmap(data, data_len); 244 + } 241 245 } 242 246 243 247 FIXTURE(snapshot) {