Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:
"User visible changes:

- Added an easier way to filter with cpumasks:

# echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter

- Show the actual size of the ring buffer after modifying its size
via buffer_size_kb.

Currently it just returns what was written, but the actual size is
rounded up to the sub-buffer size. Show that real size instead.

Major changes:

- Added "eventfs". This is the code that handles the inodes and
dentries of the tracefs/events directory. As there are thousands of
events, and each event has several inodes and dentries that
currently exist even when tracing is never used, they take up
precious memory. Instead, eventfs allocates the inodes and
dentries in a just-in-time (JIT) manner (similar to what procfs
does). There is now metadata that describes the events and
subdirectories, and the inodes and dentries are created when they
are used.

Note, I also have patches that remove the subdirectory metadata,
but will wait until the next merge window before applying them. It's
a little more complex, and I want to make sure the dynamic code
works properly before adding more complexity, making it easier to
revert if need be.

Minor changes:

- Optimization to user event list traversal

- Remove intermediate permission of tracefs files (note that the
intermediate permission removed all access to the files, so it is
not a security concern, just a cleanup)

- Add the complex fix to FORTIFY_SOURCE to the kernel stack event
logic

- Other minor cleanups"

* tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
tracefs: Remove kerneldoc from struct eventfs_file
tracefs: Avoid changing i_mode to a temp value
tracing/user_events: Optimize safe list traversals
ftrace: Remove empty declaration ftrace_enable_daemon() and ftrace_disable_daemon()
tracing: Remove unused function declarations
tracing/filters: Document cpumask filtering
tracing/filters: Further optimise scalar vs cpumask comparison
tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU
tracing/filters: Enable filtering the CPU common field by a cpumask
tracing/filters: Enable filtering a scalar field by a cpumask
tracing/filters: Enable filtering a cpumask field by another cpumask
tracing/filters: Dynamically allocate filter_pred.regex
test: ftrace: Fix kprobe test for eventfs
eventfs: Move tracing/events to eventfs
eventfs: Implement removal of meta data from eventfs
eventfs: Implement functions to create files and dirs when accessed
eventfs: Implement eventfs lookup, read, open functions
eventfs: Implement eventfs file add functions
...

+1424 -169
+14
Documentation/trace/events.rst
···
219 219   The ".function" postfix can only be attached to values of size long, and can only
220 220   be compared with "==" or "!=".
221 221
    222 + Cpumask fields or scalar fields that encode a CPU number can be filtered using
    223 + a user-provided cpumask in cpulist format. The format is as follows::
    224 +
    225 +       CPUS{$cpulist}
    226 +
    227 + Operators available to cpumask filtering are:
    228 +
    229 +       & (intersection), ==, !=
    230 +
    231 + For example, this will filter events that have their .target_cpu field present
    232 + in the given cpumask::
    233 +
    234 +       target_cpu & CPUS{17-42}
    235 +
222 236   5.2 Setting filters
223 237   -------------------
224 238
+1
fs/tracefs/Makefile
···
1 1   # SPDX-License-Identifier: GPL-2.0-only
2 2   tracefs-objs := inode.o
  3 + tracefs-objs += event_inode.o
3 4
4 5   obj-$(CONFIG_TRACING) += tracefs.o
5 6
+807
fs/tracefs/event_inode.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * event_inode.c - part of tracefs, a pseudo file system for activating tracing 4 + * 5 + * Copyright (C) 2020-23 VMware Inc, author: Steven Rostedt (VMware) <rostedt@goodmis.org> 6 + * Copyright (C) 2020-23 VMware Inc, author: Ajay Kaher <akaher@vmware.com> 7 + * 8 + * eventfs is used to dynamically create inodes and dentries based on the 9 + * meta data provided by the tracing system. 10 + * 11 + * eventfs stores the meta-data of files/dirs and holds off on creating 12 + * inodes/dentries of the files. When accessed, the eventfs will create the 13 + * inodes/dentries in a just-in-time (JIT) manner. The eventfs will clean up 14 + * and delete the inodes/dentries when they are no longer referenced. 15 + */ 16 + #include <linux/fsnotify.h> 17 + #include <linux/fs.h> 18 + #include <linux/namei.h> 19 + #include <linux/workqueue.h> 20 + #include <linux/security.h> 21 + #include <linux/tracefs.h> 22 + #include <linux/kref.h> 23 + #include <linux/delay.h> 24 + #include "internal.h" 25 + 26 + struct eventfs_inode { 27 + struct list_head e_top_files; 28 + }; 29 + 30 + /* 31 + * struct eventfs_file - hold the properties of the eventfs files and 32 + * directories. 
33 + * @name: the name of the file or directory to create 34 + * @d_parent: holds parent's dentry 35 + * @dentry: once accessed holds dentry 36 + * @list: file or directory to be added to parent directory 37 + * @ei: list of files and directories within directory 38 + * @fop: file_operations for file or directory 39 + * @iop: inode_operations for file or directory 40 + * @data: something that the caller will want to get to later on 41 + * @mode: the permission that the file or directory should have 42 + */ 43 + struct eventfs_file { 44 + const char *name; 45 + struct dentry *d_parent; 46 + struct dentry *dentry; 47 + struct list_head list; 48 + struct eventfs_inode *ei; 49 + const struct file_operations *fop; 50 + const struct inode_operations *iop; 51 + /* 52 + * Union - used for deletion 53 + * @del_list: list of eventfs_file to delete 54 + * @rcu: eventfs_file to delete in RCU 55 + * @is_freed: node is freed if one of the above is set 56 + */ 57 + union { 58 + struct list_head del_list; 59 + struct rcu_head rcu; 60 + unsigned long is_freed; 61 + }; 62 + void *data; 63 + umode_t mode; 64 + }; 65 + 66 + static DEFINE_MUTEX(eventfs_mutex); 67 + DEFINE_STATIC_SRCU(eventfs_srcu); 68 + 69 + static struct dentry *eventfs_root_lookup(struct inode *dir, 70 + struct dentry *dentry, 71 + unsigned int flags); 72 + static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); 73 + static int eventfs_release(struct inode *inode, struct file *file); 74 + 75 + static const struct inode_operations eventfs_root_dir_inode_operations = { 76 + .lookup = eventfs_root_lookup, 77 + }; 78 + 79 + static const struct file_operations eventfs_file_operations = { 80 + .open = dcache_dir_open_wrapper, 81 + .read = generic_read_dir, 82 + .iterate_shared = dcache_readdir, 83 + .llseek = generic_file_llseek, 84 + .release = eventfs_release, 85 + }; 86 + 87 + /** 88 + * create_file - create a file in the tracefs filesystem 89 + * @name: the name of the file to create. 
90 + * @mode: the permission that the file should have. 91 + * @parent: parent dentry for this file. 92 + * @data: something that the caller will want to get to later on. 93 + * @fop: struct file_operations that should be used for this file. 94 + * 95 + * This is the basic "create a file" function for tracefs. It allows for a 96 + * wide range of flexibility in creating a file. 97 + * 98 + * This function will return a pointer to a dentry if it succeeds. This 99 + * pointer must be passed to the tracefs_remove() function when the file is 100 + * to be removed (no automatic cleanup happens if your module is unloaded, 101 + * you are responsible here.) If an error occurs, %NULL will be returned. 102 + * 103 + * If tracefs is not enabled in the kernel, the value -%ENODEV will be 104 + * returned. 105 + */ 106 + static struct dentry *create_file(const char *name, umode_t mode, 107 + struct dentry *parent, void *data, 108 + const struct file_operations *fop) 109 + { 110 + struct tracefs_inode *ti; 111 + struct dentry *dentry; 112 + struct inode *inode; 113 + 114 + if (!(mode & S_IFMT)) 115 + mode |= S_IFREG; 116 + 117 + if (WARN_ON_ONCE(!S_ISREG(mode))) 118 + return NULL; 119 + 120 + dentry = eventfs_start_creating(name, parent); 121 + 122 + if (IS_ERR(dentry)) 123 + return dentry; 124 + 125 + inode = tracefs_get_inode(dentry->d_sb); 126 + if (unlikely(!inode)) 127 + return eventfs_failed_creating(dentry); 128 + 129 + inode->i_mode = mode; 130 + inode->i_fop = fop; 131 + inode->i_private = data; 132 + 133 + ti = get_tracefs(inode); 134 + ti->flags |= TRACEFS_EVENT_INODE; 135 + d_instantiate(dentry, inode); 136 + fsnotify_create(dentry->d_parent->d_inode, dentry); 137 + return eventfs_end_creating(dentry); 138 + }; 139 + 140 + /** 141 + * create_dir - create a dir in the tracefs filesystem 142 + * @name: the name of the file to create. 143 + * @parent: parent dentry for this file. 144 + * @data: something that the caller will want to get to later on. 
145 + * 146 + * This is the basic "create a dir" function for eventfs. It allows for a 147 + * wide range of flexibility in creating a dir. 148 + * 149 + * This function will return a pointer to a dentry if it succeeds. This 150 + * pointer must be passed to the tracefs_remove() function when the file is 151 + * to be removed (no automatic cleanup happens if your module is unloaded, 152 + * you are responsible here.) If an error occurs, %NULL will be returned. 153 + * 154 + * If tracefs is not enabled in the kernel, the value -%ENODEV will be 155 + * returned. 156 + */ 157 + static struct dentry *create_dir(const char *name, struct dentry *parent, void *data) 158 + { 159 + struct tracefs_inode *ti; 160 + struct dentry *dentry; 161 + struct inode *inode; 162 + 163 + dentry = eventfs_start_creating(name, parent); 164 + if (IS_ERR(dentry)) 165 + return dentry; 166 + 167 + inode = tracefs_get_inode(dentry->d_sb); 168 + if (unlikely(!inode)) 169 + return eventfs_failed_creating(dentry); 170 + 171 + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; 172 + inode->i_op = &eventfs_root_dir_inode_operations; 173 + inode->i_fop = &eventfs_file_operations; 174 + inode->i_private = data; 175 + 176 + ti = get_tracefs(inode); 177 + ti->flags |= TRACEFS_EVENT_INODE; 178 + 179 + inc_nlink(inode); 180 + d_instantiate(dentry, inode); 181 + inc_nlink(dentry->d_parent->d_inode); 182 + fsnotify_mkdir(dentry->d_parent->d_inode, dentry); 183 + return eventfs_end_creating(dentry); 184 + } 185 + 186 + /** 187 + * eventfs_set_ef_status_free - set the ef->status to free 188 + * @dentry: dentry who's status to be freed 189 + * 190 + * eventfs_set_ef_status_free will be called if no more 191 + * references remain 192 + */ 193 + void eventfs_set_ef_status_free(struct dentry *dentry) 194 + { 195 + struct tracefs_inode *ti_parent; 196 + struct eventfs_file *ef; 197 + 198 + mutex_lock(&eventfs_mutex); 199 + ti_parent = get_tracefs(dentry->d_parent->d_inode); 200 + if (!ti_parent || 
!(ti_parent->flags & TRACEFS_EVENT_INODE)) 201 + goto out; 202 + 203 + ef = dentry->d_fsdata; 204 + if (!ef) 205 + goto out; 206 + 207 + /* 208 + * If ef was freed, then the LSB bit is set for d_fsdata. 209 + * But this should not happen, as it should still have a 210 + * ref count that prevents it. Warn in case it does. 211 + */ 212 + if (WARN_ON_ONCE((unsigned long)ef & 1)) 213 + goto out; 214 + 215 + dentry->d_fsdata = NULL; 216 + ef->dentry = NULL; 217 + out: 218 + mutex_unlock(&eventfs_mutex); 219 + } 220 + 221 + /** 222 + * eventfs_post_create_dir - post create dir routine 223 + * @ef: eventfs_file of recently created dir 224 + * 225 + * Map the meta-data of files within an eventfs dir to their parent dentry 226 + */ 227 + static void eventfs_post_create_dir(struct eventfs_file *ef) 228 + { 229 + struct eventfs_file *ef_child; 230 + struct tracefs_inode *ti; 231 + 232 + /* srcu lock already held */ 233 + /* fill parent-child relation */ 234 + list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, 235 + srcu_read_lock_held(&eventfs_srcu)) { 236 + ef_child->d_parent = ef->dentry; 237 + } 238 + 239 + ti = get_tracefs(ef->dentry->d_inode); 240 + ti->private = ef->ei; 241 + } 242 + 243 + /** 244 + * create_dentry - helper function to create dentry 245 + * @ef: eventfs_file of file or directory to create 246 + * @parent: parent dentry 247 + * @lookup: true if called from lookup routine 248 + * 249 + * Used to create a dentry for file/dir, executes post dentry creation routine 250 + */ 251 + static struct dentry * 252 + create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) 253 + { 254 + bool invalidate = false; 255 + struct dentry *dentry; 256 + 257 + mutex_lock(&eventfs_mutex); 258 + if (ef->is_freed) { 259 + mutex_unlock(&eventfs_mutex); 260 + return NULL; 261 + } 262 + if (ef->dentry) { 263 + dentry = ef->dentry; 264 + /* On dir open, up the ref count */ 265 + if (!lookup) 266 + dget(dentry); 267 + mutex_unlock(&eventfs_mutex); 268 + 
return dentry; 269 + } 270 + mutex_unlock(&eventfs_mutex); 271 + 272 + if (!lookup) 273 + inode_lock(parent->d_inode); 274 + 275 + if (ef->ei) 276 + dentry = create_dir(ef->name, parent, ef->data); 277 + else 278 + dentry = create_file(ef->name, ef->mode, parent, 279 + ef->data, ef->fop); 280 + 281 + if (!lookup) 282 + inode_unlock(parent->d_inode); 283 + 284 + mutex_lock(&eventfs_mutex); 285 + if (IS_ERR_OR_NULL(dentry)) { 286 + /* If the ef was already updated get it */ 287 + dentry = ef->dentry; 288 + if (dentry && !lookup) 289 + dget(dentry); 290 + mutex_unlock(&eventfs_mutex); 291 + return dentry; 292 + } 293 + 294 + if (!ef->dentry && !ef->is_freed) { 295 + ef->dentry = dentry; 296 + if (ef->ei) 297 + eventfs_post_create_dir(ef); 298 + dentry->d_fsdata = ef; 299 + } else { 300 + /* A race here, should try again (unless freed) */ 301 + invalidate = true; 302 + 303 + /* 304 + * Should never happen unless we get here due to being freed. 305 + * Otherwise it means two dentries exist with the same name. 306 + */ 307 + WARN_ON_ONCE(!ef->is_freed); 308 + } 309 + mutex_unlock(&eventfs_mutex); 310 + if (invalidate) 311 + d_invalidate(dentry); 312 + 313 + if (lookup || invalidate) 314 + dput(dentry); 315 + 316 + return invalidate ? NULL : dentry; 317 + } 318 + 319 + static bool match_event_file(struct eventfs_file *ef, const char *name) 320 + { 321 + bool ret; 322 + 323 + mutex_lock(&eventfs_mutex); 324 + ret = !ef->is_freed && strcmp(ef->name, name) == 0; 325 + mutex_unlock(&eventfs_mutex); 326 + 327 + return ret; 328 + } 329 + 330 + /** 331 + * eventfs_root_lookup - lookup routine to create file/dir 332 + * @dir: in which a lookup is being done 333 + * @dentry: file/dir dentry 334 + * @flags: to pass as flags parameter to simple lookup 335 + * 336 + * Used to create a dynamic file/dir within @dir. Use the eventfs_inode 337 + * list of meta data to find the information needed to create the file/dir. 
338 + */ 339 + static struct dentry *eventfs_root_lookup(struct inode *dir, 340 + struct dentry *dentry, 341 + unsigned int flags) 342 + { 343 + struct tracefs_inode *ti; 344 + struct eventfs_inode *ei; 345 + struct eventfs_file *ef; 346 + struct dentry *ret = NULL; 347 + int idx; 348 + 349 + ti = get_tracefs(dir); 350 + if (!(ti->flags & TRACEFS_EVENT_INODE)) 351 + return NULL; 352 + 353 + ei = ti->private; 354 + idx = srcu_read_lock(&eventfs_srcu); 355 + list_for_each_entry_srcu(ef, &ei->e_top_files, list, 356 + srcu_read_lock_held(&eventfs_srcu)) { 357 + if (!match_event_file(ef, dentry->d_name.name)) 358 + continue; 359 + ret = simple_lookup(dir, dentry, flags); 360 + create_dentry(ef, ef->d_parent, true); 361 + break; 362 + } 363 + srcu_read_unlock(&eventfs_srcu, idx); 364 + return ret; 365 + } 366 + 367 + /** 368 + * eventfs_release - called to release eventfs file/dir 369 + * @inode: inode to be released 370 + * @file: file to be released (not used) 371 + */ 372 + static int eventfs_release(struct inode *inode, struct file *file) 373 + { 374 + struct tracefs_inode *ti; 375 + struct eventfs_inode *ei; 376 + struct eventfs_file *ef; 377 + struct dentry *dentry; 378 + int idx; 379 + 380 + ti = get_tracefs(inode); 381 + if (!(ti->flags & TRACEFS_EVENT_INODE)) 382 + return -EINVAL; 383 + 384 + ei = ti->private; 385 + idx = srcu_read_lock(&eventfs_srcu); 386 + list_for_each_entry_srcu(ef, &ei->e_top_files, list, 387 + srcu_read_lock_held(&eventfs_srcu)) { 388 + mutex_lock(&eventfs_mutex); 389 + dentry = ef->dentry; 390 + mutex_unlock(&eventfs_mutex); 391 + if (dentry) 392 + dput(dentry); 393 + } 394 + srcu_read_unlock(&eventfs_srcu, idx); 395 + return dcache_dir_close(inode, file); 396 + } 397 + 398 + /** 399 + * dcache_dir_open_wrapper - eventfs open wrapper 400 + * @inode: not used 401 + * @file: dir to be opened (to create its child) 402 + * 403 + * Used to dynamically create the file/dir within @file. 
@file is really a 404 + * directory and all the files/dirs of the children within @file will be 405 + * created. If any of the files/dirs have already been created, their 406 + * reference count will be incremented. 407 + */ 408 + static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) 409 + { 410 + struct tracefs_inode *ti; 411 + struct eventfs_inode *ei; 412 + struct eventfs_file *ef; 413 + struct dentry *dentry = file_dentry(file); 414 + struct inode *f_inode = file_inode(file); 415 + int idx; 416 + 417 + ti = get_tracefs(f_inode); 418 + if (!(ti->flags & TRACEFS_EVENT_INODE)) 419 + return -EINVAL; 420 + 421 + ei = ti->private; 422 + idx = srcu_read_lock(&eventfs_srcu); 423 + list_for_each_entry_rcu(ef, &ei->e_top_files, list) { 424 + create_dentry(ef, dentry, false); 425 + } 426 + srcu_read_unlock(&eventfs_srcu, idx); 427 + return dcache_dir_open(inode, file); 428 + } 429 + 430 + /** 431 + * eventfs_prepare_ef - helper function to prepare eventfs_file 432 + * @name: the name of the file/directory to create. 433 + * @mode: the permission that the file should have. 434 + * @fop: struct file_operations that should be used for this file/directory. 435 + * @iop: struct inode_operations that should be used for this file/directory. 436 + * @data: something that the caller will want to get to later on. The 437 + * inode.i_private pointer will point to this value on the open() call. 438 + * 439 + * This function allocates and fills the eventfs_file structure. 
440 + */ 441 + static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode, 442 + const struct file_operations *fop, 443 + const struct inode_operations *iop, 444 + void *data) 445 + { 446 + struct eventfs_file *ef; 447 + 448 + ef = kzalloc(sizeof(*ef), GFP_KERNEL); 449 + if (!ef) 450 + return ERR_PTR(-ENOMEM); 451 + 452 + ef->name = kstrdup(name, GFP_KERNEL); 453 + if (!ef->name) { 454 + kfree(ef); 455 + return ERR_PTR(-ENOMEM); 456 + } 457 + 458 + if (S_ISDIR(mode)) { 459 + ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL); 460 + if (!ef->ei) { 461 + kfree(ef->name); 462 + kfree(ef); 463 + return ERR_PTR(-ENOMEM); 464 + } 465 + INIT_LIST_HEAD(&ef->ei->e_top_files); 466 + } else { 467 + ef->ei = NULL; 468 + } 469 + 470 + ef->iop = iop; 471 + ef->fop = fop; 472 + ef->mode = mode; 473 + ef->data = data; 474 + return ef; 475 + } 476 + 477 + /** 478 + * eventfs_create_events_dir - create the trace event structure 479 + * @name: the name of the directory to create. 480 + * @parent: parent dentry for this file. This should be a directory dentry 481 + * if set. If this parameter is NULL, then the directory will be 482 + * created in the root of the tracefs filesystem. 483 + * 484 + * This function creates the top of the trace event directory. 
485 + */ 486 + struct dentry *eventfs_create_events_dir(const char *name, 487 + struct dentry *parent) 488 + { 489 + struct dentry *dentry = tracefs_start_creating(name, parent); 490 + struct eventfs_inode *ei; 491 + struct tracefs_inode *ti; 492 + struct inode *inode; 493 + 494 + if (IS_ERR(dentry)) 495 + return dentry; 496 + 497 + ei = kzalloc(sizeof(*ei), GFP_KERNEL); 498 + if (!ei) 499 + return ERR_PTR(-ENOMEM); 500 + inode = tracefs_get_inode(dentry->d_sb); 501 + if (unlikely(!inode)) { 502 + kfree(ei); 503 + tracefs_failed_creating(dentry); 504 + return ERR_PTR(-ENOMEM); 505 + } 506 + 507 + INIT_LIST_HEAD(&ei->e_top_files); 508 + 509 + ti = get_tracefs(inode); 510 + ti->flags |= TRACEFS_EVENT_INODE; 511 + ti->private = ei; 512 + 513 + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; 514 + inode->i_op = &eventfs_root_dir_inode_operations; 515 + inode->i_fop = &eventfs_file_operations; 516 + 517 + /* directory inodes start off with i_nlink == 2 (for "." entry) */ 518 + inc_nlink(inode); 519 + d_instantiate(dentry, inode); 520 + inc_nlink(dentry->d_parent->d_inode); 521 + fsnotify_mkdir(dentry->d_parent->d_inode, dentry); 522 + return tracefs_end_creating(dentry); 523 + } 524 + 525 + /** 526 + * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later 527 + * @name: the name of the file to create. 528 + * @parent: parent dentry for this dir. 529 + * 530 + * This function adds eventfs subsystem dir to list. 531 + * And all these dirs are created on the fly when they are looked up, 532 + * and the dentry and inodes will be removed when they are done. 
533 + */ 534 + struct eventfs_file *eventfs_add_subsystem_dir(const char *name, 535 + struct dentry *parent) 536 + { 537 + struct tracefs_inode *ti_parent; 538 + struct eventfs_inode *ei_parent; 539 + struct eventfs_file *ef; 540 + 541 + if (!parent) 542 + return ERR_PTR(-EINVAL); 543 + 544 + ti_parent = get_tracefs(parent->d_inode); 545 + ei_parent = ti_parent->private; 546 + 547 + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); 548 + if (IS_ERR(ef)) 549 + return ef; 550 + 551 + mutex_lock(&eventfs_mutex); 552 + list_add_tail(&ef->list, &ei_parent->e_top_files); 553 + ef->d_parent = parent; 554 + mutex_unlock(&eventfs_mutex); 555 + return ef; 556 + } 557 + 558 + /** 559 + * eventfs_add_dir - add eventfs dir to list to create later 560 + * @name: the name of the file to create. 561 + * @ef_parent: parent eventfs_file for this dir. 562 + * 563 + * This function adds eventfs dir to list. 564 + * And all these dirs are created on the fly when they are looked up, 565 + * and the dentry and inodes will be removed when they are done. 566 + */ 567 + struct eventfs_file *eventfs_add_dir(const char *name, 568 + struct eventfs_file *ef_parent) 569 + { 570 + struct eventfs_file *ef; 571 + 572 + if (!ef_parent) 573 + return ERR_PTR(-EINVAL); 574 + 575 + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); 576 + if (IS_ERR(ef)) 577 + return ef; 578 + 579 + mutex_lock(&eventfs_mutex); 580 + list_add_tail(&ef->list, &ef_parent->ei->e_top_files); 581 + ef->d_parent = ef_parent->dentry; 582 + mutex_unlock(&eventfs_mutex); 583 + return ef; 584 + } 585 + 586 + /** 587 + * eventfs_add_events_file - add the data needed to create a file for later reference 588 + * @name: the name of the file to create. 589 + * @mode: the permission that the file should have. 590 + * @parent: parent dentry for this file. 591 + * @data: something that the caller will want to get to later on. 592 + * @fop: struct file_operations that should be used for this file. 
593 + * 594 + * This function is used to add the information needed to create a 595 + * dentry/inode within the top level events directory. The file created 596 + * will have the @mode permissions. The @data will be used to fill the 597 + * inode.i_private when the open() call is done. The dentry and inodes are 598 + * all created when they are referenced, and removed when they are no 599 + * longer referenced. 600 + */ 601 + int eventfs_add_events_file(const char *name, umode_t mode, 602 + struct dentry *parent, void *data, 603 + const struct file_operations *fop) 604 + { 605 + struct tracefs_inode *ti; 606 + struct eventfs_inode *ei; 607 + struct eventfs_file *ef; 608 + 609 + if (!parent) 610 + return -EINVAL; 611 + 612 + if (!(mode & S_IFMT)) 613 + mode |= S_IFREG; 614 + 615 + if (!parent->d_inode) 616 + return -EINVAL; 617 + 618 + ti = get_tracefs(parent->d_inode); 619 + if (!(ti->flags & TRACEFS_EVENT_INODE)) 620 + return -EINVAL; 621 + 622 + ei = ti->private; 623 + ef = eventfs_prepare_ef(name, mode, fop, NULL, data); 624 + 625 + if (IS_ERR(ef)) 626 + return -ENOMEM; 627 + 628 + mutex_lock(&eventfs_mutex); 629 + list_add_tail(&ef->list, &ei->e_top_files); 630 + ef->d_parent = parent; 631 + mutex_unlock(&eventfs_mutex); 632 + return 0; 633 + } 634 + 635 + /** 636 + * eventfs_add_file - add eventfs file to list to create later 637 + * @name: the name of the file to create. 638 + * @mode: the permission that the file should have. 639 + * @ef_parent: parent eventfs_file for this file. 640 + * @data: something that the caller will want to get to later on. 641 + * @fop: struct file_operations that should be used for this file. 642 + * 643 + * This function is used to add the information needed to create a 644 + * file within a subdirectory of the events directory. The file created 645 + * will have the @mode permissions. The @data will be used to fill the 646 + * inode.i_private when the open() call is done. 
The dentry and inodes are 647 + * all created when they are referenced, and removed when they are no 648 + * longer referenced. 649 + */ 650 + int eventfs_add_file(const char *name, umode_t mode, 651 + struct eventfs_file *ef_parent, 652 + void *data, 653 + const struct file_operations *fop) 654 + { 655 + struct eventfs_file *ef; 656 + 657 + if (!ef_parent) 658 + return -EINVAL; 659 + 660 + if (!(mode & S_IFMT)) 661 + mode |= S_IFREG; 662 + 663 + ef = eventfs_prepare_ef(name, mode, fop, NULL, data); 664 + if (IS_ERR(ef)) 665 + return -ENOMEM; 666 + 667 + mutex_lock(&eventfs_mutex); 668 + list_add_tail(&ef->list, &ef_parent->ei->e_top_files); 669 + ef->d_parent = ef_parent->dentry; 670 + mutex_unlock(&eventfs_mutex); 671 + return 0; 672 + } 673 + 674 + static void free_ef(struct rcu_head *head) 675 + { 676 + struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu); 677 + 678 + kfree(ef->name); 679 + kfree(ef->ei); 680 + kfree(ef); 681 + } 682 + 683 + /** 684 + * eventfs_remove_rec - remove eventfs dir or file from list 685 + * @ef: eventfs_file to be removed. 686 + * @head: to create list of eventfs_file to be deleted 687 + * @level: to check recursion depth 688 + * 689 + * The helper function eventfs_remove_rec() is used to clean up and free the 690 + * associated data from eventfs for both of the added functions. 691 + */ 692 + static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level) 693 + { 694 + struct eventfs_file *ef_child; 695 + 696 + if (!ef) 697 + return; 698 + /* 699 + * Check recursion depth. 
It should never be greater than 3: 700 + * 0 - events/ 701 + * 1 - events/group/ 702 + * 2 - events/group/event/ 703 + * 3 - events/group/event/file 704 + */ 705 + if (WARN_ON_ONCE(level > 3)) 706 + return; 707 + 708 + if (ef->ei) { 709 + /* search for nested folders or files */ 710 + list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, 711 + lockdep_is_held(&eventfs_mutex)) { 712 + eventfs_remove_rec(ef_child, head, level + 1); 713 + } 714 + } 715 + 716 + list_del_rcu(&ef->list); 717 + list_add_tail(&ef->del_list, head); 718 + } 719 + 720 + /** 721 + * eventfs_remove - remove eventfs dir or file from list 722 + * @ef: eventfs_file to be removed. 723 + * 724 + * This function acquire the eventfs_mutex lock and call eventfs_remove_rec() 725 + */ 726 + void eventfs_remove(struct eventfs_file *ef) 727 + { 728 + struct eventfs_file *tmp; 729 + LIST_HEAD(ef_del_list); 730 + struct dentry *dentry_list = NULL; 731 + struct dentry *dentry; 732 + 733 + if (!ef) 734 + return; 735 + 736 + mutex_lock(&eventfs_mutex); 737 + eventfs_remove_rec(ef, &ef_del_list, 0); 738 + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { 739 + if (ef->dentry) { 740 + unsigned long ptr = (unsigned long)dentry_list; 741 + 742 + /* Keep the dentry from being freed yet */ 743 + dget(ef->dentry); 744 + 745 + /* 746 + * Paranoid: The dget() above should prevent the dentry 747 + * from being freed and calling eventfs_set_ef_status_free(). 748 + * But just in case, set the link list LSB pointer to 1 749 + * and have eventfs_set_ef_status_free() check that to 750 + * make sure that if it does happen, it will not think 751 + * the d_fsdata is an event_file. 752 + * 753 + * For this to work, no event_file should be allocated 754 + * on a odd space, as the ef should always be allocated 755 + * to be at least word aligned. Check for that too. 
756 + */ 757 + WARN_ON_ONCE(ptr & 1); 758 + 759 + ef->dentry->d_fsdata = (void *)(ptr | 1); 760 + dentry_list = ef->dentry; 761 + ef->dentry = NULL; 762 + } 763 + call_srcu(&eventfs_srcu, &ef->rcu, free_ef); 764 + } 765 + mutex_unlock(&eventfs_mutex); 766 + 767 + while (dentry_list) { 768 + unsigned long ptr; 769 + 770 + dentry = dentry_list; 771 + ptr = (unsigned long)dentry->d_fsdata & ~1UL; 772 + dentry_list = (struct dentry *)ptr; 773 + dentry->d_fsdata = NULL; 774 + d_invalidate(dentry); 775 + mutex_lock(&eventfs_mutex); 776 + /* dentry should now have at least a single reference */ 777 + WARN_ONCE((int)d_count(dentry) < 1, 778 + "dentry %p less than one reference (%d) after invalidate\n", 779 + dentry, d_count(dentry)); 780 + mutex_unlock(&eventfs_mutex); 781 + dput(dentry); 782 + } 783 + } 784 + 785 + /** 786 + * eventfs_remove_events_dir - remove eventfs dir or file from list 787 + * @dentry: events's dentry to be removed. 788 + * 789 + * This function remove events main directory 790 + */ 791 + void eventfs_remove_events_dir(struct dentry *dentry) 792 + { 793 + struct tracefs_inode *ti; 794 + struct eventfs_inode *ei; 795 + 796 + if (!dentry || !dentry->d_inode) 797 + return; 798 + 799 + ti = get_tracefs(dentry->d_inode); 800 + if (!ti || !(ti->flags & TRACEFS_EVENT_INODE)) 801 + return; 802 + 803 + ei = ti->private; 804 + d_invalidate(dentry); 805 + dput(dentry); 806 + kfree(ei); 807 + }
+145 -12
fs/tracefs/inode.c
··· 21 21 #include <linux/parser.h> 22 22 #include <linux/magic.h> 23 23 #include <linux/slab.h> 24 + #include "internal.h" 24 25 25 26 #define TRACEFS_DEFAULT_MODE 0700 27 + static struct kmem_cache *tracefs_inode_cachep __ro_after_init; 26 28 27 29 static struct vfsmount *tracefs_mount; 28 30 static int tracefs_mount_count; 29 31 static bool tracefs_registered; 32 + 33 + static struct inode *tracefs_alloc_inode(struct super_block *sb) 34 + { 35 + struct tracefs_inode *ti; 36 + 37 + ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL); 38 + if (!ti) 39 + return NULL; 40 + 41 + ti->flags = 0; 42 + 43 + return &ti->vfs_inode; 44 + } 45 + 46 + static void tracefs_free_inode(struct inode *inode) 47 + { 48 + kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); 49 + } 30 50 31 51 static ssize_t default_read_file(struct file *file, char __user *buf, 32 52 size_t count, loff_t *ppos) ··· 147 127 .rmdir = tracefs_syscall_rmdir, 148 128 }; 149 129 150 - static struct inode *tracefs_get_inode(struct super_block *sb) 130 + struct inode *tracefs_get_inode(struct super_block *sb) 151 131 { 152 132 struct inode *inode = new_inode(sb); 153 133 if (inode) { ··· 310 290 struct tracefs_fs_info *fsi = sb->s_fs_info; 311 291 struct inode *inode = d_inode(sb->s_root); 312 292 struct tracefs_mount_opts *opts = &fsi->mount_opts; 293 + umode_t tmp_mode; 313 294 314 295 /* 315 296 * On remount, only reset mode/uid/gid if they were provided as mount ··· 318 297 */ 319 298 320 299 if (!remount || opts->opts & BIT(Opt_mode)) { 321 - inode->i_mode &= ~S_IALLUGO; 322 - inode->i_mode |= opts->mode; 300 + tmp_mode = READ_ONCE(inode->i_mode) & ~S_IALLUGO; 301 + tmp_mode |= opts->mode; 302 + WRITE_ONCE(inode->i_mode, tmp_mode); 323 303 } 324 304 325 305 if (!remount || opts->opts & BIT(Opt_uid)) ··· 368 346 } 369 347 370 348 static const struct super_operations tracefs_super_operations = { 349 + .alloc_inode = tracefs_alloc_inode, 350 + .free_inode = tracefs_free_inode, 351 + .drop_inode = 
generic_delete_inode, 371 352 .statfs = simple_statfs, 372 353 .remount_fs = tracefs_remount, 373 354 .show_options = tracefs_show_options, 355 + }; 356 + 357 + static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode) 358 + { 359 + struct tracefs_inode *ti; 360 + 361 + if (!dentry || !inode) 362 + return; 363 + 364 + ti = get_tracefs(inode); 365 + if (ti && ti->flags & TRACEFS_EVENT_INODE) 366 + eventfs_set_ef_status_free(dentry); 367 + iput(inode); 368 + } 369 + 370 + static const struct dentry_operations tracefs_dentry_operations = { 371 + .d_iput = tracefs_dentry_iput, 374 372 }; 375 373 376 374 static int trace_fill_super(struct super_block *sb, void *data, int silent) ··· 415 373 goto fail; 416 374 417 375 sb->s_op = &tracefs_super_operations; 376 + sb->s_d_op = &tracefs_dentry_operations; 418 377 419 378 tracefs_apply_options(sb, false); 420 379 ··· 442 399 }; 443 400 MODULE_ALIAS_FS("tracefs"); 444 401 445 - static struct dentry *start_creating(const char *name, struct dentry *parent) 402 + struct dentry *tracefs_start_creating(const char *name, struct dentry *parent) 446 403 { 447 404 struct dentry *dentry; 448 405 int error; ··· 480 437 return dentry; 481 438 } 482 439 483 - static struct dentry *failed_creating(struct dentry *dentry) 440 + struct dentry *tracefs_failed_creating(struct dentry *dentry) 484 441 { 485 442 inode_unlock(d_inode(dentry->d_parent)); 486 443 dput(dentry); ··· 488 445 return NULL; 489 446 } 490 447 491 - static struct dentry *end_creating(struct dentry *dentry) 448 + struct dentry *tracefs_end_creating(struct dentry *dentry) 492 449 { 493 450 inode_unlock(d_inode(dentry->d_parent)); 451 + return dentry; 452 + } 453 + 454 + /** 455 + * eventfs_start_creating - start the process of creating a dentry 456 + * @name: Name of the file created for the dentry 457 + * @parent: The parent dentry where this dentry will be created 458 + * 459 + * This is a simple helper function for the dynamically created eventfs 460 + * 
files. When the directory of the eventfs files are accessed, their 461 + * dentries are created on the fly. This function is used to start that 462 + * process. 463 + */ 464 + struct dentry *eventfs_start_creating(const char *name, struct dentry *parent) 465 + { 466 + struct dentry *dentry; 467 + int error; 468 + 469 + error = simple_pin_fs(&trace_fs_type, &tracefs_mount, 470 + &tracefs_mount_count); 471 + if (error) 472 + return ERR_PTR(error); 473 + 474 + /* 475 + * If the parent is not specified, we create it in the root. 476 + * We need the root dentry to do this, which is in the super 477 + * block. A pointer to that is in the struct vfsmount that we 478 + * have around. 479 + */ 480 + if (!parent) 481 + parent = tracefs_mount->mnt_root; 482 + 483 + if (unlikely(IS_DEADDIR(parent->d_inode))) 484 + dentry = ERR_PTR(-ENOENT); 485 + else 486 + dentry = lookup_one_len(name, parent, strlen(name)); 487 + 488 + if (!IS_ERR(dentry) && dentry->d_inode) { 489 + dput(dentry); 490 + dentry = ERR_PTR(-EEXIST); 491 + } 492 + 493 + if (IS_ERR(dentry)) 494 + simple_release_fs(&tracefs_mount, &tracefs_mount_count); 495 + 496 + return dentry; 497 + } 498 + 499 + /** 500 + * eventfs_failed_creating - clean up a failed eventfs dentry creation 501 + * @dentry: The dentry to clean up 502 + * 503 + * If after calling eventfs_start_creating(), a failure is detected, the 504 + * resources created by eventfs_start_creating() needs to be cleaned up. In 505 + * that case, this function should be called to perform that clean up. 506 + */ 507 + struct dentry *eventfs_failed_creating(struct dentry *dentry) 508 + { 509 + dput(dentry); 510 + simple_release_fs(&tracefs_mount, &tracefs_mount_count); 511 + return NULL; 512 + } 513 + 514 + /** 515 + * eventfs_end_creating - Finish the process of creating a eventfs dentry 516 + * @dentry: The dentry that has successfully been created. 517 + * 518 + * This function is currently just a place holder to match 519 + * eventfs_start_creating(). 
In case any synchronization needs to be added, 520 + * this function will be used to implement that without having to modify 521 + * the callers of eventfs_start_creating(). 522 + */ 523 + struct dentry *eventfs_end_creating(struct dentry *dentry) 524 + { 494 525 return dentry; 495 526 } 496 527 ··· 607 490 if (!(mode & S_IFMT)) 608 491 mode |= S_IFREG; 609 492 BUG_ON(!S_ISREG(mode)); 610 - dentry = start_creating(name, parent); 493 + dentry = tracefs_start_creating(name, parent); 611 494 612 495 if (IS_ERR(dentry)) 613 496 return NULL; 614 497 615 498 inode = tracefs_get_inode(dentry->d_sb); 616 499 if (unlikely(!inode)) 617 - return failed_creating(dentry); 500 + return tracefs_failed_creating(dentry); 618 501 619 502 inode->i_mode = mode; 620 503 inode->i_fop = fops ? fops : &tracefs_file_operations; ··· 623 506 inode->i_gid = d_inode(dentry->d_parent)->i_gid; 624 507 d_instantiate(dentry, inode); 625 508 fsnotify_create(d_inode(dentry->d_parent), dentry); 626 - return end_creating(dentry); 509 + return tracefs_end_creating(dentry); 627 510 } 628 511 629 512 static struct dentry *__create_dir(const char *name, struct dentry *parent, 630 513 const struct inode_operations *ops) 631 514 { 632 - struct dentry *dentry = start_creating(name, parent); 515 + struct dentry *dentry = tracefs_start_creating(name, parent); 633 516 struct inode *inode; 634 517 635 518 if (IS_ERR(dentry)) ··· 637 520 638 521 inode = tracefs_get_inode(dentry->d_sb); 639 522 if (unlikely(!inode)) 640 - return failed_creating(dentry); 523 + return tracefs_failed_creating(dentry); 641 524 642 525 /* Do not set bits for OTH */ 643 526 inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP; ··· 651 534 d_instantiate(dentry, inode); 652 535 inc_nlink(d_inode(dentry->d_parent)); 653 536 fsnotify_mkdir(d_inode(dentry->d_parent), dentry); 654 - return end_creating(dentry); 537 + return tracefs_end_creating(dentry); 655 538 } 656 539 657 540 /** ··· 745 628 return tracefs_registered; 
746 629 } 747 630 631 + static void init_once(void *foo) 632 + { 633 + struct tracefs_inode *ti = (struct tracefs_inode *) foo; 634 + 635 + inode_init_once(&ti->vfs_inode); 636 + } 637 + 748 638 static int __init tracefs_init(void) 749 639 { 750 640 int retval; 641 + 642 + tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache", 643 + sizeof(struct tracefs_inode), 644 + 0, (SLAB_RECLAIM_ACCOUNT| 645 + SLAB_MEM_SPREAD| 646 + SLAB_ACCOUNT), 647 + init_once); 648 + if (!tracefs_inode_cachep) 649 + return -ENOMEM; 751 650 752 651 retval = sysfs_create_mount_point(kernel_kobj, "tracing"); 753 652 if (retval)
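The new tracefs_alloc_inode()/tracefs_free_inode() pair above relies on the classic embedded-structure pattern: the VFS `struct inode` lives inside `struct tracefs_inode`, and get_tracefs() recovers the wrapper from a pointer to the member. A minimal userspace sketch of that recovery, with simplified stand-in types and the textbook `offsetof` form of `container_of` (the kernel's macro adds extra type checking):

```c
#include <assert.h>
#include <stddef.h>

/* Textbook container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct inode {
	unsigned long i_ino;	/* stand-in for the real VFS inode */
};

struct tracefs_inode {
	unsigned long flags;
	void *private;
	struct inode vfs_inode;	/* embedded, never a pointer */
};

static struct tracefs_inode *get_tracefs(const struct inode *inode)
{
	return container_of(inode, struct tracefs_inode, vfs_inode);
}
```

Because the inode is embedded, freeing the container (as tracefs_free_inode() does via the kmem_cache) frees the inode with it; no separate lookup table is needed.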
fs/tracefs/internal.h (+29)
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _TRACEFS_INTERNAL_H 3 + #define _TRACEFS_INTERNAL_H 4 + 5 + enum { 6 + TRACEFS_EVENT_INODE = BIT(1), 7 + }; 8 + 9 + struct tracefs_inode { 10 + unsigned long flags; 11 + void *private; 12 + struct inode vfs_inode; 13 + }; 14 + 15 + static inline struct tracefs_inode *get_tracefs(const struct inode *inode) 16 + { 17 + return container_of(inode, struct tracefs_inode, vfs_inode); 18 + } 19 + 20 + struct dentry *tracefs_start_creating(const char *name, struct dentry *parent); 21 + struct dentry *tracefs_end_creating(struct dentry *dentry); 22 + struct dentry *tracefs_failed_creating(struct dentry *dentry); 23 + struct inode *tracefs_get_inode(struct super_block *sb); 24 + struct dentry *eventfs_start_creating(const char *name, struct dentry *parent); 25 + struct dentry *eventfs_failed_creating(struct dentry *dentry); 26 + struct dentry *eventfs_end_creating(struct dentry *dentry); 27 + void eventfs_set_ef_status_free(struct dentry *dentry); 28 + 29 + #endif /* _TRACEFS_INTERNAL_H */
include/linux/ftrace.h (-5)
··· 862 862 extern void ftrace_module_init(struct module *mod); 863 863 extern void ftrace_module_enable(struct module *mod); 864 864 extern void ftrace_release_mod(struct module *mod); 865 - 866 - extern void ftrace_disable_daemon(void); 867 - extern void ftrace_enable_daemon(void); 868 865 #else /* CONFIG_DYNAMIC_FTRACE */ 869 866 static inline int skip_trace(unsigned long ip) { return 0; } 870 - static inline void ftrace_disable_daemon(void) { } 871 - static inline void ftrace_enable_daemon(void) { } 872 867 static inline void ftrace_module_init(struct module *mod) { } 873 868 static inline void ftrace_module_enable(struct module *mod) { } 874 869 static inline void ftrace_release_mod(struct module *mod) { }
include/linux/trace_events.h (+2)
··· 649 649 struct list_head list; 650 650 struct trace_event_call *event_call; 651 651 struct event_filter __rcu *filter; 652 + struct eventfs_file *ef; 652 653 struct dentry *dir; 653 654 struct trace_array *tr; 654 655 struct trace_subsystem_dir *system; ··· 825 824 FILTER_RDYN_STRING, 826 825 FILTER_PTR_STRING, 827 826 FILTER_TRACE_FN, 827 + FILTER_CPUMASK, 828 828 FILTER_COMM, 829 829 FILTER_CPU, 830 830 FILTER_STACKTRACE,
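The new FILTER_CPUMASK type backs the `cpumask & CPUS{17-42}` filter syntax, with the actual comparisons implemented in trace_events_filter.c (do_filter_cpumask() and friends). The three operator semantics can be sketched in userspace with a plain bitmask standing in for `struct cpumask` — an assumption that fits at most 64 CPUs; the kernel uses cpumask_equal()/cpumask_intersects() on arbitrary-width masks:

```c
#include <assert.h>

enum { OP_EQ, OP_NE, OP_BAND };	/* same operator names as the filter code */

/* Sketch of do_filter_cpumask(): compare an event's mask to the user's. */
static int do_filter_cpumask(int op, unsigned long mask, unsigned long cmp)
{
	switch (op) {
	case OP_EQ:
		return mask == cmp;
	case OP_NE:
		return mask != cmp;
	case OP_BAND:			/* 'cpumask & CPUS{...}': any overlap */
		return (mask & cmp) != 0;
	default:
		return 0;
	}
}
```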
include/linux/tracefs.h (+23)
··· 21 21 22 22 #ifdef CONFIG_TRACING 23 23 24 + struct eventfs_file; 25 + 26 + struct dentry *eventfs_create_events_dir(const char *name, 27 + struct dentry *parent); 28 + 29 + struct eventfs_file *eventfs_add_subsystem_dir(const char *name, 30 + struct dentry *parent); 31 + 32 + struct eventfs_file *eventfs_add_dir(const char *name, 33 + struct eventfs_file *ef_parent); 34 + 35 + int eventfs_add_file(const char *name, umode_t mode, 36 + struct eventfs_file *ef_parent, void *data, 37 + const struct file_operations *fops); 38 + 39 + int eventfs_add_events_file(const char *name, umode_t mode, 40 + struct dentry *parent, void *data, 41 + const struct file_operations *fops); 42 + 43 + void eventfs_remove(struct eventfs_file *ef); 44 + 45 + void eventfs_remove_events_dir(struct dentry *dentry); 46 + 24 47 struct dentry *tracefs_create_file(const char *name, umode_t mode, 25 48 struct dentry *parent, void *data, 26 49 const struct file_operations *fops);
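The eventfs entry points declared above replace eager dentry creation with lightweight metadata (`struct eventfs_file`) from which inodes and dentries are materialized only when the directory is actually accessed. A toy sketch of that just-in-time idea — all names here are hypothetical illustrations, not the eventfs implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lazy entry: cheap metadata kept up front, the expensive
 * object (standing in for a dentry/inode pair) created on first lookup. */
struct lazy_entry {
	const char *name;
	void *object;
	int materialized;
};

static void *lookup(struct lazy_entry *e)
{
	if (!e->materialized) {		/* allocate only on first access */
		e->object = malloc(64);
		e->materialized = 1;
	}
	return e->object;		/* later lookups reuse the object */
}
```

With thousands of events, most of which are never opened, keeping only the metadata resident is where the memory savings come from.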
kernel/trace/ring_buffer.c (+6, -14)
··· 692 692 static inline bool 693 693 rb_time_read_cmpxchg(local_t *l, unsigned long expect, unsigned long set) 694 694 { 695 - unsigned long ret; 696 - 697 - ret = local_cmpxchg(l, expect, set); 698 - return ret == expect; 695 + return local_try_cmpxchg(l, &expect, set); 699 696 } 700 697 701 698 static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) ··· 749 752 750 753 static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) 751 754 { 752 - u64 val; 753 - val = local64_cmpxchg(&t->time, expect, set); 754 - return val == expect; 755 + return local64_try_cmpxchg(&t->time, &expect, set); 755 756 } 756 757 #endif 757 758 ··· 1489 1494 { 1490 1495 unsigned long *ptr = (unsigned long *)&old->list.prev->next; 1491 1496 unsigned long val; 1492 - unsigned long ret; 1493 1497 1494 1498 val = *ptr & ~RB_FLAG_MASK; 1495 1499 val |= RB_PAGE_HEAD; 1496 1500 1497 - ret = cmpxchg(ptr, val, (unsigned long)&new->list); 1498 - 1499 - return ret == val; 1501 + return try_cmpxchg(ptr, &val, (unsigned long)&new->list); 1500 1502 } 1501 1503 1502 1504 /* ··· 2995 3003 { 2996 3004 unsigned long new_index, old_index; 2997 3005 struct buffer_page *bpage; 2998 - unsigned long index; 2999 3006 unsigned long addr; 3000 3007 u64 write_stamp; 3001 3008 u64 delta; ··· 3051 3060 */ 3052 3061 old_index += write_mask; 3053 3062 new_index += write_mask; 3054 - index = local_cmpxchg(&bpage->write, old_index, new_index); 3055 - if (index == old_index) { 3063 + 3064 + /* caution: old_index gets updated on cmpxchg failure */ 3065 + if (local_try_cmpxchg(&bpage->write, &old_index, new_index)) { 3056 3066 /* update counters */ 3057 3067 local_sub(event_length, &cpu_buffer->entries_bytes); 3058 3068 return true;
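Most of this file's churn converts open-coded `cmpxchg() == old` checks to the try_cmpxchg() family, which returns a bool and, on failure, writes the value actually observed back through the expected pointer — hence the new "old_index gets updated on cmpxchg failure" comment. A userspace sketch of those semantics using the GCC/Clang `__atomic` builtin (an assumption for illustration; the kernel's local_try_cmpxchg() is an arch-specific primitive):

```c
#include <assert.h>
#include <stdbool.h>

/* try_cmpxchg-style compare-and-swap: returns true on success; on failure
 * *expect is overwritten with the value found in *ptr. */
static bool try_cmpxchg_ulong(unsigned long *ptr, unsigned long *expect,
			      unsigned long set)
{
	return __atomic_compare_exchange_n(ptr, expect, set, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

The update-on-failure behavior is what lets retry loops avoid a separate re-read, but it also means callers like rb_try_to_discard() must not reuse the "expected" variable as if it still held the original value.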
kernel/trace/trace.c (+40, -59)
··· 3119 3119 struct ftrace_stack *fstack; 3120 3120 struct stack_entry *entry; 3121 3121 int stackidx; 3122 - void *ptr; 3123 3122 3124 3123 /* 3125 3124 * Add one, for this function and the call to save_stack_trace() ··· 3156 3157 nr_entries = stack_trace_save(fstack->calls, size, skip); 3157 3158 } 3158 3159 3159 - size = nr_entries * sizeof(unsigned long); 3160 3160 event = __trace_buffer_lock_reserve(buffer, TRACE_STACK, 3161 - (sizeof(*entry) - sizeof(entry->caller)) + size, 3161 + struct_size(entry, caller, nr_entries), 3162 3162 trace_ctx); 3163 3163 if (!event) 3164 3164 goto out; 3165 - ptr = ring_buffer_event_data(event); 3166 - entry = ptr; 3167 - 3168 - /* 3169 - * For backward compatibility reasons, the entry->caller is an 3170 - * array of 8 slots to store the stack. This is also exported 3171 - * to user space. The amount allocated on the ring buffer actually 3172 - * holds enough for the stack specified by nr_entries. This will 3173 - * go into the location of entry->caller. Due to string fortifiers 3174 - * checking the size of the destination of memcpy() it triggers 3175 - * when it detects that size is greater than 8. To hide this from 3176 - * the fortifiers, we use "ptr" and pointer arithmetic to assign caller. 3177 - * 3178 - * The below is really just: 3179 - * memcpy(&entry->caller, fstack->calls, size); 3180 - */ 3181 - ptr += offsetof(typeof(*entry), caller); 3182 - memcpy(ptr, fstack->calls, size); 3165 + entry = ring_buffer_event_data(event); 3183 3166 3184 3167 entry->size = nr_entries; 3168 + memcpy(&entry->caller, fstack->calls, 3169 + flex_array_size(entry, caller, nr_entries)); 3185 3170 3186 3171 if (!call_filter_check_discard(call, entry, buffer, event)) 3187 3172 __buffer_unlock_commit(buffer, event); ··· 4189 4206 loff_t l = 0; 4190 4207 int cpu; 4191 4208 4192 - /* 4193 - * copy the tracer to avoid using a global lock all around. 
4194 - * iter->trace is a copy of current_trace, the pointer to the 4195 - * name may be used instead of a strcmp(), as iter->trace->name 4196 - * will point to the same string as current_trace->name. 4197 - */ 4198 4209 mutex_lock(&trace_types_lock); 4199 - if (unlikely(tr->current_trace && iter->trace->name != tr->current_trace->name)) { 4210 + if (unlikely(tr->current_trace != iter->trace)) { 4200 4211 /* Close iter->trace before switching to the new current tracer */ 4201 4212 if (iter->trace->close) 4202 4213 iter->trace->close(iter); 4203 - *iter->trace = *tr->current_trace; 4214 + iter->trace = tr->current_trace; 4204 4215 /* Reopen the new current tracer */ 4205 4216 if (iter->trace->open) 4206 4217 iter->trace->open(iter); ··· 4806 4829 .show = s_show, 4807 4830 }; 4808 4831 4832 + /* 4833 + * Note, as iter itself can be allocated and freed in different 4834 + * ways, this function is only used to free its content, and not 4835 + * the iterator itself. The only requirement to all the allocations 4836 + * is that it must zero all fields (kzalloc), as freeing works with 4837 + * ethier allocated content or NULL. 4838 + */ 4839 + static void free_trace_iter_content(struct trace_iterator *iter) 4840 + { 4841 + /* The fmt is either NULL, allocated or points to static_fmt_buf */ 4842 + if (iter->fmt != static_fmt_buf) 4843 + kfree(iter->fmt); 4844 + 4845 + kfree(iter->temp); 4846 + kfree(iter->buffer_iter); 4847 + mutex_destroy(&iter->mutex); 4848 + free_cpumask_var(iter->started); 4849 + } 4850 + 4809 4851 static struct trace_iterator * 4810 4852 __tracing_open(struct inode *inode, struct file *file, bool snapshot) 4811 4853 { ··· 4866 4870 iter->fmt = NULL; 4867 4871 iter->fmt_size = 0; 4868 4872 4869 - /* 4870 - * We make a copy of the current tracer to avoid concurrent 4871 - * changes on it while we are reading. 
4872 - */ 4873 4873 mutex_lock(&trace_types_lock); 4874 - iter->trace = kzalloc(sizeof(*iter->trace), GFP_KERNEL); 4875 - if (!iter->trace) 4876 - goto fail; 4877 - 4878 - *iter->trace = *tr->current_trace; 4874 + iter->trace = tr->current_trace; 4879 4875 4880 4876 if (!zalloc_cpumask_var(&iter->started, GFP_KERNEL)) 4881 4877 goto fail; ··· 4932 4944 4933 4945 fail: 4934 4946 mutex_unlock(&trace_types_lock); 4935 - kfree(iter->trace); 4936 - kfree(iter->temp); 4937 - kfree(iter->buffer_iter); 4947 + free_trace_iter_content(iter); 4938 4948 release: 4939 4949 seq_release_private(inode, file); 4940 4950 return ERR_PTR(-ENOMEM); ··· 5011 5025 5012 5026 mutex_unlock(&trace_types_lock); 5013 5027 5014 - mutex_destroy(&iter->mutex); 5015 - free_cpumask_var(iter->started); 5016 - kfree(iter->fmt); 5017 - kfree(iter->temp); 5018 - kfree(iter->trace); 5019 - kfree(iter->buffer_iter); 5028 + free_trace_iter_content(iter); 5020 5029 seq_release_private(inode, file); 5021 5030 5022 5031 return 0; ··· 6299 6318 per_cpu_ptr(buf->data, cpu)->entries = val; 6300 6319 } 6301 6320 6321 + static void update_buffer_entries(struct array_buffer *buf, int cpu) 6322 + { 6323 + if (cpu == RING_BUFFER_ALL_CPUS) { 6324 + set_buffer_entries(buf, ring_buffer_size(buf->buffer, 0)); 6325 + } else { 6326 + per_cpu_ptr(buf->data, cpu)->entries = ring_buffer_size(buf->buffer, cpu); 6327 + } 6328 + } 6329 + 6302 6330 #ifdef CONFIG_TRACER_MAX_TRACE 6303 6331 /* resize @tr's buffer to the size of @size_tr's entries */ 6304 6332 static int resize_buffer_duplicate_size(struct array_buffer *trace_buf, ··· 6386 6396 return ret; 6387 6397 } 6388 6398 6389 - if (cpu == RING_BUFFER_ALL_CPUS) 6390 - set_buffer_entries(&tr->max_buffer, size); 6391 - else 6392 - per_cpu_ptr(tr->max_buffer.data, cpu)->entries = size; 6399 + update_buffer_entries(&tr->max_buffer, cpu); 6393 6400 6394 6401 out: 6395 6402 #endif /* CONFIG_TRACER_MAX_TRACE */ 6396 6403 6397 - if (cpu == RING_BUFFER_ALL_CPUS) 6398 - 
set_buffer_entries(&tr->array_buffer, size); 6399 - else 6400 - per_cpu_ptr(tr->array_buffer.data, cpu)->entries = size; 6404 + update_buffer_entries(&tr->array_buffer, cpu); 6401 6405 6402 6406 return ret; 6403 6407 } ··· 6809 6825 close_pipe_on_cpu(tr, iter->cpu_file); 6810 6826 mutex_unlock(&trace_types_lock); 6811 6827 6812 - free_cpumask_var(iter->started); 6813 - kfree(iter->fmt); 6814 - kfree(iter->temp); 6815 - mutex_destroy(&iter->mutex); 6828 + free_trace_iter_content(iter); 6816 6829 kfree(iter); 6817 6830 6818 6831 trace_array_put(tr);
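The kernel_stack fix above replaces the pointer arithmetic that hid the flexible-array copy from FORTIFY_SOURCE with struct_size() and flex_array_size(). What those helpers compute can be sketched as follows — simplified, since the real macros in `<linux/overflow.h>` also saturate on arithmetic overflow:

```c
#include <assert.h>
#include <stddef.h>

struct stack_entry {
	int size;
	unsigned long caller[];	/* __counted_by(size) upstream */
};

/* Simplified sketches of the <linux/overflow.h> helpers (no overflow
 * checking): total allocation size, and size of just the array part. */
#define struct_size(p, member, n) \
	(offsetof(__typeof__(*(p)), member) + sizeof((p)->member[0]) * (n))
#define flex_array_size(p, member, n) \
	(sizeof((p)->member[0]) * (n))
```

Reserving `struct_size(entry, caller, nr_entries)` on the ring buffer and copying `flex_array_size(entry, caller, nr_entries)` keeps the two sizes derived from one expression, which is exactly what the fortifier needs to see.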
kernel/trace/trace.h (+11, -3)
··· 77 77 #undef __array 78 78 #define __array(type, item, size) type item[size]; 79 79 80 + /* 81 + * For backward compatibility, older user space expects to see the 82 + * kernel_stack event with a fixed size caller field. But today the fix 83 + * size is ignored by the kernel, and the real structure is dynamic. 84 + * Expose to user space: "unsigned long caller[8];" but the real structure 85 + * will be "unsigned long caller[] __counted_by(size)" 86 + */ 87 + #undef __stack_array 88 + #define __stack_array(type, item, size, field) type item[] __counted_by(field); 89 + 80 90 #undef __array_desc 81 91 #define __array_desc(type, container, item, size) 82 92 ··· 606 596 int tracer_init(struct tracer *t, struct trace_array *tr); 607 597 int tracing_is_enabled(void); 608 598 void tracing_reset_online_cpus(struct array_buffer *buf); 609 - void tracing_reset_current(int cpu); 610 599 void tracing_reset_all_online_cpus(void); 611 600 void tracing_reset_all_online_cpus_unlocked(void); 612 601 int tracing_open_generic(struct inode *inode, struct file *filp); ··· 706 697 void *trace_pid_next(struct trace_pid_list *pid_list, void *v, loff_t *pos); 707 698 void *trace_pid_start(struct trace_pid_list *pid_list, loff_t *pos); 708 699 int trace_pid_show(struct seq_file *m, void *v); 709 - void trace_free_pid_list(struct trace_pid_list *pid_list); 710 700 int trace_pid_write(struct trace_pid_list *filtered_pids, 711 701 struct trace_pid_list **new_pid_list, 712 702 const char __user *ubuf, size_t cnt); ··· 1342 1334 struct list_head list; 1343 1335 struct event_subsystem *subsystem; 1344 1336 struct trace_array *tr; 1345 - struct dentry *entry; 1337 + struct eventfs_file *ef; 1346 1338 int ref_count; 1347 1339 int nr_events; 1348 1340 };
kernel/trace/trace_entries.h (+1, -1)
··· 190 190 191 191 F_STRUCT( 192 192 __field( int, size ) 193 - __array( unsigned long, caller, FTRACE_STACK_ENTRIES ) 193 + __stack_array( unsigned long, caller, FTRACE_STACK_ENTRIES, size) 194 194 ), 195 195 196 196 F_printk("\t=> %ps\n\t=> %ps\n\t=> %ps\n"
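With `__stack_array`, the format exported to user space still advertises a fixed `unsigned long caller[8]` for backward compatibility, but the record actually carries `size` entries. A reader-side sketch (simplified struct; upstream annotates the array with `__counted_by(size)`), showing that consumers must trust `size` rather than the advertised 8:

```c
#include <assert.h>
#include <stdlib.h>

struct stack_entry {
	int size;			/* number of valid frames */
	unsigned long caller[];		/* dynamic; exported as caller[8] */
};

static unsigned long last_frame(const struct stack_entry *e)
{
	return e->caller[e->size - 1];	/* bounds come from 'size', not 8 */
}
```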
kernel/trace/trace_events.c (+39, -37)
··· 984 984 return; 985 985 986 986 if (!--dir->nr_events) { 987 - tracefs_remove(dir->entry); 987 + eventfs_remove(dir->ef); 988 988 list_del(&dir->list); 989 989 __put_system_dir(dir); 990 990 } ··· 1005 1005 1006 1006 tracefs_remove(dir); 1007 1007 } 1008 - 1008 + eventfs_remove(file->ef); 1009 1009 list_del(&file->list); 1010 1010 remove_subsystem(file->system); 1011 1011 free_event_filter(file->filter); ··· 2291 2291 return NULL; 2292 2292 } 2293 2293 2294 - static struct dentry * 2294 + static struct eventfs_file * 2295 2295 event_subsystem_dir(struct trace_array *tr, const char *name, 2296 2296 struct trace_event_file *file, struct dentry *parent) 2297 2297 { 2298 2298 struct event_subsystem *system, *iter; 2299 2299 struct trace_subsystem_dir *dir; 2300 - struct dentry *entry; 2300 + int res; 2301 2301 2302 2302 /* First see if we did not already create this dir */ 2303 2303 list_for_each_entry(dir, &tr->systems, list) { ··· 2305 2305 if (strcmp(system->name, name) == 0) { 2306 2306 dir->nr_events++; 2307 2307 file->system = dir; 2308 - return dir->entry; 2308 + return dir->ef; 2309 2309 } 2310 2310 } 2311 2311 ··· 2329 2329 } else 2330 2330 __get_system(system); 2331 2331 2332 - dir->entry = tracefs_create_dir(name, parent); 2333 - if (!dir->entry) { 2332 + dir->ef = eventfs_add_subsystem_dir(name, parent); 2333 + if (IS_ERR(dir->ef)) { 2334 2334 pr_warn("Failed to create system directory %s\n", name); 2335 2335 __put_system(system); 2336 2336 goto out_free; ··· 2345 2345 /* the ftrace system is special, do not create enable or filter files */ 2346 2346 if (strcmp(name, "ftrace") != 0) { 2347 2347 2348 - entry = tracefs_create_file("filter", TRACE_MODE_WRITE, 2349 - dir->entry, dir, 2348 + res = eventfs_add_file("filter", TRACE_MODE_WRITE, 2349 + dir->ef, dir, 2350 2350 &ftrace_subsystem_filter_fops); 2351 - if (!entry) { 2351 + if (res) { 2352 2352 kfree(system->filter); 2353 2353 system->filter = NULL; 2354 2354 pr_warn("Could not create tracefs 
'%s/filter' entry\n", name); 2355 2355 } 2356 2356 2357 - trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir, 2357 + eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir, 2358 2358 &ftrace_system_enable_fops); 2359 2359 } 2360 2360 2361 2361 list_add(&dir->list, &tr->systems); 2362 2362 2363 - return dir->entry; 2363 + return dir->ef; 2364 2364 2365 2365 out_free: 2366 2366 kfree(dir); ··· 2413 2413 event_create_dir(struct dentry *parent, struct trace_event_file *file) 2414 2414 { 2415 2415 struct trace_event_call *call = file->event_call; 2416 + struct eventfs_file *ef_subsystem = NULL; 2416 2417 struct trace_array *tr = file->tr; 2417 - struct dentry *d_events; 2418 2418 const char *name; 2419 2419 int ret; 2420 2420 2421 2421 /* 2422 2422 * If the trace point header did not define TRACE_SYSTEM 2423 - * then the system would be called "TRACE_SYSTEM". 2423 + * then the system would be called "TRACE_SYSTEM". This should 2424 + * never happen. 2424 2425 */ 2425 - if (strcmp(call->class->system, TRACE_SYSTEM) != 0) { 2426 - d_events = event_subsystem_dir(tr, call->class->system, file, parent); 2427 - if (!d_events) 2428 - return -ENOMEM; 2429 - } else 2430 - d_events = parent; 2426 + if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0)) 2427 + return -ENODEV; 2428 + 2429 + ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent); 2430 + if (!ef_subsystem) 2431 + return -ENOMEM; 2431 2432 2432 2433 name = trace_event_name(call); 2433 - file->dir = tracefs_create_dir(name, d_events); 2434 - if (!file->dir) { 2434 + file->ef = eventfs_add_dir(name, ef_subsystem); 2435 + if (IS_ERR(file->ef)) { 2435 2436 pr_warn("Could not create tracefs '%s' directory\n", name); 2436 2437 return -1; 2437 2438 } 2438 2439 2439 2440 if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) 2440 - trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file, 2441 + eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file, 2441 
2442 &ftrace_enable_fops); 2442 2443 2443 2444 #ifdef CONFIG_PERF_EVENTS 2444 2445 if (call->event.type && call->class->reg) 2445 - trace_create_file("id", TRACE_MODE_READ, file->dir, 2446 + eventfs_add_file("id", TRACE_MODE_READ, file->ef, 2446 2447 (void *)(long)call->event.type, 2447 2448 &ftrace_event_id_fops); 2448 2449 #endif ··· 2459 2458 * triggers or filters. 2460 2459 */ 2461 2460 if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { 2462 - trace_create_file("filter", TRACE_MODE_WRITE, file->dir, 2461 + eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef, 2463 2462 file, &ftrace_event_filter_fops); 2464 2463 2465 - trace_create_file("trigger", TRACE_MODE_WRITE, file->dir, 2464 + eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef, 2466 2465 file, &event_trigger_fops); 2467 2466 } 2468 2467 2469 2468 #ifdef CONFIG_HIST_TRIGGERS 2470 - trace_create_file("hist", TRACE_MODE_READ, file->dir, file, 2469 + eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file, 2471 2470 &event_hist_fops); 2472 2471 #endif 2473 2472 #ifdef CONFIG_HIST_TRIGGERS_DEBUG 2474 - trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file, 2473 + eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file, 2475 2474 &event_hist_debug_fops); 2476 2475 #endif 2477 - trace_create_file("format", TRACE_MODE_READ, file->dir, call, 2476 + eventfs_add_file("format", TRACE_MODE_READ, file->ef, call, 2478 2477 &ftrace_event_format_fops); 2479 2478 2480 2479 #ifdef CONFIG_TRACE_EVENT_INJECT 2481 2480 if (call->event.type && call->class->reg) 2482 - trace_create_file("inject", 0200, file->dir, file, 2481 + eventfs_add_file("inject", 0200, file->ef, file, 2483 2482 &event_inject_fops); 2484 2483 #endif 2485 2484 ··· 3632 3631 { 3633 3632 struct dentry *d_events; 3634 3633 struct dentry *entry; 3634 + int error = 0; 3635 3635 3636 3636 entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent, 3637 3637 tr, &ftrace_set_event_fops); 3638 3638 if (!entry) 3639 3639 return -ENOMEM; 
3640 3640 3641 - d_events = tracefs_create_dir("events", parent); 3642 - if (!d_events) { 3641 + d_events = eventfs_create_events_dir("events", parent); 3642 + if (IS_ERR(d_events)) { 3643 3643 pr_warn("Could not create tracefs 'events' directory\n"); 3644 3644 return -ENOMEM; 3645 3645 } 3646 3646 3647 - entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events, 3647 + error = eventfs_add_events_file("enable", TRACE_MODE_WRITE, d_events, 3648 3648 tr, &ftrace_tr_enable_fops); 3649 - if (!entry) 3649 + if (error) 3650 3650 return -ENOMEM; 3651 3651 3652 3652 /* There are not as crucial, just warn if they are not created */ ··· 3660 3658 &ftrace_set_event_notrace_pid_fops); 3661 3659 3662 3660 /* ring buffer internal formats */ 3663 - trace_create_file("header_page", TRACE_MODE_READ, d_events, 3661 + eventfs_add_events_file("header_page", TRACE_MODE_READ, d_events, 3664 3662 ring_buffer_print_page_header, 3665 3663 &ftrace_show_header_fops); 3666 3664 3667 - trace_create_file("header_event", TRACE_MODE_READ, d_events, 3665 + eventfs_add_events_file("header_event", TRACE_MODE_READ, d_events, 3668 3666 ring_buffer_print_entry_header, 3669 3667 &ftrace_show_header_fops); 3670 3668 ··· 3752 3750 3753 3751 down_write(&trace_event_sem); 3754 3752 __trace_remove_event_dirs(tr); 3755 - tracefs_remove(tr->event_dir); 3753 + eventfs_remove_events_dir(tr->event_dir); 3756 3754 up_write(&trace_event_sem); 3757 3755 3758 3756 tr->event_dir = NULL;
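One behavioral change worth noting in this conversion: tracefs_create_dir() reported failure as NULL, while eventfs_add_subsystem_dir()/eventfs_add_dir() report it as an encoded error pointer — which is why the checks in this file become `IS_ERR(dir->ef)` and `IS_ERR(file->ef)`. The convention can be sketched as a userspace model of the `<linux/err.h>` helpers:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space, which no
 * valid kernel pointer occupies. */
static void *ERR_PTR(long error)
{
	return (void *)error;
}

static bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

Note that NULL is not an ERR_PTR value, so callers migrating from the old API must switch their checks, not just add one.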
kernel/trace/trace_events_filter.c (+275, -27)
··· 46 46 enum filter_pred_fn { 47 47 FILTER_PRED_FN_NOP, 48 48 FILTER_PRED_FN_64, 49 + FILTER_PRED_FN_64_CPUMASK, 49 50 FILTER_PRED_FN_S64, 50 51 FILTER_PRED_FN_U64, 51 52 FILTER_PRED_FN_32, 53 + FILTER_PRED_FN_32_CPUMASK, 52 54 FILTER_PRED_FN_S32, 53 55 FILTER_PRED_FN_U32, 54 56 FILTER_PRED_FN_16, 57 + FILTER_PRED_FN_16_CPUMASK, 55 58 FILTER_PRED_FN_S16, 56 59 FILTER_PRED_FN_U16, 57 60 FILTER_PRED_FN_8, 61 + FILTER_PRED_FN_8_CPUMASK, 58 62 FILTER_PRED_FN_S8, 59 63 FILTER_PRED_FN_U8, 60 64 FILTER_PRED_FN_COMM, ··· 68 64 FILTER_PRED_FN_PCHAR_USER, 69 65 FILTER_PRED_FN_PCHAR, 70 66 FILTER_PRED_FN_CPU, 67 + FILTER_PRED_FN_CPU_CPUMASK, 68 + FILTER_PRED_FN_CPUMASK, 69 + FILTER_PRED_FN_CPUMASK_CPU, 71 70 FILTER_PRED_FN_FUNCTION, 72 71 FILTER_PRED_FN_, 73 72 FILTER_PRED_TEST_VISITED, 74 73 }; 75 74 76 75 struct filter_pred { 77 - enum filter_pred_fn fn_num; 78 - u64 val; 79 - u64 val2; 80 - struct regex regex; 76 + struct regex *regex; 77 + struct cpumask *mask; 81 78 unsigned short *ops; 82 79 struct ftrace_event_field *field; 83 - int offset; 80 + u64 val; 81 + u64 val2; 82 + enum filter_pred_fn fn_num; 83 + int offset; 84 84 int not; 85 - int op; 85 + int op; 86 86 }; 87 87 88 88 /* ··· 102 94 C(TOO_MANY_OPEN, "Too many '('"), \ 103 95 C(TOO_MANY_CLOSE, "Too few '('"), \ 104 96 C(MISSING_QUOTE, "Missing matching quote"), \ 97 + C(MISSING_BRACE_OPEN, "Missing '{'"), \ 98 + C(MISSING_BRACE_CLOSE, "Missing '}'"), \ 105 99 C(OPERAND_TOO_LONG, "Operand too long"), \ 106 100 C(EXPECT_STRING, "Expecting string field"), \ 107 101 C(EXPECT_DIGIT, "Expecting numeric field"), \ ··· 113 103 C(BAD_SUBSYS_FILTER, "Couldn't find or set field in one of a subsystem's events"), \ 114 104 C(TOO_MANY_PREDS, "Too many terms in predicate expression"), \ 115 105 C(INVALID_FILTER, "Meaningless filter expression"), \ 106 + C(INVALID_CPULIST, "Invalid cpulist"), \ 116 107 C(IP_FIELD_ONLY, "Only 'ip' field is supported for function trace"), \ 117 108 C(INVALID_VALUE, "Invalid value (did you 
 forget quotes)?"), \
 	C(NO_FUNCTION,		"Function not found"), \
···
 	PROCESS_AND	= 2,
 	PROCESS_OR	= 4,
 };
+
+static void free_predicate(struct filter_pred *pred)
+{
+	if (pred) {
+		kfree(pred->regex);
+		kfree(pred->mask);
+		kfree(pred);
+	}
+}

 /*
  * Without going into a formal proof, this explains the method that is used in
···
 	kfree(inverts);
 	if (prog_stack) {
 		for (i = 0; prog_stack[i].pred; i++)
-			kfree(prog_stack[i].pred);
+			free_predicate(prog_stack[i].pred);
 		kfree(prog_stack);
 	}
 	return ERR_PTR(ret);
+}
+
+static inline int
+do_filter_cpumask(int op, const struct cpumask *mask, const struct cpumask *cmp)
+{
+	switch (op) {
+	case OP_EQ:
+		return cpumask_equal(mask, cmp);
+	case OP_NE:
+		return !cpumask_equal(mask, cmp);
+	case OP_BAND:
+		return cpumask_intersects(mask, cmp);
+	default:
+		return 0;
+	}
+}
+
+/* Optimisation of do_filter_cpumask() for scalar fields */
+static inline int
+do_filter_scalar_cpumask(int op, unsigned int cpu, const struct cpumask *mask)
+{
+	/*
+	 * Per the weight-of-one cpumask optimisations, the mask passed in this
+	 * function has a weight >= 2, so it is never equal to a single scalar.
+	 */
+	switch (op) {
+	case OP_EQ:
+		return false;
+	case OP_NE:
+		return true;
+	case OP_BAND:
+		return cpumask_test_cpu(cpu, mask);
+	default:
+		return 0;
+	}
+}
+
+static inline int
+do_filter_cpumask_scalar(int op, const struct cpumask *mask, unsigned int cpu)
+{
+	switch (op) {
+	case OP_EQ:
+		return cpumask_test_cpu(cpu, mask) &&
+		       cpumask_nth(1, mask) >= nr_cpu_ids;
+	case OP_NE:
+		return !cpumask_test_cpu(cpu, mask) ||
+		       cpumask_nth(1, mask) < nr_cpu_ids;
+	case OP_BAND:
+		return cpumask_test_cpu(cpu, mask);
+	default:
+		return 0;
+	}
 }

 enum pred_cmp_types {
···
 	}								\
 }

+#define DEFINE_CPUMASK_COMPARISON_PRED(size)				\
+static int filter_pred_##size##_cpumask(struct filter_pred *pred, void *event) \
+{									\
+	u##size *addr = (u##size *)(event + pred->offset);		\
+	unsigned int cpu = *addr;					\
+									\
+	if (cpu >= nr_cpu_ids)						\
+		return 0;						\
+									\
+	return do_filter_scalar_cpumask(pred->op, cpu, pred->mask);	\
+}
+
 #define DEFINE_EQUALITY_PRED(size)					\
 static int filter_pred_##size(struct filter_pred *pred, void *event)	\
 {									\
···
 DEFINE_COMPARISON_PRED(u16);
 DEFINE_COMPARISON_PRED(s8);
 DEFINE_COMPARISON_PRED(u8);
+
+DEFINE_CPUMASK_COMPARISON_PRED(64);
+DEFINE_CPUMASK_COMPARISON_PRED(32);
+DEFINE_CPUMASK_COMPARISON_PRED(16);
+DEFINE_CPUMASK_COMPARISON_PRED(8);

 DEFINE_EQUALITY_PRED(64);
 DEFINE_EQUALITY_PRED(32);
···
 	char *addr = (char *)(event + pred->offset);
 	int cmp, match;

-	cmp = pred->regex.match(addr, &pred->regex, pred->regex.field_len);
+	cmp = pred->regex->match(addr, pred->regex, pred->regex->field_len);

 	match = cmp ^ pred->not;
···
 	int len;

 	len = strlen(str) + 1;	/* including tailing '\0' */
-	cmp = pred->regex.match(str, &pred->regex, len);
+	cmp = pred->regex->match(str, pred->regex, len);

 	match = cmp ^ pred->not;
···
 	char *addr = (char *)(event + str_loc);
 	int cmp, match;

-	cmp = pred->regex.match(addr, &pred->regex, str_len);
+	cmp = pred->regex->match(addr, pred->regex, str_len);

 	match = cmp ^ pred->not;
···
 	char *addr = (char *)(&item[1]) + str_loc;
 	int cmp, match;

-	cmp = pred->regex.match(addr, &pred->regex, str_len);
+	cmp = pred->regex->match(addr, pred->regex, str_len);

 	match = cmp ^ pred->not;
···
 	}
 }

+/* Filter predicate for current CPU vs user-provided cpumask */
+static int filter_pred_cpu_cpumask(struct filter_pred *pred, void *event)
+{
+	int cpu = raw_smp_processor_id();
+
+	return do_filter_scalar_cpumask(pred->op, cpu, pred->mask);
+}
+
+/* Filter predicate for cpumask field vs user-provided cpumask */
+static int filter_pred_cpumask(struct filter_pred *pred, void *event)
+{
+	u32 item = *(u32 *)(event + pred->offset);
+	int loc = item & 0xffff;
+	const struct cpumask *mask = (event + loc);
+	const struct cpumask *cmp = pred->mask;
+
+	return do_filter_cpumask(pred->op, mask, cmp);
+}
+
+/* Filter predicate for cpumask field vs user-provided scalar */
+static int filter_pred_cpumask_cpu(struct filter_pred *pred, void *event)
+{
+	u32 item = *(u32 *)(event + pred->offset);
+	int loc = item & 0xffff;
+	const struct cpumask *mask = (event + loc);
+	unsigned int cpu = pred->val;
+
+	return do_filter_cpumask_scalar(pred->op, mask, cpu);
+}
+
 /* Filter predicate for COMM. */
 static int filter_pred_comm(struct filter_pred *pred, void *event)
 {
 	int cmp;

-	cmp = pred->regex.match(current->comm, &pred->regex,
+	cmp = pred->regex->match(current->comm, pred->regex,
 				TASK_COMM_LEN);
 	return cmp ^ pred->not;
 }
···
 static void filter_build_regex(struct filter_pred *pred)
 {
-	struct regex *r = &pred->regex;
+	struct regex *r = pred->regex;
 	char *search;
 	enum regex_type type = MATCH_FULL;
···
 		return;

 	for (i = 0; prog[i].pred; i++)
-		kfree(prog[i].pred);
+		free_predicate(prog[i].pred);
 	kfree(prog);
 }
···
 int filter_assign_type(const char *type)
 {
-	if (strstr(type, "__data_loc") && strstr(type, "char"))
-		return FILTER_DYN_STRING;
+	if (strstr(type, "__data_loc")) {
+		if (strstr(type, "char"))
+			return FILTER_DYN_STRING;
+		if (strstr(type, "cpumask_t"))
+			return FILTER_CPUMASK;
+	}

 	if (strstr(type, "__rel_loc") && strstr(type, "char"))
 		return FILTER_RDYN_STRING;
···
 	switch (pred->fn_num) {
 	case FILTER_PRED_FN_64:
 		return filter_pred_64(pred, event);
+	case FILTER_PRED_FN_64_CPUMASK:
+		return filter_pred_64_cpumask(pred, event);
 	case FILTER_PRED_FN_S64:
 		return filter_pred_s64(pred, event);
 	case FILTER_PRED_FN_U64:
 		return filter_pred_u64(pred, event);
 	case FILTER_PRED_FN_32:
 		return filter_pred_32(pred, event);
+	case FILTER_PRED_FN_32_CPUMASK:
+		return filter_pred_32_cpumask(pred, event);
 	case FILTER_PRED_FN_S32:
 		return filter_pred_s32(pred, event);
 	case FILTER_PRED_FN_U32:
 		return filter_pred_u32(pred, event);
 	case FILTER_PRED_FN_16:
 		return filter_pred_16(pred, event);
+	case FILTER_PRED_FN_16_CPUMASK:
+		return filter_pred_16_cpumask(pred, event);
 	case FILTER_PRED_FN_S16:
 		return filter_pred_s16(pred, event);
 	case FILTER_PRED_FN_U16:
 		return filter_pred_u16(pred, event);
 	case FILTER_PRED_FN_8:
 		return filter_pred_8(pred, event);
+	case FILTER_PRED_FN_8_CPUMASK:
+		return filter_pred_8_cpumask(pred, event);
 	case FILTER_PRED_FN_S8:
 		return filter_pred_s8(pred, event);
 	case FILTER_PRED_FN_U8:
···
 		return filter_pred_pchar(pred, event);
 	case FILTER_PRED_FN_CPU:
 		return filter_pred_cpu(pred, event);
+	case FILTER_PRED_FN_CPU_CPUMASK:
+		return filter_pred_cpu_cpumask(pred, event);
+	case FILTER_PRED_FN_CPUMASK:
+		return filter_pred_cpumask(pred, event);
+	case FILTER_PRED_FN_CPUMASK_CPU:
+		return filter_pred_cpumask_cpu(pred, event);
 	case FILTER_PRED_FN_FUNCTION:
 		return filter_pred_function(pred, event);
 	case FILTER_PRED_TEST_VISITED:
···
 			goto err_free;
 		}

-		pred->regex.len = len;
-		strncpy(pred->regex.pattern, str + s, len);
-		pred->regex.pattern[len] = 0;
+		pred->regex = kzalloc(sizeof(*pred->regex), GFP_KERNEL);
+		if (!pred->regex)
+			goto err_mem;
+		pred->regex->len = len;
+		strncpy(pred->regex->pattern, str + s, len);
+		pred->regex->pattern[len] = 0;
+
+	} else if (!strncmp(str + i, "CPUS", 4)) {
+		unsigned int maskstart;
+		bool single;
+		char *tmp;
+
+		switch (field->filter_type) {
+		case FILTER_CPUMASK:
+		case FILTER_CPU:
+		case FILTER_OTHER:
+			break;
+		default:
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
+			goto err_free;
+		}
+
+		switch (op) {
+		case OP_EQ:
+		case OP_NE:
+		case OP_BAND:
+			break;
+		default:
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
+			goto err_free;
+		}
+
+		/* Skip CPUS */
+		i += 4;
+		if (str[i++] != '{') {
+			parse_error(pe, FILT_ERR_MISSING_BRACE_OPEN, pos + i);
+			goto err_free;
+		}
+		maskstart = i;
+
+		/* Walk the cpulist until closing } */
+		for (; str[i] && str[i] != '}'; i++);
+		if (str[i] != '}') {
+			parse_error(pe, FILT_ERR_MISSING_BRACE_CLOSE, pos + i);
+			goto err_free;
+		}
+
+		if (maskstart == i) {
+			parse_error(pe, FILT_ERR_INVALID_CPULIST, pos + i);
+			goto err_free;
+		}
+
+		/* Copy the cpulist between { and } */
+		tmp = kmalloc((i - maskstart) + 1, GFP_KERNEL);
+		strscpy(tmp, str + maskstart, (i - maskstart) + 1);
+
+		pred->mask = kzalloc(cpumask_size(), GFP_KERNEL);
+		if (!pred->mask)
+			goto err_mem;
+
+		/* Now parse it */
+		if (cpulist_parse(tmp, pred->mask)) {
+			parse_error(pe, FILT_ERR_INVALID_CPULIST, pos + i);
+			goto err_free;
+		}
+
+		/* Move along */
+		i++;
+
+		/*
+		 * Optimisation: if the user-provided mask has a weight of one
+		 * then we can treat it as a scalar input.
+		 */
+		single = cpumask_weight(pred->mask) == 1;
+		if (single) {
+			pred->val = cpumask_first(pred->mask);
+			kfree(pred->mask);
+		}
+
+		if (field->filter_type == FILTER_CPUMASK) {
+			pred->fn_num = single ?
+				FILTER_PRED_FN_CPUMASK_CPU :
+				FILTER_PRED_FN_CPUMASK;
+		} else if (field->filter_type == FILTER_CPU) {
+			if (single) {
+				pred->op = pred->op == OP_BAND ? OP_EQ : pred->op;
+				pred->fn_num = FILTER_PRED_FN_CPU;
+			} else {
+				pred->fn_num = FILTER_PRED_FN_CPU_CPUMASK;
+			}
+		} else if (single) {
+			pred->op = pred->op == OP_BAND ? OP_EQ : pred->op;
+			pred->fn_num = select_comparison_fn(pred->op, field->size, false);
+			if (pred->op == OP_NE)
+				pred->not = 1;
+		} else {
+			switch (field->size) {
+			case 8:
+				pred->fn_num = FILTER_PRED_FN_64_CPUMASK;
+				break;
+			case 4:
+				pred->fn_num = FILTER_PRED_FN_32_CPUMASK;
+				break;
+			case 2:
+				pred->fn_num = FILTER_PRED_FN_16_CPUMASK;
+				break;
+			case 1:
+				pred->fn_num = FILTER_PRED_FN_8_CPUMASK;
+				break;
+			}
+		}

 	/* This is either a string, or an integer */
 	} else if (str[i] == '\'' || str[i] == '"') {
···
 			goto err_free;
 		}

-		pred->regex.len = len;
-		strncpy(pred->regex.pattern, str + s, len);
-		pred->regex.pattern[len] = 0;
+		pred->regex = kzalloc(sizeof(*pred->regex), GFP_KERNEL);
+		if (!pred->regex)
+			goto err_mem;
+		pred->regex->len = len;
+		strncpy(pred->regex->pattern, str + s, len);
+		pred->regex->pattern[len] = 0;

 		filter_build_regex(pred);
···
 	} else if (field->filter_type == FILTER_STATIC_STRING) {
 		pred->fn_num = FILTER_PRED_FN_STRING;
-		pred->regex.field_len = field->size;
+		pred->regex->field_len = field->size;

 	} else if (field->filter_type == FILTER_DYN_STRING) {
 		pred->fn_num = FILTER_PRED_FN_STRLOC;
···
 	return i;

 err_free:
-	kfree(pred);
+	free_predicate(pred);
 	return -EINVAL;
 err_mem:
-	kfree(pred);
+	free_predicate(pred);
 	return -ENOMEM;
 }
···
 		return ret;

 	return __ftrace_function_set_filter(pred->op == OP_EQ,
-					    pred->regex.pattern,
-					    pred->regex.len,
+					    pred->regex->pattern,
+					    pred->regex->len,
 					    data);
 }
+8 -7
kernel/trace/trace_events_user.c
···
 static int user_event_set_print_fmt(struct user_event *user, char *buf, int len)
 {
-	struct ftrace_event_field *field, *next;
+	struct ftrace_event_field *field;
 	struct list_head *head = &user->fields;
 	int pos = 0, depth = 0;
 	const char *str_func;

 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");

-	list_for_each_entry_safe_reverse(field, next, head, link) {
+	list_for_each_entry_reverse(field, head, link) {
 		if (depth != 0)
 			pos += snprintf(buf + pos, LEN_OR_ZERO, " ");
···
 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");

-	list_for_each_entry_safe_reverse(field, next, head, link) {
+	list_for_each_entry_reverse(field, head, link) {
 		if (user_field_is_dyn_string(field->type, &str_func))
 			pos += snprintf(buf + pos, LEN_OR_ZERO,
 					", %s(%s)", str_func, field->name);
···
 static int user_event_show(struct seq_file *m, struct dyn_event *ev)
 {
 	struct user_event *user = container_of(ev, struct user_event, devent);
-	struct ftrace_event_field *field, *next;
+	struct ftrace_event_field *field;
 	struct list_head *head;
 	int depth = 0;
···
 	head = trace_get_fields(&user->call);

-	list_for_each_entry_safe_reverse(field, next, head, link) {
+	list_for_each_entry_reverse(field, head, link) {
 		if (depth == 0)
 			seq_puts(m, " ");
 		else
···
 static bool user_fields_match(struct user_event *user, int argc,
 			      const char **argv)
 {
-	struct ftrace_event_field *field, *next;
+	struct ftrace_event_field *field;
 	struct list_head *head = &user->fields;
 	int i = 0;

-	list_for_each_entry_safe_reverse(field, next, head, link)
-		if (!user_field_match(field, argc, argv, &i))
-			return false;
+	list_for_each_entry_reverse(field, head, link) {
+		if (!user_field_match(field, argc, argv, &i))
+			return false;
+	}

 	if (i != argc)
 		return false;
+9
kernel/trace/trace_export.c
···
 #undef __array
 #define __array(type, item, size)	type item[size];

+#undef __stack_array
+#define __stack_array(type, item, size, field)	__array(type, item, size)
+
 #undef __array_desc
 #define __array_desc(type, container, item, size)	type item[size];
···
 			 is_signed_type(_type), .filter_type = FILTER_OTHER, \
 			 .len = _len },

+#undef __stack_array
+#define __stack_array(_type, _item, _len, _field) __array(_type, _item, _len)
+
 #undef __array_desc
 #define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)
···
 #undef __array
 #define __array(type, item, len)

+#undef __stack_array
+#define __stack_array(type, item, len, field)
+
 #undef __array_desc
 #define __array_desc(type, container, item, len)
+7 -2
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
···
 esac

 : "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
+if grep -q eventfs_add_dir available_filter_functions; then
+  DIR_NAME="eventfs_add_dir"
+else
+  DIR_NAME="tracefs_create_dir"
+fi
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1='t'" trace

 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
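The pattern the tests adopt — probe the eventfs function when the kernel has it, fall back to the old tracefs one otherwise — can be exercised outside tracefs by pointing the same grep at a stand-in file. A sketch (the temp file replaces available_filter_functions; the function names mirror the test, the script itself is illustrative):

```shell
#!/bin/sh
# Pick the first probe target present in a functions list.
flist=$(mktemp)
printf '%s\n' tracefs_create_dir some_other_func > "$flist"

if grep -q eventfs_add_dir "$flist"; then
  DIR_NAME="eventfs_add_dir"
else
  DIR_NAME="tracefs_create_dir"
fi

echo "$DIR_NAME"   # prints "tracefs_create_dir" for this list
rm -f "$flist"
```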
+7 -2
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
···
 esac

 : "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
+if grep -q eventfs_add_dir available_filter_functions; then
+  DIR_NAME="eventfs_add_dir"
+else
+  DIR_NAME="tracefs_create_dir"
+fi
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1=\"test\"" trace

 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace