Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v5.14-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Fix tracepoint race between static_call and callback data

As callbacks to a tracepoint are paired with the data that is passed
in when the callback is registered to the tracepoint, it must have
that data passed to the callback when the tracepoint is triggered,
else bad things will happen. To keep the two together, they are both
assigned to a tracepoint structure and added to an array. The
tracepoint call site will dereference the structure (via RCU) and call
the callback in that structure along with the data in that structure.
This keeps the callback and data tightly coupled.

Because of the overhead that retpolines have on tracepoint callbacks,
if there's only one callback attached to a tracepoint (a common case),
then it is called via a static call (code modified to do a direct call
instead of an indirect call). But to implement this, the data had to
be decoupled from the callback, as now the callback is implemented via
a direct call from the static call and not an indirect call from the
dereferenced structure.

Note, the static call calls the callback directly only when a single
callback is attached to the tracepoint. If more than one callback is
attached to the same tracepoint, the static call instead calls an
iterator function that goes back to dereferencing the structure,
keeping the callback and its data tightly coupled again.

Issues can arise when going from 0 callbacks to 1, as the static call
is assigned directly to the callback, and care must be taken that the
data passed to it is loaded before the static call invokes the
callback. Going from 1 to 2 callbacks is not an issue, as long as the
static call is updated to the iterator before the tracepoint structure
array is updated via RCU. Going from 2 to more, or back down to 2, is
not an issue, as the iterator can handle all these cases. But when
going from 2 to 1, care must be taken, as the static call is now
calling a callback directly and the data that is loaded must be the
data for that callback.

Care was taken to ensure the callback and data would stay in sync, but
after a bug was reported, it became clear that not enough had been done
to make sure that was the case. These changes address this.

The first change is to compare the old and new data instead of the old
and new callback, as it's the data that can corrupt the callback, even
if the callback is the same (something getting freed).

The next change converts these transitions into states, to make it
easier to know when a synchronization is needed, and to perform those
synchronizations. The drawback of this patch is that it slows down
disabling all events from under a second to over 10 seconds for the
same work. But that is addressed in the final patch.

The final patch uses the RCU state functions to keep track of the RCU
state between the transitions, and only performs the synchronization if
an RCU synchronization hasn't already been done. This brings the
performance of disabling all events back to its original value. That's
because no synchronization is required between disabling tracepoints,
but one is required when enabling a tracepoint after it has been
disabled. If an RCU synchronization happens after the tracepoint is
disabled and before it is re-enabled, there's no need to do the
synchronization again.

Both the second and third patches have subtle complexities, which is
why they are separated into two patches. But because the second patch
causes such a regression in performance, the third patch adds a "Fixes"
tag for the second, so that the two must be backported together and not
just the second patch"

* tag 'trace-v5.14-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracepoint: Use rcu get state and cond sync for static call updates
tracepoint: Fix static call function vs data state mismatch
tracepoint: static call: Compare data on transition from 2->1 callees

+135 -20
kernel/tracepoint.c
···
 #include <linux/sched/task.h>
 #include <linux/static_key.h>
 
+enum tp_func_state {
+	TP_FUNC_0,
+	TP_FUNC_1,
+	TP_FUNC_2,
+	TP_FUNC_N,
+};
+
 extern tracepoint_ptr_t __start___tracepoints_ptrs[];
 extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
 
 DEFINE_SRCU(tracepoint_srcu);
 EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
+enum tp_transition_sync {
+	TP_TRANSITION_SYNC_1_0_1,
+	TP_TRANSITION_SYNC_N_2_1,
+
+	_NR_TP_TRANSITION_SYNC,
+};
+
+struct tp_transition_snapshot {
+	unsigned long rcu;
+	unsigned long srcu;
+	bool ongoing;
+};
+
+/* Protected by tracepoints_mutex */
+static struct tp_transition_snapshot tp_transition_snapshot[_NR_TP_TRANSITION_SYNC];
+
+static void tp_rcu_get_state(enum tp_transition_sync sync)
+{
+	struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync];
+
+	/* Keep the latest get_state snapshot. */
+	snapshot->rcu = get_state_synchronize_rcu();
+	snapshot->srcu = start_poll_synchronize_srcu(&tracepoint_srcu);
+	snapshot->ongoing = true;
+}
+
+static void tp_rcu_cond_sync(enum tp_transition_sync sync)
+{
+	struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync];
+
+	if (!snapshot->ongoing)
+		return;
+	cond_synchronize_rcu(snapshot->rcu);
+	if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu))
+		synchronize_srcu(&tracepoint_srcu);
+	snapshot->ongoing = false;
+}
 
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
···
 	return old;
 }
 
-static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func *tp_funcs, bool sync)
+/*
+ * Count the number of functions (enum tp_func_state) in a tp_funcs array.
+ */
+static enum tp_func_state nr_func_state(const struct tracepoint_func *tp_funcs)
+{
+	if (!tp_funcs)
+		return TP_FUNC_0;
+	if (!tp_funcs[1].func)
+		return TP_FUNC_1;
+	if (!tp_funcs[2].func)
+		return TP_FUNC_2;
+	return TP_FUNC_N;	/* 3 or more */
+}
+
+static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func *tp_funcs)
 {
 	void *func = tp->iterator;
 
 	/* Synthetic events do not have static call sites */
 	if (!tp->static_call_key)
 		return;
-
-	if (!tp_funcs[1].func) {
+	if (nr_func_state(tp_funcs) == TP_FUNC_1)
 		func = tp_funcs[0].func;
-		/*
-		 * If going from the iterator back to a single caller,
-		 * we need to synchronize with __DO_TRACE to make sure
-		 * that the data passed to the callback is the one that
-		 * belongs to that callback.
-		 */
-		if (sync)
-			tracepoint_synchronize_unregister();
-	}
-
 	__static_call_update(tp->static_call_key, tp->static_call_tramp, func);
 }
···
 	 * a pointer to it. This array is referenced by __DO_TRACE from
 	 * include/linux/tracepoint.h using rcu_dereference_sched().
 	 */
-	tracepoint_update_call(tp, tp_funcs, false);
-	rcu_assign_pointer(tp->funcs, tp_funcs);
-	static_key_enable(&tp->key);
+	switch (nr_func_state(tp_funcs)) {
+	case TP_FUNC_1:		/* 0->1 */
+		/*
+		 * Make sure new static func never uses old data after a
+		 * 1->0->1 transition sequence.
+		 */
+		tp_rcu_cond_sync(TP_TRANSITION_SYNC_1_0_1);
+		/* Set static call to first function */
+		tracepoint_update_call(tp, tp_funcs);
+		/* Both iterator and static call handle NULL tp->funcs */
+		rcu_assign_pointer(tp->funcs, tp_funcs);
+		static_key_enable(&tp->key);
+		break;
+	case TP_FUNC_2:		/* 1->2 */
+		/* Set iterator static call */
+		tracepoint_update_call(tp, tp_funcs);
+		/*
+		 * Iterator callback installed before updating tp->funcs.
+		 * Requires ordering between RCU assign/dereference and
+		 * static call update/call.
+		 */
+		fallthrough;
+	case TP_FUNC_N:		/* N->N+1 (N>1) */
+		rcu_assign_pointer(tp->funcs, tp_funcs);
+		/*
+		 * Make sure static func never uses incorrect data after a
+		 * N->...->2->1 (N>1) transition sequence.
+		 */
+		if (tp_funcs[0].data != old[0].data)
+			tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
 
 	release_probes(old);
 	return 0;
···
 		/* Failed allocating new tp_funcs, replaced func with stub */
 		return 0;
 
-	if (!tp_funcs) {
+	switch (nr_func_state(tp_funcs)) {
+	case TP_FUNC_0:		/* 1->0 */
 		/* Removed last function */
 		if (tp->unregfunc && static_key_enabled(&tp->key))
 			tp->unregfunc();
 
 		static_key_disable(&tp->key);
+		/* Set iterator static call */
+		tracepoint_update_call(tp, tp_funcs);
+		/* Both iterator and static call handle NULL tp->funcs */
+		rcu_assign_pointer(tp->funcs, NULL);
+		/*
+		 * Make sure new static func never uses old data after a
+		 * 1->0->1 transition sequence.
+		 */
+		tp_rcu_get_state(TP_TRANSITION_SYNC_1_0_1);
+		break;
+	case TP_FUNC_1:		/* 2->1 */
 		rcu_assign_pointer(tp->funcs, tp_funcs);
-	} else {
+		/*
+		 * Make sure static func never uses incorrect data after a
+		 * N->...->2->1 (N>2) transition sequence. If the first
+		 * element's data has changed, then force the synchronization
+		 * to prevent current readers that have loaded the old data
+		 * from calling the new function.
+		 */
+		if (tp_funcs[0].data != old[0].data)
+			tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1);
+		tp_rcu_cond_sync(TP_TRANSITION_SYNC_N_2_1);
+		/* Set static call to first function */
+		tracepoint_update_call(tp, tp_funcs);
+		break;
+	case TP_FUNC_2:		/* N->N-1 (N>2) */
+		fallthrough;
+	case TP_FUNC_N:
 		rcu_assign_pointer(tp->funcs, tp_funcs);
-		tracepoint_update_call(tp, tp_funcs,
-				       tp_funcs[0].func != old[0].func);
+		/*
+		 * Make sure static func never uses incorrect data after a
+		 * N->...->2->1 (N>2) transition sequence.
+		 */
+		if (tp_funcs[0].data != old[0].data)
+			tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
 	}
 	release_probes(old);
 	return 0;