perf: Optimize perf_pmu_migrate_context()

Thomas reported that offlining CPUs spends a lot of time in
synchronize_rcu() as called from perf_pmu_migrate_context() even though
he's not actually using uncore events.

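It turns out that perf_pmu_migrate_context() waits for an RCU grace
period unconditionally, even when there are no events to migrate.

Skip synchronize_rcu() and the subsequent re-install when the list of
removed events is empty.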

Fixes: 0cda4c023132 ("perf: Introduce perf_pmu_migrate_context()")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20230403090858.GT4253@hirez.programming.kicks-ass.net

Changed files (+7 -5)

kernel/events/core.c
@@ -12893,12 +12893,14 @@
 	__perf_pmu_remove(src_ctx, src_cpu, pmu, &src_ctx->pinned_groups, &events);
 	__perf_pmu_remove(src_ctx, src_cpu, pmu, &src_ctx->flexible_groups, &events);
 
-	/*
-	 * Wait for the events to quiesce before re-instating them.
-	 */
-	synchronize_rcu();
+	if (!list_empty(&events)) {
+		/*
+		 * Wait for the events to quiesce before re-instating them.
+		 */
+		synchronize_rcu();
 
-	__perf_pmu_install(dst_ctx, dst_cpu, pmu, &events);
+		__perf_pmu_install(dst_ctx, dst_cpu, pmu, &events);
+	}
 
 	mutex_unlock(&dst_ctx->mutex);
 	mutex_unlock(&src_ctx->mutex);
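
The effect of the guard can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: struct list_node, move_all(), fake_synchronize_rcu() and migrate_context() are hypothetical stand-ins invented for this illustration, and the "grace period" is only a counter so that the number of waits can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's circular doubly linked list. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = node;
}

/* Counters that let a caller observe what the sketch did. */
static unsigned int grace_period_waits;
static unsigned int installed_events;

/* Assumption: models the cost of synchronize_rcu() as a simple counter. */
static void fake_synchronize_rcu(void)
{
	grace_period_waits++;
}

/* Move every node from one list to the tail of another; return the count. */
static unsigned int move_all(struct list_node *from, struct list_node *to)
{
	unsigned int moved = 0;

	while (!list_is_empty(from)) {
		struct list_node *node = from->next;

		list_del_node(node);
		list_add_tail_node(node, to);
		moved++;
	}
	return moved;
}

/* Same shape as the patched function: wait only if something was removed. */
static void migrate_context(struct list_node *src, struct list_node *dst)
{
	struct list_node events;

	list_init(&events);
	move_all(src, &events);			/* "remove" step */

	if (!list_is_empty(&events)) {
		fake_synchronize_rcu();		/* wait for quiescence */
		installed_events += move_all(&events, dst); /* "install" step */
	}
}
```

Calling migrate_context() on an empty source list leaves grace_period_waits at zero, while a source list holding at least one node costs exactly one wait per call. That mirrors the report above: with no uncore events in use, the CPU offline path no longer pays for a grace period.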