Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf/x86/intel: Fix PMI handling for Intel PT

Intel PT is a separate PMU and does not use any of the x86_pmu
code paths, which means in particular that the active_events counter
is left untouched when new PT events are created.

However, PT uses the generic x86_pmu PMI handler for its PMI handling needs.

The problem here is that the latter checks active_events and, if it
is zero, exits without calling the actual x86_pmu.handle_irq(), which
results in unknown NMI errors and massive data loss for PT.
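The failing check can be modeled in a few lines of userspace C. This is an illustrative sketch, not kernel code: the NMI_DONE/NMI_HANDLED values and the two functions below are hypothetical stand-ins for the shared PMI handler and the x86_pmu event-init path.

```c
#include <stdatomic.h>

/*
 * Illustrative userspace model of the bug: the shared PMI handler
 * exits early when active_events is zero, so a PT event that never
 * bumped the counter gets its PMI dropped as an unknown NMI.
 */
#define NMI_DONE    0   /* handler claims the NMI is not ours */
#define NMI_HANDLED 1

static atomic_int active_events;

static int shared_pmi_handler(void)
{
        /* The early exit that drops PT's PMIs. */
        if (atomic_load(&active_events) == 0)
                return NMI_DONE;
        return NMI_HANDLED;     /* would go on to do the real work */
}

/* Only x86_pmu events took this path; PT events never did. */
static void x86_pmu_style_event_init(void)
{
        atomic_fetch_add(&active_events, 1);
}
```

With only a PT-style event (which never increments the counter), shared_pmi_handler() returns NMI_DONE and the interrupt is lost; creating any x86_pmu event first masks the bug, which is why a running NMI watchdog hides it.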

The effect is not visible if other perf events in the system keep the
active_events counter non-zero at the same time, for instance if the
NMI watchdog is running, so one needs to disable the watchdog to
reproduce the problem.

At the same time, besides doing what its name suggests, the
active_events counter also implicitly serves as a reference counter for
the PMC hardware and the DS area.

This patch adds a separate reference counter for the PMC hardware,
leaving active_events to actually count events, and makes sure that PT
and BTS events are counted as well.
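The fixed scheme can be sketched the same way. Again a hedged userspace model with made-up helper names, where inc_not_zero() mimics the kernel's atomic_inc_not_zero(): pmc_refcount now owns the hardware reservation, while active_events only answers whether the PMI handler has work, and every event type, PT and BTS included, increments it.

```c
#include <stdatomic.h>
#include <pthread.h>

/*
 * Illustrative userspace model of the fix, not kernel code:
 * pmc_refcount guards PMC hardware reservation; active_events only
 * gates the shared PMI handler, and PT/BTS creation now bumps it too.
 */
static atomic_int pmc_refcount;
static atomic_int active_events;
static pthread_mutex_t pmc_reserve_mutex = PTHREAD_MUTEX_INITIALIZER;
static int hw_reserved;         /* stands in for reserve_pmc_hardware() */

/* Model of the kernel's atomic_inc_not_zero(): increment unless zero. */
static int inc_not_zero(atomic_int *v)
{
        int old = atomic_load(v);

        while (old != 0)
                if (atomic_compare_exchange_weak(v, &old, old + 1))
                        return 1;
        return 0;
}

static void reserve_hardware_model(void)
{
        if (!inc_not_zero(&pmc_refcount)) {
                pthread_mutex_lock(&pmc_reserve_mutex);
                if (atomic_load(&pmc_refcount) == 0)
                        hw_reserved = 1;        /* first user grabs the PMC */
                atomic_fetch_add(&pmc_refcount, 1);
                pthread_mutex_unlock(&pmc_reserve_mutex);
        }
}

/* Every event - x86_pmu, PT or BTS alike - now also counts itself. */
static void event_init_model(void)
{
        reserve_hardware_model();
        atomic_fetch_add(&active_events, 1);
}

static int pmi_handler_has_work(void)
{
        return atomic_load(&active_events) != 0;
}
```

The fast path (inc_not_zero succeeding) never takes the mutex; only the first and last users fall into the locked reserve/release section, which mirrors the atomic_inc_not_zero()/atomic_dec_and_mutex_lock() pairing in the patch.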

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Link: http://lkml.kernel.org/r/87k2v92t0s.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Authored by Alexander Shishkin and committed by Ingo Molnar
1b7b938f 6b099d9b

+23 -4
arch/x86/kernel/cpu/perf_event.c
···
 }
 
 static atomic_t active_events;
+static atomic_t pmc_refcount;
 static DEFINE_MUTEX(pmc_reserve_mutex);
 
 #ifdef CONFIG_X86_LOCAL_APIC
···
 static void hw_perf_event_destroy(struct perf_event *event)
 {
 	x86_release_hardware();
+	atomic_dec(&active_events);
 }
 
 void hw_perf_lbr_event_destroy(struct perf_event *event)
···
 {
 	int err = 0;
 
-	if (!atomic_inc_not_zero(&active_events)) {
+	if (!atomic_inc_not_zero(&pmc_refcount)) {
 		mutex_lock(&pmc_reserve_mutex);
-		if (atomic_read(&active_events) == 0) {
+		if (atomic_read(&pmc_refcount) == 0) {
 			if (!reserve_pmc_hardware())
 				err = -EBUSY;
 			else
 				reserve_ds_buffers();
 		}
 		if (!err)
-			atomic_inc(&active_events);
+			atomic_inc(&pmc_refcount);
 		mutex_unlock(&pmc_reserve_mutex);
 	}
···
 void x86_release_hardware(void)
 {
-	if (atomic_dec_and_mutex_lock(&active_events, &pmc_reserve_mutex)) {
+	if (atomic_dec_and_mutex_lock(&pmc_refcount, &pmc_reserve_mutex)) {
 		release_pmc_hardware();
 		release_ds_buffers();
 		mutex_unlock(&pmc_reserve_mutex);
···
 out:
 	mutex_unlock(&pmc_reserve_mutex);
+
+	/*
+	 * Assuming that all exclusive events will share the PMI handler
+	 * (which checks active_events for whether there is work to do),
+	 * we can bump active_events counter right here, except for
+	 * x86_lbr_exclusive_lbr events that go through x86_pmu_event_init()
+	 * path, which already bumps active_events for them.
+	 */
+	if (!ret && what != x86_lbr_exclusive_lbr)
+		atomic_inc(&active_events);
+
 	return ret;
 }
 
 void x86_del_exclusive(unsigned int what)
 {
 	atomic_dec(&x86_pmu.lbr_exclusive[what]);
+	atomic_dec(&active_events);
 }
 
 int x86_setup_perfctr(struct perf_event *event)
···
 	if (err)
 		return err;
 
+	atomic_inc(&active_events);
 	event->destroy = hw_perf_event_destroy;
 
 	event->hw.idx = -1;
···
 	u64 finish_clock;
 	int ret;
 
+	/*
+	 * All PMUs/events that share this PMI handler should make sure to
+	 * increment active_events for their events.
+	 */
 	if (!atomic_read(&active_events))
 		return NMI_DONE;