Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler

Tony reports that booting his 144-cpu machine with maxcpus=10 triggers
the following WARN_ON():

[ 21.045727] WARNING: CPU: 8 PID: 647 at arch/x86/kernel/cpu/perf_event_intel_cqm.c:1267 intel_cqm_cpu_prepare+0x75/0x90()
[ 21.045744] CPU: 8 PID: 647 Comm: systemd-udevd Not tainted 4.2.0-rc4 #1
[ 21.045745] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0066.R00.1506021730 06/02/2015
[ 21.045747] 0000000000000000 0000000082771b09 ffff880856333ba8 ffffffff81669b67
[ 21.045748] 0000000000000000 0000000000000000 ffff880856333be8 ffffffff8107b02a
[ 21.045750] ffff88085b789800 ffff88085f68a020 ffffffff819e2470 000000000000000a
[ 21.045750] Call Trace:
[ 21.045757] [<ffffffff81669b67>] dump_stack+0x45/0x57
[ 21.045759] [<ffffffff8107b02a>] warn_slowpath_common+0x8a/0xc0
[ 21.045761] [<ffffffff8107b15a>] warn_slowpath_null+0x1a/0x20
[ 21.045762] [<ffffffff81036725>] intel_cqm_cpu_prepare+0x75/0x90
[ 21.045764] [<ffffffff81036872>] intel_cqm_cpu_notifier+0x42/0x160
[ 21.045767] [<ffffffff8109a33d>] notifier_call_chain+0x4d/0x80
[ 21.045769] [<ffffffff8109a44e>] __raw_notifier_call_chain+0xe/0x10
[ 21.045770] [<ffffffff8107b538>] _cpu_up+0xe8/0x190
[ 21.045771] [<ffffffff8107b65a>] cpu_up+0x7a/0xa0
[ 21.045774] [<ffffffff8165e920>] cpu_subsys_online+0x40/0x90
[ 21.045777] [<ffffffff81433b37>] device_online+0x67/0x90
[ 21.045778] [<ffffffff81433bea>] online_store+0x8a/0xa0
[ 21.045782] [<ffffffff81430e78>] dev_attr_store+0x18/0x30
[ 21.045785] [<ffffffff8126b6ba>] sysfs_kf_write+0x3a/0x50
[ 21.045786] [<ffffffff8126ad40>] kernfs_fop_write+0x120/0x170
[ 21.045789] [<ffffffff811f0b77>] __vfs_write+0x37/0x100
[ 21.045791] [<ffffffff811f38b8>] ? __sb_start_write+0x58/0x110
[ 21.045795] [<ffffffff81296d2d>] ? security_file_permission+0x3d/0xc0
[ 21.045796] [<ffffffff811f1279>] vfs_write+0xa9/0x190
[ 21.045797] [<ffffffff811f2075>] SyS_write+0x55/0xc0
[ 21.045800] [<ffffffff81067300>] ? do_page_fault+0x30/0x80
[ 21.045804] [<ffffffff816709ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 21.045805] ---[ end trace fe228b836d8af405 ]---

The root cause is that CPU_UP_PREPARE is completely the wrong notifier
action from which to access cpu_data(), because smp_store_cpu_info()
won't have been executed by the target CPU at that point, which in turn
means that ->x86_cache_max_rmid and ->x86_cache_occ_scale haven't been
filled out.

Instead let's invoke our handler from CPU_STARTING and rename it
appropriately.

Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Link: http://lkml.kernel.org/r/1438863163-14083-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>

authored by

Matt Fleming and committed by
Ingo Molnar
d7a702f0 dbc72b7a

+3 -5
+3 -5
arch/x86/kernel/cpu/perf_event_intel_cqm.c
··· 1255 1255 cpumask_set_cpu(cpu, &cqm_cpumask); 1256 1256 } 1257 1257 1258 - static void intel_cqm_cpu_prepare(unsigned int cpu) 1258 + static void intel_cqm_cpu_starting(unsigned int cpu) 1259 1259 { 1260 1260 struct intel_pqr_state *state = &per_cpu(pqr_state, cpu); 1261 1261 struct cpuinfo_x86 *c = &cpu_data(cpu); ··· 1296 1296 unsigned int cpu = (unsigned long)hcpu; 1297 1297 1298 1298 switch (action & ~CPU_TASKS_FROZEN) { 1299 - case CPU_UP_PREPARE: 1300 - intel_cqm_cpu_prepare(cpu); 1301 - break; 1302 1299 case CPU_DOWN_PREPARE: 1303 1300 intel_cqm_cpu_exit(cpu); 1304 1301 break; 1305 1302 case CPU_STARTING: 1303 + intel_cqm_cpu_starting(cpu); 1306 1304 cqm_pick_event_reader(cpu); 1307 1305 break; 1308 1306 } ··· 1371 1373 goto out; 1372 1374 1373 1375 for_each_online_cpu(i) { 1374 - intel_cqm_cpu_prepare(i); 1376 + intel_cqm_cpu_starting(i); 1375 1377 cqm_pick_event_reader(i); 1376 1378 } 1377 1379