Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/CPU/AMD: Bring back Compute Unit ID

Commit:

a33d331761bc ("x86/CPU/AMD: Fix Bulldozer topology")

restored the initial approach we had with the Fam15h topology of
enumerating CU (Compute Unit) threads as cores. And this is still
correct - they're beefier than HT threads but still have some
shared functionality.

Our current approach has a problem with the Mad Max Steam game, for
example. Yves Dionne reported a certain "choppiness" while playing on
v4.9.5.

That problem stems most likely from the fact that the CU threads share
resources within one CU and when we schedule to a thread of a different
compute unit, this incurs latency due to migrating the working set to a
different CU through the caches.

When the thread siblings mask mirrors that aspect of the CUs and
threads, the scheduler pays attention to it and tries to schedule within
one CU first. Which takes care of the latency, of course.

Reported-by: Yves Dionne <yves.dionne@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Authored by Borislav Petkov, committed by Ingo Molnar (commit 79a8b9aa, parent a0a28644).

+19 -4

arch/x86/include/asm/processor.h (+1 -0)

@@ -104,6 +104,7 @@
 	__u8			x86_phys_bits;
 	/* CPUID returned core id bits: */
 	__u8			x86_coreid_bits;
+	__u8			cu_id;
 	/* Max extended CPUID function supported: */
 	__u32			extended_cpuid_level;
 	/* Maximum supported CPUID level, -1=no CPUID: */
arch/x86/kernel/cpu/amd.c (+8 -1)

@@ -309,8 +309,15 @@
 
 	/* get information required for multi-node processors */
 	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
+		u32 eax, ebx, ecx, edx;
 
-		node_id = cpuid_ecx(0x8000001e) & 7;
+		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
+
+		node_id = ecx & 0xff;
+		smp_num_siblings = ((ebx >> 8) & 0xff) + 1;
+
+		if (c->x86 == 0x15)
+			c->cu_id = ebx & 0xff;
 
 		/*
 		 * We may have multiple LLCs if L3 caches exist, so check if we
arch/x86/kernel/cpu/common.c (+1 -0)

@@ -1015,6 +1015,7 @@
 	c->x86_model_id[0] = '\0';	/* Unset */
 	c->x86_max_cores = 1;
 	c->x86_coreid_bits = 0;
+	c->cu_id = 0xff;
 #ifdef CONFIG_X86_64
 	c->x86_clflush_size = 64;
 	c->x86_phys_bits = 36;
arch/x86/kernel/smpboot.c (+9 -3)

@@ -433,9 +433,15 @@
 	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
 	if (c->phys_proc_id == o->phys_proc_id &&
-	    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2) &&
-	    c->cpu_core_id == o->cpu_core_id)
-		return topology_sane(c, o, "smt");
+	    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
+		if (c->cpu_core_id == o->cpu_core_id)
+			return topology_sane(c, o, "smt");
+
+		if ((c->cu_id != 0xff) &&
+		    (o->cu_id != 0xff) &&
+		    (c->cu_id == o->cu_id))
+			return topology_sane(c, o, "smt");
+	}
 
 	} else if (c->phys_proc_id == o->phys_proc_id &&
 		   c->cpu_core_id == o->cpu_core_id) {