Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

x86/mtrr: Skip cache flushes on CPUs with cache self-snooping

Programming MTRR registers in multi-processor systems is a rather lengthy
process. Furthermore, all processors must program these registers in lock
step and with interrupts disabled; the process also involves flushing
caches and TLBs twice. As a result, the process may take a considerable
amount of time.

On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early when booting, if no other clock is available (e.g.,
booting with hpet=disabled), the refined-jiffies clock source is used to
monitor the TSC clock source. If the skew of refined-jiffies is too large,
Linux wrongly assumes that the TSC is unstable:

clocksource: timekeeping watchdog on CPU1: Marking clocksource
'tsc-early' as unstable because the skew is too large:
clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
fffedb90 mask: ffffffff
clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
mask: ffffffffffffffff
tsc: Marking TSC unstable due to clocksource watchdog

As per measurements, around 98% of the time needed by the procedure to
program MTRRs in multi-processor systems is spent flushing caches with
wbinvd(). As per the Section 11.11.8 of the Intel 64 and IA 32
Architectures Software Developer's Manual, it is not necessary to flush
caches if the CPU supports cache self-snooping. Thus, skipping the cache
flushes can reduce by several tens of milliseconds the time needed to
complete the programming of the MTRR registers:

Platform Before After
104-core (208 Threads) Skylake 1437ms 28ms
2-core ( 4 Threads) Haswell 114ms 2ms

Reported-by: Mohammad Etemadi <mohammad.etemadi@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alan Cox <alan.cox@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/1561689337-19390-3-git-send-email-ricardo.neri-calderon@linux.intel.com

authored by

Ricardo Neri and committed by
Thomas Gleixner
fd329f27 1e03bff3

+13 -2
+13 -2
arch/x86/kernel/cpu/mtrr/generic.c
··· 743 743 /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */ 744 744 cr0 = read_cr0() | X86_CR0_CD; 745 745 write_cr0(cr0); 746 - wbinvd(); 746 + 747 + /* 748 + * Cache flushing is the most time-consuming step when programming 749 + * the MTRRs. Fortunately, as per the Intel Software Development 750 + * Manual, we can skip it if the processor supports cache self- 751 + * snooping. 752 + */ 753 + if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) 754 + wbinvd(); 747 755 748 756 /* Save value of CR4 and clear Page Global Enable (bit 7) */ 749 757 if (boot_cpu_has(X86_FEATURE_PGE)) { ··· 768 760 769 761 /* Disable MTRRs, and set the default type to uncached */ 770 762 mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi); 771 - wbinvd(); 763 + 764 + /* Again, only flush caches if we have to. */ 765 + if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) 766 + wbinvd(); 772 767 } 773 768 774 769 static void post_set(void) __releases(set_atomicity_lock)