Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/speculation/mds: Conditionally clear CPU buffers on idle entry

Add a static key which controls the invocation of the CPU buffer clear
mechanism on idle entry. This is independent of other MDS mitigations
because the idle entry invocation to mitigate the potential leakage due to
store buffer repartitioning is only necessary on SMT systems.
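The independence of the idle key from the return-to-user key can be modeled in plain user-space C. This is a minimal sketch, assuming ordinary `bool` flags as stand-ins for the kernel's static keys (real static keys patch the branch in the instruction stream instead of testing a variable). All `model_*` and `*_key` names are hypothetical, and the "clear" only increments a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the static keys mds_user_clear and
 * mds_idle_clear; plain runtime flags, not patched branches. */
static bool mds_user_clear_key;
static bool mds_idle_clear_key;

/* Counts simulated CPU buffer clears. */
static unsigned int buffer_clears;

/* Stand-in for mds_clear_cpu_buffers(): only records the call. */
static void model_clear_cpu_buffers(void)
{
	buffer_clears++;
}

/* Return-to-user hook: gated only by the user key. */
static void model_user_clear_cpu_buffers(void)
{
	if (mds_user_clear_key)
		model_clear_cpu_buffers();
}

/* Idle-entry hook: gated only by the idle key, so it can be
 * enabled or disabled (e.g. by SMT state) on its own. */
static void model_idle_clear_cpu_buffers(void)
{
	if (mds_idle_clear_key)
		model_clear_cpu_buffers();
}
```

Turning on only the user key clears on return to user space but not on idle entry; flipping the idle key changes idle behavior without touching the user path.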

Add the actual invocations to the different halt/mwait variants, which
cover all usage sites. mwaitx is not patched as it is not available on
Intel CPUs.
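Which idle wrappers call the hook can be sketched as a table of entry points. This is a hypothetical user-space model: each `model_*` function stands in for the kernel helper named in the table, a `bool` replaces the `mds_idle_clear` static key, and no `hlt`/`mwait` instruction is executed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: a flag for the static key, a counter for clears. */
static bool idle_clear_key = true;
static unsigned int idle_clears;

static void idle_clear_hook(void)
{
	if (idle_clear_key)
		idle_clears++;
}

/* Each model wrapper mirrors one kernel idle variant. */
static void model_safe_halt(void) { idle_clear_hook(); /* then sti; hlt */ }
static void model_halt(void)      { idle_clear_hook(); /* then hlt */ }
static void model_mwait(void)     { idle_clear_hook(); /* then mwait */ }
static void model_sti_mwait(void) { idle_clear_hook(); /* then sti; mwait */ }
static void model_mwaitx(void)    { /* no hook: mwaitx is AMD/Hygon only */ }

struct idle_entry {
	const char *kernel_name;	/* kernel helper this entry models */
	void (*enter)(void);
};

static const struct idle_entry idle_entries[] = {
	{ "native_safe_halt", model_safe_halt },
	{ "native_halt",      model_halt },
	{ "__mwait",          model_mwait },
	{ "__sti_mwait",      model_sti_mwait },
	{ "__mwaitx",         model_mwaitx },
};

#define NR_IDLE_ENTRIES (sizeof(idle_entries) / sizeof(idle_entries[0]))

/* Enter idle once through every variant; return how many cleared. */
static unsigned int count_clearing_variants(void)
{
	unsigned int before = idle_clears;

	for (size_t i = 0; i < NR_IDLE_ENTRIES; i++)
		idle_entries[i].enter();
	return idle_clears - before;
}
```

With the key enabled, four of the five entry points clear; with it disabled, none do, regardless of the variant.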

The buffer clear is only invoked before entering the C-State, to prevent
stale data of the idling CPU from spilling to the Hyper-Thread sibling
after the store buffer has been repartitioned and all entries are
available to the non-idle sibling.

When coming out of idle, the store buffer is partitioned again so each
sibling has half of it available. The CPU which returned from idle could
now be speculatively exposed to contents of the sibling, but the buffers
are flushed either on exit to user space or on VMENTER.
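The asymmetry between idle entry and idle exit can be illustrated with a deliberately simplified toy model: one core, two hyper-threads and a store buffer that is split in half while both threads run and handed entirely to the running thread while its sibling idles. The eight-entry size, the exact half split and all `toy_*` names are assumptions for illustration only; real store buffer repartitioning and speculative forwarding are far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SB_ENTRIES 8		/* arbitrary toy size, not a hardware figure */
#define SB_HALF (SB_ENTRIES / 2)

/* One toy core: two hyper-threads sharing one store buffer. */
struct toy_core {
	unsigned int sb[SB_ENTRIES];
	bool idle[2];		/* idle[t]: thread t sits in a C-State */
};

/* Entries thread t can reach: its own half while the sibling runs,
 * the whole buffer once the sibling is idle. Returns the count. */
static int reach(const struct toy_core *c, int t, int *first)
{
	if (c->idle[!t]) {
		*first = 0;
		return SB_ENTRIES;
	}
	*first = t * SB_HALF;
	return SB_HALF;
}

/* Thread t stores value into slot idx of its current range. */
static void toy_store(struct toy_core *c, int t, int idx, unsigned int value)
{
	int first;
	int n = reach(c, t, &first);

	assert(idx >= 0 && idx < n);
	c->sb[first + idx] = value;
}

/* Could thread t (speculatively) see value anywhere in its range? */
static bool toy_can_observe(const struct toy_core *c, int t, unsigned int value)
{
	int first;
	int n = reach(c, t, &first);

	for (int i = first; i < first + n; i++)
		if (c->sb[i] == value)
			return true;
	return false;
}

/* Zero thread t's half: stands in for a CPU buffer clear. */
static void toy_clear_half(struct toy_core *c, int t)
{
	memset(&c->sb[t * SB_HALF], 0, SB_HALF * sizeof(c->sb[0]));
}

/* Idle entry; clear_first models the clear before hlt/mwait. */
static void toy_enter_idle(struct toy_core *c, int t, bool clear_first)
{
	if (clear_first)
		toy_clear_half(c, t);
	c->idle[t] = true;	/* repartition: sibling gets all entries */
}

/* Idle exit: the buffer is partitioned again. */
static void toy_exit_idle(struct toy_core *c, int t)
{
	c->idle[t] = false;
}
```

In this model, skipping the clear on idle entry lets the sibling reach the idler's stale entry. On exit, the woken thread can reach what the sibling wrote into its half while it was idle, which is the exposure covered by the flush on exit to user space or VMENTER.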

When conditional buffer clearing is implemented on top of this later on,
no action is required either, because before returning to user space the
context switch will set the condition flag, which causes a flush on the
return to user path.
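The conditional scheme is not part of this patch, but its idea can be sketched briefly. This assumes a hypothetical per-CPU `flush_pending` flag that the context switch sets and the return-to-user path consumes; `struct toy_cpu` and both functions are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; nothing here exists in the patch. */
struct toy_cpu {
	bool flush_pending;	/* condition flag set by the context switch */
	unsigned int flushes;	/* number of simulated buffer clears */
};

/* Context switch: the next return to user space must flush. */
static void toy_context_switch(struct toy_cpu *cpu)
{
	cpu->flush_pending = true;
}

/* Return-to-user path: flush only if the condition flag is set. */
static void toy_return_to_user(struct toy_cpu *cpu)
{
	if (cpu->flush_pending) {
		cpu->flushes++;		/* stands in for a CPU buffer clear */
		cpu->flush_pending = false;
	}
}
```

A CPU that wakes from idle and switches to a user task passes through `toy_context_switch()` first, so its first return to user space flushes exactly once.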

Note that the buffer clearing on idle is only sensible on CPUs which are
solely affected by MSBDS and not by any other variant of MDS. The other
MDS variants cannot be mitigated when SMT is enabled, so the buffer
clearing on idle would be a window dressing exercise.
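The selection rule can be written as a small predicate. This is a hypothetical sketch: the `toy_*` names, the two-value mode enum and the variant flags are assumptions (only MSBDS, MFBDS and MLPDS are tracked), and the kernel switches `mds_idle_clear` with its own code rather than this function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mitigation modes; the kernel's actual modes may differ. */
enum toy_mds_mode { TOY_MDS_OFF, TOY_MDS_FULL };

/* Hypothetical summary of the MDS variants affecting a CPU. */
struct toy_mds_bugs {
	bool msbds;	/* store buffer variant */
	bool mfbds;	/* fill buffer variant */
	bool mlpds;	/* load port variant */
};

/*
 * Enable the idle clear only if mitigation is on, SMT is active and
 * the CPU is affected by MSBDS alone. With fill buffer or load port
 * variants present, the sibling stays exposed anyway.
 */
static bool toy_want_idle_clear(enum toy_mds_mode mode,
				const struct toy_mds_bugs *bugs,
				bool smt_active)
{
	bool msbds_only = bugs->msbds && !bugs->mfbds && !bugs->mlpds;

	return mode != TOY_MDS_OFF && msbds_only && smt_active;
}
```

Keeping the SMT check in the same predicate reflects that the key depends on both the chosen mitigation mode and the SMT state of the system.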

This intentionally does not handle the case in the acpi/processor_idle
driver which uses the legacy IO port interface for C-State transitions for
two reasons:

- The acpi/processor_idle driver was replaced by the intel_idle driver
almost a decade ago. Everything from Nehalem upwards supports it and
defaults to that new driver.

- The legacy IO port interface is likely to be used on older and therefore
unaffected CPUs or on systems which do not receive microcode updates
anymore, so there is no point in adding the mitigation there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>

5 files changed, 68 insertions(+)

Documentation/x86/mds.rst (+42)
```diff
@@ -149,3 +149,45 @@
     This takes the paranoid exit path only when the INT1 breakpoint is in
     kernel space. #DB on a user space address takes the regular exit path,
     so no extra mitigation required.
+
+
+2. C-State transition
+^^^^^^^^^^^^^^^^^^^^^
+
+   When a CPU goes idle and enters a C-State the CPU buffers need to be
+   cleared on affected CPUs when SMT is active. This addresses the
+   repartitioning of the store buffer when one of the Hyper-Threads enters
+   a C-State.
+
+   When SMT is inactive, i.e. either the CPU does not support it or all
+   sibling threads are offline CPU buffer clearing is not required.
+
+   The idle clearing is enabled on CPUs which are only affected by MSBDS
+   and not by any other MDS variant. The other MDS variants cannot be
+   protected against cross Hyper-Thread attacks because the Fill Buffer and
+   the Load Ports are shared. So on CPUs affected by other variants, the
+   idle clearing would be a window dressing exercise and is therefore not
+   activated.
+
+   The invocation is controlled by the static key mds_idle_clear which is
+   switched depending on the chosen mitigation mode and the SMT state of
+   the system.
+
+   The buffer clear is only invoked before entering the C-State to prevent
+   that stale data from the idling CPU from spilling to the Hyper-Thread
+   sibling after the store buffer got repartitioned and all entries are
+   available to the non idle sibling.
+
+   When coming out of idle the store buffer is partitioned again so each
+   sibling has half of it available. The back from idle CPU could be then
+   speculatively exposed to contents of the sibling. The buffers are
+   flushed either on exit to user space or on VMENTER so malicious code
+   in user space or the guest cannot speculatively access them.
+
+   The mitigation is hooked into all variants of halt()/mwait(), but does
+   not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
+   has been superseded by the intel_idle driver around 2010 and is
+   preferred on all affected CPUs which are expected to gain the MD_CLEAR
+   functionality in microcode. Aside of that the IO-Port mechanism is a
+   legacy interface which is only used on older systems which are either
+   not affected or do not receive microcode updates anymore.
```

arch/x86/include/asm/irqflags.h (+4)
```diff
@@ -6,6 +6,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/nospec-branch.h>
+
 /* Provide __cpuidle; we can't safely include <linux/cpu.h> */
 #define __cpuidle __attribute__((__section__(".cpuidle.text")))
 
@@ -54,11 +56,13 @@
 
 static inline __cpuidle void native_safe_halt(void)
 {
+	mds_idle_clear_cpu_buffers();
 	asm volatile("sti; hlt": : :"memory");
 }
 
 static inline __cpuidle void native_halt(void)
 {
+	mds_idle_clear_cpu_buffers();
 	asm volatile("hlt": : :"memory");
 }
 
```

arch/x86/include/asm/mwait.h (+7)
```diff
@@ -6,6 +6,7 @@
 #include <linux/sched/idle.h>
 
 #include <asm/cpufeature.h>
+#include <asm/nospec-branch.h>
 
 #define MWAIT_SUBSTATE_MASK 0xf
 #define MWAIT_CSTATE_MASK 0xf
@@ -40,6 +41,8 @@
 
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
+	mds_idle_clear_cpu_buffers();
+
 	/* "mwait %eax, %ecx;" */
 	asm volatile(".byte 0x0f, 0x01, 0xc9;"
 		     :: "a" (eax), "c" (ecx));
@@ -74,6 +77,8 @@
 static inline void __mwaitx(unsigned long eax, unsigned long ebx,
 			    unsigned long ecx)
 {
+	/* No MDS buffer clear as this is AMD/HYGON only */
+
 	/* "mwaitx %eax, %ebx, %ecx;" */
 	asm volatile(".byte 0x0f, 0x01, 0xfb;"
 		     :: "a" (eax), "b" (ebx), "c" (ecx));
@@ -81,6 +86,8 @@
 
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
+	mds_idle_clear_cpu_buffers();
+
 	trace_hardirqs_on();
 	/* "mwait %eax, %ecx;" */
 	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
```

arch/x86/include/asm/nospec-branch.h (+12)
```diff
@@ -319,6 +319,7 @@
 DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
 DECLARE_STATIC_KEY_FALSE(mds_user_clear);
+DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
 
 #include <asm/segment.h>
 
@@ -353,6 +354,17 @@
 static inline void mds_user_clear_cpu_buffers(void)
 {
 	if (static_branch_likely(&mds_user_clear))
+		mds_clear_cpu_buffers();
+}
+
+/**
+ * mds_idle_clear_cpu_buffers - Mitigation for MDS vulnerability
+ *
+ * Clear CPU buffers if the corresponding static key is enabled
+ */
+static inline void mds_idle_clear_cpu_buffers(void)
+{
+	if (static_branch_likely(&mds_idle_clear))
 		mds_clear_cpu_buffers();
 }
 
```

arch/x86/kernel/cpu/bugs.c (+3)
```diff
@@ -66,6 +66,9 @@
 /* Control MDS CPU buffer clear before returning to user space */
 DEFINE_STATIC_KEY_FALSE(mds_user_clear);
 EXPORT_SYMBOL_GPL(mds_user_clear);
+/* Control MDS CPU buffer clear before idling (halt, mwait) */
+DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
+EXPORT_SYMBOL_GPL(mds_idle_clear);
 
 void __init check_bugs(void)
 {
```