Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MDS mitigations from Thomas Gleixner:
"Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is
available in various CPU internal buffers. This new set of misfeatures
has the following CVEs assigned:

CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory

MDS attacks target microarchitectural buffers which speculatively
forward data under certain conditions. Disclosure gadgets can expose
this data via cache side channels.

Contrary to other speculation-based vulnerabilities, the MDS
vulnerability does not allow the attacker to control the memory target
address. As a consequence the attacks are purely sampling based, but
as demonstrated with the TLBleed attack, samples can be postprocessed
successfully.

The mitigation is to flush the microarchitectural buffers on return to
user space and before entering a VM. It's bolted on the VERW
instruction and requires a microcode update. As some of the attacks
exploit data structures shared between hyperthreads, full protection
requires disabling hyperthreading. The kernel does not do that by
default to avoid breaking unattended updates.

The mitigation set comes with documentation for administrators and a
deeper technical view"

* 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/speculation/mds: Fix documentation typo
Documentation: Correct the possible MDS sysfs values
x86/mds: Add MDSUM variant to the MDS documentation
x86/speculation/mds: Add 'mitigations=' support for MDS
x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
x86/speculation/mds: Fix comment
x86/speculation/mds: Add SMT warning message
x86/speculation: Move arch_smt_update() call to after mitigation decisions
x86/speculation/mds: Add mds=full,nosmt cmdline option
Documentation: Add MDS vulnerability documentation
Documentation: Move L1TF to separate directory
x86/speculation/mds: Add mitigation mode VMWERV
x86/speculation/mds: Add sysfs reporting for MDS
x86/speculation/mds: Add mitigation control for MDS
x86/speculation/mds: Conditionally clear CPU buffers on idle entry
x86/kvm/vmx: Add MDS protection when L1D Flush is not active
x86/speculation/mds: Clear CPU buffers on exit to user
x86/speculation/mds: Add mds_clear_cpu_buffers()
x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
x86/speculation/mds: Add BUG_MSBDS_ONLY
...

+921 -82
+2 -2
Documentation/ABI/testing/sysfs-devices-system-cpu
··· 484 484 /sys/devices/system/cpu/vulnerabilities/spectre_v2 485 485 /sys/devices/system/cpu/vulnerabilities/spec_store_bypass 486 486 /sys/devices/system/cpu/vulnerabilities/l1tf 487 + /sys/devices/system/cpu/vulnerabilities/mds 487 488 Date: January 2018 488 489 Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org> 489 490 Description: Information about CPU vulnerabilities ··· 497 496 "Vulnerable" CPU is affected and no mitigation in effect 498 497 "Mitigation: $M" CPU is affected and mitigation $M is in effect 499 498 500 - Details about the l1tf file can be found in 501 - Documentation/admin-guide/l1tf.rst 499 + See also: Documentation/admin-guide/hw-vuln/index.rst 502 500 503 501 What: /sys/devices/system/cpu/smt 504 502 /sys/devices/system/cpu/smt/active
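For consumers of the new sysfs file, the documented value strings can be classified by prefix. A minimal userspace sketch in C; only the value strings come from the documentation, the helper name and its labels are illustrative:

```c
#include <string.h>

/* Classify the text read from
 * /sys/devices/system/cpu/vulnerabilities/mds. The prefixes match the
 * documented "Not affected" / "Vulnerable..." / "Mitigation: $M" forms. */
static const char *mds_status_class(const char *s)
{
	if (strncmp(s, "Not affected", 12) == 0)
		return "not-affected";
	if (strncmp(s, "Mitigation:", 11) == 0)
		return "mitigated";
	if (strncmp(s, "Vulnerable", 10) == 0)
		return "vulnerable";	/* with or without trailing detail */
	return "unknown";
}
```

On a real system the string would be read from the sysfs file; here the classification is applied to sample values only.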
+13
Documentation/admin-guide/hw-vuln/index.rst
··· 1 + ======================== 2 + Hardware vulnerabilities 3 + ======================== 4 + 5 + This section describes CPU vulnerabilities and provides an overview of the 6 + possible mitigations along with guidance for selecting mitigations if they 7 + are configurable at compile, boot or run time. 8 + 9 + .. toctree:: 10 + :maxdepth: 1 11 + 12 + l1tf 13 + mds
+308
Documentation/admin-guide/hw-vuln/mds.rst
··· 1 + MDS - Microarchitectural Data Sampling 2 + ====================================== 3 + 4 + Microarchitectural Data Sampling is a hardware vulnerability which allows 5 + unprivileged speculative access to data which is available in various CPU 6 + internal buffers. 7 + 8 + Affected processors 9 + ------------------- 10 + 11 + This vulnerability affects a wide range of Intel processors. The 12 + vulnerability is not present on: 13 + 14 + - Processors from AMD, Centaur and other non-Intel vendors 15 + 16 + - Older processor models, where the CPU family is < 6 17 + 18 + - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus) 19 + 20 + - Intel processors which have the ARCH_CAP_MDS_NO bit set in the 21 + IA32_ARCH_CAPABILITIES MSR. 22 + 23 + Whether a processor is affected or not can be read out from the MDS 24 + vulnerability file in sysfs. See :ref:`mds_sys_info`. 25 + 26 + Not all processors are affected by all variants of MDS, but the mitigation 27 + is identical for all of them so the kernel treats them as a single 28 + vulnerability. 29 + 30 + Related CVEs 31 + ------------ 32 + 33 + The following CVE entries are related to the MDS vulnerability: 34 + 35 + ============== ===== =================================================== 36 + CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling 37 + CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling 38 + CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling 39 + CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory 40 + ============== ===== =================================================== 41 + 42 + Problem 43 + ------- 44 + 45 + When performing store, load or L1 refill operations, processors write data 46 + into temporary microarchitectural structures (buffers). The data in the 47 + buffer can be forwarded to load operations as an optimization. 
48 + 49 + Under certain conditions, usually a fault/assist caused by a load 50 + operation, data unrelated to the load memory address can be speculatively 51 + forwarded from the buffers. Because the load operation causes a fault or 52 + assist and its result will be discarded, the forwarded data will not cause 53 + incorrect program execution or state changes. But a malicious operation 54 + may be able to forward this speculative data to a disclosure gadget which 55 + in turn allows the value to be inferred via a cache side channel attack. 56 + 57 + Because the buffers are potentially shared between Hyper-Threads, cross 58 + Hyper-Thread attacks are possible. 59 + 60 + Deeper technical information is available in the MDS specific x86 61 + architecture section: :ref:`Documentation/x86/mds.rst <mds>`. 62 + 63 + 64 + Attack scenarios 65 + ---------------- 66 + 67 + Attacks against the MDS vulnerabilities can be mounted from malicious non- 68 + privileged user space applications running on hosts or guests. Malicious 69 + guest OSes can obviously mount attacks as well. 70 + 71 + Contrary to other speculation-based vulnerabilities, the MDS vulnerability 72 + does not allow the attacker to control the memory target address. As a 73 + consequence the attacks are purely sampling based, but as demonstrated with 74 + the TLBleed attack, samples can be postprocessed successfully. 75 + 76 + Web-Browsers 77 + ^^^^^^^^^^^^ 78 + 79 + It's unclear whether attacks through Web-Browsers are possible at 80 + all. The exploitation through JavaScript is considered very unlikely, 81 + but other widely used web technologies like WebAssembly could possibly be 82 + abused. 83 + 84 + 85 + .. _mds_sys_info: 86 + 87 + MDS system information 88 + ----------------------- 89 + 90 + The Linux kernel provides a sysfs interface to enumerate the current MDS 91 + status of the system: whether the system is vulnerable, and which 92 + mitigations are active. 
The relevant sysfs file is: 93 + 94 + /sys/devices/system/cpu/vulnerabilities/mds 95 + 96 + The possible values in this file are: 97 + 98 + .. list-table:: 99 + 100 + * - 'Not affected' 101 + - The processor is not vulnerable 102 + * - 'Vulnerable' 103 + - The processor is vulnerable, but no mitigation enabled 104 + * - 'Vulnerable: Clear CPU buffers attempted, no microcode' 105 + - The processor is vulnerable but microcode is not updated. 106 + 107 + The mitigation is enabled on a best effort basis. See :ref:`vmwerv`. 108 + * - 'Mitigation: Clear CPU buffers' 109 + - The processor is vulnerable and the CPU buffer clearing mitigation is 110 + enabled. 111 + 112 + If the processor is vulnerable then the following information is appended 113 + to the above information: 114 + 115 + ======================== ============================================ 116 + 'SMT vulnerable' SMT is enabled 117 + 'SMT mitigated' SMT is enabled and mitigated 118 + 'SMT disabled' SMT is disabled 119 + 'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown 120 + ======================== ============================================ 121 + 122 + .. _vmwerv: 123 + 124 + Best effort mitigation mode 125 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 126 + 127 + If the processor is vulnerable, but the availability of the microcode based 128 + mitigation mechanism is not advertised via CPUID the kernel selects a best 129 + effort mitigation mode. This mode invokes the mitigation instructions 130 + without a guarantee that they clear the CPU buffers. 131 + 132 + This is done to address virtualization scenarios where the host has the 133 + microcode update applied, but the hypervisor is not yet updated to expose 134 + the CPUID to the guest. If the host has updated microcode, the protection 135 + takes effect; otherwise a few CPU cycles are wasted. 136 + 137 + The state in the mds sysfs file reflects this situation accordingly. 
138 + 139 + 140 + Mitigation mechanism 141 + ------------------------- 142 + 143 + The kernel detects the affected CPUs and the presence of the microcode 144 + which is required. 145 + 146 + If a CPU is affected and the microcode is available, then the kernel 147 + enables the mitigation by default. The mitigation can be controlled at boot 148 + time via a kernel command line option. See 149 + :ref:`mds_mitigation_control_command_line`. 150 + 151 + .. _cpu_buffer_clear: 152 + 153 + CPU buffer clearing 154 + ^^^^^^^^^^^^^^^^^^^ 155 + 156 + The mitigation for MDS clears the affected CPU buffers on return to user 157 + space and when entering a guest. 158 + 159 + If SMT is enabled it also clears the buffers on idle entry when the CPU 160 + is only affected by MSBDS and not any other MDS variant, because the 161 + other variants cannot be protected against cross Hyper-Thread attacks. 162 + 163 + For CPUs which are only affected by MSBDS the user space, guest and idle 164 + transition mitigations are sufficient and SMT is not affected. 165 + 166 + .. _virt_mechanism: 167 + 168 + Virtualization mitigation 169 + ^^^^^^^^^^^^^^^^^^^^^^^^^ 170 + 171 + The protection for host to guest transition depends on the L1TF 172 + vulnerability of the CPU: 173 + 174 + - CPU is affected by L1TF: 175 + 176 + If the L1D flush mitigation is enabled and up to date microcode is 177 + available, the L1D flush mitigation automatically protects the 178 + guest transition. 179 + 180 + If the L1D flush mitigation is disabled then the MDS mitigation is 181 + invoked explicitly when the host MDS mitigation is enabled. 182 + 183 + For details on L1TF and virtualization see: 184 + :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <mitigation_control_kvm>`. 185 + 186 + - CPU is not affected by L1TF: 187 + 188 + CPU buffers are flushed before entering the guest when the host MDS 189 + mitigation is enabled. 
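The two cases above can be condensed into one decision helper. A minimal sketch in C, assuming boolean inputs for the CPU's L1TF/MDS affectedness and the active mitigations; the function and enum names are illustrative, not kernel symbols:

```c
#include <stdbool.h>

enum mds_guest_state { MDS_NOT_AFFECTED, MDS_VULNERABLE, MDS_MITIGATED };

/* Host-to-guest transition state per the rules above: on L1TF-affected
 * CPUs the L1D flush also clears the buffers; otherwise an explicit VERW
 * is needed, which is only issued when the host MDS mitigation is on. */
static enum mds_guest_state
mds_guest_entry_state(bool cpu_mds, bool cpu_l1tf,
		      bool vmx_l1d_flush, bool host_mds_on)
{
	if (!cpu_mds)
		return MDS_NOT_AFFECTED;
	if (cpu_l1tf && vmx_l1d_flush)
		return MDS_MITIGATED;
	return host_mds_on ? MDS_MITIGATED : MDS_VULNERABLE;
}
```

Each row of the protection matrix that follows corresponds to one evaluation of this function.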
190 + 191 + The resulting MDS protection matrix for the host to guest transition: 192 + 193 + ============ ===== ============= ============ ================= 194 + L1TF MDS VMX-L1FLUSH Host MDS MDS-State 195 + 196 + Don't care No Don't care N/A Not affected 197 + 198 + Yes Yes Disabled Off Vulnerable 199 + 200 + Yes Yes Disabled Full Mitigated 201 + 202 + Yes Yes Enabled Don't care Mitigated 203 + 204 + No Yes N/A Off Vulnerable 205 + 206 + No Yes N/A Full Mitigated 207 + ============ ===== ============= ============ ================= 208 + 209 + This only covers the host to guest transition, i.e. prevents leakage from 210 + host to guest, but does not protect the guest internally. Guests need to 211 + have their own protections. 212 + 213 + .. _xeon_phi: 214 + 215 + XEON PHI specific considerations 216 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 217 + 218 + The XEON PHI processor family is affected by MSBDS which can be exploited 219 + cross Hyper-Threads when entering idle states. Some XEON PHI variants allow 220 + the use of MWAIT in user space (Ring 3), which opens a potential attack vector 221 + for malicious user space. The exposure can be disabled on the kernel 222 + command line with the 'ring3mwait=disable' command line option. 223 + 224 + XEON PHI is not affected by the other MDS variants and MSBDS is mitigated 225 + before the CPU enters an idle state. As XEON PHI is not affected by L1TF 226 + either, disabling SMT is not required for full protection. 227 + 228 + .. _mds_smt_control: 229 + 230 + SMT control 231 + ^^^^^^^^^^^ 232 + 233 + All MDS variants except MSBDS can be attacked cross Hyper-Threads. That 234 + means on CPUs which are affected by MFBDS or MLPDS it is necessary to 235 + disable SMT for full protection. These are most of the affected CPUs; the 236 + exception is XEON PHI, see :ref:`xeon_phi`. 237 + 238 + Disabling SMT can have a significant performance impact, but the impact 239 + depends on the type of workloads. 
240 + 241 + See the relevant chapter in the L1TF mitigation documentation for details: 242 + :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`. 243 + 244 + 245 + .. _mds_mitigation_control_command_line: 246 + 247 + Mitigation control on the kernel command line 248 + --------------------------------------------- 249 + 250 + The kernel command line allows control of the MDS mitigations at boot 251 + time with the option "mds=". The valid arguments for this option are: 252 + 253 + ============ ============================================================= 254 + full If the CPU is vulnerable, enable all available mitigations 255 + for the MDS vulnerability: CPU buffer clearing on exit to 256 + userspace and when entering a VM. Idle transitions are 257 + protected as well if SMT is enabled. 258 + 259 + It does not automatically disable SMT. 260 + 261 + full,nosmt The same as mds=full, with SMT disabled on vulnerable 262 + CPUs. This is the complete mitigation. 263 + 264 + off Disables MDS mitigations completely. 265 + 266 + ============ ============================================================= 267 + 268 + Not specifying this option is equivalent to "mds=full". 269 + 270 + 271 + Mitigation selection guide 272 + -------------------------- 273 + 274 + 1. Trusted userspace 275 + ^^^^^^^^^^^^^^^^^^^^ 276 + 277 + If all userspace applications are from a trusted source and do not 278 + execute untrusted code which is supplied externally, then the mitigation 279 + can be disabled. 280 + 281 + 282 + 2. Virtualization with trusted guests 283 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 284 + 285 + The same considerations as for trusted user space above apply. 286 + 287 + 3. Virtualization with untrusted guests 288 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 289 + 290 + The protection depends on the state of the L1TF mitigations. 291 + See :ref:`virt_mechanism`. 
292 + 293 + If the MDS mitigation is enabled and SMT is disabled, guest to host and 294 + guest to guest attacks are prevented. 295 + 296 + .. _mds_default_mitigations: 297 + 298 + Default mitigations 299 + ------------------- 300 + 301 + The kernel default mitigations for vulnerable processors are: 302 + 303 + - Enable CPU buffer clearing 304 + 305 + The kernel does not by default enforce the disabling of SMT, which leaves 306 + SMT systems vulnerable when running untrusted code. The same rationale as 307 + for L1TF applies. 308 + See :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <default_mitigations>`.
+2 -4
Documentation/admin-guide/index.rst
··· 17 17 kernel-parameters 18 18 devices 19 19 20 - This section describes CPU vulnerabilities and provides an overview of the 21 - possible mitigations along with guidance for selecting mitigations if they 22 - are configurable at compile, boot or run time. 20 + This section describes CPU vulnerabilities and their mitigations. 23 21 24 22 .. toctree:: 25 23 :maxdepth: 1 26 24 27 - l1tf 25 + hw-vuln/index 28 26 29 27 Here is a set of documents aimed at users who are trying to track down 30 28 problems and bugs in particular.
+29 -1
Documentation/admin-guide/kernel-parameters.txt
··· 2143 2143 2144 2144 Default is 'flush'. 2145 2145 2146 - For details see: Documentation/admin-guide/l1tf.rst 2146 + For details see: Documentation/admin-guide/hw-vuln/l1tf.rst 2147 2147 2148 2148 l2cr= [PPC] 2149 2149 ··· 2389 2389 Format: <first>,<last> 2390 2390 Specifies range of consoles to be captured by the MDA. 2391 2391 2392 + mds= [X86,INTEL] 2393 + Control mitigation for the Micro-architectural Data 2394 + Sampling (MDS) vulnerability. 2395 + 2396 + Certain CPUs are vulnerable to an exploit against CPU 2397 + internal buffers which can forward information to a 2398 + disclosure gadget under certain conditions. 2399 + 2400 + In vulnerable processors, the speculatively 2401 + forwarded data can be used in a cache side channel 2402 + attack, to access data to which the attacker does 2403 + not have direct access. 2404 + 2405 + This parameter controls the MDS mitigation. The 2406 + options are: 2407 + 2408 + full - Enable MDS mitigation on vulnerable CPUs 2409 + full,nosmt - Enable MDS mitigation and disable 2410 + SMT on vulnerable CPUs 2411 + off - Unconditionally disable MDS mitigation 2412 + 2413 + Not specifying this option is equivalent to 2414 + mds=full. 2415 + 2416 + For details see: Documentation/admin-guide/hw-vuln/mds.rst 2417 + 2392 2418 mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory 2393 2419 Amount of memory to be used when the kernel is not able 2394 2420 to see the whole system memory or for test. ··· 2591 2565 spec_store_bypass_disable=off [X86,PPC] 2592 2566 ssbd=force-off [ARM64] 2593 2567 l1tf=off [X86] 2568 + mds=off [X86] 2594 2569 2595 2570 auto (default) 2596 2571 Mitigate all CPU vulnerabilities, but leave SMT ··· 2606 2579 if needed. This is for users who always want to 2607 2580 be fully mitigated, even if it means losing SMT. 2608 2581 Equivalent to: l1tf=flush,nosmt [X86] 2582 + mds=full,nosmt [X86] 2609 2583 2610 2584 mminit_loglevel= 2611 2585 [KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
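The documented "mds=" argument handling can be sketched as a tiny parser. This mirrors the semantics described above, not the kernel's internal implementation; the function and enum names are illustrative:

```c
#include <string.h>

enum mds_mitigation {
	MDS_MITIGATION_OFF,
	MDS_MITIGATION_FULL,
	MDS_MITIGATION_FULL_NOSMT,
};

/* Map a documented "mds=" argument to a mitigation mode. A NULL
 * argument models an absent option, which equals "mds=full". */
static enum mds_mitigation parse_mds_option(const char *arg)
{
	if (!arg || !strcmp(arg, "full"))
		return MDS_MITIGATION_FULL;
	if (!strcmp(arg, "off"))
		return MDS_MITIGATION_OFF;
	if (!strcmp(arg, "full,nosmt"))
		return MDS_MITIGATION_FULL_NOSMT;
	return MDS_MITIGATION_FULL;	/* unrecognized: keep the default */
}
```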
+1
Documentation/admin-guide/l1tf.rst Documentation/admin-guide/hw-vuln/l1tf.rst
··· 445 445 line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush 446 446 module parameter is ignored and writes to the sysfs file are rejected. 447 447 448 + .. _mitigation_selection: 448 449 449 450 Mitigation selection guide 450 451 --------------------------
+1
Documentation/index.rst
··· 114 114 115 115 x86/index 116 116 sh/index 117 + x86/index 117 118 118 119 Filesystem Documentation 119 120 ------------------------
+10
Documentation/x86/conf.py
··· 1 + # -*- coding: utf-8; mode: python -*- 2 + 3 + project = "X86 architecture specific documentation" 4 + 5 + tags.add("subproject") 6 + 7 + latex_documents = [ 8 + ('index', 'x86.tex', project, 9 + 'The kernel development community', 'manual'), 10 + ]
+1
Documentation/x86/index.rst
··· 23 23 intel_mpx 24 24 amd-memory-encryption 25 25 pti 26 + mds 26 27 microcode 27 28 resctrl_ui 28 29 usb-legacy-support
+225
Documentation/x86/mds.rst
··· 1 + Microarchitectural Data Sampling (MDS) mitigation 2 + ================================================= 3 + 4 + .. _mds: 5 + 6 + Overview 7 + -------- 8 + 9 + Microarchitectural Data Sampling (MDS) is a family of side channel attacks 10 + on internal buffers in Intel CPUs. The variants are: 11 + 12 + - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126) 13 + - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130) 14 + - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127) 15 + - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091) 16 + 17 + MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a 18 + dependent load (store-to-load forwarding) as an optimization. The forward 19 + can also happen to a faulting or assisting load operation for a different 20 + memory address, which can be exploited under certain conditions. Store 21 + buffers are partitioned between Hyper-Threads so cross thread forwarding is 22 + not possible. But if a thread enters or exits a sleep state the store 23 + buffer is repartitioned which can expose data from one thread to the other. 24 + 25 + MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage 26 + L1 miss situations and to hold data which is returned or sent in response 27 + to a memory or I/O operation. Fill buffers can forward data to a load 28 + operation and also write data to the cache. When the fill buffer is 29 + deallocated it can retain the stale data of the preceding operations which 30 + can then be forwarded to a faulting or assisting load operation, which can 31 + be exploited under certain conditions. Fill buffers are shared between 32 + Hyper-Threads so cross thread leakage is possible. 33 + 34 + MLPDS leaks Load Port Data. Load ports are used to perform load operations 35 + from memory or I/O. The received data is then forwarded to the register 36 + file or a subsequent operation. 
In some implementations the Load Port can 37 + contain stale data from a previous operation which can be forwarded to 38 + faulting or assisting loads under certain conditions, which again can be 39 + exploited eventually. Load ports are shared between Hyper-Threads so cross 40 + thread leakage is possible. 41 + 42 + MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from 43 + memory that takes a fault or assist can leave data in a microarchitectural 44 + structure that may later be observed using one of the same methods used by 45 + MSBDS, MFBDS or MLPDS. 46 + 47 + Exposure assumptions 48 + -------------------- 49 + 50 + It is assumed that attack code resides in user space or in a guest, with one 51 + exception. The rationale behind this assumption is that the code construct 52 + needed for exploiting MDS requires: 53 + 54 + - to control the load to trigger a fault or assist 55 + 56 + - to have a disclosure gadget which exposes the speculatively accessed 57 + data for consumption through a side channel. 58 + 59 + - to control the pointer through which the disclosure gadget exposes the 60 + data 61 + 62 + The existence of such a construct in the kernel cannot be excluded with 63 + 100% certainty, but the complexity involved makes it extremely unlikely. 64 + 65 + There is one exception, which is untrusted BPF. The functionality of 66 + untrusted BPF is limited, but it needs to be thoroughly investigated 67 + whether it can be used to create such a construct. 68 + 69 + 70 + Mitigation strategy 71 + ------------------- 72 + 73 + All variants have the same mitigation strategy at least for the single CPU 74 + thread case (SMT off): Force the CPU to clear the affected buffers. 75 + 76 + This is achieved by using the otherwise unused and obsolete VERW 77 + instruction in combination with a microcode update. The microcode clears 78 + the affected CPU buffers when the VERW instruction is executed. 
79 + 80 + For virtualization there are two ways to achieve CPU buffer 81 + clearing: either via the modified VERW instruction or via the L1D Flush 82 + command. The latter is issued when L1TF mitigation is enabled so the extra 83 + VERW can be avoided. If the CPU is not affected by L1TF then VERW needs to 84 + be issued. 85 + 86 + If the VERW instruction with the supplied segment selector argument is 87 + executed on a CPU without the microcode update there is no side effect 88 + other than a small number of pointlessly wasted CPU cycles. 89 + 90 + This does not protect against cross Hyper-Thread attacks except for MSBDS 91 + which is only exploitable cross Hyper-Thread when one of the Hyper-Threads 92 + enters a C-state. 93 + 94 + The kernel provides a function to invoke the buffer clearing: 95 + 96 + mds_clear_cpu_buffers() 97 + 98 + The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state 99 + (idle) transitions. 100 + 101 + As a special quirk to address virtualization scenarios where the host has 102 + the microcode updated, but the hypervisor does not (yet) expose the 103 + MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the 104 + hope that it might actually clear the buffers. The state is reflected 105 + accordingly. 106 + 107 + According to current knowledge additional mitigations inside the kernel 108 + itself are not required because the necessary gadgets to expose the leaked 109 + data cannot be controlled in a way which allows exploitation from malicious 110 + user space or VM guests. 111 + 112 + Kernel internal mitigation modes 113 + -------------------------------- 114 + 115 + ======= ============================================================ 116 + off Mitigation is disabled. Either the CPU is not affected or 117 + mds=off is supplied on the kernel command line 118 + 119 + full Mitigation is enabled. CPU is affected and MD_CLEAR is 120 + advertised in CPUID. 121 + 122 + vmwerv Mitigation is enabled. 
CPU is affected and MD_CLEAR is not 123 + advertised in CPUID. That is mainly for virtualization 124 + scenarios where the host has the updated microcode but the 125 + hypervisor does not expose MD_CLEAR in CPUID. It's a best 126 + effort approach without guarantee. 127 + ======= ============================================================ 128 + 129 + If the CPU is affected and mds=off is not supplied on the kernel command 130 + line then the kernel selects the appropriate mitigation mode depending on 131 + the availability of the MD_CLEAR CPUID bit. 132 + 133 + Mitigation points 134 + ----------------- 135 + 136 + 1. Return to user space 137 + ^^^^^^^^^^^^^^^^^^^^^^^ 138 + 139 + When transitioning from kernel to user space the CPU buffers are flushed 140 + on affected CPUs when the mitigation is not disabled on the kernel 141 + command line. The mitigation is enabled through the static key 142 + mds_user_clear. 143 + 144 + The mitigation is invoked in prepare_exit_to_usermode() which covers 145 + most of the kernel to user space transitions. There are a few exceptions 146 + which do not invoke prepare_exit_to_usermode() on return to user 147 + space. These exceptions use the paranoid exit code. 148 + 149 + - Non Maskable Interrupt (NMI): 150 + 151 + Access to sensitive data like keys or credentials in the NMI context is 152 + mostly theoretical: The CPU can do prefetching or execute a 153 + misspeculated code path and thereby fetch data which might end up 154 + leaking through a buffer. 155 + 156 + But for mounting other attacks the kernel stack address of the task is 157 + already valuable information. So in full mitigation mode, the NMI is 158 + mitigated on the return from do_nmi() to provide almost complete 159 + coverage. 160 + 161 + - Double fault (#DF): 162 + 163 + A double fault is usually fatal, but the ESPFIX workaround, which can 164 + be triggered from user space through modify_ldt(2), is a recoverable 165 + double fault. 
#DF uses the paranoid exit path, so explicit mitigation 166 + in the double fault handler is required. 167 + 168 + - Machine Check Exception (#MC): 169 + 170 + Another corner case is a #MC which hits between the CPU buffer clear 171 + invocation and the actual return to user. As this still is in kernel 172 + space it takes the paranoid exit path which does not clear the CPU 173 + buffers. So the #MC handler repopulates the buffers to some 174 + extent. Machine checks are not reliably controllable and the window is 175 + extremely small so mitigation would just tick a checkbox that this 176 + theoretical corner case is covered. To keep the amount of special 177 + cases small, ignore #MC. 178 + 179 + - Debug Exception (#DB): 180 + 181 + This takes the paranoid exit path only when the INT1 breakpoint is in 182 + kernel space. #DB on a user space address takes the regular exit path, 183 + so no extra mitigation required. 184 + 185 + 186 + 2. C-State transition 187 + ^^^^^^^^^^^^^^^^^^^^^ 188 + 189 + When a CPU goes idle and enters a C-State the CPU buffers need to be 190 + cleared on affected CPUs when SMT is active. This addresses the 191 + repartitioning of the store buffer when one of the Hyper-Threads enters 192 + a C-State. 193 + 194 + When SMT is inactive, i.e. either the CPU does not support it or all 195 + sibling threads are offline, CPU buffer clearing is not required. 196 + 197 + The idle clearing is enabled on CPUs which are only affected by MSBDS 198 + and not by any other MDS variant. The other MDS variants cannot be 199 + protected against cross Hyper-Thread attacks because the Fill Buffer and 200 + the Load Ports are shared. So on CPUs affected by other variants, the 201 + idle clearing would be a window dressing exercise and is therefore not 202 + activated. 203 + 204 + The invocation is controlled by the static key mds_idle_clear which is 205 + switched depending on the chosen mitigation mode and the SMT state of 206 + the system. 
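The switching rule for the mds_idle_clear static key described above reduces to a three-way conjunction. A sketch in C; the helper name is illustrative, not a kernel symbol:

```c
#include <stdbool.h>

/* Idle-entry buffer clearing is only useful when the mitigation is on,
 * the CPU is affected by MSBDS alone (store buffers get repartitioned
 * across C-state transitions), and an SMT sibling is active. */
static bool mds_idle_clear_wanted(bool mitigation_on, bool msbds_only,
				  bool smt_active)
{
	return mitigation_on && msbds_only && smt_active;
}
```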
207 + 208 + The buffer clear is only invoked before entering the C-State to prevent 209 + stale data from the idling CPU from spilling to the Hyper-Thread 210 + sibling after the store buffer got repartitioned and all entries are 211 + available to the non idle sibling. 212 + 213 + When coming out of idle the store buffer is partitioned again so each 214 + sibling has half of it available. The CPU coming back from idle could then be 215 + speculatively exposed to the contents of the sibling. The buffers are 216 + flushed either on exit to user space or on VMENTER so malicious code 217 + in user space or the guest cannot speculatively access them. 218 + 219 + The mitigation is hooked into all variants of halt()/mwait(), but does 220 + not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver 221 + has been superseded by the intel_idle driver around 2010 and is 222 + preferred on all affected CPUs which are expected to gain the MD_CLEAR 223 + functionality in microcode. Aside from that, the IO-Port mechanism is a 224 + legacy interface which is only used on older systems which are either 225 + not affected or do not receive microcode updates anymore.
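The off/full/vmwerv selection described in the mitigation-modes table above can likewise be written as a small decision function. A sketch in C, assuming boolean inputs for affectedness, the MD_CLEAR CPUID bit and "mds=off"; the names are illustrative:

```c
#include <stdbool.h>

enum mds_mode { MDS_MODE_OFF, MDS_MODE_FULL, MDS_MODE_VMWERV };

/* Affected CPUs get "full" when MD_CLEAR is advertised in CPUID,
 * otherwise the best effort "vmwerv" mode; "mds=off" or an unaffected
 * CPU disables the mitigation. */
static enum mds_mode mds_select_mode(bool cpu_affected, bool md_clear,
				     bool cmdline_off)
{
	if (!cpu_affected || cmdline_off)
		return MDS_MODE_OFF;
	return md_clear ? MDS_MODE_FULL : MDS_MODE_VMWERV;
}
```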
+3
arch/x86/entry/common.c
··· 32 32 #include <asm/vdso.h> 33 33 #include <asm/cpufeature.h> 34 34 #include <asm/fpu/api.h> 35 + #include <asm/nospec-branch.h> 35 36 36 37 #define CREATE_TRACE_POINTS 37 38 #include <trace/events/syscalls.h> ··· 221 220 #endif 222 221 223 222 user_enter_irqoff(); 223 + 224 + mds_user_clear_cpu_buffers(); 224 225 } 225 226 226 227 #define SYSCALL_EXIT_WORK_FLAGS \
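The pattern in the hunk above can be modeled in userspace. A sketch where a plain bool stands in for the mds_user_clear static key and a counter stands in for the VERW-based clear; in the kernel the check is a patched-in jump label, so the disabled case costs essentially nothing:

```c
#include <stdbool.h>

static bool mds_user_clear;	/* stand-in for the static key */
static int  verw_issued;	/* stand-in for the VERW side effect */

/* The kernel executes VERW with a valid segment selector here. */
static void mds_clear_cpu_buffers(void)
{
	verw_issued++;
}

/* Called on exit to user space; guarded by the key so unaffected or
 * unmitigated systems skip the clear entirely. */
static void mds_user_clear_cpu_buffers(void)
{
	if (mds_user_clear)	/* static_branch_likely() in the kernel */
		mds_clear_cpu_buffers();
}
```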
+3
arch/x86/include/asm/cpufeatures.h
··· 344 344 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */ 345 345 #define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */ 346 346 #define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */ 347 + #define X86_FEATURE_MD_CLEAR (18*32+10) /* VERW clears CPU buffers */ 347 348 #define X86_FEATURE_TSX_FORCE_ABORT (18*32+13) /* "" TSX_FORCE_ABORT */ 348 349 #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */ 349 350 #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ ··· 383 382 #define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */ 384 383 #define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */ 385 384 #define X86_BUG_L1TF X86_BUG(18) /* CPU is affected by L1 Terminal Fault */ 385 + #define X86_BUG_MDS X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */ 386 + #define X86_BUG_MSBDS_ONLY X86_BUG(20) /* CPU is only affected by the MSDBS variant of BUG_MDS */ 386 387 387 388 #endif /* _ASM_X86_CPUFEATURES_H */
+4
arch/x86/include/asm/irqflags.h
···
 #ifndef __ASSEMBLY__

+#include <asm/nospec-branch.h>
+
 /* Provide __cpuidle; we can't safely include <linux/cpu.h> */
 #define __cpuidle __attribute__((__section__(".cpuidle.text")))
···
 static inline __cpuidle void native_safe_halt(void)
 {
+    mds_idle_clear_cpu_buffers();
     asm volatile("sti; hlt": : :"memory");
 }

 static inline __cpuidle void native_halt(void)
 {
+    mds_idle_clear_cpu_buffers();
     asm volatile("hlt": : :"memory");
 }
+23 -16
arch/x86/include/asm/msr-index.h
···
 #ifndef _ASM_X86_MSR_INDEX_H
 #define _ASM_X86_MSR_INDEX_H

+#include <linux/bits.h>
+
 /*
  * CPU model specific register (MSR) numbers.
  *
···
 /* Intel MSRs. Some also available on other CPUs */

 #define MSR_IA32_SPEC_CTRL           0x00000048 /* Speculation Control */
-#define SPEC_CTRL_IBRS               (1 << 0)   /* Indirect Branch Restricted Speculation */
+#define SPEC_CTRL_IBRS               BIT(0)     /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT        1          /* Single Thread Indirect Branch Predictor (STIBP) bit */
-#define SPEC_CTRL_STIBP              (1 << SPEC_CTRL_STIBP_SHIFT)  /* STIBP mask */
+#define SPEC_CTRL_STIBP              BIT(SPEC_CTRL_STIBP_SHIFT)   /* STIBP mask */
 #define SPEC_CTRL_SSBD_SHIFT         2          /* Speculative Store Bypass Disable bit */
-#define SPEC_CTRL_SSBD               (1 << SPEC_CTRL_SSBD_SHIFT)  /* Speculative Store Bypass Disable */
+#define SPEC_CTRL_SSBD               BIT(SPEC_CTRL_SSBD_SHIFT)   /* Speculative Store Bypass Disable */

 #define MSR_IA32_PRED_CMD            0x00000049 /* Prediction Command */
-#define PRED_CMD_IBPB                (1 << 0)   /* Indirect Branch Prediction Barrier */
+#define PRED_CMD_IBPB                BIT(0)     /* Indirect Branch Prediction Barrier */

 #define MSR_PPIN_CTL                 0x0000004e
 #define MSR_PPIN                     0x0000004f
···
 #define MSR_MTRRcap                  0x000000fe

 #define MSR_IA32_ARCH_CAPABILITIES   0x0000010a
-#define ARCH_CAP_RDCL_NO             (1 << 0)   /* Not susceptible to Meltdown */
-#define ARCH_CAP_IBRS_ALL            (1 << 1)   /* Enhanced IBRS support */
-#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH (1 << 3) /* Skip L1D flush on vmentry */
-#define ARCH_CAP_SSB_NO              (1 << 4)   /*
-                                                 * Not susceptible to Speculative Store Bypass
-                                                 * attack, so no Speculative Store Bypass
-                                                 * control required.
-                                                 */
+#define ARCH_CAP_RDCL_NO             BIT(0)     /* Not susceptible to Meltdown */
+#define ARCH_CAP_IBRS_ALL            BIT(1)     /* Enhanced IBRS support */
+#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH BIT(3)   /* Skip L1D flush on vmentry */
+#define ARCH_CAP_SSB_NO              BIT(4)     /*
+                                                 * Not susceptible to Speculative Store Bypass
+                                                 * attack, so no Speculative Store Bypass
+                                                 * control required.
+                                                 */
+#define ARCH_CAP_MDS_NO              BIT(5)     /*
+                                                 * Not susceptible to
+                                                 * Microarchitectural Data
+                                                 * Sampling (MDS) vulnerabilities.
+                                                 */

 #define MSR_IA32_FLUSH_CMD           0x0000010b
-#define L1D_FLUSH                    (1 << 0)   /*
-                                                 * Writeback and invalidate the
-                                                 * L1 data cache.
-                                                 */
+#define L1D_FLUSH                    BIT(0)     /*
+                                                 * Writeback and invalidate the
+                                                 * L1 data cache.
+                                                 */

 #define MSR_IA32_BBL_CR_CTL          0x00000119
 #define MSR_IA32_BBL_CR_CTL3         0x0000011e
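The `BIT()` conversion above is purely cosmetic, but the new `ARCH_CAP_MDS_NO` bit is functional: a CPU sets bit 5 of IA32_ARCH_CAPABILITIES to declare itself unaffected by MDS. A small stand-alone check of that bit test (redefining the masks locally, since the kernel headers are not usable from user space) looks like this:

```c
#include <assert.h>
#include <stdint.h>

/* Local re-definitions mirroring the BIT()-style masks above;
 * these are illustrative copies, not the kernel's headers. */
#define MY_BIT(n)       (1UL << (n))
#define CAP_RDCL_NO     MY_BIT(0)   /* not susceptible to Meltdown */
#define CAP_MDS_NO      MY_BIT(5)   /* not susceptible to MDS */

/* Given a (hypothetical) IA32_ARCH_CAPABILITIES value, report whether
 * the CPU enumerates itself as not affected by MDS. */
static int cpu_reports_mds_no(uint64_t ia32_cap)
{
    return (ia32_cap & CAP_MDS_NO) != 0;
}
```

This is the same test `cpu_set_bug_bits()` performs before forcing `X86_BUG_MDS`.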
+7
arch/x86/include/asm/mwait.h
···
 #include <linux/sched/idle.h>

 #include <asm/cpufeature.h>
+#include <asm/nospec-branch.h>

 #define MWAIT_SUBSTATE_MASK          0xf
 #define MWAIT_CSTATE_MASK            0xf
···
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
+    mds_idle_clear_cpu_buffers();
+
     /* "mwait %eax, %ecx;" */
     asm volatile(".byte 0x0f, 0x01, 0xc9;"
                  :: "a" (eax), "c" (ecx));
···
 static inline void __mwaitx(unsigned long eax, unsigned long ebx,
                             unsigned long ecx)
 {
+    /* No MDS buffer clear as this is AMD/HYGON only */
+
     /* "mwaitx %eax, %ebx, %ecx;" */
     asm volatile(".byte 0x0f, 0x01, 0xfb;"
                  :: "a" (eax), "b" (ebx), "c" (ecx));
···
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
+    mds_idle_clear_cpu_buffers();
+
     trace_hardirqs_on();
     /* "mwait %eax, %ecx;" */
     asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
+50
arch/x86/include/asm/nospec-branch.h
···
 DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);

+DECLARE_STATIC_KEY_FALSE(mds_user_clear);
+DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
+
+#include <asm/segment.h>
+
+/**
+ * mds_clear_cpu_buffers - Mitigation for MDS vulnerability
+ *
+ * This uses the otherwise unused and obsolete VERW instruction in
+ * combination with microcode which triggers a CPU buffer flush when the
+ * instruction is executed.
+ */
+static inline void mds_clear_cpu_buffers(void)
+{
+    static const u16 ds = __KERNEL_DS;
+
+    /*
+     * Has to be the memory-operand variant because only that
+     * guarantees the CPU buffer flush functionality according to
+     * documentation. The register-operand variant does not.
+     * Works with any segment selector, but a valid writable
+     * data segment is the fastest variant.
+     *
+     * "cc" clobber is required because VERW modifies ZF.
+     */
+    asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
+}
+
+/**
+ * mds_user_clear_cpu_buffers - Mitigation for MDS vulnerability
+ *
+ * Clear CPU buffers if the corresponding static key is enabled
+ */
+static inline void mds_user_clear_cpu_buffers(void)
+{
+    if (static_branch_likely(&mds_user_clear))
+        mds_clear_cpu_buffers();
+}
+
+/**
+ * mds_idle_clear_cpu_buffers - Mitigation for MDS vulnerability
+ *
+ * Clear CPU buffers if the corresponding static key is enabled
+ */
+static inline void mds_idle_clear_cpu_buffers(void)
+{
+    if (static_branch_likely(&mds_idle_clear))
+        mds_clear_cpu_buffers();
+}
+
 #endif /* __ASSEMBLY__ */

 /*
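The two static keys above gate one shared flush helper from two call sites (user-space exit and idle entry), so each path can be switched on independently. A user-space model of that gating, with plain booleans standing in for the patched jump labels and a counter standing in for VERW (all names here are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two static keys: the flush helper is shared,
 * the call sites differ only in which key gates them. The real code
 * patches a jump label; a bool models the same decision. */
static bool key_mds_user_clear;   /* models mds_user_clear */
static bool key_mds_idle_clear;   /* models mds_idle_clear */
static int  flush_count;          /* models VERW executions */

static void model_clear_cpu_buffers(void) { flush_count++; }

/* Called on the return-to-user path. */
static void model_user_clear_cpu_buffers(void)
{
    if (key_mds_user_clear)
        model_clear_cpu_buffers();
}

/* Called on the idle-entry path (halt/mwait). */
static void model_idle_clear_cpu_buffers(void)
{
    if (key_mds_idle_clear)
        model_clear_cpu_buffers();
}
```

With only the user key enabled, a user exit flushes once and an idle entry flushes nothing, which is exactly the split `bugs.c` relies on.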
+6
arch/x86/include/asm/processor.h
···
 extern enum l1tf_mitigations l1tf_mitigation;

+enum mds_mitigations {
+    MDS_MITIGATION_OFF,
+    MDS_MITIGATION_FULL,
+    MDS_MITIGATION_VMWERV,
+};
+
 #endif /* _ASM_X86_PROCESSOR_H */
+131 -4
arch/x86/kernel/cpu/bugs.c
···
 static void __init spectre_v2_select_mitigation(void);
 static void __init ssb_select_mitigation(void);
 static void __init l1tf_select_mitigation(void);
+static void __init mds_select_mitigation(void);

 /* The base value of the SPEC_CTRL MSR that always has to be preserved. */
 u64 x86_spec_ctrl_base;
···
 DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 /* Control unconditional IBPB in switch_mm() */
 DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
+
+/* Control MDS CPU buffer clear before returning to user space */
+DEFINE_STATIC_KEY_FALSE(mds_user_clear);
+EXPORT_SYMBOL_GPL(mds_user_clear);
+/* Control MDS CPU buffer clear before idling (halt, mwait) */
+DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
+EXPORT_SYMBOL_GPL(mds_idle_clear);

 void __init check_bugs(void)
 {
···
     ssb_select_mitigation();

     l1tf_select_mitigation();
+
+    mds_select_mitigation();
+
+    arch_smt_update();

 #ifdef CONFIG_X86_32
     /*
···
     else if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD))
         wrmsrl(MSR_AMD64_LS_CFG, msrval);
 }
+
+#undef pr_fmt
+#define pr_fmt(fmt)    "MDS: " fmt
+
+/* Default mitigation for MDS-affected CPUs */
+static enum mds_mitigations mds_mitigation __ro_after_init = MDS_MITIGATION_FULL;
+static bool mds_nosmt __ro_after_init = false;
+
+static const char * const mds_strings[] = {
+    [MDS_MITIGATION_OFF]    = "Vulnerable",
+    [MDS_MITIGATION_FULL]   = "Mitigation: Clear CPU buffers",
+    [MDS_MITIGATION_VMWERV] = "Vulnerable: Clear CPU buffers attempted, no microcode",
+};
+
+static void __init mds_select_mitigation(void)
+{
+    if (!boot_cpu_has_bug(X86_BUG_MDS) || cpu_mitigations_off()) {
+        mds_mitigation = MDS_MITIGATION_OFF;
+        return;
+    }
+
+    if (mds_mitigation == MDS_MITIGATION_FULL) {
+        if (!boot_cpu_has(X86_FEATURE_MD_CLEAR))
+            mds_mitigation = MDS_MITIGATION_VMWERV;
+
+        static_branch_enable(&mds_user_clear);
+
+        if (!boot_cpu_has(X86_BUG_MSBDS_ONLY) &&
+            (mds_nosmt || cpu_mitigations_auto_nosmt()))
+            cpu_smt_disable(false);
+    }
+
+    pr_info("%s\n", mds_strings[mds_mitigation]);
+}
+
+static int __init mds_cmdline(char *str)
+{
+    if (!boot_cpu_has_bug(X86_BUG_MDS))
+        return 0;
+
+    if (!str)
+        return -EINVAL;
+
+    if (!strcmp(str, "off"))
+        mds_mitigation = MDS_MITIGATION_OFF;
+    else if (!strcmp(str, "full"))
+        mds_mitigation = MDS_MITIGATION_FULL;
+    else if (!strcmp(str, "full,nosmt")) {
+        mds_mitigation = MDS_MITIGATION_FULL;
+        mds_nosmt = true;
+    }
+
+    return 0;
+}
+early_param("mds", mds_cmdline);

 #undef pr_fmt
 #define pr_fmt(fmt)    "Spectre V2 : " fmt
···
     /* Set up IBPB and STIBP depending on the general spectre V2 command */
     spectre_v2_user_select_mitigation(cmd);
-
-    /* Enable STIBP if appropriate */
-    arch_smt_update();
 }

 static void update_stibp_msr(void * __unused)
···
         static_branch_disable(&switch_to_cond_stibp);
 }

+#undef pr_fmt
+#define pr_fmt(fmt) fmt
+
+/* Update the static key controlling the MDS CPU buffer clear in idle */
+static void update_mds_branch_idle(void)
+{
+    /*
+     * Enable the idle clearing if SMT is active on CPUs which are
+     * affected only by MSBDS and not any other MDS variant.
+     *
+     * The other variants cannot be mitigated when SMT is enabled, so
+     * clearing the buffers on idle just to prevent the Store Buffer
+     * repartitioning leak would be a window dressing exercise.
+     */
+    if (!boot_cpu_has_bug(X86_BUG_MSBDS_ONLY))
+        return;
+
+    if (sched_smt_active())
+        static_branch_enable(&mds_idle_clear);
+    else
+        static_branch_disable(&mds_idle_clear);
+}
+
+#define MDS_MSG_SMT "MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.\n"
+
 void arch_smt_update(void)
 {
     /* Enhanced IBRS implies STIBP. No update required. */
···
     case SPECTRE_V2_USER_PRCTL:
     case SPECTRE_V2_USER_SECCOMP:
         update_indir_branch_cond();
+        break;
+    }
+
+    switch (mds_mitigation) {
+    case MDS_MITIGATION_FULL:
+    case MDS_MITIGATION_VMWERV:
+        if (sched_smt_active() && !boot_cpu_has(X86_BUG_MSBDS_ONLY))
+            pr_warn_once(MDS_MSG_SMT);
+        update_mds_branch_idle();
+        break;
+    case MDS_MITIGATION_OFF:
         break;
     }
···
     pr_info("You may make it effective by booting the kernel with mem=%llu parameter.\n",
             half_pa);
     pr_info("However, doing so will make a part of your RAM unusable.\n");
-    pr_info("Reading https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html might help you decide.\n");
+    pr_info("Reading https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html might help you decide.\n");
     return;
 }
···
 early_param("l1tf", l1tf_cmdline);

 #undef pr_fmt
+#define pr_fmt(fmt) fmt

 #ifdef CONFIG_SYSFS
···
     return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
 }
 #endif
+
+static ssize_t mds_show_state(char *buf)
+{
+    if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
+        return sprintf(buf, "%s; SMT Host state unknown\n",
+                       mds_strings[mds_mitigation]);
+    }
+
+    if (boot_cpu_has(X86_BUG_MSBDS_ONLY)) {
+        return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
+                       (mds_mitigation == MDS_MITIGATION_OFF ? "vulnerable" :
+                        sched_smt_active() ? "mitigated" : "disabled"));
+    }
+
+    return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
+                   sched_smt_active() ? "vulnerable" : "disabled");
+}

 static char *stibp_state(void)
 {
···
         if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
             return l1tf_show_state(buf);
         break;
+
+    case X86_BUG_MDS:
+        return mds_show_state(buf);
+
     default:
         break;
     }
···
 ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf)
 {
     return cpu_show_common(dev, attr, buf, X86_BUG_L1TF);
+}
+
+ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *buf)
+{
+    return cpu_show_common(dev, attr, buf, X86_BUG_MDS);
 }
 #endif
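The selection logic in `mds_select_mitigation()` above reduces to a small decision function over three inputs: is the CPU affected, is the MD_CLEAR microcode present, and were mitigations disabled on the command line. A pure-function sketch of that table (a model, not the kernel's code):

```c
#include <assert.h>

/* Models the three states of mds_select_mitigation() above. */
enum mds_state { MDS_OFF, MDS_FULL, MDS_VMWERV };

/* Illustrative helper: the buffer clear is attempted even without
 * MD_CLEAR microcode (VERW then has no flushing side effect), which
 * is why the no-microcode case is reported as the distinct "VMWERV"
 * (best-effort) state rather than FULL. */
static enum mds_state pick_mds_mitigation(int has_bug_mds,
                                          int has_md_clear,
                                          int mitigations_off)
{
    if (!has_bug_mds || mitigations_off)
        return MDS_OFF;
    return has_md_clear ? MDS_FULL : MDS_VMWERV;
}
```

The SMT handling is intentionally left out of the sketch; as the merge text notes, it is reported but not forced off by default.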
+71 -50
arch/x86/kernel/cpu/common.c
···
 #endif
 }

-static const __initconst struct x86_cpu_id cpu_no_speculation[] = {
-    { X86_VENDOR_INTEL,   6, INTEL_FAM6_ATOM_SALTWELL,        X86_FEATURE_ANY },
-    { X86_VENDOR_INTEL,   6, INTEL_FAM6_ATOM_SALTWELL_TABLET, X86_FEATURE_ANY },
-    { X86_VENDOR_INTEL,   6, INTEL_FAM6_ATOM_BONNELL_MID,     X86_FEATURE_ANY },
-    { X86_VENDOR_INTEL,   6, INTEL_FAM6_ATOM_SALTWELL_MID,    X86_FEATURE_ANY },
-    { X86_VENDOR_INTEL,   6, INTEL_FAM6_ATOM_BONNELL,         X86_FEATURE_ANY },
-    { X86_VENDOR_CENTAUR, 5 },
-    { X86_VENDOR_INTEL,   5 },
-    { X86_VENDOR_NSC,     5 },
-    { X86_VENDOR_ANY,     4 },
+#define NO_SPECULATION  BIT(0)
+#define NO_MELTDOWN     BIT(1)
+#define NO_SSB          BIT(2)
+#define NO_L1TF         BIT(3)
+#define NO_MDS          BIT(4)
+#define MSBDS_ONLY      BIT(5)
+
+#define VULNWL(_vendor, _family, _model, _whitelist)    \
+    { X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist }
+
+#define VULNWL_INTEL(model, whitelist)      \
+    VULNWL(INTEL, 6, INTEL_FAM6_##model, whitelist)
+
+#define VULNWL_AMD(family, whitelist)       \
+    VULNWL(AMD, family, X86_MODEL_ANY, whitelist)
+
+#define VULNWL_HYGON(family, whitelist)     \
+    VULNWL(HYGON, family, X86_MODEL_ANY, whitelist)
+
+static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
+    VULNWL(ANY,     4, X86_MODEL_ANY,   NO_SPECULATION),
+    VULNWL(CENTAUR, 5, X86_MODEL_ANY,   NO_SPECULATION),
+    VULNWL(INTEL,   5, X86_MODEL_ANY,   NO_SPECULATION),
+    VULNWL(NSC,     5, X86_MODEL_ANY,   NO_SPECULATION),
+
+    /* Intel Family 6 */
+    VULNWL_INTEL(ATOM_SALTWELL,         NO_SPECULATION),
+    VULNWL_INTEL(ATOM_SALTWELL_TABLET,  NO_SPECULATION),
+    VULNWL_INTEL(ATOM_SALTWELL_MID,     NO_SPECULATION),
+    VULNWL_INTEL(ATOM_BONNELL,          NO_SPECULATION),
+    VULNWL_INTEL(ATOM_BONNELL_MID,      NO_SPECULATION),
+
+    VULNWL_INTEL(ATOM_SILVERMONT,       NO_SSB | NO_L1TF | MSBDS_ONLY),
+    VULNWL_INTEL(ATOM_SILVERMONT_X,     NO_SSB | NO_L1TF | MSBDS_ONLY),
+    VULNWL_INTEL(ATOM_SILVERMONT_MID,   NO_SSB | NO_L1TF | MSBDS_ONLY),
+    VULNWL_INTEL(ATOM_AIRMONT,          NO_SSB | NO_L1TF | MSBDS_ONLY),
+    VULNWL_INTEL(XEON_PHI_KNL,          NO_SSB | NO_L1TF | MSBDS_ONLY),
+    VULNWL_INTEL(XEON_PHI_KNM,          NO_SSB | NO_L1TF | MSBDS_ONLY),
+
+    VULNWL_INTEL(CORE_YONAH,            NO_SSB),
+
+    VULNWL_INTEL(ATOM_AIRMONT_MID,      NO_L1TF | MSBDS_ONLY),
+
+    VULNWL_INTEL(ATOM_GOLDMONT,         NO_MDS | NO_L1TF),
+    VULNWL_INTEL(ATOM_GOLDMONT_X,       NO_MDS | NO_L1TF),
+    VULNWL_INTEL(ATOM_GOLDMONT_PLUS,    NO_MDS | NO_L1TF),
+
+    /* AMD Family 0xf - 0x12 */
+    VULNWL_AMD(0x0f,    NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS),
+    VULNWL_AMD(0x10,    NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS),
+    VULNWL_AMD(0x11,    NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS),
+    VULNWL_AMD(0x12,    NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS),
+
+    /* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
+    VULNWL_AMD(X86_FAMILY_ANY,      NO_MELTDOWN | NO_L1TF | NO_MDS),
+    VULNWL_HYGON(X86_FAMILY_ANY,    NO_MELTDOWN | NO_L1TF | NO_MDS),
     {}
 };

-static const __initconst struct x86_cpu_id cpu_no_meltdown[] = {
-    { X86_VENDOR_AMD },
-    { X86_VENDOR_HYGON },
-    {}
-};
-
-/* Only list CPUs which speculate but are non susceptible to SSB */
-static const __initconst struct x86_cpu_id cpu_no_spec_store_bypass[] = {
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_X },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_MID },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_CORE_YONAH },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM },
-    { X86_VENDOR_AMD, 0x12, },
-    { X86_VENDOR_AMD, 0x11, },
-    { X86_VENDOR_AMD, 0x10, },
-    { X86_VENDOR_AMD, 0xf, },
-    {}
-};
-
-static const __initconst struct x86_cpu_id cpu_no_l1tf[] = {
-    /* in addition to cpu_no_speculation */
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_X },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_MID },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT_MID },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_X },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_PLUS },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL },
-    { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM },
-    {}
-};
+static bool __init cpu_matches(unsigned long which)
+{
+    const struct x86_cpu_id *m = x86_match_cpu(cpu_vuln_whitelist);
+
+    return m && !!(m->driver_data & which);
+}

 static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 {
     u64 ia32_cap = 0;

-    if (x86_match_cpu(cpu_no_speculation))
+    if (cpu_matches(NO_SPECULATION))
         return;

     setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
···
     if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
         rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);

-    if (!x86_match_cpu(cpu_no_spec_store_bypass) &&
-        !(ia32_cap & ARCH_CAP_SSB_NO) &&
+    if (!cpu_matches(NO_SSB) && !(ia32_cap & ARCH_CAP_SSB_NO) &&
         !cpu_has(c, X86_FEATURE_AMD_SSB_NO))
         setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);

     if (ia32_cap & ARCH_CAP_IBRS_ALL)
         setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);

-    if (x86_match_cpu(cpu_no_meltdown))
+    if (!cpu_matches(NO_MDS) && !(ia32_cap & ARCH_CAP_MDS_NO)) {
+        setup_force_cpu_bug(X86_BUG_MDS);
+        if (cpu_matches(MSBDS_ONLY))
+            setup_force_cpu_bug(X86_BUG_MSBDS_ONLY);
+    }
+
+    if (cpu_matches(NO_MELTDOWN))
         return;

     /* Rogue Data Cache Load? No! */
···
     setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);

-    if (x86_match_cpu(cpu_no_l1tf))
+    if (cpu_matches(NO_L1TF))
         return;

     setup_force_cpu_bug(X86_BUG_L1TF);
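The refactoring above collapses four per-vulnerability tables into one whitelist whose `driver_data` carries "not affected" flag bits, queried one flag at a time. A minimal user-space sketch of that scheme (table contents and names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* Flag bits: each marks a vulnerability the CPU is known NOT to have,
 * mirroring the NO_MDS / MSBDS_ONLY style used above. */
#define WL_NO_MDS     (1UL << 0)
#define WL_MSBDS_ONLY (1UL << 1)

struct wl_entry { int family; int model; unsigned long flags; };

/* Illustrative whitelist; real entries live in cpu_vuln_whitelist[]. */
static const struct wl_entry whitelist[] = {
    { 6, 0x4c, WL_MSBDS_ONLY },   /* an Airmont-like entry */
    { 6, 0x5c, WL_NO_MDS },       /* a Goldmont-like entry */
};

/* Mirrors cpu_matches(): true only if the CPU is in the table AND the
 * requested flag is set; unlisted CPUs are assumed affected. */
static int wl_matches(int family, int model, unsigned long which)
{
    for (size_t i = 0; i < sizeof(whitelist) / sizeof(whitelist[0]); i++) {
        if (whitelist[i].family == family && whitelist[i].model == model)
            return (whitelist[i].flags & which) != 0;
    }
    return 0;
}
```

The "assume affected unless whitelisted" default is the important design choice: a new, unknown CPU model gets every bug bit set until it either lands in the table or sets the corresponding `ARCH_CAP_*_NO` MSR bit.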
+4
arch/x86/kernel/nmi.c
···
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
 #include <asm/cache.h>
+#include <asm/nospec-branch.h>

 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
···
     write_cr2(this_cpu_read(nmi_cr2));
     if (this_cpu_dec_return(nmi_state))
         goto nmi_restart;
+
+    if (user_mode(regs))
+        mds_user_clear_cpu_buffers();
 }
 NOKPROBE_SYMBOL(do_nmi);
+8
arch/x86/kernel/traps.c
···
 #include <asm/alternative.h>
 #include <asm/fpu/xstate.h>
 #include <asm/trace/mpx.h>
+#include <asm/nospec-branch.h>
 #include <asm/mpx.h>
 #include <asm/vm86.h>
 #include <asm/umip.h>
···
         regs->ip = (unsigned long)general_protection;
         regs->sp = (unsigned long)&gpregs->orig_ax;

+        /*
+         * This situation can be triggered by userspace via
+         * modify_ldt(2) and the return does not take the regular
+         * user space exit, so a CPU buffer clear is required when
+         * MDS mitigation is enabled.
+         */
+        mds_user_clear_cpu_buffers();
         return;
     }
 #endif
+2 -1
arch/x86/kvm/cpuid.c
···
     /* cpuid 7.0.edx*/
     const u32 kvm_cpuid_7_0_edx_x86_features =
         F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
-        F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP);
+        F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
+        F(MD_CLEAR);

     /* all calls to cpuid_count() should be made on the same cpu */
     get_cpu();
+5 -2
arch/x86/kvm/vmx/vmx.c
···
      */
     x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);

+    /* L1D Flush includes CPU buffer clear to mitigate MDS */
     if (static_branch_unlikely(&vmx_l1d_should_flush))
         vmx_l1d_flush(vcpu);
+    else if (static_branch_unlikely(&mds_user_clear))
+        mds_clear_cpu_buffers();

     if (vcpu->arch.cr2 != read_cr2())
         write_cr2(vcpu->arch.cr2);
···
     return ERR_PTR(err);
 }

-#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.\n"
-#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.\n"
+#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
+#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"

 static int vmx_vm_init(struct kvm *kvm)
 {
+8
drivers/base/cpu.c
···
     return sprintf(buf, "Not affected\n");
 }

+ssize_t __weak cpu_show_mds(struct device *dev,
+                            struct device_attribute *attr, char *buf)
+{
+    return sprintf(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
 static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
 static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
+static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);

 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
     &dev_attr_meltdown.attr,
···
     &dev_attr_spectre_v2.attr,
     &dev_attr_spec_store_bypass.attr,
     &dev_attr_l1tf.attr,
+    &dev_attr_mds.attr,
     NULL
 };
+2
include/linux/cpu.h
···
                             struct device_attribute *attr, char *buf);
 extern ssize_t cpu_show_l1tf(struct device *dev,
                              struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_mds(struct device *dev,
+                            struct device_attribute *attr, char *buf);

 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,
+1 -1
tools/power/x86/turbostat/Makefile
···
 endif

 turbostat : turbostat.c
-override CFLAGS +=    -Wall
+override CFLAGS +=    -Wall -I../../../include
 override CFLAGS +=    -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"'
 override CFLAGS +=    -DINTEL_FAMILY_HEADER='"../../../../arch/x86/include/asm/intel-family.h"'
+1 -1
tools/power/x86/x86_energy_perf_policy/Makefile
···
 endif

 x86_energy_perf_policy : x86_energy_perf_policy.c
-override CFLAGS +=    -Wall
+override CFLAGS +=    -Wall -I../../../include
 override CFLAGS +=    -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"'

 %: %.c