Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'kvm-x86-misc-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.5:

* Move handling of PAT out of MTRR code and dedup SVM+VMX code

* Fix output of PIC poll command emulation when there's an interrupt

* Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.

* Misc cleanups

+493 -86
+1
Documentation/process/maintainer-handbooks.rst
···
     maintainer-tip
     maintainer-netdev
 +   maintainer-kvm-x86
+390
Documentation/process/maintainer-kvm-x86.rst
···
.. SPDX-License-Identifier: GPL-2.0

KVM x86
=======

Foreword
--------
KVM strives to be a welcoming community; contributions from newcomers are
valued and encouraged. Please do not be discouraged or intimidated by the
length of this document and the many rules/guidelines it contains. Everyone
makes mistakes, and everyone was a newbie at some point. So long as you make
an honest effort to follow KVM x86's guidelines, are receptive to feedback,
and learn from any mistakes you make, you will be welcomed with open arms, not
torches and pitchforks.

TL;DR
-----
Testing is mandatory. Be consistent with established styles and patterns.

Trees
-----
KVM x86 is currently in a transition period from being part of the main KVM
tree, to being "just another KVM arch". As such, KVM x86 is split across the
main KVM tree, ``git.kernel.org/pub/scm/virt/kvm/kvm.git``, and a KVM x86
specific tree, ``github.com/kvm-x86/linux.git``.

Generally speaking, fixes for the current cycle are applied directly to the
main KVM tree, while all development for the next cycle is routed through the
KVM x86 tree. In the unlikely event that a fix for the current cycle is routed
through the KVM x86 tree, it will be applied to the ``fixes`` branch before
making its way to the main KVM tree.

Note, this transition period is expected to last quite some time, i.e. will be
the status quo for the foreseeable future.

Branches
~~~~~~~~
The KVM x86 tree is organized into multiple topic branches. The purpose of
using finer-grained topic branches is to make it easier to keep tabs on an area
of development, and to limit the collateral damage of human errors and/or buggy
commits, e.g.
dropping the HEAD commit of a topic branch has no impact on other
in-flight commits' SHA1 hashes, and having to reject a pull request due to bugs
delays only that topic branch.

All topic branches, except for ``next`` and ``fixes``, are rolled into ``next``
via a Cthulhu merge on an as-needed basis, i.e. when a topic branch is updated.
As a result, force pushes to ``next`` are common.

Lifecycle
~~~~~~~~~
Fixes that target the current release, a.k.a. mainline, are typically applied
directly to the main KVM tree, i.e. do not route through the KVM x86 tree.

Changes that target the next release are routed through the KVM x86 tree. Pull
requests (from KVM x86 to main KVM) are sent for each KVM x86 topic branch,
typically the week before Linus' opening of the merge window, e.g. the week
following rc7 for "normal" releases. If all goes well, the topic branches are
rolled into the main KVM pull request sent during Linus' merge window.

The KVM x86 tree doesn't have its own official merge window, but there's a soft
close around rc5 for new features, and a soft close around rc6 for fixes (for
the next release; see above for fixes that target the current release).

Timeline
~~~~~~~~
Submissions are typically reviewed and applied in FIFO order, with some wiggle
room for the size of a series, patches that are "cache hot", etc. Fixes,
especially for the current release and/or stable trees, get to jump the queue.
Patches that will be taken through a non-KVM tree (most often through the tip
tree) and/or have other acks/reviews also jump the queue to some extent.

Note, the vast majority of review is done between rc1 and rc6, give or take.
The period between rc6 and the next rc1 is used to catch up on other tasks,
i.e. radio silence during this period isn't unusual.
Pings to get a status update are welcome, but keep in mind the timing of the
current release cycle and have realistic expectations. If you are pinging for
acceptance, i.e. not just for feedback or an update, please do everything you
can, within reason, to ensure that your patches are ready to be merged! Pings
on series that break the build or fail tests lead to unhappy maintainers!

Development
-----------

Base Tree/Branch
~~~~~~~~~~~~~~~~
Fixes that target the current release, a.k.a. mainline, should be based on
``git://git.kernel.org/pub/scm/virt/kvm/kvm.git master``. Note, fixes do not
automatically warrant inclusion in the current release. There is no singular
rule, but typically only fixes for bugs that are urgent, critical, and/or were
introduced in the current release should target the current release.

Everything else should be based on ``kvm-x86/next``, i.e. there is no need to
select a specific topic branch as the base. If there are conflicts and/or
dependencies across topic branches, it is the maintainer's job to sort them
out.

The only exception to using ``kvm-x86/next`` as the base is if a patch/series
is a multi-arch series, i.e. has non-trivial modifications to common KVM code
and/or has more than superficial changes to other architectures' code. Multi-
arch patch/series should instead be based on a common, stable point in KVM's
history, e.g. the release candidate upon which ``kvm-x86 next`` is based. If
you're unsure whether a patch/series is truly multi-arch, err on the side of
caution and treat it as multi-arch, i.e. use a common base.

Coding Style
~~~~~~~~~~~~
When it comes to style, naming, patterns, etc., consistency is the number one
priority in KVM x86. If all else fails, match what already exists.
With a few caveats listed below, follow the tip tree maintainers' preferred
:ref:`maintainer-tip-coding-style`, as patches/series often touch both KVM and
non-KVM x86 files, i.e. draw the attention of KVM *and* tip tree maintainers.

Using reverse fir tree, a.k.a. reverse Christmas tree or reverse XMAS tree, for
variable declarations isn't strictly required, though it is still preferred.

Except for a handful of special snowflakes, do not use kernel-doc comments for
functions. The vast majority of "public" KVM functions aren't truly public as
they are intended only for KVM-internal consumption (there are plans to
privatize KVM's headers and exports to enforce this).

Comments
~~~~~~~~
Write comments using imperative mood and avoid pronouns. Use comments to
provide a high level overview of the code, and/or to explain why the code does
what it does. Do not reiterate what the code literally does; let the code
speak for itself. If the code itself is inscrutable, comments will not help.

SDM and APM References
~~~~~~~~~~~~~~~~~~~~~~
Much of KVM's code base is directly tied to architectural behavior defined in
Intel's Software Development Manual (SDM) and AMD's Architecture Programmer’s
Manual (APM). Use of "Intel's SDM" and "AMD's APM", or even just "SDM" or
"APM", without additional context is a-ok.

Do not reference specific sections, tables, figures, etc. by number, especially
not in comments. Instead, if necessary (see below), copy-paste the relevant
snippet and reference sections/tables/figures by name. The layouts of the SDM
and APM are constantly changing, and so the numbers/labels aren't stable.

Generally speaking, do not explicitly reference or copy-paste from the SDM or
APM in comments.
With few exceptions, KVM *must* honor architectural behavior,
therefore it's implied that KVM behavior is emulating SDM and/or APM behavior.
Note, referencing the SDM/APM in changelogs to justify the change and provide
context is perfectly ok and encouraged.

Shortlog
~~~~~~~~
The preferred prefix format is ``KVM: <topic>:``, where ``<topic>`` is one of::

  - x86
  - x86/mmu
  - x86/pmu
  - x86/xen
  - selftests
  - SVM
  - nSVM
  - VMX
  - nVMX

**DO NOT use x86/kvm!** ``x86/kvm`` is used exclusively for Linux-as-a-KVM-guest
changes, i.e. for arch/x86/kernel/kvm.c. Do not use file names or complete file
paths as the subject/shortlog prefix.

Note, these don't align with the topic branches (the topic branches care much
more about code conflicts).

All names are case sensitive! ``KVM: x86:`` is good, ``kvm: vmx:`` is not.

Capitalize the first word of the condensed patch description, but omit ending
punctuation. E.g.::

  KVM: x86: Fix a null pointer dereference in function_xyz()

not::

  kvm: x86: fix a null pointer dereference in function_xyz.

If a patch touches multiple topics, traverse up the conceptual tree to find the
first common parent (which is often simply ``x86``). When in doubt,
``git log path/to/file`` should provide a reasonable hint.

New topics do occasionally pop up, but please start an on-list discussion if
you want to propose introducing a new topic, i.e. don't go rogue.

See :ref:`the_canonical_patch_format` for more information, with one amendment:
do not treat the 70-75 character limit as an absolute, hard limit. Instead,
use 75 characters as a firm-but-not-hard limit, and use 80 characters as a hard
limit. I.e.
let the shortlog run a few characters over the standard limit if
you have good reason to do so.

Changelog
~~~~~~~~~
Most importantly, write changelogs using imperative mood and avoid pronouns.

See :ref:`describe_changes` for more information, with one amendment: lead with
a short blurb on the actual changes, and then follow up with the context and
background. Note! This order directly conflicts with the tip tree's preferred
approach! Please follow the tip tree's preferred style when sending patches
that primarily target arch/x86 code that is _NOT_ KVM code.

Stating what a patch does before diving into details is preferred by KVM x86
for several reasons. First and foremost, what code is actually being changed
is arguably the most important information, and so that info should be easy to
find. Changelogs that bury the "what's actually changing" in a one-liner after
3+ paragraphs of background make it very hard to find that information.

For initial review, one could argue the "what's broken" is more important, but
for skimming logs and git archaeology, the gory details matter less and less.
E.g. when doing a series of "git blame", the details of each change along the
way are useless, the details only matter for the culprit. Providing the "what
changed" makes it easy to quickly determine whether or not a commit might be of
interest.

Another benefit of stating "what's changing" first is that it's almost always
possible to state "what's changing" in a single sentence. Conversely, all but
the most simple bugs require multiple sentences or paragraphs to fully describe
the problem. If both the "what's changing" and "what's the bug" are super
short then the order doesn't matter.
But if one is shorter (almost always the
"what's changing"), then covering the shorter one first is advantageous because
it's less of an inconvenience for readers/reviewers that have a strict ordering
preference. E.g. having to skip one sentence to get to the context is less
painful than having to skip three paragraphs to get to "what's changing".

Fixes
~~~~~
If a change fixes a KVM/kernel bug, add a Fixes: tag even if the change doesn't
need to be backported to stable kernels, and even if the change fixes a bug in
an older release.

Conversely, if a fix does need to be backported, explicitly tag the patch with
"Cc: stable@vger.kernel.org" (though the email itself doesn't need to Cc:
stable); KVM x86 opts out of backporting Fixes: by default. Some auto-selected
patches do get backported, but require explicit maintainer approval (search
MANUALSEL).

Function References
~~~~~~~~~~~~~~~~~~~
When a function is mentioned in a comment, changelog, or shortlog (or anywhere
for that matter), use the format ``function_name()``. The parentheses provide
context and disambiguate the reference.

Testing
-------
At a bare minimum, *all* patches in a series must build cleanly for KVM_INTEL=m,
KVM_AMD=m, and KVM_WERROR=y. Building every possible combination of Kconfigs
isn't feasible, but the more the merrier. KVM_SMM, KVM_XEN, PROVE_LOCKING, and
X86_64 are particularly interesting knobs to turn.

Running KVM selftests and KVM-unit-tests is also mandatory (and stating the
obvious, the tests need to pass). The only exception is for changes that have
negligible probability of affecting runtime behavior, e.g. patches that only
modify comments. When possible and relevant, testing on both Intel and AMD is
strongly preferred. Booting an actual VM is encouraged, but not mandatory.
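The mandatory build configuration above can be captured as a Kconfig fragment
(a sketch; option dependencies may vary by kernel version, and the fragment
would typically be applied with ``scripts/kconfig/merge_config.sh``):

```
# Mandatory build coverage: KVM_INTEL=m, KVM_AMD=m, KVM_WERROR=y.
# For broader coverage, do additional builds that flip KVM_SMM, KVM_XEN,
# PROVE_LOCKING, and X86_64.
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_WERROR=y
```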
For changes that touch KVM's shadow paging code, running with TDP (EPT/NPT)
disabled is mandatory. For changes that affect common KVM MMU code, running
with TDP disabled is strongly encouraged. For all other changes, if the code
being modified depends on and/or interacts with a module param, testing with
the relevant settings is mandatory.

Note, KVM selftests and KVM-unit-tests do have known failures. If you suspect
a failure is not due to your changes, verify that the *exact same* failure
occurs with and without your changes.

Changes that touch reStructured Text documentation, i.e. .rst files, must build
htmldocs cleanly, i.e. with no new warnings or errors.

If you can't fully test a change, e.g. due to lack of hardware, clearly state
what level of testing you were able to do, e.g. in the cover letter.

New Features
~~~~~~~~~~~~
With one exception, new features *must* come with test coverage. KVM specific
tests aren't strictly required, e.g. if coverage is provided by running a
sufficiently enabled guest VM, or by running a related kernel selftest in a VM,
but dedicated KVM tests are preferred in all cases. Negative testcases in
particular are mandatory for enabling of new hardware features as error and
exception flows are rarely exercised simply by running a VM.

The only exception to this rule is if KVM is simply advertising support for a
feature via KVM_GET_SUPPORTED_CPUID, i.e. for instructions/features that KVM
can't prevent a guest from using and for which there is no true enabling.

Note, "new features" does not just mean "new hardware features"! New features
that can't be well validated using existing KVM selftests and/or KVM-unit-tests
must come with tests.
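For the TDP-disabled runs described above, the relevant knobs are the ``ept``
and ``npt`` module parameters of kvm-intel and kvm-amd. A tiny helper, purely
illustrative (the ``tdp_off_cmd`` name is made up; the parameters are real),
that prints the reload command for a given vendor:

```shell
# Print the modprobe invocation that disables TDP, per the "run with TDP
# disabled" testing guidance.  ept=0 disables EPT (Intel), npt=0 disables
# NPT (AMD); unload the module first so the parameter takes effect.
tdp_off_cmd() {
    case "$1" in
        intel) echo "modprobe kvm_intel ept=0" ;;  # disable EPT
        amd)   echo "modprobe kvm_amd npt=0" ;;    # disable NPT
        *)     echo "usage: tdp_off_cmd intel|amd" >&2; return 1 ;;
    esac
}

tdp_off_cmd intel   # -> modprobe kvm_intel ept=0
```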
Posting new feature development without tests to get early feedback is more
than welcome, but such submissions should be tagged RFC, and the cover letter
should clearly state what type of feedback is requested/expected. Do not abuse
the RFC process; RFCs will typically not receive in-depth review.

Bug Fixes
~~~~~~~~~
Except for "obvious" found-by-inspection bugs, fixes must be accompanied by a
reproducer for the bug being fixed. In many cases the reproducer is implicit,
e.g. for build errors and test failures, but it should still be clear to
readers what is broken and how to verify the fix. Some leeway is given for
bugs that are found via non-public workloads/tests, but providing regression
tests for such bugs is strongly preferred.

In general, regression tests are preferred for any bug that is not trivial to
hit. E.g. even if the bug was originally found by a fuzzer such as syzkaller,
a targeted regression test may be warranted if the bug requires hitting a
one-in-a-million type race condition.

Note, KVM bugs are rarely urgent *and* non-trivial to reproduce. Ask yourself
if a bug is really truly the end of the world before posting a fix without a
reproducer.

Posting
-------

Links
~~~~~
Do not explicitly reference bug reports, prior versions of a patch/series, etc.
via ``In-Reply-To:`` headers. Using ``In-Reply-To:`` becomes an unholy mess
for large series and/or when the version count gets high, and ``In-Reply-To:``
is useless for anyone that doesn't have the original message, e.g. if someone
wasn't Cc'd on the bug report or if the list of recipients changes between
versions.

To link to a bug report, previous version, or anything of interest, use lore
links.
For referencing previous version(s), generally speaking do not include
a Link: in the changelog as there is no need to record the history in git, i.e.
put the link in the cover letter or in the section git ignores. Do provide a
formal Link: for bug reports and/or discussions that led to the patch. The
context of why a change was made is highly valuable for future readers.

Git Base
~~~~~~~~
If you are using git version 2.9.0 or later (Googlers, this is all of you!),
use ``git format-patch`` with the ``--base`` flag to automatically include the
base tree information in the generated patches.

Note, ``--base=auto`` works as expected if and only if a branch's upstream is
set to the base topic branch, e.g. it will do the wrong thing if your upstream
is set to your personal repository for backup purposes. An alternative "auto"
solution is to derive the names of your development branches based on their
KVM x86 topic, and feed that into ``--base``. E.g. ``x86/pmu/my_branch_name``,
and then write a small wrapper to extract ``pmu`` from the current branch name
to yield ``--base=x/pmu``, where ``x`` is whatever name your repository uses to
track the KVM x86 remote.

Co-Posting Tests
~~~~~~~~~~~~~~~~
KVM selftests that are associated with KVM changes, e.g. regression tests for
bug fixes, should be posted along with the KVM changes as a single series. The
standard kernel rules for bisection apply, i.e. KVM changes that result in test
failures should be ordered after the selftests updates, and vice versa, new
tests that fail due to KVM bugs should be ordered after the KVM fixes.

KVM-unit-tests should *always* be posted separately. Tools, e.g. b4 am, don't
know that KVM-unit-tests is a separate repository and get confused when patches
in a series apply on different trees.
To tie KVM-unit-tests patches back to
KVM patches, first post the KVM changes and then provide a lore Link: to the
KVM patch/series in the KVM-unit-tests patch(es).

Notifications
-------------
When a patch/series is officially accepted, a notification email will be sent
in reply to the original posting (cover letter for multi-patch series). The
notification will include the tree and topic branch, along with the SHA1s of
the commits of applied patches.

If a subset of patches is applied, this will be clearly stated in the
notification. Unless stated otherwise, it's implied that any patches in the
series that were not accepted need more work and should be submitted in a new
version.

If for some reason a patch is dropped after officially being accepted, a reply
will be sent to the notification email explaining why the patch was dropped, as
well as the next steps.

SHA1 Stability
~~~~~~~~~~~~~~
SHA1s are not 100% guaranteed to be stable until they land in Linus' tree! A
SHA1 is *usually* stable once a notification has been sent, but things happen.
In most cases, an update to the notification email will be provided if an
applied patch's SHA1 changes. However, in some scenarios, e.g. if all KVM x86
branches need to be rebased, individual notifications will not be given.

Vulnerabilities
---------------
Bugs that can be exploited by the guest to attack the host (kernel or
userspace), or that can be exploited by a nested VM to *its* host (L2 attacking
L1), are of particular interest to KVM. Please follow the protocol for
:ref:`securitybugs` if you suspect a bug can lead to an escape, data leak, etc.
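The branch-naming trick in the Git Base section of the document above can be
scripted. A minimal sketch, in which the ``base_for_branch`` helper name and
the remote name ``x`` are assumptions, not anything the document prescribes:

```shell
# Derive the --base argument for `git format-patch` from a branch named
# <arch>/<topic>/<description>, e.g. x86/pmu/my_branch_name -> --base=x/pmu,
# where "x" is the assumed name of the remote tracking the KVM x86 tree.
base_for_branch() {
    topic=$(printf '%s\n' "$1" | cut -d/ -f2)   # second path component
    printf -- '--base=x/%s\n' "$topic"
}

# A real wrapper would feed in the current branch, e.g.:
#   base_for_branch "$(git rev-parse --abbrev-ref HEAD)"
base_for_branch "x86/pmu/my_branch_name"   # -> --base=x/pmu
```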
+2
Documentation/process/maintainer-tip.rst
···
  Some of these options are x86-specific and can be left out when testing
  on other architectures.
 
+ .. _maintainer-tip-coding-style:
+
  Coding style notes
  ------------------
 
+1 -1
Documentation/virt/kvm/x86/mmu.rst
···
  role.passthrough:
    The page is not backed by a guest page table, but its first entry
    points to one. This is set if NPT uses 5-level page tables (host
-   CR4.LA57=1) and is shadowing L1's 4-level NPT (L1 CR4.LA57=1).
+   CR4.LA57=1) and is shadowing L1's 4-level NPT (L1 CR4.LA57=0).
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.
+1
MAINTAINERS
···
  M:	Paolo Bonzini <pbonzini@redhat.com>
  L:	kvm@vger.kernel.org
  S:	Supported
+ P:	Documentation/process/maintainer-kvm-x86.rst
  T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
  F:	arch/x86/include/asm/kvm*
  F:	arch/x86/include/asm/svm.h
+4 -9
arch/x86/kvm/cpuid.c
···
  				     struct kvm_cpuid2 *cpuid,
  				     struct kvm_cpuid_entry2 __user *entries)
  {
- 	int r;
- 
- 	r = -E2BIG;
  	if (cpuid->nent < vcpu->arch.cpuid_nent)
- 		goto out;
- 	r = -EFAULT;
+ 		return -E2BIG;
+ 
  	if (copy_to_user(entries, vcpu->arch.cpuid_entries,
  			 vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2)))
- 		goto out;
- 	return 0;
+ 		return -EFAULT;
  
- out:
  	cpuid->nent = vcpu->arch.cpuid_nent;
- 	return r;
+ 	return 0;
  }
  
  /* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
+3
arch/x86/kvm/i8259.c
···
  		pic_clear_isr(s, ret);
  		if (addr1 >> 7 || ret != 2)
  			pic_update_irq(s->pics_state);
+ 		/* Bit 7 is 1, means there's an interrupt */
+ 		ret |= 0x80;
  	} else {
+ 		/* Bit 7 is 0, means there's no interrupt */
  		ret = 0x07;
  		pic_update_irq(s->pics_state);
  	}
-5
arch/x86/kvm/lapic.c
···
  #define mod_64(x, y) ((x) % (y))
  #endif
  
- #define PRId64 "d"
- #define PRIx64 "llx"
- #define PRIu64 "u"
- #define PRIo64 "o"
- 
  /* 14 is the version for Xeon and Pentium 8.4.8*/
  #define APIC_VERSION 0x14UL
  #define LAPIC_MMIO_LENGTH (1 << 12)
+31 -33
arch/x86/kvm/mtrr.c
···
  #define IA32_MTRR_DEF_TYPE_FE		(1ULL << 10)
  #define IA32_MTRR_DEF_TYPE_TYPE_MASK	(0xff)
  
+ static bool is_mtrr_base_msr(unsigned int msr)
+ {
+ 	/* MTRR base MSRs use even numbers, masks use odd numbers. */
+ 	return !(msr & 0x1);
+ }
+ 
+ static struct kvm_mtrr_range *var_mtrr_msr_to_range(struct kvm_vcpu *vcpu,
+ 						    unsigned int msr)
+ {
+ 	int index = (msr - MTRRphysBase_MSR(0)) / 2;
+ 
+ 	return &vcpu->arch.mtrr_state.var_ranges[index];
+ }
+ 
  static bool msr_mtrr_valid(unsigned msr)
  {
  	switch (msr) {
- 	case 0x200 ... 0x200 + 2 * KVM_NR_VAR_MTRR - 1:
+ 	case MTRRphysBase_MSR(0) ... MTRRphysMask_MSR(KVM_NR_VAR_MTRR - 1):
  	case MSR_MTRRfix64K_00000:
  	case MSR_MTRRfix16K_80000:
  	case MSR_MTRRfix16K_A0000:
···
  	case MSR_MTRRfix4K_F0000:
  	case MSR_MTRRfix4K_F8000:
  	case MSR_MTRRdefType:
- 	case MSR_IA32_CR_PAT:
  		return true;
  	}
  	return false;
···
  	return t < 8 && (1 << t) & 0x73; /* 0, 1, 4, 5, 6 */
  }
  
- bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+ static bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
  {
  	int i;
  	u64 mask;
···
  	if (!msr_mtrr_valid(msr))
  		return false;
  
- 	if (msr == MSR_IA32_CR_PAT) {
- 		return kvm_pat_valid(data);
- 	} else if (msr == MSR_MTRRdefType) {
+ 	if (msr == MSR_MTRRdefType) {
  		if (data & ~0xcff)
  			return false;
  		return valid_mtrr_type(data & 0xff);
···
  	}
  
  	/* variable MTRRs */
- 	WARN_ON(!(msr >= 0x200 && msr < 0x200 + 2 * KVM_NR_VAR_MTRR));
+ 	WARN_ON(!(msr >= MTRRphysBase_MSR(0) &&
+ 		  msr <= MTRRphysMask_MSR(KVM_NR_VAR_MTRR - 1)));
  
  	mask = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
  	if ((msr & 1) == 0) {
···
  
  	return (data & mask) == 0;
  }
- EXPORT_SYMBOL_GPL(kvm_mtrr_valid);
  
  static bool mtrr_is_enabled(struct kvm_mtrr *mtrr_state)
  {
···
  {
  	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
  	gfn_t start, end;
- 	int index;
  
- 	if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
- 	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
+ 	if (!tdp_enabled || !kvm_arch_has_noncoherent_dma(vcpu->kvm))
  		return;
  
  	if (!mtrr_is_enabled(mtrr_state) && msr != MSR_MTRRdefType)
···
  		end = ~0ULL;
  	} else {
  		/* variable range MTRRs. */
- 		index = (msr - 0x200) / 2;
- 		var_mtrr_range(&mtrr_state->var_ranges[index], &start, &end);
+ 		var_mtrr_range(var_mtrr_msr_to_range(vcpu, msr), &start, &end);
  	}
  
  	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
···
  {
  	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
  	struct kvm_mtrr_range *tmp, *cur;
- 	int index, is_mtrr_mask;
  
- 	index = (msr - 0x200) / 2;
- 	is_mtrr_mask = msr - 0x200 - 2 * index;
- 	cur = &mtrr_state->var_ranges[index];
+ 	cur = var_mtrr_msr_to_range(vcpu, msr);
  
  	/* remove the entry if it's in the list. */
  	if (var_mtrr_range_is_valid(cur))
- 		list_del(&mtrr_state->var_ranges[index].node);
+ 		list_del(&cur->node);
  
  	/*
  	 * Set all illegal GPA bits in the mask, since those bits must
  	 * implicitly be 0. The bits are then cleared when reading them.
  	 */
- 	if (!is_mtrr_mask)
+ 	if (is_mtrr_base_msr(msr))
  		cur->base = data;
  	else
  		cur->mask = data | kvm_vcpu_reserved_gpa_bits_raw(vcpu);
···
  		*(u64 *)&vcpu->arch.mtrr_state.fixed_ranges[index] = data;
  	else if (msr == MSR_MTRRdefType)
  		vcpu->arch.mtrr_state.deftype = data;
- 	else if (msr == MSR_IA32_CR_PAT)
- 		vcpu->arch.pat = data;
  	else
  		set_var_mtrr_msr(vcpu, msr, data);
  
···
  		return 1;
  
  	index = fixed_msr_to_range_index(msr);
- 	if (index >= 0)
+ 	if (index >= 0) {
  		*pdata = *(u64 *)&vcpu->arch.mtrr_state.fixed_ranges[index];
- 	else if (msr == MSR_MTRRdefType)
+ 	} else if (msr == MSR_MTRRdefType) {
  		*pdata = vcpu->arch.mtrr_state.deftype;
- 	else if (msr == MSR_IA32_CR_PAT)
- 		*pdata = vcpu->arch.pat;
- 	else {	/* Variable MTRRs */
- 		int is_mtrr_mask;
- 
- 		index = (msr - 0x200) / 2;
- 		is_mtrr_mask = msr - 0x200 - 2 * index;
- 		if (!is_mtrr_mask)
- 			*pdata = vcpu->arch.mtrr_state.var_ranges[index].base;
+ 	} else {
+ 		/* Variable MTRRs */
+ 		if (is_mtrr_base_msr(msr))
+ 			*pdata = var_mtrr_msr_to_range(vcpu, msr)->base;
  		else
- 			*pdata = vcpu->arch.mtrr_state.var_ranges[index].mask;
+ 			*pdata = var_mtrr_msr_to_range(vcpu, msr)->mask;
  
  		*pdata &= ~kvm_vcpu_reserved_gpa_bits_raw(vcpu);
  	}
+5 -4
arch/x86/kvm/svm/svm.c
···
  
  	BUG_ON(offset == MSR_INVALID);
  
- 	return !!test_bit(bit_write, &tmp);
+ 	return test_bit(bit_write, &tmp);
  }
  
  static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
···
  
  		break;
  	case MSR_IA32_CR_PAT:
- 		if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
- 			return 1;
- 		vcpu->arch.pat = data;
+ 		ret = kvm_set_msr_common(vcpu, msr);
+ 		if (ret)
+ 			break;
+ 
  		svm->vmcb01.ptr->save.g_pat = data;
  		if (is_guest_mode(vcpu))
  			nested_vmcb02_compute_g_pat(svm);
+4 -7
arch/x86/kvm/vmx/vmx.c
···
  			return 1;
  		goto find_uret_msr;
  	case MSR_IA32_CR_PAT:
- 		if (!kvm_pat_valid(data))
- 			return 1;
+ 		ret = kvm_set_msr_common(vcpu, msr_info);
+ 		if (ret)
+ 			break;
  
  		if (is_guest_mode(vcpu) &&
  		    get_vmcs12(vcpu)->vm_exit_controls & VM_EXIT_SAVE_IA32_PAT)
  			get_vmcs12(vcpu)->guest_ia32_pat = data;
  
- 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+ 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
  			vmcs_write64(GUEST_IA32_PAT, data);
- 			vcpu->arch.pat = data;
- 			break;
- 		}
- 		ret = kvm_set_msr_common(vcpu, msr_info);
  		break;
  	case MSR_IA32_MCG_EXT_CTL:
  		if ((!msr_info->host_initiated &&
+30 -26
arch/x86/kvm/x86.c
··· 1017 1017 wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss); 1018 1018 } 1019 1019 1020 - #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS 1021 - if (static_cpu_has(X86_FEATURE_PKU) && 1020 + if (cpu_feature_enabled(X86_FEATURE_PKU) && 1022 1021 vcpu->arch.pkru != vcpu->arch.host_pkru && 1023 1022 ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || 1024 1023 kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))) 1025 1024 write_pkru(vcpu->arch.pkru); 1026 - #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */ 1027 1025 } 1028 1026 EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state); 1029 1027 ··· 1030 1032 if (vcpu->arch.guest_state_protected) 1031 1033 return; 1032 1034 1033 - #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS 1034 - if (static_cpu_has(X86_FEATURE_PKU) && 1035 + if (cpu_feature_enabled(X86_FEATURE_PKU) && 1035 1036 ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || 1036 1037 kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))) { 1037 1038 vcpu->arch.pkru = rdpkru(); 1038 1039 if (vcpu->arch.pkru != vcpu->arch.host_pkru) 1039 1040 write_pkru(vcpu->arch.host_pkru); 1040 1041 } 1041 - #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */ 1042 1042 1043 1043 if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) { 1044 1044 ··· 1423 1427 EXPORT_SYMBOL_GPL(kvm_emulate_rdpmc); 1424 1428 1425 1429 /* 1426 - * List of msr numbers which we expose to userspace through KVM_GET_MSRS 1427 - * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. 1428 - * 1429 - * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) 1430 - * extract the supported MSRs from the related const lists. 1431 - * msrs_to_save is selected from the msrs_to_save_all to reflect the 1432 - * capabilities of the host cpu. This capabilities test skips MSRs that are 1433 - * kvm-specific. Those are put in emulated_msrs_all; filtering of emulated_msrs 1434 - * may depend on host virtualization features rather than host cpu features. 
1430 + * The three MSR lists (msrs_to_save, emulated_msrs, msr_based_features) track 1431 + * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS, 1432 + * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that 1433 + * require host support, i.e. should be probed via RDMSR. emulated_msrs holds 1434 + * MSRs that KVM emulates without strictly requiring host support. 1435 + * msr_based_features holds MSRs that enumerate features, i.e. are effectively 1436 + * CPUID leafs. Note, msr_based_features isn't mutually exclusive with 1437 + * msrs_to_save and emulated_msrs. 1435 1438 */ 1436 1439 1437 1440 static const u32 msrs_to_save_base[] = { ··· 1526 1531 MSR_IA32_UCODE_REV, 1527 1532 1528 1533 /* 1529 - * The following list leaves out MSRs whose values are determined 1530 - * by arch/x86/kvm/vmx/nested.c based on CPUID or other MSRs. 1531 - * We always support the "true" VMX control MSRs, even if the host 1532 - * processor does not, so I am putting these registers here rather 1533 - * than in msrs_to_save_all. 1534 + * KVM always supports the "true" VMX control MSRs, even if the host 1535 + * does not. The VMX MSRs as a whole are considered "emulated" as KVM 1536 + * doesn't strictly require them to exist in the host (ignoring that 1537 + * KVM would refuse to load in the first place if the core set of MSRs 1538 + * aren't supported). 1534 1539 */ 1535 1540 MSR_IA32_VMX_BASIC, 1536 1541 MSR_IA32_VMX_TRUE_PINBASED_CTLS, ··· 1626 1631 * If we're doing cache flushes (either "always" or "cond") 1627 1632 * we will do one whenever the guest does a vmlaunch/vmresume. 1628 1633 * If an outer hypervisor is doing the cache flush for us 1629 - * (VMENTER_L1D_FLUSH_NESTED_VM), we can safely pass that 1634 - * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that 1630 1635 * capability to the guest too, and if EPT is disabled we're not 1631 1636 * vulnerable.
Overall, only VMENTER_L1D_FLUSH_NEVER will 1632 1637 * require a nested hypervisor to do a flush of its own. ··· 1804 1809 unsigned long *bitmap = ranges[i].bitmap; 1805 1810 1806 1811 if ((index >= start) && (index < end) && (flags & type)) { 1807 - allowed = !!test_bit(index - start, bitmap); 1812 + allowed = test_bit(index - start, bitmap); 1808 1813 break; 1809 1814 } 1810 1815 } ··· 3697 3702 return 1; 3698 3703 } 3699 3704 break; 3700 - case 0x200 ... MSR_IA32_MC0_CTL2 - 1: 3701 - case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff: 3705 + case MSR_IA32_CR_PAT: 3706 + if (!kvm_pat_valid(data)) 3707 + return 1; 3708 + 3709 + vcpu->arch.pat = data; 3710 + break; 3711 + case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: 3712 + case MSR_MTRRdefType: 3702 3713 return kvm_mtrr_set_msr(vcpu, msr, data); 3703 3714 case MSR_IA32_APICBASE: 3704 3715 return kvm_set_apic_base(vcpu, msr_info); ··· 4111 4110 msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset; 4112 4111 break; 4113 4112 } 4113 + case MSR_IA32_CR_PAT: 4114 + msr_info->data = vcpu->arch.pat; 4115 + break; 4114 4116 case MSR_MTRRcap: 4115 - case 0x200 ... MSR_IA32_MC0_CTL2 - 1: 4116 - case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff: 4117 + case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: 4118 + case MSR_MTRRdefType: 4117 4119 return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data); 4118 4120 case 0xcd: /* fsb frequency */ 4119 4121 msr_info->data = 3;
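The named case labels above replace the raw `0x200 ... 0x2ff` ranges, which notably swept MSR_IA32_CR_PAT (0x277) into the MTRR code — the reason PAT handling lived there in the first place. A self-contained sketch of the new range check, using the architectural MSR numbers (the helper name `msr_is_mtrr()` is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR numbers, as defined in the kernel headers. */
#define MTRRphysBase_MSR(n)	(0x200 + 2 * (n))
#define MSR_MTRRfix4K_F8000	0x26f
#define MSR_MTRRdefType		0x2ff
#define MSR_IA32_CR_PAT		0x277

/* Only genuine MTRR MSRs route to the MTRR code now; PAT (0x277) no
 * longer falls inside the matched ranges. */
static bool msr_is_mtrr(uint32_t msr)
{
	return (msr >= MTRRphysBase_MSR(0) && msr <= MSR_MTRRfix4K_F8000) ||
	       msr == MSR_MTRRdefType;
}
```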
-1
arch/x86/kvm/x86.h
··· 309 309 310 310 void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu); 311 311 u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); 312 - bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data); 313 312 int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data); 314 313 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); 315 314 bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
+21
tools/testing/selftests/kvm/x86_64/cpuid_test.c
··· 163 163 ent->eax = eax; 164 164 } 165 165 166 + static void test_get_cpuid2(struct kvm_vcpu *vcpu) 167 + { 168 + struct kvm_cpuid2 *cpuid = allocate_kvm_cpuid2(vcpu->cpuid->nent + 1); 169 + int i, r; 170 + 171 + vcpu_ioctl(vcpu, KVM_GET_CPUID2, cpuid); 172 + TEST_ASSERT(cpuid->nent == vcpu->cpuid->nent, 173 + "KVM didn't update nent on success, wanted %u, got %u\n", 174 + vcpu->cpuid->nent, cpuid->nent); 175 + 176 + for (i = 0; i < vcpu->cpuid->nent; i++) { 177 + cpuid->nent = i; 178 + r = __vcpu_ioctl(vcpu, KVM_GET_CPUID2, cpuid); 179 + TEST_ASSERT(r && errno == E2BIG, KVM_IOCTL_ERROR(KVM_GET_CPUID2, r)); 180 + TEST_ASSERT(cpuid->nent == i, "KVM modified nent on failure"); 181 + } 182 + free(cpuid); 183 + } 184 + 166 185 int main(void) 167 186 { 168 187 struct kvm_vcpu *vcpu; ··· 201 182 run_vcpu(vcpu, stage); 202 183 203 184 set_cpuid_after_run(vcpu); 185 + 186 + test_get_cpuid2(vcpu); 204 187 205 188 kvm_vm_free(vm); 206 189 }
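The contract the new selftest verifies — KVM_GET_CPUID2 updates nent on success but leaves it untouched when the caller's buffer is too small — can be modeled with a simplified, self-contained sketch. The struct layout, helper name, and returning E2BIG directly (rather than -1 with errno set, as the real ioctl does) are all illustrative simplifications:

```c
#include <errno.h>
#include <string.h>

/* Simplified stand-in for struct kvm_cpuid2. */
struct cpuid_buf {
	unsigned int nent;
	unsigned int entries[64];
};

/* Model of the KVM_GET_CPUID2 contract: on success, copy the entries
 * and update nent to the number copied; if the buffer is too small,
 * fail with E2BIG and leave the caller's nent untouched. */
static int model_get_cpuid2(const struct cpuid_buf *vcpu_cpuid,
			    struct cpuid_buf *user)
{
	if (user->nent < vcpu_cpuid->nent)
		return E2BIG;
	memcpy(user->entries, vcpu_cpuid->entries,
	       vcpu_cpuid->nent * sizeof(user->entries[0]));
	user->nent = vcpu_cpuid->nent;
	return 0;
}
```

The test's loop over `cpuid->nent = i` exercises exactly the failure half of this contract for every undersized buffer.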