Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'kvm-x86-misc-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.5:

* Move handling of PAT out of MTRR code and dedup SVM+VMX code

* Fix output of PIC poll command emulation when there's an interrupt

* Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.

* Misc cleanups

+493 -86
+1
Documentation/process/maintainer-handbooks.rst
···
     maintainer-tip
     maintainer-netdev
 +   maintainer-kvm-x86
+390
Documentation/process/maintainer-kvm-x86.rst
···
.. SPDX-License-Identifier: GPL-2.0

KVM x86
=======

Foreword
--------
KVM strives to be a welcoming community; contributions from newcomers are
valued and encouraged. Please do not be discouraged or intimidated by the
length of this document and the many rules/guidelines it contains. Everyone
makes mistakes, and everyone was a newbie at some point. So long as you make
an honest effort to follow KVM x86's guidelines, are receptive to feedback,
and learn from any mistakes you make, you will be welcomed with open arms, not
torches and pitchforks.

TL;DR
-----
Testing is mandatory. Be consistent with established styles and patterns.

Trees
-----
KVM x86 is currently in a transition period from being part of the main KVM
tree, to being "just another KVM arch". As such, KVM x86 is split across the
main KVM tree, ``git.kernel.org/pub/scm/virt/kvm/kvm.git``, and a KVM x86
specific tree, ``github.com/kvm-x86/linux.git``.

Generally speaking, fixes for the current cycle are applied directly to the
main KVM tree, while all development for the next cycle is routed through the
KVM x86 tree. In the unlikely event that a fix for the current cycle is routed
through the KVM x86 tree, it will be applied to the ``fixes`` branch before
making its way to the main KVM tree.

Note, this transition period is expected to last quite some time, i.e. will be
the status quo for the foreseeable future.

Branches
~~~~~~~~
The KVM x86 tree is organized into multiple topic branches. The purpose of
using finer-grained topic branches is to make it easier to keep tabs on an area
of development, and to limit the collateral damage of human errors and/or buggy
commits, e.g.
dropping the HEAD commit of a topic branch has no impact on other
in-flight commits' SHA1 hashes, and having to reject a pull request due to bugs
delays only that topic branch.

All topic branches, except for ``next`` and ``fixes``, are rolled into ``next``
via a Cthulhu merge on an as-needed basis, i.e. when a topic branch is updated.
As a result, force pushes to ``next`` are common.

Lifecycle
~~~~~~~~~
Fixes that target the current release, a.k.a. mainline, are typically applied
directly to the main KVM tree, i.e. do not route through the KVM x86 tree.

Changes that target the next release are routed through the KVM x86 tree. Pull
requests (from KVM x86 to main KVM) are sent for each KVM x86 topic branch,
typically the week before Linus' opening of the merge window, e.g. the week
following rc7 for "normal" releases. If all goes well, the topic branches are
rolled into the main KVM pull request sent during Linus' merge window.

The KVM x86 tree doesn't have its own official merge window, but there's a soft
close around rc5 for new features, and a soft close around rc6 for fixes (for
the next release; see above for fixes that target the current release).

Timeline
~~~~~~~~
Submissions are typically reviewed and applied in FIFO order, with some wiggle
room for the size of a series, patches that are "cache hot", etc. Fixes,
especially for the current release and/or stable trees, get to jump the queue.
Patches that will be taken through a non-KVM tree (most often through the tip
tree) and/or have other acks/reviews also jump the queue to some extent.

Note, the vast majority of review is done between rc1 and rc6, give or take.
The period between rc6 and the next rc1 is used to catch up on other tasks,
i.e. radio silence during this period isn't unusual.
Pings to get a status update are welcome, but keep in mind the timing of the
current release cycle and have realistic expectations. If you are pinging for
acceptance, i.e. not just for feedback or an update, please do everything you
can, within reason, to ensure that your patches are ready to be merged! Pings
on series that break the build or fail tests lead to unhappy maintainers!

Development
-----------

Base Tree/Branch
~~~~~~~~~~~~~~~~
Fixes that target the current release, a.k.a. mainline, should be based on
``git://git.kernel.org/pub/scm/virt/kvm/kvm.git master``. Note, fixes do not
automatically warrant inclusion in the current release. There is no singular
rule, but typically only fixes for bugs that are urgent, critical, and/or were
introduced in the current release should target the current release.

Everything else should be based on ``kvm-x86/next``, i.e. there is no need to
select a specific topic branch as the base. If there are conflicts and/or
dependencies across topic branches, it is the maintainer's job to sort them
out.

The only exception to using ``kvm-x86/next`` as the base is if a patch/series
is a multi-arch series, i.e. has non-trivial modifications to common KVM code
and/or has more than superficial changes to other architectures' code. Multi-
arch patch/series should instead be based on a common, stable point in KVM's
history, e.g. the release candidate upon which ``kvm-x86 next`` is based. If
you're unsure whether a patch/series is truly multi-arch, err on the side of
caution and treat it as multi-arch, i.e. use a common base.

Coding Style
~~~~~~~~~~~~
When it comes to style, naming, patterns, etc., consistency is the number one
priority in KVM x86. If all else fails, match what already exists.
With a few caveats listed below, follow the tip tree maintainers' preferred
:ref:`maintainer-tip-coding-style`, as patches/series often touch both KVM and
non-KVM x86 files, i.e. draw the attention of KVM *and* tip tree maintainers.

Using reverse fir tree, a.k.a. reverse Christmas tree or reverse XMAS tree, for
variable declarations isn't strictly required, though it is still preferred.

Except for a handful of special snowflakes, do not use kernel-doc comments for
functions. The vast majority of "public" KVM functions aren't truly public as
they are intended only for KVM-internal consumption (there are plans to
privatize KVM's headers and exports to enforce this).

Comments
~~~~~~~~
Write comments using imperative mood and avoid pronouns. Use comments to
provide a high level overview of the code, and/or to explain why the code does
what it does. Do not reiterate what the code literally does; let the code
speak for itself. If the code itself is inscrutable, comments will not help.

SDM and APM References
~~~~~~~~~~~~~~~~~~~~~~
Much of KVM's code base is directly tied to architectural behavior defined in
Intel's Software Development Manual (SDM) and AMD's Architecture Programmer’s
Manual (APM). Use of "Intel's SDM" and "AMD's APM", or even just "SDM" or
"APM", without additional context is a-ok.

Do not reference specific sections, tables, figures, etc. by number, especially
not in comments. Instead, if necessary (see below), copy-paste the relevant
snippet and reference sections/tables/figures by name. The layouts of the SDM
and APM are constantly changing, and so the numbers/labels aren't stable.

Generally speaking, do not explicitly reference or copy-paste from the SDM or
APM in comments.
With few exceptions, KVM *must* honor architectural behavior,
therefore it's implied that KVM behavior is emulating SDM and/or APM behavior.
Note, referencing the SDM/APM in changelogs to justify the change and provide
context is perfectly ok and encouraged.

Shortlog
~~~~~~~~
The preferred prefix format is ``KVM: <topic>:``, where ``<topic>`` is one of::

  - x86
  - x86/mmu
  - x86/pmu
  - x86/xen
  - selftests
  - SVM
  - nSVM
  - VMX
  - nVMX

**DO NOT use x86/kvm!** ``x86/kvm`` is used exclusively for Linux-as-a-KVM-guest
changes, i.e. for arch/x86/kernel/kvm.c. Do not use file names or complete file
paths as the subject/shortlog prefix.

Note, these don't align with the topic branches (the topic branches care much
more about code conflicts).

All names are case sensitive! ``KVM: x86:`` is good, ``kvm: vmx:`` is not.

Capitalize the first word of the condensed patch description, but omit ending
punctuation. E.g.::

  KVM: x86: Fix a null pointer dereference in function_xyz()

not::

  kvm: x86: fix a null pointer dereference in function_xyz.

If a patch touches multiple topics, traverse up the conceptual tree to find the
first common parent (which is often simply ``x86``). When in doubt,
``git log path/to/file`` should provide a reasonable hint.

New topics do occasionally pop up, but please start an on-list discussion if
you want to propose introducing a new topic, i.e. don't go rogue.

See :ref:`the_canonical_patch_format` for more information, with one amendment:
do not treat the 70-75 character limit as an absolute, hard limit. Instead,
use 75 characters as a firm-but-not-hard limit, and use 80 characters as a hard
limit. I.e.
let the shortlog run a few characters over the standard limit if
you have good reason to do so.

Changelog
~~~~~~~~~
Most importantly, write changelogs using imperative mood and avoid pronouns.

See :ref:`describe_changes` for more information, with one amendment: lead with
a short blurb on the actual changes, and then follow up with the context and
background. Note! This order directly conflicts with the tip tree's preferred
approach! Please follow the tip tree's preferred style when sending patches
that primarily target arch/x86 code that is _NOT_ KVM code.

Stating what a patch does before diving into details is preferred by KVM x86
for several reasons. First and foremost, what code is actually being changed
is arguably the most important information, and so that info should be easy to
find. Changelogs that bury the "what's actually changing" in a one-liner after
3+ paragraphs of background make it very hard to find that information.

For initial review, one could argue the "what's broken" is more important, but
for skimming logs and git archaeology, the gory details matter less and less.
E.g. when doing a series of "git blame", the details of each change along the
way are useless, the details only matter for the culprit. Providing the "what
changed" makes it easy to quickly determine whether or not a commit might be of
interest.

Another benefit of stating "what's changing" first is that it's almost always
possible to state "what's changing" in a single sentence. Conversely, all but
the most simple bugs require multiple sentences or paragraphs to fully describe
the problem. If both the "what's changing" and "what's the bug" are super
short then the order doesn't matter.
But if one is shorter (almost always the
"what's changing"), then covering the shorter one first is advantageous because
it's less of an inconvenience for readers/reviewers that have a strict ordering
preference. E.g. having to skip one sentence to get to the context is less
painful than having to skip three paragraphs to get to "what's changing".

Fixes
~~~~~
If a change fixes a KVM/kernel bug, add a Fixes: tag even if the change doesn't
need to be backported to stable kernels, and even if the change fixes a bug in
an older release.

Conversely, if a fix does need to be backported, explicitly tag the patch with
"Cc: stable@vger.kernel.org" (though the email itself doesn't need to Cc:
stable); KVM x86 opts out of backporting Fixes: by default. Some auto-selected
patches do get backported, but require explicit maintainer approval (search
MANUALSEL).

Function References
~~~~~~~~~~~~~~~~~~~
When a function is mentioned in a comment, changelog, or shortlog (or anywhere
for that matter), use the format ``function_name()``. The parentheses provide
context and disambiguate the reference.

Testing
-------
At a bare minimum, *all* patches in a series must build cleanly for KVM_INTEL=m,
KVM_AMD=m, and KVM_WERROR=y. Building every possible combination of Kconfigs
isn't feasible, but the more the merrier. KVM_SMM, KVM_XEN, PROVE_LOCKING, and
X86_64 are particularly interesting knobs to turn.

Running KVM selftests and KVM-unit-tests is also mandatory (and stating the
obvious, the tests need to pass). The only exception is for changes that have
negligible probability of affecting runtime behavior, e.g. patches that only
modify comments. When possible and relevant, testing on both Intel and AMD is
strongly preferred. Booting an actual VM is encouraged, but not mandatory.
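The mandatory build configuration above can be captured as a Kconfig fragment
(a sketch; option dependencies may vary by kernel version, and the fragment
would typically be applied with ``scripts/kconfig/merge_config.sh``):

```
# Mandatory build coverage: KVM_INTEL=m, KVM_AMD=m, KVM_WERROR=y.
# For broader coverage, do additional builds that flip KVM_SMM, KVM_XEN,
# PROVE_LOCKING, and X86_64.
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_WERROR=y
```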
For changes that touch KVM's shadow paging code, running with TDP (EPT/NPT)
disabled is mandatory. For changes that affect common KVM MMU code, running
with TDP disabled is strongly encouraged. For all other changes, if the code
being modified depends on and/or interacts with a module param, testing with
the relevant settings is mandatory.

Note, KVM selftests and KVM-unit-tests do have known failures. If you suspect
a failure is not due to your changes, verify that the *exact same* failure
occurs with and without your changes.

Changes that touch reStructured Text documentation, i.e. .rst files, must build
htmldocs cleanly, i.e. with no new warnings or errors.

If you can't fully test a change, e.g. due to lack of hardware, clearly state
what level of testing you were able to do, e.g. in the cover letter.

New Features
~~~~~~~~~~~~
With one exception, new features *must* come with test coverage. KVM specific
tests aren't strictly required, e.g. if coverage is provided by running a
sufficiently enabled guest VM, or by running a related kernel selftest in a VM,
but dedicated KVM tests are preferred in all cases. Negative testcases in
particular are mandatory for enabling of new hardware features as error and
exception flows are rarely exercised simply by running a VM.

The only exception to this rule is if KVM is simply advertising support for a
feature via KVM_GET_SUPPORTED_CPUID, i.e. for instructions/features that KVM
can't prevent a guest from using and for which there is no true enabling.

Note, "new features" does not just mean "new hardware features"! New features
that can't be well validated using existing KVM selftests and/or KVM-unit-tests
must come with tests.
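For the TDP-disabled runs described above, the relevant knobs are the ``ept``
and ``npt`` module parameters of kvm-intel and kvm-amd. A tiny helper, purely
illustrative (the ``tdp_off_cmd`` name is made up; the parameters are real),
that prints the reload command for a given vendor:

```shell
# Print the modprobe invocation that disables TDP, per the "run with TDP
# disabled" testing guidance.  ept=0 disables EPT (Intel), npt=0 disables
# NPT (AMD); unload the module first so the parameter takes effect.
tdp_off_cmd() {
    case "$1" in
        intel) echo "modprobe kvm_intel ept=0" ;;  # disable EPT
        amd)   echo "modprobe kvm_amd npt=0" ;;    # disable NPT
        *)     echo "usage: tdp_off_cmd intel|amd" >&2; return 1 ;;
    esac
}

tdp_off_cmd intel   # -> modprobe kvm_intel ept=0
```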
Posting new feature development without tests to get early feedback is more
than welcome, but such submissions should be tagged RFC, and the cover letter
should clearly state what type of feedback is requested/expected. Do not abuse
the RFC process; RFCs will typically not receive in-depth review.

Bug Fixes
~~~~~~~~~
Except for "obvious" found-by-inspection bugs, fixes must be accompanied by a
reproducer for the bug being fixed. In many cases the reproducer is implicit,
e.g. for build errors and test failures, but it should still be clear to
readers what is broken and how to verify the fix. Some leeway is given for
bugs that are found via non-public workloads/tests, but providing regression
tests for such bugs is strongly preferred.

In general, regression tests are preferred for any bug that is not trivial to
hit. E.g. even if the bug was originally found by a fuzzer such as syzkaller,
a targeted regression test may be warranted if the bug requires hitting a
one-in-a-million type race condition.

Note, KVM bugs are rarely urgent *and* non-trivial to reproduce. Ask yourself
if a bug is really truly the end of the world before posting a fix without a
reproducer.

Posting
-------

Links
~~~~~
Do not explicitly reference bug reports, prior versions of a patch/series, etc.
via ``In-Reply-To:`` headers. Using ``In-Reply-To:`` becomes an unholy mess
for large series and/or when the version count gets high, and ``In-Reply-To:``
is useless for anyone that doesn't have the original message, e.g. if someone
wasn't Cc'd on the bug report or if the list of recipients changes between
versions.

To link to a bug report, previous version, or anything of interest, use lore
links.
For referencing previous version(s), generally speaking do not include
a Link: in the changelog as there is no need to record the history in git, i.e.
put the link in the cover letter or in the section git ignores. Do provide a
formal Link: for bug reports and/or discussions that led to the patch. The
context of why a change was made is highly valuable for future readers.

Git Base
~~~~~~~~
If you are using git version 2.9.0 or later (Googlers, this is all of you!),
use ``git format-patch`` with the ``--base`` flag to automatically include the
base tree information in the generated patches.

Note, ``--base=auto`` works as expected if and only if a branch's upstream is
set to the base topic branch, e.g. it will do the wrong thing if your upstream
is set to your personal repository for backup purposes. An alternative "auto"
solution is to derive the names of your development branches based on their
KVM x86 topic, and feed that into ``--base``. E.g. ``x86/pmu/my_branch_name``,
and then write a small wrapper to extract ``pmu`` from the current branch name
to yield ``--base=x/pmu``, where ``x`` is whatever name your repository uses to
track the KVM x86 remote.

Co-Posting Tests
~~~~~~~~~~~~~~~~
KVM selftests that are associated with KVM changes, e.g. regression tests for
bug fixes, should be posted along with the KVM changes as a single series. The
standard kernel rules for bisection apply, i.e. KVM changes that result in test
failures should be ordered after the selftests updates, and vice versa, new
tests that fail due to KVM bugs should be ordered after the KVM fixes.

KVM-unit-tests should *always* be posted separately. Tools, e.g. b4 am, don't
know that KVM-unit-tests is a separate repository and get confused when patches
in a series apply on different trees.
To tie KVM-unit-tests patches back to
KVM patches, first post the KVM changes and then provide a lore Link: to the
KVM patch/series in the KVM-unit-tests patch(es).

Notifications
-------------
When a patch/series is officially accepted, a notification email will be sent
in reply to the original posting (cover letter for multi-patch series). The
notification will include the tree and topic branch, along with the SHA1s of
the commits of applied patches.

If a subset of patches is applied, this will be clearly stated in the
notification. Unless stated otherwise, it's implied that any patches in the
series that were not accepted need more work and should be submitted in a new
version.

If for some reason a patch is dropped after officially being accepted, a reply
will be sent to the notification email explaining why the patch was dropped, as
well as the next steps.

SHA1 Stability
~~~~~~~~~~~~~~
SHA1s are not 100% guaranteed to be stable until they land in Linus' tree! A
SHA1 is *usually* stable once a notification has been sent, but things happen.
In most cases, an update to the notification email will be provided if an
applied patch's SHA1 changes. However, in some scenarios, e.g. if all KVM x86
branches need to be rebased, individual notifications will not be given.

Vulnerabilities
---------------
Bugs that can be exploited by the guest to attack the host (kernel or
userspace), or that can be exploited by a nested VM to *its* host (L2 attacking
L1), are of particular interest to KVM. Please follow the protocol for
:ref:`securitybugs` if you suspect a bug can lead to an escape, data leak, etc.
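The branch-naming trick in the Git Base section of the document above can be
scripted. A minimal sketch, in which the ``base_for_branch`` helper name and
the remote name ``x`` are assumptions, not anything the document prescribes:

```shell
# Derive the --base argument for `git format-patch` from a branch named
# <arch>/<topic>/<description>, e.g. x86/pmu/my_branch_name -> --base=x/pmu,
# where "x" is the assumed name of the remote tracking the KVM x86 tree.
base_for_branch() {
    topic=$(printf '%s\n' "$1" | cut -d/ -f2)   # second path component
    printf -- '--base=x/%s\n' "$topic"
}

# A real wrapper would feed in the current branch, e.g.:
#   base_for_branch "$(git rev-parse --abbrev-ref HEAD)"
base_for_branch "x86/pmu/my_branch_name"   # -> --base=x/pmu
```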
+2
Documentation/process/maintainer-tip.rst
···
  Some of these options are x86-specific and can be left out when testing
  on other architectures.
 
+ .. _maintainer-tip-coding-style:
+
  Coding style notes
  ------------------
 
+1 -1
Documentation/virt/kvm/x86/mmu.rst
···
  role.passthrough:
    The page is not backed by a guest page table, but its first entry
    points to one. This is set if NPT uses 5-level page tables (host
-   CR4.LA57=1) and is shadowing L1's 4-level NPT (L1 CR4.LA57=1).
+   CR4.LA57=1) and is shadowing L1's 4-level NPT (L1 CR4.LA57=0).
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.
+1
MAINTAINERS
···
  M:	Paolo Bonzini <pbonzini@redhat.com>
  L:	kvm@vger.kernel.org
  S:	Supported
+ P:	Documentation/process/maintainer-kvm-x86.rst
  T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
  F:	arch/x86/include/asm/kvm*
  F:	arch/x86/include/asm/svm.h
+4 -9
arch/x86/kvm/cpuid.c
···
  				     struct kvm_cpuid2 *cpuid,
  				     struct kvm_cpuid_entry2 __user *entries)
  {
- 	int r;
- 
- 	r = -E2BIG;
  	if (cpuid->nent < vcpu->arch.cpuid_nent)
- 		goto out;
- 	r = -EFAULT;
+ 		return -E2BIG;
+ 
  	if (copy_to_user(entries, vcpu->arch.cpuid_entries,
  			 vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2)))
- 		goto out;
- 	return 0;
+ 		return -EFAULT;
  
- out:
  	cpuid->nent = vcpu->arch.cpuid_nent;
- 	return r;
+ 	return 0;
  }
  
  /* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
+3
arch/x86/kvm/i8259.c
···
  		pic_clear_isr(s, ret);
  		if (addr1 >> 7 || ret != 2)
  			pic_update_irq(s->pics_state);
+ 		/* Bit 7 is 1, means there's an interrupt */
+ 		ret |= 0x80;
  	} else {
+ 		/* Bit 7 is 0, means there's no interrupt */
  		ret = 0x07;
  		pic_update_irq(s->pics_state);
  	}
-5
arch/x86/kvm/lapic.c
···
  #define mod_64(x, y) ((x) % (y))
  #endif
  
- #define PRId64 "d"
- #define PRIx64 "llx"
- #define PRIu64 "u"
- #define PRIo64 "o"
- 
  /* 14 is the version for Xeon and Pentium 8.4.8*/
  #define APIC_VERSION 0x14UL
  #define LAPIC_MMIO_LENGTH (1 << 12)
+31 -33
arch/x86/kvm/mtrr.c
···
  #define IA32_MTRR_DEF_TYPE_FE		(1ULL << 10)
  #define IA32_MTRR_DEF_TYPE_TYPE_MASK	(0xff)
  
+ static bool is_mtrr_base_msr(unsigned int msr)
+ {
+ 	/* MTRR base MSRs use even numbers, masks use odd numbers. */
+ 	return !(msr & 0x1);
+ }
+ 
+ static struct kvm_mtrr_range *var_mtrr_msr_to_range(struct kvm_vcpu *vcpu,
+ 						    unsigned int msr)
+ {
+ 	int index = (msr - MTRRphysBase_MSR(0)) / 2;
+ 
+ 	return &vcpu->arch.mtrr_state.var_ranges[index];
+ }
+ 
  static bool msr_mtrr_valid(unsigned msr)
  {
  	switch (msr) {
- 	case 0x200 ... 0x200 + 2 * KVM_NR_VAR_MTRR - 1:
+ 	case MTRRphysBase_MSR(0) ... MTRRphysMask_MSR(KVM_NR_VAR_MTRR - 1):
  	case MSR_MTRRfix64K_00000:
  	case MSR_MTRRfix16K_80000:
  	case MSR_MTRRfix16K_A0000:
···
  	case MSR_MTRRfix4K_F0000:
  	case MSR_MTRRfix4K_F8000:
  	case MSR_MTRRdefType:
- 	case MSR_IA32_CR_PAT:
  		return true;
  	}
  	return false;
···
  	return t < 8 && (1 << t) & 0x73; /* 0, 1, 4, 5, 6 */
  }
  
- bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+ static bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
  {
  	int i;
  	u64 mask;
···
  	if (!msr_mtrr_valid(msr))
  		return false;
  
- 	if (msr == MSR_IA32_CR_PAT) {
- 		return kvm_pat_valid(data);
- 	} else if (msr == MSR_MTRRdefType) {
+ 	if (msr == MSR_MTRRdefType) {
  		if (data & ~0xcff)
  			return false;
  		return valid_mtrr_type(data & 0xff);
···
  	}
  
  	/* variable MTRRs */
- 	WARN_ON(!(msr >= 0x200 && msr < 0x200 + 2 * KVM_NR_VAR_MTRR));
+ 	WARN_ON(!(msr >= MTRRphysBase_MSR(0) &&
+ 		  msr <= MTRRphysMask_MSR(KVM_NR_VAR_MTRR - 1)));
  
  	mask = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
  	if ((msr & 1) == 0) {
···
  
  	return (data & mask) == 0;
  }
- EXPORT_SYMBOL_GPL(kvm_mtrr_valid);
  
  static bool mtrr_is_enabled(struct kvm_mtrr *mtrr_state)
  {
···
  {
  	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
  	gfn_t start, end;
- 	int index;
  
- 	if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
- 	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
+ 	if (!tdp_enabled || !kvm_arch_has_noncoherent_dma(vcpu->kvm))
  		return;
  
  	if (!mtrr_is_enabled(mtrr_state) && msr != MSR_MTRRdefType)
···
  		end = ~0ULL;
  	} else {
  		/* variable range MTRRs. */
- 		index = (msr - 0x200) / 2;
- 		var_mtrr_range(&mtrr_state->var_ranges[index], &start, &end);
+ 		var_mtrr_range(var_mtrr_msr_to_range(vcpu, msr), &start, &end);
  	}
  
  	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
···
  {
  	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
  	struct kvm_mtrr_range *tmp, *cur;
- 	int index, is_mtrr_mask;
  
- 	index = (msr - 0x200) / 2;
- 	is_mtrr_mask = msr - 0x200 - 2 * index;
- 	cur = &mtrr_state->var_ranges[index];
+ 	cur = var_mtrr_msr_to_range(vcpu, msr);
  
  	/* remove the entry if it's in the list. */
  	if (var_mtrr_range_is_valid(cur))
- 		list_del(&mtrr_state->var_ranges[index].node);
+ 		list_del(&cur->node);
  
  	/*
  	 * Set all illegal GPA bits in the mask, since those bits must
  	 * implicitly be 0. The bits are then cleared when reading them.
  	 */
- 	if (!is_mtrr_mask)
+ 	if (is_mtrr_base_msr(msr))
  		cur->base = data;
  	else
  		cur->mask = data | kvm_vcpu_reserved_gpa_bits_raw(vcpu);
···
  		*(u64 *)&vcpu->arch.mtrr_state.fixed_ranges[index] = data;
  	else if (msr == MSR_MTRRdefType)
  		vcpu->arch.mtrr_state.deftype = data;
- 	else if (msr == MSR_IA32_CR_PAT)
- 		vcpu->arch.pat = data;
  	else
  		set_var_mtrr_msr(vcpu, msr, data);
  
···
  		return 1;
  
  	index = fixed_msr_to_range_index(msr);
- 	if (index >= 0)
+ 	if (index >= 0) {
  		*pdata = *(u64 *)&vcpu->arch.mtrr_state.fixed_ranges[index];
- 	else if (msr == MSR_MTRRdefType)
+ 	} else if (msr == MSR_MTRRdefType) {
  		*pdata = vcpu->arch.mtrr_state.deftype;
- 	else if (msr == MSR_IA32_CR_PAT)
- 		*pdata = vcpu->arch.pat;
- 	else {	/* Variable MTRRs */
- 		int is_mtrr_mask;
- 
- 		index = (msr - 0x200) / 2;
- 		is_mtrr_mask = msr - 0x200 - 2 * index;
- 		if (!is_mtrr_mask)
- 			*pdata = vcpu->arch.mtrr_state.var_ranges[index].base;
+ 	} else {
+ 		/* Variable MTRRs */
+ 		if (is_mtrr_base_msr(msr))
+ 			*pdata = var_mtrr_msr_to_range(vcpu, msr)->base;
  		else
- 			*pdata = vcpu->arch.mtrr_state.var_ranges[index].mask;
+ 			*pdata = var_mtrr_msr_to_range(vcpu, msr)->mask;
  
  		*pdata &= ~kvm_vcpu_reserved_gpa_bits_raw(vcpu);
  	}
+5 -4
arch/x86/kvm/svm/svm.c
···
  
  	BUG_ON(offset == MSR_INVALID);
  
- 	return !!test_bit(bit_write, &tmp);
+ 	return test_bit(bit_write, &tmp);
  }
  
  static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
···
  
  		break;
  	case MSR_IA32_CR_PAT:
- 		if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
- 			return 1;
- 		vcpu->arch.pat = data;
+ 		ret = kvm_set_msr_common(vcpu, msr);
+ 		if (ret)
+ 			break;
+ 
  		svm->vmcb01.ptr->save.g_pat = data;
  		if (is_guest_mode(vcpu))
  			nested_vmcb02_compute_g_pat(svm);
+4 -7
arch/x86/kvm/vmx/vmx.c
···
  			return 1;
  		goto find_uret_msr;
  	case MSR_IA32_CR_PAT:
- 		if (!kvm_pat_valid(data))
- 			return 1;
+ 		ret = kvm_set_msr_common(vcpu, msr_info);
+ 		if (ret)
+ 			break;
  
  		if (is_guest_mode(vcpu) &&
  		    get_vmcs12(vcpu)->vm_exit_controls & VM_EXIT_SAVE_IA32_PAT)
  			get_vmcs12(vcpu)->guest_ia32_pat = data;
  
- 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+ 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
  			vmcs_write64(GUEST_IA32_PAT, data);
- 			vcpu->arch.pat = data;
- 			break;
- 		}
- 		ret = kvm_set_msr_common(vcpu, msr_info);
  		break;
  	case MSR_IA32_MCG_EXT_CTL:
  		if ((!msr_info->host_initiated &&
+30 -26
arch/x86/kvm/x86.c
··· 1017 1017 wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss); 1018 1018 } 1019 1019 1020 - #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS 1021 - if (static_cpu_has(X86_FEATURE_PKU) && 1020 + if (cpu_feature_enabled(X86_FEATURE_PKU) && 1022 1021 vcpu->arch.pkru != vcpu->arch.host_pkru && 1023 1022 ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || 1024 1023 kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))) 1025 1024 write_pkru(vcpu->arch.pkru); 1026 - #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */ 1027 1025 } 1028 1026 EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state); 1029 1027 ··· 1030 1032 if (vcpu->arch.guest_state_protected) 1031 1033 return; 1032 1034 1033 - #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS 1034 - if (static_cpu_has(X86_FEATURE_PKU) && 1035 + if (cpu_feature_enabled(X86_FEATURE_PKU) && 1035 1036 ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || 1036 1037 kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))) { 1037 1038 vcpu->arch.pkru = rdpkru(); 1038 1039 if (vcpu->arch.pkru != vcpu->arch.host_pkru) 1039 1040 write_pkru(vcpu->arch.host_pkru); 1040 1041 } 1041 - #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */ 1042 1042 1043 1043 if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) { 1044 1044 ··· 1423 1427 EXPORT_SYMBOL_GPL(kvm_emulate_rdpmc); 1424 1428 1425 1429 /* 1426 - * List of msr numbers which we expose to userspace through KVM_GET_MSRS 1427 - * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. 1428 - * 1429 - * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) 1430 - * extract the supported MSRs from the related const lists. 1431 - * msrs_to_save is selected from the msrs_to_save_all to reflect the 1432 - * capabilities of the host cpu. This capabilities test skips MSRs that are 1433 - * kvm-specific. Those are put in emulated_msrs_all; filtering of emulated_msrs 1434 - * may depend on host virtualization features rather than host cpu features. 
1430 + * The three MSR lists (msrs_to_save, emulated_msrs, msr_based_features) track 1431 + * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS, 1432 + * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that 1433 + * require host support, i.e. should be probed via RDMSR. emulated_msrs holds 1434 + * MSRs that KVM emulates without strictly requiring host support. 1435 + * msr_based_features holds MSRs that enumerate features, i.e. are effectively 1436 + * CPUID leafs. Note, msr_based_features isn't mutually exclusive with 1437 + * msrs_to_save and emulated_msrs. 1435 1438 */ 1436 1439 1437 1440 static const u32 msrs_to_save_base[] = { ··· 1526 1531 MSR_IA32_UCODE_REV, 1527 1532 1528 1533 /* 1529 - * The following list leaves out MSRs whose values are determined 1530 - * by arch/x86/kvm/vmx/nested.c based on CPUID or other MSRs. 1531 - * We always support the "true" VMX control MSRs, even if the host 1532 - * processor does not, so I am putting these registers here rather 1533 - * than in msrs_to_save_all. 1534 + * KVM always supports the "true" VMX control MSRs, even if the host 1535 + * does not. The VMX MSRs as a whole are considered "emulated" as KVM 1536 + * doesn't strictly require them to exist in the host (ignoring that 1537 + * KVM would refuse to load in the first place if the core set of MSRs 1538 + * aren't supported). 1534 1539 */ 1535 1540 MSR_IA32_VMX_BASIC, 1536 1541 MSR_IA32_VMX_TRUE_PINBASED_CTLS, ··· 1626 1631 * If we're doing cache flushes (either "always" or "cond") 1627 1632 * we will do one whenever the guest does a vmlaunch/vmresume. 1628 1633 * If an outer hypervisor is doing the cache flush for us 1629 - * (VMENTER_L1D_FLUSH_NESTED_VM), we can safely pass that 1634 - * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that 1630 1635 * capability to the guest too, and if EPT is disabled we're not 1631 1636 * vulnerable.
Overall, only VMENTER_L1D_FLUSH_NEVER will 1632 1637 * require a nested hypervisor to do a flush of its own. ··· 1804 1809 unsigned long *bitmap = ranges[i].bitmap; 1805 1810 1806 1811 if ((index >= start) && (index < end) && (flags & type)) { 1807 - allowed = !!test_bit(index - start, bitmap); 1812 + allowed = test_bit(index - start, bitmap); 1808 1813 break; 1809 1814 } 1810 1815 } ··· 3697 3702 return 1; 3698 3703 } 3699 3704 break; 3700 - case 0x200 ... MSR_IA32_MC0_CTL2 - 1: 3701 - case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff: 3705 + case MSR_IA32_CR_PAT: 3706 + if (!kvm_pat_valid(data)) 3707 + return 1; 3708 + 3709 + vcpu->arch.pat = data; 3710 + break; 3711 + case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: 3712 + case MSR_MTRRdefType: 3702 3713 return kvm_mtrr_set_msr(vcpu, msr, data); 3703 3714 case MSR_IA32_APICBASE: 3704 3715 return kvm_set_apic_base(vcpu, msr_info); ··· 4111 4110 msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset; 4112 4111 break; 4113 4112 } 4113 + case MSR_IA32_CR_PAT: 4114 + msr_info->data = vcpu->arch.pat; 4115 + break; 4114 4116 case MSR_MTRRcap: 4115 - case 0x200 ... MSR_IA32_MC0_CTL2 - 1: 4116 - case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff: 4117 + case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: 4118 + case MSR_MTRRdefType: 4117 4119 return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data); 4118 4120 case 0xcd: /* fsb frequency */ 4119 4121 msr_info->data = 3;
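The named case labels above replace the raw `0x200 ... 0x2ff` ranges, which notably swept MSR_IA32_CR_PAT (0x277) into the MTRR code — the reason PAT handling lived there in the first place. A self-contained sketch of the new range check, using the architectural MSR numbers (the helper name `msr_is_mtrr()` is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR numbers, as defined in the kernel headers. */
#define MTRRphysBase_MSR(n)	(0x200 + 2 * (n))
#define MSR_MTRRfix4K_F8000	0x26f
#define MSR_MTRRdefType		0x2ff
#define MSR_IA32_CR_PAT		0x277

/* Only genuine MTRR MSRs route to the MTRR code now; PAT (0x277) no
 * longer falls inside the matched ranges. */
static bool msr_is_mtrr(uint32_t msr)
{
	return (msr >= MTRRphysBase_MSR(0) && msr <= MSR_MTRRfix4K_F8000) ||
	       msr == MSR_MTRRdefType;
}
```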
-1
arch/x86/kvm/x86.h
··· 309 309 310 310 void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu); 311 311 u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); 312 - bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data); 313 312 int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data); 314 313 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); 315 314 bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
+21
tools/testing/selftests/kvm/x86_64/cpuid_test.c
··· 163 163 ent->eax = eax; 164 164 } 165 165 166 + static void test_get_cpuid2(struct kvm_vcpu *vcpu) 167 + { 168 + struct kvm_cpuid2 *cpuid = allocate_kvm_cpuid2(vcpu->cpuid->nent + 1); 169 + int i, r; 170 + 171 + vcpu_ioctl(vcpu, KVM_GET_CPUID2, cpuid); 172 + TEST_ASSERT(cpuid->nent == vcpu->cpuid->nent, 173 + "KVM didn't update nent on success, wanted %u, got %u\n", 174 + vcpu->cpuid->nent, cpuid->nent); 175 + 176 + for (i = 0; i < vcpu->cpuid->nent; i++) { 177 + cpuid->nent = i; 178 + r = __vcpu_ioctl(vcpu, KVM_GET_CPUID2, cpuid); 179 + TEST_ASSERT(r && errno == E2BIG, KVM_IOCTL_ERROR(KVM_GET_CPUID2, r)); 180 + TEST_ASSERT(cpuid->nent == i, "KVM modified nent on failure"); 181 + } 182 + free(cpuid); 183 + } 184 + 166 185 int main(void) 167 186 { 168 187 struct kvm_vcpu *vcpu; ··· 201 182 run_vcpu(vcpu, stage); 202 183 203 184 set_cpuid_after_run(vcpu); 185 + 186 + test_get_cpuid2(vcpu); 204 187 205 188 kvm_vm_free(vm); 206 189 }
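The contract the new selftest verifies — KVM_GET_CPUID2 updates nent on success but leaves it untouched when the caller's buffer is too small — can be modeled with a simplified, self-contained sketch. The struct layout, helper name, and returning E2BIG directly (rather than -1 with errno set, as the real ioctl does) are all illustrative simplifications:

```c
#include <errno.h>
#include <string.h>

/* Simplified stand-in for struct kvm_cpuid2. */
struct cpuid_buf {
	unsigned int nent;
	unsigned int entries[64];
};

/* Model of the KVM_GET_CPUID2 contract: on success, copy the entries
 * and update nent to the number copied; if the buffer is too small,
 * fail with E2BIG and leave the caller's nent untouched. */
static int model_get_cpuid2(const struct cpuid_buf *vcpu_cpuid,
			    struct cpuid_buf *user)
{
	if (user->nent < vcpu_cpuid->nent)
		return E2BIG;
	memcpy(user->entries, vcpu_cpuid->entries,
	       vcpu_cpuid->nent * sizeof(user->entries[0]));
	user->nent = vcpu_cpuid->nent;
	return 0;
}
```

The test's loop over `cpuid->nent = i` exercises exactly the failure half of this contract for every undersized buffer.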