Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
[PATCH] x86-64: Remove mk_pte_phys()
[PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
[PATCH] i386: fix 32-bit ioctls on x64_32
[PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
[PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
[PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
[PATCH] i386: Move mce_disabled to asm/mce.h
[PATCH] i386: paravirt unhandled fallthrough
[PATCH] x86_64: Wire up compat epoll_pwait
[PATCH] x86: Don't require the vDSO for handling a.out signals
[PATCH] i386: Fix Cyrix MediaGX detection
[PATCH] i386: Fix warning in cpu initialization
[PATCH] i386: Fix warning in microcode.c
[PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
[PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
[PATCH] i386: Remove fastcall in paravirt.[ch]
[PATCH] x86-64: Fix wrong gcc check in bitops.h
[PATCH] x86-64: survive having no irq mapping for a vector
[PATCH] i386: geode configuration fixes
[PATCH] i386: add option to show more code in oops reports
...

+4106 -1096
+8
Documentation/kernel-parameters.txt
···
104 104  Do not modify the syntax of boot loader parameters without extreme
105 105  need or coordination with <Documentation/i386/boot.txt>.
106 106
    107 + There are also arch-specific kernel-parameters not documented here.
    108 + See for example <Documentation/x86_64/boot-options.txt>.
    109 +
107 110  Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
108 111  a trailing = on the name of any parameter states that that parameter will
109 112  be entered as an environment variable, whereas its absence indicates that
···
363 360  		when calculating gettimeofday(). If specified
364 361  		clocksource is not available, it defaults to PIT.
365 362  		Format: { pit | tsc | cyclone | pmtmr }
    363 +
    364 + code_bytes	[IA32] How many bytes of object code to print in an
    365 + 		oops report.
    366 + 		Range: 0 - 8192
    367 + 		Default: 64
366 368
367 369  disable_8254_timer
368 370  enable_8254_timer
+79 -45
Documentation/x86_64/boot-options.txt
···
180 180  pci=lastbus=NUMBER	Scan upto NUMBER busses, no matter what the mptable says.
181 181  pci=noacpi		Don't use ACPI to set up PCI interrupt routing.
182 182
183     - IOMMU
    183 + IOMMU (input/output memory management unit)
184 184
185     - iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge]
186     - [,forcesac][,fullflush][,nomerge][,noaperture][,calgary]
187     - size  set size of iommu (in bytes)
188     - noagp don't initialize the AGP driver and use full aperture.
189     - off   don't use the IOMMU
190     - leak  turn on simple iommu leak tracing (only when CONFIG_IOMMU_LEAK is on)
191     - memaper[=order] allocate an own aperture over RAM with size 32MB^order.
192     - noforce don't force IOMMU usage. Default.
193     - force Force IOMMU.
194     - merge Do SG merging. Implies force (experimental)
195     - nomerge Don't do SG merging.
196     - forcesac For SAC mode for masks <40bits (experimental)
197     - fullflush Flush IOMMU on each allocation (default)
198     - nofullflush Don't use IOMMU fullflush
199     - allowed overwrite iommu off workarounds for specific chipsets.
200     - soft Use software bounce buffering (default for Intel machines)
201     - noaperture Don't touch the aperture for AGP.
202     - allowdac Allow DMA >4GB
203     - When off all DMA over >4GB is forced through an IOMMU or bounce
204     - buffering.
205     - nodac Forbid DMA >4GB
206     - panic Always panic when IOMMU overflows
207     - calgary Use the Calgary IOMMU if it is available
    185 + Currently four x86-64 PCI-DMA mapping implementations exist:
208 186
209     - swiotlb=pages[,force]
    187 + 1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all
    188 +    (e.g. because you have < 3 GB memory).
    189 +    Kernel boot message: "PCI-DMA: Disabling IOMMU"
210 190
211     - pages  Prereserve that many 128K pages for the software IO bounce buffering.
212     - force  Force all IO through the software TLB.
    191 + 2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU.
    192 +    Kernel boot message: "PCI-DMA: using GART IOMMU"
213 193
214     - calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
215     - calgary=[translate_empty_slots]
216     - calgary=[disable=<PCI bus number>]
    194 + 3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
    195 +    e.g. if there is no hardware IOMMU in the system and it is needed because
    196 +    you have >3GB memory or told the kernel to use it (iommu=soft).
    197 +    Kernel boot message: "PCI-DMA: Using software bounce buffering
    198 +    for IO (SWIOTLB)"
    199 +
    200 + 4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM
    201 +    pSeries and xSeries servers. This hardware IOMMU supports DMA address
    202 +    mapping with memory protection, etc.
    203 +    Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
    204 +
    205 + iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>]
    206 + 	[,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge]
    207 + 	[,noaperture][,calgary]
    208 +
    209 + General iommu options:
    210 +   off		Don't initialize and use any kind of IOMMU.
    211 +   noforce	Don't force hardware IOMMU usage when it is not needed.
    212 +		(default).
    213 +   force	Force the use of the hardware IOMMU even when it is
    214 +		not actually needed (e.g. because < 3 GB memory).
    215 +   soft		Use software bounce buffering (SWIOTLB) (default for
    216 +		Intel machines). This can be used to prevent the usage
    217 +		of an available hardware IOMMU.
    218 +
    219 + iommu options only relevant to the AMD GART hardware IOMMU:
    220 +   <size>	Set the size of the remapping area in bytes.
    221 +   allowed	Overwrite iommu off workarounds for specific chipsets.
    222 +   fullflush	Flush IOMMU on each allocation (default).
    223 +   nofullflush	Don't use IOMMU fullflush.
    224 +   leak		Turn on simple iommu leak tracing (only when
    225 +		CONFIG_IOMMU_LEAK is on). Default number of leak pages
    226 +		is 20.
    227 +   memaper[=<order>]  Allocate an own aperture over RAM with size 32MB<<order.
    228 +		(default: order=1, i.e. 64MB)
    229 +   merge	Do scatter-gather (SG) merging. Implies "force"
    230 +		(experimental).
    231 +   nomerge	Don't do scatter-gather (SG) merging.
    232 +   noaperture	Ask the IOMMU not to touch the aperture for AGP.
    233 +   forcesac	Force single-address cycle (SAC) mode for masks <40bits
    234 +		(experimental).
    235 +   noagp	Don't initialize the AGP driver and use full aperture.
    236 +   allowdac	Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
    237 +		DAC is used with 32-bit PCI to push a 64-bit address in
    238 +		two cycles. When off all DMA over >4GB is forced through
    239 +		an IOMMU or software bounce buffering.
    240 +   nodac	Forbid DAC mode, i.e. DMA >4GB.
    241 +   panic	Always panic when IOMMU overflows.
    242 +   calgary	Use the Calgary IOMMU if it is available
    243 +
    244 + iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
    245 + implementation:
    246 +   swiotlb=<pages>[,force]
    247 +   <pages>	Prereserve that many 128K pages for the software IO
    248 +		bounce buffering.
    249 +   force	Force all IO through the software TLB.
    250 +
    251 + Settings for the IBM Calgary hardware IOMMU currently found in IBM
    252 + pSeries and xSeries machines:
    253 +
    254 +   calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
    255 +   calgary=[translate_empty_slots]
    256 +   calgary=[disable=<PCI bus number>]
    257 +   panic	Always panic when IOMMU overflows
217 258
218 259  64k,...,8M - Set the size of each PCI slot's translation table
219 260  when using the Calgary IOMMU. This is the size of the translation
···
275 234
276 235  Debugging
277 236
278     - oops=panic Always panic on oopses. Default is to just kill the process,
279     - 	but there is a small probability of deadlocking the machine.
280     - 	This will also cause panics on machine check exceptions.
281     - 	Useful together with panic=30 to trigger a reboot.
    237 + oops=panic	Always panic on oopses. Default is to just kill the process,
    238 + 		but there is a small probability of deadlocking the machine.
    239 + 		This will also cause panics on machine check exceptions.
    240 + 		Useful together with panic=30 to trigger a reboot.
282 241
283     - kstack=N Print that many words from the kernel stack in oops dumps.
    242 + kstack=N	Print N words from the kernel stack in oops dumps.
284 243
285     - pagefaulttrace Dump all page faults. Only useful for extreme debugging
    244 + pagefaulttrace	Dump all page faults. Only useful for extreme debugging
286 245  	and will create a lot of output.
287 246
288 247  call_trace=[old|both|newfallback|new]
···
292 251  	newfallback: use new unwinder but fall back to old if it gets
293 252  	stuck (default)
294 253
295     - call_trace=[old|both|newfallback|new]
296     - 	old: use old inexact backtracer
297     - 	new: use new exact dwarf2 unwinder
298     - 	both: print entries from both
299     - 	newfallback: use new unwinder but fall back to old if it gets
300     - 	stuck (default)
301     -
302     - Misc
    254 + Miscellaneous
303 255
304 256  noreplacement  Don't replace instructions with more appropriate ones
305 257  	for the CPU. This may be useful on asymmetric MP systems
306     - 	where some CPU have less capabilities than the others.
    258 + 	where some CPUs have less capabilities than others.
+1 -1
Documentation/x86_64/cpu-hotplug-spec
···
2 2  ---------------------------------------------------
3 3
4 4  Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to
5   - know in advance boot time the maximum number of CPUs that could be plugged
  5 + know in advance of boot time the maximum number of CPUs that could be plugged
6 6  into the system. ACPI 3.0 currently has no official way to supply
7 7  this information from the firmware to the operating system.
8 8
+13 -13
Documentation/x86_64/kernel-stacks
···
9 9  except for the thread_info structure at the bottom.
10 10
11 11  In addition to the per thread stacks, there are specialized stacks
12    - associated with each cpu.  These stacks are only used while the kernel
13    - is in control on that cpu, when a cpu returns to user space the
14    - specialized stacks contain no useful data.  The main cpu stacks is
   12 + associated with each CPU.  These stacks are only used while the kernel
   13 + is in control on that CPU; when a CPU returns to user space the
   14 + specialized stacks contain no useful data.  The main CPU stacks are:
15 15
16 16  * Interrupt stack.  IRQSTACKSIZE
17 17
···
32 32  to automatically switch to a new stack for designated events such as
33 33  double fault or NMI, which makes it easier to handle these unusual
34 34  events on x86_64.  This feature is called the Interrupt Stack Table
35    - (IST).  There can be up to 7 IST entries per cpu. The IST code is an
36    - index into the Task State Segment (TSS), the IST entries in the TSS
37    - point to dedicated stacks, each stack can be a different size.
   35 + (IST).  There can be up to 7 IST entries per CPU. The IST code is an
   36 + index into the Task State Segment (TSS). The IST entries in the TSS
   37 + point to dedicated stacks; each stack can be a different size.
38 38
39    - An IST is selected by an non-zero value in the IST field of an
   39 + An IST is selected by a non-zero value in the IST field of an
40 40  interrupt-gate descriptor.  When an interrupt occurs and the hardware
41 41  loads such a descriptor, the hardware automatically sets the new stack
42 42  pointer based on the IST value, then invokes the interrupt handler.  If
43 43  software wants to allow nested IST interrupts then the handler must
44 44  adjust the IST values on entry to and exit from the interrupt handler.
45    - (this is occasionally done, e.g. for debug exceptions)
   45 + (This is occasionally done, e.g. for debug exceptions.)
46 46
47 47  Events with different IST codes (i.e. with different stacks) can be
48 48  nested.  For example, a debug interrupt can safely be interrupted by an
···
58 58
59 59  Used for interrupt 12 - Stack Fault Exception (#SS).
60 60
61    - This allows to recover from invalid stack segments. Rarely
   61 + This allows the CPU to recover from invalid stack segments. Rarely
62 62  happens.
63 63
64 64  * DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
65 65
66 66  Used for interrupt 8 - Double Fault Exception (#DF).
67 67
68    - Invoked when handling a exception causes another exception. Happens
69    - when the kernel is very confused (e.g. kernel stack pointer corrupt)
70    - Using a separate stack allows to recover from it well enough in many
71    - cases to still output an oops.
   68 + Invoked when handling one exception causes another exception. Happens
   69 + when the kernel is very confused (e.g. kernel stack pointer corrupt).
   70 + Using a separate stack allows the kernel to recover from it well enough
   71 + in many cases to still output an oops.
72 72
73 73  * NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
74 74
+70
Documentation/x86_64/machinecheck
···
 1 +
 2 + Configurable sysfs parameters for the x86-64 machine check code.
 3 +
 4 + Machine checks report internal hardware error conditions detected
 5 + by the CPU. Uncorrected errors typically cause a machine check
 6 + (often with panic), corrected ones cause a machine check log entry.
 7 +
 8 + Machine checks are organized in banks (normally associated with
 9 + a hardware subsystem) and subevents in a bank. The exact meaning
10 + of the banks and subevents is CPU specific.
11 +
12 + mcelog knows how to decode them.
13 +
14 + When you see the "Machine check errors logged" message in the system
15 + log then mcelog should run to collect and decode machine check entries
16 + from /dev/mcelog. Normally mcelog should be run regularly from a cronjob.
17 +
18 + Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN
19 + (N = CPU number)
20 +
21 + The directory contains some configurable entries:
22 +
23 + Entries:
24 +
25 + bankNctl
26 + (N bank number)
27 + 	64bit Hex bitmask enabling/disabling specific subevents for bank N
28 + 	When a bit in the bitmask is zero then the respective
29 + 	subevent will not be reported.
30 + 	By default all events are enabled.
31 + 	Note that the BIOS maintains another mask to disable specific events
32 + 	per bank. This is not visible here.
33 +
34 + The following entries appear for each CPU, but they are truly shared
35 + between all CPUs.
36 +
37 + check_interval
38 + 	How often to poll for corrected machine check errors, in seconds
39 + 	(Note output is hexadecimal). Default 5 minutes.
40 +
41 + tolerant
42 + 	Tolerance level. When a machine check exception occurs for a non
43 + 	corrected machine check the kernel can take different actions.
44 + 	Since machine check exceptions can happen any time it is sometimes
45 + 	risky for the kernel to kill a process because it defies
46 + 	normal kernel locking rules. The tolerance level configures
47 + 	how hard the kernel tries to recover even at some risk of deadlock.
48 +
49 + 	0: always panic,
50 + 	1: panic if deadlock possible,
51 + 	2: try to avoid panic,
52 + 	3: never panic or exit (for testing only)
53 +
54 + 	Default: 1
55 +
56 + 	Note this only makes a difference if the CPU allows recovery
57 + 	from a machine check exception. Current x86 CPUs generally do not.
58 +
59 + trigger
60 + 	Program to run when a machine check event is detected.
61 + 	This is an alternative to running mcelog regularly from cron
62 + 	and allows events to be detected faster.
63 +
64 + TBD document entries for AMD threshold interrupt configuration
65 +
66 + For more details about the x86 machine check architecture
67 + see the Intel and AMD architecture manuals from their developer websites.
68 +
69 + For more details about the architecture see
70 + http://one.firstfloor.org/~andi/mce.pdf
+11 -11
Documentation/x86_64/mm.txt
···
3 3
4 4  Virtual memory map with 4 level page tables:
5 5
6    - 0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm
   6 + 0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
7 7  hole caused by [48:63] sign extension
8    - ffff800000000000 - ffff80ffffffffff (=40bits) guard hole
9    - ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of all phys. memory
10   - ffffc10000000000 - ffffc1ffffffffff (=40bits) hole
11   - ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space
   8 + ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
   9 + ffff810000000000 - ffffc0ffffffffff (=46 bits) direct mapping of all phys. memory
  10 + ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
  11 + ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
12 12  ... unused hole ...
13   - ffffffff80000000 - ffffffff82800000 (=40MB)   kernel text mapping, from phys 0
  13 + ffffffff80000000 - ffffffff82800000 (=40 MB)   kernel text mapping, from phys 0
14 14  ... unused hole ...
15   - ffffffff88000000 - fffffffffff00000 (=1919MB) module mapping space
  15 + ffffffff88000000 - fffffffffff00000 (=1919 MB) module mapping space
16 16
17   - The direct mapping covers all memory in the system upto the highest
  17 + The direct mapping covers all memory in the system up to the highest
18 18  memory address (this means in some cases it can also include PCI memory
19   - holes)
  19 + holes).
20 20
21 21  vmalloc space is lazily synchronized into the different PML4 pages of
22 22  the processes using the page fault handler, with init_level4_pgt as
23 23  reference.
24 24
25   - Current X86-64 implementations only support 40 bit of address space,
26   - but we support upto 46bits. This expands into MBZ space in the page tables.
  25 + Current X86-64 implementations only support 40 bits of address space,
  26 + but we support up to 46 bits. This expands into MBZ space in the page tables.
27 27
28 28  -Andi Kleen, Jul 2004
+1
MAINTAINERS
···
3779 3779  M:	ak@suse.de
3780 3780  L:	discuss@x86-64.org
3781 3781  W:	http://www.x86-64.org
     3782 + T:	quilt ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt-current
3782 3783  S:	Maintained
3783 3784
3784 3785  YAM DRIVER FOR AX.25
+18
arch/i386/Kconfig
···
203 203  	  However, when run without a hypervisor the kernel is
204 204  	  theoretically slower.  If in doubt, say N.
205 205
    206 + config VMI
    207 + 	bool "VMI Paravirt-ops support"
    208 + 	depends on PARAVIRT
    209 + 	default y
    210 + 	help
    211 + 	  VMI provides a paravirtualized interface to multiple hypervisors
    212 + 	  including VMware ESX server and Xen by connecting to a ROM module
    213 + 	  provided by the hypervisor.
    214 +
206 215  config ACPI_SRAT
207 216  	bool
208 217  	default y
···
1272 1263  config KTIME_SCALAR
1273 1264  	bool
1274 1265  	default y
     1266 +
     1267 + config NO_IDLE_HZ
     1268 + 	bool
     1269 + 	depends on PARAVIRT
     1270 + 	default y
     1271 + 	help
     1272 + 	  Switches the regular HZ timer off when the system is going idle.
     1273 + 	  This helps a hypervisor detect that the Linux system is idle,
     1274 + 	  reducing the overhead of idle systems.
-5
arch/i386/Kconfig.cpu
···
226 226  	depends on !M386
227 227  	default y
228 228
229     - config X86_XADD
230     - 	bool
231     - 	depends on !M386
232     - 	default y
233     -
234 229  config X86_L1_CACHE_SHIFT
235 230  	int
236 231  	default "7" if MPENTIUM4 || X86_GENERIC
+1 -1
arch/i386/Kconfig.debug
···
87 87
88 88  config DEBUG_PARAVIRT
89 89  	bool "Enable some paravirtualization debugging"
90    - 	default y
   90 + 	default n
91 91  	depends on PARAVIRT && DEBUG_KERNEL
92 92  	help
93 93  	  Currently deliberately clobbers regs which are allowed to be
+37 -14
arch/i386/defconfig
··· 1 1 # 2 2 # Automatically generated make config: don't edit 3 - # Linux kernel version: 2.6.20-rc3 4 - # Fri Jan 5 11:54:46 2007 3 + # Linux kernel version: 2.6.20-git8 4 + # Tue Feb 13 11:25:18 2007 5 5 # 6 6 CONFIG_X86_32=y 7 7 CONFIG_GENERIC_TIME=y ··· 10 10 CONFIG_SEMAPHORE_SLEEPERS=y 11 11 CONFIG_X86=y 12 12 CONFIG_MMU=y 13 + CONFIG_ZONE_DMA=y 13 14 CONFIG_GENERIC_ISA_DMA=y 14 15 CONFIG_GENERIC_IOMAP=y 15 16 CONFIG_GENERIC_BUG=y ··· 140 139 # CONFIG_MVIAC3_2 is not set 141 140 CONFIG_X86_GENERIC=y 142 141 CONFIG_X86_CMPXCHG=y 143 - CONFIG_X86_XADD=y 144 142 CONFIG_X86_L1_CACHE_SHIFT=7 145 143 CONFIG_RWSEM_XCHGADD_ALGORITHM=y 146 144 # CONFIG_ARCH_HAS_ILOG2_U32 is not set ··· 198 198 # CONFIG_SPARSEMEM_STATIC is not set 199 199 CONFIG_SPLIT_PTLOCK_CPUS=4 200 200 CONFIG_RESOURCES_64BIT=y 201 + CONFIG_ZONE_DMA_FLAG=1 201 202 # CONFIG_HIGHPTE is not set 202 203 # CONFIG_MATH_EMULATION is not set 203 204 CONFIG_MTRR=y ··· 212 211 CONFIG_HZ=250 213 212 # CONFIG_KEXEC is not set 214 213 # CONFIG_CRASH_DUMP is not set 214 + CONFIG_PHYSICAL_START=0x100000 215 215 # CONFIG_RELOCATABLE is not set 216 216 CONFIG_PHYSICAL_ALIGN=0x100000 217 217 # CONFIG_HOTPLUG_CPU is not set ··· 231 229 # ACPI (Advanced Configuration and Power Interface) Support 232 230 # 233 231 CONFIG_ACPI=y 232 + CONFIG_ACPI_PROCFS=y 234 233 CONFIG_ACPI_AC=y 235 234 CONFIG_ACPI_BATTERY=y 236 235 CONFIG_ACPI_BUTTON=y 237 - # CONFIG_ACPI_VIDEO is not set 238 236 # CONFIG_ACPI_HOTKEY is not set 239 237 CONFIG_ACPI_FAN=y 240 238 # CONFIG_ACPI_DOCK is not set 239 + # CONFIG_ACPI_BAY is not set 241 240 CONFIG_ACPI_PROCESSOR=y 242 241 CONFIG_ACPI_THERMAL=y 243 242 # CONFIG_ACPI_ASUS is not set ··· 309 306 CONFIG_PCI_MMCONFIG=y 310 307 # CONFIG_PCIEPORTBUS is not set 311 308 CONFIG_PCI_MSI=y 312 - # CONFIG_PCI_MULTITHREAD_PROBE is not set 313 309 # CONFIG_PCI_DEBUG is not set 314 310 # CONFIG_HT_IRQ is not set 315 311 CONFIG_ISA_DMA_API=y ··· 349 347 CONFIG_XFRM=y 350 348 # CONFIG_XFRM_USER is not set 351 
349 # CONFIG_XFRM_SUB_POLICY is not set 350 + # CONFIG_XFRM_MIGRATE is not set 352 351 # CONFIG_NET_KEY is not set 353 352 CONFIG_INET=y 354 353 CONFIG_IP_MULTICAST=y ··· 449 446 CONFIG_PREVENT_FIRMWARE_BUILD=y 450 447 CONFIG_FW_LOADER=y 451 448 # CONFIG_DEBUG_DRIVER is not set 449 + # CONFIG_DEBUG_DEVRES is not set 452 450 # CONFIG_SYS_HYPERVISOR is not set 453 451 454 452 # ··· 470 466 # 471 467 # Plug and Play support 472 468 # 473 - CONFIG_PNP=y 474 - CONFIG_PNPACPI=y 469 + # CONFIG_PNP is not set 475 470 476 471 # 477 472 # Block devices ··· 518 515 # CONFIG_BLK_DEV_IDETAPE is not set 519 516 # CONFIG_BLK_DEV_IDEFLOPPY is not set 520 517 # CONFIG_BLK_DEV_IDESCSI is not set 518 + CONFIG_BLK_DEV_IDEACPI=y 521 519 # CONFIG_IDE_TASK_IOCTL is not set 522 520 523 521 # ··· 551 547 # CONFIG_BLK_DEV_JMICRON is not set 552 548 # CONFIG_BLK_DEV_SC1200 is not set 553 549 CONFIG_BLK_DEV_PIIX=y 550 + # CONFIG_BLK_DEV_IT8213 is not set 554 551 # CONFIG_BLK_DEV_IT821X is not set 555 552 # CONFIG_BLK_DEV_NS87415 is not set 556 553 # CONFIG_BLK_DEV_PDC202XX_OLD is not set ··· 562 557 # CONFIG_BLK_DEV_SLC90E66 is not set 563 558 # CONFIG_BLK_DEV_TRM290 is not set 564 559 # CONFIG_BLK_DEV_VIA82CXXX is not set 560 + # CONFIG_BLK_DEV_TC86C001 is not set 565 561 # CONFIG_IDE_ARM is not set 566 562 CONFIG_BLK_DEV_IDEDMA=y 567 563 # CONFIG_IDEDMA_IVB is not set ··· 661 655 # Serial ATA (prod) and Parallel ATA (experimental) drivers 662 656 # 663 657 CONFIG_ATA=y 658 + # CONFIG_ATA_NONSTANDARD is not set 664 659 CONFIG_SATA_AHCI=y 665 660 CONFIG_SATA_SVW=y 666 661 CONFIG_ATA_PIIX=y ··· 677 670 # CONFIG_SATA_ULI is not set 678 671 CONFIG_SATA_VIA=y 679 672 # CONFIG_SATA_VITESSE is not set 673 + # CONFIG_SATA_INIC162X is not set 680 674 CONFIG_SATA_INTEL_COMBINED=y 681 675 # CONFIG_PATA_ALI is not set 682 676 # CONFIG_PATA_AMD is not set ··· 695 687 # CONFIG_PATA_HPT3X2N is not set 696 688 # CONFIG_PATA_HPT3X3 is not set 697 689 # CONFIG_PATA_IT821X is not set 690 + # CONFIG_PATA_IT8213 
is not set 698 691 # CONFIG_PATA_JMICRON is not set 699 692 # CONFIG_PATA_TRIFLEX is not set 700 693 # CONFIG_PATA_MARVELL is not set ··· 748 739 # Subsystem Options 749 740 # 750 741 # CONFIG_IEEE1394_VERBOSEDEBUG is not set 751 - # CONFIG_IEEE1394_OUI_DB is not set 752 742 # CONFIG_IEEE1394_EXTRA_CONFIG_ROMS is not set 753 - # CONFIG_IEEE1394_EXPORT_FULL_API is not set 754 743 755 744 # 756 745 # Device Drivers ··· 772 765 # I2O device support 773 766 # 774 767 # CONFIG_I2O is not set 768 + 769 + # 770 + # Macintosh device drivers 771 + # 772 + # CONFIG_MAC_EMUMOUSEBTN is not set 775 773 776 774 # 777 775 # Network device support ··· 845 833 # CONFIG_SUNDANCE is not set 846 834 # CONFIG_TLAN is not set 847 835 # CONFIG_VIA_RHINE is not set 836 + # CONFIG_SC92031 is not set 848 837 849 838 # 850 839 # Ethernet (1000 Mbit) ··· 868 855 CONFIG_TIGON3=y 869 856 CONFIG_BNX2=y 870 857 # CONFIG_QLA3XXX is not set 858 + # CONFIG_ATL1 is not set 871 859 872 860 # 873 861 # Ethernet (10000 Mbit) 874 862 # 875 863 # CONFIG_CHELSIO_T1 is not set 864 + # CONFIG_CHELSIO_T3 is not set 876 865 # CONFIG_IXGB is not set 877 866 # CONFIG_S2IO is not set 878 867 # CONFIG_MYRI10GE is not set ··· 1105 1090 # Open Sound System 1106 1091 # 1107 1092 CONFIG_SOUND_PRIME=y 1093 + CONFIG_OBSOLETE_OSS=y 1108 1094 # CONFIG_SOUND_BT878 is not set 1109 1095 # CONFIG_SOUND_ES1371 is not set 1110 1096 CONFIG_SOUND_ICH=y ··· 1119 1103 # HID Devices 1120 1104 # 1121 1105 CONFIG_HID=y 1106 + # CONFIG_HID_DEBUG is not set 1122 1107 1123 1108 # 1124 1109 # USB support ··· 1134 1117 # Miscellaneous USB options 1135 1118 # 1136 1119 CONFIG_USB_DEVICEFS=y 1137 - # CONFIG_USB_BANDWIDTH is not set 1138 1120 # CONFIG_USB_DYNAMIC_MINORS is not set 1139 1121 # CONFIG_USB_SUSPEND is not set 1140 - # CONFIG_USB_MULTITHREAD_PROBE is not set 1141 1122 # CONFIG_USB_OTG is not set 1142 1123 1143 1124 # ··· 1145 1130 # CONFIG_USB_EHCI_SPLIT_ISO is not set 1146 1131 # CONFIG_USB_EHCI_ROOT_HUB_TT is not set 1147 1132 # 
CONFIG_USB_EHCI_TT_NEWSCHED is not set 1133 + # CONFIG_USB_EHCI_BIG_ENDIAN_MMIO is not set 1148 1134 # CONFIG_USB_ISP116X_HCD is not set 1149 1135 CONFIG_USB_OHCI_HCD=y 1150 - # CONFIG_USB_OHCI_BIG_ENDIAN is not set 1136 + # CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set 1137 + # CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set 1151 1138 CONFIG_USB_OHCI_LITTLE_ENDIAN=y 1152 1139 CONFIG_USB_UHCI_HCD=y 1153 1140 # CONFIG_USB_SL811_HCD is not set ··· 1200 1183 # CONFIG_USB_ATI_REMOTE2 is not set 1201 1184 # CONFIG_USB_KEYSPAN_REMOTE is not set 1202 1185 # CONFIG_USB_APPLETOUCH is not set 1186 + # CONFIG_USB_GTCO is not set 1203 1187 1204 1188 # 1205 1189 # USB Imaging devices ··· 1303 1285 1304 1286 # 1305 1287 # DMA Devices 1288 + # 1289 + 1290 + # 1291 + # Auxiliary Display support 1306 1292 # 1307 1293 1308 1294 # ··· 1502 1480 # CONFIG_DEBUG_FS is not set 1503 1481 # CONFIG_HEADERS_CHECK is not set 1504 1482 CONFIG_DEBUG_KERNEL=y 1483 + # CONFIG_DEBUG_SHIRQ is not set 1505 1484 CONFIG_LOG_BUF_SHIFT=18 1506 1485 CONFIG_DETECT_SOFTLOCKUP=y 1507 1486 # CONFIG_SCHEDSTATS is not set ··· 1511 1488 # CONFIG_RT_MUTEX_TESTER is not set 1512 1489 # CONFIG_DEBUG_SPINLOCK is not set 1513 1490 # CONFIG_DEBUG_MUTEXES is not set 1514 - # CONFIG_DEBUG_RWSEMS is not set 1515 1491 # CONFIG_DEBUG_LOCK_ALLOC is not set 1516 1492 # CONFIG_PROVE_LOCKING is not set 1517 1493 # CONFIG_DEBUG_SPINLOCK_SLEEP is not set ··· 1555 1533 # CONFIG_LIBCRC32C is not set 1556 1534 CONFIG_ZLIB_INFLATE=y 1557 1535 CONFIG_PLIST=y 1558 - CONFIG_IOMAP_COPY=y 1536 + CONFIG_HAS_IOMEM=y 1537 + CONFIG_HAS_IOPORT=y 1559 1538 CONFIG_GENERIC_HARDIRQS=y 1560 1539 CONFIG_GENERIC_IRQ_PROBE=y 1561 1540 CONFIG_GENERIC_PENDING_IRQ=y
+2 -1
arch/i386/kernel/Makefile
···
40 40  obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
41 41  obj-$(CONFIG_K8_NB)		+= k8.o
42 42
43    - # Make sure this is linked after any other paravirt_ops structs: see head.S
   43 + obj-$(CONFIG_VMI)		+= vmi.o vmitime.o
44 44  obj-$(CONFIG_PARAVIRT)		+= paravirt.o
   45 + obj-y				+= pcspeaker.o
45 46
46 47  EXTRA_AFLAGS   := -traditional
47 48
+5 -1
arch/i386/kernel/apic.c
···
36 36  #include <asm/hpet.h>
37 37  #include <asm/i8253.h>
38 38  #include <asm/nmi.h>
   39 + #include <asm/idle.h>
39 40
40 41  #include <mach_apic.h>
41 42  #include <mach_apicdef.h>
···
1256 1255  	 * Besides, if we don't timer interrupts ignore the global
1257 1256  	 * interrupt lock, which is the WrongThing (tm) to do.
1258 1257  	 */
     1258 + 	exit_idle();
1259 1259  	irq_enter();
1260 1260  	smp_local_timer_interrupt();
1261 1261  	irq_exit();
···
1307 1305  {
1308 1306  	unsigned long v;
1309 1307
     1308 + 	exit_idle();
1310 1309  	irq_enter();
1311 1310  	/*
1312 1311  	 * Check if this really is a spurious interrupt and ACK it
···
1332 1329  {
1333 1330  	unsigned long v, v1;
1334 1331
     1332 + 	exit_idle();
1335 1333  	irq_enter();
1336 1334  	/* First tickle the hardware, only then report what went on. -- REW */
1337 1335  	v = apic_read(APIC_ESR);
···
1399 1395  	if (!skip_ioapic_setup && nr_ioapics)
1400 1396  		setup_IO_APIC();
1401 1397  #endif
1402      - 	setup_boot_APIC_clock();
     1398 + 	setup_boot_clock();
1403 1399
1404 1400  	return 0;
1405 1401  }
+19 -9
arch/i386/kernel/apm.c
···
211 211  #include <linux/slab.h>
212 212  #include <linux/stat.h>
213 213  #include <linux/proc_fs.h>
    214 + #include <linux/seq_file.h>
214 215  #include <linux/miscdevice.h>
215 216  #include <linux/apm_bios.h>
216 217  #include <linux/init.h>
···
1637 1636  	return 0;
1638 1637  }
1639 1638
1640      - static int apm_get_info(char *buf, char **start, off_t fpos, int length)
     1639 + static int proc_apm_show(struct seq_file *m, void *v)
1641 1640  {
1642      - 	char *		p;
1643 1641  	unsigned short	bx;
1644 1642  	unsigned short	cx;
1645 1643  	unsigned short	dx;
···
1649 1649  	int             percentage   = -1;
1650 1650  	int             time_units   = -1;
1651 1651  	char            *units       = "?";
1652      -
1653      - 	p = buf;
1654 1652
1655 1653  	if ((num_online_cpus() == 1) &&
1656 1654  	    !(error = apm_get_power_status(&bx, &cx, &dx))) {
···
1703 1705  	   -1: Unknown
1704 1706  	8) min = minutes; sec = seconds */
1705 1707
1706      - 	p += sprintf(p, "%s %d.%d 0x%02x 0x%02x 0x%02x 0x%02x %d%% %d %s\n",
     1708 + 	seq_printf(m, "%s %d.%d 0x%02x 0x%02x 0x%02x 0x%02x %d%% %d %s\n",
1707 1709  		     driver_version,
1708 1710  		     (apm_info.bios.version >> 8) & 0xff,
1709 1711  		     apm_info.bios.version & 0xff,
···
1714 1716  		     percentage,
1715 1717  		     time_units,
1716 1718  		     units);
1717      -
1718      - 	return p - buf;
     1719 + 	return 0;
1719 1720  }
     1721 +
     1722 + static int proc_apm_open(struct inode *inode, struct file *file)
     1723 + {
     1724 + 	return single_open(file, proc_apm_show, NULL);
     1725 + }
     1726 +
     1727 + static const struct file_operations apm_file_ops = {
     1728 + 	.owner		= THIS_MODULE,
     1729 + 	.open		= proc_apm_open,
     1730 + 	.read		= seq_read,
     1731 + 	.llseek		= seq_lseek,
     1732 + 	.release	= single_release,
     1733 + };
1720 1734
1721 1735  static int apm(void *unused)
1722 1736  {
···
2351 2341  	set_base(gdt[APM_DS >> 3],
2352 2342  		 __va((unsigned long)apm_info.bios.dseg << 4));
2353 2343
2354      - 	apm_proc = create_proc_info_entry("apm", 0, NULL, apm_get_info);
     2344 + 	apm_proc = create_proc_entry("apm", 0, NULL);
2355 2345  	if (apm_proc)
2356      - 		apm_proc->owner = THIS_MODULE;
     2346 + 		apm_proc->proc_fops = &apm_file_ops;
2357 2347
2358 2348  	kapmd_task = kthread_create(apm, NULL, "kapmd");
2359 2349  	if (IS_ERR(kapmd_task)) {
+1 -1
arch/i386/kernel/asm-offsets.c
···
72 72  	OFFSET(PT_EAX, pt_regs, eax);
73 73  	OFFSET(PT_DS,  pt_regs, xds);
74 74  	OFFSET(PT_ES,  pt_regs, xes);
75    - 	OFFSET(PT_GS,  pt_regs, xgs);
   75 + 	OFFSET(PT_FS,  pt_regs, xfs);
76 76  	OFFSET(PT_ORIG_EAX, pt_regs, orig_eax);
77 77  	OFFSET(PT_EIP, pt_regs, eip);
78 78  	OFFSET(PT_CS,  pt_regs, xcs);
+7 -7
arch/i386/kernel/cpu/common.c
···
605 605  struct pt_regs * __devinit idle_regs(struct pt_regs *regs)
606 606  {
607 607  	memset(regs, 0, sizeof(struct pt_regs));
608     - 	regs->xgs = __KERNEL_PDA;
    608 + 	regs->xfs = __KERNEL_PDA;
609 609  	return regs;
610 610  }
611 611
···
662 662  	.pcurrent = &init_task,
663 663  };
664 664
665     - static inline void set_kernel_gs(void)
    665 + static inline void set_kernel_fs(void)
666 666  {
667     - 	/* Set %gs for this CPU's PDA.  Memory clobber is to create a
    667 + 	/* Set %fs for this CPU's PDA.  Memory clobber is to create a
668 668  	   barrier with respect to any PDA operations, so the compiler
669 669  	   doesn't move any before here. */
670     - 	asm volatile ("mov %0, %%gs" : : "r" (__KERNEL_PDA) : "memory");
    670 + 	asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
671 671  }
672 672
673 673  /* Initialize the CPU's GDT and PDA.  The boot CPU does this for
···
718 718  	   the boot CPU, this will transition from the boot gdt+pda to
719 719  	   the real ones). */
720 720  	load_gdt(cpu_gdt_descr);
721     - 	set_kernel_gs();
    721 + 	set_kernel_fs();
722 722  }
723 723
724 724  /* Common CPU init for both boot and secondary CPUs */
···
764 764  	__set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss);
765 765  #endif
766 766
767     - 	/* Clear %fs. */
768     - 	asm volatile ("mov %0, %%fs" : : "r" (0));
    767 + 	/* Clear %gs. */
    768 + 	asm volatile ("mov %0, %%gs" : : "r" (0));
769 769
770 770  	/* Clear all 6 debug registers: */
771 771  	set_debugreg(0, 0);
+29 -23
arch/i386/kernel/cpu/cyrix.c
··· 6 6 #include <asm/io.h> 7 7 #include <asm/processor.h> 8 8 #include <asm/timer.h> 9 + #include <asm/pci-direct.h> 9 10 10 11 #include "cpu.h" 11 12 ··· 162 161 static void __cpuinit geode_configure(void) 163 162 { 164 163 unsigned long flags; 165 - u8 ccr3, ccr4; 164 + u8 ccr3; 166 165 local_irq_save(flags); 167 166 168 167 /* Suspend on halt power saving and enable #SUSP pin */ 169 168 setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88); 170 169 171 170 ccr3 = getCx86(CX86_CCR3); 172 - setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10); /* Enable */ 171 + setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10); /* enable MAPEN */ 173 172 174 - ccr4 = getCx86(CX86_CCR4); 175 - ccr4 |= 0x38; /* FPU fast, DTE cache, Mem bypass */ 176 - 177 - setCx86(CX86_CCR3, ccr3); 173 + 174 + /* FPU fast, DTE cache, Mem bypass */ 175 + setCx86(CX86_CCR4, getCx86(CX86_CCR4) | 0x38); 176 + setCx86(CX86_CCR3, ccr3); /* disable MAPEN */ 178 177 179 178 set_cx86_memwb(); 180 179 set_cx86_reorder(); ··· 183 182 local_irq_restore(flags); 184 183 } 185 184 186 - 187 - #ifdef CONFIG_PCI 188 - static struct pci_device_id __cpuinitdata cyrix_55x0[] = { 189 - { PCI_DEVICE(PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5510) }, 190 - { PCI_DEVICE(PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5520) }, 191 - { }, 192 - }; 193 - #endif 194 185 195 186 static void __cpuinit init_cyrix(struct cpuinfo_x86 *c) 196 187 { ··· 251 258 252 259 case 4: /* MediaGX/GXm or Geode GXM/GXLV/GX1 */ 253 260 #ifdef CONFIG_PCI 261 + { 262 + u32 vendor, device; 254 263 /* It isn't really a PCI quirk directly, but the cure is the 255 264 same. The MediaGX has deep magic SMM stuff that handles the 256 265 SB emulation. It thows away the fifo on disable_dma() which ··· 268 273 printk(KERN_INFO "Working around Cyrix MediaGX virtual DMA bugs.\n"); 269 274 isa_dma_bridge_buggy = 2; 270 275 276 + /* We do this before the PCI layer is running. 
However we 277 + are safe here as we know the bridge must be a Cyrix 278 + companion and must be present */ 279 + vendor = read_pci_config_16(0, 0, 0x12, PCI_VENDOR_ID); 280 + device = read_pci_config_16(0, 0, 0x12, PCI_DEVICE_ID); 271 281 272 282 /* 273 283 * The 5510/5520 companion chips have a funky PIT. 274 284 */ 275 - if (pci_dev_present(cyrix_55x0)) 285 + if (vendor == PCI_VENDOR_ID_CYRIX && 286 + (device == PCI_DEVICE_ID_CYRIX_5510 || device == PCI_DEVICE_ID_CYRIX_5520)) 276 287 pit_latch_buggy = 1; 288 + } 277 289 #endif 278 290 c->x86_cache_size=16; /* Yep 16K integrated cache thats it */ 279 291 280 292 /* GXm supports extended cpuid levels 'ala' AMD */ 281 293 if (c->cpuid_level == 2) { 282 294 /* Enable cxMMX extensions (GX1 Datasheet 54) */ 283 - setCx86(CX86_CCR7, getCx86(CX86_CCR7)|1); 295 + setCx86(CX86_CCR7, getCx86(CX86_CCR7) | 1); 284 296 285 - /* GXlv/GXm/GX1 */ 286 - if((dir1 >= 0x50 && dir1 <= 0x54) || dir1 >= 0x63) 297 + /* 298 + * GXm : 0x30 ... 0x5f GXm datasheet 51 299 + * GXlv: 0x6x GXlv datasheet 54 300 + * ? 
: 0x7x 301 + * GX1 : 0x8x GX1 datasheet 56 302 + */ 303 + if((0x30 <= dir1 && dir1 <= 0x6f) || (0x80 <=dir1 && dir1 <= 0x8f)) 287 304 geode_configure(); 288 305 get_model_name(c); /* get CPU marketing name */ 289 306 return; ··· 422 415 423 416 if (dir0 == 5 || dir0 == 3) 424 417 { 425 - unsigned char ccr3, ccr4; 418 + unsigned char ccr3; 426 419 unsigned long flags; 427 420 printk(KERN_INFO "Enabling CPUID on Cyrix processor.\n"); 428 421 local_irq_save(flags); 429 422 ccr3 = getCx86(CX86_CCR3); 430 - setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10); /* enable MAPEN */ 431 - ccr4 = getCx86(CX86_CCR4); 432 - setCx86(CX86_CCR4, ccr4 | 0x80); /* enable cpuid */ 433 - setCx86(CX86_CCR3, ccr3); /* disable MAPEN */ 423 + setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10); /* enable MAPEN */ 424 + setCx86(CX86_CCR4, getCx86(CX86_CCR4) | 0x80); /* enable cpuid */ 425 + setCx86(CX86_CCR3, ccr3); /* disable MAPEN */ 434 426 local_irq_restore(flags); 435 427 } 436 428 }
+1
arch/i386/kernel/cpu/mcheck/mce.c
··· 12 12 13 13 #include <asm/processor.h> 14 14 #include <asm/system.h> 15 + #include <asm/mce.h> 15 16 16 17 #include "mce.h" 17 18
+1 -1
arch/i386/kernel/cpu/mcheck/mce.h
··· 1 1 #include <linux/init.h> 2 + #include <asm/mce.h> 2 3 3 4 void amd_mcheck_init(struct cpuinfo_x86 *c); 4 5 void intel_p4_mcheck_init(struct cpuinfo_x86 *c); ··· 10 9 /* Call the installed machine check handler for this CPU setup. */ 11 10 extern fastcall void (*machine_check_vector)(struct pt_regs *, long error_code); 12 11 13 - extern int mce_disabled; 14 12 extern int nr_mce_banks; 15 13
+2
arch/i386/kernel/cpu/mcheck/p4.c
··· 12 12 #include <asm/system.h> 13 13 #include <asm/msr.h> 14 14 #include <asm/apic.h> 15 + #include <asm/idle.h> 15 16 16 17 #include <asm/therm_throt.h> 17 18 ··· 60 59 61 60 fastcall void smp_thermal_interrupt(struct pt_regs *regs) 62 61 { 62 + exit_idle(); 63 63 irq_enter(); 64 64 vendor_thermal_interrupt(regs); 65 65 irq_exit();
+30
arch/i386/kernel/cpu/mtrr/if.c
··· 211 211 default: 212 212 return -ENOTTY; 213 213 case MTRRIOC_ADD_ENTRY: 214 + #ifdef CONFIG_COMPAT 215 + case MTRRIOC32_ADD_ENTRY: 216 + #endif 214 217 if (!capable(CAP_SYS_ADMIN)) 215 218 return -EPERM; 216 219 err = ··· 221 218 file, 0); 222 219 break; 223 220 case MTRRIOC_SET_ENTRY: 221 + #ifdef CONFIG_COMPAT 222 + case MTRRIOC32_SET_ENTRY: 223 + #endif 224 224 if (!capable(CAP_SYS_ADMIN)) 225 225 return -EPERM; 226 226 err = mtrr_add(sentry.base, sentry.size, sentry.type, 0); 227 227 break; 228 228 case MTRRIOC_DEL_ENTRY: 229 + #ifdef CONFIG_COMPAT 230 + case MTRRIOC32_DEL_ENTRY: 231 + #endif 229 232 if (!capable(CAP_SYS_ADMIN)) 230 233 return -EPERM; 231 234 err = mtrr_file_del(sentry.base, sentry.size, file, 0); 232 235 break; 233 236 case MTRRIOC_KILL_ENTRY: 237 + #ifdef CONFIG_COMPAT 238 + case MTRRIOC32_KILL_ENTRY: 239 + #endif 234 240 if (!capable(CAP_SYS_ADMIN)) 235 241 return -EPERM; 236 242 err = mtrr_del(-1, sentry.base, sentry.size); 237 243 break; 238 244 case MTRRIOC_GET_ENTRY: 245 + #ifdef CONFIG_COMPAT 246 + case MTRRIOC32_GET_ENTRY: 247 + #endif 239 248 if (gentry.regnum >= num_var_ranges) 240 249 return -EINVAL; 241 250 mtrr_if->get(gentry.regnum, &gentry.base, &size, &type); ··· 264 249 265 250 break; 266 251 case MTRRIOC_ADD_PAGE_ENTRY: 252 + #ifdef CONFIG_COMPAT 253 + case MTRRIOC32_ADD_PAGE_ENTRY: 254 + #endif 267 255 if (!capable(CAP_SYS_ADMIN)) 268 256 return -EPERM; 269 257 err = ··· 274 256 file, 1); 275 257 break; 276 258 case MTRRIOC_SET_PAGE_ENTRY: 259 + #ifdef CONFIG_COMPAT 260 + case MTRRIOC32_SET_PAGE_ENTRY: 261 + #endif 277 262 if (!capable(CAP_SYS_ADMIN)) 278 263 return -EPERM; 279 264 err = mtrr_add_page(sentry.base, sentry.size, sentry.type, 0); 280 265 break; 281 266 case MTRRIOC_DEL_PAGE_ENTRY: 267 + #ifdef CONFIG_COMPAT 268 + case MTRRIOC32_DEL_PAGE_ENTRY: 269 + #endif 282 270 if (!capable(CAP_SYS_ADMIN)) 283 271 return -EPERM; 284 272 err = mtrr_file_del(sentry.base, sentry.size, file, 1); 285 273 break; 286 274 case 
MTRRIOC_KILL_PAGE_ENTRY: 275 + #ifdef CONFIG_COMPAT 276 + case MTRRIOC32_KILL_PAGE_ENTRY: 277 + #endif 287 278 if (!capable(CAP_SYS_ADMIN)) 288 279 return -EPERM; 289 280 err = mtrr_del_page(-1, sentry.base, sentry.size); 290 281 break; 291 282 case MTRRIOC_GET_PAGE_ENTRY: 283 + #ifdef CONFIG_COMPAT 284 + case MTRRIOC32_GET_PAGE_ENTRY: 285 + #endif 292 286 if (gentry.regnum >= num_var_ranges) 293 287 return -EINVAL; 294 288 mtrr_if->get(gentry.regnum, &gentry.base, &size, &type);
+3 -3
arch/i386/kernel/cpu/mtrr/main.c
··· 50 50 unsigned int *usage_table; 51 51 static DEFINE_MUTEX(mtrr_mutex); 52 52 53 - u32 size_or_mask, size_and_mask; 53 + u64 size_or_mask, size_and_mask; 54 54 55 55 static struct mtrr_ops * mtrr_ops[X86_VENDOR_NUM] = {}; 56 56 ··· 662 662 boot_cpu_data.x86_mask == 0x4)) 663 663 phys_addr = 36; 664 664 665 - size_or_mask = ~((1 << (phys_addr - PAGE_SHIFT)) - 1); 666 - size_and_mask = ~size_or_mask & 0xfff00000; 665 + size_or_mask = ~((1ULL << (phys_addr - PAGE_SHIFT)) - 1); 666 + size_and_mask = ~size_or_mask & 0xfffff00000ULL; 667 667 } else if (boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR && 668 668 boot_cpu_data.x86 == 6) { 669 669 /* VIA C* family have Intel style MTRRs, but
+1 -1
arch/i386/kernel/cpu/mtrr/mtrr.h
··· 84 84 85 85 extern void set_mtrr_ops(struct mtrr_ops * ops); 86 86 87 - extern u32 size_or_mask, size_and_mask; 87 + extern u64 size_or_mask, size_and_mask; 88 88 extern struct mtrr_ops * mtrr_if; 89 89 90 90 #define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)
+9 -5
arch/i386/kernel/cpu/proc.c
··· 29 29 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 30 30 NULL, NULL, NULL, "syscall", NULL, NULL, NULL, NULL, 31 31 NULL, NULL, NULL, "mp", "nx", NULL, "mmxext", NULL, 32 - NULL, "fxsr_opt", "rdtscp", NULL, NULL, "lm", "3dnowext", "3dnow", 32 + NULL, "fxsr_opt", "pdpe1gb", "rdtscp", NULL, "lm", "3dnowext", "3dnow", 33 33 34 34 /* Transmeta-defined */ 35 35 "recovery", "longrun", NULL, "lrti", NULL, NULL, NULL, NULL, ··· 47 47 /* Intel-defined (#2) */ 48 48 "pni", NULL, NULL, "monitor", "ds_cpl", "vmx", "smx", "est", 49 49 "tm2", "ssse3", "cid", NULL, NULL, "cx16", "xtpr", NULL, 50 - NULL, NULL, "dca", NULL, NULL, NULL, NULL, NULL, 50 + NULL, NULL, "dca", NULL, NULL, NULL, NULL, "popcnt", 51 51 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 52 52 53 53 /* VIA/Cyrix/Centaur-defined */ ··· 57 57 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 58 58 59 59 /* AMD-defined (#2) */ 60 - "lahf_lm", "cmp_legacy", "svm", NULL, "cr8legacy", NULL, NULL, NULL, 61 - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 60 + "lahf_lm", "cmp_legacy", "svm", "extapic", "cr8legacy", "abm", 61 + "sse4a", "misalignsse", 62 + "3dnowprefetch", "osvw", "ibs", NULL, NULL, NULL, NULL, NULL, 62 63 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 63 64 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 64 65 }; ··· 70 69 "ttp", /* thermal trip */ 71 70 "tm", 72 71 "stc", 72 + "100mhzsteps", 73 + "hwpstate", 73 74 NULL, 74 - /* nothing */ /* constant_tsc - moved to flags */ 75 + NULL, /* constant_tsc - moved to flags */ 76 + /* nothing */ 75 77 }; 76 78 struct cpuinfo_x86 *c = v; 77 79 int i, n = c - cpu_data;
+4 -1
arch/i386/kernel/cpu/transmeta.c
··· 9 9 { 10 10 unsigned int cap_mask, uk, max, dummy; 11 11 unsigned int cms_rev1, cms_rev2; 12 - unsigned int cpu_rev, cpu_freq, cpu_flags, new_cpu_rev; 12 + unsigned int cpu_rev, cpu_freq = 0, cpu_flags, new_cpu_rev; 13 13 char cpu_info[65]; 14 14 15 15 get_model_name(c); /* Same as AMD/Cyrix */ ··· 72 72 wrmsr(0x80860004, ~0, uk); 73 73 c->x86_capability[0] = cpuid_edx(0x00000001); 74 74 wrmsr(0x80860004, cap_mask, uk); 75 + 76 + /* All Transmeta CPUs have a constant TSC */ 77 + set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability); 75 78 76 79 /* If we can run i686 user-space code, call us an i686 */ 77 80 #define USER686 (X86_FEATURE_TSC|X86_FEATURE_CX8|X86_FEATURE_CMOV)
+2 -5
arch/i386/kernel/cpuid.c
··· 48 48 #ifdef CONFIG_SMP 49 49 50 50 struct cpuid_command { 51 - int cpu; 52 51 u32 reg; 53 52 u32 *data; 54 53 }; ··· 56 57 { 57 58 struct cpuid_command *cmd = (struct cpuid_command *)cmd_block; 58 59 59 - if (cmd->cpu == smp_processor_id()) 60 - cpuid(cmd->reg, &cmd->data[0], &cmd->data[1], &cmd->data[2], 60 + cpuid(cmd->reg, &cmd->data[0], &cmd->data[1], &cmd->data[2], 61 61 &cmd->data[3]); 62 62 } 63 63 ··· 68 70 if (cpu == smp_processor_id()) { 69 71 cpuid(reg, &data[0], &data[1], &data[2], &data[3]); 70 72 } else { 71 - cmd.cpu = cpu; 72 73 cmd.reg = reg; 73 74 cmd.data = data; 74 75 75 - smp_call_function(cpuid_smp_cpuid, &cmd, 1, 1); 76 + smp_call_function_single(cpu, cpuid_smp_cpuid, &cmd, 1, 1); 76 77 } 77 78 preempt_enable(); 78 79 }
+10 -8
arch/i386/kernel/e820.c
··· 14 14 #include <asm/pgtable.h> 15 15 #include <asm/page.h> 16 16 #include <asm/e820.h> 17 + #include <asm/setup.h> 17 18 18 19 #ifdef CONFIG_EFI 19 20 int efi_enabled = 0; ··· 157 156 .flags = IORESOURCE_BUSY | IORESOURCE_IO 158 157 } }; 159 158 160 - static int romsignature(const unsigned char *x) 159 + #define ROMSIGNATURE 0xaa55 160 + 161 + static int __init romsignature(const unsigned char *rom) 161 162 { 162 163 unsigned short sig; 163 - int ret = 0; 164 - if (probe_kernel_address((const unsigned short *)x, sig) == 0) 165 - ret = (sig == 0xaa55); 166 - return ret; 164 + 165 + return probe_kernel_address((const unsigned short *)rom, sig) == 0 && 166 + sig == ROMSIGNATURE; 167 167 } 168 168 169 169 static int __init romchecksum(unsigned char *rom, unsigned long length) 170 170 { 171 - unsigned char *p, sum = 0; 171 + unsigned char sum; 172 172 173 - for (p = rom; p < rom + length; p++) 174 - sum += *p; 173 + for (sum = 0; length; length--) 174 + sum += *rom++; 175 175 return sum == 0; 176 176 } 177 177
+58 -20
arch/i386/kernel/entry.S
··· 30 30 * 18(%esp) - %eax 31 31 * 1C(%esp) - %ds 32 32 * 20(%esp) - %es 33 - * 24(%esp) - %gs 33 + * 24(%esp) - %fs 34 34 * 28(%esp) - orig_eax 35 35 * 2C(%esp) - %eip 36 36 * 30(%esp) - %cs ··· 99 99 100 100 #define SAVE_ALL \ 101 101 cld; \ 102 - pushl %gs; \ 102 + pushl %fs; \ 103 103 CFI_ADJUST_CFA_OFFSET 4;\ 104 - /*CFI_REL_OFFSET gs, 0;*/\ 104 + /*CFI_REL_OFFSET fs, 0;*/\ 105 105 pushl %es; \ 106 106 CFI_ADJUST_CFA_OFFSET 4;\ 107 107 /*CFI_REL_OFFSET es, 0;*/\ ··· 133 133 movl %edx, %ds; \ 134 134 movl %edx, %es; \ 135 135 movl $(__KERNEL_PDA), %edx; \ 136 - movl %edx, %gs 136 + movl %edx, %fs 137 137 138 138 #define RESTORE_INT_REGS \ 139 139 popl %ebx; \ ··· 166 166 2: popl %es; \ 167 167 CFI_ADJUST_CFA_OFFSET -4;\ 168 168 /*CFI_RESTORE es;*/\ 169 - 3: popl %gs; \ 169 + 3: popl %fs; \ 170 170 CFI_ADJUST_CFA_OFFSET -4;\ 171 - /*CFI_RESTORE gs;*/\ 171 + /*CFI_RESTORE fs;*/\ 172 172 .pushsection .fixup,"ax"; \ 173 173 4: movl $0,(%esp); \ 174 174 jmp 1b; \ ··· 227 227 CFI_ADJUST_CFA_OFFSET -4 228 228 jmp syscall_exit 229 229 CFI_ENDPROC 230 + END(ret_from_fork) 230 231 231 232 /* 232 233 * Return to user mode is not as complex as all this looks, ··· 259 258 # int/exception return? 
260 259 jne work_pending 261 260 jmp restore_all 261 + END(ret_from_exception) 262 262 263 263 #ifdef CONFIG_PREEMPT 264 264 ENTRY(resume_kernel) ··· 274 272 jz restore_all 275 273 call preempt_schedule_irq 276 274 jmp need_resched 275 + END(resume_kernel) 277 276 #endif 278 277 CFI_ENDPROC 279 278 ··· 352 349 movl PT_OLDESP(%esp), %ecx 353 350 xorl %ebp,%ebp 354 351 TRACE_IRQS_ON 355 - 1: mov PT_GS(%esp), %gs 352 + 1: mov PT_FS(%esp), %fs 356 353 ENABLE_INTERRUPTS_SYSEXIT 357 354 CFI_ENDPROC 358 355 .pushsection .fixup,"ax" 359 - 2: movl $0,PT_GS(%esp) 356 + 2: movl $0,PT_FS(%esp) 360 357 jmp 1b 361 358 .section __ex_table,"a" 362 359 .align 4 363 360 .long 1b,2b 364 361 .popsection 362 + ENDPROC(sysenter_entry) 365 363 366 364 # system call handler stub 367 365 ENTRY(system_call) ··· 463 459 CFI_ADJUST_CFA_OFFSET -8 464 460 jmp restore_nocheck 465 461 CFI_ENDPROC 462 + ENDPROC(system_call) 466 463 467 464 # perform work that needs to be done immediately before resumption 468 465 ALIGN ··· 509 504 xorl %edx, %edx 510 505 call do_notify_resume 511 506 jmp resume_userspace_sig 507 + END(work_pending) 512 508 513 509 # perform syscall exit tracing 514 510 ALIGN ··· 525 519 cmpl $(nr_syscalls), %eax 526 520 jnae syscall_call 527 521 jmp syscall_exit 522 + END(syscall_trace_entry) 528 523 529 524 # perform syscall exit tracing 530 525 ALIGN ··· 539 532 movl $1, %edx 540 533 call do_syscall_trace 541 534 jmp resume_userspace 535 + END(syscall_exit_work) 542 536 CFI_ENDPROC 543 537 544 538 RING0_INT_FRAME # can't unwind into user space anyway ··· 550 542 GET_THREAD_INFO(%ebp) 551 543 movl $-EFAULT,PT_EAX(%esp) 552 544 jmp resume_userspace 545 + END(syscall_fault) 553 546 554 547 syscall_badsys: 555 548 movl $-ENOSYS,PT_EAX(%esp) 556 549 jmp resume_userspace 550 + END(syscall_badsys) 557 551 CFI_ENDPROC 558 552 559 553 #define FIXUP_ESPFIX_STACK \ 560 554 /* since we are on a wrong stack, we cant make it a C code :( */ \ 561 - movl %gs:PDA_cpu, %ebx; \ 555 + movl 
%fs:PDA_cpu, %ebx; \ 562 556 PER_CPU(cpu_gdt_descr, %ebx); \ 563 557 movl GDS_address(%ebx), %ebx; \ 564 558 GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \ ··· 591 581 ENTRY(interrupt) 592 582 .text 593 583 594 - vector=0 595 584 ENTRY(irq_entries_start) 596 585 RING0_INT_FRAME 586 + vector=0 597 587 .rept NR_IRQS 598 588 ALIGN 599 589 .if vector ··· 602 592 1: pushl $~(vector) 603 593 CFI_ADJUST_CFA_OFFSET 4 604 594 jmp common_interrupt 605 - .data 595 + .previous 606 596 .long 1b 607 - .text 597 + .text 608 598 vector=vector+1 609 599 .endr 600 + END(irq_entries_start) 601 + 602 + .previous 603 + END(interrupt) 604 + .previous 610 605 611 606 /* 612 607 * the CPU automatically disables interrupts when executing an IRQ vector, ··· 624 609 movl %esp,%eax 625 610 call do_IRQ 626 611 jmp ret_from_intr 612 + ENDPROC(common_interrupt) 627 613 CFI_ENDPROC 628 614 629 615 #define BUILD_INTERRUPT(name, nr) \ ··· 637 621 movl %esp,%eax; \ 638 622 call smp_/**/name; \ 639 623 jmp ret_from_intr; \ 640 - CFI_ENDPROC 624 + CFI_ENDPROC; \ 625 + ENDPROC(name) 641 626 642 627 /* The include is where all of the SMP etc. 
interrupts come from */ 643 628 #include "entry_arch.h" 629 + 630 + /* This alternate entry is needed because we hijack the apic LVTT */ 631 + #if defined(CONFIG_VMI) && defined(CONFIG_X86_LOCAL_APIC) 632 + BUILD_INTERRUPT(apic_vmi_timer_interrupt,LOCAL_TIMER_VECTOR) 633 + #endif 644 634 645 635 KPROBE_ENTRY(page_fault) 646 636 RING0_EC_FRAME ··· 654 632 CFI_ADJUST_CFA_OFFSET 4 655 633 ALIGN 656 634 error_code: 657 - /* the function address is in %gs's slot on the stack */ 635 + /* the function address is in %fs's slot on the stack */ 658 636 pushl %es 659 637 CFI_ADJUST_CFA_OFFSET 4 660 638 /*CFI_REL_OFFSET es, 0*/ ··· 683 661 CFI_ADJUST_CFA_OFFSET 4 684 662 CFI_REL_OFFSET ebx, 0 685 663 cld 686 - pushl %gs 664 + pushl %fs 687 665 CFI_ADJUST_CFA_OFFSET 4 688 - /*CFI_REL_OFFSET gs, 0*/ 666 + /*CFI_REL_OFFSET fs, 0*/ 689 667 movl $(__KERNEL_PDA), %ecx 690 - movl %ecx, %gs 668 + movl %ecx, %fs 691 669 UNWIND_ESPFIX_STACK 692 670 popl %ecx 693 671 CFI_ADJUST_CFA_OFFSET -4 694 672 /*CFI_REGISTER es, ecx*/ 695 - movl PT_GS(%esp), %edi # get the function address 673 + movl PT_FS(%esp), %edi # get the function address 696 674 movl PT_ORIG_EAX(%esp), %edx # get the error code 697 675 movl $-1, PT_ORIG_EAX(%esp) # no syscall to restart 698 - mov %ecx, PT_GS(%esp) 699 - /*CFI_REL_OFFSET gs, ES*/ 676 + mov %ecx, PT_FS(%esp) 677 + /*CFI_REL_OFFSET fs, ES*/ 700 678 movl $(__USER_DS), %ecx 701 679 movl %ecx, %ds 702 680 movl %ecx, %es ··· 714 692 CFI_ADJUST_CFA_OFFSET 4 715 693 jmp error_code 716 694 CFI_ENDPROC 695 + END(coprocessor_error) 717 696 718 697 ENTRY(simd_coprocessor_error) 719 698 RING0_INT_FRAME ··· 724 701 CFI_ADJUST_CFA_OFFSET 4 725 702 jmp error_code 726 703 CFI_ENDPROC 704 + END(simd_coprocessor_error) 727 705 728 706 ENTRY(device_not_available) 729 707 RING0_INT_FRAME ··· 745 721 CFI_ADJUST_CFA_OFFSET -4 746 722 jmp ret_from_exception 747 723 CFI_ENDPROC 724 + END(device_not_available) 748 725 749 726 /* 750 727 * Debug traps and NMI can happen at the one 
SYSENTER instruction ··· 889 864 .align 4 890 865 .long 1b,iret_exc 891 866 .previous 867 + END(native_iret) 892 868 893 869 ENTRY(native_irq_enable_sysexit) 894 870 sti 895 871 sysexit 872 + END(native_irq_enable_sysexit) 896 873 #endif 897 874 898 875 KPROBE_ENTRY(int3) ··· 917 890 CFI_ADJUST_CFA_OFFSET 4 918 891 jmp error_code 919 892 CFI_ENDPROC 893 + END(overflow) 920 894 921 895 ENTRY(bounds) 922 896 RING0_INT_FRAME ··· 927 899 CFI_ADJUST_CFA_OFFSET 4 928 900 jmp error_code 929 901 CFI_ENDPROC 902 + END(bounds) 930 903 931 904 ENTRY(invalid_op) 932 905 RING0_INT_FRAME ··· 937 908 CFI_ADJUST_CFA_OFFSET 4 938 909 jmp error_code 939 910 CFI_ENDPROC 911 + END(invalid_op) 940 912 941 913 ENTRY(coprocessor_segment_overrun) 942 914 RING0_INT_FRAME ··· 947 917 CFI_ADJUST_CFA_OFFSET 4 948 918 jmp error_code 949 919 CFI_ENDPROC 920 + END(coprocessor_segment_overrun) 950 921 951 922 ENTRY(invalid_TSS) 952 923 RING0_EC_FRAME ··· 955 924 CFI_ADJUST_CFA_OFFSET 4 956 925 jmp error_code 957 926 CFI_ENDPROC 927 + END(invalid_TSS) 958 928 959 929 ENTRY(segment_not_present) 960 930 RING0_EC_FRAME ··· 963 931 CFI_ADJUST_CFA_OFFSET 4 964 932 jmp error_code 965 933 CFI_ENDPROC 934 + END(segment_not_present) 966 935 967 936 ENTRY(stack_segment) 968 937 RING0_EC_FRAME ··· 971 938 CFI_ADJUST_CFA_OFFSET 4 972 939 jmp error_code 973 940 CFI_ENDPROC 941 + END(stack_segment) 974 942 975 943 KPROBE_ENTRY(general_protection) 976 944 RING0_EC_FRAME ··· 987 953 CFI_ADJUST_CFA_OFFSET 4 988 954 jmp error_code 989 955 CFI_ENDPROC 956 + END(alignment_check) 990 957 991 958 ENTRY(divide_error) 992 959 RING0_INT_FRAME ··· 997 962 CFI_ADJUST_CFA_OFFSET 4 998 963 jmp error_code 999 964 CFI_ENDPROC 965 + END(divide_error) 1000 966 1001 967 #ifdef CONFIG_X86_MCE 1002 968 ENTRY(machine_check) ··· 1008 972 CFI_ADJUST_CFA_OFFSET 4 1009 973 jmp error_code 1010 974 CFI_ENDPROC 975 + END(machine_check) 1011 976 #endif 1012 977 1013 978 ENTRY(spurious_interrupt_bug) ··· 1019 982 CFI_ADJUST_CFA_OFFSET 4 1020 
983 jmp error_code 1021 984 CFI_ENDPROC 985 + END(spurious_interrupt_bug) 1022 986 1023 987 ENTRY(kernel_thread_helper) 1024 988 pushl $0 # fake return address for unwinder
+27 -11
arch/i386/kernel/head.S
··· 53 53 * any particular GDT layout, because we load our own as soon as we 54 54 * can. 55 55 */ 56 + .section .text.head,"ax",@progbits 56 57 ENTRY(startup_32) 57 58 58 59 #ifdef CONFIG_PARAVIRT ··· 142 141 jb 10b 143 142 movl %edi,(init_pg_tables_end - __PAGE_OFFSET) 144 143 145 - #ifdef CONFIG_SMP 146 144 xorl %ebx,%ebx /* This is the boot CPU (BSP) */ 147 145 jmp 3f 148 - 149 146 /* 150 147 * Non-boot CPU entry point; entered from trampoline.S 151 148 * We can't lgdt here, because lgdt itself uses a data segment, but 152 149 * we know the trampoline has already loaded the boot_gdt_table GDT 153 150 * for us. 151 + * 152 + * If cpu hotplug is not supported then this code can go in init section 153 + * which will be freed later 154 154 */ 155 + 156 + #ifdef CONFIG_HOTPLUG_CPU 157 + .section .text,"ax",@progbits 158 + #else 159 + .section .init.text,"ax",@progbits 160 + #endif 161 + 162 + #ifdef CONFIG_SMP 155 163 ENTRY(startup_32_smp) 156 164 cld 157 165 movl $(__BOOT_DS),%eax ··· 218 208 xorl %ebx,%ebx 219 209 incl %ebx 220 210 221 - 3: 222 211 #endif /* CONFIG_SMP */ 212 + 3: 223 213 224 214 /* 225 215 * Enable paging ··· 319 309 320 310 call check_x87 321 311 call setup_pda 322 - lgdt cpu_gdt_descr 312 + lgdt early_gdt_descr 323 313 lidt idt_descr 324 314 ljmp $(__KERNEL_CS),$1f 325 315 1: movl $(__KERNEL_DS),%eax # reload all the segment registers ··· 329 319 movl %eax,%ds 330 320 movl %eax,%es 331 321 332 - xorl %eax,%eax # Clear FS and LDT 333 - movl %eax,%fs 322 + xorl %eax,%eax # Clear GS and LDT 323 + movl %eax,%gs 334 324 lldt %ax 335 325 336 326 movl $(__KERNEL_PDA),%eax 337 - mov %eax,%gs 327 + mov %eax,%fs 338 328 339 329 cld # gcc2 wants the direction flag cleared at all times 340 330 pushl $0 # fake return address for unwinder ··· 370 360 * cpu_gdt_table and boot_pda; for secondary CPUs, these will be 371 361 * that CPU's GDT and PDA. 
372 362 */ 373 - setup_pda: 363 + ENTRY(setup_pda) 374 364 /* get the PDA pointer */ 375 365 movl start_pda, %eax 376 366 377 367 /* slot the PDA address into the GDT */ 378 - mov cpu_gdt_descr+2, %ecx 368 + mov early_gdt_descr+2, %ecx 379 369 mov %ax, (__KERNEL_PDA+0+2)(%ecx) /* base & 0x0000ffff */ 380 370 shr $16, %eax 381 371 mov %al, (__KERNEL_PDA+4+0)(%ecx) /* base & 0x00ff0000 */ ··· 502 492 #endif 503 493 iret 504 494 495 + .section .text 505 496 #ifdef CONFIG_PARAVIRT 506 497 startup_paravirt: 507 498 cld ··· 513 502 pushl %ecx 514 503 pushl %eax 515 504 516 - /* paravirt.o is last in link, and that probe fn never returns */ 517 505 pushl $__start_paravirtprobe 518 506 1: 519 507 movl 0(%esp), %eax 508 + cmpl $__stop_paravirtprobe, %eax 509 + je unhandled_paravirt 520 510 pushl (%eax) 521 511 movl 8(%esp), %eax 522 512 call *(%esp) ··· 529 517 530 518 addl $4, (%esp) 531 519 jmp 1b 520 + 521 + unhandled_paravirt: 522 + /* Nothing wanted us: we're screwed. */ 523 + ud2 532 524 #endif 533 525 534 526 /* ··· 597 581 598 582 # boot GDT descriptor (later on used by CPU#0): 599 583 .word 0 # 32 bit align gdt_desc.address 600 - ENTRY(cpu_gdt_descr) 584 + ENTRY(early_gdt_descr) 601 585 .word GDT_ENTRIES*8-1 602 586 .long cpu_gdt_table 603 587
+2 -2
arch/i386/kernel/io_apic.c
··· 1920 1920 static void __init setup_ioapic_ids_from_mpc(void) { } 1921 1921 #endif 1922 1922 1923 - static int no_timer_check __initdata; 1923 + int no_timer_check __initdata; 1924 1924 1925 1925 static int __init notimercheck(char *s) 1926 1926 { ··· 2310 2310 2311 2311 disable_8259A_irq(0); 2312 2312 set_irq_chip_and_handler_name(0, &lapic_chip, handle_fasteoi_irq, 2313 - "fasteio"); 2313 + "fasteoi"); 2314 2314 apic_write_around(APIC_LVT0, APIC_DM_FIXED | vector); /* Fixed mode */ 2315 2315 enable_8259A_irq(0); 2316 2316
+3
arch/i386/kernel/irq.c
··· 19 19 #include <linux/cpu.h> 20 20 #include <linux/delay.h> 21 21 22 + #include <asm/idle.h> 23 + 22 24 DEFINE_PER_CPU(irq_cpustat_t, irq_stat) ____cacheline_internodealigned_in_smp; 23 25 EXPORT_PER_CPU_SYMBOL(irq_stat); 24 26 ··· 63 61 union irq_ctx *curctx, *irqctx; 64 62 u32 *isp; 65 63 #endif 64 + exit_idle(); 66 65 67 66 if (unlikely((unsigned)irq >= NR_IRQS)) { 68 67 printk(KERN_EMERG "%s: cannot handle IRQ %d\n",
+3 -3
arch/i386/kernel/kprobes.c
··· 363 363 " pushf\n" 364 364 /* skip cs, eip, orig_eax */ 365 365 " subl $12, %esp\n" 366 - " pushl %gs\n" 366 + " pushl %fs\n" 367 367 " pushl %ds\n" 368 368 " pushl %es\n" 369 369 " pushl %eax\n" ··· 387 387 " popl %edi\n" 388 388 " popl %ebp\n" 389 389 " popl %eax\n" 390 - /* skip eip, orig_eax, es, ds, gs */ 390 + /* skip eip, orig_eax, es, ds, fs */ 391 391 " addl $20, %esp\n" 392 392 " popf\n" 393 393 " ret\n"); ··· 408 408 spin_lock_irqsave(&kretprobe_lock, flags); 409 409 head = kretprobe_inst_table_head(current); 410 410 /* fixup registers */ 411 - regs->xcs = __KERNEL_CS; 411 + regs->xcs = __KERNEL_CS | get_kernel_rpl(); 412 412 regs->eip = trampoline_address; 413 413 regs->orig_eax = 0xffffffff; 414 414
+1 -1
arch/i386/kernel/microcode.c
··· 384 384 { 385 385 long cursor = 0; 386 386 int error = 0; 387 - void *new_mc; 387 + void *new_mc = NULL; 388 388 int cpu; 389 389 cpumask_t old; 390 390
+4 -9
arch/i386/kernel/msr.c
··· 68 68 #ifdef CONFIG_SMP 69 69 70 70 struct msr_command { 71 - int cpu; 72 71 int err; 73 72 u32 reg; 74 73 u32 data[2]; ··· 77 78 { 78 79 struct msr_command *cmd = (struct msr_command *)cmd_block; 79 80 80 - if (cmd->cpu == smp_processor_id()) 81 - cmd->err = wrmsr_eio(cmd->reg, cmd->data[0], cmd->data[1]); 81 + cmd->err = wrmsr_eio(cmd->reg, cmd->data[0], cmd->data[1]); 82 82 } 83 83 84 84 static void msr_smp_rdmsr(void *cmd_block) 85 85 { 86 86 struct msr_command *cmd = (struct msr_command *)cmd_block; 87 87 88 - if (cmd->cpu == smp_processor_id()) 89 - cmd->err = rdmsr_eio(cmd->reg, &cmd->data[0], &cmd->data[1]); 88 + cmd->err = rdmsr_eio(cmd->reg, &cmd->data[0], &cmd->data[1]); 90 89 } 91 90 92 91 static inline int do_wrmsr(int cpu, u32 reg, u32 eax, u32 edx) ··· 96 99 if (cpu == smp_processor_id()) { 97 100 ret = wrmsr_eio(reg, eax, edx); 98 101 } else { 99 - cmd.cpu = cpu; 100 102 cmd.reg = reg; 101 103 cmd.data[0] = eax; 102 104 cmd.data[1] = edx; 103 105 104 - smp_call_function(msr_smp_wrmsr, &cmd, 1, 1); 106 + smp_call_function_single(cpu, msr_smp_wrmsr, &cmd, 1, 1); 105 107 ret = cmd.err; 106 108 } 107 109 preempt_enable(); ··· 116 120 if (cpu == smp_processor_id()) { 117 121 ret = rdmsr_eio(reg, eax, edx); 118 122 } else { 119 - cmd.cpu = cpu; 120 123 cmd.reg = reg; 121 124 122 - smp_call_function(msr_smp_rdmsr, &cmd, 1, 1); 125 + smp_call_function_single(cpu, msr_smp_rdmsr, &cmd, 1, 1); 123 126 124 127 *eax = cmd.data[0]; 125 128 *edx = cmd.data[1];
+80 -18
arch/i386/kernel/nmi.c
··· 185 185 { 186 186 switch (boot_cpu_data.x86_vendor) { 187 187 case X86_VENDOR_AMD: 188 - return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6)); 188 + return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6) 189 + || (boot_cpu_data.x86 == 16)); 189 190 case X86_VENDOR_INTEL: 190 191 if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) 191 192 return 1; ··· 216 215 mb(); 217 216 } 218 217 #endif 218 + 219 + static unsigned int adjust_for_32bit_ctr(unsigned int hz) 220 + { 221 + u64 counter_val; 222 + unsigned int retval = hz; 223 + 224 + /* 225 + * On Intel CPUs with P6/ARCH_PERFMON only 32 bits in the counter 226 + * are writable, with higher bits sign extending from bit 31. 227 + * So, we can only program the counter with 31 bit values and 228 + * 32nd bit should be 1, for 33.. to be 1. 229 + * Find the appropriate nmi_hz 230 + */ 231 + counter_val = (u64)cpu_khz * 1000; 232 + do_div(counter_val, retval); 233 + if (counter_val > 0x7fffffffULL) { 234 + u64 count = (u64)cpu_khz * 1000; 235 + do_div(count, 0x7fffffffUL); 236 + retval = count + 1; 237 + } 238 + return retval; 239 + } 219 240 220 241 static int __init check_nmi_watchdog(void) 221 242 { ··· 304 281 struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); 305 282 306 283 nmi_hz = 1; 307 - /* 308 - * On Intel CPUs with ARCH_PERFMON only 32 bits in the counter 309 - * are writable, with higher bits sign extending from bit 31. 310 - * So, we can only program the counter with 31 bit values and 311 - * 32nd bit should be 1, for 33.. to be 1. 
312 - * Find the appropriate nmi_hz 313 - */ 314 - if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 && 315 - ((u64)cpu_khz * 1000) > 0x7fffffffULL) { 316 - u64 count = (u64)cpu_khz * 1000; 317 - do_div(count, 0x7fffffffUL); 318 - nmi_hz = count + 1; 284 + 285 + if (wd->perfctr_msr == MSR_P6_PERFCTR0 || 286 + wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0) { 287 + nmi_hz = adjust_for_32bit_ctr(nmi_hz); 319 288 } 320 289 } 321 290 ··· 384 369 } 385 370 } 386 371 372 + static void __acpi_nmi_disable(void *__unused) 373 + { 374 + apic_write_around(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED); 375 + } 376 + 377 + /* 378 + * Disable timer based NMIs on all CPUs: 379 + */ 380 + void acpi_nmi_disable(void) 381 + { 382 + if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC) 383 + on_each_cpu(__acpi_nmi_disable, NULL, 0, 1); 384 + } 385 + 386 + static void __acpi_nmi_enable(void *__unused) 387 + { 388 + apic_write_around(APIC_LVT0, APIC_DM_NMI); 389 + } 390 + 391 + /* 392 + * Enable timer based NMIs on all CPUs: 393 + */ 394 + void acpi_nmi_enable(void) 395 + { 396 + if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC) 397 + on_each_cpu(__acpi_nmi_enable, NULL, 0, 1); 398 + } 399 + 387 400 #ifdef CONFIG_PM 388 401 389 402 static int nmi_pm_active; /* nmi_active before suspend */ ··· 483 440 if(descr) 484 441 Dprintk("setting %s to -0x%08Lx\n", descr, count); 485 442 wrmsrl(perfctr_msr, 0 - count); 443 + } 444 + 445 + static void write_watchdog_counter32(unsigned int perfctr_msr, 446 + const char *descr) 447 + { 448 + u64 count = (u64)cpu_khz * 1000; 449 + 450 + do_div(count, nmi_hz); 451 + if(descr) 452 + Dprintk("setting %s to -0x%08Lx\n", descr, count); 453 + wrmsr(perfctr_msr, (u32)(-count), 0); 486 454 } 487 455 488 456 /* Note that these events don't tick when the CPU idles. 
This means ··· 585 531 586 532 /* setup the timer */ 587 533 wrmsr(evntsel_msr, evntsel, 0); 588 - write_watchdog_counter(perfctr_msr, "P6_PERFCTR0"); 534 + nmi_hz = adjust_for_32bit_ctr(nmi_hz); 535 + write_watchdog_counter32(perfctr_msr, "P6_PERFCTR0"); 589 536 apic_write(APIC_LVTPC, APIC_DM_NMI); 590 537 evntsel |= P6_EVNTSEL0_ENABLE; 591 538 wrmsr(evntsel_msr, evntsel, 0); ··· 759 704 760 705 /* setup the timer */ 761 706 wrmsr(evntsel_msr, evntsel, 0); 762 - write_watchdog_counter(perfctr_msr, "INTEL_ARCH_PERFCTR0"); 707 + nmi_hz = adjust_for_32bit_ctr(nmi_hz); 708 + write_watchdog_counter32(perfctr_msr, "INTEL_ARCH_PERFCTR0"); 763 709 apic_write(APIC_LVTPC, APIC_DM_NMI); 764 710 evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE; 765 711 wrmsr(evntsel_msr, evntsel, 0); ··· 818 762 if (nmi_watchdog == NMI_LOCAL_APIC) { 819 763 switch (boot_cpu_data.x86_vendor) { 820 764 case X86_VENDOR_AMD: 821 - if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15) 765 + if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15 && 766 + boot_cpu_data.x86 != 16) 822 767 return; 823 768 if (!setup_k7_watchdog()) 824 769 return; ··· 1013 956 dummy &= ~P4_CCCR_OVF; 1014 957 wrmsrl(wd->cccr_msr, dummy); 1015 958 apic_write(APIC_LVTPC, APIC_DM_NMI); 959 + /* start the cycle over again */ 960 + write_watchdog_counter(wd->perfctr_msr, NULL); 1016 961 } 1017 962 else if (wd->perfctr_msr == MSR_P6_PERFCTR0 || 1018 963 wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0) { ··· 1023 964 * other P6 variant. 
1024 965 * ArchPerfom/Core Duo also needs this */ 1025 966 apic_write(APIC_LVTPC, APIC_DM_NMI); 967 + /* P6/ARCH_PERFMON has 32 bit counter write */ 968 + write_watchdog_counter32(wd->perfctr_msr, NULL); 969 + } else { 970 + /* start the cycle over again */ 971 + write_watchdog_counter(wd->perfctr_msr, NULL); 1026 972 } 1027 - /* start the cycle over again */ 1028 - write_watchdog_counter(wd->perfctr_msr, NULL); 1029 973 rc = 1; 1030 974 } else if (nmi_watchdog == NMI_IO_APIC) { 1031 975 /* don't know how to accurately check for this.
+62 -54
arch/i386/kernel/paravirt.c
··· 92 92 return insn_len; 93 93 } 94 94 95 - static fastcall unsigned long native_get_debugreg(int regno) 95 + static unsigned long native_get_debugreg(int regno) 96 96 { 97 97 unsigned long val = 0; /* Damn you, gcc! */ 98 98 ··· 115 115 return val; 116 116 } 117 117 118 - static fastcall void native_set_debugreg(int regno, unsigned long value) 118 + static void native_set_debugreg(int regno, unsigned long value) 119 119 { 120 120 switch (regno) { 121 121 case 0: ··· 146 146 paravirt_ops.init_IRQ(); 147 147 } 148 148 149 - static fastcall void native_clts(void) 149 + static void native_clts(void) 150 150 { 151 151 asm volatile ("clts"); 152 152 } 153 153 154 - static fastcall unsigned long native_read_cr0(void) 154 + static unsigned long native_read_cr0(void) 155 155 { 156 156 unsigned long val; 157 157 asm volatile("movl %%cr0,%0\n\t" :"=r" (val)); 158 158 return val; 159 159 } 160 160 161 - static fastcall void native_write_cr0(unsigned long val) 161 + static void native_write_cr0(unsigned long val) 162 162 { 163 163 asm volatile("movl %0,%%cr0": :"r" (val)); 164 164 } 165 165 166 - static fastcall unsigned long native_read_cr2(void) 166 + static unsigned long native_read_cr2(void) 167 167 { 168 168 unsigned long val; 169 169 asm volatile("movl %%cr2,%0\n\t" :"=r" (val)); 170 170 return val; 171 171 } 172 172 173 - static fastcall void native_write_cr2(unsigned long val) 173 + static void native_write_cr2(unsigned long val) 174 174 { 175 175 asm volatile("movl %0,%%cr2": :"r" (val)); 176 176 } 177 177 178 - static fastcall unsigned long native_read_cr3(void) 178 + static unsigned long native_read_cr3(void) 179 179 { 180 180 unsigned long val; 181 181 asm volatile("movl %%cr3,%0\n\t" :"=r" (val)); 182 182 return val; 183 183 } 184 184 185 - static fastcall void native_write_cr3(unsigned long val) 185 + static void native_write_cr3(unsigned long val) 186 186 { 187 187 asm volatile("movl %0,%%cr3": :"r" (val)); 188 188 } 189 189 190 - static fastcall unsigned long 
native_read_cr4(void) 190 + static unsigned long native_read_cr4(void) 191 191 { 192 192 unsigned long val; 193 193 asm volatile("movl %%cr4,%0\n\t" :"=r" (val)); 194 194 return val; 195 195 } 196 196 197 - static fastcall unsigned long native_read_cr4_safe(void) 197 + static unsigned long native_read_cr4_safe(void) 198 198 { 199 199 unsigned long val; 200 200 /* This could fault if %cr4 does not exist */ ··· 207 207 return val; 208 208 } 209 209 210 - static fastcall void native_write_cr4(unsigned long val) 210 + static void native_write_cr4(unsigned long val) 211 211 { 212 212 asm volatile("movl %0,%%cr4": :"r" (val)); 213 213 } 214 214 215 - static fastcall unsigned long native_save_fl(void) 215 + static unsigned long native_save_fl(void) 216 216 { 217 217 unsigned long f; 218 218 asm volatile("pushfl ; popl %0":"=g" (f): /* no input */); 219 219 return f; 220 220 } 221 221 222 - static fastcall void native_restore_fl(unsigned long f) 222 + static void native_restore_fl(unsigned long f) 223 223 { 224 224 asm volatile("pushl %0 ; popfl": /* no output */ 225 225 :"g" (f) 226 226 :"memory", "cc"); 227 227 } 228 228 229 - static fastcall void native_irq_disable(void) 229 + static void native_irq_disable(void) 230 230 { 231 231 asm volatile("cli": : :"memory"); 232 232 } 233 233 234 - static fastcall void native_irq_enable(void) 234 + static void native_irq_enable(void) 235 235 { 236 236 asm volatile("sti": : :"memory"); 237 237 } 238 238 239 - static fastcall void native_safe_halt(void) 239 + static void native_safe_halt(void) 240 240 { 241 241 asm volatile("sti; hlt": : :"memory"); 242 242 } 243 243 244 - static fastcall void native_halt(void) 244 + static void native_halt(void) 245 245 { 246 246 asm volatile("hlt": : :"memory"); 247 247 } 248 248 249 - static fastcall void native_wbinvd(void) 249 + static void native_wbinvd(void) 250 250 { 251 251 asm volatile("wbinvd": : :"memory"); 252 252 } 253 253 254 - static fastcall unsigned long long 
native_read_msr(unsigned int msr, int *err) 254 + static unsigned long long native_read_msr(unsigned int msr, int *err) 255 255 { 256 256 unsigned long long val; 257 257 ··· 270 270 return val; 271 271 } 272 272 273 - static fastcall int native_write_msr(unsigned int msr, unsigned long long val) 273 + static int native_write_msr(unsigned int msr, unsigned long long val) 274 274 { 275 275 int err; 276 276 asm volatile("2: wrmsr ; xorl %0,%0\n" ··· 288 288 return err; 289 289 } 290 290 291 - static fastcall unsigned long long native_read_tsc(void) 291 + static unsigned long long native_read_tsc(void) 292 292 { 293 293 unsigned long long val; 294 294 asm volatile("rdtsc" : "=A" (val)); 295 295 return val; 296 296 } 297 297 298 - static fastcall unsigned long long native_read_pmc(void) 298 + static unsigned long long native_read_pmc(void) 299 299 { 300 300 unsigned long long val; 301 301 asm volatile("rdpmc" : "=A" (val)); 302 302 return val; 303 303 } 304 304 305 - static fastcall void native_load_tr_desc(void) 305 + static void native_load_tr_desc(void) 306 306 { 307 307 asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8)); 308 308 } 309 309 310 - static fastcall void native_load_gdt(const struct Xgt_desc_struct *dtr) 310 + static void native_load_gdt(const struct Xgt_desc_struct *dtr) 311 311 { 312 312 asm volatile("lgdt %0"::"m" (*dtr)); 313 313 } 314 314 315 - static fastcall void native_load_idt(const struct Xgt_desc_struct *dtr) 315 + static void native_load_idt(const struct Xgt_desc_struct *dtr) 316 316 { 317 317 asm volatile("lidt %0"::"m" (*dtr)); 318 318 } 319 319 320 - static fastcall void native_store_gdt(struct Xgt_desc_struct *dtr) 320 + static void native_store_gdt(struct Xgt_desc_struct *dtr) 321 321 { 322 322 asm ("sgdt %0":"=m" (*dtr)); 323 323 } 324 324 325 - static fastcall void native_store_idt(struct Xgt_desc_struct *dtr) 325 + static void native_store_idt(struct Xgt_desc_struct *dtr) 326 326 { 327 327 asm ("sidt %0":"=m" (*dtr)); 328 328 } 329 329 330 
- static fastcall unsigned long native_store_tr(void) 330 + static unsigned long native_store_tr(void) 331 331 { 332 332 unsigned long tr; 333 333 asm ("str %0":"=r" (tr)); 334 334 return tr; 335 335 } 336 336 337 - static fastcall void native_load_tls(struct thread_struct *t, unsigned int cpu) 337 + static void native_load_tls(struct thread_struct *t, unsigned int cpu) 338 338 { 339 339 #define C(i) get_cpu_gdt_table(cpu)[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i] 340 340 C(0); C(1); C(2); ··· 348 348 lp[1] = entry_high; 349 349 } 350 350 351 - static fastcall void native_write_ldt_entry(void *dt, int entrynum, u32 low, u32 high) 351 + static void native_write_ldt_entry(void *dt, int entrynum, u32 low, u32 high) 352 352 { 353 353 native_write_dt_entry(dt, entrynum, low, high); 354 354 } 355 355 356 - static fastcall void native_write_gdt_entry(void *dt, int entrynum, u32 low, u32 high) 356 + static void native_write_gdt_entry(void *dt, int entrynum, u32 low, u32 high) 357 357 { 358 358 native_write_dt_entry(dt, entrynum, low, high); 359 359 } 360 360 361 - static fastcall void native_write_idt_entry(void *dt, int entrynum, u32 low, u32 high) 361 + static void native_write_idt_entry(void *dt, int entrynum, u32 low, u32 high) 362 362 { 363 363 native_write_dt_entry(dt, entrynum, low, high); 364 364 } 365 365 366 - static fastcall void native_load_esp0(struct tss_struct *tss, 366 + static void native_load_esp0(struct tss_struct *tss, 367 367 struct thread_struct *thread) 368 368 { 369 369 tss->esp0 = thread->esp0; ··· 375 375 } 376 376 } 377 377 378 - static fastcall void native_io_delay(void) 378 + static void native_io_delay(void) 379 379 { 380 380 asm volatile("outb %al,$0x80"); 381 381 } 382 382 383 - static fastcall void native_flush_tlb(void) 383 + static void native_flush_tlb(void) 384 384 { 385 385 __native_flush_tlb(); 386 386 } ··· 389 389 * Global pages have to be flushed a bit differently. 
Not a real 390 390 * performance problem because this does not happen often. 391 391 */ 392 - static fastcall void native_flush_tlb_global(void) 392 + static void native_flush_tlb_global(void) 393 393 { 394 394 __native_flush_tlb_global(); 395 395 } 396 396 397 - static fastcall void native_flush_tlb_single(u32 addr) 397 + static void native_flush_tlb_single(u32 addr) 398 398 { 399 399 __native_flush_tlb_single(addr); 400 400 } 401 401 402 402 #ifndef CONFIG_X86_PAE 403 - static fastcall void native_set_pte(pte_t *ptep, pte_t pteval) 403 + static void native_set_pte(pte_t *ptep, pte_t pteval) 404 404 { 405 405 *ptep = pteval; 406 406 } 407 407 408 - static fastcall void native_set_pte_at(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pteval) 408 + static void native_set_pte_at(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pteval) 409 409 { 410 410 *ptep = pteval; 411 411 } 412 412 413 - static fastcall void native_set_pmd(pmd_t *pmdp, pmd_t pmdval) 413 + static void native_set_pmd(pmd_t *pmdp, pmd_t pmdval) 414 414 { 415 415 *pmdp = pmdval; 416 416 } 417 417 418 418 #else /* CONFIG_X86_PAE */ 419 419 420 - static fastcall void native_set_pte(pte_t *ptep, pte_t pte) 420 + static void native_set_pte(pte_t *ptep, pte_t pte) 421 421 { 422 422 ptep->pte_high = pte.pte_high; 423 423 smp_wmb(); 424 424 ptep->pte_low = pte.pte_low; 425 425 } 426 426 427 - static fastcall void native_set_pte_at(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pte) 427 + static void native_set_pte_at(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pte) 428 428 { 429 429 ptep->pte_high = pte.pte_high; 430 430 smp_wmb(); 431 431 ptep->pte_low = pte.pte_low; 432 432 } 433 433 434 - static fastcall void native_set_pte_present(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) 434 + static void native_set_pte_present(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) 435 435 { 436 436 ptep->pte_low = 0; 437 437 smp_wmb(); ··· 440 440 ptep->pte_low = 
pte.pte_low; 441 441 } 442 442 443 - static fastcall void native_set_pte_atomic(pte_t *ptep, pte_t pteval) 443 + static void native_set_pte_atomic(pte_t *ptep, pte_t pteval) 444 444 { 445 445 set_64bit((unsigned long long *)ptep,pte_val(pteval)); 446 446 } 447 447 448 - static fastcall void native_set_pmd(pmd_t *pmdp, pmd_t pmdval) 448 + static void native_set_pmd(pmd_t *pmdp, pmd_t pmdval) 449 449 { 450 450 set_64bit((unsigned long long *)pmdp,pmd_val(pmdval)); 451 451 } 452 452 453 - static fastcall void native_set_pud(pud_t *pudp, pud_t pudval) 453 + static void native_set_pud(pud_t *pudp, pud_t pudval) 454 454 { 455 455 *pudp = pudval; 456 456 } 457 457 458 - static fastcall void native_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) 458 + static void native_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) 459 459 { 460 460 ptep->pte_low = 0; 461 461 smp_wmb(); 462 462 ptep->pte_high = 0; 463 463 } 464 464 465 - static fastcall void native_pmd_clear(pmd_t *pmd) 465 + static void native_pmd_clear(pmd_t *pmd) 466 466 { 467 467 u32 *tmp = (u32 *)pmd; 468 468 *tmp = 0; ··· 472 472 #endif /* CONFIG_X86_PAE */ 473 473 474 474 /* These are in entry.S */ 475 - extern fastcall void native_iret(void); 476 - extern fastcall void native_irq_enable_sysexit(void); 475 + extern void native_iret(void); 476 + extern void native_irq_enable_sysexit(void); 477 477 478 478 static int __init print_banner(void) 479 479 { ··· 481 481 return 0; 482 482 } 483 483 core_initcall(print_banner); 484 - 485 - /* We simply declare start_kernel to be the paravirt probe of last resort. 
*/ 486 - paravirt_probe(start_kernel); 487 484 488 485 struct paravirt_ops paravirt_ops = { 489 486 .name = "bare hardware", ··· 541 544 .apic_write = native_apic_write, 542 545 .apic_write_atomic = native_apic_write_atomic, 543 546 .apic_read = native_apic_read, 547 + .setup_boot_clock = setup_boot_APIC_clock, 548 + .setup_secondary_clock = setup_secondary_APIC_clock, 544 549 #endif 550 + .set_lazy_mode = (void *)native_nop, 545 551 546 552 .flush_tlb_user = native_flush_tlb, 547 553 .flush_tlb_kernel = native_flush_tlb_global, 548 554 .flush_tlb_single = native_flush_tlb_single, 555 + 556 + .alloc_pt = (void *)native_nop, 557 + .alloc_pd = (void *)native_nop, 558 + .alloc_pd_clone = (void *)native_nop, 559 + .release_pt = (void *)native_nop, 560 + .release_pd = (void *)native_nop, 549 561 550 562 .set_pte = native_set_pte, 551 563 .set_pte_at = native_set_pte_at, ··· 571 565 572 566 .irq_enable_sysexit = native_irq_enable_sysexit, 573 567 .iret = native_iret, 568 + 569 + .startup_ipi_hook = (void *)native_nop, 574 570 }; 575 571 576 572 /*
+20
arch/i386/kernel/pcspeaker.c
··· 1 + #include <linux/platform_device.h> 2 + #include <linux/errno.h> 3 + #include <linux/init.h> 4 + 5 + static __init int add_pcspkr(void) 6 + { 7 + struct platform_device *pd; 8 + int ret; 9 + 10 + pd = platform_device_alloc("pcspkr", -1); 11 + if (!pd) 12 + return -ENOMEM; 13 + 14 + ret = platform_device_add(pd); 15 + if (ret) 16 + platform_device_put(pd); 17 + 18 + return ret; 19 + } 20 + device_initcall(add_pcspkr);
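The `alloc`/`add`/`put` sequence in the new pcspeaker.c above is the standard ownership pattern for registering a simple platform device: if `platform_device_add()` fails, the device must be released with `platform_device_put()` (which drops the refcount) rather than freed directly. A minimal userspace sketch of the same ownership pattern, using hypothetical stand-in types rather than the real kernel API:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct platform_device and its refcount. */
struct fake_pdev {
	int refcount;
	int added;
};

static struct fake_pdev *fake_pdev_alloc(void)
{
	struct fake_pdev *pd = calloc(1, sizeof(*pd));
	if (pd)
		pd->refcount = 1;
	return pd;
}

/* Drop one reference; free only when the last reference goes away. */
static void fake_pdev_put(struct fake_pdev *pd)
{
	if (pd && --pd->refcount == 0)
		free(pd);
}

/* fail_add simulates platform_device_add() returning an error. */
static int fake_pdev_add(struct fake_pdev *pd, int fail_add)
{
	if (fail_add)
		return -1;
	pd->added = 1;
	return 0;
}

/* Mirrors the shape of add_pcspkr(): on add failure, drop the
 * reference we hold instead of freeing directly.  On success the
 * device intentionally stays "registered" (and so stays allocated). */
static int register_fake_speaker(int fail_add)
{
	struct fake_pdev *pd = fake_pdev_alloc();
	int ret;

	if (!pd)
		return -12; /* stands in for -ENOMEM */

	ret = fake_pdev_add(pd, fail_add);
	if (ret)
		fake_pdev_put(pd);
	return ret;
}
```

The same error path is what the hunk above encodes: `platform_device_put()` is the only correct way to dispose of a device that `platform_device_alloc()` returned.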
+83 -16
arch/i386/kernel/process.c
··· 48 48 #include <asm/i387.h> 49 49 #include <asm/desc.h> 50 50 #include <asm/vm86.h> 51 + #include <asm/idle.h> 51 52 #ifdef CONFIG_MATH_EMULATION 52 53 #include <asm/math_emu.h> 53 54 #endif ··· 80 79 void (*pm_idle)(void); 81 80 EXPORT_SYMBOL(pm_idle); 82 81 static DEFINE_PER_CPU(unsigned int, cpu_idle_state); 82 + 83 + static ATOMIC_NOTIFIER_HEAD(idle_notifier); 84 + 85 + void idle_notifier_register(struct notifier_block *n) 86 + { 87 + atomic_notifier_chain_register(&idle_notifier, n); 88 + } 89 + 90 + void idle_notifier_unregister(struct notifier_block *n) 91 + { 92 + atomic_notifier_chain_unregister(&idle_notifier, n); 93 + } 94 + 95 + static DEFINE_PER_CPU(volatile unsigned long, idle_state); 96 + 97 + void enter_idle(void) 98 + { 99 + /* needs to be atomic w.r.t. interrupts, not against other CPUs */ 100 + __set_bit(0, &__get_cpu_var(idle_state)); 101 + atomic_notifier_call_chain(&idle_notifier, IDLE_START, NULL); 102 + } 103 + 104 + static void __exit_idle(void) 105 + { 106 + /* needs to be atomic w.r.t. interrupts, not against other CPUs */ 107 + if (__test_and_clear_bit(0, &__get_cpu_var(idle_state)) == 0) 108 + return; 109 + atomic_notifier_call_chain(&idle_notifier, IDLE_END, NULL); 110 + } 111 + 112 + void exit_idle(void) 113 + { 114 + if (current->pid) 115 + return; 116 + __exit_idle(); 117 + } 83 118 84 119 void disable_hlt(void) 85 120 { ··· 167 130 */ 168 131 static void poll_idle (void) 169 132 { 133 + local_irq_enable(); 170 134 cpu_relax(); 171 135 } 172 136 ··· 227 189 play_dead(); 228 190 229 191 __get_cpu_var(irq_stat).idle_timestamp = jiffies; 192 + 193 + /* 194 + * Idle routines should keep interrupts disabled 195 + * from here on, until they go to idle. 196 + * Otherwise, idle callbacks can misfire. 
197 + */ 198 + local_irq_disable(); 199 + enter_idle(); 230 200 idle(); 201 + __exit_idle(); 231 202 } 232 203 preempt_enable_no_resched(); 233 204 schedule(); ··· 290 243 __monitor((void *)&current_thread_info()->flags, 0, 0); 291 244 smp_mb(); 292 245 if (!need_resched()) 293 - __mwait(eax, ecx); 246 + __sti_mwait(eax, ecx); 247 + else 248 + local_irq_enable(); 249 + } else { 250 + local_irq_enable(); 294 251 } 295 252 } 296 253 ··· 359 308 regs->eax,regs->ebx,regs->ecx,regs->edx); 360 309 printk("ESI: %08lx EDI: %08lx EBP: %08lx", 361 310 regs->esi, regs->edi, regs->ebp); 362 - printk(" DS: %04x ES: %04x GS: %04x\n", 363 - 0xffff & regs->xds,0xffff & regs->xes, 0xffff & regs->xgs); 311 + printk(" DS: %04x ES: %04x FS: %04x\n", 312 + 0xffff & regs->xds,0xffff & regs->xes, 0xffff & regs->xfs); 364 313 365 314 cr0 = read_cr0(); 366 315 cr2 = read_cr2(); ··· 391 340 392 341 regs.xds = __USER_DS; 393 342 regs.xes = __USER_DS; 394 - regs.xgs = __KERNEL_PDA; 343 + regs.xfs = __KERNEL_PDA; 395 344 regs.orig_eax = -1; 396 345 regs.eip = (unsigned long) kernel_thread_helper; 397 346 regs.xcs = __KERNEL_CS | get_kernel_rpl(); ··· 476 425 477 426 p->thread.eip = (unsigned long) ret_from_fork; 478 427 479 - savesegment(fs,p->thread.fs); 428 + savesegment(gs,p->thread.gs); 480 429 481 430 tsk = current; 482 431 if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) { ··· 552 501 dump->regs.eax = regs->eax; 553 502 dump->regs.ds = regs->xds; 554 503 dump->regs.es = regs->xes; 555 - savesegment(fs,dump->regs.fs); 556 - dump->regs.gs = regs->xgs; 504 + dump->regs.fs = regs->xfs; 505 + savesegment(gs,dump->regs.gs); 557 506 dump->regs.orig_eax = regs->orig_eax; 558 507 dump->regs.eip = regs->eip; 559 508 dump->regs.cs = regs->xcs; ··· 704 653 load_esp0(tss, next); 705 654 706 655 /* 707 - * Save away %fs. No need to save %gs, as it was saved on the 656 + * Save away %gs. No need to save %fs, as it was saved on the 708 657 * stack on entry. 
No need to save %es and %ds, as those are 709 658 * always kernel segments while inside the kernel. Doing this 710 659 * before setting the new TLS descriptors avoids the situation ··· 713 662 * used %fs or %gs (it does not today), or if the kernel is 714 663 * running inside of a hypervisor layer. 715 664 */ 716 - savesegment(fs, prev->fs); 665 + savesegment(gs, prev->gs); 717 666 718 667 /* 719 668 * Load the per-thread Thread-Local Storage descriptor. ··· 721 670 load_TLS(next, cpu); 722 671 723 672 /* 724 - * Restore %fs if needed. 725 - * 726 - * Glibc normally makes %fs be zero. 673 + * Restore IOPL if needed. In normal use, the flags restore 674 + * in the switch assembly will handle this. But if the kernel 675 + * is running virtualized at a non-zero CPL, the popf will 676 + * not restore flags, so it must be done in a separate step. 727 677 */ 728 - if (unlikely(prev->fs | next->fs)) 729 - loadsegment(fs, next->fs); 730 - 731 - write_pda(pcurrent, next_p); 678 + if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl)) 679 + set_iopl_mask(next->iopl); 732 680 733 681 /* 734 682 * Now maybe handle debug registers and/or IO bitmaps ··· 738 688 739 689 disable_tsc(prev_p, next_p); 740 690 691 + /* 692 + * Leave lazy mode, flushing any hypercalls made here. 693 + * This must be done before restoring TLS segments so 694 + * the GDT and LDT are properly updated, and must be 695 + * done before math_state_restore, so the TS bit is up 696 + * to date. 
697 + */ 698 + arch_leave_lazy_cpu_mode(); 699 + 741 700 /* If the task has used fpu the last 5 timeslices, just do a full 742 701 * restore of the math state immediately to avoid the trap; the 743 702 * chances of needing FPU soon are obviously high now 744 703 */ 745 704 if (next_p->fpu_counter > 5) 746 705 math_state_restore(); 706 + 707 + /* 708 + * Restore %gs if needed (which is common) 709 + */ 710 + if (prev->gs | next->gs) 711 + loadsegment(gs, next->gs); 712 + 713 + write_pda(pcurrent, next_p); 747 714 748 715 return prev_p; 749 716 }
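The `enter_idle()`/`__exit_idle()` pair added above guards the notifier chain with an interrupt-atomic per-CPU bit, so `IDLE_END` fires at most once per `IDLE_START` even if the exit path runs twice. A single-threaded userspace sketch of that test-and-clear pairing (all names and constants here are illustrative, not the kernel's):

```c
#include <assert.h>

#define DEMO_IDLE_START 1
#define DEMO_IDLE_END   2

static int demo_idle_flag;  /* stands in for the per-CPU idle_state bit */
static int demo_last_event; /* records what the notifier chain last saw */

static void demo_notify(int event)
{
	demo_last_event = event;
}

static void demo_enter_idle(void)
{
	demo_idle_flag = 1;
	demo_notify(DEMO_IDLE_START);
}

/* Fires IDLE_END only if an IDLE_START is still pending, mirroring
 * the __test_and_clear_bit() guard in __exit_idle() above. */
static void demo_exit_idle(void)
{
	if (!demo_idle_flag)
		return;
	demo_idle_flag = 0;
	demo_notify(DEMO_IDLE_END);
}

/* Exercise the pairing: enter, exit, then a second (spurious) exit
 * that must not emit another IDLE_END. */
static int demo_idle_sequence(void)
{
	demo_enter_idle();
	if (demo_last_event != DEMO_IDLE_START)
		return 0;
	demo_exit_idle();
	if (demo_last_event != DEMO_IDLE_END)
		return 0;
	demo_last_event = 0;
	demo_exit_idle(); /* flag already clear: must be a no-op */
	return demo_last_event == 0;
}
```

In the kernel the flag manipulation uses `__set_bit`/`__test_and_clear_bit` because it only needs atomicity against interrupts on the same CPU, not against other CPUs.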
+8 -8
arch/i386/kernel/ptrace.c
··· 89 89 unsigned long regno, unsigned long value) 90 90 { 91 91 switch (regno >> 2) { 92 - case FS: 92 + case GS: 93 93 if (value && (value & 3) != 3) 94 94 return -EIO; 95 - child->thread.fs = value; 95 + child->thread.gs = value; 96 96 return 0; 97 97 case DS: 98 98 case ES: 99 - case GS: 99 + case FS: 100 100 if (value && (value & 3) != 3) 101 101 return -EIO; 102 102 value &= 0xffff; ··· 112 112 value |= get_stack_long(child, EFL_OFFSET) & ~FLAG_MASK; 113 113 break; 114 114 } 115 - if (regno > ES*4) 115 + if (regno > FS*4) 116 116 regno -= 1*4; 117 117 put_stack_long(child, regno, value); 118 118 return 0; ··· 124 124 unsigned long retval = ~0UL; 125 125 126 126 switch (regno >> 2) { 127 - case FS: 128 - retval = child->thread.fs; 127 + case GS: 128 + retval = child->thread.gs; 129 129 break; 130 130 case DS: 131 131 case ES: 132 - case GS: 132 + case FS: 133 133 case SS: 134 134 case CS: 135 135 retval = 0xffff; 136 136 /* fall through */ 137 137 default: 138 - if (regno > ES*4) 138 + if (regno > FS*4) 139 139 regno -= 1*4; 140 140 retval &= get_stack_long(child, regno); 141 141 }
+9 -26
arch/i386/kernel/setup.c
··· 33 33 #include <linux/initrd.h> 34 34 #include <linux/bootmem.h> 35 35 #include <linux/seq_file.h> 36 - #include <linux/platform_device.h> 37 36 #include <linux/console.h> 38 37 #include <linux/mca.h> 39 38 #include <linux/root_dev.h> ··· 59 60 #include <asm/io_apic.h> 60 61 #include <asm/ist.h> 61 62 #include <asm/io.h> 63 + #include <asm/vmi.h> 62 64 #include <setup_arch.h> 63 65 #include <bios_ebda.h> 64 66 ··· 581 581 582 582 max_low_pfn = setup_memory(); 583 583 584 + #ifdef CONFIG_VMI 585 + /* 586 + * Must be after max_low_pfn is determined, and before kernel 587 + * pagetables are setup. 588 + */ 589 + vmi_init(); 590 + #endif 591 + 584 592 /* 585 593 * NOTE: before this point _nobody_ is allowed to allocate 586 594 * any memory using the bootmem allocator. Although the ··· 659 651 #endif 660 652 tsc_init(); 661 653 } 662 - 663 - static __init int add_pcspkr(void) 664 - { 665 - struct platform_device *pd; 666 - int ret; 667 - 668 - pd = platform_device_alloc("pcspkr", -1); 669 - if (!pd) 670 - return -ENOMEM; 671 - 672 - ret = platform_device_add(pd); 673 - if (ret) 674 - platform_device_put(pd); 675 - 676 - return ret; 677 - } 678 - device_initcall(add_pcspkr); 679 - 680 - /* 681 - * Local Variables: 682 - * mode:c 683 - * c-file-style:"k&r" 684 - * c-basic-offset:8 685 - * End: 686 - */
+10 -6
arch/i386/kernel/signal.c
··· 21 21 #include <linux/suspend.h> 22 22 #include <linux/ptrace.h> 23 23 #include <linux/elf.h> 24 + #include <linux/binfmts.h> 24 25 #include <asm/processor.h> 25 26 #include <asm/ucontext.h> 26 27 #include <asm/uaccess.h> ··· 129 128 X86_EFLAGS_TF | X86_EFLAGS_SF | X86_EFLAGS_ZF | \ 130 129 X86_EFLAGS_AF | X86_EFLAGS_PF | X86_EFLAGS_CF) 131 130 132 - COPY_SEG(gs); 133 - GET_SEG(fs); 131 + GET_SEG(gs); 132 + COPY_SEG(fs); 134 133 COPY_SEG(es); 135 134 COPY_SEG(ds); 136 135 COPY(edi); ··· 245 244 { 246 245 int tmp, err = 0; 247 246 248 - err |= __put_user(regs->xgs, (unsigned int __user *)&sc->gs); 249 - savesegment(fs, tmp); 250 - err |= __put_user(tmp, (unsigned int __user *)&sc->fs); 247 + err |= __put_user(regs->xfs, (unsigned int __user *)&sc->fs); 248 + savesegment(gs, tmp); 249 + err |= __put_user(tmp, (unsigned int __user *)&sc->gs); 251 250 252 251 err |= __put_user(regs->xes, (unsigned int __user *)&sc->es); 253 252 err |= __put_user(regs->xds, (unsigned int __user *)&sc->ds); ··· 350 349 goto give_sigsegv; 351 350 } 352 351 353 - restorer = (void *)VDSO_SYM(&__kernel_sigreturn); 352 + if (current->binfmt->hasvdso) 353 + restorer = (void *)VDSO_SYM(&__kernel_sigreturn); 354 + else 355 + restorer = (void *)&frame->retcode; 354 356 if (ka->sa.sa_flags & SA_RESTORER) 355 357 restorer = ka->sa.sa_restorer; 356 358
+4 -3
arch/i386/kernel/smp.c
··· 23 23 24 24 #include <asm/mtrr.h> 25 25 #include <asm/tlbflush.h> 26 + #include <asm/idle.h> 26 27 #include <mach_apic.h> 27 28 28 29 /* ··· 375 374 /* 376 375 * i'm not happy about this global shared spinlock in the 377 376 * MM hot path, but we'll see how contended it is. 378 - * Temporarily this turns IRQs off, so that lockups are 379 - * detected by the NMI watchdog. 377 + * AK: x86-64 has a faster method that could be ported. 380 378 */ 381 379 spin_lock(&tlbstate_lock); 382 380 ··· 400 400 401 401 while (!cpus_empty(flush_cpumask)) 402 402 /* nothing. lockup detection does not belong here */ 403 - mb(); 403 + cpu_relax(); 404 404 405 405 flush_mm = NULL; 406 406 flush_va = 0; ··· 624 624 /* 625 625 * At this point the info structure may be out of scope unless wait==1 626 626 */ 627 + exit_idle(); 627 628 irq_enter(); 628 629 (*func)(info); 629 630 irq_exit();
+13 -3
arch/i386/kernel/smpboot.c
··· 63 63 #include <mach_apic.h> 64 64 #include <mach_wakecpu.h> 65 65 #include <smpboot_hooks.h> 66 + #include <asm/vmi.h> 66 67 67 68 /* Set if we find a B stepping CPU */ 68 69 static int __devinitdata smp_b_stepping; ··· 546 545 * booting is too fragile that we want to limit the 547 546 * things done here to the most necessary things. 548 547 */ 548 + #ifdef CONFIG_VMI 549 + vmi_bringup(); 550 + #endif 549 551 secondary_cpu_init(); 550 552 preempt_disable(); 551 553 smp_callin(); 552 554 while (!cpu_isset(smp_processor_id(), smp_commenced_mask)) 553 555 rep_nop(); 554 - setup_secondary_APIC_clock(); 556 + setup_secondary_clock(); 555 557 if (nmi_watchdog == NMI_IO_APIC) { 556 558 disable_8259A_irq(0); 557 559 enable_NMI_through_LVT0(NULL); ··· 623 619 unsigned short ss; 624 620 } stack_start; 625 621 extern struct i386_pda *start_pda; 626 - extern struct Xgt_desc_struct cpu_gdt_descr; 627 622 628 623 #ifdef CONFIG_NUMA 629 624 ··· 836 833 num_starts = 2; 837 834 else 838 835 num_starts = 0; 836 + 837 + /* 838 + * Paravirt / VMI wants a startup IPI hook here to set up the 839 + * target processor state. 840 + */ 841 + startup_ipi_hook(phys_apicid, (unsigned long) start_secondary, 842 + (unsigned long) stack_start.esp); 839 843 840 844 /* 841 845 * Run STARTUP IPI loop. ··· 1330 1320 1331 1321 smpboot_setup_io_apic(); 1332 1322 1333 - setup_boot_APIC_clock(); 1323 + setup_boot_clock(); 1334 1324 1335 1325 /* 1336 1326 * Synchronize the TSC with the AP
+1 -1
arch/i386/kernel/sysenter.c
··· 78 78 syscall_pages[0] = virt_to_page(syscall_page); 79 79 80 80 #ifdef CONFIG_COMPAT_VDSO 81 - __set_fixmap(FIX_VDSO, __pa(syscall_page), PAGE_READONLY); 81 + __set_fixmap(FIX_VDSO, __pa(syscall_page), PAGE_READONLY_EXEC); 82 82 printk("Compat vDSO mapped to %08lx.\n", __fix_to_virt(FIX_VDSO)); 83 83 #endif 84 84
+7 -7
arch/i386/kernel/time.c
··· 131 131 unsigned long pc = instruction_pointer(regs); 132 132 133 133 #ifdef CONFIG_SMP 134 - if (!user_mode_vm(regs) && in_lock_functions(pc)) { 134 + if (!v8086_mode(regs) && SEGMENT_IS_KERNEL_CODE(regs->xcs) && 135 + in_lock_functions(pc)) { 135 136 #ifdef CONFIG_FRAME_POINTER 136 137 return *(unsigned long *)(regs->ebp + 4); 137 138 #else 138 - unsigned long *sp; 139 - if ((regs->xcs & 3) == 0) 140 - sp = (unsigned long *)&regs->esp; 141 - else 142 - sp = (unsigned long *)regs->esp; 139 + unsigned long *sp = (unsigned long *)&regs->esp; 140 + 143 141 /* Return address is either directly at stack pointer 144 142 or above a saved eflags. Eflags has bits 22-31 zero, 145 143 kernel addresses don't. */ ··· 230 232 static void sync_cmos_clock(unsigned long dummy); 231 233 232 234 static DEFINE_TIMER(sync_cmos_timer, sync_cmos_clock, 0, 0); 235 + int no_sync_cmos_clock; 233 236 234 237 static void sync_cmos_clock(unsigned long dummy) 235 238 { ··· 274 275 275 276 void notify_arch_cmos_timer(void) 276 277 { 277 - mod_timer(&sync_cmos_timer, jiffies + 1); 278 + if (!no_sync_cmos_clock) 279 + mod_timer(&sync_cmos_timer, jiffies + 1); 278 280 } 279 281 280 282 static long clock_cmos_diff;
+20 -7
arch/i386/kernel/traps.c
··· 94 94 asmlinkage void machine_check(void); 95 95 96 96 int kstack_depth_to_print = 24; 97 + static unsigned int code_bytes = 64; 97 98 ATOMIC_NOTIFIER_HEAD(i386die_chain); 98 99 99 100 int register_die_notifier(struct notifier_block *nb) ··· 292 291 int i; 293 292 int in_kernel = 1; 294 293 unsigned long esp; 295 - unsigned short ss; 294 + unsigned short ss, gs; 296 295 297 296 esp = (unsigned long) (&regs->esp); 298 297 savesegment(ss, ss); 298 + savesegment(gs, gs); 299 299 if (user_mode_vm(regs)) { 300 300 in_kernel = 0; 301 301 esp = regs->esp; ··· 315 313 regs->eax, regs->ebx, regs->ecx, regs->edx); 316 314 printk(KERN_EMERG "esi: %08lx edi: %08lx ebp: %08lx esp: %08lx\n", 317 315 regs->esi, regs->edi, regs->ebp, esp); 318 - printk(KERN_EMERG "ds: %04x es: %04x ss: %04x\n", 319 - regs->xds & 0xffff, regs->xes & 0xffff, ss); 316 + printk(KERN_EMERG "ds: %04x es: %04x fs: %04x gs: %04x ss: %04x\n", 317 + regs->xds & 0xffff, regs->xes & 0xffff, regs->xfs & 0xffff, gs, ss); 320 318 printk(KERN_EMERG "Process %.*s (pid: %d, ti=%p task=%p task.ti=%p)", 321 319 TASK_COMM_LEN, current->comm, current->pid, 322 320 current_thread_info(), current, current->thread_info); ··· 326 324 */ 327 325 if (in_kernel) { 328 326 u8 *eip; 329 - int code_bytes = 64; 327 + unsigned int code_prologue = code_bytes * 43 / 64; 328 + unsigned int code_len = code_bytes; 330 329 unsigned char c; 331 330 332 331 printk("\n" KERN_EMERG "Stack: "); ··· 335 332 336 333 printk(KERN_EMERG "Code: "); 337 334 338 - eip = (u8 *)regs->eip - 43; 335 + eip = (u8 *)regs->eip - code_prologue; 339 336 if (eip < (u8 *)PAGE_OFFSET || 340 337 probe_kernel_address(eip, c)) { 341 338 /* try starting at EIP */ 342 339 eip = (u8 *)regs->eip; 343 - code_bytes = 32; 340 + code_len = code_len - code_prologue + 1; 344 341 } 345 - for (i = 0; i < code_bytes; i++, eip++) { 342 + for (i = 0; i < code_len; i++, eip++) { 346 343 if (eip < (u8 *)PAGE_OFFSET || 347 344 probe_kernel_address(eip, c)) { 348 345 printk(" Bad 
EIP value."); ··· 1194 1191 return 1; 1195 1192 } 1196 1193 __setup("kstack=", kstack_setup); 1194 + 1195 + static int __init code_bytes_setup(char *s) 1196 + { 1197 + code_bytes = simple_strtoul(s, NULL, 0); 1198 + if (code_bytes > 8192) 1199 + code_bytes = 8192; 1200 + 1201 + return 1; 1202 + } 1203 + __setup("code_bytes=", code_bytes_setup);
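The new `code_bytes=` parameter above sizes the oops "Code:" dump, and the hunk in `die()` splits the window so that roughly two thirds of it (43/64) falls before the faulting EIP; if that prologue turns out to be unreadable, the dump restarts at EIP with a shortened length. A small sketch of just that arithmetic (the function names here are illustrative, not kernel symbols):

```c
#include <assert.h>

/* How many bytes before EIP the dump starts, matching
 * code_prologue = code_bytes * 43 / 64 in the hunk above. */
static unsigned int oops_code_prologue(unsigned int code_bytes)
{
	return code_bytes * 43 / 64;
}

/* Total bytes dumped: the full window when the prologue is readable,
 * otherwise code_len - code_prologue + 1 starting at EIP. */
static unsigned int oops_code_len(unsigned int code_bytes,
				  int prologue_readable)
{
	unsigned int code_prologue = oops_code_prologue(code_bytes);
	unsigned int code_len = code_bytes;

	if (!prologue_readable)
		code_len = code_len - code_prologue + 1;
	return code_len;
}
```

With the default `code_bytes` of 64 this reproduces the old hard-coded behaviour: 43 bytes of prologue, or a 22-byte dump starting at EIP when the prologue faults.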
+18 -8
arch/i386/kernel/tsc.c
··· 23 23 * an extra value to store the TSC freq 24 24 */ 25 25 unsigned int tsc_khz; 26 + unsigned long long (*custom_sched_clock)(void); 26 27 27 28 int tsc_disable; 28 29 ··· 108 107 { 109 108 unsigned long long this_offset; 110 109 110 + if (unlikely(custom_sched_clock)) 111 + return (*custom_sched_clock)(); 112 + 111 113 /* 112 - * in the NUMA case we dont use the TSC as they are not 113 - * synchronized across all CPUs. 114 + * Fall back to jiffies if there's no TSC available: 114 115 */ 115 - #ifndef CONFIG_NUMA 116 - if (!cpu_khz || check_tsc_unstable()) 117 - #endif 118 - /* no locking but a rare wrong value is not a big deal */ 116 + if (unlikely(tsc_disable)) 117 + /* No locking but a rare wrong value is not a big deal: */ 119 118 return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ); 120 119 121 120 /* read the Time Stamp Counter: */ ··· 195 194 void __init tsc_init(void) 196 195 { 197 196 if (!cpu_has_tsc || tsc_disable) 198 - return; 197 + goto out_no_tsc; 199 198 200 199 cpu_khz = calculate_cpu_khz(); 201 200 tsc_khz = cpu_khz; 202 201 203 202 if (!cpu_khz) 204 - return; 203 + goto out_no_tsc; 205 204 206 205 printk("Detected %lu.%03lu MHz processor.\n", 207 206 (unsigned long)cpu_khz / 1000, ··· 209 208 210 209 set_cyc2ns_scale(cpu_khz); 211 210 use_tsc_delay(); 211 + return; 212 + 213 + out_no_tsc: 214 + /* 215 + * Set the tsc_disable flag if there's no TSC support, this 216 + * makes it a fast flag for the kernel to see whether it 217 + * should be using the TSC. 218 + */ 219 + tsc_disable = 1; 212 220 } 213 221 214 222 #ifdef CONFIG_CPU_FREQ
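When the TSC is disabled, the reworked `sched_clock()` above falls back to jiffies scaled to nanoseconds: `(jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ)`. A sketch of that scaling with an illustrative `HZ` (the real value is a kernel configuration option):

```c
#include <assert.h>

#define DEMO_HZ 250ULL /* illustrative; real HZ comes from Kconfig */

/* Nanoseconds represented by a number of elapsed jiffies, mirroring
 * the no-TSC fallback path in sched_clock() above.  Note the kernel
 * divides 1e9 by HZ first, so the per-jiffy step is an integer. */
static unsigned long long demo_jiffies_to_ns(unsigned long long elapsed)
{
	return elapsed * (1000000000ULL / DEMO_HZ);
}
```

At `HZ=250` each jiffy contributes exactly 4 ms of apparent time, which is why this path is only a coarse fallback compared to the TSC.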
+17 -16
arch/i386/kernel/vm86.c
··· 96 96 { 97 97 int ret = 0; 98 98 99 - /* kernel_vm86_regs is missing xfs, so copy everything up to 100 - (but not including) xgs, and then rest after xgs. */ 101 - ret += copy_to_user(user, regs, offsetof(struct kernel_vm86_regs, pt.xgs)); 102 - ret += copy_to_user(&user->__null_gs, &regs->pt.xgs, 99 + /* kernel_vm86_regs is missing xgs, so copy everything up to 100 + (but not including) orig_eax, and then rest including orig_eax. */ 101 + ret += copy_to_user(user, regs, offsetof(struct kernel_vm86_regs, pt.orig_eax)); 102 + ret += copy_to_user(&user->orig_eax, &regs->pt.orig_eax, 103 103 sizeof(struct kernel_vm86_regs) - 104 - offsetof(struct kernel_vm86_regs, pt.xgs)); 104 + offsetof(struct kernel_vm86_regs, pt.orig_eax)); 105 105 106 106 return ret; 107 107 } ··· 113 113 { 114 114 int ret = 0; 115 115 116 - ret += copy_from_user(regs, user, offsetof(struct kernel_vm86_regs, pt.xgs)); 117 - ret += copy_from_user(&regs->pt.xgs, &user->__null_gs, 116 + /* copy eax-xfs inclusive */ 117 + ret += copy_from_user(regs, user, offsetof(struct kernel_vm86_regs, pt.orig_eax)); 118 + /* copy orig_eax-__gsh+extra */ 119 + ret += copy_from_user(&regs->pt.orig_eax, &user->orig_eax, 118 120 sizeof(struct kernel_vm86_regs) - 119 - offsetof(struct kernel_vm86_regs, pt.xgs) + 121 + offsetof(struct kernel_vm86_regs, pt.orig_eax) + 120 122 extra); 121 - 122 123 return ret; 123 124 } 124 125 ··· 158 157 159 158 ret = KVM86->regs32; 160 159 161 - loadsegment(fs, current->thread.saved_fs); 162 - ret->xgs = current->thread.saved_gs; 160 + ret->xfs = current->thread.saved_fs; 161 + loadsegment(gs, current->thread.saved_gs); 163 162 164 163 return ret; 165 164 } ··· 286 285 */ 287 286 info->regs.pt.xds = 0; 288 287 info->regs.pt.xes = 0; 289 - info->regs.pt.xgs = 0; 288 + info->regs.pt.xfs = 0; 290 289 291 - /* we are clearing fs later just before "jmp resume_userspace", 290 + /* we are clearing gs later just before "jmp resume_userspace", 292 291 * because it is not saved/restored. 
293 292 */ 294 293 ··· 322 321 */ 323 322 info->regs32->eax = 0; 324 323 tsk->thread.saved_esp0 = tsk->thread.esp0; 325 - savesegment(fs, tsk->thread.saved_fs); 326 - tsk->thread.saved_gs = info->regs32->xgs; 324 + tsk->thread.saved_fs = info->regs32->xfs; 325 + savesegment(gs, tsk->thread.saved_gs); 327 326 328 327 tss = &per_cpu(init_tss, get_cpu()); 329 328 tsk->thread.esp0 = (unsigned long) &info->VM86_TSS_ESP0; ··· 343 342 __asm__ __volatile__( 344 343 "movl %0,%%esp\n\t" 345 344 "movl %1,%%ebp\n\t" 346 - "mov %2, %%fs\n\t" 345 + "mov %2, %%gs\n\t" 347 346 "jmp resume_userspace" 348 347 : /* no outputs */ 349 348 :"r" (&info->regs), "r" (task_thread_info(tsk)), "r" (0));
+949
arch/i386/kernel/vmi.c
··· 1 + /* 2 + * VMI specific paravirt-ops implementation 3 + * 4 + * Copyright (C) 2005, VMware, Inc. 5 + * 6 + * This program is free software; you can redistribute it and/or modify 7 + * it under the terms of the GNU General Public License as published by 8 + * the Free Software Foundation; either version 2 of the License, or 9 + * (at your option) any later version. 10 + * 11 + * This program is distributed in the hope that it will be useful, but 12 + * WITHOUT ANY WARRANTY; without even the implied warranty of 13 + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 14 + * NON INFRINGEMENT. See the GNU General Public License for more 15 + * details. 16 + * 17 + * You should have received a copy of the GNU General Public License 18 + * along with this program; if not, write to the Free Software 19 + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 20 + * 21 + * Send feedback to zach@vmware.com 22 + * 23 + */ 24 + 25 + #include <linux/module.h> 26 + #include <linux/license.h> 27 + #include <linux/cpu.h> 28 + #include <linux/bootmem.h> 29 + #include <linux/mm.h> 30 + #include <asm/vmi.h> 31 + #include <asm/io.h> 32 + #include <asm/fixmap.h> 33 + #include <asm/apicdef.h> 34 + #include <asm/apic.h> 35 + #include <asm/processor.h> 36 + #include <asm/timer.h> 37 + #include <asm/vmi_time.h> 38 + 39 + /* Convenient for calling VMI functions indirectly in the ROM */ 40 + typedef u32 __attribute__((regparm(1))) (VROMFUNC)(void); 41 + typedef u64 __attribute__((regparm(2))) (VROMLONGFUNC)(int); 42 + 43 + #define call_vrom_func(rom,func) \ 44 + (((VROMFUNC *)(rom->func))()) 45 + 46 + #define call_vrom_long_func(rom,func,arg) \ 47 + (((VROMLONGFUNC *)(rom->func)) (arg)) 48 + 49 + static struct vrom_header *vmi_rom; 50 + static int license_gplok; 51 + static int disable_nodelay; 52 + static int disable_pge; 53 + static int disable_pse; 54 + static int disable_sep; 55 + static int disable_tsc; 56 + static int disable_mtrr; 57 + 58 + /* Cached VMI 
operations */ 59 + struct { 60 + void (*cpuid)(void /* non-c */); 61 + void (*_set_ldt)(u32 selector); 62 + void (*set_tr)(u32 selector); 63 + void (*set_kernel_stack)(u32 selector, u32 esp0); 64 + void (*allocate_page)(u32, u32, u32, u32, u32); 65 + void (*release_page)(u32, u32); 66 + void (*set_pte)(pte_t, pte_t *, unsigned); 67 + void (*update_pte)(pte_t *, unsigned); 68 + void (*set_linear_mapping)(int, u32, u32, u32); 69 + void (*flush_tlb)(int); 70 + void (*set_initial_ap_state)(int, int); 71 + void (*halt)(void); 72 + } vmi_ops; 73 + 74 + /* XXX move this to alternative.h */ 75 + extern struct paravirt_patch __start_parainstructions[], 76 + __stop_parainstructions[]; 77 + 78 + /* 79 + * VMI patching routines. 80 + */ 81 + #define MNEM_CALL 0xe8 82 + #define MNEM_JMP 0xe9 83 + #define MNEM_RET 0xc3 84 + 85 + static char irq_save_disable_callout[] = { 86 + MNEM_CALL, 0, 0, 0, 0, 87 + MNEM_CALL, 0, 0, 0, 0, 88 + MNEM_RET 89 + }; 90 + #define IRQ_PATCH_INT_MASK 0 91 + #define IRQ_PATCH_DISABLE 5 92 + 93 + static inline void patch_offset(unsigned char *eip, unsigned char *dest) 94 + { 95 + *(unsigned long *)(eip+1) = dest-eip-5; 96 + } 97 + 98 + static unsigned patch_internal(int call, unsigned len, void *insns) 99 + { 100 + u64 reloc; 101 + struct vmi_relocation_info *const rel = (struct vmi_relocation_info *)&reloc; 102 + reloc = call_vrom_long_func(vmi_rom, get_reloc, call); 103 + switch(rel->type) { 104 + case VMI_RELOCATION_CALL_REL: 105 + BUG_ON(len < 5); 106 + *(char *)insns = MNEM_CALL; 107 + patch_offset(insns, rel->eip); 108 + return 5; 109 + 110 + case VMI_RELOCATION_JUMP_REL: 111 + BUG_ON(len < 5); 112 + *(char *)insns = MNEM_JMP; 113 + patch_offset(insns, rel->eip); 114 + return 5; 115 + 116 + case VMI_RELOCATION_NOP: 117 + /* obliterate the whole thing */ 118 + return 0; 119 + 120 + case VMI_RELOCATION_NONE: 121 + /* leave native code in place */ 122 + break; 123 + 124 + default: 125 + BUG(); 126 + } 127 + return len; 128 + } 129 + 130 + /* 131 + * 
Apply patch if appropriate, return length of new instruction 132 + * sequence. The callee does nop padding for us. 133 + */ 134 + static unsigned vmi_patch(u8 type, u16 clobbers, void *insns, unsigned len) 135 + { 136 + switch (type) { 137 + case PARAVIRT_IRQ_DISABLE: 138 + return patch_internal(VMI_CALL_DisableInterrupts, len, insns); 139 + case PARAVIRT_IRQ_ENABLE: 140 + return patch_internal(VMI_CALL_EnableInterrupts, len, insns); 141 + case PARAVIRT_RESTORE_FLAGS: 142 + return patch_internal(VMI_CALL_SetInterruptMask, len, insns); 143 + case PARAVIRT_SAVE_FLAGS: 144 + return patch_internal(VMI_CALL_GetInterruptMask, len, insns); 145 + case PARAVIRT_SAVE_FLAGS_IRQ_DISABLE: 146 + if (len >= 10) { 147 + patch_internal(VMI_CALL_GetInterruptMask, len, insns); 148 + patch_internal(VMI_CALL_DisableInterrupts, len-5, insns+5); 149 + return 10; 150 + } else { 151 + /* 152 + * You bastards didn't leave enough room to 153 + * patch save_flags_irq_disable inline. Patch 154 + * to a helper 155 + */ 156 + BUG_ON(len < 5); 157 + *(char *)insns = MNEM_CALL; 158 + patch_offset(insns, irq_save_disable_callout); 159 + return 5; 160 + } 161 + case PARAVIRT_INTERRUPT_RETURN: 162 + return patch_internal(VMI_CALL_IRET, len, insns); 163 + case PARAVIRT_STI_SYSEXIT: 164 + return patch_internal(VMI_CALL_SYSEXIT, len, insns); 165 + default: 166 + break; 167 + } 168 + return len; 169 + } 170 + 171 + /* CPUID has non-C semantics, and paravirt-ops API doesn't match hardware ISA */ 172 + static void vmi_cpuid(unsigned int *eax, unsigned int *ebx, 173 + unsigned int *ecx, unsigned int *edx) 174 + { 175 + int override = 0; 176 + if (*eax == 1) 177 + override = 1; 178 + asm volatile ("call *%6" 179 + : "=a" (*eax), 180 + "=b" (*ebx), 181 + "=c" (*ecx), 182 + "=d" (*edx) 183 + : "0" (*eax), "2" (*ecx), "r" (vmi_ops.cpuid)); 184 + if (override) { 185 + if (disable_pse) 186 + *edx &= ~X86_FEATURE_PSE; 187 + if (disable_pge) 188 + *edx &= ~X86_FEATURE_PGE; 189 + if (disable_sep) 190 + *edx &= 
~X86_FEATURE_SEP; 191 + if (disable_tsc) 192 + *edx &= ~X86_FEATURE_TSC; 193 + if (disable_mtrr) 194 + *edx &= ~X86_FEATURE_MTRR; 195 + } 196 + } 197 + 198 + static inline void vmi_maybe_load_tls(struct desc_struct *gdt, int nr, struct desc_struct *new) 199 + { 200 + if (gdt[nr].a != new->a || gdt[nr].b != new->b) 201 + write_gdt_entry(gdt, nr, new->a, new->b); 202 + } 203 + 204 + static void vmi_load_tls(struct thread_struct *t, unsigned int cpu) 205 + { 206 + struct desc_struct *gdt = get_cpu_gdt_table(cpu); 207 + vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 0, &t->tls_array[0]); 208 + vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 1, &t->tls_array[1]); 209 + vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 2, &t->tls_array[2]); 210 + } 211 + 212 + static void vmi_set_ldt(const void *addr, unsigned entries) 213 + { 214 + unsigned cpu = smp_processor_id(); 215 + u32 low, high; 216 + 217 + pack_descriptor(&low, &high, (unsigned long)addr, 218 + entries * sizeof(struct desc_struct) - 1, 219 + DESCTYPE_LDT, 0); 220 + write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_LDT, low, high); 221 + vmi_ops._set_ldt(entries ? 
GDT_ENTRY_LDT*sizeof(struct desc_struct) : 0); 222 + } 223 + 224 + static void vmi_set_tr(void) 225 + { 226 + vmi_ops.set_tr(GDT_ENTRY_TSS*sizeof(struct desc_struct)); 227 + } 228 + 229 + static void vmi_load_esp0(struct tss_struct *tss, 230 + struct thread_struct *thread) 231 + { 232 + tss->esp0 = thread->esp0; 233 + 234 + /* This can only happen when SEP is enabled, no need to test "SEP"arately */ 235 + if (unlikely(tss->ss1 != thread->sysenter_cs)) { 236 + tss->ss1 = thread->sysenter_cs; 237 + wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0); 238 + } 239 + vmi_ops.set_kernel_stack(__KERNEL_DS, tss->esp0); 240 + } 241 + 242 + static void vmi_flush_tlb_user(void) 243 + { 244 + vmi_ops.flush_tlb(VMI_FLUSH_TLB); 245 + } 246 + 247 + static void vmi_flush_tlb_kernel(void) 248 + { 249 + vmi_ops.flush_tlb(VMI_FLUSH_TLB | VMI_FLUSH_GLOBAL); 250 + } 251 + 252 + /* Stub to do nothing at all; used for delays and unimplemented calls */ 253 + static void vmi_nop(void) 254 + { 255 + } 256 + 257 + /* For NO_IDLE_HZ, we stop the clock when halting the kernel */ 258 + #ifdef CONFIG_NO_IDLE_HZ 259 + static fastcall void vmi_safe_halt(void) 260 + { 261 + int idle = vmi_stop_hz_timer(); 262 + vmi_ops.halt(); 263 + if (idle) { 264 + local_irq_disable(); 265 + vmi_account_time_restart_hz_timer(); 266 + local_irq_enable(); 267 + } 268 + } 269 + #endif 270 + 271 + #ifdef CONFIG_DEBUG_PAGE_TYPE 272 + 273 + #ifdef CONFIG_X86_PAE 274 + #define MAX_BOOT_PTS (2048+4+1) 275 + #else 276 + #define MAX_BOOT_PTS (1024+1) 277 + #endif 278 + 279 + /* 280 + * During boot, mem_map is not yet available in paging_init, so stash 281 + * all the boot page allocations here. 
282 + */ 283 + static struct { 284 + u32 pfn; 285 + int type; 286 + } boot_page_allocations[MAX_BOOT_PTS]; 287 + static int num_boot_page_allocations; 288 + static int boot_allocations_applied; 289 + 290 + void vmi_apply_boot_page_allocations(void) 291 + { 292 + int i; 293 + BUG_ON(!mem_map); 294 + for (i = 0; i < num_boot_page_allocations; i++) { 295 + struct page *page = pfn_to_page(boot_page_allocations[i].pfn); 296 + page->type = boot_page_allocations[i].type; 297 + page->type = boot_page_allocations[i].type & 298 + ~(VMI_PAGE_ZEROED | VMI_PAGE_CLONE); 299 + } 300 + boot_allocations_applied = 1; 301 + } 302 + 303 + static void record_page_type(u32 pfn, int type) 304 + { 305 + BUG_ON(num_boot_page_allocations >= MAX_BOOT_PTS); 306 + boot_page_allocations[num_boot_page_allocations].pfn = pfn; 307 + boot_page_allocations[num_boot_page_allocations].type = type; 308 + num_boot_page_allocations++; 309 + } 310 + 311 + static void check_zeroed_page(u32 pfn, int type, struct page *page) 312 + { 313 + u32 *ptr; 314 + int i; 315 + int limit = PAGE_SIZE / sizeof(int); 316 + 317 + if (page_address(page)) 318 + ptr = (u32 *)page_address(page); 319 + else 320 + ptr = (u32 *)__va(pfn << PAGE_SHIFT); 321 + /* 322 + * When cloning the root in non-PAE mode, only the userspace 323 + * pdes need to be zeroed. 324 + */ 325 + if (type & VMI_PAGE_CLONE) 326 + limit = USER_PTRS_PER_PGD; 327 + for (i = 0; i < limit; i++) 328 + BUG_ON(ptr[i]); 329 + } 330 + 331 + /* 332 + * We stash the page type into struct page so we can verify the page 333 + * types are used properly. 
334 + */ 335 + static void vmi_set_page_type(u32 pfn, int type) 336 + { 337 + /* PAE can have multiple roots per page - don't track */ 338 + if (PTRS_PER_PMD > 1 && (type & VMI_PAGE_PDP)) 339 + return; 340 + 341 + if (boot_allocations_applied) { 342 + struct page *page = pfn_to_page(pfn); 343 + if (type != VMI_PAGE_NORMAL) 344 + BUG_ON(page->type); 345 + else 346 + BUG_ON(page->type == VMI_PAGE_NORMAL); 347 + page->type = type & ~(VMI_PAGE_ZEROED | VMI_PAGE_CLONE); 348 + if (type & VMI_PAGE_ZEROED) 349 + check_zeroed_page(pfn, type, page); 350 + } else { 351 + record_page_type(pfn, type); 352 + } 353 + } 354 + 355 + static void vmi_check_page_type(u32 pfn, int type) 356 + { 357 + /* PAE can have multiple roots per page - skip checks */ 358 + if (PTRS_PER_PMD > 1 && (type & VMI_PAGE_PDP)) 359 + return; 360 + 361 + type &= ~(VMI_PAGE_ZEROED | VMI_PAGE_CLONE); 362 + if (boot_allocations_applied) { 363 + struct page *page = pfn_to_page(pfn); 364 + BUG_ON((page->type ^ type) & VMI_PAGE_PAE); 365 + BUG_ON(type == VMI_PAGE_NORMAL && page->type); 366 + BUG_ON((type & page->type) == 0); 367 + } 368 + } 369 + #else 370 + #define vmi_set_page_type(p,t) do { } while (0) 371 + #define vmi_check_page_type(p,t) do { } while (0) 372 + #endif 373 + 374 + static void vmi_allocate_pt(u32 pfn) 375 + { 376 + vmi_set_page_type(pfn, VMI_PAGE_L1); 377 + vmi_ops.allocate_page(pfn, VMI_PAGE_L1, 0, 0, 0); 378 + } 379 + 380 + static void vmi_allocate_pd(u32 pfn) 381 + { 382 + /* 383 + * This call comes in very early, before mem_map is setup. 384 + * It is called only for swapper_pg_dir, which already has 385 + * data on it. 
386 + */ 387 + vmi_set_page_type(pfn, VMI_PAGE_L2); 388 + vmi_ops.allocate_page(pfn, VMI_PAGE_L2, 0, 0, 0); 389 + } 390 + 391 + static void vmi_allocate_pd_clone(u32 pfn, u32 clonepfn, u32 start, u32 count) 392 + { 393 + vmi_set_page_type(pfn, VMI_PAGE_L2 | VMI_PAGE_CLONE); 394 + vmi_check_page_type(clonepfn, VMI_PAGE_L2); 395 + vmi_ops.allocate_page(pfn, VMI_PAGE_L2 | VMI_PAGE_CLONE, clonepfn, start, count); 396 + } 397 + 398 + static void vmi_release_pt(u32 pfn) 399 + { 400 + vmi_ops.release_page(pfn, VMI_PAGE_L1); 401 + vmi_set_page_type(pfn, VMI_PAGE_NORMAL); 402 + } 403 + 404 + static void vmi_release_pd(u32 pfn) 405 + { 406 + vmi_ops.release_page(pfn, VMI_PAGE_L2); 407 + vmi_set_page_type(pfn, VMI_PAGE_NORMAL); 408 + } 409 + 410 + /* 411 + * Helper macros for MMU update flags. We can defer updates until a flush 412 + * or page invalidation only if the update is to the current address space 413 + * (otherwise, there is no flush). We must check against init_mm, since 414 + * this could be a kernel update, which usually passes init_mm, although 415 + * sometimes this check can be skipped if we know the particular function 416 + * is only called on user mode PTEs. We could change the kernel to pass 417 + * current->active_mm here, but in particular, I was unsure if changing 418 + * mm/highmem.c to do this would still be correct on other architectures. 419 + */ 420 + #define is_current_as(mm, mustbeuser) ((mm) == current->active_mm || \ 421 + (!mustbeuser && (mm) == &init_mm)) 422 + #define vmi_flags_addr(mm, addr, level, user) \ 423 + ((level) | (is_current_as(mm, user) ? \ 424 + (VMI_PAGE_CURRENT_AS | ((addr) & VMI_PAGE_VA_MASK)) : 0)) 425 + #define vmi_flags_addr_defer(mm, addr, level, user) \ 426 + ((level) | (is_current_as(mm, user) ? 
\ 427 + (VMI_PAGE_DEFER | VMI_PAGE_CURRENT_AS | ((addr) & VMI_PAGE_VA_MASK)) : 0)) 428 + 429 + static void vmi_update_pte(struct mm_struct *mm, u32 addr, pte_t *ptep) 430 + { 431 + vmi_check_page_type(__pa(ptep) >> PAGE_SHIFT, VMI_PAGE_PTE); 432 + vmi_ops.update_pte(ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0)); 433 + } 434 + 435 + static void vmi_update_pte_defer(struct mm_struct *mm, u32 addr, pte_t *ptep) 436 + { 437 + vmi_check_page_type(__pa(ptep) >> PAGE_SHIFT, VMI_PAGE_PTE); 438 + vmi_ops.update_pte(ptep, vmi_flags_addr_defer(mm, addr, VMI_PAGE_PT, 0)); 439 + } 440 + 441 + static void vmi_set_pte(pte_t *ptep, pte_t pte) 442 + { 443 + /* XXX because of set_pmd_pte, this can be called on PT or PD layers */ 444 + vmi_check_page_type(__pa(ptep) >> PAGE_SHIFT, VMI_PAGE_PTE | VMI_PAGE_PD); 445 + vmi_ops.set_pte(pte, ptep, VMI_PAGE_PT); 446 + } 447 + 448 + static void vmi_set_pte_at(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pte) 449 + { 450 + vmi_check_page_type(__pa(ptep) >> PAGE_SHIFT, VMI_PAGE_PTE); 451 + vmi_ops.set_pte(pte, ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0)); 452 + } 453 + 454 + static void vmi_set_pmd(pmd_t *pmdp, pmd_t pmdval) 455 + { 456 + #ifdef CONFIG_X86_PAE 457 + const pte_t pte = { pmdval.pmd, pmdval.pmd >> 32 }; 458 + vmi_check_page_type(__pa(pmdp) >> PAGE_SHIFT, VMI_PAGE_PMD); 459 + #else 460 + const pte_t pte = { pmdval.pud.pgd.pgd }; 461 + vmi_check_page_type(__pa(pmdp) >> PAGE_SHIFT, VMI_PAGE_PGD); 462 + #endif 463 + vmi_ops.set_pte(pte, (pte_t *)pmdp, VMI_PAGE_PD); 464 + } 465 + 466 + #ifdef CONFIG_X86_PAE 467 + 468 + static void vmi_set_pte_atomic(pte_t *ptep, pte_t pteval) 469 + { 470 + /* 471 + * XXX This is called from set_pmd_pte, but at both PT 472 + * and PD layers so the VMI_PAGE_PT flag is wrong. But 473 + * it is only called for large page mapping changes, 474 + * the Xen backend, doesn't support large pages, and the 475 + * ESX backend doesn't depend on the flag. 
476 + */ 477 + set_64bit((unsigned long long *)ptep,pte_val(pteval)); 478 + vmi_ops.update_pte(ptep, VMI_PAGE_PT); 479 + } 480 + 481 + static void vmi_set_pte_present(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) 482 + { 483 + vmi_check_page_type(__pa(ptep) >> PAGE_SHIFT, VMI_PAGE_PTE); 484 + vmi_ops.set_pte(pte, ptep, vmi_flags_addr_defer(mm, addr, VMI_PAGE_PT, 1)); 485 + } 486 + 487 + static void vmi_set_pud(pud_t *pudp, pud_t pudval) 488 + { 489 + /* Um, eww */ 490 + const pte_t pte = { pudval.pgd.pgd, pudval.pgd.pgd >> 32 }; 491 + vmi_check_page_type(__pa(pudp) >> PAGE_SHIFT, VMI_PAGE_PGD); 492 + vmi_ops.set_pte(pte, (pte_t *)pudp, VMI_PAGE_PDP); 493 + } 494 + 495 + static void vmi_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) 496 + { 497 + const pte_t pte = { 0 }; 498 + vmi_check_page_type(__pa(ptep) >> PAGE_SHIFT, VMI_PAGE_PTE); 499 + vmi_ops.set_pte(pte, ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0)); 500 + } 501 + 502 + void vmi_pmd_clear(pmd_t *pmd) 503 + { 504 + const pte_t pte = { 0 }; 505 + vmi_check_page_type(__pa(pmd) >> PAGE_SHIFT, VMI_PAGE_PMD); 506 + vmi_ops.set_pte(pte, (pte_t *)pmd, VMI_PAGE_PD); 507 + } 508 + #endif 509 + 510 + #ifdef CONFIG_SMP 511 + struct vmi_ap_state ap; 512 + extern void setup_pda(void); 513 + 514 + static void __init /* XXX cpu hotplug */ 515 + vmi_startup_ipi_hook(int phys_apicid, unsigned long start_eip, 516 + unsigned long start_esp) 517 + { 518 + /* Default everything to zero. This is fine for most GPRs. 
*/ 519 + memset(&ap, 0, sizeof(struct vmi_ap_state)); 520 + 521 + ap.gdtr_limit = GDT_SIZE - 1; 522 + ap.gdtr_base = (unsigned long) get_cpu_gdt_table(phys_apicid); 523 + 524 + ap.idtr_limit = IDT_ENTRIES * 8 - 1; 525 + ap.idtr_base = (unsigned long) idt_table; 526 + 527 + ap.ldtr = 0; 528 + 529 + ap.cs = __KERNEL_CS; 530 + ap.eip = (unsigned long) start_eip; 531 + ap.ss = __KERNEL_DS; 532 + ap.esp = (unsigned long) start_esp; 533 + 534 + ap.ds = __USER_DS; 535 + ap.es = __USER_DS; 536 + ap.fs = __KERNEL_PDA; 537 + ap.gs = 0; 538 + 539 + ap.eflags = 0; 540 + 541 + setup_pda(); 542 + 543 + #ifdef CONFIG_X86_PAE 544 + /* efer should match BSP efer. */ 545 + if (cpu_has_nx) { 546 + unsigned l, h; 547 + rdmsr(MSR_EFER, l, h); 548 + ap.efer = (unsigned long long) h << 32 | l; 549 + } 550 + #endif 551 + 552 + ap.cr3 = __pa(swapper_pg_dir); 553 + /* Protected mode, paging, AM, WP, NE, MP. */ 554 + ap.cr0 = 0x80050023; 555 + ap.cr4 = mmu_cr4_features; 556 + vmi_ops.set_initial_ap_state(__pa(&ap), phys_apicid); 557 + } 558 + #endif 559 + 560 + static inline int __init check_vmi_rom(struct vrom_header *rom) 561 + { 562 + struct pci_header *pci; 563 + struct pnp_header *pnp; 564 + const char *manufacturer = "UNKNOWN"; 565 + const char *product = "UNKNOWN"; 566 + const char *license = "unspecified"; 567 + 568 + if (rom->rom_signature != 0xaa55) 569 + return 0; 570 + if (rom->vrom_signature != VMI_SIGNATURE) 571 + return 0; 572 + if (rom->api_version_maj != VMI_API_REV_MAJOR || 573 + rom->api_version_min+1 < VMI_API_REV_MINOR+1) { 574 + printk(KERN_WARNING "VMI: Found mismatched rom version %d.%d\n", 575 + rom->api_version_maj, 576 + rom->api_version_min); 577 + return 0; 578 + } 579 + 580 + /* 581 + * Relying on the VMI_SIGNATURE field is not 100% safe, so check 582 + * the PCI header and device type to make sure this is really a 583 + * VMI device. 
584 + */ 585 + if (!rom->pci_header_offs) { 586 + printk(KERN_WARNING "VMI: ROM does not contain PCI header.\n"); 587 + return 0; 588 + } 589 + 590 + pci = (struct pci_header *)((char *)rom+rom->pci_header_offs); 591 + if (pci->vendorID != PCI_VENDOR_ID_VMWARE || 592 + pci->deviceID != PCI_DEVICE_ID_VMWARE_VMI) { 593 + /* Allow it to run... anyways, but warn */ 594 + printk(KERN_WARNING "VMI: ROM from unknown manufacturer\n"); 595 + } 596 + 597 + if (rom->pnp_header_offs) { 598 + pnp = (struct pnp_header *)((char *)rom+rom->pnp_header_offs); 599 + if (pnp->manufacturer_offset) 600 + manufacturer = (const char *)rom+pnp->manufacturer_offset; 601 + if (pnp->product_offset) 602 + product = (const char *)rom+pnp->product_offset; 603 + } 604 + 605 + if (rom->license_offs) 606 + license = (char *)rom+rom->license_offs; 607 + 608 + printk(KERN_INFO "VMI: Found %s %s, API version %d.%d, ROM version %d.%d\n", 609 + manufacturer, product, 610 + rom->api_version_maj, rom->api_version_min, 611 + pci->rom_version_maj, pci->rom_version_min); 612 + 613 + license_gplok = license_is_gpl_compatible(license); 614 + if (!license_gplok) { 615 + printk(KERN_WARNING "VMI: ROM license '%s' taints kernel... 
" 616 + "inlining disabled\n", 617 + license); 618 + add_taint(TAINT_PROPRIETARY_MODULE); 619 + } 620 + return 1; 621 + } 622 + 623 + /* 624 + * Probe for the VMI option ROM 625 + */ 626 + static inline int __init probe_vmi_rom(void) 627 + { 628 + unsigned long base; 629 + 630 + /* VMI ROM is in option ROM area, check signature */ 631 + for (base = 0xC0000; base < 0xE0000; base += 2048) { 632 + struct vrom_header *romstart; 633 + romstart = (struct vrom_header *)isa_bus_to_virt(base); 634 + if (check_vmi_rom(romstart)) { 635 + vmi_rom = romstart; 636 + return 1; 637 + } 638 + } 639 + return 0; 640 + } 641 + 642 + /* 643 + * VMI setup common to all processors 644 + */ 645 + void vmi_bringup(void) 646 + { 647 + /* We must establish the lowmem mapping for MMU ops to work */ 648 + if (vmi_rom) 649 + vmi_ops.set_linear_mapping(0, __PAGE_OFFSET, max_low_pfn, 0); 650 + } 651 + 652 + /* 653 + * Return a pointer to the VMI function or a NOP stub 654 + */ 655 + static void *vmi_get_function(int vmicall) 656 + { 657 + u64 reloc; 658 + const struct vmi_relocation_info *rel = (struct vmi_relocation_info *)&reloc; 659 + reloc = call_vrom_long_func(vmi_rom, get_reloc, vmicall); 660 + BUG_ON(rel->type == VMI_RELOCATION_JUMP_REL); 661 + if (rel->type == VMI_RELOCATION_CALL_REL) 662 + return (void *)rel->eip; 663 + else 664 + return (void *)vmi_nop; 665 + } 666 + 667 + /* 668 + * Helper macro for making the VMI paravirt-ops fill code readable. 669 + * For unimplemented operations, fall back to default. 
670 + */ 671 + #define para_fill(opname, vmicall) \ 672 + do { \ 673 + reloc = call_vrom_long_func(vmi_rom, get_reloc, \ 674 + VMI_CALL_##vmicall); \ 675 + if (rel->type != VMI_RELOCATION_NONE) { \ 676 + BUG_ON(rel->type != VMI_RELOCATION_CALL_REL); \ 677 + paravirt_ops.opname = (void *)rel->eip; \ 678 + } \ 679 + } while (0) 680 + 681 + /* 682 + * Activate the VMI interface and switch into paravirtualized mode 683 + */ 684 + static inline int __init activate_vmi(void) 685 + { 686 + short kernel_cs; 687 + u64 reloc; 688 + const struct vmi_relocation_info *rel = (struct vmi_relocation_info *)&reloc; 689 + 690 + if (call_vrom_func(vmi_rom, vmi_init) != 0) { 691 + printk(KERN_ERR "VMI ROM failed to initialize!"); 692 + return 0; 693 + } 694 + savesegment(cs, kernel_cs); 695 + 696 + paravirt_ops.paravirt_enabled = 1; 697 + paravirt_ops.kernel_rpl = kernel_cs & SEGMENT_RPL_MASK; 698 + 699 + paravirt_ops.patch = vmi_patch; 700 + paravirt_ops.name = "vmi"; 701 + 702 + /* 703 + * Many of these operations are ABI compatible with VMI. 704 + * This means we can fill in the paravirt-ops with direct 705 + * pointers into the VMI ROM. If the calling convention for 706 + * these operations changes, this code needs to be updated. 707 + * 708 + * Exceptions 709 + * CPUID paravirt-op uses pointers, not the native ISA 710 + * halt has no VMI equivalent; all VMI halts are "safe" 711 + * no MSR support yet - just trap and emulate. 
VMI uses the 712 + * same ABI as the native ISA, but Linux wants exceptions 713 + * from bogus MSR read / write handled 714 + * rdpmc is not yet used in Linux 715 + */ 716 + 717 + /* CPUID is special, so very special */ 718 + reloc = call_vrom_long_func(vmi_rom, get_reloc, VMI_CALL_CPUID); 719 + if (rel->type != VMI_RELOCATION_NONE) { 720 + BUG_ON(rel->type != VMI_RELOCATION_CALL_REL); 721 + vmi_ops.cpuid = (void *)rel->eip; 722 + paravirt_ops.cpuid = vmi_cpuid; 723 + } 724 + 725 + para_fill(clts, CLTS); 726 + para_fill(get_debugreg, GetDR); 727 + para_fill(set_debugreg, SetDR); 728 + para_fill(read_cr0, GetCR0); 729 + para_fill(read_cr2, GetCR2); 730 + para_fill(read_cr3, GetCR3); 731 + para_fill(read_cr4, GetCR4); 732 + para_fill(write_cr0, SetCR0); 733 + para_fill(write_cr2, SetCR2); 734 + para_fill(write_cr3, SetCR3); 735 + para_fill(write_cr4, SetCR4); 736 + para_fill(save_fl, GetInterruptMask); 737 + para_fill(restore_fl, SetInterruptMask); 738 + para_fill(irq_disable, DisableInterrupts); 739 + para_fill(irq_enable, EnableInterrupts); 740 + /* irq_save_disable !!! 
sheer pain */ 741 + patch_offset(&irq_save_disable_callout[IRQ_PATCH_INT_MASK], 742 + (char *)paravirt_ops.save_fl); 743 + patch_offset(&irq_save_disable_callout[IRQ_PATCH_DISABLE], 744 + (char *)paravirt_ops.irq_disable); 745 + #ifndef CONFIG_NO_IDLE_HZ 746 + para_fill(safe_halt, Halt); 747 + #else 748 + vmi_ops.halt = vmi_get_function(VMI_CALL_Halt); 749 + paravirt_ops.safe_halt = vmi_safe_halt; 750 + #endif 751 + para_fill(wbinvd, WBINVD); 752 + /* paravirt_ops.read_msr = vmi_rdmsr */ 753 + /* paravirt_ops.write_msr = vmi_wrmsr */ 754 + para_fill(read_tsc, RDTSC); 755 + /* paravirt_ops.rdpmc = vmi_rdpmc */ 756 + 757 + /* TR interface doesn't pass TR value */ 758 + reloc = call_vrom_long_func(vmi_rom, get_reloc, VMI_CALL_SetTR); 759 + if (rel->type != VMI_RELOCATION_NONE) { 760 + BUG_ON(rel->type != VMI_RELOCATION_CALL_REL); 761 + vmi_ops.set_tr = (void *)rel->eip; 762 + paravirt_ops.load_tr_desc = vmi_set_tr; 763 + } 764 + 765 + /* LDT is special, too */ 766 + reloc = call_vrom_long_func(vmi_rom, get_reloc, VMI_CALL_SetLDT); 767 + if (rel->type != VMI_RELOCATION_NONE) { 768 + BUG_ON(rel->type != VMI_RELOCATION_CALL_REL); 769 + vmi_ops._set_ldt = (void *)rel->eip; 770 + paravirt_ops.set_ldt = vmi_set_ldt; 771 + } 772 + 773 + para_fill(load_gdt, SetGDT); 774 + para_fill(load_idt, SetIDT); 775 + para_fill(store_gdt, GetGDT); 776 + para_fill(store_idt, GetIDT); 777 + para_fill(store_tr, GetTR); 778 + paravirt_ops.load_tls = vmi_load_tls; 779 + para_fill(write_ldt_entry, WriteLDTEntry); 780 + para_fill(write_gdt_entry, WriteGDTEntry); 781 + para_fill(write_idt_entry, WriteIDTEntry); 782 + reloc = call_vrom_long_func(vmi_rom, get_reloc, 783 + VMI_CALL_UpdateKernelStack); 784 + if (rel->type != VMI_RELOCATION_NONE) { 785 + BUG_ON(rel->type != VMI_RELOCATION_CALL_REL); 786 + vmi_ops.set_kernel_stack = (void *)rel->eip; 787 + paravirt_ops.load_esp0 = vmi_load_esp0; 788 + } 789 + 790 + para_fill(set_iopl_mask, SetIOPLMask); 791 + paravirt_ops.io_delay = (void *)vmi_nop; 
792 + if (!disable_nodelay) { 793 + paravirt_ops.const_udelay = (void *)vmi_nop; 794 + } 795 + 796 + para_fill(set_lazy_mode, SetLazyMode); 797 + 798 + reloc = call_vrom_long_func(vmi_rom, get_reloc, VMI_CALL_FlushTLB); 799 + if (rel->type != VMI_RELOCATION_NONE) { 800 + vmi_ops.flush_tlb = (void *)rel->eip; 801 + paravirt_ops.flush_tlb_user = vmi_flush_tlb_user; 802 + paravirt_ops.flush_tlb_kernel = vmi_flush_tlb_kernel; 803 + } 804 + para_fill(flush_tlb_single, InvalPage); 805 + 806 + /* 807 + * Until a standard flag format can be agreed on, we need to 808 + * implement these as wrappers in Linux. Get the VMI ROM 809 + * function pointers for the two backend calls. 810 + */ 811 + #ifdef CONFIG_X86_PAE 812 + vmi_ops.set_pte = vmi_get_function(VMI_CALL_SetPxELong); 813 + vmi_ops.update_pte = vmi_get_function(VMI_CALL_UpdatePxELong); 814 + #else 815 + vmi_ops.set_pte = vmi_get_function(VMI_CALL_SetPxE); 816 + vmi_ops.update_pte = vmi_get_function(VMI_CALL_UpdatePxE); 817 + #endif 818 + vmi_ops.set_linear_mapping = vmi_get_function(VMI_CALL_SetLinearMapping); 819 + vmi_ops.allocate_page = vmi_get_function(VMI_CALL_AllocatePage); 820 + vmi_ops.release_page = vmi_get_function(VMI_CALL_ReleasePage); 821 + 822 + paravirt_ops.alloc_pt = vmi_allocate_pt; 823 + paravirt_ops.alloc_pd = vmi_allocate_pd; 824 + paravirt_ops.alloc_pd_clone = vmi_allocate_pd_clone; 825 + paravirt_ops.release_pt = vmi_release_pt; 826 + paravirt_ops.release_pd = vmi_release_pd; 827 + paravirt_ops.set_pte = vmi_set_pte; 828 + paravirt_ops.set_pte_at = vmi_set_pte_at; 829 + paravirt_ops.set_pmd = vmi_set_pmd; 830 + paravirt_ops.pte_update = vmi_update_pte; 831 + paravirt_ops.pte_update_defer = vmi_update_pte_defer; 832 + #ifdef CONFIG_X86_PAE 833 + paravirt_ops.set_pte_atomic = vmi_set_pte_atomic; 834 + paravirt_ops.set_pte_present = vmi_set_pte_present; 835 + paravirt_ops.set_pud = vmi_set_pud; 836 + paravirt_ops.pte_clear = vmi_pte_clear; 837 + paravirt_ops.pmd_clear = vmi_pmd_clear; 838 + #endif 
839 + /* 840 + * These MUST always be patched. Don't support indirect jumps 841 + * through these operations, as the VMI interface may use either 842 + * a jump or a call to get to these operations, depending on 843 + * the backend. They are performance critical anyway, so requiring 844 + * a patch is not a big problem. 845 + */ 846 + paravirt_ops.irq_enable_sysexit = (void *)0xfeedbab0; 847 + paravirt_ops.iret = (void *)0xbadbab0; 848 + 849 + #ifdef CONFIG_SMP 850 + paravirt_ops.startup_ipi_hook = vmi_startup_ipi_hook; 851 + vmi_ops.set_initial_ap_state = vmi_get_function(VMI_CALL_SetInitialAPState); 852 + #endif 853 + 854 + #ifdef CONFIG_X86_LOCAL_APIC 855 + paravirt_ops.apic_read = vmi_get_function(VMI_CALL_APICRead); 856 + paravirt_ops.apic_write = vmi_get_function(VMI_CALL_APICWrite); 857 + paravirt_ops.apic_write_atomic = vmi_get_function(VMI_CALL_APICWrite); 858 + #endif 859 + 860 + /* 861 + * Check for VMI timer functionality by probing for a cycle frequency method 862 + */ 863 + reloc = call_vrom_long_func(vmi_rom, get_reloc, VMI_CALL_GetCycleFrequency); 864 + if (rel->type != VMI_RELOCATION_NONE) { 865 + vmi_timer_ops.get_cycle_frequency = (void *)rel->eip; 866 + vmi_timer_ops.get_cycle_counter = 867 + vmi_get_function(VMI_CALL_GetCycleCounter); 868 + vmi_timer_ops.get_wallclock = 869 + vmi_get_function(VMI_CALL_GetWallclockTime); 870 + vmi_timer_ops.wallclock_updated = 871 + vmi_get_function(VMI_CALL_WallclockUpdated); 872 + vmi_timer_ops.set_alarm = vmi_get_function(VMI_CALL_SetAlarm); 873 + vmi_timer_ops.cancel_alarm = 874 + vmi_get_function(VMI_CALL_CancelAlarm); 875 + paravirt_ops.time_init = vmi_time_init; 876 + paravirt_ops.get_wallclock = vmi_get_wallclock; 877 + paravirt_ops.set_wallclock = vmi_set_wallclock; 878 + #ifdef CONFIG_X86_LOCAL_APIC 879 + paravirt_ops.setup_boot_clock = vmi_timer_setup_boot_alarm; 880 + paravirt_ops.setup_secondary_clock = vmi_timer_setup_secondary_alarm; 881 + #endif 882 + custom_sched_clock = vmi_sched_clock; 883 + } 
884 + 885 + /* 886 + * Alternative instruction rewriting doesn't happen soon enough 887 + * to convert VMI_IRET to a call instead of a jump; so we have 888 + * to do this before IRQs get reenabled. Fortunately, it is 889 + * idempotent. 890 + */ 891 + apply_paravirt(__start_parainstructions, __stop_parainstructions); 892 + 893 + vmi_bringup(); 894 + 895 + return 1; 896 + } 897 + 898 + #undef para_fill 899 + 900 + void __init vmi_init(void) 901 + { 902 + unsigned long flags; 903 + 904 + if (!vmi_rom) 905 + probe_vmi_rom(); 906 + else 907 + check_vmi_rom(vmi_rom); 908 + 909 + /* In case probing for or validating the ROM failed, basil */ 910 + if (!vmi_rom) 911 + return; 912 + 913 + reserve_top_address(-vmi_rom->virtual_top); 914 + 915 + local_irq_save(flags); 916 + activate_vmi(); 917 + #ifdef CONFIG_SMP 918 + no_timer_check = 1; 919 + #endif 920 + local_irq_restore(flags & X86_EFLAGS_IF); 921 + } 922 + 923 + static int __init parse_vmi(char *arg) 924 + { 925 + if (!arg) 926 + return -EINVAL; 927 + 928 + if (!strcmp(arg, "disable_nodelay")) 929 + disable_nodelay = 1; 930 + else if (!strcmp(arg, "disable_pge")) { 931 + clear_bit(X86_FEATURE_PGE, boot_cpu_data.x86_capability); 932 + disable_pge = 1; 933 + } else if (!strcmp(arg, "disable_pse")) { 934 + clear_bit(X86_FEATURE_PSE, boot_cpu_data.x86_capability); 935 + disable_pse = 1; 936 + } else if (!strcmp(arg, "disable_sep")) { 937 + clear_bit(X86_FEATURE_SEP, boot_cpu_data.x86_capability); 938 + disable_sep = 1; 939 + } else if (!strcmp(arg, "disable_tsc")) { 940 + clear_bit(X86_FEATURE_TSC, boot_cpu_data.x86_capability); 941 + disable_tsc = 1; 942 + } else if (!strcmp(arg, "disable_mtrr")) { 943 + clear_bit(X86_FEATURE_MTRR, boot_cpu_data.x86_capability); 944 + disable_mtrr = 1; 945 + } 946 + return 0; 947 + } 948 + 949 + early_param("vmi", parse_vmi);
+499
arch/i386/kernel/vmitime.c
··· 1 + /* 2 + * VMI paravirtual timer support routines. 3 + * 4 + * Copyright (C) 2005, VMware, Inc. 5 + * 6 + * This program is free software; you can redistribute it and/or modify 7 + * it under the terms of the GNU General Public License as published by 8 + * the Free Software Foundation; either version 2 of the License, or 9 + * (at your option) any later version. 10 + * 11 + * This program is distributed in the hope that it will be useful, but 12 + * WITHOUT ANY WARRANTY; without even the implied warranty of 13 + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 14 + * NON INFRINGEMENT. See the GNU General Public License for more 15 + * details. 16 + * 17 + * You should have received a copy of the GNU General Public License 18 + * along with this program; if not, write to the Free Software 19 + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 20 + * 21 + * Send feedback to dhecht@vmware.com 22 + * 23 + */ 24 + 25 + /* 26 + * Portions of this code from arch/i386/kernel/timers/timer_tsc.c. 27 + * Portions of the CONFIG_NO_IDLE_HZ code from arch/s390/kernel/time.c. 28 + * See comments there for proper credits. 
29 + */ 30 + 31 + #include <linux/spinlock.h> 32 + #include <linux/init.h> 33 + #include <linux/errno.h> 34 + #include <linux/jiffies.h> 35 + #include <linux/interrupt.h> 36 + #include <linux/kernel_stat.h> 37 + #include <linux/rcupdate.h> 38 + #include <linux/clocksource.h> 39 + 40 + #include <asm/timer.h> 41 + #include <asm/io.h> 42 + #include <asm/apic.h> 43 + #include <asm/div64.h> 44 + #include <asm/timer.h> 45 + #include <asm/desc.h> 46 + 47 + #include <asm/vmi.h> 48 + #include <asm/vmi_time.h> 49 + 50 + #include <mach_timer.h> 51 + #include <io_ports.h> 52 + 53 + #ifdef CONFIG_X86_LOCAL_APIC 54 + #define VMI_ALARM_WIRING VMI_ALARM_WIRED_LVTT 55 + #else 56 + #define VMI_ALARM_WIRING VMI_ALARM_WIRED_IRQ0 57 + #endif 58 + 59 + /* Cached VMI operations */ 60 + struct vmi_timer_ops vmi_timer_ops; 61 + 62 + #ifdef CONFIG_NO_IDLE_HZ 63 + 64 + /* /proc/sys/kernel/hz_timer state. */ 65 + int sysctl_hz_timer; 66 + 67 + /* Some stats */ 68 + static DEFINE_PER_CPU(unsigned long, vmi_idle_no_hz_irqs); 69 + static DEFINE_PER_CPU(unsigned long, vmi_idle_no_hz_jiffies); 70 + static DEFINE_PER_CPU(unsigned long, idle_start_jiffies); 71 + 72 + #endif /* CONFIG_NO_IDLE_HZ */ 73 + 74 + /* Number of alarms per second. By default this is CONFIG_VMI_ALARM_HZ. */ 75 + static int alarm_hz = CONFIG_VMI_ALARM_HZ; 76 + 77 + /* Cache of the value get_cycle_frequency / HZ. */ 78 + static signed long long cycles_per_jiffy; 79 + 80 + /* Cache of the value get_cycle_frequency / alarm_hz. */ 81 + static signed long long cycles_per_alarm; 82 + 83 + /* The number of cycles accounted for by the 'jiffies'/'xtime' count. 84 + * Protected by xtime_lock. */ 85 + static unsigned long long real_cycles_accounted_system; 86 + 87 + /* The number of cycles accounted for by update_process_times(), per cpu. */ 88 + static DEFINE_PER_CPU(unsigned long long, process_times_cycles_accounted_cpu); 89 + 90 + /* The number of stolen cycles accounted, per cpu. 
*/ 91 + static DEFINE_PER_CPU(unsigned long long, stolen_cycles_accounted_cpu); 92 + 93 + /* Clock source. */ 94 + static cycle_t read_real_cycles(void) 95 + { 96 + return vmi_timer_ops.get_cycle_counter(VMI_CYCLES_REAL); 97 + } 98 + 99 + static cycle_t read_available_cycles(void) 100 + { 101 + return vmi_timer_ops.get_cycle_counter(VMI_CYCLES_AVAILABLE); 102 + } 103 + 104 + #if 0 105 + static cycle_t read_stolen_cycles(void) 106 + { 107 + return vmi_timer_ops.get_cycle_counter(VMI_CYCLES_STOLEN); 108 + } 109 + #endif /* 0 */ 110 + 111 + static struct clocksource clocksource_vmi = { 112 + .name = "vmi-timer", 113 + .rating = 450, 114 + .read = read_real_cycles, 115 + .mask = CLOCKSOURCE_MASK(64), 116 + .mult = 0, /* to be set */ 117 + .shift = 22, 118 + .is_continuous = 1, 119 + }; 120 + 121 + 122 + /* Timer interrupt handler. */ 123 + static irqreturn_t vmi_timer_interrupt(int irq, void *dev_id); 124 + 125 + static struct irqaction vmi_timer_irq = { 126 + vmi_timer_interrupt, 127 + SA_INTERRUPT, 128 + CPU_MASK_NONE, 129 + "VMI-alarm", 130 + NULL, 131 + NULL 132 + }; 133 + 134 + /* Alarm rate */ 135 + static int __init vmi_timer_alarm_rate_setup(char* str) 136 + { 137 + int alarm_rate; 138 + if (get_option(&str, &alarm_rate) == 1 && alarm_rate > 0) { 139 + alarm_hz = alarm_rate; 140 + printk(KERN_WARNING "VMI timer alarm HZ set to %d\n", alarm_hz); 141 + } 142 + return 1; 143 + } 144 + __setup("vmi_timer_alarm_hz=", vmi_timer_alarm_rate_setup); 145 + 146 + 147 + /* Initialization */ 148 + static void vmi_get_wallclock_ts(struct timespec *ts) 149 + { 150 + unsigned long long wallclock; 151 + wallclock = vmi_timer_ops.get_wallclock(); // nsec units 152 + ts->tv_nsec = do_div(wallclock, 1000000000); 153 + ts->tv_sec = wallclock; 154 + } 155 + 156 + static void update_xtime_from_wallclock(void) 157 + { 158 + struct timespec ts; 159 + vmi_get_wallclock_ts(&ts); 160 + do_settimeofday(&ts); 161 + } 162 + 163 + unsigned long vmi_get_wallclock(void) 164 + { 165 + struct 
timespec ts; 166 + vmi_get_wallclock_ts(&ts); 167 + return ts.tv_sec; 168 + } 169 + 170 + int vmi_set_wallclock(unsigned long now) 171 + { 172 + return -1; 173 + } 174 + 175 + unsigned long long vmi_sched_clock(void) 176 + { 177 + return read_available_cycles(); 178 + } 179 + 180 + void __init vmi_time_init(void) 181 + { 182 + unsigned long long cycles_per_sec, cycles_per_msec; 183 + unsigned long flags; 184 + 185 + local_irq_save(flags); 186 + setup_irq(0, &vmi_timer_irq); 187 + #ifdef CONFIG_X86_LOCAL_APIC 188 + set_intr_gate(LOCAL_TIMER_VECTOR, apic_vmi_timer_interrupt); 189 + #endif 190 + 191 + no_sync_cmos_clock = 1; 192 + 193 + vmi_get_wallclock_ts(&xtime); 194 + set_normalized_timespec(&wall_to_monotonic, 195 + -xtime.tv_sec, -xtime.tv_nsec); 196 + 197 + real_cycles_accounted_system = read_real_cycles(); 198 + update_xtime_from_wallclock(); 199 + per_cpu(process_times_cycles_accounted_cpu, 0) = read_available_cycles(); 200 + 201 + cycles_per_sec = vmi_timer_ops.get_cycle_frequency(); 202 + 203 + cycles_per_jiffy = cycles_per_sec; 204 + (void)do_div(cycles_per_jiffy, HZ); 205 + cycles_per_alarm = cycles_per_sec; 206 + (void)do_div(cycles_per_alarm, alarm_hz); 207 + cycles_per_msec = cycles_per_sec; 208 + (void)do_div(cycles_per_msec, 1000); 209 + cpu_khz = cycles_per_msec; 210 + 211 + printk(KERN_WARNING "VMI timer cycles/sec = %llu ; cycles/jiffy = %llu ;" 212 + "cycles/alarm = %llu\n", cycles_per_sec, cycles_per_jiffy, 213 + cycles_per_alarm); 214 + 215 + clocksource_vmi.mult = clocksource_khz2mult(cycles_per_msec, 216 + clocksource_vmi.shift); 217 + if (clocksource_register(&clocksource_vmi)) 218 + printk(KERN_WARNING "Error registering VMITIME clocksource."); 219 + 220 + /* Disable PIT. */ 221 + outb_p(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */ 222 + 223 + /* schedule the alarm. do this in phase with process_times_cycles_accounted_cpu 224 + * reduce the latency calling update_process_times. 
*/ 225 + vmi_timer_ops.set_alarm( 226 + VMI_ALARM_WIRED_IRQ0 | VMI_ALARM_IS_PERIODIC | VMI_CYCLES_AVAILABLE, 227 + per_cpu(process_times_cycles_accounted_cpu, 0) + cycles_per_alarm, 228 + cycles_per_alarm); 229 + 230 + local_irq_restore(flags); 231 + } 232 + 233 + #ifdef CONFIG_X86_LOCAL_APIC 234 + 235 + void __init vmi_timer_setup_boot_alarm(void) 236 + { 237 + local_irq_disable(); 238 + 239 + /* Route the interrupt to the correct vector. */ 240 + apic_write_around(APIC_LVTT, LOCAL_TIMER_VECTOR); 241 + 242 + /* Cancel the IRQ0 wired alarm, and setup the LVTT alarm. */ 243 + vmi_timer_ops.cancel_alarm(VMI_CYCLES_AVAILABLE); 244 + vmi_timer_ops.set_alarm( 245 + VMI_ALARM_WIRED_LVTT | VMI_ALARM_IS_PERIODIC | VMI_CYCLES_AVAILABLE, 246 + per_cpu(process_times_cycles_accounted_cpu, 0) + cycles_per_alarm, 247 + cycles_per_alarm); 248 + local_irq_enable(); 249 + } 250 + 251 + /* Initialize the time accounting variables for an AP on an SMP system. 252 + * Also, set the local alarm for the AP. */ 253 + void __init vmi_timer_setup_secondary_alarm(void) 254 + { 255 + int cpu = smp_processor_id(); 256 + 257 + /* Route the interrupt to the correct vector. */ 258 + apic_write_around(APIC_LVTT, LOCAL_TIMER_VECTOR); 259 + 260 + per_cpu(process_times_cycles_accounted_cpu, cpu) = read_available_cycles(); 261 + 262 + vmi_timer_ops.set_alarm( 263 + VMI_ALARM_WIRED_LVTT | VMI_ALARM_IS_PERIODIC | VMI_CYCLES_AVAILABLE, 264 + per_cpu(process_times_cycles_accounted_cpu, cpu) + cycles_per_alarm, 265 + cycles_per_alarm); 266 + } 267 + 268 + #endif 269 + 270 + /* Update system wide (real) time accounting (e.g. jiffies, xtime). */ 271 + static void vmi_account_real_cycles(unsigned long long cur_real_cycles) 272 + { 273 + long long cycles_not_accounted; 274 + 275 + write_seqlock(&xtime_lock); 276 + 277 + cycles_not_accounted = cur_real_cycles - real_cycles_accounted_system; 278 + while (cycles_not_accounted >= cycles_per_jiffy) { 279 + /* systems wide jiffies and wallclock. 
*/ 280 + do_timer(1); 281 + 282 + cycles_not_accounted -= cycles_per_jiffy; 283 + real_cycles_accounted_system += cycles_per_jiffy; 284 + } 285 + 286 + if (vmi_timer_ops.wallclock_updated()) 287 + update_xtime_from_wallclock(); 288 + 289 + write_sequnlock(&xtime_lock); 290 + } 291 + 292 + /* Update per-cpu process times. */ 293 + static void vmi_account_process_times_cycles(struct pt_regs *regs, int cpu, 294 + unsigned long long cur_process_times_cycles) 295 + { 296 + long long cycles_not_accounted; 297 + cycles_not_accounted = cur_process_times_cycles - 298 + per_cpu(process_times_cycles_accounted_cpu, cpu); 299 + 300 + while (cycles_not_accounted >= cycles_per_jiffy) { 301 + /* Account time to the current process. This includes 302 + * calling into the scheduler to decrement the timeslice 303 + * and possibly reschedule.*/ 304 + update_process_times(user_mode(regs)); 305 + /* XXX handle /proc/profile multiplier. */ 306 + profile_tick(CPU_PROFILING); 307 + 308 + cycles_not_accounted -= cycles_per_jiffy; 309 + per_cpu(process_times_cycles_accounted_cpu, cpu) += cycles_per_jiffy; 310 + } 311 + } 312 + 313 + #ifdef CONFIG_NO_IDLE_HZ 314 + /* Update per-cpu idle times. Used when a no-hz halt is ended. */ 315 + static void vmi_account_no_hz_idle_cycles(int cpu, 316 + unsigned long long cur_process_times_cycles) 317 + { 318 + long long cycles_not_accounted; 319 + unsigned long no_idle_hz_jiffies = 0; 320 + 321 + cycles_not_accounted = cur_process_times_cycles - 322 + per_cpu(process_times_cycles_accounted_cpu, cpu); 323 + 324 + while (cycles_not_accounted >= cycles_per_jiffy) { 325 + no_idle_hz_jiffies++; 326 + cycles_not_accounted -= cycles_per_jiffy; 327 + per_cpu(process_times_cycles_accounted_cpu, cpu) += cycles_per_jiffy; 328 + } 329 + /* Account time to the idle process. */ 330 + account_steal_time(idle_task(cpu), jiffies_to_cputime(no_idle_hz_jiffies)); 331 + } 332 + #endif 333 + 334 + /* Update per-cpu stolen time. 
*/ 335 + static void vmi_account_stolen_cycles(int cpu, 336 + unsigned long long cur_real_cycles, 337 + unsigned long long cur_avail_cycles) 338 + { 339 + long long stolen_cycles_not_accounted; 340 + unsigned long stolen_jiffies = 0; 341 + 342 + if (cur_real_cycles < cur_avail_cycles) 343 + return; 344 + 345 + stolen_cycles_not_accounted = cur_real_cycles - cur_avail_cycles - 346 + per_cpu(stolen_cycles_accounted_cpu, cpu); 347 + 348 + while (stolen_cycles_not_accounted >= cycles_per_jiffy) { 349 + stolen_jiffies++; 350 + stolen_cycles_not_accounted -= cycles_per_jiffy; 351 + per_cpu(stolen_cycles_accounted_cpu, cpu) += cycles_per_jiffy; 352 + } 353 + /* HACK: pass NULL to force time onto cpustat->steal. */ 354 + account_steal_time(NULL, jiffies_to_cputime(stolen_jiffies)); 355 + } 356 + 357 + /* Body of either IRQ0 interrupt handler (UP no local-APIC) or 358 + * local-APIC LVTT interrupt handler (UP & local-APIC or SMP). */ 359 + static void vmi_local_timer_interrupt(int cpu) 360 + { 361 + unsigned long long cur_real_cycles, cur_process_times_cycles; 362 + 363 + cur_real_cycles = read_real_cycles(); 364 + cur_process_times_cycles = read_available_cycles(); 365 + /* Update system wide (real) time state (xtime, jiffies). */ 366 + vmi_account_real_cycles(cur_real_cycles); 367 + /* Update per-cpu process times. */ 368 + vmi_account_process_times_cycles(get_irq_regs(), cpu, cur_process_times_cycles); 369 + /* Update time stolen from this cpu by the hypervisor. */ 370 + vmi_account_stolen_cycles(cpu, cur_real_cycles, cur_process_times_cycles); 371 + } 372 + 373 + #ifdef CONFIG_NO_IDLE_HZ 374 + 375 + /* Must be called only from idle loop, with interrupts disabled. */ 376 + int vmi_stop_hz_timer(void) 377 + { 378 + /* Note that cpu_set, cpu_clear are (SMP safe) atomic on x86. 
*/ 379 + 380 + unsigned long seq, next; 381 + unsigned long long real_cycles_expiry; 382 + int cpu = smp_processor_id(); 383 + int idle; 384 + 385 + BUG_ON(!irqs_disabled()); 386 + if (sysctl_hz_timer != 0) 387 + return 0; 388 + 389 + cpu_set(cpu, nohz_cpu_mask); 390 + smp_mb(); 391 + if (rcu_needs_cpu(cpu) || local_softirq_pending() || 392 + (next = next_timer_interrupt(), time_before_eq(next, jiffies))) { 393 + cpu_clear(cpu, nohz_cpu_mask); 394 + next = jiffies; 395 + idle = 0; 396 + } else 397 + idle = 1; 398 + 399 + /* Convert jiffies to the real cycle counter. */ 400 + do { 401 + seq = read_seqbegin(&xtime_lock); 402 + real_cycles_expiry = real_cycles_accounted_system + 403 + (long)(next - jiffies) * cycles_per_jiffy; 404 + } while (read_seqretry(&xtime_lock, seq)); 405 + 406 + /* This cpu is going idle. Disable the periodic alarm. */ 407 + if (idle) { 408 + vmi_timer_ops.cancel_alarm(VMI_CYCLES_AVAILABLE); 409 + per_cpu(idle_start_jiffies, cpu) = jiffies; 410 + } 411 + 412 + /* Set the real time alarm to expire at the next event. */ 413 + vmi_timer_ops.set_alarm( 414 + VMI_ALARM_WIRING | VMI_ALARM_IS_ONESHOT | VMI_CYCLES_REAL, 415 + real_cycles_expiry, 0); 416 + 417 + return idle; 418 + } 419 + 420 + static void vmi_reenable_hz_timer(int cpu) 421 + { 422 + /* For /proc/vmi/info idle_hz stat. */ 423 + per_cpu(vmi_idle_no_hz_jiffies, cpu) += jiffies - per_cpu(idle_start_jiffies, cpu); 424 + per_cpu(vmi_idle_no_hz_irqs, cpu)++; 425 + 426 + /* Don't bother explicitly cancelling the one-shot alarm -- at 427 + * worse we will receive a spurious timer interrupt. */ 428 + vmi_timer_ops.set_alarm( 429 + VMI_ALARM_WIRING | VMI_ALARM_IS_PERIODIC | VMI_CYCLES_AVAILABLE, 430 + per_cpu(process_times_cycles_accounted_cpu, cpu) + cycles_per_alarm, 431 + cycles_per_alarm); 432 + /* Indicate this cpu is no longer nohz idle. */ 433 + cpu_clear(cpu, nohz_cpu_mask); 434 + } 435 + 436 + /* Called from interrupt handlers when (local) HZ timer is disabled. 
*/ 437 + void vmi_account_time_restart_hz_timer(void) 438 + { 439 + unsigned long long cur_real_cycles, cur_process_times_cycles; 440 + int cpu = smp_processor_id(); 441 + 442 + BUG_ON(!irqs_disabled()); 443 + /* Account the time during which the HZ timer was disabled. */ 444 + cur_real_cycles = read_real_cycles(); 445 + cur_process_times_cycles = read_available_cycles(); 446 + /* Update system wide (real) time state (xtime, jiffies). */ 447 + vmi_account_real_cycles(cur_real_cycles); 448 + /* Update per-cpu idle times. */ 449 + vmi_account_no_hz_idle_cycles(cpu, cur_process_times_cycles); 450 + /* Update time stolen from this cpu by the hypervisor. */ 451 + vmi_account_stolen_cycles(cpu, cur_real_cycles, cur_process_times_cycles); 452 + /* Reenable the hz timer. */ 453 + vmi_reenable_hz_timer(cpu); 454 + } 455 + 456 + #endif /* CONFIG_NO_IDLE_HZ */ 457 + 458 + /* UP (and no local-APIC) VMI-timer alarm interrupt handler. 459 + * Handler for IRQ0. Not used when SMP or X86_LOCAL_APIC after 460 + * APIC setup and setup_boot_vmi_alarm() is called. */ 461 + static irqreturn_t vmi_timer_interrupt(int irq, void *dev_id) 462 + { 463 + vmi_local_timer_interrupt(smp_processor_id()); 464 + return IRQ_HANDLED; 465 + } 466 + 467 + #ifdef CONFIG_X86_LOCAL_APIC 468 + 469 + /* SMP VMI-timer alarm interrupt handler. Handler for LVTT vector. 470 + * Also used in UP when CONFIG_X86_LOCAL_APIC. 471 + * The wrapper code is from arch/i386/kernel/apic.c#smp_apic_timer_interrupt. */ 472 + void smp_apic_vmi_timer_interrupt(struct pt_regs *regs) 473 + { 474 + struct pt_regs *old_regs = set_irq_regs(regs); 475 + int cpu = smp_processor_id(); 476 + 477 + /* 478 + * the NMI deadlock-detector uses this. 479 + */ 480 + per_cpu(irq_stat,cpu).apic_timer_irqs++; 481 + 482 + /* 483 + * NOTE! We'd better ACK the irq immediately, 484 + * because timer handling can be slow. 485 + */ 486 + ack_APIC_irq(); 487 + 488 + /* 489 + * update_process_times() expects us to have done irq_enter(). 
490 + * Besides, if we don't, timer interrupts ignore the global 491 + * interrupt lock, which is the WrongThing (tm) to do. 492 + */ 493 + irq_enter(); 494 + vmi_local_timer_interrupt(cpu); 495 + irq_exit(); 496 + set_irq_regs(old_regs); 497 + } 498 + 499 + #endif /* CONFIG_X86_LOCAL_APIC */
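The three accounting helpers in vmitime.c above (real, process-times, and stolen cycles) share one loop: subtract an "already accounted" total from the current cycle counter, credit whole jiffies, and carry the sub-jiffy remainder forward. A minimal userspace sketch of that loop, assuming an illustrative 1 GHz clock and HZ=1000 (neither value comes from the patch):

```c
#include <assert.h>

/* Assumed rate for illustration only: 1 GHz clock, HZ = 1000. */
static unsigned long long cycles_per_jiffy = 1000000ULL;
static unsigned long long accounted;	/* cycles already credited */

/* Credit whole jiffies for the cycles elapsed since the last call;
 * the sub-jiffy remainder stays pending, exactly as in the patch. */
static unsigned long account_cycles(unsigned long long cur_cycles)
{
	unsigned long ticks = 0;
	long long not_accounted = (long long)(cur_cycles - accounted);

	while (not_accounted >= (long long)cycles_per_jiffy) {
		ticks++;			/* do_timer(1) in the kernel */
		not_accounted -= cycles_per_jiffy;
		accounted += cycles_per_jiffy;
	}
	return ticks;
}
```

With these assumed rates, counter readings of 2,500,000 then 3,000,000 credit 2 ticks then 1 tick, the 500,000-cycle remainder being carried between calls.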
+6 -1
arch/i386/kernel/vmlinux.lds.S
··· 37 37 { 38 38 . = LOAD_OFFSET + LOAD_PHYSICAL_ADDR; 39 39 phys_startup_32 = startup_32 - LOAD_OFFSET; 40 + 41 + .text.head : AT(ADDR(.text.head) - LOAD_OFFSET) { 42 + _text = .; /* Text and read-only data */ 43 + *(.text.head) 44 + } :text = 0x9090 45 + 40 46 /* read-only */ 41 47 .text : AT(ADDR(.text) - LOAD_OFFSET) { 42 - _text = .; /* Text and read-only data */ 43 48 *(.text) 44 49 SCHED_TEXT 45 50 LOCK_TEXT
+5 -9
arch/i386/math-emu/get_address.c
··· 56 56 #define VM86_REG_(x) (*(unsigned short *) \ 57 57 (reg_offset_vm86[((unsigned)x)]+(u_char *) FPU_info)) 58 58 59 - /* These are dummy, fs and gs are not saved on the stack. */ 60 - #define ___FS ___ds 59 + /* This dummy, gs is not saved on the stack. */ 61 60 #define ___GS ___ds 62 61 63 62 static int reg_offset_pm[] = { 64 63 offsetof(struct info,___cs), 65 64 offsetof(struct info,___ds), 66 65 offsetof(struct info,___es), 67 - offsetof(struct info,___FS), 66 + offsetof(struct info,___fs), 68 67 offsetof(struct info,___GS), 69 68 offsetof(struct info,___ss), 70 69 offsetof(struct info,___ds) ··· 168 169 169 170 switch ( segment ) 170 171 { 171 - /* fs and gs aren't used by the kernel, so they still have their 172 - user-space values. */ 173 - case PREFIX_FS_-1: 174 - /* N.B. - movl %seg, mem is a 2 byte write regardless of prefix */ 175 - savesegment(fs, addr->selector); 176 - break; 172 + /* gs isn't used by the kernel, so it still has its 173 + user-space value. */ 177 174 case PREFIX_GS_-1: 175 + /* N.B. - movl %seg, mem is a 2 byte write regardless of prefix */ 178 176 savesegment(gs, addr->selector); 179 177 break; 180 178 default:
+5 -3
arch/i386/math-emu/status_w.h
··· 48 48 49 49 #define status_word() \ 50 50 ((partial_status & ~SW_Top & 0xffff) | ((top << SW_Top_Shift) & SW_Top)) 51 - #define setcc(cc) ({ \ 52 - partial_status &= ~(SW_C0|SW_C1|SW_C2|SW_C3); \ 53 - partial_status |= (cc) & (SW_C0|SW_C1|SW_C2|SW_C3); }) 51 + static inline void setcc(int cc) 52 + { 53 + partial_status &= ~(SW_C0|SW_C1|SW_C2|SW_C3); 54 + partial_status |= (cc) & (SW_C0|SW_C1|SW_C2|SW_C3); 55 + } 54 56 55 57 #ifdef PECULIAR_486 56 58 /* Default, this conveys no information, but an 80486 does it. */
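The status_w.h change above replaces a GNU statement-expression macro with a static inline of identical behavior, gaining type checking and guaranteed single evaluation of the argument. A condensed, compilable illustration with the condition-code mask simplified to 0x0F (the real code uses SW_C0|SW_C1|SW_C2|SW_C3):

```c
#include <assert.h>

/* Before the patch, setcc() was a GNU statement-expression macro:
 *   #define setcc(cc) ({ partial_status &= ~MASK; partial_status |= (cc) & MASK; })
 * After the patch it is a static inline with the same effect: */
static int partial_status;

static inline void setcc(int cc)
{
	partial_status &= ~0x0F;	/* clear the condition-code bits */
	partial_status |= cc & 0x0F;	/* set the requested ones */
}
```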
-1
arch/i386/mm/discontig.c
··· 101 101 extern void add_one_highpage_init(struct page *, int, int); 102 102 103 103 extern struct e820map e820; 104 - extern unsigned long init_pg_tables_end; 105 104 extern unsigned long highend_pfn, highstart_pfn; 106 105 extern unsigned long max_low_pfn; 107 106 extern unsigned long totalram_pages;
+8 -10
arch/i386/mm/fault.c
··· 46 46 } 47 47 EXPORT_SYMBOL_GPL(unregister_page_fault_notifier); 48 48 49 - static inline int notify_page_fault(enum die_val val, const char *str, 50 - struct pt_regs *regs, long err, int trap, int sig) 49 + static inline int notify_page_fault(struct pt_regs *regs, long err) 51 50 { 52 51 struct die_args args = { 53 52 .regs = regs, 54 - .str = str, 53 + .str = "page fault", 55 54 .err = err, 56 - .trapnr = trap, 57 - .signr = sig 55 + .trapnr = 14, 56 + .signr = SIGSEGV 58 57 }; 59 - return atomic_notifier_call_chain(&notify_page_fault_chain, val, &args); 58 + return atomic_notifier_call_chain(&notify_page_fault_chain, 59 + DIE_PAGE_FAULT, &args); 60 60 } 61 61 62 62 /* ··· 327 327 if (unlikely(address >= TASK_SIZE)) { 328 328 if (!(error_code & 0x0000000d) && vmalloc_fault(address) >= 0) 329 329 return; 330 - if (notify_page_fault(DIE_PAGE_FAULT, "page fault", regs, error_code, 14, 331 - SIGSEGV) == NOTIFY_STOP) 330 + if (notify_page_fault(regs, error_code) == NOTIFY_STOP) 332 331 return; 333 332 /* 334 333 * Don't take the mm semaphore here. If we fixup a prefetch ··· 336 337 goto bad_area_nosemaphore; 337 338 } 338 339 339 - if (notify_page_fault(DIE_PAGE_FAULT, "page fault", regs, error_code, 14, 340 - SIGSEGV) == NOTIFY_STOP) 340 + if (notify_page_fault(regs, error_code) == NOTIFY_STOP) 341 341 return; 342 342 343 343 /* It's safe to allow irq's after cr2 has been saved and the vmalloc
+4
arch/i386/mm/init.c
··· 62 62 63 63 #ifdef CONFIG_X86_PAE 64 64 pmd_table = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE); 65 + paravirt_alloc_pd(__pa(pmd_table) >> PAGE_SHIFT); 65 66 set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT)); 66 67 pud = pud_offset(pgd, 0); 67 68 if (pmd_table != pmd_offset(pud, 0)) ··· 83 82 { 84 83 if (pmd_none(*pmd)) { 85 84 pte_t *page_table = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE); 85 + paravirt_alloc_pt(__pa(page_table) >> PAGE_SHIFT); 86 86 set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE)); 87 87 if (page_table != pte_offset_kernel(pmd, 0)) 88 88 BUG(); ··· 347 345 /* Init entries of the first-level page table to the zero page */ 348 346 for (i = 0; i < PTRS_PER_PGD; i++) 349 347 set_pgd(pgd_base + i, __pgd(__pa(empty_zero_page) | _PAGE_PRESENT)); 348 + #else 349 + paravirt_alloc_pd(__pa(swapper_pg_dir) >> PAGE_SHIFT); 350 350 #endif 351 351 352 352 /* Enable PSE if available */
+2
arch/i386/mm/pageattr.c
··· 60 60 address = __pa(address); 61 61 addr = address & LARGE_PAGE_MASK; 62 62 pbase = (pte_t *)page_address(base); 63 + paravirt_alloc_pt(page_to_pfn(base)); 63 64 for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) { 64 65 set_pte(&pbase[i], pfn_pte(addr >> PAGE_SHIFT, 65 66 addr == address ? prot : ref_prot)); ··· 173 172 if (!PageReserved(kpte_page)) { 174 173 if (cpu_has_pse && (page_private(kpte_page) == 0)) { 175 174 ClearPagePrivate(kpte_page); 175 + paravirt_release_pt(page_to_pfn(kpte_page)); 176 176 list_add(&kpte_page->lru, &df_list); 177 177 revert_page(kpte_page, address); 178 178 }
+22 -4
arch/i386/mm/pgtable.c
··· 171 171 void reserve_top_address(unsigned long reserve) 172 172 { 173 173 BUG_ON(fixmaps > 0); 174 + printk(KERN_INFO "Reserving virtual address space above 0x%08x\n", 175 + (int)-reserve); 174 176 #ifdef CONFIG_COMPAT_VDSO 175 177 BUG_ON(reserve != 0); 176 178 #else ··· 250 248 clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD, 251 249 swapper_pg_dir + USER_PTRS_PER_PGD, 252 250 KERNEL_PGD_PTRS); 251 + 253 252 if (PTRS_PER_PMD > 1) 254 253 return; 254 + 255 + /* must happen under lock */ 256 + paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT, 257 + __pa(swapper_pg_dir) >> PAGE_SHIFT, 258 + USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD); 255 259 256 260 pgd_list_add(pgd); 257 261 spin_unlock_irqrestore(&pgd_lock, flags); ··· 268 260 { 269 261 unsigned long flags; /* can be called from interrupt context */ 270 262 263 + paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT); 271 264 spin_lock_irqsave(&pgd_lock, flags); 272 265 pgd_list_del(pgd); 273 266 spin_unlock_irqrestore(&pgd_lock, flags); ··· 286 277 pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL); 287 278 if (!pmd) 288 279 goto out_oom; 280 + paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT); 289 281 set_pgd(&pgd[i], __pgd(1 + __pa(pmd))); 290 282 } 291 283 return pgd; 292 284 293 285 out_oom: 294 - for (i--; i >= 0; i--) 295 - kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1)); 286 + for (i--; i >= 0; i--) { 287 + pgd_t pgdent = pgd[i]; 288 + void* pmd = (void *)__va(pgd_val(pgdent)-1); 289 + paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT); 290 + kmem_cache_free(pmd_cache, pmd); 291 + } 296 292 kmem_cache_free(pgd_cache, pgd); 297 293 return NULL; 298 294 } ··· 308 294 309 295 /* in the PAE case user pgd entries are overwritten before usage */ 310 296 if (PTRS_PER_PMD > 1) 311 - for (i = 0; i < USER_PTRS_PER_PGD; ++i) 312 - kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1)); 297 + for (i = 0; i < USER_PTRS_PER_PGD; ++i) { 298 + pgd_t pgdent = pgd[i]; 299 + void* pmd = (void 
*)__va(pgd_val(pgdent)-1); 300 + paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT); 301 + kmem_cache_free(pmd_cache, pmd); 302 + } 313 303 /* in the non-PAE case, free_pgtables() clears user pgd entries */ 314 304 kmem_cache_free(pgd_cache, pgd); 315 305 }
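pgd_alloc()/pgd_free() above encode a pmd in a PAE pgd entry as __pa(pmd) | _PAGE_PRESENT (the `1 +` in set_pgd) and recover it with __va(pgd_val(...) - 1). A toy round-trip of that encoding, with the __pa/__va translation elided (identity mapping assumed purely for illustration):

```c
#include <assert.h>

typedef unsigned long pgdval_t;
#define _PAGE_PRESENT 1UL

/* Store a page-aligned pmd physical address with the present bit set,
 * mirroring set_pgd(&pgd[i], __pgd(1 + __pa(pmd))). */
static pgdval_t make_pgd_entry(unsigned long pmd_phys)
{
	return pmd_phys | _PAGE_PRESENT;
}

/* Recover the pmd address on free, mirroring __va(pgd_val(pgd[i]) - 1). */
static unsigned long pgd_entry_to_pmd(pgdval_t entry)
{
	return entry - _PAGE_PRESENT;
}
```

Because the pmd is page aligned, its low bits are free to hold flags, which is why adding and subtracting _PAGE_PRESENT round-trips exactly.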
+5 -4
arch/i386/oprofile/op_model_ppro.c
··· 24 24 25 25 #define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0) 26 26 #define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0) 27 - #define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(u32)(l), -1);} while (0) 27 + #define CTR_32BIT_WRITE(l,msrs,c) \ 28 + do {wrmsr(msrs->counters[(c)].addr, -(u32)(l), 0);} while (0) 28 29 #define CTR_OVERFLOWED(n) (!((n) & (1U<<31))) 29 30 30 31 #define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0) ··· 80 79 for (i = 0; i < NUM_COUNTERS; ++i) { 81 80 if (unlikely(!CTR_IS_RESERVED(msrs,i))) 82 81 continue; 83 - CTR_WRITE(1, msrs, i); 82 + CTR_32BIT_WRITE(1, msrs, i); 84 83 } 85 84 86 85 /* enable active counters */ ··· 88 87 if ((counter_config[i].enabled) && (CTR_IS_RESERVED(msrs,i))) { 89 88 reset_value[i] = counter_config[i].count; 90 89 91 - CTR_WRITE(counter_config[i].count, msrs, i); 90 + CTR_32BIT_WRITE(counter_config[i].count, msrs, i); 92 91 93 92 CTRL_READ(low, high, msrs, i); 94 93 CTRL_CLEAR(low); ··· 117 116 CTR_READ(low, high, msrs, i); 118 117 if (CTR_OVERFLOWED(low)) { 119 118 oprofile_add_sample(regs, i); 120 - CTR_WRITE(reset_value[i], msrs, i); 119 + CTR_32BIT_WRITE(reset_value[i], msrs, i); 121 120 } 122 121 } 123 122
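The CTR_32BIT_WRITE rename above reflects how these P6-family counters are used: the counter is loaded with -(count) so it counts up toward zero, and CTR_OVERFLOWED() reports overflow once bit 31 clears. A hedged sketch of that arithmetic (standalone helpers, not kernel code):

```c
#include <assert.h>

/* What CTR_32BIT_WRITE programs into the low 32 bits: -(count),
 * i.e. the two's complement of the sample period. */
static unsigned int ctr_reset_value(unsigned int count)
{
	return (unsigned int)-count;
}

/* CTR_OVERFLOWED(): once bit 31 clears, 'count' events have elapsed. */
static int ctr_overflowed(unsigned int low)
{
	return !(low & (1U << 31));
}
```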
+1 -1
arch/i386/pci/Makefile
··· 1 1 obj-y := i386.o init.o 2 2 3 3 obj-$(CONFIG_PCI_BIOS) += pcbios.o 4 - obj-$(CONFIG_PCI_MMCONFIG) += mmconfig.o direct.o 4 + obj-$(CONFIG_PCI_MMCONFIG) += mmconfig.o direct.o mmconfig-shared.o 5 5 obj-$(CONFIG_PCI_DIRECT) += direct.o 6 6 7 7 pci-y := fixup.o
+264
arch/i386/pci/mmconfig-shared.c
··· 1 + /* 2 + * mmconfig-shared.c - Low-level direct PCI config space access via 3 + * MMCONFIG - common code between i386 and x86-64. 4 + * 5 + * This code does: 6 + * - known chipset handling 7 + * - ACPI decoding and validation 8 + * 9 + * Per-architecture code takes care of the mappings and accesses 10 + * themselves. 11 + */ 12 + 13 + #include <linux/pci.h> 14 + #include <linux/init.h> 15 + #include <linux/acpi.h> 16 + #include <linux/bitmap.h> 17 + #include <asm/e820.h> 18 + 19 + #include "pci.h" 20 + 21 + /* aperture is up to 256MB but BIOS may reserve less */ 22 + #define MMCONFIG_APER_MIN (2 * 1024*1024) 23 + #define MMCONFIG_APER_MAX (256 * 1024*1024) 24 + 25 + DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS); 26 + 27 + /* K8 systems have some devices (typically in the builtin northbridge) 28 + that are only accessible using type1 29 + Normally this can be expressed in the MCFG by not listing them 30 + and assigning suitable _SEGs, but this isn't implemented in some BIOS. 31 + Instead try to discover all devices on bus 0 that are unreachable using MM 32 + and fallback for them. */ 33 + static void __init unreachable_devices(void) 34 + { 35 + int i, bus; 36 + /* Use the max bus number from ACPI here? 
*/ 37 + for (bus = 0; bus < PCI_MMCFG_MAX_CHECK_BUS; bus++) { 38 + for (i = 0; i < 32; i++) { 39 + unsigned int devfn = PCI_DEVFN(i, 0); 40 + u32 val1, val2; 41 + 42 + pci_conf1_read(0, bus, devfn, 0, 4, &val1); 43 + if (val1 == 0xffffffff) 44 + continue; 45 + 46 + if (pci_mmcfg_arch_reachable(0, bus, devfn)) { 47 + raw_pci_ops->read(0, bus, devfn, 0, 4, &val2); 48 + if (val1 == val2) 49 + continue; 50 + } 51 + set_bit(i + 32 * bus, pci_mmcfg_fallback_slots); 52 + printk(KERN_NOTICE "PCI: No mmconfig possible on device" 53 + " %02x:%02x\n", bus, i); 54 + } 55 + } 56 + } 57 + 58 + static const char __init *pci_mmcfg_e7520(void) 59 + { 60 + u32 win; 61 + pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win); 62 + 63 + pci_mmcfg_config_num = 1; 64 + pci_mmcfg_config = kzalloc(sizeof(pci_mmcfg_config[0]), GFP_KERNEL); 65 + if (!pci_mmcfg_config) 66 + return NULL; 67 + pci_mmcfg_config[0].address = (win & 0xf000) << 16; 68 + pci_mmcfg_config[0].pci_segment = 0; 69 + pci_mmcfg_config[0].start_bus_number = 0; 70 + pci_mmcfg_config[0].end_bus_number = 255; 71 + 72 + return "Intel Corporation E7520 Memory Controller Hub"; 73 + } 74 + 75 + static const char __init *pci_mmcfg_intel_945(void) 76 + { 77 + u32 pciexbar, mask = 0, len = 0; 78 + 79 + pci_mmcfg_config_num = 1; 80 + 81 + pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar); 82 + 83 + /* Enable bit */ 84 + if (!(pciexbar & 1)) 85 + pci_mmcfg_config_num = 0; 86 + 87 + /* Size bits */ 88 + switch ((pciexbar >> 1) & 3) { 89 + case 0: 90 + mask = 0xf0000000U; 91 + len = 0x10000000U; 92 + break; 93 + case 1: 94 + mask = 0xf8000000U; 95 + len = 0x08000000U; 96 + break; 97 + case 2: 98 + mask = 0xfc000000U; 99 + len = 0x04000000U; 100 + break; 101 + default: 102 + pci_mmcfg_config_num = 0; 103 + } 104 + 105 + /* Errata #2, things break when not aligned on a 256Mb boundary */ 106 + /* Can only happen in 64M/128M mode */ 107 + 108 + if ((pciexbar & mask) & 0x0fffffffU) 109 + pci_mmcfg_config_num = 0; 110 + 111 + if 
(pci_mmcfg_config_num) { 112 + pci_mmcfg_config = kzalloc(sizeof(pci_mmcfg_config[0]), GFP_KERNEL); 113 + if (!pci_mmcfg_config) 114 + return NULL; 115 + pci_mmcfg_config[0].address = pciexbar & mask; 116 + pci_mmcfg_config[0].pci_segment = 0; 117 + pci_mmcfg_config[0].start_bus_number = 0; 118 + pci_mmcfg_config[0].end_bus_number = (len >> 20) - 1; 119 + } 120 + 121 + return "Intel Corporation 945G/GZ/P/PL Express Memory Controller Hub"; 122 + } 123 + 124 + struct pci_mmcfg_hostbridge_probe { 125 + u32 vendor; 126 + u32 device; 127 + const char *(*probe)(void); 128 + }; 129 + 130 + static struct pci_mmcfg_hostbridge_probe pci_mmcfg_probes[] __initdata = { 131 + { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7520_MCH, pci_mmcfg_e7520 }, 132 + { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82945G_HB, pci_mmcfg_intel_945 }, 133 + }; 134 + 135 + static int __init pci_mmcfg_check_hostbridge(void) 136 + { 137 + u32 l; 138 + u16 vendor, device; 139 + int i; 140 + const char *name; 141 + 142 + pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0, 4, &l); 143 + vendor = l & 0xffff; 144 + device = (l >> 16) & 0xffff; 145 + 146 + pci_mmcfg_config_num = 0; 147 + pci_mmcfg_config = NULL; 148 + name = NULL; 149 + 150 + for (i = 0; !name && i < ARRAY_SIZE(pci_mmcfg_probes); i++) { 151 + if (pci_mmcfg_probes[i].vendor == vendor && 152 + pci_mmcfg_probes[i].device == device) 153 + name = pci_mmcfg_probes[i].probe(); 154 + } 155 + 156 + if (name) { 157 + printk(KERN_INFO "PCI: Found %s %s MMCONFIG support.\n", 158 + name, pci_mmcfg_config_num ? 
"with" : "without"); 159 + } 160 + 161 + return name != NULL; 162 + } 163 + 164 + static void __init pci_mmcfg_insert_resources(void) 165 + { 166 + #define PCI_MMCFG_RESOURCE_NAME_LEN 19 167 + int i; 168 + struct resource *res; 169 + char *names; 170 + unsigned num_buses; 171 + 172 + res = kcalloc(PCI_MMCFG_RESOURCE_NAME_LEN + sizeof(*res), 173 + pci_mmcfg_config_num, GFP_KERNEL); 174 + if (!res) { 175 + printk(KERN_ERR "PCI: Unable to allocate MMCONFIG resources\n"); 176 + return; 177 + } 178 + 179 + names = (void *)&res[pci_mmcfg_config_num]; 180 + for (i = 0; i < pci_mmcfg_config_num; i++, res++) { 181 + struct acpi_mcfg_allocation *cfg = &pci_mmcfg_config[i]; 182 + num_buses = cfg->end_bus_number - cfg->start_bus_number + 1; 183 + res->name = names; 184 + snprintf(names, PCI_MMCFG_RESOURCE_NAME_LEN, "PCI MMCONFIG %u", 185 + cfg->pci_segment); 186 + res->start = cfg->address; 187 + res->end = res->start + (num_buses << 20) - 1; 188 + res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; 189 + insert_resource(&iomem_resource, res); 190 + names += PCI_MMCFG_RESOURCE_NAME_LEN; 191 + } 192 + } 193 + 194 + static void __init pci_mmcfg_reject_broken(int type) 195 + { 196 + typeof(pci_mmcfg_config[0]) *cfg; 197 + 198 + if ((pci_mmcfg_config_num == 0) || 199 + (pci_mmcfg_config == NULL) || 200 + (pci_mmcfg_config[0].address == 0)) 201 + return; 202 + 203 + cfg = &pci_mmcfg_config[0]; 204 + 205 + /* 206 + * Handle more broken MCFG tables on Asus etc. 207 + * They only contain a single entry for bus 0-0. 208 + */ 209 + if (pci_mmcfg_config_num == 1 && 210 + cfg->pci_segment == 0 && 211 + (cfg->start_bus_number | cfg->end_bus_number) == 0) { 212 + printk(KERN_ERR "PCI: start and end of bus number is 0. " 213 + "Rejected as broken MCFG.\n"); 214 + goto reject; 215 + } 216 + 217 + /* 218 + * Only do this check when type 1 works. 
If it doesn't work 219 + * assume we run on a Mac and always use MCFG 220 + */ 221 + if (type == 1 && !e820_all_mapped(cfg->address, 222 + cfg->address + MMCONFIG_APER_MIN, 223 + E820_RESERVED)) { 224 + printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not" 225 + " E820-reserved\n", cfg->address); 226 + goto reject; 227 + } 228 + return; 229 + 230 + reject: 231 + printk(KERN_ERR "PCI: Not using MMCONFIG.\n"); 232 + kfree(pci_mmcfg_config); 233 + pci_mmcfg_config = NULL; 234 + pci_mmcfg_config_num = 0; 235 + } 236 + 237 + void __init pci_mmcfg_init(int type) 238 + { 239 + int known_bridge = 0; 240 + 241 + if ((pci_probe & PCI_PROBE_MMCONF) == 0) 242 + return; 243 + 244 + if (type == 1 && pci_mmcfg_check_hostbridge()) 245 + known_bridge = 1; 246 + 247 + if (!known_bridge) { 248 + acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg); 249 + pci_mmcfg_reject_broken(type); 250 + } 251 + 252 + if ((pci_mmcfg_config_num == 0) || 253 + (pci_mmcfg_config == NULL) || 254 + (pci_mmcfg_config[0].address == 0)) 255 + return; 256 + 257 + if (pci_mmcfg_arch_init()) { 258 + if (type == 1) 259 + unreachable_devices(); 260 + if (known_bridge) 261 + pci_mmcfg_insert_resources(); 262 + pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF; 263 + } 264 + }
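The new `pci_mmcfg_check_hostbridge()` above dispatches through a small vendor/device probe table rather than an if/else chain. A minimal user-space sketch of that pattern follows; the vendor/device IDs and chipset names here are made-up placeholders, not the kernel's `PCI_DEVICE_ID_*` values:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hostbridge_probe {
    unsigned vendor;
    unsigned device;
    const char *(*probe)(void);
};

static const char *probe_chip_a(void) { return "Chip A"; }
static const char *probe_chip_b(void) { return "Chip B"; }

/* Placeholder table; the real one lists Intel E7520 and 945G bridges. */
static const struct hostbridge_probe probes[] = {
    { 0x8086, 0x3590, probe_chip_a },
    { 0x8086, 0x2770, probe_chip_b },
};

/* Scan the table until one probe claims the device, the same loop
 * shape pci_mmcfg_check_hostbridge() runs on (bus 0, devfn 0). */
static const char *check_hostbridge(unsigned vendor, unsigned device)
{
    const char *name = NULL;
    size_t i;

    for (i = 0; !name && i < sizeof(probes) / sizeof(probes[0]); i++)
        if (probes[i].vendor == vendor && probes[i].device == device)
            name = probes[i].probe();
    return name;
}
```

Adding support for another known-good bridge is then a one-line table entry plus a probe function, which is why the patch structures it this way.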
+11 -85
arch/i386/pci/mmconfig.c
··· 15 15 #include <asm/e820.h> 16 16 #include "pci.h" 17 17 18 - /* aperture is up to 256MB but BIOS may reserve less */ 19 - #define MMCONFIG_APER_MIN (2 * 1024*1024) 20 - #define MMCONFIG_APER_MAX (256 * 1024*1024) 21 - 22 18 /* Assume systems with more busses have correct MCFG */ 23 - #define MAX_CHECK_BUS 16 24 - 25 19 #define mmcfg_virt_addr ((void __iomem *) fix_to_virt(FIX_PCIE_MCFG)) 26 20 27 21 /* The base address of the last MMCONFIG device accessed */ 28 22 static u32 mmcfg_last_accessed_device; 29 23 static int mmcfg_last_accessed_cpu; 30 24 31 - static DECLARE_BITMAP(fallback_slots, MAX_CHECK_BUS*32); 32 - 33 25 /* 34 26 * Functions for accessing PCI configuration space with MMCONFIG accesses 35 27 */ 36 28 static u32 get_base_addr(unsigned int seg, int bus, unsigned devfn) 37 29 { 38 - int cfg_num = -1; 39 30 struct acpi_mcfg_allocation *cfg; 31 + int cfg_num; 40 32 41 - if (seg == 0 && bus < MAX_CHECK_BUS && 42 - test_bit(PCI_SLOT(devfn) + 32*bus, fallback_slots)) 33 + if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS && 34 + test_bit(PCI_SLOT(devfn) + 32*bus, pci_mmcfg_fallback_slots)) 43 35 return 0; 44 36 45 - while (1) { 46 - ++cfg_num; 47 - if (cfg_num >= pci_mmcfg_config_num) { 48 - break; 49 - } 37 + for (cfg_num = 0; cfg_num < pci_mmcfg_config_num; cfg_num++) { 50 38 cfg = &pci_mmcfg_config[cfg_num]; 51 - if (cfg->pci_segment != seg) 52 - continue; 53 - if ((cfg->start_bus_number <= bus) && 39 + if (cfg->pci_segment == seg && 40 + (cfg->start_bus_number <= bus) && 54 41 (cfg->end_bus_number >= bus)) 55 42 return cfg->address; 56 43 } 57 - 58 - /* Handle more broken MCFG tables on Asus etc. 59 - They only contain a single entry for bus 0-0. Assume 60 - this applies to all busses. 
*/ 61 - cfg = &pci_mmcfg_config[0]; 62 - if (pci_mmcfg_config_num == 1 && 63 - cfg->pci_segment == 0 && 64 - (cfg->start_bus_number | cfg->end_bus_number) == 0) 65 - return cfg->address; 66 44 67 45 /* Fall back to type 0 */ 68 46 return 0; ··· 136 158 .write = pci_mmcfg_write, 137 159 }; 138 160 139 - /* K8 systems have some devices (typically in the builtin northbridge) 140 - that are only accessible using type1 141 - Normally this can be expressed in the MCFG by not listing them 142 - and assigning suitable _SEGs, but this isn't implemented in some BIOS. 143 - Instead try to discover all devices on bus 0 that are unreachable using MM 144 - and fallback for them. */ 145 - static __init void unreachable_devices(void) 161 + int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus, 162 + unsigned int devfn) 146 163 { 147 - int i, k; 148 - unsigned long flags; 149 - 150 - for (k = 0; k < MAX_CHECK_BUS; k++) { 151 - for (i = 0; i < 32; i++) { 152 - u32 val1; 153 - u32 addr; 154 - 155 - pci_conf1_read(0, k, PCI_DEVFN(i, 0), 0, 4, &val1); 156 - if (val1 == 0xffffffff) 157 - continue; 158 - 159 - /* Locking probably not needed, but safer */ 160 - spin_lock_irqsave(&pci_config_lock, flags); 161 - addr = get_base_addr(0, k, PCI_DEVFN(i, 0)); 162 - if (addr != 0) 163 - pci_exp_set_dev_base(addr, k, PCI_DEVFN(i, 0)); 164 - if (addr == 0 || 165 - readl((u32 __iomem *)mmcfg_virt_addr) != val1) { 166 - set_bit(i + 32*k, fallback_slots); 167 - printk(KERN_NOTICE 168 - "PCI: No mmconfig possible on %x:%x\n", k, i); 169 - } 170 - spin_unlock_irqrestore(&pci_config_lock, flags); 171 - } 172 - } 164 + return get_base_addr(seg, bus, devfn) != 0; 173 165 } 174 166 175 - void __init pci_mmcfg_init(int type) 167 + int __init pci_mmcfg_arch_init(void) 176 168 { 177 - if ((pci_probe & PCI_PROBE_MMCONF) == 0) 178 - return; 179 - 180 - acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg); 181 - if ((pci_mmcfg_config_num == 0) || 182 - (pci_mmcfg_config == NULL) || 183 - 
(pci_mmcfg_config[0].address == 0)) 184 - return; 185 - 186 - /* Only do this check when type 1 works. If it doesn't work 187 - assume we run on a Mac and always use MCFG */ 188 - if (type == 1 && !e820_all_mapped(pci_mmcfg_config[0].address, 189 - pci_mmcfg_config[0].address + MMCONFIG_APER_MIN, 190 - E820_RESERVED)) { 191 - printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %lx is not E820-reserved\n", 192 - (unsigned long)pci_mmcfg_config[0].address); 193 - printk(KERN_ERR "PCI: Not using MMCONFIG.\n"); 194 - return; 195 - } 196 - 197 169 printk(KERN_INFO "PCI: Using MMCONFIG\n"); 198 170 raw_pci_ops = &pci_mmcfg; 199 - pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF; 200 - 201 - unreachable_devices(); 171 + return 1; 202 172 }
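The rewritten `get_base_addr()` replaces the old `while (1)` walk with a plain for loop over the MCFG allocations. A self-contained sketch of that lookup, using a simplified stand-in for `struct acpi_mcfg_allocation` and an invented example map:

```c
#include <assert.h>

/* Simplified stand-in for struct acpi_mcfg_allocation. */
struct mcfg_entry {
    unsigned long long address;
    unsigned pci_segment;
    unsigned char start_bus_number;
    unsigned char end_bus_number;
};

/* Linear scan for the entry covering (seg, bus); returning 0 means
 * "fall back to type 1 config accesses", as in get_base_addr(). */
static unsigned long long mcfg_base(const struct mcfg_entry *cfg, int n,
                                    unsigned seg, int bus)
{
    int i;

    for (i = 0; i < n; i++)
        if (cfg[i].pci_segment == seg &&
            cfg[i].start_bus_number <= bus &&
            cfg[i].end_bus_number >= bus)
            return cfg[i].address;
    return 0;
}

/* Example map (addresses are illustrative, not from real firmware). */
static const struct mcfg_entry example_map[] = {
    { 0xe0000000ULL, 0, 0, 15 },
    { 0xf0000000ULL, 0, 16, 255 },
};
```

Note the old single-entry "broken Asus MCFG" special case is gone from the lookup itself; that quirk now lives in `pci_mmcfg_reject_broken()` in the shared code.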
+10
arch/i386/pci/pci.h
··· 94 94 extern void pci_mmcfg_init(int type); 95 95 extern void pcibios_sort(void); 96 96 97 + /* pci-mmconfig.c */ 98 + 99 + /* Verify the first 16 busses. We assume that systems with more busses 100 + get MCFG right. */ 101 + #define PCI_MMCFG_MAX_CHECK_BUS 16 102 + extern DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS); 103 + 104 + extern int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus, 105 + unsigned int devfn); 106 + extern int __init pci_mmcfg_arch_init(void);
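The exported `pci_mmcfg_fallback_slots` bitmap holds one bit per (bus, slot) pair on the first 16 busses, indexed as `slot + 32*bus`. A user-space sketch of that indexing with a plain `unsigned long` array standing in for `DECLARE_BITMAP`/`test_bit`:

```c
#include <assert.h>

#define MAX_CHECK_BUS 16  /* mirrors PCI_MMCFG_MAX_CHECK_BUS */
#define BITS_PER_LONG (sizeof(unsigned long) * 8)

static unsigned long fallback_slots[(32 * MAX_CHECK_BUS) / (sizeof(unsigned long) * 8) + 1];

/* Set the bit for (bus, slot), using the same slot + 32*bus index. */
static void mark_fallback(int bus, int slot)
{
    unsigned bit = slot + 32 * bus;

    fallback_slots[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

static int is_fallback(int bus, int slot)
{
    unsigned bit = slot + 32 * bus;

    return (fallback_slots[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1;
}

/* Mark one slot and verify its neighbours stay clear. */
static int bitmap_demo(void)
{
    mark_fallback(3, 17);
    return is_fallback(3, 17) && !is_fallback(3, 16) && !is_fallback(4, 17);
}
```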
+14 -8
arch/x86_64/Kconfig
··· 152 152 Optimize for Intel Pentium 4 and older Nocona/Dempsey Xeon CPUs 153 153 with Intel Extended Memory 64 Technology(EM64T). For details see 154 154 <http://www.intel.com/technology/64bitextensions/>. 155 - Note the the latest Xeons (Xeon 51xx and 53xx) are not based on the 156 - Netburst core and shouldn't use this option. You can distingush them 155 + Note that the latest Xeons (Xeon 51xx and 53xx) are not based on the 156 + Netburst core and shouldn't use this option. You can distinguish them 157 157 using the cpu family field 158 - in /proc/cpuinfo. Family 15 is a older Xeon, Family 6 a newer one 159 - (this rule only applies to system that support EM64T) 158 + in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one 159 + (this rule only applies to systems that support EM64T) 160 160 161 161 config MCORE2 162 162 bool "Intel Core2 / newer Xeon" 163 163 help 164 164 Optimize for Intel Core2 and newer Xeons (51xx) 165 - You can distingush the newer Xeons from the older ones using 166 - the cpu family field in /proc/cpuinfo. 15 is a older Xeon 165 + You can distinguish the newer Xeons from the older ones using 166 + the cpu family field in /proc/cpuinfo. 15 is an older Xeon 167 167 (use CONFIG_MPSC then), 6 is a newer one. This rule only 168 168 applies to CPUs that support EM64T. 169 169 ··· 458 458 on systems with more than 3GB. This is usually needed for USB, 459 459 sound, many IDE/SATA chipsets and some other devices. 460 460 Provides a driver for the AMD Athlon64/Opteron/Turion/Sempron GART 461 - based IOMMU and a software bounce buffer based IOMMU used on Intel 462 - systems and as fallback. 461 + based hardware IOMMU and a software bounce buffer based IOMMU used 462 + on Intel systems and as fallback. 463 463 The code is only active when needed (enough memory and limited 464 464 device) unless CONFIG_IOMMU_DEBUG or iommu=force is specified 465 465 too. 
··· 496 496 # need this always selected by IOMMU for the VIA workaround 497 497 config SWIOTLB 498 498 bool 499 + help 500 + Support for software bounce buffers used on x86-64 systems 501 + which don't have a hardware IOMMU (e.g. the current generation 502 + of Intel's x86-64 CPUs). Using this PCI devices which can only 503 + access 32-bits of memory can be used on systems with more than 504 + 3 GB of memory. If unsure, say Y. 499 505 500 506 config X86_MCE 501 507 bool "Machine check support" if EMBEDDED
+34 -11
arch/x86_64/defconfig
··· 1 1 # 2 2 # Automatically generated make config: don't edit 3 - # Linux kernel version: 2.6.20-rc3 4 - # Fri Jan 5 11:54:41 2007 3 + # Linux kernel version: 2.6.20-git8 4 + # Tue Feb 13 11:25:16 2007 5 5 # 6 6 CONFIG_X86_64=y 7 7 CONFIG_64BIT=y ··· 11 11 CONFIG_STACKTRACE_SUPPORT=y 12 12 CONFIG_SEMAPHORE_SLEEPERS=y 13 13 CONFIG_MMU=y 14 + CONFIG_ZONE_DMA=y 14 15 CONFIG_RWSEM_GENERIC_SPINLOCK=y 15 16 CONFIG_GENERIC_HWEIGHT=y 16 17 CONFIG_GENERIC_CALIBRATE_DELAY=y ··· 154 153 CONFIG_SPLIT_PTLOCK_CPUS=4 155 154 CONFIG_MIGRATION=y 156 155 CONFIG_RESOURCES_64BIT=y 156 + CONFIG_ZONE_DMA_FLAG=1 157 157 CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y 158 158 CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y 159 159 CONFIG_NR_CPUS=32 ··· 203 201 CONFIG_ACPI_SLEEP=y 204 202 CONFIG_ACPI_SLEEP_PROC_FS=y 205 203 CONFIG_ACPI_SLEEP_PROC_SLEEP=y 204 + CONFIG_ACPI_PROCFS=y 206 205 CONFIG_ACPI_AC=y 207 206 CONFIG_ACPI_BATTERY=y 208 207 CONFIG_ACPI_BUTTON=y 209 - # CONFIG_ACPI_VIDEO is not set 210 208 # CONFIG_ACPI_HOTKEY is not set 211 209 CONFIG_ACPI_FAN=y 212 210 # CONFIG_ACPI_DOCK is not set 211 + # CONFIG_ACPI_BAY is not set 213 212 CONFIG_ACPI_PROCESSOR=y 214 213 CONFIG_ACPI_HOTPLUG_CPU=y 215 214 CONFIG_ACPI_THERMAL=y ··· 266 263 CONFIG_PCIEPORTBUS=y 267 264 CONFIG_PCIEAER=y 268 265 CONFIG_PCI_MSI=y 269 - # CONFIG_PCI_MULTITHREAD_PROBE is not set 270 266 # CONFIG_PCI_DEBUG is not set 271 267 # CONFIG_HT_IRQ is not set 272 268 ··· 400 398 CONFIG_PREVENT_FIRMWARE_BUILD=y 401 399 CONFIG_FW_LOADER=y 402 400 # CONFIG_DEBUG_DRIVER is not set 401 + # CONFIG_DEBUG_DEVRES is not set 403 402 # CONFIG_SYS_HYPERVISOR is not set 404 403 405 404 # ··· 469 466 # CONFIG_BLK_DEV_IDETAPE is not set 470 467 # CONFIG_BLK_DEV_IDEFLOPPY is not set 471 468 # CONFIG_BLK_DEV_IDESCSI is not set 469 + CONFIG_BLK_DEV_IDEACPI=y 472 470 # CONFIG_IDE_TASK_IOCTL is not set 473 471 474 472 # ··· 501 497 # CONFIG_BLK_DEV_JMICRON is not set 502 498 # CONFIG_BLK_DEV_SC1200 is not set 503 499 CONFIG_BLK_DEV_PIIX=y 500 + # 
CONFIG_BLK_DEV_IT8213 is not set 504 501 # CONFIG_BLK_DEV_IT821X is not set 505 502 # CONFIG_BLK_DEV_NS87415 is not set 506 503 # CONFIG_BLK_DEV_PDC202XX_OLD is not set ··· 512 507 # CONFIG_BLK_DEV_SLC90E66 is not set 513 508 # CONFIG_BLK_DEV_TRM290 is not set 514 509 # CONFIG_BLK_DEV_VIA82CXXX is not set 510 + # CONFIG_BLK_DEV_TC86C001 is not set 515 511 # CONFIG_IDE_ARM is not set 516 512 CONFIG_BLK_DEV_IDEDMA=y 517 513 # CONFIG_IDEDMA_IVB is not set ··· 605 599 # Serial ATA (prod) and Parallel ATA (experimental) drivers 606 600 # 607 601 CONFIG_ATA=y 602 + # CONFIG_ATA_NONSTANDARD is not set 608 603 CONFIG_SATA_AHCI=y 609 604 CONFIG_SATA_SVW=y 610 605 CONFIG_ATA_PIIX=y ··· 621 614 # CONFIG_SATA_ULI is not set 622 615 CONFIG_SATA_VIA=y 623 616 # CONFIG_SATA_VITESSE is not set 617 + # CONFIG_SATA_INIC162X is not set 624 618 CONFIG_SATA_INTEL_COMBINED=y 625 619 # CONFIG_PATA_ALI is not set 626 620 # CONFIG_PATA_AMD is not set ··· 638 630 # CONFIG_PATA_HPT3X2N is not set 639 631 # CONFIG_PATA_HPT3X3 is not set 640 632 # CONFIG_PATA_IT821X is not set 633 + # CONFIG_PATA_IT8213 is not set 641 634 # CONFIG_PATA_JMICRON is not set 642 635 # CONFIG_PATA_TRIFLEX is not set 643 636 # CONFIG_PATA_MARVELL is not set ··· 691 682 # Subsystem Options 692 683 # 693 684 # CONFIG_IEEE1394_VERBOSEDEBUG is not set 694 - # CONFIG_IEEE1394_OUI_DB is not set 695 685 # CONFIG_IEEE1394_EXTRA_CONFIG_ROMS is not set 696 - # CONFIG_IEEE1394_EXPORT_FULL_API is not set 697 686 698 687 # 699 688 # Device Drivers ··· 712 705 # I2O device support 713 706 # 714 707 # CONFIG_I2O is not set 708 + 709 + # 710 + # Macintosh device drivers 711 + # 712 + # CONFIG_MAC_EMUMOUSEBTN is not set 715 713 716 714 # 717 715 # Network device support ··· 786 774 # CONFIG_EPIC100 is not set 787 775 # CONFIG_SUNDANCE is not set 788 776 # CONFIG_VIA_RHINE is not set 777 + # CONFIG_SC92031 is not set 789 778 790 779 # 791 780 # Ethernet (1000 Mbit) ··· 808 795 CONFIG_TIGON3=y 809 796 CONFIG_BNX2=y 810 797 # 
CONFIG_QLA3XXX is not set 798 + # CONFIG_ATL1 is not set 811 799 812 800 # 813 801 # Ethernet (10000 Mbit) 814 802 # 815 803 # CONFIG_CHELSIO_T1 is not set 804 + # CONFIG_CHELSIO_T3 is not set 816 805 # CONFIG_IXGB is not set 817 806 CONFIG_S2IO=m 818 807 # CONFIG_S2IO_NAPI is not set ··· 1130 1115 # Open Sound System 1131 1116 # 1132 1117 CONFIG_SOUND_PRIME=y 1118 + CONFIG_OBSOLETE_OSS=y 1133 1119 # CONFIG_SOUND_BT878 is not set 1134 1120 # CONFIG_SOUND_ES1371 is not set 1135 1121 CONFIG_SOUND_ICH=y ··· 1144 1128 # HID Devices 1145 1129 # 1146 1130 CONFIG_HID=y 1131 + # CONFIG_HID_DEBUG is not set 1147 1132 1148 1133 # 1149 1134 # USB support ··· 1159 1142 # Miscellaneous USB options 1160 1143 # 1161 1144 CONFIG_USB_DEVICEFS=y 1162 - # CONFIG_USB_BANDWIDTH is not set 1163 1145 # CONFIG_USB_DYNAMIC_MINORS is not set 1164 1146 # CONFIG_USB_SUSPEND is not set 1165 - # CONFIG_USB_MULTITHREAD_PROBE is not set 1166 1147 # CONFIG_USB_OTG is not set 1167 1148 1168 1149 # ··· 1170 1155 # CONFIG_USB_EHCI_SPLIT_ISO is not set 1171 1156 # CONFIG_USB_EHCI_ROOT_HUB_TT is not set 1172 1157 # CONFIG_USB_EHCI_TT_NEWSCHED is not set 1158 + # CONFIG_USB_EHCI_BIG_ENDIAN_MMIO is not set 1173 1159 # CONFIG_USB_ISP116X_HCD is not set 1174 1160 CONFIG_USB_OHCI_HCD=y 1175 - # CONFIG_USB_OHCI_BIG_ENDIAN is not set 1161 + # CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set 1162 + # CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set 1176 1163 CONFIG_USB_OHCI_LITTLE_ENDIAN=y 1177 1164 CONFIG_USB_UHCI_HCD=y 1178 1165 # CONFIG_USB_SL811_HCD is not set ··· 1225 1208 # CONFIG_USB_ATI_REMOTE2 is not set 1226 1209 # CONFIG_USB_KEYSPAN_REMOTE is not set 1227 1210 # CONFIG_USB_APPLETOUCH is not set 1211 + # CONFIG_USB_GTCO is not set 1228 1212 1229 1213 # 1230 1214 # USB Imaging devices ··· 1328 1310 1329 1311 # 1330 1312 # DMA Devices 1313 + # 1314 + 1315 + # 1316 + # Auxiliary Display support 1331 1317 # 1332 1318 1333 1319 # ··· 1534 1512 CONFIG_DEBUG_FS=y 1535 1513 # CONFIG_HEADERS_CHECK is not set 1536 1514 
CONFIG_DEBUG_KERNEL=y 1515 + # CONFIG_DEBUG_SHIRQ is not set 1537 1516 CONFIG_LOG_BUF_SHIFT=18 1538 1517 CONFIG_DETECT_SOFTLOCKUP=y 1539 1518 # CONFIG_SCHEDSTATS is not set ··· 1543 1520 # CONFIG_RT_MUTEX_TESTER is not set 1544 1521 # CONFIG_DEBUG_SPINLOCK is not set 1545 1522 # CONFIG_DEBUG_MUTEXES is not set 1546 - # CONFIG_DEBUG_RWSEMS is not set 1547 1523 # CONFIG_DEBUG_LOCK_ALLOC is not set 1548 1524 # CONFIG_PROVE_LOCKING is not set 1549 1525 # CONFIG_DEBUG_SPINLOCK_SLEEP is not set ··· 1582 1560 # CONFIG_LIBCRC32C is not set 1583 1561 CONFIG_ZLIB_INFLATE=y 1584 1562 CONFIG_PLIST=y 1585 - CONFIG_IOMAP_COPY=y 1563 + CONFIG_HAS_IOMEM=y 1564 + CONFIG_HAS_IOPORT=y
+8 -3
arch/x86_64/ia32/ia32_signal.c
··· 21 21 #include <linux/stddef.h> 22 22 #include <linux/personality.h> 23 23 #include <linux/compat.h> 24 + #include <linux/binfmts.h> 24 25 #include <asm/ucontext.h> 25 26 #include <asm/uaccess.h> 26 27 #include <asm/i387.h> ··· 450 449 451 450 /* Return stub is in 32bit vsyscall page */ 452 451 { 453 - void __user *restorer = VSYSCALL32_SIGRETURN; 452 + void __user *restorer; 453 + if (current->binfmt->hasvdso) 454 + restorer = VSYSCALL32_SIGRETURN; 455 + else 456 + restorer = (void *)&frame->retcode; 454 457 if (ka->sa.sa_flags & SA_RESTORER) 455 458 restorer = ka->sa.sa_restorer; 456 459 err |= __put_user(ptr_to_compat(restorer), &frame->pretcode); ··· 500 495 ptrace_notify(SIGTRAP); 501 496 502 497 #if DEBUG_SIG 503 - printk("SIG deliver (%s:%d): sp=%p pc=%p ra=%p\n", 498 + printk("SIG deliver (%s:%d): sp=%p pc=%lx ra=%u\n", 504 499 current->comm, current->pid, frame, regs->rip, frame->pretcode); 505 500 #endif 506 501 ··· 606 601 ptrace_notify(SIGTRAP); 607 602 608 603 #if DEBUG_SIG 609 - printk("SIG deliver (%s:%d): sp=%p pc=%p ra=%p\n", 604 + printk("SIG deliver (%s:%d): sp=%p pc=%lx ra=%u\n", 610 605 current->comm, current->pid, frame, regs->rip, frame->pretcode); 611 606 #endif 612 607
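The signal-frame change above picks the return trampoline in a fixed priority order: an explicit `sa_restorer` wins, otherwise the vDSO stub if the binary format has one, otherwise the `retcode` bytes copied onto the stack. A minimal sketch of that selection; `VDSO_SIGRETURN` here is an illustrative address, not the kernel's real `VSYSCALL32_SIGRETURN`:

```c
#include <assert.h>
#include <stddef.h>

#define SA_RESTORER 0x04000000u

/* Illustrative vDSO stub address for the sketch. */
#define VDSO_SIGRETURN ((void *)0xffffe400)

/* Same decision order as the patched ia32_setup_frame(). */
static void *pick_restorer(unsigned sa_flags, void *sa_restorer,
                           int has_vdso, void *stack_retcode)
{
    void *restorer = has_vdso ? VDSO_SIGRETURN : stack_retcode;

    if (sa_flags & SA_RESTORER)
        restorer = sa_restorer;
    return restorer;
}
```

This is what lets a.out binaries (no vDSO mapped) still return from signal handlers: they fall through to the on-stack retcode.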
+1
arch/x86_64/ia32/ia32entry.S
··· 718 718 .quad compat_sys_vmsplice 719 719 .quad compat_sys_move_pages 720 720 .quad sys_getcpu 721 + .quad sys_epoll_pwait 721 722 ia32_syscall_end:
+2
arch/x86_64/kernel/Makefile
··· 43 43 44 44 obj-y += topology.o 45 45 obj-y += intel_cacheinfo.o 46 + obj-y += pcspeaker.o 46 47 47 48 CFLAGS_vsyscall.o := $(PROFILING) -g0 48 49 ··· 57 56 i8237-y += ../../i386/kernel/i8237.o 58 57 msr-$(subst m,y,$(CONFIG_X86_MSR)) += ../../i386/kernel/msr.o 59 58 alternative-y += ../../i386/kernel/alternative.o 59 + pcspeaker-y += ../../i386/kernel/pcspeaker.o
+1 -1
arch/x86_64/kernel/acpi/sleep.c
··· 58 58 unsigned long acpi_video_flags; 59 59 extern char wakeup_start, wakeup_end; 60 60 61 - extern unsigned long FASTCALL(acpi_copy_wakeup_routine(unsigned long)); 61 + extern unsigned long acpi_copy_wakeup_routine(unsigned long); 62 62 63 63 static pgd_t low_ptr; 64 64
+38
arch/x86_64/kernel/e820.c
··· 83 83 return 1; 84 84 } 85 85 86 + #ifdef CONFIG_NUMA 87 + /* NUMA memory to node map */ 88 + if (last >= nodemap_addr && addr < nodemap_addr + nodemap_size) { 89 + *addrp = nodemap_addr + nodemap_size; 90 + return 1; 91 + } 92 + #endif 86 93 /* XXX ramdisk image here? */ 87 94 return 0; 88 95 } ··· 188 181 189 182 printk("end_pfn_map = %lu\n", end_pfn_map); 190 183 return end_pfn; 184 + } 185 + 186 + /* 187 + * Find the hole size in the range. 188 + */ 189 + unsigned long __init e820_hole_size(unsigned long start, unsigned long end) 190 + { 191 + unsigned long ram = 0; 192 + int i; 193 + 194 + for (i = 0; i < e820.nr_map; i++) { 195 + struct e820entry *ei = &e820.map[i]; 196 + unsigned long last, addr; 197 + 198 + if (ei->type != E820_RAM || 199 + ei->addr+ei->size <= start || 200 + ei->addr >= end) 201 + continue; 202 + 203 + addr = round_up(ei->addr, PAGE_SIZE); 204 + if (addr < start) 205 + addr = start; 206 + 207 + last = round_down(ei->addr + ei->size, PAGE_SIZE); 208 + if (last >= end) 209 + last = end; 210 + 211 + if (last > addr) 212 + ram += last - addr; 213 + } 214 + return ((end - start) - ram); 191 215 } 192 216 193 217 /*
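The new `e820_hole_size()` computes the non-RAM portion of a window by summing the page-aligned RAM that intersects it and subtracting from the window size. A user-space sketch with the same clamping and rounding, over an invented two-range map:

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* RAM ranges only, standing in for E820_RAM entries. */
struct range { unsigned long addr, size; };

static unsigned long round_up_pg(unsigned long x)
{
    return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

static unsigned long round_down_pg(unsigned long x)
{
    return x & ~(PAGE_SIZE - 1);
}

/* Same shape as e820_hole_size(): clamp each intersecting RAM range
 * to [start, end), round inward to page boundaries, accumulate, and
 * return the remainder of the window. */
static unsigned long hole_size(const struct range *map, int n,
                               unsigned long start, unsigned long end)
{
    unsigned long ram = 0;
    int i;

    for (i = 0; i < n; i++) {
        unsigned long addr, last;

        if (map[i].addr + map[i].size <= start || map[i].addr >= end)
            continue;

        addr = round_up_pg(map[i].addr);
        if (addr < start)
            addr = start;

        last = round_down_pg(map[i].addr + map[i].size);
        if (last >= end)
            last = end;

        if (last > addr)
            ram += last - addr;
    }
    return (end - start) - ram;
}

/* Example: RAM at [0, 0x1000) and [0x3000, 0x5000), leaving a
 * 0x2000-byte hole at [0x1000, 0x3000). */
static const struct range example_map[] = {
    { 0x0000, 0x1000 },
    { 0x3000, 0x2000 },
};
```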
+14 -6
arch/x86_64/kernel/head.S
··· 163 163 */ 164 164 lgdt cpu_gdt_descr 165 165 166 + /* set up data segments. actually 0 would do too */ 167 + movl $__KERNEL_DS,%eax 168 + movl %eax,%ds 169 + movl %eax,%ss 170 + movl %eax,%es 171 + 172 + /* 173 + * We don't really need to load %fs or %gs, but load them anyway 174 + * to kill any stale realmode selectors. This allows execution 175 + * under VT hardware. 176 + */ 177 + movl %eax,%fs 178 + movl %eax,%gs 179 + 166 180 /* 167 181 * Setup up a dummy PDA. this is just for some early bootup code 168 182 * that does in_interrupt() ··· 187 173 shrq $32,%rdx 188 174 wrmsr 189 175 190 - /* set up data segments. actually 0 would do too */ 191 - movl $__KERNEL_DS,%eax 192 - movl %eax,%ds 193 - movl %eax,%ss 194 - movl %eax,%es 195 - 196 176 /* esi is pointer to real mode structure with interesting info. 197 177 pass it to C */ 198 178 movl %esi, %edi
+11 -13
arch/x86_64/kernel/io_apic.c
··· 831 831 entry.delivery_mode = INT_DELIVERY_MODE; 832 832 entry.dest_mode = INT_DEST_MODE; 833 833 entry.mask = 0; /* enable IRQ */ 834 - entry.dest.logical.logical_dest = cpu_mask_to_apicid(TARGET_CPUS); 834 + entry.dest = cpu_mask_to_apicid(TARGET_CPUS); 835 835 836 836 entry.trigger = irq_trigger(idx); 837 837 entry.polarity = irq_polarity(idx); ··· 839 839 if (irq_trigger(idx)) { 840 840 entry.trigger = 1; 841 841 entry.mask = 1; 842 - entry.dest.logical.logical_dest = cpu_mask_to_apicid(TARGET_CPUS); 842 + entry.dest = cpu_mask_to_apicid(TARGET_CPUS); 843 843 } 844 844 845 845 if (!apic && !IO_APIC_IRQ(irq)) ··· 851 851 if (vector < 0) 852 852 return; 853 853 854 - entry.dest.logical.logical_dest = cpu_mask_to_apicid(mask); 854 + entry.dest = cpu_mask_to_apicid(mask); 855 855 entry.vector = vector; 856 856 857 857 ioapic_register_intr(irq, vector, IOAPIC_AUTO); ··· 920 920 */ 921 921 entry.dest_mode = INT_DEST_MODE; 922 922 entry.mask = 0; /* unmask IRQ now */ 923 - entry.dest.logical.logical_dest = cpu_mask_to_apicid(TARGET_CPUS); 923 + entry.dest = cpu_mask_to_apicid(TARGET_CPUS); 924 924 entry.delivery_mode = INT_DELIVERY_MODE; 925 925 entry.polarity = 0; 926 926 entry.trigger = 0; ··· 1020 1020 1021 1021 printk(KERN_DEBUG ".... 
IRQ redirection table:\n"); 1022 1022 1023 - printk(KERN_DEBUG " NR Log Phy Mask Trig IRR Pol" 1024 - " Stat Dest Deli Vect: \n"); 1023 + printk(KERN_DEBUG " NR Dst Mask Trig IRR Pol" 1024 + " Stat Dmod Deli Vect: \n"); 1025 1025 1026 1026 for (i = 0; i <= reg_01.bits.entries; i++) { 1027 1027 struct IO_APIC_route_entry entry; 1028 1028 1029 1029 entry = ioapic_read_entry(apic, i); 1030 1030 1031 - printk(KERN_DEBUG " %02x %03X %02X ", 1031 + printk(KERN_DEBUG " %02x %03X ", 1032 1032 i, 1033 - entry.dest.logical.logical_dest, 1034 - entry.dest.physical.physical_dest 1033 + entry.dest 1035 1034 ); 1036 1035 1037 1036 printk("%1d %1d %1d %1d %1d %1d %1d %02X\n", ··· 1292 1293 entry.dest_mode = 0; /* Physical */ 1293 1294 entry.delivery_mode = dest_ExtINT; /* ExtInt */ 1294 1295 entry.vector = 0; 1295 - entry.dest.physical.physical_dest = 1296 - GET_APIC_ID(apic_read(APIC_ID)); 1296 + entry.dest = GET_APIC_ID(apic_read(APIC_ID)); 1297 1297 1298 1298 /* 1299 1299 * Add it to the IO-APIC irq-routing table: ··· 1554 1556 1555 1557 entry1.dest_mode = 0; /* physical delivery */ 1556 1558 entry1.mask = 0; /* unmask IRQ now */ 1557 - entry1.dest.physical.physical_dest = hard_smp_processor_id(); 1559 + entry1.dest = hard_smp_processor_id(); 1558 1560 entry1.delivery_mode = dest_ExtINT; 1559 1561 entry1.polarity = entry0.polarity; 1560 1562 entry1.trigger = 0; ··· 2129 2131 2130 2132 entry.delivery_mode = INT_DELIVERY_MODE; 2131 2133 entry.dest_mode = INT_DEST_MODE; 2132 - entry.dest.logical.logical_dest = cpu_mask_to_apicid(mask); 2134 + entry.dest = cpu_mask_to_apicid(mask); 2133 2135 entry.trigger = triggering; 2134 2136 entry.polarity = polarity; 2135 2137 entry.mask = 1; /* Disabled (masked) */
+1 -1
arch/x86_64/kernel/ioport.c
··· 114 114 if (!capable(CAP_SYS_RAWIO)) 115 115 return -EPERM; 116 116 } 117 - regs->eflags = (regs->eflags &~ 0x3000UL) | (level << 12); 117 + regs->eflags = (regs->eflags &~ X86_EFLAGS_IOPL) | (level << 12); 118 118 return 0; 119 119 }
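The one-liner above only replaces the magic constant `0x3000` with the named `X86_EFLAGS_IOPL` mask; the operation is unchanged: clear EFLAGS bits 12-13, then install the new I/O privilege level. A tiny sketch of the bit manipulation:

```c
#include <assert.h>

#define X86_EFLAGS_IOPL 0x00003000UL  /* bits 12 and 13 */

/* Clear the IOPL field, then shift the new level (0..3) into place,
 * exactly as the patched sys_iopl() does. */
static unsigned long set_iopl(unsigned long eflags, unsigned level)
{
    return (eflags & ~X86_EFLAGS_IOPL) | ((unsigned long)level << 12);
}
```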
+9 -3
arch/x86_64/kernel/irq.c
··· 18 18 #include <asm/uaccess.h> 19 19 #include <asm/io_apic.h> 20 20 #include <asm/idle.h> 21 + #include <asm/smp.h> 21 22 22 23 atomic_t irq_err_count; 23 24 ··· 121 120 122 121 if (likely(irq < NR_IRQS)) 123 122 generic_handle_irq(irq); 124 - else if (printk_ratelimit()) 125 - printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n", 126 - __func__, smp_processor_id(), vector); 123 + else { 124 + if (!disable_apic) 125 + ack_APIC_irq(); 126 + 127 + if (printk_ratelimit()) 128 + printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n", 129 + __func__, smp_processor_id(), vector); 130 + } 127 131 128 132 irq_exit(); 129 133
+54 -12
arch/x86_64/kernel/mce.c
··· 19 19 #include <linux/cpu.h> 20 20 #include <linux/percpu.h> 21 21 #include <linux/ctype.h> 22 + #include <linux/kmod.h> 22 23 #include <asm/processor.h> 23 24 #include <asm/msr.h> 24 25 #include <asm/mce.h> ··· 43 42 static int notify_user; 44 43 static int rip_msr; 45 44 static int mce_bootlog = 1; 45 + static atomic_t mce_events; 46 + 47 + static char trigger[128]; 48 + static char *trigger_argv[2] = { trigger, NULL }; 46 49 47 50 /* 48 51 * Lockless MCE logging infrastructure. ··· 62 57 void mce_log(struct mce *mce) 63 58 { 64 59 unsigned next, entry; 60 + atomic_inc(&mce_events); 65 61 mce->finished = 0; 66 62 wmb(); 67 63 for (;;) { ··· 167 161 } 168 162 } 169 163 164 + static void do_mce_trigger(void) 165 + { 166 + static atomic_t mce_logged; 167 + int events = atomic_read(&mce_events); 168 + if (events != atomic_read(&mce_logged) && trigger[0]) { 169 + /* Small race window, but should be harmless. */ 170 + atomic_set(&mce_logged, events); 171 + call_usermodehelper(trigger, trigger_argv, NULL, -1); 172 + } 173 + } 174 + 170 175 /* 171 176 * The actual machine check handler 172 177 */ ··· 251 234 } 252 235 253 236 /* Never do anything final in the polling timer */ 254 - if (!regs) 237 + if (!regs) { 238 + /* Normal interrupt context here. Call trigger for any new 239 + events. */ 240 + do_mce_trigger(); 255 241 goto out; 242 + } 256 243 257 244 /* If we didn't find an uncorrectable error, pick 258 245 the last one (shouldn't happen, just being safe). 
*/ ··· 627 606 } \ 628 607 static SYSDEV_ATTR(name, 0644, show_ ## name, set_ ## name); 629 608 609 + /* TBD should generate these dynamically based on number of available banks */ 630 610 ACCESSOR(bank0ctl,bank[0],mce_restart()) 631 611 ACCESSOR(bank1ctl,bank[1],mce_restart()) 632 612 ACCESSOR(bank2ctl,bank[2],mce_restart()) 633 613 ACCESSOR(bank3ctl,bank[3],mce_restart()) 634 614 ACCESSOR(bank4ctl,bank[4],mce_restart()) 635 615 ACCESSOR(bank5ctl,bank[5],mce_restart()) 636 - static struct sysdev_attribute * bank_attributes[NR_BANKS] = { 637 - &attr_bank0ctl, &attr_bank1ctl, &attr_bank2ctl, 638 - &attr_bank3ctl, &attr_bank4ctl, &attr_bank5ctl}; 616 + 617 + static ssize_t show_trigger(struct sys_device *s, char *buf) 618 + { 619 + strcpy(buf, trigger); 620 + strcat(buf, "\n"); 621 + return strlen(trigger) + 1; 622 + } 623 + 624 + static ssize_t set_trigger(struct sys_device *s,const char *buf,size_t siz) 625 + { 626 + char *p; 627 + int len; 628 + strncpy(trigger, buf, sizeof(trigger)); 629 + trigger[sizeof(trigger)-1] = 0; 630 + len = strlen(trigger); 631 + p = strchr(trigger, '\n'); 632 + if (*p) *p = 0; 633 + return len; 634 + } 635 + 636 + static SYSDEV_ATTR(trigger, 0644, show_trigger, set_trigger); 639 637 ACCESSOR(tolerant,tolerant,) 640 638 ACCESSOR(check_interval,check_interval,mce_restart()) 639 + static struct sysdev_attribute *mce_attributes[] = { 640 + &attr_bank0ctl, &attr_bank1ctl, &attr_bank2ctl, 641 + &attr_bank3ctl, &attr_bank4ctl, &attr_bank5ctl, 642 + &attr_tolerant, &attr_check_interval, &attr_trigger, 643 + NULL 644 + }; 641 645 642 646 /* Per cpu sysdev init. 
All of the cpus still share the same ctl bank */ 643 647 static __cpuinit int mce_create_device(unsigned int cpu) ··· 678 632 err = sysdev_register(&per_cpu(device_mce,cpu)); 679 633 680 634 if (!err) { 681 - for (i = 0; i < banks; i++) 635 + for (i = 0; mce_attributes[i]; i++) 682 636 sysdev_create_file(&per_cpu(device_mce,cpu), 683 - bank_attributes[i]); 684 - sysdev_create_file(&per_cpu(device_mce,cpu), &attr_tolerant); 685 - sysdev_create_file(&per_cpu(device_mce,cpu), &attr_check_interval); 637 + mce_attributes[i]); 686 638 } 687 639 return err; 688 640 } ··· 689 645 { 690 646 int i; 691 647 692 - for (i = 0; i < banks; i++) 648 + for (i = 0; mce_attributes[i]; i++) 693 649 sysdev_remove_file(&per_cpu(device_mce,cpu), 694 - bank_attributes[i]); 695 - sysdev_remove_file(&per_cpu(device_mce,cpu), &attr_tolerant); 696 - sysdev_remove_file(&per_cpu(device_mce,cpu), &attr_check_interval); 650 + mce_attributes[i]); 697 651 sysdev_unregister(&per_cpu(device_mce,cpu)); 698 652 memset(&per_cpu(device_mce, cpu).kobj, 0, sizeof(struct kobject)); 699 653 }
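`do_mce_trigger()` above runs the user-mode helper only when the global event counter has moved since the last run, deliberately tolerating a small race rather than taking a lock. A single-threaded sketch of that "new events since last time?" logic, with plain ints standing in for the kernel's `atomic_t` and a counter standing in for `call_usermodehelper()`:

```c
#include <assert.h>

static int mce_events;   /* stands in for atomic_t mce_events */
static int mce_logged;   /* stands in for the static atomic in do_mce_trigger */
static int trigger_runs; /* counts "helper invocations" for the sketch */

static void log_event(void) { mce_events++; }

/* Fire the trigger only when the event count has advanced. */
static void maybe_trigger(void)
{
    int events = mce_events;

    if (events != mce_logged) {
        mce_logged = events;
        trigger_runs++;  /* stands in for call_usermodehelper() */
    }
}

/* One event fires the trigger once; a second call with no new
 * events is a no-op; a further event fires it again. */
static int scenario(void)
{
    log_event();
    maybe_trigger();
    maybe_trigger();
    log_event();
    maybe_trigger();
    return trigger_runs;
}
```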
+29 -15
arch/x86_64/kernel/mce_amd.c
··· 37 37 #define THRESHOLD_MAX 0xFFF 38 38 #define INT_TYPE_APIC 0x00020000 39 39 #define MASK_VALID_HI 0x80000000 40 + #define MASK_CNTP_HI 0x40000000 41 + #define MASK_LOCKED_HI 0x20000000 40 42 #define MASK_LVTOFF_HI 0x00F00000 41 43 #define MASK_COUNT_EN_HI 0x00080000 42 44 #define MASK_INT_TYPE_HI 0x00060000 ··· 124 122 for (block = 0; block < NR_BLOCKS; ++block) { 125 123 if (block == 0) 126 124 address = MSR_IA32_MC0_MISC + bank * 4; 127 - else if (block == 1) 128 - address = MCG_XBLK_ADDR 129 - + ((low & MASK_BLKPTR_LO) >> 21); 125 + else if (block == 1) { 126 + address = (low & MASK_BLKPTR_LO) >> 21; 127 + if (!address) 128 + break; 129 + address += MCG_XBLK_ADDR; 130 + } 130 131 else 131 132 ++address; 132 133 133 134 if (rdmsr_safe(address, &low, &high)) 134 - continue; 135 + break; 135 136 136 137 if (!(high & MASK_VALID_HI)) { 137 138 if (block) ··· 143 138 break; 144 139 } 145 140 146 - if (!(high & MASK_VALID_HI >> 1) || 147 - (high & MASK_VALID_HI >> 2)) 141 + if (!(high & MASK_CNTP_HI) || 142 + (high & MASK_LOCKED_HI)) 148 143 continue; 149 144 150 145 if (!block) ··· 192 187 193 188 /* assume first bank caused it */ 194 189 for (bank = 0; bank < NR_BANKS; ++bank) { 190 + if (!(per_cpu(bank_map, m.cpu) & (1 << bank))) 191 + continue; 195 192 for (block = 0; block < NR_BLOCKS; ++block) { 196 193 if (block == 0) 197 194 address = MSR_IA32_MC0_MISC + bank * 4; 198 - else if (block == 1) 199 - address = MCG_XBLK_ADDR 200 - + ((low & MASK_BLKPTR_LO) >> 21); 195 + else if (block == 1) { 196 + address = (low & MASK_BLKPTR_LO) >> 21; 197 + if (!address) 198 + break; 199 + address += MCG_XBLK_ADDR; 200 + } 201 201 else 202 202 ++address; 203 203 204 204 if (rdmsr_safe(address, &low, &high)) 205 - continue; 205 + break; 206 206 207 207 if (!(high & MASK_VALID_HI)) { 208 208 if (block) ··· 216 206 break; 217 207 } 218 208 219 - if (!(high & MASK_VALID_HI >> 1) || 220 - (high & MASK_VALID_HI >> 2)) 209 + if (!(high & MASK_CNTP_HI) || 210 + (high & 
MASK_LOCKED_HI)) 221 211 continue; 212 + 213 + /* Log the machine check that caused the threshold 214 + event. */ 215 + do_machine_check(NULL, 0); 222 216 223 217 if (high & MASK_OVERFLOW_HI) { 224 218 rdmsrl(address, m.misc); ··· 399 385 return 0; 400 386 401 387 if (rdmsr_safe(address, &low, &high)) 402 - goto recurse; 388 + return 0; 403 389 404 390 if (!(high & MASK_VALID_HI)) { 405 391 if (block) ··· 408 394 return 0; 409 395 } 410 396 411 - if (!(high & MASK_VALID_HI >> 1) || 412 - (high & MASK_VALID_HI >> 2)) 397 + if (!(high & MASK_CNTP_HI) || 398 + (high & MASK_LOCKED_HI)) 413 399 goto recurse; 414 400 415 401 b = kzalloc(sizeof(struct threshold_block), GFP_KERNEL);
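The fixed block walk above now treats a zero block pointer as "no extended blocks: stop" instead of computing a bogus `MCG_XBLK_ADDR`-relative address. A sketch of the address selection for block N of a bank; the constant values here are illustrative placeholders, not necessarily the kernel's definitions:

```c
#include <assert.h>

/* Illustrative placeholder values for the sketch. */
#define MASK_BLKPTR_LO  0xFF000000u
#define MCG_XBLK_ADDR   0xC0000400u
#define MC0_MISC_BASE   0x00000403u

/* Returns the MSR address for 'block' of 'bank', or 0 to end the
 * scan, mirroring the fixed walk in mce_amd.c: block 0 is the bank's
 * MISC MSR, block 1 follows the block pointer (zero means stop),
 * later blocks are consecutive. */
static unsigned block_address(unsigned bank, int block, unsigned low,
                              unsigned prev_address)
{
    if (block == 0)
        return MC0_MISC_BASE + bank * 4;
    if (block == 1) {
        unsigned addr = (low & MASK_BLKPTR_LO) >> 21;

        if (!addr)
            return 0;  /* no extended blocks: caller breaks out */
        return addr + MCG_XBLK_ADDR;
    }
    return prev_address + 1;
}
```

The companion change, `rdmsr_safe()` failure now breaking instead of continuing, follows the same principle: once the chain is broken, later blocks cannot be trusted either.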
+60 -15
arch/x86_64/kernel/nmi.c
··· 172 172 { 173 173 switch (boot_cpu_data.x86_vendor) { 174 174 case X86_VENDOR_AMD: 175 - return boot_cpu_data.x86 == 15; 175 + return boot_cpu_data.x86 == 15 || boot_cpu_data.x86 == 16; 176 176 case X86_VENDOR_INTEL: 177 177 if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) 178 178 return 1; ··· 213 213 mb(); 214 214 } 215 215 #endif 216 + 217 + static unsigned int adjust_for_32bit_ctr(unsigned int hz) 218 + { 219 + unsigned int retval = hz; 220 + 221 + /* 222 + * On Intel CPUs with ARCH_PERFMON only 32 bits in the counter 223 + * are writable, with higher bits sign extending from bit 31. 224 + * So, we can only program the counter with 31 bit values and 225 + * 32nd bit should be 1, for 33.. to be 1. 226 + * Find the appropriate nmi_hz 227 + */ 228 + if ((((u64)cpu_khz * 1000) / retval) > 0x7fffffffULL) { 229 + retval = ((u64)cpu_khz * 1000) / 0x7fffffffUL + 1; 230 + } 231 + return retval; 232 + } 216 233 217 234 int __init check_nmi_watchdog (void) 218 235 { ··· 285 268 struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); 286 269 287 270 nmi_hz = 1; 288 - /* 289 - * On Intel CPUs with ARCH_PERFMON only 32 bits in the counter 290 - * are writable, with higher bits sign extending from bit 31. 291 - * So, we can only program the counter with 31 bit values and 292 - * 32nd bit should be 1, for 33.. to be 1. 
293 - * Find the appropriate nmi_hz 294 - */ 295 - if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 && 296 - ((u64)cpu_khz * 1000) > 0x7fffffffULL) { 297 - nmi_hz = ((u64)cpu_khz * 1000) / 0x7fffffffUL + 1; 298 - } 271 + if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0) 272 + nmi_hz = adjust_for_32bit_ctr(nmi_hz); 299 273 } 300 274 301 275 kfree(counts); ··· 368 360 } 369 361 } 370 362 363 + static void __acpi_nmi_disable(void *__unused) 364 + { 365 + apic_write(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED); 366 + } 367 + 368 + /* 369 + * Disable timer based NMIs on all CPUs: 370 + */ 371 + void acpi_nmi_disable(void) 372 + { 373 + if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC) 374 + on_each_cpu(__acpi_nmi_disable, NULL, 0, 1); 375 + } 376 + 377 + static void __acpi_nmi_enable(void *__unused) 378 + { 379 + apic_write(APIC_LVT0, APIC_DM_NMI); 380 + } 381 + 382 + /* 383 + * Enable timer based NMIs on all CPUs: 384 + */ 385 + void acpi_nmi_enable(void) 386 + { 387 + if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC) 388 + on_each_cpu(__acpi_nmi_enable, NULL, 0, 1); 389 + } 371 390 #ifdef CONFIG_PM 372 391 373 392 static int nmi_pm_active; /* nmi_active before suspend */ ··· 669 634 670 635 /* setup the timer */ 671 636 wrmsr(evntsel_msr, evntsel, 0); 672 - wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz)); 637 + 638 + nmi_hz = adjust_for_32bit_ctr(nmi_hz); 639 + wrmsr(perfctr_msr, (u32)(-((u64)cpu_khz * 1000 / nmi_hz)), 0); 673 640 674 641 apic_write(APIC_LVTPC, APIC_DM_NMI); 675 642 evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE; ··· 892 855 dummy &= ~P4_CCCR_OVF; 893 856 wrmsrl(wd->cccr_msr, dummy); 894 857 apic_write(APIC_LVTPC, APIC_DM_NMI); 858 + /* start the cycle over again */ 859 + wrmsrl(wd->perfctr_msr, 860 + -((u64)cpu_khz * 1000 / nmi_hz)); 895 861 } else if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0) { 896 862 /* 897 863 * ArchPerfom/Core Duo needs to re-unmask 898 864 * the apic vector 899 865 */ 900 866 apic_write(APIC_LVTPC, 
APIC_DM_NMI); 867 + /* ARCH_PERFMON has 32 bit counter writes */ 868 + wrmsr(wd->perfctr_msr, 869 + (u32)(-((u64)cpu_khz * 1000 / nmi_hz)), 0); 870 + } else { 871 + /* start the cycle over again */ 872 + wrmsrl(wd->perfctr_msr, 873 + -((u64)cpu_khz * 1000 / nmi_hz)); 901 874 } 902 - /* start the cycle over again */ 903 - wrmsrl(wd->perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz)); 904 875 rc = 1; 905 876 } else if (nmi_watchdog == NMI_IO_APIC) { 906 877 /* don't know how to accurately check for this.
+15 -2
arch/x86_64/kernel/pci-calgary.c
··· 138 138 139 139 #define PHB_DEBUG_STUFF_OFFSET 0x0020 140 140 141 + #define EMERGENCY_PAGES 32 /* = 128KB */ 142 + 141 143 unsigned int specified_table_size = TCE_TABLE_SIZE_UNSPECIFIED; 142 144 static int translate_empty_slots __read_mostly = 0; 143 145 static int calgary_detected __read_mostly = 0; ··· 298 296 { 299 297 unsigned long entry; 300 298 unsigned long badbit; 299 + unsigned long badend; 300 + 301 + /* were we called with bad_dma_address? */ 302 + badend = bad_dma_address + (EMERGENCY_PAGES * PAGE_SIZE); 303 + if (unlikely((dma_addr >= bad_dma_address) && (dma_addr < badend))) { 304 + printk(KERN_ERR "Calgary: driver tried unmapping bad DMA " 305 + "address 0x%Lx\n", dma_addr); 306 + WARN_ON(1); 307 + return; 308 + } 301 309 302 310 entry = dma_addr >> PAGE_SHIFT; 303 311 ··· 668 656 u64 start; 669 657 struct iommu_table *tbl = dev->sysdata; 670 658 671 - /* reserve bad_dma_address in case it's a legal address */ 672 - iommu_range_reserve(tbl, bad_dma_address, 1); 659 + /* reserve EMERGENCY_PAGES from bad_dma_address and up */ 660 + iommu_range_reserve(tbl, bad_dma_address, EMERGENCY_PAGES); 673 661 674 662 /* avoid the BIOS/VGA first 640KB-1MB region */ 675 663 start = (640 * 1024); ··· 1188 1176 } 1189 1177 1190 1178 force_iommu = 1; 1179 + bad_dma_address = 0x0; 1191 1180 dma_ops = &calgary_dma_ops; 1192 1181 1193 1182 return 0;
+4 -24
arch/x86_64/kernel/pci-dma.c
··· 223 223 } 224 224 EXPORT_SYMBOL(dma_set_mask); 225 225 226 - /* iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge] 227 - [,forcesac][,fullflush][,nomerge][,biomerge] 228 - size set size of iommu (in bytes) 229 - noagp don't initialize the AGP driver and use full aperture. 230 - off don't use the IOMMU 231 - leak turn on simple iommu leak tracing (only when CONFIG_IOMMU_LEAK is on) 232 - memaper[=order] allocate an own aperture over RAM with size 32MB^order. 233 - noforce don't force IOMMU usage. Default. 234 - force Force IOMMU. 235 - merge Do lazy merging. This may improve performance on some block devices. 236 - Implies force (experimental) 237 - biomerge Do merging at the BIO layer. This is more efficient than merge, 238 - but should be only done with very big IOMMUs. Implies merge,force. 239 - nomerge Don't do SG merging. 240 - forcesac For SAC mode for masks <40bits (experimental) 241 - fullflush Flush IOMMU on each allocation (default) 242 - nofullflush Don't use IOMMU fullflush 243 - allowed overwrite iommu off workarounds for specific chipsets. 244 - soft Use software bounce buffering (default for Intel machines) 245 - noaperture Don't touch the aperture for AGP. 246 - allowdac Allow DMA >4GB 247 - nodac Forbid DMA >4GB 248 - panic Force panic when IOMMU overflows 249 - */ 226 + /* 227 + * See <Documentation/x86_64/boot-options.txt> for the iommu kernel parameter 228 + * documentation. 229 + */ 250 230 __init int iommu_setup(char *p) 251 231 { 252 232 iommu_merge = 1;
+2 -2
arch/x86_64/kernel/pci-gart.c
··· 185 185 static inline int need_iommu(struct device *dev, unsigned long addr, size_t size) 186 186 { 187 187 u64 mask = *dev->dma_mask; 188 - int high = addr + size >= mask; 188 + int high = addr + size > mask; 189 189 int mmu = high; 190 190 if (force_iommu) 191 191 mmu = 1; ··· 195 195 static inline int nonforced_iommu(struct device *dev, unsigned long addr, size_t size) 196 196 { 197 197 u64 mask = *dev->dma_mask; 198 - int high = addr + size >= mask; 198 + int high = addr + size > mask; 199 199 int mmu = high; 200 200 return mmu; 201 201 }
+6 -2
arch/x86_64/kernel/ptrace.c
··· 536 536 } 537 537 ret = 0; 538 538 for (ui = 0; ui < sizeof(struct user_regs_struct); ui += sizeof(long)) { 539 - ret |= __get_user(tmp, (unsigned long __user *) data); 540 - putreg(child, ui, tmp); 539 + ret = __get_user(tmp, (unsigned long __user *) data); 540 + if (ret) 541 + break; 542 + ret = putreg(child, ui, tmp); 543 + if (ret) 544 + break; 541 545 data += sizeof(long); 542 546 } 543 547 break;
+17 -152
arch/x86_64/kernel/setup.c
··· 138 138 .flags = IORESOURCE_RAM, 139 139 }; 140 140 141 - #define IORESOURCE_ROM (IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM) 142 - 143 - static struct resource system_rom_resource = { 144 - .name = "System ROM", 145 - .start = 0xf0000, 146 - .end = 0xfffff, 147 - .flags = IORESOURCE_ROM, 148 - }; 149 - 150 - static struct resource extension_rom_resource = { 151 - .name = "Extension ROM", 152 - .start = 0xe0000, 153 - .end = 0xeffff, 154 - .flags = IORESOURCE_ROM, 155 - }; 156 - 157 - static struct resource adapter_rom_resources[] = { 158 - { .name = "Adapter ROM", .start = 0xc8000, .end = 0, 159 - .flags = IORESOURCE_ROM }, 160 - { .name = "Adapter ROM", .start = 0, .end = 0, 161 - .flags = IORESOURCE_ROM }, 162 - { .name = "Adapter ROM", .start = 0, .end = 0, 163 - .flags = IORESOURCE_ROM }, 164 - { .name = "Adapter ROM", .start = 0, .end = 0, 165 - .flags = IORESOURCE_ROM }, 166 - { .name = "Adapter ROM", .start = 0, .end = 0, 167 - .flags = IORESOURCE_ROM }, 168 - { .name = "Adapter ROM", .start = 0, .end = 0, 169 - .flags = IORESOURCE_ROM } 170 - }; 171 - 172 - static struct resource video_rom_resource = { 173 - .name = "Video ROM", 174 - .start = 0xc0000, 175 - .end = 0xc7fff, 176 - .flags = IORESOURCE_ROM, 177 - }; 178 - 179 - static struct resource video_ram_resource = { 180 - .name = "Video RAM area", 181 - .start = 0xa0000, 182 - .end = 0xbffff, 183 - .flags = IORESOURCE_RAM, 184 - }; 185 - 186 - #define romsignature(x) (*(unsigned short *)(x) == 0xaa55) 187 - 188 - static int __init romchecksum(unsigned char *rom, unsigned long length) 189 - { 190 - unsigned char *p, sum = 0; 191 - 192 - for (p = rom; p < rom + length; p++) 193 - sum += *p; 194 - return sum == 0; 195 - } 196 - 197 - static void __init probe_roms(void) 198 - { 199 - unsigned long start, length, upper; 200 - unsigned char *rom; 201 - int i; 202 - 203 - /* video rom */ 204 - upper = adapter_rom_resources[0].start; 205 - for (start = video_rom_resource.start; start < upper; 
start += 2048) { 206 - rom = isa_bus_to_virt(start); 207 - if (!romsignature(rom)) 208 - continue; 209 - 210 - video_rom_resource.start = start; 211 - 212 - /* 0 < length <= 0x7f * 512, historically */ 213 - length = rom[2] * 512; 214 - 215 - /* if checksum okay, trust length byte */ 216 - if (length && romchecksum(rom, length)) 217 - video_rom_resource.end = start + length - 1; 218 - 219 - request_resource(&iomem_resource, &video_rom_resource); 220 - break; 221 - } 222 - 223 - start = (video_rom_resource.end + 1 + 2047) & ~2047UL; 224 - if (start < upper) 225 - start = upper; 226 - 227 - /* system rom */ 228 - request_resource(&iomem_resource, &system_rom_resource); 229 - upper = system_rom_resource.start; 230 - 231 - /* check for extension rom (ignore length byte!) */ 232 - rom = isa_bus_to_virt(extension_rom_resource.start); 233 - if (romsignature(rom)) { 234 - length = extension_rom_resource.end - extension_rom_resource.start + 1; 235 - if (romchecksum(rom, length)) { 236 - request_resource(&iomem_resource, &extension_rom_resource); 237 - upper = extension_rom_resource.start; 238 - } 239 - } 240 - 241 - /* check for adapter roms on 2k boundaries */ 242 - for (i = 0; i < ARRAY_SIZE(adapter_rom_resources) && start < upper; 243 - start += 2048) { 244 - rom = isa_bus_to_virt(start); 245 - if (!romsignature(rom)) 246 - continue; 247 - 248 - /* 0 < length <= 0x7f * 512, historically */ 249 - length = rom[2] * 512; 250 - 251 - /* but accept any length that fits if checksum okay */ 252 - if (!length || start + length > upper || !romchecksum(rom, length)) 253 - continue; 254 - 255 - adapter_rom_resources[i].start = start; 256 - adapter_rom_resources[i].end = start + length - 1; 257 - request_resource(&iomem_resource, &adapter_rom_resources[i]); 258 - 259 - start = adapter_rom_resources[i++].end & ~2047UL; 260 - } 261 - } 262 - 263 141 #ifdef CONFIG_PROC_VMCORE 264 142 /* elfcorehdr= specifies the location of elf core header 265 143 * stored by the crashed kernel. 
This option will be passed ··· 322 444 /* reserve ebda region */ 323 445 if (ebda_addr) 324 446 reserve_bootmem_generic(ebda_addr, ebda_size); 447 + #ifdef CONFIG_NUMA 448 + /* reserve nodemap region */ 449 + if (nodemap_addr) 450 + reserve_bootmem_generic(nodemap_addr, nodemap_size); 451 + #endif 325 452 326 453 #ifdef CONFIG_SMP 327 454 /* ··· 402 519 init_apic_mappings(); 403 520 404 521 /* 405 - * Request address space for all standard RAM and ROM resources 406 - * and also for regions reported as reserved by the e820. 407 - */ 408 - probe_roms(); 522 + * We trust e820 completely. No explicit ROM probing in memory. 523 + */ 409 524 e820_reserve_resources(); 410 525 e820_mark_nosave_regions(); 411 - 412 - request_resource(&iomem_resource, &video_ram_resource); 413 526 414 527 { 415 528 unsigned i; ··· 942 1063 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 943 1064 NULL, NULL, NULL, "syscall", NULL, NULL, NULL, NULL, 944 1065 NULL, NULL, NULL, NULL, "nx", NULL, "mmxext", NULL, 945 - NULL, "fxsr_opt", NULL, "rdtscp", NULL, "lm", "3dnowext", "3dnow", 1066 + NULL, "fxsr_opt", "pdpe1gb", "rdtscp", NULL, "lm", 1067 + "3dnowext", "3dnow", 946 1068 947 1069 /* Transmeta-defined */ 948 1070 "recovery", "longrun", NULL, "lrti", NULL, NULL, NULL, NULL, ··· 961 1081 /* Intel-defined (#2) */ 962 1082 "pni", NULL, NULL, "monitor", "ds_cpl", "vmx", "smx", "est", 963 1083 "tm2", "ssse3", "cid", NULL, NULL, "cx16", "xtpr", NULL, 964 - NULL, NULL, "dca", NULL, NULL, NULL, NULL, NULL, 1084 + NULL, NULL, "dca", NULL, NULL, NULL, NULL, "popcnt", 965 1085 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 966 1086 967 1087 /* VIA/Cyrix/Centaur-defined */ ··· 971 1091 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 972 1092 973 1093 /* AMD-defined (#2) */ 974 - "lahf_lm", "cmp_legacy", "svm", NULL, "cr8_legacy", NULL, NULL, NULL, 975 - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 1094 + "lahf_lm", "cmp_legacy", "svm", "extapic", "cr8_legacy", 1095 + "altmovcr8", "abm", "sse4a", 1096 
+ "misalignsse", "3dnowprefetch", 1097 + "osvw", "ibs", NULL, NULL, NULL, NULL, 976 1098 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 977 1099 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 978 1100 }; ··· 985 1103 "ttp", /* thermal trip */ 986 1104 "tm", 987 1105 "stc", 1106 + "100mhzsteps", 1107 + "hwpstate", 1108 + NULL, /* tsc invariant mapped to constant_tsc */ 988 1109 NULL, 989 1110 /* nothing */ /* constant_tsc - moved to flags */ 990 1111 }; ··· 1104 1219 .stop = c_stop, 1105 1220 .show = show_cpuinfo, 1106 1221 }; 1107 - 1108 - #if defined(CONFIG_INPUT_PCSPKR) || defined(CONFIG_INPUT_PCSPKR_MODULE) 1109 - #include <linux/platform_device.h> 1110 - static __init int add_pcspkr(void) 1111 - { 1112 - struct platform_device *pd; 1113 - int ret; 1114 - 1115 - pd = platform_device_alloc("pcspkr", -1); 1116 - if (!pd) 1117 - return -ENOMEM; 1118 - 1119 - ret = platform_device_add(pd); 1120 - if (ret) 1121 - platform_device_put(pd); 1122 - 1123 - return ret; 1124 - } 1125 - device_initcall(add_pcspkr); 1126 - #endif
-1
arch/x86_64/kernel/setup64.c
··· 37 37 char boot_cpu_stack[IRQSTACKSIZE] __attribute__((section(".bss.page_aligned"))); 38 38 39 39 unsigned long __supported_pte_mask __read_mostly = ~0UL; 40 - EXPORT_SYMBOL(__supported_pte_mask); 41 40 static int do_not_nx __cpuinitdata = 0; 42 41 43 42 /* noexec=on|off
+3 -2
arch/x86_64/kernel/stacktrace.c
··· 32 32 trace->skip--; 33 33 return; 34 34 } 35 - if (trace->nr_entries < trace->max_entries - 1) 35 + if (trace->nr_entries < trace->max_entries) 36 36 trace->entries[trace->nr_entries++] = addr; 37 37 } 38 38 ··· 49 49 void save_stack_trace(struct stack_trace *trace, struct task_struct *task) 50 50 { 51 51 dump_trace(task, NULL, NULL, &save_stack_ops, trace); 52 - trace->entries[trace->nr_entries++] = ULONG_MAX; 52 + if (trace->nr_entries < trace->max_entries) 53 + trace->entries[trace->nr_entries++] = ULONG_MAX; 53 54 } 54 55 EXPORT_SYMBOL(save_stack_trace); 55 56
+10 -4
arch/x86_64/kernel/time.c
··· 657 657 658 658 #define TICK_COUNT 100000000 659 659 #define TICK_MIN 5000 660 + #define MAX_READ_RETRIES 5 660 661 661 662 /* 662 663 * Some platforms take periodic SMI interrupts with 5ms duration. Make sure none ··· 665 664 */ 666 665 static void __init read_hpet_tsc(int *hpet, int *tsc) 667 666 { 668 - int tsc1, tsc2, hpet1; 667 + int tsc1, tsc2, hpet1, retries = 0; 668 + static int msg; 669 669 670 670 do { 671 671 tsc1 = get_cycles_sync(); 672 672 hpet1 = hpet_readl(HPET_COUNTER); 673 673 tsc2 = get_cycles_sync(); 674 - } while (tsc2 - tsc1 > TICK_MIN); 674 + } while (tsc2 - tsc1 > TICK_MIN && retries++ < MAX_READ_RETRIES); 675 + if (retries >= MAX_READ_RETRIES && !msg++) 676 + printk(KERN_WARNING 677 + "hpet.c: exceeded max retries to read HPET & TSC\n"); 675 678 *hpet = hpet1; 676 679 *tsc = tsc2; 677 680 } ··· 1226 1221 if (PIE_on) 1227 1222 PIE_count += lost_ints; 1228 1223 1229 - printk(KERN_WARNING "rtc: lost some interrupts at %ldHz.\n", 1230 - hpet_rtc_int_freq); 1224 + if (printk_ratelimit()) 1225 + printk(KERN_WARNING "rtc: lost some interrupts at %ldHz.\n", 1226 + hpet_rtc_int_freq); 1231 1227 } 1232 1228 } 1233 1229
+3 -2
arch/x86_64/kernel/x8664_ksyms.c
··· 26 26 EXPORT_SYMBOL(__put_user_8); 27 27 28 28 EXPORT_SYMBOL(copy_user_generic); 29 + EXPORT_SYMBOL(__copy_user_nocache); 29 30 EXPORT_SYMBOL(copy_from_user); 30 31 EXPORT_SYMBOL(copy_to_user); 31 32 EXPORT_SYMBOL(__copy_from_user_inatomic); ··· 35 34 EXPORT_SYMBOL(clear_page); 36 35 37 36 #ifdef CONFIG_SMP 38 - extern void FASTCALL( __write_lock_failed(rwlock_t *rw)); 39 - extern void FASTCALL( __read_lock_failed(rwlock_t *rw)); 37 + extern void __write_lock_failed(rwlock_t *rw); 38 + extern void __read_lock_failed(rwlock_t *rw); 40 39 EXPORT_SYMBOL(__write_lock_failed); 41 40 EXPORT_SYMBOL(__read_lock_failed); 42 41 #endif
+1 -1
arch/x86_64/lib/Makefile
··· 9 9 lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \ 10 10 usercopy.o getuser.o putuser.o \ 11 11 thunk.o clear_page.o copy_page.o bitstr.o bitops.o 12 - lib-y += memcpy.o memmove.o memset.o copy_user.o rwlock.o 12 + lib-y += memcpy.o memmove.o memset.o copy_user.o rwlock.o copy_user_nocache.o
+217
arch/x86_64/lib/copy_user_nocache.S
··· 1 + /* Copyright 2002 Andi Kleen, SuSE Labs. 2 + * Subject to the GNU Public License v2. 3 + * 4 + * Functions to copy from and to user space. 5 + */ 6 + 7 + #include <linux/linkage.h> 8 + #include <asm/dwarf2.h> 9 + 10 + #define FIX_ALIGNMENT 1 11 + 12 + #include <asm/current.h> 13 + #include <asm/asm-offsets.h> 14 + #include <asm/thread_info.h> 15 + #include <asm/cpufeature.h> 16 + 17 + /* 18 + * copy_user_nocache - Uncached memory copy with exception handling 19 + * This will force destination/source out of cache for more performance. 20 + * 21 + * Input: 22 + * rdi destination 23 + * rsi source 24 + * rdx count 25 + * rcx zero flag when 1 zero on exception 26 + * 27 + * Output: 28 + * eax uncopied bytes or 0 if successful. 29 + */ 30 + ENTRY(__copy_user_nocache) 31 + CFI_STARTPROC 32 + pushq %rbx 33 + CFI_ADJUST_CFA_OFFSET 8 34 + CFI_REL_OFFSET rbx, 0 35 + pushq %rcx /* save zero flag */ 36 + CFI_ADJUST_CFA_OFFSET 8 37 + CFI_REL_OFFSET rcx, 0 38 + 39 + xorl %eax,%eax /* zero for the exception handler */ 40 + 41 + #ifdef FIX_ALIGNMENT 42 + /* check for bad alignment of destination */ 43 + movl %edi,%ecx 44 + andl $7,%ecx 45 + jnz .Lbad_alignment 46 + .Lafter_bad_alignment: 47 + #endif 48 + 49 + movq %rdx,%rcx 50 + 51 + movl $64,%ebx 52 + shrq $6,%rdx 53 + decq %rdx 54 + js .Lhandle_tail 55 + 56 + .p2align 4 57 + .Lloop: 58 + .Ls1: movq (%rsi),%r11 59 + .Ls2: movq 1*8(%rsi),%r8 60 + .Ls3: movq 2*8(%rsi),%r9 61 + .Ls4: movq 3*8(%rsi),%r10 62 + .Ld1: movnti %r11,(%rdi) 63 + .Ld2: movnti %r8,1*8(%rdi) 64 + .Ld3: movnti %r9,2*8(%rdi) 65 + .Ld4: movnti %r10,3*8(%rdi) 66 + 67 + .Ls5: movq 4*8(%rsi),%r11 68 + .Ls6: movq 5*8(%rsi),%r8 69 + .Ls7: movq 6*8(%rsi),%r9 70 + .Ls8: movq 7*8(%rsi),%r10 71 + .Ld5: movnti %r11,4*8(%rdi) 72 + .Ld6: movnti %r8,5*8(%rdi) 73 + .Ld7: movnti %r9,6*8(%rdi) 74 + .Ld8: movnti %r10,7*8(%rdi) 75 + 76 + dec %rdx 77 + 78 + leaq 64(%rsi),%rsi 79 + leaq 64(%rdi),%rdi 80 + 81 + jns .Lloop 82 + 83 + .p2align 4 84 + .Lhandle_tail: 85 + movl 
%ecx,%edx 86 + andl $63,%ecx 87 + shrl $3,%ecx 88 + jz .Lhandle_7 89 + movl $8,%ebx 90 + .p2align 4 91 + .Lloop_8: 92 + .Ls9: movq (%rsi),%r8 93 + .Ld9: movnti %r8,(%rdi) 94 + decl %ecx 95 + leaq 8(%rdi),%rdi 96 + leaq 8(%rsi),%rsi 97 + jnz .Lloop_8 98 + 99 + .Lhandle_7: 100 + movl %edx,%ecx 101 + andl $7,%ecx 102 + jz .Lende 103 + .p2align 4 104 + .Lloop_1: 105 + .Ls10: movb (%rsi),%bl 106 + .Ld10: movb %bl,(%rdi) 107 + incq %rdi 108 + incq %rsi 109 + decl %ecx 110 + jnz .Lloop_1 111 + 112 + CFI_REMEMBER_STATE 113 + .Lende: 114 + popq %rcx 115 + CFI_ADJUST_CFA_OFFSET -8 116 + CFI_RESTORE %rcx 117 + popq %rbx 118 + CFI_ADJUST_CFA_OFFSET -8 119 + CFI_RESTORE rbx 120 + ret 121 + CFI_RESTORE_STATE 122 + 123 + #ifdef FIX_ALIGNMENT 124 + /* align destination */ 125 + .p2align 4 126 + .Lbad_alignment: 127 + movl $8,%r9d 128 + subl %ecx,%r9d 129 + movl %r9d,%ecx 130 + cmpq %r9,%rdx 131 + jz .Lhandle_7 132 + js .Lhandle_7 133 + .Lalign_1: 134 + .Ls11: movb (%rsi),%bl 135 + .Ld11: movb %bl,(%rdi) 136 + incq %rsi 137 + incq %rdi 138 + decl %ecx 139 + jnz .Lalign_1 140 + subq %r9,%rdx 141 + jmp .Lafter_bad_alignment 142 + #endif 143 + 144 + /* table sorted by exception address */ 145 + .section __ex_table,"a" 146 + .align 8 147 + .quad .Ls1,.Ls1e 148 + .quad .Ls2,.Ls2e 149 + .quad .Ls3,.Ls3e 150 + .quad .Ls4,.Ls4e 151 + .quad .Ld1,.Ls1e 152 + .quad .Ld2,.Ls2e 153 + .quad .Ld3,.Ls3e 154 + .quad .Ld4,.Ls4e 155 + .quad .Ls5,.Ls5e 156 + .quad .Ls6,.Ls6e 157 + .quad .Ls7,.Ls7e 158 + .quad .Ls8,.Ls8e 159 + .quad .Ld5,.Ls5e 160 + .quad .Ld6,.Ls6e 161 + .quad .Ld7,.Ls7e 162 + .quad .Ld8,.Ls8e 163 + .quad .Ls9,.Le_quad 164 + .quad .Ld9,.Le_quad 165 + .quad .Ls10,.Le_byte 166 + .quad .Ld10,.Le_byte 167 + #ifdef FIX_ALIGNMENT 168 + .quad .Ls11,.Lzero_rest 169 + .quad .Ld11,.Lzero_rest 170 + #endif 171 + .quad .Le5,.Le_zero 172 + .previous 173 + 174 + /* compute 64-offset for main loop. 8 bytes accuracy with error on the 175 + pessimistic side. this is gross. 
it would be better to fix the 176 + interface. */ 177 + /* eax: zero, ebx: 64 */ 178 + .Ls1e: addl $8,%eax 179 + .Ls2e: addl $8,%eax 180 + .Ls3e: addl $8,%eax 181 + .Ls4e: addl $8,%eax 182 + .Ls5e: addl $8,%eax 183 + .Ls6e: addl $8,%eax 184 + .Ls7e: addl $8,%eax 185 + .Ls8e: addl $8,%eax 186 + addq %rbx,%rdi /* +64 */ 187 + subq %rax,%rdi /* correct destination with computed offset */ 188 + 189 + shlq $6,%rdx /* loop counter * 64 (stride length) */ 190 + addq %rax,%rdx /* add offset to loopcnt */ 191 + andl $63,%ecx /* remaining bytes */ 192 + addq %rcx,%rdx /* add them */ 193 + jmp .Lzero_rest 194 + 195 + /* exception on quad word loop in tail handling */ 196 + /* ecx: loopcnt/8, %edx: length, rdi: correct */ 197 + .Le_quad: 198 + shll $3,%ecx 199 + andl $7,%edx 200 + addl %ecx,%edx 201 + /* edx: bytes to zero, rdi: dest, eax:zero */ 202 + .Lzero_rest: 203 + cmpl $0,(%rsp) /* zero flag set? */ 204 + jz .Le_zero 205 + movq %rdx,%rcx 206 + .Le_byte: 207 + xorl %eax,%eax 208 + .Le5: rep 209 + stosb 210 + /* when there is another exception while zeroing the rest just return */ 211 + .Le_zero: 212 + movq %rdx,%rax 213 + jmp .Lende 214 + CFI_ENDPROC 215 + ENDPROC(__copy_user_nocache) 216 + 217 +
+8 -10
arch/x86_64/mm/fault.c
··· 56 56 } 57 57 EXPORT_SYMBOL_GPL(unregister_page_fault_notifier); 58 58 59 - static inline int notify_page_fault(enum die_val val, const char *str, 60 - struct pt_regs *regs, long err, int trap, int sig) 59 + static inline int notify_page_fault(struct pt_regs *regs, long err) 61 60 { 62 61 struct die_args args = { 63 62 .regs = regs, 64 - .str = str, 63 + .str = "page fault", 65 64 .err = err, 66 - .trapnr = trap, 67 - .signr = sig 65 + .trapnr = 14, 66 + .signr = SIGSEGV 68 67 }; 69 - return atomic_notifier_call_chain(&notify_page_fault_chain, val, &args); 68 + return atomic_notifier_call_chain(&notify_page_fault_chain, 69 + DIE_PAGE_FAULT, &args); 70 70 } 71 71 72 72 /* Sometimes the CPU reports invalid exceptions on prefetch. ··· 355 355 if (vmalloc_fault(address) >= 0) 356 356 return; 357 357 } 358 - if (notify_page_fault(DIE_PAGE_FAULT, "page fault", regs, error_code, 14, 359 - SIGSEGV) == NOTIFY_STOP) 358 + if (notify_page_fault(regs, error_code) == NOTIFY_STOP) 360 359 return; 361 360 /* 362 361 * Don't take the mm semaphore here. If we fixup a prefetch ··· 364 365 goto bad_area_nosemaphore; 365 366 } 366 367 367 - if (notify_page_fault(DIE_PAGE_FAULT, "page fault", regs, error_code, 14, 368 - SIGSEGV) == NOTIFY_STOP) 368 + if (notify_page_fault(regs, error_code) == NOTIFY_STOP) 369 369 return; 370 370 371 371 if (likely(regs->eflags & X86_EFLAGS_IF))
+164 -38
arch/x86_64/mm/numa.c
··· 36 36 cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly; 37 37 38 38 int numa_off __initdata; 39 + unsigned long __initdata nodemap_addr; 40 + unsigned long __initdata nodemap_size; 39 41 40 42 41 43 /* ··· 54 52 int res = -1; 55 53 unsigned long addr, end; 56 54 57 - if (shift >= 64) 58 - return -1; 59 - memset(memnodemap, 0xff, sizeof(memnodemap)); 55 + memset(memnodemap, 0xff, memnodemapsize); 60 56 for (i = 0; i < numnodes; i++) { 61 57 addr = nodes[i].start; 62 58 end = nodes[i].end; 63 59 if (addr >= end) 64 60 continue; 65 - if ((end >> shift) >= NODEMAPSIZE) 61 + if ((end >> shift) >= memnodemapsize) 66 62 return 0; 67 63 do { 68 64 if (memnodemap[addr >> shift] != 0xff) 69 65 return -1; 70 66 memnodemap[addr >> shift] = i; 71 - addr += (1UL << shift); 67 + addr += (1UL << shift); 72 68 } while (addr < end); 73 69 res = 1; 74 70 } 75 71 return res; 76 72 } 77 73 74 + static int __init allocate_cachealigned_memnodemap(void) 75 + { 76 + unsigned long pad, pad_addr; 77 + 78 + memnodemap = memnode.embedded_map; 79 + if (memnodemapsize <= 48) 80 + return 0; 81 + 82 + pad = L1_CACHE_BYTES - 1; 83 + pad_addr = 0x8000; 84 + nodemap_size = pad + memnodemapsize; 85 + nodemap_addr = find_e820_area(pad_addr, end_pfn<<PAGE_SHIFT, 86 + nodemap_size); 87 + if (nodemap_addr == -1UL) { 88 + printk(KERN_ERR 89 + "NUMA: Unable to allocate Memory to Node hash map\n"); 90 + nodemap_addr = nodemap_size = 0; 91 + return -1; 92 + } 93 + pad_addr = (nodemap_addr + pad) & ~pad; 94 + memnodemap = phys_to_virt(pad_addr); 95 + 96 + printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n", 97 + nodemap_addr, nodemap_addr + nodemap_size); 98 + return 0; 99 + } 100 + 101 + /* 102 + * The LSB of all start and end addresses in the node map is the value of the 103 + * maximum possible shift. 
104 + */ 105 + static int __init 106 + extract_lsb_from_nodes (const struct bootnode *nodes, int numnodes) 107 + { 108 + int i, nodes_used = 0; 109 + unsigned long start, end; 110 + unsigned long bitfield = 0, memtop = 0; 111 + 112 + for (i = 0; i < numnodes; i++) { 113 + start = nodes[i].start; 114 + end = nodes[i].end; 115 + if (start >= end) 116 + continue; 117 + bitfield |= start; 118 + nodes_used++; 119 + if (end > memtop) 120 + memtop = end; 121 + } 122 + if (nodes_used <= 1) 123 + i = 63; 124 + else 125 + i = find_first_bit(&bitfield, sizeof(unsigned long)*8); 126 + memnodemapsize = (memtop >> i)+1; 127 + return i; 128 + } 129 + 78 130 int __init compute_hash_shift(struct bootnode *nodes, int numnodes) 79 131 { 80 - int shift = 20; 132 + int shift; 81 133 82 - while (populate_memnodemap(nodes, numnodes, shift + 1) >= 0) 83 - shift++; 84 - 134 + shift = extract_lsb_from_nodes(nodes, numnodes); 135 + if (allocate_cachealigned_memnodemap()) 136 + return -1; 85 137 printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", 86 138 shift); 87 139 ··· 272 216 } 273 217 274 218 #ifdef CONFIG_NUMA_EMU 219 + /* Numa emulation */ 275 220 int numa_fake __initdata = 0; 276 221 277 - /* Numa emulation */ 222 + /* 223 + * This function is used to find out if the start and end correspond to 224 + * different zones. 
225 + */ 226 + int zone_cross_over(unsigned long start, unsigned long end) 227 + { 228 + if ((start < (MAX_DMA32_PFN << PAGE_SHIFT)) && 229 + (end >= (MAX_DMA32_PFN << PAGE_SHIFT))) 230 + return 1; 231 + return 0; 232 + } 233 + 278 234 static int __init numa_emulation(unsigned long start_pfn, unsigned long end_pfn) 279 235 { 280 - int i; 236 + int i, big; 281 237 struct bootnode nodes[MAX_NUMNODES]; 282 - unsigned long sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake; 238 + unsigned long sz, old_sz; 239 + unsigned long hole_size; 240 + unsigned long start, end; 241 + unsigned long max_addr = (end_pfn << PAGE_SHIFT); 242 + 243 + start = (start_pfn << PAGE_SHIFT); 244 + hole_size = e820_hole_size(start, max_addr); 245 + sz = (max_addr - start - hole_size) / numa_fake; 283 246 284 247 /* Kludge needed for the hash function */ 285 - if (hweight64(sz) > 1) { 286 - unsigned long x = 1; 287 - while ((x << 1) < sz) 288 - x <<= 1; 289 - if (x < sz/2) 290 - printk(KERN_ERR "Numa emulation unbalanced. Complain to maintainer\n"); 291 - sz = x; 292 - } 293 248 249 + old_sz = sz; 250 + /* 251 + * Round down to the nearest FAKE_NODE_MIN_SIZE. 252 + */ 253 + sz &= FAKE_NODE_MIN_HASH_MASK; 254 + 255 + /* 256 + * We ensure that each node is at least 64MB big. Smaller than this 257 + * size can cause VM hiccups. 258 + */ 259 + if (sz == 0) { 260 + printk(KERN_INFO "Not enough memory for %d nodes. Reducing " 261 + "the number of nodes\n", numa_fake); 262 + numa_fake = (max_addr - start - hole_size) / FAKE_NODE_MIN_SIZE; 263 + printk(KERN_INFO "Number of fake nodes will be = %d\n", 264 + numa_fake); 265 + sz = FAKE_NODE_MIN_SIZE; 266 + } 267 + /* 268 + * Find out how many nodes can get an extra NODE_MIN_SIZE granule. 269 + * This logic ensures the extra memory gets distributed among as many 270 + * nodes as possible (as compared to one single node getting all that 271 + * extra memory. 
272 + */ 273 + big = ((old_sz - sz) * numa_fake) / FAKE_NODE_MIN_SIZE; 274 + printk(KERN_INFO "Fake node Size: %luMB hole_size: %luMB big nodes: " 275 + "%d\n", 276 + (sz >> 20), (hole_size >> 20), big); 294 277 memset(&nodes,0,sizeof(nodes)); 278 + end = start; 295 279 for (i = 0; i < numa_fake; i++) { 296 - nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz; 280 + /* 281 + * In case we are not able to allocate enough memory for all 282 + * the nodes, we reduce the number of fake nodes. 283 + */ 284 + if (end >= max_addr) { 285 + numa_fake = i - 1; 286 + break; 287 + } 288 + start = nodes[i].start = end; 289 + /* 290 + * Final node can have all the remaining memory. 291 + */ 297 292 if (i == numa_fake-1) 298 - sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start; 299 - nodes[i].end = nodes[i].start + sz; 293 + sz = max_addr - start; 294 + end = nodes[i].start + sz; 295 + /* 296 + * Fir "big" number of nodes get extra granule. 297 + */ 298 + if (i < big) 299 + end += FAKE_NODE_MIN_SIZE; 300 + /* 301 + * Iterate over the range to ensure that this node gets at 302 + * least sz amount of RAM (excluding holes) 303 + */ 304 + while ((end - start - e820_hole_size(start, end)) < sz) { 305 + end += FAKE_NODE_MIN_SIZE; 306 + if (end >= max_addr) 307 + break; 308 + } 309 + /* 310 + * Look at the next node to make sure there is some real memory 311 + * to map. Bad things happen when the only memory present 312 + * in a zone on a fake node is IO hole. 
313 + */ 314 + while (e820_hole_size(end, end + FAKE_NODE_MIN_SIZE) > 0) { 315 + if (zone_cross_over(start, end + sz)) { 316 + end = (MAX_DMA32_PFN << PAGE_SHIFT); 317 + break; 318 + } 319 + if (end >= max_addr) 320 + break; 321 + end += FAKE_NODE_MIN_SIZE; 322 + } 323 + if (end > max_addr) 324 + end = max_addr; 325 + nodes[i].end = end; 300 326 printk(KERN_INFO "Faking node %d at %016Lx-%016Lx (%LuMB)\n", 301 327 i, 302 328 nodes[i].start, nodes[i].end, ··· 428 290 end_pfn << PAGE_SHIFT); 429 291 /* setup dummy node covering all memory */ 430 292 memnode_shift = 63; 293 + memnodemap = memnode.embedded_map; 431 294 memnodemap[0] = 0; 432 295 nodes_clear(node_online_map); 433 296 node_set_online(0); ··· 460 321 return pages; 461 322 } 462 323 463 - #ifdef CONFIG_SPARSEMEM 464 - static void __init arch_sparse_init(void) 465 - { 466 - int i; 467 - 468 - for_each_online_node(i) 469 - memory_present(i, node_start_pfn(i), node_end_pfn(i)); 470 - 471 - sparse_init(); 472 - } 473 - #else 474 - #define arch_sparse_init() do {} while (0) 475 - #endif 476 - 477 324 void __init paging_init(void) 478 325 { 479 326 int i; ··· 469 344 max_zone_pfns[ZONE_DMA32] = MAX_DMA32_PFN; 470 345 max_zone_pfns[ZONE_NORMAL] = end_pfn; 471 346 472 - arch_sparse_init(); 347 + sparse_memory_present_with_active_regions(MAX_NUMNODES); 348 + sparse_init(); 473 349 474 350 for_each_online_node(i) { 475 351 setup_node_zones(i);
+3 -1
arch/x86_64/mm/pageattr.c
··· 107 107 pud_t *pud; 108 108 pmd_t *pmd; 109 109 pte_t large_pte; 110 + unsigned long pfn; 110 111 111 112 pgd = pgd_offset_k(address); 112 113 BUG_ON(pgd_none(*pgd)); ··· 115 114 BUG_ON(pud_none(*pud)); 116 115 pmd = pmd_offset(pud, address); 117 116 BUG_ON(pmd_val(*pmd) & _PAGE_PSE); 118 - large_pte = mk_pte_phys(__pa(address) & LARGE_PAGE_MASK, ref_prot); 117 + pfn = (__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT; 118 + large_pte = pfn_pte(pfn, ref_prot); 119 119 large_pte = pte_mkhuge(large_pte); 120 120 set_pte((pte_t *)pmd, large_pte); 121 121 }
+2 -1
arch/x86_64/pci/Makefile
··· 11 11 obj-$(CONFIG_ACPI) += acpi.o 12 12 obj-y += legacy.o irq.o common.o early.o 13 13 # mmconfig has a 64bit special 14 - obj-$(CONFIG_PCI_MMCONFIG) += mmconfig.o direct.o 14 + obj-$(CONFIG_PCI_MMCONFIG) += mmconfig.o direct.o mmconfig-shared.o 15 15 16 16 obj-$(CONFIG_NUMA) += k8-bus.o 17 17 ··· 24 24 i386-y += ../../i386/pci/i386.o 25 25 init-y += ../../i386/pci/init.o 26 26 early-y += ../../i386/pci/early.o 27 + mmconfig-shared-y += ../../i386/pci/mmconfig-shared.o
+29 -85
arch/x86_64/pci/mmconfig.c
··· 13 13 14 14 #include "pci.h" 15 15 16 - /* aperture is up to 256MB but BIOS may reserve less */ 17 - #define MMCONFIG_APER_MIN (2 * 1024*1024) 18 - #define MMCONFIG_APER_MAX (256 * 1024*1024) 19 - 20 - /* Verify the first 16 busses. We assume that systems with more busses 21 - get MCFG right. */ 22 - #define MAX_CHECK_BUS 16 23 - 24 - static DECLARE_BITMAP(fallback_slots, 32*MAX_CHECK_BUS); 25 - 26 16 /* Static virtual mapping of the MMCONFIG aperture */ 27 17 struct mmcfg_virt { 28 18 struct acpi_mcfg_allocation *cfg; ··· 22 32 23 33 static char __iomem *get_virt(unsigned int seg, unsigned bus) 24 34 { 25 - int cfg_num = -1; 26 35 struct acpi_mcfg_allocation *cfg; 36 + int cfg_num; 27 37 28 - while (1) { 29 - ++cfg_num; 30 - if (cfg_num >= pci_mmcfg_config_num) 31 - break; 38 + for (cfg_num = 0; cfg_num < pci_mmcfg_config_num; cfg_num++) { 32 39 cfg = pci_mmcfg_virt[cfg_num].cfg; 33 - if (cfg->pci_segment != seg) 34 - continue; 35 - if ((cfg->start_bus_number <= bus) && 40 + if (cfg->pci_segment == seg && 41 + (cfg->start_bus_number <= bus) && 36 42 (cfg->end_bus_number >= bus)) 37 43 return pci_mmcfg_virt[cfg_num].virt; 38 44 } 39 - 40 - /* Handle more broken MCFG tables on Asus etc. 41 - They only contain a single entry for bus 0-0. Assume 42 - this applies to all busses. */
43 - cfg = &pci_mmcfg_config[0]; 44 - if (pci_mmcfg_config_num == 1 && 45 - cfg->pci_segment == 0 && 46 - (cfg->start_bus_number | cfg->end_bus_number) == 0) 47 - return pci_mmcfg_virt[0].virt; 48 45 49 46 /* Fall back to type 0 */ 50 47 return NULL; ··· 40 63 static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn) 41 64 { 42 65 char __iomem *addr; 43 - if (seg == 0 && bus < MAX_CHECK_BUS && 44 - test_bit(32*bus + PCI_SLOT(devfn), fallback_slots)) 66 + if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS && 67 + test_bit(32*bus + PCI_SLOT(devfn), pci_mmcfg_fallback_slots)) 45 68 return NULL; 46 69 addr = get_virt(seg, bus); 47 70 if (!addr) ··· 112 135 .write = pci_mmcfg_write, 113 136 }; 114 137 115 - /* K8 systems have some devices (typically in the builtin northbridge) 116 - that are only accessible using type1 117 - Normally this can be expressed in the MCFG by not listing them 118 - and assigning suitable _SEGs, but this isn't implemented in some BIOS. 119 - Instead try to discover all devices on bus 0 that are unreachable using MM 120 - and fallback for them. */ 121 - static __init void unreachable_devices(void) 138 + static void __iomem * __init mcfg_ioremap(struct acpi_mcfg_allocation *cfg) 122 139 { 123 - int i, k; 124 - /* Use the max bus number from ACPI here? */
125 - for (k = 0; k < MAX_CHECK_BUS; k++) { 126 - for (i = 0; i < 32; i++) { 127 - u32 val1; 128 - char __iomem *addr; 140 + void __iomem *addr; 141 + u32 size; 129 142 130 - pci_conf1_read(0, k, PCI_DEVFN(i,0), 0, 4, &val1); 131 - if (val1 == 0xffffffff) 132 - continue; 133 - addr = pci_dev_base(0, k, PCI_DEVFN(i, 0)); 134 - if (addr == NULL|| readl(addr) != val1) { 135 - set_bit(i + 32*k, fallback_slots); 136 - printk(KERN_NOTICE "PCI: No mmconfig possible" 137 - " on device %02x:%02x\n", k, i); 138 - } 139 - } 143 + size = (cfg->end_bus_number + 1) << 20; 144 + addr = ioremap_nocache(cfg->address, size); 145 + if (addr) { 146 + printk(KERN_INFO "PCI: Using MMCONFIG at %Lx - %Lx\n", 147 + cfg->address, cfg->address + size - 1); 140 148 } 149 + return addr; 141 150 } 142 151 143 - void __init pci_mmcfg_init(int type) 152 + int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus, 153 + unsigned int devfn) 154 + { 155 + return pci_dev_base(seg, bus, devfn) != NULL; 156 + } 157 + 158 + int __init pci_mmcfg_arch_init(void) 144 159 { 145 160 int i; 146 - 147 - if ((pci_probe & PCI_PROBE_MMCONF) == 0) 148 - return; 149 - 150 - acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg); 151 - if ((pci_mmcfg_config_num == 0) || 152 - (pci_mmcfg_config == NULL) || 153 - (pci_mmcfg_config[0].address == 0)) 154 - return; 155 - 156 - /* Only do this check when type 1 works. If it doesn't work
157 - assume we run on a Mac and always use MCFG */ 158 - if (type == 1 && !e820_all_mapped(pci_mmcfg_config[0].address, 159 - pci_mmcfg_config[0].address + MMCONFIG_APER_MIN, 160 - E820_RESERVED)) { 161 - printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %lx is not E820-reserved\n", 162 - (unsigned long)pci_mmcfg_config[0].address); 163 - printk(KERN_ERR "PCI: Not using MMCONFIG.\n"); 164 - return; 165 - } 166 - 167 - pci_mmcfg_virt = kmalloc(sizeof(*pci_mmcfg_virt) * pci_mmcfg_config_num, GFP_KERNEL); 161 + pci_mmcfg_virt = kmalloc(sizeof(*pci_mmcfg_virt) * 162 + pci_mmcfg_config_num, GFP_KERNEL); 168 163 if (pci_mmcfg_virt == NULL) { 169 164 printk(KERN_ERR "PCI: Can not allocate memory for mmconfig structures\n"); 170 - return; 165 + return 0; 171 166 } 167 + 172 168 for (i = 0; i < pci_mmcfg_config_num; ++i) { 173 169 pci_mmcfg_virt[i].cfg = &pci_mmcfg_config[i]; 174 - pci_mmcfg_virt[i].virt = ioremap_nocache(pci_mmcfg_config[i].address, 175 - MMCONFIG_APER_MAX); 170 + pci_mmcfg_virt[i].virt = mcfg_ioremap(&pci_mmcfg_config[i]); 176 171 if (!pci_mmcfg_virt[i].virt) { 177 172 printk(KERN_ERR "PCI: Cannot map mmconfig aperture for " 178 173 "segment %d\n", 179 174 pci_mmcfg_config[i].pci_segment); 180 - return; 175 + return 0; 181 176 } 182 - printk(KERN_INFO "PCI: Using MMCONFIG at %lx\n", 183 - (unsigned long)pci_mmcfg_config[i].address); 184 177 } 185 - 186 - unreachable_devices(); 187 - 188 178 raw_pci_ops = &pci_mmcfg; 189 - pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF; 179 + return 1; 190 180 }
+9
drivers/acpi/namespace/nsinit.c
··· 45 45 #include <acpi/acnamesp.h> 46 46 #include <acpi/acdispat.h> 47 47 #include <acpi/acinterp.h> 48 + #include <linux/nmi.h> 48 49 49 50 #define _COMPONENT ACPI_NAMESPACE 50 51 ACPI_MODULE_NAME("nsinit") ··· 535 534 info->parameter_type = ACPI_PARAM_ARGS; 536 535 info->flags = ACPI_IGNORE_RETURN_VALUE; 537 536 537 + /* 538 + * Some hardware relies on this being executed as atomically 539 + * as possible (without an NMI being received in the middle of 540 + * this) - so disable NMIs and initialize the device: 541 + */ 542 + acpi_nmi_disable(); 538 543 status = acpi_ns_evaluate(info); 544 + acpi_nmi_enable(); 545 + 539 546 if (ACPI_SUCCESS(status)) { 540 547 walk_info->num_INI++; 541 548
+6 -6
drivers/kvm/vmx.c
··· 1879 1879 1880 1880 asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS)); 1881 1881 1882 - /* 1883 - * Profile KVM exit RIPs: 1884 - */ 1885 - if (unlikely(prof_on == KVM_PROFILING)) 1886 - profile_hit(KVM_PROFILING, (void *)vmcs_readl(GUEST_RIP)); 1887 - 1888 1882 kvm_run->exit_type = 0; 1889 1883 if (fail) { 1890 1884 kvm_run->exit_type = KVM_EXIT_TYPE_FAIL_ENTRY; ··· 1901 1907 1902 1908 reload_tss(); 1903 1909 } 1910 + /* 1911 + * Profile KVM exit RIPs: 1912 + */ 1913 + if (unlikely(prof_on == KVM_PROFILING)) 1914 + profile_hit(KVM_PROFILING, (void *)vmcs_readl(GUEST_RIP)); 1915 + 1904 1916 vcpu->launched = 1; 1905 1917 kvm_run->exit_type = KVM_EXIT_TYPE_VM_EXIT; 1906 1918 r = kvm_handle_exit(kvm_run, vcpu);
+2 -1
fs/binfmt_elf.c
··· 76 76 .load_binary = load_elf_binary, 77 77 .load_shlib = load_elf_library, 78 78 .core_dump = elf_core_dump, 79 - .min_coredump = ELF_EXEC_PAGESIZE 79 + .min_coredump = ELF_EXEC_PAGESIZE, 80 + .hasvdso = 1 80 81 }; 81 82 82 83 #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
+13
include/asm-generic/pgtable.h
··· 183 183 #endif 184 184 185 185 /* 186 + * A facility to provide batching of the reload of page tables with the 187 + * actual context switch code for paravirtualized guests. By convention, 188 + * only one of the lazy modes (CPU, MMU) should be active at any given 189 + * time, entry should never be nested, and entry and exits should always 190 + * be paired. This is for sanity of maintaining and reasoning about the 191 + * kernel code. 192 + */ 193 + #ifndef __HAVE_ARCH_ENTER_LAZY_CPU_MODE 194 + #define arch_enter_lazy_cpu_mode() do {} while (0) 195 + #define arch_leave_lazy_cpu_mode() do {} while (0) 196 + #endif 197 + 198 + /* 186 199 * When walking page tables, get the address of the next boundary, 187 200 * or the end address of the range if that comes earlier. Although no 188 201 * vma end wraps to 0, rounded up __boundary may wrap to 0 throughout.
+2
include/asm-i386/apic.h
··· 43 43 #define apic_write native_apic_write 44 44 #define apic_write_atomic native_apic_write_atomic 45 45 #define apic_read native_apic_read 46 + #define setup_boot_clock setup_boot_APIC_clock 47 + #define setup_secondary_clock setup_secondary_APIC_clock 46 48 #endif 47 49 48 50 static __inline fastcall void native_apic_write(unsigned long reg,
+1 -1
include/asm-i386/bugs.h
··· 160 160 * If we configured ourselves for a TSC, we'd better have one! 161 161 */ 162 162 #ifdef CONFIG_X86_TSC 163 - if (!cpu_has_tsc) 163 + if (!cpu_has_tsc && !tsc_disable) 164 164 panic("Kernel compiled for Pentium+, requires TSC feature!"); 165 165 #endif 166 166
+1 -1
include/asm-i386/desc.h
··· 22 22 23 23 extern struct Xgt_desc_struct idt_descr; 24 24 DECLARE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr); 25 - 25 + extern struct Xgt_desc_struct early_gdt_descr; 26 26 27 27 static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu) 28 28 {
+2 -2
include/asm-i386/elf.h
··· 90 90 pr_reg[6] = regs->eax; \ 91 91 pr_reg[7] = regs->xds; \ 92 92 pr_reg[8] = regs->xes; \ 93 - savesegment(fs,pr_reg[9]); \ 94 - pr_reg[10] = regs->xgs; \ 93 + pr_reg[9] = regs->xfs; \ 94 + savesegment(gs,pr_reg[10]); \ 95 95 pr_reg[11] = regs->orig_eax; \ 96 96 pr_reg[12] = regs->eip; \ 97 97 pr_reg[13] = regs->xcs; \
+14
include/asm-i386/idle.h
··· 1 + #ifndef _ASM_I386_IDLE_H 2 + #define _ASM_I386_IDLE_H 1 3 + 4 + #define IDLE_START 1 5 + #define IDLE_END 2 6 + 7 + struct notifier_block; 8 + void idle_notifier_register(struct notifier_block *n); 9 + void idle_notifier_unregister(struct notifier_block *n); 10 + 11 + void exit_idle(void); 12 + void enter_idle(void); 13 + 14 + #endif
+2
include/asm-i386/mce.h
··· 3 3 #else 4 4 #define mcheck_init(c) do {} while(0) 5 5 #endif 6 + 7 + extern int mce_disabled;
+1 -1
include/asm-i386/mmu_context.h
··· 63 63 } 64 64 65 65 #define deactivate_mm(tsk, mm) \ 66 - asm("movl %0,%%fs": :"r" (0)); 66 + asm("movl %0,%%gs": :"r" (0)); 67 67 68 68 #define activate_mm(prev, next) \ 69 69 switch_mm((prev),(next),NULL)
+106 -56
include/asm-i386/paravirt.h
··· 59 59 convention. This makes it easier to implement inline 60 60 assembler replacements. */ 61 61 62 - void (fastcall *cpuid)(unsigned int *eax, unsigned int *ebx, 62 + void (*cpuid)(unsigned int *eax, unsigned int *ebx, 63 63 unsigned int *ecx, unsigned int *edx); 64 64 65 - unsigned long (fastcall *get_debugreg)(int regno); 66 - void (fastcall *set_debugreg)(int regno, unsigned long value); 65 + unsigned long (*get_debugreg)(int regno); 66 + void (*set_debugreg)(int regno, unsigned long value); 67 67 68 - void (fastcall *clts)(void); 68 + void (*clts)(void); 69 69 70 - unsigned long (fastcall *read_cr0)(void); 71 - void (fastcall *write_cr0)(unsigned long); 70 + unsigned long (*read_cr0)(void); 71 + void (*write_cr0)(unsigned long); 72 72 73 - unsigned long (fastcall *read_cr2)(void); 74 - void (fastcall *write_cr2)(unsigned long); 73 + unsigned long (*read_cr2)(void); 74 + void (*write_cr2)(unsigned long); 75 75 76 - unsigned long (fastcall *read_cr3)(void); 77 - void (fastcall *write_cr3)(unsigned long); 76 + unsigned long (*read_cr3)(void); 77 + void (*write_cr3)(unsigned long); 78 78 79 - unsigned long (fastcall *read_cr4_safe)(void); 80 - unsigned long (fastcall *read_cr4)(void); 81 - void (fastcall *write_cr4)(unsigned long); 79 + unsigned long (*read_cr4_safe)(void); 80 + unsigned long (*read_cr4)(void); 81 + void (*write_cr4)(unsigned long); 82 82 83 - unsigned long (fastcall *save_fl)(void); 84 - void (fastcall *restore_fl)(unsigned long); 85 - void (fastcall *irq_disable)(void); 86 - void (fastcall *irq_enable)(void); 87 - void (fastcall *safe_halt)(void); 88 - void (fastcall *halt)(void); 89 - void (fastcall *wbinvd)(void); 83 + unsigned long (*save_fl)(void); 84 + void (*restore_fl)(unsigned long); 85 + void (*irq_disable)(void); 86 + void (*irq_enable)(void); 87 + void (*safe_halt)(void); 88 + void (*halt)(void); 89 + void (*wbinvd)(void); 90 90 91 91 /* err = 0/-EFAULT. wrmsr returns 0/-EFAULT. */
92 - u64 (fastcall *read_msr)(unsigned int msr, int *err); 93 - int (fastcall *write_msr)(unsigned int msr, u64 val); 92 + u64 (*read_msr)(unsigned int msr, int *err); 93 + int (*write_msr)(unsigned int msr, u64 val); 94 94 95 - u64 (fastcall *read_tsc)(void); 96 - u64 (fastcall *read_pmc)(void); 95 + u64 (*read_tsc)(void); 96 + u64 (*read_pmc)(void); 97 97 98 - void (fastcall *load_tr_desc)(void); 99 - void (fastcall *load_gdt)(const struct Xgt_desc_struct *); 100 - void (fastcall *load_idt)(const struct Xgt_desc_struct *); 101 - void (fastcall *store_gdt)(struct Xgt_desc_struct *); 102 - void (fastcall *store_idt)(struct Xgt_desc_struct *); 103 - void (fastcall *set_ldt)(const void *desc, unsigned entries); 104 - unsigned long (fastcall *store_tr)(void); 105 - void (fastcall *load_tls)(struct thread_struct *t, unsigned int cpu); 106 - void (fastcall *write_ldt_entry)(void *dt, int entrynum, 98 + void (*load_tr_desc)(void); 99 + void (*load_gdt)(const struct Xgt_desc_struct *); 100 + void (*load_idt)(const struct Xgt_desc_struct *); 101 + void (*store_gdt)(struct Xgt_desc_struct *); 102 + void (*store_idt)(struct Xgt_desc_struct *); 103 + void (*set_ldt)(const void *desc, unsigned entries); 104 + unsigned long (*store_tr)(void); 105 + void (*load_tls)(struct thread_struct *t, unsigned int cpu); 106 + void (*write_ldt_entry)(void *dt, int entrynum, 107 107 u32 low, u32 high); 108 - void (fastcall *write_gdt_entry)(void *dt, int entrynum, 108 + void (*write_gdt_entry)(void *dt, int entrynum, 109 109 u32 low, u32 high); 110 - void (fastcall *write_idt_entry)(void *dt, int entrynum, 110 + void (*write_idt_entry)(void *dt, int entrynum, 111 111 u32 low, u32 high); 112 - void (fastcall *load_esp0)(struct tss_struct *tss, 112 + void (*load_esp0)(struct tss_struct *tss, 113 113 struct thread_struct *thread); 114 114 115 - void (fastcall *set_iopl_mask)(unsigned mask); 115 + void (*set_iopl_mask)(unsigned mask); 116 116 117 - void (fastcall *io_delay)(void); 117 + void (*io_delay)(void);
118 118 void (*const_udelay)(unsigned long loops); 119 119 120 120 #ifdef CONFIG_X86_LOCAL_APIC 121 - void (fastcall *apic_write)(unsigned long reg, unsigned long v); 122 - void (fastcall *apic_write_atomic)(unsigned long reg, unsigned long v); 123 - unsigned long (fastcall *apic_read)(unsigned long reg); 121 + void (*apic_write)(unsigned long reg, unsigned long v); 122 + void (*apic_write_atomic)(unsigned long reg, unsigned long v); 123 + unsigned long (*apic_read)(unsigned long reg); 124 + void (*setup_boot_clock)(void); 125 + void (*setup_secondary_clock)(void); 124 126 #endif 125 127 126 - void (fastcall *flush_tlb_user)(void); 127 - void (fastcall *flush_tlb_kernel)(void); 128 - void (fastcall *flush_tlb_single)(u32 addr); 128 + void (*flush_tlb_user)(void); 129 + void (*flush_tlb_kernel)(void); 130 + void (*flush_tlb_single)(u32 addr); 129 131 130 - void (fastcall *set_pte)(pte_t *ptep, pte_t pteval); 131 - void (fastcall *set_pte_at)(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pteval); 132 - void (fastcall *set_pmd)(pmd_t *pmdp, pmd_t pmdval); 133 - void (fastcall *pte_update)(struct mm_struct *mm, u32 addr, pte_t *ptep); 134 - void (fastcall *pte_update_defer)(struct mm_struct *mm, u32 addr, pte_t *ptep); 132 + void (*alloc_pt)(u32 pfn); 133 + void (*alloc_pd)(u32 pfn); 134 + void (*alloc_pd_clone)(u32 pfn, u32 clonepfn, u32 start, u32 count); 135 + void (*release_pt)(u32 pfn); 136 + void (*release_pd)(u32 pfn); 137 + 138 + void (*set_pte)(pte_t *ptep, pte_t pteval); 139 + void (*set_pte_at)(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pteval); 140 + void (*set_pmd)(pmd_t *pmdp, pmd_t pmdval); 141 + void (*pte_update)(struct mm_struct *mm, u32 addr, pte_t *ptep); 142 + void (*pte_update_defer)(struct mm_struct *mm, u32 addr, pte_t *ptep); 135 143 #ifdef CONFIG_X86_PAE 136 - void (fastcall *set_pte_atomic)(pte_t *ptep, pte_t pteval); 137 - void (fastcall *set_pte_present)(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte);
138 - void (fastcall *set_pud)(pud_t *pudp, pud_t pudval); 139 - void (fastcall *pte_clear)(struct mm_struct *mm, unsigned long addr, pte_t *ptep); 140 - void (fastcall *pmd_clear)(pmd_t *pmdp); 144 + void (*set_pte_atomic)(pte_t *ptep, pte_t pteval); 145 + void (*set_pte_present)(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte); 146 + void (*set_pud)(pud_t *pudp, pud_t pudval); 147 + void (*pte_clear)(struct mm_struct *mm, unsigned long addr, pte_t *ptep); 148 + void (*pmd_clear)(pmd_t *pmdp); 141 149 #endif 150 + 151 + void (*set_lazy_mode)(int mode); 142 152 143 153 /* These two are jmp to, not actually called. */ 144 - void (fastcall *irq_enable_sysexit)(void); 145 - void (fastcall *iret)(void); 154 + void (*irq_enable_sysexit)(void); 155 + void (*iret)(void); 156 + 157 + void (*startup_ipi_hook)(int phys_apicid, unsigned long start_eip, unsigned long start_esp); 146 158 }; 147 159 148 160 /* Mark a paravirt probe function. */ ··· 325 313 { 326 314 return paravirt_ops.apic_read(reg); 327 315 } 316 + 317 + static inline void setup_boot_clock(void) 318 + { 319 + paravirt_ops.setup_boot_clock(); 320 + } 321 + 322 + static inline void setup_secondary_clock(void) 323 + { 324 + paravirt_ops.setup_secondary_clock(); 325 + } 328 326 #endif 329 327 328 + #ifdef CONFIG_SMP 329 + static inline void startup_ipi_hook(int phys_apicid, unsigned long start_eip, 330 + unsigned long start_esp) 331 + { 332 + return paravirt_ops.startup_ipi_hook(phys_apicid, start_eip, start_esp); 333 + } 334 + #endif 330 335 331 336 #define __flush_tlb() paravirt_ops.flush_tlb_user() 332 337 #define __flush_tlb_global() paravirt_ops.flush_tlb_kernel() 333 338 #define __flush_tlb_single(addr) paravirt_ops.flush_tlb_single(addr) 339 + 340 + #define paravirt_alloc_pt(pfn) paravirt_ops.alloc_pt(pfn) 341 + #define paravirt_release_pt(pfn) paravirt_ops.release_pt(pfn) 342 + 343 + #define paravirt_alloc_pd(pfn) paravirt_ops.alloc_pd(pfn) 344 + #define paravirt_alloc_pd_clone(pfn, clonepfn, start, count) \
345 + paravirt_ops.alloc_pd_clone(pfn, clonepfn, start, count) 346 + #define paravirt_release_pd(pfn) paravirt_ops.release_pd(pfn) 334 347 335 348 static inline void set_pte(pte_t *ptep, pte_t pteval) 336 349 { ··· 408 371 paravirt_ops.pmd_clear(pmdp); 409 372 } 410 373 #endif 374 + 375 + /* Lazy mode for batching updates / context switch */ 376 + #define PARAVIRT_LAZY_NONE 0 377 + #define PARAVIRT_LAZY_MMU 1 378 + #define PARAVIRT_LAZY_CPU 2 379 + 380 + #define __HAVE_ARCH_ENTER_LAZY_CPU_MODE 381 + #define arch_enter_lazy_cpu_mode() paravirt_ops.set_lazy_mode(PARAVIRT_LAZY_CPU) 382 + #define arch_leave_lazy_cpu_mode() paravirt_ops.set_lazy_mode(PARAVIRT_LAZY_NONE) 383 + 384 + #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE 385 + #define arch_enter_lazy_mmu_mode() paravirt_ops.set_lazy_mode(PARAVIRT_LAZY_MMU) 386 + #define arch_leave_lazy_mmu_mode() paravirt_ops.set_lazy_mode(PARAVIRT_LAZY_NONE) 411 387 412 388 /* These all sit in the .parainstructions section to tell us what to patch. */ 413 389 struct paravirt_patch {
+6 -6
include/asm-i386/pda.h
··· 39 39 if (0) { T__ tmp__; tmp__ = (val); } \ 40 40 switch (sizeof(_proxy_pda.field)) { \ 41 41 case 1: \ 42 - asm(op "b %1,%%gs:%c2" \ 42 + asm(op "b %1,%%fs:%c2" \ 43 43 : "+m" (_proxy_pda.field) \ 44 44 :"ri" ((T__)val), \ 45 45 "i"(pda_offset(field))); \ 46 46 break; \ 47 47 case 2: \ 48 - asm(op "w %1,%%gs:%c2" \ 48 + asm(op "w %1,%%fs:%c2" \ 49 49 : "+m" (_proxy_pda.field) \ 50 50 :"ri" ((T__)val), \ 51 51 "i"(pda_offset(field))); \ 52 52 break; \ 53 53 case 4: \ 54 - asm(op "l %1,%%gs:%c2" \ 54 + asm(op "l %1,%%fs:%c2" \ 55 55 : "+m" (_proxy_pda.field) \ 56 56 :"ri" ((T__)val), \ 57 57 "i"(pda_offset(field))); \ ··· 65 65 typeof(_proxy_pda.field) ret__; \ 66 66 switch (sizeof(_proxy_pda.field)) { \ 67 67 case 1: \ 68 - asm(op "b %%gs:%c1,%0" \ 68 + asm(op "b %%fs:%c1,%0" \ 69 69 : "=r" (ret__) \ 70 70 : "i" (pda_offset(field)), \ 71 71 "m" (_proxy_pda.field)); \ 72 72 break; \ 73 73 case 2: \ 74 - asm(op "w %%gs:%c1,%0" \ 74 + asm(op "w %%fs:%c1,%0" \ 75 75 : "=r" (ret__) \ 76 76 : "i" (pda_offset(field)), \ 77 77 "m" (_proxy_pda.field)); \ 78 78 break; \ 79 79 case 4: \ 80 - asm(op "l %%gs:%c1,%0" \ 80 + asm(op "l %%fs:%c1,%0" \ 81 81 : "=r" (ret__) \ 82 82 : "i" (pda_offset(field)), \ 83 83 "m" (_proxy_pda.field)); \
+26 -4
include/asm-i386/pgalloc.h
··· 5 5 #include <linux/threads.h> 6 6 #include <linux/mm.h> /* for struct page */ 7 7 8 - #define pmd_populate_kernel(mm, pmd, pte) \ 9 - set_pmd(pmd, __pmd(_PAGE_TABLE + __pa(pte))) 8 + #ifdef CONFIG_PARAVIRT 9 + #include <asm/paravirt.h> 10 + #else 11 + #define paravirt_alloc_pt(pfn) do { } while (0) 12 + #define paravirt_alloc_pd(pfn) do { } while (0) 13 + #define paravirt_alloc_pd(pfn) do { } while (0) 14 + #define paravirt_alloc_pd_clone(pfn, clonepfn, start, count) do { } while (0) 15 + #define paravirt_release_pt(pfn) do { } while (0) 16 + #define paravirt_release_pd(pfn) do { } while (0) 17 + #endif 18 + 19 + #define pmd_populate_kernel(mm, pmd, pte) \ 20 + do { \ 21 + paravirt_alloc_pt(__pa(pte) >> PAGE_SHIFT); \ 22 + set_pmd(pmd, __pmd(_PAGE_TABLE + __pa(pte))); \ 23 + } while (0) 10 24 11 25 #define pmd_populate(mm, pmd, pte) \ 26 + do { \ 27 + paravirt_alloc_pt(page_to_pfn(pte)); \ 12 28 set_pmd(pmd, __pmd(_PAGE_TABLE + \ 13 29 ((unsigned long long)page_to_pfn(pte) << \ 14 - (unsigned long long) PAGE_SHIFT))) 30 + (unsigned long long) PAGE_SHIFT))); \ 31 + } while (0) 32 + 15 33 /* 16 34 * Allocate and free page tables. 17 35 */ ··· 50 32 } 51 33 52 34 53 - #define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte)) 35 + #define __pte_free_tlb(tlb,pte) \ 36 + do { \ 37 + paravirt_release_pt(page_to_pfn(pte)); \ 38 + tlb_remove_page((tlb),(pte)); \ 39 + } while (0) 54 40 55 41 #ifdef CONFIG_X86_PAE 56 42 /*
+11 -3
include/asm-i386/processor.h
··· 257 257 : :"a" (eax), "c" (ecx)); 258 258 } 259 259 260 + static inline void __sti_mwait(unsigned long eax, unsigned long ecx) 261 + { 262 + /* "mwait %eax,%ecx;" */ 263 + asm volatile( 264 + "sti; .byte 0x0f,0x01,0xc9;" 265 + : :"a" (eax), "c" (ecx)); 266 + } 267 + 260 268 extern void mwait_idle_with_hints(unsigned long eax, unsigned long ecx); 261 269 262 270 /* from system description table in BIOS. Mostly for MCA use, but ··· 432 424 .vm86_info = NULL, \ 433 425 .sysenter_cs = __KERNEL_CS, \ 434 426 .io_bitmap_ptr = NULL, \ 435 - .gs = __KERNEL_PDA, \ 427 + .fs = __KERNEL_PDA, \ 436 428 } 437 429 438 430 /* ··· 450 442 } 451 443 452 444 #define start_thread(regs, new_eip, new_esp) do { \ 453 - __asm__("movl %0,%%fs": :"r" (0)); \ 454 - regs->xgs = 0; \ 445 + __asm__("movl %0,%%gs": :"r" (0)); \ 446 + regs->xfs = 0; \ 455 447 set_fs(USER_DS); \ 456 448 regs->xds = __USER_DS; \ 457 449 regs->xes = __USER_DS; \
+6 -2
include/asm-i386/ptrace.h
··· 16 16 long eax; 17 17 int xds; 18 18 int xes; 19 - /* int xfs; */ 20 - int xgs; 19 + int xfs; 20 + /* int xgs; */ 21 21 long orig_eax; 22 22 long eip; 23 23 int xcs; ··· 48 48 static inline int user_mode_vm(struct pt_regs *regs) 49 49 { 50 50 return ((regs->xcs & SEGMENT_RPL_MASK) | (regs->eflags & VM_MASK)) >= USER_RPL; 51 + } 52 + static inline int v8086_mode(struct pt_regs *regs) 53 + { 54 + return (regs->eflags & VM_MASK); 51 55 } 52 56 53 57 #define instruction_pointer(regs) ((regs)->eip)
+13 -6
include/asm-i386/segment.h
··· 83 83 * The GDT has 32 entries 84 84 */ 85 85 #define GDT_ENTRIES 32 86 - 87 86 #define GDT_SIZE (GDT_ENTRIES * 8) 88 - 89 - /* Matches __KERNEL_CS and __USER_CS (they must be 2 entries apart) */ 90 - #define SEGMENT_IS_FLAT_CODE(x) (((x) & 0xec) == GDT_ENTRY_KERNEL_CS * 8) 91 - /* Matches PNP_CS32 and PNP_CS16 (they must be consecutive) */ 92 - #define SEGMENT_IS_PNP_CODE(x) (((x) & 0xf4) == GDT_ENTRY_PNPBIOS_BASE * 8) 93 87 94 88 /* Simple and small GDT entries for booting only */ 95 89 ··· 128 134 #ifndef CONFIG_PARAVIRT 129 135 #define get_kernel_rpl() 0 130 136 #endif 137 + /* 138 + * Matching rules for certain types of segments. 139 + */ 140 + 141 + /* Matches only __KERNEL_CS, ignoring PnP / USER / APM segments */ 142 + #define SEGMENT_IS_KERNEL_CODE(x) (((x) & 0xfc) == GDT_ENTRY_KERNEL_CS * 8) 143 + 144 + /* Matches __KERNEL_CS and __USER_CS (they must be 2 entries apart) */ 145 + #define SEGMENT_IS_FLAT_CODE(x) (((x) & 0xec) == GDT_ENTRY_KERNEL_CS * 8) 146 + 147 + /* Matches PNP_CS32 and PNP_CS16 (they must be consecutive) */ 148 + #define SEGMENT_IS_PNP_CODE(x) (((x) & 0xf4) == GDT_ENTRY_PNPBIOS_BASE * 8) 149 + 131 150 #endif
+2
include/asm-i386/setup.h
··· 77 77 void __init add_memory_region(unsigned long long start, 78 78 unsigned long long size, int type); 79 79 80 + extern unsigned long init_pg_tables_end; 81 + 80 82 #endif /* __ASSEMBLY__ */ 81 83 82 84 #endif /* __KERNEL__ */
+5
include/asm-i386/smp.h
··· 52 52 extern void cpu_uninit(void); 53 53 #endif 54 54 55 + #ifndef CONFIG_PARAVIRT 56 + #define startup_ipi_hook(phys_apicid, start_eip, start_esp) \ 57 + do { } while (0) 58 + #endif 59 + 55 60 /* 56 61 * This function is needed by all SMP systems. It must _always_ be valid 57 62 * from the initial startup. We map APIC_BASE very early in page_setup(),
+1
include/asm-i386/time.h
··· 30 30 31 31 #ifdef CONFIG_PARAVIRT 32 32 #include <asm/paravirt.h> 33 + extern unsigned long long native_sched_clock(void); 33 34 #else /* !CONFIG_PARAVIRT */ 34 35 35 36 #define get_wallclock() native_get_wallclock()
+3
include/asm-i386/timer.h
··· 8 8 /* Modifiers for buggy PIT handling */ 9 9 extern int pit_latch_buggy; 10 10 extern int timer_ack; 11 + extern int no_timer_check; 12 + extern unsigned long long (*custom_sched_clock)(void); 13 + extern int no_sync_cmos_clock; 11 14 extern int recalibrate_cpu_khz(void); 12 15 13 16 #endif
+262
include/asm-i386/vmi.h
··· 1 + /* 2 + * VMI interface definition 3 + * 4 + * Copyright (C) 2005, VMware, Inc. 5 + * 6 + * This program is free software; you can redistribute it and/or modify 7 + * it under the terms of the GNU General Public License as published by 8 + * the Free Software Foundation; either version 2 of the License, or 9 + * (at your option) any later version. 10 + * 11 + * This program is distributed in the hope that it will be useful, but 12 + * WITHOUT ANY WARRANTY; without even the implied warranty of 13 + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 14 + * NON INFRINGEMENT. See the GNU General Public License for more 15 + * details. 16 + * 17 + * You should have received a copy of the GNU General Public License 18 + * along with this program; if not, write to the Free Software 19 + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 20 + * 21 + * Maintained by: Zachary Amsden zach@vmware.com 22 + * 23 + */ 24 + #include <linux/types.h> 25 + 26 + /* 27 + *--------------------------------------------------------------------- 28 + * 29 + * VMI Option ROM API 30 + * 31 + *--------------------------------------------------------------------- 32 + */ 33 + #define VMI_SIGNATURE 0x696d5663 /* "cVmi" */ 34 + 35 + #define PCI_VENDOR_ID_VMWARE 0x15AD 36 + #define PCI_DEVICE_ID_VMWARE_VMI 0x0801 37 + 38 + /* 39 + * We use two version numbers for compatibility, with the major 40 + * number signifying interface breakages, and the minor number 41 + * interface extensions. */
42 + #define VMI_API_REV_MAJOR 3 43 + #define VMI_API_REV_MINOR 0 44 + 45 + #define VMI_CALL_CPUID 0 46 + #define VMI_CALL_WRMSR 1 47 + #define VMI_CALL_RDMSR 2 48 + #define VMI_CALL_SetGDT 3 49 + #define VMI_CALL_SetLDT 4 50 + #define VMI_CALL_SetIDT 5 51 + #define VMI_CALL_SetTR 6 52 + #define VMI_CALL_GetGDT 7 53 + #define VMI_CALL_GetLDT 8 54 + #define VMI_CALL_GetIDT 9 55 + #define VMI_CALL_GetTR 10 56 + #define VMI_CALL_WriteGDTEntry 11 57 + #define VMI_CALL_WriteLDTEntry 12 58 + #define VMI_CALL_WriteIDTEntry 13 59 + #define VMI_CALL_UpdateKernelStack 14 60 + #define VMI_CALL_SetCR0 15 61 + #define VMI_CALL_SetCR2 16 62 + #define VMI_CALL_SetCR3 17 63 + #define VMI_CALL_SetCR4 18 64 + #define VMI_CALL_GetCR0 19 65 + #define VMI_CALL_GetCR2 20 66 + #define VMI_CALL_GetCR3 21 67 + #define VMI_CALL_GetCR4 22 68 + #define VMI_CALL_WBINVD 23 69 + #define VMI_CALL_SetDR 24 70 + #define VMI_CALL_GetDR 25 71 + #define VMI_CALL_RDPMC 26 72 + #define VMI_CALL_RDTSC 27 73 + #define VMI_CALL_CLTS 28 74 + #define VMI_CALL_EnableInterrupts 29 75 + #define VMI_CALL_DisableInterrupts 30 76 + #define VMI_CALL_GetInterruptMask 31 77 + #define VMI_CALL_SetInterruptMask 32 78 + #define VMI_CALL_IRET 33 79 + #define VMI_CALL_SYSEXIT 34 80 + #define VMI_CALL_Halt 35 81 + #define VMI_CALL_Reboot 36 82 + #define VMI_CALL_Shutdown 37 83 + #define VMI_CALL_SetPxE 38 84 + #define VMI_CALL_SetPxELong 39 85 + #define VMI_CALL_UpdatePxE 40 86 + #define VMI_CALL_UpdatePxELong 41 87 + #define VMI_CALL_MachineToPhysical 42 88 + #define VMI_CALL_PhysicalToMachine 43 89 + #define VMI_CALL_AllocatePage 44 90 + #define VMI_CALL_ReleasePage 45 91 + #define VMI_CALL_InvalPage 46 92 + #define VMI_CALL_FlushTLB 47 93 + #define VMI_CALL_SetLinearMapping 48 94 + 95 + #define VMI_CALL_SetIOPLMask 61 96 + #define VMI_CALL_SetInitialAPState 62 97 + #define VMI_CALL_APICWrite 63 98 + #define VMI_CALL_APICRead 64 99 + #define VMI_CALL_SetLazyMode 73 100 + 101 + /* 102 + *---------------------------------------------------------------------
103 + * 104 + * MMU operation flags 105 + * 106 + *--------------------------------------------------------------------- 107 + */ 108 + 109 + /* Flags used by VMI_{Allocate|Release}Page call */ 110 + #define VMI_PAGE_PAE 0x10 /* Allocate PAE shadow */ 111 + #define VMI_PAGE_CLONE 0x20 /* Clone from another shadow */ 112 + #define VMI_PAGE_ZEROED 0x40 /* Page is pre-zeroed */ 113 + 114 + 115 + /* Flags shared by Allocate|Release Page and PTE updates */ 116 + #define VMI_PAGE_PT 0x01 117 + #define VMI_PAGE_PD 0x02 118 + #define VMI_PAGE_PDP 0x04 119 + #define VMI_PAGE_PML4 0x08 120 + 121 + #define VMI_PAGE_NORMAL 0x00 /* for debugging */ 122 + 123 + /* Flags used by PTE updates */ 124 + #define VMI_PAGE_CURRENT_AS 0x10 /* implies VMI_PAGE_VA_MASK is valid */ 125 + #define VMI_PAGE_DEFER 0x20 /* may queue update until TLB inval */ 126 + #define VMI_PAGE_VA_MASK 0xfffff000 127 + 128 + #ifdef CONFIG_X86_PAE 129 + #define VMI_PAGE_L1 (VMI_PAGE_PT | VMI_PAGE_PAE | VMI_PAGE_ZEROED) 130 + #define VMI_PAGE_L2 (VMI_PAGE_PD | VMI_PAGE_PAE | VMI_PAGE_ZEROED) 131 + #else 132 + #define VMI_PAGE_L1 (VMI_PAGE_PT | VMI_PAGE_ZEROED) 133 + #define VMI_PAGE_L2 (VMI_PAGE_PD | VMI_PAGE_ZEROED) 134 + #endif 135 + 136 + /* Flags used by VMI_FlushTLB call */ 137 + #define VMI_FLUSH_TLB 0x01 138 + #define VMI_FLUSH_GLOBAL 0x02 139 + 140 + /* 141 + *--------------------------------------------------------------------- 142 + * 143 + * VMI relocation definitions for ROM call get_reloc 144 + * 145 + *--------------------------------------------------------------------- 146 + */ 147 + 148 + /* VMI Relocation types */ 149 + #define VMI_RELOCATION_NONE 0 150 + #define VMI_RELOCATION_CALL_REL 1 151 + #define VMI_RELOCATION_JUMP_REL 2 152 + #define VMI_RELOCATION_NOP 3 153 + 154 + #ifndef __ASSEMBLY__ 155 + struct vmi_relocation_info { 156 + unsigned char *eip; 157 + unsigned char type; 158 + unsigned char reserved[3]; 159 + };
161 + #endif 162 + 163 + 164 + /* 165 + *--------------------------------------------------------------------- 166 + * 167 + * Generic ROM structures and definitions 168 + * 169 + *--------------------------------------------------------------------- 170 + */ 171 + 172 + #ifndef __ASSEMBLY__ 173 + 174 + struct vrom_header { 175 + u16 rom_signature; // option ROM signature 176 + u8 rom_length; // ROM length in 512 byte chunks 177 + u8 rom_entry[4]; // 16-bit code entry point 178 + u8 rom_pad0; // 4-byte align pad 179 + u32 vrom_signature; // VROM identification signature 180 + u8 api_version_min;// Minor version of API 181 + u8 api_version_maj;// Major version of API 182 + u8 jump_slots; // Number of jump slots 183 + u8 reserved1; // Reserved for expansion 184 + u32 virtual_top; // Hypervisor virtual address start 185 + u16 reserved2; // Reserved for expansion 186 + u16 license_offs; // Offset to License string 187 + u16 pci_header_offs;// Offset to PCI OPROM header 188 + u16 pnp_header_offs;// Offset to PnP OPROM header 189 + u32 rom_pad3; // PnP reserverd / VMI reserved 190 + u8 reserved[96]; // Reserved for headers 191 + char vmi_init[8]; // VMI_Init jump point 192 + char get_reloc[8]; // VMI_GetRelocationInfo jump point 193 + } __attribute__((packed)); 194 + 195 + struct pnp_header { 196 + char sig[4]; 197 + char rev; 198 + char size; 199 + short next; 200 + short res; 201 + long devID; 202 + unsigned short manufacturer_offset; 203 + unsigned short product_offset; 204 + } __attribute__((packed)); 205 + 206 + struct pci_header { 207 + char sig[4]; 208 + short vendorID; 209 + short deviceID; 210 + short vpdData; 211 + short size; 212 + char rev; 213 + char class; 214 + char subclass; 215 + char interface; 216 + short chunks; 217 + char rom_version_min; 218 + char rom_version_maj; 219 + char codetype; 220 + char lastRom; 221 + short reserved; 222 + } __attribute__((packed)); 223 + 224 + /* Function prototypes for bootstrapping */ 225 + extern void vmi_init(void); 
226 + extern void vmi_bringup(void); 227 + extern void vmi_apply_boot_page_allocations(void); 228 + 229 + /* State needed to start an application processor in an SMP system. */ 230 + struct vmi_ap_state { 231 + u32 cr0; 232 + u32 cr2; 233 + u32 cr3; 234 + u32 cr4; 235 + 236 + u64 efer; 237 + 238 + u32 eip; 239 + u32 eflags; 240 + u32 eax; 241 + u32 ebx; 242 + u32 ecx; 243 + u32 edx; 244 + u32 esp; 245 + u32 ebp; 246 + u32 esi; 247 + u32 edi; 248 + u16 cs; 249 + u16 ss; 250 + u16 ds; 251 + u16 es; 252 + u16 fs; 253 + u16 gs; 254 + u16 ldtr; 255 + 256 + u16 gdtr_limit; 257 + u32 gdtr_base; 258 + u32 idtr_base; 259 + u16 idtr_limit; 260 + }; 261 + 262 + #endif
+103
include/asm-i386/vmi_time.h
··· 1 + /* 2 + * VMI Time wrappers 3 + * 4 + * Copyright (C) 2006, VMware, Inc. 5 + * 6 + * This program is free software; you can redistribute it and/or modify 7 + * it under the terms of the GNU General Public License as published by 8 + * the Free Software Foundation; either version 2 of the License, or 9 + * (at your option) any later version. 10 + * 11 + * This program is distributed in the hope that it will be useful, but 12 + * WITHOUT ANY WARRANTY; without even the implied warranty of 13 + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 14 + * NON INFRINGEMENT. See the GNU General Public License for more 15 + * details. 16 + * 17 + * You should have received a copy of the GNU General Public License 18 + * along with this program; if not, write to the Free Software 19 + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 20 + * 21 + * Send feedback to dhecht@vmware.com 22 + * 23 + */ 24 + 25 + #ifndef __VMI_TIME_H 26 + #define __VMI_TIME_H 27 + 28 + /* 29 + * Raw VMI call indices for timer functions 30 + */ 31 + #define VMI_CALL_GetCycleFrequency 66 32 + #define VMI_CALL_GetCycleCounter 67 33 + #define VMI_CALL_SetAlarm 68 34 + #define VMI_CALL_CancelAlarm 69 35 + #define VMI_CALL_GetWallclockTime 70 36 + #define VMI_CALL_WallclockUpdated 71 37 + 38 + /* Cached VMI timer operations */ 39 + extern struct vmi_timer_ops { 40 + u64 (*get_cycle_frequency)(void); 41 + u64 (*get_cycle_counter)(int); 42 + u64 (*get_wallclock)(void); 43 + int (*wallclock_updated)(void); 44 + void (*set_alarm)(u32 flags, u64 expiry, u64 period); 45 + void (*cancel_alarm)(u32 flags); 46 + } vmi_timer_ops; 47 + 48 + /* Prototypes */ 49 + extern void __init vmi_time_init(void); 50 + extern unsigned long vmi_get_wallclock(void); 51 + extern int vmi_set_wallclock(unsigned long now); 52 + extern unsigned long long vmi_sched_clock(void); 53 + 54 + #ifdef CONFIG_X86_LOCAL_APIC 55 + extern void __init vmi_timer_setup_boot_alarm(void); 56 + extern void __init 
vmi_timer_setup_secondary_alarm(void); 57 + extern void apic_vmi_timer_interrupt(void); 58 + #endif 59 + 60 + #ifdef CONFIG_NO_IDLE_HZ 61 + extern int vmi_stop_hz_timer(void); 62 + extern void vmi_account_time_restart_hz_timer(void); 63 + #endif 64 + 65 + /* 66 + * When run under a hypervisor, a vcpu is always in one of three states: 67 + * running, halted, or ready. The vcpu is in the 'running' state if it 68 + * is executing. When the vcpu executes the halt interface, the vcpu 69 + * enters the 'halted' state and remains halted until there is some work 70 + * pending for the vcpu (e.g. an alarm expires, host I/O completes on 71 + * behalf of virtual I/O). At this point, the vcpu enters the 'ready' 72 + * state (waiting for the hypervisor to reschedule it). Finally, at any 73 + * time when the vcpu is not in the 'running' state nor the 'halted' 74 + * state, it is in the 'ready' state. 75 + * 76 + * Real time advances while the vcpu is 'running', 'ready', or 77 + * 'halted'. Stolen time is the time in which the vcpu is in the 78 + * 'ready' state. Available time is the remaining time -- the vcpu is 79 + * either 'running' or 'halted'. 80 + * 81 + * All three views of time are accessible through the VMI cycle 82 + * counters. 83 + */ 84 + 85 + /* The cycle counters. */ 86 + #define VMI_CYCLES_REAL 0 87 + #define VMI_CYCLES_AVAILABLE 1 88 + #define VMI_CYCLES_STOLEN 2 89 + 90 + /* The alarm interface 'flags' bits */ 91 + #define VMI_ALARM_COUNTERS 2 92 + 93 + #define VMI_ALARM_COUNTER_MASK 0x000000ff 94 + 95 + #define VMI_ALARM_WIRED_IRQ0 0x00000000 96 + #define VMI_ALARM_WIRED_LVTT 0x00010000 97 + 98 + #define VMI_ALARM_IS_ONESHOT 0x00000000 99 + #define VMI_ALARM_IS_PERIODIC 0x00000100 100 + 101 + #define CONFIG_VMI_ALARM_HZ 100 102 + 103 + #endif
+1 -1
include/asm-x86_64/bitops.h
··· 7 7 8 8 #include <asm/alternative.h> 9 9 10 - #if __GNUC__ < 4 || __GNUC_MINOR__ < 1 10 + #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1) 11 11 /* Technically wrong, but this avoids compilation errors on some gcc 12 12 versions. */ 13 13 #define ADDR "=m" (*(volatile long *) addr)
+3
include/asm-x86_64/dma-mapping.h
··· 66 66 #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 67 67 #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 68 68 69 + #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 70 + #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 71 + 69 72 extern void *dma_alloc_coherent(struct device *dev, size_t size, 70 73 dma_addr_t *dma_handle, gfp_t gfp); 71 74 extern void dma_free_coherent(struct device *dev, size_t size, void *vaddr,
+2
include/asm-x86_64/e820.h
··· 46 46 extern void e820_print_map(char *who); 47 47 extern int e820_any_mapped(unsigned long start, unsigned long end, unsigned type); 48 48 extern int e820_all_mapped(unsigned long start, unsigned long end, unsigned type); 49 + extern unsigned long e820_hole_size(unsigned long start, unsigned long end); 49 50 50 51 extern void e820_setup_gap(void); 51 52 extern void e820_register_active_regions(int nid, ··· 57 56 extern struct e820map e820; 58 57 59 58 extern unsigned ebda_addr, ebda_size; 59 + extern unsigned long nodemap_addr, nodemap_size; 60 60 #endif/*!__ASSEMBLY__*/ 61 61 62 62 #endif/*__E820_HEADER*/
+1 -1
include/asm-x86_64/hw_irq.h
··· 91 91 extern int i8259A_irq_pending(unsigned int irq); 92 92 extern void make_8259A_irq(unsigned int irq); 93 93 extern void init_8259A(int aeoi); 94 - extern void FASTCALL(send_IPI_self(int vector)); 94 + extern void send_IPI_self(int vector); 95 95 extern void init_VISWS_APIC_irqs(void); 96 96 extern void setup_IO_APIC(void); 97 97 extern void disable_IO_APIC(void);
+1 -1
include/asm-x86_64/io.h
··· 100 100 101 101 #define IO_SPACE_LIMIT 0xffff 102 102 103 - #if defined(__KERNEL__) && __x86_64__ 103 + #if defined(__KERNEL__) && defined(__x86_64__) 104 104 105 105 #include <linux/vmalloc.h> 106 106
+2 -12
include/asm-x86_64/io_apic.h
··· 85 85 mask : 1, /* 0: enabled, 1: disabled */ 86 86 __reserved_2 : 15; 87 87 88 - union { struct { __u32 89 - __reserved_1 : 24, 90 - physical_dest : 4, 91 - __reserved_2 : 4; 92 - } physical; 93 - 94 - struct { __u32 95 - __reserved_1 : 24, 96 - logical_dest : 8; 97 - } logical; 98 - } dest; 99 - 88 + __u32 __reserved_3 : 24, 89 + dest : 8; 100 90 } __attribute__ ((packed)); 101 91 102 92 /*
+2
include/asm-x86_64/mce.h
··· 103 103 104 104 extern atomic_t mce_entry; 105 105 106 + extern void do_machine_check(struct pt_regs *, long); 107 + 106 108 #endif 107 109 108 110 #endif
+12 -6
include/asm-x86_64/mmzone.h
··· 11 11 12 12 #include <asm/smp.h> 13 13 14 - /* Should really switch to dynamic allocation at some point */ 15 - #define NODEMAPSIZE 0x4fff 16 - 17 14 /* Simple perfect hash to map physical addresses to node numbers */ 18 15 struct memnode { 19 16 int shift; 20 - u8 map[NODEMAPSIZE]; 21 - } ____cacheline_aligned; 17 + unsigned int mapsize; 18 + u8 *map; 19 + u8 embedded_map[64-16]; 20 + } ____cacheline_aligned; /* total size = 64 bytes */ 22 21 extern struct memnode memnode; 23 22 #define memnode_shift memnode.shift 24 23 #define memnodemap memnode.map 24 + #define memnodemapsize memnode.mapsize 25 25 26 26 extern struct pglist_data *node_data[]; 27 27 28 28 static inline __attribute__((pure)) int phys_to_nid(unsigned long addr) 29 29 { 30 30 unsigned nid; 31 - VIRTUAL_BUG_ON((addr >> memnode_shift) >= NODEMAPSIZE); 31 + VIRTUAL_BUG_ON(!memnodemap); 32 + VIRTUAL_BUG_ON((addr >> memnode_shift) >= memnodemapsize); 32 33 nid = memnodemap[addr >> memnode_shift]; 33 34 VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]); 34 35 return nid; ··· 45 44 #define pfn_to_nid(pfn) phys_to_nid((unsigned long)(pfn) << PAGE_SHIFT) 46 45 47 46 extern int pfn_valid(unsigned long pfn); 47 + #endif 48 + 49 + #ifdef CONFIG_NUMA_EMU 50 + #define FAKE_NODE_MIN_SIZE (64*1024*1024) 51 + #define FAKE_NODE_MIN_HASH_MASK (~(FAKE_NODE_MIN_SIZE - 1ul)) 48 52 #endif 49 53 50 54 #endif
+3 -3
include/asm-x86_64/mutex.h
··· 21 21 unsigned long dummy; \ 22 22 \ 23 23 typecheck(atomic_t *, v); \ 24 - typecheck_fn(fastcall void (*)(atomic_t *), fail_fn); \ 24 + typecheck_fn(void (*)(atomic_t *), fail_fn); \ 25 25 \ 26 26 __asm__ __volatile__( \ 27 27 LOCK_PREFIX " decl (%%rdi) \n" \ ··· 47 47 */ 48 48 static inline int 49 49 __mutex_fastpath_lock_retval(atomic_t *count, 50 - int fastcall (*fail_fn)(atomic_t *)) 50 + int (*fail_fn)(atomic_t *)) 51 51 { 52 52 if (unlikely(atomic_dec_return(count) < 0)) 53 53 return fail_fn(count); ··· 67 67 unsigned long dummy; \ 68 68 \ 69 69 typecheck(atomic_t *, v); \ 70 - typecheck_fn(fastcall void (*)(atomic_t *), fail_fn); \ 70 + typecheck_fn(void (*)(atomic_t *), fail_fn); \ 71 71 \ 72 72 __asm__ __volatile__( \ 73 73 LOCK_PREFIX " incl (%%rdi) \n" \
-5
include/asm-x86_64/pgalloc.h
··· 18 18 set_pmd(pmd, __pmd(_PAGE_TABLE | (page_to_pfn(pte) << PAGE_SHIFT))); 19 19 } 20 20 21 - static inline pmd_t *get_pmd(void) 22 - { 23 - return (pmd_t *)get_zeroed_page(GFP_KERNEL); 24 - } 25 - 26 21 static inline void pmd_free(pmd_t *pmd) 27 22 { 28 23 BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-9
include/asm-x86_64/pgtable.h
··· 359 359 #define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot)) 360 360 #define mk_pte_huge(entry) (pte_val(entry) |= _PAGE_PRESENT | _PAGE_PSE) 361 361 362 - /* physical address -> PTE */ 363 - static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot) 364 - { 365 - pte_t pte; 366 - pte_val(pte) = physpage | pgprot_val(pgprot); 367 - pte_val(pte) &= __supported_pte_mask; 368 - return pte; 369 - } 370 - 371 362 /* Change flags of a PTE */ 372 363 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) 373 364 {
+14
include/asm-x86_64/uaccess.h
··· 367 367 return copy_user_generic((__force void *)dst, src, size); 368 368 } 369 369 370 + #define ARCH_HAS_NOCACHE_UACCESS 1 371 + extern long __copy_user_nocache(void *dst, const void __user *src, unsigned size, int zerorest); 372 + 373 + static inline int __copy_from_user_nocache(void *dst, const void __user *src, unsigned size) 374 + { 375 + might_sleep(); 376 + return __copy_user_nocache(dst, (__force void *)src, size, 1); 377 + } 378 + 379 + static inline int __copy_from_user_inatomic_nocache(void *dst, const void __user *src, unsigned size) 380 + { 381 + return __copy_user_nocache(dst, (__force void *)src, size, 0); 382 + } 383 + 370 384 #endif /* __X86_64_UACCESS_H */
-5
include/asm-x86_64/vsyscall.h
··· 56 56 extern int vgetcpu_mode; 57 57 extern struct timezone sys_tz; 58 58 extern int sysctl_vsyscall; 59 - extern seqlock_t xtime_lock; 60 - 61 - extern int sysctl_vsyscall; 62 - 63 - #define ARCH_HAVE_XTIME_LOCK 1 64 59 65 60 #endif /* __KERNEL__ */ 66 61
+1
include/linux/binfmts.h
··· 59 59 int (*load_shlib)(struct file *); 60 60 int (*core_dump)(long signr, struct pt_regs * regs, struct file * file); 61 61 unsigned long min_coredump; /* minimal dump size */ 62 + int hasvdso; 62 63 }; 63 64 64 65 extern int register_binfmt(struct linux_binfmt *);
+8 -1
include/linux/nmi.h
··· 17 17 #ifdef ARCH_HAS_NMI_WATCHDOG 18 18 #include <asm/nmi.h> 19 19 extern void touch_nmi_watchdog(void); 20 + extern void acpi_nmi_disable(void); 21 + extern void acpi_nmi_enable(void); 20 22 #else 21 - # define touch_nmi_watchdog() touch_softlockup_watchdog() 23 + static inline void touch_nmi_watchdog(void) 24 + { 25 + touch_softlockup_watchdog(); 26 + } 27 + static inline void acpi_nmi_disable(void) { } 28 + static inline void acpi_nmi_enable(void) { } 22 29 #endif 23 30 24 31 #ifndef trigger_all_cpu_backtrace
+1 -1
include/linux/time.h
··· 90 90 91 91 extern struct timespec xtime; 92 92 extern struct timespec wall_to_monotonic; 93 - extern seqlock_t xtime_lock; 93 + extern seqlock_t xtime_lock __attribute__((weak)); 94 94 95 95 void timekeeping_init(void); 96 96
+45 -36
init/main.c
··· 726 726 kernel_execve(init_filename, argv_init, envp_init); 727 727 } 728 728 729 - static int init(void * unused) 729 + /* This is a non __init function. Force it to be noinline otherwise gcc 730 + * makes it inline to init() and it becomes part of init.text section 731 + */ 732 + static int noinline init_post(void) 733 + { 734 + free_initmem(); 735 + unlock_kernel(); 736 + mark_rodata_ro(); 737 + system_state = SYSTEM_RUNNING; 738 + numa_default_policy(); 739 + 740 + if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0) 741 + printk(KERN_WARNING "Warning: unable to open an initial console.\n"); 742 + 743 + (void) sys_dup(0); 744 + (void) sys_dup(0); 745 + 746 + if (ramdisk_execute_command) { 747 + run_init_process(ramdisk_execute_command); 748 + printk(KERN_WARNING "Failed to execute %s\n", 749 + ramdisk_execute_command); 750 + } 751 + 752 + /* 753 + * We try each of these until one succeeds. 754 + * 755 + * The Bourne shell can be used instead of init if we are 756 + * trying to recover a really broken machine. 757 + */ 758 + if (execute_command) { 759 + run_init_process(execute_command); 760 + printk(KERN_WARNING "Failed to execute %s. Attempting " 761 + "defaults...\n", execute_command); 762 + } 763 + run_init_process("/sbin/init"); 764 + run_init_process("/etc/init"); 765 + run_init_process("/bin/init"); 766 + run_init_process("/bin/sh"); 767 + 768 + panic("No init found. Try passing init= option to kernel."); 769 + } 770 + 771 + static int __init init(void * unused) 730 772 { 731 773 lock_kernel(); 732 774 /* ··· 816 774 * we're essentially up and running. Get rid of the 817 775 * initmem segments and start the user-mode stuff.. 
818 776 */ 819 - free_initmem(); 820 - unlock_kernel(); 821 - mark_rodata_ro(); 822 - system_state = SYSTEM_RUNNING; 823 - numa_default_policy(); 824 - 825 - if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0) 826 - printk(KERN_WARNING "Warning: unable to open an initial console.\n"); 827 - 828 - (void) sys_dup(0); 829 - (void) sys_dup(0); 830 - 831 - if (ramdisk_execute_command) { 832 - run_init_process(ramdisk_execute_command); 833 - printk(KERN_WARNING "Failed to execute %s\n", 834 - ramdisk_execute_command); 835 - } 836 - 837 - /* 838 - * We try each of these until one succeeds. 839 - * 840 - * The Bourne shell can be used instead of init if we are 841 - * trying to recover a really broken machine. 842 - */ 843 - if (execute_command) { 844 - run_init_process(execute_command); 845 - printk(KERN_WARNING "Failed to execute %s. Attempting " 846 - "defaults...\n", execute_command); 847 - } 848 - run_init_process("/sbin/init"); 849 - run_init_process("/etc/init"); 850 - run_init_process("/bin/init"); 851 - run_init_process("/bin/sh"); 852 - 853 - panic("No init found. Try passing init= option to kernel."); 777 + init_post(); 778 + return 0; 854 779 }
+30 -14
kernel/kmod.c
··· 217 217 sub_info->retval = ret; 218 218 } 219 219 220 - complete(sub_info->complete); 220 + if (sub_info->wait < 0) 221 + kfree(sub_info); 222 + else 223 + complete(sub_info->complete); 221 224 return 0; 222 225 } 223 226 ··· 242 239 pid = kernel_thread(____call_usermodehelper, sub_info, 243 240 CLONE_VFORK | SIGCHLD); 244 241 242 + if (wait < 0) 243 + return; 244 + 245 245 if (pid < 0) { 246 246 sub_info->retval = pid; 247 247 complete(sub_info->complete); ··· 259 253 * @envp: null-terminated environment list 260 254 * @session_keyring: session keyring for process (NULL for an empty keyring) 261 255 * @wait: wait for the application to finish and return status. 256 + * when -1 don't wait at all, but you get no useful error back when 257 + * the program couldn't be exec'ed. This makes it safe to call 258 + * from interrupt context. 262 259 * 263 260 * Runs a user-space application. The application is started 264 261 * asynchronously if wait is not set, and runs as a child of keventd. 
··· 274 265 struct key *session_keyring, int wait) 275 266 { 276 267 DECLARE_COMPLETION_ONSTACK(done); 277 - struct subprocess_info sub_info = { 278 - .work = __WORK_INITIALIZER(sub_info.work, 279 - __call_usermodehelper), 280 - .complete = &done, 281 - .path = path, 282 - .argv = argv, 283 - .envp = envp, 284 - .ring = session_keyring, 285 - .wait = wait, 286 - .retval = 0, 287 - }; 268 + struct subprocess_info *sub_info; 269 + int retval; 288 270 289 271 if (!khelper_wq) 290 272 return -EBUSY; ··· 283 283 if (path[0] == '\0') 284 284 return 0; 285 285 286 - queue_work(khelper_wq, &sub_info.work); 286 + sub_info = kzalloc(sizeof(struct subprocess_info), GFP_ATOMIC); 287 + if (!sub_info) 288 + return -ENOMEM; 289 + 290 + INIT_WORK(&sub_info->work, __call_usermodehelper); 291 + sub_info->complete = &done; 292 + sub_info->path = path; 293 + sub_info->argv = argv; 294 + sub_info->envp = envp; 295 + sub_info->ring = session_keyring; 296 + sub_info->wait = wait; 297 + 298 + queue_work(khelper_wq, &sub_info->work); 299 + if (wait < 0) /* task has freed sub_info */ 300 + return 0; 287 301 wait_for_completion(&done); 288 - return sub_info.retval; 302 + retval = sub_info->retval; 303 + kfree(sub_info); 304 + return retval; 289 305 } 290 306 EXPORT_SYMBOL(call_usermodehelper_keys); 291 307
+7
kernel/sched.c
··· 1853 1853 struct mm_struct *mm = next->mm; 1854 1854 struct mm_struct *oldmm = prev->active_mm; 1855 1855 1856 + /* 1857 + * For paravirt, this is coupled with an exit in switch_to to 1858 + * combine the page table reload and the switch backend into 1859 + * one hypercall. 1860 + */ 1861 + arch_enter_lazy_cpu_mode(); 1862 + 1856 1863 if (!mm) { 1857 1864 next->active_mm = oldmm; 1858 1865 atomic_inc(&oldmm->mm_count);
+1 -3
kernel/timer.c
··· 1162 1162 * This read-write spinlock protects us from races in SMP while 1163 1163 * playing with xtime and avenrun. 1164 1164 */ 1165 - #ifndef ARCH_HAVE_XTIME_LOCK 1166 - __cacheline_aligned_in_smp DEFINE_SEQLOCK(xtime_lock); 1165 + __attribute__((weak)) __cacheline_aligned_in_smp DEFINE_SEQLOCK(xtime_lock); 1167 1166 1168 1167 EXPORT_SYMBOL(xtime_lock); 1169 - #endif 1170 1168 1171 1169 /* 1172 1170 * This function runs timers and the timer-tq in bottom half context.
+9 -1
scripts/mod/modpost.c
··· 641 641 if (f1 && f2) 642 642 return 1; 643 643 644 - /* Whitelist all references from .pci_fixup section if vmlinux 644 + /* Whitelist all references from .pci_fixup section if vmlinux 645 + * Whitelist all references from .text.head to .init.data if vmlinux 646 + * Whitelist all references from .text.head to .init.text if vmlinux 647 + */ 645 648 if (is_vmlinux(modname)) { 646 649 if ((strcmp(fromsec, ".pci_fixup") == 0) && 647 650 (strcmp(tosec, ".init.text") == 0)) 651 + return 1; 652 + 653 + if ((strcmp(fromsec, ".text.head") == 0) && 654 + ((strcmp(tosec, ".init.data") == 0) || 655 + (strcmp(tosec, ".init.text") == 0))) 648 656 return 1; 649 657 650 658 /* Check for pattern 3 */