···
 	Note that, rcu_assign_pointer() and rcu_dereference() relate to
 	SRCU just as they do to other forms of RCU.
+
+15.	The whole point of call_rcu(), synchronize_rcu(), and friends
+	is to wait until all pre-existing readers have finished before
+	carrying out some otherwise-destructive operation.  It is
+	therefore critically important to -first- remove any path
+	that readers can follow that could be affected by the
+	destructive operation, and -only- -then- invoke call_rcu(),
+	synchronize_rcu(), or friends.
+
+	Because these primitives only wait for pre-existing readers,
+	it is the caller's responsibility to guarantee safety to
+	any subsequent readers.
+9
Documentation/feature-removal-schedule.txt
···
 	Secmark, it is time to deprecate the older mechanism and start the
 	process of removing the old code.
 Who:	Paul Moore <paul.moore@hp.com>
+---------------------------
+
+What:	sysfs ui for changing p4-clockmod parameters
+When:	September 2009
+Why:	See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and
+	e088e4c9cdb618675874becb91b2fd581ee707e6.
+	Removal is subject to fixing any remaining bugs in ACPI which may
+	cause the thermal throttling not to happen at the right time.
+Who:	Dave Jones <davej@redhat.com>, Matthew Garrett <mjg@redhat.com>
···
+Options for the ipv6 module are supplied as parameters at load time.
+
+Module options may be given as command line arguments to the insmod
+or modprobe command, but are usually specified in either the
+/etc/modules.conf or /etc/modprobe.conf configuration file, or in a
+distro-specific configuration file.
+
+The available ipv6 module parameters are listed below.  If a parameter
+is not specified the default value is used.
+
+The parameters are as follows:
+
+disable
+
+	Specifies whether to load the IPv6 module, but disable all
+	its functionality.  This might be used when another module
+	has a dependency on the IPv6 module being loaded, but no
+	IPv6 addresses or operations are desired.
+
+	The possible values and their effects are:
+
+	0
+		IPv6 is enabled.
+
+		This is the default value.
+
+	1
+		IPv6 is disabled.
+
+		No IPv6 addresses will be added to interfaces, and
+		it will not be possible to open an IPv6 socket.
+
+		A reboot is required to enable IPv6.
+
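As a concrete illustration of the `disable` parameter described above (the file path follows the modprobe.conf convention mentioned in the text; adjust for your distro's configuration layout):

```
# /etc/modprobe.conf -- load the IPv6 module but disable all its
# functionality, e.g. to satisfy another module's dependency on ipv6
# without bringing up any IPv6 addresses or sockets:
options ipv6 disable=1
```

The same effect can be had for a one-off manual load with `modprobe ipv6 disable=1`.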
+101
Documentation/x86/earlyprintk.txt
···
+Mini-HOWTO for using the earlyprintk=dbgp boot option with a
+USB2 Debug port key and a debug cable, on x86 systems.
+
+You need two computers, the 'USB debug key' special gadget and
+two USB cables, connected like this:
+
+  [host/target] <-------> [USB debug key] <-------> [client/console]
+
+1. There are three specific hardware requirements:
+
+ a.) Host/target system needs to have USB debug port capability.
+
+ You can check this capability by looking at a 'Debug port' bit in
+ the lspci -vvv output:
+
+  # lspci -vvv
+  ...
+  00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03) (prog-if 20 [EHCI])
+          Subsystem: Lenovo ThinkPad T61
+          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
+          Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+          Latency: 0
+          Interrupt: pin D routed to IRQ 19
+          Region 0: Memory at fe227000 (32-bit, non-prefetchable) [size=1K]
+          Capabilities: [50] Power Management version 2
+                  Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
+                  Status: D0 PME-Enable- DSel=0 DScale=0 PME+
+          Capabilities: [58] Debug port: BAR=1 offset=00a0
+                       ^^^^^^^^^^^ <==================== [ HERE ]
+          Kernel driver in use: ehci_hcd
+          Kernel modules: ehci-hcd
+  ...
+
+( If your system does not list a debug port capability then you probably
+  won't be able to use the USB debug key. )
+
+ b.) You also need a Netchip USB debug cable/key:
+
+        http://www.plxtech.com/products/NET2000/NET20DC/default.asp
+
+     This is a small blue plastic connector with two USB connections;
+     it draws power from its USB connections.
+
+ c.) Thirdly, you need a second client/console system with a regular USB port.
+
+2. Software requirements:
+
+ a.) On the host/target system:
+
+    You need to enable the following kernel config option:
+
+      CONFIG_EARLY_PRINTK_DBGP=y
+
+    And you need to add the boot command line: "earlyprintk=dbgp".
+    (If you are using Grub, append it to the 'kernel' line in
+     /etc/grub.conf)
+
+    NOTE: normally the earlyprintk console gets turned off once the
+    regular console is alive - use "earlyprintk=dbgp,keep" to keep
+    this channel open beyond early bootup. This can be useful for
+    debugging crashes under Xorg, etc.
+
+ b.) On the client/console system:
+
+    You should enable the following kernel config option:
+
+      CONFIG_USB_SERIAL_DEBUG=y
+
+    On the next bootup with the modified kernel you should
+    get a /dev/ttyUSBx device(s).
+
+    Now this channel of kernel messages is ready to be used: start
+    your favorite terminal emulator (minicom, etc.) and set
+    it up to use /dev/ttyUSB0 - or use a raw 'cat /dev/ttyUSBx' to
+    see the raw output.
+
+ c.) On Nvidia Southbridge based systems: the kernel will try to probe
+     and find out which port has the debug device connected.
+
+3. Testing that it works fine:
+
+   You can test the output by using earlyprintk=dbgp,keep and provoking
+   kernel messages on the host/target system. You can provoke a harmless
+   kernel message by for example doing:
+
+     echo h > /proc/sysrq-trigger
+
+   On the host/target system you should see this help line in "dmesg" output:
+
+     SysRq : HELP : loglevel(0-9) reBoot Crashdump terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount show-blocked-tasks(W) dump-ftrace-buffer(Z)
+
+   On the client/console system do:
+
+     cat /dev/ttyUSB0
+
+   And you should see the help line above displayed shortly after you've
+   provoked it on the host system.
+
+If it does not work then please ask about it on the linux-kernel@vger.kernel.org
+mailing list or contact the x86 maintainers.
···
 	if (alpha_using_srm) {
 		static struct vm_struct console_remap_vm;
-		unsigned long vaddr = VMALLOC_START;
+		unsigned long nr_pages = 0;
+		unsigned long vaddr;
 		unsigned long i, j;
+
+		/* calculate needed size */
+		for (i = 0; i < crb->map_entries; ++i)
+			nr_pages += crb->map[i].count;
+
+		/* register the vm area */
+		console_remap_vm.flags = VM_ALLOC;
+		console_remap_vm.size = nr_pages << PAGE_SHIFT;
+		vm_area_register_early(&console_remap_vm, PAGE_SIZE);
+
+		vaddr = (unsigned long)console_remap_vm.addr;

 		/* Set up the third level PTEs and update the virtual
 		   addresses of the CRB entries. */
···
 			vaddr += PAGE_SIZE;
 		}
 	}
-
-	/* Let vmalloc know that we've allocated some space. */
-	console_remap_vm.flags = VM_ALLOC;
-	console_remap_vm.addr = (void *) VMALLOC_START;
-	console_remap_vm.size = vaddr - VMALLOC_START;
-	vmlist = &console_remap_vm;

 	callback_init_done = 1;
+7-6
arch/arm/kernel/setup.c
···
 	unsigned int cachetype = read_cpuid_cachetype();
 	unsigned int arch = cpu_architecture();

-	if (arch >= CPU_ARCH_ARMv7) {
-		cacheid = CACHEID_VIPT_NONALIASING;
-		if ((cachetype & (3 << 14)) == 1 << 14)
-			cacheid |= CACHEID_ASID_TAGGED;
-	} else if (arch >= CPU_ARCH_ARMv6) {
-		if (cachetype & (1 << 23))
+	if (arch >= CPU_ARCH_ARMv6) {
+		if ((cachetype & (7 << 29)) == 4 << 29) {
+			/* ARMv7 register format */
+			cacheid = CACHEID_VIPT_NONALIASING;
+			if ((cachetype & (3 << 14)) == 1 << 14)
+				cacheid |= CACHEID_ASID_TAGGED;
+		} else if (cachetype & (1 << 23))
 			cacheid = CACHEID_VIPT_ALIASING;
 		else
 			cacheid = CACHEID_VIPT_NONALIASING;
···
 config PM_WAKEUP_BY_GPIO
 	bool "Allow Wakeup from Standby by GPIO"
+	depends on PM && !BF54x

 config PM_WAKEUP_GPIO_NUMBER
 	int "GPIO number"
···
 	default n
 	help
 	  Enable General-Purpose Wake-Up (Voltage Regulator Power-Up)
+	  (all processors, except ADSP-BF549). This option sets
+	  the general-purpose wake-up enable (GPWE) control bit to enable
+	  wake-up upon detection of an active low signal on the /GPW (PH7) pin.
+	  On ADSP-BF549 this option enables the same functionality on the
+	  /MRXON pin, also PH7.
+
 endmenu

 menu "CPU Frequency scaling"
-6
arch/blackfin/Kconfig.debug
···
 config HAVE_ARCH_KGDB
 	def_bool y

-config KGDB_TESTCASE
-	tristate "KGDB: for test case in expect"
-	default n
-	help
-	  This is a kgdb test case for automated testing.
-
 config DEBUG_VERBOSE
 	bool "Verbose fault messages"
 	default y
+59-4
arch/blackfin/configs/BF518F-EZBRD_defconfig
···
 #
 # Automatically generated make config: don't edit
-# Linux kernel version: 2.6.28-rc2
-# Fri Jan  9 17:58:41 2009
+# Linux kernel version: 2.6.28
+# Fri Feb 20 10:01:44 2009
 #
 # CONFIG_MMU is not set
 # CONFIG_FPU is not set
···
 # CONFIG_BF538 is not set
 # CONFIG_BF539 is not set
 # CONFIG_BF542 is not set
+# CONFIG_BF542M is not set
 # CONFIG_BF544 is not set
+# CONFIG_BF544M is not set
 # CONFIG_BF547 is not set
+# CONFIG_BF547M is not set
 # CONFIG_BF548 is not set
+# CONFIG_BF548M is not set
 # CONFIG_BF549 is not set
+# CONFIG_BF549M is not set
 # CONFIG_BF561 is not set
 CONFIG_BF_REV_MIN=0
 CONFIG_BF_REV_MAX=2
···
 # CONFIG_TIPC is not set
 # CONFIG_ATM is not set
 # CONFIG_BRIDGE is not set
-# CONFIG_NET_DSA is not set
+CONFIG_NET_DSA=y
+# CONFIG_NET_DSA_TAG_DSA is not set
+# CONFIG_NET_DSA_TAG_EDSA is not set
+# CONFIG_NET_DSA_TAG_TRAILER is not set
+CONFIG_NET_DSA_TAG_STPID=y
+# CONFIG_NET_DSA_MV88E6XXX is not set
+# CONFIG_NET_DSA_MV88E6060 is not set
+# CONFIG_NET_DSA_MV88E6XXX_NEED_PPU is not set
+# CONFIG_NET_DSA_MV88E6131 is not set
+# CONFIG_NET_DSA_MV88E6123_61_65 is not set
+CONFIG_NET_DSA_KSZ8893M=y
 # CONFIG_VLAN_8021Q is not set
 # CONFIG_DECNET is not set
 # CONFIG_LLC2 is not set
···
 #
 # Self-contained MTD device drivers
 #
+# CONFIG_MTD_DATAFLASH is not set
+# CONFIG_MTD_M25P80 is not set
 # CONFIG_MTD_SLRAM is not set
 # CONFIG_MTD_PHRAM is not set
 # CONFIG_MTD_MTDRAM is not set
···
 # CONFIG_BLK_DEV_HD is not set
 CONFIG_MISC_DEVICES=y
 # CONFIG_EEPROM_93CX6 is not set
+# CONFIG_ICS932S401 is not set
 # CONFIG_ENCLOSURE_SERVICES is not set
+# CONFIG_C2PORT is not set
 CONFIG_HAVE_IDE=y
 # CONFIG_IDE is not set
···
 # CONFIG_SMC91X is not set
 # CONFIG_SMSC911X is not set
 # CONFIG_DM9000 is not set
+# CONFIG_ENC28J60 is not set
 # CONFIG_IBM_NEW_EMAC_ZMII is not set
 # CONFIG_IBM_NEW_EMAC_RGMII is not set
 # CONFIG_IBM_NEW_EMAC_TAH is not set
···
 # CONFIG_I2C_DEBUG_ALGO is not set
 # CONFIG_I2C_DEBUG_BUS is not set
 # CONFIG_I2C_DEBUG_CHIP is not set
-# CONFIG_SPI is not set
+CONFIG_SPI=y
+# CONFIG_SPI_DEBUG is not set
+CONFIG_SPI_MASTER=y
+
+#
+# SPI Master Controller Drivers
+#
+CONFIG_SPI_BFIN=y
+# CONFIG_SPI_BFIN_LOCK is not set
+# CONFIG_SPI_BITBANG is not set
+
+#
+# SPI Protocol Masters
+#
+# CONFIG_SPI_AT25 is not set
+# CONFIG_SPI_SPIDEV is not set
+# CONFIG_SPI_TLE62X0 is not set
 CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
 # CONFIG_GPIOLIB is not set
 # CONFIG_W1 is not set
···
 # CONFIG_MFD_SM501 is not set
 # CONFIG_HTC_PASIC3 is not set
 # CONFIG_MFD_TMIO is not set
+# CONFIG_PMIC_DA903X is not set
 # CONFIG_MFD_WM8400 is not set
 # CONFIG_MFD_WM8350_I2C is not set
+# CONFIG_REGULATOR is not set

 #
 # Multimedia devices
···
 # CONFIG_RTC_DRV_M41T80 is not set
 # CONFIG_RTC_DRV_S35390A is not set
 # CONFIG_RTC_DRV_FM3130 is not set
+# CONFIG_RTC_DRV_RX8581 is not set

 #
 # SPI RTC drivers
 #
+# CONFIG_RTC_DRV_M41T94 is not set
+# CONFIG_RTC_DRV_DS1305 is not set
+# CONFIG_RTC_DRV_DS1390 is not set
+# CONFIG_RTC_DRV_MAX6902 is not set
+# CONFIG_RTC_DRV_R9701 is not set
+# CONFIG_RTC_DRV_RS5C348 is not set
+# CONFIG_RTC_DRV_DS3234 is not set

 #
 # Platform RTC drivers
···
 # CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
 # CONFIG_FAULT_INJECTION is not set
 CONFIG_SYSCTL_SYSCALL_CHECK=y
+
+#
+# Tracers
+#
+# CONFIG_SCHED_TRACER is not set
+# CONFIG_CONTEXT_SWITCH_TRACER is not set
+# CONFIG_BOOT_TRACER is not set
 # CONFIG_DYNAMIC_PRINTK_DEBUG is not set
 # CONFIG_SAMPLES is not set
 CONFIG_HAVE_ARCH_KGDB=y
 # CONFIG_KGDB is not set
 # CONFIG_DEBUG_STACKOVERFLOW is not set
 # CONFIG_DEBUG_STACK_USAGE is not set
+# CONFIG_KGDB_TESTCASE is not set
 CONFIG_DEBUG_VERBOSE=y
 CONFIG_DEBUG_MMRS=y
 # CONFIG_DEBUG_HWERR is not set
···
 #
 # CONFIG_CRYPTO_FIPS is not set
 # CONFIG_CRYPTO_MANAGER is not set
+# CONFIG_CRYPTO_MANAGER2 is not set
 # CONFIG_CRYPTO_GF128MUL is not set
 # CONFIG_CRYPTO_NULL is not set
 # CONFIG_CRYPTO_CRYPTD is not set
+2-2
arch/blackfin/configs/BF527-EZKIT_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_MPU is not set

 #
+2-2
arch/blackfin/configs/BF533-EZKIT_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_MPU is not set

 #
+2-2
arch/blackfin/configs/BF533-STAMP_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_MPU is not set

 #
+3-11
arch/blackfin/configs/BF537-STAMP_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_MPU is not set

 #
···
 # CONFIG_MTD_DOC2000 is not set
 # CONFIG_MTD_DOC2001 is not set
 # CONFIG_MTD_DOC2001PLUS is not set
-CONFIG_MTD_NAND=m
-# CONFIG_MTD_NAND_VERIFY_WRITE is not set
-# CONFIG_MTD_NAND_ECC_SMC is not set
-# CONFIG_MTD_NAND_MUSEUM_IDS is not set
-# CONFIG_MTD_NAND_BFIN is not set
-CONFIG_MTD_NAND_IDS=m
-# CONFIG_MTD_NAND_DISKONCHIP is not set
-# CONFIG_MTD_NAND_NANDSIM is not set
-CONFIG_MTD_NAND_PLATFORM=m
+# CONFIG_MTD_NAND is not set
 # CONFIG_MTD_ONENAND is not set

 #
+2-2
arch/blackfin/configs/BF538-EZKIT_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_MPU is not set

 #
+3-3
arch/blackfin/configs/BF548-EZKIT_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_BFIN_L2_CACHEABLE is not set
 # CONFIG_MPU is not set
···
 CONFIG_SCSI_DMA=y
 # CONFIG_SCSI_TGT is not set
 # CONFIG_SCSI_NETLINK is not set
-CONFIG_SCSI_PROC_FS=y
+# CONFIG_SCSI_PROC_FS is not set

 #
 # SCSI support type (disk, tape, CD-ROM)
+2-2
arch/blackfin/configs/BF561-EZKIT_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_BFIN_L2_CACHEABLE is not set
 # CONFIG_MPU is not set
+2-2
arch/blackfin/configs/BlackStamp_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_MPU is not set

 #
+2-2
arch/blackfin/configs/CM-BF527_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 # CONFIG_MPU is not set

 #
+3-3
arch/blackfin/configs/CM-BF548_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 CONFIG_L1_MAX_PIECE=16
 # CONFIG_MPU is not set
···
 CONFIG_SCSI_DMA=y
 # CONFIG_SCSI_TGT is not set
 # CONFIG_SCSI_NETLINK is not set
-CONFIG_SCSI_PROC_FS=y
+# CONFIG_SCSI_PROC_FS is not set

 #
 # SCSI support type (disk, tape, CD-ROM)
+1-1
arch/blackfin/configs/IP0X_defconfig
···
 CONFIG_SCSI=y
 # CONFIG_SCSI_TGT is not set
 # CONFIG_SCSI_NETLINK is not set
-CONFIG_SCSI_PROC_FS=y
+# CONFIG_SCSI_PROC_FS is not set

 #
 # SCSI support type (disk, tape, CD-ROM)
+2-2
arch/blackfin/configs/SRV1_defconfig
···
 CONFIG_BFIN_DCACHE=y
 # CONFIG_BFIN_DCACHE_BANKA is not set
 # CONFIG_BFIN_ICACHE_LOCK is not set
-# CONFIG_BFIN_WB is not set
-CONFIG_BFIN_WT=y
+CONFIG_BFIN_WB=y
+# CONFIG_BFIN_WT is not set
 CONFIG_L1_MAX_PIECE=16

 #
···
 /*
- * File:         include/asm-blackfin/bfin_sport.h
- * Based on:
- * Author:       Roy Huang (roy.huang@analog.com)
+ * bfin_sport.h - userspace header for bfin sport driver
  *
- * Created:      Thu Aug. 24 2006
- * Description:
+ * Copyright 2004-2008 Analog Devices Inc.
  *
- * Modified:
- *               Copyright 2004-2006 Analog Devices Inc.
- *
- * Bugs:         Enter bugs at http://blackfin.uclinux.org/
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see the file COPYING, or write
- * to the Free Software Foundation, Inc.,
- * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301  USA
+ * Licensed under the GPL-2 or later.
  */

 #ifndef __BFIN_SPORT_H__
···
 #define NORM_FORMAT	0x0
 #define ALAW_FORMAT	0x2
 #define ULAW_FORMAT	0x3
-struct sport_register;

 /* Function driver which use sport must initialize the structure */
 struct sport_config {
-	/*TDM (multichannels), I2S or other mode */
+	/* TDM (multichannels), I2S or other mode */
 	unsigned int mode:3;

 	/* if TDM mode is selected, channels must be set */
···
 	int serial_clk;
 	int fsync_clk;

-	unsigned int data_format:2;	/*Normal, u-law or a-law */
+	unsigned int data_format:2;	/* Normal, u-law or a-law */

 	int word_len;		/* How length of the word in bits, 3-32 bits */
 	int dma_enabled;
 };
+
+/* Userspace interface */
+#define SPORT_IOC_MAGIC		'P'
+#define SPORT_IOC_CONFIG	_IOWR('P', 0x01, struct sport_config)
+
+#ifdef __KERNEL__

 struct sport_register {
 	unsigned short tcr1;
···
 	unsigned long mrcs3;
 };

-#define SPORT_IOC_MAGIC		'P'
-#define SPORT_IOC_CONFIG	_IOWR('P', 0x01, struct sport_config)
-
 struct sport_dev {
 	struct cdev cdev;	/* Char device structure */
···
 	struct sport_config config;
 };

+#endif
+
 #define SPORT_TCR1	0
 #define SPORT_TCR2	1
 #define SPORT_TCLKDIV	2
···
 #define SPORT_MRCS2	22
 #define SPORT_MRCS3	23

-#endif				/*__BFIN_SPORT_H__*/
+#endif
+36-64
arch/blackfin/include/asm/ipipe.h
···
 #include <asm/atomic.h>
 #include <asm/traps.h>

-#define IPIPE_ARCH_STRING     "1.8-00"
+#define IPIPE_ARCH_STRING     "1.9-00"
 #define IPIPE_MAJOR_NUMBER    1
-#define IPIPE_MINOR_NUMBER    8
+#define IPIPE_MINOR_NUMBER    9
 #define IPIPE_PATCH_NUMBER    0

 #ifdef CONFIG_SMP
···
 	"%2 = CYCLES2\n"			\
 	"CC = %2 == %0\n"			\
 	"if ! CC jump 1b\n"			\
-	: "=r" (((unsigned long *)&t)[1]),	\
-	  "=r" (((unsigned long *)&t)[0]),	\
-	  "=r" (__cy2)				\
+	: "=d,a" (((unsigned long *)&t)[1]),	\
+	  "=d,a" (((unsigned long *)&t)[0]),	\
+	  "=d,a" (__cy2)			\
 	: /*no input*/ : "CC");			\
 	t;					\
 	})
···
 #define __ipipe_disable_irq(irq)	(irq_desc[irq].chip->mask(irq))

-#define __ipipe_lock_root()					\
-	set_bit(IPIPE_ROOTLOCK_FLAG, &ipipe_root_domain->flags)
+static inline int __ipipe_check_tickdev(const char *devname)
+{
+	return 1;
+}

-#define __ipipe_unlock_root()					\
-	clear_bit(IPIPE_ROOTLOCK_FLAG, &ipipe_root_domain->flags)
+static inline void __ipipe_lock_root(void)
+{
+	set_bit(IPIPE_SYNCDEFER_FLAG, &ipipe_root_cpudom_var(status));
+}
+
+static inline void __ipipe_unlock_root(void)
+{
+	clear_bit(IPIPE_SYNCDEFER_FLAG, &ipipe_root_cpudom_var(status));
+}

 void __ipipe_enable_pipeline(void);

 #define __ipipe_hook_critical_ipi(ipd) do { } while (0)

-#define __ipipe_sync_pipeline(syncmask)					\
-	do {								\
-		struct ipipe_domain *ipd = ipipe_current_domain;	\
-		if (likely(ipd != ipipe_root_domain || !test_bit(IPIPE_ROOTLOCK_FLAG, &ipd->flags))) \
-			__ipipe_sync_stage(syncmask);			\
-	} while (0)
+#define __ipipe_sync_pipeline ___ipipe_sync_pipeline
+void ___ipipe_sync_pipeline(unsigned long syncmask);

 void __ipipe_handle_irq(unsigned irq, struct pt_regs *regs);

 int __ipipe_get_irq_priority(unsigned irq);
-
-int __ipipe_get_irqthread_priority(unsigned irq);

 void __ipipe_stall_root_raw(void);

 void __ipipe_unstall_root_raw(void);

 void __ipipe_serial_debug(const char *fmt, ...);
+
+asmlinkage void __ipipe_call_irqtail(unsigned long addr);

 DECLARE_PER_CPU(struct pt_regs, __ipipe_tick_regs);
···
 #define __ipipe_run_irqtail()  /* Must be a macro */	\
 	do {						\
-		asmlinkage void __ipipe_call_irqtail(void);	\
 		unsigned long __pending;		\
-		CSYNC();				\
+		CSYNC();				\
 		__pending = bfin_read_IPEND();		\
 		if (__pending & 0x8000) {		\
 			__pending &= ~0x8010;		\
 			if (__pending && (__pending & (__pending - 1)) == 0)	\
-				__ipipe_call_irqtail();	\
+				__ipipe_call_irqtail(__ipipe_irq_tail_hook); \
 		}					\
 	} while (0)

 #define __ipipe_run_isr(ipd, irq)		\
 	do {					\
 		if (ipd == ipipe_root_domain) {	\
-			/*			\
-			 * Note: the I-pipe implements a threaded interrupt model on \
-			 * this arch for Linux external IRQs. The interrupt handler we \
-			 * call here only wakes up the associated IRQ thread. \
-			 */			\
-			if (ipipe_virtual_irq_p(irq)) {	\
-				/* No irqtail here; virtual interrupts have no effect \
-				   on IPEND so there is no need for processing \
-				   deferral. */	\
-				local_irq_enable_nohead(ipd);	\
+			local_irq_enable_hw();	\
+			if (ipipe_virtual_irq_p(irq))	\
 				ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie); \
-				local_irq_disable_nohead(ipd);	\
-			} else \
-				/*	\
-				 * No need to run the irqtail here either; \
-				 * we can't be preempted by hw IRQs, so \
-				 * non-Linux IRQs cannot stack over the short \
-				 * thread wakeup code. Which in turn means \
-				 * that no irqtail condition could be pending \
-				 * for domains above Linux in the pipeline. \
-				 */	\
+			else \
 				ipd->irqs[irq].handler(irq, &__raw_get_cpu_var(__ipipe_tick_regs)); \
+			local_irq_disable_hw();	\
 		} else {			\
 			__clear_bit(IPIPE_SYNC_FLAG, &ipipe_cpudom_var(ipd, status)); \
 			local_irq_enable_nohead(ipd);	\
···
 int ipipe_start_irq_thread(unsigned irq, struct irq_desc *desc);

-#define IS_SYSIRQ(irq)		((irq) > IRQ_CORETMR && (irq) <= SYS_IRQS)
-#define IS_GPIOIRQ(irq)		((irq) >= GPIO_IRQ_BASE && (irq) < NR_IRQS)
-
+#ifdef CONFIG_GENERIC_CLOCKEVENTS
+#define IRQ_SYSTMR		IRQ_CORETMR
+#define IRQ_PRIOTMR		IRQ_CORETMR
+#else
 #define IRQ_SYSTMR		IRQ_TIMER0
 #define IRQ_PRIOTMR		CONFIG_IRQ_TIMER0
+#endif

-#if defined(CONFIG_BF531) || defined(CONFIG_BF532) || defined(CONFIG_BF533)
-#define PRIO_GPIODEMUX(irq)	CONFIG_PFA
-#elif defined(CONFIG_BF534) || defined(CONFIG_BF536) || defined(CONFIG_BF537)
-#define PRIO_GPIODEMUX(irq)	CONFIG_IRQ_PROG_INTA
-#elif defined(CONFIG_BF52x)
-#define PRIO_GPIODEMUX(irq)	((irq) == IRQ_PORTF_INTA ? CONFIG_IRQ_PORTF_INTA : \
-				 (irq) == IRQ_PORTG_INTA ? CONFIG_IRQ_PORTG_INTA : \
-				 (irq) == IRQ_PORTH_INTA ? CONFIG_IRQ_PORTH_INTA : \
-				 -1)
-#elif defined(CONFIG_BF561)
-#define PRIO_GPIODEMUX(irq)	((irq) == IRQ_PROG0_INTA ? CONFIG_IRQ_PROG0_INTA : \
-				 (irq) == IRQ_PROG1_INTA ? CONFIG_IRQ_PROG1_INTA : \
-				 (irq) == IRQ_PROG2_INTA ? CONFIG_IRQ_PROG2_INTA : \
-				 -1)
+#ifdef CONFIG_BF561
 #define bfin_write_TIMER_DISABLE(val)	bfin_write_TMRS8_DISABLE(val)
 #define bfin_write_TIMER_ENABLE(val)	bfin_write_TMRS8_ENABLE(val)
 #define bfin_write_TIMER_STATUS(val)	bfin_write_TMRS8_STATUS(val)
 #define bfin_read_TIMER_STATUS()	bfin_read_TMRS8_STATUS()
 #elif defined(CONFIG_BF54x)
-#define PRIO_GPIODEMUX(irq)	((irq) == IRQ_PINT0 ? CONFIG_IRQ_PINT0 : \
-				 (irq) == IRQ_PINT1 ? CONFIG_IRQ_PINT1 : \
-				 (irq) == IRQ_PINT2 ? CONFIG_IRQ_PINT2 : \
-				 (irq) == IRQ_PINT3 ? CONFIG_IRQ_PINT3 : \
-				 -1)
 #define bfin_write_TIMER_DISABLE(val)	bfin_write_TIMER_DISABLE0(val)
 #define bfin_write_TIMER_ENABLE(val)	bfin_write_TIMER_ENABLE0(val)
 #define bfin_write_TIMER_STATUS(val)	bfin_write_TIMER_STATUS0(val)
 #define bfin_read_TIMER_STATUS(val)	bfin_read_TIMER_STATUS0(val)
-#else
-# error "no PRIO_GPIODEMUX() for this part"
 #endif

 #define __ipipe_root_tick_p(regs)	((regs->ipend & 0x10) != 0)
···
 #define __ipipe_root_tick_p(regs)	1

 #endif /* !CONFIG_IPIPE */
+
+#define ipipe_update_tick_evtdev(evtdev)	do { } while (0)

 #endif	/* !__ASM_BLACKFIN_IPIPE_H */
+4-8
arch/blackfin/include/asm/ipipe_base.h
···
 /*   -*- linux-c -*-
- *   include/asm-blackfin/_baseipipe.h
+ *   include/asm-blackfin/ipipe_base.h
  *
  *   Copyright (C) 2007 Philippe Gerum.
  *
···
 #define IPIPE_NR_XIRQS		NR_IRQS
 #define IPIPE_IRQ_ISHIFT	5	/* 2^5 for 32bits arch. */

-/* Blackfin-specific, global domain flags */
-#define IPIPE_ROOTLOCK_FLAG	1	/* Lock pipeline for root */
+/* Blackfin-specific, per-cpu pipeline status */
+#define IPIPE_SYNCDEFER_FLAG	15
+#define IPIPE_SYNCDEFER_MASK	(1L << IPIPE_SYNCDEFER_FLAG)

 /* Blackfin traps -- i.e. exception vector numbers */
 #define IPIPE_NR_FAULTS		52 /* We leave a gap after VEC_ILL_RES. */
···
 #define IPIPE_TIMER_IRQ		IRQ_CORETMR

 #ifndef __ASSEMBLY__
-
-#include <linux/bitops.h>
-
-extern int test_bit(int nr, const void *addr);
-
 extern unsigned long __ipipe_root_status; /* Alias to ipipe_root_cpudom_var(status) */
···
 #define TIF_MEMDIE		4
 #define TIF_RESTORE_SIGMASK	5	/* restore signal mask in do_signal() */
 #define TIF_FREEZE		6	/* is freezing for suspend */
+#define TIF_IRQ_SYNC		7	/* sync pipeline stage */

 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
···
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_RESTORE_SIGMASK	(1<<TIF_RESTORE_SIGMASK)
 #define _TIF_FREEZE		(1<<TIF_FREEZE)
+#define _TIF_IRQ_SYNC		(1<<TIF_IRQ_SYNC)

 #define _TIF_WORK_MASK		0x0000FFFE	/* work to do on interrupt/exception return */
+5-3
arch/blackfin/kernel/Makefile
···
     obj-y += time.o
 endif

-CFLAGS_kgdb_test.o := -mlong-calls -O0
-
 obj-$(CONFIG_IPIPE)                  += ipipe.o
 obj-$(CONFIG_IPIPE_TRACE_MCOUNT)     += mcount.o
 obj-$(CONFIG_BFIN_GPTIMERS)          += gptimers.o
 obj-$(CONFIG_CPLB_INFO)              += cplbinfo.o
 obj-$(CONFIG_MODULES)                += module.o
 obj-$(CONFIG_KGDB)                   += kgdb.o
-obj-$(CONFIG_KGDB_TESTCASE)          += kgdb_test.o
+obj-$(CONFIG_KGDB_TESTS)             += kgdb_test.o
 obj-$(CONFIG_EARLY_PRINTK)           += early_printk.o
+
+# the kgdb test puts code into L2 and without linker
+# relaxation, we need to force long calls to/from it
+CFLAGS_kgdb_test.o := -mlong-calls -O0
+4
arch/blackfin/kernel/cplb-nompu/cplbinit.c
···
 	i_d = i_i = 0;

+#ifdef CONFIG_DEBUG_HUNT_FOR_ZERO
 	/* Set up the zero page. */
 	d_tbl[i_d].addr = 0;
 	d_tbl[i_d++].data = SDRAM_OOPS | PAGE_SIZE_1KB;
+	i_tbl[i_i].addr = 0;
+	i_tbl[i_i++].data = SDRAM_OOPS | PAGE_SIZE_1KB;
+#endif

 	/* Cover kernel memory with 4M pages. */
 	addr = 0;
+48-132
arch/blackfin/kernel/ipipe.c
···
 #include <asm/atomic.h>
 #include <asm/io.h>

-static int create_irq_threads;
-
 DEFINE_PER_CPU(struct pt_regs, __ipipe_tick_regs);
-
-static DEFINE_PER_CPU(unsigned long, pending_irqthread_mask);
-
-static DEFINE_PER_CPU(int [IVG13 + 1], pending_irq_count);

 asmlinkage void asm_do_IRQ(unsigned int irq, struct pt_regs *regs);
···
  */
 void __ipipe_handle_irq(unsigned irq, struct pt_regs *regs)
 {
+	struct ipipe_percpu_domain_data *p = ipipe_root_cpudom_ptr();
 	struct ipipe_domain *this_domain, *next_domain;
 	struct list_head *head, *pos;
 	int m_ack, s = -1;
···
 	 * interrupt.
 	 */
 	m_ack = (regs == NULL || irq == IRQ_SYSTMR || irq == IRQ_CORETMR);
-
 	this_domain = ipipe_current_domain;

 	if (unlikely(test_bit(IPIPE_STICKY_FLAG, &this_domain->irqs[irq].control)))
···
 	next_domain = list_entry(head, struct ipipe_domain, p_link);
 	if (likely(test_bit(IPIPE_WIRED_FLAG, &next_domain->irqs[irq].control))) {
 		if (!m_ack && next_domain->irqs[irq].acknowledge != NULL)
-			next_domain->irqs[irq].acknowledge(irq, irq_desc + irq);
-		if (test_bit(IPIPE_ROOTLOCK_FLAG, &ipipe_root_domain->flags))
-			s = __test_and_set_bit(IPIPE_STALL_FLAG,
-					       &ipipe_root_cpudom_var(status));
+			next_domain->irqs[irq].acknowledge(irq, irq_to_desc(irq));
+		if (test_bit(IPIPE_SYNCDEFER_FLAG, &p->status))
+			s = __test_and_set_bit(IPIPE_STALL_FLAG, &p->status);
 		__ipipe_dispatch_wired(next_domain, irq);
-		goto finalize;
-		return;
+		goto out;
 	}
 	}

 	/* Ack the interrupt. */

 	pos = head;
-
 	while (pos != &__ipipe_pipeline) {
 		next_domain = list_entry(pos, struct ipipe_domain, p_link);
-		/*
-		 * For each domain handling the incoming IRQ, mark it
-		 * as pending in its log.
-		 */
 		if (test_bit(IPIPE_HANDLE_FLAG, &next_domain->irqs[irq].control)) {
-			/*
-			 * Domains that handle this IRQ are polled for
-			 * acknowledging it by decreasing priority
-			 * order. The interrupt must be made pending
-			 * _first_ in the domain's status flags before
-			 * the PIC is unlocked.
-			 */
 			__ipipe_set_irq_pending(next_domain, irq);
-
 			if (!m_ack && next_domain->irqs[irq].acknowledge != NULL) {
-				next_domain->irqs[irq].acknowledge(irq, irq_desc + irq);
+				next_domain->irqs[irq].acknowledge(irq, irq_to_desc(irq));
 				m_ack = 1;
 			}
 		}
-
-		/*
-		 * If the domain does not want the IRQ to be passed
-		 * down the interrupt pipe, exit the loop now.
-		 */
 		if (!test_bit(IPIPE_PASS_FLAG, &next_domain->irqs[irq].control))
 			break;
-
 		pos = next_domain->p_link.next;
 	}
···
 	 * immediately to the current domain if the interrupt has been
 	 * marked as 'sticky'. This search does not go beyond the
 	 * current domain in the pipeline. We also enforce the
-	 * additional root stage lock (blackfin-specific). */
+	 * additional root stage lock (blackfin-specific).
+	 */
+	if (test_bit(IPIPE_SYNCDEFER_FLAG, &p->status))
+		s = __test_and_set_bit(IPIPE_STALL_FLAG, &p->status);

-	if (test_bit(IPIPE_ROOTLOCK_FLAG, &ipipe_root_domain->flags))
-		s = __test_and_set_bit(IPIPE_STALL_FLAG,
-				       &ipipe_root_cpudom_var(status));
-finalize:
+	/*
+	 * If the interrupt preempted the head domain, then do not
+	 * even try to walk the pipeline, unless an interrupt is
+	 * pending for it.
+	 */
+	if (test_bit(IPIPE_AHEAD_FLAG, &this_domain->flags) &&
+	    ipipe_head_cpudom_var(irqpend_himask) == 0)
+		goto out;

 	__ipipe_walk_pipeline(head);
-
+out:
 	if (!s)
-		__clear_bit(IPIPE_STALL_FLAG,
-			    &ipipe_root_cpudom_var(status));
+		__clear_bit(IPIPE_STALL_FLAG, &p->status);
 }

 int __ipipe_check_root(void)
···
 void __ipipe_enable_irqdesc(struct ipipe_domain *ipd, unsigned irq)
 {
-	struct irq_desc *desc = irq_desc + irq;
+	struct irq_desc *desc = irq_to_desc(irq);
 	int prio = desc->ic_prio;

 	desc->depth = 0;
···
 void __ipipe_disable_irqdesc(struct ipipe_domain *ipd, unsigned irq)
 {
-	struct irq_desc *desc = irq_desc + irq;
+	struct irq_desc *desc = irq_to_desc(irq);
 	int prio = desc->ic_prio;

 	if (ipd != &ipipe_root &&
···
 {
 	unsigned long flags;

-	/* We need to run the IRQ tail hook whenever we don't
+	/*
+	 * We need to run the IRQ tail hook whenever we don't
 	 * propagate a syscall to higher domains, because we know that
 	 * important operations might be pending there (e.g. Xenomai
 	 * deferred rescheduling).
*/243243+ * deferred rescheduling).244244+ */222245223223- if (!__ipipe_syscall_watched_p(current, regs->orig_p0)) {246246+ if (regs->orig_p0 < NR_syscalls) {224247 void (*hook)(void) = (void (*)(void))__ipipe_irq_tail_hook;225248 hook();226226- return 0;249249+ if ((current->flags & PF_EVNOTIFY) == 0)250250+ return 0;227251 }228252229253 /*···294312{295313 unsigned long flags;296314315315+#ifdef CONFIG_IPIPE_DEBUG297316 if (irq >= IPIPE_NR_IRQS ||298317 (ipipe_virtual_irq_p(irq)299318 && !test_bit(irq - IPIPE_VIRQ_BASE, &__ipipe_virtual_irq_map)))300319 return -EINVAL;320320+#endif301321302322 local_irq_save_hw(flags);303303-304323 __ipipe_handle_irq(irq, NULL);305305-306324 local_irq_restore_hw(flags);307325308326 return 1;309327}310328311311-/* Move Linux IRQ to threads. */312312-313313-static int do_irqd(void *__desc)329329+asmlinkage void __ipipe_sync_root(void)314330{315315- struct irq_desc *desc = __desc;316316- unsigned irq = desc - irq_desc;317317- int thrprio = desc->thr_prio;318318- int thrmask = 1 << thrprio;319319- int cpu = smp_processor_id();320320- cpumask_t cpumask;331331+ unsigned long flags;321332322322- sigfillset(&current->blocked);323323- current->flags |= PF_NOFREEZE;324324- cpumask = cpumask_of_cpu(cpu);325325- set_cpus_allowed(current, cpumask);326326- ipipe_setscheduler_root(current, SCHED_FIFO, 50 + thrprio);333333+ BUG_ON(irqs_disabled());327334328328- while (!kthread_should_stop()) {329329- local_irq_disable();330330- if (!(desc->status & IRQ_SCHEDULED)) {331331- set_current_state(TASK_INTERRUPTIBLE);332332-resched:333333- local_irq_enable();334334- schedule();335335- local_irq_disable();336336- }337337- __set_current_state(TASK_RUNNING);338338- /*339339- * If higher priority interrupt servers are ready to340340- * run, reschedule immediately. 
We need this for the341341- * GPIO demux IRQ handler to unmask the interrupt line342342- * _last_, after all GPIO IRQs have run.343343- */344344- if (per_cpu(pending_irqthread_mask, cpu) & ~(thrmask|(thrmask-1)))345345- goto resched;346346- if (--per_cpu(pending_irq_count[thrprio], cpu) == 0)347347- per_cpu(pending_irqthread_mask, cpu) &= ~thrmask;348348- desc->status &= ~IRQ_SCHEDULED;349349- desc->thr_handler(irq, &__raw_get_cpu_var(__ipipe_tick_regs));350350- local_irq_enable();351351- }352352- __set_current_state(TASK_RUNNING);353353- return 0;335335+ local_irq_save_hw(flags);336336+337337+ clear_thread_flag(TIF_IRQ_SYNC);338338+339339+ if (ipipe_root_cpudom_var(irqpend_himask) != 0)340340+ __ipipe_sync_pipeline(IPIPE_IRQMASK_ANY);341341+342342+ local_irq_restore_hw(flags);354343}355344356356-static void kick_irqd(unsigned irq, void *cookie)345345+void ___ipipe_sync_pipeline(unsigned long syncmask)357346{358358- struct irq_desc *desc = irq_desc + irq;359359- int thrprio = desc->thr_prio;360360- int thrmask = 1 << thrprio;361361- int cpu = smp_processor_id();347347+ struct ipipe_domain *ipd = ipipe_current_domain;362348363363- if (!(desc->status & IRQ_SCHEDULED)) {364364- desc->status |= IRQ_SCHEDULED;365365- per_cpu(pending_irqthread_mask, cpu) |= thrmask;366366- ++per_cpu(pending_irq_count[thrprio], cpu);367367- wake_up_process(desc->thread);368368- }369369-}370370-371371-int ipipe_start_irq_thread(unsigned irq, struct irq_desc *desc)372372-{373373- if (desc->thread || !create_irq_threads)374374- return 0;375375-376376- desc->thread = kthread_create(do_irqd, desc, "IRQ %d", irq);377377- if (desc->thread == NULL) {378378- printk(KERN_ERR "irqd: could not create IRQ thread %d!\n", irq);379379- return -ENOMEM;349349+ if (ipd == ipipe_root_domain) {350350+ if (test_bit(IPIPE_SYNCDEFER_FLAG, &ipipe_root_cpudom_var(status)))351351+ return;380352 }381353382382- wake_up_process(desc->thread);383383-384384- desc->thr_handler = 
ipipe_root_domain->irqs[irq].handler;385385- ipipe_root_domain->irqs[irq].handler = &kick_irqd;386386-387387- return 0;388388-}389389-390390-void __init ipipe_init_irq_threads(void)391391-{392392- unsigned irq;393393- struct irq_desc *desc;394394-395395- create_irq_threads = 1;396396-397397- for (irq = 0; irq < NR_IRQS; irq++) {398398- desc = irq_desc + irq;399399- if (desc->action != NULL ||400400- (desc->status & IRQ_NOREQUEST) != 0)401401- ipipe_start_irq_thread(irq, desc);402402- }354354+ __ipipe_sync_stage(syncmask);403355}404356405357EXPORT_SYMBOL(show_stack);
+9-5
arch/blackfin/kernel/irqchip.c
···149149#endif150150 generic_handle_irq(irq);151151152152-#ifndef CONFIG_IPIPE /* Useless and bugous over the I-pipe: IRQs are threaded. */153153- /* If we're the only interrupt running (ignoring IRQ15 which is for154154- syscalls), lower our priority to IRQ14 so that softirqs run at155155- that level. If there's another, lower-level interrupt, irq_exit156156- will defer softirqs to that. */152152+#ifndef CONFIG_IPIPE153153+ /*154154+ * If we're the only interrupt running (ignoring IRQ15 which155155+ * is for syscalls), lower our priority to IRQ14 so that156156+ * softirqs run at that level. If there's another,157157+ * lower-level interrupt, irq_exit will defer softirqs to158158+ * that. If the interrupt pipeline is enabled, we are already159159+ * running at IRQ14 priority, so we don't need this code.160160+ */157161 CSYNC();158162 pending = bfin_read_IPEND() & ~0x8000;159163 other_ints = pending & (pending - 1);
+7-2
arch/blackfin/kernel/kgdb_test.c
···2020static char cmdline[256];2121static unsigned long len;22222323+#ifndef CONFIG_SMP2324static int num1 __attribute__((l1_data));24252526void kgdb_l1_test(void) __attribute__((l1_text));···3332 printk(KERN_ALERT "L1(after change) : data variable addr = 0x%p, data value is %d\n", &num1, num1);3433 return ;3534}3535+#endif3636+3637#if L2_LENGTH37383839static int num2 __attribute__((l2));···6259static int test_proc_output(char *buf)6360{6461 kgdb_test("hello world!", 12, 0x55, 0x10);6262+#ifndef CONFIG_SMP6563 kgdb_l1_test();6666- #if L2_LENGTH6464+#endif6565+#if L2_LENGTH6766 kgdb_l2_test();6868- #endif6767+#endif69687069 return 0;7170}
···889889 CPU, bfin_revid());890890 }891891892892+ /* We can't run on BF548-0.1 due to ANOMALY 05000448 */893893+ if (bfin_cpuid() == 0x27de && bfin_revid() == 1)894894+ panic("You can't run on this processor due to 05000448\n");895895+892896 printk(KERN_INFO "Blackfin Linux support by http://blackfin.uclinux.org/\n");893897894898 printk(KERN_INFO "Processor Speed: %lu MHz core clock and %lu MHz System Clock\n",···11451141 icache_size = 0;1146114211471143 seq_printf(m, "cache size\t: %d KB(L1 icache) "11481148- "%d KB(L1 dcache-%s) %d KB(L2 cache)\n",11441144+ "%d KB(L1 dcache%s) %d KB(L2 cache)\n",11491145 icache_size, dcache_size,11501146#if defined CONFIG_BFIN_WB11511151- "wb"11471147+ "-wb"11521148#elif defined CONFIG_BFIN_WT11531153- "wt"11491149+ "-wt"11541150#endif11551151 "", 0);11561152
+4-1
arch/blackfin/kernel/time.c
···134134135135 write_seqlock(&xtime_lock);136136#if defined(CONFIG_TICK_SOURCE_SYSTMR0) && !defined(CONFIG_IPIPE)137137-/* FIXME: Here TIMIL0 is not set when IPIPE enabled, why? */137137+ /*138138+ * TIMIL0 is latched in __ipipe_grab_irq() when the I-Pipe is139139+ * enabled.140140+ */138141 if (get_gptimer_status(0) & TIMER_STATUS_TIMIL0) {139142#endif140143 do_timer(1);
···22 * File: include/asm-blackfin/mach-bf518/anomaly.h33 * Bugs: Enter bugs at http://blackfin.uclinux.org/44 *55- * Copyright (C) 2004-2008 Analog Devices Inc.55+ * Copyright (C) 2004-2009 Analog Devices Inc.66 * Licensed under the GPL-2 or later.77 */8899/* This file should be up to date with:1010- * - ????1010+ * - Revision B, 02/03/2009; ADSP-BF512/BF514/BF516/BF518 Blackfin Processor Anomaly List1111 */12121313#ifndef _MACH_ANOMALY_H_···1919#define ANOMALY_05000122 (1)2020/* False Hardware Error from an Access in the Shadow of a Conditional Branch */2121#define ANOMALY_05000245 (1)2222+/* Incorrect Timer Pulse Width in Single-Shot PWM_OUT Mode with External Clock */2323+#define ANOMALY_05000254 (1)2224/* Sensitivity To Noise with Slow Input Edge Rates on External SPORT TX and RX Clocks */2325#define ANOMALY_05000265 (1)2426/* False Hardware Errors Caused by Fetches at the Boundary of Reserved Memory */···5553#define ANOMALY_05000443 (1)5654/* Incorrect L1 Instruction Bank B Memory Map Location */5755#define ANOMALY_05000444 (1)5656+/* Incorrect Default Hysteresis Setting for RESET, NMI, and BMODE Signals */5757+#define ANOMALY_05000452 (1)5858+/* PWM_TRIPB Signal Not Available on PG10 */5959+#define ANOMALY_05000453 (1)6060+/* PPI_FS3 is Driven One Half Cycle Later Than PPI Data */6161+#define ANOMALY_05000455 (1)58625963/* Anomalies that don't exist on this proc */6064#define ANOMALY_05000125 (0)···7365#define ANOMALY_05000263 (0)7466#define ANOMALY_05000266 (0)7567#define ANOMALY_05000273 (0)6868+#define ANOMALY_05000278 (0)7669#define ANOMALY_05000285 (0)7070+#define ANOMALY_05000305 (0)7771#define ANOMALY_05000307 (0)7872#define ANOMALY_05000311 (0)7973#define ANOMALY_05000312 (0)8074#define ANOMALY_05000323 (0)8175#define ANOMALY_05000353 (0)8276#define ANOMALY_05000363 (0)7777+#define ANOMALY_05000380 (0)8378#define ANOMALY_05000386 (0)8479#define ANOMALY_05000412 (0)8580#define ANOMALY_05000432 (0)8181+#define ANOMALY_05000447 (0)8282+#define ANOMALY_05000448 (0)86838784#endif
···3838 help3939 Core support for IP04/IP04 open hardware IP-PBX.40404141-config GENERIC_BF533_BOARD4242- bool "Generic"4343- help4444- Generic or Custom board support.4545-4641endchoice
···11-/*22- * File: arch/blackfin/mach-bf533/generic_board.c33- * Based on: arch/blackfin/mach-bf533/ezkit.c44- * Author: Aidan Williams <aidan@nicta.com.au>55- *66- * Created: 200577- * Description:88- *99- * Modified:1010- * Copyright 2005 National ICT Australia (NICTA)1111- * Copyright 2004-2006 Analog Devices Inc.1212- *1313- * Bugs: Enter bugs at http://blackfin.uclinux.org/1414- *1515- * This program is free software; you can redistribute it and/or modify1616- * it under the terms of the GNU General Public License as published by1717- * the Free Software Foundation; either version 2 of the License, or1818- * (at your option) any later version.1919- *2020- * This program is distributed in the hope that it will be useful,2121- * but WITHOUT ANY WARRANTY; without even the implied warranty of2222- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the2323- * GNU General Public License for more details.2424- *2525- * You should have received a copy of the GNU General Public License2626- * along with this program; if not, see the file COPYING, or write2727- * to the Free Software Foundation, Inc.,2828- * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA2929- */3030-3131-#include <linux/device.h>3232-#include <linux/platform_device.h>3333-#include <linux/irq.h>3434-3535-/*3636- * Name the Board for the /proc/cpuinfo3737- */3838-const char bfin_board_name[] = "UNKNOWN BOARD";3939-4040-#if defined(CONFIG_RTC_DRV_BFIN) || defined(CONFIG_RTC_DRV_BFIN_MODULE)4141-static struct platform_device rtc_device = {4242- .name = "rtc-bfin",4343- .id = -1,4444-};4545-#endif4646-4747-/*4848- * Driver needs to know address, irq and flag pin.4949- */5050-#if defined(CONFIG_SMC91X) || defined(CONFIG_SMC91X_MODULE)5151-static struct resource smc91x_resources[] = {5252- {5353- .start = 0x20300300,5454- .end = 0x20300300 + 16,5555- .flags = IORESOURCE_MEM,5656- }, {5757- .start = IRQ_PROG_INTB,5858- .end = IRQ_PROG_INTB,5959- .flags = IORESOURCE_IRQ | 
IORESOURCE_IRQ_HIGHLEVEL,6060- }, {6161- .start = IRQ_PF7,6262- .end = IRQ_PF7,6363- .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL,6464- },6565-};6666-6767-static struct platform_device smc91x_device = {6868- .name = "smc91x",6969- .id = 0,7070- .num_resources = ARRAY_SIZE(smc91x_resources),7171- .resource = smc91x_resources,7272-};7373-#endif7474-7575-#if defined(CONFIG_BFIN_SIR) || defined(CONFIG_BFIN_SIR_MODULE)7676-#ifdef CONFIG_BFIN_SIR07777-static struct resource bfin_sir0_resources[] = {7878- {7979- .start = 0xFFC00400,8080- .end = 0xFFC004FF,8181- .flags = IORESOURCE_MEM,8282- },8383- {8484- .start = IRQ_UART0_RX,8585- .end = IRQ_UART0_RX+1,8686- .flags = IORESOURCE_IRQ,8787- },8888- {8989- .start = CH_UART0_RX,9090- .end = CH_UART0_RX+1,9191- .flags = IORESOURCE_DMA,9292- },9393-};9494-9595-static struct platform_device bfin_sir0_device = {9696- .name = "bfin_sir",9797- .id = 0,9898- .num_resources = ARRAY_SIZE(bfin_sir0_resources),9999- .resource = bfin_sir0_resources,100100-};101101-#endif102102-#endif103103-104104-static struct platform_device *generic_board_devices[] __initdata = {105105-#if defined(CONFIG_RTC_DRV_BFIN) || defined(CONFIG_RTC_DRV_BFIN_MODULE)106106- &rtc_device,107107-#endif108108-109109-#if defined(CONFIG_SMC91X) || defined(CONFIG_SMC91X_MODULE)110110- &smc91x_device,111111-#endif112112-113113-#if defined(CONFIG_BFIN_SIR) || defined(CONFIG_BFIN_SIR_MODULE)114114-#ifdef CONFIG_BFIN_SIR0115115- &bfin_sir0_device,116116-#endif117117-#endif118118-};119119-120120-static int __init generic_board_init(void)121121-{122122- printk(KERN_INFO "%s(): registering device resources\n", __func__);123123- return platform_add_devices(generic_board_devices, ARRAY_SIZE(generic_board_devices));124124-}125125-126126-arch_initcall(generic_board_init);
+6-7
arch/blackfin/mach-bf533/boards/ip0x.c
···127127#if defined(CONFIG_SPI_BFIN) || defined(CONFIG_SPI_BFIN_MODULE)128128/* all SPI peripherals info goes here */129129130130-#if defined(CONFIG_SPI_MMC) || defined(CONFIG_SPI_MMC_MODULE)131131-static struct bfin5xx_spi_chip spi_mmc_chip_info = {130130+#if defined(CONFIG_MMC_SPI) || defined(CONFIG_MMC_SPI_MODULE)131131+static struct bfin5xx_spi_chip mmc_spi_chip_info = {132132/*133133 * CPOL (Clock Polarity)134134 * 0 - Active high SCK···152152/* Notice: for blackfin, the speed_hz is the value of register153153 * SPI_BAUD, not the real baudrate */154154static struct spi_board_info bfin_spi_board_info[] __initdata = {155155-#if defined(CONFIG_SPI_MMC) || defined(CONFIG_SPI_MMC_MODULE)155155+#if defined(CONFIG_MMC_SPI) || defined(CONFIG_MMC_SPI_MODULE)156156 {157157- .modalias = "spi_mmc",157157+ .modalias = "mmc_spi",158158 .max_speed_hz = 2,159159 .bus_num = 1,160160- .chip_select = CONFIG_SPI_MMC_CS_CHAN,161161- .platform_data = NULL,162162- .controller_data = &spi_mmc_chip_info,160160+ .chip_select = 5,161161+ .controller_data = &mmc_spi_chip_info,163162 },164163#endif165164};
+5-2
arch/blackfin/mach-bf533/include/mach/anomaly.h
···22 * File: include/asm-blackfin/mach-bf533/anomaly.h33 * Bugs: Enter bugs at http://blackfin.uclinux.org/44 *55- * Copyright (C) 2004-2008 Analog Devices Inc.55+ * Copyright (C) 2004-2009 Analog Devices Inc.66 * Licensed under the GPL-2 or later.77 */88···160160#define ANOMALY_05000301 (__SILICON_REVISION__ < 6)161161/* SSYNCs After Writes To DMA MMR Registers May Not Be Handled Correctly */162162#define ANOMALY_05000302 (__SILICON_REVISION__ < 5)163163-/* New Feature: Additional Hysteresis on SPORT Input Pins (Not Available On Older Silicon) */163163+/* SPORT_HYS Bit in PLL_CTL Register Is Not Functional */164164#define ANOMALY_05000305 (__SILICON_REVISION__ < 5)165165/* New Feature: Additional PPI Frame Sync Sampling Options (Not Available On Older Silicon) */166166#define ANOMALY_05000306 (__SILICON_REVISION__ < 5)···278278#define ANOMALY_05000266 (0)279279#define ANOMALY_05000323 (0)280280#define ANOMALY_05000353 (1)281281+#define ANOMALY_05000380 (0)281282#define ANOMALY_05000386 (1)282283#define ANOMALY_05000412 (0)283284#define ANOMALY_05000432 (0)284285#define ANOMALY_05000435 (0)286286+#define ANOMALY_05000447 (0)287287+#define ANOMALY_05000448 (0)285288286289#endif
···22 * File: include/asm-blackfin/mach-bf537/anomaly.h33 * Bugs: Enter bugs at http://blackfin.uclinux.org/44 *55- * Copyright (C) 2004-2008 Analog Devices Inc.55+ * Copyright (C) 2004-2009 Analog Devices Inc.66 * Licensed under the GPL-2 or later.77 */88···110110#define ANOMALY_05000301 (1)111111/* SSYNCs After Writes To CAN/DMA MMR Registers Are Not Always Handled Correctly */112112#define ANOMALY_05000304 (__SILICON_REVISION__ < 3)113113-/* New Feature: Additional Hysteresis on SPORT Input Pins (Not Available On Older Silicon) */113113+/* SPORT_HYS Bit in PLL_CTL Register Is Not Functional */114114#define ANOMALY_05000305 (__SILICON_REVISION__ < 3)115115/* SCKELOW Bit Does Not Maintain State Through Hibernate */116116#define ANOMALY_05000307 (__SILICON_REVISION__ < 3)···168168#define ANOMALY_05000323 (0)169169#define ANOMALY_05000353 (1)170170#define ANOMALY_05000363 (0)171171+#define ANOMALY_05000380 (0)171172#define ANOMALY_05000386 (1)172173#define ANOMALY_05000412 (0)173174#define ANOMALY_05000432 (0)174175#define ANOMALY_05000435 (0)176176+#define ANOMALY_05000447 (0)177177+#define ANOMALY_05000448 (0)175178176179#endif
···1919 help2020 CM-BF561 support for EVAL- and DEV-Board.21212222-config GENERIC_BF561_BOARD2323- bool "Generic"2424- help2525- Generic or Custom board support.2626-2722endchoice
···11-/*22- * File: arch/blackfin/mach-bf561/generic_board.c33- * Based on: arch/blackfin/mach-bf533/ezkit.c44- * Author: Aidan Williams <aidan@nicta.com.au>55- *66- * Created:77- * Description:88- *99- * Modified:1010- * Copyright 2005 National ICT Australia (NICTA)1111- * Copyright 2004-2006 Analog Devices Inc.1212- *1313- * Bugs: Enter bugs at http://blackfin.uclinux.org/1414- *1515- * This program is free software; you can redistribute it and/or modify1616- * it under the terms of the GNU General Public License as published by1717- * the Free Software Foundation; either version 2 of the License, or1818- * (at your option) any later version.1919- *2020- * This program is distributed in the hope that it will be useful,2121- * but WITHOUT ANY WARRANTY; without even the implied warranty of2222- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the2323- * GNU General Public License for more details.2424- *2525- * You should have received a copy of the GNU General Public License2626- * along with this program; if not, see the file COPYING, or write2727- * to the Free Software Foundation, Inc.,2828- * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA2929- */3030-3131-#include <linux/device.h>3232-#include <linux/platform_device.h>3333-#include <linux/irq.h>3434-3535-const char bfin_board_name[] = "UNKNOWN BOARD";3636-3737-/*3838- * Driver needs to know address, irq and flag pin.3939- */4040-#if defined(CONFIG_SMC91X) || defined(CONFIG_SMC91X_MODULE)4141-static struct resource smc91x_resources[] = {4242- {4343- .start = 0x2C010300,4444- .end = 0x2C010300 + 16,4545- .flags = IORESOURCE_MEM,4646- }, {4747- .start = IRQ_PROG_INTB,4848- .end = IRQ_PROG_INTB,4949- .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL,5050- }, {5151- .start = IRQ_PF9,5252- .end = IRQ_PF9,5353- .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL,5454- },5555-};5656-5757-static struct platform_device smc91x_device = {5858- .name = "smc91x",5959- .id = 0,6060- .num_resources = 
ARRAY_SIZE(smc91x_resources),6161- .resource = smc91x_resources,6262-};6363-#endif6464-6565-#if defined(CONFIG_BFIN_SIR) || defined(CONFIG_BFIN_SIR_MODULE)6666-#ifdef CONFIG_BFIN_SIR06767-static struct resource bfin_sir0_resources[] = {6868- {6969- .start = 0xFFC00400,7070- .end = 0xFFC004FF,7171- .flags = IORESOURCE_MEM,7272- },7373- {7474- .start = IRQ_UART0_RX,7575- .end = IRQ_UART0_RX+1,7676- .flags = IORESOURCE_IRQ,7777- },7878- {7979- .start = CH_UART0_RX,8080- .end = CH_UART0_RX+1,8181- .flags = IORESOURCE_DMA,8282- },8383-};8484-8585-static struct platform_device bfin_sir0_device = {8686- .name = "bfin_sir",8787- .id = 0,8888- .num_resources = ARRAY_SIZE(bfin_sir0_resources),8989- .resource = bfin_sir0_resources,9090-};9191-#endif9292-#endif9393-9494-static struct platform_device *generic_board_devices[] __initdata = {9595-#if defined(CONFIG_SMC91X) || defined(CONFIG_SMC91X_MODULE)9696- &smc91x_device,9797-#endif9898-9999-#if defined(CONFIG_BFIN_SIR) || defined(CONFIG_BFIN_SIR_MODULE)100100-#ifdef CONFIG_BFIN_SIR0101101- &bfin_sir0_device,102102-#endif103103-#endif104104-};105105-106106-static int __init generic_board_init(void)107107-{108108- printk(KERN_INFO "%s(): registering device resources\n", __func__);109109- return platform_add_devices(generic_board_devices,110110- ARRAY_SIZE(generic_board_devices));111111-}112112-113113-arch_initcall(generic_board_init);
+5-2
arch/blackfin/mach-bf561/include/mach/anomaly.h
···22 * File: include/asm-blackfin/mach-bf561/anomaly.h33 * Bugs: Enter bugs at http://blackfin.uclinux.org/44 *55- * Copyright (C) 2004-2008 Analog Devices Inc.55+ * Copyright (C) 2004-2009 Analog Devices Inc.66 * Licensed under the GPL-2 or later.77 */88···224224#define ANOMALY_05000301 (1)225225/* SSYNCs After Writes To DMA MMR Registers May Not Be Handled Correctly */226226#define ANOMALY_05000302 (1)227227-/* New Feature: Additional Hysteresis on SPORT Input Pins (Not Available On Older Silicon) */227227+/* SPORT_HYS Bit in PLL_CTL Register Is Not Functional */228228#define ANOMALY_05000305 (__SILICON_REVISION__ < 5)229229/* SCKELOW Bit Does Not Maintain State Through Hibernate */230230#define ANOMALY_05000307 (__SILICON_REVISION__ < 5)···283283#define ANOMALY_05000273 (0)284284#define ANOMALY_05000311 (0)285285#define ANOMALY_05000353 (1)286286+#define ANOMALY_05000380 (0)286287#define ANOMALY_05000386 (1)287288#define ANOMALY_05000432 (0)288289#define ANOMALY_05000435 (0)290290+#define ANOMALY_05000447 (0)291291+#define ANOMALY_05000448 (0)289292290293#endif
···6262#if (CONFIG_BOOT_LOAD & 0x3)6363# error "The kernel load address must be 4 byte aligned"6464#endif6565+6666+/* The entire kernel must be able to make a 24-bit pcrel call to the start of L1 */6767+#if ((0xffffffff - L1_CODE_START + 1) + CONFIG_BOOT_LOAD) > 0x10000006868+# error "The kernel load address is too high; keep it below 10meg for safety"6969+#endif7070+7171+#if ANOMALY_050004487272+# error You are using a part with anomaly 05000448; this issue causes random memory read/write failures - that means random crashes.7373+#endif
+22
arch/blackfin/mach-common/cache.S
···66666767/* Invalidate all instruction cache lines associated with this memory area */6868ENTRY(_blackfin_icache_flush_range)6969+/*7070+ * Workaround to avoid loading a stale instruction after invalidating the7171+ * icache when the following sequence occurs.7272+ *7373+ * 1) One instruction address is cached in the instruction cache.7474+ * 2) This instruction in SDRAM is changed.7575+ * 3) IFLUSH[P0] is executed only once in blackfin_icache_flush_range().7676+ * 4) This instruction is executed again, but the old one is loaded.7777+ */7878+ P0 = R0;7979+ IFLUSH[P0];6980 do_flush IFLUSH, , nop7081ENDPROC(_blackfin_icache_flush_range)71827283/* Flush all cache lines associated with this area of memory. */7384ENTRY(_blackfin_icache_dcache_flush_range)8585+/*8686+ * Workaround to avoid loading a stale instruction after invalidating the8787+ * icache when the following sequence occurs.8888+ *8989+ * 1) One instruction address is cached in the instruction cache.9090+ * 2) This instruction in SDRAM is changed.9191+ * 3) IFLUSH[P0] is executed only once in blackfin_icache_flush_range().9292+ * 4) This instruction is executed again, but the old one is loaded.9393+ */9494+ P0 = R0;9595+ IFLUSH[P0];7496 do_flush FLUSH, IFLUSH7597ENDPROC(_blackfin_icache_dcache_flush_range)7698
···142142{143143 unsigned int val;144144145145+ /* Do not do the fixup on other platforms! */146146+ if (!machine_is(gef_sbc610))147147+ return;148148+145149 printk(KERN_INFO "Running NEC uPD720101 Fixup\n");146150147151 /* Ensure ports 1, 2, 3, 4 & 5 are enabled */
···138138config HAVE_SETUP_PER_CPU_AREA139139 def_bool y140140141141+config HAVE_DYNAMIC_PER_CPU_AREA142142+ def_bool y143143+141144config HAVE_CPUMASK_OF_CPU_MAP142145 def_bool X86_64_SMP143146···783780 Additional support for AMD specific MCE features such as784781 the DRAM Error Threshold.785782783783+config X86_MCE_THRESHOLD784784+ depends on X86_MCE_AMD || X86_MCE_INTEL785785+ bool786786+ default y787787+786788config X86_MCE_NONFATAL787789 tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4"788790 depends on X86_32 && X86_MCE···11331125 Specify the maximum number of NUMA Nodes available on the target11341126 system. Increases memory reserved to accommodate various tables.1135112711361136-config HAVE_ARCH_BOOTMEM_NODE11281128+config HAVE_ARCH_BOOTMEM11371129 def_bool y11381130 depends on X86_32 && NUMA11391131···14311423config KEXEC_JUMP14321424 bool "kexec jump (EXPERIMENTAL)"14331425 depends on EXPERIMENTAL14341434- depends on KEXEC && HIBERNATION && X86_3214261426+ depends on KEXEC && HIBERNATION14351427 ---help---14361428 Jump between original kernel and kexeced kernel and invoke14371429 code in physical address mode via KEXEC
+2
arch/x86/include/asm/apic.h
···379379380380static inline void ack_APIC_irq(void)381381{382382+#ifdef CONFIG_X86_LOCAL_APIC382383 /*383384 * ack_APIC_irq() actually gets compiled as a single instruction384385 * ... yummie.···387386388387 /* Docs say use 0 for future compatibility */389388 apic_write(APIC_EOI, 0);389389+#endif390390}391391392392static inline unsigned default_get_apic_id(unsigned long x)
···3333 smp_invalidate_interrupt)3434#endif35353636+BUILD_INTERRUPT(generic_interrupt, GENERIC_INTERRUPT_VECTOR)3737+3638/*3739 * every pentium local APIC has two 'local interrupts', with a3840 * soft-definable vector attached to both interrupts, one of
···1212 unsigned int apic_timer_irqs; /* arch dependent */1313 unsigned int irq_spurious_count;1414#endif1515+ unsigned int generic_irqs; /* arch dependent */1516#ifdef CONFIG_SMP1617 unsigned int irq_resched_count;1718 unsigned int irq_call_count;
···11+#ifndef _ASM_X86_INIT_32_H22+#define _ASM_X86_INIT_32_H33+44+#ifdef CONFIG_X86_3255+extern void __init early_ioremap_page_table_range_init(void);66+#endif77+88+extern unsigned long __init99+kernel_physical_mapping_init(unsigned long start,1010+ unsigned long end,1111+ unsigned long page_size_mask);1212+1313+1414+extern unsigned long __initdata e820_table_start;1515+extern unsigned long __meminitdata e820_table_end;1616+extern unsigned long __meminitdata e820_table_top;1717+1818+#endif /* _ASM_X86_INIT_32_H */
-3
arch/x86/include/asm/io.h
···172172173173extern void iounmap(volatile void __iomem *addr);174174175175-extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);176176-177175178176#ifdef CONFIG_X86_32179177# include "io_32.h"···196198extern void __iomem *early_ioremap(unsigned long offset, unsigned long size);197199extern void __iomem *early_memremap(unsigned long offset, unsigned long size);198200extern void early_iounmap(void __iomem *addr, unsigned long size);199199-extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);200201201202#define IO_SPACE_LIMIT 0xffff202203
···112112#define LOCAL_PERF_VECTOR 0xee113113114114/*115115+ * Generic system vector for platform specific use116116+ */117117+#define GENERIC_INTERRUPT_VECTOR 0xed118118+119119+/*115120 * First APIC vector available to drivers: (vectors 0x30-0xee) we116121 * start at 0x31(0x41) to spread out vectors evenly between priority117122 * levels. (0x80 is the syscall vector)
+7-6
arch/x86/include/asm/kexec.h
···99# define PAGES_NR 41010#else1111# define PA_CONTROL_PAGE 01212-# define PA_TABLE_PAGE 11313-# define PAGES_NR 21212+# define VA_CONTROL_PAGE 11313+# define PA_TABLE_PAGE 21414+# define PA_SWAP_PAGE 31515+# define PAGES_NR 41416#endif15171616-#ifdef CONFIG_X86_321718# define KEXEC_CONTROL_CODE_MAX_SIZE 20481818-#endif19192020#ifndef __ASSEMBLY__2121···136136 unsigned int has_pae,137137 unsigned int preserve_context);138138#else139139-NORET_TYPE void139139+unsigned long140140relocate_kernel(unsigned long indirection_page,141141 unsigned long page_list,142142- unsigned long start_address) ATTRIB_NORET;142142+ unsigned long start_address,143143+ unsigned int preserve_context);143144#endif144145145146#define ARCH_HAS_KIMAGE_ARCH
···7777#define MSR_IA32_MC0_ADDR 0x000004027878#define MSR_IA32_MC0_MISC 0x0000040379798080+/* These are consecutive and not in the normal 4-register MCE bank block */8181+#define MSR_IA32_MC0_CTL2 0x000002808282+#define CMCI_EN (1ULL << 30)8383+#define CMCI_THRESHOLD_MASK 0xffffULL8484+8085#define MSR_P6_PERFCTR0 0x000000c18186#define MSR_P6_PERFCTR1 0x000000c28287#define MSR_P6_EVNTSEL0 0x00000186
-6
arch/x86/include/asm/page_types.h
···40404141#ifndef __ASSEMBLY__42424343-struct pgprot;4444-4543extern int page_is_ram(unsigned long pagenr);4644extern int devmem_is_allowed(unsigned long pagenr);4747-extern void map_devmem(unsigned long pfn, unsigned long size,4848- struct pgprot vma_prot);4949-extern void unmap_devmem(unsigned long pfn, unsigned long size,5050- struct pgprot vma_prot);51455246extern unsigned long max_low_pfn_mapped;5347extern unsigned long max_pfn_mapped;
+5
arch/x86/include/asm/pat.h
···22#define _ASM_X86_PAT_H3344#include <linux/types.h>55+#include <asm/pgtable_types.h>5667#ifdef CONFIG_X86_PAT78extern int pat_enabled;···18171918extern int kernel_map_sync_memtype(u64 base, unsigned long size,2019 unsigned long flag);2020+extern void map_devmem(unsigned long pfn, unsigned long size,2121+ struct pgprot vma_prot);2222+extern void unmap_devmem(unsigned long pfn, unsigned long size,2323+ struct pgprot vma_prot);21242225#endif /* _ASM_X86_PAT_H */
···
 	return 1;
 }

+pmd_t *populate_extra_pmd(unsigned long vaddr);
+pte_t *populate_extra_pte(unsigned long vaddr);
 #endif	/* __ASSEMBLY__ */

 #ifdef CONFIG_X86_32
+5
arch/x86/include/asm/pgtable_32_types.h
···
  * area for the same reason. ;)
  */
 #define VMALLOC_OFFSET	(8 * 1024 * 1024)
+
+#ifndef __ASSEMBLER__
+extern bool __vmalloc_start_set; /* set once high_memory is set */
+#endif
+
 #define VMALLOC_START	((unsigned long)high_memory + VMALLOC_OFFSET)
 #ifdef CONFIG_X86_PAE
 #define LAST_PKMAP 512
···
 #define SCIR_CPU_ACTIVITY	0x02	/* not idle */
 #define SCIR_CPU_HB_INTERVAL	(HZ)	/* once per second */

+/* Loop through all installed blades */
+#define for_each_possible_blade(bid)	\
+	for ((bid) = 0; (bid) < uv_num_possible_blades(); (bid)++)
+
 /*
  * Macros for converting between kernel virtual addresses, socket local physical
  * addresses, and UV global physical addresses.
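The new `for_each_possible_blade()` iterator follows the usual kernel `for_each_*` idiom. A standalone sketch of how such a macro expands, with a stubbed `uv_num_possible_blades()` standing in for the real UV topology query (the stub's return value is illustrative):

```c
#include <assert.h>

/* Stub standing in for the UV topology query; value is illustrative. */
static int uv_num_possible_blades(void) { return 4; }

#define for_each_possible_blade(bid)	\
	for ((bid) = 0; (bid) < uv_num_possible_blades(); (bid)++)

static int count_blades(void)
{
	int bid, n = 0;

	for_each_possible_blade(bid)
		n++;			/* visit each installed blade once */
	return n;
}
```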
+1
arch/x86/include/asm/xen/page.h
···

 xmaddr_t arbitrary_virt_to_machine(void *address);
+unsigned long arbitrary_virt_to_mfn(void *vaddr);
 void make_lowmem_page_readonly(void *vaddr);
 void make_lowmem_page_readwrite(void *vaddr);

···
 	   that might execute the to be patched code.
 	   Other CPUs are not running. */
 	stop_nmi();
-#ifdef CONFIG_X86_MCE
-	stop_mce();
-#endif
+
+	/*
+	 * Don't stop machine check exceptions while patching.
+	 * MCEs only happen when something got corrupted and in this
+	 * case we must do something about the corruption.
+	 * Ignoring it is worse than an unlikely patching race.
+	 * Also machine checks tend to be broadcast and if one CPU
+	 * goes into machine check the others follow quickly, so we don't
+	 * expect a machine check to cause undue problems during
+	 * code patching.
+	 */

 	apply_alternatives(__alt_instructions, __alt_instructions_end);
···
 			    (unsigned long)__smp_locks_end);

 	restart_nmi();
-#ifdef CONFIG_X86_MCE
-	restart_mce();
-#endif
 }

 /**
+15
arch/x86/kernel/apic/apic.c
···
 #include <asm/idle.h>
 #include <asm/mtrr.h>
 #include <asm/smp.h>
+#include <asm/mce.h>

 unsigned int num_processors;
···
 		apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
 	}
 #endif
+#ifdef CONFIG_X86_MCE_INTEL
+	if (maxlvt >= 6) {
+		v = apic_read(APIC_LVTCMCI);
+		if (!(v & APIC_LVT_MASKED))
+			apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
+	}
+#endif
+
 	/*
 	 * Clean APIC state for other OSs:
 	 */
···
 	apic_write(APIC_LVT1, value);

 	preempt_enable();
+
+#ifdef CONFIG_X86_MCE_INTEL
+	/* Recheck CMCI information after local APIC is up on CPU #0 */
+	if (smp_processor_id() == 0)
+		cmci_recheck();
+#endif
 }

 void __cpuinit end_local_APIC_setup(void)
+52
arch/x86/kernel/cpu/amd.c
···
 #include <asm/io.h>
 #include <asm/processor.h>
 #include <asm/apic.h>
+#include <asm/cpu.h>

 #ifdef CONFIG_X86_64
 # include <asm/numa_64.h>
···
 	}
 }

+static void __cpuinit amd_k7_smp_check(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+	/* calling is from identify_secondary_cpu() ? */
+	if (c->cpu_index == boot_cpu_id)
+		return;
+
+	/*
+	 * Certain Athlons might work (for various values of 'work') in SMP
+	 * but they are not certified as MP capable.
+	 */
+	/* Athlon 660/661 is valid. */
+	if ((c->x86_model == 6) && ((c->x86_mask == 0) ||
+	    (c->x86_mask == 1)))
+		goto valid_k7;
+
+	/* Duron 670 is valid */
+	if ((c->x86_model == 7) && (c->x86_mask == 0))
+		goto valid_k7;
+
+	/*
+	 * Athlon 662, Duron 671, and Athlon >model 7 have capability
+	 * bit. It's worth noting that the A5 stepping (662) of some
+	 * Athlon XPs have the MP bit set.
+	 * See http://www.heise.de/newsticker/data/jow-18.10.01-000 for
+	 * more.
+	 */
+	if (((c->x86_model == 6) && (c->x86_mask >= 2)) ||
+	    ((c->x86_model == 7) && (c->x86_mask >= 1)) ||
+	    (c->x86_model > 7))
+		if (cpu_has_mp)
+			goto valid_k7;
+
+	/* If we get here, not a certified SMP capable AMD system. */
+
+	/*
+	 * Don't taint if we are running SMP kernel on a single non-MP
+	 * approved Athlon
+	 */
+	WARN_ONCE(1, "WARNING: This combination of AMD"
+		" processors is not suitable for SMP.\n");
+	if (!test_taint(TAINT_UNSAFE_SMP))
+		add_taint(TAINT_UNSAFE_SMP);
+
+valid_k7:
+	;
+#endif
+}
+
 static void __cpuinit init_amd_k7(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
···
 	}

 	set_cpu_cap(c, X86_FEATURE_K7);
+
+	amd_k7_smp_check(c);
 }
 #endif
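The model/stepping tests in `amd_k7_smp_check()` amount to a small decision table. A hedged standalone re-statement of that logic as a pure function (no kernel types; `has_mp` stands in for `cpu_has_mp`):

```c
#include <assert.h>

/* Returns 1 if the given K7 model/stepping is accepted as MP-capable,
 * mirroring the checks in amd_k7_smp_check() (sketch, not kernel code). */
static int k7_mp_valid(int model, int stepping, int has_mp)
{
	/* Athlon 660/661 */
	if (model == 6 && (stepping == 0 || stepping == 1))
		return 1;
	/* Duron 670 */
	if (model == 7 && stepping == 0)
		return 1;
	/* Later parts advertise MP capability via a CPUID feature bit */
	if ((model == 6 && stepping >= 2) ||
	    (model == 7 && stepping >= 1) ||
	    model > 7)
		return has_mp;
	return 0;
}
```

Anything falling through returns 0, which is the "warn and taint" path in the real function.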
+1-1
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
···
 	if (!data)
 		return -ENOMEM;

-	data->acpi_data = percpu_ptr(acpi_perf_data, cpu);
+	data->acpi_data = per_cpu_ptr(acpi_perf_data, cpu);
 	per_cpu(drv_data, cpu) = data;

 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
···
 	}
 }

-static unsigned long old_cr4 __initdata;
-
-void __init stop_mce(void)
-{
-	old_cr4 = read_cr4();
-	clear_in_cr4(X86_CR4_MCE);
-}
-
-void __init restart_mce(void)
-{
-	if (old_cr4 & X86_CR4_MCE)
-		set_in_cr4(X86_CR4_MCE);
-}
-
 static int __init mcheck_disable(char *str)
 {
 	mce_disabled = 1;
+395-135
arch/x86/kernel/cpu/mcheck/mce_64.c
···33 * K8 parts Copyright 2002,2003 Andi Kleen, SuSE Labs.44 * Rest from unknown author(s).55 * 2004 Andi Kleen. Rewrote most of it.66+ * Copyright 2008 Intel Corporation77+ * Author: Andi Kleen68 */79810#include <linux/init.h>···2624#include <linux/ctype.h>2725#include <linux/kmod.h>2826#include <linux/kdebug.h>2727+#include <linux/kobject.h>2828+#include <linux/sysfs.h>2929+#include <linux/ratelimit.h>2930#include <asm/processor.h>3031#include <asm/msr.h>3132#include <asm/mce.h>···3732#include <asm/idle.h>38333934#define MISC_MCELOG_MINOR 2274040-#define NR_SYSFS_BANKS 641354236atomic_t mce_entry;4337···5147 */5248static int tolerant = 1;5349static int banks;5454-static unsigned long bank[NR_SYSFS_BANKS] = { [0 ... NR_SYSFS_BANKS-1] = ~0UL };5050+static u64 *bank;5551static unsigned long notify_user;5652static int rip_msr;5753static int mce_bootlog = -1;···6157static char *trigger_argv[2] = { trigger, NULL };62586359static DECLARE_WAIT_QUEUE_HEAD(mce_wait);6060+6161+/* MCA banks polled by the period polling timer for corrected events */6262+DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {6363+ [0 ... 
BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL6464+};6565+6666+/* Do initial initialization of a struct mce */6767+void mce_setup(struct mce *m)6868+{6969+ memset(m, 0, sizeof(struct mce));7070+ m->cpu = smp_processor_id();7171+ rdtscll(m->tsc);7272+}64736574/*6675 * Lockless MCE logging infrastructure.···136119 print_symbol("{%s}", m->ip);137120 printk("\n");138121 }139139- printk(KERN_EMERG "TSC %Lx ", m->tsc);122122+ printk(KERN_EMERG "TSC %llx ", m->tsc);140123 if (m->addr)141141- printk("ADDR %Lx ", m->addr);124124+ printk("ADDR %llx ", m->addr);142125 if (m->misc)143143- printk("MISC %Lx ", m->misc);126126+ printk("MISC %llx ", m->misc);144127 printk("\n");145128 printk(KERN_EMERG "This is not a software problem!\n");146129 printk(KERN_EMERG "Run through mcelog --ascii to decode "···166149 panic(msg);167150}168151169169-static int mce_available(struct cpuinfo_x86 *c)152152+int mce_available(struct cpuinfo_x86 *c)170153{154154+ if (mce_dont_init)155155+ return 0;171156 return cpu_has(c, X86_FEATURE_MCE) && cpu_has(c, X86_FEATURE_MCA);172157}173158···191172}192173193174/*194194- * The actual machine check handler175175+ * Poll for corrected events or events that happened before reset.176176+ * Those are just logged through /dev/mcelog.177177+ *178178+ * This is executed in standard interrupt context.179179+ */180180+void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)181181+{182182+ struct mce m;183183+ int i;184184+185185+ mce_setup(&m);186186+187187+ rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);188188+ for (i = 0; i < banks; i++) {189189+ if (!bank[i] || !test_bit(i, *b))190190+ continue;191191+192192+ m.misc = 0;193193+ m.addr = 0;194194+ m.bank = i;195195+ m.tsc = 0;196196+197197+ barrier();198198+ rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status);199199+ if (!(m.status & MCI_STATUS_VAL))200200+ continue;201201+202202+ /*203203+ * Uncorrected events are handled by the exception handler204204+ * when it is enabled. 
But when the exception is disabled log205205+ * everything.206206+ *207207+ * TBD do the same check for MCI_STATUS_EN here?208208+ */209209+ if ((m.status & MCI_STATUS_UC) && !(flags & MCP_UC))210210+ continue;211211+212212+ if (m.status & MCI_STATUS_MISCV)213213+ rdmsrl(MSR_IA32_MC0_MISC + i*4, m.misc);214214+ if (m.status & MCI_STATUS_ADDRV)215215+ rdmsrl(MSR_IA32_MC0_ADDR + i*4, m.addr);216216+217217+ if (!(flags & MCP_TIMESTAMP))218218+ m.tsc = 0;219219+ /*220220+ * Don't get the IP here because it's unlikely to221221+ * have anything to do with the actual error location.222222+ */223223+224224+ mce_log(&m);225225+ add_taint(TAINT_MACHINE_CHECK);226226+227227+ /*228228+ * Clear state for this bank.229229+ */230230+ wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);231231+ }232232+233233+ /*234234+ * Don't clear MCG_STATUS here because it's only defined for235235+ * exceptions.236236+ */237237+}238238+239239+/*240240+ * The actual machine check handler. This only handles real241241+ * exceptions when something got corrupted coming in through int 18.242242+ *243243+ * This is executed in NMI context not subject to normal locking rules. This244244+ * implies that most kernel services cannot be safely used. 
Don't even245245+ * think about putting a printk in there!195246 */196247void do_machine_check(struct pt_regs * regs, long error_code)197248{···279190 * error.280191 */281192 int kill_it = 0;193193+ DECLARE_BITMAP(toclear, MAX_NR_BANKS);282194283195 atomic_inc(&mce_entry);284196285285- if ((regs286286- && notify_die(DIE_NMI, "machine check", regs, error_code,197197+ if (notify_die(DIE_NMI, "machine check", regs, error_code,287198 18, SIGKILL) == NOTIFY_STOP)288288- || !banks)199199+ goto out2;200200+ if (!banks)289201 goto out2;290202291291- memset(&m, 0, sizeof(struct mce));292292- m.cpu = smp_processor_id();203203+ mce_setup(&m);204204+293205 rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);294206 /* if the restart IP is not valid, we're done for */295207 if (!(m.mcgstatus & MCG_STATUS_RIPV))···300210 barrier();301211302212 for (i = 0; i < banks; i++) {303303- if (i < NR_SYSFS_BANKS && !bank[i])213213+ __clear_bit(i, toclear);214214+ if (!bank[i])304215 continue;305216306217 m.misc = 0;307218 m.addr = 0;308219 m.bank = i;309309- m.tsc = 0;310220311221 rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status);312222 if ((m.status & MCI_STATUS_VAL) == 0)313223 continue;224224+225225+ /*226226+ * Non uncorrected errors are handled by machine_check_poll227227+ * Leave them alone.228228+ */229229+ if ((m.status & MCI_STATUS_UC) == 0)230230+ continue;231231+232232+ /*233233+ * Set taint even when machine check was not enabled.234234+ */235235+ add_taint(TAINT_MACHINE_CHECK);236236+237237+ __set_bit(i, toclear);314238315239 if (m.status & MCI_STATUS_EN) {316240 /* if PCC was set, there's no way out */···339235 no_way_out = 1;340236 kill_it = 1;341237 }238238+ } else {239239+ /*240240+ * Machine check event was not enabled. 
Clear, but241241+ * ignore.242242+ */243243+ continue;342244 }343245344246 if (m.status & MCI_STATUS_MISCV)···353243 rdmsrl(MSR_IA32_MC0_ADDR + i*4, m.addr);354244355245 mce_get_rip(&m, regs);356356- if (error_code >= 0)357357- rdtscll(m.tsc);358358- if (error_code != -2)359359- mce_log(&m);246246+ mce_log(&m);360247361248 /* Did this bank cause the exception? */362249 /* Assume that the bank with uncorrectable errors did it,···362255 panicm = m;363256 panicm_found = 1;364257 }365365-366366- add_taint(TAINT_MACHINE_CHECK);367258 }368368-369369- /* Never do anything final in the polling timer */370370- if (!regs)371371- goto out;372259373260 /* If we didn't find an uncorrectable error, pick374261 the last one (shouldn't happen, just being safe). */···410309 /* notify userspace ASAP */411310 set_thread_flag(TIF_MCE_NOTIFY);412311413413- out:414312 /* the last thing we do is clear state */415415- for (i = 0; i < banks; i++)416416- wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);313313+ for (i = 0; i < banks; i++) {314314+ if (test_bit(i, toclear))315315+ wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);316316+ }417317 wrmsrl(MSR_IA32_MCG_STATUS, 0);418318 out2:419319 atomic_dec(&mce_entry);···434332 * and historically has been the register value of the435333 * MSR_IA32_THERMAL_STATUS (Intel) msr.436334 */437437-void mce_log_therm_throt_event(unsigned int cpu, __u64 status)335335+void mce_log_therm_throt_event(__u64 status)438336{439337 struct mce m;440338441441- memset(&m, 0, sizeof(m));442442- m.cpu = cpu;339339+ mce_setup(&m);443340 m.bank = MCE_THERMAL_BANK;444341 m.status = status;445445- rdtscll(m.tsc);446342 mce_log(&m);447343}448344#endif /* CONFIG_X86_MCE_INTEL */···453353454354static int check_interval = 5 * 60; /* 5 minutes */455355static int next_interval; /* in jiffies */456456-static void mcheck_timer(struct work_struct *work);457457-static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);356356+static void mcheck_timer(unsigned long);357357+static DEFINE_PER_CPU(struct timer_list, 
mce_timer);458358459459-static void mcheck_check_cpu(void *info)359359+static void mcheck_timer(unsigned long data)460360{361361+ struct timer_list *t = &per_cpu(mce_timer, data);362362+363363+ WARN_ON(smp_processor_id() != data);364364+461365 if (mce_available(¤t_cpu_data))462462- do_machine_check(NULL, 0);463463-}464464-465465-static void mcheck_timer(struct work_struct *work)466466-{467467- on_each_cpu(mcheck_check_cpu, NULL, 1);366366+ machine_check_poll(MCP_TIMESTAMP,367367+ &__get_cpu_var(mce_poll_banks));468368469369 /*470370 * Alert userspace if needed. If we logged an MCE, reduce the···477377 (int)round_jiffies_relative(check_interval*HZ));478378 }479379480480- schedule_delayed_work(&mcheck_work, next_interval);380380+ t->expires = jiffies + next_interval;381381+ add_timer(t);481382}482383384384+static void mce_do_trigger(struct work_struct *work)385385+{386386+ call_usermodehelper(trigger, trigger_argv, NULL, UMH_NO_WAIT);387387+}388388+389389+static DECLARE_WORK(mce_trigger_work, mce_do_trigger);390390+483391/*484484- * This is only called from process context. This is where we do485485- * anything we need to alert userspace about new MCEs. 
This is called486486- * directly from the poller and also from entry.S and idle, thanks to487487- * TIF_MCE_NOTIFY.392392+ * Notify the user(s) about new machine check events.393393+ * Can be called from interrupt context, but not from machine check/NMI394394+ * context.488395 */489396int mce_notify_user(void)490397{398398+ /* Not more than two messages every minute */399399+ static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);400400+491401 clear_thread_flag(TIF_MCE_NOTIFY);492402 if (test_and_clear_bit(0, ¬ify_user)) {493493- static unsigned long last_print;494494- unsigned long now = jiffies;495495-496403 wake_up_interruptible(&mce_wait);497497- if (trigger[0])498498- call_usermodehelper(trigger, trigger_argv, NULL,499499- UMH_NO_WAIT);500404501501- if (time_after_eq(now, last_print + (check_interval*HZ))) {502502- last_print = now;405405+ /*406406+ * There is no risk of missing notifications because407407+ * work_pending is always cleared before the function is408408+ * executed.409409+ */410410+ if (trigger[0] && !work_pending(&mce_trigger_work))411411+ schedule_work(&mce_trigger_work);412412+413413+ if (__ratelimit(&ratelimit))503414 printk(KERN_INFO "Machine check events logged\n");504504- }505415506416 return 1;507417 }···535425536426static __init int periodic_mcheck_init(void)537427{538538- next_interval = check_interval * HZ;539539- if (next_interval)540540- schedule_delayed_work(&mcheck_work,541541- round_jiffies_relative(next_interval));542542- idle_notifier_register(&mce_idle_notifier);543543- return 0;428428+ idle_notifier_register(&mce_idle_notifier);429429+ return 0;544430}545431__initcall(periodic_mcheck_init);546546-547432548433/*549434 * Initialize Machine Checks for a CPU.550435 */551551-static void mce_init(void *dummy)436436+static int mce_cap_init(void)552437{553438 u64 cap;554554- int i;439439+ unsigned b;555440556441 rdmsrl(MSR_IA32_MCG_CAP, cap);557557- banks = cap & 0xff;558558- if (banks > MCE_EXTENDED_BANK) {559559- banks = 
MCE_EXTENDED_BANK;560560- printk(KERN_INFO "MCE: warning: using only %d banks\n",561561- MCE_EXTENDED_BANK);442442+ b = cap & 0xff;443443+ if (b > MAX_NR_BANKS) {444444+ printk(KERN_WARNING445445+ "MCE: Using only %u machine check banks out of %u\n",446446+ MAX_NR_BANKS, b);447447+ b = MAX_NR_BANKS;562448 }449449+450450+ /* Don't support asymmetric configurations today */451451+ WARN_ON(banks != 0 && b != banks);452452+ banks = b;453453+ if (!bank) {454454+ bank = kmalloc(banks * sizeof(u64), GFP_KERNEL);455455+ if (!bank)456456+ return -ENOMEM;457457+ memset(bank, 0xff, banks * sizeof(u64));458458+ }459459+563460 /* Use accurate RIP reporting if available. */564461 if ((cap & (1<<9)) && ((cap >> 16) & 0xff) >= 9)565462 rip_msr = MSR_IA32_MCG_EIP;566463567567- /* Log the machine checks left over from the previous reset.568568- This also clears all registers */569569- do_machine_check(NULL, mce_bootlog ? -1 : -2);464464+ return 0;465465+}466466+467467+static void mce_init(void *dummy)468468+{469469+ u64 cap;470470+ int i;471471+ mce_banks_t all_banks;472472+473473+ /*474474+ * Log the machine checks left over from the previous reset.475475+ */476476+ bitmap_fill(all_banks, MAX_NR_BANKS);477477+ machine_check_poll(MCP_UC, &all_banks);570478571479 set_in_cr4(X86_CR4_MCE);572480481481+ rdmsrl(MSR_IA32_MCG_CAP, cap);573482 if (cap & MCG_CTL_P)574483 wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);575484576485 for (i = 0; i < banks; i++) {577577- if (i < NR_SYSFS_BANKS)578578- wrmsrl(MSR_IA32_MC0_CTL+4*i, bank[i]);579579- else580580- wrmsrl(MSR_IA32_MC0_CTL+4*i, ~0UL);581581-486486+ wrmsrl(MSR_IA32_MC0_CTL+4*i, bank[i]);582487 wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);583488 }584489}585490586491/* Add per CPU specific workarounds here */587587-static void __cpuinit mce_cpu_quirks(struct cpuinfo_x86 *c)492492+static void mce_cpu_quirks(struct cpuinfo_x86 *c)588493{589494 /* This should be disabled by the BIOS, but isn't always */590495 if (c->x86_vendor == X86_VENDOR_AMD) 
{591591- if(c->x86 == 15)496496+ if (c->x86 == 15 && banks > 4)592497 /* disable GART TBL walk error reporting, which trips off593498 incorrectly with the IOMMU & 3ware & Cerberus. */594594- clear_bit(10, &bank[4]);499499+ clear_bit(10, (unsigned long *)&bank[4]);595500 if(c->x86 <= 17 && mce_bootlog < 0)596501 /* Lots of broken BIOS around that don't clear them597502 by default and leave crap in there. Don't log. */···629504 }630505}631506507507+static void mce_init_timer(void)508508+{509509+ struct timer_list *t = &__get_cpu_var(mce_timer);510510+511511+ /* data race harmless because everyone sets to the same value */512512+ if (!next_interval)513513+ next_interval = check_interval * HZ;514514+ if (!next_interval)515515+ return;516516+ setup_timer(t, mcheck_timer, smp_processor_id());517517+ t->expires = round_jiffies(jiffies + next_interval);518518+ add_timer(t);519519+}520520+632521/*633522 * Called for each booted CPU to set up machine checks.634523 * Must be called with preempt off.635524 */636525void __cpuinit mcheck_init(struct cpuinfo_x86 *c)637526{638638- mce_cpu_quirks(c);639639-640640- if (mce_dont_init ||641641- !mce_available(c))527527+ if (!mce_available(c))642528 return;529529+530530+ if (mce_cap_init() < 0) {531531+ mce_dont_init = 1;532532+ return;533533+ }534534+ mce_cpu_quirks(c);643535644536 mce_init(NULL);645537 mce_cpu_features(c);538538+ mce_init_timer();646539}647540648541/*···716573{717574 unsigned long *cpu_tsc;718575 static DEFINE_MUTEX(mce_read_mutex);719719- unsigned next;576576+ unsigned prev, next;720577 char __user *buf = ubuf;721578 int i, err;722579···735592 }736593737594 err = 0;738738- for (i = 0; i < next; i++) {739739- unsigned long start = jiffies;595595+ prev = 0;596596+ do {597597+ for (i = prev; i < next; i++) {598598+ unsigned long start = jiffies;740599741741- while (!mcelog.entry[i].finished) {742742- if (time_after_eq(jiffies, start + 2)) {743743- memset(mcelog.entry + i,0, sizeof(struct mce));744744- goto 
timeout;600600+ while (!mcelog.entry[i].finished) {601601+ if (time_after_eq(jiffies, start + 2)) {602602+ memset(mcelog.entry + i, 0,603603+ sizeof(struct mce));604604+ goto timeout;605605+ }606606+ cpu_relax();745607 }746746- cpu_relax();608608+ smp_rmb();609609+ err |= copy_to_user(buf, mcelog.entry + i,610610+ sizeof(struct mce));611611+ buf += sizeof(struct mce);612612+timeout:613613+ ;747614 }748748- smp_rmb();749749- err |= copy_to_user(buf, mcelog.entry + i, sizeof(struct mce));750750- buf += sizeof(struct mce);751751- timeout:752752- ;753753- }754615755755- memset(mcelog.entry, 0, next * sizeof(struct mce));756756- mcelog.next = 0;616616+ memset(mcelog.entry + prev, 0,617617+ (next - prev) * sizeof(struct mce));618618+ prev = next;619619+ next = cmpxchg(&mcelog.next, prev, 0);620620+ } while (next != prev);757621758622 synchronize_sched();759623···830680 &mce_chrdev_ops,831681};832682833833-static unsigned long old_cr4 __initdata;834834-835835-void __init stop_mce(void)836836-{837837- old_cr4 = read_cr4();838838- clear_in_cr4(X86_CR4_MCE);839839-}840840-841841-void __init restart_mce(void)842842-{843843- if (old_cr4 & X86_CR4_MCE)844844- set_in_cr4(X86_CR4_MCE);845845-}846846-847683/*848684 * Old style boot options parsing. Only for compatibility.849685 */···839703 return 1;840704}841705842842-/* mce=off disables machine check. Note you can re-enable it later843843- using sysfs.706706+/* mce=off disables machine check.844707 mce=TOLERANCELEVEL (number, see above)845708 mce=bootlog Log MCEs from before booting. Disabled by default on AMD.846709 mce=nobootlog Don't log MCEs from before booting. */···863728 * Sysfs support864729 */865730731731+/*732732+ * Disable machine checks on suspend and shutdown. 
We can't really handle733733+ * them later.734734+ */735735+static int mce_disable(void)736736+{737737+ int i;738738+739739+ for (i = 0; i < banks; i++)740740+ wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);741741+ return 0;742742+}743743+744744+static int mce_suspend(struct sys_device *dev, pm_message_t state)745745+{746746+ return mce_disable();747747+}748748+749749+static int mce_shutdown(struct sys_device *dev)750750+{751751+ return mce_disable();752752+}753753+866754/* On resume clear all MCE state. Don't want to see leftovers from the BIOS.867755 Only one CPU is active at this time, the others get readded later using868756 CPU hotplug. */···896738 return 0;897739}898740741741+static void mce_cpu_restart(void *data)742742+{743743+ del_timer_sync(&__get_cpu_var(mce_timer));744744+ if (mce_available(¤t_cpu_data))745745+ mce_init(NULL);746746+ mce_init_timer();747747+}748748+899749/* Reinit MCEs after user configuration changes */900750static void mce_restart(void)901751{902902- if (next_interval)903903- cancel_delayed_work(&mcheck_work);904904- /* Timer race is harmless here */905905- on_each_cpu(mce_init, NULL, 1);906752 next_interval = check_interval * HZ;907907- if (next_interval)908908- schedule_delayed_work(&mcheck_work,909909- round_jiffies_relative(next_interval));753753+ on_each_cpu(mce_cpu_restart, NULL, 1);910754}911755912756static struct sysdev_class mce_sysclass = {757757+ .suspend = mce_suspend,758758+ .shutdown = mce_shutdown,913759 .resume = mce_resume,914760 .name = "machinecheck",915761};···940778 } \941779 static SYSDEV_ATTR(name, 0644, show_ ## name, set_ ## name);942780943943-/*944944- * TBD should generate these dynamically based on number of available banks.945945- * Have only 6 contol banks in /sysfs until then.946946- 
*/947947-ACCESSOR(bank0ctl,bank[0],mce_restart())948948-ACCESSOR(bank1ctl,bank[1],mce_restart())949949-ACCESSOR(bank2ctl,bank[2],mce_restart())950950-ACCESSOR(bank3ctl,bank[3],mce_restart())951951-ACCESSOR(bank4ctl,bank[4],mce_restart())952952-ACCESSOR(bank5ctl,bank[5],mce_restart())781781+static struct sysdev_attribute *bank_attrs;782782+783783+static ssize_t show_bank(struct sys_device *s, struct sysdev_attribute *attr,784784+ char *buf)785785+{786786+ u64 b = bank[attr - bank_attrs];787787+ return sprintf(buf, "%llx\n", b);788788+}789789+790790+static ssize_t set_bank(struct sys_device *s, struct sysdev_attribute *attr,791791+ const char *buf, size_t siz)792792+{793793+ char *end;794794+ u64 new = simple_strtoull(buf, &end, 0);795795+ if (end == buf)796796+ return -EINVAL;797797+ bank[attr - bank_attrs] = new;798798+ mce_restart();799799+ return end-buf;800800+}953801954802static ssize_t show_trigger(struct sys_device *s, struct sysdev_attribute *attr,955803 char *buf)···986814static SYSDEV_INT_ATTR(tolerant, 0644, tolerant);987815ACCESSOR(check_interval,check_interval,mce_restart())988816static struct sysdev_attribute *mce_attributes[] = {989989- &attr_bank0ctl, &attr_bank1ctl, &attr_bank2ctl,990990- &attr_bank3ctl, &attr_bank4ctl, &attr_bank5ctl,991817 &attr_tolerant.attr, &attr_check_interval, &attr_trigger,992818 NULL993819};···1015845 if (err)1016846 goto error;1017847 }848848+ for (i = 0; i < banks; i++) {849849+ err = sysdev_create_file(&per_cpu(device_mce, cpu),850850+ &bank_attrs[i]);851851+ if (err)852852+ goto error2;853853+ }1018854 cpu_set(cpu, mce_device_initialized);10198551020856 return 0;857857+error2:858858+ while (--i >= 0) {859859+ sysdev_remove_file(&per_cpu(device_mce, cpu),860860+ &bank_attrs[i]);861861+ }1021862error:10221022- while (i--) {863863+ while (--i >= 0) {1023864 sysdev_remove_file(&per_cpu(device_mce,cpu),1024865 mce_attributes[i]);1025866 }···1049868 for (i = 0; mce_attributes[i]; i++)1050869 
sysdev_remove_file(&per_cpu(device_mce,cpu),1051870 mce_attributes[i]);871871+ for (i = 0; i < banks; i++)872872+ sysdev_remove_file(&per_cpu(device_mce, cpu),873873+ &bank_attrs[i]);1052874 sysdev_unregister(&per_cpu(device_mce,cpu));1053875 cpu_clear(cpu, mce_device_initialized);876876+}877877+878878+/* Make sure there are no machine checks on offlined CPUs. */879879+static void mce_disable_cpu(void *h)880880+{881881+ int i;882882+ unsigned long action = *(unsigned long *)h;883883+884884+ if (!mce_available(¤t_cpu_data))885885+ return;886886+ if (!(action & CPU_TASKS_FROZEN))887887+ cmci_clear();888888+ for (i = 0; i < banks; i++)889889+ wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);890890+}891891+892892+static void mce_reenable_cpu(void *h)893893+{894894+ int i;895895+ unsigned long action = *(unsigned long *)h;896896+897897+ if (!mce_available(¤t_cpu_data))898898+ return;899899+ if (!(action & CPU_TASKS_FROZEN))900900+ cmci_reenable();901901+ for (i = 0; i < banks; i++)902902+ wrmsrl(MSR_IA32_MC0_CTL + i*4, bank[i]);1054903}10559041056905/* Get notified when a cpu comes on/off. Be hotplug friendly. 
*/···1088877 unsigned long action, void *hcpu)1089878{1090879 unsigned int cpu = (unsigned long)hcpu;880880+ struct timer_list *t = &per_cpu(mce_timer, cpu);10918811092882 switch (action) {1093883 case CPU_ONLINE:···1103891 threshold_cpu_callback(action, cpu);1104892 mce_remove_device(cpu);1105893 break;894894+ case CPU_DOWN_PREPARE:895895+ case CPU_DOWN_PREPARE_FROZEN:896896+ del_timer_sync(t);897897+ smp_call_function_single(cpu, mce_disable_cpu, &action, 1);898898+ break;899899+ case CPU_DOWN_FAILED:900900+ case CPU_DOWN_FAILED_FROZEN:901901+ t->expires = round_jiffies(jiffies + next_interval);902902+ add_timer_on(t, cpu);903903+ smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);904904+ break;905905+ case CPU_POST_DEAD:906906+ /* intentionally ignoring frozen here */907907+ cmci_rediscover(cpu);908908+ break;1106909 }1107910 return NOTIFY_OK;1108911}···1126899 .notifier_call = mce_cpu_callback,1127900};1128901902902+static __init int mce_init_banks(void)903903+{904904+ int i;905905+906906+ bank_attrs = kzalloc(sizeof(struct sysdev_attribute) * banks,907907+ GFP_KERNEL);908908+ if (!bank_attrs)909909+ return -ENOMEM;910910+911911+ for (i = 0; i < banks; i++) {912912+ struct sysdev_attribute *a = &bank_attrs[i];913913+ a->attr.name = kasprintf(GFP_KERNEL, "bank%d", i);914914+ if (!a->attr.name)915915+ goto nomem;916916+ a->attr.mode = 0644;917917+ a->show = show_bank;918918+ a->store = set_bank;919919+ }920920+ return 0;921921+922922+nomem:923923+ while (--i >= 0)924924+ kfree(bank_attrs[i].attr.name);925925+ kfree(bank_attrs);926926+ bank_attrs = NULL;927927+ return -ENOMEM;928928+}929929+1129930static __init int mce_init_device(void)1130931{1131932 int err;···11619061162907 if (!mce_available(&boot_cpu_data))1163908 return -EIO;909909+910910+ err = mce_init_banks();911911+ if (err)912912+ return err;913913+1164914 err = sysdev_class_register(&mce_sysclass);1165915 if (err)1166916 return err;
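The reworked `mce_read()` above drains the log in rounds: copy entries `[prev, next)`, then `cmpxchg(&mcelog.next, prev, 0)` and loop again if new entries raced in meanwhile. A simplified single-file sketch of that drain pattern, using GCC's `__sync_val_compare_and_swap` builtin in place of the kernel's `cmpxchg()` (the globals are stand-ins for `mcelog.next` and the copied-out entries):

```c
#include <assert.h>

static int log_next;		/* stands in for mcelog.next */
static int drained;		/* entries consumed so far */

/* Drain all entries, tolerating concurrent appends: keep looping until
 * the compare-and-swap of next back to 0 observes no new arrivals. */
static void drain(void)
{
	int prev = 0;
	int next = log_next;

	do {
		drained += next - prev;	/* consume entries [prev, next) */
		prev = next;
		/* Returns the old value: equals prev iff the reset to 0
		 * succeeded and nothing was appended since we looked. */
		next = __sync_val_compare_and_swap(&log_next, prev, 0);
	} while (next != prev);
}
```

If a writer bumps `log_next` between the copy and the compare-and-swap, the CAS fails, returns the larger value, and the loop consumes the new tail before retrying, so no entry is lost.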
+9-13
arch/x86/kernel/cpu/mcheck/mce_amd_64.c
···

 static DEFINE_PER_CPU(unsigned char, bank_map); /* see which banks are on */

+static void amd_threshold_interrupt(void);
+
 /*
  * CPU Initialization
  */
···
 			tr.reset = 0;
 			tr.old_limit = 0;
 			threshold_restart_bank(&tr);
+
+			mce_threshold_vector = amd_threshold_interrupt;
 		}
 	}
 }
···
  * the interrupt goes off when error_count reaches threshold_limit.
  * the handler will simply log mcelog w/ software defined bank number.
  */
-asmlinkage void mce_threshold_interrupt(void)
+static void amd_threshold_interrupt(void)
 {
 	unsigned int bank, block;
 	struct mce m;
 	u32 low = 0, high = 0, address = 0;

-	ack_APIC_irq();
-	exit_idle();
-	irq_enter();
-
-	memset(&m, 0, sizeof(m));
-	rdtscll(m.tsc);
-	m.cpu = smp_processor_id();
+	mce_setup(&m);

 	/* assume first bank caused it */
 	for (bank = 0; bank < NR_BANKS; ++bank) {
···

 	/* Log the machine check that caused the threshold
 	   event. */
-	do_machine_check(NULL, 0);
+	machine_check_poll(MCP_TIMESTAMP,
+			&__get_cpu_var(mce_poll_banks));

 	if (high & MASK_OVERFLOW_HI) {
 		rdmsrl(address, m.misc);
···
 				+ bank * NR_BLOCKS
 				+ block;
 			mce_log(&m);
-			goto out;
+			return;
 		}
 	}
-out:
-	inc_irq_stat(irq_threshold_count);
-	irq_exit();
 }

 /*
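The AMD handler above is now reached through a shared `mce_threshold_vector` function pointer set up at CPU init, rather than being the interrupt entry point itself. A minimal sketch of that dispatch pattern (the names mirror the patch, but this is an illustration, not the actual kernel declarations):

```c
#include <assert.h>

static int amd_calls;

static void default_threshold_interrupt(void) { /* no vendor handler */ }
static void amd_threshold_interrupt(void) { amd_calls++; }

/* Shared dispatch point; vendor init code repoints it, so the common
 * APIC entry path needs no vendor #ifdefs. */
static void (*mce_threshold_vector)(void) = default_threshold_interrupt;

static void mce_threshold_entry(void)
{
	mce_threshold_vector();	/* common entry calls current handler */
}
```

This lets the Intel CMCI code in the next file install its own handler through the same hook.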
+206-1
arch/x86/kernel/cpu/mcheck/mce_intel_64.c
···11/*22 * Intel specific MCE features.33 * Copyright 2004 Zwane Mwaikambo <zwane@linuxpower.ca>44+ * Copyright (C) 2008, 2009 Intel Corporation55+ * Author: Andi Kleen46 */5768#include <linux/init.h>···1513#include <asm/hw_irq.h>1614#include <asm/idle.h>1715#include <asm/therm_throt.h>1616+#include <asm/apic.h>18171918asmlinkage void smp_thermal_interrupt(void)2019{···28252926 rdmsrl(MSR_IA32_THERM_STATUS, msr_val);3027 if (therm_throt_process(msr_val & 1))3131- mce_log_therm_throt_event(smp_processor_id(), msr_val);2828+ mce_log_therm_throt_event(msr_val);32293330 inc_irq_stat(irq_thermal_count);3431 irq_exit();···8885 return;8986}90878888+/*8989+ * Support for Intel Correct Machine Check Interrupts. This allows9090+ * the CPU to raise an interrupt when a corrected machine check happened.9191+ * Normally we pick those up using a regular polling timer.9292+ * Also supports reliable discovery of shared banks.9393+ */9494+9595+static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);9696+9797+/*9898+ * cmci_discover_lock protects against parallel discovery attempts9999+ * which could race against each other.100100+ */101101+static DEFINE_SPINLOCK(cmci_discover_lock);102102+103103+#define CMCI_THRESHOLD 1104104+105105+static int cmci_supported(int *banks)106106+{107107+ u64 cap;108108+109109+ /*110110+ * Vendor check is not strictly needed, but the initial111111+ * initialization is vendor keyed and this112112+ * makes sure none of the backdoors are entered otherwise.113113+ */114114+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)115115+ return 0;116116+ if (!cpu_has_apic || lapic_get_maxlvt() < 6)117117+ return 0;118118+ rdmsrl(MSR_IA32_MCG_CAP, cap);119119+ *banks = min_t(unsigned, MAX_NR_BANKS, cap & 0xff);120120+ return !!(cap & MCG_CMCI_P);121121+}122122+123123+/*124124+ * The interrupt handler. 
This is called on every event.125125+ * Just call the poller directly to log any events.126126+ * This could in theory increase the threshold under high load,127127+ * but doesn't for now.128128+ */129129+static void intel_threshold_interrupt(void)130130+{131131+ machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));132132+ mce_notify_user();133133+}134134+135135+static void print_update(char *type, int *hdr, int num)136136+{137137+ if (*hdr == 0)138138+ printk(KERN_INFO "CPU %d MCA banks", smp_processor_id());139139+ *hdr = 1;140140+ printk(KERN_CONT " %s:%d", type, num);141141+}142142+143143+/*144144+ * Enable CMCI (Corrected Machine Check Interrupt) for available MCE banks145145+ * on this CPU. Use the algorithm recommended in the SDM to discover shared146146+ * banks.147147+ */148148+static void cmci_discover(int banks, int boot)149149+{150150+ unsigned long *owned = (void *)&__get_cpu_var(mce_banks_owned);151151+ int hdr = 0;152152+ int i;153153+154154+ spin_lock(&cmci_discover_lock);155155+ for (i = 0; i < banks; i++) {156156+ u64 val;157157+158158+ if (test_bit(i, owned))159159+ continue;160160+161161+ rdmsrl(MSR_IA32_MC0_CTL2 + i, val);162162+163163+ /* Already owned by someone else? */164164+ if (val & CMCI_EN) {165165+ if (test_and_clear_bit(i, owned) || boot)166166+ print_update("SHD", &hdr, i);167167+ __clear_bit(i, __get_cpu_var(mce_poll_banks));168168+ continue;169169+ }170170+171171+ val |= CMCI_EN | CMCI_THRESHOLD;172172+ wrmsrl(MSR_IA32_MC0_CTL2 + i, val);173173+ rdmsrl(MSR_IA32_MC0_CTL2 + i, val);174174+175175+ /* Did the enable bit stick? 
-- the bank supports CMCI */176176+		if (val & CMCI_EN) {177177+			if (!test_and_set_bit(i, owned) || boot)178178+				print_update("CMCI", &hdr, i);179179+			__clear_bit(i, __get_cpu_var(mce_poll_banks));180180+		} else {181181+			WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));182182+		}183183+	}184184+	spin_unlock(&cmci_discover_lock);185185+	if (hdr)186186+		printk(KERN_CONT "\n");187187+}188188+189189+/*190190+ * Just in case we missed an event during initialization, check191191+ * all the CMCI owned banks.192192+ */193193+void cmci_recheck(void)194194+{195195+	unsigned long flags;196196+	int banks;197197+198198+	if (!mce_available(&current_cpu_data) || !cmci_supported(&banks))199199+		return;200200+	local_irq_save(flags);201201+	machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));202202+	local_irq_restore(flags);203203+}204204+205205+/*206206+ * Disable CMCI on this CPU for all banks it owns when it goes down.207207+ * This allows other CPUs to claim the banks on rediscovery.208208+ */209209+void cmci_clear(void)210210+{211211+	int i;212212+	int banks;213213+	u64 val;214214+215215+	if (!cmci_supported(&banks))216216+		return;217217+	spin_lock(&cmci_discover_lock);218218+	for (i = 0; i < banks; i++) {219219+		if (!test_bit(i, __get_cpu_var(mce_banks_owned)))220220+			continue;221221+		/* Disable CMCI */222222+		rdmsrl(MSR_IA32_MC0_CTL2 + i, val);223223+		val &= ~(CMCI_EN|CMCI_THRESHOLD_MASK);224224+		wrmsrl(MSR_IA32_MC0_CTL2 + i, val);225225+		__clear_bit(i, __get_cpu_var(mce_banks_owned));226226+	}227227+	spin_unlock(&cmci_discover_lock);228228+}229229+230230+/*231231+ * After a CPU went down, cycle through all the others and rediscover.232232+ * Must run in process context.233233+ */234234+void cmci_rediscover(int dying)235235+{236236+	int banks;237237+	int cpu;238238+	cpumask_var_t old;239239+240240+	if (!cmci_supported(&banks))241241+		return;242242+	if (!alloc_cpumask_var(&old, GFP_KERNEL))243243+		return;244244+	cpumask_copy(old, &current->cpus_allowed);245245+246246+	
for_each_online_cpu (cpu) {247247+ if (cpu == dying)248248+ continue;249249+ if (set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu)))250250+ continue;251251+ /* Recheck banks in case CPUs don't all have the same */252252+ if (cmci_supported(&banks))253253+ cmci_discover(banks, 0);254254+ }255255+256256+ set_cpus_allowed_ptr(current, old);257257+ free_cpumask_var(old);258258+}259259+260260+/*261261+ * Reenable CMCI on this CPU in case a CPU down failed.262262+ */263263+void cmci_reenable(void)264264+{265265+ int banks;266266+ if (cmci_supported(&banks))267267+ cmci_discover(banks, 0);268268+}269269+270270+static __cpuinit void intel_init_cmci(void)271271+{272272+ int banks;273273+274274+ if (!cmci_supported(&banks))275275+ return;276276+277277+ mce_threshold_vector = intel_threshold_interrupt;278278+ cmci_discover(banks, 1);279279+ /*280280+ * For CPU #0 this runs with still disabled APIC, but that's281281+ * ok because only the vector is set up. We still do another282282+ * check for the banks later for CPU #0 just to make sure283283+ * to not miss any events.284284+ */285285+ apic_write(APIC_LVTCMCI, THRESHOLD_APIC_VECTOR|APIC_DM_FIXED);286286+ cmci_recheck();287287+}288288+91289void mce_intel_feature_init(struct cpuinfo_x86 *c)92290{93291 intel_init_thermal(c);292292+ intel_init_cmci();94293}
+29
arch/x86/kernel/cpu/mcheck/threshold.c
···11+/*22+ * Common corrected MCE threshold handler code:33+ */44+#include <linux/interrupt.h>55+#include <linux/kernel.h>66+77+#include <asm/irq_vectors.h>88+#include <asm/apic.h>99+#include <asm/idle.h>1010+#include <asm/mce.h>1111+1212+static void default_threshold_interrupt(void)1313+{1414+ printk(KERN_ERR "Unexpected threshold interrupt at vector %x\n",1515+ THRESHOLD_APIC_VECTOR);1616+}1717+1818+void (*mce_threshold_vector)(void) = default_threshold_interrupt;1919+2020+asmlinkage void mce_threshold_interrupt(void)2121+{2222+ exit_idle();2323+ irq_enter();2424+ inc_irq_stat(irq_threshold_count);2525+ mce_threshold_vector();2626+ irq_exit();2727+ /* Ack only at the end to avoid potential reentry */2828+ ack_APIC_irq();2929+}
···15151616atomic_t irq_err_count;17171818+/* Function pointer for generic interrupt vector handling */1919+void (*generic_interrupt_extension)(void) = NULL;2020+1821/*1922 * 'what should we do if we get a hw irq event on an illegal vector'.2023 * each architecture has to answer this themselves.···5956 seq_printf(p, "%10u ", irq_stats(j)->apic_timer_irqs);6057 seq_printf(p, " Local timer interrupts\n");6158#endif5959+ if (generic_interrupt_extension) {6060+ seq_printf(p, "PLT: ");6161+ for_each_online_cpu(j)6262+ seq_printf(p, "%10u ", irq_stats(j)->generic_irqs);6363+ seq_printf(p, " Platform interrupts\n");6464+ }6265#ifdef CONFIG_SMP6366 seq_printf(p, "RES: ");6467 for_each_online_cpu(j)···172163#ifdef CONFIG_X86_LOCAL_APIC173164 sum += irq_stats(cpu)->apic_timer_irqs;174165#endif166166+ if (generic_interrupt_extension)167167+ sum += irq_stats(cpu)->generic_irqs;175168#ifdef CONFIG_SMP176169 sum += irq_stats(cpu)->irq_resched_count;177170 sum += irq_stats(cpu)->irq_call_count;···235224236225 set_irq_regs(old_regs);237226 return 1;227227+}228228+229229+/*230230+ * Handler for GENERIC_INTERRUPT_VECTOR.231231+ */232232+void smp_generic_interrupt(struct pt_regs *regs)233233+{234234+ struct pt_regs *old_regs = set_irq_regs(regs);235235+236236+ ack_APIC_irq();237237+238238+ exit_idle();239239+240240+ irq_enter();241241+242242+ inc_irq_stat(generic_irqs);243243+244244+ if (generic_interrupt_extension)245245+ generic_interrupt_extension();246246+247247+ irq_exit();248248+249249+ set_irq_regs(old_regs);238250}239251240252EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
···175175 /* self generated IPI for local APIC timer */176176 alloc_intr_gate(LOCAL_TIMER_VECTOR, apic_timer_interrupt);177177178178+ /* generic IPI for platform specific use */179179+ alloc_intr_gate(GENERIC_INTERRUPT_VECTOR, generic_interrupt);180180+178181 /* IPI vectors for APIC spurious and error interrupts */179182 alloc_intr_gate(SPURIOUS_APIC_VECTOR, spurious_interrupt);180183 alloc_intr_gate(ERROR_APIC_VECTOR, error_interrupt);
+3
arch/x86/kernel/irqinit_64.c
···147147 /* self generated IPI for local APIC timer */148148 alloc_intr_gate(LOCAL_TIMER_VECTOR, apic_timer_interrupt);149149150150+ /* generic IPI for platform specific use */151151+ alloc_intr_gate(GENERIC_INTERRUPT_VECTOR, generic_interrupt);152152+150153 /* IPI vectors for APIC spurious and error interrupts */151154 alloc_intr_gate(SPURIOUS_APIC_VECTOR, spurious_interrupt);152155 alloc_intr_gate(ERROR_APIC_VECTOR, error_interrupt);
+10-7
arch/x86/kernel/machine_kexec_32.c
···1414#include <linux/ftrace.h>1515#include <linux/suspend.h>1616#include <linux/gfp.h>1717+#include <linux/io.h>17181819#include <asm/pgtable.h>1920#include <asm/pgalloc.h>2021#include <asm/tlbflush.h>2122#include <asm/mmu_context.h>2222-#include <asm/io.h>2323#include <asm/apic.h>2424#include <asm/cpufeature.h>2525#include <asm/desc.h>···6363 "\tmovl %%eax,%%fs\n"6464 "\tmovl %%eax,%%gs\n"6565 "\tmovl %%eax,%%ss\n"6666- ::: "eax", "memory");6666+ : : : "eax", "memory");6767#undef STR6868#undef __STR6969}···205205206206 if (image->preserve_context) {207207#ifdef CONFIG_X86_IO_APIC208208- /* We need to put APICs in legacy mode so that we can208208+ /*209209+ * We need to put APICs in legacy mode so that we can209210 * get timer interrupts in second kernel. kexec/kdump210211 * paths already have calls to disable_IO_APIC() in211212 * one form or other. kexec jump path also need···228227 page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)229228 << PAGE_SHIFT);230229231231- /* The segment registers are funny things, they have both a230230+ /*231231+ * The segment registers are funny things, they have both a232232 * visible and an invisible part. Whenever the visible part is233233 * set to a specific selector, the invisible part is loaded234234 * with from a table in memory. At no other time is the···239237 * segments, before I zap the gdt with an invalid value.240238 */241239 load_segments();242242- /* The gdt & idt are now invalid.240240+ /*241241+ * The gdt & idt are now invalid.243242 * If you want to load them you must set up your own idt & gdt.244243 */245245- set_gdt(phys_to_virt(0),0);246246- set_idt(phys_to_virt(0),0);244244+ set_gdt(phys_to_virt(0), 0);245245+ set_idt(phys_to_virt(0), 0);247246248247 /* now call it */249248 image->start = relocate_kernel_ptr((unsigned long)image->head,
+88-11
arch/x86/kernel/machine_kexec_64.c
···1212#include <linux/reboot.h>1313#include <linux/numa.h>1414#include <linux/ftrace.h>1515+#include <linux/io.h>1616+#include <linux/suspend.h>15171618#include <asm/pgtable.h>1719#include <asm/tlbflush.h>1820#include <asm/mmu_context.h>1919-#include <asm/io.h>2121+2222+static int init_one_level2_page(struct kimage *image, pgd_t *pgd,2323+ unsigned long addr)2424+{2525+ pud_t *pud;2626+ pmd_t *pmd;2727+ struct page *page;2828+ int result = -ENOMEM;2929+3030+ addr &= PMD_MASK;3131+ pgd += pgd_index(addr);3232+ if (!pgd_present(*pgd)) {3333+ page = kimage_alloc_control_pages(image, 0);3434+ if (!page)3535+ goto out;3636+ pud = (pud_t *)page_address(page);3737+ memset(pud, 0, PAGE_SIZE);3838+ set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));3939+ }4040+ pud = pud_offset(pgd, addr);4141+ if (!pud_present(*pud)) {4242+ page = kimage_alloc_control_pages(image, 0);4343+ if (!page)4444+ goto out;4545+ pmd = (pmd_t *)page_address(page);4646+ memset(pmd, 0, PAGE_SIZE);4747+ set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));4848+ }4949+ pmd = pmd_offset(pud, addr);5050+ if (!pmd_present(*pmd))5151+ set_pmd(pmd, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));5252+ result = 0;5353+out:5454+ return result;5555+}20562157static void init_level2_page(pmd_t *level2p, unsigned long addr)2258{···11983 }12084 level3p = (pud_t *)page_address(page);12185 result = init_level3_page(image, level3p, addr, last_addr);122122- if (result) {8686+ if (result)12387 goto out;124124- }12588 set_pgd(level4p++, __pgd(__pa(level3p) | _KERNPG_TABLE));12689 addr += PGDIR_SIZE;12790 }···189154 int result;190155 level4p = (pgd_t *)__va(start_pgtable);191156 result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);157157+ if (result)158158+ return result;159159+ /*160160+ * image->start may be outside 0 ~ max_pfn, for example when161161+ * jump back to original kernel from kexeced kernel162162+ */163163+ result = init_one_level2_page(image, level4p, image->start);192164 if (result)193165 return 
result;194166 return init_transition_pgtable(image, level4p);···271229{272230 unsigned long page_list[PAGES_NR];273231 void *control_page;232232+ int save_ftrace_enabled;274233275275- tracer_disable();234234+#ifdef CONFIG_KEXEC_JUMP235235+ if (kexec_image->preserve_context)236236+ save_processor_state();237237+#endif238238+239239+ save_ftrace_enabled = __ftrace_enabled_save();276240277241 /* Interrupts aren't acceptable while we reboot */278242 local_irq_disable();279243244244+ if (image->preserve_context) {245245+#ifdef CONFIG_X86_IO_APIC246246+ /*247247+ * We need to put APICs in legacy mode so that we can248248+ * get timer interrupts in second kernel. kexec/kdump249249+ * paths already have calls to disable_IO_APIC() in250250+ * one form or other. kexec jump path also need251251+ * one.252252+ */253253+ disable_IO_APIC();254254+#endif255255+ }256256+280257 control_page = page_address(image->control_code_page) + PAGE_SIZE;281281- memcpy(control_page, relocate_kernel, PAGE_SIZE);258258+ memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);282259283260 page_list[PA_CONTROL_PAGE] = virt_to_phys(control_page);261261+ page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;284262 page_list[PA_TABLE_PAGE] =285263 (unsigned long)__pa(page_address(image->control_code_page));286264287287- /* The segment registers are funny things, they have both a265265+ if (image->type == KEXEC_TYPE_DEFAULT)266266+ page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)267267+ << PAGE_SHIFT);268268+269269+ /*270270+ * The segment registers are funny things, they have both a288271 * visible and an invisible part. Whenever the visible part is289272 * set to a specific selector, the invisible part is loaded290273 * with from a table in memory. 
At no other time is the···319252 * segments, before I zap the gdt with an invalid value.320253 */321254 load_segments();322322- /* The gdt & idt are now invalid.255255+ /*256256+ * The gdt & idt are now invalid.323257 * If you want to load them you must set up your own idt & gdt.324258 */325325- set_gdt(phys_to_virt(0),0);326326- set_idt(phys_to_virt(0),0);259259+ set_gdt(phys_to_virt(0), 0);260260+ set_idt(phys_to_virt(0), 0);327261328262 /* now call it */329329- relocate_kernel((unsigned long)image->head, (unsigned long)page_list,330330- image->start);263263+ image->start = relocate_kernel((unsigned long)image->head,264264+ (unsigned long)page_list,265265+ image->start,266266+ image->preserve_context);267267+268268+#ifdef CONFIG_KEXEC_JUMP269269+ if (kexec_image->preserve_context)270270+ restore_processor_state();271271+#endif272272+273273+ __ftrace_enabled_restore(save_ftrace_enabled);331274}332275333276void arch_crash_save_vmcoreinfo(void)
+22-3
arch/x86/kernel/mpparse.c
···558558559559static struct mpf_intel *mpf_found;560560561561+static unsigned long __init get_mpc_size(unsigned long physptr)562562+{563563+ struct mpc_table *mpc;564564+ unsigned long size;565565+566566+ mpc = early_ioremap(physptr, PAGE_SIZE);567567+ size = mpc->length;568568+ early_iounmap(mpc, PAGE_SIZE);569569+ apic_printk(APIC_VERBOSE, " mpc: %lx-%lx\n", physptr, physptr + size);570570+571571+ return size;572572+}573573+561574/*562575 * Scan the memory blocks for an SMP configuration block.563576 */···624611 construct_default_ISA_mptable(mpf->feature1);625612626613 } else if (mpf->physptr) {614614+ struct mpc_table *mpc;615615+ unsigned long size;627616617617+ size = get_mpc_size(mpf->physptr);618618+ mpc = early_ioremap(mpf->physptr, size);628619 /*629620 * Read the physical hardware table. Anything here will630621 * override the defaults.631622 */632632- if (!smp_read_mpc(phys_to_virt(mpf->physptr), early)) {623623+ if (!smp_read_mpc(mpc, early)) {633624#ifdef CONFIG_X86_LOCAL_APIC634625 smp_found_config = 0;635626#endif···641624 "BIOS bug, MP table errors detected!...\n");642625 printk(KERN_ERR "... disabling SMP support. "643626 "(tell your hw vendor)\n");627627+ early_iounmap(mpc, size);644628 return;645629 }630630+ early_iounmap(mpc, size);646631647632 if (early)648633 return;···716697717698 if (!reserve)718699 return 1;719719- reserve_bootmem_generic(virt_to_phys(mpf), PAGE_SIZE,700700+ reserve_bootmem_generic(virt_to_phys(mpf), sizeof(*mpf),720701 BOOTMEM_DEFAULT);721702 if (mpf->physptr) {722722- unsigned long size = PAGE_SIZE;703703+ unsigned long size = get_mpc_size(mpf->physptr);723704#ifdef CONFIG_X86_32724705 /*725706 * We cannot access to MPC table to compute
···17171818#define PTR(x) (x << 2)19192020-/* control_page + KEXEC_CONTROL_CODE_MAX_SIZE2020+/*2121+ * control_page + KEXEC_CONTROL_CODE_MAX_SIZE2122 * ~ control_page + PAGE_SIZE are used as data storage and stack for2223 * jumping back2324 */···7776 movl %eax, CP_PA_SWAP_PAGE(%edi)7877 movl %ebx, CP_PA_BACKUP_PAGES_MAP(%edi)79788080- /* get physical address of control page now */8181- /* this is impossible after page table switch */7979+ /*8080+ * get physical address of control page now8181+ * this is impossible after page table switch8282+ */8283 movl PTR(PA_CONTROL_PAGE)(%ebp), %edi83848485 /* switch to new set of page tables */···10097 /* store the start address on the stack */10198 pushl %edx10299103103- /* Set cr0 to a known state:100100+ /*101101+ * Set cr0 to a known state:104102 * - Paging disabled105103 * - Alignment check disabled106104 * - Write protect disabled···117113 /* clear cr4 if applicable */118114 testl %ecx, %ecx119115 jz 1f120120- /* Set cr4 to a known state:116116+ /*117117+ * Set cr4 to a known state:121118 * Setting everything to zero seems safe.122119 */123120 xorl %eax, %eax···137132 call swap_pages138133 addl $8, %esp139134140140- /* To be certain of avoiding problems with self-modifying code135135+ /*136136+ * To be certain of avoiding problems with self-modifying code141137 * I need to execute a serializing instruction here.142138 * So I flush the TLB, it's handy, and not processor dependent.143139 */144140 xorl %eax, %eax145141 movl %eax, %cr3146142147147- /* set all of the registers to known values */148148- /* leave %esp alone */143143+ /*144144+ * set all of the registers to known values145145+ * leave %esp alone146146+ */149147150148 testl %esi, %esi151149 jnz 1f
+155-36
arch/x86/kernel/relocate_kernel_64.S
···1919#define PTR(x) (x << 3)2020#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)21212222+/*2323+ * control_page + KEXEC_CONTROL_CODE_MAX_SIZE2424+ * ~ control_page + PAGE_SIZE are used as data storage and stack for2525+ * jumping back2626+ */2727+#define DATA(offset) (KEXEC_CONTROL_CODE_MAX_SIZE+(offset))2828+2929+/* Minimal CPU state */3030+#define RSP DATA(0x0)3131+#define CR0 DATA(0x8)3232+#define CR3 DATA(0x10)3333+#define CR4 DATA(0x18)3434+3535+/* other data */3636+#define CP_PA_TABLE_PAGE DATA(0x20)3737+#define CP_PA_SWAP_PAGE DATA(0x28)3838+#define CP_PA_BACKUP_PAGES_MAP DATA(0x30)3939+2240 .text2341 .align PAGE_SIZE2442 .code642543 .globl relocate_kernel2644relocate_kernel:2727- /* %rdi indirection_page4545+ /*4646+ * %rdi indirection_page2847 * %rsi page_list2948 * %rdx start address4949+ * %rcx preserve_context3050 */5151+5252+ /* Save the CPU context, used for jumping back */5353+ pushq %rbx5454+ pushq %rbp5555+ pushq %r125656+ pushq %r135757+ pushq %r145858+ pushq %r155959+ pushf6060+6161+ movq PTR(VA_CONTROL_PAGE)(%rsi), %r116262+ movq %rsp, RSP(%r11)6363+ movq %cr0, %rax6464+ movq %rax, CR0(%r11)6565+ movq %cr3, %rax6666+ movq %rax, CR3(%r11)6767+ movq %cr4, %rax6868+ movq %rax, CR4(%r11)31693270 /* zero out flags, and disable interrupts */3371 pushq $03472 popfq35733636- /* get physical address of control page now */3737- /* this is impossible after page table switch */7474+ /*7575+ * get physical address of control page now7676+ * this is impossible after page table switch7777+ */3878 movq PTR(PA_CONTROL_PAGE)(%rsi), %r839794080 /* get physical address of page table now too */4141- movq PTR(PA_TABLE_PAGE)(%rsi), %rcx8181+ movq PTR(PA_TABLE_PAGE)(%rsi), %r98282+8383+ /* get physical address of swap page now */8484+ movq PTR(PA_SWAP_PAGE)(%rsi), %r108585+8686+ /* save some information for jumping back */8787+ movq %r9, CP_PA_TABLE_PAGE(%r11)8888+ movq %r10, CP_PA_SWAP_PAGE(%r11)8989+ movq %rdi, 
CP_PA_BACKUP_PAGES_MAP(%r11)42904391 /* Switch to the identity mapped page tables */4444- movq %rcx, %cr39292+ movq %r9, %cr345934694 /* setup a new stack at the end of the physical control page */4795 lea PAGE_SIZE(%r8), %rsp···10355 /* store the start address on the stack */10456 pushq %rdx10557106106- /* Set cr0 to a known state:5858+ /*5959+ * Set cr0 to a known state:10760 * - Paging enabled10861 * - Alignment check disabled10962 * - Write protect disabled···11768 orl $(X86_CR0_PG | X86_CR0_PE), %eax11869 movq %rax, %cr011970120120- /* Set cr4 to a known state:7171+ /*7272+ * Set cr4 to a known state:12173 * - physical address extension enabled12274 */12375 movq $X86_CR4_PAE, %rax···128781:1297913080 /* Flush the TLB (needed?) */131131- movq %rcx, %cr38181+ movq %r9, %cr38282+8383+ movq %rcx, %r118484+ call swap_pages8585+8686+ /*8787+ * To be certain of avoiding problems with self-modifying code8888+ * I need to execute a serializing instruction here.8989+ * So I flush the TLB by reloading %cr3 here, it's handy,9090+ * and not processor dependent.9191+ */9292+ movq %cr3, %rax9393+ movq %rax, %cr39494+9595+ /*9696+ * set all of the registers to known values9797+ * leave %rsp alone9898+ */9999+100100+ testq %r11, %r11101101+ jnz 1f102102+ xorq %rax, %rax103103+ xorq %rbx, %rbx104104+ xorq %rcx, %rcx105105+ xorq %rdx, %rdx106106+ xorq %rsi, %rsi107107+ xorq %rdi, %rdi108108+ xorq %rbp, %rbp109109+ xorq %r8, %r8110110+ xorq %r9, %r9111111+ xorq %r10, %r9112112+ xorq %r11, %r11113113+ xorq %r12, %r12114114+ xorq %r13, %r13115115+ xorq %r14, %r14116116+ xorq %r15, %r15117117+118118+ ret119119+120120+1:121121+ popq %rdx122122+ leaq PAGE_SIZE(%r10), %rsp123123+ call *%rdx124124+125125+ /* get the re-entry point of the peer system */126126+ movq 0(%rsp), %rbp127127+ call 1f128128+1:129129+ popq %r8130130+ subq $(1b - relocate_kernel), %r8131131+ movq CP_PA_SWAP_PAGE(%r8), %r10132132+ movq CP_PA_BACKUP_PAGES_MAP(%r8), %rdi133133+ movq CP_PA_TABLE_PAGE(%r8), %rax134134+ 
movq %rax, %cr3135135+ lea PAGE_SIZE(%r8), %rsp136136+ call swap_pages137137+ movq $virtual_mapped, %rax138138+ pushq %rax139139+ ret140140+141141+virtual_mapped:142142+ movq RSP(%r8), %rsp143143+ movq CR4(%r8), %rax144144+ movq %rax, %cr4145145+ movq CR3(%r8), %rax146146+ movq CR0(%r8), %r8147147+ movq %rax, %cr3148148+ movq %r8, %cr0149149+ movq %rbp, %rax150150+151151+ popf152152+ popq %r15153153+ popq %r14154154+ popq %r13155155+ popq %r12156156+ popq %rbp157157+ popq %rbx158158+ ret132159133160 /* Do the copies */161161+swap_pages:134162 movq %rdi, %rcx /* Put the page_list in %rcx */135163 xorq %rdi, %rdi136164 xorq %rsi, %rsi···240112 movq %rcx, %rsi /* For ever source page do a copy */241113 andq $0xfffffffffffff000, %rsi242114115115+ movq %rdi, %rdx116116+ movq %rsi, %rax117117+118118+ movq %r10, %rdi243119 movq $512, %rcx244120 rep ; movsq121121+122122+ movq %rax, %rdi123123+ movq %rdx, %rsi124124+ movq $512, %rcx125125+ rep ; movsq126126+127127+ movq %rdx, %rdi128128+ movq %r10, %rsi129129+ movq $512, %rcx130130+ rep ; movsq131131+132132+ lea PAGE_SIZE(%rax), %rsi245133 jmp 0b2461343:247247-248248- /* To be certain of avoiding problems with self-modifying code249249- * I need to execute a serializing instruction here.250250- * So I flush the TLB by reloading %cr3 here, it's handy,251251- * and not processor dependent.252252- */253253- movq %cr3, %rax254254- movq %rax, %cr3255255-256256- /* set all of the registers to known values */257257- /* leave %rsp alone */258258-259259- xorq %rax, %rax260260- xorq %rbx, %rbx261261- xorq %rcx, %rcx262262- xorq %rdx, %rdx263263- xorq %rsi, %rsi264264- xorq %rdi, %rdi265265- xorq %rbp, %rbp266266- xorq %r8, %r8267267- xorq %r9, %r9268268- xorq %r10, %r9269269- xorq %r11, %r11270270- xorq %r12, %r12271271- xorq %r13, %r13272272- xorq %r14, %r14273273- xorq %r15, %r15274274-275135 ret136136+137137+ .globl kexec_control_code_size138138+.set kexec_control_code_size, . - relocate_kernel
+6-3
arch/x86/kernel/setup.c
···202202#endif203203204204#else205205-struct cpuinfo_x86 boot_cpu_data __read_mostly;205205+struct cpuinfo_x86 boot_cpu_data __read_mostly = {206206+ .x86_phys_bits = MAX_PHYSMEM_BITS,207207+};206208EXPORT_SYMBOL(boot_cpu_data);207209#endif208210···772770773771 finish_e820_parsing();774772773773+ if (efi_enabled)774774+ efi_init();775775+775776 dmi_scan_machine();776777777778 dmi_check_system(bad_bios_dmi_table);···794789 insert_resource(&iomem_resource, &data_resource);795790 insert_resource(&iomem_resource, &bss_resource);796791797797- if (efi_enabled)798798- efi_init();799792800793#ifdef CONFIG_X86_32801794 if (ppro_with_ram_bug()) {
+370-28
arch/x86/kernel/setup_percpu.c
···77#include <linux/crash_dump.h>88#include <linux/smp.h>99#include <linux/topology.h>1010+#include <linux/pfn.h>1011#include <asm/sections.h>1112#include <asm/processor.h>1213#include <asm/setup.h>···4241};4342EXPORT_SYMBOL(__per_cpu_offset);44434444+/*4545+ * On x86_64 symbols referenced from code should be reachable using4646+ * 32bit relocations. Reserve space for static percpu variables in4747+ * modules so that they are always served from the first chunk which4848+ * is located at the percpu segment base. On x86_32, anything can4949+ * address anywhere. No need to reserve space in the first chunk.5050+ */5151+#ifdef CONFIG_X86_645252+#define PERCPU_FIRST_CHUNK_RESERVE PERCPU_MODULE_RESERVE5353+#else5454+#define PERCPU_FIRST_CHUNK_RESERVE 05555+#endif5656+5757+/**5858+ * pcpu_need_numa - determine percpu allocation needs to consider NUMA5959+ *6060+ * If NUMA is not configured or there is only one NUMA node available,6161+ * there is no reason to consider NUMA. This function determines6262+ * whether percpu allocation should consider NUMA or not.6363+ *6464+ * RETURNS:6565+ * true if NUMA should be considered; otherwise, false.6666+ */6767+static bool __init pcpu_need_numa(void)6868+{6969+#ifdef CONFIG_NEED_MULTIPLE_NODES7070+ pg_data_t *last = NULL;7171+ unsigned int cpu;7272+7373+ for_each_possible_cpu(cpu) {7474+ int node = early_cpu_to_node(cpu);7575+7676+ if (node_online(node) && NODE_DATA(node) &&7777+ last && last != NODE_DATA(node))7878+ return true;7979+8080+ last = NODE_DATA(node);8181+ }8282+#endif8383+ return false;8484+}8585+8686+/**8787+ * pcpu_alloc_bootmem - NUMA friendly alloc_bootmem wrapper for percpu8888+ * @cpu: cpu to allocate for8989+ * @size: size allocation in bytes9090+ * @align: alignment9191+ *9292+ * Allocate @size bytes aligned at @align for cpu @cpu. 
This wrapper9393+ * does the right thing for NUMA regardless of the current9494+ * configuration.9595+ *9696+ * RETURNS:9797+ * Pointer to the allocated area on success, NULL on failure.9898+ */9999+static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,100100+ unsigned long align)101101+{102102+ const unsigned long goal = __pa(MAX_DMA_ADDRESS);103103+#ifdef CONFIG_NEED_MULTIPLE_NODES104104+ int node = early_cpu_to_node(cpu);105105+ void *ptr;106106+107107+ if (!node_online(node) || !NODE_DATA(node)) {108108+ ptr = __alloc_bootmem_nopanic(size, align, goal);109109+ pr_info("cpu %d has no node %d or node-local memory\n",110110+ cpu, node);111111+ pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",112112+ cpu, size, __pa(ptr));113113+ } else {114114+ ptr = __alloc_bootmem_node_nopanic(NODE_DATA(node),115115+ size, align, goal);116116+ pr_debug("per cpu data for cpu%d %lu bytes on node%d at "117117+ "%016lx\n", cpu, size, node, __pa(ptr));118118+ }119119+ return ptr;120120+#else121121+ return __alloc_bootmem_nopanic(size, align, goal);122122+#endif123123+}124124+125125+/*126126+ * Remap allocator127127+ *128128+ * This allocator uses PMD page as unit. A PMD page is allocated for129129+ * each cpu and each is remapped into vmalloc area using PMD mapping.130130+ * As PMD page is quite large, only part of it is used for the first131131+ * chunk. Unused part is returned to the bootmem allocator.132132+ *133133+ * So, the PMD pages are mapped twice - once to the physical mapping134134+ * and to the vmalloc area for the first percpu chunk. 
The double135135+ * mapping does add one more PMD TLB entry pressure but still is much136136+ * better than only using 4k mappings while still being NUMA friendly.137137+ */138138+#ifdef CONFIG_NEED_MULTIPLE_NODES139139+static size_t pcpur_size __initdata;140140+static void **pcpur_ptrs __initdata;141141+142142+static struct page * __init pcpur_get_page(unsigned int cpu, int pageno)143143+{144144+ size_t off = (size_t)pageno << PAGE_SHIFT;145145+146146+ if (off >= pcpur_size)147147+ return NULL;148148+149149+ return virt_to_page(pcpur_ptrs[cpu] + off);150150+}151151+152152+static ssize_t __init setup_pcpu_remap(size_t static_size)153153+{154154+ static struct vm_struct vm;155155+ pg_data_t *last;156156+ size_t ptrs_size, dyn_size;157157+ unsigned int cpu;158158+ ssize_t ret;159159+160160+ /*161161+ * If large page isn't supported, there's no benefit in doing162162+ * this. Also, on non-NUMA, embedding is better.163163+ */164164+ if (!cpu_has_pse || pcpu_need_numa())165165+ return -EINVAL;166166+167167+ last = NULL;168168+ for_each_possible_cpu(cpu) {169169+ int node = early_cpu_to_node(cpu);170170+171171+ if (node_online(node) && NODE_DATA(node) &&172172+ last && last != NODE_DATA(node))173173+ goto proceed;174174+175175+ last = NODE_DATA(node);176176+ }177177+ return -EINVAL;178178+179179+proceed:180180+ /*181181+ * Currently supports only single page. 
Supporting multiple
+ * pages won't be too difficult if it ever becomes necessary.
+ */
+	pcpur_size = PFN_ALIGN(static_size + PERCPU_MODULE_RESERVE +
+			       PERCPU_DYNAMIC_RESERVE);
+	if (pcpur_size > PMD_SIZE) {
+		pr_warning("PERCPU: static data is larger than large page, "
+			   "can't use large page\n");
+		return -EINVAL;
+	}
+	dyn_size = pcpur_size - static_size - PERCPU_FIRST_CHUNK_RESERVE;
+
+	/* allocate pointer array and alloc large pages */
+	ptrs_size = PFN_ALIGN(num_possible_cpus() * sizeof(pcpur_ptrs[0]));
+	pcpur_ptrs = alloc_bootmem(ptrs_size);
+
+	for_each_possible_cpu(cpu) {
+		pcpur_ptrs[cpu] = pcpu_alloc_bootmem(cpu, PMD_SIZE, PMD_SIZE);
+		if (!pcpur_ptrs[cpu])
+			goto enomem;
+
+		/*
+		 * Only use pcpur_size bytes and give back the rest.
+		 *
+		 * Ingo: The 2MB up-rounding bootmem is needed to make
+		 * sure the partial 2MB page is still fully RAM - it's
+		 * not well-specified to have a PAT-incompatible area
+		 * (unmapped RAM, device memory, etc.) in that hole.
+		 */
+		free_bootmem(__pa(pcpur_ptrs[cpu] + pcpur_size),
+			     PMD_SIZE - pcpur_size);
+
+		memcpy(pcpur_ptrs[cpu], __per_cpu_load, static_size);
+	}
+
+	/* allocate address and map */
+	vm.flags = VM_ALLOC;
+	vm.size = num_possible_cpus() * PMD_SIZE;
+	vm_area_register_early(&vm, PMD_SIZE);
+
+	for_each_possible_cpu(cpu) {
+		pmd_t *pmd;
+
+		pmd = populate_extra_pmd((unsigned long)vm.addr
+					 + cpu * PMD_SIZE);
+		set_pmd(pmd, pfn_pmd(page_to_pfn(virt_to_page(pcpur_ptrs[cpu])),
+				     PAGE_KERNEL_LARGE));
+	}
+
+	/* we're ready, commit */
+	pr_info("PERCPU: Remapped at %p with large pages, static data "
+		"%zu bytes\n", vm.addr, static_size);
+
+	ret = pcpu_setup_first_chunk(pcpur_get_page, static_size,
+				     PERCPU_FIRST_CHUNK_RESERVE,
+				     PMD_SIZE, dyn_size, vm.addr, NULL);
+	goto out_free_ar;
+
+enomem:
+	for_each_possible_cpu(cpu)
+		if (pcpur_ptrs[cpu])
+			free_bootmem(__pa(pcpur_ptrs[cpu]), PMD_SIZE);
+	ret = -ENOMEM;
+out_free_ar:
+	free_bootmem(__pa(pcpur_ptrs), ptrs_size);
+	return ret;
+}
+#else
+static ssize_t __init setup_pcpu_remap(size_t static_size)
+{
+	return -EINVAL;
+}
+#endif
+
+/*
+ * Embedding allocator
+ *
+ * The first chunk is sized to just contain the static area plus
+ * module and dynamic reserves, and allocated as a contiguous area
+ * using bootmem allocator and used as-is without being mapped into
+ * vmalloc area.  This enables the first chunk to piggy back on the
+ * linear physical PMD mapping and doesn't add any additional pressure
+ * to TLB.  Note that if the needed size is smaller than the minimum
+ * unit size, the leftover is returned to the bootmem allocator.
+ */
+static void *pcpue_ptr __initdata;
+static size_t pcpue_size __initdata;
+static size_t pcpue_unit_size __initdata;
+
+static struct page * __init pcpue_get_page(unsigned int cpu, int pageno)
+{
+	size_t off = (size_t)pageno << PAGE_SHIFT;
+
+	if (off >= pcpue_size)
+		return NULL;
+
+	return virt_to_page(pcpue_ptr + cpu * pcpue_unit_size + off);
+}
+
+static ssize_t __init setup_pcpu_embed(size_t static_size)
+{
+	unsigned int cpu;
+	size_t dyn_size;
+
+	/*
+	 * If large page isn't supported, there's no benefit in doing
+	 * this.  Also, embedding allocation doesn't play well with
+	 * NUMA.
+	 */
+	if (!cpu_has_pse || pcpu_need_numa())
+		return -EINVAL;
+
+	/* allocate and copy */
+	pcpue_size = PFN_ALIGN(static_size + PERCPU_MODULE_RESERVE +
+			       PERCPU_DYNAMIC_RESERVE);
+	pcpue_unit_size = max_t(size_t, pcpue_size, PCPU_MIN_UNIT_SIZE);
+	dyn_size = pcpue_size - static_size - PERCPU_FIRST_CHUNK_RESERVE;
+
+	pcpue_ptr = pcpu_alloc_bootmem(0, num_possible_cpus() * pcpue_unit_size,
+				       PAGE_SIZE);
+	if (!pcpue_ptr)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		void *ptr = pcpue_ptr + cpu * pcpue_unit_size;
+
+		free_bootmem(__pa(ptr + pcpue_size),
+			     pcpue_unit_size - pcpue_size);
+		memcpy(ptr, __per_cpu_load, static_size);
+	}
+
+	/* we're ready, commit */
+	pr_info("PERCPU: Embedded %zu pages at %p, static data %zu bytes\n",
+		pcpue_size >> PAGE_SHIFT, pcpue_ptr, static_size);
+
+	return pcpu_setup_first_chunk(pcpue_get_page, static_size,
+				      PERCPU_FIRST_CHUNK_RESERVE,
+				      pcpue_unit_size, dyn_size,
+				      pcpue_ptr, NULL);
+}
+
+/*
+ * 4k page allocator
+ *
+ * This is the basic allocator.  Static percpu area is allocated
+ * page-by-page and most of initialization is done by the generic
+ * setup function.
+ */
+static struct page **pcpu4k_pages __initdata;
+static int pcpu4k_nr_static_pages __initdata;
+
+static struct page * __init pcpu4k_get_page(unsigned int cpu, int pageno)
+{
+	if (pageno < pcpu4k_nr_static_pages)
+		return pcpu4k_pages[cpu * pcpu4k_nr_static_pages + pageno];
+	return NULL;
+}
+
+static void __init pcpu4k_populate_pte(unsigned long addr)
+{
+	populate_extra_pte(addr);
+}
+
+static ssize_t __init setup_pcpu_4k(size_t static_size)
+{
+	size_t pages_size;
+	unsigned int cpu;
+	int i, j;
+	ssize_t ret;
+
+	pcpu4k_nr_static_pages = PFN_UP(static_size);
+
+	/* unaligned allocations can't be freed, round up to page size */
+	pages_size = PFN_ALIGN(pcpu4k_nr_static_pages * num_possible_cpus()
+			       * sizeof(pcpu4k_pages[0]));
+	pcpu4k_pages = alloc_bootmem(pages_size);
+
+	/* allocate and copy */
+	j = 0;
+	for_each_possible_cpu(cpu)
+		for (i = 0; i < pcpu4k_nr_static_pages; i++) {
+			void *ptr;
+
+			ptr = pcpu_alloc_bootmem(cpu, PAGE_SIZE, PAGE_SIZE);
+			if (!ptr)
+				goto enomem;
+
+			memcpy(ptr, __per_cpu_load + i * PAGE_SIZE, PAGE_SIZE);
+			pcpu4k_pages[j++] = virt_to_page(ptr);
+		}
+
+	/* we're ready, commit */
+	pr_info("PERCPU: Allocated %d 4k pages, static data %zu bytes\n",
+		pcpu4k_nr_static_pages, static_size);
+
+	ret = pcpu_setup_first_chunk(pcpu4k_get_page, static_size,
+				     PERCPU_FIRST_CHUNK_RESERVE, -1, -1, NULL,
+				     pcpu4k_populate_pte);
+	goto out_free_ar;
+
+enomem:
+	while (--j >= 0)
+		free_bootmem(__pa(page_address(pcpu4k_pages[j])), PAGE_SIZE);
+	ret = -ENOMEM;
+out_free_ar:
+	free_bootmem(__pa(pcpu4k_pages), pages_size);
+	return ret;
+}
+
 static inline void setup_percpu_segment(int cpu)
 {
 #ifdef CONFIG_X86_32
···
  */
 void __init setup_per_cpu_areas(void)
 {
-	ssize_t size;
-	char *ptr;
-	int cpu;
-
-	/* Copy section for each CPU (we discard the original) */
-	size = roundup(PERCPU_ENOUGH_ROOM, PAGE_SIZE);
+	size_t static_size = __per_cpu_end - __per_cpu_start;
+	unsigned int cpu;
+	unsigned long delta;
+	size_t pcpu_unit_size;
+	ssize_t ret;

 	pr_info("NR_CPUS:%d nr_cpumask_bits:%d nr_cpu_ids:%d nr_node_ids:%d\n",
 		NR_CPUS, nr_cpumask_bits, nr_cpu_ids, nr_node_ids);

-	pr_info("PERCPU: Allocating %zd bytes of per cpu data\n", size);
+	/*
+	 * Allocate percpu area.  If PSE is supported, try to make use
+	 * of large page mappings.  Please read comments on top of
+	 * each allocator for details.
+	 */
+	ret = setup_pcpu_remap(static_size);
+	if (ret < 0)
+		ret = setup_pcpu_embed(static_size);
+	if (ret < 0)
+		ret = setup_pcpu_4k(static_size);
+	if (ret < 0)
+		panic("cannot allocate static percpu area (%zu bytes, err=%zd)",
+		      static_size, ret);

+	pcpu_unit_size = ret;
+
+	/* alrighty, percpu areas up and running */
+	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
 	for_each_possible_cpu(cpu) {
-#ifndef CONFIG_NEED_MULTIPLE_NODES
-		ptr = alloc_bootmem_pages(size);
-#else
-		int node = early_cpu_to_node(cpu);
-		if (!node_online(node) || !NODE_DATA(node)) {
-			ptr = alloc_bootmem_pages(size);
-			pr_info("cpu %d has no node %d or node-local memory\n",
-				cpu, node);
-			pr_debug("per cpu data for cpu%d at %016lx\n",
-				 cpu, __pa(ptr));
-		} else {
-			ptr = alloc_bootmem_pages_node(NODE_DATA(node), size);
-			pr_debug("per cpu data for cpu%d on node%d at %016lx\n",
-				 cpu, node, __pa(ptr));
-		}
-#endif
-
-		memcpy(ptr, __per_cpu_load, __per_cpu_end - __per_cpu_start);
-		per_cpu_offset(cpu) = ptr - __per_cpu_start;
+		per_cpu_offset(cpu) = delta + cpu * pcpu_unit_size;
 		per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
 		per_cpu(cpu_number, cpu) = cpu;
 		setup_percpu_segment(cpu);
···
 	 */
 	if (cpu == boot_cpu_id)
 		switch_to_new_gdt(cpu);
-
-		DBG("PERCPU: cpu %4d %p\n", cpu, ptr);
 	}

 	/* indicate the early static arrays will soon be gone */
-78
arch/x86/kernel/smpboot.c
···
 atomic_t init_deasserted;

-
-/* Set if we find a B stepping CPU */
-static int __cpuinitdata smp_b_stepping;
-
 #if defined(CONFIG_NUMA) && defined(CONFIG_X86_32)

 /* which logical CPUs are on which nodes */
···
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
 }

-static int __cpuinitdata unsafe_smp;
-
 /*
  * Activate a secondary processor.
  */
···
 	cpu_idle();
 }

-static void __cpuinit smp_apply_quirks(struct cpuinfo_x86 *c)
-{
-	/*
-	 * Mask B, Pentium, but not Pentium MMX
-	 */
-	if (c->x86_vendor == X86_VENDOR_INTEL &&
-	    c->x86 == 5 &&
-	    c->x86_mask >= 1 && c->x86_mask <= 4 &&
-	    c->x86_model <= 3)
-		/*
-		 * Remember we have B step Pentia with bugs
-		 */
-		smp_b_stepping = 1;
-
-	/*
-	 * Certain Athlons might work (for various values of 'work') in SMP
-	 * but they are not certified as MP capable.
-	 */
-	if ((c->x86_vendor == X86_VENDOR_AMD) && (c->x86 == 6)) {
-
-		if (num_possible_cpus() == 1)
-			goto valid_k7;
-
-		/* Athlon 660/661 is valid. */
-		if ((c->x86_model == 6) && ((c->x86_mask == 0) ||
-		    (c->x86_mask == 1)))
-			goto valid_k7;
-
-		/* Duron 670 is valid */
-		if ((c->x86_model == 7) && (c->x86_mask == 0))
-			goto valid_k7;
-
-		/*
-		 * Athlon 662, Duron 671, and Athlon >model 7 have capability
-		 * bit. It's worth noting that the A5 stepping (662) of some
-		 * Athlon XP's have the MP bit set.
-		 * See http://www.heise.de/newsticker/data/jow-18.10.01-000 for
-		 * more.
-		 */
-		if (((c->x86_model == 6) && (c->x86_mask >= 2)) ||
-		    ((c->x86_model == 7) && (c->x86_mask >= 1)) ||
-		    (c->x86_model > 7))
-			if (cpu_has_mp)
-				goto valid_k7;
-
-		/* If we get here, not a certified SMP capable AMD system. */
-		unsafe_smp = 1;
-	}
-
-valid_k7:
-	;
-}
-
-static void __cpuinit smp_checks(void)
-{
-	if (smp_b_stepping)
-		printk(KERN_WARNING "WARNING: SMP operation may be unreliable"
-		       "with B stepping processors.\n");
-
-	/*
-	 * Don't taint if we are running SMP kernel on a single non-MP
-	 * approved Athlon
-	 */
-	if (unsafe_smp && num_online_cpus() > 1) {
-		printk(KERN_INFO "WARNING: This combination of AMD"
-		       "processors is not suitable for SMP.\n");
-		add_taint(TAINT_UNSAFE_SMP);
-	}
-}
-
 /*
  * The bootstrap kernel entry code has set these up. Save them for
  * a given CPU
···
 	c->cpu_index = id;
 	if (id != 0)
 		identify_secondary_cpu(c);
-	smp_apply_quirks(c);
 }

···
 	pr_debug("Boot done.\n");

 	impress_friends();
-	smp_checks();
 #ifdef CONFIG_X86_IO_APIC
 	setup_ioapic_dest();
 #endif
···
+/*
+ * SGI RTC clock/timer routines.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (c) 2009 Silicon Graphics, Inc.  All Rights Reserved.
+ * Copyright (c) Dimitri Sivanich
+ */
+#include <linux/clockchips.h>
+
+#include <asm/uv/uv_mmrs.h>
+#include <asm/uv/uv_hub.h>
+#include <asm/uv/bios.h>
+#include <asm/uv/uv.h>
+#include <asm/apic.h>
+#include <asm/cpu.h>
+
+#define RTC_NAME	"sgi_rtc"
+
+static cycle_t uv_read_rtc(void);
+static int uv_rtc_next_event(unsigned long, struct clock_event_device *);
+static void uv_rtc_timer_setup(enum clock_event_mode,
+			       struct clock_event_device *);
+
+static struct clocksource clocksource_uv = {
+	.name		= RTC_NAME,
+	.rating		= 400,
+	.read		= uv_read_rtc,
+	.mask		= (cycle_t)UVH_RTC_REAL_TIME_CLOCK_MASK,
+	.shift		= 10,
+	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static struct clock_event_device clock_event_device_uv = {
+	.name		= RTC_NAME,
+	.features	= CLOCK_EVT_FEAT_ONESHOT,
+	.shift		= 20,
+	.rating		= 400,
+	.irq		= -1,
+	.set_next_event	= uv_rtc_next_event,
+	.set_mode	= uv_rtc_timer_setup,
+	.event_handler	= NULL,
+};
+
+static DEFINE_PER_CPU(struct clock_event_device, cpu_ced);
+
+/* There is one of these allocated per node */
+struct uv_rtc_timer_head {
+	spinlock_t	lock;
+	/* next cpu waiting for timer, local node relative: */
+	int		next_cpu;
+	/* number of cpus on this node: */
+	int		ncpus;
+	struct {
+		int	lcpu;		/* systemwide logical cpu number */
+		u64	expires;	/* next timer expiration for this cpu */
+	} cpu[1];
+};
+
+/*
+ * Access to uv_rtc_timer_head via blade id.
+ */
+static struct uv_rtc_timer_head		**blade_info __read_mostly;
+
+static int				uv_rtc_enable;
+
+/*
+ * Hardware interface routines
+ */
+
+/* Send IPIs to another node */
+static void uv_rtc_send_IPI(int cpu)
+{
+	unsigned long apicid, val;
+	int pnode;
+
+	apicid = cpu_physical_id(cpu);
+	pnode = uv_apicid_to_pnode(apicid);
+	val = (1UL << UVH_IPI_INT_SEND_SHFT) |
+	      (apicid << UVH_IPI_INT_APIC_ID_SHFT) |
+	      (GENERIC_INTERRUPT_VECTOR << UVH_IPI_INT_VECTOR_SHFT);
+
+	uv_write_global_mmr64(pnode, UVH_IPI_INT, val);
+}
+
+/* Check for an RTC interrupt pending */
+static int uv_intr_pending(int pnode)
+{
+	return uv_read_global_mmr64(pnode, UVH_EVENT_OCCURRED0) &
+		UVH_EVENT_OCCURRED0_RTC1_MASK;
+}
+
+/* Setup interrupt and return non-zero if early expiration occurred. */
+static int uv_setup_intr(int cpu, u64 expires)
+{
+	u64 val;
+	int pnode = uv_cpu_to_pnode(cpu);
+
+	uv_write_global_mmr64(pnode, UVH_RTC1_INT_CONFIG,
+			      UVH_RTC1_INT_CONFIG_M_MASK);
+	uv_write_global_mmr64(pnode, UVH_INT_CMPB, -1L);
+
+	uv_write_global_mmr64(pnode, UVH_EVENT_OCCURRED0_ALIAS,
+			      UVH_EVENT_OCCURRED0_RTC1_MASK);
+
+	val = (GENERIC_INTERRUPT_VECTOR << UVH_RTC1_INT_CONFIG_VECTOR_SHFT) |
+		((u64)cpu_physical_id(cpu) << UVH_RTC1_INT_CONFIG_APIC_ID_SHFT);
+
+	/* Set configuration */
+	uv_write_global_mmr64(pnode, UVH_RTC1_INT_CONFIG, val);
+	/* Initialize comparator value */
+	uv_write_global_mmr64(pnode, UVH_INT_CMPB, expires);
+
+	return (expires < uv_read_rtc() && !uv_intr_pending(pnode));
+}
+
+/*
+ * Per-cpu timer tracking routines
+ */
+
+static __init void uv_rtc_deallocate_timers(void)
+{
+	int bid;
+
+	for_each_possible_blade(bid) {
+		kfree(blade_info[bid]);
+	}
+	kfree(blade_info);
+}
+
+/* Allocate per-node list of cpu timer expiration times. */
+static __init int uv_rtc_allocate_timers(void)
+{
+	int cpu;
+
+	blade_info = kmalloc(uv_possible_blades * sizeof(void *), GFP_KERNEL);
+	if (!blade_info)
+		return -ENOMEM;
+	memset(blade_info, 0, uv_possible_blades * sizeof(void *));
+
+	for_each_present_cpu(cpu) {
+		int nid = cpu_to_node(cpu);
+		int bid = uv_cpu_to_blade_id(cpu);
+		int bcpu = uv_cpu_hub_info(cpu)->blade_processor_id;
+		struct uv_rtc_timer_head *head = blade_info[bid];
+
+		if (!head) {
+			head = kmalloc_node(sizeof(struct uv_rtc_timer_head) +
+					    (uv_blade_nr_possible_cpus(bid) *
+					     2 * sizeof(u64)),
+					    GFP_KERNEL, nid);
+			if (!head) {
+				uv_rtc_deallocate_timers();
+				return -ENOMEM;
+			}
+			spin_lock_init(&head->lock);
+			head->ncpus = uv_blade_nr_possible_cpus(bid);
+			head->next_cpu = -1;
+			blade_info[bid] = head;
+		}
+
+		head->cpu[bcpu].lcpu = cpu;
+		head->cpu[bcpu].expires = ULLONG_MAX;
+	}
+
+	return 0;
+}
+
+/* Find and set the next expiring timer.  */
+static void uv_rtc_find_next_timer(struct uv_rtc_timer_head *head, int pnode)
+{
+	u64 lowest = ULLONG_MAX;
+	int c, bcpu = -1;
+
+	head->next_cpu = -1;
+	for (c = 0; c < head->ncpus; c++) {
+		u64 exp = head->cpu[c].expires;
+		if (exp < lowest) {
+			bcpu = c;
+			lowest = exp;
+		}
+	}
+	if (bcpu >= 0) {
+		head->next_cpu = bcpu;
+		c = head->cpu[bcpu].lcpu;
+		if (uv_setup_intr(c, lowest))
+			/* If we didn't set it up in time, trigger */
+			uv_rtc_send_IPI(c);
+	} else {
+		uv_write_global_mmr64(pnode, UVH_RTC1_INT_CONFIG,
+				      UVH_RTC1_INT_CONFIG_M_MASK);
+	}
+}
+
+/*
+ * Set expiration time for current cpu.
+ *
+ * Returns 1 if we missed the expiration time.
+ */
+static int uv_rtc_set_timer(int cpu, u64 expires)
+{
+	int pnode = uv_cpu_to_pnode(cpu);
+	int bid = uv_cpu_to_blade_id(cpu);
+	struct uv_rtc_timer_head *head = blade_info[bid];
+	int bcpu = uv_cpu_hub_info(cpu)->blade_processor_id;
+	u64 *t = &head->cpu[bcpu].expires;
+	unsigned long flags;
+	int next_cpu;
+
+	spin_lock_irqsave(&head->lock, flags);
+
+	next_cpu = head->next_cpu;
+	*t = expires;
+	/* Will this one be next to go off? */
+	if (next_cpu < 0 || bcpu == next_cpu ||
+	    expires < head->cpu[next_cpu].expires) {
+		head->next_cpu = bcpu;
+		if (uv_setup_intr(cpu, expires)) {
+			*t = ULLONG_MAX;
+			uv_rtc_find_next_timer(head, pnode);
+			spin_unlock_irqrestore(&head->lock, flags);
+			return 1;
+		}
+	}
+
+	spin_unlock_irqrestore(&head->lock, flags);
+	return 0;
+}
+
+/*
+ * Unset expiration time for current cpu.
+ *
+ * Returns 1 if this timer was pending.
+ */
+static int uv_rtc_unset_timer(int cpu)
+{
+	int pnode = uv_cpu_to_pnode(cpu);
+	int bid = uv_cpu_to_blade_id(cpu);
+	struct uv_rtc_timer_head *head = blade_info[bid];
+	int bcpu = uv_cpu_hub_info(cpu)->blade_processor_id;
+	u64 *t = &head->cpu[bcpu].expires;
+	unsigned long flags;
+	int rc = 0;
+
+	spin_lock_irqsave(&head->lock, flags);
+
+	if (head->next_cpu == bcpu && uv_read_rtc() >= *t)
+		rc = 1;
+
+	*t = ULLONG_MAX;
+
+	/* Was the hardware setup for this timer? */
+	if (head->next_cpu == bcpu)
+		uv_rtc_find_next_timer(head, pnode);
+
+	spin_unlock_irqrestore(&head->lock, flags);
+
+	return rc;
+}
+
+
+/*
+ * Kernel interface routines.
+ */
+
+/*
+ * Read the RTC.
+ */
+static cycle_t uv_read_rtc(void)
+{
+	return (cycle_t)uv_read_local_mmr(UVH_RTC);
+}
+
+/*
+ * Program the next event, relative to now
+ */
+static int uv_rtc_next_event(unsigned long delta,
+			     struct clock_event_device *ced)
+{
+	int ced_cpu = cpumask_first(ced->cpumask);
+
+	return uv_rtc_set_timer(ced_cpu, delta + uv_read_rtc());
+}
+
+/*
+ * Setup the RTC timer in oneshot mode
+ */
+static void uv_rtc_timer_setup(enum clock_event_mode mode,
+			       struct clock_event_device *evt)
+{
+	int ced_cpu = cpumask_first(evt->cpumask);
+
+	switch (mode) {
+	case CLOCK_EVT_MODE_PERIODIC:
+	case CLOCK_EVT_MODE_ONESHOT:
+	case CLOCK_EVT_MODE_RESUME:
+		/* Nothing to do here yet */
+		break;
+	case CLOCK_EVT_MODE_UNUSED:
+	case CLOCK_EVT_MODE_SHUTDOWN:
+		uv_rtc_unset_timer(ced_cpu);
+		break;
+	}
+}
+
+static void uv_rtc_interrupt(void)
+{
+	struct clock_event_device *ced = &__get_cpu_var(cpu_ced);
+	int cpu = smp_processor_id();
+
+	if (!ced || !ced->event_handler)
+		return;
+
+	if (uv_rtc_unset_timer(cpu) != 1)
+		return;
+
+	ced->event_handler(ced);
+}
+
+static int __init uv_enable_rtc(char *str)
+{
+	uv_rtc_enable = 1;
+
+	return 1;
+}
+__setup("uvrtc", uv_enable_rtc);
+
+static __init void uv_rtc_register_clockevents(struct work_struct *dummy)
+{
+	struct clock_event_device *ced = &__get_cpu_var(cpu_ced);
+
+	*ced = clock_event_device_uv;
+	ced->cpumask = cpumask_of(smp_processor_id());
+	clockevents_register_device(ced);
+}
+
+static __init int uv_rtc_setup_clock(void)
+{
+	int rc;
+
+	if (!uv_rtc_enable || !is_uv_system() || generic_interrupt_extension)
+		return -ENODEV;
+
+	generic_interrupt_extension = uv_rtc_interrupt;
+
+	clocksource_uv.mult = clocksource_hz2mult(sn_rtc_cycles_per_second,
+						  clocksource_uv.shift);
+
+	rc = clocksource_register(&clocksource_uv);
+	if (rc) {
+		generic_interrupt_extension = NULL;
+		return rc;
+	}
+
+	/* Setup and register clockevents */
+	rc = uv_rtc_allocate_timers();
+	if (rc) {
+		clocksource_unregister(&clocksource_uv);
+		generic_interrupt_extension = NULL;
+		return rc;
+	}
+
+	clock_event_device_uv.mult = div_sc(sn_rtc_cycles_per_second,
+					    NSEC_PER_SEC,
+					    clock_event_device_uv.shift);
+
+	clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
+						sn_rtc_cycles_per_second;
+
+	clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
+				(NSEC_PER_SEC / sn_rtc_cycles_per_second);
+
+	rc = schedule_on_each_cpu(uv_rtc_register_clockevents);
+	if (rc) {
+		clocksource_unregister(&clocksource_uv);
+		generic_interrupt_extension = NULL;
+		uv_rtc_deallocate_timers();
+	}
+
+	return rc;
+}
+arch_initcall(uv_rtc_setup_clock);
+7
arch/x86/kernel/vmlinux_64.lds.S
···
 ASSERT((per_cpu__irq_stack_union == 0),
         "irq_stack_union is not at start of per-cpu area");
 #endif
+
+#ifdef CONFIG_KEXEC
+#include <asm/kexec.h>
+
+ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
+       "kexec control code size is too big")
+#endif
+14-7
arch/x86/lguest/boot.c
···
 	 * flush_tlb_user() for both user and kernel mappings unless
 	 * the Page Global Enable (PGE) feature bit is set. */
 	*dx |= 0x00002000;
+	/* We also lie, and say we're family id 5.  6 or greater
+	 * leads to a rdmsr in early_init_intel which we can't handle.
+	 * Family ID is returned as bits 8-12 in ax. */
+	*ax &= 0xFFFFF0FF;
+	*ax |= 0x00000500;
 	break;
 case 0x80000000:
 	/* Futureproof this a little: if they ask how much extended
···
 	/* Some systems map "vectors" to interrupts weirdly. Lguest has
 	 * a straightforward 1 to 1 mapping, so force that here. */
 	__get_cpu_var(vector_irq)[vector] = i;
-	if (vector != SYSCALL_VECTOR) {
-		set_intr_gate(vector,
-			      interrupt[vector-FIRST_EXTERNAL_VECTOR]);
-		set_irq_chip_and_handler_name(i, &lguest_irq_controller,
-					      handle_level_irq,
-					      "level");
-	}
+	if (vector != SYSCALL_VECTOR)
+		set_intr_gate(vector, interrupt[i]);
 	}
 	/* This call is required to set up for 4k stacks, where we have
 	 * separate stacks for hard and soft interrupts. */
 	irq_ctx_init(smp_processor_id());
+}
+
+void lguest_setup_irq(unsigned int irq)
+{
+	irq_to_desc_alloc_cpu(irq, 0);
+	set_irq_chip_and_handler_name(irq, &lguest_irq_controller,
+				      handle_level_irq, "level");
 }

 /*
+20-11
arch/x86/math-emu/fpu_aux.c
···
 }

 /* Needs to be externally visible */
-void finit(void)
+void finit_task(struct task_struct *tsk)
 {
-	control_word = 0x037f;
-	partial_status = 0;
-	top = 0;	/* We don't keep top in the status word internally. */
-	fpu_tag_word = 0xffff;
+	struct i387_soft_struct *soft = &tsk->thread.xstate->soft;
+	struct address *oaddr, *iaddr;
+	soft->cwd = 0x037f;
+	soft->swd = 0;
+	soft->ftop = 0;	/* We don't keep top in the status word internally. */
+	soft->twd = 0xffff;
 	/* The behaviour is different from that detailed in
 	   Section 15.1.6 of the Intel manual */
-	operand_address.offset = 0;
-	operand_address.selector = 0;
-	instruction_address.offset = 0;
-	instruction_address.selector = 0;
-	instruction_address.opcode = 0;
-	no_ip_update = 1;
+	oaddr = (struct address *)&soft->foo;
+	oaddr->offset = 0;
+	oaddr->selector = 0;
+	iaddr = (struct address *)&soft->fip;
+	iaddr->offset = 0;
+	iaddr->selector = 0;
+	iaddr->opcode = 0;
+	soft->no_update = 1;
+}
+
+void finit(void)
+{
+	finit_task(current);
 }

 /*
···11+#include <linux/ioport.h>12#include <linux/swap.h>33+24#include <asm/cacheflush.h>55+#include <asm/e820.h>66+#include <asm/init.h>37#include <asm/page.h>88+#include <asm/page_types.h>49#include <asm/sections.h>510#include <asm/system.h>1111+#include <asm/tlbflush.h>1212+1313+unsigned long __initdata e820_table_start;1414+unsigned long __meminitdata e820_table_end;1515+unsigned long __meminitdata e820_table_top;1616+1717+int after_bootmem;1818+1919+int direct_gbpages2020+#ifdef CONFIG_DIRECT_GBPAGES2121+ = 12222+#endif2323+;2424+2525+static void __init find_early_table_space(unsigned long end, int use_pse,2626+ int use_gbpages)2727+{2828+ unsigned long puds, pmds, ptes, tables, start;2929+3030+ puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;3131+ tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);3232+3333+ if (use_gbpages) {3434+ unsigned long extra;3535+3636+ extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);3737+ pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;3838+ } else3939+ pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;4040+4141+ tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);4242+4343+ if (use_pse) {4444+ unsigned long extra;4545+4646+ extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);4747+#ifdef CONFIG_X86_324848+ extra += PMD_SIZE;4949+#endif5050+ ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;5151+ } else5252+ ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;5353+5454+ tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);5555+5656+#ifdef CONFIG_X86_325757+ /* for fixmap */5858+ tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);5959+#endif6060+6161+ /*6262+ * RED-PEN putting page tables only on node 0 could6363+ * cause a hotspot and fill up ZONE_DMA. 
The page tables6464+ * need roughly 0.5KB per GB.6565+ */6666+#ifdef CONFIG_X86_326767+ start = 0x7000;6868+ e820_table_start = find_e820_area(start, max_pfn_mapped<<PAGE_SHIFT,6969+ tables, PAGE_SIZE);7070+#else /* CONFIG_X86_64 */7171+ start = 0x8000;7272+ e820_table_start = find_e820_area(start, end, tables, PAGE_SIZE);7373+#endif7474+ if (e820_table_start == -1UL)7575+ panic("Cannot find space for the kernel page tables");7676+7777+ e820_table_start >>= PAGE_SHIFT;7878+ e820_table_end = e820_table_start;7979+ e820_table_top = e820_table_start + (tables >> PAGE_SHIFT);8080+8181+ printk(KERN_DEBUG "kernel direct mapping tables up to %lx @ %lx-%lx\n",8282+ end, e820_table_start << PAGE_SHIFT, e820_table_top << PAGE_SHIFT);8383+}8484+8585+struct map_range {8686+ unsigned long start;8787+ unsigned long end;8888+ unsigned page_size_mask;8989+};9090+9191+#ifdef CONFIG_X86_329292+#define NR_RANGE_MR 39393+#else /* CONFIG_X86_64 */9494+#define NR_RANGE_MR 59595+#endif9696+9797+static int save_mr(struct map_range *mr, int nr_range,9898+ unsigned long start_pfn, unsigned long end_pfn,9999+ unsigned long page_size_mask)100100+{101101+ if (start_pfn < end_pfn) {102102+ if (nr_range >= NR_RANGE_MR)103103+ panic("run out of range for init_memory_mapping\n");104104+ mr[nr_range].start = start_pfn<<PAGE_SHIFT;105105+ mr[nr_range].end = end_pfn<<PAGE_SHIFT;106106+ mr[nr_range].page_size_mask = page_size_mask;107107+ nr_range++;108108+ }109109+110110+ return nr_range;111111+}112112+113113+#ifdef CONFIG_X86_64114114+static void __init init_gbpages(void)115115+{116116+ if (direct_gbpages && cpu_has_gbpages)117117+ printk(KERN_INFO "Using GB pages for direct mapping\n");118118+ else119119+ direct_gbpages = 0;120120+}121121+#else122122+static inline void init_gbpages(void)123123+{124124+}125125+#endif126126+127127+/*128128+ * Setup the direct mapping of the physical memory at PAGE_OFFSET.129129+ * This runs before bootmem is initialized and gets pages directly from130130+ * the 
physical memory. To access them they are temporarily mapped.131131+ */132132+unsigned long __init_refok init_memory_mapping(unsigned long start,133133+ unsigned long end)134134+{135135+ unsigned long page_size_mask = 0;136136+ unsigned long start_pfn, end_pfn;137137+ unsigned long ret = 0;138138+ unsigned long pos;139139+140140+ struct map_range mr[NR_RANGE_MR];141141+ int nr_range, i;142142+ int use_pse, use_gbpages;143143+144144+ printk(KERN_INFO "init_memory_mapping: %016lx-%016lx\n", start, end);145145+146146+ if (!after_bootmem)147147+ init_gbpages();148148+149149+#ifdef CONFIG_DEBUG_PAGEALLOC150150+ /*151151+ * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.152152+ * This will simplify cpa(), which otherwise needs to support splitting153153+ * large pages into small in interrupt context, etc.154154+ */155155+ use_pse = use_gbpages = 0;156156+#else157157+ use_pse = cpu_has_pse;158158+ use_gbpages = direct_gbpages;159159+#endif160160+161161+#ifdef CONFIG_X86_32162162+#ifdef CONFIG_X86_PAE163163+ set_nx();164164+ if (nx_enabled)165165+ printk(KERN_INFO "NX (Execute Disable) protection: active\n");166166+#endif167167+168168+ /* Enable PSE if available */169169+ if (cpu_has_pse)170170+ set_in_cr4(X86_CR4_PSE);171171+172172+ /* Enable PGE if available */173173+ if (cpu_has_pge) {174174+ set_in_cr4(X86_CR4_PGE);175175+ __supported_pte_mask |= _PAGE_GLOBAL;176176+ }177177+#endif178178+179179+ if (use_gbpages)180180+ page_size_mask |= 1 << PG_LEVEL_1G;181181+ if (use_pse)182182+ page_size_mask |= 1 << PG_LEVEL_2M;183183+184184+ memset(mr, 0, sizeof(mr));185185+ nr_range = 0;186186+187187+ /* head if not big page alignment ? 
*/188188+ start_pfn = start >> PAGE_SHIFT;189189+ pos = start_pfn << PAGE_SHIFT;190190+#ifdef CONFIG_X86_32191191+ /*192192+ * Don't use a large page for the first 2/4MB of memory193193+ * because there are often fixed size MTRRs in there194194+ * and overlapping MTRRs into large pages can cause195195+ * slowdowns.196196+ */197197+ if (pos == 0)198198+ end_pfn = 1<<(PMD_SHIFT - PAGE_SHIFT);199199+ else200200+ end_pfn = ((pos + (PMD_SIZE - 1))>>PMD_SHIFT)201201+ << (PMD_SHIFT - PAGE_SHIFT);202202+#else /* CONFIG_X86_64 */203203+ end_pfn = ((pos + (PMD_SIZE - 1)) >> PMD_SHIFT)204204+ << (PMD_SHIFT - PAGE_SHIFT);205205+#endif206206+ if (end_pfn > (end >> PAGE_SHIFT))207207+ end_pfn = end >> PAGE_SHIFT;208208+ if (start_pfn < end_pfn) {209209+ nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);210210+ pos = end_pfn << PAGE_SHIFT;211211+ }212212+213213+ /* big page (2M) range */214214+ start_pfn = ((pos + (PMD_SIZE - 1))>>PMD_SHIFT)215215+ << (PMD_SHIFT - PAGE_SHIFT);216216+#ifdef CONFIG_X86_32217217+ end_pfn = (end>>PMD_SHIFT) << (PMD_SHIFT - PAGE_SHIFT);218218+#else /* CONFIG_X86_64 */219219+ end_pfn = ((pos + (PUD_SIZE - 1))>>PUD_SHIFT)220220+ << (PUD_SHIFT - PAGE_SHIFT);221221+ if (end_pfn > ((end>>PMD_SHIFT)<<(PMD_SHIFT - PAGE_SHIFT)))222222+ end_pfn = ((end>>PMD_SHIFT)<<(PMD_SHIFT - PAGE_SHIFT));223223+#endif224224+225225+ if (start_pfn < end_pfn) {226226+ nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,227227+ page_size_mask & (1<<PG_LEVEL_2M));228228+ pos = end_pfn << PAGE_SHIFT;229229+ }230230+231231+#ifdef CONFIG_X86_64232232+ /* big page (1G) range */233233+ start_pfn = ((pos + (PUD_SIZE - 1))>>PUD_SHIFT)234234+ << (PUD_SHIFT - PAGE_SHIFT);235235+ end_pfn = (end >> PUD_SHIFT) << (PUD_SHIFT - PAGE_SHIFT);236236+ if (start_pfn < end_pfn) {237237+ nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,238238+ page_size_mask &239239+ ((1<<PG_LEVEL_2M)|(1<<PG_LEVEL_1G)));240240+ pos = end_pfn << PAGE_SHIFT;241241+ }242242+243243+ /* tail is not big page (1G) 
alignment */244244+ start_pfn = ((pos + (PMD_SIZE - 1))>>PMD_SHIFT)245245+ << (PMD_SHIFT - PAGE_SHIFT);246246+ end_pfn = (end >> PMD_SHIFT) << (PMD_SHIFT - PAGE_SHIFT);247247+ if (start_pfn < end_pfn) {248248+ nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,249249+ page_size_mask & (1<<PG_LEVEL_2M));250250+ pos = end_pfn << PAGE_SHIFT;251251+ }252252+#endif253253+254254+ /* tail is not big page (2M) alignment */255255+ start_pfn = pos>>PAGE_SHIFT;256256+ end_pfn = end>>PAGE_SHIFT;257257+ nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);258258+259259+ /* try to merge same page size and continuous */260260+ for (i = 0; nr_range > 1 && i < nr_range - 1; i++) {261261+ unsigned long old_start;262262+ if (mr[i].end != mr[i+1].start ||263263+ mr[i].page_size_mask != mr[i+1].page_size_mask)264264+ continue;265265+ /* move it */266266+ old_start = mr[i].start;267267+ memmove(&mr[i], &mr[i+1],268268+ (nr_range - 1 - i) * sizeof(struct map_range));269269+ mr[i--].start = old_start;270270+ nr_range--;271271+ }272272+273273+ for (i = 0; i < nr_range; i++)274274+ printk(KERN_DEBUG " %010lx - %010lx page %s\n",275275+ mr[i].start, mr[i].end,276276+ (mr[i].page_size_mask & (1<<PG_LEVEL_1G))?"1G":(277277+ (mr[i].page_size_mask & (1<<PG_LEVEL_2M))?"2M":"4k"));278278+279279+ /*280280+ * Find space for the kernel direct mapping tables.281281+ *282282+ * Later we should allocate these tables in the local node of the283283+ * memory mapped. 
Unfortunately this is done currently before the284284+ * nodes are discovered.285285+ */286286+ if (!after_bootmem)287287+ find_early_table_space(end, use_pse, use_gbpages);288288+289289+#ifdef CONFIG_X86_32290290+ for (i = 0; i < nr_range; i++)291291+ kernel_physical_mapping_init(mr[i].start, mr[i].end,292292+ mr[i].page_size_mask);293293+ ret = end;294294+#else /* CONFIG_X86_64 */295295+ for (i = 0; i < nr_range; i++)296296+ ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,297297+ mr[i].page_size_mask);298298+#endif299299+300300+#ifdef CONFIG_X86_32301301+ early_ioremap_page_table_range_init();302302+303303+ load_cr3(swapper_pg_dir);304304+#endif305305+306306+#ifdef CONFIG_X86_64307307+ if (!after_bootmem)308308+ mmu_cr4_features = read_cr4();309309+#endif310310+ __flush_tlb_all();311311+312312+ if (!after_bootmem && e820_table_end > e820_table_start)313313+ reserve_early(e820_table_start << PAGE_SHIFT,314314+ e820_table_end << PAGE_SHIFT, "PGTABLE");315315+316316+ if (!after_bootmem)317317+ early_memtest(start, end);318318+319319+ return ret >> PAGE_SHIFT;320320+}321321+322322+323323+/*324324+ * devmem_is_allowed() checks to see if /dev/mem access to a certain address325325+ * is valid. 
The argument is a physical page number.326326+ *327327+ *328328+ * On x86, access has to be given to the first megabyte of ram because that area329329+ * contains bios code and data regions used by X and dosemu and similar apps.330330+ * Access has to be given to non-kernel-ram areas as well, these contain the PCI331331+ * mmio resources as well as potential bios/acpi data regions.332332+ */333333+int devmem_is_allowed(unsigned long pagenr)334334+{335335+ if (pagenr <= 256)336336+ return 1;337337+ if (iomem_is_exclusive(pagenr << PAGE_SHIFT))338338+ return 0;339339+ if (!page_is_ram(pagenr))340340+ return 1;341341+ return 0;342342+}63437344void free_init_pages(char *what, unsigned long begin, unsigned long end)8345{···38447 (unsigned long)(&__init_begin),38548 (unsigned long)(&__init_end));38649}5050+5151+#ifdef CONFIG_BLK_DEV_INITRD5252+void free_initrd_mem(unsigned long start, unsigned long end)5353+{5454+ free_init_pages("initrd memory", start, end);5555+}5656+#endif
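The "try to merge same page size and continuous" pass in the unified init_memory_mapping() above can be exercised in isolation. The sketch below is a standalone toy version of that loop (the struct and function names here are illustrative, not the kernel's): adjacent ranges that touch and share a page_size_mask are folded into one entry, keeping the earlier start.

```c
#include <assert.h>
#include <string.h>

struct map_range {
	unsigned long start;
	unsigned long end;
	unsigned page_size_mask;
};

/* Fold touching ranges with identical page size masks; returns the
 * new range count. Mirrors the merge loop in init_memory_mapping(). */
static int merge_ranges(struct map_range *mr, int nr_range)
{
	int i;

	for (i = 0; nr_range > 1 && i < nr_range - 1; i++) {
		unsigned long old_start;

		if (mr[i].end != mr[i + 1].start ||
		    mr[i].page_size_mask != mr[i + 1].page_size_mask)
			continue;
		/* absorb mr[i] into mr[i+1], keeping the earlier start */
		old_start = mr[i].start;
		memmove(&mr[i], &mr[i + 1],
			(nr_range - 1 - i) * sizeof(struct map_range));
		mr[i--].start = old_start;
		nr_range--;
	}
	return nr_range;
}
```

With three ranges where the first two touch and share a mask, the pass collapses them into two entries and leaves the differently-masked tail alone.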
+79 -194
arch/x86/mm/init_32.c
···4949#include <asm/paravirt.h>5050#include <asm/setup.h>5151#include <asm/cacheflush.h>5252+#include <asm/init.h>52535354unsigned long max_low_pfn_mapped;5455unsigned long max_pfn_mapped;···59586059static noinline int do_test_wp_bit(void);61606262-6363-static unsigned long __initdata table_start;6464-static unsigned long __meminitdata table_end;6565-static unsigned long __meminitdata table_top;6666-6767-static int __initdata after_init_bootmem;6161+bool __read_mostly __vmalloc_start_set = false;68626963static __init void *alloc_low_page(void)7064{7171- unsigned long pfn = table_end++;6565+ unsigned long pfn = e820_table_end++;7266 void *adr;73677474- if (pfn >= table_top)6868+ if (pfn >= e820_table_top)7569 panic("alloc_low_page: ran out of memory");76707771 adr = __va(pfn * PAGE_SIZE);···86908791#ifdef CONFIG_X86_PAE8892 if (!(pgd_val(*pgd) & _PAGE_PRESENT)) {8989- if (after_init_bootmem)9393+ if (after_bootmem)9094 pmd_table = (pmd_t *)alloc_bootmem_low_pages(PAGE_SIZE);9195 else9296 pmd_table = (pmd_t *)alloc_low_page();···113117 if (!(pmd_val(*pmd) & _PAGE_PRESENT)) {114118 pte_t *page_table = NULL;115119116116- if (after_init_bootmem) {120120+ if (after_bootmem) {117121#ifdef CONFIG_DEBUG_PAGEALLOC118122 page_table = (pte_t *) alloc_bootmem_pages(PAGE_SIZE);119123#endif···129133 }130134131135 return pte_offset_kernel(pmd, 0);136136+}137137+138138+pmd_t * __init populate_extra_pmd(unsigned long vaddr)139139+{140140+ int pgd_idx = pgd_index(vaddr);141141+ int pmd_idx = pmd_index(vaddr);142142+143143+ return one_md_table_init(swapper_pg_dir + pgd_idx) + pmd_idx;144144+}145145+146146+pte_t * __init populate_extra_pte(unsigned long vaddr)147147+{148148+ int pte_idx = pte_index(vaddr);149149+ pmd_t *pmd;150150+151151+ pmd = populate_extra_pmd(vaddr);152152+ return one_page_table_init(pmd) + pte_idx;132153}133154134155static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd,···164151 if (pmd_idx_kmap_begin != pmd_idx_kmap_end165152 && (vaddr >> PMD_SHIFT) 
>= pmd_idx_kmap_begin166153 && (vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end167167- && ((__pa(pte) >> PAGE_SHIFT) < table_start168168- || (__pa(pte) >> PAGE_SHIFT) >= table_end)) {154154+ && ((__pa(pte) >> PAGE_SHIFT) < e820_table_start155155+ || (__pa(pte) >> PAGE_SHIFT) >= e820_table_end)) {169156 pte_t *newpte;170157 int i;171158172172- BUG_ON(after_init_bootmem);159159+ BUG_ON(after_bootmem);173160 newpte = alloc_low_page();174161 for (i = 0; i < PTRS_PER_PTE; i++)175162 set_pte(newpte + i, pte[i]);···238225 * of max_low_pfn pages, by creating page tables starting from address239226 * PAGE_OFFSET:240227 */241241-static void __init kernel_physical_mapping_init(pgd_t *pgd_base,242242- unsigned long start_pfn,243243- unsigned long end_pfn,244244- int use_pse)228228+unsigned long __init229229+kernel_physical_mapping_init(unsigned long start,230230+ unsigned long end,231231+ unsigned long page_size_mask)245232{233233+ int use_pse = page_size_mask == (1<<PG_LEVEL_2M);234234+ unsigned long start_pfn, end_pfn;235235+ pgd_t *pgd_base = swapper_pg_dir;246236 int pgd_idx, pmd_idx, pte_ofs;247237 unsigned long pfn;248238 pgd_t *pgd;···253237 pte_t *pte;254238 unsigned pages_2m, pages_4k;255239 int mapping_iter;240240+241241+ start_pfn = start >> PAGE_SHIFT;242242+ end_pfn = end >> PAGE_SHIFT;256243257244 /*258245 * First iteration will setup identity mapping using large/small pages···371352 mapping_iter = 2;372353 goto repeat;373354 }374374-}375375-376376-/*377377- * devmem_is_allowed() checks to see if /dev/mem access to a certain address378378- * is valid. 
The argument is a physical page number.379379- *380380- *381381- * On x86, access has to be given to the first megabyte of ram because that area382382- * contains bios code and data regions used by X and dosemu and similar apps.383383- * Access has to be given to non-kernel-ram areas as well, these contain the PCI384384- * mmio resources as well as potential bios/acpi data regions.385385- */386386-int devmem_is_allowed(unsigned long pagenr)387387-{388388- if (pagenr <= 256)389389- return 1;390390- if (iomem_is_exclusive(pagenr << PAGE_SHIFT))391391- return 0;392392- if (!page_is_ram(pagenr))393393- return 1;394355 return 0;395356}396357···527528 * be partially populated, and so it avoids stomping on any existing528529 * mappings.529530 */530530-static void __init early_ioremap_page_table_range_init(pgd_t *pgd_base)531531+void __init early_ioremap_page_table_range_init(void)531532{533533+ pgd_t *pgd_base = swapper_pg_dir;532534 unsigned long vaddr, end;533535534536 /*···624624}625625early_param("noexec", noexec_setup);626626627627-static void __init set_nx(void)627627+void __init set_nx(void)628628{629629 unsigned int v[4], l, h;630630···776776#ifdef CONFIG_FLATMEM777777 max_mapnr = num_physpages;778778#endif779779+ __vmalloc_start_set = true;780780+779781 printk(KERN_NOTICE "%ldMB LOWMEM available.\n",780782 pages_to_mb(max_low_pfn));781783···799797 free_area_init_nodes(max_zone_pfns);800798}801799800800+static unsigned long __init setup_node_bootmem(int nodeid,801801+ unsigned long start_pfn,802802+ unsigned long end_pfn,803803+ unsigned long bootmap)804804+{805805+ unsigned long bootmap_size;806806+807807+ /* don't touch min_low_pfn */808808+ bootmap_size = init_bootmem_node(NODE_DATA(nodeid),809809+ bootmap >> PAGE_SHIFT,810810+ start_pfn, end_pfn);811811+ printk(KERN_INFO " node %d low ram: %08lx - %08lx\n",812812+ nodeid, start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT);813813+ printk(KERN_INFO " node %d bootmap %08lx - %08lx\n",814814+ nodeid, bootmap, bootmap + 
bootmap_size);815815+ free_bootmem_with_active_regions(nodeid, end_pfn);816816+ early_res_to_bootmem(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT);817817+818818+ return bootmap + bootmap_size;819819+}820820+802821void __init setup_bootmem_allocator(void)803822{804804- int i;823823+ int nodeid;805824 unsigned long bootmap_size, bootmap;806825 /*807826 * Initialize the boot-time allocator (with low memory only):808827 */809828 bootmap_size = bootmem_bootmap_pages(max_low_pfn)<<PAGE_SHIFT;810810- bootmap = find_e820_area(min_low_pfn<<PAGE_SHIFT,811811- max_pfn_mapped<<PAGE_SHIFT, bootmap_size,829829+ bootmap = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,812830 PAGE_SIZE);813831 if (bootmap == -1L)814832 panic("Cannot find bootmem map of size %ld\n", bootmap_size);815833 reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");816834817817- /* don't touch min_low_pfn */818818- bootmap_size = init_bootmem_node(NODE_DATA(0), bootmap >> PAGE_SHIFT,819819- min_low_pfn, max_low_pfn);820835 printk(KERN_INFO " mapped low ram: 0 - %08lx\n",821836 max_pfn_mapped<<PAGE_SHIFT);822822- printk(KERN_INFO " low ram: %08lx - %08lx\n",823823- min_low_pfn<<PAGE_SHIFT, max_low_pfn<<PAGE_SHIFT);824824- printk(KERN_INFO " bootmap %08lx - %08lx\n",825825- bootmap, bootmap + bootmap_size);826826- for_each_online_node(i)827827- free_bootmem_with_active_regions(i, max_low_pfn);828828- early_res_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);837837+ printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);829838830830- after_init_bootmem = 1;831831-}839839+ for_each_online_node(nodeid) {840840+ unsigned long start_pfn, end_pfn;832841833833-static void __init find_early_table_space(unsigned long end, int use_pse)834834-{835835- unsigned long puds, pmds, ptes, tables, start;836836-837837- puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;838838- tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);839839-840840- pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;841841- tables += roundup(pmds * 
sizeof(pmd_t), PAGE_SIZE);842842-843843- if (use_pse) {844844- unsigned long extra;845845-846846- extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);847847- extra += PMD_SIZE;848848- ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;849849- } else850850- ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;851851-852852- tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);853853-854854- /* for fixmap */855855- tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);856856-857857- /*858858- * RED-PEN putting page tables only on node 0 could859859- * cause a hotspot and fill up ZONE_DMA. The page tables860860- * need roughly 0.5KB per GB.861861- */862862- start = 0x7000;863863- table_start = find_e820_area(start, max_pfn_mapped<<PAGE_SHIFT,864864- tables, PAGE_SIZE);865865- if (table_start == -1UL)866866- panic("Cannot find space for the kernel page tables");867867-868868- table_start >>= PAGE_SHIFT;869869- table_end = table_start;870870- table_top = table_start + (tables>>PAGE_SHIFT);871871-872872- printk(KERN_DEBUG "kernel direct mapping tables up to %lx @ %lx-%lx\n",873873- end, table_start << PAGE_SHIFT,874874- (table_start << PAGE_SHIFT) + tables);875875-}876876-877877-unsigned long __init_refok init_memory_mapping(unsigned long start,878878- unsigned long end)879879-{880880- pgd_t *pgd_base = swapper_pg_dir;881881- unsigned long start_pfn, end_pfn;882882- unsigned long big_page_start;883883-#ifdef CONFIG_DEBUG_PAGEALLOC884884- /*885885- * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.886886- * This will simplify cpa(), which otherwise needs to support splitting887887- * large pages into small in interrupt context, etc.888888- */889889- int use_pse = 0;842842+#ifdef CONFIG_NEED_MULTIPLE_NODES843843+ start_pfn = node_start_pfn[nodeid];844844+ end_pfn = node_end_pfn[nodeid];845845+ if (start_pfn > max_low_pfn)846846+ continue;847847+ if (end_pfn > max_low_pfn)848848+ end_pfn = max_low_pfn;890849#else891891- int use_pse = cpu_has_pse;850850+ start_pfn 
= 0;851851+ end_pfn = max_low_pfn;892852#endif893893-894894- /*895895- * Find space for the kernel direct mapping tables.896896- */897897- if (!after_init_bootmem)898898- find_early_table_space(end, use_pse);899899-900900-#ifdef CONFIG_X86_PAE901901- set_nx();902902- if (nx_enabled)903903- printk(KERN_INFO "NX (Execute Disable) protection: active\n");904904-#endif905905-906906- /* Enable PSE if available */907907- if (cpu_has_pse)908908- set_in_cr4(X86_CR4_PSE);909909-910910- /* Enable PGE if available */911911- if (cpu_has_pge) {912912- set_in_cr4(X86_CR4_PGE);913913- __supported_pte_mask |= _PAGE_GLOBAL;853853+ bootmap = setup_node_bootmem(nodeid, start_pfn, end_pfn,854854+ bootmap);914855 }915856916916- /*917917- * Don't use a large page for the first 2/4MB of memory918918- * because there are often fixed size MTRRs in there919919- * and overlapping MTRRs into large pages can cause920920- * slowdowns.921921- */922922- big_page_start = PMD_SIZE;923923-924924- if (start < big_page_start) {925925- start_pfn = start >> PAGE_SHIFT;926926- end_pfn = min(big_page_start>>PAGE_SHIFT, end>>PAGE_SHIFT);927927- } else {928928- /* head is not big page alignment ? */929929- start_pfn = start >> PAGE_SHIFT;930930- end_pfn = ((start + (PMD_SIZE - 1))>>PMD_SHIFT)931931- << (PMD_SHIFT - PAGE_SHIFT);932932- }933933- if (start_pfn < end_pfn)934934- kernel_physical_mapping_init(pgd_base, start_pfn, end_pfn, 0);935935-936936- /* big page range */937937- start_pfn = ((start + (PMD_SIZE - 1))>>PMD_SHIFT)938938- << (PMD_SHIFT - PAGE_SHIFT);939939- if (start_pfn < (big_page_start >> PAGE_SHIFT))940940- start_pfn = big_page_start >> PAGE_SHIFT;941941- end_pfn = (end>>PMD_SHIFT) << (PMD_SHIFT - PAGE_SHIFT);942942- if (start_pfn < end_pfn)943943- kernel_physical_mapping_init(pgd_base, start_pfn, end_pfn,944944- use_pse);945945-946946- /* tail is not big page alignment ? 
*/947947- start_pfn = end_pfn;948948- if (start_pfn > (big_page_start>>PAGE_SHIFT)) {949949- end_pfn = end >> PAGE_SHIFT;950950- if (start_pfn < end_pfn)951951- kernel_physical_mapping_init(pgd_base, start_pfn,952952- end_pfn, 0);953953- }954954-955955- early_ioremap_page_table_range_init(pgd_base);956956-957957- load_cr3(swapper_pg_dir);958958-959959- __flush_tlb_all();960960-961961- if (!after_init_bootmem)962962- reserve_early(table_start << PAGE_SHIFT,963963- table_end << PAGE_SHIFT, "PGTABLE");964964-965965- if (!after_init_bootmem)966966- early_memtest(start, end);967967-968968- return end >> PAGE_SHIFT;857857+ after_bootmem = 1;969858}970970-971859972860/*973861 * paging_init() sets up the page tables - note that the first 8MB are···10891197 printk(KERN_INFO "Testing CPA: write protecting again\n");10901198 set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);10911199#endif10921092-}10931093-#endif10941094-10951095-#ifdef CONFIG_BLK_DEV_INITRD10961096-void free_initrd_mem(unsigned long start, unsigned long end)10971097-{10981098- free_init_pages("initrd memory", start, end);10991200}11001201#endif11011202
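The per-node loop added to setup_bootmem_allocator() clamps each node's pfn span to lowmem before handing it to setup_node_bootmem(). A minimal sketch of that clamping decision, with an invented helper name, might look like:

```c
#include <assert.h>

/* Toy clamp of a node's pfn span to the lowmem boundary. Returns 0
 * when the node lies entirely above max_low_pfn and must be skipped,
 * as the CONFIG_NEED_MULTIPLE_NODES branch does with `continue`. */
static int clamp_node_to_lowmem(unsigned long *start_pfn,
				unsigned long *end_pfn,
				unsigned long max_low_pfn)
{
	if (*start_pfn > max_low_pfn)
		return 0;		/* whole node is highmem */
	if (*end_pfn > max_low_pfn)
		*end_pfn = max_low_pfn;	/* trim the highmem tail */
	return 1;
}
```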
+68 -292
arch/x86/mm/init_64.c
···4848#include <asm/kdebug.h>4949#include <asm/numa.h>5050#include <asm/cacheflush.h>5151+#include <asm/init.h>51525253/*5354 * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.···6160static unsigned long dma_reserve __initdata;62616362DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);6464-6565-int direct_gbpages6666-#ifdef CONFIG_DIRECT_GBPAGES6767- = 16868-#endif6969-;70637164static int __init parse_direct_gbpages_off(char *arg)7265{···8287 * around without checking the pgd every time.8388 */84898585-int after_bootmem;8686-8790pteval_t __supported_pte_mask __read_mostly = ~_PAGE_IOMAP;8891EXPORT_SYMBOL_GPL(__supported_pte_mask);89929090-static int do_not_nx __cpuinitdata;9393+static int disable_nx __cpuinitdata;91949295/*9396 * noexec=on|off···100107 return -EINVAL;101108 if (!strncmp(str, "on", 2)) {102109 __supported_pte_mask |= _PAGE_NX;103103- do_not_nx = 0;110110+ disable_nx = 0;104111 } else if (!strncmp(str, "off", 3)) {105105- do_not_nx = 1;112112+ disable_nx = 1;106113 __supported_pte_mask &= ~_PAGE_NX;107114 }108115 return 0;···114121 unsigned long efer;115122116123 rdmsrl(MSR_EFER, efer);117117- if (!(efer & EFER_NX) || do_not_nx)124124+ if (!(efer & EFER_NX) || disable_nx)118125 __supported_pte_mask &= ~_PAGE_NX;119126}120127···161168 return ptr;162169}163170164164-void165165-set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte)171171+static pud_t *fill_pud(pgd_t *pgd, unsigned long vaddr)172172+{173173+ if (pgd_none(*pgd)) {174174+ pud_t *pud = (pud_t *)spp_getpage();175175+ pgd_populate(&init_mm, pgd, pud);176176+ if (pud != pud_offset(pgd, 0))177177+ printk(KERN_ERR "PAGETABLE BUG #00! 
%p <-> %p\n",178178+ pud, pud_offset(pgd, 0));179179+ }180180+ return pud_offset(pgd, vaddr);181181+}182182+183183+static pmd_t *fill_pmd(pud_t *pud, unsigned long vaddr)184184+{185185+ if (pud_none(*pud)) {186186+ pmd_t *pmd = (pmd_t *) spp_getpage();187187+ pud_populate(&init_mm, pud, pmd);188188+ if (pmd != pmd_offset(pud, 0))189189+ printk(KERN_ERR "PAGETABLE BUG #01! %p <-> %p\n",190190+ pmd, pmd_offset(pud, 0));191191+ }192192+ return pmd_offset(pud, vaddr);193193+}194194+195195+static pte_t *fill_pte(pmd_t *pmd, unsigned long vaddr)196196+{197197+ if (pmd_none(*pmd)) {198198+ pte_t *pte = (pte_t *) spp_getpage();199199+ pmd_populate_kernel(&init_mm, pmd, pte);200200+ if (pte != pte_offset_kernel(pmd, 0))201201+ printk(KERN_ERR "PAGETABLE BUG #02!\n");202202+ }203203+ return pte_offset_kernel(pmd, vaddr);204204+}205205+206206+void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte)166207{167208 pud_t *pud;168209 pmd_t *pmd;169210 pte_t *pte;170211171212 pud = pud_page + pud_index(vaddr);172172- if (pud_none(*pud)) {173173- pmd = (pmd_t *) spp_getpage();174174- pud_populate(&init_mm, pud, pmd);175175- if (pmd != pmd_offset(pud, 0)) {176176- printk(KERN_ERR "PAGETABLE BUG #01! 
%p <-> %p\n",177177- pmd, pmd_offset(pud, 0));178178- return;179179- }180180- }181181- pmd = pmd_offset(pud, vaddr);182182- if (pmd_none(*pmd)) {183183- pte = (pte_t *) spp_getpage();184184- pmd_populate_kernel(&init_mm, pmd, pte);185185- if (pte != pte_offset_kernel(pmd, 0)) {186186- printk(KERN_ERR "PAGETABLE BUG #02!\n");187187- return;188188- }189189- }213213+ pmd = fill_pmd(pud, vaddr);214214+ pte = fill_pte(pmd, vaddr);190215191191- pte = pte_offset_kernel(pmd, vaddr);192216 set_pte(pte, new_pte);193217194218 /*···215205 __flush_tlb_one(vaddr);216206}217207218218-void219219-set_pte_vaddr(unsigned long vaddr, pte_t pteval)208208+void set_pte_vaddr(unsigned long vaddr, pte_t pteval)220209{221210 pgd_t *pgd;222211 pud_t *pud_page;···230221 }231222 pud_page = (pud_t*)pgd_page_vaddr(*pgd);232223 set_pte_vaddr_pud(pud_page, vaddr, pteval);224224+}225225+226226+pmd_t * __init populate_extra_pmd(unsigned long vaddr)227227+{228228+ pgd_t *pgd;229229+ pud_t *pud;230230+231231+ pgd = pgd_offset_k(vaddr);232232+ pud = fill_pud(pgd, vaddr);233233+ return fill_pmd(pud, vaddr);234234+}235235+236236+pte_t * __init populate_extra_pte(unsigned long vaddr)237237+{238238+ pmd_t *pmd;239239+240240+ pmd = populate_extra_pmd(vaddr);241241+ return fill_pte(pmd, vaddr);233242}234243235244/*···318291 }319292}320293321321-static unsigned long __initdata table_start;322322-static unsigned long __meminitdata table_end;323323-static unsigned long __meminitdata table_top;324324-325294static __ref void *alloc_low_page(unsigned long *phys)326295{327327- unsigned long pfn = table_end++;296296+ unsigned long pfn = e820_table_end++;328297 void *adr;329298330299 if (after_bootmem) {···330307 return adr;331308 }332309333333- if (pfn >= table_top)310310+ if (pfn >= e820_table_top)334311 panic("alloc_low_page: ran out of memory");335312336313 adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE);···570547 return phys_pud_init(pud, addr, end, page_size_mask);571548}572549573573-static void __init 
find_early_table_space(unsigned long end, int use_pse,574574- int use_gbpages)575575-{576576- unsigned long puds, pmds, ptes, tables, start;577577-578578- puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;579579- tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);580580- if (use_gbpages) {581581- unsigned long extra;582582- extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);583583- pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;584584- } else585585- pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;586586- tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);587587-588588- if (use_pse) {589589- unsigned long extra;590590- extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);591591- ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;592592- } else593593- ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;594594- tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);595595-596596- /*597597- * RED-PEN putting page tables only on node 0 could598598- * cause a hotspot and fill up ZONE_DMA. The page tables599599- * need roughly 0.5KB per GB.600600- */601601- start = 0x8000;602602- table_start = find_e820_area(start, end, tables, PAGE_SIZE);603603- if (table_start == -1UL)604604- panic("Cannot find space for the kernel page tables");605605-606606- table_start >>= PAGE_SHIFT;607607- table_end = table_start;608608- table_top = table_start + (tables >> PAGE_SHIFT);609609-610610- printk(KERN_DEBUG "kernel direct mapping tables up to %lx @ %lx-%lx\n",611611- end, table_start << PAGE_SHIFT, table_top << PAGE_SHIFT);612612-}613613-614614-static void __init init_gbpages(void)615615-{616616- if (direct_gbpages && cpu_has_gbpages)617617- printk(KERN_INFO "Using GB pages for direct mapping\n");618618- else619619- direct_gbpages = 0;620620-}621621-622622-static unsigned long __meminit kernel_physical_mapping_init(unsigned long start,623623- unsigned long end,624624- unsigned long page_size_mask)550550+unsigned long __init551551+kernel_physical_mapping_init(unsigned long start,552552+ unsigned long end,553553+ unsigned long 
page_size_mask)625554{626555627556 unsigned long next, last_map_addr = end;···608633 __flush_tlb_all();609634610635 return last_map_addr;611611-}612612-613613-struct map_range {614614- unsigned long start;615615- unsigned long end;616616- unsigned page_size_mask;617617-};618618-619619-#define NR_RANGE_MR 5620620-621621-static int save_mr(struct map_range *mr, int nr_range,622622- unsigned long start_pfn, unsigned long end_pfn,623623- unsigned long page_size_mask)624624-{625625-626626- if (start_pfn < end_pfn) {627627- if (nr_range >= NR_RANGE_MR)628628- panic("run out of range for init_memory_mapping\n");629629- mr[nr_range].start = start_pfn<<PAGE_SHIFT;630630- mr[nr_range].end = end_pfn<<PAGE_SHIFT;631631- mr[nr_range].page_size_mask = page_size_mask;632632- nr_range++;633633- }634634-635635- return nr_range;636636-}637637-638638-/*639639- * Setup the direct mapping of the physical memory at PAGE_OFFSET.640640- * This runs before bootmem is initialized and gets pages directly from641641- * the physical memory. To access them they are temporarily mapped.642642- */643643-unsigned long __init_refok init_memory_mapping(unsigned long start,644644- unsigned long end)645645-{646646- unsigned long last_map_addr = 0;647647- unsigned long page_size_mask = 0;648648- unsigned long start_pfn, end_pfn;649649- unsigned long pos;650650-651651- struct map_range mr[NR_RANGE_MR];652652- int nr_range, i;653653- int use_pse, use_gbpages;654654-655655- printk(KERN_INFO "init_memory_mapping: %016lx-%016lx\n", start, end);656656-657657- /*658658- * Find space for the kernel direct mapping tables.659659- *660660- * Later we should allocate these tables in the local node of the661661- * memory mapped. 
Unfortunately this is done currently before the662662- * nodes are discovered.663663- */664664- if (!after_bootmem)665665- init_gbpages();666666-667667-#ifdef CONFIG_DEBUG_PAGEALLOC668668- /*669669- * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.670670- * This will simplify cpa(), which otherwise needs to support splitting671671- * large pages into small in interrupt context, etc.672672- */673673- use_pse = use_gbpages = 0;674674-#else675675- use_pse = cpu_has_pse;676676- use_gbpages = direct_gbpages;677677-#endif678678-679679- if (use_gbpages)680680- page_size_mask |= 1 << PG_LEVEL_1G;681681- if (use_pse)682682- page_size_mask |= 1 << PG_LEVEL_2M;683683-684684- memset(mr, 0, sizeof(mr));685685- nr_range = 0;686686-687687- /* head if not big page alignment ?*/688688- start_pfn = start >> PAGE_SHIFT;689689- pos = start_pfn << PAGE_SHIFT;690690- end_pfn = ((pos + (PMD_SIZE - 1)) >> PMD_SHIFT)691691- << (PMD_SHIFT - PAGE_SHIFT);692692- if (end_pfn > (end >> PAGE_SHIFT))693693- end_pfn = end >> PAGE_SHIFT;694694- if (start_pfn < end_pfn) {695695- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);696696- pos = end_pfn << PAGE_SHIFT;697697- }698698-699699- /* big page (2M) range*/700700- start_pfn = ((pos + (PMD_SIZE - 1))>>PMD_SHIFT)701701- << (PMD_SHIFT - PAGE_SHIFT);702702- end_pfn = ((pos + (PUD_SIZE - 1))>>PUD_SHIFT)703703- << (PUD_SHIFT - PAGE_SHIFT);704704- if (end_pfn > ((end>>PMD_SHIFT)<<(PMD_SHIFT - PAGE_SHIFT)))705705- end_pfn = ((end>>PMD_SHIFT)<<(PMD_SHIFT - PAGE_SHIFT));706706- if (start_pfn < end_pfn) {707707- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,708708- page_size_mask & (1<<PG_LEVEL_2M));709709- pos = end_pfn << PAGE_SHIFT;710710- }711711-712712- /* big page (1G) range */713713- start_pfn = ((pos + (PUD_SIZE - 1))>>PUD_SHIFT)714714- << (PUD_SHIFT - PAGE_SHIFT);715715- end_pfn = (end >> PUD_SHIFT) << (PUD_SHIFT - PAGE_SHIFT);716716- if (start_pfn < end_pfn) {717717- nr_range = save_mr(mr, nr_range, start_pfn, 
end_pfn,718718- page_size_mask &719719- ((1<<PG_LEVEL_2M)|(1<<PG_LEVEL_1G)));720720- pos = end_pfn << PAGE_SHIFT;721721- }722722-723723- /* tail is not big page (1G) alignment */724724- start_pfn = ((pos + (PMD_SIZE - 1))>>PMD_SHIFT)725725- << (PMD_SHIFT - PAGE_SHIFT);726726- end_pfn = (end >> PMD_SHIFT) << (PMD_SHIFT - PAGE_SHIFT);727727- if (start_pfn < end_pfn) {728728- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,729729- page_size_mask & (1<<PG_LEVEL_2M));730730- pos = end_pfn << PAGE_SHIFT;731731- }732732-733733- /* tail is not big page (2M) alignment */734734- start_pfn = pos>>PAGE_SHIFT;735735- end_pfn = end>>PAGE_SHIFT;736736- nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);737737-738738- /* try to merge same page size and continuous */739739- for (i = 0; nr_range > 1 && i < nr_range - 1; i++) {740740- unsigned long old_start;741741- if (mr[i].end != mr[i+1].start ||742742- mr[i].page_size_mask != mr[i+1].page_size_mask)743743- continue;744744- /* move it */745745- old_start = mr[i].start;746746- memmove(&mr[i], &mr[i+1],747747- (nr_range - 1 - i) * sizeof (struct map_range));748748- mr[i--].start = old_start;749749- nr_range--;750750- }751751-752752- for (i = 0; i < nr_range; i++)753753- printk(KERN_DEBUG " %010lx - %010lx page %s\n",754754- mr[i].start, mr[i].end,755755- (mr[i].page_size_mask & (1<<PG_LEVEL_1G))?"1G":(756756- (mr[i].page_size_mask & (1<<PG_LEVEL_2M))?"2M":"4k"));757757-758758- if (!after_bootmem)759759- find_early_table_space(end, use_pse, use_gbpages);760760-761761- for (i = 0; i < nr_range; i++)762762- last_map_addr = kernel_physical_mapping_init(763763- mr[i].start, mr[i].end,764764- mr[i].page_size_mask);765765-766766- if (!after_bootmem)767767- mmu_cr4_features = read_cr4();768768- __flush_tlb_all();769769-770770- if (!after_bootmem && table_end > table_start)771771- reserve_early(table_start << PAGE_SHIFT,772772- table_end << PAGE_SHIFT, "PGTABLE");773773-774774- printk(KERN_INFO "last_map_addr: %lx end: 
%lx\n",775775- last_map_addr, end);776776-777777- if (!after_bootmem)778778- early_memtest(start, end);779779-780780- return last_map_addr >> PAGE_SHIFT;781636}782637783638#ifndef CONFIG_NUMA···680875#endif681876682877#endif /* CONFIG_MEMORY_HOTPLUG */683683-684684-/*685685- * devmem_is_allowed() checks to see if /dev/mem access to a certain address686686- * is valid. The argument is a physical page number.687687- *688688- *689689- * On x86, access has to be given to the first megabyte of ram because that area690690- * contains bios code and data regions used by X and dosemu and similar apps.691691- * Access has to be given to non-kernel-ram areas as well, these contain the PCI692692- * mmio resources as well as potential bios/acpi data regions.693693- */694694-int devmem_is_allowed(unsigned long pagenr)695695-{696696- if (pagenr <= 256)697697- return 1;698698- if (iomem_is_exclusive(pagenr << PAGE_SHIFT))699699- return 0;700700- if (!page_is_ram(pagenr))701701- return 1;702702- return 0;703703-}704704-705878706879static struct kcore_list kcore_mem, kcore_vmalloc, kcore_kernel,707880 kcore_modules, kcore_vsyscall;···766983#endif767984}768985769769-#endif770770-771771-#ifdef CONFIG_BLK_DEV_INITRD772772-void free_initrd_mem(unsigned long start, unsigned long end)773773-{774774- free_init_pages("initrd memory", start, end);775775-}776986#endif777987778988int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
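The fill_pud()/fill_pmd()/fill_pte() helpers factored out above all follow one pattern: allocate the next page-table level only if the current entry is empty, then return a pointer into it. A self-contained toy two-level table (sizes and names invented) showing that allocate-on-first-touch shape:

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_ENTRIES 8

struct toy_l2 { int val[TOY_ENTRIES]; };
struct toy_l1 { struct toy_l2 *dir[TOY_ENTRIES]; };

/* Like fill_pmd(): populate the entry lazily, then descend into it.
 * Repeated calls for the same index return the same lower table. */
static struct toy_l2 *toy_fill_l2(struct toy_l1 *l1, int idx)
{
	if (!l1->dir[idx])
		l1->dir[idx] = calloc(1, sizeof(struct toy_l2));
	return l1->dir[idx];
}
```

Calling the helper twice for the same index must not reallocate, which is why the kernel versions check pgd_none()/pud_none()/pmd_none() before populating.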
+18 -17
arch/x86/mm/ioremap.c
···3838 } else {3939 VIRTUAL_BUG_ON(x < PAGE_OFFSET);4040 x -= PAGE_OFFSET;4141- VIRTUAL_BUG_ON(system_state == SYSTEM_BOOTING ? x > MAXMEM :4242- !phys_addr_valid(x));4141+ VIRTUAL_BUG_ON(!phys_addr_valid(x));4342 }4443 return x;4544}···5556 if (x < PAGE_OFFSET)5657 return false;5758 x -= PAGE_OFFSET;5858- if (system_state == SYSTEM_BOOTING ?5959- x > MAXMEM : !phys_addr_valid(x)) {5959+ if (!phys_addr_valid(x))6060 return false;6161- }6261 }63626463 return pfn_valid(x >> PAGE_SHIFT);···7376#ifdef CONFIG_DEBUG_VIRTUAL7477unsigned long __phys_addr(unsigned long x)7578{7676- /* VMALLOC_* aren't constants; not available at the boot time */7979+ /* VMALLOC_* aren't constants */7780 VIRTUAL_BUG_ON(x < PAGE_OFFSET);7878- VIRTUAL_BUG_ON(system_state != SYSTEM_BOOTING &&7979- is_vmalloc_addr((void *) x));8181+ VIRTUAL_BUG_ON(__vmalloc_start_set && is_vmalloc_addr((void *) x));8082 return x - PAGE_OFFSET;8183}8284EXPORT_SYMBOL(__phys_addr);···8589{8690 if (x < PAGE_OFFSET)8791 return false;8888- if (system_state != SYSTEM_BOOTING && is_vmalloc_addr((void *) x))9292+ if (__vmalloc_start_set && is_vmalloc_addr((void *) x))9393+ return false;9494+ if (x >= FIXADDR_START)8995 return false;9096 return pfn_valid((x - PAGE_OFFSET) >> PAGE_SHIFT);9197}···506508 return &bm_pte[pte_index(addr)];507509}508510511511+static unsigned long slot_virt[FIX_BTMAPS_SLOTS] __initdata;512512+509513void __init early_ioremap_init(void)510514{511515 pmd_t *pmd;516516+ int i;512517513518 if (early_ioremap_debug)514519 printk(KERN_INFO "early_ioremap_init()\n");520520+521521+ for (i = 0; i < FIX_BTMAPS_SLOTS; i++)522522+ slot_virt[i] = fix_to_virt(FIX_BTMAP_BEGIN - NR_FIX_BTMAPS*i);515523516524 pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));517525 memset(bm_pte, 0, sizeof(bm_pte));···585581586582static void __iomem *prev_map[FIX_BTMAPS_SLOTS] __initdata;587583static unsigned long prev_size[FIX_BTMAPS_SLOTS] __initdata;584584+588585static int __init check_early_ioremap_leak(void)589586{590587 
int count = 0;···607602}608603late_initcall(check_early_ioremap_leak);609604610610-static void __init __iomem *__early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)605605+static void __init __iomem *606606+__early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)611607{612608 unsigned long offset, last_addr;613609 unsigned int nrpages;···674668 --nrpages;675669 }676670 if (early_ioremap_debug)677677- printk(KERN_CONT "%08lx + %08lx\n", offset, fix_to_virt(idx0));671671+ printk(KERN_CONT "%08lx + %08lx\n", offset, slot_virt[slot]);678672679679- prev_map[slot] = (void __iomem *)(offset + fix_to_virt(idx0));673673+ prev_map[slot] = (void __iomem *)(offset + slot_virt[slot]);680674 return prev_map[slot];681675}682676···743737 --nrpages;744738 }745739 prev_map[slot] = NULL;746746-}747747-748748-void __this_fixmap_does_not_exist(void)749749-{750750- WARN_ON(1);751740}
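The slot_virt[] array introduced in early_ioremap_init() caches the virtual address of each early-ioremap slot so later code need not recompute fix_to_virt() per use. A toy version of that precomputation, with made-up constants and a stand-in fix_to_virt() (fixmap indices map to descending addresses below a fixed top):

```c
#include <assert.h>

#define TOY_FIXADDR_TOP   0xfffff000UL
#define TOY_PAGE_SHIFT    12
#define TOY_BTMAP_BEGIN   64UL
#define TOY_NR_FIX_BTMAPS 16UL
#define TOY_BTMAP_SLOTS   4

static unsigned long toy_fix_to_virt(unsigned long idx)
{
	return TOY_FIXADDR_TOP - (idx << TOY_PAGE_SHIFT);
}

/* Precompute each slot's base address once, as early_ioremap_init()
 * now does; each slot spans NR_FIX_BTMAPS fixmap pages. */
static void toy_init_slot_virt(unsigned long *slot_virt)
{
	int i;

	for (i = 0; i < TOY_BTMAP_SLOTS; i++)
		slot_virt[i] =
			toy_fix_to_virt(TOY_BTMAP_BEGIN - TOY_NR_FIX_BTMAPS * i);
}
```

Because the fixmap grows downward, consecutive slots end up NR_FIX_BTMAPS pages apart in ascending virtual addresses.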
+104 -60
arch/x86/mm/kmmio.c
···
 	struct list_head list;
 	struct kmmio_fault_page *release_next;
 	unsigned long page; /* location of the fault page */
+	bool old_presence; /* page presence prior to arming */
+	bool armed;
 
 	/*
 	 * Number of times this page has been registered as a part
 	 * of a probe. If zero, page is disarmed and this may be freed.
-	 * Used only by writers (RCU).
+	 * Used only by writers (RCU) and post_kmmio_handler().
+	 * Protected by kmmio_lock, when linked into kmmio_page_table.
 	 */
 	int count;
 };
···
 	return NULL;
 }
 
-static void set_page_present(unsigned long addr, bool present,
-							unsigned int *pglevel)
+static void set_pmd_presence(pmd_t *pmd, bool present, bool *old)
 {
-	pteval_t pteval;
-	pmdval_t pmdval;
+	pmdval_t v = pmd_val(*pmd);
+	*old = !!(v & _PAGE_PRESENT);
+	v &= ~_PAGE_PRESENT;
+	if (present)
+		v |= _PAGE_PRESENT;
+	set_pmd(pmd, __pmd(v));
+}
+
+static void set_pte_presence(pte_t *pte, bool present, bool *old)
+{
+	pteval_t v = pte_val(*pte);
+	*old = !!(v & _PAGE_PRESENT);
+	v &= ~_PAGE_PRESENT;
+	if (present)
+		v |= _PAGE_PRESENT;
+	set_pte_atomic(pte, __pte(v));
+}
+
+static int set_page_presence(unsigned long addr, bool present, bool *old)
+{
 	unsigned int level;
-	pmd_t *pmd;
 	pte_t *pte = lookup_address(addr, &level);
 
 	if (!pte) {
 		pr_err("kmmio: no pte for page 0x%08lx\n", addr);
-		return;
+		return -1;
 	}
-
-	if (pglevel)
-		*pglevel = level;
 
 	switch (level) {
 	case PG_LEVEL_2M:
-		pmd = (pmd_t *)pte;
-		pmdval = pmd_val(*pmd) & ~_PAGE_PRESENT;
-		if (present)
-			pmdval |= _PAGE_PRESENT;
-		set_pmd(pmd, __pmd(pmdval));
+		set_pmd_presence((pmd_t *)pte, present, old);
 		break;
-
 	case PG_LEVEL_4K:
-		pteval = pte_val(*pte) & ~_PAGE_PRESENT;
-		if (present)
-			pteval |= _PAGE_PRESENT;
-		set_pte_atomic(pte, __pte(pteval));
+		set_pte_presence(pte, present, old);
 		break;
-
 	default:
 		pr_err("kmmio: unexpected page level 0x%x.\n", level);
-		return;
+		return -1;
 	}
 
 	__flush_tlb_one(addr);
+	return 0;
 }
 
-/** Mark the given page as not present. Access to it will trigger a fault. */
-static void arm_kmmio_fault_page(unsigned long page, unsigned int *pglevel)
+/*
+ * Mark the given page as not present. Access to it will trigger a fault.
+ *
+ * Struct kmmio_fault_page is protected by RCU and kmmio_lock, but the
+ * protection is ignored here. RCU read lock is assumed held, so the struct
+ * will not disappear unexpectedly. Furthermore, the caller must guarantee,
+ * that double arming the same virtual address (page) cannot occur.
+ *
+ * Double disarming on the other hand is allowed, and may occur when a fault
+ * and mmiotrace shutdown happen simultaneously.
+ */
+static int arm_kmmio_fault_page(struct kmmio_fault_page *f)
 {
-	set_page_present(page & PAGE_MASK, false, pglevel);
+	int ret;
+	WARN_ONCE(f->armed, KERN_ERR "kmmio page already armed.\n");
+	if (f->armed) {
+		pr_warning("kmmio double-arm: page 0x%08lx, ref %d, old %d\n",
+					f->page, f->count, f->old_presence);
+	}
+	ret = set_page_presence(f->page, false, &f->old_presence);
+	WARN_ONCE(ret < 0, KERN_ERR "kmmio arming 0x%08lx failed.\n", f->page);
+	f->armed = true;
+	return ret;
 }
 
-/** Mark the given page as present. */
-static void disarm_kmmio_fault_page(unsigned long page, unsigned int *pglevel)
+/** Restore the given page to saved presence state. */
+static void disarm_kmmio_fault_page(struct kmmio_fault_page *f)
 {
-	set_page_present(page & PAGE_MASK, true, pglevel);
+	bool tmp;
+	int ret = set_page_presence(f->page, f->old_presence, &tmp);
+	WARN_ONCE(ret < 0,
+			KERN_ERR "kmmio disarming 0x%08lx failed.\n", f->page);
+	f->armed = false;
 }
 
 /*
···
 
 	ctx = &get_cpu_var(kmmio_ctx);
 	if (ctx->active) {
-		disarm_kmmio_fault_page(faultpage->page, NULL);
 		if (addr == ctx->addr) {
 			/*
-			 * On SMP we sometimes get recursive probe hits on the
-			 * same address. Context is already saved, fall out.
+			 * A second fault on the same page means some other
+			 * condition needs handling by do_page_fault(), the
+			 * page really not being present is the most common.
 			 */
-			pr_debug("kmmio: duplicate probe hit on CPU %d, for "
-						"address 0x%08lx.\n",
-						smp_processor_id(), addr);
-			ret = 1;
-			goto no_kmmio_ctx;
-		}
-		/*
-		 * Prevent overwriting already in-flight context.
-		 * This should not happen, let's hope disarming at least
-		 * prevents a panic.
-		 */
-		pr_emerg("kmmio: recursive probe hit on CPU %d, "
+			pr_debug("kmmio: secondary hit for 0x%08lx CPU %d.\n",
+					addr, smp_processor_id());
+
+			if (!faultpage->old_presence)
+				pr_info("kmmio: unexpected secondary hit for "
+					"address 0x%08lx on CPU %d.\n", addr,
+					smp_processor_id());
+		} else {
+			/*
+			 * Prevent overwriting already in-flight context.
+			 * This should not happen, let's hope disarming at
+			 * least prevents a panic.
+			 */
+			pr_emerg("kmmio: recursive probe hit on CPU %d, "
 					"for address 0x%08lx. Ignoring.\n",
 					smp_processor_id(), addr);
-		pr_emerg("kmmio: previous hit was at 0x%08lx.\n",
-					ctx->addr);
+			pr_emerg("kmmio: previous hit was at 0x%08lx.\n",
+						ctx->addr);
+			disarm_kmmio_fault_page(faultpage);
+		}
 		goto no_kmmio_ctx;
 	}
 	ctx->active++;
···
 	regs->flags &= ~X86_EFLAGS_IF;
 
 	/* Now we set present bit in PTE and single step. */
-	disarm_kmmio_fault_page(ctx->fpage->page, NULL);
+	disarm_kmmio_fault_page(ctx->fpage);
 
 	/*
 	 * If another cpu accesses the same page while we are stepping,
···
 	struct kmmio_context *ctx = &get_cpu_var(kmmio_ctx);
 
 	if (!ctx->active) {
-		pr_debug("kmmio: spurious debug trap on CPU %d.\n",
+		pr_warning("kmmio: spurious debug trap on CPU %d.\n",
 							smp_processor_id());
 		goto out;
 	}
···
 	if (ctx->probe && ctx->probe->post_handler)
 		ctx->probe->post_handler(ctx->probe, condition, regs);
 
-	arm_kmmio_fault_page(ctx->fpage->page, NULL);
+	/* Prevent racing against release_kmmio_fault_page(). */
+	spin_lock(&kmmio_lock);
+	if (ctx->fpage->count)
+		arm_kmmio_fault_page(ctx->fpage);
+	spin_unlock(&kmmio_lock);
 
 	regs->flags &= ~X86_EFLAGS_TF;
 	regs->flags |= ctx->saved_flags;
···
 	f = get_kmmio_fault_page(page);
 	if (f) {
 		if (!f->count)
-			arm_kmmio_fault_page(f->page, NULL);
+			arm_kmmio_fault_page(f);
 		f->count++;
 		return 0;
 	}
 
-	f = kmalloc(sizeof(*f), GFP_ATOMIC);
+	f = kzalloc(sizeof(*f), GFP_ATOMIC);
 	if (!f)
 		return -1;
 
 	f->count = 1;
 	f->page = page;
-	list_add_rcu(&f->list, kmmio_page_list(f->page));
 
-	arm_kmmio_fault_page(f->page, NULL);
+	if (arm_kmmio_fault_page(f)) {
+		kfree(f);
+		return -1;
+	}
+
+	list_add_rcu(&f->list, kmmio_page_list(f->page));
 
 	return 0;
 }
···
 	f->count--;
 	BUG_ON(f->count < 0);
 	if (!f->count) {
-		disarm_kmmio_fault_page(f->page, NULL);
+		disarm_kmmio_fault_page(f);
 		f->release_next = *release_list;
 		*release_list = f;
 	}
···
 
 static void remove_kmmio_fault_pages(struct rcu_head *head)
 {
-	struct kmmio_delayed_release *dr = container_of(
-						head,
-						struct kmmio_delayed_release,
-						rcu);
+	struct kmmio_delayed_release *dr =
+		container_of(head, struct kmmio_delayed_release, rcu);
 	struct kmmio_fault_page *p = dr->release_list;
 	struct kmmio_fault_page **prevp = &dr->release_list;
 	unsigned long flags;
+
 	spin_lock_irqsave(&kmmio_lock, flags);
 	while (p) {
-		if (!p->count)
+		if (!p->count) {
 			list_del_rcu(&p->list);
-		else
+			prevp = &p->release_next;
+		} else {
 			*prevp = p->release_next;
-		prevp = &p->release_next;
+		}
 		p = p->release_next;
 	}
 	spin_unlock_irqrestore(&kmmio_lock, flags);
+
 	/* This is the real RCU destroy call. */
 	call_rcu(&dr->rcu, rcu_free_kmmio_fault_pages);
 }
···
 	  help
 	  Can we use information of configuration file?
 
-config HIGHMEM
-	bool "High memory support"
-
 endmenu
 
 menu "Platform options"
···
 	}
 
 	/* prereset() might have cleared ATA_EH_RESET.  If so,
-	 * bang classes and return.
+	 * bang classes, thaw and return.
 	 */
 	if (reset && !(ehc->i.action & ATA_EH_RESET)) {
 		ata_for_each_dev(dev, link, ALL)
 			classes[dev->devno] = ATA_DEV_NONE;
+		if ((ap->pflags & ATA_PFLAG_FROZEN) &&
+		    ata_is_host_link(link))
+			ata_eh_thaw_port(ap);
 		rc = 0;
 		goto out;
 	}
···
 	int i;
 
 	for (i = 0; i < ATA_EH_UA_TRIES; i++) {
-		u8 sense_buffer[SCSI_SENSE_BUFFERSIZE];
+		u8 *sense_buffer = dev->link->ap->sector_buf;
 		u8 sense_key = 0;
 		unsigned int err_mask;
+1 -1
drivers/ata/sata_nv.c
···
 module_init(nv_init);
 module_exit(nv_exit);
 module_param_named(adma, adma_enabled, bool, 0444);
-MODULE_PARM_DESC(adma, "Enable use of ADMA (Default: true)");
+MODULE_PARM_DESC(adma, "Enable use of ADMA (Default: false)");
 module_param_named(swncq, swncq_enabled, bool, 0444);
 MODULE_PARM_DESC(swncq, "Enable use of SWNCQ (Default: true)");
+1 -1
drivers/base/node.c
···
 	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
 	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
-		unsigned int nid;
+		int nid;
 
 		nid = get_nid_for_pfn(pfn);
 		if (nid < 0)
+1 -1
drivers/block/aoe/aoedev.c
···
 		return;
 	while (atomic_read(&skb_shinfo(skb)->dataref) != 1 && i-- > 0)
 		msleep(Sms);
-	if (i <= 0) {
+	if (i < 0) {
 		printk(KERN_ERR
 			"aoe: %s holds ref: %s\n",
 			skb->dev ? skb->dev->name : "netif",
+3 -5
drivers/block/cciss.c
···
 	if (cciss_hard_reset_controller(pdev) || cciss_reset_msi(pdev))
 		return -ENODEV;
 
-	/* Some devices (notably the HP Smart Array 5i Controller)
-		need a little pause here */
-	schedule_timeout_uninterruptible(30*HZ);
-
-	/* Now try to get the controller to respond to a no-op */
+	/* Now try to get the controller to respond to a no-op. Some
+	   devices (notably the HP Smart Array 5i Controller) need
+	   up to 30 seconds to respond. */
 	for (i=0; i<30; i++) {
 		if (cciss_noop(pdev) == 0)
 			break;
+1 -2
drivers/block/loop.c
···
 	struct loop_device *lo = p->lo;
 	struct page *page = buf->page;
 	sector_t IV;
-	size_t size;
-	int ret;
+	int size, ret;
 
 	ret = buf->ops->confirm(pipe, buf);
 	if (unlikely(ret))
+2
drivers/block/xen-blkfront.c
···
 		break;
 
 	case XenbusStateClosing:
+		if (info->gd == NULL)
+			xenbus_dev_fatal(dev, -ENODEV, "gd is NULL");
 		bd = bdget_disk(info->gd, 0);
 		if (bd == NULL)
 			xenbus_dev_fatal(dev, -ENODEV, "bdget failed");
+9 -4
drivers/char/agp/amd64-agp.c
···
 	nb_order = (nb_order >> 1) & 7;
 	pci_read_config_dword(nb, AMD64_GARTAPERTUREBASE, &nb_base);
 	nb_aper = nb_base << 25;
-	if (agp_aperture_valid(nb_aper, (32*1024*1024)<<nb_order)) {
-		return 0;
-	}
 
 	/* Northbridge seems to contain crap. Try the AGP bridge. */
 
 	pci_read_config_word(agp, cap+0x14, &apsize);
-	if (apsize == 0xffff)
+	if (apsize == 0xffff) {
+		if (agp_aperture_valid(nb_aper, (32*1024*1024)<<nb_order))
+			return 0;
 		return -1;
+	}
 
 	apsize &= 0xfff;
 	/* Some BIOS use weird encodings not in the AGPv3 table. */
···
 		dev_info(&agp->dev, "aperture size %u MB is not right, using settings from NB\n",
 			 32 << order);
 		order = nb_order;
+	}
+
+	if (nb_order >= order) {
+		if (agp_aperture_valid(nb_aper, (32*1024*1024)<<nb_order))
+			return 0;
 	}
 
 	dev_info(&agp->dev, "aperture from AGP @ %Lx size %u MB\n",
+5 -3
drivers/char/agp/intel-agp.c
···
 			break;
 		}
 	}
-	if (gtt_entries > 0)
+	if (gtt_entries > 0) {
 		dev_info(&agp_bridge->dev->dev, "detected %dK %s memory\n",
 		       gtt_entries / KB(1), local ? "local" : "stolen");
-	else
+		gtt_entries /= KB(4);
+	} else {
 		dev_info(&agp_bridge->dev->dev,
 		       "no pre-allocated video memory detected\n");
-	gtt_entries /= KB(4);
+		gtt_entries = 0;
+	}
 
 	intel_private.gtt_entries = gtt_entries;
 }
+18 -33
drivers/cpufreq/cpufreq.c
···
 	.release	= cpufreq_sysfs_release,
 };
 
-static struct kobj_type ktype_empty_cpufreq = {
-	.sysfs_ops	= &sysfs_ops,
-	.release	= cpufreq_sysfs_release,
-};
-
 
 /**
  * cpufreq_add_dev - add a CPU device
···
 	memcpy(&new_policy, policy, sizeof(struct cpufreq_policy));
 
 	/* prepare interface data */
-	if (!cpufreq_driver->hide_interface) {
-		ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
-					   &sys_dev->kobj, "cpufreq");
+	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, &sys_dev->kobj,
+				   "cpufreq");
+	if (ret)
+		goto err_out_driver_exit;
+
+	/* set up files for this cpu device */
+	drv_attr = cpufreq_driver->attr;
+	while ((drv_attr) && (*drv_attr)) {
+		ret = sysfs_create_file(&policy->kobj, &((*drv_attr)->attr));
 		if (ret)
 			goto err_out_driver_exit;
-
-		/* set up files for this cpu device */
-		drv_attr = cpufreq_driver->attr;
-		while ((drv_attr) && (*drv_attr)) {
-			ret = sysfs_create_file(&policy->kobj,
-						&((*drv_attr)->attr));
-			if (ret)
-				goto err_out_driver_exit;
-			drv_attr++;
-		}
-		if (cpufreq_driver->get) {
-			ret = sysfs_create_file(&policy->kobj,
-						&cpuinfo_cur_freq.attr);
-			if (ret)
-				goto err_out_driver_exit;
-		}
-		if (cpufreq_driver->target) {
-			ret = sysfs_create_file(&policy->kobj,
-						&scaling_cur_freq.attr);
-			if (ret)
-				goto err_out_driver_exit;
-		}
-	} else {
-		ret = kobject_init_and_add(&policy->kobj, &ktype_empty_cpufreq,
-					   &sys_dev->kobj, "cpufreq");
+		drv_attr++;
+	}
+	if (cpufreq_driver->get) {
+		ret = sysfs_create_file(&policy->kobj, &cpuinfo_cur_freq.attr);
+		if (ret)
+			goto err_out_driver_exit;
+	}
+	if (cpufreq_driver->target) {
+		ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
 		if (ret)
 			goto err_out_driver_exit;
 	}
+4 -2
drivers/crypto/ixp4xx_crypto.c
···
 	if (!ctx_pool) {
 		goto err;
 	}
-	ret = qmgr_request_queue(SEND_QID, NPE_QLEN_TOTAL, 0, 0);
+	ret = qmgr_request_queue(SEND_QID, NPE_QLEN_TOTAL, 0, 0,
+				 "ixp_crypto:out", NULL);
 	if (ret)
 		goto err;
-	ret = qmgr_request_queue(RECV_QID, NPE_QLEN, 0, 0);
+	ret = qmgr_request_queue(RECV_QID, NPE_QLEN, 0, 0,
+				 "ixp_crypto:in", NULL);
 	if (ret) {
 		qmgr_release_queue(SEND_QID);
 		goto err;
···
 /*
- * Copyright(c) 2007 Intel Corporation. All rights reserved.
+ * Copyright(c) 2007 - 2009 Intel Corporation. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the Free
···
 
 static void dma_halt(struct fsl_dma_chan *fsl_chan)
 {
-	int i = 0;
+	int i;
+
 	DMA_OUT(fsl_chan, &fsl_chan->reg_base->mr,
 		DMA_IN(fsl_chan, &fsl_chan->reg_base->mr, 32) | FSL_DMA_MR_CA,
 		32);
···
 		DMA_IN(fsl_chan, &fsl_chan->reg_base->mr, 32) & ~(FSL_DMA_MR_CS
 		| FSL_DMA_MR_EMS_EN | FSL_DMA_MR_CA), 32);
 
-	while (!dma_is_idle(fsl_chan) && (i++ < 100))
+	for (i = 0; i < 100; i++) {
+		if (dma_is_idle(fsl_chan))
+			break;
 		udelay(10);
+	}
 	if (i >= 100 && !dma_is_idle(fsl_chan))
 		dev_err(fsl_chan->dev, "DMA halt timeout!\n");
 }
+1 -1
drivers/dma/ioat.c
···
 /*
  * Intel I/OAT DMA Linux driver
- * Copyright(c) 2007 Intel Corporation.
+ * Copyright(c) 2007 - 2009 Intel Corporation.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms and conditions of the GNU General Public License,
+25 -1
drivers/dma/ioat_dca.c
···
 /*
  * Intel I/OAT DMA Linux driver
- * Copyright(c) 2007 Intel Corporation.
+ * Copyright(c) 2007 - 2009 Intel Corporation.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms and conditions of the GNU General Public License,
···
 #define DCA3_TAG_MAP_LITERAL_VAL	0x1
 
 #define DCA_TAG_MAP_MASK		0xDF
+
+/* expected tag map bytes for I/OAT ver.2 */
+#define DCA2_TAG_MAP_BYTE0 0x80
+#define DCA2_TAG_MAP_BYTE1 0x0
+#define DCA2_TAG_MAP_BYTE2 0x81
+#define DCA2_TAG_MAP_BYTE3 0x82
+#define DCA2_TAG_MAP_BYTE4 0x82
+
+/* verify if tag map matches expected values */
+static inline int dca2_tag_map_valid(u8 *tag_map)
+{
+	return ((tag_map[0] == DCA2_TAG_MAP_BYTE0) &&
+		(tag_map[1] == DCA2_TAG_MAP_BYTE1) &&
+		(tag_map[2] == DCA2_TAG_MAP_BYTE2) &&
+		(tag_map[3] == DCA2_TAG_MAP_BYTE3) &&
+		(tag_map[4] == DCA2_TAG_MAP_BYTE4));
+}
 
 /*
  * "Legacy" DCA systems do not implement the DCA register set in the
···
 			ioatdca->tag_map[i] = bit | DCA_TAG_MAP_VALID;
 		else
 			ioatdca->tag_map[i] = 0;
+	}
+
+	if (!dca2_tag_map_valid(ioatdca->tag_map)) {
+		dev_err(&pdev->dev, "APICID_TAG_MAP set incorrectly by BIOS, "
+			"disabling DCA\n");
+		free_dca_provider(dca);
+		return NULL;
 	}
 
 	err = register_dca_provider(dca, &pdev->dev);
+24 -15
drivers/dma/ioat_dma.c
···
 /*
  * Intel I/OAT DMA Linux driver
- * Copyright(c) 2004 - 2007 Intel Corporation.
+ * Copyright(c) 2004 - 2009 Intel Corporation.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms and conditions of the GNU General Public License,
···
 	ioat_chan->xfercap = xfercap;
 	ioat_chan->desccount = 0;
 	INIT_DELAYED_WORK(&ioat_chan->work, ioat_dma_chan_reset_part2);
-	if (ioat_chan->device->version != IOAT_VER_1_2) {
-		writel(IOAT_DCACTRL_CMPL_WRITE_ENABLE
-			| IOAT_DMA_DCA_ANY_CPU,
-			ioat_chan->reg_base + IOAT_DCACTRL_OFFSET);
-	}
+	if (ioat_chan->device->version == IOAT_VER_2_0)
+		writel(IOAT_DCACTRL_CMPL_WRITE_ENABLE |
+		       IOAT_DMA_DCA_ANY_CPU,
+		       ioat_chan->reg_base + IOAT_DCACTRL_OFFSET);
+	else if (ioat_chan->device->version == IOAT_VER_3_0)
+		writel(IOAT_DMA_DCA_ANY_CPU,
+		       ioat_chan->reg_base + IOAT_DCACTRL_OFFSET);
 	spin_lock_init(&ioat_chan->cleanup_lock);
 	spin_lock_init(&ioat_chan->desc_lock);
 	INIT_LIST_HEAD(&ioat_chan->free_desc);
···
 			 * up if the client is done with the descriptor
 			 */
 			if (async_tx_test_ack(&desc->async_tx)) {
-				list_del(&desc->node);
-				list_add_tail(&desc->node,
-					&ioat_chan->free_desc);
+				list_move_tail(&desc->node,
+					       &ioat_chan->free_desc);
 			} else
 				desc->async_tx.cookie = 0;
 		} else {
···
 	dma_cookie_t cookie;
 	int err = 0;
 	struct completion cmp;
+	unsigned long tmo;
 
 	src = kzalloc(sizeof(u8) * IOAT_TEST_SIZE, GFP_KERNEL);
 	if (!src)
···
 	}
 	device->common.device_issue_pending(dma_chan);
 
-	wait_for_completion_timeout(&cmp, msecs_to_jiffies(3000));
+	tmo = wait_for_completion_timeout(&cmp, msecs_to_jiffies(3000));
 
-	if (device->common.device_is_tx_complete(dma_chan, cookie, NULL, NULL)
+	if (tmo == 0 ||
+	    device->common.device_is_tx_complete(dma_chan, cookie, NULL, NULL)
 					!= DMA_SUCCESS) {
 		dev_err(&device->pdev->dev,
 			"Self-test copy timed out, disabling\n");
···
 		" %d channels, device version 0x%02x, driver version %s\n",
 		device->common.chancnt, device->version, IOAT_DMA_VERSION);
 
+	if (!device->common.chancnt) {
+		dev_err(&device->pdev->dev,
+			"Intel(R) I/OAT DMA Engine problem found: "
+			"zero channels detected\n");
+		goto err_setup_interrupts;
+	}
+
 	err = ioat_dma_setup_interrupts(device);
 	if (err)
 		goto err_setup_interrupts;
···
 	struct dma_chan *chan, *_chan;
 	struct ioat_dma_chan *ioat_chan;
 
+	if (device->version != IOAT_VER_3_0)
+		cancel_delayed_work(&device->work);
+
 	ioat_dma_remove_interrupts(device);
 
 	dma_async_device_unregister(&device->common);
···
 	iounmap(device->reg_base);
 	pci_release_regions(device->pdev);
 	pci_disable_device(device->pdev);
-
-	if (device->version != IOAT_VER_3_0) {
-		cancel_delayed_work(&device->work);
-	}
 
 	list_for_each_entry_safe(chan, _chan,
 				 &device->common.channels, device_node) {
+5 -3
drivers/dma/ioatdma.h
···
 /*
- * Copyright(c) 2004 - 2007 Intel Corporation. All rights reserved.
+ * Copyright(c) 2004 - 2009 Intel Corporation. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the Free
···
 #include <linux/pci_ids.h>
 #include <net/tcp.h>
 
-#define IOAT_DMA_VERSION  "3.30"
+#define IOAT_DMA_VERSION  "3.64"
 
 enum ioat_interrupt {
 	none = 0,
···
 #ifdef CONFIG_NET_DMA
 	switch (dev->version) {
 	case IOAT_VER_1_2:
-	case IOAT_VER_3_0:
 		sysctl_tcp_dma_copybreak = 4096;
 		break;
 	case IOAT_VER_2_0:
 		sysctl_tcp_dma_copybreak = 2048;
+		break;
+	case IOAT_VER_3_0:
+		sysctl_tcp_dma_copybreak = 262144;
 		break;
 	}
 #endif
+1 -1
drivers/dma/ioatdma_hw.h
···
 /*
- * Copyright(c) 2004 - 2007 Intel Corporation. All rights reserved.
+ * Copyright(c) 2004 - 2009 Intel Corporation. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the Free
+1 -1
drivers/dma/ioatdma_registers.h
···
 /*
- * Copyright(c) 2004 - 2007 Intel Corporation. All rights reserved.
+ * Copyright(c) 2004 - 2009 Intel Corporation. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the Free
+9 -9
drivers/dma/iop-adma.c
···
 
 	for (src_idx = 0; src_idx < IOP_ADMA_NUM_SRC_TEST; src_idx++) {
 		xor_srcs[src_idx] = alloc_page(GFP_KERNEL);
-		if (!xor_srcs[src_idx])
-			while (src_idx--) {
+		if (!xor_srcs[src_idx]) {
+			while (src_idx--)
 				__free_page(xor_srcs[src_idx]);
-				return -ENOMEM;
-			}
+			return -ENOMEM;
+		}
 	}
 
 	dest = alloc_page(GFP_KERNEL);
-	if (!dest)
-		while (src_idx--) {
+	if (!dest) {
+		while (src_idx--)
 			__free_page(xor_srcs[src_idx]);
-			return -ENOMEM;
-		}
+		return -ENOMEM;
+	}
 
 	/* Fill in src buffers */
 	for (src_idx = 0; src_idx < IOP_ADMA_NUM_SRC_TEST; src_idx++) {
···
 
 static struct platform_driver iop_adma_driver = {
 	.probe		= iop_adma_probe,
-	.remove		= iop_adma_remove,
+	.remove		= __devexit_p(iop_adma_remove),
 	.driver		= {
 		.owner	= THIS_MODULE,
 		.name	= "iop-adma",
···
 	hcall(LHCALL_NOTIFY, lvq->config.pfn << PAGE_SHIFT, 0, 0);
 }
 
+/* An extern declaration inside a C file is bad form.  Don't do it. */
+extern void lguest_setup_irq(unsigned int irq);
+
 /* This routine finds the first virtqueue described in the configuration of
  * this device and sets it up.
 *
···
 		err = -ENOMEM;
 		goto unmap;
 	}
+
+	/* Make sure the interrupt is allocated. */
+	lguest_setup_irq(lvq->config.irq);
 
 	/* Tell the interrupt for this virtqueue to go to the virtio_ring
 	 * interrupt handler. */
···
 
 	sg_init_one(&sg, data_buf, len);
 
-	/*
-	 * The spec states that CSR and CID accesses have a timeout
-	 * of 64 clock cycles.
-	 */
-	data.timeout_ns = 0;
-	data.timeout_clks = 64;
+	if (opcode == MMC_SEND_CSD || opcode == MMC_SEND_CID) {
+		/*
+		 * The spec states that CSR and CID accesses have a timeout
+		 * of 64 clock cycles.
+		 */
+		data.timeout_ns = 0;
+		data.timeout_clks = 64;
+	} else
+		mmc_set_data_timeout(&data, card);
 
 	mmc_wait_for_req(host, &mrq);
···
 		msleep(1);
 	}
 
-	if (reset_timeout == 0) {
+	if (reset_timeout < 0) {
 		dev_crit(ksp->dev,
 			 "Timeout waiting for DMA engines to reset\n");
 		/* And blithely carry on */
···
 			break;
 	} while (val & (GREG_SWRST_TXRST | GREG_SWRST_RXRST));
 
-	if (limit <= 0)
+	if (limit < 0)
 		printk(KERN_ERR "%s: SW reset is ghetto.\n", gp->dev->name);
 
 	if (gp->phy_type == phy_serialink || gp->phy_type == phy_serdes)
+2 -1
drivers/net/tg3.c
···
 {
 	u32 reg;
 
-	if (!(tp->tg3_flags2 & TG3_FLG2_5705_PLUS))
+	if (!(tp->tg3_flags2 & TG3_FLG2_5705_PLUS) ||
+	    GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5906)
 		return;
 
 	reg = MII_TG3_MISC_SHDW_WREN |
+9 -9
drivers/net/tokenring/tmspci.c
···
 		goto err_out_trdev;
 	}
 
-	ret = request_irq(pdev->irq, tms380tr_interrupt, IRQF_SHARED,
-			  dev->name, dev);
-	if (ret)
-		goto err_out_region;
-
 	dev->base_addr	= pci_ioaddr;
 	dev->irq	= pci_irq_line;
 	dev->dma	= 0;
···
 	ret = tmsdev_init(dev, &pdev->dev);
 	if (ret) {
 		printk("%s: unable to get memory for dev->priv.\n", dev->name);
-		goto err_out_irq;
+		goto err_out_region;
 	}
 
 	tp = netdev_priv(dev);
···
 
 	tp->tmspriv = cardinfo;
 
+	ret = request_irq(pdev->irq, tms380tr_interrupt, IRQF_SHARED,
+			  dev->name, dev);
+	if (ret)
+		goto err_out_tmsdev;
+
 	dev->open = tms380tr_open;
 	dev->stop = tms380tr_close;
 	pci_set_drvdata(pdev, dev);
···
 
 	ret = register_netdev(dev);
 	if (ret)
-		goto err_out_tmsdev;
+		goto err_out_irq;
 
 	return 0;
 
+err_out_irq:
+	free_irq(pdev->irq, dev);
 err_out_tmsdev:
 	pci_set_drvdata(pdev, NULL);
 	tmsdev_term(dev);
-err_out_irq:
-	free_irq(pdev->irq, dev);
 err_out_region:
 	release_region(pci_ioaddr, TMS_PCI_IO_EXTENT);
 err_out_trdev:
+2 -2
drivers/net/ucc_geth_mii.c
···
 static int uec_mdio_reset(struct mii_bus *bus)
 {
 	struct ucc_mii_mng __iomem *regs = (void __iomem *)bus->priv;
-	unsigned int timeout = PHY_INIT_TIMEOUT;
+	int timeout = PHY_INIT_TIMEOUT;
 
 	mutex_lock(&bus->mdio_lock);
···
 
 	mutex_unlock(&bus->mdio_lock);
 
-	if (timeout <= 0) {
+	if (timeout < 0) {
 		printk(KERN_ERR "%s: The MII Bus is stuck!\n", bus->name);
 		return -EBUSY;
 	}
···
 #define BOOTMEM_DEFAULT		0
 #define BOOTMEM_EXCLUSIVE	(1<<0)
 
+extern int reserve_bootmem(unsigned long addr,
+			   unsigned long size,
+			   int flags);
 extern int reserve_bootmem_node(pg_data_t *pgdat,
-				unsigned long physaddr,
-				unsigned long size,
-				int flags);
-#ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
-extern int reserve_bootmem(unsigned long addr, unsigned long size, int flags);
-#endif
+				unsigned long physaddr,
+				unsigned long size,
+				int flags);
 
-extern void *__alloc_bootmem_nopanic(unsigned long size,
+extern void *__alloc_bootmem(unsigned long size,
 				     unsigned long align,
 				     unsigned long goal);
-extern void *__alloc_bootmem(unsigned long size,
+extern void *__alloc_bootmem_nopanic(unsigned long size,
 				     unsigned long align,
 				     unsigned long goal);
-extern void *__alloc_bootmem_low(unsigned long size,
-				 unsigned long align,
-				 unsigned long goal);
 extern void *__alloc_bootmem_node(pg_data_t *pgdat,
 				  unsigned long size,
 				  unsigned long align,
···
 				  unsigned long size,
 				  unsigned long align,
 				  unsigned long goal);
+extern void *__alloc_bootmem_low(unsigned long size,
+				 unsigned long align,
+				 unsigned long goal);
 extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 				      unsigned long size,
 				      unsigned long align,
 				      unsigned long goal);
-#ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
+
 #define alloc_bootmem(x) \
 	__alloc_bootmem(x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
 #define alloc_bootmem_nopanic(x) \
 	__alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low(x) \
-	__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
 #define alloc_bootmem_pages(x) \
 	__alloc_bootmem(x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
 #define alloc_bootmem_pages_nopanic(x) \
 	__alloc_bootmem_nopanic(x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low_pages(x) \
-	__alloc_bootmem_low(x, PAGE_SIZE, 0)
 #define alloc_bootmem_node(pgdat, x) \
 	__alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
 #define alloc_bootmem_pages_node(pgdat, x) \
 	__alloc_bootmem_node(pgdat, x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+#define alloc_bootmem_pages_node_nopanic(pgdat, x) \
+	__alloc_bootmem_node_nopanic(pgdat, x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+
+#define alloc_bootmem_low(x) \
+	__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
+#define alloc_bootmem_low_pages(x) \
+	__alloc_bootmem_low(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages_node(pgdat, x) \
 	__alloc_bootmem_low_node(pgdat, x, PAGE_SIZE, 0)
-#endif /* !CONFIG_HAVE_ARCH_BOOTMEM_NODE */
 
 extern int reserve_bootmem_generic(unsigned long addr, unsigned long size,
 				   int flags);
-1
include/linux/cpufreq.h
···
 	int	(*suspend)	(struct cpufreq_policy *policy, pm_message_t pmsg);
 	int	(*resume)	(struct cpufreq_policy *policy);
 	struct freq_attr	**attr;
-	bool			hide_interface;
 };
 
 /* flags */
+1 -6
include/linux/dmaengine.h
···
 
 /**
  * struct dma_chan_percpu - the per-CPU part of struct dma_chan
- * @refcount: local_t used for open-coded "bigref" counting
  * @memcpy_count: transaction counter
  * @bytes_transferred: byte counter
 */
···
 * @cookie: last cookie value returned to client
 * @chan_id: channel ID for sysfs
 * @dev: class device for sysfs
- * @refcount: kref, used in "bigref" slow-mode
- * @slow_ref: indicates that the DMA channel is free
- * @rcu: the DMA channel's RCU head
 * @device_node: used to add this to the device chan list
 * @local: per-cpu pointer to a struct dma_chan_percpu
 * @client-count: how many clients are using this channel
···
 * @global_node: list_head for global dma_device_list
 * @cap_mask: one or more dma_capability flags
 * @max_xor: maximum number of xor sources, 0 if no capability
- * @refcount: reference count
- * @done: IO completion struct
 * @dev_id: unique device ID
 * @dev: struct device reference for dma mapping api
 * @device_alloc_chan_resources: allocate resources and return the
···
 * @device_prep_dma_interrupt: prepares an end of chain interrupt operation
 * @device_prep_slave_sg: prepares a slave dma operation
 * @device_terminate_all: terminate all pending operations
+ * @device_is_tx_complete: poll for transaction completion
 * @device_issue_pending: push pending transactions to hardware
 */
struct dma_device {
-1
include/linux/hdreg.h
···
 	unsigned short	words69_70[2];	/* reserved words 69-70
 					 * future command overlap and queuing
 					 */
-	/* HDIO_GET_IDENTITY currently returns only words 0 through 70 */
 	unsigned short	words71_74[4];	/* reserved words 71-74
 					 * for IDENTIFY PACKET DEVICE command
 					 */
+1
include/linux/ide.h
···
 	unsigned int	n_ports;
 	struct device	*dev[2];
 	unsigned int	(*init_chipset)(struct pci_dev *);
+	irq_handler_t	irq_handler;
 	unsigned long	host_flags;
 	void		*host_priv;
 	ide_hwif_t	*cur_port;	/* for hosts requiring serialization */
+4 -2
include/linux/libata.h
···
 	 * advised to wait only for the following duration before
 	 * doing SRST.
 	 */
-	ATA_TMOUT_PMP_SRST_WAIT	= 1000,
+	ATA_TMOUT_PMP_SRST_WAIT	= 5000,
 
 	/* ATA bus states */
 	BUS_UNKNOWN		= 0,
···
 	unsigned long		flags;		/* ATA_QCFLAG_xxx */
 	unsigned int		tag;
 	unsigned int		n_elem;
+	unsigned int		orig_n_elem;
 
 	int			dma_dir;
···
 	acpi_handle		acpi_handle;
 	struct ata_acpi_gtm	__acpi_init_gtm; /* use ata_acpi_init_gtm() */
 #endif
-	u8			sector_buf[ATA_SECT_SIZE]; /* owned by EH */
+	/* owned by EH */
+	u8			sector_buf[ATA_SECT_SIZE] ____cacheline_aligned;
 };
 
 /* The following initializer overrides a method to NULL whether one of
+1
include/linux/netdevice.h
···
extern int		register_netdevice_notifier(struct notifier_block *nb);
extern int		unregister_netdevice_notifier(struct notifier_block *nb);
extern int		init_dummy_netdev(struct net_device *dev);
+extern void		netdev_resync_ops(struct net_device *dev);

extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
extern struct net_device	*dev_get_by_index(struct net *net, int ifindex);
+74-34
include/linux/percpu.h
···
#include <linux/slab.h> /* For kmalloc() */
#include <linux/smp.h>
#include <linux/cpumask.h>
+#include <linux/pfn.h>

#include <asm/percpu.h>
···
#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)

-/* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
-#ifndef PERCPU_ENOUGH_ROOM
+/* enough to cover all DEFINE_PER_CPUs in modules */
#ifdef CONFIG_MODULES
-#define PERCPU_MODULE_RESERVE	8192
+#define PERCPU_MODULE_RESERVE	(8 << 10)
#else
-#define PERCPU_MODULE_RESERVE	0
+#define PERCPU_MODULE_RESERVE	0
#endif

+#ifndef PERCPU_ENOUGH_ROOM
#define PERCPU_ENOUGH_ROOM						\
-	(__per_cpu_end - __per_cpu_start + PERCPU_MODULE_RESERVE)
-#endif	/* PERCPU_ENOUGH_ROOM */
+	(ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) +	\
+	 PERCPU_MODULE_RESERVE)
+#endif

/*
 * Must be an lvalue. Since @var must be a simple identifier,
···
#ifdef CONFIG_SMP

+#ifdef CONFIG_HAVE_DYNAMIC_PER_CPU_AREA
+
+/* minimum unit size, also is the maximum supported allocation size */
+#define PCPU_MIN_UNIT_SIZE		PFN_ALIGN(64 << 10)
+
+/*
+ * PERCPU_DYNAMIC_RESERVE indicates the amount of free area to piggy
+ * back on the first chunk for dynamic percpu allocation if arch is
+ * manually allocating and mapping it for faster access (as a part of
+ * large page mapping for example).
+ *
+ * The following values give between one and two pages of free space
+ * after typical minimal boot (2-way SMP, single disk and NIC) with
+ * both defconfig and a distro config on x86_64 and 32.  A more
+ * intelligent way to determine this would be nice.
+ */
+#if BITS_PER_LONG > 32
+#define PERCPU_DYNAMIC_RESERVE		(20 << 10)
+#else
+#define PERCPU_DYNAMIC_RESERVE		(12 << 10)
+#endif
+
+extern void *pcpu_base_addr;
+
+typedef struct page * (*pcpu_get_page_fn_t)(unsigned int cpu, int pageno);
+typedef void (*pcpu_populate_pte_fn_t)(unsigned long addr);
+
+extern size_t __init pcpu_setup_first_chunk(pcpu_get_page_fn_t get_page_fn,
+				size_t static_size, size_t reserved_size,
+				ssize_t unit_size, ssize_t dyn_size,
+				void *base_addr,
+				pcpu_populate_pte_fn_t populate_pte_fn);
+
+/*
+ * Use this to get to a cpu's version of the per-cpu object
+ * dynamically allocated.  Non-atomic access to the current CPU's
+ * version should probably be combined with get_cpu()/put_cpu().
+ */
+#define per_cpu_ptr(ptr, cpu)	SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))
+
+extern void *__alloc_reserved_percpu(size_t size, size_t align);
+
+#else /* CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
+
struct percpu_data {
	void *ptrs[1];
};

#define __percpu_disguise(pdata) (struct percpu_data *)~(unsigned long)(pdata)
-/*
- * Use this to get to a cpu's version of the per-cpu object dynamically
- * allocated.  Non-atomic access to the current CPU's version should
- * probably be combined with get_cpu()/put_cpu().
- */
-#define percpu_ptr(ptr, cpu)					\
-({								\
-	struct percpu_data *__p = __percpu_disguise(ptr);	\
-	(__typeof__(ptr))__p->ptrs[(cpu)];			\
+
+#define per_cpu_ptr(ptr, cpu)					\
+({								\
+	struct percpu_data *__p = __percpu_disguise(ptr);	\
+	(__typeof__(ptr))__p->ptrs[(cpu)];			\
})

-extern void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask);
-extern void percpu_free(void *__pdata);
+#endif /* CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
+
+extern void *__alloc_percpu(size_t size, size_t align);
+extern void free_percpu(void *__pdata);

#else /* CONFIG_SMP */

-#define percpu_ptr(ptr, cpu) ({ (void)(cpu); (ptr); })
+#define per_cpu_ptr(ptr, cpu) ({ (void)(cpu); (ptr); })

-static __always_inline void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask)
+static inline void *__alloc_percpu(size_t size, size_t align)
{
-	return kzalloc(size, gfp);
+	/*
+	 * Can't easily make larger alignment work with kmalloc.  WARN
+	 * on it.  Larger alignment should only be used for module
+	 * percpu sections on SMP for which this path isn't used.
+	 */
+	WARN_ON_ONCE(align > SMP_CACHE_BYTES);
+	return kzalloc(size, GFP_KERNEL);
}

-static inline void percpu_free(void *__pdata)
+static inline void free_percpu(void *p)
{
-	kfree(__pdata);
+	kfree(p);
}

#endif /* CONFIG_SMP */

-#define percpu_alloc_mask(size, gfp, mask) \
-	__percpu_alloc_mask((size), (gfp), &(mask))
-
-#define percpu_alloc(size, gfp) percpu_alloc_mask((size), (gfp), cpu_online_map)
-
-/* (legacy) interface for use without CPU hotplug handling */
-
-#define __alloc_percpu(size)	percpu_alloc_mask((size), GFP_KERNEL, \
-						  cpu_possible_map)
-#define alloc_percpu(type)	(type *)__alloc_percpu(sizeof(type))
-#define free_percpu(ptr)	percpu_free((ptr))
-#define per_cpu_ptr(ptr, cpu)	percpu_ptr((ptr), (cpu))
+#define alloc_percpu(type)	(type *)__alloc_percpu(sizeof(type), \
+						       __alignof__(type))

#endif /* __LINUX_PERCPU_H */
+6
include/linux/rcuclassic.h
···
#define rcu_enter_nohz()	do { } while (0)
#define rcu_exit_nohz()		do { } while (0)

+/* A context switch is a grace period for rcuclassic. */
+static inline int rcu_blocking_is_gp(void)
+{
+	return num_online_cpus() == 1;
+}
+
#endif /* __LINUX_RCUCLASSIC_H */
+4
include/linux/rcupdate.h
···
 	void (*func)(struct rcu_head *head);
};

+/* Internal to kernel, but needed by rcupreempt.h. */
+extern int rcu_scheduler_active;
+
#if defined(CONFIG_CLASSIC_RCU)
#include <linux/rcuclassic.h>
#elif defined(CONFIG_TREE_RCU)
···
/* Internal to kernel */
extern void rcu_init(void);
+extern void rcu_scheduler_starting(void);
extern int rcu_needs_cpu(int cpu);

#endif /* __LINUX_RCUPDATE_H */
+15
include/linux/rcupreempt.h
···
#define rcu_exit_nohz()		do { } while (0)
#endif /* CONFIG_NO_HZ */

+/*
+ * A context switch is a grace period for rcupreempt synchronize_rcu()
+ * only during early boot, before the scheduler has been initialized.
+ * So, how the heck do we get a context switch?  Well, if the caller
+ * invokes synchronize_rcu(), they are willing to accept a context
+ * switch, so we simply pretend that one happened.
+ *
+ * After boot, there might be a blocked or preempted task in an RCU
+ * read-side critical section, so we cannot then take the fastpath.
+ */
+static inline int rcu_blocking_is_gp(void)
+{
+	return num_online_cpus() == 1 && !rcu_scheduler_active;
+}
+
#endif /* __LINUX_RCUPREEMPT_H */
+6
include/linux/rcutree.h
···
}
#endif /* CONFIG_NO_HZ */

+/* A context switch is a grace period for rcutree. */
+static inline int rcu_blocking_is_gp(void)
+{
+	return num_online_cpus() == 1;
+}
+
#endif /* __LINUX_RCUTREE_H */
+4
include/linux/sched.h
···
extern int sched_group_set_rt_period(struct task_group *tg,
				      long rt_period_us);
extern long sched_group_rt_period(struct task_group *tg);
+extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
#endif
#endif

+extern int task_can_switch_user(struct user_struct *up,
+					struct task_struct *tsk);

#ifdef CONFIG_TASK_XACCT
static inline void add_rchar(struct task_struct *tsk, ssize_t amt)
include/linux/vmalloc.h
···
extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
			struct page ***pages);
+extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
+				    pgprot_t prot, struct page **pages);
+extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size);
extern void unmap_kernel_range(unsigned long addr, unsigned long size);

/* Allocate/destroy a 'vmalloc' VM area. */
···
 */
extern rwlock_t vmlist_lock;
extern struct vm_struct *vmlist;
+extern __init void vm_area_register_early(struct vm_struct *vm, size_t align);

#endif /* _LINUX_VMALLOC_H */
+17-10
include/net/net_namespace.h
···
#ifdef CONFIG_NET_NS
extern void __put_net(struct net *net);

-static inline int net_alive(struct net *net)
-{
-	return net && atomic_read(&net->count);
-}
-
static inline struct net *get_net(struct net *net)
{
	atomic_inc(&net->count);
···
	return net1 == net2;
}
#else
-
-static inline int net_alive(struct net *net)
-{
-	return 1;
-}

static inline struct net *get_net(struct net *net)
{
···
	void (*exit)(struct net *net);
};

+/*
+ * Use these carefully.  If you implement a network device and it
+ * needs per network namespace operations use device pernet operations,
+ * otherwise use pernet subsys operations.
+ *
+ * This is critically important.  Most of the network code cleanup
+ * runs with the assumption that dev_remove_pack has been called so no
+ * new packets will arrive during and after the cleanup functions have
+ * been called.  dev_remove_pack is not per namespace so instead the
+ * guarantee of no more packets arriving in a network namespace is
+ * provided by ensuring that all network devices and all sockets have
+ * left the network namespace before the cleanup methods are called.
+ *
+ * For the longest time the ipv4 icmp code was registered as a pernet
+ * device, which caused kernel oopses and panics during network
+ * namespace cleanup.  So please don't get this wrong.
+ */
extern int register_pernet_subsys(struct pernet_operations *);
extern void unregister_pernet_subsys(struct pernet_operations *);
extern int register_pernet_gen_subsys(int *id, struct pernet_operations *);
+15-15
init/Kconfig
···
config SYSCTL
	bool

+config ANON_INODES
+	bool
+
menuconfig EMBEDDED
	bool "Configure standard kernel features (for small systems)"
	help
···
	  This option allows to disable the internal PC-Speaker
	  support, saving some memory.

-config COMPAT_BRK
-	bool "Disable heap randomization"
-	default y
-	help
-	  Randomizing heap placement makes heap exploits harder, but it
-	  also breaks ancient binaries (including anything libc5 based).
-	  This option changes the bootup default to heap randomization
-	  disabled, and can be overriden runtime by setting
-	  /proc/sys/kernel/randomize_va_space to 2.
-
-	  On non-ancient distros (post-2000 ones) N is usually a safe choice.
-
config BASE_FULL
	default y
	bool "Enable full-sized data structures for core" if EMBEDDED
···
	  Disabling this option will cause the kernel to be built without
	  support for "fast userspace mutexes".  The resulting kernel may not
	  run glibc-based applications correctly.
-
-config ANON_INODES
-	bool

config EPOLL
	bool "Enable eventpoll support" if EMBEDDED
···
	  result in significant savings in code size.  This also disables
	  SLUB sysfs support. /sys/slab will not exist and there will be
	  no support for cache validation etc.
+
+config COMPAT_BRK
+	bool "Disable heap randomization"
+	default y
+	help
+	  Randomizing heap placement makes heap exploits harder, but it
+	  also breaks ancient binaries (including anything libc5 based).
+	  This option changes the bootup default to heap randomization
+	  disabled, and can be overridden at runtime by setting
+	  /proc/sys/kernel/randomize_va_space to 2.
+
+	  On non-ancient distros (post-2000 ones) N is usually a safe choice.

choice
	prompt "Choose SLAB allocator"
+2-1
init/main.c
···
extern void tc_init(void);
#endif

-enum system_states system_state;
+enum system_states system_state __read_mostly;
EXPORT_SYMBOL(system_state);

/*
···
	 * at least once to get things moving:
	 */
	init_idle_bootup_task(current);
+	rcu_scheduler_starting();
	preempt_enable_no_resched();
	schedule();
	preempt_disable();
+5-6
kernel/fork.c
···
#endif
	clear_all_latency_tracing(p);

-	/* Our parent execution domain becomes current domain
-	   These must match for thread signalling to apply */
-	p->parent_exec_id = p->self_exec_id;
-
	/* ok, now we should be set up.. */
	p->exit_signal = (clone_flags & CLONE_THREAD) ? -1 : (clone_flags & CSIGNAL);
	p->pdeath_signal = 0;
···
	set_task_cpu(p, smp_processor_id());

	/* CLONE_PARENT re-uses the old parent */
-	if (clone_flags & (CLONE_PARENT|CLONE_THREAD))
+	if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) {
		p->real_parent = current->real_parent;
-	else
+		p->parent_exec_id = current->parent_exec_id;
+	} else {
		p->real_parent = current;
+		p->parent_exec_id = current->self_exec_id;
+	}

	spin_lock(&current->sighand->siglock);
+49-15
kernel/module.c
···
#include <linux/tracepoint.h>
#include <linux/ftrace.h>
#include <linux/async.h>
+#include <linux/percpu.h>

#if 0
#define DEBUGP printk
···
}

#ifdef CONFIG_SMP
+
+#ifdef CONFIG_HAVE_DYNAMIC_PER_CPU_AREA
+
+static void *percpu_modalloc(unsigned long size, unsigned long align,
+			     const char *name)
+{
+	void *ptr;
+
+	if (align > PAGE_SIZE) {
+		printk(KERN_WARNING "%s: per-cpu alignment %li > %li\n",
+		       name, align, PAGE_SIZE);
+		align = PAGE_SIZE;
+	}
+
+	ptr = __alloc_reserved_percpu(size, align);
+	if (!ptr)
+		printk(KERN_WARNING
+		       "Could not allocate %lu bytes percpu data\n", size);
+	return ptr;
+}
+
+static void percpu_modfree(void *freeme)
+{
+	free_percpu(freeme);
+}
+
+#else /* ... !CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
+
/* Number of blocks used and allocated. */
static unsigned int pcpu_num_used, pcpu_num_allocated;
/* Size of each block.  -ve means used. */
···
	}
}

-static unsigned int find_pcpusec(Elf_Ehdr *hdr,
-				 Elf_Shdr *sechdrs,
-				 const char *secstrings)
-{
-	return find_sec(hdr, sechdrs, secstrings, ".data.percpu");
-}
-
-static void percpu_modcopy(void *pcpudest, const void *from, unsigned long size)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		memcpy(pcpudest + per_cpu_offset(cpu), from, size);
-}
-
static int percpu_modinit(void)
{
	pcpu_num_used = 2;
···
	return 0;
}
__initcall(percpu_modinit);
+
+#endif /* CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
+
+static unsigned int find_pcpusec(Elf_Ehdr *hdr,
+				 Elf_Shdr *sechdrs,
+				 const char *secstrings)
+{
+	return find_sec(hdr, sechdrs, secstrings, ".data.percpu");
+}
+
+static void percpu_modcopy(void *pcpudest, const void *from, unsigned long size)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		memcpy(pcpudest + per_cpu_offset(cpu), from, size);
+}
+
#else /* ... !CONFIG_SMP */
+
static inline void *percpu_modalloc(unsigned long size, unsigned long align,
				    const char *name)
{
···
	/* pcpusec should be 0, and size of that section should be 0. */
	BUG_ON(size != 0);
}
+
#endif /* CONFIG_SMP */

#define MODINFO_ATTR(field)	\
+2-2
kernel/rcuclassic.c
···
void rcu_check_callbacks(int cpu, int user)
{
	if (user ||
-	    (idle_cpu(cpu) && !in_softirq() &&
-				hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
+	    (idle_cpu(cpu) && rcu_scheduler_active &&
+	     !in_softirq() && hardirq_count() <= (1 << HARDIRQ_SHIFT))) {

		/*
		 * Get here if this CPU took its interrupt from user
+12
kernel/rcupdate.c
···
#include <linux/cpu.h>
#include <linux/mutex.h>
#include <linux/module.h>
+#include <linux/kernel_stat.h>

enum rcu_barrier {
	RCU_BARRIER_STD,
···
static atomic_t rcu_barrier_cpu_count;
static DEFINE_MUTEX(rcu_barrier_mutex);
static struct completion rcu_barrier_completion;
+int rcu_scheduler_active __read_mostly;

/*
 * Awaken the corresponding synchronize_rcu() instance now that a
···
void synchronize_rcu(void)
{
	struct rcu_synchronize rcu;
+
+	if (rcu_blocking_is_gp())
+		return;
+
	init_completion(&rcu.completion);
	/* Will wake me after RCU finished. */
	call_rcu(&rcu.head, wakeme_after_rcu);
···
	__rcu_init();
}

+void rcu_scheduler_starting(void)
+{
+	WARN_ON(num_online_cpus() != 1);
+	WARN_ON(nr_context_switches() > 0);
+	rcu_scheduler_active = 1;
+}
+3
kernel/rcupreempt.c
···
{
	struct rcu_synchronize rcu;

+	if (num_online_cpus() == 1)
+		return;  /* blocking is gp if only one CPU! */
+
	init_completion(&rcu.completion);
	/* Will wake me after RCU finished. */
	call_rcu_sched(&rcu.head, wakeme_after_rcu);
+2-2
kernel/rcutree.c
···
void rcu_check_callbacks(int cpu, int user)
{
	if (user ||
-	    (idle_cpu(cpu) && !in_softirq() &&
-				hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
+	    (idle_cpu(cpu) && rcu_scheduler_active &&
+	     !in_softirq() && hardirq_count() <= (1 << HARDIRQ_SHIFT))) {

		/*
		 * Get here if this CPU took its interrupt from user
+15-6
kernel/sched.c
···
{
	ktime_t now;

-	if (rt_bandwidth_enabled() && rt_b->rt_runtime == RUNTIME_INF)
+	if (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF)
		return;

	if (hrtimer_active(&rt_b->rt_period_timer))
···
	return ret;
}
+
+int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+{
+	/* Don't accept realtime tasks when there is no way for them to run */
+	if (rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+		return 0;
+
+	return 1;
+}
+
#else /* !CONFIG_RT_GROUP_SCHED */
static int sched_rt_global_constraints(void)
{
···
		      struct task_struct *tsk)
{
#ifdef CONFIG_RT_GROUP_SCHED
-	/* Don't accept realtime tasks when there is no way for them to run */
-	if (rt_task(tsk) && cgroup_tg(cgrp)->rt_bandwidth.rt_runtime == 0)
+	if (!sched_rt_can_attach(cgroup_tg(cgrp), tsk))
		return -EINVAL;
#else
	/* We don't support RT-tasks being in separate groups */
···
static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu)
{
-	u64 *cpuusage = percpu_ptr(ca->cpuusage, cpu);
+	u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
	u64 data;

#ifndef CONFIG_64BIT
···
static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val)
{
-	u64 *cpuusage = percpu_ptr(ca->cpuusage, cpu);
+	u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);

#ifndef CONFIG_64BIT
	/*
···
	ca = task_ca(tsk);

	for (; ca; ca = ca->parent) {
-		u64 *cpuusage = percpu_ptr(ca->cpuusage, cpu);
+		u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
		*cpuusage += cputime;
	}
}
kernel/user.c
···
/* work function to remove sysfs directory for a user and free up
 * corresponding structures.
 */
-static void remove_user_sysfs_dir(struct work_struct *w)
+static void cleanup_user_struct(struct work_struct *w)
{
	struct user_struct *up = container_of(w, struct user_struct, work);
	unsigned long flags;
	int remove_user = 0;

-	if (up->user_ns != &init_user_ns)
-		return;
	/* Make uid_hash_remove() + sysfs_remove_file() + kobject_del()
	 * atomic.
	 */
···
	if (!remove_user)
		goto done;

-	kobject_uevent(&up->kobj, KOBJ_REMOVE);
-	kobject_del(&up->kobj);
-	kobject_put(&up->kobj);
+	if (up->user_ns == &init_user_ns) {
+		kobject_uevent(&up->kobj, KOBJ_REMOVE);
+		kobject_del(&up->kobj);
+		kobject_put(&up->kobj);
+	}

	sched_destroy_user(up);
	key_put(up->uid_keyring);
···
	atomic_inc(&up->__count);
	spin_unlock_irqrestore(&uidhash_lock, flags);

-	INIT_WORK(&up->work, remove_user_sysfs_dir);
+	INIT_WORK(&up->work, cleanup_user_struct);
	schedule_work(&up->work);
}
···
	kmem_cache_free(uid_cachep, up);
}

+#endif
+
+#if defined(CONFIG_RT_GROUP_SCHED) && defined(CONFIG_USER_SCHED)
+/*
+ * We need to check if a setuid can take place.  This function should be
+ * called before successfully completing the setuid.
+ */
+int task_can_switch_user(struct user_struct *up, struct task_struct *tsk)
+{
+	return sched_rt_can_attach(up->tg, tsk);
+}
+#else
+int task_can_switch_user(struct user_struct *up, struct task_struct *tsk)
+{
+	return 1;
+}
#endif

/*
mm/allocpercpu.c
···
	__percpu_populate_mask((__pdata), (size), (gfp), &(mask))

/**
- * percpu_alloc_mask - initial setup of per-cpu data
+ * alloc_percpu - initial setup of per-cpu data
 * @size: size of per-cpu object
- * @gfp: may sleep or not etc.
- * @mask: populate per-data for cpu's selected through mask bits
+ * @align: alignment
 *
- * Populating per-cpu data for all online cpu's would be a typical use case,
- * which is simplified by the percpu_alloc() wrapper.
- * Per-cpu objects are populated with zeroed buffers.
+ * Allocate dynamic percpu area.  Percpu objects are populated with
+ * zeroed buffers.
 */
-void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask)
+void *__alloc_percpu(size_t size, size_t align)
{
	/*
	 * We allocate whole cache lines to avoid false sharing
	 */
	size_t sz = roundup(nr_cpu_ids * sizeof(void *), cache_line_size());
-	void *pdata = kzalloc(sz, gfp);
+	void *pdata = kzalloc(sz, GFP_KERNEL);
	void *__pdata = __percpu_disguise(pdata);
+
+	/*
+	 * Can't easily make larger alignment work with kmalloc.  WARN
+	 * on it.  Larger alignment should only be used for module
+	 * percpu sections on SMP for which this path isn't used.
+	 */
+	WARN_ON_ONCE(align > __alignof__(unsigned long long));

	if (unlikely(!pdata))
		return NULL;
-	if (likely(!__percpu_populate_mask(__pdata, size, gfp, mask)))
+	if (likely(!__percpu_populate_mask(__pdata, size, GFP_KERNEL,
+					   &cpu_possible_map)))
		return __pdata;
	kfree(pdata);
	return NULL;
}
-EXPORT_SYMBOL_GPL(__percpu_alloc_mask);
+EXPORT_SYMBOL_GPL(__alloc_percpu);

/**
- * percpu_free - final cleanup of per-cpu data
+ * free_percpu - final cleanup of per-cpu data
 * @__pdata: object to clean up
 *
 * We simply clean up any per-cpu object left.  No need for the client to
 * track and specify through a bit mask which per-cpu objects are to free.
 */
-void percpu_free(void *__pdata)
+void free_percpu(void *__pdata)
{
	if (unlikely(!__pdata))
		return;
	__percpu_depopulate_mask(__pdata, &cpu_possible_map);
	kfree(__percpu_disguise(__pdata));
}
-EXPORT_SYMBOL_GPL(percpu_free);
+EXPORT_SYMBOL_GPL(free_percpu);
+29-6
mm/bootmem.c
···
	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
}

-#ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
/**
 * reserve_bootmem - mark a page range as usable
 * @addr: starting address of the range
···

	return mark_bootmem(start, end, 1, flags);
}
-#endif /* !CONFIG_HAVE_ARCH_BOOTMEM_NODE */

static unsigned long align_idx(struct bootmem_data *bdata, unsigned long idx,
			       unsigned long step)
···
}

static void * __init alloc_bootmem_core(struct bootmem_data *bdata,
-				unsigned long size, unsigned long align,
-				unsigned long goal, unsigned long limit)
+					unsigned long size, unsigned long align,
+					unsigned long goal, unsigned long limit)
{
	unsigned long fallback = 0;
	unsigned long min, max, start, sidx, midx, step;
···
	return NULL;
}

+static void * __init alloc_arch_preferred_bootmem(bootmem_data_t *bdata,
+					unsigned long size, unsigned long align,
+					unsigned long goal, unsigned long limit)
+{
+#ifdef CONFIG_HAVE_ARCH_BOOTMEM
+	bootmem_data_t *p_bdata;
+
+	p_bdata = bootmem_arch_preferred_node(bdata, size, align, goal, limit);
+	if (p_bdata)
+		return alloc_bootmem_core(p_bdata, size, align, goal, limit);
+#endif
+	return NULL;
+}
+
static void * __init ___alloc_bootmem_nopanic(unsigned long size,
					unsigned long align,
					unsigned long goal,
					unsigned long limit)
{
	bootmem_data_t *bdata;
+	void *region;

restart:
-	list_for_each_entry(bdata, &bdata_list, list) {
-		void *region;
+	region = alloc_arch_preferred_bootmem(NULL, size, align, goal, limit);
+	if (region)
+		return region;

+	list_for_each_entry(bdata, &bdata_list, list) {
		if (goal && bdata->node_low_pfn <= PFN_DOWN(goal))
			continue;
		if (limit && bdata->node_min_pfn >= PFN_DOWN(limit))
···
{
	void *ptr;

+	ptr = alloc_arch_preferred_bootmem(bdata, size, align, goal, limit);
+	if (ptr)
+		return ptr;
+
	ptr = alloc_bootmem_core(bdata, size, align, goal, limit);
	if (ptr)
		return ptr;
···
			unsigned long align, unsigned long goal)
{
	void *ptr;
+
+	ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
+	if (ptr)
+		return ptr;

	ptr = alloc_bootmem_core(pgdat->bdata, size, align, goal, 0);
	if (ptr)
+1226
mm/percpu.c
···
+/*
+ * linux/mm/percpu.c - percpu memory allocator
+ *
+ * Copyright (C) 2009		SUSE Linux Products GmbH
+ * Copyright (C) 2009		Tejun Heo <tj@kernel.org>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This is a percpu allocator which can handle both static and dynamic
+ * areas.  Percpu areas are allocated in chunks in vmalloc area.  Each
+ * chunk consists of num_possible_cpus() units and the first chunk is
+ * used for static percpu variables in the kernel image (special boot
+ * time alloc/init handling necessary as these areas need to be
+ * brought up before allocation services are running).  Unit grows as
+ * necessary and all units grow or shrink in unison.  When a chunk is
+ * filled up, another chunk is allocated.  ie. in vmalloc area
+ *
+ *  c0                           c1                         c2
+ *  -------------------          -------------------        ------------
+ * | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
+ *  -------------------  ......  -------------------  ....  ------------
+ *
+ * Allocation is done in offset-size areas of single unit space.  Ie,
+ * an area of 512 bytes at 6k in c1 occupies 512 bytes at 6k of c1:u0,
+ * c1:u1, c1:u2 and c1:u3.  Percpu access can be done by configuring
+ * percpu base registers UNIT_SIZE apart.
+ *
+ * There are usually many small percpu allocations, many of them as
+ * small as 4 bytes.  The allocator organizes chunks into lists
+ * according to free size and tries to allocate from the fullest one.
+ * Each chunk keeps the maximum contiguous area size hint which is
+ * guaranteed to be equal to or larger than the maximum contiguous
+ * area in the chunk.  This helps the allocator not to iterate the
+ * chunk maps unnecessarily.
+ *
+ * Allocation state in each chunk is kept using an array of integers
+ * on chunk->map.  A positive value in the map represents a free
+ * region and negative allocated.  Allocation inside a chunk is done
+ * by scanning this map sequentially and serving the first matching
+ * entry.  This is mostly copied from the percpu_modalloc() allocator.
+ * Chunks are also linked into a rb tree to ease address to chunk
+ * mapping during free.
+ *
+ * To use this allocator, arch code should do the following.
+ *
+ * - define CONFIG_HAVE_DYNAMIC_PER_CPU_AREA
+ *
+ * - define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() to translate
+ *   regular address to percpu pointer and back
+ *
+ * - use pcpu_setup_first_chunk() during percpu area initialization to
+ *   setup the first chunk containing the kernel static percpu area
+ */
+
+#include <linux/bitmap.h>
+#include <linux/bootmem.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <linux/pfn.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/vmalloc.h>
+#include <linux/workqueue.h>
+
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+
+#define PCPU_SLOT_BASE_SHIFT		5	/* 1-31 shares the same slot */
+#define PCPU_DFL_MAP_ALLOC		16	/* start a map with 16 ents */
+
+struct pcpu_chunk {
+	struct list_head	list;		/* linked to pcpu_slot lists */
+	struct rb_node		rb_node;	/* key is chunk->vm->addr */
+	int			free_size;	/* free bytes in the chunk */
+	int			contig_hint;	/* max contiguous size hint */
+	struct vm_struct	*vm;		/* mapped vmalloc region */
+	int			map_used;	/* # of map entries used */
+	int			map_alloc;	/* # of map entries allocated */
+	int			*map;		/* allocation map */
+	bool			immutable;	/* no [de]population allowed */
+	struct page		**page;		/* points to page array */
+	struct page		*page_ar[];	/* #cpus * UNIT_PAGES */
+};
+
+static int pcpu_unit_pages __read_mostly;
+static int pcpu_unit_size __read_mostly;
+static int pcpu_chunk_size __read_mostly;
+static int pcpu_nr_slots __read_mostly;
+static size_t pcpu_chunk_struct_size __read_mostly;
+
+/* the address of the first chunk which starts with the kernel static area */
+void *pcpu_base_addr __read_mostly;
+EXPORT_SYMBOL_GPL(pcpu_base_addr);
+
+/* optional reserved chunk, only accessible for reserved allocations */
+static struct pcpu_chunk *pcpu_reserved_chunk;
+/* offset limit of the reserved chunk */
+static int pcpu_reserved_chunk_limit;
+
+/*
+ * Synchronization rules.
+ *
+ * There are two locks - pcpu_alloc_mutex and pcpu_lock.  The former
+ * protects allocation/reclaim paths, chunks and chunk->page arrays.
+ * The latter is a spinlock and protects the index data structures -
+ * chunk slots, rbtree, chunks and area maps in chunks.
+ *
+ * During allocation, pcpu_alloc_mutex is kept locked all the time and
+ * pcpu_lock is grabbed and released as necessary.  All actual memory
+ * allocations are done using GFP_KERNEL with pcpu_lock released.
+ *
+ * Free path accesses and alters only the index data structures, so it
+ * can be safely called from atomic context.  When memory needs to be
+ * returned to the system, free path schedules reclaim_work which
+ * grabs both pcpu_alloc_mutex and pcpu_lock, unlinks chunks to be
+ * reclaimed, releases both locks and frees the chunks.  Note that it's
+ * necessary to grab both locks to remove a chunk from circulation as
+ * allocation path might be referencing the chunk with only
+ * pcpu_alloc_mutex locked.
+ */
+static DEFINE_MUTEX(pcpu_alloc_mutex);	/* protects whole alloc and reclaim */
+static DEFINE_SPINLOCK(pcpu_lock);	/* protects index data structures */
+
+static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
+static struct rb_root pcpu_addr_root = RB_ROOT;	/* chunks by address */
+
+/* reclaim work to release fully free chunks, scheduled from free path */
+static void pcpu_reclaim(struct work_struct *work);
+static DECLARE_WORK(pcpu_reclaim_work, pcpu_reclaim);
+
+static int __pcpu_size_to_slot(int size)
+{
+	int highbit = fls(size);	/* size is in bytes */
+	return max(highbit - PCPU_SLOT_BASE_SHIFT + 2, 1);
+}
+
+static int pcpu_size_to_slot(int size)
+{
+	if (size == pcpu_unit_size)
+		return pcpu_nr_slots - 1;
+	return __pcpu_size_to_slot(size);
+}
+
+static int pcpu_chunk_slot(const struct pcpu_chunk *chunk)
+{
+	if (chunk->free_size < sizeof(int) || chunk->contig_hint < sizeof(int))
+		return 0;
+
+	return pcpu_size_to_slot(chunk->free_size);
+}
+
+static int pcpu_page_idx(unsigned int cpu, int page_idx)
+{
+	return cpu * pcpu_unit_pages + page_idx;
+}
+
+static struct page **pcpu_chunk_pagep(struct pcpu_chunk *chunk,
+				      unsigned int cpu, int page_idx)
+{
+	return &chunk->page[pcpu_page_idx(cpu, page_idx)];
+}
+
+static unsigned long pcpu_chunk_addr(struct pcpu_chunk *chunk,
+				     unsigned int cpu, int page_idx)
+{
+	return (unsigned long)chunk->vm->addr +
+		(pcpu_page_idx(cpu, page_idx) << PAGE_SHIFT);
+}
+
+static bool pcpu_chunk_page_occupied(struct pcpu_chunk *chunk,
+				     int page_idx)
+{
+	return *pcpu_chunk_pagep(chunk, 0, page_idx) != NULL;
+}
+
+/**
+ * pcpu_mem_alloc - allocate memory
+ * @size: bytes to allocate
+ *
+ * Allocate @size bytes.  If @size is smaller than PAGE_SIZE,
+ * kzalloc() is used; otherwise, vmalloc() is used.  The returned
+ * memory is always zeroed.
+ *
+ * CONTEXT:
+ * Does GFP_KERNEL allocation.
+ *
+ * RETURNS:
+ * Pointer to the allocated area on success, NULL on failure.
+ */
+static void *pcpu_mem_alloc(size_t size)
+{
+	if (size <= PAGE_SIZE)
+		return kzalloc(size, GFP_KERNEL);
+	else {
+		void *ptr = vmalloc(size);
+		if (ptr)
+			memset(ptr, 0, size);
+		return ptr;
+	}
+}
+
+/**
+ * pcpu_mem_free - free memory
+ * @ptr: memory to free
+ * @size: size of the area
+ *
+ * Free @ptr.  @ptr should have been allocated using pcpu_mem_alloc().
+ */
+static void pcpu_mem_free(void *ptr, size_t size)
+{
+	if (size <= PAGE_SIZE)
+		kfree(ptr);
+	else
+		vfree(ptr);
+}
+
+/**
+ * pcpu_chunk_relocate - put chunk in the appropriate chunk slot
+ * @chunk: chunk of interest
+ * @oslot: the previous slot it was on
+ *
+ * This function is called after an allocation or free changed @chunk.
+ * New slot according to the changed state is determined and @chunk is
+ * moved to the slot.
Note that the reserved chunk is never put on229229+ * chunk slots.230230+ *231231+ * CONTEXT:232232+ * pcpu_lock.233233+ */234234+static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)235235+{236236+ int nslot = pcpu_chunk_slot(chunk);237237+238238+ if (chunk != pcpu_reserved_chunk && oslot != nslot) {239239+ if (oslot < nslot)240240+ list_move(&chunk->list, &pcpu_slot[nslot]);241241+ else242242+ list_move_tail(&chunk->list, &pcpu_slot[nslot]);243243+ }244244+}245245+246246+static struct rb_node **pcpu_chunk_rb_search(void *addr,247247+ struct rb_node **parentp)248248+{249249+ struct rb_node **p = &pcpu_addr_root.rb_node;250250+ struct rb_node *parent = NULL;251251+ struct pcpu_chunk *chunk;252252+253253+ while (*p) {254254+ parent = *p;255255+ chunk = rb_entry(parent, struct pcpu_chunk, rb_node);256256+257257+ if (addr < chunk->vm->addr)258258+ p = &(*p)->rb_left;259259+ else if (addr > chunk->vm->addr)260260+ p = &(*p)->rb_right;261261+ else262262+ break;263263+ }264264+265265+ if (parentp)266266+ *parentp = parent;267267+ return p;268268+}269269+270270+/**271271+ * pcpu_chunk_addr_search - search for chunk containing specified address272272+ * @addr: address to search for273273+ *274274+ * Look for chunk which might contain @addr. More specifically, it275275+ * searches for the chunk with the highest start address which isn't276276+ * beyond @addr.277277+ *278278+ * CONTEXT:279279+ * pcpu_lock.280280+ *281281+ * RETURNS:282282+ * The address of the found chunk.283283+ */284284+static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)285285+{286286+ struct rb_node *n, *parent;287287+ struct pcpu_chunk *chunk;288288+289289+ /* is it in the reserved chunk? */290290+ if (pcpu_reserved_chunk) {291291+ void *start = pcpu_reserved_chunk->vm->addr;292292+293293+ if (addr >= start && addr < start + pcpu_reserved_chunk_limit)294294+ return pcpu_reserved_chunk;295295+ }296296+297297+ /* nah...
search the regular ones */298298+ n = *pcpu_chunk_rb_search(addr, &parent);299299+ if (!n) {300300+ /* no exactly matching chunk, the parent is the closest */301301+ n = parent;302302+ BUG_ON(!n);303303+ }304304+ chunk = rb_entry(n, struct pcpu_chunk, rb_node);305305+306306+ if (addr < chunk->vm->addr) {307307+ /* the parent was the next one, look for the previous one */308308+ n = rb_prev(n);309309+ BUG_ON(!n);310310+ chunk = rb_entry(n, struct pcpu_chunk, rb_node);311311+ }312312+313313+ return chunk;314314+}315315+316316+/**317317+ * pcpu_chunk_addr_insert - insert chunk into address rb tree318318+ * @new: chunk to insert319319+ *320320+ * Insert @new into address rb tree.321321+ *322322+ * CONTEXT:323323+ * pcpu_lock.324324+ */325325+static void pcpu_chunk_addr_insert(struct pcpu_chunk *new)326326+{327327+ struct rb_node **p, *parent;328328+329329+ p = pcpu_chunk_rb_search(new->vm->addr, &parent);330330+ BUG_ON(*p);331331+ rb_link_node(&new->rb_node, parent, p);332332+ rb_insert_color(&new->rb_node, &pcpu_addr_root);333333+}334334+335335+/**336336+ * pcpu_extend_area_map - extend area map for allocation337337+ * @chunk: target chunk338338+ *339339+ * Extend area map of @chunk so that it can accommodate an allocation.340340+ * A single allocation can split an area into three areas, so this341341+ * function makes sure that @chunk->map has at least two extra slots.342342+ *343343+ * CONTEXT:344344+ * pcpu_alloc_mutex, pcpu_lock. pcpu_lock is released and reacquired345345+ * if area map is extended.346346+ *347347+ * RETURNS:348348+ * 0 if noop, 1 if successfully extended, -errno on failure.349349+ */350350+static int pcpu_extend_area_map(struct pcpu_chunk *chunk)351351+{352352+ int new_alloc;353353+ int *new;354354+ size_t size;355355+356356+ /* has enough?
*/357357+ if (chunk->map_alloc >= chunk->map_used + 2)358358+ return 0;359359+360360+ spin_unlock_irq(&pcpu_lock);361361+362362+ new_alloc = PCPU_DFL_MAP_ALLOC;363363+ while (new_alloc < chunk->map_used + 2)364364+ new_alloc *= 2;365365+366366+ new = pcpu_mem_alloc(new_alloc * sizeof(new[0]));367367+ if (!new) {368368+ spin_lock_irq(&pcpu_lock);369369+ return -ENOMEM;370370+ }371371+372372+ /*373373+ * Acquire pcpu_lock and switch to new area map. Only frees374374+ * could have happened in between, so map_used couldn't have375375+ * grown.376376+ */377377+ spin_lock_irq(&pcpu_lock);378378+ BUG_ON(new_alloc < chunk->map_used + 2);379379+380380+ size = chunk->map_alloc * sizeof(chunk->map[0]);381381+ memcpy(new, chunk->map, size);382382+383383+ /*384384+ * map_alloc < PCPU_DFL_MAP_ALLOC indicates that the chunk is385385+ * one of the first chunks and still using static map.386386+ */387387+ if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)388388+ pcpu_mem_free(chunk->map, size);389389+390390+ chunk->map_alloc = new_alloc;391391+ chunk->map = new;392392+ return 0;393393+}394394+395395+/**396396+ * pcpu_split_block - split a map block397397+ * @chunk: chunk of interest398398+ * @i: index of map block to split399399+ * @head: head size in bytes (can be 0)400400+ * @tail: tail size in bytes (can be 0)401401+ *402402+ * Split the @i'th map block into two or three blocks.
If @head is403403+ * non-zero, a @head bytes block is inserted before block @i, moving it404404+ * to @i+1 and reducing its size by @head bytes.405405+ *406406+ * If @tail is non-zero, the target block, which can be @i or @i+1407407+ * depending on @head, is reduced by @tail bytes and a @tail bytes block408408+ * is inserted after the target block.409409+ *410410+ * @chunk->map must have enough free slots to accommodate the split.411411+ *412412+ * CONTEXT:413413+ * pcpu_lock.414414+ */415415+static void pcpu_split_block(struct pcpu_chunk *chunk, int i,416416+ int head, int tail)417417+{418418+ int nr_extra = !!head + !!tail;419419+420420+ BUG_ON(chunk->map_alloc < chunk->map_used + nr_extra);421421+422422+ /* insert new subblocks */423423+ memmove(&chunk->map[i + nr_extra], &chunk->map[i],424424+ sizeof(chunk->map[0]) * (chunk->map_used - i));425425+ chunk->map_used += nr_extra;426426+427427+ if (head) {428428+ chunk->map[i + 1] = chunk->map[i] - head;429429+ chunk->map[i++] = head;430430+ }431431+ if (tail) {432432+ chunk->map[i++] -= tail;433433+ chunk->map[i] = tail;434434+ }435435+}436436+437437+/**438438+ * pcpu_alloc_area - allocate area from a pcpu_chunk439439+ * @chunk: chunk of interest440440+ * @size: wanted size in bytes441441+ * @align: wanted align442442+ *443443+ * Try to allocate @size bytes area aligned at @align from @chunk.444444+ * Note that this function only allocates the offset.
It doesn't445445+ * populate or map the area.446446+ *447447+ * @chunk->map must have at least two free slots.448448+ *449449+ * CONTEXT:450450+ * pcpu_lock.451451+ *452452+ * RETURNS:453453+ * Allocated offset in @chunk on success, -1 if no matching area is454454+ * found.455455+ */456456+static int pcpu_alloc_area(struct pcpu_chunk *chunk, int size, int align)457457+{458458+ int oslot = pcpu_chunk_slot(chunk);459459+ int max_contig = 0;460460+ int i, off;461461+462462+ for (i = 0, off = 0; i < chunk->map_used; off += abs(chunk->map[i++])) {463463+ bool is_last = i + 1 == chunk->map_used;464464+ int head, tail;465465+466466+ /* extra for alignment requirement */467467+ head = ALIGN(off, align) - off;468468+ BUG_ON(i == 0 && head != 0);469469+470470+ if (chunk->map[i] < 0)471471+ continue;472472+ if (chunk->map[i] < head + size) {473473+ max_contig = max(chunk->map[i], max_contig);474474+ continue;475475+ }476476+477477+ /*478478+ * If head is small or the previous block is free,479479+ * merge'em. 
Note that 'small' is defined as smaller480480+ * than sizeof(int), which is very small but isn't too481481+ * uncommon for percpu allocations.482482+ */483483+ if (head && (head < sizeof(int) || chunk->map[i - 1] > 0)) {484484+ if (chunk->map[i - 1] > 0)485485+ chunk->map[i - 1] += head;486486+ else {487487+ chunk->map[i - 1] -= head;488488+ chunk->free_size -= head;489489+ }490490+ chunk->map[i] -= head;491491+ off += head;492492+ head = 0;493493+ }494494+495495+ /* if tail is small, just keep it around */496496+ tail = chunk->map[i] - head - size;497497+ if (tail < sizeof(int))498498+ tail = 0;499499+500500+ /* split if warranted */501501+ if (head || tail) {502502+ pcpu_split_block(chunk, i, head, tail);503503+ if (head) {504504+ i++;505505+ off += head;506506+ max_contig = max(chunk->map[i - 1], max_contig);507507+ }508508+ if (tail)509509+ max_contig = max(chunk->map[i + 1], max_contig);510510+ }511511+512512+ /* update hint and mark allocated */513513+ if (is_last)514514+ chunk->contig_hint = max_contig; /* fully scanned */515515+ else516516+ chunk->contig_hint = max(chunk->contig_hint,517517+ max_contig);518518+519519+ chunk->free_size -= chunk->map[i];520520+ chunk->map[i] = -chunk->map[i];521521+522522+ pcpu_chunk_relocate(chunk, oslot);523523+ return off;524524+ }525525+526526+ chunk->contig_hint = max_contig; /* fully scanned */527527+ pcpu_chunk_relocate(chunk, oslot);528528+529529+ /* tell the upper layer that this chunk has no matching area */530530+ return -1;531531+}532532+533533+/**534534+ * pcpu_free_area - free area to a pcpu_chunk535535+ * @chunk: chunk of interest536536+ * @freeme: offset of area to free537537+ *538538+ * Free area starting from @freeme to @chunk. Note that this function539539+ * only modifies the allocation map. 
It doesn't depopulate or unmap540540+ * the area.541541+ *542542+ * CONTEXT:543543+ * pcpu_lock.544544+ */545545+static void pcpu_free_area(struct pcpu_chunk *chunk, int freeme)546546+{547547+ int oslot = pcpu_chunk_slot(chunk);548548+ int i, off;549549+550550+ for (i = 0, off = 0; i < chunk->map_used; off += abs(chunk->map[i++]))551551+ if (off == freeme)552552+ break;553553+ BUG_ON(off != freeme);554554+ BUG_ON(chunk->map[i] > 0);555555+556556+ chunk->map[i] = -chunk->map[i];557557+ chunk->free_size += chunk->map[i];558558+559559+ /* merge with previous? */560560+ if (i > 0 && chunk->map[i - 1] >= 0) {561561+ chunk->map[i - 1] += chunk->map[i];562562+ chunk->map_used--;563563+ memmove(&chunk->map[i], &chunk->map[i + 1],564564+ (chunk->map_used - i) * sizeof(chunk->map[0]));565565+ i--;566566+ }567567+ /* merge with next? */568568+ if (i + 1 < chunk->map_used && chunk->map[i + 1] >= 0) {569569+ chunk->map[i] += chunk->map[i + 1];570570+ chunk->map_used--;571571+ memmove(&chunk->map[i + 1], &chunk->map[i + 2],572572+ (chunk->map_used - (i + 1)) * sizeof(chunk->map[0]));573573+ }574574+575575+ chunk->contig_hint = max(chunk->map[i], chunk->contig_hint);576576+ pcpu_chunk_relocate(chunk, oslot);577577+}578578+579579+/**580580+ * pcpu_unmap - unmap pages out of a pcpu_chunk581581+ * @chunk: chunk of interest582582+ * @page_start: page index of the first page to unmap583583+ * @page_end: page index of the last page to unmap + 1584584+ * @flush: whether to flush cache and tlb or not585585+ *586586+ * For each cpu, unmap pages [@page_start,@page_end) out of @chunk.587587+ * If @flush is true, vcache is flushed before unmapping and tlb588588+ * after.589589+ */590590+static void pcpu_unmap(struct pcpu_chunk *chunk, int page_start, int page_end,591591+ bool flush)592592+{593593+ unsigned int last = num_possible_cpus() - 1;594594+ unsigned int cpu;595595+596596+ /* unmap must not be done on immutable chunk */597597+ WARN_ON(chunk->immutable);598598+599599+ /*600600+ * Each 
flushing trial can be very expensive, issue flush on601601+ * the whole region at once rather than doing it for each cpu.602602+ * This could be an overkill but is more scalable.603603+ */604604+ if (flush)605605+ flush_cache_vunmap(pcpu_chunk_addr(chunk, 0, page_start),606606+ pcpu_chunk_addr(chunk, last, page_end));607607+608608+ for_each_possible_cpu(cpu)609609+ unmap_kernel_range_noflush(610610+ pcpu_chunk_addr(chunk, cpu, page_start),611611+ (page_end - page_start) << PAGE_SHIFT);612612+613613+ /* ditto as flush_cache_vunmap() */614614+ if (flush)615615+ flush_tlb_kernel_range(pcpu_chunk_addr(chunk, 0, page_start),616616+ pcpu_chunk_addr(chunk, last, page_end));617617+}618618+619619+/**620620+ * pcpu_depopulate_chunk - depopulate and unmap an area of a pcpu_chunk621621+ * @chunk: chunk to depopulate622622+ * @off: offset to the area to depopulate623623+ * @size: size of the area to depopulate in bytes624624+ * @flush: whether to flush cache and tlb or not625625+ *626626+ * For each cpu, depopulate and unmap pages [@page_start,@page_end)627627+ * from @chunk. If @flush is true, vcache is flushed before unmapping628628+ * and tlb after.629629+ *630630+ * CONTEXT:631631+ * pcpu_alloc_mutex.632632+ */633633+static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, int off, int size,634634+ bool flush)635635+{636636+ int page_start = PFN_DOWN(off);637637+ int page_end = PFN_UP(off + size);638638+ int unmap_start = -1;639639+ int uninitialized_var(unmap_end);640640+ unsigned int cpu;641641+ int i;642642+643643+ for (i = page_start; i < page_end; i++) {644644+ for_each_possible_cpu(cpu) {645645+ struct page **pagep = pcpu_chunk_pagep(chunk, cpu, i);646646+647647+ if (!*pagep)648648+ continue;649649+650650+ __free_page(*pagep);651651+652652+ /*653653+ * If it's partial depopulation, it might get654654+ * populated or depopulated again. Mark the655655+ * page gone.656656+ */657657+ *pagep = NULL;658658+659659+ unmap_start = unmap_start < 0 ? 
i : unmap_start;660660+ unmap_end = i + 1;661661+ }662662+ }663663+664664+ if (unmap_start >= 0)665665+ pcpu_unmap(chunk, unmap_start, unmap_end, flush);666666+}667667+668668+/**669669+ * pcpu_map - map pages into a pcpu_chunk670670+ * @chunk: chunk of interest671671+ * @page_start: page index of the first page to map672672+ * @page_end: page index of the last page to map + 1673673+ *674674+ * For each cpu, map pages [@page_start,@page_end) into @chunk.675675+ * vcache is flushed afterwards.676676+ */677677+static int pcpu_map(struct pcpu_chunk *chunk, int page_start, int page_end)678678+{679679+ unsigned int last = num_possible_cpus() - 1;680680+ unsigned int cpu;681681+ int err;682682+683683+ /* map must not be done on immutable chunk */684684+ WARN_ON(chunk->immutable);685685+686686+ for_each_possible_cpu(cpu) {687687+ err = map_kernel_range_noflush(688688+ pcpu_chunk_addr(chunk, cpu, page_start),689689+ (page_end - page_start) << PAGE_SHIFT,690690+ PAGE_KERNEL,691691+ pcpu_chunk_pagep(chunk, cpu, page_start));692692+ if (err < 0)693693+ return err;694694+ }695695+696696+ /* flush at once, please read comments in pcpu_unmap() */697697+ flush_cache_vmap(pcpu_chunk_addr(chunk, 0, page_start),698698+ pcpu_chunk_addr(chunk, last, page_end));699699+ return 0;700700+}701701+702702+/**703703+ * pcpu_populate_chunk - populate and map an area of a pcpu_chunk704704+ * @chunk: chunk of interest705705+ * @off: offset to the area to populate706706+ * @size: size of the area to populate in bytes707707+ *708708+ * For each cpu, populate and map pages [@page_start,@page_end) into709709+ * @chunk. 
The area is cleared on return.710710+ *711711+ * CONTEXT:712712+ * pcpu_alloc_mutex, does GFP_KERNEL allocation.713713+ */714714+static int pcpu_populate_chunk(struct pcpu_chunk *chunk, int off, int size)715715+{716716+ const gfp_t alloc_mask = GFP_KERNEL | __GFP_HIGHMEM | __GFP_COLD;717717+ int page_start = PFN_DOWN(off);718718+ int page_end = PFN_UP(off + size);719719+ int map_start = -1;720720+ int uninitialized_var(map_end);721721+ unsigned int cpu;722722+ int i;723723+724724+ for (i = page_start; i < page_end; i++) {725725+ if (pcpu_chunk_page_occupied(chunk, i)) {726726+ if (map_start >= 0) {727727+ if (pcpu_map(chunk, map_start, map_end))728728+ goto err;729729+ map_start = -1;730730+ }731731+ continue;732732+ }733733+734734+ map_start = map_start < 0 ? i : map_start;735735+ map_end = i + 1;736736+737737+ for_each_possible_cpu(cpu) {738738+ struct page **pagep = pcpu_chunk_pagep(chunk, cpu, i);739739+740740+ *pagep = alloc_pages_node(cpu_to_node(cpu),741741+ alloc_mask, 0);742742+ if (!*pagep)743743+ goto err;744744+ }745745+ }746746+747747+ if (map_start >= 0 && pcpu_map(chunk, map_start, map_end))748748+ goto err;749749+750750+ for_each_possible_cpu(cpu)751751+ memset(chunk->vm->addr + cpu * pcpu_unit_size + off, 0,752752+ size);753753+754754+ return 0;755755+err:756756+ /* likely under heavy memory pressure, give memory back */757757+ pcpu_depopulate_chunk(chunk, off, size, true);758758+ return -ENOMEM;759759+}760760+761761+static void free_pcpu_chunk(struct pcpu_chunk *chunk)762762+{763763+ if (!chunk)764764+ return;765765+ if (chunk->vm)766766+ free_vm_area(chunk->vm);767767+ pcpu_mem_free(chunk->map, chunk->map_alloc * sizeof(chunk->map[0]));768768+ kfree(chunk);769769+}770770+771771+static struct pcpu_chunk *alloc_pcpu_chunk(void)772772+{773773+ struct pcpu_chunk *chunk;774774+775775+ chunk = kzalloc(pcpu_chunk_struct_size, GFP_KERNEL);776776+ if (!chunk)777777+ return NULL;778778+779779+ chunk->map = pcpu_mem_alloc(PCPU_DFL_MAP_ALLOC * 
sizeof(chunk->map[0]));780780+ chunk->map_alloc = PCPU_DFL_MAP_ALLOC;781781+ chunk->map[chunk->map_used++] = pcpu_unit_size;782782+ chunk->page = chunk->page_ar;783783+784784+ chunk->vm = get_vm_area(pcpu_chunk_size, GFP_KERNEL);785785+ if (!chunk->vm) {786786+ free_pcpu_chunk(chunk);787787+ return NULL;788788+ }789789+790790+ INIT_LIST_HEAD(&chunk->list);791791+ chunk->free_size = pcpu_unit_size;792792+ chunk->contig_hint = pcpu_unit_size;793793+794794+ return chunk;795795+}796796+797797+/**798798+ * pcpu_alloc - the percpu allocator799799+ * @size: size of area to allocate in bytes800800+ * @align: alignment of area (max PAGE_SIZE)801801+ * @reserved: allocate from the reserved chunk if available802802+ *803803+ * Allocate percpu area of @size bytes aligned at @align.804804+ *805805+ * CONTEXT:806806+ * Does GFP_KERNEL allocation.807807+ *808808+ * RETURNS:809809+ * Percpu pointer to the allocated area on success, NULL on failure.810810+ */811811+static void *pcpu_alloc(size_t size, size_t align, bool reserved)812812+{813813+ struct pcpu_chunk *chunk;814814+ int slot, off;815815+816816+ if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {817817+ WARN(true, "illegal size (%zu) or align (%zu) for "818818+ "percpu allocation\n", size, align);819819+ return NULL;820820+ }821821+822822+ mutex_lock(&pcpu_alloc_mutex);823823+ spin_lock_irq(&pcpu_lock);824824+825825+ /* serve reserved allocations from the reserved chunk if available */826826+ if (reserved && pcpu_reserved_chunk) {827827+ chunk = pcpu_reserved_chunk;828828+ if (size > chunk->contig_hint ||829829+ pcpu_extend_area_map(chunk) < 0)830830+ goto fail_unlock;831831+ off = pcpu_alloc_area(chunk, size, align);832832+ if (off >= 0)833833+ goto area_found;834834+ goto fail_unlock;835835+ }836836+837837+restart:838838+ /* search through normal chunks */839839+ for (slot = pcpu_size_to_slot(size); slot < pcpu_nr_slots; slot++) {840840+ list_for_each_entry(chunk, &pcpu_slot[slot], list) {841841+ if 
(size > chunk->contig_hint)842842+ continue;843843+844844+ switch (pcpu_extend_area_map(chunk)) {845845+ case 0:846846+ break;847847+ case 1:848848+ goto restart; /* pcpu_lock dropped, restart */849849+ default:850850+ goto fail_unlock;851851+ }852852+853853+ off = pcpu_alloc_area(chunk, size, align);854854+ if (off >= 0)855855+ goto area_found;856856+ }857857+ }858858+859859+ /* hmmm... no space left, create a new chunk */860860+ spin_unlock_irq(&pcpu_lock);861861+862862+ chunk = alloc_pcpu_chunk();863863+ if (!chunk)864864+ goto fail_unlock_mutex;865865+866866+ spin_lock_irq(&pcpu_lock);867867+ pcpu_chunk_relocate(chunk, -1);868868+ pcpu_chunk_addr_insert(chunk);869869+ goto restart;870870+871871+area_found:872872+ spin_unlock_irq(&pcpu_lock);873873+874874+ /* populate, map and clear the area */875875+ if (pcpu_populate_chunk(chunk, off, size)) {876876+ spin_lock_irq(&pcpu_lock);877877+ pcpu_free_area(chunk, off);878878+ goto fail_unlock;879879+ }880880+881881+ mutex_unlock(&pcpu_alloc_mutex);882882+883883+ return __addr_to_pcpu_ptr(chunk->vm->addr + off);884884+885885+fail_unlock:886886+ spin_unlock_irq(&pcpu_lock);887887+fail_unlock_mutex:888888+ mutex_unlock(&pcpu_alloc_mutex);889889+ return NULL;890890+}891891+892892+/**893893+ * __alloc_percpu - allocate dynamic percpu area894894+ * @size: size of area to allocate in bytes895895+ * @align: alignment of area (max PAGE_SIZE)896896+ *897897+ * Allocate percpu area of @size bytes aligned at @align. Might898898+ * sleep. 
Might trigger writeouts.899899+ *900900+ * CONTEXT:901901+ * Does GFP_KERNEL allocation.902902+ *903903+ * RETURNS:904904+ * Percpu pointer to the allocated area on success, NULL on failure.905905+ */906906+void *__alloc_percpu(size_t size, size_t align)907907+{908908+ return pcpu_alloc(size, align, false);909909+}910910+EXPORT_SYMBOL_GPL(__alloc_percpu);911911+912912+/**913913+ * __alloc_reserved_percpu - allocate reserved percpu area914914+ * @size: size of area to allocate in bytes915915+ * @align: alignment of area (max PAGE_SIZE)916916+ *917917+ * Allocate percpu area of @size bytes aligned at @align from reserved918918+ * percpu area if arch has set it up; otherwise, allocation is served919919+ * from the same dynamic area. Might sleep. Might trigger writeouts.920920+ *921921+ * CONTEXT:922922+ * Does GFP_KERNEL allocation.923923+ *924924+ * RETURNS:925925+ * Percpu pointer to the allocated area on success, NULL on failure.926926+ */927927+void *__alloc_reserved_percpu(size_t size, size_t align)928928+{929929+ return pcpu_alloc(size, align, true);930930+}931931+932932+/**933933+ * pcpu_reclaim - reclaim fully free chunks, workqueue function934934+ * @work: unused935935+ *936936+ * Reclaim all fully free chunks except for the first one.937937+ *938938+ * CONTEXT:939939+ * workqueue context.940940+ */941941+static void pcpu_reclaim(struct work_struct *work)942942+{943943+ LIST_HEAD(todo);944944+ struct list_head *head = &pcpu_slot[pcpu_nr_slots - 1];945945+ struct pcpu_chunk *chunk, *next;946946+947947+ mutex_lock(&pcpu_alloc_mutex);948948+ spin_lock_irq(&pcpu_lock);949949+950950+ list_for_each_entry_safe(chunk, next, head, list) {951951+ WARN_ON(chunk->immutable);952952+953953+ /* spare the first one */954954+ if (chunk == list_first_entry(head, struct pcpu_chunk, list))955955+ continue;956956+957957+ rb_erase(&chunk->rb_node, &pcpu_addr_root);958958+ list_move(&chunk->list, &todo);959959+ }960960+961961+ spin_unlock_irq(&pcpu_lock);962962+ 
mutex_unlock(&pcpu_alloc_mutex);963963+964964+ list_for_each_entry_safe(chunk, next, &todo, list) {965965+ pcpu_depopulate_chunk(chunk, 0, pcpu_unit_size, false);966966+ free_pcpu_chunk(chunk);967967+ }968968+}969969+970970+/**971971+ * free_percpu - free percpu area972972+ * @ptr: pointer to area to free973973+ *974974+ * Free percpu area @ptr.975975+ *976976+ * CONTEXT:977977+ * Can be called from atomic context.978978+ */979979+void free_percpu(void *ptr)980980+{981981+ void *addr = __pcpu_ptr_to_addr(ptr);982982+ struct pcpu_chunk *chunk;983983+ unsigned long flags;984984+ int off;985985+986986+ if (!ptr)987987+ return;988988+989989+ spin_lock_irqsave(&pcpu_lock, flags);990990+991991+ chunk = pcpu_chunk_addr_search(addr);992992+ off = addr - chunk->vm->addr;993993+994994+ pcpu_free_area(chunk, off);995995+996996+ /* if there is more than one fully free chunk, wake up the grim reaper */997997+ if (chunk->free_size == pcpu_unit_size) {998998+ struct pcpu_chunk *pos;999999+10001000+ list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list)10011001+ if (pos != chunk) {10021002+ schedule_work(&pcpu_reclaim_work);10031003+ break;10041004+ }10051005+ }10061006+10071007+ spin_unlock_irqrestore(&pcpu_lock, flags);10081008+}10091009+EXPORT_SYMBOL_GPL(free_percpu);10101010+10111011+/**10121012+ * pcpu_setup_first_chunk - initialize the first percpu chunk10131013+ * @get_page_fn: callback to fetch page pointer10141014+ * @static_size: the size of static percpu area in bytes10151015+ * @reserved_size: the size of reserved percpu area in bytes10161016+ * @unit_size: unit size in bytes, must be a multiple of PAGE_SIZE, -1 for auto10171017+ * @dyn_size: free size for dynamic allocation in bytes, -1 for auto10181018+ * @base_addr: mapped address, NULL for auto10191019+ * @populate_pte_fn: callback to allocate pagetable, NULL if unnecessary10201020+ *10211021+ * Initialize the first percpu chunk which contains the kernel static10221022+ * percpu area.
This function is to be called from arch percpu area10231023+ * setup path. The first two parameters are mandatory. The rest are10241024+ * optional.10251025+ *10261026+ * @get_page_fn() should return the pointer to the percpu page given cpu10271027+ * number and page number. It should at least return enough pages to10281028+ * cover the static area. The returned pages for the static area should10291029+ * have been initialized with valid data. If @unit_size is specified,10301030+ * it can also return pages after the static area. NULL return10311031+ * indicates end of pages for the cpu. Note that @get_page_fn() must10321032+ * return the same number of pages for all cpus.10331033+ *10341034+ * @reserved_size, if non-zero, specifies the number of bytes to10351035+ * reserve after the static area in the first chunk. This reserves10361036+ * the first chunk such that it's available only through reserved10371037+ * percpu allocation. This is primarily used to serve module percpu10381038+ * static areas on architectures where the addressing model has10391039+ * limited offset range for symbol relocations to guarantee module10401040+ * percpu symbols fall inside the relocatable range.10411041+ *10421042+ * @unit_size, if non-negative, specifies unit size and must be10431043+ * aligned to PAGE_SIZE and equal to or larger than @static_size +10441044+ * @reserved_size + @dyn_size.10451045+ *10461046+ * @dyn_size, if non-negative, limits the number of bytes available10471047+ * for dynamic allocation in the first chunk. Specifying a non-negative10481048+ * value makes percpu leave alone the area beyond @static_size +10491049+ * @reserved_size + @dyn_size.10501050+ *10511051+ * Non-null @base_addr means that the caller already allocated virtual10521052+ * region for the first chunk and mapped it. percpu must not mess10531053+ * with the chunk.
Note that @base_addr with 0 @unit_size or non-NULL10541054+ * @populate_pte_fn doesn't make any sense.10551055+ *10561056+ * @populate_pte_fn is used to populate the pagetable. NULL means the10571057+ * caller already populated the pagetable.10581058+ *10591059+ * If the first chunk ends up with both reserved and dynamic areas, it10601060+ * is served by two chunks - one to serve the core static and reserved10611061+ * areas and the other for the dynamic area. They share the same vm10621062+ * and page map but use different area allocation maps to stay away10631063+ * from each other. The latter chunk is circulated in the chunk slots10641064+ * and available for dynamic allocation like any other chunk.10651065+ *10661066+ * RETURNS:10671067+ * The determined pcpu_unit_size which can be used to initialize10681068+ * percpu access.10691069+ */10701070+size_t __init pcpu_setup_first_chunk(pcpu_get_page_fn_t get_page_fn,10711071+ size_t static_size, size_t reserved_size,10721072+ ssize_t unit_size, ssize_t dyn_size,10731073+ void *base_addr,10741074+ pcpu_populate_pte_fn_t populate_pte_fn)10751075+{10761076+ static struct vm_struct first_vm;10771077+ static int smap[2], dmap[2];10781078+ struct pcpu_chunk *schunk, *dchunk = NULL;10791079+ unsigned int cpu;10801080+ int nr_pages;10811081+ int err, i;10821082+10831083+ /* sanity checks */10841084+ BUILD_BUG_ON(ARRAY_SIZE(smap) >= PCPU_DFL_MAP_ALLOC ||10851085+ ARRAY_SIZE(dmap) >= PCPU_DFL_MAP_ALLOC);10861086+ BUG_ON(!static_size);10871087+ if (unit_size >= 0) {10881088+ BUG_ON(unit_size < static_size + reserved_size +10891089+ (dyn_size >= 0 ?
dyn_size : 0));10901090+ BUG_ON(unit_size & ~PAGE_MASK);10911091+ } else {10921092+ BUG_ON(dyn_size >= 0);10931093+ BUG_ON(base_addr);10941094+ }10951095+ BUG_ON(base_addr && populate_pte_fn);10961096+10971097+ if (unit_size >= 0)10981098+ pcpu_unit_pages = unit_size >> PAGE_SHIFT;10991099+ else11001100+ pcpu_unit_pages = max_t(int, PCPU_MIN_UNIT_SIZE >> PAGE_SHIFT,11011101+ PFN_UP(static_size + reserved_size));11021102+11031103+ pcpu_unit_size = pcpu_unit_pages << PAGE_SHIFT;11041104+ pcpu_chunk_size = num_possible_cpus() * pcpu_unit_size;11051105+ pcpu_chunk_struct_size = sizeof(struct pcpu_chunk)11061106+ + num_possible_cpus() * pcpu_unit_pages * sizeof(struct page *);11071107+11081108+ if (dyn_size < 0)11091109+ dyn_size = pcpu_unit_size - static_size - reserved_size;11101110+11111111+ /*11121112+ * Allocate chunk slots. The additional last slot is for11131113+ * empty chunks.11141114+ */11151115+ pcpu_nr_slots = __pcpu_size_to_slot(pcpu_unit_size) + 2;11161116+ pcpu_slot = alloc_bootmem(pcpu_nr_slots * sizeof(pcpu_slot[0]));11171117+ for (i = 0; i < pcpu_nr_slots; i++)11181118+ INIT_LIST_HEAD(&pcpu_slot[i]);11191119+11201120+ /*11211121+ * Initialize static chunk. If reserved_size is zero, the11221122+ * static chunk covers static area + dynamic allocation area11231123+ * in the first chunk. 
If reserved_size is not zero, it11241124+ * covers static area + reserved area (mostly used for module11251125+ * static percpu allocation).11261126+ */11271127+ schunk = alloc_bootmem(pcpu_chunk_struct_size);11281128+ INIT_LIST_HEAD(&schunk->list);11291129+ schunk->vm = &first_vm;11301130+ schunk->map = smap;11311131+ schunk->map_alloc = ARRAY_SIZE(smap);11321132+ schunk->page = schunk->page_ar;11331133+11341134+ if (reserved_size) {11351135+ schunk->free_size = reserved_size;11361136+ pcpu_reserved_chunk = schunk; /* not for dynamic alloc */11371137+ } else {11381138+ schunk->free_size = dyn_size;11391139+ dyn_size = 0; /* dynamic area covered */11401140+ }11411141+ schunk->contig_hint = schunk->free_size;11421142+11431143+ schunk->map[schunk->map_used++] = -static_size;11441144+ if (schunk->free_size)11451145+ schunk->map[schunk->map_used++] = schunk->free_size;11461146+11471147+ pcpu_reserved_chunk_limit = static_size + schunk->free_size;11481148+11491149+ /* init dynamic chunk if necessary */11501150+ if (dyn_size) {11511151+ dchunk = alloc_bootmem(sizeof(struct pcpu_chunk));11521152+ INIT_LIST_HEAD(&dchunk->list);11531153+ dchunk->vm = &first_vm;11541154+ dchunk->map = dmap;11551155+ dchunk->map_alloc = ARRAY_SIZE(dmap);11561156+ dchunk->page = schunk->page_ar; /* share page map with schunk */11571157+11581158+ dchunk->contig_hint = dchunk->free_size = dyn_size;11591159+ dchunk->map[dchunk->map_used++] = -pcpu_reserved_chunk_limit;11601160+ dchunk->map[dchunk->map_used++] = dchunk->free_size;11611161+ }11621162+11631163+ /* allocate vm address */11641164+ first_vm.flags = VM_ALLOC;11651165+ first_vm.size = pcpu_chunk_size;11661166+11671167+ if (!base_addr)11681168+ vm_area_register_early(&first_vm, PAGE_SIZE);11691169+ else {11701170+ /*11711171+ * Pages already mapped. No need to remap into11721172+ * vmalloc area. 
In this case the first chunks can't11731173+ * be mapped or unmapped by percpu and are marked11741174+ * immutable.11751175+ */11761176+ first_vm.addr = base_addr;11771177+ schunk->immutable = true;11781178+ if (dchunk)11791179+ dchunk->immutable = true;11801180+ }11811181+11821182+ /* assign pages */11831183+ nr_pages = -1;11841184+ for_each_possible_cpu(cpu) {11851185+ for (i = 0; i < pcpu_unit_pages; i++) {11861186+ struct page *page = get_page_fn(cpu, i);11871187+11881188+ if (!page)11891189+ break;11901190+ *pcpu_chunk_pagep(schunk, cpu, i) = page;11911191+ }11921192+11931193+ BUG_ON(i < PFN_UP(static_size));11941194+11951195+ if (nr_pages < 0)11961196+ nr_pages = i;11971197+ else11981198+ BUG_ON(nr_pages != i);11991199+ }12001200+12011201+ /* map them */12021202+ if (populate_pte_fn) {12031203+ for_each_possible_cpu(cpu)12041204+ for (i = 0; i < nr_pages; i++)12051205+ populate_pte_fn(pcpu_chunk_addr(schunk,12061206+ cpu, i));12071207+12081208+ err = pcpu_map(schunk, 0, nr_pages);12091209+ if (err)12101210+ panic("failed to setup static percpu area, err=%d\n",12111211+ err);12121212+ }12131213+12141214+ /* link the first chunk in */12151215+ if (!dchunk) {12161216+ pcpu_chunk_relocate(schunk, -1);12171217+ pcpu_chunk_addr_insert(schunk);12181218+ } else {12191219+ pcpu_chunk_relocate(dchunk, -1);12201220+ pcpu_chunk_addr_insert(dchunk);12211221+ }12221222+12231223+ /* we're done */12241224+ pcpu_base_addr = (void *)pcpu_chunk_addr(schunk, 0, 0);12251225+ return pcpu_unit_size;12261226+}
mm/vmalloc.c | +91 -3

@@
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
 #include <linux/bootmem.h>
+#include <linux/pfn.h>

 #include <asm/atomic.h>
 #include <asm/uaccess.h>
@@
  *
  * Ie. pte at addr+N*PAGE_SIZE shall point to pfn corresponding to pages[N]
  */
-static int vmap_page_range(unsigned long start, unsigned long end,
-			   pgprot_t prot, struct page **pages)
+static int vmap_page_range_noflush(unsigned long start, unsigned long end,
+				   pgprot_t prot, struct page **pages)
 {
 	pgd_t *pgd;
 	unsigned long next;
@@
 		if (err)
 			break;
 	} while (pgd++, addr = next, addr != end);
-	flush_cache_vmap(start, end);

 	if (unlikely(err))
 		return err;
 	return nr;
+}
+
+static int vmap_page_range(unsigned long start, unsigned long end,
+			   pgprot_t prot, struct page **pages)
+{
+	int ret;
+
+	ret = vmap_page_range_noflush(start, end, prot, pages);
+	flush_cache_vmap(start, end);
+	return ret;
 }

 static inline int is_vmalloc_or_module_addr(const void *x)
@@
 }
 EXPORT_SYMBOL(vm_map_ram);

+/**
+ * vm_area_register_early - register vmap area early during boot
+ * @vm: vm_struct to register
+ * @align: requested alignment
+ *
+ * This function is used to register kernel vm area before
+ * vmalloc_init() is called.  @vm->size and @vm->flags should contain
+ * proper values on entry and other fields should be zero.  On return,
+ * vm->addr contains the allocated address.
+ *
+ * DO NOT USE THIS FUNCTION UNLESS YOU KNOW WHAT YOU'RE DOING.
+ */
+void __init vm_area_register_early(struct vm_struct *vm, size_t align)
+{
+	static size_t vm_init_off __initdata;
+	unsigned long addr;
+
+	addr = ALIGN(VMALLOC_START + vm_init_off, align);
+	vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
+
+	vm->addr = (void *)addr;
+
+	vm->next = vmlist;
+	vmlist = vm;
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
@@
 	vmap_initialized = true;
 }

+/**
+ * map_kernel_range_noflush - map kernel VM area with the specified pages
+ * @addr: start of the VM area to map
+ * @size: size of the VM area to map
+ * @prot: page protection flags to use
+ * @pages: pages to map
+ *
+ * Map PFN_UP(@size) pages at @addr.  The VM area @addr and @size
+ * specify should have been allocated using get_vm_area() and its
+ * friends.
+ *
+ * NOTE:
+ * This function does NOT do any cache flushing.  The caller is
+ * responsible for calling flush_cache_vmap() on to-be-mapped areas
+ * before calling this function.
+ *
+ * RETURNS:
+ * The number of pages mapped on success, -errno on failure.
+ */
+int map_kernel_range_noflush(unsigned long addr, unsigned long size,
+			     pgprot_t prot, struct page **pages)
+{
+	return vmap_page_range_noflush(addr, addr + size, prot, pages);
+}
+
+/**
+ * unmap_kernel_range_noflush - unmap kernel VM area
+ * @addr: start of the VM area to unmap
+ * @size: size of the VM area to unmap
+ *
+ * Unmap PFN_UP(@size) pages at @addr.  The VM area @addr and @size
+ * specify should have been allocated using get_vm_area() and its
+ * friends.
+ *
+ * NOTE:
+ * This function does NOT do any cache flushing.  The caller is
+ * responsible for calling flush_cache_vunmap() on to-be-mapped areas
+ * before calling this function and flush_tlb_kernel_range() after.
+ */
+void unmap_kernel_range_noflush(unsigned long addr, unsigned long size)
+{
+	vunmap_page_range(addr, addr + size);
+}
+
+/**
+ * unmap_kernel_range - unmap kernel VM area and flush cache and TLB
+ * @addr: start of the VM area to unmap
+ * @size: size of the VM area to unmap
+ *
+ * Similar to unmap_kernel_range_noflush() but flushes vcache before
+ * the unmapping and tlb after.
+ */
 void unmap_kernel_range(unsigned long addr, unsigned long size)
 {
 	unsigned long end = addr + size;
net/core/dev.c

@@

 	rcu_read_lock();

-	/* Don't receive packets in an exiting network namespace */
-	if (!net_alive(dev_net(skb->dev))) {
-		kfree_skb(skb);
-		goto out;
-	}
-
 #ifdef CONFIG_NET_CLS_ACT
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
@@
 }
 EXPORT_SYMBOL(netdev_fix_features);

+/* Some devices need to (re-)set their netdev_ops inside
+ * ->init() or similar.  If that happens, we have to setup
+ * the compat pointers again.
+ */
+void netdev_resync_ops(struct net_device *dev)
+{
+#ifdef CONFIG_COMPAT_NET_DEV_OPS
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	dev->init = ops->ndo_init;
+	dev->uninit = ops->ndo_uninit;
+	dev->open = ops->ndo_open;
+	dev->change_rx_flags = ops->ndo_change_rx_flags;
+	dev->set_rx_mode = ops->ndo_set_rx_mode;
+	dev->set_multicast_list = ops->ndo_set_multicast_list;
+	dev->set_mac_address = ops->ndo_set_mac_address;
+	dev->validate_addr = ops->ndo_validate_addr;
+	dev->do_ioctl = ops->ndo_do_ioctl;
+	dev->set_config = ops->ndo_set_config;
+	dev->change_mtu = ops->ndo_change_mtu;
+	dev->neigh_setup = ops->ndo_neigh_setup;
+	dev->tx_timeout = ops->ndo_tx_timeout;
+	dev->get_stats = ops->ndo_get_stats;
+	dev->vlan_rx_register = ops->ndo_vlan_rx_register;
+	dev->vlan_rx_add_vid = ops->ndo_vlan_rx_add_vid;
+	dev->vlan_rx_kill_vid = ops->ndo_vlan_rx_kill_vid;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller = ops->ndo_poll_controller;
+#endif
+#endif
+}
+EXPORT_SYMBOL(netdev_resync_ops);
+
 /**
  *	register_netdevice	- register a network device
  *	@dev: device to register
@@
 	 * This is temporary until all network devices are converted.
 	 */
 	if (dev->netdev_ops) {
-		const struct net_device_ops *ops = dev->netdev_ops;
-
-		dev->init = ops->ndo_init;
-		dev->uninit = ops->ndo_uninit;
-		dev->open = ops->ndo_open;
-		dev->change_rx_flags = ops->ndo_change_rx_flags;
-		dev->set_rx_mode = ops->ndo_set_rx_mode;
-		dev->set_multicast_list = ops->ndo_set_multicast_list;
-		dev->set_mac_address = ops->ndo_set_mac_address;
-		dev->validate_addr = ops->ndo_validate_addr;
-		dev->do_ioctl = ops->ndo_do_ioctl;
-		dev->set_config = ops->ndo_set_config;
-		dev->change_mtu = ops->ndo_change_mtu;
-		dev->tx_timeout = ops->ndo_tx_timeout;
-		dev->get_stats = ops->ndo_get_stats;
-		dev->vlan_rx_register = ops->ndo_vlan_rx_register;
-		dev->vlan_rx_add_vid = ops->ndo_vlan_rx_add_vid;
-		dev->vlan_rx_kill_vid = ops->ndo_vlan_rx_kill_vid;
-#ifdef CONFIG_NET_POLL_CONTROLLER
-		dev->poll_controller = ops->ndo_poll_controller;
-#endif
+		netdev_resync_ops(dev);
 	} else {
 		char drivername[64];
 		pr_info("%s (%s): not using net_device_ops yet\n",
net/core/net-sysfs.c | +3 -1

@@
 	if (endp == buf)
 		goto err;

-	rtnl_lock();
+	if (!rtnl_trylock())
+		return -ERESTARTSYS;
+
 	if (dev_isalive(net)) {
 		if ((ret = (*set)(net, new)) == 0)
 			ret = len;
net/core/net_namespace.c | -3

@@
 	struct pernet_operations *ops;
 	struct net *net;

-	/* Be very certain incoming network packets will not find us */
-	rcu_barrier();
-
 	net = container_of(work, struct net, work);

 	mutex_lock(&net_mutex);
net/ipv4/af_inet.c | +2 -2

@@
 int snmp_mib_init(void *ptr[2], size_t mibsize)
 {
 	BUG_ON(ptr == NULL);
-	ptr[0] = __alloc_percpu(mibsize);
+	ptr[0] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
 	if (!ptr[0])
 		goto err0;
-	ptr[1] = __alloc_percpu(mibsize);
+	ptr[1] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
 	if (!ptr[1])
 		goto err1;
 	return 0;
net/ipv4/route.c

@@
 	int rc = 0;

 #ifdef CONFIG_NET_CLS_ROUTE
-	ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct));
+	ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct),
+				    __alignof__(struct ip_rt_acct));
 	if (!ip_rt_acct)
 		panic("IP: failed to allocate ip_rt_acct\n");
 #endif
net/ipv4/tcp_ipv4.c | +1 -1

@@
 void __init tcp_v4_init(void)
 {
 	inet_hashinfo_init(&tcp_hashinfo);
-	if (register_pernet_device(&tcp_sk_ops))
+	if (register_pernet_subsys(&tcp_sk_ops))
 		panic("Failed to create the TCP control socket.\n");
 }
net/ipv6/addrconf.c | +17 -36

@@
 	read_unlock(&dev_base_lock);
 }

-static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
+static int addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 {
 	struct net *net;

 	net = (struct net *)table->extra2;
 	if (p == &net->ipv6.devconf_dflt->forwarding)
-		return;
+		return 0;

-	rtnl_lock();
+	if (!rtnl_trylock())
+		return -ERESTARTSYS;
+
 	if (p == &net->ipv6.devconf_all->forwarding) {
 		__s32 newf = net->ipv6.devconf_all->forwarding;
 		net->ipv6.devconf_dflt->forwarding = newf;
@@

 	if (*p)
 		rt6_purge_dflt_routers(net);
+	return 1;
 }
 #endif
@@

 	ASSERT_RTNL();

-	if ((dev->flags & IFF_LOOPBACK) && how == 1)
-		how = 0;
-
 	rt6_ifdown(net, dev);
 	neigh_ifdown(&nd_tbl, dev);
@@
 	ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos);

 	if (write)
-		addrconf_fixup_forwarding(ctl, valp, val);
+		ret = addrconf_fixup_forwarding(ctl, valp, val);
 	return ret;
 }
@@
 	}

 	*valp = new;
-	addrconf_fixup_forwarding(table, valp, val);
-	return 1;
+	return addrconf_fixup_forwarding(table, valp, val);
 }

 static struct addrconf_sysctl_table
@@

 EXPORT_SYMBOL(unregister_inet6addr_notifier);

-static void addrconf_net_exit(struct net *net)
-{
-	struct net_device *dev;
-
-	rtnl_lock();
-	/* clean dev list */
-	for_each_netdev(net, dev) {
-		if (__in6_dev_get(dev) == NULL)
-			continue;
-		addrconf_ifdown(dev, 1);
-	}
-	addrconf_ifdown(net->loopback_dev, 2);
-	rtnl_unlock();
-}
-
-static struct pernet_operations addrconf_net_ops = {
-	.exit = addrconf_net_exit,
-};
-
 /*
  *	Init / cleanup code
  */
@@
 	if (err)
 		goto errlo;

-	err = register_pernet_device(&addrconf_net_ops);
-	if (err)
-		return err;
-
 	register_netdevice_notifier(&ipv6_dev_notf);

 	addrconf_verify(0);
@@
 void addrconf_cleanup(void)
 {
 	struct inet6_ifaddr *ifa;
+	struct net_device *dev;
 	int i;

 	unregister_netdevice_notifier(&ipv6_dev_notf);
-	unregister_pernet_device(&addrconf_net_ops);
-
 	unregister_pernet_subsys(&addrconf_ops);

 	rtnl_lock();
+
+	/* clean dev list */
+	for_each_netdev(&init_net, dev) {
+		if (__in6_dev_get(dev) == NULL)
+			continue;
+		addrconf_ifdown(dev, 1);
+	}
+	addrconf_ifdown(init_net.loopback_dev, 2);

 	/*
 	 *	Check hash table.
@@

 	del_timer(&addr_chk_timer);
 	rtnl_unlock();
-
-	unregister_pernet_subsys(&addrconf_net_ops);
 }
net/ipv6/af_inet6.c | +16 -5

@@
 static struct list_head inetsw6[SOCK_MAX];
 static DEFINE_SPINLOCK(inetsw6_lock);

+static int disable_ipv6 = 0;
+module_param_named(disable, disable_ipv6, int, 0);
+MODULE_PARM_DESC(disable, "Disable IPv6 such that it is non-functional");
+
 static __inline__ struct ipv6_pinfo *inet6_sk_generic(struct sock *sk)
 {
 	const int offset = sk->sk_prot->obj_size - sizeof(struct ipv6_pinfo);
@@
 {
 	struct sk_buff *dummy_skb;
 	struct list_head *r;
-	int err;
+	int err = 0;

 	BUILD_BUG_ON(sizeof(struct inet6_skb_parm) > sizeof(dummy_skb->cb));
+
+	/* Register the socket-side information for inet6_create.  */
+	for (r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r)
+		INIT_LIST_HEAD(r);
+
+	if (disable_ipv6) {
+		printk(KERN_INFO
+		       "IPv6: Loaded, but administratively disabled, "
+		       "reboot required to enable\n");
+		goto out;
+	}

 	err = proto_register(&tcpv6_prot, 1);
 	if (err)
@@
 	if (err)
 		goto out_unregister_udplite_proto;

-
-	/* Register the socket-side information for inet6_create.  */
-	for (r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r)
-		INIT_LIST_HEAD(r);

 	/* We MUST register RAW sockets before we create the ICMP6,
 	 * IGMP6, or NDISC control sockets.
net/netlink/af_netlink.c | +9 -1

@@
 	return 0;
 }

+/**
+ * netlink_set_err - report error to broadcast listeners
+ * @ssk: the kernel netlink socket, as returned by netlink_kernel_create()
+ * @pid: the PID of a process that we want to skip (if any)
+ * @group: the broadcast group that will notice the error
+ * @code: error code, must be negative (as usual in kernelspace)
+ */
 void netlink_set_err(struct sock *ssk, u32 pid, u32 group, int code)
 {
 	struct netlink_set_err_data info;
@@
 	info.exclude_sk = ssk;
 	info.pid = pid;
 	info.group = group;
-	info.code = code;
+	/* sk->sk_err wants a positive error value */
+	info.code = -code;

 	read_lock(&nl_table_lock);
net/sched/act_police.c | +6 -7

@@
 	if (R_tab == NULL)
 		goto failure;

-	if (!est && (ret == ACT_P_CREATED ||
-	    !gen_estimator_active(&police->tcf_bstats,
-				  &police->tcf_rate_est))) {
-		err = -EINVAL;
-		goto failure;
-	}
-
 	if (parm->peakrate.rate) {
 		P_tab = qdisc_get_rtab(&parm->peakrate,
 				       tb[TCA_POLICE_PEAKRATE]);
@@
 					    &police->tcf_lock, est);
 		if (err)
 			goto failure_unlock;
+	} else if (tb[TCA_POLICE_AVRATE] &&
+		   (ret == ACT_P_CREATED ||
+		    !gen_estimator_active(&police->tcf_bstats,
+					  &police->tcf_rate_est))) {
+		err = -EINVAL;
+		goto failure_unlock;
 	}

 	/* No failure allowed after this point */
net/sctp/protocol.c | +9 -7

@@
 static int sctp_ctl_sock_init(void)
 {
 	int err;
-	sa_family_t family;
+	sa_family_t family = PF_INET;

 	if (sctp_get_pf_specific(PF_INET6))
 		family = PF_INET6;
-	else
-		family = PF_INET;

 	err = inet_ctl_sock_create(&sctp_ctl_sock, family,
 				   SOCK_SEQPACKET, IPPROTO_SCTP, &init_net);
+
+	/* If IPv6 socket could not be created, try the IPv4 socket */
+	if (err < 0 && family == PF_INET6)
+		err = inet_ctl_sock_create(&sctp_ctl_sock, AF_INET,
+					   SOCK_SEQPACKET, IPPROTO_SCTP,
+					   &init_net);
+
 	if (err < 0) {
 		printk(KERN_ERR
 		       "SCTP: Failed to create the SCTP control socket.\n");
@@
 out:
 	return status;
 err_v6_add_protocol:
-	sctp_v6_del_protocol();
-err_add_protocol:
 	sctp_v4_del_protocol();
+err_add_protocol:
 	inet_ctl_sock_destroy(sctp_ctl_sock);
 err_ctl_sock_init:
 	sctp_v6_protosw_exit();
@@
 	sctp_v4_pf_exit();
 	sctp_v6_pf_exit();
 	sctp_sysctl_unregister();
-	list_del(&sctp_af_inet.list);
 	free_pages((unsigned long)sctp_port_hashtable,
 		   get_order(sctp_port_hashsize *
 			     sizeof(struct sctp_bind_hashbucket)));
@@
 	sctp_v4_pf_exit();

 	sctp_sysctl_unregister();
-	list_del(&sctp_af_inet.list);

 	free_pages((unsigned long)sctp_assoc_hashtable,
 		   get_order(sctp_assoc_hashsize *
net/sctp/sm_sideeffect.c | +33 -21

@@
 				 struct sctp_association *asoc,
 				 struct sctp_chunk *chunk)
 {
-	struct sctp_operr_chunk *operr_chunk;
 	struct sctp_errhdr *err_hdr;
+	struct sctp_ulpevent *ev;

-	operr_chunk = (struct sctp_operr_chunk *)chunk->chunk_hdr;
-	err_hdr = &operr_chunk->err_hdr;
+	while (chunk->chunk_end > chunk->skb->data) {
+		err_hdr = (struct sctp_errhdr *)(chunk->skb->data);

-	switch (err_hdr->cause) {
-	case SCTP_ERROR_UNKNOWN_CHUNK:
-	{
-		struct sctp_chunkhdr *unk_chunk_hdr;
+		ev = sctp_ulpevent_make_remote_error(asoc, chunk, 0,
+						     GFP_ATOMIC);
+		if (!ev)
+			return;

-		unk_chunk_hdr = (struct sctp_chunkhdr *)err_hdr->variable;
-		switch (unk_chunk_hdr->type) {
-		/* ADDIP 4.1 A9) If the peer responds to an ASCONF with an
-		 * ERROR chunk reporting that it did not recognized the ASCONF
-		 * chunk type, the sender of the ASCONF MUST NOT send any
-		 * further ASCONF chunks and MUST stop its T-4 timer.
-		 */
-		case SCTP_CID_ASCONF:
-			asoc->peer.asconf_capable = 0;
-			sctp_add_cmd_sf(cmds, SCTP_CMD_TIMER_STOP,
+		sctp_ulpq_tail_event(&asoc->ulpq, ev);
+
+		switch (err_hdr->cause) {
+		case SCTP_ERROR_UNKNOWN_CHUNK:
+		{
+			sctp_chunkhdr_t *unk_chunk_hdr;
+
+			unk_chunk_hdr = (sctp_chunkhdr_t *)err_hdr->variable;
+			switch (unk_chunk_hdr->type) {
+			/* ADDIP 4.1 A9) If the peer responds to an ASCONF with
+			 * an ERROR chunk reporting that it did not recognized
+			 * the ASCONF chunk type, the sender of the ASCONF MUST
+			 * NOT send any further ASCONF chunks and MUST stop its
+			 * T-4 timer.
+			 */
+			case SCTP_CID_ASCONF:
+				if (asoc->peer.asconf_capable == 0)
+					break;
+
+				asoc->peer.asconf_capable = 0;
+				sctp_add_cmd_sf(cmds, SCTP_CMD_TIMER_STOP,
 					SCTP_TO(SCTP_EVENT_TIMEOUT_T4_RTO));
+				break;
+			default:
+				break;
+			}
 			break;
+		}
 		default:
 			break;
 		}
-		break;
-	}
-	default:
-		break;
 	}
 }
security/smack/smack_lsm.c

@@
  *	looks for host based access restrictions
  *
  *	This version will only be appropriate for really small
- *	sets of single label hosts.  Because of the masking
- *	it cannot shortcut out on the first match.  There are
- *	numerious ways to address the problem, but none of them
- *	have been applied here.
+ *	sets of single label hosts.
  *
  *	Returns the label of the far end or NULL if it's not special.
  */
 static char *smack_host_label(struct sockaddr_in *sip)
 {
 	struct smk_netlbladdr *snp;
-	char *bestlabel = NULL;
 	struct in_addr *siap = &sip->sin_addr;
-	struct in_addr *liap;
-	struct in_addr *miap;
-	struct in_addr bestmask;

 	if (siap->s_addr == 0)
 		return NULL;

-	bestmask.s_addr = 0;
-
 	for (snp = smack_netlbladdrs; snp != NULL; snp = snp->smk_next) {
-		liap = &snp->smk_host.sin_addr;
-		miap = &snp->smk_mask;
 		/*
-		 * If the addresses match after applying the list entry mask
-		 * the entry matches the address.  If it doesn't move along to
-		 * the next entry.
+		 * we break after finding the first match because
+		 * the list is sorted from longest to shortest mask
+		 * so we have found the most specific match
 		 */
-		if ((liap->s_addr & miap->s_addr) !=
-		    (siap->s_addr & miap->s_addr))
-			continue;
-		/*
-		 * If the list entry mask identifies a single address
-		 * it can't get any more specific.
-		 */
-		if (miap->s_addr == 0xffffffff)
+		if ((&snp->smk_host.sin_addr)->s_addr ==
+		    (siap->s_addr & (&snp->smk_mask)->s_addr)) {
 			return snp->smk_label;
-		/*
-		 * If the list entry mask is less specific than the best
-		 * already found this entry is uninteresting.
-		 */
-		if ((miap->s_addr | bestmask.s_addr) == bestmask.s_addr)
-			continue;
-		/*
-		 * This is better than any entry found so far.
-		 */
-		bestmask.s_addr = miap->s_addr;
-		bestlabel = snp->smk_label;
+		}
 	}

-	return bestlabel;
+	return NULL;
 }

 /**
security/smack/smackfs.c | +49 -15

@@

 	return skp;
 }
-/*
-#define BEMASK	0x80000000
-*/
-#define BEMASK	0x00000001
 #define BEBITS	(sizeof(__be32) * 8)

 /*
@@
 {
 	struct smk_netlbladdr *skp = (struct smk_netlbladdr *) v;
 	unsigned char *hp = (char *) &skp->smk_host.sin_addr.s_addr;
-	__be32 bebits;
-	int maskn = 0;
+	int maskn;
+	u32 temp_mask = be32_to_cpu(skp->smk_mask.s_addr);

-	for (bebits = BEMASK; bebits != 0; maskn++, bebits <<= 1)
-		if ((skp->smk_mask.s_addr & bebits) == 0)
-			break;
+	for (maskn = 0; temp_mask; temp_mask <<= 1, maskn++);

 	seq_printf(s, "%u.%u.%u.%u/%d %s\n",
 		hp[0], hp[1], hp[2], hp[3], maskn, skp->smk_label);
@@
 }

 /**
+ * smk_netlbladdr_insert
+ * @new : netlabel to insert
+ *
+ * This helper inserts a netlabel in the smack_netlbladdrs list
+ * sorted by netmask length (longest to smallest)
+ */
+static void smk_netlbladdr_insert(struct smk_netlbladdr *new)
+{
+	struct smk_netlbladdr *m;
+
+	if (smack_netlbladdrs == NULL) {
+		smack_netlbladdrs = new;
+		return;
+	}
+
+	/* the comparison '>' is a bit hacky, but works */
+	if (new->smk_mask.s_addr > smack_netlbladdrs->smk_mask.s_addr) {
+		new->smk_next = smack_netlbladdrs;
+		smack_netlbladdrs = new;
+		return;
+	}
+	for (m = smack_netlbladdrs; m != NULL; m = m->smk_next) {
+		if (m->smk_next == NULL) {
+			m->smk_next = new;
+			return;
+		}
+		if (new->smk_mask.s_addr > m->smk_next->smk_mask.s_addr) {
+			new->smk_next = m->smk_next;
+			m->smk_next = new;
+			return;
+		}
+	}
+}
+
+/**
  * smk_write_netlbladdr - write() for /smack/netlabel
  * @filp: file pointer, not actually used
  * @buf: where to get the data from
@@
 	struct netlbl_audit audit_info;
 	struct in_addr mask;
 	unsigned int m;
-	__be32 bebits = BEMASK;
+	u32 mask_bits = (1<<31);
 	__be32 nsa;
+	u32 temp_mask;

 	/*
 	 * Must have privilege.
@@
 	if (sp == NULL)
 		return -EINVAL;

-	for (mask.s_addr = 0; m > 0; m--) {
-		mask.s_addr |= bebits;
-		bebits <<= 1;
+	for (temp_mask = 0; m > 0; m--) {
+		temp_mask |= mask_bits;
+		mask_bits >>= 1;
 	}
+	mask.s_addr = cpu_to_be32(temp_mask);
+
+	newname.sin_addr.s_addr &= mask.s_addr;
 	/*
 	 * Only allow one writer at a time. Writes should be
 	 * quite rare and small in any case.
@@
 	mutex_lock(&smk_netlbladdr_lock);

 	nsa = newname.sin_addr.s_addr;
+	/* try to find if the prefix is already in the list */
 	for (skp = smack_netlbladdrs; skp != NULL; skp = skp->smk_next)
 		if (skp->smk_host.sin_addr.s_addr == nsa &&
 		    skp->smk_mask.s_addr == mask.s_addr)
@@
 		rc = 0;
 		skp->smk_host.sin_addr.s_addr = newname.sin_addr.s_addr;
 		skp->smk_mask.s_addr = mask.s_addr;
-		skp->smk_next = smack_netlbladdrs;
 		skp->smk_label = sp;
-		smack_netlbladdrs = skp;
+		smk_netlbladdr_insert(skp);
 	}
 	} else {
 		rc = netlbl_cfg_unlbl_static_del(&init_net, NULL,
sound/pci/hda/patch_sigmatel.c | +4 -2

@@
 	"LFE Playback Volume",
 	"Side Playback Volume",
 	"Headphone Playback Volume",
-	"Headphone Playback Volume",
+	"Headphone2 Playback Volume",
 	"Speaker Playback Volume",
 	"External Speaker Playback Volume",
 	"Speaker2 Playback Volume",
@@
 	"LFE Playback Switch",
 	"Side Playback Switch",
 	"Headphone Playback Switch",
-	"Headphone Playback Switch",
+	"Headphone2 Playback Switch",
 	"Speaker Playback Switch",
 	"External Speaker Playback Switch",
 	"Speaker2 Playback Switch",
@@
 	if (! spec->autocfg.line_outs)
 		return 0; /* can't find valid pin config */

+#if 0 /* FIXME: temporarily disabled */
 	/* If we have no real line-out pin and multiple hp-outs, HPs should
 	 * be set up as multi-channel outputs.
 	 */
@@
 		spec->autocfg.line_out_type = AUTO_PIN_HP_OUT;
 		spec->autocfg.hp_outs = 0;
 	}
+#endif /* FIXME: temporarily disabled */
 	if (spec->autocfg.mono_out_pin) {
 		int dir = get_wcaps(codec, spec->autocfg.mono_out_pin) &
 			  (AC_WCAP_OUT_AMP | AC_WCAP_IN_AMP);