Linux kernel mirror (for testing): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+5564 -2051
+5 -5
Documentation/ABI/stable/sysfs-driver-dma-ioatdma
··· 1 - What: sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/cap 1 + What: /sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/cap 2 2 Date: December 3, 2009 3 3 KernelVersion: 2.6.32 4 4 Contact: dmaengine@vger.kernel.org 5 5 Description: Capabilities the DMA supports.Currently there are DMA_PQ, DMA_PQ_VAL, 6 6 DMA_XOR,DMA_XOR_VAL,DMA_INTERRUPT. 7 7 8 - What: sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/ring_active 8 + What: /sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/ring_active 9 9 Date: December 3, 2009 10 10 KernelVersion: 2.6.32 11 11 Contact: dmaengine@vger.kernel.org 12 12 Description: The number of descriptors active in the ring. 13 13 14 - What: sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/ring_size 14 + What: /sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/ring_size 15 15 Date: December 3, 2009 16 16 KernelVersion: 2.6.32 17 17 Contact: dmaengine@vger.kernel.org 18 18 Description: Descriptor ring size, total number of descriptors available. 19 19 20 - What: sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/version 20 + What: /sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/version 21 21 Date: December 3, 2009 22 22 KernelVersion: 2.6.32 23 23 Contact: dmaengine@vger.kernel.org 24 24 Description: Version of ioatdma device. 25 25 26 - What: sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/intr_coalesce 26 + What: /sys/devices/pciXXXX:XX/0000:XX:XX.X/dma/dma<n>chan<n>/quickdata/intr_coalesce 27 27 Date: August 8, 2017 28 28 KernelVersion: 4.14 29 29 Contact: dmaengine@vger.kernel.org
+1 -1
Documentation/ABI/testing/sysfs-class-net
··· 152 152 When an interface is under test, it cannot be expected 153 153 to pass packets as normal. 154 154 155 - What: /sys/clas/net/<iface>/duplex 155 + What: /sys/class/net/<iface>/duplex 156 156 Date: October 2009 157 157 KernelVersion: 2.6.33 158 158 Contact: netdev@vger.kernel.org
+10
Documentation/devicetree/bindings/interrupt-controller/ti,sci-inta.yaml
··· 32 32 | | vint | bit | | 0 |.....|63| vintx | 33 33 | +--------------+ +------------+ | 34 34 | | 35 + | Unmap | 36 + | +--------------+ | 37 + Unmapped events ---->| | umapidx |-------------------------> Globalevents 38 + | +--------------+ | 39 + | | 35 40 +-----------------------------------------+ 36 41 37 42 Configuration of these Intmap registers that maps global events to vint is ··· 74 69 "parent's input irq" specifies the base for parent irq 75 70 - description: | 76 71 "limit" specifies the limit for translation 72 + 73 + ti,unmapped-event-sources: 74 + $ref: /schemas/types.yaml#definitions/phandle-array 75 + description: 76 + Array of phandles to DMA controllers where the unmapped events originate. 77 77 78 78 required: 79 79 - compatible
+6
Documentation/filesystems/ext4/journal.rst
··· 256 256 - s\_padding2 257 257 - 258 258 * - 0x54 259 + - \_\_be32 260 + - s\_num\_fc\_blocks 261 + - Number of fast commit blocks in the journal. 262 + * - 0x58 259 263 - \_\_u32 260 264 - s\_padding[42] 261 265 - ··· 314 310 - This journal uses v3 of the checksum on-disk format. This is the same as 315 311 v2, but the journal block tag size is fixed regardless of the size of 316 312 block numbers. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3) 313 + * - 0x20 314 + - Journal has fast commit blocks. (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) 317 315 318 316 .. _jbd2_checksum_type: 319 317
+7
Documentation/filesystems/ext4/super.rst
··· 596 596 - Sparse Super Block, v2. If this flag is set, the SB field s\_backup\_bgs 597 597 points to the two block groups that contain backup superblocks 598 598 (COMPAT\_SPARSE\_SUPER2). 599 + * - 0x400 600 + - Fast commits supported. Although fast commits blocks are 601 + backward incompatible, fast commit blocks are not always 602 + present in the journal. If fast commit blocks are present in 603 + the journal, JBD2 incompat feature 604 + (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets 605 + set (COMPAT\_FAST\_COMMIT). 599 606 600 607 .. _super_incompat: 601 608
+2 -4
Documentation/filesystems/journalling.rst
··· 136 136 ~~~~~~~~~~~~ 137 137 138 138 JBD2 to also allows you to perform file-system specific delta commits known as 139 - fast commits. In order to use fast commits, you first need to call 140 - :c:func:`jbd2_fc_init` and tell how many blocks at the end of journal 141 - area should be reserved for fast commits. Along with that, you will also need 142 - to set following callbacks that perform correspodning work: 139 + fast commits. In order to use fast commits, you will need to set following 140 + callbacks that perform correspodning work: 143 141 144 142 `journal->j_fc_cleanup_cb`: Cleanup function called after every full commit and 145 143 fast commit.
+4 -4
Documentation/firmware-guide/acpi/acpi-lid.rst
··· 19 19 20 20 For most platforms, both the _LID method and the lid notifications are 21 21 reliable. However, there are exceptions. In order to work with these 22 - exceptional buggy platforms, special restrictions and expections should be 22 + exceptional buggy platforms, special restrictions and exceptions should be 23 23 taken into account. This document describes the restrictions and the 24 - expections of the Linux ACPI lid device driver. 24 + exceptions of the Linux ACPI lid device driver. 25 25 26 26 27 27 Restrictions of the returning value of the _LID control method ··· 46 46 trigger some system power saving operations on Windows. Since it is fully 47 47 tested, it is reliable from all AML tables. 48 48 49 - Expections for the userspace users of the ACPI lid device driver 49 + Exceptions for the userspace users of the ACPI lid device driver 50 50 ================================================================ 51 51 52 52 The ACPI button driver exports the lid state to the userspace via the ··· 100 100 C. button.lid_init_state=ignore: 101 101 When this option is specified, the ACPI button driver never reports the 102 102 initial lid state and there is a compensation mechanism implemented to 103 - ensure that the reliable "closed" notifications can always be delievered 103 + ensure that the reliable "closed" notifications can always be delivered 104 104 to the userspace by always pairing "closed" input events with complement 105 105 "opened" input events. But there is still no guarantee that the "opened" 106 106 notifications can be delivered to the userspace when the lid is actually
+41 -14
Documentation/firmware-guide/acpi/gpio-properties.rst
··· 20 20 21 21 Name (_CRS, ResourceTemplate () 22 22 { 23 - GpioIo (Exclusive, PullUp, 0, 0, IoRestrictionInputOnly, 23 + GpioIo (Exclusive, PullUp, 0, 0, IoRestrictionOutputOnly, 24 24 "\\_SB.GPO0", 0, ResourceConsumer) {15} 25 - GpioIo (Exclusive, PullUp, 0, 0, IoRestrictionInputOnly, 25 + GpioIo (Exclusive, PullUp, 0, 0, IoRestrictionOutputOnly, 26 26 "\\_SB.GPO0", 0, ResourceConsumer) {27, 31} 27 27 }) 28 28 ··· 49 49 pin 50 50 Pin in the GpioIo()/GpioInt() resource. Typically this is zero. 51 51 active_low 52 - If 1 the GPIO is marked as active_low. 52 + If 1, the GPIO is marked as active_low. 53 53 54 54 Since ACPI GpioIo() resource does not have a field saying whether it is 55 55 active low or high, the "active_low" argument can be used here. Setting 56 56 it to 1 marks the GPIO as active low. 57 57 58 + Note, active_low in _DSD does not make sense for GpioInt() resource and 59 + must be 0. GpioInt() resource has its own means of defining it. 60 + 58 61 In our Bluetooth example the "reset-gpios" refers to the second GpioIo() 59 62 resource, second pin in that resource with the GPIO number of 31. 63 + 64 + The GpioIo() resource unfortunately doesn't explicitly provide an initial 65 + state of the output pin which driver should use during its initialization. 66 + 67 + Linux tries to use common sense here and derives the state from the bias 68 + and polarity settings. The table below shows the expectations: 69 + 70 + ========= ============= ============== 71 + Pull Bias Polarity Requested... 
72 + ========= ============= ============== 73 + Implicit x AS IS (assumed firmware configured for us) 74 + Explicit x (no _DSD) as Pull Bias (Up == High, Down == Low), 75 + assuming non-active (Polarity = !Pull Bias) 76 + Down Low as low, assuming active 77 + Down High as low, assuming non-active 78 + Up Low as high, assuming non-active 79 + Up High as high, assuming active 80 + ========= ============= ============== 81 + 82 + That said, for our above example the both GPIOs, since the bias setting 83 + is explicit and _DSD is present, will be treated as active with a high 84 + polarity and Linux will configure the pins in this state until a driver 85 + reprograms them differently. 60 86 61 87 It is possible to leave holes in the array of GPIOs. This is useful in 62 88 cases like with SPI host controllers where some chip selects may be ··· 138 112 Package () { 139 113 "gpio-line-names", 140 114 Package () { 141 - "SPI0_CS_N", "EXP2_INT", "MUX6_IO", "UART0_RXD", "MUX7_IO", 142 - "LVL_C_A1", "MUX0_IO", "SPI1_MISO" 115 + "SPI0_CS_N", "EXP2_INT", "MUX6_IO", "UART0_RXD", 116 + "MUX7_IO", "LVL_C_A1", "MUX0_IO", "SPI1_MISO", 143 117 } 144 118 } 145 119 ··· 163 137 mapping between those names and the ACPI GPIO resources corresponding to them. 164 138 165 139 To do that, the driver needs to define a mapping table as a NULL-terminated 166 - array of struct acpi_gpio_mapping objects that each contain a name, a pointer 140 + array of struct acpi_gpio_mapping objects that each contains a name, a pointer 167 141 to an array of line data (struct acpi_gpio_params) objects and the size of that 168 142 array. 
Each struct acpi_gpio_params object consists of three fields, 169 143 crs_entry_index, line_index, active_low, representing the index of the target ··· 180 154 static const struct acpi_gpio_mapping bluetooth_acpi_gpios[] = { 181 155 { "reset-gpios", &reset_gpio, 1 }, 182 156 { "shutdown-gpios", &shutdown_gpio, 1 }, 183 - { }, 157 + { } 184 158 }; 185 159 186 160 Next, the mapping table needs to be passed as the second argument to 187 - acpi_dev_add_driver_gpios() that will register it with the ACPI device object 188 - pointed to by its first argument. That should be done in the driver's .probe() 189 - routine. On removal, the driver should unregister its GPIO mapping table by 161 + acpi_dev_add_driver_gpios() or its managed analogue that will 162 + register it with the ACPI device object pointed to by its first 163 + argument. That should be done in the driver's .probe() routine. 164 + On removal, the driver should unregister its GPIO mapping table by 190 165 calling acpi_dev_remove_driver_gpios() on the ACPI device object where that 191 166 table was previously registered. 192 167 ··· 218 191 but since there is no way to know the mapping between "reset" and 219 192 the GpioIo() in _CRS desc will hold ERR_PTR(-ENOENT). 220 193 221 - The driver author can solve this by passing the mapping explictly 222 - (the recommended way and documented in the above chapter). 194 + The driver author can solve this by passing the mapping explicitly 195 + (this is the recommended way and it's documented in the above chapter). 223 196 224 197 The ACPI GPIO mapping tables should not contaminate drivers that are not 225 198 knowing about which exact device they are servicing on. It implies that 226 - the ACPI GPIO mapping tables are hardly linked to ACPI ID and certain 199 + the ACPI GPIO mapping tables are hardly linked to an ACPI ID and certain 227 200 objects, as listed in the above chapter, of the device in question. 
228 201 229 202 Getting GPIO descriptor ··· 256 229 Be aware that gpiod_get_index() in cases 1 and 2, assuming that there 257 230 are two versions of ACPI device description provided and no mapping is 258 231 present in the driver, will return different resources. That's why a 259 - certain driver has to handle them carefully as explained in previous 232 + certain driver has to handle them carefully as explained in the previous 260 233 chapter.
+1 -1
Documentation/firmware-guide/acpi/method-tracing.rst
··· 98 98 [ 0.188903] exdebug-0398 ex_trace_point : Method End [0xf58394d8:\_SB.PCI0.LPCB.ECOK] execution. 99 99 100 100 Developers can utilize these special log entries to track the AML 101 - interpretion, thus can aid issue debugging and performance tuning. Note 101 + interpretation, thus can aid issue debugging and performance tuning. Note 102 102 that, as the "AML tracer" logs are implemented via ACPI_DEBUG_PRINT() 103 103 macro, CONFIG_ACPI_DEBUG is also required to be enabled for enabling 104 104 "AML tracer" logs.
+1
Documentation/leds/index.rst
··· 25 25 leds-lp5562 26 26 leds-lp55xx 27 27 leds-mlxcpld 28 + leds-sc27xx
-1
Documentation/misc-devices/index.rst
··· 24 24 isl29003 25 25 lis3lv02d 26 26 max6875 27 - mic/index 28 27 pci-endpoint-test 29 28 spear-pcie-gadget 30 29 uacce
+2 -2
Documentation/networking/netdev-FAQ.rst
··· 110 110 Q: How can I tell whether it got merged? 111 111 A: Start by looking at the main patchworks queue for netdev: 112 112 113 - http://patchwork.ozlabs.org/project/netdev/list/ 113 + https://patchwork.kernel.org/project/netdevbpf/list/ 114 114 115 115 The "State" field will tell you exactly where things are at with your 116 116 patch. ··· 152 152 153 153 There is a patchworks queue that you can see here: 154 154 155 - http://patchwork.ozlabs.org/bundle/davem/stable/?state=* 155 + https://patchwork.kernel.org/bundle/netdev/stable/?state=* 156 156 157 157 It contains the patches which Dave has selected, but not yet handed off 158 158 to Greg. If Greg already has the patch, then it will be here:
+2 -2
Documentation/networking/phy.rst
··· 247 247 speeds (see below.) 248 248 249 249 ``PHY_INTERFACE_MODE_2500BASEX`` 250 - This defines a variant of 1000BASE-X which is clocked 2.5 times faster, 251 - than the 802.3 standard giving a fixed bit rate of 3.125Gbaud. 250 + This defines a variant of 1000BASE-X which is clocked 2.5 times as fast 251 + as the 802.3 standard, giving a fixed bit rate of 3.125Gbaud. 252 252 253 253 ``PHY_INTERFACE_MODE_SGMII`` 254 254 This is used for Cisco SGMII, which is a modification of 1000BASE-X
+1 -1
Documentation/process/stable-kernel-rules.rst
··· 39 39 submission guidelines as described in 40 40 :ref:`Documentation/networking/netdev-FAQ.rst <netdev-FAQ>` 41 41 after first checking the stable networking queue at 42 - https://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=*&q=&archive= 42 + https://patchwork.kernel.org/bundle/netdev/stable/?state=* 43 43 to ensure the requested patch is not already queued up. 44 44 - Security patches should not be handled (solely) by the -stable review 45 45 process but should follow the procedures in
+1 -1
Documentation/translations/it_IT/process/stable-kernel-rules.rst
··· 46 46 :ref:`Documentation/translations/it_IT/networking/netdev-FAQ.rst <it_netdev-FAQ>`; 47 47 ma solo dopo aver verificato al seguente indirizzo che la patch non sia 48 48 già in coda: 49 - https://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=*&q=&archive= 49 + https://patchwork.kernel.org/bundle/netdev/stable/?state=* 50 50 - Una patch di sicurezza non dovrebbero essere gestite (solamente) dal processo 51 51 di revisione -stable, ma dovrebbe seguire le procedure descritte in 52 52 :ref:`Documentation/translations/it_IT/admin-guide/security-bugs.rst <it_securitybugs>`.
+2 -3
Documentation/virt/kvm/api.rst
··· 6367 6367 instead get bounced to user space through the KVM_EXIT_X86_RDMSR and 6368 6368 KVM_EXIT_X86_WRMSR exit notifications. 6369 6369 6370 - 8.25 KVM_X86_SET_MSR_FILTER 6370 + 8.27 KVM_X86_SET_MSR_FILTER 6371 6371 --------------------------- 6372 6372 6373 6373 :Architectures: x86 ··· 6381 6381 trap and emulate MSRs that are outside of the scope of KVM as well as 6382 6382 limit the attack surface on KVM's MSR emulation code. 6383 6383 6384 - 6385 - 8.26 KVM_CAP_ENFORCE_PV_CPUID 6384 + 8.28 KVM_CAP_ENFORCE_PV_CPUID 6386 6385 ----------------------------- 6387 6386 6388 6387 Architectures: x86
+14 -16
MAINTAINERS
··· 1279 1279 L: netdev@vger.kernel.org 1280 1280 S: Supported 1281 1281 W: https://www.marvell.com/ 1282 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 1282 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 1283 1283 F: Documentation/networking/device_drivers/ethernet/aquantia/atlantic.rst 1284 1284 F: drivers/net/ethernet/aquantia/atlantic/ 1285 1285 ··· 6614 6614 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git 6615 6615 F: Documentation/filesystems/ext4/ 6616 6616 F: fs/ext4/ 6617 + F: include/trace/events/ext4.h 6617 6618 6618 6619 Extended Verification Module (EVM) 6619 6620 M: Mimi Zohar <zohar@linux.ibm.com> ··· 8839 8838 W: http://www.intel.com/support/feedback.htm 8840 8839 W: http://e1000.sourceforge.net/ 8841 8840 Q: http://patchwork.ozlabs.org/project/intel-wired-lan/list/ 8842 - T: git git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue.git 8843 - T: git git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git 8841 + T: git git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue.git 8842 + T: git git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git 8844 8843 F: Documentation/networking/device_drivers/ethernet/intel/ 8845 8844 F: drivers/net/ethernet/intel/ 8846 8845 F: drivers/net/ethernet/intel/*/ ··· 11163 11162 F: drivers/input/touchscreen/melfas_mip4.c 11164 11163 11165 11164 MELLANOX BLUEFIELD I2C DRIVER 11166 - M: Khalil Blaiech <kblaiech@mellanox.com> 11165 + M: Khalil Blaiech <kblaiech@nvidia.com> 11167 11166 L: linux-i2c@vger.kernel.org 11168 11167 S: Supported 11169 11168 F: drivers/i2c/busses/i2c-mlxbf.c ··· 11173 11172 L: netdev@vger.kernel.org 11174 11173 S: Supported 11175 11174 W: http://www.mellanox.com 11176 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 11175 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 11177 11176 F: drivers/net/ethernet/mellanox/mlx4/en_* 11178 11177 11179 11178 MELLANOX ETHERNET DRIVER (mlx5e) ··· 11181 11180 L: 
netdev@vger.kernel.org 11182 11181 S: Supported 11183 11182 W: http://www.mellanox.com 11184 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 11183 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 11185 11184 F: drivers/net/ethernet/mellanox/mlx5/core/en_* 11186 11185 11187 11186 MELLANOX ETHERNET INNOVA DRIVERS ··· 11189 11188 L: netdev@vger.kernel.org 11190 11189 S: Supported 11191 11190 W: http://www.mellanox.com 11192 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 11191 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 11193 11192 F: drivers/net/ethernet/mellanox/mlx5/core/accel/* 11194 11193 F: drivers/net/ethernet/mellanox/mlx5/core/en_accel/* 11195 11194 F: drivers/net/ethernet/mellanox/mlx5/core/fpga/* ··· 11201 11200 L: netdev@vger.kernel.org 11202 11201 S: Supported 11203 11202 W: http://www.mellanox.com 11204 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 11203 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 11205 11204 F: drivers/net/ethernet/mellanox/mlxsw/ 11206 11205 F: tools/testing/selftests/drivers/net/mlxsw/ 11207 11206 ··· 11210 11209 L: netdev@vger.kernel.org 11211 11210 S: Supported 11212 11211 W: http://www.mellanox.com 11213 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 11212 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 11214 11213 F: drivers/net/ethernet/mellanox/mlxfw/ 11215 11214 11216 11215 MELLANOX HARDWARE PLATFORM SUPPORT ··· 11229 11228 L: linux-rdma@vger.kernel.org 11230 11229 S: Supported 11231 11230 W: http://www.mellanox.com 11232 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 11231 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 11233 11232 F: drivers/net/ethernet/mellanox/mlx4/ 11234 11233 F: include/linux/mlx4/ 11235 11234 ··· 11250 11249 L: linux-rdma@vger.kernel.org 11251 11250 S: Supported 11252 11251 W: http://www.mellanox.com 11253 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 11252 + Q: 
https://patchwork.kernel.org/project/netdevbpf/list/ 11254 11253 F: Documentation/networking/device_drivers/ethernet/mellanox/ 11255 11254 F: drivers/net/ethernet/mellanox/mlx5/core/ 11256 11255 F: include/linux/mlx5/ ··· 12130 12129 L: netdev@vger.kernel.org 12131 12130 S: Maintained 12132 12131 W: http://www.linuxfoundation.org/en/Net 12133 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 12132 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 12134 12133 T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git 12135 12134 T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git 12136 12135 F: Documentation/devicetree/bindings/net/ ··· 12175 12174 L: netdev@vger.kernel.org 12176 12175 S: Maintained 12177 12176 W: http://www.linuxfoundation.org/en/Net 12178 - Q: http://patchwork.ozlabs.org/project/netdev/list/ 12177 + Q: https://patchwork.kernel.org/project/netdevbpf/list/ 12179 12178 B: mailto:netdev@vger.kernel.org 12180 12179 T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git 12181 12180 T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git ··· 15247 15246 S390 IUCV NETWORK LAYER 15248 15247 M: Julian Wiedmann <jwi@linux.ibm.com> 15249 15248 M: Karsten Graul <kgraul@linux.ibm.com> 15250 - M: Ursula Braun <ubraun@linux.ibm.com> 15251 15249 L: linux-s390@vger.kernel.org 15252 15250 S: Supported 15253 15251 W: http://www.ibm.com/developerworks/linux/linux390/ ··· 15257 15257 S390 NETWORK DRIVERS 15258 15258 M: Julian Wiedmann <jwi@linux.ibm.com> 15259 15259 M: Karsten Graul <kgraul@linux.ibm.com> 15260 - M: Ursula Braun <ubraun@linux.ibm.com> 15261 15260 L: linux-s390@vger.kernel.org 15262 15261 S: Supported 15263 15262 W: http://www.ibm.com/developerworks/linux/linux390/ ··· 15827 15828 F: drivers/misc/sgi-xp/ 15828 15829 15829 15830 SHARED MEMORY COMMUNICATIONS (SMC) SOCKETS 15830 - M: Ursula Braun <ubraun@linux.ibm.com> 15831 15831 M: Karsten Graul <kgraul@linux.ibm.com> 15832 15832 L: 
linux-s390@vger.kernel.org 15833 15833 S: Supported
+1 -1
Makefile
··· 2 2 VERSION = 5 3 3 PATCHLEVEL = 10 4 4 SUBLEVEL = 0 5 - EXTRAVERSION = -rc2 5 + EXTRAVERSION = -rc3 6 6 NAME = Kleptomaniac Octopus 7 7 8 8 # *DOCUMENTATION*
+1
arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
··· 75 75 &enetc_port0 { 76 76 phy-handle = <&phy0>; 77 77 phy-connection-type = "sgmii"; 78 + managed = "in-band-status"; 78 79 status = "okay"; 79 80 80 81 mdio {
+2
arch/arm64/kvm/mmu.c
··· 788 788 } 789 789 790 790 switch (vma_shift) { 791 + #ifndef __PAGETABLE_PMD_FOLDED 791 792 case PUD_SHIFT: 792 793 if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE)) 793 794 break; 794 795 fallthrough; 796 + #endif 795 797 case CONT_PMD_SHIFT: 796 798 vma_shift = PMD_SHIFT; 797 799 fallthrough;
+33 -75
arch/arm64/kvm/sys_regs.c
··· 1069 1069 static unsigned int ptrauth_visibility(const struct kvm_vcpu *vcpu, 1070 1070 const struct sys_reg_desc *rd) 1071 1071 { 1072 - return vcpu_has_ptrauth(vcpu) ? 0 : REG_HIDDEN_USER | REG_HIDDEN_GUEST; 1072 + return vcpu_has_ptrauth(vcpu) ? 0 : REG_HIDDEN; 1073 1073 } 1074 1074 1075 1075 #define __PTRAUTH_KEY(k) \ ··· 1153 1153 return val; 1154 1154 } 1155 1155 1156 + static unsigned int id_visibility(const struct kvm_vcpu *vcpu, 1157 + const struct sys_reg_desc *r) 1158 + { 1159 + u32 id = sys_reg((u32)r->Op0, (u32)r->Op1, 1160 + (u32)r->CRn, (u32)r->CRm, (u32)r->Op2); 1161 + 1162 + switch (id) { 1163 + case SYS_ID_AA64ZFR0_EL1: 1164 + if (!vcpu_has_sve(vcpu)) 1165 + return REG_RAZ; 1166 + break; 1167 + } 1168 + 1169 + return 0; 1170 + } 1171 + 1156 1172 /* cpufeature ID register access trap handlers */ 1157 1173 1158 1174 static bool __access_id_reg(struct kvm_vcpu *vcpu, ··· 1187 1171 struct sys_reg_params *p, 1188 1172 const struct sys_reg_desc *r) 1189 1173 { 1190 - return __access_id_reg(vcpu, p, r, false); 1174 + bool raz = sysreg_visible_as_raz(vcpu, r); 1175 + 1176 + return __access_id_reg(vcpu, p, r, raz); 1191 1177 } 1192 1178 1193 1179 static bool access_raz_id_reg(struct kvm_vcpu *vcpu, ··· 1210 1192 if (vcpu_has_sve(vcpu)) 1211 1193 return 0; 1212 1194 1213 - return REG_HIDDEN_USER | REG_HIDDEN_GUEST; 1214 - } 1215 - 1216 - /* Visibility overrides for SVE-specific ID registers */ 1217 - static unsigned int sve_id_visibility(const struct kvm_vcpu *vcpu, 1218 - const struct sys_reg_desc *rd) 1219 - { 1220 - if (vcpu_has_sve(vcpu)) 1221 - return 0; 1222 - 1223 - return REG_HIDDEN_USER; 1224 - } 1225 - 1226 - /* Generate the emulated ID_AA64ZFR0_EL1 value exposed to the guest */ 1227 - static u64 guest_id_aa64zfr0_el1(const struct kvm_vcpu *vcpu) 1228 - { 1229 - if (!vcpu_has_sve(vcpu)) 1230 - return 0; 1231 - 1232 - return read_sanitised_ftr_reg(SYS_ID_AA64ZFR0_EL1); 1233 - } 1234 - 1235 - static bool access_id_aa64zfr0_el1(struct kvm_vcpu 
*vcpu, 1236 - struct sys_reg_params *p, 1237 - const struct sys_reg_desc *rd) 1238 - { 1239 - if (p->is_write) 1240 - return write_to_read_only(vcpu, p, rd); 1241 - 1242 - p->regval = guest_id_aa64zfr0_el1(vcpu); 1243 - return true; 1244 - } 1245 - 1246 - static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, 1247 - const struct sys_reg_desc *rd, 1248 - const struct kvm_one_reg *reg, void __user *uaddr) 1249 - { 1250 - u64 val; 1251 - 1252 - if (WARN_ON(!vcpu_has_sve(vcpu))) 1253 - return -ENOENT; 1254 - 1255 - val = guest_id_aa64zfr0_el1(vcpu); 1256 - return reg_to_user(uaddr, &val, reg->id); 1257 - } 1258 - 1259 - static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, 1260 - const struct sys_reg_desc *rd, 1261 - const struct kvm_one_reg *reg, void __user *uaddr) 1262 - { 1263 - const u64 id = sys_reg_to_index(rd); 1264 - int err; 1265 - u64 val; 1266 - 1267 - if (WARN_ON(!vcpu_has_sve(vcpu))) 1268 - return -ENOENT; 1269 - 1270 - err = reg_from_user(&val, uaddr, id); 1271 - if (err) 1272 - return err; 1273 - 1274 - /* This is what we mean by invariant: you can't change it. 
*/ 1275 - if (val != guest_id_aa64zfr0_el1(vcpu)) 1276 - return -EINVAL; 1277 - 1278 - return 0; 1195 + return REG_HIDDEN; 1279 1196 } 1280 1197 1281 1198 /* ··· 1252 1299 static int get_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 1253 1300 const struct kvm_one_reg *reg, void __user *uaddr) 1254 1301 { 1255 - return __get_id_reg(vcpu, rd, uaddr, false); 1302 + bool raz = sysreg_visible_as_raz(vcpu, rd); 1303 + 1304 + return __get_id_reg(vcpu, rd, uaddr, raz); 1256 1305 } 1257 1306 1258 1307 static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 1259 1308 const struct kvm_one_reg *reg, void __user *uaddr) 1260 1309 { 1261 - return __set_id_reg(vcpu, rd, uaddr, false); 1310 + bool raz = sysreg_visible_as_raz(vcpu, rd); 1311 + 1312 + return __set_id_reg(vcpu, rd, uaddr, raz); 1262 1313 } 1263 1314 1264 1315 static int get_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, ··· 1354 1397 .access = access_id_reg, \ 1355 1398 .get_user = get_id_reg, \ 1356 1399 .set_user = set_id_reg, \ 1400 + .visibility = id_visibility, \ 1357 1401 } 1358 1402 1359 1403 /* ··· 1476 1518 ID_SANITISED(ID_AA64PFR1_EL1), 1477 1519 ID_UNALLOCATED(4,2), 1478 1520 ID_UNALLOCATED(4,3), 1479 - { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, .visibility = sve_id_visibility }, 1521 + ID_SANITISED(ID_AA64ZFR0_EL1), 1480 1522 ID_UNALLOCATED(4,5), 1481 1523 ID_UNALLOCATED(4,6), 1482 1524 ID_UNALLOCATED(4,7), ··· 2143 2185 trace_kvm_sys_access(*vcpu_pc(vcpu), params, r); 2144 2186 2145 2187 /* Check for regs disabled by runtime config */ 2146 - if (sysreg_hidden_from_guest(vcpu, r)) { 2188 + if (sysreg_hidden(vcpu, r)) { 2147 2189 kvm_inject_undefined(vcpu); 2148 2190 return; 2149 2191 } ··· 2642 2684 return get_invariant_sys_reg(reg->id, uaddr); 2643 2685 2644 2686 /* Check for regs disabled by runtime config */ 2645 - if (sysreg_hidden_from_user(vcpu, r)) 2687 + if 
(sysreg_hidden(vcpu, r)) 2646 2688 return -ENOENT; 2647 2689 2648 2690 if (r->get_user) ··· 2667 2709 return set_invariant_sys_reg(reg->id, uaddr); 2668 2710 2669 2711 /* Check for regs disabled by runtime config */ 2670 - if (sysreg_hidden_from_user(vcpu, r)) 2712 + if (sysreg_hidden(vcpu, r)) 2671 2713 return -ENOENT; 2672 2714 2673 2715 if (r->set_user) ··· 2738 2780 if (!(rd->reg || rd->get_user)) 2739 2781 return 0; 2740 2782 2741 - if (sysreg_hidden_from_user(vcpu, rd)) 2783 + if (sysreg_hidden(vcpu, rd)) 2742 2784 return 0; 2743 2785 2744 2786 if (!copy_reg_to_user(rd, uind))
+8 -8
arch/arm64/kvm/sys_regs.h
··· 59 59 const struct sys_reg_desc *rd); 60 60 }; 61 61 62 - #define REG_HIDDEN_USER (1 << 0) /* hidden from userspace ioctls */ 63 - #define REG_HIDDEN_GUEST (1 << 1) /* hidden from guest */ 62 + #define REG_HIDDEN (1 << 0) /* hidden from userspace and guest */ 63 + #define REG_RAZ (1 << 1) /* RAZ from userspace and guest */ 64 64 65 65 static __printf(2, 3) 66 66 inline void print_sys_reg_msg(const struct sys_reg_params *p, ··· 111 111 __vcpu_sys_reg(vcpu, r->reg) = r->val; 112 112 } 113 113 114 - static inline bool sysreg_hidden_from_guest(const struct kvm_vcpu *vcpu, 115 - const struct sys_reg_desc *r) 114 + static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu, 115 + const struct sys_reg_desc *r) 116 116 { 117 117 if (likely(!r->visibility)) 118 118 return false; 119 119 120 - return r->visibility(vcpu, r) & REG_HIDDEN_GUEST; 120 + return r->visibility(vcpu, r) & REG_HIDDEN; 121 121 } 122 122 123 - static inline bool sysreg_hidden_from_user(const struct kvm_vcpu *vcpu, 124 - const struct sys_reg_desc *r) 123 + static inline bool sysreg_visible_as_raz(const struct kvm_vcpu *vcpu, 124 + const struct sys_reg_desc *r) 125 125 { 126 126 if (likely(!r->visibility)) 127 127 return false; 128 128 129 - return r->visibility(vcpu, r) & REG_HIDDEN_USER; 129 + return r->visibility(vcpu, r) & REG_RAZ; 130 130 } 131 131 132 132 static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
+1 -1
arch/powerpc/include/asm/nohash/32/kup-8xx.h
··· 63 63 static inline bool 64 64 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) 65 65 { 66 - return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xf0000000), 66 + return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xff000000), 67 67 "Bug: fault blocked by AP register !"); 68 68 } 69 69
+14 -31
arch/powerpc/include/asm/nohash/32/mmu-8xx.h
··· 33 33 * respectively NA for All or X for Supervisor and no access for User.
34 34 * Then we use the APG to say whether accesses are according to Page rules or
35 35 * "all Supervisor" rules (Access to all)
36 - * Therefore, we define 2 APG groups. lsb is _PMD_USER
37 - * 0 => Kernel => 01 (all accesses performed according to page definition)
38 - * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
39 - * 2-15 => Not Used
36 + * _PAGE_ACCESSED is also managed via APG. When _PAGE_ACCESSED is not set, say
37 + * "all User" rules, that will lead to NA for all.
38 + * Therefore, we define 4 APG groups. lsb is _PAGE_ACCESSED
39 + * 0 => Kernel => 11 (all accesses performed according as user iaw page definition)
40 + * 1 => Kernel+Accessed => 01 (all accesses performed according to page definition)
41 + * 2 => User => 11 (all accesses performed according as user iaw page definition)
42 + * 3 => User+Accessed => 00 (all accesses performed as supervisor iaw page definition) for INIT
43 + * => 10 (all accesses performed according to swapped page definition) for KUEP
44 + * 4-15 => Not Used
40 45 */
41 - #define MI_APG_INIT 0x40000000
42 -
43 - /*
44 - * 0 => Kernel => 01 (all accesses performed according to page definition)
45 - * 1 => User => 10 (all accesses performed according to swaped page definition)
46 - * 2-15 => Not Used
47 - */
48 - #define MI_APG_KUEP 0x60000000
46 + #define MI_APG_INIT 0xdc000000
47 + #define MI_APG_KUEP 0xde000000
49 48
50 49 /* The effective page number register. When read, contains the information
51 50 * about the last instruction TLB miss. When MI_RPN is written, bits in
··· 105 106 #define MD_Ks 0x80000000 /* Should not be set */
106 107 #define MD_Kp 0x40000000 /* Should always be set */
107 108
108 - /*
109 - * All pages' PP data bits are set to either 000 or 011 or 001, which means
110 - * respectively RW for Supervisor and no access for User, or RO for
111 - * Supervisor and no access for user and NA for ALL.
112 - * Then we use the APG to say whether accesses are according to Page rules or
113 - * "all Supervisor" rules (Access to all)
114 - * Therefore, we define 2 APG groups. lsb is _PMD_USER
115 - * 0 => Kernel => 01 (all accesses performed according to page definition)
116 - * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
117 - * 2-15 => Not Used
118 - */
119 - #define MD_APG_INIT 0x40000000
120 -
121 - /*
122 - * 0 => No user => 01 (all accesses performed according to page definition)
123 - * 1 => User => 10 (all accesses performed according to swaped page definition)
124 - * 2-15 => Not Used
125 - */
126 - #define MD_APG_KUAP 0x60000000
109 + /* See explanation above at the definition of MI_APG_INIT */
110 + #define MD_APG_INIT 0xdc000000
111 + #define MD_APG_KUAP 0xde000000
127 112
128 113 /* The effective page number register. When read, contains the information
129 114 * about the last instruction TLB miss. When MD_RPN is written, bits in
+5 -4
arch/powerpc/include/asm/nohash/32/pte-8xx.h
··· 39 39 * into the TLB.
40 40 */
41 41 #define _PAGE_GUARDED 0x0010 /* Copied to L1 G entry in DTLB */
42 - #define _PAGE_SPECIAL 0x0020 /* SW entry */
42 + #define _PAGE_ACCESSED 0x0020 /* Copied to L1 APG 1 entry in I/DTLB */
43 43 #define _PAGE_EXEC 0x0040 /* Copied to PP (bit 21) in ITLB */
44 - #define _PAGE_ACCESSED 0x0080 /* software: page referenced */
44 + #define _PAGE_SPECIAL 0x0080 /* SW entry */
45 45
46 46 #define _PAGE_NA 0x0200 /* Supervisor NA, User no access */
47 47 #define _PAGE_RO 0x0600 /* Supervisor RO, User no access */
··· 59 59
60 60 #define _PMD_PRESENT 0x0001
61 61 #define _PMD_PRESENT_MASK _PMD_PRESENT
62 - #define _PMD_BAD 0x0fd0
62 + #define _PMD_BAD 0x0f90
63 63 #define _PMD_PAGE_MASK 0x000c
64 64 #define _PMD_PAGE_8M 0x000c
65 65 #define _PMD_PAGE_512K 0x0004
66 - #define _PMD_USER 0x0020 /* APG 1 */
66 + #define _PMD_ACCESSED 0x0020 /* APG 1 */
67 + #define _PMD_USER 0x0040 /* APG 2 */
67 68
68 69 #define _PTE_NONE_MASK 0
69 70
+9 -3
arch/powerpc/include/asm/topology.h
··· 6 6
7 7 struct device;
8 8 struct device_node;
9 + struct drmem_lmb;
9 10
10 11 #ifdef CONFIG_NUMA
11 12
··· 62 61 */
63 62 return (nid < 0) ? 0 : nid;
64 63 }
64 +
65 + int of_drconf_to_nid_single(struct drmem_lmb *lmb);
66 +
65 67 #else
66 68
67 69 static inline int early_cpu_to_node(int cpu) { return 0; }
··· 88 84 return 0;
89 85 }
90 86
91 - #endif /* CONFIG_NUMA */
87 + static inline int of_drconf_to_nid_single(struct drmem_lmb *lmb)
88 + {
89 + return first_online_node;
90 + }
92 91
93 - struct drmem_lmb;
94 - int of_drconf_to_nid_single(struct drmem_lmb *lmb);
92 + #endif /* CONFIG_NUMA */
95 93
96 94 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
97 95 extern int find_and_online_cpu_nid(int cpu);
+2 -2
arch/powerpc/include/asm/uaccess.h
··· 178 178 * are no aliasing issues.
179 179 */
180 180 #define __put_user_asm_goto(x, addr, label, op) \
181 - asm volatile goto( \
181 + asm_volatile_goto( \
182 182 "1: " op "%U1%X1 %0,%1 # put_user\n" \
183 183 EX_TABLE(1b, %l2) \
184 184 : \
··· 191 191 __put_user_asm_goto(x, ptr, label, "std")
192 192 #else /* __powerpc64__ */
193 193 #define __put_user_asm2_goto(x, addr, label) \
194 - asm volatile goto( \
194 + asm_volatile_goto( \
195 195 "1: stw%X1 %0, %1\n" \
196 196 "2: stw%X1 %L0, %L1\n" \
197 197 EX_TABLE(1b, %l2) \
+3 -2
arch/powerpc/kernel/eeh_cache.c
··· 264 264 {
265 265 struct pci_io_addr_range *piar;
266 266 struct rb_node *n;
267 + unsigned long flags;
267 268
268 - spin_lock(&pci_io_addr_cache_root.piar_lock);
269 + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
269 270 for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n)) {
270 271 piar = rb_entry(n, struct pci_io_addr_range, rb_node);
271 272
··· 274 273 (piar->flags & IORESOURCE_IO) ? "i/o" : "mem",
275 274 &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
276 275 }
277 - spin_unlock(&pci_io_addr_cache_root.piar_lock);
276 + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
278 277
279 278 return 0;
280 279 }
-8
arch/powerpc/kernel/head_40x.S
··· 284 284
285 285 rlwimi r11, r10, 22, 20, 29 /* Compute PTE address */
286 286 lwz r11, 0(r11) /* Get Linux PTE */
287 - #ifdef CONFIG_SWAP
288 287 li r9, _PAGE_PRESENT | _PAGE_ACCESSED
289 - #else
290 - li r9, _PAGE_PRESENT
291 - #endif
292 288 andc. r9, r9, r11 /* Check permission */
293 289 bne 5f
294 290
··· 365 369
366 370 rlwimi r11, r10, 22, 20, 29 /* Compute PTE address */
367 371 lwz r11, 0(r11) /* Get Linux PTE */
368 - #ifdef CONFIG_SWAP
369 372 li r9, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
370 - #else
371 - li r9, _PAGE_PRESENT | _PAGE_EXEC
372 - #endif
373 373 andc. r9, r9, r11 /* Check permission */
374 374 bne 5f
375 375
+7 -39
arch/powerpc/kernel/head_8xx.S
··· 202 202
203 203 InstructionTLBMiss:
204 204 mtspr SPRN_SPRG_SCRATCH0, r10
205 - #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS)
206 205 mtspr SPRN_SPRG_SCRATCH1, r11
207 - #endif
208 206
209 207 /* If we are faulting a kernel address, we have to use the
210 208 * kernel page tables.
··· 222 224 3:
223 225 mtcr r11
224 226 #endif
225 - #if defined(CONFIG_HUGETLBFS) || !defined(CONFIG_PIN_TLB_TEXT)
226 227 lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r10) /* Get level 1 entry */
227 228 mtspr SPRN_MD_TWC, r11
228 - #else
229 - lwz r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10) /* Get level 1 entry */
230 - mtspr SPRN_MI_TWC, r10 /* Set segment attributes */
231 - mtspr SPRN_MD_TWC, r10
232 - #endif
233 229 mfspr r10, SPRN_MD_TWC
234 230 lwz r10, 0(r10) /* Get the pte */
235 - #if defined(CONFIG_HUGETLBFS) || !defined(CONFIG_PIN_TLB_TEXT)
231 + rlwimi r11, r10, 0, _PAGE_GUARDED | _PAGE_ACCESSED
236 232 rlwimi r11, r10, 32 - 9, _PMD_PAGE_512K
237 233 mtspr SPRN_MI_TWC, r11
238 - #endif
239 - #ifdef CONFIG_SWAP
240 - rlwinm r11, r10, 32-5, _PAGE_PRESENT
241 - and r11, r11, r10
242 - rlwimi r10, r11, 0, _PAGE_PRESENT
243 - #endif
244 234 /* The Linux PTE won't go exactly into the MMU TLB.
245 235 * Software indicator bits 20 and 23 must be clear.
246 236 * Software indicator bits 22, 24, 25, 26, and 27 must be
··· 242 256
243 257 /* Restore registers */
244 258 0: mfspr r10, SPRN_SPRG_SCRATCH0
245 - #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS)
246 259 mfspr r11, SPRN_SPRG_SCRATCH1
247 - #endif
248 260 rfi
249 261 patch_site 0b, patch__itlbmiss_exit_1
250 262
··· 252 268 addi r10, r10, 1
253 269 stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
254 270 mfspr r10, SPRN_SPRG_SCRATCH0
255 - #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
256 271 mfspr r11, SPRN_SPRG_SCRATCH1
257 - #endif
258 272 rfi
259 273 #endif
260 274
··· 279 297 mfspr r10, SPRN_MD_TWC
280 298 lwz r10, 0(r10) /* Get the pte */
281 299
282 - /* Insert the Guarded flag into the TWC from the Linux PTE.
300 + /* Insert Guarded and Accessed flags into the TWC from the Linux PTE.
283 301 * It is bit 27 of both the Linux PTE and the TWC (at least
284 302 * I got that right :-). It will be better when we can put
285 303 * this into the Linux pgd/pmd and load it in the operation
286 304 * above.
287 305 */
288 - rlwimi r11, r10, 0, _PAGE_GUARDED
306 + rlwimi r11, r10, 0, _PAGE_GUARDED | _PAGE_ACCESSED
289 307 rlwimi r11, r10, 32 - 9, _PMD_PAGE_512K
290 308 mtspr SPRN_MD_TWC, r11
291 309
292 - /* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
293 - * We also need to know if the insn is a load/store, so:
294 - * Clear _PAGE_PRESENT and load that which will
295 - * trap into DTLB Error with store bit set accordinly.
296 - */
297 - /* PRESENT=0x1, ACCESSED=0x20
298 - * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
299 - * r10 = (r10 & ~PRESENT) | r11;
300 - */
301 - #ifdef CONFIG_SWAP
302 - rlwinm r11, r10, 32-5, _PAGE_PRESENT
303 - and r11, r11, r10
304 - rlwimi r10, r11, 0, _PAGE_PRESENT
305 - #endif
306 310 /* The Linux PTE won't go exactly into the MMU TLB.
307 311 * Software indicator bits 24, 25, 26, and 27 must be
308 312 * set. All other Linux PTE bits control the behavior
··· 679 711 li r9, 4 /* up to 4 pages of 8M */
680 712 mtctr r9
681 713 lis r9, KERNELBASE@h /* Create vaddr for TLB */
682 - li r10, MI_PS8MEG | MI_SVALID /* Set 8M byte page */
714 + li r10, MI_PS8MEG | _PMD_ACCESSED | MI_SVALID
683 715 li r11, MI_BOOTINIT /* Create RPN for address 0 */
684 716 1:
685 717 mtspr SPRN_MI_CTR, r8 /* Set instruction MMU control */
··· 743 775 #ifdef CONFIG_PIN_TLB_TEXT
744 776 LOAD_REG_IMMEDIATE(r5, 28 << 8)
745 777 LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET)
746 - LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG)
778 + LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG | _PMD_ACCESSED)
747 779 LOAD_REG_IMMEDIATE(r8, 0xf0 | _PAGE_RO | _PAGE_SPS | _PAGE_SH | _PAGE_PRESENT)
748 780 LOAD_REG_ADDR(r9, _sinittext)
749 781 li r0, 4
··· 765 797 LOAD_REG_IMMEDIATE(r5, 28 << 8 | MD_TWAM)
766 798 #ifdef CONFIG_PIN_TLB_DATA
767 799 LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET)
768 - LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG)
800 + LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG | _PMD_ACCESSED)
769 801 #ifdef CONFIG_PIN_TLB_IMMR
770 802 li r0, 3
771 803 #else
··· 802 834 #endif
803 835 #ifdef CONFIG_PIN_TLB_IMMR
804 836 LOAD_REG_IMMEDIATE(r0, VIRT_IMMR_BASE | MD_EVALID)
805 - LOAD_REG_IMMEDIATE(r7, MD_SVALID | MD_PS512K | MD_GUARDED)
837 + LOAD_REG_IMMEDIATE(r7, MD_SVALID | MD_PS512K | MD_GUARDED | _PMD_ACCESSED)
806 838 mfspr r8, SPRN_IMMR
807 839 rlwinm r8, r8, 0, 0xfff80000
808 840 ori r8, r8, 0xf0 | _PAGE_DIRTY | _PAGE_SPS | _PAGE_SH | \
-12
arch/powerpc/kernel/head_book3s_32.S
··· 457 457 cmplw 0,r1,r3
458 458 #endif
459 459 mfspr r2, SPRN_SPRG_PGDIR
460 - #ifdef CONFIG_SWAP
461 460 li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
462 - #else
463 - li r1,_PAGE_PRESENT | _PAGE_EXEC
464 - #endif
465 461 #if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC)
466 462 bgt- 112f
467 463 lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
··· 519 523 lis r1, TASK_SIZE@h /* check if kernel address */
520 524 cmplw 0,r1,r3
521 525 mfspr r2, SPRN_SPRG_PGDIR
522 - #ifdef CONFIG_SWAP
523 526 li r1, _PAGE_PRESENT | _PAGE_ACCESSED
524 - #else
525 - li r1, _PAGE_PRESENT
526 - #endif
527 527 bgt- 112f
528 528 lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
529 529 addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
··· 595 603 lis r1, TASK_SIZE@h /* check if kernel address */
596 604 cmplw 0,r1,r3
597 605 mfspr r2, SPRN_SPRG_PGDIR
598 - #ifdef CONFIG_SWAP
599 606 li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
600 - #else
601 - li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT
602 - #endif
603 607 bgt- 112f
604 608 lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
605 609 addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
+2 -1
arch/powerpc/kernel/smp.c
··· 1393 1393 /* Activate a secondary processor. */
1394 1394 void start_secondary(void *unused)
1395 1395 {
1396 - unsigned int cpu = smp_processor_id();
1396 + unsigned int cpu = raw_smp_processor_id();
1397 1397
1398 1398 mmgrab(&init_mm);
1399 1399 current->active_mm = &init_mm;
1400 1400
1401 1401 smp_store_cpu_info(cpu);
1402 1402 set_dec(tb_ticks_per_jiffy);
1403 + rcu_cpu_starting(cpu);
1403 1404 preempt_disable();
1404 1405 cpu_callin_map[cpu] = 1;
1405 1406
+1 -1
arch/riscv/include/asm/uaccess.h
··· 476 476 do { \
477 477 long __kr_err; \
478 478 \
479 - __put_user_nocheck(*((type *)(dst)), (type *)(src), __kr_err); \
479 + __put_user_nocheck(*((type *)(src)), (type *)(dst), __kr_err); \
480 480 if (unlikely(__kr_err)) \
481 481 goto err_label; \
482 482 } while (0)
+1 -1
arch/riscv/kernel/ftrace.c
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */
1 + // SPDX-License-Identifier: GPL-2.0
2 2 /*
3 3 * Copyright (C) 2013 Linaro Limited
4 4 * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+5
arch/riscv/kernel/head.S
··· 35 35 .word 0
36 36 #endif
37 37 .balign 8
38 + #ifdef CONFIG_RISCV_M_MODE
39 + /* Image load offset (0MB) from start of RAM for M-mode */
40 + .dword 0
41 + #else
38 42 #if __riscv_xlen == 64
39 43 /* Image load offset(2MB) from start of RAM */
40 44 .dword 0x200000
41 45 #else
42 46 /* Image load offset(4MB) from start of RAM */
43 47 .dword 0x400000
44 48 #endif
49 + #endif
45 50 /* Effective size of kernel image */
46 51 .dword _end - _start
+1
arch/riscv/kernel/vdso/.gitignore
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only
2 2 vdso.lds
3 3 *.tmp
4 + vdso-syms.S
+9 -9
arch/riscv/kernel/vdso/Makefile
··· 43 43 SYSCFLAGS_vdso.so.dbg = $(c_flags)
44 44 $(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE
45 45 $(call if_changed,vdsold)
46 + SYSCFLAGS_vdso.so.dbg = -shared -s -Wl,-soname=linux-vdso.so.1 \
47 + -Wl,--build-id -Wl,--hash-style=both
46 48
47 49 # We also create a special relocatable object that should mirror the symbol
48 50 # table and layout of the linked DSO. With ld --just-symbols we can then
49 51 # refer to these symbols in the kernel code rather than hand-coded addresses.
50 -
51 - SYSCFLAGS_vdso.so.dbg = -shared -s -Wl,-soname=linux-vdso.so.1 \
52 - -Wl,--build-id=sha1 -Wl,--hash-style=both
53 - $(obj)/vdso-dummy.o: $(src)/vdso.lds $(obj)/rt_sigreturn.o FORCE
54 - $(call if_changed,vdsold)
55 -
56 - LDFLAGS_vdso-syms.o := -r --just-symbols
57 - $(obj)/vdso-syms.o: $(obj)/vdso-dummy.o FORCE
58 - $(call if_changed,ld)
52 + $(obj)/vdso-syms.S: $(obj)/vdso.so FORCE
53 + $(call if_changed,so2s)
59 54
60 55 # strip rule for the .so file
61 56 $(obj)/%.so: OBJCOPYFLAGS := -S
··· 67 72 $(CROSS_COMPILE)objcopy \
68 73 $(patsubst %, -G __vdso_%, $(vdso-syms)) $@.tmp $@ && \
69 74 rm $@.tmp
75 +
76 + # Extracts symbol offsets from the VDSO, converting them into an assembly file
77 + # that contains the same symbols at the same offsets.
78 + quiet_cmd_so2s = SO2S $@
79 + cmd_so2s = $(NM) -D $< | $(srctree)/$(src)/so2s.sh > $@
70 80
71 81 # install commands for the unstripped file
72 82 quiet_cmd_vdso_install = INSTALL $@
+6
arch/riscv/kernel/vdso/so2s.sh
··· 1 + #!/bin/sh
2 + # SPDX-License-Identifier: GPL-2.0+
3 + # Copyright 2020 Palmer Dabbelt <palmerdabbelt@google.com>
4 +
5 + sed 's!\([0-9a-f]*\) T \([a-z0-9_]*\)\(@@LINUX_4.15\)*!.global \2\n.set \2,0x\1!' \
6 + | grep '^\.'
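Fed the `nm -D` listing of the VDSO, the new so2s.sh one-liner turns each exported `T` symbol into a `.global`/`.set` pair at the same offset and lets the `grep '^\.'` drop every line the substitution did not touch. A quick illustration with a made-up symbol line (the offset and input here are examples, not taken from a real build):

```shell
printf '0000000000000800 T __vdso_gettimeofday@@LINUX_4.15\n00000000 w unrelated\n' |
sed 's!\([0-9a-f]*\) T \([a-z0-9_]*\)\(@@LINUX_4.15\)*!.global \2\n.set \2,0x\1!' |
grep '^\.'
# .global __vdso_gettimeofday
# .set __vdso_gettimeofday,0x0000000000000800
```

The second input line has no ` T ` field, so it passes through the `sed` unchanged and is filtered out by the `grep`, which is how non-exported symbols are discarded.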
+3 -1
arch/riscv/mm/fault.c
··· 86 86 pmd_t *pmd, *pmd_k;
87 87 pte_t *pte_k;
88 88 int index;
89 + unsigned long pfn;
89 90
90 91 /* User mode accesses just cause a SIGSEGV */
91 92 if (user_mode(regs))
··· 101 100 * of a task switch.
102 101 */
103 102 index = pgd_index(addr);
104 - pgd = (pgd_t *)pfn_to_virt(csr_read(CSR_SATP)) + index;
103 + pfn = csr_read(CSR_SATP) & SATP_PPN;
104 + pgd = (pgd_t *)pfn_to_virt(pfn) + index;
105 105 pgd_k = init_mm.pgd + index;
106 106
107 107 if (!pgd_present(*pgd_k)) {
+21 -11
arch/riscv/mm/init.c
··· 154 154
155 155 void __init setup_bootmem(void)
156 156 {
157 - phys_addr_t mem_size = 0;
158 - phys_addr_t total_mem = 0;
159 - phys_addr_t mem_start, start, end = 0;
157 + phys_addr_t mem_start = 0;
158 + phys_addr_t start, end = 0;
160 159 phys_addr_t vmlinux_end = __pa_symbol(&_end);
161 160 phys_addr_t vmlinux_start = __pa_symbol(&_start);
162 161 u64 i;
··· 163 164 /* Find the memory region containing the kernel */
164 165 for_each_mem_range(i, &start, &end) {
165 166 phys_addr_t size = end - start;
166 - if (!total_mem)
167 + if (!mem_start)
167 168 mem_start = start;
168 169 if (start <= vmlinux_start && vmlinux_end <= end)
169 170 BUG_ON(size == 0);
170 - total_mem = total_mem + size;
171 171 }
172 172
173 173 /*
174 - * Remove memblock from the end of usable area to the
175 - * end of region
174 + * The maximal physical memory size is -PAGE_OFFSET.
175 + * Make sure that any memory beyond mem_start + (-PAGE_OFFSET) is removed
176 + * as it is unusable by kernel.
176 177 */
177 - mem_size = min(total_mem, (phys_addr_t)-PAGE_OFFSET);
178 - if (mem_start + mem_size < end)
179 - memblock_remove(mem_start + mem_size,
180 - end - mem_start - mem_size);
178 + memblock_enforce_memory_limit(mem_start - PAGE_OFFSET);
181 179
182 180 /* Reserve from the start of the kernel to the end of the kernel */
183 181 memblock_reserve(vmlinux_start, vmlinux_end - vmlinux_start);
··· 293 297 #define NUM_EARLY_PMDS (1UL + MAX_EARLY_MAPPING_SIZE / PGDIR_SIZE)
294 298 #endif
295 299 pmd_t early_pmd[PTRS_PER_PMD * NUM_EARLY_PMDS] __initdata __aligned(PAGE_SIZE);
300 + pmd_t early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
296 301
297 302 static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
298 303 {
··· 491 494 load_pa + (va - PAGE_OFFSET),
492 495 map_size, PAGE_KERNEL_EXEC);
493 496
497 + #ifndef __PAGETABLE_PMD_FOLDED
498 + /* Setup early PMD for DTB */
499 + create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
500 + (uintptr_t)early_dtb_pmd, PGDIR_SIZE, PAGE_TABLE);
501 + /* Create two consecutive PMD mappings for FDT early scan */
502 + pa = dtb_pa & ~(PMD_SIZE - 1);
503 + create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
504 + pa, PMD_SIZE, PAGE_KERNEL);
505 + create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA + PMD_SIZE,
506 + pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL);
507 + dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1));
508 + #else
494 509 /* Create two consecutive PGD mappings for FDT early scan */
495 510 pa = dtb_pa & ~(PGDIR_SIZE - 1);
496 511 create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
··· 510 501 create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA + PGDIR_SIZE,
511 502 pa + PGDIR_SIZE, PGDIR_SIZE, PAGE_KERNEL);
512 503 dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PGDIR_SIZE - 1));
504 + #endif
513 505 dtb_early_pa = dtb_pa;
514 506
515 507 /*
+18 -5
arch/x86/kernel/apic/x2apic_uv_x.c
··· 290 290 {
291 291 /* Relies on 'to' being NULL chars so result will be NULL terminated */
292 292 strncpy(to, from, len-1);
293 +
294 + /* Trim trailing spaces */
295 + (void)strim(to);
293 296 }
294 297
295 298 /* Find UV arch type entry in UVsystab */
··· 369 366 return ret;
370 367 }
371 368
372 - static int __init uv_set_system_type(char *_oem_id)
369 + static int __init uv_set_system_type(char *_oem_id, char *_oem_table_id)
373 370 {
374 371 /* Save OEM_ID passed from ACPI MADT */
375 372 uv_stringify(sizeof(oem_id), oem_id, _oem_id);
··· 389 386 /* (Not hubless), not a UV */
390 387 return 0;
391 388
389 + /* Is UV hubless system */
390 + uv_hubless_system = 0x01;
391 +
392 + /* UV5 Hubless */
393 + if (strncmp(uv_archtype, "NSGI5", 5) == 0)
394 + uv_hubless_system |= 0x20;
395 +
392 396 /* UV4 Hubless: CH */
393 - if (strncmp(uv_archtype, "NSGI4", 5) == 0)
394 - uv_hubless_system = 0x11;
397 + else if (strncmp(uv_archtype, "NSGI4", 5) == 0)
398 + uv_hubless_system |= 0x10;
395 399
396 400 /* UV3 Hubless: UV300/MC990X w/o hub */
397 401 else
398 - uv_hubless_system = 0x9;
402 + uv_hubless_system |= 0x8;
403 +
404 + /* Copy APIC type */
405 + uv_stringify(sizeof(oem_table_id), oem_table_id, _oem_table_id);
399 406
400 407 pr_info("UV: OEM IDs %s/%s, SystemType %d, HUBLESS ID %x\n",
401 408 oem_id, oem_table_id, uv_system_type, uv_hubless_system);
··· 469 456 uv_cpu_info->p_uv_hub_info = &uv_hub_info_node0;
470 457
471 458 /* If not UV, return. */
472 - if (likely(uv_set_system_type(_oem_id) == 0))
459 + if (uv_set_system_type(_oem_id, _oem_table_id) == 0)
473 460 return 0;
474 461
475 462 /* Save and Decode OEM Table ID */
+33 -18
arch/x86/kernel/cpu/bugs.c
··· 1254 1254 return 0;
1255 1255 }
1256 1256
1257 + static bool is_spec_ib_user_controlled(void)
1258 + {
1259 + return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL ||
1260 + spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP ||
1261 + spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL ||
1262 + spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP;
1263 + }
1264 +
1257 1265 static int ib_prctl_set(struct task_struct *task, unsigned long ctrl)
1258 1266 {
1259 1267 switch (ctrl) {
··· 1269 1261 if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
1270 1262 spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
1271 1263 return 0;
1264 +
1272 1265 /*
1273 - * Indirect branch speculation is always disabled in strict
1274 - * mode. It can neither be enabled if it was force-disabled
1275 - * by a previous prctl call.
1266 + * With strict mode for both IBPB and STIBP, the instruction
1267 + * code paths avoid checking this task flag and instead,
1268 + * unconditionally run the instruction. However, STIBP and IBPB
1269 + * are independent and either can be set to conditionally
1270 + * enabled regardless of the mode of the other.
1271 + *
1272 + * If either is set to conditional, allow the task flag to be
1273 + * updated, unless it was force-disabled by a previous prctl
1274 + * call. Currently, this is possible on an AMD CPU which has the
1275 + * feature X86_FEATURE_AMD_STIBP_ALWAYS_ON. In this case, if the
1276 + * kernel is booted with 'spectre_v2_user=seccomp', then
1277 + * spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP and
1278 + * spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED.
1276 1279 */
1277 - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
1278 - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
1279 - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED ||
1280 + if (!is_spec_ib_user_controlled() ||
1280 1281 task_spec_ib_force_disable(task))
1281 1282 return -EPERM;
1283 +
1282 1284 task_clear_spec_ib_disable(task);
1283 1285 task_update_spec_tif(task);
1284 1286 break;
··· 1301 1283 if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
1302 1284 spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
1303 1285 return -EPERM;
1286 +
1304 - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
1305 - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
1306 - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
1287 + if (!is_spec_ib_user_controlled())
1307 1288 return 0;
1289 +
1308 1290 task_set_spec_ib_disable(task);
1309 1291 if (ctrl == PR_SPEC_FORCE_DISABLE)
1310 1292 task_set_spec_ib_force_disable(task);
··· 1369 1351 if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
1370 1352 spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
1371 1353 return PR_SPEC_ENABLE;
1372 - else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
1373 - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
1374 - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
1375 - return PR_SPEC_DISABLE;
1376 - else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL ||
1377 - spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP ||
1378 - spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL ||
1379 - spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP) {
1354 + else if (is_spec_ib_user_controlled()) {
1380 1355 if (task_spec_ib_force_disable(task))
1381 1356 return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE;
1382 1357 if (task_spec_ib_disable(task))
1383 1358 return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
1384 1359 return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
1385 - } else
1360 + } else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
1361 + spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
1362 + spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
1363 + return PR_SPEC_DISABLE;
1364 + else
1386 1365 return PR_SPEC_NOT_AFFECTED;
1387 1366 }
1388 1367
+16 -7
arch/x86/kvm/cpuid.c
··· 90 90 return 0;
91 91 }
92 92
93 + void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
94 + {
95 + struct kvm_cpuid_entry2 *best;
96 +
97 + best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
98 +
99 + /*
100 + * save the feature bitmap to avoid cpuid lookup for every PV
101 + * operation
102 + */
103 + if (best)
104 + vcpu->arch.pv_cpuid.features = best->eax;
105 + }
106 +
93 107 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
94 108 {
95 109 struct kvm_cpuid_entry2 *best;
··· 138 124 (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
139 125 best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
140 126
141 - /*
142 - * save the feature bitmap to avoid cpuid lookup for every PV
143 - * operation
144 - */
145 - if (best)
146 - vcpu->arch.pv_cpuid.features = best->eax;
147 -
148 127 if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
149 128 best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
150 129 if (best)
··· 168 161 else
169 162 vcpu->arch.guest_supported_xcr0 =
170 163 (best->eax | ((u64)best->edx << 32)) & supported_xcr0;
164 +
165 + kvm_update_pv_runtime(vcpu);
171 166
172 167 vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
173 168 kvm_mmu_reset_context(vcpu);
+1
arch/x86/kvm/cpuid.h
··· 11 11 void kvm_set_cpu_caps(void);
12 12
13 13 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
14 + void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
14 15 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
15 16 u32 function, u32 index);
16 17 int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
+7 -5
arch/x86/kvm/mmu/mmu.c
··· 856 856 } else {
857 857 rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
858 858 desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
859 - while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
860 - desc = desc->more;
859 + while (desc->sptes[PTE_LIST_EXT-1]) {
861 860 count += PTE_LIST_EXT;
862 - }
863 - if (desc->sptes[PTE_LIST_EXT-1]) {
864 - desc->more = mmu_alloc_pte_list_desc(vcpu);
861 +
862 + if (!desc->more) {
863 + desc->more = mmu_alloc_pte_list_desc(vcpu);
864 + desc = desc->more;
865 + break;
866 + }
865 867 desc = desc->more;
866 868 }
867 869 for (i = 0; desc->sptes[i]; ++i)
+55 -17
arch/x86/kvm/x86.c
··· 255 255
256 256 /*
257 257 * When called, it means the previous get/set msr reached an invalid msr.
258 - * Return 0 if we want to ignore/silent this failed msr access, or 1 if we want
259 - * to fail the caller.
258 + * Return true if we want to ignore/silent this failed msr access.
260 259 */
261 - static int kvm_msr_ignored_check(struct kvm_vcpu *vcpu, u32 msr,
262 - u64 data, bool write)
260 + static bool kvm_msr_ignored_check(struct kvm_vcpu *vcpu, u32 msr,
261 + u64 data, bool write)
263 262 {
264 263 const char *op = write ? "wrmsr" : "rdmsr";
265 264
··· 267 268 kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n",
268 269 op, msr, data);
269 270 /* Mask the error */
270 - return 0;
271 + return true;
271 272 } else {
272 273 kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
273 274 op, msr, data);
274 - return -ENOENT;
275 + return false;
275 276 }
276 277 }
277 278
··· 1415 1416 if (r == KVM_MSR_RET_INVALID) {
1416 1417 /* Unconditionally clear the output for simplicity */
1417 1418 *data = 0;
1418 - r = kvm_msr_ignored_check(vcpu, index, 0, false);
1419 + if (kvm_msr_ignored_check(vcpu, index, 0, false))
1420 + r = 0;
1419 1421 }
1420 1422
1421 1423 if (r)
··· 1540 1540 struct msr_data msr;
1541 1541
1542 1542 if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
1543 - return -EPERM;
1543 + return KVM_MSR_RET_FILTERED;
1544 1544
1545 1545 switch (index) {
1546 1546 case MSR_FS_BASE:
··· 1581 1581 int ret = __kvm_set_msr(vcpu, index, data, host_initiated);
1582 1582
1583 1583 if (ret == KVM_MSR_RET_INVALID)
1584 - ret = kvm_msr_ignored_check(vcpu, index, data, true);
1584 + if (kvm_msr_ignored_check(vcpu, index, data, true))
1585 + ret = 0;
1585 1586
1586 1587 return ret;
1587 1588 }
··· 1600 1599 int ret;
1601 1600
1602 1601 if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
1603 - return -EPERM;
1602 + return KVM_MSR_RET_FILTERED;
1604 1603
1605 1604 msr.index = index;
1606 1605 msr.host_initiated = host_initiated;
··· 1619 1618 if (ret == KVM_MSR_RET_INVALID) {
1620 1619 /* Unconditionally clear *data for simplicity */
1621 1620 *data = 0;
1622 - ret = kvm_msr_ignored_check(vcpu, index, 0, false);
1621 + if (kvm_msr_ignored_check(vcpu, index, 0, false))
1622 + ret = 0;
1623 1623 }
1624 1624
1625 1625 return ret;
··· 1664 1662 static u64 kvm_msr_reason(int r)
1665 1663 {
1666 1664 switch (r) {
1667 - case -ENOENT:
1665 + case KVM_MSR_RET_INVALID:
1668 1666 return KVM_MSR_EXIT_REASON_UNKNOWN;
1669 - case -EPERM:
1667 + case KVM_MSR_RET_FILTERED:
1670 1668 return KVM_MSR_EXIT_REASON_FILTER;
1671 1669 default:
1672 1670 return KVM_MSR_EXIT_REASON_INVAL;
··· 1967 1965 struct kvm_arch *ka = &vcpu->kvm->arch;
1968 1966
1969 1967 if (vcpu->vcpu_id == 0 && !host_initiated) {
1970 - if (ka->boot_vcpu_runs_old_kvmclock && old_msr)
1968 + if (ka->boot_vcpu_runs_old_kvmclock != old_msr)
1971 1969 kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
1972 1970
1973 1971 ka->boot_vcpu_runs_old_kvmclock = old_msr;
··· 3065 3063 /* Values other than LBR and BTF are vendor-specific,
3066 3064 thus reserved and should throw a #GP */
3067 3065 return 1;
3068 - }
3069 - vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
3070 - __func__, data);
3066 + } else if (report_ignored_msrs)
3067 + vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
3068 + __func__, data);
3071 3069 break;
3072 3070 case 0x200 ... 0x2ff:
3073 3071 return kvm_mtrr_set_msr(vcpu, msr, data);
··· 3465 3463 msr_info->data = vcpu->arch.efer;
3466 3464 break;
3467 3465 case MSR_KVM_WALL_CLOCK:
3466 + if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
3467 + return 1;
3468 +
3469 + msr_info->data = vcpu->kvm->arch.wall_clock;
3470 + break;
3468 3471 case MSR_KVM_WALL_CLOCK_NEW:
3472 + if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
3473 + return 1;
3474 +
3469 3475 msr_info->data = vcpu->kvm->arch.wall_clock;
3470 3476 break;
3471 3477 case MSR_KVM_SYSTEM_TIME:
3478 + if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
3479 + return 1;
3480 +
3481 + msr_info->data = vcpu->arch.time;
3482 + break;
3472 3483 case MSR_KVM_SYSTEM_TIME_NEW:
3484 + if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
3485 + return 1;
3486 +
3473 3487 msr_info->data = vcpu->arch.time;
3474 3488 break;
3475 3489 case MSR_KVM_ASYNC_PF_EN:
3490 + if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
3491 + return 1;
3492 +
3476 3493 msr_info->data = vcpu->arch.apf.msr_en_val;
3477 3494 break;
3478 3495 case MSR_KVM_ASYNC_PF_INT:
3496 + if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
3497 + return 1;
3498 +
3479 3499 msr_info->data = vcpu->arch.apf.msr_int_val;
3480 3500 break;
3481 3501 case MSR_KVM_ASYNC_PF_ACK:
3502 + if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
3503 + return 1;
3504 +
3482 3505 msr_info->data = 0;
3483 3506 break;
3484 3507 case MSR_KVM_STEAL_TIME:
3508 + if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
3509 + return 1;
3510 +
3485 3511 msr_info->data = vcpu->arch.st.msr_val;
3486 3512 break;
3487 3513 case MSR_KVM_PV_EOI_EN:
3514 + if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
3515 + return 1;
3516 +
3488 3517 msr_info->data = vcpu->arch.pv_eoi.msr_val;
3489 3518 break;
3490 3519 case MSR_KVM_POLL_CONTROL:
3520 + if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
3521 + return 1;
3522 +
3491 3523 msr_info->data = vcpu->arch.msr_kvm_poll_control;
3492 3524 break;
3493 3525 case MSR_IA32_P5_MC_ADDR:
··· 4611 4575
4612 4576 case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
4613 4577 vcpu->arch.pv_cpuid.enforce = cap->args[0];
4578 + if (vcpu->arch.pv_cpuid.enforce)
4579 + kvm_update_pv_runtime(vcpu);
4614 4580
4615 4581 return 0;
4616 4582
+7 -1
arch/x86/kvm/x86.h
··· 376 376 int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
377 377 bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
378 378
379 - #define KVM_MSR_RET_INVALID 2
379 + /*
380 + * Internal error codes that are used to indicate that MSR emulation encountered
381 + * an error that should result in #GP in the guest, unless userspace
382 + * handles it.
383 + */
384 + #define KVM_MSR_RET_INVALID 2 /* in-kernel MSR emulation #GP condition */
385 + #define KVM_MSR_RET_FILTERED 3 /* #GP due to userspace MSR filter */
380 386
381 387 #define __cr4_reserved_bits(__cpu_has, __c) \
382 388 ({ \
+1 -3
arch/x86/lib/memcpy_64.S
··· 16 16 * to a jmp to memcpy_erms which does the REP; MOVSB mem copy.
17 17 */
18 18
19 - .weak memcpy
20 -
21 19 /*
22 20 * memcpy - Copy a memory block.
23 21 *
··· 28 30 * rax original destination
29 31 */
30 32 SYM_FUNC_START_ALIAS(__memcpy)
31 - SYM_FUNC_START_LOCAL(memcpy)
33 + SYM_FUNC_START_WEAK(memcpy)
32 34 ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
33 35 "jmp memcpy_erms", X86_FEATURE_ERMS
34 36
+1 -3
arch/x86/lib/memmove_64.S
··· 24 24 * Output: 25 25 * rax: dest 26 26 */ 27 - .weak memmove 28 - 29 - SYM_FUNC_START_ALIAS(memmove) 27 + SYM_FUNC_START_WEAK(memmove) 30 28 SYM_FUNC_START(__memmove) 31 29 32 30 mov %rdi, %rax
+1 -3
arch/x86/lib/memset_64.S
··· 6 6 #include <asm/alternative-asm.h> 7 7 #include <asm/export.h> 8 8 9 - .weak memset 10 - 11 9 /* 12 10 * ISO C memset - set a memory block to a byte value. This function uses fast 13 11 * string to get better performance than the original function. The code is ··· 17 19 * 18 20 * rax original destination 19 21 */ 20 - SYM_FUNC_START_ALIAS(memset) 22 + SYM_FUNC_START_WEAK(memset) 21 23 SYM_FUNC_START(__memset) 22 24 /* 23 25 * Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended
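The memcpy/memmove/memset hunks above replace a separate `.weak` directive plus a local (or aliased) symbol with the combined `SYM_FUNC_START_WEAK` macro. The user-space C analogue of a weak symbol is sketched below; `copy_bytes` is an illustrative name, not a kernel function, and the attribute syntax assumes a GCC/Clang ELF toolchain:

```c
#include <assert.h>
#include <stddef.h>

/* A weak definition can be overridden at link time by a strong symbol
 * from another object file (in the kernel, e.g. an instrumented KASAN
 * version), while this default is used when no override exists. */
__attribute__((weak)) void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst; /* like memcpy, returns the original destination */
}
```

Folding the weak marking into the function-start macro avoids the subtle breakage of pairing `.weak` with a separately emitted local/global symbol of the same name.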
+3 -3
drivers/acpi/acpi_video.c
··· 578 578 ACPI_VIDEO_FIRST_LEVEL - 1 - bqc_value; 579 579 580 580 level = device->brightness->levels[bqc_value + 581 - ACPI_VIDEO_FIRST_LEVEL]; 581 + ACPI_VIDEO_FIRST_LEVEL]; 582 582 } else { 583 583 level = bqc_value; 584 584 } ··· 990 990 goto out_free_levels; 991 991 992 992 ACPI_DEBUG_PRINT((ACPI_DB_INFO, 993 - "found %d brightness levels\n", 994 - br->count - ACPI_VIDEO_FIRST_LEVEL)); 993 + "found %d brightness levels\n", 994 + br->count - ACPI_VIDEO_FIRST_LEVEL)); 995 995 return 0; 996 996 997 997 out_free_levels:
+1 -1
drivers/acpi/battery.c
··· 987 987 */ 988 988 if ((battery->state & ACPI_BATTERY_STATE_CRITICAL) || 989 989 (test_bit(ACPI_BATTERY_ALARM_PRESENT, &battery->flags) && 990 - (battery->capacity_now <= battery->alarm))) 990 + (battery->capacity_now <= battery->alarm))) 991 991 acpi_pm_wakeup_event(&battery->device->dev); 992 992 993 993 return result;
+12 -1
drivers/acpi/button.c
··· 89 89 */ 90 90 .matches = { 91 91 DMI_MATCH(DMI_SYS_VENDOR, "MEDION"), 92 - DMI_MATCH(DMI_PRODUCT_NAME, "E2215T MD60198"), 92 + DMI_MATCH(DMI_PRODUCT_NAME, "E2215T"), 93 + }, 94 + .driver_data = (void *)(long)ACPI_BUTTON_LID_INIT_OPEN, 95 + }, 96 + { 97 + /* 98 + * Medion Akoya E2228T, notification of the LID device only 99 + * happens on close, not on open and _LID always returns closed. 100 + */ 101 + .matches = { 102 + DMI_MATCH(DMI_SYS_VENDOR, "MEDION"), 103 + DMI_MATCH(DMI_PRODUCT_NAME, "E2228T"), 93 104 }, 94 105 .driver_data = (void *)(long)ACPI_BUTTON_LID_INIT_OPEN, 95 106 },
+1
drivers/acpi/dptf/dptf_pch_fivr.c
··· 106 106 107 107 static const struct acpi_device_id pch_fivr_device_ids[] = { 108 108 {"INTC1045", 0}, 109 + {"INTC1049", 0}, 109 110 {"", 0}, 110 111 }; 111 112 MODULE_DEVICE_TABLE(acpi, pch_fivr_device_ids);
+2
drivers/acpi/dptf/dptf_power.c
··· 229 229 {"INT3532", 0}, 230 230 {"INTC1047", 0}, 231 231 {"INTC1050", 0}, 232 + {"INTC1060", 0}, 233 + {"INTC1061", 0}, 232 234 {"", 0}, 233 235 }; 234 236 MODULE_DEVICE_TABLE(acpi, int3407_device_ids);
+6
drivers/acpi/dptf/int340x_thermal.c
··· 25 25 {"INT340A"}, 26 26 {"INT340B"}, 27 27 {"INTC1040"}, 28 + {"INTC1041"}, 28 29 {"INTC1043"}, 29 30 {"INTC1044"}, 30 31 {"INTC1045"}, 32 + {"INTC1046"}, 31 33 {"INTC1047"}, 34 + {"INTC1048"}, 35 + {"INTC1049"}, 36 + {"INTC1060"}, 37 + {"INTC1061"}, 32 38 {""}, 33 39 }; 34 40
+1 -1
drivers/acpi/event.c
··· 31 31 event.type = type; 32 32 event.data = data; 33 33 return (blocking_notifier_call_chain(&acpi_chain_head, 0, (void *)&event) 34 - == NOTIFY_BAD) ? -EINVAL : 0; 34 + == NOTIFY_BAD) ? -EINVAL : 0; 35 35 } 36 36 EXPORT_SYMBOL(acpi_notifier_call_chain); 37 37
+1 -1
drivers/acpi/evged.c
··· 101 101 102 102 switch (gsi) { 103 103 case 0 ... 255: 104 - sprintf(ev_name, "_%c%02hhX", 104 + sprintf(ev_name, "_%c%02X", 105 105 trigger == ACPI_EDGE_SENSITIVE ? 'E' : 'L', gsi); 106 106 107 107 if (ACPI_SUCCESS(acpi_get_handle(handle, ev_name, &evt_handle)))
+1
drivers/acpi/fan.c
··· 27 27 {"PNP0C0B", 0}, 28 28 {"INT3404", 0}, 29 29 {"INTC1044", 0}, 30 + {"INTC1048", 0}, 30 31 {"", 0}, 31 32 }; 32 33 MODULE_DEVICE_TABLE(acpi, fan_device_ids);
+1 -1
drivers/acpi/internal.h
··· 134 134 void acpi_power_add_remove_device(struct acpi_device *adev, bool add); 135 135 int acpi_power_wakeup_list_init(struct list_head *list, int *system_level); 136 136 int acpi_device_sleep_wake(struct acpi_device *dev, 137 - int enable, int sleep_state, int dev_state); 137 + int enable, int sleep_state, int dev_state); 138 138 int acpi_power_get_inferred_state(struct acpi_device *device, int *state); 139 139 int acpi_power_on_resources(struct acpi_device *device, int state); 140 140 int acpi_power_transition(struct acpi_device *device, int state);
+5 -5
drivers/acpi/nfit/core.c
··· 2175 2175 * these commands. 2176 2176 */ 2177 2177 enum nfit_aux_cmds { 2178 - NFIT_CMD_TRANSLATE_SPA = 5, 2179 - NFIT_CMD_ARS_INJECT_SET = 7, 2180 - NFIT_CMD_ARS_INJECT_CLEAR = 8, 2181 - NFIT_CMD_ARS_INJECT_GET = 9, 2178 + NFIT_CMD_TRANSLATE_SPA = 5, 2179 + NFIT_CMD_ARS_INJECT_SET = 7, 2180 + NFIT_CMD_ARS_INJECT_CLEAR = 8, 2181 + NFIT_CMD_ARS_INJECT_GET = 9, 2182 2182 }; 2183 2183 2184 2184 static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc) ··· 2632 2632 nfit_blk->bdw_offset = nfit_mem->bdw->offset; 2633 2633 mmio = &nfit_blk->mmio[BDW]; 2634 2634 mmio->addr.base = devm_nvdimm_memremap(dev, nfit_mem->spa_bdw->address, 2635 - nfit_mem->spa_bdw->length, nd_blk_memremap_flags(ndbr)); 2635 + nfit_mem->spa_bdw->length, nd_blk_memremap_flags(ndbr)); 2636 2636 if (!mmio->addr.base) { 2637 2637 dev_dbg(dev, "%s failed to map bdw\n", 2638 2638 nvdimm_name(nvdimm));
+1 -1
drivers/acpi/pci_irq.c
··· 175 175 * configure the IRQ assigned to this slot|dev|pin. The 'source_index' 176 176 * indicates which resource descriptor in the resource template (of 177 177 * the link device) this interrupt is allocated from. 178 - * 178 + * 179 179 * NOTE: Don't query the Link Device for IRQ information at this time 180 180 * because Link Device enumeration may not have occurred yet 181 181 * (e.g. exists somewhere 'below' this _PRT entry in the ACPI
+6 -6
drivers/acpi/pci_link.c
··· 6 6 * Copyright (C) 2001, 2002 Paul Diefenbaugh <paul.s.diefenbaugh@intel.com> 7 7 * Copyright (C) 2002 Dominik Brodowski <devel@brodo.de> 8 8 * 9 - * TBD: 10 - * 1. Support more than one IRQ resource entry per link device (index). 9 + * TBD: 10 + * 1. Support more than one IRQ resource entry per link device (index). 11 11 * 2. Implement start/stop mechanism and use ACPI Bus Driver facilities 12 12 * for IRQ management (e.g. start()->_SRS). 13 13 */ ··· 249 249 } 250 250 } 251 251 252 - /* 253 - * Query and parse _CRS to get the current IRQ assignment. 252 + /* 253 + * Query and parse _CRS to get the current IRQ assignment. 254 254 */ 255 255 256 256 status = acpi_walk_resources(link->device->handle, METHOD_NAME__CRS, ··· 396 396 /* 397 397 * "acpi_irq_balance" (default in APIC mode) enables ACPI to use PIC Interrupt 398 398 * Link Devices to move the PIRQs around to minimize sharing. 399 - * 399 + * 400 400 * "acpi_irq_nobalance" (default in PIC mode) tells ACPI not to move any PIC IRQs 401 401 * that the BIOS has already set to active. This is necessary because 402 402 * ACPI has no automatic means of knowing what ISA IRQs are used. Note that ··· 414 414 * 415 415 * Note that PCI IRQ routers have a list of possible IRQs, 416 416 * which may not include the IRQs this table says are available. 417 - * 417 + * 418 418 * Since this heuristic can't tell the difference between a link 419 419 * that no device will attach to, vs. a link which may be shared 420 420 * by multiple active devices -- it is not optimal.
+1 -1
drivers/acpi/pci_mcfg.c
··· 173 173 { 174 174 if (!memcmp(f->oem_id, mcfg_oem_id, ACPI_OEM_ID_SIZE) && 175 175 !memcmp(f->oem_table_id, mcfg_oem_table_id, 176 - ACPI_OEM_TABLE_ID_SIZE) && 176 + ACPI_OEM_TABLE_ID_SIZE) && 177 177 f->oem_revision == mcfg_oem_revision && 178 178 f->segment == segment && 179 179 resource_contains(&f->bus_range, bus_range))
+3 -3
drivers/acpi/power.c
··· 13 13 * 1. via "Device Specific (D-State) Control" 14 14 * 2. via "Power Resource Control". 15 15 * The code below deals with ACPI Power Resources control. 16 - * 16 + * 17 17 * An ACPI "power resource object" represents a software controllable power 18 18 * plane, clock plane, or other resource depended on by a device. 19 19 * ··· 645 645 * -ENODEV if the execution of either _DSW or _PSW has failed 646 646 */ 647 647 int acpi_device_sleep_wake(struct acpi_device *dev, 648 - int enable, int sleep_state, int dev_state) 648 + int enable, int sleep_state, int dev_state) 649 649 { 650 650 union acpi_object in_arg[3]; 651 651 struct acpi_object_list arg_list = { 3, in_arg }; ··· 690 690 691 691 /* 692 692 * Prepare a wakeup device, two steps (Ref ACPI 2.0:P229): 693 - * 1. Power on the power resources required for the wakeup device 693 + * 1. Power on the power resources required for the wakeup device 694 694 * 2. Execute _DSW (Device Sleep Wake) or (deprecated in ACPI 3.0) _PSW (Power 695 695 * State Wake) for the device, if present 696 696 */
+3 -3
drivers/acpi/processor_perflib.c
··· 354 354 (u32) px->control, (u32) px->status)); 355 355 356 356 /* 357 - * Check that ACPI's u64 MHz will be valid as u32 KHz in cpufreq 357 + * Check that ACPI's u64 MHz will be valid as u32 KHz in cpufreq 358 358 */ 359 359 if (!px->core_frequency || 360 360 ((u32)(px->core_frequency * 1000) != ··· 627 627 goto err_ret; 628 628 629 629 /* 630 - * Now that we have _PSD data from all CPUs, lets setup P-state 630 + * Now that we have _PSD data from all CPUs, lets setup P-state 631 631 * domain info. 632 632 */ 633 633 for_each_possible_cpu(i) { ··· 693 693 if (match_pdomain->domain != pdomain->domain) 694 694 continue; 695 695 696 - match_pr->performance->shared_type = 696 + match_pr->performance->shared_type = 697 697 pr->performance->shared_type; 698 698 cpumask_copy(match_pr->performance->shared_cpu_map, 699 699 pr->performance->shared_cpu_map);
+1 -1
drivers/acpi/sbs.c
··· 366 366 state_readers[i].mode, 367 367 ACPI_SBS_BATTERY, 368 368 state_readers[i].command, 369 - (u8 *)battery + 369 + (u8 *)battery + 370 370 state_readers[i].offset); 371 371 if (result) 372 372 goto end;
+1 -1
drivers/acpi/sbshc.c
··· 176 176 EXPORT_SYMBOL_GPL(acpi_smbus_write); 177 177 178 178 int acpi_smbus_register_callback(struct acpi_smb_hc *hc, 179 - smbus_alarm_callback callback, void *context) 179 + smbus_alarm_callback callback, void *context) 180 180 { 181 181 mutex_lock(&hc->lock); 182 182 hc->callback = callback;
+3 -3
drivers/acpi/sbshc.h
··· 24 24 typedef void (*smbus_alarm_callback)(void *context); 25 25 26 26 extern int acpi_smbus_read(struct acpi_smb_hc *hc, u8 protocol, u8 address, 27 - u8 command, u8 * data); 27 + u8 command, u8 *data); 28 28 extern int acpi_smbus_write(struct acpi_smb_hc *hc, u8 protocol, u8 slave_address, 29 - u8 command, u8 * data, u8 length); 29 + u8 command, u8 *data, u8 length); 30 30 extern int acpi_smbus_register_callback(struct acpi_smb_hc *hc, 31 - smbus_alarm_callback callback, void *context); 31 + smbus_alarm_callback callback, void *context); 32 32 extern int acpi_smbus_unregister_callback(struct acpi_smb_hc *hc);
+1 -1
drivers/acpi/scan.c
··· 1453 1453 } 1454 1454 1455 1455 /** 1456 - * acpi_dma_configure - Set-up DMA configuration for the device. 1456 + * acpi_dma_configure_id - Set-up DMA configuration for the device. 1457 1457 * @dev: The pointer to the device 1458 1458 * @attr: device dma attributes 1459 1459 * @input_id: input device id const value pointer
+8 -8
drivers/acpi/video_detect.c
··· 178 178 DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X201s"), 179 179 }, 180 180 }, 181 - { 182 - .callback = video_detect_force_video, 183 - .ident = "ThinkPad X201T", 184 - .matches = { 185 - DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), 186 - DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X201T"), 187 - }, 188 - }, 181 + { 182 + .callback = video_detect_force_video, 183 + .ident = "ThinkPad X201T", 184 + .matches = { 185 + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), 186 + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X201T"), 187 + }, 188 + }, 189 189 190 190 /* The native backlight controls do not work on some older machines */ 191 191 {
+2 -2
drivers/acpi/wakeup.c
··· 44 44 if (!dev->wakeup.flags.valid 45 45 || sleep_state > (u32) dev->wakeup.sleep_state 46 46 || !(device_may_wakeup(&dev->dev) 47 - || dev->wakeup.prepare_count)) 47 + || dev->wakeup.prepare_count)) 48 48 continue; 49 49 50 50 if (device_may_wakeup(&dev->dev)) ··· 69 69 if (!dev->wakeup.flags.valid 70 70 || sleep_state > (u32) dev->wakeup.sleep_state 71 71 || !(device_may_wakeup(&dev->dev) 72 - || dev->wakeup.prepare_count)) 72 + || dev->wakeup.prepare_count)) 73 73 continue; 74 74 75 75 acpi_set_gpe_wake_mask(dev->wakeup.gpe_device, dev->wakeup.gpe_number,
+1 -1
drivers/block/null_blk.h
··· 47 47 unsigned int nr_zones_closed; 48 48 struct blk_zone *zones; 49 49 sector_t zone_size_sects; 50 - spinlock_t zone_dev_lock; 50 + spinlock_t zone_lock; 51 51 unsigned long *zone_locks; 52 52 53 53 unsigned long size; /* device size in MB */
+31 -16
drivers/block/null_blk_zoned.c
··· 46 46 if (!dev->zones) 47 47 return -ENOMEM; 48 48 49 - spin_lock_init(&dev->zone_dev_lock); 50 - dev->zone_locks = bitmap_zalloc(dev->nr_zones, GFP_KERNEL); 51 - if (!dev->zone_locks) { 52 - kvfree(dev->zones); 53 - return -ENOMEM; 49 + /* 50 + * With memory backing, the zone_lock spinlock needs to be temporarily 51 + * released to avoid scheduling in atomic context. To guarantee zone 52 + * information protection, use a bitmap to lock zones with 53 + * wait_on_bit_lock_io(). Sleeping on the lock is OK as memory backing 54 + * implies that the queue is marked with BLK_MQ_F_BLOCKING. 55 + */ 56 + spin_lock_init(&dev->zone_lock); 57 + if (dev->memory_backed) { 58 + dev->zone_locks = bitmap_zalloc(dev->nr_zones, GFP_KERNEL); 59 + if (!dev->zone_locks) { 60 + kvfree(dev->zones); 61 + return -ENOMEM; 62 + } 54 63 } 55 64 56 65 if (dev->zone_nr_conv >= dev->nr_zones) { ··· 146 137 147 138 static inline void null_lock_zone(struct nullb_device *dev, unsigned int zno) 148 139 { 149 - wait_on_bit_lock_io(dev->zone_locks, zno, TASK_UNINTERRUPTIBLE); 140 + if (dev->memory_backed) 141 + wait_on_bit_lock_io(dev->zone_locks, zno, TASK_UNINTERRUPTIBLE); 142 + spin_lock_irq(&dev->zone_lock); 150 143 } 151 144 152 145 static inline void null_unlock_zone(struct nullb_device *dev, unsigned int zno) 153 146 { 154 - clear_and_wake_up_bit(zno, dev->zone_locks); 147 + spin_unlock_irq(&dev->zone_lock); 148 + 149 + if (dev->memory_backed) 150 + clear_and_wake_up_bit(zno, dev->zone_locks); 155 151 } 156 152 157 153 int null_report_zones(struct gendisk *disk, sector_t sector, ··· 336 322 return null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors); 337 323 338 324 null_lock_zone(dev, zno); 339 - spin_lock(&dev->zone_dev_lock); 340 325 341 326 switch (zone->cond) { 342 327 case BLK_ZONE_COND_FULL: ··· 388 375 if (zone->cond != BLK_ZONE_COND_EXP_OPEN) 389 376 zone->cond = BLK_ZONE_COND_IMP_OPEN; 390 377 391 - spin_unlock(&dev->zone_dev_lock); 378 + /* 379 + * Memory backing allocation
may sleep: release the zone_lock spinlock 380 + * to avoid scheduling in atomic context. Zone operation atomicity is 381 + * still guaranteed through the zone_locks bitmap. 382 + */ 383 + if (dev->memory_backed) 384 + spin_unlock_irq(&dev->zone_lock); 392 385 ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors); 393 - spin_lock(&dev->zone_dev_lock); 386 + if (dev->memory_backed) 387 + spin_lock_irq(&dev->zone_lock); 388 + 394 389 if (ret != BLK_STS_OK) 395 390 goto unlock; 396 391 ··· 413 392 ret = BLK_STS_OK; 414 393 415 394 unlock: 416 - spin_unlock(&dev->zone_dev_lock); 417 395 null_unlock_zone(dev, zno); 418 396 419 397 return ret; ··· 536 516 null_lock_zone(dev, i); 537 517 zone = &dev->zones[i]; 538 518 if (zone->cond != BLK_ZONE_COND_EMPTY) { 539 - spin_lock(&dev->zone_dev_lock); 540 519 null_reset_zone(dev, zone); 541 - spin_unlock(&dev->zone_dev_lock); 542 520 trace_nullb_zone_op(cmd, i, zone->cond); 543 521 } 544 522 null_unlock_zone(dev, i); ··· 548 530 zone = &dev->zones[zone_no]; 549 531 550 532 null_lock_zone(dev, zone_no); 551 - spin_lock(&dev->zone_dev_lock); 552 533 553 534 switch (op) { 554 535 case REQ_OP_ZONE_RESET: ··· 566 549 ret = BLK_STS_NOTSUPP; 567 550 break; 568 551 } 569 - 570 - spin_unlock(&dev->zone_dev_lock); 571 552 572 553 if (ret == BLK_STS_OK) 573 554 trace_nullb_zone_op(cmd, zone_no, zone->cond);
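The null_blk_zoned.c change above combines a spinlock for zone state with an optional per-zone bit lock that is taken only for memory-backed devices, so the spinlock can be dropped around allocations that may sleep while the zone stays locked against other commands. A user-space sketch of that protocol, with the kernel lock primitives mocked by plain counters and a bit mask (all names here are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Mocked device: spin_held stands in for spin_lock_irq()/spin_unlock_irq(),
 * zone_bits for the wait_on_bit_lock_io() per-zone bitmap. */
struct dev {
    bool memory_backed;
    int spin_held;
    unsigned long zone_bits;
};

static void lock_zone(struct dev *d, unsigned int zno)
{
    if (d->memory_backed)
        d->zone_bits |= 1ul << zno;   /* bit lock, may sleep in the kernel */
    d->spin_held++;                   /* spinlock, never slept under */
}

static void unlock_zone(struct dev *d, unsigned int zno)
{
    d->spin_held--;
    if (d->memory_backed)
        d->zone_bits &= ~(1ul << zno);
}

/* Around sleeping work, only the spinlock is dropped; the bit lock keeps
 * the zone atomic with respect to other commands. */
static void do_sleeping_write(struct dev *d, unsigned int zno)
{
    lock_zone(d, zno);
    if (d->memory_backed)
        d->spin_held--;   /* spin_unlock_irq() before the allocation */
    /* ... allocation that may sleep ... */
    if (d->memory_backed)
        d->spin_held++;   /* re-acquire */
    unlock_zone(d, zno);
}
```

Sleeping under the bit lock is safe precisely because memory backing implies the queue is BLK_MQ_F_BLOCKING, as the comment in the diff notes.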
+3 -1
drivers/cpufreq/cpufreq.c
··· 2254 2254 return -EINVAL; 2255 2255 2256 2256 /* Platform doesn't want dynamic frequency switching ? */ 2257 - if (policy->governor->dynamic_switching && 2257 + if (policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING && 2258 2258 cpufreq_driver->flags & CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING) { 2259 2259 struct cpufreq_governor *gov = cpufreq_fallback_governor(); 2260 2260 ··· 2279 2279 return ret; 2280 2280 } 2281 2281 } 2282 + 2283 + policy->strict_target = !!(policy->governor->flags & CPUFREQ_GOV_STRICT_TARGET); 2282 2284 2283 2285 return 0; 2284 2286 }
+1 -1
drivers/cpufreq/cpufreq_governor.h
··· 156 156 #define CPUFREQ_DBS_GOVERNOR_INITIALIZER(_name_) \ 157 157 { \ 158 158 .name = _name_, \ 159 - .dynamic_switching = true, \ 159 + .flags = CPUFREQ_GOV_DYNAMIC_SWITCHING, \ 160 160 .owner = THIS_MODULE, \ 161 161 .init = cpufreq_dbs_governor_init, \ 162 162 .exit = cpufreq_dbs_governor_exit, \
+1
drivers/cpufreq/cpufreq_performance.c
··· 20 20 static struct cpufreq_governor cpufreq_gov_performance = { 21 21 .name = "performance", 22 22 .owner = THIS_MODULE, 23 + .flags = CPUFREQ_GOV_STRICT_TARGET, 23 24 .limits = cpufreq_gov_performance_limits, 24 25 }; 25 26
+1
drivers/cpufreq/cpufreq_powersave.c
··· 21 21 .name = "powersave", 22 22 .limits = cpufreq_gov_powersave_limits, 23 23 .owner = THIS_MODULE, 24 + .flags = CPUFREQ_GOV_STRICT_TARGET, 24 25 }; 25 26 26 27 MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
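The cpufreq hunks above turn the single `dynamic_switching` bool into a `flags` word, add a `CPUFREQ_GOV_STRICT_TARGET` flag for the performance and powersave governors, and latch it into `policy->strict_target` when a governor is set. The flag-word pattern can be sketched as follows (flag values and struct layouts are simplified for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; a bit per governor property means new
 * properties need no new struct members. */
#define GOV_DYNAMIC_SWITCHING (1u << 0)
#define GOV_STRICT_TARGET     (1u << 1)

struct governor { unsigned int flags; };
struct policy   { bool strict_target; };

/* Mirrors the policy->strict_target assignment added in cpufreq.c. */
static void policy_set_governor(struct policy *p, const struct governor *g)
{
    p->strict_target = !!(g->flags & GOV_STRICT_TARGET);
}
```

Drivers (here, intel_pstate in HWP mode) can then honor a strict-target governor by pinning the requested performance level exactly instead of leaving headroom up to `max_perf_ratio`.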
+9 -7
drivers/cpufreq/intel_pstate.c
··· 2527 2527 } 2528 2528 2529 2529 static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate, 2530 - bool fast_switch) 2530 + bool strict, bool fast_switch) 2531 2531 { 2532 2532 u64 prev = READ_ONCE(cpu->hwp_req_cached), value = prev; 2533 2533 ··· 2539 2539 * field in it, so opportunistically update the max too if needed. 2540 2540 */ 2541 2541 value &= ~HWP_MAX_PERF(~0L); 2542 - value |= HWP_MAX_PERF(cpu->max_perf_ratio); 2542 + value |= HWP_MAX_PERF(strict ? target_pstate : cpu->max_perf_ratio); 2543 2543 2544 2544 if (value == prev) 2545 2545 return; ··· 2562 2562 pstate_funcs.get_val(cpu, target_pstate)); 2563 2563 } 2564 2564 2565 - static int intel_cpufreq_update_pstate(struct cpudata *cpu, int target_pstate, 2566 - bool fast_switch) 2565 + static int intel_cpufreq_update_pstate(struct cpufreq_policy *policy, 2566 + int target_pstate, bool fast_switch) 2567 2567 { 2568 + struct cpudata *cpu = all_cpu_data[policy->cpu]; 2568 2569 int old_pstate = cpu->pstate.current_pstate; 2569 2570 2570 2571 target_pstate = intel_pstate_prepare_request(cpu, target_pstate); 2571 2572 if (hwp_active) { 2572 - intel_cpufreq_adjust_hwp(cpu, target_pstate, fast_switch); 2573 + intel_cpufreq_adjust_hwp(cpu, target_pstate, 2574 + policy->strict_target, fast_switch); 2573 2575 cpu->pstate.current_pstate = target_pstate; 2574 2576 } else if (target_pstate != old_pstate) { 2575 2577 intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch); ··· 2611 2609 break; 2612 2610 } 2613 2611 2614 - target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, false); 2612 + target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, false); 2615 2613 2616 2614 freqs.new = target_pstate * cpu->pstate.scaling; 2617 2615 ··· 2630 2628 2631 2629 target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling); 2632 2630 2633 2631 target_pstate = intel_cpufreq_update_pstate(policy,
target_pstate, true); 2634 2632 2635 2633 return target_pstate * cpu->pstate.scaling; 2636 2634 }
+1 -1
drivers/i2c/busses/Kconfig
··· 733 733 734 734 config I2C_MLXBF 735 735 tristate "Mellanox BlueField I2C controller" 736 - depends on ARM64 736 + depends on MELLANOX_PLATFORM && ARM64 737 737 help 738 738 Enabling this option will add I2C SMBus support for Mellanox BlueField 739 739 system.
+18 -32
drivers/i2c/busses/i2c-designware-slave.c
··· 159 159 u32 raw_stat, stat, enabled, tmp; 160 160 u8 val = 0, slave_activity; 161 161 162 - regmap_read(dev->map, DW_IC_INTR_STAT, &stat); 163 162 regmap_read(dev->map, DW_IC_ENABLE, &enabled); 164 163 regmap_read(dev->map, DW_IC_RAW_INTR_STAT, &raw_stat); 165 164 regmap_read(dev->map, DW_IC_STATUS, &tmp); ··· 167 168 if (!enabled || !(raw_stat & ~DW_IC_INTR_ACTIVITY) || !dev->slave) 168 169 return 0; 169 170 171 + stat = i2c_dw_read_clear_intrbits_slave(dev); 170 172 dev_dbg(dev->dev, 171 173 "%#x STATUS SLAVE_ACTIVITY=%#x : RAW_INTR_STAT=%#x : INTR_STAT=%#x\n", 172 174 enabled, slave_activity, raw_stat, stat); 173 175 174 - if ((stat & DW_IC_INTR_RX_FULL) && (stat & DW_IC_INTR_STOP_DET)) 175 - i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_REQUESTED, &val); 176 + if (stat & DW_IC_INTR_RX_FULL) { 177 + if (dev->status != STATUS_WRITE_IN_PROGRESS) { 178 + dev->status = STATUS_WRITE_IN_PROGRESS; 179 + i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_REQUESTED, 180 + &val); 181 + } 182 + 183 + regmap_read(dev->map, DW_IC_DATA_CMD, &tmp); 184 + val = tmp; 185 + if (!i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_RECEIVED, 186 + &val)) 187 + dev_vdbg(dev->dev, "Byte %X acked!", val); 188 + } 176 189 177 190 if (stat & DW_IC_INTR_RD_REQ) { 178 191 if (slave_activity) { 179 - if (stat & DW_IC_INTR_RX_FULL) { 180 - regmap_read(dev->map, DW_IC_DATA_CMD, &tmp); 181 - val = tmp; 192 + regmap_read(dev->map, DW_IC_CLR_RD_REQ, &tmp); 182 193 183 - if (!i2c_slave_event(dev->slave, 184 - I2C_SLAVE_WRITE_RECEIVED, 185 - &val)) { 186 - dev_vdbg(dev->dev, "Byte %X acked!", 187 - val); 188 - } 189 - regmap_read(dev->map, DW_IC_CLR_RD_REQ, &tmp); 190 - stat = i2c_dw_read_clear_intrbits_slave(dev); 191 - } else { 192 - regmap_read(dev->map, DW_IC_CLR_RD_REQ, &tmp); 193 - regmap_read(dev->map, DW_IC_CLR_RX_UNDER, &tmp); 194 - stat = i2c_dw_read_clear_intrbits_slave(dev); 195 - } 194 + dev->status = STATUS_READ_IN_PROGRESS; 196 195 if (!i2c_slave_event(dev->slave,
I2C_SLAVE_READ_REQUESTED, 198 197 &val)) ··· 202 205 if (!i2c_slave_event(dev->slave, I2C_SLAVE_READ_PROCESSED, 203 206 &val)) 204 207 regmap_read(dev->map, DW_IC_CLR_RX_DONE, &tmp); 205 - 206 - i2c_slave_event(dev->slave, I2C_SLAVE_STOP, &val); 207 - stat = i2c_dw_read_clear_intrbits_slave(dev); 208 - return 1; 209 208 } 210 209 211 - if (stat & DW_IC_INTR_RX_FULL) { 212 - regmap_read(dev->map, DW_IC_DATA_CMD, &tmp); 213 - val = tmp; 214 - if (!i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_RECEIVED, 215 - &val)) 216 - dev_vdbg(dev->dev, "Byte %X acked!", val); 217 - } else { 210 + if (stat & DW_IC_INTR_STOP_DET) { 211 + dev->status = STATUS_IDLE; 218 212 i2c_slave_event(dev->slave, I2C_SLAVE_STOP, &val); 219 - stat = i2c_dw_read_clear_intrbits_slave(dev); 220 213 } 221 214 222 215 return 1; ··· 217 230 struct dw_i2c_dev *dev = dev_id; 218 231 int ret; 219 232 220 - i2c_dw_read_clear_intrbits_slave(dev); 221 233 ret = i2c_dw_irq_handler_slave(dev); 222 234 if (ret > 0) 223 235 complete(&dev->cmd_complete);
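The i2c-designware-slave rework above introduces explicit status tracking (`dev->status`) so that `I2C_SLAVE_WRITE_REQUESTED` is reported only once at the start of a write transfer, subsequent RX-full interrupts deliver `I2C_SLAVE_WRITE_RECEIVED`, and STOP returns the handler to idle. A simplified state-machine sketch of that behavior (names mirror the diff, but the handler itself is reduced for illustration):

```c
#include <assert.h>

enum status { STATUS_IDLE, STATUS_WRITE_IN_PROGRESS, STATUS_READ_IN_PROGRESS };
enum event  { EV_NONE, EV_WRITE_REQUESTED, EV_WRITE_RECEIVED, EV_STOP };

struct slave { enum status status; };

/* RX FIFO full: the first byte of a new transfer opens the write,
 * later bytes are plain data events. */
static enum event on_rx_full(struct slave *s)
{
    if (s->status != STATUS_WRITE_IN_PROGRESS) {
        s->status = STATUS_WRITE_IN_PROGRESS;
        return EV_WRITE_REQUESTED;
    }
    return EV_WRITE_RECEIVED;
}

/* STOP detected: close the transfer and go back to idle. */
static enum event on_stop(struct slave *s)
{
    s->status = STATUS_IDLE;
    return EV_STOP;
}
```

Without the status field, the old handler could emit WRITE_REQUESTED only when RX-full and STOP happened to coincide, which is what the rework fixes.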
+86 -118
drivers/i2c/busses/i2c-mlxbf.c
··· 62 62 * Master. Default value is set to 400MHz. 63 63 */ 64 64 #define MLXBF_I2C_TYU_PLL_OUT_FREQ (400 * 1000 * 1000) 65 - /* Reference clock for Bluefield 1 - 156 MHz. */ 66 - #define MLXBF_I2C_TYU_PLL_IN_FREQ (156 * 1000 * 1000) 67 - /* Reference clock for BlueField 2 - 200 MHz. */ 68 - #define MLXBF_I2C_YU_PLL_IN_FREQ (200 * 1000 * 1000) 65 + /* Reference clock for Bluefield - 156 MHz. */ 66 + #define MLXBF_I2C_PLL_IN_FREQ (156 * 1000 * 1000) 69 67 70 68 /* Constant used to determine the PLL frequency. */ 71 69 #define MLNXBF_I2C_COREPLL_CONST 16384 ··· 487 489 488 490 #define MLXBF_I2C_FREQUENCY_1GHZ 1000000000 489 491 490 - static void mlxbf_i2c_write(void __iomem *io, int reg, u32 val) 491 - { 492 - writel(val, io + reg); 493 - } 494 - 495 - static u32 mlxbf_i2c_read(void __iomem *io, int reg) 496 - { 497 - return readl(io + reg); 498 - } 499 - 500 - /* 501 - * This function is used to read data from Master GW Data Descriptor. 502 - * Data bytes in the Master GW Data Descriptor are shifted left so the 503 - * data starts at the MSB of the descriptor registers as set by the 504 - * underlying hardware. TYU_READ_DATA enables byte swapping while 505 - * reading data bytes, and MUST be called by the SMBus read routines 506 - * to copy data from the 32 * 32-bit HW Data registers a.k.a Master GW 507 - * Data Descriptor. 508 - */ 509 - static u32 mlxbf_i2c_read_data(void __iomem *io, int reg) 510 - { 511 - return (u32)be32_to_cpu(mlxbf_i2c_read(io, reg)); 512 - } 513 - 514 - /* 515 - * This function is used to write data to the Master GW Data Descriptor. 516 - * Data copied to the Master GW Data Descriptor MUST be shifted left so 517 - * the data starts at the MSB of the descriptor registers as required by 518 - * the underlying hardware. TYU_WRITE_DATA enables byte swapping when 519 - * writing data bytes, and MUST be called by the SMBus write routines to 520 - * copy data to the 32 * 32-bit HW Data registers a.k.a Master GW Data 521 - * Descriptor.
522 - */ 523 - static void mlxbf_i2c_write_data(void __iomem *io, int reg, u32 val) 524 - { 525 - mlxbf_i2c_write(io, reg, (u32)cpu_to_be32(val)); 526 - } 527 - 528 492 /* 529 493 * Function to poll a set of bits at a specific address; it checks whether 530 494 * the bits are equal to zero when eq_zero is set to 'true', and not equal ··· 501 541 timeout = (timeout / MLXBF_I2C_POLL_FREQ_IN_USEC) + 1; 502 542 503 543 do { 504 - bits = mlxbf_i2c_read(io, addr) & mask; 544 + bits = readl(io + addr) & mask; 505 545 if (eq_zero ? bits == 0 : bits != 0) 506 546 return eq_zero ? 1 : bits; 507 547 udelay(MLXBF_I2C_POLL_FREQ_IN_USEC); ··· 569 609 MLXBF_I2C_SMBUS_TIMEOUT); 570 610 571 611 /* Read cause status bits. */ 572 - cause_status_bits = mlxbf_i2c_read(priv->mst_cause->io, 573 - MLXBF_I2C_CAUSE_ARBITER); 612 + cause_status_bits = readl(priv->mst_cause->io + 613 + MLXBF_I2C_CAUSE_ARBITER); 574 614 cause_status_bits &= MLXBF_I2C_CAUSE_MASTER_ARBITER_BITS_MASK; 575 615 576 616 /* 577 617 * Parse both Cause and Master GW bits, then return transaction status. 578 618 */ 579 619 580 - master_status_bits = mlxbf_i2c_read(priv->smbus->io, 581 - MLXBF_I2C_SMBUS_MASTER_STATUS); 620 + master_status_bits = readl(priv->smbus->io + 621 + MLXBF_I2C_SMBUS_MASTER_STATUS); 582 622 master_status_bits &= MLXBF_I2C_SMBUS_MASTER_STATUS_MASK; 583 623 584 624 if (mlxbf_i2c_smbus_transaction_success(master_status_bits, ··· 609 649 610 650 aligned_length = round_up(length, 4); 611 651 612 - /* Copy data bytes from 4-byte aligned source buffer. */ 652 + /* 653 + * Copy data bytes from 4-byte aligned source buffer. 654 + * Data copied to the Master GW Data Descriptor MUST be shifted 655 + * left so the data starts at the MSB of the descriptor registers 656 + * as required by the underlying hardware. Enable byte swapping 657 + * when writing data bytes to the 32 * 32-bit HW Data registers 658 + * a.k.a Master GW Data Descriptor.
659 + */ 613 660 for (offset = 0; offset < aligned_length; offset += sizeof(u32)) { 614 661 data32 = *((u32 *)(data + offset)); 615 - mlxbf_i2c_write_data(priv->smbus->io, addr + offset, data32); 662 + iowrite32be(data32, priv->smbus->io + addr + offset); 616 663 } 617 664 } 618 665 ··· 631 664 632 665 mask = sizeof(u32) - 1; 633 666 667 + /* 668 + * Data bytes in the Master GW Data Descriptor are shifted left 669 + * so the data starts at the MSB of the descriptor registers as 670 + * set by the underlying hardware. Enable byte swapping while 671 + * reading data bytes from the 32 * 32-bit HW Data registers 672 + * a.k.a Master GW Data Descriptor. 673 + */ 674 + 634 675 for (offset = 0; offset < (length & ~mask); offset += sizeof(u32)) { 635 - data32 = mlxbf_i2c_read_data(priv->smbus->io, addr + offset); 676 + data32 = ioread32be(priv->smbus->io + addr + offset); 636 677 *((u32 *)(data + offset)) = data32; 637 678 } 638 679 639 680 if (!(length & mask)) 640 681 return; 641 682 642 - data32 = mlxbf_i2c_read_data(priv->smbus->io, addr + offset); 683 + data32 = ioread32be(priv->smbus->io + addr + offset); 643 684 644 685 for (byte = 0; byte < (length & mask); byte++) { 645 686 data[offset + byte] = data32 & GENMASK(7, 0); ··· 673 698 command |= rol32(pec_en, MLXBF_I2C_MASTER_SEND_PEC_SHIFT); 674 699 675 700 /* Clear status bits. */ 676 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_MASTER_STATUS, 0x0); 701 + writel(0x0, priv->smbus->io + MLXBF_I2C_SMBUS_MASTER_STATUS); 677 702 /* Set the cause data. */ 678 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_CAUSE_OR_CLEAR, ~0x0); 703 + writel(~0x0, priv->smbus->io + MLXBF_I2C_CAUSE_OR_CLEAR); 679 704 /* Zero PEC byte. */ 680 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_MASTER_PEC, 0x0); 705 + writel(0x0, priv->smbus->io + MLXBF_I2C_SMBUS_MASTER_PEC); 681 706 /* Zero byte count. 
*/ 682 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_RS_BYTES, 0x0); 707 + writel(0x0, priv->smbus->io + MLXBF_I2C_SMBUS_RS_BYTES); 683 708 684 709 /* GW activation. */ 685 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_MASTER_GW, command); 710 + writel(command, priv->smbus->io + MLXBF_I2C_SMBUS_MASTER_GW); 686 711 687 712 /* 688 713 * Poll master status and check status bits. An ACK is sent when ··· 798 823 * needs to be 'manually' reset. This should be removed in 799 824 * next tag integration. 800 825 */ 801 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_MASTER_FSM, 802 - MLXBF_I2C_SMBUS_MASTER_FSM_PS_STATE_MASK); 826 + writel(MLXBF_I2C_SMBUS_MASTER_FSM_PS_STATE_MASK, 827 + priv->smbus->io + MLXBF_I2C_SMBUS_MASTER_FSM); 803 828 } 804 829 805 830 return ret; ··· 1088 1113 timer |= mlxbf_i2c_set_timer(priv, timings->scl_low, 1089 1114 false, MLXBF_I2C_MASK_16, 1090 1115 MLXBF_I2C_SHIFT_16); 1091 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_TIMER_SCL_LOW_SCL_HIGH, 1092 - timer); 1116 + writel(timer, priv->smbus->io + 1117 + MLXBF_I2C_SMBUS_TIMER_SCL_LOW_SCL_HIGH); 1093 1118 1094 1119 timer = mlxbf_i2c_set_timer(priv, timings->sda_rise, false, 1095 1120 MLXBF_I2C_MASK_8, MLXBF_I2C_SHIFT_0); ··· 1099 1124 MLXBF_I2C_MASK_8, MLXBF_I2C_SHIFT_16); 1100 1125 timer |= mlxbf_i2c_set_timer(priv, timings->scl_fall, false, 1101 1126 MLXBF_I2C_MASK_8, MLXBF_I2C_SHIFT_24); 1102 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_TIMER_FALL_RISE_SPIKE, 1103 - timer); 1127 + writel(timer, priv->smbus->io + 1128 + MLXBF_I2C_SMBUS_TIMER_FALL_RISE_SPIKE); 1104 1129 1105 1130 timer = mlxbf_i2c_set_timer(priv, timings->hold_start, true, 1106 1131 MLXBF_I2C_MASK_16, MLXBF_I2C_SHIFT_0); 1107 1132 timer |= mlxbf_i2c_set_timer(priv, timings->hold_data, true, 1108 1133 MLXBF_I2C_MASK_16, MLXBF_I2C_SHIFT_16); 1109 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_TIMER_THOLD, timer); 1134 + writel(timer, priv->smbus->io + MLXBF_I2C_SMBUS_TIMER_THOLD); 1110 1135 1111 1136
timer = mlxbf_i2c_set_timer(priv, timings->setup_start, true, 1112 1137 MLXBF_I2C_MASK_16, MLXBF_I2C_SHIFT_0); 1113 1138 timer |= mlxbf_i2c_set_timer(priv, timings->setup_stop, true, 1114 1139 MLXBF_I2C_MASK_16, MLXBF_I2C_SHIFT_16); 1115 - mlxbf_i2c_write(priv->smbus->io, 1116 - MLXBF_I2C_SMBUS_TIMER_TSETUP_START_STOP, timer); 1140 + writel(timer, priv->smbus->io + 1141 + MLXBF_I2C_SMBUS_TIMER_TSETUP_START_STOP); 1117 1142 1118 1143 timer = mlxbf_i2c_set_timer(priv, timings->setup_data, true, 1119 1144 MLXBF_I2C_MASK_16, MLXBF_I2C_SHIFT_0); 1120 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_TIMER_TSETUP_DATA, 1121 - timer); 1145 + writel(timer, priv->smbus->io + MLXBF_I2C_SMBUS_TIMER_TSETUP_DATA); 1122 1146 1123 1147 timer = mlxbf_i2c_set_timer(priv, timings->buf, false, 1124 1148 MLXBF_I2C_MASK_16, MLXBF_I2C_SHIFT_0); 1125 1149 timer |= mlxbf_i2c_set_timer(priv, timings->thigh_max, false, 1126 1150 MLXBF_I2C_MASK_16, MLXBF_I2C_SHIFT_16); 1127 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_THIGH_MAX_TBUF, 1128 - timer); 1151 + writel(timer, priv->smbus->io + MLXBF_I2C_SMBUS_THIGH_MAX_TBUF); 1129 1152 1130 1153 timer = timings->timeout; 1131 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_SCL_LOW_TIMEOUT, 1132 - timer); 1154 + writel(timer, priv->smbus->io + MLXBF_I2C_SMBUS_SCL_LOW_TIMEOUT); 1133 1155 } 1134 1156 1135 1157 enum mlxbf_i2c_timings_config { ··· 1398 1426 * platform firmware; disabling the bus might compromise the system 1399 1427 * functionality.
1400 1428 */ 1401 - config_reg = mlxbf_i2c_read(gpio_res->io, 1402 - MLXBF_I2C_GPIO_0_FUNC_EN_0); 1429 + config_reg = readl(gpio_res->io + MLXBF_I2C_GPIO_0_FUNC_EN_0); 1403 1430 config_reg = MLXBF_I2C_GPIO_SMBUS_GW_ASSERT_PINS(priv->bus, 1404 1431 config_reg); 1405 - mlxbf_i2c_write(gpio_res->io, MLXBF_I2C_GPIO_0_FUNC_EN_0, 1406 - config_reg); 1432 + writel(config_reg, gpio_res->io + MLXBF_I2C_GPIO_0_FUNC_EN_0); 1407 1433 1408 - config_reg = mlxbf_i2c_read(gpio_res->io, 1409 - MLXBF_I2C_GPIO_0_FORCE_OE_EN); 1434 + config_reg = readl(gpio_res->io + MLXBF_I2C_GPIO_0_FORCE_OE_EN); 1410 1435 config_reg = MLXBF_I2C_GPIO_SMBUS_GW_RESET_PINS(priv->bus, 1411 1436 config_reg); 1412 - mlxbf_i2c_write(gpio_res->io, MLXBF_I2C_GPIO_0_FORCE_OE_EN, 1413 - config_reg); 1437 + writel(config_reg, gpio_res->io + MLXBF_I2C_GPIO_0_FORCE_OE_EN); 1414 1438 1415 1439 mutex_unlock(gpio_res->lock); 1416 1440 ··· 1420 1452 u32 corepll_val; 1421 1453 u16 core_f; 1422 1454 1423 - pad_frequency = MLXBF_I2C_TYU_PLL_IN_FREQ; 1455 + pad_frequency = MLXBF_I2C_PLL_IN_FREQ; 1424 1456 1425 - corepll_val = mlxbf_i2c_read(corepll_res->io, 1426 - MLXBF_I2C_CORE_PLL_REG1); 1457 + corepll_val = readl(corepll_res->io + MLXBF_I2C_CORE_PLL_REG1); 1427 1458 1428 1459 /* Get Core PLL configuration bits. 
*/ 1429 1460 core_f = rol32(corepll_val, MLXBF_I2C_COREPLL_CORE_F_TYU_SHIFT) & ··· 1455 1488 u8 core_od, core_r; 1456 1489 u32 core_f; 1457 1490 1458 - pad_frequency = MLXBF_I2C_YU_PLL_IN_FREQ; 1491 + pad_frequency = MLXBF_I2C_PLL_IN_FREQ; 1459 1492 1460 - corepll_reg1_val = mlxbf_i2c_read(corepll_res->io, 1461 - MLXBF_I2C_CORE_PLL_REG1); 1462 - corepll_reg2_val = mlxbf_i2c_read(corepll_res->io, 1463 - MLXBF_I2C_CORE_PLL_REG2); 1493 + corepll_reg1_val = readl(corepll_res->io + MLXBF_I2C_CORE_PLL_REG1); 1494 + corepll_reg2_val = readl(corepll_res->io + MLXBF_I2C_CORE_PLL_REG2); 1464 1495 1465 1496 /* Get Core PLL configuration bits */ 1466 1497 core_f = rol32(corepll_reg1_val, MLXBF_I2C_COREPLL_CORE_F_YU_SHIFT) & ··· 1550 1585 * (7-bit address, 1 status bit (1 if enabled, 0 if not)). 1551 1586 */ 1552 1587 for (reg = 0; reg < reg_cnt; reg++) { 1553 - slave_reg = mlxbf_i2c_read(priv->smbus->io, 1588 + slave_reg = readl(priv->smbus->io + 1554 1589 MLXBF_I2C_SMBUS_SLAVE_ADDR_CFG + reg * 0x4); 1555 1590 /* 1556 1591 * Each register holds 4 slave addresses. So, we have to keep ··· 1608 1643 1609 1644 /* Enable the slave address and update the register. */ 1610 1645 slave_reg |= (1 << MLXBF_I2C_SMBUS_SLAVE_ADDR_EN_BIT) << (byte * 8); 1611 - mlxbf_i2c_write(priv->smbus->io, 1612 - MLXBF_I2C_SMBUS_SLAVE_ADDR_CFG + reg * 0x4, slave_reg); 1646 + writel(slave_reg, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_ADDR_CFG + 1647 + reg * 0x4); 1613 1648 1614 1649 return 0; 1615 1650 } ··· 1633 1668 * (7-bit address, 1 status bit (1 if enabled, 0 if not)). 1634 1669 */ 1635 1670 for (reg = 0; reg < reg_cnt; reg++) { 1636 - slave_reg = mlxbf_i2c_read(priv->smbus->io, 1671 + slave_reg = readl(priv->smbus->io + 1637 1672 MLXBF_I2C_SMBUS_SLAVE_ADDR_CFG + reg * 0x4); 1638 1673 1639 1674 /* Check whether the address slots are empty. */ ··· 1673 1708 1674 1709 /* Cleanup the slave address slot. 
*/ 1675 1710 slave_reg &= ~(GENMASK(7, 0) << (slave_byte * 8)); 1676 - mlxbf_i2c_write(priv->smbus->io, 1677 - MLXBF_I2C_SMBUS_SLAVE_ADDR_CFG + reg * 0x4, slave_reg); 1711 + writel(slave_reg, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_ADDR_CFG + 1712 + reg * 0x4); 1678 1713 1679 1714 return 0; 1680 1715 } ··· 1766 1801 int ret; 1767 1802 1768 1803 /* Reset FSM. */ 1769 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_SLAVE_FSM, 0); 1804 + writel(0, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_FSM); 1770 1805 1771 1806 /* 1772 1807 * Enable slave cause interrupt bits. Drive ··· 1775 1810 * masters issue a Read and Write, respectively. But, clear all 1776 1811 * interrupts first. 1777 1812 */ 1778 - mlxbf_i2c_write(priv->slv_cause->io, 1779 - MLXBF_I2C_CAUSE_OR_CLEAR, ~0); 1813 + writel(~0, priv->slv_cause->io + MLXBF_I2C_CAUSE_OR_CLEAR); 1780 1814 int_reg = MLXBF_I2C_CAUSE_READ_WAIT_FW_RESPONSE; 1781 1815 int_reg |= MLXBF_I2C_CAUSE_WRITE_SUCCESS; 1782 - mlxbf_i2c_write(priv->slv_cause->io, 1783 - MLXBF_I2C_CAUSE_OR_EVTEN0, int_reg); 1816 + writel(int_reg, priv->slv_cause->io + MLXBF_I2C_CAUSE_OR_EVTEN0); 1784 1817 1785 1818 /* Finally, set the 'ready' bit to start handling transactions. */ 1786 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_SLAVE_READY, 0x1); 1819 + writel(0x1, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_READY); 1787 1820 1788 1821 /* Initialize the cause coalesce resource. */ 1789 1822 ret = mlxbf_i2c_init_coalesce(pdev, priv); ··· 1807 1844 MLXBF_I2C_CAUSE_YU_SLAVE_BIT : 1808 1845 priv->bus + MLXBF_I2C_CAUSE_TYU_SLAVE_BIT; 1809 1846 1810 - coalesce0_reg = mlxbf_i2c_read(priv->coalesce->io, 1811 - MLXBF_I2C_CAUSE_COALESCE_0); 1847 + coalesce0_reg = readl(priv->coalesce->io + MLXBF_I2C_CAUSE_COALESCE_0); 1812 1848 is_set = coalesce0_reg & (1 << slave_shift); 1813 1849 1814 1850 if (!is_set) 1815 1851 return false; 1816 1852 1817 1853 /* Check the source of the interrupt, i.e. whether a Read or Write. 
*/ 1818 - cause_reg = mlxbf_i2c_read(priv->slv_cause->io, 1819 - MLXBF_I2C_CAUSE_ARBITER); 1854 + cause_reg = readl(priv->slv_cause->io + MLXBF_I2C_CAUSE_ARBITER); 1820 1855 if (cause_reg & MLXBF_I2C_CAUSE_READ_WAIT_FW_RESPONSE) 1821 1856 *read = true; 1822 1857 else if (cause_reg & MLXBF_I2C_CAUSE_WRITE_SUCCESS) 1823 1858 *write = true; 1824 1859 1825 1860 /* Clear cause bits. */ 1826 - mlxbf_i2c_write(priv->slv_cause->io, MLXBF_I2C_CAUSE_OR_CLEAR, ~0x0); 1861 + writel(~0x0, priv->slv_cause->io + MLXBF_I2C_CAUSE_OR_CLEAR); 1827 1862 1828 1863 return true; 1829 1864 } ··· 1861 1900 * address, if supplied. 1862 1901 */ 1863 1902 if (recv_bytes > 0) { 1864 - data32 = mlxbf_i2c_read_data(priv->smbus->io, 1865 - MLXBF_I2C_SLAVE_DATA_DESC_ADDR); 1903 + data32 = ioread32be(priv->smbus->io + 1904 + MLXBF_I2C_SLAVE_DATA_DESC_ADDR); 1866 1905 1867 1906 /* Parse the received bytes. */ 1868 1907 switch (recv_bytes) { ··· 1927 1966 control32 |= rol32(write_size, MLXBF_I2C_SLAVE_WRITE_BYTES_SHIFT); 1928 1967 control32 |= rol32(pec_en, MLXBF_I2C_SLAVE_SEND_PEC_SHIFT); 1929 1968 1930 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_SLAVE_GW, control32); 1969 + writel(control32, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_GW); 1931 1970 1932 1971 /* 1933 1972 * Wait until the transfer is completed; the driver will wait ··· 1936 1975 mlxbf_smbus_slave_wait_for_idle(priv, MLXBF_I2C_SMBUS_TIMEOUT); 1937 1976 1938 1977 /* Release the Slave GW. 
*/ 1939 - mlxbf_i2c_write(priv->smbus->io, 1940 - MLXBF_I2C_SMBUS_SLAVE_RS_MASTER_BYTES, 0x0); 1941 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_SLAVE_PEC, 0x0); 1942 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_SLAVE_READY, 0x1); 1978 + writel(0x0, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_RS_MASTER_BYTES); 1979 + writel(0x0, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_PEC); 1980 + writel(0x1, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_READY); 1943 1981 1944 1982 return 0; 1945 1983 } ··· 1983 2023 i2c_slave_event(slave, I2C_SLAVE_STOP, &value); 1984 2024 1985 2025 /* Release the Slave GW. */ 1986 - mlxbf_i2c_write(priv->smbus->io, 1987 - MLXBF_I2C_SMBUS_SLAVE_RS_MASTER_BYTES, 0x0); 1988 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_SLAVE_PEC, 0x0); 1989 - mlxbf_i2c_write(priv->smbus->io, MLXBF_I2C_SMBUS_SLAVE_READY, 0x1); 2026 + writel(0x0, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_RS_MASTER_BYTES); 2027 + writel(0x0, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_PEC); 2028 + writel(0x1, priv->smbus->io + MLXBF_I2C_SMBUS_SLAVE_READY); 1990 2029 1991 2030 return ret; 1992 2031 } ··· 2020 2061 * slave, if the higher 8 bits are sent then the slave expect N bytes 2021 2062 * from the master. 
2022 2063 */ 2023 - rw_bytes_reg = mlxbf_i2c_read(priv->smbus->io, 2024 - MLXBF_I2C_SMBUS_SLAVE_RS_MASTER_BYTES); 2064 + rw_bytes_reg = readl(priv->smbus->io + 2065 + MLXBF_I2C_SMBUS_SLAVE_RS_MASTER_BYTES); 2025 2066 recv_bytes = (rw_bytes_reg >> 8) & GENMASK(7, 0); 2026 2067 2027 2068 /* ··· 2223 2264 2224 2265 MODULE_DEVICE_TABLE(of, mlxbf_i2c_dt_ids); 2225 2266 2267 + #ifdef CONFIG_ACPI 2226 2268 static const struct acpi_device_id mlxbf_i2c_acpi_ids[] = { 2227 2269 { "MLNXBF03", (kernel_ulong_t)&mlxbf_i2c_chip[MLXBF_I2C_CHIP_TYPE_1] }, 2228 2270 { "MLNXBF23", (kernel_ulong_t)&mlxbf_i2c_chip[MLXBF_I2C_CHIP_TYPE_2] }, ··· 2265 2305 2266 2306 return ret; 2267 2307 } 2308 + #else 2309 + static int mlxbf_i2c_acpi_probe(struct device *dev, struct mlxbf_i2c_priv *priv) 2310 + { 2311 + return -ENOENT; 2312 + } 2313 + #endif /* CONFIG_ACPI */ 2268 2314 2269 2315 static int mlxbf_i2c_of_probe(struct device *dev, struct mlxbf_i2c_priv *priv) 2270 2316 { ··· 2439 2473 .driver = { 2440 2474 .name = "i2c-mlxbf", 2441 2475 .of_match_table = mlxbf_i2c_dt_ids, 2476 + #ifdef CONFIG_ACPI 2442 2477 .acpi_match_table = ACPI_PTR(mlxbf_i2c_acpi_ids), 2478 + #endif /* CONFIG_ACPI */ 2443 2479 }, 2444 2480 }; 2445 2481 ··· 2470 2502 module_exit(mlxbf_i2c_exit); 2471 2503 2472 2504 MODULE_DESCRIPTION("Mellanox BlueField I2C bus driver"); 2473 - MODULE_AUTHOR("Khalil Blaiech <kblaiech@mellanox.com>"); 2505 + MODULE_AUTHOR("Khalil Blaiech <kblaiech@nvidia.com>"); 2474 2506 MODULE_LICENSE("GPL v2");
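The timing hunks above repeatedly OR together results of `mlxbf_i2c_set_timer()` before a single `writel()`. The underlying pattern is plain mask-and-shift field packing; the sketch below is a hypothetical standalone model of it (`pack_field()` and `pack_scl_timer()` are illustrative names, not the driver's API, and the field layout mirrors the `MLXBF_I2C_MASK_16` / `MLXBF_I2C_SHIFT_16` usage in the diff).

```c
#include <stdint.h>

/* Illustrative sketch, not driver code: each timing value is masked to
 * its field width and shifted to its field position, then the fields
 * are OR-ed into one 32-bit register value written with writel(). */
static uint32_t pack_field(uint32_t val, uint32_t mask, unsigned int shift)
{
	return (val & mask) << shift;
}

/* Assumed layout for illustration: SCL-low in bits 15:0 and SCL-high
 * in bits 31:16, as with the SCL_LOW_SCL_HIGH register above. */
static uint32_t pack_scl_timer(uint32_t scl_low, uint32_t scl_high)
{
	return pack_field(scl_low, 0xffff, 0) |
	       pack_field(scl_high, 0xffff, 16);
}
```

Note that values wider than the mask are silently truncated, which is also why the driver masks before shifting.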
+4 -4
drivers/i2c/busses/i2c-mt65xx.c
··· 475 475 { 476 476 u16 control_reg; 477 477 478 + writel(I2C_DMA_HARD_RST, i2c->pdmabase + OFFSET_RST); 479 + udelay(50); 480 + writel(I2C_DMA_CLR_FLAG, i2c->pdmabase + OFFSET_RST); 481 + 478 482 mtk_i2c_writew(i2c, I2C_SOFT_RST, OFFSET_SOFTRESET); 479 483 480 484 /* Set ioconfig */ ··· 533 529 534 530 mtk_i2c_writew(i2c, control_reg, OFFSET_CONTROL); 535 531 mtk_i2c_writew(i2c, I2C_DELAY_LEN, OFFSET_DELAY_LEN); 536 - 537 - writel(I2C_DMA_HARD_RST, i2c->pdmabase + OFFSET_RST); 538 - udelay(50); 539 - writel(I2C_DMA_CLR_FLAG, i2c->pdmabase + OFFSET_RST); 540 532 } 541 533 542 534 static const struct i2c_spec_values *mtk_i2c_get_spec(unsigned int speed)
+65 -19
drivers/i2c/busses/i2c-sh_mobile.c
··· 129 129 int sr; 130 130 bool send_stop; 131 131 bool stop_after_dma; 132 + bool atomic_xfer; 132 133 133 134 struct resource *res; 134 135 struct dma_chan *dma_tx; ··· 331 330 ret = iic_rd(pd, ICDR); 332 331 break; 333 332 case OP_RX_STOP: /* enable DTE interrupt, issue stop */ 334 - iic_wr(pd, ICIC, 335 - ICIC_DTEE | ICIC_WAITE | ICIC_ALE | ICIC_TACKE); 333 + if (!pd->atomic_xfer) 334 + iic_wr(pd, ICIC, 335 + ICIC_DTEE | ICIC_WAITE | ICIC_ALE | ICIC_TACKE); 336 336 iic_wr(pd, ICCR, ICCR_ICE | ICCR_RACK); 337 337 break; 338 338 case OP_RX_STOP_DATA: /* enable DTE interrupt, read data, issue stop */ 339 - iic_wr(pd, ICIC, 340 - ICIC_DTEE | ICIC_WAITE | ICIC_ALE | ICIC_TACKE); 339 + if (!pd->atomic_xfer) 340 + iic_wr(pd, ICIC, 341 + ICIC_DTEE | ICIC_WAITE | ICIC_ALE | ICIC_TACKE); 341 342 ret = iic_rd(pd, ICDR); 342 343 iic_wr(pd, ICCR, ICCR_ICE | ICCR_RACK); 343 344 break; ··· 432 429 433 430 if (wakeup) { 434 431 pd->sr |= SW_DONE; 435 - wake_up(&pd->wait); 432 + if (!pd->atomic_xfer) 433 + wake_up(&pd->wait); 436 434 } 437 435 438 436 /* defeat write posting to avoid spurious WAIT interrupts */ ··· 585 581 pd->pos = -1; 586 582 pd->sr = 0; 587 583 584 + if (pd->atomic_xfer) 585 + return; 586 + 588 587 pd->dma_buf = i2c_get_dma_safe_msg_buf(pd->msg, 8); 589 588 if (pd->dma_buf) 590 589 sh_mobile_i2c_xfer_dma(pd); ··· 644 637 return i ? 0 : -ETIMEDOUT; 645 638 } 646 639 647 - static int sh_mobile_i2c_xfer(struct i2c_adapter *adapter, 648 - struct i2c_msg *msgs, 649 - int num) 640 + static int sh_mobile_xfer(struct sh_mobile_i2c_data *pd, 641 + struct i2c_msg *msgs, int num) 650 642 { 651 - struct sh_mobile_i2c_data *pd = i2c_get_adapdata(adapter); 652 643 struct i2c_msg *msg; 653 644 int err = 0; 654 645 int i; 655 - long timeout; 646 + long time_left; 656 647 657 648 /* Wake up device and enable clock */ 658 649 pm_runtime_get_sync(pd->dev); ··· 667 662 if (do_start) 668 663 i2c_op(pd, OP_START); 669 664 670 - /* The interrupt handler takes care of the rest... 
*/ 671 - timeout = wait_event_timeout(pd->wait, 672 - pd->sr & (ICSR_TACK | SW_DONE), 673 - adapter->timeout); 665 + if (pd->atomic_xfer) { 666 + unsigned long j = jiffies + pd->adap.timeout; 674 667 675 - /* 'stop_after_dma' tells if DMA transfer was complete */ 676 - i2c_put_dma_safe_msg_buf(pd->dma_buf, pd->msg, pd->stop_after_dma); 668 + time_left = time_before_eq(jiffies, j); 669 + while (time_left && 670 + !(pd->sr & (ICSR_TACK | SW_DONE))) { 671 + unsigned char sr = iic_rd(pd, ICSR); 677 672 678 - if (!timeout) { 673 + if (sr & (ICSR_AL | ICSR_TACK | 674 + ICSR_WAIT | ICSR_DTE)) { 675 + sh_mobile_i2c_isr(0, pd); 676 + udelay(150); 677 + } else { 678 + cpu_relax(); 679 + } 680 + time_left = time_before_eq(jiffies, j); 681 + } 682 + } else { 683 + /* The interrupt handler takes care of the rest... */ 684 + time_left = wait_event_timeout(pd->wait, 685 + pd->sr & (ICSR_TACK | SW_DONE), 686 + pd->adap.timeout); 687 + 688 + /* 'stop_after_dma' tells if DMA xfer was complete */ 689 + i2c_put_dma_safe_msg_buf(pd->dma_buf, pd->msg, 690 + pd->stop_after_dma); 691 + } 692 + 693 + if (!time_left) { 679 694 dev_err(pd->dev, "Transfer request timed out\n"); 680 695 if (pd->dma_direction != DMA_NONE) 681 696 sh_mobile_i2c_cleanup_dma(pd); ··· 721 696 return err ?: num; 722 697 } 723 698 699 + static int sh_mobile_i2c_xfer(struct i2c_adapter *adapter, 700 + struct i2c_msg *msgs, 701 + int num) 702 + { 703 + struct sh_mobile_i2c_data *pd = i2c_get_adapdata(adapter); 704 + 705 + pd->atomic_xfer = false; 706 + return sh_mobile_xfer(pd, msgs, num); 707 + } 708 + 709 + static int sh_mobile_i2c_xfer_atomic(struct i2c_adapter *adapter, 710 + struct i2c_msg *msgs, 711 + int num) 712 + { 713 + struct sh_mobile_i2c_data *pd = i2c_get_adapdata(adapter); 714 + 715 + pd->atomic_xfer = true; 716 + return sh_mobile_xfer(pd, msgs, num); 717 + } 718 + 724 719 static u32 sh_mobile_i2c_func(struct i2c_adapter *adapter) 725 720 { 726 721 return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL | 
I2C_FUNC_PROTOCOL_MANGLING; 727 722 } 728 723 729 724 static const struct i2c_algorithm sh_mobile_i2c_algorithm = { 730 - .functionality = sh_mobile_i2c_func, 731 - .master_xfer = sh_mobile_i2c_xfer, 725 + .functionality = sh_mobile_i2c_func, 726 + .master_xfer = sh_mobile_i2c_xfer, 727 + .master_xfer_atomic = sh_mobile_i2c_xfer_atomic, 732 728 }; 733 729 734 730 static const struct i2c_adapter_quirks sh_mobile_i2c_quirks = {
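The atomic path added above cannot sleep in `wait_event_timeout()`, so it busy-polls the status register until a jiffies deadline passes (`time_before_eq(jiffies, j)`). A minimal model of that wait strategy, with "jiffies" simulated by a tick counter and `poll_status()` as a hypothetical stand-in for reading ICSR and invoking the ISR by hand:

```c
#include <stdbool.h>

/* Simulated clock: each status poll advances time by one tick, and the
 * "transfer" completes at done_at. Purely illustrative. */
struct poll_sim {
	unsigned long now;      /* simulated jiffies */
	unsigned long done_at;  /* tick at which the condition becomes true */
};

static bool poll_status(struct poll_sim *s)
{
	s->now++;               /* reading status costs one tick here */
	return s->now >= s->done_at;
}

/* Poll until the condition holds or the deadline passes; returns true
 * on success, false on timeout, mirroring the !time_left check above. */
static bool poll_until(struct poll_sim *s, unsigned long deadline)
{
	while (s->now <= deadline) {   /* time_before_eq(jiffies, j) */
		if (poll_status(s))
			return true;
	}
	return false;
}
```

The design point is that the deadline is re-evaluated on every iteration, so a transfer that never completes still terminates the loop.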
+2 -3
drivers/iommu/intel/iommu.c
··· 3818 3818 * page aligned, we don't need to use a bounce page. 3819 3819 */ 3820 3820 if (!IS_ALIGNED(paddr | size, VTD_PAGE_SIZE)) { 3821 - tlb_addr = swiotlb_tbl_map_single(dev, 3822 - phys_to_dma_unencrypted(dev, io_tlb_start), 3823 - paddr, size, aligned_size, dir, attrs); 3821 + tlb_addr = swiotlb_tbl_map_single(dev, paddr, size, 3822 + aligned_size, dir, attrs); 3824 3823 if (tlb_addr == DMA_MAPPING_ERROR) { 3825 3824 goto swiotlb_error; 3826 3825 } else {
+1 -2
drivers/irqchip/Kconfig
··· 180 180 select GENERIC_IRQ_CHIP 181 181 select GENERIC_IRQ_IPI if SYS_SUPPORTS_MULTITHREADING 182 182 select IRQ_DOMAIN 183 - select IRQ_DOMAIN_HIERARCHY if GENERIC_IRQ_IPI 184 183 select GENERIC_IRQ_EFFECTIVE_AFF_MASK 185 184 186 185 config CLPS711X_IRQCHIP ··· 314 315 config MIPS_GIC 315 316 bool 316 317 select GENERIC_IRQ_IPI 317 - select IRQ_DOMAIN_HIERARCHY 318 318 select MIPS_CM 319 319 320 320 config INGENIC_IRQ ··· 589 591 590 592 config MST_IRQ 591 593 bool "MStar Interrupt Controller" 594 + depends on ARCH_MEDIATEK || ARCH_MSTARV7 || COMPILE_TEST 592 595 default ARCH_MEDIATEK 593 596 select IRQ_DOMAIN 594 597 select IRQ_DOMAIN_HIERARCHY
+1 -1
drivers/irqchip/irq-bcm2836.c
··· 244 244 245 245 #define BITS_PER_MBOX 32 246 246 247 - static void bcm2836_arm_irqchip_smp_init(void) 247 + static void __init bcm2836_arm_irqchip_smp_init(void) 248 248 { 249 249 struct irq_fwspec ipi_fwspec = { 250 250 .fwnode = intc.domain->fwnode,
+2 -2
drivers/irqchip/irq-mst-intc.c
··· 154 154 .free = irq_domain_free_irqs_common, 155 155 }; 156 156 157 - int __init 158 - mst_intc_of_init(struct device_node *dn, struct device_node *parent) 157 + static int __init mst_intc_of_init(struct device_node *dn, 158 + struct device_node *parent) 159 159 { 160 160 struct irq_domain *domain, *domain_parent; 161 161 struct mst_intc_chip_data *cd;
+3 -5
drivers/irqchip/irq-renesas-intc-irqpin.c
··· 71 71 }; 72 72 73 73 struct intc_irqpin_config { 74 - unsigned int irlm_bit; 75 - unsigned needs_irlm:1; 74 + int irlm_bit; /* -1 if non-existent */ 76 75 }; 77 76 78 77 static unsigned long intc_irqpin_read32(void __iomem *iomem) ··· 348 349 349 350 static const struct intc_irqpin_config intc_irqpin_irlm_r8a777x = { 350 351 .irlm_bit = 23, /* ICR0.IRLM0 */ 351 - .needs_irlm = 1, 352 352 }; 353 353 354 354 static const struct intc_irqpin_config intc_irqpin_rmobile = { 355 - .needs_irlm = 0, 355 + .irlm_bit = -1, 356 356 }; 357 357 358 358 static const struct of_device_id intc_irqpin_dt_ids[] = { ··· 468 470 } 469 471 470 472 /* configure "individual IRQ mode" where needed */ 471 - if (config && config->needs_irlm) { 473 + if (config && config->irlm_bit >= 0) { 472 474 if (io[INTC_IRQPIN_REG_IRLM]) 473 475 intc_irqpin_read_modify_write(p, INTC_IRQPIN_REG_IRLM, 474 476 config->irlm_bit, 1, 1);
+5 -5
drivers/irqchip/irq-sifive-plic.c
··· 99 99 struct irq_data *d, int enable) 100 100 { 101 101 int cpu; 102 - struct plic_priv *priv = irq_get_chip_data(d->irq); 102 + struct plic_priv *priv = irq_data_get_irq_chip_data(d); 103 103 104 104 writel(enable, priv->regs + PRIORITY_BASE + d->hwirq * PRIORITY_PER_ID); 105 105 for_each_cpu(cpu, mask) { ··· 115 115 { 116 116 struct cpumask amask; 117 117 unsigned int cpu; 118 - struct plic_priv *priv = irq_get_chip_data(d->irq); 118 + struct plic_priv *priv = irq_data_get_irq_chip_data(d); 119 119 120 120 cpumask_and(&amask, &priv->lmask, cpu_online_mask); 121 121 cpu = cpumask_any_and(irq_data_get_affinity_mask(d), ··· 127 127 128 128 static void plic_irq_mask(struct irq_data *d) 129 129 { 130 - struct plic_priv *priv = irq_get_chip_data(d->irq); 130 + struct plic_priv *priv = irq_data_get_irq_chip_data(d); 131 131 132 132 plic_irq_toggle(&priv->lmask, d, 0); 133 133 } ··· 138 138 { 139 139 unsigned int cpu; 140 140 struct cpumask amask; 141 - struct plic_priv *priv = irq_get_chip_data(d->irq); 141 + struct plic_priv *priv = irq_data_get_irq_chip_data(d); 142 142 143 143 cpumask_and(&amask, &priv->lmask, mask_val); 144 144 ··· 151 151 return -EINVAL; 152 152 153 153 plic_irq_toggle(&priv->lmask, d, 0); 154 - plic_irq_toggle(cpumask_of(cpu), d, 1); 154 + plic_irq_toggle(cpumask_of(cpu), d, !irqd_irq_masked(d)); 155 155 156 156 irq_data_update_effective_affinity(d, cpumask_of(cpu)); 157 157
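The subtle part of the PLIC hunk is the last change: on an affinity move the enable bit on the new CPU must reflect the interrupt's current mask state (`!irqd_irq_masked(d)`) rather than being forced on, otherwise migrating a masked interrupt would unmask it. A toy model of that invariant (the struct and helper are hypothetical, not kernel API):

```c
#include <stdbool.h>

/* Toy model: one per-CPU enable bit per interrupt, plus the logical
 * mask state the genirq core tracks via irqd_irq_masked(). */
struct model_irq {
	bool masked;        /* stands in for irqd_irq_masked(d) */
	bool enabled[4];    /* per-CPU enable bits, as in the PLIC */
};

static void model_set_affinity(struct model_irq *d, int old_cpu, int new_cpu)
{
	d->enabled[old_cpu] = false;        /* plic_irq_toggle(..., d, 0) */
	d->enabled[new_cpu] = !d->masked;   /* the fixed line: keep mask state */
}
```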
+4
drivers/irqchip/irq-stm32-exti.c
··· 195 195 { .exti = 25, .irq_parent = 107, .chip = &stm32_exti_h_chip_direct }, 196 196 { .exti = 30, .irq_parent = 52, .chip = &stm32_exti_h_chip_direct }, 197 197 { .exti = 47, .irq_parent = 93, .chip = &stm32_exti_h_chip_direct }, 198 + { .exti = 48, .irq_parent = 138, .chip = &stm32_exti_h_chip_direct }, 199 + { .exti = 50, .irq_parent = 139, .chip = &stm32_exti_h_chip_direct }, 200 + { .exti = 52, .irq_parent = 140, .chip = &stm32_exti_h_chip_direct }, 201 + { .exti = 53, .irq_parent = 141, .chip = &stm32_exti_h_chip_direct }, 198 202 { .exti = 54, .irq_parent = 135, .chip = &stm32_exti_h_chip_direct }, 199 203 { .exti = 61, .irq_parent = 100, .chip = &stm32_exti_h_chip_direct }, 200 204 { .exti = 65, .irq_parent = 144, .chip = &stm32_exti_h_chip },
+80 -3
drivers/irqchip/irq-ti-sci-inta.c
··· 85 85 * @base: Base address of the memory mapped IO registers 86 86 * @pdev: Pointer to platform device. 87 87 * @ti_sci_id: TI-SCI device identifier 88 + * @unmapped_cnt: Number of @unmapped_dev_ids entries 89 + * @unmapped_dev_ids: Pointer to an array of TI-SCI device identifiers of 90 + * unmapped event sources. 91 + * Unmapped Events are not part of the Global Event Map and 92 + * they are converted to Global event within INTA to be 93 + * received by the same INTA to generate an interrupt. 94 + * In case an interrupt request comes for a device which is 95 + * generating Unmapped Event, we must use the INTA's TI-SCI 96 + * device identifier in place of the source device 97 + * identifier to let sysfw know where it has to program the 98 + * Global Event number. 88 99 */ 89 100 struct ti_sci_inta_irq_domain { 90 101 const struct ti_sci_handle *sci; ··· 107 96 void __iomem *base; 108 97 struct platform_device *pdev; 109 98 u32 ti_sci_id; 99 + 100 + int unmapped_cnt; 101 + u16 *unmapped_dev_ids; 110 102 }; 111 103 112 104 #define to_vint_desc(e, i) container_of(e, struct ti_sci_inta_vint_desc, \ 113 105 events[i]) 106 + 107 + static u16 ti_sci_inta_get_dev_id(struct ti_sci_inta_irq_domain *inta, u32 hwirq) 108 + { 109 + u16 dev_id = HWIRQ_TO_DEVID(hwirq); 110 + int i; 111 + 112 + if (inta->unmapped_cnt == 0) 113 + return dev_id; 114 + 115 + /* 116 + * For devices sending Unmapped Events we must use the INTA's TI-SCI 117 + * device identifier number to be able to convert it to a Global Event 118 + * and map it to an interrupt. 
119 + */ 120 + for (i = 0; i < inta->unmapped_cnt; i++) { 121 + if (dev_id == inta->unmapped_dev_ids[i]) { 122 + dev_id = inta->ti_sci_id; 123 + break; 124 + } 125 + } 126 + 127 + return dev_id; 128 + } 114 129 115 130 /** 116 131 * ti_sci_inta_irq_handler() - Chained IRQ handler for the vint irqs ··· 288 251 u16 dev_id, dev_index; 289 252 int err; 290 253 291 - dev_id = HWIRQ_TO_DEVID(hwirq); 254 + dev_id = ti_sci_inta_get_dev_id(inta, hwirq); 292 255 dev_index = HWIRQ_TO_IRQID(hwirq); 293 256 294 257 event_desc = &vint_desc->events[free_bit]; ··· 389 352 { 390 353 struct ti_sci_inta_vint_desc *vint_desc; 391 354 struct ti_sci_inta_irq_domain *inta; 355 + u16 dev_id; 392 356 393 357 vint_desc = to_vint_desc(event_desc, event_desc->vint_bit); 394 358 inta = vint_desc->domain->host_data; 359 + dev_id = ti_sci_inta_get_dev_id(inta, hwirq); 395 360 /* free event irq */ 396 361 mutex_lock(&inta->vint_mutex); 397 362 inta->sci->ops.rm_irq_ops.free_event_map(inta->sci, 398 - HWIRQ_TO_DEVID(hwirq), 399 - HWIRQ_TO_IRQID(hwirq), 363 + dev_id, HWIRQ_TO_IRQID(hwirq), 400 364 inta->ti_sci_id, 401 365 vint_desc->vint_id, 402 366 event_desc->global_event, ··· 612 574 .chip = &ti_sci_inta_msi_irq_chip, 613 575 }; 614 576 577 + static int ti_sci_inta_get_unmapped_sources(struct ti_sci_inta_irq_domain *inta) 578 + { 579 + struct device *dev = &inta->pdev->dev; 580 + struct device_node *node = dev_of_node(dev); 581 + struct of_phandle_iterator it; 582 + int count, err, ret, i; 583 + 584 + count = of_count_phandle_with_args(node, "ti,unmapped-event-sources", NULL); 585 + if (count <= 0) 586 + return 0; 587 + 588 + inta->unmapped_dev_ids = devm_kcalloc(dev, count, 589 + sizeof(*inta->unmapped_dev_ids), 590 + GFP_KERNEL); 591 + if (!inta->unmapped_dev_ids) 592 + return -ENOMEM; 593 + 594 + i = 0; 595 + of_for_each_phandle(&it, err, node, "ti,unmapped-event-sources", NULL, 0) { 596 + u32 dev_id; 597 + 598 + ret = of_property_read_u32(it.node, "ti,sci-dev-id", &dev_id); 599 + if (ret) { 
600 + dev_err(dev, "ti,sci-dev-id read failure for %pOFf\n", it.node); 601 + of_node_put(it.node); 602 + return ret; 603 + } 604 + inta->unmapped_dev_ids[i++] = dev_id; 605 + } 606 + 607 + inta->unmapped_cnt = count; 608 + 609 + return 0; 610 + } 611 + 615 612 static int ti_sci_inta_irq_domain_probe(struct platform_device *pdev) 616 613 { 617 614 struct irq_domain *parent_domain, *domain, *msi_domain; ··· 701 628 inta->base = devm_ioremap_resource(dev, res); 702 629 if (IS_ERR(inta->base)) 703 630 return PTR_ERR(inta->base); 631 + 632 + ret = ti_sci_inta_get_unmapped_sources(inta); 633 + if (ret) 634 + return ret; 704 635 705 636 domain = irq_domain_add_linear(dev_of_node(dev), 706 637 ti_sci_get_num_resources(inta->vint),
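The id-substitution logic that `ti_sci_inta_get_dev_id()` adds above is a simple table lookup: if the source device appears in the unmapped-event list, the INTA's own TI-SCI id is used instead so sysfw programs the Global Event against the INTA. A self-contained rewrite for illustration (field names simplified; not the kernel struct):

```c
#include <stdint.h>

/* Simplified model of the relevant ti_sci_inta_irq_domain fields. */
struct inta_model {
	uint16_t ti_sci_id;               /* the INTA's own TI-SCI id */
	int unmapped_cnt;
	const uint16_t *unmapped_dev_ids; /* unmapped-event source ids */
};

static uint16_t inta_get_dev_id(const struct inta_model *inta,
				uint16_t dev_id)
{
	for (int i = 0; i < inta->unmapped_cnt; i++)
		if (dev_id == inta->unmapped_dev_ids[i])
			return inta->ti_sci_id;  /* unmapped: use INTA's id */
	return dev_id;                           /* mapped: keep source id */
}
```

With `unmapped_cnt == 0` the loop never runs and the source id passes through unchanged, matching the early return in the driver.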
+3 -1
drivers/net/dsa/mv88e6xxx/devlink.c
··· 393 393 mv88e6xxx_reg_lock(chip); 394 394 395 395 err = mv88e6xxx_fid_map(chip, fid_bitmap); 396 - if (err) 396 + if (err) { 397 + kfree(table); 397 398 goto out; 399 + } 398 400 399 401 while (1) { 400 402 fid = find_next_bit(fid_bitmap, MV88E6XXX_N_FID, fid + 1);
+3
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
··· 2124 2124 void cxgb4_write_sgl(const struct sk_buff *skb, struct sge_txq *q, 2125 2125 struct ulptx_sgl *sgl, u64 *end, unsigned int start, 2126 2126 const dma_addr_t *addr); 2127 + void cxgb4_write_partial_sgl(const struct sk_buff *skb, struct sge_txq *q, 2128 + struct ulptx_sgl *sgl, u64 *end, 2129 + const dma_addr_t *addr, u32 start, u32 send_len); 2127 2130 void cxgb4_ring_tx_db(struct adapter *adap, struct sge_txq *q, int n); 2128 2131 int t4_set_vlan_acl(struct adapter *adap, unsigned int mbox, unsigned int vf, 2129 2132 u16 vlan);
+2
drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
··· 3573 3573 atomic64_read(&adap->ch_ktls_stats.ktls_tx_complete_pkts)); 3574 3574 seq_printf(seq, "TX trim pkts : %20llu\n", 3575 3575 atomic64_read(&adap->ch_ktls_stats.ktls_tx_trimmed_pkts)); 3576 + seq_printf(seq, "TX sw fallback : %20llu\n", 3577 + atomic64_read(&adap->ch_ktls_stats.ktls_tx_fallback)); 3576 3578 while (i < MAX_NPORTS) { 3577 3579 ktls_port = &adap->ch_ktls_stats.ktls_port[i]; 3578 3580 seq_printf(seq, "Port %d\n", i);
+1
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
··· 1176 1176 txq = netdev_pick_tx(dev, skb, sb_dev); 1177 1177 if (xfrm_offload(skb) || is_ptp_enabled(skb, dev) || 1178 1178 skb->encapsulation || 1179 + cxgb4_is_ktls_skb(skb) || 1179 1180 (proto != IPPROTO_TCP && proto != IPPROTO_UDP)) 1180 1181 txq = txq % pi->nqsets; 1181 1182
+6
drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
··· 388 388 atomic64_t ktls_tx_retransmit_pkts; 389 389 atomic64_t ktls_tx_complete_pkts; 390 390 atomic64_t ktls_tx_trimmed_pkts; 391 + atomic64_t ktls_tx_fallback; 391 392 }; 392 393 #endif 393 394 ··· 493 492 const struct xfrmdev_ops *xfrmdev_ops; 494 493 #endif 495 494 }; 495 + 496 + static inline bool cxgb4_is_ktls_skb(struct sk_buff *skb) 497 + { 498 + return skb->sk && tls_is_sk_tx_device_offloaded(skb->sk); 499 + } 496 500 497 501 void cxgb4_uld_enable(struct adapter *adap); 498 502 void cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
+110 -1
drivers/net/ethernet/chelsio/cxgb4/sge.c
··· 890 890 } 891 891 EXPORT_SYMBOL(cxgb4_write_sgl); 892 892 893 + /* cxgb4_write_partial_sgl - populate SGL for partial packet 894 + * @skb: the packet 895 + * @q: the Tx queue we are writing into 896 + * @sgl: starting location for writing the SGL 897 + * @end: points right after the end of the SGL 898 + * @addr: the list of bus addresses for the SGL elements 899 + * @start: start offset in the SKB where partial data starts 900 + * @len: length of data from @start to send out 901 + * 902 + * This API will handle sending out partial data of a skb if required. 903 + * Unlike cxgb4_write_sgl, @start can be any offset into the skb data, 904 + * and @len will decide how much data after @start offset to send out. 905 + */ 906 + void cxgb4_write_partial_sgl(const struct sk_buff *skb, struct sge_txq *q, 907 + struct ulptx_sgl *sgl, u64 *end, 908 + const dma_addr_t *addr, u32 start, u32 len) 909 + { 910 + struct ulptx_sge_pair buf[MAX_SKB_FRAGS / 2 + 1] = {0}, *to; 911 + u32 frag_size, skb_linear_data_len = skb_headlen(skb); 912 + struct skb_shared_info *si = skb_shinfo(skb); 913 + u8 i = 0, frag_idx = 0, nfrags = 0; 914 + skb_frag_t *frag; 915 + 916 + /* Fill the first SGL either from linear data or from partial 917 + * frag based on @start. 
918 + */ 919 + if (unlikely(start < skb_linear_data_len)) { 920 + frag_size = min(len, skb_linear_data_len - start); 921 + sgl->len0 = htonl(frag_size); 922 + sgl->addr0 = cpu_to_be64(addr[0] + start); 923 + len -= frag_size; 924 + nfrags++; 925 + } else { 926 + start -= skb_linear_data_len; 927 + frag = &si->frags[frag_idx]; 928 + frag_size = skb_frag_size(frag); 929 + /* find the first frag */ 930 + while (start >= frag_size) { 931 + start -= frag_size; 932 + frag_idx++; 933 + frag = &si->frags[frag_idx]; 934 + frag_size = skb_frag_size(frag); 935 + } 936 + 937 + frag_size = min(len, skb_frag_size(frag) - start); 938 + sgl->len0 = cpu_to_be32(frag_size); 939 + sgl->addr0 = cpu_to_be64(addr[frag_idx + 1] + start); 940 + len -= frag_size; 941 + nfrags++; 942 + frag_idx++; 943 + } 944 + 945 + /* If the entire partial data fit in one SGL, then send it out 946 + * now. 947 + */ 948 + if (!len) 949 + goto done; 950 + 951 + /* Most of the complexity below deals with the possibility we hit the 952 + * end of the queue in the middle of writing the SGL. For this case 953 + * only we create the SGL in a temporary buffer and then copy it. 954 + */ 955 + to = (u8 *)end > (u8 *)q->stat ? buf : sgl->sge; 956 + 957 + /* If the skb couldn't fit in first SGL completely, fill the 958 + * rest of the frags in subsequent SGLs. Note that each SGL 959 + * pair can store 2 frags. 960 + */ 961 + while (len) { 962 + frag_size = min(len, skb_frag_size(&si->frags[frag_idx])); 963 + to->len[i & 1] = cpu_to_be32(frag_size); 964 + to->addr[i & 1] = cpu_to_be64(addr[frag_idx + 1]); 965 + if (i && (i & 1)) 966 + to++; 967 + nfrags++; 968 + frag_idx++; 969 + i++; 970 + len -= frag_size; 971 + } 972 + 973 + /* If we ended in an odd boundary, then set the second SGL's 974 + * length in the pair to 0. 975 + */ 976 + if (i & 1) 977 + to->len[1] = cpu_to_be32(0); 978 + 979 + /* Copy from temporary buffer to Tx ring, in case we hit the 980 + * end of the queue in the middle of writing the SGL. 
981 + */ 982 + if (unlikely((u8 *)end > (u8 *)q->stat)) { 983 + u32 part0 = (u8 *)q->stat - (u8 *)sgl->sge, part1; 984 + 985 + if (likely(part0)) 986 + memcpy(sgl->sge, buf, part0); 987 + part1 = (u8 *)end - (u8 *)q->stat; 988 + memcpy(q->desc, (u8 *)buf + part0, part1); 989 + end = (void *)q->desc + part1; 990 + } 991 + 992 + /* 0-pad to multiple of 16 */ 993 + if ((uintptr_t)end & 8) 994 + *end = 0; 995 + done: 996 + sgl->cmd_nsge = htonl(ULPTX_CMD_V(ULP_TX_SC_DSGL) | 997 + ULPTX_NSGE_V(nfrags)); 998 + } 999 + EXPORT_SYMBOL(cxgb4_write_partial_sgl); 1000 + 893 1001 /* This function copies 64 byte coalesced work request to 894 1002 * memory mapped BAR2 space. For coalesced WR SGE fetches 895 1003 * data from the FIFO instead of from Host. ··· 1530 1422 #endif /* CHELSIO_IPSEC_INLINE */ 1531 1423 1532 1424 #if IS_ENABLED(CONFIG_CHELSIO_TLS_DEVICE) 1533 - if (skb->decrypted) 1425 + if (cxgb4_is_ktls_skb(skb) && 1426 + (skb->len - (skb_transport_offset(skb) + tcp_hdrlen(skb)))) 1534 1427 return adap->uld[CXGB4_ULD_KTLS].tx_handler(skb, dev); 1535 1428 #endif /* CHELSIO_TLS_DEVICE */ 1536 1429
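Both `cxgb4_write_partial_sgl()` above and `chcr_get_nfrags_to_send()` in the next file start by walking the skb's fragment array to locate the fragment containing a byte offset. The core walk can be sketched independently of the skb API (simplified model; the caller must guarantee the offset lies within the total fragment data, as the drivers do):

```c
/* Result of locating a byte offset within an array of fragments. */
struct frag_pos {
	unsigned int idx;   /* index of the fragment holding the offset */
	unsigned int off;   /* byte offset inside that fragment */
};

/* Walk fragments, consuming whole fragment sizes until the remaining
 * offset falls inside one; mirrors the "while (start >= frag_size)"
 * loops in the diff. Assumes start < sum of all fragment sizes. */
static struct frag_pos find_frag(const unsigned int *frag_size,
				 unsigned int start)
{
	struct frag_pos pos = { 0, start };

	while (pos.off >= frag_size[pos.idx]) {
		pos.off -= frag_size[pos.idx];
		pos.idx++;
	}
	return pos;
}
```

An offset exactly on a fragment boundary resolves to the start of the next fragment, which is why the drivers compare with `>=` rather than `>`.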
+355 -227
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
··· 14 14 static LIST_HEAD(uld_ctx_list); 15 15 static DEFINE_MUTEX(dev_mutex); 16 16 17 + /* chcr_get_nfrags_to_send: get the remaining nfrags after start offset 18 + * @skb: skb 19 + * @start: start offset. 20 + * @len: how much data to send after @start 21 + */ 22 + static int chcr_get_nfrags_to_send(struct sk_buff *skb, u32 start, u32 len) 23 + { 24 + struct skb_shared_info *si = skb_shinfo(skb); 25 + u32 frag_size, skb_linear_data_len = skb_headlen(skb); 26 + u8 nfrags = 0, frag_idx = 0; 27 + skb_frag_t *frag; 28 + 29 + /* if its a linear skb then return 1 */ 30 + if (!skb_is_nonlinear(skb)) 31 + return 1; 32 + 33 + if (unlikely(start < skb_linear_data_len)) { 34 + frag_size = min(len, skb_linear_data_len - start); 35 + start = 0; 36 + } else { 37 + start -= skb_linear_data_len; 38 + 39 + frag = &si->frags[frag_idx]; 40 + frag_size = skb_frag_size(frag); 41 + while (start >= frag_size) { 42 + start -= frag_size; 43 + frag_idx++; 44 + frag = &si->frags[frag_idx]; 45 + frag_size = skb_frag_size(frag); 46 + } 47 + frag_size = min(len, skb_frag_size(frag) - start); 48 + } 49 + len -= frag_size; 50 + nfrags++; 51 + 52 + while (len) { 53 + frag_size = min(len, skb_frag_size(&si->frags[frag_idx])); 54 + len -= frag_size; 55 + nfrags++; 56 + frag_idx++; 57 + } 58 + return nfrags; 59 + } 60 + 17 61 static int chcr_init_tcb_fields(struct chcr_ktls_info *tx_info); 18 62 /* 19 63 * chcr_ktls_save_keys: calculate and save crypto keys. 
··· 733 689 } 734 690 735 691 static void *__chcr_write_cpl_set_tcb_ulp(struct chcr_ktls_info *tx_info, 736 - u32 tid, void *pos, u16 word, u64 mask, 692 + u32 tid, void *pos, u16 word, 693 + struct sge_eth_txq *q, u64 mask, 737 694 u64 val, u32 reply) 738 695 { 739 696 struct cpl_set_tcb_field_core *cpl; ··· 743 698 744 699 /* ULP_TXPKT */ 745 700 txpkt = pos; 746 - txpkt->cmd_dest = htonl(ULPTX_CMD_V(ULP_TX_PKT) | ULP_TXPKT_DEST_V(0)); 701 + txpkt->cmd_dest = htonl(ULPTX_CMD_V(ULP_TX_PKT) | 702 + ULP_TXPKT_CHANNELID_V(tx_info->port_id) | 703 + ULP_TXPKT_FID_V(q->q.cntxt_id) | 704 + ULP_TXPKT_RO_F); 747 705 txpkt->len = htonl(DIV_ROUND_UP(CHCR_SET_TCB_FIELD_LEN, 16)); 748 706 749 707 /* ULPTX_IDATA sub-command */ ··· 801 753 } else { 802 754 u8 buf[48] = {0}; 803 755 804 - __chcr_write_cpl_set_tcb_ulp(tx_info, tid, buf, word, 756 + __chcr_write_cpl_set_tcb_ulp(tx_info, tid, buf, word, q, 805 757 mask, val, reply); 806 758 807 759 return chcr_copy_to_txd(buf, &q->q, pos, ··· 809 761 } 810 762 } 811 763 812 - pos = __chcr_write_cpl_set_tcb_ulp(tx_info, tid, pos, word, 764 + pos = __chcr_write_cpl_set_tcb_ulp(tx_info, tid, pos, word, q, 813 765 mask, val, reply); 814 766 815 767 /* check again if we are at the end of the queue */ ··· 831 783 */ 832 784 static int chcr_ktls_xmit_tcb_cpls(struct chcr_ktls_info *tx_info, 833 785 struct sge_eth_txq *q, u64 tcp_seq, 834 - u64 tcp_ack, u64 tcp_win) 786 + u64 tcp_ack, u64 tcp_win, bool offset) 835 787 { 836 788 bool first_wr = ((tx_info->prev_ack == 0) && (tx_info->prev_win == 0)); 837 789 struct ch_ktls_port_stats_debug *port_stats; 838 - u32 len, cpl = 0, ndesc, wr_len; 790 + u32 len, cpl = 0, ndesc, wr_len, wr_mid = 0; 839 791 struct fw_ulptx_wr *wr; 840 792 int credits; 841 793 void *pos; ··· 849 801 if (unlikely(credits < 0)) { 850 802 chcr_eth_txq_stop(q); 851 803 return NETDEV_TX_BUSY; 804 + } 805 + 806 + if (unlikely(credits < ETHTXQ_STOP_THRES)) { 807 + chcr_eth_txq_stop(q); 808 + wr_mid |= FW_WR_EQUEQ_F | 
FW_WR_EQUIQ_F; 852 809 } 853 810 854 811 pos = &q->q.desc[q->q.pidx]; ··· 871 818 cpl++; 872 819 } 873 820 /* reset snd una if it's a re-transmit pkt */ 874 - if (tcp_seq != tx_info->prev_seq) { 821 + if (tcp_seq != tx_info->prev_seq || offset) { 875 822 /* reset snd_una */ 876 823 port_stats = 877 824 &tx_info->adap->ch_ktls_stats.ktls_port[tx_info->port_id]; ··· 880 827 TCB_SND_UNA_RAW_V 881 828 (TCB_SND_UNA_RAW_M), 882 829 TCB_SND_UNA_RAW_V(0), 0); 883 - atomic64_inc(&port_stats->ktls_tx_ooo); 830 + if (tcp_seq != tx_info->prev_seq) 831 + atomic64_inc(&port_stats->ktls_tx_ooo); 884 832 cpl++; 885 833 } 886 834 /* update ack */ ··· 910 856 wr->op_to_compl = htonl(FW_WR_OP_V(FW_ULPTX_WR)); 911 857 wr->cookie = 0; 912 858 /* fill len in wr field */ 913 - wr->flowid_len16 = htonl(FW_WR_LEN16_V(DIV_ROUND_UP(len, 16))); 859 + wr->flowid_len16 = htonl(wr_mid | 860 + FW_WR_LEN16_V(DIV_ROUND_UP(len, 16))); 914 861 915 862 ndesc = DIV_ROUND_UP(len, 64); 916 863 chcr_txq_advance(&q->q, ndesc); ··· 921 866 } 922 867 923 868 /* 924 - * chcr_ktls_skb_copy 925 - * @nskb - new skb where the frags to be added. 926 - * @skb - old skb from which frags will be copied. 927 - */ 928 - static void chcr_ktls_skb_copy(struct sk_buff *skb, struct sk_buff *nskb) 929 - { 930 - int i; 931 - 932 - for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { 933 - skb_shinfo(nskb)->frags[i] = skb_shinfo(skb)->frags[i]; 934 - __skb_frag_ref(&skb_shinfo(nskb)->frags[i]); 935 - } 936 - 937 - skb_shinfo(nskb)->nr_frags = skb_shinfo(skb)->nr_frags; 938 - nskb->len += skb->data_len; 939 - nskb->data_len = skb->data_len; 940 - nskb->truesize += skb->data_len; 941 - } 942 - 943 - /* 944 869 * chcr_ktls_get_tx_flits 945 870 * returns number of flits to be sent out, it includes key context length, WR 946 871 * size and skb fragments. 
947 872 */ 948 873 static unsigned int 949 - chcr_ktls_get_tx_flits(const struct sk_buff *skb, unsigned int key_ctx_len) 874 + chcr_ktls_get_tx_flits(u32 nr_frags, unsigned int key_ctx_len) 950 875 { 951 - return chcr_sgl_len(skb_shinfo(skb)->nr_frags) + 876 + return chcr_sgl_len(nr_frags) + 952 877 DIV_ROUND_UP(key_ctx_len + CHCR_KTLS_WR_SIZE, 8); 953 878 } 954 879 ··· 992 957 struct tcphdr *tcp; 993 958 int len16, pktlen; 994 959 struct iphdr *ip; 960 + u32 wr_mid = 0; 995 961 int credits; 996 962 u8 buf[150]; 963 + u64 cntrl1; 997 964 void *pos; 998 965 999 966 iplen = skb_network_header_len(skb); ··· 1004 967 /* packet length = eth hdr len + ip hdr len + tcp hdr len 1005 968 * (including options). 1006 969 */ 1007 - pktlen = skb->len - skb->data_len; 970 + pktlen = skb_transport_offset(skb) + tcp_hdrlen(skb); 1008 971 1009 972 ctrl = sizeof(*cpl) + pktlen; 1010 973 len16 = DIV_ROUND_UP(sizeof(*wr) + ctrl, 16); ··· 1017 980 return NETDEV_TX_BUSY; 1018 981 } 1019 982 983 + if (unlikely(credits < ETHTXQ_STOP_THRES)) { 984 + chcr_eth_txq_stop(q); 985 + wr_mid |= FW_WR_EQUEQ_F | FW_WR_EQUIQ_F; 986 + } 987 + 1020 988 pos = &q->q.desc[q->q.pidx]; 1021 989 wr = pos; 1022 990 ··· 1029 987 wr->op_immdlen = htonl(FW_WR_OP_V(FW_ETH_TX_PKT_WR) | 1030 988 FW_WR_IMMDLEN_V(ctrl)); 1031 989 1032 - wr->equiq_to_len16 = htonl(FW_WR_LEN16_V(len16)); 990 + wr->equiq_to_len16 = htonl(wr_mid | FW_WR_LEN16_V(len16)); 1033 991 wr->r3 = 0; 1034 992 1035 993 cpl = (void *)(wr + 1); ··· 1039 997 TXPKT_PF_V(tx_info->adap->pf)); 1040 998 cpl->pack = 0; 1041 999 cpl->len = htons(pktlen); 1042 - /* checksum offload */ 1043 - cpl->ctrl1 = 0; 1044 - 1045 - pos = cpl + 1; 1046 1000 1047 1001 memcpy(buf, skb->data, pktlen); 1048 1002 if (tx_info->ip_family == AF_INET) { 1049 1003 /* we need to correct ip header len */ 1050 1004 ip = (struct iphdr *)(buf + maclen); 1051 1005 ip->tot_len = htons(pktlen - maclen); 1006 + cntrl1 = TXPKT_CSUM_TYPE_V(TX_CSUM_TCPIP); 1052 1007 #if 
IS_ENABLED(CONFIG_IPV6) 1053 1008 } else { 1054 1009 ip6 = (struct ipv6hdr *)(buf + maclen); 1055 1010 ip6->payload_len = htons(pktlen - maclen - iplen); 1011 + cntrl1 = TXPKT_CSUM_TYPE_V(TX_CSUM_TCPIP6); 1056 1012 #endif 1057 1013 } 1014 + 1015 + cntrl1 |= T6_TXPKT_ETHHDR_LEN_V(maclen - ETH_HLEN) | 1016 + TXPKT_IPHDR_LEN_V(iplen); 1017 + /* checksum offload */ 1018 + cpl->ctrl1 = cpu_to_be64(cntrl1); 1019 + 1020 + pos = cpl + 1; 1021 + 1058 1022 /* now take care of the tcp header, if fin is not set then clear push 1059 1023 * bit as well, and if fin is set, it will be sent at the last so we 1060 1024 * need to update the tcp sequence number as per the last packet. ··· 1079 1031 return 0; 1080 1032 } 1081 1033 1082 - /* chcr_ktls_skb_shift - Shifts request length paged data from skb to another. 1083 - * @tgt- buffer into which tail data gets added 1084 - * @skb- buffer from which the paged data comes from 1085 - * @shiftlen- shift up to this many bytes 1086 - */ 1087 - static int chcr_ktls_skb_shift(struct sk_buff *tgt, struct sk_buff *skb, 1088 - int shiftlen) 1089 - { 1090 - skb_frag_t *fragfrom, *fragto; 1091 - int from, to, todo; 1092 - 1093 - WARN_ON(shiftlen > skb->data_len); 1094 - 1095 - todo = shiftlen; 1096 - from = 0; 1097 - to = 0; 1098 - fragfrom = &skb_shinfo(skb)->frags[from]; 1099 - 1100 - while ((todo > 0) && (from < skb_shinfo(skb)->nr_frags)) { 1101 - fragfrom = &skb_shinfo(skb)->frags[from]; 1102 - fragto = &skb_shinfo(tgt)->frags[to]; 1103 - 1104 - if (todo >= skb_frag_size(fragfrom)) { 1105 - *fragto = *fragfrom; 1106 - todo -= skb_frag_size(fragfrom); 1107 - from++; 1108 - to++; 1109 - 1110 - } else { 1111 - __skb_frag_ref(fragfrom); 1112 - skb_frag_page_copy(fragto, fragfrom); 1113 - skb_frag_off_copy(fragto, fragfrom); 1114 - skb_frag_size_set(fragto, todo); 1115 - 1116 - skb_frag_off_add(fragfrom, todo); 1117 - skb_frag_size_sub(fragfrom, todo); 1118 - todo = 0; 1119 - 1120 - to++; 1121 - break; 1122 - } 1123 - } 1124 - 1125 - /* Ready to 
"commit" this state change to tgt */ 1126 - skb_shinfo(tgt)->nr_frags = to; 1127 - 1128 - /* Reposition in the original skb */ 1129 - to = 0; 1130 - while (from < skb_shinfo(skb)->nr_frags) 1131 - skb_shinfo(skb)->frags[to++] = skb_shinfo(skb)->frags[from++]; 1132 - 1133 - skb_shinfo(skb)->nr_frags = to; 1134 - 1135 - WARN_ON(todo > 0 && !skb_shinfo(skb)->nr_frags); 1136 - 1137 - skb->len -= shiftlen; 1138 - skb->data_len -= shiftlen; 1139 - skb->truesize -= shiftlen; 1140 - tgt->len += shiftlen; 1141 - tgt->data_len += shiftlen; 1142 - tgt->truesize += shiftlen; 1143 - 1144 - return shiftlen; 1145 - } 1146 - 1147 1034 /* 1148 1035 * chcr_ktls_xmit_wr_complete: This sends out the complete record. If an skb 1149 1036 * received has partial end part of the record, send out the complete record, so ··· 1094 1111 static int chcr_ktls_xmit_wr_complete(struct sk_buff *skb, 1095 1112 struct chcr_ktls_info *tx_info, 1096 1113 struct sge_eth_txq *q, u32 tcp_seq, 1114 + bool is_last_wr, u32 data_len, 1115 + u32 skb_offset, u32 nfrags, 1097 1116 bool tcp_push, u32 mss) 1098 1117 { 1099 1118 u32 len16, wr_mid = 0, flits = 0, ndesc, cipher_start; ··· 1111 1126 u64 *end; 1112 1127 1113 1128 /* get the number of flits required */ 1114 - flits = chcr_ktls_get_tx_flits(skb, tx_info->key_ctx_len); 1129 + flits = chcr_ktls_get_tx_flits(nfrags, tx_info->key_ctx_len); 1115 1130 /* number of descriptors */ 1116 1131 ndesc = chcr_flits_to_desc(flits); 1117 1132 /* check if enough credits available */ ··· 1139 1154 q->mapping_err++; 1140 1155 return NETDEV_TX_BUSY; 1141 1156 } 1157 + 1158 + if (!is_last_wr) 1159 + skb_get(skb); 1142 1160 1143 1161 pos = &q->q.desc[q->q.pidx]; 1144 1162 end = (u64 *)pos + flits; ··· 1175 1187 CPL_TX_SEC_PDU_CPLLEN_V(CHCR_CPL_TX_SEC_PDU_LEN_64BIT) | 1176 1188 CPL_TX_SEC_PDU_PLACEHOLDER_V(1) | 1177 1189 CPL_TX_SEC_PDU_IVINSRTOFST_V(TLS_HEADER_SIZE + 1)); 1178 - cpl->pldlen = htonl(skb->data_len); 1190 + cpl->pldlen = htonl(data_len); 1179 1191 1180 1192 /* 
encryption should start after tls header size + iv size */ 1181 1193 cipher_start = TLS_HEADER_SIZE + tx_info->iv_size + 1; ··· 1217 1229 /* CPL_TX_DATA */ 1218 1230 tx_data = (void *)pos; 1219 1231 OPCODE_TID(tx_data) = htonl(MK_OPCODE_TID(CPL_TX_DATA, tx_info->tid)); 1220 - tx_data->len = htonl(TX_DATA_MSS_V(mss) | TX_LENGTH_V(skb->data_len)); 1232 + tx_data->len = htonl(TX_DATA_MSS_V(mss) | TX_LENGTH_V(data_len)); 1221 1233 1222 1234 tx_data->rsvd = htonl(tcp_seq); 1223 1235 ··· 1237 1249 } 1238 1250 1239 1251 /* send the complete packet except the header */ 1240 - cxgb4_write_sgl(skb, &q->q, pos, end, skb->len - skb->data_len, 1241 - sgl_sdesc->addr); 1252 + cxgb4_write_partial_sgl(skb, &q->q, pos, end, sgl_sdesc->addr, 1253 + skb_offset, data_len); 1242 1254 sgl_sdesc->skb = skb; 1243 1255 1244 1256 chcr_txq_advance(&q->q, ndesc); ··· 1270 1282 struct sge_eth_txq *q, 1271 1283 u32 tcp_seq, bool tcp_push, u32 mss, 1272 1284 u32 tls_rec_offset, u8 *prior_data, 1273 - u32 prior_data_len) 1285 + u32 prior_data_len, u32 data_len, 1286 + u32 skb_offset) 1274 1287 { 1288 + u32 len16, wr_mid = 0, cipher_start, nfrags; 1275 1289 struct adapter *adap = tx_info->adap; 1276 - u32 len16, wr_mid = 0, cipher_start; 1277 1290 unsigned int flits = 0, ndesc; 1278 1291 int credits, left, last_desc; 1279 1292 struct tx_sw_desc *sgl_sdesc; ··· 1287 1298 void *pos; 1288 1299 u64 *end; 1289 1300 1301 + nfrags = chcr_get_nfrags_to_send(skb, skb_offset, data_len); 1290 1302 /* get the number of flits required, it's a partial record so 2 flits 1291 1303 * (AES_BLOCK_SIZE) will be added. 
1292 1304 */ 1293 - flits = chcr_ktls_get_tx_flits(skb, tx_info->key_ctx_len) + 2; 1305 + flits = chcr_ktls_get_tx_flits(nfrags, tx_info->key_ctx_len) + 2; 1294 1306 /* get the correct 8 byte IV of this record */ 1295 1307 iv_record = cpu_to_be64(tx_info->iv + tx_info->record_no); 1296 1308 /* If it's a middle record and not 16 byte aligned to run AES CTR, need ··· 1363 1373 htonl(CPL_TX_SEC_PDU_OPCODE_V(CPL_TX_SEC_PDU) | 1364 1374 CPL_TX_SEC_PDU_CPLLEN_V(CHCR_CPL_TX_SEC_PDU_LEN_64BIT) | 1365 1375 CPL_TX_SEC_PDU_IVINSRTOFST_V(1)); 1366 - cpl->pldlen = htonl(skb->data_len + AES_BLOCK_LEN + prior_data_len); 1376 + cpl->pldlen = htonl(data_len + AES_BLOCK_LEN + prior_data_len); 1367 1377 cpl->aadstart_cipherstop_hi = 1368 1378 htonl(CPL_TX_SEC_PDU_CIPHERSTART_V(cipher_start)); 1369 1379 cpl->cipherstop_lo_authinsert = 0; ··· 1394 1404 tx_data = (void *)pos; 1395 1405 OPCODE_TID(tx_data) = htonl(MK_OPCODE_TID(CPL_TX_DATA, tx_info->tid)); 1396 1406 tx_data->len = htonl(TX_DATA_MSS_V(mss) | 1397 - TX_LENGTH_V(skb->data_len + prior_data_len)); 1407 + TX_LENGTH_V(data_len + prior_data_len)); 1398 1408 tx_data->rsvd = htonl(tcp_seq); 1399 1409 tx_data->flags = htonl(TX_BYPASS_F); 1400 1410 if (tcp_push) ··· 1427 1437 if (prior_data_len) 1428 1438 pos = chcr_copy_to_txd(prior_data, &q->q, pos, 16); 1429 1439 /* send the complete packet except the header */ 1430 - cxgb4_write_sgl(skb, &q->q, pos, end, skb->len - skb->data_len, 1431 - sgl_sdesc->addr); 1440 + cxgb4_write_partial_sgl(skb, &q->q, pos, end, sgl_sdesc->addr, 1441 + skb_offset, data_len); 1432 1442 sgl_sdesc->skb = skb; 1433 1443 1434 1444 chcr_txq_advance(&q->q, ndesc); ··· 1456 1466 struct sk_buff *skb, u32 tcp_seq, u32 mss, 1457 1467 bool tcp_push, struct sge_eth_txq *q, 1458 1468 u32 port_id, u8 *prior_data, 1469 + u32 data_len, u32 skb_offset, 1459 1470 u32 prior_data_len) 1460 1471 { 1461 1472 int credits, left, len16, last_desc; ··· 1466 1475 struct ulptx_idata *idata; 1467 1476 struct ulp_txpkt *ulptx; 1468 
1477 struct fw_ulptx_wr *wr; 1469 - u32 wr_mid = 0; 1478 + u32 wr_mid = 0, nfrags; 1470 1479 void *pos; 1471 1480 u64 *end; 1472 1481 1473 1482 flits = DIV_ROUND_UP(CHCR_PLAIN_TX_DATA_LEN, 8); 1474 - flits += chcr_sgl_len(skb_shinfo(skb)->nr_frags); 1483 + nfrags = chcr_get_nfrags_to_send(skb, skb_offset, data_len); 1484 + flits += chcr_sgl_len(nfrags); 1475 1485 if (prior_data_len) 1476 1486 flits += 2; 1487 + 1477 1488 /* WR will need len16 */ 1478 1489 len16 = DIV_ROUND_UP(flits, 2); 1479 1490 /* check how many descriptors needed */ ··· 1528 1535 tx_data = (struct cpl_tx_data *)(idata + 1); 1529 1536 OPCODE_TID(tx_data) = htonl(MK_OPCODE_TID(CPL_TX_DATA, tx_info->tid)); 1530 1537 tx_data->len = htonl(TX_DATA_MSS_V(mss) | 1531 - TX_LENGTH_V(skb->data_len + prior_data_len)); 1538 + TX_LENGTH_V(data_len + prior_data_len)); 1532 1539 /* set tcp seq number */ 1533 1540 tx_data->rsvd = htonl(tcp_seq); 1534 1541 tx_data->flags = htonl(TX_BYPASS_F); ··· 1552 1559 end = pos + left; 1553 1560 } 1554 1561 /* send the complete packet including the header */ 1555 - cxgb4_write_sgl(skb, &q->q, pos, end, skb->len - skb->data_len, 1556 - sgl_sdesc->addr); 1562 + cxgb4_write_partial_sgl(skb, &q->q, pos, end, sgl_sdesc->addr, 1563 + skb_offset, data_len); 1557 1564 sgl_sdesc->skb = skb; 1558 1565 1566 + chcr_txq_advance(&q->q, ndesc); 1567 + cxgb4_ring_tx_db(tx_info->adap, &q->q, ndesc); 1568 + return 0; 1569 + } 1570 + 1571 + static int chcr_ktls_tunnel_pkt(struct chcr_ktls_info *tx_info, 1572 + struct sk_buff *skb, 1573 + struct sge_eth_txq *q) 1574 + { 1575 + u32 ctrl, iplen, maclen, wr_mid = 0, len16; 1576 + struct tx_sw_desc *sgl_sdesc; 1577 + struct fw_eth_tx_pkt_wr *wr; 1578 + struct cpl_tx_pkt_core *cpl; 1579 + unsigned int flits, ndesc; 1580 + int credits, last_desc; 1581 + u64 cntrl1, *end; 1582 + void *pos; 1583 + 1584 + ctrl = sizeof(*cpl); 1585 + flits = DIV_ROUND_UP(sizeof(*wr) + ctrl, 8); 1586 + 1587 + flits += chcr_sgl_len(skb_shinfo(skb)->nr_frags + 1); 1588 + 
len16 = DIV_ROUND_UP(flits, 2); 1589 + /* check how many descriptors needed */ 1590 + ndesc = DIV_ROUND_UP(flits, 8); 1591 + 1592 + credits = chcr_txq_avail(&q->q) - ndesc; 1593 + if (unlikely(credits < 0)) { 1594 + chcr_eth_txq_stop(q); 1595 + return -ENOMEM; 1596 + } 1597 + 1598 + if (unlikely(credits < ETHTXQ_STOP_THRES)) { 1599 + chcr_eth_txq_stop(q); 1600 + wr_mid |= FW_WR_EQUEQ_F | FW_WR_EQUIQ_F; 1601 + } 1602 + 1603 + last_desc = q->q.pidx + ndesc - 1; 1604 + if (last_desc >= q->q.size) 1605 + last_desc -= q->q.size; 1606 + sgl_sdesc = &q->q.sdesc[last_desc]; 1607 + 1608 + if (unlikely(cxgb4_map_skb(tx_info->adap->pdev_dev, skb, 1609 + sgl_sdesc->addr) < 0)) { 1610 + memset(sgl_sdesc->addr, 0, sizeof(sgl_sdesc->addr)); 1611 + q->mapping_err++; 1612 + return -ENOMEM; 1613 + } 1614 + 1615 + iplen = skb_network_header_len(skb); 1616 + maclen = skb_mac_header_len(skb); 1617 + 1618 + pos = &q->q.desc[q->q.pidx]; 1619 + end = (u64 *)pos + flits; 1620 + wr = pos; 1621 + 1622 + /* Firmware work request header */ 1623 + wr->op_immdlen = htonl(FW_WR_OP_V(FW_ETH_TX_PKT_WR) | 1624 + FW_WR_IMMDLEN_V(ctrl)); 1625 + 1626 + wr->equiq_to_len16 = htonl(wr_mid | FW_WR_LEN16_V(len16)); 1627 + wr->r3 = 0; 1628 + 1629 + cpl = (void *)(wr + 1); 1630 + 1631 + /* CPL header */ 1632 + cpl->ctrl0 = htonl(TXPKT_OPCODE_V(CPL_TX_PKT) | 1633 + TXPKT_INTF_V(tx_info->tx_chan) | 1634 + TXPKT_PF_V(tx_info->adap->pf)); 1635 + cpl->pack = 0; 1636 + cntrl1 = TXPKT_CSUM_TYPE_V(tx_info->ip_family == AF_INET ? 
1637 + TX_CSUM_TCPIP : TX_CSUM_TCPIP6); 1638 + cntrl1 |= T6_TXPKT_ETHHDR_LEN_V(maclen - ETH_HLEN) | 1639 + TXPKT_IPHDR_LEN_V(iplen); 1640 + /* checksum offload */ 1641 + cpl->ctrl1 = cpu_to_be64(cntrl1); 1642 + cpl->len = htons(skb->len); 1643 + 1644 + pos = cpl + 1; 1645 + 1646 + cxgb4_write_sgl(skb, &q->q, pos, end, 0, sgl_sdesc->addr); 1647 + sgl_sdesc->skb = skb; 1559 1648 chcr_txq_advance(&q->q, ndesc); 1560 1649 cxgb4_ring_tx_db(tx_info->adap, &q->q, ndesc); 1561 1650 return 0; ··· 1646 1571 /* 1647 1572 * chcr_ktls_copy_record_in_skb 1648 1573 * @nskb - new skb where the frags to be added. 1574 + * @skb - old skb, to copy socket and destructor details. 1649 1575 * @record - specific record which has complete 16k record in frags. 1650 1576 */ 1651 1577 static void chcr_ktls_copy_record_in_skb(struct sk_buff *nskb, 1578 + struct sk_buff *skb, 1652 1579 struct tls_record_info *record) 1653 1580 { 1654 1581 int i = 0; ··· 1665 1588 nskb->data_len = record->len; 1666 1589 nskb->len += record->len; 1667 1590 nskb->truesize += record->len; 1591 + nskb->sk = skb->sk; 1592 + nskb->destructor = skb->destructor; 1593 + refcount_add(nskb->truesize, &nskb->sk->sk_wmem_alloc); 1668 1594 } 1669 1595 1670 1596 /* ··· 1739 1659 struct sk_buff *skb, 1740 1660 struct tls_record_info *record, 1741 1661 u32 tcp_seq, int mss, bool tcp_push_no_fin, 1742 - struct sge_eth_txq *q, 1662 + struct sge_eth_txq *q, u32 skb_offset, 1743 1663 u32 tls_end_offset, bool last_wr) 1744 1664 { 1745 1665 struct sk_buff *nskb = NULL; ··· 1748 1668 nskb = skb; 1749 1669 atomic64_inc(&tx_info->adap->ch_ktls_stats.ktls_tx_complete_pkts); 1750 1670 } else { 1751 - dev_kfree_skb_any(skb); 1752 - 1753 - nskb = alloc_skb(0, GFP_KERNEL); 1754 - if (!nskb) 1671 + nskb = alloc_skb(0, GFP_ATOMIC); 1672 + if (!nskb) { 1673 + dev_kfree_skb_any(skb); 1755 1674 return NETDEV_TX_BUSY; 1675 + } 1676 + 1756 1677 /* copy complete record in skb */ 1757 - chcr_ktls_copy_record_in_skb(nskb, record); 1678 + 
chcr_ktls_copy_record_in_skb(nskb, skb, record); 1758 1679 /* packet is being sent from the beginning, update the tcp_seq 1759 1680 * accordingly. 1760 1681 */ 1761 1682 tcp_seq = tls_record_start_seq(record); 1762 - /* reset snd una, so the middle record won't send the already 1763 - * sent part. 1764 - */ 1765 - if (chcr_ktls_update_snd_una(tx_info, q)) 1766 - goto out; 1683 + /* reset skb offset */ 1684 + skb_offset = 0; 1685 + 1686 + if (last_wr) 1687 + dev_kfree_skb_any(skb); 1688 + 1689 + last_wr = true; 1690 + 1767 1691 atomic64_inc(&tx_info->adap->ch_ktls_stats.ktls_tx_end_pkts); 1768 1692 } 1769 1693 1770 1694 if (chcr_ktls_xmit_wr_complete(nskb, tx_info, q, tcp_seq, 1695 + last_wr, record->len, skb_offset, 1696 + record->num_frags, 1771 1697 (last_wr && tcp_push_no_fin), 1772 1698 mss)) { 1773 1699 goto out; 1774 1700 } 1701 + tx_info->prev_seq = record->end_seq; 1775 1702 return 0; 1776 1703 out: 1777 1704 dev_kfree_skb_any(nskb); ··· 1810 1723 struct sk_buff *skb, 1811 1724 struct tls_record_info *record, 1812 1725 u32 tcp_seq, int mss, bool tcp_push_no_fin, 1726 + u32 data_len, u32 skb_offset, 1813 1727 struct sge_eth_txq *q, u32 tls_end_offset) 1814 1728 { 1815 1729 u32 tls_rec_offset = tcp_seq - tls_record_start_seq(record); 1816 1730 u8 prior_data[16] = {0}; 1817 1731 u32 prior_data_len = 0; 1818 - u32 data_len; 1819 1732 1820 1733 /* check if the skb is ending in middle of tag/HASH, its a big 1821 1734 * trouble, send the packet before the HASH. 
1822 1735 */ 1823 - int remaining_record = tls_end_offset - skb->data_len; 1736 + int remaining_record = tls_end_offset - data_len; 1824 1737 1825 1738 if (remaining_record > 0 && 1826 1739 remaining_record < TLS_CIPHER_AES_GCM_128_TAG_SIZE) { 1827 - int trimmed_len = skb->data_len - 1828 - (TLS_CIPHER_AES_GCM_128_TAG_SIZE - remaining_record); 1829 - struct sk_buff *tmp_skb = NULL; 1830 - /* don't process the pkt if it is only a partial tag */ 1831 - if (skb->data_len < TLS_CIPHER_AES_GCM_128_TAG_SIZE) 1832 - goto out; 1740 + int trimmed_len = 0; 1833 1741 1834 - WARN_ON(trimmed_len > skb->data_len); 1742 + if (tls_end_offset > TLS_CIPHER_AES_GCM_128_TAG_SIZE) 1743 + trimmed_len = data_len - 1744 + (TLS_CIPHER_AES_GCM_128_TAG_SIZE - 1745 + remaining_record); 1746 + if (!trimmed_len) 1747 + return FALLBACK; 1835 1748 1836 - /* shift to those many bytes */ 1837 - tmp_skb = alloc_skb(0, GFP_KERNEL); 1838 - if (unlikely(!tmp_skb)) 1839 - goto out; 1749 + WARN_ON(trimmed_len > data_len); 1840 1750 1841 - chcr_ktls_skb_shift(tmp_skb, skb, trimmed_len); 1842 - /* free the last trimmed portion */ 1843 - dev_kfree_skb_any(skb); 1844 - skb = tmp_skb; 1751 + data_len = trimmed_len; 1845 1752 atomic64_inc(&tx_info->adap->ch_ktls_stats.ktls_tx_trimmed_pkts); 1846 1753 } 1847 - data_len = skb->data_len; 1754 + 1755 + /* check if it is only the header part. */ 1756 + if (tls_rec_offset + data_len <= (TLS_HEADER_SIZE + tx_info->iv_size)) { 1757 + if (chcr_ktls_tx_plaintxt(tx_info, skb, tcp_seq, mss, 1758 + tcp_push_no_fin, q, 1759 + tx_info->port_id, prior_data, 1760 + data_len, skb_offset, prior_data_len)) 1761 + goto out; 1762 + 1763 + tx_info->prev_seq = tcp_seq + data_len; 1764 + return 0; 1765 + } 1766 + 1848 1767 /* check if the middle record's start point is 16 byte aligned. CTR 1849 1768 * needs 16 byte aligned start point to start encryption. 
1850 1769 */ ··· 1911 1818 } 1912 1819 /* reset tcp_seq as per the prior_data_required len */ 1913 1820 tcp_seq -= prior_data_len; 1914 - /* include prio_data_len for further calculation. 1915 - */ 1916 - data_len += prior_data_len; 1917 1821 } 1918 1822 /* reset snd una, so the middle record won't send the already 1919 1823 * sent part. ··· 1919 1829 goto out; 1920 1830 atomic64_inc(&tx_info->adap->ch_ktls_stats.ktls_tx_middle_pkts); 1921 1831 } else { 1922 - /* Else means, its a partial first part of the record. Check if 1923 - * its only the header, don't need to send for encryption then. 1924 - */ 1925 - if (data_len <= TLS_HEADER_SIZE + tx_info->iv_size) { 1926 - if (chcr_ktls_tx_plaintxt(tx_info, skb, tcp_seq, mss, 1927 - tcp_push_no_fin, q, 1928 - tx_info->port_id, 1929 - prior_data, 1930 - prior_data_len)) { 1931 - goto out; 1932 - } 1933 - return 0; 1934 - } 1935 1832 atomic64_inc(&tx_info->adap->ch_ktls_stats.ktls_tx_start_pkts); 1936 1833 } 1937 1834 1938 1835 if (chcr_ktls_xmit_wr_short(skb, tx_info, q, tcp_seq, tcp_push_no_fin, 1939 1836 mss, tls_rec_offset, prior_data, 1940 - prior_data_len)) { 1837 + prior_data_len, data_len, skb_offset)) { 1941 1838 goto out; 1942 1839 } 1943 1840 1841 + tx_info->prev_seq = tcp_seq + data_len + prior_data_len; 1944 1842 return 0; 1945 1843 out: 1946 1844 dev_kfree_skb_any(skb); 1947 1845 return NETDEV_TX_BUSY; 1948 1846 } 1949 1847 1848 + static int chcr_ktls_sw_fallback(struct sk_buff *skb, 1849 + struct chcr_ktls_info *tx_info, 1850 + struct sge_eth_txq *q) 1851 + { 1852 + u32 data_len, skb_offset; 1853 + struct sk_buff *nskb; 1854 + struct tcphdr *th; 1855 + 1856 + nskb = tls_encrypt_skb(skb); 1857 + 1858 + if (!nskb) 1859 + return 0; 1860 + 1861 + th = tcp_hdr(nskb); 1862 + skb_offset = skb_transport_offset(nskb) + tcp_hdrlen(nskb); 1863 + data_len = nskb->len - skb_offset; 1864 + skb_tx_timestamp(nskb); 1865 + 1866 + if (chcr_ktls_tunnel_pkt(tx_info, nskb, q)) 1867 + goto out; 1868 + 1869 + tx_info->prev_seq = 
ntohl(th->seq) + data_len; 1870 + atomic64_inc(&tx_info->adap->ch_ktls_stats.ktls_tx_fallback); 1871 + return 0; 1872 + out: 1873 + dev_kfree_skb_any(nskb); 1874 + return 0; 1875 + } 1950 1876 /* nic tls TX handler */ 1951 1877 static int chcr_ktls_xmit(struct sk_buff *skb, struct net_device *dev) 1952 1878 { 1879 + u32 tls_end_offset, tcp_seq, skb_data_len, skb_offset; 1953 1880 struct ch_ktls_port_stats_debug *port_stats; 1954 1881 struct chcr_ktls_ofld_ctx_tx *tx_ctx; 1955 1882 struct ch_ktls_stats_debug *stats; ··· 1974 1867 int data_len, qidx, ret = 0, mss; 1975 1868 struct tls_record_info *record; 1976 1869 struct chcr_ktls_info *tx_info; 1977 - u32 tls_end_offset, tcp_seq; 1978 1870 struct tls_context *tls_ctx; 1979 - struct sk_buff *local_skb; 1980 1871 struct sge_eth_txq *q; 1981 1872 struct adapter *adap; 1982 1873 unsigned long flags; 1983 1874 1984 1875 tcp_seq = ntohl(th->seq); 1876 + skb_offset = skb_transport_offset(skb) + tcp_hdrlen(skb); 1877 + skb_data_len = skb->len - skb_offset; 1878 + data_len = skb_data_len; 1985 1879 1986 - mss = skb_is_gso(skb) ? skb_shinfo(skb)->gso_size : skb->data_len; 1987 - 1988 - /* check if we haven't set it for ktls offload */ 1989 - if (!skb->sk || !tls_is_sk_tx_device_offloaded(skb->sk)) 1990 - goto out; 1880 + mss = skb_is_gso(skb) ? skb_shinfo(skb)->gso_size : data_len; 1991 1881 1992 1882 tls_ctx = tls_get_ctx(skb->sk); 1993 1883 if (unlikely(tls_ctx->netdev != dev)) ··· 1995 1891 1996 1892 if (unlikely(!tx_info)) 1997 1893 goto out; 1998 - 1999 - /* don't touch the original skb, make a new skb to extract each records 2000 - * and send them separately. 
2001 - */ 2002 - local_skb = alloc_skb(0, GFP_KERNEL); 2003 - 2004 - if (unlikely(!local_skb)) 2005 - return NETDEV_TX_BUSY; 2006 1894 2007 1895 adap = tx_info->adap; 2008 1896 stats = &adap->ch_ktls_stats; ··· 2010 1914 if (ret) 2011 1915 return NETDEV_TX_BUSY; 2012 1916 } 2013 - /* update tcb */ 2014 - ret = chcr_ktls_xmit_tcb_cpls(tx_info, q, ntohl(th->seq), 2015 - ntohl(th->ack_seq), 2016 - ntohs(th->window)); 2017 - if (ret) { 2018 - dev_kfree_skb_any(local_skb); 2019 - return NETDEV_TX_BUSY; 2020 - } 2021 1917 2022 - /* copy skb contents into local skb */ 2023 - chcr_ktls_skb_copy(skb, local_skb); 2024 - 2025 - /* go through the skb and send only one record at a time. */ 2026 - data_len = skb->data_len; 2027 1918 /* TCP segments can be in received either complete or partial. 2028 1919 * chcr_end_part_handler will handle cases if complete record or end 2029 1920 * part of the record is received. Incase of partial end part of record, ··· 2035 1952 goto out; 2036 1953 } 2037 1954 1955 + tls_end_offset = record->end_seq - tcp_seq; 1956 + 1957 + pr_debug("seq 0x%x, end_seq 0x%x prev_seq 0x%x, datalen 0x%x\n", 1958 + tcp_seq, record->end_seq, tx_info->prev_seq, data_len); 1959 + /* update tcb for the skb */ 1960 + if (skb_data_len == data_len) { 1961 + u32 tx_max = tcp_seq; 1962 + 1963 + if (!tls_record_is_start_marker(record) && 1964 + tls_end_offset < TLS_CIPHER_AES_GCM_128_TAG_SIZE) 1965 + tx_max = record->end_seq - 1966 + TLS_CIPHER_AES_GCM_128_TAG_SIZE; 1967 + 1968 + ret = chcr_ktls_xmit_tcb_cpls(tx_info, q, tx_max, 1969 + ntohl(th->ack_seq), 1970 + ntohs(th->window), 1971 + tls_end_offset != 1972 + record->len); 1973 + if (ret) { 1974 + spin_unlock_irqrestore(&tx_ctx->base.lock, 1975 + flags); 1976 + goto out; 1977 + } 1978 + 1979 + if (th->fin) 1980 + skb_get(skb); 1981 + } 1982 + 2038 1983 if (unlikely(tls_record_is_start_marker(record))) { 2039 - spin_unlock_irqrestore(&tx_ctx->base.lock, flags); 2040 1984 
atomic64_inc(&port_stats->ktls_tx_skip_no_sync_data); 2041 - goto out; 1985 + /* If tls_end_offset < data_len, means there is some 1986 + * data after start marker, which needs encryption, send 1987 + * plaintext first and take skb refcount. else send out 1988 + * complete pkt as plaintext. 1989 + */ 1990 + if (tls_end_offset < data_len) 1991 + skb_get(skb); 1992 + else 1993 + tls_end_offset = data_len; 1994 + 1995 + ret = chcr_ktls_tx_plaintxt(tx_info, skb, tcp_seq, mss, 1996 + (!th->fin && th->psh), q, 1997 + tx_info->port_id, NULL, 1998 + tls_end_offset, skb_offset, 1999 + 0); 2000 + 2001 + spin_unlock_irqrestore(&tx_ctx->base.lock, flags); 2002 + if (ret) { 2003 + /* free the refcount taken earlier */ 2004 + if (tls_end_offset < data_len) 2005 + dev_kfree_skb_any(skb); 2006 + goto out; 2007 + } 2008 + 2009 + data_len -= tls_end_offset; 2010 + tcp_seq = record->end_seq; 2011 + skb_offset += tls_end_offset; 2012 + continue; 2042 2013 } 2043 2014 2044 2015 /* increase page reference count of the record, so that there ··· 2104 1967 /* lock cleared */ 2105 1968 spin_unlock_irqrestore(&tx_ctx->base.lock, flags); 2106 1969 2107 - tls_end_offset = record->end_seq - tcp_seq; 2108 1970 2109 - pr_debug("seq 0x%x, end_seq 0x%x prev_seq 0x%x, datalen 0x%x\n", 2110 - tcp_seq, record->end_seq, tx_info->prev_seq, data_len); 2111 1971 /* if a tls record is finishing in this SKB */ 2112 1972 if (tls_end_offset <= data_len) { 2113 - struct sk_buff *nskb = NULL; 2114 - 2115 - if (tls_end_offset < data_len) { 2116 - nskb = alloc_skb(0, GFP_KERNEL); 2117 - if (unlikely(!nskb)) { 2118 - ret = -ENOMEM; 2119 - goto clear_ref; 2120 - } 2121 - 2122 - chcr_ktls_skb_shift(nskb, local_skb, 2123 - tls_end_offset); 2124 - } else { 2125 - /* its the only record in this skb, directly 2126 - * point it. 
2127 - */ 2128 - nskb = local_skb; 2129 - } 2130 - ret = chcr_end_part_handler(tx_info, nskb, record, 1973 + ret = chcr_end_part_handler(tx_info, skb, record, 2131 1974 tcp_seq, mss, 2132 1975 (!th->fin && th->psh), q, 1976 + skb_offset, 2133 1977 tls_end_offset, 2134 - (nskb == local_skb)); 2135 - 2136 - if (ret && nskb != local_skb) 2137 - dev_kfree_skb_any(local_skb); 1978 + skb_offset + 1979 + tls_end_offset == skb->len); 2138 1980 2139 1981 data_len -= tls_end_offset; 2140 1982 /* tcp_seq increment is required to handle next record. 2141 1983 */ 2142 1984 tcp_seq += tls_end_offset; 1985 + skb_offset += tls_end_offset; 2143 1986 } else { 2144 - ret = chcr_short_record_handler(tx_info, local_skb, 1987 + ret = chcr_short_record_handler(tx_info, skb, 2145 1988 record, tcp_seq, mss, 2146 1989 (!th->fin && th->psh), 1990 + data_len, skb_offset, 2147 1991 q, tls_end_offset); 2148 1992 data_len = 0; 2149 1993 } 2150 - clear_ref: 1994 + 2151 1995 /* clear the frag ref count which increased locally before */ 2152 1996 for (i = 0; i < record->num_frags; i++) { 2153 1997 /* clear the frag ref count */ 2154 1998 __skb_frag_unref(&record->frags[i]); 2155 1999 } 2156 2000 /* if any failure, come out from the loop. */ 2157 - if (ret) 2158 - goto out; 2001 + if (ret) { 2002 + if (th->fin) 2003 + dev_kfree_skb_any(skb); 2004 + 2005 + if (ret == FALLBACK) 2006 + return chcr_ktls_sw_fallback(skb, tx_info, q); 2007 + 2008 + return NETDEV_TX_OK; 2009 + } 2010 + 2159 2011 /* length should never be less than 0 */ 2160 2012 WARN_ON(data_len < 0); 2161 2013 2162 2014 } while (data_len > 0); 2163 2015 2164 - tx_info->prev_seq = ntohl(th->seq) + skb->data_len; 2165 2016 atomic64_inc(&port_stats->ktls_tx_encrypted_packets); 2166 - atomic64_add(skb->data_len, &port_stats->ktls_tx_encrypted_bytes); 2017 + atomic64_add(skb_data_len, &port_stats->ktls_tx_encrypted_bytes); 2167 2018 2168 2019 /* tcp finish is set, send a separate tcp msg including all the options 2169 2020 * as well. 
2170 2021 */ 2171 - if (th->fin) 2022 + if (th->fin) { 2172 2023 chcr_ktls_write_tcp_options(tx_info, skb, q, tx_info->tx_chan); 2024 + dev_kfree_skb_any(skb); 2025 + } 2173 2026 2027 + return NETDEV_TX_OK; 2174 2028 out: 2175 2029 dev_kfree_skb_any(skb); 2176 2030 return NETDEV_TX_OK;
+1 -0
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.h
··· 26 26 27 27 #define CHCR_KTLS_WR_SIZE (CHCR_PLAIN_TX_DATA_LEN +\ 28 28 sizeof(struct cpl_tx_sec_pdu)) 29 + #define FALLBACK 35 29 30 30 31 enum ch_ktls_open_state { 31 32 CH_KTLS_OPEN_SUCCESS = 0,
+24 -2
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
··· 2713 2713 spin_unlock_bh(&vsi->mac_filter_hash_lock); 2714 2714 goto error_param; 2715 2715 } 2716 + if (is_valid_ether_addr(al->list[i].addr) && 2717 + is_zero_ether_addr(vf->default_lan_addr.addr)) 2718 + ether_addr_copy(vf->default_lan_addr.addr, 2719 + al->list[i].addr); 2716 2720 } 2717 2721 } 2718 2722 spin_unlock_bh(&vsi->mac_filter_hash_lock); ··· 2744 2740 { 2745 2741 struct virtchnl_ether_addr_list *al = 2746 2742 (struct virtchnl_ether_addr_list *)msg; 2743 + bool was_unimac_deleted = false; 2747 2744 struct i40e_pf *pf = vf->pf; 2748 2745 struct i40e_vsi *vsi = NULL; 2749 2746 i40e_status ret = 0; ··· 2764 2759 ret = I40E_ERR_INVALID_MAC_ADDR; 2765 2760 goto error_param; 2766 2761 } 2762 + if (ether_addr_equal(al->list[i].addr, vf->default_lan_addr.addr)) 2763 + was_unimac_deleted = true; 2767 2764 } 2768 2765 vsi = pf->vsi[vf->lan_vsi_idx]; 2769 2766 ··· 2786 2779 dev_err(&pf->pdev->dev, "Unable to program VF %d MAC filters, error %d\n", 2787 2780 vf->vf_id, ret); 2788 2781 2782 + if (vf->trusted && was_unimac_deleted) { 2783 + struct i40e_mac_filter *f; 2784 + struct hlist_node *h; 2785 + u8 *macaddr = NULL; 2786 + int bkt; 2787 + 2788 + /* set last unicast mac address as default */ 2789 + spin_lock_bh(&vsi->mac_filter_hash_lock); 2790 + hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) { 2791 + if (is_valid_ether_addr(f->macaddr)) 2792 + macaddr = f->macaddr; 2793 + } 2794 + if (macaddr) 2795 + ether_addr_copy(vf->default_lan_addr.addr, macaddr); 2796 + spin_unlock_bh(&vsi->mac_filter_hash_lock); 2797 + } 2789 2798 error_param: 2790 2799 /* send the response to the VF */ 2791 - return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_DEL_ETH_ADDR, 2792 - ret); 2800 + return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_DEL_ETH_ADDR, ret); 2793 2801 } 2794 2802 2795 2803 /**
+1 -1
drivers/net/ethernet/intel/i40e/i40e_xsk.c
··· 281 281 unsigned int total_rx_bytes = 0, total_rx_packets = 0; 282 282 u16 cleaned_count = I40E_DESC_UNUSED(rx_ring); 283 283 unsigned int xdp_res, xdp_xmit = 0; 284 + bool failure = false; 284 285 struct sk_buff *skb; 285 - bool failure; 286 286 287 287 while (likely(total_rx_packets < (unsigned int)budget)) { 288 288 union i40e_rx_desc *rx_desc;
+8 -6
drivers/net/ethernet/intel/igc/igc_main.c
··· 3891 3891 } 3892 3892 3893 3893 /** 3894 - * igc_get_stats - Get System Network Statistics 3894 + * igc_get_stats64 - Get System Network Statistics 3895 3895 * @netdev: network interface device structure 3896 + * @stats: rtnl_link_stats64 pointer 3896 3897 * 3897 3898 * Returns the address of the device statistics structure. 3898 3899 * The statistics are updated here and also from the timer callback. 3899 3900 */ 3900 - static struct net_device_stats *igc_get_stats(struct net_device *netdev) 3901 + static void igc_get_stats64(struct net_device *netdev, 3902 + struct rtnl_link_stats64 *stats) 3901 3903 { 3902 3904 struct igc_adapter *adapter = netdev_priv(netdev); 3903 3905 3906 + spin_lock(&adapter->stats64_lock); 3904 3907 if (!test_bit(__IGC_RESETTING, &adapter->state)) 3905 3908 igc_update_stats(adapter); 3906 - 3907 - /* only return the current stats */ 3908 - return &netdev->stats; 3909 + memcpy(stats, &adapter->stats64, sizeof(*stats)); 3910 + spin_unlock(&adapter->stats64_lock); 3909 3911 } 3910 3912 3911 3913 static netdev_features_t igc_fix_features(struct net_device *netdev, ··· 4857 4855 .ndo_set_rx_mode = igc_set_rx_mode, 4858 4856 .ndo_set_mac_address = igc_set_mac, 4859 4857 .ndo_change_mtu = igc_change_mtu, 4860 - .ndo_get_stats = igc_get_stats, 4858 + .ndo_get_stats64 = igc_get_stats64, 4861 4859 .ndo_fix_features = igc_fix_features, 4862 4860 .ndo_set_features = igc_set_features, 4863 4861 .ndo_features_check = igc_features_check,
+1
drivers/net/ethernet/marvell/prestera/Kconfig
··· 6 6 config PRESTERA 7 7 tristate "Marvell Prestera Switch ASICs support" 8 8 depends on NET_SWITCHDEV && VLAN_8021Q 9 + depends on BRIDGE || BRIDGE=n 9 10 select NET_DEVLINK 10 11 help 11 12 This driver supports Marvell Prestera Switch ASICs family.
+5 -1
drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
··· 107 107 mlx5e_tc_encap_flows_del(priv, e, &flow_list); 108 108 109 109 if (neigh_connected && !(e->flags & MLX5_ENCAP_ENTRY_VALID)) { 110 + struct net_device *route_dev; 111 + 110 112 ether_addr_copy(e->h_dest, ha); 111 113 ether_addr_copy(eth->h_dest, ha); 112 114 /* Update the encap source mac, in case that we delete 113 115 * the flows when encap source mac changed. 114 116 */ 115 - ether_addr_copy(eth->h_source, e->route_dev->dev_addr); 117 + route_dev = __dev_get_by_index(dev_net(priv->netdev), e->route_dev_ifindex); 118 + if (route_dev) 119 + ether_addr_copy(eth->h_source, route_dev->dev_addr); 116 120 117 121 mlx5e_tc_encap_flows_add(priv, e, &flow_list); 118 122 }
+46 -26
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
··· 77 77 return 0; 78 78 } 79 79 80 - static int mlx5e_route_lookup_ipv4(struct mlx5e_priv *priv, 81 - struct net_device *mirred_dev, 82 - struct net_device **out_dev, 83 - struct net_device **route_dev, 84 - struct flowi4 *fl4, 85 - struct neighbour **out_n, 86 - u8 *out_ttl) 80 + static int mlx5e_route_lookup_ipv4_get(struct mlx5e_priv *priv, 81 + struct net_device *mirred_dev, 82 + struct net_device **out_dev, 83 + struct net_device **route_dev, 84 + struct flowi4 *fl4, 85 + struct neighbour **out_n, 86 + u8 *out_ttl) 87 87 { 88 88 struct neighbour *n; 89 89 struct rtable *rt; ··· 117 117 ip_rt_put(rt); 118 118 return ret; 119 119 } 120 + dev_hold(*route_dev); 120 121 121 122 if (!(*out_ttl)) 122 123 *out_ttl = ip4_dst_hoplimit(&rt->dst); 123 124 n = dst_neigh_lookup(&rt->dst, &fl4->daddr); 124 125 ip_rt_put(rt); 125 - if (!n) 126 + if (!n) { 127 + dev_put(*route_dev); 126 128 return -ENOMEM; 129 + } 127 130 128 131 *out_n = n; 129 132 return 0; 133 + } 134 + 135 + static void mlx5e_route_lookup_ipv4_put(struct net_device *route_dev, 136 + struct neighbour *n) 137 + { 138 + neigh_release(n); 139 + dev_put(route_dev); 130 140 } 131 141 132 142 static const char *mlx5e_netdev_kind(struct net_device *dev) ··· 203 193 fl4.saddr = tun_key->u.ipv4.src; 204 194 ttl = tun_key->ttl; 205 195 206 - err = mlx5e_route_lookup_ipv4(priv, mirred_dev, &out_dev, &route_dev, 207 - &fl4, &n, &ttl); 196 + err = mlx5e_route_lookup_ipv4_get(priv, mirred_dev, &out_dev, &route_dev, 197 + &fl4, &n, &ttl); 208 198 if (err) 209 199 return err; 210 200 ··· 233 223 e->m_neigh.family = n->ops->family; 234 224 memcpy(&e->m_neigh.dst_ip, n->primary_key, n->tbl->key_len); 235 225 e->out_dev = out_dev; 236 - e->route_dev = route_dev; 226 + e->route_dev_ifindex = route_dev->ifindex; 237 227 238 228 /* It's important to add the neigh to the hash table before checking 239 229 * the neigh validity state. 
So if we'll get a notification, in case the ··· 288 278 289 279 e->flags |= MLX5_ENCAP_ENTRY_VALID; 290 280 mlx5e_rep_queue_neigh_stats_work(netdev_priv(out_dev)); 291 - neigh_release(n); 281 + mlx5e_route_lookup_ipv4_put(route_dev, n); 292 282 return err; 293 283 294 284 destroy_neigh_entry: ··· 296 286 free_encap: 297 287 kfree(encap_header); 298 288 release_neigh: 299 - neigh_release(n); 289 + mlx5e_route_lookup_ipv4_put(route_dev, n); 300 290 return err; 301 291 } 302 292 303 293 #if IS_ENABLED(CONFIG_INET) && IS_ENABLED(CONFIG_IPV6) 304 - static int mlx5e_route_lookup_ipv6(struct mlx5e_priv *priv, 305 - struct net_device *mirred_dev, 306 - struct net_device **out_dev, 307 - struct net_device **route_dev, 308 - struct flowi6 *fl6, 309 - struct neighbour **out_n, 310 - u8 *out_ttl) 294 + static int mlx5e_route_lookup_ipv6_get(struct mlx5e_priv *priv, 295 + struct net_device *mirred_dev, 296 + struct net_device **out_dev, 297 + struct net_device **route_dev, 298 + struct flowi6 *fl6, 299 + struct neighbour **out_n, 300 + u8 *out_ttl) 311 301 { 312 302 struct dst_entry *dst; 313 303 struct neighbour *n; ··· 328 318 return ret; 329 319 } 330 320 321 + dev_hold(*route_dev); 331 322 n = dst_neigh_lookup(dst, &fl6->daddr); 332 323 dst_release(dst); 333 - if (!n) 324 + if (!n) { 325 + dev_put(*route_dev); 334 326 return -ENOMEM; 327 + } 335 328 336 329 *out_n = n; 337 330 return 0; 331 + } 332 + 333 + static void mlx5e_route_lookup_ipv6_put(struct net_device *route_dev, 334 + struct neighbour *n) 335 + { 336 + neigh_release(n); 337 + dev_put(route_dev); 338 338 } 339 339 340 340 int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, ··· 368 348 fl6.daddr = tun_key->u.ipv6.dst; 369 349 fl6.saddr = tun_key->u.ipv6.src; 370 350 371 - err = mlx5e_route_lookup_ipv6(priv, mirred_dev, &out_dev, &route_dev, 372 - &fl6, &n, &ttl); 351 + err = mlx5e_route_lookup_ipv6_get(priv, mirred_dev, &out_dev, &route_dev, 352 + &fl6, &n, &ttl); 373 353 if (err) 374 354 return err; 375 
355 ··· 398 378 e->m_neigh.family = n->ops->family; 399 379 memcpy(&e->m_neigh.dst_ip, n->primary_key, n->tbl->key_len); 400 380 e->out_dev = out_dev; 401 - e->route_dev = route_dev; 381 + e->route_dev_ifindex = route_dev->ifindex; 402 382 403 383 /* It's importent to add the neigh to the hash table before checking 404 384 * the neigh validity state. So if we'll get a notification, in case the ··· 453 433 454 434 e->flags |= MLX5_ENCAP_ENTRY_VALID; 455 435 mlx5e_rep_queue_neigh_stats_work(netdev_priv(out_dev)); 456 - neigh_release(n); 436 + mlx5e_route_lookup_ipv6_put(route_dev, n); 457 437 return err; 458 438 459 439 destroy_neigh_entry: ··· 461 441 free_encap: 462 442 kfree(encap_header); 463 443 release_neigh: 464 - neigh_release(n); 444 + mlx5e_route_lookup_ipv6_put(route_dev, n); 465 445 return err; 466 446 } 467 447 #endif
+2 -2
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
··· 122 122 set_bit(MLX5E_RQ_STATE_ENABLED, &c->xskrq.state); 123 123 /* TX queue is created active. */ 124 124 125 - spin_lock(&c->async_icosq_lock); 125 + spin_lock_bh(&c->async_icosq_lock); 126 126 mlx5e_trigger_irq(&c->async_icosq); 127 - spin_unlock(&c->async_icosq_lock); 127 + spin_unlock_bh(&c->async_icosq_lock); 128 128 } 129 129 130 130 void mlx5e_deactivate_xsk(struct mlx5e_channel *c)
+2 -2
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
··· 36 36 if (test_and_set_bit(MLX5E_SQ_STATE_PENDING_XSK_TX, &c->async_icosq.state)) 37 37 return 0; 38 38 39 - spin_lock(&c->async_icosq_lock); 39 + spin_lock_bh(&c->async_icosq_lock); 40 40 mlx5e_trigger_irq(&c->async_icosq); 41 - spin_unlock(&c->async_icosq_lock); 41 + spin_unlock_bh(&c->async_icosq_lock); 42 42 } 43 43 44 44 return 0;
+7 -7
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
··· 188 188 189 189 err = 0; 190 190 sq = &c->async_icosq; 191 - spin_lock(&c->async_icosq_lock); 191 + spin_lock_bh(&c->async_icosq_lock); 192 192 193 193 cseg = post_static_params(sq, priv_rx); 194 194 if (IS_ERR(cseg)) ··· 199 199 200 200 mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, cseg); 201 201 unlock: 202 - spin_unlock(&c->async_icosq_lock); 202 + spin_unlock_bh(&c->async_icosq_lock); 203 203 204 204 return err; 205 205 ··· 265 265 266 266 BUILD_BUG_ON(MLX5E_KTLS_GET_PROGRESS_WQEBBS != 1); 267 267 268 - spin_lock(&sq->channel->async_icosq_lock); 268 + spin_lock_bh(&sq->channel->async_icosq_lock); 269 269 270 270 if (unlikely(!mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, 1))) { 271 - spin_unlock(&sq->channel->async_icosq_lock); 271 + spin_unlock_bh(&sq->channel->async_icosq_lock); 272 272 err = -ENOSPC; 273 273 goto err_dma_unmap; 274 274 } ··· 299 299 icosq_fill_wi(sq, pi, &wi); 300 300 sq->pc++; 301 301 mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, cseg); 302 - spin_unlock(&sq->channel->async_icosq_lock); 302 + spin_unlock_bh(&sq->channel->async_icosq_lock); 303 303 304 304 return 0; 305 305 ··· 360 360 err = 0; 361 361 362 362 sq = &c->async_icosq; 363 - spin_lock(&c->async_icosq_lock); 363 + spin_lock_bh(&c->async_icosq_lock); 364 364 365 365 cseg = post_static_params(sq, priv_rx); 366 366 if (IS_ERR(cseg)) { ··· 372 372 mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, cseg); 373 373 priv_rx->stats->tls_resync_res_ok++; 374 374 unlock: 375 - spin_unlock(&c->async_icosq_lock); 375 + spin_unlock_bh(&c->async_icosq_lock); 376 376 377 377 return err; 378 378 }
+1
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
··· 5233 5233 5234 5234 mlx5e_disable_async_events(priv); 5235 5235 mlx5_lag_remove(mdev); 5236 + mlx5_vxlan_reset_to_default(mdev->vxlan); 5236 5237 } 5237 5238 5238 5239 int mlx5e_update_nic_rx(struct mlx5e_priv *priv)
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
··· 186 186 unsigned char h_dest[ETH_ALEN]; /* destination eth addr */ 187 187 188 188 struct net_device *out_dev; 189 - struct net_device *route_dev; 189 + int route_dev_ifindex; 190 190 struct mlx5e_tc_tunnel *tunnel; 191 191 int reformat_type; 192 192 u8 flags;
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
··· 1584 1584 } while ((++work_done < budget) && (cqe = mlx5_cqwq_get_cqe(cqwq))); 1585 1585 1586 1586 out: 1587 - if (rq->xdp_prog) 1587 + if (rcu_access_pointer(rq->xdp_prog)) 1588 1588 mlx5e_xdp_rx_poll_complete(rq); 1589 1589 1590 1590 mlx5_cqwq_update_db_record(cqwq);
+2
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
··· 4658 4658 return flow; 4659 4659 4660 4660 err_free: 4661 + dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts); 4661 4662 mlx5e_flow_put(priv, flow); 4662 4663 out: 4663 4664 return ERR_PTR(err); ··· 4803 4802 return 0; 4804 4803 4805 4804 err_free: 4805 + dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts); 4806 4806 mlx5e_flow_put(priv, flow); 4807 4807 out: 4808 4808 return err;
-2
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
··· 1902 1902 ether_addr_copy(hw_addr, vport->info.mac); 1903 1903 *hw_addr_len = ETH_ALEN; 1904 1904 err = 0; 1905 - } else { 1906 - NL_SET_ERR_MSG_MOD(extack, "Eswitch vport is disabled"); 1907 1905 } 1908 1906 mutex_unlock(&esw->state_lock); 1909 1907 return err;
+4 -3
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
··· 2010 2010 down_write_ref_node(&fte->node, false); 2011 2011 for (i = handle->num_rules - 1; i >= 0; i--) 2012 2012 tree_remove_node(&handle->rule[i]->node, true); 2013 - if (fte->modify_mask && fte->dests_size) { 2014 - modify_fte(fte); 2013 + if (fte->dests_size) { 2014 + if (fte->modify_mask) 2015 + modify_fte(fte); 2015 2016 up_write_ref_node(&fte->node, false); 2016 - } else { 2017 + } else if (list_empty(&fte->node.children)) { 2017 2018 del_hw_fte(&fte->node); 2018 2019 /* Avoid double call to del_hw_fte */ 2019 2020 fte->node.del_hw_func = NULL;
+17 -6
drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c
··· 168 168 169 169 void mlx5_vxlan_destroy(struct mlx5_vxlan *vxlan) 170 170 { 171 + if (!mlx5_vxlan_allowed(vxlan)) 172 + return; 173 + 174 + mlx5_vxlan_del_port(vxlan, IANA_VXLAN_UDP_PORT); 175 + WARN_ON(!hash_empty(vxlan->htable)); 176 + 177 + kfree(vxlan); 178 + } 179 + 180 + void mlx5_vxlan_reset_to_default(struct mlx5_vxlan *vxlan) 181 + { 171 182 struct mlx5_vxlan_port *vxlanp; 172 183 struct hlist_node *tmp; 173 184 int bkt; ··· 186 175 if (!mlx5_vxlan_allowed(vxlan)) 187 176 return; 188 177 189 - /* Lockless since we are the only hash table consumers*/ 190 178 hash_for_each_safe(vxlan->htable, bkt, tmp, vxlanp, hlist) { 191 - hash_del(&vxlanp->hlist); 192 - mlx5_vxlan_core_del_port_cmd(vxlan->mdev, vxlanp->udp_port); 193 - kfree(vxlanp); 179 + /* Don't delete default UDP port added by the HW. 180 + * Remove only user configured ports 181 + */ 182 + if (vxlanp->udp_port == IANA_VXLAN_UDP_PORT) 183 + continue; 184 + mlx5_vxlan_del_port(vxlan, vxlanp->udp_port); 194 185 } 195 - 196 - kfree(vxlan); 197 186 }
+2
drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h
··· 56 56 int mlx5_vxlan_add_port(struct mlx5_vxlan *vxlan, u16 port); 57 57 int mlx5_vxlan_del_port(struct mlx5_vxlan *vxlan, u16 port); 58 58 bool mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port); 59 + void mlx5_vxlan_reset_to_default(struct mlx5_vxlan *vxlan); 59 60 #else 60 61 static inline struct mlx5_vxlan* 61 62 mlx5_vxlan_create(struct mlx5_core_dev *mdev) { return ERR_PTR(-EOPNOTSUPP); } ··· 64 63 static inline int mlx5_vxlan_add_port(struct mlx5_vxlan *vxlan, u16 port) { return -EOPNOTSUPP; } 65 64 static inline int mlx5_vxlan_del_port(struct mlx5_vxlan *vxlan, u16 port) { return -EOPNOTSUPP; } 66 65 static inline bool mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port) { return false; } 66 + static inline void mlx5_vxlan_reset_to_default(struct mlx5_vxlan *vxlan) { return; } 67 67 #endif 68 68 69 69 #endif /* __MLX5_VXLAN_H__ */
+10 -14
drivers/net/ethernet/microchip/lan743x_main.c
··· 674 674 static int lan743x_dp_write(struct lan743x_adapter *adapter, 675 675 u32 select, u32 addr, u32 length, u32 *buf) 676 676 { 677 - int ret = -EIO; 678 677 u32 dp_sel; 679 678 int i; 680 679 681 - mutex_lock(&adapter->dp_lock); 682 680 if (lan743x_csr_wait_for_bit(adapter, DP_SEL, DP_SEL_DPRDY_, 683 681 1, 40, 100, 100)) 684 - goto unlock; 682 + return -EIO; 685 683 dp_sel = lan743x_csr_read(adapter, DP_SEL); 686 684 dp_sel &= ~DP_SEL_MASK_; 687 685 dp_sel |= select; ··· 691 693 lan743x_csr_write(adapter, DP_CMD, DP_CMD_WRITE_); 692 694 if (lan743x_csr_wait_for_bit(adapter, DP_SEL, DP_SEL_DPRDY_, 693 695 1, 40, 100, 100)) 694 - goto unlock; 696 + return -EIO; 695 697 } 696 - ret = 0; 697 698 698 - unlock: 699 - mutex_unlock(&adapter->dp_lock); 700 - return ret; 699 + return 0; 701 700 702 701 } 703 702 static u32 lan743x_mac_mii_access(u16 id, u16 index, int read) ··· 1013 1018 static int lan743x_phy_open(struct lan743x_adapter *adapter) 1014 1019 { 1015 1020 struct lan743x_phy *phy = &adapter->phy; 1021 + struct phy_device *phydev = NULL; 1016 1022 struct device_node *phynode; 1017 - struct phy_device *phydev; 1018 1023 struct net_device *netdev; 1019 1024 int ret = -EIO; 1020 1025 1021 1026 netdev = adapter->netdev; 1022 1027 phynode = of_node_get(adapter->pdev->dev.of_node); 1023 - adapter->phy_mode = PHY_INTERFACE_MODE_GMII; 1024 1028 1025 1029 if (phynode) { 1030 + /* try devicetree phy, or fixed link */ 1026 1031 of_get_phy_mode(phynode, &adapter->phy_mode); 1027 1032 1028 1033 if (of_phy_is_fixed_link(phynode)) { ··· 1038 1043 lan743x_phy_link_status_change, 0, 1039 1044 adapter->phy_mode); 1040 1045 of_node_put(phynode); 1041 - if (!phydev) 1042 - goto return_error; 1043 - } else { 1046 + } 1047 + 1048 + if (!phydev) { 1049 + /* try internal phy */ 1044 1050 phydev = phy_find_first(adapter->mdiobus); 1045 1051 if (!phydev) 1046 1052 goto return_error; 1047 1053 1054 + adapter->phy_mode = PHY_INTERFACE_MODE_GMII;
1048 1055 ret = phy_connect_direct(netdev, phydev, 1049 1056 lan743x_phy_link_status_change, 1050 1057 adapter->phy_mode); ··· 2729 2732 2730 2733 adapter->intr.irq = adapter->pdev->irq; 2731 2734 lan743x_csr_write(adapter, INT_EN_CLR, 0xFFFFFFFF); 2732 - mutex_init(&adapter->dp_lock); 2733 2735 2734 2736 ret = lan743x_gpio_init(adapter); 2735 2737 if (ret)
-3
drivers/net/ethernet/microchip/lan743x_main.h
··· 712 712 struct lan743x_csr csr; 713 713 struct lan743x_intr intr; 714 714 715 - /* lock, used to prevent concurrent access to data port */ 716 - struct mutex dp_lock; 717 - 718 715 struct lan743x_gpio gpio; 719 716 struct lan743x_ptp ptp; 720 717
+5 -13
drivers/net/ethernet/realtek/r8169_main.c
··· 4134 4134 opts[1] |= transport_offset << TCPHO_SHIFT; 4135 4135 } else { 4136 4136 if (unlikely(skb->len < ETH_ZLEN && rtl_test_hw_pad_bug(tp))) 4137 - return !eth_skb_pad(skb); 4137 + /* eth_skb_pad would free the skb on error */ 4138 + return !__skb_put_padto(skb, ETH_ZLEN, false); 4138 4139 } 4139 4140 4140 4141 return true; ··· 4314 4313 rtl_chip_supports_csum_v2(tp)) 4315 4314 features &= ~NETIF_F_ALL_TSO; 4316 4315 } else if (skb->ip_summed == CHECKSUM_PARTIAL) { 4317 - if (skb->len < ETH_ZLEN) { 4318 - switch (tp->mac_version) { 4319 - case RTL_GIGA_MAC_VER_11: 4320 - case RTL_GIGA_MAC_VER_12: 4321 - case RTL_GIGA_MAC_VER_17: 4322 - case RTL_GIGA_MAC_VER_34: 4323 - features &= ~NETIF_F_CSUM_MASK; 4324 - break; 4325 - default: 4326 - break; 4327 - } 4328 - } 4316 + /* work around hw bug on some chip versions */ 4317 + if (skb->len < ETH_ZLEN) 4318 + features &= ~NETIF_F_CSUM_MASK; 4329 4319 4330 4320 if (transport_offset > TCPHO_MAX && 4331 4321 rtl_chip_supports_csum_v2(tp))
+2
drivers/net/phy/realtek.c
··· 659 659 { 660 660 PHY_ID_MATCH_EXACT(0x00008201), 661 661 .name = "RTL8201CP Ethernet", 662 + .read_page = rtl821x_read_page, 663 + .write_page = rtl821x_write_page, 662 664 }, { 663 665 PHY_ID_MATCH_EXACT(0x001cc816), 664 666 .name = "RTL8201F Fast Ethernet",
+69 -23
drivers/net/vrf.c
··· 608 608 return ret; 609 609 } 610 610 611 - static int vrf_finish_direct(struct net *net, struct sock *sk, 612 - struct sk_buff *skb) 611 + static void vrf_finish_direct(struct sk_buff *skb) 613 612 { 614 613 struct net_device *vrf_dev = skb->dev; 615 614 ··· 627 628 skb_pull(skb, ETH_HLEN); 628 629 } 629 630 630 - return 1; 631 + /* reset skb device */ 632 + nf_reset_ct(skb); 631 633 } 632 634 633 635 #if IS_ENABLED(CONFIG_IPV6) ··· 707 707 return skb; 708 708 } 709 709 710 + static int vrf_output6_direct_finish(struct net *net, struct sock *sk, 711 + struct sk_buff *skb) 712 + { 713 + vrf_finish_direct(skb); 714 + 715 + return vrf_ip6_local_out(net, sk, skb); 716 + } 717 + 710 718 static int vrf_output6_direct(struct net *net, struct sock *sk, 711 719 struct sk_buff *skb) 712 720 { 721 + int err = 1; 722 + 713 723 skb->protocol = htons(ETH_P_IPV6); 714 724 715 - return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, 716 - net, sk, skb, NULL, skb->dev, 717 - vrf_finish_direct, 718 - !(IPCB(skb)->flags & IPSKB_REROUTED)); 725 + if (!(IPCB(skb)->flags & IPSKB_REROUTED)) 726 + err = nf_hook(NFPROTO_IPV6, NF_INET_POST_ROUTING, net, sk, skb, 727 + NULL, skb->dev, vrf_output6_direct_finish); 728 + 729 + if (likely(err == 1)) 730 + vrf_finish_direct(skb); 731 + 732 + return err; 733 + } 734 + 735 + static int vrf_ip6_out_direct_finish(struct net *net, struct sock *sk, 736 + struct sk_buff *skb) 737 + { 738 + int err; 739 + 740 + err = vrf_output6_direct(net, sk, skb); 741 + if (likely(err == 1)) 742 + err = vrf_ip6_local_out(net, sk, skb); 743 + 744 + return err; 719 745 } 720 746 721 747 static struct sk_buff *vrf_ip6_out_direct(struct net_device *vrf_dev, ··· 754 728 skb->dev = vrf_dev; 755 729 756 730 err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, 757 - skb, NULL, vrf_dev, vrf_ip6_out_direct_finish); 758 732 759 733 if (likely(err == 1)) 760 734 err = vrf_output6_direct(net, sk, skb); 761 735 762 - /* reset skb device */
763 736 if (likely(err == 1)) 764 - nf_reset_ct(skb); 765 - else 766 - skb = NULL; 737 + return skb; 767 738 768 - return skb; 739 + return NULL; 769 740 } 770 741 771 742 static struct sk_buff *vrf_ip6_out(struct net_device *vrf_dev, ··· 942 919 return skb; 943 920 } 944 921 922 + static int vrf_output_direct_finish(struct net *net, struct sock *sk, 923 + struct sk_buff *skb) 924 + { 925 + vrf_finish_direct(skb); 926 + 927 + return vrf_ip_local_out(net, sk, skb); 928 + } 929 + 945 930 static int vrf_output_direct(struct net *net, struct sock *sk, 946 931 struct sk_buff *skb) 947 932 { 933 + int err = 1; 934 + 948 935 skb->protocol = htons(ETH_P_IP); 949 936 950 - return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, 951 - net, sk, skb, NULL, skb->dev, 952 - vrf_finish_direct, 953 - !(IPCB(skb)->flags & IPSKB_REROUTED)); 937 + if (!(IPCB(skb)->flags & IPSKB_REROUTED)) 938 + err = nf_hook(NFPROTO_IPV4, NF_INET_POST_ROUTING, net, sk, skb, 939 + NULL, skb->dev, vrf_output_direct_finish); 940 + 941 + if (likely(err == 1)) 942 + vrf_finish_direct(skb); 943 + 944 + return err; 945 + } 946 + 947 + static int vrf_ip_out_direct_finish(struct net *net, struct sock *sk, 948 + struct sk_buff *skb) 949 + { 950 + int err; 951 + 952 + err = vrf_output_direct(net, sk, skb); 953 + if (likely(err == 1)) 954 + err = vrf_ip_local_out(net, sk, skb); 955 + 956 + return err; 954 957 } 955 958 956 959 static struct sk_buff *vrf_ip_out_direct(struct net_device *vrf_dev, ··· 989 940 skb->dev = vrf_dev; 990 941 991 942 err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk, 992 - skb, NULL, vrf_dev, vrf_output_direct); 943 + skb, NULL, vrf_dev, vrf_ip_out_direct_finish); 993 944 994 945 if (likely(err == 1)) 995 946 err = vrf_output_direct(net, sk, skb); 996 947 997 - /* reset skb device */ 998 948 if (likely(err == 1)) 999 - nf_reset_ct(skb); 1000 - else 1001 - skb = NULL; 949 + return skb; 1002 950 1003 - return skb; 951 + return NULL; 1004 952 } 1005 953 1006 954 static struct sk_buff *vrf_ip_out(struct net_device *vrf_dev,
+1
drivers/net/wan/cosa.c
··· 889 889 chan->tx_status = 1; 890 890 spin_unlock_irqrestore(&cosa->lock, flags); 891 891 up(&chan->wsem); 892 + kfree(kbuf); 892 893 return -ERESTARTSYS; 893 894 } 894 895 }
+6 -2
drivers/nvme/host/core.c
··· 4582 4582 } 4583 4583 EXPORT_SYMBOL_GPL(nvme_start_queues); 4584 4584 4585 - 4586 - void nvme_sync_queues(struct nvme_ctrl *ctrl) 4585 + void nvme_sync_io_queues(struct nvme_ctrl *ctrl) 4587 4586 { 4588 4587 struct nvme_ns *ns; 4589 4588 ··· 4590 4591 list_for_each_entry(ns, &ctrl->namespaces, list) 4591 4592 blk_sync_queue(ns->queue); 4592 4593 up_read(&ctrl->namespaces_rwsem); 4594 + } 4595 + EXPORT_SYMBOL_GPL(nvme_sync_io_queues); 4593 4596 4597 + void nvme_sync_queues(struct nvme_ctrl *ctrl) 4598 + { 4599 + nvme_sync_io_queues(ctrl); 4594 4600 if (ctrl->admin_q) 4595 4601 blk_sync_queue(ctrl->admin_q); 4596 4602 }
+1
drivers/nvme/host/nvme.h
··· 602 602 void nvme_start_queues(struct nvme_ctrl *ctrl); 603 603 void nvme_kill_queues(struct nvme_ctrl *ctrl); 604 604 void nvme_sync_queues(struct nvme_ctrl *ctrl); 605 + void nvme_sync_io_queues(struct nvme_ctrl *ctrl); 605 606 void nvme_unfreeze(struct nvme_ctrl *ctrl); 606 607 void nvme_wait_freeze(struct nvme_ctrl *ctrl); 607 608 int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
+19 -4
drivers/nvme/host/pci.c
··· 198 198 u32 q_depth; 199 199 u16 cq_vector; 200 200 u16 sq_tail; 201 + u16 last_sq_tail; 201 202 u16 cq_head; 202 203 u16 qid; 203 204 u8 cq_phase; ··· 456 455 return 0; 457 456 } 458 457 459 - static inline void nvme_write_sq_db(struct nvme_queue *nvmeq) 458 + /* 459 + * Write sq tail if we are asked to, or if the next command would wrap. 460 + */ 461 + static inline void nvme_write_sq_db(struct nvme_queue *nvmeq, bool write_sq) 460 462 { 463 + if (!write_sq) { 464 + u16 next_tail = nvmeq->sq_tail + 1; 465 + 466 + if (next_tail == nvmeq->q_depth) 467 + next_tail = 0; 468 + if (next_tail != nvmeq->last_sq_tail) 469 + return; 470 + } 471 + 461 472 if (nvme_dbbuf_update_and_check_event(nvmeq->sq_tail, 462 473 nvmeq->dbbuf_sq_db, nvmeq->dbbuf_sq_ei)) 463 474 writel(nvmeq->sq_tail, nvmeq->q_db); 475 + nvmeq->last_sq_tail = nvmeq->sq_tail; 464 476 } 465 477 466 478 /** ··· 490 476 cmd, sizeof(*cmd)); 491 477 if (++nvmeq->sq_tail == nvmeq->q_depth) 492 478 nvmeq->sq_tail = 0; 493 - if (write_sq) 494 - nvme_write_sq_db(nvmeq); 479 + nvme_write_sq_db(nvmeq, write_sq); 495 480 spin_unlock(&nvmeq->sq_lock); 496 481 } 497 482 ··· 499 486 struct nvme_queue *nvmeq = hctx->driver_data; 500 487 501 488 spin_lock(&nvmeq->sq_lock); 502 - nvme_write_sq_db(nvmeq); 489 + if (nvmeq->sq_tail != nvmeq->last_sq_tail) 490 + nvme_write_sq_db(nvmeq, true); 503 491 spin_unlock(&nvmeq->sq_lock); 504 492 } 505 493 ··· 1510 1496 struct nvme_dev *dev = nvmeq->dev; 1511 1497 1512 1498 nvmeq->sq_tail = 0; 1499 + nvmeq->last_sq_tail = 0; 1513 1500 nvmeq->cq_head = 0; 1514 1501 nvmeq->cq_phase = 1; 1515 1502 nvmeq->q_db = &dev->dbs[qid * 2 * dev->db_stride];
+3 -11
drivers/nvme/host/rdma.c
··· 122 122 struct sockaddr_storage src_addr; 123 123 124 124 struct nvme_ctrl ctrl; 125 - struct mutex teardown_lock; 126 125 bool use_inline_data; 127 126 u32 io_queues[HCTX_MAX_TYPES]; 128 127 }; ··· 1009 1010 static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl, 1010 1011 bool remove) 1011 1012 { 1012 - mutex_lock(&ctrl->teardown_lock); 1013 1013 blk_mq_quiesce_queue(ctrl->ctrl.admin_q); 1014 + blk_sync_queue(ctrl->ctrl.admin_q); 1014 1015 nvme_rdma_stop_queue(&ctrl->queues[0]); 1015 1016 if (ctrl->ctrl.admin_tagset) { 1016 1017 blk_mq_tagset_busy_iter(ctrl->ctrl.admin_tagset, ··· 1020 1021 if (remove) 1021 1022 blk_mq_unquiesce_queue(ctrl->ctrl.admin_q); 1022 1023 nvme_rdma_destroy_admin_queue(ctrl, remove); 1023 - mutex_unlock(&ctrl->teardown_lock); 1024 1024 } 1025 1025 1026 1026 static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl, 1027 1027 bool remove) 1028 1028 { 1029 - mutex_lock(&ctrl->teardown_lock); 1030 1029 if (ctrl->ctrl.queue_count > 1) { 1031 1030 nvme_start_freeze(&ctrl->ctrl); 1032 1031 nvme_stop_queues(&ctrl->ctrl); 1032 + nvme_sync_io_queues(&ctrl->ctrl); 1033 1033 nvme_rdma_stop_io_queues(ctrl); 1034 1034 if (ctrl->ctrl.tagset) { 1035 1035 blk_mq_tagset_busy_iter(ctrl->ctrl.tagset, ··· 1039 1041 nvme_start_queues(&ctrl->ctrl); 1040 1042 nvme_rdma_destroy_io_queues(ctrl, remove); 1041 - mutex_unlock(&ctrl->teardown_lock); 1043 1044 } 1044 1045 1045 1046 static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl) ··· 1973 1976 { 1974 1977 struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq); 1975 1978 struct nvme_rdma_queue *queue = req->queue; 1976 - struct nvme_rdma_ctrl *ctrl = queue->ctrl; 1977 1979 1978 - /* fence other contexts that may complete the command */ 1979 - mutex_lock(&ctrl->teardown_lock); 1980 1980 nvme_rdma_stop_queue(queue); 1981 - if (!blk_mq_request_completed(rq)) { 1981 + if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) { 1982 1982 nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
1983 1983 blk_mq_complete_request(rq); 1984 1984 } 1985 - mutex_unlock(&ctrl->teardown_lock); 1986 1985 } 1987 1986 1988 1987 static enum blk_eh_timer_return ··· 2313 2320 return ERR_PTR(-ENOMEM); 2314 2321 ctrl->ctrl.opts = opts; 2315 2322 INIT_LIST_HEAD(&ctrl->list); 2316 - mutex_init(&ctrl->teardown_lock); 2317 2323 2318 2324 if (!(opts->mask & NVMF_OPT_TRSVCID)) { 2319 2325 opts->trsvcid =
+4 -12
drivers/nvme/host/tcp.c
··· 124 124 struct sockaddr_storage src_addr; 125 125 struct nvme_ctrl ctrl; 126 126 127 - struct mutex teardown_lock; 128 127 struct work_struct err_work; 129 128 struct delayed_work connect_work; 130 129 struct nvme_tcp_request async_req; ··· 1885 1886 static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl, 1886 1887 bool remove) 1887 1888 { 1888 - mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock); 1889 1889 blk_mq_quiesce_queue(ctrl->admin_q); 1890 + blk_sync_queue(ctrl->admin_q); 1890 1891 nvme_tcp_stop_queue(ctrl, 0); 1891 1892 if (ctrl->admin_tagset) { 1892 1893 blk_mq_tagset_busy_iter(ctrl->admin_tagset, ··· 1896 1897 if (remove) 1897 1898 blk_mq_unquiesce_queue(ctrl->admin_q); 1898 1899 nvme_tcp_destroy_admin_queue(ctrl, remove); 1899 - mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock); 1900 1900 } 1901 1901 1902 1902 static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl, 1903 1903 bool remove) 1904 1904 { 1905 - mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock); 1906 1905 if (ctrl->queue_count <= 1) 1907 - goto out; 1906 + return; 1908 1907 blk_mq_quiesce_queue(ctrl->admin_q); 1909 1908 nvme_start_freeze(ctrl); 1910 1909 nvme_stop_queues(ctrl); 1910 + nvme_sync_io_queues(ctrl); 1911 1911 nvme_tcp_stop_io_queues(ctrl); 1912 1912 if (ctrl->tagset) { 1913 1913 blk_mq_tagset_busy_iter(ctrl->tagset, ··· 1916 1918 if (remove) 1917 1919 nvme_start_queues(ctrl); 1918 1920 nvme_tcp_destroy_io_queues(ctrl, remove); 1919 - out: 1920 - mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock); 1921 1921 } 1922 1922 1923 1923 static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl) ··· 2167 2171 struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq); 2168 2172 struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl; 2169 2173 2170 - /* fence other contexts that may complete the command */ 2171 - mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock); 2172 2174 nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue)); 2173 - if (!blk_mq_request_completed(rq)) { 2175 + if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
2174 2176 nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD; 2175 2177 blk_mq_complete_request(rq); 2176 2178 } 2177 - mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock); 2178 2179 } 2179 2180 2180 2181 static enum blk_eh_timer_return ··· 2448 2455 nvme_tcp_reconnect_ctrl_work); 2449 2456 INIT_WORK(&ctrl->err_work, nvme_tcp_error_recovery_work); 2450 2457 INIT_WORK(&ctrl->ctrl.reset_work, nvme_reset_ctrl_work); 2451 - mutex_init(&ctrl->teardown_lock); 2452 2458 2453 2459 if (!(opts->mask & NVMF_OPT_TRSVCID)) { 2454 2460 opts->trsvcid =
+2 -2
drivers/powercap/powercap_sys.c
··· 367 367 &dev_attr_max_energy_range_uj.attr; 368 368 if (power_zone->ops->get_energy_uj) { 369 369 if (power_zone->ops->reset_energy_uj) 370 - dev_attr_energy_uj.attr.mode = S_IWUSR | S_IRUGO; 370 + dev_attr_energy_uj.attr.mode = S_IWUSR | S_IRUSR; 371 371 else 372 - dev_attr_energy_uj.attr.mode = S_IRUGO; 372 + dev_attr_energy_uj.attr.mode = S_IRUSR; 373 373 power_zone->zone_dev_attrs[count++] = 374 374 &dev_attr_energy_uj.attr; 375 375 }
+5 -4
drivers/scsi/device_handler/scsi_dh_alua.c
··· 658 658 rcu_read_lock(); 659 659 list_for_each_entry_rcu(h, 660 660 &tmp_pg->dh_list, node) { 661 - /* h->sdev should always be valid */ 662 - BUG_ON(!h->sdev); 661 + if (!h->sdev) 662 + continue; 663 663 h->sdev->access_state = desc[0]; 664 664 } 665 665 rcu_read_unlock(); ··· 705 705 pg->expiry = 0; 706 706 rcu_read_lock(); 707 707 list_for_each_entry_rcu(h, &pg->dh_list, node) { 708 - BUG_ON(!h->sdev); 708 + if (!h->sdev) 709 + continue; 709 710 h->sdev->access_state = 710 711 (pg->state & SCSI_ACCESS_STATE_MASK); 711 712 if (pg->pref) ··· 1148 1147 spin_lock(&h->pg_lock); 1149 1148 pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock)); 1150 1149 rcu_assign_pointer(h->pg, NULL); 1151 - h->sdev = NULL; 1152 1150 spin_unlock(&h->pg_lock); 1153 1151 if (pg) { 1154 1152 spin_lock_irq(&pg->lock); ··· 1156 1156 kref_put(&pg->kref, release_port_group); 1157 1157 } 1158 1158 sdev->handler_data = NULL; 1159 + synchronize_rcu(); 1159 1160 kfree(h); 1160 1161 } 1161 1162
+3 -1
drivers/scsi/hpsa.c
··· 8855 8855 /* hook into SCSI subsystem */ 8856 8856 rc = hpsa_scsi_add_host(h); 8857 8857 if (rc) 8858 - goto clean7; /* perf, sg, cmd, irq, shost, pci, lu, aer/h */ 8858 + goto clean8; /* lastlogicals, perf, sg, cmd, irq, shost, pci, lu, aer/h */ 8859 8859 8860 8860 /* Monitor the controller for firmware lockups */ 8861 8861 h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL; ··· 8870 8870 HPSA_EVENT_MONITOR_INTERVAL); 8871 8871 return 0; 8872 8872 8873 + clean8: /* lastlogicals, perf, sg, cmd, irq, shost, pci, lu, aer/h */ 8874 + kfree(h->lastlogicals); 8873 8875 clean7: /* perf, sg, cmd, irq, shost, pci, lu, aer/h */ 8874 8876 hpsa_free_performant_mode(h); 8875 8877 h->access.set_intr_mask(h, HPSA_INTR_OFF);
+7
drivers/scsi/mpt3sas/mpt3sas_base.c
··· 1740 1740 reply_q->irq_poll_scheduled = false; 1741 1741 reply_q->irq_line_enable = true; 1742 1742 enable_irq(reply_q->os_irq); 1743 + /* 1744 + * Go for one more round of processing the 1745 + * reply descriptor post queue incase if HBA 1746 + * Firmware has posted some reply descriptors 1747 + * while reenabling the IRQ. 1748 + */ 1749 + _base_process_reply_queue(reply_q); 1743 1750 } 1744 1751 1745 1752 return num_entries;
+1 -1
drivers/tty/serial/8250/8250_mtk.c
··· 317 317 */ 318 318 baud = tty_termios_baud_rate(termios); 319 319 320 - serial8250_do_set_termios(port, termios, old); 320 + serial8250_do_set_termios(port, termios, NULL); 321 321 322 322 tty_termios_encode_baud_rate(termios, baud, baud); 323 323
+1
drivers/tty/serial/Kconfig
··· 522 522 depends on OF 523 523 select SERIAL_EARLYCON 524 524 select SERIAL_CORE_CONSOLE 525 + default y if SERIAL_IMX_CONSOLE 525 526 help 526 527 If you have enabled the earlycon on the Freescale IMX 527 528 CPU you can make it the earlycon by answering Y to this option.
+3
drivers/tty/serial/serial_txx9.c
··· 1280 1280 1281 1281 #ifdef ENABLE_SERIAL_TXX9_PCI 1282 1282 ret = pci_register_driver(&serial_txx9_pci_driver); 1283 + if (ret) { 1284 + platform_driver_unregister(&serial_txx9_plat_driver); 1285 + } 1283 1286 #endif 1284 1287 if (ret == 0) 1285 1288 goto out;
+4 -2
drivers/tty/tty_io.c
··· 1515 1515 tty->ops->shutdown(tty); 1516 1516 tty_save_termios(tty); 1517 1517 tty_driver_remove_tty(tty->driver, tty); 1518 - tty->port->itty = NULL; 1518 + if (tty->port) 1519 + tty->port->itty = NULL; 1519 1520 if (tty->link) 1520 1521 tty->link->port->itty = NULL; 1521 - tty_buffer_cancel_work(tty->port); 1522 + if (tty->port) 1523 + tty_buffer_cancel_work(tty->port); 1522 1524 if (tty->link) 1523 1525 tty_buffer_cancel_work(tty->link->port); 1524 1526
+2 -22
drivers/tty/vt/vt.c
··· 4704 4704 return rc; 4705 4705 } 4706 4706 4707 - static int con_font_copy(struct vc_data *vc, struct console_font_op *op) 4708 - { 4709 - int con = op->height; 4710 - int rc; 4711 - 4712 - 4713 - console_lock(); 4714 - if (vc->vc_mode != KD_TEXT) 4715 - rc = -EINVAL; 4716 - else if (!vc->vc_sw->con_font_copy) 4717 - rc = -ENOSYS; 4718 - else if (con < 0 || !vc_cons_allocated(con)) 4719 - rc = -ENOTTY; 4720 - else if (con == vc->vc_num) /* nothing to do */ 4721 - rc = 0; 4722 - else 4723 - rc = vc->vc_sw->con_font_copy(vc, con); 4724 - console_unlock(); 4725 - return rc; 4726 - } 4727 - 4728 4707 int con_font_op(struct vc_data *vc, struct console_font_op *op) 4729 4708 { 4730 4709 switch (op->op) { ··· 4714 4735 case KD_FONT_OP_SET_DEFAULT: 4715 4736 return con_font_default(vc, op); 4716 4737 case KD_FONT_OP_COPY: 4717 - return con_font_copy(vc, op); 4738 + /* was buggy and never really used */ 4739 + return -EINVAL; 4718 4740 } 4719 4741 return -ENOSYS; 4720 4742 }
+3
drivers/usb/core/quirks.c
··· 378 378 { USB_DEVICE(0x0926, 0x3333), .driver_info = 379 379 USB_QUIRK_CONFIG_INTF_STRINGS }, 380 380 381 + /* Kingston DataTraveler 3.0 */ 382 + { USB_DEVICE(0x0951, 0x1666), .driver_info = USB_QUIRK_NO_LPM }, 383 + 381 384 /* X-Rite/Gretag-Macbeth Eye-One Pro display colorimeter */ 382 385 { USB_DEVICE(0x0971, 0x2000), .driver_info = USB_QUIRK_NO_SET_INTF }, 383 386
+3
drivers/usb/dwc2/platform.c
··· 608 608 #endif /* CONFIG_USB_DWC2_PERIPHERAL || CONFIG_USB_DWC2_DUAL_ROLE */ 609 609 return 0; 610 610 611 + #if IS_ENABLED(CONFIG_USB_DWC2_PERIPHERAL) || \ 612 + IS_ENABLED(CONFIG_USB_DWC2_DUAL_ROLE) 611 613 error_debugfs: 612 614 dwc2_debugfs_exit(hsotg); 613 615 if (hsotg->hcd_enabled) 614 616 dwc2_hcd_remove(hsotg); 617 + #endif 615 618 error_drd: 616 619 dwc2_drd_exit(hsotg); 617 620
+4
drivers/usb/dwc3/dwc3-pci.c
··· 40 40 #define PCI_DEVICE_ID_INTEL_TGPLP 0xa0ee 41 41 #define PCI_DEVICE_ID_INTEL_TGPH 0x43ee 42 42 #define PCI_DEVICE_ID_INTEL_JSP 0x4dee 43 + #define PCI_DEVICE_ID_INTEL_ADLS 0x7ae1 43 44 44 45 #define PCI_INTEL_BXT_DSM_GUID "732b85d5-b7a7-4a1b-9ba0-4bbd00ffd511" 45 46 #define PCI_INTEL_BXT_FUNC_PMU_PWR 4 ··· 366 365 (kernel_ulong_t) &dwc3_pci_intel_properties, }, 367 366 368 367 { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_JSP), 368 + (kernel_ulong_t) &dwc3_pci_intel_properties, }, 369 + 370 + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ADLS), 369 371 (kernel_ulong_t) &dwc3_pci_intel_properties, }, 370 372 371 373 { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_NL_USB),
+2 -1
drivers/usb/dwc3/ep0.c
··· 1058 1058 { 1059 1059 unsigned int direction = !dwc->ep0_expect_in; 1060 1060 1061 + dwc->delayed_status = false; 1062 + 1061 1063 if (dwc->ep0state != EP0_STATUS_PHASE) 1062 1064 return; 1063 1065 1064 - dwc->delayed_status = false; 1065 1066 __dwc3_ep0_do_control_status(dwc, dwc->eps[direction]); 1066 1067 } 1067 1068
+4 -1
drivers/usb/gadget/legacy/raw_gadget.c
··· 564 564 return -ENODEV; 565 565 } 566 566 length = min(arg.length, event->length); 567 - if (copy_to_user((void __user *)value, event, sizeof(*event) + length)) 567 + if (copy_to_user((void __user *)value, event, sizeof(*event) + length)) { 568 + kfree(event); 568 569 return -EFAULT; 570 + } 569 571 572 + kfree(event); 570 573 return 0; 571 574 } 572 575
+1 -1
drivers/usb/gadget/udc/fsl_udc_core.c
··· 1051 1051 u32 bitmask; 1052 1052 struct ep_queue_head *qh; 1053 1053 1054 - if (!_ep || _ep->desc || !(_ep->desc->bEndpointAddress&0xF)) 1054 + if (!_ep || !_ep->desc || !(_ep->desc->bEndpointAddress&0xF)) 1055 1055 return -ENODEV; 1056 1056 1057 1057 ep = container_of(_ep, struct fsl_ep, ep);
+1 -1
drivers/usb/gadget/udc/goku_udc.c
··· 1760 1760 goto err; 1761 1761 } 1762 1762 1763 + pci_set_drvdata(pdev, dev); 1763 1764 spin_lock_init(&dev->lock); 1764 1765 dev->pdev = pdev; 1765 1766 dev->gadget.ops = &goku_ops; ··· 1794 1793 } 1795 1794 dev->regs = (struct goku_udc_regs __iomem *) base; 1796 1795 1797 - pci_set_drvdata(pdev, dev); 1798 1796 INFO(dev, "%s\n", driver_desc); 1799 1797 INFO(dev, "version: " DRIVER_VERSION " %s\n", dmastr()); 1800 1798 INFO(dev, "irq %d, pci mem %p\n", pdev->irq, base);
+3 -1
drivers/usb/misc/apple-mfi-fastcharge.c
··· 120 120 dev_dbg(&mfi->udev->dev, "prop: %d\n", psp); 121 121 122 122 ret = pm_runtime_get_sync(&mfi->udev->dev); 123 - if (ret < 0) 123 + if (ret < 0) { 124 + pm_runtime_put_noidle(&mfi->udev->dev); 124 125 return ret; 126 + } 125 127 126 128 switch (psp) { 127 129 case POWER_SUPPLY_PROP_CHARGE_TYPE:
+1
drivers/usb/mtu3/mtu3_gadget.c
··· 564 564 565 565 spin_unlock_irqrestore(&mtu->lock, flags); 566 566 567 + synchronize_irq(mtu->irq); 567 568 return 0; 568 569 } 569 570
+6 -1
drivers/usb/serial/cyberjack.c
··· 357 357 struct device *dev = &port->dev; 358 358 int status = urb->status; 359 359 unsigned long flags; 360 + bool resubmitted = false; 360 361 361 - set_bit(0, &port->write_urbs_free); 362 362 if (status) { 363 363 dev_dbg(dev, "%s - nonzero write bulk status received: %d\n", 364 364 __func__, status); 365 + set_bit(0, &port->write_urbs_free); 365 366 return; 366 367 } 367 368 ··· 395 394 goto exit; 396 395 } 397 396 397 + resubmitted = true; 398 + 398 399 dev_dbg(dev, "%s - priv->wrsent=%d\n", __func__, priv->wrsent); 399 400 dev_dbg(dev, "%s - priv->wrfilled=%d\n", __func__, priv->wrfilled); 400 401 ··· 413 410 414 411 exit: 415 412 spin_unlock_irqrestore(&priv->lock, flags); 413 + if (!resubmitted) 414 + set_bit(0, &port->write_urbs_free); 416 415 usb_serial_port_softint(port); 417 416 } 418 417
+10
drivers/usb/serial/option.c
··· 250 250 #define QUECTEL_PRODUCT_EP06 0x0306 251 251 #define QUECTEL_PRODUCT_EM12 0x0512 252 252 #define QUECTEL_PRODUCT_RM500Q 0x0800 253 + #define QUECTEL_PRODUCT_EC200T 0x6026 253 254 254 255 #define CMOTECH_VENDOR_ID 0x16d8 255 256 #define CMOTECH_PRODUCT_6001 0x6001 ··· 1118 1117 { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM500Q, 0xff, 0, 0) }, 1119 1118 { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM500Q, 0xff, 0xff, 0x10), 1120 1119 .driver_info = ZLP }, 1120 + { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC200T, 0xff, 0, 0) }, 1121 1121 { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6001) }, 1122 1122 { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_CMU_300) }, ··· 1191 1189 .driver_info = NCTRL(0) | RSVD(1) }, 1192 1190 { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1054, 0xff), /* Telit FT980-KS */ 1193 1191 .driver_info = NCTRL(2) | RSVD(3) }, 1192 + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1055, 0xff), /* Telit FN980 (PCIe) */ 1193 + .driver_info = NCTRL(0) | RSVD(1) }, 1194 1194 { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910), 1195 1195 .driver_info = NCTRL(0) | RSVD(1) | RSVD(3) }, 1196 1196 { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910_DUAL_MODEM), ··· 1205 1201 .driver_info = NCTRL(0) }, 1206 1202 { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910), 1207 1203 .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) }, 1204 + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1203, 0xff), /* Telit LE910Cx (RNDIS) */ 1205 + .driver_info = NCTRL(2) | RSVD(3) }, 1208 1206 { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910_USBCFG4), 1209 1207 .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) | RSVD(3) }, 1210 1208 { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE920), ··· 1221 1215 { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, TELIT_PRODUCT_LE920A4_1213, 0xff) }, 1222 1216 { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE920A4_1214), 1223 1217 .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) | RSVD(3) }, 1218 + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1230, 0xff), /* Telit LE910Cx (rmnet) */ 1219 + .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) }, 1220 + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1231, 0xff), /* Telit LE910Cx (RNDIS) */ 1221 + .driver_info = NCTRL(2) | RSVD(3) }, 1224 1222 { USB_DEVICE(TELIT_VENDOR_ID, 0x1260), 1225 1223 .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) }, 1226 1224 { USB_DEVICE(TELIT_VENDOR_ID, 0x1261),
+1 -2
drivers/xen/swiotlb-xen.c
··· 395 395 */ 396 396 trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force); 397 397 398 - map = swiotlb_tbl_map_single(dev, virt_to_phys(xen_io_tlb_start), 399 - phys, size, size, dir, attrs); 398 + map = swiotlb_tbl_map_single(dev, phys, size, size, dir, attrs); 400 399 if (map == (phys_addr_t)DMA_MAPPING_ERROR) 401 400 return DMA_MAPPING_ERROR; 402 401
+2 -1
fs/btrfs/block-rsv.c
··· 511 511 /*DEFAULT_RATELIMIT_BURST*/ 1); 512 512 if (__ratelimit(&_rs)) 513 513 WARN(1, KERN_DEBUG 514 - "BTRFS: block rsv returned %d\n", ret); 514 + "BTRFS: block rsv %d returned %d\n", 515 + block_rsv->type, ret); 515 516 } 516 517 try_reserve: 517 518 ret = btrfs_reserve_metadata_bytes(root, block_rsv, blocksize,
+24 -2
fs/btrfs/dev-replace.c
··· 91 91 ret = btrfs_search_slot(NULL, dev_root, &key, path, 0, 0); 92 92 if (ret) { 93 93 no_valid_dev_replace_entry_found: 94 + /* 95 + * We don't have a replace item or it's corrupted. If there is 96 + * a replace target, fail the mount. 97 + */ 98 + if (btrfs_find_device(fs_info->fs_devices, 99 + BTRFS_DEV_REPLACE_DEVID, NULL, NULL, false)) { 100 + btrfs_err(fs_info, 101 + "found replace target device without a valid replace item"); 102 + ret = -EUCLEAN; 103 + goto out; 104 + } 94 105 ret = 0; 95 106 dev_replace->replace_state = 96 107 BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED; ··· 154 143 case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED: 155 144 case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED: 156 145 case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED: 157 - dev_replace->srcdev = NULL; 158 - dev_replace->tgtdev = NULL; 146 + /* 147 + * We don't have an active replace item but if there is a 148 + * replace target, fail the mount. 149 + */ 150 + if (btrfs_find_device(fs_info->fs_devices, 151 + BTRFS_DEV_REPLACE_DEVID, NULL, NULL, false)) { 152 + btrfs_err(fs_info, 153 + "replace devid present without an active replace item"); 154 + ret = -EUCLEAN; 155 + } else { 156 + dev_replace->srcdev = NULL; 157 + dev_replace->tgtdev = NULL; 158 + } 159 159 break; 160 160 case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED: 161 161 case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
+4 -6
fs/btrfs/ioctl.c
··· 1274 1274 u64 page_start; 1275 1275 u64 page_end; 1276 1276 u64 page_cnt; 1277 + u64 start = (u64)start_index << PAGE_SHIFT; 1277 1278 int ret; 1278 1279 int i; 1279 1280 int i_done; ··· 1291 1290 page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1); 1292 1291 1293 1292 ret = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved, 1294 - start_index << PAGE_SHIFT, 1295 - page_cnt << PAGE_SHIFT); 1293 + start, page_cnt << PAGE_SHIFT); 1296 1294 if (ret) 1297 1295 return ret; 1298 1296 i_done = 0; ··· 1380 1380 btrfs_mod_outstanding_extents(BTRFS_I(inode), 1); 1381 1381 spin_unlock(&BTRFS_I(inode)->lock); 1382 1382 btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, 1383 - start_index << PAGE_SHIFT, 1384 - (page_cnt - i_done) << PAGE_SHIFT, true); 1383 + start, (page_cnt - i_done) << PAGE_SHIFT, true); 1385 1384 } 1386 1385 1387 1386 ··· 1407 1408 put_page(pages[i]); 1408 1409 } 1409 1410 btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, 1410 - start_index << PAGE_SHIFT, 1411 - page_cnt << PAGE_SHIFT, true); 1411 + start, page_cnt << PAGE_SHIFT, true); 1412 1412 btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT); 1413 1413 extent_changeset_free(data_reserved); 1414 1414 return ret;
+4 -8
fs/btrfs/qgroup.c
··· 3435 3435 { 3436 3436 struct rb_node *node; 3437 3437 struct rb_node *next; 3438 - struct ulist_node *entry = NULL; 3438 + struct ulist_node *entry; 3439 3439 int ret = 0; 3440 3440 3441 3441 node = reserved->range_changed.root.rb_node; 3442 + if (!node) 3443 + return 0; 3442 3444 while (node) { 3443 3445 entry = rb_entry(node, struct ulist_node, rb_node); 3444 3446 if (entry->val < start) 3445 3447 node = node->rb_right; 3446 - else if (entry) 3447 - node = node->rb_left; 3448 3448 else 3449 - break; 3449 + node = node->rb_left; 3450 3450 } 3451 - 3452 - /* Empty changeset */ 3453 - if (!entry) 3454 - return 0; 3455 3451 3456 3452 if (entry->val > start && rb_prev(&entry->rb_node)) 3457 3453 entry = rb_entry(rb_prev(&entry->rb_node), struct ulist_node,
+1
fs/btrfs/ref-verify.c
··· 860 860 "dropping a ref for a root that doesn't have a ref on the block"); 861 861 dump_block_entry(fs_info, be); 862 862 dump_ref_action(fs_info, ra); 863 + kfree(ref); 863 864 kfree(ra); 864 865 goto out_unlock; 865 866 }
+3 -1
fs/btrfs/relocation.c
··· 1648 1648 struct btrfs_root_item *root_item; 1649 1649 struct btrfs_path *path; 1650 1650 struct extent_buffer *leaf; 1651 + int reserve_level; 1651 1652 int level; 1652 1653 int max_level; 1653 1654 int replaced = 0; ··· 1697 1696 * Thus the needed metadata size is at most root_level * nodesize, 1698 1697 * and * 2 since we have two trees to COW. 1699 1698 */ 1700 - min_reserved = fs_info->nodesize * btrfs_root_level(root_item) * 2; 1699 + reserve_level = max_t(int, 1, btrfs_root_level(root_item)); 1700 + min_reserved = fs_info->nodesize * reserve_level * 2; 1701 1701 memset(&next_key, 0, sizeof(next_key)); 1702 1702 1703 1703 while (1) {
+3 -2
fs/btrfs/scrub.c
··· 3866 3866 if (!is_dev_replace && !readonly && 3867 3867 !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) { 3868 3868 mutex_unlock(&fs_info->fs_devices->device_list_mutex); 3869 - btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable", 3870 - rcu_str_deref(dev->name)); 3869 + btrfs_err_in_rcu(fs_info, 3870 + "scrub on devid %llu: filesystem on %s is not writable", 3871 + devid, rcu_str_deref(dev->name)); 3871 3872 ret = -EROFS; 3872 3873 goto out; 3873 3874 }
+7 -19
fs/btrfs/volumes.c
··· 1056 1056 continue; 1057 1057 } 1058 1058 1059 - if (device->devid == BTRFS_DEV_REPLACE_DEVID) { 1060 - /* 1061 - * In the first step, keep the device which has 1062 - * the correct fsid and the devid that is used 1063 - * for the dev_replace procedure. 1064 - * In the second step, the dev_replace state is 1065 - * read from the device tree and it is known 1066 - * whether the procedure is really active or 1067 - * not, which means whether this device is 1068 - * used or whether it should be removed. 1069 - */ 1070 - if (step == 0 || test_bit(BTRFS_DEV_STATE_REPLACE_TGT, 1071 - &device->dev_state)) { 1072 - continue; 1073 - } 1074 - } 1059 + /* 1060 + * We have already validated the presence of BTRFS_DEV_REPLACE_DEVID, 1061 + * in btrfs_init_dev_replace() so just continue. 1062 + */ 1063 + if (device->devid == BTRFS_DEV_REPLACE_DEVID) 1064 + continue; 1065 + 1075 1066 if (device->bdev) { 1076 1067 blkdev_put(device->bdev, device->mode); 1077 1068 device->bdev = NULL; ··· 1071 1080 if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) { 1072 1081 list_del_init(&device->dev_alloc_list); 1073 1082 clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); 1074 - if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, 1075 - &device->dev_state)) 1076 - fs_devices->rw_devices--; 1077 1083 } 1078 1084 list_del_init(&device->dev_list); 1079 1085 fs_devices->num_devices--;
+1 -1
fs/ceph/caps.c
··· 4074 4074 vino.snap, inode); 4075 4075 4076 4076 mutex_lock(&session->s_mutex); 4077 - session->s_seq++; 4077 + inc_session_sequence(session); 4078 4078 dout(" mds%d seq %lld cap seq %u\n", session->s_mds, session->s_seq, 4079 4079 (unsigned)seq); 4080 4080
+35 -15
fs/ceph/mds_client.c
··· 4231 4231 dname.len, dname.name); 4232 4232 4233 4233 mutex_lock(&session->s_mutex); 4234 - session->s_seq++; 4234 + inc_session_sequence(session); 4235 4235 4236 4236 if (!inode) { 4237 4237 dout("handle_lease no inode %llx\n", vino.ino); ··· 4385 4385 4386 4386 bool check_session_state(struct ceph_mds_session *s) 4387 4387 { 4388 - if (s->s_state == CEPH_MDS_SESSION_CLOSING) { 4389 - dout("resending session close request for mds%d\n", 4390 - s->s_mds); 4391 - request_close_session(s); 4392 - return false; 4393 - } 4394 - if (s->s_ttl && time_after(jiffies, s->s_ttl)) { 4395 - if (s->s_state == CEPH_MDS_SESSION_OPEN) { 4388 + switch (s->s_state) { 4389 + case CEPH_MDS_SESSION_OPEN: 4390 + if (s->s_ttl && time_after(jiffies, s->s_ttl)) { 4396 4391 s->s_state = CEPH_MDS_SESSION_HUNG; 4397 4392 pr_info("mds%d hung\n", s->s_mds); 4398 4393 } 4399 - } 4400 - if (s->s_state == CEPH_MDS_SESSION_NEW || 4401 - s->s_state == CEPH_MDS_SESSION_RESTARTING || 4402 - s->s_state == CEPH_MDS_SESSION_CLOSED || 4403 - s->s_state == CEPH_MDS_SESSION_REJECTED) 4404 - /* this mds is failed or recovering, just wait */ 4394 + break; 4395 + case CEPH_MDS_SESSION_CLOSING: 4396 + /* Should never reach this when we're unmounting */ 4397 + WARN_ON_ONCE(true); 4398 + fallthrough; 4399 + case CEPH_MDS_SESSION_NEW: 4400 + case CEPH_MDS_SESSION_RESTARTING: 4401 + case CEPH_MDS_SESSION_CLOSED: 4402 + case CEPH_MDS_SESSION_REJECTED: 4405 4403 return false; 4404 + } 4406 4405 4407 4406 return true; 4407 + } 4408 + 4409 + /* 4410 + * If the sequence is incremented while we're waiting on a REQUEST_CLOSE reply, 4411 + * then we need to retransmit that request. 4412 + */ 4413 + void inc_session_sequence(struct ceph_mds_session *s) 4414 + { 4415 + lockdep_assert_held(&s->s_mutex); 4416 + 4417 + s->s_seq++; 4418 + 4419 + if (s->s_state == CEPH_MDS_SESSION_CLOSING) { 4420 + int ret; 4421 + 4422 + dout("resending session close request for mds%d\n", s->s_mds); 4423 + ret = request_close_session(s); 4424 + if (ret < 0) 4425 + pr_err("unable to close session to mds%d: %d\n", 4426 + s->s_mds, ret); 4427 + } 4408 4428 } 4409 4429 4410 4430 /*
+1
fs/ceph/mds_client.h
··· 480 480 extern const char *ceph_mds_op_name(int op); 481 481 482 482 extern bool check_session_state(struct ceph_mds_session *s); 483 + void inc_session_sequence(struct ceph_mds_session *s); 483 484 484 485 extern struct ceph_mds_session * 485 486 __ceph_lookup_mds_session(struct ceph_mds_client *, int mds);
+1 -1
fs/ceph/quota.c
··· 53 53 54 54 /* increment msg sequence number */ 55 55 mutex_lock(&session->s_mutex); 56 - session->s_seq++; 56 + inc_session_sequence(session); 57 57 mutex_unlock(&session->s_mutex); 58 58 59 59 /* lookup inode */
+1 -1
fs/ceph/snap.c
··· 873 873 ceph_snap_op_name(op), split, trace_len); 874 874 875 875 mutex_lock(&session->s_mutex); 876 - session->s_seq++; 876 + inc_session_sequence(session); 877 877 mutex_unlock(&session->s_mutex); 878 878 879 879 down_write(&mdsc->snap_rwsem);
+1 -3
fs/crypto/keysetup.c
··· 269 269 * New inodes may not have an inode number assigned yet. 270 270 * Hashing their inode number is delayed until later. 271 271 */ 272 - if (ci->ci_inode->i_ino == 0) 273 - WARN_ON(!(ci->ci_inode->i_state & I_CREATING)); 274 - else 272 + if (ci->ci_inode->i_ino) 275 273 fscrypt_hash_inode_number(ci, mk); 276 274 return 0; 277 275 }
+11 -10
fs/erofs/inode.c
··· 107 107 i_gid_write(inode, le32_to_cpu(die->i_gid)); 108 108 set_nlink(inode, le32_to_cpu(die->i_nlink)); 109 109 110 - /* ns timestamp */ 111 - inode->i_mtime.tv_sec = inode->i_ctime.tv_sec = 112 - le64_to_cpu(die->i_ctime); 113 - inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = 114 - le32_to_cpu(die->i_ctime_nsec); 110 + /* extended inode has its own timestamp */ 111 + inode->i_ctime.tv_sec = le64_to_cpu(die->i_ctime); 112 + inode->i_ctime.tv_nsec = le32_to_cpu(die->i_ctime_nsec); 115 113 116 114 inode->i_size = le64_to_cpu(die->i_size); 117 115 ··· 147 149 i_gid_write(inode, le16_to_cpu(dic->i_gid)); 148 150 set_nlink(inode, le16_to_cpu(dic->i_nlink)); 149 151 150 - /* use build time to derive all file time */ 151 - inode->i_mtime.tv_sec = inode->i_ctime.tv_sec = 152 - sbi->build_time; 153 - inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = 154 - sbi->build_time_nsec; 152 + /* use build time for compact inodes */ 153 + inode->i_ctime.tv_sec = sbi->build_time; 154 + inode->i_ctime.tv_nsec = sbi->build_time_nsec; 155 155 156 156 inode->i_size = le32_to_cpu(dic->i_size); 157 157 if (erofs_inode_is_data_compressed(vi->datalayout)) ··· 162 166 err = -EOPNOTSUPP; 163 167 goto err_out; 164 168 } 169 + 170 + inode->i_mtime.tv_sec = inode->i_ctime.tv_sec; 171 + inode->i_atime.tv_sec = inode->i_ctime.tv_sec; 172 + inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec; 173 + inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec; 165 174 166 175 if (!nblks) 167 176 /* measure inode.i_blocks as generic filesystems */
+5 -2
fs/erofs/zdata.c
··· 1078 1078 cond_resched(); 1079 1079 goto repeat; 1080 1080 } 1081 - set_page_private(page, (unsigned long)pcl); 1082 - SetPagePrivate(page); 1081 + 1082 + if (tocache) { 1083 + set_page_private(page, (unsigned long)pcl); 1084 + SetPagePrivate(page); 1085 + } 1083 1086 out: /* the only exit (for tracing and debugging) */ 1084 1087 return page; 1085 1088 }
+46 -20
fs/ext4/ext4.h
··· 1028 1028 * protected by sbi->s_fc_lock. 1029 1029 */ 1030 1030 1031 - /* Fast commit subtid when this inode was committed */ 1032 - unsigned int i_fc_committed_subtid; 1033 - 1034 1031 /* Start of lblk range that needs to be committed in this fast commit */ 1035 1032 ext4_lblk_t i_fc_lblk_start; 1036 1033 ··· 1419 1422 1420 1423 #ifdef __KERNEL__ 1421 1424 1422 - /* 1423 - * run-time mount flags 1424 - */ 1425 - #define EXT4_MF_MNTDIR_SAMPLED 0x0001 1426 - #define EXT4_MF_FS_ABORTED 0x0002 /* Fatal error detected */ 1427 - #define EXT4_MF_FC_INELIGIBLE 0x0004 /* Fast commit ineligible */ 1428 - #define EXT4_MF_FC_COMMITTING 0x0008 /* File system underoing a fast 1429 - * commit. 1430 - */ 1431 - 1432 1425 #ifdef CONFIG_FS_ENCRYPTION 1433 1426 #define DUMMY_ENCRYPTION_ENABLED(sbi) ((sbi)->s_dummy_enc_policy.policy != NULL) 1434 1427 #else ··· 1453 1466 struct buffer_head * __rcu *s_group_desc; 1454 1467 unsigned int s_mount_opt; 1455 1468 unsigned int s_mount_opt2; 1456 - unsigned int s_mount_flags; 1469 + unsigned long s_mount_flags; 1457 1470 unsigned int s_def_mount_opt; 1458 1471 ext4_fsblk_t s_sb_block; 1459 1472 atomic64_t s_resv_clusters; ··· 1682 1695 }) 1683 1696 1684 1697 /* 1698 + * run-time mount flags 1699 + */ 1700 + enum { 1701 + EXT4_MF_MNTDIR_SAMPLED, 1702 + EXT4_MF_FS_ABORTED, /* Fatal error detected */ 1703 + EXT4_MF_FC_INELIGIBLE, /* Fast commit ineligible */ 1704 + EXT4_MF_FC_COMMITTING /* File system underoing a fast 1705 + * commit. 1706 + */ 1707 + }; 1708 + 1709 + static inline void ext4_set_mount_flag(struct super_block *sb, int bit) 1710 + { 1711 + set_bit(bit, &EXT4_SB(sb)->s_mount_flags); 1712 + } 1713 + 1714 + static inline void ext4_clear_mount_flag(struct super_block *sb, int bit) 1715 + { 1716 + clear_bit(bit, &EXT4_SB(sb)->s_mount_flags); 1717 + } 1718 + 1719 + static inline int ext4_test_mount_flag(struct super_block *sb, int bit) 1720 + { 1721 + return test_bit(bit, &EXT4_SB(sb)->s_mount_flags); 1722 + } 1723 + 1724 + 1725 + /* 1685 1726 * Simulate_fail codes 1686 1727 */ 1687 1728 #define EXT4_SIM_BBITMAP_EIO 1 ··· 1878 1863 #define EXT4_FEATURE_COMPAT_RESIZE_INODE 0x0010 1879 1864 #define EXT4_FEATURE_COMPAT_DIR_INDEX 0x0020 1880 1865 #define EXT4_FEATURE_COMPAT_SPARSE_SUPER2 0x0200 1866 + /* 1867 + * The reason why "FAST_COMMIT" is a compat feature is that, FS becomes 1868 + * incompatible only if fast commit blocks are present in the FS. Since we 1869 + * clear the journal (and thus the fast commit blocks), we don't mark FS as 1870 + * incompatible. We also have a JBD2 incompat feature, which gets set when 1871 + * there are fast commit blocks present in the journal. 1872 + */ 1881 1873 #define EXT4_FEATURE_COMPAT_FAST_COMMIT 0x0400 1882 1874 #define EXT4_FEATURE_COMPAT_STABLE_INODES 0x0800 1883 1875 ··· 2753 2731 int ext4_fc_info_show(struct seq_file *seq, void *v); 2754 2732 void ext4_fc_init(struct super_block *sb, journal_t *journal); 2755 2733 void ext4_fc_init_inode(struct inode *inode); 2756 - void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start, 2734 + void ext4_fc_track_range(handle_t *handle, struct inode *inode, ext4_lblk_t start, 2757 2735 ext4_lblk_t end); 2758 - void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry); 2759 - void ext4_fc_track_link(struct inode *inode, struct dentry *dentry); 2760 - void ext4_fc_track_create(struct inode *inode, struct dentry *dentry); 2761 - void ext4_fc_track_inode(struct inode *inode); 2736 + void __ext4_fc_track_unlink(handle_t *handle, struct inode *inode, 2737 + struct dentry *dentry); 2738 + void __ext4_fc_track_link(handle_t *handle, struct inode *inode, 2739 + struct dentry *dentry); 2740 + void ext4_fc_track_unlink(handle_t *handle, struct dentry *dentry); 2741 + void ext4_fc_track_link(handle_t *handle, struct dentry *dentry); 2742 + void ext4_fc_track_create(handle_t *handle, struct dentry *dentry); 2743 + void ext4_fc_track_inode(handle_t *handle, struct inode *inode); 2762 2744 void ext4_fc_mark_ineligible(struct super_block *sb, int reason); 2763 2745 void ext4_fc_start_ineligible(struct super_block *sb, int reason); 2764 2746 void ext4_fc_stop_ineligible(struct super_block *sb); ··· 3478 3452 extern int ext4_ci_compare(const struct inode *parent, 3479 3453 const struct qstr *fname, 3480 3454 const struct qstr *entry, bool quick); 3481 - extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name, 3455 + extern int __ext4_unlink(handle_t *handle, struct inode *dir, const struct qstr *d_name, 3482 3456 struct inode *inode); 3483 3457 extern int __ext4_link(struct inode *dir, struct inode *inode, 3484 3458 struct dentry *dentry);
+1 -6
fs/ext4/extents.c
··· 3724 3724 err = ext4_ext_dirty(handle, inode, path + path->p_depth); 3725 3725 out: 3726 3726 ext4_ext_show_leaf(inode, path); 3727 - ext4_fc_track_range(inode, ee_block, ee_block + ee_len - 1); 3728 3727 return err; 3729 3728 } 3730 3729 ··· 3795 3796 if (*allocated > map->m_len) 3796 3797 *allocated = map->m_len; 3797 3798 map->m_len = *allocated; 3798 - ext4_fc_track_range(inode, ee_block, ee_block + ee_len - 1); 3799 3799 return 0; 3800 3800 } 3801 3801 ··· 4327 4329 map->m_len = ar.len; 4328 4330 allocated = map->m_len; 4329 4331 ext4_ext_show_leaf(inode, path); 4330 - ext4_fc_track_range(inode, map->m_lblk, map->m_lblk + map->m_len - 1); 4331 4332 out: 4332 4333 ext4_ext_drop_refs(path); 4333 4334 kfree(path); ··· 4599 4602 ret = ext4_mark_inode_dirty(handle, inode); 4600 4603 if (unlikely(ret)) 4601 4604 goto out_handle; 4602 - ext4_fc_track_range(inode, offset >> inode->i_sb->s_blocksize_bits, 4605 + ext4_fc_track_range(handle, inode, offset >> inode->i_sb->s_blocksize_bits, 4603 4606 (offset + len - 1) >> inode->i_sb->s_blocksize_bits); 4604 4607 /* Zero out partial block at the edges of the range */ 4605 4608 ret = ext4_zero_partial_blocks(handle, inode, offset, len); ··· 4648 4651 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | 4649 4652 FALLOC_FL_INSERT_RANGE)) 4650 4653 return -EOPNOTSUPP; 4651 - ext4_fc_track_range(inode, offset >> blkbits, 4652 - (offset + len - 1) >> blkbits); 4653 4654 4654 4655 ext4_fc_start_update(inode); 4655 4656
+90 -84
fs/ext4/fast_commit.c
··· 83 83 * 84 84 * Atomicity of commits 85 85 * -------------------- 86 - * In order to gaurantee atomicity during the commit operation, fast commit 86 + * In order to guarantee atomicity during the commit operation, fast commit 87 87 * uses "EXT4_FC_TAG_TAIL" tag that marks a fast commit as complete. Tail 88 88 * tag contains CRC of the contents and TID of the transaction after which 89 89 * this fast commit should be applied. Recovery code replays fast commit ··· 152 152 INIT_LIST_HEAD(&ei->i_fc_list); 153 153 init_waitqueue_head(&ei->i_fc_wait); 154 154 atomic_set(&ei->i_fc_updates, 0); 155 - ei->i_fc_committed_subtid = 0; 155 + } 156 + 157 + /* This function must be called with sbi->s_fc_lock held. */ 158 + static void ext4_fc_wait_committing_inode(struct inode *inode) 159 + __releases(&EXT4_SB(inode->i_sb)->s_fc_lock) 160 + { 161 + wait_queue_head_t *wq; 162 + struct ext4_inode_info *ei = EXT4_I(inode); 163 + 164 + #if (BITS_PER_LONG < 64) 165 + DEFINE_WAIT_BIT(wait, &ei->i_state_flags, 166 + EXT4_STATE_FC_COMMITTING); 167 + wq = bit_waitqueue(&ei->i_state_flags, 168 + EXT4_STATE_FC_COMMITTING); 169 + #else 170 + DEFINE_WAIT_BIT(wait, &ei->i_flags, 171 + EXT4_STATE_FC_COMMITTING); 172 + wq = bit_waitqueue(&ei->i_flags, 173 + EXT4_STATE_FC_COMMITTING); 174 + #endif 175 + lockdep_assert_held(&EXT4_SB(inode->i_sb)->s_fc_lock); 176 + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE); 177 + spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); 178 + schedule(); 179 + finish_wait(wq, &wait.wq_entry); 156 180 } 157 181 158 182 /* ··· 200 176 goto out; 201 177 202 178 if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) { 203 - wait_queue_head_t *wq; 204 - #if (BITS_PER_LONG < 64) 205 - DEFINE_WAIT_BIT(wait, &ei->i_state_flags, 206 - EXT4_STATE_FC_COMMITTING); 207 - wq = bit_waitqueue(&ei->i_state_flags, 208 - EXT4_STATE_FC_COMMITTING); 209 - #else 210 - DEFINE_WAIT_BIT(wait, &ei->i_flags, 211 - EXT4_STATE_FC_COMMITTING); 212 - wq = bit_waitqueue(&ei->i_flags, 213 - EXT4_STATE_FC_COMMITTING); 214 - #endif 215 - prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE); 216 - spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); 217 - schedule(); 218 - finish_wait(wq, &wait.wq_entry); 179 - ext4_fc_wait_committing_inode(inode); 219 180 goto restart; 220 181 } 221 182 out: ··· 243 234 } 244 235 245 236 if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) { 246 - wait_queue_head_t *wq; 247 - #if (BITS_PER_LONG < 64) 248 - DEFINE_WAIT_BIT(wait, &ei->i_state_flags, 249 - EXT4_STATE_FC_COMMITTING); 250 - wq = bit_waitqueue(&ei->i_state_flags, 251 - EXT4_STATE_FC_COMMITTING); 252 - #else 253 - DEFINE_WAIT_BIT(wait, &ei->i_flags, 254 - EXT4_STATE_FC_COMMITTING); 255 - wq = bit_waitqueue(&ei->i_flags, 256 - EXT4_STATE_FC_COMMITTING); 257 - #endif 258 - prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE); 259 - spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); 260 - schedule(); 261 - finish_wait(wq, &wait.wq_entry); 237 - ext4_fc_wait_committing_inode(inode); 262 238 goto restart; 263 239 } 264 - if (!list_empty(&ei->i_fc_list)) 265 - list_del_init(&ei->i_fc_list); 240 + list_del_init(&ei->i_fc_list); 266 241 spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); 267 242 } ··· 262 269 (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)) 263 270 return; 264 271 265 - sbi->s_mount_flags |= EXT4_MF_FC_INELIGIBLE; 272 + ext4_set_mount_flag(sb, EXT4_MF_FC_INELIGIBLE); 266 273 WARN_ON(reason >= EXT4_FC_REASON_MAX); 267 274 sbi->s_fc_stats.fc_ineligible_reason_count[reason]++; 268 275 } ··· 295 302 (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)) 296 303 return; 297 304 298 - EXT4_SB(sb)->s_mount_flags |= EXT4_MF_FC_INELIGIBLE; 305 + ext4_set_mount_flag(sb, EXT4_MF_FC_INELIGIBLE); 299 306 atomic_dec(&EXT4_SB(sb)->s_fc_ineligible_updates); 300 307 } 301 308 302 309 static inline int ext4_fc_is_ineligible(struct super_block *sb) 303 310 { 304 - return (EXT4_SB(sb)->s_mount_flags & EXT4_MF_FC_INELIGIBLE) || 305 - atomic_read(&EXT4_SB(sb)->s_fc_ineligible_updates); 311 - return (ext4_test_mount_flag(sb, EXT4_MF_FC_INELIGIBLE) || 312 + atomic_read(&EXT4_SB(sb)->s_fc_ineligible_updates)); 306 313 } 307 314 308 315 /* ··· 316 323 * If enqueue is set, this function enqueues the inode in fast commit list. 317 324 */ 318 325 static int ext4_fc_track_template( 319 - struct inode *inode, int (*__fc_track_fn)(struct inode *, void *, bool), 326 + handle_t *handle, struct inode *inode, 327 + int (*__fc_track_fn)(struct inode *, void *, bool), 320 328 void *args, int enqueue) 321 329 { 322 - tid_t running_txn_tid; 323 330 bool update = false; 324 331 struct ext4_inode_info *ei = EXT4_I(inode); 325 332 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); 333 + tid_t tid = 0; 326 334 int ret; 327 335 328 336 if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) || ··· 333 339 if (ext4_fc_is_ineligible(inode->i_sb)) 334 340 return -EINVAL; 335 341 336 - running_txn_tid = sbi->s_journal ? 337 - sbi->s_journal->j_commit_sequence + 1 : 0; 338 - 342 + tid = handle->h_transaction->t_tid; 339 343 mutex_lock(&ei->i_fc_lock); 340 344 if (tid == ei->i_sync_tid) { 341 345 update = true; 342 346 } else { 343 347 ext4_fc_reset_inode(inode); 344 348 ei->i_sync_tid = tid; 345 349 } 346 350 ret = __fc_track_fn(inode, args, update); 347 351 mutex_unlock(&ei->i_fc_lock); ··· 350 358 spin_lock(&sbi->s_fc_lock); 351 359 if (list_empty(&EXT4_I(inode)->i_fc_list)) 352 360 list_add_tail(&EXT4_I(inode)->i_fc_list, 353 361 (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_COMMITTING)) ?
354 362 &sbi->s_fc_q[FC_Q_STAGING] : 355 363 &sbi->s_fc_q[FC_Q_MAIN]); 356 364 spin_unlock(&sbi->s_fc_lock); ··· 376 384 mutex_unlock(&ei->i_fc_lock); 377 385 node = kmem_cache_alloc(ext4_fc_dentry_cachep, GFP_NOFS); 378 386 if (!node) { 379 - ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MEM); 387 + ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_NOMEM); 380 388 mutex_lock(&ei->i_fc_lock); 381 389 return -ENOMEM; 382 390 } ··· 389 397 if (!node->fcd_name.name) { 390 398 kmem_cache_free(ext4_fc_dentry_cachep, node); 391 399 ext4_fc_mark_ineligible(inode->i_sb, 392 - EXT4_FC_REASON_MEM); 400 + EXT4_FC_REASON_NOMEM); 393 401 mutex_lock(&ei->i_fc_lock); 394 402 return -ENOMEM; 395 403 } ··· 403 411 node->fcd_name.len = dentry->d_name.len; 404 412 405 413 spin_lock(&sbi->s_fc_lock); 406 - if (sbi->s_mount_flags & EXT4_MF_FC_COMMITTING) 414 + if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_COMMITTING)) 407 415 list_add_tail(&node->fcd_list, 408 416 &sbi->s_fc_dentry_q[FC_Q_STAGING]); 409 417 else ··· 414 422 return 0; 415 423 } 416 424 417 - void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry) 425 + void __ext4_fc_track_unlink(handle_t *handle, 426 + struct inode *inode, struct dentry *dentry) 418 427 { 419 428 struct __track_dentry_update_args args; 420 429 int ret; ··· 423 430 args.dentry = dentry; 424 431 args.op = EXT4_FC_TAG_UNLINK; 425 432 426 - ret = ext4_fc_track_template(inode, __track_dentry_update, 433 + ret = ext4_fc_track_template(handle, inode, __track_dentry_update, 427 434 (void *)&args, 0); 428 435 trace_ext4_fc_track_unlink(inode, dentry, ret); 429 436 } 430 437 431 - void ext4_fc_track_link(struct inode *inode, struct dentry *dentry) 438 + void ext4_fc_track_unlink(handle_t *handle, struct dentry *dentry) 439 + { 440 + __ext4_fc_track_unlink(handle, d_inode(dentry), dentry); 441 + } 442 + 443 + void __ext4_fc_track_link(handle_t *handle, 444 + struct inode *inode, struct dentry *dentry) 432 445 { 433 446 struct 
__track_dentry_update_args args; 434 447 int ret; ··· 442 443 args.dentry = dentry; 443 444 args.op = EXT4_FC_TAG_LINK; 444 445 445 - ret = ext4_fc_track_template(inode, __track_dentry_update, 446 + ret = ext4_fc_track_template(handle, inode, __track_dentry_update, 446 447 (void *)&args, 0); 447 448 trace_ext4_fc_track_link(inode, dentry, ret); 448 449 } 449 450 450 - void ext4_fc_track_create(struct inode *inode, struct dentry *dentry) 451 + void ext4_fc_track_link(handle_t *handle, struct dentry *dentry) 452 + { 453 + __ext4_fc_track_link(handle, d_inode(dentry), dentry); 454 + } 455 + 456 + void ext4_fc_track_create(handle_t *handle, struct dentry *dentry) 451 457 { 452 458 struct __track_dentry_update_args args; 459 + struct inode *inode = d_inode(dentry); 453 460 int ret; 454 461 455 462 args.dentry = dentry; 456 463 args.op = EXT4_FC_TAG_CREAT; 457 464 458 - ret = ext4_fc_track_template(inode, __track_dentry_update, 465 + ret = ext4_fc_track_template(handle, inode, __track_dentry_update, 459 466 (void *)&args, 0); 460 467 trace_ext4_fc_track_create(inode, dentry, ret); 461 468 } ··· 477 472 return 0; 478 473 } 479 474 480 - void ext4_fc_track_inode(struct inode *inode) 475 + void ext4_fc_track_inode(handle_t *handle, struct inode *inode) 481 476 { 482 477 int ret; 483 478 484 479 if (S_ISDIR(inode->i_mode)) 485 480 return; 486 481 487 - ret = ext4_fc_track_template(inode, __track_inode, NULL, 1); 482 + if (ext4_should_journal_data(inode)) { 483 + ext4_fc_mark_ineligible(inode->i_sb, 484 + EXT4_FC_REASON_INODE_JOURNAL_DATA); 485 + return; 486 + } 487 + 488 + ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1); 488 489 trace_ext4_fc_track_inode(inode, ret); 489 490 } 490 491 ··· 526 515 return 0; 527 516 } 528 517 529 - void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start, 518 + void ext4_fc_track_range(handle_t *handle, struct inode *inode, ext4_lblk_t start, 530 519 ext4_lblk_t end) 531 520 { 532 521 struct __track_range_args args; 
··· 538 527 args.start = start; 539 528 args.end = end; 540 529 541 - ret = ext4_fc_track_template(inode, __track_range, &args, 1); 530 + ret = ext4_fc_track_template(handle, inode, __track_range, &args, 1); 542 531 543 532 trace_ext4_fc_track_range(inode, start, end, ret); 544 533 } ··· 548 537 int write_flags = REQ_SYNC; 549 538 struct buffer_head *bh = EXT4_SB(sb)->s_fc_bh; 550 539 540 + /* TODO: REQ_FUA | REQ_PREFLUSH is unnecessarily expensive. */ 551 541 if (test_opt(sb, BARRIER)) 552 542 write_flags |= REQ_FUA | REQ_PREFLUSH; 553 543 lock_buffer(bh); 554 - clear_buffer_dirty(bh); 544 + set_buffer_dirty(bh); 555 545 set_buffer_uptodate(bh); 556 546 bh->b_end_io = ext4_end_buffer_io_sync; 557 547 submit_bh(REQ_OP_WRITE, write_flags, bh); ··· 858 846 int ret = 0; 859 847 860 848 spin_lock(&sbi->s_fc_lock); 861 - sbi->s_mount_flags |= EXT4_MF_FC_COMMITTING; 849 + ext4_set_mount_flag(sb, EXT4_MF_FC_COMMITTING); 862 850 list_for_each(pos, &sbi->s_fc_q[FC_Q_MAIN]) { 863 851 ei = list_entry(pos, struct ext4_inode_info, i_fc_list); 864 852 ext4_set_inode_state(&ei->vfs_inode, EXT4_STATE_FC_COMMITTING); ··· 912 900 913 901 /* Commit all the directory entry updates */ 914 902 static int ext4_fc_commit_dentry_updates(journal_t *journal, u32 *crc) 903 + __acquires(&sbi->s_fc_lock) 904 + __releases(&sbi->s_fc_lock) 915 905 { 916 906 struct super_block *sb = (struct super_block *)(journal->j_private); 917 907 struct ext4_sb_info *sbi = EXT4_SB(sb); ··· 1010 996 if (ret) 1011 997 return ret; 1012 998 999 + /* 1000 + * If file system device is different from journal device, issue a cache 1001 + * flush before we start writing fast commit blocks. 
1002 + */ 1003 + if (journal->j_fs_dev != journal->j_dev) 1004 + blkdev_issue_flush(journal->j_fs_dev, GFP_NOFS); 1005 + 1013 1006 blk_start_plug(&plug); 1014 1007 if (sbi->s_fc_bytes == 0) { 1015 1008 /* ··· 1052 1031 if (ret) 1053 1032 goto out; 1054 1033 spin_lock(&sbi->s_fc_lock); 1055 - EXT4_I(inode)->i_fc_committed_subtid = 1056 - atomic_read(&sbi->s_fc_subtid); 1057 1034 } 1058 1035 spin_unlock(&sbi->s_fc_lock); 1059 1036 ··· 1150 1131 "Fast commit ended with blks = %d, reason = %d, subtid - %d", 1151 1132 nblks, reason, subtid); 1152 1133 if (reason == EXT4_FC_REASON_FC_FAILED) 1153 - return jbd2_fc_end_commit_fallback(journal, commit_tid); 1134 + return jbd2_fc_end_commit_fallback(journal); 1154 1135 if (reason == EXT4_FC_REASON_FC_START_FAILED || 1155 1136 reason == EXT4_FC_REASON_INELIGIBLE) 1156 1137 return jbd2_complete_transaction(journal, commit_tid); ··· 1209 1190 list_splice_init(&sbi->s_fc_q[FC_Q_STAGING], 1210 1191 &sbi->s_fc_q[FC_Q_STAGING]); 1211 1192 1212 - sbi->s_mount_flags &= ~EXT4_MF_FC_COMMITTING; 1213 - sbi->s_mount_flags &= ~EXT4_MF_FC_INELIGIBLE; 1193 + ext4_clear_mount_flag(sb, EXT4_MF_FC_COMMITTING); 1194 + ext4_clear_mount_flag(sb, EXT4_MF_FC_INELIGIBLE); 1214 1195 1215 1196 if (full) 1216 1197 sbi->s_fc_bytes = 0; ··· 1282 1263 return 0; 1283 1264 } 1284 1265 1285 - ret = __ext4_unlink(old_parent, &entry, inode); 1266 + ret = __ext4_unlink(NULL, old_parent, &entry, inode); 1286 1267 /* -ENOENT ok coz it might not exist anymore. 
*/ 1287 1268 if (ret == -ENOENT) 1288 1269 ret = 0; ··· 2098 2079 2099 2080 void ext4_fc_init(struct super_block *sb, journal_t *journal) 2100 2081 { 2101 - int num_fc_blocks; 2102 - 2103 2082 /* 2104 2083 * We set replay callback even if fast commit disabled because we may 2105 2084 * could still have fast commit blocks that need to be replayed even if ··· 2107 2090 if (!test_opt2(sb, JOURNAL_FAST_COMMIT)) 2108 2091 return; 2109 2092 journal->j_fc_cleanup_callback = ext4_fc_cleanup; 2110 - if (!buffer_uptodate(journal->j_sb_buffer) 2111 - && ext4_read_bh_lock(journal->j_sb_buffer, REQ_META | REQ_PRIO, 2112 - true)) { 2113 - ext4_msg(sb, KERN_ERR, "I/O error on journal"); 2114 - return; 2115 - } 2116 - num_fc_blocks = be32_to_cpu(journal->j_superblock->s_num_fc_blks); 2117 - if (jbd2_fc_init(journal, num_fc_blocks ? num_fc_blocks : 2118 - EXT4_NUM_FC_BLKS)) { 2119 - pr_warn("Error while enabling fast commits, turning off."); 2120 - ext4_clear_feature_fast_commit(sb); 2121 - } 2122 2093 } 2123 2094 2124 - const char *fc_ineligible_reasons[] = { 2095 + static const char *fc_ineligible_reasons[] = { 2125 2096 "Extended attributes changed", 2126 2097 "Cross rename", 2127 2098 "Journal flag changed", ··· 2118 2113 "Resize", 2119 2114 "Dir renamed", 2120 2115 "Falloc range op", 2116 + "Data journalling", 2121 2117 "FC Commit Failed" 2122 2118 }; 2123 2119
+2 -4
fs/ext4/fast_commit.h
··· 3 3 #ifndef __FAST_COMMIT_H__ 4 4 #define __FAST_COMMIT_H__ 5 5 6 - /* Number of blocks in journal area to allocate for fast commits */ 7 - #define EXT4_NUM_FC_BLKS 256 8 - 9 6 /* Fast commit tags */ 10 7 #define EXT4_FC_TAG_ADD_RANGE 0x0001 11 8 #define EXT4_FC_TAG_DEL_RANGE 0x0002 ··· 97 100 EXT4_FC_REASON_XATTR = 0, 98 101 EXT4_FC_REASON_CROSS_RENAME, 99 102 EXT4_FC_REASON_JOURNAL_FLAG_CHANGE, 100 - EXT4_FC_REASON_MEM, 103 + EXT4_FC_REASON_NOMEM, 101 104 EXT4_FC_REASON_SWAP_BOOT, 102 105 EXT4_FC_REASON_RESIZE, 103 106 EXT4_FC_REASON_RENAME_DIR, 104 107 EXT4_FC_REASON_FALLOC_RANGE, 108 + EXT4_FC_REASON_INODE_JOURNAL_DATA, 105 109 EXT4_FC_COMMIT_FAILED, 106 110 EXT4_FC_REASON_MAX 107 111 };
+2 -4
fs/ext4/file.c
··· 761 761 if (!daxdev_mapping_supported(vma, dax_dev)) 762 762 return -EOPNOTSUPP; 763 763 764 - ext4_fc_start_update(inode); 765 764 file_accessed(file); 766 765 if (IS_DAX(file_inode(file))) { 767 766 vma->vm_ops = &ext4_dax_vm_ops; ··· 768 769 } else { 769 770 vma->vm_ops = &ext4_file_vm_ops; 770 771 } 771 - ext4_fc_stop_update(inode); 772 772 return 0; 773 773 } 774 774 ··· 780 782 handle_t *handle; 781 783 int err; 782 784 783 - if (likely(sbi->s_mount_flags & EXT4_MF_MNTDIR_SAMPLED)) 785 + if (likely(ext4_test_mount_flag(sb, EXT4_MF_MNTDIR_SAMPLED))) 784 786 return 0; 785 787 786 788 if (sb_rdonly(sb) || !sb_start_intwrite_trylock(sb)) 787 789 return 0; 788 790 789 - sbi->s_mount_flags |= EXT4_MF_MNTDIR_SAMPLED; 791 + ext4_set_mount_flag(sb, EXT4_MF_MNTDIR_SAMPLED); 790 792 /* 791 793 * Sample where the filesystem has been mounted and 792 794 * store it in the superblock for sysadmin convenience
+1 -1
fs/ext4/fsmap.c
··· 280 280 281 281 /* Fabricate an rmap entry for the external log device. */ 282 282 irec.fmr_physical = journal->j_blk_offset; 283 - irec.fmr_length = journal->j_maxlen; 283 + irec.fmr_length = journal->j_total_len; 284 284 irec.fmr_owner = EXT4_FMR_OWN_LOG; 285 285 irec.fmr_flags = 0; 286 286
+1 -1
fs/ext4/fsync.c
··· 143 143 if (sb_rdonly(inode->i_sb)) { 144 144 /* Make sure that we read updated s_mount_flags value */ 145 145 smp_rmb(); 146 - if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED) 146 + if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FS_ABORTED)) 147 147 ret = -EROFS; 148 148 goto out; 149 149 }
+1
fs/ext4/inline.c
··· 1880 1880 1881 1881 ext4_write_lock_xattr(inode, &no_expand); 1882 1882 if (!ext4_has_inline_data(inode)) { 1883 + ext4_write_unlock_xattr(inode, &no_expand); 1883 1884 *has_inline = 0; 1884 1885 ext4_journal_stop(handle); 1885 1886 return 0;
+10 -9
fs/ext4/inode.c
··· 327 327 ext4_xattr_inode_array_free(ea_inode_array); 328 328 return; 329 329 no_delete: 330 + if (!list_empty(&EXT4_I(inode)->i_fc_list)) 331 + ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_NOMEM); 330 332 ext4_clear_inode(inode); /* We must guarantee clearing of inode... */ 331 333 } 332 334 ··· 732 730 if (ret) 733 731 return ret; 734 732 } 735 - ext4_fc_track_range(inode, map->m_lblk, 733 + ext4_fc_track_range(handle, inode, map->m_lblk, 736 734 map->m_lblk + map->m_len - 1); 737 735 } 738 736 ··· 2442 2440 struct super_block *sb = inode->i_sb; 2443 2441 2444 2442 if (ext4_forced_shutdown(EXT4_SB(sb)) || 2445 - EXT4_SB(sb)->s_mount_flags & EXT4_MF_FS_ABORTED) 2443 + ext4_test_mount_flag(sb, EXT4_MF_FS_ABORTED)) 2446 2444 goto invalidate_dirty_pages; 2447 2445 /* 2448 2446 * Let the uper layers retry transient errors. ··· 2676 2674 * the stack trace. 2677 2675 */ 2678 2676 if (unlikely(ext4_forced_shutdown(EXT4_SB(mapping->host->i_sb)) || 2679 - sbi->s_mount_flags & EXT4_MF_FS_ABORTED)) { 2677 + ext4_test_mount_flag(inode->i_sb, EXT4_MF_FS_ABORTED))) { 2680 2678 ret = -EROFS; 2681 2679 goto out_writepages; 2682 2680 } ··· 3312 3310 EXT4_I(inode)->i_datasync_tid)) 3313 3311 return false; 3314 3312 if (test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT)) 3315 - return atomic_read(&EXT4_SB(inode->i_sb)->s_fc_subtid) < 3316 - EXT4_I(inode)->i_fc_committed_subtid; 3313 + return !list_empty(&EXT4_I(inode)->i_fc_list); 3317 3314 return true; 3318 3315 } 3319 3316 ··· 4110 4109 4111 4110 up_write(&EXT4_I(inode)->i_data_sem); 4112 4111 } 4113 - ext4_fc_track_range(inode, first_block, stop_block); 4112 + ext4_fc_track_range(handle, inode, first_block, stop_block); 4114 4113 if (IS_SYNC(inode)) 4115 4114 ext4_handle_sync(handle); 4116 4115 ··· 5443 5442 } 5444 5443 5445 5444 if (shrink) 5446 - ext4_fc_track_range(inode, 5445 + ext4_fc_track_range(handle, inode, 5447 5446 (attr->ia_size > 0 ? 
attr->ia_size - 1 : 0) >> 5448 5447 inode->i_sb->s_blocksize_bits, 5449 5448 (oldsize > 0 ? oldsize - 1 : 0) >> 5450 5449 inode->i_sb->s_blocksize_bits); 5451 5450 else 5452 5451 ext4_fc_track_range( 5453 - inode, 5452 + handle, inode, 5454 5453 (oldsize > 0 ? oldsize - 1 : oldsize) >> 5455 5454 inode->i_sb->s_blocksize_bits, 5456 5455 (attr->ia_size > 0 ? attr->ia_size - 1 : 0) >> ··· 5700 5699 put_bh(iloc->bh); 5701 5700 return -EIO; 5702 5701 } 5703 - ext4_fc_track_inode(inode); 5702 + ext4_fc_track_inode(handle, inode); 5704 5703 5705 5704 if (IS_I_VERSION(inode)) 5706 5705 inode_inc_iversion(inode);
+3 -3
fs/ext4/mballoc.c
··· 4477 4477 { 4478 4478 ext4_group_t i, ngroups; 4479 4479 4480 - if (EXT4_SB(sb)->s_mount_flags & EXT4_MF_FS_ABORTED) 4480 + if (ext4_test_mount_flag(sb, EXT4_MF_FS_ABORTED)) 4481 4481 return; 4482 4482 4483 4483 ngroups = ext4_get_groups_count(sb); ··· 4508 4508 { 4509 4509 struct super_block *sb = ac->ac_sb; 4510 4510 4511 - if (EXT4_SB(sb)->s_mount_flags & EXT4_MF_FS_ABORTED) 4511 + if (ext4_test_mount_flag(sb, EXT4_MF_FS_ABORTED)) 4512 4512 return; 4513 4513 4514 4514 mb_debug(sb, "Can't allocate:" ··· 5167 5167 struct super_block *sb = ar->inode->i_sb; 5168 5168 ext4_group_t group; 5169 5169 ext4_grpblk_t blkoff; 5170 - int i; 5170 + int i = sb->s_blocksize; 5171 5171 ext4_fsblk_t goal, block; 5172 5172 struct ext4_super_block *es = EXT4_SB(sb)->s_es; 5173 5173
+28 -33
fs/ext4/namei.c
··· 2606 2606 bool excl) 2607 2607 { 2608 2608 handle_t *handle; 2609 - struct inode *inode, *inode_save; 2609 + struct inode *inode; 2610 2610 int err, credits, retries = 0; 2611 2611 2612 2612 err = dquot_initialize(dir); ··· 2624 2624 inode->i_op = &ext4_file_inode_operations; 2625 2625 inode->i_fop = &ext4_file_operations; 2626 2626 ext4_set_aops(inode); 2627 - inode_save = inode; 2628 - ihold(inode_save); 2629 2627 err = ext4_add_nondir(handle, dentry, &inode); 2630 - ext4_fc_track_create(inode_save, dentry); 2631 - iput(inode_save); 2628 + if (!err) 2629 + ext4_fc_track_create(handle, dentry); 2632 2630 } 2633 2631 if (handle) 2634 2632 ext4_journal_stop(handle); ··· 2641 2643 umode_t mode, dev_t rdev) 2642 2644 { 2643 2645 handle_t *handle; 2644 - struct inode *inode, *inode_save; 2646 + struct inode *inode; 2645 2647 int err, credits, retries = 0; 2646 2648 2647 2649 err = dquot_initialize(dir); ··· 2658 2660 if (!IS_ERR(inode)) { 2659 2661 init_special_inode(inode, inode->i_mode, rdev); 2660 2662 inode->i_op = &ext4_special_inode_operations; 2661 - inode_save = inode; 2662 - ihold(inode_save); 2663 2663 err = ext4_add_nondir(handle, dentry, &inode); 2664 2664 if (!err) 2665 - ext4_fc_track_create(inode_save, dentry); 2666 - iput(inode_save); 2665 + ext4_fc_track_create(handle, dentry); 2667 2666 } 2668 2667 if (handle) 2669 2668 ext4_journal_stop(handle); ··· 2824 2829 iput(inode); 2825 2830 goto out_retry; 2826 2831 } 2827 - ext4_fc_track_create(inode, dentry); 2828 2832 ext4_inc_count(dir); 2829 2833 2830 2834 ext4_update_dx_flag(dir); ··· 2831 2837 if (err) 2832 2838 goto out_clear_inode; 2833 2839 d_instantiate_new(dentry, inode); 2840 + ext4_fc_track_create(handle, dentry); 2834 2841 if (IS_DIRSYNC(dir)) 2835 2842 ext4_handle_sync(handle); 2836 2843 ··· 3166 3171 goto end_rmdir; 3167 3172 ext4_dec_count(dir); 3168 3173 ext4_update_dx_flag(dir); 3169 - ext4_fc_track_unlink(inode, dentry); 3174 + ext4_fc_track_unlink(handle, dentry); 3170 3175 retval = 
ext4_mark_inode_dirty(handle, dir); 3171 3176 3172 3177 #ifdef CONFIG_UNICODE ··· 3187 3192 return retval; 3188 3193 } 3189 3194 3190 - int __ext4_unlink(struct inode *dir, const struct qstr *d_name, 3195 + int __ext4_unlink(handle_t *handle, struct inode *dir, const struct qstr *d_name, 3191 3196 struct inode *inode) 3192 3197 { 3193 3198 int retval = -ENOENT; 3194 3199 struct buffer_head *bh; 3195 3200 struct ext4_dir_entry_2 *de; 3196 - handle_t *handle = NULL; 3197 3201 int skip_remove_dentry = 0; 3198 3202 3199 3203 bh = ext4_find_entry(dir, d_name, &de, NULL); ··· 3211 3217 if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) 3212 3218 skip_remove_dentry = 1; 3213 3219 else 3214 - goto out_bh; 3215 - } 3216 - 3217 - handle = ext4_journal_start(dir, EXT4_HT_DIR, 3218 - EXT4_DATA_TRANS_BLOCKS(dir->i_sb)); 3219 - if (IS_ERR(handle)) { 3220 - retval = PTR_ERR(handle); 3221 - goto out_bh; 3220 + goto out; 3222 3221 } 3223 3222 3224 3223 if (IS_DIRSYNC(dir)) ··· 3220 3233 if (!skip_remove_dentry) { 3221 3234 retval = ext4_delete_entry(handle, dir, de, bh); 3222 3235 if (retval) 3223 - goto out_handle; 3236 + goto out; 3224 3237 dir->i_ctime = dir->i_mtime = current_time(dir); 3225 3238 ext4_update_dx_flag(dir); 3226 3239 retval = ext4_mark_inode_dirty(handle, dir); 3227 3240 if (retval) 3228 - goto out_handle; 3241 + goto out; 3229 3242 } else { 3230 3243 retval = 0; 3231 3244 } ··· 3239 3252 inode->i_ctime = current_time(inode); 3240 3253 retval = ext4_mark_inode_dirty(handle, inode); 3241 3254 3242 - out_handle: 3243 - ext4_journal_stop(handle); 3244 - out_bh: 3255 + out: 3245 3256 brelse(bh); 3246 3257 return retval; 3247 3258 } 3248 3259 3249 3260 static int ext4_unlink(struct inode *dir, struct dentry *dentry) 3250 3261 { 3262 + handle_t *handle; 3251 3263 int retval; 3252 3264 3253 3265 if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb)))) ··· 3264 3278 if (retval) 3265 3279 goto out_trace; 3266 3280 3267 - retval = __ext4_unlink(dir, &dentry->d_name, 
d_inode(dentry)); 3281 + handle = ext4_journal_start(dir, EXT4_HT_DIR, 3282 + EXT4_DATA_TRANS_BLOCKS(dir->i_sb)); 3283 + if (IS_ERR(handle)) { 3284 + retval = PTR_ERR(handle); 3285 + goto out_trace; 3286 + } 3287 + 3288 + retval = __ext4_unlink(handle, dir, &dentry->d_name, d_inode(dentry)); 3268 3289 if (!retval) 3269 - ext4_fc_track_unlink(d_inode(dentry), dentry); 3290 + ext4_fc_track_unlink(handle, dentry); 3270 3291 #ifdef CONFIG_UNICODE 3271 3292 /* VFS negative dentries are incompatible with Encoding and 3272 3293 * Case-insensitiveness. Eventually we'll want avoid ··· 3284 3291 if (IS_CASEFOLDED(dir)) 3285 3292 d_invalidate(dentry); 3286 3293 #endif 3294 + if (handle) 3295 + ext4_journal_stop(handle); 3287 3296 3288 3297 out_trace: 3289 3298 trace_ext4_unlink_exit(dentry, retval); ··· 3442 3447 3443 3448 err = ext4_add_entry(handle, dentry, inode); 3444 3449 if (!err) { 3445 - ext4_fc_track_link(inode, dentry); 3446 3450 err = ext4_mark_inode_dirty(handle, inode); 3447 3451 /* this can happen only for tmpfile being 3448 3452 * linked the first time ··· 3449 3455 if (inode->i_nlink == 1) 3450 3456 ext4_orphan_del(handle, inode); 3451 3457 d_instantiate(dentry, inode); 3458 + ext4_fc_track_link(handle, dentry); 3452 3459 } else { 3453 3460 drop_nlink(inode); 3454 3461 iput(inode); ··· 3910 3915 EXT4_FC_REASON_RENAME_DIR); 3911 3916 } else { 3912 3917 if (new.inode) 3913 - ext4_fc_track_unlink(new.inode, new.dentry); 3914 - ext4_fc_track_link(old.inode, new.dentry); 3915 - ext4_fc_track_unlink(old.inode, old.dentry); 3918 + ext4_fc_track_unlink(handle, new.dentry); 3919 + __ext4_fc_track_link(handle, old.inode, new.dentry); 3920 + __ext4_fc_track_unlink(handle, old.inode, old.dentry); 3916 3921 } 3917 3922 3918 3923 if (new.inode) {
+24 -23
fs/ext4/super.c
··· 686 686 if (!test_opt(sb, ERRORS_CONT)) { 687 687 journal_t *journal = EXT4_SB(sb)->s_journal; 688 688 689 - EXT4_SB(sb)->s_mount_flags |= EXT4_MF_FS_ABORTED; 689 + ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED); 690 690 if (journal) 691 691 jbd2_journal_abort(journal, -EIO); 692 692 } ··· 904 904 va_end(args); 905 905 906 906 if (sb_rdonly(sb) == 0) { 907 - EXT4_SB(sb)->s_mount_flags |= EXT4_MF_FS_ABORTED; 907 + ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED); 908 908 if (EXT4_SB(sb)->s_journal) 909 909 jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO); 910 910 ··· 1716 1716 Opt_dioread_nolock, Opt_dioread_lock, 1717 1717 Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable, 1718 1718 Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache, 1719 - Opt_prefetch_block_bitmaps, Opt_no_fc, 1719 + Opt_prefetch_block_bitmaps, 1720 1720 #ifdef CONFIG_EXT4_DEBUG 1721 - Opt_fc_debug_max_replay, 1721 + Opt_fc_debug_max_replay, Opt_fc_debug_force 1722 1722 #endif 1723 - Opt_fc_debug_force 1724 1723 }; 1725 1724 1726 1725 static const match_table_t tokens = { ··· 1806 1807 {Opt_init_itable, "init_itable=%u"}, 1807 1808 {Opt_init_itable, "init_itable"}, 1808 1809 {Opt_noinit_itable, "noinit_itable"}, 1809 - {Opt_no_fc, "no_fc"}, 1810 - {Opt_fc_debug_force, "fc_debug_force"}, 1811 1810 #ifdef CONFIG_EXT4_DEBUG 1811 + {Opt_fc_debug_force, "fc_debug_force"}, 1812 1812 {Opt_fc_debug_max_replay, "fc_debug_max_replay=%u"}, 1813 1813 #endif 1814 1814 {Opt_max_dir_size_kb, "max_dir_size_kb=%u"}, ··· 2025 2027 {Opt_noquota, (EXT4_MOUNT_QUOTA | EXT4_MOUNT_USRQUOTA | 2026 2028 EXT4_MOUNT_GRPQUOTA | EXT4_MOUNT_PRJQUOTA), 2027 2029 MOPT_CLEAR | MOPT_Q}, 2028 - {Opt_usrjquota, 0, MOPT_Q}, 2029 - {Opt_grpjquota, 0, MOPT_Q}, 2030 + {Opt_usrjquota, 0, MOPT_Q | MOPT_STRING}, 2031 + {Opt_grpjquota, 0, MOPT_Q | MOPT_STRING}, 2030 2032 {Opt_offusrjquota, 0, MOPT_Q}, 2031 2033 {Opt_offgrpjquota, 0, MOPT_Q}, 2032 2034 {Opt_jqfmt_vfsold, QFMT_VFS_OLD, MOPT_QFMT}, ··· 2037 2039 
{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET}, 2038 2040 {Opt_prefetch_block_bitmaps, EXT4_MOUNT_PREFETCH_BLOCK_BITMAPS, 2039 2041 MOPT_SET}, 2040 - {Opt_no_fc, EXT4_MOUNT2_JOURNAL_FAST_COMMIT, 2041 - MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY}, 2042 + #ifdef CONFIG_EXT4_DEBUG 2042 2043 {Opt_fc_debug_force, EXT4_MOUNT2_JOURNAL_FAST_COMMIT, 2043 2044 MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY}, 2044 - #ifdef CONFIG_EXT4_DEBUG 2045 2045 {Opt_fc_debug_max_replay, 0, MOPT_GTE0}, 2046 2046 #endif 2047 2047 {Opt_err, 0, 0} ··· 2149 2153 ext4_msg(sb, KERN_WARNING, "Ignoring removed %s option", opt); 2150 2154 return 1; 2151 2155 case Opt_abort: 2152 - sbi->s_mount_flags |= EXT4_MF_FS_ABORTED; 2156 + ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED); 2153 2157 return 1; 2154 2158 case Opt_i_version: 2155 2159 sb->s_flags |= SB_I_VERSION; ··· 3972 3976 * loaded or not 3973 3977 */ 3974 3978 if (sbi->s_journal && !sbi->s_journal_bdev) 3975 - overhead += EXT4_NUM_B2C(sbi, sbi->s_journal->j_maxlen); 3979 + overhead += EXT4_NUM_B2C(sbi, sbi->s_journal->j_total_len); 3976 3980 else if (ext4_has_feature_journal(sb) && !sbi->s_journal && j_inum) { 3977 3981 /* j_inum for internal journal is non-zero */ 3978 3982 j_inode = ext4_get_journal_inode(sb, j_inum); ··· 4336 4340 #endif 4337 4341 4338 4342 if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) { 4339 - printk_once(KERN_WARNING "EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!\n"); 4343 + printk_once(KERN_WARNING "EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!\n"); 4340 4344 /* can't mount with both data=journal and dioread_nolock. 
*/ 4341 4345 clear_opt(sb, DIOREAD_NOLOCK); 4346 + clear_opt2(sb, JOURNAL_FAST_COMMIT); 4342 4347 if (test_opt2(sb, EXPLICIT_DELALLOC)) { 4343 4348 ext4_msg(sb, KERN_ERR, "can't mount with " 4344 4349 "both data=journal and delalloc"); ··· 4774 4777 INIT_LIST_HEAD(&sbi->s_fc_dentry_q[FC_Q_MAIN]); 4775 4778 INIT_LIST_HEAD(&sbi->s_fc_dentry_q[FC_Q_STAGING]); 4776 4779 sbi->s_fc_bytes = 0; 4777 - sbi->s_mount_flags &= ~EXT4_MF_FC_INELIGIBLE; 4778 - sbi->s_mount_flags &= ~EXT4_MF_FC_COMMITTING; 4780 + ext4_clear_mount_flag(sb, EXT4_MF_FC_INELIGIBLE); 4781 + ext4_clear_mount_flag(sb, EXT4_MF_FC_COMMITTING); 4779 4782 spin_lock_init(&sbi->s_fc_lock); 4780 4783 memset(&sbi->s_fc_stats, 0, sizeof(sbi->s_fc_stats)); 4781 4784 sbi->s_fc_replay_state.fc_regions = NULL; ··· 4851 4854 if (!set_journal_csum_feature_set(sb)) { 4852 4855 ext4_msg(sb, KERN_ERR, "Failed to set journal checksum " 4853 4856 "feature set"); 4857 + goto failed_mount_wq; 4858 + } 4859 + 4860 + if (test_opt2(sb, JOURNAL_FAST_COMMIT) && 4861 + !jbd2_journal_set_features(EXT4_SB(sb)->s_journal, 0, 0, 4862 + JBD2_FEATURE_INCOMPAT_FAST_COMMIT)) { 4863 + ext4_msg(sb, KERN_ERR, 4864 + "Failed to set fast commit journal feature"); 4854 4865 goto failed_mount_wq; 4855 4866 } 4856 4867 ··· 5877 5872 goto restore_opts; 5878 5873 } 5879 5874 5880 - if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED) 5875 + if (ext4_test_mount_flag(sb, EXT4_MF_FS_ABORTED)) 5881 5876 ext4_abort(sb, EXT4_ERR_ESHUTDOWN, "Abort forced by user"); 5882 5877 5883 5878 sb->s_flags = (sb->s_flags & ~SB_POSIXACL) | ··· 5891 5886 } 5892 5887 5893 5888 if ((bool)(*flags & SB_RDONLY) != sb_rdonly(sb)) { 5894 - if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED) { 5889 + if (ext4_test_mount_flag(sb, EXT4_MF_FS_ABORTED)) { 5895 5890 err = -EROFS; 5896 5891 goto restore_opts; 5897 5892 } ··· 6565 6560 brelse(bh); 6566 6561 out: 6567 6562 if (inode->i_size < off + len) { 6568 - ext4_fc_track_range(inode, 6569 - (inode->i_size > 0 ? 
inode->i_size - 1 : 0) 6570 - >> inode->i_sb->s_blocksize_bits, 6571 - (off + len) >> inode->i_sb->s_blocksize_bits); 6572 6563 i_size_write(inode, off + len); 6573 6564 EXT4_I(inode)->i_disksize = inode->i_size; 6574 6565 err2 = ext4_mark_inode_dirty(handle, inode);
+4
fs/io-wq.c
··· 482 482 current->files = work->identity->files; 483 483 current->nsproxy = work->identity->nsproxy; 484 484 task_unlock(current); 485 + if (!work->identity->files) { 486 + /* failed grabbing files, ensure work gets cancelled */ 487 + work->flags |= IO_WQ_WORK_CANCEL; 488 + } 485 489 } 486 490 if ((work->flags & IO_WQ_WORK_FS) && current->fs != work->identity->fs) 487 491 current->fs = work->identity->fs;
+136 -47
fs/io_uring.c
··· 995 995 if (mm) { 996 996 kthread_unuse_mm(mm); 997 997 mmput(mm); 998 + current->mm = NULL; 998 999 } 999 1000 } 1000 1001 1001 1002 static int __io_sq_thread_acquire_mm(struct io_ring_ctx *ctx) 1002 1003 { 1003 - if (!current->mm) { 1004 - if (unlikely(!(ctx->flags & IORING_SETUP_SQPOLL) || 1005 - !ctx->sqo_task->mm || 1006 - !mmget_not_zero(ctx->sqo_task->mm))) 1007 - return -EFAULT; 1008 - kthread_use_mm(ctx->sqo_task->mm); 1004 + struct mm_struct *mm; 1005 + 1006 + if (current->mm) 1007 + return 0; 1008 + 1009 + /* Should never happen */ 1010 + if (unlikely(!(ctx->flags & IORING_SETUP_SQPOLL))) 1011 + return -EFAULT; 1012 + 1013 + task_lock(ctx->sqo_task); 1014 + mm = ctx->sqo_task->mm; 1015 + if (unlikely(!mm || !mmget_not_zero(mm))) 1016 + mm = NULL; 1017 + task_unlock(ctx->sqo_task); 1018 + 1019 + if (mm) { 1020 + kthread_use_mm(mm); 1021 + return 0; 1009 1022 } 1010 1023 1011 - return 0; 1024 + return -EFAULT; 1012 1025 } 1013 1026 1014 1027 static int io_sq_thread_acquire_mm(struct io_ring_ctx *ctx, ··· 1287 1274 /* add one for this request */ 1288 1275 refcount_inc(&id->count); 1289 1276 1290 - /* drop old identity, assign new one. 
one ref for req, one for tctx */ 1291 - if (req->work.identity != tctx->identity && 1292 - refcount_sub_and_test(2, &req->work.identity->count)) 1277 + /* drop tctx and req identity references, if needed */ 1278 + if (tctx->identity != &tctx->__identity && 1279 + refcount_dec_and_test(&tctx->identity->count)) 1280 + kfree(tctx->identity); 1281 + if (req->work.identity != &tctx->__identity && 1282 + refcount_dec_and_test(&req->work.identity->count)) 1293 1283 kfree(req->work.identity); 1294 1284 1295 1285 req->work.identity = id; ··· 1593 1577 } 1594 1578 } 1595 1579 1596 - static inline bool io_match_files(struct io_kiocb *req, 1597 - struct files_struct *files) 1580 + static inline bool __io_match_files(struct io_kiocb *req, 1581 + struct files_struct *files) 1598 1582 { 1583 + return ((req->flags & REQ_F_WORK_INITIALIZED) && 1584 + (req->work.flags & IO_WQ_WORK_FILES)) && 1585 + req->work.identity->files == files; 1586 + } 1587 + 1588 + static bool io_match_files(struct io_kiocb *req, 1589 + struct files_struct *files) 1590 + { 1591 + struct io_kiocb *link; 1592 + 1599 1593 if (!files) 1600 1594 return true; 1601 - if ((req->flags & REQ_F_WORK_INITIALIZED) && 1602 - (req->work.flags & IO_WQ_WORK_FILES)) 1603 - return req->work.identity->files == files; 1595 + if (__io_match_files(req, files)) 1596 + return true; 1597 + if (req->flags & REQ_F_LINK_HEAD) { 1598 + list_for_each_entry(link, &req->link_list, link_list) { 1599 + if (__io_match_files(link, files)) 1600 + return true; 1601 + } 1602 + } 1604 1603 return false; 1605 1604 } 1606 1605 ··· 1699 1668 WRITE_ONCE(cqe->user_data, req->user_data); 1700 1669 WRITE_ONCE(cqe->res, res); 1701 1670 WRITE_ONCE(cqe->flags, cflags); 1702 - } else if (ctx->cq_overflow_flushed || req->task->io_uring->in_idle) { 1671 + } else if (ctx->cq_overflow_flushed || 1672 + atomic_read(&req->task->io_uring->in_idle)) { 1703 1673 /* 1704 1674 * If we're in ring overflow flush mode, or in task cancel mode, 1705 1675 * then we cannot 
store the request for later flushing, we need ··· 1870 1838 io_dismantle_req(req); 1871 1839 1872 1840 percpu_counter_dec(&tctx->inflight); 1873 - if (tctx->in_idle) 1841 + if (atomic_read(&tctx->in_idle)) 1874 1842 wake_up(&tctx->wait); 1875 1843 put_task_struct(req->task); 1876 1844 ··· 7727 7695 xa_init(&tctx->xa); 7728 7696 init_waitqueue_head(&tctx->wait); 7729 7697 tctx->last = NULL; 7730 - tctx->in_idle = 0; 7698 + atomic_set(&tctx->in_idle, 0); 7699 + tctx->sqpoll = false; 7731 7700 io_init_identity(&tctx->__identity); 7732 7701 tctx->identity = &tctx->__identity; 7733 7702 task->io_uring = tctx; ··· 8421 8388 return false; 8422 8389 } 8423 8390 8424 - static bool io_match_link_files(struct io_kiocb *req, 8425 - struct files_struct *files) 8426 - { 8427 - struct io_kiocb *link; 8428 - 8429 - if (io_match_files(req, files)) 8430 - return true; 8431 - if (req->flags & REQ_F_LINK_HEAD) { 8432 - list_for_each_entry(link, &req->link_list, link_list) { 8433 - if (io_match_files(link, files)) 8434 - return true; 8435 - } 8436 - } 8437 - return false; 8438 - } 8439 - 8440 8391 /* 8441 8392 * We're looking to cancel 'req' because it's holding on to our files, but 8442 8393 * 'req' could be a link to another request. 
See if it is, and cancel that ··· 8470 8453 8471 8454 static bool io_cancel_link_cb(struct io_wq_work *work, void *data) 8472 8455 { 8473 - return io_match_link(container_of(work, struct io_kiocb, work), data); 8456 + struct io_kiocb *req = container_of(work, struct io_kiocb, work); 8457 + bool ret; 8458 + 8459 + if (req->flags & REQ_F_LINK_TIMEOUT) { 8460 + unsigned long flags; 8461 + struct io_ring_ctx *ctx = req->ctx; 8462 + 8463 + /* protect against races with linked timeouts */ 8464 + spin_lock_irqsave(&ctx->completion_lock, flags); 8465 + ret = io_match_link(req, data); 8466 + spin_unlock_irqrestore(&ctx->completion_lock, flags); 8467 + } else { 8468 + ret = io_match_link(req, data); 8469 + } 8470 + return ret; 8474 8471 } 8475 8472 8476 8473 static void io_attempt_cancel(struct io_ring_ctx *ctx, struct io_kiocb *req) ··· 8510 8479 } 8511 8480 8512 8481 static void io_cancel_defer_files(struct io_ring_ctx *ctx, 8482 + struct task_struct *task, 8513 8483 struct files_struct *files) 8514 8484 { 8515 8485 struct io_defer_entry *de = NULL; ··· 8518 8486 8519 8487 spin_lock_irq(&ctx->completion_lock); 8520 8488 list_for_each_entry_reverse(de, &ctx->defer_list, list) { 8521 - if (io_match_link_files(de->req, files)) { 8489 + if (io_task_match(de->req, task) && 8490 + io_match_files(de->req, files)) { 8522 8491 list_cut_position(&list, &ctx->defer_list, &de->list); 8523 8492 break; 8524 8493 } ··· 8545 8512 if (list_empty_careful(&ctx->inflight_list)) 8546 8513 return false; 8547 8514 8548 - io_cancel_defer_files(ctx, files); 8549 8515 /* cancel all at once, should be faster than doing it one by one*/ 8550 8516 io_wq_cancel_cb(ctx->io_wq, io_wq_files_match, files, true); 8551 8517 ··· 8630 8598 { 8631 8599 struct task_struct *task = current; 8632 8600 8633 - if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) 8601 + if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) { 8634 8602 task = ctx->sq_data->thread; 8603 + atomic_inc(&task->io_uring->in_idle); 8604 + 
io_sq_thread_park(ctx->sq_data); 8605 + } 8606 + 8607 + if (files) 8608 + io_cancel_defer_files(ctx, NULL, files); 8609 + else 8610 + io_cancel_defer_files(ctx, task, NULL); 8635 8611 8636 8612 io_cqring_overflow_flush(ctx, true, task, files); 8637 8613 ··· 8647 8607 io_run_task_work(); 8648 8608 cond_resched(); 8649 8609 } 8610 + 8611 + if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) { 8612 + atomic_dec(&task->io_uring->in_idle); 8613 + /* 8614 + * If the files that are going away are the ones in the thread 8615 + * identity, clear them out. 8616 + */ 8617 + if (task->io_uring->identity->files == files) 8618 + task->io_uring->identity->files = NULL; 8619 + io_sq_thread_unpark(ctx->sq_data); 8620 + } 8650 8621 } 8651 8622 8652 8623 /* 8653 8624 * Note that this task has used io_uring. We use it for cancelation purposes. 8654 8625 */ 8655 - static int io_uring_add_task_file(struct file *file) 8626 + static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file) 8656 8627 { 8657 8628 struct io_uring_task *tctx = current->io_uring; 8658 8629 ··· 8684 8633 } 8685 8634 tctx->last = file; 8686 8635 } 8636 + 8637 + /* 8638 + * This is race safe in that the task itself is doing this, hence it 8639 + * cannot be going through the exit/cancel paths at the same time. 8640 + * This cannot be modified while exit/cancel is running. 
8641 + */ 8642 + if (!tctx->sqpoll && (ctx->flags & IORING_SETUP_SQPOLL)) 8643 + tctx->sqpoll = true; 8687 8644 8688 8645 return 0; 8689 8646 } ··· 8734 8675 unsigned long index; 8735 8676 8736 8677 /* make sure overflow events are dropped */ 8737 - tctx->in_idle = true; 8678 + atomic_inc(&tctx->in_idle); 8738 8679 8739 8680 xa_for_each(&tctx->xa, index, file) { 8740 8681 struct io_ring_ctx *ctx = file->private_data; ··· 8743 8684 if (files) 8744 8685 io_uring_del_task_file(file); 8745 8686 } 8687 + 8688 + atomic_dec(&tctx->in_idle); 8689 + } 8690 + 8691 + static s64 tctx_inflight(struct io_uring_task *tctx) 8692 + { 8693 + unsigned long index; 8694 + struct file *file; 8695 + s64 inflight; 8696 + 8697 + inflight = percpu_counter_sum(&tctx->inflight); 8698 + if (!tctx->sqpoll) 8699 + return inflight; 8700 + 8701 + /* 8702 + * If we have SQPOLL rings, then we need to iterate and find them, and 8703 + * add the pending count for those. 8704 + */ 8705 + xa_for_each(&tctx->xa, index, file) { 8706 + struct io_ring_ctx *ctx = file->private_data; 8707 + 8708 + if (ctx->flags & IORING_SETUP_SQPOLL) { 8709 + struct io_uring_task *__tctx = ctx->sqo_task->io_uring; 8710 + 8711 + inflight += percpu_counter_sum(&__tctx->inflight); 8712 + } 8713 + } 8714 + 8715 + return inflight; 8746 8716 } 8747 8717 8748 8718 /* ··· 8785 8697 s64 inflight; 8786 8698 8787 8699 /* make sure overflow events are dropped */ 8788 - tctx->in_idle = true; 8700 + atomic_inc(&tctx->in_idle); 8789 8701 8790 8702 do { 8791 8703 /* read completions before cancelations */ 8792 - inflight = percpu_counter_sum(&tctx->inflight); 8704 + inflight = tctx_inflight(tctx); 8793 8705 if (!inflight) 8794 8706 break; 8795 8707 __io_uring_files_cancel(NULL); ··· 8800 8712 * If we've seen completions, retry. This avoids a race where 8801 8713 * a completion comes in before we did prepare_to_wait(). 
8802 8714 */ 8803 - if (inflight != percpu_counter_sum(&tctx->inflight)) 8715 + if (inflight != tctx_inflight(tctx)) 8804 8716 continue; 8805 8717 schedule(); 8806 8718 } while (1); 8807 8719 8808 8720 finish_wait(&tctx->wait, &wait); 8809 - tctx->in_idle = false; 8721 + atomic_dec(&tctx->in_idle); 8810 8722 } 8811 8723 8812 8724 static int io_uring_flush(struct file *file, void *data) ··· 8951 8863 io_sqpoll_wait_sq(ctx); 8952 8864 submitted = to_submit; 8953 8865 } else if (to_submit) { 8954 - ret = io_uring_add_task_file(f.file); 8866 + ret = io_uring_add_task_file(ctx, f.file); 8955 8867 if (unlikely(ret)) 8956 8868 goto out; 8957 8869 mutex_lock(&ctx->uring_lock); ··· 8988 8900 #ifdef CONFIG_PROC_FS 8989 8901 static int io_uring_show_cred(int id, void *p, void *data) 8990 8902 { 8991 - const struct cred *cred = p; 8903 + struct io_identity *iod = p; 8904 + const struct cred *cred = iod->creds; 8992 8905 struct seq_file *m = data; 8993 8906 struct user_namespace *uns = seq_user_ns(m); 8994 8907 struct group_info *gi; ··· 9181 9092 #if defined(CONFIG_UNIX) 9182 9093 ctx->ring_sock->file = file; 9183 9094 #endif 9184 - if (unlikely(io_uring_add_task_file(file))) { 9095 + if (unlikely(io_uring_add_task_file(ctx, file))) { 9185 9096 file = ERR_PTR(-ENOMEM); 9186 9097 goto err_fd; 9187 9098 }
+10 -20
fs/iomap/buffered-io.c
··· 1374 1374 WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list)); 1375 1375 WARN_ON_ONCE(!PageLocked(page)); 1376 1376 WARN_ON_ONCE(PageWriteback(page)); 1377 + WARN_ON_ONCE(PageDirty(page)); 1377 1378 1378 1379 /* 1379 1380 * We cannot cancel the ioend directly here on error. We may have ··· 1383 1382 * appropriately. 1384 1383 */ 1385 1384 if (unlikely(error)) { 1385 + /* 1386 + * Let the filesystem know what portion of the current page 1387 + * failed to map. If the page wasn't been added to ioend, it 1388 + * won't be affected by I/O completion and we must unlock it 1389 + * now. 1390 + */ 1391 + if (wpc->ops->discard_page) 1392 + wpc->ops->discard_page(page, file_offset); 1386 1393 if (!count) { 1387 - /* 1388 - * If the current page hasn't been added to ioend, it 1389 - * won't be affected by I/O completions and we must 1390 - * discard and unlock it right here. 1391 - */ 1392 - if (wpc->ops->discard_page) 1393 - wpc->ops->discard_page(page); 1394 1394 ClearPageUptodate(page); 1395 1395 unlock_page(page); 1396 1396 goto done; 1397 1397 } 1398 - 1399 - /* 1400 - * If the page was not fully cleaned, we need to ensure that the 1401 - * higher layers come back to it correctly. That means we need 1402 - * to keep the page dirty, and for WB_SYNC_ALL writeback we need 1403 - * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed 1404 - * so another attempt to write this page in this writeback sweep 1405 - * will be made. 1406 - */ 1407 - set_page_writeback_keepwrite(page); 1408 - } else { 1409 - clear_page_dirty_for_io(page); 1410 - set_page_writeback(page); 1411 1398 } 1412 1399 1400 + set_page_writeback(page); 1413 1401 unlock_page(page); 1414 1402 1415 1403 /*
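The iomap hunk above reorders the writeback error path: the filesystem's `->discard_page()` hook now always runs on error (and receives the failing file offset), while the page is unlocked without entering writeback only when no block on it made it into an ioend. A small model of that decision, with field and hook names paraphrased from the hunk rather than taken from any kernel header:

```c
#include <assert.h>
#include <stdbool.h>

struct outcome {
    bool discarded;
    bool writeback_started;
};

/* Model of iomap_writepage_map()'s tail as rewritten above: discard on
 * any error, skip writeback entirely only when count == 0 (no blocks
 * were added to an ioend). */
static struct outcome error_path(bool error, unsigned int count)
{
    struct outcome o = { false, false };

    if (error) {
        o.discarded = true;        /* ->discard_page(page, file_offset) */
        if (count == 0)
            return o;              /* ClearPageUptodate + unlock, done */
    }
    o.writeback_started = true;    /* set_page_writeback() + unlock */
    return o;
}
```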
+2
fs/jbd2/checkpoint.c
··· 106 106 * for a checkpoint to free up some space in the log. 107 107 */ 108 108 void __jbd2_log_wait_for_space(journal_t *journal) 109 + __acquires(&journal->j_state_lock) 110 + __releases(&journal->j_state_lock) 109 111 { 110 112 int nblocks, space_left; 111 113 /* assert_spin_locked(&journal->j_state_lock); */
+10 -1
fs/jbd2/commit.c
··· 450 450 schedule(); 451 451 write_lock(&journal->j_state_lock); 452 452 finish_wait(&journal->j_fc_wait, &wait); 453 + /* 454 + * TODO: by blocking fast commits here, we are increasing 455 + * fsync() latency slightly. Strictly speaking, we don't need 456 + * to block fast commits until the transaction enters T_FLUSH 457 + * state. So an optimization is possible where we block new fast 458 + * commits here and wait for existing ones to complete 459 + * just before we enter T_FLUSH. That way, the existing fast 460 + * commits and this full commit can proceed parallely. 461 + */ 453 462 } 454 463 write_unlock(&journal->j_state_lock); 455 464 ··· 810 801 if (first_block < journal->j_tail) 811 802 freed += journal->j_last - journal->j_first; 812 803 /* Update tail only if we free significant amount of space */ 813 - if (freed < journal->j_maxlen / 4) 804 + if (freed < jbd2_journal_get_max_txn_bufs(journal)) 814 805 update_tail = 0; 815 806 } 816 807 J_ASSERT(commit_transaction->t_state == T_COMMIT);
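The `freed < journal->j_maxlen / 4` test above becomes `freed < jbd2_journal_get_max_txn_bufs(journal)` because, with a fast-commit area carved out of the journal, a quarter of the raw device length overstates the usable budget. Assuming the helper is "a quarter of the journal minus the fast-commit reservation" (a sketch of the idea, not the kernel's header definition):

```c
#include <assert.h>

/* User-space model; field names mirror the kernel's journal_t. */
struct journal_model {
    unsigned int j_total_len;   /* whole on-disk journal, in blocks */
    unsigned int j_fc_wbufsize; /* blocks reserved for fast commits */
};

/* Transaction buffer budget: a quarter of the area actually available
 * to full commits, i.e. total length minus the fast-commit blocks. */
static unsigned int max_txn_bufs(const struct journal_model *j)
{
    return (j->j_total_len - j->j_fc_wbufsize) / 4;
}
```

With no fast-commit area the value degenerates to the old `j_maxlen / 4`, so behavior is unchanged for filesystems that never enable the feature.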
+78 -62
fs/jbd2/journal.c
··· 727 727 */ 728 728 int jbd2_fc_begin_commit(journal_t *journal, tid_t tid) 729 729 { 730 + if (unlikely(is_journal_aborted(journal))) 731 + return -EIO; 730 732 /* 731 733 * Fast commits only allowed if at least one full commit has 732 734 * been processed. ··· 736 734 if (!journal->j_stats.ts_tid) 737 735 return -EINVAL; 738 736 739 - if (tid <= journal->j_commit_sequence) 740 - return -EALREADY; 741 - 742 737 write_lock(&journal->j_state_lock); 738 + if (tid <= journal->j_commit_sequence) { 739 + write_unlock(&journal->j_state_lock); 740 + return -EALREADY; 741 + } 742 + 743 743 if (journal->j_flags & JBD2_FULL_COMMIT_ONGOING || 744 744 (journal->j_flags & JBD2_FAST_COMMIT_ONGOING)) { 745 745 DEFINE_WAIT(wait); ··· 781 777 782 778 int jbd2_fc_end_commit(journal_t *journal) 783 779 { 784 - return __jbd2_fc_end_commit(journal, 0, 0); 780 + return __jbd2_fc_end_commit(journal, 0, false); 785 781 } 786 782 EXPORT_SYMBOL(jbd2_fc_end_commit); 787 783 788 - int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid) 784 + int jbd2_fc_end_commit_fallback(journal_t *journal) 789 785 { 790 - return __jbd2_fc_end_commit(journal, tid, 1); 786 + tid_t tid; 787 + 788 + read_lock(&journal->j_state_lock); 789 + tid = journal->j_running_transaction ? 
790 + journal->j_running_transaction->t_tid : 0; 791 + read_unlock(&journal->j_state_lock); 792 + return __jbd2_fc_end_commit(journal, tid, true); 791 793 } 792 794 EXPORT_SYMBOL(jbd2_fc_end_commit_fallback); 793 795 ··· 875 865 int fc_off; 876 866 877 867 *bh_out = NULL; 878 - write_lock(&journal->j_state_lock); 879 868 880 869 if (journal->j_fc_off + journal->j_fc_first < journal->j_fc_last) { 881 870 fc_off = journal->j_fc_off; ··· 883 874 } else { 884 875 ret = -EINVAL; 885 876 } 886 - write_unlock(&journal->j_state_lock); 887 877 888 878 if (ret) 889 879 return ret; ··· 895 887 if (!bh) 896 888 return -ENOMEM; 897 889 898 - lock_buffer(bh); 899 890 900 - clear_buffer_uptodate(bh); 901 - set_buffer_dirty(bh); 902 - unlock_buffer(bh); 903 891 journal->j_fc_wbuf[fc_off] = bh; 904 892 905 893 *bh_out = bh; ··· 913 909 struct buffer_head *bh; 914 910 int i, j_fc_off; 915 911 916 - read_lock(&journal->j_state_lock); 917 912 j_fc_off = journal->j_fc_off; 918 - read_unlock(&journal->j_state_lock); 919 913 920 914 /* 921 915 * Wait in reverse order to minimize chances of us being woken up before ··· 941 939 struct buffer_head *bh; 942 940 int i, j_fc_off; 943 941 944 - read_lock(&journal->j_state_lock); 945 942 j_fc_off = journal->j_fc_off; 946 - read_unlock(&journal->j_state_lock); 947 943 948 944 /* 949 945 * Wait in reverse order to minimize chances of us being woken up before ··· 1348 1348 journal->j_dev = bdev; 1349 1349 journal->j_fs_dev = fs_dev; 1350 1350 journal->j_blk_offset = start; 1351 - journal->j_maxlen = len; 1351 + journal->j_total_len = len; 1352 1352 /* We need enough buffers to write out full descriptor block. 
*/ 1353 1353 n = journal->j_blocksize / jbd2_min_tag_size(); 1354 1354 journal->j_wbufsize = n; 1355 + journal->j_fc_wbuf = NULL; 1355 1356 journal->j_wbuf = kmalloc_array(n, sizeof(struct buffer_head *), 1356 1357 GFP_KERNEL); 1357 1358 if (!journal->j_wbuf) 1358 1359 goto err_cleanup; 1359 - 1360 - if (journal->j_fc_wbufsize > 0) { 1361 - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize, 1362 - sizeof(struct buffer_head *), 1363 - GFP_KERNEL); 1364 - if (!journal->j_fc_wbuf) 1365 - goto err_cleanup; 1366 - } 1367 1360 1368 1361 bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize); 1369 1362 if (!bh) { ··· 1371 1378 1372 1379 err_cleanup: 1373 1380 kfree(journal->j_wbuf); 1374 - kfree(journal->j_fc_wbuf); 1375 1381 jbd2_journal_destroy_revoke(journal); 1376 1382 kfree(journal); 1377 1383 return NULL; 1378 1384 } 1379 - 1380 - int jbd2_fc_init(journal_t *journal, int num_fc_blks) 1381 - { 1382 - journal->j_fc_wbufsize = num_fc_blks; 1383 - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize, 1384 - sizeof(struct buffer_head *), GFP_KERNEL); 1385 - if (!journal->j_fc_wbuf) 1386 - return -ENOMEM; 1387 - return 0; 1388 - } 1389 - EXPORT_SYMBOL(jbd2_fc_init); 1390 1385 1391 1386 /* jbd2_journal_init_dev and jbd2_journal_init_inode: 1392 1387 * ··· 1493 1512 } 1494 1513 1495 1514 journal->j_first = first; 1496 - 1497 - if (jbd2_has_feature_fast_commit(journal) && 1498 - journal->j_fc_wbufsize > 0) { 1499 - journal->j_fc_last = last; 1500 - journal->j_last = last - journal->j_fc_wbufsize; 1501 - journal->j_fc_first = journal->j_last + 1; 1502 - journal->j_fc_off = 0; 1503 - } else { 1504 - journal->j_last = last; 1505 - } 1515 + journal->j_last = last; 1506 1516 1507 1517 journal->j_head = journal->j_first; 1508 1518 journal->j_tail = journal->j_first; ··· 1503 1531 journal->j_commit_sequence = journal->j_transaction_sequence - 1; 1504 1532 journal->j_commit_request = journal->j_commit_sequence; 1505 1533 1506 - 
journal->j_max_transaction_buffers = journal->j_maxlen / 4; 1534 + journal->j_max_transaction_buffers = jbd2_journal_get_max_txn_bufs(journal); 1535 + 1536 + /* 1537 + * Now that journal recovery is done, turn fast commits off here. This 1538 + * way, if fast commit was enabled before the crash but if now FS has 1539 + * disabled it, we don't enable fast commits. 1540 + */ 1541 + jbd2_clear_feature_fast_commit(journal); 1507 1542 1508 1543 /* 1509 1544 * As a special case, if the on-disk copy is already marked as needing ··· 1771 1792 goto out; 1772 1793 } 1773 1794 1774 - if (be32_to_cpu(sb->s_maxlen) < journal->j_maxlen) 1775 - journal->j_maxlen = be32_to_cpu(sb->s_maxlen); 1776 - else if (be32_to_cpu(sb->s_maxlen) > journal->j_maxlen) { 1795 + if (be32_to_cpu(sb->s_maxlen) < journal->j_total_len) 1796 + journal->j_total_len = be32_to_cpu(sb->s_maxlen); 1797 + else if (be32_to_cpu(sb->s_maxlen) > journal->j_total_len) { 1777 1798 printk(KERN_WARNING "JBD2: journal file too short\n"); 1778 1799 goto out; 1779 1800 } 1780 1801 1781 1802 if (be32_to_cpu(sb->s_first) == 0 || 1782 - be32_to_cpu(sb->s_first) >= journal->j_maxlen) { 1803 + be32_to_cpu(sb->s_first) >= journal->j_total_len) { 1783 1804 printk(KERN_WARNING 1784 1805 "JBD2: Invalid start block of journal: %u\n", 1785 1806 be32_to_cpu(sb->s_first)); ··· 1851 1872 { 1852 1873 int err; 1853 1874 journal_superblock_t *sb; 1875 + int num_fc_blocks; 1854 1876 1855 1877 err = journal_get_superblock(journal); 1856 1878 if (err) ··· 1863 1883 journal->j_tail = be32_to_cpu(sb->s_start); 1864 1884 journal->j_first = be32_to_cpu(sb->s_first); 1865 1885 journal->j_errno = be32_to_cpu(sb->s_errno); 1886 + journal->j_last = be32_to_cpu(sb->s_maxlen); 1866 1887 1867 - if (jbd2_has_feature_fast_commit(journal) && 1868 - journal->j_fc_wbufsize > 0) { 1888 + if (jbd2_has_feature_fast_commit(journal)) { 1869 1889 journal->j_fc_last = be32_to_cpu(sb->s_maxlen); 1870 - journal->j_last = journal->j_fc_last - 
journal->j_fc_wbufsize; 1890 + num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks); 1891 + if (!num_fc_blocks) 1892 + num_fc_blocks = JBD2_MIN_FC_BLOCKS; 1893 + if (journal->j_last - num_fc_blocks >= JBD2_MIN_JOURNAL_BLOCKS) 1894 + journal->j_last = journal->j_fc_last - num_fc_blocks; 1871 1895 journal->j_fc_first = journal->j_last + 1; 1872 1896 journal->j_fc_off = 0; 1873 - } else { 1874 - journal->j_last = be32_to_cpu(sb->s_maxlen); 1875 1897 } 1876 1898 1877 1899 return 0; ··· 1936 1954 */ 1937 1955 journal->j_flags &= ~JBD2_ABORT; 1938 1956 1939 - if (journal->j_fc_wbufsize > 0) 1940 - jbd2_journal_set_features(journal, 0, 0, 1941 - JBD2_FEATURE_INCOMPAT_FAST_COMMIT); 1942 1957 /* OK, we've finished with the dynamic journal bits: 1943 1958 * reinitialise the dynamic contents of the superblock in memory 1944 1959 * and reset them on disk. */ ··· 2019 2040 jbd2_journal_destroy_revoke(journal); 2020 2041 if (journal->j_chksum_driver) 2021 2042 crypto_free_shash(journal->j_chksum_driver); 2022 - if (journal->j_fc_wbufsize > 0) 2023 - kfree(journal->j_fc_wbuf); 2043 + kfree(journal->j_fc_wbuf); 2024 2044 kfree(journal->j_wbuf); 2025 2045 kfree(journal); 2026 2046 ··· 2094 2116 return 0; 2095 2117 } 2096 2118 2119 + static int 2120 + jbd2_journal_initialize_fast_commit(journal_t *journal) 2121 + { 2122 + journal_superblock_t *sb = journal->j_superblock; 2123 + unsigned long long num_fc_blks; 2124 + 2125 + num_fc_blks = be32_to_cpu(sb->s_num_fc_blks); 2126 + if (num_fc_blks == 0) 2127 + num_fc_blks = JBD2_MIN_FC_BLOCKS; 2128 + if (journal->j_last - num_fc_blks < JBD2_MIN_JOURNAL_BLOCKS) 2129 + return -ENOSPC; 2130 + 2131 + /* Are we called twice? 
*/ 2132 + WARN_ON(journal->j_fc_wbuf != NULL); 2133 + journal->j_fc_wbuf = kmalloc_array(num_fc_blks, 2134 + sizeof(struct buffer_head *), GFP_KERNEL); 2135 + if (!journal->j_fc_wbuf) 2136 + return -ENOMEM; 2137 + 2138 + journal->j_fc_wbufsize = num_fc_blks; 2139 + journal->j_fc_last = journal->j_last; 2140 + journal->j_last = journal->j_fc_last - num_fc_blks; 2141 + journal->j_fc_first = journal->j_last + 1; 2142 + journal->j_fc_off = 0; 2143 + journal->j_free = journal->j_last - journal->j_first; 2144 + journal->j_max_transaction_buffers = 2145 + jbd2_journal_get_max_txn_bufs(journal); 2146 + 2147 + return 0; 2148 + } 2149 + 2097 2150 /** 2098 2151 * int jbd2_journal_set_features() - Mark a given journal feature in the superblock 2099 2152 * @journal: Journal to act on. ··· 2167 2158 compat, ro, incompat); 2168 2159 2169 2160 sb = journal->j_superblock; 2161 + 2162 + if (incompat & JBD2_FEATURE_INCOMPAT_FAST_COMMIT) { 2163 + if (jbd2_journal_initialize_fast_commit(journal)) { 2164 + pr_err("JBD2: Cannot enable fast commits.\n"); 2165 + return 0; 2166 + } 2167 + } 2170 2168 2171 2169 /* Load the checksum driver if necessary */ 2172 2170 if ((journal->j_chksum_driver == NULL) &&
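The superblock-load and `jbd2_journal_initialize_fast_commit()` hunks above both carve the last `num_fc_blks` blocks of the journal into a fast-commit area, falling back to a default when the superblock records zero and refusing the carve-out if too little regular journal would remain. The layout arithmetic can be sketched as follows (the two minimum constants use the kernel's values as assumptions):

```c
#include <assert.h>

#define JBD2_MIN_FC_BLOCKS 256        /* default when sb->s_num_fc_blks == 0 */
#define JBD2_MIN_JOURNAL_BLOCKS 1024  /* floor kept for full commits */

struct fc_layout {
    unsigned int last;      /* last block of the regular commit area */
    unsigned int fc_first;  /* first fast-commit block */
    unsigned int fc_last;   /* last fast-commit block */
};

/* Split [.. maxlen] so the tail num_fc_blks blocks serve fast commits,
 * but only if a minimum-sized regular area survives. Returns -1 when
 * the carve-out would starve full commits (maps to -ENOSPC above). */
static int carve_fc_area(unsigned int maxlen, unsigned int num_fc_blks,
                         struct fc_layout *out)
{
    if (num_fc_blks == 0)
        num_fc_blks = JBD2_MIN_FC_BLOCKS;
    if (maxlen < num_fc_blks + JBD2_MIN_JOURNAL_BLOCKS)
        return -1;
    out->fc_last = maxlen;
    out->last = maxlen - num_fc_blks;
    out->fc_first = out->last + 1;
    return 0;
}
```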
+3 -3
fs/jbd2/recovery.c
··· 74 74 75 75 /* Do up to 128K of readahead */ 76 76 max = start + (128 * 1024 / journal->j_blocksize); 77 - if (max > journal->j_maxlen) 78 - max = journal->j_maxlen; 77 + if (max > journal->j_total_len) 78 + max = journal->j_total_len; 79 79 80 80 /* Do the readahead itself. We'll submit MAXBUF buffer_heads at 81 81 * a time to the block device IO layer. */ ··· 134 134 135 135 *bhp = NULL; 136 136 137 - if (offset >= journal->j_maxlen) { 137 + if (offset >= journal->j_total_len) { 138 138 printk(KERN_ERR "JBD2: corrupted journal superblock\n"); 139 139 return -EFSCORRUPTED; 140 140 }
+3 -1
fs/jbd2/transaction.c
··· 195 195 DEFINE_WAIT(wait); 196 196 197 197 if (WARN_ON(!journal->j_running_transaction || 198 - journal->j_running_transaction->t_state != T_SWITCH)) 198 + journal->j_running_transaction->t_state != T_SWITCH)) { 199 + read_unlock(&journal->j_state_lock); 199 200 return; 201 + } 200 202 prepare_to_wait(&journal->j_wait_transaction_locked, &wait, 201 203 TASK_UNINTERRUPTIBLE); 202 204 read_unlock(&journal->j_state_lock);
+5 -10
fs/nfs/dir.c
··· 955 955 956 956 static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence) 957 957 { 958 - struct inode *inode = file_inode(filp); 959 958 struct nfs_open_dir_context *dir_ctx = filp->private_data; 960 959 961 960 dfprintk(FILE, "NFS: llseek dir(%pD2, %lld, %d)\n", ··· 966 967 case SEEK_SET: 967 968 if (offset < 0) 968 969 return -EINVAL; 969 - inode_lock(inode); 970 + spin_lock(&filp->f_lock); 970 971 break; 971 972 case SEEK_CUR: 972 973 if (offset == 0) 973 974 return filp->f_pos; 974 - inode_lock(inode); 975 + spin_lock(&filp->f_lock); 975 976 offset += filp->f_pos; 976 977 if (offset < 0) { 977 - inode_unlock(inode); 978 + spin_unlock(&filp->f_lock); 978 979 return -EINVAL; 979 980 } 980 981 } ··· 986 987 dir_ctx->dir_cookie = 0; 987 988 dir_ctx->duped = 0; 988 989 } 989 - inode_unlock(inode); 990 + spin_unlock(&filp->f_lock); 990 991 return offset; 991 992 } 992 993 ··· 997 998 static int nfs_fsync_dir(struct file *filp, loff_t start, loff_t end, 998 999 int datasync) 999 1000 { 1000 - struct inode *inode = file_inode(filp); 1001 - 1002 1001 dfprintk(FILE, "NFS: fsync dir(%pD2) datasync %d\n", filp, datasync); 1003 1002 1004 - inode_lock(inode); 1005 - nfs_inc_stats(inode, NFSIOS_VFSFSYNC); 1006 - inode_unlock(inode); 1003 + nfs_inc_stats(file_inode(filp), NFSIOS_VFSFSYNC); 1007 1004 return 0; 1008 1005 } 1009 1006
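The point of the nfs_llseek_dir hunk above is the lock switch: only `f_pos` and the per-file directory context need protecting, so the sleeping `inode_lock` becomes the `f_lock` spinlock. The offset arithmetic itself is unchanged and can be modelled outside the locking (a user-space sketch; the error value stands in for the kernel's -EINVAL):

```c
#include <assert.h>

enum { MY_SEEK_SET = 0, MY_SEEK_CUR = 1 };

/* SEEK_SET takes the offset as-is; SEEK_CUR of 0 is a position query;
 * otherwise SEEK_CUR is relative to f_pos; negatives are rejected. */
static long long dir_llseek_offset(int whence, long long offset,
                                   long long f_pos)
{
    switch (whence) {
    case MY_SEEK_SET:
        break;
    case MY_SEEK_CUR:
        if (offset == 0)
            return f_pos;
        offset += f_pos;
        break;
    }
    if (offset < 0)
        return -22; /* -EINVAL */
    return offset;
}
```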
+2
fs/nfs/nfs42xattr.c
··· 1047 1047 1048 1048 void nfs4_xattr_cache_exit(void) 1049 1049 { 1050 + unregister_shrinker(&nfs4_xattr_large_entry_shrinker); 1050 1051 unregister_shrinker(&nfs4_xattr_entry_shrinker); 1051 1052 unregister_shrinker(&nfs4_xattr_cache_shrinker); 1053 + list_lru_destroy(&nfs4_xattr_large_entry_lru); 1052 1054 list_lru_destroy(&nfs4_xattr_entry_lru); 1053 1055 list_lru_destroy(&nfs4_xattr_cache_lru); 1054 1056 kmem_cache_destroy(nfs4_xattr_cache_cachep);
+2 -2
fs/nfs/nfs42xdr.c
··· 196 196 1 + nfs4_xattr_name_maxsz + 1) 197 197 #define decode_setxattr_maxsz (op_decode_hdr_maxsz + decode_change_info_maxsz) 198 198 #define encode_listxattrs_maxsz (op_encode_hdr_maxsz + 2 + 1) 199 - #define decode_listxattrs_maxsz (op_decode_hdr_maxsz + 2 + 1 + 1) 199 + #define decode_listxattrs_maxsz (op_decode_hdr_maxsz + 2 + 1 + 1 + 1) 200 200 #define encode_removexattr_maxsz (op_encode_hdr_maxsz + 1 + \ 201 201 nfs4_xattr_name_maxsz) 202 202 #define decode_removexattr_maxsz (op_decode_hdr_maxsz + \ ··· 531 531 { 532 532 __be32 *p; 533 533 534 - encode_op_hdr(xdr, OP_LISTXATTRS, decode_listxattrs_maxsz + 1, hdr); 534 + encode_op_hdr(xdr, OP_LISTXATTRS, decode_listxattrs_maxsz, hdr); 535 535 536 536 p = reserve_space(xdr, 12); 537 537 if (unlikely(!p))
+6
fs/nfs/nfsroot.c
··· 88 88 #define NFS_ROOT "/tftpboot/%s" 89 89 90 90 /* Default NFSROOT mount options. */ 91 + #if defined(CONFIG_NFS_V2) 91 92 #define NFS_DEF_OPTIONS "vers=2,tcp,rsize=4096,wsize=4096" 93 + #elif defined(CONFIG_NFS_V3) 94 + #define NFS_DEF_OPTIONS "vers=3,tcp,rsize=4096,wsize=4096" 95 + #else 96 + #define NFS_DEF_OPTIONS "vers=4,tcp,rsize=4096,wsize=4096" 97 + #endif 92 98 93 99 /* Parameters passed from the kernel command line */ 94 100 static char nfs_root_parms[NFS_MAXPATHLEN + 1] __initdata = "";
+1 -5
fs/nfsd/nfs3proc.c
··· 316 316 fh_copy(&resp->dirfh, &argp->fh); 317 317 fh_init(&resp->fh, NFS3_FHSIZE); 318 318 319 - if (argp->ftype == 0 || argp->ftype >= NF3BAD) { 320 - resp->status = nfserr_inval; 321 - goto out; 322 - } 323 319 if (argp->ftype == NF3CHR || argp->ftype == NF3BLK) { 324 320 rdev = MKDEV(argp->major, argp->minor); 325 321 if (MAJOR(rdev) != argp->major || ··· 324 328 goto out; 325 329 } 326 330 } else if (argp->ftype != NF3SOCK && argp->ftype != NF3FIFO) { 327 - resp->status = nfserr_inval; 331 + resp->status = nfserr_badtype; 328 332 goto out; 329 333 } 330 334
+1
fs/nfsd/nfs3xdr.c
··· 1114 1114 { 1115 1115 struct nfsd3_pathconfres *resp = rqstp->rq_resp; 1116 1116 1117 + *p++ = resp->status; 1117 1118 *p++ = xdr_zero; /* no post_op_attr */ 1118 1119 1119 1120 if (resp->status == 0) {
+2 -1
fs/nfsd/nfs4proc.c
··· 1299 1299 struct nfsd_file *dst) 1300 1300 { 1301 1301 nfs42_ssc_close(src->nf_file); 1302 - nfsd_file_put(src); 1302 + /* 'src' is freed by nfsd4_do_async_copy */ 1303 1303 nfsd_file_put(dst); 1304 1304 mntput(ss_mnt); 1305 1305 } ··· 1486 1486 cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); 1487 1487 if (!cb_copy) 1488 1488 goto out; 1489 + refcount_set(&cb_copy->refcount, 1); 1489 1490 memcpy(&cb_copy->cp_res, &copy->cp_res, sizeof(copy->cp_res)); 1490 1491 cb_copy->cp_clp = copy->cp_clp; 1491 1492 cb_copy->nfserr = copy->nfserr;
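The nfs4proc.c hunk above fixes a lifetime bug by initialising the callback copy's refcount to 1 at creation, so the final put frees the object exactly once and the cleanup helper no longer drops a reference it doesn't own. A user-space model of that get/put discipline, with C11 atomics standing in for the kernel's `refcount_t` (names are illustrative):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct copy_obj {
    atomic_int refcount;
    bool freed;       /* models kfree() having run */
};

static void copy_init(struct copy_obj *c)
{
    atomic_store(&c->refcount, 1);   /* refcount_set(&cb_copy->refcount, 1) */
    c->freed = false;
}

static void copy_get(struct copy_obj *c) { atomic_fetch_add(&c->refcount, 1); }

/* Free only on the transition 1 -> 0, like refcount_dec_and_test(). */
static void copy_put(struct copy_obj *c)
{
    if (atomic_fetch_sub(&c->refcount, 1) == 1)
        c->freed = true;
}
```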
+1 -1
fs/ocfs2/journal.c
··· 877 877 goto done; 878 878 } 879 879 880 - trace_ocfs2_journal_init_maxlen(j_journal->j_maxlen); 880 + trace_ocfs2_journal_init_maxlen(j_journal->j_total_len); 881 881 882 882 *dirty = (le32_to_cpu(di->id1.journal1.ij_flags) & 883 883 OCFS2_JOURNAL_DIRTY_FL);
+1 -1
fs/proc/cpuinfo.c
··· 19 19 static const struct proc_ops cpuinfo_proc_ops = { 20 20 .proc_flags = PROC_ENTRY_PERMANENT, 21 21 .proc_open = cpuinfo_open, 22 - .proc_read = seq_read, 22 + .proc_read_iter = seq_read_iter, 23 23 .proc_lseek = seq_lseek, 24 24 .proc_release = seq_release, 25 25 };
+2 -2
fs/proc/generic.c
··· 590 590 static const struct proc_ops proc_seq_ops = { 591 591 /* not permanent -- can call into arbitrary seq_operations */ 592 592 .proc_open = proc_seq_open, 593 - .proc_read = seq_read, 593 + .proc_read_iter = seq_read_iter, 594 594 .proc_lseek = seq_lseek, 595 595 .proc_release = proc_seq_release, 596 596 }; ··· 621 621 static const struct proc_ops proc_single_ops = { 622 622 /* not permanent -- can call into arbitrary ->single_show */ 623 623 .proc_open = proc_single_open, 624 - .proc_read = seq_read, 624 + .proc_read_iter = seq_read_iter, 625 625 .proc_lseek = seq_lseek, 626 626 .proc_release = single_release, 627 627 };
+2
fs/proc/inode.c
··· 597 597 .llseek = proc_reg_llseek, 598 598 .read_iter = proc_reg_read_iter, 599 599 .write = proc_reg_write, 600 + .splice_read = generic_file_splice_read, 600 601 .poll = proc_reg_poll, 601 602 .unlocked_ioctl = proc_reg_unlocked_ioctl, 602 603 .mmap = proc_reg_mmap, ··· 623 622 static const struct file_operations proc_iter_file_ops_compat = { 624 623 .llseek = proc_reg_llseek, 625 624 .read_iter = proc_reg_read_iter, 625 + .splice_read = generic_file_splice_read, 626 626 .write = proc_reg_write, 627 627 .poll = proc_reg_poll, 628 628 .unlocked_ioctl = proc_reg_unlocked_ioctl,
+1 -1
fs/proc/stat.c
··· 226 226 static const struct proc_ops stat_proc_ops = { 227 227 .proc_flags = PROC_ENTRY_PERMANENT, 228 228 .proc_open = stat_open, 229 - .proc_read = seq_read, 229 + .proc_read_iter = seq_read_iter, 230 230 .proc_lseek = seq_lseek, 231 231 .proc_release = single_release, 232 232 };
+32 -13
fs/seq_file.c
··· 18 18 #include <linux/mm.h> 19 19 #include <linux/printk.h> 20 20 #include <linux/string_helpers.h> 21 + #include <linux/uio.h> 21 22 22 23 #include <linux/uaccess.h> 23 24 #include <asm/page.h> ··· 147 146 */ 148 147 ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos) 149 148 { 150 - struct seq_file *m = file->private_data; 149 + struct iovec iov = { .iov_base = buf, .iov_len = size}; 150 + struct kiocb kiocb; 151 + struct iov_iter iter; 152 + ssize_t ret; 153 + 154 + init_sync_kiocb(&kiocb, file); 155 + iov_iter_init(&iter, READ, &iov, 1, size); 156 + 157 + kiocb.ki_pos = *ppos; 158 + ret = seq_read_iter(&kiocb, &iter); 159 + *ppos = kiocb.ki_pos; 160 + return ret; 161 + } 162 + EXPORT_SYMBOL(seq_read); 163 + 164 + /* 165 + * Ready-made ->f_op->read_iter() 166 + */ 167 + ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter) 168 + { 169 + struct seq_file *m = iocb->ki_filp->private_data; 170 + size_t size = iov_iter_count(iter); 151 171 size_t copied = 0; 152 172 size_t n; 153 173 void *p; ··· 180 158 * if request is to read from zero offset, reset iterator to first 181 159 * record as it might have been already advanced by previous requests 182 160 */ 183 - if (*ppos == 0) { 161 + if (iocb->ki_pos == 0) { 184 162 m->index = 0; 185 163 m->count = 0; 186 164 } 187 165 188 - /* Don't assume *ppos is where we left it */ 189 - if (unlikely(*ppos != m->read_pos)) { 190 - while ((err = traverse(m, *ppos)) == -EAGAIN) 166 + /* Don't assume ki_pos is where we left it */ 167 + if (unlikely(iocb->ki_pos != m->read_pos)) { 168 + while ((err = traverse(m, iocb->ki_pos)) == -EAGAIN) 191 169 ; 192 170 if (err) { 193 171 /* With prejudice... 
*/ ··· 196 174 m->count = 0; 197 175 goto Done; 198 176 } else { 199 - m->read_pos = *ppos; 177 + m->read_pos = iocb->ki_pos; 200 178 } 201 179 } 202 180 ··· 209 187 /* if not empty - flush it first */ 210 188 if (m->count) { 211 189 n = min(m->count, size); 212 - err = copy_to_user(buf, m->buf + m->from, n); 213 - if (err) 190 + if (copy_to_iter(m->buf + m->from, n, iter) != n) 214 191 goto Efault; 215 192 m->count -= n; 216 193 m->from += n; 217 194 size -= n; 218 - buf += n; 219 195 copied += n; 220 196 if (!size) 221 197 goto Done; ··· 274 254 } 275 255 m->op->stop(m, p); 276 256 n = min(m->count, size); 277 - err = copy_to_user(buf, m->buf, n); 278 - if (err) 257 + if (copy_to_iter(m->buf, n, iter) != n) 279 258 goto Efault; 280 259 copied += n; 281 260 m->count -= n; ··· 283 264 if (!copied) 284 265 copied = err; 285 266 else { 286 - *ppos += copied; 267 + iocb->ki_pos += copied; 287 268 m->read_pos += copied; 288 269 } 289 270 mutex_unlock(&m->lock); ··· 295 276 err = -EFAULT; 296 277 goto Done; 297 278 } 298 - EXPORT_SYMBOL(seq_read); 279 + EXPORT_SYMBOL(seq_read_iter); 299 280 300 281 /** 301 282 * seq_lseek - ->llseek() method for sequential files.
+1
fs/xfs/libxfs/xfs_alloc.c
··· 2467 2467 new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); 2468 2468 new->xefi_blockcount = 1; 2469 2469 new->xefi_oinfo = *oinfo; 2470 + new->xefi_skip_discard = false; 2470 2471 2471 2472 trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1); 2472 2473
+1 -1
fs/xfs/libxfs/xfs_bmap.h
··· 52 52 { 53 53 xfs_fsblock_t xefi_startblock;/* starting fs block number */ 54 54 xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ 55 + bool xefi_skip_discard; 55 56 struct list_head xefi_list; 56 57 struct xfs_owner_info xefi_oinfo; /* extent owner */ 57 - bool xefi_skip_discard; 58 58 }; 59 59 60 60 #define XFS_BMAP_MAX_NMAP 4
+1 -2
fs/xfs/scrub/inode.c
··· 121 121 goto bad; 122 122 123 123 /* rt flags require rt device */ 124 - if ((flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT)) && 125 - !mp->m_rtdev_targp) 124 + if ((flags & XFS_DIFLAG_REALTIME) && !mp->m_rtdev_targp) 126 125 goto bad; 127 126 128 127 /* new rt bitmap flag only valid for rbmino */
+12 -8
fs/xfs/xfs_aops.c
··· 346 346 ssize_t count = i_blocksize(inode); 347 347 xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); 348 348 xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + count); 349 - xfs_fileoff_t cow_fsb = NULLFILEOFF; 350 - int whichfork = XFS_DATA_FORK; 349 + xfs_fileoff_t cow_fsb; 350 + int whichfork; 351 351 struct xfs_bmbt_irec imap; 352 352 struct xfs_iext_cursor icur; 353 353 int retries = 0; ··· 381 381 * landed in a hole and we skip the block. 382 382 */ 383 383 retry: 384 + cow_fsb = NULLFILEOFF; 385 + whichfork = XFS_DATA_FORK; 384 386 xfs_ilock(ip, XFS_ILOCK_SHARED); 385 387 ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_BTREE || 386 388 (ip->i_df.if_flags & XFS_IFEXTENTS)); ··· 529 527 */ 530 528 static void 531 529 xfs_discard_page( 532 - struct page *page) 530 + struct page *page, 531 + loff_t fileoff) 533 532 { 534 533 struct inode *inode = page->mapping->host; 535 534 struct xfs_inode *ip = XFS_I(inode); 536 535 struct xfs_mount *mp = ip->i_mount; 537 - loff_t offset = page_offset(page); 538 - xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, offset); 536 + unsigned int pageoff = offset_in_page(fileoff); 537 + xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, fileoff); 538 + xfs_fileoff_t pageoff_fsb = XFS_B_TO_FSBT(mp, pageoff); 539 539 int error; 540 540 541 541 if (XFS_FORCED_SHUTDOWN(mp)) ··· 545 541 546 542 xfs_alert_ratelimited(mp, 547 543 "page discard on page "PTR_FMT", inode 0x%llx, offset %llu.", 548 - page, ip->i_ino, offset); 544 + page, ip->i_ino, fileoff); 549 545 550 546 error = xfs_bmap_punch_delalloc_range(ip, start_fsb, 551 - i_blocks_per_page(inode, page)); 547 + i_blocks_per_page(inode, page) - pageoff_fsb); 552 548 if (error && !XFS_FORCED_SHUTDOWN(mp)) 553 549 xfs_alert(mp, "page discard unable to remove delalloc mapping."); 554 550 out_invalidate: 555 - iomap_invalidatepage(page, 0, PAGE_SIZE); 551 + iomap_invalidatepage(page, pageoff, PAGE_SIZE - pageoff); 556 552 } 557 553 558 554 static const struct iomap_writeback_ops xfs_writeback_ops 
= {
+10
fs/xfs/xfs_iops.c
··· 911 911 error = iomap_zero_range(inode, oldsize, newsize - oldsize, 912 912 &did_zeroing, &xfs_buffered_write_iomap_ops); 913 913 } else { 914 + /* 915 + * iomap won't detect a dirty page over an unwritten block (or a 916 + * cow block over a hole) and subsequently skips zeroing the 917 + * newly post-EOF portion of the page. Flush the new EOF to 918 + * convert the block before the pagecache truncate. 919 + */ 920 + error = filemap_write_and_wait_range(inode->i_mapping, newsize, 921 + newsize); 922 + if (error) 923 + return error; 914 924 error = iomap_truncate_page(inode, newsize, &did_zeroing, 915 925 &xfs_buffered_write_iomap_ops); 916 926 }
+2 -1
fs/xfs/xfs_reflink.c
··· 1502 1502 &xfs_buffered_write_iomap_ops); 1503 1503 if (error) 1504 1504 goto out; 1505 - error = filemap_write_and_wait(inode->i_mapping); 1505 + 1506 + error = filemap_write_and_wait_range(inode->i_mapping, offset, len); 1506 1507 if (error) 1507 1508 goto out; 1508 1509
-2
include/linux/compiler-gcc.h
··· 175 175 #else 176 176 #define __diag_GCC_8(s) 177 177 #endif 178 - 179 - #define __no_fgcse __attribute__((optimize("-fno-gcse")))
-4
include/linux/compiler_types.h
··· 247 247 #define asm_inline asm 248 248 #endif 249 249 250 - #ifndef __no_fgcse 251 - # define __no_fgcse 252 - #endif 253 - 254 250 /* Are two types/vars the same type (ignoring qualifiers)? */ 255 251 #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) 256 252
+16 -2
include/linux/cpufreq.h
··· 110 110 bool fast_switch_enabled; 111 111 112 112 /* 113 + * Set if the CPUFREQ_GOV_STRICT_TARGET flag is set for the current 114 + * governor. 115 + */ 116 + bool strict_target; 117 + 118 + /* 113 119 * Preferred average time interval between consecutive invocations of 114 120 * the driver to set the frequency for this policy. To be set by the 115 121 * scaling driver (0, which is the default, means no preference). ··· 576 570 char *buf); 577 571 int (*store_setspeed) (struct cpufreq_policy *policy, 578 572 unsigned int freq); 579 - /* For governors which change frequency dynamically by themselves */ 580 - bool dynamic_switching; 581 573 struct list_head governor_list; 582 574 struct module *owner; 575 + u8 flags; 583 576 }; 577 + 578 + /* Governor flags */ 579 + 580 + /* For governors which change frequency dynamically by themselves */ 581 + #define CPUFREQ_GOV_DYNAMIC_SWITCHING BIT(0) 582 + 583 + /* For governors wanting the target frequency to be set exactly */ 584 + #define CPUFREQ_GOV_STRICT_TARGET BIT(1) 585 + 584 586 585 587 /* Pass a target to the cpufreq driver */ 586 588 unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
+11 -11
include/linux/filter.h
··· 558 558 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); 559 559 560 560 #define __BPF_PROG_RUN(prog, ctx, dfunc) ({ \ 561 - u32 ret; \ 561 + u32 __ret; \ 562 562 cant_migrate(); \ 563 563 if (static_branch_unlikely(&bpf_stats_enabled_key)) { \ 564 - struct bpf_prog_stats *stats; \ 565 - u64 start = sched_clock(); \ 566 - ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ 567 - stats = this_cpu_ptr(prog->aux->stats); \ 568 - u64_stats_update_begin(&stats->syncp); \ 569 - stats->cnt++; \ 570 - stats->nsecs += sched_clock() - start; \ 571 - u64_stats_update_end(&stats->syncp); \ 564 + struct bpf_prog_stats *__stats; \ 565 + u64 __start = sched_clock(); \ 566 + __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ 567 + __stats = this_cpu_ptr(prog->aux->stats); \ 568 + u64_stats_update_begin(&__stats->syncp); \ 569 + __stats->cnt++; \ 570 + __stats->nsecs += sched_clock() - __start; \ 571 + u64_stats_update_end(&__stats->syncp); \ 572 572 } else { \ 573 - ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ 573 + __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ 574 574 } \ 575 - ret; }) 575 + __ret; }) 576 576 577 577 #define BPF_PROG_RUN(prog, ctx) \ 578 578 __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func)
+2 -1
include/linux/io_uring.h
··· 30 30 struct percpu_counter inflight; 31 31 struct io_identity __identity; 32 32 struct io_identity *identity; 33 - bool in_idle; 33 + atomic_t in_idle; 34 + bool sqpoll; 34 35 }; 35 36 36 37 #if defined(CONFIG_IO_URING)
+1 -1
include/linux/iomap.h
··· 221 221 * Optional, allows the file system to discard state on a page where 222 222 * we failed to submit any I/O. 223 223 */ 224 - void (*discard_page)(struct page *page); 224 + void (*discard_page)(struct page *page, loff_t fileoff); 225 225 }; 226 226 227 227 struct iomap_writepage_ctx {
+15 -8
include/linux/jbd2.h
··· 68 68 extern void jbd2_free(void *ptr, size_t size); 69 69 70 70 #define JBD2_MIN_JOURNAL_BLOCKS 1024 71 + #define JBD2_MIN_FC_BLOCKS 256 71 72 72 73 #ifdef __KERNEL__ 73 74 ··· 945 944 /** 946 945 * @j_fc_off: 947 946 * 948 - * Number of fast commit blocks currently allocated. 949 - * [j_state_lock]. 947 + * Number of fast commit blocks currently allocated. Accessed only 948 + * during fast commit. Currently only process can do fast commit, so 949 + * this field is not protected by any lock. 950 950 */ 951 951 unsigned long j_fc_off; 952 952 ··· 990 988 struct block_device *j_fs_dev; 991 989 992 990 /** 993 - * @j_maxlen: Total maximum capacity of the journal region on disk. 991 + * @j_total_len: Total maximum capacity of the journal region on disk. 994 992 */ 995 - unsigned int j_maxlen; 993 + unsigned int j_total_len; 996 994 997 995 /** 998 996 * @j_reserved_credits: ··· 1110 1108 struct buffer_head **j_wbuf; 1111 1109 1112 1110 /** 1113 - * @j_fc_wbuf: Array of fast commit bhs for 1114 - * jbd2_journal_commit_transaction. 1111 + * @j_fc_wbuf: Array of fast commit bhs for fast commit. Accessed only 1112 + * during a fast commit. Currently only process can do fast commit, so 1113 + * this field is not protected by any lock. 
1115 1114 */ 1116 1115 struct buffer_head **j_fc_wbuf; 1117 1116 ··· 1617 1614 extern int jbd2_cleanup_journal_tail(journal_t *); 1618 1615 1619 1616 /* Fast commit related APIs */ 1620 - int jbd2_fc_init(journal_t *journal, int num_fc_blks); 1621 1617 int jbd2_fc_begin_commit(journal_t *journal, tid_t tid); 1622 1618 int jbd2_fc_end_commit(journal_t *journal); 1623 - int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid); 1619 + int jbd2_fc_end_commit_fallback(journal_t *journal); 1624 1620 int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out); 1625 1621 int jbd2_submit_inode_data(struct jbd2_inode *jinode); 1626 1622 int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode); 1627 1623 int jbd2_fc_wait_bufs(journal_t *journal, int num_blks); 1628 1624 int jbd2_fc_release_bufs(journal_t *journal); 1625 + 1626 + static inline int jbd2_journal_get_max_txn_bufs(journal_t *journal) 1627 + { 1628 + return (journal->j_total_len - journal->j_fc_wbufsize) / 4; 1629 + } 1629 1630 1630 1631 /* 1631 1632 * is_journal_abort
+1
include/linux/seq_file.h
··· 107 107 char *mangle_path(char *s, const char *p, const char *esc); 108 108 int seq_open(struct file *, const struct seq_operations *); 109 109 ssize_t seq_read(struct file *, char __user *, size_t, loff_t *); 110 + ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter); 110 111 loff_t seq_lseek(struct file *, loff_t, int); 111 112 int seq_release(struct inode *, struct file *); 112 113 int seq_write(struct seq_file *seq, const void *data, size_t len);
+3 -7
include/linux/swiotlb.h
··· 45 45 SYNC_FOR_DEVICE = 1, 46 46 }; 47 47 48 - extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 49 - dma_addr_t tbl_dma_addr, 50 - phys_addr_t phys, 51 - size_t mapping_size, 52 - size_t alloc_size, 53 - enum dma_data_direction dir, 54 - unsigned long attrs); 48 + phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys, 49 + size_t mapping_size, size_t alloc_size, 50 + enum dma_data_direction dir, unsigned long attrs); 55 51 56 52 extern void swiotlb_tbl_unmap_single(struct device *hwdev, 57 53 phys_addr_t tlb_addr,
+1 -1
include/net/xsk_buff_pool.h
··· 86 86 void xp_destroy(struct xsk_buff_pool *pool); 87 87 void xp_release(struct xdp_buff_xsk *xskb); 88 88 void xp_get_pool(struct xsk_buff_pool *pool); 89 - void xp_put_pool(struct xsk_buff_pool *pool); 89 + bool xp_put_pool(struct xsk_buff_pool *pool); 90 90 void xp_clear_dev(struct xsk_buff_pool *pool); 91 91 void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs); 92 92 void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs);
+6 -4
include/trace/events/ext4.h
··· 100 100 { EXT4_FC_REASON_XATTR, "XATTR"}, \ 101 101 { EXT4_FC_REASON_CROSS_RENAME, "CROSS_RENAME"}, \ 102 102 { EXT4_FC_REASON_JOURNAL_FLAG_CHANGE, "JOURNAL_FLAG_CHANGE"}, \ 103 - { EXT4_FC_REASON_MEM, "NO_MEM"}, \ 103 + { EXT4_FC_REASON_NOMEM, "NO_MEM"}, \ 104 104 { EXT4_FC_REASON_SWAP_BOOT, "SWAP_BOOT"}, \ 105 105 { EXT4_FC_REASON_RESIZE, "RESIZE"}, \ 106 106 { EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \ 107 - { EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"}) 107 + { EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"}, \ 108 + { EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"}) 108 109 109 110 TRACE_EVENT(ext4_other_inode_update_time, 110 111 TP_PROTO(struct inode *inode, ino_t orig_ino), ··· 2918 2917 ), 2919 2918 2920 2919 TP_printk("dev %d:%d fc ineligible reasons:\n" 2921 - "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s,%d; " 2920 + "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d; " 2922 2921 "num_commits:%ld, ineligible: %ld, numblks: %ld", 2923 2922 MAJOR(__entry->dev), MINOR(__entry->dev), 2924 2923 FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR), 2925 2924 FC_REASON_NAME_STAT(EXT4_FC_REASON_CROSS_RENAME), 2926 2925 FC_REASON_NAME_STAT(EXT4_FC_REASON_JOURNAL_FLAG_CHANGE), 2927 - FC_REASON_NAME_STAT(EXT4_FC_REASON_MEM), 2926 + FC_REASON_NAME_STAT(EXT4_FC_REASON_NOMEM), 2928 2927 FC_REASON_NAME_STAT(EXT4_FC_REASON_SWAP_BOOT), 2929 2928 FC_REASON_NAME_STAT(EXT4_FC_REASON_RESIZE), 2930 2929 FC_REASON_NAME_STAT(EXT4_FC_REASON_RENAME_DIR), 2931 2930 FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE), 2931 + FC_REASON_NAME_STAT(EXT4_FC_REASON_INODE_JOURNAL_DATA), 2932 2932 __entry->sbi->s_fc_stats.fc_num_commits, 2933 2933 __entry->sbi->s_fc_stats.fc_ineligible_commits, 2934 2934 __entry->sbi->s_fc_stats.fc_numblks)
+4 -4
include/trace/events/sunrpc.h
··· 655 655 __field(size_t, tail_len) 656 656 __field(unsigned int, page_len) 657 657 __field(unsigned int, len) 658 - __string(progname, 659 - xdr->rqst->rq_task->tk_client->cl_program->name) 660 - __string(procedure, 661 - xdr->rqst->rq_task->tk_msg.rpc_proc->p_name) 658 + __string(progname, xdr->rqst ? 659 + xdr->rqst->rq_task->tk_client->cl_program->name : "unknown") 660 + __string(procedure, xdr->rqst ? 661 + xdr->rqst->rq_task->tk_msg.rpc_proc->p_name : "unknown") 662 662 ), 663 663 664 664 TP_fast_assign(
+5 -1
kernel/bpf/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 2 obj-y := core.o 3 - CFLAGS_core.o += $(call cc-disable-warning, override-init) 3 + ifneq ($(CONFIG_BPF_JIT_ALWAYS_ON),y) 4 + # ___bpf_prog_run() needs GCSE disabled on x86; see 3193c0836f203 for details 5 + cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse 6 + endif 7 + CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy) 4 8 5 9 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o 6 10 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
+7 -3
kernel/bpf/bpf_lsm.c
··· 13 13 #include <linux/bpf_verifier.h> 14 14 #include <net/bpf_sk_storage.h> 15 15 #include <linux/bpf_local_storage.h> 16 + #include <linux/btf_ids.h> 16 17 17 18 /* For every LSM hook that allows attachment of BPF programs, declare a nop 18 19 * function where a BPF program can be attached. ··· 27 26 #include <linux/lsm_hook_defs.h> 28 27 #undef LSM_HOOK 29 28 30 - #define BPF_LSM_SYM_PREFX "bpf_lsm_" 29 + #define LSM_HOOK(RET, DEFAULT, NAME, ...) BTF_ID(func, bpf_lsm_##NAME) 30 + BTF_SET_START(bpf_lsm_hooks) 31 + #include <linux/lsm_hook_defs.h> 32 + #undef LSM_HOOK 33 + BTF_SET_END(bpf_lsm_hooks) 31 34 32 35 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, 33 36 const struct bpf_prog *prog) ··· 42 37 return -EINVAL; 43 38 } 44 39 45 - if (strncmp(BPF_LSM_SYM_PREFX, prog->aux->attach_func_name, 46 - sizeof(BPF_LSM_SYM_PREFX) - 1)) { 40 + if (!btf_id_set_contains(&bpf_lsm_hooks, prog->aux->attach_btf_id)) { 47 41 bpf_log(vlog, "attach_btf_id %u points to wrong type name %s\n", 48 42 prog->aux->attach_btf_id, prog->aux->attach_func_name); 49 43 return -EINVAL;
+1 -1
kernel/bpf/core.c
··· 1369 1369 * 1370 1370 * Decode and execute eBPF instructions. 1371 1371 */ 1372 - static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) 1372 + static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) 1373 1373 { 1374 1374 #define BPF_INSN_2_LBL(x, y) [BPF_##x | BPF_##y] = &&x##_##y 1375 1375 #define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z
+28 -2
kernel/bpf/hashtab.c
··· 821 821 } 822 822 } 823 823 824 + static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr, 825 + void *value, bool onallcpus) 826 + { 827 + /* When using prealloc and not setting the initial value on all cpus, 828 + * zero-fill element values for other cpus (just as what happens when 829 + * not using prealloc). Otherwise, bpf program has no way to ensure 830 + * known initial values for cpus other than current one 831 + * (onallcpus=false always when coming from bpf prog). 832 + */ 833 + if (htab_is_prealloc(htab) && !onallcpus) { 834 + u32 size = round_up(htab->map.value_size, 8); 835 + int current_cpu = raw_smp_processor_id(); 836 + int cpu; 837 + 838 + for_each_possible_cpu(cpu) { 839 + if (cpu == current_cpu) 840 + bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value, 841 + size); 842 + else 843 + memset(per_cpu_ptr(pptr, cpu), 0, size); 844 + } 845 + } else { 846 + pcpu_copy_value(htab, pptr, value, onallcpus); 847 + } 848 + } 849 + 824 850 static bool fd_htab_map_needs_adjust(const struct bpf_htab *htab) 825 851 { 826 852 return htab->map.map_type == BPF_MAP_TYPE_HASH_OF_MAPS && ··· 917 891 } 918 892 } 919 893 920 - pcpu_copy_value(htab, pptr, value, onallcpus); 894 + pcpu_init_value(htab, pptr, value, onallcpus); 921 895 922 896 if (!prealloc) 923 897 htab_elem_set_ptr(l_new, key_size, pptr); ··· 1209 1183 pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size), 1210 1184 value, onallcpus); 1211 1185 } else { 1212 - pcpu_copy_value(htab, htab_elem_get_ptr(l_new, key_size), 1186 + pcpu_init_value(htab, htab_elem_get_ptr(l_new, key_size), 1213 1187 value, onallcpus); 1214 1188 hlist_nulls_add_head_rcu(&l_new->hash_node, head); 1215 1189 l_new = NULL;
+1
kernel/bpf/preload/Kconfig
··· 6 6 menuconfig BPF_PRELOAD 7 7 bool "Preload BPF file system with kernel specific program and map iterators" 8 8 depends on BPF 9 + depends on BPF_SYSCALL 9 10 # The dependency on !COMPILE_TEST prevents it from being enabled 10 11 # in allmodconfig or allyesconfig configurations 11 12 depends on !COMPILE_TEST
+11 -11
kernel/dma/swiotlb.c
··· 229 229 io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; 230 230 } 231 231 io_tlb_index = 0; 232 + no_iotlb_memory = false; 232 233 233 234 if (verbose) 234 235 swiotlb_print_info(); ··· 261 260 if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose)) 262 261 return; 263 262 264 - if (io_tlb_start) 263 + if (io_tlb_start) { 265 264 memblock_free_early(io_tlb_start, 266 265 PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT)); 266 + io_tlb_start = 0; 267 + } 267 268 pr_warn("Cannot allocate buffer"); 268 269 no_iotlb_memory = true; 269 270 } ··· 363 360 io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; 364 361 } 365 362 io_tlb_index = 0; 363 + no_iotlb_memory = false; 366 364 367 365 swiotlb_print_info(); 368 366 ··· 445 441 } 446 442 } 447 443 448 - phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 449 - dma_addr_t tbl_dma_addr, 450 - phys_addr_t orig_addr, 451 - size_t mapping_size, 452 - size_t alloc_size, 453 - enum dma_data_direction dir, 454 - unsigned long attrs) 444 + phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, 445 + size_t mapping_size, size_t alloc_size, 446 + enum dma_data_direction dir, unsigned long attrs) 455 447 { 448 + dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start); 456 449 unsigned long flags; 457 450 phys_addr_t tlb_addr; 458 451 unsigned int nslots, stride, index, wrap; ··· 668 667 trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size, 669 668 swiotlb_force); 670 669 671 - swiotlb_addr = swiotlb_tbl_map_single(dev, 672 - phys_to_dma_unencrypted(dev, io_tlb_start), 673 - paddr, size, size, dir, attrs); 670 + swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, size, dir, 671 + attrs); 674 672 if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR) 675 673 return DMA_MAPPING_ERROR; 676 674
+2 -2
kernel/entry/common.c
··· 337 337 * already contains a warning when RCU is not watching, so no point 338 338 * in having another one here. 339 339 */ 340 + lockdep_hardirqs_off(CALLER_ADDR0); 340 341 instrumentation_begin(); 341 342 rcu_irq_enter_check_tick(); 342 - /* Use the combo lockdep/tracing function */ 343 - trace_hardirqs_off(); 343 + trace_hardirqs_off_finish(); 344 344 instrumentation_end(); 345 345 346 346 return ret;
+5 -7
kernel/events/core.c
··· 10085 10085 if (token == IF_SRC_FILE || token == IF_SRC_FILEADDR) { 10086 10086 int fpos = token == IF_SRC_FILE ? 2 : 1; 10087 10087 10088 + kfree(filename); 10088 10089 filename = match_strdup(&args[fpos]); 10089 10090 if (!filename) { 10090 10091 ret = -ENOMEM; ··· 10132 10131 */ 10133 10132 ret = -EOPNOTSUPP; 10134 10133 if (!event->ctx->task) 10135 - goto fail_free_name; 10134 + goto fail; 10136 10135 10137 10136 /* look up the path and grab its inode */ 10138 10137 ret = kern_path(filename, LOOKUP_FOLLOW, 10139 10138 &filter->path); 10140 10139 if (ret) 10141 - goto fail_free_name; 10142 - 10143 - kfree(filename); 10144 - filename = NULL; 10140 + goto fail; 10145 10141 10146 10142 ret = -EINVAL; 10147 10143 if (!filter->path.dentry || ··· 10158 10160 if (state != IF_STATE_ACTION) 10159 10161 goto fail; 10160 10162 10163 + kfree(filename); 10161 10164 kfree(orig); 10162 10165 10163 10166 return 0; 10164 10167 10165 - fail_free_name: 10166 - kfree(filename); 10167 10168 fail: 10169 + kfree(filename); 10168 10170 free_filters_list(filters); 10169 10171 kfree(orig); 10170 10172
+4 -1
kernel/exit.c
··· 454 454 mmap_read_unlock(mm); 455 455 456 456 self.task = current; 457 - self.next = xchg(&core_state->dumper.next, &self); 457 + if (self.task->flags & PF_SIGNALED) 458 + self.next = xchg(&core_state->dumper.next, &self); 459 + else 460 + self.task = NULL; 458 461 /* 459 462 * Implies mb(), the result of xchg() must be visible 460 463 * to core_state->dumper.
+5 -5
kernel/fork.c
··· 2167 2167 /* ok, now we should be set up.. */ 2168 2168 p->pid = pid_nr(pid); 2169 2169 if (clone_flags & CLONE_THREAD) { 2170 - p->exit_signal = -1; 2171 2170 p->group_leader = current->group_leader; 2172 2171 p->tgid = current->tgid; 2173 2172 } else { 2174 - if (clone_flags & CLONE_PARENT) 2175 - p->exit_signal = current->group_leader->exit_signal; 2176 - else 2177 - p->exit_signal = args->exit_signal; 2178 2173 p->group_leader = p; 2179 2174 p->tgid = p->pid; 2180 2175 } ··· 2213 2218 if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) { 2214 2219 p->real_parent = current->real_parent; 2215 2220 p->parent_exec_id = current->parent_exec_id; 2221 + if (clone_flags & CLONE_THREAD) 2222 + p->exit_signal = -1; 2223 + else 2224 + p->exit_signal = current->group_leader->exit_signal; 2216 2225 } else { 2217 2226 p->real_parent = current; 2218 2227 p->parent_exec_id = current->self_exec_id; 2228 + p->exit_signal = args->exit_signal; 2219 2229 } 2220 2230 2221 2231 klp_copy_process(p);
+14 -2
kernel/futex.c
··· 2380 2380 } 2381 2381 2382 2382 /* 2383 - * Since we just failed the trylock; there must be an owner. 2383 + * The trylock just failed, so either there is an owner or 2384 + * there is a higher priority waiter than this one. 2384 2385 */ 2385 2386 newowner = rt_mutex_owner(&pi_state->pi_mutex); 2386 - BUG_ON(!newowner); 2387 + /* 2388 + * If the higher priority waiter has not yet taken over the 2389 + * rtmutex then newowner is NULL. We can't return here with 2390 + * that state because it's inconsistent vs. the user space 2391 + * state. So drop the locks and try again. It's a valid 2392 + * situation and not any different from the other retry 2393 + * conditions. 2394 + */ 2395 + if (unlikely(!newowner)) { 2396 + err = -EAGAIN; 2397 + goto handle_err; 2398 + } 2387 2399 } else { 2388 2400 WARN_ON_ONCE(argowner != current); 2389 2401 if (oldowner == current) {
+1
kernel/irq/Kconfig
··· 82 82 # Generic IRQ IPI support 83 83 config GENERIC_IRQ_IPI 84 84 bool 85 + select IRQ_DOMAIN_HIERARCHY 85 86 86 87 # Generic MSI interrupt support 87 88 config GENERIC_MSI_IRQ
+1 -1
kernel/sched/cpufreq_schedutil.c
··· 881 881 struct cpufreq_governor schedutil_gov = { 882 882 .name = "schedutil", 883 883 .owner = THIS_MODULE, 884 - .dynamic_switching = true, 884 + .flags = CPUFREQ_GOV_DYNAMIC_SWITCHING, 885 885 .init = sugov_init, 886 886 .exit = sugov_exit, 887 887 .start = sugov_start,
+6 -2
net/core/devlink.c
··· 8254 8254 { 8255 8255 struct devlink_port_attrs *attrs = &devlink_port->attrs; 8256 8256 8257 - if (WARN_ON(devlink_port->registered)) 8258 - return -EEXIST; 8259 8257 devlink_port->attrs_set = true; 8260 8258 attrs->flavour = flavour; 8261 8259 if (attrs->switch_id.id_len) { ··· 8277 8279 { 8278 8280 int ret; 8279 8281 8282 + if (WARN_ON(devlink_port->registered)) 8283 + return; 8280 8284 devlink_port->attrs = *attrs; 8281 8285 ret = __devlink_port_attrs_set(devlink_port, attrs->flavour); 8282 8286 if (ret) ··· 8301 8301 struct devlink_port_attrs *attrs = &devlink_port->attrs; 8302 8302 int ret; 8303 8303 8304 + if (WARN_ON(devlink_port->registered)) 8305 + return; 8304 8306 ret = __devlink_port_attrs_set(devlink_port, 8305 8307 DEVLINK_PORT_FLAVOUR_PCI_PF); 8306 8308 if (ret) ··· 8328 8326 struct devlink_port_attrs *attrs = &devlink_port->attrs; 8329 8327 int ret; 8330 8328 8329 + if (WARN_ON(devlink_port->registered)) 8330 + return; 8331 8331 ret = __devlink_port_attrs_set(devlink_port, 8332 8332 DEVLINK_PORT_FLAVOUR_PCI_VF); 8333 8333 if (ret)
+1 -1
net/ethtool/features.c
··· 280 280 active_diff_mask, compact); 281 281 } 282 282 if (mod) 283 - ethtool_notify(dev, ETHTOOL_MSG_FEATURES_NTF, NULL); 283 + netdev_features_change(dev); 284 284 285 285 out_rtnl: 286 286 rtnl_unlock();
+2 -2
net/ipv4/ip_tunnel_core.c
··· 263 263 const struct icmphdr *icmph = icmp_hdr(skb); 264 264 const struct iphdr *iph = ip_hdr(skb); 265 265 266 - if (mtu <= 576 || iph->frag_off != htons(IP_DF)) 266 + if (mtu < 576 || iph->frag_off != htons(IP_DF)) 267 267 return 0; 268 268 269 269 if (ipv4_is_lbcast(iph->daddr) || ipv4_is_multicast(iph->daddr) || ··· 359 359 __be16 frag_off; 360 360 int offset; 361 361 362 - if (mtu <= IPV6_MIN_MTU) 362 + if (mtu < IPV6_MIN_MTU) 363 363 return 0; 364 364 365 365 if (stype == IPV6_ADDR_ANY || stype == IPV6_ADDR_MULTICAST ||
+7 -2
net/ipv4/syncookies.c
··· 331 331 __u32 cookie = ntohl(th->ack_seq) - 1; 332 332 struct sock *ret = sk; 333 333 struct request_sock *req; 334 - int mss; 334 + int full_space, mss; 335 335 struct rtable *rt; 336 336 __u8 rcv_wscale; 337 337 struct flowi4 fl4; ··· 427 427 428 428 /* Try to redo what tcp_v4_send_synack did. */ 429 429 req->rsk_window_clamp = tp->window_clamp ? :dst_metric(&rt->dst, RTAX_WINDOW); 430 + /* limit the window selection if the user enforce a smaller rx buffer */ 431 + full_space = tcp_full_space(sk); 432 + if (sk->sk_userlocks & SOCK_RCVBUF_LOCK && 433 + (req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0)) 434 + req->rsk_window_clamp = full_space; 430 435 431 - tcp_select_initial_window(sk, tcp_full_space(sk), req->mss, 436 + tcp_select_initial_window(sk, full_space, req->mss, 432 437 &req->rsk_rcv_wnd, &req->rsk_window_clamp, 433 438 ireq->wscale_ok, &rcv_wscale, 434 439 dst_metric(&rt->dst, RTAX_INITRWND));
+16 -3
net/ipv4/udp_offload.c
··· 369 369 static struct sk_buff *udp_gro_receive_segment(struct list_head *head, 370 370 struct sk_buff *skb) 371 371 { 372 - struct udphdr *uh = udp_hdr(skb); 372 + struct udphdr *uh = udp_gro_udphdr(skb); 373 373 struct sk_buff *pp = NULL; 374 374 struct udphdr *uh2; 375 375 struct sk_buff *p; ··· 503 503 } 504 504 EXPORT_SYMBOL(udp_gro_receive); 505 505 506 + static struct sock *udp4_gro_lookup_skb(struct sk_buff *skb, __be16 sport, 507 + __be16 dport) 508 + { 509 + const struct iphdr *iph = skb_gro_network_header(skb); 510 + 511 + return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport, 512 + iph->daddr, dport, inet_iif(skb), 513 + inet_sdif(skb), &udp_table, NULL); 514 + } 515 + 506 516 INDIRECT_CALLABLE_SCOPE 507 517 struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb) 508 518 { 509 519 struct udphdr *uh = udp_gro_udphdr(skb); 520 + struct sock *sk = NULL; 510 521 struct sk_buff *pp; 511 - struct sock *sk; 512 522 513 523 if (unlikely(!uh)) 514 524 goto flush; ··· 536 526 skip: 537 527 NAPI_GRO_CB(skb)->is_ipv6 = 0; 538 528 rcu_read_lock(); 539 - sk = static_branch_unlikely(&udp_encap_needed_key) ? udp4_lib_lookup_skb(skb, uh->source, uh->dest) : NULL; 529 + 530 + if (static_branch_unlikely(&udp_encap_needed_key)) 531 + sk = udp4_gro_lookup_skb(skb, uh->source, uh->dest); 532 + 540 533 pp = udp_gro_receive(head, skb, uh, sk); 541 534 rcu_read_unlock(); 542 535 return pp;
-2
net/ipv6/sit.c
··· 1128 1128 if (tdev && !netif_is_l3_master(tdev)) { 1129 1129 int t_hlen = tunnel->hlen + sizeof(struct iphdr); 1130 1130 1131 - dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr); 1132 1131 dev->mtu = tdev->mtu - t_hlen; 1133 1132 if (dev->mtu < IPV6_MIN_MTU) 1134 1133 dev->mtu = IPV6_MIN_MTU; ··· 1425 1426 dev->priv_destructor = ipip6_dev_free; 1426 1427 1427 1428 dev->type = ARPHRD_SIT; 1428 - dev->hard_header_len = LL_MAX_HEADER + t_hlen; 1429 1429 dev->mtu = ETH_DATA_LEN - t_hlen; 1430 1430 dev->min_mtu = IPV6_MIN_MTU; 1431 1431 dev->max_mtu = IP6_MAX_MTU - t_hlen;
+8 -2
net/ipv6/syncookies.c
··· 136 136 __u32 cookie = ntohl(th->ack_seq) - 1; 137 137 struct sock *ret = sk; 138 138 struct request_sock *req; 139 - int mss; 139 + int full_space, mss; 140 140 struct dst_entry *dst; 141 141 __u8 rcv_wscale; 142 142 u32 tsoff = 0; ··· 241 241 } 242 242 243 243 req->rsk_window_clamp = tp->window_clamp ? :dst_metric(dst, RTAX_WINDOW); 244 - tcp_select_initial_window(sk, tcp_full_space(sk), req->mss, 244 + /* limit the window selection if the user enforce a smaller rx buffer */ 245 + full_space = tcp_full_space(sk); 246 + if (sk->sk_userlocks & SOCK_RCVBUF_LOCK && 247 + (req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0)) 248 + req->rsk_window_clamp = full_space; 249 + 250 + tcp_select_initial_window(sk, full_space, req->mss, 245 251 &req->rsk_rcv_wnd, &req->rsk_window_clamp, 246 252 ireq->wscale_ok, &rcv_wscale, 247 253 dst_metric(dst, RTAX_INITRWND));
+15 -2
net/ipv6/udp_offload.c
··· 111 111 return segs; 112 112 } 113 113 114 + static struct sock *udp6_gro_lookup_skb(struct sk_buff *skb, __be16 sport, 115 + __be16 dport) 116 + { 117 + const struct ipv6hdr *iph = skb_gro_network_header(skb); 118 + 119 + return __udp6_lib_lookup(dev_net(skb->dev), &iph->saddr, sport, 120 + &iph->daddr, dport, inet6_iif(skb), 121 + inet6_sdif(skb), &udp_table, NULL); 122 + } 123 + 114 124 INDIRECT_CALLABLE_SCOPE 115 125 struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb) 116 126 { 117 127 struct udphdr *uh = udp_gro_udphdr(skb); 128 + struct sock *sk = NULL; 118 129 struct sk_buff *pp; 119 - struct sock *sk; 120 130 121 131 if (unlikely(!uh)) 122 132 goto flush; ··· 145 135 skip: 146 136 NAPI_GRO_CB(skb)->is_ipv6 = 1; 147 137 rcu_read_lock(); 148 - sk = static_branch_unlikely(&udpv6_encap_needed_key) ? udp6_lib_lookup_skb(skb, uh->source, uh->dest) : NULL; 138 + 139 + if (static_branch_unlikely(&udpv6_encap_needed_key)) 140 + sk = udp6_gro_lookup_skb(skb, uh->source, uh->dest); 141 + 149 142 pp = udp_gro_receive(head, skb, uh, sk); 150 143 rcu_read_unlock(); 151 144 return pp;
+2 -1
net/iucv/af_iucv.c
··· 1434 1434 break; 1435 1435 } 1436 1436 1437 - if (how == SEND_SHUTDOWN || how == SHUTDOWN_MASK) { 1437 + if ((how == SEND_SHUTDOWN || how == SHUTDOWN_MASK) && 1438 + sk->sk_state == IUCV_CONNECTED) { 1438 1439 if (iucv->transport == AF_IUCV_TRANS_IUCV) { 1439 1440 txmsg.class = 0; 1440 1441 txmsg.tag = 0;
+1
net/mptcp/protocol.c
··· 2494 2494 .memory_pressure = &tcp_memory_pressure, 2495 2495 .stream_memory_free = mptcp_memory_free, 2496 2496 .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_tcp_wmem), 2497 + .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_tcp_rmem), 2497 2498 .sysctl_mem = sysctl_tcp_mem, 2498 2499 .obj_size = sizeof(struct mptcp_sock), 2499 2500 .slab_flags = SLAB_TYPESAFE_BY_RCU,
+12 -5
net/netlabel/netlabel_unlabeled.c
··· 1166 1166 struct netlbl_unlhsh_walk_arg cb_arg; 1167 1167 u32 skip_bkt = cb->args[0]; 1168 1168 u32 skip_chain = cb->args[1]; 1169 - u32 iter_bkt; 1170 - u32 iter_chain = 0, iter_addr4 = 0, iter_addr6 = 0; 1169 + u32 skip_addr4 = cb->args[2]; 1170 + u32 iter_bkt, iter_chain, iter_addr4 = 0, iter_addr6 = 0; 1171 1171 struct netlbl_unlhsh_iface *iface; 1172 1172 struct list_head *iter_list; 1173 1173 struct netlbl_af4list *addr4; 1174 1174 #if IS_ENABLED(CONFIG_IPV6) 1175 + u32 skip_addr6 = cb->args[3]; 1175 1176 struct netlbl_af6list *addr6; 1176 1177 #endif 1177 1178 ··· 1183 1182 rcu_read_lock(); 1184 1183 for (iter_bkt = skip_bkt; 1185 1184 iter_bkt < rcu_dereference(netlbl_unlhsh)->size; 1186 - iter_bkt++, iter_chain = 0, iter_addr4 = 0, iter_addr6 = 0) { 1185 + iter_bkt++) { 1187 1186 iter_list = &rcu_dereference(netlbl_unlhsh)->tbl[iter_bkt]; 1188 1187 list_for_each_entry_rcu(iface, iter_list, list) { 1189 1188 if (!iface->valid || ··· 1191 1190 continue; 1192 1191 netlbl_af4list_foreach_rcu(addr4, 1193 1192 &iface->addr4_list) { 1194 - if (iter_addr4++ < cb->args[2]) 1193 + if (iter_addr4++ < skip_addr4) 1195 1194 continue; 1196 1195 if (netlbl_unlabel_staticlist_gen( 1197 1196 NLBL_UNLABEL_C_STATICLIST, ··· 1204 1203 goto unlabel_staticlist_return; 1205 1204 } 1206 1205 } 1206 + iter_addr4 = 0; 1207 + skip_addr4 = 0; 1207 1208 #if IS_ENABLED(CONFIG_IPV6) 1208 1209 netlbl_af6list_foreach_rcu(addr6, 1209 1210 &iface->addr6_list) { 1210 - if (iter_addr6++ < cb->args[3]) 1211 + if (iter_addr6++ < skip_addr6) 1211 1212 continue; 1212 1213 if (netlbl_unlabel_staticlist_gen( 1213 1214 NLBL_UNLABEL_C_STATICLIST, ··· 1222 1219 goto unlabel_staticlist_return; 1223 1220 } 1224 1221 } 1222 + iter_addr6 = 0; 1223 + skip_addr6 = 0; 1225 1224 #endif /* IPv6 */ 1226 1225 } 1226 + iter_chain = 0; 1227 + skip_chain = 0; 1227 1228 } 1228 1229 1229 1230 unlabel_staticlist_return:
+5 -4
net/sunrpc/sysctl.c
··· 63 63 void *buffer, size_t *lenp, loff_t *ppos) 64 64 { 65 65 char tmpbuf[256]; 66 - size_t len; 66 + ssize_t len; 67 67 68 - if ((*ppos && !write) || !*lenp) { 68 + if (write || *ppos) { 69 69 *lenp = 0; 70 70 return 0; 71 71 } 72 72 len = svc_print_xprts(tmpbuf, sizeof(tmpbuf)); 73 - *lenp = memory_read_from_buffer(buffer, *lenp, ppos, tmpbuf, len); 73 + len = memory_read_from_buffer(buffer, *lenp, ppos, tmpbuf, len); 74 74 75 - if (*lenp < 0) { 75 + if (len < 0) { 76 76 *lenp = 0; 77 77 return -EINVAL; 78 78 } 79 + *lenp = len; 79 80 return 0; 80 81 } 81 82
+8 -2
net/tipc/topsrv.c
··· 664 664 665 665 ret = tipc_topsrv_work_start(srv); 666 666 if (ret < 0) 667 - return ret; 667 + goto err_start; 668 668 669 669 ret = tipc_topsrv_create_listener(srv); 670 670 if (ret < 0) 671 - tipc_topsrv_work_stop(srv); 671 + goto err_create; 672 672 673 + return 0; 674 + 675 + err_create: 676 + tipc_topsrv_work_stop(srv); 677 + err_start: 678 + kfree(srv); 673 679 return ret; 674 680 } 675 681
+1 -1
net/x25/af_x25.c
··· 825 825 sock->state = SS_CONNECTED; 826 826 rc = 0; 827 827 out_put_neigh: 828 - if (rc) { 828 + if (rc && x25->neighbour) { 829 829 read_lock_bh(&x25_list_lock); 830 830 x25_neigh_put(x25->neighbour); 831 831 x25->neighbour = NULL;
+2 -1
net/xdp/xsk.c
··· 1146 1146 if (!sock_flag(sk, SOCK_DEAD)) 1147 1147 return; 1148 1148 1149 - xp_put_pool(xs->pool); 1149 + if (!xp_put_pool(xs->pool)) 1150 + xdp_put_umem(xs->umem); 1150 1151 1151 1152 sk_refcnt_debug_dec(sk); 1152 1153 }
+5 -2
net/xdp/xsk_buff_pool.c
··· 251 251 refcount_inc(&pool->users); 252 252 } 253 253 254 - void xp_put_pool(struct xsk_buff_pool *pool) 254 + bool xp_put_pool(struct xsk_buff_pool *pool) 255 255 { 256 256 if (!pool) 257 - return; 257 + return false; 258 258 259 259 if (refcount_dec_and_test(&pool->users)) { 260 260 INIT_WORK(&pool->work, xp_release_deferred); 261 261 schedule_work(&pool->work); 262 + return true; 262 263 } 264 + 265 + return false; 263 266 } 264 267 265 268 static struct xsk_dma_map *xp_find_dma_map(struct xsk_buff_pool *pool)
+1 -1
samples/bpf/task_fd_query_user.c
··· 290 290 291 291 int main(int argc, char **argv) 292 292 { 293 - struct rlimit r = {1024*1024, RLIM_INFINITY}; 293 + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 294 294 extern char __executable_start; 295 295 char filename[256], buf[256]; 296 296 __u64 uprobe_file_offset;
+1 -1
samples/bpf/tracex2_user.c
··· 116 116 117 117 int main(int ac, char **argv) 118 118 { 119 - struct rlimit r = {1024*1024, RLIM_INFINITY}; 119 + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 120 120 long key, next_key, value; 121 121 struct bpf_link *links[2]; 122 122 struct bpf_program *prog;
+1 -1
samples/bpf/tracex3_user.c
··· 107 107 108 108 int main(int ac, char **argv) 109 109 { 110 - struct rlimit r = {1024*1024, RLIM_INFINITY}; 110 + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 111 111 struct bpf_link *links[2]; 112 112 struct bpf_program *prog; 113 113 struct bpf_object *obj;
+1 -1
samples/bpf/xdp_redirect_cpu_user.c
··· 765 765 766 766 int main(int argc, char **argv) 767 767 { 768 - struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY}; 768 + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 769 769 char *prog_name = "xdp_cpu_map5_lb_hash_ip_pairs"; 770 770 char *mprog_filename = "xdp_redirect_kern.o"; 771 771 char *redir_interface = NULL, *redir_map = NULL;
+1 -1
samples/bpf/xdp_rxq_info_user.c
··· 450 450 int main(int argc, char **argv) 451 451 { 452 452 __u32 cfg_options= NO_TOUCH ; /* Default: Don't touch packet memory */ 453 - struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY}; 453 + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 454 454 struct bpf_prog_load_attr prog_load_attr = { 455 455 .prog_type = BPF_PROG_TYPE_XDP, 456 456 };
+1
scripts/bpf_helpers_doc.py
··· 408 408 'struct bpf_perf_event_data', 409 409 'struct bpf_perf_event_value', 410 410 'struct bpf_pidns_info', 411 + 'struct bpf_redir_neigh', 411 412 'struct bpf_sock', 412 413 'struct bpf_sock_addr', 413 414 'struct bpf_sock_ops',
+23
scripts/get_abi.pl
··· 287 287 sub output_rest { 288 288 create_labels(); 289 289 290 + my $part = ""; 291 + 290 292 foreach my $what (sort { 291 293 ($data{$a}->{type} eq "File") cmp ($data{$b}->{type} eq "File") || 292 294 $a cmp $b ··· 308 306 $w =~ s/([\(\)\_\-\*\=\^\~\\])/\\$1/g; 309 307 310 308 if ($type ne "File") { 309 + my $cur_part = $what; 310 + if ($what =~ '/') { 311 + if ($what =~ m#^(\/?(?:[\w\-]+\/?){1,2})#) { 312 + $cur_part = "Symbols under $1"; 313 + $cur_part =~ s,/$,,; 314 + } 315 + } 316 + 317 + if ($cur_part ne "" && $part ne $cur_part) { 318 + $part = $cur_part; 319 + my $bar = $part; 320 + $bar =~ s/./-/g; 321 + print "$part\n$bar\n\n"; 322 + } 323 + 311 324 printf ".. _%s:\n\n", $data{$what}->{label}; 312 325 313 326 my @names = split /, /,$w; ··· 369 352 370 353 if (!($desc =~ /^\s*$/)) { 371 354 if ($description_is_rst) { 355 + # Remove title markups from the description 356 + # Having titles inside ABI files will only work if extra 357 + # care would be taken in order to strictly follow the same 358 + # level order for each markup. 359 + $desc =~ s/\n[\-\*\=\^\~]+\n/\n\n/g; 360 + 372 361 # Enrich text by creating cross-references 373 362 374 363 $desc =~ s,Documentation/(?!devicetree)(\S+)\.rst,:doc:`/$1`,g;
+6 -1
tools/bpf/bpftool/feature.c
··· 843 843 else 844 844 p_err("missing %s%s%s%s%s%s%s%srequired for full feature probing; run as root or use 'unprivileged'", 845 845 capability_msg(bpf_caps, 0), 846 + #ifdef CAP_BPF 846 847 capability_msg(bpf_caps, 1), 847 848 capability_msg(bpf_caps, 2), 848 - capability_msg(bpf_caps, 3)); 849 + capability_msg(bpf_caps, 3) 850 + #else 851 + "", "", "", "", "", "" 852 + #endif /* CAP_BPF */ 853 + ); 849 854 goto exit_free; 850 855 } 851 856
+1 -1
tools/bpf/bpftool/prog.c
··· 940 940 } 941 941 942 942 if (*attach_type == BPF_FLOW_DISSECTOR) { 943 - *mapfd = -1; 943 + *mapfd = 0; 944 944 return 0; 945 945 } 946 946
+2 -2
tools/bpf/bpftool/skeleton/profiler.bpf.c
··· 70 70 static inline void 71 71 fexit_update_maps(u32 id, struct bpf_perf_event_value *after) 72 72 { 73 - struct bpf_perf_event_value *before, diff, *accum; 73 + struct bpf_perf_event_value *before, diff; 74 74 75 75 before = bpf_map_lookup_elem(&fentry_readings, &id); 76 76 /* only account samples with a valid fentry_reading */ ··· 95 95 { 96 96 struct bpf_perf_event_value readings[MAX_NUM_MATRICS]; 97 97 u32 cpu = bpf_get_smp_processor_id(); 98 - u32 i, one = 1, zero = 0; 98 + u32 i, zero = 0; 99 99 int err; 100 100 u64 *count; 101 101
+9 -6
tools/lib/bpf/hashmap.h
··· 15 15 static inline size_t hash_bits(size_t h, int bits) 16 16 { 17 17 /* shuffle bits and return requested number of upper bits */ 18 + if (bits == 0) 19 + return 0; 20 + 18 21 #if (__SIZEOF_SIZE_T__ == __SIZEOF_LONG_LONG__) 19 22 /* LP64 case */ 20 23 return (h * 11400714819323198485llu) >> (__SIZEOF_LONG_LONG__ * 8 - bits); ··· 177 174 * @key: key to iterate entries for 178 175 */ 179 176 #define hashmap__for_each_key_entry(map, cur, _key) \ 180 - for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\ 181 - map->cap_bits); \ 182 - map->buckets ? map->buckets[bkt] : NULL; }); \ 177 + for (cur = map->buckets \ 178 + ? map->buckets[hash_bits(map->hash_fn((_key), map->ctx), map->cap_bits)] \ 179 + : NULL; \ 183 180 cur; \ 184 181 cur = cur->next) \ 185 182 if (map->equal_fn(cur->key, (_key), map->ctx)) 186 183 187 184 #define hashmap__for_each_key_entry_safe(map, cur, tmp, _key) \ 188 - for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\ 189 - map->cap_bits); \ 190 - cur = map->buckets ? map->buckets[bkt] : NULL; }); \ 185 + for (cur = map->buckets \ 186 + ? map->buckets[hash_bits(map->hash_fn((_key), map->ctx), map->cap_bits)] \ 187 + : NULL; \ 191 188 cur && ({ tmp = cur->next; true; }); \ 192 189 cur = tmp) \ 193 190 if (map->equal_fn(cur->key, (_key), map->ctx))
+6 -3
tools/lib/bpf/xsk.c
··· 891 891 void xsk_socket__delete(struct xsk_socket *xsk) 892 892 { 893 893 size_t desc_sz = sizeof(struct xdp_desc); 894 - struct xsk_ctx *ctx = xsk->ctx; 895 894 struct xdp_mmap_offsets off; 895 + struct xsk_umem *umem; 896 + struct xsk_ctx *ctx; 896 897 int err; 897 898 898 899 if (!xsk) 899 900 return; 900 901 902 + ctx = xsk->ctx; 903 + umem = ctx->umem; 901 904 if (ctx->prog_fd != -1) { 902 905 xsk_delete_bpf_maps(xsk); 903 906 close(ctx->prog_fd); ··· 920 917 921 918 xsk_put_ctx(ctx); 922 919 923 - ctx->umem->refcount--; 920 + umem->refcount--; 924 921 /* Do not close an fd that also has an associated umem connected 925 922 * to it. 926 923 */ 927 - if (xsk->fd != ctx->umem->fd) 924 + if (xsk->fd != umem->fd) 928 925 close(xsk->fd); 929 926 free(xsk); 930 927 }
+2 -1
tools/power/x86/turbostat/Makefile
··· 12 12 override CFLAGS += -O2 -Wall -I../../../include 13 13 override CFLAGS += -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"' 14 14 override CFLAGS += -DINTEL_FAMILY_HEADER='"../../../../arch/x86/include/asm/intel-family.h"' 15 + override CFLAGS += -D_FILE_OFFSET_BITS=64 15 16 override CFLAGS += -D_FORTIFY_SOURCE=2 16 17 17 18 %: %.c 18 19 @mkdir -p $(BUILD_OUTPUT) 19 - $(CC) $(CFLAGS) $< -o $(BUILD_OUTPUT)/$@ $(LDFLAGS) -lcap 20 + $(CC) $(CFLAGS) $< -o $(BUILD_OUTPUT)/$@ $(LDFLAGS) -lcap -lrt 20 21 21 22 .PHONY : clean 22 23 clean :
+1 -1
tools/power/x86/turbostat/turbostat.8
··· 335 335 336 336 .SH REFERENCES 337 337 Volume 3B: System Programming Guide" 338 - http://www.intel.com/products/processor/manuals/ 338 + https://www.intel.com/products/processor/manuals/ 339 339 340 340 .SH FILES 341 341 .ta
+450 -123
tools/power/x86/turbostat/turbostat.c
··· 79 79 unsigned long long cpuidle_cur_cpu_lpi_us; 80 80 unsigned long long cpuidle_cur_sys_lpi_us; 81 81 unsigned int gfx_cur_mhz; 82 + unsigned int gfx_act_mhz; 82 83 unsigned int tcc_activation_temp; 83 84 unsigned int tcc_activation_temp_override; 84 85 double rapl_power_units, rapl_time_units; ··· 211 210 unsigned long long pkg_both_core_gfxe_c0; 212 211 long long gfx_rc6_ms; 213 212 unsigned int gfx_mhz; 213 + unsigned int gfx_act_mhz; 214 214 unsigned int package_id; 215 - unsigned int energy_pkg; /* MSR_PKG_ENERGY_STATUS */ 216 - unsigned int energy_dram; /* MSR_DRAM_ENERGY_STATUS */ 217 - unsigned int energy_cores; /* MSR_PP0_ENERGY_STATUS */ 218 - unsigned int energy_gfx; /* MSR_PP1_ENERGY_STATUS */ 219 - unsigned int rapl_pkg_perf_status; /* MSR_PKG_PERF_STATUS */ 220 - unsigned int rapl_dram_perf_status; /* MSR_DRAM_PERF_STATUS */ 215 + unsigned long long energy_pkg; /* MSR_PKG_ENERGY_STATUS */ 216 + unsigned long long energy_dram; /* MSR_DRAM_ENERGY_STATUS */ 217 + unsigned long long energy_cores; /* MSR_PP0_ENERGY_STATUS */ 218 + unsigned long long energy_gfx; /* MSR_PP1_ENERGY_STATUS */ 219 + unsigned long long rapl_pkg_perf_status; /* MSR_PKG_PERF_STATUS */ 220 + unsigned long long rapl_dram_perf_status; /* MSR_DRAM_PERF_STATUS */ 221 221 unsigned int pkg_temp_c; 222 222 unsigned long long counter[MAX_ADDED_COUNTERS]; 223 223 } *package_even, *package_odd; ··· 261 259 #define SYSFS_PERCPU (1 << 1) 262 260 }; 263 261 262 + /* 263 + * The accumulated sum of MSR is defined as a monotonic 264 + * increasing MSR, it will be accumulated periodically, 265 + * despite its register's bit width.
266 + */ 267 + enum { 268 + IDX_PKG_ENERGY, 269 + IDX_DRAM_ENERGY, 270 + IDX_PP0_ENERGY, 271 + IDX_PP1_ENERGY, 272 + IDX_PKG_PERF, 273 + IDX_DRAM_PERF, 274 + IDX_COUNT, 275 + }; 276 + 277 + int get_msr_sum(int cpu, off_t offset, unsigned long long *msr); 278 + 279 + struct msr_sum_array { 280 + /* get_msr_sum() = sum + (get_msr() - last) */ 281 + struct { 282 + /*The accumulated MSR value is updated by the timer*/ 283 + unsigned long long sum; 284 + /*The MSR footprint recorded in last timer*/ 285 + unsigned long long last; 286 + } entries[IDX_COUNT]; 287 + }; 288 + 289 + /* The percpu MSR sum array.*/ 290 + struct msr_sum_array *per_cpu_msr_sum; 291 + 292 + int idx_to_offset(int idx) 293 + { 294 + int offset; 295 + 296 + switch (idx) { 297 + case IDX_PKG_ENERGY: 298 + offset = MSR_PKG_ENERGY_STATUS; 299 + break; 300 + case IDX_DRAM_ENERGY: 301 + offset = MSR_DRAM_ENERGY_STATUS; 302 + break; 303 + case IDX_PP0_ENERGY: 304 + offset = MSR_PP0_ENERGY_STATUS; 305 + break; 306 + case IDX_PP1_ENERGY: 307 + offset = MSR_PP1_ENERGY_STATUS; 308 + break; 309 + case IDX_PKG_PERF: 310 + offset = MSR_PKG_PERF_STATUS; 311 + break; 312 + case IDX_DRAM_PERF: 313 + offset = MSR_DRAM_PERF_STATUS; 314 + break; 315 + default: 316 + offset = -1; 317 + } 318 + return offset; 319 + } 320 + 321 + int offset_to_idx(int offset) 322 + { 323 + int idx; 324 + 325 + switch (offset) { 326 + case MSR_PKG_ENERGY_STATUS: 327 + idx = IDX_PKG_ENERGY; 328 + break; 329 + case MSR_DRAM_ENERGY_STATUS: 330 + idx = IDX_DRAM_ENERGY; 331 + break; 332 + case MSR_PP0_ENERGY_STATUS: 333 + idx = IDX_PP0_ENERGY; 334 + break; 335 + case MSR_PP1_ENERGY_STATUS: 336 + idx = IDX_PP1_ENERGY; 337 + break; 338 + case MSR_PKG_PERF_STATUS: 339 + idx = IDX_PKG_PERF; 340 + break; 341 + case MSR_DRAM_PERF_STATUS: 342 + idx = IDX_DRAM_PERF; 343 + break; 344 + default: 345 + idx = -1; 346 + } 347 + return idx; 348 + } 349 + 350 + int idx_valid(int idx) 351 + { 352 + switch (idx) { 353 + case IDX_PKG_ENERGY: 354 + return do_rapl & RAPL_PKG;
355 + case IDX_DRAM_ENERGY: 356 + return do_rapl & RAPL_DRAM; 357 + case IDX_PP0_ENERGY: 358 + return do_rapl & RAPL_CORES_ENERGY_STATUS; 359 + case IDX_PP1_ENERGY: 360 + return do_rapl & RAPL_GFX; 361 + case IDX_PKG_PERF: 362 + return do_rapl & RAPL_PKG_PERF_STATUS; 363 + case IDX_DRAM_PERF: 364 + return do_rapl & RAPL_DRAM_PERF_STATUS; 365 + default: 366 + return 0; 367 + } 368 + } 264 369 struct sys_counters { 265 370 unsigned int added_thread_counters; 266 371 unsigned int added_core_counters; ··· 560 451 { 0x0, "APIC" }, 561 452 { 0x0, "X2APIC" }, 562 453 { 0x0, "Die" }, 454 + { 0x0, "GFXAMHz" }, 563 455 }; 564 456 565 457 #define MAX_BIC (sizeof(bic) / sizeof(struct msr_counter)) ··· 615 505 #define BIC_APIC (1ULL << 48) 616 506 #define BIC_X2APIC (1ULL << 49) 617 507 #define BIC_Die (1ULL << 50) 508 + #define BIC_GFXACTMHz (1ULL << 51) 618 509 619 510 #define BIC_DISABLED_BY_DEFAULT (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC) ··· 835 724 if (DO_BIC(BIC_GFXMHz)) 836 725 outp += sprintf(outp, "%sGFXMHz", (printed++ ? delim : "")); 837 726 727 + if (DO_BIC(BIC_GFXACTMHz)) 728 + outp += sprintf(outp, "%sGFXAMHz", (printed++ ? delim : "")); 729 + 838 730 if (DO_BIC(BIC_Totl_c0)) 839 731 outp += sprintf(outp, "%sTotl%%C0", (printed++ ? delim : ""));
840 732 if (DO_BIC(BIC_Any_c0)) ··· 972 858 outp += sprintf(outp, "pc10: %016llX\n", p->pc10); 973 859 outp += sprintf(outp, "cpu_lpi: %016llX\n", p->cpu_lpi); 974 860 outp += sprintf(outp, "sys_lpi: %016llX\n", p->sys_lpi); 975 - outp += sprintf(outp, "Joules PKG: %0X\n", p->energy_pkg); 976 - outp += sprintf(outp, "Joules COR: %0X\n", p->energy_cores); 977 - outp += sprintf(outp, "Joules GFX: %0X\n", p->energy_gfx); 978 - outp += sprintf(outp, "Joules RAM: %0X\n", p->energy_dram); 979 - outp += sprintf(outp, "Throttle PKG: %0X\n", 861 + outp += sprintf(outp, "Joules PKG: %0llX\n", p->energy_pkg); 862 + outp += sprintf(outp, "Joules COR: %0llX\n", p->energy_cores); 863 + outp += sprintf(outp, "Joules GFX: %0llX\n", p->energy_gfx); 864 + outp += sprintf(outp, "Joules RAM: %0llX\n", p->energy_dram); 865 + outp += sprintf(outp, "Throttle PKG: %0llX\n", 980 866 p->rapl_pkg_perf_status); 981 - outp += sprintf(outp, "Throttle RAM: %0X\n", 867 + outp += sprintf(outp, "Throttle RAM: %0llX\n", 982 868 p->rapl_dram_perf_status); 983 869 outp += sprintf(outp, "PTM: %dC\n", p->pkg_temp_c); ··· 1176 1062 } 1177 1063 } 1178 1064 1179 - /* 1180 - * If measurement interval exceeds minimum RAPL Joule Counter range, 1181 - * indicate that results are suspect by printing "**" in fraction place. 1182 - */ 1183 - if (interval_float < rapl_joule_counter_range) 1184 - fmt8 = "%s%.2f"; 1185 - else 1186 - fmt8 = "%6.0f**"; 1065 + fmt8 = "%s%.2f"; 1187 1066 1188 1067 if (DO_BIC(BIC_CorWatt) && (do_rapl & RAPL_PER_CORE_ENERGY)) 1189 1068 outp += sprintf(outp, fmt8, (printed++ ? delim : ""), c->core_energy * rapl_energy_units / interval_float); ··· 1204 1097 /* GFXMHz */ 1205 1098 if (DO_BIC(BIC_GFXMHz)) 1206 1099 outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), p->gfx_mhz); 1100 + 1101 + /* GFXACTMHz */ 1102 + if (DO_BIC(BIC_GFXACTMHz)) 1103 + outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), p->gfx_act_mhz);
1207 1104 1208 1105 /* Totl%C0, Any%C0 GFX%C0 CPUGFX% */ 1209 1106 if (DO_BIC(BIC_Totl_c0)) ··· 1321 1210 } 1322 1211 1323 1212 #define DELTA_WRAP32(new, old) \ 1324 - if (new > old) { \ 1325 - old = new - old; \ 1326 - } else { \ 1327 - old = 0x100000000 + new - old; \ 1328 - } 1213 + old = ((((unsigned long long)new << 32) - ((unsigned long long)old << 32)) >> 32); 1329 1214 1330 1215 int 1331 1216 delta_package(struct pkg_data *new, struct pkg_data *old) ··· 1360 1253 old->gfx_rc6_ms = new->gfx_rc6_ms - old->gfx_rc6_ms; 1361 1254 1362 1255 old->gfx_mhz = new->gfx_mhz; 1256 + old->gfx_act_mhz = new->gfx_act_mhz; 1363 1257 1364 - DELTA_WRAP32(new->energy_pkg, old->energy_pkg); 1365 - DELTA_WRAP32(new->energy_cores, old->energy_cores); 1366 - DELTA_WRAP32(new->energy_gfx, old->energy_gfx); 1367 - DELTA_WRAP32(new->energy_dram, old->energy_dram); 1368 - DELTA_WRAP32(new->rapl_pkg_perf_status, old->rapl_pkg_perf_status); 1369 - DELTA_WRAP32(new->rapl_dram_perf_status, old->rapl_dram_perf_status); 1258 + old->energy_pkg = new->energy_pkg - old->energy_pkg; 1259 + old->energy_cores = new->energy_cores - old->energy_cores; 1260 + old->energy_gfx = new->energy_gfx - old->energy_gfx; 1261 + old->energy_dram = new->energy_dram - old->energy_dram; 1262 + old->rapl_pkg_perf_status = new->rapl_pkg_perf_status - old->rapl_pkg_perf_status; 1263 + old->rapl_dram_perf_status = new->rapl_dram_perf_status - old->rapl_dram_perf_status; 1370 1264 1371 1265 for (i = 0, mp = sys.pp; mp; i++, mp = mp->next) { 1372 1266 if (mp->format == FORMAT_RAW) ··· 1577 1469 1578 1470 p->gfx_rc6_ms = 0; 1579 1471 p->gfx_mhz = 0; 1472 + p->gfx_act_mhz = 0; 1580 1473 for (i = 0, mp = sys.tp; mp; i++, mp = mp->next) 1581 1474 t->counter[i] = 0; 1582 1475 ··· 1673 1564 1674 1565 average.packages.gfx_rc6_ms = p->gfx_rc6_ms; 1675 1566 average.packages.gfx_mhz = p->gfx_mhz; 1567 + average.packages.gfx_act_mhz = p->gfx_act_mhz; 1676 1568 1677 1569 average.packages.pkg_temp_c = MAX(average.packages.pkg_temp_c, p->pkg_temp_c);
1678 1570 ··· 1894 1784 int i; 1895 1785 1896 1786 if (cpu_migrate(cpu)) { 1897 - fprintf(outf, "Could not migrate to CPU %d\n", cpu); 1787 + fprintf(outf, "get_counters: Could not migrate to CPU %d\n", cpu); 1898 1788 return -1; 1899 1789 } 1900 1790 ··· 2076 1966 p->sys_lpi = cpuidle_cur_sys_lpi_us; 2077 1967 2078 1968 if (do_rapl & RAPL_PKG) { 2079 - if (get_msr(cpu, MSR_PKG_ENERGY_STATUS, &msr)) 1969 + if (get_msr_sum(cpu, MSR_PKG_ENERGY_STATUS, &msr)) 2080 1970 return -13; 2081 - p->energy_pkg = msr & 0xFFFFFFFF; 1971 + p->energy_pkg = msr; 2082 1972 } 2083 1973 if (do_rapl & RAPL_CORES_ENERGY_STATUS) { 2084 - if (get_msr(cpu, MSR_PP0_ENERGY_STATUS, &msr)) 1974 + if (get_msr_sum(cpu, MSR_PP0_ENERGY_STATUS, &msr)) 2085 1975 return -14; 2086 - p->energy_cores = msr & 0xFFFFFFFF; 1976 + p->energy_cores = msr; 2087 1977 } 2088 1978 if (do_rapl & RAPL_DRAM) { 2089 - if (get_msr(cpu, MSR_DRAM_ENERGY_STATUS, &msr)) 1979 + if (get_msr_sum(cpu, MSR_DRAM_ENERGY_STATUS, &msr)) 2090 1980 return -15; 2091 - p->energy_dram = msr & 0xFFFFFFFF; 1981 + p->energy_dram = msr; 2092 1982 } 2093 1983 if (do_rapl & RAPL_GFX) { 2094 - if (get_msr(cpu, MSR_PP1_ENERGY_STATUS, &msr)) 1984 + if (get_msr_sum(cpu, MSR_PP1_ENERGY_STATUS, &msr)) 2095 1985 return -16; 2096 - p->energy_gfx = msr & 0xFFFFFFFF; 1986 + p->energy_gfx = msr; 2097 1987 } 2098 1988 if (do_rapl & RAPL_PKG_PERF_STATUS) { 2099 - if (get_msr(cpu, MSR_PKG_PERF_STATUS, &msr)) 1989 + if (get_msr_sum(cpu, MSR_PKG_PERF_STATUS, &msr)) 2100 1990 return -16; 2101 - p->rapl_pkg_perf_status = msr & 0xFFFFFFFF; 1991 + p->rapl_pkg_perf_status = msr; 2102 1992 } 2103 1993 if (do_rapl & RAPL_DRAM_PERF_STATUS) { 2104 - if (get_msr(cpu, MSR_DRAM_PERF_STATUS, &msr)) 1994 + if (get_msr_sum(cpu, MSR_DRAM_PERF_STATUS, &msr)) 2105 1995 return -16; 2106 - p->rapl_dram_perf_status = msr & 0xFFFFFFFF; 1996 + p->rapl_dram_perf_status = msr; 2107 1997 } 2108 1998 if (do_rapl & RAPL_AMD_F17H) {
2109 - if (get_msr(cpu, MSR_PKG_ENERGY_STAT, &msr)) 1999 + if (get_msr_sum(cpu, MSR_PKG_ENERGY_STAT, &msr)) 2110 2000 return -13; 2111 - p->energy_pkg = msr & 0xFFFFFFFF; 2001 + p->energy_pkg = msr; 2112 2002 } 2113 2003 if (DO_BIC(BIC_PkgTmp)) { 2114 2004 if (get_msr(cpu, MSR_IA32_PACKAGE_THERM_STATUS, &msr)) ··· 2121 2011 2122 2012 if (DO_BIC(BIC_GFXMHz)) 2123 2013 p->gfx_mhz = gfx_cur_mhz; 2014 + 2015 + if (DO_BIC(BIC_GFXACTMHz)) 2016 + p->gfx_act_mhz = gfx_act_mhz; 2124 2017 2125 2018 for (i = 0, mp = sys.pp; mp; i++, mp = mp->next) { 2126 2019 if (get_mp(cpu, mp, &p->counter[i])) ··· 2286 2173 case INTEL_FAM6_ATOM_GOLDMONT: 2287 2174 case INTEL_FAM6_SKYLAKE_X: 2288 2175 case INTEL_FAM6_ATOM_GOLDMONT_D: 2176 + case INTEL_FAM6_ATOM_TREMONT_D: 2289 2177 return 1; 2290 2178 } 2291 2179 return 0; ··· 2764 2650 2765 2651 sprintf(path, 2766 2652 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings", cpu); 2767 - filep = fopen_or_die(path, "r"); 2653 + filep = fopen(path, "r"); 2654 + 2655 + if (!filep) { 2656 + warnx("%s: open failed", path); 2657 + return -1; 2658 + } 2768 2659 do { 2769 2660 offset -= BITMASK_SIZE; 2770 2661 if (fscanf(filep, "%lx%c", &map, &character) != 2) ··· 2882 2763 { 2883 2764 free_all_buffers(); 2884 2765 setup_all_buffers(); 2885 - printf("turbostat: re-initialized with num_cpus %d\n", topo.num_cpus); 2766 + fprintf(outf, "turbostat: re-initialized with num_cpus %d\n", topo.num_cpus); 2886 2767 } 2887 2768 2888 2769 void set_max_cpu_num(void) 2889 2770 { 2890 2771 FILE *filep; 2772 + int base_cpu; 2891 2773 unsigned long dummy; 2774 + char pathname[64]; 2892 2775 2776 + base_cpu = sched_getcpu(); 2777 + if (base_cpu < 0) 2778 + err(1, "cannot find calling cpu ID"); 2779 + sprintf(pathname, 2780 + "/sys/devices/system/cpu/cpu%d/topology/thread_siblings", 2781 + base_cpu); 2782 + 2783 + filep = fopen_or_die(pathname, "r"); 2893 2784 topo.max_cpu_num = 0; 2894 - filep = fopen_or_die(
2895 - "/sys/devices/system/cpu/cpu0/topology/thread_siblings", 2896 - "r"); 2897 2785 while (fscanf(filep, "%lx,", &dummy) == 1) 2898 2786 topo.max_cpu_num += BITMASK_SIZE; 2899 2787 fclose(filep); ··· 3042 2916 } 3043 2917 3044 2918 /* 2919 + * snapshot_gfx_cur_mhz() 2920 + * 2921 + * record snapshot of 2922 + * /sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz 2923 + * 2924 + * return 1 if config change requires a restart, else return 0 2925 + */ 2926 + int snapshot_gfx_act_mhz(void) 2927 + { 2928 + static FILE *fp; 2929 + int retval; 2930 + 2931 + if (fp == NULL) 2932 + fp = fopen_or_die("/sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz", "r"); 2933 + else { 2934 + rewind(fp); 2935 + fflush(fp); 2936 + } 2937 + 2938 + retval = fscanf(fp, "%d", &gfx_act_mhz); 2939 + if (retval != 1) 2940 + err(1, "GFX ACT MHz"); 2941 + 2942 + return 0; 2943 + } 2944 + 2945 + /* 3045 2946 * snapshot_cpu_lpi() 3046 2947 * 3047 2948 * record snapshot of ··· 3132 2979 3133 2980 if (DO_BIC(BIC_GFXMHz)) 3134 2981 snapshot_gfx_mhz(); 2982 + 2983 + if (DO_BIC(BIC_GFXACTMHz)) 2984 + snapshot_gfx_act_mhz(); 3135 2985 3136 2986 if (DO_BIC(BIC_CPU_LPI)) 3137 2987 snapshot_cpu_lpi_us(); ··· 3213 3057 } 3214 3058 } 3215 3059 3060 + int get_msr_sum(int cpu, off_t offset, unsigned long long *msr) 3061 + { 3062 + int ret, idx; 3063 + unsigned long long msr_cur, msr_last; 3064 + 3065 + if (!per_cpu_msr_sum) 3066 + return 1; 3067 + 3068 + idx = offset_to_idx(offset); 3069 + if (idx < 0) 3070 + return idx; 3071 + /* get_msr_sum() = sum + (get_msr() - last) */ 3072 + ret = get_msr(cpu, offset, &msr_cur); 3073 + if (ret) 3074 + return ret; 3075 + msr_last = per_cpu_msr_sum[cpu].entries[idx].last; 3076 + DELTA_WRAP32(msr_cur, msr_last); 3077 + *msr = msr_last + per_cpu_msr_sum[cpu].entries[idx].sum; 3078 + 3079 + return 0; 3080 + } 3081 + 3082 + timer_t timerid; 3083 + 3084 + /* Timer callback, update the sum of MSRs periodically. */
3085 + static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg_data *p) 3086 + { 3087 + int i, ret; 3088 + int cpu = t->cpu_id; 3089 + 3090 + for (i = IDX_PKG_ENERGY; i < IDX_COUNT; i++) { 3091 + unsigned long long msr_cur, msr_last; 3092 + int offset; 3093 + 3094 + if (!idx_valid(i)) 3095 + continue; 3096 + offset = idx_to_offset(i); 3097 + if (offset < 0) 3098 + continue; 3099 + ret = get_msr(cpu, offset, &msr_cur); 3100 + if (ret) { 3101 + fprintf(outf, "Can not update msr(0x%x)\n", offset); 3102 + continue; 3103 + } 3104 + 3105 + msr_last = per_cpu_msr_sum[cpu].entries[i].last; 3106 + per_cpu_msr_sum[cpu].entries[i].last = msr_cur & 0xffffffff; 3107 + 3108 + DELTA_WRAP32(msr_cur, msr_last); 3109 + per_cpu_msr_sum[cpu].entries[i].sum += msr_last; 3110 + } 3111 + return 0; 3112 + } 3113 + 3114 + static void 3115 + msr_record_handler(union sigval v) 3116 + { 3117 + for_all_cpus(update_msr_sum, EVEN_COUNTERS); 3118 + } 3119 + 3120 + void msr_sum_record(void) 3121 + { 3122 + struct itimerspec its; 3123 + struct sigevent sev; 3124 + 3125 + per_cpu_msr_sum = calloc(topo.max_cpu_num + 1, sizeof(struct msr_sum_array)); 3126 + if (!per_cpu_msr_sum) { 3127 + fprintf(outf, "Can not allocate memory for long time MSR.\n"); 3128 + return; 3129 + } 3130 + /* 3131 + * Signal handler might be restricted, so use thread notifier instead. 3132 + */ 3133 + memset(&sev, 0, sizeof(struct sigevent)); 3134 + sev.sigev_notify = SIGEV_THREAD; 3135 + sev.sigev_notify_function = msr_record_handler; 3136 + 3137 + sev.sigev_value.sival_ptr = &timerid; 3138 + if (timer_create(CLOCK_REALTIME, &sev, &timerid) == -1) { 3139 + fprintf(outf, "Can not create timer.\n"); 3140 + goto release_msr; 3141 + } 3142 + 3143 + its.it_value.tv_sec = 0; 3144 + its.it_value.tv_nsec = 1; 3145 + /* 3146 + * A wraparound time has been calculated early.
3147 + * Some sources state that the peak power for a 3148 + * microprocessor is usually 1.5 times the TDP rating, 3149 + * use 2 * TDP for safety. 3150 + */ 3151 + its.it_interval.tv_sec = rapl_joule_counter_range / 2; 3152 + its.it_interval.tv_nsec = 0; 3153 + 3154 + if (timer_settime(timerid, 0, &its, NULL) == -1) { 3155 + fprintf(outf, "Can not set timer.\n"); 3156 + goto release_timer; 3157 + } 3158 + return; 3159 + 3160 + release_timer: 3161 + timer_delete(timerid); 3162 + release_msr: 3163 + free(per_cpu_msr_sum); 3164 + } 3216 3165 3217 3166 void turbostat_loop() 3218 3167 { ··· 3336 3075 if (retval < -1) { 3337 3076 exit(retval); 3338 3077 } else if (retval == -1) { 3339 - if (restarted > 1) { 3078 + if (restarted > 10) { 3340 3079 exit(retval); 3341 3080 } 3342 3081 re_initialize(); ··· 3540 3279 case INTEL_FAM6_ATOM_GOLDMONT_PLUS: 3541 3280 case INTEL_FAM6_ATOM_GOLDMONT_D: /* DNV */ 3542 3281 case INTEL_FAM6_ATOM_TREMONT: /* EHL */ 3282 + case INTEL_FAM6_ATOM_TREMONT_D: /* JVL */ 3543 3283 pkg_cstate_limits = glm_pkg_cstate_limits; 3544 3284 break; 3545 3285 default: ··· 3619 3357 3620 3358 switch (model) { 3621 3359 case INTEL_FAM6_ATOM_TREMONT: 3360 + return 1; 3361 + } 3362 + return 0; 3363 + } 3364 + int is_jvl(unsigned int family, unsigned int model) 3365 + { 3366 + if (!genuine_intel) 3367 + return 0; 3368 + 3369 + switch (model) { 3370 + case INTEL_FAM6_ATOM_TREMONT_D: 3622 3371 return 1; 3623 3372 } 3624 3373 return 0; ··· 3748 3475 } 3749 3476 3750 3477 static void 3478 + remove_underbar(char *s) 3479 + { 3480 + char *to = s; 3481 + 3482 + while (*s) { 3483 + if (*s != '_') 3484 + *to++ = *s; 3485 + s++; 3486 + } 3487 + 3488 + *to = 0; 3489 + } 3490 + 3491 + static void 3751 3492 dump_cstate_pstate_config_info(unsigned int family, unsigned int model) 3752 3493 { 3753 3494 if (!do_nhm_platform_info) ··· 3817 3530 int state; 3818 3531 char *sp; 3819 3532 3820 - if (!DO_BIC(BIC_sysfs)) 3821 - return; 3822 - 3823 3533 if (access("/sys/devices/system/cpu/cpuidle", R_OK)) {
3824 3534 fprintf(outf, "cpuidle not loaded\n"); 3825 3535 return; ··· 3842 3558 sp = strchrnul(name_buf, '\n'); 3843 3559 *sp = '\0'; 3844 3560 fclose(input); 3561 + 3562 + remove_underbar(name_buf); 3845 3563 3846 3564 sprintf(path, "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/desc", 3847 3565 base_cpu, state); ··· 3931 3645 return 0; 3932 3646 3933 3647 if (cpu_migrate(cpu)) { 3934 - fprintf(outf, "Could not migrate to CPU %d\n", cpu); 3648 + fprintf(outf, "print_epb: Could not migrate to CPU %d\n", cpu); 3935 3649 return -1; 3936 3650 } 3937 3651 ··· 3975 3689 return 0; 3976 3690 3977 3691 if (cpu_migrate(cpu)) { 3978 - fprintf(outf, "Could not migrate to CPU %d\n", cpu); 3692 + fprintf(outf, "print_hwp: Could not migrate to CPU %d\n", cpu); 3979 3693 return -1; 3980 3694 } 3981 3695 ··· 4063 3777 return 0; 4064 3778 4065 3779 if (cpu_migrate(cpu)) { 4066 - fprintf(outf, "Could not migrate to CPU %d\n", cpu); 3780 + fprintf(outf, "print_perf_limit: Could not migrate to CPU %d\n", cpu); 4067 3781 return -1; 4068 3782 } 4069 3783 ··· 4167 3881 4168 3882 double get_tdp_amd(unsigned int family) 4169 3883 { 4170 - switch (family) { 4171 - case 0x17: 4172 - case 0x18: 4173 - default: 4174 - /* This is the max stock TDP of HEDT/Server Fam17h chips */ 4175 - return 250.0; 4176 - } 3884 + /* This is the max stock TDP of HEDT/Server Fam17h+ chips */ 3885 + return 280.0; 4177 3886 } 4178 3887 4179 3888 /* ··· 4239 3958 BIC_PRESENT(BIC_RAMWatt); 4240 3959 BIC_PRESENT(BIC_GFXWatt); 4241 3960 } 3961 + break; 3962 + case INTEL_FAM6_ATOM_TREMONT_D: /* JVL */ 3963 + do_rapl = RAPL_PKG | RAPL_PKG_PERF_STATUS | RAPL_PKG_POWER_INFO; 3964 + BIC_PRESENT(BIC_PKG__); 3965 + if (rapl_joules) 3966 + BIC_PRESENT(BIC_Pkg_J); 3967 + else 3968 + BIC_PRESENT(BIC_PkgWatt); 4242 3969 break; 4243 3970 case INTEL_FAM6_SKYLAKE_L: /* SKL */ 4244 3971 case INTEL_FAM6_CANNONLAKE_L: /* CNL */ ··· 4358 4069 4359 4070 if (max_extended_level >= 0x80000007) {
4360 4071 __cpuid(0x80000007, eax, ebx, ecx, edx); 4361 - /* RAPL (Fam 17h) */ 4072 + /* RAPL (Fam 17h+) */ 4362 4073 has_rapl = edx & (1 << 14); 4363 4074 } 4364 4075 4365 - if (!has_rapl) 4076 + if (!has_rapl || family < 0x17) 4366 4077 return; 4367 4078 4368 - switch (family) { 4369 - case 0x17: /* Zen, Zen+ */ 4370 - case 0x18: /* Hygon Dhyana */ 4371 - do_rapl = RAPL_AMD_F17H | RAPL_PER_CORE_ENERGY; 4372 - if (rapl_joules) { 4373 - BIC_PRESENT(BIC_Pkg_J); 4374 - BIC_PRESENT(BIC_Cor_J); 4375 - } else { 4376 - BIC_PRESENT(BIC_PkgWatt); 4377 - BIC_PRESENT(BIC_CorWatt); 4378 - } 4379 - break; 4380 - default: 4381 - return; 4079 + do_rapl = RAPL_AMD_F17H | RAPL_PER_CORE_ENERGY; 4080 + if (rapl_joules) { 4081 + BIC_PRESENT(BIC_Pkg_J); 4082 + BIC_PRESENT(BIC_Cor_J); 4083 + } else { 4084 + BIC_PRESENT(BIC_PkgWatt); 4085 + BIC_PRESENT(BIC_CorWatt); 4382 4086 } 4383 4087 4384 4088 if (get_msr(base_cpu, MSR_RAPL_PWR_UNIT, &msr)) ··· 4444 4162 return 0; 4445 4163 4446 4164 if (cpu_migrate(cpu)) { 4447 - fprintf(outf, "Could not migrate to CPU %d\n", cpu); 4165 + fprintf(outf, "print_thermal: Could not migrate to CPU %d\n", cpu); 4448 4166 return -1; 4449 4167 } 4450 4168 ··· 4516 4234 4517 4235 cpu = t->cpu_id; 4518 4236 if (cpu_migrate(cpu)) { 4519 - fprintf(outf, "Could not migrate to CPU %d\n", cpu); 4237 + fprintf(outf, "print_rapl: Could not migrate to CPU %d\n", cpu); 4520 4238 return -1; 4521 4239 } 4522 4240 ··· 4643 4361 case INTEL_FAM6_ATOM_GOLDMONT_PLUS: 4644 4362 case INTEL_FAM6_ATOM_GOLDMONT_D: /* DNV */ 4645 4363 case INTEL_FAM6_ATOM_TREMONT: /* EHL */ 4364 + case INTEL_FAM6_ATOM_TREMONT_D: /* JVL */ 4646 4365 return 1; 4647 4366 } 4648 4367 return 0; ··· 4790 4507 * below this value, including the Digital Thermal Sensor (DTS), 4791 4508 * Package Thermal Management Sensor (PTM), and thermal event thresholds.
4792 4509 */ 4793 - int set_temperature_target(struct thread_data *t, struct core_data *c, struct pkg_data *p) 4510 + int read_tcc_activation_temp() 4794 4511 { 4795 4512 unsigned long long msr; 4796 - unsigned int target_c_local; 4797 - int cpu; 4513 + unsigned int tcc, target_c, offset_c; 4798 4514 4515 + /* Temperature Target MSR is Nehalem and newer only */ 4516 + if (!do_nhm_platform_info) 4517 + return 0; 4518 + 4519 + if (get_msr(base_cpu, MSR_IA32_TEMPERATURE_TARGET, &msr)) 4520 + return 0; 4521 + 4522 + target_c = (msr >> 16) & 0xFF; 4523 + 4524 + offset_c = (msr >> 24) & 0xF; 4525 + 4526 + tcc = target_c - offset_c; 4527 + 4528 + if (!quiet) 4529 + fprintf(outf, "cpu%d: MSR_IA32_TEMPERATURE_TARGET: 0x%08llx (%d C) (%d default - %d offset)\n", 4530 + base_cpu, msr, tcc, target_c, offset_c); 4531 + 4532 + return tcc; 4533 + } 4534 + 4535 + int set_temperature_target(struct thread_data *t, struct core_data *c, struct pkg_data *p) 4536 + { 4799 4537 /* tcc_activation_temp is used only for dts or ptm */ 4800 4538 if (!(do_dts || do_ptm)) 4801 4539 return 0; ··· 4825 4521 if (!(t->flags & CPU_IS_FIRST_THREAD_IN_CORE) || !(t->flags & CPU_IS_FIRST_CORE_IN_PACKAGE)) 4826 4522 return 0; 4827 4523 4828 - cpu = t->cpu_id; 4829 - if (cpu_migrate(cpu)) { 4830 - fprintf(outf, "Could not migrate to CPU %d\n", cpu); 4831 - return -1; 4832 - } 4833 - 4834 4524 if (tcc_activation_temp_override != 0) { 4835 4525 tcc_activation_temp = tcc_activation_temp_override; 4836 - fprintf(outf, "cpu%d: Using cmdline TCC Target (%d C)\n", 4837 - cpu, tcc_activation_temp); 4526 + fprintf(outf, "Using cmdline TCC Target (%d C)\n", tcc_activation_temp); 4838 4527 return 0; 4839 4528 } 4840 4529 4841 - /* Temperature Target MSR is Nehalem and newer only */ 4842 - if (!do_nhm_platform_info) 4843 - goto guess; 4530 + tcc_activation_temp = read_tcc_activation_temp(); 4531 + if (tcc_activation_temp) 4532 + return 0; 4844 4533 4845 - if (get_msr(base_cpu, MSR_IA32_TEMPERATURE_TARGET, &msr)) 4846 
- goto guess; 4847 - 4848 - target_c_local = (msr >> 16) & 0xFF; 4849 - 4850 - if (!quiet) 4851 - fprintf(outf, "cpu%d: MSR_IA32_TEMPERATURE_TARGET: 0x%08llx (%d C)\n", 4852 - cpu, msr, target_c_local); 4853 - 4854 - if (!target_c_local) 4855 - goto guess; 4856 - 4857 - tcc_activation_temp = target_c_local; 4858 - 4859 - return 0; 4860 - 4861 - guess: 4862 4534 tcc_activation_temp = TJMAX_DEFAULT; 4863 - fprintf(outf, "cpu%d: Guessing tjMax %d C, Please use -T to specify\n", 4864 - cpu, tcc_activation_temp); 4535 + fprintf(outf, "Guessing tjMax %d C, Please use -T to specify\n", tcc_activation_temp); 4865 4536 4866 4537 return 0; 4867 4538 } ··· 4964 4685 case INTEL_FAM6_ICELAKE_NNPI: 4965 4686 case INTEL_FAM6_TIGERLAKE_L: 4966 4687 case INTEL_FAM6_TIGERLAKE: 4688 + case INTEL_FAM6_ROCKETLAKE: 4689 + case INTEL_FAM6_LAKEFIELD: 4690 + case INTEL_FAM6_ALDERLAKE: 4967 4691 return INTEL_FAM6_CANNONLAKE_L; 4968 - 4969 - case INTEL_FAM6_ATOM_TREMONT_D: 4970 - return INTEL_FAM6_ATOM_GOLDMONT_D; 4971 4692 4972 4693 case INTEL_FAM6_ATOM_TREMONT_L: 4973 4694 return INTEL_FAM6_ATOM_TREMONT; 4974 4695 4975 4696 case INTEL_FAM6_ICELAKE_X: 4697 + case INTEL_FAM6_SAPPHIRERAPIDS_X: 4976 4698 return INTEL_FAM6_SKYLAKE_X; 4977 4699 } 4978 4700 return model; 4979 4701 } 4702 + 4703 + void print_dev_latency(void) 4704 + { 4705 + char *path = "/dev/cpu_dma_latency"; 4706 + int fd; 4707 + int value; 4708 + int retval; 4709 + 4710 + fd = open(path, O_RDONLY); 4711 + if (fd < 0) { 4712 + warn("fopen %s\n", path); 4713 + return; 4714 + } 4715 + 4716 + retval = read(fd, (void *)&value, sizeof(int)); 4717 + if (retval != sizeof(int)) { 4718 + warn("read %s\n", path); 4719 + close(fd); 4720 + return; 4721 + } 4722 + fprintf(outf, "/dev/cpu_dma_latency: %d usec (%s)\n", 4723 + value, value == 2000000000 ? 
"default" : "constrained"); 4724 + 4725 + close(fd); 4726 + } 4727 + 4980 4728 void process_cpuid() 4981 4729 { 4982 4730 unsigned int eax, ebx, ecx, edx; ··· 5222 4916 BIC_PRESENT(BIC_Mod_c6); 5223 4917 use_c1_residency_msr = 1; 5224 4918 } 4919 + if (is_jvl(family, model)) { 4920 + BIC_NOT_PRESENT(BIC_CPU_c3); 4921 + BIC_NOT_PRESENT(BIC_CPU_c7); 4922 + BIC_NOT_PRESENT(BIC_Pkgpc2); 4923 + BIC_NOT_PRESENT(BIC_Pkgpc3); 4924 + BIC_NOT_PRESENT(BIC_Pkgpc6); 4925 + BIC_NOT_PRESENT(BIC_Pkgpc7); 4926 + } 5225 4927 if (is_dnv(family, model)) { 5226 4928 BIC_PRESENT(BIC_CPU_c1); 5227 4929 BIC_NOT_PRESENT(BIC_CPU_c3); ··· 5249 4935 BIC_NOT_PRESENT(BIC_Pkgpc7); 5250 4936 } 5251 4937 if (has_c8910_msrs(family, model)) { 5252 - BIC_PRESENT(BIC_Pkgpc8); 5253 - BIC_PRESENT(BIC_Pkgpc9); 5254 - BIC_PRESENT(BIC_Pkgpc10); 4938 + if (pkg_cstate_limit >= PCL__8) 4939 + BIC_PRESENT(BIC_Pkgpc8); 4940 + if (pkg_cstate_limit >= PCL__9) 4941 + BIC_PRESENT(BIC_Pkgpc9); 4942 + if (pkg_cstate_limit >= PCL_10) 4943 + BIC_PRESENT(BIC_Pkgpc10); 5255 4944 } 5256 4945 do_irtl_hsw = has_c8910_msrs(family, model); 5257 4946 if (has_skl_msrs(family, model)) { ··· 5284 4967 dump_cstate_pstate_config_info(family, model); 5285 4968 5286 4969 if (!quiet) 4970 + print_dev_latency(); 4971 + if (!quiet) 5287 4972 dump_sysfs_cstate_config(); 5288 4973 if (!quiet) 5289 4974 dump_sysfs_pstate_config(); ··· 5298 4979 5299 4980 if (!access("/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz", R_OK)) 5300 4981 BIC_PRESENT(BIC_GFXMHz); 4982 + 4983 + if (!access("/sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz", R_OK)) 4984 + BIC_PRESENT(BIC_GFXACTMHz); 5301 4985 5302 4986 if (!access("/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us", R_OK)) 5303 4987 BIC_PRESENT(BIC_CPU_LPI); ··· 5712 5390 } 5713 5391 5714 5392 void print_version() { 5715 - fprintf(outf, "turbostat version 20.03.20" 5393 + fprintf(outf, "turbostat version 20.09.30" 5716 5394 " - Len Brown <lenb@kernel.org>\n"); 5717 
5395 } 5718 5396 ··· 5919 5597 *sp = '%'; 5920 5598 *(sp + 1) = '\0'; 5921 5599 5600 + remove_underbar(name_buf); 5601 + 5922 5602 fclose(input); 5923 5603 5924 5604 sprintf(path, "cpuidle/state%d/time", state); ··· 5947 5623 sp = strchrnul(name_buf, '\n'); 5948 5624 *sp = '\0'; 5949 5625 fclose(input); 5626 + 5627 + remove_underbar(name_buf); 5950 5628 5951 5629 sprintf(path, "cpuidle/state%d/usage", state); 5952 5630 ··· 6194 5868 return 0; 6195 5869 } 6196 5870 5871 + msr_sum_record(); 6197 5872 /* 6198 5873 * if any params left, it must be a command to fork 6199 5874 */
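The turbostat changes above factor the TCC lookup into read_tcc_activation_temp(), which boils down to two bit-field extractions from MSR_IA32_TEMPERATURE_TARGET: the default target in bits 23:16 minus the offset in bits 27:24. A minimal sketch of just that decoding step (the helper name and the sample values in the comment are ours, not turbostat's):

```c
#include <assert.h>

/* Decode the effective TCC activation temperature (degrees C) from a
 * raw MSR_IA32_TEMPERATURE_TARGET value, mirroring the arithmetic in
 * read_tcc_activation_temp(): bits 23:16 hold the default target,
 * bits 27:24 an offset that is subtracted from it. */
unsigned int decode_tcc(unsigned long long msr)
{
	unsigned int target_c = (msr >> 16) & 0xFF;
	unsigned int offset_c = (msr >> 24) & 0xF;

	return target_c - offset_c;
}
```

For example, a raw value of 0x00640000 (target 100, offset 0) decodes to 100 C, while 0x03640000 carries a 3-degree offset and decodes to 97 C.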
+54 -13
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
··· 622 622 } 623 623 } 624 624 625 + /* 626 + * Open a file, and exit on failure 627 + */ 628 + FILE *fopen_or_die(const char *path, const char *mode) 629 + { 630 + FILE *filep = fopen(path, "r"); 631 + 632 + if (!filep) 633 + err(1, "%s: open failed", path); 634 + return filep; 635 + } 636 + 637 + void err_on_hypervisor(void) 638 + { 639 + FILE *cpuinfo; 640 + char *flags, *hypervisor; 641 + char *buffer; 642 + 643 + /* On VMs /proc/cpuinfo contains a "flags" entry for hypervisor */ 644 + cpuinfo = fopen_or_die("/proc/cpuinfo", "ro"); 645 + 646 + buffer = malloc(4096); 647 + if (!buffer) { 648 + fclose(cpuinfo); 649 + err(-ENOMEM, "buffer malloc fail"); 650 + } 651 + 652 + if (!fread(buffer, 1024, 1, cpuinfo)) { 653 + fclose(cpuinfo); 654 + free(buffer); 655 + err(1, "Reading /proc/cpuinfo failed"); 656 + } 657 + 658 + flags = strstr(buffer, "flags"); 659 + rewind(cpuinfo); 660 + fseek(cpuinfo, flags - buffer, SEEK_SET); 661 + if (!fgets(buffer, 4096, cpuinfo)) { 662 + fclose(cpuinfo); 663 + free(buffer); 664 + err(1, "Reading /proc/cpuinfo failed"); 665 + } 666 + fclose(cpuinfo); 667 + 668 + hypervisor = strstr(buffer, "hypervisor"); 669 + 670 + free(buffer); 671 + 672 + if (hypervisor) 673 + err(-1, 674 + "not supported on this virtual machine"); 675 + } 625 676 626 677 int get_msr(int cpu, int offset, unsigned long long *msr) 627 678 { ··· 686 635 err(-1, "%s open failed, try chown or chmod +r /dev/cpu/*/msr, or run as root", pathname); 687 636 688 637 retval = pread(fd, msr, sizeof(*msr), offset); 689 - if (retval != sizeof(*msr)) 638 + if (retval != sizeof(*msr)) { 639 + err_on_hypervisor(); 690 640 err(-1, "%s offset 0x%llx read failed", pathname, (unsigned long long)offset); 641 + } 691 642 692 643 if (debug > 1) 693 644 fprintf(stderr, "get_msr(cpu%d, 0x%X, 0x%llX)\n", cpu, offset, *msr); ··· 1137 1084 1138 1085 update_hwp_request(cpu); 1139 1086 return 0; 1140 - } 1141 - 1142 - /* 1143 - * Open a file, and exit on failure 1144 - */ 1145 - FILE 
*fopen_or_die(const char *path, const char *mode) 1146 - { 1147 - FILE *filep = fopen(path, "r"); 1148 - 1149 - if (!filep) 1150 - err(1, "%s: open failed", path); 1151 - return filep; 1152 1087 } 1153 1088 1154 1089 unsigned int get_pkg_num(int cpu)
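The new err_on_hypervisor() path keys off the "flags" line of /proc/cpuinfo, where VMs advertise a "hypervisor" CPU flag. The detection itself is plain substring matching; a self-contained sketch of the same check over an in-memory buffer (the function name and sample strings are ours):

```c
#include <assert.h>
#include <string.h>

/* Return 1 if a /proc/cpuinfo-style buffer advertises the "hypervisor"
 * CPU flag, mirroring the check err_on_hypervisor() performs after
 * seeking to the "flags" line. */
int cpuinfo_has_hypervisor_flag(const char *cpuinfo)
{
	const char *flags = strstr(cpuinfo, "flags");

	if (!flags)
		return 0;

	/* Limit the search to the flags line itself. */
	const char *end = strchr(flags, '\n');
	const char *hv = strstr(flags, "hypervisor");

	return hv && (!end || hv < end);
}
```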
+214
tools/testing/selftests/bpf/prog_tests/map_init.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* Copyright (c) 2020 Tessares SA <http://www.tessares.net> */ 3 + 4 + #include <test_progs.h> 5 + #include "test_map_init.skel.h" 6 + 7 + #define TEST_VALUE 0x1234 8 + #define FILL_VALUE 0xdeadbeef 9 + 10 + static int nr_cpus; 11 + static int duration; 12 + 13 + typedef unsigned long long map_key_t; 14 + typedef unsigned long long map_value_t; 15 + typedef struct { 16 + map_value_t v; /* padding */ 17 + } __bpf_percpu_val_align pcpu_map_value_t; 18 + 19 + 20 + static int map_populate(int map_fd, int num) 21 + { 22 + pcpu_map_value_t value[nr_cpus]; 23 + int i, err; 24 + map_key_t key; 25 + 26 + for (i = 0; i < nr_cpus; i++) 27 + bpf_percpu(value, i) = FILL_VALUE; 28 + 29 + for (key = 1; key <= num; key++) { 30 + err = bpf_map_update_elem(map_fd, &key, value, BPF_NOEXIST); 31 + if (!ASSERT_OK(err, "bpf_map_update_elem")) 32 + return -1; 33 + } 34 + 35 + return 0; 36 + } 37 + 38 + static struct test_map_init *setup(enum bpf_map_type map_type, int map_sz, 39 + int *map_fd, int populate) 40 + { 41 + struct test_map_init *skel; 42 + int err; 43 + 44 + skel = test_map_init__open(); 45 + if (!ASSERT_OK_PTR(skel, "skel_open")) 46 + return NULL; 47 + 48 + err = bpf_map__set_type(skel->maps.hashmap1, map_type); 49 + if (!ASSERT_OK(err, "bpf_map__set_type")) 50 + goto error; 51 + 52 + err = bpf_map__set_max_entries(skel->maps.hashmap1, map_sz); 53 + if (!ASSERT_OK(err, "bpf_map__set_max_entries")) 54 + goto error; 55 + 56 + err = test_map_init__load(skel); 57 + if (!ASSERT_OK(err, "skel_load")) 58 + goto error; 59 + 60 + *map_fd = bpf_map__fd(skel->maps.hashmap1); 61 + if (CHECK(*map_fd < 0, "bpf_map__fd", "failed\n")) 62 + goto error; 63 + 64 + err = map_populate(*map_fd, populate); 65 + if (!ASSERT_OK(err, "map_populate")) 66 + goto error_map; 67 + 68 + return skel; 69 + 70 + error_map: 71 + close(*map_fd); 72 + error: 73 + test_map_init__destroy(skel); 74 + return NULL; 75 + } 76 + 77 + /* executes bpf program that 
updates map with key, value */ 78 + static int prog_run_insert_elem(struct test_map_init *skel, map_key_t key, 79 + map_value_t value) 80 + { 81 + struct test_map_init__bss *bss; 82 + 83 + bss = skel->bss; 84 + 85 + bss->inKey = key; 86 + bss->inValue = value; 87 + bss->inPid = getpid(); 88 + 89 + if (!ASSERT_OK(test_map_init__attach(skel), "skel_attach")) 90 + return -1; 91 + 92 + /* Let tracepoint trigger */ 93 + syscall(__NR_getpgid); 94 + 95 + test_map_init__detach(skel); 96 + 97 + return 0; 98 + } 99 + 100 + static int check_values_one_cpu(pcpu_map_value_t *value, map_value_t expected) 101 + { 102 + int i, nzCnt = 0; 103 + map_value_t val; 104 + 105 + for (i = 0; i < nr_cpus; i++) { 106 + val = bpf_percpu(value, i); 107 + if (val) { 108 + if (CHECK(val != expected, "map value", 109 + "unexpected for cpu %d: 0x%llx\n", i, val)) 110 + return -1; 111 + nzCnt++; 112 + } 113 + } 114 + 115 + if (CHECK(nzCnt != 1, "map value", "set for %d CPUs instead of 1!\n", 116 + nzCnt)) 117 + return -1; 118 + 119 + return 0; 120 + } 121 + 122 + /* Add key=1 elem with values set for all CPUs 123 + * Delete elem key=1 124 + * Run bpf prog that inserts new key=1 elem with value=0x1234 125 + * (bpf prog can only set value for current CPU) 126 + * Lookup Key=1 and check value is as expected for all CPUs: 127 + * value set by bpf prog for one CPU, 0 for all others 128 + */ 129 + static void test_pcpu_map_init(void) 130 + { 131 + pcpu_map_value_t value[nr_cpus]; 132 + struct test_map_init *skel; 133 + int map_fd, err; 134 + map_key_t key; 135 + 136 + /* max 1 elem in map so insertion is forced to reuse freed entry */ 137 + skel = setup(BPF_MAP_TYPE_PERCPU_HASH, 1, &map_fd, 1); 138 + if (!ASSERT_OK_PTR(skel, "prog_setup")) 139 + return; 140 + 141 + /* delete element so the entry can be re-used*/ 142 + key = 1; 143 + err = bpf_map_delete_elem(map_fd, &key); 144 + if (!ASSERT_OK(err, "bpf_map_delete_elem")) 145 + goto cleanup; 146 + 147 + /* run bpf prog that inserts new elem, re-using 
the slot just freed */ 148 + err = prog_run_insert_elem(skel, key, TEST_VALUE); 149 + if (!ASSERT_OK(err, "prog_run_insert_elem")) 150 + goto cleanup; 151 + 152 + /* check that key=1 was re-created by bpf prog */ 153 + err = bpf_map_lookup_elem(map_fd, &key, value); 154 + if (!ASSERT_OK(err, "bpf_map_lookup_elem")) 155 + goto cleanup; 156 + 157 + /* and has expected values */ 158 + check_values_one_cpu(value, TEST_VALUE); 159 + 160 + cleanup: 161 + test_map_init__destroy(skel); 162 + } 163 + 164 + /* Add key=1 and key=2 elems with values set for all CPUs 165 + * Run bpf prog that inserts new key=3 elem 166 + * (only for current cpu; other cpus should have initial value = 0) 167 + * Lookup Key=1 and check value is as expected for all CPUs 168 + */ 169 + static void test_pcpu_lru_map_init(void) 170 + { 171 + pcpu_map_value_t value[nr_cpus]; 172 + struct test_map_init *skel; 173 + int map_fd, err; 174 + map_key_t key; 175 + 176 + /* Set up LRU map with 2 elements, values filled for all CPUs. 
177 + * With these 2 elements, the LRU map is full 178 + */ 179 + skel = setup(BPF_MAP_TYPE_LRU_PERCPU_HASH, 2, &map_fd, 2); 180 + if (!ASSERT_OK_PTR(skel, "prog_setup")) 181 + return; 182 + 183 + /* run bpf prog that inserts new key=3 element, re-using LRU slot */ 184 + key = 3; 185 + err = prog_run_insert_elem(skel, key, TEST_VALUE); 186 + if (!ASSERT_OK(err, "prog_run_insert_elem")) 187 + goto cleanup; 188 + 189 + /* check that key=3 replaced one of earlier elements */ 190 + err = bpf_map_lookup_elem(map_fd, &key, value); 191 + if (!ASSERT_OK(err, "bpf_map_lookup_elem")) 192 + goto cleanup; 193 + 194 + /* and has expected values */ 195 + check_values_one_cpu(value, TEST_VALUE); 196 + 197 + cleanup: 198 + test_map_init__destroy(skel); 199 + } 200 + 201 + void test_map_init(void) 202 + { 203 + nr_cpus = bpf_num_possible_cpus(); 204 + if (nr_cpus <= 1) { 205 + printf("%s:SKIP: >1 cpu needed for this test\n", __func__); 206 + test__skip(); 207 + return; 208 + } 209 + 210 + if (test__start_subtest("pcpu_map_init")) 211 + test_pcpu_map_init(); 212 + if (test__start_subtest("pcpu_lru_map_init")) 213 + test_pcpu_lru_map_init(); 214 + }
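check_values_one_cpu() above captures the invariant both subtests rely on: after a freed map slot is re-used, exactly one CPU's slot carries the written value and every other slot reads back as zero, i.e. the per-CPU memory was re-initialized rather than leaked from the previous element. The same check as a standalone function (names are ours):

```c
#include <assert.h>

/* Verify a per-CPU value array: exactly one non-zero entry, and that
 * entry must equal `expected`. Returns 1 when the invariant holds. */
int one_cpu_has_value(const unsigned long long *vals, int n_cpus,
		      unsigned long long expected)
{
	int nonzero = 0;

	for (int i = 0; i < n_cpus; i++) {
		if (!vals[i])
			continue;
		if (vals[i] != expected)
			return 0;	/* stale (e.g. FILL_VALUE) data */
		nonzero++;
	}
	return nonzero == 1;
}
```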
+8 -3
tools/testing/selftests/bpf/progs/profiler.inc.h
··· 243 243 } 244 244 } 245 245 246 - int pids_cgrp_id = 1; 246 + extern bool CONFIG_CGROUP_PIDS __kconfig __weak; 247 + enum cgroup_subsys_id___local { 248 + pids_cgrp_id___local = 123, /* value doesn't matter */ 249 + }; 247 250 248 251 static INLINE void* populate_cgroup_info(struct cgroup_data_t* cgroup_data, 249 252 struct task_struct* task, ··· 256 253 BPF_CORE_READ(task, nsproxy, cgroup_ns, root_cset, dfl_cgrp, kn); 257 254 struct kernfs_node* proc_kernfs = BPF_CORE_READ(task, cgroups, dfl_cgrp, kn); 258 255 259 - if (ENABLE_CGROUP_V1_RESOLVER) { 256 + if (ENABLE_CGROUP_V1_RESOLVER && CONFIG_CGROUP_PIDS) { 257 + int cgrp_id = bpf_core_enum_value(enum cgroup_subsys_id___local, 258 + pids_cgrp_id___local); 260 259 #ifdef UNROLL 261 260 #pragma unroll 262 261 #endif ··· 267 262 BPF_CORE_READ(task, cgroups, subsys[i]); 268 263 if (subsys != NULL) { 269 264 int subsys_id = BPF_CORE_READ(subsys, ss, id); 270 - if (subsys_id == pids_cgrp_id) { 265 + if (subsys_id == cgrp_id) { 271 266 proc_kernfs = BPF_CORE_READ(subsys, cgroup, kn); 272 267 root_kernfs = BPF_CORE_READ(subsys, ss, root, kf_root, kn); 273 268 break;
+33
tools/testing/selftests/bpf/progs/test_map_init.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2020 Tessares SA <http://www.tessares.net> */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + 7 + __u64 inKey = 0; 8 + __u64 inValue = 0; 9 + __u32 inPid = 0; 10 + 11 + struct { 12 + __uint(type, BPF_MAP_TYPE_PERCPU_HASH); 13 + __uint(max_entries, 2); 14 + __type(key, __u64); 15 + __type(value, __u64); 16 + } hashmap1 SEC(".maps"); 17 + 18 + 19 + SEC("tp/syscalls/sys_enter_getpgid") 20 + int sysenter_getpgid(const void *ctx) 21 + { 22 + /* Just do it once, when called from our own test prog. This 23 + * ensures the map value is only updated for a single CPU. 24 + */ 25 + int cur_pid = bpf_get_current_pid_tgid() >> 32; 26 + 27 + if (cur_pid == inPid) 28 + bpf_map_update_elem(&hashmap1, &inKey, &inValue, BPF_NOEXIST); 29 + 30 + return 0; 31 + } 32 + 33 + char _license[] SEC("license") = "GPL";
+1 -1
tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
··· 145 145 test_clone3_supported(); 146 146 147 147 EXPECT_EQ(getuid(), 0) 148 - XFAIL(return, "Skipping all tests as non-root\n"); 148 + SKIP(return, "Skipping all tests as non-root"); 149 149 150 150 memset(&set_tid, 0, sizeof(set_tid)); 151 151
+4 -4
tools/testing/selftests/core/close_range_test.c
··· 44 44 fd = open("/dev/null", O_RDONLY | O_CLOEXEC); 45 45 ASSERT_GE(fd, 0) { 46 46 if (errno == ENOENT) 47 - XFAIL(return, "Skipping test since /dev/null does not exist"); 47 + SKIP(return, "Skipping test since /dev/null does not exist"); 48 48 } 49 49 50 50 open_fds[i] = fd; ··· 52 52 53 53 EXPECT_EQ(-1, sys_close_range(open_fds[0], open_fds[100], -1)) { 54 54 if (errno == ENOSYS) 55 - XFAIL(return, "close_range() syscall not supported"); 55 + SKIP(return, "close_range() syscall not supported"); 56 56 } 57 57 58 58 EXPECT_EQ(0, sys_close_range(open_fds[0], open_fds[50], 0)); ··· 108 108 fd = open("/dev/null", O_RDONLY | O_CLOEXEC); 109 109 ASSERT_GE(fd, 0) { 110 110 if (errno == ENOENT) 111 - XFAIL(return, "Skipping test since /dev/null does not exist"); 111 + SKIP(return, "Skipping test since /dev/null does not exist"); 112 112 } 113 113 114 114 open_fds[i] = fd; ··· 197 197 fd = open("/dev/null", O_RDONLY | O_CLOEXEC); 198 198 ASSERT_GE(fd, 0) { 199 199 if (errno == ENOENT) 200 - XFAIL(return, "Skipping test since /dev/null does not exist"); 200 + SKIP(return, "Skipping test since /dev/null does not exist"); 201 201 } 202 202 203 203 open_fds[i] = fd;
+4 -4
tools/testing/selftests/filesystems/binderfs/binderfs_test.c
··· 74 74 ret = mount(NULL, binderfs_mntpt, "binder", 0, 0); 75 75 EXPECT_EQ(ret, 0) { 76 76 if (errno == ENODEV) 77 - XFAIL(goto out, "binderfs missing"); 77 + SKIP(goto out, "binderfs missing"); 78 78 TH_LOG("%s - Failed to mount binderfs", strerror(errno)); 79 79 goto rmdir; 80 80 } ··· 475 475 TEST(binderfs_test_privileged) 476 476 { 477 477 if (geteuid() != 0) 478 - XFAIL(return, "Tests are not run as root. Skipping privileged tests"); 478 + SKIP(return, "Tests are not run as root. Skipping privileged tests"); 479 479 480 480 if (__do_binderfs_test(_metadata)) 481 - XFAIL(return, "The Android binderfs filesystem is not available"); 481 + SKIP(return, "The Android binderfs filesystem is not available"); 482 482 } 483 483 484 484 TEST(binderfs_test_unprivileged) ··· 511 511 ret = wait_for_pid(pid); 512 512 if (ret) { 513 513 if (ret == 2) 514 - XFAIL(return, "The Android binderfs filesystem is not available"); 514 + SKIP(return, "The Android binderfs filesystem is not available"); 515 515 ASSERT_EQ(ret, 0) { 516 516 TH_LOG("wait_for_pid() failed"); 517 517 }
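These XFAIL() to SKIP() conversions follow the kselftest harness pattern: the macro records a reason, marks the test as passed-but-skipped, and then executes a caller-supplied escape statement such as `return` or `goto out`. A toy model of that control flow (the struct and names below are ours, not the real harness API):

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the harness's per-test bookkeeping. */
struct toy_result {
	int passed;
	int skipped;
	char reason[64];
};

/* Record a skip reason, mark the test passed-but-skipped, then run the
 * caller's escape statement (typically `return` or `goto`). */
#define TOY_SKIP(stmt, res, why) do {					\
	snprintf((res)->reason, sizeof((res)->reason), "%s", why);	\
	(res)->passed = 1;						\
	(res)->skipped = 1;						\
	stmt;								\
} while (0)

int toy_test(int have_binderfs, struct toy_result *res)
{
	if (!have_binderfs)
		TOY_SKIP(return 0, res, "binderfs missing");
	res->passed = 1;	/* the real test body would run here */
	return 0;
}

/* Convenience wrapper: run toy_test and report whether it skipped. */
int toy_test_skipped(int have_binderfs)
{
	struct toy_result res = {0};

	toy_test(have_binderfs, &res);
	return res.skipped;
}
```

Passing `return` into the macro is what lets a single `SKIP(return, ...)` both log the reason and bail out of the test body, which is why the conversions above read so uniformly.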
+1 -1
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
··· 6 6 echo 0 > events/enable 7 7 echo > dynamic_events 8 8 9 - PLACE=kernel_clone 9 + PLACE=$FUNCTION_FORK 10 10 11 11 echo "p:myevent1 $PLACE" >> dynamic_events 12 12 echo "r:myevent2 $PLACE" >> dynamic_events
+1 -1
tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
··· 6 6 echo 0 > events/enable 7 7 echo > dynamic_events 8 8 9 - PLACE=kernel_clone 9 + PLACE=$FUNCTION_FORK 10 10 11 11 setup_events() { 12 12 echo "p:myevent1 $PLACE" >> dynamic_events
+1 -1
tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc
··· 6 6 echo 0 > events/enable 7 7 echo > dynamic_events 8 8 9 - PLACE=kernel_clone 9 + PLACE=$FUNCTION_FORK 10 10 11 11 setup_events() { 12 12 echo "p:myevent1 $PLACE" >> dynamic_events
+1 -1
tools/testing/selftests/ftrace/test.d/ftrace/func-filter-notrace-pid.tc
··· 39 39 disable_tracing 40 40 41 41 echo do_execve* > set_ftrace_filter 42 - echo *do_fork >> set_ftrace_filter 42 + echo $FUNCTION_FORK >> set_ftrace_filter 43 43 44 44 echo $PID > set_ftrace_notrace_pid 45 45 echo function > current_tracer
+1 -1
tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
··· 39 39 disable_tracing 40 40 41 41 echo do_execve* > set_ftrace_filter 42 - echo *do_fork >> set_ftrace_filter 42 + echo $FUNCTION_FORK >> set_ftrace_filter 43 43 44 44 echo $PID > set_ftrace_pid 45 45 echo function > current_tracer
+2 -2
tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc
··· 4 4 # requires: set_ftrace_filter 5 5 # flags: instance 6 6 7 - echo kernel_clone:stacktrace >> set_ftrace_filter 7 + echo $FUNCTION_FORK:stacktrace >> set_ftrace_filter 8 8 9 - grep -q "kernel_clone:stacktrace:unlimited" set_ftrace_filter 9 + grep -q "$FUNCTION_FORK:stacktrace:unlimited" set_ftrace_filter 10 10 11 11 (echo "forked"; sleep 1) 12 12
+7
tools/testing/selftests/ftrace/test.d/functions
··· 133 133 ping $LOCALHOST -c 1 || sleep .001 || usleep 1 || sleep 1 134 134 } 135 135 136 + # The fork function in the kernel was renamed from "_do_fork" to 137 + # "kernel_clone". As older tests should still work with older kernels 138 + # as well as newer kernels, check which version of fork is used on this 139 + # kernel so that the tests can use the fork function for the running kernel. 140 + FUNCTION_FORK=`(if grep '\bkernel_clone\b' /proc/kallsyms > /dev/null; then 141 + echo kernel_clone; else echo '_do_fork'; fi)` 142 + 136 143 # Since probe event command may include backslash, explicitly use printf "%s" 137 144 # to NOT interpret it. 138 145 ftrace_errlog_check() { # err-prefix command-with-error-pos-by-^ command-file
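The $FUNCTION_FORK probe added above selects whichever fork symbol the running kernel actually exports: "kernel_clone" on newer kernels, "_do_fork" on older ones. The same whole-word lookup, sketched in C over a kallsyms-style text (the function name and sample listings are ours):

```c
#include <assert.h>
#include <string.h>

/* Given the text of a /proc/kallsyms-style listing, return the fork
 * symbol tests should probe: "kernel_clone" when the kernel exports it,
 * else the older "_do_fork". Matches whole words only, like the shell
 * version's grep '\bkernel_clone\b'. */
const char *fork_symbol(const char *kallsyms)
{
	const char *name = "kernel_clone";
	size_t len = strlen(name);

	for (const char *p = strstr(kallsyms, name); p;
	     p = strstr(p + 1, name)) {
		char before = (p == kallsyms) ? ' ' : p[-1];
		char after = p[len];

		if ((before == ' ' || before == '\t' || before == '\n') &&
		    (after == '\0' || after == '\n' ||
		     after == ' ' || after == '\t'))
			return "kernel_clone";
	}
	return "_do_fork";
}
```

Note that a prefix match such as "kernel_clone_args_fn" must not count, which is why the shell version uses \b word boundaries rather than a plain grep.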
+1 -1
tools/testing/selftests/ftrace/test.d/kprobe/add_and_remove.tc
··· 3 3 # description: Kprobe dynamic event - adding and removing 4 4 # requires: kprobe_events 5 5 6 - echo p:myevent kernel_clone > kprobe_events 6 + echo p:myevent $FUNCTION_FORK > kprobe_events 7 7 grep myevent kprobe_events 8 8 test -d events/kprobes/myevent 9 9 echo > kprobe_events
+1 -1
tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc
··· 3 3 # description: Kprobe dynamic event - busy event check 4 4 # requires: kprobe_events 5 5 6 - echo p:myevent kernel_clone > kprobe_events 6 + echo p:myevent $FUNCTION_FORK > kprobe_events 7 7 test -d events/kprobes/myevent 8 8 echo 1 > events/kprobes/myevent/enable 9 9 echo > kprobe_events && exit_fail # this must fail
+2 -2
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args.tc
··· 3 3 # description: Kprobe dynamic event with arguments 4 4 # requires: kprobe_events 5 5 6 - echo 'p:testprobe kernel_clone $stack $stack0 +0($stack)' > kprobe_events 6 + echo "p:testprobe $FUNCTION_FORK \$stack \$stack0 +0(\$stack)" > kprobe_events 7 7 grep testprobe kprobe_events | grep -q 'arg1=\$stack arg2=\$stack0 arg3=+0(\$stack)' 8 8 test -d events/kprobes/testprobe 9 9 10 10 echo 1 > events/kprobes/testprobe/enable 11 11 ( echo "forked") 12 - grep testprobe trace | grep 'kernel_clone' | \ 12 + grep testprobe trace | grep "$FUNCTION_FORK" | \ 13 13 grep -q 'arg1=0x[[:xdigit:]]* arg2=0x[[:xdigit:]]* arg3=0x[[:xdigit:]]*$' 14 14 15 15 echo 0 > events/kprobes/testprobe/enable
+1 -1
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_comm.tc
··· 5 5 6 6 grep -A1 "fetcharg:" README | grep -q "\$comm" || exit_unsupported # this is too old 7 7 8 - echo 'p:testprobe kernel_clone comm=$comm ' > kprobe_events 8 + echo "p:testprobe $FUNCTION_FORK comm=\$comm " > kprobe_events 9 9 grep testprobe kprobe_events | grep -q 'comm=$comm' 10 10 test -d events/kprobes/testprobe 11 11
+2 -2
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
··· 30 30 : "Test get argument (1)" 31 31 echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events 32 32 echo 1 > events/kprobes/testprobe/enable 33 - echo "p:test kernel_clone" >> kprobe_events 33 + echo "p:test $FUNCTION_FORK" >> kprobe_events 34 34 grep -qe "testprobe.* arg1=\"test\"" trace 35 35 36 36 echo 0 > events/kprobes/testprobe/enable 37 37 : "Test get argument (2)" 38 38 echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events 39 39 echo 1 > events/kprobes/testprobe/enable 40 - echo "p:test kernel_clone" >> kprobe_events 40 + echo "p:test $FUNCTION_FORK" >> kprobe_events 41 41 grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace 42 42
+5 -5
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc
··· 14 14 fi 15 15 16 16 : "Test get basic types symbol argument" 17 - echo "p:testprobe_u kernel_clone arg1=@linux_proc_banner:u64 arg2=@linux_proc_banner:u32 arg3=@linux_proc_banner:u16 arg4=@linux_proc_banner:u8" > kprobe_events 18 - echo "p:testprobe_s kernel_clone arg1=@linux_proc_banner:s64 arg2=@linux_proc_banner:s32 arg3=@linux_proc_banner:s16 arg4=@linux_proc_banner:s8" >> kprobe_events 17 + echo "p:testprobe_u $FUNCTION_FORK arg1=@linux_proc_banner:u64 arg2=@linux_proc_banner:u32 arg3=@linux_proc_banner:u16 arg4=@linux_proc_banner:u8" > kprobe_events 18 + echo "p:testprobe_s $FUNCTION_FORK arg1=@linux_proc_banner:s64 arg2=@linux_proc_banner:s32 arg3=@linux_proc_banner:s16 arg4=@linux_proc_banner:s8" >> kprobe_events 19 19 if grep -q "x8/16/32/64" README; then 20 - echo "p:testprobe_x kernel_clone arg1=@linux_proc_banner:x64 arg2=@linux_proc_banner:x32 arg3=@linux_proc_banner:x16 arg4=@linux_proc_banner:x8" >> kprobe_events 20 + echo "p:testprobe_x $FUNCTION_FORK arg1=@linux_proc_banner:x64 arg2=@linux_proc_banner:x32 arg3=@linux_proc_banner:x16 arg4=@linux_proc_banner:x8" >> kprobe_events 21 21 fi 22 - echo "p:testprobe_bf kernel_clone arg1=@linux_proc_banner:b8@4/32" >> kprobe_events 22 + echo "p:testprobe_bf $FUNCTION_FORK arg1=@linux_proc_banner:b8@4/32" >> kprobe_events 23 23 echo 1 > events/kprobes/enable 24 24 (echo "forked") 25 25 echo 0 > events/kprobes/enable ··· 27 27 grep "testprobe_bf:.* arg1=.*" trace 28 28 29 29 : "Test get string symbol argument" 30 - echo "p:testprobe_str kernel_clone arg1=@linux_proc_banner:string" > kprobe_events 30 + echo "p:testprobe_str $FUNCTION_FORK arg1=@linux_proc_banner:string" > kprobe_events 31 31 echo 1 > events/kprobes/enable 32 32 (echo "forked") 33 33 echo 0 > events/kprobes/enable
+1 -1
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_type.tc
··· 4 4 # requires: kprobe_events "x8/16/32/64":README 5 5 6 6 gen_event() { # Bitsize 7 - echo "p:testprobe kernel_clone \$stack0:s$1 \$stack0:u$1 \$stack0:x$1 \$stack0:b4@4/$1" 7 + echo "p:testprobe $FUNCTION_FORK \$stack0:s$1 \$stack0:u$1 \$stack0:x$1 \$stack0:b4@4/$1" 8 8 } 9 9 10 10 check_types() { # s-type u-type x-type bf-type width
+4
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc
··· 9 9 :;: "user-memory access syntax and ustring working on user memory";: 10 10 echo 'p:myevent do_sys_open path=+0($arg2):ustring path2=+u0($arg2):string' \ 11 11 > kprobe_events 12 + echo 'p:myevent2 do_sys_openat2 path=+0($arg2):ustring path2=+u0($arg2):string' \ 13 + >> kprobe_events 12 14 13 15 grep myevent kprobe_events | \ 14 16 grep -q 'path=+0($arg2):ustring path2=+u0($arg2):string' 15 17 echo 1 > events/kprobes/myevent/enable 18 + echo 1 > events/kprobes/myevent2/enable 16 19 echo > /dev/null 17 20 echo 0 > events/kprobes/myevent/enable 21 + echo 0 > events/kprobes/myevent2/enable 18 22 19 23 grep myevent trace | grep -q 'path="/dev/null" path2="/dev/null"' 20 24
+7 -7
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_ftrace.tc
··· 5 5 6 6 # prepare 7 7 echo nop > current_tracer 8 - echo kernel_clone > set_ftrace_filter 9 - echo 'p:testprobe kernel_clone' > kprobe_events 8 + echo $FUNCTION_FORK > set_ftrace_filter 9 + echo "p:testprobe $FUNCTION_FORK" > kprobe_events 10 10 11 11 # kprobe on / ftrace off 12 12 echo 1 > events/kprobes/testprobe/enable 13 13 echo > trace 14 14 ( echo "forked") 15 15 grep testprobe trace 16 - ! grep 'kernel_clone <-' trace 16 + ! grep "$FUNCTION_FORK <-" trace 17 17 18 18 # kprobe on / ftrace on 19 19 echo function > current_tracer 20 20 echo > trace 21 21 ( echo "forked") 22 22 grep testprobe trace 23 - grep 'kernel_clone <-' trace 23 + grep "$FUNCTION_FORK <-" trace 24 24 25 25 # kprobe off / ftrace on 26 26 echo 0 > events/kprobes/testprobe/enable 27 27 echo > trace 28 28 ( echo "forked") 29 29 ! grep testprobe trace 30 - grep 'kernel_clone <-' trace 30 + grep "$FUNCTION_FORK <-" trace 31 31 32 32 # kprobe on / ftrace on 33 33 echo 1 > events/kprobes/testprobe/enable ··· 35 35 echo > trace 36 36 ( echo "forked") 37 37 grep testprobe trace 38 - grep 'kernel_clone <-' trace 38 + grep "$FUNCTION_FORK <-" trace 39 39 40 40 # kprobe on / ftrace off 41 41 echo nop > current_tracer 42 42 echo > trace 43 43 ( echo "forked") 44 44 grep testprobe trace 45 - ! grep 'kernel_clone <-' trace 45 + ! grep "$FUNCTION_FORK <-" trace
+1 -1
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc
··· 4 4 # requires: kprobe_events "Create/append/":README 5 5 6 6 # Choose 2 symbols for target 7 - SYM1=kernel_clone 7 + SYM1=$FUNCTION_FORK 8 8 SYM2=do_exit 9 9 EVENT_NAME=kprobes/testevent 10 10
+6 -6
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
··· 86 86 87 87 # multiprobe errors 88 88 if grep -q "Create/append/" README && grep -q "imm-value" README; then 89 - echo 'p:kprobes/testevent kernel_clone' > kprobe_events 89 + echo "p:kprobes/testevent $FUNCTION_FORK" > kprobe_events 90 90 check_error '^r:kprobes/testevent do_exit' # DIFF_PROBE_TYPE 91 91 92 92 # Explicitly use printf "%s" to not interpret \1 93 - printf "%s" 'p:kprobes/testevent kernel_clone abcd=\1' > kprobe_events 94 - check_error 'p:kprobes/testevent kernel_clone ^bcd=\1' # DIFF_ARG_TYPE 95 - check_error 'p:kprobes/testevent kernel_clone ^abcd=\1:u8' # DIFF_ARG_TYPE 96 - check_error 'p:kprobes/testevent kernel_clone ^abcd=\"foo"' # DIFF_ARG_TYPE 97 - check_error '^p:kprobes/testevent kernel_clone abcd=\1' # SAME_PROBE 93 + printf "%s" "p:kprobes/testevent $FUNCTION_FORK abcd=\\1" > kprobe_events 94 + check_error "p:kprobes/testevent $FUNCTION_FORK ^bcd=\\1" # DIFF_ARG_TYPE 95 + check_error "p:kprobes/testevent $FUNCTION_FORK ^abcd=\\1:u8" # DIFF_ARG_TYPE 96 + check_error "p:kprobes/testevent $FUNCTION_FORK ^abcd=\\\"foo\"" # DIFF_ARG_TYPE 97 + check_error "^p:kprobes/testevent $FUNCTION_FORK abcd=\\1" # SAME_PROBE 98 98 fi 99 99 100 100 # %return suffix errors
+2 -2
tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_args.tc
··· 4 4 # requires: kprobe_events 5 5 6 6 # Add new kretprobe event 7 - echo 'r:testprobe2 kernel_clone $retval' > kprobe_events 7 + echo "r:testprobe2 $FUNCTION_FORK \$retval" > kprobe_events 8 8 grep testprobe2 kprobe_events | grep -q 'arg1=\$retval' 9 9 test -d events/kprobes/testprobe2 10 10 11 11 echo 1 > events/kprobes/testprobe2/enable 12 12 ( echo "forked") 13 13 14 - cat trace | grep testprobe2 | grep -q '<- kernel_clone' 14 + cat trace | grep testprobe2 | grep -q "<- $FUNCTION_FORK" 15 15 16 16 echo 0 > events/kprobes/testprobe2/enable 17 17 echo '-:testprobe2' >> kprobe_events
+1 -1
tools/testing/selftests/ftrace/test.d/kprobe/profile.tc
··· 4 4 # requires: kprobe_events 5 5 6 6 ! grep -q 'myevent' kprobe_profile 7 - echo p:myevent kernel_clone > kprobe_events 7 + echo "p:myevent $FUNCTION_FORK" > kprobe_events 8 8 grep -q 'myevent[[:space:]]*0[[:space:]]*0$' kprobe_profile 9 9 echo 1 > events/kprobes/myevent/enable 10 10 ( echo "forked" )
+1 -1
tools/testing/selftests/kselftest_harness.h
··· 126 126 snprintf(_metadata->results->reason, \ 127 127 sizeof(_metadata->results->reason), fmt, ##__VA_ARGS__); \ 128 128 if (TH_LOG_ENABLED) { \ 129 - fprintf(TH_LOG_STREAM, "# SKIP %s\n", \ 129 + fprintf(TH_LOG_STREAM, "# SKIP %s\n", \ 130 130 _metadata->results->reason); \ 131 131 } \ 132 132 _metadata->passed = 1; \
+4
tools/testing/selftests/kvm/.gitignore
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 + /aarch64/get-reg-list 3 + /aarch64/get-reg-list-sve 2 4 /s390x/memop 3 5 /s390x/resets 4 6 /s390x/sync_regs_test 5 7 /x86_64/cr4_cpuid_sync_test 6 8 /x86_64/debug_regs 7 9 /x86_64/evmcs_test 10 + /x86_64/kvm_pv_test 8 11 /x86_64/hyperv_cpuid 9 12 /x86_64/mmio_warning_test 10 13 /x86_64/platform_info_test ··· 27 24 /clear_dirty_log_test 28 25 /demand_paging_test 29 26 /dirty_log_test 27 + /dirty_log_perf_test 30 28 /kvm_create_max_vcpus 31 29 /set_memory_region_test 32 30 /steal_time
+17 -8
tools/testing/selftests/kvm/Makefile
··· 34 34 endif 35 35 36 36 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c lib/test_util.c 37 - LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c 37 + LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S 38 38 LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c 39 39 LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c 40 40 41 41 TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test 42 42 TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test 43 43 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid 44 + TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test 44 45 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test 45 46 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test 46 47 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test ··· 59 58 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs 60 59 TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test 61 60 TEST_GEN_PROGS_x86_64 += x86_64/user_msr_test 62 - TEST_GEN_PROGS_x86_64 += clear_dirty_log_test 63 61 TEST_GEN_PROGS_x86_64 += demand_paging_test 64 62 TEST_GEN_PROGS_x86_64 += dirty_log_test 63 + TEST_GEN_PROGS_x86_64 += dirty_log_perf_test 65 64 TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus 66 65 TEST_GEN_PROGS_x86_64 += set_memory_region_test 67 66 TEST_GEN_PROGS_x86_64 += steal_time 68 67 69 - TEST_GEN_PROGS_aarch64 += clear_dirty_log_test 68 + TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list 69 + TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve 70 70 TEST_GEN_PROGS_aarch64 += demand_paging_test 71 71 TEST_GEN_PROGS_aarch64 += dirty_log_test 72 72 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus ··· 113 111 include ../lib.mk 114 112 115 113 STATIC_LIBS := $(OUTPUT)/libkvm.a 116 - LIBKVM_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM)) 117 - EXTRA_CLEAN += $(LIBKVM_OBJ) $(STATIC_LIBS) cscope.* 114 + LIBKVM_C := $(filter %.c,$(LIBKVM)) 115 + LIBKVM_S := $(filter %.S,$(LIBKVM)) 116 + LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
117 + LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S)) 118 + EXTRA_CLEAN += $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(STATIC_LIBS) cscope.* 118 119 119 - x := $(shell mkdir -p $(sort $(dir $(LIBKVM_OBJ)))) 120 - $(LIBKVM_OBJ): $(OUTPUT)/%.o: %.c 120 + x := $(shell mkdir -p $(sort $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)))) 121 + $(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c 121 122 $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@ 122 123 123 - $(OUTPUT)/libkvm.a: $(LIBKVM_OBJ) 124 + $(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S 125 + $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@ 126 + 127 + LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) 128 + $(OUTPUT)/libkvm.a: $(LIBKVM_OBJS) 124 129 $(AR) crs $@ $^ 125 130 126 131 x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
+3
tools/testing/selftests/kvm/aarch64/get-reg-list-sve.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #define REG_LIST_SVE 3 + #include "get-reg-list.c"
+841
tools/testing/selftests/kvm/aarch64/get-reg-list.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Check for KVM_GET_REG_LIST regressions. 4 + * 5 + * Copyright (C) 2020, Red Hat, Inc. 6 + * 7 + * When attempting to migrate from a host with an older kernel to a host 8 + * with a newer kernel we allow the newer kernel on the destination to 9 + * list new registers with get-reg-list. We assume they'll be unused, at 10 + * least until the guest reboots, and so they're relatively harmless. 11 + * However, if the destination host with the newer kernel is missing 12 + * registers which the source host with the older kernel has, then that's 13 + * a regression in get-reg-list. This test checks for that regression by 14 + * checking the current list against a blessed list. We should never have 15 + * missing registers, but if new ones appear then they can probably be 16 + * added to the blessed list. A completely new blessed list can be created 17 + * by running the test with the --list command line argument. 18 + * 19 + * Note, the blessed list should be created from the oldest possible 20 + * kernel. We can't go older than v4.15, though, because that's the first 21 + * release to expose the ID system registers in KVM_GET_REG_LIST, see 22 + * commit 93390c0a1b20 ("arm64: KVM: Hide unsupported AArch64 CPU features 23 + * from guests"). Also, one must use the --core-reg-fixup command line 24 + * option when running on an older kernel that doesn't include df205b5c6328 25 + * ("KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST") 26 + */
27 + #include <stdio.h> 28 + #include <stdlib.h> 29 + #include <string.h> 30 + #include "kvm_util.h" 31 + #include "test_util.h" 32 + #include "processor.h" 33 + 34 + #ifdef REG_LIST_SVE 35 + #define reg_list_sve() (true) 36 + #else 37 + #define reg_list_sve() (false) 38 + #endif 39 + 40 + #define REG_MASK (KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_COPROC_MASK) 41 + 42 + #define for_each_reg(i) \ 43 + for ((i) = 0; (i) < reg_list->n; ++(i)) 44 + 45 + #define for_each_missing_reg(i) \ 46 + for ((i) = 0; (i) < blessed_n; ++(i)) \ 47 + if (!find_reg(reg_list->reg, reg_list->n, blessed_reg[i])) 48 + 49 + #define for_each_new_reg(i) \ 50 + for ((i) = 0; (i) < reg_list->n; ++(i)) \ 51 + if (!find_reg(blessed_reg, blessed_n, reg_list->reg[i])) 52 + 53 + 54 + static struct kvm_reg_list *reg_list; 55 + 56 + static __u64 base_regs[], vregs[], sve_regs[], rejects_set[]; 57 + static __u64 base_regs_n, vregs_n, sve_regs_n, rejects_set_n; 58 + static __u64 *blessed_reg, blessed_n; 59 + 60 + static bool find_reg(__u64 regs[], __u64 nr_regs, __u64 reg) 61 + { 62 + int i; 63 + 64 + for (i = 0; i < nr_regs; ++i) 65 + if (reg == regs[i]) 66 + return true; 67 + return false; 68 + } 69 + 70 + static const char *str_with_index(const char *template, __u64 index) 71 + { 72 + char *str, *p; 73 + int n; 74 + 75 + str = strdup(template); 76 + p = strstr(str, "##"); 77 + n = sprintf(p, "%lld", index); 78 + strcat(p + n, strstr(template, "##") + 2); 79 + 80 + return (const char *)str; 81 + } 82 + 83 + #define CORE_REGS_XX_NR_WORDS 2 84 + #define CORE_SPSR_XX_NR_WORDS 2 85 + #define CORE_FPREGS_XX_NR_WORDS 4 86 + 87 + static const char *core_id_to_str(__u64 id) 88 + { 89 + __u64 core_off = id & ~REG_MASK, idx; 90 + 91 + /* 92 + * core_off is the offset into struct kvm_regs 93 + */
94 + switch (core_off) { 95 + case KVM_REG_ARM_CORE_REG(regs.regs[0]) ... 96 + KVM_REG_ARM_CORE_REG(regs.regs[30]): 97 + idx = (core_off - KVM_REG_ARM_CORE_REG(regs.regs[0])) / CORE_REGS_XX_NR_WORDS; 98 + TEST_ASSERT(idx < 31, "Unexpected regs.regs index: %lld", idx); 99 + return str_with_index("KVM_REG_ARM_CORE_REG(regs.regs[##])", idx); 100 + case KVM_REG_ARM_CORE_REG(regs.sp): 101 + return "KVM_REG_ARM_CORE_REG(regs.sp)"; 102 + case KVM_REG_ARM_CORE_REG(regs.pc): 103 + return "KVM_REG_ARM_CORE_REG(regs.pc)"; 104 + case KVM_REG_ARM_CORE_REG(regs.pstate): 105 + return "KVM_REG_ARM_CORE_REG(regs.pstate)"; 106 + case KVM_REG_ARM_CORE_REG(sp_el1): 107 + return "KVM_REG_ARM_CORE_REG(sp_el1)"; 108 + case KVM_REG_ARM_CORE_REG(elr_el1): 109 + return "KVM_REG_ARM_CORE_REG(elr_el1)"; 110 + case KVM_REG_ARM_CORE_REG(spsr[0]) ... 111 + KVM_REG_ARM_CORE_REG(spsr[KVM_NR_SPSR - 1]): 112 + idx = (core_off - KVM_REG_ARM_CORE_REG(spsr[0])) / CORE_SPSR_XX_NR_WORDS; 113 + TEST_ASSERT(idx < KVM_NR_SPSR, "Unexpected spsr index: %lld", idx); 114 + return str_with_index("KVM_REG_ARM_CORE_REG(spsr[##])", idx); 115 + case KVM_REG_ARM_CORE_REG(fp_regs.vregs[0]) ...
116 + KVM_REG_ARM_CORE_REG(fp_regs.vregs[31]): 117 + idx = (core_off - KVM_REG_ARM_CORE_REG(fp_regs.vregs[0])) / CORE_FPREGS_XX_NR_WORDS; 118 + TEST_ASSERT(idx < 32, "Unexpected fp_regs.vregs index: %lld", idx); 119 + return str_with_index("KVM_REG_ARM_CORE_REG(fp_regs.vregs[##])", idx); 120 + case KVM_REG_ARM_CORE_REG(fp_regs.fpsr): 121 + return "KVM_REG_ARM_CORE_REG(fp_regs.fpsr)"; 122 + case KVM_REG_ARM_CORE_REG(fp_regs.fpcr): 123 + return "KVM_REG_ARM_CORE_REG(fp_regs.fpcr)"; 124 + } 125 + 126 + TEST_FAIL("Unknown core reg id: 0x%llx", id); 127 + return NULL; 128 + } 129 + 130 + static const char *sve_id_to_str(__u64 id) 131 + { 132 + __u64 sve_off, n, i; 133 + 134 + if (id == KVM_REG_ARM64_SVE_VLS) 135 + return "KVM_REG_ARM64_SVE_VLS"; 136 + 137 + sve_off = id & ~(REG_MASK | ((1ULL << 5) - 1)); 138 + i = id & (KVM_ARM64_SVE_MAX_SLICES - 1); 139 + 140 + TEST_ASSERT(i == 0, "Currently we don't expect slice > 0, reg id 0x%llx", id); 141 + 142 + switch (sve_off) { 143 + case KVM_REG_ARM64_SVE_ZREG_BASE ... 144 + KVM_REG_ARM64_SVE_ZREG_BASE + (1ULL << 5) * KVM_ARM64_SVE_NUM_ZREGS - 1: 145 + n = (id >> 5) & (KVM_ARM64_SVE_NUM_ZREGS - 1); 146 + TEST_ASSERT(id == KVM_REG_ARM64_SVE_ZREG(n, 0), 147 + "Unexpected bits set in SVE ZREG id: 0x%llx", id); 148 + return str_with_index("KVM_REG_ARM64_SVE_ZREG(##, 0)", n); 149 + case KVM_REG_ARM64_SVE_PREG_BASE ... 
150 + KVM_REG_ARM64_SVE_PREG_BASE + (1ULL << 5) * KVM_ARM64_SVE_NUM_PREGS - 1: 151 + n = (id >> 5) & (KVM_ARM64_SVE_NUM_PREGS - 1); 152 + TEST_ASSERT(id == KVM_REG_ARM64_SVE_PREG(n, 0), 153 + "Unexpected bits set in SVE PREG id: 0x%llx", id); 154 + return str_with_index("KVM_REG_ARM64_SVE_PREG(##, 0)", n); 155 + case KVM_REG_ARM64_SVE_FFR_BASE: 156 + TEST_ASSERT(id == KVM_REG_ARM64_SVE_FFR(0), 157 + "Unexpected bits set in SVE FFR id: 0x%llx", id); 158 + return "KVM_REG_ARM64_SVE_FFR(0)"; 159 + } 160 + 161 + return NULL; 162 + } 163 + 164 + static void print_reg(__u64 id) 165 + { 166 + unsigned op0, op1, crn, crm, op2; 167 + const char *reg_size = NULL; 168 + 169 + TEST_ASSERT((id & KVM_REG_ARCH_MASK) == KVM_REG_ARM64, 170 + "KVM_REG_ARM64 missing in reg id: 0x%llx", id); 171 + 172 + switch (id & KVM_REG_SIZE_MASK) { 173 + case KVM_REG_SIZE_U8: 174 + reg_size = "KVM_REG_SIZE_U8"; 175 + break; 176 + case KVM_REG_SIZE_U16: 177 + reg_size = "KVM_REG_SIZE_U16"; 178 + break; 179 + case KVM_REG_SIZE_U32: 180 + reg_size = "KVM_REG_SIZE_U32"; 181 + break; 182 + case KVM_REG_SIZE_U64: 183 + reg_size = "KVM_REG_SIZE_U64"; 184 + break; 185 + case KVM_REG_SIZE_U128: 186 + reg_size = "KVM_REG_SIZE_U128"; 187 + break; 188 + case KVM_REG_SIZE_U256: 189 + reg_size = "KVM_REG_SIZE_U256"; 190 + break; 191 + case KVM_REG_SIZE_U512: 192 + reg_size = "KVM_REG_SIZE_U512"; 193 + break; 194 + case KVM_REG_SIZE_U1024: 195 + reg_size = "KVM_REG_SIZE_U1024"; 196 + break; 197 + case KVM_REG_SIZE_U2048: 198 + reg_size = "KVM_REG_SIZE_U2048"; 199 + break; 200 + default: 201 + TEST_FAIL("Unexpected reg size: 0x%llx in reg id: 0x%llx", 202 + (id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT, id); 203 + } 204 + 205 + switch (id & KVM_REG_ARM_COPROC_MASK) { 206 + case KVM_REG_ARM_CORE: 207 + printf("\tKVM_REG_ARM64 | %s | KVM_REG_ARM_CORE | %s,\n", reg_size, core_id_to_str(id)); 208 + break; 209 + case KVM_REG_ARM_DEMUX: 210 + TEST_ASSERT(!(id & ~(REG_MASK | KVM_REG_ARM_DEMUX_ID_MASK | KVM_REG_ARM_DEMUX_VAL_MASK)), 211 + "Unexpected bits set in DEMUX reg id: 0x%llx", id); 212 + printf("\tKVM_REG_ARM64 | %s | KVM_REG_ARM_DEMUX | KVM_REG_ARM_DEMUX_ID_CCSIDR | %lld,\n", 213 + reg_size, id & KVM_REG_ARM_DEMUX_VAL_MASK); 214 + break; 215 + case KVM_REG_ARM64_SYSREG: 216 + op0 = (id & KVM_REG_ARM64_SYSREG_OP0_MASK) >> KVM_REG_ARM64_SYSREG_OP0_SHIFT; 217 + op1 = (id & KVM_REG_ARM64_SYSREG_OP1_MASK) >> KVM_REG_ARM64_SYSREG_OP1_SHIFT; 218 + crn = (id & KVM_REG_ARM64_SYSREG_CRN_MASK) >> KVM_REG_ARM64_SYSREG_CRN_SHIFT; 219 + crm = (id & KVM_REG_ARM64_SYSREG_CRM_MASK) >> KVM_REG_ARM64_SYSREG_CRM_SHIFT; 220 + op2 = (id & KVM_REG_ARM64_SYSREG_OP2_MASK) >> KVM_REG_ARM64_SYSREG_OP2_SHIFT; 221 + TEST_ASSERT(id == ARM64_SYS_REG(op0, op1, crn, crm, op2), 222 + "Unexpected bits set in SYSREG reg id: 0x%llx", id); 223 + printf("\tARM64_SYS_REG(%d, %d, %d, %d, %d),\n", op0, op1, crn, crm, op2); 224 + break; 225 + case KVM_REG_ARM_FW: 226 + TEST_ASSERT(id == KVM_REG_ARM_FW_REG(id & 0xffff), 227 + "Unexpected bits set in FW reg id: 0x%llx", id); 228 + printf("\tKVM_REG_ARM_FW_REG(%lld),\n", id & 0xffff); 229 + break; 230 + case KVM_REG_ARM64_SVE: 231 + if (reg_list_sve()) 232 + printf("\t%s,\n", sve_id_to_str(id)); 233 + else 234 + TEST_FAIL("KVM_REG_ARM64_SVE is an unexpected coproc type in reg id: 0x%llx", id); 235 + break; 236 + default: 237 + TEST_FAIL("Unexpected coproc type: 0x%llx in reg id: 0x%llx", 238 + (id & KVM_REG_ARM_COPROC_MASK) >> KVM_REG_ARM_COPROC_SHIFT, id); 239 + } 240 + } 241 + 242 + /* 243 + * Older kernels listed each 32-bit word of CORE registers separately. 244 + * For 64 and 128-bit registers we need to ignore the extra words. We 245 + * also need to fixup the sizes, because the older kernels stated all 246 + * registers were 64-bit, even when they weren't.
247 + */ 248 + static void core_reg_fixup(void) 249 + { 250 + struct kvm_reg_list *tmp; 251 + __u64 id, core_off; 252 + int i; 253 + 254 + tmp = calloc(1, sizeof(*tmp) + reg_list->n * sizeof(__u64)); 255 + 256 + for (i = 0; i < reg_list->n; ++i) { 257 + id = reg_list->reg[i]; 258 + 259 + if ((id & KVM_REG_ARM_COPROC_MASK) != KVM_REG_ARM_CORE) { 260 + tmp->reg[tmp->n++] = id; 261 + continue; 262 + } 263 + 264 + core_off = id & ~REG_MASK; 265 + 266 + switch (core_off) { 267 + case 0x52: case 0xd2: case 0xd6: 268 + /* 269 + * These offsets are pointing at padding. 270 + * We need to ignore them too. 271 + */ 272 + continue; 273 + case KVM_REG_ARM_CORE_REG(fp_regs.vregs[0]) ... 274 + KVM_REG_ARM_CORE_REG(fp_regs.vregs[31]): 275 + if (core_off & 3) 276 + continue; 277 + id &= ~KVM_REG_SIZE_MASK; 278 + id |= KVM_REG_SIZE_U128; 279 + tmp->reg[tmp->n++] = id; 280 + continue; 281 + case KVM_REG_ARM_CORE_REG(fp_regs.fpsr): 282 + case KVM_REG_ARM_CORE_REG(fp_regs.fpcr): 283 + id &= ~KVM_REG_SIZE_MASK; 284 + id |= KVM_REG_SIZE_U32; 285 + tmp->reg[tmp->n++] = id; 286 + continue; 287 + default: 288 + if (core_off & 1) 289 + continue; 290 + tmp->reg[tmp->n++] = id; 291 + break; 292 + } 293 + } 294 + 295 + free(reg_list); 296 + reg_list = tmp; 297 + } 298 + 299 + static void prepare_vcpu_init(struct kvm_vcpu_init *init) 300 + { 301 + if (reg_list_sve()) 302 + init->features[0] |= 1 << KVM_ARM_VCPU_SVE; 303 + } 304 + 305 + static void finalize_vcpu(struct kvm_vm *vm, uint32_t vcpuid) 306 + { 307 + int feature; 308 + 309 + if (reg_list_sve()) { 310 + feature = KVM_ARM_VCPU_SVE; 311 + vcpu_ioctl(vm, vcpuid, KVM_ARM_VCPU_FINALIZE, &feature); 312 + } 313 + } 314 + 315 + static void check_supported(void) 316 + { 317 + if (reg_list_sve() && !kvm_check_cap(KVM_CAP_ARM_SVE)) { 318 + fprintf(stderr, "SVE not available, skipping tests\n"); 319 + exit(KSFT_SKIP); 320 + } 321 + } 322 + 323 + int main(int ac, char **av) 324 + { 325 + struct kvm_vcpu_init init = { .target = -1, }; 326 + int new_regs = 0, missing_regs = 0, i;
327 + int failed_get = 0, failed_set = 0, failed_reject = 0; 328 + bool print_list = false, fixup_core_regs = false; 329 + struct kvm_vm *vm; 330 + __u64 *vec_regs; 331 + 332 + check_supported(); 333 + 334 + for (i = 1; i < ac; ++i) { 335 + if (strcmp(av[i], "--core-reg-fixup") == 0) 336 + fixup_core_regs = true; 337 + else if (strcmp(av[i], "--list") == 0) 338 + print_list = true; 339 + else 340 + fprintf(stderr, "Ignoring unknown option: %s\n", av[i]); 341 + } 342 + 343 + vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR); 344 + prepare_vcpu_init(&init); 345 + aarch64_vcpu_add_default(vm, 0, &init, NULL); 346 + finalize_vcpu(vm, 0); 347 + 348 + reg_list = vcpu_get_reg_list(vm, 0); 349 + 350 + if (fixup_core_regs) 351 + core_reg_fixup(); 352 + 353 + if (print_list) { 354 + putchar('\n'); 355 + for_each_reg(i) 356 + print_reg(reg_list->reg[i]); 357 + putchar('\n'); 358 + return 0; 359 + } 360 + 361 + /* 362 + * We only test that we can get the register and then write back the 363 + * same value. Some registers may allow other values to be written 364 + * back, but others only allow some bits to be changed, and at least 365 + * for ID registers set will fail if the value does not exactly match 366 + * what was returned by get. If registers that allow other values to 367 + * be written need to have the other values tested, then we should 368 + * create a new set of tests for those in a new independent test 369 + * executable.
370 + */ 371 + for_each_reg(i) { 372 + uint8_t addr[2048 / 8]; 373 + struct kvm_one_reg reg = { 374 + .id = reg_list->reg[i], 375 + .addr = (__u64)&addr, 376 + }; 377 + int ret; 378 + 379 + ret = _vcpu_ioctl(vm, 0, KVM_GET_ONE_REG, &reg); 380 + if (ret) { 381 + puts("Failed to get "); 382 + print_reg(reg.id); 383 + putchar('\n'); 384 + ++failed_get; 385 + } 386 + 387 + /* rejects_set registers are rejected after KVM_ARM_VCPU_FINALIZE */ 388 + if (find_reg(rejects_set, rejects_set_n, reg.id)) { 389 + ret = _vcpu_ioctl(vm, 0, KVM_SET_ONE_REG, &reg); 390 + if (ret != -1 || errno != EPERM) { 391 + printf("Failed to reject (ret=%d, errno=%d) ", ret, errno); 392 + print_reg(reg.id); 393 + putchar('\n'); 394 + ++failed_reject; 395 + } 396 + continue; 397 + } 398 + 399 + ret = _vcpu_ioctl(vm, 0, KVM_SET_ONE_REG, &reg); 400 + if (ret) { 401 + puts("Failed to set "); 402 + print_reg(reg.id); 403 + putchar('\n'); 404 + ++failed_set; 405 + } 406 + } 407 + 408 + if (reg_list_sve()) { 409 + blessed_n = base_regs_n + sve_regs_n; 410 + vec_regs = sve_regs; 411 + } else { 412 + blessed_n = base_regs_n + vregs_n; 413 + vec_regs = vregs; 414 + } 415 + 416 + blessed_reg = calloc(blessed_n, sizeof(__u64)); 417 + for (i = 0; i < base_regs_n; ++i) 418 + blessed_reg[i] = base_regs[i]; 419 + for (i = 0; i < blessed_n - base_regs_n; ++i) 420 + blessed_reg[base_regs_n + i] = vec_regs[i]; 421 + 422 + for_each_new_reg(i) 423 + ++new_regs; 424 + 425 + for_each_missing_reg(i) 426 + ++missing_regs; 427 + 428 + if (new_regs || missing_regs) { 429 + printf("Number blessed registers: %5lld\n", blessed_n); 430 + printf("Number registers: %5lld\n", reg_list->n); 431 + } 432 + 433 + if (new_regs) { 434 + printf("\nThere are %d new registers.\n" 435 + "Consider adding them to the blessed reg " 436 + "list with the following lines:\n\n", new_regs); 437 + for_each_new_reg(i) 438 + print_reg(reg_list->reg[i]); 439 + putchar('\n'); 440 + } 441 + 442 + if (missing_regs) { 443 + printf("\nThere are %d missing registers.\n" 444 + "The following lines are missing registers:\n\n", missing_regs); 445 + for_each_missing_reg(i) 446 + print_reg(blessed_reg[i]); 447 + putchar('\n'); 448 + } 449 + 450 + TEST_ASSERT(!missing_regs && !failed_get && !failed_set && !failed_reject, 451 + "There are %d missing registers; " 452 + "%d registers failed get; %d registers failed set; %d registers failed reject", 453 + missing_regs, failed_get, failed_set, failed_reject); 454 + 455 + return 0; 456 + } 457 + 458 + /* 459 + * The current blessed list was primed with the output of kernel version 460 + * v4.15 with --core-reg-fixup and then later updated with new registers. 461 + */ 462 + static __u64 base_regs[] = { 463 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[0]), 464 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[1]), 465 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[2]), 466 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[3]), 467 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[4]), 468 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[5]), 469 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[6]), 470 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[7]), 471 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[8]), 472 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[9]), 473 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[10]), 474 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[11]), 475 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[12]), 476 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[13]),
477 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[14]), 478 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[15]), 479 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[16]), 480 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[17]), 481 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[18]), 482 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[19]), 483 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[20]), 484 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[21]), 485 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[22]), 486 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[23]), 487 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[24]), 488 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[25]), 489 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[26]), 490 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[27]), 491 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[28]), 492 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[29]), 493 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[30]), 494 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.sp), 495 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.pc), 496 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.pstate), 497 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(sp_el1),
498 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(elr_el1), 499 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(spsr[0]), 500 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(spsr[1]), 501 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(spsr[2]), 502 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(spsr[3]), 503 + KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(spsr[4]), 504 + KVM_REG_ARM64 | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.fpsr), 505 + KVM_REG_ARM64 | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.fpcr), 506 + KVM_REG_ARM_FW_REG(0), 507 + KVM_REG_ARM_FW_REG(1), 508 + KVM_REG_ARM_FW_REG(2), 509 + ARM64_SYS_REG(3, 3, 14, 3, 1), /* CNTV_CTL_EL0 */ 510 + ARM64_SYS_REG(3, 3, 14, 3, 2), /* CNTV_CVAL_EL0 */ 511 + ARM64_SYS_REG(3, 3, 14, 0, 2), 512 + ARM64_SYS_REG(3, 0, 0, 0, 0), /* MIDR_EL1 */ 513 + ARM64_SYS_REG(3, 0, 0, 0, 6), /* REVIDR_EL1 */ 514 + ARM64_SYS_REG(3, 1, 0, 0, 1), /* CLIDR_EL1 */ 515 + ARM64_SYS_REG(3, 1, 0, 0, 7), /* AIDR_EL1 */ 516 + ARM64_SYS_REG(3, 3, 0, 0, 1), /* CTR_EL0 */ 517 + ARM64_SYS_REG(2, 0, 0, 0, 4), 518 + ARM64_SYS_REG(2, 0, 0, 0, 5), 519 + ARM64_SYS_REG(2, 0, 0, 0, 6), 520 + ARM64_SYS_REG(2, 0, 0, 0, 7), 521 + ARM64_SYS_REG(2, 0, 0, 1, 4), 522 + ARM64_SYS_REG(2, 0, 0, 1, 5), 523 + ARM64_SYS_REG(2, 0, 0, 1, 6), 524 + ARM64_SYS_REG(2, 0, 0, 1, 7), 525 + ARM64_SYS_REG(2, 0, 0, 2, 0), /* MDCCINT_EL1 */ 526 + ARM64_SYS_REG(2, 0, 0, 2, 2), /* MDSCR_EL1 */ 527 + ARM64_SYS_REG(2, 0, 0, 2, 4), 528 + ARM64_SYS_REG(2, 0, 0, 2, 5), 529 + ARM64_SYS_REG(2, 0, 0, 2, 6), 530 + ARM64_SYS_REG(2, 0, 0, 2, 7), 531 + ARM64_SYS_REG(2, 0, 0, 3, 4), 532 + ARM64_SYS_REG(2, 0, 0, 3, 5), 533 + ARM64_SYS_REG(2, 0, 0, 3, 6), 534 + ARM64_SYS_REG(2, 0, 0, 3, 7),
535 + ARM64_SYS_REG(2, 0, 0, 4, 4), 536 + ARM64_SYS_REG(2, 0, 0, 4, 5), 537 + ARM64_SYS_REG(2, 0, 0, 4, 6), 538 + ARM64_SYS_REG(2, 0, 0, 4, 7), 539 + ARM64_SYS_REG(2, 0, 0, 5, 4), 540 + ARM64_SYS_REG(2, 0, 0, 5, 5), 541 + ARM64_SYS_REG(2, 0, 0, 5, 6), 542 + ARM64_SYS_REG(2, 0, 0, 5, 7), 543 + ARM64_SYS_REG(2, 0, 0, 6, 4), 544 + ARM64_SYS_REG(2, 0, 0, 6, 5), 545 + ARM64_SYS_REG(2, 0, 0, 6, 6), 546 + ARM64_SYS_REG(2, 0, 0, 6, 7), 547 + ARM64_SYS_REG(2, 0, 0, 7, 4), 548 + ARM64_SYS_REG(2, 0, 0, 7, 5), 549 + ARM64_SYS_REG(2, 0, 0, 7, 6), 550 + ARM64_SYS_REG(2, 0, 0, 7, 7), 551 + ARM64_SYS_REG(2, 0, 0, 8, 4), 552 + ARM64_SYS_REG(2, 0, 0, 8, 5), 553 + ARM64_SYS_REG(2, 0, 0, 8, 6), 554 + ARM64_SYS_REG(2, 0, 0, 8, 7), 555 + ARM64_SYS_REG(2, 0, 0, 9, 4), 556 + ARM64_SYS_REG(2, 0, 0, 9, 5), 557 + ARM64_SYS_REG(2, 0, 0, 9, 6), 558 + ARM64_SYS_REG(2, 0, 0, 9, 7), 559 + ARM64_SYS_REG(2, 0, 0, 10, 4), 560 + ARM64_SYS_REG(2, 0, 0, 10, 5), 561 + ARM64_SYS_REG(2, 0, 0, 10, 6), 562 + ARM64_SYS_REG(2, 0, 0, 10, 7), 563 + ARM64_SYS_REG(2, 0, 0, 11, 4), 564 + ARM64_SYS_REG(2, 0, 0, 11, 5), 565 + ARM64_SYS_REG(2, 0, 0, 11, 6), 566 + ARM64_SYS_REG(2, 0, 0, 11, 7), 567 + ARM64_SYS_REG(2, 0, 0, 12, 4), 568 + ARM64_SYS_REG(2, 0, 0, 12, 5), 569 + ARM64_SYS_REG(2, 0, 0, 12, 6), 570 + ARM64_SYS_REG(2, 0, 0, 12, 7), 571 + ARM64_SYS_REG(2, 0, 0, 13, 4), 572 + ARM64_SYS_REG(2, 0, 0, 13, 5), 573 + ARM64_SYS_REG(2, 0, 0, 13, 6), 574 + ARM64_SYS_REG(2, 0, 0, 13, 7), 575 + ARM64_SYS_REG(2, 0, 0, 14, 4), 576 + ARM64_SYS_REG(2, 0, 0, 14, 5), 577 + ARM64_SYS_REG(2, 0, 0, 14, 6), 578 + ARM64_SYS_REG(2, 0, 0, 14, 7), 579 + ARM64_SYS_REG(2, 0, 0, 15, 4), 580 + ARM64_SYS_REG(2, 0, 0, 15, 5), 581 + ARM64_SYS_REG(2, 0, 0, 15, 6), 582 + ARM64_SYS_REG(2, 0, 0, 15, 7), 583 + ARM64_SYS_REG(2, 4, 0, 7, 0), /* DBGVCR32_EL2 */ 584 + ARM64_SYS_REG(3, 0, 0, 0, 5), /* MPIDR_EL1 */ 585 + ARM64_SYS_REG(3, 0, 0, 1, 0), /* ID_PFR0_EL1 */ 586 + ARM64_SYS_REG(3, 0, 0, 1, 1), /* ID_PFR1_EL1 */ 587 + ARM64_SYS_REG(3, 0, 0, 1, 2), /* ID_DFR0_EL1 */
588 + ARM64_SYS_REG(3, 0, 0, 1, 3), /* ID_AFR0_EL1 */ 589 + ARM64_SYS_REG(3, 0, 0, 1, 4), /* ID_MMFR0_EL1 */ 590 + ARM64_SYS_REG(3, 0, 0, 1, 5), /* ID_MMFR1_EL1 */ 591 + ARM64_SYS_REG(3, 0, 0, 1, 6), /* ID_MMFR2_EL1 */ 592 + ARM64_SYS_REG(3, 0, 0, 1, 7), /* ID_MMFR3_EL1 */ 593 + ARM64_SYS_REG(3, 0, 0, 2, 0), /* ID_ISAR0_EL1 */ 594 + ARM64_SYS_REG(3, 0, 0, 2, 1), /* ID_ISAR1_EL1 */ 595 + ARM64_SYS_REG(3, 0, 0, 2, 2), /* ID_ISAR2_EL1 */ 596 + ARM64_SYS_REG(3, 0, 0, 2, 3), /* ID_ISAR3_EL1 */ 597 + ARM64_SYS_REG(3, 0, 0, 2, 4), /* ID_ISAR4_EL1 */ 598 + ARM64_SYS_REG(3, 0, 0, 2, 5), /* ID_ISAR5_EL1 */ 599 + ARM64_SYS_REG(3, 0, 0, 2, 6), /* ID_MMFR4_EL1 */ 600 + ARM64_SYS_REG(3, 0, 0, 2, 7), /* ID_ISAR6_EL1 */ 601 + ARM64_SYS_REG(3, 0, 0, 3, 0), /* MVFR0_EL1 */ 602 + ARM64_SYS_REG(3, 0, 0, 3, 1), /* MVFR1_EL1 */ 603 + ARM64_SYS_REG(3, 0, 0, 3, 2), /* MVFR2_EL1 */ 604 + ARM64_SYS_REG(3, 0, 0, 3, 3), 605 + ARM64_SYS_REG(3, 0, 0, 3, 4), /* ID_PFR2_EL1 */ 606 + ARM64_SYS_REG(3, 0, 0, 3, 5), /* ID_DFR1_EL1 */ 607 + ARM64_SYS_REG(3, 0, 0, 3, 6), /* ID_MMFR5_EL1 */ 608 + ARM64_SYS_REG(3, 0, 0, 3, 7), 609 + ARM64_SYS_REG(3, 0, 0, 4, 0), /* ID_AA64PFR0_EL1 */ 610 + ARM64_SYS_REG(3, 0, 0, 4, 1), /* ID_AA64PFR1_EL1 */ 611 + ARM64_SYS_REG(3, 0, 0, 4, 2), 612 + ARM64_SYS_REG(3, 0, 0, 4, 3), 613 + ARM64_SYS_REG(3, 0, 0, 4, 4), /* ID_AA64ZFR0_EL1 */ 614 + ARM64_SYS_REG(3, 0, 0, 4, 5), 615 + ARM64_SYS_REG(3, 0, 0, 4, 6), 616 + ARM64_SYS_REG(3, 0, 0, 4, 7), 617 + ARM64_SYS_REG(3, 0, 0, 5, 0), /* ID_AA64DFR0_EL1 */ 618 + ARM64_SYS_REG(3, 0, 0, 5, 1), /* ID_AA64DFR1_EL1 */ 619 + ARM64_SYS_REG(3, 0, 0, 5, 2), 620 + ARM64_SYS_REG(3, 0, 0, 5, 3), 621 + ARM64_SYS_REG(3, 0, 0, 5, 4), /* ID_AA64AFR0_EL1 */ 622 + ARM64_SYS_REG(3, 0, 0, 5, 5), /* ID_AA64AFR1_EL1 */ 623 + ARM64_SYS_REG(3, 0, 0, 5, 6), 624 + ARM64_SYS_REG(3, 0, 0, 5, 7), 625 + ARM64_SYS_REG(3, 0, 0, 6, 0), /* ID_AA64ISAR0_EL1 */ 626 + ARM64_SYS_REG(3, 0, 0, 6, 1), /* ID_AA64ISAR1_EL1 */ 627 + ARM64_SYS_REG(3, 0, 0, 6, 2),
628 + ARM64_SYS_REG(3, 0, 0, 6, 3), 629 + ARM64_SYS_REG(3, 0, 0, 6, 4), 630 + ARM64_SYS_REG(3, 0, 0, 6, 5), 631 + ARM64_SYS_REG(3, 0, 0, 6, 6), 632 + ARM64_SYS_REG(3, 0, 0, 6, 7), 633 + ARM64_SYS_REG(3, 0, 0, 7, 0), /* ID_AA64MMFR0_EL1 */ 634 + ARM64_SYS_REG(3, 0, 0, 7, 1), /* ID_AA64MMFR1_EL1 */ 635 + ARM64_SYS_REG(3, 0, 0, 7, 2), /* ID_AA64MMFR2_EL1 */ 636 + ARM64_SYS_REG(3, 0, 0, 7, 3), 637 + ARM64_SYS_REG(3, 0, 0, 7, 4), 638 + ARM64_SYS_REG(3, 0, 0, 7, 5), 639 + ARM64_SYS_REG(3, 0, 0, 7, 6), 640 + ARM64_SYS_REG(3, 0, 0, 7, 7), 641 + ARM64_SYS_REG(3, 0, 1, 0, 0), /* SCTLR_EL1 */ 642 + ARM64_SYS_REG(3, 0, 1, 0, 1), /* ACTLR_EL1 */ 643 + ARM64_SYS_REG(3, 0, 1, 0, 2), /* CPACR_EL1 */ 644 + ARM64_SYS_REG(3, 0, 2, 0, 0), /* TTBR0_EL1 */ 645 + ARM64_SYS_REG(3, 0, 2, 0, 1), /* TTBR1_EL1 */ 646 + ARM64_SYS_REG(3, 0, 2, 0, 2), /* TCR_EL1 */ 647 + ARM64_SYS_REG(3, 0, 5, 1, 0), /* AFSR0_EL1 */ 648 + ARM64_SYS_REG(3, 0, 5, 1, 1), /* AFSR1_EL1 */ 649 + ARM64_SYS_REG(3, 0, 5, 2, 0), /* ESR_EL1 */ 650 + ARM64_SYS_REG(3, 0, 6, 0, 0), /* FAR_EL1 */ 651 + ARM64_SYS_REG(3, 0, 7, 4, 0), /* PAR_EL1 */ 652 + ARM64_SYS_REG(3, 0, 9, 14, 1), /* PMINTENSET_EL1 */ 653 + ARM64_SYS_REG(3, 0, 9, 14, 2), /* PMINTENCLR_EL1 */ 654 + ARM64_SYS_REG(3, 0, 10, 2, 0), /* MAIR_EL1 */ 655 + ARM64_SYS_REG(3, 0, 10, 3, 0), /* AMAIR_EL1 */ 656 + ARM64_SYS_REG(3, 0, 12, 0, 0), /* VBAR_EL1 */ 657 + ARM64_SYS_REG(3, 0, 12, 1, 1), /* DISR_EL1 */ 658 + ARM64_SYS_REG(3, 0, 13, 0, 1), /* CONTEXTIDR_EL1 */ 659 + ARM64_SYS_REG(3, 0, 13, 0, 4), /* TPIDR_EL1 */ 660 + ARM64_SYS_REG(3, 0, 14, 1, 0), /* CNTKCTL_EL1 */ 661 + ARM64_SYS_REG(3, 2, 0, 0, 0), /* CSSELR_EL1 */ 662 + ARM64_SYS_REG(3, 3, 9, 12, 0), /* PMCR_EL0 */ 663 + ARM64_SYS_REG(3, 3, 9, 12, 1), /* PMCNTENSET_EL0 */ 664 + ARM64_SYS_REG(3, 3, 9, 12, 2), /* PMCNTENCLR_EL0 */ 665 + ARM64_SYS_REG(3, 3, 9, 12, 3), /* PMOVSCLR_EL0 */ 666 + ARM64_SYS_REG(3, 3, 9, 12, 4), /* PMSWINC_EL0 */ 667 + ARM64_SYS_REG(3, 3, 9, 12, 5), /* PMSELR_EL0 */ 668 + ARM64_SYS_REG(3, 3, 9, 13, 0), /* PMCCNTR_EL0 */
669 + ARM64_SYS_REG(3, 3, 9, 14, 0), /* PMUSERENR_EL0 */ 670 + ARM64_SYS_REG(3, 3, 9, 14, 3), /* PMOVSSET_EL0 */ 671 + ARM64_SYS_REG(3, 3, 13, 0, 2), /* TPIDR_EL0 */ 672 + ARM64_SYS_REG(3, 3, 13, 0, 3), /* TPIDRRO_EL0 */ 673 + ARM64_SYS_REG(3, 3, 14, 8, 0), 674 + ARM64_SYS_REG(3, 3, 14, 8, 1), 675 + ARM64_SYS_REG(3, 3, 14, 8, 2), 676 + ARM64_SYS_REG(3, 3, 14, 8, 3), 677 + ARM64_SYS_REG(3, 3, 14, 8, 4), 678 + ARM64_SYS_REG(3, 3, 14, 8, 5), 679 + ARM64_SYS_REG(3, 3, 14, 8, 6), 680 + ARM64_SYS_REG(3, 3, 14, 8, 7), 681 + ARM64_SYS_REG(3, 3, 14, 9, 0), 682 + ARM64_SYS_REG(3, 3, 14, 9, 1), 683 + ARM64_SYS_REG(3, 3, 14, 9, 2), 684 + ARM64_SYS_REG(3, 3, 14, 9, 3), 685 + ARM64_SYS_REG(3, 3, 14, 9, 4), 686 + ARM64_SYS_REG(3, 3, 14, 9, 5), 687 + ARM64_SYS_REG(3, 3, 14, 9, 6), 688 + ARM64_SYS_REG(3, 3, 14, 9, 7), 689 + ARM64_SYS_REG(3, 3, 14, 10, 0), 690 + ARM64_SYS_REG(3, 3, 14, 10, 1), 691 + ARM64_SYS_REG(3, 3, 14, 10, 2), 692 + ARM64_SYS_REG(3, 3, 14, 10, 3), 693 + ARM64_SYS_REG(3, 3, 14, 10, 4), 694 + ARM64_SYS_REG(3, 3, 14, 10, 5), 695 + ARM64_SYS_REG(3, 3, 14, 10, 6), 696 + ARM64_SYS_REG(3, 3, 14, 10, 7), 697 + ARM64_SYS_REG(3, 3, 14, 11, 0), 698 + ARM64_SYS_REG(3, 3, 14, 11, 1), 699 + ARM64_SYS_REG(3, 3, 14, 11, 2), 700 + ARM64_SYS_REG(3, 3, 14, 11, 3), 701 + ARM64_SYS_REG(3, 3, 14, 11, 4), 702 + ARM64_SYS_REG(3, 3, 14, 11, 5), 703 + ARM64_SYS_REG(3, 3, 14, 11, 6), 704 + ARM64_SYS_REG(3, 3, 14, 12, 0), 705 + ARM64_SYS_REG(3, 3, 14, 12, 1), 706 + ARM64_SYS_REG(3, 3, 14, 12, 2), 707 + ARM64_SYS_REG(3, 3, 14, 12, 3), 708 + ARM64_SYS_REG(3, 3, 14, 12, 4), 709 + ARM64_SYS_REG(3, 3, 14, 12, 5), 710 + ARM64_SYS_REG(3, 3, 14, 12, 6), 711 + ARM64_SYS_REG(3, 3, 14, 12, 7), 712 + ARM64_SYS_REG(3, 3, 14, 13, 0), 713 + ARM64_SYS_REG(3, 3, 14, 13, 1), 714 + ARM64_SYS_REG(3, 3, 14, 13, 2), 715 + ARM64_SYS_REG(3, 3, 14, 13, 3), 716 + ARM64_SYS_REG(3, 3, 14, 13, 4), 717 + ARM64_SYS_REG(3, 3, 14, 13, 5), 718 + ARM64_SYS_REG(3, 3, 14, 13, 6),
719 + ARM64_SYS_REG(3, 3, 14, 13, 7), 720 + ARM64_SYS_REG(3, 3, 14, 14, 0), 721 + ARM64_SYS_REG(3, 3, 14, 14, 1), 722 + ARM64_SYS_REG(3, 3, 14, 14, 2), 723 + ARM64_SYS_REG(3, 3, 14, 14, 3), 724 + ARM64_SYS_REG(3, 3, 14, 14, 4), 725 + ARM64_SYS_REG(3, 3, 14, 14, 5), 726 + ARM64_SYS_REG(3, 3, 14, 14, 6), 727 + ARM64_SYS_REG(3, 3, 14, 14, 7), 728 + ARM64_SYS_REG(3, 3, 14, 15, 0), 729 + ARM64_SYS_REG(3, 3, 14, 15, 1), 730 + ARM64_SYS_REG(3, 3, 14, 15, 2), 731 + ARM64_SYS_REG(3, 3, 14, 15, 3), 732 + ARM64_SYS_REG(3, 3, 14, 15, 4), 733 + ARM64_SYS_REG(3, 3, 14, 15, 5), 734 + ARM64_SYS_REG(3, 3, 14, 15, 6), 735 + ARM64_SYS_REG(3, 3, 14, 15, 7), /* PMCCFILTR_EL0 */ 736 + ARM64_SYS_REG(3, 4, 3, 0, 0), /* DACR32_EL2 */ 737 + ARM64_SYS_REG(3, 4, 5, 0, 1), /* IFSR32_EL2 */ 738 + ARM64_SYS_REG(3, 4, 5, 3, 0), /* FPEXC32_EL2 */ 739 + KVM_REG_ARM64 | KVM_REG_SIZE_U32 | KVM_REG_ARM_DEMUX | KVM_REG_ARM_DEMUX_ID_CCSIDR | 0, 740 + KVM_REG_ARM64 | KVM_REG_SIZE_U32 | KVM_REG_ARM_DEMUX | KVM_REG_ARM_DEMUX_ID_CCSIDR | 1, 741 + KVM_REG_ARM64 | KVM_REG_SIZE_U32 | KVM_REG_ARM_DEMUX | KVM_REG_ARM_DEMUX_ID_CCSIDR | 2, 742 + }; 743 + static __u64 base_regs_n = ARRAY_SIZE(base_regs); 744 + 745 + static __u64 vregs[] = { 746 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[0]), 747 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[1]), 748 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[2]), 749 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[3]), 750 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[4]), 751 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[5]), 752 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[6]), 753 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[7]),
754 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[8]), 755 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[9]), 756 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[10]), 757 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[11]), 758 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[12]), 759 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[13]), 760 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[14]), 761 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[15]), 762 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[16]), 763 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[17]), 764 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[18]), 765 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[19]), 766 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[20]), 767 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[21]), 768 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[22]), 769 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[23]), 770 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[24]), 771 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[25]), 772 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[26]), 773 + KVM_REG_ARM64 |
KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[27]), 774 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[28]), 775 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[29]), 776 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[30]), 777 + KVM_REG_ARM64 | KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(fp_regs.vregs[31]), 778 + }; 779 + static __u64 vregs_n = ARRAY_SIZE(vregs); 780 + 781 + static __u64 sve_regs[] = { 782 + KVM_REG_ARM64_SVE_VLS, 783 + KVM_REG_ARM64_SVE_ZREG(0, 0), 784 + KVM_REG_ARM64_SVE_ZREG(1, 0), 785 + KVM_REG_ARM64_SVE_ZREG(2, 0), 786 + KVM_REG_ARM64_SVE_ZREG(3, 0), 787 + KVM_REG_ARM64_SVE_ZREG(4, 0), 788 + KVM_REG_ARM64_SVE_ZREG(5, 0), 789 + KVM_REG_ARM64_SVE_ZREG(6, 0), 790 + KVM_REG_ARM64_SVE_ZREG(7, 0), 791 + KVM_REG_ARM64_SVE_ZREG(8, 0), 792 + KVM_REG_ARM64_SVE_ZREG(9, 0), 793 + KVM_REG_ARM64_SVE_ZREG(10, 0), 794 + KVM_REG_ARM64_SVE_ZREG(11, 0), 795 + KVM_REG_ARM64_SVE_ZREG(12, 0), 796 + KVM_REG_ARM64_SVE_ZREG(13, 0), 797 + KVM_REG_ARM64_SVE_ZREG(14, 0), 798 + KVM_REG_ARM64_SVE_ZREG(15, 0), 799 + KVM_REG_ARM64_SVE_ZREG(16, 0), 800 + KVM_REG_ARM64_SVE_ZREG(17, 0), 801 + KVM_REG_ARM64_SVE_ZREG(18, 0), 802 + KVM_REG_ARM64_SVE_ZREG(19, 0), 803 + KVM_REG_ARM64_SVE_ZREG(20, 0), 804 + KVM_REG_ARM64_SVE_ZREG(21, 0), 805 + KVM_REG_ARM64_SVE_ZREG(22, 0), 806 + KVM_REG_ARM64_SVE_ZREG(23, 0), 807 + KVM_REG_ARM64_SVE_ZREG(24, 0), 808 + KVM_REG_ARM64_SVE_ZREG(25, 0), 809 + KVM_REG_ARM64_SVE_ZREG(26, 0), 810 + KVM_REG_ARM64_SVE_ZREG(27, 0), 811 + KVM_REG_ARM64_SVE_ZREG(28, 0), 812 + KVM_REG_ARM64_SVE_ZREG(29, 0), 813 + KVM_REG_ARM64_SVE_ZREG(30, 0), 814 + KVM_REG_ARM64_SVE_ZREG(31, 0), 815 + KVM_REG_ARM64_SVE_PREG(0, 0), 816 + KVM_REG_ARM64_SVE_PREG(1, 0), 817 + KVM_REG_ARM64_SVE_PREG(2, 0), 818 + KVM_REG_ARM64_SVE_PREG(3, 0), 819 + KVM_REG_ARM64_SVE_PREG(4, 0), 820 + KVM_REG_ARM64_SVE_PREG(5, 
0), 821 + KVM_REG_ARM64_SVE_PREG(6, 0), 822 + KVM_REG_ARM64_SVE_PREG(7, 0), 823 + KVM_REG_ARM64_SVE_PREG(8, 0), 824 + KVM_REG_ARM64_SVE_PREG(9, 0), 825 + KVM_REG_ARM64_SVE_PREG(10, 0), 826 + KVM_REG_ARM64_SVE_PREG(11, 0), 827 + KVM_REG_ARM64_SVE_PREG(12, 0), 828 + KVM_REG_ARM64_SVE_PREG(13, 0), 829 + KVM_REG_ARM64_SVE_PREG(14, 0), 830 + KVM_REG_ARM64_SVE_PREG(15, 0), 831 + KVM_REG_ARM64_SVE_FFR(0), 832 + ARM64_SYS_REG(3, 0, 1, 2, 0), /* ZCR_EL1 */ 833 + }; 834 + static __u64 sve_regs_n = ARRAY_SIZE(sve_regs); 835 + 836 + static __u64 rejects_set[] = { 837 + #ifdef REG_LIST_SVE 838 + KVM_REG_ARM64_SVE_VLS, 839 + #endif 840 + }; 841 + static __u64 rejects_set_n = ARRAY_SIZE(rejects_set);
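The register lists above are built from an `ARM64_SYS_REG(op0, op1, crn, crm, op2)` macro. As a rough sketch of how those five system-register operands pack into the low bits of a KVM register ID — the shift values below follow the usual aarch64 KVM uapi layout (op0 at bit 14, op1 at 11, CRn at 7, CRm at 3, op2 at 0), but treat this as an illustration rather than a copy of the header:

```python
# Illustrative packing of ARM64_SYS_REG(op0, op1, crn, crm, op2) operands
# into a single sysreg index. Shift values are assumptions mirroring the
# aarch64 KVM uapi layout, not taken verbatim from the kernel headers.
OP0_SHIFT, OP1_SHIFT, CRN_SHIFT, CRM_SHIFT, OP2_SHIFT = 14, 11, 7, 3, 0

def sysreg_index(op0, op1, crn, crm, op2):
    """Pack the five system-register operands into one index."""
    return (op0 << OP0_SHIFT | op1 << OP1_SHIFT |
            crn << CRN_SHIFT | crm << CRM_SHIFT | op2 << OP2_SHIFT)

# TTBR0_EL1 is listed above as ARM64_SYS_REG(3, 0, 2, 0, 0):
print(hex(sysreg_index(3, 0, 2, 0, 0)))
```

Because each operand occupies a disjoint bit field, every distinct (op0, op1, CRn, CRm, op2) tuple in the lists above yields a distinct index, which is what lets the selftest compare the blessed lists against KVM_GET_REG_LIST output.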
-6
tools/testing/selftests/kvm/clear_dirty_log_test.c
···
1 - #define USE_CLEAR_DIRTY_LOG
2 - #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0)
3 - #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
4 - #define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
5 - 				    KVM_DIRTY_LOG_INITIALLY_SET)
6 - #include "dirty_log_test.c"
+53 -216
tools/testing/selftests/kvm/demand_paging_test.c
···
21 21 #include <linux/bitops.h>
22 22 #include <linux/userfaultfd.h>
23 23
24 - #include "test_util.h"
25 - #include "kvm_util.h"
24 + #include "perf_test_util.h"
26 25 #include "processor.h"
26 + #include "test_util.h"
27 27
28 28 #ifdef __NR_userfaultfd
29 -
30 - /* The memory slot index demand page */
31 - #define TEST_MEM_SLOT_INDEX 1
32 -
33 - /* Default guest test virtual memory offset */
34 - #define DEFAULT_GUEST_TEST_MEM 0xc0000000
35 -
36 - #define DEFAULT_GUEST_TEST_MEM_SIZE (1 << 30) /* 1G */
37 29
38 30 #ifdef PRINT_PER_PAGE_UPDATES
39 31 #define PER_PAGE_DEBUG(...) printf(__VA_ARGS__)
···
39 47 #define PER_VCPU_DEBUG(...) _no_printf(__VA_ARGS__)
40 48 #endif
41 49
42 - #define MAX_VCPUS 512
43 -
44 - /*
45 -  * Guest/Host shared variables. Ensure addr_gva2hva() and/or
46 -  * sync_global_to/from_guest() are used when accessing from
47 -  * the host. READ/WRITE_ONCE() should also be used with anything
48 -  * that may change.
49 -  */
50 - static uint64_t host_page_size;
51 - static uint64_t guest_page_size;
52 -
53 50 static char *guest_data_prototype;
54 -
55 - /*
56 -  * Guest physical memory offset of the testing memory slot.
57 -  * This will be set to the topmost valid physical address minus
58 -  * the test memory size.
59 -  */
60 - static uint64_t guest_test_phys_mem;
61 -
62 - /*
63 -  * Guest virtual memory offset of the testing memory slot.
64 -  * Must not conflict with identity mapped test code.
65 -  */
66 - static uint64_t guest_test_virt_mem = DEFAULT_GUEST_TEST_MEM;
67 -
68 - struct vcpu_args {
69 - 	uint64_t gva;
70 - 	uint64_t pages;
71 -
72 - 	/* Only used by the host userspace part of the vCPU thread */
73 - 	int vcpu_id;
74 - 	struct kvm_vm *vm;
75 - };
76 -
77 - static struct vcpu_args vcpu_args[MAX_VCPUS];
78 -
79 - /*
80 -  * Continuously write to the first 8 bytes of each page in the demand paging
81 -  * memory region.
82 -  */
83 - static void guest_code(uint32_t vcpu_id)
84 - {
85 - 	uint64_t gva;
86 - 	uint64_t pages;
87 - 	int i;
88 -
89 - 	/* Make sure vCPU args data structure is not corrupt. */
90 - 	GUEST_ASSERT(vcpu_args[vcpu_id].vcpu_id == vcpu_id);
91 -
92 - 	gva = vcpu_args[vcpu_id].gva;
93 - 	pages = vcpu_args[vcpu_id].pages;
94 -
95 - 	for (i = 0; i < pages; i++) {
96 - 		uint64_t addr = gva + (i * guest_page_size);
97 -
98 - 		addr &= ~(host_page_size - 1);
99 - 		*(uint64_t *)addr = 0x0123456789ABCDEF;
100 - 	}
101 -
102 - 	GUEST_SYNC(1);
103 - }
104 51
105 52 static void *vcpu_worker(void *data)
106 53 {
107 54 	int ret;
108 - 	struct vcpu_args *args = (struct vcpu_args *)data;
109 - 	struct kvm_vm *vm = args->vm;
110 - 	int vcpu_id = args->vcpu_id;
55 + 	struct vcpu_args *vcpu_args = (struct vcpu_args *)data;
56 + 	int vcpu_id = vcpu_args->vcpu_id;
57 + 	struct kvm_vm *vm = perf_test_args.vm;
111 58 	struct kvm_run *run;
112 - 	struct timespec start, end, ts_diff;
59 + 	struct timespec start;
60 + 	struct timespec ts_diff;
113 61
114 62 	vcpu_args_set(vm, vcpu_id, 1, vcpu_id);
115 63 	run = vcpu_state(vm, vcpu_id);
···
65 133 			    exit_reason_str(run->exit_reason));
66 134 	}
67 135
68 - 	clock_gettime(CLOCK_MONOTONIC, &end);
69 - 	ts_diff = timespec_sub(end, start);
136 + 	ts_diff = timespec_diff_now(start);
70 137 	PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds\n", vcpu_id,
71 138 		       ts_diff.tv_sec, ts_diff.tv_nsec);
72 139
73 140 	return NULL;
74 141 }
75 142
76 - #define PAGE_SHIFT_4K  12
77 - #define PTES_PER_4K_PT 512
78 -
79 - static struct kvm_vm *create_vm(enum vm_guest_mode mode, int vcpus,
80 - 				uint64_t vcpu_memory_bytes)
81 - {
82 - 	struct kvm_vm *vm;
83 - 	uint64_t pages = DEFAULT_GUEST_PHY_PAGES;
84 -
85 - 	/* Account for a few pages per-vCPU for stacks */
86 - 	pages += DEFAULT_STACK_PGS * vcpus;
87 -
88 - 	/*
89 - 	 * Reserve twice the ammount of memory needed to map the test region and
90 - 	 * the page table / stacks region, at 4k, for page tables. Do the
91 - 	 * calculation with 4K page size: the smallest of all archs. (e.g., 64K
92 - 	 * page size guest will need even less memory for page tables).
93 - 	 */
94 - 	pages += (2 * pages) / PTES_PER_4K_PT;
95 - 	pages += ((2 * vcpus * vcpu_memory_bytes) >> PAGE_SHIFT_4K) /
96 - 		 PTES_PER_4K_PT;
97 - 	pages = vm_adjust_num_guest_pages(mode, pages);
98 -
99 - 	pr_info("Testing guest mode: %s\n", vm_guest_mode_string(mode));
100 -
101 - 	vm = _vm_create(mode, pages, O_RDWR);
102 - 	kvm_vm_elf_load(vm, program_invocation_name, 0, 0);
103 - #ifdef __x86_64__
104 - 	vm_create_irqchip(vm);
105 - #endif
106 - 	return vm;
107 - }
108 -
109 143 static int handle_uffd_page_request(int uffd, uint64_t addr)
110 144 {
111 145 	pid_t tid;
112 146 	struct timespec start;
113 - 	struct timespec end;
147 + 	struct timespec ts_diff;
114 148 	struct uffdio_copy copy;
115 149 	int r;
···
84 186
85 187 	copy.src = (uint64_t)guest_data_prototype;
86 188 	copy.dst = addr;
87 - 	copy.len = host_page_size;
189 + 	copy.len = perf_test_args.host_page_size;
88 190 	copy.mode = 0;
89 191
90 192 	clock_gettime(CLOCK_MONOTONIC, &start);
···
96 198 		return r;
97 199 	}
98 200
99 - 	clock_gettime(CLOCK_MONOTONIC, &end);
201 + 	ts_diff = timespec_diff_now(start);
100 202
101 203 	PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
102 - 		       timespec_to_ns(timespec_sub(end, start)));
204 + 		       timespec_to_ns(ts_diff));
103 205 	PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
104 - 		       host_page_size, addr, tid);
206 + 		       perf_test_args.host_page_size, addr, tid);
105 207
106 208 	return 0;
107 209 }
···
121 223 	int pipefd = uffd_args->pipefd;
122 224 	useconds_t delay = uffd_args->delay;
123 225 	int64_t pages = 0;
124 - 	struct timespec start, end, ts_diff;
226 + 	struct timespec start;
227 + 	struct timespec ts_diff;
125 228
126 229 	clock_gettime(CLOCK_MONOTONIC, &start);
127 230 	while (!quit_uffd_thread) {
···
191 292 		pages++;
192 293 	}
193 294
194 - 	clock_gettime(CLOCK_MONOTONIC, &end);
195 - 	ts_diff = timespec_sub(end, start);
295 + 	ts_diff = timespec_diff_now(start);
196 296 	PER_VCPU_DEBUG("userfaulted %ld pages over %ld.%.9lds. (%f/sec)\n",
197 297 		       pages, ts_diff.tv_sec, ts_diff.tv_nsec,
198 298 		       pages / ((double)ts_diff.tv_sec + (double)ts_diff.tv_nsec / 100000000.0));
···
249 351 	}
250 352 }
251 353
252 354 static void run_test(enum vm_guest_mode mode, bool use_uffd,
253 - 		     useconds_t uffd_delay, int vcpus,
254 - 		     uint64_t vcpu_memory_bytes)
355 + 		     useconds_t uffd_delay)
255 356 {
256 357 	pthread_t *vcpu_threads;
257 358 	pthread_t *uffd_handler_threads = NULL;
258 359 	struct uffd_handler_args *uffd_args = NULL;
259 - 	struct timespec start, end, ts_diff;
360 + 	struct timespec start;
361 + 	struct timespec ts_diff;
260 362 	int *pipefds = NULL;
261 363 	struct kvm_vm *vm;
262 - 	uint64_t guest_num_pages;
263 364 	int vcpu_id;
264 365 	int r;
265 -
266 - 	vm = create_vm(mode, vcpus, vcpu_memory_bytes);
366 + 	vm = create_vm(mode, nr_vcpus, guest_percpu_mem_size);
267 -
268 - 	guest_page_size = vm_get_page_size(vm);
368 + 	perf_test_args.wr_fract = 1;
269 -
270 - 	TEST_ASSERT(vcpu_memory_bytes % guest_page_size == 0,
271 - 		    "Guest memory size is not guest page size aligned.");
272 -
273 - 	guest_num_pages = (vcpus * vcpu_memory_bytes) / guest_page_size;
274 - 	guest_num_pages = vm_adjust_num_guest_pages(mode, guest_num_pages);
275 -
276 - 	/*
277 - 	 * If there should be more memory in the guest test region than there
278 - 	 * can be pages in the guest, it will definitely cause problems.
279 - 	 */
280 - 	TEST_ASSERT(guest_num_pages < vm_get_max_gfn(vm),
281 - 		    "Requested more guest memory than address space allows.\n"
282 - 		    "    guest pages: %lx max gfn: %x vcpus: %d wss: %lx]\n",
283 - 		    guest_num_pages, vm_get_max_gfn(vm), vcpus,
284 - 		    vcpu_memory_bytes);
285 -
286 - 	host_page_size = getpagesize();
287 - 	TEST_ASSERT(vcpu_memory_bytes % host_page_size == 0,
288 - 		    "Guest memory size is not host page size aligned.");
289 -
290 - 	guest_test_phys_mem = (vm_get_max_gfn(vm) - guest_num_pages) *
291 - 			      guest_page_size;
292 - 	guest_test_phys_mem &= ~(host_page_size - 1);
293 -
294 - #ifdef __s390x__
295 - 	/* Align to 1M (segment size) */
296 - 	guest_test_phys_mem &= ~((1 << 20) - 1);
297 - #endif
298 -
299 - 	pr_info("guest physical test memory offset: 0x%lx\n", guest_test_phys_mem);
300 -
301 - 	/* Add an extra memory slot for testing demand paging */
302 - 	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
303 - 				    guest_test_phys_mem,
304 - 				    TEST_MEM_SLOT_INDEX,
305 - 				    guest_num_pages, 0);
306 -
307 - 	/* Do mapping for the demand paging memory slot */
308 - 	virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, guest_num_pages, 0);
309 -
310 - 	ucall_init(vm, NULL);
311 -
312 - 	guest_data_prototype = malloc(host_page_size);
370 + 	guest_data_prototype = malloc(perf_test_args.host_page_size);
313 372 	TEST_ASSERT(guest_data_prototype,
314 373 		    "Failed to allocate buffer for guest data pattern");
315 - 	memset(guest_data_prototype, 0xAB, host_page_size);
373 + 	memset(guest_data_prototype, 0xAB, perf_test_args.host_page_size);
316 - 	vcpu_threads = malloc(vcpus * sizeof(*vcpu_threads));
375 + 	vcpu_threads = malloc(nr_vcpus * sizeof(*vcpu_threads));
317 376 	TEST_ASSERT(vcpu_threads, "Memory allocation failed");
377 +
378 + 	add_vcpus(vm, nr_vcpus, guest_percpu_mem_size);
318 379
319 380 	if (use_uffd) {
320 381 		uffd_handler_threads =
321 - 			malloc(vcpus * sizeof(*uffd_handler_threads));
382 + 			malloc(nr_vcpus * sizeof(*uffd_handler_threads));
322 383 		TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
323 384
324 - 		uffd_args = malloc(vcpus * sizeof(*uffd_args));
385 + 		uffd_args = malloc(nr_vcpus * sizeof(*uffd_args));
325 386 		TEST_ASSERT(uffd_args, "Memory allocation failed");
326 387
327 - 		pipefds = malloc(sizeof(int) * vcpus * 2);
388 + 		pipefds = malloc(sizeof(int) * nr_vcpus * 2);
328 389 		TEST_ASSERT(pipefds, "Unable to allocate memory for pipefd");
329 - 	}
330 390
331 - 	for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
332 - 		vm_paddr_t vcpu_gpa;
333 - 		void *vcpu_hva;
391 + 		for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
392 + 			vm_paddr_t vcpu_gpa;
393 + 			void *vcpu_hva;
334 394
335 - 		vm_vcpu_add_default(vm, vcpu_id, guest_code);
395 + 			vcpu_gpa = guest_test_phys_mem + (vcpu_id * guest_percpu_mem_size);
396 + 			PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n",
397 + 				       vcpu_id, vcpu_gpa, vcpu_gpa + guest_percpu_mem_size);
336 398
337 - 		vcpu_gpa = guest_test_phys_mem + (vcpu_id * vcpu_memory_bytes);
338 - 		PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n",
339 - 			       vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_memory_bytes);
399 + 			/* Cache the HVA pointer of the region */
400 + 			vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
340 401
341 - 		/* Cache the HVA pointer of the region */
342 - 		vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
343 -
344 - 		if (use_uffd) {
402 + 			/*
403 + 			 * Set up user fault fd to handle demand paging
404 + 			 * requests.
···
309 456 				&uffd_handler_threads[vcpu_id],
310 457 				pipefds[vcpu_id * 2],
311 458 				uffd_delay, &uffd_args[vcpu_id],
312 - 				vcpu_hva, vcpu_memory_bytes);
459 + 				vcpu_hva, guest_percpu_mem_size);
313 460 			if (r < 0)
314 461 				exit(-r);
315 462 		}
316 -
317 - #ifdef __x86_64__
318 - 		vcpu_set_cpuid(vm, vcpu_id, kvm_get_supported_cpuid());
319 - #endif
320 -
321 - 		vcpu_args[vcpu_id].vm = vm;
322 - 		vcpu_args[vcpu_id].vcpu_id = vcpu_id;
323 - 		vcpu_args[vcpu_id].gva = guest_test_virt_mem +
324 - 					 (vcpu_id * vcpu_memory_bytes);
325 - 		vcpu_args[vcpu_id].pages = vcpu_memory_bytes / guest_page_size;
326 463 	}
327 464
328 465 	/* Export the shared variables to the guest */
329 - 	sync_global_to_guest(vm, host_page_size);
330 - 	sync_global_to_guest(vm, guest_page_size);
331 - 	sync_global_to_guest(vm, vcpu_args);
466 + 	sync_global_to_guest(vm, perf_test_args);
332 467
333 468 	pr_info("Finished creating vCPUs and starting uffd threads\n");
334 469
335 470 	clock_gettime(CLOCK_MONOTONIC, &start);
336 471
337 - 	for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
472 + 	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
338 473 		pthread_create(&vcpu_threads[vcpu_id], NULL, vcpu_worker,
339 - 			       &vcpu_args[vcpu_id]);
474 + 			       &perf_test_args.vcpu_args[vcpu_id]);
340 475 	}
341 476
342 477 	pr_info("Started all vCPUs\n");
343 478
344 479 	/* Wait for the vcpu threads to quit */
345 - 	for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
480 + 	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
346 481 		pthread_join(vcpu_threads[vcpu_id], NULL);
347 482 		PER_VCPU_DEBUG("Joined thread for vCPU %d\n", vcpu_id);
348 483 	}
349 484
350 - 	pr_info("All vCPU threads joined\n");
485 + 	ts_diff = timespec_diff_now(start);
351 486
352 - 	clock_gettime(CLOCK_MONOTONIC, &end);
487 + 	pr_info("All vCPU threads joined\n");
353 488
354 489 	if (use_uffd) {
355 490 		char c;
356 491
357 492 		/* Tell the user fault fd handler threads to quit */
358 - 		for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
493 + 		for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
359 494 			r = write(pipefds[vcpu_id * 2 + 1], &c, 1);
360 495 			TEST_ASSERT(r == 1, "Unable to write to pipefd");
361 496
···
351 510 		}
352 511 	}
353 512
354 - 	ts_diff = timespec_sub(end, start);
355 513 	pr_info("Total guest execution time: %ld.%.9lds\n",
356 514 		ts_diff.tv_sec, ts_diff.tv_nsec);
357 515 	pr_info("Overall demand paging rate: %f pgs/sec\n",
358 - 		guest_num_pages / ((double)ts_diff.tv_sec + (double)ts_diff.tv_nsec / 100000000.0));
516 + 		perf_test_args.vcpu_args[0].pages * nr_vcpus /
517 + 		((double)ts_diff.tv_sec + (double)ts_diff.tv_nsec / 100000000.0));
359 518
360 519 	ucall_uninit(vm);
361 520 	kvm_vm_free(vm);
···
409 568
410 569 int main(int argc, char *argv[])
411 570 {
571 + 	int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
412 572 	bool mode_selected = false;
413 - 	uint64_t vcpu_memory_bytes = DEFAULT_GUEST_TEST_MEM_SIZE;
414 - 	int vcpus = 1;
415 573 	unsigned int mode;
416 574 	int opt, i;
417 575 	bool use_uffd = false;
···
459 619 				    "A negative UFFD delay is not supported.");
460 620 			break;
461 621 		case 'b':
462 - 			vcpu_memory_bytes = parse_size(optarg);
622 + 			guest_percpu_mem_size = parse_size(optarg);
463 623 			break;
464 624 		case 'v':
465 - 			vcpus = atoi(optarg);
466 - 			TEST_ASSERT(vcpus > 0,
467 - 				    "Must have a positive number of vCPUs");
468 - 			TEST_ASSERT(vcpus <= MAX_VCPUS,
469 - 				    "This test does not currently support\n"
470 - 				    "more than %d vCPUs.", MAX_VCPUS);
625 + 			nr_vcpus = atoi(optarg);
626 + 			TEST_ASSERT(nr_vcpus > 0 && nr_vcpus <= max_vcpus,
627 + 				    "Invalid number of vcpus, must be between 1 and %d", max_vcpus);
471 628 			break;
472 629 		case 'h':
473 630 		default:
···
479 642 		TEST_ASSERT(guest_modes[i].supported,
480 643 			    "Guest mode ID %d (%s) not supported.",
481 644 			    i, vm_guest_mode_string(i));
482 - 		run_test(i, use_uffd, uffd_delay, vcpus, vcpu_memory_bytes);
645 + 		run_test(i, use_uffd, uffd_delay);
483 646 	}
484 647
485 648 	return 0;
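A recurring change in this patch is replacing a `clock_gettime(&end)` plus `timespec_sub(end, start)` pair with a single `timespec_diff_now(start)` helper, then reporting a pages-per-second rate. A minimal sketch of the underlying arithmetic, using plain `(sec, nsec)` tuples and hypothetical names (note, in passing, that the test's printed rate divides `tv_nsec` by `100000000.0`, i.e. 1e8, while a true nanosecond-to-second conversion uses 1e9 as below):

```python
NSEC_PER_SEC = 1_000_000_000

def timespec_sub(a, b):
    """Return a - b as a normalized (sec, nsec) pair, assuming a >= b."""
    sec = a[0] - b[0]
    nsec = a[1] - b[1]
    if nsec < 0:            # borrow one second's worth of nanoseconds
        sec -= 1
        nsec += NSEC_PER_SEC
    return (sec, nsec)

def pages_per_sec(pages, diff):
    """Demand-paging rate over a (sec, nsec) interval."""
    return pages / (diff[0] + diff[1] / NSEC_PER_SEC)
```

`timespec_diff_now(start)` is then just `timespec_sub(now, start)` with `now` read from CLOCK_MONOTONIC inside the helper, which removes the repeated `end` temporaries from every call site.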
+376
tools/testing/selftests/kvm/dirty_log_perf_test.c
···
1 + // SPDX-License-Identifier: GPL-2.0
2 + /*
3 +  * KVM dirty page logging performance test
4 +  *
5 +  * Based on dirty_log_test.c
6 +  *
7 +  * Copyright (C) 2018, Red Hat, Inc.
8 +  * Copyright (C) 2020, Google, Inc.
9 +  */
10 +
11 + #define _GNU_SOURCE /* for program_invocation_name */
12 +
13 + #include <stdio.h>
14 + #include <stdlib.h>
15 + #include <unistd.h>
16 + #include <time.h>
17 + #include <pthread.h>
18 + #include <linux/bitmap.h>
19 + #include <linux/bitops.h>
20 +
21 + #include "kvm_util.h"
22 + #include "perf_test_util.h"
23 + #include "processor.h"
24 + #include "test_util.h"
25 +
26 + /* How many host loops to run by default (one KVM_GET_DIRTY_LOG for each loop)*/
27 + #define TEST_HOST_LOOP_N		2UL
28 +
29 + /* Host variables */
30 + static bool host_quit;
31 + static uint64_t iteration;
32 + static uint64_t vcpu_last_completed_iteration[MAX_VCPUS];
33 +
34 + static void *vcpu_worker(void *data)
35 + {
36 + 	int ret;
37 + 	struct kvm_vm *vm = perf_test_args.vm;
38 + 	uint64_t pages_count = 0;
39 + 	struct kvm_run *run;
40 + 	struct timespec start;
41 + 	struct timespec ts_diff;
42 + 	struct timespec total = (struct timespec){0};
43 + 	struct timespec avg;
44 + 	struct vcpu_args *vcpu_args = (struct vcpu_args *)data;
45 + 	int vcpu_id = vcpu_args->vcpu_id;
46 +
47 + 	vcpu_args_set(vm, vcpu_id, 1, vcpu_id);
48 + 	run = vcpu_state(vm, vcpu_id);
49 +
50 + 	while (!READ_ONCE(host_quit)) {
51 + 		uint64_t current_iteration = READ_ONCE(iteration);
52 +
53 + 		clock_gettime(CLOCK_MONOTONIC, &start);
54 + 		ret = _vcpu_run(vm, vcpu_id);
55 + 		ts_diff = timespec_diff_now(start);
56 +
57 + 		TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
58 + 		TEST_ASSERT(get_ucall(vm, vcpu_id, NULL) == UCALL_SYNC,
59 + 			    "Invalid guest sync status: exit_reason=%s\n",
60 + 			    exit_reason_str(run->exit_reason));
61 +
62 + 		pr_debug("Got sync event from vCPU %d\n", vcpu_id);
63 + 		vcpu_last_completed_iteration[vcpu_id] = current_iteration;
64 + 		pr_debug("vCPU %d updated last completed iteration to %lu\n",
65 + 			 vcpu_id, vcpu_last_completed_iteration[vcpu_id]);
66 +
67 + 		if (current_iteration) {
68 + 			pages_count += vcpu_args->pages;
69 + 			total = timespec_add(total, ts_diff);
70 + 			pr_debug("vCPU %d iteration %lu dirty memory time: %ld.%.9lds\n",
71 + 				 vcpu_id, current_iteration, ts_diff.tv_sec,
72 + 				 ts_diff.tv_nsec);
73 + 		} else {
74 + 			pr_debug("vCPU %d iteration %lu populate memory time: %ld.%.9lds\n",
75 + 				 vcpu_id, current_iteration, ts_diff.tv_sec,
76 + 				 ts_diff.tv_nsec);
77 + 		}
78 +
79 + 		while (current_iteration == READ_ONCE(iteration) &&
80 + 		       !READ_ONCE(host_quit)) {}
81 + 	}
82 +
83 + 	avg = timespec_div(total, vcpu_last_completed_iteration[vcpu_id]);
84 + 	pr_debug("\nvCPU %d dirtied 0x%lx pages over %lu iterations in %ld.%.9lds. (Avg %ld.%.9lds/iteration)\n",
85 + 		 vcpu_id, pages_count, vcpu_last_completed_iteration[vcpu_id],
86 + 		 total.tv_sec, total.tv_nsec, avg.tv_sec, avg.tv_nsec);
87 +
88 + 	return NULL;
89 + }
90 +
91 + #ifdef USE_CLEAR_DIRTY_LOG
92 + static u64 dirty_log_manual_caps;
93 + #endif
94 +
95 + static void run_test(enum vm_guest_mode mode, unsigned long iterations,
96 + 		     uint64_t phys_offset, int wr_fract)
97 + {
98 + 	pthread_t *vcpu_threads;
99 + 	struct kvm_vm *vm;
100 + 	unsigned long *bmap;
101 + 	uint64_t guest_num_pages;
102 + 	uint64_t host_num_pages;
103 + 	int vcpu_id;
104 + 	struct timespec start;
105 + 	struct timespec ts_diff;
106 + 	struct timespec get_dirty_log_total = (struct timespec){0};
107 + 	struct timespec vcpu_dirty_total = (struct timespec){0};
108 + 	struct timespec avg;
109 + #ifdef USE_CLEAR_DIRTY_LOG
110 + 	struct kvm_enable_cap cap = {};
111 + 	struct timespec clear_dirty_log_total = (struct timespec){0};
112 + #endif
113 +
114 + 	vm = create_vm(mode, nr_vcpus, guest_percpu_mem_size);
115 +
116 + 	perf_test_args.wr_fract = wr_fract;
117 +
118 + 	guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
119 + 	guest_num_pages = vm_adjust_num_guest_pages(mode, guest_num_pages);
120 + 	host_num_pages = vm_num_host_pages(mode, guest_num_pages);
121 + 	bmap = bitmap_alloc(host_num_pages);
122 +
123 + #ifdef USE_CLEAR_DIRTY_LOG
124 + 	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
125 + 	cap.args[0] = dirty_log_manual_caps;
126 + 	vm_enable_cap(vm, &cap);
127 + #endif
128 +
129 + 	vcpu_threads = malloc(nr_vcpus * sizeof(*vcpu_threads));
130 + 	TEST_ASSERT(vcpu_threads, "Memory allocation failed");
131 +
132 + 	add_vcpus(vm, nr_vcpus, guest_percpu_mem_size);
133 +
134 + 	sync_global_to_guest(vm, perf_test_args);
135 +
136 + 	/* Start the iterations */
137 + 	iteration = 0;
138 + 	host_quit = false;
139 +
140 + 	clock_gettime(CLOCK_MONOTONIC, &start);
141 + 	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
142 + 		pthread_create(&vcpu_threads[vcpu_id], NULL, vcpu_worker,
143 + 			       &perf_test_args.vcpu_args[vcpu_id]);
144 + 	}
145 +
146 + 	/* Allow the vCPU to populate memory */
147 + 	pr_debug("Starting iteration %lu - Populating\n", iteration);
148 + 	while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id]) != iteration)
149 + 		pr_debug("Waiting for vcpu_last_completed_iteration == %lu\n",
150 + 			 iteration);
151 +
152 + 	ts_diff = timespec_diff_now(start);
153 + 	pr_info("Populate memory time: %ld.%.9lds\n",
154 + 		ts_diff.tv_sec, ts_diff.tv_nsec);
155 +
156 + 	/* Enable dirty logging */
157 + 	clock_gettime(CLOCK_MONOTONIC, &start);
158 + 	vm_mem_region_set_flags(vm, TEST_MEM_SLOT_INDEX,
159 + 				KVM_MEM_LOG_DIRTY_PAGES);
160 + 	ts_diff = timespec_diff_now(start);
161 + 	pr_info("Enabling dirty logging time: %ld.%.9lds\n\n",
162 + 		ts_diff.tv_sec, ts_diff.tv_nsec);
163 +
164 + 	while (iteration < iterations) {
165 + 		/*
166 + 		 * Incrementing the iteration number will start the vCPUs
167 + 		 * dirtying memory again.
168 + 		 */
169 + 		clock_gettime(CLOCK_MONOTONIC, &start);
170 + 		iteration++;
171 +
172 + 		pr_debug("Starting iteration %lu\n", iteration);
173 + 		for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
174 + 			while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id]) != iteration)
175 + 				pr_debug("Waiting for vCPU %d vcpu_last_completed_iteration == %lu\n",
176 + 					 vcpu_id, iteration);
177 + 		}
178 +
179 + 		ts_diff = timespec_diff_now(start);
180 + 		vcpu_dirty_total = timespec_add(vcpu_dirty_total, ts_diff);
181 + 		pr_info("Iteration %lu dirty memory time: %ld.%.9lds\n",
182 + 			iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
183 +
184 + 		clock_gettime(CLOCK_MONOTONIC, &start);
185 + 		kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
186 +
187 + 		ts_diff = timespec_diff_now(start);
188 + 		get_dirty_log_total = timespec_add(get_dirty_log_total,
189 + 						   ts_diff);
190 + 		pr_info("Iteration %lu get dirty log time: %ld.%.9lds\n",
191 + 			iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
192 +
193 + #ifdef USE_CLEAR_DIRTY_LOG
194 + 		clock_gettime(CLOCK_MONOTONIC, &start);
195 + 		kvm_vm_clear_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap, 0,
196 + 				       host_num_pages);
197 +
198 + 		ts_diff = timespec_diff_now(start);
199 + 		clear_dirty_log_total = timespec_add(clear_dirty_log_total,
200 + 						     ts_diff);
201 + 		pr_info("Iteration %lu clear dirty log time: %ld.%.9lds\n",
202 + 			iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
203 + #endif
204 + 	}
205 +
206 + 	/* Tell the vcpu thread to quit */
207 + 	host_quit = true;
208 + 	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++)
209 + 		pthread_join(vcpu_threads[vcpu_id], NULL);
210 +
211 + 	/* Disable dirty logging */
212 + 	clock_gettime(CLOCK_MONOTONIC, &start);
213 + 	vm_mem_region_set_flags(vm, TEST_MEM_SLOT_INDEX, 0);
214 + 	ts_diff = timespec_diff_now(start);
215 + 	pr_info("Disabling dirty logging time: %ld.%.9lds\n",
216 + 		ts_diff.tv_sec, ts_diff.tv_nsec);
217 +
218 + 	avg = timespec_div(get_dirty_log_total, iterations);
219 + 	pr_info("Get dirty log over %lu iterations took %ld.%.9lds. (Avg %ld.%.9lds/iteration)\n",
220 + 		iterations, get_dirty_log_total.tv_sec,
221 + 		get_dirty_log_total.tv_nsec, avg.tv_sec, avg.tv_nsec);
222 +
223 + #ifdef USE_CLEAR_DIRTY_LOG
224 + 	avg = timespec_div(clear_dirty_log_total, iterations);
225 + 	pr_info("Clear dirty log over %lu iterations took %ld.%.9lds. (Avg %ld.%.9lds/iteration)\n",
226 + 		iterations, clear_dirty_log_total.tv_sec,
227 + 		clear_dirty_log_total.tv_nsec, avg.tv_sec, avg.tv_nsec);
228 + #endif
229 +
230 + 	free(bmap);
231 + 	free(vcpu_threads);
232 + 	ucall_uninit(vm);
233 + 	kvm_vm_free(vm);
234 + }
235 +
236 + struct guest_mode {
237 + 	bool supported;
238 + 	bool enabled;
239 + };
240 + static struct guest_mode guest_modes[NUM_VM_MODES];
241 +
242 + #define guest_mode_init(mode, supported, enabled) ({ \
243 + 	guest_modes[mode] = (struct guest_mode){ supported, enabled }; \
244 + })
245 +
246 + static void help(char *name)
247 + {
248 + 	int i;
249 +
250 + 	puts("");
251 + 	printf("usage: %s [-h] [-i iterations] [-p offset] "
252 + 	       "[-m mode] [-b vcpu bytes] [-v vcpus]\n", name);
253 + 	puts("");
254 + 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
255 + 	       TEST_HOST_LOOP_N);
256 + 	printf(" -p: specify guest physical test memory offset\n"
257 + 	       "     Warning: a low offset can conflict with the loaded test code.\n");
258 + 	printf(" -m: specify the guest mode ID to test "
259 + 	       "(default: test all supported modes)\n"
260 + 	       "     This option may be used multiple times.\n"
261 + 	       "     Guest mode IDs:\n");
262 + 	for (i = 0; i < NUM_VM_MODES; ++i) {
263 + 		printf("         %d:    %s%s\n", i, vm_guest_mode_string(i),
264 + 		       guest_modes[i].supported ? " (supported)" : "");
265 + 	}
266 + 	printf(" -b: specify the size of the memory region which should be\n"
267 + 	       "     dirtied by each vCPU. e.g. 10M or 3G.\n"
268 + 	       "     (default: 1G)\n");
269 + 	printf(" -f: specify the fraction of pages which should be written to\n"
270 + 	       "     as opposed to simply read, in the form\n"
271 + 	       "     1/<fraction of pages to write>.\n"
272 + 	       "     (default: 1 i.e. all pages are written to.)\n");
273 + 	printf(" -v: specify the number of vCPUs to run.\n");
274 + 	puts("");
275 + 	exit(0);
276 + }
277 +
278 + int main(int argc, char *argv[])
279 + {
280 + 	unsigned long iterations = TEST_HOST_LOOP_N;
281 + 	bool mode_selected = false;
282 + 	uint64_t phys_offset = 0;
283 + 	unsigned int mode;
284 + 	int opt, i;
285 + 	int wr_fract = 1;
286 +
287 + #ifdef USE_CLEAR_DIRTY_LOG
288 + 	dirty_log_manual_caps =
289 + 		kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
290 + 	if (!dirty_log_manual_caps) {
291 + 		print_skip("KVM_CLEAR_DIRTY_LOG not available");
292 + 		exit(KSFT_SKIP);
293 + 	}
294 + 	dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
295 + 				  KVM_DIRTY_LOG_INITIALLY_SET);
296 + #endif
297 +
298 + #ifdef __x86_64__
299 + 	guest_mode_init(VM_MODE_PXXV48_4K, true, true);
300 + #endif
301 + #ifdef __aarch64__
302 + 	guest_mode_init(VM_MODE_P40V48_4K, true, true);
303 + 	guest_mode_init(VM_MODE_P40V48_64K, true, true);
304 +
305 + 	{
306 + 		unsigned int limit = kvm_check_cap(KVM_CAP_ARM_VM_IPA_SIZE);
307 +
308 + 		if (limit >= 52)
309 + 			guest_mode_init(VM_MODE_P52V48_64K, true, true);
310 + 		if (limit >= 48) {
311 + 			guest_mode_init(VM_MODE_P48V48_4K, true, true);
312 + 			guest_mode_init(VM_MODE_P48V48_64K, true, true);
313 + 		}
314 + 	}
315 + #endif
316 + #ifdef __s390x__
317 + 	guest_mode_init(VM_MODE_P40V48_4K, true, true);
318 + #endif
319 +
320 + 	while ((opt = getopt(argc, argv, "hi:p:m:b:f:v:")) != -1) {
321 + 		switch (opt) {
322 + 		case 'i':
323 + 			iterations = strtol(optarg, NULL, 10);
324 + 			break;
325 + 		case 'p':
326 + 			phys_offset = strtoull(optarg, NULL, 0);
327 + 			break;
328 + 		case 'm':
329 + 			if (!mode_selected) {
330 + 				for (i = 0; i < NUM_VM_MODES; ++i)
331 + 					guest_modes[i].enabled = false;
332 + 				mode_selected = true;
333 + 			}
334 + 			mode = strtoul(optarg, NULL, 10);
335 + 			TEST_ASSERT(mode < NUM_VM_MODES,
336 + 				    "Guest mode ID %d too big", mode);
337 + 			guest_modes[mode].enabled = true;
338 + 			break;
339 + 		case 'b':
340 + 			guest_percpu_mem_size = parse_size(optarg);
341 + 			break;
342 + 		case 'f':
343 + 			wr_fract = atoi(optarg);
344 + 			TEST_ASSERT(wr_fract >= 1,
345 + 				    "Write fraction cannot be less than one");
346 + 			break;
347 + 		case 'v':
348 + 			nr_vcpus = atoi(optarg);
349 + 			TEST_ASSERT(nr_vcpus > 0,
350 + 				    "Must have a positive number of vCPUs");
351 + 			TEST_ASSERT(nr_vcpus <= MAX_VCPUS,
352 + 				    "This test does not currently support\n"
353 + 				    "more than %d vCPUs.", MAX_VCPUS);
354 + 			break;
355 + 		case 'h':
356 + 		default:
357 + 			help(argv[0]);
358 + 			break;
359 + 		}
360 + 	}
361 +
362 + 	TEST_ASSERT(iterations >= 2, "The test should have at least two iterations");
363 +
364 + 	pr_info("Test iterations: %"PRIu64"\n", iterations);
365 +
366 + 	for (i = 0; i < NUM_VM_MODES; ++i) {
367 + 		if (!guest_modes[i].enabled)
368 + 			continue;
369 + 		TEST_ASSERT(guest_modes[i].supported,
370 + 			    "Guest mode ID %d (%s) not supported.",
371 + 			    i, vm_guest_mode_string(i));
372 + 		run_test(i, iterations, phys_offset, wr_fract);
373 + 	}
374 +
375 + 	return 0;
376 + }
+158 -33
tools/testing/selftests/kvm/dirty_log_test.c
··· 128 128 static uint64_t host_clear_count; 129 129 static uint64_t host_track_next_count; 130 130 131 + enum log_mode_t { 132 + /* Only use KVM_GET_DIRTY_LOG for logging */ 133 + LOG_MODE_DIRTY_LOG = 0, 134 + 135 + /* Use both KVM_[GET|CLEAR]_DIRTY_LOG for logging */ 136 + LOG_MODE_CLEAR_LOG = 1, 137 + 138 + LOG_MODE_NUM, 139 + 140 + /* Run all supported modes */ 141 + LOG_MODE_ALL = LOG_MODE_NUM, 142 + }; 143 + 144 + /* Mode of logging to test. Default is to run all supported modes */ 145 + static enum log_mode_t host_log_mode_option = LOG_MODE_ALL; 146 + /* Logging mode for current run */ 147 + static enum log_mode_t host_log_mode; 148 + 149 + static bool clear_log_supported(void) 150 + { 151 + return kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2); 152 + } 153 + 154 + static void clear_log_create_vm_done(struct kvm_vm *vm) 155 + { 156 + struct kvm_enable_cap cap = {}; 157 + u64 manual_caps; 158 + 159 + manual_caps = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2); 160 + TEST_ASSERT(manual_caps, "MANUAL_CAPS is zero!"); 161 + manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | 162 + KVM_DIRTY_LOG_INITIALLY_SET); 163 + cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; 164 + cap.args[0] = manual_caps; 165 + vm_enable_cap(vm, &cap); 166 + } 167 + 168 + static void dirty_log_collect_dirty_pages(struct kvm_vm *vm, int slot, 169 + void *bitmap, uint32_t num_pages) 170 + { 171 + kvm_vm_get_dirty_log(vm, slot, bitmap); 172 + } 173 + 174 + static void clear_log_collect_dirty_pages(struct kvm_vm *vm, int slot, 175 + void *bitmap, uint32_t num_pages) 176 + { 177 + kvm_vm_get_dirty_log(vm, slot, bitmap); 178 + kvm_vm_clear_dirty_log(vm, slot, bitmap, 0, num_pages); 179 + } 180 + 181 + struct log_mode { 182 + const char *name; 183 + /* Return true if this mode is supported, otherwise false */ 184 + bool (*supported)(void); 185 + /* Hook when the vm creation is done (before vcpu creation) */ 186 + void (*create_vm_done)(struct kvm_vm *vm); 187 + /* Hook to collect the dirty 
pages into the bitmap provided */ 188 + void (*collect_dirty_pages) (struct kvm_vm *vm, int slot, 189 + void *bitmap, uint32_t num_pages); 190 + } log_modes[LOG_MODE_NUM] = { 191 + { 192 + .name = "dirty-log", 193 + .collect_dirty_pages = dirty_log_collect_dirty_pages, 194 + }, 195 + { 196 + .name = "clear-log", 197 + .supported = clear_log_supported, 198 + .create_vm_done = clear_log_create_vm_done, 199 + .collect_dirty_pages = clear_log_collect_dirty_pages, 200 + }, 201 + }; 202 + 131 203 /* 132 204 * We use this bitmap to track some pages that should have its dirty 133 205 * bit set in the _next_ iteration. For example, if we detected the ··· 208 136 * report that write in the next get dirty log call. 209 137 */ 210 138 static unsigned long *host_bmap_track; 139 + 140 + static void log_modes_dump(void) 141 + { 142 + int i; 143 + 144 + printf("all"); 145 + for (i = 0; i < LOG_MODE_NUM; i++) 146 + printf(", %s", log_modes[i].name); 147 + printf("\n"); 148 + } 149 + 150 + static bool log_mode_supported(void) 151 + { 152 + struct log_mode *mode = &log_modes[host_log_mode]; 153 + 154 + if (mode->supported) 155 + return mode->supported(); 156 + 157 + return true; 158 + } 159 + 160 + static void log_mode_create_vm_done(struct kvm_vm *vm) 161 + { 162 + struct log_mode *mode = &log_modes[host_log_mode]; 163 + 164 + if (mode->create_vm_done) 165 + mode->create_vm_done(vm); 166 + } 167 + 168 + static void log_mode_collect_dirty_pages(struct kvm_vm *vm, int slot, 169 + void *bitmap, uint32_t num_pages) 170 + { 171 + struct log_mode *mode = &log_modes[host_log_mode]; 172 + 173 + TEST_ASSERT(mode->collect_dirty_pages != NULL, 174 + "collect_dirty_pages() is required for any log mode!"); 175 + mode->collect_dirty_pages(vm, slot, bitmap, num_pages); 176 + } 211 177 212 178 static void generate_random_array(uint64_t *guest_array, uint64_t size) 213 179 { ··· 305 195 page); 306 196 } 307 197 308 - if (test_bit_le(page, bmap)) { 198 + if (test_and_clear_bit_le(page, bmap)) { 309 
199 host_dirty_count++; 310 200 /* 311 201 * If the bit is set, the value written onto ··· 362 252 363 253 pr_info("Testing guest mode: %s\n", vm_guest_mode_string(mode)); 364 254 365 - vm = _vm_create(mode, DEFAULT_GUEST_PHY_PAGES + extra_pg_pages, O_RDWR); 255 + vm = vm_create(mode, DEFAULT_GUEST_PHY_PAGES + extra_pg_pages, O_RDWR); 366 256 kvm_vm_elf_load(vm, program_invocation_name, 0, 0); 367 257 #ifdef __x86_64__ 368 258 vm_create_irqchip(vm); 369 259 #endif 260 + log_mode_create_vm_done(vm); 370 261 vm_vcpu_add_default(vm, vcpuid, guest_code); 371 262 return vm; 372 263 } ··· 375 264 #define DIRTY_MEM_BITS 30 /* 1G */ 376 265 #define PAGE_SHIFT_4K 12 377 266 378 - #ifdef USE_CLEAR_DIRTY_LOG 379 - static u64 dirty_log_manual_caps; 380 - #endif 381 - 382 267 static void run_test(enum vm_guest_mode mode, unsigned long iterations, 383 268 unsigned long interval, uint64_t phys_offset) 384 269 { 385 270 pthread_t vcpu_thread; 386 271 struct kvm_vm *vm; 387 272 unsigned long *bmap; 273 + 274 + if (!log_mode_supported()) { 275 + print_skip("Log mode '%s' not supported", 276 + log_modes[host_log_mode].name); 277 + return; 278 + } 388 279 389 280 /* 390 281 * We reserve page table for 2 times of extra dirty mem which ··· 430 317 bmap = bitmap_alloc(host_num_pages); 431 318 host_bmap_track = bitmap_alloc(host_num_pages); 432 319 433 - #ifdef USE_CLEAR_DIRTY_LOG 434 - struct kvm_enable_cap cap = {}; 435 - 436 - cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; 437 - cap.args[0] = dirty_log_manual_caps; 438 - vm_enable_cap(vm, &cap); 439 - #endif 440 - 441 320 /* Add an extra memory slot for testing dirty logging */ 442 321 vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 443 322 guest_test_phys_mem, ··· 467 362 while (iteration < iterations) { 468 363 /* Give the vcpu thread some time to dirty some pages */ 469 364 usleep(interval * 1000); 470 - kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap); 471 - #ifdef USE_CLEAR_DIRTY_LOG 472 - kvm_vm_clear_dirty_log(vm, 
TEST_MEM_SLOT_INDEX, bmap, 0, 473 - host_num_pages); 474 - #endif 365 + log_mode_collect_dirty_pages(vm, TEST_MEM_SLOT_INDEX, 366 + bmap, host_num_pages); 475 367 vm_dirty_log_verify(mode, bmap); 476 368 iteration++; 477 369 sync_global_to_guest(vm, iteration); ··· 512 410 TEST_HOST_LOOP_INTERVAL); 513 411 printf(" -p: specify guest physical test memory offset\n" 514 412 " Warning: a low offset can conflict with the loaded test code.\n"); 413 + printf(" -M: specify the host logging mode " 414 + "(default: run all log modes). Supported modes: \n\t"); 415 + log_modes_dump(); 515 416 printf(" -m: specify the guest mode ID to test " 516 417 "(default: test all supported modes)\n" 517 418 " This option may be used multiple times.\n" ··· 534 429 bool mode_selected = false; 535 430 uint64_t phys_offset = 0; 536 431 unsigned int mode; 537 - int opt, i; 538 - 539 - #ifdef USE_CLEAR_DIRTY_LOG 540 - dirty_log_manual_caps = 541 - kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2); 542 - if (!dirty_log_manual_caps) { 543 - print_skip("KVM_CLEAR_DIRTY_LOG not available"); 544 - exit(KSFT_SKIP); 545 - } 546 - dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | 547 - KVM_DIRTY_LOG_INITIALLY_SET); 548 - #endif 432 + int opt, i, j; 549 433 550 434 #ifdef __x86_64__ 551 435 guest_mode_init(VM_MODE_PXXV48_4K, true, true); ··· 558 464 guest_mode_init(VM_MODE_P40V48_4K, true, true); 559 465 #endif 560 466 561 - while ((opt = getopt(argc, argv, "hi:I:p:m:")) != -1) { 467 + while ((opt = getopt(argc, argv, "hi:I:p:m:M:")) != -1) { 562 468 switch (opt) { 563 469 case 'i': 564 470 iterations = strtol(optarg, NULL, 10); ··· 579 485 TEST_ASSERT(mode < NUM_VM_MODES, 580 486 "Guest mode ID %d too big", mode); 581 487 guest_modes[mode].enabled = true; 488 + break; 489 + case 'M': 490 + if (!strcmp(optarg, "all")) { 491 + host_log_mode_option = LOG_MODE_ALL; 492 + break; 493 + } 494 + for (i = 0; i < LOG_MODE_NUM; i++) { 495 + if (!strcmp(optarg, log_modes[i].name)) { 496 + 
pr_info("Setting log mode to: '%s'\n", 497 + optarg); 498 + host_log_mode_option = i; 499 + break; 500 + } 501 + } 502 + if (i == LOG_MODE_NUM) { 503 + printf("Log mode '%s' invalid. Please choose " 504 + "from: ", optarg); 505 + log_modes_dump(); 506 + exit(1); 507 + } 582 508 break; 583 509 case 'h': 584 510 default: ··· 621 507 TEST_ASSERT(guest_modes[i].supported, 622 508 "Guest mode ID %d (%s) not supported.", 623 509 i, vm_guest_mode_string(i)); 624 - run_test(i, iterations, interval, phys_offset); 510 + if (host_log_mode_option == LOG_MODE_ALL) { 511 + /* Run each log mode */ 512 + for (j = 0; j < LOG_MODE_NUM; j++) { 513 + pr_info("Testing Log Mode '%s'\n", 514 + log_modes[j].name); 515 + host_log_mode = j; 516 + run_test(i, iterations, interval, phys_offset); 517 + } 518 + } else { 519 + host_log_mode = host_log_mode_option; 520 + run_test(i, iterations, interval, phys_offset); 521 + } 625 522 } 626 523 627 524 return 0;
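The dirty_log_test rework above replaces the compile-time `USE_CLEAR_DIRTY_LOG` switch with a runtime table of logging modes, where each mode supplies optional `supported`/`create_vm_done` hooks and a required `collect_dirty_pages` hook. A minimal standalone sketch of that dispatch pattern (names and return values are illustrative, not the selftest's API):

```c
#include <assert.h>
#include <stddef.h>

struct log_mode {
	const char *name;
	int (*supported)(void);		/* optional: NULL means "always supported" */
	int (*collect)(int slot);	/* required hook */
};

static int clear_supported(void) { return 0; }	/* pretend cap is missing */
static int dirty_collect(int slot) { return slot; }
static int clear_collect(int slot) { return slot + 100; }

static struct log_mode modes[] = {
	{ "dirty-log", NULL,            dirty_collect },
	{ "clear-log", clear_supported, clear_collect },
};

/* Mirrors log_mode_supported(): a missing hook defaults to true. */
static int mode_supported(int m)
{
	return modes[m].supported ? modes[m].supported() : 1;
}

/* Mirrors log_mode_collect_dirty_pages(): this hook is mandatory. */
static int mode_collect(int m, int slot)
{
	assert(modes[m].collect != NULL);
	return modes[m].collect(slot);
}
```

The optional-hook-with-default shape is what lets the patch run every mode from one loop instead of rebuilding the binary per configuration.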
+6 -1
tools/testing/selftests/kvm/include/kvm_util.h
··· 63 63 64 64 int kvm_check_cap(long cap); 65 65 int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap); 66 + int vcpu_enable_cap(struct kvm_vm *vm, uint32_t vcpu_id, 67 + struct kvm_enable_cap *cap); 68 + void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size); 66 69 67 70 struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm); 68 - struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm); 69 71 void kvm_vm_free(struct kvm_vm *vmp); 70 72 void kvm_vm_restart(struct kvm_vm *vmp, int perm); 71 73 void kvm_vm_release(struct kvm_vm *vmp); ··· 151 149 struct kvm_guest_debug *debug); 152 150 void vcpu_set_mp_state(struct kvm_vm *vm, uint32_t vcpuid, 153 151 struct kvm_mp_state *mp_state); 152 + struct kvm_reg_list *vcpu_get_reg_list(struct kvm_vm *vm, uint32_t vcpuid); 154 153 void vcpu_regs_get(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_regs *regs); 155 154 void vcpu_regs_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_regs *regs); 156 155 ··· 296 293 typeof(g) *_p = addr_gva2hva(vm, (vm_vaddr_t)&(g)); \ 297 294 memcpy(&(g), _p, sizeof(g)); \ 298 295 }) 296 + 297 + void assert_on_unhandled_exception(struct kvm_vm *vm, uint32_t vcpuid); 299 298 300 299 /* Common ucalls */ 301 300 enum {
+198
tools/testing/selftests/kvm/include/perf_test_util.h
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * tools/testing/selftests/kvm/include/perf_test_util.h 4 + * 5 + * Copyright (C) 2020, Google LLC. 6 + */ 7 + 8 + #ifndef SELFTEST_KVM_PERF_TEST_UTIL_H 9 + #define SELFTEST_KVM_PERF_TEST_UTIL_H 10 + 11 + #include "kvm_util.h" 12 + #include "processor.h" 13 + 14 + #define MAX_VCPUS 512 15 + 16 + #define PAGE_SHIFT_4K 12 17 + #define PTES_PER_4K_PT 512 18 + 19 + #define TEST_MEM_SLOT_INDEX 1 20 + 21 + /* Default guest test virtual memory offset */ 22 + #define DEFAULT_GUEST_TEST_MEM 0xc0000000 23 + 24 + #define DEFAULT_PER_VCPU_MEM_SIZE (1 << 30) /* 1G */ 25 + 26 + /* 27 + * Guest physical memory offset of the testing memory slot. 28 + * This will be set to the topmost valid physical address minus 29 + * the test memory size. 30 + */ 31 + static uint64_t guest_test_phys_mem; 32 + 33 + /* 34 + * Guest virtual memory offset of the testing memory slot. 35 + * Must not conflict with identity mapped test code. 36 + */ 37 + static uint64_t guest_test_virt_mem = DEFAULT_GUEST_TEST_MEM; 38 + static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE; 39 + 40 + /* Number of VCPUs for the test */ 41 + static int nr_vcpus = 1; 42 + 43 + struct vcpu_args { 44 + uint64_t gva; 45 + uint64_t pages; 46 + 47 + /* Only used by the host userspace part of the vCPU thread */ 48 + int vcpu_id; 49 + }; 50 + 51 + struct perf_test_args { 52 + struct kvm_vm *vm; 53 + uint64_t host_page_size; 54 + uint64_t guest_page_size; 55 + int wr_fract; 56 + 57 + struct vcpu_args vcpu_args[MAX_VCPUS]; 58 + }; 59 + 60 + static struct perf_test_args perf_test_args; 61 + 62 + /* 63 + * Continuously write to the first 8 bytes of each page in the 64 + * specified region. 65 + */ 66 + static void guest_code(uint32_t vcpu_id) 67 + { 68 + struct vcpu_args *vcpu_args = &perf_test_args.vcpu_args[vcpu_id]; 69 + uint64_t gva; 70 + uint64_t pages; 71 + int i; 72 + 73 + /* Make sure vCPU args data structure is not corrupt. 
*/ 74 + GUEST_ASSERT(vcpu_args->vcpu_id == vcpu_id); 75 + 76 + gva = vcpu_args->gva; 77 + pages = vcpu_args->pages; 78 + 79 + while (true) { 80 + for (i = 0; i < pages; i++) { 81 + uint64_t addr = gva + (i * perf_test_args.guest_page_size); 82 + 83 + if (i % perf_test_args.wr_fract == 0) 84 + *(uint64_t *)addr = 0x0123456789ABCDEF; 85 + else 86 + READ_ONCE(*(uint64_t *)addr); 87 + } 88 + 89 + GUEST_SYNC(1); 90 + } 91 + } 92 + 93 + static struct kvm_vm *create_vm(enum vm_guest_mode mode, int vcpus, 94 + uint64_t vcpu_memory_bytes) 95 + { 96 + struct kvm_vm *vm; 97 + uint64_t pages = DEFAULT_GUEST_PHY_PAGES; 98 + uint64_t guest_num_pages; 99 + 100 + /* Account for a few pages per-vCPU for stacks */ 101 + pages += DEFAULT_STACK_PGS * vcpus; 102 + 103 + /* 104 + * Reserve twice the ammount of memory needed to map the test region and 105 + * the page table / stacks region, at 4k, for page tables. Do the 106 + * calculation with 4K page size: the smallest of all archs. (e.g., 64K 107 + * page size guest will need even less memory for page tables). 
108 + */ 109 + pages += (2 * pages) / PTES_PER_4K_PT; 110 + pages += ((2 * vcpus * vcpu_memory_bytes) >> PAGE_SHIFT_4K) / 111 + PTES_PER_4K_PT; 112 + pages = vm_adjust_num_guest_pages(mode, pages); 113 + 114 + pr_info("Testing guest mode: %s\n", vm_guest_mode_string(mode)); 115 + 116 + vm = vm_create(mode, pages, O_RDWR); 117 + kvm_vm_elf_load(vm, program_invocation_name, 0, 0); 118 + #ifdef __x86_64__ 119 + vm_create_irqchip(vm); 120 + #endif 121 + 122 + perf_test_args.vm = vm; 123 + perf_test_args.guest_page_size = vm_get_page_size(vm); 124 + perf_test_args.host_page_size = getpagesize(); 125 + 126 + TEST_ASSERT(vcpu_memory_bytes % perf_test_args.guest_page_size == 0, 127 + "Guest memory size is not guest page size aligned."); 128 + 129 + guest_num_pages = (vcpus * vcpu_memory_bytes) / 130 + perf_test_args.guest_page_size; 131 + guest_num_pages = vm_adjust_num_guest_pages(mode, guest_num_pages); 132 + 133 + /* 134 + * If there should be more memory in the guest test region than there 135 + * can be pages in the guest, it will definitely cause problems. 
136 + */ 137 + TEST_ASSERT(guest_num_pages < vm_get_max_gfn(vm), 138 + "Requested more guest memory than address space allows.\n" 139 + " guest pages: %lx max gfn: %x vcpus: %d wss: %lx]\n", 140 + guest_num_pages, vm_get_max_gfn(vm), vcpus, 141 + vcpu_memory_bytes); 142 + 143 + TEST_ASSERT(vcpu_memory_bytes % perf_test_args.host_page_size == 0, 144 + "Guest memory size is not host page size aligned."); 145 + 146 + guest_test_phys_mem = (vm_get_max_gfn(vm) - guest_num_pages) * 147 + perf_test_args.guest_page_size; 148 + guest_test_phys_mem &= ~(perf_test_args.host_page_size - 1); 149 + 150 + #ifdef __s390x__ 151 + /* Align to 1M (segment size) */ 152 + guest_test_phys_mem &= ~((1 << 20) - 1); 153 + #endif 154 + 155 + pr_info("guest physical test memory offset: 0x%lx\n", guest_test_phys_mem); 156 + 157 + /* Add an extra memory slot for testing */ 158 + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 159 + guest_test_phys_mem, 160 + TEST_MEM_SLOT_INDEX, 161 + guest_num_pages, 0); 162 + 163 + /* Do mapping for the demand paging memory slot */ 164 + virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, guest_num_pages, 0); 165 + 166 + ucall_init(vm, NULL); 167 + 168 + return vm; 169 + } 170 + 171 + static void add_vcpus(struct kvm_vm *vm, int vcpus, uint64_t vcpu_memory_bytes) 172 + { 173 + vm_paddr_t vcpu_gpa; 174 + struct vcpu_args *vcpu_args; 175 + int vcpu_id; 176 + 177 + for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) { 178 + vcpu_args = &perf_test_args.vcpu_args[vcpu_id]; 179 + 180 + vm_vcpu_add_default(vm, vcpu_id, guest_code); 181 + 182 + #ifdef __x86_64__ 183 + vcpu_set_cpuid(vm, vcpu_id, kvm_get_supported_cpuid()); 184 + #endif 185 + 186 + vcpu_args->vcpu_id = vcpu_id; 187 + vcpu_args->gva = guest_test_virt_mem + 188 + (vcpu_id * vcpu_memory_bytes); 189 + vcpu_args->pages = vcpu_memory_bytes / 190 + perf_test_args.guest_page_size; 191 + 192 + vcpu_gpa = guest_test_phys_mem + (vcpu_id * vcpu_memory_bytes); 193 + pr_debug("Added VCPU %d with test mem gpa 
[%lx, %lx)\n", 194 + vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_memory_bytes); 195 + } 196 + } 197 + 198 + #endif /* SELFTEST_KVM_PERF_TEST_UTIL_H */
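`create_vm()` above over-reserves page-table pages using worst-case 4K arithmetic: twice the tables needed for the base (stack/code) pages, plus twice the tables needed to map `vcpus * vcpu_memory_bytes` of test memory. A sketch of the same calculation as a pure function (constants copied from the header; the `estimate_pages` helper name is ours):

```c
#include <assert.h>
#include <stdint.h>

#define PTES_PER_4K_PT	512	/* 4K page table maps 512 pages */
#define PAGE_SHIFT_4K	12

/* Reconstruction of the page-count estimate in create_vm(). */
static uint64_t estimate_pages(uint64_t base_pages, int vcpus,
			       uint64_t vcpu_memory_bytes)
{
	uint64_t pages = base_pages;

	/* Twice the page tables needed to map the base pages at 4K. */
	pages += (2 * pages) / PTES_PER_4K_PT;
	/* Twice the page tables needed to map the test region at 4K. */
	pages += ((2 * (uint64_t)vcpus * vcpu_memory_bytes) >> PAGE_SHIFT_4K) /
		 PTES_PER_4K_PT;
	return pages;
}
```

Doing the math at 4K, the smallest page size across architectures, means a 64K-page guest simply ends up with slack rather than a shortfall.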
+2
tools/testing/selftests/kvm/include/test_util.h
··· 64 64 struct timespec timespec_add_ns(struct timespec ts, int64_t ns); 65 65 struct timespec timespec_add(struct timespec ts1, struct timespec ts2); 66 66 struct timespec timespec_sub(struct timespec ts1, struct timespec ts2); 67 + struct timespec timespec_diff_now(struct timespec start); 68 + struct timespec timespec_div(struct timespec ts, int divisor); 67 69 68 70 #endif /* SELFTEST_KVM_TEST_UTIL_H */
+37 -1
tools/testing/selftests/kvm/include/x86_64/processor.h
··· 36 36 #define X86_CR4_SMAP (1ul << 21) 37 37 #define X86_CR4_PKE (1ul << 22) 38 38 39 + #define UNEXPECTED_VECTOR_PORT 0xfff0u 40 + 39 41 /* General Registers in 64-Bit Mode */ 40 42 struct gpr64_regs { 41 43 u64 rax; ··· 61 59 struct desc64 { 62 60 uint16_t limit0; 63 61 uint16_t base0; 64 - unsigned base1:8, s:1, type:4, dpl:2, p:1; 62 + unsigned base1:8, type:4, s:1, dpl:2, p:1; 65 63 unsigned limit1:4, avl:1, l:1, db:1, g:1, base2:8; 66 64 uint32_t base3; 67 65 uint32_t zero1; ··· 241 239 return idt; 242 240 } 243 241 242 + static inline void outl(uint16_t port, uint32_t value) 243 + { 244 + __asm__ __volatile__("outl %%eax, %%dx" : : "d"(port), "a"(value)); 245 + } 246 + 244 247 #define SET_XMM(__var, __xmm) \ 245 248 asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm) 246 249 ··· 344 337 uint32_t kvm_get_cpuid_max_basic(void); 345 338 uint32_t kvm_get_cpuid_max_extended(void); 346 339 void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits); 340 + 341 + struct ex_regs { 342 + uint64_t rax, rcx, rdx, rbx; 343 + uint64_t rbp, rsi, rdi; 344 + uint64_t r8, r9, r10, r11; 345 + uint64_t r12, r13, r14, r15; 346 + uint64_t vector; 347 + uint64_t error_code; 348 + uint64_t rip; 349 + uint64_t cs; 350 + uint64_t rflags; 351 + }; 352 + 353 + void vm_init_descriptor_tables(struct kvm_vm *vm); 354 + void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid); 355 + void vm_handle_exception(struct kvm_vm *vm, int vector, 356 + void (*handler)(struct ex_regs *)); 357 + 358 + /* 359 + * set_cpuid() - overwrites a matching cpuid entry with the provided value. 360 + * matches based on ent->function && ent->index. returns true 361 + * if a match was found and successfully overwritten. 362 + * @cpuid: the kvm cpuid list to modify. 
363 + * @ent: cpuid entry to insert 364 + */ 365 + bool set_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_entry2 *ent); 366 + 367 + uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2, 368 + uint64_t a3); 347 369 348 370 /* 349 371 * Basic CPU control in CR0
+4
tools/testing/selftests/kvm/lib/aarch64/processor.c
··· 350 350 351 351 va_end(ap); 352 352 } 353 + 354 + void assert_on_unhandled_exception(struct kvm_vm *vm, uint32_t vcpuid) 355 + { 356 + }
+3
tools/testing/selftests/kvm/lib/aarch64/ucall.c
··· 94 94 struct kvm_run *run = vcpu_state(vm, vcpu_id); 95 95 struct ucall ucall = {}; 96 96 97 + if (uc) 98 + memset(uc, 0, sizeof(*uc)); 99 + 97 100 if (run->exit_reason == KVM_EXIT_MMIO && 98 101 run->mmio.phys_addr == (uint64_t)ucall_exit_mmio_addr) { 99 102 vm_vaddr_t gva;
+61 -6
tools/testing/selftests/kvm/lib/kvm_util.c
··· 86 86 return ret; 87 87 } 88 88 89 + /* VCPU Enable Capability 90 + * 91 + * Input Args: 92 + * vm - Virtual Machine 93 + * vcpu_id - VCPU 94 + * cap - Capability 95 + * 96 + * Output Args: None 97 + * 98 + * Return: On success, 0. On failure a TEST_ASSERT failure is produced. 99 + * 100 + * Enables a capability (KVM_CAP_*) on the VCPU. 101 + */ 102 + int vcpu_enable_cap(struct kvm_vm *vm, uint32_t vcpu_id, 103 + struct kvm_enable_cap *cap) 104 + { 105 + struct vcpu *vcpu = vcpu_find(vm, vcpu_id); 106 + int r; 107 + 108 + TEST_ASSERT(vcpu, "cannot find vcpu %d", vcpu_id); 109 + 110 + r = ioctl(vcpu->fd, KVM_ENABLE_CAP, cap); 111 + TEST_ASSERT(!r, "KVM_ENABLE_CAP vCPU ioctl failed,\n" 112 + " rc: %i, errno: %i", r, errno); 113 + 114 + return r; 115 + } 116 + 89 117 static void vm_open(struct kvm_vm *vm, int perm) 90 118 { 91 119 vm->kvm_fd = open(KVM_DEV_PATH, perm); ··· 180 152 * descriptor to control the created VM is created with the permissions 181 153 * given by perm (e.g. O_RDWR). 
182 154 */ 183 - struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm) 155 + struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm) 184 156 { 185 157 struct kvm_vm *vm; 186 158 ··· 269 241 0, 0, phy_pages, 0); 270 242 271 243 return vm; 272 - } 273 - 274 - struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm) 275 - { 276 - return _vm_create(mode, phy_pages, perm); 277 244 } 278 245 279 246 /* ··· 1227 1204 do { 1228 1205 rc = ioctl(vcpu->fd, KVM_RUN, NULL); 1229 1206 } while (rc == -1 && errno == EINTR); 1207 + 1208 + assert_on_unhandled_exception(vm, vcpuid); 1209 + 1230 1210 return rc; 1231 1211 } 1232 1212 ··· 1284 1258 ret = ioctl(vcpu->fd, KVM_SET_MP_STATE, mp_state); 1285 1259 TEST_ASSERT(ret == 0, "KVM_SET_MP_STATE IOCTL failed, " 1286 1260 "rc: %i errno: %i", ret, errno); 1261 + } 1262 + 1263 + /* 1264 + * VM VCPU Get Reg List 1265 + * 1266 + * Input Args: 1267 + * vm - Virtual Machine 1268 + * vcpuid - VCPU ID 1269 + * 1270 + * Output Args: 1271 + * None 1272 + * 1273 + * Return: 1274 + * A pointer to an allocated struct kvm_reg_list 1275 + * 1276 + * Get the list of guest registers which are supported for 1277 + * KVM_GET_ONE_REG/KVM_SET_ONE_REG calls 1278 + */ 1279 + struct kvm_reg_list *vcpu_get_reg_list(struct kvm_vm *vm, uint32_t vcpuid) 1280 + { 1281 + struct kvm_reg_list reg_list_n = { .n = 0 }, *reg_list; 1282 + int ret; 1283 + 1284 + ret = _vcpu_ioctl(vm, vcpuid, KVM_GET_REG_LIST, &reg_list_n); 1285 + TEST_ASSERT(ret == -1 && errno == E2BIG, "KVM_GET_REG_LIST n=0"); 1286 + reg_list = calloc(1, sizeof(*reg_list) + reg_list_n.n * sizeof(__u64)); 1287 + reg_list->n = reg_list_n.n; 1288 + vcpu_ioctl(vm, vcpuid, KVM_GET_REG_LIST, reg_list); 1289 + return reg_list; 1287 1290 } 1288 1291 1289 1292 /*
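`vcpu_get_reg_list()` above uses the classic two-call sizing idiom for `KVM_GET_REG_LIST`: probe with `n = 0`, expect `-1`/`E2BIG` with the required count written back, allocate, then fetch for real. A self-contained simulation of that idiom (the `fake_get_reg_list` kernel stand-in and its register values are ours, not KVM's):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct reg_list { unsigned long n; unsigned long reg[]; };

/* Stand-in for the KVM_GET_REG_LIST ioctl: when the caller's buffer
 * is too small, report E2BIG and write back the required count. */
static int fake_get_reg_list(struct reg_list *list)
{
	const unsigned long nregs = 3;
	unsigned long i;

	if (list->n < nregs) {
		list->n = nregs;
		errno = E2BIG;
		return -1;
	}
	list->n = nregs;
	for (i = 0; i < nregs; i++)
		list->reg[i] = 0x6030000000100000UL + i;
	return 0;
}

/* Two-call idiom: probe for the size, allocate, fetch. */
static struct reg_list *get_reg_list(void)
{
	struct reg_list probe = { .n = 0 }, *list;

	assert(fake_get_reg_list(&probe) == -1 && errno == E2BIG);
	list = calloc(1, sizeof(*list) + probe.n * sizeof(unsigned long));
	list->n = probe.n;
	assert(fake_get_reg_list(list) == 0);
	return list;
}
```

The selftest helper asserts on the probe's `E2BIG` the same way, which is why it can hand callers a correctly sized `struct kvm_reg_list` without a size parameter.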
+2
tools/testing/selftests/kvm/lib/kvm_util_internal.h
··· 50 50 vm_paddr_t pgd; 51 51 vm_vaddr_t gdt; 52 52 vm_vaddr_t tss; 53 + vm_vaddr_t idt; 54 + vm_vaddr_t handlers; 53 55 }; 54 56 55 57 struct vcpu *vcpu_find(struct kvm_vm *vm, uint32_t vcpuid);
+4
tools/testing/selftests/kvm/lib/s390x/processor.c
··· 241 241 fprintf(stream, "%*spstate: psw: 0x%.16llx:0x%.16llx\n", 242 242 indent, "", vcpu->state->psw_mask, vcpu->state->psw_addr); 243 243 } 244 + 245 + void assert_on_unhandled_exception(struct kvm_vm *vm, uint32_t vcpuid) 246 + { 247 + }
+3
tools/testing/selftests/kvm/lib/s390x/ucall.c
··· 38 38 struct kvm_run *run = vcpu_state(vm, vcpu_id); 39 39 struct ucall ucall = {}; 40 40 41 + if (uc) 42 + memset(uc, 0, sizeof(*uc)); 43 + 41 44 if (run->exit_reason == KVM_EXIT_S390_SIEIC && 42 45 run->s390_sieic.icptcode == 4 && 43 46 (run->s390_sieic.ipa >> 8) == 0x83 && /* 0x83 means DIAGNOSE */
+20 -2
tools/testing/selftests/kvm/lib/test_util.c
··· 4 4 * 5 5 * Copyright (C) 2020, Google LLC. 6 6 */ 7 - #include <stdlib.h> 7 + 8 + #include <assert.h> 8 9 #include <ctype.h> 9 10 #include <limits.h> 10 - #include <assert.h> 11 + #include <stdlib.h> 12 + #include <time.h> 13 + 11 14 #include "test_util.h" 12 15 13 16 /* ··· 82 79 int64_t ns1 = timespec_to_ns(ts1); 83 80 int64_t ns2 = timespec_to_ns(ts2); 84 81 return timespec_add_ns((struct timespec){0}, ns1 - ns2); 82 + } 83 + 84 + struct timespec timespec_diff_now(struct timespec start) 85 + { 86 + struct timespec end; 87 + 88 + clock_gettime(CLOCK_MONOTONIC, &end); 89 + return timespec_sub(end, start); 90 + } 91 + 92 + struct timespec timespec_div(struct timespec ts, int divisor) 93 + { 94 + int64_t ns = timespec_to_ns(ts) / divisor; 95 + 96 + return timespec_add_ns((struct timespec){0}, ns); 85 97 } 86 98 87 99 void print_skip(const char *fmt, ...)
+81
tools/testing/selftests/kvm/lib/x86_64/handlers.S
··· 1 + handle_exception: 2 + push %r15 3 + push %r14 4 + push %r13 5 + push %r12 6 + push %r11 7 + push %r10 8 + push %r9 9 + push %r8 10 + 11 + push %rdi 12 + push %rsi 13 + push %rbp 14 + push %rbx 15 + push %rdx 16 + push %rcx 17 + push %rax 18 + mov %rsp, %rdi 19 + 20 + call route_exception 21 + 22 + pop %rax 23 + pop %rcx 24 + pop %rdx 25 + pop %rbx 26 + pop %rbp 27 + pop %rsi 28 + pop %rdi 29 + pop %r8 30 + pop %r9 31 + pop %r10 32 + pop %r11 33 + pop %r12 34 + pop %r13 35 + pop %r14 36 + pop %r15 37 + 38 + /* Discard vector and error code. */ 39 + add $16, %rsp 40 + iretq 41 + 42 + /* 43 + * Build the handle_exception wrappers which push the vector/error code on the 44 + * stack and an array of pointers to those wrappers. 45 + */ 46 + .pushsection .rodata 47 + .globl idt_handlers 48 + idt_handlers: 49 + .popsection 50 + 51 + .macro HANDLERS has_error from to 52 + vector = \from 53 + .rept \to - \from + 1 54 + .align 8 55 + 56 + /* Fetch current address and append it to idt_handlers. */ 57 + current_handler = . 58 + .pushsection .rodata 59 + .quad current_handler 60 + .popsection 61 + 62 + .if ! \has_error 63 + pushq $0 64 + .endif 65 + pushq $vector 66 + jmp handle_exception 67 + vector = vector + 1 68 + .endr 69 + .endm 70 + 71 + .global idt_handler_code 72 + idt_handler_code: 73 + HANDLERS has_error=0 from=0 to=7 74 + HANDLERS has_error=1 from=8 to=8 75 + HANDLERS has_error=0 from=9 to=9 76 + HANDLERS has_error=1 from=10 to=14 77 + HANDLERS has_error=0 from=15 to=16 78 + HANDLERS has_error=1 from=17 to=17 79 + HANDLERS has_error=0 from=18 to=255 80 + 81 + .section .note.GNU-stack, "", %progbits
+142 -4
tools/testing/selftests/kvm/lib/x86_64/processor.c
··· 12 12 #include "../kvm_util_internal.h" 13 13 #include "processor.h" 14 14 15 + #ifndef NUM_INTERRUPTS 16 + #define NUM_INTERRUPTS 256 17 + #endif 18 + 19 + #define DEFAULT_CODE_SELECTOR 0x8 20 + #define DEFAULT_DATA_SELECTOR 0x10 21 + 15 22 /* Minimum physical address used for virtual translation tables. */ 16 23 #define KVM_GUEST_PAGE_TABLE_MIN_PADDR 0x180000 24 + 25 + vm_vaddr_t exception_handlers; 17 26 18 27 /* Virtual translation table structure declarations */ 19 28 struct pageMapL4Entry { ··· 401 392 desc->limit0 = segp->limit & 0xFFFF; 402 393 desc->base0 = segp->base & 0xFFFF; 403 394 desc->base1 = segp->base >> 16; 404 - desc->s = segp->s; 405 395 desc->type = segp->type; 396 + desc->s = segp->s; 406 397 desc->dpl = segp->dpl; 407 398 desc->p = segp->present; 408 399 desc->limit1 = segp->limit >> 16; 400 + desc->avl = segp->avl; 409 401 desc->l = segp->l; 410 402 desc->db = segp->db; 411 403 desc->g = segp->g; ··· 566 556 sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX); 567 557 568 558 kvm_seg_set_unusable(&sregs.ldt); 569 - kvm_seg_set_kernel_code_64bit(vm, 0x8, &sregs.cs); 570 - kvm_seg_set_kernel_data_64bit(vm, 0x10, &sregs.ds); 571 - kvm_seg_set_kernel_data_64bit(vm, 0x10, &sregs.es); 559 + kvm_seg_set_kernel_code_64bit(vm, DEFAULT_CODE_SELECTOR, &sregs.cs); 560 + kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.ds); 561 + kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.es); 572 562 kvm_setup_tss_64bit(vm, &sregs.tr, 0x18, gdt_memslot, pgd_memslot); 573 563 break; 574 564 ··· 1127 1117 *pa_bits = entry->eax & 0xff; 1128 1118 *va_bits = (entry->eax >> 8) & 0xff; 1129 1119 } 1120 + } 1121 + 1122 + struct idt_entry { 1123 + uint16_t offset0; 1124 + uint16_t selector; 1125 + uint16_t ist : 3; 1126 + uint16_t : 5; 1127 + uint16_t type : 4; 1128 + uint16_t : 1; 1129 + uint16_t dpl : 2; 1130 + uint16_t p : 1; 1131 + uint16_t offset1; 1132 + uint32_t offset2; uint32_t reserved; 1133 + }; 1134 + 1135 + static void 
set_idt_entry(struct kvm_vm *vm, int vector, unsigned long addr, 1136 + int dpl, unsigned short selector) 1137 + { 1138 + struct idt_entry *base = 1139 + (struct idt_entry *)addr_gva2hva(vm, vm->idt); 1140 + struct idt_entry *e = &base[vector]; 1141 + 1142 + memset(e, 0, sizeof(*e)); 1143 + e->offset0 = addr; 1144 + e->selector = selector; 1145 + e->ist = 0; 1146 + e->type = 14; 1147 + e->dpl = dpl; 1148 + e->p = 1; 1149 + e->offset1 = addr >> 16; 1150 + e->offset2 = addr >> 32; 1151 + } 1152 + 1153 + void kvm_exit_unexpected_vector(uint32_t value) 1154 + { 1155 + outl(UNEXPECTED_VECTOR_PORT, value); 1156 + } 1157 + 1158 + void route_exception(struct ex_regs *regs) 1159 + { 1160 + typedef void(*handler)(struct ex_regs *); 1161 + handler *handlers = (handler *)exception_handlers; 1162 + 1163 + if (handlers && handlers[regs->vector]) { 1164 + handlers[regs->vector](regs); 1165 + return; 1166 + } 1167 + 1168 + kvm_exit_unexpected_vector(regs->vector); 1169 + } 1170 + 1171 + void vm_init_descriptor_tables(struct kvm_vm *vm) 1172 + { 1173 + extern void *idt_handlers; 1174 + int i; 1175 + 1176 + vm->idt = vm_vaddr_alloc(vm, getpagesize(), 0x2000, 0, 0); 1177 + vm->handlers = vm_vaddr_alloc(vm, 256 * sizeof(void *), 0x2000, 0, 0); 1178 + /* Handlers have the same address in both address spaces.*/ 1179 + for (i = 0; i < NUM_INTERRUPTS; i++) 1180 + set_idt_entry(vm, i, (unsigned long)(&idt_handlers)[i], 0, 1181 + DEFAULT_CODE_SELECTOR); 1182 + } 1183 + 1184 + void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid) 1185 + { 1186 + struct kvm_sregs sregs; 1187 + 1188 + vcpu_sregs_get(vm, vcpuid, &sregs); 1189 + sregs.idt.base = vm->idt; 1190 + sregs.idt.limit = NUM_INTERRUPTS * sizeof(struct idt_entry) - 1; 1191 + sregs.gdt.base = vm->gdt; 1192 + sregs.gdt.limit = getpagesize() - 1; 1193 + kvm_seg_set_kernel_data_64bit(NULL, DEFAULT_DATA_SELECTOR, &sregs.gs); 1194 + vcpu_sregs_set(vm, vcpuid, &sregs); 1195 + *(vm_vaddr_t *)addr_gva2hva(vm, 
(vm_vaddr_t)(&exception_handlers)) = vm->handlers; 1196 + } 1197 + 1198 + void vm_handle_exception(struct kvm_vm *vm, int vector, 1199 + void (*handler)(struct ex_regs *)) 1200 + { 1201 + vm_vaddr_t *handlers = (vm_vaddr_t *)addr_gva2hva(vm, vm->handlers); 1202 + 1203 + handlers[vector] = (vm_vaddr_t)handler; 1204 + } 1205 + 1206 + void assert_on_unhandled_exception(struct kvm_vm *vm, uint32_t vcpuid) 1207 + { 1208 + if (vcpu_state(vm, vcpuid)->exit_reason == KVM_EXIT_IO 1209 + && vcpu_state(vm, vcpuid)->io.port == UNEXPECTED_VECTOR_PORT 1210 + && vcpu_state(vm, vcpuid)->io.size == 4) { 1211 + /* Grab pointer to io data */ 1212 + uint32_t *data = (void *)vcpu_state(vm, vcpuid) 1213 + + vcpu_state(vm, vcpuid)->io.data_offset; 1214 + 1215 + TEST_ASSERT(false, 1216 + "Unexpected vectored event in guest (vector:0x%x)", 1217 + *data); 1218 + } 1219 + } 1220 + 1221 + bool set_cpuid(struct kvm_cpuid2 *cpuid, 1222 + struct kvm_cpuid_entry2 *ent) 1223 + { 1224 + int i; 1225 + 1226 + for (i = 0; i < cpuid->nent; i++) { 1227 + struct kvm_cpuid_entry2 *cur = &cpuid->entries[i]; 1228 + 1229 + if (cur->function != ent->function || cur->index != ent->index) 1230 + continue; 1231 + 1232 + memcpy(cur, ent, sizeof(struct kvm_cpuid_entry2)); 1233 + return true; 1234 + } 1235 + 1236 + return false; 1237 + } 1238 + 1239 + uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2, 1240 + uint64_t a3) 1241 + { 1242 + uint64_t r; 1243 + 1244 + asm volatile("vmcall" 1245 + : "=a"(r) 1246 + : "b"(a0), "c"(a1), "d"(a2), "S"(a3)); 1247 + return r; 1130 1248 }
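`set_idt_entry()` above splits the 64-bit handler address across the `offset0`/`offset1`/`offset2` fields of the x86-64 IDT gate. A quick sketch verifying the split and its reassembly round-trip (field widths copied from the patch's `struct idt_entry`, with the attribute bits omitted):

```c
#include <assert.h>
#include <stdint.h>

struct idt_offsets {
	uint16_t offset0;	/* bits  0-15 of the handler address */
	uint16_t offset1;	/* bits 16-31 */
	uint32_t offset2;	/* bits 32-63 */
};

/* Same truncating assignments set_idt_entry() relies on. */
static struct idt_offsets split(uint64_t addr)
{
	struct idt_offsets o = {
		.offset0 = (uint16_t)addr,
		.offset1 = (uint16_t)(addr >> 16),
		.offset2 = (uint32_t)(addr >> 32),
	};
	return o;
}

/* What the CPU does when dispatching through the gate. */
static uint64_t join(struct idt_offsets o)
{
	return (uint64_t)o.offset2 << 32 |
	       (uint64_t)o.offset1 << 16 | o.offset0;
}
```

The patch's plain `e->offset0 = addr` works because assigning a `uint64_t` to a `uint16_t` field keeps exactly the low 16 bits, so no explicit masking is needed.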
+3
tools/testing/selftests/kvm/lib/x86_64/ucall.c
··· 40 40 	struct kvm_run *run = vcpu_state(vm, vcpu_id);
41 41 	struct ucall ucall = {};
42 42 
43 + 	if (uc)
44 + 		memset(uc, 0, sizeof(*uc));
45 + 
43 46 	if (run->exit_reason == KVM_EXIT_IO && run->io.port == UCALL_PIO_PORT) {
44 47 		struct kvm_regs regs;
45 48 
+234
tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only
2 + /*
3 +  * Copyright (C) 2020, Google LLC.
4 +  *
5 +  * Tests for KVM paravirtual feature disablement
6 +  */
7 + #include <asm/kvm_para.h>
8 + #include <linux/kvm_para.h>
9 + #include <stdint.h>
10 + 
11 + #include "test_util.h"
12 + #include "kvm_util.h"
13 + #include "processor.h"
14 + 
15 + extern unsigned char rdmsr_start;
16 + extern unsigned char rdmsr_end;
17 + 
18 + static u64 do_rdmsr(u32 idx)
19 + {
20 + 	u32 lo, hi;
21 + 
22 + 	asm volatile("rdmsr_start: rdmsr;"
23 + 		     "rdmsr_end:"
24 + 		     : "=a"(lo), "=d"(hi)
25 + 		     : "c"(idx));
26 + 
27 + 	return (((u64) hi) << 32) | lo;
28 + }
29 + 
30 + extern unsigned char wrmsr_start;
31 + extern unsigned char wrmsr_end;
32 + 
33 + static void do_wrmsr(u32 idx, u64 val)
34 + {
35 + 	u32 lo, hi;
36 + 
37 + 	lo = val;
38 + 	hi = val >> 32;
39 + 
40 + 	asm volatile("wrmsr_start: wrmsr;"
41 + 		     "wrmsr_end:"
42 + 		     : : "a"(lo), "c"(idx), "d"(hi));
43 + }
44 + 
45 + static int nr_gp;
46 + 
47 + static void guest_gp_handler(struct ex_regs *regs)
48 + {
49 + 	unsigned char *rip = (unsigned char *)regs->rip;
50 + 	bool r, w;
51 + 
52 + 	r = rip == &rdmsr_start;
53 + 	w = rip == &wrmsr_start;
54 + 	GUEST_ASSERT(r || w);
55 + 
56 + 	nr_gp++;
57 + 
58 + 	if (r)
59 + 		regs->rip = (uint64_t)&rdmsr_end;
60 + 	else
61 + 		regs->rip = (uint64_t)&wrmsr_end;
62 + }
63 + 
64 + struct msr_data {
65 + 	uint32_t idx;
66 + 	const char *name;
67 + };
68 + 
69 + #define TEST_MSR(msr) { .idx = msr, .name = #msr }
70 + #define UCALL_PR_MSR 0xdeadbeef
71 + #define PR_MSR(msr) ucall(UCALL_PR_MSR, 1, msr)
72 + 
73 + /*
74 +  * KVM paravirtual msrs to test. Expect a #GP if any of these msrs are read or
75 +  * written, as the KVM_CPUID_FEATURES leaf is cleared.
76 +  */
77 + static struct msr_data msrs_to_test[] = {
78 + 	TEST_MSR(MSR_KVM_SYSTEM_TIME),
79 + 	TEST_MSR(MSR_KVM_SYSTEM_TIME_NEW),
80 + 	TEST_MSR(MSR_KVM_WALL_CLOCK),
81 + 	TEST_MSR(MSR_KVM_WALL_CLOCK_NEW),
82 + 	TEST_MSR(MSR_KVM_ASYNC_PF_EN),
83 + 	TEST_MSR(MSR_KVM_STEAL_TIME),
84 + 	TEST_MSR(MSR_KVM_PV_EOI_EN),
85 + 	TEST_MSR(MSR_KVM_POLL_CONTROL),
86 + 	TEST_MSR(MSR_KVM_ASYNC_PF_INT),
87 + 	TEST_MSR(MSR_KVM_ASYNC_PF_ACK),
88 + };
89 + 
90 + static void test_msr(struct msr_data *msr)
91 + {
92 + 	PR_MSR(msr);
93 + 	do_rdmsr(msr->idx);
94 + 	GUEST_ASSERT(READ_ONCE(nr_gp) == 1);
95 + 
96 + 	nr_gp = 0;
97 + 	do_wrmsr(msr->idx, 0);
98 + 	GUEST_ASSERT(READ_ONCE(nr_gp) == 1);
99 + 	nr_gp = 0;
100 + }
101 + 
102 + struct hcall_data {
103 + 	uint64_t nr;
104 + 	const char *name;
105 + };
106 + 
107 + #define TEST_HCALL(hc) { .nr = hc, .name = #hc }
108 + #define UCALL_PR_HCALL 0xdeadc0de
109 + #define PR_HCALL(hc) ucall(UCALL_PR_HCALL, 1, hc)
110 + 
111 + /*
112 +  * KVM hypercalls to test. Expect -KVM_ENOSYS when called, as the corresponding
113 +  * features have been cleared in KVM_CPUID_FEATURES.
114 +  */
115 + static struct hcall_data hcalls_to_test[] = {
116 + 	TEST_HCALL(KVM_HC_KICK_CPU),
117 + 	TEST_HCALL(KVM_HC_SEND_IPI),
118 + 	TEST_HCALL(KVM_HC_SCHED_YIELD),
119 + };
120 + 
121 + static void test_hcall(struct hcall_data *hc)
122 + {
123 + 	uint64_t r;
124 + 
125 + 	PR_HCALL(hc);
126 + 	r = kvm_hypercall(hc->nr, 0, 0, 0, 0);
127 + 	GUEST_ASSERT(r == -KVM_ENOSYS);
128 + }
129 + 
130 + static void guest_main(void)
131 + {
132 + 	int i;
133 + 
134 + 	for (i = 0; i < ARRAY_SIZE(msrs_to_test); i++) {
135 + 		test_msr(&msrs_to_test[i]);
136 + 	}
137 + 
138 + 	for (i = 0; i < ARRAY_SIZE(hcalls_to_test); i++) {
139 + 		test_hcall(&hcalls_to_test[i]);
140 + 	}
141 + 
142 + 	GUEST_DONE();
143 + }
144 + 
145 + static void clear_kvm_cpuid_features(struct kvm_cpuid2 *cpuid)
146 + {
147 + 	struct kvm_cpuid_entry2 ent = {0};
148 + 
149 + 	ent.function = KVM_CPUID_FEATURES;
150 + 	TEST_ASSERT(set_cpuid(cpuid, &ent),
151 + 		    "failed to clear KVM_CPUID_FEATURES leaf");
152 + }
153 + 
154 + static void pr_msr(struct ucall *uc)
155 + {
156 + 	struct msr_data *msr = (struct msr_data *)uc->args[0];
157 + 
158 + 	pr_info("testing msr: %s (%#x)\n", msr->name, msr->idx);
159 + }
160 + 
161 + static void pr_hcall(struct ucall *uc)
162 + {
163 + 	struct hcall_data *hc = (struct hcall_data *)uc->args[0];
164 + 
165 + 	pr_info("testing hcall: %s (%lu)\n", hc->name, hc->nr);
166 + }
167 + 
168 + static void handle_abort(struct ucall *uc)
169 + {
170 + 	TEST_FAIL("%s at %s:%ld", (const char *)uc->args[0],
171 + 		  __FILE__, uc->args[1]);
172 + }
173 + 
174 + #define VCPU_ID 0
175 + 
176 + static void enter_guest(struct kvm_vm *vm)
177 + {
178 + 	struct kvm_run *run;
179 + 	struct ucall uc;
180 + 	int r;
181 + 
182 + 	run = vcpu_state(vm, VCPU_ID);
183 + 
184 + 	while (true) {
185 + 		r = _vcpu_run(vm, VCPU_ID);
186 + 		TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
187 + 		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
188 + 			    "unexpected exit reason: %u (%s)",
189 + 			    run->exit_reason, exit_reason_str(run->exit_reason));
190 + 
191 + 		switch (get_ucall(vm, VCPU_ID, &uc)) {
192 + 		case UCALL_PR_MSR:
193 + 			pr_msr(&uc);
194 + 			break;
195 + 		case UCALL_PR_HCALL:
196 + 			pr_hcall(&uc);
197 + 			break;
198 + 		case UCALL_ABORT:
199 + 			handle_abort(&uc);
200 + 			return;
201 + 		case UCALL_DONE:
202 + 			return;
203 + 		}
204 + 	}
205 + }
206 + 
207 + int main(void)
208 + {
209 + 	struct kvm_enable_cap cap = {0};
210 + 	struct kvm_cpuid2 *best;
211 + 	struct kvm_vm *vm;
212 + 
213 + 	if (!kvm_check_cap(KVM_CAP_ENFORCE_PV_FEATURE_CPUID)) {
214 + 		pr_info("will skip kvm paravirt restriction tests.\n");
215 + 		return 0;
216 + 	}
217 + 
218 + 	vm = vm_create_default(VCPU_ID, 0, guest_main);
219 + 
220 + 	cap.cap = KVM_CAP_ENFORCE_PV_FEATURE_CPUID;
221 + 	cap.args[0] = 1;
222 + 	vcpu_enable_cap(vm, VCPU_ID, &cap);
223 + 
224 + 	best = kvm_get_supported_cpuid();
225 + 	clear_kvm_cpuid_features(best);
226 + 	vcpu_set_cpuid(vm, VCPU_ID, best);
227 + 
228 + 	vm_init_descriptor_tables(vm);
229 + 	vcpu_init_descriptor_tables(vm, VCPU_ID);
230 + 	vm_handle_exception(vm, GP_VECTOR, guest_gp_handler);
231 + 
232 + 	enter_guest(vm);
233 + 	kvm_vm_free(vm);
234 + }
+1 -1
tools/testing/selftests/lib.mk
··· 136 136 ifeq ($(OVERRIDE_TARGETS),)
137 137 LOCAL_HDRS := $(selfdir)/kselftest_harness.h $(selfdir)/kselftest.h
138 138 $(OUTPUT)/%:%.c $(LOCAL_HDRS)
139 - 	$(LINK.c) $^ $(LDLIBS) -o $@
139 + 	$(LINK.c) $(filter-out $(LOCAL_HDRS),$^) $(LDLIBS) -o $@
140 140 
141 141 $(OUTPUT)/%.o:%.S
142 142 	$(COMPILE.S) $^ -o $@
+1
tools/testing/selftests/pidfd/config
··· 4 4 CONFIG_PID_NS=y
5 5 CONFIG_NET_NS=y
6 6 CONFIG_CGROUPS=y
7 + CONFIG_CHECKPOINT_RESTORE=y
+4 -1
tools/testing/selftests/pidfd/pidfd_getfd_test.c
··· 204 204 	fd = sys_pidfd_getfd(self->pidfd, self->remote_fd, 0);
205 205 	ASSERT_GE(fd, 0);
206 206 
207 - 	EXPECT_EQ(0, sys_kcmp(getpid(), self->pid, KCMP_FILE, fd, self->remote_fd));
207 + 	ret = sys_kcmp(getpid(), self->pid, KCMP_FILE, fd, self->remote_fd);
208 + 	if (ret < 0 && errno == ENOSYS)
209 + 		SKIP(return, "kcmp() syscall not supported");
210 + 	EXPECT_EQ(ret, 0);
208 211 
209 212 	ret = fcntl(fd, F_GETFD);
210 213 	ASSERT_GE(ret, 0);
-1
tools/testing/selftests/pidfd/pidfd_open_test.c
··· 6 6 #include <inttypes.h>
7 7 #include <limits.h>
8 8 #include <linux/types.h>
9 - #include <linux/wait.h>
10 9 #include <sched.h>
11 10 #include <signal.h>
12 11 #include <stdbool.h>
-1
tools/testing/selftests/pidfd/pidfd_poll_test.c
··· 3 3 #define _GNU_SOURCE
4 4 #include <errno.h>
5 5 #include <linux/types.h>
6 - #include <linux/wait.h>
7 6 #include <poll.h>
8 7 #include <signal.h>
9 8 #include <stdbool.h>
-1
tools/testing/selftests/pidfd/pidfd_setns_test.c
··· 16 16 #include <unistd.h>
17 17 #include <sys/socket.h>
18 18 #include <sys/stat.h>
19 - #include <linux/kcmp.h>
20 19 
21 20 #include "pidfd.h"
22 21 #include "../clone3/clone3_selftests.h"
+1 -1
tools/testing/selftests/pidfd/pidfd_test.c
··· 330 330 		ksft_exit_fail_msg("%s test: Failed to recycle pid %d\n",
331 331 				   test_name, PID_RECYCLE);
332 332 	case PIDFD_SKIP:
333 - 		ksft_print_msg("%s test: Skipping test\n", test_name);
333 + 		ksft_test_result_skip("%s test: Skipping test\n", test_name);
334 334 		ret = 0;
335 335 		break;
336 336 	case PIDFD_XFAIL:
-1
tools/testing/selftests/proc/proc-loadavg-001.c
··· 14 14  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
15 15  */
16 16 /* Test that /proc/loadavg correctly reports last pid in pid namespace. */
17 - #define _GNU_SOURCE
18 17 #include <errno.h>
19 18 #include <sched.h>
20 19 #include <sys/types.h>
-1
tools/testing/selftests/proc/proc-self-syscall.c
··· 13 13  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
14 14  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
15 15  */
16 - #define _GNU_SOURCE
17 16 #include <unistd.h>
18 17 #include <sys/syscall.h>
19 18 #include <sys/types.h>
-1
tools/testing/selftests/proc/proc-uptime-002.c
··· 15 15  */
16 16 // Test that values in /proc/uptime increment monotonically
17 17 // while shifting across CPUs.
18 - #define _GNU_SOURCE
19 18 #undef NDEBUG
20 19 #include <assert.h>
21 20 #include <unistd.h>
+2 -2
tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
··· 100 100 	],
101 101 	"cmdUnderTest": "$TC filter add dev $DEV2 protocol ip pref 1 ingress flower dst_mac e4:11:22:11:4a:51 action drop",
102 102 	"expExitCode": "0",
103 - 	"verifyCmd": "$TC filter show terse dev $DEV2 ingress",
103 + 	"verifyCmd": "$TC -br filter show dev $DEV2 ingress",
104 104 	"matchPattern": "filter protocol ip pref 1 flower.*handle",
105 105 	"matchCount": "1",
106 106 	"teardown": [
··· 119 119 	],
120 120 	"cmdUnderTest": "$TC filter add dev $DEV2 protocol ip pref 1 ingress flower dst_mac e4:11:22:11:4a:51 action drop",
121 121 	"expExitCode": "0",
122 - 	"verifyCmd": "$TC filter show terse dev $DEV2 ingress",
122 + 	"verifyCmd": "$TC -br filter show dev $DEV2 ingress",
123 123 	"matchPattern": " dst_mac e4:11:22:11:4a:51",
124 124 	"matchCount": "0",
125 125 	"teardown": [