Merge tag 'docs-5.14' of git://git.lwn.net/linux

+1 -1

Documentation/ABI/obsolete/sysfs-cpuidle

···
6 6 with the update that cpuidle governor can be changed at runtime in default,
7 7 both current_governor and current_governor_ro co-exist under
8 8 /sys/devices/system/cpu/cpuidle/ file, it's duplicate so make
9 - current_governor_ro obselete.
9 + current_governor_ro obsolete.

+1 -1

Documentation/ABI/removed/sysfs-kernel-uids

···
5 5 Description:
6 6 The /sys/kernel/uids/<uid>/cpu_shares tunable is used
7 7 to set the cpu bandwidth a user is allowed. This is a
8 - propotional value. What that means is that if there
8 + proportional value. What that means is that if there
9 9 are two users logged in, each with an equal number of
10 10 shares, then they will get equal CPU bandwidth. Another
11 11 example would be, if User A has shares = 1024 and user
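The "proportional value" wording in this hunk can be illustrated with a little arithmetic. The following is a minimal sketch, not kernel code: the function name is hypothetical, and the 2048 figure for the second user is an illustrative assumption, since the hunk is cut off before it states that value.

```python
def cpu_bandwidth_fractions(shares):
    """Return each user's fraction of CPU bandwidth, proportional to its shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {user: value / total for user, value in shares.items()}


# Two users with equal shares split the CPU evenly.
equal = cpu_bandwidth_fractions({"A": 1024, "B": 1024})

# Hypothetical uneven case: B holds twice A's shares, so B gets twice the bandwidth.
uneven = cpu_bandwidth_fractions({"A": 1024, "B": 2048})
```

Only the ratio between the share values matters here; the absolute numbers carry no meaning on their own.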

+1 -1

Documentation/ABI/stable/sysfs-bus-vmbus

···
61 61 KernelVersion: 4.14
62 62 Contact: Stephen Hemminger <sthemmin@microsoft.com>
63 63 Description: Directory for per-channel information
64 - NN is the VMBUS relid associtated with the channel.
64 + NN is the VMBUS relid associated with the channel.
65 65
66 66 What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/cpu
67 67 Date: September. 2017

+1 -1

Documentation/ABI/stable/sysfs-bus-xen-backend

···
19 19 KernelVersion: 3.0
20 20 Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
21 21 Description:
22 - The major:minor number (in hexidecimal) of the
22 + The major:minor number (in hexadecimal) of the
23 23 physical device providing the storage for this backend
24 24 block device.
25 25
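Because the attribute reports major:minor in hexadecimal, a reader expecting decimal device numbers has to convert. A minimal sketch, assuming the value is a single "major:minor" string; the helper name and the sample value "fd:10" are hypothetical.

```python
def parse_hex_dev(text):
    """Split a hexadecimal "major:minor" string into decimal integers."""
    major_hex, sep, minor_hex = text.strip().partition(":")
    if not sep:
        raise ValueError(f"expected major:minor, got {text!r}")
    return int(major_hex, 16), int(minor_hex, 16)


# Hypothetical sample value, not taken from a real system.
major, minor = parse_hex_dev("fd:10\n")
```

Reading "fd:10" as decimal would fail outright, and a value such as "10:20" would silently give the wrong device numbers, which is why the base is made explicit.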

+83

Documentation/ABI/stable/sysfs-devices-system-cpu

···
23 23 here).
24 24 If set by a process it will be inherited by child processes.
25 25 Values: 64 bit unsigned integer (bit field)
26 +
27 + What: /sys/devices/system/cpu/cpuX/topology/physical_package_id
28 + Description: physical package id of cpuX. Typically corresponds to a physical
29 + socket number, but the actual value is architecture and platform
30 + dependent.
31 + Values: integer
32 +
33 + What: /sys/devices/system/cpu/cpuX/topology/die_id
34 + Description: the CPU die ID of cpuX. Typically it is the hardware platform's
35 + identifier (rather than the kernel's). The actual value is
36 + architecture and platform dependent.
37 + Values: integer
38 +
39 + What: /sys/devices/system/cpu/cpuX/topology/core_id
40 + Description: the CPU core ID of cpuX. Typically it is the hardware platform's
41 + identifier (rather than the kernel's). The actual value is
42 + architecture and platform dependent.
43 + Values: integer
44 +
45 + What: /sys/devices/system/cpu/cpuX/topology/book_id
46 + Description: the book ID of cpuX. Typically it is the hardware platform's
47 + identifier (rather than the kernel's). The actual value is
48 + architecture and platform dependent. it's only used on s390.
49 + Values: integer
50 +
51 + What: /sys/devices/system/cpu/cpuX/topology/drawer_id
52 + Description: the drawer ID of cpuX. Typically it is the hardware platform's
53 + identifier (rather than the kernel's). The actual value is
54 + architecture and platform dependent. it's only used on s390.
55 + Values: integer
56 +
57 + What: /sys/devices/system/cpu/cpuX/topology/core_cpus
58 + Description: internal kernel map of CPUs within the same core.
59 + (deprecated name: "thread_siblings")
60 + Values: hexadecimal bitmask.
61 +
62 + What: /sys/devices/system/cpu/cpuX/topology/core_cpus_list
63 + Description: human-readable list of CPUs within the same core.
64 + The format is like 0-3, 8-11, 14,17.
65 + (deprecated name: "thread_siblings_list").
66 + Values: decimal list.
67 +
68 + What: /sys/devices/system/cpu/cpuX/topology/package_cpus
69 + Description: internal kernel map of the CPUs sharing the same physical_package_id.
70 + (deprecated name: "core_siblings").
71 + Values: hexadecimal bitmask.
72 +
73 + What: /sys/devices/system/cpu/cpuX/topology/package_cpus_list
74 + Description: human-readable list of CPUs sharing the same physical_package_id.
75 + The format is like 0-3, 8-11, 14,17.
76 + (deprecated name: "core_siblings_list")
77 + Values: decimal list.
78 +
79 + What: /sys/devices/system/cpu/cpuX/topology/die_cpus
80 + Description: internal kernel map of CPUs within the same die.
81 + Values: hexadecimal bitmask.
82 +
83 + What: /sys/devices/system/cpu/cpuX/topology/die_cpus_list
84 + Description: human-readable list of CPUs within the same die.
85 + The format is like 0-3, 8-11, 14,17.
86 + Values: decimal list.
87 +
88 + What: /sys/devices/system/cpu/cpuX/topology/book_siblings
89 + Description: internal kernel map of cpuX's hardware threads within the same
90 + book_id. it's only used on s390.
91 + Values: hexadecimal bitmask.
92 +
93 + What: /sys/devices/system/cpu/cpuX/topology/book_siblings_list
94 + Description: human-readable list of cpuX's hardware threads within the same
95 + book_id.
96 + The format is like 0-3, 8-11, 14,17. it's only used on s390.
97 + Values: decimal list.
98 +
99 + What: /sys/devices/system/cpu/cpuX/topology/drawer_siblings
100 + Description: internal kernel map of cpuX's hardware threads within the same
101 + drawer_id. it's only used on s390.
102 + Values: hexadecimal bitmask.
103 +
104 + What: /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list
105 + Description: human-readable list of cpuX's hardware threads within the same
106 + drawer_id.
107 + The format is like 0-3, 8-11, 14,17. it's only used on s390.
108 + Values: decimal list.
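The two value formats documented in this hunk, the decimal list ("0-3, 8-11, 14,17") and the hexadecimal bitmask, describe the same set of CPUs in different encodings. A minimal sketch of decoding both follows; the function names are hypothetical, and the handling of comma-separated groups in the bitmask is an assumption about how wide masks are printed rather than something this hunk states.

```python
def parse_cpu_list(text):
    """Expand a decimal CPU list such as "0-3, 8-11, 14,17" into CPU numbers."""
    cpus = []
    for part in text.replace(" ", "").strip().split(","):
        if not part:
            continue
        first, sep, last = part.partition("-")
        if sep:
            # A range "a-b" is inclusive on both ends.
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(first))
    return cpus


def parse_cpu_mask(text):
    """Decode a hexadecimal bitmask into the CPU numbers whose bits are set."""
    # Assumption: wide masks may be printed as comma-separated hex groups.
    value = int(text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


from_list = parse_cpu_list("0-3, 8-11, 14,17\n")
from_mask = parse_cpu_mask("00000000,0000000f\n")
```

On a real system the input would come from reading, for example, core_cpus_list and core_cpus under one cpuX/topology/ directory, where both attributes should decode to the same set.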

+1 -1

Documentation/ABI/stable/sysfs-driver-dma-idxd

···
173 173 Date: Oct 25, 2019
174 174 KernelVersion: 5.6.0
175 175 Contact: dmaengine@vger.kernel.org
176 - Description: The priority value of this work queue, it is a vlue relative to
176 + Description: The priority value of this work queue, it is a value relative to
177 177 other work queue in the same group to control quality of service
178 178 for dispatching work from multiple workqueues in the same group.
179 179

+2 -2

Documentation/ABI/stable/sysfs-driver-mlxreg-io

···
137 137 Description: These files show the system reset cause, as following:
138 138 COMEX thermal shutdown; wathchdog power off or reset was derived
139 139 by one of the next components: COMEX, switch board or by Small Form
140 - Factor mezzanine, reset requested from ASIC, reset cuased by BIOS
140 + Factor mezzanine, reset requested from ASIC, reset caused by BIOS
141 141 reload. Value 1 in file means this is reset cause, 0 - otherwise.
142 142 Only one of the above causes could be 1 at the same time, representing
143 143 only last reset cause.
···
183 183 Date: January 2020
184 184 KernelVersion: 5.6
185 185 Contact: Vadim Pasternak <vadimpmellanox.com>
186 - Description: This file allows to overwrite system VPD hardware wrtie
186 + Description: This file allows to overwrite system VPD hardware write
187 187 protection when attribute is set 1.
188 188
189 189 The file is read/write.

+1 -1

Documentation/ABI/testing/configfs-iio

···
31 31 KernelVersion: 4.7
32 32 Description:
33 33 Dummy IIO devices directory. Creating a directory here will result
34 - in creating a dummy IIO device in the IIO subystem.
34 + in creating a dummy IIO device in the IIO subsystem.

+4 -4

Documentation/ABI/testing/configfs-most

···
20 20
21 21 subbuffer_size
22 22 configure the sub-buffer size for this channel
23 - (needed for synchronous and isochrnous data)
23 + (needed for synchronous and isochronous data)
24 24
25 25
26 26 num_buffers
···
75 75
76 76 subbuffer_size
77 77 configure the sub-buffer size for this channel
78 - (needed for synchronous and isochrnous data)
78 + (needed for synchronous and isochronous data)
79 79
80 80
81 81 num_buffers
···
130 130
131 131 subbuffer_size
132 132 configure the sub-buffer size for this channel
133 - (needed for synchronous and isochrnous data)
133 + (needed for synchronous and isochronous data)
134 134
135 135
136 136 num_buffers
···
196 196
197 197 subbuffer_size
198 198 configure the sub-buffer size for this channel
199 - (needed for synchronous and isochrnous data)
199 + (needed for synchronous and isochronous data)
200 200
201 201
202 202 num_buffers

+1 -1

Documentation/ABI/testing/configfs-usb-gadget

···
137 137 This group contains "OS String" extension handling attributes.
138 138
139 139 ============= ===============================================
140 - use           flag turning "OS Desctiptors" support on/off
140 + use           flag turning "OS Descriptors" support on/off
141 141 b_vendor_code one-byte value used for custom per-device and
142 142               per-interface requests
143 143 qw_sign       an identifier to be reported as "OS String"

+2 -2

Documentation/ABI/testing/configfs-usb-gadget-uvc

···
170 170 bMatrixCoefficients matrix used to compute luma and
171 171 chroma values from the color primaries
172 172 bTransferCharacteristics optoelectronic transfer
173 - characteristic of the source picutre,
173 + characteristic of the source picture,
174 174 also called the gamma function
175 175 bColorPrimaries color primaries and the reference
176 176 white
···
311 311 a hardware trigger interrupt event
312 312 bTriggerSupport flag specifying if hardware
313 313 triggering is supported
314 - bStillCaptureMethod method of still image caputre
314 + bStillCaptureMethod method of still image capture
315 315 supported
316 316 bTerminalLink id of the output terminal to which
317 317 the video endpoint of this interface

+1 -1

Documentation/ABI/testing/debugfs-driver-genwqe

···
31 31 Date: Oct 2013
32 32 Contact: haver@linux.vnet.ibm.com
33 33 Description: Dump of the error registers before the last reset of
34 - the card occured.
34 + the card occurred.
35 35 Only available for PF.
36 36
37 37 What: /sys/kernel/debug/genwqe/genwqe<n>_card/prev_dbg_uid0

+1 -1

Documentation/ABI/testing/debugfs-driver-habanalabs

···
153 153 Contact: ogabbay@kernel.org
154 154 Description: Triggers an I2C transaction that is generated by the device's
155 155 CPU. Writing to this file generates a write transaction while
156 - reading from the file generates a read transcation
156 + reading from the file generates a read transaction
157 157
158 158 What: /sys/kernel/debug/habanalabs/hl<n>/i2c_reg
159 159 Date: Jan 2019

+1 -1

Documentation/ABI/testing/sysfs-bus-fsi

···
12 12 Contact: linux-fsi@lists.ozlabs.org
13 13 Description:
14 14 Sends an FSI BREAK command on a master's communication
15 - link to any connnected slaves. A BREAK resets connected
15 + link to any connected slaves. A BREAK resets connected
16 16 device's logic and preps it to receive further commands
17 17 from the master.
18 18

+3 -3

Documentation/ABI/testing/sysfs-bus-iio

···
786 786 What: /sys/.../events/in_capacitanceY_adaptive_thresh_falling_en
787 787 KernelVersion: 5.13
788 788 Contact: linux-iio@vger.kernel.org
789 - Descrption:
789 + Description:
790 790 Adaptive thresholds are similar to normal fixed thresholds
791 791 but the value is expressed as an offset from a value which
792 792 provides a low frequency approximation of the channel itself.
···
798 798 What: /sys/.../in_capacitanceY_adaptive_thresh_falling_timeout
799 799 KernelVersion: 5.11
800 800 Contact: linux-iio@vger.kernel.org
801 - Descrption:
801 + Description:
802 802 When adaptive thresholds are used, the tracking signal
803 803 may adjust too slowly to step changes in the raw signal.
804 - *_timeout (in seconds) specifies a time for which the
804 + Thus these specify the time in seconds for which the
805 805 difference between the slow tracking signal and the raw
806 806 signal is allowed to remain out-of-range before a reset
807 807 event occurs in which the tracking signal is made equal

+2 -2

Documentation/ABI/testing/sysfs-bus-pci

···
139 139 binary file containing the Vital Product Data for the
140 140 device. It should follow the VPD format defined in
141 141 PCI Specification 2.1 or 2.2, but users should consider
142 - that some devices may have malformatted data. If the
143 - underlying VPD has a writable section then the
142 + that some devices may have incorrectly formatted data.
143 + If the underlying VPD has a writable section then the
144 144 corresponding section of this file will be writable.
145 145
146 146 What: /sys/bus/pci/devices/.../virtfnN

+100

Documentation/ABI/testing/sysfs-class-backlight

···
84 84 It can be enabled by writing the value stored in
85 85 /sys/class/backlight/<backlight>/max_brightness to
86 86 /sys/class/backlight/<backlight>/brightness.
87 +
88 + What: /sys/class/backlight/<backlight>/<ambient light zone>_max
89 + Date: Sep, 2009
90 + KernelVersion: v2.6.32
91 + Contact: device-drivers-devel@blackfin.uclinux.org
92 + Description:
93 + Control the maximum brightness for <ambient light zone>
94 + on this <backlight>. Values are between 0 and 127. This file
95 + will also show the brightness level stored for this
96 + <ambient light zone>.
97 +
98 + The <ambient light zone> is device-driver specific:
99 +
100 + For ADP5520 and ADP5501, <ambient light zone> can be:
101 +
102 + =========== ================================================
103 + Ambient     sysfs entry
104 + light zone
105 + =========== ================================================
106 + daylight    /sys/class/backlight/<backlight>/daylight_max
107 + office      /sys/class/backlight/<backlight>/office_max
108 + dark        /sys/class/backlight/<backlight>/dark_max
109 + =========== ================================================
110 +
111 + For ADP8860, <ambient light zone> can be:
112 +
113 + =========== ================================================
114 + Ambient     sysfs entry
115 + light zone
116 + =========== ================================================
117 + l1_daylight /sys/class/backlight/<backlight>/l1_daylight_max
118 + l2_office   /sys/class/backlight/<backlight>/l2_office_max
119 + l3_dark     /sys/class/backlight/<backlight>/l3_dark_max
120 + =========== ================================================
121 +
122 + For ADP8870, <ambient light zone> can be:
123 +
124 + =========== ================================================
125 + Ambient     sysfs entry
126 + light zone
127 + =========== ================================================
128 + l1_daylight /sys/class/backlight/<backlight>/l1_daylight_max
129 + l2_bright   /sys/class/backlight/<backlight>/l2_bright_max
130 + l3_office   /sys/class/backlight/<backlight>/l3_office_max
131 + l4_indoor   /sys/class/backlight/<backlight>/l4_indoor_max
132 + l5_dark     /sys/class/backlight/<backlight>/l5_dark_max
133 + =========== ================================================
134 +
135 + See also: /sys/class/backlight/<backlight>/ambient_light_zone.
136 +
137 + What: /sys/class/backlight/<backlight>/<ambient light zone>_dim
138 + Date: Sep, 2009
139 + KernelVersion: v2.6.32
140 + Contact: device-drivers-devel@blackfin.uclinux.org
141 + Description:
142 + Control the dim brightness for <ambient light zone>
143 + on this <backlight>. Values are between 0 and 127, typically
144 + set to 0. Full off when the backlight is disabled.
145 + This file will also show the dim brightness level stored for
146 + this <ambient light zone>.
147 +
148 + The <ambient light zone> is device-driver specific:
149 +
150 + For ADP5520 and ADP5501, <ambient light zone> can be:
151 +
152 + =========== ================================================
153 + Ambient     sysfs entry
154 + light zone
155 + =========== ================================================
156 + daylight    /sys/class/backlight/<backlight>/daylight_dim
157 + office      /sys/class/backlight/<backlight>/office_dim
158 + dark        /sys/class/backlight/<backlight>/dark_dim
159 + =========== ================================================
160 +
161 + For ADP8860, <ambient light zone> can be:
162 +
163 + =========== ================================================
164 + Ambient     sysfs entry
165 + light zone
166 + =========== ================================================
167 + l1_daylight /sys/class/backlight/<backlight>/l1_daylight_dim
168 + l2_office   /sys/class/backlight/<backlight>/l2_office_dim
169 + l3_dark     /sys/class/backlight/<backlight>/l3_dark_dim
170 + =========== ================================================
171 +
172 + For ADP8870, <ambient light zone> can be:
173 +
174 + =========== ================================================
175 + Ambient     sysfs entry
176 + light zone
177 + =========== ================================================
178 + l1_daylight /sys/class/backlight/<backlight>/l1_daylight_dim
179 + l2_bright   /sys/class/backlight/<backlight>/l2_bright_dim
180 + l3_office   /sys/class/backlight/<backlight>/l3_office_dim
181 + l4_indoor   /sys/class/backlight/<backlight>/l4_indoor_dim
182 + l5_dark     /sys/class/backlight/<backlight>/l5_dark_dim
183 + =========== ================================================
184 +
185 + See also: /sys/class/backlight/<backlight>/ambient_light_zone.
186 +
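The consolidated entry above maps each driver's ambient light zones onto <zone>_max and <zone>_dim attribute names, with values limited to 0..127. A minimal sketch of that naming scheme follows; the dictionary keys, function names and the backlight name "adp8870_bl" are hypothetical labels for illustration, not identifiers from any driver.

```python
# Zone names per driver family, as listed in the tables of the hunk above.
# The dictionary keys are hypothetical labels, not kernel identifiers.
ZONES = {
    "adp5520": ("daylight", "office", "dark"),  # also covers ADP5501
    "adp8860": ("l1_daylight", "l2_office", "l3_dark"),
    "adp8870": ("l1_daylight", "l2_bright", "l3_office", "l4_indoor", "l5_dark"),
}


def zone_attribute(backlight, chip, zone, kind):
    """Build the sysfs path of a <zone>_max or <zone>_dim attribute."""
    if zone not in ZONES[chip]:
        raise ValueError(f"{chip} has no ambient light zone {zone!r}")
    if kind not in ("max", "dim"):
        raise ValueError("kind must be 'max' or 'dim'")
    return f"/sys/class/backlight/{backlight}/{zone}_{kind}"


def check_level(value):
    """Validate a brightness code against the documented 0..127 range."""
    if not 0 <= value <= 127:
        raise ValueError("brightness code must be between 0 and 127")
    return value


# "adp8870_bl" is a made-up backlight device name.
path = zone_attribute("adp8870_bl", "adp8870", "l2_bright", "max")
```

Keeping the zone lists in one table mirrors what the patch does to the documentation: three per-driver files collapse into a single description parameterised by zone name.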

-31

Documentation/ABI/testing/sysfs-class-backlight-adp5520

···
1 - sysfs interface for analog devices adp5520(01) backlight driver
2 - ---------------------------------------------------------------
3 -
4 - The backlight brightness control operates at three different levels for the
5 - adp5520 and adp5501 devices: daylight (level 1), office (level 2) and dark
6 - (level 3). By default the brightness operates at the daylight brightness level.
7 -
8 - What: /sys/class/backlight/<backlight>/daylight_max
9 - What: /sys/class/backlight/<backlight>/office_max
10 - What: /sys/class/backlight/<backlight>/dark_max
11 - Date: Sep, 2009
12 - KernelVersion: v2.6.32
13 - Contact: Michael Hennerich <michael.hennerich@analog.com>
14 - Description:
15 - (RW) Maximum current setting for the backlight when brightness
16 - is at one of the three levels (daylight, office or dark). This
17 - is an input code between 0 and 127, which is transformed to a
18 - value between 0 mA and 30 mA using linear or non-linear
19 - algorithms.
20 -
21 - What: /sys/class/backlight/<backlight>/daylight_dim
22 - What: /sys/class/backlight/<backlight>/office_dim
23 - What: /sys/class/backlight/<backlight>/dark_dim
24 - Date: Sep, 2009
25 - KernelVersion: v2.6.32
26 - Contact: Michael Hennerich <michael.hennerich@analog.com>
27 - Description:
28 - (RW) Dim current setting for the backlight when brightness is at
29 - one of the three levels (daylight, office or dark). This is an
30 - input code between 0 and 127, which is transformed to a value
31 - between 0 mA and 30 mA using linear or non-linear algorithms.

-37

Documentation/ABI/testing/sysfs-class-backlight-adp8860

···
1 - sysfs interface for analog devices adp8860 backlight driver
2 - -----------------------------------------------------------
3 -
4 - The backlight brightness control operates at three different levels for the
5 - adp8860, adp8861 and adp8863 devices: daylight (level 1), office (level 2) and
6 - dark (level 3). By default the brightness operates at the daylight brightness
7 - level.
8 -
9 - See also /sys/class/backlight/<backlight>/ambient_light_level and
10 - /sys/class/backlight/<backlight>/ambient_light_zone.
11 -
12 -
13 - What: /sys/class/backlight/<backlight>/l1_daylight_max
14 - What: /sys/class/backlight/<backlight>/l2_office_max
15 - What: /sys/class/backlight/<backlight>/l3_dark_max
16 - Date: Apr, 2010
17 - KernelVersion: v2.6.35
18 - Contact: Michael Hennerich <michael.hennerich@analog.com>
19 - Description:
20 - (RW) Maximum current setting for the backlight when brightness
21 - is at one of the three levels (daylight, office or dark). This
22 - is an input code between 0 and 127, which is transformed to a
23 - value between 0 mA and 30 mA using linear or non-linear
24 - algorithms.
25 -
26 -
27 - What: /sys/class/backlight/<backlight>/l1_daylight_dim
28 - What: /sys/class/backlight/<backlight>/l2_office_dim
29 - What: /sys/class/backlight/<backlight>/l3_dark_dim
30 - Date: Apr, 2010
31 - KernelVersion: v2.6.35
32 - Contact: Michael Hennerich <michael.hennerich@analog.com>
33 - Description:
34 - (RW) Dim current setting for the backlight when brightness is at
35 - one of the three levels (daylight, office or dark). This is an
36 - input code between 0 and 127, which is transformed to a value
37 - between 0 mA and 30 mA using linear or non-linear algorithms.

-32

Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870

···
1 - See also /sys/class/backlight/<backlight>/ambient_light_level and
2 - /sys/class/backlight/<backlight>/ambient_light_zone.
3 -
4 - What: /sys/class/backlight/<backlight>/<ambient light zone>_max
5 - What: /sys/class/backlight/<backlight>/l1_daylight_max
6 - What: /sys/class/backlight/<backlight>/l2_bright_max
7 - What: /sys/class/backlight/<backlight>/l3_office_max
8 - What: /sys/class/backlight/<backlight>/l4_indoor_max
9 - What: /sys/class/backlight/<backlight>/l5_dark_max
10 - Date: May 2011
11 - KernelVersion: 3.0
12 - Contact: device-drivers-devel@blackfin.uclinux.org
13 - Description:
14 - Control the maximum brightness for <ambient light zone>
15 - on this <backlight>. Values are between 0 and 127. This file
16 - will also show the brightness level stored for this
17 - <ambient light zone>.
18 -
19 - What: /sys/class/backlight/<backlight>/<ambient light zone>_dim
20 - What: /sys/class/backlight/<backlight>/l2_bright_dim
21 - What: /sys/class/backlight/<backlight>/l3_office_dim
22 - What: /sys/class/backlight/<backlight>/l4_indoor_dim
23 - What: /sys/class/backlight/<backlight>/l5_dark_dim
24 - Date: May 2011
25 - KernelVersion: 3.0
26 - Contact: device-drivers-devel@blackfin.uclinux.org
27 - Description:
28 - Control the dim brightness for <ambient light zone>
29 - on this <backlight>. Values are between 0 and 127, typically
30 - set to 0. Full off when the backlight is disabled.
31 - This file will also show the dim brightness level stored for
32 - this <ambient light zone>.

-9

Documentation/ABI/testing/sysfs-class-led-driver-el15203000

···
1 - What: /sys/class/leds/<led>/repeat
2 - Date: September 2019
3 - KernelVersion: 5.5
4 - Description:
5 - EL15203000 supports only indefinitely patterns,
6 - so this file should always store -1.
7 -
8 - For more info, please see:
9 - Documentation/ABI/testing/sysfs-class-led-trigger-pattern

+3

Documentation/ABI/testing/sysfs-class-led-trigger-pattern

···
35 35
36 36 This file will always return the originally written repeat
37 37 number.
38 +
39 + It should be noticed that some leds, like EL15203000 may
40 + only support indefinitely patterns, so they always store -1.

+5 -5

Documentation/ABI/testing/sysfs-devices-system-cpu

···
50 50 architecture specific.
51 51
52 52 release: writes to this file dynamically remove a CPU from
53 - the system. Information writtento the file to remove CPU's
53 + the system. Information written to the file to remove CPU's
54 54 is architecture specific.
55 55
56 56 What: /sys/devices/system/cpu/cpu#/node
···
97 97 corresponds to a physical socket number, but the actual value
98 98 is architecture and platform dependent.
99 99
100 - thread_siblings: internel kernel map of cpu#'s hardware
100 + thread_siblings: internal kernel map of cpu#'s hardware
101 101 threads within the same core as cpu#
102 102
103 103 thread_siblings_list: human-readable list of cpu#'s hardware
···
280 280 on a processor with this functionality will return the currently
281 281 disabled index for that node. There is one L3 structure per
282 282 node, or per internal node on MCM machines. Writing a valid
283 - index to one of these files will cause the specificed cache
283 + index to one of these files will cause the specified cache
284 284 index to be disabled.
285 285
286 286 All AMD processors with L3 caches provide this functionality.
···
295 295
296 296 This switch controls the boost setting for the whole system.
297 297 Boosting allows the CPU and the firmware to run at a frequency
298 - beyound it's nominal limit.
298 + beyond it's nominal limit.
299 299
300 300 More details can be found in
301 301 Documentation/admin-guide/pm/cpufreq.rst
···
532 532 /sys/devices/system/cpu/smt/control
533 533 Date: June 2018
534 534 Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
535 - Description: Control Symetric Multi Threading (SMT)
535 + Description: Control Symmetric Multi Threading (SMT)
536 536
537 537 active: Tells whether SMT is active (enabled and siblings online)
538 538

+2 -2

Documentation/ABI/testing/sysfs-driver-ufs

···
168 168 What: /sys/bus/platform/drivers/ufshcd/*/device_descriptor/manufacturer_id
169 169 Date: February 2018
170 170 Contact: Stanislav Nijnikov <stanislav.nijnikov@wdc.com>
171 - Description: This file shows the manufacturee ID. This is one of the
171 + Description: This file shows the manufacturer ID. This is one of the
172 172 UFS device descriptor parameters. The full information about
173 173 the descriptor could be found at UFS specifications 2.1.
174 174
···
521 521 What: /sys/bus/platform/drivers/ufshcd/*/string_descriptors/manufacturer_name
522 522 Date: February 2018
523 523 Contact: Stanislav Nijnikov <stanislav.nijnikov@wdc.com>
524 - Description: This file contains a device manufactureer name string.
524 + Description: This file contains a device manufacturer name string.
525 525 The full information about the descriptor could be found at
526 526 UFS specifications 2.1.
527 527

+1 -1

Documentation/ABI/testing/sysfs-fs-f2fs

···
238 238 What: /sys/fs/f2fs/<disk>/gc_urgent
239 239 Date: August 2017
240 240 Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
241 - Description: Do background GC agressively when set. When gc_urgent = 1,
241 + Description: Do background GC aggressively when set. When gc_urgent = 1,
242 242 background thread starts to do GC by given gc_urgent_sleep_time
243 243 interval. When gc_urgent = 2, F2FS will lower the bar of
244 244 checking idle in order to process outstanding discard commands

+4 -8

Documentation/ABI/testing/sysfs-kernel-iommu_groups

···
25 25 the base IOVA, the second is the end IOVA and the third
26 26 field describes the type of the region.
27 27
28 - What: /sys/kernel/iommu_groups/reserved_regions
29 - Date: June 2019
30 - KernelVersion: v5.3
31 - Contact: Eric Auger <eric.auger@redhat.com>
32 - Description: In case an RMRR is used only by graphics or USB devices
33 - it is now exposed as "direct-relaxable" instead of "direct".
34 - In device assignment use case, for instance, those RMRR
35 - are considered to be relaxable and safe.
28 + Since kernel 5.3, in case an RMRR is used only by graphics or
29 + USB devices it is now exposed as "direct-relaxable" instead
30 + of "direct". In device assignment use case, for instance,
31 + those RMRR are considered to be relaxable and safe.
36 32
37 33 What: /sys/kernel/iommu_groups/<grp_id>/type
38 34 Date: November 2020
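The context lines of this hunk describe each reserved_regions entry as three fields: base IOVA, end IOVA and region type. A minimal sketch of splitting such entries follows, assuming one whitespace-separated entry per line with hexadecimal addresses; the function name and the sample addresses are hypothetical.

```python
def parse_reserved_regions(text):
    """Parse "<base IOVA> <end IOVA> <type>" lines into (start, end, type) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, end, kind = fields
        # int(..., 16) accepts an optional "0x" prefix.
        regions.append((int(start, 16), int(end, 16), kind))
    return regions


# Hypothetical sample content; the addresses are made up for illustration.
sample = (
    "0x00000000fee00000 0x00000000feefffff msi\n"
    "0x000000007b000000 0x000000007f7fffff direct-relaxable\n"
)
regions = parse_reserved_regions(sample)
relaxable = [r for r in regions if r[2] == "direct-relaxable"]
```

Filtering on the third field is how a tool could tell the "direct-relaxable" regions, introduced in 5.3 according to the added text, apart from plain "direct" ones.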

+1 -1

Documentation/Makefile

···
76 76 PYTHONDONTWRITEBYTECODE=1 \
77 77 BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
78 78 $(PYTHON3) $(srctree)/scripts/jobserver-exec \
79 - $(SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \
79 + $(CONFIG_SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \
80 80 $(SPHINXBUILD) \
81 81 -b $2 \
82 82 -c $(abspath $(srctree)/$(src)) \

+9 -9

Documentation/PCI/acpi-info.rst

··· 22 22 controllers and a _PRT is needed to describe those connections. 23 23 24 24 ACPI resource description is done via _CRS objects of devices in the ACPI 25 - namespace [2]. The _CRS is like a generalized PCI BAR: the OS can read 25 + namespace [2]. The _CRS is like a generalized PCI BAR: the OS can read 26 26 _CRS and figure out what resource is being consumed even if it doesn't have 27 - a driver for the device [3]. That's important because it means an old OS 27 + a driver for the device [3]. That's important because it means an old OS 28 28 can work correctly even on a system with new devices unknown to the OS. 29 29 The new devices might not do anything, but the OS can at least make sure no 30 30 resources conflict with them. ··· 41 41 driver to bind to it, and the _CRS tells the OS and the driver where the 42 42 device's registers are. 43 43 44 - PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should 45 - describe all the address space they consume. This includes all the windows 44 + PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should 45 + describe all the address space they consume. This includes all the windows 46 46 they forward down to the PCI bus, as well as registers of the host bridge 47 - itself that are not forwarded to PCI. The host bridge registers include 47 + itself that are not forwarded to PCI. The host bridge registers include 48 48 things like secondary/subordinate bus registers that determine the bus 49 49 range below the bridge, window registers that describe the apertures, etc. 50 50 These are all device-specific, non-architected things, so the only way a 51 51 PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which contain 52 - the device-specific details. The host bridge registers also include ECAM 52 + the device-specific details. The host bridge registers also include ECAM 53 53 space, since it is consumed by the host bridge. 
54 54 55 55 ACPI defines a Consumer/Producer bit to distinguish the bridge registers ··· 66 66 bridge registers (including ECAM space) in PNP0C02 catch-all devices [6]. 67 67 With the exception of ECAM, the bridge register space is device-specific 68 68 anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to 69 - know about it. 69 + know about it. 70 70 71 71 New architectures should be able to use "Consumer" Extended Address Space 72 72 descriptors in the PNP0A03 device for bridge registers, including ECAM, ··· 75 75 Extended Address Space ones, are windows, so it would not be safe to 76 76 describe bridge registers this way on those architectures. 77 77 78 - PNP0C02 "motherboard" devices are basically a catch-all. There's no 78 + PNP0C02 "motherboard" devices are basically a catch-all. There's no 79 79 programming model for them other than "don't use these resources for 80 - anything else." So a PNP0C02 _CRS should claim any address space that is 80 + anything else." So a PNP0C02 _CRS should claim any address space that is 81 81 (1) not claimed by _CRS under any other device object in the ACPI namespace 82 82 and (2) should not be assigned by the OS to something else. 83 83

+1 -1

Documentation/PCI/endpoint/pci-endpoint-cfs.rst

···
125 125 | interrupt_pin
126 126 | function
127 127
128 - [1] :doc:`pci-endpoint`
128 + [1] Documentation/PCI/endpoint/pci-endpoint.rst

+3 -3

Documentation/PCI/pci.rst

···
265 265 ---------------------
266 266 .. note::
267 267 If anything below doesn't make sense, please refer to
268 - :doc:`/core-api/dma-api`. This section is just a reminder that
268 + Documentation/core-api/dma-api.rst. This section is just a reminder that
269 269 drivers need to indicate DMA capabilities of the device and is not
270 270 an authoritative source for DMA interfaces.
271 271
···
291 291 Setup shared control data
292 292 -------------------------
293 293 Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
294 - memory. See :doc:`/core-api/dma-api` for a full description of
294 + memory. See Documentation/core-api/dma-api.rst for a full description of
295 295 the DMA APIs. This section is just a reminder that it needs to be done
296 296 before enabling DMA on the device.
297 297
···
421 421
422 422 Then clean up "consistent" buffers which contain the control data.
423 423
424 - See :doc:`/core-api/dma-api` for details on unmapping interfaces.
424 + See Documentation/core-api/dma-api.rst for details on unmapping interfaces.
425 425
426 426
427 427 Unregister from other subsystems

+4 -81

Documentation/admin-guide/cputopology.rst

··· 2 2 How CPU topology info is exported via sysfs 3 3 =========================================== 4 4 5 - Export CPU topology info via sysfs. Items (attributes) are similar 6 - to /proc/cpuinfo output of some architectures. They reside in 7 - /sys/devices/system/cpu/cpuX/topology/: 8 - 9 - physical_package_id: 10 - 11 - physical package id of cpuX. Typically corresponds to a physical 12 - socket number, but the actual value is architecture and platform 13 - dependent. 14 - 15 - die_id: 16 - 17 - the CPU die ID of cpuX. Typically it is the hardware platform's 18 - identifier (rather than the kernel's). The actual value is 19 - architecture and platform dependent. 20 - 21 - core_id: 22 - 23 - the CPU core ID of cpuX. Typically it is the hardware platform's 24 - identifier (rather than the kernel's). The actual value is 25 - architecture and platform dependent. 26 - 27 - book_id: 28 - 29 - the book ID of cpuX. Typically it is the hardware platform's 30 - identifier (rather than the kernel's). The actual value is 31 - architecture and platform dependent. 32 - 33 - drawer_id: 34 - 35 - the drawer ID of cpuX. Typically it is the hardware platform's 36 - identifier (rather than the kernel's). The actual value is 37 - architecture and platform dependent. 38 - 39 - core_cpus: 40 - 41 - internal kernel map of CPUs within the same core. 42 - (deprecated name: "thread_siblings") 43 - 44 - core_cpus_list: 45 - 46 - human-readable list of CPUs within the same core. 47 - (deprecated name: "thread_siblings_list"); 48 - 49 - package_cpus: 50 - 51 - internal kernel map of the CPUs sharing the same physical_package_id. 52 - (deprecated name: "core_siblings") 53 - 54 - package_cpus_list: 55 - 56 - human-readable list of CPUs sharing the same physical_package_id. 57 - (deprecated name: "core_siblings_list") 58 - 59 - die_cpus: 60 - 61 - internal kernel map of CPUs within the same die. 62 - 63 - die_cpus_list: 64 - 65 - human-readable list of CPUs within the same die. 
66 - 67 - book_siblings: 68 - 69 - internal kernel map of cpuX's hardware threads within the same 70 - book_id. 71 - 72 - book_siblings_list: 73 - 74 - human-readable list of cpuX's hardware threads within the same 75 - book_id. 76 - 77 - drawer_siblings: 78 - 79 - internal kernel map of cpuX's hardware threads within the same 80 - drawer_id. 81 - 82 - drawer_siblings_list: 83 - 84 - human-readable list of cpuX's hardware threads within the same 85 - drawer_id. 5 + CPU topology info is exported via sysfs. Items (attributes) are similar 6 + to /proc/cpuinfo output of some architectures. They reside in 7 + /sys/devices/system/cpu/cpuX/topology/. Please refer to the ABI file: 8 + Documentation/ABI/stable/sysfs-devices-system-cpu. 86 9 87 10 Architecture-neutral, drivers/base/topology.c, exports these attributes. 88 11 However, the book and drawer related sysfs files will only be created if
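
Illustration only, not part of the patch: a minimal sketch of how a user-space script might expand the human-readable `*_list` topology attributes (for example `core_cpus_list`) into CPU numbers. It assumes the comma-separated range format that the ABI file gives as an example (such as `0-3, 8-11, 14,17`) and does not handle any other list syntax. The helper names `parse_cpu_list` and `read_core_cpus` are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8-11,14,17' into sorted CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # an empty attribute yields no CPUs
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))  # ranges are inclusive
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


def read_core_cpus(cpu=0):
    """Read core_cpus_list for one CPU from the topology directory named above."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/core_cpus_list"
    with open(path) as f:
        return parse_cpu_list(f.read())
```

On a Linux system, `read_core_cpus(0)` would return the hardware threads that share a core with cpu0. The exact numbering is architecture and platform dependent, as the attribute descriptions note.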

+1 -1

Documentation/admin-guide/ext4.rst

··· 392 392 393 393 dax 394 394 Use direct access (no page cache). See 395 - Documentation/filesystems/dax.txt. Note that this option is 395 + Documentation/filesystems/dax.rst. Note that this option is 396 396 incompatible with data=journal. 397 397 398 398 inlinecrypt

+2 -1

Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst

··· 3 3 SRBDS - Special Register Buffer Data Sampling 4 4 ============================================= 5 5 6 - SRBDS is a hardware vulnerability that allows MDS :doc:`mds` techniques to 6 + SRBDS is a hardware vulnerability that allows MDS 7 + Documentation/admin-guide/hw-vuln/mds.rst techniques to 7 8 infer values returned from special register accesses. Special register 8 9 accesses are accesses to off core registers. According to Intel's evaluation, 9 10 the special register reads that have a security expectation of privacy are

+120 -72

Documentation/admin-guide/kdump/kdump.rst

··· 2 2 Documentation for Kdump - The kexec-based Crash Dumping Solution 3 3 ================================================================ 4 4 5 - This document includes overview, setup and installation, and analysis 5 + This document includes overview, setup, installation, and analysis 6 6 information. 7 7 8 8 Overview ··· 13 13 the system panics). The system kernel's memory image is preserved across 14 14 the reboot and is accessible to the dump-capture kernel. 15 15 16 - You can use common commands, such as cp and scp, to copy the 17 - memory image to a dump file on the local disk, or across the network to 18 - a remote system. 16 + You can use common commands, such as cp, scp or makedumpfile to copy 17 + the memory image to a dump file on the local disk, or across the network 18 + to a remote system. 19 19 20 20 Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, 21 21 s390x, arm and arm64 architectures. ··· 26 26 The kexec -p command loads the dump-capture kernel into this reserved 27 27 memory. 28 28 29 - On x86 machines, the first 640 KB of physical memory is needed to boot, 30 - regardless of where the kernel loads. Therefore, kexec backs up this 31 - region just before rebooting into the dump-capture kernel. 29 + On x86 machines, the first 640 KB of physical memory is needed for boot, 30 + regardless of where the kernel loads. For simpler handling, the whole 31 + low 1M is reserved to avoid any later kernel or device driver writing 32 + data into this area. Like this, the low 1M can be reused as system RAM 33 + by kdump kernel without extra handling. 32 34 33 - Similarly on PPC64 machines first 32KB of physical memory is needed for 34 - booting regardless of where the kernel is loaded and to support 64K page 35 - size kexec backs up the first 64KB memory. 
35 + On PPC64 machines first 32KB of physical memory is needed for booting 36 + regardless of where the kernel is loaded and to support 64K page size 37 + kexec backs up the first 64KB memory. 36 38 37 39 For s390x, when kdump is triggered, the crashkernel region is exchanged 38 40 with the region [0, crashkernel region size] and then the kdump kernel ··· 48 46 parameter. Optionally the size of the ELF header can also be passed 49 47 when using the elfcorehdr=[size[KMG]@]offset[KMG] syntax. 50 48 51 - 52 49 With the dump-capture kernel, you can access the memory image through 53 50 /proc/vmcore. This exports the dump as an ELF-format file that you can 54 - write out using file copy commands such as cp or scp. Further, you can 55 - use analysis tools such as the GNU Debugger (GDB) and the Crash tool to 56 - debug the dump file. This method ensures that the dump pages are correctly 57 - ordered. 58 - 51 + write out using file copy commands such as cp or scp. You can also use 52 + makedumpfile utility to analyze and write out filtered contents with 53 + options, e.g with '-d 31' it will only write out kernel data. Further, 54 + you can use analysis tools such as the GNU Debugger (GDB) and the Crash 55 + tool to debug the dump file. This method ensures that the dump pages are 56 + correctly ordered. 
59 57 60 58 Setup and Installation 61 59 ====================== ··· 127 125 System kernel config options 128 126 ---------------------------- 129 127 130 - 1) Enable "kexec system call" in "Processor type and features.":: 128 + 1) Enable "kexec system call" or "kexec file based system call" in 129 + "Processor type and features.":: 131 130 132 - CONFIG_KEXEC=y 131 + CONFIG_KEXEC=y or CONFIG_KEXEC_FILE=y 132 + 133 + And both of them will select KEXEC_CORE:: 134 + 135 + CONFIG_KEXEC_CORE=y 136 + 137 + Subsequently, CRASH_CORE is selected by KEXEC_CORE:: 138 + 139 + CONFIG_CRASH_CORE=y 133 140 134 141 2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo 135 142 filesystems." This is usually enabled by default:: ··· 186 175 187 176 CONFIG_HIGHMEM4G 188 177 189 - 2) On i386 and x86_64, disable symmetric multi-processing support 190 - under "Processor type and features":: 178 + 2) With CONFIG_SMP=y, usually nr_cpus=1 need specified on the kernel 179 + command line when loading the dump-capture kernel because one 180 + CPU is enough for kdump kernel to dump vmcore on most of systems. 191 181 192 - CONFIG_SMP=n 182 + However, you can also specify nr_cpus=X to enable multiple processors 183 + in kdump kernel. In this case, "disable_cpu_apicid=" is needed to 184 + tell kdump kernel which cpu is 1st kernel's BSP. Please refer to 185 + admin-guide/kernel-parameters.txt for more details. 193 186 194 - (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line 195 - when loading the dump-capture kernel, see section "Load the Dump-capture 196 - Kernel".) 187 + With CONFIG_SMP=n, the above things are not related. 197 188 198 - 3) If one wants to build and use a relocatable kernel, 199 - Enable "Build a relocatable kernel" support under "Processor type and 189 + 3) A relocatable kernel is suggested to be built by default. 
If not yet, 190 + enable "Build a relocatable kernel" support under "Processor type and 200 191 features":: 201 192 202 193 CONFIG_RELOCATABLE=y ··· 245 232 as a dump-capture kernel if desired. 246 233 247 234 The crashkernel region can be automatically placed by the system 248 - kernel at run time. This is done by specifying the base address as 0, 235 + kernel at runtime. This is done by specifying the base address as 0, 249 236 or omitting it all together:: 250 237 251 238 crashkernel=256M@0 ··· 253 240 or:: 254 241 255 242 crashkernel=256M 256 - 257 - If the start address is specified, note that the start address of the 258 - kernel will be aligned to 64Mb, so if the start address is not then 259 - any space below the alignment point will be wasted. 260 243 261 244 Dump-capture kernel config options (Arch Dependent, arm) 262 245 ---------------------------------------------------------- ··· 269 260 on non-VHE systems even if it is configured. This is because the CPU 270 261 will not be reset to EL2 on panic. 271 262 272 - Extended crashkernel syntax 263 + crashkernel syntax 273 264 =========================== 265 + 1) crashkernel=size@offset 274 266 275 - While the "crashkernel=size[@offset]" syntax is sufficient for most 276 - configurations, sometimes it's handy to have the reserved memory dependent 277 - on the value of System RAM -- that's mostly for distributors that pre-setup 278 - the kernel command line to avoid a unbootable system after some memory has 279 - been removed from the machine. 
280 - 281 - The syntax is:: 282 - 283 - crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset] 284 - range=start-[end] 285 - 286 - For example:: 287 - 288 - crashkernel=512M-2G:64M,2G-:128M 289 - 290 - This would mean: 291 - 292 - 1) if the RAM is smaller than 512M, then don't reserve anything 293 - (this is the "rescue" case) 294 - 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M 295 - 3) if the RAM size is larger than 2G, then reserve 128M 296 - 297 - 298 - 299 - Boot into System Kernel 300 - ======================= 301 - 302 - 1) Update the boot loader (such as grub, yaboot, or lilo) configuration 303 - files as necessary. 304 - 305 - 2) Boot the system kernel with the boot parameter "crashkernel=Y@X", 306 - where Y specifies how much memory to reserve for the dump-capture kernel 307 - and X specifies the beginning of this reserved memory. For example, 267 + Here 'size' specifies how much memory to reserve for the dump-capture kernel 268 + and 'offset' specifies the beginning of this reserved memory. For example, 308 269 "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory 309 270 starting at physical address 0x01000000 (16MB) for the dump-capture kernel. 310 271 311 - On x86 and x86_64, use "crashkernel=64M@16M". 272 + The crashkernel region can be automatically placed by the system 273 + kernel at run time. This is done by specifying the base address as 0, 274 + or omitting it all together:: 275 + 276 + crashkernel=256M@0 277 + 278 + or:: 279 + 280 + crashkernel=256M 281 + 282 + If the start address is specified, note that the start address of the 283 + kernel will be aligned to a value (which is Arch dependent), so if the 284 + start address is not then any space below the alignment point will be 285 + wasted. 
286 + 287 + 2) range1:size1[,range2:size2,...][@offset] 288 + 289 + While the "crashkernel=size[@offset]" syntax is sufficient for most 290 + configurations, sometimes it's handy to have the reserved memory dependent 291 + on the value of System RAM -- that's mostly for distributors that pre-setup 292 + the kernel command line to avoid a unbootable system after some memory has 293 + been removed from the machine. 294 + 295 + The syntax is:: 296 + 297 + crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset] 298 + range=start-[end] 299 + 300 + For example:: 301 + 302 + crashkernel=512M-2G:64M,2G-:128M 303 + 304 + This would mean: 305 + 306 + 1) if the RAM is smaller than 512M, then don't reserve anything 307 + (this is the "rescue" case) 308 + 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M 309 + 3) if the RAM size is larger than 2G, then reserve 128M 310 + 311 + 3) crashkernel=size,high and crashkernel=size,low 312 + 313 + If memory above 4G is preferred, crashkernel=size,high can be used to 314 + fulfill that. With it, physical memory is allowed to be allocated from top, 315 + so could be above 4G if system has more than 4G RAM installed. Otherwise, 316 + memory region will be allocated below 4G if available. 317 + 318 + When crashkernel=X,high is passed, kernel could allocate physical memory 319 + region above 4G, low memory under 4G is needed in this case. There are 320 + three ways to get low memory: 321 + 322 + 1) Kernel will allocate at least 256M memory below 4G automatically 323 + if crashkernel=Y,low is not specified. 324 + 2) Let user specify low memory size instead. 325 + 3) Specified value 0 will disable low memory allocation:: 326 + 327 + crashkernel=0,low 328 + 329 + Boot into System Kernel 330 + ----------------------- 331 + 1) Update the boot loader (such as grub, yaboot, or lilo) configuration 332 + files as necessary. 333 + 334 + 2) Boot the system kernel with the boot parameter "crashkernel=Y@X". 
335 + 336 + On x86 and x86_64, use "crashkernel=Y[@X]". Most of the time, the 337 + start address 'X' is not necessary, kernel will search a suitable 338 + area. Unless an explicit start address is expected. 312 339 313 340 On ppc64, use "crashkernel=128M@32M". 314 341 ··· 376 331 377 332 For i386 and x86_64: 378 333 379 - - Use vmlinux if kernel is not relocatable. 380 334 - Use bzImage/vmlinuz if kernel is relocatable. 335 + - Use vmlinux if kernel is not relocatable. 381 336 382 337 For ppc64: 383 338 ··· 437 392 438 393 For i386, x86_64 and ia64: 439 394 440 - "1 irqpoll maxcpus=1 reset_devices" 395 + "1 irqpoll nr_cpus=1 reset_devices" 441 396 442 397 For ppc64: 443 398 ··· 445 400 446 401 For s390x: 447 402 448 - "1 maxcpus=1 cgroup_disable=memory" 403 + "1 nr_cpus=1 cgroup_disable=memory" 449 404 450 405 For arm: 451 406 ··· 453 408 454 409 For arm64: 455 410 456 - "1 maxcpus=1 reset_devices" 411 + "1 nr_cpus=1 reset_devices" 457 412 458 413 Notes on loading the dump-capture kernel: 459 414 ··· 533 488 534 489 cp /proc/vmcore <dump-file> 535 490 491 + You can also use makedumpfile utility to write out the dump file 492 + with specified options to filter out unwanted contents, e.g:: 493 + 494 + makedumpfile -l --message-level 1 -d 31 /proc/vmcore <dump-file> 536 495 537 496 Analysis 538 497 ======== ··· 584 535 Contact 585 536 ======= 586 537 587 - - Vivek Goyal (vgoyal@redhat.com) 588 - - Maneesh Soni (maneesh@in.ibm.com) 538 + - kexec@lists.infradead.org 589 539 590 540 GDB macros 591 541 ==========
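
Illustration only, not part of the patch: the range-based `crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]` syntax described in this hunk can be sketched as a small selection function. This is a reading of the documented example, not the kernel's parser. Treating each range as inclusive at its start and exclusive at its end is an assumption, and the names `parse_size` and `crashkernel_reservation` are hypothetical.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert '64M' or '2G' to bytes; a bare number is taken as bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def crashkernel_reservation(spec, system_ram):
    """Return the bytes reserved by a range-style crashkernel= value."""
    ranges = spec.split("@", 1)[0]  # drop the optional @offset
    for entry in ranges.split(","):
        span, size = entry.split(":", 1)
        start, end = span.split("-", 1)
        low = parse_size(start)
        high = parse_size(end) if end else None  # 'start-' is open-ended
        # Assumption: start is inclusive, end is exclusive.
        if system_ram >= low and (high is None or system_ram < high):
            return parse_size(size)
    return 0  # no range matched, so nothing is reserved
```

With `crashkernel=512M-2G:64M,2G-:128M`, a 1 GiB machine falls in the first range and an 8 GiB machine in the second, while a 256 MiB machine matches neither and reserves nothing.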

+3

Documentation/admin-guide/kernel-parameters.txt

··· 3513 3513 3514 3514 nr_uarts= [SERIAL] maximum number of UARTs to be registered. 3515 3515 3516 + numa=off [KNL, ARM64, PPC, RISCV, SPARC, X86] Disable NUMA, Only 3517 + set up a single NUMA node spanning all memory. 3518 + 3516 3519 numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic 3517 3520 NUMA balancing. 3518 3521 Allowed values are enable and disable

+10 -6

Documentation/admin-guide/pm/intel_idle.rst

··· 20 20 a particular processor model in it depends on whether or not it recognizes that 21 21 processor model and may also depend on information coming from the platform 22 22 firmware. [To understand ``intel_idle`` it is necessary to know how ``CPUIdle`` 23 - works in general, so this is the time to get familiar with :doc:`cpuidle` if you 24 - have not done that yet.] 23 + works in general, so this is the time to get familiar with 24 + Documentation/admin-guide/pm/cpuidle.rst if you have not done that yet.] 25 25 26 26 ``intel_idle`` uses the ``MWAIT`` instruction to inform the processor that the 27 27 logical CPU executing it is idle and so it may be possible to put some of the ··· 53 53 depend on the configuration of the platform. 54 54 55 55 In order to create a list of available idle states required by the ``CPUIdle`` 56 - subsystem (see :ref:`idle-states-representation` in :doc:`cpuidle`), 56 + subsystem (see :ref:`idle-states-representation` in 57 + Documentation/admin-guide/pm/cpuidle.rst), 57 58 ``intel_idle`` can use two sources of information: static tables of idle states 58 59 for different processor models included in the driver itself and the ACPI tables 59 60 of the system. The former are always used if the processor model at hand is ··· 99 98 preliminary list of idle states coming from the ACPI tables. In that case user 100 99 space still can enable them later (on a per-CPU basis) with the help of 101 100 the ``disable`` idle state attribute in ``sysfs`` (see 102 - :ref:`idle-states-representation` in :doc:`cpuidle`). This basically means that 101 + :ref:`idle-states-representation` in 102 + Documentation/admin-guide/pm/cpuidle.rst). This basically means that 103 103 the idle states "known" to the driver may not be enabled by default if they have 104 104 not been exposed by the platform firmware (through the ACPI tables). 
105 105 ··· 188 186 states in question cannot be enabled during system startup, because in the 189 187 working state of the system the CPU power management quality of service (PM 190 188 QoS) feature can be used to prevent ``CPUIdle`` from touching those idle states 191 - even if they have been enumerated (see :ref:`cpu-pm-qos` in :doc:`cpuidle`). 189 + even if they have been enumerated (see :ref:`cpu-pm-qos` in 190 + Documentation/admin-guide/pm/cpuidle.rst). 192 191 Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail. 193 192 194 193 The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle`` ··· 205 202 the indices of idle states to be disabled by default (as reflected by the names 206 203 of the corresponding idle state directories in ``sysfs``, :file:`state0`, 207 204 :file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the index of the given 208 - idle state; see :ref:`idle-states-representation` in :doc:`cpuidle`). 205 + idle state; see :ref:`idle-states-representation` in 206 + Documentation/admin-guide/pm/cpuidle.rst). 209 207 210 208 For example, if ``states_off`` is equal to 3, the driver will disable idle 211 209 states 0 and 1 by default, and if it is equal to 8, idle state 3 will be
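
Illustration only, not part of the patch: the `states_off` module parameter described in this hunk is a bitmask, so the idle states it selects can be listed by testing each bit. The function name `states_disabled_by_default` is hypothetical.

```python
def states_disabled_by_default(states_off):
    """Return the indices of the idle states selected by a states_off bitmask."""
    return [i for i in range(states_off.bit_length()) if states_off & (1 << i)]
```

This matches the two values quoted in the text: 3 (binary 11) selects states 0 and 1, and 8 (binary 1000) selects state 3.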

+5 -4

Documentation/admin-guide/pm/intel_pstate.rst

··· 18 18 (``CPUFreq``). It is a scaling driver for the Sandy Bridge and later 19 19 generations of Intel processors. Note, however, that some of those processors 20 20 may not be supported. [To understand ``intel_pstate`` it is necessary to know 21 - how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if 22 - you have not done that yet.] 21 + how ``CPUFreq`` works in general, so this is the time to read 22 + Documentation/admin-guide/pm/cpufreq.rst if you have not done that yet.] 23 23 24 24 For the processors supported by ``intel_pstate``, the P-state concept is broader 25 25 than just an operating frequency or an operating performance point (see the ··· 445 445 ----------------------------------- 446 446 447 447 The interpretation of some ``CPUFreq`` policy attributes described in 448 - :doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver 449 - and it generally depends on the driver's `operation mode <Operation Modes_>`_. 448 + Documentation/admin-guide/pm/cpufreq.rst is special with ``intel_pstate`` 449 + as the current scaling driver and it generally depends on the driver's 450 + `operation mode <Operation Modes_>`_. 450 451 451 452 First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and 452 453 ``scaling_cur_freq`` attributes are produced by applying a processor-specific

+1 -1

Documentation/admin-guide/reporting-issues.rst

··· 1248 1248 1249 1249 In case you performed a successful bisection, use the title of the change that 1250 1250 introduced the regression as the second part of your subject. Make the report 1251 - also mention the commit id of the culprit. In case of an unsuccessful bisection, 1251 + also mention the commit id of the culprit. In case of an unsuccessful bisection, 1252 1252 make your report mention the latest tested version that's working fine (say 5.7) 1253 1253 and the oldest where the issue occurs (say 5.8-rc1). 1254 1254

+1 -1

Documentation/admin-guide/sysctl/abi.rst

··· 11 11 12 12 Copyright (c) 2020, Stephen Kitt 13 13 14 - For general info, see :doc:`index`. 14 + For general info, see Documentation/admin-guide/sysctl/index.rst. 15 15 16 16 ------------------------------------------------------------------------------ 17 17

+26 -18

Documentation/admin-guide/sysctl/kernel.rst

··· 9 9 10 10 Copyright (c) 2009, Shen Feng<shen@cn.fujitsu.com> 11 11 12 - For general info and legal blurb, please look in :doc:`index`. 12 + For general info and legal blurb, please look in 13 + Documentation/admin-guide/sysctl/index.rst. 13 14 14 15 ------------------------------------------------------------------------------ 15 16 ··· 55 54 acpi_video_flags 56 55 ================ 57 56 58 - See :doc:`/power/video`. This allows the video resume mode to be set, 57 + See Documentation/power/video.rst. This allows the video resume mode to be set, 59 58 in a similar fashion to the ``acpi_sleep`` kernel parameter, by 60 59 combining the following values: 61 60 ··· 90 89 the value 340 = 0x154. 91 90 92 91 See the ``type_of_loader`` and ``ext_loader_type`` fields in 93 - :doc:`/x86/boot` for additional information. 92 + Documentation/x86/boot.rst for additional information. 94 93 95 94 96 95 bootloader_version (x86 only) ··· 100 99 file will contain the value 564 = 0x234. 101 100 102 101 See the ``type_of_loader`` and ``ext_loader_ver`` fields in 103 - :doc:`/x86/boot` for additional information. 102 + Documentation/x86/boot.rst for additional information. 104 103 105 104 106 105 bpf_stats_enabled ··· 270 269 firmware_config 271 270 =============== 272 271 273 - See :doc:`/driver-api/firmware/fallback-mechanisms`. 272 + See Documentation/driver-api/firmware/fallback-mechanisms.rst. 274 273 275 274 The entries in this directory allow the firmware loader helper 276 275 fallback to be controlled: ··· 298 297 ftrace_enabled, stack_tracer_enabled 299 298 ==================================== 300 299 301 - See :doc:`/trace/ftrace`. 300 + See Documentation/trace/ftrace.rst. 302 301 303 302 304 303 hardlockup_all_cpu_backtrace ··· 326 325 1 Panic on hard lockup. 327 326 = =========================== 328 327 329 - See :doc:`/admin-guide/lockup-watchdogs` for more information. 328 + See Documentation/admin-guide/lockup-watchdogs.rst for more information. 
330 329 This can also be set using the nmi_watchdog kernel parameter. 331 330 332 331 ··· 334 333 ======= 335 334 336 335 Path for the hotplug policy agent. 337 - Default value is "``/sbin/hotplug``". 336 + Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults 337 + to the empty string. 338 + 339 + This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most 340 + modern systems rely exclusively on the netlink-based uevent source and 341 + don't need this. 338 342 339 343 340 344 hung_task_all_cpu_backtrace ··· 588 582 589 583 nmi_watchdog=1 590 584 591 - to the guest kernel command line (see :doc:`/admin-guide/kernel-parameters`). 585 + to the guest kernel command line (see 586 + Documentation/admin-guide/kernel-parameters.rst). 592 587 593 588 594 589 numa_balancing ··· 1074 1067 real-root-dev 1075 1068 ============= 1076 1069 1077 - See :doc:`/admin-guide/initrd`. 1070 + See Documentation/admin-guide/initrd.rst. 1078 1071 1079 1072 1080 1073 reboot-cmd (SPARC only) ··· 1168 1161 seccomp 1169 1162 ======= 1170 1163 1171 - See :doc:`/userspace-api/seccomp_filter`. 1164 + See Documentation/userspace-api/seccomp_filter.rst. 1172 1165 1173 1166 1174 1167 sg-big-buff ··· 1339 1332 sysrq 1340 1333 ===== 1341 1334 1342 - See :doc:`/admin-guide/sysrq`. 1335 + See Documentation/admin-guide/sysrq.rst. 1343 1336 1344 1337 1345 1338 tainted ··· 1369 1362 131072 `(T)` The kernel was built with the struct randomization plugin 1370 1363 ====== ===== ============================================================== 1371 1364 1372 - See :doc:`/admin-guide/tainted-kernels` for more information. 1365 + See Documentation/admin-guide/tainted-kernels.rst for more information. 
1373 1366 1374 1367 Note: 1375 1368 writes to this sysctl interface will fail with ``EINVAL`` if the kernel is 1376 1369 booted with the command line option ``panic_on_taint=<bitmask>,nousertaint`` 1377 1370 and any of the ORed together values being written to ``tainted`` match with 1378 1371 the bitmask declared on panic_on_taint. 1379 - See :doc:`/admin-guide/kernel-parameters` for more details on that particular 1380 - kernel command line option and its optional ``nousertaint`` switch. 1372 + See Documentation/admin-guide/kernel-parameters.rst for more details on 1373 + that particular kernel command line option and its optional 1374 + ``nousertaint`` switch. 1381 1375 1382 1376 threads-max 1383 1377 =========== ··· 1402 1394 traceoff_on_warning 1403 1395 =================== 1404 1396 1405 - When set, disables tracing (see :doc:`/trace/ftrace`) when a 1397 + When set, disables tracing (see Documentation/trace/ftrace.rst) when a 1406 1398 ``WARN()`` is hit. 1407 1399 1408 1400 ··· 1422 1414 1423 1415 This only works if the kernel was booted with ``tp_printk`` enabled. 1424 1416 1425 - See :doc:`/admin-guide/kernel-parameters` and 1426 - :doc:`/trace/boottime-trace`. 1417 + See Documentation/admin-guide/kernel-parameters.rst and 1418 + Documentation/trace/boottime-trace.rst. 1427 1419 1428 1420 1429 1421 .. _unaligned-dump-stack:

+1 -1

Documentation/arm/marvell.rst

··· 259 259 https://web.archive.org/web/20191129073953/http://www.marvell.com/storage/armada-sp/ 260 260 261 261 Core: 262 - Sheeva ARMv7 comatible Quad-core PJ4C 262 + Sheeva ARMv7 compatible Quad-core PJ4C 263 263 264 264 (not supported in upstream Linux kernel) 265 265

+1 -1

Documentation/block/biodoc.rst

··· 196 196 do not have a corresponding kernel virtual address space mapping) and 197 197 low-memory pages. 198 198 199 - Note: Please refer to :doc:`/core-api/dma-api-howto` for a discussion 199 + Note: Please refer to Documentation/core-api/dma-api-howto.rst for a discussion 200 200 on PCI high mem DMA aspects and mapping of scatter gather lists, and support 201 201 for 64 bit PCI. 202 202

+2 -2

Documentation/block/blk-mq.rst

··· 62 62 Software staging queues 63 63 ~~~~~~~~~~~~~~~~~~~~~~~ 64 64 65 - The block IO subsystem adds requests in the software staging queues 65 + The block IO subsystem adds requests in the software staging queues 66 66 (represented by struct blk_mq_ctx) in case that they weren't sent 67 67 directly to the driver. A request is one or more BIOs. They arrived at the 68 68 block layer through the data structure struct bio. The block layer ··· 132 132 identified by an integer, ranging from 0 to the dispatch queue size. This tag 133 133 is generated by the block layer and later reused by the device driver, removing 134 134 the need to create a redundant identifier. When a request is completed in the 135 - drive, the tag is sent back to the block layer to notify it of the finalization. 135 + driver, the tag is sent back to the block layer to notify it of the finalization. 136 136 This removes the need to do a linear search to find out which IO has been 137 137 completed. 138 138

+1 -1

Documentation/block/stat.rst

··· 18 18 each, it would be impossible to guarantee that a set of readings 19 19 represent a single point in time. 20 20 21 - The stat file consists of a single line of text containing 11 decimal 21 + The stat file consists of a single line of text containing 17 decimal 22 22 values separated by whitespace. The fields are summarized in the 23 23 following table, and described in more detail below. 24 24
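
Illustration only, not part of the patch: a minimal sketch of reading the single-line stat file and labelling its whitespace-separated values. The field names below are an assumption based on my recollection of the table in this document and should be checked against it. The helper names and the `sda` default are hypothetical. Older kernels expose fewer values, so the sketch labels only as many fields as are present.

```python
STAT_FIELDS = [
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue",
    "discard_ios", "discard_merges", "discard_sectors", "discard_ticks",
    "flush_ios", "flush_ticks",
]  # 17 names, assumed to match the documented field order


def parse_block_stat(line):
    """Map the decimal values of a block stat line to field names."""
    values = [int(v) for v in line.split()]
    # zip() stops at the shorter sequence, so shorter lines still parse.
    return dict(zip(STAT_FIELDS, values))


def read_block_stat(device="sda"):
    """Read and parse /sys/block/<device>/stat in one read."""
    with open(f"/sys/block/{device}/stat") as f:
        return parse_block_stat(f.read())
```

Reading the whole line in one call is what makes the values a consistent snapshot, which is the reason the text gives for using a single file.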

+7 -6

Documentation/bpf/bpf_lsm.rst

··· 20 20 Other LSM hooks which can be instrumented can be found in 21 21 ``include/linux/lsm_hooks.h``. 22 22 23 - eBPF programs that use :doc:`/bpf/btf` do not need to include kernel headers 24 - for accessing information from the attached eBPF program's context. They can 25 - simply declare the structures in the eBPF program and only specify the fields 26 - that need to be accessed. 23 + eBPF programs that use Documentation/bpf/btf.rst do not need to include kernel 24 + headers for accessing information from the attached eBPF program's context. 25 + They can simply declare the structures in the eBPF program and only specify 26 + the fields that need to be accessed. 27 27 28 28 .. code-block:: c 29 29 ··· 88 88 89 89 The ``__attribute__((preserve_access_index))`` is a clang feature that allows 90 90 the BPF verifier to update the offsets for the access at runtime using the 91 - :doc:`/bpf/btf` information. Since the BPF verifier is aware of the types, it 92 - also validates all the accesses made to the various types in the eBPF program. 91 + Documentation/bpf/btf.rst information. Since the BPF verifier is aware of the 92 + types, it also validates all the accesses made to the various types in the 93 + eBPF program. 93 94 94 95 Loading 95 96 -------

+15 -9

Documentation/conf.py

··· 41 41 'maintainers_include', 'sphinx.ext.autosectionlabel', 42 42 'kernel_abi', 'kernel_feat'] 43 43 44 - # 45 - # cdomain is badly broken in Sphinx 3+. Leaving it out generates *most* 46 - # of the docs correctly, but not all. Scream bloody murder but allow 47 - # the process to proceed; hopefully somebody will fix this properly soon. 48 - # 49 44 if major >= 3: 50 - sys.stderr.write('''WARNING: The kernel documentation build process 51 - support for Sphinx v3.0 and above is brand new. Be prepared for 52 - possible issues in the generated output.\n''') 53 45 if (major > 3) or (minor > 0 or patch >= 2): 54 46 # Sphinx c function parser is more pedantic with regards to type 55 47 # checking. Due to that, having macros at c:function cause problems. ··· 345 353 346 354 # Additional stuff for the LaTeX preamble. 347 355 'preamble': ''' 356 + % Prevent column squeezing of tabulary. 357 + \\setlength{\\tymin}{20em} 348 358 % Use some font with UTF-8 support with XeLaTeX 349 359 \\usepackage{fontspec} 350 360 \\setsansfont{DejaVu Sans} ··· 360 366 361 367 cjk_cmd = check_output(['fc-list', '--format="%{family[0]}\n"']).decode('utf-8', 'ignore') 362 368 if cjk_cmd.find("Noto Sans CJK SC") >= 0: 363 - print ("enabling CJK for LaTeX builder") 364 369 latex_elements['preamble'] += ''' 365 370 % This is needed for translations 366 371 \\usepackage{xeCJK} 367 372 \\setCJKmainfont{Noto Sans CJK SC} 373 + % Define custom macros to on/off CJK 374 + \\newcommand{\\kerneldocCJKon}{\\makexeCJKactive} 375 + \\newcommand{\\kerneldocCJKoff}{\\makexeCJKinactive} 376 + % To customize \sphinxtableofcontents 377 + \\usepackage{etoolbox} 378 + % Inactivate CJK after tableofcontents 379 + \\apptocmd{\\sphinxtableofcontents}{\\kerneldocCJKoff}{}{} 380 + ''' 381 + else: 382 + latex_elements['preamble'] += ''' 383 + % Custom macros to on/off CJK (Dummy) 384 + \\newcommand{\\kerneldocCJKon}{} 385 + \\newcommand{\\kerneldocCJKoff}{} 368 386 ''' 369 387 370 388 # Fix reference escape troubles 
with Sphinx 1.4.x

+1 -1

Documentation/core-api/bus-virt-phys-mapping.rst

··· 8 8 9 9 The virt_to_bus() and bus_to_virt() functions have been 10 10 superseded by the functionality provided by the PCI DMA interface 11 - (see :doc:`/core-api/dma-api-howto`). They continue 11 + (see Documentation/core-api/dma-api-howto.rst). They continue 12 12 to be documented below for historical purposes, but new code 13 13 must not use them. --davidm 00/12/12 14 14

+3 -2

Documentation/core-api/dma-api.rst

··· 5 5 :Author: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> 6 6 7 7 This document describes the DMA API. For a more gentle introduction 8 - of the API (and actual examples), see :doc:`/core-api/dma-api-howto`. 8 + of the API (and actual examples), see Documentation/core-api/dma-api-howto.rst. 9 9 10 10 This API is split into two pieces. Part I describes the basic API. 11 11 Part II describes extensions for supporting non-consistent memory ··· 479 479 dma_attrs. 480 480 481 481 The interpretation of DMA attributes is architecture-specific, and 482 - each attribute should be documented in :doc:`/core-api/dma-attributes`. 482 + each attribute should be documented in 483 + Documentation/core-api/dma-attributes.rst. 483 484 484 485 If dma_attrs are 0, the semantics of each of these functions 485 486 is identical to those of the corresponding function

+1 -1

Documentation/core-api/dma-isa-lpc.rst

··· 17 17 #include <asm/dma.h> 18 18 19 19 The first is the generic DMA API used to convert virtual addresses to 20 - bus addresses (see :doc:`/core-api/dma-api` for details). 20 + bus addresses (see Documentation/core-api/dma-api.rst for details). 21 21 22 22 The second contains the routines specific to ISA DMA transfers. Since 23 23 this is not present on all platforms make sure you construct your

+2 -2

Documentation/core-api/index.rst

··· 48 48 ====================== 49 49 50 50 How Linux keeps everything from happening at the same time. See 51 - :doc:`/locking/index` for more related documentation. 51 + Documentation/locking/index.rst for more related documentation. 52 52 53 53 .. toctree:: 54 54 :maxdepth: 1 ··· 77 77 ================= 78 78 79 79 How to allocate and use memory in the kernel. Note that there is a lot 80 - more memory-management documentation in :doc:`/vm/index`. 80 + more memory-management documentation in Documentation/vm/index.rst. 81 81 82 82 .. toctree:: 83 83 :maxdepth: 1
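
Many hunks in this series make the same mechanical change: a :doc: role becomes a plain Documentation/ path. A rough sketch of that rewrite is shown below. The function name doc_role_to_path() and its current_dir parameter are assumptions made for illustration, and nothing in the series indicates that such a script produced these hunks.

```python
import posixpath
import re

# Matches the plain form :doc:`target`. The :doc:`title <target>` form is
# deliberately not handled in this sketch.
DOC_ROLE = re.compile(r":doc:`([^`<>]+)`")


def doc_role_to_path(text, current_dir):
    """Rewrite :doc: roles in text to Documentation/... .rst paths.

    current_dir is the directory of the file being edited, relative to
    the top of the tree, for example "Documentation/core-api".
    """
    def replace(match):
        target = match.group(1)
        if target.startswith("/"):
            # Absolute targets are relative to the Documentation/ root.
            path = "Documentation" + target
        else:
            # Relative targets are resolved against the current file.
            path = posixpath.normpath(posixpath.join(current_dir, target))
        return path + ".rst"

    return DOC_ROLE.sub(replace, text)
```

Absolute targets such as /locking/index and relative ones such as api/test take different branches, which matches the two kinds of reference that appear in these hunks.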

+4 -5

Documentation/core-api/printk-formats.rst

··· 37 37 u64 %llu or %llx 38 38 39 39 40 - If <type> is dependent on a config option for its size (e.g., sector_t, 41 - blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a 42 - format specifier of its largest possible type and explicitly cast to it. 40 + If <type> is architecture-dependent for its size (e.g., cycles_t, tcflag_t) or 41 + is dependent on a config option for its size (e.g., blk_status_t), use a format 42 + specifier of its largest possible type and explicitly cast to it. 43 43 44 44 Example:: 45 45 46 - printk("test: sector number/total blocks: %llu/%llu\n", 47 - (unsigned long long)sector, (unsigned long long)blockcount); 46 + printk("test: latency: %llu cycles\n", (unsigned long long)time); 48 47 49 48 Reminder: sizeof() returns type size_t. 50 49

+461 -48

Documentation/dev-tools/checkpatch.rst

··· 246 246 The first argument for kcalloc or kmalloc_array should be the 247 247 number of elements. sizeof() as the first argument is generally 248 248 wrong. 249 + 249 250 See: https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html 250 251 251 252 **ALLOC_SIZEOF_STRUCT** ··· 265 264 **ALLOC_WITH_MULTIPLY** 266 265 Prefer kmalloc_array/kcalloc over kmalloc/kzalloc with a 267 266 sizeof multiply. 267 + 268 268 See: https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html 269 269 270 270 ··· 286 284 BUG() or BUG_ON() should be avoided totally. 287 285 Use WARN() and WARN_ON() instead, and handle the "impossible" 288 286 error condition as gracefully as possible. 287 + 289 288 See: https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on 290 289 291 290 **CONSIDER_KSTRTO** ··· 295 292 may lead to unexpected results in callers. The respective kstrtol(), 296 293 kstrtoll(), kstrtoul(), and kstrtoull() functions tend to be the 297 294 correct replacements. 295 + 298 296 See: https://www.kernel.org/doc/html/latest/process/deprecated.html#simple-strtol-simple-strtoll-simple-strtoul-simple-strtoull 297 + 298 + **CONSTANT_CONVERSION** 299 + Use of __constant_<foo> form is discouraged for the following functions:: 300 + 301 + __constant_cpu_to_be[x] 302 + __constant_cpu_to_le[x] 303 + __constant_be[x]_to_cpu 304 + __constant_le[x]_to_cpu 305 + __constant_htons 306 + __constant_ntohs 307 + 308 + Using any of these outside of include/uapi/ is not preferred as using the 309 + function without __constant_ is identical when the argument is a 310 + constant. 
311 + 312 + In big endian systems, macros like __constant_cpu_to_be32(x) and 313 + cpu_to_be32(x) expand to the same expression:: 314 + 315 + #define __constant_cpu_to_be32(x) ((__force __be32)(__u32)(x)) 316 + #define __cpu_to_be32(x) ((__force __be32)(__u32)(x)) 317 + 318 + In little endian systems, the macros __constant_cpu_to_be32(x) and 319 + cpu_to_be32(x) expand to __constant_swab32 and __swab32. __swab32 320 + has a __builtin_constant_p check:: 321 + 322 + #define __swab32(x) \ 323 + (__builtin_constant_p((__u32)(x)) ? \ 324 + ___constant_swab32(x) : \ 325 + __fswab32(x)) 326 + 327 + So ultimately they have a special case for constants. 328 + Similar is the case with all of the macros in the list. Thus 329 + using the __constant_... forms is unnecessarily verbose and 330 + not preferred outside of include/uapi. 331 + 332 + See: https://lore.kernel.org/lkml/1400106425.12666.6.camel@joe-AO725/ 333 + 334 + **DEPRECATED_API** 335 + Usage of a deprecated RCU API is detected. It is recommended to replace 336 + old flavourful RCU APIs by their new vanilla-RCU counterparts. 337 + 338 + The full list of available RCU APIs can be viewed from the kernel docs. 339 + 340 + See: https://www.kernel.org/doc/html/latest/RCU/whatisRCU.html#full-list-of-rcu-apis 341 + 342 + **DEPRECATED_VARIABLE** 343 + EXTRA_{A,C,CPP,LD}FLAGS are deprecated and should be replaced by the new 344 + flags added via commit f77bf01425b1 ("kbuild: introduce ccflags-y, 345 + asflags-y and ldflags-y"). 346 + 347 + The following conversion scheme may be used:: 348 + 349 + EXTRA_AFLAGS -> asflags-y 350 + EXTRA_CFLAGS -> ccflags-y 351 + EXTRA_CPPFLAGS -> cppflags-y 352 + EXTRA_LDFLAGS -> ldflags-y 353 + 354 + See: 355 + 356 + 1. https://lore.kernel.org/lkml/20070930191054.GA15876@uranus.ravnborg.org/ 357 + 2. https://lore.kernel.org/lkml/1313384834-24433-12-git-send-email-lacombar@gmail.com/ 358 + 3.
https://www.kernel.org/doc/html/latest/kbuild/makefiles.html#compilation-flags 359 + 360 + **DEVICE_ATTR_FUNCTIONS** 361 + The function names used in DEVICE_ATTR are unusual. 362 + Typically, the store and show functions are named <attr>_store and 363 + <attr>_show, where <attr> is a named attribute variable of the device. 364 + 365 + Consider the following examples:: 366 + 367 + static DEVICE_ATTR(type, 0444, type_show, NULL); 368 + static DEVICE_ATTR(power, 0644, power_show, power_store); 369 + 370 + The function names should preferably follow the above pattern. 371 + 372 + See: https://www.kernel.org/doc/html/latest/driver-api/driver-model/device.html#attributes 373 + 374 + **DEVICE_ATTR_RO** 375 + The DEVICE_ATTR_RO(name) helper macro can be used instead of 376 + DEVICE_ATTR(name, 0444, name_show, NULL); 377 + 378 + Note that the macro automatically appends _show to the named 379 + attribute variable of the device for the show method. 380 + 381 + See: https://www.kernel.org/doc/html/latest/driver-api/driver-model/device.html#attributes 382 + 383 + **DEVICE_ATTR_RW** 384 + The DEVICE_ATTR_RW(name) helper macro can be used instead of 385 + DEVICE_ATTR(name, 0644, name_show, name_store); 386 + 387 + Note that the macro automatically appends _show and _store to the 388 + named attribute variable of the device for the show and store methods. 389 + 390 + See: https://www.kernel.org/doc/html/latest/driver-api/driver-model/device.html#attributes 391 + 392 + **DEVICE_ATTR_WO** 393 + The DEVICE_ATTR_WO(name) helper macro can be used instead of 394 + DEVICE_ATTR(name, 0200, NULL, name_store); 395 + 396 + Note that the macro automatically appends _store to the 397 + named attribute variable of the device for the store method.
398 + 399 + See: https://www.kernel.org/doc/html/latest/driver-api/driver-model/device.html#attributes 400 + 401 + **DUPLICATED_SYSCTL_CONST** 402 + Commit d91bff3011cf ("proc/sysctl: add shared variables for range 403 + check") added some shared const variables to be used instead of a local 404 + copy in each source file. 405 + 406 + Consider replacing the sysctl range checking value with the shared 407 + one in include/linux/sysctl.h. The following conversion scheme may 408 + be used:: 409 + 410 + &zero -> SYSCTL_ZERO 411 + &one -> SYSCTL_ONE 412 + &int_max -> SYSCTL_INT_MAX 413 + 414 + See: 415 + 416 + 1. https://lore.kernel.org/lkml/20190430180111.10688-1-mcroce@redhat.com/ 417 + 2. https://lore.kernel.org/lkml/20190531131422.14970-1-mcroce@redhat.com/ 418 + 419 + **ENOSYS** 420 + ENOSYS means that a nonexistent system call was called. 421 + Earlier, it was wrongly used for things like invalid operations on 422 + otherwise valid syscalls. This should be avoided in new code. 423 + 424 + See: https://lore.kernel.org/lkml/5eb299021dec23c1a48fa7d9f2c8b794e967766d.1408730669.git.luto@amacapital.net/ 425 + 426 + **ENOTSUPP** 427 + ENOTSUPP is not a standard error code and should be avoided in new patches. 428 + EOPNOTSUPP should be used instead. 429 + 430 + See: https://lore.kernel.org/netdev/20200510182252.GA411829@lunn.ch/ 431 + 432 + **EXPORT_SYMBOL** 433 + EXPORT_SYMBOL should immediately follow the symbol to be exported. 434 + 435 + **IN_ATOMIC** 436 + in_atomic() is not for driver use so any such use is reported as an ERROR. 437 + Also in_atomic() is often used to determine if sleeping is permitted, 438 + but it is not reliable in this use model. Therefore its use is 439 + strongly discouraged. 440 + 441 + However, in_atomic() is ok for core kernel use. 
442 + 443 + See: https://lore.kernel.org/lkml/20080320201723.b87b3732.akpm@linux-foundation.org/ 299 444 300 445 **LOCKDEP** 301 446 The lockdep_no_validate class was added as a temporary measure to 302 447 prevent warnings on conversion of device->sem to device->mutex. 303 448 It should not be used for any other purpose. 449 + 304 450 See: https://lore.kernel.org/lkml/1268959062.9440.467.camel@laptop/ 305 451 306 452 **MALFORMED_INCLUDE** ··· 460 308 **USE_LOCKDEP** 461 309 lockdep_assert_held() annotations should be preferred over 462 310 assertions based on spin_is_locked() 311 + 463 312 See: https://www.kernel.org/doc/html/latest/locking/lockdep-design.html#annotations 464 313 465 314 **UAPI_INCLUDE** 466 315 No #include statements in include/uapi should use a uapi/ path. 467 316 317 + **USLEEP_RANGE** 318 + usleep_range() should be preferred over udelay(). The proper way of 319 + using usleep_range() is mentioned in the kernel docs. 468 320 469 - Comment style 470 - ------------- 321 + See: https://www.kernel.org/doc/html/latest/timers/timers-howto.html#delays-information-on-the-various-kernel-delay-sleep-mechanisms 322 + 323 + 324 + Comments 325 + -------- 471 326 472 327 **BLOCK_COMMENT_STYLE** 473 328 The comment style is incorrect. The preferred style for multi- ··· 497 338 **C99_COMMENTS** 498 339 C99 style single line comments (//) should not be used. 499 340 Prefer the block comment style instead. 341 + 500 342 See: https://www.kernel.org/doc/html/latest/process/coding-style.html#commenting 343 + 344 + **DATA_RACE** 345 + Applications of data_race() should have a comment so as to document the 346 + reasoning behind why it was deemed safe. 
347 + 348 + See: https://lore.kernel.org/lkml/20200401101714.44781-1-elver@google.com/ 349 + 350 + **FSF_MAILING_ADDRESS** 351 + Kernel maintainers reject new instances of the GPL boilerplate paragraph 352 + directing people to write to the FSF for a copy of the GPL, since the 353 + FSF has moved in the past and may do so again. 354 + So do not write paragraphs about writing to the Free Software Foundation's 355 + mailing address. 356 + 357 + See: https://lore.kernel.org/lkml/20131006222342.GT19510@leaf/ 501 358 502 359 503 360 Commit message ··· 522 347 **BAD_SIGN_OFF** 523 348 The signed-off-by line does not fall in line with the standards 524 349 specified by the community. 350 + 525 351 See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#developer-s-certificate-of-origin-1-1 526 352 527 353 **BAD_STABLE_ADDRESS_STYLE** ··· 544 368 **COMMIT_MESSAGE** 545 369 The patch is missing a commit description. A brief 546 370 description of the changes made by the patch should be added. 371 + 547 372 See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes 373 + 374 + **EMAIL_SUBJECT** 375 + Naming the tool that found the issue is not very useful in the 376 + subject line. A good subject line summarizes the change that 377 + the patch brings. 378 + 379 + See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes 380 + 381 + **FROM_SIGN_OFF_MISMATCH** 382 + The author's email does not match with that in the Signed-off-by: 383 + line(s). This can be sometimes caused due to an improperly configured 384 + email client. 385 + 386 + This message is emitted due to any of the following reasons:: 387 + 388 + - The email names do not match. 389 + - The email addresses do not match. 390 + - The email subaddresses do not match. 391 + - The email comments do not match. 548 392 549 393 **MISSING_SIGN_OFF** 550 394 The patch is missing a Signed-off-by line. 
A signed-off-by 551 395 line should be added according to Developer's certificate of 552 396 Origin. 397 + 553 398 See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin 554 399 555 400 **NO_AUTHOR_SIGN_OFF** ··· 579 382 end of explanation of the patch to denote that the author has 580 383 written it or otherwise has the rights to pass it on as an open 581 384 source patch. 385 + 582 386 See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin 583 387 584 388 **DIFF_IN_COMMIT_MSG** ··· 587 389 This causes problems when one tries to apply a file containing both 588 390 the changelog and the diff because patch(1) tries to apply the diff 589 391 which it found in the changelog. 392 + 590 393 See: https://lore.kernel.org/lkml/20150611134006.9df79a893e3636019ad2759e@linux-foundation.org/ 591 394 592 395 **GERRIT_CHANGE_ID** ··· 630 431 **BOOL_COMPARISON** 631 432 Comparisons of A to true and false are better written 632 433 as A and !A. 434 + 633 435 See: https://lore.kernel.org/lkml/1365563834.27174.12.camel@joe-AO722/ 634 436 635 437 **COMPARISON_TO_NULL** ··· 640 440 **CONSTANT_COMPARISON** 641 441 Comparisons with a constant or upper case identifier on the left 642 442 side of the test should be avoided. 443 + 444 + 445 + Indentation and Line Breaks 446 + --------------------------- 447 + 448 + **CODE_INDENT** 449 + Code indent should use tabs instead of spaces. 450 + Outside of comments, documentation and Kconfig, 451 + spaces are never used for indentation. 452 + 453 + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#indentation 454 + 455 + **DEEP_INDENTATION** 456 + Indentation with 6 or more tabs usually indicate overly indented 457 + code. 458 + 459 + It is suggested to refactor excessive indentation of 460 + if/else/for/do/while/switch statements. 
461 + 462 + See: https://lore.kernel.org/lkml/1328311239.21255.24.camel@joe2Laptop/ 463 + 464 + **SWITCH_CASE_INDENT_LEVEL** 465 + switch should be at the same indent as case. 466 + Example:: 467 + 468 + switch (suffix) { 469 + case 'G': 470 + case 'g': 471 + mem <<= 30; 472 + break; 473 + case 'M': 474 + case 'm': 475 + mem <<= 20; 476 + break; 477 + case 'K': 478 + case 'k': 479 + mem <<= 10; 480 + fallthrough; 481 + default: 482 + break; 483 + } 484 + 485 + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#indentation 486 + 487 + **LONG_LINE** 488 + The line has exceeded the specified maximum length. 489 + To use a different maximum line length, the --max-line-length=n option 490 + may be added while invoking checkpatch. 491 + 492 + Earlier, the default line length was 80 columns. Commit bdc48fa11e46 493 + ("checkpatch/coding-style: deprecate 80-column warning") increased the 494 + limit to 100 columns. This is not a hard limit either and it's 495 + preferable to stay within 80 columns whenever possible. 496 + 497 + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#breaking-long-lines-and-strings 498 + 499 + **LONG_LINE_STRING** 500 + A string starts before but extends beyond the maximum line length. 501 + To use a different maximum line length, the --max-line-length=n option 502 + may be added while invoking checkpatch. 503 + 504 + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#breaking-long-lines-and-strings 505 + 506 + **LONG_LINE_COMMENT** 507 + A comment starts before but extends beyond the maximum line length. 508 + To use a different maximum line length, the --max-line-length=n option 509 + may be added while invoking checkpatch. 510 + 511 + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#breaking-long-lines-and-strings 512 + 513 + **TRAILING_STATEMENTS** 514 + Trailing statements (for example after any conditional) should be 515 + on the next line. 
516 + Statements, such as:: 517 + 518 + if (x == y) break; 519 + 520 + should be:: 521 + 522 + if (x == y) 523 + break; 643 524 644 525 645 526 Macros, Attributes and Symbols ··· 753 472 754 473 **BIT_MACRO** 755 474 Defines like: 1 << <digit> could be BIT(digit). 756 - The BIT() macro is defined in include/linux/bitops.h:: 475 + The BIT() macro is defined via include/linux/bits.h:: 757 476 758 477 #define BIT(nr) (1UL << (nr)) 759 478 ··· 773 492 The kernel does *not* use the ``__DATE__`` and ``__TIME__`` macros, 774 493 and enables warnings if they are used as they can lead to 775 494 non-deterministic builds. 495 + 776 496 See: https://www.kernel.org/doc/html/latest/kbuild/reproducible-builds.html#timestamps 777 497 778 498 **DEFINE_ARCH_HAS** ··· 784 502 want architectures able to override them with optimized ones, we 785 503 should either use weak functions (appropriate for some cases), or 786 504 the symbol that protects them should be the same symbol we use. 505 + 787 506 See: https://lore.kernel.org/lkml/CA+55aFycQ9XJvEOsiM3txHL5bjUc8CeKWJNR_H+MiicaddB42Q@mail.gmail.com/ 507 + 508 + **DO_WHILE_MACRO_WITH_TRAILING_SEMICOLON** 509 + do {} while(0) macros should not have a trailing semicolon. 788 510 789 511 **INIT_ATTRIBUTE** 790 512 Const init definitions should use __initconst instead of ··· 814 528 ... 815 529 } 816 530 531 + **MISPLACED_INIT** 532 + It is possible to use section markers on variables in a way 533 + which gcc doesn't understand (or at least not the way the 534 + developer intended):: 535 + 536 + static struct __initdata samsung_pll_clock exynos4_plls[nr_plls] = { 537 + 538 + does not put exynos4_plls in the .initdata section. The __initdata 539 + marker can be virtually anywhere on the line, except right after 540 + "struct". The preferred location is before the "=" sign if there is 541 + one, or before the trailing ";" otherwise. 
542 + 543 + See: https://lore.kernel.org/lkml/1377655732.3619.19.camel@joe-AO722/ 544 + 817 545 **MULTISTATEMENT_MACRO_USE_DO_WHILE** 818 546 Macros with multiple statements should be enclosed in a 819 547 do - while block. Same should also be the case for macros ··· 841 541 842 542 See: https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enums-and-rtl 843 543 544 + **PREFER_FALLTHROUGH** 545 + Use the `fallthrough;` pseudo keyword instead of 546 + `/* fallthrough */` like comments. 547 + 844 548 **WEAK_DECLARATION** 845 549 Using weak declarations like __attribute__((weak)) or __weak 846 550 can have unintended link defects. Avoid using them. ··· 855 551 856 552 **CAMELCASE** 857 553 Avoid CamelCase Identifiers. 554 + 858 555 See: https://www.kernel.org/doc/html/latest/process/coding-style.html#naming 556 + 557 + **CONST_CONST** 558 + Using `const <type> const *` is generally meant to be 559 + written `const <type> * const`. 560 + 561 + **CONST_STRUCT** 562 + Using const is generally a good idea. Checkpatch reads 563 + a list of frequently used structs that are always or 564 + almost always constant. 565 + 566 + The existing structs list can be viewed from 567 + `scripts/const_structs.checkpatch`. 568 + 569 + See: https://lore.kernel.org/lkml/alpine.DEB.2.10.1608281509480.3321@hadrien/ 570 + 571 + **EMBEDDED_FUNCTION_NAME** 572 + Embedded function names are less appropriate to use as 573 + refactoring can cause function renaming. Prefer the use of 574 + "%s", __func__ to embedded function names. 575 + 576 + Note that this does not work with -f (--file) checkpatch option 577 + as it depends on patch context providing the function name. 578 + 579 + **FUNCTION_ARGUMENTS** 580 + This warning is emitted due to any of the following reasons: 581 + 582 + 1. Arguments for the function declaration do not follow 583 + the identifier name. 
Example:: 584 + 585 + void foo 586 + (int bar, int baz) 587 + 588 + This should be corrected to:: 589 + 590 + void foo(int bar, int baz) 591 + 592 + 2. Some arguments for the function definition do not 593 + have an identifier name. Example:: 594 + 595 + void foo(int) 596 + 597 + All arguments should have identifier names. 859 598 860 599 **FUNCTION_WITHOUT_ARGS** 861 600 Function declarations without arguments like:: ··· 928 581 can simply be:: 929 582 930 583 return bar; 584 + 585 + 586 + Permissions 587 + ----------- 588 + 589 + **DEVICE_ATTR_PERMS** 590 + The permissions used in DEVICE_ATTR are unusual. 591 + Typically only three permissions are used - 0644 (RW), 0444 (RO) 592 + and 0200 (WO). 593 + 594 + See: https://www.kernel.org/doc/html/latest/filesystems/sysfs.html#attributes 595 + 596 + **EXECUTE_PERMISSIONS** 597 + There is no reason for source files to be executable. The executable 598 + bit can be removed safely. 599 + 600 + **EXPORTED_WORLD_WRITABLE** 601 + Exporting world writable sysfs/debugfs files is usually a bad thing. 602 + When done arbitrarily they can introduce serious security bugs. 603 + In the past, some of the debugfs vulnerabilities would seemingly allow 604 + any local user to write arbitrary values into device registers - a 605 + situation from which little good can be expected to emerge. 606 + 607 + See: https://lore.kernel.org/linux-arm-kernel/cover.1296818921.git.segoon@openwall.com/ 608 + 609 + **NON_OCTAL_PERMISSIONS** 610 + Permission bits should use 4 digit octal permissions (like 0700 or 0444). 611 + Avoid using any other base like decimal. 931 612 932 613 933 614 Spacing and Brackets ··· 991 616 992 617 1. With a type on the left:: 993 618 994 - ;int [] a; 619 + int [] a; 995 620 996 621 2. At the beginning of a line for slice initialisers:: 997 622 ··· 1000 625 3. Inside a curly brace:: 1001 626 1002 627 = { [0...10] = 5 } 1003 - 1004 - **CODE_INDENT** 1005 - Code indent should use tabs instead of spaces. 
1006 - Outside of comments, documentation and Kconfig, 1007 - spaces are never used for indentation. 1008 - See: https://www.kernel.org/doc/html/latest/process/coding-style.html#indentation 1009 628 1010 629 **CONCATENATED_STRING** 1011 630 Concatenated elements should have a space in between. ··· 1013 644 1014 645 **ELSE_AFTER_BRACE** 1015 646 `else {` should follow the closing block `}` on the same line. 647 + 1016 648 See: https://www.kernel.org/doc/html/latest/process/coding-style.html#placing-braces-and-spaces 1017 649 1018 650 **LINE_SPACING** 1019 651 Vertical space is wasted given the limited number of lines an 1020 652 editor window can display when multiple blank lines are used. 653 + 1021 654 See: https://www.kernel.org/doc/html/latest/process/coding-style.html#spaces 1022 655 1023 656 **OPEN_BRACE** 1024 657 The opening brace should be following the function definitions on the 1025 658 next line. For any non-functional block it should be on the same line 1026 659 as the last construct. 660 + 1027 661 See: https://www.kernel.org/doc/html/latest/process/coding-style.html#placing-braces-and-spaces 1028 662 1029 663 **POINTER_LOCATION** ··· 1043 671 1044 672 **SPACING** 1045 673 Whitespace style used in the kernel sources is described in kernel docs. 674 + 1046 675 See: https://www.kernel.org/doc/html/latest/process/coding-style.html#spaces 1047 - 1048 - **SWITCH_CASE_INDENT_LEVEL** 1049 - switch should be at the same indent as case. 1050 - Example:: 1051 - 1052 - switch (suffix) { 1053 - case 'G': 1054 - case 'g': 1055 - mem <<= 30; 1056 - break; 1057 - case 'M': 1058 - case 'm': 1059 - mem <<= 20; 1060 - break; 1061 - case 'K': 1062 - case 'k': 1063 - mem <<= 10; 1064 - /* fall through */ 1065 - default: 1066 - break; 1067 - } 1068 - 1069 - See: https://www.kernel.org/doc/html/latest/process/coding-style.html#indentation 1070 676 1071 677 **TRAILING_WHITESPACE** 1072 678 Trailing whitespace should always be removed. 
1073 679 Some editors highlight the trailing whitespace and cause visual 1074 680 distractions when editing files. 681 + 1075 682 See: https://www.kernel.org/doc/html/latest/process/coding-style.html#spaces 683 + 684 + **UNNECESSARY_PARENTHESES** 685 + Parentheses are not required in the following cases: 686 + 687 + 1. Function pointer uses:: 688 + 689 + (foo->bar)(); 690 + 691 + could be:: 692 + 693 + foo->bar(); 694 + 695 + 2. Comparisons in if:: 696 + 697 + if ((foo->bar) && (foo->baz)) 698 + if ((foo == bar)) 699 + 700 + could be:: 701 + 702 + if (foo->bar && foo->baz) 703 + if (foo == bar) 704 + 705 + 3. addressof/dereference single Lvalues:: 706 + 707 + &(foo->bar) 708 + *(foo->bar) 709 + 710 + could be:: 711 + 712 + &foo->bar 713 + *foo->bar 1076 714 1077 715 **WHILE_AFTER_BRACE** 1078 716 while should follow the closing bracket on the same line:: ··· 1105 723 The patch seems to be corrupted or lines are wrapped. 1106 724 Please regenerate the patch file before sending it to the maintainer. 1107 725 726 + **CVS_KEYWORD** 727 + Since linux moved to git, the CVS markers are no longer used. 728 + So, CVS style keywords ($Id$, $Revision$, $Log$) should not be 729 + added. 730 + 731 + **DEFAULT_NO_BREAK** 732 + switch default case is sometimes written as "default:;". This can 733 + cause new cases added below default to be defective. 734 + 735 + A "break;" should be added after empty default statement to avoid 736 + unwanted fallthrough. 737 + 1108 738 **DOS_LINE_ENDINGS** 1109 739 For DOS-formatted patches, there are extra ^M symbols at the end of 1110 740 the line. These should be removed. 1111 741 1112 - **EXECUTE_PERMISSIONS** 1113 - There is no reason for source files to be executable. The executable 1114 - bit can be removed safely. 742 + **DT_SCHEMA_BINDING_PATCH** 743 + DT bindings moved to a json-schema based format instead of 744 + freeform text. 
1115 745 1116 - **NON_OCTAL_PERMISSIONS** 1117 - Permission bits should use 4 digit octal permissions (like 0700 or 0444). 1118 - Avoid using any other base like decimal. 746 + See: https://www.kernel.org/doc/html/latest/devicetree/bindings/writing-schema.html 747 + 748 + **DT_SPLIT_BINDING_PATCH** 749 + Devicetree bindings should be their own patch. This is because 750 + bindings are logically independent from a driver implementation, 751 + they have a different maintainer (even though they often 752 + are applied via the same tree), and it makes for a cleaner history in the 753 + DT only tree created with git-filter-branch. 754 + 755 + See: https://www.kernel.org/doc/html/latest/devicetree/bindings/submitting-patches.html#i-for-patch-submitters 756 + 757 + **EMBEDDED_FILENAME** 758 + Embedding the complete filename path inside the file isn't particularly 759 + useful as often the path is moved around and becomes incorrect. 760 + 761 + **FILE_PATH_CHANGES** 762 + Whenever files are added, moved, or deleted, the MAINTAINERS file 763 + patterns can be out of sync or outdated. 764 + 765 + So MAINTAINERS might need updating in these cases. 766 + 767 + **MEMSET** 768 + The memset use appears to be incorrect. This may be caused due to 769 + badly ordered parameters. Please recheck the usage. 1119 770 1120 771 **NOT_UNIFIED_DIFF** 1121 772 The patch file does not appear to be in unified-diff format. Please ··· 1157 742 **PRINTF_0XDECIMAL** 1158 743 Prefixing 0x with decimal output is defective and should be corrected. 1159 744 1160 - **TRAILING_STATEMENTS** 1161 - Trailing statements (for example after any conditional) should be 1162 - on the next line. 1163 - Like:: 745 + **SPDX_LICENSE_TAG** 746 + The source file is missing or has an improper SPDX identifier tag. 747 + The Linux kernel requires the precise SPDX identifier in all source files, 748 + and it is thoroughly documented in the kernel docs. 
1164 749 1165 - if (x == y) break; 750 + See: https://www.kernel.org/doc/html/latest/process/license-rules.html 1166 751 1167 - should be:: 1168 - 1169 - if (x == y) 1170 - break; 752 + **TYPO_SPELLING** 753 + Some words may have been misspelled. Consider reviewing them.
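
Several of the message types documented in the checkpatch.rst hunk above describe simple textual patterns. BIT_MACRO, for instance, flags defines of the form 1 << <digit>. The sketch below shows how such a pattern could be matched. The regular expression and the function name suggest_bit_macro() are assumptions made for illustration. checkpatch itself is a Perl script and its actual test may differ.

```python
import re

# A #define whose value is "1 << <digits>", optionally parenthesised.
# This pattern is an assumption for illustration, not checkpatch's own.
SHIFT_DEFINE = re.compile(
    r"^\s*#\s*define\s+(\w+)\s+\(?\s*1\s*<<\s*(\d+)\s*\)?\s*$"
)


def suggest_bit_macro(line):
    """Return a BIT()-based rewrite of a '1 << n' define, or None."""
    match = SHIFT_DEFINE.match(line)
    if match is None:
        return None
    name, shift = match.groups()
    # Whitespace between the name and the value is normalised to one space.
    return f"#define {name} BIT({shift})"
```

Since BIT(nr) is defined as (1UL << (nr)), the suggested rewrite changes the type of the constant from int to unsigned long. A real conversion would need to confirm that this is acceptable at each use.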

+4 -4

Documentation/dev-tools/kunit/api/index.rst

··· 10 10 This section documents the KUnit kernel testing API. It is divided into the 11 11 following sections: 12 12 13 - ================================= ============================================== 14 - :doc:`test` documents all of the standard testing API 15 - excluding mocking or mocking related features. 16 - ================================= ============================================== 13 + Documentation/dev-tools/kunit/api/test.rst 14 + 15 + - documents all of the standard testing API excluding mocking 16 + or mocking related features.

+1 -1

Documentation/dev-tools/kunit/faq.rst

··· 97 97 modules will automatically execute associated tests when loaded. Test results 98 98 can be collected from ``/sys/kernel/debug/kunit/<test suite>/results``, and 99 99 can be parsed with ``kunit.py parse``. For more details, see "KUnit on 100 - non-UML architectures" in :doc:`usage`. 100 + non-UML architectures" in Documentation/dev-tools/kunit/usage.rst. 101 101 102 102 If none of the above tricks help, you are always welcome to email any issues to 103 103 kunit-dev@googlegroups.com.

+7 -7

Documentation/dev-tools/kunit/index.rst

··· 36 36 results. This provides a quick way of running KUnit tests during development, 37 37 without requiring a virtual machine or separate hardware. 38 38 39 - Get started now: :doc:`start` 39 + Get started now: Documentation/dev-tools/kunit/start.rst 40 40 41 41 Why KUnit? 42 42 ========== ··· 88 88 How do I use it? 89 89 ================ 90 90 91 - * :doc:`start` - for new users of KUnit 92 - * :doc:`tips` - for short examples of best practices 93 - * :doc:`usage` - for a more detailed explanation of KUnit features 94 - * :doc:`api/index` - for the list of KUnit APIs used for testing 95 - * :doc:`kunit-tool` - for more information on the kunit_tool helper script 96 - * :doc:`faq` - for answers to some common questions about KUnit 91 + * Documentation/dev-tools/kunit/start.rst - for new users of KUnit 92 + * Documentation/dev-tools/kunit/tips.rst - for short examples of best practices 93 + * Documentation/dev-tools/kunit/usage.rst - for a more detailed explanation of KUnit features 94 + * Documentation/dev-tools/kunit/api/index.rst - for the list of KUnit APIs used for testing 95 + * Documentation/dev-tools/kunit/kunit-tool.rst - for more information on the kunit_tool helper script 96 + * Documentation/dev-tools/kunit/faq.rst - for answers to some common questions about KUnit

+2 -2

Documentation/dev-tools/kunit/start.rst

··· 21 21 ./tools/testing/kunit/kunit.py run 22 22 23 23 For more information on this wrapper (also called kunit_tool) check out the 24 - :doc:`kunit-tool` page. 24 + Documentation/dev-tools/kunit/kunit-tool.rst page. 25 25 26 26 Creating a .kunitconfig 27 27 ----------------------- ··· 234 234 235 235 Next Steps 236 236 ========== 237 - * Check out the :doc:`tips` page for tips on 237 + * Check out the Documentation/dev-tools/kunit/tips.rst page for tips on 238 238 writing idiomatic KUnit tests. 239 239 * Optional: see the :doc:`usage` page for a more 240 240 in-depth explanation of KUnit.

+3 -2

Documentation/dev-tools/kunit/tips.rst

··· 125 125 126 126 127 127 Note: here we're able to get away with using ``test->priv``, but if you wanted 128 - something more flexible you could use a named ``kunit_resource``, see :doc:`api/test`. 128 + something more flexible you could use a named ``kunit_resource``, see 129 + Documentation/dev-tools/kunit/api/test.rst. 129 130 130 131 Failing the current test 131 132 ------------------------ ··· 186 185 187 186 Next Steps 188 187 ========== 189 - * Optional: see the :doc:`usage` page for a more 188 + * Optional: see the Documentation/dev-tools/kunit/usage.rst page for a more 190 189 in-depth explanation of KUnit.

+5 -3

Documentation/dev-tools/kunit/usage.rst

··· 10 10 some basic knowledge of testing. 11 11 12 12 For a high level introduction to KUnit, including setting up KUnit for your 13 - project, see :doc:`start`. 13 + project, see Documentation/dev-tools/kunit/start.rst. 14 14 15 15 Organization of this document 16 16 ============================= ··· 99 99 expectations until the test case ends or is otherwise terminated. This is as 100 100 opposed to *assertions* which are discussed later. 101 101 102 - To learn about more expectations supported by KUnit, see :doc:`api/test`. 102 + To learn about more expectations supported by KUnit, see 103 + Documentation/dev-tools/kunit/api/test.rst. 103 104 104 105 .. note:: 105 106 A single test case should be pretty short, pretty easy to understand, ··· 217 216 after late_init, or when the test module is loaded (depending on whether the 218 217 test was built in or not). 219 218 220 - For more information on these types of things see the :doc:`api/test`. 219 + For more information on these types of things see the 220 + Documentation/dev-tools/kunit/api/test.rst. 221 221 222 222 Common Patterns 223 223 ===============

+8 -8

Documentation/dev-tools/testing-overview.rst

··· 71 71 of code. This is useful for determining how much of the kernel is being tested, 72 72 and for finding corner-cases which are not covered by the appropriate test. 73 73 74 - :doc:`gcov` is GCC's coverage testing tool, which can be used with the kernel 75 - to get global or per-module coverage. Unlike KCOV, it does not record per-task 76 - coverage. Coverage data can be read from debugfs, and interpreted using the 77 - usual gcov tooling. 74 + Documentation/dev-tools/gcov.rst is GCC's coverage testing tool, which can be 75 + used with the kernel to get global or per-module coverage. Unlike KCOV, it 76 + does not record per-task coverage. Coverage data can be read from debugfs, 77 + and interpreted using the usual gcov tooling. 78 78 79 - :doc:`kcov` is a feature which can be built in to the kernel to allow 80 - capturing coverage on a per-task level. It's therefore useful for fuzzing and 81 - other situations where information about code executed during, for example, a 82 - single syscall is useful. 79 + Documentation/dev-tools/kcov.rst is a feature which can be built in to the 80 + kernel to allow capturing coverage on a per-task level. It's therefore useful 81 + for fuzzing and other situations where information about code executed during, 82 + for example, a single syscall is useful. 83 83 84 84 85 85 Dynamic Analysis Tools

+6 -5

Documentation/devicetree/bindings/submitting-patches.rst

··· 7 7 I. For patch submitters 8 8 ======================= 9 9 10 - 0) Normal patch submission rules from Documentation/process/submitting-patches.rst 11 - applies. 10 + 0) Normal patch submission rules from 11 + Documentation/process/submitting-patches.rst applies. 12 12 13 13 1) The Documentation/ and include/dt-bindings/ portion of the patch should 14 14 be a separate patch. The preferred subject prefix for binding patches is:: ··· 25 25 26 26 make dt_binding_check 27 27 28 - See Documentation/devicetree/bindings/writing-schema.rst for more details about 29 - schema and tools setup. 28 + See Documentation/devicetree/bindings/writing-schema.rst for more details 29 + about schema and tools setup. 30 30 31 31 3) DT binding files should be dual licensed. The preferred license tag is 32 32 (GPL-2.0-only OR BSD-2-Clause). ··· 84 84 III. Notes 85 85 ========== 86 86 87 - 0) Please see :doc:`ABI` for details regarding devicetree ABI. 87 + 0) Please see Documentation/devicetree/bindings/ABI.rst for details 88 + regarding devicetree ABI. 88 89 89 90 1) This document is intended as a general familiarization with the process as 90 91 decided at the 2013 Kernel Summit. When in doubt, the current word of the

+4 -4

Documentation/doc-guide/contributing.rst

··· 237 237 a set of "books" that group documentation for specific readers. These 238 238 include: 239 239 240 - - :doc:`../admin-guide/index` 241 - - :doc:`../core-api/index` 242 - - :doc:`../driver-api/index` 243 - - :doc:`../userspace-api/index` 240 + - Documentation/admin-guide/index.rst 241 + - Documentation/core-api/index.rst 242 + - Documentation/driver-api/index.rst 243 + - Documentation/userspace-api/index.rst 244 244 245 245 As well as this book on documentation itself. 246 246

+1 -1

Documentation/driver-api/acpi/linuxized-acpica.rst

··· 276 276 # git clone https://github.com/acpica/acpica 277 277 # git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 278 278 # cd acpica 279 - # generate/linux/divergences.sh -s ../linux 279 + # generate/linux/divergence.sh -s ../linux

+2 -2

Documentation/driver-api/gpio/using-gpio.rst

··· 9 9 10 10 For examples of already existing generic drivers that will also be good 11 11 examples for any other kernel drivers you want to author, refer to 12 - :doc:`drivers-on-gpio` 12 + Documentation/driver-api/gpio/drivers-on-gpio.rst 13 13 14 14 For any kind of mass produced system you want to support, such as servers, 15 15 laptops, phones, tablets, routers, and any consumer or office or business goods 16 16 using appropriate kernel drivers is paramount. Submit your code for inclusion 17 17 in the upstream Linux kernel when you feel it is mature enough and you will get 18 - help to refine it, see :doc:`../../process/submitting-patches`. 18 + help to refine it, see Documentation/process/submitting-patches.rst. 19 19 20 20 In Linux GPIO lines also have a userspace ABI. 21 21

+5 -5

Documentation/driver-api/ioctl.rst

··· 25 25 with the correct parameters: 26 26 27 27 _IO/_IOR/_IOW/_IOWR 28 - The macro name specifies how the argument will be used. It may be a 28 + The macro name specifies how the argument will be used. It may be a 29 29 pointer to data to be passed into the kernel (_IOW), out of the kernel 30 - (_IOR), or both (_IOWR). _IO can indicate either commands with no 30 + (_IOR), or both (_IOWR). _IO can indicate either commands with no 31 31 argument or those passing an integer value instead of a pointer. 32 32 It is recommended to only use _IO for commands without arguments, 33 33 and use pointers for passing data. 34 34 35 35 type 36 36 An 8-bit number, often a character literal, specific to a subsystem 37 - or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number` 37 + or driver, and listed in Documentation/userspace-api/ioctl/ioctl-number.rst 38 38 39 39 nr 40 40 An 8-bit number identifying the specific command, unique for a give ··· 200 200 space layout randomization (KASLR), helping in an attack. 201 201 202 202 For this reason (and for compat support) it is best to avoid any 203 - implicit padding in data structures. Where there is implicit padding 203 + implicit padding in data structures. Where there is implicit padding 204 204 in an existing structure, kernel drivers must be careful to fully 205 205 initialize an instance of the structure before copying it to user 206 - space. This is usually done by calling memset() before assigning to 206 + space. This is usually done by calling memset() before assigning to 207 207 individual members. 208 208 209 209 Subsystem abstractions

+4 -4

Documentation/driver-api/pm/devices.rst

··· 217 217 flag is clear. 218 218 219 219 For more information about the runtime power management framework, refer to 220 - :file:`Documentation/power/runtime_pm.rst`. 220 + Documentation/power/runtime_pm.rst. 221 221 222 222 223 223 Calling Drivers to Enter and Leave System Sleep States ··· 655 655 actions that either require user space to be available, or at least won't 656 656 interfere with user space. 657 657 658 - For details refer to :doc:`notifiers`. 658 + For details refer to Documentation/driver-api/pm/notifiers.rst. 659 659 660 660 661 661 Device Low-Power (suspend) States ··· 726 726 727 727 Devices may be defined as IRQ-safe which indicates to the PM core that their 728 728 runtime PM callbacks may be invoked with disabled interrupts (see 729 - :file:`Documentation/power/runtime_pm.rst` for more information). If an 729 + Documentation/power/runtime_pm.rst for more information). If an 730 730 IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be 731 731 disallowed, unless the domain itself is defined as IRQ-safe. However, it 732 732 makes sense to define a PM domain as IRQ-safe only if all the devices in it ··· 805 805 -------------------------------------------- 806 806 807 807 During system-wide resume from a sleep state it's easiest to put devices into 808 - the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`. 808 + the full-power state, as explained in Documentation/power/runtime_pm.rst. 809 809 [Refer to that document for more information regarding this particular issue as 810 810 well as for information on the device runtime power management framework in 811 811 general.] However, it often is desirable to leave devices in suspend after

+2 -1

Documentation/driver-api/surface_aggregator/clients/index.rst

··· 5 5 =========================== 6 6 7 7 This is the documentation for client drivers themselves. Refer to 8 - :doc:`../client` for documentation on how to write client drivers. 8 + Documentation/driver-api/surface_aggregator/client.rst for documentation 9 + on how to write client drivers. 9 10 10 11 .. toctree:: 11 12 :maxdepth: 1

+8 -7

Documentation/driver-api/surface_aggregator/internal.rst

··· 87 87 implemented as platform devices, via |ssam_device| and |ssam_device_driver| 88 88 simplify management of client devices and client drivers. 89 89 90 - Refer to :doc:`client` for documentation regarding the client device/driver 91 - API and interface options for other kernel drivers. It is recommended to 92 - familiarize oneself with that chapter and the :doc:`ssh` before continuing 93 - with the architectural overview below. 90 + Refer to Documentation/driver-api/surface_aggregator/client.rst for 91 + documentation regarding the client device/driver API and interface options 92 + for other kernel drivers. It is recommended to familiarize oneself with 93 + that chapter and the Documentation/driver-api/surface_aggregator/ssh.rst 94 + before continuing with the architectural overview below. 94 95 95 96 96 97 Packet Transport Layer ··· 191 190 192 191 Transmission of sequenced packets is limited by the number of concurrently 193 192 pending packets, i.e. a limit on how many packets may be waiting for an ACK 194 - from the EC in parallel. This limit is currently set to one (see :doc:`ssh` 195 - for the reasoning behind this). Control packets (i.e. ACK and NAK) can 196 - always be transmitted. 193 + from the EC in parallel. This limit is currently set to one (see 194 + Documentation/driver-api/surface_aggregator/ssh.rst for the reasoning behind 195 + this). Control packets (i.e. ACK and NAK) can always be transmitted. 197 196 198 197 Receiver Thread 199 198 ---------------

+4 -2

Documentation/driver-api/surface_aggregator/overview.rst

··· 73 73 without response as commands. In general, events need to be enabled via one 74 74 of multiple dedicated requests before they are sent by the EC. 75 75 76 - See :doc:`ssh` for a more technical protocol documentation and 77 - :doc:`internal` for an overview of the internal driver architecture. 76 + See Documentation/driver-api/surface_aggregator/ssh.rst for a 77 + more technical protocol documentation and 78 + Documentation/driver-api/surface_aggregator/internal.rst for an 79 + overview of the internal driver architecture.

+3 -3

Documentation/driver-api/usb/dma.rst

··· 10 10 11 11 The big picture is that USB drivers can continue to ignore most DMA issues, 12 12 though they still must provide DMA-ready buffers (see 13 - :doc:`/core-api/dma-api-howto`). That's how they've worked through 13 + Documentation/core-api/dma-api-howto.rst). That's how they've worked through 14 14 the 2.4 (and earlier) kernels, or they can now be DMA-aware. 15 15 16 16 DMA-aware usb drivers: ··· 60 60 force a consistent memory access ordering by using memory barriers. It's 61 61 not using a streaming DMA mapping, so it's good for small transfers on 62 62 systems where the I/O would otherwise thrash an IOMMU mapping. (See 63 - :doc:`/core-api/dma-api-howto` for definitions of "coherent" and 63 + Documentation/core-api/dma-api-howto.rst for definitions of "coherent" and 64 64 "streaming" DMA mappings.) 65 65 66 66 Asking for 1/Nth of a page (as well as asking for N pages) is reasonably ··· 91 91 Existing buffers aren't usable for DMA without first being mapped into the 92 92 DMA address space of the device. However, most buffers passed to your 93 93 driver can safely be used with such DMA mapping. (See the first section 94 - of :doc:`/core-api/dma-api-howto`, titled "What memory is DMA-able?") 94 + of Documentation/core-api/dma-api-howto.rst, titled "What memory is DMA-able?") 95 95 96 96 - When you're using scatterlists, you can map everything at once. On some 97 97 systems, this kicks in an IOMMU and turns the scatterlists into single

+14 -10

Documentation/fault-injection/fault-injection.rst

··· 78 78 79 79 - /sys/kernel/debug/fail*/times: 80 80 81 - specifies how many times failures may happen at most. 82 - A value of -1 means "no limit". 81 + specifies how many times failures may happen at most. A value of -1 82 + means "no limit". Note, though, that this file only accepts unsigned 83 + values. So, if you want to specify -1, you better use 'printf' instead 84 + of 'echo', e.g.: $ printf %#x -1 > times 83 85 84 86 - /sys/kernel/debug/fail*/space: 85 87 ··· 169 167 - ERRNO: retval must be -1 to -MAX_ERRNO (-4096). 170 168 - ERR_NULL: retval must be 0 or -1 to -MAX_ERRNO (-4096). 171 169 172 - - /sys/kernel/debug/fail_function/<functiuon-name>/retval: 170 + - /sys/kernel/debug/fail_function/<function-name>/retval: 173 171 174 - specifies the "error" return value to inject to the given 175 - function for given function. This will be created when 176 - user specifies new injection entry. 172 + specifies the "error" return value to inject to the given function. 173 + This will be created when the user specifies a new injection entry. 174 + Note that this file only accepts unsigned values. 
So, if you want to 175 + use a negative errno, you better use 'printf' instead of 'echo', e.g.: 176 + $ printf %#x -12 > retval 177 177 178 178 Boot option 179 179 ^^^^^^^^^^^ ··· 259 255 echo Y > /sys/kernel/debug/$FAILTYPE/task-filter 260 256 echo 10 > /sys/kernel/debug/$FAILTYPE/probability 261 257 echo 100 > /sys/kernel/debug/$FAILTYPE/interval 262 - echo -1 > /sys/kernel/debug/$FAILTYPE/times 258 + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times 263 259 echo 0 > /sys/kernel/debug/$FAILTYPE/space 264 260 echo 2 > /sys/kernel/debug/$FAILTYPE/verbose 265 261 echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait ··· 313 309 echo N > /sys/kernel/debug/$FAILTYPE/task-filter 314 310 echo 10 > /sys/kernel/debug/$FAILTYPE/probability 315 311 echo 100 > /sys/kernel/debug/$FAILTYPE/interval 316 - echo -1 > /sys/kernel/debug/$FAILTYPE/times 312 + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times 317 313 echo 0 > /sys/kernel/debug/$FAILTYPE/space 318 314 echo 2 > /sys/kernel/debug/$FAILTYPE/verbose 319 315 echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait ··· 340 336 FAILTYPE=fail_function 341 337 FAILFUNC=open_ctree 342 338 echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject 343 - echo -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval 339 + printf %#x -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval 344 340 echo N > /sys/kernel/debug/$FAILTYPE/task-filter 345 341 echo 100 > /sys/kernel/debug/$FAILTYPE/probability 346 342 echo 0 > /sys/kernel/debug/$FAILTYPE/interval 347 - echo -1 > /sys/kernel/debug/$FAILTYPE/times 343 + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times 348 344 echo 0 > /sys/kernel/debug/$FAILTYPE/space 349 345 echo 1 > /sys/kernel/debug/$FAILTYPE/verbose 350 346
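The printf workaround introduced in this hunk can be sketched as follows. The debugfs files only accept unsigned values, so a negative number is written in its unsigned hexadecimal form. This is a minimal sketch: the helper name `to_unsigned_hex` is hypothetical (it is not part of the kernel documentation), and the width of the result assumes a shell whose `printf` works on 64-bit integers, such as bash on a 64-bit system.

```shell
#!/bin/sh
# Hypothetical helper (not from the kernel docs): print a possibly negative
# integer in the unsigned hex form that the fault-injection debugfs files
# accept. Assumes the shell's printf uses 64-bit integers.
to_unsigned_hex() {
    printf '%#x\n' "$1"
}

to_unsigned_hex -1     # value meaning "no limit" for .../times
to_unsigned_hex -12    # -ENOMEM, as used for .../retval in the example above
```

Under that 64-bit assumption the two calls should print 0xffffffffffffffff and 0xfffffffffffffff4, which is the same text that `printf %#x -1 > times` and `printf %#x -12 > retval` write to the debugfs files.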

+291

Documentation/filesystems/dax.rst

··· 1 + ======================= 2 + Direct Access for files 3 + ======================= 4 + 5 + Motivation 6 + ---------- 7 + 8 + The page cache is usually used to buffer reads and writes to files. 9 + It is also used to provide the pages which are mapped into userspace 10 + by a call to mmap. 11 + 12 + For block devices that are memory-like, the page cache pages would be 13 + unnecessary copies of the original storage. The `DAX` code removes the 14 + extra copy by performing reads and writes directly to the storage device. 15 + For file mappings, the storage device is mapped directly into userspace. 16 + 17 + 18 + Usage 19 + ----- 20 + 21 + If you have a block device which supports `DAX`, you can make a filesystem 22 + on it as usual. The `DAX` code currently only supports files with a block 23 + size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block 24 + size when creating the filesystem. 25 + 26 + Currently 3 filesystems support `DAX`: ext2, ext4 and xfs. Enabling `DAX` on them 27 + is different. 28 + 29 + Enabling DAX on ext2 30 + -------------------- 31 + 32 + When mounting the filesystem, use the ``-o dax`` option on the command line or 33 + add 'dax' to the options in ``/etc/fstab``. This works to enable `DAX` on all files 34 + within the filesystem. It is equivalent to the ``-o dax=always`` behavior below. 35 + 36 + 37 + Enabling DAX on xfs and ext4 38 + ---------------------------- 39 + 40 + Summary 41 + ------- 42 + 43 + 1. There exists an in-kernel file access mode flag `S_DAX` that corresponds to 44 + the statx flag `STATX_ATTR_DAX`. See the manpage for statx(2) for details 45 + about this access mode. 46 + 47 + 2. There exists a persistent flag `FS_XFLAG_DAX` that can be applied to regular 48 + files and directories. This advisory flag can be set or cleared at any 49 + time, but doing so does not immediately affect the `S_DAX` state. 50 + 51 + 3. 
If the persistent `FS_XFLAG_DAX` flag is set on a directory, this flag will 52 + be inherited by all regular files and subdirectories that are subsequently 53 + created in this directory. Files and subdirectories that exist at the time 54 + this flag is set or cleared on the parent directory are not modified by 55 + this modification of the parent directory. 56 + 57 + 4. There exist dax mount options which can override `FS_XFLAG_DAX` in the 58 + setting of the `S_DAX` flag. Given underlying storage which supports `DAX` the 59 + following hold: 60 + 61 + ``-o dax=inode`` means "follow `FS_XFLAG_DAX`" and is the default. 62 + 63 + ``-o dax=never`` means "never set `S_DAX`, ignore `FS_XFLAG_DAX`." 64 + 65 + ``-o dax=always`` means "always set `S_DAX` ignore `FS_XFLAG_DAX`." 66 + 67 + ``-o dax`` is a legacy option which is an alias for ``dax=always``. 68 + 69 + .. warning:: 70 + 71 + The option ``-o dax`` may be removed in the future so ``-o dax=always`` is 72 + the preferred method for specifying this behavior. 73 + 74 + .. note:: 75 + 76 + Modifications to and the inheritance behavior of `FS_XFLAG_DAX` remain 77 + the same even when the filesystem is mounted with a dax option. However, 78 + in-core inode state (`S_DAX`) will be overridden until the filesystem is 79 + remounted with dax=inode and the inode is evicted from kernel memory. 80 + 81 + 5. The `S_DAX` policy can be changed via: 82 + 83 + a) Setting the parent directory `FS_XFLAG_DAX` as needed before files are 84 + created 85 + 86 + b) Setting the appropriate dax="foo" mount option 87 + 88 + c) Changing the `FS_XFLAG_DAX` flag on existing regular files and 89 + directories. This has runtime constraints and limitations that are 90 + described in 6) below. 91 + 92 + 6. When changing the `S_DAX` policy via toggling the persistent `FS_XFLAG_DAX` 93 + flag, the change to existing regular files won't take effect until the 94 + files are closed by all processes. 
95 + 96 + 97 + Details 98 + ------- 99 + 100 + There are 2 per-file dax flags. One is a persistent inode setting (`FS_XFLAG_DAX`) 101 + and the other is a volatile flag indicating the active state of the feature 102 + (`S_DAX`). 103 + 104 + `FS_XFLAG_DAX` is preserved within the filesystem. This persistent config 105 + setting can be set, cleared and/or queried using the `FS_IOC_FS`[`GS`]`ETXATTR` ioctl 106 + (see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'. 107 + 108 + New files and directories automatically inherit `FS_XFLAG_DAX` from 109 + their parent directory **when created**. Therefore, setting `FS_XFLAG_DAX` at 110 + directory creation time can be used to set a default behavior for an entire 111 + sub-tree. 112 + 113 + To clarify inheritance, here are 3 examples: 114 + 115 + Example A: 116 + 117 + .. code-block:: shell 118 + 119 + mkdir -p a/b/c 120 + xfs_io -c 'chattr +x' a 121 + mkdir a/b/c/d 122 + mkdir a/e 123 + 124 + ------[outcome]------ 125 + 126 + dax: a,e 127 + no dax: b,c,d 128 + 129 + Example B: 130 + 131 + .. code-block:: shell 132 + 133 + mkdir a 134 + xfs_io -c 'chattr +x' a 135 + mkdir -p a/b/c/d 136 + 137 + ------[outcome]------ 138 + 139 + dax: a,b,c,d 140 + no dax: 141 + 142 + Example C: 143 + 144 + .. code-block:: shell 145 + 146 + mkdir -p a/b/c 147 + xfs_io -c 'chattr +x' c 148 + mkdir a/b/c/d 149 + 150 + ------[outcome]------ 151 + 152 + dax: c,d 153 + no dax: a,b 154 + 155 + The current enabled state (`S_DAX`) is set when a file inode is instantiated in 156 + memory by the kernel. It is set based on the underlying media support, the 157 + value of `FS_XFLAG_DAX` and the filesystem's dax mount option. 158 + 159 + statx can be used to query `S_DAX`. 160 + 161 + .. note:: 162 + 163 + That only regular files will ever have `S_DAX` set and therefore statx 164 + will never indicate that `S_DAX` is set on directories. 
165 + 166 + Setting the `FS_XFLAG_DAX` flag (specifically or through inheritance) occurs even 167 + if the underlying media does not support dax and/or the filesystem is 168 + overridden with a mount option. 169 + 170 + 171 + Implementation Tips for Block Driver Writers 172 + -------------------------------------------- 173 + 174 + To support `DAX` in your block driver, implement the 'direct_access' 175 + block device operation. It is used to translate the sector number 176 + (expressed in units of 512-byte sectors) to a page frame number (pfn) 177 + that identifies the physical page for the memory. It also returns a 178 + kernel virtual address that can be used to access the memory. 179 + 180 + The direct_access method takes a 'size' parameter that indicates the 181 + number of bytes being requested. The function should return the number 182 + of bytes that can be contiguously accessed at that offset. It may also 183 + return a negative errno if an error occurs. 184 + 185 + In order to support this method, the storage must be byte-accessible by 186 + the CPU at all times. If your device uses paging techniques to expose 187 + a large amount of memory through a smaller window, then you cannot 188 + implement direct_access. Equally, if your device can occasionally 189 + stall the CPU for an extended period, you should also not attempt to 190 + implement direct_access. 
191 + 192 + These block devices may be used for inspiration: 193 + - brd: RAM backed block device driver 194 + - dcssblk: s390 dcss block device driver 195 + - pmem: NVDIMM persistent memory driver 196 + 197 + 198 + Implementation Tips for Filesystem Writers 199 + ------------------------------------------ 200 + 201 + Filesystem support consists of: 202 + 203 + * Adding support to mark inodes as being `DAX` by setting the `S_DAX` flag in 204 + i_flags 205 + * Implementing ->read_iter and ->write_iter operations which use 206 + :c:func:`dax_iomap_rw()` when inode has `S_DAX` flag set 207 + * Implementing an mmap file operation for `DAX` files which sets the 208 + `VM_MIXEDMAP` and `VM_HUGEPAGE` flags on the `VMA`, and setting the vm_ops to 209 + include handlers for fault, pmd_fault, page_mkwrite, pfn_mkwrite. These 210 + handlers should probably call :c:func:`dax_iomap_fault()` passing the 211 + appropriate fault size and iomap operations. 212 + * Calling :c:func:`iomap_zero_range()` passing appropriate iomap operations 213 + instead of :c:func:`block_truncate_page()` for `DAX` files 214 + * Ensuring that there is sufficient locking between reads, writes, 215 + truncates and page faults 216 + 217 + The iomap handlers for allocating blocks must make sure that allocated blocks 218 + are zeroed out and converted to written extents before being returned to avoid 219 + exposure of uninitialized data through mmap. 220 + 221 + These filesystems may be used for inspiration: 222 + 223 + .. seealso:: 224 + 225 + ext2: see Documentation/filesystems/ext2.rst 226 + 227 + .. seealso:: 228 + 229 + xfs: see Documentation/admin-guide/xfs.rst 230 + 231 + .. seealso:: 232 + 233 + ext4: see Documentation/filesystems/ext4/ 234 + 235 + 236 + Handling Media Errors 237 + --------------------- 238 + 239 + The libnvdimm subsystem stores a record of known media error locations for 240 + each pmem block device (in gendisk->badblocks). 
If we fault at such location, 241 + or one with a latent error not yet discovered, the application can expect 242 + to receive a `SIGBUS`. Libnvdimm also allows clearing of these errors by simply 243 + writing the affected sectors (through the pmem driver, and if the underlying 244 + NVDIMM supports the clear_poison DSM defined by ACPI). 245 + 246 + Since `DAX` IO normally doesn't go through the ``driver/bio`` path, applications or 247 + sysadmins have an option to restore the lost data from a prior ``backup/inbuilt`` 248 + redundancy in the following ways: 249 + 250 + 1. Delete the affected file, and restore from a backup (sysadmin route): 251 + This will free the filesystem blocks that were being used by the file, 252 + and the next time they're allocated, they will be zeroed first, which 253 + happens through the driver, and will clear bad sectors. 254 + 255 + 2. Truncate or hole-punch the part of the file that has a bad-block (at least 256 + an entire aligned sector has to be hole-punched, but not necessarily an 257 + entire filesystem block). 258 + 259 + These are the two basic paths that allow `DAX` filesystems to continue operating 260 + in the presence of media errors. More robust error recovery mechanisms can be 261 + built on top of this in the future, for example, involving redundancy/mirroring 262 + provided at the block layer through DM, or additionally, at the filesystem 263 + level. These would have to rely on the above two tenets, that error clearing 264 + can happen either by sending an IO through the driver, or zeroing (also through 265 + the driver). 266 + 267 + 268 + Shortcomings 269 + ------------ 270 + 271 + Even if the kernel or its modules are stored on a filesystem that supports 272 + `DAX` on a block device that supports `DAX`, they will still be copied into RAM. 273 + 274 + The DAX code does not work correctly on architectures which have virtually 275 + mapped caches such as ARM, MIPS and SPARC. 
276 + 277 + Calling :c:func:`get_user_pages()` on a range of user memory that has been 278 + mmaped from a `DAX` file will fail when there are no 'struct page' to describe 279 + those pages. This problem has been addressed in some device drivers 280 + by adding optional struct page support for pages under the control of 281 + the driver (see `CONFIG_NVDIMM_PFN` in ``drivers/nvdimm`` for an example of 282 + how to do this). In the non struct page cases `O_DIRECT` reads/writes to 283 + those memory ranges from a non-`DAX` file will fail. 284 + 285 + 286 + .. note:: 287 + 288 + `O_DIRECT` reads/writes *of* a `DAX` file do work; it is the memory that 289 + is being accessed that is key here. Other things that will not work in 290 + the non struct page case include RDMA, :c:func:`sendfile()` and 291 + :c:func:`splice()`.
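The interaction between the dax mount options and `FS_XFLAG_DAX` described in summary item 4 of the new dax.rst can be sketched as a small decision function. This is a toy model only, not kernel code: the function name `effective_s_dax` and its argument convention are hypothetical, and it assumes a regular file on underlying storage that supports DAX.

```shell
#!/bin/sh
# Toy model (not kernel code) of summary item 4 in dax.rst: derive the
# effective S_DAX state of a regular file from the dax mount option and the
# file's FS_XFLAG_DAX flag, assuming the storage supports DAX.
#   $1: mount option value: inode, never, always, or dax (legacy alias)
#   $2: FS_XFLAG_DAX on the file: 1 if set, 0 if clear
effective_s_dax() {
    case "$1" in
        inode)      echo "$2" ;;   # dax=inode: follow FS_XFLAG_DAX (default)
        never)      echo 0 ;;      # dax=never: never set S_DAX
        always|dax) echo 1 ;;      # dax=always (or legacy "dax"): always set S_DAX
        *)          echo "unknown dax option: $1" >&2; return 1 ;;
    esac
}

effective_s_dax inode 1     # flag set, default policy
effective_s_dax never 1     # flag set, but overridden by the mount option
effective_s_dax always 0    # flag clear, but overridden by the mount option
```

The model deliberately ignores the timing rules from the same document: the real `S_DAX` state is only recomputed when the inode is instantiated in memory, so changing `FS_XFLAG_DAX` on an open file does not take effect until all processes have closed it.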

-257

Documentation/filesystems/dax.txt

··· 1 - Direct Access for files 2 - ----------------------- 3 - 4 - Motivation 5 - ---------- 6 - 7 - The page cache is usually used to buffer reads and writes to files. 8 - It is also used to provide the pages which are mapped into userspace 9 - by a call to mmap. 10 - 11 - For block devices that are memory-like, the page cache pages would be 12 - unnecessary copies of the original storage. The DAX code removes the 13 - extra copy by performing reads and writes directly to the storage device. 14 - For file mappings, the storage device is mapped directly into userspace. 15 - 16 - 17 - Usage 18 - ----- 19 - 20 - If you have a block device which supports DAX, you can make a filesystem 21 - on it as usual. The DAX code currently only supports files with a block 22 - size equal to your kernel's PAGE_SIZE, so you may need to specify a block 23 - size when creating the filesystem. 24 - 25 - Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them 26 - is different. 27 - 28 - Enabling DAX on ext2 29 - ----------------------------- 30 - 31 - When mounting the filesystem, use the "-o dax" option on the command line or 32 - add 'dax' to the options in /etc/fstab. This works to enable DAX on all files 33 - within the filesystem. It is equivalent to the '-o dax=always' behavior below. 34 - 35 - 36 - Enabling DAX on xfs and ext4 37 - ---------------------------- 38 - 39 - Summary 40 - ------- 41 - 42 - 1. There exists an in-kernel file access mode flag S_DAX that corresponds to 43 - the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details 44 - about this access mode. 45 - 46 - 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular 47 - files and directories. This advisory flag can be set or cleared at any 48 - time, but doing so does not immediately affect the S_DAX state. 49 - 50 - 3. 
If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will 51 - be inherited by all regular files and subdirectories that are subsequently 52 - created in this directory. Files and subdirectories that exist at the time 53 - this flag is set or cleared on the parent directory are not modified by 54 - this modification of the parent directory. 55 - 56 - 4. There exist dax mount options which can override FS_XFLAG_DAX in the 57 - setting of the S_DAX flag. Given underlying storage which supports DAX the 58 - following hold: 59 - 60 - "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default. 61 - 62 - "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX." 63 - 64 - "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX." 65 - 66 - "-o dax" is a legacy option which is an alias for "dax=always". 67 - This may be removed in the future so "-o dax=always" is 68 - the preferred method for specifying this behavior. 69 - 70 - NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain 71 - the same even when the filesystem is mounted with a dax option. However, 72 - in-core inode state (S_DAX) will be overridden until the filesystem is 73 - remounted with dax=inode and the inode is evicted from kernel memory. 74 - 75 - 5. The S_DAX policy can be changed via: 76 - 77 - a) Setting the parent directory FS_XFLAG_DAX as needed before files are 78 - created 79 - 80 - b) Setting the appropriate dax="foo" mount option 81 - 82 - c) Changing the FS_XFLAG_DAX flag on existing regular files and 83 - directories. This has runtime constraints and limitations that are 84 - described in 6) below. 85 - 86 - 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX 87 - flag, the change to existing regular files won't take effect until the 88 - files are closed by all processes. 89 - 90 - 91 - Details 92 - ------- 93 - 94 - There are 2 per-file dax flags. 
One is a persistent inode setting (FS_XFLAG_DAX) 95 - and the other is a volatile flag indicating the active state of the feature 96 - (S_DAX). 97 - 98 - FS_XFLAG_DAX is preserved within the filesystem. This persistent config 99 - setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl 100 - (see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'. 101 - 102 - New files and directories automatically inherit FS_XFLAG_DAX from 103 - their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at 104 - directory creation time can be used to set a default behavior for an entire 105 - sub-tree. 106 - 107 - To clarify inheritance, here are 3 examples: 108 - 109 - Example A: 110 - 111 - mkdir -p a/b/c 112 - xfs_io -c 'chattr +x' a 113 - mkdir a/b/c/d 114 - mkdir a/e 115 - 116 - dax: a,e 117 - no dax: b,c,d 118 - 119 - Example B: 120 - 121 - mkdir a 122 - xfs_io -c 'chattr +x' a 123 - mkdir -p a/b/c/d 124 - 125 - dax: a,b,c,d 126 - no dax: 127 - 128 - Example C: 129 - 130 - mkdir -p a/b/c 131 - xfs_io -c 'chattr +x' c 132 - mkdir a/b/c/d 133 - 134 - dax: c,d 135 - no dax: a,b 136 - 137 - 138 - The current enabled state (S_DAX) is set when a file inode is instantiated in 139 - memory by the kernel. It is set based on the underlying media support, the 140 - value of FS_XFLAG_DAX and the filesystem's dax mount option. 141 - 142 - statx can be used to query S_DAX. NOTE that only regular files will ever have 143 - S_DAX set and therefore statx will never indicate that S_DAX is set on 144 - directories. 145 - 146 - Setting the FS_XFLAG_DAX flag (specifically or through inheritance) occurs even 147 - if the underlying media does not support dax and/or the filesystem is 148 - overridden with a mount option. 149 - 150 - 151 - 152 - Implementation Tips for Block Driver Writers 153 - -------------------------------------------- 154 - 155 - To support DAX in your block driver, implement the 'direct_access' 156 - block device operation. 
It is used to translate the sector number 157 - (expressed in units of 512-byte sectors) to a page frame number (pfn) 158 - that identifies the physical page for the memory. It also returns a 159 - kernel virtual address that can be used to access the memory. 160 - 161 - The direct_access method takes a 'size' parameter that indicates the 162 - number of bytes being requested. The function should return the number 163 - of bytes that can be contiguously accessed at that offset. It may also 164 - return a negative errno if an error occurs. 165 - 166 - In order to support this method, the storage must be byte-accessible by 167 - the CPU at all times. If your device uses paging techniques to expose 168 - a large amount of memory through a smaller window, then you cannot 169 - implement direct_access. Equally, if your device can occasionally 170 - stall the CPU for an extended period, you should also not attempt to 171 - implement direct_access. 172 - 173 - These block devices may be used for inspiration: 174 - - brd: RAM backed block device driver 175 - - dcssblk: s390 dcss block device driver 176 - - pmem: NVDIMM persistent memory driver 177 - 178 - 179 - Implementation Tips for Filesystem Writers 180 - ------------------------------------------ 181 - 182 - Filesystem support consists of 183 - - adding support to mark inodes as being DAX by setting the S_DAX flag in 184 - i_flags 185 - - implementing ->read_iter and ->write_iter operations which use dax_iomap_rw() 186 - when inode has S_DAX flag set 187 - - implementing an mmap file operation for DAX files which sets the 188 - VM_MIXEDMAP and VM_HUGEPAGE flags on the VMA, and setting the vm_ops to 189 - include handlers for fault, pmd_fault, page_mkwrite, pfn_mkwrite. These 190 - handlers should probably call dax_iomap_fault() passing the appropriate 191 - fault size and iomap operations. 
192 - - calling iomap_zero_range() passing appropriate iomap operations instead of 193 - block_truncate_page() for DAX files 194 - - ensuring that there is sufficient locking between reads, writes, 195 - truncates and page faults 196 - 197 - The iomap handlers for allocating blocks must make sure that allocated blocks 198 - are zeroed out and converted to written extents before being returned to avoid 199 - exposure of uninitialized data through mmap. 200 - 201 - These filesystems may be used for inspiration: 202 - - ext2: see Documentation/filesystems/ext2.rst 203 - - ext4: see Documentation/filesystems/ext4/ 204 - - xfs: see Documentation/admin-guide/xfs.rst 205 - 206 - 207 - Handling Media Errors 208 - --------------------- 209 - 210 - The libnvdimm subsystem stores a record of known media error locations for 211 - each pmem block device (in gendisk->badblocks). If we fault at such location, 212 - or one with a latent error not yet discovered, the application can expect 213 - to receive a SIGBUS. Libnvdimm also allows clearing of these errors by simply 214 - writing the affected sectors (through the pmem driver, and if the underlying 215 - NVDIMM supports the clear_poison DSM defined by ACPI). 216 - 217 - Since DAX IO normally doesn't go through the driver/bio path, applications or 218 - sysadmins have an option to restore the lost data from a prior backup/inbuilt 219 - redundancy in the following ways: 220 - 221 - 1. Delete the affected file, and restore from a backup (sysadmin route): 222 - This will free the filesystem blocks that were being used by the file, 223 - and the next time they're allocated, they will be zeroed first, which 224 - happens through the driver, and will clear bad sectors. 225 - 226 - 2. Truncate or hole-punch the part of the file that has a bad-block (at least 227 - an entire aligned sector has to be hole-punched, but not necessarily an 228 - entire filesystem block). 
229 - 230 - These are the two basic paths that allow DAX filesystems to continue operating 231 - in the presence of media errors. More robust error recovery mechanisms can be 232 - built on top of this in the future, for example, involving redundancy/mirroring 233 - provided at the block layer through DM, or additionally, at the filesystem 234 - level. These would have to rely on the above two tenets, that error clearing 235 - can happen either by sending an IO through the driver, or zeroing (also through 236 - the driver). 237 - 238 - 239 - Shortcomings 240 - ------------ 241 - 242 - Even if the kernel or its modules are stored on a filesystem that supports 243 - DAX on a block device that supports DAX, they will still be copied into RAM. 244 - 245 - The DAX code does not work correctly on architectures which have virtually 246 - mapped caches such as ARM, MIPS and SPARC. 247 - 248 - Calling get_user_pages() on a range of user memory that has been mmaped 249 - from a DAX file will fail when there are no 'struct page' to describe 250 - those pages. This problem has been addressed in some device drivers 251 - by adding optional struct page support for pages under the control of 252 - the driver (see CONFIG_NVDIMM_PFN in drivers/nvdimm for an example of 253 - how to do this). In the non struct page cases O_DIRECT reads/writes to 254 - those memory ranges from a non-DAX file will fail (note that O_DIRECT 255 - reads/writes _of a DAX file_ do work, it is the memory that is being 256 - accessed that is key here). Other things that will not work in the 257 - non struct page case include RDMA, sendfile() and splice().

+1 -1

Documentation/filesystems/ext2.rst

··· 25 25 (check=normal and check=strict options removed) 26 26 27 27 dax Use direct access (no page cache). See 28 - Documentation/filesystems/dax.txt. 28 + Documentation/filesystems/dax.rst. 29 29 30 30 debug Extra debugging information is sent to the 31 31 kernel syslog. Useful for developers.

+1 -1

Documentation/filesystems/ext4/blockgroup.rst

··· 84 84 descriptors copies are kept in the first block group. Given the default 85 85 128MiB(2^27 bytes) block group size and 64-byte group descriptors, ext4 86 86 can have at most 2^27/64 = 2^21 block groups. This limits the entire 87 - filesystem size to 2^21 ∗ 2^27 = 2^48bytes or 256TiB. 87 + filesystem size to 2^21 * 2^27 = 2^48bytes or 256TiB. 88 88 89 89 The solution to this problem is to use the metablock group feature 90 90 (META\_BG), which is already in ext3 for all 2.6 releases. With the
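The size limit quoted in this hunk can be checked with a short calculation. The sketch below is only an illustration of that arithmetic, assuming the 128MiB (2^27 bytes) block group size and 64-byte descriptors stated in the text; the function names are hypothetical and do not come from the ext4 code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of group descriptors that fit inside a single block group
 * (hypothetical helper, named for this example only). */
static uint64_t max_block_groups(uint64_t group_bytes, uint64_t desc_bytes)
{
    assert(desc_bytes != 0);
    return group_bytes / desc_bytes;
}

/* Resulting filesystem size limit: number of groups times bytes per group. */
static uint64_t max_fs_bytes(uint64_t group_bytes, uint64_t desc_bytes)
{
    return max_block_groups(group_bytes, desc_bytes) * group_bytes;
}
```

Called as max_fs_bytes(1ULL << 27, 64), this gives 2^21 groups and 2^48 bytes; shifting the result right by 40 bits yields 256, matching the 256TiB figure.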

+1

Documentation/filesystems/index.rst

··· 77 77 coda 78 78 configfs 79 79 cramfs 80 + dax 80 81 debugfs 81 82 dlmfs 82 83 ecryptfs

+81 -105

Documentation/filesystems/path-lookup.rst

··· 448 448 filesystem to revalidate the result if it is that sort of filesystem. 449 449 If that doesn't get a good result, it calls "``lookup_slow()``" which 450 450 takes ``i_rwsem``, rechecks the cache, and then asks the filesystem 451 - to find a definitive answer. Each of these will call 452 - ``follow_managed()`` (as described below) to handle any mount points. 451 + to find a definitive answer. 453 452 454 - In the absence of symbolic links, ``walk_component()`` creates a new 455 - ``struct path`` containing a counted reference to the new dentry and a 456 - reference to the new ``vfsmount`` which is only counted if it is 457 - different from the previous ``vfsmount``. It then calls 458 - ``path_to_nameidata()`` to install the new ``struct path`` in the 459 - ``struct nameidata`` and drop the unneeded references. 453 + As the last step of walk_component(), step_into() will be called either 454 + directly from walk_component() or from handle_dots(). It calls 455 + handle_mounts(), to check and handle mount points, in which a new 456 + ``struct path`` is created containing a counted reference to the new dentry and 457 + a reference to the new ``vfsmount`` which is only counted if it is 458 + different from the previous ``vfsmount``. Then if there is 459 + a symbolic link, step_into() calls pick_link() to deal with it, 460 + otherwise it installs the new ``struct path`` in the ``struct nameidata``, and 461 + drops the unneeded references. 460 462 461 463 This "hand-over-hand" sequencing of getting a reference to the new 462 464 dentry before dropping the reference to the previous dentry may ··· 472 470 ``nd->last_type`` to refer to the final component of the path. It does 473 471 not call ``walk_component()`` that last time. Handling that final 474 472 component remains for the caller to sort out. 
Those callers are 475 - ``path_lookupat()``, ``path_parentat()``, ``path_mountpoint()`` and 476 - ``path_openat()`` each of which handles the differing requirements of 473 + path_lookupat(), path_parentat() and 474 + path_openat() each of which handles the differing requirements of 477 475 different system calls. 478 476 479 477 ``path_parentat()`` is clearly the simplest - it just wraps a little bit ··· 488 486 object is wanted such as by ``stat()`` or ``chmod()``. It essentially just 489 487 calls ``walk_component()`` on the final component through a call to 490 488 ``lookup_last()``. ``path_lookupat()`` returns just the final dentry. 491 - 492 - ``path_mountpoint()`` handles the special case of unmounting which must 493 - not try to revalidate the mounted filesystem. It effectively 494 - contains, through a call to ``mountpoint_last()``, an alternate 495 - implementation of ``lookup_slow()`` which skips that step. This is 496 - important when unmounting a filesystem that is inaccessible, such as 489 + It is worth noting that when flag ``LOOKUP_MOUNTPOINT`` is set, 490 + path_lookupat() will unset LOOKUP_JUMPED in nameidata so that in the 491 + subsequent path traversal d_weak_revalidate() won't be called. 492 + This is important when unmounting a filesystem that is inaccessible, such as 497 493 one provided by a dead NFS server. 498 494 499 495 Finally ``path_openat()`` is used for the ``open()`` system call; it 500 - contains, in support functions starting with "``do_last()``", all the 496 + contains, in support functions starting with "open_last_lookups()", all the 501 497 complexity needed to handle the different subtleties of O_CREAT (with 502 498 or without O_EXCL), final "``/``" characters, and trailing symbolic 503 499 links. We will revisit this in the final part of this series, which 504 - focuses on those symbolic links. "``do_last()``" will sometimes, but 500 + focuses on those symbolic links. 
"open_last_lookups()" will sometimes, but 505 501 not always, take ``i_rwsem``, depending on what it finds. 506 502 507 503 Each of these, or the functions which call them, need to be alert to ··· 535 535 tree, but a few notes specifically related to path lookup are in order 536 536 here. 537 537 538 - The Linux VFS has a concept of "managed" dentries which is reflected 539 - in function names such as "``follow_managed()``". There are three 538 + The Linux VFS has a concept of "managed" dentries. There are three 540 539 potentially interesting things about these dentries corresponding 541 540 to three different flags that might be set in ``dentry->d_flags``: 542 541 ··· 651 652 restarts from the top with REF-walk. 652 653 653 654 This pattern of "try RCU-walk, if that fails try REF-walk" can be 654 - clearly seen in functions like ``filename_lookup()``, 655 - ``filename_parentat()``, ``filename_mountpoint()``, 656 - ``do_filp_open()``, and ``do_file_open_root()``. These five 657 - correspond roughly to the four ``path_*()`` functions we met earlier, 655 + clearly seen in functions like filename_lookup(), 656 + filename_parentat(), 657 + do_filp_open(), and do_file_open_root(). These four 658 + correspond roughly to the three ``path_*()`` functions we met earlier, 658 659 each of which calls ``link_path_walk()``. The ``path_*()`` functions are 659 660 called using different mode flags until a mode is found which works. 660 661 They are first called with ``LOOKUP_RCU`` set to request "RCU-walk". If ··· 992 993 kernel spend too much time on just one path is one of them. With 993 994 symbolic links you can effectively generate much longer paths so some 994 995 sort of limit is needed for the same reason. Linux imposes a limit of 995 - at most 40 symlinks in any one path lookup. It previously imposed a 996 - further limit of eight on the maximum depth of recursion, but that was 996 + at most 40 (MAXSYMLINKS) symlinks in any one path lookup. 
It previously imposed 997 + a further limit of eight on the maximum depth of recursion, but that was 997 998 raised to 40 when a separate stack was implemented, so there is now 998 999 just the one limit. 999 1000 ··· 1060 1061 must return ``-ECHILD`` and ``unlazy_walk()`` will be called to return to 1061 1062 REF-walk mode in which the filesystem is allowed to sleep. 1062 1063 1063 - The place for all this to happen is the ``i_op->follow_link()`` inode 1064 - method. In the present mainline code this is never actually called in 1065 - RCU-walk mode as the rewrite is not quite complete. It is likely that 1066 - in a future release this method will be passed an ``inode`` pointer when 1067 - called in RCU-walk mode so it both (1) knows to be careful, and (2) has the 1068 - validated pointer. Much like the ``i_op->permission()`` method we 1069 - looked at previously, ``->follow_link()`` would need to be careful that 1064 + The place for all this to happen is the ``i_op->get_link()`` inode 1065 + method. This is called both in RCU-walk and REF-walk. In RCU-walk the 1066 + ``dentry*`` argument is NULL, ``->get_link()`` can return -ECHILD to drop out of 1067 + RCU-walk. Much like the ``i_op->permission()`` method we 1068 + looked at previously, ``->get_link()`` would need to be careful that 1070 1069 all the data structures it references are safe to be accessed while 1071 - holding no counted reference, only the RCU lock. Though getting a 1072 - reference with ``->follow_link()`` is not yet done in RCU-walk mode, the 1073 - code is ready to release the reference when that does happen. 1074 - 1075 - This need to drop the reference to a symlink adds significant 1076 - complexity. It requires a reference to the inode so that the 1077 - ``i_op->put_link()`` inode operation can be called. In REF-walk, that 1078 - reference is kept implicitly through a reference to the dentry, so 1079 - keeping the ``struct path`` of the symlink is easiest. 
For RCU-walk, 1080 - the pointer to the inode is kept separately. To allow switching from 1081 - RCU-walk back to REF-walk in the middle of processing nested symlinks 1082 - we also need the seq number for the dentry so we can confirm that 1083 - switching back was safe. 1084 - 1085 - Finally, when providing a reference to a symlink, the filesystem also 1086 - provides an opaque "cookie" that must be passed to ``->put_link()`` so that it 1087 - knows what to free. This might be the allocated memory area, or a 1088 - pointer to the ``struct page`` in the page cache, or something else 1089 - completely. Only the filesystem knows what it is. 1070 + holding no counted reference, only the RCU lock. A callback 1071 + ``struct delayed_called`` will be passed to ``->get_link()``: 1072 + file systems can set their own put_link function and argument through 1073 + set_delayed_call(). Later on, when VFS wants to put link, it will call 1074 + do_delayed_call() to invoke that callback function with the argument. 1090 1075 1091 1076 In order for the reference to each symlink to be dropped when the walk completes, 1092 1077 whether in RCU-walk or REF-walk, the symlink stack needs to contain, 1093 1078 along with the path remnants: 1094 1079 1095 - - the ``struct path`` to provide a reference to the inode in REF-walk 1096 - - the ``struct inode *`` to provide a reference to the inode in RCU-walk 1080 + - the ``struct path`` to provide a reference to the previous path 1081 + - the ``const char *`` to provide a reference to the to previous name 1097 1082 - the ``seq`` to allow the path to be safely switched from RCU-walk to REF-walk 1098 - - the ``cookie`` that tells ``->put_path()`` what to put. 1083 + - the ``struct delayed_call`` for later invocation. 
1099 1084 1100 1085 This means that each entry in the symlink stack needs to hold five 1101 1086 pointers and an integer instead of just one pointer (the path ··· 1103 1120 stack is very straightforward; pushing and popping the references is 1104 1121 a little more complex. 1105 1122 1106 - When a symlink is found, ``walk_component()`` returns the value ``1`` 1107 - (``0`` is returned for any other sort of success, and a negative number 1108 - is, as usual, an error indicator). This causes ``get_link()`` to be 1109 - called; it then gets the link from the filesystem. Providing that 1110 - operation is successful, the old path ``name`` is placed on the stack, 1111 - and the new value is used as the ``name`` for a while. When the end of 1123 + When a symlink is found, walk_component() calls pick_link() via step_into() 1124 + which returns the link from the filesystem. 1125 + Providing that operation is successful, the old path ``name`` is placed on the 1126 + stack, and the new value is used as the ``name`` for a while. When the end of 1112 1127 the path is found (i.e. ``*name`` is ``'\0'``) the old ``name`` is restored 1113 1128 off the stack and path walking continues. 1114 1129 ··· 1123 1142 old symlink as it walks that last component. So it is quite 1124 1143 convenient for ``walk_component()`` to release the old symlink and pop 1125 1144 the references just before pushing the reference information for the 1126 - new symlink. It is guided in this by two flags; ``WALK_GET``, which 1127 - gives it permission to follow a symlink if it finds one, and 1128 - ``WALK_PUT``, which tells it to release the current symlink after it has been 1129 - followed. ``WALK_PUT`` is tested first, leading to a call to 1130 - ``put_link()``. ``WALK_GET`` is tested subsequently (by 1131 - ``should_follow_link()``) leading to a call to ``pick_link()`` which sets 1132 - up the stack frame. 1145 + new symlink. 
It is guided in this by three flags: ``WALK_NOFOLLOW`` which 1146 + forbids it from following a symlink if it finds one, ``WALK_MORE`` 1147 + which indicates that it is yet too early to release the 1148 + current symlink, and ``WALK_TRAILING`` which indicates that it is on the final 1149 + component of the lookup, so we will check userspace flag ``LOOKUP_FOLLOW`` to 1150 + decide whether follow it when it is a symlink and call ``may_follow_link()`` to 1151 + check if we have privilege to follow it. 1133 1152 1134 1153 Symlinks with no final component 1135 1154 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1136 1155 1137 1156 A pair of special-case symlinks deserve a little further explanation. 1138 1157 Both result in a new ``struct path`` (with mount and dentry) being set 1139 - up in the ``nameidata``, and result in ``get_link()`` returning ``NULL``. 1158 + up in the ``nameidata``, and result in pick_link() returning ``NULL``. 1140 1159 1141 1160 The more obvious case is a symlink to "``/``". All symlinks starting 1142 - with "``/``" are detected in ``get_link()`` which resets the ``nameidata`` 1161 + with "``/``" are detected in pick_link() which resets the ``nameidata`` 1143 1162 to point to the effective filesystem root. If the symlink only 1144 1163 contains "``/``" then there is nothing more to do, no components at all, 1145 1164 so ``NULL`` is returned to indicate that the symlink can be released and ··· 1156 1175 target file, not just the name of it. When you ``readlink`` these 1157 1176 objects you get a name that might refer to the same file - unless it 1158 1177 has been unlinked or mounted over. When ``walk_component()`` follows 1159 - one of these, the ``->follow_link()`` method in "procfs" doesn't return 1160 - a string name, but instead calls ``nd_jump_link()`` which updates the 1161 - ``nameidata`` in place to point to that target. ``->follow_link()`` then 1162 - returns ``NULL``. 
Again there is no final component and ``get_link()`` 1163 - reports this by leaving the ``last_type`` field of ``nameidata`` as 1164 - ``LAST_BIND``. 1178 + one of these, the ``->get_link()`` method in "procfs" doesn't return 1179 + a string name, but instead calls nd_jump_link() which updates the 1180 + ``nameidata`` in place to point to that target. ``->get_link()`` then 1181 + returns ``NULL``. Again there is no final component and pick_link() 1182 + returns ``NULL``. 1165 1183 1166 1184 Following the symlink in the final component 1167 1185 -------------------------------------------- ··· 1177 1197 successive symlinks until one is found that doesn't point to another 1178 1198 symlink. 1179 1199 1180 - This case is handled by the relevant caller of ``link_path_walk()``, such as 1181 - ``path_lookupat()`` using a loop that calls ``link_path_walk()``, and then 1182 - handles the final component. If the final component is a symlink 1183 - that needs to be followed, then ``trailing_symlink()`` is called to set 1184 - things up properly and the loop repeats, calling ``link_path_walk()`` 1185 - again. This could loop as many as 40 times if the last component of 1186 - each symlink is another symlink. 1200 + This case is handled by relevant callers of link_path_walk(), such as 1201 + path_lookupat(), path_openat() using a loop that calls link_path_walk(), 1202 + and then handles the final component by calling open_last_lookups() or 1203 + lookup_last(). If it is a symlink that needs to be followed, 1204 + open_last_lookups() or lookup_last() will set things up properly and 1205 + return the path so that the loop repeats, calling 1206 + link_path_walk() again. This could loop as many as 40 times if the last 1207 + component of each symlink is another symlink. 
1187 1208 1188 - The various functions that examine the final component and possibly 1189 - report that it is a symlink are ``lookup_last()``, ``mountpoint_last()`` 1190 - and ``do_last()``, each of which use the same convention as 1191 - ``walk_component()`` of returning ``1`` if a symlink was found that needs 1192 - to be followed. 1209 + Of the various functions that examine the final component, 1210 + open_last_lookups() is the most interesting as it works in tandem 1211 + with do_open() for opening a file. Part of open_last_lookups() runs 1212 + with ``i_rwsem`` held and this part is in a separate function: lookup_open(). 1193 1213 1194 - Of these, ``do_last()`` is the most interesting as it is used for 1195 - opening a file. Part of ``do_last()`` runs with ``i_rwsem`` held and this 1196 - part is in a separate function: ``lookup_open()``. 1214 + Explaining open_last_lookups() and do_open() completely is beyond the scope 1215 + of this article, but a few highlights should help those interested in exploring 1216 + the code. 1197 1217 1198 - Explaining ``do_last()`` completely is beyond the scope of this article, 1199 - but a few highlights should help those interested in exploring the 1200 - code. 1201 - 1202 - 1. Rather than just finding the target file, ``do_last()`` needs to open 1218 + 1. Rather than just finding the target file, do_open() is used after 1219 + open_last_lookup() to open 1203 1220 it. If the file was found in the dcache, then ``vfs_open()`` is used for 1204 1221 this. If not, then ``lookup_open()`` will either call ``atomic_open()`` (if 1205 1222 the filesystem provides it) to combine the final lookup with the open, or 1206 - will perform the separate ``lookup_real()`` and ``vfs_create()`` steps 1223 + will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps 1207 1224 directly. 
In the later case the actual "open" of this newly found or 1208 - created file will be performed by ``vfs_open()``, just as if the name 1225 + created file will be performed by vfs_open(), just as if the name 1209 1226 were found in the dcache. 1210 1227 1211 - 2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information 1212 - wasn't quite current enough. Rather than restarting the lookup from 1213 - the top with ``LOOKUP_REVAL`` set, ``lookup_open()`` is called instead, 1214 - giving the filesystem a chance to resolve small inconsistencies. 1215 - If that doesn't work, only then is the lookup restarted from the top. 1228 + 2. vfs_open() can fail with ``-EOPENSTALE`` if the cached information 1229 + wasn't quite current enough. If it's in RCU-walk ``-ECHILD`` will be returned 1230 + otherwise ``-ESTALE`` is returned. When ``-ESTALE`` is returned, the caller may 1231 + retry with ``LOOKUP_REVAL`` flag set. 1216 1232 1217 1233 3. An open with O_CREAT **does** follow a symlink in the final component, 1218 1234 unlike other creation system calls (like ``mkdir``). So the sequence:: ··· 1218 1242 1219 1243 will create a file called ``/tmp/bar``. This is not permitted if 1220 1244 ``O_EXCL`` is set but otherwise is handled for an O_CREAT open much 1221 - like for a non-creating open: ``should_follow_link()`` returns ``1``, and 1222 - so does ``do_last()`` so that ``trailing_symlink()`` gets called and the 1245 + like for a non-creating open: lookup_last() or open_last_lookup() 1246 + returns a non ``NULL`` value, and link_path_walk() gets called and the 1223 1247 open process continues on the symlink that was found. 1224 1248 1225 1249 Updating the access time
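The O_CREAT behaviour described in item 3 of this hunk can be observed from userspace. The following is a rough sketch, not taken from the patch: it assumes a dangling symlink foo -> bar inside a private temporary directory under /tmp, and the helper name creat_via_symlink() is made up for the example.

```c
#define _GNU_SOURCE  /* expose mkdtemp(), symlink(), lstat() under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a dangling symlink foo -> bar, open foo with O_CREAT plus
 * extra_flags, and report whether bar came into existence.
 * Returns 1 if bar was created, 0 if not, -1 if the setup failed. */
static int creat_via_symlink(int extra_flags)
{
    char dir[] = "/tmp/trailing-symlink-XXXXXX";
    char link_path[64], target_path[64];
    struct stat st;
    int fd, created;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(link_path, sizeof(link_path), "%s/foo", dir);
    snprintf(target_path, sizeof(target_path), "%s/bar", dir);

    if (symlink("bar", link_path) < 0) {
        rmdir(dir);
        return -1;
    }

    /* The open may fail (for example with O_EXCL); only the side effect matters. */
    fd = open(link_path, O_WRONLY | O_CREAT | extra_flags, 0600);
    if (fd >= 0)
        close(fd);

    /* lstat() inspects bar itself rather than following anything. */
    created = (lstat(target_path, &st) == 0 && S_ISREG(st.st_mode));

    unlink(target_path);  /* fails harmlessly if bar was never created */
    unlink(link_path);
    rmdir(dir);
    return created;
}
```

On Linux, creat_via_symlink(0) is expected to report that bar was created through the symlink, while creat_via_symlink(O_EXCL) is expected to leave bar absent, since O_CREAT with O_EXCL refuses to follow a trailing symlink.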

+2 -1

Documentation/firmware-guide/acpi/dsd/data-node-references.rst

··· 79 79 }) 80 80 } 81 81 82 - Please also see a graph example in :doc:`graph`. 82 + Please also see a graph example in 83 + Documentation/firmware-guide/acpi/dsd/graph.rst. 83 84 84 85 References 85 86 ==========

+1 -1

Documentation/firmware-guide/acpi/dsd/graph.rst

··· 174 174 referenced 2016-10-04. 175 175 176 176 [7] _DSD Device Properties Usage Rules. 177 - :doc:`../DSD-properties-rules` 177 + Documentation/firmware-guide/acpi/DSD-properties-rules.rst

+4 -3

Documentation/firmware-guide/acpi/enumeration.rst

··· 339 339 There are also devm_* versions of these functions which release the 340 340 descriptors once the device is released. 341 341 342 - See Documentation/firmware-guide/acpi/gpio-properties.rst for more information about the 343 - _DSD binding related to GPIOs. 342 + See Documentation/firmware-guide/acpi/gpio-properties.rst for more information 343 + about the _DSD binding related to GPIOs. 344 344 345 345 MFD devices 346 346 =========== ··· 460 460 Otherwise, the _DSD itself is regarded as invalid and therefore the "compatible" 461 461 property returned by it is meaningless. 462 462 463 - Refer to :doc:`DSD-properties-rules` for more information. 463 + Refer to Documentation/firmware-guide/acpi/DSD-properties-rules.rst for more 464 + information. 464 465 465 466 PCI hierarchy representation 466 467 ============================

+1 -1

Documentation/i2c/instantiating-devices.rst

··· 59 59 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 60 60 61 61 ACPI can also describe I2C devices. There is special documentation for this 62 - which is currently located at :doc:`../firmware-guide/acpi/enumeration`. 62 + which is currently located at Documentation/firmware-guide/acpi/enumeration.rst. 63 63 64 64 65 65 Declare the I2C devices in board files

+2 -1

Documentation/i2c/old-module-parameters.rst

··· 17 17 With the conversion of the I2C subsystem to the standard device driver 18 18 binding model, it became clear that these per-module parameters were no 19 19 longer needed, and that a centralized implementation was possible. The new, 20 - sysfs-based interface is described in :doc:`instantiating-devices`, section 20 + sysfs-based interface is described in 21 + Documentation/i2c/instantiating-devices.rst, section 21 22 "Method 4: Instantiate from user-space". 22 23 23 24 Below is a mapping from the old module parameters to the new interface.

+2 -2

Documentation/i2c/smbus-protocol.rst

··· 27 27 Each transaction type corresponds to a functionality flag. Before calling a 28 28 transaction function, a device driver should always check (just once) for 29 29 the corresponding functionality flag to ensure that the underlying I2C 30 - adapter supports the transaction in question. See :doc:`functionality` for 31 - the details. 30 + adapter supports the transaction in question. See 31 + Documentation/i2c/functionality.rst for the details. 32 32 33 33 34 34 Key to symbols

+1 -1

Documentation/input/joydev/joystick-api.rst

··· 263 263 264 264 char name[128]; 265 265 if (ioctl(fd, JSIOCGNAME(sizeof(name)), name) < 0) 266 - strncpy(name, "Unknown", sizeof(name)); 266 + strscpy(name, "Unknown", sizeof(name)); 267 267 printf("Name: %s\n", name); 268 268 269 269

+2 -2

Documentation/kernel-hacking/hacking.rst

··· 601 601 602 602 This is the variant of `EXPORT_SYMBOL()` that allows specifying a symbol 603 603 namespace. Symbol Namespaces are documented in 604 - :doc:`../core-api/symbol-namespaces` 604 + Documentation/core-api/symbol-namespaces.rst 605 605 606 606 :c:func:`EXPORT_SYMBOL_NS_GPL()` 607 607 -------------------------------- ··· 610 610 611 611 This is the variant of `EXPORT_SYMBOL_GPL()` that allows specifying a symbol 612 612 namespace. Symbol Namespaces are documented in 613 - :doc:`../core-api/symbol-namespaces` 613 + Documentation/core-api/symbol-namespaces.rst 614 614 615 615 Routines and Conventions 616 616 ========================

+3 -3

Documentation/networking/device_drivers/ethernet/intel/i40e.rst

··· 466 466 "ethtool -T <netdev name>" to get a definitive list of PTP capabilities 467 467 supported by the device. 468 468 469 - IEEE 802.1ad (QinQ) Support 469 + IEEE 802.1ad (QinQ) Support 470 470 --------------------------- 471 471 The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN 472 472 IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as ··· 523 523 Maximum Bandwidth is not restricted, because no more than 100% of a port's 524 524 bandwidth can ever be used. 525 525 526 - NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions 527 - per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says 526 + NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions 527 + per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says 528 528 "add vsi failed for VF N, aq_err 16". To workaround the issue, enable less than 529 529 64 virtual functions (VFs). 530 530

+1 -1

Documentation/networking/device_drivers/ethernet/intel/iavf.rst

··· 113 113 - AVF device ID 114 114 - HW mailbox is used for VF to PF communications (including on Windows) 115 115 116 - IEEE 802.1ad (QinQ) Support 116 + IEEE 802.1ad (QinQ) Support 117 117 --------------------------- 118 118 The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN 119 119 IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as

+1 -1

Documentation/networking/devlink/devlink-region.rst

··· 22 22 address regions that are otherwise inaccessible to the user. 23 23 24 24 Regions may also be used to provide an additional way to debug complex error 25 - states, but see also :doc:`devlink-health` 25 + states, but see also Documentation/networking/devlink/devlink-health.rst 26 26 27 27 Regions may optionally support capturing a snapshot on demand via the 28 28 ``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow

+2 -2

Documentation/networking/devlink/devlink-trap.rst

··· 495 495 links to the description of driver-specific traps registered by various device 496 496 drivers: 497 497 498 - * :doc:`netdevsim` 499 - * :doc:`mlxsw` 498 + * Documentation/networking/devlink/netdevsim.rst 499 + * Documentation/networking/devlink/mlxsw.rst 500 500 501 501 .. _Generic-Packet-Trap-Groups: 502 502

+1 -1

Documentation/networking/packet_mmap.rst

··· 153 153 struct ifreq s_ifr; 154 154 ... 155 155 156 - strncpy (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name)); 156 + strscpy_pad (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name)); 157 157 158 158 /* get interface index of eth0 */ 159 159 ioctl(this->socket, SIOCGIFINDEX, &s_ifr);

+1 -1

Documentation/networking/tuntap.rst

··· 107 107 */ 108 108 ifr.ifr_flags = IFF_TUN; 109 109 if( *dev ) 110 - strncpy(ifr.ifr_name, dev, IFNAMSIZ); 110 + strscpy_pad(ifr.ifr_name, dev, IFNAMSIZ); 111 111 112 112 if( (err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0 ){ 113 113 close(fd);

+15 -17

Documentation/process/submitting-patches.rst

··· 10 10 11 11 This document contains a large number of suggestions in a relatively terse 12 12 format. For detailed information on how the kernel development process 13 - works, see :doc:`development-process`. Also, read :doc:`submit-checklist` 13 + works, see Documentation/process/development-process.rst. Also, read 14 + Documentation/process/submit-checklist.rst 14 15 for a list of items to check before submitting code. If you are submitting 15 - a driver, also read :doc:`submitting-drivers`; for device tree binding patches, 16 - read :doc:`submitting-patches`. 16 + a driver, also read Documentation/process/submitting-drivers.rst; for device 17 + tree binding patches, read Documentation/process/submitting-patches.rst. 17 18 18 19 This documentation assumes that you're using ``git`` to prepare your patches. 19 20 If you're unfamiliar with ``git``, you would be well-advised to learn how to ··· 179 178 ------------------------ 180 179 181 180 Check your patch for basic style violations, details of which can be 182 - found in 183 - :ref:`Documentation/process/coding-style.rst <codingstyle>`. 181 + found in Documentation/process/coding-style.rst. 184 182 Failure to do so simply wastes 185 183 the reviewers time and will get your patch rejected, probably 186 184 without even being read. ··· 238 238 to security@kernel.org. For severe bugs, a short embargo may be considered 239 239 to allow distributors to get the patch out to users; in such cases, 240 240 obviously, the patch should not be sent to any public lists. See also 241 - :doc:`/admin-guide/security-bugs`. 241 + Documentation/admin-guide/security-bugs.rst. 242 242 243 243 Patches that fix a severe bug in a released kernel should be directed 244 244 toward the stable maintainers by putting a line like this:: ··· 246 246 Cc: stable@vger.kernel.org 247 247 248 248 into the sign-off area of your patch (note, NOT an email recipient). 
You 249 - should also read 250 - :ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>` 251 - in addition to this file. 249 + should also read Documentation/process/stable-kernel-rules.rst 250 + in addition to this document. 252 251 253 252 If changes affect userland-kernel interfaces, please send the MAN-PAGES 254 253 maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at ··· 304 305 Exception: If your mailer is mangling patches then someone may ask 305 306 you to re-send them using MIME. 306 307 307 - See :doc:`/process/email-clients` for hints about configuring your e-mail 308 - client so that it sends your patches untouched. 308 + See Documentation/process/email-clients.rst for hints about configuring 309 + your e-mail client so that it sends your patches untouched. 309 310 310 311 Respond to review comments 311 312 -------------------------- ··· 323 324 reviewers sometimes get grumpy. Even in that case, though, respond 324 325 politely and address the problems they have pointed out. 325 326 326 - See :doc:`email-clients` for recommendations on email 327 + See Documentation/process/email-clients.rst for recommendations on email 327 328 clients and mailing list etiquette. 328 329 329 330 ··· 561 562 for more details. 562 563 563 564 Note: Attaching a Fixes: tag does not subvert the stable kernel rules 564 - process nor the requirement to Cc: stable@vger.kernel.org on all stable 565 + process nor the requirement to Cc: stable@vger.kernel.org on all stable 565 566 patch candidates. For more information, please read 566 - :ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>` 567 - 567 + Documentation/process/stable-kernel-rules.rst. 568 + 568 569 .. _the_canonical_patch_format: 569 570 570 571 The canonical patch format ··· 823 824 NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people! 
824 825 <https://lore.kernel.org/r/20050711.125305.08322243.davem@davemloft.net> 825 826 826 - Kernel Documentation/process/coding-style.rst: 827 - :ref:`Documentation/process/coding-style.rst <codingstyle>` 827 + Kernel Documentation/process/coding-style.rst 828 828 829 829 Linus Torvalds's mail on the canonical patch format: 830 830 <https://lore.kernel.org/r/Pine.LNX.4.58.0504071023190.28951@ppc970.osdl.org>

+1 -1

Documentation/scheduler/sched-bwc.rst

··· 29 29 .. note:: 30 30 The cgroupfs files described in this section are only applicable 31 31 to cgroup v1. For cgroup v2, see 32 - :ref:`Documentation/admin-guide/cgroupv2.rst <cgroup-v2-cpu>`. 32 + :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2-cpu>`. 33 33 34 34 - cpu.cfs_quota_us: the total available run-time within a period (in 35 35 microseconds)

+1 -1

Documentation/scheduler/sched-nice-design.rst

··· 60 60 coupling to timeslices and granularity it was not really viable. 61 61 62 62 The second (less frequent but still periodically occurring) complaint 63 - about Linux's nice level support was its assymetry around the origo 63 + about Linux's nice level support was its asymmetry around the origin 64 64 (which you can see demonstrated in the picture above), or more 65 65 accurately: the fact that nice level behavior depended on the _absolute_ 66 66 nice level as well, while the nice API itself is fundamentally

+2 -1

Documentation/security/landlock.rst

··· 25 25 evaluated according to the inherited ones in a way that ensures that only more 26 26 constraints can be added. 27 27 28 - User space documentation can be found here: :doc:`/userspace-api/landlock`. 28 + User space documentation can be found here: 29 + Documentation/userspace-api/landlock.rst. 29 30 30 31 Guiding principles for safe access controls 31 32 ===========================================

+1 -1

Documentation/trace/coresight/coresight-etm4x-reference.rst

··· 427 427 :Syntax: 428 428 ``echo idx > vmid_idx`` 429 429 430 - Where idx < numvmidc 430 + Where idx < numvmidc 431 431 432 432 ---- 433 433

+5 -3

Documentation/trace/coresight/coresight.rst

··· 315 315 316 316 Note: ``cti_sys0`` appears in two of the connections lists above. 317 317 CTIs can connect to multiple devices and are arranged in a star topology 318 - via the CTM. See (:doc:`coresight-ect`) [#fourth]_ for further details. 318 + via the CTM. See (Documentation/trace/coresight/coresight-ect.rst) 319 + [#fourth]_ for further details. 319 320 Looking at this device we see 4 connections:: 320 321 321 322 linaro-developer:~# ls -l /sys/bus/coresight/devices/cti_sys0/connections ··· 607 606 crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0 608 607 root@genericarmv8:~# 609 608 610 - Details on how to use the generic STM API can be found here:- :doc:`../stm` [#second]_. 609 + Details on how to use the generic STM API can be found here: 610 + - Documentation/trace/stm.rst [#second]_. 611 611 612 612 The CTI & CTM Modules 613 613 --------------------- ··· 618 616 channels on the CTM (Cross Trigger Matrix). 619 617 620 618 A separate documentation file is provided to explain the use of these devices. 621 - (:doc:`coresight-ect`) [#fourth]_. 619 + (Documentation/trace/coresight/coresight-ect.rst) [#fourth]_. 622 620 623 621 624 622 .. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm

+3 -3

Documentation/trace/ftrace.rst

··· 40 40 Implementation Details 41 41 ---------------------- 42 42 43 - See :doc:`ftrace-design` for details for arch porters and such. 43 + See Documentation/trace/ftrace-design.rst for details for arch porters and such. 44 44 45 45 46 46 The File System ··· 354 354 is being directly called by the function. If the count is greater 355 355 than 1 it most likely will be ftrace_ops_list_func(). 356 356 357 - If the callback of the function jumps to a trampoline that is 358 - specific to a the callback and not the standard trampoline, 357 + If the callback of a function jumps to a trampoline that is 358 + specific to the callback and which is not the standard trampoline, 359 359 its address will be printed as well as the function that the 360 360 trampoline calls. 361 361

+4

Documentation/translations/index.rst

··· 18 18 Disclaimer 19 19 ---------- 20 20 21 + .. raw:: latex 22 + 23 + \kerneldocCJKoff 24 + 21 25 Translation's purpose is to ease reading and understanding in languages other 22 26 than English. Its aim is to help people who do not understand English or have 23 27 doubts about its interpretation. Additionally, some people prefer to read

+4

Documentation/translations/it_IT/index.rst

··· 4 4 Traduzione italiana 5 5 =================== 6 6 7 + .. raw:: latex 8 + 9 + \kerneldocCJKoff 10 + 7 11 :manutentore: Federico Vaga <federico.vaga@vaga.pv.it> 8 12 9 13 .. _it_disclaimer:

+1 -1

Documentation/translations/it_IT/process/coding-style.rst

··· 62 62 case 'K': 63 63 case 'k': 64 64 mem <<= 10; 65 - /* fall through */ 65 + fallthrough; 66 66 default: 67 67 break; 68 68 }
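Editor's sketch (not part of the commit): the hunk above replaces the `/* fall through */` comment with the `fallthrough;` pseudo-keyword. In the kernel that name is a macro, so code built outside the kernel tree has to define it locally. The block below is a minimal userspace sketch under that assumption. The local `fallthrough` definition and the function name scale_by_suffix() are illustrative, and the 'G' and 'M' cases are filled in by analogy with the 'K' case shown in the hunk.

```c
/* Assumption: a local stand-in for the kernel's fallthrough macro. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical helper: scale a value by a K/M/G size suffix. */
static unsigned long scale_by_suffix(unsigned long mem, char suffix)
{
	switch (suffix) {
	case 'G':
	case 'g':
		mem <<= 30;
		break;
	case 'M':
	case 'm':
		mem <<= 20;
		break;
	case 'K':
	case 'k':
		mem <<= 10;
		fallthrough; /* explicit: continue into default on purpose */
	default:
		break;
	}
	return mem;
}
```

The explicit statement lets compilers that warn about implicit fall-through tell an intentional case from a forgotten break.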

+3 -2

Documentation/translations/ja_JP/index.rst

··· 1 1 .. raw:: latex 2 2 3 - \renewcommand\thesection* 4 - \renewcommand\thesubsection* 3 + \renewcommand\thesection* 4 + \renewcommand\thesubsection* 5 + \kerneldocCJKon 5 6 6 7 Japanese translations 7 8 =====================

+3 -2

Documentation/translations/ko_KR/index.rst

··· 1 1 .. raw:: latex 2 2 3 - \renewcommand\thesection* 4 - \renewcommand\thesubsection* 3 + \renewcommand\thesection* 4 + \renewcommand\thesubsection* 5 + \kerneldocCJKon 5 6 6 7 Korean translations 7 8 ===================

+1 -1

Documentation/translations/zh_CN/admin-guide/index.rst

··· 65 65 66 66 clearing-warn-once 67 67 cpu-load 68 + lockup-watchdogs 68 69 unicode 69 70 70 71 Todolist: ··· 101 100 laptops/index 102 101 lcd-panel-cgram 103 102 ldm 104 - lockup-watchdogs 105 103 LSM/index 106 104 md 107 105 media/index

+66

Documentation/translations/zh_CN/admin-guide/lockup-watchdogs.rst

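··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/admin-guide/lockup-watchdogs.rst 4 + :Translator: Hailong Liu <liu.hailong6@zte.com.cn> 5 + 6 + .. _cn_lockup-watchdogs: 7 + 8 + 9 + ====================================================== 10 + Softlockup and hardlockup detection (aka nmi_watchdog) 11 + ====================================================== 12 + 13 + The Linux kernel implements a watchdog mechanism for detecting softlockups and hardlockups in the system. 14 + 15 + A softlockup is a bug that causes the system to loop in kernel mode for more than 20 seconds (see the 16 + "Implementation" section below), leaving other tasks no chance to run. Once a 'softlockup' is detected, 17 + the system by default prints the current stack trace and stays locked up. It can also be configured to 18 + panic when a 'softlockup' is detected: setting the sysctl "kernel.softlockup_panic", using the kernel boot 19 + parameter "softlockup_panic" (see Documentation/admin-guide/kernel-parameters.rst), and enabling the 20 + kernel build option "BOOTPARAM_SOFTLOCKUP_PANIC" all achieve this. 21 + 22 + A 'hardlockup' is a bug that causes the system to loop in kernel mode for more than 10 seconds (see the 23 + "Implementation" section), leaving other interrupts no chance to run. As in the 'softlockup' case, once a 24 + 'hardlockup' is detected the system by default prints the current stack trace and then stays locked up, 25 + unless this is changed with the sysctl 'hardlockup_panic', the kernel option "BOOTPARAM_HARDLOCKUP_PANIC", or 26 + the kernel parameter "nmi_watchdog" (see Documentation/admin-guide/kernel-parameters.rst). 27 + 28 + This panic option can also be combined with panic_timeout (which is set through the somewhat confusingly 29 + named sysctl "kernel.panic") so that the system reboots automatically a specified time after the panic. 30 + 31 + Implementation 32 + ============== 33 + 34 + The softlockup and hardlockup detectors are built on top of the hrtimer (high-resolution timer) and perf subsystems, respectively. 35 + This means that, in principle, any architecture that implements these two subsystems supports both detection mechanisms. 36 + 37 + The hrtimer periodically generates interrupts and wakes the watchdog thread; an NMI perf event is generated every "watchdog_thresh" 38 + seconds (initialized to 10 seconds at compile time and adjustable through the "watchdog_thresh" sysctl interface) 39 + to check for hardlockups. If a CPU does not see any hrtimer interrupt during that 40 + period, the 'hardlockup detector' (the NMI perf event handler) will, depending on the system configuration, either generate a kernel 41 + warning or panic directly. 42 + 43 + The watchdog thread is essentially a high-priority kernel thread that updates a timestamp every time it is scheduled. 44 + If the timestamp is not updated for 2*watchdog_thresh (the softlockup threshold), 45 + the 'softlockup detector' (the hrtimer callback function) prints the relevant debug information to the system log, 46 + and then panics if the system is configured to do so; otherwise the kernel continues execution. 47 + 48 + The hrtimer period is 2*watchdog_thresh/5, which means the hrtimer has 49 + two or three chances to generate a timer interrupt before the hardlockup detector triggers. 50 + 51 + As described above, the kernel effectively gives system administrators a knob for adjusting the period of the hrtimer and of the perf event. 52 + Choosing a reasonable value for a particular use case with this knob is a trade-off between 53 + how quickly lockups are detected and the overhead of detecting them. 54 +
55 + By default a watchdog thread runs on every online cpu. However, when the kernel is configured with "NO_HZ_FULL", 56 + watchdog threads by default run only on the housekeeping cpus, and no watchdog thread runs on the cpus named by the "nohz_full" boot 57 + parameter. Consider: if we allowed watchdog threads to run on the cpus named by "nohz_full", 58 + those cpus would have to run the timer tick in order to schedule the watchdog thread; that would 59 + defeat the purpose of "nohz_full", which is to protect user programs from kernel interference. The side effect, of course, is that 60 + a lockup in the kernel on a "nohz_full" cpu cannot be detected. At least we can still let the watchdog 61 + threads keep running on the housekeeping (non-tickless) cores so that lockups on those cpus continue to be 62 + monitored normally. 63 + 64 + In either case, the set of cpus that do not run a watchdog thread can be adjusted with the sysctl kernel.watchdog_cpumask. 65 + For nohz_full, if a nohz_full cpu hangs abnormally, 66 + turning on the watchdog on those cpus in this way may help with debugging.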
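Editor's sketch (not part of the commit): the timing relationships in the lockup-watchdogs text above can be checked with a little arithmetic. The block assumes only the two formulas quoted there, a softlockup threshold of 2*watchdog_thresh and an hrtimer period of 2*watchdog_thresh/5. The function names are hypothetical, and values are kept in milliseconds so that integer division stays exact.

```c
/* Hypothetical helpers; they only mirror the formulas quoted in the text. */
static unsigned long softlockup_thresh_ms(unsigned long watchdog_thresh_s)
{
	return 2UL * watchdog_thresh_s * 1000UL;
}

static unsigned long hrtimer_period_ms(unsigned long watchdog_thresh_s)
{
	return softlockup_thresh_ms(watchdog_thresh_s) / 5UL;
}

/* Whole hrtimer periods that fit into one hardlockup check window. */
static unsigned long hrtimer_periods_per_window(unsigned long watchdog_thresh_s)
{
	unsigned long period = hrtimer_period_ms(watchdog_thresh_s);

	if (period == 0)
		return 0; /* watchdog_thresh of 0: nothing to count */
	return (watchdog_thresh_s * 1000UL) / period;
}
```

With the default watchdog_thresh of 10 this gives a 20 s softlockup threshold, a 4 s hrtimer period, and two full periods per 10 s window, which is consistent with the "two or three chances" mentioned in the text (the third depends on where the window starts relative to the timer).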

+336

Documentation/translations/zh_CN/core-api/cachetlb.rst

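··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/core-api/cachetlb.rst 4 + 5 + :Translator: 6 + 7 + 司延腾 Yanteng Si <siyanteng@loongson.cn> 8 + 9 + :Proofreader: 10 + 11 + 吴想成 Wu XiangCheng <bobwxc@email.cn> 12 + 13 + .. _cn_core-api_cachetlb: 14 + 15 + ================================== 16 + Cache and TLB Flushing Under Linux 17 + ================================== 18 + 19 + :Author: David S. Miller <davem@redhat.com> 20 + 21 + *Translator's note: TLB, Translation Lookaside Buffer, a cache of page-table translations* 22 + 23 + This document describes the cache/TLB flushing interfaces called by the Linux virtual memory subsystem. It enumerates each 24 + interface, describes its intended purpose, and the side effects expected after the interface is invoked. 25 + 26 + The side effects described below are stated for a uniprocessor implementation, and for what happens on a single processor. For 27 + SMP, simply extend the definition slightly so that the side effect of a particular interface occurs on all 28 + processors in the system. Don't let this scare you into thinking that SMP cache/tlb flushing must be very 29 + inefficient; in fact, this is an area where many optimizations are possible. For example, if it can be proven that a 30 + user address space has never executed on a given cpu (see mm_cpumask()), there is no need to flush 31 + that address space on that cpu. 32 + 33 + First come the TLB flushing interfaces, since they are the simplest. Under Linux, the TLB is abstracted as something the cpu 34 + uses to cache virtual->physical address translations obtained from the software page tables. This means that if the software page 35 + tables change, stale (dirty) translations may exist in this "TLB" cache. Therefore, when the software page tables 36 + change, the kernel invokes one of the following flush methods *after* the page table change occurs: 37 + 38 + 1) ``void flush_tlb_all(void)`` 39 + 40 + The most severe flush of all. After this interface runs, any previous page table modification will be visible to the cpu. 41 + 42 + This is usually invoked when the kernel page tables are changed, since such translations are "global" in nature. 43 + 44 + 2) ``void flush_tlb_mm(struct mm_struct *mm)`` 45 + 46 + This interface flushes an entire user address space from the TLB. After running, this interface must make sure 47 + that any previous page table modifications for the address space 'mm' are visible to the cpu. That is, after 48 + running, there will be no entries in the TLB for 'mm'. 49 + 50 + This interface is used to handle whole address space page table operations, such as what happens during fork and 51 + exec. 52 + 53 + 3) ``void flush_tlb_range(struct vm_area_struct *vma, 54 + unsigned long start, unsigned long end)`` 55 + 56 + Here we are flushing a specific range of (user) virtual address translations from the TLB. After running, 57 + this interface must make sure that any previous page table modifications for the address space 'vma->vm_mm' in the range 'start' to 'end-1' 58 + are visible to the cpu. That is, after running, there will be no entries in the TLB 59 + for 'mm' for virtual addresses in the range 'start' to 'end-1'. 60 + 61 + The "vma" is the backing store being used for the region. Primarily, this is used for munmap() type operations. 62 + 63 + The interface is provided in hopes that the port can find a suitably efficient method for removing multiple 64 + page sized translations from the TLB, instead of having the kernel call 65 + flush_tlb_page (see below) for each entry which may be modified. 66 + 67 + 4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)`` 68 + 69 + This time we need to remove a PAGE_SIZE sized translation from the TLB. The 'vma' is the backing structure used by Linux to keep 70 + track of mmap'd regions for a process; the address space is available via vma->vm_mm. 71 + Also, one may test (vma->vm_flags & VM_EXEC) to see if this region is 72 + executable (and thus could be in the "instruction TLB" in split-tlb type setups). 73 + 74 + After running, this interface must make sure that any previous page table modification for address space 75 + "vma->vm_mm" for user virtual address "addr" is visible to the cpu. That is, after running, there will be no entries in the TLB 76 + for 'vma->vm_mm' for virtual address 'addr'. 77 + 78 + This is used primarily during fault processing. 79 +
80 + 5) ``void update_mmu_cache(struct vm_area_struct *vma, 81 + unsigned long address, pte_t *ptep)`` 82 + 83 + At the end of every page fault, this routine is invoked to tell the architecture specific code that 84 + a translation now exists in the software page tables for address space "vma->vm_mm" at virtual address "address". 85 + 86 + 87 + A port may use this information in any way it so chooses. For example, it could use this 88 + event to pre-load TLB translations for software managed TLB configurations. The sparc64 port currently does 89 + this. 90 + 91 + Next, we have the cache flushing interfaces. In general, when Linux is changing an existing virtual->physical mapping 92 + to a new value, the sequence will be in one of the following forms:: 93 + 94 + 1) flush_cache_mm(mm); 95 + change_all_page_tables_of(mm); 96 + flush_tlb_mm(mm); 97 + 98 + 2) flush_cache_range(vma, start, end); 99 + change_range_of_page_tables(mm, start, end); 100 + flush_tlb_range(vma, start, end); 101 + 102 + 3) flush_cache_page(vma, addr, pfn); 103 + set_pte(pte_pointer, new_pte_val); 104 + flush_tlb_page(vma, addr); 105 + 106 + The cache level flush will always be first, because this allows us to properly handle systems whose caches are strict 107 + and require a virtual->physical translation to exist for a virtual address when that virtual address is flushed from the cache. 108 + The HyperSparc cpu is one such cpu with this attribute. 109 + 110 + The cache flushing routines below need only deal with cache flushing to the extent that it is necessary for a particular cpu. Mostly, 111 + these routines must be implemented for cpus which have virtually indexed caches which must be flushed when virtual->physical 112 + translations are changed or removed. So, for example, the physically indexed 113 + physically tagged caches of IA32 processors have no need to implement these interfaces since the caches are fully synchronized and 114 + have no dependency on translation information. 115 + 116 + Here are the routines, one by one: 117 + 118 + 1) ``void flush_cache_mm(struct mm_struct *mm)`` 119 + 120 + This interface flushes an entire user address space from the caches. That is, after running, 121 + there will be no cache lines associated with 'mm'. 122 + 123 + This interface is used to handle whole address space page table operations such as what happens during exit and 124 + exec. 125 + 126 + 2) ``void flush_cache_dup_mm(struct mm_struct *mm)`` 127 + 128 + This interface flushes an entire user address space from the caches. That is, after 129 + running, there will be no cache lines associated with 'mm'. 130 + 131 + This interface is used to handle whole address space page table operations such as what happens during 132 + fork. 133 + 134 + This option is separate from flush_cache_mm to allow some optimizations for VIPT caches. 135 + 136 + 3) ``void flush_cache_range(struct vm_area_struct *vma, 137 + unsigned long start, unsigned long end)`` 138 + 139 + Here we are flushing a specific range of (user) virtual addresses from the cache. After 140 + running, there will be no entries in the cache for 'vma->vm_mm' for virtual addresses in the range "start" to "end-1". 141 + 142 + 143 + The "vma" is the backing store being used for the region. Primarily, this is used for munmap() type operations. 144 + 145 + The interface is provided in hopes that the port can find a suitably efficient method for removing 146 + multiple page sized regions from the cache, instead of having the kernel call 147 + flush_cache_page (see below) for each entry which may be modified. 148 +
149 + 4) ``void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)`` 150 + 151 + This time we need to remove a PAGE_SIZE sized range from the cache. The "vma" is the 152 + backing structure used by Linux to keep track of mmap'd regions for a process; the address space is available via 153 + vma->vm_mm. Also, one may test (vma->vm_flags & 154 + VM_EXEC) to see if this region is executable (and thus could be in the "instruction cache" in "Harvard" 155 + type cache layouts). 156 + 157 + The "pfn" indicates the physical page frame that "addr" translates to (shift this value left by PAGE_SHIFT 158 + to get the physical address). It is this mapping which should be removed from the cache. 159 + 160 + After running, there will be no entries in the cache for 'vma->vm_mm' for virtual address 'addr' 161 + which translates to 'pfn'. 162 + 163 + This is used primarily during fault processing. 164 + 165 + 5) ``void flush_cache_kmaps(void)`` 166 + 167 + This routine need only be implemented if the platform utilizes highmem. It will be called right before all of the 168 + kmaps are invalidated. 169 + 170 + After running, there will be no entries in the cache for the kernel virtual address range PKMAP_ADDR(0) to PKMAP_ADDR(LAST_PKMAP). 171 + 172 + 173 + This routine should be implemented in asm/highmem.h. 174 + 175 + 6) ``void flush_cache_vmap(unsigned long start, unsigned long end)`` 176 + ``void flush_cache_vunmap(unsigned long start, unsigned long end)`` 177 + 178 + Here in these two interfaces we are flushing a specific range of (kernel) 179 + virtual addresses from the cache. After running, there will be no entries in the cache for the kernel 180 + address space for virtual addresses in the range "start" to "end-1". 181 + 182 + The first of these two routines is invoked after vmap_range() has installed the page table entries. 183 + The second is invoked before vunmap_range() deletes the page table entries. 184 + 185 + There exists another whole class of cpu cache issues which currently require a whole different set of interfaces to handle properly. The biggest 186 + problem is that of virtual aliasing in the data cache of a processor. 187 + 188 + .. note:: 189 + 190 + This passage is somewhat obscure, so this translator's note was added to make it easier for Chinese readers. 191 + 192 + An alias is a cache coherency problem: when different virtual addresses map to the same 193 + physical address and those virtual addresses have different index values, aliasing occurs (the multiple 194 + virtual addresses are called aliases). Put simply, aliasing happens when the data at one physical address is 195 + loaded into different cache lines. 196 + 197 + There are two common solutions. The first is to maintain coherency in hardware, designing specific cpu 198 + circuitry to solve the problem (for example, a PIPT cache). The second is to maintain coherency in software, 199 + which is the sparc solution described below, page coloring; it involves too many technical details 200 + for the translator to expand on here, so readers should consult the relevant material themselves. 201 + 202 + Is your port susceptible to virtual aliasing in its D-cache? Well, if your D-cache 203 + is virtually indexed, is larger in size than PAGE_SIZE, and does not prevent multiple cache lines for the same 204 + physical address from existing at once, you have this problem. 205 + 206 + If your D-cache has this problem, first define asm/shmparam.h SHMLBA properly; 207 + it should essentially be the size of your virtually addressed D-cache (or if the size is variable, 208 + the largest possible size). This setting will force the SYSv IPC layer to only allow user processes to map 209 + shared memory at addresses which are a multiple of this value. 210 +
211 + .. note:: 212 + 213 + This does not fix shared mmaps; check out the sparc64 port for one way to solve 214 + this (in particular SPARC_FLAG_MMAPSHARED). 215 + 216 + Next, you have to solve the D-cache aliasing issue for all other cases. Please keep in mind the 217 + fact that, for a given page mapped into some user address space, there is always at least one more 218 + mapping, that of the kernel in its linear mapping starting at PAGE_OFFSET. So, once the first 219 + user maps a given physical page into its address space, the D-cache aliasing 220 + problem has the potential to exist, since the kernel already maps this page at its virtual address. 221 + 222 + ``void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)`` 223 + ``void clear_user_page(void *to, unsigned long addr, struct page *page)`` 224 + 225 + These two routines store data in user anonymous or COW pages. They allow a port to efficiently 226 + avoid D-cache alias issues between userspace and the kernel. 227 + 228 + For example, a port may temporarily map "from" and "to" to kernel 229 + virtual addresses during the copy. The virtual addresses for these two pages are chosen in such a way that the kernel load/store 230 + instructions happen to virtual addresses which are of the same 231 + "color" as the user mapping of the page. Sparc64, for example, uses this technique. 232 + 233 + The "addr" parameter tells the virtual address where the user will ultimately have this page mapped, and the "page" 234 + parameter gives a pointer to the struct page of the target. 235 + 236 + If D-cache aliasing is not an issue, these two routines may simply call 237 + memcpy/memset directly and do nothing more. 238 + 239 + ``void flush_dcache_page(struct page *page)`` 240 + 241 + Any time the kernel writes to a page cache page, or the kernel is about to read from a page cache 242 + page and user space shared/writable mappings of this page potentially exist, 243 + this routine is called. 244 + 245 + .. note:: 246 + 247 + This routine need only be called for page cache pages which can potentially ever be mapped into the address space of 248 + a user process. So, for example, VFS layer code handling vfs symlinks in the page cache 249 + need not call this interface at all. 250 + 251 + The phrase "kernel writes to a page cache page" means, specifically, that the kernel executes store 252 + instructions that dirty data in that page at the page->virtual mapping of that page. It is important 253 + to flush here to handle D-cache aliasing, to make sure these kernel stores are 254 + visible to user space mappings of that page. 255 + 256 + The corollary case is just as important: if there are users which have shared+writable mappings of this file, 257 + we must make sure that kernel reads of these pages will see the most recent stores done by the user. 258 + 259 + If D-cache aliasing is not an issue, this routine may simply be defined as a nop on that 260 + architecture. 261 + 262 + There is a bit set aside in page->flags (PG_arch_1) as "architecture private". The kernel guarantees that, 263 + for pagecache pages, it will clear this bit when such a page first enters the 264 + pagecache. 265 + 266 + This allows these interfaces to be implemented much more efficiently. If there are currently no user processes mapping this 267 + page, it allows one to "defer" (perhaps indefinitely) the actual flush. See 268 + sparc64's flush_dcache_page and update_mmu_cache implementations for an example of how 269 + to go about doing this. 270 + 271 + The idea is, first at flush_dcache_page() time, if page->mapping->i_mmap 272 + is an empty tree, just mark the architecture private page flag bit. Later, in update_mmu_cache(), 273 + a check is made of this flag bit, and if set the flush is done and the flag bit is cleared. 274 +
275 + .. important:: 276 + 277 + It is often important, if you defer the flush, that the actual flush occurs on the same 278 + CPU as did the cpu stores into the page to make it dirty. Again, see 279 + sparc64 for examples of how to deal with this. 280 + 281 + ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page, 282 + unsigned long user_vaddr, void *dst, void *src, int len)`` 283 + ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page, 284 + unsigned long user_vaddr, void *dst, void *src, int len)`` 285 + 286 + When the kernel needs to copy arbitrary data in and out of arbitrary user pages (for example for ptrace()), it will 287 + use these two routines. 288 + 289 + Any necessary cache flushing or other coherency operations that need to occur should happen here. If 290 + the processor's instruction cache does not snoop cpu stores, it is very likely that you will need to flush the instruction cache for 291 + copy_to_user_page(). 292 + 293 + ``void flush_anon_page(struct vm_area_struct *vma, struct page *page, 294 + unsigned long vmaddr)`` 295 + 296 + When the kernel needs to access the contents of an anonymous page, it calls this function (currently only 297 + get_user_pages()). Note: flush_dcache_page() deliberately doesn't work for an anonymous 298 + page. The default implementation is a nop (and should remain so for all coherent architectures). For incoherent 299 + architectures, it should flush the cache of the page at vmaddr. 300 + 301 + ``void flush_kernel_dcache_page(struct page *page)`` 302 + 303 + When the kernel needs to modify a user page it has obtained with kmap, it calls this function after all modifications are complete (but before 304 + kunmapping it) to bring the underlying page up to date. It is assumed here that the 305 + user has no incoherent cached copies (i.e. the original page was obtained from a mechanism like get_user_pages()). 306 + The default implementation is a nop and should remain so on all coherent architectures. On 307 + incoherent architectures, this should flush the kernel cache for page (using page_address(page)). 308 + 309 +
310 + ``void flush_icache_range(unsigned long start, unsigned long end)`` 311 + 312 + When the kernel stores into addresses that it will execute out of (for example when loading modules), this function is called. 313 + 314 + If the icache does not snoop stores then this routine will need to flush it. 315 + 316 + ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)`` 317 + 318 + All the functionality of flush_icache_page can be implemented in flush_dcache_page and update_mmu_cache. 319 + In the future, the hope is to remove this interface completely. 320 + 321 + The final category of APIs is for I/O to deliberately aliased address ranges inside the kernel. Such aliases are set up by use of the 322 + vmap/vmalloc API. Since kernel I/O goes via physical pages, the I/O subsystem assumes that the user 323 + mapping and kernel offset mapping are the only aliases. This isn't true for vmap aliases, so anything in the kernel 324 + trying to do I/O to vmap areas must manually manage coherency. It must do this by flushing the vmap 325 + range before doing I/O and invalidating it after the I/O returns. 326 + 327 + ``void flush_kernel_vmap_range(void *vaddr, int size)`` 328 + 329 + Flushes the kernel cache for a given virtual address range in the vmap area. This is to make sure that any data the kernel modified in the vmap range 330 + is made visible to the physical page. The design is to make this area safe to perform 331 + I/O on. Note that this API does *not* also flush the offset map alias of the area. 332 + 333 + ``void invalidate_kernel_vmap_range(void *vaddr, int size)`` 334 + 335 + Invalidates the cache for a given virtual address range in the vmap area, which prevents the processor from making the cache stale by speculatively reading data 336 + while the I/O was occurring to the physical pages. This is only necessary for data reads into the vmap area.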
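Editor's sketch (not part of the commit): the D-cache aliasing condition and the "color" idea in the cachetlb text above can be illustrated with a simplified userspace model. It assumes a virtually indexed D-cache whose indexed span is a multiple of the page size. DEMO_PAGE_SIZE, DEMO_DCACHE_SPAN and the function names are hypothetical values chosen for illustration and are not taken from any architecture.

```c
#define DEMO_PAGE_SIZE   4096UL  /* assumption: 4 KiB pages */
#define DEMO_DCACHE_SPAN 16384UL /* assumption: 16 KiB virtually indexed span */

/* Color of a virtual address: which page-sized slot of the cache it indexes. */
static unsigned long page_color(unsigned long vaddr)
{
	return (vaddr % DEMO_DCACHE_SPAN) / DEMO_PAGE_SIZE;
}

/*
 * In this model, two virtual mappings of the same physical page can land in
 * different cache lines (alias) only when their colors differ.
 */
static int may_alias(unsigned long vaddr_a, unsigned long vaddr_b)
{
	return page_color(vaddr_a) != page_color(vaddr_b);
}

/* SHMLBA-style constraint: span-aligned addresses all share color 0. */
static int is_span_aligned(unsigned long vaddr)
{
	return (vaddr % DEMO_DCACHE_SPAN) == 0;
}
```

This is why the text asks for SHMLBA to be the D-cache size: mappings forced onto multiples of that size always get the same color, so they cannot alias each other in this model. It is also the idea behind picking kernel addresses of the same color in copy_user_page().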

+17 -7

Documentation/translations/zh_CN/core-api/index.rst

··· 19 19 large amount of kerneldoc information left over; someday, if somebody has the motivation, it should be split 20 20 up. 21 21 22 - Todolist: 22 + .. toctree:: 23 + :maxdepth: 1 23 24 24 25 kernel-api 25 - workqueue 26 26 printk-basics 27 27 printk-formats 28 + workqueue 28 29 symbol-namespaces 29 30 30 31 Data structures and low-level utilities ··· 33 32 34 33 Library functionality that is used throughout the kernel. 35 34 36 - Todolist: 35 + .. toctree:: 36 + :maxdepth: 1 37 37 38 38 kobject 39 + 40 + Todolist: 41 + 39 42 kref 40 43 assoc_array 41 44 xarray ··· 63 58 :maxdepth: 1 64 59 65 60 irq/index 66 - 67 - Todolist: 68 - 69 61 refcount-vs-atomic 70 62 local_ops 71 63 padata 64 + 65 + Todolist: 66 + 72 67 ../RCU/index 73 68 74 69 Low-level hardware management ··· 76 71 77 72 Cache management, CPU hotplug management, etc. 78 73 79 - Todolist: 74 + .. toctree:: 75 + :maxdepth: 1 80 76 81 77 cachetlb 78 + 79 + Todolist: 80 + 81 + 82 82 cpu_hotplug 83 83 memory-hotplug 84 84 genericirq

+369

Documentation/translations/zh_CN/core-api/kernel-api.rst

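··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/core-api/kernel-api.rst 4 + :Translator: Yanteng Si <siyanteng@loongson.cn> 5 + 6 + .. _cn_kernel-api.rst: 7 + 8 + 9 + ==================== 10 + The Linux Kernel API 11 + ==================== 12 + 13 + 14 + List Management Functions 15 + ========================= 16 + 17 + This API is found in the following kernel code: 18 + 19 + include/linux/list.h 20 + 21 + Basic C Library Functions 22 + ========================= 23 + 24 + When writing drivers, you cannot in general use routines from the C library. Some of the functions are generally useful, and they 25 + are listed below. The behaviour of these functions may vary slightly from those defined by ANSI, and these deviations are 26 + noted in the text. 27 + 28 + String Conversions 29 + ------------------ 30 + 31 + This API is found in the following kernel code: 32 + 33 + lib/vsprintf.c 34 + 35 + include/linux/kernel.h 36 + 37 + include/linux/kernel.h 38 + 39 + lib/kstrtox.c 40 + 41 + lib/string_helpers.c 42 + 43 + String Manipulation 44 + ------------------- 45 + 46 + This API is found in the following kernel code: 47 + 48 + lib/string.c 49 + 50 + include/linux/string.h 51 + 52 + mm/util.c 53 + 54 + Basic Kernel Library Functions 55 + ============================== 56 + 57 + The Linux kernel provides many useful basic functions. 58 + 59 + Bit Operations 60 + -------------- 61 + 62 + This API is found in the following kernel code: 63 + 64 + include/asm-generic/bitops/instrumented-atomic.h 65 + 66 + include/asm-generic/bitops/instrumented-non-atomic.h 67 + 68 + include/asm-generic/bitops/instrumented-lock.h 69 + 70 + Bitmap Operations 71 + ----------------- 72 + 73 + This API is found in the following kernel code: 74 + 75 + lib/bitmap.c 76 + 77 + include/linux/bitmap.h 78 + 79 + include/linux/bitmap.h 80 + 81 + include/linux/bitmap.h 82 + 83 + lib/bitmap.c 84 + 85 + lib/bitmap.c 86 + 87 + include/linux/bitmap.h 88 + 89 + Command-line Parsing 90 + -------------------- 91 + 92 + This API is found in the following kernel code: 93 + 94 + lib/cmdline.c 95 + 96 + Sorting 97 + ------- 98 + 99 + This API is found in the following kernel code: 100 + 101 + lib/sort.c 102 + 103 + lib/list_sort.c 104 + 105 + Text Searching 106 + -------------- 107 + 108 + This API is found in the following kernel code: 109 + 110 + lib/textsearch.c 111 + 112 + lib/textsearch.c 113 + 114 + include/linux/textsearch.h 115 + 116 + CRC and Math Functions in Linux 117 + =============================== 118 + 119 + 120 + CRC Functions 121 + ------------- 122 + 123 + *Translator's note: CRC, Cyclic Redundancy Check* 124 + 125 + This API is found in the following kernel code: 126 + 127 + lib/crc4.c 128 + 129 + lib/crc7.c 130 + 131 + lib/crc8.c 132 + 133 + lib/crc16.c 134 + 135 + lib/crc32.c 136 + 137 + lib/crc-ccitt.c 138 + 139 + lib/crc-itu-t.c 140 +
141 + Base 2 log and power Functions 142 + ------------------------------ 143 + 144 + This API is found in the following kernel code: 145 + 146 + include/linux/log2.h 147 + 148 + Integer power Functions 149 + ----------------------- 150 + 151 + This API is found in the following kernel code: 152 + 153 + lib/math/int_pow.c 154 + 155 + lib/math/int_sqrt.c 156 + 157 + Division Functions 158 + ------------------ 159 + 160 + This API is found in the following kernel code: 161 + 162 + include/asm-generic/div64.h 163 + 164 + include/linux/math64.h 165 + 166 + lib/math/div64.c 167 + 168 + lib/math/gcd.c 169 + 170 + UUID/GUID 171 + --------- 172 + 173 + This API is found in the following kernel code: 174 + 175 + lib/uuid.c 176 + 177 + Kernel IPC facilities 178 + ===================== 179 + 180 + IPC utilities 181 + ------------- 182 + 183 + This API is found in the following kernel code: 184 + 185 + ipc/util.c 186 + 187 + FIFO Buffer 188 + =========== 189 + 190 + kfifo interface 191 + --------------- 192 + 193 + This API is found in the following kernel code: 194 + 195 + include/linux/kfifo.h 196 + 197 + Relay interface support 198 + ======================= 199 + 200 + Relay interface support is designed to provide an efficient mechanism for tools and facilities to relay large amounts of data from kernel space 201 + to user space. 202 + 203 + Relay interface 204 + --------------- 205 + 206 + This API is found in the following kernel code: 207 + 208 + kernel/relay.c 209 + 210 + kernel/relay.c 211 + 212 + Module Support 213 + ============== 214 + 215 + Module Loading 216 + -------------- 217 + 218 + This API is found in the following kernel code: 219 + 220 + kernel/kmod.c 221 + 222 + Module interface support 223 + ------------------------ 224 + 225 + Refer to the file kernel/module.c for more information. 226 + 227 + Hardware Interfaces 228 + =================== 229 + 230 + 231 + This API is found in the following kernel code: 232 + 233 + kernel/dma.c 234 + 235 + Resources Management 236 + -------------------- 237 + 238 + This API is found in the following kernel code: 239 + 240 + kernel/resource.c 241 + 242 + kernel/resource.c 243 + 244 + MTRR Handling 245 + ------------- 246 + 247 + This API is found in the following kernel code: 248 + 249 + arch/x86/kernel/cpu/mtrr/mtrr.c 250 + 251 + Security Framework 252 + ================== 253 + 254 + This API is found in the following kernel code: 255 + 256 + security/security.c 257 + 258 + security/inode.c 259 + 260 + Audit Interfaces 261 + ================ 262 + 263 + This API is found in the following kernel code: 264 + 265 + kernel/audit.c 266 + 267 + kernel/auditsc.c 268 + 269 + kernel/auditfilter.c 270 + 271 + Accounting Framework 272 + ==================== 273 + 274 + This API is found in the following kernel code: 275 + 276 + kernel/acct.c 277 + 278 + Block Devices 279 + ============= 280 + 281 + This API is found in the following kernel code: 282 + 283 + block/blk-core.c 284 + 285 + block/blk-core.c 286 +
287 + block/blk-map.c 288 + 289 + block/blk-sysfs.c 290 + 291 + block/blk-settings.c 292 + 293 + block/blk-exec.c 294 + 295 + block/blk-flush.c 296 + 297 + block/blk-lib.c 298 + 299 + block/blk-integrity.c 300 + 301 + kernel/trace/blktrace.c 302 + 303 + block/genhd.c 304 + 305 + block/genhd.c 306 + 307 + Char devices 308 + ============ 309 + 310 + This API is found in the following kernel code: 311 + 312 + fs/char_dev.c 313 + 314 + Clock Framework 315 + =============== 316 + 317 + The clock framework defines programming interfaces to support software management of the system clock tree. This framework is widely used with System-On-Chip (SOC) 318 + platforms to support power management and various devices which may need custom clock rates. Note that these "clocks" do not relate to timekeeping or real 319 + time clocks (RTCs), each of which has a separate framework. These :c:type:`struct clk <clk>` instances may be used to manage 320 + various clock signals, for example a 96 MHz clock signal that is used for data exchange on busses or peripherals, or that 321 + otherwise triggers synchronous state machine transitions in system hardware. 322 + 323 + Power management is supported by explicit software clock gating: unused clocks are disabled, so the system does not waste power changing the state of transistors that are not in 324 + active use. On some systems this may be backed by hardware clock gating, where clocks are gated 325 + without being disabled in software. Sections of chips that are powered but not clocked may be able to retain their last state. This 326 + low power state is often called a *retention mode*. This mode still incurs leakage currents, especially with finer circuit geometries, 327 + but for CMOS circuits power is mostly consumed by clocked state changes. 328 + 329 + Power-aware drivers only enable their clocks when the device they manage is in active use. Also, system sleep states 330 + often differ according to which clock domains are active: while a "standby" state may allow wakeup from several active domains, a 331 + "mem" (suspend-to-RAM) state may require a more wholesale shutdown of clocks derived from higher speed PLLs and oscillators, limiting the number of possible 332 + wakeup event sources. A driver's suspend method may need to be aware of system-specific clock constraints on the target sleep state. 333 + 334 + Some platforms support programmable clock generators. These can be used by external chips of various kinds, such as other CPUs, multimedia codecs, and 335 + devices with strict requirements for interface clocking. 336 + 337 + This API is found in the following kernel code: 338 + 339 + include/linux/clk.h 340 + 341 + Synchronization Primitives 342 + ========================== 343 + 344 + Read-Copy Update (RCU) 345 + ---------------------- 346 + 347 + This API is found in the following kernel code: 348 + 349 + include/linux/rcupdate.h 350 + 351 + kernel/rcu/tree.c 352 + 353 + kernel/rcu/tree_exp.h 354 + 355 + kernel/rcu/update.c 356 + 357 + include/linux/srcu.h 358 + 359 + kernel/rcu/srcutree.c 360 + 361 + include/linux/rculist_bl.h 362 + 363 + include/linux/rculist.h 364 + 365 + include/linux/rculist_nulls.h 366 + 367 + include/linux/rcu_sync.h 368 + 369 + kernel/rcu/sync.c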

+378

Documentation/translations/zh_CN/core-api/kobject.rst

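··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/core-api/kobject.rst 4 + :Translator: Yanteng Si <siyanteng@loongson.cn> 5 + 6 + .. _cn_core_api_kobject.rst: 7 + 8 + ===================================================================== 9 + Everything you never wanted to know about kobjects, ksets, and ktypes 10 + ===================================================================== 11 + 12 + :Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 13 + :Last updated: December 19, 2007 14 + 15 + Based on an original article by Jon Corbet for lwn.net written October 1, 2003 and located at: 16 + https://lwn.net/Articles/51437/ 17 + 18 + Part of the difficulty in understanding the driver model, and the kobject abstraction upon which it is built, is that there is no obvious starting place. 19 + Dealing with kobjects requires understanding a few different types, all of which make reference to one another. In an attempt to make things 20 + easier, we'll take a multi-pass approach, starting with vague terms and adding detail as we go. To that end, 21 + here are some quick definitions of some terms we will be working with. 22 + 23 + - A kobject is an object of type struct kobject. Kobjects have a name and a 24 + reference count. A kobject also has a parent pointer (allowing objects to be arranged into hierarchies), a 25 + specific type, and, usually, a representation in the sysfs virtual filesystem. 26 + 27 + Kobjects are generally not interesting on their own; instead, they are usually embedded within some other structure which contains the stuff 28 + the code is really interested in. 29 + 30 + No structure should **ever** have more than one kobject embedded within it. If it does, the reference counting for the object 31 + is sure to be messed up and incorrect, and your code will be buggy. So do not do this. 32 + 33 + - A ktype is the type of object that embeds a kobject. Every structure that embeds a kobject needs a 34 + corresponding ktype. The ktype controls what happens to the kobject when it is created and destroyed. 35 + 36 + - A kset is a group of kobjects. These kobjects can be of the same ktype or belong to different 37 + ktypes. The kset is the basic container type for collections of kobjects. Ksets contain their own kobjects, 38 + but you can safely ignore that implementation detail as the kset core code handles this kobject automatically. 39 + 40 + When you see a sysfs directory full of other directories, generally each of those directories corresponds 41 + to a kobject in the same kset. 42 + 43 + We will look at how to create and manipulate all of these types. A bottom-up approach will be taken, so we 44 + will go back to kobjects. 45 + 46 + 47 + Embedding kobjects 48 + ================== 49 + 50 + It is rare for kernel code to create a standalone kobject, with one major exception explained below. Instead, 51 + kobjects are used to control access to a larger, domain-specific object. To this end, kobjects will be found 52 + embedded in other structures. If you are used to thinking of things in object-oriented terms, kobjects can 53 + be seen as a top-level, abstract class from which other classes are derived. A kobject implements 54 + a set of capabilities which are not particularly useful by themselves, but are nice to have in other objects. The C language does not 55 + allow for the direct expression of inheritance, so other techniques, such as structure embedding, must be used. 56 + 57 + (As an aside, for those familiar with the kernel linked list implementation, this is analogous to how "list_head" structs are rarely useful on their own, 58 + but are invariably found embedded in the larger objects of interest.) 59 + 60 + So, for example, the UIO code in ``drivers/uio/uio.c`` has a structure that defines the memory region associated with a uio 61 + device:: 62 + 63 + struct uio_map { 64 + struct kobject kobj; 65 + struct uio_mem *mem; 66 + }; 67 + 68 +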
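If you have a struct uio_map, finding its embedded kobject is just a matter of using the kobj member. 69 + Code that works with kobjects will often have the opposite problem, however: given a struct kobject 70 + pointer, what is the pointer to the containing structure? You must avoid tricks (such as assuming that the 71 + kobject is at the beginning of the structure) and, instead, use the container_of() macro, found in ``<linux/kernel.h>``:: 72 + 73 + 74 + container_of(ptr, type, member) 75 + 76 + where: 77 + 78 + * ``ptr`` is the pointer to the embedded kobject, 79 + * ``type`` is the type of the containing structure, and 80 + * ``member`` is the name of the structure field to which ``ptr`` points. 81 + 82 + The return value from container_of() is a pointer to the corresponding container type. So, for example, a pointer kp to a struct kobject embedded 83 + **within** a struct uio_map could be converted to a pointer to the **containing** uio_map 84 + structure with:: 85 + 86 + struct uio_map *u_map = container_of(kp, struct uio_map, kobj); 87 + 88 + For convenience, programmers often define a simple macro for **back-casting** kobject pointers to the containing 89 + type. Exactly this happens in the earlier ``drivers/uio/uio.c``, as you can see here:: 90 + 91 + struct uio_map { 92 + struct kobject kobj; 93 + struct uio_mem *mem; 94 + }; 95 + 96 + #define to_map(map) container_of(map, struct uio_map, kobj) 97 + 98 + where the macro argument "map" is a pointer to the struct kobject in question. That macro is subsequently invoked with:: 99 + 100 + struct uio_map *map = to_map(kobj); 101 + 102 + 103 + Initialization of kobjects 104 + ========================== 105 + 106 + Code which creates a kobject must, of course, initialize that object. Some of the internal fields are set up with a (mandatory) call to kobject_init():: 107 + 108 + 109 + void kobject_init(struct kobject *kobj, struct kobj_type *ktype); 110 + 111 + The ktype is required for a kobject to be created properly, as every kobject must have an associated kobj_type. 112 + After calling kobject_init(), to register the kobject with sysfs, the function kobject_add() must be called:: 113 + 114 + int kobject_add(struct kobject *kobj, struct kobject *parent, 115 + const char *fmt, ...); 116 + 117 + This sets up the parent of the kobject and the name for the kobject properly. If the kobject is to be associated with a specific kset, 118 + kobj->kset must be assigned before calling kobject_add(). If a kset is associated with a kobject, then the 119 + parent for the kobject can be set to NULL in the call to kobject_add(), and the kobject's parent will then be the kset 120 + itself. 121 + 122 + As the name of the kobject is set when it is added to the kernel, the name of the kobject should never be manipulated directly. 123 + If you must change the name of the kobject, call kobject_rename():: 124 + 125 + int kobject_rename(struct kobject *kobj, const char *new_name); 126 + 127 + The kobject_rename() function does not perform any locking and does not validate the name, so callers 128 + are well advised to do their own checking and serialization. 129 + 130 + There is a function called kobject_set_name(), but that is legacy cruft and is being removed. If your code needs 131 + to call this function, it is incorrect and needs to be fixed. 132 + 133 +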
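Editor's sketch (not part of the commit): the container_of() pattern described earlier in this hunk can be tried outside the kernel with a simplified stand-in. The macro below is a reduced userspace version (the in-kernel one adds type checking), and struct demo_kobject, struct demo_map and to_demo_map() are hypothetical names used only for illustration. The embedded member is deliberately not placed first, to show that no "kobject is at the beginning" assumption is needed.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct kobject and struct uio_map. */
struct demo_kobject {
	const char *name;
};

struct demo_map {
	int id;                   /* first member, so kobj is not at offset 0 */
	struct demo_kobject kobj; /* embedded object */
};

/* Back-cast from the embedded object to its container, like to_map(). */
static struct demo_map *to_demo_map(struct demo_kobject *kobj)
{
	return container_of(kobj, struct demo_map, kobj);
}
```

Given a struct demo_map m, calling to_demo_map(&m.kobj) yields &m again, which is the back-cast that the to_map() macro in the text performs for struct uio_map.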
要正确访问kobject的名称，请使用函数kobject_name():: 134 + 135 + const char *kobject_name(const struct kobject * kobj); 136 + 137 + 有一个辅助函数可以同时初始化和添加kobject到内核中，令人惊讶的是，该函数被称为 138 + kobject_init_and_add():: 139 + 140 + int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, 141 + struct kobject *parent, const char *fmt, ...); 142 + 143 + 参数与上面描述的单个kobject_init()和kobject_add()函数相同。 144 + 145 + 146 + Uevents 147 + ======= 148 + 149 + 当一个kobject被注册到kobject核心后，你需要向全世界宣布它已经被创建了。这可以通 150 + 过调用kobject_uevent()来实现:: 151 + 152 + int kobject_uevent(struct kobject *kobj, enum kobject_action action); 153 + 154 + 当kobject第一次被添加到内核时，使用 *KOBJ_ADD* 动作。这应该在该kobject的任 155 + 何属性或子对象被正确初始化后进行，因为当这个调用发生时，用户空间会立即开始寻 156 + 找它们。 157 + 158 + 当kobject从内核中移除时（关于如何做的细节在下面）， **KOBJ_REMOVE** 的uevent 159 + 将由kobject核心自动创建，所以调用者不必担心手动操作。 160 + 161 + 162 + 引用计数 163 + ======== 164 + 165 + kobject的关键功能之一是作为它所嵌入的对象的一个引用计数器。只要对该对象的引用 166 + 存在，该对象（以及支持它的代码）就必须继续存在。用于操作kobject的引用计数的低 167 + 级函数是:: 168 + 169 + struct kobject *kobject_get(struct kobject *kobj); 170 + void kobject_put(struct kobject *kobj); 171 + 172 + 对kobject_get()的成功调用将增加kobject的引用计数器值并返回kobject的指针。 173 + 174 + 当引用被释放时，对kobject_put()的调用将递减引用计数值，并可能释放该对象。请注 175 + 意，kobject_init()将引用计数设置为1，所以设置kobject的代码最终需要kobject_put() 176 + 来释放该引用。 177 + 178 + 因为kobjects是动态的，所以它们不能以静态方式或在堆栈中声明，而总是以动态方式分 179 + 配。未来版本的内核将包含对静态创建的kobjects的运行时检查，并将警告开发者这种不 180 + 当的使用。 181 + 182 + 如果你使用struct kobject只是为了给你的结构体提供一个引用计数器，请使用struct kref 183 + 来代替；kobject是多余的。关于如何使用kref结构体的更多信息，请参见Linux内核源代 184 + 码树中的文件Documentation/core-api/kref.rst 185 + 186 + 187 + 创建“简单的”kobjects 188 + ==================== 189 + 190 + 有时，开发者想要的只是在sysfs层次结构中创建一个简单的目录，而不必去搞那些复杂 191 + 的ksets、显示和存储函数，以及其他细节。这是一个应该创建单个kobject的例外。要 192 + 创建这样一个条目（即简单的目录），请使用函数:: 193 + 194 + struct kobject *kobject_create_and_add(const char *name, struct kobject *parent); 195 + 196 + 这个函数将创建一个kobject，并将其放在sysfs中指定的父kobject下面的位置。要创 197 + 建与此kobject相关的简单属性，请使用:: 198 + 199 + int sysfs_create_file(struct kobject *kobj, 
const struct attribute *attr); 200 + 201 + 或者:: 202 + 203 + int sysfs_create_group(struct kobject *kobj, const struct attribute_group *grp); 204 + 205 + 这里使用的两种类型的属性，与已经用kobject_create_and_add()创建的kobject， 206 + 都可以是kobj_attribute类型，所以不需要创建特殊的自定义属性。 207 + 208 + 参见示例模块， ``samples/kobject/kobject-example.c`` ，以了解一个简单的 209 + kobject和属性的实现。 210 + 211 + 212 + 213 + ktypes和释放方法 214 + ================ 215 + 216 + 以上讨论中还缺少一件重要的事情，那就是当一个kobject的引用次数达到零的时候 217 + 会发生什么。创建kobject的代码通常不知道何时会发生这种情况；首先，如果它知 218 + 道，那么使用kobject就没有什么意义。当sysfs被引入时，即使是可预测的对象生命 219 + 周期也会变得更加复杂，因为内核的其他部分可以获得在系统中注册的任何kobject 220 + 的引用。 221 + 222 + 最终的结果是，一个由kobject保护的结构体在其引用计数归零之前不能被释放。引 223 + 用计数不受创建kobject的代码的直接控制。因此，每当它的一个kobjects的最后一 224 + 个引用消失时，必须异步通知该代码。 225 + 226 + 一旦你通过kobject_add()注册了你的kobject，你绝对不能使用kfree()来直接释 227 + 放它。唯一安全的方法是使用kobject_put()。在kobject_init()之后总是使用 228 + kobject_put()以避免错误的发生是一个很好的做法。 229 + 230 + 这个通知是通过kobject的release()方法完成的。通常这样的方法有如下形式:: 231 + 232 + void my_object_release(struct kobject *kobj) 233 + { 234 + struct my_object *mine = container_of(kobj, struct my_object, kobj); 235 + 236 + /* Perform any additional cleanup on this object, then... 
*/ 237 + kfree(mine); 238 + } 239 + 240 + 有一点很重要：每个kobject都必须有一个release()方法，而且这个kobject必 241 + 须持续存在（处于一致的状态），直到这个方法被调用。如果这些约束条件没有 242 + 得到满足，那么代码就是有缺陷的。注意，如果你忘记提供release()方法，内 243 + 核会警告你。不要试图通过提供一个“空”的释放函数来摆脱这个警告。 244 + 245 + 如果你的清理函数只需要调用kfree()，那么你必须创建一个包装函数，该函数 246 + 使用container_of()来向上造型到正确的类型（如上面的例子所示），然后在整个 247 + 结构体上调用kfree()。 248 + 249 + 注意，kobject的名字在release函数中是可用的，但它不能在这个回调中被改 250 + 变。否则，在kobject核心中会出现内存泄漏，这让人很不爽。 251 + 252 + 有趣的是，release()方法并不存储在kobject本身；相反，它与ktype相关。 253 + 因此，让我们引入结构体kobj_type:: 254 + 255 + struct kobj_type { 256 + void (*release)(struct kobject *kobj); 257 + const struct sysfs_ops *sysfs_ops; 258 + struct attribute **default_attrs; 259 + const struct attribute_group **default_groups; 260 + const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj); 261 + const void *(*namespace)(struct kobject *kobj); 262 + void (*get_ownership)(struct kobject *kobj, kuid_t *uid, kgid_t *gid); 263 + }; 264 + 265 + 这个结构提用来描述一个特定类型的kobject（或者更正确地说，包含对象的 266 + 类型）。每个kobject都需要有一个相关的kobj_type结构；当你调用 267 + kobject_init()或kobject_init_and_add()时必须指定一个指向该结构的 268 + 指针。 269 + 270 + 当然，kobj_type结构中的release字段是指向这种类型的kobject的release() 271 + 方法的一个指针。另外两个字段（sysfs_ops 和 default_attrs）控制这种 272 + 类型的对象如何在 sysfs 中被表示；它们超出了本文的范围。 273 + 274 + default_attrs 指针是一个默认属性的列表，它将为任何用这个 ktype 注册 275 + 的 kobject 自动创建。 276 + 277 + 278 + ksets 279 + ===== 280 + 281 + 一个kset仅仅是一个希望相互关联的kobjects的集合。没有限制它们必须是相 282 + 同的ktype，但是如果它们不是相同的，就要非常小心。 283 + 284 + 一个kset有以下功能: 285 + 286 + - 它像是一个包含一组对象的袋子。一个kset可以被内核用来追踪“所有块 287 + 设备”或“所有PCI设备驱动”。 288 + 289 + - kset也是sysfs中的一个子目录，与kset相关的kobjects可以在这里显示 290 + 出来。每个kset都包含一个kobject，它可以被设置为其他kobject的父对象； 291 + sysfs层次结构的顶级目录就是以这种方式构建的。 292 + 293 + - Ksets可以支持kobjects的 "热插拔"，并影响uevent事件如何被报告给 294 + 用户空间。 295 + 296 + 在面向对象的术语中，“kset”是顶级的容器类；ksets包含它们自己的kobject， 297 + 但是这个kobject是由kset代码管理的，不应该被任何其他用户所操纵。 298 + 299 + kset在一个标准的内核链表中保存它的子对象。Kobjects通过其kset字段指向其 300 + 包含的kset。在几乎所有的情况下，属于一个kset的kobjects在它们的父 301 + 
对象中都有那个kset（或者，严格地说，它的嵌入kobject）。 302 + 303 + 由于kset中包含一个kobject，它应该总是被动态地创建，而不是静态地 304 + 或在堆栈中声明。要创建一个新的kset，请使用:: 305 + 306 + struct kset *kset_create_and_add(const char *name, 307 + const struct kset_uevent_ops *uevent_ops, 308 + struct kobject *parent_kobj); 309 + 310 + 当你完成对kset的处理后，调用:: 311 + 312 + void kset_unregister(struct kset *k); 313 + 314 + 来销毁它。这将从sysfs中删除该kset并递减其引用计数值。当引用计数 315 + 为零时，该kset将被释放。因为对该kset的其他引用可能仍然存在， 316 + 释放可能发生在kset_unregister()返回之后。 317 + 318 + 一个使用kset的例子可以在内核树中的 ``samples/kobject/kset-example.c`` 319 + 文件中看到。 320 + 321 + 如果一个kset希望控制与它相关的kobjects的uevent操作，它可以使用 322 + 结构体kset_uevent_ops来处理它:: 323 + 324 + struct kset_uevent_ops { 325 + int (* const filter)(struct kset *kset, struct kobject *kobj); 326 + const char *(* const name)(struct kset *kset, struct kobject *kobj); 327 + int (* const uevent)(struct kset *kset, struct kobject *kobj, 328 + struct kobj_uevent_env *env); 329 + }; 330 + 331 + 332 + 过滤器函数允许kset阻止一个特定kobject的uevent被发送到用户空间。 333 + 如果该函数返回0，该uevent将不会被发射出去。 334 + 335 + name函数将被调用以覆盖uevent发送到用户空间的kset的默认名称。默 336 + 认情况下，该名称将与kset本身相同，但这个函数，如果存在，可以覆盖 337 + 该名称。 338 + 339 + 当uevent即将被发送至用户空间时，uevent函数将被调用，以允许更多 340 + 的环境变量被添加到uevent中。 341 + 342 + 有人可能会问，鉴于没有提出执行该功能的函数，究竟如何将一个kobject 343 + 添加到一个kset中。答案是这个任务是由kobject_add()处理的。当一个 344 + kobject被传递给kobject_add()时，它的kset成员应该指向这个kobject 345 + 所属的kset。 kobject_add()将处理剩下的部分。 346 + 347 + 如果属于一个kset的kobject没有父kobject集，它将被添加到kset的目 348 + 录中。并非所有的kset成员都必须住在kset目录中。如果在添加kobject 349 + 之前分配了一个明确的父kobject，那么该kobject将被注册到kset中， 350 + 但是被添加到父kobject下面。 351 + 352 + 353 + 移除Kobject 354 + =========== 355 + 356 + 当一个kobject在kobject核心注册成功后，在代码使用完它时，必须将其 357 + 清理掉。要做到这一点，请调用kobject_put()。通过这样做，kobject核 358 + 心会自动清理这个kobject分配的所有内存。如果为这个对象发送了 ``KOBJ_ADD`` 359 + uevent，那么相应的 ``KOBJ_REMOVE`` uevent也将被发送，任何其他的 360 + sysfs内务将被正确处理。 361 + 362 + 如果你需要分两次对kobject进行删除（比如说在你要销毁对象时无权睡眠）， 363 + 那么调用kobject_del()将从sysfs中取消kobject的注册。这使得kobject 364 + “不可见”，但它并没有被清理掉，而且该对象的引用计数仍然是一样的。在稍 365 + 
后的时间调用kobject_put()来完成与该kobject相关的内存的清理。 366 + 367 + kobject_del()可以用来放弃对父对象的引用，如果循环引用被构建的话。 368 + 在某些情况下，一个父对象引用一个子对象是有效的。循环引用必须通过明 369 + 确调用kobject_del()来打断，这样一个释放函数就会被调用，前一个循环 370 + 中的对象会相互释放。 371 + 372 + 373 + 示例代码出处 374 + ============ 375 + 376 + 关于正确使用ksets和kobjects的更完整的例子，请参见示例程序 377 + ``samples/kobject/{kobject-example.c,kset-example.c}`` ，如果 378 + 您选择 ``CONFIG_SAMPLE_KOBJECT`` ，它们将被构建为可加载模块。
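
The embedding, ``container_of()`` and release() ideas from the kobject text above can be tried outside the kernel. The following is only a userspace sketch under stated assumptions: ``struct mini_kobj``, ``mini_get()``, ``mini_put()`` and ``my_object_create()`` are hypothetical stand-ins invented for illustration, the ``container_of()`` macro is a simplified version without the kernel's type checking, and the counter is a plain ``int`` rather than the atomic kref a real kobject uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace container_of(): no type checking, unlike the kernel. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of a reference-counted embedded object. */
struct mini_kobj {
	int refcount;				/* NOT atomic: sketch only */
	void (*release)(struct mini_kobj *kobj);
};

struct my_object {
	int payload;
	struct mini_kobj kobj;			/* deliberately not the first member */
};

static int released;				/* counts release() calls, for the demo */

/* Release method: recover the containing object and free the whole thing. */
static void my_object_release(struct mini_kobj *kobj)
{
	struct my_object *mine = container_of(kobj, struct my_object, kobj);

	released++;
	free(mine);
}

static struct my_object *my_object_create(int payload)
{
	struct my_object *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->payload = payload;
	obj->kobj.refcount = 1;			/* mirrors kobject_init(): starts at one */
	obj->kobj.release = my_object_release;
	return obj;
}

static struct mini_kobj *mini_get(struct mini_kobj *kobj)
{
	kobj->refcount++;
	return kobj;
}

/* Dropping the last reference triggers release(); never free() directly. */
static void mini_put(struct mini_kobj *kobj)
{
	if (--kobj->refcount == 0)
		kobj->release(kobj);
}
```

Because the embedded member is not at offset zero here, a plain cast from the ``mini_kobj`` pointer would yield the wrong address, which is the reason the text insists on ``container_of()``.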

+194

Documentation/translations/zh_CN/core-api/local_ops.rst

.. include:: ../disclaimer-zh_CN.rst

:Original: Documentation/core-api/local_ops.rst
:Translator: Yanteng Si <siyanteng@loongson.cn>

.. _cn_local_ops:


=================================================
Semantics and Behavior of Local Atomic Operations
=================================================

:Author: Mathieu Desnoyers


This document explains the purpose of the local atomic operations, how to
implement them for any given architecture, and shows how they can be used
properly. It also stresses the precautions that must be taken when reading
those local variables across CPUs when the order of memory writes matters.

.. note::

    Note that ``local_t`` based operations are not recommended for general
    kernel use. Please use the ``this_cpu`` operations instead unless there
    is really a special purpose. Most uses of ``local_t`` in the kernel have
    been replaced by ``this_cpu`` operations. ``this_cpu`` operations
    combine the relocation with the ``local_t``-like semantics in a single
    instruction and yield more compact and faster executing code.


Purpose of local atomic operations
==================================

Local atomic operations are meant to provide fast and highly reentrant
per-CPU counters. They minimize the performance cost of standard atomic
operations by removing the LOCK prefix and the memory barriers normally
required to synchronize across CPUs.

Having fast per-CPU atomic counters is attractive in many cases: it does not
require disabling interrupts to protect against interrupt handlers, and it
permits coherent counters in NMI (Non Maskable Interrupt) handlers. It is
especially useful for tracing purposes and for various performance
monitoring counters.

Local atomic operations only guarantee atomicity of variable modification
with respect to the CPU which owns the data. Therefore, care must be taken
to make sure that only one CPU writes to the ``local_t`` data. This is done
by using per-CPU data and making sure that we modify it from within a
preemption-safe context. It is however permitted to read ``local_t`` data
from any CPU: it will then appear to be written out of order with respect to
other memory writes by the owner CPU.


Implementation for a given architecture
=======================================

It can be done by slightly modifying the standard atomic operations: only
their UP variant must be kept. It typically means removing the LOCK prefix
(on i386 and x86_64) and any SMP synchronization barrier. If the
architecture does not have a different behavior between SMP and UP,
including ``asm-generic/local.h`` in your architecture's ``local.h`` is
sufficient.

The ``local_t`` type is defined as an opaque ``signed long`` by embedding an
``atomic_long_t`` inside a structure. This is done so that a cast from this
type to a ``long`` fails. The definition looks like::

    typedef struct { atomic_long_t a; } local_t;


Rules to follow when using local atomic operations
==================================================

* Variables touched by local ops must be per-CPU variables.

* *Only* the CPU owner of these variables must write to them.

* This CPU can use local ops from any context (process, irq, softirq,
  nmi, ...) to update its ``local_t`` variables.

* Preemption (or interrupts) must be disabled when using local ops in
  process context, to make sure the process will not be migrated to a
  different CPU between getting the per-CPU variable and doing the actual
  local op.

* When using local ops in interrupt context, no special care must be taken
  on a mainline kernel, since they will run on the local CPU with preemption
  already disabled. I suggest, however, to explicitly disable preemption
  anyway to make sure it will still work correctly on -rt kernels.

* Reading the local CPU variable will provide the current copy of the
  variable.

* Reads of these variables can be done from any CPU, because updates to
  "``long``", aligned, variables are always atomic. Since no memory
  synchronization is done by the writer CPU, an outdated copy of the
  variable can be read when reading some *other* CPU's variables.


How to use local atomic operations
==================================

::

    #include <linux/percpu.h>
    #include <asm/local.h>

    static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);


Counting
========

Counting is done on all the bits of a signed long.

In preemptible context, use ``get_cpu_var()`` and ``put_cpu_var()`` around
local atomic operations: this makes sure that preemption is disabled around
write access to the per-CPU variable. For instance::

    local_inc(&get_cpu_var(counters));
    put_cpu_var(counters);

If you are already in a preemption-safe context, you can use
``this_cpu_ptr()`` instead::

    local_inc(this_cpu_ptr(&counters));


Reading the counters
====================

Those local counters can be read from foreign CPUs to sum the count. Note
that the data seen by local_read across CPUs must be considered to be out of
order relative to other memory writes happening on the CPU that owns the
data::

    long sum = 0;
    for_each_online_cpu(cpu)
            sum += local_read(&per_cpu(counters, cpu));

If you want to use a remote local_read to synchronize access to a resource
between CPUs, explicit ``smp_wmb()`` and ``smp_rmb()`` memory barriers must
be used respectively on the writer and the reader CPUs. This would be the
case if you use the ``local_t`` variable as a counter of bytes written in a
buffer: there should be a ``smp_wmb()`` between the buffer write and the
counter increment, and also a ``smp_rmb()`` between the counter read and the
buffer read.

Here is a sample module which implements a basic per-CPU counter using
``local.h``::

    /* test-local.c
     *
     * Sample module for local.h usage.
     */


    #include <asm/local.h>
    #include <linux/module.h>
    #include <linux/timer.h>

    static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);

    static struct timer_list test_timer;

    /* IPI called on each CPU. */
    static void test_each(void *info)
    {
            /* Increment the counter from a non preemptible context */
            printk("Increment on cpu %d\n", smp_processor_id());
            local_inc(this_cpu_ptr(&counters));

            /* This is what incrementing the variable would look like within a
             * preemptible context (it disables preemption) :
             *
             * local_inc(&get_cpu_var(counters));
             * put_cpu_var(counters);
             */
    }

    /* timer_setup() callbacks take a struct timer_list pointer. */
    static void do_test_timer(struct timer_list *unused)
    {
            int cpu;

            /* Increment the counters */
            on_each_cpu(test_each, NULL, 1);
            /* Read all the counters */
            printk("Counters read from CPU %d\n", smp_processor_id());
            for_each_online_cpu(cpu) {
                    printk("Read : CPU %d, count %ld\n", cpu,
                            local_read(&per_cpu(counters, cpu)));
            }
            mod_timer(&test_timer, jiffies + 1000);
    }

    static int __init test_init(void)
    {
            /* initialize the timer that will increment the counter */
            timer_setup(&test_timer, do_test_timer, 0);
            mod_timer(&test_timer, jiffies + 1);

            return 0;
    }

    static void __exit test_exit(void)
    {
            del_timer_sync(&test_timer);
    }

    module_init(test_init);
    module_exit(test_exit);

    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Mathieu Desnoyers");
    MODULE_DESCRIPTION("Local Atomic Ops");
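
The barrier pairing described for a byte counter guarding a buffer (write barrier between buffer write and counter increment, read barrier between counter read and buffer read) can be approximated in userspace. This is a loose analogy only, assuming C11 ``<stdatomic.h>``: the release and acquire fences are rough stand-ins for ``smp_wmb()`` and ``smp_rmb()``, not equivalents, and ``writer_put()`` and ``reader_snapshot()`` are hypothetical names made up for the sketch.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 64

static char buf[BUF_SIZE];
static atomic_long nbytes;	/* plays the role of the local_t byte counter */

/* Writer side: meant to run only on the single owning thread. */
static int writer_put(char c)
{
	long n = atomic_load_explicit(&nbytes, memory_order_relaxed);

	if (n >= BUF_SIZE)
		return -1;				/* buffer full */
	buf[n] = c;					/* 1. write the data */
	atomic_thread_fence(memory_order_release);	/* 2. roughly smp_wmb() */
	atomic_store_explicit(&nbytes, n + 1, memory_order_relaxed); /* 3. publish */
	return 0;
}

/* Reader side: may run on any thread; copies what has been published. */
static long reader_snapshot(char *out, long max)
{
	long n = atomic_load_explicit(&nbytes, memory_order_relaxed); /* 1. read counter */
	long i;

	atomic_thread_fence(memory_order_acquire);	/* 2. roughly smp_rmb() */
	if (n > max)
		n = max;
	for (i = 0; i < n; i++)				/* 3. read the data */
		out[i] = buf[i];
	return n;
}
```

Without the two fences, a reader could observe the incremented counter before the byte it is supposed to cover, which is the hazard the text warns about.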

+158

Documentation/translations/zh_CN/core-api/padata.rst

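.. SPDX-License-Identifier: GPL-2.0

.. include:: ../disclaimer-zh_CN.rst

:Original: Documentation/core-api/padata.rst
:Translator: Yanteng Si <siyanteng@loongson.cn>

.. _cn_core_api_padata.rst:

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets.
This is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while
load balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.

Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently. A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways, programmatically with
padata_set_cpumask() or via sysfs. The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where
a parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel, and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.

There may be sysfs files for an instance's cpumasks. For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>. Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied
cpumasks and the 'usable' cpumasks. (Each pair consists of a parallel and a
serial cpumask.) The user-supplied cpumasks default to all possible CPUs on
instance allocation and may be changed as above. The usable cpumasks are
always a subset of the user-supplied cpumasks and contain only the online
CPUs in the user-supplied masks; these are the cpumasks padata actually
uses. So it is legal to supply a cpumask to padata that contains offline
CPUs. Once an offline CPU in the user-supplied cpumask comes online, padata
is going to use it.

Changing the CPU masks is an expensive operation, so it should not be done
with great frequency.

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
            /* Other stuff here... */
            void                    (*parallel)(struct padata_priv *padata);
            void                    (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done. Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided. Those functions will
be called in the process of getting the work done, as we will see
momentarily.

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above; cb_cpu
points to the preferred CPU to be used for the final callback when the job
is done; it must be in the current instance's CPU mask (if not, the cb_cpu
pointer is updated to point to the CPU actually chosen). The return value
from padata_do_parallel() is zero on success, indicating that the job is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in
the serial cpumask, no online CPUs in the parallel or serial cpumasks, or a
stopped instance.

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs. parallel() runs
with software interrupts disabled and thus cannot sleep. The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point. The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure. That call will happen on the
CPU requested in the initial call to padata_do_parallel(); it, too, is run
with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that jobs are completed in the order in which they were
submitted.

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation in reverse::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with
the main thread participating in the job and then waiting until all helpers
have finished. padata splits the job into units called chunks, where a chunk
is a piece of the job that one thread completes in one call to the thread
function.

A user has to do three things to run a multithreaded job. First, describe
the job by defining a padata_mt_job structure, which is explained in the
Interface section. This includes a pointer to the thread function, which
padata will call each time it assigns a job chunk to a thread. Then, define
the thread function, which accepts three arguments, ``start``, ``end``, and
``arg``, where the first two delimit the range that the thread operates on
and the last is a pointer to the job's shared state, if any. Prepare the
shared state, which is typically allocated on the main thread's stack. Last,
call padata_do_multithreaded(), which will return once the job is finished.

Interface
=========

The API is found in the following kernel code:

include/linux/padata.h

kernel/padata.c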
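
The ordering guarantee described for serialized padata jobs (parallel work may finish in any order, yet serial() callbacks run in submission order) can be modelled with a small single-threaded simulation. Everything here is hypothetical and invented for illustration: ``struct job``, ``submit()`` and ``complete()`` are not padata APIs, there is no real parallelism or locking, and the sketch only shows the reordering idea, not how kernel/padata.c implements it.

```c
#include <stddef.h>

#define MAX_JOBS 8

/* Hypothetical job descriptor; real padata jobs use struct padata_priv. */
struct job {
	int id;
	int done;			/* set once the "parallel" part has finished */
};

static struct job *queue[MAX_JOBS];	/* jobs in submission order */
static size_t submitted;		/* number of jobs queued so far */
static size_t next_serial;		/* first job whose serial step is pending */
static int serial_log[MAX_JOBS];	/* ids in the order serial steps ran */
static size_t serial_count;

/* Analogue of submitting a job: remember its position in the sequence. */
static int submit(struct job *j)
{
	if (submitted == MAX_JOBS)
		return -1;
	j->done = 0;
	queue[submitted++] = j;
	return 0;
}

/*
 * Analogue of reporting completion: mark the job finished, then run the
 * serial step only for the contiguous prefix of finished jobs, so the
 * serial steps always happen in submission order.
 */
static void complete(struct job *j)
{
	j->done = 1;
	while (next_serial < submitted && queue[next_serial]->done) {
		serial_log[serial_count++] = queue[next_serial]->id;
		next_serial++;
	}
}
```

A job that finishes early simply waits: its serial step is deferred until every job submitted before it has also completed, which matches the note that the serial() call "may be deferred for a while".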

+110

Documentation/translations/zh_CN/core-api/printk-basics.rst

.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst

:Original: Documentation/core-api/printk-basics.rst
:Translator: Yanteng Si <siyanteng@loongson.cn>

.. _cn_printk-basics.rst:


===========================
Message logging with printk
===========================

printk() is one of the most widely known functions in the Linux kernel. It
is the standard tool we have for printing messages and usually the most
basic way of tracing and debugging. If you are familiar with printf(3) you
can tell printk() is based on it, although it has some functional
differences:

  - printk() messages can specify a log level.

  - The format string, while largely compatible with C99, does not follow
    the exact same specification. It has some extensions and a few
    limitations (no ``%n`` or floating point conversion specifiers). See
    :ref:`How to get printk format specifiers right <printk-specifiers>`.

All printk() messages are printed to the kernel log buffer, which is a ring
buffer exported to userspace through /dev/kmsg. The usual way to read it is
using ``dmesg``.

printk() is typically used like this::

    printk(KERN_INFO "Message: %s\n", arg);

where ``KERN_INFO`` is the log level (note that it is concatenated to the
format string; the log level is not a separate argument). The available log
levels are:


+----------------+--------+-----------------------------------------------+
| Name           | String | Alias function                                |
+================+========+===============================================+
| KERN_EMERG     | "0"    | pr_emerg()                                    |
+----------------+--------+-----------------------------------------------+
| KERN_ALERT     | "1"    | pr_alert()                                    |
+----------------+--------+-----------------------------------------------+
| KERN_CRIT      | "2"    | pr_crit()                                     |
+----------------+--------+-----------------------------------------------+
| KERN_ERR       | "3"    | pr_err()                                      |
+----------------+--------+-----------------------------------------------+
| KERN_WARNING   | "4"    | pr_warn()                                     |
+----------------+--------+-----------------------------------------------+
| KERN_NOTICE    | "5"    | pr_notice()                                   |
+----------------+--------+-----------------------------------------------+
| KERN_INFO      | "6"    | pr_info()                                     |
+----------------+--------+-----------------------------------------------+
| KERN_DEBUG     | "7"    | pr_debug() and pr_devel() if DEBUG is defined |
+----------------+--------+-----------------------------------------------+
| KERN_DEFAULT   | ""     |                                               |
+----------------+--------+-----------------------------------------------+
| KERN_CONT      | "c"    | pr_cont()                                     |
+----------------+--------+-----------------------------------------------+


The log level specifies the importance of a message. The kernel decides
whether to show the message immediately (printing it to the current console)
depending on its log level and the current *console_loglevel* (a kernel
variable). If the message priority is higher (lower log level value) than
the *console_loglevel*, the message will be printed to the console.

If the log level is omitted, the message is printed with ``KERN_DEFAULT``
level.

You can check the current *console_loglevel* with::

    $ cat /proc/sys/kernel/printk
    4        4        1        7

The result shows the *current*, *default*, *minimum* and *boot-time-default*
log levels.

To change the current console_loglevel, simply write the desired level to
``/proc/sys/kernel/printk``. For example, to print all messages to the
console::

    # echo 8 > /proc/sys/kernel/printk

Another way, using ``dmesg``::

    # dmesg -n 5

sets the console_loglevel to print KERN_WARNING (4) or more severe messages
to the console. See ``dmesg(1)`` for more information.

As an alternative to printk() you can use the ``pr_*()`` aliases for
logging. This family of macros embeds the log level in the macro names. For
example::

    pr_info("Info message no. %d\n", msg_num);

prints a ``KERN_INFO`` message.

Besides being more concise than the equivalent printk() calls, they can use
a common definition for the format string through the pr_fmt() macro. For
instance, defining this at the top of a source file (before any ``#include``
directive)::

    #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__

would prefix every pr_*() message in that file with the module and function
name that originated the message.

For debugging purposes there are also two conditionally-compiled macros:
pr_debug() and pr_devel(), which are compiled out unless ``DEBUG`` (or also
``CONFIG_DYNAMIC_DEBUG`` in the case of pr_debug()) is defined.


Function reference
==================

The API is found in the following kernel code:

kernel/printk/printk.c

include/linux/printk.h
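
The two mechanisms described above, a log level that is concatenated to the format string and a ``pr_fmt()`` definition that prefixes every message, rely only on string-literal concatenation and macro expansion, so they can be imitated in userspace. The sketch below is an assumption-laden illustration: ``DEMO_INFO``, ``demo_pr_info()`` and the ``"demo_mod"`` module name are hypothetical, ``"<6>"`` is a visible placeholder rather than the kernel's real ``KERN_INFO`` encoding, and ``##__VA_ARGS__`` is a GNU extension (accepted by GCC and Clang).

```c
#include <stdio.h>

/* Hypothetical visible level prefix; not the kernel's real encoding. */
#define DEMO_INFO "<6>"

/* Same shape as the pr_fmt() example in the text, with a fixed module name. */
#define pr_fmt(fmt) "%s:%s: " fmt, "demo_mod", __func__

/*
 * Userspace stand-in for pr_info(): the level literal and the pr_fmt()
 * prefix are glued to the caller's format string at compile time.
 */
#define demo_pr_info(buf, len, fmt, ...) \
	snprintf(buf, len, DEMO_INFO pr_fmt(fmt), ##__VA_ARGS__)

/* Formats one message into buf; __func__ expands to "log_count" here. */
static int log_count(char *buf, size_t len, int n)
{
	return demo_pr_info(buf, len, "count %d\n", n);
}
```

Because the prefix is part of the format string itself, no extra argument or runtime lookup is needed to carry the level, which is why the text stresses that the level is not a separate parameter.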

+595

Documentation/translations/zh_CN/core-api/printk-formats.rst

··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/core-api/printk-formats.rst 4 + :Translator: Yanteng Si <siyanteng@loongson.cn> 5 + 6 + .. _cn_printk-formats.rst: 7 + 8 + 9 + ============================== 10 + 如何获得正确的printk格式占位符 11 + ============================== 12 + 13 + 14 + 15 + :作者: Randy Dunlap <rdunlap@infradead.org> 16 + :作者: Andrew Murray <amurray@mpc-data.co.uk> 17 + 18 + 19 + 整数类型 20 + ======== 21 + 22 + :: 23 + 24 + 若变量类型是Type，则使用printk格式占位符。 25 + ------------------------------------------- 26 + char %d 或 %x 27 + unsigned char %u 或 %x 28 + short int %d 或 %x 29 + unsigned short int %u 或 %x 30 + int %d 或 %x 31 + unsigned int %u 或 %x 32 + long %ld 或 %lx 33 + unsigned long %lu 或 %lx 34 + long long %lld 或 %llx 35 + unsigned long long %llu 或 %llx 36 + size_t %zu 或 %zx 37 + ssize_t %zd 或 %zx 38 + s8 %d 或 %x 39 + u8 %u 或 %x 40 + s16 %d 或 %x 41 + u16 %u 或 %x 42 + s32 %d 或 %x 43 + u32 %u 或 %x 44 + s64 %lld 或 %llx 45 + u64 %llu 或 %llx 46 + 47 + 48 + 如果 <type> 的大小依赖于配置选项 (例如 sector_t, blkcnt_t) 或其大小依赖于架构 49 + (例如 tcflag_t)，则使用其可能的最大类型的格式占位符并显式强制转换为它。 50 + 51 + 例如 52 + 53 + :: 54 + 55 + printk("test: sector number/total blocks: %llu/%llu\n", 56 + (unsigned long long)sector, (unsigned long long)blockcount); 57 + 58 + 提醒：sizeof()返回类型为size_t。 59 + 60 + 内核的printf不支持%n。显而易见，浮点格式(%e, %f, %g, %a)也不被识别。使用任何不 61 + 支持的占位符或长度限定符都会导致一个WARN并且终止vsnprintf()执行。 62 + 63 + 指针类型 64 + ======== 65 + 66 + 一个原始指针值可以用%p打印，它将在打印前对地址进行哈希处理。内核也支持扩展占位符来打印 67 + 不同类型的指针。 68 + 69 + 一些扩展占位符会打印给定地址上的数据，而不是打印地址本身。在这种情况下，以下错误消息可能 70 + 会被打印出来，而不是无法访问的消息:: 71 + 72 + (null) data on plain NULL address 73 + (efault) data on invalid address 74 + (einval) invalid data on a valid address 75 + 76 + 普通指针 77 + ---------- 78 + 79 + :: 80 + 81 + %p abcdef12 or 00000000abcdef12 82 + 83 + 没有指定扩展名的指针（即没有修饰符的%p）被哈希（hash），以防止内核内存布局消息的泄露。这 84 + 样还有一个额外的好处，就是提供一个唯一的标识符。在64位机器上，前32位被清零。当没有足够的 85 + 熵进行散列处理时，内核将打印(ptrval)代替 86 + 87 + 如果可能的话，使用专门的修饰符，如%pS或%pB（如下所述），以避免打印一个必须事后解释的非哈 88 + 
希地址。如果不可能，而且打印地址的目的是为调试提供更多的消息，使用%p，并在调试过程中 89 + 用 ``no_hash_pointers`` 参数启动内核，这将打印所有未修改的%p地址。如果你 *真的* 想知 90 + 道未修改的地址，请看下面的%px。 91 + 92 + 如果（也只有在）你将地址作为虚拟文件的内容打印出来，例如在procfs或sysfs中（使用 93 + seq_printf()，而不是printk()）由用户空间进程读取，使用下面描述的%pK修饰符，不 94 + 要用%p或%px。 95 + 96 + 97 + 错误指针 98 + -------- 99 + 100 + :: 101 + 102 + %pe -ENOSPC 103 + 104 + 用于打印错误指针(即IS_ERR()为真的指针)的符号错误名。不知道符号名的错误值会以十进制打印， 105 + 而作为%pe参数传递的非ERR_PTR会被视为普通的%p。 106 + 107 + 符号/函数指针 108 + ------------- 109 + 110 + :: 111 + 112 + %pS versatile_init+0x0/0x110 113 + %ps versatile_init 114 + %pSR versatile_init+0x9/0x110 115 + (with __builtin_extract_return_addr() translation) 116 + %pB prev_fn_of_versatile_init+0x88/0x88 117 + 118 + 119 + ``S`` 和 ``s`` 占位符用于打印符号格式的指针。它们的结果是符号名称带有(S)或不带有(s)偏移 120 + 量。如果禁用KALLSYMS，则打印符号地址。 121 + 122 + ``B`` 占位符的结果是带有偏移量的符号名，在打印堆栈回溯时应该使用。占位符将考虑编译器优化 123 + 的影响，当使用尾部调用并使用noreturn GCC属性标记时，可能会发生这种优化。 124 + 125 + 如果指针在一个模块内，模块名称和可选的构建ID将被打印在符号名称之后，并在说明符的末尾添加 126 + 一个额外的 ``b`` 。 127 + 128 + :: 129 + 130 + %pS versatile_init+0x0/0x110 [module_name] 131 + %pSb versatile_init+0x0/0x110 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] 132 + %pSRb versatile_init+0x9/0x110 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] 133 + (with __builtin_extract_return_addr() translation) 134 + %pBb prev_fn_of_versatile_init+0x88/0x88 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] 135 + 136 + 来自BPF / tracing追踪的探查指针 137 + ---------------------------------- 138 + 139 + :: 140 + 141 + %pks kernel string 142 + %pus user string 143 + 144 + ``k`` 和 ``u`` 指定符用于打印来自内核内存(k)或用户内存(u)的先前探测的内存。后面的 ``s`` 指 145 + 定符的结果是打印一个字符串。对于直接在常规的vsnprintf()中使用时，(k)和(u)注释被忽略，但是，当 146 + 在BPF的bpf_trace_printk()之外使用时，它会读取它所指向的内存，不会出现错误。 147 + 148 + 内核指针 149 + -------- 150 + 151 + :: 152 + 153 + %pK 01234567 or 0123456789abcdef 154 + 155 + 用于打印应该对非特权用户隐藏的内核指针。%pK的行为取决于kptr_restrict sysctl——详见 156 + Documentation/admin-guide/sysctl/kernel.rst。 157 + 158 + 未经修改的地址 159 + -------------- 160 + 161 + :: 162 + 163 + 
%px 01234567 or 0123456789abcdef 164 + 165 + 对于打印指针，当你 *真的* 想打印地址时。在用%px打印指针之前，请考虑你是否泄露了内核内 166 + 存布局的敏感消息。%px在功能上等同于%lx（或%lu）。%px是首选，因为它在grep查找时更唯一。 167 + 如果将来我们需要修改内核处理打印指针的方式，我们将能更好地找到调用点。 168 + 169 + 在使用%px之前，请考虑使用%p并在调试过程中启用' ' no_hash_pointer ' '内核参数是否足 170 + 够(参见上面的%p描述)。%px的一个有效场景可能是在panic发生之前立即打印消息，这样无论如何 171 + 都可以防止任何敏感消息被利用，使用%px就不需要用no_hash_pointer来重现panic。 172 + 173 + 指针差异 174 + -------- 175 + 176 + :: 177 + 178 + %td 2560 179 + %tx a00 180 + 181 + 为了打印指针的差异，使用ptrdiff_t的%t修饰符。 182 + 183 + 例如:: 184 + 185 + printk("test: difference between pointers: %td\n", ptr2 - ptr1); 186 + 187 + 结构体资源（Resources） 188 + ----------------------- 189 + 190 + :: 191 + 192 + %pr [mem 0x60000000-0x6fffffff flags 0x2200] or 193 + [mem 0x0000000060000000-0x000000006fffffff flags 0x2200] 194 + %pR [mem 0x60000000-0x6fffffff pref] or 195 + [mem 0x0000000060000000-0x000000006fffffff pref] 196 + 197 + 用于打印结构体资源。 ``R`` 和 ``r`` 占位符的结果是打印出的资源带有（R）或不带有（r）解码标志 198 + 成员。 199 + 200 + 通过引用传递。 201 + 202 + 物理地址类型 phys_addr_t 203 + ------------------------ 204 + 205 + :: 206 + 207 + %pa[p] 0x01234567 or 0x0123456789abcdef 208 + 209 + 用于打印phys_addr_t类型（以及它的衍生物，如resource_size_t），该类型可以根据构建选项而 210 + 变化，无论CPU数据真实物理地址宽度如何。 211 + 212 + 通过引用传递。 213 + 214 + DMA地址类型dma_addr_t 215 + --------------------- 216 + 217 + :: 218 + 219 + %pad 0x01234567 or 0x0123456789abcdef 220 + 221 + 用于打印dma_addr_t类型，该类型可以根据构建选项而变化，而不考虑CPU数据路径的宽度。 222 + 223 + 通过引用传递。 224 + 225 + 原始缓冲区为转义字符串 226 + ---------------------- 227 + 228 + :: 229 + 230 + %*pE[achnops] 231 + 232 + 用于将原始缓冲区打印成转义字符串。对于以下缓冲区:: 233 + 234 + 1b 62 20 5c 43 07 22 90 0d 5d 235 + 236 + 几个例子展示了如何进行转换（不包括两端的引号）。:: 237 + 238 + %*pE "\eb \C\a"\220\r]" 239 + %*pEhp "\x1bb \C\x07"\x90\x0d]" 240 + %*pEa "\e\142\040\\\103\a\042\220\r\135" 241 + 242 + 转换规则是根据可选的标志组合来应用的(详见:c:func:`string_escape_mem` 内核文档): 243 + 244 + - a - ESCAPE_ANY 245 + - c - ESCAPE_SPECIAL 246 + - h - ESCAPE_HEX 247 + - n - ESCAPE_NULL 248 + - o - ESCAPE_OCTAL 249 + - p - ESCAPE_NP 250 + - s - 
ESCAPE_SPACE 251 + 252 + 默认情况下，使用 ESCAPE_ANY_NP。 253 + 254 + ESCAPE_ANY_NP是许多情况下的明智选择，特别是对于打印SSID。 255 + 256 + 如果字段宽度被省略，那么将只转义1个字节。 257 + 258 + 原始缓冲区为十六进制字符串 259 + -------------------------- 260 + 261 + :: 262 + 263 + %*ph 00 01 02 ... 3f 264 + %*phC 00:01:02: ... :3f 265 + %*phD 00-01-02- ... -3f 266 + %*phN 000102 ... 3f 267 + 268 + 对于打印小的缓冲区（最长64个字节），可以用一定的分隔符作为一个 269 + 十六进制字符串。对于较大的缓冲区，可以考虑使用 270 + :c:func:`print_hex_dump` 。 271 + 272 + MAC/FDDI地址 273 + ------------ 274 + 275 + :: 276 + 277 + %pM 00:01:02:03:04:05 278 + %pMR 05:04:03:02:01:00 279 + %pMF 00-01-02-03-04-05 280 + %pm 000102030405 281 + %pmR 050403020100 282 + 283 + 用于打印以十六进制表示的6字节MAC/FDDI地址。 ``M`` 和 ``m`` 占位符导致打印的 284 + 地址有(M)或没有(m)字节分隔符。默认的字节分隔符是冒号（：）。 285 + 286 + 对于FDDI地址，可以在 ``M`` 占位符之后使用 ``F`` 说明，以使用破折号(——)分隔符 287 + 代替默认的分隔符。 288 + 289 + 对于蓝牙地址， ``R`` 占位符应使用在 ``M`` 占位符之后，以使用反转的字节顺序，适 290 + 合于以小尾端顺序的蓝牙地址的肉眼可见的解析。 291 + 292 + 通过引用传递。 293 + 294 + IPv4地址 295 + -------- 296 + 297 + :: 298 + 299 + %pI4 1.2.3.4 300 + %pi4 001.002.003.004 301 + %p[Ii]4[hnbl] 302 + 303 + 用于打印IPv4点分隔的十进制地址。 ``I4`` 和 ``i4`` 占位符的结果是打印的地址 304 + 有(i4)或没有(I4)前导零。 305 + 306 + 附加的 ``h`` 、 ``n`` 、 ``b`` 和 ``l`` 占位符分别用于指定主机、网络、大 307 + 尾端或小尾端地址。如果没有提供占位符，则使用默认的网络/大尾端顺序。 308 + 309 + 通过引用传递。 310 + 311 + IPv6 地址 312 + --------- 313 + 314 + :: 315 + 316 + %pI6 0001:0002:0003:0004:0005:0006:0007:0008 317 + %pi6 00010002000300040005000600070008 318 + %pI6c 1:2:3:4:5:6:7:8 319 + 320 + 用于打印IPv6网络顺序的16位十六进制地址。 ``I6`` 和 ``i6`` 占位符的结果是 321 + 打印的地址有(I6)或没有(i6)分号。始终使用前导零。 322 + 323 + 额外的 ``c`` 占位符可与 ``I`` 占位符一起使用，以打印压缩的IPv6地址，如 324 + https://tools.ietf.org/html/rfc5952 所述 325 + 326 + 通过引用传递。 327 + 328 + IPv4/IPv6地址(generic, with port, flowinfo, scope) 329 + -------------------------------------------------- 330 + 331 + :: 332 + 333 + %pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008 334 + %piS 001.002.003.004 or 00010002000300040005000600070008 335 + %pISc 1.2.3.4 or 1:2:3:4:5:6:7:8 336 + %pISpc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345 
337 + %p[Ii]S[pfschnbl] 338 + 339 + 用于打印一个IP地址，不需要区分它的类型是AF_INET还是AF_INET6。一个指向有效结构 340 + 体sockaddr的指针，通过 ``IS`` 或 ``IS`` 指定，可以传递给这个格式占位符。 341 + 342 + 附加的 ``p`` 、 ``f`` 和 ``s`` 占位符用于指定port(IPv4, IPv6)、 343 + flowinfo (IPv6)和sope(IPv6)。port有一个 ``:`` 前缀，flowinfo是 ``/`` 和 344 + 范围是 ``%`` ，每个后面都跟着实际的值。 345 + 346 + 对于IPv6地址，如果指定了额外的指定符 ``c`` ，则使用 347 + https://tools.ietf.org/html/rfc5952 描述的压缩IPv6地址。 348 + 如https://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-07 349 + 所建议的，IPv6地址由'['，']'包围，以防止出现额外的占位符 ``p`` ， ``f`` 或 ``s`` 。 350 + 351 + 对于IPv4地址，也可以使用额外的 ``h`` ， ``n`` ， ``b`` 和 ``l`` 说 352 + 明符，但对于IPv6地址则忽略。 353 + 354 + 通过引用传递。 355 + 356 + 更多例子:: 357 + 358 + %pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789 359 + %pISsc 1.2.3.4 or [1:2:3:4:5:6:7:8]%1234567890 360 + %pISpfc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345/123456789 361 + 362 + UUID/GUID地址 363 + ------------- 364 + 365 + :: 366 + 367 + %pUb 00010203-0405-0607-0809-0a0b0c0d0e0f 368 + %pUB 00010203-0405-0607-0809-0A0B0C0D0E0F 369 + %pUl 03020100-0504-0706-0809-0a0b0c0e0e0f 370 + %pUL 03020100-0504-0706-0809-0A0B0C0E0E0F 371 + 372 + 用于打印16字节的UUID/GUIDs地址。附加的 ``l`` , ``L`` , ``b`` 和 ``B`` 占位符用 373 + 于指定小写(l)或大写(L)十六进制表示法中的小尾端顺序，以及小写(b)或大写(B)十六进制表 374 + 示法中的大尾端顺序。 375 + 376 + 如果没有使用额外的占位符，则将打印带有小写十六进制表示法的默认大端顺序。 377 + 378 + 通过引用传递。 379 + 380 + 目录项（dentry）的名称 381 + ---------------------- 382 + 383 + :: 384 + 385 + %pd{,2,3,4} 386 + %pD{,2,3,4} 387 + 388 + 用于打印dentry名称；如果我们用 :c:func:`d_move` 和它比较，名称可能是新旧混合的，但 389 + 不会oops。 %pd dentry比较安全，其相当于我们以前用的%s dentry->d_name.name，%pd<n>打 390 + 印 ``n`` 最后的组件。 %pD对结构文件做同样的事情。 391 + 392 + 393 + 通过引用传递。 394 + 395 + 块设备（block_device）名称 396 + -------------------------- 397 + 398 + :: 399 + 400 + %pg sda, sda1 or loop0p1 401 + 402 + 用于打印block_device指针的名称。 403 + 404 + va_format结构体 405 + --------------- 406 + 407 + :: 408 + 409 + %pV 410 + 411 + 用于打印结构体va_format。这些结构包含一个格式字符串 412 + 和va_list如下 413 + 414 + :: 415 + 416 + struct va_format { 417 + const char *fmt; 418 + va_list 
*va; 419 + }; 420 + 421 + 实现 "递归vsnprintf"。 422 + 423 + 如果没有一些机制来验证格式字符串和va_list参数的正确性，请不要使用这个功能。 424 + 425 + 通过引用传递。 426 + 427 + 设备树节点 428 + ---------- 429 + 430 + :: 431 + 432 + %pOF[fnpPcCF] 433 + 434 + 435 + 用于打印设备树节点结构。默认行为相当于%pOFf。 436 + 437 + - f - 设备节点全称 438 + - n - 设备节点名 439 + - p - 设备节点句柄 440 + - P - 设备节点路径规范(名称+@单位) 441 + - F - 设备节点标志 442 + - c - 主要兼容字符串 443 + - C - 全兼容字符串 444 + 445 + 当使用多个参数时，分隔符是':'。 446 + 447 + 例如 448 + 449 + :: 450 + 451 + %pOF /foo/bar@0 - Node full name 452 + %pOFf /foo/bar@0 - Same as above 453 + %pOFfp /foo/bar@0:10 - Node full name + phandle 454 + %pOFfcF /foo/bar@0:foo,device:--P- - Node full name + 455 + major compatible string + 456 + node flags 457 + D - dynamic 458 + d - detached 459 + P - Populated 460 + B - Populated bus 461 + 462 + 通过引用传递。 463 + 464 + Fwnode handles 465 + -------------- 466 + 467 + :: 468 + 469 + %pfw[fP] 470 + 471 + 用于打印fwnode_handles的消息。默认情况下是打印完整的节点名称，包括路径。 472 + 这些修饰符在功能上等同于上面的%pOF。 473 + 474 + - f - 节点的全名，包括路径。 475 + - P - 节点名称，包括地址（如果有的话）。 476 + 477 + 例如 (ACPI) 478 + 479 + :: 480 + 481 + %pfwf \_SB.PCI0.CIO2.port@1.endpoint@0 - Full node name 482 + %pfwP endpoint@0 - Node name 483 + 484 + 例如 (OF) 485 + 486 + :: 487 + 488 + %pfwf /ocp@68000000/i2c@48072000/camera@10/port/endpoint - Full name 489 + %pfwP endpoint - Node name 490 + 491 + 时间和日期 492 + ---------- 493 + 494 + :: 495 + 496 + %pt[RT] YYYY-mm-ddTHH:MM:SS 497 + %pt[RT]s YYYY-mm-dd HH:MM:SS 498 + %pt[RT]d YYYY-mm-dd 499 + %pt[RT]t HH:MM:SS 500 + %pt[RT][dt][r][s] 501 + 502 + 用于打印日期和时间:: 503 + 504 + R struct rtc_time structure 505 + T time64_t type 506 + 507 + 以我们（人类）可读的格式。 508 + 509 + 默认情况下，年将以1900为单位递增，月将以1为单位递增。使用%pt[RT]r (raw) 510 + 来抑制这种行为。 511 + 512 + %pt[RT]s（空格）将覆盖ISO 8601的分隔符，在日期和时间之间使用''（空格）而 513 + 不是'T'（大写T）。当日期或时间被省略时，它不会有任何影响。 514 + 515 + 通过引用传递。 516 + 517 + clk结构体 518 + --------- 519 + 520 + :: 521 + 522 + %pC pll1 523 + %pCn pll1 524 + 525 + 用于打印clk结构。%pC 和 %pCn 打印时钟的名称（通用时钟框架）或唯一的32位 526 + ID（传统时钟框架）。 527 + 528 + 通过引用传递。 529 + 
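530 + Bitmap and its derivatives such as cpumask and nodemask 531 + ------------------------------------------------------- 532 + 533 + :: 534 + 535 + %*pb 0779 536 + %*pbl 0,3-6,8-10 537 + 538 + For printing bitmap and its derivatives such as cpumask and nodemask, %*pb outputs the bitmap with field width as the number of bits, 539 + and %*pbl outputs the bitmap as a range list with field width as the number of bits. 540 + 541 + The field width is passed by value, the bitmap is passed by reference. Helper macros cpumask_pr_args() and 542 + nodemask_pr_args() are available to ease printing cpumask and nodemask. 543 + 544 + Flags bitfields such as page flags, gfp_flags 545 + --------------------------------------------- 546 + 547 + :: 548 + 549 + %pGp referenced|uptodate|lru|active|private|node=0|zone=2|lastcpupid=0x1fffff 550 + %pGg GFP_USER|GFP_DMA32|GFP_NOWARN 551 + %pGv read|exec|mayread|maywrite|mayexec|denywrite 552 + 553 + For printing flags bitfields as a collection of symbolic constants that would construct the value. The type of flags is given by the third character. Currently supported 554 + are [p]age flags, [v]ma_flags (both expect ``unsigned long *``) and 555 + [g]fp_flags (expects ``gfp_t *``). The flag names and print order depend on the particular type. 556 + 557 + Note that this format should not be used directly in the :c:func:`TP_printk()` part of a tracepoint. Instead, use 558 + the show_*_flags() functions from <trace/events/mmflags.h>. 559 + 560 + Passed by reference. 561 + 562 + Network device features 563 + ----------------------- 564 + 565 + :: 566 + 567 + %pNF 0x000000000000c000 568 + 569 + For printing netdev_features_t. 570 + 571 + Passed by reference. 572 + 573 + V4L2 and DRM FourCC code (pixel format) 574 + --------------------------------------- 575 + 576 + :: 577 + 578 + %p4cc 579 + 580 + Print a FourCC code used by V4L2 or DRM, including format endianness and its numerical value as hexadecimal. 581 + 582 + Passed by reference. 583 + 584 + Examples:: 585 + 586 + %p4cc BG12 little-endian (0x32314742) 587 + %p4cc Y10 little-endian (0x20303159) 588 + %p4cc NV12 big-endian (0xb231564e) 589 + 590 + Thanks 591 + ====== 592 + 593 + If you add other %p extensions, please extend <lib/test_printf.c> with one or more test cases, if at all feasible. 594 + 595 + Thank you for your cooperation and attention.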
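The byte orders described for ``%pUb`` and ``%pUl`` in the UUID/GUID section above can be reproduced in userspace. The following is a minimal sketch, not the kernel's implementation: `format_uuid()`, `uuid_be_index` and `uuid_le_index` are hypothetical names introduced here for illustration, and the little-endian table assumes the layout shown in the examples above (the first three fields byte-swapped, the last eight bytes left in place).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_STRING_LEN 36

/* Byte positions emitted for big-endian (b/B) and little-endian (l/L) output. */
static const uint8_t uuid_be_index[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
static const uint8_t uuid_le_index[16] = {3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15};

/* Hypothetical helper: format 16 raw bytes as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. */
static void format_uuid(char out[UUID_STRING_LEN + 1], const uint8_t uuid[16],
                        bool little_endian, bool uppercase)
{
    const uint8_t *index = little_endian ? uuid_le_index : uuid_be_index;
    const char *digits = uppercase ? "0123456789ABCDEF" : "0123456789abcdef";
    char *p = out;

    for (int i = 0; i < 16; i++) {
        uint8_t byte = uuid[index[i]];

        *p++ = digits[byte >> 4];
        *p++ = digits[byte & 0x0f];
        /* Dashes follow the 4th, 6th, 8th and 10th emitted byte. */
        if (i == 3 || i == 5 || i == 7 || i == 9)
            *p++ = '-';
    }
    *p = '\0';
    assert(strlen(out) == UUID_STRING_LEN);
}
```

With the input bytes 00 through 0f, the big-endian lower-case variant corresponds to the ``%pUb`` row and the little-endian upper-case variant to the ``%pUL`` row.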

+154

Documentation/translations/zh_CN/core-api/refcount-vs-atomic.rst

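··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/core-api/refcount-vs-atomic.rst 4 + :Translator: Yanteng Si <siyanteng@loongson.cn> 5 + 6 + .. _cn_refcount-vs-atomic: 7 + 8 + 9 + =================================== 10 + refcount_t API compared to atomic_t 11 + =================================== 12 + 13 + .. contents:: :local: 14 + 15 + Introduction 16 + ============ 17 + 18 + The goal of the refcount_t API is to provide a minimal API for implementing an object's reference counters. While the 19 + generic architecture-independent implementation from lib/refcount.c uses atomic operations underneath, some ``refcount_*()`` 20 + and ``atomic_*()`` functions differ in a number of ways with regard to their memory ordering guarantees. This document outlines these differences and 21 + provides respective examples to help developers validate their code against the change in these memory ordering guarantees. 22 + 23 + The terms used throughout this document try to follow the formal LKMM defined in 24 + tools/memory-model/Documentation/explanation.txt. 25 + 26 + memory-barriers.txt and atomic_t.txt provide more background on memory ordering, both in general 27 + and for atomic operations specifically. 28 + 29 + Relevant types of memory ordering 30 + ================================= 31 + 32 + .. note:: The following section only covers some of the memory ordering types that are relevant for atomics and reference counters and 33 + used in this document. For a much broader picture, please consult memory-barriers.txt. 34 + 35 + In the absence of any memory ordering guarantees (i.e. fully unordered), atomics and refcounters only provide 36 + atomicity and the program order (po) relation (on the same CPU). This guarantees that each 37 + ``atomic_*()`` and ``refcount_*()`` operation is atomic and that instructions are executed in program 38 + order on a single CPU. This is implemented using READ_ONCE()/WRITE_ONCE() and compare-and-swap primitives. 39 + 40 + A strong (full) memory ordering guarantees that all prior loads and stores (all po-earlier 41 + instructions) on the same CPU are completed before any po-later instruction is executed. It also guarantees 42 + that all po-earlier stores on the same CPU and all propagated stores from other CPUs must propagate to all other CPUs before any 43 + po-later instruction is executed on the original CPU (A-cumulative property). This is implemented using smp_mb(). 44 + 45 + A RELEASE memory ordering guarantees that all prior loads and stores (all po-earlier 46 + instructions) on the same CPU are completed before the operation. It also guarantees that all po-earlier stores on the same CPU and all 47 + propagated stores from other CPUs must propagate to all other CPUs before the release operation (A-cumulative property). This is implemented using 48 + smp_store_release(). 49 + 50 + An ACQUIRE memory ordering guarantees that all subsequent loads and stores (all po-later 51 + instructions) on the same CPU are completed after the acquire operation. It also guarantees that all 52 + po-later stores on the same CPU must propagate to all other CPUs after the acquire operation executes. This is implemented using 53 + smp_acquire__after_ctrl_dep(). 54 + 55 + A control dependency (on success) for refcounters guarantees that if a reference for an object was successfully obtained (the reference 56 + counter increment or addition happened and the function returned true), then further stores are ordered against this operation. Control dependency on 57 + stores is not implemented using any explicit barriers, but relies on the CPU not speculating on stores. This is only 58 + a single-CPU relation and provides no guarantees for other CPUs. 59 + 60 + 61 + Comparison of functions 62 + ======================= 63 + 64 + case 1) - non-"Read/Modify/Write" (RMW) ops 65 + ------------------------------------------- 66 + 67 + Function changes: 68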
+ 69 + * atomic_set() --> refcount_set() 70 + * atomic_read() --> refcount_read() 71 + 72 + Memory ordering guarantee changes: 73 + 74 + * none (both fully unordered) 75 + 76 + 77 + case 2) - increment-based ops that return no value 78 + -------------------------------------------------- 79 + 80 + Function changes: 81 + 82 + * atomic_inc() --> refcount_inc() 83 + * atomic_add() --> refcount_add() 84 + 85 + Memory ordering guarantee changes: 86 + 87 + * none (both fully unordered) 88 + 89 + case 3) - decrement-based RMW ops that return no value 90 + ------------------------------------------------------ 91 + 92 + Function changes: 93 + 94 + * atomic_dec() --> refcount_dec() 95 + 96 + Memory ordering guarantee changes: 97 + 98 + * fully unordered --> RELEASE ordering 99 + 100 + 101 + case 4) - increment-based RMW ops that return a value 102 + ----------------------------------------------------- 103 + 104 + Function changes: 105 + 106 + * atomic_inc_not_zero() --> refcount_inc_not_zero() 107 + * no atomic counterpart --> refcount_add_not_zero() 108 + 109 + Memory ordering guarantee changes: 110 + 111 + * fully ordered --> control dependency on success for stores 112 + 113 + .. note:: This **assumes** that the necessary ordering is provided as a result of obtaining the pointer to the object. 114 + 115 + 116 + case 5) - generic dec/sub decrement-based RMW ops that return a value 117 + --------------------------------------------------------------------- 118 + 119 + Function changes: 120 + 121 + * atomic_dec_and_test() --> refcount_dec_and_test() 122 + * atomic_sub_and_test() --> refcount_sub_and_test() 123 + 124 + Memory ordering guarantee changes: 125 + 126 + * fully ordered --> RELEASE ordering + ACQUIRE ordering on success 127 + 128 + 129 + case 6) other decrement-based RMW ops that return a value 130 + --------------------------------------------------------- 131 + 132 + Function changes: 133 + 134 + * no atomic counterpart --> refcount_dec_if_one() 135 + * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)`` 136 + 137 + Memory ordering guarantee changes: 138 + 139 + * fully ordered --> RELEASE ordering + control dependency 140 + 141 + .. note:: atomic_add_unless() only provides full ordering on success. 142 + 143 + 144 + case 7) - lock-based RMW 145 + ------------------------ 146 + 147 + Function changes: 148 + 149 + * atomic_dec_and_lock() --> refcount_dec_and_lock() 150 + * atomic_dec_and_mutex_lock() --> refcount_dec_and_mutex_lock() 151 + 152 + Memory ordering guarantee changes: 153 + 154 + * fully ordered --> RELEASE ordering + control dependency + hold the lock on success
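The ordering changes listed for case 4 and case 5 can be illustrated with C11 atomics in userspace. This is a sketch under stated assumptions, not the kernel's refcount_t: `struct ref` and the `ref_*()` helpers are hypothetical names, C11 `memory_order_release` and an acquire fence stand in for the kernel's RELEASE ordering and smp_acquire__after_ctrl_dep(), and the saturation handling of the real API is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a reference counter. */
struct ref {
    atomic_int count;
};

/* Case 1: plain set/read, fully unordered. */
static void ref_set(struct ref *r, int n)
{
    atomic_store_explicit(&r->count, n, memory_order_relaxed);
}

static int ref_read(struct ref *r)
{
    return atomic_load_explicit(&r->count, memory_order_relaxed);
}

/*
 * Case 4: take a reference unless the count is already zero.
 * Only relaxed ordering is requested; later stores depend on the
 * "return true" branch (a control dependency), not on a barrier.
 */
static bool ref_inc_not_zero(struct ref *r)
{
    int old = atomic_load_explicit(&r->count, memory_order_relaxed);

    do {
        if (old == 0)
            return false;
    } while (!atomic_compare_exchange_weak_explicit(&r->count, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}

/*
 * Case 5: drop a reference with RELEASE ordering; the caller that
 * drops the last reference additionally gets ACQUIRE ordering, so
 * it observes all earlier writes before freeing the object.
 */
static bool ref_dec_and_test(struct ref *r)
{
    int old = atomic_fetch_sub_explicit(&r->count, 1, memory_order_release);

    assert(old > 0); /* underflow would be a caller bug in this sketch */
    if (old == 1) {
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}
```

The design point is that the decrement itself never needs a full barrier: RELEASE publishes the dropper's earlier writes, and only the thread that sees the count reach zero pays for the ACQUIRE side.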

+142

Documentation/translations/zh_CN/core-api/symbol-namespaces.rst

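··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/core-api/symbol-namespaces.rst 4 + :Translator: Yanteng Si <siyanteng@loongson.cn> 5 + 6 + .. _cn_symbol-namespaces.rst: 7 + 8 + 9 + ================= 10 + Symbol Namespaces 11 + ================= 12 + 13 + This document describes how to use Symbol Namespaces to structure the export surface of in-kernel symbols exported through the family of EXPORT_SYMBOL() macros. 14 + 15 + .. Table of Contents 16 + 17 + === 1 Introduction 18 + === 2 How to define Symbol Namespaces 19 + --- 2.1 Using the EXPORT_SYMBOL macros 20 + --- 2.2 Using the DEFAULT_SYMBOL_NAMESPACE define 21 + === 3 How to use Symbols exported in Namespaces 22 + === 4 Loading Modules that use namespaced Symbols 23 + === 5 Automatically creating MODULE_IMPORT_NS statements 24 + 25 + 1. Introduction 26 + =============== 27 + 28 + Symbol Namespaces have been introduced as a means to structure the export surface of the in-kernel API. They allow subsystem maintainers to 29 + partition their exported symbols into separate namespaces. That is useful for documentation purposes (think of the SUBSYSTEM_DEBUG 30 + namespace) as well as for limiting the availability of a set of symbols to other parts of the kernel. As of today, modules that use symbols exported into a namespace 31 + are required to import that namespace. Otherwise the kernel will, depending on its configuration, reject loading the module or warn about a missing 32 + import. 33 + 34 + 2. How to define Symbol Namespaces 35 + ================================== 36 + 37 + Symbols can be exported into a namespace using different methods. All of them change the way EXPORT_SYMBOL and friends 38 + are instrumented to create ksymtab entries. 39 + 40 + 2.1 Using the EXPORT_SYMBOL macros 41 + ================================== 42 + 43 + In addition to the macros EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL(), which allow exporting kernel symbols to the kernel symbol table, 44 + variants of these are available to export symbols into a certain namespace: EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL(). 45 + They take one additional argument: the namespace. Please note that due to macro expansion that argument needs 46 + to be a preprocessor symbol. For example, to export the symbol ``usb_stor_suspend`` into the namespace ``USB_STORAGE``, 47 + use:: 48 + 49 + EXPORT_SYMBOL_NS(usb_stor_suspend, USB_STORAGE); 50 + 51 + The corresponding ksymtab entry struct ``kernel_symbol`` will have the member ``namespace`` set accordingly. 52 + A symbol that is exported without a namespace will refer to ``NULL``. There is no default namespace if none is defined. 53 + ``modpost`` and kernel/module.c make use of the namespace at build time and at module load time, respectively. 54 + 55 + 2.2 Using the DEFAULT_SYMBOL_NAMESPACE define 56 + ============================================= 57 + 58 + Defining namespaces for all symbols of a subsystem can be very verbose and may become hard to maintain. Therefore a 59 + default define (DEFAULT_SYMBOL_NAMESPACE) is provided that, if set, becomes 60 + the default for all EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL() macro expansions that do not specify 61 + a namespace. 62 + 63 + There are multiple ways of specifying this define, and which one to use depends on the subsystem and the maintainer's preference. The first option is to define the 64 + default namespace in the ``Makefile`` of the subsystem. For example, to export all symbols defined in usb-common 65 + into the namespace USB_COMMON, add a line like this to drivers/usb/common/Makefile:: 66 + 67 + ccflags-y +=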
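-DDEFAULT_SYMBOL_NAMESPACE=USB_COMMON 68 + 69 + That will affect all EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL() statements. While this definition is present, 70 + a symbol exported with EXPORT_SYMBOL_NS() will still be exported into the namespace that is passed as the namespace argument, 71 + as this argument takes precedence over the default symbol namespace. 72 + 73 + A second option to define the default namespace is directly in the compilation unit as a preprocessor statement. The above example would then 74 + read:: 75 + 76 + #undef DEFAULT_SYMBOL_NAMESPACE 77 + #define DEFAULT_SYMBOL_NAMESPACE USB_COMMON 78 + 79 + placed in the corresponding compilation unit before any EXPORT_SYMBOL macro is used. 80 + 81 + 3. How to use Symbols exported in Namespaces 82 + ============================================ 83 + 84 + In order to use symbols that are exported into namespaces, kernel modules need to explicitly import these namespaces. 85 + Otherwise the kernel might refuse to load the module. The module code is required to use the macro MODULE_IMPORT_NS 86 + for the namespaces it uses symbols from. For example, a module using the usb_stor_suspend symbol 87 + needs to import the namespace USB_STORAGE using a statement like:: 88 + 89 + MODULE_IMPORT_NS(USB_STORAGE); 90 + 91 + This will create a ``modinfo`` tag in the module for each imported namespace. As a side effect, 92 + the imported namespaces of a module can be inspected with modinfo:: 93 + 94 + $ modinfo drivers/usb/storage/ums-karma.ko 95 + [...] 96 + import_ns: USB_STORAGE 97 + [...] 98 + 99 + 100 + It is advisable to add the MODULE_IMPORT_NS() statement close to other module metadata definitions 101 + like MODULE_AUTHOR() or MODULE_LICENSE(). Refer to section 5 for a way to create missing import 102 + statements automatically. 103 + 104 + 4. Loading Modules that use namespaced Symbols 105 + ============================================== 106 + 107 + At module loading time (e.g. ``insmod``), the kernel will check each symbol referenced from the module for its availability 108 + and whether the namespace it might be exported to has been imported by the module. The default behaviour of the kernel is to reject 109 + loading modules that don't specify sufficient imports. An error will be logged and loading will fail with 110 + EINVAL. In order to allow loading of modules that don't satisfy this precondition, a configuration option is available: 111 + setting MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS=y will enable loading regardless, but will 112 + emit a warning. 113 + 114 + 5.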
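Automatically creating MODULE_IMPORT_NS statements 115 + ===================================================== 116 + 117 + Missing namespace imports can easily be detected at build time. In fact, modpost will emit a warning if a module 118 + uses a symbol from a namespace without importing it. 119 + MODULE_IMPORT_NS() statements will usually be added at a definite location (along with other module 120 + metadata). To make the life of module authors (and subsystem maintainers) easier, 121 + a script and make target are available to fix up missing imports. Fixing missing imports can be done with:: 122 + 123 + $ make nsdeps 124 + 125 + A typical scenario for module authors would be:: 126 + 127 + - write code that depends on a symbol from a namespace that is not imported 128 + - ``make`` 129 + - notice the warning of ``modpost`` telling about a missing import 130 + - run ``make nsdeps`` to add the import to the correct code location 131 + 132 + For subsystem maintainers introducing a namespace, the steps are very similar. Again, make nsdeps will eventually 133 + add the missing namespace imports for in-tree modules:: 134 + 135 + - move or add symbols to a namespace (e.g. with EXPORT_SYMBOL_NS()) 136 + - ``make`` (preferably with an allmodconfig to cover all in-kernel modules) 137 + - notice the warning of ``modpost`` telling about a missing import 138 + - run ``make nsdeps`` to add the import to the correct code location 139 + 140 + You can also run nsdeps for external module builds. A typical usage is:: 141 + 142 + $ make -C <path_to_kernel_src> M=$PWD nsdeps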

+337

Documentation/translations/zh_CN/core-api/workqueue.rst

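··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + .. include:: ../disclaimer-zh_CN.rst 3 + 4 + :Original: Documentation/core-api/workqueue.rst 5 + :Translator: Yanteng Si <siyanteng@loongson.cn> 6 + 7 + .. _cn_workqueue.rst: 8 + 9 + 10 + ==================================== 11 + Concurrency Managed Workqueue (cmwq) 12 + ==================================== 13 + 14 + :Date: September, 2010 15 + :Author: Tejun Heo <tj@kernel.org> 16 + :Author: Florian Mickler <florian@mickler.org> 17 + 18 + 19 + Introduction 20 + ============ 21 + 22 + There are many cases where an asynchronous process execution context is needed, and the workqueue (wq) API is 23 + the most commonly used mechanism for such cases. 24 + 25 + When such an asynchronous execution context is needed, a work item describing which function to execute 26 + is put on a queue. An independent thread serves as the asynchronous execution context. The 27 + queue is called workqueue and the thread is called worker. 28 + 29 + While there are work items on the workqueue, the worker executes the functions associated with the work items one after the other. When 30 + there is no work item left on the workqueue, the worker becomes idle. When a new work item gets 31 + queued, the worker begins executing again. 32 + 33 + 34 + Why cmwq? 35 + ========= 36 + 37 + In the original wq implementation, a multi threaded (MT) wq had one worker thread per CPU and a single threaded 38 + (ST) wq had one worker thread system-wide. A single MT wq needed to keep around the same number of 39 + workers as the number of CPUs. The kernel grew a lot of MT wq users over the years, and with the number of CPU cores continuously 40 + rising, some systems saturated the default 32k PID space just booting up. 41 + 42 + Although MT wq wasted a lot of resources, the level of concurrency provided was unsatisfactory. The 43 + limitation was common to both ST and MT wq, albeit less severe on MT. Each wq maintained its own separate 44 + worker pool. An MT wq could provide only one execution context per CPU while an ST wq provided one for the whole 45 + system. Work items had to compete for those very limited execution contexts, leading to various problems 46 + including proneness to deadlocks around the single execution context. 47 + 48 + The tension between the level of concurrency provided (by MT wq) and resource usage also forced its users to make unnecessary tradeoffs, 49 + like libata choosing to use ST wq for polling PIOs and accepting an unnecessary limitation that no two 50 + polling PIOs can progress at the same time. As MT wq didn't provide much better concurrency, users which required a higher level of 51 + concurrency, like async or fscache, had to implement their own thread pool. 52 + 53 + Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with focus on the following goals. 54 + 55 + * Maintain compatibility with the original workqueue API. 56 + 57 + * Use per-CPU unified worker pools shared by all wq to provide a flexible level of concurrency on 58 + demand without wasting a lot of resources. 59 + 60 + * Automatically regulate worker pool and level of concurrency so that the API users don't need to worry about such details. 61 + 62 + 63 + The Design 64 + ========== 65 + 66 + In order to ease the asynchronous execution of functions, a new abstraction, the work item, is introduced. 67 + 68 + A work item is a simple struct that holds a pointer to the function that is to be executed asynchronously. 69 + Whenever a driver or subsystem wants a function to be executed asynchronously, it has to set up a work item 70 + pointing to that function and queue that work item on a workqueue (that is, add it to the workqueue's 71 + queue). 72 + 73 + Special purpose threads, called worker threads (workers), execute the functions off of the queue, one after the other. 74 + If no work is queued, the worker threads become idle. These worker threads are managed in so-called 75 + worker-pools. 76 + 77 + The cmwq design differentiates between the user-facing workqueues that subsystems and drivers queue work items on 78 + and the backend mechanism which manages worker-pools and processes the queued work items. 79 + 80 + There are two worker-pools for each possible CPU, one for normal work items and the other for high 81 +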
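priority ones, and some extra worker-pools to serve work items queued on unbound 82 + workqueues - the number of these backing pools is dynamic. 83 + 84 + Subsystems and drivers can create and queue work items through special 85 + ``workqueue API`` functions as they see fit. They can influence some aspects of the way the work items are executed by setting flags on the 86 + workqueue they are putting the work item on. These 87 + flags include things like CPU locality, concurrency limits, priority and more. To get a detailed overview, refer 88 + to the API description of ``alloc_workqueue()`` below. 89 + 90 + When a work item is queued to a workqueue, the target worker-pool is determined according to the queue parameters and workqueue 91 + attributes, and the work item is appended to the shared worklist of the worker-pool. For example, unless specifically overridden, 92 + a work item of a bound workqueue will be queued on the worklist of either the normal 93 + or the highpri worker-pool that is associated with the CPU the issuer is running on. 94 + 95 + For any worker pool implementation, managing the concurrency level (how many execution contexts are active) 96 + is an important issue. The level should be minimal to save resources, yet sufficient so that the system is 97 + used at its full capacity. 98 + 99 + Each worker-pool bound to an actual CPU implements concurrency management by hooking into the scheduler. The 100 + worker-pool is notified whenever an active worker wakes up or sleeps and keeps track of the number of currently 101 + runnable workers. Generally, work items are not expected to hog a CPU and consume many cycles. That 102 + means maintaining just enough concurrency to prevent work processing from stalling should be optimal. As long as there are 103 + one or more runnable workers on the CPU, the worker-pool doesn't start execution of a new work, but, when 104 + the last running worker goes to sleep, it immediately schedules a new worker so 105 + that the CPU doesn't sit idle while there are pending work items. This allows using a minimal number of workers without losing 106 + execution bandwidth. 107 + 108 + Keeping idle workers around doesn't cost anything other than the memory space for kthreads, so cmwq 109 + holds onto idle ones for a while before killing them. 110 + 111 + For unbound workqueues, the number of backing pools is dynamic. An unbound workqueue can be assigned custom attributes using 112 + ``apply_workqueue_attrs()`` and 113 + workqueue will automatically create backing worker pools matching the attributes. The responsibility of regulating the concurrency level is on 114 + the users. There is also a flag to mark a bound wq to ignore the concurrency management. 115 + Please refer to the API section for details. 116 + 117 + The forward progress guarantee relies on workers being created when more execution contexts are necessary, which in turn is 118 + guaranteed through the use of rescue workers. All work items which might be used on code paths that handle memory reclaim 119 + are required to be queued on wq's that have a rescue-worker reserved for execution under memory 120 + pressure. Otherwise it is possible that the worker-pool deadlocks waiting for execution contexts to 121 + free up. 122 + 123 + 124 + Application Programming Interface (API) 125 + ======================================= 126 + 127 + ``alloc_workqueue()`` allocates a wq. The original ``create_*workqueue()`` 128 + functions are deprecated and scheduled for removal. ``alloc_workqueue()`` takes three 129 + arguments - ``@name``, ``@flags`` and ``@max_active``. 130 + ``@name`` is the name of the wq and is also used as the name of the rescuer thread if there is one. 131 + 132 + A wq no longer manages execution resources but serves as a domain for forward progress guarantee, flush and 133 + work item attributes. ``@flags`` and ``@max_active`` control how work 134 + items are assigned execution resources, scheduled and executed. 135 + 136 + 137 + ``flags`` 138 + --------- 139 + 140 + ``WQ_UNBOUND`` 141 + Work items queued to an unbound wq are served by the special worker-pools which host workers that are not 142 + bound to any specific CPU. This makes the wq behave as a simple execution context 143 + provider without concurrency management. The unbound worker-pools try to start execution of work items as soon as possible. 144 + Unbound wq sacrifices locality but is useful for the following cases. 145 + 146 + * Wide fluctuation in the concurrency level requirement is expected and using bound wq may end up creating a large number of 147 + mostly unused workers across different CPUs as the issuer hops through 148 + different CPUs. 149 + 150 + *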
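Long running CPU intensive workloads which can be better managed by the system scheduler. 151 + 152 + ``WQ_FREEZABLE`` 153 + A freezable wq participates in the freeze phase of the system suspend operations. Work items on the wq are 154 + drained and no new work item starts execution until thawed. 155 + 156 + ``WQ_MEM_RECLAIM`` 157 + All wq which might be used in the memory reclaim paths must have this flag set. The wq is guaranteed 158 + to have at least one execution context regardless of memory pressure. 159 + 160 + ``WQ_HIGHPRI`` 161 + Work items of a highpri wq are queued to the highpri worker-pool of the target cpu. Highpri 162 + worker-pools are served by worker threads with elevated nice level. 163 + 164 + Note that normal and highpri worker-pools don't interact with each other. 165 + Each maintains its separate pool of workers and implements concurrency management among its workers. 166 + 167 + ``WQ_CPU_INTENSIVE`` 168 + Work items of a CPU intensive wq do not contribute to the concurrency level. In other words, runnable 169 + CPU intensive work items will not prevent other work items in the same worker-pool from starting execution. 170 + This is useful for bound work items which are expected to hog CPU cycles so that their 171 + execution is regulated by the system scheduler. 172 + 173 + Although CPU intensive work items don't contribute to the concurrency level, the start of their 174 + execution is still regulated by the concurrency management, and runnable non-CPU-intensive work items can delay 175 + execution of CPU intensive work items. 176 + 177 + This flag is meaningless for unbound wq. 178 + 179 + Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all 180 + workqueues are now non-reentrant - any work item is guaranteed to be executed by at most one 181 + worker system-wide at any given time. 182 + 183 + 184 + ``max_active`` 185 + -------------- 186 + 187 + ``@max_active`` determines the maximum number of execution contexts per CPU which can be assigned to the work items of a 188 + wq. For example, with ``@max_active`` of 16, at most 16 work items of the wq can be 189 + executing at the same time per CPU. 190 + 191 + Currently, for a bound wq, the maximum limit for ``@max_active`` is 512 and the default value used 192 + when 0 is specified is 256. For an unbound wq, the limit is the higher of 512 and 193 + 4 * ``num_possible_cpus()``. These values are chosen sufficiently high such that 194 + they are not the limiting factor while providing protection in runaway cases. 195 + 196 + The number of active work items of a wq is usually regulated by the users of the wq, more specifically, by 197 + how many work items the users may queue at the same time. Unless there is a specific need for throttling the number of active 198 + work items, specifying '0' is recommended. 199 + 200 + Some users depend on the strict execution ordering of ST wq. The combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` 201 + used to achieve this behavior. Work items on such wq were always queued to the unbound worker-pools 202 + and only one work item could be active at any given time, thus achieving the same 203 + ordering property as ST wq. 204 + 205 + In the current implementation the above configuration only guarantees ST behavior within a given NUMA node. Instead, 206 + ``alloc_ordered_workqueue()`` should be used to achieve system-wide ST behavior. 207 + 208 + 209 + Example Execution Scenarios 210 + =========================== 211 + 212 + The following example execution scenarios try to illustrate how cmwq behaves under different configurations. 213 + 214 + Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU. w0 215 + burns CPU for 5ms, then sleeps for 10ms, then burns CPU for 5ms again before finishing. w1 and w2 burn CPU for 5ms then sleep for 10ms. 216 + 217 + Ignoring all other tasks, works and processing overhead, and assuming simple FIFO scheduling, 218 + the following is one highly simplified version of possible sequences of events with the original wq. :: 219 + 220 + TIME IN MSECS EVENT 221 + 0 w0 starts and burns CPU 222 + 5 w0 sleeps 223 + 15 w0 wakes up and burns CPU 224 + 20 w0 finishes 225 + 20 w1 starts and burns CPU 226 + 25 w1 sleeps 227 + 35 w1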
wakes up and finishes 228 + 35 w2 starts and burns CPU 229 + 40 w2 sleeps 230 + 50 w2 wakes up and finishes 231 + 232 + And with cmwq with ``@max_active`` >= 3, :: 233 + 234 + TIME IN MSECS EVENT 235 + 0 w0 starts and burns CPU 236 + 5 w0 sleeps 237 + 5 w1 starts and burns CPU 238 + 10 w1 sleeps 239 + 10 w2 starts and burns CPU 240 + 15 w2 sleeps 241 + 15 w0 wakes up and burns CPU 242 + 20 w0 finishes 243 + 20 w1 wakes up and finishes 244 + 25 w2 wakes up and finishes 245 + 246 + If ``@max_active`` == 2, :: 247 + 248 + TIME IN MSECS EVENT 249 + 0 w0 starts and burns CPU 250 + 5 w0 sleeps 251 + 5 w1 starts and burns CPU 252 + 10 w1 sleeps 253 + 15 w0 wakes up and burns CPU 254 + 20 w0 finishes 255 + 20 w1 wakes up and finishes 256 + 20 w2 starts and burns CPU 257 + 25 w2 sleeps 258 + 35 w2 wakes up and finishes 259 + 260 + Now, let's assume w1 and w2 are queued to a different wq q1 261 + which has ``WQ_CPU_INTENSIVE`` set:: 262 + 263 + TIME IN MSECS EVENT 264 + 0 w0 starts and burns CPU 265 + 5 w0 sleeps 266 + 5 w1 and w2 start and burn CPU 267 + 10 w1 sleeps 268 + 15 w2 sleeps 269 + 15 w0 wakes up and burns CPU 270 + 20 w0 finishes 271 + 20 w1 wakes up and finishes 272 + 25 w2 wakes up and finishes 273 + 274 + 275 + Guidelines 276 + ========== 277 + 278 + * Do not forget to use ``WQ_MEM_RECLAIM`` if a wq may process work items which are used during memory 279 + reclaim. Each wq with 280 + ``WQ_MEM_RECLAIM`` set has an execution context reserved for it. 281 + If there is dependency among multiple work items used during memory reclaim, 282 + they should be queued to separate wq, each with ``WQ_MEM_RECLAIM``. 283 + 284 + * Unless strict ordering is required, there is no need to use ST wq. 285 + 286 + * Unless there is a specific need, using 0 for @max_active is recommended. In most use 287 + cases, the concurrency level usually stays well under the default limit. 288 + 289 + * A wq serves as a domain for forward progress guarantee (WQ_MEM_RECLAIM), flush and 290 + work item attributes. Work items which are not involved in memory reclaim, don't need to be flushed as a part of a group of 291 + work items, and don't require any special attribute can use one of the system wq. There is 292 + no difference in execution characteristics between using a dedicated wq and a system wq. 293 + 294 + * Unless work items are expected to consume a huge amount of CPU cycles, using a bound wq is usually 295 + beneficial due to the increased level of locality in wq operations and work item execution. 296 + 297 + 298 + Debugging 299 + ========= 300 + 301 + Because the work functions are executed by generic worker threads, a few tricks are needed to shed some light on misbehaving workqueue users. 302 + 303 + Worker threads show up in the process list as: :: 304 + 305 + root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1] 306 + root 5672 0.0 0.0 0 0 ?
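S 12:07 0:00 [kworker/1:2] 307 + root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0] 308 + root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0] 309 + 310 + If kworkers are going crazy (using too much cpu), there are two types of possible problems: 311 + 312 + 1. Something being scheduled in rapid succession 313 + 2. A single work item that consumes lots of cpu cycles 314 + 315 + The first one can be tracked using tracing: :: 316 + 317 + $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event 318 + $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt 319 + (wait a few secs) 320 + 321 + If something is busy looping on work queueing, it will dominate the output, and the offender can be determined from the work item function. 322 + 323 + For the second type of problem it should be possible to just check the stack trace of the offending worker thread. :: 324 + 325 + $ cat /proc/THE_OFFENDING_KWORKER/stack 326 + 327 + The work item's function should be trivially visible in the stack trace. 328 + 329 + 330 + Kernel Inline Documentation Reference 331 + ===================================== 332 + 333 + The API is found in the following kernel code: 334 + 335 + include/linux/workqueue.h 336 + 337 + kernel/workqueue.c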
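The timelines in the execution scenarios above can be reproduced with a small userspace simulation. This is only a model of the simplified rules used in those tables, not kernel code: `struct work` and `simulate()` are hypothetical names, a single CPU with FIFO order is assumed, a woken item is assumed to run before a new one is started, and `max_active == 1` is used to stand in for the original wq.

```c
#include <assert.h>
#include <stddef.h>

enum work_state { PENDING, SLEEPING, DONE };

/* One work item: burn CPU, sleep, then optionally burn CPU again (all in ms). */
struct work {
    int burn1, sleep, burn2;
    enum work_state state;
    int wake;   /* wake-up time, valid while SLEEPING */
    int finish; /* completion time, valid once DONE */
};

/* Simulate one CPU; fills in w[i].finish for every item. */
static void simulate(struct work *w, size_t n, int max_active)
{
    size_t done = 0, active = 0, i;
    int t = 0;

    assert(max_active >= 1);
    for (i = 0; i < n; i++)
        w[i].state = PENDING;

    while (done < n) {
        struct work *next = NULL;
        int earliest_wake = -1;

        /* A sleeper that has already woken up runs first. */
        for (i = 0; i < n; i++) {
            if (w[i].state != SLEEPING)
                continue;
            if (earliest_wake < 0 || w[i].wake < earliest_wake)
                earliest_wake = w[i].wake;
            if (w[i].wake <= t && (!next || w[i].wake < next->wake))
                next = &w[i];
        }
        if (next) {
            t += next->burn2;
            next->state = DONE;
            next->finish = t;
            done++;
            active--;
            continue;
        }

        /* Nobody is runnable: start a new item if max_active allows it. */
        if (active < (size_t)max_active) {
            for (i = 0; i < n; i++) {
                if (w[i].state == PENDING) {
                    next = &w[i];
                    break;
                }
            }
        }
        if (next) {
            t += next->burn1;
            next->state = SLEEPING;
            next->wake = t + next->sleep;
            active++;
            continue;
        }

        /* CPU sits idle until the next sleeper wakes up. */
        assert(earliest_wake > t);
        t = earliest_wake;
    }
}
```

Feeding it w0 = {5, 10, 5} and w1 = w2 = {5, 10, 0} gives the finish times of the three non-CPU-intensive tables; the ``WQ_CPU_INTENSIVE`` scenario is not modelled.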

+1 -1

Documentation/translations/zh_CN/dev-tools/index.rst

··· 19 19 :maxdepth: 2 20 20 21 21 gcov 22 + kasan 22 23 23 24 Todolist: 24 25 25 26 - coccinelle 26 27 - sparse 27 28 - kcov 28 - - kasan 29 29 - ubsan 30 30 - kmemleak 31 31 - kcsan

+417

Documentation/translations/zh_CN/dev-tools/kasan.rst

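··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + .. include:: ../disclaimer-zh_CN.rst 4 + 5 + :Original: Documentation/dev-tools/kasan.rst 6 + :Translator: 万家兵 Wan Jiabing <wanjiabing@vivo.com> 7 + 8 + The Kernel Address Sanitizer (KASAN) 9 + ==================================== 10 + 11 + Overview 12 + -------- 13 + 14 + KernelAddressSANitizer (KASAN) is a dynamic memory safety error detector designed to 15 + find out-of-bounds and use-after-free bugs. KASAN has three modes: 16 + 17 + 1. generic KASAN (similar to userspace ASan) 18 + 2. software tag-based KASAN (similar to userspace HWASan) 19 + 3. hardware tag-based KASAN (based on hardware memory tagging) 20 + 21 + Generic KASAN is mainly used for debugging due to its large memory overhead. Software tag-based KASAN 22 + can be used for dogfood testing as it has a lower memory overhead that allows using it with real workloads. 23 + Hardware tag-based KASAN comes with low memory and performance overheads and can therefore be used in production, either 24 + as an in-field memory bug detector or as a security mitigation. 25 + 26 + Software KASAN modes (#1 and #2) use compile-time instrumentation to insert validity checks before every memory access 27 + and therefore require a compiler version that supports that. 28 + 29 + Generic KASAN is supported in GCC and Clang. With GCC, it requires version 8.3.0 or later. Any supported Clang 30 + version is compatible, but detection of out-of-bounds accesses for global variables is only supported since Clang 11. 31 + 32 + Software tag-based KASAN mode is only supported in Clang. 33 + 34 + The hardware KASAN mode (#3) relies on hardware to perform the checks but still requires a compiler 35 + version that supports memory tagging instructions. This mode is supported in GCC 10+ and Clang 11+. 36 + 37 + Both software KASAN modes work with the SLUB and SLAB memory allocators, while hardware tag-based KASAN currently 38 + only supports SLUB. 39 + 40 + Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390, and riscv architectures, and 41 + tag-based KASAN modes are supported only for arm64. 42 + 43 + Usage 44 + ----- 45 + 46 + To enable KASAN, configure the kernel with:: 47 + 48 + CONFIG_KASAN=y 49 + 50 + and choose between ``CONFIG_KASAN_GENERIC`` (to enable generic KASAN), ``CONFIG_KASAN_SW_TAGS`` 51 + (to enable software tag-based KASAN), and ``CONFIG_KASAN_HW_TAGS`` (to enable hardware tag-based 52 + KASAN). 53 + 54 + For software modes, also choose between ``CONFIG_KASAN_OUTLINE`` and ``CONFIG_KASAN_INLINE``. 55 + Outline and inline are compiler instrumentation types. The former produces a smaller binary 56 + while the latter is 1.1-2 times faster. 57 + 58 + To include alloc and free stack traces of affected slab objects into reports, enable 59 + ``CONFIG_STACKTRACE``. To include alloc and free stack traces of affected physical pages, 60 + enable ``CONFIG_PAGE_OWNER`` and boot with ``page_owner=on``. 61 + 62 + Error reports 63 + ~~~~~~~~~~~~~ 64 + 65 + A typical KASAN report looks like this:: 66 + 67 + ================================================================== 68 + BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan] 69 + Write of size 1 at addr ffff8801f44ec37b by task insmod/2760 70 + 71 + CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698 72 + Hardware name: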
QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 73 + Call Trace: 74 + dump_stack+0x94/0xd8 75 + print_address_description+0x73/0x280 76 + kasan_report+0x144/0x187 77 + __asan_report_store1_noabort+0x17/0x20 78 + kmalloc_oob_right+0xa8/0xbc [test_kasan] 79 + kmalloc_tests_init+0x16/0x700 [test_kasan] 80 + do_one_initcall+0xa5/0x3ae 81 + do_init_module+0x1b6/0x547 82 + load_module+0x75df/0x8070 83 + __do_sys_init_module+0x1c6/0x200 84 + __x64_sys_init_module+0x6e/0xb0 85 + do_syscall_64+0x9f/0x2c0 86 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 87 + RIP: 0033:0x7f96443109da 88 + RSP: 002b:00007ffcf0b51b08 EFLAGS: 00000202 ORIG_RAX: 00000000000000af 89 + RAX: ffffffffffffffda RBX: 000055dc3ee521a0 RCX: 00007f96443109da 90 + RDX: 00007f96445cff88 RSI: 0000000000057a50 RDI: 00007f9644992000 91 + RBP: 000055dc3ee510b0 R08: 0000000000000003 R09: 0000000000000000 92 + R10: 00007f964430cd0a R11: 0000000000000202 R12: 00007f96445cff88 93 + R13: 000055dc3ee51090 R14: 0000000000000000 R15: 0000000000000000 94 + 95 + Allocated by task 2760: 96 + save_stack+0x43/0xd0 97 + kasan_kmalloc+0xa7/0xd0 98 + kmem_cache_alloc_trace+0xe1/0x1b0 99 + kmalloc_oob_right+0x56/0xbc [test_kasan] 100 + kmalloc_tests_init+0x16/0x700 [test_kasan] 101 + do_one_initcall+0xa5/0x3ae 102 + do_init_module+0x1b6/0x547 103 + load_module+0x75df/0x8070 104 + __do_sys_init_module+0x1c6/0x200 105 + __x64_sys_init_module+0x6e/0xb0 106 + do_syscall_64+0x9f/0x2c0 107 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 108 + 109 + Freed by task 815: 110 + save_stack+0x43/0xd0 111 + __kasan_slab_free+0x135/0x190 112 + kasan_slab_free+0xe/0x10 113 + kfree+0x93/0x1a0 114 + umh_complete+0x6a/0xa0 115 + call_usermodehelper_exec_async+0x4c3/0x640 116 + ret_from_fork+0x35/0x40 117 + 118 + The buggy address belongs to the object at ffff8801f44ec300 119 + which belongs to the cache kmalloc-128 of size 128 120 + The buggy address is located 123 bytes inside of 121 + 128-byte region [ffff8801f44ec300, ffff8801f44ec380) 
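122 + The buggy address belongs to the page: 123 + page:ffffea0007d13b00 count:1 mapcount:0 mapping:ffff8801f7001640 index:0x0 124 + flags: 0x200000000000100(slab) 125 + raw: 0200000000000100 ffffea0007d11dc0 0000001a0000001a ffff8801f7001640 126 + raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000 127 + page dumped because: kasan: bad access detected 128 + 129 + Memory state around the buggy address: 130 + ffff8801f44ec200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb 131 + ffff8801f44ec280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc 132 + >ffff8801f44ec300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 133 + ^ 134 + ffff8801f44ec380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb 135 + ffff8801f44ec400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc 136 + ================================================================== 137 + 138 + The report header summarizes what kind of bug happened and what kind of access caused it. It is followed by a stack trace of the bad 139 + access, a stack trace of where the accessed memory was allocated (in case a slab object was accessed), and a stack trace of where the 140 + object was freed (in case of a use-after-free bug report). Next comes a description of the accessed 141 + slab object and information about the accessed memory page. 142 + 143 + In the end, the report shows the memory state around the accessed address. Internally, KASAN tracks memory state separately for each memory granule, 144 + which is either 8 or 16 aligned bytes depending on KASAN mode. Each number in the memory state section of the report 145 + shows the state of one of the memory granules that surround the accessed address. 146 + 147 + For generic KASAN, the size of each memory granule is 8 bytes. The state of each granule is encoded in one shadow byte. 148 + Those 8 bytes can be accessible, partially accessible, freed, or part of a redzone. KASAN 149 + uses the following encoding for each shadow byte: 00 means that all 8 bytes of the corresponding memory region are accessible; number N 150 + (1 <= N <= 7) means that the first N bytes are accessible, and the other (8 - N) bytes are not; any negative value indicates 151 + that the entire 8-byte word is inaccessible. KASAN uses different negative values to distinguish between different kinds of inaccessible memory like redzones 152 + or freed memory (see mm/kasan/kasan.h). 153 + 154 + In the report above, the arrow points to the shadow byte ``03``, which means that the accessed address is partially accessible. 155 + 156 + For tag-based KASAN modes, this last report section shows the memory tags around the accessed address 157 + (see the "Implementation details" section). 158 + 159 + Note that KASAN bug titles (like ``slab-out-of-bounds`` or ``use-after-free``) 160 + are best-effort: KASAN prints the most probable bug type based on the limited information it has. The actual type of the bug 161 + might be different. 162 + 163 + Generic KASAN also reports up to two auxiliary call stack traces. These stack traces point to places in code that interacted with the object but that are not directly 164 + present in the bad access stack trace. Currently, this includes call_rcu() and workqueue queuing. 165 + 166 + Boot parameters 167 + ~~~~~~~~~~~~~~~ 168 + 169 + KASAN is affected by the generic ``panic_on_warn`` command line parameter. When it is enabled, KASAN panics the kernel after printing a bug 170 + report. 171 + 172 + By default, KASAN prints a bug report only for the first invalid memory access. With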
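``kasan_multi_shot``, 173 + KASAN prints a report on every invalid access. This effectively disables ``panic_on_warn`` for KASAN reports. 174 + 175 + Hardware tag-based KASAN mode (see the section about various modes below) is intended for use in production as a security 176 + mitigation. Therefore, it supports boot parameters that allow disabling KASAN or controlling its features. 177 + 178 + - ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``). 179 + 180 + - ``kasan.mode=sync`` or ``=async`` controls whether KASAN is configured in synchronous or asynchronous mode of execution (default: 181 + ``sync``). Synchronous mode: a bad access is detected immediately when a tag check fault occurs. Asynchronous mode: 182 + bad access detection is delayed. When a tag check fault occurs, the information is stored in hardware (in the 183 + TFSR_EL1 register for arm64). The kernel periodically checks the hardware and only reports tag faults during these checks. 184 + 185 + - ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack traces collection 186 + (default: ``on``). 187 + 188 + - ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN report or also panic the kernel 189 + (default: ``report``). The panic happens even if ``kasan_multi_shot`` is enabled. 190 + 191 + Implementation details 192 + ---------------------- 193 + 194 + Generic KASAN 195 + ~~~~~~~~~~~~~ 196 + 197 + Software KASAN modes use shadow memory to record whether each byte of memory is safe to access and use compile-time instrumentation 198 + to insert shadow memory checks before each memory access. 199 + 200 + Generic KASAN dedicates 1/8th of kernel memory to its shadow memory (16TB to cover 128TB on x86_64) and uses 201 + direct mapping with a scale and offset to translate a memory address to its corresponding shadow address. 202 + 203 + Here is the function which translates an address to its corresponding shadow address:: 204 + 205 + static inline void *kasan_mem_to_shadow(const void *addr) 206 + { 207 + return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) 208 + + KASAN_SHADOW_OFFSET; 209 + } 210 + 211 + where ``KASAN_SHADOW_SCALE_SHIFT = 3``. 212 + 213 + Compile-time instrumentation is used to insert memory access checks. The compiler inserts function calls 214 + (``__asan_load*(addr)``, ``__asan_store*(addr)``) before each memory access of size 1, 2, 4, 8, or 16. These functions 215 + check whether memory accesses are valid or not by checking the corresponding shadow memory. 216 + 217 + With inline instrumentation, instead of making function calls, the compiler directly inserts the code to check shadow memory. This option 218 + significantly enlarges the kernel, but it gives an x1.1-x2 performance boost over the outline-instrumented kernel. 219 + 220 + Generic KASAN is the only mode that delays the reuse of freed objects via quarantine 221 + (see mm/kasan/quarantine.c for implementation). 222 + 223 + Software tag-based KASAN 224 + ~~~~~~~~~~~~~~~~~~~~~~~~ 225 + 226 + Software tag-based KASAN uses a software memory tagging approach to checking access validity. It is currently only implemented for the arm64 architecture. 227 + 228 + Software tag-based KASAN uses the Top Byte Ignore (TBI) feature of arm64 CPUs to store a pointer tag in the top byte of 229 + kernel pointers. It uses shadow memory to store memory tags associated with each 16-byte memory cell (therefore, 230 + it dedicates 1/16th of the kernel memory for shadow memory). 231 + 232 + On each memory allocation, software tag-based KASAN generates a random tag, tags the allocated 233 + memory with this tag, and embeds the same tag into the returned pointer. 234 + 235 + Software tag-based KASAN uses compile-time instrumentation to insert checks before each memory access. These checks make sure that the tag 236 + of the memory that is being accessed is equal to the tag of the pointer that is used to access this memory. In case of a tag mismatch, software tag-based 237 + KASAN prints a bug report. 238 +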
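239 + Software tag-based KASAN also has two instrumentation modes (outline, which emits callbacks to check memory accesses; and inline, 240 + which performs the shadow memory checks inline). With outline instrumentation mode, a bug report is printed from the function that performs the access 241 + check. With inline instrumentation, a ``brk`` instruction is emitted by the compiler, and a dedicated ``brk`` handler 242 + is used to print bug reports. 243 + 244 + Software tag-based KASAN uses 0xFF as a match-all pointer tag (accesses through pointers with the 0xFF pointer tag 245 + are not checked). The value 0xFE is currently reserved to tag freed memory regions. 246 + 247 + Software tag-based KASAN currently only supports tagging of slab and page_alloc memory. 248 + 249 + Hardware tag-based KASAN 250 + ~~~~~~~~~~~~~~~~~~~~~~~~ 251 + 252 + Hardware tag-based KASAN is similar to the software mode in concept but uses hardware memory tagging support 253 + instead of compiler instrumentation and shadow memory. 254 + 255 + Hardware tag-based KASAN is currently only implemented for the arm64 architecture and is based on both the arm64 Memory Tagging Extension (MTE) 256 + introduced in the ARMv8.5 Instruction Set Architecture and Top Byte Ignore (TBI). 257 + 258 + Special arm64 instructions are used to assign memory tags for each allocation. The same tags are assigned to pointers to those 259 + allocations. On every memory access, hardware makes sure that the tag of the memory that is being accessed is equal to the tag of the pointer 260 + that is used to access this memory. In case of a tag mismatch, a fault is generated, and a report is printed. 261 + 262 + Hardware tag-based KASAN uses 0xFF as a match-all pointer tag (accesses through pointers with the 0xFF pointer tag 263 + are not checked). The value 0xFE is currently reserved to tag freed memory regions. 264 + 265 + Hardware tag-based KASAN currently only supports tagging of slab and page_alloc memory. 266 + 267 + If the hardware does not support MTE (pre ARMv8.5), hardware tag-based KASAN will not be enabled. In this case, 268 + all KASAN boot parameters are ignored. 269 + 270 + Note that enabling CONFIG_KASAN_HW_TAGS always results in in-kernel TBI being enabled, even when 271 + ``kasan.mode=off`` is provided or when the hardware does not support MTE (but supports TBI). 272 + 273 + Hardware tag-based KASAN only reports the first found bug. After that, MTE tag checking gets disabled. 274 + 275 + Shadow memory 276 + ------------- 277 + 278 + The kernel maps memory in several different parts of the address space. The range of kernel virtual addresses is large: there is not enough real 279 + memory to support a real shadow region for every address that could be accessed by the kernel. Therefore, KASAN only maps real shadow for certain 280 + parts of the address space. 281 + 282 + Default behaviour 283 + ~~~~~~~~~~~~~~~~~ 284 + 285 + By default, architectures only map real memory over the shadow region for the linear mapping (and potentially other 286 + small areas). For all other areas - such as vmalloc and vmemmap space - a single read-only page is mapped 287 + over the shadow area. This read-only shadow page declares all memory accesses as permitted. 288 + 289 + This presents a problem for modules: they do not live in the linear mapping but in a dedicated module space. 290 + By hooking into the module allocator, KASAN temporarily maps real shadow memory to cover them. This allows detection of 291 + invalid accesses to module globals, for example. 292 + 293 + This also creates an incompatibility with ``VMAP_STACK``: if the stack lives in vmalloc space, it will be shadowed by 294 + the read-only page, and the kernel will fault when trying to set up the shadow data for stack variables. 295 + 296 + CONFIG_KASAN_VMALLOC 297 + ~~~~~~~~~~~~~~~~~~~~ 298 + 299 + With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc 300 + space at the cost of greater memory usage. Currently, this is supported on x86, riscv, s390, and powerpc. 301 + 302 + This works by hooking into vmalloc and vmap and dynamically allocating real shadow memory to back the mappings. 303 + 304 + Most mappings in vmalloc space are small, requiring less than a full page of shadow space. Allocating a full shadow page per 305 + mapping would therefore be wasteful. Furthermore, to ensure that different mappings use different shadow 306 + pages, mappings would have to be aligned to ``KASAN_GRANULE_SIZE * PAGE_SIZE``. 307 + 308 +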
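Instead, KASAN shares backing space across multiple mappings. It allocates a backing page when a mapping in vmalloc space uses a particular 309 + page of the shadow region. This page can be shared by other vmalloc mappings later on. 310 + 311 + KASAN hooks into the vmap infrastructure to lazily clean up unused shadow memory. 312 + 313 + To avoid the difficulties around swapping mappings around, KASAN expects that the part of the shadow region that covers the vmalloc space will not be covered by the early 314 + shadow page but will be left unmapped. This will require changes in arch-specific code. 315 + 316 + This allows ``VMAP_STACK`` support on x86 and can simplify support of architectures that do not have a fixed module region. 317 + 318 + For developers 319 + -------------- 320 + 321 + Ignoring accesses 322 + ~~~~~~~~~~~~~~~~~ 323 + 324 + Software KASAN modes use compiler instrumentation to insert validity checks. Such instrumentation might be incompatible with some parts of the kernel 325 + and therefore needs to be disabled. 326 + 327 + Other parts of the kernel might access metadata for allocated objects. Normally, KASAN detects and reports such accesses, 328 + but in some cases (e.g., in memory allocators), these accesses are valid. 329 + 330 + For software KASAN modes, to disable instrumentation for a specific file or directory, add a ``KASAN_SANITIZE`` annotation 331 + to the respective kernel Makefile: 332 + 333 + - For a single file (e.g., main.o):: 334 + 335 + KASAN_SANITIZE_main.o := n 336 + 337 + - For all files in one directory:: 338 + 339 + KASAN_SANITIZE := n 340 + 341 + For software KASAN modes, to disable instrumentation on a per-function basis, use the KASAN-specific 342 + ``__no_sanitize_address`` function attribute or the generic ``noinstr`` one. 343 + 344 + Note that disabling compiler instrumentation (either on a per-file or a per-function basis) makes KASAN ignore the accesses that happen directly in that code for software KASAN 345 + modes. It does not help when the accesses happen indirectly (through calls to instrumented functions) or with the hardware tag-based mode, which does not use 346 + compiler instrumentation. 347 + 348 + For software KASAN modes, to disable KASAN reports in a part of the kernel code for the current task, annotate this part of the code with a 349 + ``kasan_disable_current()``/``kasan_enable_current()`` section. 350 + This also disables the reports for indirect accesses that happen through function calls. 351 + 352 + For tag-based KASAN modes (including the hardware one), to disable access checking, use 353 + ``kasan_reset_tag()`` or ``page_kasan_tag_reset()``. Note that temporarily disabling access checking via 354 + ``page_kasan_tag_reset()`` requires saving and restoring the per-page KASAN tag via ``page_kasan_tag`` 355 + / ``page_kasan_tag_set``. 356 + 357 + Tests 358 + ~~~~~ 359 + 360 + There are KASAN tests that allow verifying that KASAN works and can detect certain types of memory corruptions. 361 + The tests consist of two parts: 362 + 363 + 1. Tests that are integrated with the KUnit Test Framework. Enabled with ``CONFIG_KASAN_KUNIT_TEST``. 364 + These tests can be run and partially verified automatically in a few different ways; see the instructions below. 365 + 366 + 2.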
Tests that are incompatible with KUnit. Enabled with ``CONFIG_KASAN_MODULE_TEST`` and can only be run as a 367 + module. These tests can only be verified manually by loading the kernel module and inspecting the kernel log for KASAN reports. 368 + 369 + Each KUnit-compatible KASAN test prints one of multiple KASAN reports if an error is detected. Then the test prints 370 + its number and status. 371 + 372 + When a test passes:: 373 + 374 + ok 28 - kmalloc_double_kzfree 375 + 376 + When a test fails due to a failed ``kmalloc``:: 377 + 378 + # kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163 379 + Expected ptr is not null, but is 380 + not ok 4 - kmalloc_large_oob_right 381 + 382 + When a test fails due to a missing KASAN report:: 383 + 384 + # kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:629 385 + Expected kasan_data->report_expected == kasan_data->report_found, but 386 + kasan_data->report_expected == 1 387 + kasan_data->report_found == 0 388 + not ok 28 - kmalloc_double_kzfree 389 + 390 + At the end the cumulative status of all KASAN tests is printed. On success:: 391 + 392 + ok 1 - kasan 393 + 394 + Or, if one of the tests failed:: 395 + 396 + not ok 1 - kasan 397 + 398 + There are a few ways to run KUnit-compatible KASAN tests. 399 + 400 + 1. Loadable module 401 + 402 + With ``CONFIG_KUNIT`` enabled, KASAN-KUnit tests can be built as a loadable module and run by loading 403 + ``test_kasan.ko`` with ``insmod`` or ``modprobe``. 404 + 405 + 2. Built-in 406 + 407 + With ``CONFIG_KUNIT`` built-in, KASAN-KUnit tests can be built-in as well. In this case, 408 + the tests will run at boot as a late-init call. 409 + 410 + 3. Using kunit_tool 411 + 412 + With ``CONFIG_KUNIT`` and ``CONFIG_KASAN_KUNIT_TEST`` built-in, it is also possible to use 413 + ``kunit_tool`` to see the results of KUnit tests in a more readable way. This will not print the KASAN reports 414 + of the tests that passed. For more up-to-date information on ``kunit_tool``, see the 415 + `KUnit documentation <https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html>`_. 416 + 417 + .. _KUnit: https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html
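The shadow-byte encoding and the address translation described for generic KASAN can be tried out in userspace. The sketch below is only a model under stated assumptions: `mem_to_shadow()` and `access_ok()` are hypothetical names, the shadow offset is passed in as a plain number instead of the kernel's KASAN_SHADOW_OFFSET, and accesses are assumed not to cross an 8-byte granule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3                     /* one shadow byte per 8 bytes */
#define GRANULE_SIZE (1u << SHADOW_SCALE_SHIFT)

/* Same arithmetic as the translation function quoted above, on plain integers. */
static uint64_t mem_to_shadow(uint64_t addr, uint64_t shadow_offset)
{
    return (addr >> SHADOW_SCALE_SHIFT) + shadow_offset;
}

/*
 * Decide whether an access of `size` bytes at `addr` is allowed by the
 * shadow byte of its granule:
 *   0      all 8 bytes accessible
 *   1..7   only the first N bytes accessible
 *   < 0    whole granule inaccessible (redzone, freed memory, ...)
 */
static bool access_ok(int8_t shadow, uint64_t addr, unsigned int size)
{
    unsigned int first = (unsigned int)(addr & (GRANULE_SIZE - 1));
    unsigned int last = first + size - 1;

    assert(size >= 1 && last < GRANULE_SIZE); /* single-granule accesses only */
    if (shadow == 0)
        return true;
    if (shadow < 0)
        return false;
    return last < (unsigned int)shadow;
}
```

Applied to the sample report, the one-byte write at ffff8801f44ec37b falls at offset 3 of a granule whose shadow byte is 03, so only offsets 0 to 2 are valid and the access is rejected, which matches the slab-out-of-bounds title.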

+3 -2

Documentation/translations/zh_CN/index.rst

··· 4 4 5 5 \renewcommand\thesection* 6 6 \renewcommand\thesubsection* 7 + \kerneldocCJKon 7 8 8 9 .. _linux_doc_zh: 9 10 ··· 73 72 dev-tools/index 74 73 doc-guide/index 75 74 kernel-hacking/index 75 + maintainer/index 76 76 77 77 TODOList: 78 78 79 79 * trace/index 80 - * maintainer/index 81 80 * fault-injection/index 82 81 * livepatch/index 83 82 * rust/index ··· 154 153 arm64/index 155 154 riscv/index 156 155 openrisc/index 156 + parisc/index 157 157 158 158 TODOList: 159 159 ··· 162 160 * ia64/index 163 161 * m68k/index 164 162 * nios2/index 165 - * parisc/index 166 163 * powerpc/index 167 164 * s390/index 168 165 * sh/index

+62

Documentation/translations/zh_CN/maintainer/configure-git.rst

··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/maintainer/configure-git.rst 4 + 5 + :译者: 6 + 7 + 吴想成 Wu XiangCheng <bobwxc@email.cn> 8 + 9 + .. _configuregit_zh: 10 + 11 + Git配置 12 + ======= 13 + 14 + 本章讲述了维护者级别的git配置。 15 + 16 + Documentation/maintainer/pull-requests.rst 中使用的标记分支应使用开发人员的 17 + GPG公钥进行签名。可以通过将 ``-u`` 标志传递给 ``git tag`` 来创建签名标记。 18 + 但是，由于 *通常* 对同一项目使用同一个密钥，因此可以设置:: 19 + 20 + git config user.signingkey "keyname" 21 + 22 + 或者手动编辑你的 ``.git/config`` 或 ``~/.gitconfig`` 文件:: 23 + 24 + [user] 25 + name = Jane Developer 26 + email = jd@domain.org 27 + signingkey = jd@domain.org 28 + 29 + 你可能需要告诉 ``git`` 去使用 ``gpg2``:: 30 + 31 + [gpg] 32 + program = /path/to/gpg2 33 + 34 + 你可能也需要告诉 ``gpg`` 去使用哪个 ``tty`` （添加到你的shell rc文件中）:: 35 + 36 + export GPG_TTY=$(tty) 37 + 38 + 39 + 创建链接到lore.kernel.org的提交 40 + ------------------------------- 41 + 42 + http://lore.kernel.org 网站是所有涉及或影响内核开发的邮件列表的总存档。在这里 43 + 存储补丁存档是推荐的做法，当维护人员将补丁应用到子系统树时，最好提供一个指向 44 + lore存档链接的标签，以便浏览提交历史的人可以找到某个更改背后的相关讨论和基本 45 + 原理。链接标签如下所示： 46 + 47 + Link: https://lore.kernel.org/r/<message-id> 48 + 49 + 通过在git中添加以下钩子，可以将此配置为在发布 ``git am`` 时自动执行： 50 + 51 + .. code-block:: none 52 + 53 + $ git config am.messageid true 54 + $ cat >.git/hooks/applypatch-msg <<'EOF' 55 + #!/bin/sh 56 + . git-sh-setup 57 + perl -pi -e 's|^Message-Id:\s*<?([^>]+)>?$|Link: https://lore.kernel.org/r/$1|g;' "$1" 58 + test -x "$GIT_DIR/hooks/commit-msg" && 59 + exec "$GIT_DIR/hooks/commit-msg" ${1+"$@"} 60 + : 61 + EOF 62 + $ chmod a+x .git/hooks/applypatch-msg
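The `applypatch-msg` hook above rewrites the `Message-Id:` line that `git am` adds (with `am.messageid` enabled) into a `Link:` trailer. The following is a minimal Python sketch of that one substitution only, not a replacement for the hook; the function name `add_lore_link` is hypothetical, and the pattern is copied from the perl one-liner and applied line by line, as `perl -p` does.

```python
import re

# Same pattern as the perl one-liner in the hook: optional angle brackets
# around the message id, anchored to a whole line.
MESSAGE_ID = re.compile(r"^Message-Id:\s*<?([^>]+)>?$")


def add_lore_link(message: str) -> str:
    """Rewrite every 'Message-Id:' line of a commit message into a 'Link:' line.

    Hypothetical helper for illustration; the real hook edits the message
    file in place via `perl -pi`.
    """
    return "\n".join(
        MESSAGE_ID.sub(r"Link: https://lore.kernel.org/r/\1", line)
        for line in message.split("\n")
    )
```

Lines that do not start with `Message-Id:` pass through unchanged, so the function is safe to run on a message that has no such trailer.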

+21

Documentation/translations/zh_CN/maintainer/index.rst

··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/maintainer/index.rst 4 + 5 + ============== 6 + 内核维护者手册 7 + ============== 8 + 9 + 本文档是内核维护者手册的首页。 10 + 本手册还需要大量完善！请自由提出（和编写）本手册的补充内容。 11 + *译注：指英文原版* 12 + 13 + .. toctree:: 14 + :maxdepth: 2 15 + 16 + configure-git 17 + rebasing-and-merging 18 + pull-requests 19 + maintainer-entry-profile 20 + modifying-patches 21 +

+92

Documentation/translations/zh_CN/maintainer/maintainer-entry-profile.rst

··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/maintainer/maintainer-entry-profile.rst 4 + 5 + :译者: 6 + 7 + 吴想成 Wu XiangCheng <bobwxc@email.cn> 8 + 9 + .. _maintainerentryprofile_zh: 10 + 11 + 维护者条目概要 12 + ============== 13 + 14 + 维护人员条目概要补充了顶层过程文档（提交补丁，提交驱动程序……），增加了子系 15 + 统/设备驱动程序本地习惯以及有关补丁提交生命周期的相关内容。贡献者使用此文档 16 + 来调整他们的期望和避免常见错误；维护人员可以使用这些信息超越子系统层面查看 17 + 是否有机会汇聚到通用实践中。 18 + 19 + 20 + 总览 21 + ---- 22 + 23 + 提供了子系统如何操作的介绍。MAINTAINERS文件告诉了贡献者应发送某文件的补丁到哪， 24 + 但它没有传达其他子系统的本地基础设施和机制以协助开发。 25 + 26 + 请考虑以下问题： 27 + 28 + - 当补丁被本地树接纳或合并到上游时是否有通知？ 29 + - 子系统是否使用patchwork实例？Patchwork状态变更是否有通知？ 30 + - 是否有任何机器人或CI基础设施监视列表，或子系统是否使用自动测试反馈以便把 31 + 控接纳补丁？ 32 + - 被拉入-next的Git分支是哪个？ 33 + - 贡献者应针对哪个分支提交？ 34 + - 是否链接到其他维护者条目概要？例如一个设备驱动可能指向其父子系统的条目。 35 + 这使得贡献者意识到某维护者可能对提交链中其他维护者负有的义务。 36 + 37 + 38 + 提交检查单补遗 39 + -------------- 40 + 41 + 列出强制性和咨询性标准，超出通用标准“提交检查表”，以便维护者检查一个补丁是否 42 + 足够健康。例如：“通过checkpatch.pl，没有错误、没有警告。通过单元测试详见某处”。 43 + 44 + 提交检查单补遗还可以包括有关硬件规格状态的详细信息。例如，子系统接受补丁之前 45 + 是否需要考虑在某个修订版上发布的规范。 46 + 47 + 48 + 开发周期的关键日期 49 + ------------------ 50 + 51 + 提交者常常会误以为补丁可以在合并窗口关闭之前的任何时间发送，且下一个-rc1时仍 52 + 可以。事实上，大多数补丁都需要在下一个合并窗口打开之前提前进入linux-next中。 53 + 向提交者澄清关键日期（以-rc发布周为标志）以明确什么时候补丁会被考虑合并以及 54 + 何时需要等待下一个-rc。 55 + 56 + 至少需要讲明： 57 + 58 + - 最后一个可以提交新功能的-rc： 59 + 针对下一个合并窗口的新功能提交应该在此点之前首次发布以供考虑。在此时间点 60 + 之后提交的补丁应该明确他们的目标为下下个合并窗口，或者给出应加快进度被接受 61 + 的充足理由。通常新特性贡献者的提交应出现在-rc5之前。 62 + 63 + - 最后合并-rc：合并决策的最后期限。 64 + 向贡献者指出尚未接受的补丁集需要等待下下个合并窗口。当然，维护者没有义务 65 + 接受所有给定的补丁集，但是如果审阅在此时间点尚未结束，那么希望贡献者应该 66 + 等待并在下一个合并窗口重新提交。 67 + 68 + 可选项： 69 + 70 + - 开发基线分支的首个-rc，列在概述部分，视为已为新提交做好准备。 71 + 72 + 73 + 审阅节奏 74 + -------- 75 + 76 + 贡献者最担心的问题之一是：补丁集已发布却未收到反馈，应在多久后发送提醒。除了 77 + 指定在重新提交之前要等待多长时间，还可以指示更新的首选样式；例如，重新发送 78 + 整个系列，或私下发送提醒邮件。本节也可以列出本区域的代码审阅方式，以及获取 79 + 不能直接从维护者那里得到的反馈的方法。 80 + 81 + 82 + 现有概要 83 + -------- 84 + 85 + 这里列出了现有的维护人员条目概要；我们可能会想要在不久的将来做一些不同的事情。 86 + 87 + .. 
toctree:: 88 + :maxdepth: 1 89 + 90 + ../doc-guide/maintainer-profile 91 + ../../../nvdimm/maintainer-entry-profile 92 + ../../../riscv/patch-acceptance

+51

Documentation/translations/zh_CN/maintainer/modifying-patches.rst

··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/maintainer/modifying-patches.rst 4 + 5 + :译者: 6 + 7 + 吴想成 Wu XiangCheng <bobwxc@email.cn> 8 + 9 + .. _modifyingpatches_zh: 10 + 11 + 修改补丁 12 + ======== 13 + 14 + 如果你是子系统或者分支的维护者，由于代码在你的和提交者的树中并不完全相同， 15 + 有时你需要稍微修改一下收到的补丁以合并它们。 16 + 17 + 如果你严格遵守开发者来源证书的规则（c），你应该要求提交者重做，但这完全是会 18 + 适得其反的时间、精力浪费。规则（b）允许你调整代码，但这样修改提交者的代码并 19 + 让他背书你的错误是非常不礼貌的。为解决此问题，建议在你之前最后一个 20 + Signed-off-by标签和你的之间添加一行，以指示更改的性质。这没有强制性要求，最 21 + 好在描述前面加上你的邮件和/或姓名，用方括号括住整行，以明显指出你对最后一刻 22 + 的更改负责。例如:: 23 + 24 + Signed-off-by: Random J Developer <random@developer.example.org> 25 + [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h] 26 + Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org> 27 + 28 + 如果您维护着一个稳定的分支，并希望同时明确贡献、跟踪更改、合并修复，并保护 29 + 提交者免受责难，这种做法尤其有用。请注意，在任何情况下都不得更改作者的身份 30 + （From头），因为它会在变更日志中显示。 31 + 32 + 向后移植（back-port）人员特别要注意：为了便于跟踪，请在提交消息的顶部（即主题行 33 + 之后）插入补丁的来源，这是一种常见而有用的做法。例如，我们可以在3.x稳定版本 34 + 中看到以下内容:: 35 + 36 + Date: Tue Oct 7 07:26:38 2014 -0400 37 + 38 + libata: Un-break ATA blacklist 39 + 40 + commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream. 41 + 42 + 下面是一个旧的内核在某补丁被向后移植后会出现的:: 43 + 44 + Date: Tue May 13 22:12:27 2008 +0200 45 + 46 + wireless, airo: waitbusy() won't delay 47 + 48 + [backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a] 49 + 50 + 不管什么格式，这些信息都为人们跟踪你的树，以及试图解决你树中的错误的人提供了 51 + 有价值的帮助。

+148

Documentation/translations/zh_CN/maintainer/pull-requests.rst

··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/maintainer/pull-requests.rst 4 + 5 + :译者: 6 + 7 + 吴想成 Wu XiangCheng <bobwxc@email.cn> 8 + 9 + .. _pullrequests_zh: 10 + 11 + 如何创建拉取请求 12 + ================ 13 + 14 + 本章描述维护人员如何创建并向其他维护人员提交拉取请求。这对将更改从一个维护者 15 + 树转移到另一个维护者树非常有用。 16 + 17 + 本文档由Tobin C. Harding（当时他尚不是一名经验丰富的维护人员）编写，内容主要 18 + 来自Greg Kroah Hartman和Linus Torvalds在LKML上的评论。Jonathan Corbet和Mauro 19 + Carvalho Chehab提出了一些建议和修改。错误不可避免，如有问题，请找Tobin C. 20 + Harding <me@tobin.cc>。 21 + 22 + 原始邮件线程:: 23 + 24 + http://lkml.kernel.org/r/20171114110500.GA21175@kroah.com 25 + 26 + 27 + 创建分支 28 + -------- 29 + 30 + 首先，您需要将希望包含拉取请求里的所有更改都放在单独分支中。通常您将基于某开发 31 + 人员树的一个分支，一般是打算向其发送拉取请求的开发人员。 32 + 33 + 为了创建拉取请求，您必须首先标记刚刚创建的分支。建议您选择一个有意义的标记名， 34 + 以即使过了一段时间您和他人仍能理解的方式。在名称中包含源子系统和目标内核版本 35 + 的指示也是一个好的做法。 36 + 37 + Greg提供了以下内容。对于一个含有drivers/char中混杂事项、将应用于4.15-rc1内核的 38 + 拉取请求，可以命名为 ``char-misc-4.15-rc1`` 。如果要在 ``char-misc-next`` 分支 39 + 上打上此标记，您可以使用以下命令:: 40 + 41 + git tag -s char-misc-4.15-rc1 char-misc-next 42 + 43 + 这将在 ``char-misc-next`` 分支的最后一个提交上创建一个名为 ``char-misc-4.15-rc1`` 44 + 的标记，并用您的gpg密钥签名（参见 Documentation/maintainer/configure-git.rst ）。 45 + 46 + Linus只接受基于签名过的标记的拉取请求。其他维护者可能会有所不同。 47 + 48 + 当您运行上述命令时 ``git`` 会打开编辑器要求你描述一下这个标记。在本例中您需要 49 + 描述拉取请求，所以请概述一下包含的内容，为什么要合并，是否完成任何测试。所有 50 + 这些信息都将留在标记中，然后在维护者合并拉取请求时保留在合并提交中。所以把它 51 + 写好，它将永远留在内核中。 52 + 53 + 正如Linus所说:: 54 + 55 + 不管怎么样，至少对我来说，重要的是 *信息* 。我需要知道我在拉取什么、 56 + 为什么我要拉取。我也希望将此消息用于合并消息，因此它不仅应该对我有 57 + 意义，也应该可以成为一个有意义的历史记录。 58 + 59 + 注意，如果拉取请求有一些不寻常的地方，请详细说明。如果你修改了并非 60 + 由你维护的文件，请解释 **为什么** 。我总会在差异中看到的，如果你不 61 + 提的话，我只会觉得分外可疑。当你在合并窗口后给我发新东西的时候， 62 + （甚至是比较重大的错误修复），不仅需要解释做了什么、为什么这么做， 63 + 还请解释一下 **时间问题** 。为什么错过了合并窗口…… 64 + 65 + 我会看你写在拉取请求邮件和签名标记里面的内容，所以根据你的工作流， 66 + 你可以在签名标记里面描述工作内容（也会自动放进拉取请求邮件），也 67 + 可以只在标记里面放个占位符，稍后在你实际发给我拉取请求时描述工作内容。 68 + 69 + 是的，我会编辑这些消息。部分因为我需要做一些琐碎的格式调整（整体缩进、 70 + 括号等），也因为此消息可能对我有意义（描述了冲突或一些个人问题）而对 71 + 合并提交信息上下文没啥意义，因此我需要尽力让它有意义起来。我也会 72 + 
修复一些拼写和语法错误，特别是非母语者（母语者也是;^）。但我也会删掉 73 + 或增加一些内容。 74 + 75 + Linus 76 + 77 + Greg给出了一个拉取请求的例子:: 78 + 79 + Char/Misc patches for 4.15-rc1 80 + 81 + Here is the big char/misc patch set for the 4.15-rc1 merge window. 82 + Contained in here is the normal set of new functions added to all 83 + of these crazy drivers, as well as the following brand new 84 + subsystems: 85 + - time_travel_controller: Finally a set of drivers for the 86 + latest time travel bus architecture that provides i/o to 87 + the CPU before it asked for it, allowing uninterrupted 88 + processing 89 + - relativity_shifters: due to the affect that the 90 + time_travel_controllers have on the overall system, there 91 + was a need for a new set of relativity shifter drivers to 92 + accommodate the newly formed black holes that would 93 + threaten to suck CPUs into them. This subsystem handles 94 + this in a way to successfully neutralize the problems. 95 + There is a Kconfig option to force these to be enabled 96 + when needed, so problems should not occur. 97 + 98 + All of these patches have been successfully tested in the latest 99 + linux-next releases, and the original problems that it found have 100 + all been resolved (apologies to anyone living near Canberra for the 101 + lack of the Kconfig options in the earlier versions of the 102 + linux-next tree creations.) 
103 + 104 + Signed-off-by: Your-name-here <your_email@domain> 105 + 106 + 107 + 此标记消息格式就像一个git提交。顶部有一行“总结标题”，一定要在下面sign-off。 108 + 109 + 现在您已经有了一个本地签名标记，您需要将它推送到可以被拉取的位置:: 110 + 111 + git push origin char-misc-4.15-rc1 112 + 113 + 114 + 创建拉取请求 115 + ------------ 116 + 117 + 最后要做的是创建拉取请求消息。可以使用 ``git request-pull`` 命令让 ``git`` 118 + 为你做这件事，但它需要确定你想拉取什么，以及拉取针对的基础（显示正确的拉取 119 + 更改和变更状态）。以下命令将生成一个拉取请求:: 120 + 121 + git request-pull master git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ char-misc-4.15-rc1 122 + 123 + 引用Greg的话:: 124 + 125 + 此命令要求git比较从“char-misc-4.15-rc1”标记位置到“master”分支头（上述 126 + 例子中指向了我从Linus的树分叉的地方，通常是-rc发布）的差异，并去使用 127 + git:// 协议拉取。如果你希望使用 https:// 协议，也可以用在这里（但是请 128 + 注意，部分人由于防火墙问题没法用https协议拉取）。 129 + 130 + 如果char-misc-4.15-rc1标记没有出现在我要求拉取的仓库中，git会提醒 131 + 它不在那里，所以记得推送到公开地方。 132 + 133 + “git request-pull”会包含git树的地址和需要拉取的特定标记，以及标记 134 + 描述全文（详尽描述标记）。同时它也会创建此拉取请求的差异状态和单个 135 + 提交的缩短日志。 136 + 137 + Linus回复说他倾向于 ``git://`` 协议。其他维护者可能有不同的偏好。另外，请注意 138 + 如果你创建的拉取请求没有签名标记， ``https://`` 可能是更好的选择。完整的讨论 139 + 请看原邮件。 140 + 141 + 142 + 提交拉取请求 143 + ------------ 144 + 145 + 拉取请求的提交方式与普通补丁相同。向维护人员发送内联电子邮件并抄送LKML以及 146 + 任何必要特定子系统的列表。对Linus的拉取请求通常有如下主题行:: 147 + 148 + [GIT PULL] <subsystem> changes for v4.15-rc1
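The walkthrough above boils down to three git invocations: sign a tag on the branch, push the tag, and generate the request text. As a rough sketch, the helper below only assembles those command lines so they can be inspected before being run. The function name and its parameters are hypothetical, and the `<subsystem>-<version>` tag scheme is generalized from the single `char-misc-4.15-rc1` example, so treat it as an assumption rather than a rule.

```python
def pull_request_commands(subsystem, version, branch, base, url, remote="origin"):
    """Return the git command lines from the walkthrough, as argument lists.

    Hypothetical helper: nothing is executed here. The tag name follows the
    example in the text (subsystem plus target kernel version).
    """
    tag = f"{subsystem}-{version}"
    return [
        ["git", "tag", "-s", tag, branch],         # signed tag on the branch tip
        ["git", "push", remote, tag],              # publish the tag so it can be pulled
        ["git", "request-pull", base, url, tag],   # generate the pull request text
    ]


def pull_request_subject(subsystem, version):
    """Subject line in the shape the document shows for requests sent to Linus."""
    return f"[GIT PULL] {subsystem} changes for v{version}"
```

Returning argument lists rather than shell strings keeps the sketch free of quoting issues if the lists are later passed to a process runner.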

+165

Documentation/translations/zh_CN/maintainer/rebasing-and-merging.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + .. include:: ../disclaimer-zh_CN.rst 4 + 5 + :Original: Documentation/maintainer/rebasing-and-merging.rst 6 + 7 + :译者: 8 + 9 + 吴想成 Wu XiangCheng <bobwxc@email.cn> 10 + 11 + ========== 12 + 变基与合并 13 + ========== 14 + 15 + 一般来说，维护子系统需要熟悉Git源代码管理系统。Git是一个功能强大的工具，有 16 + 很多功能；就像这类工具常出现的情况一样，使用这些功能的方法有对有错。本文档 17 + 特别介绍了变基与合并的用法。维护者经常在错误使用这些工具时遇到麻烦，但避免 18 + 问题实际上并不那么困难。 19 + 20 + 总的来说，需要注意的一点是：与许多其他项目不同，内核社区并不害怕在其开发历史 21 + 中看到合并提交。事实上，考虑到该项目的规模，避免合并几乎是不可能的。维护者会 22 + 在希望避免合并时遇到一些问题，而过于频繁的合并也会带来另一些问题。 23 + 24 + 变基 25 + ==== 26 + 27 + “变基（Rebase）”是更改存储库中一系列提交的历史记录的过程。有两种不同类型的操作 28 + 都被称为变基，因为这两种操作都使用 ``git rebase`` 命令，但它们之间存在显著 29 + 差异： 30 + 31 + - 更改一系列补丁的父提交（起始提交）。例如，变基操作可以将基于上一内核版本 32 + 的一个补丁集重建到当前版本上。在下面的讨论中，我们将此操作称为“变根”。 33 + 34 + - 通过修复（或删除）损坏的提交、添加补丁、添加标记以更改一系列补丁的历史， 35 + 来提交变更日志或更改已应用提交的顺序。在下文中，这种类型的操作称为“历史 36 + 修改”。 37 + 38 + 术语“变基”将用于指代上述两种操作。如果使用得当，变基可以产生更清晰、更整洁的 39 + 开发历史；如果使用不当，它可能会模糊历史并引入错误。 40 + 41 + 以下一些经验法则可以帮助开发者避免最糟糕的变基风险： 42 + 43 + - 已经发布到你私人系统之外世界的历史通常不应更改。其他人可能会拉取你的树 44 + 的副本，然后基于它进行工作；修改你的树会给他们带来麻烦。如果工作需要变基， 45 + 这通常是表明它还没有准备好提交到公共存储库的信号。 46 + 47 + 但是，总有例外。有些树（linux-next是一个典型的例子）由于它们的需要经常 48 + 变基，开发人员知道不要基于它们来工作。开发人员有时会公开一个不稳定的分支， 49 + 供其他人或自动测试服务进行测试。如果您确实以这种方式公开了一个可能不稳定 50 + 的分支，请确保潜在使用者知道不要基于它来工作。 51 + 52 + - 不要在包含由他人创建的历史的分支上变基。如果你从别的开发者的仓库拉取了变更， 53 + 那你现在就成了他们历史记录的保管人。你不应该改变它，除了少数例外情况。例如 54 + 树中有问题的提交必须显式恢复（即通过另一个补丁修复），而不是通过修改历史而 55 + 消失。 56 + 57 + - 没有合理理由，不要对树变根。仅为了切换到更新的基或避免与上游储存库的合并 58 + 通常不是合理理由。 59 + 60 + - 如果你必须对储存库进行变根，请不要随机选取一个提交作为新基。在发布节点之间 61 + 内核通常处于一个相对不稳定的状态；基于其中某点进行开发会增加遇到意外错误的 62 + 几率。当一系列补丁必须移动到新基时，请选择移动到一个稳定节点（例如-rc版本 63 + 节点）。 64 + 65 + - 请知悉对补丁系列进行变根（或做明显的历史修改）会改变它们的开发环境，且很 66 + 可能使做过的大部分测试失效。一般来说，变基后的补丁系列应当像新代码一样对 67 + 待，并重新测试。 68 + 69 + 合并窗口麻烦的一个常见原因是，Linus收到了一个明显在拉取请求发送之前不久才变根 70 + （通常是变根到随机的提交上）的补丁系列。这样一个系列被充分测试的可能性相对较 71 + 低，拉取请求被接受的几率也同样较低。 72 + 73 + 相反，如果变基仅限于私有树、提交基于一个通用的起点、且经过充分测试，则引起 74 + 麻烦的可能性就很低。 75 + 76 + 合并 77 + ==== 78 + 79 + 
内核开发过程中，合并是一个很常见的操作；5.1版本开发周期中有超过1126个合并 80 + ——差不多占了整体的9%。内核开发工作积累在100多个不同的子系统树中，每个 81 + 子系统树都可能包含多个主题分支；每个分支通常独立于其他分支进行开发。因此 82 + 在任何给定分支进入上游储存库之前，至少需要一次合并。 83 + 84 + 许多项目要求拉取请求中的分支基于当前主干，这样历史记录中就不会出现合并提交。 85 + 内核并不是这样；任何为了避免合并而重新对分支变基都很可能导致麻烦。 86 + 87 + 子系统维护人员发现他们必须进行两种类型的合并：从较低层级的子系统树和从其他 88 + 子系统树（同级树或主线）进行合并。这两种情况下要遵循的最佳实践是不同的。 89 + 90 + 合并较低层级树 91 + -------------- 92 + 93 + 较大的子系统往往有多个级别的维护人员，较低级别的维护人员向较高级别发送拉取 94 + 请求。合并这样的请求几乎肯定会生成一个合并提交；这也是应该的。实际上， 95 + 子系统维护人员可能希望在极少数快进合并情况下使用 ``--no-ff`` 标志来强制添加 96 + 合并提交，以便记录合并的原因。 **任何** 类型的合并的变更日志必须说明 97 + *为什么* 合并。对于较低级别的树，“为什么”通常是对该拉取所带来的变化的总结。 98 + 99 + 各级维护人员都应在他们的拉取请求上使用经签名的标签，上游维护人员应在拉取分支 100 + 时验证标签。不这样做会威胁整个开发过程的安全。 101 + 102 + 根据上面列出的规则，一旦您将其他人的历史记录合并到树中，您就不得对该分支进行 103 + 变基，即使您能够这样做。 104 + 105 + 合并同级树或上游树 106 + ------------------ 107 + 108 + 虽然来自下游的合并是常见且不起眼的，但当需要将一个分支推向上游时，其中来自 109 + 其他树的合并往往是一个危险信号。这种合并需要仔细考虑并加以充分证明，否则后续 110 + 的拉取请求很可能会被拒绝。 111 + 112 + 想要将主分支合并到存储库中是很自然的；这种类型的合并通常被称为“反向合并” 113 + 。反向合并有助于确保与并行的开发没有冲突，并且通常会给人一种温暖、舒服的 114 + 感觉，即处于最新。但这种诱惑几乎总是应该避免的。 115 + 116 + 为什么呢？反向合并将搅乱你自己分支的开发历史。它们会大大增加你遇到来自社区 117 + 其他地方的错误的机会，且使你很难确保你所管理的工作稳定并准备好合入上游。 118 + 频繁的合并还可以掩盖树中开发过程中的问题；它们会隐藏与其他树的交互，而这些 119 + 交互不应该（经常）发生在管理良好的分支中。 120 + 121 + 也就是说，偶尔需要进行反向合并；当这种情况发生时，一定要在提交信息中记录 122 + *为什么* 。同样，在一个众所周知的稳定点进行合并，而不是随机提交。即使这样， 123 + 你也不应该反向合并一棵比你的直接上游树更高层级的树；如果确实需要更高级别的 124 + 反向合并，应首先在上游树进行。 125 + 126 + 导致合并相关问题最常见的原因之一是：在发送拉取请求之前维护者合并上游以解决 127 + 合并冲突。同样，这种诱惑很容易理解，但绝对应该避免。对于最终拉取请求来说 128 + 尤其如此：Linus坚信他更愿意看到合并冲突，而不是不必要的反向合并。看到冲突 129 + 可以让他了解潜在的问题所在。他做过很多合并（在5.1版本开发周期中是382次）， 130 + 而且在解决冲突方面也很在行——通常比参与的开发人员要强。 131 + 132 + 那么，当他们的子系统分支和主线之间发生冲突时，维护人员应该怎么做呢？最重要 133 + 的一步是在拉取请求中提示Linus会发生冲突；如果啥都没说则表明您的分支可以正常 134 + 合入。对于特别困难的冲突，创建并推送一个 *独立* 分支来展示你将如何解决问题。 135 + 在拉取请求中提到该分支，但是请求本身应该针对未合并的分支。 136 + 137 + 即使不存在已知冲突，在发送拉取请求之前进行合并测试也是个好主意。它可能会提醒 138 + 您一些在linux-next树中没有发现的问题，并帮助您准确地理解您正在要求上游做什么。 139 + 140 + 合并上游树或另一个子系统树的另一个原因是解决依赖关系。这些依赖性问题有时确实 141 + 会发生，而且有时与另一棵树交叉合并是解决这些问题的最佳方法；同样，在这种情况 142 + 
下，合并提交应该解释为什么要进行合并。花点时间把它做好；会有人阅读这些变更 143 + 日志。 144 + 145 + 然而依赖性问题通常表明需要改变方法。合并另一个子系统树以解决依赖性风险会带来 146 + 其他缺陷，几乎永远不应这样做。如果该子系统树无法被合到上游，那么它的任何问题 147 + 也都会阻碍你的树合并。更可取的选择包括与维护人员达成一致意见，在其中一个树中 148 + 同时进行两组更改；或者创建一个主题分支专门处理可以合并到两个树中的先决条件提交。 149 + 如果依赖关系与主要的基础结构更改相关，正确的解决方案可能是将依赖提交保留一个 150 + 开发周期，以便这些更改有时间在主线上稳定。 151 + 152 + 最后 153 + ==== 154 + 155 + 在开发周期的开头合并主线是比较常见的，可以获取树中其他地方的更改和修复。同样， 156 + 这样的合并应该选择一个众所周知的发布点，而不是一些随机点。如果在合并窗口期间 157 + 上游分支已完全清空到主线中，则可以使用以下命令向前拉取它:: 158 + 159 + git merge v5.2-rc1^0 160 + 161 + “^0”使Git执行快进合并（在这种情况下这应该可以），从而避免多余的虚假合并提交。 162 + 163 + 上面列出的就是指导方针了。总是会有一些情况需要不同的解决方案，这些指导原则 164 + 不应阻止开发人员在需要时做正确的事情。但是，我们应该时刻考虑是否真的出现了 165 + 这样的需求，并准备好解释为什么需要做一些不寻常的事情。

+42

Documentation/translations/zh_CN/parisc/debugging.rst

··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/parisc/debugging.rst 4 + :Translator: Yanteng Si <siyanteng@loongson.cn> 5 + 6 + .. _cn_parisc_debugging: 7 + 8 + ================= 9 + 调试PA-RISC 10 + ================= 11 + 12 + 好吧，这里有一些关于调试linux/parisc的较底层部分的信息。 13 + 14 + 15 + 1. 绝对地址 16 + ===================== 17 + 18 + 很多汇编代码目前运行在实模式下，这意味着会使用绝对地址，而不是像内核其他 19 + 部分那样使用虚拟地址。要将绝对地址转换为虚拟地址，你可以在System.map中查 20 + 找，添加__PAGE_OFFSET（目前是0x10000000）。 21 + 22 + 23 + 2. HPMCs 24 + ======== 25 + 26 + 当实模式的代码试图访问不存在的内存时，会出现HPMC（high priority machine 27 + check）而不是内核oops。若要调试HPMC，请尝试找到系统响应程序/请求程序地址。 28 + 系统请求程序地址应该与（某）处理器的HPA（I/O范围内的高地址）相匹配；系统响应程 29 + 序地址是实模式代码试图访问的地址。 30 + 31 + 系统响应程序地址的典型值是大于__PAGE_OFFSET （0x10000000）的地址，这意味着 32 + 在实模式试图访问它之前，虚拟地址没有被翻译成物理地址。 33 + 34 + 35 + 3. 有趣的Q位 36 + ============ 37 + 38 + 某些非常关键的代码必须清除PSW中的Q位。当Q位被清除时，CPU不会更新中断处理 39 + 程序所读取的寄存器，以找出机器被中断的位置——所以如果你在清除Q位的指令和再 40 + 次设置Q位的RFI之间遇到中断，你不知道它到底发生在哪里。如果你幸运的话，IAOQ 41 + 会指向清除Q位的指令，如果你不幸运的话，它会指向任何地方。通常Q位的问题会 42 + 表现为无法解释的系统挂起或物理内存越界。

+28

Documentation/translations/zh_CN/parisc/index.rst

··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + .. include:: ../disclaimer-zh_CN.rst 3 + 4 + :Original: Documentation/parisc/index.rst 5 + :Translator: Yanteng Si <siyanteng@loongson.cn> 6 + 7 + .. _cn_parisc_index: 8 + 9 + ==================== 10 + PA-RISC体系架构 11 + ==================== 12 + 13 + .. toctree:: 14 + :maxdepth: 2 15 + 16 + debugging 17 + registers 18 + 19 + Todolist: 20 + 21 + features 22 + 23 + .. only:: subproject and html 24 + 25 + Indices 26 + ======= 27 + 28 + * :ref:`genindex`

+153

Documentation/translations/zh_CN/parisc/registers.rst

··· 1 + .. include:: ../disclaimer-zh_CN.rst 2 + 3 + :Original: Documentation/parisc/registers.rst 4 + :Translator: Yanteng Si <siyanteng@loongson.cn> 5 + 6 + .. _cn_parisc_registers: 7 + 8 + ========================= 9 + Linux/PA-RISC的寄存器用法 10 + ========================= 11 + 12 + [ 用星号表示目前尚未实现的计划用途。 ] 13 + 14 + ABI约定的通用寄存器 15 + =================== 16 + 17 + 控制寄存器 18 + ---------- 19 + 20 + ============================ ================================= 21 + CR 0 (恢复计数器) 用于ptrace 22 + CR 1-CR 7(无定义) 未使用 23 + CR 8 (Protection ID) 每进程值* 24 + CR 9, 12, 13 (PIDS) 未使用 25 + CR10 (CCR) FPU延迟保存* 26 + CR11 按照ABI的规定（SAR） 27 + CR14 (中断向量) 初始化为 fault_vector 28 + CR15 (EIEM) 所有位初始化为1* 29 + CR16 (间隔计时器) 读取周期数/写入开始时间间隔计时器 30 + CR17-CR22 中断参数 31 + CR19 中断指令寄存器 32 + CR20 中断空间寄存器 33 + CR21 中断偏移量寄存器 34 + CR22 中断 PSW 35 + CR23 (EIRR) 读取未决中断/写入清除位 36 + CR24 (TR 0) 内核空间页目录指针 37 + CR25 (TR 1) 用户空间页目录指针 38 + CR26 (TR 2) 不使用 39 + CR27 (TR 3) 线程描述符指针 40 + CR28 (TR 4) 不使用 41 + CR29 (TR 5) 不使用 42 + CR30 (TR 6) 当前 / 0 43 + CR31 (TR 7) 临时寄存器，在不同地方使用 44 + ============================ ================================= 45 + 46 + 空间寄存器（内核模式） 47 + ---------------------- 48 + 49 + ======== ============================== 50 + SR0 临时空间寄存器 51 + SR4-SR7 设置为0 52 + SR1 临时空间寄存器 53 + SR2 内核不应该破坏它 54 + SR3 用于用户空间访问（当前进程） 55 + ======== ============================== 56 + 57 + 空间寄存器（用户模式） 58 + ---------------------- 59 + 60 + ======== ============================ 61 + SR0 临时空间寄存器 62 + SR1 临时空间寄存器 63 + SR2 保存Linux gateway page的空间 64 + SR3 在内核中保存用户地址空间的值 65 + SR4-SR7 定义了用户/内核的短地址空间 66 + ======== ============================ 67 + 68 + 69 + 处理器状态字 70 + ------------ 71 + 72 + ====================== ================================================ 73 + W （64位地址） 0 74 + E （小尾端） 0 75 + S （安全间隔计时器） 0 76 + T （产生分支陷阱） 0 77 + H （高特权级陷阱） 0 78 + L （低特权级陷阱） 0 79 + N （撤销下一条指令）被C代码使用 80 + X （数据存储中断禁用） 0 81 + B （产生分支）被C代码使用 82 + C （代码地址转译） 1, 在执行实模式代码时为0 83 + V （除法步长校正）被C代码使用 84 + M （HPMC 掩码） 0, 在执行HPMC操作*时为1 85 + C/B 
（进/借位）被C代码使用 86 + O （有序引用） 1* 87 + F （性能监视器） 0 88 + R （回收计数器陷阱） 0 89 + Q （收集中断状态） 1 （在rfi之前的代码中为0） 90 + P （保护标识符） 1* 91 + D （数据地址转译） 1, 在执行实模式代码时为0 92 + I （外部中断掩码）由cli()/sti()宏使用。 93 + ====================== ================================================ 94 + 95 + “隐形”寄存器（影子寄存器） 96 + --------------------------- 97 + 98 + ============= =================== 99 + PSW W 默认值 0 100 + PSW E 默认值 0 101 + 影子寄存器被中断处理代码使用 102 + TOC启用位 1 103 + ============= =================== 104 + 105 + ---------------------------------------------------------- 106 + 107 + PA-RISC架构定义了7个寄存器作为“影子寄存器”。这些寄存器在 108 + RETURN FROM INTERRUPTION AND RESTORE指令中使用，通过消 109 + 除中断处理程序中对一般寄存器（GR）的保存和恢复的需要来减 110 + 少状态保存和恢复时间。影子寄存器是GRs 1, 8, 9, 16, 17, 111 + 24和25。 112 + 113 + ------------------------------------------------------------------------- 114 + 115 + 寄存器使用说明，最初由John Marvin提供，并由Randolph Chung提供一些补充说明。 116 + 117 + 对于通用寄存器: 118 + 119 + r1,r2,r19-r26,r28,r29 & r31可以在不保存它们的情况下被使用。当然，如果你 120 + 关心它们，在调用另一个程序之前，你也需要保存它们。上面的一些寄存器确实 121 + 有特殊的含义，你应该注意一下: 122 + 123 + r1: 124 + addil指令是硬性规定将其结果放在r1中，所以如果你使用这条指令要 125 + 注意这点。 126 + 127 + r2: 128 + 这就是返回指针。一般来说，你不想使用它，因为你需要这个指针来返 129 + 回给你的调用者。然而，它与这组寄存器组合在一起，因为调用者不能 130 + 依赖你返回时的值是相同的，也就是说，你可以将r2复制到另一个寄存 131 + 器，并在作废r2后通过该寄存器返回，这应该不会给调用程序带来问题。 132 + 133 + r19-r22: 134 + 这些通常被认为是临时寄存器。 135 + 请注意，在64位中它们是arg7-arg4。 136 + 137 + r23-r26: 138 + 这些是arg3-arg0，也就是说，如果你不再关心传入的值，你可以使用 139 + 它们。 140 + 141 + r28,r29: 142 + 这俩是ret0和ret1。它们是你传入返回值的地方。r28是主返回值。当返回 143 + 小结构体时，r29也可以用来将数据传回给调用程序。 144 + 145 + r30: 146 + 栈指针 147 + 148 + r31: 149 + ble指令将返回指针放在这里。 150 + 151 + 152 + r3-r18,r27,r30需要被保存和恢复。r3-r18只是一般用途的寄存器。 153 + r27是数据指针，用来使对全局变量的引用更容易。r30是栈指针。

+1 -1

Documentation/translations/zh_CN/process/8.Conclusion.rst

··· 19 19 :ref:`Documentation/translations/zh_CN/process/howto.rst <cn_process_howto>` 20 20 文件是一个重要的起点； 21 21 :ref:`Documentation/translations/zh_CN/process/submitting-patches.rst <cn_submittingpatches>` 22 - 和 :ref:`Documentation/transaltions/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>` 22 + 和 :ref:`Documentation/translations/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>` 23 23 也是所有内核开发人员都应该阅读的内容。许多内部内核API都是使用kerneldoc机制 24 24 记录的；“make htmldocs”或“make pdfdocs”可用于以HTML或PDF格式生成这些文档 25 25 （尽管某些发行版提供的tex版本会遇到内部限制，无法正确处理文档）。

+1 -1

Documentation/translations/zh_CN/process/coding-style.rst

··· 61 61 case 'K': 62 62 case 'k': 63 63 mem <<= 10; 64 - /* fall through */ 64 + fallthrough; 65 65 default: 66 66 break; 67 67 }

+1 -1

Documentation/usb/ehci.rst

··· 1 - =========== 1 + =========== 2 2 EHCI driver 3 3 =========== 4 4

+1 -1

Documentation/usb/gadget_printer.rst

··· 1 - =============================== 1 + =============================== 2 2 Linux USB Printer Gadget Driver 3 3 =============================== 4 4

+6 -5

Documentation/userspace-api/landlock.rst

··· 145 145 146 146 Landlock enables to restrict access to file hierarchies, which means that these 147 147 access rights can be propagated with bind mounts (cf. 148 - :doc:`/filesystems/sharedsubtree`) but not with :doc:`/filesystems/overlayfs`. 148 + Documentation/filesystems/sharedsubtree.rst) but not with 149 + Documentation/filesystems/overlayfs.rst. 149 150 150 151 A bind mount mirrors a source file hierarchy to a destination. The destination 151 152 hierarchy is then composed of the exact same files, on which Landlock rules can ··· 171 170 172 171 Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain 173 172 restrictions from its parent. This is similar to the seccomp inheritance (cf. 174 - :doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's 175 - :manpage:`credentials(7)`. For instance, one process's thread may apply 173 + Documentation/userspace-api/seccomp_filter.rst) or any other LSM dealing with 174 + task's :manpage:`credentials(7)`. For instance, one process's thread may apply 176 175 Landlock rules to itself, but they will not be automatically applied to other 177 176 sibling threads (unlike POSIX thread credential changes, cf. 178 177 :manpage:`nptl(7)`). ··· 279 278 ------------ 280 279 281 280 Kernel memory allocated to create rulesets is accounted and can be restricted 282 - by the :doc:`/admin-guide/cgroup-v1/memory`. 281 + by the Documentation/admin-guide/cgroup-v1/memory.rst. 283 282 284 283 Questions and answers 285 284 ===================== ··· 304 303 Additional documentation 305 304 ======================== 306 305 307 - * :doc:`/security/landlock` 306 + * Documentation/security/landlock.rst 308 307 * https://landlock.io 309 308 310 309 .. Links

+1 -1

Documentation/virt/kvm/api.rst

··· 6620 6620 by running an enclave in a VM, KVM prevents access to privileged attributes by 6621 6621 default. 6622 6622 6623 - See Documentation/x86/sgx/2.Kernel-internals.rst for more details. 6623 + See Documentation/x86/sgx.rst for more details. 6624 6624 6625 6625 7.26 KVM_CAP_PPC_RPT_INVALIDATE 6626 6626 -------------------------------

+1 -1

Documentation/virt/kvm/s390-pv-boot.rst

··· 10 10 I/O or the hypervisor. In those cases where the hypervisor needs to 11 11 access the memory of a PVM, that memory must be made accessible. 12 12 Memory made accessible to the hypervisor will be encrypted. See 13 - :doc:`s390-pv` for details." 13 + Documentation/virt/kvm/s390-pv.rst for details." 14 14 15 15 On IPL (boot) a small plaintext bootloader is started, which provides 16 16 information about the encrypted components and necessary metadata to

+1 -1

Documentation/virt/kvm/vcpu-requests.rst

··· 304 304 References 305 305 ========== 306 306 307 - .. [atomic-ops] Documentation/core-api/atomic_ops.rst 307 + .. [atomic-ops] Documentation/atomic_bitops.txt and Documentation/atomic_t.txt 308 308 .. [memory-barriers] Documentation/memory-barriers.txt 309 309 .. [lwn-mb] https://lwn.net/Articles/573436/

+2 -2

Documentation/vm/zswap.rst

··· 10 10 Zswap is a lightweight compressed cache for swap pages. It takes pages that are 11 11 in the process of being swapped out and attempts to compress them into a 12 12 dynamically allocated RAM-based memory pool. zswap basically trades CPU cycles 13 - for potentially reduced swap I/O. This trade-off can also result in a 13 + for potentially reduced swap I/O. This trade-off can also result in a 14 14 significant performance improvement if reads from the compressed cache are 15 15 faster than reads from a swap device. 16 16 ··· 26 26 performance impact of swapping. 27 27 * Overcommitted guests that share a common I/O resource can 28 28 dramatically reduce their swap I/O pressure, avoiding heavy handed I/O 29 - throttling by the hypervisor. This allows more work to get done with less 29 + throttling by the hypervisor. This allows more work to get done with less 30 30 impact to the guest workload and guests sharing the I/O subsystem 31 31 * Users with SSDs as swap devices can extend the life of the device by 32 32 drastically reducing life-shortening writes.

+2 -2

Documentation/x86/boot.rst

··· 1343 1343 In addition to read/modify/write the setup header of the struct 1344 1344 boot_params as that of 16-bit boot protocol, the boot loader should 1345 1345 also fill the additional fields of the struct boot_params as 1346 - described in chapter :doc:`zero-page`. 1346 + described in chapter Documentation/x86/zero-page.rst. 1347 1347 1348 1348 After setting up the struct boot_params, the boot loader can load the 1349 1349 32/64-bit kernel in the same way as that of 16-bit boot protocol. ··· 1379 1379 In addition to read/modify/write the setup header of the struct 1380 1380 boot_params as that of 16-bit boot protocol, the boot loader should 1381 1381 also fill the additional fields of the struct boot_params as described 1382 - in chapter :doc:`zero-page`. 1382 + in chapter Documentation/x86/zero-page.rst. 1383 1383 1384 1384 After setting up the struct boot_params, the boot loader can load 1385 1385 64-bit kernel in the same way as that of 16-bit boot protocol, but

+1 -1

Documentation/x86/mtrr.rst

··· 28 28 firmware code though and the OS does not make any specific MTRR mapping 29 29 requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID. 30 30 31 - For details refer to :doc:`pat`. 31 + For details refer to Documentation/x86/pat.rst. 32 32 33 33 .. tip:: 34 34 On Intel P6 family processors (Pentium Pro, Pentium II and later)

+1 -1

include/linux/device.h

··· 399 399 * along with subsystem-level and driver-level callbacks. 400 400 * @em_pd: device's energy model performance domain 401 401 * @pins: For device pin management. 402 - * See Documentation/driver-api/pinctl.rst for details. 402 + * See Documentation/driver-api/pin-control.rst for details. 403 403 * @msi_list: Hosts MSI descriptors 404 404 * @msi_domain: The generic MSI domain this device is using. 405 405 * @numa_node: NUMA node this device is close to.

+1 -1

include/linux/mfd/madera/pdata.h

··· 31 31 * @irq_flags: Mode for primary IRQ (defaults to active low) 32 32 * @gpio_base: Base GPIO number 33 33 * @gpio_configs: Array of GPIO configurations (See 34 - * Documentation/driver-api/pinctl.rst) 34 + * Documentation/driver-api/pin-control.rst) 35 35 * @n_gpio_configs: Number of entries in gpio_configs 36 36 * @gpsw: General purpose switch mode setting. Depends on the external 37 37 * hardware connected to the switch. (See the SW1_MODE field

+1 -1

include/linux/pinctrl/pinconf-generic.h

··· 89 89 * it. 90 90 * @PIN_CONFIG_OUTPUT: this will configure the pin as an output and drive a 91 91 * value on the line. Use argument 1 to indicate high level, argument 0 to 92 - * indicate low level. (Please see Documentation/driver-api/pinctl.rst, 92 + * indicate low level. (Please see Documentation/driver-api/pin-control.rst, 93 93 * section "GPIO mode pitfalls" for a discussion around this parameter.) 94 94 * @PIN_CONFIG_PERSIST_STATE: retain pin state across sleep or controller reset 95 95 * @PIN_CONFIG_POWER_SOURCE: if the pin can select between different power

+1 -1

include/linux/platform_profile.h

··· 2 2 /* 3 3 * Platform profile sysfs interface 4 4 * 5 - * See Documentation/ABI/testing/sysfs-platform_profile.rst for more 5 + * See Documentation/userspace-api/sysfs-platform_profile.rst for more 6 6 * information. 7 7 */ 8 8

+16 -15

samples/kprobes/kprobe_example.c

··· 10 10 * whenever kernel_clone() is invoked to create a new process. 11 11 */ 12 12 13 + #define pr_fmt(fmt) "%s: " fmt, __func__ 14 + 13 15 #include <linux/kernel.h> 14 16 #include <linux/module.h> 15 17 #include <linux/kprobes.h> ··· 29 27 static int __kprobes handler_pre(struct kprobe *p, struct pt_regs *regs) 30 28 { 31 29 #ifdef CONFIG_X86 32 - pr_info("<%s> pre_handler: p->addr = 0x%p, ip = %lx, flags = 0x%lx\n", 30 + pr_info("<%s> p->addr = 0x%p, ip = %lx, flags = 0x%lx\n", 33 31 p->symbol_name, p->addr, regs->ip, regs->flags); 34 32 #endif 35 33 #ifdef CONFIG_PPC 36 - pr_info("<%s> pre_handler: p->addr = 0x%p, nip = 0x%lx, msr = 0x%lx\n", 34 + pr_info("<%s> p->addr = 0x%p, nip = 0x%lx, msr = 0x%lx\n", 37 35 p->symbol_name, p->addr, regs->nip, regs->msr); 38 36 #endif 39 37 #ifdef CONFIG_MIPS 40 - pr_info("<%s> pre_handler: p->addr = 0x%p, epc = 0x%lx, status = 0x%lx\n", 38 + pr_info("<%s> p->addr = 0x%p, epc = 0x%lx, status = 0x%lx\n", 41 39 p->symbol_name, p->addr, regs->cp0_epc, regs->cp0_status); 42 40 #endif 43 41 #ifdef CONFIG_ARM64 44 - pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx," 45 - " pstate = 0x%lx\n", 42 + pr_info("<%s> p->addr = 0x%p, pc = 0x%lx, pstate = 0x%lx\n", 46 43 p->symbol_name, p->addr, (long)regs->pc, (long)regs->pstate); 47 44 #endif 48 45 #ifdef CONFIG_ARM 49 - pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx, cpsr = 0x%lx\n", 46 + pr_info("<%s> p->addr = 0x%p, pc = 0x%lx, cpsr = 0x%lx\n", 50 47 p->symbol_name, p->addr, (long)regs->ARM_pc, (long)regs->ARM_cpsr); 51 48 #endif 52 49 #ifdef CONFIG_RISCV 53 - pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx, status = 0x%lx\n", 50 + pr_info("<%s> p->addr = 0x%p, pc = 0x%lx, status = 0x%lx\n", 54 51 p->symbol_name, p->addr, regs->epc, regs->status); 55 52 #endif 56 53 #ifdef CONFIG_S390 57 - pr_info("<%s> pre_handler: p->addr, 0x%p, ip = 0x%lx, flags = 0x%lx\n", 54 + pr_info("<%s> p->addr, 0x%p, ip = 0x%lx, flags = 0x%lx\n", 58 55 p->symbol_name, p->addr, 
regs->psw.addr, regs->flags); 59 56 #endif 60 57 ··· 66 65 unsigned long flags) 67 66 { 68 67 #ifdef CONFIG_X86 69 - pr_info("<%s> post_handler: p->addr = 0x%p, flags = 0x%lx\n", 68 + pr_info("<%s> p->addr = 0x%p, flags = 0x%lx\n", 70 69 p->symbol_name, p->addr, regs->flags); 71 70 #endif 72 71 #ifdef CONFIG_PPC 73 - pr_info("<%s> post_handler: p->addr = 0x%p, msr = 0x%lx\n", 72 + pr_info("<%s> p->addr = 0x%p, msr = 0x%lx\n", 74 73 p->symbol_name, p->addr, regs->msr); 75 74 #endif 76 75 #ifdef CONFIG_MIPS 77 - pr_info("<%s> post_handler: p->addr = 0x%p, status = 0x%lx\n", 76 + pr_info("<%s> p->addr = 0x%p, status = 0x%lx\n", 78 77 p->symbol_name, p->addr, regs->cp0_status); 79 78 #endif 80 79 #ifdef CONFIG_ARM64 81 - pr_info("<%s> post_handler: p->addr = 0x%p, pstate = 0x%lx\n", 80 + pr_info("<%s> p->addr = 0x%p, pstate = 0x%lx\n", 82 81 p->symbol_name, p->addr, (long)regs->pstate); 83 82 #endif 84 83 #ifdef CONFIG_ARM 85 - pr_info("<%s> post_handler: p->addr = 0x%p, cpsr = 0x%lx\n", 84 + pr_info("<%s> p->addr = 0x%p, cpsr = 0x%lx\n", 86 85 p->symbol_name, p->addr, (long)regs->ARM_cpsr); 87 86 #endif 88 87 #ifdef CONFIG_RISCV 89 - pr_info("<%s> post_handler: p->addr = 0x%p, status = 0x%lx\n", 88 + pr_info("<%s> p->addr = 0x%p, status = 0x%lx\n", 90 89 p->symbol_name, p->addr, regs->status); 91 90 #endif 92 91 #ifdef CONFIG_S390 93 - pr_info("<%s> pre_handler: p->addr, 0x%p, flags = 0x%lx\n", 92 + pr_info("<%s> p->addr, 0x%p, flags = 0x%lx\n", 94 93 p->symbol_name, p->addr, regs->flags); 95 94 #endif 96 95 }

+1 -1

scripts/documentation-file-ref-check

··· 24 24 my $fix = 0; 25 25 my $warn = 0; 26 26 27 - if (! -d ".git") { 27 + if (! -e ".git") { 28 28 printf "Warning: can't check if file exists, as this is not a git tree\n"; 29 29 exit 0; 30 30 }
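
The hunk above swaps Perl's `-d` (is a directory) test for `-e` (exists). The diff does not state the motivation; a plausible reading, offered here as an assumption, is that in a linked git worktree or a submodule `.git` is a regular file holding a `gitdir:` pointer rather than a directory, so the old test would wrongly report "not a git tree". The sketch below reproduces the difference with the equivalent shell test operators. The temporary directory layout and the `gitdir:` path are hypothetical.

```shell
#!/bin/sh
# Sketch only: compares "-d" and "-e" on the two shapes ".git" can take.
# Assumption: a worktree/submodule checkout stores ".git" as a plain file.

tmp=$(mktemp -d)
mkdir -p "$tmp/clone/.git"            # ordinary clone: .git is a directory
mkdir -p "$tmp/worktree"
echo "gitdir: /hypothetical/main/.git/worktrees/wt" > "$tmp/worktree/.git"

# check OP TREE: print "yes" if "test OP TREE/.git" succeeds, else "no"
check() {
    if [ "$1" "$2/.git" ]; then echo yes; else echo no; fi
}

clone_d=$(check -d "$tmp/clone")      # directory passes the old test
wt_d=$(check -d "$tmp/worktree")      # file fails the old test
wt_e=$(check -e "$tmp/worktree")      # file passes the new test

echo "clone -d: $clone_d, worktree -d: $wt_d, worktree -e: $wt_e"

rm -rf "$tmp"
```

Under that assumption, only the `-e` form lets the script continue in both kinds of checkout.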

+33 -36

scripts/kernel-doc

··· 406 406 my $doc_inline_end = '^\s*\*/\s*$'; 407 407 my $doc_inline_oneline = '^\s*/\*\*\s*(@[\w\s]+):\s*(.*)\s*\*/\s*$'; 408 408 my $export_symbol = '^\s*EXPORT_SYMBOL(_GPL)?\s*$\s*(\w+)\s*$\s*;'; 409 + my $function_pointer = qr{([^$]*\(\*)\s*$\s*$([^$]*)\)}; 410 + my $attribute = qr{__attribute__\s*$\([a-z0-9,_\*\s\($]*\)\)}i; 409 411 410 412 my %parameterdescs; 411 413 my %parameterdesc_start_lines; ··· 696 694 $post = ");"; 697 695 } 698 696 $type = $args{'parametertypes'}{$parameter}; 699 - if ($type =~ m/([^$]*\(\*)\s*$\s*$([^$]*)\)/) { 697 + if ($type =~ m/$function_pointer/) { 700 698 # pointer-to-function 701 699 print ".BI \"" . $parenth . $1 . "\" " . " \") (" . $2 . ")" . $post . "\"\n"; 702 700 } else { ··· 976 974 $count++; 977 975 $type = $args{'parametertypes'}{$parameter}; 978 976 979 - if ($type =~ m/([^$]*\(\*)\s*$\s*$([^$]*)\)/) { 977 + if ($type =~ m/$function_pointer/) { 980 978 # pointer-to-function 981 979 print $1 . $parameter . ") (" . $2 . ")"; 982 980 } else { ··· 1213 1211 my $members; 1214 1212 my $type = qr{struct|union}; 1215 1213 # For capturing struct/union definition body, i.e. 
"{members*}qualifiers*" 1216 - my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*$\([a-z0-9,_\s\($]*\)\)))*}; 1214 + my $qualifiers = qr{$attribute|__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned}; 1215 + my $definition_body = qr{\{(.*)\}\s*$qualifiers*}; 1216 + my $struct_members = qr{($type)([^\{\};]+)\{([^\{\}]*)\}([^\{\}\;]*)\;}; 1217 1217 1218 1218 if ($x =~ /($type)\s+(\w+)\s*$definition_body/) { 1219 1219 $decl_type = $1; ··· 1239 1235 # strip comments: 1240 1236 $members =~ s/\/\*.*?\*\///gos; 1241 1237 # strip attributes 1242 - $members =~ s/\s*__attribute__\s*$\([a-z0-9,_\*\s\($]*\)\)/ /gi; 1238 + $members =~ s/\s*$attribute/ /gi; 1243 1239 $members =~ s/\s*__aligned\s*$[^;]*$/ /gos; 1244 1240 $members =~ s/\s*__packed\s*/ /gos; 1245 1241 $members =~ s/\s*CRYPTO_MINALIGN_ATTR/ /gos; 1246 1242 $members =~ s/\s*____cacheline_aligned_in_smp/ /gos; 1247 1243 $members =~ s/\s*____cacheline_aligned/ /gos; 1248 1244 1245 + my $args = qr{([^,)]+)}; 1249 1246 # replace DECLARE_BITMAP 1250 1247 $members =~ s/__ETHTOOL_DECLARE_LINK_MODE_MASK\s*$([^$]+)\)/DECLARE_BITMAP($1, __ETHTOOL_LINK_MODE_MASK_NBITS)/gos; 1251 - $members =~ s/DECLARE_BITMAP\s*$([^,)]+),\s*([^,)]+)$/unsigned long $1\[BITS_TO_LONGS($2)\]/gos; 1248 + $members =~ s/DECLARE_BITMAP\s*$$args,\s*$args$/unsigned long $1\[BITS_TO_LONGS($2)\]/gos; 1252 1249 # replace DECLARE_HASHTABLE 1253 - $members =~ s/DECLARE_HASHTABLE\s*$([^,)]+),\s*([^,)]+)$/unsigned long $1\[1 << (($2) - 1)\]/gos; 1250 + $members =~ s/DECLARE_HASHTABLE\s*$$args,\s*$args$/unsigned long $1\[1 << (($2) - 1)\]/gos; 1254 1251 # replace DECLARE_KFIFO 1255 - $members =~ s/DECLARE_KFIFO\s*$([^,)]+),\s*([^,)]+),\s*([^,)]+)$/$2 \*$1/gos; 1252 + $members =~ s/DECLARE_KFIFO\s*$$args,\s*$args,\s*$args$/$2 \*$1/gos; 1256 1253 # replace DECLARE_KFIFO_PTR 1257 - $members =~ s/DECLARE_KFIFO_PTR\s*$([^,)]+),\s*([^,)]+)$/$2 \*$1/gos; 1258 - 1254 + 
$members =~ s/DECLARE_KFIFO_PTR\s*$$args,\s*$args$/$2 \*$1/gos; 1259 1255 my $declaration = $members; 1260 1256 1261 1257 # Split nested struct/union elements as newer ones 1262 - while ($members =~ m/(struct|union)([^\{\};]+)\{([^\{\}]*)\}([^\{\}\;]*)\;/) { 1258 + while ($members =~ m/$struct_members/) { 1263 1259 my $newmember; 1264 1260 my $maintype = $1; 1265 1261 my $ids = $4; ··· 1319 1315 } 1320 1316 } 1321 1317 } 1322 - $members =~ s/(struct|union)([^\{\};]+)\{([^\{\}]*)\}([^\{\}\;]*)\;/$newmember/; 1318 + $members =~ s/$struct_members/$newmember/; 1323 1319 } 1324 1320 1325 1321 # Ignore other nested elements, like enums ··· 1559 1555 my $param; 1560 1556 1561 1557 # temporarily replace commas inside function pointer definition 1562 - while ($args =~ /($[^$,]+),/) { 1563 - $args =~ s/($[^$,]+),/$1#/g; 1558 + my $arg_expr = qr{$[^$,]+}; 1559 + while ($args =~ /$arg_expr,/) { 1560 + $args =~ s/($arg_expr),/$1#/g; 1564 1561 } 1565 1562 1566 1563 foreach my $arg (split($splitter, $args)) { ··· 1712 1707 foreach $px (0 .. 
$#prms) { 1713 1708 $prm_clean = $prms[$px]; 1714 1709 $prm_clean =~ s/\[.*\]//; 1715 - $prm_clean =~ s/__attribute__\s*$\([a-z,_\*\s\($]*\)\)//i; 1710 + $prm_clean =~ s/$attribute//i; 1716 1711 # ignore array size in a parameter string; 1717 1712 # however, the original param string may contain 1718 1713 # spaces, e.g.: addr[6 + 2] ··· 1814 1809 # - parport_register_device (function pointer parameters) 1815 1810 # - atomic_set (macro) 1816 1811 # - pci_match_device, __copy_to_user (long return type) 1812 + my $name = qr{[a-zA-Z0-9_~:]+}; 1813 + my $prototype_end1 = qr{[^$]*}; 1814 + my $prototype_end2 = qr{[^\{]*}; 1815 + my $prototype_end = qr{\(($prototype_end1|$prototype_end2)$}; 1816 + my $type1 = qr{[\w\s]+}; 1817 + my $type2 = qr{$type1\*+}; 1817 1818 1818 - if ($define && $prototype =~ m/^()([a-zA-Z0-9_~:]+)\s+/) { 1819 + if ($define && $prototype =~ m/^()($name)\s+/) { 1819 1820 # This is an object-like macro, it has no return type and no parameter 1820 1821 # list. 1821 1822 # Function-like macros are not allowed to have spaces between ··· 1829 1818 $return_type = $1; 1830 1819 $declaration_name = $2; 1831 1820 $noret = 1; 1832 - } elsif ($prototype =~ m/^()([a-zA-Z0-9_~:]+)\s*$([^\(]*)$/ || 1833 - $prototype =~ m/^(\w+)\s+([a-zA-Z0-9_~:]+)\s*$([^\(]*)$/ || 1834 - $prototype =~ m/^(\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*$([^\(]*)$/ || 1835 - $prototype =~ m/^(\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*$([^\(]*)$/ || 1836 - $prototype =~ m/^(\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*$([^\(]*)$/ || 1837 - $prototype =~ m/^(\w+\s+\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*$([^\(]*)$/ || 1838 - $prototype =~ m/^(\w+\s+\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*$([^\(]*)$/ || 1839 - $prototype =~ m/^()([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1840 - $prototype =~ m/^(\w+)\s+([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1841 - $prototype =~ m/^(\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1842 - $prototype =~ m/^(\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1843 - $prototype =~ 
m/^(\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1844 - $prototype =~ m/^(\w+\s+\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1845 - $prototype =~ m/^(\w+\s+\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1846 - $prototype =~ m/^(\w+\s+\w+\s+\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1847 - $prototype =~ m/^(\w+\s+\w+\s+\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/ || 1848 - $prototype =~ m/^(\w+\s+\w+\s*\*+\s*\w+\s*\*+\s*)\s*([a-zA-Z0-9_~:]+)\s*$([^\{]*)$/) { 1821 + } elsif ($prototype =~ m/^()($name)\s*$prototype_end/ || 1822 + $prototype =~ m/^($type1)\s+($name)\s*$prototype_end/ || 1823 + $prototype =~ m/^($type2+)\s*($name)\s*$prototype_end/) { 1849 1824 $return_type = $1; 1850 1825 $declaration_name = $2; 1851 1826 my $args = $3; ··· 2108 2111 } elsif (/$doc_decl/o) { 2109 2112 $identifier = $1; 2110 2113 my $is_kernel_comment = 0; 2111 - my $decl_start = qr{\s*\*}; 2114 + my $decl_start = qr{$doc_com}; 2112 2115 # test for pointer declaration type, foo * bar() - desc 2113 2116 my $fn_type = qr{\w+\s*\*\s*}; 2114 2117 my $parenthesis = qr{$\w*$}; 2115 2118 my $decl_end = qr{[-:].*}; 2116 - if (/^$decl_start\s*([\w\s]+?)$parenthesis?\s*$decl_end?$/) { 2119 + if (/^$decl_start([\w\s]+?)$parenthesis?\s*$decl_end?$/) { 2117 2120 $identifier = $1; 2118 2121 } 2119 2122 if ($identifier =~ m/^(struct|union|enum|typedef)\b\s*(\S*)/) { ··· 2123 2126 } 2124 2127 # Look for foo() or static void foo() - description; or misspelt 2125 2128 # identifier 2126 - elsif (/^$decl_start\s*$fn_type?(\w+)\s*$parenthesis?\s*$decl_end?$/ || 2127 - /^$decl_start\s*$fn_type?(\w+.*)$parenthesis?\s*$decl_end$/) { 2129 + elsif (/^$decl_start$fn_type?(\w+)\s*$parenthesis?\s*$decl_end?$/ || 2130 + /^$decl_start$fn_type?(\w+.*)$parenthesis?\s*$decl_end$/) { 2128 2131 $identifier = $1; 2129 2132 $decl_type = 'function'; 2130 2133 $identifier =~ s/^define\s+//;

+180 -82

scripts/sphinx-pre-install

··· 22 22 my $optional = 0; 23 23 my $need_symlink = 0; 24 24 my $need_sphinx = 0; 25 - my $need_venv = 0; 25 + my $need_pip = 0; 26 26 my $need_virtualenv = 0; 27 27 my $rec_sphinx_upgrade = 0; 28 28 my $install = ""; 29 29 my $virtenv_dir = ""; 30 30 my $python_cmd = ""; 31 + my $activate_cmd; 31 32 my $min_version; 32 33 my $cur_version; 33 34 my $rec_version = "1.7.9"; # PDF won't build here 34 35 my $min_pdf_version = "2.4.4"; # Min version where pdf builds 36 + my $latest_avail_ver; 35 37 36 38 # 37 39 # Command line arguments ··· 321 319 return; 322 320 } 323 321 324 - if ($cur_version lt $rec_version) { 325 - $rec_sphinx_upgrade = 1; 326 - return; 327 - } 322 + return if ($cur_version lt $rec_version); 328 323 329 324 # On version check mode, just assume Sphinx has all mandatory deps 330 325 exit (0) if ($version_check); ··· 700 701 printf "\tdeactivate\n"; 701 702 } 702 703 704 + sub get_virtenv() 705 + { 706 + my $ver; 707 + my $min_activate = "$ENV{'PWD'}/${virtenv_prefix}${min_version}/bin/activate"; 708 + my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate"; 709 + 710 + @activates = sort {$b cmp $a} @activates; 711 + 712 + foreach my $f (@activates) { 713 + next if ($f lt $min_activate); 714 + 715 + my $sphinx_cmd = $f; 716 + $sphinx_cmd =~ s/activate/sphinx-build/; 717 + next if (! 
-f $sphinx_cmd); 718 + 719 + my $ver = get_sphinx_version($sphinx_cmd); 720 + if ($need_sphinx && ($ver ge $min_version)) { 721 + return ($f, $ver); 722 + } elsif ($ver gt $cur_version) { 723 + return ($f, $ver); 724 + } 725 + } 726 + return ("", ""); 727 + } 728 + 729 + sub recommend_sphinx_upgrade() 730 + { 731 + my $venv_ver; 732 + 733 + # Avoid running sphinx-builds from venv if $cur_version is good 734 + if ($cur_version && ($cur_version ge $rec_version)) { 735 + $latest_avail_ver = $cur_version; 736 + return; 737 + } 738 + 739 + # Get the highest version from sphinx_*/bin/sphinx-build and the 740 + # corresponding command to activate the venv/virtenv 741 + $activate_cmd = get_virtenv(); 742 + 743 + # Store the highest version from Sphinx existing virtualenvs 744 + if (($activate_cmd ne "") && ($venv_ver gt $cur_version)) { 745 + $latest_avail_ver = $venv_ver; 746 + } else { 747 + $latest_avail_ver = $cur_version if ($cur_version); 748 + } 749 + 750 + # As we don't know package version of Sphinx, and there's no 751 + # virtual environments, don't check if upgrades are needed 752 + if (!$virtualenv) { 753 + return if (!$latest_avail_ver); 754 + } 755 + 756 + # Either there are already a virtual env or a new one should be created 757 + $need_pip = 1; 758 + 759 + # Return if the reason is due to an upgrade or not 760 + if ($latest_avail_ver lt $rec_version) { 761 + $rec_sphinx_upgrade = 1; 762 + } 763 + } 764 + 765 + # 766 + # The logic here is complex, as it have to deal with different versions: 767 + # - minimal supported version; 768 + # - minimal PDF version; 769 + # - recommended version. 770 + # It also needs to work fine with both distro's package and venv/virtualenv 771 + sub recommend_sphinx_version($) 772 + { 773 + my $virtualenv_cmd = shift; 774 + 775 + if ($latest_avail_ver lt $min_pdf_version) { 776 + print "note: If you want pdf, you need at least Sphinx $min_pdf_version.\n"; 777 + } 778 + 779 + # Version is OK. Nothing to do. 
780 + return if ($cur_version && ($cur_version ge $rec_version)); 781 + 782 + if (!$need_sphinx) { 783 + # sphinx-build is present and its version is >= $min_version 784 + 785 + #only recommend enabling a newer virtenv version if makes sense. 786 + if ($latest_avail_ver gt $cur_version) { 787 + printf "\nYou may also use the newer Sphinx version $latest_avail_ver with:\n"; 788 + printf "\tdeactivate\n" if ($ENV{'PWD'} =~ /${virtenv_prefix}/); 789 + printf "\t. $activate_cmd\n"; 790 + deactivate_help(); 791 + 792 + return; 793 + } 794 + return if ($latest_avail_ver ge $rec_version); 795 + } 796 + 797 + if (!$virtualenv) { 798 + # No sphinx either via package or via virtenv. As we can't 799 + # Compare the versions here, just return, recommending the 800 + # user to install it from the package distro. 801 + return if (!$latest_avail_ver); 802 + 803 + # User doesn't want a virtenv recommendation, but he already 804 + # installed one via virtenv with a newer version. 805 + # So, print commands to enable it 806 + if ($latest_avail_ver gt $cur_version) { 807 + printf "\nYou may also use the Sphinx virtualenv version $latest_avail_ver with:\n"; 808 + printf "\tdeactivate\n" if ($ENV{'PWD'} =~ /${virtenv_prefix}/); 809 + printf "\t. $activate_cmd\n"; 810 + deactivate_help(); 811 + 812 + return; 813 + } 814 + print "\n"; 815 + } else { 816 + $need++ if ($need_sphinx); 817 + } 818 + 819 + # Suggest newer versions if current ones are too old 820 + if ($latest_avail_ver && $cur_version ge $min_version) { 821 + # If there's a good enough version, ask the user to enable it 822 + if ($latest_avail_ver ge $rec_version) { 823 + printf "\nNeed to activate Sphinx (version $latest_avail_ver) on virtualenv with:\n"; 824 + printf "\t. $activate_cmd\n"; 825 + deactivate_help(); 826 + 827 + return; 828 + } 829 + 830 + # Version is above the minimal required one, but may be 831 + # below the recommended one. 
So, print warnings/notes 832 + 833 + if ($latest_avail_ver lt $rec_version) { 834 + print "Warning: It is recommended at least Sphinx version $rec_version.\n"; 835 + } 836 + } 837 + 838 + # At this point, either it needs Sphinx or upgrade is recommended, 839 + # both via pip 840 + 841 + if ($rec_sphinx_upgrade) { 842 + if (!$virtualenv) { 843 + print "Instead of install/upgrade Python Sphinx pkg, you could use pip/pypi with:\n\n"; 844 + } else { 845 + print "To upgrade Sphinx, use:\n\n"; 846 + } 847 + } else { 848 + print "Sphinx needs to be installed either as a package or via pip/pypi with:\n"; 849 + } 850 + 851 + $python_cmd = find_python_no_venv(); 852 + 853 + printf "\t$virtualenv_cmd $virtenv_dir\n"; 854 + 855 + printf "\t. $virtenv_dir/bin/activate\n"; 856 + printf "\tpip install -r $requirement_file\n"; 857 + deactivate_help(); 858 + } 859 + 703 860 sub check_needs() 704 861 { 705 862 # Check if Sphinx is already accessible from current environment ··· 877 722 if ($virtualenv) { 878 723 my $tmp = qx($python_cmd --version 2>&1); 879 724 if ($tmp =~ m/(\d+\.)(\d+\.)/) { 880 - if ($1 >= 3 && $2 >= 3) { 881 - $need_venv = 1; # python 3.3 or upper 882 - } else { 883 - $need_virtualenv = 1; 884 - } 885 725 if ($1 < 3) { 886 726 # Fail if it finds python2 (or worse) 887 727 die "Python 3 is required to build the kernel docs\n"; 728 + } 729 + if ($1 == 3 && $2 < 3) { 730 + # Need Python 3.3 or upper for venv 731 + $need_virtualenv = 1; 888 732 } 889 733 } else { 890 734 die "Warning: couldn't identify $python_cmd version!"; ··· 893 739 } 894 740 } 895 741 896 - # Set virtualenv command line, if python < 3.3 742 + recommend_sphinx_upgrade(); 743 + 897 744 my $virtualenv_cmd; 898 - if ($need_virtualenv) { 899 - $virtualenv_cmd = findprog("virtualenv-3"); 900 - $virtualenv_cmd = findprog("virtualenv-3.5") if (!$virtualenv_cmd); 901 - if (!$virtualenv_cmd) { 902 - check_program("virtualenv", 0); 903 - $virtualenv_cmd = "virtualenv"; 745 + 746 + if ($need_pip) { 747 + # 
Set virtualenv command line, if python < 3.3 748 + if ($need_virtualenv) { 749 + $virtualenv_cmd = findprog("virtualenv-3"); 750 + $virtualenv_cmd = findprog("virtualenv-3.5") if (!$virtualenv_cmd); 751 + if (!$virtualenv_cmd) { 752 + check_program("virtualenv", 0); 753 + $virtualenv_cmd = "virtualenv"; 754 + } 755 + } else { 756 + $virtualenv_cmd = "$python_cmd -m venv"; 757 + check_python_module("ensurepip", 0); 904 758 } 905 759 } 906 760 ··· 924 762 check_program("xelatex", 2) if ($pdf); 925 763 check_program("rsvg-convert", 2) if ($pdf); 926 764 check_program("latexmk", 2) if ($pdf); 927 - 928 - if ($need_sphinx || $rec_sphinx_upgrade) { 929 - check_python_module("ensurepip", 0) if ($need_venv); 930 - } 931 765 932 766 # Do distro-specific checks and output distro-install commands 933 767 check_distros(); ··· 942 784 which("sphinx-build-3"); 943 785 } 944 786 945 - # NOTE: if the system has a too old Sphinx version installed, 946 - # it will recommend installing a newer version using virtualenv 947 - 948 - if ($need_sphinx || $rec_sphinx_upgrade) { 949 - my $min_activate = "$ENV{'PWD'}/${virtenv_prefix}${min_version}/bin/activate"; 950 - my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate"; 951 - 952 - if ($cur_version lt $rec_version) { 953 - print "Warning: It is recommended at least Sphinx version $rec_version.\n"; 954 - print " If you want pdf, you need at least $min_pdf_version.\n"; 955 - } 956 - if ($cur_version lt $min_pdf_version) { 957 - print "Note: It is recommended at least Sphinx version $min_pdf_version if you need PDF support.\n"; 958 - } 959 - @activates = sort {$b cmp $a} @activates; 960 - my ($activate, $ver); 961 - foreach my $f (@activates) { 962 - next if ($f lt $min_activate); 963 - 964 - my $sphinx_cmd = $f; 965 - $sphinx_cmd =~ s/activate/sphinx-build/; 966 - next if (! 
-f $sphinx_cmd); 967 - 968 - $ver = get_sphinx_version($sphinx_cmd); 969 - if ($need_sphinx && ($ver ge $min_version)) { 970 - $activate = $f; 971 - last; 972 - } elsif ($ver gt $cur_version) { 973 - $activate = $f; 974 - last; 975 - } 976 - } 977 - if ($activate ne "") { 978 - if ($need_sphinx) { 979 - printf "\nNeed to activate Sphinx (version $ver) on virtualenv with:\n"; 980 - printf "\t. $activate\n"; 981 - deactivate_help(); 982 - exit (1); 983 - } else { 984 - printf "\nYou may also use a newer Sphinx (version $ver) with:\n"; 985 - printf "\tdeactivate && . $activate\n"; 986 - } 987 - } else { 988 - my $rec_activate = "$virtenv_dir/bin/activate"; 989 - 990 - print "To upgrade Sphinx, use:\n\n" if ($rec_sphinx_upgrade); 991 - 992 - $python_cmd = find_python_no_venv(); 993 - 994 - if ($need_venv) { 995 - printf "\t$python_cmd -m venv $virtenv_dir\n"; 996 - } else { 997 - printf "\t$virtualenv_cmd $virtenv_dir\n"; 998 - } 999 - printf "\t. $rec_activate\n"; 1000 - printf "\tpip install -r $requirement_file\n"; 1001 - deactivate_help(); 1002 - 1003 - $need++ if (!$rec_sphinx_upgrade); 1004 - } 1005 - } 787 + recommend_sphinx_version($virtualenv_cmd); 1006 788 printf "\n"; 1007 789 1008 790 print "All optional dependencies are met.\n" if (!$optional);

+1 -1

tools/debugging/kernel-chktaint

··· 196 196 fi 197 197 198 198 echo "For a more detailed explanation of the various taint flags see" 199 - echo " Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel sources" 199 + echo " Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources" 200 200 echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html" 201 201 echo "Raw taint value as int/string: $taint/'$out'" 202 202 #EOF#