Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation: x86: convert pat.txt to reST

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

Authored by Changbin Du and committed by Jonathan Corbet
2f6eae47 26d14a20

+243 -230
+1
Documentation/x86/index.rst
···
    zero-page
    tlb
    mtrr
+   pat
+242
Documentation/x86/pat.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+PAT (Page Attribute Table)
+==========================
+
+x86 Page Attribute Table (PAT) allows for setting the memory attribute at the
+page level granularity. PAT is complementary to the MTRR settings which allows
+for setting of memory types over physical address ranges. However, PAT is
+more flexible than MTRR due to its capability to set attributes at page level
+and also due to the fact that there are no hardware limitations on number of
+such attribute settings allowed. Added flexibility comes with guidelines for
+not having memory type aliasing for the same physical memory with multiple
+virtual addresses.
+
+PAT allows for different types of memory attributes. The most commonly used
+ones that will be supported at this time are:
+
+=== ==============
+WB  Write-back
+UC  Uncached
+WC  Write-combined
+WT  Write-through
+UC- Uncached Minus
+=== ==============
+
+
+PAT APIs
+========
+
+There are many different APIs in the kernel that allows setting of memory
+attributes at the page level. In order to avoid aliasing, these interfaces
+should be used thoughtfully. Below is a table of interfaces available,
+their intended usage and their memory attribute relationships. Internally,
+these APIs use a reserve_memtype()/free_memtype() interface on the physical
+address range to avoid any aliasing.
+
++------------------------+----------+--------------+------------------+
+| API                    |   RAM    |  ACPI,...    |  Reserved/Holes  |
++------------------------+----------+--------------+------------------+
+| ioremap                |    --    |    UC-       |       UC-        |
++------------------------+----------+--------------+------------------+
+| ioremap_cache          |    --    |    WB        |       WB         |
++------------------------+----------+--------------+------------------+
+| ioremap_uc             |    --    |    UC        |       UC         |
++------------------------+----------+--------------+------------------+
+| ioremap_nocache        |    --    |    UC-       |       UC-        |
++------------------------+----------+--------------+------------------+
+| ioremap_wc             |    --    |    --        |       WC         |
++------------------------+----------+--------------+------------------+
+| ioremap_wt             |    --    |    --        |       WT         |
++------------------------+----------+--------------+------------------+
+| set_memory_uc,         |    UC-   |    --        |       --         |
+| set_memory_wb          |          |              |                  |
++------------------------+----------+--------------+------------------+
+| set_memory_wc,         |    WC    |    --        |       --         |
+| set_memory_wb          |          |              |                  |
++------------------------+----------+--------------+------------------+
+| set_memory_wt,         |    WT    |    --        |       --         |
+| set_memory_wb          |          |              |                  |
++------------------------+----------+--------------+------------------+
+| pci sysfs resource     |    --    |    --        |       UC-        |
++------------------------+----------+--------------+------------------+
+| pci sysfs resource_wc  |    --    |    --        |       WC         |
+| is IORESOURCE_PREFETCH |          |              |                  |
++------------------------+----------+--------------+------------------+
+| pci proc               |    --    |    --        |       UC-        |
+| !PCIIOC_WRITE_COMBINE  |          |              |                  |
++------------------------+----------+--------------+------------------+
+| pci proc               |    --    |    --        |       WC         |
+| PCIIOC_WRITE_COMBINE   |          |              |                  |
++------------------------+----------+--------------+------------------+
+| /dev/mem               |    --    |   WB/WC/UC-  |    WB/WC/UC-     |
+| read-write             |          |              |                  |
++------------------------+----------+--------------+------------------+
+| /dev/mem               |    --    |    UC-       |       UC-        |
+| mmap SYNC flag         |          |              |                  |
++------------------------+----------+--------------+------------------+
+| /dev/mem               |    --    |   WB/WC/UC-  |    WB/WC/UC-     |
+| mmap !SYNC flag        |          |              |                  |
+| and                    |          |(from existing| (from existing   |
+| any alias to this area |          |alias)        | alias)           |
++------------------------+----------+--------------+------------------+
+| /dev/mem               |    --    |    WB        |       WB         |
+| mmap !SYNC flag        |          |              |                  |
+| no alias to this area  |          |              |                  |
+| and                    |          |              |                  |
+| MTRR says WB           |          |              |                  |
++------------------------+----------+--------------+------------------+
+| /dev/mem               |    --    |    --        |       UC-        |
+| mmap !SYNC flag        |          |              |                  |
+| no alias to this area  |          |              |                  |
+| and                    |          |              |                  |
+| MTRR says !WB          |          |              |                  |
++------------------------+----------+--------------+------------------+
+
+
+Advanced APIs for drivers
+=========================
+
+A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
+vmf_insert_pfn.
+
+Drivers wanting to export some pages to userspace do it by using mmap
+interface and a combination of:
+
+  1) pgprot_noncached()
+  2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()
+
+With PAT support, a new API pgprot_writecombine is being added. So, drivers can
+continue to use the above sequence, with either pgprot_noncached() or
+pgprot_writecombine() in step 1, followed by step 2.
+
+In addition, step 2 internally tracks the region as UC or WC in memtype
+list in order to ensure no conflicting mapping.
+
+Note that this set of APIs only works with IO (non RAM) regions. If driver
+wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
+as step 0 above and also track the usage of those pages and use set_memory_wb()
+before the page is freed to free pool.
+
+MTRR effects on PAT / non-PAT systems
+=====================================
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
+is made, should already have been ioremapped with WC attributes or PAT entries,
+this can be done by using ioremap_wc() / set_memory_wc(). Devices which
+combine areas of IO memory desired to remain uncacheable with areas where
+write-combining is desirable should consider use of ioremap_uc() followed by
+set_memory_wc() to white-list effective write-combined areas. Such use is
+nevertheless discouraged as the effective memory type is considered
+implementation defined, yet this strategy can be used as last resort on devices
+with size-constrained regions where otherwise MTRR write-combining would
+otherwise not be effective.
+::
+
+  ====  =======  ===  =========================  =====================
+  MTRR  Non-PAT  PAT  Linux ioremap value        Effective memory type
+  ====  =======  ===  =========================  =====================
+        PAT                                        Non-PAT |  PAT
+        |PCD                                               |
+        ||PWT                                              |
+        |||                                                |
+  WC    000      WB   _PAGE_CACHE_MODE_WB            WC    |   WC
+  WC    001      WC   _PAGE_CACHE_MODE_WC            WC*   |   WC
+  WC    010      UC-  _PAGE_CACHE_MODE_UC_MINUS      WC*   |   UC
+  WC    011      UC   _PAGE_CACHE_MODE_UC            UC    |   UC
+  ====  =======  ===  =========================  =====================
+
+  (*) denotes implementation defined and is discouraged
+
+.. note:: -- in the above table mean "Not suggested usage for the API". Some
+  of the --'s are strictly enforced by the kernel. Some others are not really
+  enforced today, but may be enforced in future.
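As a reading aid for the MTRR table above, the four rows can be written as a small lookup keyed by the PAT, PCD and PWT bits. This is only a sketch of the table as printed in the patch: the names `WC_MTRR_EFFECTS` and `effective_type` are hypothetical and are not kernel interfaces, and combinations the table does not list are rejected rather than guessed.

```python
# Rows of the table above, keyed by the page-table bits (PAT, PCD, PWT).
# Each value is (PAT memory type, Linux ioremap value,
#                effective type on non-PAT, effective type on PAT).
# A trailing "*" marks what the table calls implementation defined.
# All names here are illustrative, not kernel symbols.
WC_MTRR_EFFECTS = {
    (0, 0, 0): ("WB", "_PAGE_CACHE_MODE_WB", "WC", "WC"),
    (0, 0, 1): ("WC", "_PAGE_CACHE_MODE_WC", "WC*", "WC"),
    (0, 1, 0): ("UC-", "_PAGE_CACHE_MODE_UC_MINUS", "WC*", "UC"),
    (0, 1, 1): ("UC", "_PAGE_CACHE_MODE_UC", "UC", "UC"),
}


def effective_type(pat, pcd, pwt, pat_enabled):
    """Return (memory_type, implementation_defined) under a WC MTRR."""
    row = WC_MTRR_EFFECTS.get((pat, pcd, pwt))
    if row is None:
        # The table only documents four combinations.
        raise ValueError("combination not covered by the table")
    value = row[3] if pat_enabled else row[2]
    return value.rstrip("*"), value.endswith("*")
```

For example, `effective_type(0, 1, 0, pat_enabled=False)` reports WC flagged as implementation defined, which is the case the text discourages.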
+
+For ioremap and pci access through /sys or /proc - The actual type returned
+can be more restrictive, in case of any existing aliasing for that address.
+For example: If there is an existing uncached mapping, a new ioremap_wc can
+return uncached mapping in place of write-combine requested.
+
+set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver
+will first make a region uc, wc or wt and switch it back to wb after use.
+
+Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
+interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
+
+Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
+types.
+
+Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.
+
+
+PAT debugging
+=============
+
+With CONFIG_DEBUG_FS enabled, PAT memtype list can be examined by::
+
+  # mount -t debugfs debugfs /sys/kernel/debug
+  # cat /sys/kernel/debug/x86/pat_memtype_list
+  PAT memtype list:
+  uncached-minus @ 0x7fadf000-0x7fae0000
+  uncached-minus @ 0x7fb19000-0x7fb1a000
+  uncached-minus @ 0x7fb1a000-0x7fb1b000
+  uncached-minus @ 0x7fb1b000-0x7fb1c000
+  uncached-minus @ 0x7fb1c000-0x7fb1d000
+  uncached-minus @ 0x7fb1d000-0x7fb1e000
+  uncached-minus @ 0x7fb1e000-0x7fb25000
+  uncached-minus @ 0x7fb25000-0x7fb26000
+  uncached-minus @ 0x7fb26000-0x7fb27000
+  uncached-minus @ 0x7fb27000-0x7fb28000
+  uncached-minus @ 0x7fb28000-0x7fb2e000
+  uncached-minus @ 0x7fb2e000-0x7fb2f000
+  uncached-minus @ 0x7fb2f000-0x7fb30000
+  uncached-minus @ 0x7fb31000-0x7fb32000
+  uncached-minus @ 0x80000000-0x90000000
+
+This list shows physical address ranges and various PAT settings used to
+access those physical address ranges.
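The memtype listing above is regular enough to post-process. The sketch below parses lines of the form shown in the sample into `(type, start, end)` tuples and sums the bytes per memory type. It assumes the `type @ 0xSTART-0xEND` layout of the sample is representative and that the end address is exclusive (the page-sized entries in the sample suggest so). `parse_memtype_list` and `bytes_per_type` are hypothetical helper names, not part of any kernel tooling.

```python
import re

# Matches lines such as "uncached-minus @ 0x7fadf000-0x7fae0000".
# The header line "PAT memtype list:" does not match and is skipped.
ENTRY_RE = re.compile(
    r"^\s*(?P<type>[\w-]+) @ 0x(?P<start>[0-9a-fA-F]+)-0x(?P<end>[0-9a-fA-F]+)\s*$"
)


def parse_memtype_list(text):
    """Return a list of (memory_type, start, end) tuples from the listing."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append(
                (match["type"], int(match["start"], 16), int(match["end"], 16))
            )
    return entries


def bytes_per_type(entries):
    """Sum range sizes per memory type, treating the end address as exclusive."""
    totals = {}
    for memory_type, start, end in entries:
        totals[memory_type] = totals.get(memory_type, 0) + (end - start)
    return totals


SAMPLE = """PAT memtype list:
uncached-minus @ 0x7fadf000-0x7fae0000
uncached-minus @ 0x80000000-0x90000000
"""
```

On a live system the text would come from reading `/sys/kernel/debug/x86/pat_memtype_list` as root; `SAMPLE` reuses two lines from the listing above so the sketch runs anywhere.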
+
+Another, more verbose way of getting PAT related debug messages is with
+"debugpat" boot parameter. With this parameter, various debug messages are
+printed to dmesg log.
+
+PAT Initialization
+==================
+
+The following table describes how PAT is initialized under various
+configurations. The PAT MSR must be updated by Linux in order to support WC
+and WT attributes. Otherwise, the PAT MSR has the value programmed in it
+by the firmware. Note, Xen enables WC attribute in the PAT MSR for guests.
+
+ ==== ===== ========================== ========= =======
+ MTRR PAT   Call Sequence              PAT State PAT MSR
+ ==== ===== ========================== ========= =======
+ E    E     MTRR -> PAT init           Enabled   OS
+ E    D     MTRR -> PAT init           Disabled  -
+ D    E     MTRR -> PAT disable        Disabled  BIOS
+ D    D     MTRR -> PAT disable        Disabled  -
+ -    np/E  PAT  -> PAT disable        Disabled  BIOS
+ -    np/D  PAT  -> PAT disable        Disabled  -
+ E    !P/E  MTRR -> PAT init           Disabled  BIOS
+ D    !P/E  MTRR -> PAT disable        Disabled  BIOS
+ !M   !P/E  MTRR stub -> PAT disable   Disabled  BIOS
+ ==== ===== ========================== ========= =======
+
+  Legend
+
+ ========= =======================================
+ E         Feature enabled in CPU
+ D         Feature disabled/unsupported in CPU
+ np        "nopat" boot option specified
+ !P        CONFIG_X86_PAT option unset
+ !M        CONFIG_MTRR option unset
+ Enabled   PAT state set to enabled
+ Disabled  PAT state set to disabled
+ OS        PAT initializes PAT MSR with OS setting
+ BIOS      PAT keeps PAT MSR with BIOS setting
+ ========= =======================================
+
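The PAT Initialization table in the new file can also be read as a mapping from the (MTRR, PAT) configuration to the resulting state. The sketch below copies the nine rows as printed and makes one property easy to check: only the E/E row ends with PAT enabled and the MSR programmed by the OS. `PAT_INIT` and `configs_with_state` are hypothetical names used for illustration only.

```python
# (MTRR, PAT) -> (call sequence, PAT state, PAT MSR), copied from the table.
# Keys use the legend's abbreviations (E, D, np, !P, !M); names are illustrative.
PAT_INIT = {
    ("E", "E"): ("MTRR -> PAT init", "Enabled", "OS"),
    ("E", "D"): ("MTRR -> PAT init", "Disabled", "-"),
    ("D", "E"): ("MTRR -> PAT disable", "Disabled", "BIOS"),
    ("D", "D"): ("MTRR -> PAT disable", "Disabled", "-"),
    ("-", "np/E"): ("PAT -> PAT disable", "Disabled", "BIOS"),
    ("-", "np/D"): ("PAT -> PAT disable", "Disabled", "-"),
    ("E", "!P/E"): ("MTRR -> PAT init", "Disabled", "BIOS"),
    ("D", "!P/E"): ("MTRR -> PAT disable", "Disabled", "BIOS"),
    ("!M", "!P/E"): ("MTRR stub -> PAT disable", "Disabled", "BIOS"),
}


def configs_with_state(state):
    """Return the sorted (MTRR, PAT) keys whose PAT state equals `state`."""
    return sorted(key for key, row in PAT_INIT.items() if row[1] == state)
```

`configs_with_state("Enabled")` returns a single entry, `("E", "E")`, matching the table's only row with an OS-programmed PAT MSR.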
-230
Documentation/x86/pat.txt
···
-
-PAT (Page Attribute Table)
-
-x86 Page Attribute Table (PAT) allows for setting the memory attribute at the
-page level granularity. PAT is complementary to the MTRR settings which allows
-for setting of memory types over physical address ranges. However, PAT is
-more flexible than MTRR due to its capability to set attributes at page level
-and also due to the fact that there are no hardware limitations on number of
-such attribute settings allowed. Added flexibility comes with guidelines for
-not having memory type aliasing for the same physical memory with multiple
-virtual addresses.
-
-PAT allows for different types of memory attributes. The most commonly used
-ones that will be supported at this time are Write-back, Uncached,
-Write-combined, Write-through and Uncached Minus.
-
-
-PAT APIs
---------
-
-There are many different APIs in the kernel that allows setting of memory
-attributes at the page level. In order to avoid aliasing, these interfaces
-should be used thoughtfully. Below is a table of interfaces available,
-their intended usage and their memory attribute relationships. Internally,
-these APIs use a reserve_memtype()/free_memtype() interface on the physical
-address range to avoid any aliasing.
-
-
--------------------------------------------------------------------
-API                    |    RAM   |  ACPI,...  |  Reserved/Holes  |
------------------------|----------|------------|------------------|
-                       |          |            |                  |
-ioremap                |    --    |    UC-     |       UC-        |
-                       |          |            |                  |
-ioremap_cache          |    --    |    WB      |       WB         |
-                       |          |            |                  |
-ioremap_uc             |    --    |    UC      |       UC         |
-                       |          |            |                  |
-ioremap_nocache        |    --    |    UC-     |       UC-        |
-                       |          |            |                  |
-ioremap_wc             |    --    |    --      |       WC         |
-                       |          |            |                  |
-ioremap_wt             |    --    |    --      |       WT         |
-                       |          |            |                  |
-set_memory_uc          |    UC-   |    --      |       --         |
- set_memory_wb         |          |            |                  |
-                       |          |            |                  |
-set_memory_wc          |    WC    |    --      |       --         |
- set_memory_wb         |          |            |                  |
-                       |          |            |                  |
-set_memory_wt          |    WT    |    --      |       --         |
- set_memory_wb         |          |            |                  |
-                       |          |            |                  |
-pci sysfs resource     |    --    |    --      |       UC-        |
-                       |          |            |                  |
-pci sysfs resource_wc  |    --    |    --      |       WC         |
- is IORESOURCE_PREFETCH|          |            |                  |
-                       |          |            |                  |
-pci proc               |    --    |    --      |       UC-        |
- !PCIIOC_WRITE_COMBINE |          |            |                  |
-                       |          |            |                  |
-pci proc               |    --    |    --      |       WC         |
- PCIIOC_WRITE_COMBINE  |          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |  WB/WC/UC- |    WB/WC/UC-     |
- read-write            |          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |    UC-     |       UC-        |
- mmap SYNC flag        |          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |  WB/WC/UC- |    WB/WC/UC-     |
- mmap !SYNC flag       |          |(from exist-|  (from exist-    |
- and                   |          |  ing alias)|    ing alias)    |
- any alias to this area|          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |    WB      |       WB         |
- mmap !SYNC flag       |          |            |                  |
- no alias to this area |          |            |                  |
- and                   |          |            |                  |
- MTRR says WB          |          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |    --      |       UC-        |
- mmap !SYNC flag       |          |            |                  |
- no alias to this area |          |            |                  |
- and                   |          |            |                  |
- MTRR says !WB         |          |            |                  |
-                       |          |            |                  |
--------------------------------------------------------------------
-
-Advanced APIs for drivers
--------------------------
-A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
-vmf_insert_pfn
-
-Drivers wanting to export some pages to userspace do it by using mmap
-interface and a combination of
-1) pgprot_noncached()
-2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()
-
-With PAT support, a new API pgprot_writecombine is being added. So, drivers can
-continue to use the above sequence, with either pgprot_noncached() or
-pgprot_writecombine() in step 1, followed by step 2.
-
-In addition, step 2 internally tracks the region as UC or WC in memtype
-list in order to ensure no conflicting mapping.
-
-Note that this set of APIs only works with IO (non RAM) regions. If driver
-wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
-as step 0 above and also track the usage of those pages and use set_memory_wb()
-before the page is freed to free pool.
-
-MTRR effects on PAT / non-PAT systems
--------------------------------------
-
-The following table provides the effects of using write-combining MTRRs when
-using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
-mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
-be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
-is made, should already have been ioremapped with WC attributes or PAT entries,
-this can be done by using ioremap_wc() / set_memory_wc(). Devices which
-combine areas of IO memory desired to remain uncacheable with areas where
-write-combining is desirable should consider use of ioremap_uc() followed by
-set_memory_wc() to white-list effective write-combined areas. Such use is
-nevertheless discouraged as the effective memory type is considered
-implementation defined, yet this strategy can be used as last resort on devices
-with size-constrained regions where otherwise MTRR write-combining would
-otherwise not be effective.
-
-----------------------------------------------------------------------
-MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
-----------------------------------------------------------------------
-                                                  Non-PAT |  PAT
-     PAT
-     |PCD
-     ||PWT
-     |||
-WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
-WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
-WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
-WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
-----------------------------------------------------------------------
-
-(*) denotes implementation defined and is discouraged
-
-Notes:
-
--- in the above table mean "Not suggested usage for the API". Some of the --'s
-are strictly enforced by the kernel. Some others are not really enforced
-today, but may be enforced in future.
-
-For ioremap and pci access through /sys or /proc - The actual type returned
-can be more restrictive, in case of any existing aliasing for that address.
-For example: If there is an existing uncached mapping, a new ioremap_wc can
-return uncached mapping in place of write-combine requested.
-
-set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver
-will first make a region uc, wc or wt and switch it back to wb after use.
-
-Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
-interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
-
-Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
-types.
-
-Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.
-
-
-PAT debugging
--------------
-
-With CONFIG_DEBUG_FS enabled, PAT memtype list can be examined by
-
-# mount -t debugfs debugfs /sys/kernel/debug
-# cat /sys/kernel/debug/x86/pat_memtype_list
-PAT memtype list:
-uncached-minus @ 0x7fadf000-0x7fae0000
-uncached-minus @ 0x7fb19000-0x7fb1a000
-uncached-minus @ 0x7fb1a000-0x7fb1b000
-uncached-minus @ 0x7fb1b000-0x7fb1c000
-uncached-minus @ 0x7fb1c000-0x7fb1d000
-uncached-minus @ 0x7fb1d000-0x7fb1e000
-uncached-minus @ 0x7fb1e000-0x7fb25000
-uncached-minus @ 0x7fb25000-0x7fb26000
-uncached-minus @ 0x7fb26000-0x7fb27000
-uncached-minus @ 0x7fb27000-0x7fb28000
-uncached-minus @ 0x7fb28000-0x7fb2e000
-uncached-minus @ 0x7fb2e000-0x7fb2f000
-uncached-minus @ 0x7fb2f000-0x7fb30000
-uncached-minus @ 0x7fb31000-0x7fb32000
-uncached-minus @ 0x80000000-0x90000000
-
-This list shows physical address ranges and various PAT settings used to
-access those physical address ranges.
-
-Another, more verbose way of getting PAT related debug messages is with
-"debugpat" boot parameter. With this parameter, various debug messages are
-printed to dmesg log.
-
-PAT Initialization
-------------------
-
-The following table describes how PAT is initialized under various
-configurations. The PAT MSR must be updated by Linux in order to support WC
-and WT attributes. Otherwise, the PAT MSR has the value programmed in it
-by the firmware. Note, Xen enables WC attribute in the PAT MSR for guests.
-
- MTRR PAT   Call Sequence               PAT State  PAT MSR
- =========================================================
- E    E     MTRR -> PAT init            Enabled    OS
- E    D     MTRR -> PAT init            Disabled    -
- D    E     MTRR -> PAT disable         Disabled   BIOS
- D    D     MTRR -> PAT disable         Disabled    -
- -    np/E  PAT  -> PAT disable         Disabled   BIOS
- -    np/D  PAT  -> PAT disable         Disabled    -
- E    !P/E  MTRR -> PAT init            Disabled   BIOS
- D    !P/E  MTRR -> PAT disable         Disabled   BIOS
- !M   !P/E  MTRR stub -> PAT disable    Disabled   BIOS
-
- Legend
- ------------------------------------------------
- E         Feature enabled in CPU
- D         Feature disabled/unsupported in CPU
- np        "nopat" boot option specified
- !P        CONFIG_X86_PAT option unset
- !M        CONFIG_MTRR option unset
- Enabled   PAT state set to enabled
- Disabled  PAT state set to disabled
- OS        PAT initializes PAT MSR with OS setting
- BIOS      PAT keeps PAT MSR with BIOS setting
-