···
+                  MEMORY ATTRIBUTE ALIASING ON IA-64
+
+                            Bjorn Helgaas
+                       <bjorn.helgaas@hp.com>
+                             May 4, 2006
+
+
+MEMORY ATTRIBUTES
+
+    Itanium supports several attributes for virtual memory references.
+    The attribute is part of the virtual translation, i.e., it is
+    contained in the TLB entry. The ones of most interest to the Linux
+    kernel are:
+
+        WB      Write-back (cacheable)
+        UC      Uncacheable
+        WC      Write-coalescing
+
+    System memory typically uses the WB attribute. The UC attribute is
+    used for memory-mapped I/O devices. The WC attribute is uncacheable
+    like UC is, but writes may be delayed and combined to increase
+    performance for things like frame buffers.
+
+    The Itanium architecture requires that we avoid accessing the same
+    page with both a cacheable mapping and an uncacheable mapping[1].
+
+    The design of the chipset determines which attributes are supported
+    on which regions of the address space. For example, some chipsets
+    support either WB or UC access to main memory, while others support
+    only WB access.
+
+MEMORY MAP
+
+    Platform firmware describes the physical memory map and the
+    supported attributes for each region. At boot-time, the kernel uses
+    the EFI GetMemoryMap() interface. ACPI can also describe memory
+    devices and the attributes they support, but Linux/ia64 currently
+    doesn't use this information.
+
+    The kernel uses the efi_memmap table returned from GetMemoryMap() to
+    learn the attributes supported by each region of physical address
+    space. Unfortunately, this table does not completely describe the
+    address space because some machines omit some or all of the MMIO
+    regions from the map.
+
+    The kernel maintains another table, kern_memmap, which describes the
+    memory Linux is actually using and the attribute for each region.
+    This contains only system memory; it does not contain MMIO space.
+
+    The kern_memmap table typically contains only a subset of the system
+    memory described by the efi_memmap. Linux/ia64 can't use all memory
+    in the system because of constraints imposed by the identity mapping
+    scheme.
+
+    The efi_memmap table is preserved unmodified because the original
+    boot-time information is required for kexec.
+
+KERNEL IDENTITY MAPPINGS
+
+    Linux/ia64 identity mappings are done with large pages, currently
+    either 16MB or 64MB, referred to as "granules." Cacheable mappings
+    are speculative[2], so the processor can read any location in the
+    page at any time, independent of the programmer's intentions. This
+    means that to avoid attribute aliasing, Linux can create a cacheable
+    identity mapping only when the entire granule supports cacheable
+    access.
+
+    Therefore, kern_memmap contains only full granule-sized regions that
+    can be referenced safely by an identity mapping.
+
+    Uncacheable mappings are not speculative, so the processor will
+    generate UC accesses only to locations explicitly referenced by
+    software. This allows UC identity mappings to cover granules that
+    are only partially populated, or populated with a combination of UC
+    and WB regions.
+
+USER MAPPINGS
+
+    User mappings are typically done with 16K or 64K pages. The smaller
+    page size allows more flexibility because only 16K or 64K has to be
+    homogeneous with respect to memory attributes.
+
+POTENTIAL ATTRIBUTE ALIASING CASES
+
+    There are several ways the kernel creates new mappings:
+
+    mmap of /dev/mem
+
+        This uses remap_pfn_range(), which creates user mappings. These
+        mappings may be either WB or UC. If the region being mapped
+        happens to be in kern_memmap, meaning that it may also be mapped
+        by a kernel identity mapping, the user mapping must use the same
+        attribute as the kernel mapping.
+
+        If the region is not in kern_memmap, the user mapping should use
+        an attribute reported as being supported in the EFI memory map.
+
+        Since the EFI memory map does not describe MMIO on some
+        machines, this should use an uncacheable mapping as a fallback.
+
+    mmap of /sys/class/pci_bus/.../legacy_mem
+
+        This is very similar to mmap of /dev/mem, except that legacy_mem
+        only allows mmap of the one megabyte "legacy MMIO" area for a
+        specific PCI bus. Typically this is the first megabyte of
+        physical address space, but it may be different on machines with
+        several VGA devices.
+
+        "X" uses this to access VGA frame buffers. Using legacy_mem
+        rather than /dev/mem allows multiple instances of X to talk to
+        different VGA cards.
+
+        The /dev/mem mmap constraints apply.
+
+        However, since this is for mapping legacy MMIO space, WB access
+        does not make sense. This matters on machines without legacy
+        VGA support: these machines may have WB memory for the entire
+        first megabyte (or even the entire first granule).
+
+        On these machines, we could mmap legacy_mem as WB, which would
+        be safe in terms of attribute aliasing, but X has no way of
+        knowing that it is accessing regular memory, not a frame buffer,
+        so the kernel should fail the mmap rather than doing it with WB.
+
+    read/write of /dev/mem
+
+        This uses copy_from_user(), which implicitly uses a kernel
+        identity mapping. This is obviously safe for things in
+        kern_memmap.
+
+        There may be corner cases of things that are not in kern_memmap,
+        but could be accessed this way. For example, registers in MMIO
+        space are not in kern_memmap, but could be accessed with a UC
+        mapping. This would not cause attribute aliasing. But
+        registers typically can be accessed only with four-byte or
+        eight-byte accesses, and the copy_from_user() path doesn't allow
+        any control over the access size, so this would be dangerous.
+
+    ioremap()
+
+        This returns a kernel identity mapping for use inside the
+        kernel.
+
+        If the region is in kern_memmap, we should use the attribute
+        specified there. Otherwise, if the EFI memory map reports that
+        the entire granule supports WB, we should use that (granules
+        that are partially reserved or occupied by firmware do not appear
+        in kern_memmap). Otherwise, we should use a UC mapping.
+
+PAST PROBLEM CASES
+
+    mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
+
+        The EFI memory map may not report these MMIO regions.
+
+        These must be allowed so that X will work. This means that
+        when the EFI memory map is incomplete, every /dev/mem mmap must
+        succeed. It may create either WB or UC user mappings, depending
+        on whether the region is in kern_memmap or the EFI memory map.
+
+    mmap of 0x0-0xA0000 /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
+
+        See https://bugzilla.novell.com/show_bug.cgi?id=140858.
+
+        The EFI memory map reports the following attributes:
+            0x00000-0x9FFFF WB only
+            0xA0000-0xBFFFF UC only (VGA frame buffer)
+            0xC0000-0xFFFFF WB only
+
+        This mmap is done with user pages, not kernel identity mappings,
+        so it is safe to use WB mappings.
+
+        The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
+        which will use a granule-sized UC mapping covering 0-0xFFFFF. This
+        granule covers some WB-only memory, but since UC is non-speculative,
+        the processor will never generate an uncacheable reference to the
+        WB-only areas unless the driver explicitly touches them.
+
+    mmap of 0x0-0xFFFFF legacy_mem by "X"
+
+        If the EFI memory map reports this entire range as WB, there
+        is no VGA MMIO hole, and the mmap should fail or be done with
+        a WB mapping.
+
+        There's no easy way for X to determine whether the 0xA0000-0xBFFFF
+        region is a frame buffer or just memory, so I think it's best to
+        just fail this mmap request rather than using a WB mapping. As
+        far as I know, there's no need to map legacy_mem with WB
+        mappings.
+
+        Otherwise, a UC mapping of the entire region is probably safe.
+        The VGA hole means the region will not be in kern_memmap. The
+        HP sx1000 chipset doesn't support UC access to the memory surrounding
+        the VGA hole, but X doesn't need that area anyway and should not
+        reference it.
+
+    mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
+
+        The EFI memory map reports the following attributes:
+            0x00000-0xFFFFF WB only (no VGA MMIO hole)
+
+        This is a special case of the previous case, and the mmap should
+        fail for the same reason as above.
+
+NOTES
+
+    [1] SDM rev 2.2, vol 2, sec 4.4.1.
+    [2] SDM rev 2.2, vol 2, sec 4.4.6.
+106-58
arch/ia64/kernel/efi.c
···
  * Copyright (C) 1999-2003 Hewlett-Packard Co.
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  *	Stephane Eranian <eranian@hpl.hp.com>
+ * (c) Copyright 2006 Hewlett-Packard Development Company, L.P.
+ *	Bjorn Helgaas <bjorn.helgaas@hp.com>
  *
  * All EFI Runtime Services are not implemented yet as EFI only
  * supports physical mode addressing on SoftSDV. This is to be fixed
···
 	return 0;
 }

+static struct kern_memdesc *
+kern_memory_descriptor (unsigned long phys_addr)
+{
+	struct kern_memdesc *md;
+
+	for (md = kern_memmap; md->start != ~0UL; md++) {
+		if (phys_addr - md->start < (md->num_pages << EFI_PAGE_SHIFT))
+			 return md;
+	}
+	return 0;
+}
+
 static efi_memory_desc_t *
 efi_memory_descriptor (unsigned long phys_addr)
 {
···

 		if (phys_addr - md->phys_addr < (md->num_pages << EFI_PAGE_SHIFT))
 			 return md;
-	}
-	return 0;
-}
-
-static int
-efi_memmap_has_mmio (void)
-{
-	void *efi_map_start, *efi_map_end, *p;
-	efi_memory_desc_t *md;
-	u64 efi_desc_size;
-
-	efi_map_start = __va(ia64_boot_param->efi_memmap);
-	efi_map_end   = efi_map_start + ia64_boot_param->efi_memmap_size;
-	efi_desc_size = ia64_boot_param->efi_memdesc_size;
-
-	for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
-		md = p;
-
-		if (md->type == EFI_MEMORY_MAPPED_IO)
-			return 1;
 	}
 	return 0;
 }
···
 }
 EXPORT_SYMBOL(efi_mem_attributes);

-/*
- * Determines whether the memory at phys_addr supports the desired
- * attribute (WB, UC, etc). If this returns 1, the caller can safely
- * access size bytes at phys_addr with the specified attribute.
- */
-int
-efi_mem_attribute_range (unsigned long phys_addr, unsigned long size, u64 attr)
+u64
+efi_mem_attribute (unsigned long phys_addr, unsigned long size)
 {
 	unsigned long end = phys_addr + size;
 	efi_memory_desc_t *md = efi_memory_descriptor(phys_addr);
+	u64 attr;
+
+	if (!md)
+		return 0;

 	/*
-	 * Some firmware doesn't report MMIO regions in the EFI memory
-	 * map. The Intel BigSur (a.k.a. HP i2000) has this problem.
-	 * On those platforms, we have to assume UC is valid everywhere.
+	 * EFI_MEMORY_RUNTIME is not a memory attribute; it just tells
+	 * the kernel that firmware needs this region mapped.
 	 */
-	if (!md || (md->attribute & attr) != attr) {
-		if (attr == EFI_MEMORY_UC && !efi_memmap_has_mmio())
-			return 1;
-		return 0;
-	}
-
+	attr = md->attribute & ~EFI_MEMORY_RUNTIME;
 	do {
 		unsigned long md_end = efi_md_end(md);

 		if (end <= md_end)
-			return 1;
+			return attr;

 		md = efi_memory_descriptor(md_end);
-		if (!md || (md->attribute & attr) != attr)
+		if (!md || (md->attribute & ~EFI_MEMORY_RUNTIME) != attr)
 			return 0;
 	} while (md);
 	return 0;
 }

-/*
- * For /dev/mem, we only allow read & write system calls to access
- * write-back memory, because read & write don't allow the user to
- * control access size.
- */
+u64
+kern_mem_attribute (unsigned long phys_addr, unsigned long size)
+{
+	unsigned long end = phys_addr + size;
+	struct kern_memdesc *md;
+	u64 attr;
+
+	/*
+	 * This is a hack for ioremap calls before we set up kern_memmap.
+	 * Maybe we should do efi_memmap_init() earlier instead.
+	 */
+	if (!kern_memmap) {
+		attr = efi_mem_attribute(phys_addr, size);
+		if (attr & EFI_MEMORY_WB)
+			return EFI_MEMORY_WB;
+		return 0;
+	}
+
+	md = kern_memory_descriptor(phys_addr);
+	if (!md)
+		return 0;
+
+	attr = md->attribute;
+	do {
+		unsigned long md_end = kmd_end(md);
+
+		if (end <= md_end)
+			return attr;
+
+		md = kern_memory_descriptor(md_end);
+		if (!md || md->attribute != attr)
+			return 0;
+	} while (md);
+	return 0;
+}
+EXPORT_SYMBOL(kern_mem_attribute);
+
 int
 valid_phys_addr_range (unsigned long phys_addr, unsigned long size)
 {
-	return efi_mem_attribute_range(phys_addr, size, EFI_MEMORY_WB);
+	u64 attr;
+
+	/*
+	 * /dev/mem reads and writes use copy_to_user(), which implicitly
+	 * uses a granule-sized kernel identity mapping. It's really
+	 * only safe to do this for regions in kern_memmap. For more
+	 * details, see Documentation/ia64/aliasing.txt.
+	 */
+	attr = kern_mem_attribute(phys_addr, size);
+	if (attr & EFI_MEMORY_WB || attr & EFI_MEMORY_UC)
+		return 1;
+	return 0;
 }

-/*
- * We allow mmap of anything in the EFI memory map that supports
- * either write-back or uncacheable access. For uncacheable regions,
- * the supported access sizes are system-dependent, and the user is
- * responsible for using the correct size.
- *
- * Note that this doesn't currently allow access to hot-added memory,
- * because that doesn't appear in the boot-time EFI memory map.
- */
 int
 valid_mmap_phys_addr_range (unsigned long phys_addr, unsigned long size)
 {
-	if (efi_mem_attribute_range(phys_addr, size, EFI_MEMORY_WB))
-		return 1;
+	/*
+	 * MMIO regions are often missing from the EFI memory map.
+	 * We must allow mmap of them for programs like X, so we
+	 * currently can't do any useful validation.
+	 */
+	return 1;
+}

-	if (efi_mem_attribute_range(phys_addr, size, EFI_MEMORY_UC))
-		return 1;
+pgprot_t
+phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size,
+		     pgprot_t vma_prot)
+{
+	unsigned long phys_addr = pfn << PAGE_SHIFT;
+	u64 attr;

-	return 0;
+	/*
+	 * For /dev/mem mmap, we use user mappings, but if the region is
+	 * in kern_memmap (and hence may be covered by a kernel mapping),
+	 * we must use the same attribute as the kernel mapping.
+	 */
+	attr = kern_mem_attribute(phys_addr, size);
+	if (attr & EFI_MEMORY_WB)
+		return pgprot_cacheable(vma_prot);
+	else if (attr & EFI_MEMORY_UC)
+		return pgprot_noncached(vma_prot);
+
+	/*
+	 * Some chipsets don't support UC access to memory. If
+	 * WB is supported, we prefer that.
+	 */
+	if (efi_mem_attribute(phys_addr, size) & EFI_MEMORY_WB)
+		return pgprot_cacheable(vma_prot);
+
+	return pgprot_noncached(vma_prot);
 }

 int __init
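Both descriptor lookups in this file test membership with a single unsigned comparison of the form `phys_addr - start < length`. The stand-alone sketch below isolates that idiom; `in_range()` is a hypothetical helper written for illustration and does not appear in the patch. It assumes the region itself does not wrap around the top of the address space.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One-comparison test for start <= addr < start + len.
 *
 * When addr < start, the unsigned subtraction wraps around to a very
 * large value, so the comparison against len fails without needing a
 * separate "addr >= start" check.
 */
static int
in_range(uint64_t addr, uint64_t start, uint64_t len)
{
	/* Assumption of this sketch: the region does not overflow. */
	assert(start + len >= start);
	return addr - start < len;
}
```

For a region starting at 0x1000 with length 0x100, the helper accepts 0x1000 and 0x10FF and rejects both 0x0FFF and 0x1100.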
+22-5
arch/ia64/mm/ioremap.c
···
 #include <linux/module.h>
 #include <linux/efi.h>
 #include <asm/io.h>
+#include <asm/meminit.h>

 static inline void __iomem *
 __ioremap (unsigned long offset, unsigned long size)
···
 void __iomem *
 ioremap (unsigned long offset, unsigned long size)
 {
-	if (efi_mem_attribute_range(offset, size, EFI_MEMORY_WB))
-		return phys_to_virt(offset);
+	u64 attr;
+	unsigned long gran_base, gran_size;

-	if (efi_mem_attribute_range(offset, size, EFI_MEMORY_UC))
+	/*
+	 * For things in kern_memmap, we must use the same attribute
+	 * as the rest of the kernel. For more details, see
+	 * Documentation/ia64/aliasing.txt.
+	 */
+	attr = kern_mem_attribute(offset, size);
+	if (attr & EFI_MEMORY_WB)
+		return phys_to_virt(offset);
+	else if (attr & EFI_MEMORY_UC)
 		return __ioremap(offset, size);

 	/*
-	 * Someday this should check ACPI resources so we
-	 * can do the right thing for hot-plugged regions.
+	 * Some chipsets don't support UC access to memory. If
+	 * WB is supported for the whole granule, we prefer that.
 	 */
+	gran_base = GRANULEROUNDDOWN(offset);
+	gran_size = GRANULEROUNDUP(offset + size) - gran_base;
+	if (efi_mem_attribute(gran_base, gran_size) & EFI_MEMORY_WB)
+		return phys_to_virt(offset);
+
 	return __ioremap(offset, size);
 }
 EXPORT_SYMBOL(ioremap);
···
 void __iomem *
 ioremap_nocache (unsigned long offset, unsigned long size)
 {
+	if (kern_mem_attribute(offset, size) & EFI_MEMORY_WB)
+		return 0;
+
 	return __ioremap(offset, size);
 }
 EXPORT_SYMBOL(ioremap_nocache);
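The new ioremap() path widens the request to whole granules before asking whether WB is safe. The sketch below works through that arithmetic in user space. It assumes a 16MB granule (the documentation says granules are currently 16MB or 64MB), and `granule_round_down()`, `granule_round_up()`, and `granule_span()` are hypothetical stand-ins for the kernel's GRANULEROUNDDOWN/GRANULEROUNDUP macros, not their actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed granule size for this sketch: 16MB. */
#define GRANULE_SHIFT	24
#define GRANULE_SIZE	(UINT64_C(1) << GRANULE_SHIFT)

/* Round addr down to the start of its granule. */
static uint64_t
granule_round_down(uint64_t addr)
{
	return addr & ~(GRANULE_SIZE - 1);
}

/* Round addr up to the next granule boundary (no-op if aligned). */
static uint64_t
granule_round_up(uint64_t addr)
{
	return (addr + GRANULE_SIZE - 1) & ~(GRANULE_SIZE - 1);
}

/*
 * Compute the granule-aligned span covering [offset, offset + size).
 * Stores the span's base in *base and returns its length in bytes.
 */
static uint64_t
granule_span(uint64_t offset, uint64_t size, uint64_t *base)
{
	assert(size > 0);
	*base = granule_round_down(offset);
	return granule_round_up(offset + size) - *base;
}
```

Under the 16MB assumption, a request for the VGA frame buffer (offset 0xA0000, size 0x20000) widens to base 0 and one full granule, so a WB identity mapping would be chosen only if that entire granule supports WB.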
···
 }

 #define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern u64 kern_mem_attribute (unsigned long phys_addr, unsigned long size);
 extern int valid_phys_addr_range (unsigned long addr, size_t count); /* efi.c */
 extern int valid_mmap_phys_addr_range (unsigned long addr, size_t count);

+10-12
include/asm-ia64/pgtable.h
···
 #define pte_mkhuge(pte)		(__pte(pte_val(pte)))

 /*
- * Macro to a page protection value as "uncacheable". Note that "protection" is really a
- * misnomer here as the protection value contains the memory attribute bits, dirty bits,
- * and various other bits as well.
+ * Make page protection values cacheable, uncacheable, or write-
+ * combining. Note that "protection" is really a misnomer here as the
+ * protection value contains the memory attribute bits, dirty bits, and
+ * various other bits as well.
  */
+#define pgprot_cacheable(prot)		__pgprot((pgprot_val(prot) & ~_PAGE_MA_MASK) | _PAGE_MA_WB)
 #define pgprot_noncached(prot)		__pgprot((pgprot_val(prot) & ~_PAGE_MA_MASK) | _PAGE_MA_UC)
-
-/*
- * Macro to make mark a page protection value as "write-combining".
- * Note that "protection" is really a misnomer here as the protection
- * value contains the memory attribute bits, dirty bits, and various
- * other bits as well. Accesses through a write-combining translation
- * works bypasses the caches, but does allow for consecutive writes to
- * be combined into single (but larger) write transactions.
- */
 #define pgprot_writecombine(prot)	__pgprot((pgprot_val(prot) & ~_PAGE_MA_MASK) | _PAGE_MA_WC)
+
+struct file;
+extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+				     unsigned long size, pgprot_t vma_prot);
+#define __HAVE_PHYS_MEM_ACCESS_PROT

 static inline unsigned long
 pgd_index (unsigned long address)
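The three pgprot_* macros share one pattern: clear the memory-attribute field, then OR in the new encoding, leaving all other bits alone. The following user-space sketch shows that pattern on a plain 64-bit value. The field position and the `MA_*` encodings are assumptions chosen for illustration rather than values quoted from the ia64 headers, and `set_mem_attr()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout for this sketch: a 3-bit memory-attribute field at
 * bits 2..4. These encodings are illustrative only.
 */
#define MA_SHIFT	2
#define MA_MASK		(UINT64_C(0x7) << MA_SHIFT)
#define MA_WB		(UINT64_C(0x0) << MA_SHIFT)
#define MA_UC		(UINT64_C(0x4) << MA_SHIFT)
#define MA_WC		(UINT64_C(0x6) << MA_SHIFT)

/* Replace the attribute field and leave every other bit untouched. */
static uint64_t
set_mem_attr(uint64_t prot, uint64_t ma)
{
	/* The new attribute must fit entirely inside the field. */
	assert((ma & ~MA_MASK) == 0);
	return (prot & ~MA_MASK) | ma;
}
```

Because the field is cleared before the new value is applied, switching a value from WC to UC and back again returns the original bits, and the non-attribute bits never change.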
+1
include/linux/efi.h
···
 extern u64 efi_get_iobase (void);
 extern u32 efi_mem_type (unsigned long phys_addr);
 extern u64 efi_mem_attributes (unsigned long phys_addr);
+extern u64 efi_mem_attribute (unsigned long phys_addr, unsigned long size);
 extern int efi_mem_attribute_range (unsigned long phys_addr, unsigned long size,
 				    u64 attr);
 extern int __init efi_uart_console_only (void);