Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86-64: rename misleadingly named '__copy_user_nocache()' function

This function was a masterclass in bad naming, for various historical
reasons.

It claimed to be a non-cached user copy. It is literally _neither_ of
those things. It's a specialty memory copy routine that uses
non-temporal stores for the destination (but not the source), and that
does exception handling for both source and destination accesses.

Also note that while it works for unaligned targets, any unaligned parts
(whether at beginning or end) will not use non-temporal stores, since
only 32-bit and 64-bit integer stores (movnti) can be non-temporal on
x86.
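
The alignment caveat can be modelled with a little arithmetic. What
follows is a user-space sketch, not the kernel routine: split_copy()
and struct copy_split are hypothetical names, and the model is
deliberately simplified. It assumes that bytes before the first 8-byte
boundary of the destination, and bytes left over after the last full
quadword, go through ordinary stores, and it ignores the 4-byte
non-temporal case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how many bytes land in each phase of the copy. */
struct copy_split {
	size_t head;	/* ordinary stores until dst is 8-byte aligned */
	size_t body;	/* bytes eligible for 8-byte non-temporal stores */
	size_t tail;	/* leftover bytes after the last full quadword */
};

/* Simplified model only: splits a copy of len bytes to address dst. */
struct copy_split split_copy(uintptr_t dst, size_t len)
{
	struct copy_split s;
	size_t misalign = (size_t)(dst & 7);

	s.head = misalign ? 8 - misalign : 0;
	if (s.head > len)
		s.head = len;		/* short copy ends before alignment */
	s.body = (len - s.head) & ~(size_t)7;
	s.tail = len - s.head - s.body;
	return s;
}
```

Under this model, a destination at 0x1003 with a length of 100 gives 5
head bytes, 88 body bytes and 7 tail bytes, so 12 of the 100 bytes
would bypass the non-temporal path.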

The exception handling means that it _can_ be used for user space
accesses, but not on its own - it needs all the normal "start user space
access" logic around it.
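
The bracketing requirement can be illustrated with a user-space mock.
Everything here is a stand-in chosen for illustration: access_begin(),
access_end(), raw_copy() and wrapped_copy() are hypothetical names, the
flag merely imitates the STAC/CLAC window, and the "fault" is simulated
by a fault_at parameter. The one convention taken from the real routine
is the return value: the number of bytes left uncopied, or 0 on success.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the CPU's "user access allowed" state (STAC/CLAC on x86). */
int user_access_open;

void access_begin(void) { user_access_open = 1; }
void access_end(void)   { user_access_open = 0; }

/*
 * Hypothetical raw copy: copies up to fault_at bytes, then "faults".
 * Returns the number of bytes NOT copied. As a mock-only rule, it
 * copies nothing when called outside an access window.
 */
size_t raw_copy(void *dst, const void *src, size_t len, size_t fault_at)
{
	size_t done = len < fault_at ? len : fault_at;

	if (!user_access_open)
		return len;
	memcpy(dst, src, done);
	return len - done;
}

/* The wrapper owns the window, so callers cannot forget to close it. */
size_t wrapped_copy(void *dst, const void *src, size_t len, size_t fault_at)
{
	size_t left;

	access_begin();
	left = raw_copy(dst, src, len, fault_at);
	access_end();
	return left;
}
```

The point of the split is that the raw routine only does the copy and
the fault reporting, while opening and closing the access window is
the wrapper's job.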

But typically the user space access would be the source, not the
non-temporal destination. That was the original intention of this,
where the destination was some fragile persistent memory target that
needed non-temporal stores in order to catch machine check exceptions
synchronously and deal with them gracefully.

Thus that non-descriptive name: one use case was to copy from user space
into a non-cached kernel buffer. However, the existing users are a mix
of that intended use-case, and a couple of random drivers that just did
this as a performance tweak.

Some of those random drivers then actively misused the user copying
version (with STAC/CLAC and all) to do kernel copies without ever even
caring about the exception handling, _just_ for the non-temporal
destination.

Rename it as a first small step to actually make it halfway sane, and
change the prototype to be more normal: it doesn't take a user pointer
unless the caller has done the proper conversion, and the argument size
is the full size_t (it still won't actually copy more than 4GB in one
go, but there's also no reason to silently truncate the size argument in
the caller).
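
The silent truncation mentioned above is easy to reproduce in user
space. This is a minimal sketch with hypothetical function names; it
uses the fixed-width types uint32_t and uint64_t in place of 'unsigned'
and 'size_t' so that the result does not depend on the platform.

```c
#include <stdint.h>

/* Hypothetical old-style prototype: a 32-bit length parameter. */
uint64_t len_seen_narrow(uint32_t size)
{
	return size;
}

/* Hypothetical new-style prototype: a full 64-bit length parameter. */
uint64_t len_seen_wide(uint64_t size)
{
	return size;
}
```

Passing a length of 4GB + 16 to the narrow version makes the callee see
16, with no diagnostic at default warning levels; the wide version sees
the full value and can decide for itself how much to copy.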

Finally, use this now sanely named function in the NTB code, which
mis-used a user copy version (with STAC/CLAC and all) of this interface
despite it not actually being a user copy at all.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

+16 -16
+3 -2
arch/x86/include/asm/uaccess_64.h
···
 	return copy_user_generic((__force void *)dst, src, size);
 }

-extern long __copy_user_nocache(void *dst, const void __user *src, unsigned size);
+#define copy_to_nontemporal copy_to_nontemporal
+extern size_t copy_to_nontemporal(void *dst, const void *src, size_t size);
 extern long __copy_user_flushcache(void *dst, const void __user *src, unsigned size);

 static inline int
···
 	long ret;
 	kasan_check_write(dst, size);
 	stac();
-	ret = __copy_user_nocache(dst, src, size);
+	ret = copy_to_nontemporal(dst, (__force const void *)src, size);
 	clac();
 	return ret;
 }
+3 -3
arch/x86/lib/copy_user_uncached_64.S
···
  * Output:
  * rax uncopied bytes or 0 if successful.
  */
-SYM_FUNC_START(__copy_user_nocache)
+SYM_FUNC_START(copy_to_nontemporal)
 	ANNOTATE_NOENDBR
 	/* If destination is not 7-byte aligned, we'll have to align it */
 	testb $7,%dil
···
 	_ASM_EXTABLE_UA(52b, .Ldone0)
 	_ASM_EXTABLE_UA(53b, .Ldone0)

-SYM_FUNC_END(__copy_user_nocache)
-EXPORT_SYMBOL(__copy_user_nocache)
+SYM_FUNC_END(copy_to_nontemporal)
+EXPORT_SYMBOL(copy_to_nontemporal)
+2 -2
arch/x86/lib/usercopy_64.c
···
 	long rc;

 	stac();
-	rc = __copy_user_nocache(dst, src, size);
+	rc = copy_to_nontemporal(dst, (__force const void *)src, size);
 	clac();

 	/*
-	 * __copy_user_nocache() uses non-temporal stores for the bulk
+	 * copy_to_nontemporal() uses non-temporal stores for the bulk
 	 * of the transfer, but we need to manually flush if the
 	 * transfer is unaligned. A cached memory copy is used when
 	 * destination or size is not naturally aligned. That is:
+3 -5
drivers/infiniband/sw/rdmavt/qp.c
···
 static void cacheless_memcpy(void *dst, void *src, size_t n)
 {
 	/*
-	 * Use the only available X64 cacheless copy. Add a __user cast
-	 * to quiet sparse. The src agument is already in the kernel so
-	 * there are no security issues. The extra fault recovery machinery
-	 * is not invoked.
+	 * Use the only available X64 cacheless copy.
+	 * The extra fault recovery machinery is not invoked.
 	 */
-	__copy_user_nocache(dst, (void __user *)src, n);
+	copy_to_nontemporal(dst, src, n);
 }

 void rvt_wss_exit(struct rvt_dev_info *rdi)
+4 -3
drivers/ntb/ntb_transport.c
···

 static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
 {
-#ifdef ARCH_HAS_NOCACHE_UACCESS
+#ifdef copy_to_nontemporal
 	/*
 	 * Using non-temporal mov to improve performance on non-cached
-	 * writes, even though we aren't actually copying from user space.
+	 * writes. This only works if __iomem is strictly memory-like,
+	 * but that is the case on x86-64
 	 */
-	__copy_from_user_inatomic_nocache(offset, entry->buf, entry->len);
+	copy_to_nontemporal(offset, entry->buf, entry->len);
 #else
 	memcpy_toio(offset, entry->buf, entry->len);
 #endif
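
The '#ifdef copy_to_nontemporal' test in the NTB hunk works because the
header defines the name as a macro that expands to itself. A minimal
user-space sketch of that idiom follows; fast_copy() and do_copy() are
hypothetical names, and the "optimised" routine is just memcpy() here.

```c
#include <stddef.h>
#include <string.h>

/* An "architecture header" announces the routine by self-defining it. */
#define fast_copy fast_copy
size_t fast_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;	/* bytes left uncopied */
}

/* Generic code: pick the optimised routine only if it was announced. */
int do_copy(void *dst, const void *src, size_t len)
{
#ifdef fast_copy
	fast_copy(dst, src, len);
	return 1;	/* took the optimised path */
#else
	memcpy(dst, src, len);
	return 0;	/* took the fallback path */
#endif
}
```

Because a macro is not re-expanded inside its own expansion, the
self-definition leaves every use of the function name unchanged while
still making it visible to the preprocessor. That lets the function's
own name replace a separate feature flag.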
+1 -1
tools/objtool/check.c
···
 	"copy_mc_enhanced_fast_string",
 	"rep_stos_alternative",
 	"rep_movs_alternative",
-	"__copy_user_nocache",
+	"copy_to_nontemporal",
 	NULL
 };