Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

string: memchr_inv() speed improvements

- Generate a 64-bit pattern more efficiently

memchr_inv() needs to generate a 64-bit pattern filled with the target
character. That pattern can be generated more efficiently.

- Don't call the slow check_bytes() if the memory area is 64-bit aligned

memchr_inv() compares contiguous 64-bit words against the 64-bit pattern
as far as possible. The unaligned leading and trailing parts of the
region are checked by check_bytes(), which scans byte by byte.
Unfortunately, the first 64-bit word was needlessly scanned by
check_bytes() even when the memory area is already aligned to a 64-bit
boundary.

Both changes were originally suggested by Eric Dumazet.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Akinobu Mita, committed by Linus Torvalds
Commit f43804bf (parent a403d930)
+16 -4
lib/string.c
@@ -785,12 +785,20 @@
 	if (bytes <= 16)
 		return check_bytes8(start, value, bytes);
 
-	value64 = value | value << 8 | value << 16 | value << 24;
-	value64 = (value64 & 0xffffffff) | value64 << 32;
-	prefix = 8 - ((unsigned long)start) % 8;
+	value64 = value;
+#if defined(ARCH_HAS_FAST_MULTIPLIER) && BITS_PER_LONG == 64
+	value64 *= 0x0101010101010101;
+#elif defined(ARCH_HAS_FAST_MULTIPLIER)
+	value64 *= 0x01010101;
+	value64 |= value64 << 32;
+#else
+	value64 |= value64 << 8;
+	value64 |= value64 << 16;
+	value64 |= value64 << 32;
+#endif
 
+	prefix = (unsigned long)start % 8;
 	if (prefix) {
-		u8 *r = check_bytes8(start, value, prefix);
+		u8 *r;
+
+		prefix = 8 - prefix;
+		r = check_bytes8(start, value, prefix);
 		if (r)
 			return r;
 		start += prefix;