Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf bench mem: Always memset source before memcpy

For memcpy, the source pages are memset to zero only when --cycles is
used. This leads to wildly different results with and without --cycles,
since, without explicit writes, all source pages are likely to be mapped
to the same zero page.
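
As a rough illustration of the zero-page effect described above, the
following sketch (not part of the patch) contrasts reading freshly mapped
anonymous memory with writing to it. The helper names page_size, map_anon,
touch_read and touch_write are hypothetical. On Linux, read faults on
untouched anonymous pages are typically satisfied by one shared zero page,
while a write forces a private physical page behind each virtual page.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: the system page size in bytes. */
static size_t page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	assert(ps > 0);
	return (size_t)ps;
}

/* Hypothetical helper: map `size` bytes of private anonymous memory. */
static unsigned char *map_anon(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/*
 * Read one byte per page. Read faults alone do not require private
 * pages, so the kernel may back every page with the shared zero page.
 */
static unsigned long touch_read(const unsigned char *buf, size_t size)
{
	size_t step = page_size();
	unsigned long sum = 0;

	for (size_t off = 0; off < size; off += step)
		sum += buf[off];
	return sum;
}

/*
 * Write every byte. The contents stay zero, but each virtual page now
 * needs its own physical page, which is what a copy benchmark should read.
 */
static void touch_write(unsigned char *buf, size_t size)
{
	memset(buf, 0, size);
}
```

Both touch_read calls in such a program return 0; the difference shows up
only in cache and memory traffic, which is why the LLC-loads counts below
diverge so sharply.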

Before this fix:

$ export cmd="./perf stat -e LLC-loads -- ./perf bench \
mem memcpy -s 1024MB -l 100 -f default"
$ $cmd

2,935,826 LLC-loads
3.821677452 seconds time elapsed

$ $cmd --cycles

217,533,436 LLC-loads
8.616725985 seconds time elapsed

After this fix:

$ $cmd

214,459,686 LLC-loads
8.674301124 seconds time elapsed

$ $cmd --cycles

214,758,651 LLC-loads
8.644480006 seconds time elapsed

Fixes: 47b5757bac03c338 ("perf bench mem: Move boilerplate memory allocation to the infrastructure")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel@axis.com
Link: http://lore.kernel.org/lkml/20200810133404.30829-1-vincent.whitchurch@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Vincent Whitchurch and committed by Arnaldo Carvalho de Melo
1beaef29 d566a9c2

+11 -10
tools/perf/bench/mem-functions.c
···
 	return 0;
 }
 
-static u64 do_memcpy_cycles(const struct function *r, size_t size, void *src, void *dst)
+static void memcpy_prefault(memcpy_t fn, size_t size, void *src, void *dst)
 {
-	u64 cycle_start = 0ULL, cycle_end = 0ULL;
-	memcpy_t fn = r->fn.memcpy;
-	int i;
-
 	/* Make sure to always prefault zero pages even if MMAP_THRESH is crossed: */
 	memset(src, 0, size);
 
···
 	 * to not measure page fault overhead:
 	 */
 	fn(dst, src, size);
+}
+
+static u64 do_memcpy_cycles(const struct function *r, size_t size, void *src, void *dst)
+{
+	u64 cycle_start = 0ULL, cycle_end = 0ULL;
+	memcpy_t fn = r->fn.memcpy;
+	int i;
+
+	memcpy_prefault(fn, size, src, dst);
 
 	cycle_start = get_cycles();
 	for (i = 0; i < nr_loops; ++i)
···
 	memcpy_t fn = r->fn.memcpy;
 	int i;
 
-	/*
-	 * We prefault the freshly allocated memory range here,
-	 * to not measure page fault overhead:
-	 */
-	fn(dst, src, size);
+	memcpy_prefault(fn, size, src, dst);
 
 	BUG_ON(gettimeofday(&tv_start, NULL));
 	for (i = 0; i < nr_loops; ++i)
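
For readers who want to try the pattern outside the perf tree, here is a
minimal standalone sketch of the same idea: one shared prefault helper that
every measurement path calls before its timed loop. It is an approximation
under stated assumptions, not the perf code. The names copy_fn,
copy_prefault and time_copy_loops are hypothetical, and clock_gettime
stands in for the get_cycles and gettimeofday calls used by perf.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for a memcpy-like function pointer type. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/*
 * Write to the source so each page gets its own physical backing, then
 * run one untimed copy so the destination is faulted in as well.
 */
static void copy_prefault(copy_fn fn, size_t size, void *src, void *dst)
{
	memset(src, 0, size);
	fn(dst, src, size);
}

/* Time nr_loops copies in seconds, always prefaulting first. */
static double time_copy_loops(copy_fn fn, size_t size, void *src, void *dst,
			      int nr_loops)
{
	struct timespec start, end;

	assert(nr_loops > 0);
	copy_prefault(fn, size, src, dst);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < nr_loops; ++i)
		fn(dst, src, size);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

A caller would pass memcpy (or any compatible routine) together with two
buffers of at least size bytes. Routing every timing variant through the
one helper keeps the measurement modes from drifting apart, which is the
bug the diff above fixes.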