Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

KVM: arm64: selftests: Enable tuning of error margin in arch_timer test

Intermittent failures occur when stress-testing the arch_timer
test in a QEMU VM:

Guest assert failed, vcpu 0; stage; 4; iter: 3
==== Test Assertion Failure ====
aarch64/arch_timer.c:196: config_iter + 1 == irq_iter
pid=4048 tid=4049 errno=4 - Interrupted system call
1 0x000000000040253b: test_vcpu_run at arch_timer.c:248
2 0x0000ffffb60dd5c7: ?? ??:0
3 0x0000ffffb6145d1b: ?? ??:0
0x3 != 0x2 (config_iter + 1 != irq_iter)

Further testing and debugging show that the time an interrupt
takes to arrive fluctuates randomly and can spike, especially
when testing in a virtual environment.

To alleviate this issue, make the timeout error margin user
configurable and print a hint to increase the value when the
failure is hit.
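
The timeout arithmetic described above can be sketched in isolation. This is a minimal illustration, not the test's code: the helper names wait_budget_us() and irq_within_budget() are hypothetical, the 10 ms period is only an example value, and the 100 us margin mirrors the default mentioned in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: total time the guest waits, in microseconds. */
static uint64_t wait_budget_us(uint32_t period_ms, uint32_t err_margin_us)
{
	return (uint64_t)period_ms * 1000u + err_margin_us;
}

/* Hypothetical helper: did the interrupt arrive before the wait expired? */
static bool irq_within_budget(uint64_t irq_latency_us, uint32_t period_ms,
			      uint32_t err_margin_us)
{
	return irq_latency_us <= wait_budget_us(period_ms, err_margin_us);
}

/* Usage sketch: an example 10 ms period with different margins. */
static void budget_example(void)
{
	/* 10 ms period + 100 us margin gives a 10100 us budget. */
	assert(wait_budget_us(10, 100) == 10100);
	/* An interrupt 400 us late misses the 100 us margin ... */
	assert(!irq_within_budget(10400, 10, 100));
	/* ... but fits once the margin is raised to 500 us. */
	assert(irq_within_budget(10400, 10, 500));
}
```

Under these assumptions, an interrupt delayed by host scheduling noise only fails the check when the delay exceeds the margin, which is why a larger margin helps in a noisy VM.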

Signed-off-by: Haibo Xu <haibo1.xu@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Anup Patel <anup@brainfault.org>

Authored by Haibo Xu and committed by Anup Patel
d1dafd06 f0617e4a

+23 -9
tools/testing/selftests/kvm/aarch64/arch_timer.c
···
  * CVAL and TVAL registers. This consitutes the four stages in the test.
  * The guest's main thread configures the timer interrupt for a stage
  * and waits for it to fire, with a timeout equal to the timer period.
- * It asserts that the timeout doesn't exceed the timer period.
+ * It asserts that the timeout doesn't exceed the timer period plus
+ * a user configurable error margin(default to 100us).
  *
  * On the other hand, upon receipt of an interrupt, the guest's interrupt
  * handler validates the interrupt by checking if the architectural state
  * is in compliance with the specifications.
  *
  * The test provides command-line options to configure the timer's
- * period (-p), number of vCPUs (-n), and iterations per stage (-i).
- * To stress-test the timer stack even more, an option to migrate the
- * vCPUs across pCPUs (-m), at a particular rate, is also provided.
+ * period (-p), number of vCPUs (-n), iterations per stage (-i) and timer
+ * interrupt arrival error margin (-e). To stress-test the timer stack
+ * even more, an option to migrate the vCPUs across pCPUs (-m), at a
+ * particular rate, is also provided.
  *
  * Copyright (c) 2021, Google LLC.
  */
···
 	uint32_t nr_iter;
 	uint32_t timer_period_ms;
 	uint32_t migration_freq_ms;
+	uint32_t timer_err_margin_us;
 	struct kvm_arm_counter_offset offset;
 };

···
 	.nr_iter = NR_TEST_ITERS_DEF,
 	.timer_period_ms = TIMER_TEST_PERIOD_MS_DEF,
 	.migration_freq_ms = TIMER_TEST_MIGRATION_FREQ_MS,
+	.timer_err_margin_us = TIMER_TEST_ERR_MARGIN_US,
 	.offset = { .reserved = 1 },
 };

···

 		/* Setup a timeout for the interrupt to arrive */
 		udelay(msecs_to_usecs(test_args.timer_period_ms) +
-			TIMER_TEST_ERR_MARGIN_US);
+			test_args.timer_err_margin_us);

 		irq_iter = READ_ONCE(shared_data->nr_iter);
-		GUEST_ASSERT_EQ(config_iter + 1, irq_iter);
+		__GUEST_ASSERT(config_iter + 1 == irq_iter,
+			"config_iter + 1 = 0x%lx, irq_iter = 0x%lx.\n"
+			" Guest timer interrupt was not trigged within the specified\n"
+			" interval, try to increase the error margin by [-e] option.\n",
+			config_iter + 1, irq_iter);
 	}
 }

···

 static void test_print_help(char *name)
 {
-	pr_info("Usage: %s [-h] [-n nr_vcpus] [-i iterations] [-p timer_period_ms]\n",
-		name);
+	pr_info("Usage: %s [-h] [-n nr_vcpus] [-i iterations] [-p timer_period_ms]\n"
+		"\t\t [-m migration_freq_ms] [-o counter_offset]\n"
+		"\t\t [-e timer_err_margin_us]\n", name);
 	pr_info("\t-n: Number of vCPUs to configure (default: %u; max: %u)\n",
 		NR_VCPUS_DEF, KVM_MAX_VCPUS);
 	pr_info("\t-i: Number of iterations per stage (default: %u)\n",
···
 	pr_info("\t-m: Frequency (in ms) of vCPUs to migrate to different pCPU. 0 to turn off (default: %u)\n",
 		TIMER_TEST_MIGRATION_FREQ_MS);
 	pr_info("\t-o: Counter offset (in counter cycles, default: 0)\n");
+	pr_info("\t-e: Interrupt arrival error margin (in us) of the guest timer (default: %u)\n",
+		TIMER_TEST_ERR_MARGIN_US);
 	pr_info("\t-h: print this help screen\n");
 }

···
 {
 	int opt;

-	while ((opt = getopt(argc, argv, "hn:i:p:m:o:")) != -1) {
+	while ((opt = getopt(argc, argv, "hn:i:p:m:o:e:")) != -1) {
 		switch (opt) {
 		case 'n':
 			test_args.nr_vcpus = atoi_positive("Number of vCPUs", optarg);
···
 			break;
 		case 'm':
 			test_args.migration_freq_ms = atoi_non_negative("Frequency", optarg);
+			break;
+		case 'e':
+			test_args.timer_err_margin_us = atoi_non_negative("Error Margin", optarg);
 			break;
 		case 'o':
 			test_args.offset.counter_offset = strtol(optarg, NULL, 0);