Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

driver core: Rewrite test_async_driver_probe to cover serialization and NUMA affinity

The current async_probe test code is only testing one device allocated
prior to driver load and only loading one device afterwards. Instead of
doing things this way it makes much more sense to load one device per CPU
in order to actually stress the async infrastructure. By doing this we
should see delays significantly increase in the event of devices being
serialized.

In addition I have updated the test to verify that we are trying to place
the work on the correct NUMA node when we are running in async mode. By
doing this we can verify the best possible outcome for device and driver
load times.

I have added a timeout value that is used to disable the sleep and instead
cause the probe routine to report an error indicating it timed out. By
doing this we limit the maximum runtime for the test to 20 seconds or less.

The last major change in this set is that I have gone through and tuned it
for handling the massive number of possible events that will be scheduled.
Instead of reporting the sleep for each individual device it is moved to
only being displayed if we enable debugging.

With this patch applied below are what a failing test and a passing test
should look like. I elided a few hundred lines in the failing test that
were duplicated since the system I was testing on had a massive number of
CPU cores:

-- Failing --
[ 243.524697] test_async_driver_probe: registering first set of asynchronous devices...
[ 243.535625] test_async_driver_probe: registering asynchronous driver...
[ 243.543038] test_async_driver_probe: registration took 0 msecs
[ 243.549559] test_async_driver_probe: registering second set of asynchronous devices...
[ 243.568350] platform test_async_driver.447: registration took 9 msecs
[ 243.575544] test_async_driver_probe: registering first synchronous device...
[ 243.583454] test_async_driver_probe: registering synchronous driver...
[ 248.825920] test_async_driver_probe: registration took 5235 msecs
[ 248.825922] test_async_driver_probe: registering second synchronous device...
[ 248.825928] test_async_driver test_async_driver.443: NUMA node mismatch 3 != 1
[ 248.825932] test_async_driver test_async_driver.445: NUMA node mismatch 3 != 1
[ 248.825935] test_async_driver test_async_driver.446: NUMA node mismatch 3 != 1
[ 248.825939] test_async_driver test_async_driver.440: NUMA node mismatch 3 != 1
[ 248.825943] test_async_driver test_async_driver.441: NUMA node mismatch 3 != 1
...
[ 248.827150] test_async_driver test_async_driver.229: NUMA node mismatch 0 != 1
[ 248.827158] test_async_driver test_async_driver.228: NUMA node mismatch 0 != 1
[ 248.827220] test_async_driver test_async_driver.281: NUMA node mismatch 2 != 1
[ 248.827229] test_async_driver test_async_driver.282: NUMA node mismatch 2 != 1
[ 248.827240] test_async_driver test_async_driver.280: NUMA node mismatch 2 != 1
[ 253.945834] test_async_driver test_async_driver.1: NUMA node mismatch 0 != 1
[ 253.945878] test_sync_driver test_sync_driver.1: registration took 5119 msecs
[ 253.961693] test_async_driver_probe: async events still pending, forcing timeout and synchronize
[ 259.065839] test_async_driver test_async_driver.2: NUMA node mismatch 0 != 1
[ 259.073786] test_async_driver test_async_driver.3: async probe took too long
[ 259.081669] test_async_driver test_async_driver.3: NUMA node mismatch 0 != 1
[ 259.089569] test_async_driver test_async_driver.4: async probe took too long
[ 259.097451] test_async_driver test_async_driver.4: NUMA node mismatch 0 != 1
[ 259.105338] test_async_driver test_async_driver.5: async probe took too long
[ 259.113204] test_async_driver test_async_driver.5: NUMA node mismatch 0 != 1
[ 259.121089] test_async_driver test_async_driver.6: async probe took too long
[ 259.128961] test_async_driver test_async_driver.6: NUMA node mismatch 0 != 1
[ 259.136850] test_async_driver test_async_driver.7: async probe took too long
...
[ 262.124062] test_async_driver test_async_driver.221: async probe took too long
[ 262.132130] test_async_driver test_async_driver.221: NUMA node mismatch 3 != 1
[ 262.140206] test_async_driver test_async_driver.222: async probe took too long
[ 262.148277] test_async_driver test_async_driver.222: NUMA node mismatch 3 != 1
[ 262.156351] test_async_driver test_async_driver.223: async probe took too long
[ 262.164419] test_async_driver test_async_driver.223: NUMA node mismatch 3 != 1
[ 262.172630] test_async_driver_probe: Test failed with 222 errors and 336 warnings

-- Passing --
[ 105.419247] test_async_driver_probe: registering first set of asynchronous devices...
[ 105.432040] test_async_driver_probe: registering asynchronous driver...
[ 105.439718] test_async_driver_probe: registration took 0 msecs
[ 105.446239] test_async_driver_probe: registering second set of asynchronous devices...
[ 105.477986] platform test_async_driver.447: registration took 22 msecs
[ 105.485276] test_async_driver_probe: registering first synchronous device...
[ 105.493169] test_async_driver_probe: registering synchronous driver...
[ 110.597981] test_async_driver_probe: registration took 5097 msecs
[ 110.604806] test_async_driver_probe: registering second synchronous device...
[ 115.707490] test_sync_driver test_sync_driver.1: registration took 5094 msecs
[ 115.715478] test_async_driver_probe: completed successfully

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

authored by

Alexander Duyck and committed by
Greg Kroah-Hartman
57ea974f af87b9a7

+203 -62
+203 -62
drivers/base/test/test_async_driver_probe.c
··· 11 11 #include <linux/module.h> 12 12 #include <linux/platform_device.h> 13 13 #include <linux/time.h> 14 + #include <linux/numa.h> 15 + #include <linux/nodemask.h> 16 + #include <linux/topology.h> 14 17 15 18 #define TEST_PROBE_DELAY (5 * 1000) /* 5 sec */ 16 19 #define TEST_PROBE_THRESHOLD (TEST_PROBE_DELAY / 2) 17 20 21 + static atomic_t warnings, errors, timeout, async_completed; 22 + 18 23 static int test_probe(struct platform_device *pdev) 19 24 { 20 - dev_info(&pdev->dev, "sleeping for %d msecs in probe\n", 21 - TEST_PROBE_DELAY); 22 - msleep(TEST_PROBE_DELAY); 23 - dev_info(&pdev->dev, "done sleeping\n"); 25 + struct device *dev = &pdev->dev; 26 + 27 + /* 28 + * Determine if we have hit the "timeout" limit for the test if we 29 + * have then report it as an error, otherwise we wil sleep for the 30 + * required amount of time and then report completion. 31 + */ 32 + if (atomic_read(&timeout)) { 33 + dev_err(dev, "async probe took too long\n"); 34 + atomic_inc(&errors); 35 + } else { 36 + dev_dbg(&pdev->dev, "sleeping for %d msecs in probe\n", 37 + TEST_PROBE_DELAY); 38 + msleep(TEST_PROBE_DELAY); 39 + dev_dbg(&pdev->dev, "done sleeping\n"); 40 + } 41 + 42 + /* 43 + * Report NUMA mismatch if device node is set and we are not 44 + * performing an async init on that node. 45 + */ 46 + if (dev->driver->probe_type == PROBE_PREFER_ASYNCHRONOUS) { 47 + if (dev_to_node(dev) != numa_node_id()) { 48 + dev_warn(dev, "NUMA node mismatch %d != %d\n", 49 + dev_to_node(dev), numa_node_id()); 50 + atomic_inc(&warnings); 51 + } 52 + 53 + atomic_inc(&async_completed); 54 + } 24 55 25 56 return 0; 26 57 } ··· 72 41 .probe = test_probe, 73 42 }; 74 43 75 - static struct platform_device *async_dev_1, *async_dev_2; 76 - static struct platform_device *sync_dev_1; 44 + static struct platform_device *async_dev[NR_CPUS * 2]; 45 + static struct platform_device *sync_dev[2]; 46 + 47 + static struct platform_device * 48 + test_platform_device_register_node(char *name, int id, int nid) 49 + { 50 + struct platform_device *pdev; 51 + int ret; 52 + 53 + pdev = platform_device_alloc(name, id); 54 + if (!pdev) 55 + return NULL; 56 + 57 + if (nid != NUMA_NO_NODE) 58 + set_dev_node(&pdev->dev, nid); 59 + 60 + ret = platform_device_add(pdev); 61 + if (ret) { 62 + platform_device_put(pdev); 63 + return ERR_PTR(ret); 64 + } 65 + 66 + return pdev; 67 + 68 + } 77 69 78 70 static int __init test_async_probe_init(void) 79 71 { 80 - ktime_t calltime, delta; 72 + struct platform_device **pdev = NULL; 73 + int async_id = 0, sync_id = 0; 81 74 unsigned long long duration; 82 - int error; 75 + ktime_t calltime, delta; 76 + int err, nid, cpu; 83 77 84 - pr_info("registering first asynchronous device...\n"); 78 + pr_info("registering first set of asynchronous devices...\n"); 85 79 86 - async_dev_1 = platform_device_register_simple("test_async_driver", 1, 87 - NULL, 0); 88 - if (IS_ERR(async_dev_1)) { 89 - error = PTR_ERR(async_dev_1); 90 - pr_err("failed to create async_dev_1: %d\n", error); 91 - return error; 80 + for_each_online_cpu(cpu) { 81 + nid = cpu_to_node(cpu); 82 + pdev = &async_dev[async_id]; 83 + *pdev = test_platform_device_register_node("test_async_driver", 84 + async_id, 85 + nid); 86 + if (IS_ERR(*pdev)) { 87 + err = PTR_ERR(*pdev); 88 + *pdev = NULL; 89 + pr_err("failed to create async_dev: %d\n", err); 90 + goto err_unregister_async_devs; 91 + } 92 + 93 + async_id++; 92 94 } 93 95 94 96 pr_info("registering asynchronous driver...\n"); 95 97 calltime = ktime_get(); 96 - error = platform_driver_register(&async_driver); 97 - if (error) { 98 - pr_err("Failed to register async_driver: %d\n", error); 99 - goto err_unregister_async_dev_1; 98 + err = platform_driver_register(&async_driver); 99 + if (err) { 100 + pr_err("Failed to register async_driver: %d\n", err); 101 + goto err_unregister_async_devs; 100 102 } 101 103 102 104 delta = ktime_sub(ktime_get(), calltime); ··· 137 73 pr_info("registration took %lld msecs\n", duration); 138 74 if (duration > TEST_PROBE_THRESHOLD) { 139 75 pr_err("test failed: probe took too long\n"); 140 - error = -ETIMEDOUT; 76 + err = -ETIMEDOUT; 141 77 goto err_unregister_async_driver; 142 78 } 143 79 144 - pr_info("registering second asynchronous device...\n"); 80 + pr_info("registering second set of asynchronous devices...\n"); 145 81 calltime = ktime_get(); 146 - async_dev_2 = platform_device_register_simple("test_async_driver", 2, 147 - NULL, 0); 148 - if (IS_ERR(async_dev_2)) { 149 - error = PTR_ERR(async_dev_2); 150 - pr_err("failed to create async_dev_2: %d\n", error); 151 - goto err_unregister_async_driver; 82 + for_each_online_cpu(cpu) { 83 + nid = cpu_to_node(cpu); 84 + pdev = &sync_dev[sync_id]; 85 + 86 + *pdev = test_platform_device_register_node("test_async_driver", 87 + async_id, 88 + nid); 89 + if (IS_ERR(*pdev)) { 90 + err = PTR_ERR(*pdev); 91 + *pdev = NULL; 92 + pr_err("failed to create async_dev: %d\n", err); 93 + goto err_unregister_async_driver; 94 + } 95 + 96 + async_id++; 152 97 } 153 98 154 99 delta = ktime_sub(ktime_get(), calltime); 155 100 duration = (unsigned long long) ktime_to_ms(delta); 156 - pr_info("registration took %lld msecs\n", duration); 101 + dev_info(&(*pdev)->dev, 102 + "registration took %lld msecs\n", duration); 157 103 if (duration > TEST_PROBE_THRESHOLD) { 158 - pr_err("test failed: probe took too long\n"); 159 - error = -ETIMEDOUT; 160 - goto err_unregister_async_dev_2; 104 + dev_err(&(*pdev)->dev, 105 + "test failed: probe took too long\n"); 106 + err = -ETIMEDOUT; 107 + goto err_unregister_async_driver; 161 108 } 109 + 110 + 111 + pr_info("registering first synchronous device...\n"); 112 + nid = cpu_to_node(cpu); 113 + pdev = &sync_dev[sync_id]; 114 + 115 + *pdev = test_platform_device_register_node("test_sync_driver", 116 + sync_id, 117 + NUMA_NO_NODE); 118 + if (IS_ERR(*pdev)) { 119 + err = PTR_ERR(*pdev); 120 + *pdev = NULL; 121 + pr_err("failed to create sync_dev: %d\n", err); 122 + goto err_unregister_async_driver; 123 + } 124 + 125 + sync_id++; 162 126 163 127 pr_info("registering synchronous driver...\n"); 164 - 165 - error = platform_driver_register(&sync_driver); 166 - if (error) { 167 - pr_err("Failed to register async_driver: %d\n", error); 168 - goto err_unregister_async_dev_2; 169 - } 170 - 171 - pr_info("registering synchronous device...\n"); 172 128 calltime = ktime_get(); 173 - sync_dev_1 = platform_device_register_simple("test_sync_driver", 1, 174 - NULL, 0); 175 - if (IS_ERR(sync_dev_1)) { 176 - error = PTR_ERR(sync_dev_1); 177 - pr_err("failed to create sync_dev_1: %d\n", error); 178 - goto err_unregister_sync_driver; 129 + err = platform_driver_register(&sync_driver); 130 + if (err) { 131 + pr_err("Failed to register async_driver: %d\n", err); 132 + goto err_unregister_sync_devs; 179 133 } 180 134 181 135 delta = ktime_sub(ktime_get(), calltime); 182 136 duration = (unsigned long long) ktime_to_ms(delta); 183 137 pr_info("registration took %lld msecs\n", duration); 184 138 if (duration < TEST_PROBE_THRESHOLD) { 185 - pr_err("test failed: probe was too quick\n"); 186 - error = -ETIMEDOUT; 187 - goto err_unregister_sync_dev_1; 139 + dev_err(&(*pdev)->dev, 140 + "test failed: probe was too quick\n"); 141 + err = -ETIMEDOUT; 142 + goto err_unregister_sync_driver; 188 143 } 189 144 190 - pr_info("completed successfully"); 145 + pr_info("registering second synchronous device...\n"); 146 + pdev = &sync_dev[sync_id]; 147 + calltime = ktime_get(); 191 148 192 - return 0; 149 + *pdev = test_platform_device_register_node("test_sync_driver", 150 + sync_id, 151 + NUMA_NO_NODE); 152 + if (IS_ERR(*pdev)) { 153 + err = PTR_ERR(*pdev); 154 + *pdev = NULL; 155 + pr_err("failed to create sync_dev: %d\n", err); 156 + goto err_unregister_sync_driver; 157 + } 193 158 194 - err_unregister_sync_dev_1: 195 - platform_device_unregister(sync_dev_1); 159 + sync_id++; 160 + 161 + delta = ktime_sub(ktime_get(), calltime); 162 + duration = (unsigned long long) ktime_to_ms(delta); 163 + dev_info(&(*pdev)->dev, 164 + "registration took %lld msecs\n", duration); 165 + if (duration < TEST_PROBE_THRESHOLD) { 166 + dev_err(&(*pdev)->dev, 167 + "test failed: probe was too quick\n"); 168 + err = -ETIMEDOUT; 169 + goto err_unregister_sync_driver; 170 + } 171 + 172 + /* 173 + * The async events should have completed while we were taking care 174 + * of the synchronous events. We will now terminate any outstanding 175 + * asynchronous probe calls remaining by forcing timeout and remove 176 + * the driver before we return which should force the flush of the 177 + * pending asynchronous probe calls. 178 + * 179 + * Otherwise if they completed without errors or warnings then 180 + * report successful completion. 181 + */ 182 + if (atomic_read(&async_completed) != async_id) { 183 + pr_err("async events still pending, forcing timeout\n"); 184 + atomic_inc(&timeout); 185 + err = -ETIMEDOUT; 186 + } else if (!atomic_read(&errors) && !atomic_read(&warnings)) { 187 + pr_info("completed successfully\n"); 188 + return 0; 189 + } 196 190 197 191 err_unregister_sync_driver: 198 192 platform_driver_unregister(&sync_driver); 199 - 200 - err_unregister_async_dev_2: 201 - platform_device_unregister(async_dev_2); 202 - 193 + err_unregister_sync_devs: 194 + while (sync_id--) 195 + platform_device_unregister(sync_dev[sync_id]); 203 196 err_unregister_async_driver: 204 197 platform_driver_unregister(&async_driver); 198 + err_unregister_async_devs: 199 + while (async_id--) 200 + platform_device_unregister(async_dev[async_id]); 205 201 206 - err_unregister_async_dev_1: 207 - platform_device_unregister(async_dev_1); 202 + /* 203 + * If err is already set then count that as an additional error for 204 + * the test. Otherwise we will report an invalid argument error and 205 + * not count that as we should have reached here as a result of 206 + * errors or warnings being reported by the probe routine. 207 + */ 208 + if (err) 209 + atomic_inc(&errors); 210 + else 211 + err = -EINVAL; 208 212 209 - return error; 213 + pr_err("Test failed with %d errors and %d warnings\n", 214 + atomic_read(&errors), atomic_read(&warnings)); 215 + 216 + return err; 210 217 } 211 218 module_init(test_async_probe_init); 212 219 213 220 static void __exit test_async_probe_exit(void) 214 221 { 222 + int id = 2; 223 + 215 224 platform_driver_unregister(&async_driver); 216 225 platform_driver_unregister(&sync_driver); 217 - platform_device_unregister(async_dev_1); 218 - platform_device_unregister(async_dev_2); 219 - platform_device_unregister(sync_dev_1); 226 + 227 + while (id--) 228 + platform_device_unregister(sync_dev[id]); 229 + 230 + id = NR_CPUS * 2; 231 + while (id--) 232 + platform_device_unregister(async_dev[id]); 220 233 } 221 234 module_exit(test_async_probe_exit); 222 235