Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

rbtree: add prio tree and interval tree tests

Patch 1 implements support for interval trees, on top of the augmented
rbtree API. It also adds synthetic tests to compare the performance of
interval trees vs prio trees. The short answer is that interval trees are
slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x)
on search. It is debatable how realistic the synthetic test is, and I have
not yet made measurements on real workloads, but my impression is that
interval trees would still come out faster.

Patch 2 uses a preprocessor template to make the interval tree generic,
and uses it as a replacement for the vma prio_tree.
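Patch 2 itself is not part of this commit. As a rough illustration of what a preprocessor template buys, here is a minimal sketch assuming a simplified node layout with plain left/right child pointers instead of struct rb_node. The macro DEFINE_SUBTREE_LAST and the types vma_like and range_node are hypothetical names invented for this example, not the kernel's. One macro stamps out a type-specific helper equivalent to compute_subtree_last() for each node type it is instantiated with.

```c
#include <stddef.h>

/*
 * Hypothetical template: expands to PREFIX##_compute_subtree_last() for any
 * node type with child pointers LEFT/RIGHT, an interval end field LAST and a
 * cached subtree maximum SUBTREE_LAST of integer type ITTYPE.
 */
#define DEFINE_SUBTREE_LAST(PREFIX, TYPE, LEFT, RIGHT, LAST, SUBTREE_LAST, ITTYPE) \
static ITTYPE PREFIX##_compute_subtree_last(const TYPE *node)			\
{										\
	ITTYPE max = node->LAST;						\
	if (node->LEFT && max < node->LEFT->SUBTREE_LAST)			\
		max = node->LEFT->SUBTREE_LAST;					\
	if (node->RIGHT && max < node->RIGHT->SUBTREE_LAST)			\
		max = node->RIGHT->SUBTREE_LAST;				\
	return max;								\
}

/* Two unrelated (hypothetical) node types sharing one template. */
struct vma_like {
	struct vma_like *left, *right;
	unsigned long vm_start, vm_last, subtree_last;
};

struct range_node {
	struct range_node *l, *r;
	unsigned int lo, hi, max_hi;
};

DEFINE_SUBTREE_LAST(vma, struct vma_like, left, right, vm_last, subtree_last, unsigned long)
DEFINE_SUBTREE_LAST(range, struct range_node, l, r, hi, max_hi, unsigned int)
```

Each instantiation is an ordinary static function specialized for its field names and integer type, so there is no runtime dispatch; the trade-off is that compiler errors surface inside macro expansions, which are harder to read than plain C.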

Patch 3 takes the other prio_tree user, kmemleak, and converts it to use
a basic rbtree. We don't actually need the augmented rbtree support here
because the intervals are always non-overlapping.
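Because these intervals never overlap, ordering them by start also orders them by end, so a point lookup can descend on the start key alone, with no subtree augmentation. A minimal sketch of that reasoning, assuming a hypothetical struct range and a plain unbalanced binary search tree in place of the rbtree that patch 3 actually uses (so lookups here cost O(depth), not O(log N)):

```c
#include <stddef.h>

/* Hypothetical node for a set of non-overlapping [start, last] ranges. */
struct range {
	struct range *left, *right;
	unsigned long start, last;
};

/* Plain BST insert keyed on start: no augmentation, no rebalancing. */
static void range_insert(struct range **link, struct range *node)
{
	while (*link)
		link = node->start < (*link)->start ? &(*link)->left : &(*link)->right;
	node->left = node->right = NULL;
	*link = node;
}

/*
 * Return the range containing addr, or NULL. Since ranges never overlap:
 *  - addr < node->start: any match starts before node, so it is on the left;
 *  - addr > node->last:  any match starts after node->last, so it is on the right.
 */
static struct range *range_lookup(struct range *node, unsigned long addr)
{
	while (node) {
		if (addr < node->start)
			node = node->left;
		else if (addr > node->last)
			node = node->right;
		else
			return node;
	}
	return NULL;
}
```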

Patch 4 removes the now-unused prio tree library.

Patch 5 proposes an additional optimization to rb_erase_augmented: it is
now provided as an inline function, so that the augmented callbacks can be
inlined into it. This provides an additional 5-10% performance improvement
for the interval tree insert/erase benchmark. There is a maintenance cost,
as it exposes augmented rbtree users to some of the rbtree library internals;
however, I think this cost shouldn't be too high, as I expect the augmented
rbtree will always have far fewer users than the base rbtree.
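Patch 5 is not part of this commit. The following is only a rough sketch of the general idea, with hypothetical names (struct node, struct augment_ops, propagate_up, refresh_after_update) loosely modeled on augmented rbtree callbacks rather than taken from the rbtree library: a helper that receives its callbacks through a pointer can only have them inlined if the helper itself is inlinable at the call site and the callback table is a compile-time constant. Whether a particular compiler actually performs the inlining is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical node: binary tree with parent links and a cached subtree max. */
struct node {
	struct node *parent, *left, *right;
	unsigned long last, subtree_last;
};

/* Hypothetical callback table (not the rbtree library's own structure). */
struct augment_ops {
	unsigned long (*compute)(const struct node *node);
};

/*
 * Refresh cached values from node up to (not including) stop. Being static
 * inline, this can be specialized per call site: when ops points to a
 * constant table, the compiler may resolve ops->compute and inline it
 * instead of making an indirect call at every level.
 */
static inline void propagate_up(struct node *node, struct node *stop,
				const struct augment_ops *ops)
{
	while (node != stop) {
		unsigned long val = ops->compute(node);

		if (node->subtree_last == val)
			break;	/* unchanged: ancestors, if consistent before, stay correct */
		node->subtree_last = val;
		node = node->parent;
	}
}

static unsigned long compute_subtree_last(const struct node *n)
{
	unsigned long max = n->last;

	if (n->left && max < n->left->subtree_last)
		max = n->left->subtree_last;
	if (n->right && max < n->right->subtree_last)
		max = n->right->subtree_last;
	return max;
}

static const struct augment_ops subtree_last_ops = { compute_subtree_last };

/* Call site: the table is a known constant here, which is what enables inlining. */
static void refresh_after_update(struct node *n)
{
	propagate_up(n, NULL, &subtree_last_ops);
}
```

The cost mentioned above shows up here too: callers of an inline helper see its internals, so changing those internals affects every user.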

I should probably add a quick summary of why I think it makes sense to
replace prio trees with augmented rbtree based interval trees now. One of
the drivers is that we need augmented rbtrees for Rik's vma gap finding
code, and once you have them, it just makes sense to use them for interval
trees as well, as this is the simpler and better-known algorithm. prio
trees, in comparison, seem *too* clever: they impose an additional 'heap'
constraint on the tree, which they use to guarantee a faster worst-case
complexity of O(k+log N) for stabbing queries in a well-balanced prio
tree, vs O(k*log N) for interval trees (where k=number of matches,
N=number of intervals). Now this sounds great, but in practice prio trees
don't realize this theoretical benefit. First, the additional constraint
makes them harder to update, so that the kernel implementation has to
simplify things by balancing them like a radix tree, which is not always
ideal. Second, the fact that there are both index and heap properties
makes both tree manipulation and search more complex, which results in a
higher multiplicative time constant. As it turns out, the simple interval
tree algorithm ends up running faster than the more clever prio tree.
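To put the two bounds on a common scale, take N = 100 intervals, as in the test modules. Assuming independent, uniformly distributed endpoints and query points (a continuous approximation of the prandom32() values the modules use), a query point falls inside a given interval with probability 1/3 on average, so k is roughly N/3. Plugging in, with constant factors dropped:

```latex
\begin{aligned}
\Pr[\min(a,b) \le q \le \max(a,b)] &= \int_0^1 2q(1-q)\,dq = \tfrac{1}{3},
  \qquad k \approx N/3 \approx 33.3 \\
\log_2 N &= \log_2 100 \approx 6.64 \\
k + \log_2 N &\approx 40.0 \qquad \text{(prio tree bound)} \\
k \log_2 N &\approx 221 \qquad \text{(interval tree bound)}
\end{aligned}
```

The bounds differ by roughly 5.5x at this size, yet the measurements quoted above have interval trees searching about 2.4 - 3x faster, which is the point of the argument: the per-step constant factors, not the asymptotic bound, decide the outcome.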

This patch:

Add two test modules:

- prio_tree_test measures the performance of lib/prio_tree.c, both for
insertion/removal and for stabbing searches

- interval_tree_test measures the performance of a library of equivalent
functionality, built using the augmented rbtree support.

In order to support the second test module, lib/interval_tree.c is
introduced. It is kept separate from the interval_tree_test main file
for two reasons: first, we don't want to give interval_tree_test an unfair
advantage over prio_tree_test by having everything in a single compilation
unit, and second, the interval tree functionality could gain some non-test
users in the kernel over time.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Michel Lespinasse and committed by Linus Torvalds.
fff3fd8a 3908836a

+417 lines added across 7 files

include/linux/interval_tree.h (+27)

```c
#ifndef _LINUX_INTERVAL_TREE_H
#define _LINUX_INTERVAL_TREE_H

#include <linux/rbtree.h>

struct interval_tree_node {
	struct rb_node rb;
	unsigned long start;	/* Start of interval */
	unsigned long last;	/* Last location _in_ interval */
	unsigned long __subtree_last;
};

extern void
interval_tree_insert(struct interval_tree_node *node, struct rb_root *root);

extern void
interval_tree_remove(struct interval_tree_node *node, struct rb_root *root);

extern struct interval_tree_node *
interval_tree_iter_first(struct rb_root *root,
			 unsigned long start, unsigned long last);

extern struct interval_tree_node *
interval_tree_iter_next(struct interval_tree_node *node,
			unsigned long start, unsigned long last);

#endif	/* _LINUX_INTERVAL_TREE_H */
```
lib/Kconfig.debug (+12)

```diff
···
 	  A benchmark measuring the performance of the rbtree library.
 	  Also includes rbtree invariant checks.
 
+config PRIO_TREE_TEST
+	tristate "Prio tree test"
+	depends on m && DEBUG_KERNEL
+	help
+	  A benchmark measuring the performance of the prio tree library
+
+config INTERVAL_TREE_TEST
+	tristate "Interval tree test"
+	depends on m && DEBUG_KERNEL
+	help
+	  A benchmark measuring the performance of the interval tree library
+
 config PROVIDE_OHCI1394_DMA_INIT
 	bool "Remote debugging over FireWire early on boot"
 	depends on PCI && X86
```
lib/Makefile (+4)

```diff
···
 lib-$(CONFIG_LIBFDT) += $(libfdt_files)
 
 obj-$(CONFIG_RBTREE_TEST) += rbtree_test.o
+obj-$(CONFIG_PRIO_TREE_TEST) += prio_tree_test.o
+obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
+
+interval_tree_test-objs := interval_tree_test_main.o interval_tree.o
 
 hostprogs-y := gen_crc32table
 clean-files := crc32table.h
```
lib/interval_tree.c (+159)

```c
#include <linux/init.h>
#include <linux/interval_tree.h>

/* Callbacks for augmented rbtree insert and remove */

static inline unsigned long
compute_subtree_last(struct interval_tree_node *node)
{
	unsigned long max = node->last, subtree_last;
	if (node->rb.rb_left) {
		subtree_last = rb_entry(node->rb.rb_left,
			struct interval_tree_node, rb)->__subtree_last;
		if (max < subtree_last)
			max = subtree_last;
	}
	if (node->rb.rb_right) {
		subtree_last = rb_entry(node->rb.rb_right,
			struct interval_tree_node, rb)->__subtree_last;
		if (max < subtree_last)
			max = subtree_last;
	}
	return max;
}

RB_DECLARE_CALLBACKS(static, augment_callbacks, struct interval_tree_node, rb,
		     unsigned long, __subtree_last, compute_subtree_last)

/* Insert / remove interval nodes from the tree */

void interval_tree_insert(struct interval_tree_node *node,
			  struct rb_root *root)
{
	struct rb_node **link = &root->rb_node, *rb_parent = NULL;
	unsigned long start = node->start, last = node->last;
	struct interval_tree_node *parent;

	while (*link) {
		rb_parent = *link;
		parent = rb_entry(rb_parent, struct interval_tree_node, rb);
		if (parent->__subtree_last < last)
			parent->__subtree_last = last;
		if (start < parent->start)
			link = &parent->rb.rb_left;
		else
			link = &parent->rb.rb_right;
	}

	node->__subtree_last = last;
	rb_link_node(&node->rb, rb_parent, link);
	rb_insert_augmented(&node->rb, root, &augment_callbacks);
}

void interval_tree_remove(struct interval_tree_node *node,
			  struct rb_root *root)
{
	rb_erase_augmented(&node->rb, root, &augment_callbacks);
}

/*
 * Iterate over intervals intersecting [start;last]
 *
 * Note that a node's interval intersects [start;last] iff:
 *   Cond1: node->start <= last
 * and
 *   Cond2: start <= node->last
 */

static struct interval_tree_node *
subtree_search(struct interval_tree_node *node,
	       unsigned long start, unsigned long last)
{
	while (true) {
		/*
		 * Loop invariant: start <= node->__subtree_last
		 * (Cond2 is satisfied by one of the subtree nodes)
		 */
		if (node->rb.rb_left) {
			struct interval_tree_node *left =
				rb_entry(node->rb.rb_left,
					 struct interval_tree_node, rb);
			if (start <= left->__subtree_last) {
				/*
				 * Some nodes in left subtree satisfy Cond2.
				 * Iterate to find the leftmost such node N.
				 * If it also satisfies Cond1, that's the match
				 * we are looking for. Otherwise, there is no
				 * matching interval as nodes to the right of N
				 * can't satisfy Cond1 either.
				 */
				node = left;
				continue;
			}
		}
		if (node->start <= last) {		/* Cond1 */
			if (start <= node->last)	/* Cond2 */
				return node;	/* node is leftmost match */
			if (node->rb.rb_right) {
				node = rb_entry(node->rb.rb_right,
						struct interval_tree_node, rb);
				if (start <= node->__subtree_last)
					continue;
			}
		}
		return NULL;	/* No match */
	}
}

struct interval_tree_node *
interval_tree_iter_first(struct rb_root *root,
			 unsigned long start, unsigned long last)
{
	struct interval_tree_node *node;

	if (!root->rb_node)
		return NULL;
	node = rb_entry(root->rb_node, struct interval_tree_node, rb);
	if (node->__subtree_last < start)
		return NULL;
	return subtree_search(node, start, last);
}

struct interval_tree_node *
interval_tree_iter_next(struct interval_tree_node *node,
			unsigned long start, unsigned long last)
{
	struct rb_node *rb = node->rb.rb_right, *prev;

	while (true) {
		/*
		 * Loop invariants:
		 *   Cond1: node->start <= last
		 *   rb == node->rb.rb_right
		 *
		 * First, search right subtree if suitable
		 */
		if (rb) {
			struct interval_tree_node *right =
				rb_entry(rb, struct interval_tree_node, rb);
			if (start <= right->__subtree_last)
				return subtree_search(right, start, last);
		}

		/* Move up the tree until we come from a node's left child */
		do {
			rb = rb_parent(&node->rb);
			if (!rb)
				return NULL;
			prev = &node->rb;
			node = rb_entry(rb, struct interval_tree_node, rb);
			rb = node->rb.rb_right;
		} while (prev == rb);

		/* Check if the node intersects [start;last] */
		if (last < node->start)		/* !Cond1 */
			return NULL;
		else if (start <= node->last)	/* Cond2 */
			return node;
	}
}
```
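The search logic above can be exercised outside the kernel. The following is a minimal user-space sketch, assuming a plain binary search tree with explicit parent pointers in place of struct rb_node, with no rebalancing and no removal; struct itnode and the it_* functions are hypothetical names. Insertion mirrors interval_tree_insert() (raising the cached subtree maximum along the descent path), and it_subtree_search() and it_iter_next() follow subtree_search() and interval_tree_iter_next() step for step. Without the red-black balancing that rb_insert_augmented() provides, tree depth, and with it each search step, can degrade to O(N) on sorted input.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space node: unbalanced BST with parent pointers. */
struct itnode {
	struct itnode *left, *right, *parent;
	unsigned long start, last;	/* closed interval [start;last] */
	unsigned long subtree_last;	/* max of ->last over this subtree */
};

/* Same descent as interval_tree_insert(): bump subtree_last on the way down. */
static void it_insert(struct itnode **root, struct itnode *node)
{
	struct itnode **link = root, *parent = NULL;

	while (*link) {
		parent = *link;
		if (parent->subtree_last < node->last)
			parent->subtree_last = node->last;
		link = node->start < parent->start ? &parent->left : &parent->right;
	}
	node->left = node->right = NULL;
	node->parent = parent;
	node->subtree_last = node->last;
	*link = node;
}

/* Leftmost match in this subtree; requires start <= node->subtree_last. */
static struct itnode *it_subtree_search(struct itnode *node,
					unsigned long start, unsigned long last)
{
	while (true) {
		if (node->left && start <= node->left->subtree_last) {
			node = node->left;	/* leftmost Cond2 candidate is on the left */
			continue;
		}
		if (node->start <= last) {		/* Cond1 */
			if (start <= node->last)	/* Cond2 */
				return node;
			if (node->right && start <= node->right->subtree_last) {
				node = node->right;
				continue;
			}
		}
		return NULL;
	}
}

static struct itnode *it_iter_first(struct itnode *root,
				    unsigned long start, unsigned long last)
{
	if (!root || root->subtree_last < start)
		return NULL;
	return it_subtree_search(root, start, last);
}

static struct itnode *it_iter_next(struct itnode *node,
				   unsigned long start, unsigned long last)
{
	struct itnode *rb = node->right, *prev;

	while (true) {
		/* Invariants: node satisfies Cond1 and rb == node->right */
		if (rb && start <= rb->subtree_last)
			return it_subtree_search(rb, start, last);

		/* Climb until we reach a parent from its left child */
		do {
			rb = node->parent;
			if (!rb)
				return NULL;
			prev = node;
			node = rb;
			rb = node->right;
		} while (prev == rb);

		if (last < node->start)		/* !Cond1: later nodes start even later */
			return NULL;
		if (start <= node->last)	/* Cond2 */
			return node;
	}
}

/* Stabbing query, as in the test modules' search(): count intervals containing q. */
static unsigned long it_count_stabbing(struct itnode *root, unsigned long q)
{
	unsigned long n = 0;
	struct itnode *it;

	for (it = it_iter_first(root, q, q); it; it = it_iter_next(it, q, q))
		n++;
	return n;
}
```

For example, with intervals [10;20], [5;30], [25;40], [1;3] and [15;15] inserted in that order, it_count_stabbing(root, 15) visits [5;30], [10;20] and [15;15] in start order and returns 3.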
lib/interval_tree_test_main.c (+105)

```c
#include <linux/module.h>
#include <linux/interval_tree.h>
#include <linux/random.h>
#include <asm/timex.h>

#define NODES 100
#define PERF_LOOPS 100000
#define SEARCHES 100
#define SEARCH_LOOPS 10000

static struct rb_root root = RB_ROOT;
static struct interval_tree_node nodes[NODES];
static u32 queries[SEARCHES];

static struct rnd_state rnd;

static inline unsigned long
search(unsigned long query, struct rb_root *root)
{
	struct interval_tree_node *node;
	unsigned long results = 0;

	for (node = interval_tree_iter_first(root, query, query); node;
	     node = interval_tree_iter_next(node, query, query))
		results++;
	return results;
}

static void init(void)
{
	int i;
	for (i = 0; i < NODES; i++) {
		u32 a = prandom32(&rnd), b = prandom32(&rnd);
		if (a <= b) {
			nodes[i].start = a;
			nodes[i].last = b;
		} else {
			nodes[i].start = b;
			nodes[i].last = a;
		}
	}
	for (i = 0; i < SEARCHES; i++)
		queries[i] = prandom32(&rnd);
}

static int interval_tree_test_init(void)
{
	int i, j;
	unsigned long results;
	cycles_t time1, time2, time;

	printk(KERN_ALERT "interval tree insert/remove");

	prandom32_seed(&rnd, 3141592653589793238ULL);
	init();

	time1 = get_cycles();

	for (i = 0; i < PERF_LOOPS; i++) {
		for (j = 0; j < NODES; j++)
			interval_tree_insert(nodes + j, &root);
		for (j = 0; j < NODES; j++)
			interval_tree_remove(nodes + j, &root);
	}

	time2 = get_cycles();
	time = time2 - time1;

	time = div_u64(time, PERF_LOOPS);
	printk(" -> %llu cycles\n", (unsigned long long)time);

	printk(KERN_ALERT "interval tree search");

	for (j = 0; j < NODES; j++)
		interval_tree_insert(nodes + j, &root);

	time1 = get_cycles();

	results = 0;
	for (i = 0; i < SEARCH_LOOPS; i++)
		for (j = 0; j < SEARCHES; j++)
			results += search(queries[j], &root);

	time2 = get_cycles();
	time = time2 - time1;

	time = div_u64(time, SEARCH_LOOPS);
	results = div_u64(results, SEARCH_LOOPS);
	printk(" -> %llu cycles (%lu results)\n",
	       (unsigned long long)time, results);

	return -EAGAIN; /* Fail will directly unload the module */
}

static void interval_tree_test_exit(void)
{
	printk(KERN_ALERT "test exit\n");
}

module_init(interval_tree_test_init)
module_exit(interval_tree_test_exit)

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Michel Lespinasse");
MODULE_DESCRIPTION("Interval Tree test");
```
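Both test modules seed the generator identically and build the same intervals and queries, so the "(N results)" figure each one prints should agree, and it can also be checked against an O(N) brute-force count. A minimal user-space sketch of such a reference, assuming a hypothetical xorshift32 generator as a stand-in for prandom32() (so its totals will not numerically match the modules' output). brute_pass() returns the number of matches for one pass over all SEARCHES queries, which is what the modules print after dividing by SEARCH_LOOPS.

```c
#include <stdint.h>

#define NODES 100
#define SEARCHES 100

struct ival {
	uint32_t start, last;
};

/* Hypothetical stand-in for prandom32(): Marsaglia xorshift32, state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Same normalization as init(): order each random pair so start <= last. */
static void make_interval(struct ival *iv, uint32_t a, uint32_t b)
{
	iv->start = a <= b ? a : b;
	iv->last = a <= b ? b : a;
}

/* O(N) reference for one stabbing query: count intervals with start <= q <= last. */
static unsigned long brute_stab(const struct ival *iv, int n, uint32_t q)
{
	unsigned long results = 0;
	int i;

	for (i = 0; i < n; i++)
		if (iv[i].start <= q && q <= iv[i].last)
			results++;
	return results;
}

/* Total matches for one pass over SEARCHES random queries. */
static unsigned long brute_pass(uint32_t seed)
{
	struct ival iv[NODES];
	uint32_t state = seed ? seed : 1;
	unsigned long total = 0;
	int i;

	for (i = 0; i < NODES; i++) {
		uint32_t a = xorshift32(&state);
		uint32_t b = xorshift32(&state);

		make_interval(&iv[i], a, b);
	}
	for (i = 0; i < SEARCHES; i++)
		total += brute_stab(iv, NODES, xorshift32(&state));
	return total;
}
```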
lib/prio_tree.c (+4)

```diff
···
 #include <linux/init.h>
 #include <linux/mm.h>
 #include <linux/prio_tree.h>
+#include <linux/export.h>
 
 /*
  * A clever mix of heap and radix trees forms a radix priority search tree (PST)
···
 	BUG();
 	return NULL;
 }
+EXPORT_SYMBOL(prio_tree_insert);
 
 /*
  * Remove a prio_tree_node @node from a radix priority search tree @root. The
···
 	while (cur != node)
 		cur = prio_tree_replace(root, cur->parent, cur);
 }
+EXPORT_SYMBOL(prio_tree_remove);
 
 static void iter_walk_down(struct prio_tree_iter *iter)
 {
···
 
 	goto repeat;
 }
+EXPORT_SYMBOL(prio_tree_next);
```
lib/prio_tree_test.c (+106)

```c
#include <linux/module.h>
#include <linux/prio_tree.h>
#include <linux/random.h>
#include <asm/timex.h>

#define NODES 100
#define PERF_LOOPS 100000
#define SEARCHES 100
#define SEARCH_LOOPS 10000

static struct prio_tree_root root;
static struct prio_tree_node nodes[NODES];
static u32 queries[SEARCHES];

static struct rnd_state rnd;

static inline unsigned long
search(unsigned long query, struct prio_tree_root *root)
{
	struct prio_tree_iter iter;
	unsigned long results = 0;

	prio_tree_iter_init(&iter, root, query, query);
	while (prio_tree_next(&iter))
		results++;
	return results;
}

static void init(void)
{
	int i;
	for (i = 0; i < NODES; i++) {
		u32 a = prandom32(&rnd), b = prandom32(&rnd);
		if (a <= b) {
			nodes[i].start = a;
			nodes[i].last = b;
		} else {
			nodes[i].start = b;
			nodes[i].last = a;
		}
	}
	for (i = 0; i < SEARCHES; i++)
		queries[i] = prandom32(&rnd);
}

static int prio_tree_test_init(void)
{
	int i, j;
	unsigned long results;
	cycles_t time1, time2, time;

	printk(KERN_ALERT "prio tree insert/remove");

	prandom32_seed(&rnd, 3141592653589793238ULL);
	INIT_PRIO_TREE_ROOT(&root);
	init();

	time1 = get_cycles();

	for (i = 0; i < PERF_LOOPS; i++) {
		for (j = 0; j < NODES; j++)
			prio_tree_insert(&root, nodes + j);
		for (j = 0; j < NODES; j++)
			prio_tree_remove(&root, nodes + j);
	}

	time2 = get_cycles();
	time = time2 - time1;

	time = div_u64(time, PERF_LOOPS);
	printk(" -> %llu cycles\n", (unsigned long long)time);

	printk(KERN_ALERT "prio tree search");

	for (j = 0; j < NODES; j++)
		prio_tree_insert(&root, nodes + j);

	time1 = get_cycles();

	results = 0;
	for (i = 0; i < SEARCH_LOOPS; i++)
		for (j = 0; j < SEARCHES; j++)
			results += search(queries[j], &root);

	time2 = get_cycles();
	time = time2 - time1;

	time = div_u64(time, SEARCH_LOOPS);
	results = div_u64(results, SEARCH_LOOPS);
	printk(" -> %llu cycles (%lu results)\n",
	       (unsigned long long)time, results);

	return -EAGAIN; /* Fail will directly unload the module */
}

static void prio_tree_test_exit(void)
{
	printk(KERN_ALERT "test exit\n");
}

module_init(prio_tree_test_init)
module_exit(prio_tree_test_exit)

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Michel Lespinasse");
MODULE_DESCRIPTION("Prio Tree test");
```