Union-Find: add a new module in kernel library

This patch implements a union-find data structure in the kernel library,
which includes operations for allocating nodes, freeing nodes,
finding the root of a node, and merging two nodes.

Signed-off-by: Xavier <xavier_qy@163.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Authored by Xavier and committed by Tejun Heo 2 years ago (93c8332c 4a711dd9)

+289 -1

6 changed files

Documentation/core-api/union_find.rst
Documentation/translations/zh_CN/core-api/union_find.rst
MAINTAINERS
include/linux/union_find.h
lib/Makefile
lib/union_find.c

+102

Documentation/core-api/union_find.rst

+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Union-Find in Linux
+====================
+
+
+:Date: June 21, 2024
+:Author: Xavier <xavier_qy@163.com>
+
+What is union-find, and what is it used for?
+------------------------------------------------
+
+Union-find is a data structure used to handle the merging and querying
+of disjoint sets. The primary operations supported by union-find are:
+
+	Initialization: Reset each element to an individual set, with
+		each set's initial parent node pointing to itself.
+	Find: Determine which set a particular element belongs to, usually by
+		returning a “representative element” of that set. This operation
+		is used to check if two elements are in the same set.
+	Union: Merge two sets into one.
+
+As a data structure used to maintain sets (groups), union-find is commonly
+utilized to solve problems related to offline queries, dynamic connectivity,
+and graph theory. It is also a key component in Kruskal's algorithm for
+computing the minimum spanning tree, which is crucial in scenarios like
+network routing. Consequently, union-find is widely used. Additionally,
+union-find has applications in symbolic computation, register allocation,
+and more.
+
+Space Complexity: O(n), where n is the number of nodes.
+
+Time Complexity: Using path compression can reduce the time complexity of
+the find operation, and using union by rank can reduce the time complexity
+of the union operation. These optimizations reduce the amortized time
+complexity of each find and union operation to O(α(n)), where α(n) is the
+inverse Ackermann function. This can be roughly considered a constant time
+complexity for practical purposes.
+
+This document covers use of the Linux union-find implementation. For more
+information on the nature and implementation of union-find, see:
+
+  Wikipedia entry on union-find
+    https://en.wikipedia.org/wiki/Disjoint-set_data_structure
+
+Linux implementation of union-find
+-----------------------------------
+
+Linux's union-find implementation resides in the file "lib/union_find.c".
+To use it, "#include <linux/union_find.h>".
+
+The union-find data structure is defined as follows::
+
+	struct uf_node {
+		struct uf_node *parent;
+		unsigned int rank;
+	};
+
+In this structure, parent points to the parent node of the current node.
+The rank field is an upper bound on the height of the tree rooted at the
+current node. During a union operation, the tree with the smaller rank is
+attached under the tree with the larger rank to maintain balance.
+
+Initializing union-find
+-----------------------
+
+You can initialize a node either with the static initializer or with the
+initialization function. Both point the parent pointer at the node itself
+and set the rank to 0.
+Example::
+
+	struct uf_node my_node = UF_INIT_NODE(my_node);
+
+or::
+
+	uf_node_init(&my_node);
+
+Find the Root Node of union-find
+--------------------------------
+
+This operation is mainly used to determine whether two nodes belong to the same
+set in the union-find. If they have the same root, they are in the same set.
+During the find operation, path compression is performed to improve the
+efficiency of subsequent find operations.
+Example::
+
+	int connected;
+	struct uf_node *root1 = uf_find(&node_1);
+	struct uf_node *root2 = uf_find(&node_2);
+	if (root1 == root2)
+		connected = 1;
+	else
+		connected = 0;
+
+Union Two Sets in union-find
+----------------------------
+
+To union two sets in the union-find, you first find their respective root nodes
+and then attach the root with the smaller rank under the root with the larger
+rank.
+Example::
+
+	uf_union(&node_1, &node_2);
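
The find and union examples in the documentation above can be tried outside the kernel with a small userspace sketch. It assumes local copies of struct uf_node, uf_node_init(), uf_find() and uf_union() that mirror the code in this patch, because <linux/union_find.h> is not available to userspace builds. The helpers count_sets() and demo_components() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the node type from this patch (assumption: built
 * outside the kernel, so <linux/union_find.h> cannot be included). */
struct uf_node {
	struct uf_node *parent;
	unsigned int rank;
};

static void uf_node_init(struct uf_node *node)
{
	node->parent = node;
	node->rank = 0;
}

/* Same logic as uf_find() in lib/union_find.c from this patch. */
static struct uf_node *uf_find(struct uf_node *node)
{
	struct uf_node *parent;

	while (node->parent != node) {
		parent = node->parent;
		node->parent = parent->parent;
		node = parent;
	}
	return node;
}

/* Same logic as uf_union() in lib/union_find.c from this patch. */
static void uf_union(struct uf_node *node1, struct uf_node *node2)
{
	struct uf_node *root1 = uf_find(node1);
	struct uf_node *root2 = uf_find(node2);

	if (root1 == root2)
		return;

	if (root1->rank < root2->rank) {
		root1->parent = root2;
	} else if (root1->rank > root2->rank) {
		root2->parent = root1;
	} else {
		root2->parent = root1;
		root1->rank++;
	}
}

/* Hypothetical helper: a node is a root exactly when it is its own
 * parent, so the number of roots equals the number of disjoint sets. */
static size_t count_sets(struct uf_node *nodes, size_t n)
{
	size_t i, sets = 0;

	for (i = 0; i < n; i++)
		if (nodes[i].parent == &nodes[i])
			sets++;
	return sets;
}

/* Hypothetical demo: five nodes joined by the edges 0-1, 1-2 and 3-4. */
static size_t demo_components(void)
{
	struct uf_node nodes[5];
	size_t i;

	for (i = 0; i < 5; i++)
		uf_node_init(&nodes[i]);

	uf_union(&nodes[0], &nodes[1]);
	uf_union(&nodes[1], &nodes[2]);
	uf_union(&nodes[3], &nodes[4]);

	/* 0, 1 and 2 share one root; 3 and 4 share a different one. */
	assert(uf_find(&nodes[0]) == uf_find(&nodes[2]));
	assert(uf_find(&nodes[0]) != uf_find(&nodes[3]));

	return count_sets(nodes, 5);
}
```

Calling demo_components() from a test program should report two sets, {0, 1, 2} and {3, 4}. Counting roots works here only because every node of interest lives in the same array; kernel code would track its nodes in whatever way suits the caller.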

+87

Documentation/translations/zh_CN/core-api/union_find.rst

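+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/core-api/union_find.rst
+
+====================
+Union-Find in Linux
+====================
+
+
+:Date: June 21, 2024
+:Author: Xavier <xavier_qy@163.com>
+
+What is union-find, and what is it used for?
+--------------------------------------------
+
+Union-find is a data structure used to handle the merging and querying of
+disjoint sets. The main operations supported by union-find are:
+
+	Initialization: Initialize each element as its own set, with each
+		set's initial parent node pointing to itself.
+	Find: Look up which set an element belongs to, usually by returning
+		a “representative element” of that set. This operation is used
+		to decide whether two elements are in the same set.
+	Union: Merge two sets into one.
+
+As a data structure for maintaining sets (groups), union-find is commonly
+used to solve problems involving offline queries, dynamic connectivity, and
+graph theory. It is also a key part of Kruskal's algorithm for computing
+minimum spanning trees; because minimum spanning trees matter in scenarios
+such as network routing, union-find is widely used. Union-find also has
+applications in symbolic computation, register allocation, and other areas.
+
+Space complexity: O(n), where n is the number of nodes.
+
+Time complexity: Path compression reduces the cost of the find operation,
+and union by rank reduces the cost of the union operation, so that the
+amortized time complexity of each find and union operation is only O(α(n)),
+where α(n) is the inverse Ackermann function. Union-find operations can
+roughly be treated as constant time.
+
+This document covers how to use the Linux union-find implementation. For
+more information on the nature and implementation of union-find, see:
+
+  Wikipedia entry on union-find
+    https://en.wikipedia.org/wiki/Disjoint-set_data_structure
+
+Linux implementation of union-find
+----------------------------------
+
+Linux's union-find implementation is in the file "lib/union_find.c". To use
+it, "#include <linux/union_find.h>".
+
+The union-find data structure is defined as follows::
+
+	struct uf_node {
+		struct uf_node *parent;
+		unsigned int rank;
+	};
+
+Here parent is the parent node of the current node and rank is the height
+of the current tree. During a union, the node with the smaller rank is
+attached under the node with the larger rank to improve balance.
+
+Initializing union-find
+-----------------------
+
+Initialization can be done either statically or through the initialization
+interface. On initialization, the parent pointer points to the node itself
+and rank is set to 0.
+Example::
+
+	struct uf_node my_node = UF_INIT_NODE(my_node);
+
+or::
+
+	uf_node_init(&my_node);
+
+Finding the root node of union-find
+-----------------------------------
+
+This is mainly used to determine whether two nodes belong to the same set:
+if their roots are the same, they are in the same set. The path is
+compressed during the find, which makes later finds more efficient.
+Example::
+
+	int connected;
+	struct uf_node *root1 = uf_find(&node_1);
+	struct uf_node *root2 = uf_find(&node_2);
+	if (root1 == root2)
+		connected = 1;
+	else
+		connected = 0;
+
+Merging two sets
+----------------
+
+To merge two sets, their respective root nodes are found first, and then,
+based on the ranks of the roots, the root with the smaller rank is attached
+under the root with the larger rank.
+Example::
+
+	uf_union(&node_1, &node_2);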

MAINTAINERS

 F:	include/linux/cdrom.h
 F:	include/uapi/linux/cdrom.h
 
+UNION-FIND
+M:	Xavier <xavier_qy@163.com>
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+F:	Documentation/core-api/union_find.rst
+F:	Documentation/translations/zh_CN/core-api/union_find.rst
+F:	include/linux/union_find.h
+F:	lib/union_find.c
+
 UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER
 R:	Alim Akhtar <alim.akhtar@samsung.com>
 R:	Avri Altman <avri.altman@wdc.com>

+41

include/linux/union_find.h

+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_UNION_FIND_H
+#define __LINUX_UNION_FIND_H
+/**
+ * union_find.h - union-find data structure implementation
+ *
+ * This header provides functions and structures to implement the union-find
+ * data structure. The union-find data structure is used to manage disjoint
+ * sets and supports efficient union and find operations.
+ *
+ * See Documentation/core-api/union_find.rst for documentation and samples.
+ */
+
+struct uf_node {
+	struct uf_node *parent;
+	unsigned int rank;
+};
+
+/* This macro is used for static initialization of a union-find node. */
+#define UF_INIT_NODE(node)	{.parent = &node, .rank = 0}
+
+/**
+ * uf_node_init - Initialize a union-find node
+ * @node: pointer to the union-find node to be initialized
+ *
+ * This function sets the parent of the node to itself and
+ * initializes its rank to 0.
+ */
+static inline void uf_node_init(struct uf_node *node)
+{
+	node->parent = node;
+	node->rank = 0;
+}
+
+/* Find the root of a node */
+struct uf_node *uf_find(struct uf_node *node);
+
+/* Merge the sets containing two nodes */
+void uf_union(struct uf_node *node1, struct uf_node *node2);
+
+#endif /* __LINUX_UNION_FIND_H */
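
Because struct uf_node carries no payload, one plausible way to use it is to embed it in a larger object and recover that object from the root that uf_find() returns. The sketch below is a userspace illustration under that assumption; the patch itself does not prescribe this pattern. struct region, uf_entry() (a stand-in for the kernel's container_of()), region_root_id() and demo_embedded() are hypothetical names, and uf_find() is a local copy of the function from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the node type and static initializer from the header. */
struct uf_node {
	struct uf_node *parent;
	unsigned int rank;
};

#define UF_INIT_NODE(node)	{.parent = &node, .rank = 0}

/* Userspace stand-in for the kernel's container_of() (assumption). */
#define uf_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical object that takes part in a disjoint set. */
struct region {
	int id;
	struct uf_node uf;
};

/* Static initialization: each embedded node starts as its own root. */
static struct region region_a = { .id = 1, .uf = UF_INIT_NODE(region_a.uf) };
static struct region region_b = { .id = 2, .uf = UF_INIT_NODE(region_b.uf) };

/* Same logic as uf_find() in lib/union_find.c from this patch. */
static struct uf_node *uf_find(struct uf_node *node)
{
	struct uf_node *parent;

	while (node->parent != node) {
		parent = node->parent;
		node->parent = parent->parent;
		node = parent;
	}
	return node;
}

/* Hypothetical helper: id of the region that represents r's set. */
static int region_root_id(struct region *r)
{
	return uf_entry(uf_find(&r->uf), struct region, uf)->id;
}

static int demo_embedded(void)
{
	/* Freshly initialized nodes represent themselves. */
	assert(region_root_id(&region_a) == 1);
	assert(region_root_id(&region_b) == 2);

	/* Attach b under a by hand, which is what uf_union() does when
	 * both roots have equal rank. */
	region_b.uf.parent = &region_a.uf;
	region_a.uf.rank++;

	return region_root_id(&region_b);
}
```

After the manual link, region_b resolves to region_a, so demo_embedded() returns the id 1. In kernel code the link would be made with uf_union() rather than by writing the fields directly.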

+1 -1

lib/Makefile

 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
 	 nmi_backtrace.o win_minmax.o memcat_p.o \
-	 buildid.o objpool.o
+	 buildid.o objpool.o union_find.o
 
 lib-$(CONFIG_PRINTK) += dump_stack.o
 lib-$(CONFIG_SMP) += cpumask.o

+49

lib/union_find.c

+// SPDX-License-Identifier: GPL-2.0
+#include <linux/union_find.h>
+
+/**
+ * uf_find - Find the root of a node and perform path compression
+ * @node: the node to find the root of
+ *
+ * This function returns the root of the node by following the parent
+ * pointers. It also performs path compression, making the tree shallower.
+ *
+ * Returns the root node of the set containing node.
+ */
+struct uf_node *uf_find(struct uf_node *node)
+{
+	struct uf_node *parent;
+
+	while (node->parent != node) {
+		parent = node->parent;
+		node->parent = parent->parent;
+		node = parent;
+	}
+	return node;
+}
+
+/**
+ * uf_union - Merge two sets, using union by rank
+ * @node1: the first node
+ * @node2: the second node
+ *
+ * This function merges the sets containing node1 and node2, by comparing
+ * the ranks to keep the tree balanced.
+ */
+void uf_union(struct uf_node *node1, struct uf_node *node2)
+{
+	struct uf_node *root1 = uf_find(node1);
+	struct uf_node *root2 = uf_find(node2);
+
+	if (root1 == root2)
+		return;
+
+	if (root1->rank < root2->rank) {
+		root1->parent = root2;
+	} else if (root1->rank > root2->rank) {
+		root2->parent = root1;
+	} else {
+		root2->parent = root1;
+		root1->rank++;
+	}
+}
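
The loop in uf_find() re-points each visited node at its grandparent and then steps to the old parent. This variant of path compression is often called path splitting. The userspace sketch below traces it on a hand-built chain; the uf_find() body is copied from this patch, while demo_path_compression() and the chain it builds are hypothetical and exist only to make the pointer updates visible.

```c
#include <assert.h>

/* Userspace copy of the node type from this patch. */
struct uf_node {
	struct uf_node *parent;
	unsigned int rank;
};

/* Same logic as uf_find() above. */
static struct uf_node *uf_find(struct uf_node *node)
{
	struct uf_node *parent;

	while (node->parent != node) {
		parent = node->parent;
		node->parent = parent->parent;	/* point at the grandparent */
		node = parent;			/* continue from the old parent */
	}
	return node;
}

/* Hypothetical demo: build the chain a -> b -> c -> d by hand, where d is
 * the root, then check every parent pointer after a single uf_find(&a).
 * Returns 1 when the pointers match the expected shape. */
static int demo_path_compression(void)
{
	struct uf_node a, b, c, d;

	a.parent = &b;
	b.parent = &c;
	c.parent = &d;
	d.parent = &d;
	/* Ranks are not read by uf_find(); they are set only so that every
	 * field is initialized. */
	a.rank = b.rank = c.rank = 0;
	d.rank = 3;

	assert(uf_find(&a) == &d);

	/* a skipped one level (b -> c); b and c now hang off the root. */
	return a.parent == &c && b.parent == &d && c.parent == &d;
}
```

One call shortens the path from a to the root from three hops to two, and a second uf_find(&a) would leave a pointing directly at d. The rank of d is unchanged throughout, which is why rank acts as a bound on tree height rather than an exact measure once compression has run.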