# Clone Detection

## What is Code Cloning?

Code clones are similar or identical code fragments that appear in multiple places. They indicate duplication and potential refactoring opportunities.

## How Mccabre Detects Clones

Mccabre uses **Rabin-Karp rolling hash**, a fast string matching algorithm adapted for token sequences.

### Algorithm Overview

1. **Tokenization**: Convert source code to tokens
2. **Windowing**: Slide a window of N tokens across the sequence
3. **Hashing**: Compute a rolling hash for each window
4. **Matching**: Identify windows with identical hashes
5. **Reporting**: Group matches into clone groups

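The five steps above can be sketched in a few lines of JavaScript. This is an illustration under stated assumptions, not Mccabre's actual implementation: the regex tokenizer, the `findClones` and `sameWindow` names, and the hash constants `BASE` and `MOD` are all invented for the example.

```javascript
// Illustrative sketch only; not Mccabre's implementation.
// BASE and MOD are arbitrary constants for a polynomial rolling hash.
const BASE = 257n;
const MOD = 1000000007n;

// Step 1 (assumed tokenizer): identifiers, numbers, or single punctuation marks.
function tokenize(source) {
  return source.match(/[A-Za-z_]\w*|\d+|[^\s\w]/g) || [];
}

// Compare two windows token by token, guarding against hash collisions.
function sameWindow(tokens, a, b, size) {
  for (let k = 0; k < size; k++) {
    if (tokens[a + k] !== tokens[b + k]) return false;
  }
  return true;
}

// Steps 2-5: return groups of start indices whose windows are identical.
function findClones(tokens, windowSize) {
  if (windowSize < 1 || tokens.length < windowSize) return [];

  // Give each distinct token a positive integer ID so it can be hashed.
  const idOf = new Map();
  const ids = tokens.map((t) => {
    if (!idOf.has(t)) idOf.set(t, BigInt(idOf.size + 1));
    return idOf.get(t);
  });

  // BASE^(windowSize - 1) mod MOD: the weight of the token leaving the window.
  let high = 1n;
  for (let i = 1; i < windowSize; i++) high = (high * BASE) % MOD;

  const buckets = new Map(); // hash -> list of groups (arrays of start indices)
  let hash = 0n;
  for (let i = 0; i < ids.length; i++) {
    if (i >= windowSize) {
      // Slide: remove the contribution of the outgoing token.
      hash = (hash + MOD - ((ids[i - windowSize] * high) % MOD)) % MOD;
    }
    hash = (hash * BASE + ids[i]) % MOD; // add the incoming token

    if (i >= windowSize - 1) {
      const start = i - windowSize + 1;
      const groups = buckets.get(hash) || [];
      const match = groups.find((g) => sameWindow(tokens, g[0], start, windowSize));
      if (match) match.push(start);
      else groups.push([start]);
      buckets.set(hash, groups);
    }
  }
  // Keep only windows that occur more than once.
  return [...buckets.values()].flat().filter((g) => g.length > 1);
}

const tokens = tokenize("a = b + c ; x = 1 ; a = b + c ;");
console.log(findClones(tokens, 6)); // [ [ 0, 10 ] ]
```

The sketch reports every repeated window separately, so a long clone shows up as several overlapping groups. A real reporting step would presumably merge adjacent windows into one maximal clone and map token indices back to file paths and line ranges.
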
### Why This Approach?

**Advantages:**

- **Fast**: Expected O(n) time in the number of tokens, because each window's hash is derived from the previous one in constant time
- **Language-agnostic**: Works on tokens, not syntax trees
- **Tunable**: Adjust window size to find smaller or larger clones

**Trade-offs:**

- Finds exact token matches only
- Doesn't detect semantic equivalence
- Misses clones whose variables or functions have been renamed

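The linear-time claim follows from how a rolling hash is updated. As a sketch, assume a standard polynomial hash over token IDs `t_i` with base `b`, modulus `m`, and window size `N`; the source does not state which hash function Mccabre actually uses.

```latex
H_i = \left( \sum_{k=0}^{N-1} t_{i+k} \, b^{\,N-1-k} \right) \bmod m

H_{i+1} = \left( \left( H_i - t_i \, b^{\,N-1} \right) b + t_{i+N} \right) \bmod m
```

Each update costs a constant number of arithmetic operations, so hashing all `n - N + 1` windows of an `n`-token sequence takes O(n) time. Two different windows can share a hash, so a careful implementation compares the tokens of matching windows before reporting them, which adds work proportional to `N` per candidate match.
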
## Using Clone Detection

### Basic Usage

```bash
mccabre clones src/
```

### Adjust Sensitivity

The `--min-tokens` flag controls the minimum clone size:

```bash
# Find larger clones (more strict)
mccabre clones src/ --min-tokens 50

# Find smaller clones (more sensitive)
mccabre clones src/ --min-tokens 15
```

### Sample Output

```text
DETECTED CLONES
--------------------------------------------------------------------------------
Clone Group #1 (length: 32 tokens, 3 occurrences)
  - src/user.go:15-28
  - src/product.go:42-55
  - src/order.go:88-101

Clone Group #2 (length: 45 tokens, 2 occurrences)
  - src/validators.rs:120-145
  - src/sanitizers.rs:67-92
```

## Interpreting Results

### Clone Group Fields

- **ID**: Unique identifier for the clone group
- **Length**: Number of tokens in the duplicated sequence
- **Occurrences**: How many times this clone appears
- **Locations**: File paths and line ranges

### Significance

| Tokens | Significance | Action |
|--------|--------------|--------|
| 15-25 | Minor duplication | Consider refactoring if repeated 3+ times |
| 26-50 | Moderate duplication | Should refactor |
| 51+ | Major duplication | Urgent refactoring needed |

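The table can be applied mechanically when triaging a long report. The sketch below encodes its thresholds, treating exactly 50 tokens as moderate. The `classifyClone` name and the returned object shape are hypothetical, and so are the two cases the table does not cover: clones under 15 tokens, and minor clones with fewer than three occurrences.

```javascript
// Thresholds come from the table above; names and return shape are hypothetical.
function classifyClone(tokens, occurrences) {
  if (tokens > 50) {
    return { significance: "major", action: "urgent refactoring needed" };
  }
  if (tokens >= 26) {
    return { significance: "moderate", action: "should refactor" };
  }
  if (tokens >= 15) {
    // The table only suggests action for minor clones repeated 3+ times.
    const action = occurrences >= 3 ? "consider refactoring" : "none";
    return { significance: "minor", action };
  }
  // Assumption: anything shorter than the table's first row is ignored.
  return { significance: "below table range", action: "none" };
}

// The two groups from the sample output.
console.log(classifyClone(32, 3).significance); // moderate
console.log(classifyClone(45, 2).action);       // should refactor
```
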
## Refactoring Clones

### Example: Extract Function

**Before:**

```go
// In file1.go
func processUser(input string) string {
	trimmed := strings.TrimSpace(input)
	if len(trimmed) == 0 {
		return ""
	}
	lower := strings.ToLower(trimmed)
	return lower
}

// In file2.go
func processProduct(name string) string {
	trimmed := strings.TrimSpace(name)
	if len(trimmed) == 0 {
		return ""
	}
	lower := strings.ToLower(trimmed)
	return lower
}
```

**After:**

```go
// In utils.go
// sanitizeString trims surrounding whitespace and lowercases the result.
// It returns "" when the input is empty or whitespace-only.
func sanitizeString(input string) string {
	trimmed := strings.TrimSpace(input)
	if len(trimmed) == 0 {
		return ""
	}
	return strings.ToLower(trimmed)
}

// In file1.go
func processUser(input string) string {
	return sanitizeString(input)
}

// In file2.go
func processProduct(name string) string {
	return sanitizeString(name)
}
```

### Example: Extract Class/Module

**Before:** Multiple files with similar validation logic

**After:** Single `validation` module imported by all files

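As a rough sketch of what such a module could look like in JavaScript, the block below moves a repeated "required fields" check into one shared helper. Every name here (`hasRequiredFields`, `validateUser`, `validateProduct`, and the file names in the comments) is hypothetical. Everything is shown in one block for brevity; in a real project each commented section would be its own file and would import the helper.

```javascript
// validation.js (hypothetical shared module)
// Returns true when every listed field on the record is truthy.
function hasRequiredFields(record, fields) {
  return fields.every((field) => Boolean(record[field]));
}

// user.js: previously carried its own copy of the field checks
function validateUser(user) {
  return hasRequiredFields(user, ["email", "name"]);
}

// product.js: previously carried a near-identical copy
function validateProduct(product) {
  return hasRequiredFields(product, ["id", "price", "name"]);
}

console.log(validateUser({ email: "a@example.com", name: "Ada" })); // true
console.log(validateProduct({ id: 7, name: "Widget" }));            // false
```
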
## Types of Clones

### Type 1: Exact Clones

Identical code except for whitespace and comments.

```javascript
// Clone 1
function calc(a, b) {
  return a + b;
}

// Clone 2
function calc(a, b) {
  return a + b;
}
```

✅ **Mccabre detects these**

### Type 2: Renamed Clones

Identical except for variable/function names.

```javascript
// Clone 1
function add(x, y) {
  return x + y;
}

// Clone 2
function sum(a, b) {
  return a + b;
}
```

❌ **Mccabre does NOT detect these** (yet)

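To see why exact token matching misses renamed clones, compare the token sequences of the two functions above. The sketch below uses a hypothetical regex tokenizer and a hypothetical `normalizeIdentifiers` step that replaces every non-keyword identifier with the placeholder `ID`. Identifier normalization is a common technique for catching Type 2 clones; nothing in the source says Mccabre does this.

```javascript
// Minimal keyword list, sufficient for this example only.
const KEYWORDS = new Set(["function", "return"]);

// Assumed tokenizer: identifiers, numbers, or single punctuation marks.
function tokenize(source) {
  return source.match(/[A-Za-z_]\w*|\d+|[^\s\w]/g) || [];
}

// Replace every identifier that is not a keyword with the placeholder "ID".
function normalizeIdentifiers(tokens) {
  return tokens.map((t) =>
    /^[A-Za-z_]\w*$/.test(t) && !KEYWORDS.has(t) ? "ID" : t
  );
}

const clone1 = "function add(x, y) { return x + y; }";
const clone2 = "function sum(a, b) { return a + b; }";

const raw1 = tokenize(clone1).join(" ");
const raw2 = tokenize(clone2).join(" ");
const norm1 = normalizeIdentifiers(tokenize(clone1)).join(" ");
const norm2 = normalizeIdentifiers(tokenize(clone2)).join(" ");

console.log(raw1 === raw2);   // false: the raw token sequences differ
console.log(norm1 === norm2); // true: identical after normalization
```

The trade-off is precision: with a single placeholder, `x + y` and `x + x` also normalize to the same sequence, so this approach would report more false positives.
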
### Type 3: Near-Miss Clones

Similar structure with minor modifications.

```javascript
// Clone 1
function validate(user) {
  if (!user.email) return false;
  if (!user.name) return false;
  return true;
}

// Clone 2
function validate(product) {
  if (!product.id) return false;
  if (!product.price) return false;
  if (!product.name) return false;
  return true;
}
```

❌ **Mccabre does NOT detect these**

### Type 4: Semantic Clones

Different syntax, same behavior.

```javascript
// Clone 1
const sum = arr.reduce((a, b) => a + b, 0);

// Clone 2
let sum = 0;
for (let num of arr) {
  sum += num;
}
```

❌ **Mccabre does NOT detect these**

## Configuration

### Via Command Line

```bash
mccabre clones . --min-tokens 30
```

### Via Config File

Create `mccabre.toml`:

```toml
[clones]
enabled = true
min_tokens = 30
```

## JSON Output

```bash
mccabre clones src/ --json
```

```json
{
  "clones": [
    {
      "id": 1,
      "length": 32,
      "locations": [
        {
          "file": "src/user.go",
          "start_line": 15,
          "end_line": 28
        },
        {
          "file": "src/product.go",
          "start_line": 42,
          "end_line": 55
        }
      ]
    }
  ]
}
```

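The JSON form is convenient for scripting. As a minimal sketch, the block below parses a report and prints one line per clone group. It assumes the report has exactly the shape shown above (`clones`, `id`, `length`, `locations`, `file`, `start_line`, `end_line`); the `summarize` helper is hypothetical, and the report is embedded as a string so the example runs on its own.

```javascript
// Sample report copied from the JSON above.
const report = JSON.parse(`{
  "clones": [
    {
      "id": 1,
      "length": 32,
      "locations": [
        { "file": "src/user.go", "start_line": 15, "end_line": 28 },
        { "file": "src/product.go", "start_line": 42, "end_line": 55 }
      ]
    }
  ]
}`);

// Build one summary line per clone group: "#id (N tokens): file:start-end, ..."
function summarize(parsed) {
  return parsed.clones.map((clone) => {
    const where = clone.locations
      .map((loc) => `${loc.file}:${loc.start_line}-${loc.end_line}`)
      .join(", ");
    return `#${clone.id} (${clone.length} tokens): ${where}`;
  });
}

console.log(summarize(report).join("\n"));
// #1 (32 tokens): src/user.go:15-28, src/product.go:42-55
```
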
## References

- [Rabin-Karp Algorithm](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
- [Code Clone Research](https://www.sei.cmu.edu/library/code-similarity-detection-using-syntax-agnostic-locality-sensitive-hashing/)

## See Also

- [Cyclomatic Complexity](./cyclomatic-complexity.md)
- [CLI Reference](./cli-reference.md)
- [Examples](./examples.md)