# Clone Detection

## What is Code Cloning?

Code clones are similar or identical code fragments that appear in multiple places. They indicate duplication and potential refactoring opportunities.

## How Mccabre Detects Clones

Mccabre uses a **Rabin-Karp rolling hash**, a fast string-matching algorithm, adapted here to token sequences.

### Algorithm Overview

1. **Tokenization**: Convert source code to tokens
2. **Windowing**: Slide a window of N tokens across the sequence
3. **Hashing**: Compute a rolling hash for each window
4. **Matching**: Identify windows with identical hashes
5. **Reporting**: Group matches into clone groups

### Why This Approach?

**Advantages:**

- **Fast**: O(n) expected time in the number of tokens
- **Language-agnostic**: Works on tokens, not syntax trees
- **Tunable**: Adjust window size to find smaller or larger clones

**Trade-offs:**

- Finds exact token matches only
- Doesn't detect semantic equivalence
- Does not detect clones with renamed variables (see Type 2 below)

## Using Clone Detection

### Basic Usage

```bash
mccabre clones src/
```

### Adjust Sensitivity

The `--min-tokens` flag controls the minimum clone size:

```bash
# Find larger clones (stricter)
mccabre clones src/ --min-tokens 50

# Find smaller clones (more sensitive)
mccabre clones src/ --min-tokens 15
```

### Sample Output

```text
DETECTED CLONES
--------------------------------------------------------------------------------
Clone Group #1 (length: 32 tokens, 3 occurrences)
  - src/user.go:15-28
  - src/product.go:42-55
  - src/order.go:88-101

Clone Group #2 (length: 45 tokens, 2 occurrences)
  - src/validators.rs:120-145
  - src/sanitizers.rs:67-92
```

## Interpreting Results

### Clone Group Fields

- **ID**: Unique identifier for the clone group
- **Length**: Number of tokens in the duplicated sequence
- **Occurrences**: How many times this clone appears
- **Locations**: File paths and line ranges

### Significance

| Tokens | Significance | Action |
|--------|-------------|--------|
| 15-25 | Minor duplication | Consider refactoring if repeated 3+ times |
| 26-50 | Moderate duplication | Should refactor |
| 51+ | Major duplication | Urgent refactoring needed |

## Refactoring Clones

### Example: Extract Function

**Before:**

```go
// In file1.go (imports "strings")
func processUser(input string) string {
	trimmed := strings.TrimSpace(input)
	if len(trimmed) == 0 {
		return ""
	}
	lower := strings.ToLower(trimmed)
	return lower
}

// In file2.go (imports "strings")
func processProduct(name string) string {
	trimmed := strings.TrimSpace(name)
	if len(trimmed) == 0 {
		return ""
	}
	lower := strings.ToLower(trimmed)
	return lower
}
```

**After:**

```go
// In utils.go (imports "strings")
func sanitizeString(input string) string {
	trimmed := strings.TrimSpace(input)
	if len(trimmed) == 0 {
		return ""
	}
	return strings.ToLower(trimmed)
}

// In file1.go
func processUser(input string) string {
	return sanitizeString(input)
}

// In file2.go
func processProduct(name string) string {
	return sanitizeString(name)
}
```

### Example: Extract Class/Module

**Before:** Multiple files with similar validation logic

**After:** Single `validation` module imported by all files

## Types of Clones

### Type 1: Exact Clones

Identical code except for whitespace and comments.

```javascript
// Clone 1
function calc(a, b) {
  return a + b;
}

// Clone 2
function calc(a, b) {
  return a + b;
}
```

**Mccabre detects these**

### Type 2: Renamed Clones

Identical except for variable/function names.

```javascript
// Clone 1
function add(x, y) {
  return x + y;
}

// Clone 2
function sum(a, b) {
  return a + b;
}
```

**Mccabre does NOT detect these** (yet)

### Type 3: Near-Miss Clones

Similar structure with minor modifications.

```javascript
// Clone 1
function validate(user) {
  if (!user.email) return false;
  if (!user.name) return false;
  return true;
}

// Clone 2
function validate(product) {
  if (!product.id) return false;
  if (!product.price) return false;
  if (!product.name) return false;
  return true;
}
```

**Mccabre does NOT detect these**

### Type 4: Semantic Clones

Different syntax, same behavior.

```javascript
// Clone 1
const sum = arr.reduce((a, b) => a + b, 0);

// Clone 2
let sum = 0;
for (let num of arr) {
  sum += num;
}
```

**Mccabre does NOT detect these**

## Configuration

### Via Command Line

```bash
mccabre clones . --min-tokens 30
```

### Via Config File

Create `mccabre.toml`:

```toml
[clones]
enabled = true
min_tokens = 30
```

## JSON Output

```bash
mccabre clones src/ --json
```

```json
{
  "clones": [
    {
      "id": 1,
      "length": 32,
      "locations": [
        {
          "file": "src/user.go",
          "start_line": 15,
          "end_line": 28
        },
        {
          "file": "src/product.go",
          "start_line": 42,
          "end_line": 55
        }
      ]
    }
  ]
}
```

## References

- [Rabin-Karp Algorithm](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
- [Code Clone Research](https://www.sei.cmu.edu/library/code-similarity-detection-using-syntax-agnostic-locality-sensitive-hashing/)

## See Also

- [Cyclomatic Complexity](./cyclomatic-complexity.md)
- [CLI Reference](./cli-reference.md)
- [Examples](./examples.md)
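
## Appendix: Rolling Hash Sketch

The steps listed under Algorithm Overview (tokenize, window, hash, match, report) can be illustrated with a small Python sketch. This is not Mccabre's source code: the names `tokenize`, `find_clones`, `BASE`, and `MOD`, the regex tokenizer, and the hash parameters are all assumptions chosen for illustration. It only shows how a rolling hash over token windows finds exact token matches, and why renamed variables defeat it.

```python
import re
from collections import defaultdict

# Hypothetical hash parameters; Mccabre's actual values are not documented here.
BASE = 1_000_003
MOD = (1 << 61) - 1


def tokenize(source):
    """Naive tokenizer (an assumption): identifiers, numbers, single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def find_clones(tokens, min_tokens):
    """Return lists of start indices whose min_tokens-long windows are identical."""
    n = len(tokens)
    if min_tokens <= 0 or n < min_tokens:
        return []

    # Map each distinct token to a positive integer so it can be hashed.
    ids = {}
    codes = [ids.setdefault(tok, len(ids) + 1) for tok in tokens]

    # Weight of the token that leaves the window on each slide.
    high = pow(BASE, min_tokens - 1, MOD)

    # Hash the first window from scratch.
    h = 0
    for code in codes[:min_tokens]:
        h = (h * BASE + code) % MOD

    buckets = defaultdict(list)
    buckets[h].append(0)

    # Slide the window: drop the outgoing token, append the incoming one.
    for start in range(1, n - min_tokens + 1):
        outgoing = codes[start - 1]
        incoming = codes[start + min_tokens - 1]
        h = ((h - outgoing * high) * BASE + incoming) % MOD
        buckets[h].append(start)

    groups = []
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        # Compare the actual tokens so a hash collision is never reported.
        by_content = defaultdict(list)
        for s in starts:
            by_content[tuple(tokens[s:s + min_tokens])].append(s)
        groups.extend(g for g in by_content.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    toks = tokenize("a = b + c ; x = 1 ; a = b + c ;")
    print(find_clones(toks, 6))  # start indices of the repeated 6-token window
```

In this sketch, lowering `min_tokens` reports overlapping windows as separate groups; a real tool would typically merge adjacent matching windows into one maximal clone and map token indices back to file paths and line ranges. Because the hash is computed over the literal tokens, `return x + y` and `return a + b` never share a window, which matches the Type 2 limitation described above.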