Clone Detection#
What is Code Cloning?#
Code clones are similar or identical code fragments that appear in multiple places. They indicate duplication and potential refactoring opportunities.
How Mccabre Detects Clones#
Mccabre uses the Rabin-Karp rolling hash, a fast string-matching algorithm, adapted here to work on token sequences instead of characters.
Algorithm Overview#
- Tokenization: Convert source code to tokens
- Windowing: Slide a window of N tokens across the sequence
- Hashing: Compute a rolling hash for each window
- Matching: Identify windows with identical hashes
- Reporting: Group matches into clone groups
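The windowing, hashing, and matching steps above can be sketched in a few lines of Python. This is a minimal illustration of a Rabin-Karp style rolling hash over tokens, not Mccabre's actual implementation: the function name `find_repeated_windows` and the constants `BASE` and `MOD` are assumptions made for the example, and the token list is taken as already produced by a tokenizer.

```python
from collections import defaultdict

BASE = 257             # hypothetical hash base
MOD = (1 << 61) - 1    # large prime modulus to keep collisions rare


def find_repeated_windows(tokens, n):
    """Return {window: [start indices]} for every n-token window seen 2+ times."""
    if n <= 0 or len(tokens) < n:
        return {}

    # Map each distinct token to a positive integer so it can be hashed.
    ids = {}
    codes = [ids.setdefault(tok, len(ids) + 1) for tok in tokens]

    # Weight of the token that leaves the window: BASE^(n-1) mod MOD.
    high = pow(BASE, n - 1, MOD)

    # Hash of the first window, computed directly.
    h = 0
    for c in codes[:n]:
        h = (h * BASE + c) % MOD

    buckets = defaultdict(list)  # hash -> start indices
    buckets[h].append(0)

    # Roll the hash: drop the leftmost token and append the next one in O(1).
    for i in range(1, len(codes) - n + 1):
        h = ((h - codes[i - 1] * high) * BASE + codes[i + n - 1]) % MOD
        buckets[h].append(i)

    # Equal hashes are only candidates; compare tokens to rule out collisions.
    repeated = defaultdict(list)
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        for s in starts:
            repeated[tuple(tokens[s:s + n])].append(s)
    return {w: s for w, s in repeated.items() if len(s) > 1}


tokens = "a = b + c ; x = y ; a = b + c ;".split()
print(find_repeated_windows(tokens, 6))
# {('a', '=', 'b', '+', 'c', ';'): [0, 10]}
```

The sketch stops at matching. Merging overlapping windows into larger clone groups, which is the reporting step, is left out.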
Why This Approach?#
Advantages:
- Fast: O(n) expected time in the number of tokens, because each window's hash is updated in constant time
- Language-agnostic: Works on tokens, not syntax trees
- Tunable: Adjust window size to find smaller or larger clones
Trade-offs:
- Finds exact token matches only
- Doesn't detect semantic equivalence
- Misses clones whose variables or functions have been renamed (Type 2)
Using Clone Detection#
Basic Usage#
mccabre clones src/
Adjust Sensitivity#
The --min-tokens flag controls the minimum clone size:
# Find larger clones (more strict)
mccabre clones src/ --min-tokens 50
# Find smaller clones (more sensitive)
mccabre clones src/ --min-tokens 15
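To see why the threshold changes what gets reported, the following sketch checks whether any run of `min_tokens` consecutive tokens repeats. It uses a plain set of token tuples, not a rolling hash, and the helper name `has_repeated_window` is hypothetical. It only illustrates the effect of the window size and says nothing about how Mccabre applies the flag internally.

```python
def has_repeated_window(tokens, min_tokens):
    """True if any run of `min_tokens` consecutive tokens occurs more than once."""
    seen = set()
    for i in range(len(tokens) - min_tokens + 1):
        window = tuple(tokens[i:i + min_tokens])
        if window in seen:
            return True
        seen.add(window)
    return False


# A 20-token fragment duplicated once, separated by two unrelated tokens.
fragment = [f"t{i}" for i in range(20)]
tokens = fragment + ["sep1", "sep2"] + fragment

for threshold in (15, 20, 50):
    print(threshold, has_repeated_window(tokens, threshold))
# 15 True
# 20 True
# 50 False
```

A 20-token duplicate is visible at thresholds of 15 and 20 but disappears at 50, which is the trade-off between sensitivity and noise that the flag controls.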
Sample Output#
DETECTED CLONES
--------------------------------------------------------------------------------
Clone Group #1 (length: 32 tokens, 3 occurrences)
- src/user.go:15-28
- src/product.go:42-55
- src/order.go:88-101
Clone Group #2 (length: 45 tokens, 2 occurrences)
- src/validators.rs:120-145
- src/sanitizers.rs:67-92
Interpreting Results#
Clone Group Fields#
- ID: Unique identifier for the clone group
- Length: Number of tokens in the duplicated sequence
- Occurrences: How many times this clone appears
- Locations: File paths and line ranges
Significance#
| Tokens | Significance | Action |
|---|---|---|
| 15-25 | Minor duplication | Consider refactoring if repeated 3+ times |
| 26-50 | Moderate duplication | Should refactor |
| 51+ | Major duplication | Urgent refactoring needed |
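When triaging a long report, the table can be applied mechanically. The helper below is a hypothetical sketch, not part of Mccabre. It assumes that a clone of exactly 50 tokens counts as moderate, and that a minor clone with fewer than three occurrences needs no action, which the table implies but does not state.

```python
def classify_clone(length, occurrences):
    """Map a clone group's token length and occurrence count to (significance, action)."""
    if length > 50:
        return "major", "urgent refactoring needed"
    if length >= 26:
        return "moderate", "should refactor"
    if length >= 15:
        # Assumption: below 3 occurrences, a minor clone is left alone.
        action = "consider refactoring" if occurrences >= 3 else "no action suggested"
        return "minor", action
    return "below threshold", "none"


print(classify_clone(32, 3))  # ('moderate', 'should refactor')
print(classify_clone(20, 2))  # ('minor', 'no action suggested')
```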
Refactoring Clones#
Example: Extract Function#
Before:
// In file1.go
func processUser(input string) string {
	trimmed := strings.TrimSpace(input)
	if len(trimmed) == 0 {
		return ""
	}
	lower := strings.ToLower(trimmed)
	return lower
}

// In file2.go
func processProduct(name string) string {
	trimmed := strings.TrimSpace(name)
	if len(trimmed) == 0 {
		return ""
	}
	lower := strings.ToLower(trimmed)
	return lower
}
After:
// In utils.go
func sanitizeString(input string) string {
	trimmed := strings.TrimSpace(input)
	if len(trimmed) == 0 {
		return ""
	}
	return strings.ToLower(trimmed)
}

// In file1.go
func processUser(input string) string {
	return sanitizeString(input)
}

// In file2.go
func processProduct(name string) string {
	return sanitizeString(name)
}
Example: Extract Class/Module#
Before: Multiple files with similar validation logic
After: Single validation module imported by all files
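As a rough illustration of the "after" state, the sketch below puts the shared check in one place and has each caller pass only what differs. The module names in the comments and the function names are hypothetical. Everything is shown in one block so the example runs on its own.

```python
# validation.py (hypothetical shared module)
def has_required_fields(record, required):
    """Return True when every required key is present with a truthy value."""
    return all(record.get(field) for field in required)


# users.py (hypothetical caller)
def validate_user(user):
    return has_required_fields(user, ("email", "name"))


# products.py (hypothetical caller)
def validate_product(product):
    return has_required_fields(product, ("id", "price", "name"))


print(validate_user({"email": "a@example.com", "name": "Ada"}))  # True
print(validate_product({"id": 1, "name": "Widget"}))             # False
```

Each caller now differs only in its list of required fields, so the duplicated token sequence no longer exists.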
Types of Clones#
Type 1: Exact Clones#
Identical code except for whitespace and comments.
// Clone 1
function calc(a, b) {
  return a + b;
}

// Clone 2
function calc(a, b) {
  return a + b;
}
✅ Mccabre detects these
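Type 1 clones match because whitespace and comments never reach the token sequence. The tokenizer below is a deliberately small sketch for JavaScript-like input, not Mccabre's tokenizer. The name `tokenize` and the regular expression are assumptions, and string literals are not handled.

```python
import re

# Comments are matched first so they can be dropped; whitespace is never matched.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)   # line or block comment
  | (?P<token>[A-Za-z_]\w*            # identifier or keyword
             |\d+(?:\.\d+)?           # number
             |\S)                     # any other single non-space character
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source):
    """Return the token list for `source`, ignoring whitespace and comments."""
    return [m.group("token") for m in TOKEN_RE.finditer(source) if m.group("token")]


spaced = """// Clone 1
function calc(a, b) {
    return a + b;  /* sum */
}"""
compact = "function calc(a,b){return a+b;}"

print(tokenize(spaced) == tokenize(compact))  # True
```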
Type 2: Renamed Clones#
Identical except for variable/function names.
// Clone 1
function add(x, y) {
  return x + y;
}

// Clone 2
function sum(a, b) {
  return a + b;
}
❌ Mccabre does NOT detect these (yet)
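The reason is visible at the token level. In the sketch below, a simplified `tokenize` helper (hypothetical, regex-based) shows that the two fragments have the same shape but differ in every identifier token, so an exact token comparison cannot match them.

```python
import re


def tokenize(source):
    """Split source into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


clone_1 = "function add(x, y) { return x + y; }"
clone_2 = "function sum(a, b) { return a + b; }"

tokens_1 = tokenize(clone_1)
tokens_2 = tokenize(clone_2)

# Same length and structure, but the identifier tokens differ.
differing = [(t1, t2) for t1, t2 in zip(tokens_1, tokens_2) if t1 != t2]

print(tokens_1 == tokens_2)  # False
print(differing)
# [('add', 'sum'), ('x', 'a'), ('y', 'b'), ('x', 'a'), ('y', 'b')]
```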
Type 3: Near-Miss Clones#
Similar structure with minor modifications.
// Clone 1
function validate(user) {
  if (!user.email) return false;
  if (!user.name) return false;
  return true;
}

// Clone 2
function validate(product) {
  if (!product.id) return false;
  if (!product.price) return false;
  if (!product.name) return false;
  return true;
}
❌ Mccabre does NOT detect these
Type 4: Semantic Clones#
Different syntax, same behavior.
// Clone 1
const sum = arr.reduce((a, b) => a + b, 0);
// Clone 2
let sum = 0;
for (let num of arr) {
  sum += num;
}
❌ Mccabre does NOT detect these
Configuration#
Via Command Line#
mccabre clones . --min-tokens 30
Via Config File#
Create mccabre.toml:
[clones]
enabled = true
min_tokens = 30
JSON Output#
mccabre clones src/ --json
{
  "clones": [
    {
      "id": 1,
      "length": 32,
      "locations": [
        {
          "file": "src/user.go",
          "start_line": 15,
          "end_line": 28
        },
        {
          "file": "src/product.go",
          "start_line": 42,
          "end_line": 55
        }
      ]
    }
  ]
}
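The JSON form is convenient for scripts. The sketch below parses a report shaped like the sample above and prints one line per clone group, largest first. The function name `summarize` is hypothetical, and the field names are taken from the sample only, so a real report may contain more fields.

```python
import json

# Sample report copied from the output above.
REPORT = """
{
  "clones": [
    {
      "id": 1,
      "length": 32,
      "locations": [
        {"file": "src/user.go", "start_line": 15, "end_line": 28},
        {"file": "src/product.go", "start_line": 42, "end_line": 55}
      ]
    }
  ]
}
"""


def summarize(report_text):
    """Return one summary line per clone group, largest groups first."""
    report = json.loads(report_text)
    groups = sorted(report["clones"], key=lambda g: g["length"], reverse=True)
    lines = []
    for group in groups:
        spans = ", ".join(
            f'{loc["file"]}:{loc["start_line"]}-{loc["end_line"]}'
            for loc in group["locations"]
        )
        lines.append(f'#{group["id"]} ({group["length"]} tokens): {spans}')
    return lines


for line in summarize(REPORT):
    print(line)
# #1 (32 tokens): src/user.go:15-28, src/product.go:42-55
```

In a pipeline, the same function could be fed the captured output of `mccabre clones src/ --json`, assuming the report is written to standard output.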