code complexity & repetition analysis tool

Clone Detection#

What is Code Cloning?#

Code clones are similar or identical code fragments that appear in multiple places. They indicate duplication and potential refactoring opportunities.

How Mccabre Detects Clones#

Mccabre uses a Rabin-Karp rolling hash, a fast string-matching technique, applied here to token sequences instead of characters.

Algorithm Overview#

  1. Tokenization: Convert source code to tokens
  2. Windowing: Slide a window of N tokens across the sequence
  3. Hashing: Compute a rolling hash for each window
  4. Matching: Identify windows with identical hashes
  5. Reporting: Group matches into clone groups
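
The five steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not Mccabre's actual implementation: the regex tokenizer, the `BASE` and `MOD` constants, and the function names `tokenize` and `find_clones` are assumptions made for the example.

```python
import re
from collections import defaultdict

BASE = 257
MOD = (1 << 61) - 1  # large prime modulus (assumed) to keep collisions rare


def tokenize(source):
    """Rough tokenizer: identifiers, integers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def find_clones(tokens, window):
    """Return groups of start indices whose `window`-token sequences match."""
    if window <= 0 or len(tokens) < window:
        return []

    # Map each distinct token to a small positive integer for hashing.
    ids = {}
    codes = [ids.setdefault(tok, len(ids) + 1) for tok in tokens]

    # Weight of the token that leaves the window on each slide.
    high = pow(BASE, window - 1, MOD)

    # Hash of the first window.
    h = 0
    for code in codes[:window]:
        h = (h * BASE + code) % MOD

    buckets = defaultdict(list)
    buckets[h].append(0)

    # Slide the window: drop the leftmost token, append the next one.
    for start in range(1, len(codes) - window + 1):
        leaving = codes[start - 1]
        entering = codes[start + window - 1]
        h = ((h - leaving * high) * BASE + entering) % MOD
        buckets[h].append(start)

    groups = []
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        # Compare the actual tokens so a hash collision is never reported.
        by_sequence = defaultdict(list)
        for s in starts:
            by_sequence[tuple(tokens[s:s + window])].append(s)
        groups.extend(g for g in by_sequence.values() if len(g) > 1)
    return groups


tokens = tokenize("a = b + c ; x = 1 ; a = b + c ;")
print(find_clones(tokens, 6))
```

The sketch reports token offsets and does not merge adjacent matching windows into longer clones or map offsets back to line ranges, both of which a real tool would need for output like the sample shown later.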

Why This Approach?#

Advantages:

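  • Fast: expected O(n) time in the number of tokens, because each window's hash is updated from the previous one in constant time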
  • Language-agnostic: Works on tokens, not syntax trees
  • Tunable: Adjust window size to find smaller or larger clones

Trade-offs:

  • Finds exact token matches only (Type 1 clones)
  • Doesn't detect semantic equivalence
  • Misses clones whose variables or functions have been renamed

Using Clone Detection#

Basic Usage#

mccabre clones src/

Adjust Sensitivity#

The --min-tokens flag controls the minimum clone size:

# Find only larger clones (stricter)
mccabre clones src/ --min-tokens 50

# Find smaller clones (more sensitive)
mccabre clones src/ --min-tokens 15
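
To see why a lower threshold is more sensitive, the following sketch counts how many distinct token windows repeat at different window sizes. It uses a plain dictionary count rather than a rolling hash, and `duplicated_windows` is a hypothetical helper, not part of Mccabre.

```python
from collections import Counter


def duplicated_windows(tokens, min_tokens):
    """Count distinct windows of `min_tokens` tokens that occur more than once."""
    windows = Counter(
        tuple(tokens[i:i + min_tokens])
        for i in range(len(tokens) - min_tokens + 1)
    )
    return sum(1 for count in windows.values() if count > 1)


# Three short statements; the first and third differ only in the assigned name.
tokens = "x = f ( a ) ; y = f ( b ) ; z = f ( a ) ;".split()
for size in (3, 5, 7):
    print(size, duplicated_windows(tokens, size))
```

Short windows match many incidental fragments, while longer windows match only sizeable repeated sequences, which is the trade-off the flag exposes.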

Sample Output#

DETECTED CLONES
--------------------------------------------------------------------------------
Clone Group #1 (length: 32 tokens, 3 occurrences)
  - src/user.go:15-28
  - src/product.go:42-55
  - src/order.go:88-101

Clone Group #2 (length: 45 tokens, 2 occurrences)
  - src/validators.rs:120-145
  - src/sanitizers.rs:67-92

Interpreting Results#

Clone Group Fields#

  • ID: Unique identifier for the clone group
  • Length: Number of tokens in the duplicated sequence
  • Occurrences: How many times this clone appears
  • Locations: File paths and line ranges

Significance#

Tokens   Significance          Action
15-25    Minor duplication     Consider refactoring if repeated 3+ times
26-50    Moderate duplication  Should refactor
51+      Major duplication     Urgent refactoring needed
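
If you post-process results in a script, the thresholds above translate into a small lookup. This is a sketch under assumptions: the function name `significance` and the returned labels are illustrative, a length of exactly 50 is treated as moderate, and lengths under 15 are reported as outside the table.

```python
def significance(length):
    """Map a clone length in tokens to the significance bands listed above."""
    if length < 15:
        return "below table range"
    if length <= 25:
        return "minor"
    if length <= 50:
        return "moderate"
    return "major"


for length in (20, 32, 45, 80):
    print(length, significance(length))
```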

Refactoring Clones#

Example: Extract Function#

Before:

// In file1.go
func processUser(input string) string {
    trimmed := strings.TrimSpace(input)
    if len(trimmed) == 0 {
        return ""
    }
    lower := strings.ToLower(trimmed)
    return lower
}

// In file2.go
func processProduct(name string) string {
    trimmed := strings.TrimSpace(name)
    if len(trimmed) == 0 {
        return ""
    }
    lower := strings.ToLower(trimmed)
    return lower
}

After:

// In utils.go
func sanitizeString(input string) string {
    trimmed := strings.TrimSpace(input)
    if len(trimmed) == 0 {
        return ""
    }
    return strings.ToLower(trimmed)
}

// In file1.go
func processUser(input string) string {
    return sanitizeString(input)
}

// In file2.go
func processProduct(name string) string {
    return sanitizeString(name)
}

Example: Extract Class/Module#

Before: Several files each carry their own copy of similar validation logic.

After: A single validation module holds that logic, and every file imports it.
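
A minimal sketch of what the extracted module might look like, written in Python for brevity. The module layout and the names `require_fields`, `validate_user`, and `validate_product` are hypothetical and exist only for this illustration.

```python
# validation.py (hypothetical shared module)
def require_fields(record, fields):
    """Return the required field names that are missing or falsy in `record`."""
    return [name for name in fields if not record.get(name)]


# users.py and products.py would import require_fields instead of
# repeating the same chain of checks in each file.
def validate_user(user):
    return not require_fields(user, ["email", "name"])


def validate_product(product):
    # Note: a price of 0 counts as missing here, because the check is falsy-based.
    return not require_fields(product, ["id", "price", "name"])


print(validate_user({"email": "a@example.com", "name": "Ada"}))
print(validate_product({"id": 7, "name": "Widget"}))
```

Each caller now states only which fields it needs, so a change to the checking rule happens in one place.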

Types of Clones#

Type 1: Exact Clones#

Identical code except for whitespace and comments.

// Clone 1
function calc(a, b) {
    return a + b;
}

// Clone 2
function calc(a, b) {
    return a + b;
}

Mccabre detects these

Type 2: Renamed Clones#

Identical except for variable/function names.

// Clone 1
function add(x, y) {
    return x + y;
}

// Clone 2
function sum(a, b) {
    return a + b;
}

Mccabre does NOT detect these (yet)
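
For context, a common way to catch renamed clones is to replace identifiers with a placeholder before comparing token sequences. The sketch below illustrates that general idea only; it is not something Mccabre does, and the `KEYWORDS` set and `normalize` function are assumptions made for the example.

```python
import re

# Small, incomplete keyword list (assumed) so keywords are not treated as names.
KEYWORDS = {"function", "return", "if", "else", "for", "let", "const", "var"}


def normalize(source):
    """Tokenize and replace every non-keyword identifier with 'ID'."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return [
        "ID" if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS else tok
        for tok in tokens
    ]


clone_1 = "function add(x, y) { return x + y; }"
clone_2 = "function sum(a, b) { return a + b; }"

print(normalize(clone_1) == normalize(clone_2))
```

A single placeholder is deliberately crude: it also equates `x + y` with `x + x`, so a more careful scheme would number identifiers by first appearance.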

Type 3: Near-Miss Clones#

Similar structure with minor modifications.

// Clone 1
function validate(user) {
    if (!user.email) return false;
    if (!user.name) return false;
    return true;
}

// Clone 2
function validate(product) {
    if (!product.id) return false;
    if (!product.price) return false;
    if (!product.name) return false;
    return true;
}

Mccabre does NOT detect these
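
Near-miss clones share most of their tokens but never an identical long run, which is why exact window matching misses them. The sketch below measures that overlap with Python's standard `difflib.SequenceMatcher`; it illustrates the concept and is not a Mccabre feature. The `tokens` and `similarity` helpers are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokens(source):
    """Rough tokenizer: identifiers, integers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


user_check = """
if (!user.email) return false;
if (!user.name) return false;
return true;
"""

product_check = """
if (!product.id) return false;
if (!product.price) return false;
if (!product.name) return false;
return true;
"""


def similarity(a, b):
    """Ratio in [0, 1] of matching tokens between two snippets."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


score = similarity(user_check, product_check)
print(f"token similarity: {score:.2f}")
```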

Type 4: Semantic Clones#

Different syntax, same behavior.

// Clone 1
const sum = arr.reduce((a, b) => a + b, 0);

// Clone 2
let sum = 0;
for (let num of arr) {
    sum += num;
}

Mccabre does NOT detect these

Configuration#

Via Command Line#

mccabre clones . --min-tokens 30

Via Config File#

Create mccabre.toml:

[clones]
enabled = true
min_tokens = 30

JSON Output#

mccabre clones src/ --json

Example output:

{
  "clones": [
    {
      "id": 1,
      "length": 32,
      "locations": [
        {
          "file": "src/user.go",
          "start_line": 15,
          "end_line": 28
        },
        {
          "file": "src/product.go",
          "start_line": 42,
          "end_line": 55
        }
      ]
    }
  ]
}
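
A short script can consume this output, for example to list the longest clone groups first. The sketch assumes only the fields shown above (`clones`, `id`, `length`, `locations`); the function name `summarize` is hypothetical.

```python
import json


def summarize(report_text):
    """Return (id, length, occurrence count) per clone group, longest first."""
    report = json.loads(report_text)
    rows = [
        (clone["id"], clone["length"], len(clone["locations"]))
        for clone in report["clones"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample = """
{
  "clones": [
    {"id": 1, "length": 32, "locations": [
      {"file": "src/user.go", "start_line": 15, "end_line": 28},
      {"file": "src/product.go", "start_line": 42, "end_line": 55}
    ]}
  ]
}
"""

for clone_id, length, occurrences in summarize(sample):
    print(f"group {clone_id}: {length} tokens, {occurrences} occurrences")
```

In practice you would pass the captured output of the command to `summarize` rather than an inline string.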
