copyright detection#
technical documentation for the copyright scanning system.
how it works#
upload completes
│
▼
┌──────────────┐ ┌─────────────────┐ ┌─────────────┐
│ backend │────▶│ moderation │────▶│ AuDD API │
│ (background) │ │ service │ │ │
│ │◀────│ (Rust) │◀────│ │
└──────────────┘ └─────────────────┘ └─────────────┘
│ │
│ │ if flagged
▼ ▼
┌──────────────┐ ┌─────────────────┐
│ copyright_ │ │ ATProto label │
│ scans table │ │ emission │
└──────────────┘ └─────────────────┘
- track upload completes, file stored in R2
- backend calls moderation service
/scanendpoint with R2 URL - moderation service calls AuDD API for music recognition
- results returned to backend, stored in
copyright_scanstable - if flagged, backend calls
/emit-labelto create ATProto label - label stored in moderation service's
labelstable
AuDD API#
AuDD is a music recognition service similar to Shazam. their API scans audio and returns matched songs with confidence scores.
request#
curl -X POST https://api.audd.io/ \
-F "api_token=YOUR_TOKEN" \
-F "url=https://your-r2-bucket.com/audio/abc123.mp3" \
-F "accurate_offsets=1"
response#
{
"status": "success",
"result": [
{
"offset": 0,
"songs": [
{
"artist": "Artist Name",
"title": "Song Title",
"album": "Album Name",
"score": 85,
"isrc": "USRC12345678",
"timecode": "01:30"
}
]
},
{
"offset": 180000,
"songs": [
{
"artist": "Another Artist",
"title": "Another Song",
"score": 72
}
]
}
]
}
pricing#
- $2 per 1000 requests
- 1 request = 12 seconds of audio
- 5-minute track ≈ 25 requests ≈ $0.05
- first 300 requests free
database schema#
backend: copyright_scans table#
CREATE TABLE copyright_scans (
id SERIAL PRIMARY KEY,
track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,
is_flagged BOOLEAN NOT NULL DEFAULT FALSE,
highest_score INTEGER NOT NULL DEFAULT 0,
matches JSONB NOT NULL DEFAULT '[]', -- [{artist, title, score, isrc}]
raw_response JSONB NOT NULL DEFAULT '{}', -- full API response
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(track_id)
);
moderation service: labels table#
CREATE TABLE labels (
id BIGSERIAL PRIMARY KEY,
seq BIGSERIAL UNIQUE NOT NULL, -- monotonic sequence for subscriptions
src TEXT NOT NULL, -- labeler DID
uri TEXT NOT NULL, -- target AT URI
cid TEXT, -- optional target CID
val TEXT NOT NULL, -- label value (e.g., "copyright-violation")
neg BOOLEAN NOT NULL DEFAULT FALSE, -- negation (for revoking labels)
cts TIMESTAMPTZ NOT NULL, -- creation timestamp
exp TIMESTAMPTZ, -- optional expiration
sig BYTEA NOT NULL, -- secp256k1 signature
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
scan result states#
| is_flagged | highest_score | meaning |
|---|---|---|
false |
0 | no matches found |
false |
0 | scan failed (error in raw_response) |
true |
> 0 | matches found, label emitted |
configuration#
backend environment variables#
# moderation service connection
MODERATION_SERVICE_URL=https://moderation.plyr.fm
MODERATION_AUTH_TOKEN=shared_secret_token
MODERATION_TIMEOUT_SECONDS=300
MODERATION_ENABLED=true
# labeler URL (for emitting labels after scan)
MODERATION_LABELER_URL=https://moderation.plyr.fm
moderation service environment variables#
# AuDD API
AUDD_API_KEY=your_audd_token
# database
DATABASE_URL=postgres://...
# labeler identity
LABELER_DID=did:plc:your-labeler-did
LABELER_SIGNING_KEY=hex-encoded-secp256k1-private-key
# auth
MODERATION_AUTH_TOKEN=shared_secret_token
interpreting results#
confidence scores#
AuDD returns a score (0-100) for each match:
| score | meaning |
|---|---|
| 90-100 | very high confidence, almost certainly a match |
| 70-89 | high confidence, likely a match |
| 50-69 | moderate confidence, may be similar but not exact |
| < 50 | low confidence, probably not a match |
default threshold is 70. tracks with any match >= 70 are flagged.
false positives#
common causes:
- generic beats/samples used in multiple songs
- covers or remixes (legal gray area)
- similar chord progressions
- audio artifacts matching by coincidence
this is why we flag but don't enforce. human review is needed.
ISRC codes#
International Standard Recording Code - unique identifier for recordings. when present, this is strong evidence of a specific recording match (not just similar audio).
admin queries#
list all flagged tracks#
SELECT t.id, t.title, a.handle, cf.confidence_score, cf.matched_tracks
FROM copyright_flags cf
JOIN tracks t ON t.id = cf.track_id
JOIN artists a ON a.did = t.artist_did
WHERE cf.status = 'flagged'
ORDER BY cf.confidence_score DESC;
scan statistics#
SELECT
status,
COUNT(*) as count,
AVG(confidence_score) as avg_score
FROM copyright_flags
GROUP BY status;
tracks pending scan#
SELECT t.id, t.title, t.created_at
FROM tracks t
LEFT JOIN copyright_flags cf ON cf.track_id = t.id
WHERE cf.id IS NULL OR cf.status = 'pending'
ORDER BY t.created_at DESC;
querying labels#
labels can be queried via standard ATProto XRPC endpoints:
# query labels for a specific track
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?uriPatterns=at://did:plc:artist/fm.plyr.track/*"
# query all labels from our labeler
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?sources=did:plc:plyr-labeler"
response:
{
"labels": [
{
"src": "did:plc:plyr-labeler",
"uri": "at://did:plc:artist/fm.plyr.track/abc123",
"val": "copyright-violation",
"cts": "2025-11-30T12:00:00.000Z",
"sig": "base64-encoded-signature"
}
]
}
future considerations#
batch scanning existing tracks#
# scan all tracks that haven't been scanned
async def backfill_scans():
async with get_session() as session:
unscanned = await session.execute(
select(Track)
.outerjoin(CopyrightScan)
.where(CopyrightScan.id.is_(None))
)
for track in unscanned.scalars():
await scan_track_for_copyright(track.id, track.r2_url)
label subscriptions#
the moderation service exposes com.atproto.label.subscribeLabels for real-time label streaming. apps can subscribe to receive new labels as they're created.
user-facing appeals#
eventual flow:
- artist sees flag on their track
- artist submits dispute with evidence (license, original work proof)
- admin reviews dispute
- if resolved: emit negation label (
neg: true) to revoke the original
admin dashboard#
considerations for where to build the admin UI:
- option A: add to main frontend (plyr.fm/admin) - simpler, reuse existing auth
- option B: separate UI on moderation service - isolated, but needs its own auth
- option C: use Ozone - Bluesky's open-source moderation tool, already built for ATProto labels
see overview.md for architecture discussion.