# copyright detection

technical documentation for the copyright scanning system.

## how it works

```
upload completes


┌──────────────┐     ┌─────────────────┐     ┌─────────────┐
│   backend    │────▶│   moderation    │────▶│  AuDD API   │
│ (background) │     │    service      │     │             │
│              │◀────│     (Rust)      │◀────│             │
└──────────────┘     └─────────────────┘     └─────────────┘
       │                      │
       │                      │ if flagged
       ▼                      ▼
┌──────────────┐     ┌─────────────────┐
│ copyright_   │     │  ATProto label  │
│ scans table  │     │    emission     │
└──────────────┘     └─────────────────┘
```

1. track upload completes, file stored in R2
2. backend calls moderation service `/scan` endpoint with R2 URL
3. moderation service calls AuDD API for music recognition
4. results returned to backend, stored in `copyright_scans` table
5. if flagged, backend calls `/emit-label` to create ATProto label
6. label stored in moderation service's `labels` table

## AuDD API

[AuDD](https://audd.io/) is a music recognition service similar to Shazam. their API scans audio and returns matched songs with confidence scores.

### request

```bash
curl -X POST https://api.audd.io/ \
  -F "api_token=YOUR_TOKEN" \
  -F "url=https://your-r2-bucket.com/audio/abc123.mp3" \
  -F "accurate_offsets=1"
```

### response

```json
{
  "status": "success",
  "result": [
    {
      "offset": 0,
      "songs": [
        {
          "artist": "Artist Name",
          "title": "Song Title",
          "album": "Album Name",
          "score": 85,
          "isrc": "USRC12345678",
          "timecode": "01:30"
        }
      ]
    },
    {
      "offset": 180000,
      "songs": [
        {
          "artist": "Another Artist",
          "title": "Another Song",
          "score": 72
        }
      ]
    }
  ]
}
```

### pricing

- $2 per 1000 requests
- 1 request = 12 seconds of audio
- 5-minute track ≈ 25 requests ≈ $0.05
- first 300 requests free

## database schema

### backend: copyright_scans table

```sql
CREATE TABLE copyright_scans (
    id SERIAL PRIMARY KEY,
    track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,

    is_flagged BOOLEAN NOT NULL DEFAULT FALSE,
    highest_score INTEGER NOT NULL DEFAULT 0,
    matches JSONB NOT NULL DEFAULT '[]',       -- [{artist, title, score, isrc}]
    raw_response JSONB NOT NULL DEFAULT '{}',  -- full API response

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(track_id)
);
```

### moderation service: labels table

```sql
CREATE TABLE labels (
    id BIGSERIAL PRIMARY KEY,
    seq BIGSERIAL UNIQUE NOT NULL,       -- monotonic sequence for subscriptions
    src TEXT NOT NULL,                   -- labeler DID
    uri TEXT NOT NULL,                   -- target AT URI
    cid TEXT,                            -- optional target CID
    val TEXT NOT NULL,                   -- label value (e.g., "copyright-violation")
    neg BOOLEAN NOT NULL DEFAULT FALSE,  -- negation (for revoking labels)
    cts TIMESTAMPTZ NOT NULL,            -- creation timestamp
    exp TIMESTAMPTZ,                     -- optional expiration
    sig BYTEA NOT NULL,                  -- secp256k1 signature
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

### scan result states

| is_flagged | highest_score | meaning |
|------------|---------------|---------|
| `false` | 0 | no matches found |
| `false` | 0 | scan failed (error in raw_response) |
| `true` | > 0 | matches found, label emitted |

## configuration

### backend environment variables

```bash
# moderation service connection
MODERATION_SERVICE_URL=https://moderation.plyr.fm
MODERATION_AUTH_TOKEN=shared_secret_token
MODERATION_TIMEOUT_SECONDS=300
MODERATION_ENABLED=true

# labeler URL (for emitting labels after scan)
MODERATION_LABELER_URL=https://moderation.plyr.fm
```

### moderation service environment variables

```bash
# AuDD API
AUDD_API_KEY=your_audd_token

# database
DATABASE_URL=postgres://...

# labeler identity
LABELER_DID=did:plc:your-labeler-did
LABELER_SIGNING_KEY=hex-encoded-secp256k1-private-key

# auth
MODERATION_AUTH_TOKEN=shared_secret_token
```

## interpreting results

### confidence scores

AuDD returns a score (0-100) for each match:

| score | meaning |
|-------|---------|
| 90-100 | very high confidence, almost certainly a match |
| 70-89 | high confidence, likely a match |
| 50-69 | moderate confidence, may be similar but not exact |
| < 50 | low confidence, probably not a match |

default threshold is 70. tracks with any match >= 70 are flagged.

### false positives

common causes:
- generic beats/samples used in multiple songs
- covers or remixes (legal gray area)
- similar chord progressions
- audio artifacts matching by coincidence

this is why we flag but don't enforce. human review is needed.

### ISRC codes

[International Standard Recording Code](https://en.wikipedia.org/wiki/International_Standard_Recording_Code) - unique identifier for recordings. when present, this is strong evidence of a specific recording match (not just similar audio).

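
the threshold rule from "confidence scores" can be sketched in Python roughly as follows. this is a minimal illustration, not the service's actual code: `summarize_scan` and `FLAG_THRESHOLD` are hypothetical names, the input is assumed to have the shape of the sample AuDD response in this document, and storing a non-zero `highest_score` for sub-threshold matches is an assumption.

```python
from typing import Any

FLAG_THRESHOLD = 70  # default threshold stated in this document


def summarize_scan(response: dict[str, Any], threshold: int = FLAG_THRESHOLD) -> dict[str, Any]:
    """Collapse an AuDD-style response into the fields kept in copyright_scans.

    Hypothetical helper: assumes `response["result"]` is a list of segments,
    each with a `songs` list, as in the sample response.
    """
    matches = []
    # `result` may be missing or null when nothing was recognized
    for segment in response.get("result") or []:
        for song in segment.get("songs", []):
            matches.append(
                {
                    "artist": song.get("artist"),
                    "title": song.get("title"),
                    "score": song.get("score", 0),
                    "isrc": song.get("isrc"),  # None when AuDD returns no ISRC
                }
            )

    highest = max((m["score"] for m in matches), default=0)
    return {
        "is_flagged": highest >= threshold,  # any match at or above the threshold flags the track
        "highest_score": highest,
        "matches": matches,
    }
```

for the sample response (scores 85 and 72), this yields `is_flagged = True` with `highest_score = 85`. a flag produced this way is only a prompt for human review, as noted under "false positives".
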
## admin queries

these queries run against the `copyright_scans` table defined above.

### list all flagged tracks

```sql
SELECT t.id, t.title, a.handle, cs.highest_score, cs.matches
FROM copyright_scans cs
JOIN tracks t ON t.id = cs.track_id
JOIN artists a ON a.did = t.artist_did
WHERE cs.is_flagged
ORDER BY cs.highest_score DESC;
```

### scan statistics

```sql
SELECT
    is_flagged,
    COUNT(*) AS count,
    AVG(highest_score) AS avg_score
FROM copyright_scans
GROUP BY is_flagged;
```

### tracks pending scan

```sql
SELECT t.id, t.title, t.created_at
FROM tracks t
LEFT JOIN copyright_scans cs ON cs.track_id = t.id
WHERE cs.id IS NULL
ORDER BY t.created_at DESC;
```

## querying labels

labels can be queried via standard ATProto XRPC endpoints:

```bash
# query labels for all tracks by a specific artist
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?uriPatterns=at://did:plc:artist/fm.plyr.track/*"

# query all labels from our labeler
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?sources=did:plc:plyr-labeler"
```

response:

```json
{
  "labels": [
    {
      "src": "did:plc:plyr-labeler",
      "uri": "at://did:plc:artist/fm.plyr.track/abc123",
      "val": "copyright-violation",
      "cts": "2025-11-30T12:00:00.000Z",
      "sig": "base64-encoded-signature"
    }
  ]
}
```

## future considerations

### batch scanning existing tracks

```python
from sqlalchemy import select

# get_session, Track, CopyrightScan and scan_track_for_copyright are
# backend-internal; import them from wherever they live in the codebase.


# scan all tracks that haven't been scanned
async def backfill_scans():
    async with get_session() as session:
        # outer join keeps tracks that have no copyright_scans row
        unscanned = await session.execute(
            select(Track)
            .outerjoin(CopyrightScan)
            .where(CopyrightScan.id.is_(None))
        )
        for track in unscanned.scalars():
            await scan_track_for_copyright(track.id, track.r2_url)
```

### label subscriptions

the moderation service exposes `com.atproto.label.subscribeLabels` for real-time label streaming. apps can subscribe to receive new labels as they're created.

### user-facing appeals

eventual flow:
1. artist sees flag on their track
2. artist submits dispute with evidence (license, original work proof)
3. admin reviews dispute
4. if resolved: emit negation label (`neg: true`) to revoke the original

### admin dashboard

considerations for where to build the admin UI:
- **option A**: add to main frontend (plyr.fm/admin) - simpler, reuse existing auth
- **option B**: separate UI on moderation service - isolated, but needs its own auth
- **option C**: use Ozone - Bluesky's open-source moderation tool, already built for ATProto labels

see [overview.md](./overview.md) for architecture discussion.

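
for step 4 of the appeals flow, a negation label repeats the original label's `src`, `uri` and `val` with `neg` set to true. the Python sketch below shows one way to build that record before it is signed. `build_negation` is a hypothetical helper rather than an existing function in the backend or moderation service, and signing (the `sig` field) is deliberately left out because it belongs to the labeler.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_negation(original: dict[str, Any], now: Optional[datetime] = None) -> dict[str, Any]:
    """Return an unsigned label record that revokes `original`.

    Hypothetical helper: `original` is assumed to look like an entry in the
    queryLabels response shown earlier. `now` must be timezone-aware.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    negation = {
        "src": original["src"],  # same labeler DID
        "uri": original["uri"],  # same target AT URI
        "val": original["val"],  # same label value
        "neg": True,             # marks this record as a revocation
        # millisecond-precision UTC timestamp, e.g. 2025-11-30T12:00:00.000Z
        "cts": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }
    # carry the CID over only if the original label pinned one
    if original.get("cid") is not None:
        negation["cid"] = original["cid"]
    return negation
```

the moderation service would still need to sign and store the result in its `labels` table for subscribers to see the revocation.
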