# copyright detection

technical documentation for the copyright scanning system.

## how it works

```
upload completes
       │
       ▼
┌──────────────┐     ┌─────────────────┐     ┌─────────────┐
│   backend    │────▶│   moderation    │────▶│  AuDD API   │
│ (background) │     │     service     │     │             │
│              │◀────│     (Rust)      │◀────│             │
└──────────────┘     └─────────────────┘     └─────────────┘
       │                      │
       │                      │ if flagged
       ▼                      ▼
┌──────────────┐     ┌─────────────────┐
│ copyright_   │     │  ATProto label  │
│ scans table  │     │    emission     │
└──────────────┘     └─────────────────┘
```

1. track upload completes, file stored in R2
2. backend calls moderation service `/scan` endpoint with R2 URL
3. moderation service calls AuDD API for music recognition
4. results returned to backend, stored in `copyright_scans` table
5. if flagged, backend calls `/emit-label` to create ATProto label
6. label stored in moderation service's `labels` table
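
the backend side of steps 2-5 can be sketched as a small orchestration function. this is a minimal sketch, not the actual backend code: `process_upload` and its three callables (`scan`, `store_scan`, `emit_label`) are hypothetical names, and the `is_flagged` key on the scan result is an assumption about the `/scan` response shape.

```python
from typing import Callable


def process_upload(
    track_id: int,
    r2_url: str,
    scan: Callable[[str], dict],
    store_scan: Callable[[int, dict], None],
    emit_label: Callable[[int], None],
) -> dict:
    """run steps 2-5 for one uploaded track (hypothetical helper)."""
    # steps 2-3: ask the moderation service to scan the file (it calls AuDD)
    result = scan(r2_url)
    # step 4: persist the result, flagged or not
    store_scan(track_id, result)
    # step 5: only flagged tracks get a label (assumed `is_flagged` key)
    if result.get("is_flagged"):
        emit_label(track_id)
    return result
```

passing the HTTP and database calls in as callables keeps the ordering (scan, store, then label) testable without a running moderation service.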

## AuDD API

[AuDD](https://audd.io/) is a music recognition service similar to Shazam. their API scans audio and returns matched songs with confidence scores.

### request

```bash
curl -X POST https://api.audd.io/ \
  -F "api_token=YOUR_TOKEN" \
  -F "url=https://your-r2-bucket.com/audio/abc123.mp3" \
  -F "accurate_offsets=1"
```

### response

```json
{
  "status": "success",
  "result": [
    {
      "offset": 0,
      "songs": [
        {
          "artist": "Artist Name",
          "title": "Song Title",
          "album": "Album Name",
          "score": 85,
          "isrc": "USRC12345678",
          "timecode": "01:30"
        }
      ]
    },
    {
      "offset": 180000,
      "songs": [
        {
          "artist": "Another Artist",
          "title": "Another Song",
          "score": 72
        }
      ]
    }
  ]
}
```
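
a response shaped like the one above can be reduced to the `matches` list and `highest_score` value stored per scan. this is a minimal sketch: `flatten_matches` is a hypothetical helper, and treating a missing or null `result` as "no matches" is an assumption.

```python
def flatten_matches(response: dict) -> tuple[list[dict], int]:
    """collect every matched song across segments, plus the highest score."""
    matches = [
        {
            "artist": song.get("artist"),
            "title": song.get("title"),
            "score": song.get("score", 0),
            "isrc": song.get("isrc"),  # not every match carries an ISRC
        }
        # `or []` covers a missing or null `result` (assumed to mean no matches)
        for segment in response.get("result") or []
        for song in segment.get("songs", [])
    ]
    highest_score = max((m["score"] for m in matches), default=0)
    return matches, highest_score
```

for the example response, this yields two matches and a highest score of 85.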

### pricing

- $2 per 1000 requests
- 1 request = 12 seconds of audio
- 5-minute track ≈ 25 requests ≈ $0.05
- first 300 requests free
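
the arithmetic behind the 5-minute estimate, as a small sketch. `estimate_scan_cost` is a hypothetical helper; rounding a partial 12-second window up to a full request is an assumption, and the free tier is ignored.

```python
import math

SECONDS_PER_REQUEST = 12       # 1 request = 12 seconds of audio
USD_PER_REQUEST = 2 / 1000     # $2 per 1000 requests


def estimate_scan_cost(duration_seconds: float) -> tuple[int, float]:
    """return (requests, cost in USD) for scanning one track."""
    # assumption: a trailing partial window is billed as a full request
    requests = math.ceil(duration_seconds / SECONDS_PER_REQUEST)
    return requests, requests * USD_PER_REQUEST


requests, cost = estimate_scan_cost(5 * 60)  # 25 requests, about $0.05
```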

## database schema

### backend: copyright_scans table

```sql
CREATE TABLE copyright_scans (
    id SERIAL PRIMARY KEY,
    track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,

    is_flagged BOOLEAN NOT NULL DEFAULT FALSE,
    highest_score INTEGER NOT NULL DEFAULT 0,
    matches JSONB NOT NULL DEFAULT '[]',       -- [{artist, title, score, isrc}]
    raw_response JSONB NOT NULL DEFAULT '{}',  -- full API response

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(track_id)
);
```

### moderation service: labels table

```sql
CREATE TABLE labels (
    id BIGSERIAL PRIMARY KEY,
    seq BIGSERIAL UNIQUE NOT NULL,       -- monotonic sequence for subscriptions
    src TEXT NOT NULL,                   -- labeler DID
    uri TEXT NOT NULL,                   -- target AT URI
    cid TEXT,                            -- optional target CID
    val TEXT NOT NULL,                   -- label value (e.g., "copyright-violation")
    neg BOOLEAN NOT NULL DEFAULT FALSE,  -- negation (for revoking labels)
    cts TIMESTAMPTZ NOT NULL,            -- creation timestamp
    exp TIMESTAMPTZ,                     -- optional expiration
    sig BYTEA NOT NULL,                  -- secp256k1 signature
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

### scan result states

| is_flagged | highest_score | meaning |
|------------|---------------|---------|
| `false` | 0 | no matches found |
| `false` | 0 | scan failed (error in raw_response) |
| `true` | > 0 | matches found, label emitted |

## configuration

### backend environment variables

```bash
# moderation service connection
MODERATION_SERVICE_URL=https://moderation.plyr.fm
MODERATION_AUTH_TOKEN=shared_secret_token
MODERATION_TIMEOUT_SECONDS=300
MODERATION_ENABLED=true

# labeler URL (for emitting labels after scan)
MODERATION_LABELER_URL=https://moderation.plyr.fm
```
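
one way the backend could read these variables is a small settings object. this is a sketch, not the backend's actual config code: `ModerationSettings` and `load_moderation_settings` are hypothetical names, and the fallback values for the timeout and enabled flag are assumptions taken from the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModerationSettings:
    service_url: str
    auth_token: str
    timeout_seconds: int
    enabled: bool
    labeler_url: str


def load_moderation_settings(env: Mapping[str, str] = os.environ) -> ModerationSettings:
    """build settings from environment variables (raises KeyError if a required one is missing)."""
    return ModerationSettings(
        service_url=env["MODERATION_SERVICE_URL"],
        auth_token=env["MODERATION_AUTH_TOKEN"],
        # assumed defaults, mirroring the example values
        timeout_seconds=int(env.get("MODERATION_TIMEOUT_SECONDS", "300")),
        enabled=env.get("MODERATION_ENABLED", "true").lower() == "true",
        labeler_url=env["MODERATION_LABELER_URL"],
    )
```

taking `env` as a parameter means the loader can be exercised with a plain dict instead of the real process environment.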

### moderation service environment variables

```bash
# AuDD API
AUDD_API_KEY=your_audd_token

# database
DATABASE_URL=postgres://...

# labeler identity
LABELER_DID=did:plc:your-labeler-did
LABELER_SIGNING_KEY=hex-encoded-secp256k1-private-key

# auth
MODERATION_AUTH_TOKEN=shared_secret_token
```

## interpreting results

### confidence scores

AuDD returns a score (0-100) for each match:

| score | meaning |
|-------|---------|
| 90-100 | very high confidence, almost certainly a match |
| 70-89 | high confidence, likely a match |
| 50-69 | moderate confidence, may be similar but not exact |
| < 50 | low confidence, probably not a match |

default threshold is 70. tracks with any match >= 70 are flagged.
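
the score bands and the flagging rule translate directly into code. a minimal sketch; `confidence_band` and `is_flagged` are hypothetical helper names, not functions from the codebase.

```python
from typing import Iterable

DEFAULT_THRESHOLD = 70


def confidence_band(score: int) -> str:
    """map an AuDD score (0-100) to the bands in the table above."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "high"
    if score >= 50:
        return "moderate"
    return "low"


def is_flagged(scores: Iterable[int], threshold: int = DEFAULT_THRESHOLD) -> bool:
    """a track is flagged when any single match reaches the threshold."""
    return any(score >= threshold for score in scores)
```

with the example response from earlier (scores 85 and 72), both matches fall in the "high" band and the track is flagged.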

### false positives

common causes:
- generic beats/samples used in multiple songs
- covers or remixes (legal gray area)
- similar chord progressions
- audio artifacts matching by coincidence

this is why we flag but don't enforce. human review is needed.

### ISRC codes

[International Standard Recording Code](https://en.wikipedia.org/wiki/International_Standard_Recording_Code) - unique identifier for recordings. when present, this is strong evidence of a specific recording match (not just similar audio).
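
since an ISRC carries extra weight during review, it can help to check that a returned value is well-formed before relying on it. a minimal sketch, assuming the standard 12-character layout (2-letter country code, 3-character registrant code, 2-digit year, 5-digit designation); `normalize_isrc` is a hypothetical helper.

```python
import re
from typing import Optional

# country (2 letters) + registrant (3 alphanumerics) + year (2 digits) + designation (5 digits)
ISRC_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{3}\d{7}")


def normalize_isrc(value: str) -> Optional[str]:
    """return the ISRC in compact uppercase form, or None if it is malformed."""
    candidate = value.replace("-", "").strip().upper()
    return candidate if ISRC_PATTERN.fullmatch(candidate) else None
```

a well-formed code only shows the value has the right shape. it does not prove the code is registered or belongs to the matched recording.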

## admin queries

### list all flagged tracks

```sql
SELECT t.id, t.title, a.handle, cs.highest_score, cs.matches
FROM copyright_scans cs
JOIN tracks t ON t.id = cs.track_id
JOIN artists a ON a.did = t.artist_did
WHERE cs.is_flagged
ORDER BY cs.highest_score DESC;
```

### scan statistics

```sql
SELECT
    is_flagged,
    COUNT(*) AS count,
    AVG(highest_score) AS avg_score
FROM copyright_scans
GROUP BY is_flagged;
```

### tracks pending scan

```sql
-- tracks with no copyright_scans row have not been scanned yet
SELECT t.id, t.title, t.created_at
FROM tracks t
LEFT JOIN copyright_scans cs ON cs.track_id = t.id
WHERE cs.id IS NULL
ORDER BY t.created_at DESC;
```

## querying labels

labels can be queried via standard ATProto XRPC endpoints:

```bash
# query labels for all of an artist's tracks (the trailing * is a prefix match)
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?uriPatterns=at://did:plc:artist/fm.plyr.track/*"

# query all labels from our labeler
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?sources=did:plc:plyr-labeler"
```

response:

```json
{
  "labels": [
    {
      "src": "did:plc:plyr-labeler",
      "uri": "at://did:plc:artist/fm.plyr.track/abc123",
      "val": "copyright-violation",
      "cts": "2025-11-30T12:00:00.000Z",
      "sig": "base64-encoded-signature"
    }
  ]
}
```
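
the same query URLs can be built programmatically, which takes care of percent-encoding the AT URI. a minimal sketch using only the standard library; `query_labels_url` is a hypothetical helper, and `uriPatterns`, `sources`, and `limit` are the `com.atproto.label.queryLabels` parameters it covers.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def query_labels_url(
    base_url: str,
    uri_patterns: Sequence[str],
    sources: Sequence[str] = (),
    limit: Optional[int] = None,
) -> str:
    """build a com.atproto.label.queryLabels URL; repeated keys encode array parameters."""
    params = [("uriPatterns", pattern) for pattern in uri_patterns]
    params += [("sources", did) for did in sources]
    if limit is not None:
        params.append(("limit", str(limit)))
    return f"{base_url}/xrpc/com.atproto.label.queryLabels?{urlencode(params)}"


url = query_labels_url(
    "https://moderation.plyr.fm",
    ["at://did:plc:artist/fm.plyr.track/*"],
    sources=["did:plc:plyr-labeler"],
    limit=10,
)
```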

## future considerations

### batch scanning existing tracks

```python
from sqlalchemy import select

# Track, CopyrightScan, get_session, and scan_track_for_copyright
# come from the backend codebase.


async def backfill_scans():
    """scan all tracks that haven't been scanned yet."""
    async with get_session() as session:
        # tracks with no copyright_scans row have never been scanned
        unscanned = await session.execute(
            select(Track)
            .outerjoin(CopyrightScan)
            .where(CopyrightScan.id.is_(None))
        )
        for track in unscanned.scalars():
            await scan_track_for_copyright(track.id, track.r2_url)
```
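
a backfill over a large catalog may be faster with a few scans in flight at once, as long as the number stays capped. this is one possible approach, not something the current design specifies: `run_bounded` is a hypothetical helper and the default of 4 concurrent scans is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 4,
) -> list[R]:
    """run `worker` over `items` with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> R:
        # each task waits here until a slot is free
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(item) for item in items))
```

the cap also bounds how quickly a backfill can consume AuDD requests, which are billed per 12 seconds of audio.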

### label subscriptions

the moderation service exposes `com.atproto.label.subscribeLabels` for real-time label streaming. apps can subscribe to receive new labels as they're created.

### user-facing appeals

eventual flow:
1. artist sees flag on their track
2. artist submits dispute with evidence (license, original work proof)
3. admin reviews dispute
4. if resolved: emit negation label (`neg: true`) to revoke the original
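
step 4 can be sketched as building a new label with the same `src`, `uri`, and `val` as the original and `neg` set to true. this is an illustration only: `build_negation` is a hypothetical helper, and signing is left out because the labeler has to compute a fresh `sig` over the new label.

```python
from datetime import datetime, timezone
from typing import Optional


def build_negation(label: dict, now: Optional[datetime] = None) -> dict:
    """return an unsigned negation of `label` (same src/uri/val, neg: true)."""
    now = now or datetime.now(timezone.utc)
    negation = {key: label[key] for key in ("src", "uri", "val")}
    if "cid" in label:
        negation["cid"] = label["cid"]
    negation["neg"] = True
    # new creation timestamp in the same format as the example response
    negation["cts"] = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    # the original `sig` is deliberately not copied: it does not cover this label
    return negation
```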

### admin dashboard

considerations for where to build the admin UI:
- **option A**: add to main frontend (plyr.fm/admin) - simpler, reuse existing auth
- **option B**: separate UI on moderation service - isolated, but needs its own auth
- **option C**: use Ozone - Bluesky's open-source moderation tool, already built for ATProto labels

see [overview.md](./overview.md) for architecture discussion.