# copyright detection

technical documentation for the copyright scanning system.

## how it works

```
upload completes
       │
       ▼
┌──────────────┐     ┌─────────────────┐     ┌─────────────┐
│   backend    │────▶│   moderation    │────▶│  AuDD API   │
│ (background) │     │   service       │     │             │
│              │◀────│   (Rust)        │◀────│             │
└──────────────┘     └─────────────────┘     └─────────────┘
       │                    │
       │                    │ if flagged
       ▼                    ▼
┌──────────────┐     ┌─────────────────┐
│ copyright_   │     │  ATProto label  │
│ scans table  │     │  emission       │
└──────────────┘     └─────────────────┘
```

1. track upload completes, file stored in R2
2. backend calls moderation service `/scan` endpoint with R2 URL
3. moderation service calls AuDD API for music recognition
4. results returned to backend, stored in `copyright_scans` table
5. if flagged, backend calls `/emit-label` to create ATProto label
6. label stored in moderation service's `labels` table

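the steps above can be sketched from the backend's side roughly as follows. this is a minimal sketch, not the actual implementation: `ScanResult`, `scan_and_label`, and the injected `scan` / `store` / `emit_label` callables are hypothetical stand-ins for the HTTP calls to `/scan` and `/emit-label` and for the database write.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class ScanResult:
    """hypothetical shape of the moderation service's /scan response."""
    is_flagged: bool
    highest_score: int
    matches: list[dict] = field(default_factory=list)


async def scan_and_label(
    track_id: int,
    r2_url: str,
    track_uri: str,
    scan: Callable[[str], Awaitable[ScanResult]],
    store: Callable[[int, ScanResult], Awaitable[None]],
    emit_label: Callable[[str], Awaitable[None]],
) -> ScanResult:
    result = await scan(r2_url)        # steps 2-3: /scan, which calls AuDD
    await store(track_id, result)      # step 4: write the copyright_scans row
    if result.is_flagged:
        await emit_label(track_uri)    # steps 5-6: /emit-label
    return result


async def demo() -> list[str]:
    """run the flow once with in-memory fakes; returns the URIs that got labeled."""
    emitted: list[str] = []

    async def fake_scan(url: str) -> ScanResult:
        return ScanResult(is_flagged=True, highest_score=85)

    async def fake_store(track_id: int, result: ScanResult) -> None:
        pass

    async def fake_emit(uri: str) -> None:
        emitted.append(uri)

    await scan_and_label(
        1,
        "https://your-r2-bucket.com/audio/abc123.mp3",
        "at://did:plc:artist/fm.plyr.track/abc123",
        fake_scan, fake_store, fake_emit,
    )
    return emitted
```

passing the three calls in as parameters keeps the ordering (scan, store, then label only if flagged) testable without a network or database.
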
## AuDD API

[AuDD](https://audd.io/) is a music recognition service similar to Shazam. their API scans audio and returns matched songs with confidence scores.

### request

```bash
curl -X POST https://api.audd.io/ \
  -F "api_token=YOUR_TOKEN" \
  -F "url=https://your-r2-bucket.com/audio/abc123.mp3" \
  -F "accurate_offsets=1"
```

### response

```json
{
  "status": "success",
  "result": [
    {
      "offset": 0,
      "songs": [
        {
          "artist": "Artist Name",
          "title": "Song Title",
          "album": "Album Name",
          "score": 85,
          "isrc": "USRC12345678",
          "timecode": "01:30"
        }
      ]
    },
    {
      "offset": 180000,
      "songs": [
        {
          "artist": "Another Artist",
          "title": "Another Song",
          "score": 72
        }
      ]
    }
  ]
}
```

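a response in this shape can be flattened into one list of matches, which is roughly what the `matches` column stores. this is a sketch that assumes only the fields shown in the example above; `extract_matches` is a hypothetical helper, not a function from the codebase.

```python
def extract_matches(response: dict) -> list[dict]:
    """flatten result[].songs[] into one list, keeping each segment's offset."""
    if response.get("status") != "success":
        return []
    matches = []
    for segment in response.get("result") or []:
        for song in segment.get("songs", []):
            matches.append({
                "artist": song.get("artist"),
                "title": song.get("title"),
                "score": song.get("score", 0),
                "isrc": song.get("isrc"),          # not present on every match
                "offset": segment.get("offset"),   # copied from the enclosing segment
            })
    return matches


# trimmed copy of the example response above
sample = {
    "status": "success",
    "result": [
        {"offset": 0, "songs": [
            {"artist": "Artist Name", "title": "Song Title",
             "score": 85, "isrc": "USRC12345678"}]},
        {"offset": 180000, "songs": [
            {"artist": "Another Artist", "title": "Another Song", "score": 72}]},
    ],
}

matches = extract_matches(sample)
highest_score = max((m["score"] for m in matches), default=0)
```
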
### pricing

- $2 per 1000 requests
- 1 request = 12 seconds of audio
- 5-minute track ≈ 25 requests ≈ $0.05
- first 300 requests free

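the arithmetic behind the 5-minute figure, as a small sketch. rounding a partial 12-second segment up to a full request is an assumption, and the free tier is ignored; `estimate_scan_cost` is a hypothetical helper.

```python
import math

PRICE_PER_1000_REQUESTS = 2.00  # USD, from the list above
SECONDS_PER_REQUEST = 12


def estimate_scan_cost(duration_seconds: float) -> tuple[int, float]:
    """return (requests, cost in USD) for one track."""
    # assumption: a trailing partial segment is billed as a whole request
    requests = math.ceil(duration_seconds / SECONDS_PER_REQUEST)
    cost = requests * PRICE_PER_1000_REQUESTS / 1000
    return requests, cost


requests, cost = estimate_scan_cost(5 * 60)  # 300 s / 12 s = 25 requests, $0.05
```
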
## database schema

### backend: copyright_scans table

```sql
CREATE TABLE copyright_scans (
    id SERIAL PRIMARY KEY,
    track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,

    is_flagged BOOLEAN NOT NULL DEFAULT FALSE,
    highest_score INTEGER NOT NULL DEFAULT 0,
    matches JSONB NOT NULL DEFAULT '[]',      -- [{artist, title, score, isrc}]
    raw_response JSONB NOT NULL DEFAULT '{}', -- full API response

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(track_id)
);
```

### moderation service: labels table

```sql
CREATE TABLE labels (
    id BIGSERIAL PRIMARY KEY,
    seq BIGSERIAL UNIQUE NOT NULL,           -- monotonic sequence for subscriptions
    src TEXT NOT NULL,                        -- labeler DID
    uri TEXT NOT NULL,                        -- target AT URI
    cid TEXT,                                 -- optional target CID
    val TEXT NOT NULL,                        -- label value (e.g., "copyright-violation")
    neg BOOLEAN NOT NULL DEFAULT FALSE,       -- negation (for revoking labels)
    cts TIMESTAMPTZ NOT NULL,                 -- creation timestamp
    exp TIMESTAMPTZ,                          -- optional expiration
    sig BYTEA NOT NULL,                       -- secp256k1 signature
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

### scan result states

| is_flagged | highest_score | meaning |
|------------|---------------|---------|
| `false` | 0 | no matches found |
| `false` | 0 | scan failed (error in raw_response) |
| `true` | > 0 | matches found, label emitted |

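the first two rows have identical column values, so telling them apart means looking inside `raw_response`. a minimal sketch, assuming a failed scan is recorded with either an `error` key or `"status": "error"`; the exact shape the backend stores is not specified here, and `classify_scan` is a hypothetical helper.

```python
def classify_scan(is_flagged: bool, highest_score: int, raw_response: dict) -> str:
    """map a copyright_scans row onto one of the three states in the table above."""
    if is_flagged:
        return "flagged"
    # assumption: failures are marked by an "error" key or status == "error"
    if "error" in raw_response or raw_response.get("status") == "error":
        return "failed"
    return "clean"
```
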
## configuration

### backend environment variables

```bash
# moderation service connection
MODERATION_SERVICE_URL=https://moderation.plyr.fm
MODERATION_AUTH_TOKEN=shared_secret_token
MODERATION_TIMEOUT_SECONDS=300
MODERATION_ENABLED=true

# labeler URL (for emitting labels after scan)
MODERATION_LABELER_URL=https://moderation.plyr.fm
```

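one way the backend could read these variables, sketched with only the standard library. the `ModerationSettings` class, the fallback defaults, and the accepted truthy strings are assumptions for illustration; the real backend may load settings differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModerationSettings:
    service_url: str
    auth_token: str
    timeout_seconds: int
    enabled: bool
    labeler_url: str


def load_moderation_settings(env: Mapping[str, str] = os.environ) -> ModerationSettings:
    service_url = env["MODERATION_SERVICE_URL"]  # required
    return ModerationSettings(
        service_url=service_url,
        auth_token=env["MODERATION_AUTH_TOKEN"],  # required
        # assumption: fall back to the values shown in the example above
        timeout_seconds=int(env.get("MODERATION_TIMEOUT_SECONDS", "300")),
        enabled=env.get("MODERATION_ENABLED", "true").lower() in {"1", "true", "yes"},
        labeler_url=env.get("MODERATION_LABELER_URL", service_url),
    )
```
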
### moderation service environment variables

```bash
# AuDD API
AUDD_API_KEY=your_audd_token

# database
DATABASE_URL=postgres://...

# labeler identity
LABELER_DID=did:plc:your-labeler-did
LABELER_SIGNING_KEY=hex-encoded-secp256k1-private-key

# auth
MODERATION_AUTH_TOKEN=shared_secret_token
```

## interpreting results

### confidence scores

AuDD returns a score (0-100) for each match:

| score | meaning |
|-------|---------|
| 90-100 | very high confidence, almost certainly a match |
| 70-89 | high confidence, likely a match |
| 50-69 | moderate confidence, may be similar but not exact |
| < 50 | low confidence, probably not a match |

default threshold is 70. tracks with any match >= 70 are flagged.

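the threshold rule and the bands in the table can be expressed as a short sketch. `confidence_band` and `summarize_scores` are hypothetical names; only the cutoffs and the default of 70 come from this page.

```python
DEFAULT_THRESHOLD = 70


def confidence_band(score: int) -> str:
    """map a 0-100 score onto the bands in the table above."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "high"
    if score >= 50:
        return "moderate"
    return "low"


def summarize_scores(scores: list[int], threshold: int = DEFAULT_THRESHOLD) -> dict:
    """a track is flagged when its best match reaches the threshold."""
    highest = max(scores, default=0)
    return {
        "is_flagged": highest >= threshold,
        "highest_score": highest,
        "band": confidence_band(highest),
    }
```

with the example response from earlier (scores 85 and 72), `summarize_scores([85, 72])` flags the track with a highest score of 85; a single match at 69 would not be flagged.
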
### false positives

common causes:
- generic beats/samples used in multiple songs
- covers or remixes (legal gray area)
- similar chord progressions
- audio artifacts matching by coincidence

this is why we flag but don't enforce. human review is needed.

### ISRC codes

an [International Standard Recording Code](https://en.wikipedia.org/wiki/International_Standard_Recording_Code) (ISRC) is a unique identifier for a sound recording. when a match includes one, that is strong evidence the audio matched a specific recording (not just similar audio).

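when comparing ISRCs during review it helps to normalize them first, since the same code may be written with or without hyphens. a small sketch, assuming the standard 12-character layout (2-letter prefix, 3-character registrant code, 2-digit year, 5-digit designation); `normalize_isrc` is a hypothetical helper.

```python
import re

# 2 letters + 3 alphanumerics + 7 digits (2-digit year, 5-digit designation)
_ISRC_RE = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}[0-9]{7}$")


def normalize_isrc(value: str) -> "str | None":
    """return the compact uppercase ISRC, or None if it is not well-formed."""
    compact = value.replace("-", "").strip().upper()
    return compact if _ISRC_RE.match(compact) else None
```
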
## admin queries

### list all flagged tracks

```sql
SELECT t.id, t.title, a.handle, cs.highest_score, cs.matches
FROM copyright_scans cs
JOIN tracks t ON t.id = cs.track_id
JOIN artists a ON a.did = t.artist_did
WHERE cs.is_flagged
ORDER BY cs.highest_score DESC;
```

### scan statistics

```sql
SELECT
    is_flagged,
    COUNT(*) AS count,
    AVG(highest_score) AS avg_score
FROM copyright_scans
GROUP BY is_flagged;
```

### tracks pending scan

```sql
-- a track with no copyright_scans row has not been scanned yet
SELECT t.id, t.title, t.created_at
FROM tracks t
LEFT JOIN copyright_scans cs ON cs.track_id = t.id
WHERE cs.id IS NULL
ORDER BY t.created_at DESC;
```

## querying labels

labels can be queried via standard ATProto XRPC endpoints:

```bash
# query labels for all of an artist's tracks (a trailing * makes the pattern a prefix match)
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?uriPatterns=at://did:plc:artist/fm.plyr.track/*"

# query all labels from our labeler
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?sources=did:plc:plyr-labeler"
```

response:

```json
{
  "labels": [
    {
      "src": "did:plc:plyr-labeler",
      "uri": "at://did:plc:artist/fm.plyr.track/abc123",
      "val": "copyright-violation",
      "cts": "2025-11-30T12:00:00.000Z",
      "sig": "base64-encoded-signature"
    }
  ]
}
```

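from code, the same query URLs can be built with the standard library so the AT URIs are percent-encoded correctly. `query_labels_url` is a hypothetical helper; the endpoint and the `uriPatterns` / `sources` parameter names are the ones used in the curl examples above, with multiple values sent as repeated parameters.

```python
from urllib.parse import urlencode


def query_labels_url(base_url: str, uri_patterns: list[str], sources: list[str] = ()) -> str:
    """build a com.atproto.label.queryLabels URL with repeated array params."""
    params = [("uriPatterns", p) for p in uri_patterns]
    params += [("sources", s) for s in sources]
    return f"{base_url.rstrip('/')}/xrpc/com.atproto.label.queryLabels?{urlencode(params)}"


url = query_labels_url(
    "https://moderation.plyr.fm",
    ["at://did:plc:artist/fm.plyr.track/*"],
    sources=["did:plc:plyr-labeler"],
)
```
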
## future considerations

### batch scanning existing tracks

```python
from sqlalchemy import select

# scan all tracks that haven't been scanned yet.
# get_session, Track, CopyrightScan and scan_track_for_copyright are
# assumed to come from the backend's own modules.
async def backfill_scans():
    async with get_session() as session:
        unscanned = await session.execute(
            select(Track)
            # explicit ON clause so the join doesn't depend on a configured relationship
            .outerjoin(CopyrightScan, CopyrightScan.track_id == Track.id)
            .where(CopyrightScan.id.is_(None))
        )
        for track in unscanned.scalars():
            await scan_track_for_copyright(track.id, track.r2_url)
```

### label subscriptions

the moderation service exposes `com.atproto.label.subscribeLabels` for real-time label streaming. apps can subscribe to receive new labels as they're created.

### user-facing appeals

eventual flow:
1. artist sees flag on their track
2. artist submits dispute with evidence (license, original work proof)
3. admin reviews dispute
4. if the dispute is upheld: emit negation label (`neg: true`) to revoke the original

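on the consuming side, a negation label cancels an earlier label with the same `src`, `uri` and `val`. a minimal sketch of resolving which labels are still in effect, assuming label dicts shaped like the `queryLabels` response and ISO 8601 UTC timestamps in one consistent format (so `cts` strings sort chronologically); `active_labels` is a hypothetical helper.

```python
from datetime import datetime, timezone


def _parse_ts(value: str) -> datetime:
    # accept a trailing "Z" on python versions where fromisoformat does not
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def active_labels(labels: list[dict], now: "datetime | None" = None) -> list[dict]:
    """apply labels oldest-first; neg labels revoke, expired labels are dropped."""
    now = now or datetime.now(timezone.utc)
    current: dict[tuple, dict] = {}
    for label in sorted(labels, key=lambda l: l["cts"]):
        key = (label["src"], label["uri"], label["val"])
        if label.get("neg"):
            current.pop(key, None)   # revoke the earlier label, if any
        else:
            current[key] = label
    return [
        l for l in current.values()
        if not l.get("exp") or _parse_ts(l["exp"]) > now
    ]


flag = {"src": "did:plc:plyr-labeler",
        "uri": "at://did:plc:artist/fm.plyr.track/abc123",
        "val": "copyright-violation",
        "cts": "2025-11-30T12:00:00.000Z"}
revoke = {**flag, "neg": True, "cts": "2025-12-02T09:00:00.000Z"}
```

with these two labels, `active_labels([flag])` still contains the flag, while `active_labels([flag, revoke])` is empty.
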
### admin dashboard

considerations for where to build the admin UI:
- **option A**: add to main frontend (plyr.fm/admin) - simpler, reuse existing auth
- **option B**: separate UI on moderation service - isolated, but needs its own auth
- **option C**: use Ozone - Bluesky's open-source moderation tool, already built for ATProto labels

see [overview.md](./overview.md) for architecture discussion.