# ATProto labeler service

technical documentation for the moderation service's ATProto labeling capabilities.

## overview

the moderation service (`moderation.plyr.fm`) acts as an ATProto labeler - a service that produces signed labels about content. labels are metadata objects that follow the `com.atproto.label.defs#label` schema and can be queried by any ATProto-compatible app.

key distinction: **labels are signed data objects, not repository records**. they don't live in a user's repo - they're served directly by the labeler via XRPC endpoints.

## why labels?

from [Bluesky's labeling architecture](https://docs.bsky.app/docs/advanced-guides/moderation):

> "Labels are assertions made about content or accounts. They don't enforce anything on their own - clients decide how to interpret them."

this enables **stackable moderation**: multiple labelers can label the same content, and clients can choose which labelers to trust and how to handle different label values.

for plyr.fm, this means:
- we produce `copyright-violation` labels when tracks are flagged
- other ATProto apps can query our labels and apply their own policies
- users/apps can choose to subscribe to our labeler or ignore it
- we can revoke labels by emitting negations (`neg: true`)

## architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                     moderation service                           │
│                     (moderation.plyr.fm)                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐  │
│  │  /scan      │    │ /emit-label │    │ /xrpc/com.atproto.  │  │
│  │  endpoint   │    │  endpoint   │    │ label.queryLabels   │  │
│  └──────┬──────┘    └──────┬──────┘    └──────────┬──────────┘  │
│         │                  │                      │              │
│         ▼                  ▼                      ▼              │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐  │
│  │   AuDD      │    │   sign      │    │   query labels      │  │
│  │   client    │    │   label     │    │   from postgres     │  │
│  └─────────────┘    └─────────────┘    └─────────────────────┘  │
│                            │                                     │
│                            ▼                                     │
│                     ┌─────────────┐                              │
│                     │   labels    │                              │
│                     │   table     │                              │
│                     └─────────────┘                              │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

## endpoints

### POST /scan

scans audio for copyright matches via AuDD.

```bash
curl -X POST https://moderation.plyr.fm/scan \
  -H "X-Moderation-Key: $MODERATION_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"audio_url": "https://r2.plyr.fm/audio/abc123.mp3"}'
```

response:

```json
{
  "matches": [
    {
      "artist": "Taylor Swift",
      "title": "Love Story",
      "score": 95,
      "isrc": "USRC10701234"
    }
  ],
  "is_flagged": true,
  "highest_score": 95,
  "raw_response": { ... }
}
```
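a minimal sketch of how a caller might read this response, assuming only the fields shown above. `top_match` is a hypothetical helper for illustration, not part of the service or the backend.

```python
from typing import Any, Optional


def top_match(scan: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the highest-scoring match from a /scan response, or None."""
    matches = scan.get("matches") or []
    if not matches:
        return None
    # `score` is assumed to be numeric, as in the example response.
    return max(matches, key=lambda m: m.get("score", 0))


example = {
    "matches": [
        {
            "artist": "Taylor Swift",
            "title": "Love Story",
            "score": 95,
            "isrc": "USRC10701234",
        }
    ],
    "is_flagged": True,
    "highest_score": 95,
}

best = top_match(example)
```

the sketch does not decide whether a track is flagged; it only reads what the service returned, and `is_flagged` stays the field to branch on.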

### POST /emit-label

creates a signed ATProto label.

```bash
curl -X POST https://moderation.plyr.fm/emit-label \
  -H "X-Moderation-Key: $MODERATION_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "uri": "at://did:plc:abc123/fm.plyr.track/xyz789",
    "val": "copyright-violation",
    "cid": "bafyreiabc123"
  }'
```

the service:
1. creates the label with the current timestamp (`cts`)
2. encodes the label as DAG-CBOR and signs it with the labeler's secp256k1 private key
3. stores it in the `labels` table with a monotonic sequence number (`seq`)
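step 1 can be sketched as follows. this is an illustration under assumptions, not the service's actual code: `build_unsigned_label` is a hypothetical name, and the field set mirrors the label shown in the `queryLabels` response.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_unsigned_label(
    src: str,
    uri: str,
    val: str,
    cid: Optional[str] = None,
    neg: bool = False,
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build the label object that would be signed (no `sig` field yet)."""
    utc = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # ISO 8601 with millisecond precision and a trailing Z, matching the
    # `cts` format in the example response.
    cts = utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"
    # Assumption: `neg` is always written explicitly. Omitting it when false
    # is equally possible, as long as signing and verification agree.
    label: dict[str, Any] = {
        "ver": 1,
        "src": src,
        "uri": uri,
        "val": val,
        "neg": neg,
        "cts": cts,
    }
    if cid is not None:
        label["cid"] = cid  # optional: pins the label to one record version
    return label
```

passing `now` explicitly keeps the function deterministic, which makes it easy to test.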

### GET /xrpc/com.atproto.label.queryLabels

standard ATProto XRPC endpoint for querying labels.

```bash
# query by URI pattern
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?uriPatterns=at://did:plc:*"

# query by source (labeler DID)
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?sources=did:plc:plyr-labeler"

# query by cursor (pagination)
curl "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels?cursor=123&limit=50"
```

response:

```json
{
  "cursor": "456",
  "labels": [
    {
      "ver": 1,
      "src": "did:plc:plyr-labeler",
      "uri": "at://did:plc:abc123/fm.plyr.track/xyz789",
      "cid": "bafyreiabc123",
      "val": "copyright-violation",
      "neg": false,
      "cts": "2025-11-30T12:00:00.000Z",
      "sig": "base64-encoded-secp256k1-signature"
    }
  ]
}
```
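a sketch of a client that pages through results, assuming the `cursor` in each response is passed back unchanged on the next request. `build_query_url` and `iter_labels` are hypothetical helpers, and `fetch_json` is an injected callable (for example a thin wrapper around an HTTP client) so the paging logic stays testable without network access. the `com.atproto.label.queryLabels` lexicon lists `uriPatterns` as required, so the sketch always sends it.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode

BASE = "https://moderation.plyr.fm/xrpc/com.atproto.label.queryLabels"


def build_query_url(
    uri_patterns: list[str],
    sources: Optional[list[str]] = None,
    cursor: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Build a queryLabels URL; list parameters are repeated, not comma-joined."""
    params: list[tuple[str, str]] = [("uriPatterns", p) for p in uri_patterns]
    params += [("sources", s) for s in sources or []]
    if cursor is not None:
        params.append(("cursor", cursor))
    params.append(("limit", str(limit)))
    return f"{BASE}?{urlencode(params)}"


def iter_labels(
    fetch_json: Callable[[str], dict[str, Any]],
    uri_patterns: list[str],
    limit: int = 50,
) -> Iterator[dict[str, Any]]:
    """Yield every label, following `cursor` until the server stops returning one."""
    cursor: Optional[str] = None
    while True:
        page = fetch_json(build_query_url(uri_patterns, cursor=cursor, limit=limit))
        labels = page.get("labels", [])
        yield from labels
        cursor = page.get("cursor")
        # Stop on a missing cursor or an empty page to avoid looping forever.
        if not cursor or not labels:
            return
```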

## label signing

labels are serialized as DAG-CBOR and signed with a secp256k1 key, the same scheme ATProto uses for repo commits.

signing process:
1. construct label object without `sig` field
2. encode as DAG-CBOR (deterministic CBOR)
3. compute SHA-256 hash of encoded bytes
4. sign hash with labeler's secp256k1 private key
5. attach signature as `sig` field

this allows any client to verify that a label came from our labeler by checking the signature against the public key published in our DID document (the `#atproto_label` verification method).
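steps 1-3 can be sketched with the standard library alone. the encoder below is a deliberately small illustration that covers only the value types a label uses (no floats, no CID links). a real service would more likely rely on a maintained DAG-CBOR library. the secp256k1 signature itself (step 4) is left out because the Python standard library has no secp256k1 support. `encode` and `signing_digest` are hypothetical names.

```python
import hashlib
from typing import Any


def _head(major: int, n: int) -> bytes:
    """CBOR head: major type plus the shortest possible length/value encoding."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for CBOR")


def encode(value: Any) -> bytes:
    """Encode the small subset of DAG-CBOR that a label needs."""
    if value is None:
        return b"\xf6"
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # DAG-CBOR map keys are strings, sorted by byte length, then bytewise.
        keys = sorted(value, key=lambda k: (len(k.encode("utf-8")), k.encode("utf-8")))
        body = b"".join(encode(k) + encode(value[k]) for k in keys)
        return _head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


def signing_digest(label: dict[str, Any]) -> bytes:
    """Steps 1-3: drop `sig`, encode deterministically, hash with SHA-256."""
    unsigned = {k: v for k, v in label.items() if k != "sig"}
    return hashlib.sha256(encode(unsigned)).digest()
```

because map keys are sorted during encoding, the digest does not depend on the order in which the label's fields were inserted, which is what makes independent verification possible.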

## label values

current supported values:

| val | meaning | when emitted |
|-----|---------|--------------|
| `copyright-violation` | track flagged for potential copyright infringement | scan result has `is_flagged` set and the track has an `atproto_record_uri` |

future values could include:
- `explicit` - explicit content marker
- `spam` - suspected spam upload
- `dmca-takedown` - formal DMCA notice received

## negation labels

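to revoke a label, emit a new label with the same `uri` and `val` and `neg: true`. the negation is itself a signed label with a later `cts`, so consumers treat the most recent label for a given source, URI, and value as the current state: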

```json
{
  "uri": "at://did:plc:abc123/fm.plyr.track/xyz789",
  "val": "copyright-violation",
  "neg": true
}
```

use cases:
- false positive resolved after manual review
- artist provided proof of licensing
- DMCA counter-notice accepted
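on the consuming side, resolving negations might look like the sketch below. `active_labels` is a hypothetical helper; it assumes every `cts` uses the same UTC ISO 8601 format shown earlier, so that string comparison matches chronological order, and it ignores `exp`.

```python
from typing import Any


def active_labels(labels: list[dict[str, Any]]) -> set[tuple[str, str, str]]:
    """Return (src, uri, val) triples whose most recent label is not a negation."""
    latest: dict[tuple[str, str, str], dict[str, Any]] = {}
    for label in labels:
        key = (label["src"], label["uri"], label["val"])
        # Keep whichever label for this key has the newest creation timestamp.
        if key not in latest or label["cts"] >= latest[key]["cts"]:
            latest[key] = label
    return {key for key, label in latest.items() if not label.get("neg", False)}
```

a label followed by its negation drops out of the result, while labels on other tracks are unaffected.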

## database schema

```sql
CREATE TABLE labels (
    id BIGSERIAL PRIMARY KEY,
    seq BIGSERIAL UNIQUE NOT NULL,     -- monotonic for subscribeLabels cursor
    src TEXT NOT NULL,                  -- labeler DID
    uri TEXT NOT NULL,                  -- target AT URI
    cid TEXT,                           -- optional target CID
    val TEXT NOT NULL,                  -- label value
    neg BOOLEAN NOT NULL DEFAULT FALSE, -- negation flag
    cts TIMESTAMPTZ NOT NULL,           -- creation timestamp
    exp TIMESTAMPTZ,                    -- optional expiration
    sig BYTEA NOT NULL,                 -- signature bytes
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_labels_uri ON labels(uri);
CREATE INDEX idx_labels_src ON labels(src);
CREATE INDEX idx_labels_seq ON labels(seq);  -- redundant: UNIQUE on seq already creates an index
CREATE INDEX idx_labels_val ON labels(val);
```
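the `seq` column lends itself to cursor-based pagination. the sketch below uses an in-memory SQLite table as a stand-in for Postgres, keeps only the columns the query touches, and assumes the cursor is simply the last `seq` a client has seen. `page_after` is a hypothetical name.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the Postgres table above.
conn.execute(
    "CREATE TABLE labels (seq INTEGER PRIMARY KEY, uri TEXT NOT NULL, val TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO labels (seq, uri, val) VALUES (?, ?, ?)",
    [
        (i, f"at://did:plc:abc123/fm.plyr.track/t{i}", "copyright-violation")
        for i in range(1, 6)
    ],
)


def page_after(cursor: int, limit: int) -> tuple[list[tuple[int, str, str]], Optional[int]]:
    """Return rows with seq > cursor, plus the next cursor (None when exhausted)."""
    rows = conn.execute(
        "SELECT seq, uri, val FROM labels WHERE seq > ? ORDER BY seq LIMIT ?",
        (cursor, limit),
    ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor
```

filtering on `seq > cursor` rather than using `OFFSET` keeps each page query cheap and stable even while new labels are being inserted.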

## deployment

the moderation service runs on Fly.io:

```bash
# deploy
cd moderation && fly deploy

# check logs
fly logs -a plyr-moderation

# secrets
fly secrets set -a plyr-moderation \
  LABELER_DID=did:plc:xxx \
  LABELER_SIGNING_KEY=hex-private-key \
  DATABASE_URL=postgres://... \
  AUDD_API_KEY=xxx \
  MODERATION_AUTH_TOKEN=xxx
```
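a small sanity check for these secrets could look like the sketch below. it assumes `LABELER_SIGNING_KEY` is the raw private key as plain hex with no prefix (a secp256k1 private key is 32 bytes, so 64 hex characters). `config_problems` is a hypothetical helper, not something the service is known to ship.

```python
from typing import Mapping

REQUIRED = (
    "LABELER_DID",
    "LABELER_SIGNING_KEY",
    "DATABASE_URL",
    "AUDD_API_KEY",
    "MODERATION_AUTH_TOKEN",
)


def config_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with the service's environment."""
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]

    key = env.get("LABELER_SIGNING_KEY", "")
    if key:
        try:
            raw = bytes.fromhex(key)
        except ValueError:
            problems.append("LABELER_SIGNING_KEY is not valid hex")
        else:
            # A secp256k1 private key is a 32-byte scalar.
            if len(raw) != 32:
                problems.append("LABELER_SIGNING_KEY must decode to 32 bytes")

    did = env.get("LABELER_DID", "")
    if did and not did.startswith("did:"):
        problems.append("LABELER_DID must start with 'did:'")
    return problems
```

calling `config_problems(os.environ)` before a deploy would catch a truncated or mistyped key before it surfaces as a signature verification failure.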

## integration with backend

the backend calls the moderation service in two places:

1. **scan on upload** (`_internal/moderation.py:scan_track_for_copyright`)
   - POST to `/scan` with R2 URL
   - store result in `copyright_scans` table

2. **emit label on flag** (`_internal/moderation.py:_store_scan_result`)
   - if `is_flagged` and track has `atproto_record_uri`
   - POST to `/emit-label` with track's AT URI and CID

```python
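import httpx

# `settings` is the backend's configuration object (import omitted here).


async def _emit_copyright_label(uri: str, cid: str | None) -> None:
    """ask the moderation service to emit a copyright-violation label."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        await client.post(
            f"{settings.moderation.labeler_url}/emit-label",
            json={"uri": uri, "val": "copyright-violation", "cid": cid},
            headers={"X-Moderation-Key": settings.moderation.auth_token},
        )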
```
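the condition in step 2 can be isolated into a pure function, which makes it easy to test without a running moderation service. this is a sketch under assumptions: `Track` is a stand-in for the backend's track model, `atproto_record_cid` is a hypothetical field name, and `label_payload` is not an existing backend function.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Track:
    """Hypothetical stand-in for the backend's track model."""

    atproto_record_uri: Optional[str]
    atproto_record_cid: Optional[str] = None  # assumed field name


def label_payload(scan: dict[str, Any], track: Track) -> Optional[dict[str, Any]]:
    """Return the /emit-label JSON body, or None when no label should be emitted."""
    # Emit only for flagged scans of tracks that have an ATProto record.
    if not scan.get("is_flagged") or not track.atproto_record_uri:
        return None
    return {
        "uri": track.atproto_record_uri,
        "val": "copyright-violation",
        "cid": track.atproto_record_cid,
    }
```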

## troubleshooting

### label not appearing in queries

1. check moderation service logs for emit errors
2. verify track has `atproto_record_uri` set
3. query the `labels` table directly, replacing `track_rkey` with the track's record key:
   ```sql
   SELECT * FROM labels WHERE uri LIKE '%track_rkey%';
   ```

### signature verification failing

1. ensure the public key derived from `LABELER_SIGNING_KEY` matches the one published in the labeler's DID document
2. check that the DAG-CBOR encoding is deterministic and excludes the `sig` field
3. verify that the hash algorithm is SHA-256

### scan returning empty matches

AuDD matches on audio fingerprints, so the input must contain enough recognizable audio. common issues:
- audio too short (< 3 seconds usable)
- microphone recordings don't match source audio
- very low bitrate or corrupted files

## references

- [ATProto Labeling Spec](https://atproto.com/specs/label)
- [Bluesky Moderation Guide](https://docs.bsky.app/docs/advanced-guides/moderation)
- [DAG-CBOR Spec](https://ipld.io/specs/codecs/dag-cbor/spec/)
- [AuDD API Docs](https://docs.audd.io/)