add protocols/atproto notes

covers identity, data model, lexicons, firehose/jetstream,
auth, and labels. based on plyr.fm, pdsx, teal-fm patterns
and paul frazee's atmospheric computing posts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+7
protocols/README.md
# protocols

notes on open protocols.

## contents

- [atproto](./atproto/) - the AT Protocol for connected clouds
+59
protocols/atproto/README.md
# atproto

the AT Protocol. a protocol for connected clouds.

## the problem

cloud computing won. it's convenient, scalable, everywhere. but the clouds are closed. your identity, data, and social graph live in silos that don't interoperate. if you want to connect to people, your computer becomes a portal to someone else's computer.

this wasn't inevitable. early federated systems (XMPP, RSS/Reader) proved interoperability was possible. they died when large providers realized closed networks were more profitable.

## the solution

bridge the clouds. my cloud talks to your cloud. a bigger cloud does bigger things, a smaller cloud does smaller things, but they all talk.

the AT Protocol is the technical foundation for this. it standardizes identity, data storage, event streams, and API contracts so that independent services can interoperate without direct coordination.

## architecture

two roles matter:

```
┌─────────────┐           ┌─────────────┐
│     PDS     │  ◄─────►  │ Application │
│  (account)  │           │ (app logic) │
└─────────────┘           └─────────────┘
```

**PDS (Personal Data Server)** hosts user accounts and data. each user is one database. the PDS produces a signed event log that anyone can consume.

**Applications** subscribe to these event logs and build aggregated views. bluesky, leaflet, tangled - they're all applications consuming the same underlying data.

this separation matters for moderation. if an application suspends you, it affects only that application. if a PDS takes down your account, it affects all applications until you migrate. PDS takedowns should be reserved for network abuse and illegal content.

## key properties

**sharded by user, aggregated by app.** each user's data lives in their PDS with strict serial ordering. applications aggregate across users with only causal ordering between them.

**authenticated data.** records are signed. event logs can be gossiped across organizational boundaries without trusting the messenger - you verify by checking proofs, not providers.

**schema interoperability.** applications agree on record schemas (lexicons) so they can read each other's data. your bluesky posts can appear in leaflet because both understand the schema.

**migration.** because identity is separate from hosting, you can move your account between PDS providers without losing your identity or social graph.

## contents

- [identity](./identity.md) - DIDs, handles, resolution
- [data](./data.md) - repos, records, collections, references
- [lexicons](./lexicons.md) - schema language, namespaces
- [firehose](./firehose.md) - event streaming, jetstream
- [auth](./auth.md) - OAuth, scopes, permission sets
- [labels](./labels.md) - moderation, signed assertions

## sources

- [atmospheric computing](https://pfrazee.com/blog/atmospheric-computing) - paul frazee
- [update on protocol moderation](https://leaflet.pub/pfrazee.com/3lgy73zy4bc2a) - paul frazee
- [atproto.com](https://atproto.com)
- [plyr.fm](https://github.com/zzstoatzz/plyr.fm) - music streaming on atproto
- [pdsx](https://github.com/zzstoatzz/pdsx) - atproto CLI/MCP
+109
protocols/atproto/auth.md
# auth

atproto uses OAuth 2.0 for application authorization.

## the flow

1. user visits application
2. application redirects to user's PDS for authorization
3. user approves requested scopes
4. PDS redirects back with authorization code
5. application exchanges code for tokens
6. application uses tokens to act on user's behalf

standard OAuth, but the authorization server is the user's PDS, not a central service.

## scopes

scopes define what an application can do:

```
atproto                # full access (legacy)
repo:fm.plyr.track     # read/write fm.plyr.track collection
repo:fm.plyr.like      # read/write fm.plyr.like collection
repo:read              # read-only access to repo
```

granular scopes let users grant minimal permissions. an app that only needs to read your profile shouldn't have write access to your posts.

## permission sets

listing individual scopes is noisy. permission sets bundle them under human-readable names:

```
include:fm.plyr.authFullApp    # "plyr.fm Music Library"
```

instead of seeing `fm.plyr.track, fm.plyr.like, fm.plyr.comment, ...`, users see a single permission with a description.

permission sets are lexicons published to `com.atproto.lexicon.schema` on your authority repo.

from [plyr.fm permission sets](https://github.com/zzstoatzz/plyr.fm/blob/main/docs/lexicons/overview.md#permission-sets)

## session management

tokens expire. applications need refresh logic:

```python
from pathlib import Path

from atproto import AsyncClient


class SessionManager:
    def __init__(self, handle: str, password: str, session_path: Path):
        self.handle = handle
        self.password = password
        self.session_path = session_path
        self._client: AsyncClient | None = None

    async def get_client(self) -> AsyncClient:
        if self._client:
            return self._client

        self._client = AsyncClient()
        self._client.on_session_change(self._save_session)

        # try loading saved session, fall back to fresh login
        if self.session_path.exists():
            await self._client.login(session_string=self.session_path.read_text())
        else:
            await self._client.login(self.handle, self.password)
        return self._client

    def _save_session(self, event, session) -> None:
        # persist on create/refresh so the next run can resume without re-login
        self.session_path.write_text(self._client.export_session_string())
```

from [bot](https://github.com/zzstoatzz/bot) - persists sessions to disk, refreshes automatically.

## per-request credentials

for multi-tenant applications (one backend serving many users), credentials come per-request:

```
# middleware extracts from headers
x-atproto-handle: user.handle
x-atproto-password: app-password

# or from OAuth session
authorization: Bearer <token>
```

from [pdsx MCP server](https://github.com/zzstoatzz/pdsx) - accepts credentials via HTTP headers for multi-tenant deployment.

## app passwords

for bots and automated tools, app passwords are simpler than full OAuth:

1. user creates an app password in their PDS settings
2. bot uses handle + app password to authenticate
3. no redirect flow needed

app passwords have full account access. use OAuth with scopes when you need granular permissions.
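a minimal sketch of the app-password path using the atproto python SDK. the handle and password values are placeholders, and `get_profile` is just one example call:

```python
# app-password login sketch. HANDLE and APP_PASSWORD are placeholders -
# generate the app password in your PDS settings; it grants full account
# access, so keep it out of source control.
import asyncio

from atproto import AsyncClient

HANDLE = "user.example.com"
APP_PASSWORD = "xxxx-xxxx-xxxx-xxxx"


async def main() -> None:
    client = AsyncClient()
    await client.login(HANDLE, APP_PASSWORD)
    profile = await client.get_profile(HANDLE)
    print(profile.display_name)


asyncio.run(main())
```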
## why this matters

OAuth at the protocol level means:

- users authorize apps, not the other way around
- applications can't lock in users by controlling auth
- the same identity works across all atmospheric applications
- granular scopes enable minimal-permission applications
+111
protocols/atproto/data.md
# data

atproto's data model: each user is a signed database.

## repos

a repository is a user's data store. it contains all their records - posts, likes, follows, whatever the applications define.

repos are merkle trees. every commit is signed by the user's key and can be verified by anyone. this is what enables authenticated data gossip - you don't need to trust the messenger, you verify the signature.

## records

records are JSON documents stored in collections:

```
at://did:plc:xyz/app.bsky.feed.post/3jui7akfj2k2a
     └── DID ──┘ └── collection ──┘ └── rkey ───┘
```

- **DID**: whose repo
- **collection**: the record type (lexicon NSID)
- **rkey**: record key within the collection

record keys are typically TIDs (timestamp-based IDs) for records where users have many (posts, likes). for singletons like profiles, the literal `self` is used.

## AT-URIs

the `at://` URI scheme identifies records:

```
at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a
at://zzstoatzz.io/app.bsky.feed.post/3jui7akfj2k2a   # handle also works
```

these are stable references. the URI uniquely identifies a record across the network.

## CIDs

a CID (Content Identifier) is a hash of a specific version of a record:

```
bafyreig2fjxi3qbp5jvyqx2i4djxfkp...
```

URIs identify *what*, CIDs identify *which version*. when you reference another record and care about the exact content, you include both.

## strongRef

the standard pattern for cross-record references:

```json
{
  "subject": {
    "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
    "cid": "bafyreig..."
  }
}
```

used in likes (referencing tracks), comments (referencing tracks), lists (referencing any records). the CID proves you're referencing a specific version.

from [plyr.fm lexicons](https://github.com/zzstoatzz/plyr.fm/tree/main/lexicons) - likes, comments, and lists all use strongRef.

## collections

records are grouped into collections by type:

```
repo/
├── app.bsky.feed.post/
│   ├── 3jui7akfj2k2a
│   └── 3jui8bklg3l3b
├── app.bsky.feed.like/
│   └── ...
└── fm.plyr.track/
    └── ...
```

each collection corresponds to a lexicon. applications read and write to collections they understand.

## local indexing

querying across PDSes is slow. applications maintain local indexes:

```sql
-- plyr.fm indexes fm.plyr.track records
CREATE TABLE tracks (
    id SERIAL PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    -- ... application-specific fields
    UNIQUE(did, rkey)
);
```

when users log in, sync their records from PDS to local database. background jobs keep indexes fresh.

from [plyr.fm](https://github.com/zzstoatzz/plyr.fm) - indexes tracks, likes, comments, playlists locally.
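a minimal sketch of that sync, assuming the atproto python SDK; `upsert_track()` is a hypothetical helper that writes one row into the tracks table above:

```python
# sketch: page a user's fm.plyr.track records out of their PDS and upsert
# them into the local index. upsert_track() is hypothetical, not plyr.fm's
# actual code; field names mirror the table above.
from atproto import AsyncClient


async def sync_tracks(client: AsyncClient, did: str) -> None:
    cursor: str | None = None
    while True:
        resp = await client.com.atproto.repo.list_records(
            {"repo": did, "collection": "fm.plyr.track", "limit": 100, "cursor": cursor}
        )
        for rec in resp.records:
            rkey = rec.uri.rsplit("/", 1)[-1]  # rkey is the last segment of the AT-URI
            await upsert_track(did=did, rkey=rkey, uri=rec.uri, cid=rec.cid, record=rec.value)
        cursor = resp.cursor
        if not cursor:
            break
```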
## why this matters

the "each user is one database" model enables:

- **portability**: your data is yours, stored in your PDS
- **verification**: anyone can verify record authenticity via signatures
- **aggregation**: applications build views across users without centralizing storage
- **interop**: multiple apps can read the same records if they share schemas
+135
protocols/atproto/firehose.md
# firehose

the firehose is atproto's event stream. applications subscribe to it to build aggregated views of network activity.

## the CDC model

each PDS produces a CDC (Change Data Capture) log of commits. when a user creates, updates, or deletes a record, the PDS emits a signed event. applications consume these events and update their local databases.

```
User commits record → PDS emits event → Applications consume → Local DB updated
```

this is the "sharded by user, aggregated by app" model. users have strict serial ordering within their repos. applications see causal ordering across users.

## firehose vs jetstream

two ways to consume network events:

### firehose (raw)

the protocol-level stream. binary format (CBOR), includes full cryptographic proofs.

```
com.atproto.sync.subscribeRepos
```

- full merkle proofs for verification
- CAR file blocks for data
- higher bandwidth, more complex parsing
- required for archival, moderation decisions, anything needing authenticity guarantees

### jetstream

a simplified relay. JSON format, filtering support.

```
wss://jetstream2.us-east.bsky.network/subscribe
```

- JSON encoding (easier to parse)
- filter by collection or DID
- compressed, lower bandwidth
- no cryptographic proofs - data isn't self-authenticating

use jetstream for:

- prototypes and experiments
- bots and interactive tools
- applications where you trust the relay

use firehose for:

- archival systems
- moderation services
- anything requiring proof of authenticity

from [music-atmosphere-feed](https://github.com/zzstoatzz/music-atmosphere-feed) - uses jetstream to filter posts containing music links.

## event structure

firehose events contain:

```
repo: did:plc:xyz             # whose repo
rev: 3jui7akfj2k2a            # commit revision
seq: 12345                    # sequence number
time: 2024-01-01T00:00:00Z    # timestamp
ops: [                        # operations in this commit
  {
    action: "create",         # create, update, delete
    path: "app.bsky.feed.post/abc123",
    cid: bafyrei...
  }
]
blocks: <CAR data>            # actual record content
```

jetstream simplifies this to JSON with the record content inline.

## consuming events

pattern from plyr.fm and follower-weight:

```python
async def consume_firehose():
    async for event in firehose.subscribe():
        for op in event.ops:
            if op.collection == "fm.plyr.track":
                if op.action == "create":
                    await index_track(event.repo, op.rkey, op.record)
                elif op.action == "delete":
                    await remove_track(event.repo, op.rkey)
```

## batch processing

for high-volume consumption, batch writes:

```python
BATCH_SIZE = 1000
buffer = []

async for event in firehose.subscribe():
    buffer.append(event)
    if len(buffer) >= BATCH_SIZE:
        await bulk_insert(buffer)
        await ack_cursor(event.seq)  # ack AFTER persistence
        buffer = []
```

critical: acknowledge cursor only after successful persistence. if you crash, you'll replay from the last ack.

from [follower-weight](https://github.com/zzstoatzz/follower-weight) - batches 1000 events, acks after postgres commit.
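the snippets above are written against an abstract `firehose` client. for jetstream specifically, a consumer is just a websocket reading JSON; a sketch using the `websockets` library - the event field names reflect jetstream's framing, but treat the exact shape as an assumption:

```python
# sketch: consuming jetstream filtered to one collection. kind/commit fields
# follow jetstream's JSON framing as understood here; verify against the
# relay you actually connect to.
import asyncio
import json

import websockets

JETSTREAM = (
    "wss://jetstream2.us-east.bsky.network/subscribe"
    "?wantedCollections=app.bsky.feed.post"
)


async def consume() -> None:
    async with websockets.connect(JETSTREAM) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event.get("kind") != "commit":
                continue  # skip identity/account events
            commit = event["commit"]
            if commit["operation"] == "create":
                record = commit["record"]
                print(event["did"], commit["collection"], record.get("text", ""))


asyncio.run(consume())
```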
## cursor management

firehose supports resumption via cursor (sequence number):

```python
# resume from where we left off
cursor = await db.get_last_cursor()
async for event in firehose.subscribe(cursor=cursor):
    # process...
    await db.save_cursor(event.seq)
```

store the cursor persistently. on restart, resume from the stored position.

## why this matters

the firehose enables "cooperative computing":

- third parties can build first-party experiences (feeds, search, analytics)
- no API rate limits or access restrictions on public data
- applications compete on what they build, not what data they have access to

the For You algorithm on bluesky runs on someone's gaming PC, consuming the same firehose as bluesky itself.
+79
protocols/atproto/identity.md
# identity

identity in atproto separates "who you are" from "where you're hosted."

## DIDs

a DID (Decentralized Identifier) is your permanent identity. it looks like:

```
did:plc:xbtmt2zjwlrfegqvch7fboei
```

the DID never changes, even if you move to a different PDS. this is what makes account migration possible - your identity isn't tied to your host.

atproto primarily uses `did:plc`, where the PLC Directory (`plc.directory`) maintains a mapping from DIDs to their current metadata: signing keys, PDS location, and associated handles.

`did:web` is also supported, using DNS as the resolution mechanism. this gives you full control but requires maintaining infrastructure.

## handles

a handle is the human-readable name:

```
zzstoatzz.io
pfrazee.com
```

handles are DNS-based. you prove ownership by either:

- adding a DNS TXT record at `_atproto.yourdomain.com`
- serving a file at `/.well-known/atproto-did`

handles can change. they're aliases to DIDs, not identities themselves. if you lose a domain, you lose the handle but keep your DID and all your data.

## resolution

to find a user:

1. resolve handle → DID (via DNS or well-known)
2. resolve DID → DID document (via PLC directory)
3. DID document contains PDS endpoint
4. query PDS for data

```python
# simplified resolution flow
handle = "zzstoatzz.io"
did = resolve_handle(handle)   # → did:plc:...
doc = resolve_did(did)         # → {service: [...], alsoKnownAs: [...]}
pds_url = doc["service"][0]["serviceEndpoint"]
```

## caching

DID resolution is expensive (HTTP calls to PLC directory). cache aggressively:

```python
import time

_did_cache: dict[str, tuple[str, float]] = {}
DID_CACHE_TTL = 3600  # 1 hour

async def get_did(handle: str) -> str:
    if handle in _did_cache:
        did, ts = _did_cache[handle]
        if time.time() - ts < DID_CACHE_TTL:
            return did
    did = await resolve_handle(handle)  # as in the resolution flow above
    _did_cache[handle] = (did, time.time())
    return did
```

from [at-me](https://github.com/zzstoatzz/at-me) - caches DID resolutions with 1-hour TTL.

## why this matters

the separation of identity (DID) from location (PDS) and presentation (handle) is what enables the "connected clouds" model. you can:

- switch PDS providers without losing followers
- use your own domain as your identity
- maintain identity even if banned from specific applications

your identity is yours. hosting is a service you can change.
+128
protocols/atproto/labels.md
# labels

labels are signed assertions about content. they're how applications do moderation without affecting the underlying data.

## the key distinction

remember the two roles:

- **PDS**: hosts accounts, affects all applications
- **Application**: consumes data, affects only itself

if a PDS takes down your account, you're gone from all applications until you migrate. this is the nuclear option - reserved for illegal content and network abuse.

labels are the application-level mechanism. when bluesky labels your content, it affects bluesky. leaflet can ignore those labels entirely.

from [update on protocol moderation](https://leaflet.pub/pfrazee.com/3lgy73zy4bc2a) - paul frazee

## what labels are

labels are metadata objects, not repository records. they don't live in anyone's repo. a labeler service produces them and serves them via XRPC.

```json
{
  "ver": 1,
  "src": "did:plc:labeler-did",
  "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
  "cid": "bafyreig...",
  "val": "copyright-violation",
  "neg": false,
  "cts": "2025-11-30T12:00:00.000Z",
  "sig": "<base64-signature>"
}
```

- **src**: who made this assertion (labeler DID)
- **uri**: what content it's about
- **val**: the label value
- **neg**: true if this negates a previous label
- **sig**: cryptographic signature

## signed assertions

labels are signed using DAG-CBOR + secp256k1 (same as repo commits). anyone can verify a label came from the claimed labeler by checking the signature against the labeler's public key in their DID document.

this enables trust decisions: you can choose which labelers you trust and how to interpret their labels.

## labeler services

a labeler is a service that:

1. analyzes content (automated or manual review)
2. produces signed labels
3. serves labels via `com.atproto.label.queryLabels`

```
POST /emit-label
{
  "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
  "val": "copyright-violation",
  "cid": "bafyreig..."
}
```

from [plyr.fm moderation service](https://github.com/zzstoatzz/plyr.fm/blob/main/docs/moderation/atproto-labeler.md) - runs copyright detection, emits labels for flagged tracks.

## stackable moderation

multiple labelers can label the same content. applications choose:

- which labelers to subscribe to
- how to interpret each label value
- what action to take (hide, warn, ignore)

this is "stackable moderation" - layers of independent assertions that clients compose into a moderation policy.

## negation

to revoke a label, emit the same label with `neg: true`:

```json
{
  "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
  "val": "copyright-violation",
  "neg": true
}
```

use cases:

- false positive resolved after review
- artist provided licensing proof
- DMCA counter-notice accepted

## label values

common patterns:

| val | meaning |
|-----|---------|
| `!takedown` | strong: hide from view |
| `!warn` | show warning before content |
| `copyright-violation` | potential copyright issue |
| `explicit` | adult content |
| `spam` | suspected spam |

applications define how to handle each value. `!takedown` conventionally means "don't show this" but applications make that choice.
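a sketch of what that client-side choice might look like. the policy table and `Action` names are illustrative, not part of the protocol - atproto only transports labels; interpretation is up to each application:

```python
# sketch: map label values to display actions and pick the strictest one.
# POLICY is an assumption standing in for per-app (or per-user) preferences.
from enum import Enum


class Action(Enum):
    SHOW = "show"
    WARN = "warn"
    HIDE = "hide"


POLICY: dict[str, Action] = {
    "!takedown": Action.HIDE,
    "!warn": Action.WARN,
    "copyright-violation": Action.HIDE,
    "explicit": Action.WARN,
    "spam": Action.WARN,
}

SEVERITY = {Action.SHOW: 0, Action.WARN: 1, Action.HIDE: 2}


def resolve_action(labels: list[dict]) -> Action:
    """pick the strictest action across a record's labels, honoring negations."""
    negated = {(lab["src"], lab["val"]) for lab in labels if lab.get("neg")}
    action = Action.SHOW
    for lab in labels:
        if lab.get("neg") or (lab["src"], lab["val"]) in negated:
            continue  # this assertion was revoked by a neg label from the same labeler
        candidate = POLICY.get(lab["val"], Action.SHOW)
        if SEVERITY[candidate] > SEVERITY[action]:
            action = candidate
    return action
```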
## querying labels

```
GET /xrpc/com.atproto.label.queryLabels?uriPatterns=at://did:plc:*

{
  "cursor": "456",
  "labels": [...]
}
```

applications can query labels for content they're about to display and apply their moderation policy.

## why this matters

labels separate moderation concerns:

- **PDS operators** handle illegal content and network abuse
- **Applications** handle policy violations for their context
- **Users** can choose which labelers to trust

no single entity controls moderation for the entire network. applications compete on moderation quality. users can route around overzealous or insufficient moderation by choosing different apps or labelers.
+122
protocols/atproto/lexicons.md
# lexicons

lexicons are atproto's schema system. they define what records look like and what APIs accept.

## NSIDs

a Namespace ID identifies a lexicon:

```
fm.plyr.track
app.bsky.feed.post
com.atproto.repo.createRecord
```

format is reverse-DNS. the domain owner controls that namespace. this prevents collisions and makes ownership clear.

## defining a lexicon

```json
{
  "lexicon": 1,
  "id": "fm.plyr.track",
  "defs": {
    "main": {
      "type": "record",
      "key": "tid",
      "record": {
        "type": "object",
        "required": ["title", "artist", "audioUrl", "createdAt"],
        "properties": {
          "title": {"type": "string"},
          "artist": {"type": "string"},
          "audioUrl": {"type": "string", "format": "uri"},
          "album": {"type": "string"},
          "duration": {"type": "integer"},
          "createdAt": {"type": "string", "format": "datetime"}
        }
      }
    }
  }
}
```

from [plyr.fm/lexicons/track.json](https://github.com/zzstoatzz/plyr.fm/blob/main/lexicons/track.json)

## record keys

- **tid**: timestamp-based ID. for records where users have many (tracks, likes, posts).
- **literal:self**: singleton. for records where users have one (profile).

```json
"key": "tid"            // generates 3jui7akfj2k2a
"key": "literal:self"   // always "self"
```

## knownValues

extensible enums. the schema declares known values but validators won't reject unknown ones:

```json
"listType": {
  "type": "string",
  "knownValues": ["album", "playlist", "liked"]
}
```

this allows schemas to evolve without breaking existing records. new values can be added; old clients just won't recognize them.

from [plyr.fm list lexicon](https://github.com/zzstoatzz/plyr.fm/blob/main/lexicons/list.json)

## namespace discipline

plyr.fm uses environment-aware namespaces:

| environment | namespace |
|-------------|-----------|
| production | `fm.plyr` |
| staging | `fm.plyr.stg` |
| development | `fm.plyr.dev` |

never hardcode namespaces. configure via settings so dev/staging don't pollute production data.

important: don't reuse another app's lexicons even for similar concepts. plyr.fm defines `fm.plyr.like` rather than using `app.bsky.feed.like`. this maintains namespace isolation and avoids coupling to another app's schema evolution.

## shared lexicons

for true interoperability, multiple apps can agree on a common schema:

```
audio.ooo.track   # shared schema for audio content
```

plyr.fm writes to `audio.ooo.track` (production) so other audio apps can read the same records. this follows the pattern at [standard.site](https://standard.site).

benefits:

- one schema for discovery, any app can read it
- content is portable - tracks live in your PDS, playable anywhere
- platform-specific features live as extensions, not forks

from [plyr.fm shared audio lexicon research](https://github.com/zzstoatzz/plyr.fm/blob/main/docs/research/2026-01-03-shared-audio-lexicon.md)
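a sketch of how namespace discipline plays out when writing records, assuming the atproto python SDK; the environment variable, settings default, and record fields are illustrative, not plyr.fm's actual configuration:

```python
# sketch: build the collection NSID from configuration instead of hardcoding
# it, so dev/staging writes never land in the production namespace.
# PLYR_NAMESPACE is a hypothetical setting; fields mirror fm.plyr.track above.
import os
from datetime import datetime, timezone

from atproto import AsyncClient

NAMESPACE = os.environ.get("PLYR_NAMESPACE", "fm.plyr.dev")  # "fm.plyr" in production


async def create_track(client: AsyncClient, title: str, artist: str, audio_url: str):
    collection = f"{NAMESPACE}.track"
    return await client.com.atproto.repo.create_record(
        {
            "repo": client.me.did,  # write to the logged-in user's own repo
            "collection": collection,
            "record": {
                "$type": collection,
                "title": title,
                "artist": artist,
                "audioUrl": audio_url,
                "createdAt": datetime.now(timezone.utc).isoformat(),
            },
        }
    )
```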
## schema evolution

atproto schemas can only:

- add optional fields
- add new knownValues

you cannot:

- remove fields
- change required fields
- change field types

plan schemas carefully. once published, breaking changes aren't possible.

## why this matters

lexicons enable the "cooperative computing" model:

- apps agree on schemas → they can read each other's data
- namespace ownership → no collisions, clear responsibility
- extensibility → schemas evolve without breaking
- shared lexicons → true cross-app interoperability