docs: add lexicons overview documentation (#514)

living documentation explaining:
- what lexicons are and how ATProto uses them
- our fm.plyr namespace and environment awareness
- each lexicon (track, like, comment, list, profile) with history
- ATProto primitives we use (tid, literal:self, strongRef, knownValues)
- local indexing pattern for performance
- future codegen plans (issue #494)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>

authored by zzstoatzz.io and Claude, committed by GitHub (31547edd, 6143ab1a)

Changed files: `docs/lexicons/overview.md` (+152)
# plyr.fm Lexicons

> **note**: this is living documentation. the lexicon JSON definitions in `/lexicons/` are the source of truth.

## what are lexicons?

lexicons are ATProto's schema system for defining record types and API methods. each schema uses a **Namespace ID (NSID)** in reverse-DNS format (e.g., `fm.plyr.track`) to uniquely identify it across the network.

for background, see:
- [ATProto lexicon guide](https://atproto.com/guides/lexicon)
- [ATProto data model](https://atproto.com/guides/data-repos)

## our namespace

plyr.fm uses the `fm.plyr` namespace for all custom record types. this is environment-aware:

| environment | namespace |
|-------------|-----------|
| production | `fm.plyr` |
| staging | `fm.plyr.stg` |
| development | `fm.plyr.dev` |

**important**: we never use Bluesky's `app.bsky.*` lexicons. even for concepts like "likes" that Bluesky has, we define our own (`fm.plyr.like`) to maintain namespace isolation and avoid coupling to another app's schema evolution.

## current lexicons

### fm.plyr.track

the core content record - an audio track uploaded by an artist.

```
key: tid (timestamp-based ID)
required: title, artist, audioUrl, fileType, createdAt
optional: album, duration, features, imageUrl
```

this was the first lexicon, established when the project began. tracks are stored in the user's PDS (Personal Data Server) and indexed by plyr.fm for discovery.

### fm.plyr.like

engagement signal indicating a user liked a track.

```
key: tid
required: subject (strongRef to track), createdAt
```

introduced in november 2025. uses `com.atproto.repo.strongRef` to reference the target track by URI and CID, which is the standard ATProto pattern for cross-record references.
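as a rough illustration of how the namespace table and the strongRef subject fit together, here is a minimal sketch that builds the value of a like record. it assumes the record name is simply appended to the environment prefix (so staging likes would live in `fm.plyr.stg.like`). `collection`, `StrongRef`, and `like_record` are hypothetical helpers, not plyr.fm's actual code, and the URI and CID are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any

# environment -> namespace prefix, from the table in "our namespace"
NAMESPACES = {
    "production": "fm.plyr",
    "staging": "fm.plyr.stg",
    "development": "fm.plyr.dev",
}


def collection(environment: str, name: str) -> str:
    """Return the NSID for a record type (assumption: name is appended to the prefix)."""
    return f"{NAMESPACES[environment]}.{name}"


@dataclass(frozen=True)
class StrongRef:
    """A com.atproto.repo.strongRef: the record's AT URI plus the CID of one version."""

    uri: str
    cid: str


def like_record(environment: str, track: StrongRef, now: datetime | None = None) -> dict[str, Any]:
    """Build the value of a like record pointing at a specific version of a track."""
    if now is None:
        now = datetime.now(timezone.utc)  # expects an aware UTC datetime
    return {
        "$type": collection(environment, "like"),
        "subject": {"uri": track.uri, "cid": track.cid},  # strongRef to the track
        "createdAt": now.isoformat().replace("+00:00", "Z"),  # RFC 3339, UTC
    }


# placeholder URI and CID, not a real record
ref = StrongRef(uri="at://did:plc:xyz/fm.plyr.stg.track/abc123", cid="bafyreig...")
record = like_record("staging", ref, now=datetime(2025, 11, 1, tzinfo=timezone.utc))
print(record["$type"])      # fm.plyr.stg.like
print(record["createdAt"])  # 2025-11-01T00:00:00Z
```

a dict like this is what would be passed as the `record` input to `com.atproto.repo.createRecord`, together with `repo` (the user's DID) and `collection`. pinning both URI and CID means the like identifies exactly which version of the track was liked.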
an early implementation mistakenly used `app.bsky.feed.like` before being corrected to use our own namespace - a lesson in why namespace discipline matters.

### fm.plyr.comment

timed comments anchored to playback positions, similar to SoundCloud.

```
key: tid
required: subject (strongRef to track), text, timestampMs, createdAt
optional: updatedAt
```

introduced in november 2025. the `timestampMs` field captures the playback position (in milliseconds) when the comment was made, enabling "click to seek" functionality.

### fm.plyr.list

generic ordered collection for playlists, albums, and liked track lists.

```
key: tid
required: items (array of strongRefs), createdAt
optional: name, listType, updatedAt
```

introduced in december 2025. the `listType` field uses `knownValues` (an ATProto pattern for extensible enums) with current values: `album`, `playlist`, `liked`.

this lexicon went through several iterations:
1. initially designed specifically for playlists
2. generalized to support albums and liked collections
3. simplified to just reference any record type via strongRef

### fm.plyr.actor.profile

artist profile metadata specific to plyr.fm.

```
key: literal:self (singleton - only one per user)
required: createdAt
optional: bio, updatedAt
```

introduced in december 2025. uses `literal:self` as the record key, meaning each user can only have one profile record. this is updated via `putRecord` with `rkey="self"`.

## ATProto primitives we use

### record keys

- **tid**: timestamp-based IDs, either supplied by the client or assigned by the PDS when a record is created without an explicit key. used for most records where multiple instances per user are expected (tracks, likes, comments, lists).
- **literal:self**: a fixed key for singleton records. used for the profile, where only one record per user should exist.
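for reference, the ATProto TID spec defines a TID as a 64-bit integer (top bit zero, then 53 bits of microseconds since the UNIX epoch, then a 10-bit clock identifier) written as 13 characters of base32-sortable. the sketch below encodes and decodes that layout. `make_tid` and `tid_timestamp_micros` are hypothetical helpers, and unlike production generators this one does not guarantee monotonic output when called twice within the same microsecond.

```python
from __future__ import annotations

import random
import time

# base32-sortable alphabet: characters are in ASCII order, so string order == numeric order
B32_SORTABLE = "234567abcdefghijklmnopqrstuvwxyz"


def make_tid(micros: int | None = None, clock_id: int | None = None) -> str:
    """Encode a microsecond timestamp and a 10-bit clock ID as a 13-character TID."""
    if micros is None:
        micros = time.time_ns() // 1_000  # microseconds since the UNIX epoch
    if clock_id is None:
        clock_id = random.randrange(1024)  # random 10-bit clock identifier
    # 64-bit layout: top bit 0 | 53 bits of timestamp | 10 bits of clock ID
    value = ((micros & ((1 << 53) - 1)) << 10) | (clock_id & 0x3FF)
    chars = []
    for _ in range(13):  # 13 chars x 5 bits covers all 64 bits
        chars.append(B32_SORTABLE[value & 31])
        value >>= 5
    return "".join(reversed(chars))


def tid_timestamp_micros(tid: str) -> int:
    """Decode a TID back to its microsecond timestamp by dropping the clock ID bits."""
    value = 0
    for ch in tid:
        value = (value << 5) | B32_SORTABLE.index(ch)
    return value >> 10


a = make_tid(micros=1_700_000_000_000_000, clock_id=7)
b = make_tid(micros=1_700_000_000_000_001, clock_id=7)
print(len(a), a < b)            # 13 True
print(tid_timestamp_micros(a))  # 1700000000000000
```

because the alphabet is in ASCII order and the width is fixed, sorting TIDs as plain strings sorts them by creation time, which is what makes them convenient keys for tracks, likes, comments, and lists.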
### strongRef

`com.atproto.repo.strongRef` is ATProto's standard way to reference another record:

```json
{
  "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
  "cid": "bafyreig..."
}
```

the URI identifies the record; the CID is the content hash of one specific version of it. we use strongRefs in likes (referencing tracks), comments (referencing tracks), and lists (referencing any records).

### knownValues

instead of a closed `enum`, ATProto lexicons can declare `knownValues` for extensible value sets. our `fm.plyr.list.listType` field declares known values, but validators won't reject unknown values - this allows the schema to evolve without breaking existing records.

## local indexing

ATProto records in user PDSes are the source of truth, but querying across PDSes is slow. we maintain local database tables that index records for efficient queries:

- `tracks` table indexes `fm.plyr.track` records
- `track_likes` table indexes `fm.plyr.like` records
- `track_comments` table indexes `fm.plyr.comment` records
- `playlists` table indexes `fm.plyr.list` records

the sync pattern: when a user logs in, we fetch their records from their PDS and update our local index. background jobs keep indexes fresh.

## future: codegen from lexicon JSON

currently, our Python models are hand-written to match the lexicon JSON definitions. keeping the two in sync by hand is error-prone.

issue [#494](https://github.com/zzstoatzz/plyr.fm/issues/494) tracks building a portable lexicon-to-Pydantic codegen tool. the goal is to generate models directly from the JSON definitions in `/lexicons/`, ensuring the code always matches the schema.

a Rust-based SDK for this purpose is in development. once complete, the workflow will be:

1. edit lexicon JSON definitions
2. run codegen to regenerate Python models
3. use generated models in application code

this removes the manual sync burden and enables type-safe ATProto record handling.

## adding new lexicons

when adding a new record type:

1. create the JSON definition in `/lexicons/`
2. add the collection to `AtprotoSettings` in `backend/src/backend/config.py`
3. add the OAuth scope in the auth flow
4. create a database migration for local indexing
5. implement API endpoints and sync logic

see existing lexicons as templates. keep records minimal - ATProto schemas can only add optional fields after publication, never remove or change required fields.
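the last rule can be checked mechanically before a lexicon change lands. a minimal sketch, assuming lexicon files follow the standard layout (`defs.main.record` holding `required` and `properties`) and treating any removed field, changed type, or change to the required set as breaking. `record_schema` and `breaking_changes` are hypothetical names, and the profile fixture is a simplified stand-in, not the real file in `/lexicons/`.

```python
import copy
from typing import Any


def record_schema(lexicon: dict[str, Any]) -> dict[str, Any]:
    """Return the object schema under a record lexicon's `main` def."""
    return lexicon["defs"]["main"]["record"]


def breaking_changes(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """List changes in `new` that could break records written against `old`.

    Assumed rules: existing fields keep their name and type, and the set of
    required fields never changes; only new optional fields may be added.
    """
    old_rec, new_rec = record_schema(old), record_schema(new)
    old_props = old_rec.get("properties", {})
    new_props = new_rec.get("properties", {})
    old_req = set(old_rec.get("required", []))
    new_req = set(new_rec.get("required", []))

    problems: list[str] = []
    for name, spec in old_props.items():
        if name not in new_props:
            problems.append(f"removed field: {name}")
        elif new_props[name].get("type") != spec.get("type"):
            problems.append(f"changed type: {name}")
    problems += [f"new required field: {n}" for n in sorted(new_req - old_req)]
    problems += [f"no longer required: {n}" for n in sorted(old_req - new_req)]
    return problems


# simplified stand-in for a profile lexicon (not the real /lexicons/ JSON)
v1 = {
    "lexicon": 1,
    "id": "fm.plyr.actor.profile",
    "defs": {
        "main": {
            "type": "record",
            "key": "literal:self",
            "record": {
                "type": "object",
                "required": ["createdAt"],
                "properties": {
                    "createdAt": {"type": "string", "format": "datetime"},
                    "bio": {"type": "string"},
                },
            },
        }
    },
}

# adding an optional field is allowed
v2 = copy.deepcopy(v1)
record_schema(v2)["properties"]["updatedAt"] = {"type": "string", "format": "datetime"}

# promoting an existing optional field to required is not
v3 = copy.deepcopy(v1)
record_schema(v3)["required"].append("bio")

print(breaking_changes(v1, v2))  # []
print(breaking_changes(v1, v3))  # ['new required field: bio']
```

comparing the last published JSON against an edited file this way would flag a change like `v3` before it ships, while still letting additive changes like `v2` through.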