an atproto pds written in F# (.NET 9) 🦒
Topics: pds, fsharp, giraffe, dotnet, atproto

docs: add notes, update plans, and reorganize README

Changed files: +264 -70
PDSharp.Docs/car.md (+50, new file)
# CAR Format Implementation Notes

The **Content Addressable aRchives (CAR)** format is used to store content-addressable objects (IPLD blocks) as a sequence of bytes.
It is the standard format for repository export (`sync.getRepo`) and block transfer (`sync.getBlocks`) in the AT Protocol.

## 1. Format Overview (v1)

A CAR file consists of a **Header** followed by a sequence of **Data** sections.

```text
|--------- Header --------| |--------------- Data Section 1 ---------------| |--------------- Data Section 2 ---------------| ...
[ varint | DAG-CBOR block ] [ varint | CID bytes | Block Data bytes ]       [ varint | CID bytes | Block Data bytes ]       ...
```

### LEB128 Varints

All length prefixes in CAR are encoded as **unsigned LEB128 (UVarint)** integers.

- Used to prefix the Header block.
- Used to prefix each Data section.

## 2. Header

The header is a single DAG-CBOR encoded block describing the archive.

**Encoding:**

1. Construct the CBOR map: `{ "version": 1, "roots": [<cid>, ...] }`.
2. Encode as DAG-CBOR bytes.
3. Prefix with the length of those bytes (as UVarint).

## 3. Data Sections

Following the header, the file contains a concatenated sequence of data sections. Each section represents one IPLD block.

```text
[ Section Length (UVarint) ] [ CID (raw bytes) ] [ Binary Data ]
```

- **Section Length**: the total length of the *CID bytes* plus the *Binary Data*.
- **CID**: the raw binary representation of the block's CID (usually CIDv1 + DAG-CBOR + SHA2-256).
- **Binary Data**: the actual content of the block.

Note that the Section Length *includes* the length of the CID; this differs from framing formats where the length covers only the payload.

## 4. References

- [IPLD CARv1 Specification](https://ipld.io/specs/transport/car/carv1/)
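Editor's note: as an informal illustration of the framing described in these notes (the project itself is F#; this sketch is Python, and the helper names are hypothetical):

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 bits per byte,
    high bit set on every byte except the last)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_section(cid_bytes: bytes, block_data: bytes) -> bytes:
    """Frame one CAR data section: the length prefix covers CID + payload."""
    body = cid_bytes + block_data
    return encode_uvarint(len(body)) + body
```

A full writer would first emit the header the same way: DAG-CBOR-encode `{ "version": 1, "roots": [...] }`, then prefix it with `encode_uvarint(len(header_bytes))`.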
PDSharp.Docs/cbor.md (+46, new file)
# DAG-CBOR Implementation Notes

DAG-CBOR is the canonical data serialization format for the AT Protocol.
It is a strict subset of CBOR (RFC 8949) with specific rules for determinism and linking.

## 1. Canonicalization Rules

To ensure consistent Content IDs (CIDs) for the same data, specific canonicalization rules must be followed during encoding.

### Map Key Sorting

Maps must be sorted by key. The sorting order is **not** plain lexicographical order:

1. **Length**: shorter keys come first.
2. **Bytes**: keys of the same length are sorted lexicographically by their UTF-8 byte representation.

**Example:**

- `"a"` (length 1) comes before `"aa"` (length 2).
- `"b"` (length 1) comes before `"aa"` (length 2).
- `"a"` comes before `"b"`.

### Integer Encoding

Integers must be encoded using the smallest possible representation.

`System.Formats.Cbor` (in Strict mode) generally handles this, but care must be taken to treat `int`, `int64`, and `uint64` consistently.

## 2. Content Addressing (CIDs)

Links to other nodes (CIDs) are encoded using **CBOR Tag 42**.

### Format

1. **Tag**: `42` (major type 6, value 42).
2. **Payload**: a byte string containing:
   - the `0x00` byte (multibase identity prefix, required by the IPLD specs for binary CID inclusion), followed by
   - the raw bytes of the CID.

## 3. Known Gotchas

- **Float vs Int**:
  The AT Protocol generally discourages floats where integers suffice.
  F# types must be matched carefully to avoid encoding `2.0` instead of `2`.
- **String Encoding**:
  Must be UTF-8. Indefinite-length strings are prohibited in DAG-CBOR.
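Editor's note: the key-sort order and the Tag 42 payload rule above can be sketched as follows (Python for illustration only; the project implements this in F# over `System.Formats.Cbor`, and these function names are hypothetical):

```python
def dag_cbor_key_sort(keys):
    """DAG-CBOR canonical map key order: shorter UTF-8 encodings first,
    ties broken by bytewise comparison of the UTF-8 bytes."""
    return sorted(keys, key=lambda k: (len(k.encode("utf-8")), k.encode("utf-8")))


def cid_link_payload(cid_bytes: bytes) -> bytes:
    """Byte-string payload carried under CBOR Tag 42: the 0x00 identity
    multibase prefix followed by the raw CID bytes."""
    return b"\x00" + cid_bytes
```

For example, `dag_cbor_key_sort(["aa", "b", "a"])` yields `["a", "b", "aa"]`, matching the worked example in the notes.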
PDSharp.Docs/mst.md (+69, new file)
# Merkle Search Tree (MST) Implementation Notes

The Merkle Search Tree (MST) is a probabilistic, balanced search tree used by the AT Protocol to store repository records.

## Overview

MSTs combine properties of B-trees and Merkle trees to ensure:

1. **Determinism**: the tree structure is determined by the keys (and their hashes), not by insertion order.
2. **Verifiability**: every node is content-addressed (CID), allowing the entire state to be verified via a single root hash.
3. **Efficiency**: efficient key-value lookups and delta-based sync (subtrees that haven't changed share the same CIDs).

## Core Concepts

### Layering (Probabilistic Balance)

MSTs do not use rotations for balancing. Instead, every key is assigned a "layer" based on its hash.

- **Formula**:
  `Layer(key) = countLeadingZeroBits(SHA256(key)) / 2`.
- **Fanout**:
  The divisor `2` implies a fanout of roughly 4 (2 bits per layer increment).
- Keys with higher layers appear higher in the tree, splitting the range of keys below them.

### Data Structure (`MstNode`)

An MST node consists of:

- **Left Child (`l`)**: used to traverse to keys lexicographically smaller than the first entry in this node.
- **Entries (`e`)**: a sorted list of entries. Each entry contains:
  - **Prefix Length (`p`)**: length of the shared prefix with the *previous* key in the node (or the split key).
  - **Key Suffix (`k`)**: the remaining bytes of the key.
  - **Value (`v`)**: the CID of the record data.
  - **Tree (`t`)**: (optional) CID of the subtree containing keys between this entry and the next.

**Serialization**: the node is serialized as a DAG-CBOR map: `{ "l": <cid | null>, "e": [e1, e2, ...] }`.

## Algorithms

### Insertion (`Put`)

Insertion relies on the layer property:

1. Calculate `Layer(newKey)`.
2. Traverse the tree from the root.
3. **Split Condition**: if `Layer(newKey)` is **greater** than the layer of the current node, the new key belongs *above* this node.
   - The current node is **split** into two children (left and right) around `newKey`.
   - `newKey` becomes a new node pointing to these two children.
4. **Recurse**: if `Layer(newKey)` is **less** than the layer of the current node, find the correct child subtree (based on key comparison) and recurse.
5. **Same Layer**: if `Layer(newKey)` equals the current node's layer:
   - Insert directly into the sorted entries list.
   - Any existing child pointer at that position must be split and redistributed if necessary (though the spec usually implies layers are unique enough, or that this is handled by a standard BST-style insert at that level).

### Deletion

1. Locate the key.
2. Remove the entry.
3. **Merge**:
   Since the key acted as a separator for two subtrees (its preceding left child and its `Tree` child), removing it requires merging these two adjacent subtrees into a single valid MST node to preserve the tree's density and structure.

### Determinism & Prefix Compression

- **Canonical Order**: keys must always be sorted.
- **Prefix Compression**:
  Crucial for space savings.
  The prefix length `p` is calculated relative to the *immediately preceding key* in the node.
- **Issues**:
  Insertion order *should not* matter (commutativity).
  However, implementations must be careful with `Split` and `Merge` operations to ensure exactly the same node boundaries are created regardless of history.
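Editor's note: the layer formula above is small enough to sketch directly (Python for illustration; the project computes this in F#, and the function names here are hypothetical):

```python
import hashlib


def leading_zero_bits(data: bytes) -> int:
    """Count leading zero bits in a byte string."""
    bits = 0
    for b in data:
        if b == 0:
            bits += 8
        else:
            bits += 8 - b.bit_length()  # zero bits before the top set bit
            break
    return bits


def mst_layer(key: str) -> int:
    """Layer(key) = leading zero bits of SHA-256(key), counted in
    2-bit increments (fanout ~4)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return leading_zero_bits(digest) // 2
```

Because the layer depends only on the key's hash, two implementations that agree on this function (and on canonical key order) will build byte-identical trees for the same record set, regardless of insertion order.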
README.md (+72 -35)
Before:

```
···
# PDSharp

- > A Personal Data Server (PDS) for the AT Protocol, written in F# with Giraffe.

## Goal
···
## Requirements

- - .NET 9.0 SDK
- - [Just](https://github.com/casey/just) (optional, for potential future task running)

## Getting Started
···
The server will start at `http://localhost:5000`.

## API Testing

- ### Server Info

```bash
curl http://localhost:5000/xrpc/com.atproto.server.describeServer
```

### Record Operations

- **Create a record:**

```bash
curl -X POST http://localhost:5000/xrpc/com.atproto.repo.createRecord \
···
  -d '{"repo":"did:web:test","collection":"app.bsky.feed.post","record":{"text":"Hello, ATProto!"}}'
```

- **Get a record** (use the rkey from createRecord response):

```bash
curl "http://localhost:5000/xrpc/com.atproto.repo.getRecord?repo=did:web:test&collection=app.bsky.feed.post&rkey=<RKEY>"
```

- **Put a record** (upsert with explicit rkey):

```bash
curl -X POST http://localhost:5000/xrpc/com.atproto.repo.putRecord \
···
  -d '{"repo":"did:web:test","collection":"app.bsky.feed.post","rkey":"my-post","record":{"text":"Updated!"}}'
```

### Sync & CAR Export

- **Get entire repository as CAR:**

```bash
curl "http://localhost:5000/xrpc/com.atproto.sync.getRepo?did=did:web:test" -o repo.car
```

- **Get specific blocks** (comma-separated CIDs):

```bash
curl "http://localhost:5000/xrpc/com.atproto.sync.getBlocks?did=did:web:test&cids=<CID1>,<CID2>" -o blocks.car
```

- **Get a blob by CID:**

```bash
curl "http://localhost:5000/xrpc/com.atproto.sync.getBlob?did=did:web:test&cid=<BLOB_CID>"
```

### Firehose (WebSocket)

Subscribe to real-time commit events using [websocat](https://github.com/vi/websocat):

- ```bash
- # Install websocat (macOS)
- brew install websocat

- # Connect to firehose
websocat ws://localhost:5000/xrpc/com.atproto.sync.subscribeRepos
```

Then create/update records in another terminal to see CBOR-encoded commit events stream in real time.

- **With cursor for resumption:**

```bash
websocat "ws://localhost:5000/xrpc/com.atproto.sync.subscribeRepos?cursor=5"
```

- ## Configuration
-
- The application uses `appsettings.json` and supports Environment Variable overrides.
-
- | Key         | Env Var             | Default                 | Description               |
- | ----------- | ------------------- | ----------------------- | ------------------------- |
- | `DidHost`   | `PDSHARP_DidHost`   | `did:web:localhost`     | The DID of the PDS itself |
- | `PublicUrl` | `PDSHARP_PublicUrl` | `http://localhost:5000` | Publicly reachable URL    |
-
- Example `appsettings.json`:
-
- ```json
- {
-   "PublicUrl": "http://localhost:5000",
-   "DidHost": "did:web:localhost"
- }
- ```

## Architecture

- ### App (Giraffe)

- `XrpcRouter`: `/xrpc/<NSID>` routing
- `Auth`: Session management (JWTs)
- `RepoApi`: Write/Read records (`putRecord`, `getRecord`)
- `ServerApi`: Server meta (`describeServer`)

- ### Core (Pure F#)

- `DidResolver`: Identity resolution
- `RepoEngine`: MST, DAG-CBOR, CIDs, Blocks
- `Models`: Data types for XRPC/Database

- ### Infra

- SQLite/Postgres for persistence
- S3/Disk for blob storage
```

After:

```
···
+ <!-- markdownlint-disable MD033 -->
# PDSharp

+ A Personal Data Server (PDS) for the AT Protocol, written in F# with Giraffe.

## Goal
···
## Requirements

+ .NET 9.0 SDK

## Getting Started
···
The server will start at `http://localhost:5000`.

+ ## Configuration
+
+ The application uses `appsettings.json` and supports Environment Variable overrides.
+
+ | Key         | Env Var             | Default                 | Description               |
+ | ----------- | ------------------- | ----------------------- | ------------------------- |
+ | `DidHost`   | `PDSHARP_DidHost`   | `did:web:localhost`     | The DID of the PDS itself |
+ | `PublicUrl` | `PDSHARP_PublicUrl` | `http://localhost:5000` | Publicly reachable URL    |
+
+ Example `appsettings.json`:
+
+ ```json
+ {
+   "PublicUrl": "http://localhost:5000",
+   "DidHost": "did:web:localhost"
+ }
+ ```
+
## API Testing

+ <details>
+ <summary>Server Info</summary>

```bash
curl http://localhost:5000/xrpc/com.atproto.server.describeServer
```

+ </details>
+
### Record Operations

+ <details>
+ <summary>Create a record</summary>

```bash
curl -X POST http://localhost:5000/xrpc/com.atproto.repo.createRecord \
···
  -d '{"repo":"did:web:test","collection":"app.bsky.feed.post","record":{"text":"Hello, ATProto!"}}'
```

+ </details>
+
+ <details>
+ <summary>Get a record</summary>

```bash
curl "http://localhost:5000/xrpc/com.atproto.repo.getRecord?repo=did:web:test&collection=app.bsky.feed.post&rkey=<RKEY>"
```

+ </details>
+
+ <details>
+ <summary>Put a record</summary>

```bash
curl -X POST http://localhost:5000/xrpc/com.atproto.repo.putRecord \
···
  -d '{"repo":"did:web:test","collection":"app.bsky.feed.post","rkey":"my-post","record":{"text":"Updated!"}}'
```

+ </details>
+
### Sync & CAR Export

+ <details>
+ <summary>Get entire repository as CAR</summary>

```bash
curl "http://localhost:5000/xrpc/com.atproto.sync.getRepo?did=did:web:test" -o repo.car
```

+ </details>
+
+ <details>
+ <summary>Get specific blocks</summary>

```bash
curl "http://localhost:5000/xrpc/com.atproto.sync.getBlocks?did=did:web:test&cids=<CID1>,<CID2>" -o blocks.car
```

+ </details>
+
+ <details>
+ <summary>Get a blob by CID</summary>

```bash
curl "http://localhost:5000/xrpc/com.atproto.sync.getBlob?did=did:web:test&cid=<BLOB_CID>"
```

+ </details>
+
### Firehose (WebSocket)

Subscribe to real-time commit events using [websocat](https://github.com/vi/websocat):

+ <details>
+ <summary>Open a WebSocket connection</summary>

+ ```bash
websocat ws://localhost:5000/xrpc/com.atproto.sync.subscribeRepos
```

+ </details>
+
+ <br />
Then create/update records in another terminal to see CBOR-encoded commit events stream in real time.

+ <br />
+
+ <details>
+ <summary>Open a WebSocket connection with cursor for resumption</summary>

```bash
websocat "ws://localhost:5000/xrpc/com.atproto.sync.subscribeRepos?cursor=5"
```

+ </details>

## Architecture

+ <details>
+ <summary>App (Giraffe)</summary>

- `XrpcRouter`: `/xrpc/<NSID>` routing
- `Auth`: Session management (JWTs)
- `RepoApi`: Write/Read records (`putRecord`, `getRecord`)
- `ServerApi`: Server meta (`describeServer`)

+ </details>
+
+ <details>
+ <summary>Core (Pure F#)</summary>

- `DidResolver`: Identity resolution
- `RepoEngine`: MST, DAG-CBOR, CIDs, Blocks
- `Models`: Data types for XRPC/Database

+ </details>
+
+ <details>
+ <summary>Infra</summary>

- SQLite/Postgres for persistence
- S3/Disk for blob storage
+
+ </details>
```
roadmap.txt (+27 -35)
Before:

```
···
- [x] Conformance testing: diff CIDs/CARs/signatures vs reference PDS
DoD: Same inputs → same outputs for repo/sync surfaces
--------------------------------------------------------------------------------
- Milestone J: Persistence + Backups
--------------------------------------------------------------------------------
- Deliverables:
- - BackupOps module in Core (scheduler unit / cron / scripts, plus Litestream config)
- Backups (SQLite)
- [ ] Set PDS_SQLITE_DISABLE_WAL_AUTO_CHECKPOINT=true (Litestream-friendly)
- [ ] Run a scheduled backup/replication job that:
-   - finds recently updated DBs
-   - backs up /pds/actors/* and PDS-wide DBs
-   - runs on SIGTERM during deploys (avoid missing last writes)
- Backups (Blobs)
- [ ] Configurable Options (app settings):
-   (A) Disk blobs: include /pds/blocks in backups
-   (B) S3-compatible blobstore: rely on object-store durability
- Guardrails
- [ ] Uptime check: https://<pds>/xrpc/_health
- [ ] Alert if "latest backup" is older than N minutes.
- [ ] Alert on disk pressure for /pds.
DoD:
- - You can restore onto a fresh host and pass the P3 verification checklist.
- - Backups run automatically and are observable ("last successful backup").
- - Backup set is explicitly documented (DBs + blobs decision).
================================================================================
PHASE 2: DEPLOYMENT (Self-Host)
================================================================================
- Milestone J: Topology + Domain Planning
--------------------------------------------------------------------------------
- Choose PDS hostname (pds.example.com) vs handle domain (example.com)
- Obtain domain, DNS access, VPS with static IP, reverse proxy
DoD: Clear plan for PDS location, handle, and DID resolution
--------------------------------------------------------------------------------
- Milestone K: DNS + TLS + Reverse Proxy
--------------------------------------------------------------------------------
- DNS A/AAAA records for PDS hostname
- TLS certs (ACME) via Caddy
DoD: https://<pds-hostname> responds with valid cert
--------------------------------------------------------------------------------
- Milestone L: Deploy PDSharp
--------------------------------------------------------------------------------
- Deploy built PDS with persistence (SQLite/Postgres + blob storage)
- Verify /xrpc/com.atproto.server.describeServer
DoD: describeServer returns capabilities payload
--------------------------------------------------------------------------------
- Milestone M: Account Creation
--------------------------------------------------------------------------------
- Create account using admin tooling
- Verify authentication: createSession
DoD: Obtain session and perform authenticated write
--------------------------------------------------------------------------------
- Milestone N: Smoke Test Repo + Blobs
--------------------------------------------------------------------------------
- Write record via putRecord
- Upload blob, verify retrieval via sync.getBlob
DoD: Posts appear in clients, media loads reliably
--------------------------------------------------------------------------------
- Milestone O: Account Migration
--------------------------------------------------------------------------------
- Export/import from bsky.social
- Update DID service endpoint
- Verify handle/DID resolution
DoD: Handle unchanged, DID points to your PDS
--------------------------------------------------------------------------------
- Milestone P: Reliability
- --------------------------------------------------------------------------------
- - Backups: repo storage + database + blobs
- - Restore drill on fresh instance
- - Monitoring: uptime checks for describeServer + getBlob
- DoD: Restore from backup passes smoke tests
- --------------------------------------------------------------------------------
- Milestone Q: Updates + Security
--------------------------------------------------------------------------------
- Update cadence with rollback plan
- Rate limits and access controls at proxy
- - Log retention and disk growth alerts
DoD: Update smoothly, maintain stable federation
================================================================================
QUICK CHECKLIST
```

After:

```
···
- [x] Conformance testing: diff CIDs/CARs/signatures vs reference PDS
DoD: Same inputs → same outputs for repo/sync surfaces
--------------------------------------------------------------------------------
+ Milestone J: Storage Backend Configuration
--------------------------------------------------------------------------------
+ - [ ] Configure SQLite WAL mode (PDS_SQLITE_DISABLE_WAL_AUTO_CHECKPOINT=true)
+ - [ ] Implement S3-compatible blobstore adapter (optional via config)
+ - [ ] Configure disk-based vs S3-based blob storage selection
+ DoD: PDS runs with S3 blobs (if configured) and SQLite passes Litestream checks
+ --------------------------------------------------------------------------------
+ Milestone K: Backup Automation + Guardrails
+ --------------------------------------------------------------------------------
+ - [ ] Implement BackupOps module (scheduler/cron logic)
+ - [ ] Automated backup jobs:
+   - [ ] Databases (Litestream or raw copy) + /pds/actors backup
+   - [ ] Local disk blobs (if applicable)
+ - [ ] Guardrails & Monitoring:
+   - [ ] Uptime check endpoint: /xrpc/_health with JSON status
+   - [ ] Alerts: "Latest backup" too old, Disk pressure > 90%
+   - [ ] Log retention policies
DoD:
+ - Backups run automatically and report status
+ - Health checks indicate system state
+ - Restore drill: restoring backups onto a fresh host passes verification
+ - Backup set is explicitly documented
================================================================================
PHASE 2: DEPLOYMENT (Self-Host)
================================================================================
+ Milestone L: Topology + Domain Planning
--------------------------------------------------------------------------------
- Choose PDS hostname (pds.example.com) vs handle domain (example.com)
- Obtain domain, DNS access, VPS with static IP, reverse proxy
DoD: Clear plan for PDS location, handle, and DID resolution
--------------------------------------------------------------------------------
+ Milestone M: DNS + TLS + Reverse Proxy
--------------------------------------------------------------------------------
- DNS A/AAAA records for PDS hostname
- TLS certs (ACME) via Caddy
DoD: https://<pds-hostname> responds with valid cert
--------------------------------------------------------------------------------
+ Milestone N: Deploy PDSharp
--------------------------------------------------------------------------------
- Deploy built PDS with persistence (SQLite/Postgres + blob storage)
- Verify /xrpc/com.atproto.server.describeServer
DoD: describeServer returns capabilities payload
--------------------------------------------------------------------------------
+ Milestone O: Account Creation
--------------------------------------------------------------------------------
- Create account using admin tooling
- Verify authentication: createSession
DoD: Obtain session and perform authenticated write
--------------------------------------------------------------------------------
+ Milestone P: Smoke Test Repo + Blobs
--------------------------------------------------------------------------------
- Write record via putRecord
- Upload blob, verify retrieval via sync.getBlob
DoD: Posts appear in clients, media loads reliably
--------------------------------------------------------------------------------
+ Milestone Q: Account Migration
--------------------------------------------------------------------------------
- Export/import from bsky.social
- Update DID service endpoint
- Verify handle/DID resolution
DoD: Handle unchanged, DID points to your PDS
--------------------------------------------------------------------------------
+ Milestone R: Updates + Security
--------------------------------------------------------------------------------
- Update cadence with rollback plan
- Rate limits and access controls at proxy
DoD: Update smoothly, maintain stable federation
================================================================================
QUICK CHECKLIST
```