an atproto pds written in F# (.NET 9)
# DAG-CBOR Implementation Notes

DAG-CBOR is the canonical data serialization format for the AT Protocol.
It is a strict subset of CBOR (RFC 8949) with specific rules for determinism and linking.

## 1. Canonicalization Rules

To ensure that the same data always yields the same Content ID (CID), the encoder must follow specific canonicalization rules.

### Map Key Sorting

Map entries must be sorted by key. The order is **NOT** plain lexicographical order; it is length-first:

1. **Length**: Shorter keys come first, where length is the number of bytes in the key's UTF-8 encoding.
2. **Bytes**: Keys of the same length are sorted lexicographically by their UTF-8 byte representation.

**Example:**

- `"a"` (len 1) comes before `"aa"` (len 2).
- `"b"` (len 1) comes before `"aa"` (len 2), even though `"aa"` sorts first lexicographically.
- `"a"` comes before `"b"`.
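
The ordering above amounts to a two-part sort key: byte length first, then the bytes themselves. A minimal Python sketch, where `dag_cbor_key_order` is a hypothetical helper name used only for illustration:

```python
def dag_cbor_key_order(keys):
    """Return map keys in DAG-CBOR canonical order (length-first, then bytewise)."""
    # Compare on the UTF-8 bytes, not the characters: length is a byte count.
    return sorted(keys, key=lambda k: (len(k.encode("utf-8")), k.encode("utf-8")))


print(dag_cbor_key_order(["aa", "b", "a"]))  # ['a', 'b', 'aa']
```

An implementation in another language would need the same byte-based comparison; a culture-aware or UTF-16 string comparison can give a different order for non-ASCII keys.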
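
### Integer Encoding

Integers must be encoded using the smallest possible representation.

`System.Formats.Cbor` (in Strict mode) generally handles this when writing, but care must be taken to treat `int`, `int64`, and `uint64` consistently, so that the same numeric value produces the same bytes regardless of the .NET type it is held in.
Note that, as documented for `CborConformanceMode`, the requirement that integers be minimally encoded belongs to `Canonical` mode; `Strict` on its own is not documented to reject non-minimal integers when decoding.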
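
To make "smallest possible representation" concrete, here is a minimal Python sketch of how a CBOR integer head is chosen. It illustrates the rule from RFC 8949 and is not the `System.Formats.Cbor` implementation; `encode_int` is a hypothetical name.

```python
import struct


def encode_int(n):
    """Encode an integer as a CBOR data item using the shortest head."""
    if n >= 0:
        major, value = 0, n        # major type 0: unsigned integer
    else:
        major, value = 1, -1 - n   # major type 1: negative integer, stores -1 - n
    if value < 24:
        return bytes([(major << 5) | value])  # value fits in the initial byte
    if value < 2**8:
        return bytes([(major << 5) | 24]) + struct.pack(">B", value)
    if value < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    if value < 2**64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", value)
    raise OverflowError("outside the range a CBOR integer head can represent")


print(encode_int(23).hex())    # 17
print(encode_int(24).hex())    # 1818
print(encode_int(-1).hex())    # 20
print(encode_int(1000).hex())  # 1903e8
```

Because the width is chosen from the value alone, the number `1000` encodes to the same three bytes whether the caller held it as a 32-bit or a 64-bit integer.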

## 2. Content Addressing (CIDs)

Links to other nodes (CIDs) are encoded using **CBOR Tag 42**.

### Format

1. **Tag**: `42` (Major type 6, value 42).
2. **Payload**: A byte string containing:
   - The `0x00` byte (Multibase identity prefix, required by IPLD specs for binary CID inclusion).
   - The raw bytes of the CID.
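
The layout above can be sketched byte by byte. The following Python is illustrative only: `encode_cid_link` is a hypothetical name, and the sample CID uses a zeroed digest as a placeholder rather than a real hash.

```python
def encode_cid_link(cid_bytes):
    """Wrap binary CID bytes as a DAG-CBOR link: tag 42 around a byte string."""
    payload = b"\x00" + cid_bytes        # multibase identity prefix, then the CID
    n = len(payload)
    if n < 24:
        head = bytes([0x40 | n])         # major type 2, length in the initial byte
    elif n < 256:
        head = bytes([0x58, n])          # major type 2, one-byte length follows
    else:
        raise ValueError("CID longer than this sketch handles")
    return b"\xd8\x2a" + head + payload  # 0xd8 0x2a is tag 42 (major type 6)


# Stand-in CIDv1: version 1, dag-cbor codec (0x71), sha2-256 (0x12), 32-byte digest.
fake_cid = bytes([0x01, 0x71, 0x12, 0x20]) + bytes(32)
print(encode_cid_link(fake_cid)[:5].hex())  # d82a582500
```

For a CIDv1 with a SHA-256 digest the CID is 36 bytes, so the byte string is 37 bytes long (`0x58 0x25`) and the whole link occupies 41 bytes.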

## 3. Known Gotchas

- **Float vs Int**:
  The AT Protocol data model does not permit floating-point values in records, so numbers should be integers.
  F# types must be matched carefully to avoid encoding `2.0` instead of `2`.
- **String Encoding**:
  Must be UTF-8. Indefinite-length strings are prohibited in DAG-CBOR.
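
Both gotchas can be guarded against at the encoding boundary. The Python sketch below is an illustration under stated assumptions: `encode_text` and `require_int` are hypothetical helpers, and an F# implementation would express the second check through its own type matching instead.

```python
def encode_text(s):
    """Encode a string as a definite-length CBOR text string (major type 3)."""
    data = s.encode("utf-8")
    n = len(data)  # the length prefix counts bytes, not characters
    if n < 24:
        return bytes([0x60 | n]) + data
    if n < 256:
        return bytes([0x78, n]) + data
    if n < 65536:
        return bytes([0x79]) + n.to_bytes(2, "big") + data
    raise ValueError("string longer than this sketch handles")


def require_int(value):
    """Reject floats (and bools) so 2.0 is never silently encoded in place of 2."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an integer, got {type(value).__name__}")
    return value


print(encode_text("\u00e9").hex())  # 62c3a9
```

The length is always written up front, so the indefinite-length marker (`0x7f`) never appears in the output.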