···
>
> This project and plcbundle specification is currently unstable and under heavy development. Things can break at any time. Bundle hashes or data formats may change. **Do not** use this for production systems. Please wait for the **`1.0`** release.

-plcbundle archives AT Protocol's DID PLC Directory operations into immutable, cryptographically-chained bundles of 10,000 operations. Each bundle is hashed (SHA-256), compressed (zstd), and linked to the previous bundle, creating a verifiable chain of DID operations.

This repository contains a reference library and a CLI tool written in Go language.

···
>
> This project and plcbundle specification is currently unstable and under heavy development. Things can break at any time. Bundle hashes or data formats may change. **Do not** use this for production systems. Please wait for the **`1.0`** release.

+plcbundle archives AT Protocol's [DID PLC Directory](https://plc.directory/) operations into immutable, cryptographically-chained bundles of 10,000 operations. Each bundle is hashed (SHA-256), compressed (zstd), and linked to the previous bundle, creating a verifiable chain of DID operations.

This repository contains a reference library and a CLI tool written in Go language.

+12-10
SPECIFICATION.md
-# plcbundle V1 Specification

## 1. Abstract

-`plcbundle` is a system for archiving and distributing DID PLC (Placeholder) directory operations in a secure, verifiable, and efficient manner. It groups chronological operations from the PLC directory into immutable, compressed bundles. These bundles are cryptographically linked, forming a verifiable chain of history. This specification details the V1 format for the bundles, the index file that describes them, and the processes for creating and verifying them to ensure interoperability between implementations.

---

## 2. Key Terminology

-* **Operation:** A single DID PLC operation, as exported from a PLC directory's `/export` endpoint. It is represented as a single JSON object.
* **Bundle:** A single compressed file containing a fixed number of operations.
* **Index:** A JSON file named `plc_bundles.json` that contains metadata for all available bundles in the repository. It is the entry point for discovering and verifying bundles.
-* **Content Hash:** The SHA-256 hash of the *uncompressed* JSONL content of a single bundle. This hash uniquely identifies the bundle's data.
* **Chain Hash:** A cumulative SHA-256 hash that links a bundle to its predecessor, ensuring the integrity and order of the entire chain.
* **Compressed Hash:** The SHA-256 hash of the *compressed* `.jsonl.zst` bundle file. This is used to verify file integrity during downloads.

···
* **Naming Convention:** Bundles are named sequentially with six-digit zero-padding, following the format `%06d.jsonl.zst`.
  * *Examples:* `000001.jsonl.zst`, `000123.jsonl.zst`.
* **Content:** Each bundle contains exactly **10,000** PLC operations.
-* **Compression:** The JSONL content is compressed using Zstandard (zstd).

### 4.2. Serialization and Data Integrity

···

### 6.1. Collecting Operations

-1. **Mempool:** Operations are fetched from the PLC directory's `/export` endpoint and collected into a temporary staging area, or "mempool".
-2. **Chronological Validation:** The mempool must enforce that operations are added in chronological order, as described in Section 3.
3. **Boundary Deduplication:** To prevent including the same operation in two adjacent bundles, the system must use a "boundary CID" mechanism. When creating bundle `N+1`, it must ignore any fetched operations whose `createdAt` timestamp and `CID` match those from the very end of bundle `N`.
4. **Filling the Mempool:** The process continues fetching and deduplicating operations until at least 10,000 are collected in the mempool.

### 6.2. Creating a Bundle File

1. **Take Operations:** Exactly 10,000 operations are taken from the front of the mempool.
-2. **Serialize:** These operations are serialized into a single block of newline-delimited JSON (JSONL), adhering to the integrity rules in Section 4.2.
-3. **Compress and Save:** The JSONL data is compressed using Zstandard and saved to a file with the appropriate sequential name (e.g., `000001.jsonl.zst`).

### 6.3. Hash Calculation

···

### 6.4. Updating the Index

-1. A new `BundleMetadata` object is created for the new bundle, populated with all the information described in Section 5.2.
2. This metadata object is appended to the `bundles` array in the main `Index` object.
3. The `Index` object's top-level fields (`last_bundle`, `updated_at`, `total_size_bytes`) are updated to reflect the new state.
4. The entire `Index` object is serialized to JSON and saved, atomically overwriting the existing `plc_bundles.json` file.

+# plcbundle V1 (draft) Specification
+
+> ⚠️ **Preview Version - Request for Comments!**

## 1. Abstract

+`plcbundle` is a system for archiving and distributing [DID PLC (Placeholder) directory](plc.directory) operations in a secure, verifiable, and efficient manner. It groups chronological operations from the PLC directory into immutable, compressed bundles. These bundles are cryptographically linked, forming a verifiable chain of history. This specification details the V1 format for the bundles, the index file that describes them, and the processes for creating and verifying them to ensure interoperability between implementations.

---

## 2. Key Terminology

+* **Operation:** A single DID PLC operation, as exported from a PLC directory's [`/export` endpoint](https://web.plc.directory/api/redoc#operation/Export). It is represented as a single JSON object.
* **Bundle:** A single compressed file containing a fixed number of operations.
* **Index:** A JSON file named `plc_bundles.json` that contains metadata for all available bundles in the repository. It is the entry point for discovering and verifying bundles.
+* **Content Hash:** The [SHA-256](https://en.wikipedia.org/wiki/SHA-2) hash of the *uncompressed* [JSONL](https://jsonlines.org/) content of a single bundle. This hash uniquely identifies the bundle's data.
* **Chain Hash:** A cumulative SHA-256 hash that links a bundle to its predecessor, ensuring the integrity and order of the entire chain.
* **Compressed Hash:** The SHA-256 hash of the *compressed* `.jsonl.zst` bundle file. This is used to verify file integrity during downloads.
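
As a minimal sketch of the Content Hash and Compressed Hash definitions above, both can be computed with Go's standard `crypto/sha256` package. The function names and the lowercase hex encoding are assumptions of this example, not something the excerpt specifies. The Chain Hash is left out because its exact input is defined in a section not shown here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// hashHex returns the SHA-256 digest of data as a lowercase hex string.
// Hex encoding is an assumption of this sketch.
func hashHex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// ContentHash hashes the uncompressed JSONL content of one bundle.
func ContentHash(jsonl []byte) string { return hashHex(jsonl) }

// CompressedHash hashes the bytes of the compressed .jsonl.zst file.
func CompressedHash(zstFile []byte) string { return hashHex(zstFile) }
```

The two functions differ only in which bytes they receive: the JSONL text before compression, or the file as stored on disk.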

···
* **Naming Convention:** Bundles are named sequentially with six-digit zero-padding, following the format `%06d.jsonl.zst`.
  * *Examples:* `000001.jsonl.zst`, `000123.jsonl.zst`.
* **Content:** Each bundle contains exactly **10,000** PLC operations.
+* **Compression:** The JSONL content is compressed using [Zstandard](https://facebook.github.io/zstd/) (zstd).
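
The naming convention maps directly onto a format string. A small sketch follows; the helper name `BundleFileName` and the constant `OpsPerBundle` are hypothetical and only illustrate the rule stated above.

```go
package main

import "fmt"

// OpsPerBundle is the fixed number of operations per bundle.
const OpsPerBundle = 10000

// BundleFileName returns the file name for bundle number n,
// using six-digit zero-padding as in "%06d.jsonl.zst".
func BundleFileName(n int) string {
	return fmt.Sprintf("%06d.jsonl.zst", n)
}
```

For example, `BundleFileName(1)` yields `000001.jsonl.zst` and `BundleFileName(123)` yields `000123.jsonl.zst`.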

### 4.2. Serialization and Data Integrity

···

### 6.1. Collecting Operations

+1. **Mempool:** Operations are fetched from the PLC directory's [`/export` endpoint](https://web.plc.directory/api/redoc#operation/Export) and collected into a temporary staging area, or "mempool".
+2. **Chronological Validation:** The mempool must enforce that operations are added in chronological order, as described in [Section 3](#3-operation-order-and-reproducibility).
3. **Boundary Deduplication:** To prevent including the same operation in two adjacent bundles, the system must use a "boundary CID" mechanism. When creating bundle `N+1`, it must ignore any fetched operations whose `createdAt` timestamp and `CID` match those from the very end of bundle `N`.
4. **Filling the Mempool:** The process continues fetching and deduplicating operations until at least 10,000 are collected in the mempool.
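
One possible reading of the boundary deduplication step is sketched below. It assumes the "boundary CIDs" are the CIDs of the trailing operations in bundle `N` that share its final `createdAt` value, and it compares timestamps as opaque strings. The `Operation` struct and both function names are hypothetical; the reference library may structure this differently.

```go
package main

// Operation holds only the fields this sketch needs.
// The field names are illustrative, not taken from the specification.
type Operation struct {
	CID       string
	CreatedAt string // timestamp as exported, treated as an opaque string
}

// BoundaryCIDs returns the final createdAt value of a bundle and the set of
// CIDs of all trailing operations that share that timestamp.
func BoundaryCIDs(bundle []Operation) (string, map[string]bool) {
	cids := make(map[string]bool)
	if len(bundle) == 0 {
		return "", cids
	}
	last := bundle[len(bundle)-1].CreatedAt
	for i := len(bundle) - 1; i >= 0 && bundle[i].CreatedAt == last; i-- {
		cids[bundle[i].CID] = true
	}
	return last, cids
}

// DropBoundaryDuplicates removes fetched operations whose createdAt and CID
// both match an operation at the end of the previous bundle.
func DropBoundaryDuplicates(fetched []Operation, boundaryTime string, boundary map[string]bool) []Operation {
	kept := make([]Operation, 0, len(fetched))
	for _, op := range fetched {
		if op.CreatedAt == boundaryTime && boundary[op.CID] {
			continue // already stored in the previous bundle
		}
		kept = append(kept, op)
	}
	return kept
}
```

An operation that shares the boundary timestamp but carries a new CID is kept, so nothing is lost when several operations have the same `createdAt`.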

### 6.2. Creating a Bundle File

1. **Take Operations:** Exactly 10,000 operations are taken from the front of the mempool.
+2. **Serialize:** These operations are serialized into a single block of newline-delimited JSON ([JSONL](https://jsonlines.org/)), adhering to the integrity rules in [Section 4.2](#42-serialization-and-data-integrity).
+3. **Compress and Save:** The JSONL data is compressed using [Zstandard](https://facebook.github.io/zstd/) and saved to a file with the appropriate sequential name (e.g., `000001.jsonl.zst`).
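
The first two steps might look roughly like the sketch below. It assumes the mempool holds each operation as its raw JSON bytes and that every line, including the last, ends with `\n`; the rules of Section 4.2 are not shown in this excerpt, so that detail is a guess. `TakeBundle`, `JoinJSONL`, and `ErrNotEnough` are hypothetical names. Compression is omitted because Go's standard library has no Zstandard encoder, so a third-party package would be needed for step 3.

```go
package main

import (
	"bytes"
	"errors"
)

const bundleSize = 10000

// ErrNotEnough is returned when the mempool cannot fill a whole bundle.
var ErrNotEnough = errors.New("mempool holds fewer than 10,000 operations")

// TakeBundle splits exactly bundleSize operations off the front of the mempool
// and returns them together with the remaining operations.
func TakeBundle(mempool [][]byte) (bundle, rest [][]byte, err error) {
	if len(mempool) < bundleSize {
		return nil, mempool, ErrNotEnough
	}
	// The capped slice keeps a later append to bundle from overwriting rest.
	return mempool[:bundleSize:bundleSize], mempool[bundleSize:], nil
}

// JoinJSONL writes each raw JSON operation on its own line.
// Terminating every line with '\n' is an assumption of this sketch.
func JoinJSONL(ops [][]byte) []byte {
	var buf bytes.Buffer
	for _, op := range ops {
		buf.Write(op)
		buf.WriteByte('\n')
	}
	return buf.Bytes()
}
```

Operations left in `rest` stay in the mempool and become the start of the next bundle.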

### 6.3. Hash Calculation

···

### 6.4. Updating the Index

+1. A new `BundleMetadata` object is created for the new bundle, populated with all the information described in [Section 5.2](#52-bundlemetadata-object).
2. This metadata object is appended to the `bundles` array in the main `Index` object.
3. The `Index` object's top-level fields (`last_bundle`, `updated_at`, `total_size_bytes`) are updated to reflect the new state.
4. The entire `Index` object is serialized to JSON and saved, atomically overwriting the existing `plc_bundles.json` file.
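
The index update can be sketched as an append followed by a write-to-temp-then-rename, which is one common way to get an atomic overwrite. The JSON keys `bundles`, `last_bundle`, `updated_at`, and `total_size_bytes` come from the text above; their Go types, the two `BundleMetadata` fields, and the idea that the total counts compressed bytes are assumptions, since Section 5.2 is not part of this excerpt.

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
	"time"
)

// BundleMetadata is reduced to two hypothetical fields for this sketch.
type BundleMetadata struct {
	BundleNumber   int   `json:"bundle_number"`   // hypothetical field name
	CompressedSize int64 `json:"compressed_size"` // hypothetical field name
}

// Index mirrors the top-level fields named in Section 6.4.
type Index struct {
	LastBundle     int              `json:"last_bundle"`
	UpdatedAt      time.Time        `json:"updated_at"`
	TotalSizeBytes int64            `json:"total_size_bytes"`
	Bundles        []BundleMetadata `json:"bundles"`
}

// Append adds the metadata and refreshes the top-level fields.
func (idx *Index) Append(meta BundleMetadata) {
	idx.Bundles = append(idx.Bundles, meta)
	idx.LastBundle = meta.BundleNumber
	idx.TotalSizeBytes += meta.CompressedSize
	idx.UpdatedAt = time.Now().UTC()
}

// SaveAtomic writes the index to a temporary file in the target directory and
// renames it over path, so readers never observe a partially written file.
func (idx *Index) SaveAtomic(path string) error {
	data, err := json.MarshalIndent(idx, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), "plc_bundles-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; harmless after rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```

Creating the temporary file in the same directory as `plc_bundles.json` keeps the rename on one filesystem, which is what makes the replacement atomic on typical platforms.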