High-performance implementation of plcbundle written in Rust

plcbundle-rs High-Level API Design#

Overview#

The plcbundle-rs library provides a unified, high-level API through the BundleManager type. This API is designed to be:

  • Consistent: Same patterns across all operations
  • Efficient: Minimal allocations, streaming where possible
  • FFI-friendly: Clean C bindings that map naturally to Go
  • CLI-first: The Rust CLI uses only public high-level APIs

Quick Reference: All Methods#

// === Initialization ===
BundleManager::new<O: IntoManagerOptions>(directory: PathBuf, options: O) -> Result<Self>

// === Repository Setup ===
init_repository(directory: PathBuf, origin: String) -> Result<()>  // Creates plc_bundles.json

// === Bundle Loading ===
load_bundle(num: u32, options: LoadOptions) -> Result<LoadResult>
get_bundle_info(num: u32, flags: InfoFlags) -> Result<BundleInfo>

// === Single Operation Access ===
get_operation_raw(bundle_num: u32, position: usize) -> Result<String>
get_operation(bundle_num: u32, position: usize) -> Result<Operation>
get_operation_with_stats(bundle_num: u32, position: usize) -> Result<OperationResult>

// === Batch Operations ===
get_operations_batch(requests: Vec<OperationRequest>) -> Result<Vec<Operation>>
get_operations_range(start: u32, end: u32, filter: Option<OperationFilter>) -> RangeIterator

// === Query & Export ===
query(spec: QuerySpec) -> QueryIterator
export(spec: ExportSpec) -> ExportIterator
export_to_writer<W: Write>(spec: ExportSpec, writer: W) -> Result<ExportStats>

// === DID Operations ===
get_did_operations(did: &str) -> Result<Vec<Operation>>
resolve_did(did: &str) -> Result<DIDDocument>
batch_resolve_dids(dids: Vec<String>) -> Result<HashMap<String, Vec<Operation>>>
sample_random_dids(count: usize, seed: Option<u64>) -> Result<Vec<String>>

// === Verification ===
verify_bundle(num: u32, spec: VerifySpec) -> Result<VerifyResult>
verify_chain(spec: ChainVerifySpec) -> Result<ChainVerifyResult>

// === Rollback ===
rollback_plan(spec: RollbackSpec) -> Result<RollbackPlan>
rollback(spec: RollbackSpec) -> Result<RollbackResult>

// === Repository Cleanup ===
clean() -> Result<CleanResult>

// === Performance & Caching ===
prefetch_bundles(nums: Vec<u32>) -> Result<()>
warm_up(spec: WarmUpSpec) -> Result<()>
clear_caches()

// === DID Index ===
build_did_index(flush_interval: u32, progress_cb: Option<F>, num_threads: Option<usize>, interrupted: Option<Arc<AtomicBool>>) -> Result<RebuildStats>
verify_did_index(verbose: bool, flush_interval: u32, full: bool, progress_callback: Option<F>) -> Result<did_index::VerifyResult>
repair_did_index(num_threads: usize, flush_interval: u32, progress_callback: Option<F>) -> Result<did_index::RepairResult>
get_did_index_stats() -> HashMap<String, serde_json::Value>

// === Sync Operations (Async) ===
sync_next_bundle(client: &PLCClient) -> Result<u32>
sync_once(client: &PLCClient) -> Result<usize>

// === Remote Access (Async) ===
fetch_remote_index(target: &str) -> Result<Index>
fetch_remote_bundle(base_url: &str, bundle_num: u32) -> Result<Vec<Operation>>
fetch_remote_operation(base_url: &str, bundle_num: u32, position: usize) -> Result<String>

// === Server API Methods ===
get_plc_origin() -> String
stream_bundle_raw(bundle_num: u32) -> Result<File>
stream_bundle_decompressed(bundle_num: u32) -> Result<Box<dyn Read + Send>>
get_current_cursor() -> u64
resolve_handle_or_did(input: &str) -> Result<(String, u64)>
get_resolver_stats() -> HashMap<String, serde_json::Value>
get_handle_resolver_base_url() -> Option<String>
get_did_index() -> Arc<RwLock<did_index::Manager>>

// === Mempool Operations ===
get_mempool_stats() -> Result<MempoolStats>
get_mempool_operations() -> Result<Vec<Operation>>
add_to_mempool(ops: Vec<Operation>) -> Result<usize>
clear_mempool() -> Result<()>

// === Observability ===
get_stats() -> ManagerStats
get_last_bundle() -> u32
directory() -> &PathBuf

11. Sync & Repository Management#

Repository Initialization#

// Standalone function (used by CLI init command)
pub fn init_repository(directory: PathBuf, origin: String) -> Result<()>

Purpose: Initialize a new PLC bundle repository (like git init).

What it does:

  • Creates directory if it doesn't exist
  • Creates empty plc_bundles.json index file
  • Sets origin identifier

CLI Usage:

plcbundle init
plcbundle init /path/to/bundles --origin my-node

Sync from PLC Directory#

pub async fn sync_next_bundle(&mut self, client: &PLCClient) -> Result<u32>

Purpose: Fetch operations from PLC directory and create next bundle.

What it does:

  1. Gets boundary CIDs from last bundle (prevents duplicates)
  2. Fetches operations from PLC until mempool has 10,000
  3. Deduplicates using boundary CID logic
  4. Creates and saves bundle
  5. Returns bundle number
pub async fn sync_once(&mut self, client: &PLCClient) -> Result<usize>

Purpose: Sync until caught up with PLC directory (like git fetch).

Returns: Number of bundles synced

CLI Usage:

# Sync once
plcbundle sync

# Continuous daemon
plcbundle sync --continuous --interval 30s

# Max bundles
plcbundle sync --max-bundles 10

PLC Client#

pub struct PLCClient {
    // Private fields
}

impl PLCClient {
    pub fn new(base_url: impl Into<String>) -> Result<Self>
    pub async fn fetch_operations(&self, after: &str, count: usize) -> Result<Vec<PLCOperation>>
}

Features:

  • Rate limiting (90 requests/minute)
  • Automatic retries with exponential backoff
  • 429 (rate limit) handling

Mempool Operations#

pub fn get_mempool_stats(&self) -> Result<MempoolStats>

Purpose: Get current mempool statistics.

pub struct MempoolStats {
    pub count: usize,
    pub can_create_bundle: bool,      // count >= 10,000
    pub target_bundle: u32,
    pub min_timestamp: DateTime<Utc>,
    pub first_time: Option<DateTime<Utc>>,
    pub last_time: Option<DateTime<Utc>>,
    pub size_bytes: Option<usize>,
    pub did_count: Option<usize>,
}
pub fn get_mempool_operations(&self) -> Result<Vec<Operation>>
pub fn add_to_mempool(&self, ops: Vec<Operation>) -> Result<usize>
pub fn clear_mempool(&self) -> Result<()>

CLI Usage:

plcbundle mempool status
plcbundle mempool dump
plcbundle mempool clear

Boundary CID Deduplication#

Critical helper functions for preventing duplicates across bundle boundaries:

pub fn get_boundary_cids(operations: &[Operation]) -> HashSet<String>

Purpose: Extract CIDs that share the same timestamp as the last operation.

pub fn strip_boundary_duplicates(
    operations: Vec<Operation>,
    prev_boundary: &HashSet<String>
) -> Vec<Operation>

Purpose: Remove operations that were in the previous bundle's boundary.

Why this matters: Operations can have identical timestamps. When creating bundles, the last N operations might share a timestamp. The next fetch might return these same operations again. Must deduplicate!


Design Principles#

  1. Single Entry Point: All operations go through BundleManager
  2. Options Pattern: Complex operations use dedicated option structs
  3. Result Types: Operations return structured result types, not raw tuples
  4. Streaming by Default: Use iterators for large datasets
  5. No Direct File Access: CLI never opens bundle files directly

API Structure#

Core Manager#

pub struct BundleManager {
    // Private fields
}

impl BundleManager {
    /// Create a new BundleManager with options
    ///
    /// # Examples
    ///
    /// ```rust
    /// use plcbundle::{BundleManager, ManagerOptions};
    /// use std::path::PathBuf;
    ///
    /// // With default options
    /// let manager = BundleManager::new(PathBuf::from("."), ())?;
    ///
    /// // With custom options
    /// let options = ManagerOptions {
    ///     handle_resolver_url: Some("https://example.com".to_string()),
    ///     preload_mempool: true,
    ///     verbose: true,
    /// };
    /// let manager = BundleManager::new(PathBuf::from("."), options)?;
    /// ```
    pub fn new<O: IntoManagerOptions>(directory: PathBuf, options: O) -> Result<Self>
}

ManagerOptions#

/// Options for configuring BundleManager initialization
pub struct ManagerOptions {
    /// Optional handle resolver URL for resolving @handle.did identifiers
    pub handle_resolver_url: Option<String>,
    /// Whether to preload the mempool at initialization (for server use)
    pub preload_mempool: bool,
    /// Whether to enable verbose logging
    pub verbose: bool,
}

impl Default for ManagerOptions {
    fn default() -> Self {
        Self {
            handle_resolver_url: None,
            preload_mempool: false,
            verbose: false,
        }
    }
}

Usage:

  • Pass () to use default options: BundleManager::new(dir, ())
  • Pass ManagerOptions for custom configuration (including verbose mode)

1. Bundle Loading#

Individual Bundle Loading#

pub fn load_bundle(&self, bundle_num: u32, options: LoadOptions) -> Result<LoadResult>

Purpose: Load a single bundle with control over parsing, verification, and caching.

Options:

pub struct LoadOptions {
    pub verify_hash: bool,      // Verify bundle integrity
    pub decompress: bool,        // Decompress bundle data
    pub cache: bool,             // Cache in memory
    pub parse_operations: bool,  // Parse JSON into Operation structs
}

Result:

pub struct LoadResult {
    pub bundle_number: u32,
    pub operations: Vec<Operation>,
    pub metadata: BundleMetadata,
    pub hash: Option<String>,
}

Use Cases:

  • CLI: plcbundle info --bundle 42
  • CLI: plcbundle verify --bundle 42

2. Operation Access#

Single Operation#

pub fn get_operation(&self, bundle_num: u32, position: usize, options: OperationOptions) -> Result<Operation>

Purpose: Efficiently retrieve a single operation without loading entire bundle.

Options:

pub struct OperationOptions {
    pub raw_json: bool,          // Return raw JSON string (faster)
    pub parse: bool,             // Parse into Operation struct
}

Use Cases:

  • CLI: plcbundle op get 42 1337
  • CLI: plcbundle op show 420000

Batch Operations#

pub fn get_operations_batch(&self, requests: Vec<OperationRequest>) -> Result<Vec<Operation>>

Purpose: Retrieve multiple operations in one call (optimizes file I/O).

pub struct OperationRequest {
    pub bundle_num: u32,
    pub position: usize,
}

Range Operations#

pub fn get_operations_range(&self, start: u64, end: u64, filter: Option<OperationFilter>) -> Result<RangeIterator>

Purpose: Stream operations across bundle boundaries by global position.


3. DID Operations#

DID Lookup#

pub fn get_did_operations(&self, did: &str) -> Result<Vec<Operation>>

Purpose: Get all operations for a specific DID (requires DID index).

Batch DID Lookup#

pub fn batch_resolve_dids(&self, dids: Vec<String>) -> Result<HashMap<String, Vec<Operation>>>

Purpose: Efficiently resolve multiple DIDs in one call.

Use Cases:

  • Bulk DID resolution
  • Identity verification workflows

Random DID Sampling#

pub fn sample_random_dids(&self, count: usize, seed: Option<u64>) -> Result<Vec<String>>

Purpose: Retrieve pseudo-random DIDs directly from the DID index without touching bundle files.

Details:

  • Reads identifiers from memory-mapped shard data (no decompression).
  • Deterministic when seed is provided; otherwise uses current timestamp.
  • Useful for benchmarks, sampling, or quick spot checks.

4. Query & Export (Streaming)#

Query Operations#

pub fn query(&self, spec: QuerySpec) -> Result<QueryIterator>

Purpose: Execute JMESPath or simple path queries across bundles.

pub struct QuerySpec {
    pub expression: String,          // JMESPath or simple path
    pub bundles: BundleRange,        // Which bundles to search
    pub mode: QueryMode,             // Auto, Simple, or JMESPath
    pub filter: Option<OperationFilter>,
    pub parallel: Option<usize>,     // Number of threads (0=auto)
    pub batch_size: usize,           // Output batch size
}

pub enum QueryMode {
    Auto,
    Simple,      // Fast path: direct field access
    JMESPath,    // Full JMESPath evaluation
}

Iterator:

pub struct QueryIterator {
    // Implements Iterator<Item = Result<String>>
}

Use Cases:

  • CLI: plcbundle query "did" --bundles 1-100
  • CLI: plcbundle query "operation.type" --mode simple

Export Operations#

pub fn export(&self, spec: ExportSpec) -> Result<ExportIterator>

Purpose: Export operations in various formats with filtering.

pub struct ExportSpec {
    pub bundles: BundleRange,
    pub format: ExportFormat,           // JSONL only
    pub filter: Option<OperationFilter>,
    pub count: Option<usize>,           // Limit number of operations
    pub after_timestamp: Option<String>,// Filter by timestamp
}

pub enum ExportFormat {
    Jsonl,
}

pub enum BundleRange {
    Single(u32),
    Range(u32, u32),
    List(Vec<u32>),
    All,
}

Use Cases:

  • CLI: plcbundle export --range 1-100 --format jsonl
  • CLI: plcbundle export --all --after 2024-01-01T00:00:00Z

Export with Callback#

pub fn export_to_writer<W: Write>(&self, spec: ExportSpec, writer: W) -> Result<ExportStats>

Purpose: Export directly to a writer (file, stdout, network).

pub struct ExportStats {
    pub records_written: u64,
    pub bytes_written: u64,
    pub bundles_processed: u32,
    pub duration: Duration,
}

Use Cases:

  • FFI: Stream to Go io.Writer
  • Direct file writing without buffering

5. Verification#

Bundle Verification#

pub fn verify_bundle(&self, bundle_num: u32, spec: VerifySpec) -> Result<VerifyResult>

Purpose: Verify bundle integrity with various checks.

pub struct VerifySpec {
    pub check_hash: bool,
    pub check_compression: bool,
    pub check_json: bool,
    pub parallel: bool,
}

pub struct VerifyResult {
    pub valid: bool,
    pub hash_match: Option<bool>,
    pub compression_ok: Option<bool>,
    pub json_valid: Option<bool>,
    pub errors: Vec<String>,
}

Chain Verification#

pub fn verify_chain(&self, spec: ChainVerifySpec) -> Result<ChainVerifyResult>

Purpose: Verify chain continuity and prev pointers.

pub struct ChainVerifySpec {
    pub bundles: BundleRange,
    pub check_prev_links: bool,
    pub check_timestamps: bool,
}

pub struct ChainVerifyResult {
    pub valid: bool,
    pub bundles_checked: u32,
    pub chain_breaks: Vec<ChainBreak>,
}

pub struct ChainBreak {
    pub bundle_num: u32,
    pub did: String,
    pub reason: String,
}

Use Cases:

  • CLI: plcbundle verify --chain --bundles 1-100

6. Bundle Information#

pub fn get_bundle_info(&self, bundle_num: u32, flags: InfoFlags) -> Result<BundleInfo>

Purpose: Get metadata and statistics for a bundle.

pub struct InfoFlags {
    pub include_dids: bool,
    pub include_types: bool,
    pub include_timeline: bool,
}

pub struct BundleInfo {
    pub bundle_number: u32,
    pub operation_count: u32,
    pub did_count: Option<u32>,
    pub compressed_size: u64,
    pub uncompressed_size: u64,
    pub start_time: String,
    pub end_time: String,
    pub hash: String,
    pub operation_types: Option<HashMap<String, u32>>,
}

Use Cases:

  • CLI: plcbundle info --bundle 42 --detailed

7. Rollback Operations#

Plan Rollback#

pub fn rollback_plan(&self, spec: RollbackSpec) -> Result<RollbackPlan>

Purpose: Preview what would be rolled back (dry-run).

pub struct RollbackSpec {
    pub target_bundle: u32,
    pub keep_index: bool,
}

pub struct RollbackPlan {
    pub bundles_to_remove: Vec<u32>,
    pub total_size: u64,
    pub operations_affected: u64,
}

Execute Rollback#

pub fn rollback(&self, spec: RollbackSpec) -> Result<RollbackResult>

Purpose: Execute the rollback operation.

pub struct RollbackResult {
    pub bundles_removed: Vec<u32>,
    pub bytes_freed: u64,
    pub success: bool,
}

7.5. Repository Cleanup#

Clean Temporary Files#

pub fn clean(&self) -> Result<CleanResult>

Purpose: Remove all temporary files from the repository.

What it does:

  • Removes all .tmp files from the repository root directory (e.g., plc_bundles.json.tmp)
  • Removes temporary files from the DID index directory .plcbundle/ (e.g., config.json.tmp)
  • Removes temporary shard files from .plcbundle/shards/ (e.g., 00.tmp, 01.tmp, etc.)

Result:

pub struct CleanResult {
    pub files_removed: usize,
    pub bytes_freed: u64,
    pub errors: Option<Vec<String>>,
}

Use Cases:

  • CLI: plcbundle clean
  • Cleanup after interrupted operations
  • Maintenance tasks

Note: Temporary files are normally cleaned up automatically during atomic write operations. This method is useful for cleaning up leftover files from interrupted operations.


8. Performance & Caching#

Cache Management#

pub fn prefetch_bundles(&self, bundle_nums: Vec<u32>) -> Result<()>

Purpose: Preload bundles into cache for faster access.

pub fn warm_up(&self, spec: WarmUpSpec) -> Result<()>

Purpose: Warm up caches with intelligent prefetching.

pub struct WarmUpSpec {
    pub bundles: BundleRange,
    pub strategy: WarmUpStrategy,
}

pub enum WarmUpStrategy {
    Sequential,    // Load in order
    Parallel,      // Load concurrently
    Adaptive,      // Based on access patterns
}
pub fn clear_caches(&self)

Purpose: Clear all in-memory caches.


9. DID Index Management#

Build Index#

pub fn build_did_index<F>(
    &self,
    flush_interval: u32,
    progress_cb: Option<F>,
    num_threads: Option<usize>,
    interrupted: Option<Arc<AtomicBool>>,
) -> Result<RebuildStats>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)

Purpose: Build or rebuild the DID → operations index from scratch.

Parameters:

  • flush_interval: Flush to disk every N bundles (0 = only at end)
  • progress_cb: Optional progress callback
  • num_threads: Number of threads (None = auto-detect)
  • interrupted: Optional flag to check for cancellation

Result:

pub struct RebuildStats {
    pub bundles_processed: u32,
    pub operations_indexed: u64,
    pub unique_dids: usize,
    pub duration: Duration,
}

Verify Index#

pub fn verify_did_index<F>(
    &self,
    verbose: bool,
    flush_interval: u32,
    full: bool,
    progress_callback: Option<F>,
) -> Result<did_index::VerifyResult>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)

Purpose: Verify DID index integrity. Performs standard checks by default, or full verification (rebuild and compare) if full is true.

Parameters:

  • verbose: Enable verbose logging
  • flush_interval: Flush interval for full rebuild (if full is true)
  • full: If true, rebuilds index in temp directory and compares with existing
  • progress_callback: Optional progress callback for full verification

Result:

pub struct VerifyResult {
    pub errors: usize,
    pub warnings: usize,
    pub missing_base_shards: usize,
    pub missing_delta_segments: usize,
    pub shards_checked: usize,
    pub segments_checked: usize,
    pub error_categories: Vec<(String, usize)>,
    pub index_last_bundle: u32,
    pub last_bundle_in_repo: u32,
}

Use Cases:

  • CLI: plcbundle index verify (standard check)
  • CLI: plcbundle index verify --full (full rebuild and compare)
  • Server: Startup integrity check (call with full=false and check missing_base_shards/missing_delta_segments)

Repair Index#

pub fn repair_did_index<F>(
    &self,
    num_threads: usize,
    flush_interval: u32,
    progress_callback: Option<F>,
) -> Result<did_index::RepairResult>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)

Purpose: Intelligently repair the DID index by:

  • Rebuilding missing delta segments (if base shards are intact)
  • Performing incremental update (if < 1000 bundles behind)
  • Performing full rebuild (if > 1000 bundles behind or base shards corrupted)
  • Compacting delta segments (if > 50 segments)

Parameters:

  • num_threads: Number of threads (0 = auto-detect)
  • flush_interval: Flush to disk every N bundles (0 = only at end)
  • progress_callback: Optional progress callback

Result:

pub struct RepairResult {
    pub repaired: bool,
    pub compacted: bool,
    pub bundles_processed: u32,
    pub segments_rebuilt: usize,
}

Use Cases:

  • CLI: plcbundle index repair
  • Maintenance: Fix corrupted or incomplete index
  • Recovery: Restore index after data loss

Index Statistics#

pub fn get_did_index_stats(&self) -> HashMap<String, serde_json::Value>

Purpose: Get comprehensive DID index statistics.

Returns: JSON-compatible map with fields like:

  • exists: Whether index exists
  • total_dids: Total number of unique DIDs
  • last_bundle: Last bundle indexed
  • delta_segments: Number of delta segments
  • shard_count: Number of shards (should be 256)

Use Cases:

  • CLI: plcbundle index status
  • Server: Status endpoint
  • Monitoring: Health checks

10. Server API Methods#

The following methods are specifically designed for use by the HTTP server component. They provide streaming capabilities, cursor tracking, and resolver functionality.

Streaming Bundle Data#

/// Stream bundle raw (compressed) data
pub fn stream_bundle_raw(&self, bundle_num: u32) -> Result<std::fs::File>

/// Stream bundle decompressed (JSONL) data
pub fn stream_bundle_decompressed(&self, bundle_num: u32) -> Result<Box<dyn std::io::Read + Send>>

Purpose: Efficient streaming of bundle data for HTTP responses.

Use Cases:

  • Server: /data/:number endpoint (raw compressed)
  • Server: /jsonl/:number endpoint (decompressed JSONL)
  • WebSocket: Streaming operations to clients

Cursor & Position Tracking#

/// Get current cursor (global position of last operation)
/// Cursor = (last_bundle * 10000) + mempool_ops_count
pub fn get_current_cursor(&self) -> u64

Purpose: Track the global position in the operation stream for WebSocket streaming.

Use Cases:

  • WebSocket: /ws?cursor=N to resume from a specific position
  • Server: Status endpoint showing current position

Global Position Mapping#

  • Positions are 0-indexed per bundle (0..9,999).
  • Global position formula: global = ((bundle - 1) × 10,000) + position.
  • Conversion back: bundle = floor(global / 10,000) + 1, position = global % 10,000.
  • Examples:
    • global 0 → bundle 1, position 0
    • global 10000 → bundle 2, position 0
    • global 88410345 → bundle 8842, position 345
  • Shorthand: Small numbers < 10000 are treated as bundle 1 positions.

Handle & DID Resolution#

/// Resolve handle to DID or validate DID format
/// Returns (did, handle_resolve_time_ms)
pub fn resolve_handle_or_did(&self, input: &str) -> Result<(String, u64)>

/// Get handle resolver base URL (if configured)
pub fn get_handle_resolver_base_url(&self) -> Option<String>

/// Get resolver statistics
pub fn get_resolver_stats(&self) -> HashMap<String, serde_json::Value>

Purpose: Support handle-to-DID resolution and resolver metrics.

Use Cases:

  • Server: /:did endpoints (accepts both DIDs and handles)
  • Server: /status endpoint (resolver stats)
  • Server: /debug/resolver endpoint

DID Index Access#

/// Get DID index manager (for stats and direct access)
pub fn get_did_index(&self) -> Arc<RwLock<did_index::Manager>>

Purpose: Direct access to DID index for server endpoints.

Use Cases:

  • Server: /debug/didindex endpoint
  • Server: /status endpoint (DID index stats)

PLC Origin#

/// Get PLC origin from index
pub fn get_plc_origin(&self) -> String

Purpose: Get the origin identifier for the repository.

Use Cases:

  • Server: Root endpoint (display origin)
  • Server: /status endpoint

11. Observability#

Manager Statistics#

pub fn get_stats(&self) -> ManagerStats

Purpose: Get runtime statistics for monitoring.

pub struct ManagerStats {
    pub cache_hits: u64,
    pub cache_misses: u64,
    pub bundles_loaded: u64,
    pub operations_read: u64,
    pub bytes_read: u64,
}

Common Types#

Operation Filter#

pub struct OperationFilter {
    pub did: Option<String>,
    pub operation_type: Option<String>,
    pub time_range: Option<(String, String)>,
    pub include_nullified: bool,
}

Used across query, export, and range operations.

Operation#

pub struct Operation {
    pub did: String,
    pub operation: serde_json::Value,
    pub cid: Option<String>,
    pub nullified: bool,
    pub created_at: String,
    // Optional: raw_json field for zero-copy access
}

CLI Usage Examples#

All CLI commands use only the high-level API:

# Uses: get_operation()
plcbundle op get 42 1337

# Uses: query()
plcbundle query "did" --bundles 1-100

# Uses: export_to_writer()
plcbundle export --range 1-100 -o output.jsonl

# Uses: get_bundle_info()
plcbundle info --bundle 42

# Uses: verify_bundle()
plcbundle verify --bundle 42 --checksums

# Uses: get_did_operations()
plcbundle did-lookup did:plc:xyz123

FFI Mapping#

The high-level API maps cleanly to C/Go:

C Bindings#

// Direct 1:1 mapping
// Note: C bindings use default options (equivalent to passing () in Rust)
CBundleManager* bundle_manager_new(const char* path);
int bundle_manager_load_bundle(CBundleManager* mgr, uint32_t num, CLoadOptions* opts, CLoadResult* result);
int bundle_manager_get_operation(CBundleManager* mgr, uint32_t bundle, size_t pos, COperation* out);
int bundle_manager_export(CBundleManager* mgr, CExportSpec* spec, ExportCallback cb, void* user_data);

Go Bindings#

type BundleManager struct { /* ... */ }

func (m *BundleManager) LoadBundle(num uint32, opts LoadOptions) (*LoadResult, error)
func (m *BundleManager) GetOperation(bundle uint32, position int, opts OperationOptions) (*Operation, error)
func (m *BundleManager) Query(spec QuerySpec) (*QueryIterator, error)
func (m *BundleManager) Export(spec ExportSpec, opts ExportOptions) (*ExportStats, error)

Implementation Status#

✅ Implemented#

  • load_bundle()
  • get_bundle_info()
  • query()
  • export() / export_to_writer()
  • verify_bundle()
  • verify_chain()
  • get_stats()
  • build_did_index()
  • verify_did_index()
  • repair_did_index()

🚧 Needs Refactoring#

  • get_operation() - Currently in CLI, should be in BundleManager
  • get_operations_batch() - Not yet implemented
  • get_operations_range() - Partially implemented

📋 To Implement#

  • get_did_operations() - Uses DID index
  • batch_resolve_dids() - Batch DID lookup
  • prefetch_bundles() - Cache warming
  • warm_up() - Intelligent prefetching

Migration Plan#

Phase 1: Move Operation Access to Manager ✅ PRIORITY#

// Add to BundleManager
impl BundleManager {
    /// Get a single operation efficiently (reads only the needed line)
    pub fn get_operation(&self, bundle_num: u32, position: usize) -> Result<OperationData>
    
    /// Get operation with raw JSON (zero-copy)
    pub fn get_operation_raw(&self, bundle_num: u32, position: usize) -> Result<String>
}

pub struct OperationData {
    pub raw_json: String,           // Original JSON from file
    pub parsed: Option<Operation>,  // Lazy parsing
}

Phase 2: Update CLI to Use API#

// Before (direct file access):
let file = std::fs::File::open(bundle_path)?;
let decoder = zstd::Decoder::new(file)?;
// ... manual line reading

// After (high-level API):
let op_data = manager.get_operation_raw(bundle_num, position)?;
println!("{}", op_data);

Phase 3: Add to FFI/Go#

Once in BundleManager, automatically available to FFI:

int bundle_manager_get_operation_raw(
    CBundleManager* mgr,
    uint32_t bundle_num,
    size_t position,
    char** out_json,
    size_t* out_len
);
func (m *BundleManager) GetOperationRaw(bundle uint32, position int) (string, error)

Design Benefits#

  1. Single Source of Truth: All functionality in BundleManager
  2. Testable: High-level API is easy to unit test
  3. Consistent: Same patterns across Rust, C, and Go
  4. Maintainable: Changes in one place affect all consumers
  5. Efficient: API designed for performance (streaming, lazy parsing)
  6. Safe: No direct file access by consumers

Next Steps#

  1. ✅ Add get_operation() / get_operation_raw() to BundleManager
  2. ✅ Update CLI op get to use new API
  3. ✅ Add FFI bindings for operation access
  4. ⏭️ Implement get_operations_batch()
  5. ⏭️ Implement DID lookup operations
  6. ⏭️ Add cache warming APIs