commits
Cache with 100k entries improves reads significantly:
- disk: 4k → 14k ops/s (3.4×)
- memory: 9.5k → 13.4k ops/s (1.4×)
- lavyek: 8.5k → 12.6k ops/s (1.5×)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runners accept an optional ~cache parameter that wraps the backend
with Backend.cached. The main CLI exposes --cache N to set the
capacity. Names include a "+cache" suffix when the cache is active.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Store.create now accepts ?cache:int to wrap the backend with an LRU
cache of the given capacity. Default: no cache (unchanged behavior).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace naive list-based cache with O(1) hashtable+linked-list LRU
- Increase default capacity from 1000 to 100000 entries
- Populate cache on write and write_batch (not just on read)
- Make capacity configurable via optional parameter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hashtable for lookups + doubly-linked list for LRU ordering.
Replaces the naive O(n) list-based implementation.
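
The hashtable plus doubly-linked-list layout can be sketched as below. This
is a minimal illustration, not the code in Backend.cached: the names `node`,
`unlink`, `push_front`, `find` and `add` are hypothetical, and capacity is
assumed to be at least 1.

```ocaml
(* Hypothetical LRU sketch: Hashtbl gives O(1) lookup, the doubly-linked
   list keeps recency order. Not the project's actual implementation. *)
type ('k, 'v) node = {
  key : 'k;
  mutable value : 'v;
  mutable prev : ('k, 'v) node option;
  mutable next : ('k, 'v) node option;
}

type ('k, 'v) t = {
  capacity : int;
  table : ('k, ('k, 'v) node) Hashtbl.t;
  mutable head : ('k, 'v) node option; (* most recently used *)
  mutable tail : ('k, 'v) node option; (* least recently used *)
}

let create capacity =
  { capacity; table = Hashtbl.create 16; head = None; tail = None }

(* Detach a node, fixing up its neighbours and the list ends. *)
let unlink t n =
  (match n.prev with Some p -> p.next <- n.next | None -> t.head <- n.next);
  (match n.next with Some x -> x.prev <- n.prev | None -> t.tail <- n.prev);
  n.prev <- None;
  n.next <- None

(* Insert a detached node at the most-recently-used end. *)
let push_front t n =
  n.next <- t.head;
  n.prev <- None;
  (match t.head with Some h -> h.prev <- Some n | None -> t.tail <- Some n);
  t.head <- Some n

(* A hit moves the entry to the front. *)
let find t k =
  match Hashtbl.find_opt t.table k with
  | None -> None
  | Some n ->
      unlink t n;
      push_front t n;
      Some n.value

(* Insert or update; evict the tail when the cache is full. *)
let add t k v =
  match Hashtbl.find_opt t.table k with
  | Some n ->
      n.value <- v;
      unlink t n;
      push_front t n
  | None ->
      if Hashtbl.length t.table >= t.capacity then (
        match t.tail with
        | Some lru ->
            unlink t lru;
            Hashtbl.remove t.table lru.key
        | None -> ());
      let n = { key = k; value = v; prev = None; next = None } in
      Hashtbl.replace t.table k n;
      push_front t n
```

Both `find` and `add` touch a constant number of pointers, which is what
removes the O(n) scan of the list-based version.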
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Value sizes: 100 bytes for base, 30 bytes for +inline, 10 KiB for large-values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor gen_chart.py to generate both linear and log-scale SVG charts.
The log scale makes it easier to compare backends with very different
magnitudes (e.g. concurrent scenario: 263 vs 447k ops/s).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gen_chart.py now generates bench_chart_<timestamp>.svg, removes old
versions, and updates the README reference automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous benchmarks had inline_threshold=0 in the codec, meaning
inlining was never active. Setting it to 48 and using 30-byte values
shows the real impact:
- Memory commits: 539 -> 127k ops/s (235x faster)
- Lavyek commits: 480 -> 114k ops/s (238x faster)
- Disk commits: 445 -> 4.6k ops/s (10x faster)
- Memory reads: 9.5k -> 19.5k ops/s (2x faster)
The speedup comes from avoiding separate content-addressable store
writes — inlined contents are stored directly in tree nodes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each scenario now has its own independent Y axis, making it easier to
compare backends within the same scenario without log distortion.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Small but consistent improvement: memory commits +6%, disk reads +11%,
lavyek commits +6%. The gains are modest because the bottleneck is full
tree re-serialization, not node encoding.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generated with gen_chart.py, grouped by scenario with log scale.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
irmin-git uses the Git object format (zlib + SHA-1 loose objects) via
the lwt_eio bridge. Results show it is the slowest Irmin backend:
commits at ~2.2k ops/s (17x slower than irmin-fs), reads at ~145k ops/s,
large-values at ~1.6k ops/s. Memory usage is ~552-682 MiB.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
irmin-fs stores one file per object on disk. Results: commits 37k ops/s
(vs irmin-pack 50k), reads 208k ops/s (vs 1.3M for irmin-pack; irmin-fs has no LRU),
incremental very slow at 193 ops/s, large-values 2.6k ops/s.
Concurrent scenario skipped (Queue.Empty bug in irmin-fs Eio pool).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The comparison with Lavyek is not apples-to-apples: irmini/Lavyek
does raw backend read/write, while irmin-pack goes through the full
Irmin stack (tree, inodes, pack file, index). Also irmin-pack
serializes all writes behind a single mutex on the append-only pack.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
100 fibers across 12 domains, each writing to its own branch to avoid
CAS contention, reading from main. irmin-pack: ~1.5-1.7k ops/s,
~6× faster than irmini disk but far behind Lavyek (447k ops/s).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run irmin-pack benchmarks on both `eio` and
`cuihtlauac-inline-small-objects-v2` branches. Inlining gives ~25%
boost on irmin-pack commits; reads equally fast thanks to LRU cache.
Fix bench_irmin_pack.ml syntax (split KV functor application).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add results from official Irmin on both the eio branch and the
cuihtlauac-inline-small-objects-v2 branch (Eio + small object inlining).
All runs use the same configuration (50 commits, 500 adds, depth 10).
Irmin-Eio is ~300× faster on commits and ~160× on reads thanks to
inode-based structural sharing. Inlining has marginal impact on these
benchmarks (100-byte values, in-memory backend).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document scenarios, CLI options, file layout and reference results
from a 50-commit / 500-adds run on a 12-core machine. Highlights
Lavyek's 1700× advantage on concurrent workloads.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test codec round-trip with inlined entries, and tree write/read with
inlining enabled (both flat and nested paths).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend Codec.S with a `Contents_inlined of string` entry variant that
allows small content values to be stored directly in parent tree nodes
instead of as separate content-addressed blobs. This reduces backend
lookups for small values.
Key changes:
- Codec.S: new `entry` type with `Contents_inlined`, `inline_threshold`
- Codec.Git: wrapped node type supporting both standard Git tree entries
  and inlined entries, with versioned serialization (v0 backward compat,
  v1 with inlined data)
- Tree: `hash` takes optional `~inline_threshold` parameter; contents
  at or below the threshold are embedded in the parent node
- Proof: handles inlined entries in find/list/add/build_proof_tree
The default threshold is 0 (no inlining), preserving backward
compatibility. Callers opt in by passing ~inline_threshold:48 to
Tree.hash. Git interop backends should use the default (0) since
Git's object store cannot represent the extended format.
Inspired by Irmin's cuihtlauac-inline-small-objects-v2 branch.
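
The inlining decision might look roughly like the sketch below. The `entry`
type here is a cut-down stand-in for the one in Codec.S, and
`entry_of_contents` and its `~hash` argument are hypothetical names used
only for illustration. The sketch assumes a threshold of 0 disables
inlining entirely.

```ocaml
(* Simplified stand-in for the codec entry type (assumption, not Codec.S). *)
type entry =
  | Contents of string         (* hash of a separately stored blob *)
  | Contents_inlined of string (* raw value embedded in the parent node *)

(* Hypothetical helper: embed values at or below the threshold, otherwise
   refer to them by hash. A threshold of 0 means no inlining. *)
let entry_of_contents ?(inline_threshold = 0) ~hash value =
  if inline_threshold > 0 && String.length value <= inline_threshold then
    Contents_inlined value
  else Contents (hash value)
```

For example, `entry_of_contents ~inline_threshold:48 ~hash v` embeds a
30-byte `v` in the node, while omitting `~inline_threshold` always yields a
hashed `Contents` entry.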
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
100 fibers distributed round-robin across available domains (12 on
this machine). Shows ~2190x throughput advantage for Lavyek over
the mutex-based disk backend under high concurrency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spawns N fibers on separate domains doing parallel reads+writes
directly on the backend. Shows Lavyek's lock-free advantage
(~2000x faster than disk backend under 8-fiber contention).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Irmin workspace ignores directories starting with _. Use bench-irmini/
as the temporary directory name and generate the dune+main files inline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use Lavyek.create (fresh) instead of open_out, and give each scenario
its own subdirectory to avoid WAL replay issues between runs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comparable multi-scenario benchmarks for official Irmin (eio branch):
- Irmin_mem in-memory backend
- Irmin-pack persistent backend
Same 4 scenarios: commits, reads, incremental, large-values.
Includes run.sh script to run both irmini and Irmin-Eio benchmarks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmark suite with 4 scenarios (commits, reads, incremental, large-values)
across 3 backends (memory, disk, lavyek). Includes comparison table output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cmdliner 1.3.0 shadows exit with a deprecated binding.
Replace all bare exit calls with Stdlib.exit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert all packages from:
(source (uri https://tangled.org/handle/repo))
to:
(source (tangled handle/repo))
This uses dune 3.21's native tangled support for cleaner source
declarations. Also removes redundant homepage/bug_reports fields
that are auto-generated from tangled sources.
- Rename directory: ocaml-build-info -> monopam-info
- Rename module: Mono_info -> Monopam_info
- Rename package: ocaml-build-info -> monopam-info
- Update all consumers to use new module name
- Remove "Skipping pull" log noise from push output
Rename ocaml-version to ocaml-build-info with Mono_info module.
All homebrew binaries now use Mono_info.version for consistent
version reporting across the monorepo.
The library wraps dune-build-info and falls back to git hash
in dev mode. Can be extended with SBOM or other metadata later.
git-subtree-dir: irmin
git-subtree-mainline: e4c45b7c50e04b8e1492a24403ad5d796eb5e49e
- ocaml-git: Add advance_head function that properly updates branch refs
  instead of writing directly to HEAD, preventing detached HEAD state
- ocaml-git: Add tests for advance_head in both branch and detached modes
- monopam: Add dune-build-info for proper versioning
- monopam: Add cram test for workflow commands
git-subtree-dir: irmin
git-subtree-mainline: e4c45b7c50e04b8e1492a24403ad5d796eb5e49e
- Simplify homebrew.yml: use `target` path instead of package/exe_name
- Rename prune to pruner (avoid conflict with graphviz)
- Add parallel uploads with rclone --transfers 16
- Fix hyphenated name parsing (git-mono was parsed as "git")
- Add timing output for build/upload/tap update steps
- Add dry-run command to preview build targets
git-subtree-dir: irmin
git-subtree-mainline: e4c45b7c50e04b8e1492a24403ad5d796eb5e49e
Wrap H2_client.close in try-catch to handle the case where fibers
are already cancelled during switch cleanup. This fixes the
"Cancelled: Stdlib.Exit" error that occurred when monopam pull
completed and the connection pool was cleaning up.
git-subtree-dir: irmin
git-subtree-mainline: e4c45b7c50e04b8e1492a24403ad5d796eb5e49e
The @ocaml-index target was failing because:
1. Test modules lacked .mli files needed for index generation
2. Test directories with multiple executables didn't specify (modules),
   causing dune to include all .ml files in each executable, so modules
   using alcotest were included in executables without that dependency
Added empty .mli stubs and explicit (modules) fields. Also added
(enabled_if (= %{context_name} "default")) to disable tests in afl context.
- Add fpath dependency to ocaml-git
- Update Git.Repository API to use Fpath.t for:
  - open_repo, open_bare, init, is_repo, git_dir
- Update all callers in irmin and monopam to use Fpath.t
- Remove duplicate functions from Git_cli that now use native ocaml-git:
  - is_repo (use Git.Repository.is_repo)
  - is_dirty (use Git.Repository.is_dirty)
  - current_branch (use Git.Repository.current_branch)
  - ahead_behind (use Git.Repository.ahead_behind)
  - ls_remote_head (unused)
  - list_remotes, get_remote_url, add_remote, remove_remote,
    set_remote_url, ensure_remote (use Git.Repository.*)
This reduces subprocess spawning in hot paths and provides type-safe
path handling throughout the git operations.
Integrates existing monorepo storage packages:
- ocaml-wal: crash-safe writes with CRC checksums
- ocaml-bloom: fast negative lookups for exists()
Write path now:
1. Write to WAL (crash-safe with CRC)
2. Write to data file
3. Update in-memory index and bloom filter
4. On flush: save index and bloom, then clear WAL
Recovery on startup:
1. Load index and bloom from disk
2. Replay any WAL entries not yet in index
Adds test for WAL crash recovery scenario.
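
The write and recovery ordering above can be modelled in memory as follows.
This is a sketch under loud assumptions: a list stands in for the WAL, a
Buffer for the data file, the bloom filter and CRC checks are omitted, and
every name (`store`, `write`, `flush`, `recover`, `read`) is hypothetical
rather than taken from the backend.

```ocaml
(* In-memory model of the write path; the real backend writes files. *)
type store = {
  mutable wal : (string * string) list;  (* pending entries, newest first *)
  data : Buffer.t;                        (* stands in for the data file *)
  index : (string, int * int) Hashtbl.t;  (* key -> (offset, length) *)
}

let create () =
  { wal = []; data = Buffer.create 64; index = Hashtbl.create 16 }

(* Steps 2-3: append to the data file, then update the index. *)
let append_data s key value =
  let off = Buffer.length s.data in
  Buffer.add_string s.data value;
  Hashtbl.replace s.index key (off, String.length value)

let write s key value =
  s.wal <- (key, value) :: s.wal; (* step 1: log before touching data *)
  append_data s key value

(* Step 4: once the index is saved, the WAL can be cleared. *)
let flush s = s.wal <- []

(* Recovery: replay WAL entries that the loaded index does not know. *)
let recover ~wal ~index ~data =
  let s = { wal; data; index } in
  List.iter
    (fun (k, v) -> if not (Hashtbl.mem s.index k) then append_data s k v)
    (List.rev wal);
  s

let read s key =
  Option.map
    (fun (off, len) -> Buffer.sub s.data off len)
    (Hashtbl.find_opt s.index key)
```

Logging first means an entry that reached the WAL but not the index is
still recoverable on the next startup.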
Implements Backend.Disk module for persistent storage:
- objects.data: append-only file for all objects
- objects.idx: index mapping hex hash -> (offset, length)
- refs/: directory with nested paths for refs
Design inspired by lavyek's append-only storage patterns
for high write throughput.
Features:
- Thread-safe with Eio.Mutex
- Atomic index updates via rename
- Recursive ref directory scanning
- Supports nested refs like refs/heads/main
Adds 4 tests: basic operations, persistence, refs, and batch writes.
Implements the previously stubbed diff function that compares two trees
and yields a sequence of changes (Add, Remove, Change).
The algorithm recursively traverses both trees and:
- Emits Remove for entries only in old tree
- Emits Add for entries only in new tree
- Emits Change for modified contents
- Handles subtree transitions (contents ↔ node)
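
A recursive diff of this shape can be sketched as below. The `tree` and
`change` types are simplified assumptions (string contents, association
lists for nodes), and the sketch returns a list where the real function
yields a sequence. A contents-to-node transition is reported here as
removals followed by additions, which is one possible reading of the last
bullet.

```ocaml
(* Simplified tree model (assumption, not the project's Tree type). *)
type tree = Contents of string | Node of (string * tree) list

type change =
  | Add of string list * string
  | Remove of string list * string
  | Change of string list * string * string

(* All (path, contents) pairs below a tree; [path] is kept reversed. *)
let rec leaves path = function
  | Contents c -> [ (List.rev path, c) ]
  | Node entries ->
      List.concat_map (fun (name, t) -> leaves (name :: path) t) entries

let rec diff path old_t new_t =
  match (old_t, new_t) with
  | Contents a, Contents b ->
      if a = b then [] else [ Change (List.rev path, a, b) ]
  | Node olds, Node news ->
      let names =
        List.sort_uniq compare (List.map fst olds @ List.map fst news)
      in
      List.concat_map
        (fun name ->
          let sub = name :: path in
          match (List.assoc_opt name olds, List.assoc_opt name news) with
          | Some o, Some n -> diff sub o n
          | Some o, None ->
              List.map (fun (p, c) -> Remove (p, c)) (leaves sub o)
          | None, Some n -> List.map (fun (p, c) -> Add (p, c)) (leaves sub n)
          | None, None -> [])
        names
  | _ ->
      (* contents <-> node transition: drop the old side, add the new one *)
      List.map (fun (p, c) -> Remove (p, c)) (leaves path old_t)
      @ List.map (fun (p, c) -> Add (p, c)) (leaves path new_t)
```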
Import test scenarios from NASA's HDTN (High-rate Delay Tolerant Network):
- cgrTutorial: 5-node network from pyCGR tutorial (16 contacts)
- contactPlan_RoutingTest: 6-node routing scenario (8 contacts)
Test cases verify:
- Multi-hop routing with OWLT (1->3->4->5 path)
- Contact timing constraints (waiting for contact windows)
- Unidirectional links (no reverse path)
- Path selection (choosing faster route via node 2 vs 3)
Reference: https://github.com/nasa/HDTN
The arrival_time function was using departure_time (contact start) as the
initial time, but should use start_time (query time when we were at the
source). This caused incorrect arrival times when the query time was after
a contact's start time.
Found by fuzz testing.
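
A sketch of the corrected behaviour, assuming the usual contact-graph rule
that a hop departs at the later of the current time and the contact start.
The `contact` record and the fold over a path are hypothetical and only
illustrate why seeding with the query time matters.

```ocaml
(* Hypothetical contact record: a time window plus one-way light time. *)
type contact = { start : float; stop : float; owlt : float }

(* Walk the path starting from the query time, not the contact start.
   Returns None if a contact window has already closed on arrival. *)
let arrival_time ~start_time path =
  List.fold_left
    (fun t c ->
      match t with
      | None -> None
      | Some t ->
          let depart = Float.max t c.start in
          if depart >= c.stop then None else Some (depart +. c.owlt))
    (Some start_time) path
```

With a contact open from 10 to 100 and a query at time 50, departure is at
50, whereas seeding with the contact start would wrongly depart at 10.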
ocaml-git:
- Add Repository module for high-level Git access
- Transparent loose object + pack file reading
- Zlib compression/decompression for loose objects
- Lazy pack file loading
irmin:
- Simplify git_interop to use Git.Repository directly
- Add open_git for working directory repos
- Pack files handled transparently
Also resolves bundle opam conflict (use tangled.org).
- Regenerate opam files after source URL updates
- Add README files for irmin, git
- Add hash serialization ops (hash_to_bytes, hash_to_hex, hash_of_hex,
  hash_equal, hash_compare) to Tree_format.S
- Add commit operations to Tree_format.S using ocaml-git for Git format
- Implement MST commit parsing/encoding using atp Dagcbor
- Fix closed vs open variant types in node_of_bytes and read_object
- Remove unused helper functions from git_interop.ml
- Simplify commit.ml to delegate to Tree_format