Pure OCaml B-tree implementation for persistent storage
OCaml 98.6%
Dune 0.5%
Other 0.9%
29 1 0

Clone this repository

https://tangled.org/gazagnaire.org/ocaml-btree https://tangled.org/did:plc:jhift2vwcxhou52p3sewcrpx/ocaml-btree
git@git.recoil.org:gazagnaire.org/ocaml-btree git@git.recoil.org:did:plc:jhift2vwcxhou52p3sewcrpx/ocaml-btree

For self-hosted knots, clone URLs may differ based on your setup.

Download tar.gz
README.md

btree#

Pure OCaml B-tree storage engine with SQLite-compatible file format.

Overview#

A persistent, page-based B-tree that produces valid SQLite database files. You get a fast embedded storage engine whose files can be inspected and debugged with standard sqlite3 tools.

Two B-tree variants following the SQLite file format specification:

  • Table B-tree (B+tree) — data lives only in leaves, interior nodes hold rowid keys and child pointers. Optimised for sequential scans.
  • Index B-tree — keys stored in both interior and leaf nodes. Optimised for point lookups.

Features#

  • Page sizes 512 to 65536 (powers of 2)
  • Overflow pages for payloads exceeding U - 35 bytes
  • File-backed or purely in-memory operation
  • Configurable page cache

Design choices#

The SQLite file format is an implementation choice, not a limitation. It brings free tooling (sqlite3 CLI, DB Browser, etc.) and a spec with 20+ years of battle-testing. The user-facing API is generic — persistent ordered map (Table) and persistent ordered set (Index).

What the format does not give you (compared to LMDB/sanakirja):

Feature SQLite format LMDB / sanakirja
Concurrency In-place updates + WAL/journal Copy-on-write (lock-free readers)
Range scans Via parent traversal Leaf sibling pointers
Crash safety Rollback journal or WAL Atomic root pointer swap

These are deliberate tradeoffs — compatibility and tooling vs. raw throughput on concurrent workloads.

Installation#

opam install btree

Usage#

(* File-backed table B-tree *)
let pager = Btree.Pager.v ~page_size:4096 file in
let table = Btree.Table.v pager in
Btree.Table.insert table ~rowid:1L "Hello";
Btree.Table.insert table ~rowid:2L "World";
Btree.Table.find table 1L          (* Some "Hello" *)

(* In-memory index B-tree *)
let pager = Btree.Pager.mem ~page_size:4096 () in
let index = Btree.Index.v pager in
Btree.Index.insert index "key";
Btree.Index.mem index "key"        (* true *)

(* Iterate in key order *)
Btree.Table.iter table (fun rowid data ->
    Printf.printf "%Ld: %s\n" rowid data)

Modules#

Module Purpose
Pager Page cache and file I/O (file-backed or in-memory)
Table B+tree for rowid-keyed records (int64 -> string)
Index B-tree for string key sets
Page Page header parsing, binary helpers
Cell Cell encoding/decoding (table leaf, interior, index)
Record SQLite record format (serial types, column values)
Varint Variable-length integer encoding

File format#

Page header (8 bytes leaf, 12 bytes interior)#

Offset Size Description
0 1 Page type: 0x0d leaf table, 0x05 interior table, 0x0a leaf index, 0x02 interior index
1 2 First freeblock offset (0 if none)
3 2 Cell count
5 2 Cell content area start (0 = 65536)
7 1 Fragmented free bytes (max 60)
8 4 Right-most child pointer (interior pages only)

Overflow#

When a cell payload exceeds X = U - 35 bytes (where U is usable page size), excess data is stored in a chain of overflow pages. Each overflow page has a 4-byte next-page pointer followed by up to U - 4 bytes of data. The local payload size is computed per the SQLite spec:

M = ((U - 12) * 32 / 255) - 23
K = M + ((P - M) mod (U - 4))
local = if K <= X then K else M
  • SQLite file format — the specification this library implements
  • ocaml-sqlite — database layer built on top of this library (KV API, named tables, schema)
  • LMDB — C B+tree with memory-mapped COW (different tradeoffs)
  • sanakirja — Rust COW B-tree, used by Pijul
  • bbolt — Go B+tree, used by etcd
  • Limbo — Rust SQLite reimplementation

Licence#

ISC. See LICENSE.md for details.