# Bindle File Format
Bindle is an append-only binary archive format designed for efficient writes, logical updates via shadowing, and memory-mapped reads.
## 1. High-Level Layout
A Bindle file consists of a fixed header, a data payload area, a trailing index, and a fixed-size footer.
| Offset | Component | Description |
|---|---|---|
| `0x00` | Header | 8-byte magic identification string |
| `0x08` | Data Payload | Sequential blobs of raw or compressed data |
| Variable | Index | Sequence of entry headers and filenames |
| EOF - 16 | Footer | 16-byte tail containing the index pointer and entry count |
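To make the layout concrete, the Python sketch below builds the smallest archive the layout appears to allow: the header, an empty data segment, a zero-entry index, and the footer. It assumes an index with zero entries is valid, which the format does not state explicitly; `empty_archive`, `HEADER_MAGIC`, `FOOTER_MAGIC`, and `FOOTER` are illustrative names only.

```python
import struct

HEADER_MAGIC = b"BINDL001"      # 8-byte header (section 2.1)
FOOTER_MAGIC = 0x62626262       # "bbbb" sentinel (section 2.4)
FOOTER = struct.Struct("<QII")  # index_offset: u64, entry_count: u32, magic: u32

def empty_archive() -> bytes:
    """Header, no data blobs, an empty index (assumed valid), and the footer."""
    index_offset = len(HEADER_MAGIC)  # with no data, the index starts at 0x08
    return HEADER_MAGIC + FOOTER.pack(index_offset, 0, FOOTER_MAGIC)

print(len(empty_archive()))  # 24
```

With no data and no entries, the footer sits immediately after the header, so `index_offset` equals `EOF - 16`.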
## 2. Component Details
### 2.1 Header
Every Bindle file MUST begin with exactly 8 bytes: `42 49 4e 44 4c 30 30 31` (ASCII `BINDL001`).
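A reader can reject non-Bindle input by comparing the first 8 bytes against this magic. A minimal Python sketch; the helper name `is_bindle` is hypothetical:

```python
MAGIC = bytes.fromhex("42494e444c303031")  # equals b"BINDL001"

def is_bindle(data: bytes) -> bool:
    """Return True if data starts with the 8-byte Bindle header."""
    return data.startswith(MAGIC)

print(is_bindle(b"BINDL001" + b"\0" * 8))  # True
print(is_bindle(b"PK\x03\x04"))            # False
```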
### 2.2 Data Segment
Data blobs begin at offset 0x08.
- Alignment: Every data blob MUST be padded with null bytes to an 8-byte boundary.
- Compression: Blobs may be raw or compressed via Zstd.
- Shadowing: New versions of existing files are simply appended to the end of the data segment. The file remains append-only until a vacuum operation is performed.
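The alignment rule can be illustrated with a small Python sketch that appends a raw blob and pads it to the next 8-byte boundary. It assumes the size recorded in the index excludes the padding (the format does not say), and it leaves out Zstd compression, which would transform `payload` before it is appended. `align8` and `append_blob` are hypothetical names.

```python
def align8(n: int) -> int:
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7

def append_blob(buf: bytearray, payload: bytes) -> int:
    """Append payload, NUL-pad to an 8-byte boundary, and return its offset."""
    if len(buf) % 8:
        raise ValueError("data segment is not 8-byte aligned")
    offset = len(buf)
    buf += payload                                # bytearray is extended in place
    buf += b"\0" * (align8(len(buf)) - len(buf))  # null padding to the boundary
    return offset

buf = bytearray(b"BINDL001")            # header ends at 0x08
print(append_blob(buf, b"hello"))       # 8
print(append_blob(buf, b"0123456789"))  # 16
print(len(buf))                         # 32
```

Because every blob ends on a boundary, the next blob (or the index) always starts at an 8-byte-aligned offset.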
### 2.3 Index Entry
The index is a series of entries. Each entry consists of a 32-byte fixed metadata block followed by a variable-length filename. All multi-byte integers are stored in little-endian byte order.
| Field | Size | Type | Description |
|---|---|---|---|
| `offset` | 8 bytes | u64 | Absolute file offset of the data blob |
| `c_size` | 8 bytes | u64 | Compressed size on disk |
| `u_size` | 8 bytes | u64 | Original uncompressed size |
| `crc32` | 4 bytes | u32 | CRC32 checksum of the uncompressed data |
| `name_len` | 2 bytes | u16 | Length of the filename in bytes |
| `comp_type` | 1 byte | u8 | `0` = None, `1` = Zstd |
| `reserved` | 1 byte | u8 | Alignment padding |
| `filename` | `name_len` bytes | UTF-8 | The entry name |
Padding: After the filename, the entry MUST be padded with null bytes (`\0`) to the next 8-byte boundary before the next entry begins.
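The entry layout maps directly onto Python's `struct` module: the format `<QQQIHBB` uses little-endian standard sizes with no implicit padding, so the fixed block is exactly 32 bytes. This is a sketch; `Entry`, `encode_entry`, and `decode_entry` are hypothetical names, and writing `reserved` as zero is an assumption.

```python
import struct
import zlib
from typing import NamedTuple

ENTRY = struct.Struct("<QQQIHBB")  # offset, c_size, u_size, crc32, name_len, comp_type, reserved

class Entry(NamedTuple):
    offset: int
    c_size: int
    u_size: int
    crc32: int
    comp_type: int  # 0 = None, 1 = Zstd
    name: str

def pad8(n: int) -> int:
    """Number of NUL bytes needed to reach the next 8-byte boundary."""
    return -n % 8

def encode_entry(e: Entry) -> bytes:
    name = e.name.encode("utf-8")  # name_len counts encoded bytes
    fixed = ENTRY.pack(e.offset, e.c_size, e.u_size, e.crc32, len(name), e.comp_type, 0)
    return fixed + name + b"\0" * pad8(len(name))  # 32-byte block keeps alignment

def decode_entry(buf: bytes, pos: int) -> tuple[Entry, int]:
    """Decode the entry at pos; return it and the position of the next entry."""
    offset, c_size, u_size, crc, name_len, comp_type, _reserved = ENTRY.unpack_from(buf, pos)
    start = pos + ENTRY.size
    name = buf[start:start + name_len].decode("utf-8")
    return Entry(offset, c_size, u_size, crc, comp_type, name), start + name_len + pad8(name_len)

e = Entry(offset=8, c_size=5, u_size=5, crc32=zlib.crc32(b"hello"), comp_type=0, name="a.txt")
blob = encode_entry(e)
print(len(blob))                         # 40 (32-byte block + 5-byte name + 3 NUL)
print(decode_entry(blob, 0) == (e, 40))  # True
```

Since 32 is a multiple of 8, padding the filename length alone is enough to keep every entry boundary aligned.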
### 2.4 Footer
The last 16 bytes of the file are used to locate the index. All fields are stored in little-endian format.
| Field | Size | Type | Description |
|---|---|---|---|
| `index_offset` | 8 bytes | u64 | Absolute offset to the start of the index |
| `entry_count` | 4 bytes | u32 | Total number of unique entries in the index |
| `magic` | 4 bytes | u32 | Magic sentinel value `0x62626262` (ASCII `bbbb`) |
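A reader opens an archive by decoding the last 16 bytes. Because all four bytes of the sentinel are `0x62`, it reads the same in either byte order. The Python sketch below (`read_footer` is a hypothetical name) also range-checks `index_offset`; that check is an extra safeguard, not something the format mandates.

```python
import struct

FOOTER = struct.Struct("<QII")  # index_offset: u64, entry_count: u32, magic: u32
FOOTER_MAGIC = 0x62626262       # b"bbbb"

def read_footer(data: bytes) -> tuple[int, int]:
    """Return (index_offset, entry_count) from the last 16 bytes of an archive."""
    if len(data) < 8 + FOOTER.size:
        raise ValueError("file too small to be a Bindle archive")
    index_offset, entry_count, magic = FOOTER.unpack_from(data, len(data) - FOOTER.size)
    if magic != FOOTER_MAGIC:
        raise ValueError(f"bad footer magic {magic:#010x}")
    # The index must start after the header and end before the footer.
    if not 8 <= index_offset <= len(data) - FOOTER.size:
        raise ValueError("index_offset out of range")
    return index_offset, entry_count

archive = b"BINDL001" + FOOTER.pack(8, 0, FOOTER_MAGIC)
print(read_footer(archive))             # (8, 0)
print(struct.pack("<I", FOOTER_MAGIC))  # b'bbbb'
```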
## 3. Operational Logic
### 3.1 Shadowing & Atomic Updates
To "update" a file or add new ones:
- Append the new data blobs starting at the current `index_offset`, overwriting the previous index and footer.
- Write a new index. If a filename is repeated, the new index holds a single entry for it that points to the newest data offset.
- Write a new Footer.
- Old data remains in the file (unreferenced) until a vacuum occurs.
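The update steps can be sketched in Python against an in-memory `bytearray` holding the whole archive. This is an illustration under several assumptions: only raw (`comp_type` 0) blobs are written, `read_index` and `update` are hypothetical helpers, and the new index lists entries in dictionary order because the format does not specify an order. A file-backed version would open the archive in `r+b` mode and write from `index_offset` onward.

```python
import struct
import zlib

ENTRY = struct.Struct("<QQQIHBB")
FOOTER = struct.Struct("<QII")
FOOTER_MAGIC = 0x62626262

def pad8(n: int) -> int:
    return -n % 8

def read_index(buf: bytearray) -> tuple[int, dict[str, tuple]]:
    """Return index_offset and a map name -> (offset, c_size, u_size, crc32, comp_type)."""
    index_offset, count, magic = FOOTER.unpack_from(buf, len(buf) - FOOTER.size)
    if magic != FOOTER_MAGIC:
        raise ValueError("bad footer magic")
    entries, pos = {}, index_offset
    for _ in range(count):
        off, c, u, crc, nlen, comp, _reserved = ENTRY.unpack_from(buf, pos)
        start = pos + ENTRY.size
        entries[buf[start:start + nlen].decode("utf-8")] = (off, c, u, crc, comp)
        pos = start + nlen + pad8(nlen)
    return index_offset, entries

def update(buf: bytearray, files: dict[str, bytes]) -> None:
    """Add or shadow raw files; old blobs stay in place, unreferenced."""
    index_offset, entries = read_index(buf)
    del buf[index_offset:]  # new data starts where the old index began
    for name, data in files.items():
        off = len(buf)
        buf += data + b"\0" * pad8(len(data))
        entries[name] = (off, len(data), len(data), zlib.crc32(data), 0)  # newest offset wins
    new_index_offset = len(buf)
    for name, (off, c, u, crc, comp) in entries.items():
        raw = name.encode("utf-8")
        buf += ENTRY.pack(off, c, u, crc, len(raw), comp, 0) + raw + b"\0" * pad8(len(raw))
    buf += FOOTER.pack(new_index_offset, len(entries), FOOTER_MAGIC)

archive = bytearray(b"BINDL001" + FOOTER.pack(8, 0, FOOTER_MAGIC))
update(archive, {"a": b"hello"})
update(archive, {"a": b"world!!", "b": b"x"})
print(sorted(read_index(archive)[1]))  # ['a', 'b']
```

After the second call, `b"hello"` is still present in the data segment but no entry references it; only a vacuum removes it.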
### 3.2 Vacuuming
To reclaim space used by shadowed data:
- Create a temporary file and write the `BINDL001` header.
- Iterate through the live index entries only, copying the referenced data from the original file and recording each blob's new offset.
- Write the new Index and Footer to the temporary file.
- Atomically replace the original file with the temporary file.
- On failure, delete the temporary file.
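The vacuum procedure can be sketched in Python as follows. Assumptions: the whole archive fits in memory, the temporary file is created in the same directory so that `os.replace` renames within one filesystem (which Python documents as atomic on POSIX), and the `fsync` before the rename is an added durability step the format does not require. `vacuum` is a hypothetical name.

```python
import os
import struct
import tempfile

ENTRY = struct.Struct("<QQQIHBB")
FOOTER = struct.Struct("<QII")
FOOTER_MAGIC = 0x62626262
HEADER_MAGIC = b"BINDL001"

def pad8(n: int) -> int:
    return -n % 8

def vacuum(path: str) -> None:
    """Rewrite the archive keeping only blobs referenced by the live index."""
    with open(path, "rb") as src:
        data = src.read()
    index_offset, count, magic = FOOTER.unpack_from(data, len(data) - FOOTER.size)
    if magic != FOOTER_MAGIC:
        raise ValueError("bad footer magic")

    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.write(HEADER_MAGIC)
            out_pos, index, pos = len(HEADER_MAGIC), bytearray(), index_offset
            for _ in range(count):
                off, c, u, crc, nlen, comp, _reserved = ENTRY.unpack_from(data, pos)
                name = data[pos + ENTRY.size:pos + ENTRY.size + nlen]
                pos += ENTRY.size + nlen + pad8(nlen)
                dst.write(data[off:off + c] + b"\0" * pad8(c))  # copy live blob, re-pad
                index += ENTRY.pack(out_pos, c, u, crc, nlen, comp, 0) + name + b"\0" * pad8(nlen)
                out_pos += c + pad8(c)  # new offset of the next blob
            dst.write(index)
            dst.write(FOOTER.pack(out_pos, count, FOOTER_MAGIC))
            dst.flush()
            os.fsync(dst.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # on failure, delete the temporary file
        raise
```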
## 4. Design Rationale
- Trailing Index: Enables single-pass appending. Growing the index never requires shifting existing data.
- 8-Byte Alignment: Ensures that all 64-bit integers in the metadata and footer are naturally aligned, avoiding the performance penalties (or faults) that some architectures impose on unaligned reads.
- Zero-Copy Potential: Raw (uncompressed) data blobs can be used directly as memory slices via `mmap` without intermediate buffers.
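Because raw blobs are stored verbatim, a reader can map the archive once and hand out slices of it. A sketch using Python's `mmap` and `memoryview`; the function name `open_raw_blob` is hypothetical, and the caller is assumed to have already checked that the entry's `comp_type` is `0` and taken `offset` and `c_size` from the index.

```python
import mmap

def open_raw_blob(path: str, offset: int, size: int) -> tuple[mmap.mmap, memoryview]:
    """Map the archive read-only and return a zero-copy view of one raw blob."""
    with open(path, "rb") as f:
        # Closing the file afterwards is safe; the mapping stays valid.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mm, memoryview(mm)[offset:offset + size]  # slicing copies no bytes
```

Release the view before closing the map: `mmap.close()` raises `BufferError` while exported views are still alive.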