# h264-decoder-rs

A pure Rust H.264 (AVC) decoder library and CLI tool.

Parses H.264 Annex B bytestreams using the `h264-reader` crate and decodes frames into raw YUV/RGB output, suitable for use by downstream video encoders such as rav1e (AV1) or display pipelines.
## Project structure

```
h264-decoder-rs/
├── Cargo.toml                  # Workspace root
├── crates/
│   ├── h264-decode/            # Core decoder library
│   │   └── src/
│   │       ├── lib.rs          # Public API surface
│   │       ├── decode.rs       # Decoder + DecoderConfig
│   │       ├── dpb.rs          # Decoded Picture Buffer (reference management + reordering)
│   │       ├── error.rs        # Error and warning types
│   │       ├── frame.rs        # DecodedFrame, FramePlane, PictureType
│   │       ├── nal_handler.rs  # NAL unit parsing bridge (h264-reader integration)
│   │       ├── pixel.rs        # Pixel formats and YUV↔RGB conversion
│   │       └── transform.rs    # Inverse transforms (4×4, 8×8 IDCT, Hadamard) and dequantization
│   └── h264-decode-cli/        # CLI binary
│       └── src/
│           └── main.rs
├── default.nix                 # Nix package and devShell
├── justfile                    # Development commands
└── README.md
```
## Library usage

Add the dependency to your `Cargo.toml`:

```toml
[dependencies]
h264-decode = { path = "crates/h264-decode" }
```
### Decode to YUV 4:2:0 (native output)

```rust
use h264_decode::{Decoder, DecoderConfig, PixelFormat};

let config = DecoderConfig::new().pixel_format(PixelFormat::Yuv420p);
let mut decoder = Decoder::new(config);

// Feed raw H.264 Annex B bytestream data (can be called incrementally)
let h264_data: &[u8] = &[ /* ... */ ];
let frames = decoder.decode(h264_data).expect("decode failed");

for frame in &frames {
    // Access individual YUV planes
    let y = frame.y_plane().data();
    let u = frame.u_plane().data();
    let v = frame.v_plane().data();

    println!(
        "decoded {}x{} frame, poc={}, type={:?}",
        frame.width(),
        frame.height(),
        frame.pic_order_cnt(),
        frame.picture_type(),
    );
}

// Flush remaining frames at end of stream
let trailing = decoder.flush().expect("flush failed");
```
### Decode to RGB24 (automatic conversion)

```rust
use h264_decode::{Decoder, DecoderConfig, PixelFormat};

let config = DecoderConfig::new()
    .pixel_format(PixelFormat::Rgb24)
    .colour_matrix(h264_decode::pixel::ColourMatrix::Bt709);
let mut decoder = Decoder::new(config);

let h264_data: &[u8] = &[ /* ... */ ];
let frames = decoder.decode(h264_data).unwrap();

for frame in &frames {
    // Single packed RGB plane: R, G, B, R, G, B, ...
    let rgb_data = frame.plane(0).data();
    assert_eq!(rgb_data.len(), (frame.width() * frame.height() * 3) as usize);
}
```
### Convert between formats on demand

```rust
use h264_decode::Decoder;

// Decode as native YUV 4:2:0
let mut decoder = Decoder::with_defaults();

let h264_data: &[u8] = &[ /* ... */ ];
let frames = decoder.decode(h264_data).unwrap();

// Convert individual frames as needed
for frame in &frames {
    let rgb_frame = frame.to_rgb24();
    let rgba_frame = frame.to_rgba32();
    let nv12_frame = frame.to_nv12();
}
```
### Feeding a downstream encoder (e.g. rav1e)

The decoder's native YUV 4:2:0 planar output maps directly to what video encoders expect:

```rust
use h264_decode::{Decoder, DecoderConfig, PixelFormat};

let mut decoder = Decoder::new(DecoderConfig::new().pixel_format(PixelFormat::Yuv420p));

let h264_data: &[u8] = &[ /* ... */ ];
let frames = decoder.decode(h264_data).unwrap();

for frame in &frames {
    let y_plane = frame.y_plane();
    let u_plane = frame.u_plane();
    let v_plane = frame.v_plane();

    // Feed planes to rav1e, x264, or any other encoder:
    // encoder.send_frame(y_plane.data(), u_plane.data(), v_plane.data(),
    //                    frame.width(), frame.height());
}
```
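Some encoder APIs take one contiguous I420 buffer rather than three separate plane slices. Packing is a straight concatenation in Y, U, V order, with chroma planes at half the luma resolution in each axis (rounded up for odd dimensions). The `pack_i420` helper below is a hypothetical sketch, not part of the crate:

```rust
// Hypothetical sketch: pack three YUV 4:2:0 planes into one contiguous
// I420 buffer (Y plane, then U, then V), the layout many encoder APIs
// accept. Assumes tightly packed planes (stride == width).
fn pack_i420(y: &[u8], u: &[u8], v: &[u8], width: usize, height: usize) -> Vec<u8> {
    // 4:2:0 chroma dimensions: half the luma size, rounded up.
    let (cw, ch) = ((width + 1) / 2, (height + 1) / 2);
    assert_eq!(y.len(), width * height);
    assert_eq!(u.len(), cw * ch);
    assert_eq!(v.len(), cw * ch);

    let mut buf = Vec::with_capacity(y.len() + u.len() + v.len());
    buf.extend_from_slice(y);
    buf.extend_from_slice(u);
    buf.extend_from_slice(v);
    buf
}

fn main() {
    // 4x2 frame: 8 luma samples, two 2x1 chroma planes of 2 samples each.
    let y = [16u8; 8];
    let u = [128u8; 2];
    let v = [128u8; 2];
    let packed = pack_i420(&y, &u, &v, 4, 2);
    assert_eq!(packed.len(), 12);
    println!("packed {} bytes", packed.len());
}
```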
## Supported output pixel formats

| Format | Description | Planes | Use case |
|---|---|---|---|
| `Yuv420p` | YUV 4:2:0 planar (I420) | 3 (Y, U, V) | Video encoders (rav1e, x264), default |
| `Yuv422p` | YUV 4:2:2 planar | 3 (Y, U, V) | Professional video |
| `Yuv444p` | YUV 4:4:4 planar | 3 (Y, U, V) | Lossless workflows |
| `Nv12` | NV12 semi-planar | 2 (Y, UV interleaved) | Hardware APIs (VA-API, DXVA) |
| `Rgb24` | Packed 24-bit RGB | 1 | Image processing, display |
| `Rgba32` | Packed 32-bit RGBA | 1 | Compositing, UI rendering |
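As a concrete illustration of the NV12 row above: NV12 keeps the full-resolution Y plane of I420 but stores chroma as a single half-resolution plane of interleaved U/V byte pairs. The `interleave_uv` helper here is hypothetical, not crate API:

```rust
// Hypothetical sketch of building the NV12 chroma plane: interleave the
// planar U and V samples into U0 V0 U1 V1 ... byte pairs.
fn interleave_uv(u: &[u8], v: &[u8]) -> Vec<u8> {
    assert_eq!(u.len(), v.len());
    u.iter()
        .zip(v)
        .flat_map(|(&cb, &cr)| [cb, cr])
        .collect()
}

fn main() {
    let uv = interleave_uv(&[1, 3], &[2, 4]);
    assert_eq!(uv, vec![1, 2, 3, 4]);
    println!("ok");
}
```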
## CLI usage

```sh
# Build the CLI binary
cargo build --release -p h264-decode-cli

# Decode H.264 to raw YUV 4:2:0
h264-decode input.264 -o output.yuv

# Decode to RGB24
h264-decode input.264 -o output.rgb -f rgb24

# Decode to NV12 with BT.709 matrix
h264-decode input.264 -o output.nv12 -f nv12 -m bt709

# Print stream info with verbose logging
h264-decode input.264 -v

# Decode only the first 10 frames
h264-decode input.264 -o output.yuv --max-frames 10

# Write raw frames to stdout (for piping)
h264-decode input.264 -o - | ffplay -f rawvideo -pix_fmt yuv420p -s 1920x1080 -
```
## Architecture

### Decoder pipeline

```
Annex B bytestream
        │
        ▼
┌───────────────────┐
│ Start-code scan   │  Split bytestream on 00 00 01 / 00 00 00 01
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ NAL unit parse    │  Classify NAL type, remove emulation prevention bytes
│ (h264-reader)     │  Parse SPS, PPS, slice headers
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ Entropy decode    │  CAVLC / CABAC → transform coefficients
│                   │  (Exp-Golomb reader for headers)
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ Dequantize +      │  Level-scale matrices, 4×4 / 8×8 inverse integer DCT,
│ Inverse transform │  Hadamard for DC coefficients
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ Prediction +      │  Intra prediction / motion-compensated inter prediction
│ Reconstruction    │  Add residual, clip to pixel range
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ Deblocking filter │  In-loop filter to reduce blocking artifacts
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ DPB management    │  Reference frame storage, sliding-window / MMCO marking,
│                   │  picture reordering by POC for display output
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ Format convert    │  Optional YUV→RGB / NV12 conversion
│ (pixel.rs)        │  BT.601 or BT.709 colour matrix
└───────┬───────────┘
        │
        ▼
  DecodedFrame
```
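The first two pipeline stages can be sketched in plain Rust (illustrative only; the crate delegates real NAL parsing to h264-reader, and this sketch simply trims the extra leading zero of four-byte start codes as a trailing zero on the previous unit):

```rust
// Sketch of start-code scanning: split an Annex B stream on 00 00 01,
// treating 00 00 00 01 as a three-byte code with one extra zero.
fn split_nal_units(data: &[u8]) -> Vec<&[u8]> {
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 2 < data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            starts.push(i + 3); // NAL payload begins after the start code
            i += 3;
        } else {
            i += 1;
        }
    }
    let mut units = Vec::new();
    for (n, &start) in starts.iter().enumerate() {
        let mut end = if n + 1 < starts.len() { starts[n + 1] - 3 } else { data.len() };
        // A four-byte start code 00 00 00 01 leaves one trailing zero here.
        while end > start && data[end - 1] == 0 {
            end -= 1;
        }
        units.push(&data[start..end]);
    }
    units
}

// Sketch of emulation prevention removal: drop the 03 byte that the
// encoder inserts after any 00 00 inside a NAL unit.
fn strip_emulation_prevention(nal: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(nal.len());
    let mut zeros = 0;
    for &b in nal {
        if zeros >= 2 && b == 3 {
            zeros = 0; // skip the emulation prevention byte
            continue;
        }
        zeros = if b == 0 { zeros + 1 } else { 0 };
        out.push(b);
    }
    out
}

fn main() {
    // Two NAL units: a four-byte and a three-byte start code.
    let annexb = [0u8, 0, 0, 1, 0x67, 0xAA, 0, 0, 1, 0x68, 0xBB];
    let nals = split_nal_units(&annexb);
    assert_eq!(nals, vec![&[0x67, 0xAA][..], &[0x68, 0xBB][..]]);
    assert_eq!(strip_emulation_prevention(&[0, 0, 3, 1]), vec![0, 0, 1]);
    println!("parsed {} NAL units", nals.len());
}
```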
### Key components

- `Decoder`: Top-level entry point. Accepts incremental Annex B data and produces decoded frames in display order.
- `DecoderConfig`: Builder for configuring output format, colour matrix, deblocking, and reference frame limits.
- `DecodedFrame`: Decoded picture with per-plane accessors, metadata (POC, picture type, frame number), and format conversion methods.
- `Dpb`: Decoded Picture Buffer implementing H.264 reference frame management (sliding window, MMCO, short/long-term marking) and display reordering by picture order count.
- `transform`: Normative 4×4 and 8×8 inverse integer DCT, Hadamard transforms for DC coefficients, dequantization with level-scale matrices, and zig-zag scan orders.
- `pixel`: Pixel format definitions and BT.601/BT.709 YUV↔RGB conversion routines.
- `nal_handler`: NAL unit classification, SPS/PPS storage, slice header parsing with an exp-Golomb reader, and an event-driven accumulator for the decoder.
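The exp-Golomb reader used for slice header parsing can be illustrated with a minimal unsigned `ue(v)` decoder (a sketch, not the crate's implementation): count leading zero bits, read that many bits after the marker `1` bit, and the value is `2^n - 1 + suffix`:

```rust
// Minimal MSB-first bit reader and unsigned exp-Golomb ue(v) decoder,
// as used for H.264 header syntax elements. Illustrative sketch only.
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // bit position from the start of `data`
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    fn read_bit(&mut self) -> u32 {
        let bit = (self.data[self.pos / 8] >> (7 - self.pos % 8)) & 1;
        self.pos += 1;
        bit as u32
    }

    // ue(v): n leading zeros, a 1 bit, then n suffix bits.
    fn read_ue(&mut self) -> u32 {
        let mut zeros = 0;
        while self.read_bit() == 0 {
            zeros += 1;
        }
        let mut suffix = 0;
        for _ in 0..zeros {
            suffix = (suffix << 1) | self.read_bit();
        }
        (1 << zeros) - 1 + suffix
    }
}

fn main() {
    // 0x80 = 1000_0000: the single bit "1" decodes to 0.
    let mut r = BitReader::new(&[0x80]);
    assert_eq!(r.read_ue(), 0);
    // 0x28 = 0010_1000: the codeword "00101" decodes to 4.
    let mut r = BitReader::new(&[0x28]);
    assert_eq!(r.read_ue(), 4);
    println!("ok");
}
```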
## Development

```sh
# Run all checks
just all

# Build
just build

# Run tests
just test

# Run clippy
just clippy

# Format code
just fmt

# Open documentation
just doc
```
### Nix

```sh
# Enter the devShell
direnv allow   # or: nix develop .#h264-decoder-rs

# Build the package
nix build .#h264-decoder-rs
```
## Current status
The project provides a complete, compiling decoder scaffold with:
- ✅ Annex B bytestream parsing and start-code scanning
- ✅ NAL unit classification and emulation prevention byte removal
- ✅ SPS parsing (profile, level, dimensions, chroma format, POC type, reference frame limits)
- ✅ PPS parsing (bootstrap)
- ✅ Slice header parsing (slice type, frame_num, PPS reference)
- ✅ Picture order count computation (types 0 and 2)
- ✅ Decoded Picture Buffer with full reference management (sliding window, MMCO, short/long-term marking)
- ✅ Reference picture list construction (list 0 and list 1)
- ✅ Display reordering by POC
- ✅ 4×4 and 8×8 inverse integer DCT transforms
- ✅ 4×4 and 2×2 inverse Hadamard transforms for DC coefficients
- ✅ Dequantization with H.264 level-scale matrices
- ✅ Zig-zag scan orders (4×4 and 8×8)
- ✅ Residual reconstruction with clipping
- ✅ YUV 4:2:0 → RGB24 / RGBA32 / NV12 conversion (BT.601 and BT.709)
- ✅ Full `DecodedFrame` type with per-plane access and format conversion
- ✅ CLI tool with clap argument parsing
- ✅ Comprehensive test suite
## Planned work
- 🔲 Full CAVLC entropy decoding of macroblock residual data
- 🔲 CABAC entropy decoding
- 🔲 Intra prediction modes (4×4, 8×8, 16×16 luma + chroma)
- 🔲 Motion-compensated inter prediction (P and B slices)
- 🔲 Sub-pixel interpolation (quarter-pel luma, eighth-pel chroma)
- 🔲 In-loop deblocking filter
- 🔲 Weighted prediction
- 🔲 High profile 8×8 transform support
- 🔲 Multiple slice groups
- 🔲 Field/MBAFF coding
- 🔲 10-bit and higher bit-depth support
## License
MIT