html5rw#
Pure OCaml HTML5 parser compiled to JavaScript and WebAssembly via js_of_ocaml.
Note: This package is browser-only. It uses DOM APIs and browser events for initialization and cannot be used in Node.js.
html5rw implements the WHATWG HTML5 specification and passes the html5lib-tests conformance suite. The implementation was produced by transpiling https://github.com/validator/validator into OCaml.
Installation#
npm install html5rw-jsoo
Usage (Browser Only)#
JavaScript Version#
<!DOCTYPE html>
<html>
<head>
<script src="node_modules/html5rw/htmlrw.js"></script>
</head>
<body>
<script>
// The library initializes on DOMContentLoaded
// API documentation coming soon
</script>
</body>
</html>
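Because the library initializes on DOMContentLoaded, page code that depends on it has to run after that event. The following is a minimal sketch of one way to wait for it. `whenDomContentLoaded` is a hypothetical helper, not part of html5rw, and it only guarantees that DOMContentLoaded listeners have run. Whether the library has finished any asynchronous setup by then is an assumption that this README does not confirm.

```javascript
// Hypothetical helper (not part of html5rw): resolves only after
// DOMContentLoaded has been dispatched on the given document.
function whenDomContentLoaded(doc) {
  return new Promise((resolve) => {
    // 'complete' is only reached after DOMContentLoaded has fired.
    if (doc.readyState === 'complete') {
      resolve();
      return;
    }
    const done = () => {
      doc.removeEventListener('DOMContentLoaded', done);
      doc.removeEventListener('readystatechange', onState);
      // Defer to a later task so every other DOMContentLoaded listener
      // (on document or window) has had its turn first.
      setTimeout(resolve, 0);
    };
    // Covers the case where DOMContentLoaded fired before we subscribed
    // but the document has not yet reached 'complete'.
    const onState = () => {
      if (doc.readyState === 'complete') done();
    };
    doc.addEventListener('DOMContentLoaded', done);
    doc.addEventListener('readystatechange', onState);
  });
}

// Browser usage; skipped when no DOM is present.
if (typeof document !== 'undefined') {
  whenDomContentLoaded(document).then(() => {
    console.log('DOMContentLoaded handlers have run');
  });
}
```

Waiting on `readystatechange` as well as `DOMContentLoaded` keeps the helper from resolving early when it is called from a deferred or module script, where `readyState` is already `interactive` before the event fires.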
WebAssembly Version#
<!DOCTYPE html>
<html>
<head>
<script src="node_modules/html5rw/htmlrw.wasm.js"></script>
</head>
<body>
<script>
// Same API as JavaScript version, but runs as WASM
// Automatically loads WASM modules from htmlrw_js_main.bc.wasm.assets/
</script>
</body>
</html>
Web Worker (Background Validation)#
For non-blocking HTML validation in a separate thread:
const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');
worker.onmessage = (e) => {
  console.log('Validation result:', e.data);
};
worker.postMessage({ html: '<div><p>Hello' });
WASM version:
const worker = new Worker('node_modules/html5rw/htmlrw-worker.wasm.js');
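The message exchange above can be wrapped in a Promise so that each validation request reads like an ordinary async call. This is only a sketch under stated assumptions: `validateInWorker` and `timeoutMs` are hypothetical names, the worker is assumed to reply exactly once per `{ html }` message, and the reply is passed through untouched because its shape is not documented here.

```javascript
// Hypothetical helper (not part of html5rw): one request, one reply.
// Assumes the worker answers each { html } message with a single message.
function validateInWorker(worker, html, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    // Give up if the worker never answers (e.g. the script failed to load).
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error('Timed out waiting for validation result'));
    }, timeoutMs);

    function cleanup() {
      clearTimeout(timer);
      worker.removeEventListener('message', onMessage);
      worker.removeEventListener('error', onError);
    }
    function onMessage(e) {
      cleanup();
      resolve(e.data); // reply shape is undocumented; returned as-is
    }
    function onError(e) {
      cleanup();
      reject(new Error(e.message || 'Worker error'));
    }

    // Subscribe before posting so a fast reply cannot be missed.
    worker.addEventListener('message', onMessage);
    worker.addEventListener('error', onError);
    worker.postMessage({ html });
  });
}

// Browser usage, with the worker path taken from the example above:
//   const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');
//   validateInWorker(worker, '<div><p>Hello').then(console.log);
```

The helper does not correlate replies with requests, so overlapping calls on the same worker could receive each other's results. Awaiting each call before issuing the next avoids that.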
Files Included#
| File | Description |
|---|---|
| htmlrw.js | Main library (JavaScript) |
| htmlrw.wasm.js | Main library (WebAssembly loader) |
| htmlrw-worker.js | Web Worker (JavaScript) |
| htmlrw-worker.wasm.js | Web Worker (WebAssembly loader) |
| htmlrw-tests.js | Browser test runner (JavaScript) |
| htmlrw-tests.wasm.js | Browser test runner (WebAssembly loader) |
| htmlrw_js_main.bc.wasm.assets/ | WASM modules for main library |
| htmlrw_js_worker.bc.wasm.assets/ | WASM modules for web worker |
| htmlrw_js_tests_main.bc.wasm.assets/ | WASM modules for test runner |
Features#
- Full HTML5 parsing per WHATWG specification
- Encoding detection and conversion
- Error recovery (like browsers)
- CSS selector queries
- DOM manipulation
- HTML serialization
Source Code#
The OCaml source code is available on the main branch:
https://tangled.org/anil.recoil.org/ocaml-html5rw
License#
MIT