OCaml HTML5 parser/serialiser based on Python's JustHTML

html5rw

Pure OCaml HTML5 parser compiled to JavaScript and WebAssembly via js_of_ocaml.

Note: This package is browser-only. It uses DOM APIs and browser events for initialization and cannot be used in Node.js.

This is a fully compliant HTML5 parser that implements the WHATWG HTML5 specification and passes the html5lib-tests conformance suite. It is based on a transpilation of the Nu Html Checker (https://github.com/validator/validator) into OCaml.

Installation

npm install html5rw-jsoo

Usage (Browser Only)

JavaScript Version

<!DOCTYPE html>
<html>
<head>
  <script src="node_modules/html5rw/htmlrw.js"></script>
</head>
<body>
  <script>
    // The library initializes on DOMContentLoaded
    // API documentation coming soon
  </script>
</body>
</html>

WebAssembly Version

<!DOCTYPE html>
<html>
<head>
  <script src="node_modules/html5rw/htmlrw.wasm.js"></script>
</head>
<body>
  <script>
    // Same API as JavaScript version, but runs as WASM
    // Automatically loads WASM modules from htmlrw_js_main.bc.wasm.assets/
  </script>
</body>
</html>

Web Worker (Background Validation)

For non-blocking HTML validation in a separate thread:

const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');

worker.onmessage = (e) => {
  console.log('Validation result:', e.data);
};

worker.postMessage({ html: '<div><p>Hello' });

WASM version:

const worker = new Worker('node_modules/html5rw/htmlrw-worker.wasm.js');
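
The two worker variants can be wrapped so that calling code does not care which one is running. The following is a minimal sketch, not part of html5rw: `workerScriptUrl` and `createValidator` are hypothetical helper names, the base path is whatever directory serves the package files, and the code assumes the worker answers each `{ html }` message with exactly one message. The shape of the result is passed through untouched, since it is not documented here. Checking for the `WebAssembly` global only confirms that the API exists; it does not detect whether the engine supports every feature the WASM build may need.

```javascript
// Hypothetical helpers, not part of html5rw.

// Picks the worker script: the WASM loader when the WebAssembly API is
// present, otherwise the plain JavaScript build.
function workerScriptUrl(baseUrl, wasmSupported = typeof WebAssembly === 'object') {
  const file = wasmSupported ? 'htmlrw-worker.wasm.js' : 'htmlrw-worker.js';
  return baseUrl.replace(/\/+$/, '') + '/' + file;
}

// Returns a validate(html) function that sends one { html } message and
// resolves with the next message the worker posts back. Requests are queued
// so only one is in flight at a time (assumption: one reply per request).
function createValidator(worker) {
  let queue = Promise.resolve();
  return function validate(html) {
    const run = () => new Promise((resolve, reject) => {
      worker.onmessage = (e) => resolve(e.data);
      worker.onerror = (err) => reject(err);
      worker.postMessage({ html });
    });
    const result = queue.then(run);
    // Keep the queue usable even if this request fails.
    queue = result.catch(() => {});
    return result;
  };
}
```

In a browser this could be used as `const validate = createValidator(new Worker(workerScriptUrl('node_modules/html5rw')));` followed by `validate('<div><p>Hello').then(console.log)`. Queueing requests keeps replies matched to the right caller without assuming the worker echoes any request identifier.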

Files Included

File                                  Description
htmlrw.js                             Main library (JavaScript)
htmlrw.wasm.js                        Main library (WebAssembly loader)
htmlrw-worker.js                      Web Worker (JavaScript)
htmlrw-worker.wasm.js                 Web Worker (WebAssembly loader)
htmlrw-tests.js                       Browser test runner (JavaScript)
htmlrw-tests.wasm.js                  Browser test runner (WebAssembly loader)
htmlrw_js_main.bc.wasm.assets/        WASM modules for main library
htmlrw_js_worker.bc.wasm.assets/      WASM modules for web worker
htmlrw_js_tests_main.bc.wasm.assets/  WASM modules for test runner

Features

  • Full HTML5 parsing per WHATWG specification
  • Encoding detection and conversion
  • Error recovery (like browsers)
  • CSS selector queries
  • DOM manipulation
  • HTML serialization

Source Code

The OCaml source code is available on the main branch: https://tangled.org/anil.recoil.org/ocaml-html5rw

License

MIT