OCaml HTML5 parser/serialiser based on Python's JustHTML
# html5rw

Pure OCaml HTML5 parser compiled to JavaScript and WebAssembly via js_of_ocaml.

**Note: This package is browser-only.** It uses DOM APIs and browser events for initialization and cannot be used in Node.js.

This is a fully compliant HTML5 parser implementing the [WHATWG HTML5 specification](https://html.spec.whatwg.org/multipage/parsing.html), passing the html5lib-tests conformance suite. It is based on transpiling <https://github.com/validator/validator> into OCaml.

## Installation

```bash
npm install html5rw-jsoo
```

## Usage (Browser Only)

### JavaScript Version

```html
<!DOCTYPE html>
<html>
<head>
  <script src="node_modules/html5rw/htmlrw.js"></script>
</head>
<body>
  <script>
    // The library initializes on DOMContentLoaded
    // API documentation coming soon
  </script>
</body>
</html>
```

### WebAssembly Version

```html
<!DOCTYPE html>
<html>
<head>
  <script src="node_modules/html5rw/htmlrw.wasm.js"></script>
</head>
<body>
  <script>
    // Same API as JavaScript version, but runs as WASM
    // Automatically loads WASM modules from htmlrw_js_main.bc.wasm.assets/
  </script>
</body>
</html>
```

### Web Worker (Background Validation)

For non-blocking HTML validation in a separate thread:

```javascript
const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');

worker.onmessage = (e) => {
  console.log('Validation result:', e.data);
};

worker.postMessage({ html: '<div><p>Hello' });
```
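The exchange above can be wrapped in a Promise so that callers can `await` a result. This is a minimal sketch, not part of html5rw: `validateInWorker` is a hypothetical helper, and it assumes the worker posts exactly one message per request, which this README does not state.

```javascript
// Hypothetical helper, not part of html5rw: wraps one request/response
// round trip with the worker in a Promise.
// Assumption: the worker posts exactly one message per { html } request.
function validateInWorker(worker, html) {
  return new Promise((resolve, reject) => {
    // Detach both listeners once the request has settled.
    const cleanup = () => {
      worker.removeEventListener('message', onMessage);
      worker.removeEventListener('error', onError);
    };
    const onMessage = (e) => {
      cleanup();
      resolve(e.data);
    };
    const onError = (err) => {
      cleanup();
      reject(err);
    };
    worker.addEventListener('message', onMessage);
    worker.addEventListener('error', onError);
    worker.postMessage({ html });
  });
}

// Browser usage:
//   const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');
//   validateInWorker(worker, '<div><p>Hello')
//     .then((result) => console.log('Validation result:', result));
```

Replies carry no request identifier in this sketch, so calls should be issued one at a time per worker.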

WASM version:

```javascript
const worker = new Worker('node_modules/html5rw/htmlrw-worker.wasm.js');
```
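Since the two worker builds differ only in file name, the choice between them could be made at runtime. This is a minimal sketch: it assumes both builds accept the same `{ html }` message (not stated explicitly above) and are served from the path used in these examples. `pickWorkerScript` is a hypothetical helper, not part of html5rw.

```javascript
// Hypothetical helper, not part of html5rw: returns the URL of the
// WebAssembly worker build when WASM is supported, else the JavaScript build.
function pickWorkerScript(baseUrl, wasmSupported) {
  const file = wasmSupported ? 'htmlrw-worker.wasm.js' : 'htmlrw-worker.js';
  // Drop any trailing slashes so the joined path has exactly one separator.
  return baseUrl.replace(/\/+$/, '') + '/' + file;
}

// Browser usage:
//   const url = pickWorkerScript('node_modules/html5rw',
//                                typeof WebAssembly === 'object');
//   const worker = new Worker(url);
```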

## Files Included

| File | Description |
|------|-------------|
| `htmlrw.js` | Main library (JavaScript) |
| `htmlrw.wasm.js` | Main library (WebAssembly loader) |
| `htmlrw-worker.js` | Web Worker (JavaScript) |
| `htmlrw-worker.wasm.js` | Web Worker (WebAssembly loader) |
| `htmlrw-tests.js` | Browser test runner (JavaScript) |
| `htmlrw-tests.wasm.js` | Browser test runner (WebAssembly loader) |
| `htmlrw_js_main.bc.wasm.assets/` | WASM modules for main library |
| `htmlrw_js_worker.bc.wasm.assets/` | WASM modules for web worker |
| `htmlrw_js_tests_main.bc.wasm.assets/` | WASM modules for test runner |

## Features

- Full HTML5 parsing per WHATWG specification
- Encoding detection and conversion
- Error recovery (like browsers)
- CSS selector queries
- DOM manipulation
- HTML serialization

## Source Code

The OCaml source code is available on the `main` branch:
https://tangled.org/anil.recoil.org/ocaml-html5rw

## License

MIT