OCaml HTML5 parser/serialiser based on Python's JustHTML
# html5rw

Pure OCaml HTML5 parser compiled to JavaScript and WebAssembly via js_of_ocaml.

**Note: This package is browser-only.** It uses DOM APIs and browser events for initialization and cannot be used in Node.js.

This is a fully compliant HTML5 parser implementing the [WHATWG HTML5 specification](https://html.spec.whatwg.org/multipage/parsing.html), passing the html5lib-tests conformance suite. It is based on transpiling <https://github.com/validator/validator> into OCaml.

## Installation

```bash
npm install html5rw-jsoo
```

## Usage (Browser Only)

### JavaScript Version

```html
<!DOCTYPE html>
<html>
<head>
  <script src="node_modules/html5rw/htmlrw.js"></script>
</head>
<body>
  <script>
    // The library initializes on DOMContentLoaded
    // API documentation coming soon
  </script>
</body>
</html>
```

### WebAssembly Version

```html
<!DOCTYPE html>
<html>
<head>
  <script src="node_modules/html5rw/htmlrw.wasm.js"></script>
</head>
<body>
  <script>
    // Same API as JavaScript version, but runs as WASM
    // Automatically loads WASM modules from htmlrw_js_main.bc.wasm.assets/
  </script>
</body>
</html>
```

### Web Worker (Background Validation)

For non-blocking HTML validation in a separate thread:

```javascript
const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');

worker.onmessage = (e) => {
  console.log('Validation result:', e.data);
};

worker.postMessage({ html: '<div><p>Hello' });
```

WASM version:

```javascript
const worker = new Worker('node_modules/html5rw/htmlrw-worker.wasm.js');
```

## Files Included

| File | Description |
|------|-------------|
| `htmlrw.js` | Main library (JavaScript) |
| `htmlrw.wasm.js` | Main library (WebAssembly loader) |
| `htmlrw-worker.js` | Web Worker (JavaScript) |
| `htmlrw-worker.wasm.js` | Web Worker (WebAssembly loader) |
| `htmlrw-tests.js` | Browser test runner (JavaScript) |
| `htmlrw-tests.wasm.js` | Browser test runner (WebAssembly loader) |
| `htmlrw_js_main.bc.wasm.assets/` | WASM modules for main library |
| `htmlrw_js_worker.bc.wasm.assets/` | WASM modules for web worker |
| `htmlrw_js_tests_main.bc.wasm.assets/` | WASM modules for test runner |

## Features

- Full HTML5 parsing per WHATWG specification
- Encoding detection and conversion
- Error recovery (like browsers)
- CSS selector queries
- DOM manipulation
- HTML serialization

## Source Code

The OCaml source code is available on the `main` branch:
https://tangled.org/anil.recoil.org/ocaml-html5rw

## License

MIT
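
## Example: Promise Wrapper for the Web Worker

The Web Worker usage shown earlier sends `{ html: ... }` with `postMessage` and reads the result from `e.data`. The sketch below wraps that round trip in a Promise so it can be used with `async`/`await`. `validateInWorker` is a hypothetical helper, not part of html5rw, and it assumes the worker posts exactly one reply per request; the shape of the reply is not documented here, so it is passed through untouched.

```javascript
// Hypothetical helper (not part of html5rw).
// Assumption: the worker answers each { html } message with exactly one reply.
function validateInWorker(worker, html, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    // Give up if the worker never answers.
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error('Validation timed out'));
    }, timeoutMs);

    // Remove listeners and the timer so the worker can be reused.
    function cleanup() {
      clearTimeout(timer);
      worker.removeEventListener('message', onMessage);
      worker.removeEventListener('error', onError);
    }

    function onMessage(e) {
      cleanup();
      resolve(e.data); // reply is passed through as-is
    }

    function onError(e) {
      cleanup();
      reject(e);
    }

    worker.addEventListener('message', onMessage);
    worker.addEventListener('error', onError);
    worker.postMessage({ html });
  });
}

// Usage in a browser (path taken from the Web Worker section):
//   const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');
//   validateInWorker(worker, '<div><p>Hello')
//     .then((result) => console.log('Validation result:', result))
//     .catch((err) => console.error(err));
```

Because the helper registers listeners per call and removes them afterwards, it should only be used for one request at a time on a given worker; overlapping requests would need a request id, which the worker's message format is not known to support.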