OCaml HTML5 parser/serialiser based on Python's JustHTML

Initial npm package setup

- package.json for html5rw npm package
- release.sh to copy built assets from main branch
- README.md with browser-only usage instructions
- LICENSE (MIT)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+14
.gitignore
+# Build outputs
+_build/
+*.opam
+
+# Test files from main branch
+examples/
+html5lib-tests/
+third_party/
+validator/
+test.html
+broken.html
+
+# Node modules (if used locally)
+node_modules/
+21
LICENSE
+MIT License
+
+Copyright (c) 2024 Anil Madhavapeddy
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+107
README.md
+# html5rw
+
+Pure OCaml HTML5 parser compiled to JavaScript and WebAssembly via js_of_ocaml.
+
+**Note: This package is browser-only.** It uses DOM APIs and browser events for initialization and cannot be used in Node.js.
+
+This is a fully compliant HTML5 parser implementing the [WHATWG HTML5 specification](https://html.spec.whatwg.org/multipage/parsing.html), passing the html5lib-tests conformance suite.
+
+## Installation
+
+```bash
+npm install html5rw
+```
+
+## Usage (Browser Only)
+
+### JavaScript Version
+
+```html
+<!DOCTYPE html>
+<html>
+<head>
+  <script src="node_modules/html5rw/htmlrw.js"></script>
+</head>
+<body>
+  <script>
+    // The library initializes on DOMContentLoaded
+    // API documentation coming soon
+  </script>
+</body>
+</html>
+```
+
+### WebAssembly Version (faster, recommended)
+
+```html
+<!DOCTYPE html>
+<html>
+<head>
+  <script src="node_modules/html5rw/htmlrw.wasm.js"></script>
+</head>
+<body>
+  <script>
+    // Same API as JavaScript version, but runs as WASM
+    // Automatically loads WASM modules from htmlrw_js_main.bc.wasm.assets/
+  </script>
+</body>
+</html>
+```
+
+### Web Worker (Background Validation)
+
+For non-blocking HTML validation in a separate thread:
+
+```javascript
+const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');
+
+worker.onmessage = (e) => {
+  console.log('Validation result:', e.data);
+};
+
+worker.postMessage({ html: '<div><p>Hello' });
+```
+
+WASM version:
+```javascript
+const worker = new Worker('node_modules/html5rw/htmlrw-worker.wasm.js');
+```
+
+## Files Included
+
+| File | Description |
+|------|-------------|
+| `htmlrw.js` | Main library (JavaScript) |
+| `htmlrw.wasm.js` | Main library (WebAssembly loader) |
+| `htmlrw-worker.js` | Web Worker (JavaScript) |
+| `htmlrw-worker.wasm.js` | Web Worker (WebAssembly loader) |
+| `htmlrw-tests.js` | Browser test runner (JavaScript) |
+| `htmlrw-tests.wasm.js` | Browser test runner (WebAssembly loader) |
+| `htmlrw_js_main.bc.wasm.assets/` | WASM modules for main library |
+| `htmlrw_js_worker.bc.wasm.assets/` | WASM modules for web worker |
+| `htmlrw_js_tests_main.bc.wasm.assets/` | WASM modules for test runner |
+
+## Features
+
+- Full HTML5 parsing per WHATWG specification
+- Encoding detection and conversion
+- Error recovery (like browsers)
+- CSS selector queries
+- DOM manipulation
+- HTML serialization
+
+## Browser Compatibility
+
+Requires a modern browser with:
+- ES6 support
+- WebAssembly (for WASM version)
+- Web Workers (for worker version)
+
+## Source Code
+
+The OCaml source code is available on the `main` branch:
+https://github.com/avsm/ocaml-html5rw
+
+## License
+
+MIT
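The Web Worker usage shown in the README can be wrapped in a Promise so callers can `await` a single validation result. This is only a sketch: `validateInWorker` is a hypothetical helper, not part of html5rw, and it assumes the worker replies with exactly one message per posted document, which the README does not state. The `{ html }` message shape is the one used in the README; the shape of the reply is left opaque.

```javascript
// Hypothetical helper (not part of html5rw): one request/response round trip.
// Assumption: the worker answers each posted document with exactly one message.
function validateInWorker(worker, html) {
  return new Promise((resolve, reject) => {
    // Remove both listeners once either one fires, so repeated calls
    // do not accumulate handlers on the same worker.
    const cleanup = () => {
      worker.removeEventListener('message', onMessage);
      worker.removeEventListener('error', onError);
    };
    const onMessage = (e) => {
      cleanup();
      resolve(e.data); // reply passed through unchanged
    };
    const onError = (e) => {
      cleanup();
      reject(e);
    };
    worker.addEventListener('message', onMessage);
    worker.addEventListener('error', onError);
    worker.postMessage({ html }); // message shape taken from the README
  });
}

// Browser usage, with the worker path from the README:
//   const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');
//   validateInWorker(worker, '<div><p>Hello').then(console.log);
```

Because replies are not correlated with requests here, calls on the same worker should be made one at a time.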
+36
package.json
+{
+  "name": "html5rw",
+  "version": "0.1.0",
+  "description": "Pure OCaml HTML5 parser compiled to JavaScript/WebAssembly via js_of_ocaml",
+  "browser": "htmlrw.js",
+  "homepage": "https://tangled.org/@anil.recoil.org/ocaml-html5rw",
+  "author": "Anil Madhavapeddy <anil@recoil.org>",
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/avsm/ocaml-html5rw.git#npm"
+  },
+  "keywords": [
+    "ocaml",
+    "html5",
+    "parser",
+    "js_of_ocaml",
+    "wasm",
+    "webassembly",
+    "validator",
+    "whatwg"
+  ],
+  "files": [
+    "htmlrw.js",
+    "htmlrw-worker.js",
+    "htmlrw-tests.js",
+    "htmlrw.wasm.js",
+    "htmlrw-worker.wasm.js",
+    "htmlrw-tests.wasm.js",
+    "htmlrw_js_main.bc.wasm.assets/",
+    "htmlrw_js_worker.bc.wasm.assets/",
+    "htmlrw_js_tests_main.bc.wasm.assets/",
+    "README.md",
+    "LICENSE"
+  ]
+}
+51
release.sh
+#!/bin/bash
+# Release script for html5rw npm package
+# Run from npm branch after building on main
+
+set -e
+
+# Path to dune install directory (relative to repo root)
+INSTALL_DIR="_build/install/default/share/html5rw-js"
+
+# Check we're on the npm branch
+BRANCH=$(git rev-parse --abbrev-ref HEAD)
+if [ "$BRANCH" != "npm" ]; then
+    echo "Error: Must be on npm branch (currently on $BRANCH)"
+    exit 1
+fi
+
+# Check the install directory exists
+if [ ! -d "$INSTALL_DIR" ]; then
+    echo "Error: Install directory not found at $INSTALL_DIR"
+    echo "Run 'opam exec -- dune build @install' on main branch first"
+    exit 1
+fi
+
+# Copy JavaScript files
+echo "Copying JavaScript files..."
+cp "$INSTALL_DIR/htmlrw.js" .
+cp "$INSTALL_DIR/htmlrw-worker.js" .
+cp "$INSTALL_DIR/htmlrw-tests.js" .
+
+# Copy WASM loader scripts
+echo "Copying WASM loader scripts..."
+cp "$INSTALL_DIR/htmlrw.wasm.js" .
+cp "$INSTALL_DIR/htmlrw-worker.wasm.js" .
+cp "$INSTALL_DIR/htmlrw-tests.wasm.js" .
+
+# Copy WASM assets directories
+echo "Copying WASM assets..."
+rm -rf htmlrw_js_main.bc.wasm.assets htmlrw_js_worker.bc.wasm.assets htmlrw_js_tests_main.bc.wasm.assets
+cp -r "$INSTALL_DIR/htmlrw_js_main.bc.wasm.assets" .
+cp -r "$INSTALL_DIR/htmlrw_js_worker.bc.wasm.assets" .
+cp -r "$INSTALL_DIR/htmlrw_js_tests_main.bc.wasm.assets" .
+
+# Fix permissions
+echo "Fixing permissions..."
+chmod 644 *.js
+find htmlrw_js_main.bc.wasm.assets -type f -exec chmod 644 {} \;
+find htmlrw_js_worker.bc.wasm.assets -type f -exec chmod 644 {} \;
+find htmlrw_js_tests_main.bc.wasm.assets -type f -exec chmod 644 {} \;
+
+echo "Done! Ready to commit and publish."
+echo "Run: git add -A && git commit -m 'Release X.Y.Z' && npm publish"
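release.sh copies a fixed set of files, and package.json lists the same set under `files`, so the two can drift apart. One possible guard, sketched here as a Node script to run on the `npm` branch before `npm publish`, checks that every `files` entry exists on disk. The function name `missingPackageFiles` is hypothetical and nothing like it is part of this commit.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-publish check (not part of this commit): returns the
// entries of package.json's "files" array that do not exist under `root`.
// Entries are treated as literal paths; glob patterns are not expanded.
function missingPackageFiles(root) {
  const pkgPath = path.join(root, 'package.json');
  const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf8'));
  const entries = Array.isArray(pkg.files) ? pkg.files : [];
  return entries.filter((entry) => !fs.existsSync(path.join(root, entry)));
}

// Example use from the repository root after running release.sh:
//   const missing = missingPackageFiles(process.cwd());
//   if (missing.length > 0) {
//     console.error('Missing:', missing.join(', '));
//     process.exit(1);
//   }
```

The check reads package.json rather than duplicating the file list, so it stays correct if the `files` array changes.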