+14
.gitignore
+21
LICENSE
MIT License

Copyright (c) 2024 Anil Madhavapeddy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+107
README.md
# html5rw

Pure OCaml HTML5 parser compiled to JavaScript and WebAssembly via js_of_ocaml.

**Note: This package is browser-only.** It uses DOM APIs and browser events for initialization and cannot be used in Node.js.

This is a fully compliant HTML5 parser implementing the [WHATWG HTML5 specification](https://html.spec.whatwg.org/multipage/parsing.html), passing the html5lib-tests conformance suite.

## Installation

```bash
npm install html5rw
```

## Usage (Browser Only)

### JavaScript Version

```html
<!DOCTYPE html>
<html>
<head>
  <script src="node_modules/html5rw/htmlrw.js"></script>
</head>
<body>
  <script>
    // The library initializes on DOMContentLoaded
    // API documentation coming soon
  </script>
</body>
</html>
```

### WebAssembly Version (faster, recommended)

```html
<!DOCTYPE html>
<html>
<head>
  <script src="node_modules/html5rw/htmlrw.wasm.js"></script>
</head>
<body>
  <script>
    // Same API as JavaScript version, but runs as WASM
    // Automatically loads WASM modules from htmlrw_js_main.bc.wasm.assets/
  </script>
</body>
</html>
```

### Web Worker (Background Validation)

For non-blocking HTML validation in a separate thread:

```javascript
const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');

worker.onmessage = (e) => {
  console.log('Validation result:', e.data);
};

worker.postMessage({ html: '<div><p>Hello' });
```

WASM version:
```javascript
const worker = new Worker('node_modules/html5rw/htmlrw-worker.wasm.js');
```

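The message exchange in the Web Worker section can be wrapped in a Promise so that callers can chain on the result. This is a minimal sketch, not part of the package: `validateWithWorker` is a hypothetical helper name, and it assumes the worker posts exactly one message back per request, which the README does not confirm. The shape of `e.data` is undocumented, so the helper passes it through untouched.

```javascript
// Hypothetical helper (not part of html5rw): wraps one request/response
// round trip with a worker in a Promise.
// Assumption: the worker posts exactly one message back per request.
function validateWithWorker(worker, html) {
  return new Promise((resolve, reject) => {
    const onMessage = (e) => {
      cleanup();
      resolve(e.data); // result shape is undocumented; pass it through as-is
    };
    const onError = (err) => {
      cleanup();
      reject(err);
    };
    // Detach both listeners so repeated calls do not accumulate handlers.
    const cleanup = () => {
      worker.removeEventListener('message', onMessage);
      worker.removeEventListener('error', onError);
    };
    worker.addEventListener('message', onMessage);
    worker.addEventListener('error', onError);
    worker.postMessage({ html });
  });
}

// Usage in a browser (path taken from the README):
//   const worker = new Worker('node_modules/html5rw/htmlrw-worker.js');
//   validateWithWorker(worker, '<div><p>Hello').then(console.log);
```

Because the helper only relies on `addEventListener`, `removeEventListener`, and `postMessage`, the same function should work with either the JavaScript or the WASM worker script.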
## Files Included

| File | Description |
|------|-------------|
| `htmlrw.js` | Main library (JavaScript) |
| `htmlrw.wasm.js` | Main library (WebAssembly loader) |
| `htmlrw-worker.js` | Web Worker (JavaScript) |
| `htmlrw-worker.wasm.js` | Web Worker (WebAssembly loader) |
| `htmlrw-tests.js` | Browser test runner (JavaScript) |
| `htmlrw-tests.wasm.js` | Browser test runner (WebAssembly loader) |
| `htmlrw_js_main.bc.wasm.assets/` | WASM modules for main library |
| `htmlrw_js_worker.bc.wasm.assets/` | WASM modules for web worker |
| `htmlrw_js_tests_main.bc.wasm.assets/` | WASM modules for test runner |

## Features

- Full HTML5 parsing per WHATWG specification
- Encoding detection and conversion
- Error recovery (like browsers)
- CSS selector queries
- DOM manipulation
- HTML serialization

## Browser Compatibility

Requires a modern browser with:
- ES6 support
- WebAssembly (for WASM version)
- Web Workers (for worker version)

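The requirements listed under Browser Compatibility can be checked at runtime before deciding which script to load. The sketch below is illustrative only: `pickBuild` is a hypothetical function, and falling back from the WASM loader to the plain JavaScript build is a design choice of this example, not documented package behavior. The file names come from the Files Included table.

```javascript
// Hypothetical helper: choose a bundle file name from the detected features.
// `env` defaults to globalThis so the function can be exercised with a stub.
function pickBuild({ useWorker = false } = {}, env = globalThis) {
  const hasWasm =
    typeof env.WebAssembly === 'object' &&
    env.WebAssembly !== null &&
    typeof env.WebAssembly.instantiate === 'function';
  const hasWorker = typeof env.Worker === 'function';

  if (useWorker && !hasWorker) {
    throw new Error('Web Workers are not available in this environment');
  }
  const base = useWorker ? 'htmlrw-worker' : 'htmlrw';
  // Prefer the WASM loader when WebAssembly is present, else plain JavaScript.
  return hasWasm ? `${base}.wasm.js` : `${base}.js`;
}
```

A page could then build the script URL as `'node_modules/html5rw/' + pickBuild()` and inject a `<script>` element, or pass the worker variant to `new Worker(...)`.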
## Source Code

The OCaml source code is available on the `main` branch:
https://github.com/avsm/ocaml-html5rw

## License

MIT
+36
package.json
{
  "name": "html5rw",
  "version": "0.1.0",
  "description": "Pure OCaml HTML5 parser compiled to JavaScript/WebAssembly via js_of_ocaml",
  "browser": "htmlrw.js",
  "homepage": "https://tangled.org/@anil.recoil.org/ocaml-html5rw",
  "author": "Anil Madhavapeddy <anil@recoil.org>",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/avsm/ocaml-html5rw.git#npm"
  },
  "keywords": [
    "ocaml",
    "html5",
    "parser",
    "js_of_ocaml",
    "wasm",
    "webassembly",
    "validator",
    "whatwg"
  ],
  "files": [
    "htmlrw.js",
    "htmlrw-worker.js",
    "htmlrw-tests.js",
    "htmlrw.wasm.js",
    "htmlrw-worker.wasm.js",
    "htmlrw-tests.wasm.js",
    "htmlrw_js_main.bc.wasm.assets/",
    "htmlrw_js_worker.bc.wasm.assets/",
    "htmlrw_js_tests_main.bc.wasm.assets/",
    "README.md",
    "LICENSE"
  ]
}
+51
release.sh
#!/bin/bash
# Release script for html5rw npm package
# Run from npm branch after building on main

set -e

# Path to dune install directory (relative to repo root)
INSTALL_DIR="_build/install/default/share/html5rw-js"

# Check we're on the npm branch
BRANCH=$(git rev-parse --abbrev-ref HEAD)
if [ "$BRANCH" != "npm" ]; then
    echo "Error: Must be on npm branch (currently on $BRANCH)"
    exit 1
fi

# Check the install directory exists
if [ ! -d "$INSTALL_DIR" ]; then
    echo "Error: Install directory not found at $INSTALL_DIR"
    echo "Run 'opam exec -- dune build @install' on main branch first"
    exit 1
fi

# Copy JavaScript files
echo "Copying JavaScript files..."
cp "$INSTALL_DIR/htmlrw.js" .
cp "$INSTALL_DIR/htmlrw-worker.js" .
cp "$INSTALL_DIR/htmlrw-tests.js" .

# Copy WASM loader scripts
echo "Copying WASM loader scripts..."
cp "$INSTALL_DIR/htmlrw.wasm.js" .
cp "$INSTALL_DIR/htmlrw-worker.wasm.js" .
cp "$INSTALL_DIR/htmlrw-tests.wasm.js" .

# Copy WASM assets directories
echo "Copying WASM assets..."
rm -rf htmlrw_js_main.bc.wasm.assets htmlrw_js_worker.bc.wasm.assets htmlrw_js_tests_main.bc.wasm.assets
cp -r "$INSTALL_DIR/htmlrw_js_main.bc.wasm.assets" .
cp -r "$INSTALL_DIR/htmlrw_js_worker.bc.wasm.assets" .
cp -r "$INSTALL_DIR/htmlrw_js_tests_main.bc.wasm.assets" .

# Fix permissions
echo "Fixing permissions..."
chmod 644 *.js
find htmlrw_js_main.bc.wasm.assets -type f -exec chmod 644 {} \;
find htmlrw_js_worker.bc.wasm.assets -type f -exec chmod 644 {} \;
find htmlrw_js_tests_main.bc.wasm.assets -type f -exec chmod 644 {} \;

echo "Done! Ready to commit and publish."
echo "Run: git add -A && git commit -m 'Release X.Y.Z' && npm publish"
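
Before running `npm publish`, it may help to confirm that every entry in the `files` array of `package.json` was actually copied into the working tree. The Node.js sketch below is not part of the release script: `missingPackageFiles` is a hypothetical helper, and it assumes each `files` entry is a literal path rather than a glob pattern, which holds for the manifest in this change.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns the entries of package.json "files" that do
// not exist under `root`.
// Assumption: entries are literal file or directory paths, not glob patterns.
function missingPackageFiles(root) {
  const manifest = JSON.parse(
    fs.readFileSync(path.join(root, 'package.json'), 'utf8')
  );
  return (manifest.files || []).filter(
    (entry) => !fs.existsSync(path.join(root, entry))
  );
}

// Usage from the repository root, for example in a small check script:
//   const missing = missingPackageFiles(process.cwd());
//   if (missing.length > 0) {
//     console.error('Missing files:', missing.join(', '));
//     process.exitCode = 1;
//   }
```

This only checks for presence; it does not verify that the copied files are up to date with the build on `main`.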