# html5rw - Pure OCaml HTML5 Parser

A pure OCaml HTML5 parser implementing the WHATWG HTML5 parsing specification. This library passes the html5lib-tests suite and provides full support for tokenization, tree construction, encoding detection, and CSS selector queries.

This library was ported from [JustHTML](https://github.com/EmilStenstrom/justhtml/).

## Key Features

- **WHATWG Compliant**: Implements the full HTML5 parsing algorithm with proper error recovery
- **CSS Selectors**: Query the DOM using standard CSS selector syntax
- **Streaming I/O**: Uses bytesrw for efficient streaming input/output
- **Encoding Detection**: Automatic character encoding detection following the WHATWG algorithm
- **Entity Decoding**: Complete HTML5 named character reference support

## Usage

```ocaml
open Bytesrw

(* Parse HTML from a string *)
let html = "<p>Hello, world!</p>"
let reader = Bytes.Reader.of_string html
let doc = Html5rw.parse reader

(* Query with CSS selectors *)
let paragraphs = Html5rw.query doc "p"

(* Extract text content *)
let text = Html5rw.to_text doc

(* Serialize back to HTML *)
let output = Html5rw.to_string doc
```

For fragment parsing (innerHTML):

```ocaml
(* Parse as innerHTML of a <div> *)
let ctx = Html5rw.make_fragment_context ~tag_name:"div" ()
let reader = Bytes.Reader.of_string "<p>Fragment content</p>"
let doc = Html5rw.parse ~fragment_context:ctx reader
```

## Installation

```
opam install html5rw
```

## Documentation

API documentation is available via:

```
opam install html5rw
odig doc html5rw
```

## License

MIT