Summary
Implement speculative HTML parsing by running the HTML tokenizer on a background thread while the main thread executes scripts, enabling faster page loads.
Background
When the HTML parser encounters a <script> tag, it must pause tree construction and execute the script (which may call document.write). During this pause, the tokenizer could speculatively continue scanning the rest of the document to discover resources (<link>, <img>, <script src>) that can be fetched in parallel. This is a major performance win for script-heavy pages.
Acceptance Criteria
- Speculative tokenizer: a second tokenizer instance that scans ahead in the HTML input while the main parser is blocked on script execution
- Resource discovery: the speculative tokenizer identifies preloadable resources:
  - <link rel="stylesheet" href="...">: CSS files
  - <script src="...">: JavaScript files
  - <img src="...">: Images
  - <link rel="preload" href="..." as="...">: Preload hints
- Preload queue: discovered URLs are sent to the resource loader for early fetching
- Speculation results: speculatively tokenized tokens are buffered and reused by the main parser if document.write didn't invalidate them
- Invalidation: if document.write injects content, discard speculative results from that point forward and re-tokenize
- Thread safety: speculative tokenizer runs on a background thread; communication via channels (std::sync::mpsc)
- Main parser behavior is unchanged when speculation is disabled
- All existing HTML parsing tests pass
- Add tests for: speculation hit (no document.write), speculation miss (document.write invalidates)
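The hit and miss cases above can be sketched as a small buffer that the main parser resolves once the blocking script finishes. This is only an illustration under stated assumptions: `SpecToken`, `Outcome`, and `SpeculationBuffer` are hypothetical names, and it assumes an invalidation is reported as a byte offset into the original input at which document.write injected content.

```rust
/// Hypothetical speculative token: just enough to illustrate buffering.
#[derive(Debug, Clone, PartialEq)]
pub struct SpecToken {
    /// Byte offset in the original HTML input where the token starts.
    pub offset: usize,
    /// Tag name, e.g. "img".
    pub tag: String,
}

/// Result of resolving a speculation once the script has finished.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// No document.write: every buffered token can be reused.
    Hit(Vec<SpecToken>),
    /// document.write injected content: tokens at or after that point are
    /// discarded and the main parser re-tokenizes from `retokenize_from`.
    Miss { reusable: Vec<SpecToken>, retokenize_from: usize },
}

/// Tokens produced ahead of the main parser, kept in input order.
#[derive(Debug, Default)]
pub struct SpeculationBuffer {
    tokens: Vec<SpecToken>,
}

impl SpeculationBuffer {
    /// Appends a token; callers are assumed to push in increasing offset order.
    pub fn push(&mut self, token: SpecToken) {
        self.tokens.push(token);
    }

    /// Drains the buffer. `write_offset` is `Some(pos)` if document.write
    /// injected content at input position `pos` (an assumption of this sketch).
    pub fn resolve(&mut self, write_offset: Option<usize>) -> Outcome {
        let mut tokens = std::mem::take(&mut self.tokens);
        match write_offset {
            None => Outcome::Hit(tokens),
            Some(pos) => {
                // Tokens are sorted by offset, so a binary search finds the cut.
                let keep = tokens.partition_point(|t| t.offset < pos);
                tokens.truncate(keep);
                Outcome::Miss { reusable: tokens, retokenize_from: pos }
            }
        }
    }
}
```

Calling `resolve(None)` corresponds to the speculation-hit test, and `resolve(Some(pos))` to the speculation-miss test. In either case the buffer is left empty for the next speculation.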
Implementation Notes
- The speculative tokenizer only needs to find tags — it doesn't need to build a DOM tree
- It can use a simplified state machine that only tracks tag names and src/href attributes
- Communication: main thread sends (html_bytes, start_offset) to speculative thread; speculative thread sends back Vec
- Use std::thread::spawn and std::sync::mpsc::channel; no external threading crates
- The speculative tokenizer should be conservative: if it encounters <script> (without src), it should stop speculating until the main parser catches up
- This is an optimization; the browser must work correctly with speculation disabled
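A minimal sketch of how these notes might fit together, using only std::thread::spawn and std::sync::mpsc::channel. The names `Preload`, `attr_value`, `scan`, and `spawn_speculator` are hypothetical, and the result type is an assumption, not the ticket's actual type. For brevity the sketch sends the input as a `String` rather than raw bytes, assumes `start` falls on a character boundary, and only recognizes lowercase, double-quoted attributes. It does not handle comments, CDATA, or `>` inside attribute values.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical preload request: tag name plus the URL from src/href.
#[derive(Debug, Clone, PartialEq)]
pub struct Preload {
    pub tag: String,
    pub url: String,
}

/// Returns the value of ` name="..."` inside a tag body, if present.
fn attr_value(body: &str, name: &str) -> Option<String> {
    let key = format!(" {name}=\"");
    let start = body.find(&key)? + key.len();
    let end = start + body[start..].find('"')?;
    Some(body[start..end].to_string())
}

/// Scans `html` from `start` and collects preloadable resources.
/// Stops at the first inline <script> (no src) and returns its offset,
/// otherwise returns `html.len()` as the stop position.
pub fn scan(html: &str, start: usize) -> (Vec<Preload>, usize) {
    let mut found = Vec::new();
    let mut pos = start;
    while let Some(rel) = html[pos..].find('<') {
        let open = pos + rel;
        let Some(len) = html[open..].find('>') else { break };
        let close = open + len;
        let body = &html[open + 1..close];
        let tag = body.split_whitespace().next().unwrap_or("").to_ascii_lowercase();
        let url = match tag.as_str() {
            "script" => match attr_value(body, "src") {
                Some(u) => Some(u),
                // Inline script: it may call document.write, so stop here.
                None => return (found, open),
            },
            "img" => attr_value(body, "src"),
            "link" => match attr_value(body, "rel").as_deref() {
                Some("stylesheet") | Some("preload") => attr_value(body, "href"),
                _ => None,
            },
            _ => None,
        };
        if let Some(url) = url {
            found.push(Preload { tag, url });
        }
        pos = close + 1;
    }
    (found, html.len())
}

/// Spawns the speculative thread. Jobs are (html, start_offset); each result
/// is (preloads, stop_offset). The thread exits when either channel closes.
pub fn spawn_speculator() -> (
    mpsc::Sender<(String, usize)>,
    mpsc::Receiver<(Vec<Preload>, usize)>,
) {
    let (job_tx, job_rx) = mpsc::channel::<(String, usize)>();
    let (res_tx, res_rx) = mpsc::channel();
    thread::spawn(move || {
        for (html, start) in job_rx {
            if res_tx.send(scan(&html, start)).is_err() {
                break; // main thread dropped the receiver
            }
        }
    });
    (job_tx, res_rx)
}
```

The main thread would send a job when it blocks on a script, forward the returned URLs to the resource loader, and resume speculation from the stop offset once the parser has caught up. Dropping the sender shuts the background thread down, so nothing extra is needed when speculation is disabled.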
Dependencies
None — independent of other Phase 15 work.
Phase
Phase 15: Performance