we (web engine): Experimental web browser project to understand the limits of Claude

WHATWG URL parser (full spec compliance) #56

Status: open, opened by pierrelf.com

Phase 6: URL Parser

Implement the WHATWG URL Standard parser in the url crate (crates/url/src/lib.rs).

Requirements

  • URL record type: scheme, username, password, host, port, path, query, fragment
  • URL parsing algorithm per https://url.spec.whatwg.org/#url-parsing
    • State machine covering every state the spec defines, including scheme start, scheme, authority, host, port, path, query, and fragment
    • Special scheme handling (http, https, ftp, ws, wss, file)
    • Percent-encoding and decoding (UTF-8)
    • IDNA/punycode for internationalized domain names (basic support)
  • Host parsing: domain, IPv4, IPv6 address parsing
  • URL serialization: serialize back to string form
  • Relative URL resolution: resolve relative URLs against a base URL
  • Origin: derive origin from URL (scheme, host, port tuple)
  • API surface: Url::parse(input), Url::parse_with_base(input, base), accessor methods
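
As a rough illustration of the record type, special-scheme table, and origin derivation listed above, here is a minimal sketch. The names `Host`, `Origin`, `is_special`, `default_port`, and the public field layout are assumptions made for this example, not the crate's actual API; parsing itself and `blob:` origin handling are left out.

```rust
/// Hypothetical host type: a domain, a 32-bit IPv4 address,
/// or the eight 16-bit pieces of an IPv6 address.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Host {
    Domain(String),
    Ipv4(u32),
    Ipv6([u16; 8]),
}

/// URL record holding the fields named in the requirements.
/// The field types are an assumption of this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Url {
    pub scheme: String,
    pub username: String,
    pub password: String,
    pub host: Option<Host>,
    /// `None` when the port is absent or equals the scheme's default.
    pub port: Option<u16>,
    pub path: Vec<String>,
    pub query: Option<String>,
    pub fragment: Option<String>,
}

/// True for the six special schemes of the URL Standard.
pub fn is_special(scheme: &str) -> bool {
    matches!(scheme, "http" | "https" | "ftp" | "ws" | "wss" | "file")
}

/// Default port of a special scheme; `file` is special but has none.
pub fn default_port(scheme: &str) -> Option<u16> {
    match scheme {
        "http" | "ws" => Some(80),
        "https" | "wss" => Some(443),
        "ftp" => Some(21),
        _ => None,
    }
}

/// Either an opaque origin or a (scheme, host, port) tuple.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Origin {
    Opaque,
    Tuple {
        scheme: String,
        host: Host,
        port: Option<u16>,
    },
}

impl Url {
    /// Derives the origin. In this sketch only the network special schemes
    /// with a host yield a tuple; everything else (including `file` and
    /// `blob`, which the spec treats separately) is reported as opaque.
    pub fn origin(&self) -> Origin {
        match (self.scheme.as_str(), &self.host) {
            ("http" | "https" | "ftp" | "ws" | "wss", Some(host)) => Origin::Tuple {
                scheme: self.scheme.clone(),
                host: host.clone(),
                port: self.port,
            },
            _ => Origin::Opaque,
        }
    }
}
```

Storing `port` as `None` when it matches the default keeps serialization simple, since the serializer then never has to compare against `default_port` again.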

Dependencies

  • we-encoding crate (already a dependency in Cargo.toml)

Acceptance Criteria

  • Parse absolute URLs: https://user:pass@example.com:8080/path?q=1#frag
  • Parse relative URLs against a base
  • IPv4 and IPv6 host parsing
  • Percent-encoding/decoding
  • Special scheme default ports (http=80, https=443, ftp=21, ws=80, wss=443; file has none)
  • Serialization round-trips correctly
  • Edge cases: empty path, missing components, trailing slashes
  • Comprehensive test suite (50+ tests)
  • cargo clippy and cargo fmt clean
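
To make the percent-encoding/decoding criterion concrete, here is a small sketch using only the standard library. The function names (`in_query_encode_set`, `percent_encode`, `percent_decode`) are hypothetical, and only the query percent-encode set is shown; the real crate would need the other sets from the spec as well.

```rust
/// True if `byte` belongs to the query percent-encode set: C0 controls,
/// everything above U+007E, plus space, `"`, `#`, `<`, and `>`.
pub fn in_query_encode_set(byte: u8) -> bool {
    byte <= 0x1F || byte > 0x7E || matches!(byte, b' ' | b'"' | b'#' | b'<' | b'>')
}

/// UTF-8 percent-encodes `input`, escaping every byte for which `in_set`
/// returns true. Non-ASCII bytes are always escaped so the output stays ASCII.
pub fn percent_encode(input: &str, in_set: fn(u8) -> bool) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input.as_bytes() {
        if in_set(byte) || !byte.is_ascii() {
            out.push_str(&format!("%{:02X}", byte));
        } else {
            out.push(byte as char);
        }
    }
    out
}

/// Value of an ASCII hex digit, or `None` for any other byte.
fn hex_value(byte: u8) -> Option<u8> {
    (byte as char).to_digit(16).map(|d| d as u8)
}

/// Percent-decodes a byte sequence. A `%` that is not followed by two hex
/// digits is copied through unchanged rather than treated as an error.
pub fn percent_decode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        let decoded = if input[i] == b'%' {
            let hi = input.get(i + 1).copied().and_then(hex_value);
            let lo = input.get(i + 2).copied().and_then(hex_value);
            match (hi, lo) {
                (Some(hi), Some(lo)) => Some(hi * 16 + lo),
                _ => None,
            }
        } else {
            None
        };
        match decoded {
            Some(byte) => {
                out.push(byte);
                i += 3;
            }
            None => {
                out.push(input[i]);
                i += 1;
            }
        }
    }
    out
}
```

Decoding returns raw bytes rather than a `String` because a decoded sequence is not guaranteed to be valid UTF-8; callers can choose between `String::from_utf8` and a lossy conversion.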
AT URI
at://did:plc:meotu43t6usg4qdwzenk4s2t/sh.tangled.repo.issue/3mguyjqzt7g2u