Phase 6: URL Parser
Implement the WHATWG URL Standard parser in the url crate (crates/url/src/lib.rs).
Requirements
- URL record type: scheme, username, password, host, port, path, query, fragment
- URL parsing algorithm per https://url.spec.whatwg.org/#url-parsing
- State machine covering the parser states defined by the spec, including scheme start, scheme, authority, host, port, path, query, and fragment
- Special scheme handling (http, https, ftp, ws, wss, file)
- Percent-encoding and decoding (UTF-8)
- IDNA/punycode for internationalized domain names (basic support)
- Host parsing: domain, IPv4, IPv6 address parsing
- URL serialization: serialize back to string form
- Relative URL resolution: resolve relative URLs against a base URL
- Origin: derive origin from URL (scheme, host, port tuple)
- API surface: Url::parse(input), Url::parse_with_base(input, base), accessor methods
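
As a starting point for the record type, special-scheme handling, serialization, and origin requirements above, a minimal sketch follows. The field types, the helper names default_port and is_special, and the shape of the origin tuple are assumptions made for illustration, not a settled design. Serialization here ignores opaque paths and the "/." prefix case from the spec, and origin treats every non-network scheme (including file) as opaque.

```rust
/// Hypothetical URL record mirroring the components listed in the requirements.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Url {
    pub scheme: String,
    pub username: String,
    pub password: String,
    pub host: Option<String>,
    /// `None` when absent or equal to the scheme's default port.
    pub port: Option<u16>,
    /// Path segments without their leading slashes.
    pub path: Vec<String>,
    pub query: Option<String>,
    pub fragment: Option<String>,
}

/// Returns true for the special schemes named in the requirements.
pub fn is_special(scheme: &str) -> bool {
    matches!(scheme, "http" | "https" | "ftp" | "ws" | "wss" | "file")
}

/// Default port of a special scheme; `file` and non-special schemes have none.
pub fn default_port(scheme: &str) -> Option<u16> {
    match scheme {
        "http" | "ws" => Some(80),
        "https" | "wss" => Some(443),
        "ftp" => Some(21),
        _ => None,
    }
}

impl Url {
    /// Simplified serializer: assumes a list path and skips spec edge cases.
    pub fn serialize(&self) -> String {
        let mut out = format!("{}:", self.scheme);
        if let Some(host) = &self.host {
            out.push_str("//");
            if !self.username.is_empty() || !self.password.is_empty() {
                out.push_str(&self.username);
                if !self.password.is_empty() {
                    out.push(':');
                    out.push_str(&self.password);
                }
                out.push('@');
            }
            out.push_str(host);
            if let Some(port) = self.port {
                out.push(':');
                out.push_str(&port.to_string());
            }
        }
        for segment in &self.path {
            out.push('/');
            out.push_str(segment);
        }
        if let Some(query) = &self.query {
            out.push('?');
            out.push_str(query);
        }
        if let Some(fragment) = &self.fragment {
            out.push('#');
            out.push_str(fragment);
        }
        out
    }

    /// Tuple origin (scheme, host, port) for network schemes.
    /// `None` stands in for an opaque origin in this sketch.
    pub fn origin(&self) -> Option<(String, String, Option<u16>)> {
        match self.scheme.as_str() {
            "http" | "https" | "ftp" | "ws" | "wss" => {
                let host = self.host.clone()?;
                Some((self.scheme.clone(), host, self.port))
            }
            _ => None,
        }
    }
}
```

Storing the port as None when it matches the scheme default keeps serialization simple, since the default port is then omitted without an extra comparison.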
Dependencies
- we-encoding crate (already a dependency in Cargo.toml)
Acceptance Criteria
- Parse absolute URLs: https://user:pass@example.com:8080/path?q=1#frag
- Parse relative URLs against a base
- IPv4 and IPv6 host parsing
- Percent-encoding/decoding
- Special scheme default ports (http=80, https=443, etc.)
- Serialization round-trips correctly
- Edge cases: empty path, missing components, trailing slashes
- Comprehensive test suite (50+ tests)
- cargo clippy and cargo fmt clean
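
To illustrate the percent-encoding/decoding criterion, here is a small sketch of byte-level helpers. The function names percent_decode and percent_encode_c0 are hypothetical. The encoder uses only the C0 control percent-encode set (C0 controls and everything above U+007E); the path, query, fragment, and userinfo sets in the spec add further code points and would need their own predicates.

```rust
/// Value of an ASCII hex digit, or `None` if the byte is not one.
fn hex_val(byte: u8) -> Option<u8> {
    (byte as char).to_digit(16).map(|d| d as u8)
}

/// Percent-decodes a byte sequence. A '%' that is not followed by two
/// hex digits is copied through unchanged.
pub fn percent_decode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'%' && i + 2 < input.len() {
            if let (Some(hi), Some(lo)) = (hex_val(input[i + 1]), hex_val(input[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(input[i]);
        i += 1;
    }
    out
}

/// Percent-encodes the UTF-8 bytes of `input` using the C0 control
/// percent-encode set: bytes <= 0x1F or > 0x7E are escaped.
pub fn percent_encode_c0(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input.as_bytes() {
        if byte <= 0x1F || byte > 0x7E {
            out.push_str(&format!("%{:02X}", byte));
        } else {
            out.push(byte as char);
        }
    }
    out
}
```

Cases like a trailing "%" or a non-hex escape such as "%zz" are natural candidates for the edge-case tests, since the decoder is expected to leave them untouched rather than fail.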