Phase 8 — Resource Loading + Character Encoding + Real Page Loading
Implement legacy single-byte text encodings in the encoding crate per the WHATWG Encoding Standard.
Requirements
- Windows-1252 (aka cp1252): the most common legacy Western encoding
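- ISO-8859-1 (Latin-1): strictly maps directly to the first 256 Unicode codepoints. However, the WHATWG standard treats the `iso-8859-1` and `latin1` labels (and related ones) as aliases of windows-1252, which differs from Latin-1 only in bytes 0x80–0x9F.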
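- ISO-8859-2 through ISO-8859-16 (Central/Eastern European, Cyrillic, Greek, Arabic, Hebrew, etc.), as the WHATWG standard defines them:
  - ISO-8859-2 to -8, ISO-8859-8-I, -10, and -13 to -16 are distinct encodings.
  - The ISO-8859-9 and ISO-8859-11 labels alias windows-1254 and windows-874.
  - There is no ISO-8859-12.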
- Windows-874, Windows-1250 through Windows-1258: Windows codepages
- macintosh (Mac OS Roman)
- IBM866, KOI8-R, KOI8-U: Cyrillic encodings
Each single-byte encoding is a 128-entry lookup table mapping bytes 0x80–0xFF to Unicode codepoints (bytes 0x00–0x7F are ASCII).
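The sketch below illustrates this layout using the `[u16; 128]` variant. It builds the windows-1252 table at compile time. Only bytes 0x80–0x9F need explicit entries (values taken from the WHATWG `index-windows-1252`), because bytes 0xA0–0xFF map to U+00A0–U+00FF. The names `WINDOWS_1252` and `windows_1252_code_point` are illustrative, not an existing API.

```rust
/// windows-1252 bytes 0x80..=0x9F per the WHATWG index. The bytes 0x81, 0x8D,
/// 0x8F, 0x90 and 0x9D map to the C1 control code points of the same value.
const WINDOWS_1252_80_9F: [u16; 32] = [
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

/// Builds the 128-entry table (entry i = code point for byte 0x80 + i):
/// the first 32 entries come from the list above, the rest are U+00A0..=U+00FF.
const fn build_windows_1252() -> [u16; 128] {
    let mut table = [0u16; 128];
    let mut i = 0;
    while i < 128 {
        table[i] = if i < 32 {
            WINDOWS_1252_80_9F[i]
        } else {
            (0x80 + i) as u16
        };
        i += 1;
    }
    table
}

pub static WINDOWS_1252: [u16; 128] = build_windows_1252();

/// Returns the code point for one byte: ASCII below 0x80, table lookup above.
pub fn windows_1252_code_point(byte: u8) -> u32 {
    if byte < 0x80 {
        u32::from(byte)
    } else {
        u32::from(WINDOWS_1252[usize::from(byte - 0x80)])
    }
}
```

Storing `u16` rather than `char` halves the table size. All single-byte mappings fall in the Basic Multilingual Plane, so `u16` is wide enough.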
Implementation
- Define each encoding as a `[u16; 128]` or `[char; 128]` lookup table (indexed by `byte - 0x80`)
- Decoder: for bytes < 0x80, use ASCII; for bytes >= 0x80, look up in table
- Unmapped bytes produce U+FFFD (replacement character)
- Register all WHATWG encoding labels as aliases
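A decoder following these rules might look like the sketch below. It assumes a hypothetical convention where the table value 0 marks an unmapped byte. That value is free because no WHATWG single-byte index maps a byte in 0x80–0xFF to U+0000. `decode_single_byte` is an illustrative name.

```rust
/// Sentinel for "no mapping" in a single-byte table (assumed convention).
const UNMAPPED: u16 = 0;

/// Decodes `bytes` with a table whose entry i is the code point for byte
/// 0x80 + i. Bytes below 0x80 pass through as ASCII; unmapped bytes become
/// U+FFFD, as in the WHATWG "replacement" error mode.
pub fn decode_single_byte(bytes: &[u8], table: &[u16; 128]) -> String {
    let mut out = String::with_capacity(bytes.len());
    for &byte in bytes {
        let ch = if byte < 0x80 {
            char::from(byte)
        } else {
            match table[usize::from(byte - 0x80)] {
                UNMAPPED => '\u{FFFD}',
                // Table values are BMP scalar values, so from_u32 succeeds;
                // fall back to U+FFFD rather than panicking on a bad table.
                cp => char::from_u32(u32::from(cp)).unwrap_or('\u{FFFD}'),
            }
        };
        out.push(ch);
    }
    out
}
```

Every encoding shares this one function, so adding an encoding only means adding a table. Tables with holes, such as ISO-8859-8, store the sentinel at those positions.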
Acceptance Criteria
- Windows-1252 and ISO-8859-1 decoders work correctly
- At least 5 additional single-byte encodings implemented
- All WHATWG-specified label aliases map correctly
- No external dependencies, no `unsafe`
- Unit tests with known byte sequences and expected Unicode output
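The label-alias criterion can be checked against a resolver like the one below. This sketch assumes labels are stored as a flat array of `(label, canonical name)` pairs. It follows the WHATWG "get an encoding" steps: strip leading and trailing ASCII whitespace, ASCII-lowercase, then look for an exact match. Only the windows-1252, KOI8-R and KOI8-U labels are listed, and `LABELS` and `resolve_label` are hypothetical names.

```rust
/// Partial WHATWG label table: (label, canonical encoding name).
const LABELS: &[(&str, &str)] = &[
    ("ansi_x3.4-1968", "windows-1252"),
    ("ascii", "windows-1252"),
    ("cp1252", "windows-1252"),
    ("cp819", "windows-1252"),
    ("csisolatin1", "windows-1252"),
    ("ibm819", "windows-1252"),
    ("iso-8859-1", "windows-1252"),
    ("iso-ir-100", "windows-1252"),
    ("iso8859-1", "windows-1252"),
    ("iso88591", "windows-1252"),
    ("iso_8859-1", "windows-1252"),
    ("iso_8859-1:1987", "windows-1252"),
    ("l1", "windows-1252"),
    ("latin1", "windows-1252"),
    ("us-ascii", "windows-1252"),
    ("windows-1252", "windows-1252"),
    ("x-cp1252", "windows-1252"),
    ("cskoi8r", "KOI8-R"),
    ("koi", "KOI8-R"),
    ("koi8", "KOI8-R"),
    ("koi8-r", "KOI8-R"),
    ("koi8_r", "KOI8-R"),
    ("koi8-ru", "KOI8-U"),
    ("koi8-u", "KOI8-U"),
];

/// Resolves a label to a canonical encoding name, or None if unknown.
pub fn resolve_label(label: &str) -> Option<&'static str> {
    // WHATWG ASCII whitespace: TAB, LF, FF, CR and SPACE only.
    let trimmed =
        label.trim_matches(|c: char| matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' '));
    let key = trimmed.to_ascii_lowercase();
    LABELS
        .iter()
        .find(|(l, _)| *l == key)
        .map(|&(_, name)| name)
}
```

The sketch avoids `str::trim`, which strips all Unicode whitespace rather than only the five ASCII characters the standard names. A linear scan is fine for a sketch. A full table of a few hundred labels could be sorted and searched with `binary_search_by` instead.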
Dependencies
Depends on: WHATWG Encoding: UTF-8 and UTF-16 codecs (for the shared Encoding trait/API)
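One way the single-byte encodings could plug into that shared API is a single struct type that holds a name and a table. The sketch below shows this with ISO-8859-15, built from Latin-1 plus its eight changed positions. The `Encoding` trait here is a hypothetical stand-in for the trait from the UTF-8/UTF-16 phase, whose real name and methods may differ.

```rust
/// Hypothetical stand-in for the shared trait from the UTF-8/UTF-16 phase.
pub trait Encoding {
    fn name(&self) -> &'static str;
    fn decode(&self, bytes: &[u8]) -> String;
}

/// One value per single-byte encoding: its WHATWG name plus a table whose
/// entry i is the code point for byte 0x80 + i (0 marks an unmapped byte).
pub struct SingleByteEncoding {
    canonical_name: &'static str,
    table: &'static [u16; 128],
}

impl Encoding for SingleByteEncoding {
    fn name(&self) -> &'static str {
        self.canonical_name
    }

    fn decode(&self, bytes: &[u8]) -> String {
        bytes
            .iter()
            .map(|&b| match b {
                0x00..=0x7F => char::from(b),
                _ => match self.table[usize::from(b - 0x80)] {
                    0 => '\u{FFFD}',
                    cp => char::from_u32(u32::from(cp)).unwrap_or('\u{FFFD}'),
                },
            })
            .collect()
    }
}

/// ISO-8859-15: start from the Latin-1 identity mapping, then replace 8 bytes.
const fn build_iso_8859_15() -> [u16; 128] {
    let mut table = [0u16; 128];
    let mut i = 0;
    while i < 128 {
        table[i] = (0x80 + i) as u16;
        i += 1;
    }
    table[0xA4 - 0x80] = 0x20AC; // EURO SIGN
    table[0xA6 - 0x80] = 0x0160; // LATIN CAPITAL LETTER S WITH CARON
    table[0xA8 - 0x80] = 0x0161; // LATIN SMALL LETTER S WITH CARON
    table[0xB4 - 0x80] = 0x017D; // LATIN CAPITAL LETTER Z WITH CARON
    table[0xB8 - 0x80] = 0x017E; // LATIN SMALL LETTER Z WITH CARON
    table[0xBC - 0x80] = 0x0152; // LATIN CAPITAL LIGATURE OE
    table[0xBD - 0x80] = 0x0153; // LATIN SMALL LIGATURE OE
    table[0xBE - 0x80] = 0x0178; // LATIN CAPITAL LETTER Y WITH DIAERESIS
    table
}

static ISO_8859_15_TABLE: [u16; 128] = build_iso_8859_15();

pub static ISO_8859_15: SingleByteEncoding = SingleByteEncoding {
    canonical_name: "ISO-8859-15",
    table: &ISO_8859_15_TABLE,
};
```

With this design, callers holding a `&dyn Encoding` do not need to know whether they have a single-byte or a UTF codec. All single-byte encodings share one implementation, because only their data differs.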