Implement a JavaScript lexer/tokenizer conforming to ECMAScript 2024 specification.
## Scope
Create the we-js crate with a tokenizer that converts JavaScript source text into a stream of tokens.
## Token Types

- Identifiers and keywords: all ES2024 keywords (`var`, `let`, `const`, `function`, `class`, `if`, `else`, `for`, `while`, `do`, `switch`, `case`, `break`, `continue`, `return`, `throw`, `try`, `catch`, `finally`, `new`, `delete`, `typeof`, `instanceof`, `void`, `in`, `of`, `import`, `export`, `default`, `async`, `await`, `yield`, etc.)
- Punctuators: all operators and delimiters (`+`, `-`, `*`, `/`, `%`, `**`, `=`, `==`, `===`, `!=`, `!==`, `<`, `>`, `<=`, `>=`, `&&`, `||`, `??`, `?.`, `...`, `=>`, etc.)
- Numeric literals: decimal, hex (`0x`), octal (`0o`), binary (`0b`), floating point, exponential notation
- String literals: single- and double-quoted, with escape sequences (`\n`, `\t`, `\uXXXX`, `\u{XXXXX}`, `\\`, etc.)
- Template literals: backtick strings with `${...}` interpolation support (TemplateHead, TemplateMiddle, TemplateTail tokens)
- Regular expression literals: `/pattern/flags`
- Comments: single-line (`//`) and multi-line (`/* */`); skip or optionally preserve
- Boolean/null literals: `true`, `false`, `null`
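A `Token` enum covering these categories might be sketched as follows; the variant names and payload types are illustrative, not mandated by this issue:

```rust
// Illustrative Token enum; variant names/payloads are a sketch, not the
// required API. Position data would be attached per the Features section.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String),
    Keyword(String),    // e.g. "let", "async"
    Punctuator(String), // e.g. "=>", "?."
    Number(f64),
    String(String),     // contents with escapes resolved
    TemplateHead(String),
    TemplateMiddle(String),
    TemplateTail(String),
    Regex { pattern: String, flags: String },
    Boolean(bool),
    Null,
    Comment(String),    // only emitted if comments are preserved
}

fn main() {
    let t = Token::Keyword("let".to_string());
    assert_eq!(t, Token::Keyword("let".to_string()));
    assert_ne!(Token::Null, Token::Boolean(false));
    println!("{:?}", t);
}
```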
## Features
- Track source positions (line, column) for error reporting
- Handle Unicode identifiers (basic support: at minimum ASCII plus common Unicode letters)
- Distinguish the division operator `/` from the start of a RegExp literal based on context
- Automatic semicolon insertion awareness (track newlines between tokens)
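The division-vs-RegExp distinction can be driven by the kind of the previous significant token: after a value-producing token, `/` continues an expression (division); where an expression is expected, it begins a RegExp literal. A simplified sketch (the `PrevToken` kinds are hypothetical, and edge cases such as `)` closing an `if` condition are deliberately ignored here):

```rust
// Heuristic sketch, not the full spec rule: a `/` after a value-producing
// token is division; a `/` where an expression is expected starts a regex.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum PrevToken {
    None,           // start of input
    Identifier,     // e.g. `a` in `a / b`
    NumericLiteral,
    CloseParen,     // caveat: wrong for `if (x) /re/...`
    CloseBracket,
    Punctuator,     // e.g. `=`, `(`, `,`
    Keyword,        // e.g. `return`, `typeof`
}

fn slash_starts_regex(prev: PrevToken) -> bool {
    !matches!(
        prev,
        PrevToken::Identifier
            | PrevToken::NumericLiteral
            | PrevToken::CloseParen
            | PrevToken::CloseBracket
    )
}

fn main() {
    assert!(!slash_starts_regex(PrevToken::Identifier)); // `a / b` is division
    assert!(slash_starts_regex(PrevToken::Punctuator));  // `x = /re/` is a regex
    assert!(slash_starts_regex(PrevToken::Keyword));     // `return /re/` is a regex
}
```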
## Acceptance Criteria

- `Token` enum with all ES2024 token types
- `Lexer` struct that takes a `&str` source and produces a `Vec<Token>` or an iterator
- Correct tokenization of all numeric literal forms
- Correct string literal parsing with escape sequences
- Template literal tokenization
- Keyword vs identifier distinction
- Source position tracking (line/column on each token)
- Unit tests covering each token type, edge cases, and error handling
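As a starting point for the position-tracking criterion, here is a minimal sketch that tokenizes only identifier/number runs and single-character punctuators while maintaining line/column counters; the `tokenize` helper and `Token` shape are illustrative, not the required `Lexer` API:

```rust
// Minimal line/column bookkeeping sketch (assumed shapes, not the final API).
#[derive(Debug, PartialEq)]
struct Token {
    text: String,
    line: u32,
    col: u32,
}

fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let (mut line, mut col) = (1u32, 1u32); // 1-based positions
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c == '\n' {
            chars.next();
            line += 1;
            col = 1;
        } else if c.is_whitespace() {
            chars.next();
            col += 1;
        } else if c.is_ascii_alphanumeric() || c == '_' || c == '$' {
            // Identifier/number run; record where it started.
            let (start_line, start_col) = (line, col);
            let mut text = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_alphanumeric() || c == '_' || c == '$' {
                    text.push(c);
                    chars.next();
                    col += 1;
                } else {
                    break;
                }
            }
            tokens.push(Token { text, line: start_line, col: start_col });
        } else {
            // Everything else: a single-character punctuator.
            chars.next();
            tokens.push(Token { text: c.to_string(), line, col });
            col += 1;
        }
    }
    tokens
}

fn main() {
    let toks = tokenize("let x =\n42;");
    assert_eq!(toks[0].text, "let");
    assert_eq!((toks[3].line, toks[3].col), (2, 1)); // `42` starts at line 2, col 1
    println!("{:?}", toks);
}
```

A real lexer would extend the same loop with the multi-character punctuators, string/template states, and the `/` disambiguation described above.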
Phase 10 — JavaScript Engine (issue 1 of 15). No dependencies.