we (web engine): Experimental web browser project to understand the limits of Claude

JS Lexer/Tokenizer (ECMAScript 2024) #90

Open. Opened by pierrelf.com

Implement a JavaScript lexer/tokenizer that conforms to the ECMAScript 2024 specification.

Scope

Create the we-js crate with a tokenizer that converts JavaScript source text into a stream of tokens.

Token Types

  • Identifiers and keywords: all ES2024 keywords (var, let, const, function, class, if, else, for, while, do, switch, case, break, continue, return, throw, try, catch, finally, new, delete, typeof, instanceof, void, in, of, import, export, default, async, await, yield, etc.)
  • Punctuators: all operators and delimiters (+, -, *, /, %, **, =, ==, ===, !=, !==, <, >, <=, >=, &&, ||, ??, ?., ..., =>, etc.)
  • Numeric literals: decimal, hex (0x), octal (0o), binary (0b), floating point, exponential notation
  • String literals: single and double quoted, escape sequences (\n, \t, \uXXXX, \u{XXXXX}, \\, etc.)
  • Template literals: backtick strings with ${...} interpolation support (TemplateHead, TemplateMiddle, TemplateTail tokens)
  • Regular expression literals: /pattern/flags
  • Comments: single-line (//) and multi-line (/* */), either skipped or optionally preserved
  • Boolean/null literals: true, false, null
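
To make the numeric literal forms above concrete, here is a minimal sketch of a scanner for them. The `Number` struct and the `scan_number` function are hypothetical names chosen for illustration, not an existing we-js API. The sketch assumes ASCII digits only and leaves out BigInt suffixes, numeric separators, and legacy octal.

```rust
/// Hypothetical result of scanning one numeric literal (illustration only).
#[derive(Debug, PartialEq)]
pub struct Number {
    pub value: f64,
    /// Number of bytes consumed from the input.
    pub len: usize,
}

/// Scans a numeric literal at the start of `src`.
/// Returns `None` if `src` does not start with a well-formed literal.
pub fn scan_number(src: &str) -> Option<Number> {
    let bytes = src.as_bytes();
    // Prefixed integer forms: 0x / 0o / 0b (the prefix letter may be upper case).
    if bytes.len() >= 2 && bytes[0] == b'0' {
        let radix = match bytes[1] {
            b'x' | b'X' => Some(16),
            b'o' | b'O' => Some(8),
            b'b' | b'B' => Some(2),
            _ => None,
        };
        if let Some(radix) = radix {
            let digits = bytes[2..]
                .iter()
                .take_while(|b| (**b as char).is_digit(radix))
                .count();
            if digits == 0 {
                return None; // a bare "0x" with no digits is malformed
            }
            // Accumulate in f64 so long literals cannot overflow an integer type.
            let value = bytes[2..2 + digits].iter().fold(0.0, |acc, b| {
                acc * radix as f64 + (*b as char).to_digit(radix).unwrap() as f64
            });
            return Some(Number { value, len: 2 + digits });
        }
    }
    // Decimal form: digits, optional fraction, optional exponent.
    let count_digits =
        |from: usize| bytes[from..].iter().take_while(|b| b.is_ascii_digit()).count();
    let int_len = count_digits(0);
    let mut len = int_len;
    let mut frac_len = 0;
    if bytes.get(len) == Some(&b'.') {
        frac_len = count_digits(len + 1);
        len += 1 + frac_len;
    }
    if int_len == 0 && frac_len == 0 {
        return None; // a lone "." or a non-digit is not a number
    }
    if matches!(bytes.get(len), Some(b'e') | Some(b'E')) {
        let mut exp = len + 1;
        if matches!(bytes.get(exp), Some(b'+') | Some(b'-')) {
            exp += 1;
        }
        let exp_digits = count_digits(exp);
        if exp_digits == 0 {
            return None; // "1e" or "1e+" is malformed
        }
        len = exp + exp_digits;
    }
    // Rust's f64 parser accepts the same decimal shapes ("1.", ".5", "1e3").
    src[..len].parse().ok().map(|value| Number { value, len })
}
```

Returning the consumed length alongside the value lets the caller advance its cursor without rescanning. A real lexer would likely report a diagnostic with a source position instead of a bare `None`.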

Features

  • Track source positions (line, column) for error reporting
  • Handle Unicode identifiers (basic — at minimum ASCII plus common Unicode letters)
  • Distinguish division / from RegExp literal / based on context
  • Automatic semicolon insertion awareness (track newlines between tokens)
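
One common way to handle the division versus RegExp ambiguity is to look at the previous significant token. The sketch below shows that heuristic. `TokenKind` and `slash_starts_regex` are hypothetical names, and the rule is an approximation: cases such as `if (x) /re/.test(y)` or a `}` that closes a block need parser feedback to resolve correctly.

```rust
/// Simplified token kinds, just enough for this illustration (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Identifier(String),
    Keyword(String),
    Number(f64),
    String(String),
    Punctuator(&'static str),
}

/// Decides whether a `/` that follows `prev` starts a RegExp literal.
/// `prev` is the previous non-comment token, or `None` at the start of input.
pub fn slash_starts_regex(prev: Option<&TokenKind>) -> bool {
    match prev {
        // At the start of input nothing can be divided, so it must be a RegExp.
        None => true,
        // A value was just produced, so `/` is the division operator.
        Some(TokenKind::Identifier(_))
        | Some(TokenKind::Number(_))
        | Some(TokenKind::String(_)) => false,
        // Keywords that evaluate to a value behave like operands;
        // all others (return, typeof, in, ...) expect an expression next.
        Some(TokenKind::Keyword(k)) => {
            !matches!(k.as_str(), "this" | "super" | "true" | "false" | "null")
        }
        // Closing brackets and postfix ++/-- usually end an expression.
        Some(TokenKind::Punctuator(p)) => !matches!(*p, ")" | "]" | "}" | "++" | "--"),
    }
}
```

With this rule, `a / b` is lexed as division because `a` is an identifier, while `x = /ab+c/g` is lexed as a RegExp because the previous token is `=`.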

Acceptance Criteria

  • Token enum with all ES2024 token types
  • Lexer struct that takes a &str source and produces a Vec<Token> or an iterator of tokens
  • Correct tokenization of all numeric literal forms
  • Correct string literal parsing with escape sequences
  • Template literal tokenization
  • Keyword vs identifier distinction
  • Source position tracking (line/column on each token)
  • Unit tests covering each token type, edge cases, and error handling
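
As a rough picture of how the criteria above could fit together, here is a minimal sketch of a `Lexer` that implements `Iterator`, separates keywords from identifiers, and records line, column, and a preceding-newline flag on every token. All type and field names are assumptions for illustration, the keyword table is deliberately partial, and only `\n` is treated as a line terminator. Numbers, strings, templates, and comments are left out.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Identifier(String),
    Keyword(String),
    /// Any other single character; a real lexer would have many more kinds.
    Other(char),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub line: u32,   // 1-based
    pub column: u32, // 1-based, counted in characters
    /// True if a newline was skipped before this token (useful for ASI).
    pub newline_before: bool,
}

// Partial keyword table, for illustration only.
const KEYWORDS: &[&str] = &["var", "let", "const", "function", "return", "if", "else"];

pub struct Lexer<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
    line: u32,
    column: u32,
}

impl<'a> Lexer<'a> {
    pub fn new(source: &'a str) -> Self {
        Lexer { chars: source.chars().peekable(), line: 1, column: 1 }
    }

    /// Consumes one character and updates the current position.
    fn bump(&mut self) -> Option<char> {
        let c = self.chars.next()?;
        if c == '\n' {
            self.line += 1;
            self.column = 1;
        } else {
            self.column += 1;
        }
        Some(c)
    }
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        // Skip whitespace, remembering whether a newline was crossed.
        let mut newline_before = false;
        while let Some(&c) = self.chars.peek() {
            if !c.is_whitespace() {
                break;
            }
            newline_before |= c == '\n';
            self.bump();
        }
        // Record the position of the token's first character.
        let (line, column) = (self.line, self.column);
        let first = self.bump()?;
        let kind = if first.is_alphabetic() || first == '_' || first == '$' {
            let mut name = String::from(first);
            while let Some(&c) = self.chars.peek() {
                if !(c.is_alphanumeric() || c == '_' || c == '$') {
                    break;
                }
                name.push(c);
                self.bump();
            }
            if KEYWORDS.contains(&name.as_str()) {
                TokenKind::Keyword(name)
            } else {
                TokenKind::Identifier(name)
            }
        } else {
            TokenKind::Other(first)
        };
        Some(Token { kind, line, column, newline_before })
    }
}
```

Calling `Lexer::new("let x\n= y;").collect::<Vec<Token>>()` yields five tokens. The `=` token sits at line 2, column 1 with `newline_before` set, which is the information a parser needs when deciding on automatic semicolon insertion.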

Phase 10 — JavaScript Engine (issue 1 of 15). No dependencies.

Labels: none yet.

Assignee: none yet.

Participants: 1

AT URI: at://did:plc:meotu43t6usg4qdwzenk4s2t/sh.tangled.repo.issue/3mhn3jhhfd32f