we (web engine): Experimental web browser project to understand the limits of Claude

Baseline JIT compiler: compile hot functions to native code #139

Open · opened by pierrelf.com

Summary#

Implement the baseline JIT compiler that translates frequently-executed bytecode functions into AArch64 machine code. This is a simple, fast single-pass compiler — not an optimizing compiler.

Background#

With the assembler infrastructure in place, this issue builds the actual compilation pipeline: detect hot functions, translate bytecode to native code, and execute the compiled version. The baseline JIT does a 1:1 translation of bytecode ops to machine code with minimal optimization.

Acceptance Criteria#

  • Hot function detection: track per-function call count; tier up to JIT after threshold (e.g., 100 calls)
  • Bytecode → AArch64 compiler that handles core opcodes:
    • LoadConst, LoadNull, LoadUndefined, LoadTrue, LoadFalse, Move, LoadInt8
    • Add, Sub, Mul, Div, Rem, Neg (with type checks — fast path for numbers, slow path call-out)
    • Eq, StrictEq, LessThan, GreaterThan and other comparisons
    • Jump, JumpIfTrue, JumpIfFalse
    • LoadGlobal, StoreGlobal (via pointer to globals table)
    • GetPropertyByName, SetPropertyByName (with inline cache integration)
    • Call, Return
  • Value representation in registers: NaN-boxing or tagged pointers for Value in machine registers
  • Slow-path call-outs: for complex operations (e.g., string concatenation, object allocation), call back into Rust VM helper functions
  • GC safepoints: at function calls and loop back-edges, allow GC to run
  • OSR (on-stack replacement): not required for the baseline — compiling at function entry (tier-up on the next call) is sufficient
  • Deoptimization: if a type guard fails (e.g., IC shape mismatch), bail out to interpreter
  • Compiled code is stored per-function and reused across calls
  • All existing JS tests pass (JIT should be semantically identical to interpreter)
  • Add tests that exercise JIT compilation (functions called >threshold times)

Implementation Notes#

  • Register allocation: simple linear scan or even fixed mapping (bytecode register N → machine register or stack slot)
  • The bytecode is register-based with a maximum of 255 registers — most will spill to the stack frame
  • Use x19-x28 for frequently-used VM state (e.g., x19 = register file base, x20 = VM pointer, x21 = GC pointer)
  • Floating-point ops use d0-d31 NEON registers
  • For the baseline compiler, correctness > speed — keep it simple
  • Slow paths call Rust functions via extern "C" ABI

Dependencies#

  • Requires: AArch64 assembler infrastructure issue
  • Benefits from: Object shapes + inline caches (for fast property access in JIT code)

Phase#

Phase 15: Performance


Participants 1
AT URI
at://did:plc:meotu43t6usg4qdwzenk4s2t/sh.tangled.repo.issue/3mi4zygmgnn2b