
Replace browser TTS with Edge TTS server-side endpoint #3

Opened by pierrelf.com

Problem

The browser's built-in Web Speech API (speechSynthesis) produces low-quality, robotic-sounding Tagalog speech. Voice availability varies across browsers and operating systems, and many systems ship no Filipino voice at all. This makes the "hear it" buttons and the SpeakExercise component unreliable and unpleasant.

Solution

Add a server-side TTS endpoint powered by Microsoft Edge's free Read Aloud TTS service via the msedge-tts Rust crate. This provides high-quality neural voices for Filipino (Tagalog) with zero cost and no API keys.

Available Filipino voices

  • fil-PH-BlessicaNeural (Female) — use as default
  • fil-PH-AngeloNeural (Male)

API Design

GET /api/tts?text=<text>&voice=<voice>

  • text (required): The text to synthesize
  • voice (optional): Voice name, defaults to fil-PH-BlessicaNeural
  • Response: audio/mpeg binary stream
  • Cache: Set Cache-Control: public, max-age=86400 since the same text+voice always produces the same audio. Consider an on-disk LRU cache in /app/data/tts-cache/ to avoid re-synthesizing repeated phrases (lesson content is static).
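For reference, a client of this design can build the request URL with URLSearchParams, which handles percent-encoding of arbitrary lesson text. The helper name below is illustrative, not part of the codebase:

```typescript
// Builds the /api/tts request URL described above. URLSearchParams
// form-encodes the query (spaces become "+", "?" becomes "%3F"),
// so any lesson text is safe to pass.
function buildTtsUrl(text: string, voice = "fil-PH-BlessicaNeural"): string {
  return `/api/tts?${new URLSearchParams({ text, voice })}`;
}
```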

Implementation

Backend (api/)

  1. Add msedge-tts to Cargo.toml (async support)
  2. Create src/routes/tts.rs:
    • Connect to Edge TTS async client
    • Synthesize text with the requested voice
    • Return the MP3 audio bytes with appropriate content-type
    • Validate text length (max 500 chars) to prevent abuse
    • No auth required — lesson content is public
  3. Add optional file-based cache: hash (text, voice) → cache file in data/tts-cache/. Serve from cache on hit.
  4. Register route in main.rs: .route("/api/tts", get(routes::tts::synthesize))
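The (text, voice) → cache-file mapping in step 3 can be sketched as follows; it is shown in TypeScript for illustration only (the real handler is Rust), and the SHA-256 choice, NUL separator, and ".mp3" extension are assumptions beyond what the issue specifies:

```typescript
import { createHash } from "node:crypto";

// Maps a (text, voice) pair to a stable cache path, as in step 3.
// The NUL separator prevents ("ab","c") and ("a","bc") from colliding.
function cacheFilePath(text: string, voice: string): string {
  const hash = createHash("sha256").update(`${voice}\u0000${text}`).digest("hex");
  return `data/tts-cache/${hash}.mp3`;
}
```

Because the same key always yields the same path, a cache hit can be served straight from disk without contacting Edge TTS.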

Frontend (web/)

  1. Update useSpeech hook:
    • Replace speechSynthesis.speak() with fetching /api/tts?text=... and playing the returned audio via new Audio(blob_url)
    • Remove browser TTS feature detection (no longer needed — server always available)
    • Keep STT (SpeechRecognition) as-is — that's input, not output
  2. Update SpeakButton component: show a loading spinner while audio is being fetched
  3. Update SpeakExercise: the TTS playback for the target phrase should use the new endpoint
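A minimal sketch of the replacement speak path in useSpeech, assuming the endpoint above. Audio and URL.createObjectURL are browser APIs, and the real hook would also expose the loading state that SpeakButton's spinner needs:

```typescript
// Fetches synthesized audio from the new endpoint and plays it,
// replacing speechSynthesis.speak(). Error handling and loading
// state are left to the actual hook.
async function speak(text: string, voice = "fil-PH-BlessicaNeural"): Promise<void> {
  const res = await fetch(`/api/tts?${new URLSearchParams({ text, voice })}`);
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  const blobUrl = URL.createObjectURL(await res.blob());
  const audio = new Audio(blobUrl);
  audio.onended = () => URL.revokeObjectURL(blobUrl); // free blob memory after playback
  await audio.play();
}
```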

Cleanup

  1. Remove speechSynthesis usage from useSpeech.ts (the speak function)
  2. Keep isSupported.stt detection for SpeechRecognition — only TTS is being replaced
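After this cleanup, feature detection reduces to STT only. The sketch below assumes an isSupported object shaped like the one mentioned in step 2; the exact property layout in useSpeech.ts may differ:

```typescript
// STT still depends on the browser (Chrome exposes it as
// webkitSpeechRecognition); TTS is now always available server-side.
const isSupported = {
  stt:
    typeof window !== "undefined" &&
    ("SpeechRecognition" in window || "webkitSpeechRecognition" in window),
  tts: true, // served by /api/tts, no browser capability needed
};
```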

Crate reference

  • msedge-tts on crates.io

Verification

  1. Start dev server, visit a lesson with promptAudio: true
  2. Click the speaker button — should hear natural Filipino speech (not robotic)
  3. Verify SpeakExercise plays the target phrase clearly
  4. Verify audio caching works (second click should be instant)
  5. Test with both voices by passing ?voice=fil-PH-AngeloNeural
  6. Verify Docker build still works (make build)
AT URI
at://did:plc:meotu43t6usg4qdwzenk4s2t/sh.tangled.repo.issue/3mgxe2llct32x