## Problem
The browser's built-in Web Speech API (`speechSynthesis`) produces low-quality, robotic-sounding Tagalog speech. Voice availability varies across browsers and operating systems, and many systems ship no Filipino voice at all. This makes the "hear it" buttons and the SpeakExercise component unreliable and unpleasant.
## Solution
Add a server-side TTS endpoint powered by Microsoft Edge's free Read Aloud TTS service via the `msedge-tts` Rust crate. This provides high-quality neural voices for Filipino (Tagalog) with zero cost and no API keys.
## Available Filipino voices

- `fil-PH-BlessicaNeural` (Female) — use as default
- `fil-PH-AngeloNeural` (Male)
## API Design

### `GET /api/tts?text=<text>&voice=<voice>`

- `text` (required): the text to synthesize
- `voice` (optional): voice name; defaults to `fil-PH-BlessicaNeural`
- Response: `audio/mpeg` binary stream
- Cache: set `Cache-Control: public, max-age=86400`, since the same text+voice pair always produces the same audio. Consider an on-disk LRU cache in `/app/data/tts-cache/` to avoid re-synthesizing repeated phrases (lesson content is static).
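Because the same text+voice pair always yields the same audio, the cache file name can be derived from a hash of the pair. A minimal sketch using std's `DefaultHasher` (`cache_key` is a hypothetical helper; a production cache should prefer a stable hash such as SHA-256 via the `sha2` crate, since `DefaultHasher` output is not guaranteed stable across Rust versions):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a cache filename from the (text, voice) pair.
/// Identical inputs always map to the same file, so a cache hit
/// can be served without contacting the Edge TTS service.
fn cache_key(text: &str, voice: &str) -> String {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    voice.hash(&mut hasher);
    // 16 hex digits of the 64-bit hash, plus the audio extension.
    format!("{:016x}.mp3", hasher.finish())
}
```

A hit would then be served directly from `data/tts-cache/<key>`; on a miss, synthesize, write the file, and return the bytes.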
## Implementation

### Backend (`api/`)

- Add `msedge-tts` to `Cargo.toml` (async support)
- Create `src/routes/tts.rs`:
  - Connect to the Edge TTS async client
  - Synthesize the text with the requested voice
  - Return the MP3 audio bytes with the appropriate content type
  - Validate the text length (max 500 chars) to prevent abuse
  - No auth required — lesson content is public
- Add an optional file-based cache: hash `(text, voice)` → a cache file in `data/tts-cache/`; serve from the cache on a hit
- Register the route in `main.rs`: `.route("/api/tts", get(routes::tts::synthesize))`
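The parameter checks above can be factored into a pure helper, which keeps the handler small and easy to test. A sketch under assumed names (`validate_request` and the constants are illustrative, not the actual handler code):

```rust
const MAX_TEXT_LEN: usize = 500;
const ALLOWED_VOICES: [&str; 2] = ["fil-PH-BlessicaNeural", "fil-PH-AngeloNeural"];
const DEFAULT_VOICE: &str = "fil-PH-BlessicaNeural";

/// Validate query parameters before synthesis; returns the
/// (text, voice) pair to use, or an error message for a 400 response.
fn validate_request(text: &str, voice: Option<&str>) -> Result<(String, String), String> {
    if text.trim().is_empty() {
        return Err("text must not be empty".into());
    }
    // Count chars, not bytes: Tagalog text can contain
    // multi-byte characters such as "ñ".
    if text.chars().count() > MAX_TEXT_LEN {
        return Err("text exceeds 500 characters".into());
    }
    let voice = voice.unwrap_or(DEFAULT_VOICE);
    if !ALLOWED_VOICES.contains(&voice) {
        return Err(format!("unknown voice: {voice}"));
    }
    Ok((text.to_string(), voice.to_string()))
}
```

Whitelisting the two `fil-PH` voices also prevents the endpoint from being used as a free general-purpose TTS proxy for arbitrary locales.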
### Frontend (`web/`)

- Update the `useSpeech` hook:
  - Replace `speechSynthesis.speak()` with a fetch of `/api/tts?text=...`, playing the returned audio via `new Audio(blob_url)`
  - Remove the browser TTS feature detection (no longer needed — the server is always available)
  - Keep STT (`SpeechRecognition`) as-is — that's input, not output
- Update the `SpeakButton` component: show a loading spinner while the audio is being fetched
- Update `SpeakExercise`: TTS playback for the target phrase should use the new endpoint
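The new playback path might look roughly like the sketch below. `ttsUrl` is a hypothetical helper name, not existing code; the fetch-and-play part is shown as a comment since it needs browser APIs:

```typescript
// Illustrative default; matches the server-side default voice.
const DEFAULT_VOICE = "fil-PH-BlessicaNeural";

// Build the TTS endpoint URL; URLSearchParams handles the
// percent-encoding of arbitrary lesson text.
function ttsUrl(text: string, voice: string = DEFAULT_VOICE): string {
  const params = new URLSearchParams({ text, voice });
  return `/api/tts?${params.toString()}`;
}

// Inside the hook, playback would look roughly like:
//   const res = await fetch(ttsUrl(text, voice));
//   const blobUrl = URL.createObjectURL(await res.blob());
//   await new Audio(blobUrl).play();
//   // Revoke blobUrl once playback ends to avoid leaking memory.
```

Showing the loading spinner in `SpeakButton` then maps naturally onto the awaited fetch: set loading before the request, clear it when playback starts or the fetch fails.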
## Cleanup

- Remove the `speechSynthesis` usage from `useSpeech.ts` (the `speak` function)
- Keep the `isSupported.stt` detection for `SpeechRecognition` — only TTS is being replaced
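After cleanup, only STT detection remains in the hook. A hedged sketch of that shape (`detectStt` is an illustrative name; the real detection lives in `useSpeech.ts`):

```typescript
// Probe for SpeechRecognition (standard or webkit-prefixed).
// No TTS check remains: the server endpoint is always available.
function detectStt(scope: Record<string, unknown>): boolean {
  return "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope;
}

const isSupported = {
  stt: detectStt(globalThis as unknown as Record<string, unknown>),
};
```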
## Crate reference

- Crate: `msedge-tts` on crates.io (https://crates.io/crates/msedge-tts)
- Docs: https://docs.rs/msedge-tts
- Async client: `msedge_tts::tts::client::connect_async`
- Voice listing: `msedge_tts::voice::get_voices_list_async`
- Filter voices by locale `fil-PH` to get the Filipino options
## Verification

- Start the dev server and visit a lesson with `promptAudio: true`
- Click the speaker button — you should hear natural Filipino speech (not robotic)
- Verify that `SpeakExercise` plays the target phrase clearly
- Verify that audio caching works (a second click should be instant)
- Test both voices by passing `?voice=fil-PH-AngeloNeural`
- Verify that the Docker build still works (`make build`)