## Problem
The browser's built-in Web Speech API (`speechSynthesis`) produces low-quality, robotic-sounding Tagalog speech. Voice availability varies across browsers and operating systems, and many systems ship no Filipino voice at all. This makes the "hear it" buttons and the SpeakExercise component unreliable and unpleasant.
## Solution
Add a server-side TTS endpoint powered by Microsoft Edge's free Read Aloud TTS service via the `msedge-tts` Rust crate. This provides high-quality neural voices for Filipino (Tagalog) with zero cost and no API keys.
## Available Filipino voices

- `fil-PH-BlessicaNeural` (Female) — use as default
- `fil-PH-AngeloNeural` (Male)
## API Design

### `GET /api/tts?text=<text>&voice=<voice>`

- `text` (required): the text to synthesize
- `voice` (optional): voice name; defaults to `fil-PH-BlessicaNeural`
- Response: `audio/mpeg` binary stream
- Cache: set `Cache-Control: public, max-age=86400`, since the same text+voice pair always produces the same audio. Consider an on-disk LRU cache in `/app/data/tts-cache/` to avoid re-synthesizing repeated phrases (lesson content is static).
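Because the same text+voice pair always yields the same audio, the cache file name can be derived from a hash of the pair. A minimal sketch using std's `DefaultHasher` (`cache_key` is a hypothetical helper; a production cache should prefer a stable hash such as SHA-256 via the `sha2` crate, since `DefaultHasher` output is not guaranteed stable across Rust versions):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a cache filename from the (text, voice) pair.
/// Identical inputs always map to the same file, so a cache hit
/// can be served without contacting the Edge TTS service.
fn cache_key(text: &str, voice: &str) -> String {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    voice.hash(&mut hasher);
    // 16 hex digits of the 64-bit hash, plus the audio extension.
    format!("{:016x}.mp3", hasher.finish())
}
```

A hit would then be served directly from `data/tts-cache/<key>`; on a miss, synthesize, write the file, and return the bytes.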
## Implementation

### Backend (`api/`)

- Add `msedge-tts` to `Cargo.toml` (async support)
- Create `src/routes/tts.rs`:
  - Connect to the Edge TTS async client
  - Synthesize the text with the requested voice
  - Return the MP3 audio bytes with the appropriate content type
  - Validate the text length (max 500 chars) to prevent abuse
  - No auth required — lesson content is public
- Add an optional file-based cache: hash `(text, voice)` → a cache file in `data/tts-cache/`; serve from the cache on a hit
- Register the route in `main.rs`: `.route("/api/tts", get(routes::tts::synthesize))`
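The parameter checks above can be factored into a pure helper, which keeps the handler small and easy to test. A sketch under assumed names (`validate_request` and the constants are illustrative, not the actual handler code):

```rust
const MAX_TEXT_LEN: usize = 500;
const ALLOWED_VOICES: [&str; 2] = ["fil-PH-BlessicaNeural", "fil-PH-AngeloNeural"];
const DEFAULT_VOICE: &str = "fil-PH-BlessicaNeural";

/// Validate query parameters before synthesis; returns the
/// (text, voice) pair to use, or an error message for a 400 response.
fn validate_request(text: &str, voice: Option<&str>) -> Result<(String, String), String> {
    if text.trim().is_empty() {
        return Err("text must not be empty".into());
    }
    // Count chars, not bytes: Tagalog text can contain
    // multi-byte characters such as "ñ".
    if text.chars().count() > MAX_TEXT_LEN {
        return Err("text exceeds 500 characters".into());
    }
    let voice = voice.unwrap_or(DEFAULT_VOICE);
    if !ALLOWED_VOICES.contains(&voice) {
        return Err(format!("unknown voice: {voice}"));
    }
    Ok((text.to_string(), voice.to_string()))
}
```

Whitelisting the two `fil-PH` voices also prevents the endpoint from being used as a free general-purpose TTS proxy for arbitrary locales.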
### Frontend (`web/`)

- Update the `useSpeech` hook:
  - Replace `speechSynthesis.speak()` with a fetch of `/api/tts?text=...`, playing the returned audio via `new Audio(blob_url)`
  - Remove the browser TTS feature detection (no longer needed — the server is always available)
  - Keep STT (`SpeechRecognition`) as-is — that's input, not output
- Update the `SpeakButton` component: show a loading spinner while the audio is being fetched
- Update `SpeakExercise`: TTS playback for the target phrase should use the new endpoint
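The new playback path might look roughly like the sketch below. `ttsUrl` is a hypothetical helper name, not existing code; the fetch-and-play part is shown as a comment since it needs browser APIs:

```typescript
// Illustrative default; matches the server-side default voice.
const DEFAULT_VOICE = "fil-PH-BlessicaNeural";

// Build the TTS endpoint URL; URLSearchParams handles the
// percent-encoding of arbitrary lesson text.
function ttsUrl(text: string, voice: string = DEFAULT_VOICE): string {
  const params = new URLSearchParams({ text, voice });
  return `/api/tts?${params.toString()}`;
}

// Inside the hook, playback would look roughly like:
//   const res = await fetch(ttsUrl(text, voice));
//   const blobUrl = URL.createObjectURL(await res.blob());
//   await new Audio(blobUrl).play();
//   // Revoke blobUrl once playback ends to avoid leaking memory.
```

Showing the loading spinner in `SpeakButton` then maps naturally onto the awaited fetch: set loading before the request, clear it when playback starts or the fetch fails.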
## Cleanup

- Remove the `speechSynthesis` usage from `useSpeech.ts` (the `speak` function)
- Keep the `isSupported.stt` detection for `SpeechRecognition` — only TTS is being replaced
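After cleanup, only STT detection remains in the hook. A hedged sketch of that shape (`detectStt` is an illustrative name; the real detection lives in `useSpeech.ts`):

```typescript
// Probe for SpeechRecognition (standard or webkit-prefixed).
// No TTS check remains: the server endpoint is always available.
function detectStt(scope: Record<string, unknown>): boolean {
  return "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope;
}

const isSupported = {
  stt: detectStt(globalThis as unknown as Record<string, unknown>),
};
```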
## Crate reference

- Crate: `msedge-tts` on crates.io (https://crates.io/crates/msedge-tts)
- Docs: https://docs.rs/msedge-tts
- Async client: `msedge_tts::tts::client::connect_async`
- Voice listing: `msedge_tts::voice::get_voices_list_async`
- Filter voices by locale `fil-PH` to get the Filipino options
## Verification

- Start the dev server and visit a lesson with `promptAudio: true`
- Click the speaker button — you should hear natural Filipino speech (not robotic)
- Verify that `SpeakExercise` plays the target phrase clearly
- Verify that audio caching works (a second click should be instant)
- Test both voices by passing `?voice=fil-PH-AngeloNeural`
- Verify that the Docker build still works (`make build`)