Link Search Menu Expand Document Moon Sun

Text To Speech

After handling the voice command, it’s common to give the user a response in the form of a speech. Rhasspy uses a text to speech system (TTS) to generate a voice out of text. We tested two TTS,Mozilla TTS and Nano TTS.


To trigger the TTS to say something, you must publish the MQTT message /hermes/tts/say with the attribute text and the attribute siteId. The attribute text must contain the text you want to hear from the TTS. After you send it out, the TTS produces a voice wav file and plays it on the speaker of your rhasspy. When rhasspy finished playing the wav file, the MQTT message hermes/tts/sayFinished is send with the same siteId as /hermes/tts/say.

What systems are available?

There are different text to speech systems, and the list below shows some examples with the available language:

system en de es fr it nl ru el hi zh vi pt ca
eSpeak x x x x x x x x x x x x x
Flite x               x        
PicoTTS x x x x x                
NanoTTS x x x x x                
MaryTTS x x   x x x              
OpenTTS x x x x x x x x x x x x x
WaveNet x x x x x x x   x x     x

eSpeak: is the default text to speech system and has the widest support.

Flite: sounds better than eSpeak, but can support only fewer languages. Uses FestVox’s Flite for speech.

PicoTTS: uses SVOX’s PicoTTS for text to speech.

NanoTTS: uses an imported fork of SVOX’s picoTTs for test to speech.

MaryTTS: has a MaryTTS Docker image available with many voices included. Use a remote MaryTTS web server.

OpenTTS: Uses a remote OpenTTS, supports many different text to speech system such as Mozilla TTS.

Google WaveNet: requires a google account and internet connection to function

Table of contents