Prerequisites: API key
Get a Voice ID (Optional)
Optionally, get a voice_id from your dashboard or clone a voice. Skip this step to use the default voice.
Generate Speech
Generate speech from text. Include voice_id if you have one, or omit it to use the default voice. See the Generate Speech endpoint for details.
Streaming
HTTP Streaming (Simple)
For simple request/response streaming, use the HTTP streaming endpoint.
WebSocket Streaming (Optimal for Conversational AI)
For the lowest latency in multi-turn conversations, use the Multi-Context WebSocket endpoint (/multi-stream).
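Multiplexing several conversation turns over one socket means the client has to route incoming audio back to the turn that requested it. The sketch below shows one way to do that with the standard library, leaving the WebSocket transport itself out. The message shape (a JSON object with context_id, audio as base64, and is_final) is a hypothetical schema assumed for illustration only; check the /multi-stream reference for the real field names.

```python
import base64
import json
from collections import defaultdict


class ContextDemuxer:
    """Route audio chunks from a shared socket to per-context buffers.

    ASSUMPTION: each server message is JSON shaped like
    {"context_id": str, "audio": <base64 str>, "is_final": bool}.
    These field names are hypothetical, not the documented schema.
    """

    def __init__(self):
        self._buffers = defaultdict(bytearray)  # context_id -> audio received so far
        self._finished = set()  # contexts whose final chunk has arrived

    def handle_message(self, raw: str) -> str:
        """Append one message's audio to its context and return the context_id."""
        msg = json.loads(raw)
        ctx = msg["context_id"]
        if msg.get("audio"):
            self._buffers[ctx].extend(base64.b64decode(msg["audio"]))
        if msg.get("is_final"):
            self._finished.add(ctx)
        return ctx

    def is_finished(self, ctx: str) -> bool:
        return ctx in self._finished

    def pop_audio(self, ctx: str) -> bytes:
        """Remove and return all audio buffered so far for one context."""
        return bytes(self._buffers.pop(ctx, b""))
```

In a real client you would call handle_message for every text frame your WebSocket library delivers and feed each context's bytes to its own player, so an interrupted turn can be dropped without disturbing the others.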
Supported Languages
The TTS API supports the languages listed below. Specify the language parameter using an ISO 639-1 language code. If it is omitted, the API defaults to English (en).
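A request therefore needs only the text: language falls back to en and voice_id can be left out to use the default voice. The sketch below assembles such a request with the standard library without sending it. The endpoint URL, the Bearer-token Authorization header, and the text field name are hypothetical placeholders; take the real values from the Generate Speech endpoint reference. The language codes come from the table that follows.

```python
import json
import urllib.request

# Codes from the Supported Languages table.
SUPPORTED_LANGUAGES = {"en", "ca", "sv", "es", "fr", "de", "it", "pt", "pl", "ru", "nl"}

# HYPOTHETICAL placeholder; use the URL from the Generate Speech reference.
TTS_URL = "https://api.example.com/v1/tts"


def build_tts_request(api_key, text, language=None, voice_id=None, model=None):
    """Build (but do not send) a POST request for speech generation.

    Optional fields are omitted when None so the API applies its own
    defaults (English, default voice, automatic model selection).
    """
    if language is not None and language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    body = {"text": text}  # the field name "text" is an assumption
    for key, value in (("language", language), ("voice_id", voice_id), ("model", model)):
        if value is not None:
            body[key] = value
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would then be a matter of urllib.request.urlopen(req) and reading the audio bytes from the response.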
| Language Code | Language | Model Type |
|---|---|---|
| en | English | Non-multilingual |
| ca | Catalan | Multilingual |
| sv | Swedish | Multilingual |
| es | Spanish | Multilingual |
| fr | French | Multilingual |
| de | German | Multilingual |
| it | Italian | Multilingual |
| pt | Portuguese | Multilingual |
| pl | Polish | Multilingual |
| ru | Russian | Multilingual |
| nl | Dutch | Multilingual |
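The Model Type column reflects the automatic model selection described in the note below. If you want to log which model will serve a request, or pin it yourself, a client-side mirror of that documented rule might look like this; it is a sketch of the stated behavior, not the server's implementation.

```python
DEFAULT_LANGUAGE = "en"
ENGLISH_MODEL = "voiceai-tts-v1-latest"
MULTILINGUAL_MODEL = "voiceai-tts-multilingual-v1-latest"


def expected_model(language=None, model=None):
    """Return the model the documented selection rule would pick.

    An explicit model always wins; otherwise English maps to the
    non-multilingual model and every other language to the multilingual one.
    """
    if model is not None:
        return model
    lang = language or DEFAULT_LANGUAGE
    return ENGLISH_MODEL if lang == "en" else MULTILINGUAL_MODEL
```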
Model Selection: The API automatically selects the appropriate model based on the language. English uses voiceai-tts-v1-latest (non-multilingual), while all other languages use voiceai-tts-multilingual-v1-latest. You can override this by explicitly specifying the model parameter.
Audio Output
The TTS API supports multiple audio formats at various sample rates and bitrates. The basic formats (mp3, wav, pcm) output at a 32kHz sample rate. The format-specific variants let you choose the sample rate and, for mp3 and opus, the bitrate.
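Because the format names encode their parameters (codec, then sample rate in Hz, then bitrate in kbps where applicable), a client can recover the expected output properties from the name alone. The helper below is a sketch based only on the naming pattern in the tables that follow; it assumes the unsuffixed mp3, wav, and pcm names mean 32kHz, as stated above.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate: int  # Hz
    bitrate_kbps: Optional[int]  # None when the name does not specify one


BASIC_SAMPLE_RATE = 32000  # mp3, wav, pcm without a suffix


def parse_format(name: str) -> AudioFormat:
    """Split a name such as 'mp3_44100_128' or 'pcm_16000' into its parts."""
    parts = name.split("_")
    codec = parts[0]
    if len(parts) == 1:
        if codec not in ("mp3", "wav", "pcm"):
            raise ValueError(f"unknown basic format: {name!r}")
        return AudioFormat(codec, BASIC_SAMPLE_RATE, None)
    if len(parts) > 3:
        raise ValueError(f"unexpected format name: {name!r}")
    sample_rate = int(parts[1])
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(codec, sample_rate, bitrate)
```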
32kHz Formats
| Format | Description | Use Case |
|---|---|---|
| mp3 | Compressed, smallest file size | Web playback, storage efficiency |
| wav | Uncompressed with headers | Professional audio editing |
| pcm | Raw 16-bit signed little-endian, 32kHz mono | Real-time processing, custom decoders |
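Raw pcm output has no header, so most players cannot open it directly. If you stream pcm for low latency but also want a playable file afterwards, Python's standard wave module can add the header. This sketch assumes the documented layout (16-bit signed little-endian mono) and defaults to the 32kHz of the basic pcm format; pass a different rate for the pcm_* variants.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 32000) -> bytes:
    """Wrap raw 16-bit little-endian mono PCM in a WAV (RIFF) container."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM must contain an even number of bytes")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # WAV stores PCM little-endian, so no conversion needed
    return buf.getvalue()
```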
MP3 Formats (with sample rate and bitrate)
| Format | Sample Rate | Bitrate | Use Case |
|---|---|---|---|
| mp3_22050_32 | 22.05kHz | 32kbps | Low bandwidth, voice-only |
| mp3_24000_48 | 24kHz | 48kbps | Voice applications |
| mp3_44100_32 | 44.1kHz | 32kbps | Music/voice, low bandwidth |
| mp3_44100_64 | 44.1kHz | 64kbps | Music/voice, balanced |
| mp3_44100_96 | 44.1kHz | 96kbps | Music/voice, good quality |
| mp3_44100_128 | 44.1kHz | 128kbps | Music/voice, high quality |
| mp3_44100_192 | 44.1kHz | 192kbps | Music/voice, highest quality |
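For the formats with an explicit bitrate (the mp3_* formats above and the opus_* formats below), output size scales with bitrate and duration: bytes ≈ kbps × 1000 / 8 × seconds. The helper below gives a rough planning figure for bandwidth or storage, assuming a roughly constant bitrate; real files can differ somewhat because of metadata, encoder padding, or variable-bitrate encoding.

```python
def estimated_size_bytes(bitrate_kbps: int, duration_s: float) -> int:
    """Approximate encoded size, assuming a roughly constant bitrate."""
    return round(bitrate_kbps * 1000 / 8 * duration_s)


# One minute of speech at a few of the listed bitrates.
for fmt, kbps in (("mp3_22050_32", 32), ("mp3_44100_128", 128), ("mp3_44100_192", 192)):
    print(f"{fmt}: ~{estimated_size_bytes(kbps, 60) / 1_000_000:.2f} MB per minute")
# mp3_22050_32: ~0.24 MB per minute
# mp3_44100_128: ~0.96 MB per minute
# mp3_44100_192: ~1.44 MB per minute
```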
Opus Formats (with sample rate and bitrate)
| Format | Sample Rate | Bitrate | Use Case |
|---|---|---|---|
| opus_48000_32 | 48kHz | 32kbps | Low bandwidth, voice-only |
| opus_48000_64 | 48kHz | 64kbps | Voice applications, balanced |
| opus_48000_96 | 48kHz | 96kbps | Voice/music, good quality |
| opus_48000_128 | 48kHz | 128kbps | Voice/music, high quality |
| opus_48000_192 | 48kHz | 192kbps | Voice/music, highest quality |
PCM Formats (with sample rate)
All pcm_* formats use 16-bit signed little-endian mono at the specified sample rate.
| Format | Sample Rate | Use Case |
|---|---|---|
| pcm_8000 | 8kHz | Telephony, low bandwidth |
| pcm_16000 | 16kHz | Voice applications |
| pcm_22050 | 22.05kHz | Voice/music, balanced |
| pcm_24000 | 24kHz | Voice/music |
| pcm_32000 | 32kHz | Voice/music, standard |
| pcm_44100 | 44.1kHz | Music, CD quality |
| pcm_48000 | 48kHz | Music, professional quality |
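Since every pcm_* format uses two bytes per sample in a single channel, duration is byte_count / (2 × sample_rate); for example, 32,000 bytes of pcm_16000 is exactly one second. Samples can be decoded with the standard struct module, as in this sketch:

```python
import struct

BYTES_PER_SAMPLE = 2  # 16-bit mono


def pcm_duration_seconds(pcm: bytes, sample_rate: int) -> float:
    """Duration of 16-bit mono PCM at the given sample rate."""
    return len(pcm) / (BYTES_PER_SAMPLE * sample_rate)


def pcm_to_samples(pcm: bytes) -> tuple:
    """Decode 16-bit signed little-endian PCM into Python ints.

    A trailing odd byte is ignored; when streaming, carry it over to the
    next chunk instead of dropping it.
    """
    count = len(pcm) // BYTES_PER_SAMPLE
    # '<' forces little-endian regardless of the host CPU; 'h' is a signed 16-bit int.
    return struct.unpack(f"<{count}h", pcm[: count * BYTES_PER_SAMPLE])
```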
WAV Formats (with sample rate)
| Format | Sample Rate | Use Case |
|---|---|---|
| wav_16000 | 16kHz | Voice applications |
| wav_22050 | 22.05kHz | Voice/music, balanced |
| wav_24000 | 24kHz | Voice/music |
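When requesting a wav_* format, you can confirm that the response matches what you asked for by reading its header with the standard wave module before playback. A sketch, assuming the response body has already been read into memory:

```python
import io
import wave


def wav_params(data: bytes) -> dict:
    """Read basic stream parameters from an in-memory WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def check_wav(data: bytes, expected_rate: int) -> None:
    """Raise ValueError if the WAV's sample rate differs from the requested one."""
    rate = wav_params(data)["sample_rate"]
    if rate != expected_rate:
        raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
```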
Telephony Formats
| Format | Sample Rate | Use Case |
|---|---|---|
| alaw_8000 | 8kHz | A-law telephony (G.711) |
| ulaw_8000 | 8kHz | μ-law telephony (G.711) |
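alaw_8000 and ulaw_8000 are ITU-T G.711 companded formats: one byte per sample at 8kHz, which telephony stacks usually consume as-is. If you need linear samples instead (for mixing or analysis), the expansion is a fixed bit manipulation. The sketch below follows the classic reference G.711 decoding and produces 16-bit values; it assumes the API returns raw G.711 bytes without a container header, and it favors clarity over speed.

```python
import struct


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a 16-bit linear sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # remove the bias
    return -magnitude if sign else magnitude


def alaw_to_linear(byte: int) -> int:
    """Expand one G.711 A-law byte to a 16-bit linear sample."""
    a = byte ^ 0x55  # A-law inverts the even bits
    exponent = (a >> 4) & 0x07
    mantissa = a & 0x0F
    if exponent == 0:
        magnitude = (mantissa << 4) + 8
    else:
        magnitude = ((mantissa << 4) + 0x108) << (exponent - 1)
    return magnitude if a & 0x80 else -magnitude  # sign bit set means positive


def g711_to_pcm16(data: bytes, law: str = "ulaw") -> bytes:
    """Convert G.711 bytes to 16-bit little-endian PCM at the same 8kHz rate."""
    decode = {"ulaw": ulaw_to_linear, "alaw": alaw_to_linear}[law]  # KeyError if unknown
    samples = [decode(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)
```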
Next Steps
Voice Cloning: Create custom voices from audio samples.
Streaming: HTTP and WebSocket streaming (WebSocket offers the lowest latency).
API Reference: Complete endpoint documentation.