Generate speech from text using the Voice.ai TTS API.
Get a Voice ID (Optional)
Optionally get a voice_id from your dashboard or clone a voice. Skip this step to use the default voice.
Generate Speech
Generate speech from text. Include voice_id if you have one, or omit it to use the default voice. See the Generate Speech endpoint for details.
import requests
# Using default voice (voice_id is optional)
response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={
        'text': 'Hello! This is a test of the Voice.ai TTS API.',
        'model': 'voiceai-tts-v1-latest',  # Optional, defaults to voiceai-tts-v1-latest
        'language': 'en'  # Optional, defaults to 'en'
    }
)
# Or with a custom voice_id:
# json={'voice_id': 'your-voice-id-here', 'text': 'Hello! This is a test of the Voice.ai TTS API.', 'model': 'voiceai-tts-v1-latest', 'language': 'en'}
response.raise_for_status()  # Fail early instead of saving an error body as audio
with open('output.mp3', 'wb') as f:
    f.write(response.content)
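Because voice_id, model, and language are all optional, it can help to build the request body in one place and leave out anything you did not set, so the API defaults apply. The following is a minimal sketch: `build_tts_payload` is a hypothetical helper name, not part of the Voice.ai API, and it only uses the field names shown in the request above.

```python
from typing import Optional


def build_tts_payload(
    text: str,
    voice_id: Optional[str] = None,
    model: Optional[str] = None,
    language: Optional[str] = None,
) -> dict:
    """Build a JSON body for the speech endpoint, omitting unset optional fields.

    Hypothetical helper for illustration; not provided by the API or an SDK.
    """
    if not text:
        raise ValueError("text must be a non-empty string")
    payload = {"text": text}
    optional = {"voice_id": voice_id, "model": model, "language": language}
    # Only include fields the caller set; omitted fields fall back to API defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

The result can be passed directly as `json=build_tts_payload('Hello!')` in the `requests.post` call.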
Streaming
For the lowest latency, use the WebSocket endpoint, particularly for conversational AI or multiple sequential requests.
HTTP Streaming (Simple)
For simple request/response streaming, use the HTTP streaming endpoint:
import requests
# Using default voice (voice_id is optional)
response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech/stream',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={
        'text': 'This text will be streamed in chunks.',
        'model': 'voiceai-tts-v1-latest',  # Optional, defaults to voiceai-tts-v1-latest
        'language': 'en'  # Optional, defaults to 'en'
    },
    stream=True
)
# Or with a custom voice_id:
# json={'voice_id': 'your-voice-id-here', 'text': 'This text will be streamed in chunks.', 'model': 'voiceai-tts-v1-latest', 'language': 'en'}
response.raise_for_status()  # Fail early instead of saving an error body as audio
with open('output.mp3', 'wb') as f:
    # iter_content() defaults to 1-byte chunks, so set a practical chunk size
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
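The write loop above can be pulled into a small function that also reports how many bytes arrived, which is handy for logging or for detecting an empty response. This is a sketch only; `write_chunks` is a hypothetical name, and it accepts any iterable of byte chunks rather than depending on the HTTP client.

```python
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write non-empty chunks to `out` as they arrive and return the byte count."""
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        out.write(chunk)
        total += len(chunk)
    return total
```

With the streaming request above, this could be called as `write_chunks(response.iter_content(chunk_size=8192), f)`.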
WebSocket Streaming (Optimal for Conversational AI)
For lowest latency in multi-turn conversations, use the Multi-Context WebSocket (/multi-stream):
import asyncio
import json
import base64
import websockets
async def tts_conversation():
    # Use /multi-stream for multiple generations over a persistent connection
    url = "wss://dev.voice.ai/api/v1/tts/multi-stream"
    headers = {"Authorization": "Bearer YOUR_API_KEY"}
    # Recent websockets releases take additional_headers; the legacy client uses extra_headers
    async with websockets.connect(url, additional_headers=headers) as ws:
        # First message (context auto-generated)
        await ws.send(json.dumps({
            "text": "Hello! How can I help you today?",
            "language": "en",
            "flush": True
        }))
        # Receive audio chunks
        while True:
            msg = await ws.recv()
            data = json.loads(msg)
            if data.get("audio"):
                audio_chunk = base64.b64decode(data["audio"])
                # Process/play audio chunk...
            # Checked separately so a final message that also carries audio still ends the loop
            if data.get("is_last"):
                break
        # Second generation (same connection)
        await ws.send(json.dumps({
            "text": "I can help you with that.",
            "flush": True
        }))
        # Receive audio...
        while True:
            msg = await ws.recv()
            data = json.loads(msg)
            if data.get("audio"):
                audio_chunk = base64.b64decode(data["audio"])
                # Process/play audio chunk...
            if data.get("is_last"):
                break
        # Close when done
        await ws.send(json.dumps({"close_socket": True}))

asyncio.run(tts_conversation())
See the Streaming Guide for complete WebSocket documentation.
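The receive loop in the example is repeated for each generation, so it may be worth factoring out. The sketch below assumes only the message shape used in the example above (a base64 `audio` field and an `is_last` flag); `iter_audio` is a hypothetical helper, and it is written over a plain iterable of JSON strings so it can be exercised without a live connection.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio(messages: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from JSON messages until one is marked is_last."""
    for raw in messages:
        data = json.loads(raw)
        if data.get("audio"):
            yield base64.b64decode(data["audio"])
        # Checked separately so a final message carrying both fields still stops here.
        if data.get("is_last"):
            return
```

In the async client the same logic would sit inside an `async for msg in ws:` loop; the decoding and stop condition stay the same.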
Supported Languages
The TTS API supports multiple languages. Specify the language parameter using ISO 639-1 language codes. If not provided, the API defaults to English (en).
| Language Code | Language | Model Type |
|---|---|---|
| en | English | Non-multilingual |
| ca | Catalan | Multilingual |
| sv | Swedish | Multilingual |
| es | Spanish | Multilingual |
| fr | French | Multilingual |
| de | German | Multilingual |
| it | Italian | Multilingual |
| pt | Portuguese | Multilingual |
| pl | Polish | Multilingual |
| ru | Russian | Multilingual |
| nl | Dutch | Multilingual |
Model Selection: The API automatically selects the appropriate model based on the language. English uses voiceai-tts-v1-latest (non-multilingual), while all other languages use voiceai-tts-multilingual-v1-latest. You can override this by explicitly specifying the model parameter.
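The selection rule can be mirrored on the client, for example to log which model a request is expected to use or to reject unsupported language codes before sending. This is an illustrative sketch only: the API performs this selection itself, and `default_model_for` is a hypothetical helper built from the language table and model names above.

```python
ENGLISH_MODEL = "voiceai-tts-v1-latest"
MULTILINGUAL_MODEL = "voiceai-tts-multilingual-v1-latest"
# Language codes taken from the Supported Languages table.
SUPPORTED_LANGUAGES = {"en", "ca", "sv", "es", "fr", "de", "it", "pt", "pl", "ru", "nl"}


def default_model_for(language: str = "en") -> str:
    """Return the model the documented rule would pick for a language code."""
    code = language.lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return ENGLISH_MODEL if code == "en" else MULTILINGUAL_MODEL
```

Passing an explicit model in the request still overrides whatever this rule would choose.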
Audio Output
The TTS API supports multiple audio formats with various sample rates and bitrates. The basic formats (mp3, wav, pcm) output at a 32 kHz sample rate. The format-specific options let you choose the sample rate and, for compressed formats, the bitrate.
| Format | Description | Use Case |
|---|---|---|
| mp3 | Compressed, smallest file size | Web playback, storage efficiency |
| wav | Uncompressed with headers | Professional audio editing |
| pcm | Raw 16-bit signed little-endian samples | Real-time processing, custom decoders |

| Format | Sample Rate | Bitrate | Use Case |
|---|---|---|---|
| mp3_22050_32 | 22.05 kHz | 32 kbps | Low bandwidth, voice-only |
| mp3_24000_48 | 24 kHz | 48 kbps | Voice applications |
| mp3_44100_32 | 44.1 kHz | 32 kbps | Music/voice, low bandwidth |
| mp3_44100_64 | 44.1 kHz | 64 kbps | Music/voice, balanced |
| mp3_44100_96 | 44.1 kHz | 96 kbps | Music/voice, good quality |
| mp3_44100_128 | 44.1 kHz | 128 kbps | Music/voice, high quality |
| mp3_44100_192 | 44.1 kHz | 192 kbps | Music/voice, highest quality |

| Format | Sample Rate | Bitrate | Use Case |
|---|---|---|---|
| opus_48000_32 | 48 kHz | 32 kbps | Low bandwidth, voice-only |
| opus_48000_64 | 48 kHz | 64 kbps | Voice applications, balanced |
| opus_48000_96 | 48 kHz | 96 kbps | Voice/music, good quality |
| opus_48000_128 | 48 kHz | 128 kbps | Voice/music, high quality |
| opus_48000_192 | 48 kHz | 192 kbps | Voice/music, highest quality |

| Format | Sample Rate | Use Case |
|---|---|---|
| pcm_8000 | 8 kHz | Telephony, low bandwidth |
| pcm_16000 | 16 kHz | Voice applications |
| pcm_22050 | 22.05 kHz | Voice/music, balanced |
| pcm_24000 | 24 kHz | Voice/music |
| pcm_32000 | 32 kHz | Voice/music, standard |
| pcm_44100 | 44.1 kHz | Music, CD quality |
| pcm_48000 | 48 kHz | Music, professional quality |
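Raw PCM has no header, so most players cannot open it directly. One way to make a pcm_* response playable is to wrap it in a WAV container with Python's standard `wave` module. The sketch below relies on the 16-bit signed little-endian sample format stated for pcm; the single-channel (mono) default is an assumption, since the channel count is not specified here, and `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit signed little-endian PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)     # assumption: mono unless told otherwise
        wav.setsampwidth(2)            # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)  # must match the requested pcm_* rate
        wav.writeframes(pcm)
    return buf.getvalue()
```

For a pcm_16000 response, for instance, the call would be `pcm_to_wav(audio_bytes, 16000)`, where `audio_bytes` is the raw response body.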

| Format | Sample Rate | Use Case |
|---|---|---|
| wav_16000 | 16 kHz | Voice applications |
| wav_22050 | 22.05 kHz | Voice/music, balanced |
| wav_24000 | 24 kHz | Voice/music |

| Format | Sample Rate | Use Case |
|---|---|---|
| alaw_8000 | 8 kHz | A-law telephony (G.711) |
| ulaw_8000 | 8 kHz | μ-law telephony (G.711) |
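The format names in these tables follow a visible pattern: a codec, then an optional sample rate in Hz, then an optional bitrate in kbps. A client that needs those numbers (say, to configure an audio player) could split the name rather than hard-code a lookup table. This is a sketch based only on the naming pattern observed above; `parse_audio_format` and `AudioFormat` are hypothetical names, not part of the API.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: Optional[int]  # None for basic formats (API default rate)
    bitrate_kbps: Optional[int]    # None when the name carries no bitrate


def parse_audio_format(name: str) -> AudioFormat:
    """Split a format name such as 'mp3_44100_128' into its components."""
    parts = name.split("_")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"unrecognized format name: {name!r}")
    try:
        numbers = [int(part) for part in parts[1:]]
    except ValueError:
        raise ValueError(f"unrecognized format name: {name!r}") from None
    sample_rate = numbers[0] if len(numbers) >= 1 else None
    bitrate = numbers[1] if len(numbers) == 2 else None
    return AudioFormat(parts[0], sample_rate, bitrate)
```

The parser does not check that a name is one the API accepts; it only decomposes names that follow the pattern.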