Generate speech from text using the Voice.ai TTS API.
Get a Voice ID (Optional)
Optionally get a voice_id from your dashboard or clone a voice. Skip this step to use the default voice.

Generate Speech

Generate speech from text. Include voice_id if you have one, or omit it to use the default voice. See the Generate Speech endpoint for details.

import requests
# Using the default voice (voice_id is optional)
response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={
        'text': 'Hello! This is a test of the Voice.ai TTS API.',
        'model': 'voiceai-tts-v1-latest',  # Optional, defaults to voiceai-tts-v1-latest
        'language': 'en'  # Optional, defaults to 'en'
    }
)

# Or with a custom voice_id:
# json={'voice_id': 'your-voice-id-here', 'text': 'Hello! This is a test of the Voice.ai TTS API.', 'model': 'voiceai-tts-v1-latest', 'language': 'en'}

with open('output.mp3', 'wb') as f:
    f.write(response.content)
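The example above writes response.content unconditionally, so a failed request would leave an error body in output.mp3 instead of audio. A minimal, more defensive sketch (the helper name save_audio is illustrative, not part of the API):

```python
import requests

def save_audio(response: requests.Response, path: str) -> None:
    """Write the TTS audio body to disk, raising on HTTP errors."""
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    with open(path, 'wb') as f:
        f.write(response.content)
```

With this helper, save_audio(response, 'output.mp3') raises requests.HTTPError on an authentication or validation failure rather than silently writing the error payload to disk.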
Streaming
For the lowest latency, use the WebSocket endpoint for conversational AI or multiple sequential requests.
HTTP Streaming (Simple)
For simple request/response streaming, use the HTTP streaming endpoint:
import requests

# Using the default voice (voice_id is optional)
response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech/stream',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={
        'text': 'This text will be streamed in chunks.',
        'model': 'voiceai-tts-v1-latest',  # Optional, defaults to voiceai-tts-v1-latest
        'language': 'en'  # Optional, defaults to 'en'
    },
    stream=True
)

# Or with a custom voice_id:
# json={'voice_id': 'your-voice-id-here', 'text': 'This text will be streamed in chunks.', 'model': 'voiceai-tts-v1-latest', 'language': 'en'}

with open('output.mp3', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
WebSocket Streaming (Optimal for Conversational AI)
For lowest latency in multi-turn conversations, use WebSocket:
import asyncio
import json
import base64
import websockets

async def tts_conversation():
    url = "wss://dev.voice.ai/api/v1/tts/stream"
    headers = {"Authorization": "Bearer YOUR_API_KEY"}
    async with websockets.connect(url, additional_headers=headers) as ws:
        # First message
        await ws.send(json.dumps({
            "text": "Hello! How can I help you today?",
            "language": "en",
            "flush": True
        }))

        # Receive audio chunks
        while True:
            msg = await ws.recv()
            data = json.loads(msg)
            if data.get("audio"):
                audio_chunk = base64.b64decode(data["audio"])
                # Process/play the audio chunk...
            elif data.get("is_last"):
                break

        # Subsequent messages reuse the same connection:
        await ws.send(json.dumps({
            "text": "I can help you with that.",
            "flush": True
        }))
        # Receive audio as above...

asyncio.run(tts_conversation())
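Each conversational turn repeats the same receive loop, so it can be factored into a reusable coroutine. A sketch that assumes only the message shape shown above (base64-encoded audio chunks followed by an is_last marker); collect_audio is an illustrative name, not part of an SDK:

```python
import base64
import json

async def collect_audio(ws) -> bytes:
    """Accumulate base64-encoded audio chunks until an is_last message arrives."""
    chunks = []
    while True:
        data = json.loads(await ws.recv())
        if data.get("audio"):
            chunks.append(base64.b64decode(data["audio"]))
        elif data.get("is_last"):
            return b"".join(chunks)
```

Inside tts_conversation, each `await ws.send(...)` can then be followed by `audio = await collect_audio(ws)` instead of repeating the loop.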
See the Streaming Guide for complete WebSocket documentation including multi-context streaming for multiple concurrent voices.
Supported Languages
The TTS API supports multiple languages. Specify the language parameter using ISO 639-1 language codes. If not provided, the API defaults to English (en).
| Language Code | Language | Model Type |
|---|---|---|
| en | English | Non-multilingual |
| ca | Catalan | Multilingual |
| sv | Swedish | Multilingual |
| es | Spanish | Multilingual |
| fr | French | Multilingual |
| de | German | Multilingual |
| it | Italian | Multilingual |
| pt | Portuguese | Multilingual |
| pl | Polish | Multilingual |
| ru | Russian | Multilingual |
| nl | Dutch | Multilingual |
Model Selection: The API automatically selects the appropriate model based on the language. English uses voiceai-tts-v1-latest (non-multilingual), while all other languages use voiceai-tts-multilingual-v1-latest. You can override this by explicitly specifying the model parameter.
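The selection rule above can be mirrored client-side when you want to log or pin the model explicitly. A one-line sketch of the documented behavior (select_model is an illustrative helper name, not part of the API):

```python
def select_model(language: str = 'en') -> str:
    """Mirror the API's default model choice for a given ISO 639-1 language code."""
    return 'voiceai-tts-v1-latest' if language == 'en' else 'voiceai-tts-multilingual-v1-latest'
```

For example, select_model('fr') returns 'voiceai-tts-multilingual-v1-latest', matching what the API would pick automatically; passing that value in the model parameter makes the choice explicit in your request.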