Generate speech from text using the Voice.ai TTS API.
Prerequisites: API key
1. Get a Voice ID (Optional)

Optionally get a voice_id from your dashboard or clone a voice. Skip this to use the default voice.

2. Generate Speech

Generate speech from text. Include voice_id if you have one, or omit it to use the default voice. See the Generate Speech endpoint for details.
import requests

# Using default voice (voice_id is optional)
response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={
        'text': 'Hello! This is a test of the Voice.ai TTS API.',
        'model': 'voiceai-tts-v1-latest',  # Optional, defaults to voiceai-tts-v1-latest
        'language': 'en'  # Optional, defaults to 'en'
    }
)

# Or with a custom voice_id:
# json={'voice_id': 'your-voice-id-here', 'text': 'Hello! This is a test of the Voice.ai TTS API.', 'model': 'voiceai-tts-v1-latest', 'language': 'en'}

with open('output.mp3', 'wb') as f:
    f.write(response.content)
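Since voice_id is optional, the request body can be assembled by a small helper that omits the key when no voice is given. This is a minimal sketch; build_tts_payload is an illustrative name, not part of the API:

```python
def build_tts_payload(text, voice_id=None,
                      model='voiceai-tts-v1-latest', language='en'):
    """Assemble the JSON body for a TTS request.

    Omits voice_id entirely when none is given, so the API
    falls back to the default voice.
    """
    payload = {'text': text, 'model': model, 'language': language}
    if voice_id is not None:
        payload['voice_id'] = voice_id
    return payload
```

Pass the result as the json= argument to requests.post.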

Streaming

For the lowest latency, use the WebSocket endpoint for conversational AI or multiple sequential requests.

HTTP Streaming (Simple)

For simple request/response streaming, use the HTTP streaming endpoint:
import requests

# Using default voice (voice_id is optional)
response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech/stream',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={
        'text': 'This text will be streamed in chunks.',
        'model': 'voiceai-tts-v1-latest',  # Optional, defaults to voiceai-tts-v1-latest
        'language': 'en'  # Optional, defaults to 'en'
    },
    stream=True
)

# Or with a custom voice_id:
# json={'voice_id': 'your-voice-id-here', 'text': 'This text will be streamed in chunks.', 'model': 'voiceai-tts-v1-latest', 'language': 'en'}

with open('output.mp3', 'wb') as f:
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:  # filter out keep-alive chunks
            f.write(chunk)
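The loop above skips empty (keep-alive) chunks before writing. Pulling that logic into a function makes it testable with any iterable of byte chunks. A minimal sketch; save_chunks is an illustrative helper, not part of the API:

```python
def save_chunks(chunks, path):
    """Write non-empty byte chunks to path; return total bytes written."""
    total = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip keep-alive/empty chunks
                f.write(chunk)
                total += len(chunk)
    return total
```

With a live response, this would be called as save_chunks(response.iter_content(chunk_size=4096), 'output.mp3').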

WebSocket Streaming (Optimal for Conversational AI)

For lowest latency in multi-turn conversations, use WebSocket:
import asyncio
import json
import base64
import websockets

async def tts_conversation():
    url = "wss://dev.voice.ai/api/v1/tts/stream"
    headers = {"Authorization": "Bearer YOUR_API_KEY"}
    
    async with websockets.connect(url, additional_headers=headers) as ws:
        # First message
        await ws.send(json.dumps({
            "text": "Hello! How can I help you today?",
            "language": "en",
            "flush": True
        }))
        
        # Receive audio chunks
        while True:
            msg = await ws.recv()
            data = json.loads(msg)
            if data.get("audio"):
                audio_chunk = base64.b64decode(data["audio"])
                # Process/play audio chunk...
            elif data.get("is_last"):
                break
        
        # Subsequent messages:
        await ws.send(json.dumps({
            "text": "I can help you with that.",
            "flush": True
        }))
        
        # Receive audio...

asyncio.run(tts_conversation())
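The receive loop above decodes each server message inline. Based on the message shape shown (an optional base64 "audio" field and an "is_last" flag), that parsing step can be isolated into a pure function. A sketch under those assumptions; parse_tts_message is an illustrative name:

```python
import base64
import json

def parse_tts_message(raw):
    """Parse one WebSocket TTS message.

    Returns (audio_bytes_or_None, is_last): decoded audio when the
    message carries an "audio" field, and whether it is the final
    message for the current utterance.
    """
    data = json.loads(raw)
    audio = base64.b64decode(data['audio']) if data.get('audio') else None
    return audio, bool(data.get('is_last'))
```

Inside the loop, this replaces the manual json.loads/b64decode steps and keeps the playback code focused on audio handling.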
See the Streaming Guide for complete WebSocket documentation including multi-context streaming for multiple concurrent voices.

Supported Languages

The TTS API supports multiple languages. Specify the language parameter using ISO 639-1 language codes. If not provided, the API defaults to English (en).
Language Code | Language   | Model Type
en            | English    | Non-multilingual
ca            | Catalan    | Multilingual
sv            | Swedish    | Multilingual
es            | Spanish    | Multilingual
fr            | French     | Multilingual
de            | German     | Multilingual
it            | Italian    | Multilingual
pt            | Portuguese | Multilingual
pl            | Polish     | Multilingual
ru            | Russian    | Multilingual
nl            | Dutch      | Multilingual
Model Selection: The API automatically selects the appropriate model based on the language. English uses voiceai-tts-v1-latest (non-multilingual), while all other languages use voiceai-tts-multilingual-v1-latest. You can override this by explicitly specifying the model parameter.
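The documented selection rule can be written down as a small function, using the two model names stated above. This mirrors what the server does when model is omitted; select_model is an illustrative client-side helper, not part of the API:

```python
def select_model(language='en'):
    """Mirror the documented default model choice:

    English uses the non-multilingual model; all other supported
    languages use the multilingual one.
    """
    if language == 'en':
        return 'voiceai-tts-v1-latest'
    return 'voiceai-tts-multilingual-v1-latest'
```

Explicitly passing a model parameter in the request overrides this default.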