Stream audio in real time using HTTP chunked transfer encoding. Audio chunks are sent as they’re generated, reducing latency for conversational AI and real-time applications.

How It Works

The streaming endpoint (/api/v1/tts/speech/stream) uses HTTP chunked transfer encoding:
  • Audio arrives incrementally, reducing time-to-first-audio (see the timing sketch below)
  • Start playing audio before generation completes
  • Lower memory usage (no need to buffer the entire file)
  • Better UX for real-time applications
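As a rough illustration of the time-to-first-audio benefit, the sketch below measures how long the first chunk takes to arrive compared with the full response. It uses the requests library and mirrors the request shown in the Examples section; treat the timing code as an illustrative sketch, not part of the API.

import time
import requests

start = time.monotonic()
response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech/stream',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={'text': 'Measuring time-to-first-audio.'},
    stream=True,
    timeout=30
)
response.raise_for_status()

first_chunk_at = None
total_bytes = 0
for chunk in response.iter_content(chunk_size=4096):
    if not chunk:
        continue
    if first_chunk_at is None:
        first_chunk_at = time.monotonic() - start  # latency until audio starts arriving
    total_bytes += len(chunk)

if first_chunk_at is not None:
    print(f'First chunk after {first_chunk_at:.2f}s; '
          f'{total_bytes} bytes total in {time.monotonic() - start:.2f}s')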

Examples

import requests

# Using default voice (voice_id is optional)
response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech/stream',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={'text': 'This is a test of streaming audio generation.'},
    stream=True
)

# Or with a custom voice_id:
# json={'voice_id': 'your-voice-id-here', 'text': 'This is a test of streaming audio generation.'}

# Stream the audio to disk as chunks arrive
with open('output.mp3', 'wb') as f:
    for chunk in response.iter_content(chunk_size=4096):  # read in 4 KB chunks rather than byte by byte
        if chunk:
            f.write(chunk)
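To take advantage of streaming for playback rather than saving to a file, one option is to pipe chunks into an external audio player as they arrive. The sketch below assumes ffplay (part of FFmpeg) is installed and on your PATH; the player choice and its flags are illustrative assumptions, not part of this API.

import subprocess
import requests

response = requests.post(
    'https://dev.voice.ai/api/v1/tts/speech/stream',
    headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
    json={'text': 'Playing audio while it is still being generated.'},
    stream=True,
    timeout=30
)
response.raise_for_status()

# ffplay reads MP3 from stdin and starts playing as soon as data arrives
# (assumes FFmpeg is installed; any player that accepts stdin works similarly)
player = subprocess.Popen(
    ['ffplay', '-autoexit', '-nodisp', '-loglevel', 'quiet', '-'],
    stdin=subprocess.PIPE
)

for chunk in response.iter_content(chunk_size=4096):
    if chunk:
        player.stdin.write(chunk)

player.stdin.close()
player.wait()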

When to Use Streaming

Use streaming for:
  • Low latency requirements
  • Real-time applications (conversational AI, live narration)
  • Large text inputs
  • Memory-constrained environments
Use non-streaming for:
  • Batch processing
  • Simpler code requirements
  • Small text inputs

Best Practices

  • Handle network errors gracefully (the sketch after this list shows one approach)
  • Start playing audio chunks as soon as they arrive
  • Implement timeout handling for long streams
  • Prefer MP3 for efficiency; PCM for highest quality
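A minimal sketch of the error-handling and timeout practices above, using the requests library: a connect/read timeout on the request and explicit handling of network failures. The timeout values are illustrative assumptions.

import requests

try:
    response = requests.post(
        'https://dev.voice.ai/api/v1/tts/speech/stream',
        headers={'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json'},
        json={'text': 'Robust streaming with timeouts and error handling.'},
        stream=True,
        timeout=(5, 60)  # 5 s to connect; 60 s max wait between chunks while streaming
    )
    response.raise_for_status()

    with open('output.mp3', 'wb') as f:
        for chunk in response.iter_content(chunk_size=4096):
            if chunk:
                f.write(chunk)
except requests.exceptions.Timeout:
    print('Stream timed out; retry or fall back to non-streaming generation.')
except requests.exceptions.RequestException as exc:
    print(f'Streaming request failed: {exc}')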
See the API Reference for complete documentation.