Generate speech from text. If voice_id is provided, the endpoint uses that voice; otherwise it uses the default built-in voice. Returns the complete audio file. This endpoint is synchronous: the request blocks until generation completes.
Bearer token authentication. Use your API key as the bearer token. Format: Authorization: Bearer
The text to generate speech for
Voice ID to use. Omit to use the default built-in voice.
Audio output format (32kHz sample rate)
Allowed values: mp3, wav, pcm
Sampling temperature (0.0-2.0)
Range: 0 <= x <= 2
Nucleus sampling parameter (0.0-1.0)
Range: 0 <= x <= 1
TTS model to use. If not provided, automatically selected based on language. English uses non-multilingual models; other languages use multilingual models.
Allowed values: voiceai-tts-v1-latest, voiceai-tts-v1-2025-12-19, voiceai-tts-multilingual-v1-latest, voiceai-tts-multilingual-v1-2025-01-14
Language code (ISO 639-1 format)
Allowed values: en, ca, sv, es, fr, de, it, pt, pl, ru, nl
Successful Response - Returns binary audio file (32kHz sample rate)
MP3 audio file (32kHz sample rate, compressed)
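
The parameters above could be assembled into a request roughly as follows. This is a minimal sketch, not the official client. The endpoint URL, the POST method, the JSON body encoding, and every field name except voice_id (text, audio_format, temperature, top_p, model, language) are assumptions made for illustration, since this page does not state them. Only the bearer authentication scheme and the documented value ranges come from the reference itself.

```python
import json
import urllib.request

# Hypothetical endpoint URL: the reference does not state the real path.
TTS_URL = "https://api.example.com/tts/speech"

# Documented output formats.
ALLOWED_FORMATS = {"mp3", "wav", "pcm"}


def build_tts_request(api_key, text, voice_id=None, audio_format=None,
                      temperature=None, top_p=None, model=None, language=None):
    """Build (but do not send) a request for the speech endpoint.

    All field names except voice_id are assumed, as are POST and JSON.
    """
    if audio_format is not None and audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"audio_format must be one of {sorted(ALLOWED_FORMATS)}")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be within 0.0-1.0")

    payload = {"text": text}
    optional = {
        "voice_id": voice_id,          # omitted -> default built-in voice
        "audio_format": audio_format,  # assumed field name
        "temperature": temperature,    # assumed field name
        "top_p": top_p,                # assumed field name
        "model": model,                # omitted -> selected from language
        "language": language,          # ISO 639-1 code, e.g. "en"
    }
    # Send only the parameters the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen and calling read() on the response would yield the binary audio bytes, which can be written straight to a file. Because the endpoint is synchronous, that call would not return until generation completes, so a generous timeout is advisable for long texts.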