Generate speech from text. If voice_id is provided, the endpoint uses that voice; otherwise it uses the default built-in voice. Returns the complete audio file. Synchronous endpoint - blocks until generation completes.
Bearer token authentication. Use your API key as the bearer token. Format: Authorization: Bearer <your API key>
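The header format described above can be sketched in Python as follows. This is a minimal illustration, not official client code: the helper name `build_auth_headers` is hypothetical, and the whitespace check is an added precaution rather than a documented requirement.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return headers that carry the API key as a bearer token.

    Hypothetical helper for illustration; only the
    "Authorization: Bearer ..." format comes from the reference.
    """
    # Precaution (an assumption, not a documented rule): reject empty keys
    # and keys with stray whitespace, which would produce a malformed header.
    if not api_key or api_key != api_key.strip():
        raise ValueError("api_key must be non-empty with no surrounding whitespace")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is in use.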
The text to generate speech for
Voice ID to use. Omit to use the default built-in voice.
Audio output format (32kHz sample rate)
mp3, wav, pcm, alaw_8000, mp3_22050_32, mp3_24000_48, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192, opus_48000_32, opus_48000_64, opus_48000_96, opus_48000_128, opus_48000_192, pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_32000, pcm_44100, pcm_48000, ulaw_8000, wav_16000, wav_22050, wav_24000
Sampling temperature (0.0-2.0)
0 <= x <= 2
Nucleus sampling parameter (0.0-1.0)
0 <= x <= 1
TTS model to use. If not provided, automatically selected based on language. English uses non-multilingual models; other languages use multilingual models.
voiceai-tts-v1-latest, voiceai-tts-v1-2026-02-10, voiceai-tts-multilingual-v1-latest, voiceai-tts-multilingual-v1-2026-02-10
Language code (ISO 639-1 format)
en, ca, sv, es, fr, de, it, pt, pl, ru, nl
Successful Response - Returns binary audio file (32kHz sample rate)
MP3 audio file (32kHz sample rate, compressed)
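The parameter constraints in this reference can be checked client-side before sending a request. The sketch below makes several assumptions, because the source names only `voice_id`:

- The field names `text`, `audio_format`, `temperature`, `top_p`, `model` and `language` are hypothetical.
- The URL is a placeholder.
- The POST method and JSON body are guesses.
- Treating empty text as an error is an assumption.

The numeric ranges, model names and language codes are copied from the lists in this reference.

```python
import json
import urllib.request

# Hypothetical placeholder: the reference does not state the endpoint URL.
TTS_URL = "https://api.example.com/tts"

# Values copied from the option lists in this reference.
MODELS = {
    "voiceai-tts-v1-latest",
    "voiceai-tts-v1-2026-02-10",
    "voiceai-tts-multilingual-v1-latest",
    "voiceai-tts-multilingual-v1-2026-02-10",
}
LANGUAGES = {"en", "ca", "sv", "es", "fr", "de", "it", "pt", "pl", "ru", "nl"}


def build_tts_payload(text, voice_id=None, audio_format=None,
                      temperature=None, top_p=None, model=None, language=None):
    """Validate arguments against the documented ranges and build a JSON-ready dict.

    All field names except voice_id are assumptions made for illustration.
    """
    if not text:
        raise ValueError("text must be a non-empty string")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must satisfy 0 <= x <= 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must satisfy 0 <= x <= 1")
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language}")

    payload = {"text": text}
    optional = {
        "voice_id": voice_id,          # omitted -> default built-in voice
        "audio_format": audio_format,  # passed through without validation here
        "temperature": temperature,
        "top_p": top_p,
        "model": model,                # omitted -> selected from the language
        "language": language,
    }
    # Send only the fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def build_tts_request(payload, api_key, url=TTS_URL):
    """Build (but do not send) the HTTP request. POST + JSON is an assumption."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because the endpoint is synchronous and returns the complete binary audio file, a caller would typically send the request with `urllib.request.urlopen`, read the whole response body, and write those bytes to a file opened in binary mode.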