Complete reference for all agent configuration parameters available in the Voice AI API.
Overview
Agent configuration consists of:
- Root-level parameters: name and kb_id are set at the root level of create/update requests
- config object: Contains conversation style, voice settings, timing controls, and integrations
All configuration parameters are optional, except name when creating an agent. The system applies the documented default for any omitted field.
Status Management: Agent status (paused, deployed, disabled) cannot be updated directly through the update endpoint. Use the dedicated endpoints: Deploy Agent, Pause Agent, or Delete Agent.
Agent-Level Parameters
These parameters are set at the root level of agent create/update requests, not inside the config object:
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes (create), No (update) | Agent name. Must be at least 1 character. |
| kb_id | number | No | Knowledge base ID to assign to the agent. Set to an existing knowledge base ID to assign it, or null to remove the knowledge base. See Knowledge Base Management. |
Configuration Structure
When creating or updating an agent, the request structure is:
{
  "name": "My Agent",
  "config": {
    "prompt": "You are a helpful assistant...",
    "greeting": "Hello! How can I help you?",
    "llm_temperature": 0.7,
    "llm_model": "gemini-2.5-flash-lite",
    "tts_min_sentence_len": 20,
    "tts_params": {
      "voice_id": "abc",
      "temperature": 1.0
    },
    "allow_interruptions": true,
    "min_silence_duration": 0.55,
    "phone_number": "+14155551234",
    "mcp_servers": [...]
  },
  "kb_id": 123
}
The config object contains all behavioral parameters. The kb_id parameter is set at the root level to assign a knowledge base to the agent.
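The split between root-level fields and the config object can be sketched in Python. This is a minimal illustration only: build_create_request is a hypothetical client-side helper, not part of the Voice AI API, and it only assembles the request body without sending it.

```python
import json
from typing import Any, Optional


def build_create_request(
    name: str,
    config: Optional[dict[str, Any]] = None,
    kb_id: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a create-agent request body (hypothetical helper)."""
    if len(name) < 1:
        raise ValueError("name must be at least 1 character")
    body: dict[str, Any] = {"name": name}
    if config:
        # Behavioral parameters live inside the nested config object.
        body["config"] = dict(config)
    if kb_id is not None:
        # kb_id sits at the root level, next to name.
        body["kb_id"] = kb_id
    return body


payload = build_create_request(
    "My Agent",
    config={"prompt": "You are a helpful assistant...", "llm_temperature": 0.7},
    kb_id=123,
)
print(json.dumps(payload, indent=2))
```

Omitted config fields are simply left out of the body, so the server-side defaults described in this reference apply.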
Core Conversation Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | null | Custom instructions that define the agent’s personality, role, and behavior. This is the system prompt sent to the LLM. |
| greeting | string | null | Initial message the agent speaks when a call starts. If not provided, the agent will start with a default greeting. |
| llm_temperature | number | 0.7 | Controls the creativity/randomness of the LLM responses. Range: 0.0-2.0. Lower values (0.0-0.5) = more focused and deterministic. Higher values (1.0-2.0) = more creative and varied. |
| llm_model | string | "gemini-2.5-flash-lite" | The language model to use. Options: "gpt-4o-mini", "gemini-2.5-flash", "gemini-2.5-flash-lite", "gpt-4o", "gemini-2.5-pro". |
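A client can check these values before sending a request. The sketch below assumes a hypothetical helper, check_core_params, that applies the range and model options from the table above; how the API itself reports invalid values is not covered here.

```python
ALLOWED_LLM_MODELS = {
    "gpt-4o-mini",
    "gemini-2.5-flash",
    "gemini-2.5-flash-lite",
    "gpt-4o",
    "gemini-2.5-pro",
}


def check_core_params(config: dict) -> list[str]:
    """Return a list of problems found in the core conversation parameters."""
    problems = []
    # Fall back to the documented defaults when a field is omitted.
    temperature = config.get("llm_temperature", 0.7)
    if not 0.0 <= temperature <= 2.0:
        problems.append(f"llm_temperature {temperature} is outside 0.0-2.0")
    model = config.get("llm_model", "gemini-2.5-flash-lite")
    if model not in ALLOWED_LLM_MODELS:
        problems.append(f"llm_model {model!r} is not a listed option")
    return problems


print(check_core_params({"llm_temperature": 2.5, "llm_model": "gpt-4o"}))
```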
Voice & Speech Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| tts_min_sentence_len | number | 20 | Minimum sentence length (in characters) before TTS starts streaming audio. Lower values = faster response time but more choppy audio. Higher values = smoother audio but longer wait time. |
| tts_params | object | See TTS Parameters | Nested object containing voice and TTS generation settings. |
TTS Parameters
The tts_params object contains voice-specific settings:
| Parameter | Type | Default | Description |
|---|---|---|---|
| voice_id | string | null | ID of the voice to use. Get available voices from List Voices endpoint. If not provided, uses default voice. |
| model | string | null | TTS model to use. Supported models: voiceai-tts-v1-latest, voiceai-tts-v1-2025-12-19 (English only), voiceai-tts-multilingual-v1-latest, voiceai-tts-multilingual-v1-2025-01-14 (multilingual). If not provided, automatically selected based on language. |
| language | string | null | Language code (ISO 639-1 format) or "auto" for ASR-detected language. Supported languages: en, ca, sv, es, fr, de, it, pt, pl, ru, nl. Use "auto" to automatically detect language from speech recognition (requires multilingual model). Defaults to "en" at runtime if not set. |
| temperature | number | 1.0 | Sampling temperature (0.0-2.0). Higher values make output more random. Controls voice variation and expressiveness. |
| top_p | number | 0.8 | Nucleus sampling parameter (0.0-1.0). Controls diversity of output. |
Automatic Language Detection: Set language: "auto" to automatically detect the user’s language from speech recognition. This requires a multilingual model (e.g., voiceai-tts-multilingual-v1-latest). The agent will match the TTS language to the detected speech language in real-time.
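The constraints on tts_params can be expressed as a small client-side check. This is a sketch under assumptions: check_tts_params is a hypothetical helper, and it only flags "auto" when an explicitly chosen model is not one of the multilingual models listed above (an omitted model is left alone because the documentation says it is selected automatically).

```python
SUPPORTED_LANGUAGES = {"en", "ca", "sv", "es", "fr", "de", "it", "pt", "pl", "ru", "nl"}
MULTILINGUAL_MODELS = {
    "voiceai-tts-multilingual-v1-latest",
    "voiceai-tts-multilingual-v1-2025-01-14",
}


def check_tts_params(tts_params: dict) -> list[str]:
    """Return a list of problems found in a tts_params object."""
    problems = []
    language = tts_params.get("language")
    model = tts_params.get("model")
    if language is not None and language != "auto" and language not in SUPPORTED_LANGUAGES:
        problems.append(f"language {language!r} is not supported")
    # Assumption: only an explicit non-multilingual model conflicts with "auto".
    if language == "auto" and model is not None and model not in MULTILINGUAL_MODELS:
        problems.append('language "auto" requires a multilingual model')
    temperature = tts_params.get("temperature", 1.0)
    if not 0.0 <= temperature <= 2.0:
        problems.append(f"temperature {temperature} is outside 0.0-2.0")
    top_p = tts_params.get("top_p", 0.8)
    if not 0.0 <= top_p <= 1.0:
        problems.append(f"top_p {top_p} is outside 0.0-1.0")
    return problems


print(check_tts_params({"model": "voiceai-tts-v1-latest", "language": "auto"}))
```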
Interruption & Turn-Taking
| Parameter | Type | Default | Description |
|---|---|---|---|
| allow_interruptions | boolean | true | Whether users can interrupt the agent while it’s speaking. When false, users must wait for the agent to finish. |
| allow_interruptions_on_greeting | boolean | false | Whether to allow interruptions during the greeting message. Useful for impatient callers. |
| min_interruption_words | number | 1 | Minimum number of words required for an interruption to be recognized. Range: 0-10. 0 = any speech triggers interruption, 1+ = requires that many words. |
| auto_noise_reduction | boolean | true | Enable automatic noise reduction based on environment detection. Improves speech recognition in noisy environments. |
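One way to read how these three interruption settings combine is the mental model below. It is not the platform's actual implementation: should_interrupt is a hypothetical function, and the assumption that allow_interruptions: false also blocks interruptions during the greeting is mine.

```python
def should_interrupt(
    partial_transcript: str,
    config: dict,
    during_greeting: bool = False,
) -> bool:
    """Decide whether detected user speech would count as an interruption."""
    if not config.get("allow_interruptions", True):
        return False
    if during_greeting and not config.get("allow_interruptions_on_greeting", False):
        return False
    min_words = config.get("min_interruption_words", 1)
    if min_words == 0:
        # 0 means any detected speech counts, even before words are recognized.
        return True
    return len(partial_transcript.split()) >= min_words


print(should_interrupt("hold on", {"min_interruption_words": 2}))
```

Under this reading, raising min_interruption_words filters out short backchannel sounds such as "mm" or "yeah" at the cost of slower reaction to real interruptions.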
Timing & Endpointing
These parameters control when the system detects speech start/end and manages conversation flow:
| Parameter | Type | Default | Description |
|---|---|---|---|
| min_silence_duration | number | 0.55 | Minimum duration of silence (seconds) before considering speech has ended. Lower = more responsive but may cut off slow speakers. |
| min_speech_duration | number | 0.5 | Minimum duration of speech (seconds) required to trigger an interruption. Prevents false interruptions from brief sounds. |
| min_endpointing_delay | number | 0.5 | Minimum delay (seconds) before considering speech has ended. Works with min_silence_duration for endpointing. |
| max_endpointing_delay | number | 3.0 | Maximum delay (seconds) before forcing speech to end. Prevents long pauses from blocking conversation. |
| vad_activation_threshold | number | 0.5 | Voice activity detection threshold. Range: 0.0-1.0. Lower = more sensitive (detects quieter speech), higher = less sensitive (requires clearer speech). |
| user_silence_timeout | number | 10.0 | Seconds of user silence before agent prompts to check if user is still there. Helps detect dropped calls or disengaged users. |
| max_call_duration_seconds | number | null | Maximum call duration in seconds. null = no limit. Useful for limiting costs or enforcing time constraints. |
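Because the timing values interact, it can help to sanity-check them together before updating an agent. The sketch below uses a hypothetical helper, check_timing. The rules that durations must be non-negative and that min_endpointing_delay should not exceed max_endpointing_delay are assumptions inferred from the parameter descriptions, not documented API constraints.

```python
TIMING_DEFAULTS = {
    "min_silence_duration": 0.55,
    "min_speech_duration": 0.5,
    "min_endpointing_delay": 0.5,
    "max_endpointing_delay": 3.0,
    "vad_activation_threshold": 0.5,
    "user_silence_timeout": 10.0,
}


def check_timing(config: dict) -> list[str]:
    """Return a list of problems found in the timing and endpointing settings."""
    # Overlay the supplied values on the documented defaults.
    merged = {**TIMING_DEFAULTS, **{k: v for k, v in config.items() if k in TIMING_DEFAULTS}}
    problems = []
    for key, value in merged.items():
        if value < 0:
            problems.append(f"{key} must not be negative")
    if not 0.0 <= merged["vad_activation_threshold"] <= 1.0:
        problems.append("vad_activation_threshold is outside 0.0-1.0")
    # Assumed consistency rule: the minimum delay should not exceed the maximum.
    if merged["min_endpointing_delay"] > merged["max_endpointing_delay"]:
        problems.append("min_endpointing_delay exceeds max_endpointing_delay")
    max_call = config.get("max_call_duration_seconds")
    if max_call is not None and max_call <= 0:
        problems.append("max_call_duration_seconds must be positive or null")
    return problems


print(check_timing({"min_endpointing_delay": 4.0}))
```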
Agent Control Permissions
| Parameter | Type | Default | Description |
|---|---|---|---|
| allow_agent_to_end_call | boolean | false | Whether the agent can end calls via tool calls or timeout. When true, agent can proactively end conversations. |
| allow_agent_to_skip_turn | boolean | false | Whether the agent can skip turns and yield conversation control. Useful for multi-party scenarios. |
Phone Number
| Parameter | Type | Default | Description |
|---|---|---|---|
| phone_number | string | null | Assigned phone number in E.164 format (e.g., "+14155551234"). Must be a number from your available phone numbers. See Phone Number Management. |
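A quick format check can catch malformed numbers before a request is sent. This sketch assumes a hypothetical helper, is_e164, using the commonly used E.164 pattern: a plus sign, a first digit from 1-9, and at most 15 digits in total. It checks the format only and cannot tell whether the number belongs to your account.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(phone_number: str) -> bool:
    """Return True if the string looks like an E.164 phone number."""
    return E164_PATTERN.fullmatch(phone_number) is not None


print(is_e164("+14155551234"))
print(is_e164("(415) 555-1234"))
```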
Integrations
MCP Servers
| Parameter | Type | Default | Description |
|---|---|---|---|
| mcp_servers | array | null | List of Model Context Protocol (MCP) server configurations. Allows agents to connect to external tools and data sources. See Model Context Protocol for details. |
Each MCP server configuration includes:
- name (string, required) - Human-readable name
- description (string, optional) - Server description
- url (string, required) - MCP server endpoint URL
- auth_type (string, optional) - Authentication type: "none", "bearer_token", "api_key", "custom_headers". Default: "none"
- auth_token (string, optional) - Token for authentication
- headers (object, optional) - Custom HTTP headers
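The field list above maps naturally onto a small data class. The sketch below is illustrative only: McpServer and its to_dict method are hypothetical client-side names, and dropping unset optional fields from the output is a design choice of this example rather than an API requirement.

```python
from dataclasses import asdict, dataclass
from typing import Optional

AUTH_TYPES = {"none", "bearer_token", "api_key", "custom_headers"}


@dataclass
class McpServer:
    """Client-side model of one entry in mcp_servers (hypothetical)."""

    name: str
    url: str
    description: Optional[str] = None
    auth_type: str = "none"
    auth_token: Optional[str] = None
    headers: Optional[dict] = None

    def __post_init__(self) -> None:
        if not self.name or not self.url:
            raise ValueError("name and url are required")
        if self.auth_type not in AUTH_TYPES:
            raise ValueError(f"unknown auth_type: {self.auth_type!r}")

    def to_dict(self) -> dict:
        # Leave out optional fields that were never set.
        return {key: value for key, value in asdict(self).items() if value is not None}


server = McpServer(
    name="Database Server",
    url="https://api.example.com/mcp",
    auth_type="bearer_token",
    auth_token="your-token",
)
print(server.to_dict())
```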
Examples
Basic Configuration
{
  "prompt": "You are a helpful customer support agent.",
  "greeting": "Thank you for calling. How can I help you?",
  "llm_model": "gemini-2.5-flash-lite"
}
Advanced Configuration
{
  "prompt": "You are a technical support specialist. Be concise and professional.",
  "greeting": "Hello, this is technical support. What issue can I help you with?",
  "llm_temperature": 0.5,
  "llm_model": "gemini-2.5-flash",
  "tts_params": {
    "voice_id": "custom_voice_123",
    "model": "voiceai-tts-multilingual-v1-latest",
    "language": "en",
    "temperature": 0.9
  },
  "allow_interruptions": true,
  "min_interruption_words": 2,
  "min_silence_duration": 0.6,
  "vad_activation_threshold": 0.6,
  "max_call_duration_seconds": 1800,
  "phone_number": "+14155551234",
  "mcp_servers": [
    {
      "name": "Database Server",
      "url": "https://api.example.com/mcp",
      "auth_type": "bearer_token",
      "auth_token": "your-token"
    }
  ]
}
Multilingual Agent with Auto Language Detection
{
  "prompt": "You are a multilingual customer support agent. Respond in the same language as the customer.",
  "greeting": "Hello! How can I help you today?",
  "llm_model": "gemini-2.5-flash-lite",
  "tts_params": {
    "model": "voiceai-tts-multilingual-v1-latest",
    "language": "auto"
  },
  "allow_interruptions": true
}
This configuration automatically detects the user’s language from speech recognition and matches the TTS output language accordingly.
Configuration for Fast-Paced Conversations
{
  "prompt": "You are a quick and efficient assistant.",
  "llm_temperature": 0.3,
  "allow_interruptions": true,
  "min_interruption_words": 0,
  "min_silence_duration": 0.4,
  "min_endpointing_delay": 0.3,
  "max_endpointing_delay": 2.0,
  "vad_activation_threshold": 0.4
}
Configuration for Formal Conversations
{
  "prompt": "You are a professional business assistant. Speak formally and wait for complete thoughts.",
  "allow_interruptions": false,
  "min_silence_duration": 0.8,
  "min_endpointing_delay": 0.7,
  "max_endpointing_delay": 4.0,
  "vad_activation_threshold": 0.6
}
Best Practices
- Start Simple: Begin with just prompt and greeting, then add parameters as needed
- Test Incrementally: Adjust one parameter at a time to understand its effect
- Timing Parameters: Fine-tune min_silence_duration and vad_activation_threshold based on your use case
- Interruptions: Set allow_interruptions: false for formal scenarios, true for casual conversations
- Voice Selection: Use voice_id in tts_params to select a specific voice from your available voices
- Call Duration: Set max_call_duration_seconds to prevent runaway costs or enforce time limits
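To support the "adjust one parameter at a time" practice, a small diff of two config objects can confirm that a trial differs from the baseline in exactly one setting. This is a sketch only: changed_keys is a hypothetical helper, and it treats a missing key and an explicit null as the same value.

```python
def changed_keys(old: dict, new: dict, prefix: str = "") -> list[str]:
    """List the dotted paths of settings that differ between two configs."""
    changed = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        before, after = old.get(key), new.get(key)
        if isinstance(before, dict) and isinstance(after, dict):
            # Recurse into nested objects such as tts_params.
            changed.extend(changed_keys(before, after, prefix=f"{path}."))
        elif before != after:
            changed.append(path)
    return changed


baseline = {
    "prompt": "You are a helpful customer support agent.",
    "min_silence_duration": 0.55,
    "tts_params": {"temperature": 1.0},
}
trial = {**baseline, "min_silence_duration": 0.6}
print(changed_keys(baseline, trial))
```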