What is the Audio Tab?

The Audio Tab is where you configure how your agent listens and speaks. Set up language preferences, choose transcription providers for speech-to-text, and select voice synthesizers for natural-sounding responses.
[Image: Audio Tab showing language, speech-to-text, and text-to-speech configuration]

Configuration Options

Configure Language

Set your agent’s primary language and enable multilingual support.
[Image: Configure Language section with language dropdown and Auto Language Switch toggle]

Language Selection

Choose primary language (English, Hindi, Spanish, etc.)

Auto Language Switch

Automatically detect and switch languages during calls
Enable Auto Language Switch for multilingual support. Your agent will detect the caller’s language and respond accordingly.

Speech-to-Text (Transcription)

Configure how your agent converts spoken words into text.
[Image: Speech-to-Text section showing Provider dropdown, Model selection, and Keywords field]
1. Select Provider

Choose your transcription provider (e.g., Deepgram, Azure).

2. Select Model

Pick the model (e.g., nova-3 for best accuracy).

3. Add Keywords (Optional)

Boost recognition of specific terms like names or brand words.
Keywords improve accuracy! Add names, brand terms, or technical words, each with a boost value, in the format word:boost_value (e.g., Bruce:100).
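
If you keep your keyword list in a script or spreadsheet before pasting it into the Keywords field, a small helper can produce entries in the word:boost_value format described above. This is only a sketch: the function names are hypothetical, and the rule that a boost must be a positive whole number is an assumption, not something this page specifies.

```python
from typing import Dict, List, Tuple


def format_keywords(boosts: Dict[str, int]) -> List[str]:
    """Turn {"Bruce": 100} into ["Bruce:100"] (the word:boost_value format)."""
    entries = []
    for word, boost in boosts.items():
        word = word.strip()
        # A colon inside the word would make the entry ambiguous.
        if not word or ":" in word:
            raise ValueError(f"invalid keyword: {word!r}")
        # Assumption: boost values are positive whole numbers.
        if boost <= 0:
            raise ValueError(f"boost for {word!r} must be positive")
        entries.append(f"{word}:{boost}")
    return entries


def parse_keyword(entry: str) -> Tuple[str, int]:
    """Split "Bruce:100" back into ("Bruce", 100)."""
    word, sep, boost = entry.rpartition(":")
    if not sep or not word.strip():
        raise ValueError(f"expected word:boost_value, got {entry!r}")
    return word.strip(), int(boost)


if __name__ == "__main__":
    for entry in format_keywords({"Bruce": 100, "Acme": 50}):
        print(entry)
```

Each printed line is one entry to add to the Keywords field. How multiple entries are separated in the field is not covered here, so check the field itself before pasting several at once.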

Text-to-Speech (Voice)

Configure how your agent sounds with voice synthesis settings.
[Image: Text-to-Speech section showing Provider, Model, Voice selection, and voice tuning sliders]
1. Select Provider

Choose your voice synthesis provider (e.g., ElevenLabs, Azure).

2. Select Model

Pick the model (e.g., eleven_turbo_v2_5 for low latency).

3. Choose Voice

Select a specific voice. Click ▶️ to preview it!
Click “Add voices” to import or clone custom voices for a unique brand experience.
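
Before working through the tab, it can help to write down the choices you intend to make in the Language, Speech-to-Text, and Text-to-Speech sections. The sketch below records them as a plain Python object and reports anything still undecided. The class and field names are hypothetical and exist only for planning; they are not an API or export format of this product. The defaults reuse the examples mentioned on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioSettings:
    """Planning record for the Audio Tab (hypothetical names, not a real API)."""
    language: str = "English"
    auto_language_switch: bool = False
    stt_provider: str = "Deepgram"
    stt_model: str = "nova-3"
    tts_provider: str = "ElevenLabs"
    tts_model: str = "eleven_turbo_v2_5"
    voice: str = ""  # left empty until a voice has been previewed and chosen


def missing_choices(settings: AudioSettings) -> List[str]:
    """Return the names of text fields that are still empty."""
    return [name for name, value in vars(settings).items() if value == ""]


if __name__ == "__main__":
    plan = AudioSettings(auto_language_switch=True)
    print(missing_choices(plan))
```

An empty result means every dropdown in the tab has a planned value; any names returned are selections you still need to make.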

Voice Tuning

Fine-tune your agent’s voice quality with these settings.
| Setting | Description | Recommended |
| --- | --- | --- |
| Buffer Size | Audio buffering before playback | 200 (balance of quality and speed) |
| Speed Rate | Speaking speed | 1.0 (natural pace) |
| Similarity Boost | Voice matching accuracy | 0.75 (close to original voice) |
| Stability | Voice consistency | 0.5 (balanced expression) |
| Style Exaggeration | Voice characteristics | 0 (neutral; increase for a more expressive voice) |
Balance is key! A larger buffer size improves audio quality but increases latency. Test different settings to find the right balance for your use case.
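
When you experiment with these settings, it helps to start from the recommended values and change one at a time. The sketch below stores the recommended values from the table as defaults and flags values that look out of range. The class and field names are hypothetical, and the 0 to 1 range checked for Similarity Boost, Stability, and Style Exaggeration is an assumption; the sliders in the tab are the authority on valid ranges.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceTuning:
    """Recommended starting values from the Voice Tuning table."""
    buffer_size: int = 200
    speed_rate: float = 1.0
    similarity_boost: float = 0.75
    stability: float = 0.5
    style_exaggeration: float = 0.0

    def problems(self) -> List[str]:
        """List settings that fall outside the assumed valid ranges."""
        issues = []
        if self.buffer_size <= 0:
            issues.append("buffer_size must be positive")
        if self.speed_rate <= 0:
            issues.append("speed_rate must be positive")
        # Assumption: these three settings use a 0 to 1 scale.
        for name in ("similarity_boost", "stability", "style_exaggeration"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                issues.append(f"{name} should be between 0 and 1")
        return issues


if __name__ == "__main__":
    # Try a more expressive voice while keeping everything else at the default.
    trial = VoiceTuning(style_exaggeration=0.3)
    print(trial, trial.problems())
```

Keeping each trial as its own `VoiceTuning` value makes it easy to note which combination sounded best before entering it in the tab.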

Next Steps