Configure Voice and Transcription Settings

What is the Audio Tab?

The Audio Tab is where you configure how your agent listens and speaks. Set up language preferences, choose transcription providers for speech-to-text, and select voice synthesizers for natural-sounding responses.

Figure: Audio Tab showing language, speech-to-text, and text-to-speech configuration

Configuration Options

Configure Language

Set your agent’s primary language and enable multilingual support.

Language Selection

Choose the agent's primary language (for example, English, Hindi, or Spanish).

Auto Language Switch

Automatically detect and switch languages during calls

Enable Auto Language Switch for multilingual support. Your agent will detect the caller’s language and respond accordingly.

Speech-to-Text (Transcription)

Configure how your agent converts spoken words into text.

Figure: Speech-to-Text section showing Provider dropdown, Model selection, and Keywords field

Select Provider

Choose your transcription provider (e.g., Deepgram, Azure).

Select Model

Pick the model (e.g., nova-3 for best accuracy).

Add Keywords (Optional)

Boost recognition of specific terms like names or brand words.

Keywords improve recognition accuracy. Add names, brand terms, or technical words, each with a boost value, in the format word:boost_value (e.g., Bruce:100).
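If you keep your keyword list in a script or spreadsheet before pasting it into the Keywords field, it can help to check the entries first. The following is a minimal sketch, not part of the platform: the helper names parse_keywords and format_keywords are hypothetical, and it assumes that entries are separated by commas and that boost values are whole numbers. Neither assumption is confirmed by this page.

```python
def parse_keywords(raw: str) -> dict[str, int]:
    """Parse 'word:boost_value' entries into a {word: boost} mapping.

    Assumptions (not confirmed by the docs): entries are comma-separated
    and boost values are integers.
    """
    keywords: dict[str, int] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        # Split on the last colon so the word itself may contain colons.
        word, sep, boost = entry.rpartition(":")
        word = word.strip()
        if not sep or not word:
            raise ValueError(f"expected word:boost_value, got {entry!r}")
        try:
            value = int(boost)
        except ValueError:
            raise ValueError(f"boost must be an integer in {entry!r}") from None
        keywords[word] = value
    return keywords


def format_keywords(keywords: dict[str, int]) -> str:
    """Render the mapping back into 'word:boost_value' entries."""
    return ", ".join(f"{word}:{boost}" for word, boost in keywords.items())


if __name__ == "__main__":
    parsed = parse_keywords("Bruce:100, Bolna:50")
    print(parsed)
    print(format_keywords(parsed))
```

Validating locally catches a missing colon or a non-numeric boost before the value reaches the form; how the field itself handles malformed input is not described here.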

Text-to-Speech (Voice)

Configure how your agent sounds with voice synthesis settings.

Figure: Text-to-Speech section showing Provider, Model, Voice selection, and voice tuning sliders

Select Provider

Choose your voice synthesis provider (e.g., ElevenLabs, Azure).

Select Model

Pick the model (e.g., eleven_turbo_v2_5 for low latency).

Choose Voice

Select a specific voice. Click ▶️ to preview!

Click “Add voices” to import or clone custom voices for a unique brand experience.

Voice Tuning

Fine-tune your agent’s voice quality with these settings.

Setting	Description	Recommended
Buffer Size	Amount of audio buffered before playback starts	200 (balance of quality and speed)
Speed Rate	Speaking speed	1.0 (natural pace)
Similarity Boost	How closely the output matches the original voice	0.75 (close to original voice)
Stability	How consistent the voice stays across responses	0.5 (balanced expression)
Style Exaggeration	How strongly the voice's style is emphasized	0 (neutral; increase for a more expressive delivery)

Balance is key! A larger buffer size improves audio quality but increases latency. Test different settings to find the right trade-off for your use case.
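To keep track of the values you test, it can be useful to record them next to the recommended defaults from the table above. The sketch below is illustrative only: the VoiceTuning class and its field names are hypothetical and do not correspond to a documented API. The 0 to 1 range for Similarity Boost, Stability, and Style Exaggeration is an assumption, and so is the requirement that Buffer Size and Speed Rate be positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceTuning:
    """Voice tuning values; defaults are the recommendations from the table."""

    buffer_size: int = 200
    speed_rate: float = 1.0
    similarity_boost: float = 0.75
    stability: float = 0.5
    style_exaggeration: float = 0.0

    def problems(self) -> list[str]:
        """Return a list of human-readable issues; empty if none are found.

        The limits checked here are assumptions, not documented bounds.
        """
        issues: list[str] = []
        if self.buffer_size <= 0:
            issues.append("buffer_size must be positive")
        if self.speed_rate <= 0:
            issues.append("speed_rate must be positive")
        for name in ("similarity_boost", "stability", "style_exaggeration"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                issues.append(f"{name} should be between 0 and 1")
        return issues


if __name__ == "__main__":
    recommended = VoiceTuning()
    expressive = VoiceTuning(style_exaggeration=0.4, stability=0.35)
    print(recommended.problems())
    print(expressive)
```

Keeping each tested combination as a separate record makes it easier to compare how a change in buffer size or stability affected latency and voice quality on test calls.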

Next Steps

Agent Tab

Configure prompts and welcome message

Engine Tab

Configure transcription and latency

Clone Voices

Create custom voice clones

Deepgram

Learn about transcription options
