Bolna Voice AI is built on a modular architecture that combines multiple AI components into seamless conversational agents. Understanding these core concepts will help you build more effective voice AI applications. Bolna helps you create AI agents that can be instructed to perform tasks using a modular pipeline:

Input medium

The channel through which users interact with your agent:
  • Voice conversations: Microphone or phone call
  • Text conversations: Keyboard input via chat interfaces
  • Visual conversations: Image inputs (Coming soon)

ASR (Automatic Speech Recognition)

The transcriber component converts spoken input into text that the LLM can process. Bolna supports multiple ASR providers, including Deepgram, Azure, AssemblyAI, and Sarvam.
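
To make this concrete, a transcriber stage might be configured along the lines of the sketch below. The key names (`provider`, `language`, `stream`) are illustrative assumptions, not necessarily Bolna's exact schema.

```python
# Hypothetical transcriber (ASR) settings -- key names are illustrative,
# not necessarily Bolna's exact configuration schema.
transcriber_config = {
    "provider": "deepgram",  # or "azure", "assemblyai", "sarvam"
    "language": "en",        # language the caller is expected to speak
    "stream": True,          # emit partial transcripts for lower latency
}
```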

LLM (Large Language Model)

The LLM component processes the transcribed input and generates appropriate responses. It’s the “brain” of your agent that understands context and makes decisions. Bolna integrates with OpenAI, Azure OpenAI, Anthropic, and other providers.
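
A corresponding LLM stage might look like the following sketch; again, the key names and values are assumptions for illustration:

```python
# Hypothetical LLM settings -- key names and values are illustrative.
llm_config = {
    "provider": "openai",  # or "azure_openai", "anthropic", ...
    "model": "gpt-4o",     # any chat-capable model your provider offers
    "temperature": 0.2,    # lower values keep responses focused and consistent
    "system_prompt": "You are a polite support agent for Acme Corp.",
}
```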

TTS (Text-to-Speech) / Synthesizer

The voice synthesizer converts the LLM’s text response into natural-sounding speech. Choose from providers such as ElevenLabs, Azure, and Cartesia.
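
A synthesizer stage, sketched with assumed key names:

```python
# Hypothetical synthesizer (TTS) settings -- key names are illustrative.
synthesizer_config = {
    "provider": "elevenlabs",  # or "azure", "cartesia", ...
    "voice": "Rachel",         # a provider-specific voice identifier
    "stream": True,            # play audio chunks as they are generated
}
```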

Output component

Delivers the agent’s response back to the user through the appropriate medium (voice, text, or visual).
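
Conceptually, an agent definition bundles these stages together with an input and output medium. The sketch below shows the overall shape under assumed key names; it is not Bolna's exact payload.

```python
# Conceptual agent definition assembling the full pipeline.
# All keys and values are assumptions for illustration only.
agent_config = {
    "agent_name": "support-agent",
    "input": {"medium": "telephony"},                  # voice via a phone call
    "transcriber": {"provider": "deepgram"},           # speech -> text (ASR)
    "llm": {"provider": "openai", "model": "gpt-4o"},  # text -> response text
    "synthesizer": {"provider": "elevenlabs"},         # response text -> speech (TTS)
    "output": {"medium": "telephony"},                 # audio back to the caller
}
```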

What tasks can agents perform?

Bolna lets you instruct your agent to execute tasks both during and after conversations:

Real-time tasks

Tasks the agent executes while the conversation is in progress.

Post-conversation tasks

Tasks that run once the conversation has ended.

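As a rough sketch of how the two task types might be expressed side by side (task types and keys are hypothetical, not Bolna's schema):

```python
# Hypothetical task list -- task types and keys are illustrative.
tasks = [
    {
        "type": "conversation",           # real-time: runs while the call is live
        "tools": ["check_order_status"],  # hypothetical function the agent may invoke
    },
    {
        "type": "summarization",          # post-conversation: runs after the call ends
        "deliver_to": "webhook",          # e.g. push the summary to your backend
    },
]
```
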
Next steps

Ready to build your first agent? Start with the agent setup guide or explore provider integrations to configure your components.