Input medium
The channel through which users interact with your agent:
- Voice conversations: Microphone or phone call
- Text conversations: Keyboard input via chat interfaces
- Visual conversations: Image inputs (Coming soon)
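Whichever medium is used, the agent ultimately reasons over text. A minimal sketch of how different input media could be normalized into text before reaching the LLM — all function names here are hypothetical, not Bolna's actual API:

```python
# Hypothetical sketch: each input medium is normalized to text before
# it reaches the LLM. Names are illustrative, not Bolna's API.

def transcribe_audio(audio: bytes) -> str:
    """Stub standing in for an ASR provider call (e.g. Deepgram)."""
    return "hello from a phone call"

def normalize_input(medium: str, payload) -> str:
    # Voice input is transcribed; text input passes through unchanged.
    if medium == "voice":
        return transcribe_audio(payload)
    if medium == "text":
        return payload
    raise ValueError(f"unsupported medium: {medium}")

print(normalize_input("text", "hi there"))  # -> hi there
```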
ASR (Automatic Speech Recognition)
The transcriber component converts spoken input into text that the LLM can understand. Bolna supports multiple ASR providers including Deepgram, Azure, AssemblyAI, and Sarvam.

LLM (Large Language Model)
The LLM component processes the transcribed input and generates appropriate responses. It’s the “brain” of your agent that understands context and makes decisions. Bolna integrates with OpenAI, Azure OpenAI, Anthropic, and other providers.

TTS (Text-to-Speech) / Synthesizer
The voice synthesizer converts the LLM’s text response into natural-sounding speech. Choose from providers like ElevenLabs, Azure, Cartesia, and more.

Output component
Delivers the agent’s response back to the user through the appropriate medium (voice, text, or visual).

What tasks can agents perform?
Bolna provides functionality to instruct your agent to execute tasks during and after conversations:

Real-time tasks
- Transfer calls to human agents
- Fetch calendar slots for scheduling
- Book appointments automatically
- Execute custom functions based on conversation
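Real-time tasks like these are typically exposed to the LLM as callable tools that the agent runtime dispatches by name mid-conversation. A rough sketch of such a registry — the tool names, signatures, and return shapes are illustrative assumptions, not Bolna's actual API:

```python
# Hypothetical tool registry for real-time tasks. All names and
# payload shapes are illustrative, not Bolna's actual API.

def transfer_call(destination: str) -> dict:
    # A real implementation would signal the telephony provider.
    return {"action": "transfer", "to": destination}

def fetch_slots(date: str) -> list:
    # A real implementation would query a calendar API.
    return [f"{date}T10:00", f"{date}T14:30"]

def book_appointment(slot: str, attendee: str) -> dict:
    return {"action": "book", "slot": slot, "attendee": attendee}

TOOLS = {
    "transfer_call": transfer_call,
    "fetch_slots": fetch_slots,
    "book_appointment": book_appointment,
}

def dispatch(tool_name: str, **kwargs):
    # The LLM picks a tool based on the conversation; the agent
    # runtime looks it up and executes it with the LLM's arguments.
    return TOOLS[tool_name](**kwargs)

slots = dispatch("fetch_slots", date="2024-06-01")
booking = dispatch("book_appointment", slot=slots[0], attendee="Sam")
```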
Post-conversation tasks
- Summarization: Generate call summaries automatically
- Data extraction: Extract specific information from conversations
- Function execution: Trigger custom workflows after calls end
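Post-conversation tasks can be thought of as a hook that fires once the call ends. A minimal sketch under assumed names — in practice the summary and extracted fields would come from an LLM call, and none of these functions are Bolna's actual API:

```python
# Hypothetical post-call hook: summarize the transcript and extract
# structured fields after the conversation ends. Simple string
# handling stands in for what would really be LLM calls.

def summarize(transcript: list) -> str:
    # Stub summary: truncate the joined transcript.
    return " ".join(transcript)[:120]

def extract_fields(transcript: list) -> dict:
    # Stub extraction: pull out anything that looks like an email.
    fields = {}
    for line in transcript:
        for token in line.split():
            if "@" in token:
                fields["email"] = token
    return fields

def on_call_end(transcript: list) -> dict:
    # Runs once per call; the result could trigger custom workflows
    # (CRM updates, webhooks, follow-up scheduling, etc.).
    return {
        "summary": summarize(transcript),
        "extracted": extract_fields(transcript),
    }

result = on_call_end(["Hi, this is Sam", "My email is sam@example.com"])
```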