Zing Forum

Reading

ERA Voice Agent: A Production-Grade AI Phone Voice Agent Based on Twilio, Groq, and ElevenLabs

An open-source production-grade AI voice phone agent system that integrates Twilio telephony services, Groq's high-speed LLM inference, and ElevenLabs' natural speech synthesis, supporting both incoming and outgoing call scenarios.

AI语音代理TwilioGroqElevenLabs电话机器人语音合成大语言模型FastAPI开源项目
Published 2026-05-02 02:43Recent activity 2026-05-02 02:51Estimated read 8 min
ERA Voice Agent: A Production-Grade AI Phone Voice Agent Based on Twilio, Groq, and ElevenLabs
1

Section 01

Introduction: ERA Voice Agent—An Open-Source Production-Grade AI Phone Voice Agent System

ERA Voice Agent is an open-source production-grade AI voice phone agent system that integrates Twilio telecommunication, Groq's high-speed LLM inference, and ElevenLabs' natural speech synthesis capabilities. It supports both incoming and outgoing call scenarios, addressing the technical barriers for enterprises to deploy AI phone customer service and enabling sub-second real-time conversation flows.

2

Section 02

Project Background and Overview

ERA Voice Agent aims to address the technical barriers for enterprises to deploy AI phone customer service. It builds a complete voice conversation pipeline by integrating Twilio (telecommunication), Groq (high-speed LLM inference), and ElevenLabs (speech synthesis). Unlike traditional call centers or simple voice robots, it enables true real-time conversations: answering/outbound calls, natural voice greetings, real-time input monitoring, LLM intelligent responses, and high-quality voice replies—all completed in sub-seconds.

3

Section 03

Core Architecture and Tech Stack

ERA adopts a modular design with core components including:

  1. FastAPI Server: The central coordinator that handles Twilio Webhook callbacks, manages session states, and coordinates the order of speech synthesis and LLM inference;
  2. Speech Synthesis Module: Encapsulates the ElevenLabs API to generate MP3 audio, falling back to Twilio's tag if it fails;
  3. Conversation Inference Module: Interacts with the Groq API, uses the Llama 3.3 70B model to generate context-aware responses, and maintains conversation history;
  4. Outbound Call Script: A CLI tool that supports initiating outbound calls from the terminal and customizing the purpose of the call.
4

Section 04

Key Feature Analysis

ERA's core features include:

  • Two-way Call Support: Incoming calls are accessed via Twilio Voice Webhook, and outgoing calls are triggered via CLI script with a configurable purpose parameter;
  • Purpose-Aware Conversation: The call purpose (e.g., meeting reservation) is passed via URL parameters, and the LLM adjusts its conversation strategy accordingly;
  • Graceful Termination: When the LLM response includes the [END_CALL] tag, it plays a farewell voice and hangs up automatically;
  • Fault-Tolerant Fallback: If ElevenLabs fails, it falls back to Twilio's voice; if Groq fails, it returns a pre-set apology text;
  • Session Management: Uses an in-memory dictionary to track conversation history by CallSid, supporting multi-turn context understanding.
5

Section 05

Detailed Typical Call Flow

Incoming Call Flow:

  1. Call Trigger: Twilio receives a PSTN call and sends a POST request to the /voice endpoint;
  2. Session Initialization: FastAPI creates a new session, storing the CallSid and call purpose;
  3. Opening Line Generation: ElevenLabs generates a greeting voice and plays it;
  4. Voice Collection: Twilio's listens and transcribes the voice into text;
  5. AI Response Generation: The transcribed text is sent to Groq, which generates a response based on history;
  6. Speech Synthesis: The response is converted into an audio file;
  7. Response Playback: Twilio plays the audio; if it fails, it reads the text directly;
  8. Loop or Terminate: If [END_CALL] is present, hang up; otherwise, return to step 4.
6

Section 06

Deployment and Configuration Requirements

Deployment Requirements:

  • Python 3.10+, with dependencies including FastAPI, Uvicorn, Twilio SDK, Groq SDK, etc.;
  • Requires Twilio Account SID/Auth Token, Groq API key, ElevenLabs API key, and a public access URL (use ngrok for local deployment);
  • Configuration is managed via .env file, supporting custom models (default Llama3.3 70B), voice IDs, TTS versions, etc.
7

Section 07

Production Environment Optimization Recommendations

Production Deployment Optimization:

  • Persistent Sessions: Replace in-memory call_sessions with Redis/database;
  • Key Security: Use AWS/GCP Secrets Manager or HashiCorp Vault instead of plaintext .env files;
  • Audio Cleanup: Schedule tasks to clean up expired MP3 files in the audio directory;
  • Security Validation: Add rate limits to /voice and /gather endpoints, and verify Twilio signatures;
  • Horizontal Scaling: After externalizing session states, deploy multiple instances with load balancing.
8

Section 08

Application Scenarios and Value

ERA is suitable for scenarios such as customer support hotlines, appointment reminder outbound calls, satisfaction surveys, sales lead screening, order status inquiries, etc. Its open-source nature and modular architecture allow enterprises to customize conversation logic and integrate with internal systems, providing teams with a low-threshold solution to validate the AI phone agent concept—enabling LLM-driven real-time voice interaction without building a voice pipeline from scratch.