How Tolan builds voice-first AI with GPT-5.1
These articles are AI-generated summaries. Please check the original sources for full details.
How Tolan builds voice-first AI with GPT-5.1
Tolan is a voice-first AI companion utilizing GPT-5.1 to deliver personalized, ongoing conversations with users. The application, built by Portola, has already amassed over 200,000 monthly active users since its launch in February 2025.
Voice AI presents unique challenges compared to text-based models, demanding low latency and robust context management to maintain natural, flowing interactions. Traditional approaches to context caching often fail in dynamic voice conversations, leading to disjointed experiences and user frustration, potentially impacting retention rates.
Key Insights
- 0.7-second latency reduction: Implementing OpenAI’s GPT-5.1 and Responses API decreased speech initiation time by 0.7 seconds.
- Context Reconstruction: Tolan rebuilds its context window each turn, incorporating summaries, persona cards, memories, and real-time signals.
- Turbopuffer: Tolan uses Turbopuffer, a high-speed vector database, for sub-50ms memory lookup times.
Practical Applications
- Personalized Companions: Tolan provides a continuously learning AI companion, improving user engagement through consistent personality and memory.
- Pitfall: Relying on cached prompts in voice applications leads to inconsistencies and a disjointed user experience when the conversation topic shifts.
References:
Continue reading
Next article
FIFA's AI-Powered Offside Calls to Debut at World Cup 2026
Related Content
Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x
Salesforce AI Research released VoiceAgentRAG, an open-source architecture that reduces retrieval latency by 316x using a dual-agent system to meet the 200ms voice response budget.
Mistral Voxtral TTS: Closing the Expressivity Gap in Multilingual Voice Cloning
Mistral's Voxtral TTS uses a hybrid 4B-parameter architecture to achieve a 68.4% win rate over ElevenLabs Flash v2.5 in multilingual voice cloning.
Characterizing AWS Graviton Memory Subsystems: Graviton2 vs. Graviton4 Performance
Analysis of AWS Graviton4 reveals a 79.8% increase in L1 data architectural efficiency over Graviton2 using the Arm System Characterization Tool.