Voice AI

19 articles in this category

AI News, Voice AI, Software Engineering

Supertonic v3: On-Device TTS with 31-Language Support and Expressive Tags

Supertone releases Supertonic v3, an on-device TTS model supporting 31 languages and expressive tags with a compact 404 MB disk footprint.

AI News, Voice AI, New Releases

OpenAI Launches GPT-Realtime-2 and Specialized Audio Models in General Availability

OpenAI moves the Realtime API to general availability, introducing GPT-Realtime-2 with GPT-5-class reasoning and a 128K context window.

AI News, Voice AI, Language Model

Mistral Voxtral TTS: Closing the Expressivity Gap in Multilingual Voice Cloning

Mistral's Voxtral TTS uses a hybrid 4B-parameter architecture to achieve a 68.4% win rate over ElevenLabs Flash v2.5 in multilingual voice cloning.

AI News, Voice AI, Generative AI

Sakana AI Introduces KAME: Real-Time LLM Knowledge Injection for Near-Zero Latency Speech

Sakana AI's new KAME architecture boosts speech-to-speech (S2S) model MT-Bench scores from 2.05 to 6.43 while maintaining near-zero latency by injecting back-end LLM knowledge in real time.

AI News, Artificial Intelligence, Voice AI

IBM Releases Two Granite Speech 4.1 2B Models: High-Speed ASR and Translation

IBM's Granite Speech 4.1 2B models deliver a 5.33 mean WER and an RTFx of 1820 on H100 GPUs, offering enterprise-grade speech recognition and translation.

AI News, Artificial Intelligence, Voice AI

OpenMOSS MOSS-Audio: A Unified Open-Source Foundation Model for Time-Aware Audio Reasoning

OpenMOSS releases MOSS-Audio, a unified foundation model achieving 71.08 average accuracy on audio benchmarks, outperforming 30B+ parameter systems.

AI News, Voice AI, Large Language Models

xAI Launches grok-voice-think-fast-1.0: Setting a New Standard for Full-Duplex Voice AI

xAI's new grok-voice-think-fast-1.0 tops the τ-voice Bench with a 67.3% score, outperforming Gemini 3.1 and GPT Realtime 1.5 in complex, real-world voice tasks.

AI News, Voice AI, Agentic AI

Mastering the Deepgram Python SDK: A Full-Stack Voice AI Implementation Guide

Learn to implement a complete voice AI pipeline using the Deepgram Python SDK, featuring Nova-3 transcription, Aura-2 text-to-speech, and automated text intelligence.

AI News, Voice AI, Language Model

xAI Launches Grok STT and TTS APIs for Enterprise Voice Developers

xAI releases standalone Grok speech APIs featuring a 5.0% error rate in phone call entity recognition, outperforming ElevenLabs and Deepgram.

AI News, Agentic AI, Voice AI

Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x

Salesforce AI Research released VoiceAgentRAG, an open-source architecture that reduces retrieval latency by 316x using a dual-agent system to meet the 200ms voice response budget.

AI News, Artificial Intelligence, Voice AI

Cohere AI Releases Cohere Transcribe: A SOTA Conformer-Based ASR for Enterprise Intelligence

Cohere Transcribe debuts as the #1 model on the Hugging Face Open ASR Leaderboard with a 5.42% average WER, outperforming Whisper Large v3 and ElevenLabs Scribe v2.

AI News, Artificial Intelligence, Voice AI

Google AI Releases WAXAL: A 24-Language African Speech Dataset for ASR and TTS

Google AI launches WAXAL, an open multilingual dataset covering 24 African languages with specialized components for ASR and studio-quality TTS.

AI News, Voice AI, Agentic AI

Fish Audio S2-Pro: High-Fidelity TTS with Dual-AR Architecture and Sub-150ms Latency

Fish Audio S2-Pro introduces a Dual-AR framework and Residual Vector Quantization to deliver 44.1kHz speech synthesis with 100ms latency on NVIDIA H200.

AI News, Voice AI, Software Development

8 Leading Platforms for Building Low-Latency Voice AI Agents

Discover 8 top platforms like OpenAI and Stream for building real-time voice AI agents with low-latency WebRTC and multi-modal LLM support.

AI News, Voice AI, Agentic AI

Beyond Simple API Requests: How OpenAI’s WebSocket Mode Changes the Game for Low Latency Voice Powered AI Experiences

OpenAI's Realtime API collapses the STT-LLM-TTS stack using WebSocket protocols to enable full-duplex, multimodal GPT-4o interactions with sub-millisecond latency improvements.

AI News, Voice AI, Game Development

Building Multi-Speaker AI Games with Gemini Live

Fishjam.io's Deep Sea Stories game showcases a multi-speaker AI interface using Gemini Live, handling group conversations with real-time audio streaming and Voice Activity Detection.

AI News, Voice AI, GPT-5

How Tolan builds voice-first AI with GPT-5.1

Tolan leverages GPT-5.1 to achieve a 30% reduction in memory recall misses and a 20% increase in next-day user retention.

AI News, Voice AI, Language Model

Meta AI Releases Omnilingual ASR: A Suite of Open-Source Multilingual Speech Recognition Models for 1600+ Languages

Meta AI launches Omnilingual ASR, an open-source speech recognition system supporting 1600+ languages with <10% character error rate.

AI News, Agentic AI, Voice AI

Building an Agentic Voice AI Assistant with Autonomous Intelligence

A tutorial on creating an AI voice assistant that understands, reasons, plans, and responds through autonomous multi-step intelligence using Whisper and SpeechT5.
