Supertonic v3: On-Device TTS with 31-Language Support and Expressive Tags
These articles are AI-generated summaries. Please check the original sources for full details.
Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags
Supertone has launched Supertonic v3, the third generation of its ONNX-based on-device text-to-speech system. The model expands language support from 5 to 31 language codes while keeping a compact footprint of approximately 99M parameters. This release also introduces expressive tags and a built-in text normalization engine that is reported to read technical units correctly where major cloud-based competitors fail.
Why This Matters
Most high-fidelity TTS models depend on significant cloud resources, with parameter counts often ranging from 0.7B to 2B, which makes edge deployment difficult. Supertonic v3 addresses this by using flow matching to produce usable audio in just 2 inference steps, which reduces memory and compute requirements compared to diffusion-based models. The built-in text normalization targets a common failure point: standard systems struggle with complex surface forms such as financial units ($5.2M) and technical abbreviations (30kph). While competitors like ElevenLabs Flash v2.5 and OpenAI TTS-1 failed to process these inputs correctly, Supertonic v3 maintains reading accuracy without requiring an external preprocessing pipeline.
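To make the normalization problem concrete, here is a minimal rule-based sketch of what expanding "$5.2M" and "30kph" into speakable words involves. It is not Supertonic's engine: the function names, the regular expressions, and the tiny lookup tables are all assumptions chosen for illustration, and the sketch only covers one- or two-digit amounts.

```python
import re

# Hypothetical lookup tables for this sketch; a production normalizer covers far more.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
MAGNITUDES = {"K": "thousand", "M": "million", "B": "billion"}
UNITS = {"kph": "kilometers per hour", "km": "kilometers", "kg": "kilograms"}

# Only one- or two-digit amounts are matched; anything else is left untouched.
CURRENCY_RE = re.compile(r"\$(\d{1,2}(?:\.\d+)?)([KMB])?\b")
UNIT_RE = re.compile(r"\b(\d{1,2}(?:\.\d+)?)\s?(kph|km|kg)\b")


def int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def number_to_words(token: str) -> str:
    """Spell out a decimal string such as '5.2' digit by digit after the point."""
    whole, _, frac = token.partition(".")
    words = int_to_words(int(whole))
    if frac:
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words


def _expand_currency(match: re.Match) -> str:
    amount = number_to_words(match.group(1))
    magnitude = MAGNITUDES.get(match.group(2), "")
    # Always uses the plural "dollars"; singular handling is omitted in this sketch.
    return " ".join(part for part in (amount, magnitude, "dollars") if part)


def _expand_unit(match: re.Match) -> str:
    return f"{number_to_words(match.group(1))} {UNITS[match.group(2)]}"


def normalize(text: str) -> str:
    """Expand currency amounts first, then measurement units."""
    text = CURRENCY_RE.sub(_expand_currency, text)
    return UNIT_RE.sub(_expand_unit, text)


print(normalize("Revenue hit $5.2M while the drone flew at 30kph."))
```

Even this toy version needs separate rules for magnitudes, decimals, and unit suffixes, which suggests why systems that skip normalization misread such inputs.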
Key Insights
- Expanded language coverage from 5 to 31 ISO codes, including a special ‘na’ fallback for unknown text (Supertone, 2026).
- Flow-matching architecture enables high-speed inference on CPU, achieving an average RTF of 0.3x on an Onyx Boox Go 6 e-reader.
- Introduction of Length-Aware Rotary Position Embedding (LARoPE) and Self-Purifying Flow Matching to improve text-speech alignment and robustness against noisy labels.
- Expressive tag support allows embedding prosodic cues directly into input text without separate preprocessing.
- Public ONNX assets occupy only 404 MB, making the system viable for browser and mobile environments via onnxruntime-web.
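The few-step claim behind flow matching can be illustrated with a toy sampler. The sketch below integrates a velocity field from noise (t = 0) to data (t = 1) with fixed Euler steps. The velocity function here is a hand-written stand-in for a single known target, not a trained network, and nothing in it reflects Supertonic's actual architecture; `euler_sample` and `make_toy_velocity` are hypothetical names.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so the toy field never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Velocity of the straight path x_t = (1 - t) * noise + t * target.

    In a real model a neural network would predict this velocity from x and t;
    here the target is known, so the field can be written in closed form.
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


noise = [0.8, -1.3, 0.1]
target = [0.25, 0.5, -0.75]
sample = euler_sample(make_toy_velocity(target), noise, num_steps=2)
print(sample)
```

Because the toy path is a straight line, two Euler steps already land on the target up to floating-point error. Straight or nearly straight paths are the intuition for why flow-matching samplers can get by with far fewer steps than samplers that follow curved trajectories.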
Working Examples
Minimal Python SDK example for synthesizing audio using the Supertonic v3 model.
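from supertonic import TTS

# auto_download=True asks the SDK to fetch the model assets if they are missing locally
tts = TTS(auto_download=True)

# Select the voice preset named "M1"
style = tts.get_voice_style(voice_name="M1")

text = "A gentle breeze moved through the open window while everyone listened to the story."

# Synthesize English speech; returns the waveform and its duration
wav, duration = tts.synthesize(text, voice_style=style, lang="en")

# Write the waveform to disk and report how much audio was produced
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")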
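The 0.3x RTF figure quoted for the e-reader can be checked on any device by timing a synthesis call. The sketch below assumes the common convention that real-time factor is synthesis time divided by audio duration, so values below 1 are faster than real time. To stay runnable without the SDK it times a stand-in function; `fake_synthesize` and `timed_synthesis` are hypothetical helpers, not part of Supertonic.

```python
import time
from typing import Callable, Tuple


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def timed_synthesis(synthesize: Callable[[str], float], text: str) -> Tuple[float, float]:
    """Call synthesize(text), which must return the audio duration in seconds.

    Returns (elapsed_seconds, rtf).
    """
    start = time.perf_counter()
    audio_seconds = synthesize(text)
    elapsed = time.perf_counter() - start
    return elapsed, real_time_factor(elapsed, audio_seconds)


def fake_synthesize(text: str) -> float:
    """Stand-in for a real TTS call: pretends each character yields 0.06 s of audio."""
    return 0.06 * len(text)


# Under this convention, 10 s of audio at an RTF of 0.3 takes about 3 s to synthesize.
print(real_time_factor(3.0, 10.0))

elapsed, rtf = timed_synthesis(fake_synthesize, "A gentle breeze moved through the open window.")
print(f"elapsed={elapsed:.6f}s rtf={rtf:.6f}")
```

To measure a real engine, the stand-in would be replaced by a wrapper that runs the actual synthesis call and returns the reported duration.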
Practical Applications
- Use case: E-ink e-readers (Onyx Boox Go 6) can perform local TTS in airplane mode with 0.3x RTF. Pitfall: Attempting to use larger 2B parameter models on such hardware typically results in excessive latency and memory exhaustion.
- Use case: Automated financial reporting systems can correctly verbalize ‘$5.2M’ as ‘five point two million dollars’ using built-in text normalization. Pitfall: Relying on generic TTS systems like OpenAI TTS-1 or Gemini 2.5 Flash often leads to reading failures on technical units and currency formats.
- Use case: Web applications using onnxruntime-web for pure client-side execution of voice interfaces. Pitfall: Neglecting to handle the ‘na’ fallback for unsupported languages, which could lead to inconsistent synthesis quality for unknown text inputs.
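The fallback pitfall in the last item can be handled with an explicit language-resolution step before synthesis. This is a minimal sketch: the `SUPPORTED` set is an illustrative subset rather than the model's real list of 31 codes, and `resolve_lang` is a hypothetical helper, not an SDK function. Only the 'na' fallback code itself comes from the release notes.

```python
from typing import Iterable, Optional

FALLBACK_LANG = "na"  # fallback code for unknown text, per the release notes

# Illustrative subset only; the actual list of supported codes is not reproduced here.
SUPPORTED = {"en", "ko", "es"}


def resolve_lang(requested: Optional[str], supported: Iterable[str]) -> str:
    """Map a caller-supplied locale such as 'en-US' to a supported code, else 'na'."""
    if not requested:
        return FALLBACK_LANG
    # Keep only the primary subtag: 'en-US' and 'en_us' both become 'en'.
    code = requested.strip().lower().replace("_", "-").split("-")[0]
    return code if code in set(supported) else FALLBACK_LANG


for locale in ("en-US", "ko", "xx", None):
    lang = resolve_lang(locale, SUPPORTED)
    if lang == FALLBACK_LANG:
        # Surface the fallback so the app can warn the user or log the input.
        print(f"{locale!r}: unsupported, falling back to {lang!r}")
    else:
        print(f"{locale!r}: using {lang!r}")
```

Making the fallback an explicit branch lets an application log or flag those inputs instead of silently accepting variable synthesis quality.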