Google Health AI Releases MedASR: A Conformer-Based Medical Speech-to-Text Model
These articles are AI-generated summaries. Please check the original sources for full details.
Google Health AI has released MedASR, an open-weights speech-to-text model based on the Conformer architecture, specifically designed for clinical dictation and physician-patient conversations. The model contains 105 million parameters and is positioned as a starting point for healthcare voice applications.
Why This Matters
General-purpose speech-to-text models often struggle with the specialized vocabulary and phrasing of clinical settings. The resulting transcription errors undermine downstream AI workflows, can increase the workload for medical professionals, and can potentially compromise patient care. MedASR addresses this gap with domain-specific training, aiming for lower word error rates (WER) on medical speech than general-purpose models achieve.
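Word error rate is the metric behind these comparisons. As a minimal sketch, assuming plain whitespace tokenization and no text normalization (real evaluation toolkits usually lowercase and strip punctuation first), WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name and example sentences below are hypothetical illustrations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution, or a free match when r == h
            )
        prev = curr
    # Guard against an empty reference (WER is undefined there; count insertions instead)
    return prev[-1] / max(len(ref), 1)


print(word_error_rate("no evidence of pneumothorax", "no evidence of pneumothorax"))  # 0.0
print(word_error_rate("no evidence of pneumothorax", "no evidence of new motor axe"))  # 0.75
```

The second call shows why clinical vocabulary matters: one misheard term ("pneumothorax") becomes three wrong words, costing one substitution and two insertions against a four-word reference.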
Key Insights
- Conformer Architecture: Combines convolutional and self-attention layers for capturing both local and long-range dependencies in speech.
- Training Data: Trained on approximately 5000 hours of de-identified medical speech data, covering radiology, internal medicine, and family medicine.
- Performance Gains: MedASR achieves word error rates comparable to or lower than those of models such as Gemini 2.5 Pro and Whisper v3 Large, particularly when its output is decoded with a six-gram language model.
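To make the Conformer bullet concrete, here is a toy, dependency-free sketch of one Conformer block's data flow, following the macaron layout of the original Conformer design: half-step feed-forward, self-attention, convolution, half-step feed-forward, then a final LayerNorm. It is heavily simplified and is not MedASR's code: weights are random, attention has no learned Q/K/V projections, multiple heads, or positional encoding, and the convolution module is reduced to LayerNorm, a depthwise 1-D convolution, and Swish (the full module also uses pointwise convolutions, GLU, and batch norm). All function names are hypothetical.

```python
import math
import random

Vec = list[float]
Seq = list[Vec]


def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def swish(v: float) -> float:
    # v * sigmoid(v), split by sign so math.exp never overflows
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(w: list[Vec], x: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def residual(seq: Seq, delta: Seq, scale: float = 1.0) -> Seq:
    return [[a + scale * b for a, b in zip(x, y)] for x, y in zip(seq, delta)]


def feed_forward(seq: Seq, w1: list[Vec], w2: list[Vec]) -> Seq:
    # Position-wise FFN: LayerNorm -> expand to hidden size -> Swish -> project back
    return [matvec(w2, [swish(h) for h in matvec(w1, layer_norm(x))]) for x in seq]


def self_attention(seq: Seq) -> Seq:
    # Global context: each frame takes a softmax-weighted average over all frames.
    # Learned Q/K/V projections, multiple heads and positional encoding are omitted.
    xs = [layer_norm(x) for x in seq]
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, xs)) / z for j in range(d)])
    return out


def local_conv(seq: Seq, kernel: Vec) -> Seq:
    # Local context: a depthwise 1-D convolution mixes each channel only with the
    # same channel in neighbouring frames (zero padding at the sequence edges).
    xs = [layer_norm(x) for x in seq]
    half, n, d = len(kernel) // 2, len(xs), len(xs[0])
    out = []
    for t in range(n):
        frame = []
        for j in range(d):
            acc = sum(kw * xs[t + i - half][j]
                      for i, kw in enumerate(kernel) if 0 <= t + i - half < n)
            frame.append(swish(acc))
        out.append(frame)
    return out


def conformer_block(seq: Seq, ffn_a, ffn_b, kernel: Vec) -> Seq:
    # Macaron layout: half-step FFN, attention, convolution, half-step FFN,
    # each wrapped in a residual connection, then a final LayerNorm.
    seq = residual(seq, feed_forward(seq, *ffn_a), 0.5)
    seq = residual(seq, self_attention(seq))
    seq = residual(seq, local_conv(seq, kernel))
    seq = residual(seq, feed_forward(seq, *ffn_b), 0.5)
    return [layer_norm(x) for x in seq]


def rand_matrix(rows: int, cols: int) -> list[Vec]:
    return [[random.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_model, hidden, n_frames = 4, 8, 6
ffn_a = (rand_matrix(hidden, d_model), rand_matrix(d_model, hidden))
ffn_b = (rand_matrix(hidden, d_model), rand_matrix(d_model, hidden))
x = rand_matrix(n_frames, d_model)  # 6 toy "acoustic frames", 4 features each
y = conformer_block(x, ffn_a, ffn_b, kernel=[0.25, 0.5, 0.25])
print(len(y), len(y[0]))  # 6 4
```

The attention step lets every frame see the whole utterance, while the convolution here only looks one frame to each side (kernel size 3); combining both in a single block is the design choice the bullet describes.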
Working Example
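```python
from transformers import pipeline
import huggingface_hub

# Download the sample clip stored in the google/medasr model repository
audio = huggingface_hub.hf_hub_download("google/medasr", "test_audio.wav")

# Build a speech-recognition pipeline around the MedASR checkpoint
pipe = pipeline("automatic-speech-recognition", model="google/medasr")

# Transcribe in 20 s chunks with a 2 s stride so long recordings fit the model
result = pipe(audio, chunk_length_s=20, stride_length_s=2)
print(result)  # a dict; the transcript is under the "text" key
```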
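The `chunk_length_s` and `stride_length_s` arguments let the pipeline transcribe audio longer than a single model window. The following is a simplified sketch of that idea, assuming interior windows overlap their neighbours by `stride_length_s` on each side and that the overlapping edges are discarded when the pieces are stitched together. It works in seconds rather than samples, and the helper `chunk_windows` is hypothetical, not part of `transformers`.

```python
def chunk_windows(total_s: float, chunk_s: float = 20.0, stride_s: float = 2.0):
    """Split an audio duration into overlapping windows.

    Returns (window_start, window_end, keep_start, keep_end) tuples in seconds.
    Interior windows carry stride_s seconds of extra context on each side;
    only the "keep" span would contribute to the stitched transcript.
    """
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("chunk_s must be larger than 2 * stride_s")
    windows, start = [], 0.0
    while True:
        end = min(start + chunk_s, total_s)
        last = end >= total_s
        # No left stride on the first window, no right stride on the last one
        keep_start = start if start == 0.0 else start + stride_s
        keep_end = end if last else end - stride_s
        windows.append((start, end, keep_start, keep_end))
        if last:
            return windows
        start += step


for w in chunk_windows(50.0):
    print(w)
# (0.0, 20.0, 0.0, 18.0)
# (16.0, 36.0, 18.0, 34.0)
# (32.0, 50.0, 34.0, 50.0)
```

The kept spans tile the clip with no gaps or double counting, so each stretch of audio is transcribed by a window that also saw some surrounding context (except at the very start and end of the clip).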
Practical Applications
- Radiology Dictation: Integrating MedASR into radiology workflows for automated transcription of image interpretations.
- Pitfall: Relying on greedy decoding alone. Decoding MedASR's output together with a language model yields significantly lower word error rates.
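The pitfall above contrasts acoustic-only decoding with language-model-assisted decoding. The sketch below makes a few simplifying assumptions. It uses n-best rescoring rather than the beam-search fusion that production decoders typically apply. It also substitutes a toy word trigram model with add-k smoothing for the six-gram model (real systems use stronger smoothing such as Kneser-Ney). Each candidate transcript is scored as its acoustic log-probability plus a weighted LM log-probability. The corpus, acoustic scores, weights, and class and function names are all made up for illustration, and MedASR's actual decoding setup may differ.

```python
import math
from collections import Counter


class NGramLM:
    """Toy word n-gram LM with add-k smoothing (hypothetical stand-in for a six-gram model)."""

    def __init__(self, sentences: list[str], n: int = 3, k: float = 0.1):
        self.n, self.k = n, k
        self.ngrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab = {"</s>"}
        for s in sentences:
            self.vocab.update(s.split())
            for gram in self._grams(s):
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def _grams(self, sentence: str) -> list[tuple[str, ...]]:
        # Pad with n-1 start symbols so the first word has a full-length context
        words = ["<s>"] * (self.n - 1) + sentence.split() + ["</s>"]
        return [tuple(words[i - self.n + 1 : i + 1]) for i in range(self.n - 1, len(words))]

    def log_prob(self, sentence: str) -> float:
        v = len(self.vocab) + 1  # one extra slot for any unseen word
        total = 0.0
        for gram in self._grams(sentence):
            num = self.ngrams[gram] + self.k
            den = self.contexts[gram[:-1]] + self.k * v
            total += math.log(num / den)
        return total


def rescore(nbest: list[tuple[str, float]], lm: NGramLM,
            lm_weight: float = 0.5, word_bonus: float = 0.0) -> str:
    """Return the transcript maximizing acoustic + lm_weight * LM + word_bonus * length."""
    def score(hyp: tuple[str, float]) -> float:
        text, acoustic = hyp
        return acoustic + lm_weight * lm.log_prob(text) + word_bonus * len(text.split())
    return max(nbest, key=score)[0]


corpus = [
    "no evidence of pneumothorax",
    "no evidence of pleural effusion",
    "small left pleural effusion",
    "no acute cardiopulmonary process",
]
lm = NGramLM(corpus, n=3)

# Hypothetical n-best list of (transcript, acoustic log-probability) pairs;
# the acoustically preferred candidate contains a phonetically similar error.
nbest = [
    ("no evidence of new motor axe", -4.1),
    ("no evidence of pneumothorax", -4.6),
]
print(rescore(nbest, lm))                  # no evidence of pneumothorax
print(rescore(nbest, lm, lm_weight=0.0))   # no evidence of new motor axe
```

With `lm_weight=0.0` the choice falls back to the best acoustic score alone; the LM term is what pulls the decision toward the in-domain phrase. In practice `lm_weight` and `word_bonus` are tuned on held-out data.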