Skip to main content

On This Page

VoxPilot: Local Voice-to-Code VS Code Extension Using Moonshine ASR

2 min read
Share

These articles are AI-generated summaries. Please check the original sources for full details.

I Built a Voice-to-Code VS Code Extension That Runs Entirely On-Device

Nate Archer developed VoxPilot, an open-source VS Code extension that enables voice-driven programming without cloud dependency. The system utilizes a compact 27MB Moonshine ASR model running locally via ONNX Runtime to eliminate API latency and privacy risks.

Why This Matters

Traditional AI coding assistants rely on text prompts, forcing developers to spend significant time typing complex refactoring instructions or unit test requirements. VoxPilot addresses this by converting speech to text locally, offering a high-performance accessibility solution for developers with RSI or carpal tunnel while maintaining strict data privacy by processing sensitive voice data entirely in-memory without network transmission.

Key Insights

  • Moonshine ASR Tiny model (27MB) handles quick commands while the Base model (65MB) supports longer dictation via ONNX Runtime.
  • Local processing eliminates the need for API keys or cloud telemetry, ensuring voice data never leaves the developer’s machine.
  • Native CLI tools like arecord on Linux, sox on macOS, and ffmpeg on Windows capture 16kHz PCM audio for standardized processing.
  • Energy-based Voice Activity Detection (VAD) automates the recording process, removing the requirement for manual push-to-talk triggers.
  • Integration with VS Code’s Chat API allows VoxPilot to bridge voice input directly to participants like GitHub Copilot or Continue.

Practical Applications

  • Use case: Developers with RSI using VoxPilot to dictate complex refactoring commands to Copilot. Pitfall: High ambient noise environments may trigger the energy-based VAD unintentionally.
  • Use case: Security-conscious engineering teams using local ASR to prevent sensitive voice data from being sent to external clouds. Pitfall: Choosing the 27MB Tiny model for highly technical jargon may result in lower transcription accuracy than the Base model.

References:

Continue reading

Next article

Lessons from Running 100+ AI Agents in Production: Scaling Rate Limits and Costs

Related Content