Dictare: Open-Source Voice Layer for AI Coding Agents Launches with 100% Local Processing
Key Takeaways
- Dictare introduces an open protocol (OpenVIP) for voice-to-agent communication that bypasses the window-focus limitations of existing voice tools
- 100% local processing with on-device STT addresses privacy concerns by ensuring no audio data leaves the user's machine
- The open-source design allows any tool to implement the SSE endpoint, enabling broader adoption across different coding agents and AI platforms
Summary
Dictare, a new open-source voice interaction system, has launched to enable developers to speak commands to AI coding agents without requiring window focus or sending data to external servers. Unlike existing voice tools that simulate keystrokes into the focused window, Dictare uses an open protocol called OpenVIP that lets coding agents receive transcriptions via Server-Sent Events (SSE) regardless of which application has focus. The tool runs entirely on-device, using speech-to-text models such as Whisper or Parakeet, so no audio data ever leaves the user's machine.
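The article describes transcription delivery over SSE but does not specify OpenVIP's event schema. As a minimal sketch of how an agent-side consumer might parse such a stream, the following assumes a hypothetical `transcription` event type carrying a JSON payload; neither name is taken from the actual protocol.

```python
import json

def parse_sse(stream_lines):
    """Parse Server-Sent Events into (event, data) pairs.

    Minimal SSE framing: a blank line terminates an event,
    `event:` names the type, `data:` lines carry the payload.
    """
    event, data = None, []
    for line in stream_lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            yield event or "message", "\n".join(data)
            event, data = None, []

# Hypothetical OpenVIP-style stream: the event name and JSON fields
# here are illustrative assumptions, not Dictare's documented format.
sample = [
    "event: transcription",
    'data: {"text": "run the test suite", "final": true}',
    "",
]

for name, payload in parse_sse(sample):
    utterance = json.loads(payload)
    print(name, utterance["text"])
```

Because the agent subscribes to the stream rather than receiving simulated keystrokes, it can process utterances even while another window holds focus.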
The platform supports multiple coding agents, including Claude Code, OpenAI Codex, Google Gemini, and Aider, and users can switch between agents via voice commands. Dictare runs as a background service (launchd on macOS, systemd on Linux) and offers bidirectional voice capabilities, combining speech-to-text (STT) with text-to-speech (TTS). Installation is straightforward via Homebrew on macOS or a bash script on Linux, with configurable hotkeys (Right ⌘ on macOS, Scroll Lock on Linux) to control listening states.
Multi-agent support with voice-activated switching lets developers manage multiple AI coding assistants from a single hotkey interface.
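The exact phrases Dictare recognizes for switching agents are not given in the article, so the dispatcher below is only a sketch: the agent names come from the source, but the "switch to ..." command grammar is an invented assumption.

```python
# Hypothetical router for voice-activated agent switching; the
# command phrasing is an assumption, not Dictare's implementation.
AGENTS = {"claude": "Claude Code", "codex": "OpenAI Codex",
          "gemini": "Google Gemini", "aider": "Aider"}

def route_command(transcript, current_agent):
    """Return (active_agent, remaining_text) for a transcribed utterance.

    A leading "switch to <agent>" changes the active agent; any
    other text is forwarded to the current agent unchanged.
    """
    words = transcript.lower().split()
    if len(words) >= 3 and words[:2] == ["switch", "to"] and words[2] in AGENTS:
        return AGENTS[words[2]], " ".join(words[3:])
    return current_agent, transcript

agent, text = route_command("switch to aider", "Claude Code")
print(agent)  # "Aider" under the assumed grammar
```

Keeping the routing in a single transcription consumer is what allows one hotkey and one microphone pipeline to drive several agents.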
Editorial Opinion
Dictare represents a significant step forward in making AI coding agents more accessible through natural voice interaction. By prioritizing privacy through local processing and creating an open protocol rather than a closed ecosystem, the project demonstrates how developer tools can be both powerful and user-respecting. The elimination of window-focus requirements is a clever technical solution that removes a major friction point in existing voice tools, making voice-driven development more practical for real-world workflows.