Voice typing for Linux, published on PyPI
A desktop voice typing tool for Linux. Hold a hotkey, speak, release -- your words are typed at the cursor. Built because existing voice typing tools were either Mac-only, required a browser, or had unacceptable latency. Published on PyPI as a pip-installable package.
Audio capture uses sounddevice with a callback-buffered InputStream at 16 kHz mono. Recording starts when the hotkey (Right Ctrl or F8, detected via pynput) is pressed and stops when it is released; the captured audio is then encoded as WAV into an in-memory BytesIO buffer.
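A minimal sketch of the capture path described above, not the project's actual code. The function names (chunks_to_wav, record_until), the int16 sample format, and the use of a threading.Event to signal hotkey release are assumptions made for illustration. sounddevice is imported lazily so the WAV-encoding helper can be used without audio hardware.

```python
import io
import wave

SAMPLE_RATE = 16_000  # 16 kHz, as described above
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample; int16 is an assumed sample format


def chunks_to_wav(chunks):
    """Encode raw int16 PCM chunks as a WAV file held entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"".join(chunks))
    return buf.getvalue()


def record_until(stop_event):
    """Record from the default input device until stop_event is set."""
    import sounddevice as sd  # lazy import: requires PortAudio and NumPy

    chunks = []

    def on_audio(indata, frames, time, status):
        # Runs on the audio thread for every block. tobytes() copies the
        # samples, which matters because the input buffer is reused.
        chunks.append(indata.tobytes())

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=CHANNELS,
                        dtype="int16", callback=on_audio):
        # Block until another thread sets the event, for example a
        # pynput key-release handler (hypothetical wiring).
        stop_event.wait()
    return chunks_to_wav(chunks)
```

Keeping the callback limited to appending bytes means the audio thread does very little work; encoding happens once, after the stream has closed.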
Transcription follows a Groq-first, local-fallback strategy: it first calls the Groq API (whisper-large-v3-turbo model), retrying with exponential backoff on 429 and 5xx errors. On any failure it falls back to local faster-whisper (distil-large-v3, CPU int8, VAD filter enabled).
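The retry-then-fallback flow can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: RetryableError, is_retryable_status, the remote and local callables, and the default attempt count and delays are all hypothetical. The real Groq and faster-whisper calls are left out and passed in as plain functions.

```python
import time


class RetryableError(Exception):
    """Hypothetical error a remote client raises for HTTP 429 or 5xx."""


def is_retryable_status(status):
    """True for rate limiting (429) and server-side (5xx) responses."""
    return status == 429 or 500 <= status < 600


def transcribe(wav_bytes, remote, local, max_attempts=3, base_delay=0.5,
               sleep=time.sleep):
    """Try the remote transcriber with backoff, then fall back to local."""
    for attempt in range(max_attempts):
        try:
            return remote(wav_bytes)
        except RetryableError:
            if attempt == max_attempts - 1:
                break  # retries exhausted, use the local model
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
        except Exception:
            break  # any other failure goes straight to the fallback
    return local(wav_bytes)
```

Passing sleep as a parameter is a design choice for this sketch: it lets the backoff schedule be checked without actually waiting.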
Intent classification is pure keyword matching -- no LLM in the loop. It checks the transcript for action keywords (open, find, search, grep, copy) and returns 'act' or 'ask'. Action planning then maps the transcript to one of six tools: find_file (ripgrep), search_repo, open_file, open_url, open_app, copy_text.
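A keyword classifier of this kind might look like the sketch below. The keyword list comes from the text above; the function name, the whole-word tokenization, and the rule that a keyword hit means 'act' while everything else means 'ask' are assumptions.

```python
import re

# Keywords listed in the description above.
ACTION_KEYWORDS = {"open", "find", "search", "grep", "copy"}


def classify_intent(transcript):
    """Return 'act' if any action keyword appears as a whole word, else 'ask'."""
    # Tokenize into lowercase words so "copyright" does not match "copy".
    words = re.findall(r"[a-z']+", transcript.lower())
    return "act" if any(word in ACTION_KEYWORDS for word in words) else "ask"
```

For example, classify_intent("Open the README") yields 'act', while classify_intent("what time is it") yields 'ask'. Matching on whole words rather than substrings avoids false positives, at the cost of missing inflected forms such as "opening".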
Text injection uses xclip for X11 and wl-copy for Wayland. App-aware cleanup applies context-specific transformations. Custom dictionaries and snippets are stored in ~/.config/undertone/.
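Choosing between the two clipboard tools could be sketched as follows. The helper names and the session-detection rule (checking WAYLAND_DISPLAY and XDG_SESSION_TYPE) are assumptions; only the use of xclip on X11 and wl-copy on Wayland comes from the description above. How the text is subsequently pasted at the cursor is not covered here.

```python
import os
import subprocess


def clipboard_command(env=None):
    """Pick the clipboard tool for the current display server."""
    env = os.environ if env is None else env
    # Assumed detection rule: either variable indicates a Wayland session.
    if env.get("WAYLAND_DISPLAY") or env.get("XDG_SESSION_TYPE") == "wayland":
        return ["wl-copy"]
    return ["xclip", "-selection", "clipboard"]


def copy_to_clipboard(text, env=None):
    """Send text to the clipboard; both tools read the payload from stdin."""
    subprocess.run(clipboard_command(env), input=text.encode("utf-8"),
                   check=True)
```

Accepting the environment as a parameter keeps the selection logic testable on machines that have neither tool installed.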
The TTS system has five backends (three local: Qwen3-TTS, Supertonic v2 ONNX, KittenTTS Mini; two hosted: Inworld, ElevenLabs) and a built-in bakeoff function for comparing them side by side.