voice — TTS & STT tool for AI agents
voice speaks text aloud using Kokoro TTS and transcribes speech using Moonshine STT on Apple Silicon. Use it to talk to your user, listen for their response, or run a full voice conversation loop.
Quick reference
Speak (TTS)
# Speak text (backward compatible — no subcommand needed)
voice Hello, I finished the task.
# Explicit say subcommand with options
voice say -v am_michael "Switching to a male voice."
# Speak from a pipe
echo "Build complete." | voice say
# Read a file aloud (strip markdown first)
voice say --markdown -f README.md
# Save to WAV instead of playing
voice say -o result.wav "Here is your audio."
# Precise pronunciation via IPA phonemes
voice say --phonemes "həlˈO wˈɜɹld"
Converse (speak + listen)
# Speak text, then immediately listen for a response
voice converse "How are you today?"
# With voice and speed options
voice converse -v am_michael -s 1.2 "What do you think about that?"
Listen (STT)
# Record from mic, transcribe on Enter/Ctrl+C
voice listen
# Continuous mode — transcribe segments as you speak, split on silence
voice listen --continuous
# Transcribe a WAV file
voice transcribe recording.wav
JSON-RPC server
# Start the server (for programmatic control)
voice serve -v am_michael
# Speak
→ {"jsonrpc":"2.0","method":"speak","params":{"text":"Hello"},"id":1}
← {"jsonrpc":"2.0","result":{"duration_ms":1800,"chunks":1},"id":1}
# Listen (ding plays, records, auto-stops on silence)
→ {"jsonrpc":"2.0","method":"listen","id":2}
← {"jsonrpc":"2.0","result":{"text":"I heard you","tokens":4,"duration_ms":3200},"id":2}
# Cancel current playback or recording
→ {"jsonrpc":"2.0","method":"cancel","id":3}
# Other methods: set_voice, set_speed, list_voices, ping
When to use
- Get attention: Speak when a long task finishes, a build fails, or you need input
- Read content: Pipe text through
voice sayto read back docs, errors, or summaries - Confirm actions: "Deploying to production" before doing something irreversible
- Listen for input: Use
voice listento capture a spoken response from the user - Voice conversation: Use
voice converseto speak then listen in one shot, orvoice servefor programmatic control - Transcribe recordings: Use
voice transcribeto convert audio files to text
Tips
TTS tips
- Use
-qfor quiet mode — suppresses phonemes and progress, only errors print - For long text,
voiceautomatically chunks at ~510 phonemes and streams playback - Stderr shows phoneme output — useful for debugging pronunciation
- Use
--sub word=replacementto fix names:voice say --sub kubectl=cube-cuddle "Restarting kubectl" - A
.voice-subsfile in the project root is auto-discovered for persistent fixes - Wrap substitution values in
/slashes/for raw phoneme overrides:Kokoro=/kˈOkəɹO/
STT tips
- A ding sound plays when the mic is ready — wait for it before speaking
- Bluetooth mics (AirPods) have ~0.5s latency; the ding helps you time it
- Noise floor is calibrated automatically — works with MacBook mic or AirPods
- Use
STT_MODEL=UsefulSensors/moonshine-tinyfor faster (but less accurate) transcription - Default model is
moonshine-base(61M params, ~50× real-time on Apple Silicon)
JSON-RPC tips
voice serveloads the TTS model at startup; STT model loads lazily on firstlistencancelinterrupts the current speak or listen mid-operationspeaksupports per-requestvoiceandspeedoverrides without changing defaultslistenparams are tunable:noise_multiplier,calibration_ms,silence_timeout_ms- Notifications (requests without
id) are fire-and-forget — no response returned
Subcommands
| Command | What it does |
|---|---|
voice <text> | Speak text (implicit say, backward compatible) |
voice say | Speak text with full TTS options |
voice converse | Speak text, then listen for a response |
voice listen | Record from mic, transcribe once |
voice listen --continuous | Record and transcribe segments continuously |
voice transcribe <file> | Transcribe a WAV file |
voice serve | Start JSON-RPC server on stdin/stdout |
Builtin voices (no network)
af_heart (default), af_bella, af_sarah, af_sky, am_michael, am_adam, bf_emma
Install
git clone https://github.com/rgbkrk/voice.git
cd voice
cargo install --path crates/voice-cli
Requires macOS with Apple Silicon, Git LFS, and Rust 1.85+. TTS model weights download on first voice say (~312MB, cached). STT model weights download on first voice listen (~246MB, cached).