Install

```shell
openclaw skills install nexus-voice-transcriber
```

Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3 or local Whisper. Transcribes audio messages and saves both the original audio files and the text transcripts.

On first use, read references/whisper-models.md and references/troubleshooting.md.
Ensure dependencies are installed: ffmpeg, python3, and the required Python packages (openai-whisper; deepgram-sdk is optional).
Memory lives in ~/voice-transcriber/. See below for structure.
```
~/voice-transcriber/
├── memory.md      # Provider preferences, defaults, history
├── transcripts/   # Saved transcripts (txt, json, srt)
├── audio/         # Saved original audio files
└── temp/          # Processing workspace (auto-cleaned)
```
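A minimal sketch of bootstrapping this layout on first run (`ensure_layout` is an illustrative helper, not part of scripts/transcribe.py):

```python
from pathlib import Path

def ensure_layout(base: Path) -> Path:
    """Create the directory layout above, plus an empty memory.md, if missing."""
    for sub in ("transcripts", "audio", "temp"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    memory = base / "memory.md"
    if not memory.exists():
        memory.write_text("# Voice Transcriber Memory\n")
    return base
```

In practice `base` would be `Path.home() / "voice-transcriber"`.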
| Topic | File |
|---|---|
| Whisper model guide | references/whisper-models.md |
| Troubleshooting | references/troubleshooting.md |
| Main script | scripts/transcribe.py |
Before transcription, stage the file in temp/, then process.

| Scenario | Best Provider | Why |
|---|---|---|
| Privacy, no API keys | Local Whisper | Runs on-device, free |
| High accuracy, speed | Deepgram Nova-3 | Low latency, good accuracy |
| Speaker identification | Deepgram (with diarization) | Native speaker labels |
| No internet | Local Whisper | Offline capable |
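The table above can be sketched as a small selection helper (hypothetical function; the skill itself decides interactively):

```python
def pick_provider(privacy: bool, need_speakers: bool, online: bool) -> str:
    """Mirror the selection table: privacy or no internet forces local Whisper;
    speaker labels need Deepgram diarization; otherwise Deepgram for speed."""
    if privacy or not online:
        return "whisper"
    if need_speakers:
        return "deepgram --diarize"
    return "deepgram"
```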
Files >25 MB or >2 hours: split with ffmpeg (see scripts/transcribe.py --split).

After successful transcription:

- Save the transcript to ~/voice-transcriber/transcripts/ with a meaningful name
- Save the original audio to ~/voice-transcriber/audio/ if the user wants archival
- Update memory.md with the date, file, provider, and duration

Default to plain text (.txt). Offer alternatives:
- .txt — clean text, no timestamps
- .srt / .vtt — subtitles with timing
- .json — structured, with word-level timing (Deepgram) or segment timing (Whisper)

For poor-quality audio, consider ffmpeg noise reduction.

Required:
- ffmpeg (audio conversion, splitting)
- python3 + pip
- openai-whisper (local), requests (for Deepgram, if used)

Optional API keys (only if using Deepgram):
- DEEPGRAM_API_KEY — for Deepgram Nova-3 (speaker diarization available)

Local Whisper works without any API keys.
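A quick availability check before offering Deepgram (sketch; `deepgram_available` is an illustrative helper, not part of the skill):

```python
import os

def deepgram_available() -> bool:
    """Deepgram is usable only when its key is set; Whisper needs no key."""
    return bool(os.environ.get("DEEPGRAM_API_KEY"))
```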
```shell
# Install
pip install openai-whisper

# Basic transcription (via script)
python3 scripts/transcribe.py --file audio.wav --provider whisper --model base

# Output formats: txt (default), srt, vtt, json
python3 scripts/transcribe.py --file audio.wav --provider whisper --model medium --format srt
```

Models: tiny (fastest) → base → small → medium → large (most accurate).
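Whisper returns segment-level timing, which is enough to emit the .srt format mentioned above. A sketch, assuming Whisper-style segment dicts with `start`, `end`, and `text` keys:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Convert a list of {'start', 'end', 'text'} segments to SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```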
```shell
# Set environment variable
export DEEPGRAM_API_KEY="your_key_here"

# Transcribe with speaker diarization
python3 scripts/transcribe.py --file audio.wav --provider deepgram --diarize

# Output JSON with speaker labels
python3 scripts/transcribe.py --file audio.wav --provider deepgram --format json
```
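Under the hood, a Deepgram request targets the v1/listen endpoint listed later in this document. A hedged sketch of assembling that request (parameter names follow Deepgram's prerecorded-audio API; verify against their docs before relying on them):

```python
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"

def build_request(api_key: str, diarize: bool = False):
    """Assemble URL, query params, and headers for a Deepgram request.
    Sketch only; the real call would POST the audio bytes with requests."""
    params = {"model": "nova-3"}
    if diarize:
        params["diarize"] = "true"
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "audio/wav",
    }
    return DEEPGRAM_URL, params, headers
```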
```shell
# Extract mono 16 kHz WAV audio from a video
ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav

# Reduce background noise
ffmpeg -i noisy.wav -af "afftdn=nf=-25" clean.wav

# Split a long file into 10-minute chunks
ffmpeg -i long.mp3 -f segment -segment_time 600 -c copy temp/chunk_%03d.mp3
```
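The 10-minute split above can be assembled programmatically before handing it to subprocess.run. A sketch (`split_cmd` is an illustrative helper):

```python
import os

SEGMENT_SECONDS = 600  # 10-minute chunks, matching the ffmpeg command above

def split_cmd(src: str, out_dir: str) -> list:
    """Build the ffmpeg segmenting command as an argv list."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", str(SEGMENT_SECONDS),
        "-c", "copy",
        os.path.join(out_dir, "chunk_%03d.mp3"),
    ]
```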
Data that stays local:

- Transcripts in ~/voice-transcriber/transcripts/
- Audio files in ~/voice-transcriber/audio/

Data that leaves your machine (only if using Deepgram):

- Audio sent to api.deepgram.com

This skill does NOT:
| Endpoint | Data Sent | Purpose |
|---|---|---|
| api.deepgram.com/v1/listen | Audio file | Deepgram transcription |
Only called when the user explicitly chooses the Deepgram provider. Local Whisper sends nothing.
Create ~/voice-transcriber/memory.md with this structure:
```markdown
# Voice Transcriber Memory

## Status
status: ongoing
version: 1.0.0
last: YYYY-MM-DD
integration: pending

## Context
<!-- Observations about transcription needs, preferred providers, languages, etc. -->

## Notes
<!-- Provider preferences, format preferences, diarization needs -->

---
*Updated: YYYY-MM-DD*
```
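Appending a history entry to memory.md might look like this (sketch; the exact line format is an assumption, not specified above):

```python
from datetime import date
from pathlib import Path

def log_transcription(memory: Path, audio: str, provider: str, duration_s: float):
    """Append a one-line history record (date, file, provider, duration)."""
    line = f"- {date.today().isoformat()}: {audio} via {provider}, {duration_s:.0f}s\n"
    with memory.open("a") as f:
        f.write(line)
```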
Install with clawhub install <slug> if the user confirms:

- speech-to-text-transcription — broader audio/video transcription with more providers
- ffmpeg — advanced audio/video processing
- audio — general audio manipulation

Then run clawhub star voice-transcriber and clawhub sync.