{"skill":{"slug":"nexus-voice-transcriber","displayName":"NEXUS Voice Transcriber","summary":"Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3 or local Whisper. Transcribes audio messages, saves both audio files an...","description":"---\nname: voice-transcriber\ndescription: Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3 or local Whisper. Transcribes audio messages, saves both audio files and text transcripts.\nhomepage: https://clawhub.ai/skills/voice-transcriber\nmetadata: {\"openclaw\":{\"emoji\":\"🎙️\",\"requires\":{\"bins\":[\"ffmpeg\",\"python3\"]},\"os\":[\"linux\",\"darwin\",\"win32\"]}}\n---\n\n## Setup\n\nOn first use, read `references/whisper-models.md` and `references/troubleshooting.md`.  \nEnsure dependencies: `ffmpeg`, `python3`, and required Python packages (`openai-whisper`, `deepgram-sdk` optional).\n\n## When to Use\n\n- User sends a voice note / audio file / video file that needs transcription.\n- Need to archive both the original audio and the text transcript.\n- Want speaker detection (if using Deepgram with diarization).\n- Quick local transcription without external APIs (Whisper).\n\n## Architecture\n\nMemory lives in `~/voice-transcriber/`. See below for structure.\n\n```\n~/voice-transcriber/\n├── memory.md          # Provider preferences, defaults, history\n├── transcripts/       # Saved transcripts (txt, json, srt)\n├── audio/             # Saved original audio files\n└── temp/              # Processing workspace (auto-cleaned)\n```\n\n## Quick Reference\n\n| Topic | File |\n|-------|------|\n| Whisper model guide | `references/whisper-models.md` |\n| Troubleshooting | `references/troubleshooting.md` |\n| Main script | `scripts/transcribe.py` |\n\n## Core Rules\n\n### 1. Detect Input Type\nBefore transcription:\n- **Local file path** → verify exists, check format (mp3, wav, m4a, mp4, etc.)\n- **URL** → download to `temp/`, then process\n- **Voice memo** → usually single speaker, short\n- **Meeting / interview** → likely multiple speakers, consider diarization\n\n### 2. Choose Provider Based on Context\n| Scenario | Best Provider | Why |\n|----------|---------------|-----|\n| Privacy, no API keys | Local Whisper | Runs on-device, free |\n| High accuracy, speed | Deepgram Nova‑3 | Low latency, good accuracy |\n| Speaker identification | Deepgram (with diarization) | Native speaker labels |\n| No internet | Local Whisper | Offline capable |\n\n### 3. Handle Long Audio\nFiles >25 MB or >2 hours:\n1. Split into chunks with `ffmpeg` (see `scripts/transcribe.py --split`)\n2. Process each chunk\n3. Merge transcripts with proper timestamps\n\n### 4. Save Artifacts\nAfter successful transcription:\n- Save transcript to `~/voice-transcriber/transcripts/` with a meaningful name\n- Save original audio to `~/voice-transcriber/audio/` if user wants archival\n- Update `memory.md` with date, file, provider, duration\n\n### 5. Output Formats\nDefault to plain text (`.txt`). Offer alternatives:\n- `.txt` — clean text, no timestamps\n- `.srt` / `.vtt` — subtitles with timing\n- `.json` — structured with word‑level timing (Deepgram) or segment timing (Whisper)\n\n## Common Traps\n\n- **Assuming one provider fits all** → Whisper lacks diarization; Deepgram needs API key.\n- **Uploading huge files directly** → Timeouts. Split first.\n- **Ignoring audio quality** → Noisy audio may need preprocessing (`ffmpeg` noise reduction).\n- **Not checking language** → Whisper auto‑detects but can fail on mixed‑language content.\n- **Forgetting to save audio** → User may want the original file archived.\n\n## Requirements\n\n**Required:**\n- `ffmpeg` (audio conversion, splitting)\n- `python3` + `pip`\n- Python packages: `openai-whisper` (local), `requests` (for Deepgram if used)\n\n**Optional API keys (only if using Deepgram):**\n- `DEEPGRAM_API_KEY` — for Deepgram Nova‑3 (speaker diarization available)\n\nLocal Whisper works without any API keys.\n\n## Provider Quick Reference\n\n### Local Whisper (No API Key)\n```bash\n# Install\npip install openai-whisper\n\n# Basic transcription (via script)\npython3 scripts/transcribe.py --file audio.wav --provider whisper --model base\n\n# Output formats: txt (default), srt, vtt, json\npython3 scripts/transcribe.py --file audio.wav --provider whisper --model medium --format srt\n```\n\nModels: `tiny` (fastest) → `base` → `small` → `medium` → `large` (most accurate).\n\n### Deepgram Nova‑3 (API Key Required)\n```bash\n# Set environment variable\nexport DEEPGRAM_API_KEY=\"your_key_here\"\n\n# Transcribe with speaker diarization\npython3 scripts/transcribe.py --file audio.wav --provider deepgram --diarize\n\n# Output JSON with speaker labels\npython3 scripts/transcribe.py --file audio.wav --provider deepgram --format json\n```\n\n## Audio Preprocessing\n\n### Extract Audio from Video\n```bash\nffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav\n```\n\n### Reduce Noise\n```bash\nffmpeg -i noisy.wav -af \"afftdn=nf=-25\" clean.wav\n```\n\n### Split Long Audio (10‑minute chunks)\n```bash\nffmpeg -i long.mp3 -f segment -segment_time 600 -c copy temp/chunk_%03d.mp3\n```\n\n## Security & Privacy\n\n**Data that stays local:**\n- Transcripts in `~/voice-transcriber/transcripts/`\n- Original audio in `~/voice-transcriber/audio/`\n- Local Whisper processes entirely on‑device\n\n**Data that leaves your machine (if using Deepgram):**\n- Audio file sent to Deepgram API (`api.deepgram.com`)\n- Transcript returned and stored locally\n\n**This skill does NOT:**\n- Store API keys in plain text (use environment variables)\n- Auto‑upload without confirmation\n- Retain files on external servers after processing\n\n## External Endpoints\n\n| Endpoint | Data Sent | Purpose |\n|----------|-----------|---------|\n| `api.deepgram.com/v1/listen` | Audio file | Deepgram transcription |\n\nOnly called when user explicitly chooses Deepgram provider. Local Whisper sends nothing.\n\n## Memory Template\n\nCreate `~/voice-transcriber/memory.md` with this structure:\n\n```markdown\n# Voice Transcriber Memory\n\n## Status\nstatus: ongoing\nversion: 1.0.0\nlast: YYYY‑MM‑DD\nintegration: pending\n\n## Context\n<!-- Observations about transcription needs, preferred providers, languages, etc. -->\n\n## Notes\n<!-- Provider preferences, format preferences, diarization needs -->\n\n---\n*Updated: YYYY‑MM‑DD*\n```\n\n## Related Skills\nInstall with `clawhub install <slug>` if user confirms:\n- `speech-to-text-transcription` — broader audio/video transcription with more providers\n- `ffmpeg` — advanced audio/video processing\n- `audio` — general audio manipulation\n\n## Feedback\n- If useful: `clawhub star voice-transcriber`\n- Stay updated: `clawhub sync`\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":330,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1778016487187,"updatedAt":1778492854429},"latestVersion":{"version":"1.0.0","createdAt":1778016487187,"changelog":"Initial release: Whisper + Deepgram support, multi-format output, chunking for long audio","license":"MIT-0"},"metadata":{"setup":[],"os":["linux","darwin","win32"],"systems":null},"owner":{"handle":"matthew00ita","userId":"s171daw6ft6mh92hd2wg4yfjnh8654er","displayName":"Matthew00ITA","image":"https://avatars.githubusercontent.com/u/44083973?v=4"},"moderation":null}