ElevenLabs STT OpenClaw

v1.2.2

Transcribe audio files with ElevenLabs Speech-to-Text (Scribe v2) from the local CLI. Supports diarization, events, JSON output, webhooks, and advanced STT o...

Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (ElevenLabs STT CLI) match the implementation: scripts call ElevenLabs STT and TTS endpoints, use ffmpeg for audio conversion, websocat for realtime websockets, and jq/curl for API interaction. Required binaries and ELEVENLABS_API_KEY are appropriate for the stated functionality.
Instruction Scope
The SKILL.md and scripts consistently instruct sending audio to ElevenLabs endpoints and optionally streaming microphone audio. One minor mismatch: README and transcribe.sh require ALLOW_LOCAL_FILE=true to operate on local files, but SKILL.md metadata lists only ELEVENLABS_API_KEY. Scripts also read optional env vars (ELEVENLABS_VOICE_ID, RT_*), which are not declared in the metadata. Aside from that, the instructions stay within the stated STT/TTS scope and do not attempt to read unrelated host files or secrets.
Install Mechanism
No install spec; this is an instruction-and-script bundle that relies on existing system binaries. No remote downloads or archive extraction are performed by an installer, which keeps install risk low.
Credentials
Only ELEVENLABS_API_KEY is required and is appropriate for calling ElevenLabs APIs. The scripts reference additional environment variables (ALLOW_LOCAL_FILE, ELEVENLABS_VOICE_ID, RT_DEVICE/RT_LANG/RT_TTS/RT_VOICE_ID) that are optional/defaulted but not declared in the skill metadata — this mismatch should be noted before use. No unrelated credentials are requested.
Persistence & Privilege
The skill does not request persistent or elevated platform privileges: always is set to false, and it does not modify other skills or system-wide agent configuration.
Assessment
This skill appears to do what it claims: it streams or uploads audio to ElevenLabs for transcription and optionally uses ElevenLabs TTS for playback. Before installing or using it:

  1. Your audio and your ELEVENLABS_API_KEY will be sent to api.elevenlabs.io. Use the skill only with audio you are willing to share with that service.
  2. To transcribe local files you must set ALLOW_LOCAL_FILE=true (the README and scripts require it, but the SKILL.md metadata does not list it).
  3. The live-listen mode captures microphone audio. Run it only when you intend to stream mic input.
  4. Inspect the scripts locally (they are plain shell and Python) and consider using a dedicated API key with limited permissions or usage limits.
  5. If you plan to use webhooks, make sure the webhook endpoint you register is secure, because ElevenLabs will deliver transcription results to it.

If any of this behavior is unexpected, do not run the scripts, and rotate or revoke any keys used for testing.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: curl, jq, python3, ffmpeg, websocat
Env: ELEVENLABS_API_KEY
461 downloads
0 stars
3 versions
Updated 1mo ago
MIT-0

ElevenLabs Speech-to-Text (Local CLI)

Use

Run scripts/transcribe.sh with a local audio file path, or pass a remote file with --url.

Examples:

scripts/transcribe.sh /path/to/audio.mp3
scripts/transcribe.sh /path/to/audio.mp3 --diarize --lang en
scripts/transcribe.sh /path/to/audio.mp3 --json
scripts/transcribe.sh /path/to/audio.mp3 --webhook --webhook-metadata '{"job":"call-001"}'
scripts/transcribe.sh --url https://example.com/audio.mp3 --lang en
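
To transcribe several files in one go, the single-file form above can be wrapped in a loop. The following is a minimal sketch and not part of the skill: the function name transcribe_dir and the TRANSCRIBE override variable are hypothetical, and it assumes the script prints the transcript to standard output.

```shell
#!/bin/sh
# Hypothetical helper: transcribe every .mp3 in a directory, one file at a time.
# Assumption: scripts/transcribe.sh writes the transcript to stdout.
# TRANSCRIBE can be overridden (for example with a stub) for dry runs.
TRANSCRIBE="${TRANSCRIBE:-scripts/transcribe.sh}"

transcribe_dir() {
  dir=$1
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue        # the glob matched nothing
    out="${f%.mp3}.txt"            # write the transcript next to the audio file
    if "$TRANSCRIBE" "$f" > "$out"; then
      echo "ok: $f"
    else
      echo "failed: $f" >&2
      rm -f "$out"                 # do not leave a partial transcript behind
    fi
  done
}
```

Calling transcribe_dir ./recordings would process the files sequentially, which fits a script that serializes requests anyway.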

Environment

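Set ELEVENLABS_API_KEY in your shell or OpenClaw env before running. According to the security scan above, transcribing local files also requires ALLOW_LOCAL_FILE=true.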
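
A quick preflight check can catch a missing key or tool before any audio is sent. This is a sketch under stated assumptions, not something shipped with the skill: missing_bins and preflight are hypothetical names, and the tool list is taken from the runtime requirements listed on this page.

```shell
#!/bin/sh
# Hypothetical preflight check; not part of the skill.

# Print the name of each given tool that is not found on PATH.
missing_bins() {
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || echo "$bin"
  done
}

# Return 0 only if the API key is set and all listed tools are available.
preflight() {
  status=0
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    status=1
  fi
  missing=$(missing_bins curl jq python3 ffmpeg websocat)
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "missing tools:" $missing >&2
    status=1
  fi
  return "$status"
}
```

For example, preflight && scripts/transcribe.sh /path/to/audio.mp3 would only start a transcription when the check passes.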

Notes

  • Defaults to scribe_v2 (the Speech-to-Text model) and uses a filesystem lock to avoid parallel requests.
  • Requires curl and jq.
  • For async workflows, use --webhook with optional --webhook-id and --webhook-metadata.
  • Realtime streaming is available via scripts/realtime.sh (requires ffmpeg + websocat) and uses the scribe_v2_realtime model.
  • Live listener mode is available via scripts/live_listen.sh with toggle/always-on modes and optional TTS response.
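
The filesystem lock mentioned in the first note can be pictured with a small wrapper. This is only an illustration of the general technique: the skill's actual locking code is not shown here and may work differently, and with_lock, LOCK_DIR, and the exit code 75 are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical sketch of a filesystem lock; the skill's own mechanism may differ.
# mkdir is atomic: it succeeds for exactly one caller and fails if the directory exists.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/elevenlabs-stt.lock}"

with_lock() {
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "another transcription is running (lock: $LOCK_DIR)" >&2
    return 75
  fi
  "$@"                 # run the wrapped command while holding the lock
  rc=$?
  rmdir "$LOCK_DIR"    # release the lock
  return "$rc"
}
```

A call such as with_lock scripts/transcribe.sh /path/to/audio.mp3 would refuse to start while another wrapped run holds the lock. Note that this sketch leaves the lock directory behind if the shell is killed mid-run; a more careful version would add a trap to clean it up.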
