Elevenlabs Transcribe

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 2 · 2.2k · 2 current installs · 3 all-time installs

by@PaulAsjes

MIT-0

Security Scan

VirusTotal

Benign

View report →

OpenClaw

Suspicious

medium confidence

ℹ

Purpose & Capability

The scripts and declared requirements (ffmpeg, python3, ELEVENLABS_API_KEY) align with a speech-to-text skill using ElevenLabs. However, SKILL.md calls this the 'Official ElevenLabs skill' while the registry 'Source' is unknown and the owner ID does not obviously belong to ElevenLabs — possible impersonation or mislabeling.

✓

Instruction Scope

The runtime instructions and scripts stay within the stated purpose: convert audio (file, mic, URL) to text and send audio to ElevenLabs via their SDK. One minor scope note: the Python code calls load_dotenv(), which will read a local .env file if present — that can surface other environment variables from disk (not declared in requires.env).

ℹ

Install Mechanism

There is no platform install spec, but the provided shell wrapper auto-creates a local virtualenv and runs pip install -r requirements.txt. Main dependencies are pinned with hashes for supply-chain integrity (elevenlabs, pydub, python-dotenv), but some platform-specific packages (sounddevice, numpy) are not hashed. pip installs from PyPI on first run (network activity) and writes a .venv directory under the skill folder.

✓

Credentials

Only ELEVENLABS_API_KEY is declared and used; that is appropriate for a transcription client. Note that load_dotenv() may read a .env file from disk and load additional env vars implicitly. The code will transmit audio and the API key (via the ElevenLabs SDK) to ElevenLabs' service — this is expected behavior but worth confirming you're comfortable sending audio to that provider.

✓

Persistence & Privilege

The skill does not request always:true and won't be force-included. It sets up a per-skill .venv and an installed marker in the skill directory; it doesn't modify other skills or system-wide agent settings.

What to consider before installing

This skill's code behaves like a normal ElevenLabs transcription client, but before installing: 1) Confirm the publisher — SKILL.md claims 'Official ElevenLabs' but the source/owner are not ElevenLabs; prefer official plugins from the vendor when possible. 2) Review and protect your ELEVENLABS_API_KEY (use a scoped/test key if possible). 3) Be aware the script will create a local .venv and pip-install packages from PyPI (network activity); consider installing in an isolated environment/container. 4) Note load_dotenv() will read a .env file in the skill directory and could load other env vars — remove secrets you don't want read. 5) If you need stronger supply-chain guarantees, request that all requirements be pinned with verified hashes for every platform-specific package or run the code review/install inside an isolated sandbox first.

Like a lobster shell, security has layers — review code before you run it.

Current versionv1.0.1

Download zip

latestvk974g0tvkwspjd8zrrxg94dqpn80exh8

License

MIT-0

Free to use, modify, and redistribute. No attribution required.

Termshttps://spdx.org/licenses/MIT-0.html

Runtime requirements

🎙️ Clawdis

Binsffmpeg, python3

EnvELEVENLABS_API_KEY

Primary envELEVENLABS_API_KEY

SKILL.md

ElevenLabs Speech-to-Text

Official ElevenLabs skill for speech-to-text transcription.

Convert audio to text with state-of-the-art accuracy. Supports 90+ languages, speaker diarization, and realtime streaming.

Prerequisites

ffmpeg installed (brew install ffmpeg on macOS)
ELEVENLABS_API_KEY environment variable set
Python 3.8+ (dependencies auto-install on first run)

Usage

{baseDir}/scripts/transcribe.sh <audio_file> [options]
{baseDir}/scripts/transcribe.sh --url <stream_url> [options]
{baseDir}/scripts/transcribe.sh --mic [options]

Examples

Batch Transcription

Transcribe a local audio file:

{baseDir}/scripts/transcribe.sh recording.mp3

With speaker identification:

{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize

Get full JSON response with timestamps:

{baseDir}/scripts/transcribe.sh interview.wav --diarize --json

Realtime Streaming

Stream from a URL (e.g., live radio, podcast):

{baseDir}/scripts/transcribe.sh --url https://npr-ice.streamguys1.com/live.mp3

Transcribe from microphone:

{baseDir}/scripts/transcribe.sh --mic

Stream a local file in realtime (useful for testing):

{baseDir}/scripts/transcribe.sh audio.mp3 --realtime

Quiet Mode for Agents

Suppress status messages on stderr:

{baseDir}/scripts/transcribe.sh --mic --quiet

Options

Option	Description
`--diarize`	Identify different speakers in the audio
`--lang CODE`	ISO language hint (e.g., `en`, `pt`, `es`, `fr`)
`--json`	Output full JSON with timestamps and metadata
`--events`	Tag audio events (laughter, music, applause)
`--realtime`	Stream local file instead of batch processing
`--partials`	Show interim transcripts during realtime mode
`-q, --quiet`	Suppress status messages (recommended for agents)

Output Format

Text Mode (default)

Plain text transcription:

The quick brown fox jumps over the lazy dog.

JSON Mode (`--json`)

{
  "text": "The quick brown fox jumps over the lazy dog.",
  "language_code": "eng",
  "language_probability": 0.98,
  "words": [
    {"text": "The", "start": 0.0, "end": 0.15, "type": "word", "speaker_id": "speaker_0"}
  ]
}

Realtime Mode

Final transcripts print as they're committed. With --partials:

[partial] The quick
[partial] The quick brown fox
The quick brown fox jumps over the lazy dog.

Supported Formats

Audio: MP3, WAV, M4A, FLAC, OGG, WebM, AAC, AIFF, Opus Video: MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GPP

Limits: Up to 3GB file size, 10 hours duration

Error Handling

The script exits with non-zero status on errors:

Missing API key: Set ELEVENLABS_API_KEY environment variable
File not found: Check the file path exists
Missing ffmpeg: Install with your package manager
API errors: Check API key validity and rate limits

When to Use Each Mode

Scenario	Command
Transcribe a recording	`./transcribe.sh file.mp3`
Meeting with multiple speakers	`./transcribe.sh meeting.mp3 --diarize`
Live radio/podcast stream	`./transcribe.sh --url <url>`
Voice input from user	`./transcribe.sh --mic --quiet`
Need word timestamps	`./transcribe.sh file.mp3 --json`

Files

4 total

Select a file

Select a file to preview.

Comments

Loading comments…