ElevenLabs STT OpenClaw

Security checks across malware telemetry and agentic risk

Overview

The skill appears to provide ElevenLabs transcription as advertised, but its live-audio modes and API-key handling create privacy and credential-exposure risks users should review first.

Install only if you are comfortable sending selected audio, microphone streams, transcript text, and any configured webhook metadata to ElevenLabs or chosen endpoints. Prefer using it on a single-user machine, rotate the ElevenLabs API key if exposed, avoid sensitive recordings unless consent is clear, and review or modify the scripts to avoid putting API keys in command-line arguments.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (12)

Tainted flow: 'API_KEY' from os.environ.get (line 8, credential/environment) → subprocess.Popen (code execution)

Medium
Category
Data Flow
Content
ffmpeg = subprocess.Popen(ffmpeg_cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

# websocat websocket
ws = subprocess.Popen([
    "websocat", WS_URL, "-t", "-H", f"xi-api-key: {API_KEY}"
], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)
Confidence
96% confidence
Finding
ws = subprocess.Popen([ "websocat", WS_URL, "-t", "-H", f"xi-api-key: {API_KEY}" ], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)

Tainted flow: 'tts_cmd' from os.environ.get (line 59, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
"-d", json.dumps({"text": text})
    ]
    with open(tmp.name, "wb") as f:
        subprocess.run(tts_cmd, stdout=f, stderr=subprocess.DEVNULL)

    # Validate MP3 header (avoid AudioFileOpen errors on bad payloads)
    try:
Confidence
93% confidence
Finding
subprocess.run(tts_cmd, stdout=f, stderr=subprocess.DEVNULL)

Lp3

Medium
Category
MCP Least Privilege
Confidence
89% confidence
Finding
The skill declares no explicit permissions even though it requires environment access and shell execution, which hides its true execution capabilities from users and any policy layer that relies on declarations. In this context, the skill invokes local scripts, external binaries, and an API key, so missing permission disclosure reduces transparency and can lead to unintended execution or secret exposure.

Tp4

High
Category
MCP Tool Poisoning
Confidence
95% confidence
Finding
The documented purpose focuses on transcribing audio files, but the skill also supports realtime microphone capture, live listening, and optional speech playback/TTS behaviors that materially expand the privacy and execution risk. This mismatch is dangerous because users may consent to file transcription without realizing the skill can capture live audio or transmit/process additional content through other pathways.

Missing User Warnings

Medium
Confidence
83% confidence
Finding
The README encourages use of webhooks and cloud URL processing without a prominent privacy warning that audio content, transcripts, and metadata may be sent to third-party services. In the broader skill context, this is more sensitive because the skill also supports microphone-based capture and transcription of potentially private conversations, increasing the chance of unintentional data disclosure.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The skill description does not clearly warn that provided audio files or URLs are sent to ElevenLabs, and that webhook-based workflows may further transmit metadata or results to external endpoints. This is a privacy and data-handling transparency issue: users may supply sensitive recordings or URLs under the mistaken impression that processing is purely local because the description emphasizes a local CLI.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The script continuously captures microphone audio and streams it to an external service without any runtime consent prompt or clear in-code user-facing disclosure. In a local agent-skill setting, silent realtime microphone transmission is especially sensitive because users may invoke the tool from trusted automation contexts without realizing live audio leaves the device.
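
One way to address the missing runtime consent prompt is a small gate that runs before any capture process starts. The following is a minimal sketch, not the skill's actual code: the function name `confirm_streaming` and its parameters are hypothetical, and it assumes the script runs in an interactive terminal.

```python
import sys


def confirm_streaming(destination: str, ask=input, out=sys.stderr) -> bool:
    """Tell the user where microphone audio will go and require an explicit yes.

    `ask` and `out` are injectable so the gate can be exercised without a terminal.
    """
    print(f"Microphone audio will be streamed to {destination}.", file=out)
    try:
        answer = ask("Continue? [y/N] ")
    except EOFError:
        # No interactive input available (e.g. invoked from automation): refuse.
        return False
    # Anything other than an explicit "y" or "yes" counts as a refusal.
    return answer.strip().lower() in ("y", "yes")
```

A caller would check the return value before launching the capture and streaming subprocesses and exit if it is `False`. Defaulting to refusal when no input is available is a deliberate choice here, since the finding is specifically about invocation from automation contexts.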

Missing User Warnings

Medium
Confidence
94% confidence
Finding
Transcript text is sent to an external TTS endpoint and stored in a local temporary MP3 without any explicit disclosure, consent, or cleanup. This creates privacy risk for potentially sensitive spoken content and leaves artifacts on disk that may be accessible after execution.
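
The cleanup gap could be closed by wrapping the temporary MP3 in a context manager that removes the file when playback finishes or fails. This is a sketch under assumptions: the helper name `scratch_mp3` is hypothetical, and the skill's real playback step is not shown in the report.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_mp3():
    """Yield a path to a private temporary .mp3 and delete it on exit."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        yield path
    finally:
        try:
            os.remove(path)
        except FileNotFoundError:
            # Already removed by the caller; nothing left to clean up.
            pass
```

The TTS response would be written to the yielded path and played inside the `with` block, so the audio artifact does not outlive the call even if an exception is raised.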

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The script verifies that an ElevenLabs API key is present and then starts realtime microphone streaming to an external speech service, but it does not provide a clear user-facing notice or confirmation at the point of capture/transmission. In a live-listening skill, this creates a real privacy risk because users may unknowingly transmit sensitive spoken content off-device, especially in always-on or toggle modes.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The script streams locally supplied audio content to ElevenLabs over a WebSocket, but it provides no user-facing disclosure that the file's contents will leave the local machine and be sent to a third-party service. In a CLI skill handling potentially sensitive audio, this creates a real privacy and data-handling risk because users may assume local processing unless explicitly warned.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The script sends local audio files, cloud audio URLs, and optional webhook metadata to ElevenLabs' external API, but it provides no explicit user-facing disclosure or consent step at execution time. In a transcription skill, external transmission is expected, but the absence of a clear warning increases the risk of unintentionally exfiltrating sensitive voice content, PII, or secrets contained in audio or metadata.

External Transmission

Medium
Category
Data Exfiltration
Content
CURL_ARGS=(
    -s
    -X POST
    "https://api.elevenlabs.io/v1/speech-to-text"
    -H "xi-api-key: $API_KEY"
    -F "model_id=$MODEL_ID"
    -F "diarize=$DIARIZE"
Confidence
93% confidence
Finding
https://api.elevenlabs.io/
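
For the curl-based request above, the `xi-api-key` header can be kept out of the process argument list by feeding it to curl as a config file on standard input (`--config -`). The sketch below illustrates that idea in Python; the helper names `build_curl_call` and `post_without_key_in_argv` are hypothetical, and the skill's own script is a shell script, so this is an illustration of the technique and not a drop-in patch.

```python
import subprocess


def build_curl_call(api_key: str, url: str) -> tuple[list[str], str]:
    """Return (argv, stdin_config) so the key never appears in argv."""
    # Escape backslashes and quotes for curl's double-quoted config syntax.
    escaped = api_key.replace("\\", "\\\\").replace('"', '\\"')
    config = f'header = "xi-api-key: {escaped}"\n'
    # "--config -" tells curl to read additional options from stdin.
    argv = ["curl", "-s", "-X", "POST", "--config", "-", url]
    return argv, config


def post_without_key_in_argv(api_key: str, url: str, form_fields: dict):
    """Run curl with form fields in argv and the secret header on stdin."""
    argv, config = build_curl_call(api_key, url)
    for name, value in form_fields.items():
        argv += ["-F", f"{name}={value}"]
    return subprocess.run(argv, input=config, text=True, capture_output=True)
```

Because the key travels over a pipe, it does not show up in process listings visible to other local users, which is the exposure the overview recommends reviewing.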

VirusTotal

66/66 vendors flagged this skill as clean.
