Elevenlabs Toolkit

Security checks across malware telemetry and agentic risk

Overview

The ElevenLabs voice features mostly match the stated purpose, but the skill also includes an undeclared Mistral chat endpoint that can send user messages to a second provider.

Review before installing. Use the ElevenLabs routes only if you are comfortable sending chosen text and audio to ElevenLabs, and remove or disable /api/story/chat unless you explicitly approve Mistral processing and MISTRAL_API_KEY use. Do not expose the FastAPI router publicly without authentication, file-size limits, quota controls, and a clear privacy notice.
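One way to keep /api/story/chat disabled unless Mistral processing is explicitly approved is an opt-in gate checked before the route is served. The sketch below is a minimal illustration, not the skill's code: the flag name ALLOW_MISTRAL_CHAT and the helper mistral_chat_enabled are hypothetical; only MISTRAL_API_KEY comes from the report.

```python
import os
from typing import Mapping, Optional

# Hypothetical opt-in flag; the scanned skill does not define it.
OPT_IN_FLAG = "ALLOW_MISTRAL_CHAT"


def mistral_chat_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only if the operator opted in AND a Mistral key is set."""
    env = os.environ if env is None else env
    opted_in = env.get(OPT_IN_FLAG, "").strip().lower() in {"1", "true", "yes"}
    has_key = bool(env.get("MISTRAL_API_KEY", "").strip())
    return opted_in and has_key
```

A route handler could call this helper first and refuse the request when it returns False, so that a configured key alone never enables the second provider.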

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (10)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill metadata declares outbound network use and an environment variable requirement, but there is no explicit top-level permissions declaration aligned with those capabilities. This can mislead operators and automated policy systems about what the skill actually needs, reducing transparency and making risky capabilities easier to approve unintentionally.

Context-Inappropriate Capability

Low

Confidence: 80%
Finding: The documentation recommends spawning a local uvicorn server via subprocess from example code, which expands the skill from simple API usage into local process execution. Even though the example is local-only, encouraging process launch without clear safety boundaries increases operational risk and can normalize agent-driven code execution beyond the stated purpose.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The skill metadata describes an ElevenLabs audio integration, but this endpoint adds a separate Mistral-backed chat capability that expands the trust boundary and data flows beyond the declared purpose. Hidden or unjustified capability expansion is dangerous because users or operators may provide sensitive content assuming it only goes to ElevenLabs, while the code can also route prompts to another provider and consume an additional secret.
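The trust-boundary concern can be made concrete with an egress allowlist that accepts only hosts matching the declared purpose. This is a minimal sketch under assumptions: the helper is_declared_host is hypothetical, and the host api.elevenlabs.io is assumed to be the only endpoint the declared integration needs.

```python
from urllib.parse import urlparse

# Assumed allowlist: only the provider named in the skill description.
DECLARED_HOSTS = frozenset({"api.elevenlabs.io"})


def is_declared_host(url: str, allowed: frozenset = DECLARED_HOSTS) -> bool:
    """Return True only for HTTPS/WSS URLs whose exact host is allowlisted."""
    parsed = urlparse(url)
    if parsed.scheme not in {"https", "wss"}:
        return False
    # Exact match on the hostname, so look-alike suffixes are rejected.
    return parsed.hostname in allowed
```

With such a check in front of every outbound call, a request to a second provider fails closed until an operator adds that host deliberately.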

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The code reads MISTRAL_API_KEY and invokes a second external AI service unrelated to the stated ElevenLabs toolkit purpose. This increases attack surface, introduces undisclosed data egress, and may cause operators to expose an additional credential they did not expect this skill to need.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The comment claims the endpoint uses ElevenLabs Conversational AI, but the implementation actually calls Mistral for response generation. Misleading implementation notes can conceal real data flows during review, causing incorrect security assumptions about which vendor receives conversation content and which credentials are in use.

Missing User Warnings

Low

Confidence: 86%
Finding: The example instructs starting a local server subprocess automatically if a health check fails, but it does not explicitly warn that this executes a local process. In agentic environments, silent process creation can surprise users, bypass expected approval flows, and broaden the execution surface unnecessarily.
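A safer variant of the health-check-then-launch pattern asks for approval and states exactly which process will run. The following is a sketch, not the skill's example code: the function names, the main:app module path, and port 8000 are assumptions, and the confirm and launch callables are injected so the flow can be exercised without starting anything.

```python
import subprocess
import sys
from typing import Callable, List, Optional


def build_server_command(app: str = "main:app", port: int = 8000) -> List[str]:
    """Build a loopback-only uvicorn command (app path and port are assumed)."""
    return [sys.executable, "-m", "uvicorn", app,
            "--host", "127.0.0.1", "--port", str(port)]


def start_server_if_approved(
    is_healthy: Callable[[], bool],
    confirm: Callable[[str], bool],
    launch: Callable[[List[str]], object] = subprocess.Popen,
) -> Optional[object]:
    """Launch the server only if it is down AND the user approves the command."""
    if is_healthy():
        return None  # Already running; nothing to start.
    command = build_server_command()
    notice = "About to start a local process: " + " ".join(command)
    if not confirm(notice):
        return None  # User declined; no process is created.
    return launch(command)
```

Binding to 127.0.0.1 keeps the server local, and the explicit notice means process creation is never silent.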

Missing User Warnings

Medium

Confidence: 86%
Finding: Uploaded audio is forwarded to ElevenLabs speech-to-text, which is expected for STT functionality, but there is no visible disclosure or consent mechanism in the code. This is a security/privacy issue because users may upload sensitive voice data, biometric traits, or spoken secrets without being told that the content leaves the local system for third-party processing.
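A consent check combined with a size cap could run before any audio leaves the local system. This is a minimal sketch with assumed names and limits: check_audio_upload, UploadRejected, and the 10 MiB cap are all hypothetical and not taken from the skill.

```python
# Assumed cap; choose a value that matches your quota and use case.
MAX_AUDIO_BYTES = 10 * 1024 * 1024


class UploadRejected(ValueError):
    """Raised when an upload must not be forwarded to a third party."""


def check_audio_upload(data: bytes, consent_given: bool,
                       max_bytes: int = MAX_AUDIO_BYTES) -> bytes:
    """Return the audio unchanged only if consent exists and size is acceptable."""
    if not consent_given:
        raise UploadRejected("user has not agreed to third-party processing")
    if len(data) == 0:
        raise UploadRejected("empty upload")
    if len(data) > max_bytes:
        raise UploadRejected("upload exceeds the configured size limit")
    return data
```

An endpoint would call this before forwarding bytes to the speech-to-text provider and map UploadRejected to a client error response.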

Missing User Warnings

Medium

Confidence: 84%
Finding: User-provided text is sent to ElevenLabs for text-to-speech with no visible disclosure in the endpoint. While external transmission is inherent to a TTS integration, the lack of notice can expose private prompts or sensitive application data to a third party without informed user consent.

Missing User Warnings

High

Confidence: 95%
Finding: The WebSocket streaming path forwards live user text to ElevenLabs and also includes the API key in the JSON payload after already authenticating in the WebSocket headers. This is more dangerous than ordinary TTS disclosure because it transmits credentials and conversational content over a long-lived bidirectional channel, increasing exposure if logs, intermediaries, or the remote service capture message bodies.
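The credential exposure described here can be avoided by keeping the key in the connection headers only and never in message bodies. The sketch below assumes the header name xi-api-key and a payload with a single text field; the helper names and the list of sensitive keys are hypothetical and should be checked against the provider's documentation.

```python
import json
from typing import Any, Dict, Tuple

# Assumed names of credential-bearing fields to keep out of message bodies.
SENSITIVE_KEYS = {"xi_api_key", "api_key", "authorization"}


def build_stream_message(api_key: str, text: str) -> Tuple[Dict[str, str], str]:
    """Return (headers, body): the key goes in headers, the body carries text only."""
    headers = {"xi-api-key": api_key}
    body = json.dumps({"text": text})
    return headers, body


def strip_credentials(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload without credential-like top-level keys."""
    return {k: v for k, v in payload.items() if k.lower() not in SENSITIVE_KEYS}
```

Running strip_credentials on existing payloads just before sending is a defensive fallback; the cleaner fix is to never add the key to the payload at all.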

Missing User Warnings

Medium

Confidence: 92%
Finding: Conversation content is sent to Mistral without visible disclosure, even though the skill is presented as an ElevenLabs integration. This creates undisclosed third-party sharing of user prompts and may surprise users and operators who did not consent to their conversational data being processed by a separate LLM vendor.

VirusTotal

64/64 vendors flagged this skill as clean.
