Security audit

LrshuAI Voice Clone

Security checks across malware telemetry and agentic risk

Overview

This voice-cloning skill sends sensitive prompts and media to a configurable remote API, but its scope and privacy boundaries are not clearly disclosed.

Review this before installing. Only use it if you understand which remote service TEAM_BASE_URL points to, trust that endpoint, and are comfortable sending prompts and media there with your bearer token. Do not submit someone else's voice sample unless you have permission, and avoid sensitive or private media until the skill adds clear privacy, consent, endpoint validation, and input-scope controls.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain

Findings (11)

Tainted flow: 'endpoint' from os.getenv (line 39, credential/environment) → requests.post (network output)

Critical

Category: Data Flow
Content:
    print(f"Invoking model: {args.model} ...")
    try:
        response = requests.post(endpoint, headers=headers, json=payload)
        response.raise_for_status()
        result = response.json()
Confidence: 94%
Finding: response = requests.post(endpoint, headers=headers, json=payload)
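One way to close this flow, sketched under assumptions, is to validate the environment-derived URL before any credentialed request is sent. The allowlist `ALLOWED_HOSTS`, the placeholder host `api.example.com`, and the helper `validate_endpoint` are hypothetical and not part of the skill; the report also does not show exactly how the skill composes `endpoint` from `TEAM_BASE_URL`.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: replace with the host(s) you have actually vetted.
ALLOWED_HOSTS = frozenset({"api.example.com"})


def validate_endpoint(url: str, allowed_hosts=ALLOWED_HOSTS) -> str:
    """Return url unchanged if it is HTTPS, allowlisted, and credential-free; else raise."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {url!r}")
    # .hostname is lowercased with port and userinfo stripped, so
    # "api.example.com.evil.net" does not match "api.example.com".
    host = parsed.hostname
    if host is None or host not in allowed_hosts:
        raise ValueError(f"endpoint host {host!r} is not on the allowlist")
    if parsed.username or parsed.password:
        raise ValueError("credentials embedded in the URL are not allowed")
    return url
```

A call site could run `endpoint = validate_endpoint(endpoint)` immediately before `requests.post(...)`, so a misconfigured or hostile `TEAM_BASE_URL` fails closed instead of receiving the bearer token.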

Tainted flow: 'poll_endpoint' from os.getenv (line 132, credential/environment) → requests.get (network output)

Critical

Category: Data Flow
Content:
    while True:
        time.sleep(3)  # poll every 3 seconds
        poll_resp = requests.get(poll_endpoint, headers=headers)
        poll_resp.raise_for_status()
        poll_data = poll_resp.json()
Confidence: 93%
Finding: poll_resp = requests.get(poll_endpoint, headers=headers)
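The excerpt polls inside `while True` and calls `requests.get` without a `timeout=` argument; `requests` applies no timeout unless one is passed. A minimal sketch of a bounded poller adds an overall deadline. It assumes caller-supplied `fetch` and `is_done` callables (both hypothetical, since the polling response schema is not shown); the `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def poll_until_done(
    fetch: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval: float = 3.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() every `interval` seconds until is_done(result) or `max_wait` elapses."""
    deadline = clock() + max_wait
    while True:
        sleep(interval)
        data = fetch()
        if is_done(data):
            return data
        # Give up instead of looping forever if the job never finishes.
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {max_wait} seconds")
```

In the skill, `fetch` might wrap `requests.get(poll_endpoint, headers=headers, timeout=30)` followed by `raise_for_status()` and `.json()`, with `poll_endpoint` passed through the same kind of allowlist check as the POST endpoint.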

Lp3

Medium

Category: MCP Least Privilege
Confidence: 94%
Finding: The skill declares no explicit permissions while requiring an environment variable API key and instructing direct Python execution that will likely perform outbound network calls. This weakens transparency and sandbox expectations, making it easier for an agent or user to invoke code with more capability than the manifest clearly communicates.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The documented purpose is narrow (voice cloning from reference audio), but the described behavior indicates a generic remote model invoker that supports unrelated media types and arbitrary model requests. That mismatch is dangerous because users and agents may grant trust or supply sensitive inputs under false assumptions, while the skill can process broader content or exfiltrate data to a remote API.

Description-Behavior Mismatch

High

Confidence: 84%
Finding: The implementation is a generic multimodal model invoker and does not enforce voice-cloning-only behavior despite the skill being presented as a voice-cloning tool. This mismatch expands capability beyond the declared purpose, increasing the chance of unauthorized data handling or misuse because callers may unknowingly send arbitrary prompts and media to a remote model service.
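An input-scope control for this mismatch could start at argument parsing. The sketch below assumes an `argparse` entry point (the excerpt's `args.model` suggests one, but the parser itself is not shown) and uses `choices=` so any model outside a fixed list is rejected before a request is built. The model name `voice-clone-v1` and the `--reference-audio` and `--text` flags are hypothetical placeholders.

```python
import argparse

# Hypothetical model identifier; the skill's real model names are not shown in this report.
ALLOWED_MODELS = ("voice-clone-v1",)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Clone a voice from a reference audio clip")
    # choices= makes argparse exit with an error for any model outside the
    # allowlist, before any payload is assembled or any network call is made.
    parser.add_argument("--model", choices=ALLOWED_MODELS, default=ALLOWED_MODELS[0])
    parser.add_argument("--reference-audio", required=True, help="path to the reference voice sample")
    parser.add_argument("--text", required=True, help="text to speak in the cloned voice")
    return parser
```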

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: Supporting arbitrary image and video inputs in a voice-cloning skill is unjustified by the stated function and broadens the data exposure surface. In this context, the mismatch makes the skill more dangerous because users expecting audio cloning may inadvertently transmit unrelated media to an external service, and the broader modality support can be repurposed for unintended operations.
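A matching control on inputs could reject non-audio files before they are read or encoded. This sketch checks only the file extension against a hypothetical allowlist (`AUDIO_EXTENSIONS`, helper `require_audio_path`); it does not inspect file contents, so it narrows accidental uploads of images or video rather than stopping a determined caller.

```python
from pathlib import Path

# Hypothetical allowlist; adjust to the formats the remote API actually accepts.
AUDIO_EXTENSIONS = frozenset({".wav", ".mp3", ".flac", ".m4a", ".ogg"})


def require_audio_path(path: str) -> Path:
    """Return the path if its extension is an allowlisted audio format; else raise ValueError."""
    p = Path(path)
    suffix = p.suffix.lower()  # case-insensitive: ".WAV" is accepted
    if suffix not in AUDIO_EXTENSIONS:
        raise ValueError(f"{p.name}: only audio reference clips are accepted, got {suffix or 'no extension'!r}")
    return p
```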

Vague Triggers

Medium

Confidence: 78%
Finding: The invocation language is broad enough that an agent may trigger the skill whenever a user asks about voice cloning, without clear checks on consent, reference-audio ownership, or safety constraints. In a voice-cloning context, ambiguous activation increases the risk of unauthorized impersonation or transmission of sensitive biometric data.
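A consent gate is one way to add the missing check. The sketch below is hypothetical (`confirm_voice_consent` and its prompt wording are not part of the skill) and defaults to refusal: only an explicit `yes` or `y` lets the upload proceed. The `ask` parameter defaults to the built-in `input` and can be swapped out in tests.

```python
from typing import Callable

CONSENT_PROMPT = (
    "The reference audio will be uploaded to a remote service. "
    "Do you own this voice, or have the speaker's permission to clone it? [yes/no] "
)


def confirm_voice_consent(ask: Callable[[str], str] = input) -> bool:
    """Return True only on an explicit 'yes' or 'y'; any other reply counts as refusal."""
    answer = ask(CONSENT_PROMPT).strip().lower()
    return answer in {"yes", "y"}
```

Defaulting to refusal means an empty or ambiguous reply is never treated as permission.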

Vague Triggers

Medium

Confidence: 88%
Finding: The skill contains imperative instructions stating that the agent 'must' execute Python directly whenever the skill is requested, but it does not establish when such a request is legitimate or appropriate. This can bypass safer orchestration paths and encourages automatic execution on minimal user prompting.

Missing User Warnings

Medium

Confidence: 85%
Finding: The documentation instructs direct subprocess execution via Python but does not warn users or agents that this bypasses the standard execution wrapper. That omission reduces informed consent and may circumvent logging, sandboxing, or permission enforcement expected from the normal skill execution path.

Missing User Warnings

High

Confidence: 96%
Finding: A voice-cloning skill inherently handles sensitive biometric audio, and this one also depends on an API key for a remote service, yet the documentation provides no privacy warning. Users may unknowingly submit personal or third-party voice samples to an external API without understanding retention, sharing, or consent implications.

Missing User Warnings

Medium

Confidence: 80%
Finding: The code sends prompts and potentially base64-encoded local media to a remote API but provides no explicit disclosure, consent, or minimization controls. This is risky because prompts and media may contain sensitive personal or proprietary information, and in a voice-cloning context the uploaded content may include biometric voice data, making undisclosed transmission more sensitive than usual.
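A sketch of disclosure and minimization before upload, assuming the skill base64-encodes local files into its JSON payload as this finding describes: it prints the file name, size, and destination host, and refuses files above a size cap. `encode_for_upload` and the 10 MiB `MAX_UPLOAD_BYTES` value are hypothetical choices, not taken from the skill.

```python
import base64
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical size cap for a single reference file.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def encode_for_upload(path: Path, endpoint: str, max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Disclose what is sent and where, refuse oversized files, then base64-encode."""
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{path.name} is {size} bytes; limit is {max_bytes}")
    host = urlparse(endpoint).hostname
    # Tell the user exactly which file is leaving the machine and where it is going.
    print(f"Uploading {path.name} ({size} bytes) to {host}")
    return base64.b64encode(path.read_bytes()).decode("ascii")
```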

VirusTotal

64/64 vendors flagged this skill as clean.
