Security audit

LrshuAI Voice Clone

Security checks across malware telemetry and agentic risk

Overview

This voice-cloning skill sends sensitive prompts and media to a configurable remote API, but its scope and privacy boundaries are not clearly disclosed.

Review this before installing. Only use it if you understand which remote service TEAM_BASE_URL points to, trust that endpoint, and are comfortable sending prompts and media there with your bearer token. Do not submit someone else's voice sample unless you have permission, and avoid sensitive or private media until the skill adds clear privacy, consent, endpoint validation, and input-scope controls.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
Findings (11)

Tainted flow: 'endpoint' from os.getenv (line 39, credential/environment) → requests.post (network output)

Critical
Category
Data Flow
Content
    print(f"Invoking model: {args.model} ...")
    try:
        response = requests.post(endpoint, headers=headers, json=payload)
        response.raise_for_status()
        result = response.json()
Confidence
94% confidence
Finding
response = requests.post(endpoint, headers=headers, json=payload)
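
One way to narrow this flow is to validate the environment-supplied URL before any request is built. The sketch below is illustrative only and is not the skill's code: `validate_base_url` and `ALLOWED_HOSTS` are hypothetical names, and `api.example.com` is a placeholder for whatever host an operator actually trusts.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the audited skill defines no such list.
ALLOWED_HOSTS = {"api.example.com"}


def validate_base_url(raw_url, allowed_hosts=ALLOWED_HOSTS):
    """Return the URL without a trailing slash if it is HTTPS and allowlisted.

    Raises ValueError for a missing value, a non-HTTPS scheme, or an
    unexpected host.
    """
    if not raw_url:
        raise ValueError("TEAM_BASE_URL is not set")
    parsed = urlparse(raw_url)
    if parsed.scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {raw_url!r}")
    # parsed.hostname ignores any user:password@ prefix, so a URL such as
    # https://api.example.com@evil.test/ resolves to host "evil.test".
    if parsed.hostname not in allowed_hosts:
        raise ValueError(f"host not in allowlist: {parsed.hostname!r}")
    return raw_url.rstrip("/")
```

A caller would pass the result of `os.getenv("TEAM_BASE_URL")` through this check before constructing `endpoint`, so the bearer token and payload are only sent to a host the operator has approved.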

Tainted flow: 'poll_endpoint' from os.getenv (line 132, credential/environment) → requests.get (network output)

Critical
Category
Data Flow
Content
        while True:
            time.sleep(3)  # poll every 3 seconds
            poll_resp = requests.get(poll_endpoint, headers=headers)
            poll_resp.raise_for_status()
            poll_data = poll_resp.json()
Confidence
93% confidence
Finding
poll_resp = requests.get(poll_endpoint, headers=headers)
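
The excerpt above shows a `while True` loop and a `requests.get` call without a `timeout` argument; the full source may handle termination elsewhere. As a hedged illustration of a bounded alternative, the sketch below caps the number of polls. `poll_until_done`, the `"status"` key, and the terminal values `"succeeded"` and `"failed"` are all assumptions, not the remote API's documented schema.

```python
import time


def poll_until_done(fetch_status, max_attempts=20, interval_s=3.0, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal state or attempts run out.

    fetch_status is any zero-argument callable returning a dict, for example
    a wrapper around requests.get(poll_endpoint, headers=headers, timeout=10).
    """
    for attempt in range(max_attempts):
        data = fetch_status()
        status = data.get("status")  # assumed response field
        if status in ("succeeded", "failed"):  # assumed terminal values
            return data
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task did not finish after {max_attempts} polls")
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access and makes the upper bound on credentialed requests explicit.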

Lp3

Medium
Category
MCP Least Privilege
Confidence
94% confidence
Finding
The skill declares no explicit permissions while requiring an environment variable API key and instructing direct Python execution that will likely perform outbound network calls. This weakens transparency and sandbox expectations, making it easier for an agent or user to invoke code with more capability than the manifest clearly communicates.

Tp4

High
Category
MCP Tool Poisoning
Confidence
97% confidence
Finding
The documented purpose is narrow (voice cloning from reference audio), but the described behavior indicates a generic remote model invoker that supports unrelated media types and arbitrary model requests. That mismatch is dangerous because users and agents may extend trust or supply sensitive inputs under false assumptions, while the skill can process broader content or exfiltrate data to a remote API.

Description-Behavior Mismatch

High
Confidence
84% confidence
Finding
The implementation is a generic multimodal model invoker and does not enforce voice-cloning-only behavior despite the skill being presented as a voice-cloning tool. This mismatch expands capability beyond the declared purpose, increasing the chance of unauthorized data handling or misuse because callers may unknowingly send arbitrary prompts and media to a remote model service.

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
Supporting arbitrary image and video inputs in a voice-cloning skill is unjustified by the stated function and broadens the data exposure surface. In this context, the mismatch makes the skill more dangerous because users expecting audio cloning may inadvertently transmit unrelated media to an external service, and the broader modality support can be repurposed for unintended operations.
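
An input-scope control of the kind this finding calls for could be as simple as rejecting non-audio files before anything is encoded or uploaded. The following is a minimal sketch under stated assumptions: `require_audio_path` and `ALLOWED_AUDIO_SUFFIXES` are hypothetical names, and the suffix list is an illustrative guess rather than the formats the remote service accepts.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to what the service supports.
ALLOWED_AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def require_audio_path(path_str):
    """Return a Path if the file name has an allowed audio suffix, else raise ValueError."""
    path = Path(path_str)
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_AUDIO_SUFFIXES:
        raise ValueError(
            f"only audio reference files are accepted, got {suffix or 'no extension'!r}"
        )
    return path
```

A suffix check only filters by file name; it does not inspect file contents, so it reduces accidental image or video uploads but is not a defense against deliberately renamed files.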

Vague Triggers

Medium
Confidence
78% confidence
Finding
The invocation language is broad enough that an agent may trigger the skill whenever a user asks about voice cloning, without clear checks on consent, reference-audio ownership, or safety constraints. In a voice-cloning context, ambiguous activation increases the risk of unauthorized impersonation or transmission of sensitive biometric data.

Vague Triggers

Medium
Confidence
88% confidence
Finding
The skill instructs that the agent 'must' run Python directly whenever the skill is requested, but it does not establish safe boundaries for when a request is legitimate or appropriate. This can bypass safer orchestration paths and encourages automatic execution based on minimal user prompting.

Missing User Warnings

Medium
Confidence
85% confidence
Finding
The documentation instructs direct subprocess execution via Python but does not warn users or agents that this bypasses the standard execution wrapper. That omission reduces informed consent and may circumvent logging, sandboxing, or permission enforcement expected from the normal skill execution path.

Missing User Warnings

High
Confidence
96% confidence
Finding
A voice-cloning skill inherently handles sensitive biometric audio, and this one also depends on an API key for a remote service, yet the documentation provides no privacy warning. Users may unknowingly submit personal or third-party voice samples to an external API without understanding retention, sharing, or consent implications.

Missing User Warnings

Medium
Confidence
80% confidence
Finding
The code sends prompts and potentially base64-encoded local media to a remote API but provides no explicit disclosure, consent, or minimization controls. This is risky because prompts and media may contain sensitive personal or proprietary information, and in a voice-cloning context the uploaded content may include biometric voice data, making undisclosed transmission more sensitive than usual.
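
To make the missing disclosure and consent step concrete, the sketch below shows one possible gate that names the destination host and the files to be sent, then requires an explicit yes. It is an illustration, not the skill's behavior: `build_disclosure` and `confirm_upload` are hypothetical helpers, and the wording of the prompt is an assumption.

```python
def build_disclosure(host, media_paths):
    """Build a plain-text notice listing the destination host and the files to upload."""
    lines = [
        f"This will send your prompt and {len(media_paths)} media file(s) to {host}.",
        "Voice samples are biometric data; only submit audio you have permission to use.",
    ]
    lines.extend(f"  - {p}" for p in media_paths)
    lines.append("Proceed? [y/N] ")
    return "\n".join(lines)


def confirm_upload(host, media_paths, ask=input):
    """Return True only if the user explicitly answers 'y' or 'yes'."""
    answer = ask(build_disclosure(host, media_paths))
    return answer.strip().lower() in ("y", "yes")
```

Defaulting to "no" on an empty answer means an agent that invokes the script non-interactively does not upload anything unless consent is supplied deliberately.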

VirusTotal

64/64 vendors flagged this skill as clean.
