Security audit

TTS Voice Generator

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real TTS skill, but it can send local voice files and text to Datamass, use a stored API key, and retain custom voices without strong confirmation or deletion controls.

Review before installing. Use only if you are comfortable sending text, voice samples, transcripts, and selected local audio files to Datamass. Avoid uploading third-party voices or sensitive recordings unless you have permission, confirm whether custom voices can be deleted, and prefer a scoped API key dedicated to this service.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (14)

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill reads a Datamass API key directly from ~/.openclaw/config.json, which is a sensitive local credential source not disclosed in the stated tool description. Accessing local secrets expands the skill's effective privileges and can surprise users or upstream orchestrators that expect only TTS operations, creating unnecessary secret exposure risk if the skill is reused, modified, or logs/errors leak credential-related details.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The skill reads a credential from a fixed path in the user's home directory rather than receiving it through explicit workflow inputs or a managed secret mechanism. This creates an implicit trust boundary crossing: invoking the skill causes access to local sensitive material without clear user consent at call time, and any downstream compromise of the skill or service could misuse that credential.
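One way to address the implicit credential read is to accept the key only through an explicit argument or an environment variable. The sketch below is illustrative and not taken from the skill's code: the function names and the DATAMASS_API_KEY variable name are assumptions.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env_var: str = "DATAMASS_API_KEY") -> str:
    """Return an API key from an explicit argument or an environment variable.

    The env var name is a hypothetical choice. This function deliberately
    never opens ~/.openclaw/config.json or any other file on disk.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key supplied; pass one explicitly or set {env_var}."
        )
    return key


def redact(key: str) -> str:
    """Mask a key for logs and error messages, keeping at most the last 4 characters."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```

With this shape, the caller or orchestrator decides which secret the skill receives, and any log line can use redact() so the full key never appears in error output.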

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The service includes a get_username() call to /sys/openapi/user/info, which is outside the TTS operations described in the skill metadata. Even though the function is not obviously malicious and appears intended to resolve the current account identity, it expands data access beyond user-expected TTS functionality and can expose account information unnecessarily if invoked.
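A possible guard against this kind of scope expansion is an endpoint allowlist checked before every request. The sketch below is an assumption-laden illustration: the three TTS paths are hypothetical placeholders, since this report does not list the service's real routes. Only /sys/openapi/user/info comes from the finding itself.

```python
from urllib.parse import urlparse

# Hypothetical TTS-only paths; the real service routes are not listed in this report.
ALLOWED_PATHS = frozenset({"/tts/generate", "/tts/voices", "/tts/upload"})


def is_allowed_endpoint(url: str) -> bool:
    """Return True only if the URL's path is on the TTS allowlist."""
    return urlparse(url).path.rstrip("/") in ALLOWED_PATHS
```

Under these assumed paths, a call to /sys/openapi/user/info would be rejected unless the account lookup were added to the allowlist and disclosed in the skill metadata.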

Missing User Warnings

Medium

Confidence: 95%
Finding: The upload_audio flow sends a user-supplied local file_path to a remote TTS service, but the user-facing interface shown in this file does not provide an explicit warning that local audio content and associated transcript metadata will be transmitted off-device. This is a privacy and data-handling issue, especially because voice samples are highly sensitive biometric data and may contain personal information.

Vague Triggers

Medium

Confidence: 88%
Finding: The trigger section uses broad natural-language patterns such as generic requests for TTS, listing voices, uploading audio, and even 'direct conversion' based on user-provided text. This can cause the skill to activate on ambiguous conversations and initiate external API usage, file handling, or billable actions without sufficiently explicit user intent, which is especially relevant because the skill supports uploads and paid generation.
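One way to narrow activation is to require an explicit command instead of matching free-form requests. The sketch below assumes a hypothetical "/tts <action>" syntax, which is not part of the skill, and shows how ambiguous conversation would be ignored.

```python
import re

# Hypothetical explicit invocation: the message must start with "/tts <action>".
_TRIGGER = re.compile(
    r"^/tts\s+(speak|voices|upload)\b(.*)$",
    re.IGNORECASE | re.DOTALL,
)


def parse_trigger(message: str):
    """Return (action, argument) for an explicit command, or None otherwise."""
    match = _TRIGGER.match(message.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).strip()
```

A message such as "can you read this aloud?" yields None here, so uploads and billable generation would start only from a deliberate command.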

Missing User Warnings

Medium

Confidence: 93%
Finding: The document states that uploaded audio is saved to OSS with a 24-hour URL and that custom voices are permanently stored in a database, but it does not clearly warn users before upload/generation that their voice data and outputs will be transmitted to external services and retained. Because this skill handles biometric-like voice samples and generated audio, missing disclosure and consent language increases privacy, compliance, and data-handling risk.
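A disclosure of this kind could be generated once and shown before any upload. The sketch below is a minimal illustration: the function name is hypothetical, and the retention wording only restates what the skill's documentation is reported to say in the finding above.

```python
def build_upload_notice(file_name: str, service: str = "Datamass") -> str:
    """Build a plain-text notice to show the user before audio leaves the device."""
    return (
        f"'{file_name}' will be uploaded to {service}, an external service.\n"
        "According to the skill's documentation, uploaded audio is saved to OSS "
        "behind a URL valid for 24 hours, and custom voices are stored "
        "permanently in a database.\n"
        "Only upload voices you have permission to use."
    )
```

Keeping the notice in one function makes it easier to update if the service's retention terms change.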

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill uploads a user-specified local audio file to a remote TTS service, but this file does not present any disclosure, confirmation, or safety checks before exfiltrating local content. In an agent setting, local path access plus automatic network upload can expose sensitive recordings or unintended files if the path is influenced by a prompt or user misunderstanding.
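A confirmation step at the operation site might look like the sketch below. It is an illustration under assumptions, not the skill's code: confirmed_upload, UploadDeclined, and the upload callable are hypothetical names, and the upload callable stands in for whatever function actually performs the network transfer.

```python
from typing import Callable


class UploadDeclined(Exception):
    """Raised when the user does not approve an upload."""


def confirmed_upload(file_path: str,
                     upload: Callable[[str], str],
                     ask: Callable[[str], str] = input) -> str:
    """Call upload(file_path) only after the user answers exactly 'yes'."""
    answer = ask(
        f"Upload '{file_path}' to the remote TTS service? "
        "Type 'yes' to continue: "
    )
    if answer.strip().lower() != "yes":
        # Anything other than an explicit 'yes' blocks the transfer.
        raise UploadDeclined(f"Upload of {file_path} was not approved.")
    return upload(file_path)
```

Passing ask as a parameter lets an agent framework route the question to its own approval prompt, and it keeps the gate testable without a terminal.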

Missing User Warnings

Low

Confidence: 86%
Finding: The code silently accesses a locally stored API key without any user-facing notice in this file. While this overlaps with the credential-handling issue above, the separate concern is transparency: users may not realize the skill will inspect local configuration and use stored credentials when invoked.

Missing User Warnings

Medium

Confidence: 92%
Finding: This function reads an arbitrary local file path and uploads the file contents to a remote API using the caller-provided path. If higher-level code passes untrusted or unintended paths, sensitive local audio or other files could be exfiltrated to the backend without an explicit confirmation step at the operation site.
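Path validation before the read is one way to limit what an untrusted path can reach. In the sketch below, the allowed directory, the suffix list, and the size limit are all assumptions chosen for illustration, not values taken from the skill or the Datamass service.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".wav", ".mp3", ".m4a", ".flac"}  # assumed audio formats
MAX_BYTES = 20 * 1024 * 1024                          # assumed 20 MiB limit


def validate_audio_path(file_path: str, allowed_dir: str) -> Path:
    """Resolve file_path and reject anything outside allowed_dir or not audio-like."""
    base = Path(allowed_dir).resolve()
    path = Path(file_path).expanduser().resolve()
    # resolve() collapses ".." and symlinks, so traversal attempts fail here.
    if base not in path.parents:
        raise ValueError(f"{path} is outside the allowed directory {base}")
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"{path.suffix or 'no suffix'} is not an allowed audio type")
    if not path.is_file():
        raise ValueError(f"{path} is not a regular file")
    if path.stat().st_size > MAX_BYTES:
        raise ValueError(f"{path} exceeds the {MAX_BYTES}-byte limit")
    return path
```

A suffix check does not prove the content is audio, so this reduces accidental exposure rather than eliminating it. It works best combined with an explicit user confirmation.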

Missing User Warnings

Medium

Confidence: 88%
Finding: The method sends user-provided text, media path, style identifiers, and related metadata to a backend TTS service. In a TTS skill this is functionally expected, but it still creates a real privacy risk because potentially sensitive prompt content is transmitted off-host without any disclosure or consent handling in this code path.

Vague Triggers

Medium

Confidence: 88%
Finding: The trigger section uses broad natural-language patterns such as generic requests to generate speech, list voices, upload audio, or convert text, which can cause the skill to activate unintentionally in contexts where the user did not explicitly intend to invoke this specific tool. In an agent environment, overbroad invocation increases the chance of accidental API usage, unwanted uploads, and unintended processing of user content, especially because this skill can submit tasks and upload files.

Missing User Warnings

Medium

Confidence: 94%
Finding: The documentation instructs users to configure an API key, upload audio, clone/customize voices, and notes that audio is stored on OSS for 24 hours, but it does not provide explicit privacy, consent, retention, or third-party data handling warnings. This is risky because users may upload sensitive voice samples or personal data without understanding that the data is transmitted to external services and temporarily stored, creating privacy, compliance, and consent issues.

Missing User Warnings

Medium

Confidence: 83%
Finding: The upload_audio() method reads an arbitrary local file path and sends the file contents to a remote endpoint. In an agent skill context, this is sensitive because local file exfiltration can occur if higher-level logic passes user or system paths without a clear consent boundary, especially since the skill description emphasizes TTS features rather than explicit local-file transfer risk.

Missing User Warnings

Medium

Confidence: 92%
Finding: This method reads a local file and uploads its contents, along with optional transcript text, to a remote backend/OSS endpoint using the caller's API key. In an agent skill context, silent transmission of local content can expose sensitive audio or embedded personal data if the caller is not clearly informed and does not explicitly consent at the point of upload.

VirusTotal

65/65 vendors flagged this skill as clean.
