Qwen TTS

Security checks across malware telemetry and agentic risk

Overview

This skill is a cloud text-to-speech helper whose behavior is disclosed, including an optional, documented Feishu send script. Users should avoid submitting sensitive text, and should not clone a voice without the owner's consent.

Install only if you are comfortable sending text to Alibaba DashScope for synthesis. Use speak.sh for local audio-file output, and use speak_and_send.py only when you intentionally want the generated audio uploaded and sent to the configured Feishu user. Keep API keys scoped, verify the Feishu recipient, and use voice cloning only with clear permission from the voice owner.
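One way to act on the "verify the Feishu recipient" advice is to refuse to send unless the recipient matches an explicitly allowlisted ID. The sketch below is illustrative only: the `resolve_recipient` helper and the `FEISHU_ALLOWED_USER_ID` variable are hypothetical names, not part of the skill, and the real speak_and_send.py may read its configuration differently.

```python
import os
from typing import Mapping


def resolve_recipient(requested_id: str, env: Mapping[str, str] = os.environ) -> str:
    """Return requested_id only if it equals the allowlisted recipient.

    FEISHU_ALLOWED_USER_ID is a hypothetical variable used for illustration;
    the skill itself does not define it.
    """
    allowed = env.get("FEISHU_ALLOWED_USER_ID", "").strip()
    if not allowed:
        # Fail closed: with no allowlist configured, nothing is sent.
        raise RuntimeError("no allowlisted recipient configured; refusing to send")
    if requested_id != allowed:
        raise PermissionError(f"recipient {requested_id!r} is not the allowlisted user")
    return requested_id
```

A wrapper around the send step could call this check first, so a mistyped or injected recipient ID stops the upload before any audio leaves the machine.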

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (11)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill documentation describes capabilities that use environment variables, shell execution, network access, and local file output, but it does not declare corresponding permissions. This creates a transparency and governance gap: operators may enable the skill expecting only local TTS behavior while it can access secrets and perform outbound actions.
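The gap described in this finding amounts to a set difference between what a skill uses and what it declares. A minimal sketch of that comparison follows; the capability labels and the `undeclared_capabilities` helper are hypothetical and do not reflect SkillSpector's actual rule implementation or any real manifest format.

```python
from typing import Iterable, List


def undeclared_capabilities(declared: Iterable[str], observed: Iterable[str]) -> List[str]:
    """Return capabilities the skill uses but does not declare, sorted by name."""
    return sorted(set(observed) - set(declared))


# Illustrative labels for the four capabilities named in the finding.
observed = {"env", "shell", "network", "filesystem:write"}
declared = set()  # the finding reports no corresponding permission declarations

print(undeclared_capabilities(declared, observed))
```

An empty result would mean every observed capability is covered by a declaration; here all four are reported as undeclared.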

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The declared purpose is TTS generation, but the skill also documents sending generated audio to Feishu using separate credentials and user identifiers, which is a materially different data-handling behavior. This mismatch can cause users or orchestrators to invoke the skill without realizing content will be transmitted to a third-party messaging platform, increasing risk of unintended exfiltration.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The skill is described as a TTS capability, but the code also uploads generated audio and sends it to a Feishu user. That materially expands the trust boundary from local/media generation to external delivery, creating privacy, data-leakage, and covert-notification risks that a user may not expect from a TTS-only skill.

Context-Inappropriate Capability

High

Confidence: 95%
Finding: The presence of Feishu app credentials and token acquisition logic indicates messaging/integration capability not justified by the stated TTS-only purpose. Hidden integration code is dangerous in an agent skill because users and reviewers may approve it for speech synthesis while it also gains access to external communication channels.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The script uploads generated audio files and sends them to a Feishu user, which is a distinct exfiltration/messaging action beyond basic TTS generation. In an agent setting, this could be abused to transmit sensitive spoken content, confirmations, or other data to an external user without sufficiently clear user awareness.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The module docstring describes audio generation and delivery details about synthesis quality, but omits the fact that the resulting audio is sent to Feishu. This mismatch undermines informed consent and makes the skill more dangerous because reviewers may underestimate its external communication behavior.

Vague Triggers

Medium

Confidence: 79%
Finding: The trigger description includes broad everyday phrases such as requests to read text aloud or convert text to sound, which may cause the skill to activate in contexts where the user did not intend this specific integration. In a skill that can invoke external APIs and optionally send content onward, overbroad triggering raises the chance of accidental disclosure or unintended side effects.

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation states that the Python script automatically sends synthesized audio to a Feishu user, but it does not clearly warn that user-provided content will be transmitted off-platform using stored credentials. Without explicit notice and consent, sensitive user text may be converted and delivered to a third party unexpectedly.

Missing User Warnings

Medium

Confidence: 95%
Finding: The voice-cloning example directly shows how to replicate a person's voice but provides no warning about consent, impersonation, privacy, or legal restrictions. In a reusable skill reference, this omission can normalize unsafe use and materially increase the risk of non-consensual cloning, fraud, or deceptive impersonation.

Missing User Warnings

Medium

Confidence: 94%
Finding: The script sends user-provided text to Alibaba Cloud's DashScope TTS API, which is an external transmission of potentially sensitive content. In a skill context, this is expected functionality, but it is still a real privacy/security issue if users are not clearly warned at execution time that their text leaves the local environment and is processed by a third party.

Missing User Warnings

Medium

Confidence: 90%
Finding: The script sends user-provided text to DashScope for synthesis and later transmits generated audio to Feishu, but there is no explicit warning or consent flow about these disclosures. For a TTS skill, user text may contain sensitive or personal content, so silent external transmission raises privacy and compliance concerns.
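An explicit consent flow of the kind this finding calls for could be as small as a prompt that names each destination and defaults to "no". The following is a minimal sketch under that assumption; `confirm_external_transmission` is a hypothetical helper and is not present in the skill's scripts.

```python
from typing import Callable, Sequence


def confirm_external_transmission(
    destinations: Sequence[str],
    text: str,
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask the user before text leaves the local environment.

    Returns True only on an explicit "y" or "yes"; any other answer,
    including an empty one, is treated as a refusal.
    """
    # Show a short preview so the user can see what would be disclosed.
    preview = text if len(text) <= 60 else text[:57] + "..."
    prompt = (
        f"This text will be sent to: {', '.join(destinations)}\n"
        f"  {preview!r}\n"
        "Proceed? [y/N] "
    )
    return ask(prompt).strip().lower() in {"y", "yes"}
```

A caller would invoke it as `confirm_external_transmission(["Alibaba DashScope", "Feishu"], text)` and skip both synthesis and sending when it returns False. The `ask` parameter exists so the prompt can be replaced in non-interactive or test settings.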

VirusTotal

64/64 vendors flagged this skill as clean.
