Alicloud AI Audio TTS Realtime

Security checks across malware telemetry and agentic risk

Overview

This skill is a coherent Alibaba Cloud text-to-speech helper. Its cloud API calls and credential use are expected for its purpose; the remaining caveats concern documentation and input handling.

Install this only if you intend to use Alibaba Cloud DashScope TTS. Use a dedicated API key where possible, keep the default Alibaba endpoint unless you have verified an alternative, avoid sending confidential or regulated text unless your Alibaba Cloud terms and policies allow it, and review output paths before running fallback audio generation.
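The endpoint and API-key advice above can be enforced at startup. The sketch below is a minimal illustration under stated assumptions: `DASHSCOPE_API_KEY` is the environment variable DashScope clients conventionally read, `DASHSCOPE_ENDPOINT` is a hypothetical override variable (not taken from the skill), and the `.aliyuncs.com` allowlist is an illustrative choice, not a documented list of Alibaba endpoints.

```python
from collections.abc import Mapping
from urllib.parse import urlparse

# Assumed default: the public DashScope host. Replace only after verifying an alternative.
DEFAULT_ENDPOINT = "https://dashscope.aliyuncs.com"
# Illustrative allowlist (assumption): accept only hosts under aliyuncs.com.
ALLOWED_HOST_SUFFIXES = (".aliyuncs.com",)


def resolve_endpoint(env: Mapping[str, str]) -> str:
    """Return the configured endpoint, refusing non-HTTPS or non-Alibaba hosts."""
    # DASHSCOPE_ENDPOINT is a hypothetical override variable, not a documented one.
    url = env.get("DASHSCOPE_ENDPOINT", DEFAULT_ENDPOINT)
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https":
        raise ValueError(f"endpoint must use HTTPS: {url!r}")
    if not host.endswith(ALLOWED_HOST_SUFFIXES):
        raise ValueError(f"unverified endpoint host: {host!r}")
    return url


def load_api_key(env: Mapping[str, str]) -> str:
    """Read a dedicated DashScope key from the environment; fail loudly if absent."""
    key = env.get("DASHSCOPE_API_KEY", "")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set; use a dedicated key")
    return key
```

Passing `os.environ` to both functions works because it is a `Mapping`; passing a plain dict keeps the checks easy to test.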

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (5)

Tainted flow: 'audio_url' from os.getenv (line 186, credential/environment) → urllib.request.urlopen (network output)

Critical

Category: Data Flow
Content:
    def _download_audio(audio_url: str, output_path: Path) -> None:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(audio_url) as response:
            output_path.write_bytes(response.read())
Confidence: 88%
Finding: with urllib.request.urlopen(audio_url) as response:
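Because `audio_url` reaches `urlopen` unchecked, any value it takes (including `file://` URLs or attacker-chosen hosts) is fetched, and `output_path` is written wherever it points. One way to narrow that flow is to validate both before the call. This is a sketch, not the skill's code: the host allowlist is an assumption, and the extra `base_dir` and `timeout` parameters are hypothetical additions to the original signature.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Illustrative allowlist (assumption): audio links are expected on Alibaba Cloud hosts.
ALLOWED_AUDIO_HOST_SUFFIXES = (".aliyuncs.com",)


def validate_audio_url(audio_url: str) -> str:
    """Reject non-HTTPS schemes (e.g. file://) and hosts outside the allowlist."""
    parsed = urlparse(audio_url)
    host = parsed.hostname or ""
    if parsed.scheme != "https":
        raise ValueError(f"refusing non-HTTPS audio URL: {audio_url!r}")
    if not host.endswith(ALLOWED_AUDIO_HOST_SUFFIXES):
        raise ValueError(f"refusing audio URL on unexpected host: {host!r}")
    return audio_url


def validate_output_path(output_path: Path, base_dir: Path) -> Path:
    """Resolve the output path and require it to stay inside base_dir."""
    resolved = output_path.resolve()
    if not resolved.is_relative_to(base_dir.resolve()):
        raise ValueError(f"output path escapes {base_dir}: {resolved}")
    return resolved


def _download_audio(audio_url: str, output_path: Path, base_dir: Path, timeout: float = 30.0) -> None:
    # Hypothetical hardened variant: base_dir and timeout are not in the original.
    url = validate_audio_url(audio_url)
    target = validate_output_path(output_path, base_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    # A timeout keeps a stalled download from hanging the agent indefinitely.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        target.write_bytes(response.read())
```

`urlopen` follows HTTP redirects by default, so the host check covers only the initial URL; a stricter version would also inspect `response.geturl()` before writing.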

Lp3

Medium

Category: MCP Least Privilege
Confidence: 96%
Finding: The skill clearly requires environment access for credentials, filesystem access for reading references and writing outputs, and network access to call Alibaba Cloud services, yet no explicit permissions are declared. This creates a transparency and governance gap: callers may authorize or execute the skill without understanding its actual capability surface, which increases the risk of unintended data access, outbound transmission, and policy bypass in agent environments.
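The gap described here is the difference between what a skill declares and what its code actually touches. A minimal sketch of that comparison follows; the `Capabilities` structure and the example values (`references/`, `output/`, the DashScope host and key name) are hypothetical and stand in for whatever manifest format a given agent platform uses.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Capabilities:
    """Capability surface of a skill, as declared or as observed in its code."""
    env_vars: frozenset[str] = frozenset()
    network_hosts: frozenset[str] = frozenset()
    fs_read: frozenset[str] = frozenset()
    fs_write: frozenset[str] = frozenset()


def underdeclared(declared: Capabilities, observed: Capabilities) -> dict[str, set[str]]:
    """Return, per capability kind, what the code uses but the declaration omits."""
    gaps: dict[str, set[str]] = {}
    for f in fields(Capabilities):
        missing = set(getattr(observed, f.name)) - set(getattr(declared, f.name))
        if missing:
            gaps[f.name] = missing
    return gaps


# Hypothetical observed surface for a TTS helper like the one described above.
observed = Capabilities(
    env_vars=frozenset({"DASHSCOPE_API_KEY"}),
    network_hosts=frozenset({"dashscope.aliyuncs.com"}),
    fs_read=frozenset({"references/"}),
    fs_write=frozenset({"output/"}),
)
```

With no declaration at all, as in this finding, `underdeclared(Capabilities(), observed)` reports every observed capability as a gap.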

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The workflow instructs the agent to confirm the region, identifiers, and mutability, run read-only queries, and execute bounded cloud operations. This is generic cloud-operations guidance unrelated to a narrow TTS synthesis skill. The mismatch can cause an agent to overgeneralize the skill's authority and perform broader provider-side actions than intended, especially in autonomous settings where documentation is used as operational instruction.
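One way to keep an agent from reading generic cloud-operations guidance as authority is to make the skill's entry point refuse anything outside synthesis. The sketch below assumes a dispatcher-style entry point; the operation name `synthesize_speech` is hypothetical, since the skill's real interface is not shown in this report.

```python
from collections.abc import Callable, Mapping
from typing import Any

# Hypothetical operation name: the only action this skill should expose.
ALLOWED_OPERATIONS = frozenset({"synthesize_speech"})


def dispatch(operation: str, handlers: Mapping[str, Callable[..., Any]], **kwargs: Any) -> Any:
    """Run a handler only if the operation is inside the TTS scope."""
    if operation not in ALLOWED_OPERATIONS:
        # Generic cloud operations (queries, resource changes) are refused outright,
        # even if a handler for them happens to be registered.
        raise PermissionError(f"{operation!r} is outside this skill's TTS scope")
    return handlers[operation](**kwargs)
```

Checking against a fixed allowlist rather than the handler table means that registering an extra handler does not silently widen the skill's scope.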

Intent-Code Divergence

Medium

Confidence: 91%
Finding: Referring to operations as 'read-only or mutating' contradicts the documented interface, which only describes realtime text-to-speech synthesis. That inconsistency can mislead orchestration systems or users into treating the skill as capable of broader state-changing actions, increasing the chance of misuse or unsafe delegation beyond the intended TTS scope.
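If the skill is exposed as an MCP tool (the report's categories suggest this, but the source does not confirm it), its description and annotations can state the narrow scope explicitly instead of borrowing "read-only or mutating" language. The definition below is a hypothetical sketch: the field names follow the MCP tool shape (`name`, `description`, `inputSchema`, `annotations` with `readOnlyHint`, `destructiveHint`, `openWorldHint`), while the tool name and input properties are assumptions based on the inputs listed in the next finding.

```python
# Hypothetical tool definition; the skill's real schema is not shown in this report.
TTS_TOOL = {
    "name": "synthesize_speech",
    "description": (
        "Convert text to speech with Alibaba Cloud DashScope realtime TTS. "
        "Sends the text to Alibaba Cloud and writes an audio file locally. "
        "Does not query or modify any other cloud resources."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "voice": {"type": "string"},
            "instructions": {"type": "string"},
        },
        "required": ["text"],
    },
    "annotations": {
        "readOnlyHint": False,     # it does write a local audio file
        "destructiveHint": False,  # intended to create output, not delete or alter cloud resources
        "openWorldHint": True,     # it contacts an external service
    },
}
```

MCP treats annotations as hints rather than guarantees, so clients should still enforce scope themselves; the value here is that the declared intent no longer contradicts the TTS-only interface.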

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill instructs users to configure credentials and use a networked Alibaba Cloud TTS service but does not clearly disclose that input text, optional instructions, voice selections, and related metadata may be transmitted to a third-party cloud provider. In contexts where prompts contain sensitive or regulated information, this omission can lead to unintentional data exposure and noncompliant processing.
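A disclosure can be generated from the request itself and shown before anything is transmitted. This sketch assumes a request made of the fields the finding lists (text, voice, optional instructions); `TtsRequest`, `outbound_disclosure`, and `confirm_and_send` are hypothetical names, and `send` stands in for whatever function actually calls the provider.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TtsRequest:
    """Fields that leave the machine on each synthesis call, per the finding."""
    text: str
    voice: str
    instructions: Optional[str] = None


def outbound_disclosure(req: TtsRequest, provider: str = "Alibaba Cloud DashScope") -> str:
    """Describe, in plain language, what will be transmitted to the provider."""
    items = ["input text", f"voice selection ({req.voice})"]
    if req.instructions:
        items.append("instructions")
    return f"Sending to {provider}: {', '.join(items)}; text length {len(req.text)} characters."


def confirm_and_send(
    req: TtsRequest,
    send: Callable[[TtsRequest], Any],
    confirm: Callable[[str], bool],
) -> Any:
    """Show the disclosure and transmit only after explicit confirmation."""
    if not confirm(outbound_disclosure(req)):
        raise PermissionError("user declined to send text to the provider")
    return send(req)
```

For example, `outbound_disclosure(TtsRequest("hello", "example-voice"))` yields `"Sending to Alibaba Cloud DashScope: input text, voice selection (example-voice); text length 5 characters."`.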

VirusTotal

All 66 of 66 vendors reported this skill as clean.
