Security audit

ifly-hyper-tts

Security checks across malware telemetry and agentic risk

Overview

This is a coherent text-to-speech skill that uses iFlytek credentials to send chosen text to iFlytek and save an audio file.

Install only if you are comfortable using iFlytek/Xfei for speech synthesis. Use a dedicated scoped API key if possible, avoid sending secrets or regulated/confidential text, and install websocket-client from a trusted package source.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (4)

Missing User Warnings

Medium

Confidence: 91%
Finding: The README instructs users to configure API credentials and send text to a remote iFlytek TTS service, but it does not clearly disclose that the provided text content will be transmitted to a third-party provider for processing. In an agent skill context, users may submit sensitive prompts, documents, or personal data for speech synthesis, so the missing disclosure creates a privacy and compliance risk rather than a code-execution issue.

Vague Triggers

Medium

Confidence: 76%
Finding: The listed user instruction examples are broad natural-language phrases like '把这段文案读出来' ("read this copy aloud"), which can easily appear in ordinary conversation and cause the skill to activate unexpectedly. Accidental activation matters here because the skill can send arbitrary supplied text to an external service and write output files, potentially causing privacy leakage or unintended API usage and cost.

Vague Triggers

Medium

Confidence: 82%
Finding: The activation rule uses keyword matching for common phrases such as '读出来' ("read it out"), '念出来' ("read it aloud"), '转语音' ("convert to speech"), and '文字转语音' ("text to speech") without clear boundaries on whether the user actually wants an external TTS service invoked. Because this skill reads text or files and transmits content over the network, broad matching increases the chance of unintended execution, data disclosure to the provider, and unnecessary billable requests.
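
One possible way to narrow activation is to require an explicit provider or TTS marker alongside a keyword. The following is a minimal Python sketch, not the skill's actual logic: the function name should_activate and the EXPLICIT_MARKERS list are hypothetical, and only the four keyword phrases come from the finding above.

```python
# Trigger phrases quoted in the finding above.
TTS_KEYWORDS = ("读出来", "念出来", "转语音", "文字转语音")

# Hypothetical markers that signal the user wants the external service.
EXPLICIT_MARKERS = ("iflytek", "讯飞", "tts")


def should_activate(message: str) -> bool:
    """Return True only when a keyword AND an explicit marker both appear."""
    lowered = message.lower()
    has_keyword = any(k in message for k in TTS_KEYWORDS)
    has_marker = any(m in lowered for m in EXPLICIT_MARKERS)
    return has_keyword and has_marker
```

Under this sketch, a bare phrase such as '把这段文案读出来' would not activate the skill on its own, while the same phrase with an explicit mention of the provider would. The trade-off is more friction for users who do want speech output.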

Missing User Warnings

Medium

Confidence: 84%
Finding: The skill transmits user-supplied text to a third-party WebSocket endpoint for synthesis, but it provides no explicit warning, consent flow, or safeguards for sensitive content. In an agent skill context, users may provide private, regulated, or confidential text expecting local processing, so silent off-device transmission can cause privacy breaches and compliance issues.
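
A safeguard of the kind this finding calls for could take the form of a pre-send gate. The sketch below is an illustration under assumptions, not code from the skill: the names looks_sensitive and confirm_transmission and the two regular expressions are hypothetical, and a real deployment would need a policy suited to its own data.

```python
import re

# Hypothetical patterns for text that should not leave the device.
SENSITIVE_PATTERNS = [
    # Credential-like words such as "password" or "api_key".
    re.compile(r"(?i)\b(api[_-]?key|password|secret|token)\b"),
    # Long digit runs, e.g. card-like numbers.
    re.compile(r"\b\d{13,19}\b"),
]


def looks_sensitive(text: str) -> bool:
    """Return True if any hypothetical sensitive pattern matches."""
    return any(p.search(text) for p in SENSITIVE_PATTERNS)


def confirm_transmission(text: str, user_consented: bool) -> bool:
    """Allow sending only with explicit consent and no sensitive match."""
    if not user_consented:
        return False
    return not looks_sensitive(text)
```

Pattern matching of this kind catches only obvious cases, so it would complement an explicit notice that the text is sent to a third party rather than replace it.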

VirusTotal

66 of 66 vendors reported this skill as clean.


Static analysis

No suspicious patterns detected.