Security audit

XunFei Voice Reply

Security checks across malware telemetry and agentic risk

Overview

This skill appears to do what it claims: convert replies to Xunfei-generated audio and send them through Feishu, with some privacy and implementation cautions.

Install only if you want replies processed by Xunfei and delivered through Feishu. Use dedicated Xunfei credentials, avoid voice mode for sensitive content unless approved, install ffmpeg/ws from trusted sources, and remember that voice mode persists in USER.md until switched back to text.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (9)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 85%
Finding: The skill reads external service credentials from environment variables (`XUNFEI_APP_ID`, `XUNFEI_API_KEY`, `XUNFEI_API_SECRET`) but does not declare permissions for that access or clearly scope it. This weakens reviewability and can lead to unintended secret exposure, or to network-backed functionality running without explicit operator awareness.
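
One way to make the credential access explicit and reviewable is to read the three variables in a single place and fail early if any is missing. The following is a minimal sketch; `loadXunfeiCredentials` and `REQUIRED_VARS` are hypothetical names, not part of the audited skill.

```javascript
// Hypothetical helper: reads only the three credentials named in the finding.
const REQUIRED_VARS = ["XUNFEI_APP_ID", "XUNFEI_API_KEY", "XUNFEI_API_SECRET"];

function loadXunfeiCredentials(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report variable names only, never their values.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Freeze the result so later code cannot silently widen or alter it.
  return Object.freeze({
    appId: env.XUNFEI_APP_ID,
    apiKey: env.XUNFEI_API_KEY,
    apiSecret: env.XUNFEI_API_SECRET,
  });
}
```

Keeping the variable list in one constant also gives reviewers a single line to compare against whatever permissions the skill declares.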

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The code builds a shell command for `execSync` by interpolating the unquoted `outputPath` directly into the ffmpeg command line. If `outputPath` can be influenced by upstream input, an attacker could inject shell metacharacters and execute arbitrary local commands. Local media conversion is plausible for a voice-reply skill, but performing it through shell string interpolation adds risk that the use case does not require.
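
A common mitigation for this pattern is to pass arguments as an array to `execFileSync`, which does not spawn a shell by default. The sketch below illustrates the idea; the function names are hypothetical, and the PCM parameters (16-bit little-endian, 16 kHz, mono) are assumptions rather than values taken from the audited code.

```javascript
const { execFileSync } = require("node:child_process");
const path = require("node:path");

// Build the ffmpeg argument vector. Each value is its own array element,
// so no quoting or escaping is needed. Audio parameters are assumed.
function buildFfmpegArgs(pcmPath, outputPath) {
  return [
    "-y",
    "-f", "s16le",
    "-ar", "16000",
    "-ac", "1",
    // Resolving to absolute paths keeps a leading "-" from being read as an option.
    "-i", path.resolve(pcmPath),
    path.resolve(outputPath),
  ];
}

function convertPcm(pcmPath, outputPath) {
  // No shell is involved, so characters such as ";" or "$(...)" in
  // outputPath reach ffmpeg as literal characters of a single argument.
  execFileSync("ffmpeg", buildFfmpegArgs(pcmPath, outputPath), { stdio: "ignore" });
}
```

This removes the shell injection vector but does not by itself restrict where the output file may be written; that needs a separate path check.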

Vague Triggers

Medium

Confidence: 80%
Finding: The trigger phrase "语音模式" ("voice mode") lacks clear activation boundaries and may be mentioned descriptively rather than as a command. In this skill, accidental triggering is more dangerous because it persists state in `USER.md` and changes subsequent responses to audio delivery through external systems.

Vague Triggers

Medium

Confidence: 80%
Finding: The trigger phrase "语音模式" ("voice mode") lacks clear activation boundaries and may be mentioned descriptively rather than as a command. In this skill, accidental triggering is more dangerous because it persists state in `USER.md` and changes subsequent responses to audio delivery through external systems.
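
To illustrate what a clearer activation boundary could look like, the sketch below switches mode only when the whole message is a known command, and ignores messages that merely mention the phrase. The command strings and the `parseModeCommand` name are hypothetical; the skill's actual trigger list is not shown in this report.

```javascript
// Hypothetical command forms ("切换到语音模式" = "switch to voice mode",
// "切换到文字模式" = "switch to text mode").
const MODE_COMMANDS = new Map([
  ["/voice", "voice"],
  ["/text", "text"],
  ["切换到语音模式", "voice"],
  ["切换到文字模式", "text"],
]);

function parseModeCommand(message) {
  // Only a message consisting of nothing but a known command switches mode.
  // A sentence that merely contains the phrase returns null.
  const normalized = message.trim();
  return MODE_COMMANDS.get(normalized) ?? null;
}
```

With this shape, a question such as "什么是语音模式?" ("what is voice mode?") does not change any persisted state.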

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill description and flow do not prominently warn that reply text is transmitted to an external TTS service and that generated audio is then sent to Feishu. This omission undermines informed consent and can expose sensitive user content to third parties when voice mode is enabled.

Missing User Warnings

Medium

Confidence: 87%
Finding: The code writes temporary PCM data to disk, converts it with a subprocess, and deletes the temp file without any path safety checks or robust cleanup handling. If `outputPath` is attacker-controlled, this can lead to arbitrary file overwrites, unsafe temp-file handling, and operational exposure from local file manipulation. Combined with the shell usage, the practical risk increases.
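
The two concerns here, path containment and cleanup, can be sketched separately. In the example below, `resolveInside` rejects output paths that escape an allowed directory, and `withTempPcm` creates a private temp directory and removes it even if the callback throws. Both names are hypothetical, and the sketch does not resolve symlinks, which a hardened version would need to consider.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Resolve candidate against baseDir and reject anything that escapes it.
function resolveInside(baseDir, candidate) {
  const base = path.resolve(baseDir);
  const resolved = path.resolve(base, candidate);
  const rel = path.relative(base, resolved);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (rel === "" || escapes) {
    throw new Error("Output path must be a file inside the allowed directory");
  }
  return resolved;
}

// Run fn with a file path inside a fresh private temp directory,
// then always remove the directory, whether fn returns or throws.
function withTempPcm(fn) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "tts-"));
  try {
    return fn(path.join(dir, "audio.pcm"));
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

A caller would validate the destination with `resolveInside` first, then write and convert the PCM data inside the `withTempPcm` callback.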

Missing User Warnings

Medium

Confidence: 92%
Finding: The document explicitly describes automatically editing `USER.md` to switch reply mode and persist that change across sessions, but it does not mention user confirmation, auditability, or the fact that this alters a durable preference file. Persistent configuration changes without clear warning can surprise users, create unintended long-term behavior, and be abused to keep the agent in a mode the user did not knowingly authorize.
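
As an illustration of confirmation plus auditability, the sketch below refuses to change the persisted mode without an explicit confirmation flag and returns an audit entry alongside the updated content. The `reply_mode:` line format and the `applyModeChange` name are assumptions; the real layout of `USER.md` is not shown in this report.

```javascript
// Assumption: USER.md stores the mode on a line like "reply_mode: voice".
function applyModeChange(content, newMode, { confirmed, now = new Date() }) {
  if (!confirmed) {
    throw new Error("Mode change requires explicit user confirmation");
  }
  if (newMode !== "voice" && newMode !== "text") {
    throw new Error(`Unknown mode: ${newMode}`);
  }
  const line = `reply_mode: ${newMode}`;
  // Replace the existing line if present, otherwise append one.
  const updated = /^reply_mode: .*$/m.test(content)
    ? content.replace(/^reply_mode: .*$/m, line)
    : `${content.trimEnd()}\n${line}\n`;
  // The audit entry is returned rather than written, so the caller
  // decides where to record it.
  const audit = `${now.toISOString()} reply_mode set to ${newMode}`;
  return { updated, audit };
}
```

Because the function is pure, the file write and the log write can be reviewed and tested independently of the decision logic.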

Missing User Warnings

Medium

Confidence: 88%
Finding: The flow describes generating audio via the iFlytek (Xunfei) TTS API and sending the resulting audio to Feishu, which means user content is transmitted to external services and across channels. Without an explicit privacy/transmission warning, users may not understand that their text is being shared with third parties or converted into audio artifacts outside the local environment.

Missing User Warnings

Medium

Confidence: 94%
Finding: The script sends arbitrary user-provided text to an external TTS provider via `XunfeiTTS().synthesize(...)` but gives no notice, consent check, or data-classification guard before transmission. In a messaging skill, replies may contain sensitive user content, so silent export of message text to a third-party service creates a real privacy and compliance risk even if it is expected for functionality.
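
A guard of the kind this finding asks for could sit in front of the `synthesize(...)` call. The sketch below checks a consent flag and a couple of illustrative patterns before text leaves the local environment. `checkBeforeTts` and the patterns are hypothetical, and simple regexes are only a stand-in for a real data-classification policy.

```javascript
// Illustrative patterns only; a real deployment needs its own classification rules.
const SENSITIVE_PATTERNS = [
  // Runs of 13 to 16 digits, optionally separated, such as card numbers.
  /\b(?:\d[ -]?){13,16}\b/,
  // Inline credentials such as "password: ..." or "api_key=...".
  /\b(?:api[_-]?key|secret|password)\b\s*[:=]/i,
];

function checkBeforeTts(text, { consentGiven }) {
  if (!consentGiven) {
    return { allowed: false, reason: "no-consent" };
  }
  if (SENSITIVE_PATTERNS.some((pattern) => pattern.test(text))) {
    return { allowed: false, reason: "sensitive-content" };
  }
  return { allowed: true, reason: null };
}
```

When the guard returns `allowed: false`, the skill could fall back to a plain text reply instead of sending the content to the external TTS provider.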

VirusTotal

66/66 vendors flagged this skill as clean.
