Security audit

MiniMax TTS for FeiShu

Security checks across malware telemetry and agentic risk

Overview

This skill does what it says: it uses MiniMax to generate speech and Feishu to deliver audio, with privacy-sensitive chat and credential handling that users should configure carefully.

Install this only if you want a Feishu-connected TTS bot. Use minimally scoped Feishu app credentials, set the recipient deliberately, avoid converting confidential chat content, and remember that recent message text may be cached locally in /tmp and sent to MiniMax and Feishu during normal operation.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (19)

Tainted flow: 'VOICES_MAP_PATH' from os.environ.get (line 13, credential/environment) → open (file write)

Medium

Category: Data Flow
Content: *本文件由 update_voices_map.py 自动更新* """ with open(VOICES_MAP_PATH, "w", encoding="utf-8") as f: f.write(content) print(f"✅ 已更新 {VOICES_MAP_PATH}")
Confidence: 76%
Finding: with open(VOICES_MAP_PATH, "w", encoding="utf-8") as f:
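One way to constrain this flow is to resolve the environment-supplied path and refuse to write outside a known directory. The sketch below is a minimal illustration under assumptions: `resolve_voices_map_path`, its `allowed_dir` parameter, and the default file name are hypothetical and do not appear in the skill; only the `VOICES_MAP_PATH` variable name comes from the finding.

```python
import os
from pathlib import Path


def resolve_voices_map_path(allowed_dir: str, default_name: str = "voices_map.md") -> Path:
    """Return a write target that sits inside allowed_dir, or raise ValueError.

    Hypothetical helper: the audited script writes to VOICES_MAP_PATH unchecked.
    The default file name is an assumption made for this sketch.
    """
    base = Path(allowed_dir).resolve()
    raw = os.environ.get("VOICES_MAP_PATH", str(base / default_name))
    candidate = Path(raw).resolve()
    # Reject any environment-supplied path that escapes the allowed directory.
    if base not in candidate.parents:
        raise ValueError(f"VOICES_MAP_PATH escapes {base}: {candidate}")
    return candidate
```

A caller would open the returned path for writing in place of the raw environment value, so a misconfigured or hostile `VOICES_MAP_PATH` fails early instead of overwriting an arbitrary file.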

Lp3

Medium

Category: MCP Least Privilege
Confidence: 94%
Finding: The skill documentation declares no permissions while the described functionality clearly requires environment access, local file read/write, and network communication to MiniMax and Feishu. This is dangerous because it hides the true trust boundary from users and reviewers, making data exfiltration and filesystem side effects easier to overlook.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The top-level description presents the skill as simple TTS, but the documented behavior includes message retrieval, local chat-message storage, outbound transmission to third-party services, Feishu message sending, voice design, and local metadata updates. This mismatch is security-relevant because users may invoke it expecting only text-to-speech while the skill accesses prior conversation content and persists data locally.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The skill metadata declares FEISHU_APP_ID and FEISHU_APP_SECRET as required even though the stated purpose is MiniMax text-to-speech. Requiring unrelated credentials broadens the secret exposure surface and could allow the skill or its dependencies to access Feishu resources without a clear functional need, which is especially suspicious in a minimal metadata file with no documented justification.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The declared skill purpose is text-to-speech, but the code also obtains Feishu tokens, uploads generated audio, and sends messages externally. This hidden capability materially changes the trust boundary and can cause unintended data disclosure or unauthorized outbound messaging if users expect only local or direct TTS generation.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill loads Feishu application credentials and recipient-related data despite being described as a TTS skill. Collecting and using extra credentials beyond the stated purpose increases blast radius and enables unauthorized external communications if the skill is invoked unexpectedly or misused.

Description-Behavior Mismatch

Low

Confidence: 88%
Finding: The script persists chat text to a predictable file in /tmp for later reuse, which can expose potentially sensitive message content to other local processes or users depending on host configuration. Because the stored content may later be converted to speech without fresh user confirmation, the skill context increases privacy risk beyond simple transient processing.
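The exposure described here comes from the file name being predictable and the location being shared. A minimal sketch of one alternative, assuming the cache only needs to be read back by the same user, is shown below; `write_private_cache` and the `tts_cache_` prefix are hypothetical names, not taken from the skill.

```python
import os
import tempfile


def write_private_cache(text: str) -> str:
    """Write text to a freshly created temp file and return its path.

    Hypothetical helper for illustration only.
    """
    # mkstemp creates the file with an unpredictable name, readable and
    # writable only by the creating user on POSIX systems.
    fd, path = tempfile.mkstemp(prefix="tts_cache_", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

The caller has to remember the returned path and delete the file once the audio has been generated; the sketch does not handle cleanup.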

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: The skill can fetch the body of an arbitrary replied-to Feishu message, expanding its access from user-supplied text to chat history content. In a TTS skill, this broader message-retrieval capability is more sensitive because it can speak or forward content the current user did not explicitly re-enter, creating privacy and authorization concerns if message scope is not constrained.

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: This module does more than local text-to-speech generation: it creates new voices via a remote API and then uploads/sends generated audio through Feishu. That broader behavior increases data-sharing and action scope beyond what a user would reasonably expect from a TTS skill, which can lead to unintended disclosure of user content or generated media.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: Uploading generated audio to Feishu introduces an outbound communication channel unrelated to core TTS generation. If users provide sensitive text, the resulting audio may be disclosed to an external messaging platform without clear expectation or consent.

Vague Triggers

Medium

Confidence: 89%
Finding: The README advertises a conversational trigger phrase, "转语音" ("convert to speech"), but does not define clear activation boundaries, authorization checks, or exclusions for quoted/mentioned text. In a chat-integrated skill that can send generated audio to Feishu, an overly broad trigger increases the chance of unintended activation, causing recent messages to be converted and delivered without explicit user intent.

Vague Triggers

Medium

Confidence: 90%
Finding: The Chinese documentation repeats that saying "转语音" ("convert to speech") can convert recent messages to speech, but it still omits scope limits, consent requirements, and edge cases. Because the skill sends audio to a specified Feishu Open ID and caches the last message, ambiguous triggering could expose chat content or cause spammy or unintended audio delivery.

Vague Triggers

Medium

Confidence: 82%
Finding: Broad conversational triggers such as '说一遍' ("say it once") or '语音播放' ("voice playback") can be used in ordinary discussion and may unintentionally activate the skill. In this skill's context, accidental activation is more dangerous because it can retrieve prior message content and send it to external services and Feishu without a clearly deliberate command.
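The difference between a broad and a deliberate trigger can be sketched as follows. The audit does not show how the skill actually matches its trigger phrases, so both functions are hypothetical, and the `/` command prefix is an assumption chosen only for illustration.

```python
# Trigger phrases quoted in the findings above.
TRIGGERS = ("转语音", "说一遍", "语音播放")


def is_loose_trigger(message: str) -> bool:
    """Substring match: fires whenever a trigger phrase appears anywhere."""
    return any(t in message for t in TRIGGERS)


def is_strict_trigger(message: str, prefix: str = "/") -> bool:
    """Fire only when the whole message is an explicit command, e.g. '/转语音'."""
    text = message.strip()
    return text.startswith(prefix) and text[len(prefix):] in TRIGGERS
```

With the loose matcher, an ordinary sentence that happens to contain '说一遍' activates the skill; the strict matcher ignores it and also ignores a bare phrase without the prefix.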

Missing User Warnings

Medium

Confidence: 97%
Finding: The workflow states that the skill fetches replied or recent messages, preprocesses them, sends them to the MiniMax API, and then sends audio to Feishu, but it does not warn users that message content leaves the local context and is reused. This creates a privacy and data-handling vulnerability because sensitive chat content may be transmitted to external providers or other recipients without informed consent.

Missing User Warnings

Medium

Confidence: 97%
Finding: User-provided text is transmitted to an external TTS provider with no visible disclosure, warning, or consent check in the execution path. If the text contains sensitive or regulated content, this can violate user expectations, internal policy, or compliance requirements.

Missing User Warnings

Medium

Confidence: 98%
Finding: The generated audio file is uploaded to Feishu without a clear user-facing indication that content leaves the local workflow and is sent to a messaging platform. This creates a secondary disclosure channel for potentially sensitive material and is more dangerous because it is outside the advertised TTS-only scope.

Missing User Warnings

Medium

Confidence: 91%
Finding: The function forwards message-derived text to downstream TTS processing and also fetches content from Feishu APIs, but there is no explicit consent, warning, or data-handling notice in code or flow. In a chat skill, message text may contain sensitive personal or business information, so sending it to external processing without clear disclosure creates a real privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 88%
Finding: User-supplied prompt and preview text are transmitted to a remote API, which can expose sensitive content if users assume processing is local or limited to synthesis only. In the context of a TTS skill, undisclosed third-party transmission materially affects privacy expectations.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill uploads generated audio to Feishu without any visible warning or consent control in this file. That can disclose user-derived content to a third party, especially problematic because the skill's stated purpose is TTS rather than messaging or file sharing.
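The missing-warning findings above share one shape: content leaves the local context with no disclosure step in the execution path. A minimal sketch of such a step, assuming the host can ask the user a yes/no question, might look like this; `send_with_consent` and its `confirm` and `send` callbacks are hypothetical and are not part of the skill or of any MiniMax or Feishu API.

```python
from typing import Callable, Sequence


def send_with_consent(
    text: str,
    destinations: Sequence[str],
    confirm: Callable[[str], bool],
    send: Callable[[str], None],
) -> bool:
    """Disclose where the text is going and send only if the user agrees.

    Hypothetical gate: confirm() shows a notice and returns the user's answer,
    send() stands in for whatever actually transmits the text.
    """
    notice = "This text will be sent to: " + ", ".join(destinations) + ". Continue?"
    if not confirm(notice):
        return False  # Nothing leaves the local context.
    send(text)
    return True
```

In use, `destinations` would name the external services involved (for this skill, MiniMax and Feishu), so the notice states the real trust boundary before anything is transmitted.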

VirusTotal

65/65 vendors reported this skill as clean.


Static analysis

Detected: suspicious.dangerous_exec

Shell command execution detected (child_process).

Critical

Code: suspicious.dangerous_exec
Location: index.js:19