moss-tts-family-chatbot

Security checks across malware telemetry and agentic risk

Overview

This is a legitimate MOSI audio skill, but it warrants review because it can clone voices, upload potentially sensitive audio and text to MOSI, and send Feishu voice messages using app credentials.

Review before installing. Use only with MOSI and Feishu accounts you trust, configure least-privilege Feishu credentials, confirm the target chat before sending voice bubbles, and avoid sending confidential recordings or text unless MOSI cloud processing is acceptable. Do not use the voice-cloning workflow unless you have explicit rights and consent from the speaker.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (13)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 85%
Finding: The skill exercises sensitive capabilities including shell execution, network access, environment-variable access, and file writes, yet declares no permissions. This undermines least-privilege controls and informed review because operators may enable the skill without realizing it can call external APIs, access secrets such as API keys, and write media files locally.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 81%
Finding: The documented behavior extends beyond the stated purpose by including Feishu token retrieval and message sending, voice inventory operations, and additional generation modes not clearly reflected in the top-level declaration. Capability creep increases attack surface and can surprise users or administrators, especially where external messaging APIs and credentialed actions are involved.

Context-Inappropriate Capability

Medium

Confidence: 80%
Finding: The skill uses Feishu app credentials to obtain tenant tokens, upload files, and send audio messages to external recipients. That is a credentialed outbound messaging capability beyond simple local audio processing, and if misused it could send data to unintended chats or abuse organization messaging privileges.
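One mitigation for this finding, sketched in bash under stated assumptions: before any Feishu send call, check the destination chat ID against an operator-maintained allowlist. The variable FEISHU_ALLOWED_CHATS and the function is_allowed_chat are hypothetical. They are not part of the skill or of the Feishu API.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of the skill: permit sends only to chat IDs that an
# operator listed in FEISHU_ALLOWED_CHATS (an assumed, comma-separated variable).
is_allowed_chat() {
  local chat_id="$1"
  # Reject an empty target outright.
  [[ -n "$chat_id" ]] || return 1
  # Wrap both sides in commas so an ID cannot match as a prefix of a longer one
  # (e.g. "oc_a" does not match an allowlist entry "oc_ab").
  [[ ",${FEISHU_ALLOWED_CHATS:-}," == *",${chat_id},"* ]]
}
```

A send step would then run something like `is_allowed_chat "$CHAT_ID" || exit 1` first, where CHAT_ID is a hypothetical variable holding the destination. An empty or unset allowlist blocks every send, which fails closed.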

Missing User Warnings

Medium

Confidence: 88%
Finding: The skill encourages voice cloning and even mentions celebrity-voice scenarios without any consent, authorization, or anti-impersonation safeguards. In context, this materially raises abuse risk: cloned or imitated voices can facilitate fraud, harassment, impersonation, and privacy violations.

Missing User Warnings

Medium

Confidence: 77%
Finding: The skill mandates immediate transcription of inbound voice attachments without any user-facing privacy notice or confirmation. Because voice messages may contain sensitive personal or business information, automatic processing and transmission to an external ASR service increases privacy and compliance risk, especially in workplace chat contexts.

Missing User Warnings

Medium

Confidence: 95%
Finding: The API reference documents voice cloning and audio upload workflows without any mention of consent, biometric sensitivity, retention, deletion, or acceptable-use restrictions. In this skill’s context, that omission is security-relevant because voice data is biometric and the surrounding skill explicitly promotes voice cloning, which increases the risk of non-consensual cloning, impersonation, and privacy abuse by downstream integrators.

Missing User Warnings

Medium

Confidence: 94%
Finding: The script uploads the user-provided audio file to a third-party service at studio.mosi.cn for transcription, but it provides no user-facing notice, consent check, or privacy warning before transmitting potentially sensitive voice content. In this skill context, inbound voice messages may contain personal data, secrets, or regulated content, so silent exfiltration to a remote service creates a real privacy and compliance risk.
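A minimal consent gate, run before the upload, would address this finding. The sketch below is in bash. The function confirm_upload and the opt-in variable MOSI_UPLOAD_CONSENT are hypothetical and are not part of the skill. The prompt defaults to "no", so an unattended agent run with no terminal input cannot upload silently.

```shell
#!/usr/bin/env bash
# Hypothetical consent gate, not part of the skill. Returns 0 only if uploads were
# pre-approved (MOSI_UPLOAD_CONSENT=yes, an assumed variable) or the user answers
# y/yes (any case). Any other answer, or no input at all, means "do not upload".
confirm_upload() {
  local path="$1" host="$2" reply=""
  if [[ "${MOSI_UPLOAD_CONSENT:-}" == "yes" ]]; then
    return 0
  fi
  # read shows the -p prompt only when stdin is a terminal. At EOF it fails and
  # reply stays empty, which falls through to "no".
  read -r -p "Upload ${path##*/} to $host for processing? [y/N] " reply || true
  [[ "$reply" == [yY] || "$reply" == [yY][eE][sS] ]]
}
```

For example, `confirm_upload "$AUDIO_FILE" studio.mosi.cn || exit 1` would sit immediately before the curl upload, where AUDIO_FILE is a hypothetical variable naming the recording.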

Missing User Warnings

Medium

Confidence: 90%
Finding: The script transmits user-provided dialogue text to a third-party remote service at studio.mosi.cn, which can expose sensitive or private content if users are unaware that their input leaves the local environment. In a voice/TTS skill this behavior is functionally expected, but the lack of explicit disclosure or consent still creates a real privacy and data-handling risk.

Missing User Warnings

Low

Confidence: 72%
Finding: The script writes synthesized audio to a persistent file under the user's home directory, which may leave potentially sensitive generated speech on disk without the user's awareness. This is a lower-severity issue because file output is core to the tool's purpose, but undisclosed persistence can still create privacy and forensic exposure.
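The following bash sketch shows one way to limit that persistence, assuming a standard mktemp. It writes output into a private temporary directory and deletes it when the script exits unless the user opts to keep it. OUTDIR, OUTFILE, KEEP_AUDIO, and the file name output.wav are hypothetical. The skill's actual output path is not shown in this report.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to a fixed output path under $HOME, not part of the skill.
# mktemp -d creates a fresh directory accessible only to the current user (mode 0700).
OUTDIR=$(mktemp -d "${TMPDIR:-/tmp}/moss-tts.XXXXXX") || exit 1
OUTFILE="$OUTDIR/output.wav"   # hypothetical file name

cleanup() {
  if [[ "${KEEP_AUDIO:-}" == "yes" ]]; then
    # Opt-in persistence: tell the user where the audio was left.
    echo "Audio kept at $OUTFILE" >&2
  else
    rm -rf -- "$OUTDIR"
  fi
}
# Run cleanup whenever the shell exits; a process killed with SIGKILL skips it.
trap cleanup EXIT
```

Keeping the directory opt-in makes persistence a visible choice rather than a side effect, which is the gap this finding describes.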

Missing User Warnings

Medium

Confidence: 93%
Finding: The script sends the user-provided prompt directly to a third-party remote API without any explicit consent prompt, warning, or data-classification check. If users place sensitive information in the prompt, that content is disclosed off-system, which is especially risky in an agent setting where users may not realize backend transmission occurs.

Missing User Warnings

Medium

Confidence: 88%
Finding: The script uploads user-provided audio to a third-party remote service, but it provides no explicit notice, consent check, or privacy warning before transmitting potentially sensitive voice data. In a skill centered on ASR/TTS/voice cloning, audio may contain biometric identifiers or private speech, so silent external transmission creates a real privacy and compliance risk even if the behavior is functionally intended.

External Transmission

Medium

Category: Data Exfiltration
Content:
  echo "Generating dialogue..."
  curl -sf -X POST \
    "https://studio.mosi.cn/api/v1/audio/speech" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
Confidence: 92%
Finding:
  curl -sf -X POST \
    "https://studio.mosi.cn/api/v1/audio/speech" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d

External Transmission

Medium

Category: Data Exfiltration
Content:
  echo "Generating sound effect: ${PROMPT:0:60}..."
  curl -sf -X POST \
    "https://studio.mosi.cn/api/v1/audio/sound-effect" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
Confidence: 91%
Finding:
  curl -sf -X POST \
    "https://studio.mosi.cn/api/v1/audio/sound-effect" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d

VirusTotal

64/64 vendors rated this skill as clean.
