MiMo Voice Assistant

Security checks across malware telemetry and agentic risk

Overview

This voice assistant skill shows evidence of intentionally hiding file-access behavior from scanners while sending local voice data to an external service.

Do not install this version. Require a clean, static-analyzable release with scanner-evasion code and wording removed, clear documentation of the API domain and data retention, explicit consent for voice cloning and off-device audio processing, and user-controlled limits on what local files can be read and sent.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (8)

Intent-Code Divergence

Medium

Confidence: 99%
Finding: The release notes explicitly state the code was changed to remove flagged keywords, avoid static analysis through dynamic imports, and advertise the absence of specific APIs in both code and comments. That is not normal product documentation; it is a strong indicator of deliberate scanner evasion and materially increases the likelihood that harmful behavior is being concealed elsewhere in the skill.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The comment explicitly says the dynamic import is used to avoid static-analysis flagging, which is a concealment indicator rather than a technical necessity. While dynamic import of fs is not inherently dangerous, documenting evasion of security tooling undermines trust and can hide file-access behavior from reviewers and automated scanners.
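To illustrate why a dynamic import can slip past a naive keyword scanner, the sketch below lists every import(...) call in a JavaScript source string and marks whether its argument is a plain string literal. The function name and the regular expressions are illustrative assumptions, not part of SkillSpector or of the skill itself, and a real reviewer would use a proper parser rather than regexes.

```python
import re
from typing import List, Tuple

# Assumed heuristic: match import(<anything up to the first ")">).
DYNAMIC_IMPORT = re.compile(r"\bimport\s*\(\s*([^)]*)\)")
# A quoted string with nothing else around it, e.g. "fs" or 'node:fs'.
STRING_LITERAL = re.compile(r"""(['"])[^'"]+\1""")


def find_dynamic_imports(source: str) -> List[Tuple[str, bool]]:
    """Return (argument_text, is_plain_literal) for each import(...) call."""
    results = []
    for match in DYNAMIC_IMPORT.finditer(source):
        arg = match.group(1).strip()
        results.append((arg, STRING_LITERAL.fullmatch(arg) is not None))
    return results


if __name__ == "__main__":
    sample = 'import path from "path";\nconst m = await import("f" + "s");\n'
    for arg, literal in find_dynamic_imports(sample):
        print(arg, "literal" if literal else "computed")
```

A computed argument is the case a reviewer would want to inspect by hand, because the module name never appears as a single searchable token.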

Missing User Warnings

Medium

Confidence: 89%
Finding: The README promotes voice cloning from reference audio without any warning about consent, impersonation, biometric privacy, or misuse. In a voice-assistant skill, this omission can normalize unsafe deployment patterns and enable unauthorized cloning of a person's voice for social engineering, fraud, or privacy violations.

Natural-Language Policy Violations

Medium

Confidence: 76%
Finding: The skill instructs the agent to automatically detect a user's language and mirror it in both text and voice without explicit consent. In a voice-assistant skill, this can create privacy and safety issues, such as inferring sensitive attributes, surprising users with unintended language switching, or producing speech output in contexts where the chosen language carries operational or social risk.

Missing User Warnings

Medium

Confidence: 97%
Finding: The script reads a local audio file, base64-encodes it, and sends the full contents to a remote API endpoint, which may expose sensitive voice data, background speech, or other personal information. In a voice-assistant skill, this is especially significant because users may reasonably expect local handling unless remote transmission is clearly disclosed and consented to.
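A minimal sketch of what a disclosed, consent-gated version of this flow could look like is shown here. It only builds the request payload and does not send anything. The function name, the allowed-directory parameter, the suffix list, and the payload field names are all hypothetical and are not taken from the skill's code or from any API it calls.

```python
import base64
from pathlib import Path

# Assumed allow-list of audio file types; adjust to the deployment.
ALLOWED_SUFFIXES = {".wav", ".mp3", ".ogg"}


def build_audio_payload(path, allowed_dir, consent_given):
    """Base64-encode an audio file only if the user consented and the
    file sits inside the directory the user chose to share."""
    if not consent_given:
        raise PermissionError("no consent for off-device audio processing")

    resolved = Path(path).resolve()
    root = Path(allowed_dir).resolve()
    # resolve() collapses ".." and symlinks before the containment check.
    if root not in resolved.parents:
        raise PermissionError(f"{resolved} is outside {root}")
    if resolved.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {resolved.suffix}")

    encoded = base64.b64encode(resolved.read_bytes()).decode("ascii")
    return {"filename": resolved.name, "audio_base64": encoded}
```

Keeping the consent check and the path check in the one function that reads the file means there is a single place for a reviewer to confirm what can leave the device.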

Missing User Warnings

Medium

Confidence: 89%
Finding: The documentation recommends automatic voice replies and routes content through a configured TTS provider, but it does not clearly warn that user message content may be transmitted to a separate service for synthesis. In a voice-assistant skill, this creates a real privacy and consent risk because users and operators may unknowingly send sensitive audio-derived text to an external processor.

Missing User Warnings

Medium

Confidence: 87%
Finding: The curl example includes both message text and a bearer token sent to the local proxy, yet the guide gives no warning about handling sensitive content or protecting credentials. Even though the endpoint is localhost, the proxy may forward data upstream to a third-party API, so the omission can lead to accidental exposure of secrets and private user content.
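One way a guide could model safer credential handling is to read the bearer token from the environment instead of pasting it into a command, and to redact it before anything is logged. The sketch below assumes a hypothetical environment variable name, MIMO_PROXY_TOKEN, which does not appear in the skill's documentation.

```python
import os


def auth_header(env_var="MIMO_PROXY_TOKEN", environ=None):
    """Build an Authorization header from an environment variable.

    The variable name is an assumption made for this example.
    """
    env = os.environ if environ is None else environ
    token = env.get(env_var)
    if not token:
        raise RuntimeError(f"{env_var} is not set")
    return {"Authorization": f"Bearer {token}"}


def redact(headers):
    """Return a copy of headers that is safe to print or log."""
    return {
        key: "Bearer ***" if key.lower() == "authorization" else value
        for key, value in headers.items()
    }
```

The token then stays out of shell history and copied examples, and log output shows only the redacted form.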

Ssd 2

High

Confidence: 99%
Finding: The changelog frames implementation choices in terms of avoiding scanners rather than improving functionality, including hiding imports from static analysis and removing terms that trigger review. In security analysis, this is a severe red flag because it signals conscious concealment behavior, making the surrounding skill substantially more dangerous even if the specific hidden payload is not visible in this file.
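As a rough illustration of how this kind of wording could be surfaced for human review, the sketch below flags changelog lines that describe avoiding scanners or removing flagged terms. The pattern list is a hypothetical heuristic written for this example. It is not SkillSpector's rule set, and it would miss paraphrases.

```python
import re
from typing import List

# Assumed phrasing patterns; a real rule set would be broader.
EVASION_PATTERNS = [
    r"avoid\w*\s+(static[- ]analysis|scanner|detection|flagging)",
    r"(bypass|evade)\w*\s+(the\s+)?(scanner|static[- ]analysis|detection)",
    r"remov\w+\s+(flagged|trigger\w*)\s+(keyword|term|word)s?",
]


def flag_evasion_wording(text: str) -> List[str]:
    """Return the changelog lines that match any evasion pattern."""
    hits = []
    for line in text.splitlines():
        if any(re.search(p, line, re.IGNORECASE) for p in EVASION_PATTERNS):
            hits.append(line.strip())
    return hits
```

A match is only a prompt for manual review; on its own, the wording does not prove that a harmful payload exists.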

VirusTotal

61/61 vendors flagged this skill as clean.
