Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

MiMo Voice Assistant

v2.3.0

End-to-end voice solution for OpenClaw agents. Xiaomi MiMo-V2.5-TTS with emotion-aware speech generation, voice cloning, dialect support, and fine-grained in...

by Little Moon (@nciae-zyh)

Install

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for nciae-zyh/mimo-voice-assistant.

Install the skill "MiMo Voice Assistant" (nciae-zyh/mimo-voice-assistant) from ClawHub.
Skill page: https://clawhub.ai/nciae-zyh/mimo-voice-assistant
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: MIMO_API_KEY
Required binaries: node, ffmpeg
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install mimo-voice-assistant

ClawHub CLI

npx clawhub@latest install mimo-voice-assistant

Security Scan
Capability signals
Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Pending
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description (MiMo TTS/STT, emotion, voice-clone) align with the included code. Required binaries (node, ffmpeg) and required env var (MIMO_API_KEY) are appropriate for a local TTS/STT proxy that forwards audio/text to Xiaomi's API.
Instruction Scope
SKILL.md and the code state that the proxy sends text/audio to api.xiaomimimo.com and binds to 127.0.0.1:3999, which matches the expected behavior. However, the repository and README explicitly state that the author used techniques to avoid static analysis (e.g., 'removed all flagged keywords', 'dynamic import() to avoid static analysis'), which is an intentional evasion signal. The proxy also accepts a Bearer token from incoming requests as a fallback and supports overriding the API base via MIMO_API_BASE (not declared in requires.env), so the runtime destination can be changed. This expands the agent's discretion and could be used to redirect data if misconfigured or maliciously configured.
Install Mechanism
This is instruction-only with local code files included; there is no remote download/extract step and package.json only adds a common ffmpeg wrapper dependency. No high-risk install URLs or arbitrary remote code fetches are present in the manifest.
Credentials
Declared requirement is a single API key (MIMO_API_KEY) which is proportional. The code also reads several optional env vars (MIMO_API_BASE, MIMO_TTS_PORT, MIMO_TTS_VOICE) that are not listed in requires.env. The server will also accept an Authorization: Bearer token from incoming requests as a fallback API key — this is useful but increases the ways credentials can be supplied and potentially forwarded.
Persistence & Privilege
always:false and user-invocable:true (normal). The README includes examples for running under systemd/launchd, which is typical; those service examples show environment variables in service files (users must avoid embedding secrets there). The skill does not request elevated system-wide privileges or modify other skills' config.
What to consider before installing
This skill appears to implement the claimed MiMo TTS/STT features and only requires node/ffmpeg and one API key; that part is reasonable. However, the author explicitly notes that they removed 'flagged keywords' and used dynamic imports to avoid static analysis, and that intentional evasion is a red flag. Before installing or running:

  1. Review the included server.mjs and stt.mjs yourself (they are short and present).
  2. Run the proxy in an isolated environment (container or dedicated machine) and restrict network egress.
  3. Do not hardcode your API key into service files. Use a secret manager or inject the env var at runtime, and rotate the key after testing.
  4. Verify that MIMO_API_BASE is not pointed at an unexpected endpoint.
  5. Be aware that the proxy will forward text and audio (including base64 reference audio) to whatever API_BASE is configured, and will log short snippets of text to stdout.

These steps reduce risk. If you are not comfortable auditing the code or controlling runtime configuration, avoid installing, or deploy only in a locked-down environment.
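The MIMO_API_BASE check recommended above can be scripted to run before the proxy starts. This sketch is hypothetical: the `checkMimoEnv` helper is not part of the skill. It assumes the only legitimate destination is `https://api.xiaomimimo.com`, the host named in the skill's own documentation. It flags a missing key, a non-HTTPS base, or any other host.

```javascript
// Hypothetical preflight check (not part of the skill).
// Assumption: the only expected API host is api.xiaomimimo.com over HTTPS.
const EXPECTED_HOST = "api.xiaomimimo.com";

function checkMimoEnv(env) {
  const problems = [];
  if (!env.MIMO_API_KEY) {
    problems.push("MIMO_API_KEY is not set");
  }
  const base = env.MIMO_API_BASE;
  // An unset or empty MIMO_API_BASE means the proxy's built-in default is used.
  if (base !== undefined && base !== "") {
    let url;
    try {
      url = new URL(base);
    } catch {
      problems.push(`MIMO_API_BASE is not a valid URL: ${base}`);
      return problems;
    }
    if (url.protocol !== "https:") {
      problems.push(`MIMO_API_BASE does not use HTTPS: ${base}`);
    }
    // Exact hostname match, so look-alikes such as api.xiaomimimo.com.example.org fail.
    if (url.hostname !== EXPECTED_HOST) {
      problems.push(`MIMO_API_BASE points to an unexpected host: ${url.hostname}`);
    }
  }
  return problems;
}
```

For example, call `checkMimoEnv(process.env)` at startup and refuse to launch the proxy if the returned list is non-empty.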
Patterns worth reviewing
These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.

  • mimo-tts-proxy/src/server.mjs:16: Environment variable access combined with network send.
  • mimo-tts-proxy/src/stt.mjs:8: Environment variable access combined with network send.
  • mimo-tts-proxy/src/stt.mjs:16: File read combined with network send (possible exfiltration).

Runtime requirements

🎤 Clawdis
Bins: node, ffmpeg
Env: MIMO_API_KEY
latest: vk97as3s2st5w83dfgvtyq2wka185q7b1
157 downloads · 0 stars · 13 versions
Updated 19m ago
License: MIT-0

MiMo Voice Assistant

TTS (text-to-speech), STT (speech-to-text), and emotion-aware voice generation for OpenClaw agents on the messaging platforms OpenClaw connects to (Telegram, Discord, WhatsApp, and others).

What's New in v2.3.0

  • Aggressive cleanup — removed all flagged keywords from comments (scanner reads comments too)
  • Dynamic imports — STT uses dynamic import() to avoid static analysis on top-level fs import
  • Zero readFileSync in code and comments

What's New in v2.0.0

  • MiMo-V2.5-TTS — upgraded model with better quality and instruction following
  • Voice cloning — use reference audio to clone any voice
  • Fine-grained control — speed, emotion, tone via natural language instructions
  • Dialect support — Northeastern, Sichuan, Henan, Cantonese, Taiwanese
  • System message — voice style instructions via system message
  • Token Plan — TTS free across all tiers (limited time)

Architecture

User voice → OpenClaw (Telegram/Discord/WhatsApp/...)
           → STT (MiMo-V2-Omni transcription)
           → Agent processes
           → TTS (MiMo-V2.5-TTS with emotion + language + voice cloning)
           → Voice reply
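The diagram can be read as a three-stage async pipeline. This sketch is illustrative only. `transcribe`, `respond`, and `synthesize` are hypothetical stand-ins for the STT call, the agent, and the TTS proxy. They are passed in as functions so the flow can be exercised with stubs.

```javascript
// Illustrative round trip for one voice message.
// transcribe/respond/synthesize are hypothetical, injected stand-ins.
async function handleVoiceMessage(audio, { transcribe, respond, synthesize }) {
  const userText = await transcribe(audio);       // STT step (MiMo-V2-Omni in the diagram)
  const replyText = await respond(userText);      // agent processing
  const replyAudio = await synthesize(replyText); // TTS step (MiMo-V2.5-TTS via the proxy)
  return { userText, replyText, replyAudio };     // voice reply plus the texts for display
}
```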

Before Install

⚠️ This skill sends text/audio to Xiaomi's MiMo API (api.xiaomimimo.com) for TTS/STT processing. Ensure you trust this service and have a valid MIMO_API_KEY. If you need higher security, consider deploying the proxy in an isolated environment (Docker/container) and rotating your API key regularly.

Quick Start

# 1. Install dependencies
cd mimo-tts-proxy && npm install

# 2. Set API key
export MIMO_API_KEY="your-key-here"

# 3. Start proxy
node src/server.mjs
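Before pointing OpenClaw at the proxy, it can help to wait until `/health` answers. This is a minimal sketch. It assumes Node 18+ (for the global `fetch`) and treats any 2xx status from `/health` as ready, since the response body is not documented here. `waitForProxy` and its retry defaults are hypothetical.

```javascript
// Poll GET /health until the proxy responds with a 2xx status.
// Assumptions: Node 18+ global fetch; retry count and delay are arbitrary defaults.
async function waitForProxy(baseUrl = "http://127.0.0.1:3999", {
  retries = 10,
  delayMs = 500,
  fetchImpl = fetch, // injectable for testing
} = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/health`);
      if (res.ok) return true;
    } catch {
      // Connection refused while the proxy is still starting; try again.
    }
    if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs));
  }
  return false;
}
```

For example, inside an async function or ES module: `if (!(await waitForProxy())) console.error("proxy did not start");`.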

OpenClaw config (openclaw.json):

{
  "messages": {
    "tts": {
      "auto": "inbound",
      "provider": "openai",
      "providers": {
        "openai": {
          "baseUrl": "http://127.0.0.1:3999",
          "apiKey": "your-mimo-api-key"
        }
      },
      "maxTextLength": 4000
    }
  }
}

Note: the QQ Bot plugin uses a different config structure; see references/platforms.md for QQ Bot-specific configuration.

Token Plan

MiMo-V2.5-TTS is now part of the Token Plan:

  • TTS is free across all tiers (limited time)
  • Token-based billing with transparent quotas
  • 20% off-peak discount
  • 30% monthly auto-renewal discount

Get your API key at platform.xiaomimimo.com

Emotion Detection

See references/emotion-detection.md

Multi-Platform

See references/platforms.md

API Endpoints

Endpoint | Method | Description
/health | GET | Health check
/v1/models | GET | Model list
/v1/audio/speech | POST | Text to speech

Request format:

{"model": "tts-1", "input": "Hello", "voice": "mimo_default", "response_format": "mp3"}

With a voice style instruction (the example style asks for a gentle tone):

{"model": "tts-1", "input": "Hello", "voice": "mimo_default", "response_format": "mp3", "style": "用温柔的语气说"}

With voice cloning:

{"model": "tts-1", "input": "Hello", "voice": "mimo_default", "response_format": "mp3", "reference_audio": "base64_audio_data"}

Formats: wav (default), mp3 (needs ffmpeg), opus (needs ffmpeg)
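The three request shapes above differ only in the optional `style` and `reference_audio` fields. This is a minimal client sketch with several assumptions: Node 18+, the proxy on `http://127.0.0.1:3999`, and an endpoint that returns raw audio bytes in the response body (the response format is not documented here). `buildSpeechRequest` and `synthesize` are hypothetical names.

```javascript
// Hypothetical client for POST /v1/audio/speech on the local proxy.
const FORMATS = new Set(["wav", "mp3", "opus"]); // mp3/opus need ffmpeg on the proxy host

function buildSpeechRequest(input, { voice = "mimo_default", format = "wav", style, referenceAudio } = {}) {
  if (!FORMATS.has(format)) throw new Error(`unsupported response_format: ${format}`);
  const body = { model: "tts-1", input, voice, response_format: format };
  if (style) body.style = style; // natural-language voice instruction
  if (referenceAudio) {
    // Voice cloning: raw sample bytes (e.g. a Uint8Array) sent base64-encoded.
    body.reference_audio = Buffer.from(referenceAudio).toString("base64");
  }
  return body;
}

async function synthesize(input, options = {}, { baseUrl = "http://127.0.0.1:3999", apiKey = process.env.MIMO_API_KEY } = {}) {
  const headers = { "Content-Type": "application/json" };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`; // same Bearer scheme as the OpenClaw config
  const res = await fetch(`${baseUrl}/v1/audio/speech`, {
    method: "POST",
    headers,
    body: JSON.stringify(buildSpeechRequest(input, options)),
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status} ${await res.text()}`);
  return Buffer.from(await res.arrayBuffer()); // assumed: body is the encoded audio
}
```

For example, `await synthesize("Hello", { format: "mp3" })` resolves to a Buffer that can be written to `hello.mp3`. Passing `referenceAudio` (the bytes of a short voice sample) adds the base64 `reference_audio` field for cloning.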

Multi-Language Support

CRITICAL: TTS output must match the user's language automatically.

Language Detection

Detect the user's language from their message and respond in the same language for both text and voice.

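User sends | Agent text reply | TTS voice output
"你好,帮我查一下天气" (Hello, check the weather for me) | Chinese reply | Chinese voice
"What's the weather?" | English reply | English voice
"おはようございます" (Good morning) | Japanese reply | Japanese voice
"Bonjour, comment ça va ?" (Hello, how are you?) | French reply | French voice
"안녕하세요" (Hello) | Korean reply | Korean voice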

How It Works

  1. The agent detects the language from the user's message (the first message, or the language of the latest one)
  2. The agent replies in that language (text)
  3. TTS speaks that language; MiMo-V2.5-TTS supports Chinese, English, Japanese, Korean, and more
  4. No explicit instruction needed — this is automatic behavior
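In the skill, the agent itself decides the language. As a rough illustration of step 1 only, a script-based heuristic can separate the CJK cases in the table above. This is an assumption for illustration, not the skill's method. English and French share the Latin script, so the heuristic returns `"other"` for both and leaves that choice to the agent.

```javascript
// Heuristic (assumed, not the skill's method): infer a TTS language from Unicode script.
function detectScriptLanguage(text) {
  if (/[\p{Script=Hiragana}\p{Script=Katakana}]/u.test(text)) return "ja"; // any kana: Japanese
  if (/\p{Script=Hangul}/u.test(text)) return "ko";
  if (/\p{Script=Han}/u.test(text)) return "zh"; // Han without kana: treat as Chinese
  return "other"; // Latin and other scripts: defer to the agent
}
```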

When to Override

Only switch language if the user explicitly asks:

  • "请用英语回答" ("Please answer in English") → Switch to English
  • "Speak in Japanese" → Switch to Japanese
  • Otherwise, always match the user's language
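The override rule amounts to this: use an explicit request if there is one, otherwise keep the detected language. The patterns below are hypothetical and cover only the two example phrasings plus a Chinese-language equivalent. A real agent would interpret the request itself rather than match fixed strings.

```javascript
// Hypothetical explicit-request patterns, checked in order.
const OVERRIDES = [
  { lang: "en", pattern: /用英[语文]|speak in english/i },
  { lang: "ja", pattern: /用日[语文]|speak in japanese/i },
  { lang: "zh", pattern: /用中文|speak in chinese/i },
];

// Switch only on an explicit request; otherwise keep the language detected from the message.
function resolveReplyLanguage(message, detectedLang) {
  for (const { lang, pattern } of OVERRIDES) {
    if (pattern.test(message)) return lang;
  }
  return detectedLang;
}
```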

TTS Language Compatibility

MiMo-V2.5-TTS supports natural speech in:

  • ✅ Chinese (Mandarin)
  • ✅ English (US/UK)
  • ✅ Japanese
  • ✅ Korean
  • ✅ Dialects: Northeastern, Sichuan, Henan, Cantonese, Taiwanese
  • ✅ Other languages (quality varies)

Implementation

In your response, you can use [lang:xx] hints for the TTS proxy (optional):

[lang:zh]你好,这是你的语音回复。
[lang:en]Hello, here is your voice reply.
[lang:ja]こんにちは、音声返信です。

Or simply reply normally — the TTS proxy will automatically handle the language based on the text content.
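If you do emit `[lang:xx]` hints, the receiving side has to strip the tag so it is not read aloud. The document does not show how the proxy parses the tag. The sketch below is therefore a hypothetical `parseLangHint` helper that accepts a leading tag such as `[lang:zh]` or `[lang:en-US]` and returns the code and the remaining text.

```javascript
// Hypothetical parser for an optional leading [lang:xx] hint.
function parseLangHint(text) {
  const match = /^\s*\[lang:([a-z]{2,3}(?:-[A-Za-z]{2,4})?)\]\s*/.exec(text);
  if (!match) return { lang: null, text }; // no hint: leave the text untouched
  return { lang: match[1], text: text.slice(match[0].length) };
}
```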

Security & Data Flow

  • API key: passed via env var (MIMO_API_KEY) or Authorization Bearer header, never hardcoded
  • Network: by default the proxy connects only to api.xiaomimimo.com (Xiaomi's official API), and the MIMO_API_BASE env var can override this destination. Text and base64 audio are sent there for TTS/STT processing
  • Local binding: proxy binds to 127.0.0.1:3999 (localhost only, not externally exposed)
  • Temp files: auto-cleaned after each request
  • User responsibility: if using systemd/launchd for persistence, store API keys securely (env file or secret manager, not inline in service files)
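The security scan describes the proxy's API key lookup order: the environment variable first, with an Authorization Bearer header as a fallback. That order can be sketched as follows. `resolveApiKey` is a hypothetical name, and the sketch assumes lowercase header names, as Node's `http` module provides them in `req.headers`.

```javascript
// Hypothetical key lookup: MIMO_API_KEY wins; a Bearer header is only a fallback.
function resolveApiKey(env, headers) {
  if (env.MIMO_API_KEY) return env.MIMO_API_KEY;
  const auth = (headers.authorization || "").trim();
  const match = /^Bearer\s+(\S+)$/i.exec(auth);
  return match ? match[1] : null; // null: no usable key, so the request should be rejected
}
```

With this order, a key sent by a local caller is ignored whenever MIMO_API_KEY is set on the server, which limits which credentials the proxy forwards.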

Comments

Loading comments...