Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Multimodal Base

v0.1.0

Supports image understanding, OCR, speech-to-text, and text-to-speech synthesis with multi-voice and multimodal unified processing using OpenAI and Edge TTS.

⭐ 0 · 43 · 0 current · 0 all-time

by @yuyonghao-123

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

ℹ

Purpose & Capability

Code and SKILL.md implement image understanding (OpenAI GPT-4V), OCR (tesseract.js), Whisper-based speech-to-text (API and local), and Edge TTS — which matches the skill description. However, registry metadata declared no required env vars while the code and docs rely on OPENAI_API_KEY. Also SKILL.md asks to pip install Python edge-tts while package.json lists an npm 'edge-tts' dependency — this mismatch is unexplained.

Instruction Scope

Runtime instructions and code perform file reads/writes (images, audio, temp files, output directory), call external network APIs (OpenAI endpoints, Hugging Face model URL), and spawn local executables ('whisper' / whisper.cpp, 'edge-tts', and 'ffprobe'). The pipeline also implements an automatic model download from a Hugging Face URL. Those actions go beyond pure in-process computation and require user awareness and filesystem/network permissions.

Install Mechanism

There is no automated install spec in the registry (instruction-only), but SKILL.md instructs npm install and pip install edge-tts. The code will download a binary model from a Hugging Face URL at runtime (extract/write to disk). Downloading/extracting model binaries and depending on external CLI tools increases risk and should be reviewed; the pip vs npm edge-tts ambiguity is also an installation coherence issue.
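One way to review a runtime model download is to compare the file's SHA-256 digest against a value obtained from a source you trust. The sketch below is a minimal illustration, not the skill's own code: `sha256OfFile` and `verifyModel` are hypothetical helper names, and nothing in the scan indicates that the skill publishes an expected digest, so that value is an assumption you would have to supply yourself.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Hypothetical helper: compute the SHA-256 digest of a file in 1 MiB chunks,
// so that a large model file does not have to fit in memory.
function sha256OfFile(filePath) {
  const hash = crypto.createHash('sha256');
  const fd = fs.openSync(filePath, 'r');
  try {
    const buf = Buffer.alloc(1024 * 1024);
    let bytesRead;
    // A null position makes readSync continue from the current file offset.
    while ((bytesRead = fs.readSync(fd, buf, 0, buf.length, null)) > 0) {
      hash.update(buf.subarray(0, bytesRead));
    }
  } finally {
    fs.closeSync(fd);
  }
  return hash.digest('hex');
}

// Hypothetical helper: true only if the file matches the expected digest.
// The expected digest is an assumption; it must come from a trusted source.
function verifyModel(filePath, expectedSha256) {
  return sha256OfFile(filePath) === expectedSha256.toLowerCase();
}
```

A caller would run `verifyModel` on the downloaded file before handing it to any executable, and delete the file if the check fails.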

Credentials

The code and documentation require an OpenAI API key (process.env.OPENAI_API_KEY) for image and audio API calls, but the registry metadata lists no required environment variables. The skill also expects system binaries (whisper executable, edge-tts CLI, ffprobe) which are not declared in metadata. The requested access (OpenAI key + ability to write model and audio files + spawn executables) is significant and should be clearly declared and limited to what the user expects.
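Because these requirements are not declared in metadata, a user may want to check them before running the skill. The following is a rough preflight sketch under stated assumptions: the environment variable and binary names are taken from the scan text above, while `missingEnv`, `isOnPath`, and `missingBinaries` are hypothetical helpers that are not part of the skill. The PATH lookup is deliberately simple and only tests that an executable entry with that name exists.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Requirements the scan found in code and docs but not in registry metadata.
const REQUIRED_ENV = ['OPENAI_API_KEY'];
const REQUIRED_BINARIES = ['whisper', 'edge-tts', 'ffprobe'];

// Hypothetical helper: names of required env vars that are unset or empty.
function missingEnv(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// Hypothetical helper: true if an executable entry named `binary` exists
// in one of the PATH directories. On Windows, `.exe` is also tried.
function isOnPath(binary, env = process.env) {
  const dirs = (env.PATH || '').split(path.delimiter).filter(Boolean);
  const names = process.platform === 'win32' ? [binary, `${binary}.exe`] : [binary];
  return dirs.some((dir) =>
    names.some((name) => {
      try {
        fs.accessSync(path.join(dir, name), fs.constants.X_OK);
        return true;
      } catch {
        return false;
      }
    })
  );
}

// Hypothetical helper: names of required binaries not found on PATH.
function missingBinaries(env = process.env) {
  return REQUIRED_BINARIES.filter((bin) => !isOnPath(bin, env));
}
```

Calling `missingEnv()` and `missingBinaries()` with no arguments checks the current process environment; passing an explicit object makes the check easy to test.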

✓

Persistence & Privilege

The skill does not request permanent inclusion (always:false) and does not modify other skills or global agent settings. It stores output and temporary files within its own directories but does not claim elevated privileges.

What to consider before installing

This skill largely does what it says (image understanding, OCR, STT, and TTS), but there are several red flags you should consider before installing:

- It requires an OpenAI API key (used by multiple modules) even though the registry metadata lists no required env vars. Confirm you are willing to provide that key, and limit the key's scope if possible.
- The README asks you to pip install the Python 'edge-tts' CLI, but package.json also lists an npm 'edge-tts' package; clarify which implementation is intended. The code spawns a system 'edge-tts' command, so you must install the Python CLI or otherwise provide that executable.
- The speech recognizer can run locally and will attempt to download a Whisper model from Hugging Face and save it to disk. Downloading and executing third-party binaries has risk: review the model URL and consider running in a sandbox or verifying checksums.
- The code spawns external executables ('whisper' / whisper.cpp, 'edge-tts', 'ffprobe') and writes output/temp files. Ensure you trust the package source and run it in an environment where those binaries and filesystem writes are acceptable.
- If you want to proceed, ask the author to (1) update registry metadata to declare required env vars (OPENAI_API_KEY), (2) clarify install steps (npm vs pip edge-tts), and (3) document where the Whisper model is stored and whether checksums/verifications are provided. Otherwise, run the skill in an isolated container or VM and avoid giving it high-privilege credentials.

✗ src/speech-recognizer.js:75 · Shell command execution detected (child_process).

✗ src/speech-synthesizer.js:156 · Shell command execution detected (child_process).

✗ src/image-processor.js:22 · Environment variable access combined with network send.

✗ src/speech-recognizer.js:31 · Environment variable access combined with network send.

src/image-processor.js:33 · File read combined with network send (possible exfiltration).

src/speech-recognizer.js:116 · File read combined with network send (possible exfiltration).
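The child_process findings above do not show how the skill builds its commands, so the following is only a sketch of one common mitigation to look for when reviewing those lines: passing arguments as an array to `execFileSync`, so that no shell ever parses user-supplied text. `runTool` and `edgeTtsArgs` are hypothetical helpers, not the skill's code; the `--voice`, `--text`, and `--write-media` flags are those of the Python edge-tts CLI, and the voice name in the usage note is just an example.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: build the argument list for the Python edge-tts CLI.
// Each value is its own array element, so quoting is never needed.
function edgeTtsArgs(text, voice, outFile) {
  return ['--voice', voice, '--text', text, '--write-media', outFile];
}

// Hypothetical helper: run a binary directly, without a shell, and return
// its stdout as a string. Shell metacharacters in args are passed literally.
function runTool(binary, args) {
  return execFileSync(binary, args, { encoding: 'utf8', shell: false });
}
```

With the edge-tts CLI installed, a call such as `runTool('edge-tts', edgeTtsArgs('Hello', 'en-US-AriaNeural', 'hello.mp3'))` would treat text like `Hello; rm -rf ~` as plain speech input and not as a command.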

Patterns worth reviewing

These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.

Like a lobster shell, security has layers — review code before you run it.

latest vk97a0m524ss7229ymbp2km9b8983pjgk

License

Terms: https://spdx.org/licenses/MIT-0.html
