Multimodal Base

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This skill appears purpose-aligned for multimodal processing, but it can send selected images/audio to OpenAI and run local speech helper commands.

Before installing, confirm you are comfortable with selected media files being sent to OpenAI, install edge-tts/FFmpeg/Whisper-related tools only from trusted sources, keep API keys scoped and protected, and clear the pipeline history/output files after sensitive work.

Static analysis

Dangerous exec (Critical, 2 findings)
Shell command execution detected (child_process).

Env credential access (Critical, 2 findings)
Environment variable access combined with network send.

Exposed secret literal (Critical, 2 findings)
File appears to expose a hardcoded API secret or token.
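
The static signals do not say which literal triggered the secret finding. A rough way to look for OpenAI-style key literals in the skill's files yourself is sketched below in JavaScript. It assumes keys start with "sk-" followed by 20 or more key-like characters; the pattern and the helper names findKeyLiterals and scanDirectory are illustrative assumptions, not the rule ClawScan applies, and other secret formats will be missed.

```javascript
// Rough heuristic for spotting hardcoded OpenAI-style keys ("sk-" prefix) in source text.
// Assumption: keys look like "sk-" plus 20+ key-like characters inside quotes.
// This is NOT the rule ClawScan uses and will miss other secret formats.
const fs = require("node:fs");
const path = require("node:path");

const KEY_PATTERN = /["'`](sk-[A-Za-z0-9_-]{20,})["'`]/g;

function findKeyLiterals(source) {
  const hits = [];
  source.split(/\r?\n/).forEach((line, index) => {
    for (const match of line.matchAll(KEY_PATTERN)) {
      // Report only a short prefix so the scan output does not re-leak the secret.
      hits.push({ line: index + 1, preview: match[1].slice(0, 6) + "..." });
    }
  });
  return hits;
}

function scanDirectory(dir) {
  const results = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Skip dependencies and dot-directories such as .git.
      if (entry.name !== "node_modules" && !entry.name.startsWith(".")) {
        results.push(...scanDirectory(full));
      }
    } else if (/\.(js|mjs|cjs|ts|json|md)$/.test(entry.name)) {
      for (const hit of findKeyLiterals(fs.readFileSync(full, "utf8"))) {
        results.push({ file: full, ...hit });
      }
    }
  }
  return results;
}

module.exports = { findKeyLiterals, scanDirectory };
```

Run scanDirectory on the unpacked skill directory; matches are reported by file and line with only a six-character preview, so the scan output does not repeat the full secret.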

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

ASI03: Identity and Privilege Abuse (Medium)
What this means

Images provided to this skill may be uploaded to OpenAI or another configured baseURL, and usage may consume the user's API quota.

Why it was flagged

The image module reads an OpenAI API key and sends base64-encoded image content to the configured OpenAI-compatible chat endpoint.

Skill content
this.apiKey = config.openaiApiKey || process.env.OPENAI_API_KEY; ... url: `data:${mimeType};base64,${base64Image}` ... 'Authorization': `Bearer ${this.apiKey}`

Recommendation

Only process images you are comfortable sending to the configured provider, and use an API key with appropriate billing and access controls.
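
To see exactly what would be sent before any upload happens, the request can be built without sending it. The JavaScript sketch below mirrors the flagged excerpt (config key first, then OPENAI_API_KEY; the image embedded as a base64 data URL; a Bearer token header) and targets the OpenAI-compatible /chat/completions path under baseURL. The MIME table, the prompt, the model parameter, and the helper names are assumptions for illustration, not the skill's actual code.

```javascript
// Sketch: build (but do not send) the image request described above.
// Assumptions: MIME table, prompt text, model parameter and helper names are hypothetical.
const fs = require("node:fs/promises");
const path = require("node:path");

const MIME_BY_EXT = {
  ".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
  ".gif": "image/gif", ".webp": "image/webp",
};

function resolveApiKey(config = {}) {
  // Same precedence as the flagged excerpt: explicit config first, then the environment.
  const key = config.openaiApiKey || process.env.OPENAI_API_KEY;
  if (!key) throw new Error("No OpenAI API key configured");
  return key;
}

function buildImageRequest(imageBuffer, fileName, { apiKey, baseURL = "https://api.openai.com/v1", model, prompt }) {
  const mimeType = MIME_BY_EXT[path.extname(fileName).toLowerCase()];
  if (!mimeType) throw new Error(`Unsupported image type: ${fileName}`);
  const base64Image = imageBuffer.toString("base64");
  return {
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: {
      model,
      messages: [{
        role: "user",
        content: [
          { type: "text", text: prompt },
          // The whole image travels inside the request body as a data URL.
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64Image}` } },
        ],
      }],
    },
  };
}

async function previewImageUpload(imagePath, config = {}) {
  const request = buildImageRequest(await fs.readFile(imagePath), imagePath, {
    apiKey: resolveApiKey(config),
    baseURL: config.baseURL, // undefined falls back to the OpenAI default
    model: config.model,
    prompt: "Describe this image.",
  });
  // Log destination and payload size only, never the Authorization header.
  console.log(`Would POST ${JSON.stringify(request.body).length} bytes to ${request.url}`);
  return request;
}

module.exports = { resolveApiKey, buildImageRequest, previewImageUpload };
```

previewImageUpload logs the destination and payload size without sending anything or printing the key. Because baseURL decides where the image goes, check that value as carefully as the key itself.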

ASI03: Identity and Privilege Abuse (Medium)
What this means

Audio files selected for transcription may leave the local machine and be processed by OpenAI.

Why it was flagged

The speech recognizer reads a local audio file and submits it to OpenAI's transcription API using the configured API key.

Skill content
formData.append('file', await fs.readFile(audioPath), path.basename(audioPath)); ... axios.post('https://api.openai.com/v1/audio/transcriptions', formData, ... 'Authorization': `Bearer ${this.apiKey}`)

Recommendation

Avoid sending sensitive recordings unless that is intended, and prefer local mode if provider upload is not acceptable.
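
A sketch of that upload in JavaScript, using Node 18+ built-in fetch, FormData and Blob rather than axios, with a hypothetical allowUpload flag that defaults to off so a recording is only sent when the caller opts in. The excerpt elides the other form fields; the model value "whisper-1" is an assumption (the OpenAI transcription endpoint does require a model field), as are the helper names.

```javascript
// Sketch (Node 18+): the transcription upload described above, gated by a
// hypothetical allowUpload flag. "whisper-1" and the helper names are assumptions.
const fs = require("node:fs/promises");
const path = require("node:path");

const TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

function buildTranscriptionForm(audioBuffer, fileName, model = "whisper-1") {
  const form = new FormData();
  // The entire audio file becomes one multipart part named "file".
  form.append("file", new Blob([audioBuffer]), fileName);
  form.append("model", model);
  return form;
}

async function transcribe(audioPath, { apiKey, allowUpload = false } = {}) {
  // Refuse before reading the file, so nothing is touched unless upload is allowed.
  if (!allowUpload) {
    throw new Error("Provider upload disabled; use a local transcription path instead");
  }
  const form = buildTranscriptionForm(await fs.readFile(audioPath), path.basename(audioPath));
  const response = await fetch(TRANSCRIPTION_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` }, // fetch sets the multipart boundary itself
    body: form,
  });
  if (!response.ok) throw new Error(`Transcription failed: HTTP ${response.status}`);
  return (await response.json()).text;
}

module.exports = { buildTranscriptionForm, transcribe };
```

When provider upload is not acceptable, leave allowUpload false and route audio to the skill's local mode instead; the sketch does not implement local transcription.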

ASI05: Unexpected Code Execution (Low)
What this means

Using TTS requires a trusted local edge-tts installation, and the skill will create audio output files.

Why it was flagged

The TTS module launches the edge-tts command-line tool to synthesize audio, which is expected for this feature but still runs a local executable.

Skill content
const edgeTTS = spawn('edge-tts', args);

Recommendation

Install edge-tts from a trusted source and keep outputDir pointed at a dedicated directory.
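
A sketch of a safer launch pattern in JavaScript: edge-tts is started with spawn and an argument array (no shell), and the output must be a plain file name inside a dedicated directory. Only the --text, --voice and --write-media edge-tts options are used; buildEdgeTtsArgs and synthesize are hypothetical names, not the skill's TTS module.

```javascript
// Sketch: launch edge-tts without a shell and keep output inside a dedicated directory.
// Assumptions: outputDir/fileName are caller-supplied; helper names are hypothetical.
const { spawn } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

function buildEdgeTtsArgs(text, outputDir, fileName, voice) {
  const root = path.resolve(outputDir);
  const outPath = path.resolve(root, fileName);
  // Only plain file names are accepted: "../x.mp3", "sub/x.mp3" and absolute paths are rejected.
  if (path.dirname(outPath) !== root) {
    throw new Error(`Output file must be a plain file name inside ${root}`);
  }
  const args = ["--text", text, "--write-media", outPath];
  if (voice) args.push("--voice", voice);
  return { args, outPath };
}

function synthesize(text, { outputDir, fileName, voice }) {
  const { args, outPath } = buildEdgeTtsArgs(text, outputDir, fileName, voice);
  fs.mkdirSync(path.dirname(outPath), { recursive: true });
  return new Promise((resolve, reject) => {
    // Arguments go straight to the executable (no shell: true), so text such as
    // "; rm -rf ~" is passed as data rather than interpreted by a shell.
    const child = spawn("edge-tts", args, { stdio: ["ignore", "ignore", "pipe"] });
    let stderr = "";
    child.stderr.on("data", (chunk) => { stderr += chunk; });
    child.on("error", reject); // e.g. ENOENT when edge-tts is not installed
    child.on("close", (code) =>
      code === 0 ? resolve(outPath) : reject(new Error(`edge-tts exited with ${code}: ${stderr}`)));
  });
}

module.exports = { buildEdgeTtsArgs, synthesize };
```

Because the text is a single argv entry, shell metacharacters in it are not interpreted. The remaining trust question is which edge-tts binary is first on PATH, which is why installing it from a trusted source matters.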

ASI04: Agentic Supply Chain Vulnerabilities (Low)
What this means

Dependency installation can introduce normal supply-chain risk if packages or binaries are obtained from untrusted sources.

Why it was flagged

The documented setup relies on external package managers and optional system tooling; this is normal for the skill, but these steps fall outside the registry install spec.

Skill content
npm install

# Install Edge TTS (requires Python)
pip install edge-tts

# Optional: install FFmpeg

Recommendation

Use trusted package repositories, review dependency versions, and consider pinning Python/system dependencies in controlled environments.
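
A hypothetical pin check in JavaScript: it flags Python requirement lines that are not a plain name==version pin and npm dependencies that use a range such as ^ or ~ instead of an exact version. It only handles simple requirement lines (comments and pip option lines such as -r are skipped), and the file layout it assumes (a requirements.txt listing edge-tts and a package.json) is an illustration, not part of the skill.

```javascript
// Hypothetical pin checker: reports dependencies not pinned to one exact version.
// Assumes simple requirement lines ("name==version", optional [extras] and ;markers).
const fs = require("node:fs");

const PINNED_PY = /^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*===?\s*[^\s;*]+\s*(;.*)?$/;
const EXACT_SEMVER = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function unpinnedPythonRequirements(requirementsText) {
  return requirementsText
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*/, "").trim()) // strip comments and whitespace
    .filter((line) => line && !line.startsWith("-")) // skip blanks and pip options (-r, -c, --hash)
    .filter((line) => !PINNED_PY.test(line));
}

function unpinnedNpmDependencies(packageJson) {
  const deps = { ...packageJson.dependencies, ...packageJson.devDependencies };
  // "1.2.3" is pinned; "^1.2.3", "~1.2.3", "*" or "latest" are ranges or tags.
  return Object.entries(deps)
    .filter(([, spec]) => !EXACT_SEMVER.test(spec))
    .map(([name, spec]) => `${name}@${spec}`);
}

function reportUnpinned(requirementsPath, packageJsonPath) {
  const problems = [
    ...unpinnedPythonRequirements(fs.readFileSync(requirementsPath, "utf8")),
    ...unpinnedNpmDependencies(JSON.parse(fs.readFileSync(packageJsonPath, "utf8"))),
  ];
  for (const problem of problems) console.warn(`Not pinned: ${problem}`);
  return problems;
}

module.exports = { unpinnedPythonRequirements, unpinnedNpmDependencies, reportUnpinned };
```

For the npm side, committing package-lock.json and installing with npm ci also reproduces the exact resolved versions.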

ASI06: Memory and Context Poisoning (Low)
What this means

Transcripts, image descriptions, file paths, and raw input metadata may remain available within the pipeline instance until cleared or evicted.

Why it was flagged

The pipeline stores recent multimodal messages, including raw input metadata and processed text/image/audio results, in memory.

Skill content
this.history = []; this.maxHistory = config.maxHistory || 100; ... metadata: { rawInput: input, processedAt: Date.now() }

Recommendation

Call clearHistory() after sensitive sessions and avoid exporting conversation history unless intended.
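
The bounded history in the flagged excerpt can be sketched as follows in JavaScript. Only history, maxHistory (default 100), the rawInput/processedAt metadata and clearHistory() come from the excerpt; the class name, addMessage, exportHistory, and the oldest-first eviction policy are assumptions.

```javascript
// Sketch of the bounded in-memory history described above. Class name, addMessage,
// exportHistory and oldest-first eviction are assumptions, not confirmed behavior.
class PipelineHistory {
  constructor(config = {}) {
    this.history = [];
    this.maxHistory = config.maxHistory || 100;
  }

  addMessage(input, result) {
    this.history.push({ result, metadata: { rawInput: input, processedAt: Date.now() } });
    // Evict the oldest entries once the cap is exceeded.
    while (this.history.length > this.maxHistory) this.history.shift();
  }

  exportHistory({ includeRawInput = false } = {}) {
    // Exports omit raw input metadata (file paths, original payloads) unless asked for.
    return this.history.map(({ result, metadata }) => ({
      result,
      metadata: includeRawInput ? { ...metadata } : { processedAt: metadata.processedAt },
    }));
  }

  clearHistory() {
    // Drops the pipeline's references so transcripts and paths can be garbage-collected;
    // it does not scrub copies held elsewhere or output files on disk.
    this.history.length = 0;
  }
}

module.exports = { PipelineHistory };
```

Note that clearHistory() only drops in-memory references; audio files written to outputDir still have to be deleted separately.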