Smart Audio Analyzer

Warn

Audited by ClawScan on May 18, 2026.

Overview

The skill’s audio-analysis purpose is coherent, but its local Whisper fallback builds a shell command from the user-provided audio path, creating a real command-execution risk.

Install only if you are comfortable with cloud audio processing and persistent local voice profiles. Until the shell-command construction is fixed, avoid using the local Whisper fallback on untrusted files or on files whose names contain shell metacharacters such as quotes, semicolons, backticks, or $.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Processing a maliciously named local audio file could run arbitrary commands on the user’s machine with the agent’s privileges.

Why it was flagged

The command is executed through a shell and includes the user-supplied audio path and derived output directory. A crafted filename containing shell metacharacters could execute unintended commands when the Whisper fallback is used.
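As an illustration only, the sketch below builds the same kind of command string with a hypothetical crafted filename and prints it instead of executing it. The filename and the shortened flag list are assumptions chosen for the example, not taken from the skill.

```javascript
// Hypothetical crafted filename: the embedded double quotes close the
// quoting that the template string tries to apply around the path.
const audioPath = 'talk"; touch pwned; ".wav';

// Same interpolation pattern as the flagged code, shortened for clarity.
// The string is only printed here, never passed to a shell.
const command = `whisper "${audioPath}" --language zh`;

console.log(command);
// prints: whisper "talk"; touch pwned; ".wav" --language zh
// A POSIX shell would parse this as three separate commands,
// the second of which is attacker-chosen.
```

Command substitution such as $(...) inside a filename would likewise be expanded by the shell, because double quotes do not suppress it.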

Skill content

const audioPath = process.argv[2]; ... execSync(`whisper "${audioPath}" --language zh --output_format json --output_dir "${outputDir}" --model base`,

Recommendation

Replace execSync shell strings with execFile/spawn using an argument array, validate paths, and avoid automatic processing of untrusted filenames until this is fixed.
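One way this recommendation could look in practice is sketched below. It is a minimal sketch, not the skill's actual code: the helper names buildWhisperArgs and runWhisper are hypothetical, and the flags are copied from the flagged command on the assumption that the installed whisper CLI accepts them unchanged.

```javascript
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: builds the argument array for the whisper CLI.
function buildWhisperArgs(audioPath, outputDir) {
  if (typeof audioPath !== "string" || audioPath.length === 0) {
    throw new Error("audio path is required");
  }
  // Resolving to an absolute path means the value cannot start with "-"
  // and be mistaken for a command-line option.
  const resolvedAudio = path.resolve(audioPath);
  const resolvedOut = path.resolve(outputDir);
  return [
    resolvedAudio,
    "--language", "zh",
    "--output_format", "json",
    "--output_dir", resolvedOut,
    "--model", "base",
  ];
}

// Hypothetical wrapper: validates the input file, then runs whisper.
function runWhisper(audioPath, outputDir) {
  const args = buildWhisperArgs(audioPath, outputDir);
  // statSync throws if the path does not exist.
  if (!fs.statSync(args[0]).isFile()) {
    throw new Error("audio path is not a regular file");
  }
  // execFileSync does not start a shell by default, so each array element
  // reaches the program as one literal argument, metacharacters included.
  return execFileSync("whisper", args, { encoding: "utf8" });
}
```

With this shape, a filename containing quotes or semicolons is passed to whisper as a single argument and is never interpreted as shell syntax. Any further validation, such as restricting inputs to an allowed directory or extension, would depend on how the skill is deployed.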

What this means

Private recordings and transcripts may be processed by third-party AI providers unless the user forces an offline local workflow.

Why it was flagged

The skill clearly discloses provider API use and cloud upload of audio for transcription/summarization; this is purpose-aligned but sensitive.

Skill content

network: AssemblyAI API, OpenRouter API, Google Gemini API ... Audio files are uploaded to cloud ASR services for transcription (use Whisper for offline).

Recommendation

Use this only with recordings you are allowed to send to the configured provider, or force local Whisper for sensitive audio.

What this means

Stored voice profiles can influence future speaker labels and may contain biometric-like identifiers or personal associations.

Why it was flagged

The skill intentionally stores reusable speaker identity data across sessions. This is disclosed and central to the feature, but it is sensitive persistent memory.

Skill content

voiceprint.py extracts and stores speaker embeddings locally in references/voice-db.json for cross-session speaker identification ... Users must explicitly confirm speaker identity before profiles are updated.

Recommendation

Confirm identities carefully, review or delete voice-db.json/voice-profiles.md when needed, and avoid enrolling people without consent.

What this means

The skill can consume the configured provider accounts and send requested audio/transcript data under those credentials.

Why it was flagged

The skill uses provider API keys from the environment for transcription and summarization. This is expected for the integration and there is no evidence of unrelated credential use.

Skill content

const ASSEMBLYAI_KEY = process.env.ASSEMBLYAI_API_KEY; const GEMINI_KEY = process.env.GEMINI_API_KEY; const OPENROUTER_KEY = process.env.OPENAI_API_KEY;

Recommendation

Use provider keys with appropriate limits, keep them out of shared workspaces, and rotate them if the workspace is compromised.