Smart Audio Analyzer
Warn: Audited by ClawScan on May 18, 2026.
Overview
The skill’s audio-analysis purpose is coherent, but its local Whisper fallback builds a shell command from the user-provided audio path, creating a real command-execution risk.
Install only if you are comfortable with cloud audio processing and persistent local voice profiles. Until the shell-command construction is fixed, avoid using the local Whisper fallback on untrusted files or on files whose names contain shell metacharacters (such as quotes, semicolons, $ or backticks).
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Processing a maliciously named local audio file could run arbitrary commands on the user’s machine with the agent’s privileges.
The command is executed through a shell and includes the user-supplied audio path and derived output directory. A crafted filename containing shell metacharacters could execute unintended commands when the Whisper fallback is used.
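To see why wrapping the path in double quotes does not help, the sketch below rebuilds the command string the same way as the template quoted in this finding, using a hypothetical filename. Nothing is executed. A name containing a double quote closes the quoted argument early, and in POSIX shells $(...) and backticks are expanded even inside double quotes, so a name does not need a quote character to trigger command substitution.

```javascript
// Illustration only: rebuild the command string exactly as the quoted
// template does. Nothing here is passed to a shell or executed.
function buildShellCommand(audioPath, outputDir) {
  return `whisper "${audioPath}" --language zh --output_format json --output_dir "${outputDir}" --model base`;
}

// Hypothetical hostile name; quotes and semicolons are legal in filenames
// on Linux and macOS.
const hostileName = 'talk";touch pwned;".mp3';
const cmd = buildShellCommand(hostileName, "out");
console.log(cmd);
// prints: whisper "talk";touch pwned;".mp3" --language zh --output_format json --output_dir "out" --model base
// A shell would run `whisper "talk"`, then `touch pwned`, then try `".mp3" ...`.
```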
const audioPath = process.argv[2]; ... execSync(`whisper "${audioPath}" --language zh --output_format json --output_dir "${outputDir}" --model base`, ...
Replace execSync shell strings with execFile/spawn using an argument array, validate paths, and avoid automatic processing of untrusted filenames until this is fixed.
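A minimal sketch of the recommended fix, assuming Node.js and the same whisper flags as the quoted command. buildWhisperArgs, transcribeLocally and the extension allow-list are hypothetical names, not the skill's code. execFileSync starts the program directly and does not spawn a shell unless its shell option is set, so each array element reaches whisper as one literal argument. Resolving the path to an absolute one also keeps a name that begins with a dash from being read as a whisper option.

```javascript
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical allow-list; adjust to the formats the skill actually accepts.
const AUDIO_EXTENSIONS = new Set([".mp3", ".wav", ".m4a", ".flac", ".ogg"]);

// Validate the input and build an argv array for whisper.
// The absolute path always starts with "/" (or a drive letter on Windows),
// so it cannot be mistaken for a command-line flag.
function buildWhisperArgs(audioPath, outputDir) {
  const resolved = path.resolve(audioPath);
  if (!AUDIO_EXTENSIONS.has(path.extname(resolved).toLowerCase())) {
    throw new Error(`unsupported audio extension: ${resolved}`);
  }
  return [
    resolved,
    "--language", "zh",
    "--output_format", "json",
    "--output_dir", path.resolve(outputDir),
    "--model", "base",
  ];
}

// Run whisper without a shell; metacharacters in the name stay literal.
function transcribeLocally(audioPath, outputDir) {
  // statSync throws if the path does not exist.
  if (!fs.statSync(audioPath).isFile()) {
    throw new Error(`not a regular file: ${audioPath}`);
  }
  execFileSync("whisper", buildWhisperArgs(audioPath, outputDir), { stdio: "inherit" });
}
```

For example, transcribeLocally("meeting.m4a", "out") passes the resolved path as a single argv entry, and a name containing ";" or "$(...)" is handed to whisper unchanged instead of being interpreted.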
Private recordings and transcripts may be processed by third-party AI providers unless the user forces an offline local workflow.
The skill clearly discloses provider API use and cloud upload of audio for transcription/summarization; this is purpose-aligned but sensitive.
network: AssemblyAI API, OpenRouter API, Google Gemini API ... Audio files are uploaded to cloud ASR services for transcription (use Whisper for offline).
Use this only with recordings you are allowed to send to the configured provider, or force local Whisper for sensitive audio.
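One way to make forcing local Whisper the safe default is to treat cloud upload as opt-in. The sketch below is hypothetical: chooseTranscriptionBackend, sensitive and allowCloud are illustrative names, and the skill's real backend-selection logic is not shown in this review. Only the ASSEMBLYAI_API_KEY variable name comes from the evidence quoted in these findings.

```javascript
// Hypothetical backend selector. Local Whisper is the default; cloud
// transcription is chosen only when the caller marks the audio as
// non-sensitive, explicitly allows cloud use, and a provider key exists.
function chooseTranscriptionBackend({ sensitive = true, allowCloud = false, env = process.env } = {}) {
  if (sensitive || !allowCloud) {
    return "whisper-local"; // audio never leaves the machine
  }
  if (env.ASSEMBLYAI_API_KEY) {
    return "assemblyai"; // explicit opt-in and a key is configured
  }
  return "whisper-local"; // opted in, but no key available
}

console.log(chooseTranscriptionBackend()); // whisper-local
console.log(chooseTranscriptionBackend({ sensitive: false, allowCloud: true, env: { ASSEMBLYAI_API_KEY: "k" } })); // assemblyai
```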
Stored voice profiles can influence future speaker labels and may contain biometric-like identifiers or personal associations.
The skill intentionally stores reusable speaker identity data across sessions. This is disclosed and central to the feature, but it is sensitive persistent memory.
voiceprint.py extracts and stores speaker embeddings locally in references/voice-db.json for cross-session speaker identification ... Users must explicitly confirm speaker identity before profiles are updated.
Confirm identities carefully, review or delete voice-db.json/voice-profiles.md when needed, and avoid enrolling people without consent.
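A minimal sketch for reviewing and deleting a single enrolled speaker, assuming voice-db.json is a JSON object keyed by speaker name. The review does not show the file's real schema, so listSpeakers and removeSpeaker are hypothetical helpers that would need adapting; voice-profiles.md would need its own cleanup.

```javascript
const fs = require("node:fs");

// ASSUMPTION: the store is a plain JSON object keyed by speaker name.
function loadDb(dbPath) {
  const db = JSON.parse(fs.readFileSync(dbPath, "utf8"));
  if (db === null || typeof db !== "object" || Array.isArray(db)) {
    throw new Error("unexpected voice-db format");
  }
  return db;
}

// List enrolled speakers so they can be reviewed before deleting anything.
function listSpeakers(dbPath) {
  return Object.keys(loadDb(dbPath));
}

// Remove one speaker's profile; returns false if the name is not enrolled.
function removeSpeaker(dbPath, speakerName) {
  const db = loadDb(dbPath);
  if (!Object.prototype.hasOwnProperty.call(db, speakerName)) return false;
  delete db[speakerName];
  // Write a temp file and rename it over the original, which makes an
  // interrupted write less likely to leave a half-written store.
  const tmp = `${dbPath}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(db, null, 2));
  fs.renameSync(tmp, dbPath);
  return true;
}
```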
The skill can use the configured provider accounts, consuming their quota, and send the requested audio and transcript data under those credentials.
The skill uses provider API keys from the environment for transcription and summarization. This is expected for the integration and there is no evidence of unrelated credential use.
const ASSEMBLYAI_KEY = process.env.ASSEMBLYAI_API_KEY; const GEMINI_KEY = process.env.GEMINI_API_KEY; const OPENROUTER_KEY = process.env.OPENAI_API_KEY;
Use provider keys with appropriate limits, keep them out of shared workspaces, and rotate them if the workspace is compromised.
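Keeping keys out of shared workspaces also means keeping them out of the logs and error messages an agent writes there. The sketch below is a hypothetical pattern, not part of the skill: requireKey and maskKey are illustrative helpers, and only the environment variable names come from the evidence quoted above. The demo key is fake.

```javascript
// Read a provider key and fail fast with a clear message if it is missing,
// rather than sending unauthenticated requests.
function requireKey(env, name) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`${name} is not set`);
  }
  return value.trim();
}

// Mask a key before it appears in logs: keep only the last four characters,
// and hide short values entirely.
function maskKey(key) {
  return key.length > 8 ? `****${key.slice(-4)}` : "****";
}

const demoEnv = { ASSEMBLYAI_API_KEY: "aai-0123456789abcd" };
const key = requireKey(demoEnv, "ASSEMBLYAI_API_KEY");
console.log(`AssemblyAI key loaded: ${maskKey(key)}`); // AssemblyAI key loaded: ****abcd
```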
