Skill v1.0.0

ClawScan security

Audio Speaker Tools · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Suspicious · Feb 27, 2026, 3:29 AM

Verdict: suspicious
Confidence: medium
Model: gpt-5-mini
Summary: The skill mostly does what its name says (speaker separation and voice comparison), but its metadata and instructions disagree about required secrets and installation, and it installs heavy third-party ML packages without pinned versions or verified sources. Users should review it and run it in isolation before trusting it with sensitive audio or credentials.
Guidance: Before installing or running this skill:
- Expect to provide a HuggingFace token (HF_TOKEN). The registry metadata fails to list it, so do not overlook this sensitive credential. Use a token with minimal privileges, and rotate or revoke it when done.
- Audit setup_venv.sh and its dependencies: it pip-installs many ML packages without version pins or checksums. Run it in an isolated environment (container or VM) and consider pinning package versions before installing.
- Be aware that the scripts download pretrained models from HuggingFace (network I/O). If you need to avoid external downloads, mirror and verify the required models first.
- The functionality (separation, comparison, and preparing samples for ElevenLabs) is consistent with the code, but cloning or verifying voices has privacy and legal implications. Ensure you have consent for any voice processing or uploads to third parties (e.g., ElevenLabs).
- If you plan to use HF_TOKEN, avoid passing it on the CLI or exposing it in shared logs. The scripts accept --token, but the README recommends using the environment variable. Consider using a secrets manager, and run the tool in an environment where stdout/stderr are not exposed.
- If you want higher assurance, ask the publisher for provenance (source repo, homepage, signed releases) and for pinned dependency versions or a lockfile. Run a test in an isolated VM with non-sensitive sample audio first.
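As one way to act on the advice about shared logs, the following is a minimal sketch of masking a token before text is written to stdout/stderr or a log file. The helper name `redact_secret` and the `[REDACTED]` placeholder are hypothetical and are not part of the reviewed skill.

```python
from typing import Optional


def redact_secret(text: str, secret: Optional[str]) -> str:
    """Return `text` with every occurrence of `secret` replaced by a placeholder.

    Hypothetical helper for illustration; the reviewed scripts do not ship it.
    """
    if not secret:
        # Nothing to mask (token unset or empty), so return the text unchanged.
        return text
    return text.replace(secret, "[REDACTED]")
```

A wrapper like this would be applied to any message that might echo the token, for example an exception string that includes request headers.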

Review Dimensions

Purpose & Capability: concern
The code and SKILL.md implement speaker separation (pyannote/Demucs) and voice comparison (Resemblyzer), which matches the advertised purpose. However, the registry metadata claims no required env vars or credentials, while the runtime instructions and scripts require a HuggingFace token (HF_TOKEN) for model downloads. This mismatch between the metadata and the runtime behavior is an inconsistency users should be aware of.
Instruction Scope: note
SKILL.md and the scripts are explicit about workflows and tools (ffmpeg, Demucs, pyannote, Resemblyzer). The scripts convert audio, run diarization, export RTTM/segments, and compute embeddings, all within the stated domain. They download pretrained models from HuggingFace (Pipeline.from_pretrained), which involves network activity to HF and requires HF_TOKEN for gated model access. Minor inconsistency: SKILL.md says 'never as CLI arg' for HF_TOKEN, but diarize_and_slice_mps.py accepts a --token argument (it still defaults to HF_TOKEN). No other unexpected file reads, hidden endpoints, or data exfiltration code was found in the provided sources.
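To illustrate what following the 'never as CLI arg' rule could look like, here is a minimal sketch that reads the token from the environment only. The function name `resolve_hf_token` and the lookup order (HF_TOKEN first, then HUGGINGFACE_TOKEN) are assumptions for illustration; this is not the code in diarize_and_slice_mps.py.

```python
import os
from typing import Mapping, Optional

# Variable names mentioned in the review; the precedence order is an assumption.
TOKEN_ENV_VARS = ("HF_TOKEN", "HUGGINGFACE_TOKEN")


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty token found in the environment.

    Deliberately offers no CLI fallback, so the token never appears in
    shell history or in process listings.
    """
    env = os.environ if env is None else env
    for name in TOKEN_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        "Set HF_TOKEN (or HUGGINGFACE_TOKEN) in the environment; "
        "no command-line flag is accepted."
    )
```

Passing a plain dictionary as `env` makes the lookup easy to exercise without touching the real process environment.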
Install Mechanism: note
This is an instruction-plus-scripts skill (no platform install spec). The provided setup_venv.sh creates a virtualenv and pip-installs packages including torch, demucs, pyannote.audio, resemblyzer, pydub, and librosa. Installing from PyPI via pip is expected for this functionality, but the script does not pin versions or verify package sources (no checksums). Installing PyTorch and ML packages can be slow and may pull many wheels. This is normal for the task, but it carries the usual supply-chain risk if you don't audit or pin versions.
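A quick way to audit for missing pins is to scan the requirement specifiers for anything that lacks an exact `==` version. The sketch below assumes the package list has been copied into a list of strings; the function name `unpinned_requirements` is hypothetical, the version numbers in the usage note are placeholders, and the check is deliberately simple (it does not handle URLs or hash options).

```python
import re

# A requirement counts as pinned only if it has the form "name==version"
# (optionally with extras such as "name[extra]==version").
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned_requirements(lines):
    """Return the requirement specifiers that lack an exact '==' version pin."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as "-r other.txt"
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, `unpinned_requirements(["torch", "demucs==1.2.3", "librosa>=0.1"])` returns `["torch", "librosa>=0.1"]`. Exact pins alone do not verify package contents; pip's hash-checking mode is the usual complement.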
Credentials: concern
The runtime requires a HuggingFace token (HF_TOKEN) to load gated pyannote models; the scripts also accept HUGGINGFACE_TOKEN. The registry metadata lists no required env vars. This omission is an inconsistency and increases the chance that a user will overlook the sensitive HF_TOKEN requirement. No other credentials are requested, which is proportionate to the purpose, but HF_TOKEN is sensitive and necessary for core functionality.
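The kind of mismatch described here can be checked mechanically by comparing the environment variables a script reads with those the metadata declares. The following is a rough sketch, assuming the script source is available as a string and the declared names as a list; `undeclared_env_vars` is a hypothetical helper, and the regular expression only catches the common literal forms `os.environ.get("X")`, `os.getenv("X")`, and `os.environ["X"]`.

```python
import re

# Matches literal environment-variable reads in Python source text.
_ENV_READ = re.compile(
    r"""os\.(?:environ\.get|getenv)\(\s*["']([A-Z0-9_]+)["']"""
    r"""|os\.environ\[\s*["']([A-Z0-9_]+)["']\s*\]"""
)


def undeclared_env_vars(source: str, declared):
    """Return env var names read in `source` that are missing from `declared`."""
    # Each match fills exactly one of the two capture groups.
    used = {a or b for a, b in _ENV_READ.findall(source)}
    return sorted(used - set(declared))
```

Run against a script that reads both HF_TOKEN and HUGGINGFACE_TOKEN with an empty declared list, this would report both names, which is the omission flagged above.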
Persistence & Privilege: ok
The skill does not request permanent/always-on privileges (always:false), does not modify other skills, and does not claim to persist credentials on the agent. It runs as a set of scripts invoked by the user; nothing here implies elevated platform privileges.