Speech to text

Pass. Audited by ClawScan on May 1, 2026.

Overview

This appears to be a normal local Whisper-based speech-to-text skill, with noteworthy but purpose-aligned file processing, dependency installation, and continuous-watch behavior.

This skill looks coherent for local speech-to-text use. Before installing, be comfortable with installing Whisper/PyTorch/FFmpeg dependencies, use a dedicated inbound folder, and remember that watch or batch modes can automatically process and move audio files while saving transcripts locally.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

Installing the skill may pull large third-party ML/audio packages from package repositories.

Why it was flagged

The skill depends on external Python packages with lower-bound version ranges rather than exact pinned versions. This is expected for a Whisper transcription tool, but users should be aware of the dependency supply-chain surface.

Skill content

openai-whisper>=20231117

torch>=1.8.0

torchaudio>=0.8.0

Recommendation

Install in a controlled Python environment and review or pin dependency versions if reproducibility is important.
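
As an illustration of the review step, the sketch below flags requirement lines that are not exact pins. It is a simplified assumption-laden check, not part of the skill: the function name `find_unpinned` is hypothetical, and the string test does not handle extras, environment markers, or URL requirements.

```python
def find_unpinned(requirement_lines):
    """Return requirement lines that are not pinned with '=='.

    Simplified check: skips blank lines and comments, and treats any
    line without '==' as unpinned.
    """
    unpinned = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# The three dependency lines quoted from the skill content.
skill_requirements = [
    "openai-whisper>=20231117",
    "torch>=1.8.0",
    "torchaudio>=0.8.0",
]

print(find_unpinned(skill_requirements))
```

All three quoted lines use lower bounds, so each would be reported. Which exact versions to pin is left to the installer.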

Low

#ASI02: Tool Misuse and Exploitation

What this means

Audio files submitted for processing may be relocated from their original folder.

Why it was flagged

After processing, the script moves the original audio file into a processed or failed directory. This is common for inbound batch processing, but it means transcription is not strictly read-only.

Skill content

shutil.move(str(audio_file), str(target_path))

Recommendation

Use a dedicated inbound folder or keep backups if you do not want original audio files moved.
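
One way to follow this recommendation is to hand the skill a copy rather than the original. The sketch below assumes a hypothetical helper `stage_copy` and an inbound folder path chosen by the user; neither comes from the skill itself.

```python
import shutil
from pathlib import Path


def stage_copy(original, inbound_dir):
    """Copy an audio file into the inbound folder and return the copy's path.

    The original stays in place; only the copy is exposed to the skill,
    so a later shutil.move by the skill relocates the copy alone.
    """
    original = Path(original)
    inbound_dir = Path(inbound_dir)
    inbound_dir.mkdir(parents=True, exist_ok=True)
    staged = inbound_dir / original.name
    shutil.copy2(original, staged)  # copy2 also preserves file timestamps
    return staged
```

A file with the same name already present in the inbound folder would be overwritten by this sketch, so a dedicated, otherwise empty folder is the safer choice.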

Low

#ASI10: Rogue Agents

What this means

If watch mode is started, new audio files placed in the inbound folder may be transcribed and moved automatically.

Why it was flagged

The skill exposes a continuous monitoring mode for the inbound folder. It is disclosed and purpose-aligned, but it can keep processing future files until stopped.

Skill content

- stt_watch: Inicia monitoramento contínuo da pasta inbound (English: starts continuous monitoring of the inbound folder)

Recommendation

Start watch mode only when continuous processing is desired, and stop it when finished.
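
To make the "keeps processing until stopped" behavior concrete, the sketch below shows a generic polling loop with an explicit stop signal. It is not the skill's implementation, which this review did not execute; `watch_folder`, `handle_file`, and `poll_seconds` are hypothetical names used only for illustration.

```python
import threading
from pathlib import Path


def watch_folder(inbound_dir, handle_file, stop_event, poll_seconds=1.0):
    """Poll inbound_dir and call handle_file on each new file until stopped."""
    inbound_dir = Path(inbound_dir)
    seen = set()
    while not stop_event.is_set():
        for path in sorted(inbound_dir.glob("*")):
            if path.is_file() and path not in seen:
                seen.add(path)
                handle_file(path)
        # wait() returns early as soon as the stop event is set
        stop_event.wait(poll_seconds)
```

A caller would typically run this in a background thread with a `threading.Event` and call `set()` on that event when continuous processing is no longer wanted. Without that call, every file that later lands in the folder is handled.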

Low

#ASI06: Memory and Context Poisoning

What this means

Private audio contents may be saved as readable local JSON transcript files.

Why it was flagged

The transcription output includes the audio file path and the transcribed text, which may contain private voice-message contents. This is expected for speech-to-text, but it creates persistent local records.

Skill content

'file_path': str(audio_path),
...
'text': result.get('text', '').strip()

Recommendation

Store the output directory securely and delete transcripts or logs when they are no longer needed.
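
A retention cleanup along these lines could look like the sketch below. It assumes transcripts are `*.json` files in a single output directory, as the finding describes; the function name `purge_old_transcripts` and the age threshold are hypothetical.

```python
import time
from pathlib import Path


def purge_old_transcripts(output_dir, max_age_days):
    """Delete *.json files older than max_age_days and return their names."""
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(Path(output_dir).glob("*.json")):
        if path.stat().st_mtime < cutoff:  # compare last-modified time
            path.unlink()
            removed.append(path.name)
    return removed
```

Deletion here is permanent and based on modification time only, so it is worth running against a test directory first.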