Speech to text
Pass: audited by ClawScan on May 1, 2026.
Overview
This appears to be a normal local Whisper-based speech-to-text skill, with noteworthy but purpose-aligned file processing, dependency installation, and continuous-watch behavior.
This skill looks coherent for local speech-to-text use. Before installing, be comfortable with installing Whisper/PyTorch/FFmpeg dependencies, use a dedicated inbound folder, and remember that watch or batch modes can automatically process and move audio files while saving transcripts locally.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Installing the skill may pull large third-party ML/audio packages from package repositories.
The skill depends on external Python packages with lower-bound version ranges rather than exact pinned versions. This is expected for a Whisper transcription tool, but users should be aware of the dependency supply-chain surface.
openai-whisper>=20231117 torch>=1.8.0 torchaudio>=0.8.0
Install in a controlled Python environment and review or pin dependency versions if reproducibility is important.
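One way to act on the pinning advice is to record the exact versions currently installed in the controlled environment. The sketch below is a minimal illustration, not part of the skill. The package list mirrors the dependency line above. The function names `installed_versions` and `pinned_requirements` are hypothetical. It uses only the standard-library `importlib.metadata` module.

```python
# Minimal sketch: capture exact versions of the Whisper dependencies so they can
# be pinned (e.g. in a requirements file). Function names are hypothetical.
from importlib import metadata

DEPENDENCIES = ["openai-whisper", "torch", "torchaudio"]  # from the skill's dependency line


def installed_versions(packages):
    """Return {package: version} for each name, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def pinned_requirements(versions):
    """Format exact '==' pins in name order; missing packages are skipped."""
    return [f"{name}=={ver}" for name, ver in sorted(versions.items()) if ver is not None]


if __name__ == "__main__":
    for line in pinned_requirements(installed_versions(DEPENDENCIES)):
        print(line)
```

`pip freeze` achieves something similar for the whole environment. The sketch only limits the output to the packages this skill declares.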
Audio files submitted for processing may be relocated from their original folder.
After processing, the script moves the original audio file into a processed or failed directory. This is common for inbound batch processing, but it means transcription is not strictly read-only.
shutil.move(str(audio_file), str(target_path))
Use a dedicated inbound folder or keep backups if you do not want original audio files moved.
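The backup advice can be combined with the move-after-processing pattern quoted above. The sketch below is a minimal illustration, assuming a layout with `processed`, `failed` and `backup` subfolders under a hypothetical base directory. The function name `archive_audio` and the backup step are assumptions, not the skill's code. Only the `shutil.move` call shape comes from the evidence line.

```python
# Minimal sketch: copy the original to a backup folder, then move it to
# processed/ or failed/. Folder names and the function name are hypothetical.
import shutil
from pathlib import Path


def archive_audio(audio_file: Path, base_dir: Path, success: bool, keep_backup: bool = True) -> Path:
    """Move audio_file into base_dir/processed or base_dir/failed and return the new path.

    When keep_backup is True, a copy is written to base_dir/backup first, so the
    original bytes survive even though the file leaves the inbound folder.
    """
    if keep_backup:
        backup_dir = base_dir / "backup"
        backup_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(audio_file, backup_dir / audio_file.name)  # copy2 also preserves timestamps

    target_dir = base_dir / ("processed" if success else "failed")
    target_dir.mkdir(parents=True, exist_ok=True)
    target_path = target_dir / audio_file.name
    # Same call shape as the skill's evidence line.
    shutil.move(str(audio_file), str(target_path))
    return target_path
```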
If watch mode is started, new audio files placed in the inbound folder may be transcribed and moved automatically.
The skill exposes a continuous monitoring mode for the inbound folder. It is disclosed and purpose-aligned, but it can keep processing future files until stopped.
- stt_watch: Starts continuous monitoring of the inbound folder
Start watch mode only when continuous processing is desired, and stop it when finished.
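A watch loop stays bounded only if something ends it explicitly. The sketch below is a minimal polling illustration of that idea and is not the skill's `stt_watch` implementation. `watch_inbound`, the `process` callback, the `max_cycles` limit and the audio extension set are all hypothetical. A `threading.Event` serves as the stop signal.

```python
# Minimal polling sketch of a stoppable watch loop. NOT the skill's stt_watch:
# `process` stands in for "transcribe and move"; the extensions are an assumption.
import threading
from pathlib import Path
from typing import Callable, Optional

AUDIO_EXTENSIONS = {".ogg", ".mp3", ".wav", ".m4a"}  # assumed set; adjust as needed


def watch_inbound(inbound: Path, process: Callable[[Path], None], stop: threading.Event,
                  interval: float = 2.0, max_cycles: Optional[int] = None) -> int:
    """Poll `inbound` until `stop` is set or `max_cycles` polls ran; return files handled."""
    handled = 0
    cycles = 0
    while not stop.is_set():
        for path in sorted(inbound.iterdir()):
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                process(path)  # must move the file out of `inbound`, or it is seen again next poll
                handled += 1
        cycles += 1
        if max_cycles is not None and cycles >= max_cycles:
            break
        stop.wait(interval)  # sleeps, but returns early as soon as stop is set
    return handled
```

Calling `stop.set()` from another thread (for example, a signal handler) ends the loop after the current poll.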
Private audio contents may be saved as readable local JSON transcript files.
The transcription output includes the audio file path and the transcribed text, which may contain private voice-message contents. This is expected for speech-to-text, but it creates persistent local records.
'file_path': str(audio_path),
...
'text': result.get('text', '').strip()
Store the output directory securely and delete transcripts or logs when they are no longer needed.
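The storage and deletion advice can be sketched in a few lines. The record keys below mirror the excerpt (`file_path`, `text`). The function names `save_transcript` and `purge_old_transcripts`, the owner-only file mode `0o600` and the age-based retention rule are assumptions for illustration. On POSIX, `os.open` applies that mode only when it creates a new file. Windows largely ignores it.

```python
# Minimal sketch: write transcript JSON readable by the owner only, and purge old
# transcripts. Function names, the 0o600 mode and the retention rule are hypothetical.
import json
import os
import time
from pathlib import Path


def save_transcript(out_dir: Path, audio_path: Path, text: str) -> Path:
    """Write {'file_path', 'text'} to out_dir/<audio stem>.json and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (audio_path.stem + ".json")
    record = {"file_path": str(audio_path), "text": text.strip()}
    # On POSIX a newly created file gets mode 0o600 (owner read/write only, minus umask).
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(record, fh, ensure_ascii=False)
    return out_path


def purge_old_transcripts(out_dir: Path, max_age_days: float) -> int:
    """Delete *.json files older than max_age_days (by mtime); return how many were removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for path in out_dir.glob("*.json"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```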
