Super-Transcribe — Unified Speech-to-Text
v1.0.2 · Unified speech-to-text skill. Use when the user asks to transcribe audio or video, generate subtitles, identify speakers, translate speech, search transcript...
by Sarah Mak (@theplasmak)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw verdict: Benign (high confidence)

Purpose & Capability
The name and description (unified speech-to-text) align with the contents: two bundled backends (faster-whisper and Parakeet/NeMo), CLI entrypoints, and many audio processing utilities. The requested binaries (python3, optional ffmpeg/yt-dlp) are appropriate for the described features.
Instruction Scope
SKILL.md and the included scripts instruct the agent/user to run the bundled CLI and setup scripts, which: (a) probe system state (GPU, ffmpeg, HuggingFace token); (b) create per-backend virtualenvs and install packages; (c) run ffmpeg/yt-dlp for format conversion and URL downloads; and (d) download ML model weights on first use. These actions are within transcription scope. However, the shared lib contains an auto_install_package helper that runs pip (or uv) using the current Python interpreter; if the CLI is invoked outside the created venv, packages could be installed into the system environment. Consider running the skill in an isolated environment if you want to avoid system-wide pip operations.
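To make the interpreter concern concrete, here is a minimal isolation sketch (the venv name is illustrative, not from the skill): create and activate a dedicated venv before invoking the CLI, so that "the current Python interpreter" seen by the auto_install_package helper is the venv's python and any pip installs land inside it.

```shell
# Create a throwaway venv (name is illustrative) and activate it, so that
# pip runs triggered by the skill's helper install into skill-venv/ only.
python3 -m venv skill-venv
. skill-venv/bin/activate
# Confirm we are inside a venv: sys.prefix differs from sys.base_prefix.
python3 -c 'import sys; print(sys.prefix != sys.base_prefix)'
```

Running the skill's CLI from this shell keeps its lazy installs contained; deleting skill-venv/ removes everything it installed.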
Install Mechanism
There is no registry install spec (the skill is instruction-only), but it includes setup.sh scripts that pip-install dependencies and trigger large downloads (PyTorch with CUDA, CTranslate2, NeMo, and model weights via HuggingFace/pip). These are expected for this functionality, and the downloads come from package managers and HuggingFace flows rather than an arbitrary short URL. Expect multi-gigabyte downloads for the Parakeet backend and model weights.
Credentials
The skill declares no required environment secrets. It optionally reads $HOME/.cache/huggingface/token when diarization is used (documented). No unrelated credentials or config paths are required. Network access is required for pip, yt-dlp, and model downloads, which is appropriate for the purpose but worth noting.
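As a quick preflight, one could check for the cached token before requesting diarization. A sketch; the path is the one documented above, while the messages and the huggingface-cli suggestion are illustrative:

```shell
# Check whether a HuggingFace token is already cached at the path the
# skill reads; only diarization needs it, plain transcription does not.
TOKEN_FILE="$HOME/.cache/huggingface/token"
if [ -f "$TOKEN_FILE" ]; then
  echo "HuggingFace token found; diarization can authenticate"
else
  echo "no cached token; skip diarization or run: huggingface-cli login"
fi
```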
Persistence & Privilege
always:false (not force-included). The skill writes venvs and caches under its scripts/backends directories and will cache downloaded models; it does not claim to modify other skills or global agent config. Autonomous invocation is allowed by default, which is normal; combine this with the lazy-install behavior if you want to control when downloads happen.
Assessment
This skill appears coherent with its stated purpose, but note the following before installing or running:
(1) It will create per-backend virtual environments and perform pip installs and model downloads on first use; expect large (GB-scale) downloads for Parakeet/NeMo and PyTorch with CUDA.
(2) The shared helper can run pip using the current Python interpreter. To avoid installing packages system-wide, run the tool inside an isolated environment (container, dedicated user, or a directory where venv creation is allowed), or review and run the setup scripts manually.
(3) Speaker diarization may require a HuggingFace token at ~/.cache/huggingface/token; the skill checks for that file but does not require it unless you request diarization.
(4) The tool uses ffmpeg and yt-dlp for media conversion and URL downloads; only install those if you need URL input or non-wav formats.
Recommended steps: run ./scripts/transcribe --check (or the provided setup.sh --check) to preview what would be installed, use the lean-install option if bandwidth is limited, and inspect setup.sh and the auto-install helper if you prefer to perform installations manually or inside a controlled environment.
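The recommended preview step might look like the following, run from the skill's root directory. Hedged: only the --check flag and the scripts/transcribe path come from the assessment above; the existence guard and messages are illustrative.

```shell
# Preview what would be installed without performing any installs; the guard
# avoids a hard failure when the entrypoint is absent (e.g. running from the
# wrong directory, or before the skill has been downloaded).
if [ -x ./scripts/transcribe ]; then
  ./scripts/transcribe --check
else
  echo "scripts/transcribe not found: run this from the skill's root directory"
fi
```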
latest version: vk974teyjc75sdn65p6z5f0ww5x8248zd
Runtime requirements
🎙️ Clawdis
Bins: python3
