Skill flagged — suspicious patterns detected
ClawHub Security flagged this skill as suspicious. Review the scan results before using.
Speech-to-text, 3x faster than Whisper, remote FREE GPU
v1.3.1 · 3x faster than Whisper. Speech-to-text transcription with sentence-level timestamps on a remote (FREE) L4 GPU. Trigger when user says: transcribe, speech to te...
⭐ 0 · 42 · 0 current · 0 all-time
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Suspicious (medium confidence)

Purpose & Capability
The name/description (remote L4 GPU speech-to-text) aligns with the code and SKILL.md: it uses Modal, a CUDA image, faster-whisper/stable-whisper, and Modal volumes. Minor mismatches: SKILL.md advertises a “FREE L4 GPU” which is a marketing claim (Modal provisioning may be free or billable depending on account), and config.PYTHON_VERSION = '3.11' whereas the image requests add_python='3.12' (inconsequential but inconsistent). Overall the requested resources (Modal, GPU, volumes) are reasonable for the stated purpose.
Instruction Scope
SKILL.md instructs users to use the Modal CLI and modal run, and to upload files to Modal volumes — that is coherent. However, the runtime code (_load_model) forcibly replaces ~/.cache with a symlink to the models volume: if run outside the intended Modal container (e.g., if someone runs transcribe.py locally), this could delete the user's ~/.cache directory (shutil.rmtree) and then symlink it. The instructions do not explicitly warn about this destructive behavior or require running only inside the Modal container. There is also a partial/truncated bug in the provided transcribe.py (a bare 'jso' token in the truncated portion) indicating the code may not be robust as-is.
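A defensive version of that cache-replacement step might look like the sketch below. The MODAL_TASK_ID environment check is an assumption used here for illustration (Modal sets task-scoped variables inside its containers); the skill's own code performs no such guard:

```python
import os
import shutil
from pathlib import Path


def relink_cache(volume_path: str, cache: Path = Path.home() / ".cache") -> bool:
    """Replace `cache` with a symlink to the models volume, but only when
    running inside a Modal container. Returns True if the symlink was made.

    The MODAL_TASK_ID check is a hypothetical guard, not part of the skill.
    """
    if "MODAL_TASK_ID" not in os.environ:
        # Refuse to touch the cache on a local machine.
        return False
    if cache.is_symlink():
        cache.unlink()
    elif cache.exists():
        shutil.rmtree(cache)  # destructive: this is the step the scan flags
    cache.symlink_to(volume_path)
    return True
```

Without such a guard, running transcribe.py locally would reach the `shutil.rmtree` branch and delete the real ~/.cache.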
Install Mechanism
There is no external install spec; the code builds a Modal image that apt_installs ffmpeg and pip_installs 'faster-whisper' and 'stable-ts' — typical for this use case and traceable to PyPI. No arbitrary URL downloads or shorteners are used. Building a custom container image is expected for GPU inference, but pip-installed dependencies mean you will execute third-party packages from PyPI inside the image (normal but requires trust in those packages).
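The image build described above amounts to a Modal app-configuration fragment along these lines; the registry tag, app name, and volume name are assumptions for illustration, not values taken from the skill's code:

```python
import modal

app = modal.App("whisper-transcribe-sketch")  # hypothetical app name

# CUDA base image with Python, plus the dependencies the scan identified.
image = (
    modal.Image.from_registry(
        "nvidia/cuda:12.2.0-runtime-ubuntu22.04",  # assumed CUDA tag
        add_python="3.12",
    )
    .apt_install("ffmpeg")
    .pip_install("faster-whisper", "stable-ts")
)

# Persistent volume for the model cache (create_if_missing=True,
# as noted under Persistence & Privilege below).
models = modal.Volume.from_name("whisper-models", create_if_missing=True)
```

Everything here is declarative: the pip packages are fetched from PyPI at image-build time inside Modal's infrastructure, which is where the third-party-package trust question arises.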
Credentials
The skill declares no required environment variables and does not request unrelated credentials. The error-handling doc notes HF_TOKEN as optional for higher Hugging Face rate limits; that is reasonable and optional. The skill will, however, operate against the user's Modal account (Modal token) and create volumes under that account — expected for remote GPU runs.
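Consuming the optional token could be as simple as the following sketch; the Bearer header format follows the standard Hugging Face Hub convention, and the helper name is hypothetical:

```python
import os


def hf_headers() -> dict:
    """Build request headers for Hugging Face downloads.

    HF_TOKEN is optional: without it, downloads still work but are
    subject to stricter anonymous rate limits.
    """
    token = os.environ.get("HF_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}
```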
Persistence & Privilege
always is false (good). The skill creates Modal volumes (create_if_missing=True) and mounts them into the job image — this is expected for caching models and storing outputs but does grant persistent remote storage of uploaded audio and downloaded models in the user's Modal account. The code's symlink attempts to make ~/.cache point to the persistent models volume inside the container; the destructive cache replacement behavior is the main persistence/privilege risk if the code is run outside the container context.
What to consider before installing/running this skill:
- The skill uses Modal (your Modal CLI/token) and will create and use Modal volumes to upload audio and store models/results. Uploaded audio and generated transcripts live in those volumes under your Modal account — treat that as remote storage.
- The code contains a step that removes ~/.cache and symlinks it to the models volume when the model loads. If you accidentally run transcribe.py locally rather than via 'modal run', this could delete your local ~/.cache (which may contain other cached credentials or valuable caches). Do NOT run the Python file directly on your machine unless you inspect and modify that behavior first.
- Dependencies are installed from PyPI (faster-whisper, stable-ts). That is expected for model inference, but you must trust those packages. The image also apt-installs ffmpeg.
- The README mentions HF_TOKEN as optional for higher rate limits; you should only set that if you trust the skill to access Hugging Face on your behalf.
- There is at least one apparent bug/typo in transcribe.py (truncated 'jso' usage) — expect the code may need fixing before reliable use.
Recommendations:
1) Inspect transcribe.py and remove or modify the code that deletes ~/.cache (or ensure it runs only in an isolated container).
2) Run the skill in an isolated Modal account or project where persistent volumes and billing are acceptable.
3) Back up your local ~/.cache before trying local experimentation.
4) If you need stronger assurance, run the container build in an isolated environment and review all third-party dependencies (PyPI packages) before executing on real data.

Like a lobster shell, security has layers — review code before you run it.
latest · vk977saagh6vv889bwdg8bedn8s84ze13
