Vocal Isolation, Background Music Removal

v1.1.0

Isolate vocals by removing background music from local audio/video files using a free remote GPU-powered pipeline with ffmpeg and Demucs.

MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign (report available)
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name and description match the implementation: the code uses ffmpeg and Demucs inside a Modal container, converts inputs to FLAC, runs GPU inference, and writes the isolated vocals to a volume. The included files (isolate.py and src/) are appropriate for the stated pipeline. The SKILL.md documents the Modal usage, and the code builds Modal images that install ffmpeg and Demucs, as expected.
Instruction Scope
Run instructions stay within the isolation use case: create a Modal volume, upload local audio/video via the modal CLI, run the remote pipeline, and download the results. Important operational behavior is stated explicitly: user files are uploaded to Modal storage, and model weights are downloaded from Hugging Face at runtime. The SKILL.md also references 'npx skills add' (an odd, unrelated tooling mention) and suggests checking for a local ffmpeg; otherwise the runtime steps are scoped to the task.
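The upload/run/download flow described above can be sketched as a small driver around the modal CLI. The volume name comes from the skill itself, and `modal volume put/get` and `modal run` are standard modal subcommands, but the entry-point file name, the `--input` flag, and the `vocals/` output prefix are assumptions, not the skill's actual interface:

```python
# Hedged sketch of the upload -> run -> download flow; paths and flags
# beyond the volume name are guesses about the skill's interface.
import subprocess

DATA_VOLUME = "speech2srt-data"  # volume name declared by the skill

def build_commands(local_file: str, remote_name: str) -> list[list[str]]:
    """Return the CLI invocations for one input file, in order."""
    return [
        # upload the local audio/video into the Modal volume
        ["modal", "volume", "put", DATA_VOLUME, local_file, remote_name],
        # run the remote GPU pipeline (entry-point name is assumed)
        ["modal", "run", "isolate.py", "--input", remote_name],
        # download the isolated vocals back to the current directory
        ["modal", "volume", "get", DATA_VOLUME, f"vocals/{remote_name}", "."],
    ]

def run_pipeline(local_file: str, remote_name: str, dry_run: bool = True) -> None:
    for cmd in build_commands(local_file, remote_name):
        if dry_run:
            print("$", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

With `dry_run=True` the driver only prints the commands, which is a convenient way to review exactly what would be uploaded before anything leaves your machine.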
Install Mechanism
No local install spec is provided (the skill is instruction-only at the registry level), and runtime dependencies are installed into the Modal image (pip/apt inside the container). The manifest contains no third-party binary downloads from unknown URLs. The required Python packages (demucs, torch, torchaudio, soundfile) are reasonable for the task.
Credentials
The skill declares no required environment variables or credentials, which matches the repo. In practice, however, you need a Modal account/token (the SKILL.md instructs running 'modal setup', and the CLI uses Modal volumes), and model downloads may benefit from an HF_TOKEN (mentioned in the error-handling guidance). The code symlinks ~/.cache to a models volume inside the container to persist downloads; this operates on the Modal container's filesystem, not the user's host, but be aware that model downloads occur and are stored in a shared volume.
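The cache redirection can be sketched as follows; this is a minimal illustration of the pattern, and the mount point, helper name, and error handling are assumptions rather than the skill's actual code:

```python
# Hedged sketch of redirecting a cache directory to a persistent volume,
# analogous to the container pointing ~/.cache at the models volume.
# Both paths here are placeholders.
from pathlib import Path

def link_cache_to_volume(home: Path, volume_mount: Path) -> Path:
    """Point home/.cache at a directory on the mounted volume."""
    cache = home / ".cache"
    target = volume_mount / "cache"
    target.mkdir(parents=True, exist_ok=True)
    if cache.is_symlink():
        cache.unlink()  # replace a stale link from a previous run
    elif cache.exists():
        raise FileExistsError(f"{cache} already exists and is not a symlink")
    cache.symlink_to(target, target_is_directory=True)
    return cache
```

The link itself lives in the container's ephemeral filesystem and vanishes with the container, while the volume-side cache directory (and the model weights inside it) persists across runs.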
Persistence & Privilege
always:false (no forced global presence). The skill creates or reuses Modal volumes named 'speech2srt-data' and 'speech2srt-models' (create_if_missing=True), which persist uploaded audio and downloaded models in your Modal account. That is expected for this pipeline, but it may collide with existing volumes of the same name, and user data remains stored remotely until you run the provided cleanup step.
Assessment
This skill appears to implement what it claims, but it sends your local audio/video files to Modal's remote environment and stores results and downloaded models in Modal volumes. Before installing or using it:

1. Make sure you are comfortable uploading the files to a third-party remote service, and check them for sensitive content.
2. Make sure you have a Modal account and understand possible rate limits or charges; 'free' in the README may be misleading depending on your Modal plan.
3. Be aware that model weights are downloaded from Hugging Face and cached in a volume (an HF_TOKEN may be needed for higher rate limits).
4. The skill creates volumes named speech2srt-data and speech2srt-models in your Modal account; remove them with the provided cleanup command if you want to delete persisted data.
5. The code symlinks ~/.cache to the models volume inside the container; this affects the container filesystem (not your local home) but persists model files across runs.

If you need higher assurance, review the full source and verify behavior in a disposable Modal account before running it on sensitive data.
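The cleanup of persisted data mentioned above amounts to deleting the two volumes. A sketch of the corresponding CLI calls: `modal volume delete` is a real modal subcommand, but the confirmation flag and the wrapper function are assumptions, not the skill's documented cleanup command:

```python
# Hedged sketch: build the CLI calls that delete the persisted volumes.
# Deleting a volume removes both uploaded audio and cached model weights.
SKILL_VOLUMES = ["speech2srt-data", "speech2srt-models"]

def cleanup_commands(volumes=SKILL_VOLUMES) -> list[list[str]]:
    # --yes is assumed to skip the interactive confirmation prompt
    return [["modal", "volume", "delete", name, "--yes"] for name in volumes]
```

Run the generated commands only after downloading any results you want to keep, since volume deletion is not reversible.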


latest · vk97a3jg6xv1ddd6pbtmdwyekhx84hg14

