video-audio-replace
Replace video audio with TTS voice while preserving original timing. Includes subtitle generation from video using Whisper. Uses ElevenLabs or Edge TTS, alig...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by marc@synthere
Security Scan
OpenClaw
Benign (medium confidence)
Purpose & Capability
Name/description match the included scripts: generating subtitles (Whisper), creating TTS segments (ElevenLabs or Edge), aligning and replacing audio with ffmpeg. Required libraries (requests, faster-whisper, edge-tts) and ffmpeg usage are proportionate to the stated purpose.
Instruction Scope
Runtime instructions and scripts operate on local video/audio files, call ffmpeg/sox, and send text to the declared TTS APIs (api.elevenlabs.io / Edge TTS). They do not read unrelated system files or attempt broad environment discovery. Note, however, that SKILL.md and the code assume ELEVENLABS_API_KEY is present when the ElevenLabs engine is selected, while the registry metadata lists no required env vars; this inconsistency should be fixed.
Install Mechanism
This is an instruction-only skill with bundled Python scripts and no installer that downloads arbitrary code. _meta.json lists pip packages; all dependencies are standard public packages. No downloads from untrusted URLs or extracted archives are present.
Credentials
The only credential used is ELEVENLABS_API_KEY (optional: Edge TTS needs no key), which is appropriate for the ElevenLabs integration. However, the registry metadata reports no required env vars, while the code clearly checks ELEVENLABS_API_KEY and exits if ElevenLabs is selected without it; the metadata should be corrected. The default ElevenLabs voice constant is also a long alphanumeric string (likely a voice ID), which could be confusing; confirm it is not a misplaced secret.
Persistence & Privilege
`always` is false; the skill does not request persistent presence or modify other skills or system-wide settings. It runs as a local tool operating on user-supplied files.
Assessment
This skill appears to do what it says: it uses Whisper for subtitles and ElevenLabs or Edge for TTS, then aligns and merges audio using ffmpeg. Before installing or running:
1. If you plan to use ElevenLabs, set ELEVENLABS_API_KEY in your environment (or run with --engine edge to avoid sending text to an external API).
2. Review and install the listed Python packages in a virtualenv; run in an isolated environment if you're concerned about dependencies.
3. Verify the default ElevenLabs voice constant is not a leaked secret (it looks like a voice ID, but confirm).
4. Be aware that using ElevenLabs will send your subtitle text to their API; do not upload sensitive content.
5. If you maintain the skill, update its metadata so required env vars are declared accurately.
Like a lobster shell, security has layers: review code before you run it.
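Point 1 of the checklist above can be sketched as a pre-flight snippet (hypothetical: the TTS_ENGINE/ENGINE variable names are illustrative and not part of the skill itself):

```shell
# Hypothetical pre-flight check before running the skill: only select the
# ElevenLabs engine when its API key is actually set; otherwise fall back
# to the free Edge engine, which needs no key.
ENGINE="${TTS_ENGINE:-edge}"
if [ "$ENGINE" = "elevenlabs" ] && [ -z "$ELEVENLABS_API_KEY" ]; then
  echo "ELEVENLABS_API_KEY not set; falling back to edge" >&2
  ENGINE=edge
fi
echo "$ENGINE"
```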
Current version: v1.0.0
SKILL.md
Video Audio Replace
Replace a video's original audio with TTS-generated voice while maintaining precise timing alignment. Also supports generating subtitles from video using Whisper.
Full Workflow
Step 1: Generate subtitles from video (optional)
If you don't have an SRT file, generate one from the video using the included script:
# Generate subtitles from video (uses faster-whisper, free, local)
generate_subtitles.py video.mp4 -o subtitles.srt -l zh
Or manually with Python:
# Using faster-whisper (recommended, local, free)
pip install faster-whisper
python3 << 'EOF'
from datetime import timedelta

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("input_video.mp4", language="zh")

def format_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    td = timedelta(seconds=seconds)
    total = int(td.total_seconds())
    return f"{total//3600:02d}:{(total%3600)//60:02d}:{total%60:02d},{td.microseconds//1000:03d}"

srt_content = ""
for i, seg in enumerate(segments, 1):
    start = format_time(seg.start)
    end = format_time(seg.end)
    srt_content += f"{i}\n{start} --> {end}\n{seg.text.strip()}\n\n"

with open("subtitles.srt", "w", encoding="utf-8") as f:
    f.write(srt_content)
EOF
Step 2: Replace audio with TTS
Use the generated SRT to create a new video with TTS voice.
When to use
- Dubbing videos with AI-generated voice
- Converting subtitle files to voice-over
- Creating multilingual video versions
Requirements
API Keys (choose one)
- ElevenLabs: set the ELEVENLABS_API_KEY environment variable
- Edge TTS (free, no key needed): use --engine edge
System dependencies
- ffmpeg
- sox (optional, for advanced processing)
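Both tools are typically available from the platform package manager; a sketch of the install step (package names assumed to match your distribution's repositories):

```shell
# Debian/Ubuntu
sudo apt-get install -y ffmpeg sox

# macOS (Homebrew)
# brew install ffmpeg sox
```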
Usage
Basic usage (ElevenLabs)
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --voice "Liam"
Using Edge TTS (free, no API key)
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --engine edge --voice "zh-CN-YunxiNeural"
Options
| Option | Description | Default |
|---|---|---|
| --video | Input video file | Required |
| --srt | SRT subtitle file | Required |
| --output | Output video file | input_tts.mp4 |
| --voice | Voice ID or name | Liam (ElevenLabs) |
| --engine | TTS engine: elevenlabs, edge | elevenlabs |
| --speed-range | Speed adjustment range | 0.85-1.15 |
Examples
English voice (ElevenLabs)
video-audio-replace --video 2028.mp4 --srt 2028.srt --output 2028_final.mp4 --voice "Liam"
Chinese voice (Edge TTS)
video-audio-replace --video video.mp4 --srt subs.srt --output result.mp4 --engine edge --voice "zh-CN-YunxiNeural"
How it works
- Extract original audio from video
- Split audio into segments based on subtitle timestamps
- Generate TTS audio for each subtitle segment
- Adjust TTS speed (within 0.85-1.15x) to match original segment duration
- Add silence padding to fill any remaining time gap
- Merge all segments preserving original timing gaps
- Replace video audio with aligned TTS audio
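The timing math in the speed-adjustment and padding steps above can be sketched as follows (illustrative only: the function name is hypothetical, and the clamp bounds mirror the documented --speed-range default rather than the skill's actual code):

```python
def align_segment(slot_sec, tts_sec, lo=0.85, hi=1.15):
    """Return (speed_factor, silence_padding_sec) for one subtitle slot.

    speed_factor > 1 plays the TTS clip faster (as with ffmpeg's atempo
    filter); it is clamped to [lo, hi] so the voice stays natural, and any
    leftover time in the slot is filled with silence.
    """
    factor = max(lo, min(hi, tts_sec / slot_sec))
    stretched = tts_sec / factor              # clip duration after speed change
    padding = max(0.0, slot_sec - stretched)  # silence to fill out the slot
    return factor, padding

# A 3 s TTS clip in a 4 s subtitle slot: slowdown is capped at 0.85x,
# so about half a second of silence pads the remainder.
factor, padding = align_segment(slot_sec=4.0, tts_sec=3.0)
print(round(factor, 2), round(padding, 2))  # → 0.85 0.47
```

Note that when the TTS clip is much longer than its slot, even the maximum 1.15x speed-up cannot fully fit it; in that case padding is zero and the clip slightly overruns.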
Available Voices
ElevenLabs (requires API key)
- Liam - Energetic male (recommended)
- Sarah - Professional female
- Brian - Deep resonant male
- Run curl with your API key to list all voices
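That curl call might look like this (the /v1/voices endpoint and xi-api-key header come from ElevenLabs' public API; piping through json.tool is optional pretty-printing):

```shell
# List available ElevenLabs voices (requires ELEVENLABS_API_KEY to be set).
curl -s -H "xi-api-key: $ELEVENLABS_API_KEY" \
  https://api.elevenlabs.io/v1/voices | python3 -m json.tool
```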
Edge TTS (free)
- Chinese: zh-CN-XiaoxiaoNeural, zh-CN-YunxiNeural, zh-CN-YunyangNeural
- English: en-US-JennyNeural, en-US-GuyNeural
- Many more languages available
