Install
`openclaw skills install douyin-video-transcribe`
A complete solution for transcribing Douyin (抖音/TikTok China) videos. Extracts audio, transcribes speech to text with Whisper, and generates structured summaries.
| Version | Changes |
|---|---|
| 2.0.0 | Modular architecture, improved workflow, browser DOM extraction |
| 1.0.0 | Initial release, basic transcription |
```
User Input (Douyin Link/File)
         │
         ▼
┌─────────────────────────────────────────┐
│          Workflow Orchestrator          │
├─────────────────────────────────────────┤
│ Step 1: Fetcher     → Get video file    │
│ Step 2: Transcriber → Extract & convert │
│ Step 3: Analyzer    → Structure output  │
│ Step 4: Output      → Save results      │
└─────────────────────────────────────────┘
```
| Tool | Purpose | Install |
|---|---|---|
| curl | Download files | Built-in (Windows: `curl.exe`) |
| ffmpeg | Audio extraction/merge | `winget install Gyan.FFmpeg` |
| Whisper | Transcription | `pip install openai-whisper` or Docker |
| Browser | Video extraction | OpenClaw profile required |
Docker Whisper (Recommended):
```bash
docker run -d -p 9000:9000 --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest
```
| Input Type | Detection | Action |
|---|---|---|
| Video link (/video/) | URL pattern | Full workflow |
| Image note (/note/) | URL pattern | Snapshot only |
| Local video file | File path | Start from Step 2 |
| Text input | Plain text | Start from Step 3 |
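The routing in the table above can be sketched as a small classifier. This is an illustration only, not the skill's actual implementation: the function name `detect_input`, the returned labels, the extra "unresolved link" case for short URLs, and the extension list are all assumptions.

```python
from pathlib import PurePath
from urllib.parse import urlparse

# Hypothetical set of extensions treated as local video files.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def detect_input(user_input: str) -> tuple[str, str]:
    """Return (input_type, action) following the input table."""
    text = user_input.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        if "/video/" in parsed.path:
            return ("video_link", "full workflow")
        if "/note/" in parsed.path:
            return ("image_note", "snapshot only")
        # Short share links carry no /video/ or /note/ segment yet,
        # so they need their redirect resolved before classification.
        return ("unresolved_link", "resolve redirect first")
    if PurePath(text).suffix.lower() in VIDEO_EXTENSIONS:
        return ("local_video", "start from step 2")
    return ("text", "start from step 3")
```

Checking URLs before file paths keeps a link that happens to end in `.mp4` from being treated as a local file.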
```bash
# Windows
curl.exe -sL -o NUL -w "%{url_effective}" "https://v.douyin.com/xxx/"

# Linux/macOS
curl -sL -o /dev/null -w '%{url_effective}' "https://v.douyin.com/xxx/"
```
Output: `https://www.douyin.com/video/7616020798351871284`
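The same resolution step can be sketched in Python with the standard library. The helper names are hypothetical, the `/video/<digits>` pattern is inferred from the sample output above, and the short-link host may require extra headers that this sketch does not send.

```python
import re
import urllib.request
from typing import Optional

# Assumed URL shape, based on the sample resolved URL above.
_VIDEO_ID = re.compile(r"/video/(\d+)")


def resolve_short_url(short_url: str, timeout: float = 10.0) -> str:
    """Follow redirects and return the final URL (needs network access)."""
    with urllib.request.urlopen(short_url, timeout=timeout) as response:
        return response.geturl()


def extract_video_id(resolved_url: str) -> Optional[str]:
    """Return the numeric video ID, or None if the URL is not a video page."""
    match = _VIDEO_ID.search(resolved_url)
    return match.group(1) if match else None


print(extract_video_id("https://www.douyin.com/video/7616020798351871284"))
# prints 7616020798351871284
```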
```
browser(action='open', profile='openclaw', url='https://www.douyin.com/video/{VIDEO_ID}')
```
Wait 10-15 seconds for page to load completely.
```javascript
browser(action='act', targetId='PAGE_ID', request={
  "kind": "evaluate",
  "fn": "(() => { const entries = performance.getEntriesByType('resource'); const videoEntries = entries.filter(e => { const name = e.name.toLowerCase(); return name.includes('douyinvod') && (name.includes('.mp4') || name.includes('video')); }); if (videoEntries.length > 0) { const video = videoEntries[videoEntries.length - 1]; return { url: video.name, type: video.name.includes('.mp4') ? 'mp4' : 'dash' }; } return null; })()"
})
```
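To make the selection rule in that snippet easier to read, here is the same filter restated in Python over a plain list of resource URLs. It is a sketch for explanation only: `pick_video_entry` is a hypothetical name, and the skill itself runs the JavaScript version in the browser.

```python
from typing import Optional


def pick_video_entry(resource_urls: list[str]) -> Optional[dict]:
    """Pick the most recent CDN video URL, mirroring the JavaScript filter."""
    candidates = []
    for url in resource_urls:
        name = url.lower()
        # Keep entries served from a 'douyinvod' host that look like video.
        if "douyinvod" in name and (".mp4" in name or "video" in name):
            candidates.append(url)
    if not candidates:
        return None  # Page not fully loaded yet, or no video on the page.
    latest = candidates[-1]
    return {"url": latest, "type": "mp4" if ".mp4" in latest else "dash"}
```

Taking the last match rather than the first follows the original code, which reads the final entry of the filtered list.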
Important Notes:
```bash
curl.exe -L -H "Referer: https://www.douyin.com/" -o video.mp4 "<CDN_URL>"
```
The Referer header is required; without it, the download fails with HTTP 403.
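A minimal Python equivalent of the download command, using only the standard library. The function names and the default output path are assumptions; the Referer value is the one from the curl command.

```python
import shutil
import urllib.request


def build_download_request(cdn_url: str) -> urllib.request.Request:
    """Attach the Referer header that the download step requires."""
    return urllib.request.Request(
        cdn_url, headers={"Referer": "https://www.douyin.com/"}
    )


def download_video(cdn_url: str, dest: str = "video.mp4", timeout: float = 60.0) -> str:
    """Stream the video to disk; urlopen raises HTTPError on a 403."""
    request = build_download_request(cdn_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest
```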
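```bash
# Single MP4 file: extract the audio track
ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

# DASH (separate video and audio streams): merge first, then extract
ffmpeg -i video.mp4 -i audio.mp4 -c copy merged.mp4 -y
ffmpeg -i merged.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y
```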
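Parameters: `-ar 16000` resamples to 16 kHz, `-ac 1` downmixes to mono, `-c:a pcm_s16le` writes 16-bit little-endian PCM, and `-y` overwrites an existing output file.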
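When the extraction is driven from a script, the ffmpeg calls can be built as argument lists and run with `subprocess`. This is a sketch under the assumption that `ffmpeg` is on the PATH; the function names and the `merged.mp4` intermediate path are illustrative.

```python
import subprocess
from typing import Optional


def audio_extract_command(video_path: str, wav_path: str = "audio.wav") -> list[str]:
    """16 kHz, mono, 16-bit PCM WAV, matching the flags used above."""
    return ["ffmpeg", "-i", video_path, "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", wav_path, "-y"]


def merge_command(video_path: str, audio_path: str,
                  merged_path: str = "merged.mp4") -> list[str]:
    """Combine separate DASH video and audio streams without re-encoding."""
    return ["ffmpeg", "-i", video_path, "-i", audio_path,
            "-c", "copy", merged_path, "-y"]


def extract_audio(video_path: str, wav_path: str = "audio.wav",
                  dash_audio_path: Optional[str] = None) -> str:
    """Run ffmpeg; merge first when a separate DASH audio file is given."""
    if dash_audio_path is not None:
        merged = "merged.mp4"
        subprocess.run(merge_command(video_path, dash_audio_path, merged), check=True)
        video_path = merged
    subprocess.run(audio_extract_command(video_path, wav_path), check=True)
    return wav_path
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.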
```bash
# Whisper ASR web service (Docker)
curl.exe -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"
```
```bash
# Local Whisper CLI
python -m whisper audio.wav --model small --language zh
```
Model Selection:
| Model | Size | 5-min Video (CPU) | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 75MB | ~30s | Fair | Quick preview |
| base | 142MB | ~1min | Good | Daily use |
| small | 466MB | ~3min | Better | Recommended |
| medium | 1.5GB | ~8min | Best | High accuracy |
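The table can also be used to pick a model from a time budget. The sketch below assumes that CPU time scales linearly with video length, which the table does not state; the helper names are hypothetical and the timings are the approximate values copied from the table.

```python
from typing import Optional

# Approximate CPU minutes for a 5-minute video, copied from the table above.
MINUTES_PER_5_MIN_VIDEO = {"tiny": 0.5, "base": 1.0, "small": 3.0, "medium": 8.0}


def estimate_cpu_minutes(model: str, video_minutes: float) -> float:
    """Rough estimate, assuming time grows linearly with duration."""
    return MINUTES_PER_5_MIN_VIDEO[model] * video_minutes / 5.0


def largest_model_within(budget_minutes: float, video_minutes: float) -> Optional[str]:
    """Most accurate model whose estimate fits the budget, else None."""
    for model in ("medium", "small", "base", "tiny"):
        if estimate_cpu_minutes(model, video_minutes) <= budget_minutes:
            return model
    return None


print(estimate_cpu_minutes("medium", 10))  # prints 16.0
```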
The agent then processes the transcript:
1. Fix transcription errors
2. Structure content
3. Extract key points
4. Generate tags
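```markdown
作者: {Author} 来源: 抖音 日期: {Date} 转录时间: {Transcription Date}
{Summary}
{Transcript content with paragraphs}
#{tag1} #{tag2} #{tag3}
```
The metadata labels are 作者 (author), 来源 (source; 抖音 is Douyin), 日期 (date), and 转录时间 (transcription time).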
```
{VIDEO_ID}-抖音转录.md
```
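A small sketch of how the template and file name above could be filled in. It mirrors only the fields that appear in the template; the function names, the blank-line separators, and the argument order are assumptions.

```python
def output_filename(video_id: str) -> str:
    """File name pattern shown above: {VIDEO_ID}-抖音转录.md."""
    return f"{video_id}-抖音转录.md"


def render_note(author: str, video_date: str, transcribed_on: str,
                summary: str, transcript: str, tags: list[str]) -> str:
    """Fill the output template with metadata, summary, transcript, and tags."""
    header = f"作者: {author} 来源: 抖音 日期: {video_date} 转录时间: {transcribed_on}"
    tag_line = " ".join(f"#{tag}" for tag in tags)
    # Blank lines between blocks are an assumption about the layout.
    return "\n\n".join([header, summary, transcript, tag_line]) + "\n"


print(output_filename("7616020798351871284"))
# prints 7616020798351871284-抖音转录.md
```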
| Stage | Issue | Solution |
|---|---|---|
| Step 1 | Short URL fails | Check link completeness, remove share text |
| Step 1 | JS returns null | Wait 15-20s and retry, increase timeout |
| Step 1 | Download 403 | URL expired, re-fetch from browser |
| Step 1 | DASH no audio | Merge with `ffmpeg -i video -i audio -c copy` |
| Step 2 | ffmpeg not installed | `winget install Gyan.FFmpeg` |
| Step 2 | Whisper service down | `docker start whisper-asr` |
| Step 2 | Transcription slow | 10-min video takes 15-20 min on CPU |
| Step 2 | Poor quality | Use larger model (medium) |
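The "wait and retry" advice for a null JavaScript result can be wrapped in a small helper. This is a hedged sketch: `retry_until_result` and its defaults are assumptions, and `probe` stands in for whatever call returns the video entry or `None`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_result(probe: Callable[[], Optional[T]],
                       attempts: int = 3,
                       wait_seconds: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call probe until it returns a non-None value or attempts run out."""
    for attempt in range(attempts):
        result = probe()
        if result is not None:
            return result
        # Give the page more time to load before the next attempt.
        if attempt < attempts - 1:
            sleep(wait_seconds)
    return None
```

The `sleep` parameter is injectable so the helper can be exercised without real delays.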
Image notes (/note/) don't need transcription; as the input table shows, they are handled with a snapshot only.
This skill can be extended with standalone modules:
| Module | Purpose |
|---|---|
| douyin-fetcher | Video fetching only |
| douyin-transcriber | Audio transcription only |
| douyin-analyzer | Content analysis only |
| douyin-orchestrator | Workflow coordination |
MIT-0 License - Free to use, modify, and redistribute.