Install
openclaw skills install asr-skills

This skill should be used when the user asks to "transcribe audio", "transcribe video", "convert speech to text", "generate subtitles", "create captions", "identify speakers in audio", or mentions audio/video transcription needs. Provides local ASR transcription with speaker diarization using FunASR.
Provide local audio/video transcription with speaker diarization, multiple output formats, and progress indication.
Enable users to transcribe audio and video files to text with automatic speaker identification, supporting multiple subtitle formats while preserving privacy through local processing.
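This skill triggers when the user asks to transcribe audio or video, convert speech to text, generate subtitles or captions, or identify speakers in a recording.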
# Transcribe audio file (outputs TXT by default)
python3 skills/asr/scripts/transcribe.py path/to/audio.mp3
# Transcribe video file
python3 skills/asr/scripts/transcribe.py path/to/video.mp4
python3 skills/asr/scripts/transcribe.py audio.mp3 -f json # Structured JSON with metadata
python3 skills/asr/scripts/transcribe.py audio.mp3 -f srt # SubRip subtitles
python3 skills/asr/scripts/transcribe.py audio.mp3 -f ass # ASS/SSA subtitles with speaker styling
python3 skills/asr/scripts/transcribe.py audio.mp3 -f md # Markdown with speaker sections
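If you drive the script from Python rather than a shell, a small helper can assemble the argument list. This is a minimal sketch: `build_transcribe_cmd` and `SUPPORTED_FORMATS` are hypothetical names, and the format set is taken only from the `-f` examples above.

```python
import shlex

# Formats accepted by the -f flag, as listed in the examples above.
SUPPORTED_FORMATS = {"txt", "json", "srt", "ass", "md"}

# Script path as used in the examples; adjust if the skill lives elsewhere.
SCRIPT = "skills/asr/scripts/transcribe.py"


def build_transcribe_cmd(media_path: str, fmt: str = "txt") -> list[str]:
    """Return the argv list for one transcription run (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {fmt!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    cmd = ["python3", SCRIPT, media_path]
    if fmt != "txt":  # TXT is the default, so no flag is needed
        cmd += ["-f", fmt]
    return cmd


if __name__ == "__main__":
    # shlex.join quotes arguments that contain spaces when printing.
    print(shlex.join(build_transcribe_cmd("meeting recording.mp4", "srt")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` runs the transcription; an argv list avoids shell-quoting problems with file names that contain spaces.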
from asr_skill import transcribe
result = transcribe("meeting.mp4", format="srt")
print(f"Output: {result['output_path']}")
print(f"Speakers: {result.get('speakers', [])}")
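To transcribe a whole folder through the Python API, a loop like the following may be enough. It is a sketch, not part of the skill: `batch_transcribe` and `MEDIA_EXTS` are hypothetical, the extension list is an assumption, and the `transcribe` callable is passed in so it can be `asr_skill.transcribe` (assumed to accept `format=` and return a dict with `output_path`, as in the example above) or a stub for testing.

```python
from pathlib import Path
from typing import Callable

# Extensions treated as media here; an assumption, not the skill's documented list.
MEDIA_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mkv", ".mov"}


def batch_transcribe(folder: str, transcribe: Callable[..., dict],
                     fmt: str = "srt") -> dict[str, str]:
    """Transcribe every media file directly inside `folder`.

    Returns a mapping from input path to the output path reported by
    `transcribe(path, format=fmt)`.
    """
    outputs: dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and non-media files; suffix match is case-insensitive.
        if path.is_file() and path.suffix.lower() in MEDIA_EXTS:
            result = transcribe(str(path), format=fmt)
            outputs[str(path)] = result["output_path"]
    return outputs
```

For example, after `from asr_skill import transcribe`, calling `batch_transcribe("recordings", transcribe, fmt="md")` would produce one Markdown transcript per media file.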
Avoid timeouts by running transcription in the background:
# Start async task
python3 skills/asr/scripts/transcribe.py long_video.mp4 --async
# Output: {"task_id": "a1b2c3d4", "status": "queued", ...}
# Check status
python3 skills/asr/scripts/transcribe.py --status a1b2c3d4
# Output: {"task_id": "a1b2c3d4", "status": "processing", "progress": 45, ...}
# List recent tasks
python3 skills/asr/scripts/transcribe.py --list
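A caller that starts an `--async` task usually needs to poll until it finishes. The sketch below assumes `--status` prints a single JSON object on stdout, as in the sample output, and treats `queued` and `processing` as the only in-progress states; the names of the final states are not documented here, so any other status simply ends the loop. `run_status` and `wait_for_task` are hypothetical helpers.

```python
import json
import subprocess
import time
from typing import Callable

SCRIPT = "skills/asr/scripts/transcribe.py"
# In-progress statuses seen in the sample output above; anything else is treated as final.
PENDING = {"queued", "processing"}


def run_status(task_id: str) -> dict:
    """Run `transcribe.py --status <task_id>` and parse its JSON output."""
    proc = subprocess.run(
        ["python3", SCRIPT, "--status", task_id],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def wait_for_task(task_id: str,
                  get_status: Callable[[str], dict] = run_status,
                  interval: float = 5.0,
                  timeout: float = 3600.0) -> dict:
    """Poll until the task leaves the pending states, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        info = get_status(task_id)
        if info.get("status") not in PENDING:
            return info  # finished, failed, or some other final state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {info['status']} after {timeout}s")
        print(f"{task_id}: {info['status']} {info.get('progress', '?')}%")
        time.sleep(interval)
```

Injecting `get_status` keeps the polling logic separate from the subprocess call, so it can be reused with another transport or tested with canned responses.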
Automatically identifies and labels different speakers.
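Diarization itself is done by FunASR; a common post-processing step is to merge consecutive segments from the same speaker into a single turn. The segment structure the skill uses internally is not described here, so this sketch assumes a hypothetical `Segment` with `speaker`, `start`, `end` (seconds) and `text`, and an arbitrary 1-second gap threshold.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # label such as "SPEAKER_0"; the naming scheme is illustrative
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_turns(segments: list[Segment], max_gap: float = 1.0) -> list[Segment]:
    """Join consecutive same-speaker segments separated by at most `max_gap` seconds.

    The input list is not modified; merged turns are new Segment objects.
    """
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = turns[-1] if turns else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            last.end = max(last.end, seg.end)
            last.text = f"{last.text} {seg.text}".strip()
        else:
            # Copy so that later merges never mutate the caller's segments.
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns
```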
Detects and uses the best available hardware.
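The exact detection order the skill uses is not documented here. Below is a minimal sketch of a typical PyTorch-based check, assuming CUDA is preferred, then Apple's MPS backend, then CPU; `pick_device` is a hypothetical name, and a missing PyTorch install is treated as CPU-only.

```python
def pick_device() -> str:
    """Return "cuda:0", "mps", or "cpu", preferring the fastest backend present."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch, so no GPU backends to probe
    if torch.cuda.is_available():
        return "cuda:0"
    # torch.backends.mps only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```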
Handles audio files longer than 1 hour.
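How the skill splits recordings longer than an hour is not specified here. One common approach is fixed-length windows with a small overlap so words at a boundary are not cut in half. The sketch below only computes the window boundaries; `chunk_bounds`, the 10-minute window and the 5-second overlap are illustrative assumptions.

```python
def chunk_bounds(duration: float, chunk: float = 600.0,
                 overlap: float = 5.0) -> list[tuple[float, float]]:
    """Split [0, duration] into `chunk`-second windows overlapping by `overlap` seconds."""
    if chunk <= overlap:
        raise ValueError("chunk must be longer than overlap")
    bounds: list[tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + chunk, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        start = end - overlap  # step back so adjacent windows share `overlap` seconds
    return bounds


if __name__ == "__main__":
    # A 25-minute file with the default settings.
    for start, end in chunk_bounds(1500.0):
        print(f"{start:7.1f} -> {end:7.1f}")
```

Each `(start, end)` pair could then be cut out with FFmpeg's `-ss`/`-to` options and transcribed separately, with text in the overlapping seconds de-duplicated when the pieces are stitched back together.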
| Format | Extension | Use Case |
|---|---|---|
| txt | .txt | Plain text with timestamps |
| json | .json | Structured data with word-level info |
| srt | .srt | Video subtitles |
| ass | .ass | Styled subtitles |
| md | .md | Documentation with speaker sections |
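The two subtitle formats differ mainly in their timestamp syntax: SRT uses `HH:MM:SS,mmm` (comma, milliseconds) while ASS uses `H:MM:SS.cc` (dot, centiseconds). Below is a small sketch of both, plus a minimal SRT writer; the `[speaker]` prefix on each cue is a hypothetical convention, not necessarily what the skill emits.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp: H:MM:SS.cc (centiseconds)."""
    cs = round(seconds * 100)
    h, cs = divmod(cs, 360_000)
    m, cs = divmod(cs, 6000)
    s, cs = divmod(cs, 100)
    return f"{h:d}:{m:02d}:{s:02d}.{cs:02d}"


def to_srt(cues: list[tuple[float, float, str, str]]) -> str:
    """Render (start, end, speaker, text) cues as numbered SRT blocks."""
    blocks = []
    for i, (start, end, speaker, text) in enumerate(cues, 1):
        blocks.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n[{speaker}] {text}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues
```

For example, `srt_time(3725.5)` gives `01:02:05,500` and `ass_time(3725.5)` gives `1:02:05.50`.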
./models/
"FFmpeg not found"
"CUDA out of memory"
"No speakers detected"
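For the first two errors above, two quick checks can help. This sketch assumes the script needs an `ffmpeg` executable on `PATH` to decode media (as the error message suggests), and that hiding all GPUs with `CUDA_VISIBLE_DEVICES=""` makes the skill's hardware detection fall back to CPU; both helper names are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    out = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = out.stdout.splitlines()
    return lines[0] if lines else exe


def cpu_only_env() -> dict[str, str]:
    """Copy of the current environment with every CUDA device hidden."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env
```

For example, `subprocess.run(cmd, env=cpu_only_env(), check=True)` reruns a command that hit CUDA out-of-memory on the CPU instead, at the cost of slower transcription.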
For detailed format specifications:
references/output-formats.md - Complete format documentation

Utility scripts for batch processing:
scripts/transcribe.py - Batch transcription script

Working examples:
examples/basic_usage.py - Python API examples
examples/cli_examples.sh - CLI usage examples