Install
openclaw skills install zhipu-asr
Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audio transcription with context prompts, custom hotwords, and multiple audio formats.
Transcribe Chinese audio files to text using Zhipu AI's GLM-ASR model.
1. Get your API key from the Zhipu AI Console.
2. Set it in your environment:
export ZHIPU_API_KEY="your-key-here"
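A missing key is easy to overlook until a request fails. The guard below is a minimal sketch of a pre-flight check; `require_api_key` is a hypothetical helper name, not part of the skill, and only the `ZHIPU_API_KEY` variable comes from the setup steps above.

```shell
# Hypothetical helper (not shipped with the skill): fail early when
# ZHIPU_API_KEY is unset or empty, before any audio is processed.
require_api_key() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set" >&2
    return 1
  fi
}
```

It could be chained in front of a call, for example `require_api_key && bash scripts/speech_to_text.sh recording.wav`.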
Note: The API accepts only WAV and MP3. The script automatically converts other formats to MP3 using ffmpeg, so you can pass any format that ffmpeg supports.
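For readers who want to see roughly what such a conversion step involves, here is a sketch under stated assumptions: `ensure_supported_format` is a hypothetical name, and the ffmpeg options shown are a plain default conversion, not necessarily the ones the bundled script uses.

```shell
# Hypothetical sketch of a format check; the bundled script's actual
# ffmpeg options are not documented here and may differ.
ensure_supported_format() {
  local input="$1"
  local ext="${input##*.}"
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    wav|mp3)
      # Already accepted by the API: pass the path through unchanged.
      printf '%s\n' "$input"
      ;;
    *)
      # Anything else: convert to MP3 next to the original file.
      local output="${input%.*}.mp3"
      ffmpeg -nostdin -loglevel error -y -i "$input" "$output" \
        && printf '%s\n' "$output"
      ;;
  esac
}
```

The function prints the path that should be sent for transcription, so it can be used as `bash scripts/speech_to_text.sh "$(ensure_supported_format interview.m4a)"`.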
Transcribe an audio file with default settings:
bash scripts/speech_to_text.sh recording.wav
Provide previous transcription or context for better accuracy:
bash scripts/speech_to_text.sh recording.wav "这是之前的转录内容,有助于提高准确性"
Use custom vocabulary to improve recognition of specific terms:
bash scripts/speech_to_text.sh recording.mp3 "" "人名,地名,专业术语,公司名称"
Combine context and hotwords:
bash scripts/speech_to_text.sh recording.wav "会议记录片段" "张三,李四,项目名称"
Parameters:
audio_file (required): Path to audio file (.wav or .mp3)
prompt (optional): Previous transcription or context text (max 8000 chars)
hotwords (optional): Comma-separated list of specific terms (max 100 words)
Why use context prompts:
When to use:
Example:
bash scripts/speech_to_text.sh part2.wav "第一部分的转录内容:讨论了项目进展和下一步计划"
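When the previous transcription is long, it may exceed the 8000-character prompt limit listed under Parameters. The sketch below keeps only the most recent characters; `tail_chars` is a hypothetical helper, and it assumes the limit is counted in characters (bash counts characters in a UTF-8 locale and bytes in the C locale).

```shell
# Hypothetical helper: keep only the last N characters of a text
# (default 8000, the prompt limit listed under Parameters).
tail_chars() {
  local text="$1"
  local max="${2:-8000}"
  if [ "${#text}" -gt "$max" ]; then
    # Drop the oldest characters so the most recent context survives.
    text="${text:$(( ${#text} - max ))}"
  fi
  printf '%s' "$text"
}
```

A possible use is `bash scripts/speech_to_text.sh part2.wav "$(tail_chars "$previous_text")"`, where `previous_text` is assumed to hold the earlier transcription.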
What are hotwords: A custom vocabulary list that boosts recognition accuracy for specific terms.
Best use cases:
Examples:
# Medical transcription
bash scripts/speech_to_text.sh medical.wav "" "患者,症状,诊断,治疗方案"
# Business meeting
bash scripts/speech_to_text.sh meeting.wav "" "张经理,李总,项目代号,预算"
# Tech discussion
bash scripts/speech_to_text.sh tech.wav "" "API,数据库,算法,框架"
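Long hotword lists are easier to maintain in a file than on the command line. The following is a sketch only: `build_hotwords` and the one-term-per-line file layout are assumptions made for illustration; the 100-term cap mirrors the limit listed under Parameters.

```shell
# Hypothetical helper: turn a file with one term per line into the
# comma-separated hotwords argument, capped at 100 terms by default.
build_hotwords() {
  local file="$1"
  local max="${2:-100}"
  # Skip blank lines, keep the first $max terms, join them with commas.
  grep -v '^[[:space:]]*$' "$file" | head -n "$max" | paste -sd, -
}
```

Usage might look like `bash scripts/speech_to_text.sh tech.wav "" "$(build_hotwords terms.txt)"`, with `terms.txt` standing in for your own vocabulary file.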
# Part 1
bash scripts/speech_to_text.sh meeting_part1.wav
# Part 2 with context
bash scripts/speech_to_text.sh meeting_part2.wav "第一部分讨论了项目进度" "张总,李经理,项目名称"
# Part 3 with context
bash scripts/speech_to_text.sh meeting_part3.wav "前两部分讨论了项目进度和预算" "张总,李经理,项目名称"
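The multi-part pattern above can also be automated by feeding each part's transcribed text into the next call as context. This is a sketch under assumptions: it relies on `jq` being installed to read the `text` field, `transcribe_parts` is a hypothetical function, and `TRANSCRIBE_CMD` is a hypothetical override hook added so the loop can be tried without calling the API.

```shell
# Hypothetical sketch: transcribe parts in order, passing the text
# gathered so far as the context prompt for the next part.
# Usage: transcribe_parts "hotword1,hotword2" part1.wav part2.wav ...
transcribe_parts() {
  local hotwords="$1"; shift
  local context="" part result text
  for part in "$@"; do
    # TRANSCRIBE_CMD is an assumed hook; by default the bundled script runs.
    result="$(${TRANSCRIBE_CMD:-bash scripts/speech_to_text.sh} "$part" "$context" "$hotwords")" || return 1
    text="$(printf '%s' "$result" | jq -r '.text')" || return 1
    printf '%s\n' "$text"
    context="${context}${text}"
    # Stay within the 8000-character prompt limit by keeping the tail.
    if [ "${#context}" -gt 8000 ]; then
      context="${context:$(( ${#context} - 8000 ))}"
    fi
  done
}
```

For example, `transcribe_parts "张总,李经理,项目名称" meeting_part1.wav meeting_part2.wav meeting_part3.wav` would print one transcript per line. Unlike the hand-written summaries in the examples above, this version passes the raw earlier text as context.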
bash scripts/speech_to_text.sh lecture.wav "" "教授,课程名称,专业术语1,专业术语2"
for file in recording_*.wav; do
  bash scripts/speech_to_text.sh "$file"
done
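The loop above prints every result to the terminal. A variant that saves one JSON file per recording and reports failures could look like the sketch below; `transcribe_all`, the output directory layout, and the `TRANSCRIBE_CMD` override hook are all assumptions for illustration.

```shell
# Hypothetical sketch: transcribe many files, writing <name>.json per input.
# Usage: transcribe_all OUTPUT_DIR file1.wav file2.wav ...
transcribe_all() {
  local outdir="$1"; shift
  local file base failed=0
  mkdir -p "$outdir"
  for file in "$@"; do
    # An unmatched glob is passed through literally; skip such entries.
    [ -e "$file" ] || continue
    base="$(basename "${file%.*}")"
    # TRANSCRIBE_CMD is an assumed hook; by default the bundled script runs.
    if ! ${TRANSCRIBE_CMD:-bash scripts/speech_to_text.sh} "$file" > "$outdir/$base.json"; then
      echo "failed: $file" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only when every file was transcribed.
  [ "$failed" -eq 0 ]
}
```

It would be called as `transcribe_all transcripts recording_*.wav`, leaving `transcripts/recording_1.json` and so on behind.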
Best practices for accurate transcription:
Clear audio source
Optimal audio settings
File preparation
The script outputs JSON with:
id: Task ID
created: Request timestamp (Unix timestamp)
request_id: Unique request identifier
model: Model name used
text: Transcribed text
Example output:
{
  "id": "task-12345",
  "created": 1234567890,
  "request_id": "req-abc123",
  "model": "glm-asr-2512",
  "text": "你好,这是转录的文本内容"
}
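Since the output is JSON, the transcript can be pulled out with a JSON tool. The sketch below assumes `jq` is installed; `extract_text` is a hypothetical helper name, and only the `text` field name comes from the output format described above.

```shell
# Hypothetical helper: read the script's JSON on stdin and print only the
# transcribed text. With -e, jq exits non-zero when "text" is missing or null.
extract_text() {
  jq -er '.text'
}
```

A typical call would be `bash scripts/speech_to_text.sh recording.wav | extract_text > recording.txt`.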
File Size Issues:
Duration Issues:
Poor Accuracy:
Format Issues: