Install
openclaw skills install doubao-asr
Transcribe recorded audio files to text via Doubao Seed-ASR 2.0 (豆包录音文件识别模型2.0) from ByteDance/Volcengine. Best-in-class Chinese speech recognition with speaker diarization. Use this skill whenever the user wants to: convert audio/recording to text, transcribe a meeting recording or voice memo, identify who said what in a recording (说话人分离, speaker diarization), transcribe m4a/mp3/wav/ogg/flac files, or mentions 录音转文字/豆包/火山引擎/Volcengine/Doubao ASR. Also use when the user has an audio file and needs a transcript, even if they don't explicitly say 'transcribe'. Do NOT use for real-time/streaming speech recognition, text-to-speech (TTS), live captioning, or audio format conversion.
CRITICAL INSTRUCTION FOR AI AGENTS:
YOU MUST follow these rules when helping users configure this skill:
Known AI mistakes you MUST avoid:
https://console.volcengine.com/speech/app - old console, different auth method
https://console.volcengine.com/speech/new/ - new Doubao Speech console
Transcribe audio files via ByteDance Volcengine's Seed-ASR 2.0 Standard (豆包录音文件识别模型2.0-标准版) API. It offers best-in-class accuracy for Chinese (Mandarin, Cantonese, Sichuan dialect, etc.) and supports 13+ languages.
Currently, audio files can be sent to OpenClaw via Discord or WhatsApp. Send the audio file in a chat message and ask the bot to transcribe it.
Note: Direct voice recording in the OpenClaw web UI is not yet supported. Use a messaging app to send pre-recorded audio files.
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a
Defaults:
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --out /tmp/transcript.txt
python3 {baseDir}/scripts/transcribe.py /path/to/audio.mp3 --format mp3
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --json --out /tmp/result.json
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --no-speakers # disable speaker diarization
python3 {baseDir}/scripts/transcribe.py https://example.com/audio.mp3 # direct URL (skip upload)
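When the script is driven from another Python program rather than typed by hand, the flag combinations shown above can be assembled programmatically. The following is a minimal sketch, not part of the skill: `build_transcribe_command` is a hypothetical helper, and the script path passed to it is an assumption about where `{baseDir}` resolves. It uses only the flags that appear in the examples above (`--out`, `--format`, `--json`, `--no-speakers`).

```python
import sys
from pathlib import Path
from typing import List, Optional


def build_transcribe_command(
    script: Path,
    audio: str,
    out: Optional[str] = None,
    fmt: Optional[str] = None,
    as_json: bool = False,
    speakers: bool = True,
) -> List[str]:
    """Assemble an argv list for transcribe.py (hypothetical helper).

    `audio` may be a local path or a direct URL, as in the examples above.
    """
    cmd = [sys.executable, str(script), audio]
    if fmt:
        cmd += ["--format", fmt]        # explicitly name the audio format
    if as_json:
        cmd.append("--json")            # request JSON output
    if not speakers:
        cmd.append("--no-speakers")     # disable speaker diarization
    if out:
        cmd += ["--out", out]           # write the result to a file
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing a list instead of a shell string avoids quoting problems with file names that contain spaces.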
The Doubao API accepts audio via URL (not direct file upload). The script:
Privacy: By default, audio is uploaded to your own Volcengine TOS bucket via presigned URL. No data is sent to third-party services.
You can also pass a direct audio URL as the argument to skip upload entirely:
python3 {baseDir}/scripts/transcribe.py https://your-bucket.tos.volces.com/audio.m4a
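Since the argument can be either a local file or a direct URL, a caller may want to know in advance which path will be taken. The sketch below shows one plausible way to tell the two apart. How `transcribe.py` actually makes this decision is not shown in this document, so `is_remote_audio` is a hypothetical helper and the http/https rule is an assumption.

```python
from urllib.parse import urlparse


def is_remote_audio(arg: str) -> bool:
    """Return True if `arg` looks like an http(s) URL (assumed rule)."""
    parsed = urlparse(arg)
    # Require both a web scheme and a host, so that local paths such as
    # "/path/to/audio.m4a" or "C:\\audio.m4a" are treated as files.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Under this rule, a remote argument would skip the TOS upload step, while anything else would be treated as a local file to upload first.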
requests: pip install requests
You need 5 environment variables: one API key, two IAM keys, and the TOS bucket name and region. Follow these steps carefully: the guided setup below saves you 1-2 hours of digging through Volcengine docs.
The API key is in UUID format (e.g. 57e620a4-179c-4b3d-bd8d-990bd1f9a1e2).
export VOLCENGINE_API_KEY="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Create an IAM sub-user (e.g. doubao-asr), then copy both the Access Key ID (starts with AKLT) and the Secret Access Key.
Note: No IAM permission policies are needed here. Access will be granted via the TOS bucket policy in Step 3 (single-bucket read/write only).
Tip: To view the keys again, go to the user list → click the sub-user name → switch to the 'Keys' tab.
export VOLCENGINE_ACCESS_KEY_ID="AKLTxxxx..."
export VOLCENGINE_SECRET_ACCESS_KEY="xxxx..."
The Doubao API requires audio to be reachable via URL. TOS object storage provides a secure temporary upload, and the data stays inside Volcengine.
Region selection:
| Server location | Recommended TOS region | Region code |
|---|---|---|
| China mainland | cn-beijing, cn-shanghai, cn-guangzhou | cn-beijing |
| Hong Kong | cn-hongkong | cn-hongkong |
| Southeast Asia | ap-southeast-1 (Singapore) | ap-southeast-1 |
| US, Europe, other overseas | Any overseas region (e.g. cn-hongkong, ap-southeast-1) | cn-hongkong |
Important: If your server is outside China mainland, use an overseas region (e.g. cn-hongkong, ap-southeast-1). Do NOT use cn-beijing or cn-shanghai: cross-border upload will be extremely slow (~15KB/s).
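The region guidance above can be turned into a quick sanity check before creating a bucket. This is only an illustrative sketch: the region codes come from the region table, but the location labels (`china_mainland`, `overseas`, and so on) and the `region_warning` helper are hypothetical names invented for this example.

```python
from typing import Optional

# Recommended region codes, copied from the region table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
RECOMMENDED_REGION = {
    "china_mainland": "cn-beijing",
    "hong_kong": "cn-hongkong",
    "southeast_asia": "ap-southeast-1",
    "overseas": "cn-hongkong",
}

# Mainland regions named in the table above.
MAINLAND_REGIONS = {"cn-beijing", "cn-shanghai", "cn-guangzhou"}


def region_warning(server_location: str, region: str) -> Optional[str]:
    """Return a warning if a non-mainland server targets a mainland region."""
    if server_location != "china_mainland" and region in MAINLAND_REGIONS:
        suggestion = RECOMMENDED_REGION.get(server_location, "cn-hongkong")
        return (
            f"{region} is a China mainland region; cross-border upload "
            f"may be very slow. Consider {suggestion}."
        )
    return None
```

A `None` result means the combination matches the table; any string is a hint that the bucket should probably be created in a different region.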
export VOLCENGINE_TOS_BUCKET="your_bucket_name"
export VOLCENGINE_TOS_REGION="cn-hongkong" # or another overseas region; see the region table above
| Variable | Required | Description |
|---|---|---|
| VOLCENGINE_API_KEY | Yes | ASR API key (UUID format) from the Speech console |
| VOLCENGINE_ACCESS_KEY_ID | Yes | IAM Access Key ID (starts with AKLT) |
| VOLCENGINE_SECRET_ACCESS_KEY | Yes | IAM Secret Access Key |
| VOLCENGINE_TOS_BUCKET | Yes | TOS bucket name |
| VOLCENGINE_TOS_REGION | Yes | TOS region code; must match the region chosen when the bucket was created. Overseas: e.g. cn-hongkong, ap-southeast-1; China mainland: cn-beijing |
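Before running a transcription it can help to confirm that every required variable is actually visible to Python. The sketch below checks the five names from the table above; `missing_vars` is a hypothetical helper, not something the skill ships.

```python
import os
from typing import List, Mapping, Optional

# The five variables marked "Required: Yes" in the table above.
REQUIRED_VARS = (
    "VOLCENGINE_API_KEY",
    "VOLCENGINE_ACCESS_KEY_ID",
    "VOLCENGINE_SECRET_ACCESS_KEY",
    "VOLCENGINE_TOS_BUCKET",
    "VOLCENGINE_TOS_REGION",
)


def missing_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List required variables that are unset or empty.

    Checks os.environ by default; pass a mapping to test other inputs.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vars()` with no argument inspects the current process environment, which is the same environment a child process such as the transcription script would inherit.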
Supported formats: WAV, MP3, MP4, M4A, OGG, FLAC. Files can be up to 5 hours long and 512MB in size.
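The format and size limits can be checked locally before spending time on an upload. The sketch below is a hypothetical pre-flight check, under two assumptions: the format is judged by file extension only, and "512MB" is read as 512 × 1024 × 1024 bytes. The 5-hour duration limit is not checked, since that would require decoding the audio.

```python
from pathlib import PurePath
from typing import List

# Extensions for the formats listed above (assumption: format is
# inferred from the file extension alone).
SUPPORTED_EXTENSIONS = (".wav", ".mp3", ".mp4", ".m4a", ".ogg", ".flac")

# Assumption: "512MB" is interpreted as 512 MiB.
MAX_BYTES = 512 * 1024 * 1024


def audio_problems(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks OK."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 512MB")
    return problems
```

For a real file, the size can be taken from `os.path.getsize(path)` and passed in together with the file name.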
Error: TOS upload failed: 403 Forbidden
Cause: The TOS bucket policy is not configured, or the IAM user is not authorized.
Solution: Go to TOS bucket → Permission Management → Bucket Authorization Policy → Create Policy → select the "Folder Read/Write" template. See Step 3 above.
Error: TOS upload extremely slow (~15KB/s)
Cause: The server is outside China mainland but is using the cn-beijing region.
Solution: Change VOLCENGINE_TOS_REGION to cn-hongkong and create a new bucket in that region.
Error: API returned error: invalid API key
Cause: The API key comes from the old Speech console, or from the wrong console page.
Solution: Get the API key from the NEW Doubao Speech console at https://console.volcengine.com/speech/new/, NOT /speech/app.
Error: Unsupported audio format or transcription returns empty
Cause: The audio file is corrupted, or its format is not in the supported list.
Solution: Ensure the file is one of WAV, MP3, MP4, M4A, OGG, FLAC and is not corrupted. Try the --format flag to specify the format explicitly.
Error: Missing: VOLCENGINE_ACCESS_KEY_ID... after running source .env
Cause: source .env sets variables in the current shell but does not export them to child processes. The script runs as a subprocess and cannot see unexported variables.
Solution: Use set -a && source .env && set +a to auto-export all variables, or put export before each variable in your .env file.
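When the script is launched from Python instead of a shell, the same problem can be sidestepped by reading the .env file directly and passing the values to the child process. The sketch below is a deliberately simple parser written for this example (`parse_env_file` and `child_env` are hypothetical helpers). It handles `KEY=value`, optional `export` prefixes, and surrounding quotes, but not inline comments or multi-line values.

```python
from typing import Dict, Mapping


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines from the contents of a .env file.

    Limitations (by design, for brevity): no inline comments,
    no multi-line values, no variable expansion.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def child_env(env_text: str, base: Mapping[str, str]) -> Dict[str, str]:
    """Merge parsed .env values over a base environment for a subprocess."""
    merged = dict(base)
    merged.update(parse_env_file(env_text))
    return merged
```

The merged dictionary can be passed as the `env` argument of `subprocess.run`, with `os.environ` as the base, so the child process sees the variables regardless of whether the shell exported them.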