AudioClaw Skills Voice Intake
When to use
Use this skill when the user sends a voice message and AudioClaw should understand the content before replying.
Common triggers:
- A Feishu or chat bot receives an audio message instead of text.
- AudioClaw needs a transcript plus a clean user message payload.
- The workflow wants richer ASR features such as timestamps, sentiment, or speaker separation.
- The team wants one stable AudioClaw intake entrypoint instead of hand-written ASR requests.
- The channel stores inbound voice files as
.ogg or .opus, and AudioClaw still needs one stable ASR path.
Do not use this skill for speech output. Use $audioclaw-skills-voice-reply for TTS.
Workflow
- Save the incoming audio file locally.
- Run
scripts/openclaw_voice_intake.py with the audio path.
- Let the script choose the best model when no model is forced:
sense-asr-deepthink for normal single-speaker voice understanding
sense-asr when a language hint is provided
sense-asr-pro when timestamps, sentiment, speaker diarization, or punctuation are requested
sense-asr-lite when hotwords are requested
- Use the JSON manifest it returns as the AudioClaw handoff:
transcript.normalized_text
openclaw.turn_payload
routing.selected_model
- If
understanding.clarification_needed is true, ask the user to repeat or resend the audio.
Runtime model
Official HTTP ASR API:
- Endpoint:
https://api.senseaudio.cn/v1/audio/transcriptions
- Content type:
multipart/form-data
- File size limit:
<=10MB
- Practical local input suffixes accepted by this skill:
.wav, .mp3, .ogg, .opus, .flac, .aac, .m4a, .mp4
Supported response goals:
- plain transcript
- richer raw response passthrough
- AudioClaw-ready turn payload
The skill keeps two layers separate:
- ASR output from AudioClaw ASR
- AudioClaw packaging and clarification heuristics
API key lookup
This skill now treats SENSEAUDIO_API_KEY as the default API key source again.
Runtime rules:
- If the host app injects
SENSEAUDIO_API_KEY as an AudioClaw login token such as v2.public..., the shared bootstrap will replace it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before ASR starts.
--api-key-env still works, but the default runtime path is SENSEAUDIO_API_KEY.
Commands
Basic voice intake:
python3 scripts/openclaw_voice_intake.py \
--input /path/to/user_audio.mp3
Voice intake with richer AudioClaw structure:
python3 scripts/openclaw_voice_intake.py \
--input /path/to/meeting_clip.m4a \
--enable-punctuation \
--timestamp-granularity segment \
--enable-sentiment \
--out-json /tmp/openclaw_voice_turn.json
Force a specific model:
python3 scripts/openclaw_voice_intake.py \
--input /path/to/user_audio.mp3 \
--model sense-asr-deepthink
AudioClaw integration pattern
Recommended handoff:
- Channel adapter stores the inbound audio.
- AudioClaw calls
scripts/openclaw_voice_intake.py.
- AudioClaw reads:
openclaw.turn_payload.role
openclaw.turn_payload.content
openclaw.turn_payload.metadata
- The normal dialogue pipeline continues as if the user typed the recognized text.
Operational rules:
- Keep the original audio path in metadata for debugging.
- Pass
language only when you are confident; otherwise let ASR auto-detect.
- If you request timestamps, sentiment, or diarization, let the script choose
sense-asr-pro.
- If transcript is empty, do not hallucinate a user intent. Ask for clarification.
Resources
scripts/senseaudio_asr_client.py
- Multipart HTTP client for AudioClaw ASR
- Handles model routing validation and JSON or text responses
scripts/openclaw_voice_intake.py
- Main runtime for AudioClaw
- Builds transcript, normalized user text, and turn payload
references/openclaw_voice_intake.md
- Official ASR docs summary, model support notes, and AudioClaw payload examples