audioclaw-skills-voice-intake

v1.0.1

Use when AudioClaw Skills needs to understand a user voice message with AudioClaw ASR, including speech-to-text, model routing for deepthink or pro features,...

⭐ 0· 233·0 current·0 all-time

by Wu Ruixiao @kikidouloveme79

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for kikidouloveme79/audioclaw-skills-voice-intake.

Install the skill "audioclaw-skills-voice-intake" (kikidouloveme79/audioclaw-skills-voice-intake) from ClawHub.
Skill page: https://clawhub.ai/kikidouloveme79/audioclaw-skills-voice-intake
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install audioclaw-skills-voice-intake

ClawHub CLI

npx clawhub@latest install audioclaw-skills-voice-intake

Security Scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

Purpose & Capability

The name, description, SKILL.md, and scripts all consistently implement an AudioClaw voice intake that posts audio to SenseAudio ASR and builds an AudioClaw turn payload. That capability aligns with the stated purpose. However, the registry metadata lists no required environment variables even though the runtime clearly expects an API key (SENSEAUDIO_API_KEY), so the declared requirements are incomplete.

Instruction Scope

The instructions are concrete and scoped to ASR (save incoming audio, run the included script, hand off JSON). However, the SKILL.md and code explicitly reference a runtime bootstrap that can replace an injected token with a real key from ~/.audioclaw/workspace/state/senseaudio_credentials.json via a shared module (senseaudio_env / senseaudio_api_guard). The instructions therefore rely on reading or substituting credentials from a host-local path and on a shared bootstrap module that is not included in the bundle. This important behavioral detail is not reflected in the declared requirements and increases the trust surface.

Install Mechanism

There is no install spec and no external download. The skill is instruction-plus-scripts only; all code is included in the bundle. No archives are pulled from external URLs and nothing is written during an automated install step beyond the skill files themselves.

Credentials

The runtime expects an API key in SENSEAUDIO_API_KEY (and provides an --api-key-env override). The registry metadata lists no required env vars or primary credential, which is inconsistent and misleading. The code also expects a shared bootstrap that can read a local credentials file (~/.audioclaw/workspace/state/senseaudio_credentials.json) to replace placeholder tokens. That file contains sensitive credentials, so access to it should be explicitly declared and audited.

Persistence & Privilege

The skill does not request always:true and does not modify other skills. It imports an optional shared module from parent directories if present, which is a local code-loading behavior (not an automatic persistence or system-level config change). This increases the trusted-code surface but is not an elevated privilege setting by itself.

What to consider before installing

This skill's behavior is largely consistent with its stated purpose (sending audio to the SenseAudio API and returning a structured JSON handoff), but two mismatches deserve attention before installing:

1) API key handling: The scripts expect a SENSEAUDIO_API_KEY at runtime (or an alternative env via --api-key-env), but the package metadata does not declare any required env vars. Confirm how your agent runtime will provide the API key. Ask the maintainer to declare SENSEAUDIO_API_KEY in the registry metadata so you can review and control access.

2) Shared bootstrap and local credentials: The code will attempt to import a shared module (../_shared/senseaudio_env.py) and the documentation states it may replace placeholder tokens with a 'real' key read from ~/.audioclaw/workspace/state/senseaudio_credentials.json. Before installing, inspect or request the source of that shared module and the on-disk credentials file. Ensure the file location and replacement logic are trustworthy and that no code path will exfiltrate those credentials. If you do not control the host-provided shared module, treat that as an untrusted dependency.

Other practical checks: verify the included scripts do not post to endpoints other than https://api.senseaudio.cn, confirm you are comfortable with the code using /usr/bin/afinfo (macOS) for duration detection (it falls back if absent), and run the scripts in a controlled environment with a test API key before using with production credentials.

If the maintainer cannot provide the shared bootstrap code for review, prefer to run your own vetted wrapper or modify the scripts to accept an explicit API key and not load external shared modules.
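The endpoint check mentioned above can be roughed out as a static scan. This is only a sketch: the function name `find_unexpected_hosts` and the regular expression are illustrative assumptions, not part of the skill, and the scan only sees URLs written literally in the source.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# The only host the review above expects the scripts to contact.
ALLOWED_HOSTS = {"api.senseaudio.cn"}

# Loose pattern for literal http(s) URLs inside source files (an assumption).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")


def find_unexpected_hosts(script_dir):
    """Return {relative_path: [hosts]} for literal URLs outside ALLOWED_HOSTS."""
    root = Path(script_dir)
    findings = {}
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hosts = {urlparse(url).hostname for url in URL_PATTERN.findall(text)}
        unexpected = sorted(h for h in hosts if h and h not in ALLOWED_HOSTS)
        if unexpected:
            findings[str(path.relative_to(root))] = unexpected
    return findings
```

An empty result does not prove the scripts are safe, because URLs assembled at runtime or loaded from the shared module would not appear in this scan. It is a first pass before reading the code.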

Like a lobster shell, security has layers — review code before you run it.

latest vk970tgfqnzaxsctqwq5jxjn3jd83dyts

233 downloads

0 stars

2 versions

Updated 21h ago

MIT-0

AudioClaw Skills Voice Intake

When to use

Use this skill when the user sends a voice message and AudioClaw should understand the content before replying.

Common triggers:

A Feishu or chat bot receives an audio message instead of text.
AudioClaw needs a transcript plus a clean user message payload.
The workflow wants richer ASR features such as timestamps, sentiment, or speaker separation.
The team wants one stable AudioClaw intake entrypoint instead of hand-written ASR requests.
The channel stores inbound voice files as .ogg or .opus, and AudioClaw still needs one stable ASR path.

Do not use this skill for speech output. Use $audioclaw-skills-voice-reply for TTS.

Workflow

1. Save the incoming audio file locally.
2. Run scripts/openclaw_voice_intake.py with the audio path.
3. Let the script choose the best model when no model is forced:
   - sense-asr-deepthink for normal single-speaker voice understanding
   - sense-asr when a language hint is provided
   - sense-asr-pro when timestamps, sentiment, speaker diarization, or punctuation are requested
   - sense-asr-lite when hotwords are requested
4. Use the JSON manifest it returns as the AudioClaw handoff:
   - transcript.normalized_text
   - openclaw.turn_payload
   - routing.selected_model
5. If understanding.clarification_needed is true, ask the user to repeat or resend the audio.
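The model routing described in the workflow can be illustrated with a small function. This is a sketch, not the script's actual logic: the name `choose_model`, its parameters, and especially the precedence between rules (pro features before hotwords before a language hint) are assumptions, since the skill page does not say which rule wins when several apply.

```python
def choose_model(forced_model=None, language=None, timestamps=False,
                 sentiment=False, diarization=False, punctuation=False,
                 hotwords=None):
    """Pick an ASR model name following the routing rules in the workflow.

    The ordering of the checks is an assumption made for this sketch.
    """
    if forced_model:
        # An explicitly forced model always wins.
        return forced_model
    if timestamps or sentiment or diarization or punctuation:
        return "sense-asr-pro"
    if hotwords:
        return "sense-asr-lite"
    if language:
        return "sense-asr"
    # Default: normal single-speaker voice understanding.
    return "sense-asr-deepthink"
```

For example, `choose_model(language="zh")` selects `sense-asr`, while a call with no arguments falls through to `sense-asr-deepthink`.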

Runtime model

Official HTTP ASR API:

Endpoint: https://api.senseaudio.cn/v1/audio/transcriptions
Content type: multipart/form-data
File size limit: <=10MB
Practical local input suffixes accepted by this skill: .wav, .mp3, .ogg, .opus, .flac, .aac, .m4a, .mp4
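The size limit and suffix list above lend themselves to a pre-flight check before any upload is attempted. The following is a minimal sketch with a hypothetical helper name, `check_audio_file`. It reads "10MB" as 10 × 1024 × 1024 bytes, which is an assumption; the API may count in decimal megabytes instead.

```python
from pathlib import Path

# Assumption: "10MB" is interpreted as 10 MiB.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024

# Suffixes listed as accepted by this skill.
ACCEPTED_SUFFIXES = {".wav", ".mp3", ".ogg", ".opus", ".flac", ".aac", ".m4a", ".mp4"}


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ACCEPTED_SUFFIXES:
        problems.append(f"unsupported suffix: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {p.stat().st_size} bytes")
    return problems
```

Returning a list rather than raising lets the caller report every problem at once, for instance when asking the user to resend the audio.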

Supported response goals:

plain transcript
richer raw response passthrough
AudioClaw-ready turn payload

The skill keeps two layers separate:

ASR output from AudioClaw ASR
AudioClaw packaging and clarification heuristics

API key lookup

This skill now treats SENSEAUDIO_API_KEY as the default API key source again.

Runtime rules:

If the host app injects SENSEAUDIO_API_KEY as an AudioClaw login token such as v2.public..., the shared bootstrap will replace it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before ASR starts.
--api-key-env still works, but the default runtime path is SENSEAUDIO_API_KEY.
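To make the lookup rules concrete, here is a rough sketch of the described behavior. It is not the shared bootstrap's code, which is not included in the bundle. The function name `resolve_api_key`, the `v2.public` prefix test, and the `api_key` field inside the credentials JSON are all assumptions for illustration; the real file layout is not documented on this page.

```python
import json
import os
from pathlib import Path

DEFAULT_CREDENTIALS_PATH = (
    Path.home() / ".audioclaw/workspace/state/senseaudio_credentials.json"
)


def resolve_api_key(env=None, env_name="SENSEAUDIO_API_KEY",
                    credentials_path=DEFAULT_CREDENTIALS_PATH):
    """Return an API key, swapping a login token for the stored key if needed."""
    env = os.environ if env is None else env
    value = env.get(env_name, "").strip()
    if not value:
        raise RuntimeError(f"{env_name} is not set")
    if not value.startswith("v2.public"):
        # Already looks like a direct key; use it unchanged.
        return value
    # The value looks like an injected login token: read the stored key instead.
    data = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
    key = data.get("api_key", "")  # hypothetical field name
    if not key.startswith("sk-"):
        raise RuntimeError("no sk-... key found in credentials file")
    return key
```

Passing `env` and `credentials_path` explicitly keeps the lookup testable and makes it obvious which file is read, which matters given the security notes about this path.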

Commands

Basic voice intake:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/user_audio.mp3

Voice intake with richer AudioClaw structure:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/meeting_clip.m4a \
  --enable-punctuation \
  --timestamp-granularity segment \
  --enable-sentiment \
  --out-json /tmp/openclaw_voice_turn.json

Force a specific model:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/user_audio.mp3 \
  --model sense-asr-deepthink
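When the script is called from another Python process rather than a shell, the same commands can be assembled as an argument list. This sketch uses only the flags shown in the commands above; the helper name `build_intake_command` and its parameters are hypothetical.

```python
import sys


def build_intake_command(audio_path, model=None, punctuation=False,
                         timestamp_granularity=None, sentiment=False,
                         out_json=None,
                         script="scripts/openclaw_voice_intake.py"):
    """Build an argv list for the intake script using the documented flags."""
    cmd = [sys.executable, script, "--input", str(audio_path)]
    if punctuation:
        cmd.append("--enable-punctuation")
    if timestamp_granularity:
        cmd += ["--timestamp-granularity", timestamp_granularity]
    if sentiment:
        cmd.append("--enable-sentiment")
    if model:
        cmd += ["--model", model]
    if out_json:
        cmd += ["--out-json", str(out_json)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list instead of a shell string avoids quoting problems with audio paths that contain spaces.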

AudioClaw integration pattern

Recommended handoff:

1. Channel adapter stores the inbound audio.
2. AudioClaw calls scripts/openclaw_voice_intake.py.
3. AudioClaw reads:
   - openclaw.turn_payload.role
   - openclaw.turn_payload.content
   - openclaw.turn_payload.metadata
4. The normal dialogue pipeline continues as if the user typed the recognized text.

Operational rules:

Keep the original audio path in metadata for debugging.
Pass language only when you are confident; otherwise let ASR auto-detect.
If you request timestamps, sentiment, or diarization, let the script choose sense-asr-pro.
If transcript is empty, do not hallucinate a user intent. Ask for clarification.
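The handoff and the empty-transcript rule above might look like this on the consuming side. It is a sketch only: the helper name `extract_turn` and the returned dictionary shape are assumptions. The manifest field names come from this page, but the manifest's full structure is not documented here.

```python
def extract_turn(manifest):
    """Decide whether to continue the dialogue or ask the user to repeat.

    `manifest` is the parsed JSON returned by the intake script.
    """
    understanding = manifest.get("understanding", {})
    transcript = manifest.get("transcript", {})
    text = (transcript.get("normalized_text") or "").strip()

    # Never guess an intent from an empty or flagged transcript.
    if understanding.get("clarification_needed") or not text:
        return {"action": "clarify", "turn": None}

    payload = manifest["openclaw"]["turn_payload"]
    turn = {
        "role": payload["role"],
        "content": payload["content"],
        "metadata": payload.get("metadata", {}),
    }
    return {"action": "continue", "turn": turn}
```

On `"clarify"`, the caller would ask the user to repeat or resend the audio; on `"continue"`, the turn enters the normal dialogue pipeline as if it had been typed.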

Resources

scripts/senseaudio_asr_client.py
- Multipart HTTP client for AudioClaw ASR
- Handles model routing validation and JSON or text responses
scripts/openclaw_voice_intake.py
- Main runtime for AudioClaw
- Builds transcript, normalized user text, and turn payload
references/openclaw_voice_intake.md
- Official ASR docs summary, model support notes, and AudioClaw payload examples
