Audio Command Handler

v1.0.0

Handle audio messages as commands. When a user sends an audio file (WAV/PCM/MP3), transcribe it using iFlytek Speed Transcription and either (1) execute the tr...


by @smallkeyboy

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for smallkeyboy/audio-command-handler.


Install the skill "Audio Command Handler" (smallkeyboy/audio-command-handler) from ClawHub.
Skill page: https://clawhub.ai/smallkeyboy/audio-command-handler
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install audio-command-handler

ClawHub CLI


npx clawhub@latest install audio-command-handler

Security Scan

VirusTotal

Suspicious


OpenClaw

Suspicious

high confidence

Purpose & Capability

The description and SKILL.md say audio can be transcribed and executed as commands. The shipped script transcribes audio, prepares output, and may save/upload results, but it does NOT actually execute arbitrary commands derived from the transcription. That is a substantive mismatch: the skill advertises command execution capability that the code does not implement.

Instruction Scope

SKILL.md instructs agents to run the ifly-speed-transcription and uploader scripts from specific workspace paths and describes executing transcriptions as commands. Those instructions grant broad discretion (executing user-provided text as commands), which is dangerous in general. The actual script does not perform shell execution, but the instructions still direct the agent to use other local scripts and to upload potentially sensitive content, so the scope is broader than transcription alone.

✓

Install Mechanism

No install spec or remote downloads: the skill is instruction-only with a local Python script. Nothing is pulled from external URLs during install, so install-time risk is low.

ℹ

Credentials

The skill declares no credentials or env vars. It does, however, depend on external skills (ifly-speed-transcription and uploader) located under ~/.openclaw/workspace. Those helper scripts (not included here) may require credentials or upload targets; this skill will forward transcript data to them, so secret handling/exfiltration risk depends on those other skills.

✓

Persistence & Privilege

No elevated privileges requested: the always flag is false, the skill does not modify other skill configs, and it only writes files under the user's workspace directory. It uses subprocess to run other local scripts but does not install persistent agents or alter system-wide settings.

What to consider before installing

Key things to consider before installing:

- Mismatch between docs and code: the README/skill description says it will "execute" transcribed audio as commands, but scripts/handle_audio.py only transcribes and prints/saves the result; it does not execute arbitrary shell commands. If you expected automated execution, do not assume it exists. Conversely, if you worry about remote execution, the code is safer than the docs claim, but the docs could cause an agent to behave dangerously when chained with other skills.
- External dependencies: this skill calls two other local scripts in ~/.openclaw/workspace/skills (ifly-speed-transcription and uploader). Verify those scripts exist, inspect them, and confirm where uploader actually sends files. The uploader could exfiltrate sensitive transcriptions to external endpoints.
- Data exposure: transcription results are written into HTML files in ~/.openclaw/workspace and then handed to the uploader. If your audio contains sensitive data, confirm retention and access controls for that workspace and for the uploader's storage.
- Execution risk in orchestration: although this script does not exec transcription text, SKILL.md instructs agents to "use transcription as the command." If an agent or orchestrator follows SKILL.md instead of the included script, that could lead to executing user-supplied text. Ensure your agent enforces safe execution policies and does not run arbitrary text as shell commands.
- Recommended actions: inspect the ifly-speed-transcription and uploader skills (particularly the uploader's backend endpoints and auth), confirm the uploader's destination and access controls, and decide whether you need stricter sanitization or should remove the auto-upload behavior. If you will rely on automatic command execution, require additional code review and strict input sanitization before enabling it.
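As a concrete illustration of the "do not run arbitrary text as shell commands" advice, the Python sketch below (not part of this skill; echo_untrusted is a hypothetical name) passes transcript text to a child process as a single argv element with no shell involved. Shell metacharacters such as ;, && and $(...) arrive verbatim as data instead of being interpreted:

```python
import subprocess
import sys


def echo_untrusted(text: str) -> str:
    """Hand untrusted transcript text to a child process as one argv element.

    With a list of arguments and shell=False (the subprocess default), no shell
    parses the text, so ';', '&&' or '$(...)' are never interpreted.
    """
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", text],
        capture_output=True,
        text=True,
        check=True,
    )
    # print() in the child appends a newline; strip only that.
    return completed.stdout.rstrip("\n")


if __name__ == "__main__":
    hostile = "weather; rm -rf ~ && echo $(whoami)"
    # Prints the string unchanged; nothing is deleted or substituted.
    print(echo_untrusted(hostile))
```

The same pattern applies whenever a helper script is invoked with any user-derived value: build an argument list and keep shell=False rather than formatting the value into a command string.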


latest · vk97aa6p13k2fzwqm7wxjhmhyks85j9k9

35 downloads · 0 stars · 1 version · Updated 1d ago · License: MIT-0

Audio Command Handler

Process audio messages and execute them as commands.

Workflow

Scenario 1: Audio Only (No Text)

User sends an audio file without any text instruction:

1. Transcribe the audio using the ifly-speed-transcription skill.
2. Use the transcription as the command: execute it as if the user had typed it.
3. Return the result directly; no file upload is needed, regardless of length.

Scenario 2: Audio + Text Command

User sends an audio file WITH a text instruction:

1. Transcribe the audio using the ifly-speed-transcription skill.
2. Execute the text command with the transcription as context/input.
3. Check the result length:
   - If ≤ 58 characters: return the result directly.
   - If > 58 characters: save it to a file, upload it via the uploader skill, and return the URL.
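The length rule in both scenarios reduces to a small decision function. This is a minimal Python sketch under two assumptions: "characters" means Unicode code points (what Python's len() counts, so each Chinese character counts as one), and the 58 threshold is a default parameter because the Notes call it configurable. choose_delivery and Delivery are hypothetical names:

```python
from enum import Enum

RESULT_THRESHOLD = 58  # characters, per the scenarios above; configurable


class Delivery(Enum):
    DIRECT = "direct"  # reply with the result text in the chat
    UPLOAD = "upload"  # save to a file, upload, and reply with the URL


def choose_delivery(result: str, has_text_command: bool,
                    threshold: int = RESULT_THRESHOLD) -> Delivery:
    """Apply the Scenario 1 / Scenario 2 delivery rules to a finished result."""
    if not has_text_command:
        # Scenario 1 (audio only): always answer directly, whatever the length.
        return Delivery.DIRECT
    # Scenario 2 (audio + text): len() counts code points, so "帮" counts as 1.
    return Delivery.DIRECT if len(result) <= threshold else Delivery.UPLOAD
```

For example, a 59-character summary with a text command yields Delivery.UPLOAD, while the same text from an audio-only message yields Delivery.DIRECT.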

Quick Reference

Transcription

python3 ~/.openclaw/workspace/skills/ifly-speed-transcription/scripts/transcribe.py /path/to/audio.mp3

Upload

python3 ~/.openclaw/workspace/skills/uploader/scripts/upload_media.py /path/to/file.txt
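If you drive these two commands from Python, a sketch like the one below builds each call as an argument list (no shell) from the paths shown above. It assumes the default ~/.openclaw/workspace layout, a python3 on PATH, and that each helper prints its result (the transcript or the upload URL) to stdout; that output behavior is not documented here, so check the helper scripts before relying on it. The function names are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import Optional

# Paths from the Quick Reference; assumes both helper skills are installed
# under the default workspace.
SKILLS_DIR = Path.home() / ".openclaw" / "workspace" / "skills"
TRANSCRIBE_SCRIPT = SKILLS_DIR / "ifly-speed-transcription" / "scripts" / "transcribe.py"
UPLOAD_SCRIPT = SKILLS_DIR / "uploader" / "scripts" / "upload_media.py"


def transcribe_cmd(audio_path: str) -> list[str]:
    """argv for the transcription helper; the path is one element, never shell-parsed."""
    return ["python3", str(TRANSCRIBE_SCRIPT), audio_path]


def upload_cmd(file_path: str) -> list[str]:
    """argv for the uploader helper."""
    return ["python3", str(UPLOAD_SCRIPT), file_path]


def run_helper(argv: list[str], timeout: Optional[float] = None) -> str:
    """Run a helper and return its trimmed stdout.

    ASSUMPTION: the helper prints its result to stdout and exits non-zero on
    failure (check=True then raises CalledProcessError). Long audio can take a
    while, so no timeout is applied unless you pass one in seconds.
    """
    completed = subprocess.run(argv, capture_output=True, text=True,
                               check=True, timeout=timeout)
    return completed.stdout.strip()
```

Under those assumptions, run_helper(transcribe_cmd("/path/to/audio.mp3")) would return the transcript text.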

Execution Flow

┌─────────────────┐
│  Audio Message  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Transcribe    │
│ (ifly-speed-    │
│  transcription) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     NO     ┌────────────────┐
│ Has Text Cmd?   │───────────►│ Use Transcript │
└────────┬────────┘            │ as Command     │
         │ YES                 └───────┬────────┘
         ▼                             │
┌─────────────────┐                    ▼
│ Execute Text    │            ┌────────────────┐
│ Cmd with        │            │ Return Direct  │
│ Transcription   │            │ to User        │
│ as Context      │            │ (no upload)    │
└────────┬────────┘            └────────────────┘
         │
         ▼
┌─────────────────┐
│ Result > 58 ch? │
└────────┬────────┘
         │
         ├─────────────────────┐
         │ YES                 │ NO
         ▼                     ▼
┌─────────────────┐    ┌────────────────┐
│ Save to File    │    │ Return Direct  │
│ Upload via      │    │ to User        │
│ uploader skill  │    └────────────────┘
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Return URL to   │
│ User            │
└─────────────────┘
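The diagram can be expressed as a single routing function. In this Python sketch the transcriber, the agent step and the save-and-upload step are injected as callables (all hypothetical hooks, not APIs of this skill or of OpenClaw), which keeps the branching testable without the real helper scripts. "Execute" here means handing text to the agent as a request, never to a shell, and the reply string is an English rendering of the one in Example 3 below:

```python
from typing import Callable, Optional

THRESHOLD = 58  # characters; configurable per implementation

# Hypothetical hook types; none of these names come from the skill itself.
Transcriber = Callable[[str], str]              # audio path -> transcript
Executor = Callable[[str, Optional[str]], str]  # (request, context) -> result
Uploader = Callable[[str], str]                 # result text -> download URL


def handle_audio_message(audio_path: str, text_command: Optional[str],
                         transcribe: Transcriber, execute: Executor,
                         save_and_upload: Uploader,
                         threshold: int = THRESHOLD) -> str:
    """Walk the Execution Flow diagram and return the reply for the user.

    execute() must treat its first argument as a user request to the agent;
    it must never pass that text to a shell.
    """
    transcript = transcribe(audio_path)
    if not text_command:
        # NO branch: the transcript is the request; reply directly at any length.
        return execute(transcript, None)
    # YES branch: run the typed command with the transcript as context.
    result = execute(text_command, transcript)
    if len(result) > threshold:
        # Long result: save and upload it, then reply with the link instead.
        url = save_and_upload(result)
        return f"Content generated. Download link: {url}"
    return result
```

Because every side effect is a parameter, the three example scenarios below can be exercised with simple lambdas standing in for transcription, the agent, and the uploader.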

Example Scenarios

Example 1: Audio Only

User sends: 🎤 audio file (speech in Chinese: "Check tomorrow's weather in Shanghai for me")

Flow:

1. Transcribe → "Check tomorrow's weather in Shanghai for me"
2. Execute as a command → check tomorrow's weather in Shanghai
3. Return the weather info directly (no upload, regardless of length)

Example 2: Audio + Command (Short Result)

User sends: 🎤 audio file + text "Summarize this recording for me"

Flow:

1. Transcribe the audio → get the text content
2. Execute "Summarize this recording for me" with the transcription as context
3. If the summary is ≤ 58 chars → return it directly

Example 3: Audio + Command (Long Result)

User sends: 🎤 audio file + text "Write an article based on this recording for me"

Flow:

1. Transcribe the audio → get the text content
2. Execute the command with the transcription as context
3. Result > 58 chars → save to a file and upload
4. Return: "Content generated. Download link: https://..."

Notes

Audio formats: WAV, PCM, MP3 (16kHz, 16-bit, mono recommended)
Max duration: 5 hours
Language support: Chinese, English, 202+ Chinese dialects
Result threshold: 58 characters (configurable per implementation)
File location: Saved to ~/.openclaw/workspace/ before upload
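Before transcribing, you may want to check a WAV file against the recommended settings and the 5-hour limit listed above. This is a hedged sketch using only Python's standard wave module, so it handles WAV only (not MP3 or headerless PCM). It treats the recommendations as warnings rather than hard errors, and check_wav is a hypothetical name:

```python
import wave

RECOMMENDED_RATE = 16_000  # Hz
RECOMMENDED_WIDTH = 2      # bytes per sample, i.e. 16-bit
RECOMMENDED_CHANNELS = 1   # mono
MAX_SECONDS = 5 * 60 * 60  # 5-hour maximum duration


def check_wav(path: str) -> list[str]:
    """Return human-readable warnings for a WAV file; an empty list means OK."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate

    warnings = []
    if rate != RECOMMENDED_RATE:
        warnings.append(f"sample rate {rate} Hz (16000 Hz recommended)")
    if width != RECOMMENDED_WIDTH:
        warnings.append(f"{8 * width}-bit samples (16-bit recommended)")
    if channels != RECOMMENDED_CHANNELS:
        warnings.append(f"{channels} channels (mono recommended)")
    if seconds > MAX_SECONDS:
        warnings.append(f"duration {seconds:.0f} s exceeds the 5-hour limit")
    return warnings
```

A 16 kHz, 16-bit mono recording returns an empty list; an 8 kHz stereo file returns two warnings.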
