Transcribe audio files via OpenRouter using audio-capable models

Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc.).

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 3 · 2.9k · 6 current installs · 6 all-time installs

by @obviyus

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

✓ Purpose & Capability

Name/description match the included script: the script converts audio, encodes it, and calls OpenRouter's chat/completions API with input_audio. The required binaries (ffmpeg, base64, jq, curl) are appropriate for this task. However, registry metadata shown earlier (no required env vars / no primary credential) contradicts the SKILL.md metadata and the script, which do require OPENROUTER_API_KEY — this mismatch is unexpected.

ℹ Instruction Scope

SKILL.md and the script stay within the stated purpose (audio conversion, base64, POST to openrouter.ai). There are no hidden external endpoints or attempts to read unrelated system files. But SKILL.md documents an alternative API-key storage path (~/.clawdbot/clawdbot.json) that is not implemented in the script (the script only checks OPENROUTER_API_KEY env var). This is an inconsistency between documentation and implementation that could confuse users and lead to misconfiguration.

✓ Install Mechanism

No install spec: the skill is instruction-only with a bundled shell script. This is low risk compared to remote downloads or package installs. The file is a plain, readable shell script and does not extract or run external installers.

ℹ Credentials

The only secret actually required by the script is OPENROUTER_API_KEY, which is proportionate for calling OpenRouter. The SKILL.md metadata correctly lists required binaries and OPENROUTER_API_KEY, but the registry summary in the provided data claimed 'no required env vars' — that inconsistency should be resolved. SKILL.md's suggestion to store keys in ~/.clawdbot/clawdbot.json is not reflected in code, so the declared config path is misleading.

✓ Persistence & Privilege

The skill does not request always:true and does not alter other skills or global agent settings. It uses temporary files and cleans them up via trap; no persistent credentials are stored by the script. Autonomous invocation is allowed by default but is not combined with other broad privileges here.

What to consider before installing

This skill appears to do what it says (convert audio and call OpenRouter), but there are documentation/metadata inconsistencies you should resolve before use. Specifically:

(1) The script requires OPENROUTER_API_KEY. Ensure the registry metadata advertises this and you provide a key you trust.
(2) SKILL.md mentions storing the key in ~/.clawdbot/clawdbot.json, but the script does not read that file. Do not assume the key will be read from disk.
(3) The script sends X-Title and HTTP-Referer headers (defaulting to 'Clawdbot' and https://clawdbot.com), which will appear in your OpenRouter dashboard. Change or remove these defaults if you don't want identifying metadata sent.
(4) Inspect and, if needed, run the script in an isolated environment first to confirm it behaves as expected and to test large-file handling.
(5) Ask the skill author/maintainer to fix the metadata vs. code discrepancies (registry declarations, SKILL.md vs. actual behavior) before trusting it with real credentials.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0

latest: vk970ma0vk3kg4h43aargq0dzy17ynx6v

License

MIT-0

Terms: https://spdx.org/licenses/MIT-0.html

Runtime requirements

🎙️ Clawdis

Bins: curl, ffmpeg, base64, jq

Env: OPENROUTER_API_KEY

Primary env: OPENROUTER_API_KEY

SKILL.md

OpenRouter Audio Transcription

Transcribe audio files using OpenRouter's chat completions API with input_audio content type. Works with any audio-capable model.

Quick start

{baseDir}/scripts/transcribe.sh /path/to/audio.m4a

Output goes to stdout.

Useful flags

# Custom model (default: google/gemini-2.5-flash)
{baseDir}/scripts/transcribe.sh audio.ogg --model openai/gpt-4o-audio-preview

# Custom instructions
{baseDir}/scripts/transcribe.sh audio.m4a --prompt "Transcribe with speaker labels"

# Save to file
{baseDir}/scripts/transcribe.sh audio.m4a --out /tmp/transcript.txt

# Custom caller identifier (for OpenRouter dashboard)
{baseDir}/scripts/transcribe.sh audio.m4a --title "MyApp"

How it works

Converts audio to WAV (mono, 16kHz) using ffmpeg
Base64 encodes the audio
Sends to OpenRouter chat completions with input_audio content
Extracts transcript from response
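
The four steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the bundled transcribe.sh: the function name transcribe_sketch, the default prompt text, and the temp-file names are assumptions made for the example. The request shape follows OpenRouter's chat completions format with an input_audio content part.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the four steps; the real transcribe.sh may differ.
# The function body runs in a subshell so the EXIT trap only cleans up
# this call's temp directory.
transcribe_sketch() (
  set -eu
  input=$1
  model=${2:-google/gemini-2.5-flash}
  prompt=${3:-Transcribe this audio.}   # assumed default prompt

  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "OPENROUTER_API_KEY is not set" >&2
    exit 1
  fi

  tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/transcribe.XXXXXX")
  trap 'rm -rf "$tmpdir"' EXIT

  # Step 1: convert to mono, 16 kHz WAV.
  ffmpeg -y -loglevel error -i "$input" -ac 1 -ar 16000 "$tmpdir/audio.wav"

  # Step 2: base64-encode; strip newlines so the payload is one string.
  base64 < "$tmpdir/audio.wav" | tr -d '\n' > "$tmpdir/audio.b64"

  # Step 3: build the request from files and send it.
  jq -n \
    --arg model "$model" \
    --arg prompt "$prompt" \
    --rawfile audio "$tmpdir/audio.b64" \
    '{
      model: $model,
      messages: [{
        role: "user",
        content: [
          {type: "text", text: $prompt},
          {type: "input_audio", input_audio: {data: $audio, format: "wav"}}
        ]
      }]
    }' > "$tmpdir/request.json"

  curl -sS https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @"$tmpdir/request.json" > "$tmpdir/response.json"

  # Step 4: print the transcript text, or nothing if the field is missing.
  jq -r '.choices[0].message.content // empty' "$tmpdir/response.json"
)
```

Under these assumptions, transcribe_sketch audio.m4a would print the transcript to stdout, and an optional second argument would select a different model.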

API key

Set the OPENROUTER_API_KEY env var, or configure the key in ~/.clawdbot/clawdbot.json:

{
  skills: {
    "openrouter-transcribe": {
      apiKey: "YOUR_OPENROUTER_KEY"
    }
  }
}

Headers

The script sends identification headers to OpenRouter:

X-Title: Caller name (default: "Peanut/Clawdbot")
HTTP-Referer: Reference URL (default: "https://clawdbot.com")

These show up in your OpenRouter dashboard for tracking.
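
As a rough illustration of how these two headers could be assembled, the sketch below prints them with the documented defaults. The helper name openrouter_headers is hypothetical and is not taken from the script.

```shell
# Hypothetical helper: prints the two identification headers, one per line.
# $1 = caller name, $2 = reference URL; defaults mirror those documented above.
openrouter_headers() {
  title=${1:-Peanut/Clawdbot}
  referer=${2:-https://clawdbot.com}
  printf 'X-Title: %s\nHTTP-Referer: %s\n' "$title" "$referer"
}

# Example use (curl 7.55.0 and later can read headers from a file with -H @file):
#   openrouter_headers "MyApp" > headers.txt
#   curl ... -H @headers.txt
```

Passing your own title and URL, or omitting the headers entirely, controls what appears in the dashboard.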

Troubleshooting

ffmpeg format errors: The script uses a temp directory (not mktemp -t file.wav) because macOS's mktemp adds random suffixes after the extension, breaking format detection.
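
A minimal sketch of the temp-directory approach, assuming the variable names workdir and wav_path (both hypothetical): the random suffix goes on the directory, so the file inside keeps a clean .wav extension.

```shell
# Create a private temp directory; the XXXXXX template works with both
# GNU and BSD/macOS mktemp.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/transcribe.XXXXXX")

# Fixed file name inside the directory, so ".wav" stays the final extension
# and ffmpeg can infer the output format from it.
wav_path="$workdir/audio.wav"

# Remove the whole directory when the shell exits.
trap 'rm -rf "$workdir"' EXIT
```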

Argument list too long: Large audio files produce huge base64 strings that exceed shell argument limits. The script writes to temp files (--rawfile for jq, @file for curl) instead of passing data as arguments.

Empty response: If you get "Empty response from API", the script will dump the raw response for debugging. Common causes:

Invalid API key
Model doesn't support audio input
Audio file too large or corrupted
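
The empty-response check could look roughly like the sketch below. The function name extract_transcript is hypothetical; the error message text and the raw-response dump follow the behavior described above.

```shell
# Hypothetical helper: $1 is a file holding the raw API response.
# Prints the transcript, or dumps the raw response to stderr and fails.
extract_transcript() {
  response_file=$1
  # "// empty" yields no output when the field is missing or null;
  # a non-JSON response makes jq fail, which also leaves text empty.
  text=$(jq -r '.choices[0].message.content // empty' "$response_file" 2>/dev/null)
  if [ -z "$text" ]; then
    echo "Empty response from API" >&2
    cat "$response_file" >&2
    return 1
  fi
  printf '%s\n' "$text"
}
```

The dumped response usually carries an error object from the API, which helps tell an invalid key apart from an unsupported model or an oversized file.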

Files

2 total
