Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc.).

MIT-0 · Free to use, modify, and redistribute. No attribution required.
3 · 2.9k · 6 current installs · 6 all-time installs
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description match the included script: the script converts audio, encodes it, and calls OpenRouter's chat/completions API with input_audio. The required binaries (ffmpeg, base64, jq, curl) are appropriate for this task. However, registry metadata shown earlier (no required env vars / no primary credential) contradicts the SKILL.md metadata and the script, which do require OPENROUTER_API_KEY — this mismatch is unexpected.
Instruction Scope
SKILL.md and the script stay within the stated purpose (audio conversion, base64, POST to openrouter.ai). There are no hidden external endpoints or attempts to read unrelated system files. But SKILL.md documents an alternative API-key storage path (~/.clawdbot/clawdbot.json) that is not implemented in the script (the script only checks OPENROUTER_API_KEY env var). This is an inconsistency between documentation and implementation that could confuse users and lead to misconfiguration.
Install Mechanism
No install spec — instruction-only with a bundled shell script. This is low risk compared to remote downloads or package installs. The file is plain shell script, readable, and does not extract or run external installers.
Credentials
The only secret actually required by the script is OPENROUTER_API_KEY, which is proportionate for calling OpenRouter. The SKILL.md metadata correctly lists required binaries and OPENROUTER_API_KEY, but the registry summary in the provided data claimed 'no required env vars' — that inconsistency should be resolved. SKILL.md's suggestion to store keys in ~/.clawdbot/clawdbot.json is not reflected in code, so the declared config path is misleading.
Persistence & Privilege
The skill does not request always:true and does not alter other skills or global agent settings. It uses temporary files and cleans them up via trap; no persistent credentials are stored by the script. Autonomous invocation is allowed by default but is not combined with other broad privileges here.
What to consider before installing
This skill appears to do what it says (convert audio and call OpenRouter), but there are documentation/metadata inconsistencies you should resolve before use:

  1. The script requires OPENROUTER_API_KEY. Ensure the registry metadata advertises this and that you provide a key you trust.
  2. SKILL.md mentions storing the key in ~/.clawdbot/clawdbot.json, but the script does not read that file. Do not assume the key will be read from disk.
  3. The script sends X-Title and HTTP-Referer headers (defaulting to 'Clawdbot' and https://clawdbot.com), which will appear in your OpenRouter dashboard. Change or remove these defaults if you don't want identifying metadata sent.
  4. Inspect the script and, if needed, run it in an isolated environment first to confirm it behaves as expected and to test large-file handling.
  5. Ask the skill author/maintainer to fix the metadata vs. code discrepancies (registry declarations, SKILL.md vs. actual behavior) before trusting it with real credentials.

Current version: v1.0.0
latest: vk970ma0vk3kg4h43aargq0dzy17ynx6v

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🎙️ Clawdis
Bins: curl, ffmpeg, base64, jq
Env: OPENROUTER_API_KEY
Primary env: OPENROUTER_API_KEY

SKILL.md

OpenRouter Audio Transcription

Transcribe audio files using OpenRouter's chat completions API with input_audio content type. Works with any audio-capable model.

Quick start

{baseDir}/scripts/transcribe.sh /path/to/audio.m4a

Output goes to stdout.

Useful flags

# Custom model (default: google/gemini-2.5-flash)
{baseDir}/scripts/transcribe.sh audio.ogg --model openai/gpt-4o-audio-preview

# Custom instructions
{baseDir}/scripts/transcribe.sh audio.m4a --prompt "Transcribe with speaker labels"

# Save to file
{baseDir}/scripts/transcribe.sh audio.m4a --out /tmp/transcript.txt

# Custom caller identifier (for OpenRouter dashboard)
{baseDir}/scripts/transcribe.sh audio.m4a --title "MyApp"
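
For readers adapting the script, the flags above could be parsed roughly as follows. This is a minimal sketch, not the bundled script: the function name parse_args, the variable names, and the default prompt text are assumptions for illustration; only the flag names and the default model come from this page.

```shell
# Hypothetical flag parser for the options listed above (not the bundled script).
parse_args() {
  AUDIO="" OUT="" TITLE=""
  MODEL="google/gemini-2.5-flash"       # default model documented above
  PROMPT="Transcribe this audio."       # assumed default, for illustration only
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --model|--prompt|--out|--title)
        # Every flag takes exactly one value.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
        case "$1" in
          --model)  MODEL=$2 ;;
          --prompt) PROMPT=$2 ;;
          --out)    OUT=$2 ;;
          --title)  TITLE=$2 ;;
        esac
        shift 2 ;;
      -*) echo "unknown flag: $1" >&2; return 2 ;;
      *)  AUDIO=$1; shift ;;            # first non-flag argument is the audio path
    esac
  done
  [ -n "$AUDIO" ] || { echo "usage: transcribe.sh <audio> [flags]" >&2; return 2; }
}
```

Calling parse_args "$@" at the top of a script leaves the chosen values in MODEL, PROMPT, OUT, and TITLE, with unknown flags rejected early.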

How it works

  1. Converts audio to WAV (mono, 16kHz) using ffmpeg
  2. Base64 encodes the audio
  3. Sends to OpenRouter chat completions with input_audio content
  4. Extracts transcript from response

API key

Set the OPENROUTER_API_KEY environment variable, or configure the key in ~/.clawdbot/clawdbot.json:

{
  skills: {
    "openrouter-transcribe": {
      apiKey: "YOUR_OPENROUTER_KEY"
    }
  }
}
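
The security scan above reports that the bundled script reads the key only from the OPENROUTER_API_KEY environment variable. A fail-fast check along those lines might look like the sketch below; the function name require_api_key and the error wording are assumptions, not taken from the script.

```shell
# Hypothetical guard: stop early when the key is missing from the environment.
require_api_key() {
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "OPENROUTER_API_KEY is not set; export it before running transcribe.sh" >&2
    return 1
  fi
}
```

Running require_api_key || exit 1 before any conversion work avoids spending time on ffmpeg only to have the request rejected later.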

Headers

The script sends identification headers to OpenRouter:

  • X-Title: Caller name (default: "Peanut/Clawdbot")
  • HTTP-Referer: Reference URL (default: "https://clawdbot.com")

These show up in your OpenRouter dashboard for tracking.
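
If you would rather control these values yourself, a wrapper along the following lines is one option. It is only a sketch: CALLER_TITLE and CALLER_REFERER are hypothetical variable names, and the fallback values are the defaults listed above.

```shell
# CALLER_TITLE and CALLER_REFERER are hypothetical names used only in this sketch.
x_title()      { printf 'X-Title: %s' "${CALLER_TITLE:-Peanut/Clawdbot}"; }
http_referer() { printf 'HTTP-Referer: %s' "${CALLER_REFERER:-https://clawdbot.com}"; }

# Wrap curl so every request carries both identification headers.
curl_identified() { curl -H "$(x_title)" -H "$(http_referer)" "$@"; }
```

Setting CALLER_TITLE=MyApp before the call has the same effect as the --title flag shown earlier, assuming the script maps that flag onto X-Title.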

Troubleshooting

ffmpeg format errors: The script uses a temp directory (not mktemp -t file.wav) because macOS's mktemp adds random suffixes after the extension, breaking format detection.
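
The temp-directory approach can be sketched as below. The names make_workdir, WORKDIR, and WAV are illustrative assumptions; the point is that the random part of the name sits in the directory, so the file itself keeps a clean .wav extension.

```shell
# Create a private scratch directory; the explicit template works with GNU and BSD/macOS mktemp.
make_workdir() { mktemp -d "${TMPDIR:-/tmp}/transcribe.XXXXXX"; }

WORKDIR=$(make_workdir) || exit 1
trap 'rm -rf "$WORKDIR"' EXIT      # remove the directory on any exit path
WAV="$WORKDIR/audio.wav"           # fixed file name, so ffmpeg can infer WAV from the extension
```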

Argument list too long: Large audio files produce huge base64 strings that exceed shell argument limits. The script writes to temp files (--rawfile for jq, @file for curl) instead of passing data as arguments.
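
A rough size estimate shows why this matters. The sketch below assumes 16-bit PCM samples (ffmpeg's usual WAV default) and a 44-byte header; real files may carry a slightly larger header, so treat the numbers as approximate.

```shell
# Approximate bytes in a mono 16 kHz, 16-bit WAV of the given length in seconds.
wav_bytes() { echo $(( $1 * 16000 * 2 + 44 )); }

# Length of the base64 text for N raw bytes: 4 output characters per 3 input bytes, padded.
b64_len() { echo $(( ($1 + 2) / 3 * 4 )); }

# One minute of audio:
#   wav_bytes 60                  -> 1920044 bytes of WAV
#   b64_len "$(wav_bytes 60)"     -> 2560060 characters of base64
# Compare that with the argument-size limit on your system:
#   getconf ARG_MAX
```

Because the encoded size grows linearly with duration, even short recordings can approach the limit, which is why passing the data through files is the safer route.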

Empty response: If you get "Empty response from API", the script will dump the raw response for debugging. Common causes:

  • Invalid API key
  • Model doesn't support audio input
  • Audio file too large or corrupted
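
The empty-response check might be implemented along these lines. This is a sketch, not the script's code: the function name extract_transcript is hypothetical, and the response path .choices[0].message.content is assumed from the chat completions format.

```shell
# Hypothetical helper. Arg: path to a file holding the raw API response.
extract_transcript() {
  local text
  # "// empty" makes jq print nothing when the field is missing or null.
  text=$(jq -r '.choices[0].message.content // empty' "$1" 2>/dev/null)
  if [ -z "$text" ]; then
    echo "Empty response from API. Raw response:" >&2
    cat "$1" >&2                   # dump the body so the cause is visible
    return 1
  fi
  printf '%s\n' "$text"
}
```

Reading the dumped body usually narrows the problem to one of the causes listed above, since an error returned by the API appears there in place of a transcript.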

Files

2 total