K8s Self-Hosted Whisper API
v0.1.0
Transcribe audio via the self-hosted Whisper ASR instance running on Kubernetes. Use this skill whenever the user wants to transcribe audio files, convert sp...
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name, description, SKILL.md, and the included transcribe.sh all align: they send an audio file to the cluster-local Whisper ASR /asr endpoint and save the transcript. Required binary (curl) matches the script's primary network calls.
Instruction Scope
The instructions and script only read the specified audio file and write the transcript output. They POST the file to the declared cluster-local endpoint and do not reference unrelated system files or external endpoints. Two notes: (1) the SKILL.md and script state 'No authentication required' — this means sensitive audio will be sent to that service without credentials, so users should confirm they trust that service. (2) the script invokes python3 to URL-encode an initial prompt (fallback is provided), but python3 is not listed in required binaries.
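A minimal sketch of what that encoding step likely looks like (the helper name and the exact fallback are assumptions, not the script's actual code): prefer python3's urllib for correct URL-encoding, and degrade gracefully when python3 is absent.

```shell
# Hypothetical urlencode helper: use python3 when available,
# otherwise fall back to a crude sed-based escape.
urlencode() {
  if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$1"
  else
    # fallback only escapes spaces, so results may differ from python3's
    printf '%s\n' "$1" | sed 's/ /%20/g'
  fi
}

urlencode "meeting notes, Q3 review"
```

Note the fallback is lossy (it ignores commas and other reserved characters), which is why the scan flags that behavior "may differ" without python3.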
Install Mechanism
This is an instruction-only skill with no install spec and only a small shell script; nothing is downloaded or written to disk by an installer—low install risk.
Credentials
No environment variables, credentials, or config paths are requested. The script uses a hard-coded cluster-local base URL; no extra secrets are required, which is proportionate for the stated purpose.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide configs. It is user-invocable and can be invoked autonomously (platform default), which is expected for a skill of this type.
Assessment
This skill is coherent for transcribing audio to a self-hosted Whisper service, but before installing or using it you should: (1) confirm the cluster-local endpoint (http://whisper-asr.whisper-asr.svc.cluster.local:9000) is the intended and trusted service — audio data will be uploaded there without authentication; (2) ensure the agent environment can reach that DNS name (it's a Kubernetes in-cluster name and may be unreachable from outside the cluster); (3) verify python3 is available if you plan to use the --prompt flag (the script falls back but may behave differently); and (4) treat transcripts/audio as potentially sensitive and ensure they are handled per your privacy policy. If any of these are unacceptable, do not enable the skill.
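Check (2) can be done before enabling the skill. A sketch of such a pre-flight probe (hypothetical helper, not part of the skill): the service name only resolves inside the cluster, so bound the wait and report failure clearly.

```shell
# Probe the Whisper ASR docs endpoint with a 5-second timeout.
# Returns nonzero if DNS fails or the service is unreachable.
check_asr() {
  curl -fsS -m 5 "$1/docs" >/dev/null 2>&1
}

if check_asr "http://whisper-asr.whisper-asr.svc.cluster.local:9000"; then
  echo "Whisper ASR reachable"
else
  echo "Whisper ASR unreachable (likely running outside the cluster)" >&2
fi
```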
Runtime requirements
🎙️ Clawdis
Bins: curl
latest
Self-Hosted Whisper API (curl)
Transcribe an audio file via the Whisper ASR webservice at http://whisper-asr.whisper-asr.svc.cluster.local:9000.
Uses the onerahmet/openai-whisper-asr-webservice API (/asr endpoint).
Quick start
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
Defaults:
- Endpoint: http://whisper-asr.whisper-asr.svc.cluster.local:9000/asr
- Task: transcribe
- Output: txt
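With those defaults, the request the script sends reduces to roughly the following raw curl call (the `audio_file` form field and `task`/`output` query parameters are assumed from the openai-whisper-asr-webservice /asr endpoint, not taken from the script itself):

```shell
# Approximate equivalent of the script's default request.
BASE="http://whisper-asr.whisper-asr.svc.cluster.local:9000"
AUDIO="/path/to/audio.m4a"
URL="$BASE/asr?task=transcribe&output=txt"

# Only fire the request if the input file actually exists.
if [ -f "$AUDIO" ]; then
  curl -fsS -X POST "$URL" \
    -F "audio_file=@$AUDIO" \
    -o transcript.txt
fi
```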
Useful flags
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --language en --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language de
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --output srt --out /tmp/subtitles.srt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --output vtt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --translate
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --vad-filter --json
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --word-timestamps --json
Notes
- Supported --output formats: txt, json, vtt, srt, tsv
- --translate produces an English transcript regardless of source language
- --vad-filter enables voice activity detection to skip silent sections
- --word-timestamps adds word-level timing (use with --json)
- The model is configured on the server side (ASR_MODEL env var), not per request
- Swagger docs available at http://whisper-asr.whisper-asr.svc.cluster.local:9000/docs
- No authentication required
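When using --json, the saved file can be post-processed without extra tools. A small sketch, assuming the JSON body carries Whisper's usual top-level "text" field (the helper below is illustrative, not part of the skill):

```shell
# Pull the plain transcript text out of a saved --json result.
extract_text() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["text"])' "$1"
}

if [ -f /tmp/transcript.json ]; then
  extract_text /tmp/transcript.json
fi
```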