Speech To Text

Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 0 · 1.9k · 24 current installs · 24 all-time installs

by Ömer Karışman (@okaris)


Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (medium confidence)

✓ Purpose & Capability

Name/description align with the instructions: the SKILL.md consistently instructs use of the inference.sh CLI and specific Whisper apps for transcription, translation, timestamps, etc. Nothing requested in the doc is unrelated to transcription.

Instruction Scope

The instructions tell the user or agent to run a network installer via `curl -fsSL https://cli.inference.sh | sh`, then `infsh login` and `infsh app run ... --input '{"audio_url": "https://..."}'`. This means audio is sent to inference.sh apps for processing. The SKILL.md does not declare what credentials are required, how login works, or the privacy and retention policies for uploaded audio. It also encourages installing and running code fetched from the network, which expands the runtime scope beyond local-only transcription.

Install Mechanism

Although the registry shows no formal install spec, the instructions explicitly recommend piping a remote script to sh (download-and-run) and state that it downloads binaries from dist.inference.sh. Downloading and running a remote installer is higher-risk than an instruction-only skill. The URLs are consistent (inference.sh / dist.inference.sh) and a checksum file is referenced, but the curl | sh pattern and the archive extraction are not enforced by the registry metadata and may write binaries to disk.

ℹ Credentials

The skill declares no required environment variables or primary credential, which matches the registry. However, the SKILL.md tells the user to run `infsh login` (implying credentials or an account are needed) and to send audio URLs to remote apps — this implicitly requires account credentials or interactive login and causes data to leave the host. The lack of declared credential requirements and absence of privacy/retention guidance is a gap.

✓ Persistence & Privilege

Registry flags do not request persistent or always-on privileges, and the skill is instruction-only. The only persistence risk comes from the installer it recommends (a local binary); the skill itself does not request always-on privileges or modify other skills' configurations.

What to consider before installing

This skill appears to do what it says (use inference.sh to transcribe audio), but exercise caution before following the installer steps. Avoid piping unknown remote scripts directly into sh; if you want to use this skill, manually inspect the installer at https://cli.inference.sh and verify checksums from https://dist.inference.sh/checksums.txt before running it. Understand that `infsh app run ... --input '{"audio_url": ...}'` sends audio to inference.sh's servers; check their privacy, retention, and security policies before uploading sensitive audio. Expect to create or provide an inference.sh account (login or API key) even though no environment variables are declared. If you need offline or local-only transcription, or stronger privacy guarantees, consider an alternative that runs models locally without uploading data.
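One way to follow the "inspect before running" advice is to save the installer to a file, read it, and run it only on explicit confirmation. This is a minimal sketch; the function name `review_then_install` and the reliance on `$PAGER` are illustrative assumptions, not part of inference.sh's tooling.

```shell
# Hypothetical helper (not part of inference.sh): download the installer,
# page through it, and run it only after an explicit "y".
review_then_install() {
  local dest="${1:-./infsh-install.sh}" answer

  # Save the script to a file instead of piping it straight into sh.
  if ! curl -fsSL https://cli.inference.sh > "$dest"; then
    echo "download failed" >&2
    return 2
  fi

  # Read the whole script before deciding (PAGER defaults to less).
  "${PAGER:-less}" "$dest"

  # Execute only on explicit confirmation; anything else aborts.
  printf 'Run %s now? [y/N] ' "$dest" >&2
  read -r answer
  case "$answer" in
    y|Y|yes) sh "$dest" ;;
    *) echo "aborted; installer kept at $dest for review" >&2; return 1 ;;
  esac
}
```

Declining leaves the script on disk, so it can be diffed against a later download or checked by someone else before it is ever executed.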


Current version: v0.1.5


latest: vk979c27dp3d69xd9kh7c1sae1981ck6e

License

Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Speech-to-Text

Transcribe audio to text via inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. It requires no elevated permissions and starts no background processes. Manual installation and verification are also available.
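For a manual install, a downloaded archive can be checked against the published checksums before it is extracted. This sketch assumes that `checksums.txt` uses the common `sha256sum` layout (`<hex digest>  <filename>` per line) and that GNU coreutils `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute); both are assumptions, and the function name is hypothetical.

```shell
# Hypothetical helper: verify FILE against its entry in CHECKSUMS.
# ASSUMPTION: CHECKSUMS lines look like "<sha256-hex>  <filename>".
verify_sha256() {
  local file="$1" checksums="$2" name expected actual
  name=$(basename "$file")

  # Look up the expected digest by exact file name ("*name" = binary mode).
  expected=$(awk -v n="$name" '$2 == n || $2 == "*" n { print $1; exit }' "$checksums")
  if [ -z "$expected" ]; then
    echo "no checksum listed for $name" >&2
    return 2
  fi

  # Hash the local file and compare digests.
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $name"
  else
    echo "MISMATCH: $name" >&2
    return 1
  fi
}
```

Usage: fetch `checksums.txt` from https://dist.inference.sh/checksums.txt next to the downloaded archive, then run `verify_sha256 <archive> checksums.txt` and extract only on `OK`.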

Available Models

Model	App ID	Best For
Fast Whisper Large V3	`infsh/fast-whisper-large-v3`	Fast transcription
Whisper V3 Large	`infsh/whisper-v3-large`	Highest accuracy

Examples

Basic Transcription

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'
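With several recordings, the basic command can be run in a loop. A minimal sketch, assuming a text file with one audio URL per line; the function name and the output layout (`<outdir>/<n>.json`) are hypothetical, and URLs containing double quotes are not escaped.

```shell
# Hypothetical helper: transcribe every URL in LIST (one per line),
# saving each result to OUTDIR/<n>.json.
transcribe_batch() {
  local list="$1" outdir="$2" n=0 url
  mkdir -p "$outdir"
  # "|| [ -n ... ]" also processes a last line without a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and lines starting with '#'.
    case "$url" in ''|'#'*) continue ;; esac
    n=$((n + 1))
    echo "[$n] $url" >&2
    # </dev/null keeps the CLI from consuming the rest of the URL list.
    infsh app run infsh/fast-whisper-large-v3 \
      --input "{\"audio_url\": \"$url\"}" < /dev/null > "$outdir/$n.json" \
      || echo "failed: $url" >&2
  done < "$list"
}
```

Example: `transcribe_batch urls.txt transcripts/`. A failed item is reported on stderr and the loop continues with the next URL.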

With Timestamps

infsh app sample infsh/fast-whisper-large-v3 --save input.json

# Edit input.json so it contains, for example:
# {
#   "audio_url": "https://podcast.mp3",
#   "timestamps": true
# }

infsh app run infsh/fast-whisper-large-v3 --input input.json
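Instead of editing the sample by hand, the input file can be generated. A sketch that assumes only the `audio_url` and `timestamps` fields shown above; the function name is hypothetical, and its small escaper handles only backslashes and double quotes, not every character JSON requires escaping.

```shell
# Hypothetical helper: write an input file for the Whisper apps.
# Usage: write_whisper_input FILE AUDIO_URL [true|false]
write_whisper_input() {
  local file="$1" url="$2" timestamps="${3:-false}"

  # Accept only JSON booleans for the timestamps flag.
  case "$timestamps" in
    true|false) ;;
    *) echo "timestamps must be true or false" >&2; return 1 ;;
  esac

  # Minimal JSON string escaping: backslashes first, then double quotes.
  url=$(printf '%s' "$url" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')

  cat > "$file" <<EOF
{
  "audio_url": "$url",
  "timestamps": $timestamps
}
EOF
}
```

Example: `write_whisper_input input.json "https://podcast.mp3" true`, then run the app with `--input input.json` as above.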

Translation (to English)

infsh app run infsh/whisper-v3-large --input '{
  "audio_url": "https://french-audio.mp3",
  "task": "translate"
}'
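The examples use `infsh/fast-whisper-large-v3` for transcription and `infsh/whisper-v3-large` with `"task": "translate"` for translation. A small wrapper can encode that choice. This is a sketch with a hypothetical function name; the mapping simply mirrors these examples, and whether the fast app also accepts `task` is not stated here.

```shell
# Hypothetical wrapper: run TASK (transcribe|translate) on AUDIO_URL,
# using the same app for each task as the examples above.
whisper_run() {
  local task="$1" url="$2"
  case "$task" in
    transcribe)
      infsh app run infsh/fast-whisper-large-v3 \
        --input "{\"audio_url\": \"$url\"}"
      ;;
    translate)
      # Whisper's translate task outputs English text.
      infsh app run infsh/whisper-v3-large \
        --input "{\"audio_url\": \"$url\", \"task\": \"translate\"}"
      ;;
    *)
      echo "usage: whisper_run transcribe|translate AUDIO_URL" >&2
      return 2
      ;;
  esac
}
```

Example: `whisper_run translate "https://french-audio.mp3" > english.json`.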

From Video

# Extract audio from video first
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

# Transcribe the extracted audio (replace <audio-url> with the audio URL from audio.json)
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
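The two steps can be chained so the `<audio-url>` placeholder is filled in automatically. This sketch rests on a loud assumption: the extractor's output field is not documented here, so it takes the first `https://` URL in `audio.json` that differs from the input video URL. Check the real output and adjust; the function name is hypothetical.

```shell
# Hypothetical helper: extract audio from VIDEO_URL, then transcribe it.
# ASSUMPTION: audio.json mentions the extracted audio as the first https://
# URL other than the input video URL; verify against real output first.
video_to_transcript() {
  local video_url="$1" workdir="${2:-.}" audio_url
  mkdir -p "$workdir"

  # Step 1: extract the audio track, keeping the raw JSON for inspection.
  infsh app run infsh/video-audio-extractor \
    --input "{\"video_url\": \"$video_url\"}" > "$workdir/audio.json" || return 1

  # Step 2: first https:// URL that is not the video itself.
  audio_url=$(grep -o 'https://[^"[:space:]]*' "$workdir/audio.json" \
    | grep -vxF "$video_url" | head -n 1)
  if [ -z "$audio_url" ]; then
    echo "no audio URL found in $workdir/audio.json" >&2
    return 1
  fi

  # Step 3: transcribe the extracted audio.
  infsh app run infsh/fast-whisper-large-v3 \
    --input "{\"audio_url\": \"$audio_url\"}" > "$workdir/transcript.json"
}
```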

Workflow: Video Subtitles

# 1. Transcribe video audio
infsh app run infsh/fast-whisper-large-v3 --input '{
  "audio_url": "https://video.mp4",
  "timestamps": true
}' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": "<transcript-from-step-1>"
}'

Supported Languages

Whisper supports 99+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.

Use Cases

Meetings: Transcribe recordings
Podcasts: Generate transcripts
Subtitles: Create captions for videos
Voice Notes: Convert to searchable text
Interviews: Transcription for research
Accessibility: Make audio content accessible

Output Format

Returns JSON with:

text: Full transcription
segments: Timestamped segments (if requested)
language: Detected language
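To read one field from a saved result, a JSON parser is safer than text matching. This sketch assumes the saved file is exactly the object described above (top-level `text`, `segments`, `language`); the CLI may wrap it differently. It calls Python's standard `json` module through `python3` because `jq` may not be installed, and the function name is hypothetical.

```shell
# Hypothetical helper: print one top-level field from a saved result.
# Strings print as-is; lists/objects (e.g. segments) print as JSON.
transcript_field() {
  local file="$1" key="$2"
  python3 - "$file" "$key" <<'PY'
import json
import sys

path, key = sys.argv[1], sys.argv[2]
with open(path, encoding="utf-8") as f:
    data = json.load(f)

if key not in data:
    sys.exit(f"field not found: {key}")

value = data[key]
print(value if isinstance(value, str) else json.dumps(value, ensure_ascii=False))
PY
}
```

Example: `transcript_field transcript.json text > transcript.txt` saves the plain transcription for searching or editing.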

Related Skills

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (reverse direction)
npx skills add inference-sh/skills@text-to-speech

# Video generation (add captions)
npx skills add inference-sh/skills@ai-video-generation

# AI avatars (lipsync with transcripts)
npx skills add inference-sh/skills@ai-avatar-video

Browse all audio apps: infsh app list --category audio

Documentation

Running Apps - How to run apps via CLI
Audio Transcription Example - Complete transcription guide
Apps Overview - Understanding the app ecosystem
