Local Transcription

v1.0.2

Local speech-to-text transcription with Qwen ASR — transcription routed across your Apple Silicon fleet. Transcribe meetings, voice notes, and podcasts…

by Twin Geeks (@twinsgeeks)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for twinsgeeks/local-transcription.

Prompt preview: Install & Setup
Install the skill "Local Transcription" (twinsgeeks/local-transcription) from ClawHub.
Skill page: https://clawhub.ai/twinsgeeks/local-transcription
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install local-transcription

ClawHub CLI


npx clawhub@latest install local-transcription
Security Scan

VirusTotal: Pending
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description claim local ASR across an Apple Silicon fleet, and the SKILL.md only requires local HTTP endpoints, curl/wget, and optional python/pip; the listed metadata config paths (~/.fleet-manager/...) are consistent with a fleet manager, and the darwin OS restriction matches Apple Silicon.
Instruction Scope
Instructions focus on installing and running a local router/node and calling localhost endpoints for transcription and other workloads. They do not ask the agent to read unrelated files or external credentials. Note: running the router/node will open network services (port 11435) and may expose endpoints to the LAN depending on configuration — verify binding to localhost if you want strictly local-only access.
Install Mechanism
This is instruction-only (no install spec). The SKILL.md tells users to pip install 'ollama-herd' and to run a model installer (uv tool), which will download model weights and software from external sources — expected for a local ASR setup, but it means large downloads and external network access during setup. No opaque download URLs are embedded in the skill itself.
Credentials
No environment variables, credentials, or unrelated config paths are requested beyond fleet manager files. The lack of secret requests is appropriate for the claimed local-only transcription purpose.
Persistence & Privilege
always:false (not force-enabled). The skill is user-invocable and allows autonomous invocation (platform default) which is expected. The skill does not request to modify other skills or system-wide agent settings.
Assessment
This skill appears coherent, but before installing:
  1. Confirm you trust the PyPI package and the 'uv' model installer sources (they will download model weights).
  2. Check the herd configuration to ensure the router binds to localhost if you want traffic restricted to your machine (otherwise it may be reachable on the LAN).
  3. Inspect the ~/.fleet-manager files (latency.db, logs) after startup to understand what telemetry and logs are collected.
  4. Ensure adequate disk space and bandwidth for model downloads.
  5. If you plan to run nodes across multiple devices, only join devices you control or fully trust.
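To verify point 2, a quick check is whether the port answers on a non-loopback interface. A minimal Python sketch (the port number comes from this skill's docs; the LAN-address lookup is best-effort and may resolve to 127.0.0.1 on some machines):

import socket

def reachable(host, port=11435, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

lan_ip = socket.gethostbyname(socket.gethostname())  # best-effort LAN address
print("localhost:", reachable("127.0.0.1"))
print(f"{lan_ip}:", reachable(lan_ip))  # True here means the router answers beyond loopback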

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Microphone: Clawdis
OS: macOS
Bin (any): curl, wget
Latest: vk97eerep3vw6qn6wbd699jjz7s844tr6
157 downloads · 1 star · 5 versions · updated 3w ago
v1.0.2 · MIT-0 · macOS

Local Speech-to-Text Transcription

You're helping someone use speech-to-text transcription on audio files — meetings, voice memos, podcast episodes, phone recordings — without sending anything to the cloud. Every audio file stays on their devices. The fleet picks the best node to handle each transcription job automatically.

Why local speech-to-text transcription matters

Cloud speech-to-text transcription APIs charge per minute and send your audio to third-party servers. Meeting recordings contain sensitive business discussions. Voice notes contain personal thoughts. Podcast interviews contain unreleased content. None of that should leave your network. Local transcription keeps it private.

This skill routes transcription requests across your fleet of devices. If one machine is busy with a 3-hour transcription, the next request goes to a different device. Queue management, health monitoring, and dashboard visibility — the same infrastructure you'd get from a cloud speech-to-text API, running entirely on your hardware.

Get started with speech-to-text transcription

pip install ollama-herd
herd                                    # start the transcription router (port 11435)
herd-node                               # start on each transcription device
uv tool install "mlx-qwen3-asr[serve]" --python 3.14  # install speech-to-text model

Enable speech-to-text transcription:

curl -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" \
  -d '{"transcription": true}'

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Transcribe audio with speech-to-text

curl — basic transcription

# Speech-to-text transcription of a meeting recording
curl -s http://localhost:11435/api/transcribe \
  -F "audio=@meeting-recording.wav" | python3 -m json.tool

Python — speech-to-text transcription

import httpx

def speech_to_text_transcription(audio_path):
    """Run speech-to-text transcription on an audio file."""
    with open(audio_path, "rb") as f:
        transcription_resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            files={"audio": (audio_path, f)},
            timeout=300.0,
        )
    transcription_resp.raise_for_status()
    transcription_result = transcription_resp.json()
    return transcription_result["text"]

# Run speech-to-text transcription
transcription_text = speech_to_text_transcription("meeting.wav")
print(transcription_text)
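To transcribe a whole folder of recordings, the helper above can be reused as-is. A minimal sketch (the "recordings" directory name is illustrative):

from pathlib import Path

# Transcribe every WAV file in a folder, saving a .txt next to each recording
for audio_file in sorted(Path("recordings").glob("*.wav")):
    text = speech_to_text_transcription(str(audio_file))
    audio_file.with_suffix(".txt").write_text(text)
    print(f"transcribed {audio_file.name}")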

Speech-to-text transcription with timestamps

def transcription_with_timestamps(audio_path):
    """Speech-to-text transcription returning timestamped chunks."""
    with open(audio_path, "rb") as f:
        transcription_resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            files={"audio": (audio_path, f)},
            timeout=300.0,
        )
    transcription_resp.raise_for_status()
    transcription_result = transcription_resp.json()
    for transcription_chunk in transcription_result.get("chunks", []):
        print(f"[{transcription_chunk['start']:.1f}s - {transcription_chunk['end']:.1f}s] {transcription_chunk['text']}")
    return transcription_result
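The timestamped chunks map naturally onto subtitle formats. A sketch that converts them into an SRT file, assuming only the start/end/text fields shown in the response format below:

def chunks_to_srt(transcription_result, srt_path):
    """Write timestamped transcription chunks as an SRT subtitle file."""
    def fmt(seconds):
        # SRT timestamps look like HH:MM:SS,mmm
        ms = int(seconds * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    with open(srt_path, "w") as f:
        for i, chunk in enumerate(transcription_result.get("chunks", []), start=1):
            f.write(f"{i}\n{fmt(chunk['start'])} --> {fmt(chunk['end'])}\n{chunk['text'].strip()}\n\n")

chunks_to_srt(transcription_with_timestamps("meeting.wav"), "meeting.srt")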

Transcription response format

{
  "text": "Hello, this is a test of the speech-to-text transcription system.",
  "language": "English",
  "chunks": [
    {
      "text": "Hello, this is a test of the speech-to-text transcription system.",
      "start": 0.0,
      "end": 3.2,
      "chunk_index": 0,
      "language": "English"
    }
  ]
}

Supported audio formats for transcription

WAV, MP3, M4A, FLAC, MP4, OGG — any format FFmpeg supports. WAV files get a ~25% transcription speed boost via a native fast path.
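Given the WAV fast path, it can be worth converting long recordings before upload. A sketch using FFmpeg via subprocess (assumes ffmpeg is on PATH; the 16 kHz mono settings are a common ASR choice, not something this skill mandates):

import subprocess
from pathlib import Path

def to_wav(audio_path):
    """Convert any FFmpeg-supported file to 16 kHz mono WAV for the fast path."""
    wav_path = Path(audio_path).with_suffix(".wav")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(audio_path), "-ar", "16000", "-ac", "1", str(wav_path)],
        check=True,
        capture_output=True,
    )
    return wav_path

wav = to_wav("interview.m4a")  # then pass wav to speech_to_text_transcription()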

Speech-to-text transcription response headers

Header                  Description
X-Fleet-Node            Which device performed the speech-to-text transcription
X-Fleet-Model           Transcription model used (qwen3-asr)
X-Transcription-Time    Transcription processing time in milliseconds
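To see which node handled a request, read these headers off the raw response rather than the parsed JSON; a sketch reusing the transcribe endpoint above:

import httpx

with open("meeting.wav", "rb") as f:
    resp = httpx.post(
        "http://localhost:11435/api/transcribe",
        files={"audio": ("meeting.wav", f)},
        timeout=300.0,
    )
resp.raise_for_status()
print("node: ", resp.headers.get("X-Fleet-Node"))
print("model:", resp.headers.get("X-Fleet-Model"))
print("time: ", resp.headers.get("X-Transcription-Time"), "ms")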

Speech-to-text transcription model

Qwen3-ASR — state-of-the-art open-source speech-to-text transcription in 2026. ~5% word error rate, runs natively on Apple Silicon via MLX. The 0.6B transcription model uses ~1.2GB memory and transcribes at 0.08x real-time factor (a 10-minute recording completes transcription in ~48 seconds).
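The real-time factor makes processing time easy to estimate: multiply audio duration by the RTF. A tiny helper (the 0.08 default comes from the figure above; actual throughput varies by node):

def estimated_transcription_seconds(audio_seconds, rtf=0.08):
    """Estimate processing time from audio duration and real-time factor."""
    return audio_seconds * rtf

print(estimated_transcription_seconds(10 * 60))  # 10-minute recording -> 48.0 seconds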

Also available on this fleet

The same router handles three other AI workloads alongside speech-to-text transcription. All endpoints are at http://localhost:11435:

LLM inference

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-oss:120b","messages":[{"role":"user","content":"Hello"}]}'

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

Embeddings

curl http://localhost:11435/api/embeddings \
  -d '{"model":"nomic-embed-text","prompt":"search query"}'

Monitoring speech-to-text transcription

# Transcription stats (last 24h)
curl -s http://localhost:11435/dashboard/api/transcription-stats | python3 -m json.tool

# Fleet health (includes speech-to-text transcription activity)
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard — speech-to-text transcription queues appear with an [STT] badge alongside the LLM and image queues.
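For a script-level watch on the fleet, polling the health endpoint works too. A sketch that just prints the raw payload every minute (no response fields are assumed beyond it being JSON):

import time
import httpx

# Poll fleet health every 60 seconds and print whatever the dashboard returns
while True:
    resp = httpx.get("http://localhost:11435/dashboard/api/health", timeout=10.0)
    resp.raise_for_status()
    print(resp.json())
    time.sleep(60)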

Full documentation

Agent Setup Guide — complete reference for all 4 model types including speech-to-text transcription with Python, JavaScript, and curl examples.

Guardrails

  • Never delete or modify audio files provided by the user for transcription.
  • Never send audio data to external services — all speech-to-text transcription is local.
  • Never delete or modify files in ~/.fleet-manager/.
  • If transcription fails, suggest checking node logs: tail ~/.fleet-manager/logs/herd.jsonl.
  • If no speech-to-text models are available, suggest installing one: uv tool install "mlx-qwen3-asr[serve]" --python 3.14.
