Install
openclaw skills install local-qwen3-asr-aipcLocal offline ASR on Windows — no cloud, no API cost, full privacy. Qwen3-ASR 0.6B + Intel OpenVINO, GPU-accelerated inference. NETWORK: required for first-time setup (install deps + download 2 GB model); NOT required for inference. Auto-extracts audio from video files (mp4, mkv, webm, mov, avi) — just pass the video path. Also supports audio: mp3, wav, flac, m4a, ogg, aac, wma, opus. Single file, batch folder, or watch-mode continuous transcription with automatic txt/json archive. 30 languages + 22 Chinese dialects, auto language detection. One-line LLM API: from acoustic_pipeline import AcousticPipeline Local speech to text, transcribe audio, voice recognition, transcribe video, transcribe recording, convert speech to text, audio transcription, local ASR, offline speech recognition, dictation on Windows. 本地离线语音识别,零云端,支持视频音轨自动提取,支持批量转录和文件夹监听,自动保存转写文档。 网络说明:仅首次安装环境和下载模型时需要网络,推理转录完全离线。
openclaw skills install local-qwen3-asr-aipcModel: snake7gun/Qwen3-ASR-0.6B-fp16-ov (ModelScope FP16)
SKILL_VERSION: 'v1.0.3'
First time? Before using this skill, run these two scripts once in a terminal:
python setup.py # [SYSTEM PYTHON OK] creates venv, installs deps (~5 min) python download_model.py # [SYSTEM PYTHON OK] downloads the model (~2 GB, resumable)Both scripts are in the skill directory alongside this SKILL.md.
Always use acoustic_pipeline.py as the entry point, called with VENV_PY (obtained from check_env.py output). It handles all cases:
# VENV_PY = value from check_env.py output, e.g. C:\intel_openvino\venv\Scripts\python.exe
# Single file
& "<VENV_PY>" "<skill_dir>\acoustic_pipeline.py" --file "<FILE_PATH>" --language auto
# Single file + save transcript
& "<VENV_PY>" "<skill_dir>\acoustic_pipeline.py" --file "<FILE_PATH>" --language auto --archive json
# Watch folder
& "<VENV_PY>" "<skill_dir>\acoustic_pipeline.py" --watch "<DIR_PATH>" --language auto --archive both
# Batch folder
& "<VENV_PY>" "<skill_dir>\acoustic_pipeline.py" --batch "<DIR_PATH>" --language auto --archive json
Never run
acoustic_pipeline.pywith systempython. It imports model packages (openvino,qwen_asr) that are only installed in the venv.
Use transcribe.py directly only when called internally by acoustic_pipeline.py — do not invoke it as a standalone entry point.
Any agent should treat this skill as a local audio/video transcription skill.
The result should be returned as a JSON object (or equivalent dictionary) with:
Example shape:
{
"text": "...",
"language": "Chinese",
"source_file": "C:\\demo\\meeting.mp4",
"source_format": ".mp4",
"confidence": null,
"archive_files": {
"json": "C:\\demo\\transcripts\\meeting_20260326_120000.json"
}
}
The agent should return a short structured error summary including:
ls, rm, cat). Never use && or call.state.json itself — do not pass paths between steps manually.VENV_PY from state.json for inference calls — never use system python for inference. check_env.py and setup.py are the only scripts intentionally run with system python (they create or validate the venv, so they must not depend on it). check_env.py validates the venv's packages, not system Python's — PACKAGES_MISSING from check_env.py means the venv is incomplete, not the system installation.transcribe.py is automatically deployed to ASR_DIR when setup.py runs. If it is missing, re-run setup.py.transcribe.py fails at runtime, do NOT edit it manually. Re-run setup.py to redeploy the original from the repository.thinker/ subdirectory — the scripts handle this automatically.check_env.py instead.Auto-recovery policy — try before asking user: ⚠️ Network handling (proxy-aware)
When running setup.py or download_model.py:
Try to read system proxy settings:
If proxy is detected:
If no proxy is detected:
If download/setup fails:
STATE=MISSING, VENV_PY=BROKEN, or PACKAGES_MISSING: automatically run setup.py (up to 3 attempts). Only ask user to run it manually if all 3 attempts fail. setup.py is idempotent — it skips any step already completed (venv, packages, repo clone, asr_engine.py, transcribe.py). Re-running a working environment is safe and fast.MODEL_STATUS=MISSING: automatically run download_model.py (up to 3 attempts). Only ask user to run it manually if all 3 attempts fail or if a single attempt runs longer than 8 minutes without completing (likely a slow connection — download_model.py supports resume so partial progress is not lost). download_model.py exits immediately if the model is already complete — re-running the script on an already-downloaded model is safe.🔍 Pre-flight: Checking Python and git...
python --version
Interpretation:
| Output | Action |
|---|---|
Python 3.10.x or higher | PYTHON_OK — set PYTHON_EXE=python, continue to git check |
Python 3.8 / 3.9 | Version too low; upgrade required (see below) |
'python' is not recognized... | Python missing; install required (see below) |
Python was not found; run without arguments... | Windows Store alias — run py --version instead (see below) |
If output contains "run without arguments to install from the Microsoft Store", the Windows Store App Execution Alias is shadowing the real Python. Do NOT ask the user to change settings, and do NOT write helper scripts. Use where.exe (native Windows tool, works in both PowerShell and cmd) to find the real Python:
⚠️ Do NOT write any helper .ps1 scripts. Run the command below directly in the terminal.
where.exe python 2>$null | Where-Object { $_ -notlike "*WindowsApps*" } | Select-Object -First 1
SYSTEM_PYTHON (this is the system-level Python, used only to run setup.py, check_env.py, and download_model.py). For every command marked [SYSTEM PYTHON], substitute the full literal path for python. Example: python "<skill_dir>\check_env.py" becomes "C:\Users\intel\AppData\Local\Programs\Python\Python312\python.exe" "<skill_dir>\check_env.py". Do NOT use this path for inference — inference must always use VENV_PY (the venv Python path printed by check_env.py). Do NOT rely on a $variable across tool calls — each call is a new shell process; always embed the literal path directly.**```powershell $f = "$env:TEMP\python-installer.exe" Invoke-WebRequest "https://www.python.org/ftp/python/3.12.10/python-3.12.10-amd64.exe" -OutFile $f Start-Process $f -ArgumentList "/quiet InstallAllUsers=0 PrependPath=1 Include_pip=1" -Wait Remove-Item $f
> `PrependPath=1` adds Python to PATH automatically; `Include_pip=1` installs pip; `InstallAllUsers=0` avoids requiring administrator privileges.
After installation, **restart the terminal**, then run `python --version` and confirm it reports `Python 3.12.x`.
If you prefer manual installation: download **https://www.python.org/ftp/python/3.12.10/python-3.12.10-amd64.exe** and make sure to check **"Add python.exe to PATH"** during setup.
### Check git
```bat
git --version
Interpretation:
| Output | Action |
|------|------|
| git version 2.x.x | ✅ GIT_OK, Pre-flight passed |
| 'git' is not recognized as an internal or external command | git is not installed; install is required (see below) |
If git is missing, run this one-command silent installer in PowerShell:
$f = "$env:TEMP\\git-installer.exe"
Invoke-WebRequest "https://github.com/git-for-windows/git/releases/download/v2.49.0.windows.1/Git-2.49.0-64-bit.exe" -OutFile $f
Start-Process $f -ArgumentList "/VERYSILENT /NORESTART /NOCANCEL /SP- /CLOSEAPPLICATIONS /RESTARTAPPLICATIONS /COMPONENTS=icons,ext\\reg\\shellhere,assoc,assoc_sh" -Wait
Remove-Item $f
After installation, restart the terminal, then run git --version to confirm.
If you prefer manual installation: open https://git-scm.com/download/win, download the installer, and proceed with default options.
git is required for the
git+https://dependency inrequirements_imagegen.txt; without git,pip installwill fail withgit: command not found.
Pre-flight pass criteria: python --version is >= 3.10 and git --version returns a valid version string.
Status message: ✅ Python and git are ready. Starting main workflow.
Pipeline — follow exactly in order, no skipping:
Step 0: parse request → AUDIO_PATH, LANGUAGE, TOPIC
Step 1: verify environment → run check_env.py → record VENV_PY and ASR_DIR
↳ if STATE=MISSING or VENV_BROKEN or PACKAGES_MISSING: auto-run setup.py (3 attempts)
↳ if SCRIPTS_STALE=...: auto-run setup.py to redeploy runtime scripts
↳ if MODEL_STATUS=MISSING: auto-run download_model.py (3 attempts)
Step 2: transcribe + send → run acoustic_pipeline.py using VENV_PY from Step 1
↳ fallback: run transcribe.py directly using VENV_PY and ASR_DIR from Step 1
---
Extract from the user's message:
| Field | Default | Notes |
|---|---|---|
AUDIO_PATH | required | Absolute path to audio/video file (wav/mp3/flac/m4a/ogg/aac/wma/opus/mp4/mkv/webm/flv/mov/avi/mts/m2ts/ts/m3u8) |
LANGUAGE | auto-detect | Optional: Chinese, English, Japanese, etc. |
TOPIC | English snake_case from context | Used for output filename |
If no audio file provided, ask the user before continuing.
Step 1/3: checking environment and model...
# [SYSTEM PYTHON] check_env.py creates/validates the venv — must NOT use venv python here
python "<skill_dir>\check_env.py"
check_env.pyandsetup.pyintentionally run with system Python — they create and validate the venv, so they cannot depend on it.
On success: record VENV_PY and ASR_DIR from output, proceed to Step 2.
check_env.pyalso printsSCRIPTS_STALE=<old>-><new>when the deployedtranscribe.pyis outdated. If this line appears, treat it as a mandatory auto-update — runsetup.pybefore Step 2 (see below).
On failure — auto-recovery (try before asking user):
SCRIPTS_STALE means the venv and model are both OK, but the deployed transcribe.py (and/or asr_engine.py) in ASR_DIR is an older version. setup.py is idempotent — it skips the venv, package, and model steps and only redeploys the outdated files.
⚙️ Runtime scripts are outdated. Redeploying (attempt 1/3)...
# [SYSTEM PYTHON] — redeploys transcribe.py and asr_engine.py to ASR_DIR
python "<skill_dir>\setup.py"
After running, re-run check_env.py to confirm SCRIPTS_STALE no longer appears, then proceed to Step 2.
PACKAGES_MISSING means the venv exists but required packages (e.g. openvino, qwen_asr) are not installed. Re-running setup.py re-installs only what is missing; it does not recreate the venv or re-clone the repo.
Announce and run (up to 3 attempts):
⚙️ Environment is not initialized. Running automatic setup (attempt 1/3)...
# [SYSTEM PYTHON] setup.py creates the venv — must NOT use venv python here
python "<skill_dir>\setup.py"
After each attempt, re-run check_env.py to verify. If all 3 attempts fail, show manual fallback below.
Announce and run (up to 3 attempts, stop if a single attempt exceeds 8 minutes):
📥 Model not found. Starting automatic download (attempt 1/3)...
Estimated time: ~3 minutes at 100 Mbps, ~5 minutes at 50 Mbps
Download supports resume; rerun safely after interruption.
# [SYSTEM PYTHON] download_model.py runs before venv — must NOT use venv python here
python "<skill_dir>\download_model.py"
After each attempt, re-run check_env.py to verify. If all 3 attempts fail, show manual fallback below.
Show user this message:
⚠️ Automatic setup failed. Manual steps are required.
Open a Windows terminal (PowerShell or Command Prompt) and run the following in order:
1) Install environment (if not installed yet):
python "<skill_dir>\setup.py" # [SYSTEM PYTHON OK]
Expected duration: about 5 minutes, fully automated.
2) Download model (about 2 GB):
python "<skill_dir>\download_model.py" # [SYSTEM PYTHON OK]
Download supports resume; rerun safely after interruption.
Estimated time: ~3 minutes at 100 Mbps, ~5 minutes at 50 Mbps.
After completion, return here and resend your request.
Step 2/2: transcribing...
Preferred path — run acoustic_pipeline.py with the venv Python obtained from Step 1:
# [VENV PYTHON] VENV_PY = value printed by check_env.py, e.g. C:\intel_openvino\venv\Scripts\python.exe
# Never use system python here — openvino/qwen_asr are only in the venv
& "<VENV_PY>" "<skill_dir>\acoustic_pipeline.py" --file "AUDIO_PATH" --language auto --archive json
If the user specified a language:
# [VENV PYTHON]
& "<VENV_PY>" "<skill_dir>\acoustic_pipeline.py" --file "AUDIO_PATH" --language "LANGUAGE" --archive json
Return the JSON result directly to the conversation and include any archive_files paths.
Fallback — only if acoustic_pipeline.py exits with a non-zero code:
# [VENV PYTHON] fallback — both VENV_PY and ASR_DIR come from check_env.py (Step 1)
& "<VENV_PY>" "<ASR_DIR>\transcribe.py" --audio "AUDIO_PATH" --language "LANGUAGE"
Never use system python here.
Pass: $LASTEXITCODE -eq 0. Stdout is a single line of JSON with this shape:
{"text": "...", "language": "Chinese", "time_elapsed": 12.3, "audio_path": "C:\\audio\\file.wav"}
Parse with $result = $stdout | ConvertFrom-Json. Record TRANSCRIPT from $result.text and LANG from $result.language.
Fail: if $LASTEXITCODE -ne 0, do not attempt to parse stdout. Show the stderr output to the user.
Send via message tool:
action: "send" message: "✅ LANG\n\nTRANSCRIPT"
| Error | Fix |
|---|---|
Python was not found; run without arguments... | Windows Store alias blocking python. Run: `where.exe python 2>$null |
STATE=MISSING | Run python "<skill_dir>\setup.py" |
VENV_PY=BROKEN | Re-run python "<skill_dir>\setup.py" — it will rebuild the venv |
PACKAGES_MISSING: ... | Re-run python "<skill_dir>\setup.py" — re-installs missing venv packages only; skips steps already done |
MODEL_STATUS=MISSING | Run python "<skill_dir>\download_model.py" — exits immediately if model is already complete |
[ERROR] Audio not found | Verify the file path is correct and the file exists |
[ERROR] Model incomplete | Re-run python "<skill_dir>\download_model.py" — supports resume |
[ERROR] state.json not found | Re-run Step 1 (check_env.py) |
SCRIPTS_STALE=v1.0.1->v1.0.2 | Deployed runtime scripts are outdated. Run python "<skill_dir>\setup.py" — it redeploys only the changed files, skipping venv/packages/model (fast). |
RuntimeError on GPU | Run check_env.py first to confirm model is READY. If model is OK, try adding --language auto to remove language mismatch. If still failing, re-run setup.py to upgrade OpenVINO and redeploy runtime files. |
For agents that prefer a Python import interface over shell commands, run this inside a subprocess using VENV_PY:
# Must be run with VENV_PY, not system python
& "<VENV_PY>" -c "
import sys; sys.path.insert(0, r'<skill_dir>')
from acoustic_pipeline import AcousticPipeline
pipeline = AcousticPipeline()
result = pipeline.transcribe(r'C:\\meeting.mp4', language='auto', archive_mode='json')
print(result['text'])
"
Or if your agent framework already runs inside the venv Python process:
from acoustic_pipeline import AcousticPipeline
pipeline = AcousticPipeline()
result = pipeline.transcribe("C:\\meeting.mp4", language="auto", archive_mode="json")
print(result["text"])
print(result.get("archive_files"))
check_env.py — it is versioned, auditable, and repeatable.