Gipformer ASR
v1.0.0

Vietnamese speech-to-text using Gipformer ASR (65M params, Zipformer-RNNT). Accepts audio of any length — the server handles VAD chunking, batching, and retu...
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign · high confidence

Purpose & Capability
Name/description (Vietnamese ASR) align with the included code and requirements: scripts implement VAD chunking, ONNX-based inference (sherpa-onnx), a FastAPI server, and a client. Required packages in requirements.txt are consistent with the functionality.
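The VAD-chunking step described above can be sketched as a pure slicing function. The timestamp format (a list of `{"start": i, "end": j}` dicts in sample indices) follows silero-vad's `get_speech_timestamps` output; whether the skill slices exactly this way is an assumption.

```python
def chunk_by_timestamps(samples, timestamps):
    """Slice a 1-D sample sequence into speech-only segments.

    `timestamps` is VAD output in silero-vad's sample-index format:
    [{"start": i, "end": j}, ...]. Each segment is then fed to the
    ONNX recognizer independently.
    """
    return [samples[t["start"]:t["end"]] for t in timestamps]
```

Slicing out silence before inference is what lets the server accept audio of arbitrary length while keeping each model call short.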
Instruction Scope
SKILL.md instructs installing dependencies, running a local server, and sending base64 audio to /transcribe. The runtime instructions and code operate on provided audio files and do not read unrelated system files or env vars. The server decodes audio, chunks it, runs inference, and returns transcripts as described.
Install Mechanism
There is no automated install spec in the registry; SKILL.md expects the user to pip install -r requirements.txt. Model files are downloaded at first run from Hugging Face (hf_hub_download). Network downloads and heavy native/system deps (ffmpeg, libsndfile) are required — expected for this use-case but worth noting before install.
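The first-run download can be sketched with `huggingface_hub.hf_hub_download`, which caches files locally and skips the network on later runs. The repo ID is taken from the assessment below; the individual file names are assumptions based on a typical sherpa-onnx RNNT layout.

```python
REPO_ID = "g-group-ai-lab/gipformer-65M-rnnt"

# Typical sherpa-onnx RNNT artifacts — illustrative names (assumption);
# the skill's code defines the real list.
MODEL_FILES = ["encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt"]

def fetch_model(files=MODEL_FILES):
    """Download (or reuse cached copies of) the model files; returns local paths."""
    from huggingface_hub import hf_hub_download  # deferred: heavy optional dep
    return [hf_hub_download(repo_id=REPO_ID, filename=f) for f in files]
```

Because `hf_hub_download` caches under `~/.cache/huggingface` by default, only the very first run needs network access.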
Credentials
The skill does not request environment variables, credentials, or configuration paths. It uses huggingface_hub to download public model files; if a private repo were used, the library would pick up a Hugging Face token (HUGGINGFACE_HUB_TOKEN) automatically, but no token is required by this package.
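Before installing, you can check whether a token would be picked up from your environment. huggingface_hub reads `HUGGINGFACE_HUB_TOKEN` (and the newer `HF_TOKEN`) automatically; this sketch only inspects the environment, it does not touch the network.

```python
import os

def hf_token_present():
    """True if huggingface_hub would find a token in the environment.

    Neither variable is needed for this skill's public model repo;
    this is purely an awareness check before install.
    """
    return bool(os.environ.get("HUGGINGFACE_HUB_TOKEN") or os.environ.get("HF_TOKEN"))
```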
Persistence & Privilege
Skill is not always-enabled and does not modify other skills or system-wide agent settings. It runs a local server when started; no privileged or persistent platform-level presence is requested by the skill metadata.
Assessment
This skill appears coherent for running a local Vietnamese ASR server, but review the following before installing:

1) It downloads model files from Hugging Face at first run — verify that the REPO_ID (g-group-ai-lab/gipformer-65M-rnnt) is trusted.
2) You must install Python packages (sherpa-onnx, onnxruntime, silero-vad, fastapi, etc.) and system dependencies such as ffmpeg and possibly libsndfile — these can be large and may require system package installs.
3) The server executes ffmpeg via subprocess and writes temporary files while decoding uploaded audio; run it in a sandbox, virtualenv, or container if you want isolation.
4) No secrets are requested by the skill, but huggingface_hub may use your HUGGINGFACE_HUB_TOKEN automatically if present (only needed for private models).
5) If you plan to expose the server beyond localhost, review network/security settings — authentication is not implemented.

If uncertain, run the code in a controlled environment and inspect the repository on Hugging Face before use.

Like a lobster shell, security has layers — review code before you run it.
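The ffmpeg-via-subprocess decode step flagged in the assessment can be sketched as below. The exact flags (16 kHz mono WAV) are a common ASR convention, not confirmed from the skill's code, so treat them as assumptions.

```python
import os
import subprocess
import tempfile

def build_ffmpeg_cmd(src_path, dst_path, sample_rate=16000):
    """Assemble the ffmpeg command line: mono, 16 kHz WAV output.

    16 kHz mono is a typical ASR input format (assumption); the
    skill's server may use different flags.
    """
    return ["ffmpeg", "-y", "-i", src_path,
            "-ac", "1", "-ar", str(sample_rate), dst_path]

def decode_to_wav(src_path):
    """Decode an uploaded file to a temporary WAV via subprocess."""
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # ffmpeg writes the file itself
    subprocess.run(build_ffmpeg_cmd(src_path, wav_path),
                   check=True, capture_output=True)
    return wav_path
```

Because this shells out and writes to the temp directory, running the server inside a container or virtualenv (point 3 above) limits the blast radius of a malformed upload.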
