Gipformer ASR

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This appears to be a coherent Vietnamese speech-to-text tool. The main considerations are the external model and dependency downloads, and where audio files are sent for transcription.

Before installing, use a virtual environment, review the PyPI dependencies and Hugging Face model source, keep the server bound to 127.0.0.1 unless you intentionally secure it, and avoid sending sensitive audio to a remote --server URL.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.


Risk analysis

This is an artifact-based, informational review of SKILL.md, the metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

#ASI04: Agentic Supply Chain Vulnerabilities

Low

What this means

The first run depends on external model artifacts. If the upstream source or the dependency environment changes, the local transcription environment may change with it.

Why it was flagged

The server fetches model files from Hugging Face at startup. This is expected for the stated ASR purpose, but it makes the tool depend on the provenance of a third-party model, as is typical for this kind of tool.

Skill content

paths[key] = hf_hub_download(repo_id=REPO_ID, filename=filename)

Recommendation

Install in an isolated environment, verify the Hugging Face model source, and consider pinning dependency/model versions for reproducible deployments.
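One way to apply this recommendation is to pin the model to an immutable commit and check each downloaded file against a digest you recorded after reviewing it. The sketch below is illustrative, not the skill's code: `PINNED_REVISION`, `PINNED_SHA256`, the `model.onnx` filename, and the helper functions are hypothetical placeholders. It relies on the `revision` parameter of `huggingface_hub.hf_hub_download`, which accepts a branch, tag, or commit hash; the `REPO_ID` and the actual model filenames would come from the skill itself.

```python
import hashlib
from pathlib import Path

# Hypothetical pins: replace with the commit hash and digests you actually verified.
PINNED_REVISION = "0000000000000000000000000000000000000000"
PINNED_SHA256 = {
    "model.onnx": "<expected sha256 hex digest>",  # hypothetical filename
}


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large model weights are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Raise ValueError if the file on disk does not match the pinned digest."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"{Path(path).name}: sha256 {actual} != pinned {expected_hex}")
    return path


def download_pinned(repo_id, filename):
    # Imported lazily so the verification helpers work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # A commit hash cannot be moved later, unlike a branch name such as "main".
    path = hf_hub_download(repo_id=repo_id, filename=filename, revision=PINNED_REVISION)
    return verify_file(path, PINNED_SHA256[filename])
```

Pinning the revision prevents silent upstream changes. The digest check additionally catches a corrupted or substituted file in the local cache.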

#ASI02: Tool Misuse and Exploitation

Low

What this means

If the server is exposed on a network, any client that can reach it could submit audio for processing and consume local compute.

Why it was flagged

The documented server can bind to all network interfaces. The default binding is local, but passing --host 0.0.0.0 broadens who can reach the transcription API.

Skill content

python serve.py --host 0.0.0.0 --num-threads 8

Recommendation

Keep the server bound to 127.0.0.1 for personal use, or add network access controls if intentionally exposing it.
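As a sketch of what "access controls" could mean at the application level, a server could refuse to listen beyond loopback unless a shared token is configured, and then compare the token in constant time. This is an assumption-laden illustration: serve.py is only documented here with --host and --num-threads, so the token mechanism, the `Bearer` header format, and every function name below are hypothetical. Firewall rules or an authenticating reverse proxy are equally valid alternatives.

```python
import hmac
import ipaddress


def is_loopback_host(host):
    """True for 'localhost' and loopback IP literals such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A hostname we cannot classify without DNS: treat it as non-local.
        return False


def check_bind(host, api_token=None):
    """Refuse to listen beyond loopback (e.g. on 0.0.0.0) unless a token is configured."""
    if not is_loopback_host(host) and not api_token:
        raise SystemExit(
            f"Refusing to bind to {host!r} without an API token; "
            "use 127.0.0.1 or configure access controls."
        )
    return host


def is_authorized(auth_header, api_token):
    """Constant-time check of a 'Bearer <token>' header against the shared token."""
    if not api_token:
        # Only reachable in loopback-only mode, because check_bind enforces a token otherwise.
        return True
    expected = f"Bearer {api_token}"
    return hmac.compare_digest((auth_header or "").encode(), expected.encode())
```

`hmac.compare_digest` avoids leaking how many leading characters of a guessed token were correct. Note that 0.0.0.0 is the unspecified address rather than a loopback address, so it is correctly treated as non-local.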

#ASI05: Unexpected Code Execution

Low

What this means

Processing malicious or malformed media could expose the local ffmpeg installation to parser vulnerabilities.

Why it was flagged

The server invokes ffmpeg to decode audio. The command arguments are fixed and purpose-aligned, but media parsing is still a local execution surface.

Skill content

result = subprocess.run(["ffmpeg", "-y", "-i", tmp_path, "-f", "wav", ...], capture_output=True, timeout=120)

Recommendation

Keep ffmpeg updated and avoid exposing the API to untrusted uploaders unless the host is properly isolated.
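Beyond keeping ffmpeg updated, the decode call itself can be narrowed. The sketch below is not the skill's command (the excerpt above elides its remaining arguments). It shows a hardened variant under stated assumptions: the 16 kHz mono target is a common ASR choice but is not confirmed by the skill, and `MAX_INPUT_BYTES`, `build_ffmpeg_cmd`, and `decode_to_wav` are hypothetical names. The `-nostdin`, `-protocol_whitelist`, `-vn`, `-ac`, and `-ar` options are standard ffmpeg flags.

```python
import os
import subprocess

MAX_INPUT_BYTES = 50 * 1024 * 1024  # hypothetical 50 MiB upload cap


def build_ffmpeg_cmd(src, dst):
    """Build an argument list (never a shell string), so file names cannot inject commands."""
    return [
        "ffmpeg",
        "-nostdin",                        # never read from the server's stdin
        "-hide_banner", "-loglevel", "error",
        "-y",                              # overwrite dst, as the original command does
        "-protocol_whitelist", "file",     # input may only reference local files, not URLs
        "-i", src,
        "-vn",                             # ignore any video streams
        "-ac", "1", "-ar", "16000",        # assumed mono 16 kHz target, not from the skill
        "-f", "wav", dst,
    ]


def decode_to_wav(src, dst, max_bytes=MAX_INPUT_BYTES, timeout=120):
    """Decode src to WAV with a size cap and a hard timeout; return dst on success."""
    if os.path.getsize(src) > max_bytes:
        raise ValueError(f"input exceeds {max_bytes} bytes")
    try:
        result = subprocess.run(
            build_ffmpeg_cmd(src, dst), capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child process before raising TimeoutExpired.
        raise RuntimeError("ffmpeg timed out") from exc
    if result.returncode != 0:
        # Keep only the tail of stderr so error responses stay small.
        raise RuntimeError(result.stderr.decode(errors="replace")[-500:])
    return dst
```

These flags shrink the attack surface (for example, a crafted playlist can no longer make ffmpeg fetch remote URLs), but they do not remove demuxer or codec bugs. Isolating the host remains the main control for untrusted uploads.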

#ASI07: Insecure Inter-Agent Communication

Low

What this means

Private voice recordings or meeting audio could be transmitted to a non-local server if the user changes the server URL.

Why it was flagged

The client reads the full audio file and sends it to the configured HTTP server URL. The default server is localhost, but the --server option can point elsewhere.

Skill content

audio_b64 = base64.b64encode(f.read()).decode("ascii") ... requests.post(f"{server_url}/transcribe", json={"audio_b64": audio_b64}, timeout=600)

Recommendation

Use the default localhost server for private audio, and only send audio to remote servers you trust.
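A client could enforce this recommendation before it reads and uploads any audio. The sketch below assumes a hypothetical `allow_remote` opt-in that the documented client does not necessarily have. It accepts loopback URLs unconditionally and requires both an explicit opt-in and HTTPS for anything else, so recordings are not sent in cleartext to another machine.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_url(server_url):
    """True if the URL's host is 'localhost' or a loopback IP literal."""
    host = urlparse(server_url).hostname or ""  # hostname strips [] from IPv6 literals
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote, even if it happens to resolve locally.
        return False


def check_server_url(server_url, allow_remote=False):
    """Return server_url if uploading to it is acceptable, otherwise exit with a reason."""
    if is_local_url(server_url):
        return server_url
    parsed = urlparse(server_url)
    if not allow_remote:
        raise SystemExit(f"Refusing to upload audio to non-local server {parsed.hostname!r}")
    if parsed.scheme != "https":
        raise SystemExit("Remote transcription servers must use https://")
    return server_url
```

The check is deliberately conservative: a hostname other than `localhost` counts as remote without a DNS lookup, which avoids trusting a name whose resolution could change.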