faster-whisper Chinese Edition - High-Performance Local Speech-to-Text Tool

Security checks across malware telemetry and agentic risk

Overview

The skill is a coherent local transcription tool, but it needs review because its batch script hardcodes a third-party model mirror and uses unsafe shell command construction.

Review before installing. Use it only for audio you intend to transcribe, prefer the official Hugging Face endpoint unless you explicitly trust hf-mirror.com, and avoid running the batch script on directories containing untrusted filenames until the eval-based command execution is fixed.
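The batch script itself is not reproduced in this report, so the following is only a sketch of the general remedy for eval-style command construction: pass each filename as a single argv element instead of splicing it into a shell string. The script name `transcribe.py`, the `--model` flag, and the suffix list are hypothetical placeholders, not the skill's actual interface.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed set of audio extensions; the real script may accept others.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a", ".flac"}


def build_command(audio_path: Path, model: str = "small") -> List[str]:
    """Build an argv list; the filename stays one argument and is never shell-parsed."""
    # "transcribe.py" and "--model" are hypothetical stand-ins for the skill's CLI.
    # "--" marks the end of options so a name starting with "-" is not read as a flag.
    return ["python", "transcribe.py", "--model", model, "--", str(audio_path)]


def transcribe_directory(directory: Path, dry_run: bool = True) -> List[List[str]]:
    """Collect (and optionally run) one command per audio file in the directory."""
    commands = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            cmd = build_command(path)
            commands.append(cmd)
            if not dry_run:
                # A list argument with the default shell=False bypasses the shell,
                # so characters such as ';' or '$(...)' in filenames are inert.
                subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` the function only returns the commands it would run, which makes it easy to inspect how an awkward filename is handled before anything executes.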

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Findings (4)

Lp3

Severity: Medium
Category: MCP Least Privilege
Confidence: 89%
Finding
The skill documentation instructs users to run shell commands and a setup script, and it includes examples that create or write files (for example, a batch script), yet the skill declares no permissions. This mismatch is dangerous because it obscures the skill's actual capabilities from policy and review systems, increasing the chance that file writes or shell execution occur without explicit scrutiny.

Description-Behavior Mismatch

Severity: Medium
Confidence: 92%
Finding
The script advertises a local transcription tool but forcibly configures Hugging Face to use an external mirror endpoint, which can cause model downloads and metadata leakage to a third party without explicit user consent. In a security context, this expands the trust boundary and introduces supply-chain and privacy risk because model artifacts are obtained from infrastructure outside the local environment and outside the stated tool behavior.
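One way to avoid a silently forced mirror is to default to the official endpoint and require an explicit opt-in for anything else. The sketch below assumes this policy; the function names and the `allow_third_party` parameter are hypothetical and do not come from the skill. `HF_ENDPOINT` is the environment variable the finding refers to, and as far as I know it needs to be set before `huggingface_hub` is imported to take effect.

```python
import os
from typing import MutableMapping, Optional

OFFICIAL_ENDPOINT = "https://huggingface.co"


def resolve_hf_endpoint(requested: Optional[str], allow_third_party: bool = False) -> str:
    """Return the endpoint to use, refusing non-official hosts unless opted in."""
    if not requested or requested.rstrip("/") == OFFICIAL_ENDPOINT:
        return OFFICIAL_ENDPOINT
    if not allow_third_party:
        # Hypothetical policy: fail loudly instead of switching hosts silently.
        raise ValueError(
            f"Refusing third-party endpoint {requested!r}; "
            "pass allow_third_party=True to accept the added trust boundary."
        )
    return requested.rstrip("/")


def apply_endpoint(endpoint: str, environ: MutableMapping[str, str] = os.environ) -> None:
    """Record the chosen endpoint in the environment for later model downloads."""
    environ["HF_ENDPOINT"] = endpoint
```

A caller would resolve the endpoint once at startup, for example `apply_endpoint(resolve_hf_endpoint(user_choice, allow_third_party=user_confirmed))`, where both variables stand for whatever the tool collects from the user.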

Missing User Warnings

Severity: Medium
Confidence: 86%
Finding
The README recommends setting HF_ENDPOINT to a third-party mirror, which changes where model download traffic is sent without clearly warning users that requests, metadata, and downloaded artifacts will come from a non-default service. In a speech-transcription skill, this creates supply-chain and privacy risk because users may unknowingly trust mirrored model sources and disclose network metadata to that operator.

Vague Triggers

Severity: Medium
Confidence: 78%
Finding
The trigger phrases include very broad terms such as 'whisper' and generic requests like '语音转文字' ("speech to text"), which can match many ordinary conversations or unrelated user intents. Overly broad activation is risky because it can cause the wrong skill to run automatically, leading to unexpected shell usage, dependency installation, network or model downloads, or processing of local audio files when the user did not specifically request this tool.
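The difference between a broad and a narrow trigger can be illustrated with a small matcher. This is a sketch only: the report does not describe how the host actually evaluates triggers, so both functions and the choice of `faster-whisper` as the narrow phrase are assumptions made for illustration.

```python
import re

# The two broad trigger phrases quoted in the finding.
BROAD_TRIGGERS = ("whisper", "语音转文字")

# Hypothetical narrow trigger: the full tool name, not embedded in a longer
# ASCII identifier. ASCII-only boundaries let it match next to Chinese text.
SPECIFIC_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_-])faster-whisper(?![A-Za-z0-9_-])", re.IGNORECASE
)


def broad_match(message: str) -> bool:
    """Substring matching, which fires on any mention of a broad phrase."""
    lowered = message.lower()
    return any(trigger in lowered for trigger in BROAD_TRIGGERS)


def specific_match(message: str) -> bool:
    """Fire only when the user names the tool explicitly."""
    return SPECIFIC_PATTERN.search(message) is not None
```

Under the broad matcher, a sentence such as "She spoke in a whisper" would activate the skill; the narrow matcher ignores it and still accepts "Use faster-whisper to transcribe talk.wav".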

VirusTotal

64/64 vendors flagged this skill as clean.
