Security audit

feishu-asr

Security checks across malware telemetry and agentic risk

Overview

This Feishu voice transcription skill has a plausible purpose, but its offline/no-network claims conflict with remote model downloads, cloud ASR setup, and sensitive message access.

Review before installing. Use only if you are comfortable granting Feishu message-read access and possibly sending voice audio to cloud ASR providers. For sensitive workplace messages, prefer a version that clearly states whether audio stays local, pins model sources, disables remote downloads unless explicitly approved, and scopes Feishu access narrowly.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (9)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 93%
Finding: The skill claims it is local, offline, and does not require networking, but the implementation explicitly configures a remote Hugging Face mirror and states that models are downloaded on first run. This mismatch can mislead users and operators into enabling a skill under false assumptions about data flow, network access, and supply-chain exposure.

Intent-Code Divergence

High

Confidence: 96%
Finding: The README and skill metadata make contradictory claims: the skill is advertised as using a local/offline Whisper model, but the setup instructions require cloud ASR credentials and later direct users to external ASR providers. This mismatch is dangerous because operators may deploy the skill under false privacy and network assumptions, causing unintended transmission of voice data and secret provisioning to third parties.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The README explicitly describes downloading user voice messages and sending them to external ASR services, directly conflicting with the stated offline/no-registration design. In the context of a voice-processing skill, this makes the issue more dangerous because users and administrators may rely on the offline claim when handling potentially sensitive audio, leading to undisclosed data exfiltration to cloud vendors.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The manifest advertises the skill as offline and not requiring networking, yet the documented setup downloads Whisper models from Hugging Face or a mirror. This is dangerous because deployment decisions, sandboxing, and trust assumptions may rely on the manifest, causing unapproved outbound network access and external model retrieval.

Intent-Code Divergence

High

Confidence: 97%
Finding: The documentation directly contradicts itself by promising offline operation while configuring `HF_ENDPOINT` and warning that the first run downloads models. This inconsistency increases operational risk because users may unknowingly permit network egress, fetch unpinned artifacts, and process sensitive audio in an environment they believed was air-gapped.
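One way to reduce the unpinned-artifact risk named in this finding is to refuse any model revision that is not a full commit hash. The sketch below is illustrative only and is not taken from the skill: `is_pinned_revision` is a hypothetical helper, and it assumes the loader accepts a `revision` argument (as `from_pretrained()` in Transformers does) and that the model repository uses 40-character hexadecimal Git commit hashes.

```python
import re

# Assumption: a full Git commit hash is 40 lowercase hexadecimal characters.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision):
    """Return True only when `revision` is a full commit hash.

    Branch names ("main") and tags can be moved to point at different
    content later, so they are treated as unpinned here.
    """
    return isinstance(revision, str) and _COMMIT_SHA.fullmatch(revision) is not None
```

A loader could call this check before passing `revision=...` to `from_pretrained()` and stop with an error when it returns False.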

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The skill advertises that it is fully offline and does not require networking, but the code explicitly sets a Hugging Face mirror endpoint and loads models via `from_pretrained()`, which may trigger remote downloads. This creates a trust and privacy issue because users may provide sensitive voice data or run the skill in restricted environments based on a false no-networking claim.
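For comparison, a loader that really intends to stay offline would usually say so explicitly. The following is a minimal sketch, not the skill's code: `offline_env` and `load_kwargs` are hypothetical helper names. It relies on the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables and the `local_files_only` argument of `from_pretrained()`, and it assumes the model files are already present in a local directory.

```python
import os


def offline_env(base=None):
    """Return a copy of the environment with Hugging Face downloads disabled."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: do not contact the Hub
    env["TRANSFORMERS_OFFLINE"] = "1"  # transformers: use already-cached files only
    env.pop("HF_ENDPOINT", None)       # drop any mirror override such as the one flagged above
    return env


def load_kwargs(local_dir):
    """Keyword arguments for a from_pretrained() call that must not download."""
    return {
        "pretrained_model_name_or_path": local_dir,
        "local_files_only": True,  # fail instead of fetching missing files
    }
```

These variables typically need to be set before the Hugging Face libraries are imported, for example by launching the skill as a subprocess with `env=offline_env()`.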

Intent-Code Divergence

High

Confidence: 97%
Finding: The inline documentation says no network is needed, while the code sets `HF_ENDPOINT` to an online mirror and then uses Transformers loaders that can access the network. In a security-sensitive agent ecosystem, misleading claims about network behavior can cause unsafe deployment in air-gapped or privacy-constrained contexts.
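A reviewer can spot this kind of mismatch with a simple text scan of the skill's source before installing it. The sketch below is a rough illustration, not SkillSpector's detection logic: the pattern set and the `find_network_indicators` helper are assumptions chosen to match the indicators named in this finding, and a regular-expression scan will miss anything obfuscated or indirect.

```python
import re

# Hypothetical indicator set based on the behaviors described in the finding.
NETWORK_INDICATORS = {
    "HF_ENDPOINT": re.compile(r"\bHF_ENDPOINT\b"),
    "from_pretrained": re.compile(r"\.from_pretrained\s*\("),
    "http_url": re.compile(r"https?://[^\s'\"]+"),
}


def find_network_indicators(source):
    """Return (line_number, indicator_name) pairs for each match in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in NETWORK_INDICATORS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Any hit in a skill that documents itself as offline is a reason to read the surrounding code, not proof of wrongdoing.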

Missing User Warnings

Medium

Confidence: 91%
Finding: The README instructs users to download voice messages and use external ASR services but does not warn that user audio may be transmitted to third parties or describe any privacy implications. Because voice messages often contain personal or sensitive content, the absence of an explicit disclosure can cause unintentional privacy violations and noncompliant handling of user data.

Missing User Warnings

Medium

Confidence: 90%
Finding: The model-loading calls can download artifacts from the network, but there is no warning or consent prompt at the point where this occurs. This is dangerous because it can violate user expectations, leak metadata such as the host's IP address and environment details, and break policies that assume the skill operates without outbound connectivity.
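A consent gate at the point of loading could look roughly like the sketch below. It is an assumption-laden illustration and not the skill's code: `resolve_model_source`, `DownloadNotApproved`, and the `allow_download` flag are hypothetical names. The idea is to prefer files already on disk and to fail with an explicit message when a download would be needed but has not been approved.

```python
from pathlib import Path


class DownloadNotApproved(RuntimeError):
    """Raised when model files are missing and no download was approved."""


def resolve_model_source(local_dir, remote_id, allow_download=False):
    """Return (source, local_only) for a model loader.

    Uses `local_dir` when it exists and is non-empty. Falls back to
    `remote_id` only when the caller has opted in with allow_download=True.
    """
    path = Path(local_dir)
    if path.is_dir() and any(path.iterdir()):
        return str(path), True
    if not allow_download:
        raise DownloadNotApproved(
            f"Model files not found in {path}; downloading {remote_id!r} "
            "requires explicit approval (allow_download=True)."
        )
    return remote_id, False
```

The returned `local_only` value could then be passed as `local_files_only` to the loader, so the approval decision and the network behavior stay in one place.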

VirusTotal

64/64 vendors flagged this skill as clean.
