iFlytek ASR - iFlytek Speech-to-Text (讯飞语音转文字)

Security checks across malware telemetry and agentic risk

Overview

This skill does what it describes: cloud transcription through iFlytek plus optional YouTube audio download. It carries privacy and transport-safety caveats that users should understand before use.

Install only if you are comfortable sending chosen audio to iFlytek for cloud transcription and using yt-dlp for YouTube downloads. Keep the .env API credentials private and out of git, avoid sensitive or regulated recordings unless authorized, delete downloaded audio/transcripts when no longer needed, and consider removing the downloader's certificate-bypass option before use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (8)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill documentation describes capabilities that access environment variables, write files, use the network, and invoke shell-installed tooling, but it does not declare permissions or present those capabilities to the user in a structured way. This creates a transparency and consent problem: users may trigger downloads, local file writes, and external API transmission without clearly understanding the operational scope.

Context-Inappropriate Capability

Medium

Confidence: 99%
Finding: Disabling TLS certificate verification removes authentication of the remote endpoint and allows man-in-the-middle interception or tampering of downloaded content. In a downloader skill that fetches untrusted remote media, this increases the chance of malicious or corrupted content being delivered silently.
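
One way to act on this finding can be sketched as follows, assuming the downloader invokes yt-dlp with an argument list (the report does not show the skill's actual code). The helper name `strip_cert_bypass` and the example URL are hypothetical. `--no-check-certificates` is yt-dlp's option for suppressing HTTPS certificate validation; the singular spelling is included on the assumption that it is accepted as an alias.

```python
# Hypothetical helper: the skill's real downloader code is not shown in this report.
INSECURE_FLAGS = {"--no-check-certificates", "--no-check-certificate"}


def strip_cert_bypass(args):
    """Return a copy of a yt-dlp argument list without certificate-bypass flags."""
    return [a for a in args if a not in INSECURE_FLAGS]


# Illustrative command line; the URL is a placeholder.
cmd = [
    "yt-dlp",
    "--no-check-certificates",
    "-x",
    "--audio-format", "mp3",
    "https://example.com/watch?v=ID",
]
safe_cmd = strip_cert_bypass(cmd)
```

With the flag removed, yt-dlp falls back to its default behavior of validating the server certificate, so a failed handshake surfaces as an error instead of being silently ignored.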

Missing User Warnings

Medium

Confidence: 93%
Finding: The quickstart explicitly instructs users to copy long-lived API credentials into a local `.env` file and even shows realistic variable names and values, but provides no warning about secret handling, file permissions, or preventing accidental commits. In an agent skill context, this is more dangerous because users may follow setup steps mechanically and later expose the `.env` file through source control, logs, backups, or shared workspaces, leading to credential theft and misuse of the cloud speech API.
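
The missing secret-handling guidance could be backed by a small pre-flight check. The sketch below is illustrative only: `check_env_file` is a hypothetical helper, it assumes POSIX file permissions, and it looks for a literal `.env` line in `.gitignore` instead of implementing full gitignore pattern matching.

```python
import os
import stat
import tempfile
from pathlib import Path


def check_env_file(env_path, gitignore_path):
    """Return a list of warnings about how a .env file is stored (POSIX assumed)."""
    warnings = []
    env = Path(env_path)
    mode = stat.S_IMODE(env.stat().st_mode)
    # Any group/other permission bit means other local users may reach the secrets.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        warnings.append(
            f"{env.name} is accessible to group/others (mode {oct(mode)}); consider chmod 600"
        )
    gitignore = Path(gitignore_path)
    patterns = gitignore.read_text().splitlines() if gitignore.exists() else []
    # Simplified check: a literal entry only, not full gitignore glob semantics.
    if env.name not in [p.strip() for p in patterns]:
        warnings.append(f"{env.name} is not listed in {gitignore.name}")
    return warnings


# Demonstration in a throwaway directory with a placeholder credential.
with tempfile.TemporaryDirectory() as d:
    env_file = Path(d) / ".env"
    ignore_file = Path(d) / ".gitignore"
    env_file.write_text("EXAMPLE_KEY=placeholder\n")
    os.chmod(env_file, 0o644)
    before = check_env_file(env_file, ignore_file)

    os.chmod(env_file, 0o600)
    ignore_file.write_text(".env\n")
    after = check_env_file(env_file, ignore_file)
```

In the demonstration, `before` holds two warnings (loose permissions and no ignore entry) and `after` is empty once both are fixed.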

Missing User Warnings

Medium

Confidence: 91%
Finding: The README encourages uploading local audio/video to iFlytek and downloading/transcribing YouTube content, but it does not clearly disclose that media is sent to a third-party service or warn about privacy, consent, and copyright implications. This can lead users to process sensitive meetings, voice notes, or copyrighted material without understanding the data-sharing and legal risks.

Vague Triggers

Medium

Confidence: 84%
Finding: The trigger language is broad enough that the skill could auto-match many ordinary transcription-related requests, including cases where the user did not intend to use a third-party cloud ASR provider or download YouTube media. Over-broad triggering increases the chance of unintended activation and accidental transfer of user content to external services.

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill does not clearly warn users that uploaded audio and referenced YouTube URLs/content are sent to external services and that YouTube media is downloaded to local storage. This is a meaningful privacy and compliance risk because users may provide sensitive recordings or proprietary media without informed consent about where data goes and what is stored locally.

Missing User Warnings

Medium

Confidence: 93%
Finding: The script uploads full audio content to a third-party cloud API without any explicit user warning or consent flow about external transmission of potentially sensitive voice data. In an ASR skill, this is contextually important because recordings may contain confidential meetings, personal information, or regulated data.
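
A consent step of the kind this finding asks for might look like the sketch below. It is not taken from the skill: `confirm_upload` and its `ask` parameter are hypothetical names, and the prompt wording is only an example.

```python
def confirm_upload(audio_path, ask=input):
    """Ask for explicit consent before sending audio to a third-party service.

    `ask` is injectable so the prompt can be scripted in tests or replaced
    by an agent's own confirmation UI.
    """
    answer = ask(
        f"'{audio_path}' will be uploaded to iFlytek's cloud service for transcription. "
        "Continue? [y/N] "
    )
    # Default to "no": only an explicit y/yes proceeds.
    return answer.strip().lower() in {"y", "yes"}


# Scripted answers stand in for interactive input here.
approved = confirm_upload("meeting.wav", ask=lambda prompt: "yes")
declined = confirm_upload("meeting.wav", ask=lambda prompt: "")
```

Defaulting to refusal means an empty or unexpected reply never results in an upload.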

Missing User Warnings

Low

Confidence: 87%
Finding: The script automatically writes the transcript to disk without warning that recognized speech may contain sensitive content and will persist locally. This creates avoidable privacy and data-handling risk, especially on shared machines or in workflows involving confidential audio.
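
One way to reduce the exposure of persisted transcripts is to create the file with owner-only permissions. The sketch below assumes a POSIX system; `write_private_transcript` is a hypothetical helper, not the skill's actual write path.

```python
import os
import tempfile


def write_private_transcript(path, text):
    """Create the transcript file with owner-only permissions (POSIX assumed)."""
    # O_EXCL refuses to overwrite an existing file; mode 0o600 is applied at creation.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "transcript.txt")
    write_private_transcript(out, "example transcript\n")
    mode = os.stat(out).st_mode & 0o777
    with open(out, encoding="utf-8") as f:
        content = f.read()
```

Setting the mode at creation time avoids a window in which the file exists with broader default permissions before a later `chmod`.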

VirusTotal

66/66 vendors flagged this skill as clean.
