OmniVoice

Security checks across malware telemetry and agentic risk

Overview

OmniVoice is a voice toolkit whose purpose is disclosed, but it handles sensitive voice data, so users should install it only with clear consent and credential controls.

Install only if you are comfortable storing voice samples in the workspace and sending selected audio to SiliconFlow or Feishu. Use it only for voices you are authorized to process or clone, protect API keys, verify recipient IDs before sending, and delete voice-refs and TOOLS.md entries when no longer needed.
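The advice to verify recipient IDs before sending can be sketched as a simple allowlist check. This is a minimal illustration, not part of OmniVoice: the function name `check_recipient` and the ID values are hypothetical, and the approved list is assumed to be maintained by the user.

```python
from typing import Iterable


def check_recipient(recipient_id: str, approved: Iterable[str]) -> str:
    """Return the normalized recipient ID, or raise if it is not approved.

    Hypothetical helper: `approved` is a user-maintained allowlist of
    recipient IDs that may receive audio.
    """
    rid = recipient_id.strip()
    if rid not in set(approved):
        raise ValueError(f"Recipient {rid!r} is not on the approved list; refusing to send.")
    return rid
```

Calling the check immediately before any send keeps a mistyped or unexpected ID from receiving audio.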

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (8)

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: Adding Feishu voice-message sending introduces an external data-exfiltration path not inherent to local speaker identification or cloning. In a skill handling sensitive biometric voice data, undocumented outbound messaging materially increases privacy and misuse risk.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The Feishu integration is not justified by the stated purpose of voice analysis and cloning, so it creates unnecessary attack surface and an easy path to share synthesized or captured audio outside the workspace. That is especially sensitive here because the skill processes identity-linked voice prints and cloned speech.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The script adds an outbound Feishu messaging capability that is not disclosed in the skill's stated voice-toolkit description. Hidden communication features are risky because they can exfiltrate user audio or derived artifacts to third-party recipients outside the user's expected workflow.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The script reads FEISHU_APP_ID and FEISHU_APP_SECRET from the environment and uses them to obtain a tenant token for outbound IM delivery, a capability not justified by the described speaker/voice-processing purpose. In a skill handling audio and voice identity data, undisclosed access to messaging credentials materially increases the risk of silent data export.
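One way to reduce the risk of silent export is to read the messaging credentials only after the user has explicitly opted in to sending. The sketch below assumes a hypothetical `allow_send` flag set by the caller; only the environment variable names FEISHU_APP_ID and FEISHU_APP_SECRET come from the finding, and the token exchange itself is left out.

```python
import os
from typing import Mapping, Optional, Tuple


def load_feishu_credentials(
    allow_send: bool, env: Mapping[str, str] = os.environ
) -> Optional[Tuple[str, str]]:
    """Return (app_id, app_secret) only when sending was explicitly enabled.

    `allow_send` is a hypothetical opt-in flag; when it is False the
    environment is never consulted.
    """
    if not allow_send:
        return None
    app_id = env.get("FEISHU_APP_ID")
    app_secret = env.get("FEISHU_APP_SECRET")
    if not app_id or not app_secret:
        raise RuntimeError("Feishu sending was requested but credentials are not set.")
    return app_id, app_secret
```

With this gate, local identification and cloning paths never touch the messaging credentials.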

Vague Triggers

Medium

Confidence: 90%
Finding: Broad triggers like 'voice' and 'audio' are likely to cause accidental activation on ordinary conversations or benign file handling requests. For a skill that can identify speakers, store voice references, clone voices, and transmit audio, unintended invocation raises substantial privacy and misuse concerns.
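The difference between a broad keyword trigger and an explicit one can be illustrated with a small matcher. The phrases below are hypothetical examples, since the skill's actual trigger list is not reproduced in this report; the point is that activation requires a clear request rather than the bare words 'voice' or 'audio'.

```python
import re

# Hypothetical explicit phrases; not the skill's real trigger configuration.
EXPLICIT_TRIGGERS = (
    r"\bidentify (the )?speaker\b",
    r"\bclone (this |my |the )?voice\b",
    r"\bomnivoice\b",
)


def should_activate(message: str) -> bool:
    """Activate only when the message contains an explicit request phrase."""
    text = message.lower()
    return any(re.search(pattern, text) for pattern in EXPLICIT_TRIGGERS)
```

Under this sketch, a request to convert an audio file or a remark about losing one's voice does not activate the skill, while "clone my voice" does.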

Missing User Warnings

High

Confidence: 97%
Finding: The skill exposes highly sensitive capabilities—speaker identification, persistent voice library storage, voice cloning, and external audio transmission—without user-facing warnings or misuse boundaries. In this context, lack of consent, disclosure, and anti-impersonation guidance makes the skill more dangerous because it handles biometric data and enables realistic spoofing.

Missing User Warnings

Medium

Confidence: 95%
Finding: When --ref or --ref-url is used, the script sends reference audio and accompanying transcript text to a third-party API for voice cloning without any consent check, notice, or policy enforcement. In a voice identity/cloning skill, this is especially sensitive because biometric voice data and potentially impersonation-enabling samples are transmitted off-box, raising privacy, legal, and abuse risks.
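A consent check of the kind this finding says is missing could look like the following sketch. The function name, prompt wording, and the injectable `ask` parameter are assumptions for illustration; the actual script's structure is not shown in this report.

```python
def confirm_reference_upload(ref: str, destination: str, ask=input) -> bool:
    """Ask the user to confirm before reference audio leaves the machine.

    `ask` defaults to the built-in input() and can be swapped for testing.
    Anything other than an explicit yes is treated as a refusal.
    """
    prompt = (
        f"About to send reference audio '{ref}' and its transcript to "
        f"{destination} for voice cloning. "
        "Do you have the speaker's permission? [y/N] "
    )
    return ask(prompt).strip().lower() in {"y", "yes"}
```

A caller would run this before handling --ref or --ref-url and abort the upload when it returns False.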

Missing User Warnings

Medium

Confidence: 93%
Finding: This script performs speaker identification on user-supplied audio against stored reference voices, which involves processing biometric voice data. There is no explicit warning, consent check, or notice about the sensitivity of voiceprints and reference samples, increasing the risk of unauthorized identification, privacy violations, and noncompliance with biometric data requirements. In this skill context, the danger is elevated because the tool is explicitly marketed for identifying who is speaking and managing a voice library, making misuse of sensitive voice data a core risk rather than an incidental one.

VirusTotal

66/66 vendors flagged this skill as clean.
