Virtual voice builder

Security checks across malware telemetry and agentic risk

Overview

This skill’s voice pipeline is mostly disclosed and purpose-aligned, but it needs review for two reasons: it handles live microphone audio, third-party AI services, meeting audio injection, and API keys, and its core capture script is missing.

Review before installing. Use this only in conversations where microphone capture, external AI processing, and synthesized meeting audio are appropriate and disclosed. Keep .env out of version control, use dedicated low-privilege API keys, check provider retention settings, verify or replace the missing capture script before running, and test the kill switch before using it in a real meeting.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (4)

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill explicitly directs writing required API keys into the project's .env file without any warning about secure handling, scoping, or storage of secrets. This is risky because .env files are commonly committed, copied, logged, or shared in development workflows, which can expose credentials for STT, LLM, and TTS providers.
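
One way to act on this finding is to confirm that .env is ignored by git before any keys are written to it. The sketch below is illustrative only: the function name is_env_ignored is hypothetical, and it handles just simple .gitignore rules (comments, negation with "!", and shell-style wildcards), not the full gitignore matching semantics.

```python
from fnmatch import fnmatch


def is_env_ignored(gitignore_text: str, name: str = ".env") -> bool:
    """Return True if the last matching simple rule in a .gitignore body ignores `name`.

    Assumption: only top-level, file-name style rules are considered.
    Directory-only rules (trailing "/") and nested paths are not handled.
    """
    ignored = False
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        negated = line.startswith("!")
        pattern = line[1:] if negated else line
        pattern = pattern.lstrip("/")
        if fnmatch(name, pattern):
            # In git, the last matching rule wins; "!" re-includes the file.
            ignored = not negated
    return ignored


if __name__ == "__main__":
    sample = "# local secrets\n.env*\nnode_modules/\n"
    print(is_env_ignored(sample))
```

In a real repository, running git check-ignore .env gives the authoritative answer; the function above is only a quick pre-flight check that can run without git installed.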

Missing User Warnings

Medium

Confidence: 92%
Finding: The README explicitly describes capturing a real microphone, sending audio/transcripts to third-party STT/LLM/TTS providers, and injecting synthesized output into meeting applications, but provides no clear privacy, consent, retention, or legal-use warning. In this context, the omission is security-relevant because users may deploy the skill in live conversations without understanding that sensitive speech is transmitted off-device and replayed into calls, creating privacy, compliance, and impersonation risks.

Missing User Warnings

Medium

Confidence: 94%
Finding: The architecture explicitly routes live microphone audio to third-party AI services (Deepgram for transcription and external TTS providers) but does not describe any user disclosure, consent flow, or privacy notice. In a voice pipeline, silent transmission of real-time audio off-device can expose sensitive conversations, credentials, or meeting content to external processors, creating material privacy and compliance risk.
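
The missing consent flow could take the form of a gate that sits between capture and the network and drops audio until the user has explicitly opted in. The following is a minimal sketch under stated assumptions: ConsentGate and its methods are hypothetical names, and the send callable stands in for whatever function the skill actually uses to stream audio to a provider.

```python
from typing import Callable


class ConsentGate:
    """Forward audio chunks only while consent is granted (hypothetical helper)."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send          # assumed: the function that transmits audio off-device
        self._consented = False    # default is closed: nothing leaves the device
        self.dropped = 0           # count of chunks withheld, useful for auditing

    def grant(self) -> None:
        self._consented = True

    def revoke(self) -> None:
        self._consented = False

    def forward(self, chunk: bytes) -> bool:
        """Send the chunk if consent is active; return whether it was sent."""
        if not self._consented:
            self.dropped += 1
            return False
        self._send(chunk)
        return True


if __name__ == "__main__":
    outbox = []
    gate = ConsentGate(outbox.append)
    gate.forward(b"before consent")   # dropped
    gate.grant()
    gate.forward(b"after consent")    # forwarded
    print(len(outbox), gate.dropped)
```

Defaulting to the closed state means a configuration mistake results in silence rather than unannounced transmission.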

Missing User Warnings

Medium

Confidence: 92%
Finding: The code sends arbitrary text content to third-party TTS providers (ElevenLabs or Cartesia) over WebSocket connections, but it contains no consent flow, warning, redaction, or policy gate before transmitting potentially sensitive content off-box. In a voice pipeline, queued text may include private user data, secrets, or internal prompts, so this creates a real privacy and data-handling risk even though the transport uses secure WebSockets.
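
A redaction step before text is queued for synthesis would address part of this finding. The sketch below is an assumption-laden illustration, not the skill's code: the redact function and the three patterns (email addresses, key-like tokens with an sk-, pk-, or key- prefix, and long digit runs) are hypothetical and would need tuning for real data.

```python
import re

# Illustrative patterns only; real deployments need a reviewed, tested list.
_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[email]"),
    (re.compile(r"\b(?:sk|pk|key)[-_][A-Za-z0-9]{16,}\b"), "[secret]"),
    (re.compile(r"\b\d{13,16}\b"), "[number]"),
]


def redact(text: str) -> str:
    """Replace matches of each pattern with a placeholder before text leaves the device."""
    for pattern, label in _PATTERNS:
        text = pattern.sub(label, text)
    return text


if __name__ == "__main__":
    print(redact("mail me at a.b@example.com"))
```

Pattern-based redaction catches only known shapes of sensitive data, so it complements rather than replaces a consent or policy gate.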

VirusTotal

65/65 vendors flagged this skill as clean.
