voice

Security checks across malware telemetry and agentic risk

Overview

This Discord voice skill performs its stated voice-assistant function, but it needs review: speech in the channel can drive the full agent toolset, and some audio and text is routed through SkillBoss rather than through the providers that users may expect from the provider names.

Install only if you are comfortable with Discord voice audio and generated text leaving your environment for configured STT/TTS services, including SkillBoss relay paths. Restrict allowedUsers, avoid autoJoinChannel unless participants are clearly notified, and treat spoken prompts as capable of driving the same tools your OpenClaw agent can use.
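
The two settings named above can be checked before deployment. The following is a minimal sketch, assuming a config object that exposes `allowedUsers` and `autoJoinChannel`; the `VoiceConfig` interface, the `auditVoiceConfig` helper, and the channel ID are hypothetical and are not part of the skill.

```typescript
// Hypothetical config shape: only `allowedUsers` and `autoJoinChannel`
// are named in this report; the types are assumptions for illustration.
interface VoiceConfig {
  allowedUsers: string[];
  autoJoinChannel?: string;
}

// Returns one warning per setting that this report flags as risky.
function auditVoiceConfig(config: VoiceConfig): string[] {
  const warnings: string[] = [];
  if (config.allowedUsers.length === 0) {
    warnings.push("allowedUsers is empty: any speaker in the channel can prompt the agent");
  }
  if (config.autoJoinChannel) {
    warnings.push("autoJoinChannel is set: the bot may begin listening at startup");
  }
  return warnings;
}

// Example: a config with both risky settings (placeholder channel ID).
const configWarnings = auditVoiceConfig({ allowedUsers: [], autoJoinChannel: "000000000000000000" });
for (const w of configWarnings) {
  console.warn(w);
}
```

A check like this could run at startup and refuse to continue until an operator acknowledges each warning.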

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (14)

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The plugin explicitly tells the embedded agent, for untrusted voice input from anyone in the joined channel, that it has access to all normal tools and skills. That creates a privilege-boundary failure: spoken content can indirectly drive unrelated capabilities such as file, network, or other high-impact tools, greatly expanding the blast radius beyond simple voice interaction.
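
One way to narrow this blast radius is to give voice-originated turns a smaller tool set than trusted text turns. The sketch below assumes the host can filter tools per request; the `Source` type, the tool names, and `toolsForSource` are hypothetical and do not reflect the OpenClaw API.

```typescript
// Hypothetical request origin and tool names, for illustration only.
type Source = "voice" | "text";

// Tools considered low-impact enough for untrusted spoken input (assumed list).
const VOICE_SAFE_TOOLS: ReadonlySet<string> = new Set(["get_time", "web_search"]);

// Voice turns get only allowlisted tools; other sources keep the full set.
function toolsForSource(source: Source, allTools: string[]): string[] {
  if (source === "voice") {
    return allTools.filter((name) => VOICE_SAFE_TOOLS.has(name));
  }
  return allTools;
}

const voiceTools = toolsForSource("voice", ["get_time", "read_file", "shell"]);
console.log(voiceTools); // ["get_time"]
```

The design choice here is an allowlist rather than a denylist, so a newly installed tool stays unavailable to spoken prompts until someone adds it deliberately.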

Context-Inappropriate Capability

Medium

Confidence: 86%
Finding: The code explicitly enables local model loading via @xenova/transformers, which broadens the skill's file access behavior beyond simple Discord voice handling. If model identifiers or paths are user- or config-controlled, this can cause unintended local file access and loading of untrusted artifacts from disk.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The Whisper/OpenAI/Deepgram providers send captured voice data to a third-party relay endpoint at api.skillbossai.com rather than directly to the named vendors. This creates an undisclosed external data-sharing path for potentially sensitive voice content and changes the trust boundary in a way users and integrators may not expect.
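
A reviewer can make this mismatch visible by comparing each provider's configured endpoint with the host that its name implies. The following is a rough sketch: `EXPECTED_HOSTS` and `isExpectedEndpoint` are hypothetical helpers, and the host list is an assumption that should be verified against each vendor's current documentation.

```typescript
// Assumed mapping from provider label to the vendor's own API host.
const EXPECTED_HOSTS: Record<string, string[]> = {
  openai: ["api.openai.com"],
  deepgram: ["api.deepgram.com"],
};

// True only when the endpoint's hostname is one the provider label implies.
function isExpectedEndpoint(provider: string, endpoint: string): boolean {
  const allowed = EXPECTED_HOSTS[provider] ?? [];
  try {
    return allowed.includes(new URL(endpoint).hostname);
  } catch {
    return false; // an unparseable URL is treated as unexpected
  }
}

console.log(isExpectedEndpoint("openai", "https://api.openai.com/v1/audio/transcriptions")); // true
console.log(isExpectedEndpoint("openai", "https://api.skillbossai.com/")); // false
```

Comparing the parsed hostname, rather than searching the URL string for a substring, avoids false matches on hosts that merely contain a vendor name.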

Missing User Warnings

Medium

Confidence: 89%
Finding: The README clearly describes capturing live voice, transcribing it, sending it to third-party STT/TTS providers, and routing content through an agent, but it does not include an explicit privacy notice or consent warning for users in the voice channel. In a real-time Discord setting, this can lead to unintentional collection and external sharing of participants' speech, increasing privacy, compliance, and trust risks.

Missing User Warnings

Medium

Confidence: 91%
Finding: The documented autoJoinChannel feature can cause the bot to enter a voice channel and begin monitoring for speech automatically, yet the README does not prominently warn administrators or users that listening may start on startup. This creates a meaningful risk of covert or unexpected audio capture in shared channels, especially if operators assume the bot is passive until manually invoked.

Missing User Warnings

Medium

Confidence: 95%
Finding: This plugin captures live voice, transcribes it, sends the transcript to the agent, and may send audio/text to third-party STT/TTS providers, but the feature description does not prominently warn users about this data flow. In a voice setting, lack of explicit disclosure increases the risk of unintended collection and external sharing of sensitive spoken content.

Missing User Warnings

Medium

Confidence: 97%
Finding: The documented default `allowedUsers: []` means all users may interact, and in a voice-channel context that can expose the bot to unauthorized prompting and capture/transcription of any speaker in the channel. Because the skill operates on ambient voice activity, the context makes this more dangerous than a normal text-only plugin: bystanders can be recorded or influence agent behavior without an explicit allowlist.
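
The difference between the documented default and a fail-closed alternative can be sketched in a few lines. Both functions below are hypothetical: `isAllowedPermissive` models the behavior this finding describes, not the skill's actual code, and `isAllowedStrict` is one possible hardening.

```typescript
// Models the documented default: an empty list admits every speaker.
function isAllowedPermissive(userId: string, allowedUsers: readonly string[]): boolean {
  return allowedUsers.length === 0 || allowedUsers.includes(userId);
}

// Fail-closed alternative: an empty list admits nobody.
function isAllowedStrict(userId: string, allowedUsers: readonly string[]): boolean {
  return allowedUsers.includes(userId);
}

// Placeholder user ID for a bystander who is not on any list.
const bystander = "111111111111111111";
console.log(isAllowedPermissive(bystander, [])); // true
console.log(isAllowedStrict(bystander, [])); // false
```

With the strict variant, an operator who forgets to populate the list ends up with a bot that ignores everyone, which is a safer failure than one that listens to everyone.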

Missing User Warnings

Medium

Confidence: 87%
Finding: The manifest advertises real-time voice conversations and multiple third-party STT/TTS backends, but it does not clearly warn that user audio and generated speech may be transmitted to external services such as OpenAI, Deepgram, ElevenLabs, or AWS Polly. In a voice-channel context, lack of explicit disclosure and consent can lead to privacy violations, accidental capture of sensitive spoken data, and deployment in environments with regulatory or policy constraints.

Missing User Warnings

Medium

Confidence: 77%
Finding: User speech is transcribed and processed through external STT/LLM/TTS providers, but this file does not provide an explicit in-band notice or consent step to channel participants at the time audio is processed. In a voice-channel context, that can expose sensitive spoken content to third parties without participants realizing their audio is being sent off-platform.
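
An in-band consent step could take the form of a per-user registry that is consulted before any audio leaves the process. This is a minimal sketch under that assumption; the `ConsentRegistry` class and its method names are hypothetical, and how consent is actually collected (for example, a text command or a spoken confirmation) is left open.

```typescript
// Tracks which channel participants have agreed to off-platform processing.
class ConsentRegistry {
  private readonly consented = new Set<string>();

  grant(userId: string): void {
    this.consented.add(userId);
  }

  revoke(userId: string): void {
    this.consented.delete(userId);
  }

  // Audio from a speaker should be forwarded only when this returns true.
  mayProcess(userId: string): boolean {
    return this.consented.has(userId);
  }
}

const registry = new ConsentRegistry();
registry.grant("alice");
console.log(registry.mayProcess("alice")); // true
console.log(registry.mayProcess("bob")); // false
```

Checking the registry at the point where audio is handed to an STT provider keeps the gate close to the transmission itself, so it cannot be bypassed by an earlier code path.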

Missing User Warnings

Medium

Confidence: 93%
Finding: The manifest explicitly supports multiple cloud STT/TTS providers and requests related API keys, which strongly implies users' voice/audio, transcripts, and generated speech may be sent to third-party services. Failing to disclose this data flow is a real privacy/security issue because operators may enable the plugin without understanding that sensitive voice content can leave their environment and be processed under external vendors' retention and logging policies.

Missing User Warnings

Medium

Confidence: 86%
Finding: The function sends arbitrary input text to an external service at api.skillbossai.com for speech synthesis. In a voice-assistant skill, that text may contain sensitive prompts, private user content, or server-specific information, so undisclosed third-party transmission creates a real privacy and data-governance risk.

Missing User Warnings

Medium

Confidence: 87%
Finding: This second provider also transmits user-supplied text, plus a voice identifier, to the same external SkillBoss endpoint. Because the provider is labeled as ElevenLabs, the mismatch can further obscure where sensitive data is actually processed, increasing privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 88%
Finding: This code sends captured Discord voice audio to configurable third-party STT providers, but the file shows no user-facing notice, consent gate, or policy enforcement at the transmission points. In a real-time voice skill, that creates a privacy/security risk because users may be recorded and their speech exported off-platform without clear awareness, potentially exposing sensitive conversations to external services.

Missing User Warnings

Medium

Confidence: 85%
Finding: The skill transmits generated response text to external TTS providers without any visible disclosure or consent control at the call sites. While less sensitive than raw microphone audio, responses may still contain personal, confidential, or server-specific data derived from user speech and agent context, so silent export to third parties is a legitimate privacy issue.

VirusTotal

67/67 vendors flagged this skill as clean.
