Discord Voice

Security checks across malware telemetry and agentic risk

Overview

The Discord voice feature is coherent, but unless it is tightly restricted, its defaults let anyone in a joined voice channel talk to a full-capability agent and leave behind persistent context that is shared across the server.

Before installing, plan to use this only in trusted Discord servers: set allowedUsers instead of leaving it empty, avoid auto-joining public channels, prefer local STT/TTS for private speech, keep TLS verification enabled, and use limited-scope Discord/API credentials.

VirusTotal

64/64 vendors rated this skill as clean.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

ASI01: Agent Goal Hijack
High
What this means

People in the Discord voice channel could verbally steer the agent, not just the installer or an explicitly trusted user.

Why it was flagged

The code acknowledges that, with the default empty allowlist, any user in joined Discord voice channels can interact with the bot and trigger API calls, while transcribed speech is routed to the agent.

Skill content
Handle transcribed speech - route to agent and get response ... No allowedUsers configured — all users in joined channels can interact with the bot and trigger API calls.
Recommendation

Set allowedUsers to explicit trusted Discord user IDs, avoid public voice channels, and require confirmation or a restricted mode for voice-origin instructions.
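
The allowlist recommendation can be sketched as a fail-closed check. This is a minimal illustration, not the plugin's actual code: only the `allowedUsers` name comes from the skill, while the `VoiceAccessConfig` shape and the `isSpeakerAllowed` and `filterTranscript` helpers are hypothetical.

```typescript
// Hypothetical config shape; only the `allowedUsers` name comes from the skill.
interface VoiceAccessConfig {
  allowedUsers: string[]; // trusted Discord user IDs, kept as strings
}

// Fail closed: an empty allowlist admits nobody, rather than everybody.
function isSpeakerAllowed(config: VoiceAccessConfig, userId: string): boolean {
  return config.allowedUsers.indexOf(userId) !== -1;
}

// Drop speech from unlisted speakers before it is routed to the agent.
// Returns the transcript to forward, or null to discard it.
function filterTranscript(
  config: VoiceAccessConfig,
  userId: string,
  transcript: string
): string | null {
  return isSpeakerAllowed(config, userId) ? transcript : null;
}
```

The design choice worth noting is the empty-list behavior: in this sketch an empty `allowedUsers` blocks every speaker, which is the opposite of the default flagged above.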

ASI02: Tool Misuse and Exploitation
High
What this means

A spoken prompt could cause the agent to use broader tools or take actions beyond a simple voice reply.

Why it was flagged

The changelog indicates the voice-routed agent is not confined to a restricted lane, creating a broad tool-use surface for prompts originating from Discord voice.

Skill content
Remove lane restriction to allow full tool access
Recommendation

Use a limited tool lane for Discord voice, require approval for high-impact tools, and separate casual voice chat from administrative agent capabilities.
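
One way to picture a limited tool lane with approval gating is the sketch below. Every lane name, tool name, and type in it is hypothetical and chosen only for illustration; the skill's real lane mechanism is not shown in the scanned excerpt.

```typescript
// All lane and tool names here are hypothetical, chosen for illustration.
type Lane = "voice" | "admin";
type Decision = "allow" | "needs_approval" | "deny";

interface ToolRequest {
  lane: Lane;        // where the instruction originated
  tool: string;      // tool the agent wants to call
  approved: boolean; // whether a trusted operator confirmed this call
}

const VOICE_LANE_TOOLS: string[] = ["reply", "web_search"];
const HIGH_IMPACT_TOOLS: string[] = ["shell", "file_write"];

function decideToolCall(req: ToolRequest): Decision {
  // Voice-origin requests may only use the small voice-lane set.
  if (req.lane === "voice" && VOICE_LANE_TOOLS.indexOf(req.tool) === -1) {
    return "deny";
  }
  // High-impact tools always need explicit approval, in any lane.
  if (HIGH_IMPACT_TOOLS.indexOf(req.tool) !== -1 && !req.approved) {
    return "needs_approval";
  }
  return "allow";
}
```

Under these assumptions, a voice-origin request for `shell` is denied outright even if approved, while the same request from the administrative lane is held until someone confirms it.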

ASI06: Memory and Context Poisoning
Medium
What this means

One person’s spoken instructions could affect later conversations in the same Discord server.

Why it was flagged

The voice session is keyed at the guild level and saved, so multiple users in the same guild can share and persist context rather than being isolated per user or channel.

Skill content
const sessionKey = `discord:voice:${guildId}`; ... sessionStore[sessionKey] = sessionEntry; await deps.saveSessionStore(storePath, sessionStore);
Recommendation

Use per-user or per-channel sessions, add expiration/reset controls, and clear sessions after untrusted or public-channel use.
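
A narrower session key and an expiry sweep might look like the following sketch. It extends the `discord:voice:${guildId}` key format quoted above; the `SessionEntry` shape, the `updatedAt` field, and both function names are assumptions, since the real session entry is not shown in the excerpt.

```typescript
// Hypothetical session record; the real entry shape is not shown in the excerpt.
interface SessionEntry {
  updatedAt: number; // epoch milliseconds of the last activity
}

// Scope the key to guild, channel, and user instead of the guild alone.
function voiceSessionKey(guildId: string, channelId: string, userId: string): string {
  return `discord:voice:${guildId}:${channelId}:${userId}`;
}

// Delete entries idle for longer than ttlMs; returns the removed keys.
function pruneExpiredSessions(
  store: Record<string, SessionEntry>,
  now: number,
  ttlMs: number
): string[] {
  const removed: string[] = [];
  for (const key of Object.keys(store)) {
    const entry = store[key];
    if (entry !== undefined && now - entry.updatedAt > ttlMs) {
      delete store[key];
      removed.push(key);
    }
  }
  return removed;
}
```

A caller would run the sweep before saving the store, so that context left by one speaker in a public channel does not carry over indefinitely.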

ASI03: Identity and Privilege Abuse
Low
What this means

Compromised or over-permissive tokens could allow bot misuse or unexpected provider charges.

Why it was flagged

The plugin requires a Discord bot token and can use provider API keys; this is expected for Discord voice/STT/TTS integration but grants meaningful account and billing authority.

Skill content
"discord.token": { "required": true ... } ... "OPENAI_API_KEY" ... "ELEVENLABS_API_KEY" ... "DEEPGRAM_API_KEY"
Recommendation

Use least-privilege Discord bot permissions, restrict provider keys, keep keys out of shared configs, and rotate keys if exposed.

ASI07: Insecure Inter-Agent Communication
Low
What this means

Voice content may leave Discord/OpenClaw and be processed by the selected speech providers.

Why it was flagged

Spoken audio/transcripts and response text may be sent to configured third-party STT/TTS providers; this is disclosed and purpose-aligned but involves sensitive voice data.

Skill content
Speech-to-Text: Whisper API (OpenAI), Deepgram, or Local Whisper ... Text-to-Speech: OpenAI TTS, ElevenLabs, Deepgram Aura, Amazon Polly, Edge TTS
Recommendation

Choose local providers for private conversations, review provider retention policies, and avoid using the bot in channels where sensitive speech may be captured.

ASI10: Rogue Agents
Low
What this means

If configured, the bot may join and remain active in a voice channel without a fresh manual join command each time.

Why it was flagged

The plugin can maintain or restore a voice connection and optionally auto-join on startup; this is disclosed and aligned with a voice bot, but it is persistent behavior users should control.

Skill content
Auto-reconnect: Automatic heartbeat monitoring and reconnection on disconnect ... autoJoinChannel ... Channel ID to auto-join on startup
Recommendation

Enable autoJoinChannel only for trusted private channels and confirm the bot leaves when voice use is finished.
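
The auto-join advice could be enforced with a guard like this sketch. `autoJoinChannel` is the option named by the skill; the `trustedChannels` list, the `AutoJoinConfig` shape, and both helper functions are hypothetical additions, not part of the plugin.

```typescript
// `autoJoinChannel` is the option named by the skill; `trustedChannels`
// and both helpers are hypothetical additions for illustration.
interface AutoJoinConfig {
  autoJoinChannel?: string;
  trustedChannels: string[];
}

// Return the channel to join on startup, or null to stay out.
function resolveAutoJoin(config: AutoJoinConfig): string | null {
  const target = config.autoJoinChannel;
  if (target === undefined || target === "") return null;
  return config.trustedChannels.indexOf(target) !== -1 ? target : null;
}

// Decide whether the bot should leave after a period without speech.
function shouldLeave(lastSpeechAt: number, now: number, idleMs: number): boolean {
  return now - lastSpeechAt >= idleMs;
}
```

With this guard, a configured channel that is missing from the trusted list is simply ignored at startup, and an idle check gives the operator a hook for leaving once voice use has finished.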

ASI04: Agentic Supply Chain Vulnerabilities
Low
What this means

Installing dependencies runs the normal npm supply chain for this plugin.

Why it was flagged

The skill relies on local npm dependency installation even though the registry says there is no install spec; package-lock is present, but users still need to trust the package source and dependency chain.

Skill content
cd ~/.openclaw/extensions/discord-voice && npm install
Recommendation

Install from the intended repository/package, review package-lock changes, and avoid running npm install from an untrusted clone.