Discord Voice Using Deepgram

Voice-channel conversations in Discord using Deepgram streaming STT + low-latency TTS

MIT-0 · Free to use, modify, and redistribute. No attribution required.
5 · 1.1k · 1 current installs · 1 all-time installs
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description match the code: this is a Discord voice plugin that uses Deepgram for STT/TTS and routes transcripts to the agent. However, the registry metadata listed no required env vars while the SKILL.md and code expect a Discord token and a Deepgram API key (DISCORD_TOKEN / DEEPGRAM_API_KEY or config.deepgram.apiKey). That mismatch is an inconsistency to be aware of.
Instruction Scope
The SKILL.md and code instruct the plugin to join voice channels, stream audio to Deepgram, and forward transcripts to the embedded agent. The code builds an extraSystemPrompt and calls runEmbeddedPiAgent (the agent is told it has access to its normal tools/skills and the user's Discord ID). The plugin also reads/writes the session store and agent workspace via core-bridge. Those actions go beyond simple STT/TTS plumbing because they give the invoked agent contextual info and access to its usual toolset and persisted session data — a potential surprise/privilege escalation if you weren't expecting that.
Install Mechanism
This is effectively an instruction-plus-source package (package.json present). There's no packaged install spec in the registry, so install is manual via npm (npm install). Dependencies are standard npm packages (discord.js, @discordjs/voice, ws, etc.) from normal registries — no obscure download URLs or extract steps were found.
Credentials
The plugin legitimately needs a Discord bot token and a Deepgram API key. The code reads Deepgram keys from config or environment and attempts to get the Discord token from the host OpenClaw/Clawdbot main config (mainConfig.channels.discord.token or mainConfig.discord.token) rather than directly requiring an env var. This is plausible but should be called out: the plugin expects access to your platform's Discord token storage and may also read Deepgram keys from env/config, so credential placement matters.
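The lookup described above can be sketched roughly as follows. This is not the plugin's code: only the property paths (mainConfig.channels.discord.token, mainConfig.discord.token, config.deepgram.apiKey) and the two environment variable names come from this page. The interface names, the helper functions, the precedence order, and the DISCORD_TOKEN fallback are assumptions made for illustration.

```typescript
// Hypothetical shapes; only the property paths named in the scan report
// are taken from the source.
interface MainConfig {
  channels?: { discord?: { token?: string } };
  discord?: { token?: string };
}

interface PluginConfig {
  deepgram?: { apiKey?: string };
}

type Env = Record<string, string | undefined>;

interface Credentials {
  discordToken: string;
  deepgramApiKey: string;
}

// Return the first candidate that is a non-empty string after trimming.
function firstNonEmpty(...candidates: Array<string | undefined>): string | undefined {
  for (const c of candidates) {
    if (typeof c === "string" && c.trim() !== "") return c.trim();
  }
  return undefined;
}

// Assumed precedence: explicit config first, environment variable second.
function resolveCredentials(main: MainConfig, plugin: PluginConfig, env: Env): Credentials {
  const discordToken = firstNonEmpty(
    main.channels?.discord?.token,
    main.discord?.token,
    env["DISCORD_TOKEN"], // assumption: env fallback for the Discord token
  );
  const deepgramApiKey = firstNonEmpty(plugin.deepgram?.apiKey, env["DEEPGRAM_API_KEY"]);

  const missing: string[] = [];
  if (!discordToken) missing.push("Discord bot token");
  if (!deepgramApiKey) missing.push("Deepgram API key");
  if (!discordToken || !deepgramApiKey) {
    throw new Error(`Missing credentials: ${missing.join(", ")}`);
  }
  return { discordToken, deepgramApiKey };
}
```

Failing loudly when a credential is absent makes it obvious which storage location the plugin actually read from, which is the placement question raised above.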
Persistence & Privilege
The plugin loads Clawdbot core modules and uses them to resolve agent workspace, session store, and to run an embedded agent. It also creates/updates session entries (saving a session store). It intentionally removed a commented-out 'lane' restriction and passes an extra system prompt telling the agent it 'has access to all your normal tools and skills'. That combination (embedded agent invocation + persisted session state + broad tool access) increases the blast radius of voice-triggered operations and is not clearly surfaced in SKILL.md.
Scan Findings in Context
[system-prompt-override] unexpected: A pattern indicating system-prompt manipulation was flagged in the SKILL.md. The code does in fact construct and inject an extra system prompt into runEmbeddedPiAgent (giving the agent contextual instructions and the user's Discord ID). While this may be intended behavior for voice UX, it should be considered a prompt-injection risk and is not prominently documented in the SKILL.md frontmatter.
What to consider before installing
Before installing, be aware of these points:

  • Credentials: The plugin requires a Discord bot token and a Deepgram API key. Decide whether those keys will be stored in OpenClaw/Clawdbot config or environment variables; prefer least-privilege tokens for the Discord bot (only Connect/Speak/Voice Activity), and don't reuse high-privilege tokens.
  • Agent access: Voice input is forwarded into your embedded agent via runEmbeddedPiAgent. The plugin intentionally supplies an extra system prompt and does not enforce a restrictive 'lane', meaning the invoked agent may have access to its usual tools and persisted session data. If your agent has tools that can access external services or secrets, voice input could indirectly trigger them. If you don't want that, do not enable this plugin, or inspect/modify the runEmbeddedPiAgent call to restrict tool access.
  • Session persistence: Transcripts and session IDs are stored via the platform session store/workspace. If you handle sensitive conversations, verify where the session store is located and who can read it.
  • Test safely: Try this in a throwaway Discord server with a bot that has minimal permissions and with a non-production Deepgram key. Review and, if needed, modify the code to (a) explicitly restrict which tools the agent may use when invoked by voice, (b) avoid sending or persisting sensitive context, and (c) require an explicit opt-in to auto-join channels.
  • Source trust: If you lack trust in the source (homepage unknown, owner ID only), prefer official/verified plugins or conduct a code review. The behavior is plausible for the stated purpose, but the privilege surface (embedded agent invocation + persisted sessions + undocumented system-prompt injection) merits caution.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
Plugin bundle (nix)
Skill pack · CLI binary · Config
Config requirements
Required env: DISCORD_TOKEN, DEEPGRAM_API_KEY
latest: vk9795jzy8z7mn48vb8z3rywhn980ye27


Config example

Starter config for this plugin bundle.

{
  "plugins": {
    "entries": {
      "deepgram-discord-voice": {
        "enabled": true,
        "config": {
          "streamingSTT": true,
          "streamingTTS": true,
          "ttsVoice": "aura-2-thalia-en",
          "vadSensitivity": "medium",
          "bargeIn": true,

          "primaryUser": "atechy",
          "allowVoiceSwitch": true,
          "wakeWord": "openclaw",

          "deepgram": {
            "sttModel": "nova-2",
            "language": "en-US"
          }
        }
      }
    }
  }
}

SKILL.md

Deepgram Discord Voice (Clawdbot/OpenClaw Plugin)

This plugin lets you talk to your agent using only a Discord voice channel.

Pipeline (low latency):

  • Discord voice audio → Deepgram streaming STT (WebSocket)
  • Transcript → your agent
  • Agent reply → Deepgram TTS (/v1/speak streamed HTTP Ogg/Opus)
  • Audio played back into the voice channel
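As a rough illustration of the two Deepgram hops in this pipeline, the sketch below builds the streaming STT WebSocket URL and the TTS request. It is not taken from the plugin source. The helper names (toQuery, sttUrl, ttsRequest) are hypothetical, and the audio format sent to STT (16-bit linear PCM, 48 kHz, mono) is an assumption about how the Discord audio is decoded before streaming.

```typescript
const DEEPGRAM_HOST = "api.deepgram.com";

// Serialize key/value pairs into a URL query string.
function toQuery(params: Record<string, string | number | boolean>): string {
  return Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(String(params[k]))}`)
    .join("&");
}

// Streaming STT: a WebSocket to /v1/listen that receives raw audio frames.
// Assumption: audio is decoded to 48 kHz mono linear16 before being sent.
function sttUrl(model: string, language: string): string {
  return `wss://${DEEPGRAM_HOST}/v1/listen?` + toQuery({
    model,
    language,
    encoding: "linear16",
    sample_rate: 48000,
    channels: 1,
    interim_results: true,
  });
}

interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// TTS: POST the reply text to /v1/speak and request Ogg/Opus, which is the
// container/codec pairing the pipeline above describes.
function ttsRequest(apiKey: string, voice: string, text: string): HttpRequestSpec {
  return {
    method: "POST",
    url: `https://${DEEPGRAM_HOST}/v1/speak?` +
      toQuery({ model: voice, encoding: "opus", container: "ogg" }),
    headers: {
      Authorization: `Token ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text }),
  };
}
```

The functions only describe the requests; opening the WebSocket and streaming the HTTP response body into the voice connection are left to whichever client libraries the host uses.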

Requirements

  • A Discord bot token (DISCORD_TOKEN)
  • A Deepgram API key (DEEPGRAM_API_KEY)
  • Discord bot permissions in your server:
    • Connect
    • Speak
    • Use Voice Activity
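To grant exactly these three permissions, one option is to compute the permission integer for the bot invite link. The sketch below uses the bit positions from Discord's permission flags; the constant and function names are illustrative, and the client ID is a placeholder you would replace with your application's ID.

```typescript
// Discord permission bit flags.
const CONNECT = 1 << 20; // join voice channels
const SPEAK = 1 << 21;   // transmit audio
const USE_VAD = 1 << 25; // "Use Voice Activity"

// Combined bitfield for the three permissions listed above.
const VOICE_PERMISSIONS = CONNECT | SPEAK | USE_VAD;

// Build a bot invite URL that requests only the voice permissions.
function inviteUrl(clientId: string): string {
  return (
    `https://discord.com/oauth2/authorize?client_id=${encodeURIComponent(clientId)}` +
    `&scope=bot&permissions=${VOICE_PERMISSIONS}`
  );
}

// Check whether a granted bitfield covers every required voice permission.
function hasVoicePermissions(granted: number): boolean {
  return (granted & VOICE_PERMISSIONS) === VOICE_PERMISSIONS;
}
```

Requesting only this bitfield keeps the bot token low-privilege, in line with the least-privilege advice in the security notes.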

Install

Option A: Install from ClawHub

  1. In your OpenClaw/Clawdbot dashboard, open Skills/Plugins.
  2. Add/install deepgram-discord-voice.
  3. Set the required environment variables.

Option B: Manual install

  1. Copy this folder into your extensions/plugins directory.
  2. Run:
       npm install
  3. Restart OpenClaw/Clawdbot.

Configuration

Key settings

  • primaryUser (recommended): Who the bot listens to by default.

    • Best: your Discord user ID (numeric)
    • Also supported: username/display name (e.g., atechy) if unique in-channel
  • allowVoiceSwitch: If true, the primary user can use voice commands to change who the bot listens to.

  • wakeWord: Prefix for voice control commands. Default: openclaw.

  • deepgram.sttModel: Default nova-2.

  • deepgram.language: Optional BCP‑47 language tag (e.g., en-US, es, es-EC).

  • ttsVoice: Deepgram Aura voice model (e.g., aura-2-thalia-en).
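The settings above could be modeled with a typed config and a defaults helper, sketched below. Only the wakeWord and deepgram.sttModel defaults are stated in this document. Every other default here is copied from the example config or chosen conservatively, so treat those as assumptions; the type and function names are hypothetical.

```typescript
interface VoiceConfig {
  streamingSTT: boolean;
  streamingTTS: boolean;
  primaryUser: string | null; // Discord user ID (preferred) or a unique name
  allowVoiceSwitch: boolean;
  wakeWord: string;
  ttsVoice: string;
  vadSensitivity: string;
  bargeIn: boolean;
  deepgram: { sttModel: string; language: string | null };
}

// Every field is optional on input, including the nested deepgram block.
type VoiceConfigInput = Partial<Omit<VoiceConfig, "deepgram">> & {
  deepgram?: Partial<VoiceConfig["deepgram"]>;
};

function withDefaults(input: VoiceConfigInput): VoiceConfig {
  return {
    streamingSTT: input.streamingSTT ?? true,           // assumption (example value)
    streamingTTS: input.streamingTTS ?? true,           // assumption (example value)
    primaryUser: input.primaryUser ?? null,
    allowVoiceSwitch: input.allowVoiceSwitch ?? false,  // assumption: off unless enabled
    wakeWord: input.wakeWord ?? "openclaw",             // documented default
    ttsVoice: input.ttsVoice ?? "aura-2-thalia-en",     // assumption (example value)
    vadSensitivity: input.vadSensitivity ?? "medium",   // assumption (example value)
    bargeIn: input.bargeIn ?? true,                     // assumption (example value)
    deepgram: {
      sttModel: input.deepgram?.sttModel ?? "nova-2",   // documented default
      language: input.deepgram?.language ?? null,       // optional per the docs
    },
  };
}
```

For example, withDefaults({ primaryUser: "123456789012345678" }) fills in the wake word and STT model while leaving the language unset.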

Example config

{
  "plugins": {
    "entries": {
      "deepgram-discord-voice": {
        "enabled": true,
        "config": {
          "streamingSTT": true,
          "streamingTTS": true,

          "primaryUser": "atechy",
          "allowVoiceSwitch": true,
          "wakeWord": "openclaw",

          "ttsVoice": "aura-2-thalia-en",
          "vadSensitivity": "medium",
          "bargeIn": true,

          "deepgram": {
            "sttModel": "nova-2",
            "language": "en-US"
          }
        }
      }
    }
  }
}

Usage

Join a voice channel

Use the plugin tool or slash command (depends on your OpenClaw setup):

  • Join: action=join with the channelId
  • Leave: action=leave

Talk (voice channel)

Once the bot is connected, just speak.

Safeguard: only listen to you (default)

When primaryUser is set, the plugin will only listen to that user unless you allow someone else.
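A minimal sketch of how such a gate might work, assuming the configured primaryUser is first resolved to a Discord user ID. The Member shape and both function names are hypothetical. The rule that a name must match exactly one channel member follows the "if unique in-channel" note in the configuration section; treating an ambiguous name as unresolved is an assumption.

```typescript
interface Member {
  id: string;
  username: string;
  displayName: string;
}

// Resolve primaryUser to a user ID. Numeric values are treated as IDs.
// Names must match exactly one member (case-insensitive); otherwise null.
function resolvePrimaryUser(primaryUser: string, members: Member[]): string | null {
  const wanted = primaryUser.trim();
  if (/^\d+$/.test(wanted)) return wanted;

  const lower = wanted.toLowerCase();
  const matches = members.filter(
    (m) => m.username.toLowerCase() === lower || m.displayName.toLowerCase() === lower,
  );
  const [only] = matches;
  return matches.length === 1 && only ? only.id : null;
}

// Gate: accept audio only from the primary user or an explicitly allowed speaker.
// A null primaryId (unresolved name) means only the allowed set can be heard.
function shouldListen(
  speakerId: string,
  primaryId: string | null,
  allowed: ReadonlySet<string>,
): boolean {
  return speakerId === primaryId || allowed.has(speakerId);
}
```

This is why a numeric user ID is the recommended value: it cannot become ambiguous when someone with the same display name joins the channel.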

Let someone else talk (voice commands)

As the primary user, say:

  • openclaw allow <name>
  • openclaw listen to <name>

To lock it back:

  • openclaw only me
  • openclaw reset
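The voice commands above could be recognized in a transcript along these lines. This is a sketch, not the plugin's parser: the function and type names are hypothetical. It assumes that "allow" and "listen to" both grant access to the named user, that "only me" and "reset" both restore the default, and that the wake word is a single word.

```typescript
type VoiceCommand =
  | { kind: "allow"; name: string }
  | { kind: "only_me" };

// Parse a final transcript into a control command, or null if it is ordinary speech.
function parseVoiceCommand(transcript: string, wakeWord: string = "openclaw"): VoiceCommand | null {
  // Normalize: lowercase, replace punctuation the STT may add, split on whitespace.
  const words = transcript
    .toLowerCase()
    .replace(/[.,!?;:]/g, " ")
    .trim()
    .split(/\s+/);

  // Commands must start with the wake word.
  if (words[0] !== wakeWord.toLowerCase()) return null;

  const rest = words.slice(1);
  const phrase = rest.join(" ");

  if (phrase === "only me" || phrase === "reset") return { kind: "only_me" };
  if (rest[0] === "allow" && rest.length > 1) {
    return { kind: "allow", name: rest.slice(1).join(" ") };
  }
  if (rest[0] === "listen" && rest[1] === "to" && rest.length > 2) {
    return { kind: "allow", name: rest.slice(2).join(" ") };
  }
  return null;
}
```

A transcript such as "OpenClaw, allow Alice." yields an allow command for "alice", while speech without the wake word returns null and would be forwarded to the agent as normal input.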

Switch via tool actions (optional)

  • allow_speaker with user (id / @mention / name)
  • only_me
  • status
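One way to picture the tool surface is as a discriminated union over the action field. The sketch below covers the five actions named in this document (join, leave, allow_speaker, only_me, status). The parameter names channelId and user come from the text; the validation helpers and the user-reference normalizer are hypothetical, and the real tool schema may differ.

```typescript
type ToolAction =
  | { action: "join"; channelId: string }
  | { action: "leave" }
  | { action: "allow_speaker"; user: string }
  | { action: "only_me" }
  | { action: "status" };

type UserRef = { kind: "id"; id: string } | { kind: "name"; name: string };

// Read a required, non-empty string field from untyped input.
function requireString(input: Record<string, unknown>, key: string): string {
  const value = input[key];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`"${key}" must be a non-empty string`);
  }
  return value.trim();
}

// Validate untyped tool input and narrow it to a known action.
function parseToolAction(input: Record<string, unknown>): ToolAction {
  switch (input["action"]) {
    case "join":
      return { action: "join", channelId: requireString(input, "channelId") };
    case "leave":
      return { action: "leave" };
    case "allow_speaker":
      return { action: "allow_speaker", user: requireString(input, "user") };
    case "only_me":
      return { action: "only_me" };
    case "status":
      return { action: "status" };
    default:
      throw new Error(`Unknown action: ${String(input["action"])}`);
  }
}

// Normalize the three accepted forms of "user": raw ID, @mention, or name.
function normalizeUserRef(user: string): UserRef {
  const trimmed = user.trim();
  const mention = /^<@!?(\d+)>$/.exec(trimmed); // Discord mention: <@123> or <@!123>
  if (mention && mention[1]) return { kind: "id", id: mention[1] };
  if (/^\d+$/.test(trimmed)) return { kind: "id", id: trimmed };
  return { kind: "name", name: trimmed.replace(/^@/, "") };
}
```

Rejecting unknown actions and missing parameters up front keeps malformed tool calls from reaching the voice connection.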

Notes

  • Lowest latency comes from streamingSTT=true and streamingTTS=true.
  • Deepgram TTS is streamed over HTTP in Ogg/Opus so Discord can play it immediately.

Files

13 total
