Voxtral Speech

@sliekens/voxtral

OpenClaw plugin that adds Voxtral text-to-speech and real-time speech-to-text via the Mistral API.

| Capability | Config section | Provider ID | Default model |
| --- | --- | --- | --- |
| Speech (TTS) | messages.tts | voxtral | voxtral-mini-tts-2603 |
| Realtime transcription (STT) | plugins.entries.voice-call.config.streaming | voxtral | voxtral-mini-transcribe-realtime-2602 |

Installation

openclaw plugins install @sliekens/voxtral

API key

The plugin uses your Mistral API key. There are three ways to provide it, in order of priority:

1. Already have the Mistral core plugin configured?

Nothing to do. The Voxtral plugin automatically reuses those Mistral credentials.

2. Environment variable

export MISTRAL_API_KEY=your-key-here

3. Explicit per-provider key in openclaw.json

{
  "messages": {
    "tts": {
      "providers": {
        "voxtral": {
          "apiKey": "your-key-here"
        }
      }
    }
  },
  "plugins": {
    "entries": {
      "voice-call": {
        "config": {
          "streaming": {
            "providers": {
              "voxtral": {
                "apiKey": "your-key-here"
              }
            }
          }
        }
      }
    }
  }
}

Configuration

Enable Voxtral TTS

{
  "messages": {
    "tts": {
      "provider": "voxtral"
    }
  }
}

Set a default voice:

{
  "messages": {
    "tts": {
      "provider": "voxtral",
      "providers": {
        "voxtral": {
          "voice": "your-voice-id"
        }
      }
    }
  }
}

Enable Voxtral STT

{
  "plugins": {
    "entries": {
      "voice-call": {
        "config": {
          "streaming": {
            "provider": "voxtral"
          }
        }
      }
    }
  }
}

Full example

{
  "messages": {
    "tts": {
      "provider": "voxtral",
      "providers": {
        "voxtral": {
          "apiKey": "your-key-here",
          "voice": "your-voice-id",
          "model": "voxtral-mini-tts-2603"
        }
      }
    }
  },
  "plugins": {
    "entries": {
      "voice-call": {
        "config": {
          "streaming": {
            "provider": "voxtral",
            "providers": {
              "voxtral": {
                "apiKey": "your-key-here",
                "model": "voxtral-mini-transcribe-realtime-2602"
              }
            }
          }
        }
      }
    }
  }
}

Directive tokens

Override the voice or model for a specific message using inline directive tokens:

[voxtral_voice: your-voice-id] Hello, this uses a specific voice.
[voxtral_model: voxtral-mini-tts-2603] Hello, this uses a specific model.
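
To make the token format concrete, here is a minimal TypeScript sketch of how such directives could be pulled out of a message. It is not the plugin's actual parser: the function name `parseVoxtralDirectives`, the return shape, and the choice to strip the tokens from the spoken text are assumptions made for illustration. Only the `[voxtral_voice: ...]` and `[voxtral_model: ...]` syntax comes from the examples above.

```typescript
// Hypothetical helper for illustration; not part of @sliekens/voxtral.
interface VoxtralDirectives {
  voice?: string; // value of [voxtral_voice: ...], if present
  model?: string; // value of [voxtral_model: ...], if present
  text: string;   // message text with the directive tokens removed
}

function parseVoxtralDirectives(message: string): VoxtralDirectives {
  const result: VoxtralDirectives = { text: message };
  // Matches [voxtral_voice: value] or [voxtral_model: value].
  const pattern = /\[voxtral_(voice|model):\s*([^\]]+?)\s*\]/g;

  result.text = message
    .replace(pattern, (_match: string, key: string, value: string) => {
      if (key === "voice") result.voice = value;
      else result.model = value;
      return ""; // drop the token from the text that would be spoken
    })
    .trim();

  return result;
}

const parsed = parseVoxtralDirectives(
  "[voxtral_voice: your-voice-id] Hello, this uses a specific voice."
);
console.log(parsed.voice); // your-voice-id
console.log(parsed.text);  // Hello, this uses a specific voice.
```

A message without any directive token passes through unchanged, with `voice` and `model` left undefined, so the configured defaults would apply.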

Advanced options

| Option | Where | Default | Description |
| --- | --- | --- | --- |
| apiKey | TTS + STT provider config |  | Mistral API key |
| baseUrl | TTS + STT provider config | https://api.mistral.ai/v1 | Custom API endpoint |
| model | TTS + STT provider config | see above | Override the default model |
| voice | TTS provider config |  | Default voice ID |
| sampleRate | STT provider config | 8000 | Audio sample rate in Hz (e.g. 8000, 16000) |
| silenceDurationMs | STT provider config | 800 | Milliseconds of silence before flushing the current utterance |

You can also set the base URL through environment variables, one for TTS and one for STT:

export MISTRAL_TTS_BASE_URL=https://your-endpoint/v1
export MISTRAL_STT_BASE_URL=https://your-endpoint/v1
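
As a rough illustration of how the STT options in the table relate to each other, the TypeScript sketch below fills in the documented defaults and converts `silenceDurationMs` into a sample count at the configured `sampleRate`. The names `SttOptions`, `STT_DEFAULTS`, `withSttDefaults`, and `silenceSamples` are hypothetical, and the plugin may measure silence differently. Only the option names and default values are taken from the table.

```typescript
// Hypothetical types and helpers for illustration; not the plugin's API.
interface SttOptions {
  apiKey?: string;
  baseUrl?: string;
  model?: string;
  sampleRate?: number;        // Hz
  silenceDurationMs?: number; // ms of silence before an utterance is flushed
}

// Defaults as listed in the options table.
const STT_DEFAULTS = {
  baseUrl: "https://api.mistral.ai/v1",
  model: "voxtral-mini-transcribe-realtime-2602",
  sampleRate: 8000,
  silenceDurationMs: 800,
};

function withSttDefaults(options: SttOptions) {
  // Explicit values win; anything left out falls back to the default.
  return { ...STT_DEFAULTS, ...options };
}

// Number of consecutive audio samples that make up the silence window.
function silenceSamples(sampleRate: number, silenceDurationMs: number): number {
  return Math.round((sampleRate * silenceDurationMs) / 1000);
}

const stt = withSttDefaults({ sampleRate: 16000 });
console.log(silenceSamples(stt.sampleRate, stt.silenceDurationMs)); // 12800
```

With the defaults (8000 Hz, 800 ms) the same calculation gives 6400 samples, so raising `sampleRate` without changing `silenceDurationMs` keeps the silence window the same length in time.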

Release notes

1.0.3

  • Fix compatibility with newer OpenClaw versions
    • By prebuilding index.ts to dist/index.js and including it in the ClawHub package

1.0.2

  • Fix compatibility with newer OpenClaw versions
    • By adding missing activation.onStartup field to openclaw.plugin.json

1.0.1

  • Fix compatibility with newer OpenClaw versions
    • By replacing deprecated providerAuthEnvVars with setup.providers[].envVars
  • Add separate CHANGELOG.md file

1.0.0

  • Initial release
  • Supports Voxtral TTS and real-time STT via the Mistral API
  • Supports reusing the same Mistral API key as the core plugin, or specifying separate keys for TTS and STT
  • Supports configurable voices and models (currently one TTS model and one STT model; more may be added in the future)
  • Supports directive tokens to override the voice or model on a per-message basis
  • Configurable sample rate and silence duration for STT
  • Integrates with the OpenClaw messages and voice call plugins
  • Integrates with Twilio via voice call plugin for real-time transcription of phone calls