MAI-Voice-1
Synthesize speech with Azure AI Speech using Microsoft's MAI-Voice-1 voices.
Quick start
{baseDir}/scripts/speak.sh --text "Hello Steve"
Defaults:
- Voice: en-us-Jasper:MAI-Voice-1
- Output: ./mai-voice.mp3
- Format: audio-24khz-160kbitrate-mono-mp3
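To sanity-check the result, play the generated file with any audio player, for example (assuming FFmpeg's ffplay is installed):
ffplay -autoexit -nodisp ./mai-voice.mp3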
Useful flags
{baseDir}/scripts/speak.sh --text "Hello Steve" --voice en-us-Iris:MAI-Voice-1 --out /tmp/iris.mp3
{baseDir}/scripts/speak.sh --text-file /tmp/input.txt --voice en-us-June:MAI-Voice-1 --style empathy --out /tmp/june.mp3
{baseDir}/scripts/speak.sh --text "Let's go" --voice en-us-Jasper:MAI-Voice-1 --style excitement
{baseDir}/scripts/speak.sh --list-voices
Required env vars
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"
export AZURE_SPEECH_REGION="eastus"
How to get the API key and region
- Go to the Azure portal and open your Speech or Foundry Speech resource.
- Open Keys and Endpoint.
- Copy one of the resource keys.
- Copy the resource region, for example eastus.
- Export them:
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"
export AZURE_SPEECH_REGION="eastus"
The MAI-Voice docs currently point at East US for preview access, so if you are not using eastus, double-check that your resource actually supports the model before blaming the script.
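If you would rather stay in the terminal, the key and region can also be pulled with the Azure CLI. A minimal sketch, assuming a resource named my-speech in resource group my-rg (both placeholders):
# print the primary key (placeholders: my-speech, my-rg)
az cognitiveservices account keys list --name my-speech --resource-group my-rg --query key1 -o tsv
# print the region string, e.g. eastus
az cognitiveservices account show --name my-speech --resource-group my-rg --query location -o tsv
Export the first value as AZURE_SPEECH_KEY and the second as AZURE_SPEECH_REGION.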
Optional:
export AZURE_SPEECH_OUTPUT_FORMAT="audio-24khz-160kbitrate-mono-mp3"
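Any other Azure Speech output format name works here too; for example, to get uncompressed WAV instead of MP3 (format name from the Azure Speech output-format list):
export AZURE_SPEECH_OUTPUT_FORMAT="riff-24khz-16bit-mono-pcm"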
Supported voices
en-us-Jasper:MAI-Voice-1
en-us-June:MAI-Voice-1
en-us-Grant:MAI-Voice-1
en-us-Iris:MAI-Voice-1
en-us-Reed:MAI-Voice-1
en-us-Joy:MAI-Voice-1
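To confirm which of these your resource actually exposes, --list-voices does the job; the manual equivalent is the Speech voices list endpoint. A sketch assuming jq is installed and that the MAI voices carry "MAI" in their ShortName:
# list all voices for the region, keep only the MAI ones
curl -s "https://${AZURE_SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/voices/list" \
  -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \
  | jq -r '.[] | select(.ShortName | test("MAI")) | .ShortName'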
API shape
The script calls:
POST https://{AZURE_SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/v1
Headers:
Ocp-Apim-Subscription-Key: {AZURE_SPEECH_KEY}
Content-Type: application/ssml+xml
X-Microsoft-OutputFormat: {format}
User-Agent: curl
Body:
- SSML with a MAI voice name
- an optional mstts:express-as style wrapper
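Put together, a hand-rolled request equivalent to what the script sends looks roughly like this (voice, style, and text reused from the examples above; not a verbatim copy of the script):
# synthesize "Let's go" with Jasper in the excitement style, write MP3 to ./mai-voice.mp3
curl -sS -X POST "https://${AZURE_SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/v1" \
  -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \
  -H "Content-Type: application/ssml+xml" \
  -H "X-Microsoft-OutputFormat: audio-24khz-160kbitrate-mono-mp3" \
  -H "User-Agent: curl" \
  --data "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts'><voice name='en-us-Jasper:MAI-Voice-1'><mstts:express-as style='excitement'>Let's go</mstts:express-as></voice></speak>" \
  --output ./mai-voice.mp3
A 200 response carries the raw audio bytes; anything else usually means the key, region, or SSML is off.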
Notes
- This uses the Azure Speech REST API, not the Python SDK.
- Voice selection is configurable via --voice.
- Style is optional and applied via the SSML mstts:express-as element.
- MAI-Voice-1 is currently in public preview.