MAI-Voice-1
Synthesize speech with Azure AI Speech using Microsoft's MAI-Voice-1 voices.
Quick start
{baseDir}/scripts/speak.sh --text "Hello Steve"
Defaults:
- Voice: en-us-Jasper:MAI-Voice-1
- Output: ./mai-voice.mp3
- Format: audio-24khz-160kbitrate-mono-mp3
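To sanity-check the result, play the generated file with any audio player, for example (assuming FFmpeg's ffplay is installed):
ffplay -autoexit -nodisp ./mai-voice.mp3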
Useful flags
{baseDir}/scripts/speak.sh --text "Hello Steve" --voice en-us-Iris:MAI-Voice-1 --out /tmp/iris.mp3
{baseDir}/scripts/speak.sh --text-file /tmp/input.txt --voice en-us-June:MAI-Voice-1 --style empathy --out /tmp/june.mp3
{baseDir}/scripts/speak.sh --text "Let's go" --voice en-us-Jasper:MAI-Voice-1 --style excitement
{baseDir}/scripts/speak.sh --list-voices
Required env vars
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"
export AZURE_SPEECH_REGION="eastus"
How to get the API key and region
- Go to the Azure portal and open your Speech or Foundry Speech resource.
- Open Keys and Endpoint.
- Copy one of the resource keys.
- Copy the resource region, for example eastus.
- Export them:
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"
export AZURE_SPEECH_REGION="eastus"
The MAI-Voice docs currently point at East US for preview access, so if you are not using eastus, double-check that your resource actually supports the model before blaming the script.
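If you would rather stay in the terminal, the key and region can also be pulled with the Azure CLI. A minimal sketch, assuming a resource named my-speech in resource group my-rg (both placeholders):
# print the primary key (placeholders: my-speech, my-rg)
az cognitiveservices account keys list --name my-speech --resource-group my-rg --query key1 -o tsv
# print the region string, e.g. eastus
az cognitiveservices account show --name my-speech --resource-group my-rg --query location -o tsv
Export the first value as AZURE_SPEECH_KEY and the second as AZURE_SPEECH_REGION.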
Optional:
export AZURE_SPEECH_OUTPUT_FORMAT="audio-24khz-160kbitrate-mono-mp3"
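Any other Azure Speech output format name works here too; for example, to get uncompressed WAV instead of MP3 (format name from the Azure Speech output-format list):
export AZURE_SPEECH_OUTPUT_FORMAT="riff-24khz-16bit-mono-pcm"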
Supported voices
en-us-Jasper:MAI-Voice-1
en-us-June:MAI-Voice-1
en-us-Grant:MAI-Voice-1
en-us-Iris:MAI-Voice-1
en-us-Reed:MAI-Voice-1
en-us-Joy:MAI-Voice-1
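To confirm which of these your resource actually exposes, --list-voices does the job; the manual equivalent is the Speech voices list endpoint. A sketch assuming jq is installed and that the MAI voices carry "MAI" in their ShortName:
# list all voices for the region, keep only the MAI ones
curl -s "https://${AZURE_SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/voices/list" \
  -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \
  | jq -r '.[] | select(.ShortName | test("MAI")) | .ShortName'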
API shape
The script calls:
POST https://{AZURE_SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/v1
Headers:
Ocp-Apim-Subscription-Key: {AZURE_SPEECH_KEY}
Content-Type: application/ssml+xml
X-Microsoft-OutputFormat: {format}
User-Agent: curl
Body:
- SSML with a MAI voice name
- an optional mstts:express-as style wrapper
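Put together, a hand-rolled request equivalent to what the script sends looks roughly like this (voice, style, and text reused from the examples above; not a verbatim copy of the script):
# synthesize "Let's go" with Jasper in the excitement style, write MP3 to ./mai-voice.mp3
curl -sS -X POST "https://${AZURE_SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/v1" \
  -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \
  -H "Content-Type: application/ssml+xml" \
  -H "X-Microsoft-OutputFormat: audio-24khz-160kbitrate-mono-mp3" \
  -H "User-Agent: curl" \
  --data "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts'><voice name='en-us-Jasper:MAI-Voice-1'><mstts:express-as style='excitement'>Let's go</mstts:express-as></voice></speak>" \
  --output ./mai-voice.mp3
A 200 response carries the raw audio bytes; anything else usually means the key, region, or SSML is off.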
Notes
- This uses the Azure Speech REST API, not the Python SDK.
- Voice selection is configurable via --voice.
- Style is optional and applied via the SSML mstts:express-as element.
- MAI-Voice-1 is currently in public preview.