Install
openclaw skills install mai-transcribe
Transcribe an audio file via Azure AI Speech using Microsoft's MAI-Transcribe-1 model.
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a
Defaults:
model: mai-transcribe-1
output: <input>.txt
API version: 2025-10-15

node {baseDir}/scripts/transcribe.js /path/to/audio.ogg --out /tmp/transcript.txt
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a --language en-GB
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a --json --out /tmp/transcript.json
node {baseDir}/scripts/transcribe.js /path/to/audio.wav --model mai-transcribe-1
node {baseDir}/scripts/transcribe.js --help
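For batch jobs, the flags shown above can be assembled programmatically. The following is a minimal sketch and not part of the skill: `buildTranscribeArgs` and `transcribeFile` are hypothetical helper names, and `scriptPath` stands in for wherever `{baseDir}/scripts/transcribe.js` resolves on your machine. Only flags documented above are emitted.

```javascript
const { execFile } = require("node:child_process");

// Build the argument list for transcribe.js from an options object.
// Only flags documented in this README are emitted.
function buildTranscribeArgs(scriptPath, input, opts = {}) {
  const args = [scriptPath, input];
  if (opts.language) args.push("--language", opts.language);
  if (opts.model) args.push("--model", opts.model);
  if (opts.json) args.push("--json");
  if (opts.out) args.push("--out", opts.out);
  return args;
}

// Run the script with the current Node binary; resolves with its stdout.
function transcribeFile(scriptPath, input, opts = {}) {
  return new Promise((resolve, reject) => {
    const args = buildTranscribeArgs(scriptPath, input, opts);
    execFile(process.execPath, args, (err, stdout, stderr) => {
      if (err) reject(new Error(stderr || err.message));
      else resolve(stdout);
    });
  });
}
```

A call such as `transcribeFile(scriptPath, "/path/to/audio.m4a", { language: "en-GB" })` would then correspond to the `--language` example above.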
export AZURE_SPEECH_ENDPOINT="https://YOUR-RESOURCE.cognitiveservices.azure.com"
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"
Endpoint format: https://your-resource.cognitiveservices.azure.com
If endpoint URLs get mixed up while copy-pasting, the key point is that this skill expects the Speech resource endpoint, not a generic Foundry project URL.
Optional:
export AZURE_SPEECH_API_VERSION="2025-10-15"
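The environment variables above could be read and validated along these lines. This is a hedged sketch rather than the script's actual code: `loadSpeechConfig` is a hypothetical helper, and the hostname check assumes Speech resource endpoints end in `cognitiveservices.azure.com`, as in the example endpoint.

```javascript
// Read the skill's environment variables and fail early on missing values.
// The 2025-10-15 fallback mirrors the documented default API version.
function loadSpeechConfig(env = process.env) {
  const endpoint = (env.AZURE_SPEECH_ENDPOINT || "").trim().replace(/\/+$/, "");
  const key = (env.AZURE_SPEECH_KEY || "").trim();
  if (!endpoint) throw new Error("AZURE_SPEECH_ENDPOINT is not set");
  if (!key) throw new Error("AZURE_SPEECH_KEY is not set");

  // Assumption: Speech resource hosts end in cognitiveservices.azure.com.
  // Other hosts are only flagged here, not rejected.
  const host = new URL(endpoint).hostname;
  const looksLikeSpeechResource = host.endsWith(".cognitiveservices.azure.com");

  return {
    endpoint,
    key,
    apiVersion: env.AZURE_SPEECH_API_VERSION || "2025-10-15",
    looksLikeSpeechResource,
  };
}
```

Returning a flag instead of throwing keeps the check advisory, which seems safer given that the hostname rule is an assumption.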
The script calls:
POST {AZURE_SPEECH_ENDPOINT}/speechtotext/transcriptions:transcribe?api-version=2025-10-15
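As an illustration, the request URL above can be composed from the endpoint and API version like this. `buildTranscribeUrl` is a hypothetical name; the path and query parameter are taken from the call shown above.

```javascript
// Compose the transcription URL from the Speech resource endpoint.
// Trailing slashes on the endpoint are dropped to avoid a double slash.
function buildTranscribeUrl(endpoint, apiVersion = "2025-10-15") {
  const base = endpoint.replace(/\/+$/, "");
  const url = new URL(`${base}/speechtotext/transcriptions:transcribe`);
  url.searchParams.set("api-version", apiVersion);
  return url.toString();
}
```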
Headers:
Ocp-Apim-Subscription-Key: {AZURE_SPEECH_KEY}
Multipart form fields:
audio
definition
Example definition payload:
{
  "enhancedMode": {
    "enabled": true,
    "model": "mai-transcribe-1"
  }
}
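Putting the header, the two form fields, and the definition payload together, a request might be built roughly as follows. This is a sketch under assumptions, not the script's source: `buildTranscribeForm` and `sendTranscription` are hypothetical names, and the code relies on the global `fetch`, `FormData`, and `Blob` available in Node.js 18 and later.

```javascript
// Build the multipart body: the audio bytes plus a JSON "definition" field.
function buildTranscribeForm(audioBytes, filename, model = "mai-transcribe-1") {
  const definition = { enhancedMode: { enabled: true, model } };
  const form = new FormData();
  form.append("audio", new Blob([audioBytes]), filename);
  form.append("definition", JSON.stringify(definition));
  return form;
}

// POST the form. fetch sets the multipart Content-Type (with boundary)
// itself, so only the subscription key header is supplied here.
async function sendTranscription(url, key, form) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Ocp-Apim-Subscription-Key": key },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Transcription failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

The audio bytes would typically come from `fs.readFileSync("/path/to/audio.m4a")`, since a Node `Buffer` can be passed to `Blob` directly.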
--json writes the raw Azure response for debugging or downstream processing.
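For downstream processing, a file written with `--json --out` could be reduced to plain text as sketched below. The response shape is an assumption: the code expects the Azure fast transcription layout, with the full text under `combinedPhrases[].text` and per-segment text under `phrases[].text`. Check a real response before relying on it. Both function names are hypothetical.

```javascript
const fs = require("node:fs");

// Assumption: the response carries combinedPhrases[].text (full text) and
// phrases[].text (segments). Falls back to phrases, then to an empty string.
function extractTranscriptText(response) {
  const combined = Array.isArray(response.combinedPhrases) ? response.combinedPhrases : [];
  const phrases = Array.isArray(response.phrases) ? response.phrases : [];
  const source = combined.length > 0 ? combined : phrases;
  return source.map((p) => p.text).filter(Boolean).join(" ").trim();
}

// Load a file written with --json --out and return its plain text.
function readTranscriptJson(path) {
  return extractTranscriptText(JSON.parse(fs.readFileSync(path, "utf8")));
}
```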