MAI Voice

Security checks across malware telemetry and agentic risk

Overview

This is a straightforward Azure text-to-speech skill; the main caution is that the text you choose to synthesize is sent to Microsoft Azure.

Install only if you are comfortable sending the text you synthesize to Microsoft Azure Speech. Avoid submitting secrets, credentials, regulated data, or confidential text unless your policies allow Azure processing, and protect the Azure Speech key like any other API secret.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Findings (4)

Missing User Warnings

Low
Confidence
84% confidence
Finding
The README instructs users to configure an Azure API key and send text for speech synthesis to Azure's external service, but it does not clearly warn about secure credential handling or that submitted text/audio content leaves the local environment. This can lead to accidental exposure of secrets in shell history, logs, screenshots, or misuse with sensitive input that users did not realize would be transmitted to a third party.
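One way to reduce the shell-history exposure described in this finding is to load the key from a file rather than typing it inline. The sketch below is illustrative only: the helper name load_speech_key and the default key path are hypothetical and not part of the skill; only the AZURE_SPEECH_KEY variable name comes from the scanned script.

```shell
# Hypothetical helper (not part of the skill): load the Azure Speech key
# from a file so it never appears in shell history or on a command line.
load_speech_key() {
  # The default path is an assumption; create the file with `chmod 600`.
  local key_file="${1:-$HOME/.config/mai-voice/speech.key}"

  if [ ! -r "$key_file" ]; then
    echo "key file not readable: $key_file" >&2
    return 1
  fi

  # Drop the trailing newline and any stray whitespace from the stored key.
  AZURE_SPEECH_KEY="$(tr -d '[:space:]' < "$key_file")"

  if [ -z "$AZURE_SPEECH_KEY" ]; then
    echo "key file is empty: $key_file" >&2
    return 1
  fi

  export AZURE_SPEECH_KEY
}
```

A caller would run `load_speech_key` once before invoking the skill, instead of pasting `export AZURE_SPEECH_KEY=...` into an interactive shell.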

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The skill sends user-provided text to Azure's remote Speech REST API, but the user-facing description does not clearly warn that prompts or file contents leave the local environment. This can cause accidental disclosure of sensitive text, secrets, or regulated data if users assume synthesis happens locally.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The script sends user-provided text or file contents to Azure's remote TTS endpoint, but it does not present any explicit warning, confirmation, or privacy notice at the time of transmission. This can cause unintended disclosure of sensitive content if users assume synthesis happens locally or do not realize their input is being uploaded to a third-party service.
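The missing confirmation step could look roughly like the gate below. This is a minimal sketch under assumptions, not the skill's actual behavior: the function name confirm_upload and the MAI_VOICE_ASSUME_YES opt-in variable are hypothetical, and the prompt reads its answer from standard input, which would need adjusting if the text to synthesize also arrives on standard input.

```shell
# Hypothetical helper (not part of the skill): ask before text leaves the machine.
# Returns 0 only when the user explicitly agrees to the upload.
confirm_upload() {
  local text="$1" reply

  # Non-interactive opt-in through a hypothetical environment variable.
  if [ "${MAI_VOICE_ASSUME_YES:-}" = "1" ]; then
    return 0
  fi

  # Print the notice on stderr so it never mixes with audio or script output.
  printf 'About to send %d characters to Azure Speech (region: %s). Continue? [y/N] ' \
    "${#text}" "${AZURE_SPEECH_REGION:-unset}" >&2

  # Treat end-of-input or anything other than an explicit yes as a refusal.
  read -r reply || return 1
  case "$reply" in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller would run `confirm_upload "$text" || exit 1` immediately before the request is sent, so declining leaves the text on the local machine.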

External Transmission

Medium
Category
Data Exfiltration
Content
url="https://${AZURE_SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/v1"

curl -sS --fail-with-body \
  -X POST "$url" \
  -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \
  -H "Content-Type: application/ssml+xml" \
Confidence
92% confidence
Finding
curl -sS --fail-with-body \
  -X POST "$url" \
  -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \
  -H "Content-Type: application/ssml+xml" \
  -H "X-Microsoft-OutputFormat: ${format}" \
  -H "Us
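
Because the endpoint hostname is built from AZURE_SPEECH_REGION, a malformed value could in principle point the request, and the subscription key with it, at a different host. The sketch below shows one possible guard. The function name build_tts_url is hypothetical, and the rule that region names contain only lowercase letters and digits is an assumption that should be checked against the regions actually in use.

```shell
# Hypothetical helper (not part of the skill): validate the region before
# interpolating it into the Azure Speech endpoint URL.
build_tts_url() {
  local region="${1:-${AZURE_SPEECH_REGION:-}}"

  # Assumption: valid regions are lowercase letters and digits only (e.g. eastus).
  # Reject empty values and anything containing dots, slashes, or other characters.
  case "$region" in
    ''|*[!a-z0-9]*)
      echo "invalid AZURE_SPEECH_REGION: '$region'" >&2
      return 1
      ;;
  esac

  printf 'https://%s.tts.speech.microsoft.com/cognitiveservices/v1\n' "$region"
}
```

A caller could then write `url="$(build_tts_url)" || exit 1` in place of the direct string interpolation.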

VirusTotal

64/64 vendors flagged this skill as clean.
