MAI Voice

Security checks across malware telemetry and agentic risk

Overview

This is a straightforward Azure text-to-speech skill; the main caution is that the text you choose to synthesize is sent to Microsoft Azure.

Install only if you are comfortable sending the text you synthesize to Microsoft Azure Speech. Avoid submitting secrets, credentials, regulated data, or confidential text unless your policies allow Azure processing, and protect the Azure Speech key like any other API secret.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (4)

Missing User Warnings

Low

Confidence: 84%
Finding: The README instructs users to configure an Azure API key and send text for speech synthesis to Azure's external service, but it does not clearly warn about secure credential handling or that submitted text/audio content leaves the local environment. This can lead to accidental exposure of secrets in shell history, logs, screenshots, or misuse with sensitive input that users did not realize would be transmitted to a third party.
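
One way to reduce the shell-history and log exposure described in this finding is to keep the key in a file that only its owner can read, instead of typing it on the command line. The sketch below is illustrative only: the function name load_speech_key and the key-file path are assumptions and are not part of the scanned skill. Only the AZURE_SPEECH_KEY variable name comes from the script excerpt in this report.

```shell
# Hypothetical helper (not part of the scanned skill): load the Azure Speech
# key from a file instead of passing it on the command line.
load_speech_key() {
  key_file="$1"
  if [ ! -f "$key_file" ]; then
    echo "key file not found: $key_file" >&2
    return 1
  fi
  # Characters 5-10 of the ls mode string are the group and other permission
  # bits; all dashes means only the owner has any access.
  group_other=$(ls -ld "$key_file" | cut -c5-10)
  if [ "$group_other" != "------" ]; then
    echo "refusing to read $key_file: group/other have access (try chmod 600)" >&2
    return 1
  fi
  # Command substitution strips the trailing newline from the file contents.
  AZURE_SPEECH_KEY=$(cat "$key_file")
  export AZURE_SPEECH_KEY
}
```

A caller might run load_speech_key "$HOME/.config/mai-voice/key" (a made-up path) before invoking the skill, so the key never appears in shell history or in a pasted command.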

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill sends user-provided text to Azure's remote Speech REST API, but the user-facing description does not clearly warn that prompts or file contents leave the local environment. This can cause accidental disclosure of sensitive text, secrets, or regulated data if users assume synthesis happens locally.

Missing User Warnings

Medium

Confidence: 95%
Finding: The script sends user-provided text or file contents to Azure's remote TTS endpoint, but it does not present any explicit warning, confirmation, or privacy notice at the time of transmission. This can cause unintended disclosure of sensitive content if users assume synthesis happens locally or do not realize their input is being uploaded to a third-party service.
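
A confirmation step of the kind this finding says is missing could look roughly like the following. This is a minimal sketch under stated assumptions, not the skill's actual code: the function name confirm_upload and the MAI_VOICE_ASSUME_YES opt-out variable are hypothetical, introduced here only for illustration.

```shell
# Hypothetical helper (not part of the scanned skill): ask for consent before
# any text is uploaded to the remote TTS endpoint.
confirm_upload() {
  char_count="$1"
  # Let non-interactive callers opt in explicitly (assumed variable name).
  if [ "${MAI_VOICE_ASSUME_YES:-0}" = "1" ]; then
    return 0
  fi
  # Prompt on stderr so the notice is not mixed into piped output.
  printf 'About to send %s characters to Azure Speech (a remote service). Continue? [y/N] ' "$char_count" >&2
  # Treat end-of-input as a refusal.
  read -r reply || return 1
  case "$reply" in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}
```

A script could call confirm_upload "${#text}" || exit 1 immediately before its curl invocation, where text is whatever variable holds the input. Defaulting to "no" means an accidental Enter does not upload anything.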

External Transmission

Medium

Category: Data Exfiltration
Content:
    url="https://${AZURE_SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/v1"
    curl -sS --fail-with-body \
      -X POST "$url" \
      -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \
      -H "Content-Type: application/ssml+xml" \
Confidence: 92%
Finding: curl -sS --fail-with-body \ -X POST "$url" \ -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \ -H "Content-Type: application/ssml+xml" \ -H "X-Microsoft-OutputFormat: ${format}" \ -H "Us

VirusTotal

64/64 vendors flagged this skill as clean.
