Gladia Live Transcription

Security checks across malware telemetry and agentic risk

Overview

This is a documentation-only Gladia live transcription skill, but users should treat live audio and transcripts as sensitive.

Install only if you intend to use Gladia for live transcription. Obtain required participant consent before streaming or recording audio, secure the Gladia API key, choose the correct language settings, disable analytics features you do not need, use callback URLs only for trusted HTTPS endpoints, and delete retained sessions when no longer needed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (5)

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill promotes real-time transcription through a third-party service but does not clearly warn that live audio is transmitted off-platform to an external provider. In privacy-sensitive contexts such as calls, meetings, or live microphone capture, this can cause users or downstream agents to share sensitive audio without informed consent or appropriate handling expectations.
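One way to address this finding is to show a notice and gate streaming on recorded acknowledgement. The sketch below is illustrative only: `PROVIDER_NOTICE`, `ConsentRecord`, and `may_start_streaming` are hypothetical names, not part of the skill or of Gladia's API, and the block makes no network calls.

```python
from dataclasses import dataclass

# Hypothetical notice text; adapt the wording to your own legal requirements.
PROVIDER_NOTICE = (
    "Audio from this session will be streamed to Gladia, an external "
    "transcription provider, for processing."
)


@dataclass
class ConsentRecord:
    """Hypothetical record of one participant's response to the notice."""
    participant: str
    acknowledged_notice: bool


def may_start_streaming(participants, records):
    """Return True only if there is at least one participant and every
    participant has acknowledged the notice."""
    acknowledged = {r.participant for r in records if r.acknowledged_notice}
    return bool(participants) and all(p in acknowledged for p in participants)
```

A caller would display `PROVIDER_NOTICE`, collect a `ConsentRecord` per participant, and open the audio stream only when `may_start_streaming` returns True.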

Natural-Language Policy Violations

Medium

Confidence: 90%
Finding: The JavaScript example hard-codes English as the transcription language without indicating that this is only a sample or that language selection should come from user input. This can lead to silent transcription errors, exclusion of non-English speakers, and incorrect downstream automation based on misrecognized speech.

Natural-Language Policy Violations

Medium

Confidence: 91%
Finding: The Python example similarly forces English transcription by default and provides no user-choice or warning about the locale restriction. In a live-transcription skill, this increases the chance of inaccurate transcripts and bad downstream decisions in multilingual or non-English environments.
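Both language findings point to the same remedy: take the language from the user rather than a hard-coded default. The following is a minimal sketch under stated assumptions. `SUPPORTED_LANGUAGES` is a made-up subset, and the `languages` and `code_switching` keys are assumed field names that should be checked against Gladia's current API reference before use.

```python
# Hypothetical subset of language codes; replace with the provider's real list.
SUPPORTED_LANGUAGES = {"en", "fr", "de", "es"}


def build_language_config(requested):
    """Build a language configuration from user-supplied codes.

    Raises ValueError instead of silently falling back to English.
    The returned key names are assumptions, not confirmed API fields.
    """
    # Normalize, drop blanks, and remove duplicates while keeping order.
    cleaned = [code.strip().lower() for code in requested if code.strip()]
    languages = list(dict.fromkeys(cleaned))
    if not languages:
        raise ValueError("no transcription language selected")
    unsupported = sorted(set(languages) - SUPPORTED_LANGUAGES)
    if unsupported:
        raise ValueError(f"unsupported language codes: {unsupported}")
    return {"languages": languages, "code_switching": len(languages) > 1}
```

Failing loudly on a missing or unsupported language keeps misrecognized speech from flowing into downstream automation unnoticed.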

Missing User Warnings

Medium

Confidence: 93%
Finding: The call center example explicitly enables named entity recognition and summarization, and the surrounding guidance notes extraction of customer details such as names and account numbers without any warning about consent, data minimization, retention, or regulatory obligations. In a real call center context, this can lead developers to process sensitive personal data by default, increasing privacy, compliance, and insider-misuse risk.
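A data-minimization approach to this finding is to leave analysis features off unless they are explicitly requested and consent has been recorded. The sketch below assumes hypothetical feature keys (`named_entity_recognition`, `summarization`, `sentiment_analysis`) and a hypothetical helper `build_analysis_config`; the real option names may differ.

```python
# Assumed feature names for illustration; verify against the provider's docs.
ANALYSIS_FEATURES = (
    "named_entity_recognition",
    "summarization",
    "sentiment_analysis",
)


def build_analysis_config(enabled=(), consent_obtained=False):
    """Return a feature map with every analysis feature off by default.

    Features are switched on only when named explicitly and only when the
    caller confirms that consent for the extra processing was obtained.
    """
    enabled = set(enabled)
    unknown = sorted(enabled - set(ANALYSIS_FEATURES))
    if unknown:
        raise ValueError(f"unknown analysis features: {unknown}")
    if enabled and not consent_obtained:
        raise PermissionError("analysis features require recorded consent")
    return {name: name in enabled for name in ANALYSIS_FEATURES}
```

With this shape, a call center integration that never asks for entity extraction never enables it by accident.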

Missing User Warnings

Medium

Confidence: 88%
Finding: The example includes `callback_url` and advanced audio-analysis options like translation, named entity recognition, sentiment analysis, summarization, and chapterization without any warning that transcript-derived data may be sent to external endpoints or subjected to additional processing. In a real-time transcription context, this can lead implementers to unknowingly forward sensitive speech content or metadata to third parties, increasing privacy and compliance risk.
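The overview's advice to use callback URLs only for trusted HTTPS endpoints can be enforced before a `callback_url` is ever sent. This is a sketch rather than a definitive implementation: `TRUSTED_CALLBACK_HOSTS` and `validate_callback_url` are hypothetical, and `hooks.example.com` is a placeholder host.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; list only endpoints you operate and trust.
TRUSTED_CALLBACK_HOSTS = {"hooks.example.com"}


def validate_callback_url(url):
    """Return the URL unchanged if it is HTTPS, carries no embedded
    credentials, and targets an allowlisted host; raise ValueError otherwise."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("callback URL must use HTTPS")
    if parts.username or parts.password:
        raise ValueError("callback URL must not embed credentials")
    if parts.hostname not in TRUSTED_CALLBACK_HOSTS:
        raise ValueError(f"callback host not trusted: {parts.hostname!r}")
    return url
```

Running this check at configuration time means a mistyped or untrusted endpoint is rejected before any transcript-derived data leaves the application.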

VirusTotal

65/65 vendors flagged this skill as clean.
