Security audit

Speech To Text With Speakers

Security checks across malware telemetry and agentic risk

Overview

This skill is a coherent remote speech-to-text integration, but users should treat any audio or URL they submit as sensitive data that will be processed by a third party.

Install only if you are comfortable sending the selected audio or public media URL to AgentPMT-hosted transcription workflows, possibly involving downstream speech-to-text providers. Avoid confidential, regulated, or third-party recordings unless you have authorization and have reviewed the provider's privacy, retention, logging, and billing terms.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (4)

Vague Triggers

Severity: Medium
Confidence: 88%
Finding
The skill advertises broad discovery and activation keywords such as generic transcription and captioning phrases, which can cause an agent to invoke this remote skill in contexts the user did not explicitly intend. Because invocation sends user-provided audio or URLs to an external service, accidental matching increases the chance of unintended data disclosure and unnecessary paid actions.
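
One way a calling agent could reduce accidental invocation is to treat a keyword match as necessary but not sufficient. The sketch below is illustrative only: `InvocationRequest`, `GENERIC_KEYWORDS`, and `should_invoke` are hypothetical names, not part of the skill or of any AgentPMT API, and the keyword list is an assumption.

```python
from dataclasses import dataclass

# Hypothetical keyword list; the skill's real trigger phrases are not shown in this audit.
GENERIC_KEYWORDS = ("transcribe", "transcription", "caption", "subtitles")


@dataclass
class InvocationRequest:
    user_message: str
    has_audio_input: bool  # the user attached audio or supplied a media URL
    user_confirmed: bool   # the user approved remote processing for this request


def keyword_match(message: str) -> bool:
    """Return True if the message contains any generic trigger keyword."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in GENERIC_KEYWORDS)


def should_invoke(req: InvocationRequest) -> bool:
    """Invoke the remote skill only when all three conditions hold.

    A keyword match alone never triggers a remote (and possibly paid) call.
    """
    return keyword_match(req.user_message) and req.has_audio_input and req.user_confirmed
```

With this gate, a message that merely mentions captions, or a request the user has not confirmed, does not send anything to the external service.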

Missing User Warnings

Severity: Medium
Confidence: 92%
Finding
The skill description explains functionality but does not clearly warn that uploaded audio or externally hosted media URLs will be transmitted to AgentPMT and potentially downstream transcription providers. In this context, users may submit sensitive meetings, interviews, or voice memos without informed consent about third-party processing, creating privacy and compliance risk.

Missing User Warnings

Severity: Medium
Confidence: 93%
Finding
The schema documents sending audio for transcription but does not warn that uploaded or referenced audio may contain sensitive personal, business, or regulated information that will be transmitted to an external service. In a speech-to-text skill, this omission can lead users or calling agents to submit sensitive recordings without informed consent or appropriate handling expectations.

Missing User Warnings

Severity: Medium
Confidence: 95%
Finding
The public_url parameter causes the service to retrieve remote audio, but the documentation does not clearly warn about the privacy and data-handling implications of fetching content from an external URL. This can result in users exposing private media locations or causing external content to be transferred and processed without understanding that remote fetching occurs.
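
A caller might screen a URL before passing it as `public_url`. The sketch below is a minimal pre-check under stated assumptions: `url_risks` and `SENSITIVE_QUERY_KEYS` are hypothetical names, the list of sensitive query keys is illustrative rather than exhaustive, and none of this reflects how the service itself validates URLs.

```python
import ipaddress
from urllib.parse import parse_qs, urlparse

# Illustrative query keys that often indicate a signed or tokenized private link.
SENSITIVE_QUERY_KEYS = {"token", "access_token", "signature", "sig", "key", "x-amz-signature"}


def url_risks(url: str) -> list[str]:
    """Return human-readable reasons a URL may be unsafe to hand to a remote fetcher."""
    risks = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        risks.append("non-https scheme")
    if parsed.username or parsed.password:
        risks.append("embedded credentials")
    host = parsed.hostname or ""
    if not host:
        risks.append("missing host")
    else:
        try:
            ip = ipaddress.ip_address(host)
            if ip.is_private or ip.is_loopback or ip.is_link_local:
                risks.append("private or local address")
        except ValueError:
            # Not an IP literal; check a few common internal hostnames.
            if host == "localhost" or host.endswith((".local", ".internal")):
                risks.append("private or local address")
    query_keys = {key.lower() for key in parse_qs(parsed.query, keep_blank_values=True)}
    if query_keys & SENSITIVE_QUERY_KEYS:
        risks.append("signed or tokenized query string")
    return risks
```

An empty list means none of these heuristics fired. It does not mean the media is safe to share, so the user still needs to consent to the remote fetch.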

VirusTotal

63/63 vendors flagged this skill as clean.