Speech to text

Security checks across malware telemetry and agentic risk

Overview

This is a local speech-to-text skill that processes audio files with Whisper, with a disclosed but privacy-sensitive watch mode.

Install the skill in a dedicated Python environment and point it at a dedicated, private inbound folder. Treat generated transcripts and logs as sensitive, back up original audio beforehand if you do not want the only copy moved, and run watch mode only when you want ongoing automatic processing.
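The folder and backup advice can be sketched as follows. This is a minimal illustration, not part of the skill: the helper names `prepare_private_dir` and `backup_audio`, the folder layout, and the list of audio suffixes are all assumptions made for the example.

```python
import os
import shutil
from pathlib import Path
from typing import List

# Assumed set of audio extensions; adjust to whatever you actually record.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def prepare_private_dir(path: Path) -> Path:
    """Create a directory and restrict it to the current user.

    On POSIX systems mode 0o700 means owner-only access; on Windows
    os.chmod has very limited effect, so use ACLs there instead.
    """
    path.mkdir(parents=True, exist_ok=True)
    os.chmod(path, 0o700)
    return path


def backup_audio(inbound: Path, backup: Path) -> List[Path]:
    """Copy audio files from inbound to backup without touching the originals.

    Files already present in the backup folder are skipped, so the
    function can be run repeatedly. Returns the newly created copies.
    """
    prepare_private_dir(backup)
    copied = []
    for src in sorted(inbound.iterdir()):
        if src.is_file() and src.suffix.lower() in AUDIO_SUFFIXES:
            dest = backup / src.name
            if not dest.exists():
                shutil.copy2(src, dest)  # copy2 also preserves timestamps
                copied.append(dest)
    return copied
```

Running `backup_audio` before any transcription step keeps a second copy of each recording, so it no longer matters whether the originals are moved afterwards.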

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (1)

Missing User Warnings

Low

Confidence: 90%
Finding: The skill exposes a continuous folder monitoring capability (`stt_watch`) but the description does not clearly warn users that it can keep watching an inbound directory and automatically process new audio files. This can lead to unintended ingestion of sensitive voice data or surprise background processing, especially in shared or synced folders.
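To make the behavior described in this finding concrete, here is a minimal sketch of what a folder watcher with an explicit warning and a bounded run could look like. It is not the skill's `stt_watch` implementation: the functions `watch_once` and `watch`, the `handle` callback, and the `max_scans` limit are hypothetical names introduced for illustration.

```python
import time
from pathlib import Path
from typing import Callable, Set


def watch_once(inbound: Path, seen: Set[Path], handle: Callable[[Path], None]) -> int:
    """Scan the folder once and pass each not-yet-seen file to handle."""
    new_files = 0
    for path in sorted(inbound.iterdir()):
        if path.is_file() and path not in seen:
            seen.add(path)
            handle(path)
            new_files += 1
    return new_files


def watch(inbound: Path, handle: Callable[[Path], None],
          max_scans: int, interval_s: float = 1.0) -> int:
    """Poll the folder a bounded number of times, warning the user first.

    Returns the total number of files handed to handle.
    """
    print(
        f"WARNING: watching {inbound}. Every new file placed there will be "
        f"processed automatically for the next {max_scans} scan(s)."
    )
    seen: Set[Path] = set()
    total = 0
    for scan in range(max_scans):
        total += watch_once(inbound, seen, handle)
        if scan < max_scans - 1:
            time.sleep(interval_s)  # wait before the next poll
    return total
```

The up-front warning and the `max_scans` bound are the point of the sketch: the user is told what will be ingested, and the watcher stops on its own instead of running indefinitely against a shared or synced folder.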

VirusTotal

64/64 vendors rated this skill as clean.
