Speech Recognition Local

Security checks across malware telemetry and agentic risk

Overview

This skill transcribes audio files locally. Two caveats apply: voice messages may be transcribed automatically, and first use may download transcription components.

Install this if you are comfortable with local audio transcription and with setup downloads on first use. For sensitive environments, preinstall and pin both the faster-whisper package and the model source, confirm the model cache location, and only allow the skill to transcribe audio you want added to the conversation.
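The preinstall advice above can be sketched as follows. This is a minimal illustration, not the skill's own loading code, which the report does not show. It assumes the skill loads its model through `faster_whisper.WhisperModel`, and that the model files were copied in advance to a directory of your choosing; the function names `offline_model_kwargs` and `load_model` are hypothetical.

```python
from pathlib import Path


def offline_model_kwargs(model_dir: str) -> dict:
    """Build WhisperModel keyword arguments for a pre-downloaded model.

    model_dir is a hypothetical local directory that already holds the
    converted model files; nothing here touches the network.
    """
    path = Path(model_dir).expanduser()
    if not path.is_dir():
        # Fail loudly instead of letting a loader fall back to a download.
        raise FileNotFoundError(f"model directory not found: {path}")
    return {
        "model_size_or_path": str(path),  # a local path instead of a model name
        "local_files_only": True,         # ask the loader not to fetch anything
    }


def load_model(model_dir: str):
    # Imported lazily so the path check above works even where
    # faster-whisper is not installed.
    from faster_whisper import WhisperModel

    return WhisperModel(**offline_model_kwargs(model_dir))
```

Pointing the loader at an explicit directory also answers the cache-location question, since the model lives wherever you put it. Pinning the package itself is a separate step, typically done with an exact version in a requirements file.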

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (2)

Vague Triggers

Severity: Medium
Confidence: 83%
Finding
The skill explicitly states that transcription is automatically triggered on receipt of voice messages, but the trigger scope and consent boundaries are not described. In an agent environment, broad automatic invocation can cause unintended processing of user content, surprise resource consumption, and accidental handling of sensitive audio without an explicit opt-in at the time of use.
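One way to narrow the trigger scope described in this finding is an explicit opt-in gate that runs before any transcription. The sketch below is an assumption about how such a gate could look, not a description of the skill's behavior; `TranscriptionPolicy`, `should_transcribe`, and all of their fields and defaults are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptionPolicy:
    auto_transcribe: bool = False    # automatic transcription is off by default
    max_duration_s: float = 120.0    # hypothetical cap to bound resource use
    allowed_senders: set = field(default_factory=set)  # senders who opted in


def should_transcribe(policy: TranscriptionPolicy, sender: str,
                      duration_s: float, user_confirmed: bool = False) -> bool:
    """Decide whether an incoming voice message may be transcribed."""
    # The duration cap applies to every request, confirmed or not.
    if duration_s > policy.max_duration_s:
        return False
    # An explicit confirmation at the time of use always counts as consent.
    if user_confirmed:
        return True
    # Otherwise both the global switch and a per-sender opt-in are required.
    return policy.auto_transcribe and sender in policy.allowed_senders
```

With the defaults, nothing is transcribed until the user either confirms a specific message or enables automatic transcription for named senders.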

Missing User Warnings

Severity: Low
Confidence: 70%
Finding
The documentation notes that the model will auto-download on first use, but this behavior is not surfaced as a clear warning in the user-facing description and requirements. Silent network activity and code/model retrieval can violate user expectations for an 'offline/private' skill, and may introduce supply-chain or policy risks in restricted environments.
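A pre-flight check can surface this warning before any network activity happens. The sketch below assumes the model is stored as a CTranslate2-style directory containing a `model.bin` file, which is the layout faster-whisper models generally use; the function name `download_notice` and the message wording are hypothetical.

```python
from pathlib import Path
from typing import Optional


def download_notice(model_dir: str) -> Optional[str]:
    """Return a warning if the model is not yet present locally, else None."""
    path = Path(model_dir).expanduser()
    # Assumption: a usable model directory contains a model.bin weights file.
    if (path / "model.bin").is_file():
        return None
    return (
        f"No model found in {path}. First use will download transcription "
        "components over the network. Continue?"
    )
```

A caller would show the returned message and wait for confirmation, and skip the prompt entirely when the function returns `None`.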

VirusTotal

66/66 vendors flagged this skill as clean.
