Speech Recognition Local

Security checks across malware telemetry and agentic risk

Overview

This skill transcribes audio files locally. It discloses two caveats: voice messages may be transcribed automatically, and first use may download transcription components.

Install this if you are comfortable with local audio transcription and first-use setup downloads. For sensitive environments, preinstall and pin faster-whisper/model sources, confirm the model cache location, and only allow it to transcribe audio you want added to the conversation.
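One way to act on the pinning advice is to check the installed faster-whisper version against a vetted pin and to load the model with downloads disabled. The sketch below assumes the skill uses the faster-whisper Python package. The pinned version string, the cache path, the "base" model name, and all helper names are hypothetical placeholders, not taken from the skill. `download_root` and `local_files_only` are parameters of faster-whisper's `WhisperModel` constructor.

```python
from importlib import metadata
from pathlib import Path
from typing import Optional

# Hypothetical values: substitute the version and cache path you have vetted.
PINNED_VERSION = "1.0.3"
MODEL_CACHE = Path.home() / ".cache" / "whisper-models"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_matches(package: str, expected: str) -> bool:
    """True only when the package is installed at exactly the expected version."""
    return installed_version(package) == expected


def offline_model_kwargs(cache_dir: Path) -> dict:
    """Keyword arguments that stop WhisperModel from fetching model files."""
    return {"download_root": str(cache_dir), "local_files_only": True}


def load_model(model_name: str = "base"):
    """Load a preinstalled model; refuse to run on an unexpected version."""
    if not pin_matches("faster-whisper", PINNED_VERSION):
        raise RuntimeError("faster-whisper is missing or not at the pinned version")
    # Imported lazily so the version check runs before the library is loaded.
    from faster_whisper import WhisperModel
    return WhisperModel(model_name, **offline_model_kwargs(MODEL_CACHE))
```

With `local_files_only=True`, loading should fail rather than reach the network when the model is not already in the cache, which makes a missing preinstall visible instead of silent.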

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (2)

Vague Triggers

Medium

Confidence: 83%
Finding: The skill explicitly states that transcription is automatically triggered on receipt of voice messages, but the trigger scope and consent boundaries are not described. In an agent environment, broad automatic invocation can cause unintended processing of user content, surprise resource consumption, and accidental handling of sensitive audio without an explicit opt-in at the time of use.
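A narrower trigger could look like the following sketch, which keeps automatic transcription off until the user opts in and bounds how much audio is processed. Everything here is hypothetical: the skill does not document a policy object, and the `TranscriptionPolicy` class, its fields, and the `should_transcribe` function are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptionPolicy:
    """Hypothetical per-user policy; not part of the skill as published."""
    auto_transcribe: bool = False        # off until the user opts in
    max_seconds: float = 120.0           # cap on audio length to bound resource use
    allowed_senders: set = field(default_factory=set)


def should_transcribe(policy: TranscriptionPolicy, sender: str,
                      duration_seconds: float,
                      explicit_request: bool = False) -> bool:
    """Decide whether a voice message may be transcribed."""
    # The length cap applies even to explicit requests.
    if duration_seconds > policy.max_seconds:
        return False
    # An explicit request at the time of use is always sufficient consent.
    if explicit_request:
        return True
    # Otherwise require a standing opt-in and an allow-listed sender.
    return policy.auto_transcribe and sender in policy.allowed_senders
```

The default policy transcribes nothing automatically, so sensitive audio is only processed after a standing opt-in or a per-message request.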

Missing User Warnings

Low

Confidence: 70%
Finding: The documentation notes that the model will auto-download on first use, but this behavior is not surfaced as a clear warning in the user-facing description and requirements. Silent network activity and code/model retrieval can violate user expectations for an 'offline/private' skill, and may introduce supply-chain or policy risks in restricted environments.
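A user-facing warning could be produced before the first download, as in this minimal sketch. It assumes the model files live in a single cache directory; the function names and the wording of the notice are hypothetical, and the non-empty-directory check is only a heuristic, since it does not prove the cached model is complete.

```python
from pathlib import Path
from typing import Optional


def model_is_cached(cache_dir: Path) -> bool:
    """Heuristic: the cache directory exists and holds at least one entry."""
    return cache_dir.is_dir() and any(cache_dir.iterdir())


def first_use_notice(cache_dir: Path) -> Optional[str]:
    """Return a warning to show the user, or None if no download is expected."""
    if model_is_cached(cache_dir):
        return None
    return (
        "No transcription model was found in "
        f"{cache_dir}. The first transcription will download model files "
        "over the network. Preinstall the model to avoid this."
    )
```

Showing the notice and waiting for confirmation before the first transcription would turn the silent download into an explicit choice.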

VirusTotal

66/66 vendors flagged this skill as clean.
