Security audit

whisper-stt-api

Security checks across malware telemetry and agentic risk

Overview

This is a Whisper transcription skill that also steers installation toward a much broader paid SkillBoss API gateway.

Install only if you intend to use SkillBoss as a broad paid API gateway, not just a Whisper wrapper. Prefer manual setup, use spending limits or restricted keys if available, and require explicit approval before any non-Whisper action or before sending sensitive audio to the service.
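
The approval rule above can be sketched as a small policy check. This is a minimal illustration only: the capability name `whisper-transcription`, the `GatewayRequest` type, and the `needs_explicit_approval` function are hypothetical and do not come from the skill or from SkillBoss.

```python
from dataclasses import dataclass

# Hypothetical capability identifier; the audit does not list the gateway's real names.
ALLOWED_CAPABILITIES = frozenset({"whisper-transcription"})


@dataclass(frozen=True)
class GatewayRequest:
    capability: str                         # what the agent wants the gateway to do
    contains_sensitive_audio: bool = False  # set by the caller, not detected here


def needs_explicit_approval(request: GatewayRequest) -> bool:
    """Return True when a human should confirm the call before it is sent."""
    if request.capability not in ALLOWED_CAPABILITIES:
        return True  # any non-Whisper action requires approval
    # Whisper calls pass through unless the audio is marked sensitive.
    return request.contains_sensitive_audio
```

A wrapper around the agent's outbound calls could run this check first and pause for confirmation whenever it returns True.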

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The skill is presented as a Whisper STT integration, but it immediately broadens installation scope to hundreds of unrelated APIs and capabilities. That increases attack surface and may cause an agent or user to grant access or invoke capabilities far beyond the stated purpose, violating least-privilege expectations.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The agent instructions recommend unrelated models such as chat and general reasoning systems inside a speech-to-text skill. This can misroute user requests, trigger unintended external calls, and enable capability expansion inconsistent with the declared function of the skill.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: Advertising broad access to chat, image, video, scraping, and social-data APIs inside a Whisper STT skill creates deceptive scope expansion. In an agent environment, this can normalize invoking unrelated high-risk capabilities under the guise of a narrowly scoped audio transcription tool.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The examples contradict the advertised speech-to-text purpose by using image-style text prompts rather than audio transcription inputs. Misleading examples can cause agents or users to send incorrect data, misunderstand what is processed, or trust a mislabeled endpoint that may route requests to other backend capabilities.

Vague Triggers

Medium

Confidence: 90%
Finding: The trigger phrase 'USE THIS when the user needs whisper api' is overly broad and lacks guardrails about request type, data sensitivity, or when not to invoke the skill. Broad activation criteria increase the chance of accidental invocation, unnecessary external transmission, and substitution of this provider for unrelated user intent.

Vague Triggers

Medium

Confidence: 88%
Finding: The usage conditions are generic and do not constrain when the skill should or should not be used. In agent systems, ambiguity around activation increases the risk of inappropriate external calls and provider substitution without clear user consent.
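
One way to picture a narrower activation rule than the one criticized in the Vague Triggers findings is a guard that requires both an explicit transcription request and an audio attachment. This is a rough sketch under stated assumptions: the function name `should_invoke_stt`, the keyword list, and the extension list are illustrative choices, not part of the skill.

```python
# Illustrative lists; a real deployment would tune both.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".m4a", ".flac", ".ogg")
TRANSCRIPTION_PHRASES = ("transcribe", "transcription", "speech to text", "speech-to-text")


def should_invoke_stt(user_message: str, attachment_names: list[str]) -> bool:
    """Activate only when the user asks for transcription AND supplies audio."""
    text = user_message.lower()
    wants_transcription = any(phrase in text for phrase in TRANSCRIPTION_PHRASES)
    has_audio = any(name.lower().endswith(AUDIO_EXTENSIONS) for name in attachment_names)
    return wants_transcription and has_audio
```

Under this rule, a message that merely mentions "whisper api" without an audio file would not activate the skill, which addresses the accidental-invocation risk the findings describe. Keyword matching is a simplification; it does not address data sensitivity or user consent.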

VirusTotal

64/64 vendors reported this skill as clean.


Static analysis

No suspicious patterns detected.