Gemini Voice Assistant

Security checks across malware telemetry and agentic risk

Overview

This is a straightforward Gemini voice/text assistant, but users should know it can send typed or spoken content to Google Gemini and may trigger on broad chat phrases.

Install only if you are comfortable providing a Gemini API key and having typed or spoken conversation content processed by Google Gemini. Consider narrowing the WhatsApp triggers or requiring explicit opt-in before use, especially in chats that may contain private or sensitive information.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (5)

Lp3

Severity: Medium
Category: MCP Least Privilege
Confidence: 90%
Finding:
The skill documentation shows capabilities that rely on environment variables and shell execution, but it does not declare corresponding permissions. This creates a transparency and policy-enforcement gap: users or orchestrators may invoke the skill without realizing it accesses secrets from the environment and runs command-line tooling such as Python and FFmpeg.
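
The gap this finding describes can be expressed as a set difference between what a skill declares and what it is observed to use. The following is a hypothetical policy check in Python. The capability labels (`env:read`, `shell:python`, `shell:ffmpeg`) and the function name are illustrative assumptions and do not come from any real manifest format or from the skill itself.

```python
# Hypothetical capability labels; the skill's real manifest format is not shown in this report.


def undeclared_capabilities(declared: set[str], observed: set[str]) -> set[str]:
    """Return capabilities that the skill uses but does not declare."""
    return observed - declared


if __name__ == "__main__":
    # Capabilities described in the finding: reading secrets from environment
    # variables and running command-line tools such as Python and FFmpeg.
    observed = {"env:read", "shell:python", "shell:ffmpeg"}
    declared: set[str] = set()  # the finding reports no corresponding permission declarations
    gap = undeclared_capabilities(declared, observed)
    if gap:
        print("Undeclared capabilities:", ", ".join(sorted(gap)))
```

An orchestrator that enforces least privilege could refuse to invoke the skill, or ask the user for confirmation, whenever this set is non-empty.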

Tp4

Severity: High
Category: MCP Tool Poisoning
Confidence: 79%
Finding:
The documented purpose presents the skill as a voice-to-voice assistant, but the skill also supports a text mode and configurable system instructions. This mismatch can mislead users and higher-level agents about which inputs are accepted and how much control callers have over its behavior, increasing the risk of prompt injection, unsafe use, or unexpected data handling.

Missing User Warnings

Severity: Medium
Confidence: 93%
Finding:
The markdown does not clearly warn users that their audio and text are sent to Google's Gemini Live API. In a voice assistant context, this is materially important because users may share sensitive spoken or typed content without understanding that it leaves the local environment and is processed by a third party.

Vague Triggers

Severity: Medium
Confidence: 92%
Finding:
The WhatsApp text triggers include very generic phrases such as "help", "what is", "how to", "explain", and "tell me", which are common in ordinary conversation and can cause the skill to activate unintentionally. In a voice assistant context, this broad matching increases the chance of accidental invocation, unintended data processing, and user confusion about when audio or text is being sent to the Gemini-backed service.
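
A minimal Python sketch of why these phrases over-match, assuming the skill uses simple case-insensitive substring matching (its actual matching logic is not shown in this report). The opt-in prefix `/gemini` and both function names are hypothetical. They illustrate the narrower, explicit-activation alternative suggested in the overview.

```python
import re

# Generic trigger phrases quoted in the finding.
BROAD_TRIGGERS = ["help", "what is", "how to", "explain", "tell me"]

# Hypothetical explicit opt-in prefix; not part of the skill's actual configuration.
OPT_IN_PREFIX = "/gemini"


def matches_broad(message: str) -> bool:
    """Assumed behavior: activate if any broad phrase appears anywhere in the message."""
    text = message.lower()
    return any(phrase in text for phrase in BROAD_TRIGGERS)


def matches_opt_in(message: str) -> bool:
    """Activate only when the message starts with the prefix as a standalone word."""
    pattern = rf"^\s*{re.escape(OPT_IN_PREFIX)}\b"
    return re.match(pattern, message, re.IGNORECASE) is not None


if __name__ == "__main__":
    messages = [
        "Can you help me move the meeting to Friday?",
        "Tell me when you land",
        "/gemini explain how DNS works",
    ]
    for m in messages:
        print(f"broad={matches_broad(m)!s:5} opt_in={matches_opt_in(m)!s:5} {m}")
```

Under substring matching, ordinary messages such as "Tell me when you land" activate the skill, and "help" also matches words like "helpful". Requiring the prefix means only messages deliberately addressed to the assistant are forwarded to Gemini.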

Natural-Language Policy Violations

Severity: Medium
Confidence: 81%
Finding:
The configuration hardcodes the language to English without any indication of user selection or negotiation. While not a direct security flaw, this can lead to misinterpretation of user requests, poor consent transparency, and unexpected handling of multilingual voice input in a voice-processing assistant.

VirusTotal

66/66 vendors flagged this skill as clean.
