Gemini Voice Assistant

Security checks across malware telemetry and agentic risk

Overview

This is a straightforward Gemini voice/text assistant, but users should know it can send typed or spoken content to Google Gemini and may trigger on broad chat phrases.

Install only if you are comfortable providing a Gemini API key and having typed or spoken conversation content processed by Google Gemini. Consider narrowing the WhatsApp triggers or requiring explicit opt-in before use, especially in chats that may contain private or sensitive information.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (5)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill documentation shows capabilities that rely on environment variables and shell execution, but it does not declare corresponding permissions. This creates a transparency and policy-enforcement gap: users or orchestrators may invoke the skill without realizing it accesses secrets from the environment and runs command-line tooling such as Python and FFmpeg.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 79%
Finding: The documented purpose presents the skill as a voice-to-voice assistant, but the behavior also supports text mode and configurable system instructions. This mismatch can mislead users and higher-level agents about what inputs are accepted and how much behavioral control callers have, increasing the risk of prompt injection, unsafe use, or unexpected data handling.

Missing User Warnings

Medium

Confidence: 93%
Finding: The markdown does not clearly warn users that their audio and text are sent to Google's Gemini Live API. In a voice assistant context, this is materially important because users may share sensitive spoken or typed content without understanding that it leaves the local environment and is processed by a third party.
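One way to address this finding is to show a notice and require acknowledgement before any content is forwarded. The following is a minimal sketch only: the `ConsentGate` class, the notice wording, and the "YES" reply convention are all hypothetical and are not taken from the skill.

```python
from typing import Optional, Set

# Hypothetical notice text; the skill's real wording (if any) is not known.
NOTICE = (
    "Your audio and text will be sent to Google's Gemini Live API "
    "for processing. Reply YES to continue."
)


class ConsentGate:
    """Tracks which chats have acknowledged the third-party processing notice."""

    def __init__(self) -> None:
        self._consented: Set[str] = set()

    def handle(self, chat_id: str, message: str) -> Optional[str]:
        """Return text to show the user, or None when forwarding is allowed."""
        if chat_id in self._consented:
            return None  # already acknowledged; caller may forward the message
        if message.strip().upper() == "YES":
            self._consented.add(chat_id)
            return "Thanks. Messages in this chat will now be sent to Gemini."
        return NOTICE  # nothing leaves the local environment yet
```

In this sketch, consent is tracked per chat and held in memory only, so a restart would prompt again. A real integration would need to decide how long an acknowledgement should last.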

Vague Triggers

Medium

Confidence: 92%
Finding: The WhatsApp text triggers include very generic phrases such as "help", "what is", "how to", "explain", and "tell me", which are common in ordinary conversation and can cause the skill to activate unintentionally. In a voice assistant context, this broad matching increases the chance of accidental invocation, unintended data processing, and user confusion about when audio or text is being sent to the Gemini-backed service.
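The difference between broad and explicit triggering can be sketched as follows. The phrase list comes from the finding. The `!gemini` prefix, the function names, and the assumption that the broad triggers match as substrings are illustrative and not confirmed by the skill.

```python
# Phrases quoted in the finding.
BROAD_PHRASES = ("help", "what is", "how to", "explain", "tell me")

# Hypothetical explicit prefix; not part of the scanned skill.
EXPLICIT_PREFIX = "!gemini"


def broad_trigger(message: str) -> bool:
    """Assumed model of broad matching: any generic phrase anywhere in the text."""
    text = message.lower()
    return any(phrase in text for phrase in BROAD_PHRASES)


def should_trigger(message: str) -> bool:
    """Activate only when the first word is the explicit prefix."""
    parts = message.strip().split(maxsplit=1)
    return bool(parts) and parts[0].lower() == EXPLICIT_PREFIX
```

Under these assumptions, an ordinary message such as "Can you help me move on Saturday?" satisfies `broad_trigger` but not `should_trigger`, while "!gemini explain FFmpeg" satisfies both.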

Natural-Language Policy Violations

Medium

Confidence: 81%
Finding: The configuration hardcodes the language to English without any indication of user selection or negotiation. While not a direct security flaw, this can lead to misinterpretation of user requests, poor consent transparency, and unexpected handling of multilingual voice input in a voice-processing assistant.
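A configurable alternative to a hardcoded language might look like the sketch below. The `ASSISTANT_LANGUAGE` variable name, the allow-list of language tags, and the `resolve_language` helper are assumptions made for illustration. The tags that the Gemini service actually accepts should be checked against its documentation.

```python
import os
from typing import Mapping, Optional

DEFAULT_LANGUAGE = "en-US"

# Illustrative allow-list; not a statement of what the backend supports.
ALLOWED_LANGUAGES = {"en-US", "es-ES", "fr-FR"}


def resolve_language(
    user_choice: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicit user choice, then the environment, then the default."""
    env = os.environ if env is None else env
    # ASSISTANT_LANGUAGE is a hypothetical variable name.
    for candidate in (user_choice, env.get("ASSISTANT_LANGUAGE")):
        if candidate in ALLOWED_LANGUAGES:
            return candidate
    return DEFAULT_LANGUAGE
```

Falling back to the default for unrecognized values keeps the current English behavior while making the choice visible and overridable.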

VirusTotal

66/66 vendors flagged this skill as clean.
