whatsappVoiceOpenSkill

Security checks across malware telemetry and agentic risk

Overview

The skill mostly does what it says, but it needs review because it processes private voice messages, logs transcripts, runs a shell command from an audio path, and includes a long-running listener.

Install only if you are comfortable reviewing or patching it first. Replace the shell-string execSync call with argument-safe execution, run the listener only for trusted WhatsApp senders, disable raw transcript logging, protect any logs, and add explicit confirmations before enabling custom handlers that affect devices, accounts, files, or public outputs.
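As an illustration of the argument-safe execution recommended above, the following is a minimal sketch and not the skill's actual code. The helper names `assertAudioPath` and `runOnAudio`, the extension allowlist, and the timeout are assumptions introduced for this example. It relies on Node's `execFileSync`, which passes arguments as an array and does not spawn a shell by default.

```javascript
const { execFileSync } = require('node:child_process');
const path = require('node:path');

// Hypothetical allowlist of audio extensions; adjust to what the skill accepts.
const ALLOWED_EXTENSIONS = new Set(['.ogg', '.opus', '.mp3', '.m4a', '.wav']);

// Reject values that are not plausible audio file paths before any process runs.
function assertAudioPath(audioPath) {
  if (typeof audioPath !== 'string' || audioPath.length === 0) {
    throw new TypeError('audioPath must be a non-empty string');
  }
  if (audioPath.includes('\0')) {
    throw new Error('audioPath contains a NUL byte');
  }
  const ext = path.extname(audioPath).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    throw new Error(`unsupported audio extension: ${ext || '(none)'}`);
  }
  // An absolute path never starts with "-", so it cannot be read as an option.
  return path.resolve(audioPath);
}

// Run `command` with the audio path as one argv entry. No shell is involved,
// so characters such as `;`, `$()`, or backticks in the path stay literal.
function runOnAudio(command, leadingArgs, audioPath) {
  const safePath = assertAudioPath(audioPath);
  return execFileSync(command, [...leadingArgs, safePath], {
    encoding: 'utf8',
    timeout: 60_000, // assumed upper bound; tune for real audio lengths
  });
}
```

A caller would pass the transcription binary and its fixed flags as `command` and `leadingArgs`, so that only the validated path varies between invocations.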

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (14)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill documentation advertises capabilities that require network access and likely environment/config access, but it does not declare permissions or clearly scope those privileges. Undeclared capabilities are dangerous because reviewers and users cannot accurately assess what data the skill can access or where voice data, transcripts, or API credentials may be sent.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 94%
Finding: The stated purpose focuses on voice transcription and response generation, but the content also describes filesystem monitoring, live external data fetching, and command-style integrations such as smart-home or drone control. This mismatch is risky because it hides materially broader behavior, including local file access and potentially safety-relevant actuation, increasing the chance of unexpected execution paths or abuse.

Context-Inappropriate Capability

High

Confidence: 91%
Finding: The drone handler introduces a high-risk control surface unrelated to basic WhatsApp voice transcription and response generation. In a voice-driven messaging context, ambiguous or spoofed commands could query physical equipment and, if the handler is later extended, control it, creating safety, privacy, and unauthorized-operation risks disproportionate to the stated purpose.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The smart-home example performs real device interaction in response to a voice-derived intent, which exceeds passive conversational processing and can cause unauthorized physical state changes. Messaging and speech interfaces are prone to misrecognition, replay, or abuse, so directly toggling devices without stronger trust checks is risky.
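One possible form of a stronger trust check is a sender allowlist applied before any intent is handled. The sketch below is illustrative only: the phone numbers, `TRUSTED_SENDERS`, and both function names are hypothetical, and a real deployment would load the list from protected configuration.

```javascript
// Hypothetical allowlist; in practice load it from protected configuration.
const TRUSTED_SENDERS = new Set(['+15550100', '+15550101']);

// Normalize a sender ID by dropping spaces, dashes, and parentheses.
function normalizeSender(sender) {
  return String(sender).replace(/[\s\-()]/g, '');
}

// True only when the normalized sender appears in the allowlist.
function isTrustedSender(sender, trusted = TRUSTED_SENDERS) {
  return trusted.has(normalizeSender(sender));
}
```

An allowlist does not defend against replayed or misrecognized audio from a trusted sender, so it complements a confirmation step rather than replacing one.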

Missing User Warnings

Medium

Confidence: 92%
Finding: The documentation promotes an auto-listener that watches inbound WhatsApp voice messages but does not clearly warn users that incoming messages will be automatically monitored, transcribed, and processed. In a messaging context, this can create privacy, consent, and compliance risks because operators may deploy the skill without understanding that third-party audio content will be ingested and analyzed automatically.

Missing User Warnings

Medium

Confidence: 89%
Finding: The auto-listener silently watches a local inbound directory every few seconds and processes new voice files, but this monitoring behavior is not prominently disclosed as a privacy/security warning. Automatic surveillance of incoming media can expose sensitive conversations, create surprise data processing, and expand the attack surface if malicious files are dropped into the watched path.
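To narrow the attack surface of a watched directory, a listener could refuse any file that does not resolve inside that directory or lacks the expected extension. The following is a sketch under assumptions: `isAcceptableInbound` and the default `.ogg` extension are hypothetical and do not describe the skill's current behavior.

```javascript
const path = require('node:path');

// Return true only for files with the expected extension that resolve to a
// location strictly inside `watchDir`.
function isAcceptableInbound(watchDir, candidate, allowedExt = '.ogg') {
  const root = path.resolve(watchDir);
  const target = path.resolve(root, candidate);
  const rel = path.relative(root, target);
  // A relative path that is empty, absolute, or begins with ".." is outside.
  const inside =
    rel !== '' && !path.isAbsolute(rel) && rel.split(path.sep)[0] !== '..';
  return inside && path.extname(target).toLowerCase() === allowedExt;
}
```

This check is purely lexical. Symbolic links inside the watched directory would need an additional `fs.realpathSync` comparison, which is omitted here for brevity.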

Missing User Warnings

High

Confidence: 95%
Finding: The skill describes Whisper transcription, external integrations, and weather lookups without clearly warning that audio or transcripts may be transmitted to third-party services or APIs. This is dangerous because voice notes often contain sensitive personal or operational information, and users may unknowingly expose that data to remote providers or downstream integrations.

Missing User Warnings

Medium

Confidence: 97%
Finding: This code toggles a smart-home device immediately after intent handling, with no user-facing confirmation or secondary check. For a voice-to-action workflow, transcription errors, spoofed audio, or accidental invocation can trigger unintended physical actions, making silent execution dangerous.
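A confirmation gate in front of state-changing intents is one way to address this. The sketch below assumes a hypothetical intent shape (`{ name }`), a hypothetical `SENSITIVE_INTENTS` set, and a caller-supplied `confirm` callback; none of these names come from the skill itself.

```javascript
// Hypothetical set of intents that change device, account, or file state.
const SENSITIVE_INTENTS = new Set(['smart_home.toggle', 'drone.command']);

// Run `action` only if the intent is not sensitive, or if `confirm`
// resolves to exactly `true` for a sensitive one.
async function runWithConfirmation(intent, action, confirm) {
  if (SENSITIVE_INTENTS.has(intent.name)) {
    const approved = await confirm(intent);
    if (approved !== true) {
      return { executed: false, reason: 'not confirmed' };
    }
  }
  return { executed: true, result: await action(intent) };
}
```

Requiring a strict `true` means a timeout, an error value, or an ambiguous reply from the confirmation channel leaves the device untouched.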

Missing User Warnings

Medium

Confidence: 89%
Finding: The setup guide recommends running a daemon that automatically watches and processes incoming WhatsApp voice messages, but it does not clearly warn users that this involves continuous monitoring and transcription of potentially sensitive personal communications. In a messaging context, silent background processing can create privacy, consent, and data-handling risks, especially if operators deploy it without informing end users or restricting retention and access.

Missing User Warnings

Medium

Confidence: 90%
Finding: The daemon logs full voice transcripts and generated responses to stdout, which can expose sensitive user content in application logs, terminals, process supervisors, or centralized log pipelines. In a WhatsApp voice-processing context, transcripts may contain personal data, credentials, financial details, or commands, so retaining them without minimization or consent creates a meaningful privacy and data-leakage risk.

Missing User Warnings

Medium

Confidence: 94%
Finding: The code logs transcribed voice content and later logs full processing results, which can include sensitive spoken data. Voice interfaces commonly handle personal or confidential information, so writing transcripts to logs creates a durable disclosure path to operators, support staff, log backends, and anyone with log access.

Ssd 3

Medium

Confidence: 96%
Finding: The transcript is printed verbatim to logs immediately after transcription, exposing any sensitive content spoken by the user. In a WhatsApp voice-processing context, users may speak names, addresses, account details, or operational commands, so verbatim logging materially increases data leakage risk.
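One mitigation is to log a summary of the transcript instead of its text. The sketch below is an assumption about how that could look; `summarizeTranscript` and its field names are hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Build a log-safe summary: character count and a short digest for
// correlating log lines, never the transcript text itself.
function summarizeTranscript(transcript) {
  const text = String(transcript);
  const digest = createHash('sha256').update(text, 'utf8').digest('hex');
  return { chars: text.length, sha256Prefix: digest.slice(0, 12) };
}
```

A plain digest of a very short phrase can be guessed by hashing likely candidates, so a keyed HMAC or dropping the digest entirely may be preferable where that matters.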

Ssd 3

Medium

Confidence: 95%
Finding: On unknown commands, the response echoes the user's full spoken input back into the output channel. This can unnecessarily re-disclose secrets that were accidentally spoken or mis-transcribed, and it broadens exposure to messaging histories, downstream integrations, and anyone viewing the conversation.
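A fallback reply can avoid this by never interpolating the spoken input. The function name and wording below are hypothetical and shown only as a sketch.

```javascript
// Build a fallback reply from the list of supported commands only;
// the user's spoken input is deliberately not a parameter.
function unknownCommandReply(supportedCommands) {
  const list = supportedCommands.join(', ');
  return `Sorry, I did not recognize that command. Try one of: ${list}.`;
}
```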

Ssd 3

Medium

Confidence: 93%
Finding: The final returned object includes the full transcript together with sender metadata and timestamp, increasing the chance that sensitive speech is persisted, forwarded, or stored by callers and logs. In a messaging integration, this creates a broad natural-language data exposure surface beyond the minimum needed to fulfill the feature.
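Data minimization at the return boundary could be sketched as an explicit projection of the internal result. The field names `intent`, `reply`, and `ok` are assumptions for illustration; the skill's actual result shape is not shown in this report.

```javascript
// Copy only the fields a caller needs. Transcript, sender metadata, and
// timestamp are dropped because they are never copied across.
function toPublicResult(internal) {
  return {
    intent: internal.intent,
    reply: internal.reply,
    ok: internal.ok === true,
  };
}
```

Building the public object field by field, rather than deleting fields from the internal one, keeps any property added later private by default.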

VirusTotal

66/66 vendors flagged this skill as clean.
