Security audit

Ningyao Voice Launcher

Security checks across malware telemetry and agentic risk

Overview

This is a genuine local voice-chat launcher, but it warrants review: it can send screen and chat data to OpenAI, and it exposes an unauthenticated local web server that can run terminal commands.

Install only if you are comfortable running a local Node server that uses your OpenAI API key and may send chat text, speech-derived text, screen summaries, and screenshots to the configured model provider. Run it on a trusted machine and network, avoid sharing sensitive screens, verify the server is bound to localhost before use, and avoid the terminal panel until it has authentication and stricter file/path controls.
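The advice to verify a localhost binding can be enforced in code rather than checked by hand. Below is a minimal sketch. It assumes the server reads its bind address from some setting, such as a hypothetical `HOST` value, because the audit does not say how the launcher chooses its address. Node's `server.listen(port)` with no host argument listens on all interfaces, so an explicit loopback address matters. The names `isLoopbackHost` and `assertLoopbackBind` are illustrative.

```typescript
// Hypothetical helper: decide whether a configured bind address keeps the
// server reachable only from this machine. Names are illustrative and are
// not taken from the audited launcher.
function isLoopbackHost(host: string): boolean {
  const h = host.trim().toLowerCase();
  if (h === "localhost" || h === "::1" || h === "[::1]") return true;
  // Any IPv4 address in 127.0.0.0/8 is loopback.
  const parts = h.split(".");
  if (parts.length !== 4) return false;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((n) => isNaN(n) || n > 255)) return false;
  return octets[0] === 127;
}

// "0.0.0.0" and "::" listen on every interface, so they fail the check.
function assertLoopbackBind(host: string): void {
  if (!isLoopbackHost(host)) {
    throw new Error(`Refusing to start: bind host "${host}" is reachable from the network`);
  }
}
```

If `assertLoopbackBind(host)` is called before `listen`, a misconfigured `0.0.0.0` bind fails loudly. Without that check, the terminal endpoint would be silently exposed to the local network.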

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (12)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill advertises operational steps that use environment variables, network access, and shell/PowerShell execution, but it does not declare those capabilities or permissions. That mismatch weakens security review and informed consent, making it easier for a user or host system to invoke code with broader access than expected, especially because it installs dependencies and asks for an API key in a local `.env` file.
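A reviewer or host could surface this mismatch by diffing what a skill declares against what its steps actually use. The sketch below assumes a simple list of capability strings. The names `Capability` and `undeclaredCapabilities` and the capability labels are hypothetical and do not come from any real skill manifest format.

```typescript
// Hypothetical capability labels; the real manifest format is not shown in the audit.
type Capability = "env" | "network" | "shell" | "filesystem";

// Returns capabilities the code uses but the manifest never declares,
// in the order they were first observed and without duplicates.
function undeclaredCapabilities(
  declared: readonly Capability[],
  observed: readonly Capability[],
): Capability[] {
  const allowed = new Set<Capability>(declared);
  const missing: Capability[] = [];
  for (const cap of observed) {
    if (!allowed.has(cap) && missing.indexOf(cap) === -1) missing.push(cap);
  }
  return missing;
}
```

For this skill, declaring nothing while using environment variables, network access, and shell execution would yield `["env", "network", "shell"]`.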

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: This frontend exposes a terminal-execution workflow inside a voice-chat launcher, which expands the attack surface well beyond the skill's stated purpose. Although commands actually run on the server, the browser code presents terminal access as a supported feature. Any misuse, and any command-injection bug in the backend, is therefore directly reachable from the UI.
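If the terminal feature is kept at all, a common pattern is to disable it by default and require a per-session secret on every request. The sketch below assumes a hypothetical `TerminalGuardConfig` with an opt-in flag and a random token generated at startup. None of these names come from the audited code. JavaScript gives no hard timing guarantees, so in Node, `crypto.timingSafeEqual` on equal-length buffers would be the usual choice over the hand-written comparison shown here.

```typescript
// Hypothetical guard for a terminal endpoint such as /api/terminal.
interface TerminalGuardConfig {
  enabled: boolean;     // terminal feature stays off unless explicitly enabled
  sessionToken: string; // random secret shown only to the local user
}

// Compares every character instead of returning at the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt past the end yields NaN; `| 0` turns it into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

type GuardResult = { ok: true } | { ok: false; status: 403 | 404 };

function checkTerminalAccess(
  config: TerminalGuardConfig,
  authorizationHeader: string | undefined,
): GuardResult {
  // Behave as if the route does not exist when the feature is disabled.
  if (!config.enabled) return { ok: false, status: 404 };
  // An empty configured token must never authorize anything.
  if (config.sessionToken.length === 0) return { ok: false, status: 403 };
  const prefix = "Bearer ";
  if (!authorizationHeader || !authorizationHeader.startsWith(prefix)) {
    return { ok: false, status: 403 };
  }
  const presented = authorizationHeader.slice(prefix.length);
  return constantTimeEqual(presented, config.sessionToken)
    ? { ok: true }
    : { ok: false, status: 403 };
}
```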

Intent-Code Divergence

High

Confidence: 99%
Finding: The UI asks the user to enter a 'whitelist' command, but the code forwards any trimmed string directly to /api/terminal with no client-side restriction. If the backend trusts this input or validates it incompletely, an attacker can submit arbitrary commands, potentially leading to full host compromise, data loss, or malware execution.
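Client-side checks can always be bypassed, so the allowlist has to live on the server. Below is a minimal sketch. It assumes the backend accepts only exact, pre-approved command lines, and the entries in `ALLOWED_COMMANDS` are placeholders. `parseAllowedCommand` is a hypothetical name.

```typescript
// Placeholder allowlist of exact command lines (after whitespace normalization).
const ALLOWED_COMMANDS: ReadonlySet<string> = new Set([
  "git status",
  "git log --oneline -n 20",
  "node --version",
]);

type ParseResult =
  | { ok: true; program: string; args: string[] }
  | { ok: false; reason: string };

// Exact matching means shell syntax such as ";", "|" or "$(...)" can never
// appear in an accepted command, because no allowlisted entry contains it.
function parseAllowedCommand(input: string): ParseResult {
  const trimmed = input.trim();
  if (trimmed.length === 0) return { ok: false, reason: "empty command" };
  const argv = trimmed.split(/\s+/);
  const normalized = argv.join(" ");
  if (!ALLOWED_COMMANDS.has(normalized)) {
    return { ok: false, reason: `not on allowlist: ${normalized}` };
  }
  return { ok: true, program: argv[0], args: argv.slice(1) };
}
```

Passing the resulting `program` and `args` to Node's `child_process.execFile`, rather than `exec`, means no shell ever parses the user's string. Exact matching is deliberately stricter than checking only the program name, because even read-only tools such as `git diff` accept options (for example `--output`) that write files.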

Missing User Warnings

Medium

Confidence: 95%
Finding: The README explicitly advertises microphone voice input and screen-sharing-based summaries, but it does not clearly warn users that sensitive audio, on-screen content, and derived summaries may be transmitted to a model service via the configured API endpoint. In a voice/chat launcher skill, this omission is security-relevant because users may expose credentials, personal data, or confidential workspace content without informed consent.

Missing User Warnings

Medium

Confidence: 83%
Finding: The README says the tool captures browser microphone input and uses an OpenAI model, but it does not clearly warn users that spoken content or transcribed text may be sent to a third-party API. In a voice-chat context this can expose sensitive personal, workplace, or bystander audio-derived content without informed consent, making the omission a real privacy/security issue.
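Redaction does not replace the disclosure this finding asks for, but it can reduce accidental leaks of obvious secrets in speech-derived text before that text leaves the device. Below is a minimal sketch with illustrative patterns. `redactTranscript` is a hypothetical name, and the patterns will miss many secret formats.

```typescript
// Illustrative patterns only; they are not taken from the launcher.
const REDACTION_RULES: ReadonlyArray<[RegExp, string]> = [
  // Strings shaped like OpenAI-style secret keys ("sk-" prefix).
  [/\bsk-[A-Za-z0-9_-]{16,}/g, "[REDACTED_KEY]"],
  // Email addresses.
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[REDACTED_EMAIL]"],
];

// Applies each rule in order and counts how many substrings were replaced.
function redactTranscript(text: string): { text: string; redactions: number } {
  let redactions = 0;
  let out = text;
  for (const [pattern, replacement] of REDACTION_RULES) {
    out = out.replace(pattern, () => {
      redactions++;
      return replacement;
    });
  }
  return { text: out, redactions };
}
```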

Missing User Warnings

Medium

Confidence: 90%
Finding: Captured screen frames are periodically converted to image data and transmitted to /api/screen, which can expose sensitive on-screen information such as credentials, personal data, messages, or proprietary content. Although browser screen-sharing permission is required, this file does not provide a clear in-app disclosure that images are being sent to a backend for analysis, increasing the risk of uninformed consent.
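Below is a minimal sketch of an in-app gate. It assumes the client records explicit consent after showing a disclosure and throttles uploads to a chosen interval. `ScreenUploadGate` and its methods are hypothetical, and the audited file's capture loop is not shown.

```typescript
// Hypothetical client-side gate for periodic screen-frame uploads.
class ScreenUploadGate {
  private consented = false;
  private lastUploadMs = -Infinity;
  private uploads = 0;
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Call only after the user has seen a notice that frames leave the device.
  grantConsent(): void {
    this.consented = true;
  }

  revokeConsent(): void {
    this.consented = false;
  }

  // Returns true if a frame captured at `nowMs` may be sent to /api/screen,
  // and records the upload when it does.
  mayUpload(nowMs: number): boolean {
    if (!this.consented) return false;
    if (nowMs - this.lastUploadMs < this.minIntervalMs) return false;
    this.lastUploadMs = nowMs;
    this.uploads++;
    return true;
  }

  // Number of frames sent so far, suitable for a visible on-screen counter.
  get uploadCount(): number {
    return this.uploads;
  }
}
```

The `uploadCount` getter could drive the kind of visible reminder the next finding asks for, such as a badge showing how many frames have been sent.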

Missing User Warnings

Medium

Confidence: 88%
Finding: The UI offers screen sharing and explicitly states it will continuously inspect screen contents, but it does not present a clear privacy warning, scope description, or visible reminder of the sensitivity of shared content. In a voice-companion skill with 'screen awareness,' users may expose passwords, personal messages, or confidential work information without fully understanding the ongoing access, making this a real privacy/security weakness.

Missing User Warnings

Medium

Confidence: 94%
Finding: The `/api/screen` endpoint sends user-provided screen images to the OpenAI API for analysis, and this file contains no consent, disclosure, or gating mechanism before transmitting potentially sensitive visual data off-device. Because screenshots can include passwords, messages, documents, or other confidential content, silent transmission creates a real privacy and data-handling risk.

Missing User Warnings

Medium

Confidence: 92%
Finding: The `/api/chat` endpoint forwards recent chat history and optional `screenSummary` context to the OpenAI API, but this file shows no user-facing notice or permission check for that external sharing. Conversations and screen-derived context may contain personal, confidential, or regulated data, so undisclosed transmission is a legitimate privacy vulnerability.
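Below is a minimal sketch of limiting what `/api/chat` forwards. It assumes hypothetical user preferences for history length and an explicit opt-in for screen-derived context. The types are illustrative, and the handler's real request shape is not shown in the audit.

```typescript
// Hypothetical shapes for illustration only.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface SharingPrefs {
  maxHistory: number;          // how many recent turns may leave the device
  shareScreenSummary: boolean; // explicit opt-in for screen-derived context
}

// Builds only the data the user has agreed to send to the model provider.
function buildOutboundChat(
  history: readonly ChatMessage[],
  screenSummary: string | undefined,
  prefs: SharingPrefs,
): { messages: ChatMessage[]; screenSummary?: string } {
  const limit = Math.max(0, Math.floor(prefs.maxHistory));
  // slice(-0) equals slice(0) and would return everything, so handle 0 explicitly.
  const messages: ChatMessage[] = limit === 0 ? [] : history.slice(-limit);
  const payload: { messages: ChatMessage[]; screenSummary?: string } = { messages };
  if (prefs.shareScreenSummary && screenSummary) {
    payload.screenSummary = screenSummary;
  }
  return payload;
}
```

The explicit `limit === 0` branch matters because `slice(-0)` behaves like `slice(0)` and would forward the entire history.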

Vague Triggers

Low

Confidence: 85%
Finding: The package description is broad enough that an agent could invoke this skill in situations beyond a narrowly scoped local voice-launcher setup. Overbroad routing metadata can cause inappropriate activation, which is a security concern when a skill may install, configure, or launch local software on the user's machine.

Natural-Language Policy Violations

Medium

Confidence: 87%
Finding: The description explicitly frames the skill around a Chinese voice companion and browser-based Chinese voice chat, which can bias invocation toward a specific language/locale without confirming the user's preference. In an agent setting, forcing language defaults can lead to misrouting, degraded user control, and unintended processing or setup choices that the user did not request.

Missing User Warnings

Medium

Confidence: 84%
Finding: The script transmits both local file contents and an authentication token to a remote service without any interactive confirmation, dry-run preview, or guardrails against accidentally including sensitive files from the publish directory. In a developer environment, this can lead to unintended disclosure of secrets, internal notes, or unpublished assets if the folder contents are broader than expected.
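A dry-run step could list exactly what would be uploaded and hold back obviously sensitive files before any token is used. Below is a minimal sketch. It assumes the script already has a list of relative paths under the publish directory. `planPublish` and the patterns are hypothetical examples, not the audited script's logic.

```typescript
// Common examples of files that should not be published; not exhaustive.
const SENSITIVE_PATTERNS: readonly RegExp[] = [
  /(^|\/)\.env(\..*)?$/,           // .env, .env.local, ...
  /\.(pem|key|p12|pfx)$/i,         // private keys and certificate bundles
  /(^|\/)id_(rsa|ed25519|ecdsa)$/, // SSH private keys
  /(^|\/)\.git\//,                 // repository internals
  /(^|\/)\.npmrc$/,                // may contain registry auth tokens
];

interface PublishPlan {
  upload: string[];
  blocked: string[];
}

// Splits relative paths into what would be sent and what is held back, so
// the script can print a preview and ask for confirmation before uploading.
function planPublish(relativePaths: readonly string[]): PublishPlan {
  const plan: PublishPlan = { upload: [], blocked: [] };
  for (const raw of relativePaths) {
    const normalized = raw.replace(/\\/g, "/"); // treat Windows separators like "/"
    const sensitive = SENSITIVE_PATTERNS.some((re) => re.test(normalized));
    (sensitive ? plan.blocked : plan.upload).push(raw);
  }
  return plan;
}
```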

VirusTotal

65 of 65 vendors reported this skill as clean.


Static analysis

No suspicious patterns detected.