XiaBB

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real macOS dictation tool, but it needs Review because it combines microphone capture, cloud transcription, Accessibility control, auto-paste, local API-key storage, and under-disclosed install/persistence behavior.

Install only if you are comfortable granting microphone and Accessibility permissions to a tool that sends dictated audio to Google Gemini and auto-pastes text into the active app. Inspect install.sh before running it, decline launch-at-login unless you need it, avoid dictating passwords or confidential material, and use a restricted Gemini API key that you can rotate.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (34)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 84%
Finding: The skill advertises shell/env-related capabilities via its installation instructions, but does not declare corresponding permissions. This weakens user consent and reviewability because users may run shell commands or provide environment-backed secrets without clear upfront disclosure.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 93%
Finding: The description presents the skill as a simple voice-to-text utility, but the broader behavior includes installing an app into /Applications, changing code-signing/quarantine state, configuring persistence via LaunchAgents, and storing an API key locally. These are materially sensitive system and credential-handling actions that users should be told about before installation.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The app implements persistence by writing a LaunchAgent plist into ~/Library/LaunchAgents and exposing a 'Launch at Login' menu item, which is beyond the narrow core claim of transient voice-to-text input. While user-triggered and not covert, persistence materially changes the software's security posture because it causes automatic execution on login and increases the blast radius if the app is later modified or abused.
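
As a reviewer-side check for the persistence described above, the following is a minimal sketch that lists the LaunchAgent plists in a directory and shows what each one runs. The function name `list_launch_agents` is hypothetical, and the sketch does not assume any particular label or file name for this app's plist.

```python
import plistlib
from pathlib import Path
from typing import Any, Dict, List


def list_launch_agents(agents_dir: Path) -> List[Dict[str, Any]]:
    """Return file name, label, command and RunAtLoad flag for each readable plist."""
    results = []
    for plist_path in sorted(agents_dir.glob("*.plist")):
        try:
            with plist_path.open("rb") as fh:
                data = plistlib.load(fh)
        except Exception:
            continue  # skip unreadable or malformed plists
        if not isinstance(data, dict):
            continue  # a LaunchAgent plist is expected to have a dictionary root
        results.append({
            "file": plist_path.name,
            "label": data.get("Label"),
            "program": data.get("ProgramArguments") or data.get("Program"),
            "run_at_load": bool(data.get("RunAtLoad", False)),
        })
    return results


if __name__ == "__main__":
    # Standard per-user LaunchAgents location on macOS.
    for agent in list_launch_agents(Path.home() / "Library" / "LaunchAgents"):
        print(agent)
```

Running this before and after toggling 'Launch at Login' would show which entry the app adds and which executable it starts at login.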

Context-Inappropriate Capability

Low

Confidence: 92%
Finding: The app writes all transcribed text to the global clipboard and then simulates Cmd+V, affecting system-wide user state and potentially exposing sensitive dictated content to other apps that read the clipboard. Because this occurs automatically and is not prominently disclosed, users may unknowingly leak private audio-derived text or overwrite important clipboard contents.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The landing page simultaneously claims 'local'/'offline privacy' and says the product is powered by the Gemini API, which strongly implies user voice or derived text may be sent to a third-party service. In a voice-transcription tool, this is a meaningful trust and privacy issue because users may disclose sensitive content under the false impression that processing never leaves the device.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The comparison section says competitors require proprietary API keys while presenting XiaBB as zero-config/free, yet the features section states it is powered by the user's own Gemini API key. This contradiction can mislead users about setup requirements, vendor dependency, and privacy posture, especially for a tool marketed as simple and local.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The Vite `define` setting replaces `process.env.GEMINI_API_KEY` at build time, which embeds the secret directly into client-side JavaScript. Any user can recover it from the shipped bundle or browser tooling and then abuse the Gemini API key outside the intended app, causing unauthorized usage, billing exposure, and possible downstream data access depending on the key's scope.
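
One way to confirm this kind of exposure is to scan the built bundle for key-shaped strings. The sketch below is a heuristic under stated assumptions: the `AIza` prefix and length reflect how Google API keys commonly look rather than an official specification, and the function name `find_embedded_keys` and the output directory passed to it are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List

# Heuristic: Google API keys commonly start with "AIza" followed by 35
# URL-safe characters. This is an assumption, not an official format.
KEY_PATTERN = re.compile(r"AIza[0-9A-Za-z_\-]{35}")


def find_embedded_keys(bundle_dir: Path) -> Dict[str, List[str]]:
    """Map each .js file under bundle_dir to redacted key-like matches."""
    hits: Dict[str, List[str]] = {}
    for js_file in sorted(bundle_dir.rglob("*.js")):
        text = js_file.read_text(encoding="utf-8", errors="ignore")
        matches = KEY_PATTERN.findall(text)
        if matches:
            # Keep only a short prefix so the report does not leak the key again.
            hits[str(js_file.relative_to(bundle_dir))] = [m[:8] + "..." for m in matches]
    return hits
```

Any hit in shipped client code means the key should be treated as public and rotated; keeping the key behind a server-side proxy avoids the problem entirely.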

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The installer advertises a straightforward app install, but it also offers persistence via a LaunchAgent in the user's login items area. Even though this behavior is optional and prompted, persistence changes the security profile of the installation and should be explicitly disclosed up front because auto-start software can increase exposure and surprise users.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The script goes beyond basic installation by clearing quarantine attributes and optionally creating login persistence. Those actions are not inherently malicious, but they bypass normal user-review signals and establish ongoing execution, which is more dangerous than a simple app copy/install flow.
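
Before running an installer like this, a quick pass over the script can surface the lines that deserve a closer read. The following is a minimal sketch; the pattern list and the function name `flag_installer_lines` are illustrative choices for this example, not an exhaustive or authoritative rule set.

```python
import re
from typing import List, Tuple

# Illustrative patterns for actions this finding calls out.
SENSITIVE_PATTERNS = {
    "modifies extended attributes (possibly quarantine)": re.compile(r"\bxattr\b"),
    "changes code signature": re.compile(r"\bcodesign\b"),
    "touches LaunchAgents": re.compile(r"LaunchAgents|\blaunchctl\b"),
    "uses sudo": re.compile(r"\bsudo\b"),
}


def flag_installer_lines(script_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, reason, line) for each non-comment line that matches."""
    findings = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # ignore blank lines, comments and the shebang
        for reason, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(stripped):
                findings.append((lineno, reason, stripped))
    return findings
```

A match is a prompt to read the line, not a verdict; as the finding notes, these actions are not inherently malicious.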

Description-Behavior Mismatch

Low

Confidence: 93%
Finding: The app copies every transcription into the global clipboard before pasting it, which exposes dictated content to other applications, clipboard managers, and cloud clipboard sync services. Users may expect text insertion at the cursor, but not persistent clipboard replacement of potentially sensitive spoken content.
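
A common mitigation for clipboard replacement is to save the previous contents and restore them once the paste has happened. The sketch below illustrates that pattern only; it is not how the app works. It assumes the macOS `pbpaste` and `pbcopy` commands for the default helpers, handles plain text only, and the name `temporary_clipboard` is hypothetical.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


def mac_read_clipboard() -> str:
    """Read plain text from the macOS clipboard via pbpaste."""
    return subprocess.run(["pbpaste"], capture_output=True, text=True, check=True).stdout


def mac_write_clipboard(text: str) -> None:
    """Write plain text to the macOS clipboard via pbcopy."""
    subprocess.run(["pbcopy"], input=text, text=True, check=True)


@contextmanager
def temporary_clipboard(
    text: str,
    read: Callable[[], str] = mac_read_clipboard,
    write: Callable[[str], None] = mac_write_clipboard,
) -> Iterator[None]:
    """Place text on the clipboard for the duration of the block, then restore."""
    previous = read()
    write(text)
    try:
        yield  # the caller triggers the paste here
    finally:
        write(previous)  # restore even if the paste step raises
```

Restoring limits how long dictated text stays on the clipboard, but clipboard managers and sync services may still capture the transient value.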

Missing User Warnings

Medium

Confidence: 88%
Finding: The README explicitly states that audio is sent to Google Gemini, but the setup and usage sections do not prominently warn users that spoken content may leave the device and be processed by a third party. For a voice-to-text tool that captures potentially sensitive dictated content and auto-pastes it into arbitrary apps, insufficient disclosure can lead users to unknowingly transmit confidential information such as credentials, code, emails, or personal data.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill states it is powered by Google Gemini, but does not clearly warn that dictated speech and derived text may be transmitted to a third-party cloud service. Because the app captures potentially sensitive spoken content and inserts text at the cursor, undisclosed external processing creates meaningful privacy and data-handling risk.

Missing User Warnings

Low

Confidence: 88%
Finding: Requiring a Gemini API key without guidance on secure handling can lead users to expose credentials or misunderstand that usage is tied to their own cloud account. This is especially relevant because installation appears to store the key locally, creating avoidable credential-management risk.

Missing User Warnings

Medium

Confidence: 92%
Finding: The prompt explicitly instructs building functionality that overwrites the clipboard and simulates Cmd+V into whatever application currently has focus, but does not require a clear user confirmation, warning, or targeting constraint. In the context of a voice-to-text utility, this can cause unintended text injection into sensitive fields, chats, terminals, or admin tools, and can also destroy prior clipboard contents without notice.

Missing User Warnings

Medium

Confidence: 87%
Finding: The content explicitly describes sending recorded audio to Gemini cloud APIs and requiring a Gemini API key, but it does not clearly warn users that speech content leaves the device and may be processed by a third party. In a voice-input tool, that omission can lead users to unknowingly transmit sensitive spoken data such as credentials, proprietary code, or personal information.

Missing User Warnings

Low

Confidence: 82%
Finding: The article instructs users to place a Gemini API key in a plaintext local config file without any warning that the key is sensitive. While local config storage is common, failing to mention credential sensitivity increases the chance of accidental disclosure through backups, screenshots, repo commits, or permissive file permissions.
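
If the key does live in a plaintext file, restricting that file to its owner reduces the permissive-permissions risk mentioned above. The following is a minimal sketch for POSIX systems such as macOS; the function name `restrict_to_owner` is hypothetical, and the config path is left to the caller because the finding does not name it.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(config_path: Path) -> bool:
    """Set the file to mode 0600 if group or others have any access.

    Returns True if permissions were tightened, False if already owner-only.
    """
    mode = stat.S_IMODE(config_path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        os.chmod(config_path, stat.S_IRUSR | stat.S_IWUSR)
        return True
    return False
```

This does not protect the key from backups, screenshots, or repository commits; those still call for care and a key that can be rotated.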

Missing User Warnings

Medium

Confidence: 86%
Finding: The script promotes a workflow where audio is recorded and recognized text is automatically pasted into whatever application currently has focus, but it does not mention consent, accidental capture, or that speech is sent to a cloud provider. In this context, users could unintentionally dictate sensitive data into the wrong field or expose spoken content to third-party processing without understanding the privacy implications.

Missing User Warnings

Medium

Confidence: 92%
Finding: The script tells users to configure and use a Gemini API key for speech transcription but does not clearly disclose that recorded audio and derived transcripts are sent to Google's Gemini REST API for processing. In a voice-input tool aimed at developers, users may speak source code, credentials, internal project names, or other sensitive content, so the omission undermines informed consent and can cause unintended third-party disclosure.

Missing User Warnings

Medium

Confidence: 90%
Finding: The installation and usage section instructs users to install and run a microphone-driven transcription tool without warning that it captures spoken input and may process sensitive speech content. Because the skill context is a product promo and quick-start flow, users are encouraged to adopt it rapidly, which increases the chance they will use it on confidential conversations, code, or credentials without understanding the privacy implications.

Missing User Warnings

Medium

Confidence: 90%
Finding: The article explicitly describes recording user audio and sending it to the Gemini API, but it does not present a clear privacy notice about what data leaves the device, how long it may be retained, or what third-party processing terms apply. For a voice-input tool, users may dictate sensitive content, so omitting this disclosure can mislead users and increase privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 94%
Finding: Recorded microphone audio is base64-encoded and sent to Google's Gemini API, but the onboarding and primary UX shown here do not provide a clear, prominent warning that spoken content leaves the device for cloud processing. For a voice-input tool handling potentially sensitive speech, lack of transparent disclosure undermines informed consent and can expose confidential information.

Missing User Warnings

Medium

Confidence: 95%
Finding: The code automatically pastes transcribed text by posting synthetic Cmd+V events without explicit per-use confirmation. Because the app also requests Accessibility permissions and monitors global key events, this can insert sensitive or unintended text into whichever application is focused, including chats, terminals, admin consoles, or password fields.
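
One possible targeting constraint is a check that runs before the synthetic paste and declines when the focused target looks sensitive. The sketch below shows only the decision logic. The denylist entries, the function name `should_auto_paste`, and both inputs are hypothetical; obtaining the frontmost app and the focused field's type would need platform APIs not shown here.

```python
from typing import Optional

# Illustrative denylist chosen for this sketch, not taken from the app.
SENSITIVE_BUNDLE_IDS = frozenset({
    "com.apple.Terminal",
    "com.apple.keychainaccess",
})


def should_auto_paste(frontmost_bundle_id: Optional[str], focused_field_is_secure: bool) -> bool:
    """Return True only when pasting into the current target looks safe."""
    if frontmost_bundle_id is None:
        return False  # unknown target: fail closed
    if focused_field_is_secure:
        return False  # never paste into password-style fields
    return frontmost_bundle_id not in SENSITIVE_BUNDLE_IDS
```

When the check returns False, the tool could fall back to showing the transcript and asking for confirmation rather than pasting.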

Missing User Warnings

Medium

Confidence: 94%
Finding: The page explicitly tells users to run `bash install.sh` from a freshly cloned repository, which is a direct prompt to execute shell code on their local system without any warning, review step, or verification guidance. This is dangerous because website visitors may blindly run the installer, and if the repository or script is ever compromised, arbitrary code would execute with the user's privileges.
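
The verification step this finding asks for could be as simple as comparing the installer's SHA-256 digest against a value obtained through a trusted channel. The sketch below assumes such a reference digest exists, which the source does not confirm; the function names `sha256_of` and `matches_expected` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a reference digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A matching digest only shows the script is the one that was published; it is still worth reading install.sh before running it.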

Missing User Warnings

Medium

Confidence: 98%
Finding: The UI markets 'Total offline privacy' and 'Keep your thoughts local' while also advertising Gemini API usage, without warning that spoken or transcribed content may be transmitted externally. For a voice tool used during coding, users may dictate secrets, code, emails, or proprietary information, so misleading privacy claims materially increase the risk of unintended disclosure.

Missing User Warnings

High

Confidence: 98%
Finding: This line explicitly serializes `env.GEMINI_API_KEY` into the client build, making a sensitive credential available to end users without any protective boundary. In the context of a speech-to-text/productivity skill, exposing a provider API key is unnecessary and increases risk because users or third parties can extract and misuse it invisibly.

VirusTotal

65/65 vendors flagged this skill as clean.
