senseaudio-floating-audio-assistant-v2

Security checks across malware telemetry and agentic risk

Overview

The skill matches its audio-assistant purpose, but it needs review because it can capture, transmit, and store sensitive audio, clipboard text, transcripts, and local AudioClaw configuration data.

Review before installing. Use it only for audio and copied text you are authorized to capture and send to SenseAudio or AudioClaw. Avoid using it during confidential meetings or while copying secrets, stop it when finished, and periodically delete saved runs, clipboard files, logs, and generated metadata from workspace state.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (19)

Missing User Warnings

Medium

Confidence: 90%
Finding: The notes describe capturing system output audio and sending audio/text to remote SenseAudio ASR/translation services, but they do not include an explicit operator-facing privacy warning, consent requirement, or guidance about sensitive content. This can lead to accidental collection or transmission of conversations, media, notifications, or other regulated data without adequate notice, especially because system-audio capture is broad by design.

Missing User Warnings

Medium

Confidence: 90%
Finding: The quickstart instructs users to capture system audio, stream it to the SenseAudio service, and persist runtime/project state, but it does not clearly warn that sensitive audio and derived transcripts may leave the local machine and be stored. In this skill context, the omission is materially risky because the feature is specifically designed to capture broad system output, which can include meetings, notifications, credentials spoken aloud, or other confidential content.

Missing User Warnings

Medium

Confidence: 92%
Finding: The README states that recognized speech segments are persisted to a log file, but it does not clearly warn users that spoken content may contain sensitive personal, corporate, or regulated information and will be stored locally. In a real-time subtitle/system-audio assistant, silent persistence increases privacy risk because users may enable the tool in meetings, media playback, or shared environments without realizing transcripts are retained.

Missing User Warnings

Medium

Confidence: 95%
Finding: The system-audio setup instructions explain how to route macOS output through BlackHole and process playback audio, but they do not clearly warn that this can capture audio from other applications, including calls, meetings, videos, and notification sounds. In this skill’s context, that omission is more dangerous because the tool is specifically designed for continuous realtime transcription/subtitles, making inadvertent interception and processing of third-party audio more likely.

Missing User Warnings

Medium

Confidence: 96%
Finding: Clipboard contents are silently persisted to disk under the workspace state directory, which can capture sensitive data such as secrets, private messages, or API keys without an obvious disclosure in this code path. Because this assistant explicitly operates on clipboard text, the skill context increases the likelihood that users will paste sensitive material and not expect durable local storage.

Missing User Warnings

Medium

Confidence: 85%
Finding: The overlay can launch a background ASR runner using an environment file and local logging with no visible confirmation or disclosure in this host code. In an audio-capture assistant, silent startup of a recording/transcription pipeline materially raises privacy risk because users may not understand when monitoring begins or what configuration and secrets are being applied.

Missing User Warnings

Medium

Confidence: 88%
Finding: This function sends user-provided source text, including transcript content, to external SenseAudio endpoints. In this skill context, transcripts may contain sensitive meeting content, personal data, or proprietary material, so silent transmission to a third-party API creates a real confidentiality/privacy risk even though it appears to be part of the intended feature.

Missing User Warnings

Medium

Confidence: 90%
Finding: The script writes prompts, generated lyrics, source-path references, API responses, audio URLs, and other metadata to disk in plaintext without any warning or minimization. In this skill context, those files can preserve sensitive transcript-derived content and provider responses long after generation, increasing local disclosure risk to other users, backups, logs, or support bundles.

Missing User Warnings

Medium

Confidence: 94%
Finding: The script reads clipboard-derived text from a file and forwards the full contents to an external agent process without any disclosure, consent prompt, or sensitivity filtering. Clipboard contents often include passwords, tokens, private messages, or proprietary text, so silent transmission to another component can create a privacy and data-exposure risk.

Missing User Warnings

Medium

Confidence: 91%
Finding: The script forwards full transcript content to an external agent binary without any visible consent, notice, or data-classification check. In this skill context, transcripts may contain sensitive spoken content, so silently relaying them to another process increases privacy and data-handling risk, especially if that agent may log, persist, or further transmit the content.

Missing User Warnings

Medium

Confidence: 88%
Finding: This code captures microphone audio and sends it to a remote ASR service over WebSocket, but the file contains no explicit user-consent or disclosure mechanism before transmission. In an audio assistant context, this can expose sensitive spoken content, nearby conversations, or regulated data to a third party without sufficiently transparent notice.

Missing User Warnings

Medium

Confidence: 91%
Finding: The code persistently writes transcripts and subtitle data to JSON, JSONL, and text files, but there is no disclosure, retention control, or access restriction shown in this file. Persisted speech transcripts can contain credentials, personal data, business secrets, or other sensitive content that remains recoverable long after the session ends.
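
The missing access restriction noted above could be addressed by creating transcript files with owner-only permissions. The following is a minimal sketch for POSIX systems such as macOS, not the skill's actual code; the function name `write_private` and the file path are hypothetical.

```python
import os


def write_private(path: str, text: str) -> None:
    """Create or overwrite `path` so that only the owning user can read or write it."""
    # Mode 0o600 is applied when the file is newly created (subject to the umask).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Tighten the mode explicitly in case the file already existed with looser bits.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(text)
```

This limits exposure to other local users only; it does not address retention, backups, or disclosure to the transcript's own reader.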

Missing User Warnings

Medium

Confidence: 90%
Finding: The script starts a live audio capture subprocess and streams captured audio into a recognizer without any explicit consent prompt, notice, or confirmation at runtime. In the context of a floating audio assistant that can monitor system audio, this creates a real privacy risk because users may unintentionally capture sensitive conversations, media, or application audio.
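
A runtime confirmation of the kind this finding says is absent might look like the sketch below. It is an illustration under assumptions: the function name `confirm_capture`, the prompt wording, and the injectable `ask` parameter are all hypothetical and do not come from the skill.

```python
from typing import Callable


def confirm_capture(ask: Callable[[str], str] = input) -> bool:
    """Return True only if the operator explicitly types 'yes'."""
    answer = ask(
        "This will capture system audio and stream it to a remote ASR service. "
        "Type 'yes' to continue: "
    )
    # Anything other than an explicit 'yes' (including an empty reply) is a refusal.
    return answer.strip().lower() == "yes"
```

A launcher would call `confirm_capture()` before spawning the capture subprocess and exit if it returns `False`. Passing `ask` as a parameter keeps the check testable without a terminal.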

Missing User Warnings

Medium

Confidence: 94%
Finding: The script reads text from a local file representing clipboard content and sends that text to a third-party TTS endpoint without any built-in notice, consent check, or redaction step. Because clipboard contents often include secrets, personal data, or proprietary text, silent transmission to an external service creates a real confidentiality risk in this skill context.
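
A redaction step of the kind mentioned above could be sketched as follows. The patterns are illustrative assumptions only (the `sk-` token shape and the keyword list are not taken from the skill), and pattern matching of this sort will miss many real secrets.

```python
import re

# Illustrative patterns only; a real filter would need a broader, tested set.
SECRET_PATTERNS = [
    # Token-like strings with an assumed "sk-" prefix.
    re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
    # "password: value", "token=value", and similar key/value pairs.
    re.compile(r"(?i)\b(password|passwd|token)\s*[:=]\s*\S+"),
]


def redact(text: str) -> str:
    """Replace anything matching a known secret pattern before the text leaves the host."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Such a filter reduces accidental disclosure but is not a substitute for notice and consent before clipboard text is sent to an external service.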

Missing User Warnings

Medium

Confidence: 95%
Finding: The script unconditionally kills processes using broad pkill -f patterns, including SIGKILL fallbacks, without confirmation or validating ownership beyond matching command lines. This can terminate unrelated local processes that happen to match the patterns, causing denial of service or data loss, and is more concerning here because the script is intended to be run interactively as part of a desktop assistant workflow.
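
One alternative to pattern-based `pkill -f` is to keep a handle to the process the launcher itself started and signal only that process. The sketch below assumes the launcher is written in Python and owns a `subprocess.Popen` object; `stop_child` is a hypothetical helper, not part of the skill.

```python
import subprocess
import sys


def stop_child(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Stop only the process this launcher started, escalating if it does not exit."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, used only after the grace period
        return proc.wait()
```

Because the signal is addressed to a specific child rather than to every command line matching a pattern, unrelated processes cannot be terminated by accident.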

Missing User Warnings

Medium

Confidence: 85%
Finding: The script automatically upgrades pip and installs requirements from the network on execution, with no pinning or trust verification shown in this file beyond a local requirements hash. This creates supply-chain exposure and unexpected code execution during startup if dependencies are compromised, tampered with upstream, or if requirements are changed locally.
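
pip's hash-checking mode is one way to reduce the supply-chain exposure described above. The sketch below only builds the command and does not run it; the function name and requirements path are hypothetical, and it assumes the requirements file pins every package with `==` and a `--hash` entry, which that mode requires.

```python
import sys
from typing import List


def pinned_install_command(requirements_path: str) -> List[str]:
    """Build a pip invocation that rejects any requirement lacking a matching hash."""
    return [
        sys.executable, "-m", "pip", "install",
        "--require-hashes",  # pip aborts if a requirement has no pinned hash
        "-r", requirements_path,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Dropping the automatic pip self-upgrade would further reduce unexpected code execution at startup.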

Missing User Warnings

Medium

Confidence: 92%
Finding: Clipboard text is persisted to disk in a predictable workspace state directory before TTS generation, which can capture sensitive copied material such as passwords, tokens, or private messages. The risk is increased because clipboard contents are monitored continuously and there is no explicit disclosure or minimization in this file before writing the captured text to storage.

Missing User Warnings

Medium

Confidence: 93%
Finding: The guide instructs users to mirror all macOS system audio into a virtual capture device so the interpreter can ingest it, but it provides no warning that this may include sensitive content such as calls, notifications, DRM-protected media, passwords spoken aloud, or confidential meeting audio. In the context of a real-time transcription/translation assistant that forwards audio to a remote WebSocket ASR service, this omission materially increases the risk of unintended surveillance and data disclosure.

Missing User Warnings

Medium

Confidence: 93%
Finding: The optional live probe sends arbitrary text plus a bearer API key to a third-party SenseAudio endpoint. Although this is framed as a diagnostic feature, the CLI help does not clearly warn that enabling --live-tts will transmit user-supplied content off-host, so operators may trigger external data disclosure during testing without explicit informed consent.
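
The missing CLI warning could be added directly to the flag's help text. The following is a sketch assuming the diagnostic tool uses `argparse`; only the `--live-tts` flag name comes from the finding, while the parser description and help wording are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose help text states what --live-tts sends off-host."""
    parser = argparse.ArgumentParser(description="Hypothetical diagnostic CLI")
    parser.add_argument(
        "--live-tts",
        action="store_true",  # off unless the operator opts in explicitly
        help=(
            "WARNING: sends the supplied text and your API key to the remote "
            "SenseAudio endpoint. Disabled by default."
        ),
    )
    return parser
```

Keeping the flag opt-in and stating the off-host transmission in `--help` gives operators the information they need before running a live probe.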

VirusTotal

66/66 vendors flagged this skill as clean.
