realtime-transcription

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local transcription skill, but it records sensitive audio, stores transcripts, and asks for broader local authority than its workflow clearly needs.

Install only if you are comfortable with microphone or system audio being transcribed and saved locally, and with transcript text potentially being sent to your configured LLM for summarization. Prefer running it in a virtual environment, pin or review dependencies and the SenseVoice model, narrow the skill permissions and trigger phrases, and regularly delete or protect archived transcripts.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
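
The Behavioral AST pattern refers to static inspection of a skill's Python source. The following is a minimal sketch of the idea, not SkillSpector's actual implementation. It uses Python's standard `ast` module to flag calls such as `exec()`, `eval()`, and dynamic imports. The `RISKY_NAMES` and `RISKY_ATTRIBUTES` rule sets and the `find_risky_calls` helper are illustrative assumptions. `subprocess.run` is included because it is the call flagged in the first finding below.

```python
import ast

# Illustrative rule set (an assumption): bare names and module.attribute calls to flag.
RISKY_NAMES = {"exec", "eval", "__import__"}
RISKY_ATTRIBUTES = {"importlib.import_module", "subprocess.run"}

def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, call name) pairs for flagged calls in `source`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in RISKY_NAMES:
            # Direct call such as eval("...") or exec(code).
            findings.append((node.lineno, func.id))
        elif isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            # One-level attribute call such as subprocess.run([...]).
            qualified = f"{func.value.id}.{func.attr}"
            if qualified in RISKY_ATTRIBUTES:
                findings.append((node.lineno, qualified))
    return sorted(findings)
```

This only catches direct references. An aliased import (from subprocess import run) or a getattr lookup would slip past a check this simple.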

Findings (9)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    print(f" [{i}/{total}] Installing {pkg}... {desc}")
    print(f" ⏳ pip3 install {pkg} ", end="", flush=True)
    try:
        result = subprocess.run(
            [sys.executable, '-m', 'pip', 'install', '--quiet', pkg],
            capture_output=True, text=True, timeout=300
        )
Confidence: 91%
Finding: result = subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', pkg], capture_output=True, text=True, timeout=300)
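
One way to avoid the in-skill pip install is to check for dependencies and stop with a hint, leaving installation to the user. This is a minimal sketch. The `REQUIRED` mapping and the `missing_packages` and `install_hint` helpers are hypothetical, because the skill's real dependency list is not shown here. The sketch relies on `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found and never downloads anything.

```python
import importlib.util
import sys

# Hypothetical dependency list: import name -> pip package name.
REQUIRED = {"numpy": "numpy"}

def missing_packages(required: dict[str, str]) -> list[str]:
    """Return pip names of required packages whose top-level module is not importable."""
    # find_spec only locates modules; it never installs or downloads anything.
    return [pkg for module, pkg in required.items()
            if importlib.util.find_spec(module) is None]

def install_hint(missing: list[str]) -> str:
    """Build a command the user can review and run themselves."""
    return f"{sys.executable} -m pip install " + " ".join(missing)
```

A caller would compute missing = missing_packages(REQUIRED) and, if the list is non-empty, print install_hint(missing) and exit instead of invoking pip itself.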

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: Bundling package installation capability into a transcription tool is unnecessary privilege expansion and increases supply-chain risk. A user running the skill is induced to let it alter the environment and fetch code from package repositories, which is outside the core transcription function and can be abused if dependencies are compromised.
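
The overview advises pinning dependencies. In that spirit, a skill can verify installed versions against reviewed pins instead of fetching whatever the package index currently serves. This is a minimal sketch using `importlib.metadata`. The `PINNED` values and the `check_pins` helper are placeholders, not the skill's actual requirements. The comparison is an exact string match, which is the strictest reading of a pin.

```python
from importlib import metadata

# Hypothetical pins; in practice these would come from a reviewed requirements file.
PINNED = {"numpy": "1.26.4"}

def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Map each mismatched package to a problem description; empty means all pins match."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems
```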

Missing User Warnings

Medium

Confidence: 89%
Finding: The skill captures microphone or system audio and archives transcripts, but the README does not prominently warn that recordings may include passwords, meetings, personal data, or other sensitive content. In a transcription/archive skill, missing privacy disclosures increases the risk of accidental collection and long-term retention of sensitive information, especially because system audio capture can sweep in unrelated content.

Vague Triggers

Medium

Confidence: 93%
Finding: The trigger phrase "停止" / "stop" is extremely generic and can be spoken in many unrelated contexts, making accidental invocation plausible. In this skill, unintended triggering is more dangerous because it can terminate an active transcription session and immediately move into summary/archive handling of captured audio content.
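
A narrower trigger reduces accidental invocation. The sketch below accepts an utterance only when its entire normalized text matches a deliberately specific phrase. A bare "stop" or "停止", or a sentence that merely contains one of them, therefore does not end the session. The phrases in `STOP_PHRASES` are hypothetical assumptions, not the skill's configured triggers ("停止转写" means "stop transcribing"). The helper names are also hypothetical.

```python
import re
import unicodedata

# Hypothetical, deliberately specific stop phrases (not the skill's real trigger list).
STOP_PHRASES = {"stop transcription", "停止转写"}

def normalize(text: str) -> str:
    """Apply NFKC, lower-case, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", "", text)  # \w is Unicode-aware, so CJK characters survive
    return " ".join(text.split())

def is_stop_command(utterance: str) -> bool:
    """True only when the whole utterance is a stop phrase, not when one appears inside it."""
    return normalize(utterance) in STOP_PHRASES
```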

Missing User Warnings

High

Confidence: 98%
Finding: The skill captures microphone or system audio and archives transcripts, but the description does not present a clear privacy/consent warning before use. This is dangerous because users may unknowingly record sensitive conversations, meetings, credentials, or third-party audio, creating legal, privacy, and data-retention risk.
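
A consent gate addresses this finding and the related missing-warning findings. It shows a clear notice and requires an explicit affirmative answer before any capture starts. This is a minimal sketch. The notice wording and `confirm_capture` are hypothetical. The `ask` parameter exists only so the prompt can be replaced in tests.

```python
from typing import Callable

NOTICE = (
    "This session will record and transcribe microphone or system audio and save "
    "the transcript locally. It may capture passwords, meetings, personal data, or "
    "other people's voices. Make sure everyone being recorded has consented."
)

def confirm_capture(ask: Callable[[str], str] = input) -> bool:
    """Show the privacy notice and return True only on an explicit 'yes'."""
    print(NOTICE)
    answer = ask("Type 'yes' to start recording: ")
    return answer.strip().lower() == "yes"
```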

Missing User Warnings

Medium

Confidence: 88%
Finding: The script starts microphone or system-audio capture without an explicit privacy warning, despite collecting potentially sensitive conversations or media audio. In this skill context, that is materially risky because the feature is specifically designed to monitor and transcribe live audio streams.

Missing User Warnings

Medium

Confidence: 90%
Finding: Transcribed speech is persisted to disk without a prominent warning, which can expose sensitive spoken content to other local users, backups, logs, or later forensic recovery. This is especially relevant here because the default output path is automatic and the user may not realize retention is occurring.
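
To limit exposure of persisted transcripts, the output file can be created with owner-only permissions, and old transcripts can be purged on a schedule. This is a minimal POSIX-oriented sketch. The `save_transcript` and `purge_old_transcripts` helpers, the `*.txt` pattern, and the retention period are assumptions about how transcripts are stored, not details taken from the skill.

```python
import os
import time
from pathlib import Path

def save_transcript(path: Path, text: str) -> None:
    """Write a transcript readable and writable only by the current user (POSIX 0o600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The 0o600 mode applies when the file is created; an existing file keeps its mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)

def purge_old_transcripts(directory: Path, max_age_days: float) -> list[Path]:
    """Delete *.txt transcripts last modified more than max_age_days ago; return them."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for p in directory.glob("*.txt"):
        if p.is_file() and p.stat().st_mtime < cutoff:
            p.unlink()
            removed.append(p)
    return removed
```

On Windows, POSIX permission bits are largely ignored, so per-user ACLs would be needed instead. Unlinking a file also does not securely erase its contents. Purging therefore reduces retention but does not by itself prevent the forensic recovery mentioned above.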

Vague Triggers

Medium

Confidence: 87%
Finding: The flagged trigger phrase is a very short, generic Chinese stop command that is likely to appear in ordinary speech during a live transcription session. Because the skill continuously listens for voice commands while capturing audio, accidental activation could stop recording unexpectedly and trigger downstream summarization and archiving of partial content.

Vague Triggers

Medium

Confidence: 91%
Finding: The flagged English trigger phrase is broad and can easily occur in normal conversation, media playback, or meeting audio. In a real-time transcription skill, this makes unintended activation plausible: the phrase can start recording without clear user intent and increase privacy risk by capturing audio unexpectedly.
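
Requiring the whole utterance to match does not help much when the phrase itself is common speech. Another option is a two-step confirmation. The first detection only arms the trigger, and the command fires only if it is heard again within a short window. This is a minimal sketch. `ConfirmedTrigger` and the five-second default are hypothetical choices, and the same idea applies to a start phrase or a stop phrase.

```python
import time
from typing import Optional

class ConfirmedTrigger:
    """Fire only when the same command is detected twice within `window` seconds."""

    def __init__(self, window: float = 5.0):
        self.window = window
        self._armed_at: Optional[float] = None  # time of the first, unconfirmed detection

    def hear(self, now: Optional[float] = None) -> bool:
        """Record one detection; return True only if it confirms a recent one."""
        now = time.monotonic() if now is None else now
        if self._armed_at is not None and now - self._armed_at <= self.window:
            self._armed_at = None  # confirmed: reset so the next detection re-arms
            return True
        self._armed_at = now  # first detection, or the previous one expired: arm only
        return False
```

A single stray "stop" from a video or a colleague arms the trigger and then expires harmlessly. A user who means it repeats the command.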

VirusTotal

VirusTotal findings are pending for this skill version.
