Qqbot Voice Transcribe

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real QQ voice transcription skill, but its bot integration examples can turn user-controlled attachment filenames into shell commands on the host.

Review before installing. Do not copy the Gateway examples as-is: replace exec/execPromise shell strings with spawn or execFile argument arrays, generate safe server-side filenames, restrict processing to a controlled directory, and pin the Silk decoder to a trusted location. Only enable the swap instructions if you intentionally want a persistent system-level change, and disclose to bot users that voice messages will be transcribed and may be logged or stored.
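A minimal Python sketch of the "safe server-side filenames" and "controlled directory" advice. It assumes a hypothetical working directory `/var/lib/qqbot-voice/work` and an assumed allowlist of voice-note suffixes (`.amr`, `.silk`, `.slk`). `WORK_DIR`, `ALLOWED_SUFFIXES`, and `safe_local_path` are illustrative names, not part of the skill. The sender's attachment name only influences the suffix; the stem is a random UUID. Requires Python 3.9+ for `Path.is_relative_to`.

```python
import uuid
from pathlib import Path

# Hypothetical directory owned by the bot; all voice processing stays inside it.
WORK_DIR = Path("/var/lib/qqbot-voice/work")

# Assumed set of voice-note suffixes worth preserving; anything else becomes ".bin".
ALLOWED_SUFFIXES = {".amr", ".silk", ".slk"}


def safe_local_path(attachment_name: str, work_dir: Path = WORK_DIR) -> Path:
    """Return a server-generated path inside work_dir for an incoming attachment.

    The sender's filename only selects a suffix from the allowlist; the stem is a
    random UUID, so no user-controlled characters end up in the path.
    """
    suffix = Path(attachment_name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        suffix = ".bin"
    base = work_dir.resolve()
    candidate = (base / f"{uuid.uuid4().hex}{suffix}").resolve()
    # Defense in depth: refuse anything that resolves outside the working directory.
    if not candidate.is_relative_to(base):
        raise ValueError("generated path escapes the working directory")
    return candidate
```

Because the generated name contains only hex digits and an allowlisted suffix, nothing the sender typed can reach a later command line.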

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (8)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    output_mp3 = output_dir / f"{amr_path.stem}.mp3"
    print(f"🎵 正在解码...")  # "Decoding..."
    result = subprocess.run(
        ["bash", f"{DECODER_PATH}/converter.sh", str(fixed_path), "mp3"],
        capture_output=True, text=True,
Confidence: 89%
Finding:
    result = subprocess.run(
        ["bash", f"{DECODER_PATH}/converter.sh", str(fixed_path), "mp3"],
        capture_output=True, text=True, timeout=120
    )

Tainted flow: 'DECODER_PATH' from os.getenv (line 19, credential/environment) → subprocess.run (code execution)

Medium

Category: Data Flow
Content:
    output_mp3 = output_dir / f"{amr_path.stem}.mp3"
    print(f"🎵 正在解码...")  # "Decoding..."
    result = subprocess.run(
        ["bash", f"{DECODER_PATH}/converter.sh", str(fixed_path), "mp3"],
        capture_output=True, text=True,
Confidence: 96%
Finding:
    result = subprocess.run(
        ["bash", f"{DECODER_PATH}/converter.sh", str(fixed_path), "mp3"],
        capture_output=True, text=True, timeout=120
    )
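For the tainted DECODER_PATH flow, one option is to treat the environment value as a claim to check against a pinned location rather than as trusted input. This sketch assumes a hypothetical pinned directory `/opt/silk-v3-decoder` and assumes the environment variable is also named `DECODER_PATH`; `resolve_decoder_dir` and `decode_to_mp3` are illustrative helpers, not the skill's code. The call keeps the argument-list form of the original, so no shell string is built for the outer call, although how converter.sh itself handles its arguments is outside what this sketch can check.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical pinned install location for the Silk decoder (an assumption).
TRUSTED_DECODER_DIR = Path("/opt/silk-v3-decoder")


def resolve_decoder_dir(env_value: Optional[str] = None,
                        trusted: Path = TRUSTED_DECODER_DIR) -> Path:
    """Return the decoder directory, accepting an override only if it is the pinned path."""
    # Fall back to the environment when no explicit value is passed in.
    raw = env_value if env_value is not None else os.getenv("DECODER_PATH")
    if raw is None:
        return trusted.resolve()
    # resolve() normalizes "..", symlinks, and relative paths before comparing.
    candidate = Path(raw).resolve()
    if candidate != trusted.resolve():
        raise ValueError(f"DECODER_PATH {raw!r} is not the pinned decoder location")
    return candidate


def decode_to_mp3(fixed_path: Path, decoder_dir: Path) -> subprocess.CompletedProcess:
    """Invoke converter.sh with an argument list (no shell command string)."""
    return subprocess.run(
        ["bash", str(decoder_dir / "converter.sh"), str(fixed_path), "mp3"],
        capture_output=True, text=True, timeout=120,
    )
```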

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The README instructs users to create a 4 GB swap file and persist it by modifying /etc/fstab, a system-level change outside the core purpose of voice transcription. This expands the skill's blast radius and requires elevated privileges, and it can lead operators into lasting host configuration changes without clearly addressing operational risks, rollback steps, or safer alternatives.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The documentation promises a user confirmation flow, but the shown code automatically appends the transcription to bot message content without any confirmation gate. In a messaging context, this can expose sensitive spoken content, cause unintended actions based on mis-transcription, and violate user expectations about consent and review.
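A confirmation gate can be as small as a per-user holding area: the transcript is echoed back to the sender and released only after an explicit affirmative reply. This is a hypothetical sketch; `ConfirmationGate`, its methods, and the prompt wording are illustrative, and a real bot would wire `submit` and `resolve` into its own message handlers and probably add an expiry.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConfirmationGate:
    """Holds each sender's latest transcript until they explicitly approve it."""
    pending: Dict[str, str] = field(default_factory=dict)

    def submit(self, user_id: str, transcript: str) -> str:
        # A newer voice message replaces any transcript still awaiting review.
        self.pending[user_id] = transcript
        return f'I heard: "{transcript}". Reply "yes" to use it or "no" to discard it.'

    def resolve(self, user_id: str, reply: str) -> Optional[str]:
        # pop() ensures a transcript can be approved at most once.
        transcript = self.pending.pop(user_id, None)
        if transcript is None:
            return None
        # Anything other than an explicit "yes"/"y" discards the transcript.
        if reply.strip().lower() in {"yes", "y"}:
            return transcript
        return None
```

Only a non-None return from `resolve` would be appended to the bot's message content.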

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The code builds shell command strings from unquoted, attacker-influenced paths such as localPath, pcmPath, wavPath, and downloadDir, then executes them via child_process.exec. If an attachment filename or path contains shell metacharacters, this can lead to command injection and arbitrary command execution. In addition, invoking binaries from fixed local locations means trusting external executables without any integrity check.
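The injection here comes from handing one string to a shell. A Python analogue of the spawn/execFile fix is to pass an argument list to `subprocess.run` without `shell=True`, so each list element reaches the program as exactly one argument and shell metacharacters are never interpreted. `run_tool` and `pcm_to_wav_argv` are hypothetical helpers; the ffmpeg options (`-y`, `-f s16le`, `-ar`, `-ac`, `-i`) are standard, but the assumption that the skill's pcmPath to wavPath step uses ffmpeg, and the 24000 Hz rate, are illustrative rather than taken from the skill.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def run_tool(argv: List[str], timeout: int = 120) -> subprocess.CompletedProcess:
    """Run an external program without a shell; each list item is one argv entry."""
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=True)


def pcm_to_wav_argv(pcm_path: Path, wav_path: Path, sample_rate: int = 24000) -> List[str]:
    """Build (but do not run) an ffmpeg command for raw 16-bit mono PCM input."""
    return [
        "ffmpeg", "-y",
        "-f", "s16le", "-ar", str(sample_rate), "-ac", "1",
        "-i", str(pcm_path),
        str(wav_path),
    ]


if __name__ == "__main__":
    # A filename that would break out of a naively quoted shell string.
    hostile = 'voice.amr"; touch /tmp/pwned; echo "'
    result = run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", hostile])
    print(result.stdout.strip())  # the name comes back verbatim; nothing else ran
```

An argument list does not stop a name beginning with `-` from being read as an option, which is one more reason to pair it with server-generated filenames.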

Missing User Warnings

Medium

Confidence: 79%
Finding: The skill automatically transcribes user voice messages but does not warn operators or users about privacy implications, retention, or where audio/text may be processed and stored. Even if Whisper runs locally, transcription creates additional sensitive data artifacts and may expose private message content through logs, temp files, or downstream automation.
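To limit the extra artifacts this finding mentions, one approach is to keep audio and intermediate files in a private temporary directory that is deleted as soon as transcription finishes, and to log sizes rather than transcript text. In this sketch `transcribe` is a hypothetical callable standing in for however the skill actually invokes Whisper. On POSIX systems, `tempfile.TemporaryDirectory` creates a directory accessible only to the current user and removes it when the `with` block exits.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable

log = logging.getLogger("voice")


def transcribe_ephemeral(audio: bytes, transcribe: Callable[[Path], str]) -> str:
    """Write audio to a private temp dir, transcribe it, and delete the files on exit."""
    with tempfile.TemporaryDirectory(prefix="voice-") as tmp:
        audio_path = Path(tmp) / "input.audio"
        audio_path.write_bytes(audio)
        text = transcribe(audio_path)
    # Log metadata only, never the transcript itself.
    log.info("transcribed %d bytes of audio into %d characters", len(audio), len(text))
    return text
```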

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill automatically transcribes voice attachments and appends their contents into bot-visible message flows, yet the description does not clearly warn about this privacy-sensitive behavior. In practice, users may send voice notes expecting limited handling, while the skill silently converts speech into text that is easier to store, search, forward, or act upon.
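One lightweight way to surface this behavior is a one-time notice the first time a given user's voice message is handled. The class name and notice text below are hypothetical, and the in-memory set would need persistent storage in a real deployment.

```python
from typing import Optional, Set

# Illustrative wording; adapt it to the deployment's actual logging and retention.
VOICE_NOTICE = (
    "Voice messages sent to this bot are transcribed to text, "
    "and the transcript may be logged or stored."
)


class VoiceNoticeTracker:
    """Remembers which users have already been told about transcription."""

    def __init__(self) -> None:
        self._notified: Set[str] = set()

    def notice_for(self, user_id: str) -> Optional[str]:
        # Return the notice on a user's first voice message, then None afterwards.
        if user_id in self._notified:
            return None
        self._notified.add(user_id)
        return VOICE_NOTICE
```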

Missing User Warnings

Medium

Confidence: 99%
Finding: The code builds shell commands with interpolated values from an untrusted file path and environment-controlled parameters, then executes them via exec. If an attacker can influence the attachment filename/path or relevant environment variables, they may inject shell metacharacters and achieve arbitrary command execution in the bot process.
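For the environment-controlled parameters, one hedged option is to parse each value into a typed, allowlisted setting before it goes anywhere near a command line, and to fail closed otherwise. The variable names in the example comments (`QQBOT_SAMPLE_RATE`, `WHISPER_MODEL`) are hypothetical; the skill's actual variable names are not shown in this report.

```python
import os
import re
from typing import Collection


def int_from_env(name: str, default: int, allowed: range) -> int:
    """Read an integer setting, accepting only plain ASCII digits within `allowed`."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # fullmatch rejects anything beyond digits, including spaces, ";" or newlines.
    if not re.fullmatch(r"[0-9]{1,9}", raw):
        raise ValueError(f"{name} must be a plain integer, got {raw!r}")
    value = int(raw)
    if value not in allowed:
        raise ValueError(f"{name}={value} is outside {allowed}")
    return value


def choice_from_env(name: str, default: str, allowed: Collection[str]) -> str:
    """Read a string setting that must exactly match one of a fixed set of choices."""
    value = os.getenv(name, default)
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return value


# Example (hypothetical variable names):
# rate = int_from_env("QQBOT_SAMPLE_RATE", 24000, range(8000, 48001))
# model = choice_from_env("WHISPER_MODEL", "base", {"tiny", "base", "small", "medium"})
```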

VirusTotal

66 of 66 vendors reported this skill as clean.
