Qqbot Voice Transcribe

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real QQ voice transcription skill, but its bot integration examples can turn user-controlled attachment filenames into shell commands on the host.

Review before installing. Do not copy the Gateway examples as-is: replace exec/execPromise shell strings with spawn or execFile argument arrays, generate safe server-side filenames, restrict processing to a controlled directory, and pin the Silk decoder to a trusted location. Only enable the swap instructions if you intentionally want a persistent system-level change, and disclose to bot users that voice messages will be transcribed and may be logged or stored.
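The filename and directory advice above can be sketched as follows. This is a minimal illustration in Python, not code from the skill: WORK_DIR, ALLOWED_SUFFIXES, and safe_local_path are hypothetical names, and the extension list is an assumption about what the pipeline needs to accept.

```python
import secrets
from pathlib import Path

# Hypothetical working directory; pick a location owned by the bot process.
WORK_DIR = Path("/var/lib/qqbot-voice/work")

# Assumed set of extensions the pipeline is willing to accept.
ALLOWED_SUFFIXES = {".amr", ".silk", ".slk", ".wav", ".mp3"}


def safe_local_path(attachment_name: str, work_dir: Path = WORK_DIR) -> Path:
    """Return a server-generated path inside work_dir for an uploaded attachment.

    Only the validated extension of the user-supplied name is reused.
    """
    suffix = Path(attachment_name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported attachment type: {suffix!r}")
    candidate = (work_dir / f"{secrets.token_hex(16)}{suffix}").resolve()
    # Defence in depth: the resolved path must stay under the working directory.
    if not candidate.is_relative_to(work_dir.resolve()):
        raise ValueError("path escapes the working directory")
    return candidate
```

Because the stored name is random hex plus an allow-listed extension, nothing typed by the sender reaches later commands or path joins.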

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (8)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
output_mp3 = output_dir / f"{amr_path.stem}.mp3"
        
        print(f"🎵 正在解码...")  # "Decoding..."
        result = subprocess.run(
            ["bash", f"{DECODER_PATH}/converter.sh", str(fixed_path), "mp3"],
            capture_output=True,
            text=True,
Confidence
89% confidence
Finding
result = subprocess.run( ["bash", f"{DECODER_PATH}/converter.sh", str(fixed_path), "mp3"], capture_output=True, text=True, timeout=120 )

Tainted flow: 'DECODER_PATH' from os.getenv (line 19, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
output_mp3 = output_dir / f"{amr_path.stem}.mp3"
        
        print(f"🎵 正在解码...")  # "Decoding..."
        result = subprocess.run(
            ["bash", f"{DECODER_PATH}/converter.sh", str(fixed_path), "mp3"],
            capture_output=True,
            text=True,
Confidence
96% confidence
Finding
result = subprocess.run( ["bash", f"{DECODER_PATH}/converter.sh", str(fixed_path), "mp3"], capture_output=True, text=True, timeout=120 )
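One possible mitigation for this tainted flow is to resolve the decoder location and refuse anything outside a directory that only administrators can write to. The sketch below assumes a hypothetical TRUSTED_ROOT and a hypothetical helper resolve_converter. Only the converter.sh filename and the DECODER_PATH variable come from the finding.

```python
from pathlib import Path

# Hypothetical install root that only administrators can write to.
TRUSTED_ROOT = Path("/opt/silk-decoder")


def resolve_converter(decoder_dir: str, trusted_root: Path = TRUSTED_ROOT) -> Path:
    """Return the converter script path, refusing locations outside trusted_root."""
    root = trusted_root.resolve()
    # resolve() follows symlinks and collapses "..", so the check sees the real target.
    script = (Path(decoder_dir) / "converter.sh").resolve()
    if not script.is_relative_to(root):
        raise RuntimeError(f"decoder path {script} is outside {root}")
    if not script.is_file():
        raise RuntimeError(f"converter script not found: {script}")
    return script


# Intended use with the names quoted in the finding:
#   converter = resolve_converter(DECODER_PATH)
#   subprocess.run(["bash", str(converter), str(fixed_path), "mp3"], ...)
```

This does not verify the script's contents. A checksum or package signature check would be needed for that.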

Description-Behavior Mismatch

Medium
Confidence
96% confidence
Finding
The README instructs users to create and persist a 4GB swap file by modifying /etc/fstab, which is a system-level change outside the core purpose of voice transcription. This expands the blast radius of the skill, requires elevated privileges, and can lead operators to make lasting host configuration changes without clearly addressing operational risks, rollback steps, or safer alternatives.

Intent-Code Divergence

Medium
Confidence
90% confidence
Finding
The documentation promises a user confirmation flow, but the shown code automatically appends the transcription to bot message content without any confirmation gate. In a messaging context, this can expose sensitive spoken content, cause unintended actions based on mis-transcription, and violate user expectations about consent and review.
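A confirmation gate of the kind the documentation describes could look roughly like the following. This is a hedged sketch, not the skill's code: PendingTranscriptions, its method names, and the "yes" reply convention are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PendingTranscriptions:
    """Holds transcripts until the sender confirms them (hypothetical helper)."""

    _pending: Dict[str, str] = field(default_factory=dict)

    def propose(self, message_id: str, transcript: str) -> str:
        """Store the transcript and return the prompt to show the sender."""
        self._pending[message_id] = transcript
        return f'I heard: "{transcript}". Reply "yes" to use it, anything else to discard.'

    def resolve(self, message_id: str, reply: str) -> Optional[str]:
        """Return the transcript only on explicit confirmation; always clear it."""
        transcript = self._pending.pop(message_id, None)
        if transcript is not None and reply.strip().lower() == "yes":
            return transcript
        return None
```

The transcript is appended to the bot message only when resolve returns a string, so a mis-transcription or an unanswered prompt leads to no action.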

Context-Inappropriate Capability

Medium
Confidence
94% confidence
Finding
The code builds shell command strings with unquoted, attacker-influenced paths such as localPath, pcmPath, wavPath, and downloadDir, then executes them via child_process.exec. If an attachment filename or path contains shell metacharacters, this can lead to command injection and arbitrary command execution. Invoking binaries from fixed local locations without integrity checks also means the bot trusts whatever executable happens to sit at those paths.
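The difference between a shell string and an argument array can be seen in a small experiment. The sketch below is in Python rather than the Node.js of the Gateway examples, and it uses a child Python process as a stand-in for the real converter. hostile_name is an invented example value.

```python
import subprocess
import sys

# A filename an attacker might choose. Inside a shell string, ";" would end
# the intended command and start a new one.
hostile_name = "voice.amr; echo INJECTED"

# Stand-in for the real converter: a child process that prints its first argument.
# With an argument list no shell is involved, so the name arrives as one literal
# argv entry and the text after ";" is never executed.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile_name],
    capture_output=True,
    text=True,
    timeout=30,
    check=True,
)
received = result.stdout.strip()
```

Here received equals hostile_name exactly, metacharacters included. In Node.js the analogous change is to call child_process.execFile or child_process.spawn with an arguments array instead of passing an interpolated string to exec.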

Missing User Warnings

Medium
Confidence
79% confidence
Finding
The skill automatically transcribes user voice messages but does not warn operators or users about privacy implications, retention, or where audio/text may be processed and stored. Even if Whisper runs locally, transcription creates additional sensitive data artifacts and may expose private message content through logs, temp files, or downstream automation.
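One way to limit leftover artifacts is to keep each voice message in a private temporary directory that is removed as soon as transcription finishes. The sketch below assumes a hypothetical transcribe callable wrapping whatever speech model is in use. It does not address retention of the resulting text in logs or chat history.

```python
import tempfile
from pathlib import Path
from typing import Callable


def transcribe_ephemeral(audio_bytes: bytes, transcribe: Callable[[Path], str]) -> str:
    """Run transcribe on a private temp copy and delete every artifact afterwards.

    `transcribe` is a placeholder for the function that wraps the speech model.
    """
    # TemporaryDirectory creates a directory readable only by the current user.
    with tempfile.TemporaryDirectory(prefix="voice-") as tmp:
        audio_path = Path(tmp) / "input.audio"
        audio_path.write_bytes(audio_bytes)
        text = transcribe(audio_path)
    # The directory and everything in it are removed when the block exits,
    # including any intermediate files the transcriber wrote next to the input.
    return text
```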

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The skill automatically transcribes voice attachments and appends their contents into bot-visible message flows, yet the description does not clearly warn about this privacy-sensitive behavior. In practice, users may send voice notes expecting limited handling, while the skill silently converts speech into text that is easier to store, search, forward, or act upon.

Missing User Warnings

Medium
Confidence
99% confidence
Finding
The code builds shell commands with interpolated values from an untrusted file path and environment-controlled parameters, then executes them via exec. If an attacker can influence the attachment filename/path or relevant environment variables, they may inject shell metacharacters and achieve arbitrary command execution in the bot process.

VirusTotal

66/66 vendors flagged this skill as clean.
