Audio Summary

Security checks across malware telemetry and agentic risk

Overview

This audio summarization skill does what it claims, but it needs review because it embeds a live-looking API key and runs ffmpeg through a shell command string built unsafely from the user-supplied file path.

Review before installing. Only run it on trusted filenames and non-sensitive media you are comfortable sending to Bailian/DashScope, replace the embedded API key with your own scoped secret, and prefer a version that invokes ffmpeg with subprocess arguments instead of a shell command.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

os.system() or os exec-family call

High

Category: Dangerous Code Execution
Content:
        print(f"Extracting audio from video with aggressive compression: {video_path}")
        # Compress to 16 kHz mono MP3 at a 32k bitrate so the Base64-encoded file stays under 10 MB (supports roughly 10-15 minutes of video)
        cmd = f'ffmpeg -y -i "{video_path}" -vn -ar 16000 -ac 1 -ab 32k "{audio_path}" -loglevel error'
        os.system(cmd)
        return os.path.exists(audio_path)

    def transcribe_with_data_uri(audio_path):
Confidence: 97%
Finding: os.system(cmd)
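
A minimal sketch of the safer pattern, assuming the same ffmpeg flags as the flagged snippet: the command is passed to `subprocess.run` as an argument list, so no shell ever parses the file path. The helper names `build_ffmpeg_args` and `extract_audio` are hypothetical and do not come from the skill.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_args(video_path: str, audio_path: str) -> List[str]:
    """Return the ffmpeg argument vector (hypothetical helper, not from the skill)."""
    # Resolving to absolute paths means a file name starting with "-"
    # cannot be mistaken for an ffmpeg option.
    src = str(Path(video_path).resolve())
    dst = str(Path(audio_path).resolve())
    # Same flags as the flagged snippet, but one list element per argument.
    return [
        "ffmpeg", "-y", "-i", src,
        "-vn", "-ar", "16000", "-ac", "1", "-ab", "32k",
        dst, "-loglevel", "error",
    ]


def extract_audio(video_path: str, audio_path: str) -> bool:
    """Run ffmpeg without a shell; report success only on a zero exit code."""
    try:
        result = subprocess.run(build_ffmpeg_args(video_path, audio_path), check=False)
    except FileNotFoundError:
        # ffmpeg is not installed or not on PATH
        return False
    return result.returncode == 0 and Path(audio_path).exists()
```

With this shape, quotes, semicolons, or backticks in a file name reach ffmpeg as literal characters of a single argument. Checking the exit code also avoids treating a stale output file from an earlier run as success.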

Context-Inappropriate Capability

Medium

Confidence: 99%
Finding: A hardcoded API key for an external service is embedded directly in the source code. This exposes the credential to anyone who can read the file, enables unauthorized use of the account, and couples the skill to undisclosed third-party data transfer.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The skill includes shell command execution capability via ffmpeg invocation, which expands the attack surface beyond simple transcription logic. In this implementation, that capability becomes more dangerous because it is combined with unsafely built shell command strings from user input.
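
To illustrate why wrapping the path in double quotes is not enough, the sketch below tokenizes two command strings with Python's `shlex` module instead of executing them. The hostile file name and the function names are made up for the demonstration, and the ffmpeg flags are shortened. `shlex.quote` targets POSIX shells only.

```python
import shlex


def naive_command(video_path: str) -> str:
    # Mirrors the flagged pattern: the path is only wrapped in double quotes.
    return f'ffmpeg -y -i "{video_path}" -vn out.mp3'


def quoted_command(video_path: str) -> str:
    # shlex.quote escapes the path so a POSIX shell sees exactly one word.
    return f"ffmpeg -y -i {shlex.quote(video_path)} -vn out.mp3"


# Hypothetical hostile file name: the embedded quote closes the f-string's quote.
hostile = 'clip"; touch pwned; ".mp4'

naive_tokens = shlex.split(naive_command(hostile))
quoted_tokens = shlex.split(quoted_command(hostile))

# In the naive string the file name falls apart into several words. A real
# shell would additionally treat ";" as a command separator and run
# "touch pwned" as its own command.
print(len(naive_tokens), naive_tokens)

# With quoting, the whole file name survives as one argument.
print(len(quoted_tokens), quoted_tokens)
```

Quoting is a fallback for cases where a shell string cannot be avoided. Passing an argument list to `subprocess.run` removes the shell from the picture altogether.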

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill documentation says audio/video is transcribed via the external Bailian (百炼) qwen3-asr-flash service, but it does not give a clear, explicit privacy warning that user media content will be uploaded off-device to a third-party API. This can mislead users into sharing sensitive recordings without informed consent, increasing privacy, compliance, and data-handling risk.

Missing User Warnings

High

Confidence: 99%
Finding: The code uses a hardcoded API credential without any user-facing disclosure or secret handling controls. This creates immediate credential leakage risk and can result in account abuse, billing fraud, or broader compromise if the key has expansive permissions.
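
One common remediation is to read the credential from the environment at run time so that no key ships with the source. The sketch below assumes the variable is named `DASHSCOPE_API_KEY`; the skill may expect a different name. `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Fetch the API key from the environment (variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail loudly instead of falling back to a bundled default key.
        raise RuntimeError(
            f"{env_var} is not set. Export your own scoped key before running the skill."
        )
    return key
```

Failing when the variable is missing makes the dependency on a user-supplied, scoped secret explicit and keeps the credential out of version control.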

Missing User Warnings

High

Confidence: 96%
Finding: The skill base64-encodes the user's audio and sends it to a remote API for processing, but provides no explicit notice, consent, or configuration to opt out of external transfer. This can expose sensitive spoken content, private recordings, or regulated data to a third party without informed user approval.
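
A minimal sketch of an explicit consent gate placed in front of the Base64 encoding step. All names here (`confirm_upload`, `to_data_uri`, `prepare_upload`) and the `audio/mpeg` MIME type are assumptions for illustration. The skill's actual request format is not shown in this report.

```python
import base64
from typing import Callable, Optional


def confirm_upload(filename: str, size_bytes: int,
                   ask: Callable[[str], str] = input) -> bool:
    """Ask the user before any audio leaves the machine; default answer is no."""
    prompt = (
        f"{filename} ({size_bytes} bytes) will be uploaded to a third-party "
        "transcription API. Continue? [y/N] "
    )
    return ask(prompt).strip().lower() in {"y", "yes"}


def to_data_uri(audio: bytes, mime: str = "audio/mpeg") -> str:
    """Encode raw audio bytes as a Base64 data URI."""
    return f"data:{mime};base64," + base64.b64encode(audio).decode("ascii")


def prepare_upload(filename: str, audio: bytes,
                   ask: Callable[[str], str] = input) -> Optional[str]:
    """Return the data URI only after explicit approval, otherwise None."""
    if not confirm_upload(filename, len(audio), ask):
        return None
    return to_data_uri(audio)
```

Passing `ask` as a parameter keeps the prompt testable and lets a non-interactive caller supply its own policy, for example one that always declines.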

VirusTotal

63 of 63 vendors reported this skill as clean.
