vision-skill

Security checks across malware telemetry and agentic risk

Overview

This vision skill does what it claims, but users should understand that selected images and prompts are sent to Tencent COS and Doubao cloud services.

Install only if you are comfortable sending chosen images, reference images, prompts, and generated outputs through Tencent COS and Doubao/Volcengine. Use least-privilege cloud credentials, keep the COS bucket private or tightly scoped, avoid regulated or confidential documents unless your policies allow it, and periodically clear the local .tasks directory if task history may contain sensitive content.
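
The advice to clear the local .tasks directory periodically could be automated along the following lines. This is a minimal sketch: the helper name `purge_old_tasks`, the seven-day default, and the assumption that task history is stored as ordinary files under .tasks are illustrative and not taken from the skill.

```python
import time
from pathlib import Path
from typing import List


def purge_old_tasks(tasks_dir: Path, max_age_days: float = 7.0) -> List[Path]:
    """Delete files under tasks_dir older than max_age_days; return the removed paths.

    Hypothetical helper: the skill's real task-history layout is not documented here.
    """
    if not tasks_dir.is_dir():
        return []
    cutoff = time.time() - max_age_days * 86400  # age threshold as a Unix timestamp
    removed = []
    for path in sorted(tasks_dir.rglob("*")):
        # Only regular files are removed; directories are left in place.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Calling `purge_old_tasks(Path(".tasks"))` from a scheduled job would remove task files last modified more than a week ago.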

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (5)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
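# grep -v grep excludes the grep process itself; wc -l counts the lines
        # Note: this is only a crude soft limit; it is imprecise but keeps a sudden burst from overwhelming the machine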
        cmd = "ps -ef | grep worker.py | grep -v grep | wc -l"
        result = subprocess.check_output(cmd, shell=True)
        count = int(result.strip())
        
        MAX_CONCURRENT_WORKERS = 5
Confidence
91% confidence
Finding
result = subprocess.check_output(cmd, shell=True)
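
The same count can be obtained without passing a pipeline to a shell. The sketch below is one possible rewrite, not the skill's code: it assumes a POSIX `ps` is available on PATH, and the function names `count_matching_commands` and `count_workers` are hypothetical.

```python
import subprocess


def count_matching_commands(ps_output: str, script_name: str) -> int:
    """Count command lines in ps output that mention script_name."""
    return sum(1 for line in ps_output.splitlines() if script_name in line)


def count_workers(script_name: str = "worker.py") -> int:
    """Count running processes whose command line contains script_name."""
    # The command is an argument list and shell=True is not used,
    # so nothing is interpreted by /bin/sh.
    # "-A" selects all processes; "-o args=" prints only the command line, with no header.
    result = subprocess.run(
        ["ps", "-A", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_matching_commands(result.stdout, script_name)
```

Because the filtering happens in Python, there is no `grep` process to exclude from the count.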

Lp3

Medium
Category
MCP Least Privilege
Confidence
91% confidence
Finding
The skill documentation exposes capabilities that imply access to environment variables, local files, shell execution, network communication, and file writes, but it does not declare permissions or boundaries for those actions. That mismatch makes it easier for an agent or user to invoke the skill without understanding its access level, increasing the chance of unintended data exposure or unsafe execution in environments that rely on declared permissions for trust decisions.
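
The mismatch described here can be pictured as a set difference between what a skill uses and what it declares. The sketch below is purely illustrative: the capability names, the scope descriptions, and the `undeclared_capabilities` helper are assumptions and do not correspond to any real manifest format or to the skill's actual declarations.

```python
# Hypothetical declaration: capability name -> intended scope (illustrative values only).
DECLARED = {
    "env_read": "cloud credentials only",
    "file_read": "user-selected images",
    "network": "Tencent COS and Doubao endpoints",
    "file_write": "the local .tasks directory",
}

# Capabilities the finding says the documentation implies.
USED = {"env_read", "file_read", "shell_exec", "network", "file_write"}


def undeclared_capabilities(used: set, declared: dict) -> list:
    """Return the capabilities that are used but have no declared scope."""
    return sorted(used - set(declared))
```

With these example values, `undeclared_capabilities(USED, DECLARED)` reports only `shell_exec`, the kind of gap a reviewer would want explained or removed.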

Context-Inappropriate Capability

High
Confidence
94% confidence
Finding
Enumerating worker processes through a shell command gives the skill unnecessary host introspection powers unrelated to core image recognition/generation. In the skill context, this is more concerning because a user-facing vision tool generally should not need arbitrary shell-based process inspection on the host.
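
A concurrency cap does not require inspecting the host's process table. One alternative, sketched below under stated assumptions, is a fixed set of lock files that each worker claims atomically. The slot directory, the file naming, and the helpers `acquire_slot` and `release_slot` are hypothetical; only the limit of 5 comes from the flagged snippet.

```python
import os
from pathlib import Path
from typing import Optional

MAX_CONCURRENT_WORKERS = 5  # same limit as in the flagged snippet


def acquire_slot(slot_dir: Path, max_slots: int = MAX_CONCURRENT_WORKERS) -> Optional[Path]:
    """Claim one of max_slots lock files; return its path, or None if all are taken."""
    slot_dir.mkdir(parents=True, exist_ok=True)
    for i in range(max_slots):
        slot = slot_dir / f"slot-{i}.lock"
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # so two processes cannot claim the same slot.
            fd = os.open(slot, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner for debugging
        return slot
    return None


def release_slot(slot: Path) -> None:
    """Free a slot so another worker can claim it."""
    slot.unlink(missing_ok=True)
```

A worker would call `acquire_slot` before starting and `release_slot` in a `finally` block. A crashed worker leaves a stale lock file behind, so a real implementation would also need a cleanup rule, for example checking whether the recorded PID is still alive.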

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The README states that local files are automatically uploaded to Tencent COS and then sent to Doubao APIs, but it does not prominently and explicitly warn users that image contents and prompts leave the local environment during normal operation. In a vision skill handling potentially sensitive images, this can lead to unintentional disclosure of personal, confidential, or regulated data because users may not realize cloud transmission is mandatory for core functionality.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The skill states that local images are uploaded to Tencent COS and then sent to Doubao AI services, but it does not prominently warn users that image contents and potentially sensitive text extracted via OCR will leave the local environment. In a vision/OCR skill, this is especially relevant because users may process invoices, contracts, screenshots, or forms that often contain confidential or regulated data.
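
One way to address both Missing User Warnings findings is an explicit confirmation step before any upload. The sketch below is an assumption about how such a gate could look, not a description of the skill: the function name `confirm_upload`, the prompt wording, and the injectable `ask` parameter are hypothetical.

```python
from typing import Callable

DESTINATIONS = ("Tencent COS", "Doubao")  # services named in the findings


def confirm_upload(file_name: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly types 'yes' after seeing the warning."""
    prompt = (
        f"'{file_name}' and your prompt will leave this machine and be sent to "
        f"{' and '.join(DESTINATIONS)}. Type 'yes' to continue: "
    )
    # Anything other than an explicit "yes" is treated as a refusal.
    return ask(prompt).strip().lower() == "yes"
```

Passing `ask` as a parameter keeps the check testable and lets a non-interactive agent supply its own approval mechanism instead of `input`.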

VirusTotal

VirusTotal findings are pending for this skill version.
