i-can-see

Security checks across malware telemetry and agentic risk

Overview

This skill openly connects to a specified ESP32-CAM, saves a photo locally, and asks the agent to analyze it. Users should treat it as an intentional camera-capture tool.

Install this only if you control the ESP32-CAM at the configured local IP and are comfortable with OpenClaw taking and saving photos when you explicitly ask it to look. Use it in spaces where camera capture is acceptable, keep snapshots in a known folder, and delete sensitive images after analysis.

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (6)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill explicitly instructs the agent to access a networked ESP32-CAM endpoint, but no corresponding permission declaration is documented. Hidden or undeclared network access reduces transparency and weakens policy enforcement, making it easier for the skill to perform external communication without informed user or platform approval.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 88%
Finding: The skill claims to provide vision by taking a photo, but it also instructs saving images to arbitrary caller-supplied filesystem paths. That broad write capability exceeds the user-facing description and can enable unintended overwrites, persistence of sensitive photos, or storage in unsafe locations if the surrounding agent passes untrusted paths.
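One way to narrow the write capability described in this finding is to resolve every caller-supplied path against a single snapshot folder and reject anything that escapes it. The following is a minimal sketch, not the skill's actual code: `SNAPSHOT_DIR`, `resolve_snapshot_path`, and the allowed extensions are hypothetical names and values chosen for illustration.

```python
from pathlib import Path

# Hypothetical snapshot folder; the report does not state the skill's real default.
SNAPSHOT_DIR = Path("snapshots")

# Assumption: the camera returns JPEG data, so only JPEG extensions are accepted.
ALLOWED_SUFFIXES = {".jpg", ".jpeg"}


def resolve_snapshot_path(requested: str, base: Path = SNAPSHOT_DIR) -> Path:
    """Return an absolute path inside `base`, or raise ValueError."""
    base_resolved = base.resolve()
    # Joining with an absolute `requested` discards the base, and ".." segments
    # are collapsed by resolve(), so both cases are caught by the check below.
    candidate = (base_resolved / requested).resolve()
    if not candidate.is_relative_to(base_resolved):  # Python 3.9+
        raise ValueError(f"path escapes snapshot folder: {requested!r}")
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected file type: {candidate.suffix!r}")
    if candidate.exists():
        # Refuse to overwrite an existing file rather than silently replacing it.
        raise ValueError(f"file already exists: {candidate.name!r}")
    return candidate
```

With this check in place, a request such as `"../notes.jpg"` or `"/etc/passwd"` is rejected before any bytes are written, while `"desk.jpg"` resolves to a file under the snapshot folder.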

Vague Triggers

Medium

Confidence: 86%
Finding: Activation phrases like '看看' ("take a look") and '你看到了什么' ("what do you see") are broad everyday language and can cause accidental triggering. In this skill's context, mistaken activation is more serious because it leads to real-world image capture and network/device interaction, creating privacy and consent risks from normal conversation.

Vague Triggers

Medium

Confidence: 84%
Finding: The example invocation phrases remain ambiguous and reinforce automatic activation for general requests to 'look' or 'help me see this.' Because the workflow immediately captures and saves an image, ambiguity can translate directly into unauthorized photo capture or retention.
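A stricter activation rule could address both trigger findings by requiring an explicit camera reference in addition to a "look" verb. The sketch below is illustrative only: `should_trigger` and both term lists are hypothetical, and a real skill would declare triggers in whatever format the platform expects.

```python
# Hypothetical term lists; the skill's actual trigger configuration is not shown in the report.
CAMERA_TERMS = ("camera", "esp32-cam", "摄像头")
ACTION_TERMS = ("take a photo", "snapshot", "look", "拍照", "看看")


def should_trigger(message: str) -> bool:
    """Activate only when the message names the camera AND asks for a capture."""
    text = message.casefold()
    mentions_camera = any(term in text for term in CAMERA_TERMS)
    requests_capture = any(term in text for term in ACTION_TERMS)
    return mentions_camera and requests_capture
```

Under this rule, a bare '看看' or "help me look at this" in ordinary conversation does not start a capture, while "use the camera to take a photo" does.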

Missing User Warnings

High

Confidence: 96%
Finding: The skill describes taking real-world photos and analyzing them, but it provides no user-facing privacy notice about image capture, local storage, retention, or possible sensitive content in frame. In a vision skill connected to a physical camera, lack of notice and consent materially increases the risk of covert or surprising surveillance behavior.

Missing User Warnings

High

Confidence: 97%
Finding: The workflow tells the agent to automatically create directories, take a photo, save it, and analyze it without any step for consent, privacy warning, or cleanup. This makes the skill especially dangerous because routine operation inherently performs surveillance-like actions and persists potentially sensitive images on disk.
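The missing consent and cleanup steps could be added around the existing capture flow roughly as follows. This is a sketch under stated assumptions, not the skill's implementation: `capture_with_consent`, `delete_snapshot`, and the injected `confirm` and `fetch_image` callables are hypothetical, and the camera request is passed in as a function so that no device URL has to be guessed here.

```python
from pathlib import Path
from typing import Callable, Optional


def capture_with_consent(
    confirm: Callable[[str], bool],
    fetch_image: Callable[[], bytes],
    dest: Path,
) -> Optional[Path]:
    """Show a privacy notice, and capture only if the user agrees."""
    notice = (
        f"This will take a photo with the connected camera and save it to {dest}. "
        "The image may contain people or sensitive content. Continue?"
    )
    if not confirm(notice):
        return None  # No directory is created and no image is requested.
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(fetch_image())
    return dest


def delete_snapshot(path: Path) -> None:
    """Cleanup step: remove the snapshot once analysis is finished."""
    path.unlink(missing_ok=True)  # Python 3.8+
```

In use, `confirm` would be whatever mechanism the agent has for asking the user a yes/no question, and `delete_snapshot` would run after analysis unless the user asks to keep the image.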

VirusTotal

61/61 vendors flagged this skill as clean.
