MoltShell Vision Engine

v1.0.0

Give your text-based OpenClaw agent the ability to see and describe images

1 star · 510 downloads · 2 current · 4 all-time
by Anton Melnyk (@melnyk-anton)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name/description promise (give a text-based agent the ability to see/describe images) matches the implementation: the code POSTs the image URL and prompt to MoltShell endpoints, polls for a job result, and returns a text description. The included endpoints (moltshell.xyz) and service_id are consistent with the stated MoltShell integration.
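The submit-then-poll flow described above can be sketched roughly as follows. This is a minimal sketch, not the skill's actual index.ts: the `JobStatus` shape, its state names, and the injected `checkJob` and `sleep` callbacks are all hypothetical, chosen so that no MoltShell endpoint path or response format has to be assumed.

```typescript
// Hypothetical job shape; the real MoltShell response format is not shown in this listing.
type JobStatus =
  | { state: "pending" }
  | { state: "done"; description: string }
  | { state: "failed"; error: string };

// Polls checkJob until the job is done, has failed, or maxAttempts checks were made.
// checkJob and sleep are injected so the loop can run without network access or timers.
async function pollForDescription(
  checkJob: () => Promise<JobStatus>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 1000,
  maxAttempts = 30
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await checkJob();
    if (job.state === "done") return job.description;
    if (job.state === "failed") throw new Error(`Vision job failed: ${job.error}`);
    await sleep(intervalMs); // still pending: wait before the next check
  }
  throw new Error(`Vision job still pending after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stuck job from blocking the agent's reasoning loop indefinitely.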
Instruction Scope
SKILL.md and the code restrict operations to submitting the image URL and prompt to MoltShell and polling for results. However, the skill sends the provided image URL off-platform (to moltshell.xyz) — if the URL points to internal or sensitive resources that could be fetched by MoltShell, that may expose data (SSRF / data exfiltration risk inherent to remote image fetch). The SKILL.md explicitly asks for a public URL, which is appropriate guidance.
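One way to act on the public-URL guidance is to screen URLs before they leave the agent. The sketch below is an illustration only: `isLikelyPublicHttpUrl` is a hypothetical helper, not part of the skill, and it inspects the hostname text without resolving DNS, so a public-looking hostname that resolves to a private address would still pass.

```typescript
// Hypothetical pre-flight check: rejects URLs that obviously point at local or
// private-network hosts. It looks at the hostname text only and does not resolve DNS.
function isLikelyPublicHttpUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.endsWith(".local") || host.endsWith(".internal")) return false;
  if (host.startsWith("[")) return false; // IPv6 literal: rejected conservatively

  const ipv4 = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (ipv4) {
    const a = Number(ipv4[1]);
    const b = Number(ipv4[2]);
    if (a === 0 || a === 10 || a === 127) return false; // this-network, private, loopback
    if (a === 169 && b === 254) return false; // link-local, including cloud metadata addresses
    if (a === 172 && b >= 16 && b <= 31) return false; // private
    if (a === 192 && b === 168) return false; // private
  }
  return true;
}
```

A check like this reduces accidental leaks but is not a complete SSRF defense; the remote service still decides what it actually fetches.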
Install Mechanism
No install spec is present (instruction-only skill). There is a code file included (index.ts) which will run in the agent runtime, but there are no downloads or external installers, so no high-risk install behavior is present.
Credentials
The skill optionally uses MOLTSHELL_API_KEY for production (documented in SKILL.md and referenced by the code) but the registry metadata shows 'Required env vars: none' and 'Primary credential: none' — a metadata mismatch. The code also reads OPENCLAW_AGENT_ID / OPENCLAW_BOT_ID (to set x-openclaw-bot-id) but those env vars are not declared in the registry metadata or SKILL.md. These are minor inconsistencies but worth noting because they affect privacy (agent id header) and configuration.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system configs. It only reads a small set of environment variables and generates a session UUID fallback; this level of presence is appropriate for its function.
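The environment lookup and session UUID fallback noted in the two findings above might look something like this. It is a sketch under assumptions: the review does not say which variable takes precedence, so the order here (OPENCLAW_AGENT_ID first) is a guess, and the function names are hypothetical. The environment and the ID generator are passed in rather than read globally.

```typescript
type Env = Record<string, string | undefined>;

// Resolves the value sent as x-openclaw-bot-id.
// Assumption: OPENCLAW_AGENT_ID wins over OPENCLAW_BOT_ID; the review does not state the precedence.
function resolveBotId(env: Env, generateSessionId: () => string): string {
  return env["OPENCLAW_AGENT_ID"] || env["OPENCLAW_BOT_ID"] || generateSessionId();
}

// Builds only the identifier header; any API key header is out of scope for this sketch.
function buildBotIdHeader(env: Env, generateSessionId: () => string): Record<string, string> {
  return { "x-openclaw-bot-id": resolveBotId(env, generateSessionId) };
}
```

In a Node runtime this could be called as `buildBotIdHeader(process.env, () => crypto.randomUUID())`. With both variables unset, only the generated per-session value reaches the external service.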
Assessment
This skill appears to do what it says: it sends a provided image URL and a prompt to MoltShell (moltshell.xyz) and returns a text description. Before installing, consider:

  1. Privacy: any image URL you pass will be fetched by MoltShell. Do not pass URLs that point to private or internal resources or that contain sensitive data.
  2. Credentials: the skill can use MOLTSHELL_API_KEY for production (SKILL.md documents this), but the registry metadata does not list that env var. Set the key only if you trust MoltShell.
  3. Agent identifier: the skill may add an x-openclaw-bot-id header using OPENCLAW_AGENT_ID or OPENCLAW_BOT_ID (or a generated UUID). If you want to avoid exposing an agent identifier to the external service, do not set those env vars.
  4. Quota: the public sandbox key is limited and will return 402 when exhausted.

If you rely on this skill in production, create and use your own MoltShell API key and review MoltShell's privacy/security policy. If you need more certainty, ask the publisher for an explicit list of environment variables and a privacy policy for the MoltShell service.

Like a lobster shell, security has layers — review code before you run it.

Tags: image-to-text, latest, moltshell, scraper, utilities, vision
510 downloads
1 star
1 version
Updated 1mo ago
v1.0.0
MIT-0

👁️ MoltShell Vision Engine

Standard OpenClaw agents are blind. If your web-scraping bot hits an infographic, or your chatbot receives a user-uploaded image, the agent crashes because it cannot process pixels.

This skill acts as the visual cortex for your text-based bots. It securely routes image URLs to a Replicate-powered Vision-Language Model via the MoltShell M2M Marketplace and returns a structured text description so your agent can continue its reasoning loop.


⚡ Zero-Config Sandbox (Try it instantly)

We hate API paywalls as much as you do. This skill comes pre-configured with a Public Sandbox Key. You do not need to create an account or provide a credit card to test it. Just install the skill, and your bot will instantly receive $0.25 of free shadow-compute — enough for roughly 5 vision runs at $0.05 each.


🛠️ Usage

Once installed, your OpenClaw agent can call the moltshell_vision tool whenever it encounters an image.

Input Parameters

Parameter   Type     Required   Description
image_url   string              The public URL of the image to analyze
prompt      string              What the agent needs to know about the image

Example

Agent receives an image URL → calls moltshell_vision:

  image_url: "https://example.com/dashboard-screenshot.png"
  prompt:    "Describe the layout and key UI elements in this screenshot"

Tool returns:
  "The screenshot shows a modern web dashboard with a dark theme.
   The top navigation bar contains a logo on the left and user
   settings on the right. The main content area displays a grid
   of cards with metrics including revenue, active users, and..."
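For callers that build the tool arguments in code, the two parameters can be typed and checked before the call. This is a sketch under an assumption: both fields are treated as required, non-empty strings, which this copy of the parameter table does not confirm. `isMoltshellVisionInput` is a hypothetical helper, not part of the skill.

```typescript
// Input shape taken from the parameter names above.
// Assumption: both fields are required, non-empty strings.
interface MoltshellVisionInput {
  image_url: string;
  prompt: string;
}

// Type guard for untrusted tool arguments (hypothetical helper).
function isMoltshellVisionInput(value: unknown): value is MoltshellVisionInput {
  if (typeof value !== "object" || value === null) return false;
  const fields = value as Record<string, unknown>;
  const imageUrl = fields["image_url"];
  const prompt = fields["prompt"];
  return (
    typeof imageUrl === "string" && imageUrl.trim() !== "" &&
    typeof prompt === "string" && prompt.trim() !== ""
  );
}
```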

💳 Going to Production

The built-in sandbox wallet is strictly for testing and will throw a 402 Payment Required error once your free compute runs out.
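An agent that wraps this tool may want to tell an exhausted sandbox apart from other failures. The sketch below keys only on the HTTP status code mentioned above; the verdict names and the hint text are hypothetical and not taken from the skill's code.

```typescript
// Coarse outcome of a MoltShell HTTP response, based only on the status code.
type ResponseVerdict = "ok" | "quota_exhausted" | "failed";

function classifyStatus(httpStatus: number): ResponseVerdict {
  if (httpStatus === 402) return "quota_exhausted"; // sandbox credit used up
  if (httpStatus >= 200 && httpStatus < 300) return "ok";
  return "failed";
}

// Hypothetical hint the agent could surface instead of treating 402 as a generic error.
function quotaHint(verdict: ResponseVerdict): string | null {
  return verdict === "quota_exhausted"
    ? "MoltShell sandbox credit is exhausted. Set MOLTSHELL_API_KEY to a dedicated key."
    : null;
}
```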

To use this skill in production:

  1. Go to https://moltshell.xyz
  2. Generate a dedicated API Key
  3. Add it to your OpenClaw environment variables:
     MOLTSHELL_API_KEY=sk_molt_your_key_here

That's it — no other configuration changes needed. The skill automatically uses your dedicated key when the environment variable is set.
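The key selection described above can be pictured as a simple fallback. This is a sketch, not the skill's code: `resolveApiKey` is a hypothetical name, and the sandbox key is represented by a placeholder because the real value ships inside the skill and is not reproduced here.

```typescript
// Placeholder only: stands in for the built-in Public Sandbox Key.
const SANDBOX_KEY_PLACEHOLDER = "sandbox-key-placeholder";

// Prefers a dedicated key from the environment and falls back to the sandbox key.
function resolveApiKey(
  env: Record<string, string | undefined>
): { key: string; source: "dedicated" | "sandbox" } {
  const dedicated = (env["MOLTSHELL_API_KEY"] || "").trim();
  if (dedicated !== "") return { key: dedicated, source: "dedicated" };
  return { key: SANDBOX_KEY_PLACEHOLDER, source: "sandbox" };
}
```

Returning the source alongside the key lets the caller log which wallet a request will draw from without printing the key itself.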
