GLM-V-Caption

Pass

Audited by ClawScan on May 1, 2026.

Overview

This skill looks purpose-aligned, but it runs a local captioning script and sends selected media to Zhipu using your API key.

Before installing, be comfortable with running the included Python helper, providing a ZHIPU_API_KEY, and sending selected media or media URLs to Zhipu. Use a dedicated API key, monitor usage, and avoid confidential files unless that external processing is acceptable.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI01: Agent Goal Hijack

What this means

If the Zhipu API is unavailable or unsuitable, the agent may stop instead of offering another way to caption the media.

Why it was flagged

These instructions force an API-only workflow and forbid any fallback when the API fails. This is disclosed and purpose-aligned, but it meaningfully constrains how the agent may respond.

Skill content

ONLY use GLM-V API — Execute the script `python scripts/glmv_caption.py`; NEVER caption media yourself; IF API fails — Display the error message and STOP immediately; NO fallback methods

Recommendation

Install this skill when you specifically want Zhipu GLM-V captioning; disable it or avoid invoking it if you want local, built-in, or fallback captioning.

Info

#ASI05: Unexpected Code Execution

What this means

The agent will execute local helper code to prepare media and call the Zhipu API.

Why it was flagged

Using the skill involves running the included local Python script. This is central to the skill's design and is not hidden.

Skill content

Execute the script `python scripts/glmv_caption.py`

Recommendation

Use the skill only from a source you trust, and review the included script before providing credentials or private media.
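One way to act on this recommendation is to record a hash of the script at the time you review it and check that hash before each run. The sketch below is illustrative only and is not part of the skill or of ClawScan; the helper names `sha256_of` and `script_unchanged` are hypothetical, and the reviewed digest is something you would record yourself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def script_unchanged(path: str, reviewed_digest: str) -> bool:
    """True only if the file exists and still matches the digest recorded at review time."""
    return Path(path).is_file() and sha256_of(path) == reviewed_digest.lower()
```

For example, after reading `scripts/glmv_caption.py` you could store the output of `sha256_of("scripts/glmv_caption.py")` and call `script_unchanged` with that value before supplying an API key.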

Low

#ASI03: Identity and Privilege Abuse

What this means

Requests may consume quota or incur costs on the configured Zhipu account.

Why it was flagged

The script authenticates requests to Zhipu with the user's API key, which is expected for this integration but gives the skill account-level API access for caption requests.

Skill content

api_key = os.environ.get("ZHIPU_API_KEY") ... "Authorization": f"Bearer {api_key}"

Recommendation

Use a dedicated, revocable API key if possible, store it securely, and monitor usage on the Zhipu account.
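The quoted snippet reads the key from the `ZHIPU_API_KEY` environment variable and places it in a bearer `Authorization` header. A minimal sketch of that pattern follows, with two additions that are assumptions of this example rather than behavior confirmed in the skill: it refuses to proceed when the key is missing, and it offers a hypothetical `redact` helper so the key is never written to logs in full.

```python
import os


def build_auth_headers(env=os.environ) -> dict:
    """Build request headers from ZHIPU_API_KEY, failing loudly if it is unset."""
    api_key = env.get("ZHIPU_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ZHIPU_API_KEY is not set; refusing to send a request.")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def redact(key: str) -> str:
    """Return a log-safe form of the key that shows at most its last 4 characters."""
    return "****" + key[-4:] if len(key) > 4 else "****"
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary, without touching the real key.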

Low

#ASI07: Insecure Inter-Agent Communication

What this means

Images, prompts, and media URLs submitted for captioning may be processed by Zhipu.

Why it was flagged

Local images can be read, encoded, and sent to Zhipu's external API for captioning. This data flow is expected and disclosed, but it moves potentially sensitive data outside the local environment.

Skill content

with open(path, "rb") as f: img_data = base64.b64encode(f.read()).decode() ... API_BASE_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions" ... requests.post(API_BASE_URL, headers=headers, json=payload

Recommendation

Avoid submitting confidential or regulated media unless Zhipu's terms, retention, and privacy practices are acceptable for that data.
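To make the data flow concrete, the sketch below reproduces the base64 encoding step from the quoted snippet and adds a dry-run summary of what would leave the machine. Nothing here sends a request. The `outbound_summary` helper and its `allowed_dir` restriction are hypothetical additions for illustration, not features of the skill.

```python
import base64
from pathlib import Path


def encode_image(path: str) -> str:
    """Read a local file and return its contents as base64 text, as in the quoted snippet."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()


def outbound_summary(paths, allowed_dir: str) -> list:
    """Describe what would be uploaded, rejecting any file outside `allowed_dir`."""
    root = Path(allowed_dir).resolve()
    summary = []
    for p in paths:
        resolved = Path(p).resolve()
        # Hypothetical guard: only files located under the approved directory may be encoded.
        if root not in resolved.parents:
            raise ValueError(f"{resolved} is outside {root}; not encoding it.")
        encoded = encode_image(str(resolved))
        summary.append({
            "file": resolved.name,
            "raw_bytes": resolved.stat().st_size,
            "base64_chars": len(encoded),
        })
    return summary
```

Reviewing such a summary before invoking the skill shows exactly which files, and how much data, would be submitted for external processing.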