GLM-V-Grounding

v1.0.4

A skill that uses GLM-V native grounding capabilities for coordinate conversion, bounding-box visualization, and more. GLM-V native grounding can locate any...
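The coordinate conversion mentioned in the description can be sketched as follows. This is a minimal illustration, not the skill's actual code: it assumes the model returns boxes as `[x1, y1, x2, y2]` on a normalized 0 to 1000 scale, which this listing does not confirm, and `to_pixel_box` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def to_pixel_box(
    box: Sequence[float],
    width: int,
    height: int,
    scale: float = 1000.0,  # ASSUMPTION: normalized 0..1000 coordinates
) -> Tuple[int, int, int, int]:
    """Convert a normalized [x1, y1, x2, y2] box to pixel coordinates."""
    x1, y1, x2, y2 = box
    # Order the corners so the result is always (left, top, right, bottom).
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))

    def convert(value: float, size: int) -> int:
        pixels = round(value * size / scale)
        # Clamp to the image bounds in case the model overshoots slightly.
        return max(0, min(size, pixels))

    return (
        convert(left, width),
        convert(top, height),
        convert(right, width),
        convert(bottom, height),
    )


if __name__ == "__main__":
    print(to_pixel_box([100, 200, 500, 800], width=1920, height=1080))
```

If the real output uses a different scale or corner order, only the `scale` default and the unpacking line would need to change.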

by Jared Wen (@jaredforreal)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (GLM‑V grounding, coordinate conversion, visualization, tracking) matches the included code and runtime needs. Requested binary (ffmpeg) and Python deps are appropriate for video processing and visualization. The primary credential (ZHIPU_API_KEY) matches the stated API provider.
Instruction Scope
SKILL.md and the CLI drive the agent to run the included scripts (glm_grounding_cli.py) and to pip-install the listed requirements. The runtime instructions focus on feeding images/videos and prompts and returning grounding coordinates/visualizations. The skill explicitly documents that it may read local files provided by the user and fetch public http/https URLs, and its code includes URL validation that rejects localhost/private IPs.
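The kind of URL validation described above, accepting only public http/https URLs and rejecting localhost and private IPs, might look roughly like this. It is a sketch under stated assumptions rather than the skill's own implementation: `is_public_http_url` and its `resolve` parameter are hypothetical names, and DNS resolution is injected so the check can run without network access.

```python
import ipaddress
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit


def is_public_http_url(
    url: str,
    resolve: Optional[Callable[[str], Iterable[str]]] = None,
) -> bool:
    """Return True only for http/https URLs that point at public addresses."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme not in ("http", "https") or not host:
        return False

    host = host.rstrip(".").lower()
    if host == "localhost" or host.endswith(".localhost"):
        return False

    try:
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # Not an IP literal. Without a resolver the name itself is accepted;
        # a stricter check would always resolve and inspect the addresses.
        if resolve is None:
            return True
        try:
            addresses = [ipaddress.ip_address(a) for a in resolve(host)]
        except ValueError:
            return False
        if not addresses:
            return False

    # is_global is False for loopback, private, link-local and reserved ranges.
    return all(address.is_global for address in addresses)
```

Passing a resolver built on `socket.getaddrinfo` would extend the check to hostnames. Resolving at validation time does not by itself protect against a name that later resolves to a different address.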
Install Mechanism
This is an instruction+code skill with no automated install spec; dependencies are declared in scripts/requirements.txt, and SKILL.md instructs the user to run pip install -r on that file. No arbitrary download or extraction from unknown URLs is present.
Credentials
The skill only requires ZHIPU_API_KEY and a timeout value (GLM_GROUNDING_TIMEOUT). SKILL.md marks the timeout as optional (default 60s), while the registry metadata lists it as required; this is a minor inconsistency. The CLI loads a .env file from the skill root if one is present and uses it to populate ZHIPU_API_KEY. If the config_setup helper is used, the API key is written to that .env file on disk. This is expected, but users should avoid committing .env to source control.
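The credential handling described above can be illustrated with a small sketch: parse `KEY=VALUE` lines from .env text and fall back to the documented 60-second default when GLM_GROUNDING_TIMEOUT is unset or invalid. The function names `parse_dotenv` and `resolve_timeout` are hypothetical, and the skill's real parsing rules may differ.

```python
from typing import Dict, Mapping

DEFAULT_TIMEOUT_SECONDS = 60.0  # default documented in SKILL.md, per the scan


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments and junk."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if not key:
            continue
        # Drop surrounding quotes, a common .env convention.
        values[key] = value.strip().strip("'\"")
    return values


def resolve_timeout(env: Mapping[str, str]) -> float:
    """Read GLM_GROUNDING_TIMEOUT, treating it as optional."""
    raw = env.get("GLM_GROUNDING_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```

A caller could merge `parse_dotenv(Path(".env").read_text())` into a copy of `os.environ` without overwriting variables that are already set, then pass the result to `resolve_timeout`.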
Persistence & Privilege
The skill does not request always:true, does not modify other skills, and is not attempting to persist beyond its own .env/config in the skill directory. Autonomy (model invocation) is allowed by default but that is the platform norm and not a separate concern here.
Assessment
This skill appears to do what it says: it will send image/video bytes and prompts to the official Zhipu/BigModel chat completions endpoint using your ZHIPU_API_KEY and return grounding coordinates and visualizations. Before installing: (1) The skill uploads media to Zhipu, so only provide an API key, and only submit media, that you are comfortable sending to that external service. (2) Be aware that the config helper writes a .env file in the skill directory; do not commit that file to version control. (3) The skill can read local files if you provide local paths (expected), and it will fetch only public http/https URLs (the code explicitly blocks localhost/private IPs). (4) Install dependencies (pip install -r scripts/requirements.txt) and ensure ffmpeg is available. Minor inconsistency: registry metadata lists GLM_GROUNDING_TIMEOUT as required while SKILL.md documents it as optional (default 60s). If you need higher assurance, verify the network endpoints and review exactly what payloads are sent to the API (the code uses a fixed DEFAULT_API_URL pointing to open.bigmodel.cn).
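The pre-install checks in the assessment can be automated with a short pre-flight sketch. It only verifies that ZHIPU_API_KEY is set and that ffmpeg is on the PATH; `preflight` is a hypothetical helper, not part of the skill, and the lookup function is a parameter so the check can be exercised without touching the real system.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; empty means ready to run."""
    problems: List[str] = []
    if not env.get("ZHIPU_API_KEY", "").strip():
        problems.append("ZHIPU_API_KEY is not set")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print(f"warning: {problem}")
```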


latest: vk975g8b3j70p5d1qn88rmp109x83vz3p


Runtime requirements

🖼️ Clawdis
Bins: python, ffmpeg
Env: ZHIPU_API_KEY, GLM_GROUNDING_TIMEOUT
Primary env: ZHIPU_API_KEY
