Install
openclaw skills install glmv-caption-tunnelGenerate captions (descriptions) for images, videos, and documents using ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single/multiple inputs, URLs, local paths, base64 (images only), and optional structured JSON output via response_format + system prompt.
openclaw skills install glmv-caption-tunnelGenerate captions for images, videos, and documents using the ZhiPu GLM-V multimodal model.
| Type | Formats | Max Size | Max Count | Base64 | Local Path |
|---|---|---|---|---|---|
| Image | jpg, png, jpeg | 5MB / 6000×6000px | 50 | ✅ | ✅ (→base64) |
| Video | mp4, mkv, mov | 200MB | — | ❌ | ✅ (→tunnel) |
| File | pdf, docx, txt, xlsx, pptx, jsonl | — | 50 | ❌ | ✅ (→tunnel) |
⚠️ file_url cannot mix with image_url or video_url in the same request.
The GLM-V API requires public HTTPS URLs for videos and files. When you provide a local path, the script automatically:
cloudflared tunnel (trycloudflare.com quick tunnel) to expose the server to the public internetPrerequisite: cloudflared must be installed. Get it from Cloudflare Downloads.
| Resource | Link |
|---|---|
| Get API Key | https://bigmodel.cn/usercenter/proj-mgmt/apikeys |
| API Docs | Chat Completions / 对话补全 |
This script reads the key from the ZHIPU_API_KEY environment variable and shares it with other Zhipu skills.
脚本通过 ZHIPU_API_KEY 环境变量获取密钥,与其他智谱技能共用同一个 key。
Get Key / 获取 Key: Visit Zhipu Open Platform API Keys / 智谱开放平台 API Keys to create or copy your key.
Setup options / 配置方式(任选一种):
OpenClaw config (recommended) / OpenClaw 配置(推荐): Set in openclaw.json under skills.entries.glmv-caption.env:
"glmv-caption": { "enabled": true, "env": { "ZHIPU_API_KEY": "你的密钥" } }
Shell environment variable / Shell 环境变量: Add to ~/.zshrc:
export ZHIPU_API_KEY="你的密钥"
.env file / .env 文件: Create .env in this skill directory:
ZHIPU_API_KEY=你的密钥
Only needed when passing local file/video paths — not required if you only use URLs.
Install from Cloudflare Downloads.
⛔ MANDATORY: cloudflared 检查与安装流程 (YOU MUST FOLLOW THIS EXACTLY)⛔
当你(AI agent)准备用 --videos 或 --files 传入 本地文件路径 时,在执行脚本之前,你必须先检查 cloudflared 是否已安装。按照以下流程操作:
# macOS / Linux
which cloudflared
# Windows
where cloudflared
如果命令返回了路径(如 /usr/local/bin/cloudflared),说明已安装,直接执行脚本即可。
如果命令报错(not found / 找不到),进入 Step 2。
你必须先完整地向用户解释以下内容,然后询问用户是否同意下载安装。未经用户明确同意,不得自行下载。
向用户展示的说明内容(原文照搬,不要精简):
检测到系统未安装 cloudflared,本地视频/文件路径需要该工具才能使用。
什么是 cloudflared?
cloudflared 是 Cloudflare 公司官方开源的命令行隧道工具(Apache 2.0 协议,GitHub 24000+ stars)。 它的功能是:在本机与 Cloudflare 全球边缘网络之间建立一条加密的临时隧道,将本地服务暴露为公网 HTTPS URL。 开源仓库地址:https://github.com/cloudflare/cloudflared
在本场景中的必要性:
智谱 GLM-V API 要求视频和文件以公网 HTTPS URL 的形式传入(不支持本地上传)。 当你提供本地文件路径时,脚本的工作流程是:
- 在本机启动一个临时 HTTP 服务器(仅暴露你指定的那几个文件,不会暴露其他目录)
- 使用 cloudflared 建立一条临时隧道(通过 trycloudflare.com,无需注册账号、无需登录)
- 获得一个形如
https://xxx-xxx.trycloudflare.com/your-file.mp4的临时公网 URL- 将该 URL 传给 GLM-V API 完成调用
- API 调用完成后,隧道和 HTTP 服务器自动关闭,临时文件清理
没有 cloudflared = 无法建立隧道 = 本地视频/文件路径不可用。
安全说明:
- 隧道仅在脚本运行期间存在(通常几十秒到几分钟),脚本结束后立即关闭
- 只暴露你明确指定的文件,不会暴露整个目录或磁盘
- cloudflared 不会在后台常驻,不会开机自启
- 不需要注册 Cloudflare 账号,不收集个人信息
- 该工具被全球开发者广泛使用,Cloudflare 官方维护
下载链接:
- 官方下载页面(可手动选择版本):https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/
- Windows x64: https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-windows-amd64.exe
- macOS Apple Silicon: https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-darwin-arm64.tgz
- macOS Intel: https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-darwin-amd64.tgz
- Linux x64: https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
替代方案:如果你不想安装 cloudflared,可以直接传入远程 URL(如
https://example.com/video.mp4),则完全不需要安装任何东西。
等待用户回复。根据用户回复执行对应操作:
用户同意后,使用以下命令安装:
macOS (Homebrew):
brew install cloudflared
Windows (推荐用 winget):
winget install Cloudflare.cloudflared
Linux (下载二进制):
# x64
curl -Lo /usr/local/bin/cloudflared https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
chmod +x /usr/local/bin/cloudflared
其他方式: 从 官方下载页面 手动下载,放到 PATH 中即可。
安装后用 cloudflared version 验证成功,然后执行脚本。
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
python scripts/glmv_caption.pyAfter running the script, you must show the full raw output to the user exactly as returned. Do not summarize, truncate, or only say "generated". Users need the original model output to evaluate quality.
python scripts/glmv_caption.py --images "https://example.com/photo.jpg"
python scripts/glmv_caption.py --images /path/to/photo.png
python scripts/glmv_caption.py --images img1.jpg img2.png "https://example.com/img3.jpg"
# Remote URL
python scripts/glmv_caption.py --videos "https://example.com/clip.mp4"
# Local file (auto-tunneled via cloudflare)
python scripts/glmv_caption.py --videos /path/to/local-video.mp4
# Remote URL
python scripts/glmv_caption.py --files "https://example.com/report.pdf"
# Local file (auto-tunneled via cloudflare)
python scripts/glmv_caption.py --files /path/to/local-report.pdf
# Mix URLs and local paths
python scripts/glmv_caption.py --files "https://example.com/doc1.docx" /path/to/local-doc2.txt
python scripts/glmv_caption.py --images photo.jpg --prompt "Describe the architecture style in detail"
python scripts/glmv_caption.py --images photo.jpg --output result.json
python scripts/glmv_caption.py --images photo.jpg --thinking
python scripts/glmv_caption.py \
--images photo.jpg \
--response-format \
--sys-prompt '请严格返回 JSON:{"summary":"string","objects":[{"name":"string","confidence":0.0}]}'
推荐做法:
--response-format 仅在你显式传入时才启用结构化输出模式response_format--response-format 不带值时,默认使用 json_object--sys-prompt 明确给出期望的 JSON 字段、类型和示例python {baseDir}/scripts/glmv_caption.py (--images IMG [IMG...] | --videos VID [VID...] | --files FILE [FILE...]) [OPTIONS]
| Parameter | Required | Description |
|---|---|---|
--images, -i | One of | Image paths or URLs (supports multiple, base64 OK) |
--videos, -v | One of | Video paths or URLs (supports multiple, mp4/mkv/mov, local paths auto-tunneled) |
--files, -f | One of | Document paths or URLs (supports multiple, pdf/docx/txt/xlsx/pptx/jsonl, local paths auto-tunneled) |
--prompt, -p | No | Custom prompt (default: "请详细描述这张图片的内容" / "Please describe this image in detail") |
--model, -m | No | Model name (default: glm-5v-turbo) |
--temperature, -t | No | Sampling temperature 0-1 (default: 0.8) |
--top-p | No | Nucleus sampling 0.01-1.0 (default: 0.6) |
--max-tokens | No | Max output tokens (default: 1024, max 32768) |
--thinking | No | Enable thinking/reasoning mode |
--response-format | No | Enable structured output. If passed without a value, defaults to json_object; omitted = disabled |
--sys-prompt | No | System prompt inserted before the user message; recommended for defining the expected JSON schema |
--output, -o | No | Save result JSON to file |
--pretty | No | Pretty-print JSON output |
--stream | No | Enable streaming output |
Note: --images, --videos, and --files are mutually exclusive per API limits.
Based on BigModel structured output guidance, enabling JSON mode is done by sending:
{
"response_format": {"type": "json_object"}
}
This skill exposes that capability through --response-format.
--response-format present → request includes response_format--response-format absent → request behaves exactly like before--sys-prompt is passed as the request's system message--sys-prompt to define the exact JSON shape you want the model to returnExample:
python {baseDir}/scripts/glmv_caption.py \
--files /path/to/report.pdf \
--response-format json_object \
--sys-prompt 'Return JSON only: {"title":"string","summary":"string","risks":["string"]}' \
--pretty
{
"success": true,
"caption": "A landscape photo showing a mountain range at sunset...",
"usage": {
"prompt_tokens": 128,
"completion_tokens": 256,
"total_tokens": 384
}
}
Key fields:
success — whether the request succeededcaption — the generated caption textusage — token usage statisticswarning — present when content was blocked by safety reviewerror — error details on failureAPI key not configured:
ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys
→ Show exact error to user, guide them to configure
Authentication failed (401/403): API key invalid/expired → reconfigure
Rate limit (429): Quota exhausted → inform user to wait
File not found: Local file missing → check path
Content filtered: warning field present → content blocked by safety review
Tunnel failure (local paths only):
Tunnel setup failed: cloudflared not found. Install it from: ...
→ Guide user to install cloudflared, or use a remote URL instead