ifly-image-understanding

iFlytek Image Understanding (图片理解) — analyze and answer questions about images using Spark Vision model. WebSocket API, pure Python stdlib, no pip dependencies.

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 0 · 78 · 0 current installs · 0 all-time installs

byIflytek AIcloud@qingzhe2020

MIT-0

Security Scan

VirusTotal

Benign

View report →

OpenClaw

Suspicious

medium confidence

ℹ

Purpose & Capability

The name/description (iFlytek Image Understanding) match the code: the script reads a local image, HMAC-signs requests, and connects to iFlytek's Spark Vision WebSocket endpoint. However, the skill registry metadata declares no required environment variables while both SKILL.md and the script require IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET — this metadata mismatch is unexpected and should be corrected or explained.

✓

Instruction Scope

The runtime instructions are narrowly scoped: set iFlytek credentials, supply a local image, and run the Python script which transmits the image and question to the documented iFlytek wss endpoint. There are no instructions to read arbitrary files or exfiltrate data to other endpoints. Note: transmitted data includes the raw image and any question text (so avoid sending sensitive images/queries).

✓

Install Mechanism

There is no install spec and the code uses only Python stdlib; no external downloads or package installs are requested. This is a low-risk install mechanism in itself.

Credentials

Requesting IFLY_APP_ID / IFLY_API_KEY / IFLY_API_SECRET is proportionate to authenticating to iFlytek and is expected. The concern is twofold: (1) the registry metadata does not declare these required env vars (inconsistency that could mislead users), and (2) the repository includes a .claude/settings.local.json file granting Read(...) permission to a specific user's Desktop path and a zip command — that config is unrelated to the skill's runtime needs and suggests accidental inclusion of local packaging settings that could disclose or request broader filesystem access.

✓

Persistence & Privilege

The skill does not request always:true, does not declare persistent installation steps, and the code does not modify other skills or system-wide settings. It performs one-off WebSocket calls during invocation.

What to consider before installing

Before installing or using this skill: 1) Recognize that the script will send the full image bytes and your question to iFlytek's cloud (wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image). Do not use it with sensitive images or questions unless you trust the service and your agreement with it. 2) The SKILL.md and script require IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET — verify the registry listing or installer prompts these; do not paste secrets into unexpected places. 3) Remove or inspect the included .claude/settings.local.json before use: it contains a Read(...) permission pointing at a user's Desktop path and a zip command, which is unrelated to normal runtime and may be an accidental leak of local packaging settings. 4) Run the script in a controlled environment (isolated user account or container) if you have any doubt. 5) If you need higher assurance, ask the author to (a) update the registry metadata to declare required env vars, (b) remove any local .claude files from the published package, and (c) confirm the only network destination is the documented iFlytek WebSocket endpoint.

Like a lobster shell, security has layers — review code before you run it.

Current versionv1.0.0

Download zip

latestvk977ydk3e1kewezrhq1tdxjhdn8348ra

License

MIT-0

Free to use, modify, and redistribute. No attribution required.

Termshttps://spdx.org/licenses/MIT-0.html

SKILL.md

ifly-image-understanding

Analyze images and answer questions about their content using iFlytek's Spark Vision model (图片理解).

Setup

Create an app at 讯飞控制台 with 图片理解 service enabled

Set environment variables:

export IFLY_APP_ID="your_app_id"
export IFLY_API_KEY="your_api_key"
export IFLY_API_SECRET="your_api_secret"

Usage

Describe an image

python3 scripts/image_understanding.py photo.jpg

Ask a question about an image

python3 scripts/image_understanding.py photo.jpg -q "图片里有什么动物？"

Use basic model (lower token cost)

python3 scripts/image_understanding.py photo.jpg --domain general

Options

Flag	Short	Description
`image`		Image file path (.jpg, .jpeg, .png)
`--question`	`-q`	Question about the image (default: describe)
`--domain`	`-d`	`imagev3` (advanced, default) or `general` (basic, fixed 273 tokens/image)
`--temperature`	`-t`	Sampling temperature (0,1], default 0.5
`--max-tokens`		Max response tokens 1-8192, default 2048
`--raw`		Output raw WebSocket JSON frames

Examples

# OCR a receipt
python3 scripts/image_understanding.py receipt.png -q "总金额是多少？"

# Identify objects
python3 scripts/image_understanding.py scene.jpg -q "图片中有哪些物体？"

# Low-cost basic model
python3 scripts/image_understanding.py chart.png -q "图表的趋势是什么？" -d general

Notes

Image formats: .jpg, .jpeg, .png
Max image size: 4MB
Max tokens: 8192 (input + output combined)
Auth: HMAC-SHA256 signed WebSocket URL
Endpoint: wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image
Pure stdlib: No pip dependencies — uses built-in socket + ssl for WebSocket
Model versions: imagev3 (advanced, dynamic token cost) vs general (basic, fixed 273 tokens/image)

错误码说明 😢

遇到错误先别慌～看看下面找到对应的解决方法吧！✨

错误码	错误信息	解决办法
0	🎉 成功	恭喜你！请求正常完成啦～
10003	用户的消息格式有错误	检查一下你的请求格式是否正确哦～确保发送的是合法的JSON格式呢！
10004	用户数据的schema错误	看起来数据结构有点问题～请检查一下字段名称和类型是否正确呀！
10005	用户参数值有错误	参数值可能不太对呢～仔细核对一下每个参数的有效范围吧！
10006	用户并发错误：同一用户不能多处同时连接	检测到重复连接啦！请确保只有一个客户端在连接同一个用户ID哦～
10013	用户问题涉及敏感信息，审核不通过	哎呀，你的问题可能包含了一些不太合适的内容～换个问题试试看吧！
10022	模型生产的图片涉及敏感信息，审核不通过	生成的图片没有通过审核呢...很抱歉，换张图片再试一下吧！
10029	图片任何一边的长度超过12800	图片尺寸太大啦！请确保图片宽高都不超过12800像素哦～
10041	图片分辨率不符合要求	图片尺寸不合适的呢～要求是：50×50 < 图片总像素值 < 6000×6000 哦！
10907	Token数量超过上限	内容太丰富啦！对话历史+问题的字数太多，需要精简一下输入哦～

💡 小贴士：如果还有其他问题，可以查看官方文档或者联系技术支持哦！

常见问题 🤔

图片理解的主要功能是什么呀？🐱

答：用户输入一张图片和问题，从而识别出图片中的对象、场景等信息，然后回答你的问题～是不是很方便呢！✨

图片理解支持什么应用平台呢？📱

答：目前支持 Web API 应用平台哦！直接在代码里调用就可以啦～

图片理解的文本大小限制多少呀？📝

答：有效内容不能超过 8192 Token 呢～如果超过了就要精简一下输入啦！

Comments

Loading comments…

ifly-image-understanding

License

SKILL.md

ifly-image-understanding

Setup

Usage

Describe an image

Ask a question about an image

Use basic model (lower token cost)

Options

Examples

Notes

错误码说明 😢

常见问题 🤔

图片理解的主要功能是什么呀？🐱

图片理解支持什么应用平台呢？📱

图片理解的文本大小限制多少呀？📝

更多资源 📚

Files

Comments