ifly-pdf-image-ocr
ifly-pdf&image-ocr skill supporting both image OCR (AI-powered LLM OCR) and PDF document recognition. Use when user asks to OCR images, extract text from ima...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Iflytek AIcloud @qingzhe2020
Security Scan
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill name/description (image and PDF OCR via iFlytek) matches the included scripts and runtime instructions: both scripts call iFlytek endpoints and implement the described HMAC/MD5 signing and result handling. The functionality requested (uploading PDFs/images to OCR service) is legitimate for this purpose.
Instruction Scope
SKILL.md and scripts instruct the agent to read local image/PDF files, read API credentials from environment variables, send files to iFlytek endpoints, and poll for results — all consistent with OCR. There is no evidence the instructions ask for unrelated system files or credentials, but the skill will transmit user files to external servers (iocr.xfyun.cn and cbm01.cn-huabei-1.xf-yun.com), which is expected for a cloud OCR service but has privacy implications.
Install Mechanism
No install spec (instruction-only + shipped scripts). Nothing is downloaded or executed automatically by an installer. This lowers risk, but the included scripts will be executed if run.
Credentials
Registry metadata claims no required env vars/credentials, but both SKILL.md and the scripts require IFLY_APP_ID and at least IFLY_API_SECRET; image OCR also requires IFLY_API_KEY. The metadata omission is an incoherence: the skill legitimately needs these secrets, but they were not declared in the registry entry. Requesting API credentials for the OCR provider itself is reasonable; asking for unrelated credentials is not present. The missing declaration and unknown source increase risk.
Persistence & Privilege
always is false and the skill does not request persistent system-wide privileges or modify other skills. It only requires environment variables and network access to the OCR endpoints.
What to consider before installing
This skill's code implements iFlytek OCR, uploads images/PDFs to iFlytek servers, and requires three environment variables (IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET), but the registry metadata incorrectly lists no required credentials. Before installing, verify the skill source and owner (origin is unknown), confirm you trust iFlytek or the specific endpoints in SKILL.md, and avoid sending sensitive or regulated documents unless you control the account and understand the provider's data retention/privacy policy. Also set the required environment variables only for a dedicated iFlytek account (do not reuse other secrets), and consider running the scripts manually in a sandbox to inspect their behavior before granting the skill to an autonomous agent.
Like a lobster shell, security has layers: review code before you run it.
Current version: v1.0.0
SKILL.md
ifly-pdf&image-ocr
AI-powered OCR service for images and PDF documents using iFlytek's advanced recognition APIs.
Quick Start
Image OCR (LLM OCR)
# OCR an image and extract text
python3 scripts/image_ocr.py /path/to/image.jpg
# Save result to file
python3 scripts/image_ocr.py /path/to/image.jpg -o output.txt
# Specify output format
python3 scripts/image_ocr.py /path/to/image.jpg --format json
python3 scripts/image_ocr.py /path/to/image.jpg --format markdown
PDF OCR
# Convert PDF to Word (default)
python3 scripts/pdf_ocr.py document.pdf
# Convert PDF to Markdown
python3 scripts/pdf_ocr.py document.pdf --format markdown
# Convert PDF to JSON
python3 scripts/pdf_ocr.py document.pdf --format json
# From public URL
python3 scripts/pdf_ocr.py --pdf-url "https://example.com/doc.pdf" --format word
Setup
API Credentials
Get credentials from iFlytek Open Platform:
For Image OCR:
- APP_ID: Application ID
- API_KEY: API key for authentication
- API_SECRET: API secret for signing requests
For PDF OCR:
- APP_ID: Application ID
- API_SECRET: Application secret (for signature generation)
Environment Variables
# Required for both Image OCR and PDF OCR
export IFLY_APP_ID="your_app_id"
# Required for Image OCR
export IFLY_API_KEY="your_api_key"
# Required for both Image OCR and PDF OCR (used to sign requests)
export IFLY_API_SECRET="your_api_secret"
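As a quick pre-flight check, the credential lists above can be turned into a small helper that reports which variables are still unset. This is only a sketch: the function name `missing_credentials` and the `REQUIRED` mapping are hypothetical and are not part of the shipped scripts.

```python
import os

# Variables each mode needs, taken from the "API Credentials" lists above.
REQUIRED = {
    "image": ("IFLY_APP_ID", "IFLY_API_KEY", "IFLY_API_SECRET"),
    "pdf": ("IFLY_APP_ID", "IFLY_API_SECRET"),
}


def missing_credentials(mode, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED[mode] if not env.get(name)]


if __name__ == "__main__":
    for mode in REQUIRED:
        missing = missing_credentials(mode)
        print(mode, "ok" if not missing else "missing: " + ", ".join(missing))
```

Running the check before calling either script avoids sending a request that is bound to fail authentication.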
Features
Image OCR (LLM OCR)
- AI-powered: Advanced LLM-based OCR for high accuracy
- Multi-format output: JSON, Markdown, or both
- Layout understanding: Preserves document structure
- Multi-language: Supports text extraction in multiple languages
- Image preprocessing: Automatic rotation correction, noise removal
PDF OCR
- AI-powered OCR: Advanced AI model for accurate text extraction
- Multiple output formats:
  - Word (.docx) - Editable Word document
  - Markdown - Plain text with formatting
  - JSON - Structured data
- Large PDF support: Up to 100 pages per document
- Page-by-page results: Access individual page results
- Download URLs: Direct links to processed files
API Parameters
Image OCR Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| image_path | string | Yes | Path to image file |
| --format | string | No | Output format: json, markdown, json,markdown (default: json,markdown) |
| --output | string | No | Save result to file |
PDF OCR Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| pdf_path | string | Yes* | Path to PDF file |
| --pdf-url | string | No* | Public URL of PDF file |
| --format | string | No | Output format: word, markdown, json (default: word) |
| --no-poll | flag | No | Return task ID without polling |
| --poll-interval | int | No | Polling interval in seconds (min 5, default: 5) |
| --max-wait | int | No | Maximum wait time in seconds (default: 300) |
*Either pdf_path or --pdf-url must be provided
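The PDF parameter table can be mirrored with `argparse` roughly as follows. This is a sketch built only from the table above; the real `scripts/pdf_ocr.py` may define its options differently, and `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser():
    """Argument parser mirroring the PDF OCR parameter table."""
    parser = argparse.ArgumentParser(prog="pdf_ocr.py")
    parser.add_argument("pdf_path", nargs="?", help="Path to PDF file")
    parser.add_argument("--pdf-url", help="Public URL of PDF file")
    parser.add_argument("--format", choices=["word", "markdown", "json"], default="word")
    parser.add_argument("--no-poll", action="store_true", help="Return task ID without polling")
    parser.add_argument("--poll-interval", type=int, default=5)
    parser.add_argument("--max-wait", type=int, default=300)
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Either the positional path or --pdf-url must be present.
    if not args.pdf_path and not args.pdf_url:
        parser.error("provide pdf_path or --pdf-url")
    # The table states a minimum polling interval of 5 seconds.
    args.poll_interval = max(5, args.poll_interval)
    return args
```

Clamping `--poll-interval` here, rather than rejecting small values, is a design choice of the sketch; the table only states that the minimum is 5.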
Authentication
Image OCR (HMAC-SHA256)
Uses HMAC-SHA256 signature authentication:
- Generate RFC1123 format date: EEE, dd MMM yyyy HH:mm:ss GMT
- Create signature origin: host: {host}\ndate: {date}\nPOST {path} HTTP/1.1
- Calculate signature: HMAC-SHA256(signature_origin, apiSecret)
- Build authorization: hmac username="{apiKey}", algorithm="hmac-sha256", headers="host date request-line", signature="{signature}"
- Encode authorization in base64
- Send as query parameters: ?authorization={auth}&host={host}&date={date}
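The signing steps above can be sketched with the Python standard library. Two points are assumptions rather than statements from this document: the raw HMAC digest is Base64-encoded before it is placed in the authorization string, and the host and path in the usage lines are placeholders, not the real endpoint. `build_auth_query` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import urlencode


def build_auth_query(host, path, api_key, api_secret, date=None):
    """Return the query string carrying authorization, host and date."""
    # RFC1123 date in GMT, e.g. "Mon, 01 Jan 2024 00:00:00 GMT".
    date = date or formatdate(usegmt=True)
    signature_origin = f"host: {host}\ndate: {date}\nPOST {path} HTTP/1.1"
    digest = hmac.new(
        api_secret.encode("utf-8"),
        signature_origin.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Assumption: the signature is the Base64 form of the raw digest.
    signature = base64.b64encode(digest).decode("ascii")
    authorization_origin = (
        f'hmac username="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")
    return urlencode({"authorization": authorization, "host": host, "date": date})


if __name__ == "__main__":
    # Placeholder host and path; substitute the endpoint named in the scripts.
    print(build_auth_query("example.invalid", "/v1/ocr", "your_api_key", "your_api_secret"))
```

`urlencode` percent-encodes the `+`, `/` and `=` characters that Base64 can produce, so the result can be appended to the request URL after `?` as is.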
PDF OCR (MD5 + HMAC-SHA1)
Uses MD5 + HMAC-SHA1 signature authentication:
- Generate timestamp (Unix epoch in seconds)
- Calculate auth = MD5(appId + timestamp)
- Calculate signature = Base64(HMAC-SHA1(auth, apiSecret))
- Send headers:
  - appId: Application ID
  - timestamp: Timestamp in seconds
  - signature: Generated signature
Important: Timestamp must be within 5 minutes of server time.
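A minimal sketch of the PDF OCR header calculation follows. It assumes that the MD5 value is used as a lowercase hex string and that `apiSecret` is the HMAC key; neither detail is spelled out above, so check them against the shipped `scripts/pdf_ocr.py`. `build_pdf_headers` is a hypothetical name.

```python
import base64
import hashlib
import hmac
import time


def build_pdf_headers(app_id, api_secret, timestamp=None):
    """Return the appId, timestamp and signature headers for a PDF OCR request."""
    ts = str(int(time.time()) if timestamp is None else timestamp)
    # Assumption: MD5(appId + timestamp) is taken as a lowercase hex digest.
    auth = hashlib.md5((app_id + ts).encode("utf-8")).hexdigest()
    # Assumption: apiSecret is the key and auth is the message.
    digest = hmac.new(api_secret.encode("utf-8"), auth.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return {"appId": app_id, "timestamp": ts, "signature": signature}


if __name__ == "__main__":
    print(build_pdf_headers("your_app_id", "your_api_secret"))
```

Because the timestamp must be within 5 minutes of server time, the headers should be rebuilt for every request rather than cached.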
Response Format
Image OCR Response
{
"header": {
"code": 0,
"message": "success"
},
"payload": {
"result": {
"text": "Base64-encoded OCR text..."
}
}
}
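Since the `text` field arrives Base64-encoded, a caller has to decode it before use. The sketch below assumes the decoded bytes are UTF-8 and that a non-zero `header.code` indicates failure; `extract_text` is a hypothetical helper, not a function from the shipped scripts.

```python
import base64


def extract_text(response):
    """Decode the OCR text from an Image OCR response dictionary."""
    header = response.get("header", {})
    if header.get("code") != 0:
        raise RuntimeError(f"OCR failed: {header.get('code')} {header.get('message')}")
    encoded = response["payload"]["result"]["text"]
    # Assumption: the decoded payload is UTF-8 text.
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    sample = {
        "header": {"code": 0, "message": "success"},
        "payload": {"result": {"text": base64.b64encode(b"hello").decode("ascii")}},
    }
    print(extract_text(sample))
```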
PDF OCR Start Response
{
"flag": true,
"code": 0,
"desc": "成功",
"data": {
"taskNo": "25082744936879",
"status": "CREATE",
"tip": "任务创建成功"
}
}
PDF OCR Status Response
{
"flag": true,
"code": 0,
"desc": "成功",
"data": {
"taskNo": "25082759289333",
"exportFormat": "word",
"status": "FINISH",
"downUrl": "http://bjcdn.openstorage.cn/...",
"tip": "已完成",
"pageList": [...]
}
}
Task Status (PDF OCR)
| Status | Description |
|---|---|
| CREATE | Task created successfully |
| WAITING | Waiting in queue |
| DOING | Processing |
| FINISH | Completed |
| FAILED | Failed |
| ANY_FAILED | Partially completed (some pages failed) |
| STOP | Paused |
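A polling loop over these statuses might look like the sketch below. It treats FINISH, FAILED and ANY_FAILED as terminal and keeps waiting on every other status, including STOP, until the time budget runs out; that classification is an assumption, as are the names `poll_task` and `fetch_status` (a caller-supplied function that returns the parsed status response).

```python
import time

# Assumption: these statuses mean the task will not change any further.
TERMINAL_STATUSES = {"FINISH", "FAILED", "ANY_FAILED"}


def poll_task(fetch_status, poll_interval=5, max_wait=300, sleep=time.sleep):
    """Call fetch_status() until the task reaches a terminal status."""
    interval = max(5, poll_interval)  # status queries are limited to one per 5 seconds
    waited = 0
    while True:
        data = fetch_status()["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        if waited + interval > max_wait:
            raise TimeoutError(f"task {data.get('taskNo')} still {data['status']} after {waited}s")
        sleep(interval)
        waited += interval
```

On FINISH the returned dictionary carries `downUrl`, which should be downloaded promptly because the link may expire.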
Error Codes
(。・ω・。) Hi! Ran into an error code? Here is how to resolve it ✧⁺⸜(●˙▾˙●)⸝⁺✧
Platform Common Error Codes
| Code | Description | Hint | Solution |
|---|---|---|---|
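| 10009 | input invalid data | (◎_◎;) Oops, the data format doesn't look right | Check that the input data meets the requirements |
| 10010 | service license not enough | (╯°□°)╯︵ ┻━┻ License quota is insufficient or expired! | Submit a ticket to contact support |
| 10019 | service read buffer timeout | (。-`ω´-) The session timed out | Check whether all data was sent but the connection was left open |
| 10043 | Syscall AudioCodingDecode error | (◎_◎;) Audio decoding failed... | Check the aue parameter; if it is speex, make sure the audio is speex-encoded, compressed in segments, and consistent with the frame size |
| 10114 | session timeout | (。-`ω´-) The session ran out of time | Check whether sending the data took longer than 60 s |
| 10139 | invalid param | (◎_◎;) The parameters don't look right | Check that the parameters are correct |
| 10160 | parse request json error | (◎_◎;) The request data is malformed | Check that the request data is valid JSON |
| 10161 | parse base64 string error | (◎_◎;) Base64 decoding failed | Check that the data sent is Base64-encoded |
| 10163 | param validate error | (◎_◎;) Parameter validation did not pass | See the detailed description for the specific cause |
| 10200 | read data timeout | (。-`ω´-) Reading data timed out | Check whether no data was sent for a cumulative 10 s while the connection stayed open |
| 10222 | context deadline exceeded | (╯°□°)╯︵ ┻━┻ Something went wrong! | 1. Check whether the uploaded data exceeds the API limit; 2. If the SSL certificate is invalid, submit a ticket |
| 10223 | RemoteLB: can't find valued addr | (◎_◎;) No service node could be found | Submit a ticket to contact technical staff |
| 10313 | invalid appid | (◎_◎;) The appid and apikey don't match | Check that the appid is valid |
| 10317 | invalid version | (◎_◎;) Something is wrong with the version number | Submit a ticket in the console to contact technical staff |
| 10700 | not authority | (╯°□°)╯︵ ┻━┻ Insufficient permissions! | Check the reported reason against the developer documentation; if it remains unresolved, submit a ticket with the sid and error message |
| 11200 | auth no license | (╯°□°)╯︵ ┻━┻ Feature not authorized! | Check that the appid is correct, confirm the relevant service has been added, and check whether the call quota is exceeded or the license has expired |
| 11201 | auth no enough license | (╯°□°)╯︵ ┻━┻ Daily interaction limit exceeded! | Submit the application for review to raise the quota, or contact sales to purchase the enterprise-level interface |
| 11503 | server error: atmos return error | (。-`ω´-) The server returned bad data... | Submit a ticket |
| 11502 | server error: too many datas | (。-`ω´-) Something is wrong with the server configuration | Submit a ticket |
| 100001~100010 | WrapperInitErr | (◎_◎;) The engine call failed! | Look up the errno from the message in the engine error code reference |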
Additional Resources
- (。・ω・。) Service purchase link: General Text Recognition (OCR LLM edition)
- (。・ω・。) Business inquiry link: Purchase service volume
Original API Error Codes
| Code | Description | Solution |
|---|---|---|
| 10000 | System error | Check auth info, request method, parameters |
| 10001 | Signature authentication failed | Check credentials |
| 10002 | Business processing error | Check error message |
| 10003 | Quota/insufficient balance | Check account balance |
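For scripted callers, the table above can be kept as a small lookup so that a failing response produces an actionable message. The descriptions and solutions below are copied from that table; the `explain_error` helper and its fallback wording are hypothetical.

```python
# Codes, descriptions and solutions copied from the table above.
API_ERRORS = {
    10000: ("System error", "Check auth info, request method, parameters"),
    10001: ("Signature authentication failed", "Check credentials"),
    10002: ("Business processing error", "Check error message"),
    10003: ("Quota/insufficient balance", "Check account balance"),
}


def explain_error(code):
    """Return a one-line explanation for an API error code."""
    if code in API_ERRORS:
        description, solution = API_ERRORS[code]
        return f"{code}: {description}. {solution}."
    return f"{code}: unknown error code; see the platform error code table."
```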
Limitations
Image OCR
- Format: Common image formats (JPG, PNG, etc.)
- Size: Reasonable file sizes for web upload
- Rate limiting: Follow API rate limits
PDF OCR
- Max pages: 100 pages per PDF
- Protected PDFs: Not supported (password/encrypted)
- Rate limiting: Status query limited to once per 5 seconds
- Time limit: Timestamp must be within ±5 minutes of server time
Tips
Image OCR
- High-quality images: Use clear, high-resolution images for best results
- Multiple formats: Use json,markdown to get both structured and formatted output
- Save results: Use the -o flag to save OCR results to file
PDF OCR
- Math formulas: Use markdown format for PDFs with mathematical formulas
- Large PDFs: Split into sections if > 100 pages
- Polling interval: Minimum 5 seconds between status queries
- Network URLs: Ensure PDF URLs are publicly accessible
- Download URLs: Download files promptly as URLs may expire
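When a PDF exceeds the 100-page limit, the page ranges for each section can be computed as below. The sketch only does the arithmetic; cutting the file itself needs a PDF library, which is outside this skill. `page_ranges` is a hypothetical helper and pages are numbered from 1.

```python
def page_ranges(total_pages, max_pages=100):
    """Split 1..total_pages into inclusive (start, end) ranges of at most max_pages."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("total_pages and max_pages must be positive")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


if __name__ == "__main__":
    print(page_ranges(250))  # [(1, 100), (101, 200), (201, 250)]
```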
