#ocr #document-recognition #pdf-document

ifly-pdf-image-ocr

ifly-pdf&image-ocr skill supporting both image OCR (AI-powered LLM OCR) and PDF document recognition. Use when user asks to OCR images, extract text from images/PDFs, convert PDF to Word/Markdown, or perform any OCR tasks on images or PDFs. Supports multi-language text extraction, document layout understanding, and various output formats.

Iflytek AIcloud@qingzhe2020

Install

openclaw skills install @qingzhe2020/ifly-pdf-image-ocr

ifly-pdf&image-ocr

AI-powered OCR service for images and PDF documents using iFlytek's advanced recognition APIs.

Quick Start

Image OCR (LLM OCR)

bash

# OCR an image and extract text
python3 scripts/image_ocr.py /path/to/image.jpg

# Save result to file
python3 scripts/image_ocr.py /path/to/image.jpg -o output.txt

# Specify output format
python3 scripts/image_ocr.py /path/to/image.jpg --format json
python3 scripts/image_ocr.py /path/to/image.jpg --format markdown

PDF OCR

bash

# Convert PDF to Word (default)
python3 scripts/pdf_ocr.py document.pdf

# Convert PDF to Markdown
python3 scripts/pdf_ocr.py document.pdf --format markdown

# Convert PDF to JSON
python3 scripts/pdf_ocr.py document.pdf --format json

# From public URL
python3 scripts/pdf_ocr.py --pdf-url "https://example.com/doc.pdf" --format word

Setup

API Credentials

Get credentials from iFlytek Open Platform:

For Image OCR:

APP_ID: Application ID
API_KEY: API key for authentication
API_SECRET: API secret for signing requests

For PDF OCR:

APP_ID: Application ID
API_SECRET: Application secret (for signature generation)

Environment Variables

bash

# Required for both Image OCR and PDF OCR
export IFLY_APP_ID="your_app_id"

# Required for Image OCR
export IFLY_API_KEY="your_api_key"

# Required for PDF OCR
export IFLY_API_SECRET="your_api_secret"

Features

Image OCR (LLM OCR)

AI-powered: Advanced LLM-based OCR for high accuracy
Multi-format output: JSON, Markdown, or both
Layout understanding: Preserves document structure
Multi-language: Supports text extraction in multiple languages
Image preprocessing: Automatic rotation correction, noise removal

PDF OCR

AI-powered OCR: Advanced AI model for accurate text extraction
Multiple output formats:
- Word (.docx) - Editable Word document
- Markdown - Plain text with formatting
- JSON - Structured data
Large PDF support: Up to 100 pages per document
Page-by-page results: Access individual page results
Download URLs: Direct links to processed files

API Parameters

Image OCR Parameters

Parameter	Type	Required	Description
`image_path`	string	Yes	Path to image file
`--format`	string	No	Output format: json, markdown, json,markdown (default: json,markdown)
`--output`	string	No	Save result to file

PDF OCR Parameters

Parameter	Type	Required	Description
`pdf_path`	string	Yes*	Path to PDF file
`--pdf-url`	string	No*	Public URL of PDF file
`--format`	string	No	Output format: word, markdown, json (default: word)
`--no-poll`	flag	No	Return task ID without polling
`--poll-interval`	int	No	Polling interval in seconds (min 5, default: 5)
`--max-wait`	int	No	Maximum wait time in seconds (default: 300)

*Either pdf_path or --pdf-url must be provided

Authentication

Image OCR (HMAC-SHA256)

Uses HMAC-SHA256 signature authentication:

Generate RFC1123 format date: EEE, dd MMM yyyy HH:mm:ss GMT
Create signature origin: host: {host}\\ndate: {date}\\nPOST {path} HTTP/1.1
Calculate signature: HMAC-SHA256(signature_origin, apiSecret)
Build authorization: hmac username="{apiKey}", algorithm="hmac-sha256", headers="host date request-line", signature="{signature}"
Encode authorization in base64
Send as query parameters: ?authorization={auth}&host={host}&date={date}

PDF OCR (MD5 + HMAC-SHA1)

Uses MD5 + HMAC-SHA1 signature authentication:

Generate timestamp (Unix epoch in seconds)
Calculate auth = MD5(appId + timestamp)
Calculate signature = Base64(HMAC-SHA1(auth, apiSecret))
Send headers:
- appId: Application ID
- timestamp: Timestamp in seconds
- signature: Generated signature

Important: Timestamp must be within 5 minutes of server time.

Response Format

Image OCR Response

json

{
  "header": {
    "code": 0,
    "message": "success"
  },
  "payload": {
    "result": {
      "text": "Base64-encoded OCR text..."
    }
  }
}

PDF OCR Start Response

json

{
  "flag": true,
  "code": 0,
  "desc": "成功",
  "data": {
    "taskNo": "25082744936879",
    "status": "CREATE",
    "tip": "任务创建成功"
  }
}

PDF OCR Status Response

json

{
  "flag": true,
  "code": 0,
  "desc": "成功",
  "data": {
    "taskNo": "25082759289333",
    "exportFormat": "word",
    "status": "FINISH",
    "downUrl": "http://bjcdn.openstorage.cn/...",
    "tip": "已完成",
    "pageList": [...]
  }
}

Task Status (PDF OCR)

Status	Description
`CREATE`	Task created successfully
`WAITING`	Waiting in queue
`DOING`	Processing
`FINISH`	Completed
`FAILED`	Failed
`ANY_FAILED`	Partially completed (some pages failed)
`STOP`	Paused

Error Codes

(｡･ω･｡) 嗨~~遇到错误码了吗？来看看怎么解决吧~~ ✧⁺⸜(●˙▾˙●)⸝⁺✧

Platform Common Error Codes

Code	Description	Hint	Solution
10009	input invalid data	(◎_◎;) 哎呀~数据格式不太对呢	检查输入数据是否符合要求
10010	service license not enough	(╯°□°)╯︵ ┻━┻ 授权数量不足或已过期！	提交工单联系客服
10019	service read buffer timeout	(。-`ω´-) session超时啦~	检查是否数据发送完毕但未关闭连接
10043	Syscall AudioCodingDecode error	(◎_◎;) 音频解码失败惹...	检查aue参数，如果为speex，请确保音频是speex音频并分段压缩且与帧大小一致
10114	session timeout	(。-`ω´-) 会话时间超时啦~	检查是否发送数据时间超过了60s
10139	invalid param	(◎_◎;) 参数好像不太对呢	检查参数是否正确
10160	parse request json error	(◎_◎;) 请求数据格式有误~	检查请求数据是否是合法的json
10161	parse base64 string error	(◎_◎;) Base64解码失败啦	检查发送的数据是否使用base64编码了
10163	param validate error	(◎_◎;) 参数校验没通过呢	具体原因见详细的描述
10200	read data timeout	(。-`ω´-) 读取数据超时了~	检查是否累计10s未发送数据并且未关闭连接
10222	context deadline exceeded	(╯°□°)╯︵ ┻━┻ 出错啦！	1.检查上传数据是否超过接口上限；2.SSL证书无效请提交工单
10223	RemoteLB: can't find valued addr	(◎_◎;) 找不到服务节点呢	提交工单联系技术人员
10313	invalid appid	(◎_◎;) appid和apikey不匹配哦	检查appid是否合法
10317	invalid version	(◎_◎;) 版本号有问题呢	请到控制台提交工单联系技术人员
10700	not authority	(╯°□°)╯︵ ┻━┻ 权限不足！	按照报错原因对照开发文档检查，如仍无法解决，请提供sid及错误信息提交工单
11200	auth no license	(╯°□°)╯︵ ┻━┻ 功能未授权！	检查appid是否正确，确认是否添加了相关服务，检查调用量是否超限或授权是否到期
11201	auth no enough license	(╯°□°)╯︵ ┻━┻ 每日交互次数超限啦！	提交应用审核提额或联系商务购买企业级接口
11503	server error: atmos return error	(。-`ω´-) 服务器返回了错误数据...	提交工单
11502	server error: too many datas	(。-`ω´-) 服务器配置有问题呢	提交工单
100001~100010	WrapperInitErr	(◎_◎;) 引擎调用出错啦！	请根据message中的errno查看引擎错误码说明

Additional Resources

(｡･ω･｡) 服务购买链接：通用文字识别（OCR大模型版）
(｡･ω･｡) 商务咨询链接：购买服务量

Original API Error Codes

Code	Description	Solution
10000	System error	Check auth info, request method, parameters
10001	Signature authentication failed	Check credentials
10002	Business processing error	Check error message
10003	Quota/insufficient balance	Check account balance

Limitations

Image OCR

Format: Common image formats (JPG, PNG, etc.)
Size: Reasonable file sizes for web upload
Rate limiting: Follow API rate limits

PDF OCR

Max pages: 100 pages per PDF
Protected PDFs: Not supported (password/encrypted)
Rate limiting: Status query limited to once per 5 seconds
Time limit: Timestamp must be within ±5 minutes of server time

Tips

Image OCR

High-quality images: Use clear, high-resolution images for best results
Multiple formats: Use json,markdown to get both structured and formatted output
Save results: Use -o flag to save OCR results to file

PDF OCR

Math formulas: Use markdown format for PDFs with mathematical formulas
Large PDFs: Split into sections if > 100 pages
Polling interval: Minimum 5 seconds between status queries
Network URLs: Ensure PDF URLs are publicly accessible
Download URLs: Download files promptly as URLs may expire

ifly-pdf-image-ocr

Install

ifly-pdf&image-ocr

Quick Start

Image OCR (LLM OCR)

PDF OCR

Setup

API Credentials

Environment Variables

Features

Image OCR (LLM OCR)

PDF OCR

API Parameters

Image OCR Parameters

PDF OCR Parameters

Authentication

Image OCR (HMAC-SHA256)

PDF OCR (MD5 + HMAC-SHA1)

Response Format

Image OCR Response

PDF OCR Start Response

PDF OCR Status Response

Task Status (PDF OCR)

Error Codes

Platform Common Error Codes

Additional Resources

Original API Error Codes

Limitations

Image OCR

PDF OCR

Tips

Image OCR

PDF OCR

Related skills