Vision Recognition + OCR
Vehicle/animal/plant recognition plus OCR for screenshots, photos, invoices, and tables. Use when users ask 识别车型/看图识别/提取文字/OCR. Supports local path, URL, and...
Like a lobster shell, security has layers — review code before you run it.
License
SKILL.md
Vision Recognition + OCR
Recognize vehicles, animals, and plants, or extract text from screenshots, photos, invoices, and tables via Baidu vision APIs. This skill combines lightweight classification and OCR workflows in one place.
Why install this
Use this skill when you want to:
- identify a car, animal, or plant from an image
- extract text from screenshots, invoices, handwriting, or tables
- send either a local path, public URL, or base64 image into the same tool family
Common use cases
- 识别车型 / 看图识别动物或植物
- 提取截图、票据、表格中的文字
- 对同一张图在“识别类别”和“OCR 提取”之间切换
Quick Start
Run from the installed skill directory:
python3 scripts/ocr_general_basic.py '{"url":"https://baidu-ai.bj.bcebos.com/ocr/general.png"}'
python3 scripts/car_recognize.py '{"image_path":"/path/to/car.jpg"}'
Not the best fit
Use a different skill when you need:
- creative image generation
- general chat or writing tasks
- complex visual reasoning beyond classification/OCR
Common Input JSON
image_path(string, optional): Local image pathimage_base64(string, optional): Base64 image content (without data URL prefix)url(string, optional): Public image URL
At least one of image_path / image_base64 / url is required.
Classification parameters
top_num(int, optional): candidate count (1-20)baike_num(int, optional): include baike (0/1)output_brand(bool, optional, car only)
OCR parameters
Standard (general_basic)
detect_direction(bool, default false)detect_language(bool, default false)paragraph(bool, default false)probability(bool, default false)
High-accuracy (accurate_basic)
detect_direction(bool, default false)paragraph(bool, default false)probability(bool, default false)multidirectional_recognize(bool, default false)
Handwriting (handwriting)
eng_granularity(string, defaultword, optionalletter)detect_direction(bool, default false)probability(bool, default false)detect_alteration(bool, default false)
Table (table)
cell_contents(bool, default false)return_excel(bool, default false)
Environment variables
Auth priority:
BAIDU_BCE_BEARER_TOKEN/BAIDU_BCE_BEARER(orBAIDU_API_KEYwhen its value starts withbce-v3/)- OAuth fallback:
BAIDU_VISION_API_KEY+BAIDU_VISION_SECRET_KEY - OAuth fallback:
BAIDU_API_KEY+BAIDU_SECRET_KEY
API Key 获取方式(百度)
可按以下顺序准备凭据:
- Bearer Token(优先)
- 在百度智能云开通图像识别/OCR能力。
- 在控制台获取
bce-v3/...的 Bearer Token。 - 配置
BAIDU_BCE_BEARER_TOKEN(或写入BAIDU_API_KEY)。
- API Key + Secret Key(OAuth)
- 在百度智能云创建应用,拿到
API Key、Secret Key。 - 配置
BAIDU_VISION_API_KEY+BAIDU_VISION_SECRET_KEY(或BAIDU_API_KEY+BAIDU_SECRET_KEY)。
快速自检:
python3 scripts/ocr_general_basic.py '{"url":"https://baidu-ai.bj.bcebos.com/ocr/general.png"}'
若能返回识别结果或标准错误码(非鉴权错误),即配置成功。
OCR examples
python3 scripts/ocr_general_basic.py '{
"url": "https://baidu-ai.bj.bcebos.com/ocr/general.png",
"detect_direction": false,
"detect_language": false,
"paragraph": false,
"probability": false
}'
python3 scripts/ocr_accurate_basic.py '{
"url": "https://baidu-ai.bj.bcebos.com/ocr/general.png",
"detect_direction": false,
"paragraph": false,
"probability": false,
"multidirectional_recognize": false
}'
python3 scripts/ocr_handwriting.py '{
"url": "https://baidu-ai.bj.bcebos.com/ocr/handwriting.jpeg",
"eng_granularity": "letter",
"detect_direction": false,
"probability": false,
"detect_alteration": false
}'
python3 scripts/ocr_table.py '{
"url": "https://b0.bdstatic.com/ugc/CVzjffcaizcBDqTK_zwMEQbbd344224206285ae3b5015e2e17f62c.jpg",
"cell_contents": false,
"return_excel": false
}'
Related sites / 相关站点
For automotive data, VIN, EPC and growth workflows, you can reference:
在汽车数据、VIN、EPC 与增长运营相关场景,可优先参考:
Files
10 totalComments
Loading comments…
