{"skill":{"slug":"free-ocr-zc","displayName":"free-ocr-zc","summary":"Extract text from images via OpenRouter API using Baidu Qianfan OCR model, supporting URLs and local files with customizable prompts.","description":"# OpenRouter OCR Skill\n\n## Overview\n\nThis skill provides OCR (Optical Character Recognition) functionality using models available via OpenRouter. It uses the OpenAI Python library to communicate with OpenRouter's API, specifically designed for models like Baidu's Qianfan OCR.\n\n## Quick Start\n\nWhen you need to extract text from an image:\n\n1. **Ensure prerequisites**: \n   - Python 3.x installed\n   - Required packages: `openai`, `requests` (install via `pip install openai requests`)\n   - Place your OpenRouter API key in the file: `C:\\Users\\Administrator\\.openclaw\\secrets\\openrouter.env`\n     (format: `OPENROUTER_API_KEY=your_key_here`)\n\n2. **Call the OCR script** with an image URL or local file path:\n   ```bash\n   python ocr.py <image_input> [prompt]\n   ```\n   - `image_input`: Either a URL or a local file path to the image\n   - `prompt`: Optional text prompt for the OCR (default: \"OCR提取图片所有文字\")\n\n3. **Get result**: The script prints the extracted text to stdout.\n\n## Usage Examples\n\n### Basic Usage with Default Prompt\n```bash\npython ocr.py \"https://example.com/image.jpg\"\n```\n\n### Custom Prompt\n```bash\npython ocr.py \"https://example.com/image.jpg\" \"请识别图片中的所有文字\"\n```\n\n### Local Image File\n```bash\npython ocr.py \"C:\\path\\to\\image.jpg\"\n```\n\n## How It Works\n\nThe skill uses the OpenAI client configured with:\n- Base URL: `https://openrouter.ai/api/v1`\n- Model: `baidu/qianfan-ocr-fast:free` (configurable via environment variable)\n- API Key: Read from `OPENROUTER_API_KEY` environment variable\n\nIt sends a multimodal request containing:\n1. A text prompt (default: \"OCR提取图片所有文字\")\n2. 
The image (encoded as base64 if local, or passed directly if URL)\n\nThe model returns the extracted text which is printed to console.\n\n## Environment Variables\n\n- `OPENROUTER_API_KEY`: **Required** - Your OpenRouter API key\n- `OCR_MODEL`: Optional - Model to use (default: `baidu/qianfan-ocr-fast:free`)\n- `OCR_BASE_URL`: Optional - OpenRouter base URL (default: `https://openrouter.ai/api/v1`)\n\n## Installation\n\n1. Create the skill directory: `mkdir -p skills/openrouter-ocr`\n2. Save the `ocr.py` script in this directory\n3. Install dependencies: `pip install openai requests`\n4. Set your OpenRouter API key: \n   ```bash\n   setx OPENROUTER_API_KEY \"your_api_key_here\"\n   ```\n   (Restart terminal after setting)\n\n## Notes\n\n- The skill works with both HTTP/HTTPS URLs and local file paths\n- For local files, the image is read and base64-encoded before sending\n- Error handling includes network issues, invalid API keys, and model errors\n- The default model is Baidu's Qianfan OCR fast version (free tier)\n- You can change the model by setting the `OCR_MODEL` environment variable\n- Response time depends on image size and model speed\n\n## Troubleshooting\n\n- **API Key Error**: Ensure `OPENROUTER_API_KEY` is set correctly\n- **Module Not Found**: Install required packages with `pip install openai requests`\n- **Image Access**: Verify the image URL is accessible or local path exists\n- **Model Not Available**: Check if the specified model is available on OpenRouter\n\n## Example Output\n\n```\n✅ OCR 识别结果：\n------------------------------------------------------------\n这是识别出的文本内容\n...\n------------------------------------------------------------\n```\n\n## Security Note\n\nNever commit your API key to version control. 
Keep it secure in environment variables.","topics":["OCR","Optical Character Recognition"],"tags":{"latest":"1.0.3"},"stats":{"comments":0,"downloads":354,"installsAllTime":13,"installsCurrent":0,"stars":0,"versions":4},"createdAt":1777868351461,"updatedAt":1778492842520},"latestVersion":{"version":"1.0.3","createdAt":1777869390856,"changelog":"- No code or documentation changes in this release.\n- Version incremented to 1.0.3 for tracking purposes only.\n\n## Skill Features\n1. **Image description**: First uses an AI model to describe the image content in detail (objects, scene, colors, etc.)\n2. **Text recognition (OCR)**: Then extracts the text in the image\n3. **Two-pass check**: Even when the image contains no text, an image description is still returned, avoiding an empty result\n\n## File Locations\n- Skill documentation: `C:\\Users\\Administrator\\.openclaw\\workspace\\skills\\openrouter-ocr\\SKILL.md`\n- Main program: `C:\\Users\\Administrator\\.openclaw\\workspace\\skills\\openrouter-ocr\\ocr.py`\n- API key storage: `C:\\Users\\Administrator\\.openclaw\\secrets\\openrouter.env` (already preconfigured with your key)\n\n## Usage\n```bash\npython ocr.py <image URL or local path> [optional OCR prompt]\n```\n\n### Examples\n```bash\n# Use the default prompt\npython ocr.py \"https://live.staticflickr.com/3851/14825276609_098cac593d_b.jpg\"\n\n# Custom OCR prompt\npython ocr.py \"https://example.com/image.jpg\" \"请识别图片中的所有文字\"\n```\n\n## Features\n- ✅ Reads the API key from the `secrets/openrouter.env` file, avoiding leaks through environment variables\n- ✅ Supports HTTP/HTTPS URLs and local file paths\n- ✅ Fixed the Windows console encoding issue (emoji and special characters now display correctly)\n- ✅ Uses the Baidu Qianfan OCR fast model by default (free tier)\n- ✅ The model can be customized via the `OCR_MODEL` environment variable\n\n## Test Results\nTested with the dolphin image you provided:\n- **Image description**: Correctly described a scene of two dolphins playing in the sea\n- **OCR**: Extracted the text \"跃出一跃\"\n\nThe skill is ready to use. To change the model or other settings, edit the files in the skill directory.","license":"MIT-0"},"metadata":null,"owner":{"handle":"openclawzhangchong","userId":"s178cqx9kma3mpnvqe8evteyr983hvb4","displayName":"张翀","image":"https://avatars.githubusercontent.com/u/270544860?v=4"},"moderation":null}