Image To Code
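Converts images (containing text, formulas, and headings) into a specified code format. Heading levels are detected automatically (title1/title2/title3). Text lines become $word->body("正文=".$F); formulas become $word->formula(""); pictures are marked as ![image]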
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Suspicious
high confidence
Purpose & Capability
Name/description (image → code, OCR, formula detection, title extraction) align with the included code, tests, and docs. Use of OCR engines (Tesseract / PaddleOCR / Baidu OCR / vision AI) and post-processing for LaTeX/title extraction is coherent with the stated purpose.
Instruction Scope
SKILL.md and the code instruct the agent to perform OCR, formula conversion, and to optionally call external vision AI. The runtime instructions and code will convert images and may call remote OCR/vision endpoints. They also recommend using external cloud vision (GPT-4V/Claude Vision/Google/Azure) as options — this means image contents (potentially sensitive) can be transmitted off-host. The SKILL.md and usage docs imply default/cloud-first behavior (Baidu OCR high-precision) which is broader than a strictly offline OCR helper.
Install Mechanism
There is no platform install spec, but an install.sh and requirements.txt are provided. install.sh runs pip install -r requirements.txt (paddlepaddle/paddleocr/opencv/numpy/Pillow) which are expected for OCR/image processing but can be heavy; no downloads from untrusted URLs are present. This is standard but will install large packages (paddlepaddle) and may require system dependencies.
Credentials
The skill declares no required environment variables or credentials, yet image_to_code.py contains hard-coded Baidu API key and secret (plaintext strings) and makes network calls to Baidu OCR endpoints, which is disproportionate: either the skill should ask the user to provide API credentials (env/config) or operate fully offline. Hard-coded credentials are a security/privacy concern and make the skill operate with third-party access to user images without explicit configuration.
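One way to act on this finding is to read the credentials from the environment and refuse to call the cloud endpoint when they are absent. The following is a minimal sketch, not the skill's code: the variable names `BAIDU_OCR_API_KEY` and `BAIDU_OCR_SECRET_KEY`, the function `load_baidu_credentials`, and the exception class are all hypothetical names chosen for illustration.

```python
import os


class MissingCredentialsError(RuntimeError):
    """Raised when cloud OCR is requested but no credentials are configured."""


def load_baidu_credentials(env=None):
    """Return (api_key, secret_key) read from the environment.

    BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY are hypothetical variable
    names used for this sketch; the skill itself does not define them.
    """
    env = os.environ if env is None else env
    api_key = env.get("BAIDU_OCR_API_KEY", "").strip()
    secret_key = env.get("BAIDU_OCR_SECRET_KEY", "").strip()
    if not api_key or not secret_key:
        # Fail closed: no embedded fallback key, so nothing is uploaded
        # unless the user has explicitly configured their own account.
        raise MissingCredentialsError(
            "Cloud OCR is disabled: set BAIDU_OCR_API_KEY and "
            "BAIDU_OCR_SECRET_KEY, or use a local OCR engine."
        )
    return api_key, secret_key
```

With this shape, a caller can catch `MissingCredentialsError` and fall back to a local engine, so images leave the host only after an explicit opt-in.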
Persistence & Privilege
The skill does not request always:true, does not modify other skills or system-wide configuration, and does not request privileged persistence. It installs dependencies into the environment via pip if run, which is normal for a Python skill.
Scan Findings in Context
[hardcoded-credentials] unexpected: image_to_code.py includes plaintext baidu_api_key and baidu_secret_key values and then requests an access_token and posts base64-encoded image data to Baidu OCR endpoints. Hard-coded service credentials are not expected for a general-purpose converter and create privacy/credential risks (images uploaded without user-provided API keys).
What to consider before installing
This skill implements image→code conversion and uses OCR/heuristics as advertised, but it embeds and uses hard-coded Baidu OCR credentials and will upload base64-encoded images to Baidu's API by default. Consider the following before installing or running it:
- Privacy: any images you process may be sent to an external service (Baidu) under the embedded key. Do not run it on sensitive documents unless you understand where data is sent and trust that endpoint.
- Credentials: hard-coded API keys are unsafe. Prefer a version that requires you to configure your own API key (via env var or config file) or remove cloud OCR and use an offline engine (Tesseract/PaddleOCR). If you discover these keys are active and you control accounts associated with them, rotate/revoke them.
- If you need offline-only processing, modify the code to disable Baidu OCR and ensure use_tesseract/PaddleOCR is enabled and configured; verify requirements and system dependencies for local OCR.
- Audit network calls (requests.post) and any logging the skill performs; if you cannot inspect runtime traffic, run in an isolated environment (no network) or sandbox first.
- The install script will pip-install heavy packages (paddlepaddle/paddleocr); run in a virtualenv/container to avoid altering system Python packages.
Given these issues, treat the provided Baidu integration and its embedded credentials as a blocker until you either supply your own credentials or explicitly disable remote OCR.
Like a lobster shell, security has layers: review code before you run it.
Current version: v2.0.0
SKILL.md
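Image-to-Code Format Converter
Overview
Converts document screenshots containing text, formulas, and pictures into a specified code format. Supports OCR text recognition, formula recognition, and format conversion.
Output Format Specification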
| Content type | Format template | Example input | Example output |
|---|---|---|---|
| Level-1 heading | $word->title1("heading text"); | 第一章 项目概述 | $word->title1("项目概述"); |
| Level-2 heading | $word->title2("heading text"); | 1.1 项目背景<br>(1) 提高效率 | $word->title2("项目背景");<br>$word->title2("提高效率"); |
| Level-3 heading | $word->title3("heading text"); | 1.1.1 技术路线 | $word->title3("技术路线"); |
| Text line (正文, "body text") | $word->body("正文=content=".$F); | 这是正文 | $word->body("正文=这是正文=".$F); |
| Formula | $word->formula("LaTeX formula"); | E = mc² | $word->formula("E = mc^2"); |
| Picture | ![image] | [chart] | ![image] |
| Blank line | Keep the blank line | (empty) | (blank line) |
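Heading Extraction Rules
| Level | Recognized patterns | Extraction rule | Example |
|---|---|---|---|
| Level-1 heading | 第 X 章 (Chapter X), 第 X 部分 (Part X), 一、 | Strip the numbering prefix | 第一章 总述 → 总述 |
| Level-2 heading | 第 X 节 (Section X), 1.1, (1), (一) | Strip the numbering prefix | 1.1 背景 → 背景<br>(1) 提高 → 提高 |
| Level-3 heading | 1.1.1, 1、 | Strip the numbering prefix | 1.1.1 架构 → 架构 |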
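The rules in the table above can be sketched with a few regular expressions. This is an illustrative sketch under stated assumptions, not the skill's actual implementation: `HEADING_PATTERNS`, `classify_line`, and `to_code` are hypothetical names, and the patterns accept both ASCII and full-width parentheses. More specific patterns are tried first so that `1.1.1` is not mistaken for `1.1`.

```python
import re

# Chinese numerals that may appear in a numbering prefix
CN_NUM = "一二三四五六七八九十百零〇两"

# (level, pattern) pairs; more specific patterns come first
HEADING_PATTERNS = [
    (3, re.compile(r"^\d+\.\d+\.\d+\s*")),                      # 1.1.1
    (2, re.compile(r"^\d+\.\d+\s*")),                            # 1.1
    (1, re.compile(rf"^第\s*[{CN_NUM}\d]+\s*(?:章|部分)\s*")),    # 第 X 章 / 第 X 部分
    (2, re.compile(rf"^第\s*[{CN_NUM}\d]+\s*节\s*")),             # 第 X 节
    (1, re.compile(rf"^[{CN_NUM}]+、\s*")),                       # 一、
    (2, re.compile(rf"^[((]\s*[{CN_NUM}\d]+\s*[))]\s*")),       # (1) / (一)
    (3, re.compile(r"^\d+、\s*")),                                # 1、
]


def classify_line(line):
    """Return (level, title) for a heading, or (0, text) for body text."""
    text = line.strip()
    for level, pattern in HEADING_PATTERNS:
        match = pattern.match(text)
        # Require some text after the prefix so a bare number is not a heading
        if match and match.end() < len(text):
            return level, text[match.end():].strip()
    return 0, text


def to_code(line):
    """Render one recognized line in the output format from the table."""
    level, title = classify_line(line)
    escaped = title.replace('"', '\\"')
    if level == 0:
        return f'$word->body("正文={escaped}=".$F);'
    return f'$word->title{level}("{escaped}");'
```

For example, `to_code("第一章 项目概述")` yields the level-1 heading line and `to_code("这是正文")` falls through to a body line. A purely prefix-based classifier like this will misfire on body text that happens to start with a number such as `3.14`, so a real implementation would likely combine it with layout cues.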
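Execution Flow
Stage 1: Image Preprocessing
- Image enhancement
  - Grayscale conversion
  - Binarization (text regions)
  - Noise removal
- Region segmentation
  - Text region detection
  - Formula region detection
  - Picture region detection
- Reading order
  - Scan from top to bottom
  - Sort from left to right
  - Preserve the original order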
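The reading-order step can be approximated directly on OCR bounding boxes. The sketch below assumes each box is a list of four (x, y) corner points, which is the shape the core code further down reads from PaddleOCR. `sort_reading_order` and `line_tolerance` are hypothetical names, and grouping boxes by their top edge is just one simple heuristic, not necessarily what the skill does.

```python
def sort_reading_order(boxes, line_tolerance=10):
    """Order boxes top-to-bottom, then left-to-right within each line.

    A box joins the current line when its top edge is within
    `line_tolerance` pixels of the first box on that line.
    """
    def top(box):
        return min(point[1] for point in box)

    def left(box):
        return min(point[0] for point in box)

    lines = []  # each entry is the list of boxes on one visual line
    for box in sorted(boxes, key=top):
        if lines and abs(top(box) - top(lines[-1][0])) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])

    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=left))
    return ordered
```

This single-column heuristic does not handle multi-column pages; those need a column-detection pass before lines are grouped.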
Stage 2: Content Recognition
2.1 Text Recognition (OCR)
Tools: PaddleOCR / Tesseract / vision AI
Processing logic:
def process_text_line(text):
    # Clean up the OCR result
    text = text.strip()
    # Escape special characters
    text = text.replace('"', '\\"')
    # Generate the code line
    return f'$word->body("正文={text}=".$F);'
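2.2 Formula Recognition
Tools: Pix2Tex / MathOCR / vision AI
Recognition flow:
- Detect formula regions (special fonts, symbols)
- Convert to LaTeX
- Generate the formula code
Detection rules:
- Contains math symbols: ∑∫∂∇√∞≈≠≤≥±×÷
- Contains variables: x, y, z, α, β, γ, θ
- Contains superscript/subscript formatting
- Is a math expression on its own line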
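Applied to OCR text alone, the detection rules above might look like the following. This is a rough text-only heuristic sketched for illustration; `looks_like_formula` and the character sets are hypothetical, and the skill's own detector may rely on image features instead. Expect false positives, for example on Greek prose or on key=value lines.

```python
import re

MATH_SYMBOLS = set("∑∫∂∇√∞≈≠≤≥±×÷")
GREEK_VARIABLES = set("αβγθ")
SUPER_SUB = set("⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉")
# A standalone expression: only ASCII letters, digits, operators, brackets
STANDALONE_EXPR = re.compile(r"^[A-Za-z0-9\s+\-*/^_=().,]+$")


def looks_like_formula(line):
    """Return True when a text line matches one of the detection rules."""
    text = line.strip()
    if not text:
        return False
    chars = set(text)
    if chars & MATH_SYMBOLS:          # rule 1: math symbols
        return True
    if chars & GREEK_VARIABLES:       # rule 2: Greek variables
        return True
    if chars & SUPER_SUB or "^" in text:   # rule 3: superscript/subscript
        return True
    # rule 4: a standalone ASCII expression containing an equals sign
    return "=" in text and bool(STANDALONE_EXPR.match(text))
```

Lines such as `E = mc²` and `a + b = c` are flagged, while `这是正文` and `力的单位:牛顿 (N)` are left as body text.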
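2.3 Picture Recognition
Detection rules:
- Chart regions (axes, legends)
- Flowcharts / block diagrams
- Image content that is neither text nor formula
Stage 3: Format Conversion
3.1 Text line handling
Input: 这是一段测试文字
Output: $word->body("正文=这是一段测试文字=".$F);
3.2 Formula handling
Input: E = mc²
Output: $word->formula("E = mc^2");
Input: ∑(i=1 to n) xi
Output: $word->formula("\sum_{i=1}^{n} x_i");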
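The first conversion (`E = mc²` to `E = mc^2`) can be done with plain text substitution. The sketch below only rewrites Unicode digit superscripts and subscripts; the function names are hypothetical. The second example, with `∑`, needs a real formula engine such as the pix2tex option listed under the tools and is not attempted here.

```python
import re

SUP_TABLE = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789")
SUB_TABLE = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
SUP_RUN = re.compile("[⁰¹²³⁴⁵⁶⁷⁸⁹]+")
SUB_RUN = re.compile("[₀₁₂₃₄₅₆₇₈₉]+")


def _wrap(marker, digits):
    """Emit ^2 for one digit and ^{10} for several."""
    if len(digits) == 1:
        return f"{marker}{digits}"
    return f"{marker}{{{digits}}}"


def unicode_scripts_to_latex(text):
    """Rewrite Unicode digit superscripts/subscripts as LaTeX ^ and _."""
    text = SUP_RUN.sub(lambda m: _wrap("^", m.group().translate(SUP_TABLE)), text)
    text = SUB_RUN.sub(lambda m: _wrap("_", m.group().translate(SUB_TABLE)), text)
    return text


def formula_line(text):
    """Render a formula line in the output format."""
    latex = unicode_scripts_to_latex(text.strip())
    return f'$word->formula("{latex}");'
```

Runs of several digits are wrapped in braces so that `x¹⁰` becomes `x^{10}` rather than `x^10`, which LaTeX would read as x to the power 1 followed by 0.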
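3.3 Picture handling
Input: [chart image]
Output: ![image]
Technical Implementation
Dependencies
# OCR
paddlepaddle
paddleocr
# Formula recognition
pix2tex
latex2sympy
# Image processing
opencv-python
Pillow
numpy
# Optional: vision AI
openai # GPT-4V
anthropic # Claude Vision
Core Code Structure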
#!/usr/bin/env python3
"""
Image-to-code format converter.
Converts text, formulas, and charts in an image into the specified code format.
"""
import sys
from typing import List, Optional

import cv2
import numpy as np
from paddleocr import PaddleOCR


class ImageToCodeConverter:
    def __init__(self, ocr_lang='ch'):
        """Initialize the OCR engine."""
        self.ocr = PaddleOCR(use_angle_cls=True, lang=ocr_lang)

    def detect_content_type(self, image_region: np.ndarray) -> str:
        """
        Detect the content type of a region.
        Returns: 'text' | 'formula' | 'image'
        """
        # Intended region features (not implemented in this skeleton):
        #   formula: high density of special symbols, large font variation
        #   image:   rich colors, complex edges
        #   text:    regular layout, high contrast
        # Placeholder so the skeleton runs end to end: treat every region as text.
        return 'text'

    def ocr_text(self, image: np.ndarray) -> List:
        """Run OCR and return PaddleOCR's raw result."""
        return self.ocr.ocr(image, cls=True)

    def formula_to_latex(self, formula_image: np.ndarray) -> str:
        """Convert a formula image to LaTeX."""
        # Use pix2tex or a vision AI here (not implemented in this skeleton)
        raise NotImplementedError('formula recognition is not implemented')

    def convert_line(self, line_text: str, content_type: str,
                     region: Optional[np.ndarray] = None) -> str:
        """Convert a single line of content into the code format."""
        if content_type == 'text':
            # Escape double quotes
            escaped = line_text.replace('"', '\\"')
            return f'$word->body("正文={escaped}=".$F);'
        elif content_type == 'formula':
            # The formula branch works on the cropped region, not the OCR text
            latex = self.formula_to_latex(region)
            return f'$word->formula("{latex}");'
        elif content_type == 'image':
            return '![image]'
        return ''

    def process_image(self, image_path: str, output_path: Optional[str] = None) -> str:
        """Process a whole image."""
        # Read the image; cv2.imread returns None when the file cannot be read
        image = cv2.imread(image_path)
        if image is None:
            raise FileNotFoundError(f'cannot read image: {image_path}')
        # Run OCR
        ocr_result = self.ocr_text(image)
        # Process line by line
        output_lines = []
        for line in ocr_result or []:
            if not line:
                continue
            for text_box in line:
                bbox, (text, _confidence) = text_box
                # Crop the region; coordinates may be floats, so cast to int
                x_coords = [int(p[0]) for p in bbox]
                y_coords = [int(p[1]) for p in bbox]
                x_min, x_max = min(x_coords), max(x_coords)
                y_min, y_max = min(y_coords), max(y_coords)
                region = image[y_min:y_max, x_min:x_max]
                # Detect the content type
                content_type = self.detect_content_type(region)
                # Convert to the code format
                code_line = self.convert_line(text, content_type, region)
                output_lines.append(code_line)
        # Write the result
        output = '\n'.join(output_lines)
        if output_path:
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write(output)
        return output


def main():
    if len(sys.argv) < 2:
        print("Usage: python image_to_code.py <image_path> [output_path]")
        sys.exit(1)
    image_path = sys.argv[1]
    output_path = sys.argv[2] if len(sys.argv) > 2 else None
    converter = ImageToCodeConverter()
    result = converter.process_image(image_path, output_path)
    if not output_path:
        print(result)


if __name__ == '__main__':
    main()
Usage Examples
Example 1: Document with headings
Input image content:
第一章 项目概述
1.1 项目背景
本项目旨在开发一个智能系统
用于自动化文档处理
(1) 减少人工操作
(2) 提高准确性
Output code:
$word->title1("项目概述");
$word->title2("项目背景");
$word->body("正文=本项目旨在开发一个智能系统=".$F);
$word->body("正文=用于自动化文档处理=".$F);
$word->title2("减少人工操作");
$word->title2("提高准确性");
Example 2: Formulas with headings
Input image content:
第三章 物理公式
3.1 牛顿第二定律
F = ma
力的单位:牛顿 (N)
3.2 万有引力
F = G(m₁m₂)/r²
Output code:
$word->title1("物理公式");
$word->title2("牛顿第二定律");
$word->formula("F = ma");
$word->body("正文=力的单位:牛顿 (N)=".$F);
$word->title2("万有引力");
$word->formula("F = G\frac{m_1 m_2}{r^2}");
Example 3: Image containing a chart
Input image content:
销售数据对比
[bar chart]
结论:Q4 增长明显
Output code:
$word->body("正文=销售数据对比=".$F);
![image]
$word->body("正文=结论:Q4 增长明显=".$F);
Command-Line Interface
# Basic usage
python image_to_code.py input.png
# Specify an output file
python image_to_code.py input.png output.txt
# Batch processing
python image_to_code.py *.png --output-dir ./output
# Use vision AI (more accurate formula recognition)
python image_to_code.py input.png --use-vision-ai
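The `main()` shown in the core code only reads positional arguments, so the `--output-dir` and `--use-vision-ai` flags need an argument parser. The following is one possible sketch using `argparse`. `build_parser` and `resolve_jobs` are hypothetical names, and two details are assumptions not stated in the source: batch outputs are named `<stem>.txt`, and a second positional argument is treated as the output path only when `--output-dir` is absent.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(
        prog="image_to_code.py",
        description="Convert images to the $word code format.",
    )
    parser.add_argument("paths", nargs="+",
                        help="input image(s); a second path is the output file")
    parser.add_argument("--output-dir", default=None,
                        help="write one .txt per input image into this folder")
    parser.add_argument("--use-vision-ai", action="store_true",
                        help="opt in to a remote vision model for formulas")
    return parser


def resolve_jobs(args):
    """Return a list of (input_path, output_path_or_None) pairs."""
    if args.output_dir:
        out_dir = Path(args.output_dir)
        # Assumed naming scheme: input stem plus .txt
        return [(p, str(out_dir / (Path(p).stem + ".txt"))) for p in args.paths]
    if len(args.paths) > 2:
        raise SystemExit("pass --output-dir when converting several images")
    output = args.paths[1] if len(args.paths) == 2 else None
    return [(args.paths[0], output)]
```

Keeping `--use-vision-ai` as an explicit opt-in flag that defaults to off also matches the scan's advice that images should not leave the host without deliberate configuration.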
Configuration Options
{
  "ocr_engine": "paddleocr",
  "ocr_lang": "ch",
  "formula_detection": "auto",
  "formula_engine": "pix2tex",
  "vision_ai": {
    "enabled": false,
    "provider": "openai",
    "model": "gpt-4-vision-preview"
  },
  "output": {
    "encoding": "utf-8",
    "line_ending": "\n"
  }
}
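A user configuration would typically override only a few of these keys. The sketch below shows one way to layer a partial JSON file over the defaults listed above. `DEFAULT_CONFIG`, `merge_config`, and `load_config` are hypothetical names, and the source does not say how the skill actually loads its configuration.

```python
import copy
import json

# Defaults copied from the configuration listing above
DEFAULT_CONFIG = {
    "ocr_engine": "paddleocr",
    "ocr_lang": "ch",
    "formula_detection": "auto",
    "formula_engine": "pix2tex",
    "vision_ai": {
        "enabled": False,
        "provider": "openai",
        "model": "gpt-4-vision-preview",
    },
    "output": {"encoding": "utf-8", "line_ending": "\n"},
}


def merge_config(defaults, overrides):
    """Recursively overlay `overrides` on a deep copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(text):
    """Parse a JSON string and fill in any missing keys from the defaults."""
    return merge_config(DEFAULT_CONFIG, json.loads(text))
```

Merging nested sections key by key means a file containing only `{"vision_ai": {"enabled": true}}` keeps the default provider and model instead of dropping them.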
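Quality Standards
- Text recognition accuracy > 95%
- Formula recognition accuracy > 85%
- Content-type classification accuracy > 90%
- Output fully conforms to the format specification
- Supports mixed Chinese, English, and digits
- Supports batch processing
- Preserves the original order and structure
Notes
- Image quality: clear screenshots are recommended (300 dpi or higher)
- Formula complexity: complex formulas may need manual proofreading
- Special symbols: some rare symbols may be recognized inaccurately
- Handwriting: handwritten text is not yet supported
- Multi-column layout: reading order needs extra handling
Optional Extensions
1. Vision AI enhancement
Use GPT-4V/Claude Vision to improve formula recognition accuracy
2. Context-based correction
Automatically correct OCR errors based on the surrounding text
3. Format preservation
- Heading levels (H1/H2/H3)
- List formats (ordered/unordered)
- Table conversion
4. Batch processing
Supports batch conversion of a folder
Test Cases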
| Test type | Input | Expected output |
|---|---|---|
| Pure Chinese | "你好世界" | $word->body("正文=你好世界=".$F); |
| Mixed Chinese and English | "Hello 世界" | $word->body("正文=Hello 世界=".$F); |
| Simple formula | "a + b = c" | $word->formula("a + b = c"); |
| Complex formula | "∫₀^∞ e^(-x²)dx" | $word->formula("\int_{0}^{\infty} e^{-x^2}dx"); |
| Picture | [chart] | ![image] |
| Blank line | (empty) | (blank line) |
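Version History
- v1.0.0: Basic functionality (OCR + format conversion)
- v1.1.0: Formula recognition (pix2tex)
- v1.2.0: Vision AI support
- v1.3.0: Batch processing
Image to code: making document processing more efficient 🐘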
