Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Product Doc Reader

v1.0.0

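Structured extractor for product engineering drawings, v5.0. pdftotext first with Vision as the fallback; supports soft-hyphen cleanup, cross-line association, and data validation. Designed specifically for Farreach cable product drawings.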

by Jaden's built a claw (@cjboy007)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for cjboy007/ssa-product-doc-reader.

Install the skill "Product Doc Reader" (cjboy007/ssa-product-doc-reader) from ClawHub.
Skill page: https://clawhub.ai/cjboy007/ssa-product-doc-reader
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install ssa-product-doc-reader

ClawHub CLI


npx clawhub@latest install ssa-product-doc-reader
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (high confidence)
Purpose & Capability
Name/description (PDF → structured JSON/MD) aligns with the included Python scripts and examples. Expectation of pdftotext / pdftoppm / optional Tesseract is coherent for the stated extraction job. However the skill also documents and includes code/notes that reference Vision API (Gemini via OpenRouter) and Google Drive batch processing — those capabilities legitimately require API access and credentials, yet no environment variables or credentials are declared in the metadata.
Instruction Scope
SKILL.md and DEVELOPMENT_SUMMARY explicitly mention Vision API fallback (Gemini 2.5 Flash / OpenRouter) and Drive search / batch upload logic. The runtime instructions show only running python3 scripts locally, but the scripts (e.g., batch_process_drive.py, extract_vision.py, extract_hybrid.py) almost certainly make network requests or call external services. SKILL.md does not specify where API keys should come from, nor does it constrain network behavior — that grants broad discretion to the bundled scripts and is scope-creeping compared to a purely local pdftotext-based extractor.
Install Mechanism
No install spec — skill is delivered as files + instructions and expects system binaries (python3, pdftoppm). That is lower-risk than an installer that downloads/extracts arbitrary code. The presence of several Python scripts means code will run when invoked; no remote installer URLs or archive extraction steps were provided in the manifest.
Credentials
The skill declares no required env vars, but multiple parts of the documentation and examples reference external services: Vision/OpenRouter and Google Drive. Those services require API keys or OAuth tokens (not declared as requires.env or primaryEnv). This is an inconsistency: the code likely expects credentials in the environment or local config but the manifest gives no explicit, least-privileged credential requirements. That increases the chance of accidental credential exposure or of the scripts attempting to prompt for/use credentials from unrelated places.
Persistence & Privilege
always is false and the skill is user-invocable. There is no request for permanent platform-wide presence. Nothing in the provided metadata indicates the skill will modify other skills or agent-wide settings. That is proportionate.
What to consider before installing
  • Expect this to run code: the package includes multiple Python scripts (extract_hybrid.py, extract_vision.py, batch_process_drive.py). Review those scripts before running them. Search them for network calls (requests, urllib, openai/openrouter SDKs, googleapiclient, sockets) and any hardcoded endpoints.
  • Credentials mismatch: the docs mention Vision (Gemini/OpenRouter) and Google Drive operations, but the manifest does not declare the required API keys. If you run the scripts, they may look for API keys in your environment or local config, or attempt to prompt for or use CLI tools. Be cautious about providing secrets.
  • Sandbox first: run the tool offline or in a VM/container without network access (or with network blocked) to confirm local-only behavior when you only want pdftotext-based extraction. Enable network access only after you have inspected the code and configured minimal, scoped credentials.
  • Inspect batch_process_drive.py carefully: it may upload or delete files on Drive. If you use it, make sure it targets a test Drive account, or that you understand exactly what it will read and write.
  • If you need the Vision fallback, prefer a dedicated, limited-scope API key/account and monitor outgoing traffic. Consider replacing or instrumenting the script calls to OpenRouter/OpenAI so they log endpoints and data before sending.
  • If you are not comfortable auditing Python code, ask the maintainer for a short summary of which external endpoints the scripts call, and for a minimal configuration example that shows the required env vars and where outputs are sent.

If you provide the content of scripts/extract_hybrid.py and batch_process_drive.py, I can do a targeted review and point to the exact lines that perform network or Drive/OpenRouter calls.
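The first point (searching the scripts for network calls) can be partly automated. A minimal sketch, assuming the scripts sit under the skill's `scripts/` directory: it parses each file with Python's `ast` module and lists imports of packages that usually mean network access. The package list is an assumption to adjust, and `network_imports` / `scan_dir` are hypothetical helper names; dynamic imports and `subprocess` calls to external CLIs are not detected, so this complements reading the code rather than replacing it.

```python
import ast
from pathlib import Path

# Top-level packages that usually mean network access (assumed, non-exhaustive list).
NETWORK_PACKAGES = {
    "requests", "urllib", "urllib3", "http", "httpx",
    "aiohttp", "socket", "openai", "googleapiclient",
}

def network_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, module) pairs for network-related imports."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]  # relative "from . import x" has module None
        else:
            continue
        hits.extend((node.lineno, m) for m in modules if m.split(".")[0] in NETWORK_PACKAGES)
    return sorted(hits)

def scan_dir(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root; only files with hits are returned."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        hits = network_imports(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            report[str(path)] = hits
    return report

if __name__ == "__main__":
    for file, hits in scan_dir(Path("scripts")).items():
        for lineno, module in hits:
            print(f"{file}:{lineno}: imports {module}")
```

Run it from the skill directory; each printed `file:line` is a place to read closely before allowing network access.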

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

📐 Clawdis
Any bin: python3, pdftoppm
latest: vk974919efspp4r52nskpyvtys183qmkc
137 downloads
0 stars
1 version
Updated 1mo ago
v1.0.0
MIT-0

Product Doc Reader 📐

Product engineering drawings → structured data (v5.0.1 bug-fix release)

Automatically converts product drawings in PDF format into structured JSON and Markdown. Designed for cable/connector product drawings.

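Latest fixes in v5.0.1 (2026-03-24):

  • ✅ Soft-hyphen cleanup (\xad characters were splitting product names)
  • ✅ Improved cross-line association (length values split across lines)
  • ✅ Stronger data validation (excludes BJ codes, electrical parameters, and wire specs)
  • ✅ 599-028 fix (extraction of the customer item-name block in the top-right corner)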
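A minimal sketch of the soft-hyphen cleanup, assuming the breakage comes from U+00AD characters that pdftotext copies through from the PDF. `clean_soft_hyphens` is a hypothetical helper, not code from extract_hybrid.py, and rejoining a word when the soft hyphen sits right before a line break is an assumption about how the names were split.

```python
import re

# U+00AD SOFT HYPHEN: an invisible "optional break" character some PDF
# producers embed inside words; pdftotext passes it through unchanged.
SOFT_HYPHEN = "\u00ad"

def clean_soft_hyphens(text: str) -> str:
    """Remove soft hyphens so tokens such as product names are contiguous again.

    Assumption: a soft hyphen directly before a line break marks a word that was
    split across lines, so the break is dropped too and the halves are rejoined.
    """
    text = re.sub(SOFT_HYPHEN + r"[ \t]*\r?\n[ \t]*", "", text)
    return text.replace(SOFT_HYPHEN, "")

if __name__ == "__main__":
    raw = "HDMI2CABLE\u00ad4K6030F\nHDMI2\u00ad\nCABLE4K6010M"
    print(clean_soft_hyphens(raw).splitlines())
    # → ['HDMI2CABLE4K6030F', 'HDMI2CABLE4K6010M']
```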

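🎯 Use cases

  • A customer sends a product drawing PDF and the BOM and specifications need to be extracted
  • Batch-processing engineering drawings to build a product database
  • Archiving drawing content into a knowledge base (Obsidian/Markdown)
  • Comparing drawings (BOM differences between old and new revisions)
  • Adapting to customer drawings (non-Farreach-standard templates such as C331)
  • Auto-filling the mold-number column in Excel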

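🏗️ Architecture (v5.0 hybrid strategy)

PDF drawing
  ├─→ pdftotext (exact text, layout preserved) ⭐ primary
  │     ├─→ Strip soft hyphens (\xad)
  │     ├─→ BOM / bill of materials (embedded fonts, 100% accurate)
  │     ├─→ Product spec matrix (cross-line association)
  │     └─→ Rule-based field recognition (BJ / 599-xxx / ITEM)
  │
  ├─→ pdftoppm (PDF → PNG, 300 DPI)
  │     └─→ Vision API (Gemini 2.5 Flash) ⭐ fallback
  │           ├─→ Layout understanding
  │           ├─→ Graphics / dimension annotations
  │           ├─→ Test requirements as structured tables
  │           └─→ Multi-page merging
  │
  └─→ Merge strategy
        ├─→ Mold number: pdftotext preferred (exact)
        ├─→ Product name: pdftotext preferred (BOM terms excluded)
        ├─→ Packaging spec: pdftotext preferred (exact BJ match)
        ├─→ Length: pdftotext preferred (cross-line association)
        ├─→ Test requirements: Vision structured table
        └─→ Field validation + confidence scoring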
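The merge step can be sketched as follows. `pdftotext -layout` and the `-` output target (write to stdout) are real poppler options; `merge_fields` and the `VISION_FIRST` tuple are hypothetical names that encode the per-field preferences listed above. The Vision call itself is left out because its interface is not documented here.

```python
import subprocess
from typing import Any

# Fields where the Vision result wins; every other field is text-layer first
# (mold number, product name, packaging spec, length, ...). Assumed encoding.
VISION_FIRST = ("test_requirements",)

def pdftotext_layout(pdf_path: str) -> str:
    """Run poppler's pdftotext with -layout; '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def merge_fields(text_fields: dict[str, Any], vision_fields: dict[str, Any]) -> dict[str, Any]:
    """Combine two partial extractions, preferring the source configured per field."""
    merged: dict[str, Any] = {}
    for key in set(text_fields) | set(vision_fields):
        order = (vision_fields, text_fields) if key in VISION_FIRST else (text_fields, vision_fields)
        candidates = [src[key] for src in order if key in src]
        # First non-empty value wins; if all are empty, keep the preferred one ("" / [] / {}).
        merged[key] = next((v for v in candidates if v), candidates[0])
    return merged
```

With this shape, Vision only fills gaps for text-first fields, so an exact pdftotext value is never overwritten by a possibly misread one.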

Field recognition rules

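| Rule | Example | Meaning |
|---|---|---|
| BJ + 8 digits | BJ0599-0002 | Packaging spec (Farreach internal code) |
| 599-xxx | 599-002 | Drawing No. / material code |
| Value in the ITEM column | 5001-127A | Model No. (common on customer drawings) |
| CUSTOMER ITEM | HDACFM | Customer item name |
| MODEL NO. | OP-DP09 | Model (multi-product drawings) |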
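The rules in the table map naturally onto regular expressions. The sketch below is one possible encoding, not the patterns used by extract_hybrid.py: `find_codes` and `value_after_label` are hypothetical helpers, the BJ pattern assumes the `BJ0599-0002` shape (four digits, hyphen, four digits), and the label lookup assumes the value follows its label on the same line of `pdftotext -layout` output.

```python
import re
from typing import Optional

# Patterns inferred from the table's examples (assumptions, not the skill's own regexes).
PACKAGING_RE = re.compile(r"\bBJ\d{4}-\d{4}\b")    # packaging spec, e.g. BJ0599-0002
DRAWING_RE = re.compile(r"(?<![\w-])599-\d{3}\b")  # drawing no. / material code, e.g. 599-002

def find_codes(text: str) -> dict[str, list[str]]:
    """Collect packaging specs and drawing numbers from pdftotext output."""
    return {
        "packaging_spec": PACKAGING_RE.findall(text),
        "drawing_no": DRAWING_RE.findall(text),
    }

def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the first token after `label` (assumes label and value share a line).

    The lookbehind keeps a bare "ITEM" label from matching inside "CUSTOMER ITEM".
    """
    pattern = rf"(?<!CUSTOMER )\b{re.escape(label)}(?!\w)[ \t]*[:：]?[ \t]*(\S+)"
    match = re.search(pattern, text)
    return match.group(1) if match else None
```

The `(?<![\w-])` guard on the drawing-number pattern stops `599-000` from being picked out of a packaging code such as `BJ0599-0002`.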

Data validation rules

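| Check | Rule | Example |
|---|---|---|
| Mold number format | Must contain both letters and digits | 5001-130AP |
| Exclude single characters | Length ≤ 2 | P, 1 |
| Exclude values starting with BJ | Packaging spec | BJ0599-0002 |
| Exclude values starting with 599 | Drawing No. | 599-002 |
| Product name length | > 3 characters | 300, 60H |
| Exclude pure numbers | Length values | 9144, 10000 |
| Exclude electrical parameters | Voltage / frequency | 300V, 60HZ |
| Exclude wire specs | Conductor spec | 32BC, 511BC |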
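The validation table can be expressed as two small predicates. This is a sketch under assumptions: the regexes for electrical parameters (digits followed by V or HZ) and wire specs (digits followed by BC) are generalized from the examples only, and the function names are hypothetical.

```python
import re

# Assumed generalizations of the table's examples.
ELECTRICAL_RE = re.compile(r"^\d+(?:\.\d+)?\s*(?:V|HZ)$", re.IGNORECASE)  # 300V, 60HZ
WIRE_SPEC_RE = re.compile(r"^\d+BC$", re.IGNORECASE)                     # 32BC, 511BC

def is_valid_mold_number(value: str) -> bool:
    """Mold number: letters and digits, longer than 2 chars, not a BJ/599 code."""
    v = value.strip()
    if len(v) <= 2:
        return False
    if v.upper().startswith(("BJ", "599")):
        return False
    return bool(re.search(r"[A-Za-z]", v)) and bool(re.search(r"\d", v))

def is_valid_product_name(value: str) -> bool:
    """Product name: longer than 3 chars, not a bare number, voltage/frequency, or wire spec."""
    v = value.strip()
    if len(v) <= 3 or v.isdigit():
        return False
    return not (ELECTRICAL_RE.match(v) or WIRE_SPEC_RE.match(v))
```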

🔧 Dependencies

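| Tool | Purpose | Install |
|---|---|---|
| python3 | Runs the scripts | Preinstalled on the system |
| pdftotext, pdftoppm | Text extraction; PDF → PNG | brew install poppler (provides both) |
| tesseract | OCR fallback (optional) | brew install tesseract tesseract-lang |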

No longer required: docling
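Before a first run, `shutil.which` gives a quick way to confirm the binaries are on PATH. A minimal sketch; the tool lists mirror the dependency table, and `missing_tools` / `dependency_report` are hypothetical helper names.

```python
import shutil

# Tools from the dependency table; pdftotext ships in the same poppler
# package as pdftoppm and backs the text-first extraction path.
REQUIRED_TOOLS = ("pdftotext", "pdftoppm")
OPTIONAL_TOOLS = ("tesseract",)

def missing_tools(names) -> list[str]:
    """Return the names that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]

def dependency_report() -> dict[str, list[str]]:
    """Missing required and optional tools, keyed by category."""
    return {
        "missing_required": missing_tools(REQUIRED_TOOLS),
        "missing_optional": missing_tools(OPTIONAL_TOOLS),
    }

if __name__ == "__main__":
    report = dependency_report()
    for name in report["missing_required"]:
        print(f"required tool not found: {name} (brew install poppler)")
    for name in report["missing_optional"]:
        print(f"optional tool not found: {name} (brew install tesseract tesseract-lang)")
    if not any(report.values()):
        print("all tools found")
```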


📋 Quick start

Basic extraction (hybrid mode, recommended)

cd /Users/wilson/.openclaw/workspace/skills/product-doc-reader
python3 scripts/extract_hybrid.py <drawing.pdf>
# Writes ./output/<drawing-name>.json and .md

Write to a specific directory

python3 scripts/extract_hybrid.py <drawing.pdf> -o /path/to/output -f both

Print JSON to stdout (for piping)

python3 scripts/extract_hybrid.py <drawing.pdf> --stdout -f json

Vision-only mode (fallback)

python3 scripts/extract_hybrid.py <drawing.pdf> --vision-only

Text-only mode (fast)

python3 scripts/extract_hybrid.py <drawing.pdf> --text-only

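Options

| Option | Description | Default |
|---|---|---|
| pdf_path | Path to the PDF file (required) | - |
| -o, --output-dir | Output directory | ./output |
| -f, --format | Output format: json / md / both | both |
| --vision-only | Use only the Vision API | off |
| --text-only | Use only pdftotext | off |
| --stdout | Write output to stdout | off |
| --dpi | OCR DPI | 300 |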
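For scripting around the CLI, a thin wrapper can build the command from the documented flags and parse the `--stdout -f json` output. A sketch, assuming it runs from the skill directory and that stdout carries only the JSON document; `build_command`, `extract`, and the `mode` values are hypothetical names.

```python
import json
import subprocess
from pathlib import Path
from typing import Any

# Assumed to be run from the skill directory, as in the quick-start commands.
SCRIPT = Path("scripts/extract_hybrid.py")

def build_command(pdf_path: str, *, mode: str = "hybrid", dpi: int = 300) -> list[str]:
    """Build an extract_hybrid.py call that prints JSON to stdout."""
    cmd = ["python3", str(SCRIPT), pdf_path, "--stdout", "-f", "json"]
    if mode == "text":
        cmd.append("--text-only")
    elif mode == "vision":
        cmd.append("--vision-only")
    elif mode != "hybrid":
        raise ValueError(f"unknown mode: {mode!r}")
    if dpi != 300:  # only pass --dpi when it differs from the documented default
        cmd += ["--dpi", str(dpi)]
    return cmd

def extract(pdf_path: str, **kwargs: Any) -> dict[str, Any]:
    """Run the script and parse stdout (assumes stdout holds only the JSON)."""
    result = subprocess.run(
        build_command(pdf_path, **kwargs), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

For example, `extract("599-028.pdf", mode="text")["model_no"]` would return the model number from a text-only run.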

📊 Output structure

JSON fields

{
  "product_name": "HDMI2CABLE4K6030F, HDMI2CABLE4K6010M",
  "model_no": "5001-131A",
  "drawing_no": "599-028",
  "packaging_spec": "BJ0599-0053, BJ0599-0055",
  "material_code": "599-028",
  "length_mm": "9144+50, 10000+50",
  "mold_info": "",
  "mold_number": "",
  "bom": [
    {
      "no": "①",
      "part_name": "CABLE",
      "spec": "HDMI2 4K60 30AWG...",
      "quantity": "M"
    }
  ],
  "tolerances": [
    { "range": "0.5-3.0", "hardware": "±0.05", "plastic": "±0.1" }
  ],
  "test_requirements": {
    "table": [],
    "other_tests": ["100% OPEN SHORT MISS WIRE TEST", "PASS 4K@60HZ"]
  },
  "pin_assignment": {
    "connectors": ["HDMI"],
    "description": ""
  },
  "dimensions": ["11.33±0.3", "20.8±0.3"],
  "notes": ["ROHS compliant"],
  "revision_history": [
    { "revision": "A0", "date": "2026/03/17", "description": "" }
  ],
  "company": "珠海福睿电子 FARREACH",
  "drawing_date": "2026/03/17",
  "drawn_by": "Kenny",
  "checked_by": "X.J.C",
  "approved_by": "Lin.",
  "products": [
    {
      "customer_item": "HDMI2CABLE4K6030F",
      "length_mm": "9144+50",
      "material_code": "599-028",
      "packaging_spec": "BJ0599-0053"
    }
  ],
  "source_file": "599-028.pdf",
  "extraction_method": "hybrid",
  "pages": 1,
  "confidence": 100.0,
  "warnings": []
}

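✅ Test results (58 drawings in the 599 series)

| Product type | Drawings | Mold number range | Success rate |
|---|---|---|---|
| HDMI adapters | 7 | AD4001-009 ~ MB-003 | 100% |
| HDMI cables | 20 | 5001-119A ~ OP-HD65 | 100% |
| DP-HDMI converters | 8 | AP-073A / AP-079A | 100% |
| USB-C to HDMI | 3 | OP-USB09 / TP-C089B | 100% |
| DP active optical cables (AOC) | 1 | OP-DP09 (7 variants) | 100% |
| KVM cables | 5 | DP-033A ~ 5001-127A | 100% |
| USB adapters | 3 | AD9017 / USB-3005A-014 | 100% |
| DP cables | 1 | DP-021B | 100% |

Total: 58/58 (100%)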


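⚠️ Known limitations and suggested improvements

1. Google Drive search pagination

Problem: gog drive search "599" returns only 17 results, although 58 files actually exist.

Workaround: search for each ID individually, 599-001 through 599-058.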
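The per-ID workaround can be driven from a list of expected drawing numbers, so only the IDs a paged search missed need to be looked up one by one. A sketch assuming files are named `<drawing-no>.pdf` (as `"source_file": "599-028.pdf"` in the output suggests); `expected_ids` and `missing_ids` are hypothetical helpers, and the printed lines simply reuse the `gog drive search` form shown above.

```python
from pathlib import PurePath

def expected_ids(prefix: str = "599", first: int = 1, last: int = 58) -> list[str]:
    """Drawing numbers 599-001 ... 599-058 (zero-padded to three digits)."""
    return [f"{prefix}-{n:03d}" for n in range(first, last + 1)]

def missing_ids(found_files: list[str], ids: list[str]) -> list[str]:
    """IDs with no file whose stem matches exactly (assumes '<id>.pdf' naming)."""
    found = {PurePath(name).stem for name in found_files}
    return [drawing_id for drawing_id in ids if drawing_id not in found]

if __name__ == "__main__":
    # Only the IDs absent from the paged results need an individual search.
    for drawing_id in missing_ids(["599-001.pdf", "599-028.pdf"], expected_ids(last=3)):
        print(f'gog drive search "{drawing_id}"')
```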

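2. Product-name extraction depends on an exclusion list

Problem: new BOM terms have to be added manually.

Resolution: 50+ exclusion terms have been added so far; new terms are added as they are found.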

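3. Batch processing cannot resume from a checkpoint

Problem: if a run over the 58 files fails partway through, it has to start over.

Suggestion: record the list of processed files so a run can resume where it stopped.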
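One way to implement the suggested resume support is a small state file that records each finished PDF. This is a sketch, not part of batch_599_full.py or batch_process_drive.py: `process_with_checkpoint` and the JSON state-file format are assumptions, and `handle` stands in for whatever per-file extraction the batch script performs.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable

def process_with_checkpoint(pdfs: Iterable[str], handle: Callable[[str], None], state_file: Path) -> list[str]:
    """Process each PDF once; names already recorded in state_file are skipped."""
    done: set[str] = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    newly_done = []
    for pdf in pdfs:
        if pdf in done:
            continue
        handle(pdf)  # may raise; everything finished so far is already saved
        done.add(pdf)
        newly_done.append(pdf)
        state_file.write_text(json.dumps(sorted(done)))  # persist after every file
    return newly_done

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "processed.json"
        files = ["599-001.pdf", "599-002.pdf", "599-003.pdf"]

        def flaky(name: str) -> None:
            if name == "599-002.pdf":
                raise RuntimeError("simulated failure")

        try:
            process_with_checkpoint(files, flaky, state)
        except RuntimeError:
            pass
        # Second run resumes: 599-001 is skipped, the rest are processed.
        print(process_with_checkpoint(files, lambda name: None, state))
        # → ['599-002.pdf', '599-003.pdf']
```

Writing the state after every file trades a little I/O for never losing more than the file that was in progress.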

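4. Adapting to special templates

Problem: customer drawings (such as the C331 template) place fields in different positions.

Resolution: cross-line association is supported (searching 5 lines above and below).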
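Cross-line association amounts to searching a window of lines around a label for a value of the expected shape. A sketch with hypothetical names: `LENGTH_RE` assumes lengths are written as `<mm>+<tolerance>` like `9144+50`, and nearer lines win, with the line above checked before the line below at equal distance.

```python
import re
from typing import Optional

# Assumption: a length is written as "<mm>+<tolerance>", e.g. "9144+50" or "10000 + 50".
LENGTH_RE = re.compile(r"(?<![\w-])\d{3,5}\s*\+\s*\d+\b")

def find_near(lines: list[str], label_idx: int, pattern: re.Pattern = LENGTH_RE, window: int = 5) -> Optional[str]:
    """Return the first match on the label's line or up to `window` lines above/below.

    Nearer lines win; at equal distance the line above is checked first.
    """
    for dist in range(window + 1):
        candidates = (label_idx,) if dist == 0 else (label_idx - dist, label_idx + dist)
        for idx in candidates:
            if 0 <= idx < len(lines):
                match = pattern.search(lines[idx])
                if match:
                    return match.group(0)
    return None
```

`label_idx` would come from wherever the field label or product name was found in the `pdftotext -layout` lines.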

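5. Validation still needs manual review

Problem: results with confidence below 80% have to be checked by hand.

Suggestion: flag results automatically and write low-confidence results to a separate output.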
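The suggested auto-flagging could look like the following sketch. It splits results on the `confidence` field from the JSON output and writes the low-confidence ones to a separate file. The 80 threshold comes from the limitation above; `split_by_confidence`, `write_review_queue`, and the `needs_review.json` filename are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

# Threshold from the known limitation: below 80% needs manual review.
REVIEW_THRESHOLD = 80.0

def split_by_confidence(results: list[dict[str, Any]], threshold: float = REVIEW_THRESHOLD):
    """Return (accepted, needs_review); a missing confidence counts as 0."""
    accepted, needs_review = [], []
    for result in results:
        bucket = accepted if result.get("confidence", 0.0) >= threshold else needs_review
        bucket.append(result)
    return accepted, needs_review

def write_review_queue(results: list[dict[str, Any]], out_dir: Path, threshold: float = REVIEW_THRESHOLD) -> Path:
    """Write a compact list of low-confidence results to out_dir/needs_review.json."""
    _, needs_review = split_by_confidence(results, threshold)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "needs_review.json"
    summary = [
        {"source_file": r.get("source_file"), "confidence": r.get("confidence"), "warnings": r.get("warnings", [])}
        for r in needs_review
    ]
    path.write_text(json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```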


📁 File layout

product-doc-reader/
├── SKILL.md                      ← this file
├── DEVELOPMENT_SUMMARY.md        ← development summary (pitfalls log)
├── scripts/
│   ├── extract_hybrid.py         ← v5.0.1 core extraction script
│   ├── batch_599_full.py         ← batch processing script
│   └── batch_process_drive.py    ← Drive batch processing
├── examples/
│   └── hybrid/                   ← sample test output
└── output/                       ← default output directory

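🔄 Version history

| Version | Date | Changes |
|---|---|---|
| v5.0.1 | 2026-03-24 | 599-028 fix: soft-hyphen cleanup / cross-line association / data validation |
| v5.0.0 | 2026-03-24 | pdftotext first + data validation rules |
| v4.0.0 | 2026-03-23 | pdftotext + Vision hybrid strategy |
| v3.0.0 | 2026-03-23 | Vision API only (Gemini Flash) |
| v2.0.0 | 2026-03-23 | Docling + region segmentation + Tesseract OCR |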

📞 Maintainers

Development: WILSON
Testing: Jaden
Last updated: 2026-03-24 15:00


Product Doc Reader v5.0.1 - production ready ✅
