Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

document-parser

Extract structured data from PDFs, images, and Word files with layout analysis, table recognition, OCR, seal detection, and directory extraction.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 410 · 5 current installs · 6 all-time installs
by token-ai @ankylala
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (high confidence)
Purpose & Capability
The code and documentation consistently implement a remote-document-parser client (PDF/image/Word parsing, OCR, table and seal detection). Using a remote API for heavy tasks like OCR/layout analysis is reasonable, so the capability aligns with the name/description. However, the packaged default base_url is an IP address (47.111.146.164) embedded in examples and defaults, which is unexpected for a generic skill and should be justified by the author.
Instruction Scope
Runtime instructions and the CLI cause the skill to read local files and POST their binary contents to a remote HTTP endpoint. The SKILL.md and config examples explicitly point to the same unknown IP. The skill will attempt uploads even without an API key (it logs a warning but proceeds), so users could inadvertently exfiltrate sensitive documents simply by running the default parse command.
Install Mechanism
This is instruction-only plus a Python script; there is no download-from-URL or post-install arbitrary code fetch. Dependencies are standard (requests, python-docx, Pillow) and listed in requirements.txt. No high-risk install behavior was found.
Credentials
The skill does not require environment variables, but supports optional DOCUMENT_PARSER_API_KEY and DOCUMENT_PARSER_BASE_URL. The problem is not the number of credentials requested, but that the default configuration, README, and config.example hardcode an explicit IP-based endpoint. Sensitive files are sent to that endpoint by default, and because the API key is optional, data can be uploaded unauthenticated. That is disproportionate for a drop-in skill where users may expect local processing or expect to configure their own server.
Persistence & Privilege
The package does not request always:true, does not modify other skills or system-wide settings, and only writes output files derived from user input to the current working directory. It does read a local config.json if present (expected). No elevated persistence or privilege escalation behavior observed.
What to consider before installing
This skill is functionally coherent but risky by default: if you run 'document-parser parse <file>' as-is, the file will be uploaded to the default server at 47.111.146.164 (the README and config example point to that IP). Before installing or using it, consider:

  1) Do not upload sensitive documents to an unknown third party.
  2) Prefer to set DOCUMENT_PARSER_BASE_URL to a trusted or self-hosted parser endpoint, or host the parsing service yourself.
  3) If you must use this skill, require and provide an API key, and confirm the operator and trustworthiness of the endpoint.
  4) Audit network traffic (or run in an isolated environment) to verify where files are sent.
  5) If you don't have a trusted remote parser, avoid using the skill, or inspect and modify index.py to implement local processing instead.
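The precautions above could be enforced with a small pre-upload check. The following is a minimal sketch, not code from the skill: the function names, the `trusted_hosts` allowlist, and the host `parser.example.internal` are all hypothetical and chosen only for illustration.

```python
import ipaddress
from typing import Optional, Set, Tuple
from urllib.parse import urlparse


def is_raw_ip_host(base_url: str) -> bool:
    """Return True when the URL's host is a literal IP address."""
    host = urlparse(base_url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def upload_allowed(
    base_url: str, api_key: Optional[str], trusted_hosts: Set[str]
) -> Tuple[bool, str]:
    """Decide whether a file may be sent to base_url (hypothetical policy).

    Refuses when no API key is configured or when the endpoint host is not
    in the caller-supplied allowlist.
    """
    if not api_key:
        return False, "no API key configured"
    host = urlparse(base_url).hostname or ""
    if host not in trusted_hosts:
        kind = "raw IP" if is_raw_ip_host(base_url) else "host"
        return False, f"{kind} {host!r} is not in the trusted list"
    return True, "ok"


if __name__ == "__main__":
    default_url = "http://47.111.146.164:8088/taidp/v1/idp/general_parse"
    # With the packaged default and no key, the check refuses the upload.
    print(upload_allowed(default_url, None, {"parser.example.internal"}))
```

A wrapper script could call `upload_allowed` before invoking the skill and abort when the first element of the result is `False`.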
clawhub.yaml:35: Install source points to URL shortener or raw IP.
config.example.json:2: Install source points to URL shortener or raw IP.
About static analysis
These patterns were detected by automated regex scanning. They may be normal for skills that integrate with external APIs. Check the VirusTotal and OpenClaw results above for context-aware analysis.
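To illustrate what a regex-based check of this kind might look like, here is a rough sketch that reports URLs whose host is a dotted IPv4 address. The pattern and the function name are assumptions made for this example; the scanner's actual rules are not shown on this page.

```python
import re
from typing import List, Tuple

# Heuristic: http(s) URL whose host looks like a dotted IPv4 address,
# optionally followed by a port. This is an assumed pattern, not the
# scanner's real rule.
RAW_IP_URL = re.compile(r"https?://(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?")


def find_raw_ip_urls(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, matched_prefix) for each raw-IP URL in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in RAW_IP_URL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = (
        "{\n"
        '  "api_key": "your_api_key",\n'
        '  "base_url": "http://47.111.146.164:8088/taidp/v1/idp/general_parse"\n'
        "}"
    )
    for lineno, url in find_raw_ip_urls(sample):
        print(f"line {lineno}: {url}")
```

Because the match is purely textual, it cannot tell a legitimate self-hosted endpoint from an unknown one, which is why such findings need the context-aware review mentioned above.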

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.1
latest: vk97as2w6vdeagsk2hq1m6x58t582jhdz


SKILL.md

document-parser

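A high-precision document parsing skill that extracts structured data from PDFs, images, and Word documents.

Purpose

  • Parse PDF, image (JPG/PNG), and Word documents
  • Layout analysis and structure extraction
  • Table recognition (HTML/Markdown output)
  • OCR text recognition
  • Seal detection
  • Table-of-contents extraction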

Commands

Parse a document

document-parser parse <file path> [options]

Examples:

document-parser parse C:\docs\report.pdf
document-parser parse C:\docs\scan.jpg --layout --table
document-parser parse C:\docs\contract.docx --output markdown

Query task status

document-parser status <task ID>

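Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| file path | Path to a PDF, image, or Word file | C:\docs\report.pdf |
| --layout | Enable layout analysis | --layout |
| --table | Enable table recognition | --table |
| --seal | Enable seal detection | --seal |
| --output | Output format (json/markdown/both) | --output markdown |
| --pages | Page range | --pages 1-5,8,10-12 |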
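A page-range value such as 1-5,8,10-12 can be expanded into individual page numbers as sketched below. This only illustrates the notation; whether the skill treats ranges as inclusive and 1-based is an assumption, and `parse_page_ranges` is a hypothetical helper, not part of the skill.

```python
from typing import List


def parse_page_ranges(spec: str) -> List[int]:
    """Expand a spec like '1-5,8,10-12' into a sorted list of page numbers.

    Assumes inclusive ranges separated by commas.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_ranges("1-5,8,10-12"))
    # [1, 2, 3, 4, 5, 8, 10, 11, 12]
```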

Configuration

Option 1: environment variables

DOCUMENT_PARSER_API_KEY=your_api_key
DOCUMENT_PARSER_BASE_URL=http://47.111.146.164:8088/taidp/v1/idp/general_parse

Option 2: configuration file

Create config.json in the skill directory:

{
  "api_key": "your_api_key",
  "base_url": "http://47.111.146.164:8088/taidp/v1/idp/general_parse"
}
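The two configuration options could be combined roughly as follows. This is a sketch under assumptions: the precedence (environment variables override config.json) is a guess rather than documented behaviour, and `load_settings` is a hypothetical helper, not the skill's own loader.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def load_settings(
    config_path: Path, env: Mapping[str, str] = os.environ
) -> Dict[str, Optional[str]]:
    """Resolve api_key and base_url from the environment, then config.json.

    Assumed precedence: environment variable first, file value second.
    Deliberately provides no built-in default endpoint.
    """
    file_values = {}
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as handle:
            file_values = json.load(handle)
    return {
        "api_key": env.get("DOCUMENT_PARSER_API_KEY") or file_values.get("api_key"),
        "base_url": env.get("DOCUMENT_PARSER_BASE_URL") or file_values.get("base_url"),
    }


if __name__ == "__main__":
    settings = load_settings(Path("config.json"))
    if not settings["base_url"]:
        raise SystemExit("No endpoint configured; refusing to guess one.")
    print(settings["base_url"])
```

Returning `None` when nothing is configured, instead of falling back to a hardcoded address, forces the caller to choose an endpoint explicitly.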

Output format

Returns structured JSON containing:

  • pages: array of parsed pages
  • elements: layout elements (text, tables, images, etc.)
  • markdown: text in Markdown format
  • data: summary statistics
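A consumer of this output might first check that the documented keys are present before using them. The sketch below assumes that pages and elements are JSON arrays and that markdown is a string; the source does not specify the nested structure, and both helper names are hypothetical.

```python
from typing import Any, Dict, List

# Top-level keys listed in the output description above.
DOCUMENTED_KEYS = ("pages", "elements", "markdown", "data")


def missing_keys(result: Dict[str, Any]) -> List[str]:
    """Return the documented top-level keys absent from a parse result."""
    return [key for key in DOCUMENTED_KEYS if key not in result]


def summarize(result: Dict[str, Any]) -> Dict[str, int]:
    """Count pages, elements, and Markdown characters (assumed types)."""
    return {
        "page_count": len(result.get("pages") or []),
        "element_count": len(result.get("elements") or []),
        "markdown_chars": len(result.get("markdown") or ""),
    }


if __name__ == "__main__":
    # Hand-written stand-in for a parse result, not real tool output.
    example = {"pages": [{}], "elements": [{}, {}], "markdown": "# Title"}
    print(missing_keys(example))
    print(summarize(example))
```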

Dependencies

  • requests
  • python-docx (Word support)
  • Pillow (image processing)

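Error codes

| Code | Message | Description |
| --- | --- | --- |
| 10000 | Success | Recognition succeeded |
| 10001 | Missing parameter | A required parameter is missing |
| 10002 | Invalid parameter | A parameter is invalid |
| 10003 | Invalid file | The file format is invalid |
| 10004 | Failed to recognize | Recognition failed |
| 10005 | Internal error | Internal error |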
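Client code could turn these codes into exceptions along the following lines. The code-to-message mapping is taken from the table above; `ParserError` and `raise_for_code` are hypothetical names, and where the code appears in the service response is not specified here.

```python
ERROR_CODES = {
    10000: "Success",
    10001: "Missing parameter",
    10002: "Invalid parameter",
    10003: "Invalid file",
    10004: "Failed to recognize",
    10005: "Internal error",
}
SUCCESS_CODE = 10000


class ParserError(Exception):
    """Raised for any non-success code (hypothetical helper type)."""

    def __init__(self, code: int) -> None:
        self.code = code
        message = ERROR_CODES.get(code, "Unknown error code")
        super().__init__(f"{code}: {message}")


def raise_for_code(code: int) -> None:
    """Return silently on success; raise ParserError for any other code."""
    if code != SUCCESS_CODE:
        raise ParserError(code)


if __name__ == "__main__":
    raise_for_code(10000)  # no exception
    try:
        raise_for_code(10003)
    except ParserError as exc:
        print(exc)  # 10003: Invalid file
```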

Files

8 total