Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Markitdown File Converter

v1.0.0

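Convert PDF, Word (docx/doc), Excel (xlsx/xls), PPT (pptx/ppt), images, and other files to Markdown or JSON in one step. Three built-in engines: pandoc (strongest for DOCX tables, emoji, and formulas), markitdown (Microsoft open source; Excel/PPT/image OCR), mammoth...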

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
Capability signals
Requires OAuth token
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Suspicious
OpenClaw: Suspicious (high confidence)
Purpose & Capability (flagged)
The skill is a document-to-Markdown/JSON converter and its files and CLI match that purpose. However, the code includes a 'PaddleOCR Cloud' integration with a default API URL and hard-coded access token so the skill will call an external cloud API by default. Requiring a remote OCR service is not inherently wrong, but the registry metadata declared no required env vars/credentials and the README implied cloud OCR is used only if configured — the code contradicts that by enabling the cloud path via non-empty defaults.
Instruction Scope (flagged)
SKILL.md describes local installs and optional 'PaddleOCR Cloud' usage 'if configured'. However, the runtime instructions and code will call PaddleOCR Cloud automatically because defaults are present. The skill reads images/files and POSTs them to an external HTTP endpoint (scripts/ocr/paddleocr.py -> httpx.post). SKILL.md does not clearly document that this behavior, which sends potentially sensitive document contents to a third-party endpoint, is enabled by default, so the actual instruction scope is wider than the documented one.
Install Mechanism
There is no platform install spec in registry metadata (instruction-driven). The scripts run pip installs at runtime (subprocess pip install) and pandoc download logic uses GitHub releases or winget/brew/apt — these are expected for this functionality. No obfuscated installers or unusual download hosts are present except the single hard-coded PaddleOCR API endpoint for runtime calls (not an installer).
Credentials (flagged)
Registry metadata lists no required environment variables or credentials, but the code reads PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN (in scripts/ocr/paddleocr.py). Worse, these have non-empty default values in the code, so the cloud OCR path is considered 'configured' even if the user sets nothing. A document conversion skill should not transmit document contents to a remote service without explicit, declared credentials or opt-in.
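The effect of non-empty defaults can be illustrated with a minimal sketch. This is not the skill's actual code: the two function names and the placeholder default values below are hypothetical, and only the environment variable names come from the scan. It assumes the configuration check boils down to "URL and token are both non-empty".

```python
# Placeholder values standing in for shipped defaults (hypothetical, not the real ones).
DEFAULT_URL = "https://example.invalid/layout-parsing"
DEFAULT_TOKEN = "placeholder-token"


def is_configured_with_defaults(env):
    """Pattern described by the scan: non-empty fallbacks count as configuration."""
    url = env.get("PADDLEOCR_DOC_PARSING_API_URL", DEFAULT_URL)
    token = env.get("PADDLEOCR_ACCESS_TOKEN", DEFAULT_TOKEN)
    return bool(url and token)


def is_configured_opt_in(env):
    """Opt-in pattern: no fallbacks, so the cloud path needs explicit values."""
    url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "")
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "")
    return bool(url.strip() and token.strip())


if __name__ == "__main__":
    nothing_set = {}
    print(is_configured_with_defaults(nothing_set))  # True: cloud path on by default
    print(is_configured_opt_in(nothing_set))         # False: cloud path stays off
```

With the opt-in variant, a user who sets nothing never triggers an outbound request, which matches what the registry metadata and SKILL.md imply.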
Persistence & Privilege
The skill does not request permanent inclusion (always=false), does not modify other skills, and does not persist credentials or change global agent configuration. It runs installs in the current environment, which is expected for a utility script.
Scan Findings in Context
[hardcoded-api-url] unexpected: scripts/ocr/paddleocr.py defines PADDLEOCR_DOC_PARSING_API_URL defaulting to 'https://c474r929pea0qa6c.aistudio-app.com/layout-parsing'. A default remote endpoint for OCR is present in code, which will be used unless environment variables override it.
[hardcoded-access-token] unexpected: scripts/ocr/paddleocr.py includes a hard-coded PADDLEOCR_ACCESS_TOKEN value. The code treats the pair (URL + token) as 'configured' by default, causing automatic calls to a third-party API and potential exfiltration of input files/images.
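Findings like the two above can be approximated with a small local check. The sketch below is a rough heuristic, not the scanner ClawHub uses: the function names, the regular expressions, and the file suffix list are all assumptions chosen for illustration. It reports literal URLs and non-empty string defaults passed to `os.getenv` or `os.environ.get`.

```python
import re
from pathlib import Path

# Literal http(s) URLs, stopping at quotes, whitespace, or closing brackets.
URL_RE = re.compile(r"""https?://[^\s'"<>)]+""")
# getenv("NAME", "non-empty default") or environ.get("NAME", "non-empty default").
ENV_DEFAULT_RE = re.compile(
    r"""(?:getenv|environ\.get)\(\s*['"](\w+)['"]\s*,\s*['"]([^'"]+)"""
)


def scan_text(text):
    """Return (line_number, kind, detail) tuples for one file's contents."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in URL_RE.finditer(line):
            findings.append((lineno, "url", match.group(0)))
        for match in ENV_DEFAULT_RE.finditer(line):
            # Report only the variable name, never the default value itself.
            findings.append((lineno, "env-default", match.group(1)))
    return findings


def scan_tree(root, suffixes=(".py", ".md", ".json", ".yaml", ".yml", ".toml")):
    """Scan every matching file under root; return {path: findings}."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
            if hits:
                results[str(path)] = hits
    return results
```

Calling `scan_tree(".")` from the unpacked skill directory would list candidate lines for manual review. A hit is only a prompt to read the code; URLs in documentation or installer logic are expected and not necessarily a problem.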
What to consider before installing
This skill mostly does what it claims (local conversion + OCR), but by default it will send images/documents to a remote 'PaddleOCR Cloud' endpoint because the code ships with a non-empty default API URL and token. Before installing or running:
1) Do not run it on sensitive documents without first verifying or disabling the cloud OCR: set environment variables PADDLEOCR_DOC_PARSING_API_URL="" and/or PADDLEOCR_ACCESS_TOKEN="" to disable the cloud path, or remove/patch scripts/ocr/paddleocr.py so is_configured() returns False unless you explicitly configure it.
2) If you must use cloud OCR, replace the default endpoint/token with a known, trusted service and a token you control, and inspect network traffic to confirm the destination.
3) Consider running conversions with the pix2tex / RapidOCR local engines only (they are present), or audit/modify the code so it never calls external endpoints automatically.
4) If unsure, run the skill in an isolated environment (offline or sandboxed) and review/grep the repository for other hard-coded endpoints or secrets before use.
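The environment-variable approach to disabling cloud OCR can be sketched as a helper that builds a child-process environment with both variables set to empty strings. The helper name is hypothetical, and the commented-out command line is a placeholder because the skill's real entry point is not shown here.

```python
import os

# Variable names taken from the scan report.
CLOUD_OCR_VARS = ("PADDLEOCR_DOC_PARSING_API_URL", "PADDLEOCR_ACCESS_TOKEN")


def env_without_cloud_ocr(base=None):
    """Copy an environment mapping and blank the cloud OCR variables."""
    env = dict(os.environ if base is None else base)
    for name in CLOUD_OCR_VARS:
        env[name] = ""  # set but empty, as the review suggests
    return env


if __name__ == "__main__":
    env = env_without_cloud_ocr()
    print({name: env[name] for name in CLOUD_OCR_VARS})
    # Hypothetical invocation; substitute the skill's actual script and arguments:
    # import subprocess
    # subprocess.run(["python", "path/to/converter.py", "input.pdf"], env=env, check=True)
```

This only helps if the code uses the variable's value whenever it is set, as `os.environ.get(name, default)` does. If the code instead uses a form like `os.getenv(name) or default`, an empty string falls back to the hard-coded default, so it is worth confirming in scripts/ocr/paddleocr.py or running offline.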

Like a lobster shell, security has layers — review code before you run it.


