universal-pdf-vision-parser
Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 0 · 249 · 5 current installs · 6 all-time installs
byM Z@MingEnsiie
MIT-0
Security Scan
OpenClaw
Suspicious
high confidencePurpose & Capability
The skill's name, description, SKILL.md, and code all align: converting PDF pages to images and sending them to Qwen‑VL‑Max for transcription. However, the registry metadata claims no required env vars or credentials while SKILL.md and the script require a DashScope API key (either via --api-key or DASHSCOPE_API_KEY). This metadata omission is an incoherence worth flagging.
Instruction Scope
The runtime instructions and the script remain within the stated purpose: render PDF pages to PNG, base64-encode them, send them plus a transcription prompt to a multimodal API, and write Markdown. The agent is not instructed to read unrelated files or system state.
Install Mechanism
There is no formal install spec in the registry (instruction-only), but SKILL.md tells the user to pip install pymupdf and dashscope. That is typical for a Python-based, instruction-only skill, but the lack of declared dependencies in the registry is another metadata inconsistency.
Credentials
The code expects an API key (DASHSCOPE_API_KEY or CLI --api-key) to call an external service; this is proportionate to the function. The concern is that the registry lists no required credentials. Also note that the skill transmits full-page base64 images to a third-party API — that is necessary for the stated purpose but has privacy/breach implications for sensitive documents.
Persistence & Privilege
The skill does not request always:true, does not modify other skills or system-wide settings, and does not persist credentials beyond setting dashscope.api_key at runtime. No elevated or permanent privileges are requested.
What to consider before installing
This skill appears to do what it says (convert PDF pages to images and send them to Qwen‑VL‑Max for transcription), but there are two issues to consider before installing:
- Metadata mismatch: The registry claims no required credentials, but the SKILL.md and script require a DashScope API key (DASHSCOPE_API_KEY or --api-key) and Python packages. Confirm the registry/provider and why credentials/dependencies were omitted.
- Data exposure: The skill uploads full page images (base64 PNGs) to an external service. Do not run it on sensitive or confidential PDFs unless you trust the DashScope endpoint and have reviewed its privacy/billing/retention policies. Consider using local OCR alternatives for sensitive data.
Recommended actions:
- Verify the skill's source and author (no homepage and unknown source are risk indicators).
- Confirm API key scope and permissions (least-privilege) and monitor billing/usage for unexpected activity.
- Test with non-sensitive documents first and inspect network activity if possible.
- If you need stronger assurance, ask the publisher to update registry metadata to declare required env vars and dependencies, and provide a canonical homepage or repo.Like a lobster shell, security has layers — review code before you run it.
Current versionv1.0.0
Download ziplatest
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Universal PDF Vision Parser Skill
Version: 0.1
This skill is a high-end multilingual document digitizer. It uses multimodal vision to 'look' at each PDF page, making it perfect for language learning notes, bilingual documents, and complex layouts that standard OCR fails to capture.
Prerequisites
- DashScope API Key: A valid key from Alibaba Cloud Bailian with
qwen-vl-maxaccess. - Environment:
pip install pymupdf dashscope
Usage
Basic Command
python scripts/vision_parse.py --pdf <path_to_pdf> --out <path_to_output.md> --api-key <YOUR_API_KEY> --max-pages 2
--max-pages: (Optional) Max pages to process. Defaults to2. Set to-1for all pages.
Agentic Workflow
- Visual Scanning: Converts PDF pages to 300 DPI PNGs.
- Expert Transcription: Qwen-VL-Max identifies the language and transcribes terms, translations, and explanations.
- Markdown Structuring: Automatically formats content with bold keywords, italicized meanings, and clean tables.
Examples
User: "Convert this German-Chinese note to markdown: notes.pdf"
Agent Action:
python scripts/vision_parse.py --pdf notes.pdf --out notes.md
Files
2 totalSelect a file
Select a file to preview.
Comments
Loading comments…
