PDF OCR Using Gemini LLM
v0.1.7
Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.
⭐ 0 · 252 · 1 current · 1 all-time
by Issam El Alaoui @ashtonizmev
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description, required env (GOOGLE_API_KEY), listed Python packages (google-genai, pymupdf), CLI entry point, and code all align with a PDF OCR tool that uploads pages to Google's Gemini API.
Instruction Scope
The SKILL.md and code explicitly split PDFs into single-page files and upload full page files to Google's API for OCR. This behaviour is documented in the README and implemented in gemini_client.py (files.upload + models.generate_content). There are no apparent instructions or code that read unrelated files, other env vars, or send data to unknown endpoints, but note that entire page images are transmitted to Google (privacy/cost implication).
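The split-then-upload flow described above can be sketched roughly as follows. This is an illustrative sketch, not the skill's actual gemini_client.py: the function names, the page file naming scheme, the prompt text, and the model name "gemini-2.0-flash" are all assumptions. Only the use of pymupdf for splitting and of files.upload plus models.generate_content for OCR comes from the scan report. The third-party imports are done lazily so the helpers can be read and loaded without the packages installed.

```python
import os
import tempfile
from pathlib import Path


def page_path(workdir: Path, index: int) -> Path:
    """Hypothetical naming scheme for single-page files (1-based, zero-padded)."""
    return workdir / f"page_{index + 1:04d}.pdf"


def split_pdf(pdf_path: Path, workdir: Path) -> list[Path]:
    """Write each page of pdf_path to its own one-page PDF inside workdir."""
    import pymupdf  # lazy import; older PyMuPDF releases expose this as `fitz`

    pages = []
    with pymupdf.open(str(pdf_path)) as src:
        for i in range(src.page_count):
            single = pymupdf.open()  # new, empty PDF document
            single.insert_pdf(src, from_page=i, to_page=i)
            out = page_path(workdir, i)
            single.save(str(out))
            single.close()
            pages.append(out)
    return pages


def ocr_page(client, page_file: Path, model: str) -> str:
    """Upload one page file and ask the model for its text."""
    uploaded = client.files.upload(file=str(page_file))
    response = client.models.generate_content(
        model=model,
        contents=[uploaded, "Extract all text from this page."],  # assumed prompt
    )
    return response.text or ""


def ocr_pdf(pdf_path: Path, model: str = "gemini-2.0-flash") -> list[str]:
    """Return one text string per page; the model name is an assumption."""
    from google import genai  # lazy import of the google-genai package

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    # The temporary directory and the page files in it are removed on exit.
    with tempfile.TemporaryDirectory() as tmp:
        pages = split_pdf(pdf_path, Path(tmp))
        return [ocr_page(client, p, model) for p in pages]
```

Under these assumptions, calling ocr_pdf(Path("scan.pdf")) would make one upload and one generate_content request per page, which is why page count drives both the amount of data sent to Google and the API usage.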
Install Mechanism
Dependencies are standard Python packages (google-genai, pymupdf, pydantic, pydantic-settings) and a requirements.txt is included. No downloads from custom URLs or extracts from arbitrary hosts are present.
Credentials
Only GOOGLE_API_KEY is required and declared as the primary credential. That single key is appropriate and required for the Google Gemini client used by the skill. No unrelated secrets or config paths are requested.
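A small sketch of how a caller might read the single required credential and keep it out of logs. It uses only the standard library; the skill itself lists pydantic-settings, so its real configuration code likely differs. The function names load_api_key and mask_secret are hypothetical.

```python
import os


def load_api_key(env=None) -> str:
    """Return GOOGLE_API_KEY from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return key


def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, for safe logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging mask_secret(load_api_key()) rather than the raw value confirms which key is in use without exposing it.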
Persistence & Privilege
The skill is not always-enabled, does not modify other skills, and only writes temporary files under the system temp directory (cleans up after processing). It does not request elevated system persistence.
Assessment
This skill appears to be what it says: it splits PDFs into single-page files and uploads them to Google Gemini for OCR, and it requires only GOOGLE_API_KEY. Before installing, consider: (1) privacy: full page images are sent to Google, so do not use with highly sensitive documents unless acceptable; (2) cost and quotas: large PDFs mean many uploads and API usage billed against your API key; (3) secure the GOOGLE_API_KEY (don't paste it into logs or share it); (4) review and pin package versions if you want reproducible installs; (5) test on non-sensitive sample PDFs first to confirm behavior. If you need guarantees about retention or want OCR to run locally, consider a local OCR solution instead.
Like a lobster shell, security has layers: review code before you run it.
latest vk9779zcq1re620xq1fwem804zd82fe0s
Runtime requirements
Env: GOOGLE_API_KEY
Primary env: GOOGLE_API_KEY
Install
Python deps
uv tool install google-genai
uv tool install pymupdf
uv tool install pydantic
uv tool install pydantic-settings
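After installing, one way to confirm that the listed packages are visible to the interpreter that will run the skill is a quick import check. This is a sketch using only the standard library; the helper name missing_modules is hypothetical, and the import names (google.genai, pymupdf, pydantic, pydantic_settings) are the usual module names for these distributions.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is absent
            # or a module has no usable spec.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names corresponding to the packages listed above.
REQUIRED = ["google.genai", "pymupdf", "pydantic", "pydantic_settings"]
```

Calling missing_modules(REQUIRED) returns an empty list when everything is importable, and otherwise names the modules that still need to be installed in that environment.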