claw-text-and-pics

v1.0.1

Extract text and embedded images from scanned documents, PDFs, and photos via Mistral OCR API. Use when reading receipts, invoices, contracts, handwritten no...

by @photon78

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for photon78/claw-text-and-pics.

Install the skill "claw-text-and-pics" (photon78/claw-text-and-pics) from ClawHub.
Skill page: https://clawhub.ai/photon78/claw-text-and-pics
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install claw-text-and-pics

ClawHub CLI

npx clawhub@latest install claw-text-and-pics

Security Scan

Capability signals

Requires sensitive credentials

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

Purpose & Capability

The code and SKILL.md implement a Mistral OCR client that needs a MISTRAL_API_KEY and optionally TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID for sending images. However, the registry metadata claims "Required env vars: none" and "Primary credential: none", which is incorrect. Requiring MISTRAL_API_KEY is proportionate to the stated purpose, but the registry listing's failure to declare it is an inconsistency that could mislead users.

ℹ

Instruction Scope

SKILL.md instructs the agent to read ~/.openclaw/.env as a fallback for credentials, but the included ocr.py only reads environment variables via os.environ and does not implement loading that file. Aside from that mismatch, the runtime behavior described (send document to Mistral, print Markdown, optionally crop images locally with Pillow, optionally send images to Telegram) matches the code. The skill transmits document data to api.mistral.ai (expected) and to api.telegram.org only when --send is used (also expected).

✓

Install Mechanism

No install spec / external downloads are present; the skill is instruction+Python code only. Optional dependency is Pillow (pip). Nothing is downloaded from arbitrary URLs and no installers create unexpected binaries, so install risk is low.

ℹ

Credentials

The code requires MISTRAL_API_KEY (sensitive) and optionally TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID for sending images. Those credentials are proportional to the functionality. The concern is that the registry metadata does not declare the required env var, so users may not realize they must provide a sensitive API key. The SKILL.md does document the env vars correctly, and the code enforces MISTRAL_API_KEY at runtime.

✓

Persistence & Privilege

The skill does not request permanent/global presence (always:false) and it does not modify other skills or system-wide settings. Autonomous invocation is allowed by default but is not combined with other high-privilege requests, so no additional persistence concerns are present.

What to consider before installing

This skill appears to do what it says (send image/PDF content to Mistral OCR and optionally post cropped images to Telegram), but note these points before installing:

- The registry metadata omitted required credentials, but the skill actually requires MISTRAL_API_KEY (and TELEGRAM_BOT_TOKEN only if you use --send). Provide the Mistral key via environment variables; otherwise the script exits.
- SKILL.md says it reads ~/.openclaw/.env as a fallback, but the included Python script does not load that file; it reads only environment variables. If you rely on a .env file, ensure your environment loader populates os.environ or modify the script.
- Using this skill sends document data to Mistral's API. Do not run it on highly sensitive documents unless you trust the Mistral service and your API key policy. Consider processing sensitive files in an isolated environment or checking your Mistral account data-retention policy.
- If you use --send, the skill will upload images to Telegram using the provided bot token and chat ID. Ensure your TELEGRAM_BOT_TOKEN is limited to the bot you expect and keep it secret.
- The repository imports subprocess but does not use it; no arbitrary shell execution is performed by the script. Still, review network endpoints (api.mistral.ai and api.telegram.org) and confirm you are comfortable with external network calls.

If you want to proceed: set MISTRAL_API_KEY in the agent environment, audit that environment for other secrets, and run the script in an environment where accidental exfiltration risk is controlled. If you need stronger assurance, request the publisher correct the registry metadata and/or add explicit code to load ~/.openclaw/.env (or remove the misleading note).

Like a lobster shell, security has layers — review code before you run it.

documents images mistral ocr pdf pictures extraction latest

82 downloads

0 stars

2 versions

Updated 3d ago

v1.0.1

MIT-0

claw-text-and-pics

Extract text and images from documents via Mistral OCR

Give your OpenClaw agent the ability to read scanned documents, PDFs, and images — extracting clean Markdown text and cropping out embedded images. Powered by Mistral's OCR API.

When to use

Extract text from scanned documents, invoices, receipts, contracts
Pull embedded images from PDFs or scans
Convert handwritten notes or photos to searchable text
Send extracted images directly to Telegram

Usage

# Extract text only
python3 ocr.py --input scan.jpg

# Extract text from PDF (3 pages)
python3 ocr.py --input document.pdf --pages 3

# Extract embedded images
python3 ocr.py --input scan.jpg --extract-images --output-dir ./images/

# Extract images and send to Telegram
python3 ocr.py --input scan.jpg --extract-images --send --target 123456789

# Works with URLs too
python3 ocr.py --input https://example.com/document.pdf

Output

stdout: Extracted text as Markdown
Files: Cropped images saved to --output-dir (only with --extract-images)
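
Because the extracted text arrives on stdout, a calling script can capture it directly. The following is a minimal sketch of one way to do that, assuming ocr.py sits in the working directory. The helper names build_ocr_command and run_ocr are hypothetical and not part of the skill; only the flags documented on this page are used.

```python
import subprocess
import sys
from typing import List, Optional


def build_ocr_command(input_path: str,
                      extract_images: bool = False,
                      output_dir: Optional[str] = None,
                      pages: Optional[int] = None,
                      script: str = "ocr.py") -> List[str]:
    """Assemble an argv list from the documented flags (hypothetical helper)."""
    cmd = [sys.executable, script, "--input", input_path]
    if extract_images:
        cmd.append("--extract-images")
        if output_dir is not None:
            cmd += ["--output-dir", output_dir]
    if pages is not None:
        cmd += ["--pages", str(pages)]
    return cmd


def run_ocr(input_path: str, **options) -> str:
    """Run the script and return whatever Markdown it prints to stdout."""
    result = subprocess.run(build_ocr_command(input_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

With check=True, a non-zero exit (for example, a missing MISTRAL_API_KEY) raises subprocess.CalledProcessError instead of silently returning empty text.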

Configuration

Set in ~/.openclaw/.env or as environment variables:

Variable	Required	Description
`MISTRAL_API_KEY`	Yes	Your Mistral API key
`TELEGRAM_BOT_TOKEN`	Only for `--send`	Your Telegram bot token
`TELEGRAM_CHAT_ID`	Optional	Default chat ID (overridable with `--target`)
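
The table above can be turned into a quick preflight check before the script runs. This is a sketch, not the logic in ocr.py: the function name missing_credentials is hypothetical, and treating TELEGRAM_CHAT_ID as needed when --send is used without --target is an assumption drawn from the table's wording.

```python
from typing import List, Mapping


def missing_credentials(env: Mapping[str, str],
                        send: bool = False,
                        target: str = "") -> List[str]:
    """Return names of variables that are needed but unset or empty."""
    required = ["MISTRAL_API_KEY"]          # always required
    if send:
        required.append("TELEGRAM_BOT_TOKEN")  # only for --send
        if not target:
            # Assumption: without --target, a default chat ID must be set.
            required.append("TELEGRAM_CHAT_ID")
    return [name for name in required if not env.get(name)]
```

Calling missing_credentials(os.environ, send=True) before invoking the script gives a list that can be reported to the user in one message rather than failing one variable at a time.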

Environment Variables

MISTRAL_API_KEY=required        # Mistral API key — get one at console.mistral.ai
TELEGRAM_BOT_TOKEN=optional     # Required only when using --send
TELEGRAM_CHAT_ID=optional       # Default target chat ID (overridable with --target)

This skill reads ~/.openclaw/.env as a fallback for credentials. Ensure the file has restricted permissions: chmod 600 ~/.openclaw/.env
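
The security scan on this page reports that the bundled ocr.py reads only os.environ, so a fallback like the one described here may need to be supplied by the user. The sketch below shows one possible implementation under simple assumptions: plain KEY=VALUE lines, no export prefixes, no inline comments. The names parse_env_text and load_env_fallback are hypothetical.

```python
import os
import stat
import sys
from pathlib import Path
from typing import Dict, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_fallback(path: Optional[Path] = None) -> None:
    """Fill os.environ from the file without overriding variables already set."""
    if path is None:
        path = Path.home() / ".openclaw" / ".env"
    if not path.is_file():
        return
    # Warn if group or others have any access (i.e. looser than chmod 600).
    if stat.S_IMODE(path.stat().st_mode) & 0o077:
        print(f"warning: {path} is readable by other users; run chmod 600",
              file=sys.stderr)
    for key, value in parse_env_text(path.read_text()).items():
        os.environ.setdefault(key, value)
```

Using setdefault keeps real environment variables authoritative, which matches the "fallback" wording: the file only fills in what the environment does not already provide.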

Requirements

Python 3.11+
Mistral API key (console.mistral.ai)
Optional (only for --extract-images): pip install pillow

Parameters

Parameter	Required	Description
`--input`	Yes	Local path or URL to image/PDF
`--extract-images`	No	Crop and save embedded images
`--output-dir`	No	Output directory (default: `./extracted-images`)
`--send`	No	Send extracted images via Telegram
`--target`	No	Telegram chat ID (or `TELEGRAM_CHAT_ID` env var)
`--pages`	No	Number of PDF pages to process
`--debug`	No	Print raw API response
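
For readers wrapping or reimplementing the script, the parameter table maps naturally onto argparse. The following is a sketch reconstructed from the table alone; it is not the parser in ocr.py, and the argument types (for example, --pages as an integer) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented parameters (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="ocr.py",
        description="Extract text and images from documents via Mistral OCR")
    parser.add_argument("--input", required=True,
                        help="Local path or URL to image/PDF")
    parser.add_argument("--extract-images", action="store_true",
                        help="Crop and save embedded images")
    parser.add_argument("--output-dir", default="./extracted-images",
                        help="Output directory")
    parser.add_argument("--send", action="store_true",
                        help="Send extracted images via Telegram")
    parser.add_argument("--target", default=None,
                        help="Telegram chat ID (or TELEGRAM_CHAT_ID env var)")
    parser.add_argument("--pages", type=int, default=None,
                        help="Number of PDF pages to process")
    parser.add_argument("--debug", action="store_true",
                        help="Print raw API response")
    return parser
```

For example, build_parser().parse_args(["--input", "scan.jpg", "--extract-images"]) yields a namespace whose output_dir falls back to the documented default.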
