Doc Process

v4.1.1

Document intelligence: categorize, autofill forms, analyze contracts, scan receipts/invoices, analyze bank statements, parse resumes/CVs, scan IDs/passports...

Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (high confidence)
Purpose & Capability
Name, description, reference docs, and included scripts line up with the advertised document intelligence features (OCR, redaction, scanning, table extraction, statement parsing, timeline). However, the SKILL.md instructs automatic installation of system binaries and Python packages for optional features (tesseract, ffmpeg, whisper models), which is not reflected in the registry metadata (no install spec, no declared binaries). The implicit requirement to install system packages and download models is disproportionate to what the metadata advertises and should be made explicit.
Instruction Scope
The runtime instructions are detailed, scoped to document tasks, and include consent gates for classification and timeline logging. But they also direct the agent to run the provided setup.sh 'automatically — no prompting needed' before running scripts. That automatic-install instruction expands the agent's runtime actions beyond mere file processing (system/package installs and network downloads) and conflicts with the otherwise careful consent language in the SKILL.md.
Install Mechanism
There is no registry install spec, but the bundle contains setup.sh, which the SKILL.md tells the agent to execute. setup.sh pip-installs many packages and attempts to install system binaries via brew/apt, and the skill will also cause openai-whisper to download a ~140 MB model at runtime. Running an included install script from an unknown/unsigned skill is higher-risk because it executes arbitrary code on the host and performs system-level installs and network downloads.
Credentials
The skill requests no environment variables or credentials in metadata (good). It does write to local paths (cache for whisper, ~/.doc-process-timeline.json, optionally ~/Documents) and may create files in the user's home. Those writes are reasonable for a document tool but should be consented to — the SKILL.md documents an opt-in model for timeline logging, though the auto-setup behavior introduces unexpected side effects (downloads, package installs) that were not declared as required environment/privileges.
Persistence & Privilege
The skill is not marked always:true and uses an opt-in timeline stored in ~/.doc-process-timeline.json, which the docs say will only be written after explicit consent. However, the instruction to run setup.sh automatically on first use may persist packages, binaries, and downloaded models across sessions without an explicit user prompt, giving the skill a persistent footprint on the system that the metadata does not call out.
What to consider before installing
What to check before installing or running this skill:

  • Review the included setup.sh file line-by-line before running it. It will pip-install packages and attempt to install system binaries via apt/brew; those commands run on your machine and may require elevated privileges.
  • If you don't want system-level changes, refuse automatic setup and instead run the scripts in a controlled environment (container, VM, or sandbox), performing only the manual installs you approve.
  • Expect the openai-whisper dependency to download a large model (~140 MB) on first use; that implies network activity and disk usage (~/.cache/whisper).
  • Confirm that timeline logging is actually opt-in in practice: when enabling it, verify the skill writes only the stated JSON fields and that summaries contain no PII.
  • For sensitive documents (IDs, bank statements, medical records), prefer running the Python scripts locally under your control rather than allowing the agent to auto-install and run them.
  • If you intend to let the agent run the skill autonomously, be aware that combining autonomous invocation with an auto-install step increases the potential blast radius; consider disabling autonomous installs or requiring explicit user confirmation for setup actions.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97fx3va2dshfxkp7n5tj54d2h82ehdf
510 downloads
1 star
16 versions
Updated 1mo ago
v4.1.1
MIT-0

Doc-Process — Document Intelligence Skill

Step 0 — Auto-Setup (run once on first use)

Before invoking any script for the first time in a session, check whether the script dependencies are available. If any are missing, run the setup script automatically — no prompting needed:

bash skills/doc-process/setup.sh

This installs all Python packages (pymupdf, Pillow, pytesseract, opencv-python-headless, numpy, img2pdf, pdfplumber, openai-whisper) and attempts to install system binaries (tesseract, ffmpeg) via brew or apt depending on the platform.

When to run Step 0:

  • First time any script-assisted mode is used in a session
  • After a fresh clawhub install piyush-zinc/doc-process
  • If a script fails with ModuleNotFoundError or ImportError

To install Python packages only (no system packages):

bash skills/doc-process/setup.sh --light

Or install directly from the skill's requirements file:

pip install -r skills/doc-process/requirements.txt

Note: openai-whisper downloads its model (~140 MB) on first audio transcription — not at install time.
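The availability check that Step 0 asks for can be done without running any installer. Below is a minimal stdlib-only sketch; the import names and binary names are taken from the lists above, and the helper itself is hypothetical, not part of the skill:

```python
import importlib.util
import shutil

# Import names (not pip names) for the packages the scripts use.
MODULES = ["fitz", "PIL", "pytesseract", "cv2", "numpy", "img2pdf", "pdfplumber", "whisper"]
# System binaries some scripts shell out to.
BINARIES = ["tesseract", "ffmpeg"]

def missing_dependencies():
    """Return (missing_modules, missing_binaries) without importing or installing anything."""
    missing_modules = [m for m in MODULES if importlib.util.find_spec(m) is None]
    missing_binaries = [b for b in BINARIES if shutil.which(b) is None]
    return missing_modules, missing_binaries

mods, bins = missing_dependencies()
if mods or bins:
    print("Missing:", mods, bins)  # report instead of auto-installing
```

A check like this lets the agent report what is missing and ask for consent, rather than running setup.sh unprompted.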


Overview

This skill handles all document-related tasks using Claude's native vision/language capabilities for reading and analysis, and Python scripts for file-output operations. Most modes require no installation — only the file-output scripts need third-party libraries.


How Features Are Implemented

| Feature | Implementation | External libraries |
| --- | --- | --- |
| OCR / reading images | Claude built-in vision | None |
| MRZ decoding (passport/ID) | Claude reads MRZ visually, applies ICAO algorithm | None |
| PDF reading | Claude reads PDF text layer or visually | None |
| Form autofill | Claude reads form fields, outputs fill table | None |
| Contract analysis | Claude applies reference rule set | None |
| Receipt / invoice scanning | Claude reads image or PDF | None |
| Bank statement (PDF) | Claude reads PDF pages | None |
| Bank statement (CSV) | statement_parser.py — pure stdlib | None |
| Expense logging | expense_logger.py — pure stdlib | None |
| Bank report generation | report_generator.py — pure stdlib | None |
| Resume / CV parsing | Claude reads document | None |
| Medical summarizer | Claude reads document | None |
| Legal redaction (display) | Claude marks up output | None |
| Legal redaction (file output) | redactor.py | pymupdf (PDF); Pillow + pytesseract (image) |
| Meeting minutes (text/PDF) | Claude reads document | None |
| Translation | Claude's multilingual capabilities | None |
| Document categorizer | Claude reads first 1–2 pages (with consent gate) | None |
| Timeline logging | timeline_manager.py — pure stdlib | None |
| Table extraction (PDF) | table_extractor.py | pdfplumber |
| Audio transcription | audio_transcriber.py | openai-whisper + ffmpeg |
| Doc scan / perspective correction | doc_scanner.py | opencv-python-headless, numpy, Pillow; img2pdf optional |

Dependencies & Installation

No installation required for core functionality

Reading, analysis, form filling, contract review, receipt scanning, bank statement analysis (PDF), resume parsing, ID scanning, medical summarising, redaction markup, meeting minutes, and translation all run on Claude's built-in capabilities.

Optional — install only for file-output scripts

# PII redaction to PDF/image files  (redactor.py)
pip install pymupdf>=1.23          # required for PDF redaction
pip install Pillow>=10.0           # required for image redaction
pip install pytesseract>=0.3       # required for image redaction (also: brew install tesseract)

# Document scanning / perspective correction  (doc_scanner.py)
pip install opencv-python-headless>=4.9 numpy>=1.24 Pillow>=10.0
pip install img2pdf>=0.5           # optional — for PDF output; Pillow fallback used if absent

# Table extraction from PDFs  (table_extractor.py)
pip install pdfplumber>=0.11

# Audio transcription  (audio_transcriber.py)
# Also requires ffmpeg binary: brew install ffmpeg  /  apt install ffmpeg
pip install openai-whisper>=20231117

All dependencies are also listed in requirements.txt at the repository root.

Binary dependencies

| Binary | Required by | Install |
| --- | --- | --- |
| tesseract | redactor.py (image mode) | brew install tesseract / apt install tesseract-ocr |
| ffmpeg | audio_transcriber.py | brew install ffmpeg / apt install ffmpeg |

Network access

openai-whisper downloads model files (~140 MB) from OpenAI/HuggingFace servers on first run only. Cached at ~/.cache/whisper/. All other scripts are fully local after installation.


Script Reference

| Script | Dependencies | Purpose | Example |
| --- | --- | --- | --- |
| redactor.py | pymupdf; Pillow + pytesseract (image mode) | PII redaction to file (PDF/image/text) | python scripts/redactor.py --file doc.pdf --mode full --log |
| doc_scanner.py | opencv-python-headless, numpy, Pillow; img2pdf optional | Document scanning: edge detection, perspective correction, scan-quality output | python scripts/doc_scanner.py --input photo.jpg --output scanned.png --mode bw |
| expense_logger.py | None | Add/list/edit/delete expense entries in CSV | python scripts/expense_logger.py add --date 2024-03-15 --merchant "Starbucks" --amount 13.12 --file expenses.csv |
| statement_parser.py | None | Parse bank CSV export, categorize transactions | python scripts/statement_parser.py --file statement.csv --output categorized.json |
| report_generator.py | None | Format categorized JSON into a markdown report | python scripts/report_generator.py --file categorized.json --type bank |
| timeline_manager.py | None | Manage opt-in document processing timeline | python scripts/timeline_manager.py show |
| audio_transcriber.py | openai-whisper, ffmpeg | Transcribe audio files to text | python scripts/audio_transcriber.py --file meeting.mp3 --output transcript.txt |
| table_extractor.py | pdfplumber | Extract tables from PDFs to CSV or JSON | python scripts/table_extractor.py --file document.pdf --output data.csv |

All scripts import only what they declare. Scripts with no declared deps use Python stdlib only. You can verify any script: "show me the source of [script name]".
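As an illustration of how a pure-stdlib script like statement_parser.py might categorize CSV transactions, here is a hedged sketch; the keyword map and column names are invented for the example and are not the script's actual rules:

```python
import csv
import io
import json

# Illustrative keyword map; the real statement_parser.py rules are not shown here.
CATEGORIES = {
    "coffee": ["starbucks", "costa"],
    "groceries": ["tesco", "whole foods"],
    "subscriptions": ["netflix", "spotify"],
}

def categorize(description: str) -> str:
    """Match a transaction description against the keyword map."""
    d = description.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in d for k in keywords):
            return category
    return "uncategorized"

sample = "date,description,amount\n2024-03-15,STARBUCKS #123,-13.12\n2024-03-16,NETFLIX.COM,-15.99\n"
rows = [{**row, "category": categorize(row["description"])}
        for row in csv.DictReader(io.StringIO(sample))]
print(json.dumps(rows, indent=2))
```

The point is only that this kind of categorization needs nothing beyond csv and json from the standard library, consistent with the "pure stdlib" claims above.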


Script Import Verification

| Script | Stdlib imports | Third-party | Network |
| --- | --- | --- | --- |
| timeline_manager.py | argparse, json, sys, datetime, pathlib, uuid, collections | None | Never |
| redactor.py | argparse, re, sys, pathlib, dataclasses | pymupdf (PDF); Pillow + pytesseract (image) | Never |
| doc_scanner.py | argparse, json, sys, time, pathlib | opencv-python-headless, numpy, Pillow; img2pdf optional | Never |
| expense_logger.py | argparse, csv, json, sys, pathlib | None | Never |
| statement_parser.py | argparse, csv, json, re, sys, collections, datetime, pathlib | None | Never |
| report_generator.py | argparse, json, sys, collections, pathlib | None | Never |
| utils.py | re, unicodedata, datetime, pathlib | None | Never |
| audio_transcriber.py | argparse, sys, pathlib | openai-whisper | First-run model download only |
| table_extractor.py | argparse, csv, io, json, sys, pathlib | pdfplumber | Never |
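The "imports only what they declare" claim in this table can be spot-checked mechanically with the stdlib ast module. A minimal sketch (the declared set below is copied from the expense_logger.py row; the helper itself is not part of the skill, and the sample source is inlined for illustration):

```python
import ast

def root_imports(source: str) -> set[str]:
    """Collect the root module names imported anywhere in a Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names

declared = {"argparse", "csv", "json", "sys", "pathlib"}  # from the table above
sample = "import csv\nimport json\nfrom pathlib import Path\nimport requests\n"
undeclared = root_imports(sample) - declared
print("Undeclared imports:", sorted(undeclared))  # flags the unexpected "requests"
```

Running root_imports over each bundled script and diffing against the table is a cheap way to verify the manifest before trusting it.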

Privacy & Data Handling

| Aspect | Policy |
| --- | --- |
| Document content | Read locally within this session only. Not stored, indexed, or transmitted. |
| Personal data for form autofill | Used only to complete the current form. Not written to any file. Not retained after session. |
| Timeline log | Opt-in only. Confirmed by user before any entry is written. Contains no raw document content — only category-level summaries. |
| Redacted output files | Written only to a path the user explicitly confirms. |
| Audio transcripts | Written to a local file the user specifies. Model download on first Whisper use only. |
| No telemetry | This skill has no analytics, usage reporting, or network calls beyond what is listed above. |

Step 1 — Identify the Mode

Explicit intent → go directly to the matching mode

| Mode | User intent signals | Typical file types |
| --- | --- | --- |
| Document Categorizer | "process this", "what is this?", "analyze this", "help with this", no clear intent | Any |
| Form Autofill | fill, autofill, fill out, complete this form | PDF form, image, screenshot |
| Contract Analyzer | review, summarize, contract, agreement, risks, red flags, NDA, lease | PDF, text |
| Receipt Scanner | receipt, invoice, log expense, scan this bill | Photo, image, PDF |
| Bank Statement Analyzer | bank statement, transactions, subscriptions, categorize spending | PDF, CSV |
| Resume / CV Parser | parse resume, extract cv, what's on this resume, scan resume | PDF, image, text |
| ID & Passport Scanner | scan id, read passport, extract from id card, scan my passport | Photo, image, PDF |
| Medical Summarizer | lab report, blood test, prescription, discharge summary, medical results | PDF, image, text |
| Legal Redactor | redact, remove pii, anonymize, censor sensitive info | PDF, text, image |
| Meeting Minutes | meeting minutes, action items, summarize meeting, transcribe meeting | Text, PDF, image, audio |
| Table Extractor | extract table, table to csv, get data from pdf, table to json | PDF, image, text |
| Document Translator | translate this, translate to [language], document translation | Any |
| Document Timeline | show my timeline, document history, what have I processed, save timeline | |
| Doc Scan | scan this photo, make this look scanned, correct perspective, dewarp, clean this photo, digitize this, straighten this | Photo, image |

Ambiguous intent → Document Categorizer (with consent gate)

If the user uploads a file without a clear mode signal, do not read it yet. Ask:

"I can classify this document automatically to suggest the best mode — that requires me to read the first 1–2 pages. Or you can choose directly:

| Option | Best for |
| --- | --- |
| Form Autofill | Forms with fill-in fields |
| Contract Analyzer | Agreements, NDAs, leases |
| Receipt Scanner | Receipts, invoices |
| Bank Statement Analyzer | Bank/credit card statements |
| Resume Parser | CVs, resumes |
| ID Scanner | Passports, IDs, driver's licenses |
| Medical Summarizer | Lab reports, prescriptions |
| Legal Redactor | Any document with PII to remove |
| Meeting Minutes | Notes or recordings |
| Table Extractor | Documents with data tables |
| Translator | Non-English documents |
| Doc Scan | Document photo needing perspective correction |

Shall I classify it, or which mode would you like?"

Only read the document after the user confirms.


Step 2 — Read the Document

Use the Read tool on the uploaded file. For images, read them visually. For PDFs over 10 pages, read in page ranges.

For audio files (Meeting Minutes mode only): confirm before running — this requires openai-whisper and downloads a model on first run:

"Transcribing this audio requires the openai-whisper library. On first use it downloads a model file (~140 MB). Is that OK?"

If yes:

python skills/doc-process/scripts/audio_transcriber.py --file <path> --output transcript.txt

If no: ask if the user can provide a text transcript.

For document photos (Doc Scan mode): read the image visually first to assess quality and detect the document type before running the scanner script.


Step 3 — Execute the Mode

Load and follow the matching reference file in full:

| Mode | Reference file |
| --- | --- |
| Document Categorizer | references/document-categorizer.md |
| Form Autofill | references/form-autofill.md |
| Contract Analyzer | references/contract-analyzer.md |
| Receipt Scanner | references/receipt-scanner.md |
| Bank Statement Analyzer | references/bank-statement-analyzer.md |
| Resume / CV Parser | references/resume-parser.md |
| ID & Passport Scanner | references/id-scanner.md |
| Medical Summarizer | references/medical-summarizer.md |
| Legal Redactor | references/legal-redactor.md |
| Meeting Minutes | references/meeting-minutes.md |
| Table Extractor | references/table-extractor.md |
| Document Translator | references/document-translator.md |
| Document Timeline | references/document-timeline.md |
| Doc Scan | references/doc-scan.md |

Step 4 — Redactor: PII Rule Coverage

The redactor.py script covers the following PII categories across 50+ rule types for global document types (bank statements, contracts, medical records, invoices, share-purchase agreements, government forms, and more).

Category 1 — Personal Identifiers (standard + light mode)

| Rule | Examples |
| --- | --- |
| SSN (US) | 123-45-6789 |
| SIN (Canada) | 123-456-789 |
| UK National Insurance Number | AB 12 34 56 C |
| Australian TFN | 123 456 789 |
| Australian Medicare number | 1234 56789 1 |
| Indian Aadhaar | 1234 5678 9012 |
| Passport number | A12345678 |
| Driver's license | keyword-anchored |
| UK NHS number | 943 476 5919 |
| National / voter ID | keyword-anchored |
| Vehicle VIN | keyword-anchored 17-char code |
| NRIC (Singapore) | S1234567A |
| Medical record (MRN) | keyword-anchored |
| Indian PAN | AABCW6386P |
| Email address | any@domain.com |
| Phone number | all international formats; date/reference false-positives suppressed |
| Street address | BLK/BLOCK/FLAT/UNIT/APT prefix + number + street name + type (Street, Ave, Rd, Hill, Close, Quay, Park, etc.) |
| Unit / apartment number | #02-01, Unit 3B, Apt 4C, Flat 12 |
| P.O. Box | PO Box 1234 |
| US ZIP / CA postal | 10001, M5V 3A8 |
| UK postcode | SW1A 2AA |
| International 6-digit postal | Singapore 229572, Bangalore 560067 |
| IPv4 address | 192.168.1.1 |
| MAC address | AA:BB:CC:DD:EE:FF |
| Date of birth | keyword + numeric/month-name formats |
| Age | "Age: 34" |
| Labeled name (50+ field keywords) | Bill To, Shipper, Attention, Buyer, Seller, Patient, Employee, Plaintiff, Trustee, Shareholder, Director, Tenant, Lender, Beneficiary, etc. |
| Honorific prefix + name | Mr./Mrs./Ms./Dr./Prof./Rev./Hon./Mx. + name |

Category 2 — Financial Data (standard + full mode)

| Rule | Examples |
| --- | --- |
| Credit / debit card number | 4111 1111 1111 1111 |
| Card CVV | CVV: 123 |
| Card expiry | 03/26 |
| Bank account number | keyword-anchored |
| IBAN | IBAN country-code validated (GB, DE, FR, etc.) |
| ABA / routing number | "Routing No." and "ABA No." |
| UK Sort code | 20-00-00 |
| Australian BSB | 063-000 |
| Indian IFSC code | HDFC0000001 |
| SWIFT / BIC code | allows space in code (e.g. CHAS US33) |
| Salary / compensation | salary, CTC, gross/net pay, take-home, remuneration |
| Credit score | keyword-anchored |
| Loan / mortgage amount | keyword-anchored |
| Tax figures | AGI, taxable income, tax paid |
| Net worth / total assets | keyword-anchored |
| Cryptocurrency wallet | Bitcoin, Ethereum |
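Card-number rules such as the first row often pair the digit pattern with a Luhn checksum to cut false positives. The source does not say whether redactor.py does this, but the check itself is short:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, subtract 9 if over 9."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:  # shorter strings are not card numbers
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111 1111 1111 1111"))  # the test card from the table above -> True
```

Any 13-to-19-digit run that fails the checksum (invoice numbers, tracking codes) can then be left unredacted.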

Category 3 — Sensitive / Protected (full mode only)

HIV/AIDS status, blood type, mental health diagnoses (expanded), reproductive health, substance use history, sexual orientation / gender identity, disability, criminal record, genetic information, immigration status, minor's name, attorney–client privilege, trade secrets.

Redaction modes

| Flag | Categories | Use case |
| --- | --- | --- |
| --mode light | Cat 1 only | Sharing docs where financial details can remain |
| --mode standard | Cat 1 + 2 (default) | General privacy protection |
| --mode full | Cat 1 + 2 + 3 | Legal filings, healthcare, immigration, HR |
| --custom REGEX | Cat 0 + selected mode | Domain-specific or proprietary terms |

How PDF redaction works

  1. Word bounding boxes are extracted from the PDF layout engine
  2. PII is detected using a single-pass, non-overlapping regex engine
  3. Matched spans are mapped back to word bounding boxes
  4. PyMuPDF redaction annotations (solid black fill) are placed on the exact word rects
  5. apply_redactions() burns the black fills in and removes the underlying text data from the content stream — redacted text cannot be copy-pasted or extracted
  6. The file is saved incrementally — every non-redacted element (fonts, images, vector graphics, metadata) is left completely untouched
  7. The original file is never modified; output is always a separate copy
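The "single-pass, non-overlapping regex engine" in step 2 can be sketched in pure Python: the rules are combined into one alternation with named groups, so re.finditer yields non-overlapping matches in a single left-to-right pass. The patterns here are simplified placeholders, not the skill's actual rules:

```python
import re

# Simplified stand-ins for the rule set; the real rules are far broader.
RULES = {
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "EMAIL": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    "CARD": r"\b(?:\d[ -]?){13,16}\b",
}

# One combined pattern: named groups, scanned in a single pass.
ENGINE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES.items()))

def find_pii(text):
    """Yield (category, start, end) spans; finditer is non-overlapping by construction."""
    for m in ENGINE.finditer(text):
        yield m.lastgroup, m.start(), m.end()

text = "Contact jane@example.com, SSN 123-45-6789."
for category, start, end in find_pii(text):
    print(category, text[start:end])
```

The resulting character spans are what would then be mapped back onto word bounding boxes in step 3.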

Step 5 — Doc Scan: How It Works

The doc_scanner.py script converts a document photo into a professional scan in 7 steps:

  1. Multi-strategy edge detection — tries three approaches in order: (A) Canny on greyscale; (B) Morphological gradient; (C) Colour/brightness threshold. Stops at first success.
  2. Sub-pixel corner refinement — cv2.cornerSubPix makes the four corner points accurate to sub-pixel level for the most precise warp.
  3. Perspective warp — four-point transform using Lanczos interpolation flattens the document to a perfect rectangle.
  4. Shadow removal — per-channel background estimation + normalisation removes cast shadows and uneven lighting without affecting text.
  5. Scan-quality enhancement — mode-specific: BW = adaptive threshold (block size auto-scaled to resolution) + stroke repair + denoising; Gray = auto-levels + CLAHE + unsharp mask; Color = white-balance + CLAHE + sharpening.
  6. Scanner border — 8 px white border simulates scanner bed edge.
  7. DPI-tagged output — saved with embedded DPI metadata (default 300 DPI, print quality).
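The four-point perspective warp in step 3 reduces to solving for a 3x3 homography from the four corner correspondences. The skill uses OpenCV for this; the numpy-only sketch below just shows the underlying math:

```python
import numpy as np

def homography(src, dst):
    """Solve the 8-DOF homography mapping 4 src corners onto 4 dst corners (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply the homography to one point (what the warp does per pixel)."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Skewed photo corners mapped onto a flat 400x300 rectangle.
src = [(10, 20), (390, 5), (400, 310), (5, 295)]
dst = [(0, 0), (400, 0), (400, 300), (0, 300)]
H = homography(src, dst)
print(warp_point(H, 10, 20))
```

cv2.getPerspectiveTransform solves the same system; the interpolation choice (Lanczos in step 3) only affects how pixel values are resampled, not the mapping itself.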

When auto-detection fails

If the script reports "corners_detected": false:

  1. Offer manual corner hints: ask the user where the four corners of the document are approximately
  2. Use --no-warp to at least apply enhancement without perspective correction
  3. Provide photography tips (see references/doc-scan.md Step 8)

Step 6 — Document Timeline (Opt-In)

Off by default. After completing the first document task in a session, ask once:

"Would you like me to keep a processing log for this session? It records document type, filename, and a category-level summary (no raw content, no personal data) to ~/.doc-process-timeline.json on your local machine. Entirely optional — yes or no."

  • Yes → confirm "Timeline logging is on." Log current and subsequent documents. Announce each with "Logged to your timeline."
  • No → confirm "No log will be kept." Do not run any timeline script. Do not ask again this session.
  • No response / unsure → treat as No.

Summary rules (strictly enforced): the --summary argument must never contain names, ID numbers, dates of birth, addresses, account numbers, card numbers, medical values, or any data that could identify a person. Category-level descriptions only.
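A rule like the one above can be enforced with a guard before any timeline write. A minimal sketch with illustrative patterns (these are not the skill's actual checks):

```python
import re

# Illustrative patterns for data a summary must never contain.
FORBIDDEN = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),               # SSN-like
    re.compile(r"\b[\w.+-]+@[\w-]+\.\w+\b"),            # email address
    re.compile(r"\b\d{13,19}\b"),                       # long account/card numbers
    re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),   # dates (possible DOB)
]

def summary_is_safe(summary: str) -> bool:
    """Reject a timeline summary if any forbidden pattern appears."""
    return not any(p.search(summary) for p in FORBIDDEN)

print(summary_is_safe("Categorized a bank statement; 3 subscriptions flagged"))  # True
print(summary_is_safe("Statement for acct 4111111111111111"))                    # False
```

Rejecting at write time keeps category-level descriptions in the log even if an upstream step slips.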


Step 7 — Deliver Output

Present output in clean tables with section headers as specified in each reference file. Always end with an action prompt relevant to the mode. For Doc Scan, always offer to continue processing the scanned output.


General Principles

  • Never hallucinate field values. Unknown values → [MISSING] or [UNREADABLE].
  • Flag risks conservatively — when in doubt, include it.
  • Keep summaries scannable with tables and bullets.
  • Do not echo sensitive data beyond what is necessary for the immediate task.
  • Always include relevant disclaimers (medical, legal, privacy) where required by the reference guide.
  • Timeline is opt-in per session. Never log without confirmed consent.
  • Personal data for form autofill is session-only. Never write it to a file.
  • Before running any script with third-party deps, run bash skills/doc-process/setup.sh automatically if deps are not yet installed (see Step 0). No need to ask — the setup script is safe and idempotent.
  • Categorize before asking — but only after confirming the user wants auto-classification.
  • For Doc Scan: always assess the image visually first; never process non-document images.
