Install
openclaw skills install @samledger67-dotcom/document-ingestion

Process raw accounting source documents (PDFs, CSVs, bank statements, invoices, receipts) into standardized transaction records for QBO import. Use when batch-processing client documents for month-end close, categorizing transactions, or extracting data from 1099s and payroll reports.

Not for:
- Bank reconciliation (use the bank-reconciliation skill)
- P&L variance analysis (use the pl-quick-compare skill)
- AR collections (use the ar-collections-agent skill)

Processes 6 document types → standardized records → Excel workbook + QBO import CSV.

| Input Type | Formats | Extracts |
|---|---|---|
| Bank Statements | CSV, OFX/QFX, PDF | Date, vendor, amount |
| Credit Card Stmts | CSV, PDF | Date, merchant, amount, category |
| Invoices | | Vendor, total, date, due date, invoice #, line items |
| Receipts | PDF, JPG/PNG* | Merchant, date, amount |
| 1099 / Tax Forms | | Payer, TIN, form type, box amounts |
| Payroll Reports | CSV, PDF | Employee, gross, taxes, net per employee |
*Image OCR requires tesseract installed.
Run-to-run state is cached in .cache/document-ingestion/{slug}.json.

Excel workbook tabs:

| Tab | Contents |
|---|---|
| Processed Transactions | All records with category, confidence, dup flag, exception |
| ⚠ Exceptions | Records needing manual review before import |
| Duplicates | Flagged potential duplicates with "Dup Of" reference |
| Category Mapping | Unique vendor → QBO account map with confidence |
| Import Ready | QBO-format rows (Date, Description, Amount, Account, Memo) |
| CDC Log | Delta metrics vs. prior run + this-run stats summary |
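
The Duplicates tab above lists flagged records with a "Dup Of" reference. The README does not state the matching rule, so the following is only a sketch under assumptions: two records are treated as duplicates when vendor (case-insensitive) and amount match and their dates fall within a window. The field names (`vendor`, `amount`, `date`, `dup_of`) and the `window_days` parameter are hypothetical; the script's actual window comes from its DUP_WINDOW_DAYS constant.

```python
from datetime import date


def flag_duplicates(records: list[dict], window_days: int) -> None:
    """Mark later records that repeat an earlier vendor/amount within window_days."""
    originals: list[tuple[int, dict]] = []  # (index, record) of non-duplicates seen so far
    for index, rec in enumerate(records):
        rec["is_duplicate"] = False
        rec["dup_of"] = None
        for prior_index, prior in originals:
            same_key = (
                prior["vendor"].lower() == rec["vendor"].lower()
                and prior["amount"] == rec["amount"]
            )
            close_in_time = abs((rec["date"] - prior["date"]).days) <= window_days
            if same_key and close_in_time:
                rec["is_duplicate"] = True
                rec["dup_of"] = prior_index  # index of the record it duplicates
                break
        else:
            # No match found: this record becomes a candidate original.
            originals.append((index, rec))


records = [
    {"vendor": "Comcast", "amount": "89.99", "date": date(2024, 3, 1)},
    {"vendor": "comcast", "amount": "89.99", "date": date(2024, 3, 2)},
    {"vendor": "Comcast", "amount": "89.99", "date": date(2024, 4, 1)},
]
flag_duplicates(records, window_days=3)  # 3 is an illustrative value only
```

Only the second record is flagged here; the third shares vendor and amount but falls outside the window, which is the usual case for a recurring monthly charge.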
Script: scripts/pipelines/document-ingestion.py

    # Process a directory of mixed documents
    python3 scripts/pipelines/document-ingestion.py \
      --slug sb-paulson \
      --input-dir ~/Downloads/month-end-docs

    # Single file
    python3 scripts/pipelines/document-ingestion.py \
      --slug sb-paulson \
      --file ~/Downloads/invoice_march.pdf

    # Multiple files + custom output dir
    python3 scripts/pipelines/document-ingestion.py \
      --slug glowlabs \
      --file ~/Downloads/stmt.csv \
      --file ~/Downloads/payroll.csv \
      --out ~/Desktop/ingested

    # Offline mode (no QBO auth needed)
    python3 scripts/pipelines/document-ingestion.py \
      --slug sb-paulson \
      --input-dir ./docs \
      --no-qbo-coa

    # QBO sandbox
    python3 scripts/pipelines/document-ingestion.py \
      --slug sb-paulson \
      --input-dir ./docs \
      --sandbox

| Flag | Default | Description |
|---|---|---|
| --slug | required | Company slug (QBO + client vendor map) |
| --input-dir | — | Directory of docs to process |
| --file | — | Single file (repeatable) |
| --out | ~/Desktop | Output directory |
| --no-qbo-coa | false | Use built-in COA only (offline) |
| --sandbox | false | QBO sandbox mode |
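
For readers who want to see how the flag table maps onto a command-line parser, here is a minimal `argparse` sketch. It mirrors the flags and defaults listed above but is not taken from the script; `build_parser` is a hypothetical name and the real parser may differ.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="document-ingestion.py")
    parser.add_argument("--slug", required=True,
                        help="Company slug (QBO + client vendor map)")
    parser.add_argument("--input-dir", type=Path, default=None,
                        help="Directory of docs to process")
    # action="append" makes the flag repeatable: each --file adds one path.
    parser.add_argument("--file", type=Path, action="append", default=[],
                        help="Single file (repeatable)")
    parser.add_argument("--out", type=Path, default=Path.home() / "Desktop",
                        help="Output directory")
    parser.add_argument("--no-qbo-coa", action="store_true",
                        help="Use built-in COA only (offline)")
    parser.add_argument("--sandbox", action="store_true",
                        help="QBO sandbox mode")
    return parser


args = build_parser().parse_args(
    ["--slug", "glowlabs", "--file", "stmt.csv", "--file", "payroll.csv", "--no-qbo-coa"]
)
```

With this layout `args.file` is a list of two paths, and `args.no_qbo_coa` is `True` while `args.sandbox` stays `False`.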
pip install openpyxl
pip install pdfminer.six # Better PDF text extraction
pip install ofxparse # Better OFX/QFX parsing
brew install tesseract # Image receipt OCR (JPG/PNG)
Node.js QBO client # Auth token must be configured
Each record's category is assigned with HIGH, MEDIUM, or LOW confidence.

Built-in vendor map: 50+ known vendors pre-mapped (VENDOR_MAP in script).

Client vendor map, auto-loaded by --slug: clients/{slug}/categorization-map*.md markdown tables.

Duplicate detection uses a date window (DUP_WINDOW_DAYS constant). Duplicates are marked is_duplicate=True and excluded from the import file.

| Condition | Flag |
|---|---|
| Missing transaction date | "Missing transaction date" |
| Zero amount (non-1099) | "Zero amount — verify or skip" |
| Empty/unknown vendor | "Vendor name missing or unknown" |
| LOW confidence category | "Low categorization confidence — manual review" |
| PDF extraction failed | "PDF text extraction failed — manual review required" |
| Image without tesseract | "Image OCR not available — manual entry required" |
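
The record-level rows of the table above can be sketched as a small check function. This is an illustration, not the script's code: the record keys (`date`, `amount`, `vendor`, `confidence`, `doc_type`) and the function name are assumptions, and the two extraction-failure rows are left out because they depend on the parsing stage.

```python
from decimal import Decimal


def exception_flags(record: dict) -> list[str]:
    """Return every exception message that applies to one extracted record."""
    flags: list[str] = []
    if not record.get("date"):
        flags.append("Missing transaction date")
    amount = record.get("amount")
    # Zero amounts are tolerated on 1099 records, per the table above.
    if amount is not None and amount == 0 and record.get("doc_type") != "1099":
        flags.append("Zero amount — verify or skip")
    vendor = (record.get("vendor") or "").strip().lower()
    if vendor in ("", "unknown"):
        flags.append("Vendor name missing or unknown")
    if record.get("confidence") == "LOW":
        flags.append("Low categorization confidence — manual review")
    return flags


clean = {"date": "2024-03-01", "amount": Decimal("12.50"), "vendor": "Comcast",
         "confidence": "HIGH", "doc_type": "bank"}
messy = {"date": "", "amount": Decimal("0.00"), "vendor": "Unknown",
         "confidence": "LOW", "doc_type": "receipt"}
zero_1099 = {"date": "2024-01-31", "amount": Decimal("0"), "vendor": "Acme",
             "confidence": "HIGH", "doc_type": "1099"}
```

Returning a list rather than the first match lets one record carry several reasons for manual review.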
Ready-to-import columns:
Date | Description | Amount | Vendor/Customer | Account | Class | Memo | Doc Number
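
A short sketch of writing rows in that column order with the standard `csv` module. The column names come from the line above; the record keys and the `write_qbo_csv` helper are hypothetical. Skipping duplicates follows the README's statement that they are excluded from the import file, while skipping exception records is an assumption based on the "manual review before import" wording.

```python
import csv
import io

QBO_COLUMNS = ["Date", "Description", "Amount", "Vendor/Customer",
               "Account", "Class", "Memo", "Doc Number"]


def write_qbo_csv(records: list[dict], handle) -> int:
    """Write import-ready records to an open text handle; return rows written."""
    writer = csv.DictWriter(handle, fieldnames=QBO_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for rec in records:
        # Assumed rule: duplicates and flagged exceptions stay out of the import file.
        if rec.get("is_duplicate") or rec.get("exception"):
            continue
        writer.writerow({col: rec.get(col, "") for col in QBO_COLUMNS})
        written += 1
    return written


buffer = io.StringIO()
row_count = write_qbo_csv(
    [
        {"Date": "03/01/2024", "Description": "Internet", "Amount": "-89.99",
         "Account": "Utilities", "is_duplicate": False},
        {"Date": "03/01/2024", "Description": "Internet", "Amount": "-89.99",
         "Account": "Utilities", "is_duplicate": True},
    ],
    buffer,
)
csv_lines = buffer.getvalue().splitlines()
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends, so line endings are not doubled on Windows.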
Location: .cache/document-ingestion/{slug}.json
Tracks between runs:
- docs_processed, records_extracted, duplicates_caught
- exceptions_flagged, import_ready
- high_confidence, medium_confidence, low_confidence

Output files:
- DocIngestion_{slug}_{YYYYMMDD}.xlsx
- DocIngestion_{slug}_{YYYYMMDD}_QBO_Import.csv
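
The between-run tracking described above (counters stored in a per-slug JSON cache and compared with the prior run) could look roughly like this. The cache layout, a flat JSON object of counters, is an assumption, and `cdc_delta` is a hypothetical helper rather than a function from the script.

```python
import json
from pathlib import Path


def cdc_delta(cache_path: Path, current: dict[str, int]) -> dict[str, int]:
    """Return this run's counters minus the cached prior run, then update the cache."""
    prior: dict[str, int] = {}
    if cache_path.exists():
        prior = json.loads(cache_path.read_text())
    # Counters absent from the prior run are treated as 0 (first run, new metric).
    delta = {key: value - prior.get(key, 0) for key, value in current.items()}
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(current, indent=2))
    return delta
```

On a first run every delta equals the current value; on later runs a negative delta, for example fewer import-ready rows, is a cue to look at what changed.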
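Usage notes:
- Slug: company slug (sb-paulson, glowlabs, etc.)
- Offline runs: --no-qbo-coa
- Batch runs: --input-dir pointing to client's document drop folder
- Bank reconciliation: bank-reconciliation.py
- Unreadable PDFs: nano-pdf skill or manual entry
- New vendors: VENDOR_MAP in script or add to clients/{slug}/categorization-map.md

Edit load_client_vendor_map() in the script:

    # Add a branch for the new client's slug (and any short alias)
    if slug_lower in ("new-client", "nc"):
        client_map.update({
            "vendor name": "QBO Account Name",
        })
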
Or create clients/{slug}/categorization-map.md with a markdown table:
| Vendor / Memo Keyword | Primary Account | Notes |
|---|---|---|
| Amazon | Office Supplies | |
| Comcast | Utilities | |
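
To illustrate how a table like the one above could be turned into a vendor → account lookup, here is a hedged sketch. It is not the script's loader: `parse_categorization_map` is a hypothetical name, and lowercasing the vendor keys is an assumption (suggested by the lowercase "vendor name" key in the snippet above).

```python
def parse_categorization_map(markdown: str) -> dict[str, str]:
    """Return {lowercased vendor keyword: account} from markdown table rows."""
    rows: list[list[str]] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # A separator row such as |---|---|---| has only dashes/colons in every cell.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            if rows:
                rows.pop()  # the row right before the separator is the header
            continue
        rows.append(cells)
    return {
        row[0].lower(): row[1]
        for row in rows
        if len(row) >= 2 and row[0] and row[1]
    }


sample = """\
| Vendor / Memo Keyword | Primary Account | Notes |
|---|---|---|
| Amazon | Office Supplies | |
| Comcast | Utilities | |
"""
vendor_map = parse_categorization_map(sample)
```

Dropping the row before each separator, rather than matching on header text, keeps the parser working when a file contains several tables or differently worded headers.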
All amounts use Python Decimal with ROUND_HALF_UP to 2 decimal places. No float arithmetic.
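
A small sketch of what that rule means in practice. The `to_amount` helper and its input cleanup (currency symbol, thousands separators, accounting-style parentheses) are assumptions for illustration; only the use of `Decimal` with `ROUND_HALF_UP` at two places comes from the README.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_amount(raw: str) -> Decimal:
    """Parse an amount string into a Decimal rounded half-up to cents."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    # Accounting-style negatives: "(12.50)" means -12.50.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    return Decimal(cleaned).quantize(CENT, rounding=ROUND_HALF_UP)


half_cent = to_amount("2.675")       # Decimal("2.68"); float round(2.675, 2) gives 2.67
with_symbols = to_amount("$1,234.5")  # Decimal("1234.50")
negative = to_amount("(12.505)")      # Decimal("-12.51"), ties round away from zero
```

Parsing from the string, never from a float, is what keeps values like 2.675 exact before rounding.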