Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Document Ingestion

v1.0.2

Process raw accounting source documents (PDFs, CSVs, bank statements, invoices, receipts) into standardized transaction records for QBO import. Use when batc...

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for samledger67-dotcom/document-ingestion.

Install the skill "Document Ingestion" (samledger67-dotcom/document-ingestion) from ClawHub.
Skill page: https://clawhub.ai/samledger67-dotcom/document-ingestion
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install document-ingestion

ClawHub CLI

npx clawhub@latest install document-ingestion

Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (high confidence)

Purpose & Capability (flagged)
The skill's stated purpose (convert accounting docs into QBO import CSVs) is reasonable, but the SKILL.md expects live QBO Chart-of-Accounts access and a Node.js QBO client with an auth token while the registry metadata declares no required environment variables or primary credential. The skill also references a local Python script path (scripts/pipelines/document-ingestion.py) and client-specific local files (clients/{slug}/...) that are not provided in the bundle.
Instruction Scope (flagged)
Runtime instructions direct the agent to run a local Python script, read directories and client mapping files, write outputs to ~/Desktop and .cache/document-ingestion/{slug}.json, and optionally contact QBO (including sandbox). The SKILL.md also refers to 'Auth token must be configured' but does not specify how or which env var. These instructions access local filesystem paths and external APIs beyond what the registry declares.
Install Mechanism
This is an instruction-only skill (no install spec, no code files). The SKILL.md lists pip packages and Homebrew (tesseract) as required/optional installs — that's a manual installation expectation but the registry provides no automated install. Because no script files are included, following the instructions would fail unless the user separately obtains the referenced scripts.
Credentials (flagged)
The runtime behavior implies need for QBO credentials (auth token) and possibly other secrets for a Node.js QBO client, but requires.env is empty and no primary credential is declared. It also reads/writes local files (client maps, caches, Desktop outputs). Requesting QBO access is proportionate to purpose only if the skill declares which credentials it needs and why; here that mapping is missing.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. However, it will create/modify local cache files (.cache/document-ingestion/...), read client config files, and write Excel/CSV outputs to user directories (default ~/Desktop). These are normal for such a tool but should be confirmed before running.
What to consider before installing
Red flags to consider before installing or using this skill:

  • The SKILL.md expects you to run a local script at scripts/pipelines/document-ingestion.py and to have client mapping files (clients/{slug}/...), but the published bundle contains no code. Ask the publisher for the script and full source before trusting it.
  • The skill clearly needs a QBO auth token (and mentions a Node.js QBO client), but the registry lists no required environment variables or credential names. Ask which exact credentials are needed and how to scope or restrict them (use a sandbox token with minimal scope for testing).
  • The instructions will read your local folders and write outputs and caches (~/Desktop, .cache/document-ingestion). If you plan to run anything from an unreviewed source, do so in an isolated environment (VM/container) and inspect the code first.
  • The skill recommends installing third-party tools (tesseract, pdfminer.six, ofxparse). Install these only from official sources and be cautious about permissions.
  • If you want to proceed: obtain the actual script source, verify where the QBO token is read (which env var or config file), review vendor maps and any pre-mapped vendor list for privacy issues, and test with non-sensitive sample documents in QBO sandbox mode.

If the publisher cannot provide the missing script and explicit credential/config instructions, treat this skill as incomplete and avoid running it on real financial data.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

📄 Clawdis
latest: vk977k6z3n231tync81r2zr38jx83d55n
236 downloads
0 stars
3 versions
Updated 5h ago
v1.0.2
MIT-0

Document Ingestion Engine — SKILL.md

When to Use This Skill

Use when a user needs to process raw accounting source documents into standardized transaction records for QBO import. Triggers on:

  • "Process these documents / invoices / receipts / bank statements"
  • "Ingest docs for [client]"
  • "I have PDFs/CSVs to categorize"
  • "Batch import these transactions to QBO"
  • "Extract data from 1099s / payroll reports"
  • Document drop + categorization requests during month-end close

When NOT to Use

  • Not for running bank reconciliation (use bank-reconciliation skill)
  • Not for P&L variance analysis (use pl-quick-compare skill)
  • Not for single manual journal entries (just post directly in QBO)
  • Not for AR collections or aging (use ar-collections-agent skill)

What It Does

Processes 6 document types → standardized records → Excel workbook + QBO import CSV.

| Input Type | Formats | Extracts |
|---|---|---|
| Bank Statements | CSV, OFX/QFX, PDF | Date, vendor, amount |
| Credit Card Stmts | CSV, PDF | Date, merchant, amount, category |
| Invoices | PDF | Vendor, total, date, due date, invoice #, line items |
| Receipts | PDF, JPG/PNG* | Merchant, date, amount |
| 1099 / Tax Forms | PDF | Payer, TIN, form type, box amounts |
| Payroll Reports | CSV, PDF | Employee, gross, taxes, net per employee |

*Image OCR requires tesseract installed.

Processing Steps

  1. File type detection — magic bytes + extension fallback
  2. Document classification — bank/CC/invoice/receipt/1099/payroll
  3. Content extraction — CSV parsing, OFX parsing, PDF text extraction
  4. Format normalization — dates (multi-format), amounts (Decimal), vendor names (strip noise)
  5. QBO COA pull — fetches live Chart of Accounts from QBO for categorization
  6. Duplicate detection — same amount + vendor within ±3 days → flagged
  7. Auto-categorization — vendor map → COA keywords → doc-class default
  8. Confidence scoring — HIGH (exact match) / MEDIUM (fuzzy) / LOW (needs review)
  9. Exception flagging — missing dates, zero amounts, unknown vendors, LOW confidence
  10. QBO import CSV — ready for batch import (excludes dups + failed extractions)
  11. Excel workbook — 6 tabs (see below)
  12. CDC tracking — delta since last run cached in .cache/document-ingestion/{slug}.json
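
Step 1 (magic bytes with extension fallback) can be sketched as follows. This is a minimal illustration, not the skill's actual code, which is not included in the published bundle. The function name `detect_file_type` and both lookup tables are assumptions; the byte prefixes are the standard signatures for PDF, PNG, and JPEG.

```python
from pathlib import Path

# Standard file signatures (magic bytes). The mapping to tags is illustrative.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
}

# Fallback for text formats (CSV, OFX/QFX) that have no reliable signature.
EXTENSION_FALLBACK = {
    ".csv": "csv",
    ".ofx": "ofx",
    ".qfx": "ofx",
    ".pdf": "pdf",
    ".png": "png",
    ".jpg": "jpg",
    ".jpeg": "jpg",
}


def detect_file_type(path: str) -> str:
    """Return a short type tag: magic bytes first, then file extension."""
    p = Path(path)
    with p.open("rb") as fh:
        head = fh.read(16)
    for signature, kind in MAGIC_SIGNATURES.items():
        if head.startswith(signature):
            return kind
    return EXTENSION_FALLBACK.get(p.suffix.lower(), "unknown")
```

Checking content before the extension means a PDF saved with the wrong suffix is still routed to PDF extraction.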

Excel Output Tabs

| Tab | Contents |
|---|---|
| Processed Transactions | All records with category, confidence, dup flag, exception |
| ⚠ Exceptions | Records needing manual review before import |
| Duplicates | Flagged potential duplicates with "Dup Of" reference |
| Category Mapping | Unique vendor → QBO account map with confidence |
| Import Ready | QBO-format rows (Date, Description, Amount, Account, Memo) |
| CDC Log | Delta metrics vs. prior run + this-run stats summary |

Script Location

scripts/pipelines/document-ingestion.py

Usage

# Process a directory of mixed documents
python3 scripts/pipelines/document-ingestion.py \
    --slug sb-paulson \
    --input-dir ~/Downloads/month-end-docs

# Single file
python3 scripts/pipelines/document-ingestion.py \
    --slug sb-paulson \
    --file ~/Downloads/invoice_march.pdf

# Multiple files + custom output dir
python3 scripts/pipelines/document-ingestion.py \
    --slug glowlabs \
    --file ~/Downloads/stmt.csv \
    --file ~/Downloads/payroll.csv \
    --out ~/Desktop/ingested

# Offline mode (no QBO auth needed)
python3 scripts/pipelines/document-ingestion.py \
    --slug sb-paulson \
    --input-dir ./docs \
    --no-qbo-coa

# QBO sandbox
python3 scripts/pipelines/document-ingestion.py \
    --slug sb-paulson \
    --input-dir ./docs \
    --sandbox

All CLI Flags

| Flag | Default | Description |
|---|---|---|
| --slug | required | Company slug (QBO + client vendor map) |
| --input-dir | | Directory of docs to process |
| --file | | Single file (repeatable) |
| --out | ~/Desktop | Output directory |
| --no-qbo-coa | false | Use built-in COA only (offline) |
| --sandbox | false | QBO sandbox mode |
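
For readers who want to see how these flags fit together, here is a sketch of an `argparse` parser consistent with the table. It is an assumption about how the script might parse its arguments, not a copy of it; `build_parser` is a hypothetical name.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Document ingestion (sketch)")
    parser.add_argument("--slug", required=True,
                        help="Company slug (QBO + client vendor map)")
    parser.add_argument("--input-dir", type=Path,
                        help="Directory of docs to process")
    # action="append" makes the flag repeatable: --file a.csv --file b.csv
    parser.add_argument("--file", action="append", default=[], type=Path,
                        help="Single file (repeatable)")
    parser.add_argument("--out", type=Path,
                        default=Path("~/Desktop").expanduser(),
                        help="Output directory")
    parser.add_argument("--no-qbo-coa", action="store_true",
                        help="Use built-in COA only (offline)")
    parser.add_argument("--sandbox", action="store_true",
                        help="QBO sandbox mode")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--slug", "glowlabs", "--file", "stmt.csv", "--no-qbo-coa"]
    )
    print(args.slug, args.file, args.no_qbo_coa)
```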

Dependencies

Required (pip)

pip install openpyxl

Optional (better extraction quality)

pip install pdfminer.six   # Better PDF text extraction
pip install ofxparse       # Better OFX/QFX parsing
brew install tesseract     # Image receipt OCR (JPG/PNG)

Node.js QBO Client

Node.js QBO client   # Auth token must be configured

Categorization Logic

Priority Chain

  1. Vendor Map exact match → HIGH confidence
  2. Vendor Map substring match → HIGH confidence
  3. COA keyword index (built from COA account names + keywords) → MEDIUM confidence
  4. Doc-class default → LOW confidence
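
The priority chain above can be sketched as a single lookup function. This is a hedged illustration under stated assumptions: `categorize`, its parameters, and the lowercase-key convention are hypothetical, and the real script may match differently.

```python
from typing import Dict, Tuple


def categorize(
    vendor: str,
    vendor_map: Dict[str, str],    # assumed: lowercase vendor name -> account
    coa_keywords: Dict[str, str],  # assumed: lowercase keyword -> account
    doc_class_default: str,
) -> Tuple[str, str]:
    """Return (account, confidence) by walking the four-step chain."""
    key = vendor.strip().lower()

    # 1. Vendor map exact match
    if key in vendor_map:
        return vendor_map[key], "HIGH"

    # 2. Vendor map substring match (mapped name appears inside the raw vendor)
    for name, account in vendor_map.items():
        if name in key:
            return account, "HIGH"

    # 3. COA keyword index
    for keyword, account in coa_keywords.items():
        if keyword in key:
            return account, "MEDIUM"

    # 4. Document-class default, which should land in manual review
    return doc_class_default, "LOW"
```

For example, with `{"fedex": "Postage & Delivery"}` as the vendor map, a bank-statement vendor string such as "FEDEX OFFICE 1234" would resolve at step 2 with HIGH confidence.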

Built-in Vendor Map

50+ known vendors pre-mapped:

  • Stripe/Square/PayPal → Sales Revenue
  • Gusto/ADP/Deel/Paychex → Payroll - Salaries & Wages
  • Google/Microsoft/Slack/GitHub/Zoom → Software & Subscriptions
  • Delta/United/Marriott/Uber → Travel
  • FedEx/UPS/USPS → Postage & Delivery
  • Chase/BofA service charges → Bank & Merchant Fees
  • etc. (see VENDOR_MAP in script)

Client-Specific Overrides

Auto-loaded by --slug:

  • glowlabs → Loads GlowLabs vendor map (Deel, Toptal, Brex, Huellas Labs, etc.)
  • sb-paulson / willo → Loads Willo Salons vendor map
  • Other clients → Reads clients/{slug}/categorization-map*.md markdown tables

Duplicate Detection Rules

  • Window: ±3 days (configurable via DUP_WINDOW_DAYS constant)
  • Match criteria: Same amount (exact Decimal) + same vendor key (first 3 meaningful words)
  • Action: Flagged as is_duplicate=True, excluded from import file
  • Always confirm before deleting — duplicates tab shows "Dup Of Row #" reference
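
A minimal sketch of these rules follows. The `Record` dataclass, `vendor_key`, and `flag_duplicates` are hypothetical names, and "meaningful words" is interpreted here as purely alphabetic tokens, which is an assumption; only `DUP_WINDOW_DAYS` and `is_duplicate` come from the text above.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional

DUP_WINDOW_DAYS = 3  # window is +/- this many days


@dataclass
class Record:
    txn_date: date
    vendor: str
    amount: Decimal
    is_duplicate: bool = False
    dup_of: Optional[int] = None  # index of the earlier record it duplicates


def vendor_key(vendor: str) -> str:
    """First 3 'meaningful' words; here assumed to mean alphabetic tokens."""
    words = [w for w in vendor.lower().split() if w.isalpha()]
    return " ".join(words[:3])


def flag_duplicates(records: List[Record]) -> None:
    """Flag later records that match an earlier, non-duplicate record."""
    for i, rec in enumerate(records):
        for j in range(i):
            earlier = records[j]
            if earlier.is_duplicate:
                continue
            same_amount = rec.amount == earlier.amount  # exact Decimal compare
            same_vendor = vendor_key(rec.vendor) == vendor_key(earlier.vendor)
            in_window = abs((rec.txn_date - earlier.txn_date).days) <= DUP_WINDOW_DAYS
            if same_amount and same_vendor and in_window:
                rec.is_duplicate = True
                rec.dup_of = j
                break
```

Flagging rather than deleting keeps the original row available for the "Dup Of" reference on the Duplicates tab.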

Exception Rules (auto-flagged)

| Condition | Flag |
|---|---|
| Missing transaction date | "Missing transaction date" |
| Zero amount (non-1099) | "Zero amount — verify or skip" |
| Empty/unknown vendor | "Vendor name missing or unknown" |
| LOW confidence category | "Low categorization confidence — manual review" |
| PDF extraction failed | "PDF text extraction failed — manual review required" |
| Image without tesseract | "Image OCR not available — manual entry required" |

QBO Import CSV Format

Ready-to-import columns:

Date | Description | Amount | Vendor/Customer | Account | Class | Memo | Doc Number
  • Amount sign: positive = expense (debit), negative = credit/income
  • Memo includes source file + doc type for audit trail
  • Excludes: duplicates, failed extractions
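
As a rough sketch of how rows in this layout could be written, the function below uses the standard `csv` module. The record keys (`is_duplicate`, `extraction_failed`, `source_file`, `doc_type`, and so on), the memo separator, and the function name `build_import_csv` are assumptions for illustration; only the column names and the exclusion rules come from the section above.

```python
import csv
import io
from decimal import Decimal
from typing import Dict, Iterable

QBO_COLUMNS = [
    "Date", "Description", "Amount", "Vendor/Customer",
    "Account", "Class", "Memo", "Doc Number",
]


def build_import_csv(records: Iterable[Dict]) -> str:
    """Serialize import-ready records, skipping dups and failed extractions."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=QBO_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        if rec.get("is_duplicate") or rec.get("extraction_failed"):
            continue
        writer.writerow({
            "Date": rec["date"],
            "Description": rec["description"],
            # Positive = expense (debit), negative = credit/income
            "Amount": f"{rec['amount']:.2f}",
            "Vendor/Customer": rec.get("vendor", ""),
            "Account": rec.get("account", ""),
            "Class": rec.get("class", ""),
            # Memo carries source file + doc type for the audit trail
            "Memo": f"{rec.get('source_file', '')} | {rec.get('doc_type', '')}",
            "Doc Number": rec.get("doc_number", ""),
        })
    return buf.getvalue()


if __name__ == "__main__":
    print(build_import_csv([{
        "date": "2024-03-05", "description": "Office chairs",
        "amount": Decimal("45"), "vendor": "Amazon",
        "account": "Office Supplies",
        "source_file": "receipt.pdf", "doc_type": "receipt",
    }]))
```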

CDC Cache

Location: .cache/document-ingestion/{slug}.json

Tracks between runs:

  • docs_processed, records_extracted, duplicates_caught
  • exceptions_flagged, import_ready
  • high_confidence, medium_confidence, low_confidence
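
One way the delta against the prior run could be computed is sketched below. The flat JSON layout of the cache file and the function name `update_cdc_cache` are assumptions; the metric names and the cache path pattern are taken from this section.

```python
import json
from pathlib import Path
from typing import Dict

METRICS = [
    "docs_processed", "records_extracted", "duplicates_caught",
    "exceptions_flagged", "import_ready",
    "high_confidence", "medium_confidence", "low_confidence",
]


def update_cdc_cache(
    slug: str,
    current: Dict[str, int],
    cache_root: str = ".cache/document-ingestion",
) -> Dict[str, int]:
    """Return per-metric deltas vs. the cached prior run, then overwrite the cache."""
    cache_file = Path(cache_root) / f"{slug}.json"
    previous: Dict[str, int] = {}
    if cache_file.exists():
        previous = json.loads(cache_file.read_text())

    # Missing metrics count as 0, so the first run reports its own totals.
    delta = {m: current.get(m, 0) - previous.get(m, 0) for m in METRICS}

    cache_file.parent.mkdir(parents=True, exist_ok=True)
    snapshot = {m: current.get(m, 0) for m in METRICS}
    cache_file.write_text(json.dumps(snapshot, indent=2))
    return delta
```

Because the function writes to disk, it is worth confirming the cache location with the user before the first run, as the security scan notes.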

Output File Naming

DocIngestion_{slug}_{YYYYMMDD}.xlsx
DocIngestion_{slug}_{YYYYMMDD}_QBO_Import.csv
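
The naming pattern can be reproduced with a small helper. `output_names` is a hypothetical function shown only to make the pattern concrete.

```python
from datetime import date
from typing import Tuple


def output_names(slug: str, run_date: date) -> Tuple[str, str]:
    """Return (workbook name, import CSV name) for a given slug and run date."""
    stamp = run_date.strftime("%Y%m%d")
    base = f"DocIngestion_{slug}_{stamp}"
    return f"{base}.xlsx", f"{base}_QBO_Import.csv"


if __name__ == "__main__":
    print(output_names("glowlabs", date(2024, 3, 31)))
```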

Agent Instructions

Standard Run

  1. Collect input files from user (directory path or individual files)
  2. Get client slug (sb-paulson, glowlabs, etc.)
  3. Run the pipeline. If QBO auth is not set, use --no-qbo-coa
  4. Deliver summary:
    • Records extracted, dups caught, exceptions
    • HIGH/MED/LOW confidence split
    • Path to Excel + import CSV
  5. Walk user through Exceptions tab — those need action before import

Month-End Close Integration

  • Run AFTER bank statement download, BEFORE bank reconciliation
  • Use --input-dir pointing to client's document drop folder
  • Import CSV goes into QBO → then run bank-reconciliation.py

Exception Handling

  • PDFs with no extractable text → LOW confidence + exception flag → send to client for re-scan
  • Image receipts with no tesseract → exception flag → use nano-pdf skill or manual entry
  • Unknown vendors → update VENDOR_MAP in script or add to clients/{slug}/categorization-map.md

Adding New Client Vendor Maps

Edit load_client_vendor_map() in the script:

if slug_lower in ("new-client", "nc"):
    client_map.update({
        "vendor name": "QBO Account Name",
    })

Or create clients/{slug}/categorization-map.md with a markdown table:

| Vendor / Memo Keyword | Primary Account | Notes |
|---|---|---|
| Amazon | Office Supplies | |
| Comcast | Utilities | |
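
A table in this shape could be read into a vendor map roughly as follows. This is a sketch under assumptions: `parse_categorization_map` is a hypothetical name, the first two columns are taken as vendor and account, and keys are lowercased for matching. How the script itself parses these files is not shown in the bundle.

```python
from typing import Dict


def parse_categorization_map(markdown: str) -> Dict[str, str]:
    """Map lowercase vendor keyword -> account from a markdown pipe table."""
    mapping: Dict[str, str] = {}
    seen_separator = False
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # The |---|---| row separates the header from the data rows.
        if cells and all(c and set(c) <= set("-:") for c in cells):
            seen_separator = True
            continue
        if not seen_separator or len(cells) < 2:
            continue  # header row, or a malformed row
        vendor, account = cells[0], cells[1]
        if vendor and account:
            mapping[vendor.lower()] = account
    return mapping


if __name__ == "__main__":
    sample = """
| Vendor / Memo Keyword | Primary Account | Notes |
|---|---|---|
| Amazon | Office Supplies | |
| Comcast | Utilities | |
"""
    print(parse_categorization_map(sample))
```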

Financial Math

All amounts use Python Decimal with ROUND_HALF_UP to 2 decimal places. No float arithmetic.
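
To illustrate why this matters, here is a small sketch of amount parsing with `Decimal` and `ROUND_HALF_UP`. The helper name `to_money` and its handling of `$`, thousands separators, and parenthesised negatives are assumptions, not documented behavior of the script.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_money(raw: str) -> Decimal:
    """Parse a raw amount string and round half-up to 2 decimal places."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    # Accounting-style negatives such as "(45.00)" (assumed input convention)
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = Decimal(cleaned).quantize(CENT, rounding=ROUND_HALF_UP)
    return -value if negative else value


if __name__ == "__main__":
    print(to_money("2.675"))       # half-up on the exact decimal value
    print(round(2.675, 2))         # float result differs due to binary representation
    print(to_money("$1,234.565"))
```

Parsing from the string keeps the value exact, so 2.675 rounds up to 2.68, whereas the float 2.675 is stored slightly below that and rounds down.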
