Accounting Skill

v0.1.0

Process accounting documents — invoices (hóa đơn GTGT), purchase orders, and bank statements. Extract structured data from PDF (digital and scanned), JPG, an...

by @dvnghiem

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for dvnghiem/vn-accounting-skill.

Install the skill "Accounting Skill" (dvnghiem/vn-accounting-skill) from ClawHub.
Skill page: https://clawhub.ai/dvnghiem/vn-accounting-skill
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install vn-accounting-skill

ClawHub CLI

npx clawhub@latest install vn-accounting-skill

Security Scan

VirusTotal

Benign

OpenClaw

Benign

medium confidence

✓

Purpose & Capability

Name/description (accounting OCR) aligns with included scripts and reference docs. The repo contains classification and three extractor scripts plus OCR helpers and bank/invoice/PO references — all expected for the stated functionality. No unrelated credentials, binaries, or config paths are requested.

ℹ

Instruction Scope

SKILL.md and README direct the agent/user to run local scripts (uv run ...) that read input files, perform OCR, and write Excel/JSON outputs. This matches the stated purpose. Two items to note: (1) the README example uses eval on commands generated by classify_document.py. Executing untrusted command strings is risky, so prefer invoking the recommended scripts directly rather than evaluating arbitrary output. (2) Extraction outputs include raw_text and full source_file paths in JSON backups, which may contain sensitive data or reveal local filesystem structure.

ℹ

Install Mechanism

No install spec in registry (instruction-only), but the scripts rely on system packages (tesseract, poppler) and on Python dependencies declared inline via PEP 723 comments, which 'uv run' installs automatically. This is reasonable for OCR tooling, but auto-installing Python packages carries the usual supply-chain risk that system packages alone would not; verify the dependencies and run in a controlled environment.

✓

Credentials

The skill requests no environment variables, credentials, or unrelated system config paths. The code operates on local files and writes local Excel/JSON outputs — consistent with its purpose. There are no obvious requests for network credentials or cloud keys.

✓

Persistence & Privilege

Skill flags are default (always: false) and the skill does not request permanent/privileged presence or modify other skills. It writes output files (Excel/JSON) in user-specified locations, which is appropriate for this tool.

Assessment

This skill appears to be what it says: local OCR plus structured extraction. Before installing or running it:

1) Inspect scripts/ocr_utils.py (not fully shown here) to confirm it does not make network calls or upload data.
2) Do the first runs in a controlled environment, or on copies of sensitive files, because the skill writes JSON backups containing raw OCR text and full source_file paths, which may include PII.
3) Install the system OCR packages (tesseract, poppler) and review the Python packages that 'uv run' will install; consider using a virtual environment.
4) Do not blindly shell-eval commands produced by classify_document (the README shows an eval example). Run the suggested extractor programmatically or by copy/paste instead, to avoid command injection if an attacker can supply malicious files.
5) Use --dry-run to preview parsed JSON before writing files.

If you want higher assurance, share the full contents of scripts/ocr_utils.py and any remaining truncated files so they can be reviewed for network or other unexpected behavior.
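One way to act on the note about raw_text and source_file in the JSON backups is to redact a record before sharing it. The following is a minimal sketch, not part of the skill: it assumes both fields sit at the top level of the backup record, and the function name redact_backup is hypothetical.

```python
from pathlib import Path


def redact_backup(record: dict) -> dict:
    """Return a copy of a backup record that is safer to share.

    Assumes (not confirmed by the skill) that raw_text and source_file
    are top-level keys of the JSON backup.
    """
    cleaned = dict(record)
    # Drop the raw OCR text, which may contain PII.
    cleaned.pop("raw_text", None)
    # Keep only the file name so the local directory layout is not revealed.
    if "source_file" in cleaned:
        cleaned["source_file"] = Path(str(cleaned["source_file"])).name
    return cleaned
```

The original record is left untouched, so the full backup can still be kept locally while the redacted copy is passed on.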

Like a lobster shell, security has layers — review code before you run it.

latest vk9779915hbzx0sf8qgrawe1yp18363gv

155 downloads

0 stars

1 version

Updated 1mo ago

MIT-0

Accounting Skill

Extract structured data from accounting documents (invoices, POs, bank statements) into Excel tracking sheets with JSON backups. Handles digital PDFs, scanned PDFs, and images via automatic OCR.

Prerequisites

Install the system OCR dependencies before first use. See {baseDir}/references/ocr-setup.md for the full guide.

# Ubuntu / Debian
sudo apt install tesseract-ocr tesseract-ocr-vie poppler-utils

# Verify
uv run {baseDir}/scripts/ocr_utils.py check

Quick Start

1. Classify an unknown document

uv run {baseDir}/scripts/classify_document.py /path/to/document.pdf

Returns JSON with type (invoice / po / statement / other), confidence, and a ready-to-run extraction command.

2. Extract an invoice

uv run {baseDir}/scripts/extract_invoice.py /path/to/invoice.pdf -o invoice_tracking.xlsx

Appends to the Excel tracking sheet. Use --dry-run to preview parsed data without writing.

3. Extract a bank statement

uv run {baseDir}/scripts/extract_statement.py /path/to/statement.pdf

Creates statement_{bank}_{date}.xlsx with transactions. Use -o to specify output path.

4. Extract a purchase order

uv run {baseDir}/scripts/extract_po.py /path/to/po.pdf -o po_tracking.xlsx

Tracks delivery dates and flags overdue/urgent POs.
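The overdue/urgent flagging can be pictured with a small date comparison. This is a sketch under stated assumptions: the skill's actual rules live in references/po-fields.md, and the three-day urgent window, the status labels, and the function name delivery_status are all hypothetical.

```python
from datetime import date, timedelta

URGENT_WINDOW_DAYS = 3  # hypothetical window; the skill's real value is not stated


def delivery_status(delivery_date: date, today: date,
                    urgent_days: int = URGENT_WINDOW_DAYS) -> str:
    """Classify a PO by how close its delivery date is to today."""
    if delivery_date < today:
        return "overdue"
    if delivery_date - today <= timedelta(days=urgent_days):
        return "urgent"
    return "on_track"
```

Passing today explicitly, instead of calling date.today() inside the function, keeps the check reproducible when re-running it over old tracking sheets.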

5. Generate empty Excel templates

uv run {baseDir}/scripts/generate_templates.py all -o ~/accounting/

Creates blank tracking sheets: invoice_tracking.xlsx, po_tracking.xlsx, statement_template.xlsx.

Common Options (all extractors)

Flag	Description
--format excel|json|both	Output format (default: both)
--dry-run	Parse and validate only, print JSON to stdout
--json-dir DIR	Directory for JSON backup files
-o FILE	Output Excel file path

Workflow

Single Document

File → classify_document.py → route → extract_*.py → Excel + JSON

Batch Processing

For a folder of mixed documents, classify first, then route:

for f in /path/to/docs/*; do
  uv run {baseDir}/scripts/classify_document.py "$f" --output-dir ~/accounting/
done

Then run the suggested extraction commands from each classification result.
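Rather than evaluating the command string that the classifier emits, the routing step can be done by reading only the type field and building the extractor command as an argument list. The sketch below assumes the classification JSON exposes type at the top level, as described in Quick Start. The mapping table and the function name build_command are illustrative and not part of the skill.

```python
import json
from pathlib import Path
from typing import List, Optional

# Extractor scripts named in the Quick Start; the mapping itself is an assumption.
EXTRACTORS = {
    "invoice": "extract_invoice.py",
    "po": "extract_po.py",
    "statement": "extract_statement.py",
}


def build_command(classification_json: str, document: str,
                  scripts_dir: str) -> Optional[List[str]]:
    """Return an argv list for the matching extractor, or None for other/unknown types."""
    result = json.loads(classification_json)
    script = EXTRACTORS.get(result.get("type"))
    if script is None:
        return None
    # An argv list is passed to subprocess.run without a shell, so nothing
    # from the classifier output is ever interpreted as shell syntax.
    return ["uv", "run", str(Path(scripts_dir) / script), document]
```

The returned list can be handed to subprocess.run(cmd, check=True). Documents classified as other return None and can be set aside for manual handling.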

OCR Strategy

All scripts share {baseDir}/scripts/ocr_utils.py which auto-selects the best extraction method:

Digital PDFs → pdfplumber (fast, no OCR needed)
Scanned PDFs → pdf2image + pytesseract at 300 DPI (fallback when pdfplumber gets <50 chars/page)
Images (JPG/PNG/TIFF) → pytesseract with grayscale preprocessing
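The selection rules above can be summarised as a small decision function. This is a sketch, not the code in ocr_utils.py: it takes per-page character counts as input instead of calling pdfplumber, and it assumes the 50-character threshold is applied to the average across pages. The name choose_method and the returned labels are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
MIN_CHARS_PER_PAGE = 50  # threshold quoted in the list above


def choose_method(path: str, chars_per_page: Optional[List[int]] = None) -> str:
    """Pick an extraction method from the file type and the text found per page."""
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "pytesseract"
    if suffix == ".pdf":
        counts = chars_per_page or []
        # Assumption: a PDF counts as digital when its pages average at
        # least MIN_CHARS_PER_PAGE characters of embedded text.
        if counts and sum(counts) / len(counts) >= MIN_CHARS_PER_PAGE:
            return "pdfplumber"
        return "pdf2image+pytesseract"
    raise ValueError(f"unsupported file type: {suffix or path}")
```

For example, choose_method("statement.pdf", [1200, 950]) selects pdfplumber, while a PDF whose pages yield almost no embedded text falls through to the OCR path.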

Each result includes ocr_confidence and extraction_confidence percentages. Documents below 85% are automatically flagged needs_review.
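The review flag can be sketched as a threshold check on the two scores. The text does not say which score is compared against 85%, so this example assumes the lower of the two decides. The function name flag_for_review is hypothetical.

```python
REVIEW_THRESHOLD = 85.0  # percent, as stated above


def flag_for_review(result: dict) -> dict:
    """Return a copy of the result with needs_review set from its confidence scores."""
    scores = (result["ocr_confidence"], result["extraction_confidence"])
    flagged = dict(result)
    # Assumption: the weaker of the two scores determines the flag.
    flagged["needs_review"] = min(scores) < REVIEW_THRESHOLD
    return flagged
```

Under this assumption, a document with clean OCR but a shaky field extraction is still routed to a human.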

Validation Rules

Invoices: Subtotal + VAT = Total (auto-checks math), duplicate detection by invoice number + vendor
Bank statements: Opening balance + credits − debits = closing balance
POs: Delivery date tracking with overdue/urgent alerts
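The invoice and statement rules are plain arithmetic identities, which the sketch below checks with Decimal to avoid float rounding. It is illustrative only: the tolerance of 1 currency unit, the field names invoice_number and vendor, and all function names are assumptions, not taken from the skill.

```python
from decimal import Decimal
from typing import Iterable, Set, Tuple

TOLERANCE = Decimal("1")  # hypothetical rounding tolerance, in currency units


def invoice_totals_match(subtotal: str, vat: str, total: str,
                         tolerance: Decimal = TOLERANCE) -> bool:
    """Check Subtotal + VAT = Total within a rounding tolerance."""
    return abs(Decimal(subtotal) + Decimal(vat) - Decimal(total)) <= tolerance


def statement_balances(opening: str, credits: Iterable[str], debits: Iterable[str],
                       closing: str, tolerance: Decimal = TOLERANCE) -> bool:
    """Check opening balance + credits - debits = closing balance."""
    expected = (Decimal(opening)
                + sum((Decimal(c) for c in credits), Decimal("0"))
                - sum((Decimal(d) for d in debits), Decimal("0")))
    return abs(expected - Decimal(closing)) <= tolerance


def is_duplicate(invoice: dict, seen: Set[Tuple[str, str]]) -> bool:
    """Detect repeats by (invoice number, vendor); records the key when it is new."""
    key = (invoice["invoice_number"].strip().upper(),
           invoice["vendor"].strip().upper())
    if key in seen:
        return True
    seen.add(key)
    return False
```

For instance, a subtotal of 1000000 with VAT of 100000 matches a total of 1100000, while a total of 1200000 fails the check.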

Reference Documents

Read these for field schemas, Vietnamese format details, and validation logic:

{baseDir}/references/invoice-fields.md — Vietnamese VAT invoice fields, tax rates, patterns
{baseDir}/references/bank-formats.md — Vietnamese bank names, transaction formats, amount patterns
{baseDir}/references/po-fields.md — PO fields, delivery status logic, payment terms
{baseDir}/references/ocr-setup.md — OCR installation, troubleshooting, confidence scoring
