FapiaoClaw

v1.0.0

Process and organize invoice PDFs by fixing extensions, removing duplicates and invalid files, checking for keywords, and calculating total amounts.

by JOE @jie
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal
Benign
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description (fix extensions, dedupe, remove invalids, check keywords, sum amounts) align with the included main.py implementation. The code implements exactly those behaviors (extension rename, MD5 dedupe, remove files with '行程单' in the filename, keyword-based PDF content checks, moving files into invoices/ and invoices/unknown/, and totaling amounts). No unrelated capabilities (network access, unrelated cloud credentials) are present.
Instruction Scope
SKILL.md instructs the agent to run 'python main.py process <keyword> -d <dir>' and to prompt for a missing keyword, which matches main.py's required arguments. However, the runtime behavior is destructive: duplicates are deleted (os.remove), files whose names contain '行程单' are deleted, and the remaining PDFs are moved into invoices/ and invoices/unknown/. SKILL.md neither requires explicit user confirmation nor suggests creating backups before running, which increases the chance of accidental data loss.
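One way to reduce that risk is to preview which files the filename rule would hit before running the skill. The sketch below is not part of main.py: `preview_name_matches` is a hypothetical helper, and it assumes the rule is a plain substring match on the filename, as the scan describes. It only lists files and never deletes them.

```python
from pathlib import Path

# Assumption: the marker string comes from the scan report;
# preview_name_matches is a hypothetical helper, not part of main.py.
MARKER = "行程单"

def preview_name_matches(directory: str, marker: str = MARKER) -> list[Path]:
    """Return PDFs whose filename contains the marker, without deleting anything."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and marker in p.name
    )
```

Calling `preview_name_matches("./fapiao")` and printing the result gives a list to review before any destructive run.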
Install Mechanism
There is no install spec (instruction-only install). requirements.txt lists PyMuPDF, and main.py imports fitz (PyMuPDF). The agent or user must ensure PyMuPDF is installed; no code is downloaded at install time and there are no external network calls in the code. This is low install risk but the dependency must be satisfied before running.
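Because nothing installs the dependency automatically, a quick pre-flight check can confirm that the `fitz` module is importable before the script is run. This is a minimal sketch using only the standard library; `module_available` is a hypothetical helper name, not something shipped with the skill.

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found for import."""
    return importlib.util.find_spec(name) is not None

# main.py imports "fitz", which the PyMuPDF package provides.
if not module_available("fitz"):
    print("PyMuPDF is missing; install it with: pip install PyMuPDF")
```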
Credentials
The skill declares no environment variables, requires no credentials, and main.py does not read environment variables or external config. Requested access is limited to the target directory provided by the user, which is appropriate for this purpose.
Persistence & Privilege
The skill is not always-enabled and does not modify other skills or global agent settings. It operates only on the user-specified filesystem directory. No elevated or persistent privileges are requested.
Assessment
This skill appears to do what it says: it scans PDFs, renames files, removes duplicates, deletes files whose names contain '行程单', moves matched/unknown invoices into invoices/ subfolders, and computes a total. There is no network communication or secret access. However, it performs destructive actions (deleting files and moving others) without prompting for confirmation. Before running: 1) back up the directory or test on a copy; 2) ensure PyMuPDF is installed (pip install PyMuPDF); 3) provide the correct keyword(s) — the script requires a keyword argument and will not guess it; and 4) review sample outputs first (run on a small set) to confirm it behaves as you expect. If you need non-destructive behavior, ask the developer to add a dry-run / confirmation flag or to prompt before deleting files.
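The backup step recommended above can be as simple as copying the target directory to a timestamped sibling folder. The following is a sketch under that assumption; `backup_directory` and the `-backup-<timestamp>` naming are illustrative choices, not part of the skill.

```python
import shutil
import time
from pathlib import Path

def backup_directory(directory: str) -> Path:
    """Copy the directory to a timestamped sibling folder and return the copy's path."""
    src = Path(directory).resolve()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme: <name>-backup-<timestamp> next to the original.
    dest = src.with_name(f"{src.name}-backup-{stamp}")
    shutil.copytree(src, dest)  # raises FileExistsError if dest already exists
    return dest
```

Running the skill against the copy instead of the original is an equally valid way to test it on a small set first.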


latestvk97ezsk43982jyy96xcvrk7j118426v4


SKILL.md

WORKFLOW SKILL — Process and organize invoice PDFs using main.py.

USE FOR: Organizing invoices in a directory when the user provides a keyword. Capabilities include:

  1. Fixing .pdf extensions (files ending in ? or ).
  2. Removing duplicate files based on MD5 hash.
  3. Removing invalid files (e.g. files whose name contains "行程单", meaning "trip itinerary").
  4. Checking PDF contents for the required keyword(s).
  5. Calculating the grand total amount from all valid invoices.
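To illustrate the MD5-based duplicate detection in item 2, here is a minimal, non-destructive sketch. It is not the code from main.py: `md5_of` and `find_duplicates` are hypothetical helpers, and the sketch only reports groups of identical files rather than deleting any of them.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(directory: str) -> dict[str, list[Path]]:
    """Group *.pdf files by MD5 digest and keep only groups with more than one file."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in sorted(Path(directory).glob("*.pdf")):
        if p.is_file():
            groups[md5_of(p)].append(p)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}
```

Files in the same group have byte-identical content, so which copy to keep is a policy decision; the sketch leaves that choice to the caller.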

DO NOT USE FOR: General Python debugging or other non-invoice related tasks.

INVOKES: Terminal tool to run python main.py process <keyword> -d <dir>

Instructions for the Agent

When the user asks you to process or organize invoices (e.g., "整理 ./fapiao 里的发票,关键字是 银河科技;腾讯", i.e. "Organize the invoices in ./fapiao; the keywords are 银河科技;腾讯"):

  1. Extract Arguments:
    • dir: The target directory. If the user does not specify a directory, safely default to the current directory (.) or ask the user if it is ambiguous.
    • keyword: The target keyword to check inside the PDFs (e.g., a company name). Note: Multiple keywords can be provided separated by a semicolon ; (e.g., CompanyA;CompanyB).
  2. Handle Missing Keyword:
    • You MUST prompt the user for the <keyword> if it is missing from their request. Do not guess it.
  3. Execute Command:
    • Construct and run the following terminal command (replacing placeholders with extracted values): python main.py process <keyword> -d <dir>
  4. Report Results:
    • After the script completes, summarize the printed results for the user: files fixed, duplicates removed, files moved to <dir>/invoices, files moved to <dir>/invoices/unknown, and the grand total.
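The argument handling in steps 1 to 3 can be sketched as follows. The helper names `split_keywords` and `build_command` are hypothetical. Building the command as an argument list matters here because `;` is a command separator in most shells, so an unquoted `CompanyA;CompanyB` would be split by the shell before main.py ever saw it.

```python
def split_keywords(raw: str) -> list[str]:
    """Split a semicolon-separated keyword string, dropping empty entries."""
    return [k.strip() for k in raw.split(";") if k.strip()]

def build_command(keyword: str, directory: str = ".") -> list[str]:
    """Build the argv list for the documented command; refuse to guess a keyword."""
    if not split_keywords(keyword):
        raise ValueError("keyword is required; ask the user instead of guessing")
    # Passed as a list, the keyword reaches main.py intact, semicolons included.
    return ["python", "main.py", "process", keyword, "-d", directory]
```

The resulting list can be handed to `subprocess.run(build_command("CompanyA;CompanyB", "./fapiao"), check=True)`, which runs the command without going through a shell.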

Files

3 total
