Fapiao Clipper
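v1.5.2
Fapiao Clipper (发票夹子) v1.4: a local-LLM-driven tool for automatic invoice recognition and reimbursement management. Two-level fallback chain: PyMuPDF text extraction (with the cross-line matching fix) → Qwen3-VL vision model. New: seller/buyer cross-line matching fix and date normalization. Features: 8 risk-control verification checks, one-click Excel export, and merged PDF output.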
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description (local invoice OCR, verification, export) matches the code and files: PDF/OFD handling, PyMuPDF extractor, optional local Ollama Qwen3-VL, SQLite DB, email downloader, blacklist sync and tax-check interactions. Required binary is only python3 and no unrelated cloud credentials are demanded in the skill metadata. The components present are appropriate for the stated purpose.
Instruction Scope
Runtime instructions are limited to cloning the repo, pip installing requirements, configuring config.yaml, and running CLI/web UI commands. The code will read/write files under the user-specified storage path (default ~/Documents/发票夹子) and lets the agent read the SQLite DB. The email watcher will log in to the user's IMAP account, download attachments, extract links from email HTML and follow those links (including re-requesting forms) to retrieve PDFs — behavior needed for 'auto fetch invoices' but it means the skill fetches external URLs and writes downloaded payloads locally. This is within scope but worth noting as an I/O/network surface that can pull arbitrary remote content if present in mail.
Install Mechanism
No automated install spec is embedded in the skill metadata (instruction-only), but SKILL.md / README instruct cloning from the GitHub homepage and pip installing requirements.txt. That is a normal install path. The repo includes executable Python code (not just prose), so installing and running will execute that code locally. No suspicious remote binary downloads or URL-shortener installs are used in the provided install instructions; Docker compose references local services (Ollama) and optional env vars.
Credentials
The skill declares no required env vars in metadata, which aligns with shipping a config file-based tool. Operationally, the tool requires IMAP credentials (username/password) in config to enable mailbox scanning, and may require local Ollama or optional third-party API keys if you choose those providers (config example shows siliconflow.api_key, docker-compose shows DASHSCOPE_API_KEY and OLLAMA_BASE_URL). These credential needs match the features (email scanning, local vision model, optional cloud provider) and are not excessive, but the user must supply them in plaintext config.yaml — treat those credentials as sensitive and protect config file permissions.
Persistence & Privilege
Skill is not force-installed (always: false) and does not request to modify other skills or system-wide agent settings. It stores data locally (SQLite DB, inbox directory, exports) in the user-specified storage path. Allowing the agent to read the DB is intentional for answering invoice queries; autonomous invocation is allowed by platform default but is not combined with additional privileged flags here.
Assessment
This repository appears to implement exactly what it claims: a local invoice OCR and reimbursement helper. Before installing or running it, consider the following:
- Credentials/config: The email watcher expects IMAP username/password in config/config.yaml — these will be stored in plaintext in that file unless you take other measures. Limit file permissions (chmod 600) and keep config out of backups if you don't want credentials stored elsewhere.
- Network I/O: The email component will download attachments and follow links found in email HTML (including re-posting form actions) to retrieve PDFs. This is necessary for auto-download but increases the risk of fetching malicious content embedded in emails. If you enable mail scanning, run it on a trusted machine or in an isolated environment.
- Local services: OCR fallback uses a local Ollama model (Qwen3-VL) or optional cloud providers. If you use a cloud provider (siliconflow etc.), you will need to supply an API key — review those settings in config.yaml and requirements.txt before enabling.
- Privacy claims: The project advertises 'zero upload' — code shows downloads from tax.gov for blacklist/verification and clicking the tax bureau check link; verification likely involves querying public tax-check endpoints. Review verifier.py (not shown fully in the bundle) to confirm it only queries public verification endpoints and does not post invoice contents to third-party services.
- Exposed interfaces: README documents options to expose the Web UI (Tailscale/frp or running in Docker). If you enable remote access, ensure you secure access (VPN/Tailscale, firewall rules) because the Web UI can read the local invoice DB and exports.
- Dependency audit: Inspect requirements.txt and vet dependencies before pip install. Consider installing into a dedicated virtualenv or container.
- Least privilege: If you only need local PDF/image processing (no mail auto-fetch), leave email.enabled=false and run manual scans to reduce network exposure.
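The chmod 600 suggestion in the credentials item can also be applied from Python. This is a minimal sketch, assuming a POSIX system; the helper name `restrict_to_owner` is hypothetical and not part of the repository.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(path: str) -> int:
    """Set a file's mode to 0600 (owner read/write only) and return the new mode bits."""
    p = Path(path).expanduser()
    # S_IRUSR | S_IWUSR == 0o600: no access for group or others.
    os.chmod(p, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(p.stat().st_mode)
```

Usage would look like `restrict_to_owner("config/config.yaml")`. On Windows, `os.chmod` only toggles the read-only flag, so this does not give the same protection there.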
If you want deeper analysis, provide the full verifier.py and the complete requirements.txt so I can check whether any dependency or verification code sends invoice data to third-party endpoints beyond the stated tax-check/blacklist lookups.
Like a lobster shell, security has layers: review code before you run it.
Runtime requirements
🧾 Clawdis
Bins: python3
latest
Invoice Clipper (发票夹子) v1.3
A pure Python CLI tool that works on any agent platform, including OpenClaw, Claude Code, and KimiClaw.
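Major updates in v1.3
Architecture simplified to 2 levels (2026-04-03):
- Level 1: PyMuPDF text extraction (with the cross-line matching fix)
- Level 2: Qwen3-VL vision model (fallback)
- Removed GLM-OCR (unstable) and TurboQuant (never enabled)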
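The page does not show how the cross-line matching fix or the date normalization is implemented. The following is only a sketch of the general idea, assuming a label such as 购买方名称 may be followed by its value on the next line; `match_field` and `normalize_date` are hypothetical names, not the project's functions.

```python
import re
from datetime import date
from typing import Optional


def match_field(text: str, label: str) -> Optional[str]:
    """Return the value after `label`, even when the value sits on the next line."""
    # \s* also matches newlines, so "label:\nvalue" and "label: value" both match.
    m = re.search(re.escape(label) + r"\s*[::]\s*(\S[^\n]*)", text)
    return m.group(1).strip() if m else None


def normalize_date(raw: str) -> Optional[str]:
    """Convert dates such as 2026年4月3日 or 2026/4/3 to YYYY-MM-DD."""
    m = re.search(r"(\d{4})\s*[年/.\-]\s*(\d{1,2})\s*[月/.\-]\s*(\d{1,2})", raw)
    if not m:
        return None
    year, month, day = (int(g) for g in m.groups())
    try:
        # date() rejects impossible values such as month 13.
        return date(year, month, day).isoformat()
    except ValueError:
        return None
```

For example, `match_field("购买方名称:\n某某科技有限公司", "购买方名称")` returns the company name from the following line, and `normalize_date("2026年4月3日")` returns `2026-04-03`.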
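Design philosophy
Invoice → drop into the folder
↓
Extract text from the PDF (two engines available)
↓ falls through to level 2 only if no text can be read
Vision model (triggered only for scanned documents)
↓
Store in the SQLite database
↓
Agent reads the database directly to answer questions ← consumes no API tokens at all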
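Two-level recognition chain (v1.3)
| Level | Engine | Trigger | Notes |
|---|---|---|---|
| Level 1 | PyMuPDF | Searchable PDFs (default) | Millisecond-level, no Java required |
| Level 2 | Ollama Qwen3-VL | Images / scanned documents | ~6.1 GB memory |
Most invoices are handled at level 1, at zero cost.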
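The fallback between the two levels can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `recognize`, `pymupdf_text`, and the `min_chars` threshold are hypothetical, and the level 2 call is passed in as a plain callable because the page does not show how the Ollama model is invoked.

```python
from typing import Callable, Tuple


def pymupdf_text(path: str) -> str:
    """Level 1: concatenate the text layer of every page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the sketch loads without it

    with fitz.open(path) as doc:
        return "".join(page.get_text() for page in doc)


def recognize(
    path: str,
    extract_text: Callable[[str], str],
    run_vision: Callable[[str], str],
    min_chars: int = 20,  # assumed threshold for "a usable text layer exists"
) -> Tuple[int, str]:
    """Return (level_used, text); level 2 runs only when level 1 finds too little text."""
    text = extract_text(path).strip()
    if len(text) >= min_chars:
        return 1, text
    return 2, run_vision(path)
```

Passing the extractors as arguments keeps the vision model out of the path entirely for searchable PDFs, which matches the claim that most invoices stop at level 1.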
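Database (read directly by the agent)
Processed invoices are stored in ~/Documents/发票夹子/invoices.db (SQLite).
The agent can answer natural-language questions by reading the database directly, for example:
- "Which invoices did I receive this month?"
- "Are there any invoices older than 365 days?"
- "Are there any invoices from company XX?"
No additional LLM API calls are needed; the agent can read the database directly within its own context.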
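A question such as "which invoices did I receive this month?" reduces to a date-range query. The schema of invoices.db is not shown on this page, so the table and column names below are assumptions used only for illustration, and the sketch runs against an in-memory database.

```python
import sqlite3

# Hypothetical schema; the real invoices.db layout may differ.
SCHEMA = """
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    seller TEXT,
    invoice_date TEXT,          -- ISO date, YYYY-MM-DD
    amount REAL,
    excluded INTEGER DEFAULT 0
)
"""


def invoices_between(conn: sqlite3.Connection, start: str, end: str) -> list:
    """Return (id, seller, invoice_date, amount) rows dated within [start, end]."""
    cur = conn.execute(
        "SELECT id, seller, invoice_date, amount FROM invoices "
        "WHERE invoice_date BETWEEN ? AND ? ORDER BY invoice_date",
        (start, end),
    )
    return cur.fetchall()


def demo() -> list:
    """Build a tiny in-memory database and query March 2026."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO invoices (seller, invoice_date, amount) VALUES (?, ?, ?)",
        [
            ("Alpha Co", "2026-02-27", 120.0),
            ("Beta Ltd", "2026-03-05", 88.5),
            ("Alpha Co", "2026-03-31", 300.0),
        ],
    )
    return invoices_between(conn, "2026-03-01", "2026-03-31")
```

Storing dates as ISO strings lets plain string comparison in `BETWEEN` order them correctly.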
Command quick reference
| User intent | Command |
|---|---|
| Scan invoices | python3 {baseDir}/main.py scan |
| List invoices | python3 {baseDir}/main.py list |
| Query by date | python3 {baseDir}/main.py query --from 2026-03-01 --to 2026-03-31 |
| Mark as not for reimbursement | python3 {baseDir}/main.py exclude <ID> |
| Restore for reimbursement | python3 {baseDir}/main.py include <ID> |
| Export for reimbursement | python3 {baseDir}/main.py export --from 2026-03-01 --to 2026-03-31 --format both |
| Batch verification | python3 {baseDir}/main.py verify |
| Show problem invoices | python3 {baseDir}/main.py problems |
| Sync blacklist | python3 {baseDir}/main.py blacklist-sync |
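The query and export commands take a `--from`/`--to` pair, and a request like "this month" has to be turned into those two dates. A minimal sketch of that step, using only the flags shown in the table; the helper names `month_range` and `export_args` are hypothetical.

```python
import calendar
from typing import List, Tuple


def month_range(year: int, month: int) -> Tuple[str, str]:
    """Return the first and last day of a month as YYYY-MM-DD strings."""
    last_day = calendar.monthrange(year, month)[1]  # accounts for leap years
    return f"{year:04d}-{month:02d}-01", f"{year:04d}-{month:02d}-{last_day:02d}"


def export_args(year: int, month: int, fmt: str = "both") -> List[str]:
    """Build the argument list that follows main.py for a one-month export."""
    start, end = month_range(year, month)
    return ["export", "--from", start, "--to", end, "--format", fmt]
```

`export_args(2026, 3)` produces the same arguments as the export row in the table.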
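Intent recognition rules
| User says | Command run |
|---|---|
| "扫描发票" (scan invoices) / "整理邮箱" (tidy up the mailbox) | scan |
| "本月发票" (this month's invoices) / "列出所有" (list all) | list |
| "XX商家发票" (invoices from merchant XX) | query --seller XX |
| "导出报销" (export for reimbursement) | export --from ... --to ... --format both |
| "不要报销#3那张" (don't reimburse invoice #3) | exclude 3 |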
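In practice the agent's own language understanding does this mapping, but the rules in the table can be approximated with simple pattern matching. The sketch below is hypothetical (`route_intent` is not a function from the repository) and leaves out the export rule, since its dates are elided in the table.

```python
import re
from typing import List, Optional

# Phrases taken from the table: (trigger phrases, CLI arguments).
KEYWORD_RULES = [
    (("扫描发票", "整理邮箱"), ["scan"]),   # scan invoices / tidy up the mailbox
    (("本月发票", "列出所有"), ["list"]),   # this month's invoices / list all
]


def route_intent(utterance: str) -> Optional[List[str]]:
    """Map a user utterance to main.py arguments, or None if no rule matches."""
    text = utterance.strip()
    # "不要报销#3那张" (don't reimburse invoice #3) -> exclude 3
    m = re.search(r"不要报销\s*#?\s*(\d+)", text)
    if m:
        return ["exclude", m.group(1)]
    # "XX商家发票" (invoices from merchant XX) -> query --seller XX
    m = re.fullmatch(r"(.+?)商家发票", text)
    if m:
        return ["query", "--seller", m.group(1)]
    for keywords, args in KEYWORD_RULES:
        if any(k in text for k in keywords):
            return list(args)
    return None
```

The parameterized rules are checked first so that an utterance carrying an ID or a merchant name is not swallowed by a broader keyword.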
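Using on agent platforms
Zero configuration (recommended for first-time use)
Don't want to edit YAML? Run the interactive wizard and answer a few questions:
python3 {baseDir}/setup_config.py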
Installation
git clone https://github.com/Alan5168/fapiao-clipper.git
cd fapiao-clipper
pip install -r requirements.txt
cp config/config.yaml.template config/config.yaml
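Notes
- Original files are never deleted; exclude only marks an invoice.
- Invoices are valid for 365 days by default (configurable)
- With OpenClaw/Claude Code: once level 1 has finished, the agent reads the database directly and consumes no API calls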
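The 365-day validity rule amounts to a date difference check. A minimal sketch, assuming an invoice counts as expired once strictly more than `validity_days` have passed; the function name and the boundary handling are assumptions, not taken from the project.

```python
from datetime import date


def is_expired(invoice_date: date, today: date, validity_days: int = 365) -> bool:
    """True when more than `validity_days` days separate the invoice date and today."""
    return (today - invoice_date).days > validity_days
```

An invoice dated exactly 365 days ago is still treated as valid here; a stricter policy would use `>=` instead.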
