Install
openclaw skills install invoice-extractor
Extract invoice information from images (PNG, JPG) and PDF files using the Baidu OCR API, then export it to Excel. Supports single-file, multi-file, and whole-directory processing. Use when the user mentions invoices, invoice recognition, extracting invoice data, processing receipts, converting invoices to Excel, or batch processing invoice files.
Get API credentials from https://cloud.baidu.com/product/ocr, then create config.txt in the project root:
BAIDU_API_KEY=your_api_key_here
BAIDU_SECRET_KEY=your_secret_key_here
Or run the setup wizard:
python main_baidu.py --setup
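The config.txt format shown above is plain KEY=VALUE lines. A minimal sketch of how such a file could be read follows; `parse_config` is a hypothetical helper written for illustration, not the loader in the project's config.py, and the handling of blank lines and `#` comments is an assumption.

```python
def parse_config(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Ignore empty lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


sample = "BAIDU_API_KEY=your_api_key_here\nBAIDU_SECRET_KEY=your_secret_key_here\n"
config = parse_config(sample)
print(sorted(config))
```

Splitting on the first `=` only keeps secrets that happen to contain `=` intact.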
Process a single file:
python main_baidu.py -f invoice.pdf
Process multiple files:
python main_baidu.py -f invoice1.pdf -f invoice2.png
Process an entire directory:
python main_baidu.py -i ./fp
Mixed mode (directory + extra files):
python main_baidu.py -i ./fp -f extra_invoice.pdf
Output is saved to the output/ directory as an Excel file.
Task Progress:
- [ ] Check prerequisites (Baidu API credentials)
- [ ] Choose input method (single file / multiple files / directory)
- [ ] Scan and collect invoice files
- [ ] Preview files (optional with --list)
- [ ] Process each file with Baidu OCR
- [ ] Parse invoice fields
- [ ] Export to Excel
- [ ] Verify output
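The "Process each file with Baidu OCR" step depends on the API key and secret key being exchanged for an access token. The sketch below only builds that token request URL, assuming Baidu's standard OAuth client-credentials endpoint. Whether main_baidu.py performs this exchange itself or delegates it to an SDK is not stated here, and `build_token_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed: Baidu AI Cloud's OAuth 2.0 token endpoint (client-credentials grant).
TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Return the URL used to request an access token for the OCR API."""
    query = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": api_key,         # BAIDU_API_KEY
            "client_secret": secret_key,  # BAIDU_SECRET_KEY
        }
    )
    return f"{TOKEN_ENDPOINT}?{query}"


print(build_token_url("your_api_key_here", "your_secret_key_here"))
```

If the credentials are wrong, this token request is the first call that fails, so it is a useful place to start when checking the prerequisites.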
Process one specific invoice file:
python main_baidu.py -f invoice.pdf
python main_baidu.py -f "path/to/invoice.png"
Process several specific files:
python main_baidu.py -f file1.pdf -f file2.png -f file3.jpg
Process all invoice files in a directory (recursive):
python main_baidu.py -i ./my_invoices
python main_baidu.py -i "/path/to/invoice/folder"
Combine directory and individual files:
python main_baidu.py -i ./fp -f ./extra/invoice.pdf
List files without processing:
python main_baidu.py -i ./fp --list
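The input modes above (recursive directory scan, individual files, or both) amount to building one de-duplicated file list. A rough sketch of that collection step follows. `collect_invoice_files` and `SUPPORTED_SUFFIXES` are hypothetical names, the `.jpeg` suffix is an assumption beyond the PNG, JPG, and PDF formats listed, and the tool's actual scanning logic may differ.

```python
from pathlib import Path

# PDF, PNG and JPG are documented; ".jpeg" is an assumption.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def collect_invoice_files(input_dir=None, extra_files=()):
    """Gather supported files from a directory (recursively) plus extra paths."""
    found = []
    if input_dir is not None:
        root = Path(input_dir)
        if root.is_dir():
            # rglob("*") walks every subdirectory; sort for a stable order.
            found.extend(
                sorted(
                    p
                    for p in root.rglob("*")
                    if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
                )
            )
    for name in extra_files:
        path = Path(name)
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            found.append(path)

    # Drop duplicates (same file given via -i and -f) while keeping order.
    seen = set()
    unique = []
    for path in found:
        key = path.resolve()
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique
```

Printing the returned list without processing it is roughly what a `--list` preview would show.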
python main_baidu.py [options]
Input Options:
-f FILE, --file FILE Specify invoice file (can be used multiple times)
-i DIR, --input DIR Input directory (default: fp)
Output Options:
-o DIR, --output DIR Output directory (default: output)
-n NAME, --name NAME Output filename prefix (default: 发票信息)
Authentication Options:
--api-key KEY Baidu API Key
--secret-key KEY Baidu Secret Key
Other Options:
--setup Run configuration wizard
--list List files to be processed without processing
-h, --help Show help
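For reference, the options listed above could be declared with `argparse` roughly as follows. This is a reconstruction from the documented flags and defaults, not the parser in main_baidu.py; `build_parser` is a hypothetical name, and `-h`/`--help` is provided by `argparse` automatically.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(prog="main_baidu.py")

    inputs = parser.add_argument_group("Input Options")
    # action="append" lets -f be repeated: -f a.pdf -f b.png
    inputs.add_argument("-f", "--file", action="append", default=[],
                        metavar="FILE", help="invoice file (repeatable)")
    inputs.add_argument("-i", "--input", default="fp", metavar="DIR",
                        help="input directory")

    outputs = parser.add_argument_group("Output Options")
    outputs.add_argument("-o", "--output", default="output", metavar="DIR",
                         help="output directory")
    outputs.add_argument("-n", "--name", default="发票信息", metavar="NAME",
                         help="output filename prefix")

    auth = parser.add_argument_group("Authentication Options")
    auth.add_argument("--api-key", metavar="KEY", help="Baidu API Key")
    auth.add_argument("--secret-key", metavar="KEY", help="Baidu Secret Key")

    other = parser.add_argument_group("Other Options")
    other.add_argument("--setup", action="store_true",
                       help="run configuration wizard")
    other.add_argument("--list", action="store_true",
                       help="list files without processing")
    return parser


args = build_parser().parse_args(["-i", "./fp", "-f", "extra.pdf", "-n", "March_2024"])
print(args.input, args.file, args.name)
```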
python main_baidu.py -f "invoice.pdf"
python main_baidu.py -f "1.pdf" -f "2.png" -f "3.jpg"
python main_baidu.py -i "./2024_invoices"
python main_baidu.py -i ./fp --list
# Then process:
python main_baidu.py -i ./fp
python main_baidu.py -i ./fp -f ./urgent/invoice.pdf -o ./output -n "March_2024"
python main_baidu.py -i ./fp -o ./reports -n "Q1_Invoice_Summary"
.
├── fp/ # Place invoice files here
├── output/ # Excel output directory
├── src/
│ ├── main_baidu.py # Main entry point
│ ├── baidu_ocr_extractor.py # Baidu OCR wrapper
│ ├── invoice_model.py # Data models
│ ├── excel_exporter.py # Excel export
│ └── config.py # Configuration
├── scripts/ # Utility scripts
│ ├── batch_process.py # Batch processing helper
│ └── verify_export.py # Verify Excel export
├── config.txt # API credentials
├── requirements.txt # Dependencies
├── SKILL.md # This file
├── setup.md # Detailed setup guide
└── examples.md # Usage examples
python scripts/batch_process.py /path/to/invoices
python scripts/verify_export.py output/invoice_info.xlsx
Common issues and solutions:
"Baidu OCR authentication failed"
"No invoice files found"
Use --list to see what files are detected.
"Image format error"
"File not found"
Quote paths that contain spaces: "path/to/file name.pdf"
Set credentials via environment:
export BAIDU_API_KEY="your_key"
export BAIDU_SECRET_KEY="your_secret"
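Credentials can therefore come from three places: the `--api-key`/`--secret-key` flags, the environment variables, and config.txt. One plausible way to resolve them is sketched below. The precedence order (flags, then environment, then config file) is an assumption, since the order the tool actually uses is not documented here, and `resolve_credentials` is a hypothetical helper.

```python
import os


def resolve_credentials(cli_key=None, cli_secret=None, env=None, config=None):
    """Pick credentials from CLI flags, then environment, then config values."""
    env = os.environ if env is None else env
    config = config or {}

    # Assumed precedence: explicit flag > environment variable > config.txt.
    api_key = cli_key or env.get("BAIDU_API_KEY") or config.get("BAIDU_API_KEY")
    secret_key = (
        cli_secret or env.get("BAIDU_SECRET_KEY") or config.get("BAIDU_SECRET_KEY")
    )

    if not api_key or not secret_key:
        raise ValueError("Baidu API credentials are missing")
    return api_key, secret_key
```

Failing early with a clear message when either value is missing keeps the error separate from a later authentication failure at the OCR service.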
Create a script for monthly processing:
#!/bin/bash
MONTH=$(date +%Y%m)
python main_baidu.py \
-i "/invoices/$MONTH" \
-o "/reports/$MONTH" \
-n "Invoice_Report_$MONTH"
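On systems without bash, the same monthly run could be driven from Python. The sketch below only builds the argument list, reusing the paths and name prefix from the script above; `monthly_command` is a hypothetical helper, and running main_baidu.py from the current directory is an assumption.

```python
import sys
from datetime import date


def monthly_command(today=None):
    """Build the main_baidu.py command line for the given (or current) month."""
    today = today or date.today()
    month = today.strftime("%Y%m")  # same format as `date +%Y%m`
    return [
        sys.executable, "main_baidu.py",
        "-i", f"/invoices/{month}",
        "-o", f"/reports/{month}",
        "-n", f"Invoice_Report_{month}",
    ]


print(monthly_command(date(2024, 3, 15))[2:])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues for paths that contain spaces.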