Install
openclaw skills install pdf-contract-redactor
PDF contract redaction tool. Use when the user needs to redact sensitive information from scanned PDF contracts. The tool performs OCR to extract text, identifies field names and their corresponding values, and redacts only the values while keeping field names visible. Supports the Alibaba Cloud OCR API for accurate Chinese text recognition.
Redact sensitive values from scanned PDF contracts while preserving field names.
Convert PDF pages to high-resolution PNG images (200 DPI) for OCR.
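The rendering step can be sketched with PyMuPDF, which the install line in this document already lists. This is a minimal illustration, not the code in scripts/redact_contract.py: the function names, the output file naming, and the use of a zoom matrix are assumptions.

```python
from pathlib import Path

PDF_POINTS_PER_INCH = 72  # PDF user space is defined at 72 units per inch


def zoom_for_dpi(dpi: float) -> float:
    """Return the scale factor that renders a 72-DPI PDF page at `dpi`."""
    return dpi / PDF_POINTS_PER_INCH


def render_pages(pdf_path: str, out_dir: str, dpi: int = 200) -> list[str]:
    """Render every page of `pdf_path` to a PNG in `out_dir`; return the paths."""
    import fitz  # PyMuPDF, imported here so zoom_for_dpi works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    zoom = zoom_for_dpi(dpi)
    paths = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            # Scale both axes equally so the image keeps the page aspect ratio.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            target = out / f"page_{index + 1:03d}.png"  # hypothetical naming
            pix.save(str(target))
            paths.append(str(target))
    return paths
```

At 200 DPI the zoom factor is 200 / 72, roughly 2.78, so an A4 page (595 points wide) renders about 1653 pixels wide.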
Call Alibaba Cloud OCR API to get:
For each field in the field list:
For each matched value:
The following fields are searched and their values are redacted:
python scripts/redact_contract.py <input.pdf> <access_key_id> <access_key_secret> [output.pdf]
Example:
python scripts/redact_contract.py contract.pdf LTAIxxx xxx contract_redacted.pdf
<name>_redacted.pdf: Redacted PDF with values covered
<name>_fields.json: JSON file listing all matched field-value pairs
Uses Alibaba Cloud "通用文字识别-高精度版" (General Text Recognition, High-Precision Edition; RecognizeAdvanced API):
https://ocr.aliyuncs.com
# For a field at (fx0, fy0, fx1, fy1)
# Look for values that are:
# 1. To the right: vx0 > fx1 and |vy0 - fy0| < field_height * 2
# 2. Below: vy0 > fy1 and vx0 >= fx0 - field_width * 0.3
# Choose the closest match
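The matching rules above could be written out roughly as follows. This is a sketch under stated assumptions: boxes are (x0, y0, x1, y1) tuples with the origin at the top-left, and "closest" is taken to mean the smallest Euclidean distance between the field's nearest edge and the candidate's top-left corner. The source does not specify the distance metric, and find_value_box is a hypothetical name.

```python
from math import hypot
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


def find_value_box(field: Box, candidates: Sequence[Box]) -> Optional[Box]:
    """Return the candidate box closest to `field`, looking right and below."""
    fx0, fy0, fx1, fy1 = field
    field_width = fx1 - fx0
    field_height = fy1 - fy0

    best = None
    best_distance = float("inf")
    for box in candidates:
        vx0, vy0 = box[0], box[1]
        # Rule 1: starts to the right of the field, on roughly the same line.
        is_right = vx0 > fx1 and abs(vy0 - fy0) < field_height * 2
        # Rule 2: starts below the field, not too far to the left of it.
        is_below = vy0 > fy1 and vx0 >= fx0 - field_width * 0.3
        if is_right:
            distance = hypot(vx0 - fx1, vy0 - fy0)  # assumed metric
        elif is_below:
            distance = hypot(vx0 - fx0, vy0 - fy1)  # assumed metric
        else:
            continue
        if distance < best_distance:
            best, best_distance = box, distance
    return best
```

For a field at (100, 100, 200, 120), a candidate starting at (220, 102) satisfies rule 1 and would be preferred over a candidate starting at (100, 300), which satisfies rule 2 but lies much farther away.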
OCR returns coordinates in image space (200 DPI).
Convert to PDF space (72 DPI) using scale factor: scale = 72 / 200 = 0.36
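As a small worked example of that conversion, the helper below scales an OCR bounding box from image pixels to PDF points. The function name and the (x0, y0, x1, y1) box layout are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def image_to_pdf_box(box: Box, image_dpi: float = 200, pdf_dpi: float = 72) -> Box:
    """Scale a box from image pixel space to PDF point space."""
    scale = pdf_dpi / image_dpi  # 72 / 200 = 0.36 for the defaults
    x0, y0, x1, y1 = box
    return (x0 * scale, y0 * scale, x1 * scale, y1 * scale)
```

With the default settings, a box at (100, 200, 300, 400) in the 200 DPI image maps to approximately (36, 72, 108, 144) in PDF points.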
pip install pymupdf pillow requests