Pdf Contract Redactor
Security checks across malware telemetry and agentic risk
Overview
The skill's behavior matches its stated purpose of PDF contract redaction, but its output may not be safely redacted, and it writes unredacted sensitive values to a results file and to stdout.
Use this only if sending contract pages to Alibaba Cloud OCR is acceptable. Before sharing any output, independently verify that redactions are irreversible, and protect or delete the generated _fields.json file and any logs containing extracted values.
VirusTotal
65/65 vendors flagged this skill as clean.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A user may share the output believing sensitive contract data is removed, while the original content may still be recoverable under the overlay.
The redaction step visually covers values with black rectangles and saves the PDF; the code does not show a destructive PDF redaction, flattening, or removal of underlying page/image content.
page.draw_rect(rect, color=(0, 0, 0), fill=(0, 0, 0)) ... doc.save(output_path)
Use true PDF redaction or rasterize/flatten the final pages, then verify that hidden text/images cannot be extracted before sharing.
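The recommendation above can be sketched as follows. This is a minimal illustration, assuming the skill uses PyMuPDF (the `draw_rect` and `doc.save` calls in the evidence suggest this, but the report does not confirm it). The function names `redact_destructively` and `find_leaks`, and the `rects_by_page` structure, are hypothetical and do not come from the skill.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

RectTuple = Tuple[float, float, float, float]


def redact_destructively(input_path: str, output_path: str,
                         rects_by_page: Dict[int, List[RectTuple]]) -> None:
    """Remove the content under each rectangle instead of drawing over it.

    rects_by_page (hypothetical) maps a 0-based page index to a list of
    (x0, y0, x1, y1) rectangles in page coordinates.
    """
    import fitz  # PyMuPDF; imported here so find_leaks works without it

    doc = fitz.open(input_path)
    for page_index, rects in rects_by_page.items():
        page = doc[page_index]
        for rect in rects:
            # Mark the area; fill is the colour painted once content is removed.
            page.add_redact_annot(fitz.Rect(rect), fill=(0, 0, 0))
        # Deletes the text under the marked areas. How overlapping images are
        # treated depends on the `images` argument and the PyMuPDF version.
        page.apply_redactions()
    # garbage=4 drops unreferenced objects so removed content is not retained.
    doc.save(output_path, garbage=4, deflate=True)
    doc.close()


def find_leaks(page_texts: Iterable[str],
               sensitive_values: Sequence[str]) -> List[str]:
    """Return every sensitive value still present in the extracted page text."""
    # Strip whitespace so values broken across lines or spaces still match.
    normalized = ["".join(text.split()) for text in page_texts]
    leaks = []
    for value in sensitive_values:
        needle = "".join(value.split())
        if needle and any(needle in text for text in normalized):
            leaks.append(value)
    return leaks
```

After saving, one way to check the result is `find_leaks((page.get_text() for page in fitz.open(output_path)), values)`. An empty list is necessary but not sufficient: values embedded in page images are not covered by a text check and still need separate inspection.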
If OCR fails or fields are missed, the user could receive and share a PDF that still contains sensitive contract information.
OCR/API failures return an empty list, but the main flow still creates an output PDF and reports completion, so the output can be incomplete or unchanged while appearing successfully redacted.
except Exception as e: print(f"API call failed: {e}"); return [] ... redactor.create_redacted_pdf(output_pdf, field_values) ... print(f"Done! Output: {output_pdf}")
Fail closed on OCR errors or zero/low match counts, require user review of unmatched fields, and clearly warn when the redaction is incomplete.
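A fail-closed check along these lines could run before any PDF is written. This is a sketch only: `RedactionIncomplete` and `check_redaction_coverage` are hypothetical names, and it assumes the caller knows which field names it expected OCR to find. It also assumes the OCR call re-raises its error rather than returning an empty list.

```python
from typing import Iterable, List


class RedactionIncomplete(Exception):
    """Raised when the output must not be presented as redacted."""


def check_redaction_coverage(expected_fields: Iterable[str],
                             matched_fields: Iterable[str]) -> List[str]:
    """Return the matched field names, or raise if any expected field is missing."""
    expected = list(expected_fields)
    matched = set(matched_fields)
    if not matched:
        # Zero matches is treated as a failure, not as "nothing to redact".
        raise RedactionIncomplete("OCR returned no fields; no PDF should be written")
    missing = [name for name in expected if name not in matched]
    if missing:
        raise RedactionIncomplete(
            "unmatched fields need review: " + ", ".join(missing)
        )
    return expected
```

Calling this before `create_redacted_pdf` means a failed or partial OCR pass stops the run with an explicit error instead of ending in a "Done!" message.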
The redacted PDF may be accompanied by files or logs containing the same sensitive values the user intended to hide.
The tool persists unredacted extracted values in a JSON results file and prints some of them to stdout, which may enter logs or the agent transcript.
data = [{"field_name": fv.field_name, "value": fv.value, ...}] ... print(f" [{fv.field_name}] = {fv.value[:30]}")
Make the field-value JSON optional and protected, avoid printing sensitive values by default, and instruct users to delete or secure generated metadata files.
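One possible shape for this, using only the standard library: log a masked placeholder instead of the value, and create the results file with owner-only permissions. The helper names `mask` and `write_private_json` are hypothetical, and the permission bits are only meaningful on POSIX systems.

```python
import json
import os
from typing import Dict, List


def mask(value: str) -> str:
    """Describe a value without revealing any of its characters."""
    return f"<redacted, {len(value)} chars>"


def write_private_json(path: str, records: List[Dict[str, str]]) -> None:
    """Write records to a new file readable and writable by the owner only."""
    # O_EXCL refuses to reuse an existing file whose permissions may be looser.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

A log line would then read `print(f" [{fv.field_name}] = {mask(fv.value)}")`. The length is still disclosed; if that is too much, the placeholder can be a fixed string.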
Contract pages, including sensitive data, are transmitted to Alibaba Cloud for OCR.
The script sends base64-encoded page images to Alibaba Cloud OCR. This is disclosed and necessary for the stated OCR workflow, but it is external processing of contract content.
url = "https://ocr.aliyuncs.com" ... "ImageURL": f"data:image/png;base64,{image_base64}" ... requests.post(url, params=params, json=body, timeout=60)
Use this only when Alibaba Cloud OCR processing is allowed by the user's privacy, legal, and contract-handling requirements.
Cloud credentials could be visible to local system users or retained in command history.
The skill requires Alibaba Cloud AccessKey credentials for its stated OCR purpose, but passing secrets as command-line arguments can expose them in shell history or process listings.
python scripts/redact_contract.py <input.pdf> <access_key_id> <access_key_secret> [output.pdf]
Use a least-privileged OCR-only key and prefer safer secret handling such as environment variables, a credential manager, or a scoped integration.
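Reading the key pair from the environment keeps it out of shell history and process listings. The sketch below is illustrative: `load_ocr_credentials` is a hypothetical helper, and the variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` are an assumption here, not something the skill defines.

```python
import os
from typing import Mapping, Tuple


def load_ocr_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the AccessKey pair from the environment instead of argv."""
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID", "")
    key_secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", "")
    if not key_id or not key_secret:
        # Name the variables in the error, never their values.
        raise RuntimeError(
            "set ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET"
        )
    return key_id, key_secret
```

With a helper like this, the command line would carry only the input and output paths. The `env` parameter exists so the function can be exercised with a plain dictionary in tests.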
