Install
openclaw skills install deepread-piiRedact 14 types of PII from PDFs, images, and text using context-aware AI that removes data irreversibly with black bars, supporting multi-language and audit...
openclaw skills install deepread-piiYour agent processes documents. Some of those documents contain names, SSNs, medical records, credit card numbers. Before that data flows to another API, a database, a teammate, or an LLM — it needs to be clean.
Input: "Invoice to: John Smith, 123 Oak St, SSN: 456-78-9012, Card: 4532-1234-5678-9012"
Output: "Invoice to: ██████████, ██████████, SSN: ███████████, Card: ███████████████████"
One API call. Document in, redacted copy + detection report back.
DeepRead PII detects 14 types of personally identifiable information using context-aware AI — not regex. It knows "Dr. Sarah Chen" on a hospital letterhead is the physician (skip), but "Sarah Chen" on a patient intake form is the patient (redact). No regex rules to write. No word lists to maintain. No false positive floods.
This skill instructs the agent to POST your file to
https://api.deepread.tech, poll for results, and download the redacted copy. No system files are modified and no elevated permissions are requested.
deepread-ocr, fill forms with deepread-form-fill, then redact the originalsSign up and create an API key:
# Visit the dashboard
https://www.deepread.tech/dashboard
# Or use this direct link
https://www.deepread.tech/dashboard/?utm_source=clawhub
Save your API key:
export DEEPREAD_API_KEY="sk_live_your_key_here"
Add to your clawdbot.config.json5:
{
skills: {
entries: {
"deepread-pii": {
enabled: true
// API key is read from DEEPREAD_API_KEY environment variable
// Do NOT hardcode your API key here
}
}
}
}
Option A: With Webhook (Recommended)
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf" \
-F "webhook_url=https://your-app.com/webhooks/pii"
# Returns immediately:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued"
}
# Your webhook receives results when processing completes
Option B: Poll for Results
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@patient_record.pdf"
# Returns immediately:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued"
}
# Poll until completed:
curl https://api.deepread.tech/v1/pii/550e8400-e29b-41d4-a716-446655440000 \
-H "X-API-Key: $DEEPREAD_API_KEY"
Replaces all detected PII with solid black bars. Text is physically removed from the PDF content stream. Copy-paste cannot recover it.
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@contract.pdf"
Response when completed:
{
"id": "550e8400-...",
"status": "completed",
"progress_percent": 100,
"redacted_file_url": "https://...(signed URL)...",
"report": {
"id": "550e8400-...",
"page_count": 5,
"processing_time_ms": 3421,
"pii_detected": {
"NAME": {
"count": 3,
"pages": [1, 2],
"confidence_avg": 0.92
},
"PHONE": {
"count": 2,
"pages": [1],
"confidence_avg": 0.89
},
"EMAIL": {
"count": 1,
"pages": [2],
"confidence_avg": 0.97
}
},
"total_redactions": 6,
"redaction_policy": "black_bar",
"confidence_threshold_used": 0.85,
"below_threshold_count": 0
}
}
Works on scanned documents, photos of IDs, screenshots — any PNG or JPEG.
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@drivers_license.png"
Vision AI detects PII in the image, locates exact positions, and draws black bars over the text.
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@notes.txt"
Before: Patient John Smith, SSN 456-78-9012, called from (555) 867-5309
After: Patient ██████████, SSN ███████████, called from ██████████████
Supports documents in English, Chinese, Spanish, Hindi, and Arabic.
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@documento.pdf" \
-F "language=es"
Supported languages: en (default), zh, es, hi, ar
deepread-ocr skill for that, then redactDocument → AI Detection → Threshold Filter → Context Validation → Redact
Layer 1: AI Detection
Layer 2: Confidence Threshold
below_threshold_count for manual reviewLayer 3: Context-Aware Validation
admissions@university.edu → institutional email, SKIPjohn.smith@gmail.com → personal email, REDACT| PII Type | Examples |
|---|---|
SSN | Social Security numbers, national IDs |
CREDIT_CARD | Card numbers, transaction IDs tied to a person |
EMAIL | Personal email addresses (not institutional) |
PHONE | Phone numbers (all country formats) |
NAME | Person names (all cultures/languages) |
ADDRESS | Physical addresses, postal codes |
IP_ADDRESS | IPv4 and IPv6 addresses |
DATE_OF_BIRTH | Birth dates (only when labeled as DOB — not random dates) |
PASSPORT_NUMBER | Passport numbers |
DRIVER_LICENSE | Driver's licenses, registration numbers |
BANK_ACCOUNT | Account numbers, routing numbers |
IBAN | International Bank Account Numbers |
URL | Personal URLs and web addresses |
MEDICAL_RECORD | Medical record numbers, patient IDs, hospital IDs |
Auth: X-API-Key: YOUR_KEY
Content-Type: multipart/form-data
Rate Limit: 10 requests per 60 seconds
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
file | File | Yes | — | PDF, PNG, JPEG, or TXT (max 50MB) |
language | string | No | "en" | "en", "zh", "es", "hi", "ar" |
webhook_url | string | No | — | HTTPS URL for completion notification |
Response (200 OK):
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued"
}
Errors:
| Status | Code | Meaning |
|---|---|---|
| 400 | INVALID_REQUEST | Bad request parameters |
| 400 | UNSUPPORTED_FORMAT | File type not supported |
| 400 | EMPTY_DOCUMENT | File is empty |
| 400 | DOCUMENT_CORRUPTED | File is corrupt or unreadable |
| 400 | PASSWORD_PROTECTED | Document is password-protected |
| 401 | UNAUTHORIZED | Invalid or missing API key |
| 413 | FILE_TOO_LARGE | Exceeds 50MB size limit |
| 429 | RATE_LIMITED | Rate limit exceeded |
| 500 | INTERNAL_ERROR | Server error (retry or contact support) |
Auth: X-API-Key: YOUR_KEY
Rate Limit: 60 requests per 60 seconds
Poll until status is completed or failed.
Response (completed):
{
"id": "550e8400-...",
"status": "completed",
"progress_percent": 100,
"redacted_file_url": "https://...(signed URL)...",
"report": {
"id": "550e8400-...",
"page_count": 3,
"processing_time_ms": 4200,
"pii_detected": {
"NAME": {
"count": 4,
"pages": [1, 2, 3],
"confidence_avg": 0.93,
"confidence_min": 0.87,
"below_threshold": 0,
"allowed": 0
},
"SSN": {
"count": 1,
"pages": [1],
"confidence_avg": 0.98,
"confidence_min": 0.98,
"below_threshold": 0,
"allowed": 0
}
},
"total_redactions": 5,
"redaction_policy": "black_bar",
"confidence_threshold_used": 0.85,
"below_threshold_count": 0
},
"error": null
}
Response (failed):
{
"id": "550e8400-...",
"status": "failed",
"progress_percent": 0,
"redacted_file_url": null,
"report": null,
"error": {
"code": "DOCUMENT_CORRUPTED",
"message": "Unable to process the document"
}
}
Statuses: queued → processing → completed or failed
import requests
import time
API_KEY = "sk_live_YOUR_KEY"
BASE = "https://api.deepread.tech"
# Submit document for redaction
with open("patient_record.pdf", "rb") as f:
resp = requests.post(
f"{BASE}/v1/pii/redact",
headers={"X-API-Key": API_KEY},
files={"file": f},
)
job_id = resp.json()["id"]
# Poll with backoff
delay = 3
while True:
time.sleep(delay)
result = requests.get(
f"{BASE}/v1/pii/{job_id}",
headers={"X-API-Key": API_KEY}
).json()
if result["status"] == "completed":
# Download redacted file
redacted_url = result["redacted_file_url"]
redacted_pdf = requests.get(redacted_url).content
with open("patient_record_redacted.pdf", "wb") as f:
f.write(redacted_pdf)
# Review detection report
report = result["report"]
print(f"Redacted {report['total_redactions']} PII instances")
for pii_type, summary in report["pii_detected"].items():
print(f" {pii_type}: {summary['count']} found on pages {summary['pages']}")
break
elif result["status"] == "failed":
print(f"Failed: {result['error']['message']}")
break
delay = min(delay * 1.5, 15)
import fs from "fs";
const API_KEY = "sk_live_YOUR_KEY";
const BASE = "https://api.deepread.tech";
// Submit document
const form = new FormData();
form.append("file", fs.createReadStream("contract.pdf"));
const { id: jobId } = await fetch(`${BASE}/v1/pii/redact`, {
method: "POST",
headers: { "X-API-Key": API_KEY },
body: form,
}).then((r) => r.json());
// Poll with backoff
let delay = 3000;
let result;
do {
await new Promise((r) => setTimeout(r, delay));
result = await fetch(`${BASE}/v1/pii/${jobId}`, {
headers: { "X-API-Key": API_KEY },
}).then((r) => r.json());
delay = Math.min(delay * 1.5, 15000);
} while (!["completed", "failed"].includes(result.status));
if (result.status === "completed") {
console.log(`Redacted file: ${result.redacted_file_url}`);
console.log(`Total redactions: ${result.report.total_redactions}`);
}
# Redact a PDF
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf"
# Redact a scanned image
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@id_card.png"
# Redact with language hint
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@documento.pdf" \
-F "language=es"
# Check status
curl https://api.deepread.tech/v1/pii/JOB_ID \
-H "X-API-Key: $DEEPREAD_API_KEY"
Use deepread-ocr to extract data, then redact the original before archiving:
# Step 1: Extract structured data (keeps the data you need)
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@claim.pdf" \
-F 'schema={"type":"object","properties":{"claim_number":{"type":"string"},"amount":{"type":"number"}}}'
# Step 2: Redact PII from original (clean copy for filing)
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@claim.pdf"
Strip PII before sending document content to any AI model:
# 1. Redact the document
resp = requests.post(f"{BASE}/v1/pii/redact",
headers={"X-API-Key": API_KEY},
files={"file": open("report.pdf", "rb")})
job_id = resp.json()["id"]
# 2. Wait for completion, download redacted version
# 3. Now safe to send to any LLM for analysis
Use deepread-form-fill to complete a form, then redact a copy for records:
# Step 1: Fill the form
curl -X POST https://api.deepread.tech/v1/form-fill \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@w4_form.pdf" \
-F 'form_fields={"full_name":"Jane Smith","ssn":"456-78-9012"}'
# Step 2: Redact the filled form for internal records
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@w4_filled.pdf"
curl -X POST https://api.deepread.tech/v1/pii/redact \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf" \
-F "webhook_url=https://your-app.com/webhooks/pii"
Only use polling if you cannot expose a webhook endpoint.
The report tells you exactly what was found and where:
report = result["report"]
# Check if anything was below confidence threshold
if report["below_threshold_count"] > 0:
print(f"Warning: {report['below_threshold_count']} detections below threshold")
print("Consider manual review of the original document")
# See breakdown by type
for pii_type, summary in report["pii_detected"].items():
print(f"{pii_type}: {summary['count']} on pages {summary['pages']}")
Always specify the language for better detection accuracy:
-F "language=zh" # Chinese
-F "language=es" # Spanish
-F "language=hi" # Hindi
-F "language=ar" # Arabic
Poll every 3-5 seconds with backoff:
delay = 3
while True:
time.sleep(delay)
result = check_status(job_id)
if result["status"] in ("completed", "failed"):
break
delay = min(delay * 1.5, 15) # cap at 15s
Upgrade: https://www.deepread.tech/dashboard/billing?utm_source=clawhub
UNSUPPORTED_FORMATSolution: Only PDF, PNG, JPEG, and TXT are supported. Convert other formats first.
EMPTY_DOCUMENTSolution: File has no content. Check the file is not corrupted or zero-bytes.
FILE_TOO_LARGESolution: Compress the file or split into smaller documents. Max 50MB.
Possible causes:
language parameter)Possible causes:
below_threshold_count)https://api.deepread.tech — no other endpoints are contactedDEEPREAD_API_KEY secret — rotate if compromisedDeepRead is three tools with one API key:
| Skill | What it does | Install |
|---|---|---|
deepread-ocr | Extract text + structured JSON from any document. 97%+ accuracy. | clawhub install uday390/deepread-ocr |
deepread-form-fill | Fill any PDF form with AI vision — scanned forms, government PDFs | clawhub install uday390/deepread-form-fill |
deepread-pii | Redact sensitive data. This skill. | You're here |
Common pipeline: Extract data (OCR) → Fill target form (Form Fill) → Redact originals for filing (PII)
Ready to start? Get your free API key at https://www.deepread.tech/dashboard/?utm_source=clawhub