Install
openclaw skills install docstrange
Document extraction API by Nanonets. Convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.
Get your API key: https://docstrange.nanonets.com/app
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Response:
{
  "success": true,
  "record_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "result": {
    "markdown": {
      "content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
    }
  }
}
# Visit the dashboard
https://docstrange.nanonets.com/app
Save your API key:
export DOCSTRANGE_API_KEY="your_api_key_here"
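Once the key is exported, client code can read it from the environment rather than hard-coding it. The following is a minimal sketch, not part of the skill itself: the helper name `auth_headers` is hypothetical, and the only things taken from this document are the `DOCSTRANGE_API_KEY` variable name, the `Bearer` header format used in the curl examples, and the `"your_api_key_here"` placeholder.

```python
import os
from typing import Mapping

PLACEHOLDER = "your_api_key_here"  # placeholder value used in this document's examples


def auth_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build the Authorization header from DOCSTRANGE_API_KEY.

    Hypothetical helper: raises early if the variable is missing, empty,
    or still set to the documentation placeholder.
    """
    key = env.get("DOCSTRANGE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("DOCSTRANGE_API_KEY is not set to a real key")
    return {"Authorization": f"Bearer {key}"}
```

Failing before any request is sent makes a missing key easier to diagnose than an authentication error returned by the server.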
Recommended: Use environment variables (most secure):
{
  skills: {
    entries: {
      "docstrange": {
        enabled: true,
        // API key loaded from environment variable DOCSTRANGE_API_KEY
      },
    },
  },
}
Alternative: Store in config file (use with caution):
{
  skills: {
    entries: {
      "docstrange": {
        enabled: true,
        env: {
          DOCSTRANGE_API_KEY: "your_api_key_here",
        },
      },
    },
  },
}
Security Note: If storing API keys in ~/.openclaw/openclaw.json:
chmod 600 ~/.openclaw/openclaw.json
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Access content: response["result"]["markdown"]["content"]
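As an illustration of that access path, the sketch below parses the sample sync response shown earlier in this document and pulls out the markdown text. The helper name `markdown_content` is hypothetical, and treating `success` and `status` as the signals to check before reading the result is an assumption based only on that sample body.

```python
import json

# Sample body copied from the sync response example in this document.
SAMPLE_RESPONSE = """
{
  "success": true,
  "record_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "result": {
    "markdown": {
      "content": "# Invoice\\n\\n**Invoice Number:** INV-2024-001..."
    }
  }
}
"""


def markdown_content(response: dict) -> str:
    """Return the markdown text, or raise if the extraction did not complete."""
    if not response.get("success") or response.get("status") != "completed":
        raise ValueError(f"extraction not completed: status={response.get('status')!r}")
    return response["result"]["markdown"]["content"]


response = json.loads(SAMPLE_RESPONSE)
content = markdown_content(response)
print(content)
```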
Simple field list:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@invoice.pdf" \
-F "output_format=json" \
-F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
-F "include_metadata=confidence_score"
With JSON schema:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@invoice.pdf" \
-F "output_format=json" \
-F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
Response with confidence scores:
{
  "result": {
    "json": {
      "content": {
        "invoice_number": "INV-2024-001",
        "total_amount": 500.00
      },
      "metadata": {
        "confidence_score": {
          "invoice_number": 98,
          "total_amount": 99
        }
      }
    }
  }
}
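One way to use these scores is to flag fields that fall below a review threshold. This is a minimal sketch under stated assumptions: the helper `low_confidence_fields` and the default threshold of 90 are hypothetical, and the scores are assumed to be on a 0 to 100 scale because the sample above shows 98 and 99.

```python
def low_confidence_fields(response: dict, threshold: int = 90) -> dict:
    """Return {field: score} for every field scoring below the threshold.

    Assumes the response shape shown above; a response without confidence
    metadata yields an empty dict.
    """
    payload = response["result"]["json"]
    scores = payload.get("metadata", {}).get("confidence_score", {})
    return {name: score for name, score in scores.items() if score < threshold}


# Sample response copied from the example above.
sample = {
    "result": {
        "json": {
            "content": {"invoice_number": "INV-2024-001", "total_amount": 500.00},
            "metadata": {
                "confidence_score": {"invoice_number": 98, "total_amount": 99}
            },
        }
    }
}

needs_review = low_confidence_fields(sample, threshold=99)
```

With the sample data, a threshold of 99 flags only `invoice_number`, while the default of 90 flags nothing.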
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@table.pdf" \
-F "output_format=csv" \
-F "csv_options=table"
For documents >5 pages, use async and poll:
Queue the document:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@large-document.pdf" \
-F "output_format=markdown"
# Returns: {"record_id": "12345", "status": "processing"}
Poll for results:
curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY"
# Returns: {"status": "completed", "result": {...}}
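The queue-then-poll flow above can be sketched as a small loop. Everything here beyond the URL, the `Authorization` header, and the `"processing"` and `"completed"` status values is an assumption: the function names, the 2-second interval, and the 30-attempt cap are hypothetical, and any status other than `"processing"` is treated as final because this document does not list other values.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://extraction-api.nanonets.com/api/v1"


def http_fetch(record_id: str, api_key: str) -> dict:
    """Issue the GET request from the curl example and decode the JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/extract/results/{record_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def poll_until_done(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the status is no longer "processing"."""
    for _ in range(max_attempts):
        body = fetch()
        if body.get("status") != "processing":
            return body
        sleep(interval)
    raise TimeoutError(f"still processing after {max_attempts} attempts")


# Offline demonstration with canned responses instead of real HTTP calls.
canned = iter([
    {"record_id": "12345", "status": "processing"},
    {"record_id": "12345", "status": "processing"},
    {"status": "completed", "result": {}},
])
final = poll_until_done(lambda: next(canned), sleep=lambda seconds: None)
```

In real use, `fetch` would be something like `lambda: http_fetch("12345", api_key)`; passing it in as a callable keeps the loop testable without network access.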
Get element coordinates for layout analysis:
-F "include_metadata=bounding_boxes"
Extract document structure (sections, tables, key-value pairs):
-F "json_options=hierarchy_output"
Enhanced table and number formatting:
-F "markdown_options=financial-docs"
Guide extraction with prompts:
-F "custom_instructions=Focus on financial data. Ignore headers."
-F "prompt_mode=append"
Request multiple formats in one call:
-F "output_format=markdown,json"
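The option fragments above are all plain form fields, so they can be assembled programmatically before sending. The sketch below is illustrative only: `build_form_fields` is a hypothetical helper, and it assumes that field lists and schemas are sent as JSON text while preset names such as `hierarchy_output` are sent as plain strings, which is how the curl examples in this document pass them.

```python
import json
from typing import Optional, Sequence, Union


def build_form_fields(
    output_formats: Sequence[str],
    json_options: Optional[Union[str, list, dict]] = None,
    include_metadata: Optional[str] = None,
    custom_instructions: Optional[str] = None,
    prompt_mode: Optional[str] = None,
) -> dict:
    """Assemble the non-file form fields for an extract request."""
    # Multiple formats are requested as one comma-separated value.
    fields = {"output_format": ",".join(output_formats)}
    if json_options is not None:
        # Strings pass through unchanged; lists and dicts become JSON text.
        fields["json_options"] = (
            json_options if isinstance(json_options, str) else json.dumps(json_options)
        )
    if include_metadata is not None:
        fields["include_metadata"] = include_metadata
    if custom_instructions is not None:
        fields["custom_instructions"] = custom_instructions
    if prompt_mode is not None:
        fields["prompt_mode"] = prompt_mode
    return fields


fields = build_form_fields(
    ["markdown", "json"],
    json_options=["invoice_number", "date"],
    include_metadata="confidence_score",
)
```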
| Document Size | Endpoint | Notes |
|---|---|---|
| <=5 pages | /extract/sync | Immediate response |
| >5 pages | /extract/async | Poll for results |
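The table's rule can be expressed as a one-line selector. This is a sketch, with the hypothetical name `choose_endpoint`; how the page count is obtained is left to the caller, since this document does not describe a way to query it.

```python
SYNC_PAGE_LIMIT = 5  # threshold from the table above


def choose_endpoint(page_count: int) -> str:
    """Pick the endpoint path for a document with the given page count."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return "/extract/sync" if page_count <= SYNC_PAGE_LIMIT else "/extract/async"
```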
JSON Extraction:
- ["field1", "field2"]: quick extractions
- {"type": "object", ...}: strict typing, nested data
Confidence Scores:
- include_metadata=confidence_score
{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "date": {"type": "string"},
    "vendor": {"type": "string"},
    "total": {"type": "number"},
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {"type": "string"},
          "quantity": {"type": "number"},
          "price": {"type": "number"}
        }
      }
    }
  }
}
{
  "type": "object",
  "properties": {
    "merchant": {"type": "string"},
    "date": {"type": "string"},
    "total": {"type": "number"},
    "items": {
      "type": "array",
      "items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
    }
  }
}
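Schemas like the two above can also be reused on the client side to sanity-check what comes back. The sketch below is a deliberately small type checker, not a full JSON Schema validator and not something the API is documented to provide: the function `type_errors` is hypothetical, it handles only the `object`, `array`, `string`, and `number` types used in these examples, and it ignores properties missing from the data.

```python
def type_errors(value, schema: dict, path: str = "$") -> list:
    """Return a list of "path: expected type" messages for mismatched values."""
    kind = schema.get("type")
    errors = []
    if kind == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:  # absent properties are not reported
                errors += type_errors(value[name], sub_schema, f"{path}.{name}")
    elif kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        for index, item in enumerate(value):
            errors += type_errors(item, schema.get("items", {}), f"{path}[{index}]")
    elif kind == "string":
        if not isinstance(value, str):
            errors.append(f"{path}: expected string")
    elif kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{path}: expected number")
    return errors


# Receipt schema copied from the example above.
RECEIPT_SCHEMA = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "number"},
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
            },
        },
    },
}

# Made-up extraction result with two deliberate type mismatches.
extracted = {
    "merchant": "Cafe",
    "total": "12.50",
    "items": [{"name": "Latte", "price": 4.5}, {"name": 3, "price": 1}],
}
problems = type_errors(extracted, RECEIPT_SCHEMA)
```

For the made-up data, `problems` lists `$.total` and `$.items[1].name`, the two values whose types do not match the schema.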
Important: Documents uploaded to DocStrange are transmitted to https://extraction-api.nanonets.com and processed on external servers.
Before uploading sensitive documents:
Best practices:
- Use file_url with publicly accessible URLs instead of uploading large files directly
- Use the placeholder "your_api_key_here" in examples
400 Bad Request:
- Provide file, file_url, or file_base64
Sync Timeout:
- Use the async endpoint and poll /extract/results/{record_id}
Missing Confidence Scores:
- Set json_options (field list or schema)
- Add include_metadata=confidence_score
Authentication Errors:
- Check that the DOCSTRANGE_API_KEY environment variable is set
Before publishing or updating this skill, verify:
- package.json declares requiredEnv and primaryEnv for DOCSTRANGE_API_KEY
- package.json lists API endpoints in endpoints array
- Examples use placeholder values ("your_api_key_here") not real keys
- No real API keys in SKILL.md or package.json