Install
openclaw skills install mar-docstrange
Document extraction via SkillBoss API Hub. Convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"document_url": "https://example.com/document.pdf"}}'
Response:
{
  "data": {
    "result": {
      "record_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "completed",
      "markdown": {
        "content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
      }
    }
  }
}
Visit the SkillBoss dashboard to obtain your API key.
Save your API key:
export SKILLBOSS_API_KEY="your_api_key_here"
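For callers that use Python instead of curl, the same headers can be built from the exported variable. This is a minimal sketch: `auth_headers` is a hypothetical helper name, not part of the skill, and it only mirrors the two headers shown in the curl examples.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from the SKILLBOSS_API_KEY environment variable."""
    api_key = env.get("SKILLBOSS_API_KEY", "").strip()
    if not api_key:
        # Fail early instead of sending a request with an empty bearer token.
        raise RuntimeError("SKILLBOSS_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.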
Recommended: Use environment variables (most secure):
{
  skills: {
    entries: {
      "docstrange": {
        enabled: true,
        // API key loaded from environment variable SKILLBOSS_API_KEY
      },
    },
  },
}
Alternative: Store in config file (use with caution):
{
  skills: {
    entries: {
      "docstrange": {
        enabled: true,
        env: {
          SKILLBOSS_API_KEY: "your_api_key_here",
        },
      },
    },
  },
}
Security Note: If storing API keys in ~/.openclaw/openclaw.json:
chmod 600 ~/.openclaw/openclaw.json
Convert a document to markdown:
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"document_url": "https://example.com/document.pdf"}}'
Access content: response["data"]["result"]["markdown"]["content"]
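The access path above raises a KeyError if any level is missing, for example when a job is still processing. A defensive version might look like the sketch below; `markdown_content` is a hypothetical helper, and the key path is taken from the "Access content" line.

```python
def markdown_content(response):
    """Return the markdown text from a parsed response, or None if absent."""
    node = response
    # Walk data -> result -> markdown -> content, stopping at the first gap.
    for key in ("data", "result", "markdown", "content"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```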
Simple field list:
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "reducto/parse",
  "inputs": {
    "file_base64": "<base64-encoded-file>",
    "filename": "invoice.pdf",
    "output_format": "json",
    "json_options": ["invoice_number", "date", "total_amount", "vendor"],
    "include_metadata": "confidence_score"
  }
}'
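The `<base64-encoded-file>` placeholder has to be produced from the raw file bytes. One way to assemble the same request body in Python is sketched here; `build_field_list_body` is a hypothetical name, and the field names are copied from the curl example above.

```python
import base64
import json


def build_field_list_body(file_bytes, filename, fields):
    """Return the JSON request body for a field-list extraction."""
    payload = {
        "model": "reducto/parse",
        "inputs": {
            # Standard base64 of the raw bytes, decoded to text for JSON.
            "file_base64": base64.b64encode(file_bytes).decode("ascii"),
            "filename": filename,
            "output_format": "json",
            "json_options": list(fields),
            "include_metadata": "confidence_score",
        },
    }
    return json.dumps(payload)
```

In practice `file_bytes` would come from something like `open("invoice.pdf", "rb").read()`, and the returned string is what curl sends with `-d`.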
With JSON schema:
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "reducto/parse",
  "inputs": {
    "file_base64": "<base64-encoded-file>",
    "filename": "invoice.pdf",
    "output_format": "json",
    "json_options": {"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}
  }
}'
Response with confidence scores:
{
  "data": {
    "result": {
      "json": {
        "content": {
          "invoice_number": "INV-2024-001",
          "total_amount": 500.00
        },
        "metadata": {
          "confidence_score": {
            "invoice_number": 98,
            "total_amount": 99
          }
        }
      }
    }
  }
}
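Confidence scores are most useful when they drive a review step. The sketch below flags fields that fall under a cutoff. It assumes scores are on a 0 to 100 scale, as the sample values suggest, and the default threshold of 90 is an arbitrary illustration, not a documented recommendation. `low_confidence_fields` is a hypothetical helper.

```python
def low_confidence_fields(response, threshold=90):
    """Return {field: score} for fields scoring below the threshold."""
    # Accept the response with or without the outer "data" wrapper.
    result = response.get("data", response).get("result", {})
    scores = result.get("json", {}).get("metadata", {}).get("confidence_score", {})
    return {name: score for name, score in scores.items() if score < threshold}
```

An empty dict means every scored field met the threshold, or that no scores were returned at all.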
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"document_url": "https://example.com/table.pdf"}}'
For documents >5 pages, use async and poll:
Queue the document:
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"file_base64": "<base64-encoded-file>", "filename": "large-document.pdf", "output_format": "markdown", "async": true}}'
# Returns: {"data": {"result": {"record_id": "12345", "status": "processing"}}}
Poll for results:
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"action": "get_result", "record_id": "12345"}}'
# Returns: {"data": {"result": {"status": "completed", ...}}}
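The queue-then-poll flow can be wrapped in a small loop. In this sketch `fetch_status` is a hypothetical callable that sends the poll request (the troubleshooting notes mention `"action": "get_result"` with a `"record_id"`) and returns the inner `result` object. The interval and attempt limit are illustrative defaults, not documented values.

```python
import time


def poll_until_done(fetch_status, record_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status(record_id) until the job leaves the "processing" state."""
    for attempt in range(max_attempts):
        result = fetch_status(record_id)
        if result.get("status") != "processing":
            return result
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"record {record_id} still processing after {max_attempts} attempts")
```

Injecting `sleep` keeps the loop testable without real delays. Any status other than "processing" ends the loop, so the caller should still check for "completed" before reading content.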
Get element coordinates for layout analysis:
"include_metadata": "bounding_boxes"
Extract document structure (sections, tables, key-value pairs):
"json_options": "hierarchy_output"
Enhanced table and number formatting:
"markdown_options": "financial-docs"
Guide extraction with prompts:
"custom_instructions": "Focus on financial data. Ignore headers.",
"prompt_mode": "append"
Request multiple formats in one call:
"output_format": "markdown,json"
| Document Size | Mode | Notes |
|---|---|---|
| <=5 pages | sync (default) | Immediate response |
| >5 pages | "async": true | Poll for results |
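The option fragments and the sync/async rule can be folded into one helper that builds the `inputs` object. This is a sketch under two assumptions the source does not confirm: that these options may be combined in a single request, and that `"async": true` works with `document_url` as well as `file_base64`. `build_inputs` and `SYNC_PAGE_LIMIT` are hypothetical names.

```python
import json

SYNC_PAGE_LIMIT = 5  # per the table above: more than 5 pages should run async


def build_inputs(document_url, page_count=None, **options):
    """Merge optional settings into one inputs object and pick sync or async."""
    inputs = {"document_url": document_url}
    # Drop options left as None so only explicitly chosen settings are sent.
    inputs.update({key: value for key, value in options.items() if value is not None})
    if page_count is not None and page_count > SYNC_PAGE_LIMIT:
        inputs["async"] = True
    return inputs


body = json.dumps({
    "model": "reducto/parse",
    "inputs": build_inputs(
        "https://example.com/document.pdf",
        page_count=12,
        output_format="markdown,json",
        include_metadata="bounding_boxes",
        custom_instructions="Focus on financial data. Ignore headers.",
        prompt_mode="append",
    ),
})
```

Here `body` is the string that would be passed to curl with `-d`. Because `page_count` is 12, the inputs include `"async": true`.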
JSON Extraction:
- Field list ["field1", "field2"]: quick extractions
- JSON schema {"type": "object", ...}: strict typing, nested data
Confidence Scores:
- Add "include_metadata": "confidence_score" to the request inputs
Invoice schema:
{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "date": {"type": "string"},
    "vendor": {"type": "string"},
    "total": {"type": "number"},
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {"type": "string"},
          "quantity": {"type": "number"},
          "price": {"type": "number"}
        }
      }
    }
  }
}
Receipt schema:
{
  "type": "object",
  "properties": {
    "merchant": {"type": "string"},
    "date": {"type": "string"},
    "total": {"type": "number"},
    "items": {
      "type": "array",
      "items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
    }
  }
}
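Even with a schema in the request, it can be worth checking that the returned values have the declared types before using them downstream. The sketch below is a deliberately small checker covering only the `type`, `properties`, and `items` keywords used in the schemas above. It is not a full JSON Schema validator, and `type_mismatches` is a hypothetical helper.

```python
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "array": list,
    "object": dict,
}


def type_mismatches(schema, value, path="$"):
    """Return paths whose values do not match the schema's declared "type"."""
    expected = JSON_TYPES.get(schema.get("type"))
    # bool is a subclass of int in Python, so reject it explicitly for "number".
    wrong_type = expected is not None and (
        not isinstance(value, expected)
        or (schema.get("type") == "number" and isinstance(value, bool))
    )
    if wrong_type:
        return [path]
    problems = []
    if isinstance(value, dict):
        # Only check fields that are present; missing fields are not reported.
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                problems += type_mismatches(sub_schema, value[name], f"{path}.{name}")
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            problems += type_mismatches(schema["items"], item, f"{path}[{index}]")
    return problems
```

Each returned path, such as `$.total`, points at a value worth reviewing by hand or re-extracting.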
Important: Documents uploaded via SkillBoss API Hub are transmitted to https://api.heybossai.com and processed through the SkillBoss infrastructure.
Before uploading sensitive documents:
Best practices:
- Use "async": true for documents >5 pages to avoid timeouts
- Use file_url with publicly accessible URLs instead of uploading large files directly
- Use the placeholder "your_api_key_here" in examples
400 Bad Request:
- Provide file_base64 or file_url
- Check that SKILLBOSS_API_KEY is valid
Sync Timeout:
- Use "async": true for documents >5 pages
- Poll with "action": "get_result" and "record_id"
Missing Confidence Scores:
- Requires json_options (field list or schema)
- Add "include_metadata": "confidence_score"
Authentication Errors:
- Check that the SKILLBOSS_API_KEY environment variable is set
Before publishing or updating this skill, verify:
- package.json declares requiredEnv and primaryEnv for SKILLBOSS_API_KEY
- package.json lists API endpoints in endpoints array
- Examples use placeholders ("your_api_key_here"), not real keys
- … SKILL.md or package.json
- … {} body for Guide mode)