upstage-document-parse

Parse documents (PDF, images, DOCX, PPTX, XLSX, HWP) using Upstage Document Parse API. Extracts text, tables, figures, and layout elements with bounding boxe...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description, required binary (curl), and required environment variable (UPSTAGE_API_KEY) align with a document-parsing integration that calls an external HTTP API. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md instructs the agent to read local document files (e.g., ~/Documents/report.pdf) and POST them to Upstage endpoints — this is consistent with parsing functionality but does involve uploading user files to a third party. Instructions do not direct the agent to read unrelated system files or secrets beyond the declared API key.
Install Mechanism
Instruction-only skill with no install spec and no code files. This minimizes disk-write risk; required runtime tool is curl (reasonable and declared).
Credentials
Only one environment variable (UPSTAGE_API_KEY) is required and is the expected credential for the described API. The SKILL.md does show an optional local openclaw config location for storing the same key, which is consistent with setup.
Persistence & Privilege
always:false (default) and autonomous invocation is allowed (platform default). The skill does not request permanent system presence, nor does it modify other skills or system-wide settings.
Assessment
This skill will upload any document you ask it to parse to Upstage's API using your UPSTAGE_API_KEY. Before installing or using it, review Upstage's privacy and retention policy (SKILL.md notes that results are stored for 30 days and that download URLs expire quickly), and avoid uploading highly sensitive material unless you accept third-party processing. Keep your API key secret, consider using a scoped or short-lived key if possible, and revoke or rotate the key if you stop using the skill. For stricter control, disable autonomous invocation or restrict the skill to interactive sessions.


Current version: v1.0.4


Runtime requirements

📑 Clawdis
Bins: curl
Env: UPSTAGE_API_KEY
Primary env: UPSTAGE_API_KEY

SKILL.md

Upstage Document Parse

Extract structured content from documents using Upstage's Document Parse API.

Supported Formats

PDF (up to 1000 pages with async), PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP, DOCX, PPTX, XLSX, HWP
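A client can reject obviously unsupported files before uploading anything. The sketch below checks a file's extension against the list above; the helper name `is_supported` is hypothetical, and an extension check is only a local heuristic, since the API itself decides what it can parse.

```python
from pathlib import Path

# Extensions taken from the "Supported Formats" list (compared lowercased).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif", ".webp",
    ".docx", ".pptx", ".xlsx", ".hwp",
}


def is_supported(path: str) -> bool:
    """Return True if the file extension appears in the supported list.

    Hypothetical pre-check only: it looks at the name, not the file contents.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("~/Documents/report.PDF"))  # True
print(is_supported("notes.txt"))  # False
```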

Installation

clawhub install upstage-document-parse

API Key Setup

  1. Get your API key from the Upstage Console
  2. Configure the API key:
openclaw config set skills.entries.upstage-document-parse.apiKey "your-api-key"

Or add to ~/.openclaw/openclaw.json:

{
  "skills": {
    "entries": {
      "upstage-document-parse": {
        "apiKey": "your-api-key"
      }
    }
  }
}
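Scripts outside the agent may need to find the same key. The sketch below is a hypothetical helper, assuming the lookup order "UPSTAGE_API_KEY environment variable first, then the skills.entries.upstage-document-parse.apiKey entry in ~/.openclaw/openclaw.json"; that precedence is an illustration, not documented OpenClaw behavior.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Hypothetical lookup order (an assumption): env var first, then OpenClaw config.
CONFIG_PATH = Path("~/.openclaw/openclaw.json").expanduser()
SKILL_NAME = "upstage-document-parse"


def resolve_api_key(config_path: Path = CONFIG_PATH) -> Optional[str]:
    """Return the Upstage API key, or None if it is not configured anywhere."""
    key = os.environ.get("UPSTAGE_API_KEY")
    if key:
        return key
    try:
        config = json.loads(config_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    # Walk skills -> entries -> upstage-document-parse -> apiKey.
    entry = config.get("skills", {}).get("entries", {}).get(SKILL_NAME, {})
    return entry.get("apiKey")
```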

Usage Examples

Just ask the agent to parse your document:

"Parse this PDF: ~/Documents/report.pdf"
"Parse: ~/Documents/report.jpg"

Sync API (Small Documents)

For small documents (recommended < 20 pages).

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | document-parse (latest) or document-parse-nightly |
| document | file | required | Document file to parse |
| mode | string | standard | standard (text-focused), enhanced (complex tables/images), or auto |
| ocr | string | auto | auto (OCR for images only) or force (always OCR) |
| output_formats | string | ['html'] | Any of text, html, markdown, in array format, e.g. ['markdown'] |
| coordinates | boolean | true | Include bounding box coordinates |
| base64_encoding | string | [] | Elements to return as base64, e.g. ["table"], ["figure"] |
| chart_recognition | boolean | true | Convert charts to tables (Beta) |
| merge_multipage_tables | boolean | false | Merge tables that span pages (Beta; max 20 pages if true) |
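When calling the API from Python instead of curl, the parameters above become multipart form fields. The hypothetical helper below serializes list values in the same `['markdown']` string form the curl examples use and omits anything left as None so the defaults in the table apply; sending booleans as the strings "true"/"false" is an assumption.

```python
from typing import Dict, List, Optional


def build_form_fields(
    model: str = "document-parse",
    mode: Optional[str] = None,
    ocr: Optional[str] = None,
    output_formats: Optional[List[str]] = None,
    base64_encoding: Optional[List[str]] = None,
    coordinates: Optional[bool] = None,
    chart_recognition: Optional[bool] = None,
    merge_multipage_tables: Optional[bool] = None,
) -> Dict[str, str]:
    """Build form fields; parameters left as None are omitted (server default)."""
    fields: Dict[str, str] = {"model": model}
    # Plain string parameters are sent unchanged.
    for name, value in (("mode", mode), ("ocr", ocr)):
        if value is not None:
            fields[name] = value
    # Lists become "['html', 'markdown']", matching the curl examples.
    for name, items in (("output_formats", output_formats), ("base64_encoding", base64_encoding)):
        if items is not None:
            fields[name] = str(list(items))
    # Assumption: booleans are accepted as the strings "true" / "false".
    for name, flag in (
        ("coordinates", coordinates),
        ("chart_recognition", chart_recognition),
        ("merge_multipage_tables", merge_multipage_tables),
    ):
        if flag is not None:
            fields[name] = "true" if flag else "false"
    return fields


fields = build_form_fields(mode="enhanced", output_formats=["html", "markdown"])
print(fields)
# {'model': 'document-parse', 'mode': 'enhanced', 'output_formats': "['html', 'markdown']"}
```

The resulting dict can be passed as `data=` to `requests.post` alongside `files={"document": f}`.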

Basic Parsing

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse"

Extract Markdown

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@report.pdf" \
  -F "model=document-parse" \
  -F "output_formats=['markdown']"

Enhanced Mode for Complex Documents

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@complex.pdf" \
  -F "model=document-parse" \
  -F "mode=enhanced" \
  -F "output_formats=['html', 'markdown']"

Force OCR for Scanned Documents

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@scan.pdf" \
  -F "model=document-parse" \
  -F "ocr=force"

Extract Table Images as Base64

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@invoice.pdf" \
  -F "model=document-parse" \
  -F "base64_encoding=['table']"
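Once the response arrives, the encoded images still have to be decoded. The sketch below assumes each requested element carries its image as a base64 string under a `base64_encoding` key and that the image is PNG; verify both against a real response. The helper name `save_base64_images` and the file-naming scheme are hypothetical.

```python
import base64
from pathlib import Path
from typing import List


def save_base64_images(response: dict, out_dir: str, category: str = "table") -> List[Path]:
    """Decode base64 images for one element category and write them to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for element in response.get("elements", []):
        # Assumption: the encoded image lives under "base64_encoding".
        data = element.get("base64_encoding")
        if element.get("category") != category or not data:
            continue
        # File name encodes page and element id; ".png" is an assumed format.
        path = out / f"{category}_p{element.get('page')}_{element.get('id')}.png"
        path.write_bytes(base64.b64decode(data))
        written.append(path)
    return written
```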

Response Structure

{
  "api": "2.0",
  "model": "document-parse-251217",
  "content": {
    "html": "<h1>...</h1>",
    "markdown": "# ...",
    "text": "..."
  },
  "elements": [
    {
      "id": 0,
      "category": "heading1",
      "content": { "html": "...", "markdown": "...", "text": "..." },
      "page": 1,
      "coordinates": [{"x": 0.06, "y": 0.05}, ...]
    }
  ],
  "usage": { "pages": 1 }
}
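The sample coordinates (0.06, 0.05) look like fractions of the page width and height rather than pixels. Assuming that, the sketch below converts an element's polygon into a pixel bounding box for a rendered page of known size; the helper name `to_pixel_bbox` is hypothetical.

```python
from typing import Dict, List, Tuple


def to_pixel_bbox(
    coordinates: List[Dict[str, float]], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Convert relative polygon points into (left, top, right, bottom) pixels.

    Assumes x and y are fractions of the page width and height.
    """
    xs = [point["x"] * width for point in coordinates]
    ys = [point["y"] * height for point in coordinates]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


corners = [{"x": 0.06, "y": 0.05}, {"x": 0.5, "y": 0.05},
           {"x": 0.5, "y": 0.1}, {"x": 0.06, "y": 0.1}]
print(to_pixel_bbox(corners, width=1000, height=2000))  # (60, 100, 500, 200)
```

Taking the min and max of all points also handles polygons that are not axis-aligned rectangles, at the cost of a slightly larger box.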

Element Categories

paragraph, heading1, heading2, heading3, list, table, figure, chart, equation, caption, header, footer, index, footnote
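The category field makes it easy to pull out one kind of element or to rebuild a page while skipping running headers and footers. A minimal sketch, assuming each element has the `category`, `page`, and `content` fields shown in the response structure above; both helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_category(elements: List[dict]) -> Dict[str, List[dict]]:
    """Bucket elements by their "category" field, preserving document order."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for element in elements:
        groups[element["category"]].append(element)
    return dict(groups)


def page_markdown(
    elements: List[dict], page: int, skip: Tuple[str, ...] = ("header", "footer")
) -> str:
    """Join the markdown of one page's elements, dropping skipped categories."""
    parts = [
        element["content"].get("markdown", "")
        for element in elements
        if element["page"] == page and element["category"] not in skip
    ]
    return "\n\n".join(part for part in parts if part)
```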


Async API (Large Documents)

For documents up to 1000 pages. Documents are processed in batches of 10 pages.
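As an illustration of the batching arithmetic, the sketch below lists the page ranges a document would occupy if it were split into contiguous batches of 10 pages. The contiguous split is an assumption; the server decides the actual batches, so treat this only as an estimate (for example, of how many batch results to expect).

```python
from typing import List, Tuple


def batch_ranges(total_pages: int, batch_size: int = 10) -> List[Tuple[int, int]]:
    """Return inclusive 1-based (start_page, end_page) pairs for each batch."""
    if total_pages < 1:
        return []
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


print(batch_ranges(25))  # [(1, 10), (11, 20), (21, 25)]
```

At the 1000-page maximum this gives 100 batches.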

Submit Request

curl -X POST "https://api.upstage.ai/v1/document-digitization/async" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@large.pdf" \
  -F "model=document-parse" \
  -F "output_formats=['markdown']"

Response:

{"request_id": "uuid-here"}

Check Status & Get Results

REQUEST_ID="uuid-here"  # request_id from the submit response
curl "https://api.upstage.ai/v1/document-digitization/requests/$REQUEST_ID" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY"

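The response includes a download_url for each batch. Download URLs expire after 15 minutes (re-fetch the status to get fresh ones); the results themselves remain available for 30 days.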

List All Requests

curl "https://api.upstage.ai/v1/document-digitization/requests" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY"

Status Values

  • submitted: Request received
  • started: Processing in progress
  • completed: Ready for download
  • failed: Error occurred (check failure_message)
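Because both completed and failed are terminal, a polling loop should stop on either one and give up after a deadline. The sketch below takes a caller-supplied `fetch_status` function (for example, one wrapping the GET request above) so it can be tested without network access; the function name `wait_for_request` and the 30-minute default timeout are assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_request(
    fetch_status: Callable[[], Dict],
    interval: float = 5.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until the status is completed; raise if it fails or times out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            # failure_message explains why the request failed.
            raise RuntimeError(status.get("failure_message") or "request failed")
        if clock() >= deadline:
            raise TimeoutError(f"request still {state!r} after {timeout} s")
        sleep(interval)  # submitted or started: wait and try again
```

Injecting `sleep` and `clock` keeps the loop easy to test; in normal use the defaults are the real `time` functions.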

Notes

  • Results stored for 30 days
  • Download URLs expire after 15 minutes (re-fetch status to get new URLs)
  • Documents split into batches of up to 10 pages
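Since results live for 30 days but download URLs expire after 15 minutes, a long-running client can cache the URLs and re-fetch the status once they may have gone stale. A minimal sketch with a hypothetical `BatchUrlCache` class; it assumes the status response lists batches under `batches`, each with a `download_url`, and refreshes one minute early as a safety margin.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional

# Refresh a little before the documented 15-minute expiry.
REFRESH_AFTER = timedelta(minutes=14)


class BatchUrlCache:
    """Hypothetical helper: cache download URLs, re-fetching status when stale."""

    def __init__(
        self,
        fetch_status: Callable[[], Dict],
        now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self._fetch_status = fetch_status
        self._now = now
        self._urls: List[str] = []
        self._fetched_at: Optional[datetime] = None

    def urls(self) -> List[str]:
        now = self._now()
        if self._fetched_at is None or now - self._fetched_at >= REFRESH_AFTER:
            status = self._fetch_status()
            # Assumption: batches appear under "batches" with a "download_url".
            self._urls = [batch["download_url"] for batch in status.get("batches", [])]
            self._fetched_at = now
        return list(self._urls)
```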

Python Usage

import os
import time

import requests

API_BASE = "https://api.upstage.ai/v1/document-digitization"
# Read the key from the environment instead of hard-coding it
api_key = os.environ["UPSTAGE_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}"}

# Sync: small documents (the server times out after 5 minutes)
with open("doc.pdf", "rb") as f:
    response = requests.post(
        API_BASE,
        headers=headers,
        files={"document": f},
        data={"model": "document-parse", "output_formats": "['markdown']"},
        timeout=300,
    )
response.raise_for_status()
print(response.json()["content"]["markdown"])

# Async: large documents
with open("large.pdf", "rb") as f:
    r = requests.post(
        f"{API_BASE}/async",
        headers=headers,
        files={"document": f},
        data={"model": "document-parse", "output_formats": "['markdown']"},
        timeout=300,
    )
r.raise_for_status()
request_id = r.json()["request_id"]

# Poll until the request reaches a terminal state (completed or failed)
while True:
    resp = requests.get(f"{API_BASE}/requests/{request_id}", headers=headers, timeout=30)
    resp.raise_for_status()
    status = resp.json()
    if status["status"] == "completed":
        break
    if status["status"] == "failed":
        raise RuntimeError(status.get("failure_message", "request failed"))
    time.sleep(5)
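After polling reports completed, the per-batch results still need to be downloaded and stitched together. The sketch below assumes the completed status lists batches under `batches` (each with `start_page` and `download_url`), that each downloaded batch is JSON shaped like the sync response, and that markdown was requested in output_formats. `download_json` is a caller-supplied function, for example one wrapping `requests.get(url, timeout=60).json()`.

```python
from typing import Callable, Dict, List


def merge_batch_markdown(status: Dict, download_json: Callable[[str], Dict]) -> str:
    """Download every batch result and join their markdown in page order."""
    # Assumption: each batch has "start_page" and "download_url".
    batches = sorted(status.get("batches", []), key=lambda b: b.get("start_page", 0))
    parts: List[str] = []
    for batch in batches:
        result = download_json(batch["download_url"])
        # Assumption: batch results mirror the sync response's content field.
        parts.append(result.get("content", {}).get("markdown", ""))
    return "\n\n".join(part for part in parts if part)
```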

LangChain Integration

from langchain_upstage import UpstageDocumentParseLoader

loader = UpstageDocumentParseLoader(
    file_path="document.pdf",
    output_format="markdown",
    ocr="auto"
)
docs = loader.load()

Environment Variable (Alternative)

You can also set the API key as an environment variable:

export UPSTAGE_API_KEY="your-api-key"

Tips

  • Use mode=enhanced for complex tables, charts, images
  • Use mode=auto to let the API decide per page
  • Use the async API for documents > 20 pages
  • Use ocr=force for scanned PDFs or images
  • merge_multipage_tables=true combines split tables (max 20 pages with enhanced mode)
  • Results from async API available for 30 days
  • Server-side timeout: 5 minutes per request (sync API)
  • Standard documents process in ~3 seconds
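The tips above can be folded into a small request planner. This is a hypothetical sketch: the 20-page cutoff is this skill's recommendation rather than a hard API limit, and whether a document is scanned or has a complex layout is left to the caller to decide.

```python
SYNC_URL = "https://api.upstage.ai/v1/document-digitization"
ASYNC_URL = "https://api.upstage.ai/v1/document-digitization/async"
SYNC_PAGE_LIMIT = 20  # recommendation from the tips, not a hard API limit


def plan_request(page_count: int, scanned: bool = False, complex_layout: bool = False) -> dict:
    """Pick the endpoint and form fields suggested by the tips (hypothetical)."""
    fields = {"model": "document-parse"}
    if complex_layout:
        fields["mode"] = "enhanced"  # complex tables, charts, images
    if scanned:
        fields["ocr"] = "force"  # scanned PDFs or images
    url = SYNC_URL if page_count <= SYNC_PAGE_LIMIT else ASYNC_URL
    return {"url": url, "data": fields}
```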
