Install
openclaw skills install upstage-schema-generation

Generate a JSON schema for structured document information extraction. Use this skill whenever the user asks to generate or create a schema for a document, including phrases like "스키마 생성해줘" ("generate a schema"), "이 문서에서 뭘 추출할지 스키마 짜줘" ("write a schema for what to extract from this document"), "generate a schema for this document", or any request to define extraction fields for a document. Also trigger when the user provides file paths or a folder path containing documents and wants a schema generated.
Analyze sample documents and automatically generate a JSON schema for use with Information Extraction.
`UPSTAGE_API_KEY` environment variable is required. Get your key at console.upstage.ai.

| Mode | When to use | Latency |
|---|---|---|
| API mode | Default. Fast schema generation via Upstage endpoint. | Low |
| VLM mode (claude-opus-4-6) | When the user wants careful, hand-tuned schemas with precise extraction rules and table-aware design. | High |
Endpoint: POST https://api.upstage.ai/v1/information-extraction/schema-generation
import base64
import json
import os

import requests

# The API key is read from the environment; a missing key raises KeyError.
api_key = os.environ["UPSTAGE_API_KEY"]

# Base64-encode the sample document so it can be sent inline as a data URL.
with open("document.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://api.upstage.ai/v1/information-extraction/schema-generation",
    headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    json={
        "model": "information-extract",
        "messages": [
            {"role": "system", "content": "Generate schema for this invoice document."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": f"data:application/pdf;base64,{b64}"}}
            ]}
        ]
    }
)
response.raise_for_status()  # fail early on HTTP errors

# The generated schema is returned as a JSON string in the message content.
schema = json.loads(response.json()["choices"][0]["message"]["content"])
print(json.dumps(schema, indent=2))
Note: The API model is `information-extract` (not `schema-generate`). The `system` message can guide the schema focus (e.g., "Generate schema about bank_name."). Up to 3 sample images can be provided in the user message.
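The note above mentions a steerable system message and a limit of 3 sample images. A minimal sketch of a request-body builder that enforces that limit might look like the following. The helper name `build_schema_request`, the `MAX_SAMPLES` constant, and the `mime` parameter are hypothetical. The sketch also assumes that multiple samples go in as separate `image_url` entries in one user message and that the system message may be omitted; neither is confirmed by the text above.

```python
import base64

MAX_SAMPLES = 3  # limit stated in the note above


def build_schema_request(sample_bytes, focus=None, mime="application/pdf"):
    """Build the JSON body for a schema-generation request (hypothetical helper).

    sample_bytes: list of raw file contents (bytes), one per sample document.
    focus: optional system-message text that steers the schema.
    """
    if not 1 <= len(sample_bytes) <= MAX_SAMPLES:
        raise ValueError(
            f"expected 1 to {MAX_SAMPLES} samples, got {len(sample_bytes)}"
        )

    # Assumption: each sample becomes its own image_url entry in the user message.
    content = []
    for raw in sample_bytes:
        b64 = base64.b64encode(raw).decode()
        content.append(
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
        )

    messages = []
    if focus:
        # Assumption: the system message is optional and only added when given.
        messages.append({"role": "system", "content": focus})
    messages.append({"role": "user", "content": content})

    return {"model": "information-extract", "messages": messages}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, in place of the inline literal in the example above.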
For carefully designed schemas with precise extraction rules, follow the 4-step VLM workflow.
- `references/vlm-workflow.md` (parameter gathering, document reading, property list, JSON schema conversion)
- `references/schema-design.md` (key naming, descriptions, table handling, blank/duplicate handling)
- `<system-temp>/<input-stem>.schema.json` (e.g., `/tmp/invoice.schema.json`). For inline input with no source file, use `<system-temp>/<timestamp>-schema.json`.
- `response_format` in `upstage-information-extraction`
- `upstage-document-classification` first, then generate per-category schemas
- `upstage-builder`
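
The output-path convention above (`<system-temp>/<input-stem>.schema.json`, or a timestamped name for inline input) could be computed along these lines. This is a sketch only: the function name `schema_output_path` is hypothetical, and the timestamp format is an assumption, since the text does not specify one.

```python
import tempfile
import time
from pathlib import Path


def schema_output_path(source=None, now=None):
    """Return where a generated schema would be saved (hypothetical helper).

    source: path of the input document, or None for inline input.
    now: optional epoch seconds, used only for the timestamped name.
    """
    tmp = Path(tempfile.gettempdir())  # the system temp directory
    if source is not None:
        # <system-temp>/<input-stem>.schema.json
        return tmp / f"{Path(source).stem}.schema.json"
    # Assumption: timestamp formatted as YYYYMMDD-HHMMSS in local time.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    return tmp / f"{stamp}-schema.json"
```

On a system whose temp directory is `/tmp`, `schema_output_path("docs/invoice.pdf")` yields `/tmp/invoice.schema.json`, matching the example given above.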