{"skill":{"slug":"upstage-information-extraction","displayName":"Upstage Information Extraction","summary":"Extract specific named fields from documents using Upstage Information Extraction API with custom JSON schemas (sync/async) or prebuilt models for receipts,...","description":"---\nname: upstage-information-extraction\ndescription: \"Extract specific named fields from documents using the Upstage Information Extraction API with custom JSON schemas (sync/async) or prebuilt models for receipts, invoices, waybills, and bills of lading. Use when the user wants named values such as '청구액' (billed amount), '주문번호' (order number), invoice total, or supplier name, e.g. '영수증에서 금액이랑 날짜 뽑아줘' (pull the amount and date from this receipt), '인보이스 필드 추출해줘' (extract the invoice fields), 'extract invoice number and amount', 'pull structured data from receipts'. DO NOT use for plain text extraction without a schema; use upstage-ocr instead. DO NOT use for full document layout/markdown conversion; use upstage-document-parse instead. For schema design help, pair with upstage-schema-generation.\"\nhomepage: https://console.upstage.ai/api/information-extraction\n---\n\n# Upstage Information Extraction\n\nExtract structured data from documents using custom JSON schemas. 
Also supports prebuilt models for receipts, invoices, and trade documents.\n\n## Quick Start\n\n```python\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n    api_key=os.environ[\"UPSTAGE_API_KEY\"],\n    base_url=\"https://api.upstage.ai/v1/information-extraction\"\n)\n\nresponse = client.chat.completions.create(\n    model=\"information-extract\",\n    messages=[{\n        \"role\": \"user\",\n        \"content\": [{\"type\": \"image_url\", \"image_url\": {\"url\": \"https://example.com/invoice.pdf\"}}]\n    }],\n    response_format={\n        \"type\": \"json_schema\",\n        \"json_schema\": {\n            \"name\": \"invoice_schema\",\n            \"schema\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"invoice_number\": {\"type\": \"string\", \"description\": \"Invoice ID\"},\n                    \"total_amount\": {\"type\": \"string\", \"description\": \"Total amount with currency\"},\n                    \"date\": {\"type\": \"string\", \"description\": \"Invoice date in YYYY-MM-DD\"}\n                }\n            }\n        }\n    }\n)\nprint(response.choices[0].message.content)\n```\n\n**API Key**: Always use `os.environ[\"UPSTAGE_API_KEY\"]`. 
Get your key at [console.upstage.ai](https://console.upstage.ai).\n\n---\n\n## Endpoints\n\n| Mode | Endpoint |\n|------|----------|\n| Sync | `POST https://api.upstage.ai/v1/information-extraction` |\n| Async | `POST https://api.upstage.ai/v1/information-extraction/async` |\n| Status | `GET https://api.upstage.ai/v1/information-extraction/jobs/{job_id}` |\n\n- **OpenAI SDK compatible**: Set `base_url` to `https://api.upstage.ai/v1/information-extraction`\n\n## Parameters\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `model` | string | Yes | `information-extract`, `information-extract-nightly`, or a prebuilt model name |\n| `messages` | array | Yes | Single user message with `image_url` |\n| `response_format` | object | Yes (custom schemas) | Extraction schema (JSON Schema format); omit for prebuilt models |\n| `mode` | string | No | `standard` (default) or `enhanced` |\n| `location` | boolean | No | Return coordinates (default: false) |\n| `confidence` | boolean | No | Return confidence scores (default: false) |\n| `split` | boolean | No | Split multi-document files (default: false) |\n\nThe OpenAI SDK only accepts its own keyword arguments, so pass `mode`, `location`, `confidence`, and `split` through `extra_body` (e.g. `extra_body={\"mode\": \"enhanced\"}`).\n\n## Limits\n\n| Item | Sync | Async |\n|------|------|-------|\n| Max pages | 100 | 1,000 |\n| Max properties | 100 | 5,000 |\n| Max schema length (characters) | 15,000 | 120,000 |\n\n## Schema Rules\n\n- Top-level property types: only `string`, `integer`, `number`, and `array` are allowed (no `object`)\n- No nested arrays\n- Total character length of all property names must be under 10,000\n- For automatic schema generation, use the `upstage-schema-generation` skill\n\n## Response Structure\n\n```json\n{\n  \"choices\": [\n    {\n      \"message\": {\n        \"content\": \"{\\\"invoice_number\\\": \\\"INV-001\\\", \\\"total_amount\\\": \\\"$1,234.56\\\", \\\"date\\\": \\\"2026-01-15\\\"}\"\n      }\n    }\n  ],\n  \"usage\": {\"prompt_tokens\": 500, \"completion_tokens\": 50}\n}\n```\n\n`content` is a JSON string. 
Parse with `json.loads()`.\n\n---\n\n## Prebuilt Models\n\nReady-to-use models that require no schema definition.\n\n| Model | Document Type |\n|-------|--------------|\n| `receipt-extraction` | Receipts |\n| `air-waybill-extraction` | Air waybills |\n| `bill-of-lading-and-shipping-request-extraction` | Bills of lading / shipping requests |\n| `commercial-invoice-and-packing-list-extraction` | Commercial invoices / packing lists |\n| `kr-export-declaration-certificate-extraction` | Korean export declaration certificates |\n\n### Prebuilt Usage Example\n\n```python\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n    api_key=os.environ[\"UPSTAGE_API_KEY\"],\n    base_url=\"https://api.upstage.ai/v1/information-extraction\"\n)\n\nresponse = client.chat.completions.create(\n    model=\"receipt-extraction\",\n    messages=[{\n        \"role\": \"user\",\n        \"content\": [{\"type\": \"image_url\", \"image_url\": {\"url\": \"https://example.com/receipt.jpg\"}}]\n    }]\n)\nprint(response.choices[0].message.content)\n```\n\nPrebuilt models are called without `response_format`.\n\n---\n\n## Async Processing (Large Documents)\n\n```python\nimport os\nimport time\nimport requests\n\napi_key = os.environ[\"UPSTAGE_API_KEY\"]\nheaders = {\"Authorization\": f\"Bearer {api_key}\", \"Content-Type\": \"application/json\"}\n\n# Placeholder: replace with the URL of your document\nFILE_URL = \"https://example.com/invoice.pdf\"\n\n# Same invoice schema as in Quick Start\nschema = {\n    \"type\": \"object\",\n    \"properties\": {\n        \"invoice_number\": {\"type\": \"string\", \"description\": \"Invoice ID\"},\n        \"total_amount\": {\"type\": \"string\", \"description\": \"Total amount with currency\"},\n        \"date\": {\"type\": \"string\", \"description\": \"Invoice date in YYYY-MM-DD\"}\n    }\n}\n\n# 1. Submit async job\nresponse = requests.post(\n    \"https://api.upstage.ai/v1/information-extraction/async\",\n    headers=headers,\n    json={\n        \"model\": \"information-extract\",\n        \"messages\": [{\"role\": \"user\", \"content\": [{\"type\": \"image_url\", \"image_url\": {\"url\": FILE_URL}}]}],\n        \"response_format\": {\"type\": \"json_schema\", \"json_schema\": {\"name\": \"invoice_schema\", \"schema\": schema}}\n    },\n    timeout=60\n)\nresponse.raise_for_status()  # fail fast on auth or validation errors\njob_id = response.json()[\"id\"]\n\n# 2. 
Poll for results, giving up after about 10 minutes so a stuck or failed job cannot loop forever\nfor _ in range(120):\n    resp = requests.get(\n        f\"https://api.upstage.ai/v1/information-extraction/jobs/{job_id}\",\n        headers=headers,\n        timeout=30\n    )\n    resp.raise_for_status()\n    status = resp.json()\n    if status[\"status\"] == \"completed\":\n        print(status[\"choices\"][0][\"message\"][\"content\"])\n        break\n    time.sleep(5)  # wait between polls\nelse:\n    raise TimeoutError(f\"Job {job_id} did not complete; last status: {status['status']}\")\n```\n\n## Output Files\n\n- **Default**: write extracted JSON to `<system-temp>/<input-stem>.extracted.json` (e.g., `/tmp/invoice.extracted.json`). Use `tempfile.gettempdir()` for cross-platform code.\n- **Override**: if the user specifies an output path, use it.\n- **Always print the resolved absolute path** in your response so the user can locate the file.\n\n## Tips\n\n- `enhanced` mode improves accuracy on complex tables/images but is slower.\n- Set `confidence: true` to get per-field confidence scores for quality filtering.\n- Schema design is critical for extraction quality. Use the `upstage-schema-generation` skill for automatic generation.\n- `split: true` is useful when a single file contains multiple documents.\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":326,"installsAllTime":1,"installsCurrent":1,"stars":0,"versions":1},"createdAt":1778085486051,"updatedAt":1778492864292},"latestVersion":{"version":"1.0.0","createdAt":1778085486051,"changelog":"Initial release of Upstage Information Extraction skill.\n\n- Extracts structured named fields from documents via Upstage Information Extraction API using custom JSON schemas or prebuilt models.\n- Supports both synchronous and asynchronous processing for large documents.\n- Includes detailed usage instructions, sample Python code, and schema requirements.\n- Prebuilt models available for receipts, invoices, waybills, bills of lading, and export certificates.\n- Output is saved as a JSON file; absolute path is always reported.\n- Provides tips for model selection, confidence scoring, and multi-document 
splitting.","license":"MIT-0"},"metadata":{"setup":[],"os":null,"systems":null},"owner":{"handle":"upstage-deployment","userId":"s176wgta5t9fny3cxc4x84867x863zhs","displayName":"Upstage Deployment","image":"https://avatars.githubusercontent.com/u/124652596?v=4"},"moderation":null}