{"skill":{"slug":"hwp-reader","displayName":"hwp-reader","summary":"Extract and analyze text, tables, images, and metadata from Korean HWP and HWPX documents, supporting both legacy and modern formats.","description":"# 🐧 HWP Reader — Read & Analyze Korean HWP/HWPX Documents\n\n> Author: 무펭이 🐧 | v1.0.0\n\n## Description\n\nRead and extract text content from Korean HWP (한글) and HWPX files. Supports both legacy HWP format (via pyhwp) and modern HWPX format (ZIP-based XML).\n\n## When to Use\n\n- User asks to read/analyze a .hwp or .hwpx file\n- Government support application forms (정부지원사업 신청서)\n- Any Korean document in Hangul Word Processor format\n\n## How It Works\n\n### HWP Files (Legacy Format)\n```bash\npython3 -c \"\nfrom hwp5.hwp5txt import main\nimport sys\nsys.argv = ['hwp5txt', 'FILE_PATH']\nmain()\n\"\n```\n\n### HWPX Files (Modern XML Format)\n```bash\npython3 -c \"\nimport zipfile\nz = zipfile.ZipFile('FILE_PATH')\n\n# Quick preview text\nif 'Preview/PrvText.txt' in z.namelist():\n    print(z.read('Preview/PrvText.txt').decode('utf-8'))\n\n# Full content from section XMLs\nimport xml.etree.ElementTree as ET\nfor name in sorted(z.namelist()):\n    if name.startswith('Contents/section') and name.endswith('.xml'):\n        root = ET.fromstring(z.read(name))\n        for elem in root.iter():\n            if elem.text and elem.text.strip():\n                print(elem.text.strip())\n\"\n```\n\n## Capabilities\n\n| Feature | HWP | HWPX |\n|---------|-----|------|\n| Text extraction | ✅ pyhwp | ✅ ZIP+XML |\n| Table detection | ⚠️ `<표>` markers | ✅ XML tags |\n| Image extraction | ❌ | ✅ from BinData/ |\n| Metadata | ✅ via hwp5 | ✅ from version.xml |\n\n## Dependencies\n\n- **pyhwp** (`pip install pyhwp`) — installed at `/Users/mupeng/Library/Python/3.9/lib/python/site-packages/hwp5/`\n- **Python 3.9+** — standard library `zipfile`, `xml.etree.ElementTree`\n\n## Limitations\n\n- HWP text extraction loses table structure (shows `<표>` placeholder)\n- HWPX 
Preview/PrvText.txt is truncated to ~1KB; use the section XMLs for full content\n- Complex formatting (colors, fonts, page layout) is not preserved in text mode\n- Encrypted/password-protected HWP files are not supported\n\n## Usage Examples\n\n### Read a government application form\n```\n\"이 HWP 파일 읽어줘: /path/to/신청서.hwp\"\n→ Extract text → Analyze structure → Summarize sections\n```\n\n### Compare two versions\n```\n\"v1.hwp와 v2.hwp 차이점 분석해줘\"\n→ Extract both → Diff content → Report changes\n```\n\n### Fill in a template\n```\n\"이 양식에 우리 사업 내용 채워줘\"\n→ Read template → Identify blanks → Generate content suggestions\n```\n\n---\n\n*🐧 무펭이 — Making Korean documents accessible to AI agents*\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":879,"installsAllTime":2,"installsCurrent":2,"stars":0,"versions":1},"createdAt":1772292818716,"updatedAt":1778993984076},"latestVersion":{"version":"1.0.0","createdAt":1772292818716,"changelog":"Initial publish","license":null},"metadata":null,"owner":{"handle":"mupengi-bot","userId":"s17cb0n67gxg14m41wrqex0hr183j5d2","displayName":"mupengi-bot","image":"https://avatars.githubusercontent.com/u/259087580?v=4"},"moderation":null}