Hwp Extract Pipeline

v1.0.0

HWP/HWPX/PDF extraction pipeline: attempt hwp-reader, then pyhwp, then OCR, with safe fallbacks. Use when agent needs reliable text extraction from Korean HW...

by developheo @heoboong
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal
Benign
OpenClaw
Benign
high confidence
Purpose & Capability
The name and description match the included script: the code implements a pipeline (hwp-reader -> pyhwp -> HWPX parsing -> strings) that extracts text from local HWP/HWPX/PDF files. No unrelated capabilities or extra credentials are requested.
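The fallback order described above can be sketched as a generic chain. This is a minimal illustration, not the skill's actual script: the function name run_pipeline, the stage callables, and the shape of the returned dict are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stage signature: takes a file path, returns text or None.
Extractor = Callable[[str], Optional[str]]


def run_pipeline(path: str, stages: Sequence[Tuple[str, Extractor]]) -> dict:
    """Try each stage in order and return the first non-empty result."""
    errors = {}
    for name, extract in stages:
        try:
            text = extract(path)
        except Exception as exc:
            # A failing stage (missing binary, import error) must not abort the chain.
            errors[name] = str(exc)
            continue
        if text and text.strip():
            return {"method": name, "text": text, "errors": errors}
        errors[name] = "empty output"
    # Every stage failed or produced nothing.
    return {"method": None, "text": "", "errors": errors}
```

Recording which stage produced the text, along with the errors from earlier stages, makes it easier to see when a result came from a low-quality fallback such as 'strings'.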
Instruction Scope
SKILL.md and the script restrict operations to local files and produce JSON output. The script executes local helper binaries (hwp-reader, if present), may run the Python interpreter from a provided or detected venv to import pyhwp, reads the zip/XML contents of HWPX files, and calls the system 'strings' binary as a fallback. It writes <id>_extracted.json to the current working directory and creates a short-lived temporary extractor script when invoking pyhwp. These behaviors are expected for this purpose, but they are worth noting because the skill executes local binaries and writes files.
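For readers who want a feel for the HWPX step and the JSON output, here is a rough sketch using only the Python standard library. It assumes the common HWPX layout in which body text lives in Contents/section*.xml parts inside elements whose local name is "t"; the function names and the JSON field names are hypothetical and are not taken from the skill's script.

```python
import json
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def extract_hwpx_text(path):
    """Collect text from section XML parts inside an HWPX (zip) container."""
    chunks = []
    with zipfile.ZipFile(path) as zf:
        # Assumed layout: Contents/section0.xml, Contents/section1.xml, ...
        # Sorting is lexicographic, which is good enough for a sketch.
        names = sorted(
            n for n in zf.namelist()
            if n.startswith("Contents/section") and n.endswith(".xml")
        )
        for name in names:
            root = ET.fromstring(zf.read(name))
            for el in root.iter():
                # Compare the local tag name so the namespace URI does not matter.
                if el.tag.rsplit("}", 1)[-1] == "t" and el.text:
                    chunks.append(el.text)
    return "\n".join(chunks)


def write_result(doc_id, text, method, out_dir="."):
    """Write <id>_extracted.json; the field names here are illustrative."""
    out = Path(out_dir) / f"{doc_id}_extracted.json"
    payload = {"id": doc_id, "method": method, "text": text}
    out.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
    return out
```

When the input files are untrusted, a hardened XML parser such as the defusedxml package may be a safer choice than the standard-library parser used here.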
Install Mechanism
No install spec; this is an instruction + script bundle only. Nothing is downloaded or extracted from external URLs and no packages are installed by the skill itself.
Credentials
The skill declares no environment variables or credentials. Runtime behavior inspects ~/.openclaw/venv and the current working directory for helper binaries, which is reasonable for locating a venv or workspace-provided hwp-reader binary.
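The helper lookup described above might look roughly like the following. This is a sketch under assumptions: the candidate interpreter paths inside the venv, the PATH fallback, and both function names are hypothetical and may differ from what the script actually does.

```python
import os
import shutil
from pathlib import Path


def find_venv_python(home=None):
    """Return the interpreter inside ~/.openclaw/venv, or None if absent."""
    base = Path(home) if home else Path.home()
    venv = base / ".openclaw" / "venv"
    # Assumed candidate locations (POSIX first, then Windows).
    for rel in ("bin/python", "bin/python3", "Scripts/python.exe"):
        cand = venv / rel
        if cand.is_file() and os.access(cand, os.X_OK):
            return cand
    return None


def find_hwp_reader(cwd=None):
    """Look for an executable hwp-reader in the working directory, then on PATH."""
    cand = Path(cwd or os.getcwd()) / "hwp-reader"
    if cand.is_file() and os.access(cand, os.X_OK):
        return cand
    # Assumption: falling back to PATH is this sketch's choice, not confirmed behavior.
    found = shutil.which("hwp-reader")
    return Path(found) if found else None
```

Because a binary found in the working directory is executed as-is, logging the resolved path before running it gives the operator a chance to confirm that it comes from a trusted location.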
Persistence & Privilege
The 'always' flag is false, and the skill does not request persistent system-wide changes or modify other skills. It writes output files to the working directory only (no system config changes).
Assessment
This skill appears to do exactly what it describes: extract text from local HWP/HWPX/PDF files and save JSON output. Before installing or using it, consider: (1) it may execute a local 'hwp-reader' binary or a Python interpreter from ~/.openclaw/venv, so ensure those binaries are trusted (an attacker who controls the working directory or venv could cause malicious code to run); (2) it writes <id>_extracted.json into the current directory and creates a short-lived temp script when using pyhwp; (3) OCR is mentioned but not implemented in the script (no system OCR tools are invoked). If you run this on untrusted files or in multi-tenant environments, use an isolated container or sandbox and verify that any helper binaries (hwp-reader, venv python) come from trusted sources.



