Docx Toolkit
Advisory
Audited by VirusTotal on Mar 27, 2026.
Overview
Type: OpenClaw Skill
Name: docx-toolkit-zhouli
Version: 1.0.0
The docx-toolkit is a legitimate utility for extracting text, tables, and images from Microsoft Word documents (.docx and .doc). The scripts (extract_text.py, extract_doc_text.py, extract_images.py, and resize_images.py) use standard libraries like python-docx, olefile, and Pillow to perform their stated functions. While the image extraction script includes logic to categorize images based on surrounding text context (e.g., identifying contracts or certificates), this behavior is consistent with the toolkit's stated purpose of document analysis and does not exhibit signs of malicious intent or data exfiltration.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Installing dependencies could pull newer or different package versions than the author tested.
The skill asks users to install live, unpinned Python packages. These dependencies are expected for Word and image processing, but unpinned package installs depend on external package provenance at install time.
pip3 install python-docx olefile Pillow
Install in a virtual environment and pin or review dependency versions if using this on sensitive documents.
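One way to act on the pinning advice is to confirm, before processing sensitive documents, that the installed packages match the versions you reviewed. The sketch below is illustrative only: the version strings in PINNED are placeholder values, not versions the skill author tested, and find_mismatches is a hypothetical helper that is not part of the toolkit.

```python
from importlib import metadata

# Placeholder pins (assumption): replace with the versions you have actually reviewed.
PINNED = {
    "python-docx": "1.1.2",
    "olefile": "0.47",
    "Pillow": "10.4.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that deviates from its pin.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling find_mismatches(PINNED) inside the virtual environment returns an empty dict when every installed version matches its pin. The get_version parameter exists only so the check can be exercised without touching the real environment.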
Original extracted images may be compressed or changed if the command is run without a separate output folder.
The resize helper can modify existing image files when no output directory is provided. This is disclosed and purpose-aligned, but it can reduce image quality or replace originals.
If output_dir is omitted, overwrites in place.
Use an explicit output directory when resizing images unless you intentionally want in-place compression.
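The in-place overwrite risk can also be guarded against in a wrapper script. The following is a minimal sketch, assuming you call your own wrapper before invoking the resize helper; resolve_output_path is a hypothetical function, not part of resize_images.py, and it only computes a safe destination path without resizing anything.

```python
from pathlib import Path


def resolve_output_path(src, output_dir):
    """Return the destination path for a resized copy of `src`.

    Raises ValueError when no output directory is given, or when the output
    directory is the folder that already holds the original image.
    """
    src = Path(src).resolve()
    if output_dir is None:
        raise ValueError("output_dir is required; refusing to overwrite originals")
    out_dir = Path(output_dir).resolve()
    if out_dir == src.parent:
        raise ValueError("output_dir is the source folder; originals would be replaced")
    # Create the separate output folder if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / src.name
```

Writing resized copies to the returned path leaves the extracted originals untouched, so any quality loss from compression stays confined to the copies.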
Extracted text, images, and manifests may reveal confidential content such as contracts, certificates, or personnel information.
The tool can store surrounding document text and image classifications in a manifest. This is useful for review workflows, but it creates local derived context that may include sensitive information from the source document.
image_manifest.json (when --context): maps each image to its context
Keep output folders private, review manifests before sharing them, and avoid sending extracted content to external services unless appropriate.
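Reviewing a manifest before sharing it can be partly automated. The sketch below rests on an assumption about the file layout: the advisory says only that image_manifest.json maps each image to its context, so the code assumes a top-level JSON object keyed by image name and treats each value as opaque. The keyword list and both function names are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical keyword list; tune it to the documents you handle.
SENSITIVE_KEYWORDS = ("contract", "certificate", "salary", "passport")


def flag_sensitive_entries(manifest, keywords=SENSITIVE_KEYWORDS):
    """Return {image: [matched keywords]} for entries whose context mentions a keyword.

    Each context value is serialized back to JSON text before matching, so the
    check works whether the context is a plain string or a nested structure.
    """
    flagged = {}
    for image, context in manifest.items():
        text = json.dumps(context, ensure_ascii=False).lower()
        hits = [kw for kw in keywords if kw in text]
        if hits:
            flagged[image] = hits
    return flagged


def review_manifest(path):
    """Load a manifest file (assumed to be a JSON object) and flag its entries."""
    manifest = json.loads(Path(path).read_text(encoding="utf-8"))
    return flag_sensitive_entries(manifest)
```

A substring match like this produces false positives and misses anything outside the keyword list, so it is a prompt for manual review rather than a replacement for it.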
