Docx Toolkit

Advisory

Audited by VirusTotal on Mar 27, 2026.

Overview

Type: OpenClaw Skill
Name: docx-toolkit-zhouli
Version: 1.0.0

The docx-toolkit is a legitimate utility for extracting text, tables, and images from Microsoft Word documents (.docx and .doc). The scripts (extract_text.py, extract_doc_text.py, extract_images.py, and resize_images.py) use standard libraries like python-docx, olefile, and Pillow to perform their stated functions. While the image extraction script includes logic to categorize images based on surrounding text context (e.g., identifying contracts or certificates), this behavior is consistent with the toolkit's stated purpose of document analysis and does not exhibit signs of malicious intent or data exfiltration.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Note (High Confidence)

ASI04: Agentic Supply Chain Vulnerabilities

What this means

Installing dependencies could pull newer or different package versions than the author tested.

Why it was flagged

The skill asks users to install live, unpinned Python packages. These dependencies are expected for Word and image processing, but unpinned package installs depend on external package provenance at install time.

Skill content

pip3 install python-docx olefile Pillow

Recommendation

Install in a virtual environment and pin or review dependency versions if using this on sensitive documents.
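
One way to act on this recommendation is to record the exact versions that ended up in the virtual environment, so later installs can be pinned to them. This is a minimal sketch, assuming the three packages from the install command are already installed in the active environment. The helper name pin_lines is illustrative and not part of the skill.

```python
from importlib import metadata

# Distribution names taken from the skill's install command.
PACKAGES = ["python-docx", "olefile", "Pillow"]


def pin_lines(names, lookup=metadata.version):
    """Return 'name==version' lines for every package that is installed.

    `lookup` is injectable so the function can be exercised without the
    real packages; by default it queries the active environment.
    """
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here: skip rather than guess a version.
            continue
    return lines


if __name__ == "__main__":
    # Redirect this output to a requirements file to reuse the pins later.
    print("\n".join(pin_lines(PACKAGES)))
```

Saving the printed lines to a requirements file and installing from it with pip3 install -r should reproduce the same versions, which can then be reviewed before the toolkit is used on sensitive documents.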

Note (High Confidence)

ASI02: Tool Misuse and Exploitation

What this means

Original extracted images may be compressed or changed if the command is run without a separate output folder.

Why it was flagged

The resize helper can modify existing image files when no output directory is provided. This is disclosed and purpose-aligned, but it can reduce image quality or replace originals.

Skill content

If output_dir is omitted, overwrites in place.

Recommendation

Use an explicit output directory when resizing images unless you intentionally want in-place compression.
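
A wrapper around the resize step could enforce this by refusing any destination that resolves to the source file. The sketch below is a hypothetical guard, not code from resize_images.py. It only computes and checks paths, and it assumes the resized copy keeps the original file name.

```python
from pathlib import Path


def resolve_output_path(src, output_dir):
    """Return the destination path for a resized copy of `src`.

    Raises ValueError when the destination would be the source file
    itself, so originals are never overwritten by accident.
    """
    src = Path(src).resolve()
    dest = (Path(output_dir) / src.name).resolve()
    if dest == src:
        raise ValueError(f"refusing to overwrite original: {src}")
    return dest
```

Calling resolve_output_path("images/photo.png", "resized") returns a path inside resized, while passing "images" as the output directory raises ValueError instead of silently replacing the original.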

Note (High Confidence)

ASI06: Memory and Context Poisoning

What this means

Extracted text, images, and manifests may reveal confidential content such as contracts, certificates, or personnel information.

Why it was flagged

The tool can store surrounding document text and image classifications in a manifest. This is useful for review workflows, but it creates local derived context that may include sensitive information from the source document.

Skill content

image_manifest.json (when --context): maps each image to its context

Recommendation

Keep output folders private, review manifests before sharing them, and avoid sending extracted content to external services unless appropriate.
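
Keeping an output folder private can be approximated on POSIX systems by removing group and other permissions from the folder and everything in it. This is a minimal sketch with a hypothetical helper name, restrict_to_owner. It assumes the folder contains regular files and directories only, and it does not replace reviewing a manifest before sharing it.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(output_dir):
    """Remove group and other permissions from `output_dir` and its contents.

    Relies on POSIX permission bits; on Windows os.chmod only toggles the
    read-only flag, so this offers little protection there.
    """
    root = Path(output_dir)
    for path in [root, *root.rglob("*")]:
        mode = stat.S_IMODE(path.stat().st_mode)
        # Keep the owner's bits, clear everything for group and others.
        os.chmod(path, mode & stat.S_IRWXU)
```

Running restrict_to_owner on the extraction output folder after each run leaves files such as image_manifest.json readable only by the account that produced them.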