Office Document Extractor
PassAudited by ClawScan on May 4, 2026.
Overview
The skill appears to be a benign offline Office-to-Markdown converter, but it reads and writes local document contents and includes bundled third-party Python code.
This looks consistent with an offline document converter. Before installing, be comfortable running the bundled Python code, use it only on documents you intend to extract, and remember that the generated Markdown may contain sensitive or untrusted text.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running batch mode on a sensitive folder may create Markdown copies of multiple private documents.
Batch mode reads every supported Office file in the user-selected directory and writes Markdown outputs. This is expected for the converter, but users should be aware of the directory-wide local file operation.
files = [f for f in input_path.iterdir() if f.suffix.lower() in supported]
...
out_file = output_path / f"{file.stem}.md"Run it only on intended files or folders, and choose an output directory you are comfortable storing extracted document text in.
If the bundled dependency code were altered upstream or packaged incorrectly, the converter could behave differently than expected.
The skill discloses bundled third-party dependencies. Bundling is purpose-aligned and avoids pip/network installs, but it still means users rely on the integrity of included vendored code.
- **openpyxl/** — Pure Python Excel library (v3.1.5) - **et_xmlfile/** — openpyxl dependency (pure Python)
Prefer versions from a trusted publisher, and review or verify bundled dependency provenance when using the skill on sensitive documents.
Private document contents may be placed into Markdown and then reused in analysis or LLM context; malicious document text could also be mistaken for instructions if not treated as data.
The extracted document text may later be used as model context or indexed. That is the stated purpose, but Office documents can contain sensitive data or untrusted instructions.
Use when the user needs to extract text from Word documents, Excel spreadsheets, or PowerPoint presentations for analysis, indexing, or LLM processing.
Treat extracted Markdown as sensitive document data, and do not let instructions inside converted documents override the user's actual request.
