docx-pdf-knowledge-parser

v1.0.1

Parse local `.docx` and `.pdf` files into structured knowledge artifacts with detailed reports, tracking successes, failures, and summaries without auto-writ...

MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal
Benign
OpenClaw
Benign
high confidence
Purpose & Capability
The name/description (parse local .docx/.pdf into report-first outputs) matches the code and SKILL.md. The included parsers and run.py implement the declared behavior. Mentions of Feishu in README/agent metadata are informational for future connectors but do not imply hidden Feishu integration.
Instruction Scope
SKILL.md explicitly limits processing to local/already-available files and the code follows that. Be aware the tool will iterate all files in the provided input directory and will attempt to parse any .docx/.pdf it finds — so the operator must ensure the input directory contains only files intended for ingestion to avoid accidental parsing of sensitive documents.
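To check what an input directory would expose before pointing the parser at it, a preview along these lines may help. This is a minimal sketch, not the skill's own code: `list_candidates` and `SUPPORTED_SUFFIXES` are hypothetical names, and whether the real tool descends into subdirectories is not stated in the review, so recursion is left as an explicit option.

```python
from pathlib import Path

# Assumption: the parser selects files by extension, case-insensitively.
SUPPORTED_SUFFIXES = {".docx", ".pdf"}


def list_candidates(input_dir, recursive=False):
    """Return the .docx/.pdf files a batch run over input_dir could pick up.

    Hypothetical helper for a dry run; it only lists paths and never
    opens or parses any file.
    """
    root = Path(input_dir)
    entries = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        p for p in entries
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Printing the result of `list_candidates("path/to/input")` and reviewing it by hand is a cheap way to confirm that no sensitive document sits in the directory before ingestion.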
Install Mechanism
There is no install spec; requirements.txt lists python-docx and pypdf which are appropriate for the task. No downloads from arbitrary URLs or extract operations are present.
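Because there is no install spec, it can be useful to confirm that both dependencies are importable before a run. The sketch below assumes only the standard import names: `python-docx` is imported as `docx`, and `pypdf` as `pypdf`. The helper name `missing_dependencies` is hypothetical.

```python
from importlib.util import find_spec

# Distribution name -> top-level import name.
REQUIRED = {"python-docx": "docx", "pypdf": "pypdf"}


def missing_dependencies(required=REQUIRED):
    """Return the distribution names whose modules cannot be found."""
    return [dist for dist, module in required.items() if find_spec(module) is None]
```

An empty list means both packages are available in the current environment; anything else names what still needs to be installed from requirements.txt.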
Credentials
The skill requests no environment variables, no credentials, and no config paths. The code does not reference any secrets or external services; the lack of credentials is consistent with an offline/local parsing utility.
Persistence & Privilege
always is false and the skill does not attempt to modify other skills or global agent settings. It writes output files only to the user-specified output directory (kb-items.jsonl, failed-items.jsonl, ingest-report.md, MEMORY.candidate.md).
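After a run, the two `.jsonl` outputs can be tallied without knowing their schema. The following is a sketch under the assumption that both files are JSON Lines (one JSON object per line), which the extension suggests but the review does not confirm; `count_jsonl_records` and `summarize_run` are hypothetical helpers, not part of the skill.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Count non-blank lines in a JSON Lines file, validating each as JSON.

    Returns 0 if the file does not exist; raises json.JSONDecodeError
    on a malformed line.
    """
    p = Path(path)
    if not p.exists():
        return 0
    count = 0
    with p.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                json.loads(line)
                count += 1
    return count


def summarize_run(output_dir):
    """Tally parsed and failed items from the output file names listed above."""
    out = Path(output_dir)
    return {
        "parsed": count_jsonl_records(out / "kb-items.jsonl"),
        "failed": count_jsonl_records(out / "failed-items.jsonl"),
    }
```

Comparing these counts against the number of input files gives a quick sanity check alongside ingest-report.md.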
Scan Findings in Context
[no_issues_found] expected: Static scan did not flag suspicious patterns. File I/O and use of python-docx/pypdf are expected for this purpose.
Assessment
This skill appears to be what it says: a local batch parser for .docx and .pdf files. Before running it, (1) ensure the --input-dir contains only files you want parsed (it will read and extract text from each .docx/.pdf it finds); (2) be aware extracted text and summary files (including MEMORY.candidate.md) will be written in plaintext to --output-dir — avoid writing to a shared or sensitive location; (3) install the two Python dependencies (python-docx, pypdf) in a controlled environment; (4) the README/metadata mentions Feishu but no network connector or credentials are included — adding Feishu integration would require extra code/credentials; and (5) if you need OCR for image-based PDFs, this version will mark them as failed and recommend manual/OCR workflows. No network exfiltration or credential use was found.
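Point (5) concerns PDFs with no text layer. One common heuristic for spotting them ahead of time is to extract text per page and check whether almost nothing comes back. The sketch below illustrates that heuristic only; it is not claimed to be how this skill decides a file has failed, and the `min_chars` threshold and both function names are assumptions.

```python
def looks_image_only(page_texts, min_chars=20):
    """Heuristic: treat a PDF as image-only if its pages yield almost no text.

    page_texts is a list of per-page strings; min_chars is an arbitrary
    illustrative threshold.
    """
    total = sum(len((text or "").strip()) for text in page_texts)
    return total < min_chars


def extract_page_texts(pdf_path):
    """Extract text from each page with pypdf (imported lazily)."""
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `looks_image_only(extract_page_texts("scan.pdf"))` on a suspect file (the file name here is a placeholder) indicates whether it is likely to need an OCR workflow before ingestion.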

Like a lobster shell, security has layers — review code before you run it.

