doc-extract-filter

v1.1.1

Extracts text from PDF, Word, and Excel files and filters it by keyword, returning either the full text or only the filtered text.

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (extract/filter text from documents) matches the included code: extractor, filter, converter, and utils implement extraction, keyword/regex filtering, batch processing and result export. Declared CLI/API parameters align with implementation.
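As a rough illustration of the keyword/regex filtering step, the sketch below keeps only the lines of already-extracted text that match at least one pattern. The function name `filter_lines`, its parameters, and the case-insensitive matching are assumptions made for this example; they are not taken from the skill's actual filter module.

```python
import re
from typing import Iterable, List


def filter_lines(text: str, patterns: Iterable[str], use_regex: bool = False) -> List[str]:
    """Return the lines of `text` that match any pattern (case-insensitive).

    Hypothetical helper: plain keywords are escaped so they match literally,
    while use_regex=True treats each pattern as a regular expression.
    """
    compiled = [
        re.compile(p if use_regex else re.escape(p), re.IGNORECASE)
        for p in patterns
    ]
    return [
        line
        for line in text.splitlines()
        if any(rx.search(line) for rx in compiled)
    ]


sample = "Invoice 2024-001\nTotal: 300 USD\nNotes: none"
print(filter_lines(sample, ["total"]))
print(filter_lines(sample, [r"\d{4}-\d{3}"], use_regex=True))
```

Escaping plain keywords means a search term such as `.` or `(` is matched literally and is not interpreted as regex syntax.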
Instruction Scope
SKILL.md and the entry script instruct the agent to read specified files or directories, extract text, filter matches, and optionally write JSON/text outputs. The instructions do not request unrelated data, secrets, or remote endpoints. Note: batch mode traverses directories and will process any supported files accessible to the running agent. This is expected, but it matters when the target is a sensitive directory.
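To make the batch-mode note concrete, here is a minimal sketch of what a recursive traversal typically looks like. The suffix set and the name `iter_supported_files` are assumptions for illustration; the skill's real traversal logic and supported extensions may differ.

```python
from pathlib import Path
from typing import Iterator

# Assumed extension list; the skill's actual set may include other formats.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".xlsx"}


def iter_supported_files(root: Path) -> Iterator[Path]:
    """Yield every file under `root` (recursively) with a supported suffix."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            yield path
```

Calling `list(iter_supported_files(Path("some/dir")))` before a batch run is one way to preview exactly which files would be read, which helps when the directory may contain sensitive material.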
Install Mechanism
The skill is instruction- and code-based, with no install spec. A requirements.txt is provided but there is no automated installer, so the runtime environment must already have the listed Python packages. OCR functionality additionally requires the system tesseract binary and pdf2image/Pillow; the code handles missing optional dependencies by falling back to non-OCR extraction.
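The optional-dependency fallback described above can be sketched as a simple availability check. This is an assumed pattern, not the skill's code: `ocr_available` and `extraction_mode` are hypothetical names, and the check covers only the dependencies the scan mentions (pdf2image, Pillow, and the tesseract binary).

```python
import importlib.util
import shutil


def ocr_available() -> bool:
    """True only if pdf2image, Pillow, and the tesseract binary are all present."""
    modules = ("pdf2image", "PIL")  # Pillow is imported as PIL
    has_modules = all(importlib.util.find_spec(m) is not None for m in modules)
    return has_modules and shutil.which("tesseract") is not None


def extraction_mode(want_ocr: bool) -> str:
    """Pick "ocr" when requested and available; otherwise fall back to "text"."""
    return "ocr" if want_ocr and ocr_available() else "text"


print(extraction_mode(want_ocr=True))
```

Running this once in the target environment shows whether OCR would actually be used or silently skipped.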
Credentials
The skill does not request environment variables, credentials, or config paths. All I/O is local-file-based as described. There are no requests for unrelated service keys or tokens.
Persistence & Privilege
Skill is not marked always:true and does not modify other skills or system-wide agent settings. It performs file reads/writes within the paths provided by the caller, which is appropriate for its purpose.
Assessment
This skill is internally consistent and appears to do what it claims, but it runs code on your agent and will read/write any file paths you pass to it. Before installing or invoking: (1) ensure the Python environment has the required packages (requirements.txt), plus tesseract if you need OCR; (2) avoid pointing it at sensitive system or credential directories, since batch mode traverses every directory you give it; (3) run it on non-sensitive sample files first to confirm behavior and output formats; (4) if you need stronger isolation, run the skill in a sandboxed environment or container.


latest vk976kjnakg4yk6yd6kk924r3q183ferc
