doc-extract-filter

Security checks across static analysis, malware telemetry, and agentic risk

Overview

The artifacts show a purpose-aligned document extraction tool; the main risk is that it can read and save full local document contents when asked, especially in batch mode.

This skill appears appropriate for extracting and filtering document text. Before installing or using it, make sure you are comfortable sharing the selected files with the agent, avoid broad batch directories, and review or pin dependencies if using it in a controlled environment.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If you filter a sensitive document, the assistant may still receive the entire document, not only the matching snippets.

Why it was flagged

Filter mode returns the full extracted document text in addition to filtered matches, so private content may enter the agent context even when the user asked for filtering.

Skill content
"data": {
    "text": text,
    "filtered_text": filtered_text,
    "matches": filter_result.get("results", [])

Recommendation

Use the skill only on documents you intend to share with the agent, and consider changing filter mode to return only matches if you need stricter privacy.
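
One possible shape for the stricter-privacy option is sketched below. It assumes the response is assembled as a dict like the excerpt above; the function name `build_filter_response` and the `include_full_text` flag are hypothetical and do not come from the skill.

```python
from typing import Any, Dict


def build_filter_response(
    text: str,
    filtered_text: str,
    filter_result: Dict[str, Any],
    include_full_text: bool = False,  # hypothetical opt-in flag
) -> Dict[str, Any]:
    """Build a filter-mode payload that omits the full document by default."""
    data: Dict[str, Any] = {
        "filtered_text": filtered_text,
        "matches": filter_result.get("results", []),
    }
    # Only attach the full extracted text when the caller explicitly asks for it.
    if include_full_text:
        data["text"] = text
    return {"data": data}
```

With this shape, the full text enters the agent context only when the caller opts in, so a plain filter request returns matches and filtered text alone.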

What this means

Choosing a broad folder could create extracted-text copies of many local documents.

Why it was flagged

Batch mode recursively collects supported files from the chosen directory and writes extracted results to JSON files.

Skill content
for file_path in input_dir.rglob('*'):
    if file_path.is_file() and file_path.suffix.lower() in supported_extensions:
        files_to_process.append(file_path)
...
with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(result, f, ensure_ascii=False, indent=2)

Recommendation

Use narrow input directories, choose a controlled output directory, and avoid pointing batch mode at home, workspace, or shared folders unless that is intended.
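
A guard along the following lines could enforce that advice before any file is read. This is a minimal sketch, not the skill's code: `collect_files`, `max_files`, `recursive`, and `blocked_dirs` are hypothetical names, and the default limits are arbitrary.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Set


def collect_files(
    input_dir: Path,
    supported_extensions: Set[str],
    max_files: int = 50,        # hypothetical cap on batch size
    recursive: bool = False,    # recursion is opt-in rather than the default
    blocked_dirs: Optional[Iterable[Path]] = None,
) -> List[Path]:
    """Collect supported files, refusing overly broad directories."""
    root = Path(input_dir).resolve()
    if blocked_dirs is None:
        # Assumed defaults: the user's home directory and the filesystem root.
        blocked_dirs = (Path.home(), Path(root.anchor))
    if root in {Path(b).resolve() for b in blocked_dirs}:
        raise ValueError(f"Refusing to batch-process broad directory: {root}")

    candidates = root.rglob("*") if recursive else root.glob("*")
    files: List[Path] = []
    for path in sorted(candidates):
        if path.is_file() and path.suffix.lower() in supported_extensions:
            files.append(path)
            if len(files) > max_files:
                raise ValueError(f"More than {max_files} files matched; narrow the input")
    return files
```

Making recursion opt-in and capping the file count means a mistyped path fails loudly instead of quietly producing extracted-text copies of many documents.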

What this means

Future installs may resolve different package versions than the author tested.

Why it was flagged

Unlike the core packages, which are pinned, several dependencies are listed without exact version pins, so installs can drift to newer releases over time.

Skill content
python-markdown  # for Markdown file processing
beautifulsoup4  # for extracting text from HTML
...
pytesseract; python_version >= "3.6"

Recommendation

Pin and verify dependency versions before installing in a sensitive environment.
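
A quick way to spot unpinned entries before installing is to scan the requirements text for lines that lack an exact `==` pin. The sketch below is a simplified check, not a full requirements parser; `find_unpinned` is a hypothetical helper, and it treats anything other than `name==version` as unpinned.

```python
import re
from typing import List

# Matches an exact pin such as "name==1.2.3" (optionally with extras).
_EXACT_PIN = re.compile(r"^[A-Za-z0-9._\-\[\],]+\s*==\s*[^\s;]+")


def find_unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that do not carry an exact '==' pin."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _EXACT_PIN.match(line):
            unpinned.append(line)
    return unpinned
```

Run against the excerpt above, every listed line would be reported, since none carries an exact version. Range specifiers such as `>=` are also reported, which is intentional for a sensitive environment.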