v1.1.1

doc-extract-filter

Benign

ClawScan verdict for this skill. Analyzed May 1, 2026, 7:58 AM.

Analysis

The artifacts show a purpose-aligned document extraction tool; the main risk is that it can read and save full local document contents when asked, especially in batch mode.

Guidance: This skill appears appropriate for extracting and filtering document text. Before installing or using it, make sure you are comfortable sharing the selected files with the agent, avoid broad batch directories, and review or pin dependencies if using it in a controlled environment.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Abnormal behavior control

Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.

Tool Misuse and Exploitation
Severity: Medium | Confidence: High | Status: Note
scripts/doc-extract-filter.py
for file_path in input_dir.rglob('*'):
    if file_path.is_file() and file_path.suffix.lower() in supported_extensions:
        files_to_process.append(file_path)
...
with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(result, f, ensure_ascii=False, indent=2)

Batch mode recursively collects supported files from the chosen directory and writes extracted results to JSON files.

User impact: Choosing a broad folder could create extracted-text copies of many local documents.
Recommendation: Use narrow input directories, choose a controlled output directory, and avoid pointing batch mode at home, workspace, or shared folders unless that is intended.
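
One way to act on this recommendation is to guard file collection before any extraction runs. The sketch below is not the skill's code: `collect_files`, `max_files`, `recursive`, and the `SUPPORTED_EXTENSIONS` set are hypothetical names chosen for illustration. It assumes a non-recursive default and refuses the home directory and filesystem roots.

```python
import os
from pathlib import Path

# Hypothetical extension set; the skill's real supported_extensions
# is not shown in the scanned excerpt.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".html"}


def collect_files(input_dir, max_files=50, recursive=False):
    """Collect supported files, refusing overly broad directories."""
    root = Path(input_dir).resolve()
    home = Path(os.path.expanduser("~")).resolve()
    # Refuse the home directory and filesystem roots outright.
    if root == home or root == Path(root.anchor):
        raise ValueError(f"refusing to batch-process broad directory: {root}")
    # Only descend into subdirectories when explicitly requested.
    walker = root.rglob("*") if recursive else root.glob("*")
    files = []
    for path in sorted(walker):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            files.append(path)
            if len(files) > max_files:
                raise ValueError(
                    f"more than {max_files} files matched; narrow the input directory"
                )
    return files
```

Making recursion opt-in and capping the file count means a mistyped or overly broad path fails early instead of silently producing extracted copies of many documents.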

Agentic Supply Chain Vulnerabilities
Severity: Low | Confidence: High | Status: Note
requirements.txt
python-markdown  # for Markdown file processing
beautifulsoup4  # for extracting text from HTML
...
pytesseract; python_version >= "3.6"

Several dependencies are listed without exact version pins, unlike the core pinned packages, which can lead to dependency drift during installation.

User impact: Future installs may resolve different package versions than the author tested.
Recommendation: Pin and verify dependency versions before installing in a sensitive environment.
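
A quick pre-install check can list which requirement lines lack an exact pin. This is a minimal sketch, not part of the skill: `unpinned_requirements` is a hypothetical helper, and it only recognises simple `name==version` pins (it does not handle URLs, hashes, or other advanced requirement syntax).

```python
import re

# Matches an exact pin such as "examplepkg==1.2.3" (optionally with extras).
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;#]+")


def unpinned_requirements(text):
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Running it over the contents of `requirements.txt` and reviewing the returned lines gives a short list of packages to pin before installing in a controlled environment.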

Sensitive data protection

Checks for exposed credentials, poisoned memory or context, unclear communication boundaries, or sensitive data that could leave the user's control.

Memory and Context Poisoning
Severity: Medium | Confidence: High | Status: Note
scripts/doc-extract-filter.py
"data": {
    "text": text,
    "filtered_text": filtered_text,
    "matches": filter_result.get("results", [])

Filter mode returns the full extracted document text in addition to filtered matches, so private content may enter the agent context even when the user asked for filtering.

User impact: If you filter a sensitive document, the assistant may still receive the entire document, not only the matching snippets.
Recommendation: Use the skill only on documents you intend to share with the agent, and consider changing filter mode to return only matches if you need stricter privacy.
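
The stricter behaviour suggested here could look roughly like the following. This is a sketch under assumptions, not the skill's implementation: `build_filter_payload` and the `include_full_text` flag are hypothetical, and only the `text`, `filtered_text`, and `matches` keys come from the scanned excerpt.

```python
def build_filter_payload(text, filtered_text, filter_result, include_full_text=False):
    """Build a filter-mode response that omits the full text by default."""
    data = {
        "filtered_text": filtered_text,
        "matches": filter_result.get("results", []),
    }
    # The full document is only returned when the caller opts in.
    if include_full_text:
        data["text"] = text
    return {"data": data}
```

With the full text behind an explicit opt-in, a filtering request on a sensitive document places only the matching content into the agent context unless the user asks for more.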