Kekik Crawler

Advisory. Audited by static analysis on Apr 30, 2026.

Overview

No suspicious patterns detected.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Using these presets may cause the crawler to fetch pages even when a site's robots.txt asks bots not to.

Why it was flagged

Both promoted research presets disable robots.txt checks. Ignoring robots.txt is a deliberate crawling tradeoff and should be an explicit user choice, not a preset default.

Skill content
"person-research": { ... "no_robots": True }, ... "deep-research": { ... "no_robots": True }
Recommendation

Use presets only when appropriate, keep page/concurrency limits conservative, and avoid disabling robots.txt unless you intentionally accept that tradeoff.
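If you do want robots.txt honored, the standard library can check a URL before fetching. This sketch uses `urllib.robotparser` with hypothetical rules supplied inline; in a real run you would fetch the site's own robots.txt instead (it is independent of the skill's presets):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; in practice, load the real file
# from https://<host>/robots.txt via set_url() + read().
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch() returns False for paths the site disallows.
print(rp.can_fetch("*", "https://example.com/private/profile"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))        # True
```

Gating each fetch on `can_fetch()` is the standard way crawlers respect a site's wishes when `no_robots` is left off.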

What this means

Untrusted plugin files could execute arbitrary code on the user's machine if placed in the plugin directory.

Why it was flagged

The plugin system executes local Python plugin files from the configured plugin directory; this is documented, purpose-aligned extensibility but still runs local code.

Skill content
spec.loader.exec_module(mod)
Recommendation

Only use plugins from sources you trust, and avoid pointing `--plugins` at directories containing unreviewed Python files.
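The flagged call is the standard `importlib` file-loading pattern. The snippet below reproduces it against a throwaway file (the file name and contents are illustrative, not taken from the skill) to show that loading a plugin runs its module body immediately:

```python
import importlib.util
import pathlib
import tempfile

# Write a tiny "plugin" to a temp directory; any top-level code
# in the file runs as soon as exec_module() is called.
plugin_dir = pathlib.Path(tempfile.mkdtemp())
plugin_path = plugin_dir / "demo_plugin.py"
plugin_path.write_text("NAME = 'demo'\n")

spec = importlib.util.spec_from_file_location("demo_plugin", plugin_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)  # the call flagged above: executes the file

print(mod.NAME)  # 'demo'
```

Because the whole file executes on load, a malicious plugin does not need to be called; dropping it in the plugin directory is enough.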

What this means

A later dependency release could change the crawler's behavior or introduce supply-chain risk.

Why it was flagged

The install dependencies are normal for this crawler but are not pinned to exact versions, so future installs may resolve to newer package versions.

Skill content
selectolax>=0.3.21
tenacity>=8.2.3
orjson>=3.10.0
scrapling>=0.2.96
Recommendation

Install in a virtual environment and consider pinning or locking dependency versions before regular use.
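One way to lock the install is to replace the `>=` ranges with `==` pins. The versions below are simply the minimums from the spec above, used as an example; pin to whatever versions you have actually tested:

```
selectolax==0.3.21
tenacity==8.2.3
orjson==3.10.0
scrapling==0.2.96
```

Running `pip freeze` after a known-good install produces pins for transitive dependencies as well.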

What this means

Search terms, including names or sensitive phrases, may be visible to the external search providers they are sent to.

Why it was flagged

When `--queries` is used, the query text is sent to multiple external search providers; this is expected for search-seeded crawling.

Skill content
"duckduckgo": f"https://duckduckgo.com/html/?q={q}",
"bing": f"https://www.bing.com/search?q={q}",
"yahoo": f"https://search.yahoo.com/search?p={q}",
"brave_web": f"https://search.brave.com/search?q={q}"
Recommendation

Avoid sensitive queries if you do not want them sent to external search engines; use explicit `--urls` instead when possible.
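The templates above interpolate the query string straight into each provider's URL. A quick sketch (using only the DuckDuckGo template from the excerpt, with stdlib URL quoting added; the query itself is hypothetical) shows how a name ends up in the outgoing request:

```python
from urllib.parse import quote_plus

q = "Jane Doe medical history"  # hypothetical sensitive query
url = f"https://duckduckgo.com/html/?q={quote_plus(q)}"
print(url)
# https://duckduckgo.com/html/?q=Jane+Doe+medical+history
```

The full query travels to each configured provider and typically lands in its server logs, which is why `--urls` is the safer seed for sensitive research.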

What this means

Crawled content and URL history can remain on disk after a run.

Why it was flagged

Fetched page bodies and crawl state are stored locally for cache/checkpoint behavior, which is consistent with the stated crawler design.

Skill content
self.cache.put(final_url, status, html.encode("utf-8", "ignore"))
...
save_checkpoint(self.checkpoint_path, self.queue, self.seen)
Recommendation

Store outputs in an appropriate directory, clear cache/checkpoint files when no longer needed, and avoid crawling private pages if local retention is unwanted.
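Post-run cleanup can be as simple as deleting the cache and checkpoint files. The paths below are hypothetical, since the skill's actual output locations are not shown in the excerpt:

```python
import pathlib

# Hypothetical output locations; substitute the files the crawler
# was actually configured to write.
for path in [pathlib.Path("crawl_cache.db"), pathlib.Path("checkpoint.json")]:
    path.unlink(missing_ok=True)  # remove if present, no error if absent
```

Deleting the checkpoint also resets resume state, so only do this after a run you consider finished.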