Kekik Crawler

Pass. Audited by ClawScan on May 1, 2026.

Overview

The artifacts show a coherent web crawler with no sign of hidden exfiltration or destructive behavior, but users should review its dependency installation, plugin execution, robots.txt settings, external search queries, and local caches.

This skill appears safe for its stated purpose if you intend to run a web crawler. Install it in a virtual environment, review or pin its dependencies, use only trusted plugins, treat the person-research and deep-research presets with care because they disable robots.txt checks, avoid sensitive search queries, and manage the generated output, cache, and checkpoint files.

Findings (5)

This is an artifact-based, informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Finding 1: Research presets disable robots.txt checks

What this means

With these presets, the crawler may fetch pages even when a site's robots.txt asks bots not to.

Why it was flagged

Both promoted research presets disable robots.txt checks. Bypassing robots.txt is consistent with crawler functionality, but it should be an explicit user choice rather than a preset default.

Skill content
"person-research": { ... "no_robots": True }, ... "deep-research": { ... "no_robots": True }
Recommendation

Use presets only when appropriate, keep page/concurrency limits conservative, and avoid disabling robots.txt unless you intentionally accept that tradeoff.
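If you want to keep robots.txt checks on while still scripting the crawl, the decision can be made explicitly before fetching. The sketch below uses only the standard library's `urllib.robotparser`; the function name `allowed_by_robots` is illustrative and not part of this skill's API.

```python
# Sketch: an explicit robots.txt check, so bypassing it is a deliberate
# choice rather than a preset default. robotparser is Python stdlib.
from urllib import robotparser

def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules permit fetching `url`."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())   # parse rules already fetched elsewhere
    return rp.can_fetch(user_agent, url)
```

In practice you would fetch `https://<host>/robots.txt` once per host, cache the parsed rules, and consult them before each request.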

Finding 2: Plugin files execute as local code

What this means

Untrusted plugin files could execute arbitrary code on the user's machine if placed in the plugin directory.

Why it was flagged

The plugin system executes local Python plugin files from the configured plugin directory; this is documented, purpose-aligned extensibility but still runs local code.

Skill content
spec.loader.exec_module(mod)
Recommendation

Only use plugins from sources you trust, and avoid pointing `--plugins` at directories containing unreviewed Python files.
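To make the risk concrete, this is roughly what a `spec.loader.exec_module`-based plugin loader does; any top-level code in a plugin file runs at load time. The function name `load_plugins` and the `*.py` glob are assumptions for illustration, not the skill's actual implementation.

```python
# Sketch of an importlib-based plugin loader: every .py file in the
# plugin directory is imported, which executes its top-level code.
import importlib.util
from pathlib import Path

def load_plugins(plugin_dir: str) -> list:
    mods = []
    for path in sorted(Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)   # runs the file's top-level code
        mods.append(mod)
    return mods
```

Because loading is equivalent to running the file, review every plugin before placing it in the directory.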

Finding 3: Dependencies are not version-pinned

What this means

A later dependency release could change the crawler's behavior or pull in a vulnerable package version.

Why it was flagged

The install dependencies are normal for this crawler but are not pinned to exact versions, so future installs may resolve to newer package versions.

Skill content
selectolax>=0.3.21
tenacity>=8.2.3
orjson>=3.10.0
scrapling>=0.2.96
Recommendation

Install in a virtual environment and consider pinning or locking dependency versions before regular use.
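One simple option, assuming the lower bounds listed above are the versions you have tested, is to pin them exactly in a requirements file and update deliberately:

```
selectolax==0.3.21
tenacity==8.2.3
orjson==3.10.0
scrapling==0.2.96
```

Alternatively, install once, verify behavior, and freeze the resolved versions with your package manager's lock mechanism.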

Finding 4: Search queries are sent to external providers

What this means

Search terms, including names or sensitive phrases, are visible to the external search providers the crawler queries.

Why it was flagged

When `--queries` is used, the query text is sent to multiple external search providers; this is expected for search-seeded crawling.

Skill content
"duckduckgo": f"https://duckduckgo.com/html/?q={q}",
"bing": f"https://www.bing.com/search?q={q}",
"yahoo": f"https://search.yahoo.com/search?p={q}",
"brave_web": f"https://search.brave.com/search?q={q}"
Recommendation

Avoid sensitive queries if you do not want them sent to external search engines; use explicit `--urls` instead when possible.
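The flagged snippet shows the query text interpolated directly into provider URLs. A minimal sketch of that pattern, using the stdlib's `quote_plus` for encoding (the `search_urls` helper is illustrative, not the skill's function):

```python
# Sketch: the raw query text travels inside each provider URL, so the
# providers see exactly what you searched for.
from urllib.parse import quote_plus

def search_urls(query: str) -> dict:
    q = quote_plus(query)   # URL-encode the query text
    return {
        "duckduckgo": f"https://duckduckgo.com/html/?q={q}",
        "bing": f"https://www.bing.com/search?q={q}",
    }
```

Anything passed via `--queries` ends up in these URLs, which is why explicit `--urls` is the safer path for sensitive targets.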

Finding 5: Crawled content persists on disk

What this means

Crawled content and URL history can remain on disk after a run.

Why it was flagged

Fetched page bodies and crawl state are stored locally for cache/checkpoint behavior, which is consistent with the stated crawler design.

Skill content
self.cache.put(final_url, status, html.encode("utf-8", "ignore"))
...
save_checkpoint(self.checkpoint_path, self.queue, self.seen)
Recommendation

Store outputs in an appropriate directory, clear cache/checkpoint files when no longer needed, and avoid crawling private pages if local retention is unwanted.
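Cleanup can be scripted once you know where the crawler writes. The names below (`cache` directory, `checkpoint.json`) are assumptions for illustration; confirm the actual paths this skill uses before deleting anything.

```python
# Sketch: remove leftover cache/checkpoint artifacts after a run.
# Paths here are assumed examples, not the skill's documented layout.
import shutil
from pathlib import Path

def clear_run_artifacts(output_dir: str) -> int:
    """Delete assumed cache/checkpoint artifacts; return how many were removed."""
    removed = 0
    root = Path(output_dir)
    for name in ("cache", "checkpoint.json"):
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)   # cache is a directory of page bodies
            removed += 1
        elif target.is_file():
            target.unlink()         # checkpoint is a single state file
            removed += 1
    return removed
```

Running a cleanup step after each crawl keeps fetched page bodies and URL history from lingering on disk.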