Kekik Crawler
Pass. Audited by ClawScan on May 1, 2026.
Overview
The artifacts show a coherent web crawler with no hidden exfiltration or destructive behavior, but users should review its dependency installation, plugin execution, robots.txt settings, external search queries, and local caches.
This skill appears safe for its stated purpose. Install it in a virtual environment; review or pin dependencies; use only trusted plugins; be careful with the person-research and deep-research presets, which disable robots.txt checks; avoid sensitive search queries; and manage the generated output, cache, and checkpoint files.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Using these presets may crawl pages even when a site has requested bots not to fetch them.
Both promoted research presets disable robots.txt checks; this is consistent with crawler functionality but should be an explicit user choice.
"person-research": { ... "no_robots": True },
...
"deep-research": { ... "no_robots": True }

Use presets only when appropriate, keep page/concurrency limits conservative, and avoid disabling robots.txt unless you intentionally accept that tradeoff.
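If you want robots.txt enforced regardless of preset, a pre-fetch check can be built on the standard library's `urllib.robotparser`. This is a minimal sketch, not part of the skill; the user-agent string is an illustrative placeholder, and it takes an already-fetched robots.txt body so the policy check is separate from network access:

```python
from urllib import robotparser

def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a fetched robots.txt body against a target URL.

    Returns True when the rules permit this user agent to fetch the URL.
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

A crawler would call this once per host before enqueuing that host's URLs, skipping any URL for which it returns False.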
Untrusted plugin files could execute arbitrary code on the user's machine if placed in the plugin directory.
The plugin system executes local Python plugin files from the configured plugin directory; this is documented, purpose-aligned extensibility but still runs local code.
spec.loader.exec_module(mod)
Only use plugins from sources you trust, and avoid pointing `--plugins` at directories containing unreviewed Python files.
A later dependency release could change behavior or introduce supply-chain risk.
The install dependencies are normal for this crawler but are not pinned to exact versions, so future installs may resolve to newer package versions.
selectolax>=0.3.21 tenacity>=8.2.3 orjson>=3.10.0 scrapling>=0.2.96
Install in a virtual environment and consider pinning or locking dependency versions before regular use.
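A simple way to lock the install is to resolve once and freeze the result. This is a generic pip workflow sketch, not the skill's documented install procedure; the lock filename is illustrative:

```shell
python -m venv .venv
. .venv/bin/activate
pip install "selectolax>=0.3.21" "tenacity>=8.2.3" "orjson>=3.10.0" "scrapling>=0.2.96"
pip freeze > requirements.lock   # record the exact versions resolved today
# later, reproduce the same environment:
pip install -r requirements.lock
```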
Search terms, including names or sensitive phrases, may be visible to those search providers.
When `--queries` is used, the query text is sent to multiple external search providers; this is expected for search-seeded crawling.
"duckduckgo": f"https://duckduckgo.com/html/?q={q}",
"bing": f"https://www.bing.com/search?q={q}",
"yahoo": f"https://search.yahoo.com/search?p={q}",
"brave_web": f"https://search.brave.com/search?q={q}"

Avoid sensitive queries if you do not want them sent to external search engines; use explicit `--urls` instead when possible.
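To see exactly what leaves the machine for a given query, the provider URLs from the excerpt can be rebuilt with proper percent-encoding via the standard library's `quote_plus` (the excerpt interpolates `q` raw; the encoding step here is an addition of this sketch):

```python
from urllib.parse import quote_plus

def search_urls(q: str) -> dict:
    """Build the provider URLs a query would be sent to, percent-encoded."""
    enc = quote_plus(q)  # spaces become '+', reserved characters are escaped
    return {
        "duckduckgo": f"https://duckduckgo.com/html/?q={enc}",
        "bing": f"https://www.bing.com/search?q={enc}",
        "yahoo": f"https://search.yahoo.com/search?p={enc}",
        "brave_web": f"https://search.brave.com/search?q={enc}",
    }
```

Printing `search_urls("some name")` before a run makes the disclosure concrete: every key is a third party that will receive the query text.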
Crawled content and URL history can remain on disk after a run.
Fetched page bodies and crawl state are stored locally for cache/checkpoint behavior, which is consistent with the stated crawler design.
self.cache.put(final_url, status, html.encode("utf-8", "ignore"))
...
save_checkpoint(self.checkpoint_path, self.queue, self.seen)

Store outputs in an appropriate directory, clear cache/checkpoint files when no longer needed, and avoid crawling private pages if local retention is unwanted.
