Kekik Crawler

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate web crawler, but it carries important caveats around trusted plugins, robots.txt behavior, dependency pinning, and saved crawl data.

Install it in a virtual environment and pin or audit its dependencies. Run it only against targets you are allowed to crawl, load plugins only from trusted directories, and avoid --insecure except in controlled tests. Remember that the research presets ignore robots.txt, and that outputs, cache files, and checkpoints may retain crawled pages and search terms.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (13)

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The plugin manager loads Python modules directly from disk at runtime using importlib and executes them via exec_module(). That means anyone who can place or modify files in the plugin directory can run arbitrary code inside the crawler process, which exceeds the stated deterministic Scrapling-only behavior and creates a code-execution extension point.
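One way to narrow the extension point described above is to execute a plugin only when its exact bytes match a digest the operator has approved. The sketch below is a hypothetical mitigation, not Kekik Crawler's code: the `allowlist` mapping and the `load_trusted_plugin` name are assumptions. It swaps `exec_module()` for `exec()` on the same bytes that were hashed, so a file swapped between the check and the load cannot slip through.

```python
import hashlib
import types
from pathlib import Path
from typing import Dict


def load_trusted_plugin(path: Path, allowlist: Dict[str, str]) -> types.ModuleType:
    """Execute a plugin file only if its SHA-256 digest matches an operator allowlist.

    `allowlist` maps plugin file names to expected hex digests. It is a
    hypothetical, operator-maintained mapping, not a Kekik Crawler feature.
    """
    source = path.read_bytes()  # read once; these exact bytes are hashed and executed
    digest = hashlib.sha256(source).hexdigest()
    if allowlist.get(path.name) != digest:
        raise PermissionError(f"refusing to execute unlisted or modified plugin: {path}")
    module = types.ModuleType(path.stem)
    module.__file__ = str(path)
    # Same effect as exec_module(): every top-level statement runs with full privileges.
    exec(compile(source, str(path), "exec"), module.__dict__)
    return module
```

The digest check does not make a plugin safe; it only ensures that what runs is what someone reviewed.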

Context-Inappropriate Capability

High

Confidence: 98%
Finding: spec.loader.exec_module(mod) executes arbitrary Python from files selected at runtime, giving those files full interpreter privileges. In a crawler context, this can enable arbitrary command execution, data exfiltration, tampering with crawl results, or persistence if an attacker can influence the plugin directory contents.
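The core of this finding is that loading a module is the same as running it. The self-contained demonstration below uses a generic `importlib` loader of the kind the finding names (an illustrative sketch, not the crawler's actual plugin manager). The "plugin" writes a marker file the moment it is loaded, before any plugin function is ever called.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_plugin(path: Path):
    """Generic importlib loader of the kind described in the finding."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)  # runs every top-level statement in the file
    return mod


def demo() -> str:
    """Show that merely loading a plugin file triggers its side effects."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "side_effect.txt"
        plugin = Path(tmp) / "innocent_plugin.py"
        # This plugin writes a file at import time; a malicious one could do anything.
        plugin.write_text(
            f"with open({str(marker)!r}, 'w') as f:\n"
            "    f.write('ran at import time')\n"
        )
        load_plugin(plugin)
        return marker.read_text()


if __name__ == "__main__":
    print(demo())  # prints: ran at import time
```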

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: Both built-in crawler presets explicitly set `no_robots: True`, which disables robots.txt compliance by default. In a web-crawling skill, this increases the risk of unauthorized or policy-violating collection, especially because the skill description does not disclose this bypass behavior or frame it as an exceptional, user-approved option.
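For contrast with `no_robots: True`, the sketch below shows what honoring robots.txt looks like using Python's standard `urllib.robotparser`. The rules are passed in as lines so the example runs offline; the `robots_allows` name and the default user-agent string are hypothetical, and a real crawler would fetch `/robots.txt` for each host.

```python
from typing import Iterable
from urllib import robotparser


def robots_allows(url: str, robots_lines: Iterable[str], user_agent: str = "example-crawler") -> bool:
    """Return True if the given robots.txt rules permit `user_agent` to fetch `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)  # rules supplied by the caller; no network access here
    parser.modified()  # mark the rules as loaded so can_fetch() does not default to False
    return parser.can_fetch(user_agent, url)


RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

if __name__ == "__main__":
    print(robots_allows("https://example.com/private/report", RULES))  # False
    print(robots_allows("https://example.com/blog/post", RULES))  # True
```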

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The presets treat robots.txt bypass as standard behavior without a clear justification tied to the crawler's stated purpose. This is dangerous because it normalizes non-compliant crawling and can lead users to unintentionally scrape sites at scale, against their owners' restrictions.

Missing User Warnings

Medium

Confidence: 89%
Finding: The README explicitly documents an `--insecure` mode for crawling environments with certificate problems but provides no warning that this disables or weakens TLS verification. That can normalize unsafe usage, leading operators to crawl over connections vulnerable to man-in-the-middle interception or content tampering, especially since this tool collects and stores output data.
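The finding does not say how `--insecure` is implemented. In Python's standard `ssl` module, disabling verification typically comes down to the two settings shown below; treat this as an illustration of what such a switch usually means, not as the crawler's code. The `make_tls_context` helper and its warning text are assumptions.

```python
import ssl
import warnings


def make_tls_context(insecure: bool = False) -> ssl.SSLContext:
    """Build a client TLS context; insecure=True mimics a typical '--insecure' switch."""
    context = ssl.create_default_context()  # CERT_REQUIRED plus hostname checking by default
    if insecure:
        warnings.warn(
            "TLS verification disabled: responses can be intercepted or altered in transit",
            RuntimeWarning,
            stacklevel=2,
        )
        context.check_hostname = False  # must be off before verify_mode can be CERT_NONE
        context.verify_mode = ssl.CERT_NONE
    return context
```

With both settings disabled, any certificate, including one presented by an interceptor, is accepted. This is why a warning at the point of use matters.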

Missing User Warnings

Medium

Confidence: 96%
Finding: The documentation explains how to run the crawler but omits a clear warning that running it makes external web requests and writes output and report files locally. This increases the risk of unintended data egress, contact with untrusted hosts, or files being stored or overwritten without the user's awareness.
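A minimal sketch of the missing disclosure follows. Before crawling, it states which hosts the seed URLs point to and where results will land. When writing, it refuses to overwrite an existing report. Both helper names are hypothetical. The host list is only a lower bound, because a crawler that follows links may reach other hosts.

```python
import json
from pathlib import Path
from typing import Iterable
from urllib.parse import urlsplit


def describe_run(seed_urls: Iterable[str], output_path: Path) -> str:
    """Summarize, before crawling, which seed hosts will be contacted and where results go."""
    hosts = sorted({urlsplit(url).hostname or "<no host>" for url in seed_urls})
    return (
        f"This run will send requests to {len(hosts)} host(s): {', '.join(hosts)}; "
        f"results will be written to {output_path.resolve()}"
    )


def write_report_once(report: dict, output_path: Path) -> None:
    """Write a JSON report, refusing to overwrite an existing file."""
    # Mode "x" raises FileExistsError if the file already exists.
    with output_path.open("x", encoding="utf-8") as handle:
        json.dump(report, handle, ensure_ascii=False, indent=2)
```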

Missing User Warnings

Medium

Confidence: 90%
Finding: The code silently executes plugin modules from disk without any user-facing disclosure, which can hide the fact that running the crawler also runs arbitrary local Python. This reduces operator awareness and makes unsafe deployments more likely, especially given the skill description emphasizes deterministic behavior.
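The disclosure gap described above could be closed by announcing every plugin before anything runs and requiring an explicit opt-in. The sketch below assumes plugins are `*.py` files in a single directory and that underscore-prefixed files are skipped. Both conventions, and the `discover_plugins` name, are hypothetical.

```python
import logging
from pathlib import Path
from typing import List

log = logging.getLogger("plugin-audit")


def discover_plugins(plugin_dir: Path, allow_plugins: bool = False) -> List[Path]:
    """List plugin files, announce each one, and require explicit opt-in to run them.

    The '*.py' pattern and the skip-underscore rule are assumptions for this sketch.
    """
    plugins = sorted(p for p in plugin_dir.glob("*.py") if not p.name.startswith("_"))
    for plugin in plugins:
        log.warning("plugin will run as local Python code: %s", plugin.resolve())
    if plugins and not allow_plugins:
        raise RuntimeError(
            f"{len(plugins)} plugin(s) found in {plugin_dir}; re-run with plugins explicitly allowed"
        )
    return plugins
```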

Missing User Warnings

Medium

Confidence: 97%
Finding: The file defines presets that disable robots.txt compliance but provides no warning, disclosure, or indication to users that they are invoking a crawler configured to ignore site crawling preferences. In context, that makes the skill more dangerous because the bypass is built into convenient presets, increasing the chance of silent misuse by routine users rather than deliberate experts.
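One way to keep convenient presets without silent bypass is to warn whenever a resolved configuration ignores robots.txt, while still letting the user override it. In the sketch below, the preset names, the `depth` key, and `RobotsBypassWarning` are all hypothetical. The only detail taken from the findings is that the built-in presets set `no_robots` to true.

```python
import warnings
from typing import Any, Dict, Optional

# Hypothetical preset table: names and the "depth" key are illustrative only.
# The one detail taken from the findings is that each preset sets no_robots=True.
PRESETS: Dict[str, Dict[str, Any]] = {
    "research": {"no_robots": True, "depth": 2},
    "research_deep": {"no_robots": True, "depth": 4},
}


class RobotsBypassWarning(UserWarning):
    """Emitted whenever a resolved crawl configuration ignores robots.txt."""


def resolve_preset(name: str, overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge a preset with user overrides and warn loudly if robots.txt will be ignored."""
    config = {**PRESETS[name], **(overrides or {})}
    if config.get("no_robots"):
        warnings.warn(
            f"preset {name!r} ignores robots.txt; override no_robots=False to honor it",
            RobotsBypassWarning,
            stacklevel=2,
        )
    return config
```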

Unpinned Dependencies

Low

Category: Supply Chain
Content: selectolax>=0.3.21 tenacity>=8.2.3 orjson>=3.10.0 scrapling>=0.2.96
Confidence: 96%
Finding: selectolax>=0.3.21

Unpinned Dependencies

Low

Category: Supply Chain
Content: selectolax>=0.3.21 tenacity>=8.2.3 orjson>=3.10.0 scrapling>=0.2.96
Confidence: 96%
Finding: tenacity>=8.2.3

Unpinned Dependencies

Low

Category: Supply Chain
Content: selectolax>=0.3.21 tenacity>=8.2.3 orjson>=3.10.0 scrapling>=0.2.96
Confidence: 97%
Finding: orjson>=3.10.0

Unpinned Dependencies

Low

Category: Supply Chain
Content: selectolax>=0.3.21 tenacity>=8.2.3 orjson>=3.10.0 scrapling>=0.2.96
Confidence: 96%
Finding: scrapling>=0.2.96
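The four findings above all flag `>=` floors instead of exact pins. A rough check for this can be scripted. The sketch below is a simplified regex test, not a full PEP 508 parser, and the `unpinned` name is hypothetical. It reports every requirement line that is not an exact `==` pin. For stronger guarantees, pip's `--require-hashes` mode also verifies each package's hash.

```python
import re
from typing import List

# "name==version", with an optional extras block such as "pkg[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^=\s;]+")


def unpinned(requirements_text: str) -> List[str]:
    """Return requirement specifiers that are not exact == pins (simplified check)."""
    offenders = []
    for line in requirements_text.splitlines():
        spec = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not spec or spec.startswith("-"):  # skip blanks and pip options like -r or --hash
            continue
        if not PINNED.match(spec):
            offenders.append(spec)
    return offenders


REQS = """selectolax>=0.3.21
tenacity>=8.2.3
orjson>=3.10.0
scrapling>=0.2.96
"""

if __name__ == "__main__":
    print(unpinned(REQS))
```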

Known Vulnerable Dependency: orjson (4 advisories): CVE-2025-67221 (orjson does not limit recursion for deeply nested JSON documents); CVE-2024-27454 (orjson does not limit recursion for deeply nested JSON documents); CVE-2024-27454 (orjson.loads in orjson before 3.9.15 does not limit recursion for deeply nested ) +1 more

High

Category: Supply Chain
Confidence: 93%
Finding: orjson
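The primary fix is to require an orjson release that resolves every listed advisory. The CVE-2024-27454 entry above describes versions before 3.9.15, which the >=3.10.0 floor appears to exclude already. Check the fixed versions stated in the other advisories. As defense in depth against the recursion issue they share, untrusted JSON can be rejected before parsing if it nests too deeply. In the sketch below, the depth scanner, the 256 limit, and the function names are assumptions. The sketch falls back to the standard `json` module when orjson is not installed.

```python
try:
    import orjson

    _loads = orjson.loads
except ImportError:  # fall back so the sketch runs without orjson
    import json

    _loads = json.loads


def max_json_depth(text: str) -> int:
    """Return the deepest array/object nesting level, ignoring brackets inside strings."""
    depth = deepest = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False  # this character was escaped; consume it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest


def loads_with_depth_limit(text: str, limit: int = 256):
    """Parse JSON only if its nesting depth is within `limit`."""
    if max_json_depth(text) > limit:
        raise ValueError(f"JSON nesting exceeds {limit} levels")
    return _loads(text)
```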

VirusTotal

60/60 vendors flagged this skill as clean.
