Kekik Crawler

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate web crawler, with important cautions around trusted plugins, robots.txt behavior, dependency pinning, and saved crawl data.

Install in a virtual environment and pin or audit dependencies. Run only against targets you are allowed to crawl, load plugins only from trusted directories, and avoid --insecure except in controlled tests. Research presets ignore robots.txt, and outputs, caches, and checkpoints may retain crawled pages and search terms.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (13)

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
The plugin manager loads Python modules directly from disk at runtime using importlib and executes them via exec_module(). That means anyone who can place or modify files in the plugin directory can run arbitrary code inside the crawler process, which exceeds the stated deterministic Scrapling-only behavior and creates a code-execution extension point.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
spec.loader.exec_module(mod) executes arbitrary Python from files selected at runtime, giving those files full interpreter privileges. In a crawler context, this can enable arbitrary command execution, data exfiltration, tampering with crawl results, or persistence if an attacker can influence the plugin directory contents.
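
One way to narrow this extension point is to load a plugin only when its file digest appears in an operator-approved allowlist. The sketch below is an illustration, not the crawler's actual plugin manager: `load_plugin`, `sha256_of`, and the `allowed_hashes` parameter are hypothetical names, and only the `importlib` and `exec_module()` calls come from the finding.

```python
import hashlib
import importlib.util
from pathlib import Path
from types import ModuleType


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_plugin(path: Path, allowed_hashes: set[str]) -> ModuleType:
    """Load a plugin only if its digest is in the approved allowlist.

    Hypothetical helper; the real plugin manager may be structured differently.
    """
    digest = sha256_of(path)
    if digest not in allowed_hashes:
        raise PermissionError(f"plugin {path.name} is not in the allowlist")
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # exec_module runs the file's top-level code with full interpreter privileges.
    spec.loader.exec_module(module)
    return module
```

Because the file is hashed and then read again for execution, this check does not protect against a file that is swapped between the two steps. It only raises the bar for unreviewed files in the plugin directory.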

Description-Behavior Mismatch

Medium
Confidence
96% confidence
Finding
Both built-in crawler presets explicitly set `no_robots: True`, which disables robots.txt compliance by default. In a web-crawling skill, this increases the risk of unauthorized or policy-violating collection, especially because the skill description does not disclose this bypass behavior or frame it as an exceptional, user-approved option.

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The presets embed robots.txt bypass as standard behavior without a clear justification tied to the crawler's stated purpose. This is dangerous because it normalizes non-compliant crawling and can lead users to perform scraping against site owner restrictions unintentionally at scale.
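
For comparison, a robots.txt check is a small amount of code. The following is a minimal sketch using the standard library's `urllib.robotparser`. The function name `is_allowed` and the sample rules are assumptions for illustration, and the crawler's own handling behind `no_robots` may work differently.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules, not taken from any real site.
SAMPLE_RULES = "User-agent: *\nDisallow: /private/\n"

print(is_allowed(SAMPLE_RULES, "example-bot", "https://example.com/private/report"))
print(is_allowed(SAMPLE_RULES, "example-bot", "https://example.com/public/page"))
```

With the sample rules, the first call reports the URL as disallowed and the second as allowed. A preset that skips this check fetches both.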

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The README explicitly documents an `--insecure` mode for crawling environments with certificate problems but provides no warning that this disables or weakens TLS verification. That can normalize unsafe usage, leading operators to crawl over connections vulnerable to man-in-the-middle interception or content tampering, especially since this tool collects and stores output data.
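
To make the risk concrete, the sketch below shows what turning off certificate verification typically looks like with Python's standard `ssl` module. It assumes, without confirmation from the source, that `--insecure` maps to something similar; `build_tls_context` and its `insecure` parameter are hypothetical names.

```python
import ssl


def build_tls_context(insecure: bool = False) -> ssl.SSLContext:
    """Build a client TLS context; insecure=True disables verification."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    if insecure:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


default_context = build_tls_context()
insecure_context = build_tls_context(insecure=True)
print(default_context.verify_mode, default_context.check_hostname)
print(insecure_context.verify_mode, insecure_context.check_hostname)
```

In the insecure context, any certificate is accepted for any host, so a party on the network path can read or alter the crawled content without the client noticing.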

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The documentation tells users how to run the crawler but omits a clear warning that execution will make external web requests and write output/report files locally. This increases the risk of unintended data egress, contacting untrusted hosts, or overwriting/storing files without the user's awareness.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The code silently executes plugin modules from disk without any user-facing disclosure, which can hide the fact that running the crawler also runs arbitrary local Python. This reduces operator awareness and makes unsafe deployments more likely, especially given the skill description emphasizes deterministic behavior.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The file defines presets that disable robots.txt compliance but provides no warning, disclosure, or indication to users that they are invoking a crawler configured to ignore site crawling preferences. In context, that makes the skill more dangerous because the bypass is built into convenient presets, increasing the chance of silent misuse by routine users rather than deliberate experts.

Unpinned Dependencies

Low
Category
Supply Chain
Content
selectolax>=0.3.21
tenacity>=8.2.3
orjson>=3.10.0
scrapling>=0.2.96
Confidence
96% confidence
Finding
selectolax>=0.3.21

Unpinned Dependencies

Low
Category
Supply Chain
Content
selectolax>=0.3.21
tenacity>=8.2.3
orjson>=3.10.0
scrapling>=0.2.96
Confidence
96% confidence
Finding
tenacity>=8.2.3

Unpinned Dependencies

Low
Category
Supply Chain
Content
selectolax>=0.3.21
tenacity>=8.2.3
orjson>=3.10.0
scrapling>=0.2.96
Confidence
97% confidence
Finding
orjson>=3.10.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
selectolax>=0.3.21
tenacity>=8.2.3
orjson>=3.10.0
scrapling>=0.2.96
Confidence
96% confidence
Finding
scrapling>=0.2.96
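
The four findings above all come from `>=` specifiers, which accept any later release. A rough way to spot such lines is sketched below. The `unpinned` helper and its regular expression are assumptions for illustration and handle only simple `name==version` lines, so option lines and hash-annotated entries would also be reported as unpinned.

```python
import re

# Matches "name==version" with an optional extras block; wildcards such as
# "==3.*" are deliberately not treated as pinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s*]+$")


def unpinned(requirements: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in requirements.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line and not PINNED.match(line):
            loose.append(line)
    return loose


REPORTED = """\
selectolax>=0.3.21
tenacity>=8.2.3
orjson>=3.10.0
scrapling>=0.2.96
"""

for entry in unpinned(REPORTED):
    print("not pinned:", entry)
```

Run against the reported content, the helper lists all four requirements.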

Known Vulnerable Dependency: orjson (4 advisories): CVE-2025-67221 (orjson does not limit recursion for deeply nested JSON documents); CVE-2024-27454 (orjson does not limit recursion for deeply nested JSON documents); CVE-2024-27454 (orjson.loads in orjson before 3.9.15 does not limit recursion for deeply nested ) +1 more

High
Category
Supply Chain
Confidence
93% confidence
Finding
orjson
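
Since the advisories concern unbounded recursion on deeply nested documents, a caller could reject such input before it reaches the parser, whichever orjson version is installed. The sketch below is a generic pre-check, not part of orjson or the crawler: `max_nesting_depth`, `within_depth_limit`, and the limit of 64 are hypothetical choices.

```python
def max_nesting_depth(text: str) -> int:
    """Return the deepest array/object nesting level in a JSON text.

    Brackets inside string literals are ignored. The input is not validated.
    """
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False  # the character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest


def within_depth_limit(text: str, limit: int = 64) -> bool:
    """Return True if the text nests no deeper than limit (arbitrary default)."""
    return max_nesting_depth(text) <= limit
```

The scan is a single linear pass and does not recurse, so it stays cheap even for the pathological inputs it is meant to catch.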

VirusTotal

60/60 vendors flagged this skill as clean.
