Scrapling

Security checks across malware telemetry and agentic risk

Overview

This is a real scraping skill, but it needs Review: it includes under-scoped bypass and API-replication guidance, and its helper script can install packages or execute local Python code.

Install only after reviewing the helper script and limiting use to sites where you have permission to scrape. Avoid passing untrusted URLs, selectors, or spider files to run.sh, use an isolated virtual environment, and do not use the API-replication or bypass examples against paid, protected, rate-limited, or terms-prohibited services.
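One way to act on the "no untrusted URLs" advice is to check each URL against an explicit allowlist before it reaches the helper script. The sketch below is illustrative only: `ALLOWED_HOSTS` and `is_permitted_url` are hypothetical names that do not come from the skill, and the hosts are placeholders for sites where you actually have permission to scrape.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: hosts you have explicit permission to scrape.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def is_permitted_url(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are rejected.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    # Exact match, so "example.com.evil.test" does not pass as "example.com".
    return host in allowed_hosts
```

Matching the full hostname rather than a substring is deliberate: substring or suffix checks are a common way for lookalike domains to slip through.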

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (11)

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: This section goes beyond ordinary web scraping and provides operational guidance for discovering hidden APIs, reconstructing auth mechanisms, and programmatically reproducing access patterns. That materially lowers the barrier to unauthorized access or abuse of services, especially where 'premium' or protected endpoints are discussed.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The spider mode passes a user-controlled argument directly to `python3`, allowing the wrapper to execute any local Python file rather than only a constrained scraping workflow. In an agent or automation context, this expands the skill from web scraping into arbitrary code execution, which is dangerous if untrusted input can reach this parameter.
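A possible mitigation for this finding is to resolve the requested spider file against a fixed directory and refuse anything outside it before any interpreter is launched. This is a minimal sketch under stated assumptions: `resolve_spider` and the dedicated `spiders_dir` are hypothetical and are not part of the skill's `run.sh`.

```python
from pathlib import Path


def resolve_spider(candidate: str, spiders_dir: Path) -> Path:
    """Return the resolved spider path, or raise ValueError if it is not allowed."""
    base = spiders_dir.resolve()
    # Resolving collapses ".." segments and symlinks before the containment check.
    path = (base / candidate).resolve()
    if path.suffix != ".py":
        raise ValueError(f"not a Python file: {candidate}")
    if base not in path.parents:
        raise ValueError(f"outside the spiders directory: {candidate}")
    if not path.is_file():
        raise ValueError(f"no such spider: {candidate}")
    return path
```

This narrows which files can be executed; it does not make the files themselves safe, so the contents of the spiders directory still need review.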

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill description and top-level framing advertise anti-bot bypass and related capabilities without prominent guardrails about authorization, terms-of-service, or acceptable use. In a scraping skill, this context makes the omission more dangerous because the primary workflow directly interfaces with external systems and site protections.

Missing User Warnings

Medium

Confidence: 97%
Finding: The anti-bot and Cloudflare examples present bypass functionality as a normal fetch mode, without warning that such actions may interfere with site defenses or violate acceptable-use boundaries. This normalizes evasive behavior and makes misuse easier for downstream users or agents.

Missing User Warnings

High

Confidence: 99%
Finding: The API reverse-engineering section explicitly discusses reproducing discovered auth mechanisms and accessing hidden or premium endpoints, while omitting a strong warning that this may be unauthorized. In context, this is dangerous because it provides actionable steps for bypassing intended access controls under the guise of research.

Missing User Warnings

Medium

Confidence: 91%
Finding: The brand-data example silently constructs a screenshot URL through an external service, which discloses the target URL to a third party. That can leak sensitive research targets, internal URLs, or user intent without informed consent.
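The disclosure described here could be made explicit by requiring an opt-in before any third-party URL is built. The sketch below assumes a hypothetical screenshot endpoint (`SCREENSHOT_SERVICE`) and a hypothetical `screenshot_url` helper; neither is taken from the skill's brand-data example.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical third-party screenshot endpoint, for illustration only.
SCREENSHOT_SERVICE = "https://screenshots.example.invalid/render?url="


def screenshot_url(target: str, allow_third_party: bool = False) -> Optional[str]:
    """Build a third-party screenshot URL only when the caller opts in."""
    if not allow_third_party:
        # Default is no disclosure: the target URL never leaves the caller.
        return None
    return SCREENSHOT_SERVICE + quote(target, safe="")
```

Defaulting the flag to `False` means a caller has to acknowledge that the target URL will be shared with an outside service.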

Missing User Warnings

Medium

Confidence: 90%
Finding: The script automatically installs a Python package when it is missing, modifying the environment without explicit user consent. In security-sensitive or reproducible environments, silent dependency installation can introduce supply-chain risk, unexpected network access, and non-deterministic behavior.
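An alternative to silent installation is to check for the dependency and stop with instructions when it is absent. The following is a sketch, not the skill's actual behavior: `require_package` is a hypothetical helper, and the package name passed to it is left to the caller.

```python
import importlib.util


def require_package(name: str) -> None:
    """Raise with install guidance if top-level package `name` is missing.

    This never installs anything; the user decides what to install and where.
    """
    if importlib.util.find_spec(name) is None:
        raise RuntimeError(
            f"Required package '{name}' is not installed. "
            "Install a pinned version yourself in an isolated virtual environment."
        )
```

Failing fast keeps the environment unchanged and leaves version pinning under the user's control.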

Missing User Warnings

Medium

Confidence: 95%
Finding: Spider mode executes a user-specified Python file with no validation or warning, which can run arbitrary code on the host. The skill context makes this more dangerous because it is described as a web-scraping helper, so callers may not expect that selecting a mode can launch unrestricted local programs.

Ssd 4

High

Confidence: 99%
Finding: This is a high-risk dual-use section because it gives a step-by-step workflow for discovering nonpublic APIs, extracting auth logic from JavaScript, and replaying requests with reconstructed headers. The skill context amplifies the risk since it is packaged as a reusable agent capability for automated scraping and data extraction.

Ssd 1

Medium

Confidence: 96%
Finding: The language frames stealth and Cloudflare-solving as convenient, recommended scraping techniques rather than exceptional, authorization-dependent actions. That benign framing reduces user caution and encourages evasion of protective controls.

Ssd 1

Medium

Confidence: 97%
Finding: The cloudscraper section normalizes bypassing Cloudflare-protected endpoints by providing direct setup and usage instructions. Even without exploit code against a specific target, this meaningfully facilitates circumvention of access controls and anti-abuse measures.

VirusTotal

64/64 vendors flagged this skill as clean.
