Scrapling Web Extractor

Security checks across malware telemetry and agentic risk

Overview

This is a webpage-to-Markdown scraper with mostly coherent core behavior, but it needs review because its instructions encourage anti-bot evasion, overstate implemented controls, and write scraped content locally by default.

Install only if you intend to scrape authorized public pages and can review target URLs and outputs yourself. Avoid using the anti-bot bypass guidance, do not put real proxy passwords directly in command lines, and be aware that this version writes scraped page content to a local outputs directory by default.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (9)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95%
Finding: The skill exposes meaningful capabilities beyond what is declared: network access, shell execution, local file reads, and local file writes. Undeclared capabilities reduce transparency and can lead operators to invoke the skill without understanding that it can read local URL lists and persist scraped content to disk.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The documented purpose focuses on webpage fetching and HTML-to-Markdown conversion, but the behavior also includes reading local files and writing output artifacts locally. This mismatch can mislead users and policy systems, increasing the risk of unintended local data exposure or persistence of sensitive scraped material.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The skill claims to operate on public webpages, yet it advertises stealth features for bypassing anti-bot protections such as Cloudflare and Datadome. In practice, anti-bot defenses function as access controls, so this contradiction encourages use against targets that intentionally restrict automated access.

Intent-Code Divergence

High

Confidence: 98%
Finding: The documentation explicitly recommends bypassing anti-bot protections while later rules say never to bypass CAPTCHAs or access controls. This internal contradiction is dangerous because it normalizes evasive scraping behavior and undermines the policy safeguard in the same document.

Intent-Code Divergence

High

Confidence: 99%
Finding: The example command is labeled as bypassing anti-bot protection, directly operationalizing prohibited conduct. Concrete examples are especially risky because they lower the barrier to misuse and provide a ready-made recipe for evasive scraping against defended sites.

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill can save scraped page content and an index file to disk, but the description does not prominently warn users about this persistence. That omission can lead to accidental storage of copyrighted, private, regulated, or otherwise sensitive content on the local system.

Missing User Warnings

Medium

Confidence: 90%
Finding: The proxy and geolocation spoofing options involve sensitive operational data such as proxy credentials and browser/network identity manipulation, but there is no privacy or handling warning. Users may expose credentials in command history or misunderstand the compliance and privacy implications of spoofing features.

Natural-Language Policy Violations

High

Confidence: 97%
Finding: The skill instructs users to use stealth mode to bypass anti-bot protections despite later prohibitions on bypassing access controls. In context, this makes the skill more dangerous because its primary domain is web scraping, where anti-bot evasion materially increases the chance of unauthorized or policy-violating collection.

Natural-Language Policy Violations

High

Confidence: 99%
Finding: The command example explicitly markets stealth mode for bypassing anti-bot protection. Because it is an actionable snippet, it materially facilitates misuse and conflicts with the skill's own stated restriction against bypassing protections.

VirusTotal

59/59 vendors flagged this skill as clean.