Smart Web Scraper

Security checks across malware telemetry and agentic risk

Overview

This is a straightforward web scraping skill whose network access, crawling, and optional file output are disclosed and aligned with its purpose.

Install only if you need a general-purpose scraper. Use it only on sites you are authorized to scrape, and keep robots.txt enforcement enabled unless you have a clear, lawful reason to disable it. Avoid collecting personal or sensitive data without consent or another valid basis, and choose output paths carefully because existing files may be overwritten.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (6)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 89%
Finding: The skill advertises and demonstrates network access and file output, but the metadata declares no permissions. That creates a transparency and governance gap: an agent or reviewer may approve or invoke the skill without understanding that it can fetch arbitrary URLs and write scraped data to disk, including potentially sensitive content.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 86%
Finding: The description frames the skill as a simple structured extractor, but the documented behavior is broader: crawling multiple pages, enumerating links, analyzing page structure, supporting extra output modes, and optionally bypassing robots.txt. That mismatch can mislead users and policy systems about the operational scope, increasing the risk of unintended reconnaissance, overcollection, or use in contexts that would have required stricter review.

Context-Inappropriate Capability

High

Confidence: 95%
Finding: The crawl command includes an explicit --ignore-robots option that allows the tool to bypass robots.txt restrictions and continue scraping content the site operator has disallowed for automated access. In an agent setting, this increases the risk of unauthorized data collection, policy evasion, and use of the skill for abusive scraping beyond the user's stated needs.
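For illustration only, the check that --ignore-robots would skip can be sketched with Python's standard urllib.robotparser module. This is a minimal sketch and not the skill's own code, which this report does not show. The helper name is_allowed, the SAMPLE_ROBOTS text, and the "example-bot" user agent are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: parses robots.txt content that was already
    downloaded, so this sketch needs no network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed sample rules, not taken from any real site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.com/public/page",
        "https://example.com/private/data",
    ):
        verdict = "fetch" if is_allowed(SAMPLE_ROBOTS, "example-bot", url) else "skip"
        print(f"{verdict}: {url}")
```

A crawler that keeps enforcement enabled would run a check like this before each request and skip disallowed URLs.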

Missing User Warnings

Medium

Confidence: 78%
Finding: The README promotes scraping, contact extraction, and multi-page crawling without warning users about privacy, sensitive-data handling, or legal/ethical boundaries. In an agent context, this omission can normalize collection of personal or restricted data at scale and increase the chance of misuse, especially when the skill advertises lead generation and broad website extraction.

Missing User Warnings

Medium

Confidence: 83%
Finding: The skill lacks an explicit warning that scraping may collect personal or sensitive data and that results can be saved to files. In context, this matters because the skill supports arbitrary URLs, structured extraction, crawling, and output-to-file, which can facilitate silent collection or local persistence of sensitive content without adequate user awareness.

Missing User Warnings

Medium

Confidence: 89%
Finding: The output path is fully user-controlled and the code writes directly to that path without confirmation, path restrictions, or overwrite protection. In an agent environment, this can overwrite arbitrary local files or place scraped content in sensitive locations, causing data loss or unsafe modification of the host workspace.
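One possible mitigation, sketched here under stated assumptions rather than taken from the skill, is to confine output to a base directory and refuse to overwrite existing files unless explicitly asked. The function name safe_write and its parameters are hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def safe_write(output_path: str, data: str, base_dir: str,
               overwrite: bool = False) -> Path:
    """Write data under base_dir, rejecting escapes and silent overwrites.

    Hypothetical helper, not part of the scanned skill.
    """
    base = Path(base_dir).resolve()
    # Resolving collapses ".." segments; an absolute output_path replaces base.
    target = (base / output_path).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"output path escapes {base}: {output_path}")

    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of truncating an existing file.
    mode = "w" if overwrite else "x"
    with open(target, mode, encoding="utf-8") as handle:
        handle.write(data)
    return target
```

With this shape, a call such as safe_write("../notes.txt", data, "scrape_output") raises ValueError, and a second write to the same path raises FileExistsError unless overwrite=True is passed.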

VirusTotal

66/66 vendors flagged this skill as clean.
