Security audit

Webcrawler Deep Crawl

Security checks across malware telemetry and agentic risk

Overview

This skill is a useful web crawler, but it asks for broad authenticated crawling and includes rate-limit evasion guidance that users should review before installing.

Install only if you are comfortable with a crawler that can use your active browser session, write crawled content to disk, and potentially capture authenticated pages. Use it on sites you own or are allowed to crawl, set tight page/depth/glob limits, avoid logged-in sessions unless necessary, and do not use the stealth multi-session guidance to bypass site controls.
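The page, depth, and glob limits recommended above can be pictured as a single scope check applied before each URL is queued. This is a minimal sketch under stated assumptions, not the skill's actual configuration: the `limits` object, its field names, the example origin, and the helper names `globToRegExp` and `shouldCrawl` are all hypothetical.

```javascript
// Hypothetical limits; names and values are illustrative, not the skill's own settings.
const limits = {
  maxPages: 50,
  maxDepth: 2,
  allowedOrigin: 'https://docs.example.com',
  includeGlobs: ['/guide/*', '/api/**'],
};

// Convert a simple glob to an anchored RegExp:
// "**" matches any characters, "*" matches any characters except "/".
function globToRegExp(glob) {
  const pattern = glob
    .split('**')
    .map((part) =>
      part.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '[^/]*'))
    .join('.*');
  return new RegExp(`^${pattern}$`);
}

// Return true only when the URL is inside every configured limit.
function shouldCrawl(rawUrl, depth, pagesSoFar, cfg = limits) {
  if (pagesSoFar >= cfg.maxPages || depth > cfg.maxDepth) return false;
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URLs are never crawled
  }
  if (url.origin !== cfg.allowedOrigin) return false;
  return cfg.includeGlobs.some((g) => globToRegExp(g).test(url.pathname));
}
```

Checking the origin before the globs keeps the crawl from drifting onto other hosts, and rejecting unparseable URLs means the check fails closed.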

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (9)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill clearly directs the agent to perform network-capable actions such as fetching llms.txt, sitemap.xml, and crawling pages, yet it declares no permissions. This creates a transparency and policy-enforcement gap: operators may invoke the skill without realizing it performs broad outbound requests and persistent collection, undermining least-privilege controls and auditability.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The document states the skill does not handle login flows, but earlier instructions tell the agent to assist with login when needed. That contradiction can cause the agent to enter authenticated areas and then crawl and persist private account content, while operators rely on a misleading limitation statement.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The instruction to open multiple stealth browser sessions with independent fingerprints is an evasion technique that goes beyond normal crawling. It can be used to bypass anti-bot controls, rate limits, and site defenses, increasing the likelihood of unauthorized large-scale scraping and making the skill materially more dangerous in context.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The generated JavaScript performs a fetch to a user-supplied origin with `credentials: 'include'`, causing the browser to attach ambient cookies or HTTP auth for that site. In a website-crawling skill, this can unintentionally access authenticated `/llms.txt` content or disclose the user's logged-in context to an arbitrary target origin without clear consent, expanding the crawl beyond public data.
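One possible mitigation is to build the index request with `credentials: 'omit'` by default and send the browser session only on explicit opt-in. The sketch below is an assumption about how that could look, not the skill's generated code: `buildIndexRequest`, `fetchIndex`, and the `useSession` option are hypothetical names, while `credentials` and `referrerPolicy` are standard `fetch` options.

```javascript
// Build the request for a site's index file (default: /llms.txt).
// Hypothetical helper; credentials are omitted unless the caller opts in.
function buildIndexRequest(siteUrl, { path = '/llms.txt', useSession = false } = {}) {
  const origin = new URL(siteUrl).origin; // drop any path, query, or fragment
  return {
    url: new URL(path, origin).href,
    init: {
      // 'omit' sends no cookies or HTTP auth; 'include' requires explicit opt-in.
      credentials: useSession ? 'include' : 'omit',
      referrerPolicy: 'no-referrer',
    },
  };
}

// Fetch the index text, or return null when the response is not OK.
async function fetchIndex(siteUrl, options) {
  const { url, init } = buildIndexRequest(siteUrl, options);
  const res = await fetch(url, init);
  return res.ok ? res.text() : null;
}
```

Keeping the request construction separate from the network call makes the default easy to inspect and test without contacting the target site.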

Missing User Warnings

Medium

Confidence: 89%
Finding: The skill contemplates crawling pages when the browser is already logged in, but it does not clearly warn users that authenticated pages may contain private account, tenant, or customer data that will be extracted and written to disk. In this context, the omission is significant because the crawler is designed for breadth-first bulk collection and persistence.

Missing User Warnings

Medium

Confidence: 88%
Finding: The skill instructs the agent to create crawl_state.json, a pages directory, and optionally a combined dataset, but it lacks a clear user-facing disclosure that potentially large amounts of page content and metadata will be written to the working directory. This can lead to unintentional retention of proprietary, personal, or regulated data, especially during deep crawls or authenticated sessions.

Missing User Warnings

Medium

Confidence: 92%
Finding: This code silently issues a cross-origin request with included credentials to a URL derived from user input, but provides no warning that browser credentials may be sent. That is risky in this skill because users are likely to supply arbitrary sites for crawling, so the skill may probe third-party origins as the current logged-in user and retrieve private index data unexpectedly.

Missing User Warnings

Medium

Confidence: 96%
Finding: The generated JavaScript performs `fetch(u, { credentials: 'include' })`, which causes browser cookies and other ambient credentials to be sent with sitemap requests, including cross-origin ones where the browser's cookie rules allow it. In a deep-crawl skill, this can unintentionally access authenticated or internal sitemap content, or disclose the user's logged-in context without clear consent, expanding the crawl surface beyond public data.

Missing User Warnings

Medium

Confidence: 92%
Finding: The script always includes page metadata and `document.referrer` in its JSON output, which can expose sensitive browsing context, internal URLs, query parameters, or access-token-bearing referrers without any minimization or explicit opt-in. In a deep-crawl skill that is meant to bulk extract website content for downstream indexing or LLM ingestion, this increases the chance that private or unnecessary context is collected and propagated into logs, datasets, or vector stores.
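A minimization step along the following lines would address this: leave the referrer out unless the operator opts in, and strip query strings and fragments when it is kept. This is a sketch under assumptions, not the skill's extraction script; `minimizeUrl`, `buildPageRecord`, the `includeReferrer` flag, and the record field names are all hypothetical.

```javascript
// Reduce a URL to origin + path so query parameters, fragments, and
// embedded credentials (which may carry tokens) are not persisted.
function minimizeUrl(raw) {
  if (!raw) return '';
  try {
    const u = new URL(raw);
    return u.origin + u.pathname;
  } catch {
    return ''; // unparseable values are dropped rather than stored verbatim
  }
}

// Build the JSON record for one crawled page.
// `page` is a plain object standing in for values read from the document,
// e.g. { url: location.href, title: document.title, referrer: document.referrer }.
function buildPageRecord(page, { includeReferrer = false } = {}) {
  const record = { url: page.url, title: page.title, text: page.text };
  if (includeReferrer) {
    record.referrer = minimizeUrl(page.referrer);
  }
  return record;
}
```

With the default options the referrer never reaches the output, so downstream logs, datasets, and vector stores receive it only when someone has chosen to include it.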

VirusTotal

57/57 vendors flagged this skill as clean.

Static analysis

No suspicious patterns detected.