Security audit

ScraperAPI Skills

Security checks across malware telemetry and agentic risk

Overview

This is a disclosed ScraperAPI integration bundle, but it should be used deliberately because it sends web queries, URLs, page content, and API credentials to third-party services.

Install only if you intend to use ScraperAPI as a third-party scraping provider. Do not send private URLs, URLs that carry credentials in query strings, addresses of internal systems, regulated personal data, or sensitive business content unless you have authorization and a clear data-processing basis. Review lead-enrichment, crawler, DataPipeline, webhook, and n8n scheduling workflows carefully because they can collect personal contact details, run recurring jobs, send results to webhooks, and consume ScraperAPI credits.
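One way to act on this guidance is a pre-flight check that runs before any URL is handed to the scraping provider. The following is a minimal sketch, not part of the audited bundle: the function name `is_safe_to_forward` and the `SENSITIVE_PARAMS` list are hypothetical, and the check only inspects the URL text (it does not resolve hostnames, so an internal host behind a public-looking name would pass).

```python
import ipaddress
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of query-parameter names treated as credentials.
SENSITIVE_PARAMS = {"token", "api_key", "apikey", "password", "secret", "access_token"}


def is_safe_to_forward(url: str) -> bool:
    """Return False for URLs that look private or that carry credentials."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Reject userinfo such as https://user:pass@host/
    if parts.username or parts.password:
        return False
    host = parts.hostname
    # Reject obvious internal names.
    if host == "localhost" or host.endswith((".local", ".internal")):
        return False
    # Reject literal private, loopback, or link-local IP addresses.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # Not an IP literal; DNS resolution is out of scope here.
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return False
    # Reject credential-looking query parameters.
    for name, _ in parse_qsl(parts.query, keep_blank_values=True):
        if name.lower() in SENSITIVE_PARAMS:
            return False
    return True
```

A check like this catches only the obvious cases. It does not replace authorization or a data-processing review for the content behind the URL.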

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
YARA Signatures: Malware Match, Webshell Match, Cryptominer Match

Findings (44)

Vague Triggers

Medium

Confidence: 89%
Finding: The trigger phrases include broad requests like 'get web data for my agent', 'scrape a website in my app', and 'add web scraping to my project', which can activate this skill in contexts that are not explicitly asking for ScraperAPI. Over-broad invocation increases the chance that user-supplied URLs, queries, and content are routed to a third-party service unexpectedly, creating privacy and data-handling risk.
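A narrower trigger would activate only when the request names the provider. The sketch below illustrates that idea under stated assumptions: `should_activate` and `EXPLICIT_PATTERN` are hypothetical names, and how a real agent runtime evaluates skill triggers is not described in this report.

```python
import re

# Assumption: the skill should activate only when the provider is named explicitly.
EXPLICIT_PATTERN = re.compile(r"\bscraper\s?api\b", re.IGNORECASE)


def should_activate(user_request: str) -> bool:
    """Return True only for requests that mention ScraperAPI by name."""
    return bool(EXPLICIT_PATTERN.search(user_request))
```

With this rule, 'scrape a website in my app' would not activate the skill, while 'use ScraperAPI to scrape a website' would.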

External Transmission

Medium

Category: Data Exfiltration
Content: import os, requests r = requests.get( "https://api.scraperapi.com/", params={"api_key": os.environ["SCRAPERAPI_API_KEY"], "url": "https://httpbin.org/ip"} ) print(r.status_code, r.text[:200])
Confidence: 94%
Finding: https://api.scraperapi.com/

External Transmission

Medium

Category: Data Exfiltration
Content: curl "https://api.scraperapi.com/?api_key=$SCRAPERAPI_API_KEY&url=https://example.com" # With JS rendering curl "https://api.scraperapi.com/?api_key=$SCRAPERAPI_API_KEY&url=https://example.com&render=true" # Structured data — Google SERP curl "https://api.scraperapi.com/structured/google/search?api_key=$SCRAPERAPI_API_KEY&query=web+scraping"
Confidence: 95%
Finding: https://api.scraperapi.com/

External Transmission

Medium

Category: Data Exfiltration
Content: curl "https://api.scraperapi.com/?api_key=$SCRAPERAPI_API_KEY&url=https://example.com&render=true" # Structured data — Google SERP curl "https://api.scraperapi.com/structured/google/search?api_key=$SCRAPERAPI_API_KEY&query=web+scraping" # Structured data — Amazon product curl "https://api.scraperapi.com/structured/amazon/product?api_key=$SCRAPERAPI_API_KEY&asin=B09V3KXJPB"
Confidence: 95%
Finding: https://api.scraperapi.com/

External Transmission

Medium

Category: Data Exfiltration
Content: curl "https://api.scraperapi.com/structured/google/search?api_key=$SCRAPERAPI_API_KEY&query=web+scraping" # Structured data — Amazon product curl "https://api.scraperapi.com/structured/amazon/product?api_key=$SCRAPERAPI_API_KEY&asin=B09V3KXJPB" # Submit an async job curl -X POST "https://async.scraperapi.com/jobs" \
Confidence: 95%
Finding: https://api.scraperapi.com/

External Transmission

Medium

Category: Data Exfiltration
Content: API_KEY = os.environ["SCRAPERAPI_API_KEY"] # Submit r = requests.post( "https://async.scraperapi.com/jobs", json={ "apiKey": API_KEY,
Confidence: 90%
Finding: requests.post( "https://

External Transmission

Medium

Category: Data Exfiltration
Content: API_KEY = os.environ["SCRAPERAPI_API_KEY"] # Submit r = requests.post( "https://async.scraperapi.com/jobs", json={ "apiKey": API_KEY,
Confidence: 90%
Finding: requests.post( "https://async.scraperapi.com/jobs", json=

YARA rule 'agent_skill_credential_exfiltration_webhook': AI agent skill credential harvesting followed by webhook or external exfiltration [agent_skills]

Critical

Category: YARA Match
Content: ```python import os, requests, time API_KEY = os.environ["SCRAPERAPI_API_KEY"] # Submit r = requests.post(
Confidence: 85%
Finding: os.environ["SCRAPERAPI_API_KEY"]; requests.post(; requests.post(; requests.post(; requests.post(; requests.post(; webhook.site; webhook.site
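The matched strings suggest a co-occurrence signature: a credential read, a network send, and a webhook host appearing in the same file. The sketch below is a toy approximation in Python, not the actual YARA rule (whose strings and condition are not shown in this report). It illustrates why ordinary API-usage documentation can match such a signature.

```python
import re

# Illustrative patterns only, taken from the strings listed in the finding.
CREDENTIAL_READ = re.compile(r"os\.environ\[")
NETWORK_SEND = re.compile(r"requests\.post\(")
WEBHOOK_HOST = re.compile(r"webhook\.site")


def cooccurrence_match(text: str) -> bool:
    """Return True when all three indicator patterns appear in the text."""
    return all(p.search(text) for p in (CREDENTIAL_READ, NETWORK_SEND, WEBHOOK_HOST))
```

A co-occurrence rule of this kind cannot tell whether the credential goes to its intended provider or elsewhere, so a match is a prompt for manual review rather than proof of exfiltration.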

Vague Triggers

Medium

Confidence: 91%
Finding: The trigger scope is broad enough to activate on generic research or contact-finding requests, which increases the chance the skill runs in situations where users did not explicitly intend third-party scraping or personal-data enrichment. Because the skill then performs external searches and profile synthesis automatically, over-triggering can cause unnecessary collection and disclosure of personal information.

Ssd 3

Medium

Confidence: 95%
Finding: The skill explicitly instructs the model to gather and present personal contact details, including emails, phone numbers, locations, and social profiles, into a consolidated dossier. This creates a doxxing-style enrichment workflow that can facilitate targeted phishing, harassment, or unauthorized profiling even when data is scraped from public sources.

Ssd 3

Medium

Confidence: 97%
Finding: Using an email address as a seed to derive a person's identity, employer, and related company details materially increases privacy risk because a single identifier can be expanded into a fuller profile without the subject's consent. This supports deanonymization and targeted social-engineering workflows, especially when combined with search and scraping across multiple sources.

Ssd 3

Medium

Confidence: 94%
Finding: The synthesis step turns scattered public and scraped data into a structured contact card, which lowers the effort needed to operationalize personal information for phishing, impersonation, or surveillance. Aggregation meaningfully raises the sensitivity of the data beyond the risk of any single source because it produces a ready-made intelligence profile.

Unrestricted Tool Access

Medium

Category: Excessive Agency
Content: Run searches by *what you're looking for*, not by which site to target. Google will surface whatever sources exist — company website, Crunchbase, Wikipedia, news, directories, LinkedIn, G2, etc. Collect all promising URLs from `organic_results[].link` and carry them into Phase 3. **Search tool:** Call `mcp__ScraperAPI__google_search` with `query`, `num: 10`, and `countryCode: "us"`. Read snippets carefully — they often contain the data you need without an extra fetch. ### 2a. Person name as seed
Confidence: 83%
Finding: tool:*

Vague Triggers

Medium

Confidence: 81%
Finding: The trigger text is broad enough to match common informational requests like general market questions, pricing questions, or trend questions, which can cause the agent to invoke this skill when the user did not explicitly intend external web research. Because the skill transmits user-supplied queries, URLs, and content to ScraperAPI, over-triggering increases unnecessary data exfiltration and can lead to privacy and compliance issues.

Vague Triggers

Medium

Confidence: 92%
Finding: The trigger scope is unusually broad, covering generic activities like web scraping, research, shopping, SEO, market research, and product lookup. That can cause the skill to activate for many ordinary user requests and route user queries, URLs, and page content to a third-party scraping service, increasing the chance of unnecessary data exposure, unexpected tool use, and bypass of safer default web tools.

Missing User Warnings

Medium

Confidence: 75%
Finding: The Maps guidance explicitly recommends always supplying precise latitude/longitude for local queries but provides no caution about collecting, transmitting, or retaining sensitive geolocation data. In this skill context, user-supplied queries and coordinates are sent to a third-party service, which increases privacy risk and can expose exact user or target locations unnecessarily.
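Where exact coordinates are not required, one possible mitigation is to coarsen them before they leave the machine. This is a minimal sketch under that assumption; `coarsen` is a hypothetical helper, and whether reduced precision still gives useful local results depends on the query.

```python
from typing import Tuple


def coarsen(lat: float, lon: float, decimals: int = 2) -> Tuple[float, float]:
    """Round coordinates; two decimal places is roughly 1 km of latitude."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    return round(lat, decimals), round(lon, decimals)
```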

Missing User Warnings

Medium

Confidence: 91%
Finding: The guide explicitly promotes scraping that 'handles proxy rotation, CAPTCHAs, and anti-bot measures automatically' without any caution about authorization, terms-of-service, privacy, or legal constraints. In this skill context, that omission materially increases misuse risk because the tool is positioned for broad scraping tasks and transmits user-supplied URLs and queries to a third party, making it easier for users to bypass protections without informed consent or compliance checks.

Missing User Warnings

Medium

Confidence: 89%
Finding: The documentation recommends geo-targeting and premium proxy use but does not warn that requests may traverse third-party proxy infrastructure or different jurisdictions, which can create legal, compliance, and data-handling exposure. Given the skill metadata explicitly states that user-supplied queries, URLs, and content are transmitted to ScraperAPI, this omission is more dangerous because users may unknowingly send regulated or sensitive data through external networks.

Missing User Warnings

Medium

Confidence: 91%
Finding: The webhook workflow takes user-supplied company input and sends it directly to ScraperAPI/Google Search, which is a third-party service, without any built-in notice, consent check, or data-minimization guidance in the example itself. While the skill metadata notes that user-supplied queries are transmitted externally, someone copying this workflow may expose user input to an external processor without realizing the privacy and compliance implications.
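Data minimization for this workflow could start with stripping obvious personal identifiers from the input before it is forwarded as a search query. The sketch below assumes a free-text company field; `minimize_query`, the email pattern, and the length cap are hypothetical choices, and the pattern is deliberately simple rather than a complete email grammar.

```python
import re

# Simple email pattern for illustration; not a full RFC 5322 matcher.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def minimize_query(raw: str, max_len: int = 100) -> str:
    """Drop email addresses, collapse whitespace, and cap the query length."""
    cleaned = EMAIL.sub("", raw)
    cleaned = " ".join(cleaned.split())
    return cleaned[:max_len]
```

This only reduces what is sent. A notice or consent step for the person supplying the input would still need to be handled separately.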

Missing User Warnings

Medium

Confidence: 91%
Finding: The crawler configuration requires a `crawlerCallbackUrl` where ScraperAPI will POST crawl results, which means scraped page contents and metadata are transmitted to an externally reachable endpoint. The reference documents the mechanics but does not clearly warn about this data flow, increasing the risk that users unknowingly send sensitive, proprietary, or regulated data off-platform or to an exposed webhook.
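Before a crawl job is submitted, the callback destination can be checked against hosts the operator controls. This is a minimal sketch: `ALLOWED_CALLBACK_HOSTS` and `is_allowed_callback` are hypothetical, and `hooks.example.com` is a placeholder host, not an endpoint from the audited bundle.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with hosts you control.
ALLOWED_CALLBACK_HOSTS = {"hooks.example.com"}


def is_allowed_callback(url: str) -> bool:
    """Accept only HTTPS callback URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_CALLBACK_HOSTS
```

An allowlist keeps crawl results from being posted to public request-inspection services or to endpoints outside the operator's control.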

External Transmission

Medium

Category: Data Exfiltration
Content: - Node listing: `https://n8n.io/integrations/scraperapi/` - Price-drop digest template (page): `https://n8n.io/workflows/15609-send-daily-price-drop-digest-emails-for-amazon-walmart-and-google-via-scraperapi/` - Price-drop digest template (API, returns full workflow JSON): `https://api.n8n.io/api/templates/workflows/15609` - npm package: `https://www.npmjs.com/package/n8n-nodes-scraperapi-official` - ScraperAPI docs: `https://docs.scraperapi.com/` - Dashboard: `https://dashboard.scraperapi.com/`
Confidence: 87%
Finding: https://api.n8n.io/

Env Variable Harvesting

High

Category: Data Exfiltration
Content: import os, requests from scraperapi_sdk import ScraperAPIClient client = ScraperAPIClient(os.environ["SCRAPERAPI_API_KEY"]) def scrape(url, params=None): try:
Confidence: 80%
Finding: os.environ["SCRAPERAPI_API_KEY"]

Tainted flow: 'SCRAPERAPI_API_KEY' from os.environ.get (line 26, credential/environment) → requests.get (network output)

Critical

Category: Data Flow
Content: """Search Google via ScraperAPI structured endpoint.""" print(f" Searching: {query!r}") try: resp = requests.get( "https://api.scraperapi.com/structured/google/search", params={ "api_key": SCRAPERAPI_API_KEY,
Confidence: 90%
Finding: resp = requests.get( "https://api.scraperapi.com/structured/google/search", params={ "api_key": SCRAPERAPI_API_KEY, "query": query,

Tainted flow: 'SCRAPERAPI_API_KEY' from os.environ.get (line 26, credential/environment) → requests.get (network output)

Critical

Category: Data Flow
Content: if _should_skip(url): return None try: resp = requests.get( "https://api.scraperapi.com/", params={ "api_key": SCRAPERAPI_API_KEY,
Confidence: 90%
Finding: resp = requests.get( "https://api.scraperapi.com/", params={ "api_key": SCRAPERAPI_API_KEY, "url": url, "output_format":
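Sending the key to ScraperAPI is the intended use, but because it travels as a query parameter it also appears in the final request URL (for example `resp.url` in `requests`), which can end up in logs or error messages. The sketch below shows one way to redact it before logging; `redact_api_key` is a hypothetical helper, not part of the audited scripts.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def redact_api_key(url: str, param: str = "api_key") -> str:
    """Return the URL with the value of the named query parameter masked."""
    parts = urlsplit(url)
    query = [
        (name, "REDACTED" if name == param else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # urlencode percent-encodes values, so the nested target URL is re-escaped.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A logging call would then use `redact_api_key(resp.url)` instead of the raw URL.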

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill clearly requires environment variables, writes output files, and performs network requests, yet it does not declare explicit permissions for those capabilities. This creates a transparency and policy-enforcement gap: users or orchestrators may invoke the skill without understanding that sensitive queries and scraped content will be transmitted externally and persisted locally.

VirusTotal

58/58 vendors flagged this plugin as clean.


Static analysis

No suspicious patterns detected.