Security audit

ScraperAPI Skills

Security checks across malware telemetry and agentic risk

Overview

This is a disclosed ScraperAPI integration bundle, but it should be used deliberately because it sends web queries, URLs, page content, and API credentials to third-party services.

Install only if you intend to use ScraperAPI as a third-party scraping provider. Do not send private URLs, URLs that carry credentials in their query strings, addresses of internal systems, regulated personal data, or sensitive business content unless you have authorization and a clear data-processing basis. Review the lead-enrichment, crawler, DataPipeline, webhook, and n8n scheduling workflows carefully, because they can collect personal contact details, run recurring jobs, send results to webhooks, and consume ScraperAPI credits.
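
One way to act on this advice is a pre-flight check that runs before any URL is handed to the scraping provider. The sketch below is illustrative only: the function name `is_safe_to_forward`, the `SENSITIVE_PARAMS` set, and the `.local`/`.internal` suffix rule are assumptions, not part of the plugin. It does not resolve DNS, so a public-looking hostname that points at an internal address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of query-parameter names that often carry secrets.
SENSITIVE_PARAMS = {"token", "api_key", "apikey", "key", "password", "secret", "access_token"}


def is_safe_to_forward(url: str) -> bool:
    """Return True only if the URL looks public and carries no obvious secrets."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Reject embedded credentials such as https://user:pass@host/.
    if parts.username or parts.password:
        return False
    host = parts.hostname.lower()
    # Reject obviously internal names (assumed naming conventions).
    if host == "localhost" or host.endswith((".local", ".internal")):
        return False
    # Reject literal private, loopback, and link-local IP addresses.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # Not an IP literal; treat it as a hostname.
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return False
    # Reject query strings that carry secret-looking parameter names.
    for name, _ in parse_qsl(parts.query, keep_blank_values=True):
        if name.lower() in SENSITIVE_PARAMS:
            return False
    return True
```

A caller would skip, or ask for confirmation on, any URL for which the check returns `False`, rather than silently forwarding it.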

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • YARA Signatures: Malware Match, Webshell Match, Cryptominer Match
Findings (44)

Vague Triggers

Medium
Confidence
89% confidence
Finding
The trigger phrases include broad requests like 'get web data for my agent', 'scrape a website in my app', and 'add web scraping to my project', which can activate this skill in contexts that are not explicitly asking for ScraperAPI. Over-broad invocation increases the chance that user-supplied URLs, queries, and content are routed to a third-party service unexpectedly, creating privacy and data-handling risk.
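
A narrower trigger could require the user to name the provider explicitly. The following is a minimal sketch of that idea, not the skill's actual activation logic; `should_activate` and `PROVIDER_PATTERN` are hypothetical names, and a real skill host may match triggers in a different way.

```python
import re

# Assumption: the skill should activate only when the provider is named explicitly.
PROVIDER_PATTERN = re.compile(r"\bscraper\s?api\b", re.IGNORECASE)


def should_activate(user_request: str) -> bool:
    """Return True only when the request mentions ScraperAPI by name."""
    return bool(PROVIDER_PATTERN.search(user_request))
```

Under this rule the three generic phrases quoted in the finding would not activate the skill, while a request such as "Use ScraperAPI to fetch this page" would.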

External Transmission

Medium
Category
Data Exfiltration
Content
import os, requests

r = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": os.environ["SCRAPERAPI_API_KEY"], "url": "https://httpbin.org/ip"}
)
print(r.status_code, r.text[:200])
Confidence
94% confidence
Finding
https://api.scraperapi.com/

External Transmission

Medium
Category
Data Exfiltration
Content
curl "https://api.scraperapi.com/?api_key=$SCRAPERAPI_API_KEY&url=https://example.com"

# With JS rendering
curl "https://api.scraperapi.com/?api_key=$SCRAPERAPI_API_KEY&url=https://example.com&render=true"

# Structured data — Google SERP
curl "https://api.scraperapi.com/structured/google/search?api_key=$SCRAPERAPI_API_KEY&query=web+scraping"
Confidence
95% confidence
Finding
https://api.scraperapi.com/

External Transmission

Medium
Category
Data Exfiltration
Content
curl "https://api.scraperapi.com/?api_key=$SCRAPERAPI_API_KEY&url=https://example.com&render=true"

# Structured data — Google SERP
curl "https://api.scraperapi.com/structured/google/search?api_key=$SCRAPERAPI_API_KEY&query=web+scraping"

# Structured data — Amazon product
curl "https://api.scraperapi.com/structured/amazon/product?api_key=$SCRAPERAPI_API_KEY&asin=B09V3KXJPB"
Confidence
95% confidence
Finding
https://api.scraperapi.com/

External Transmission

Medium
Category
Data Exfiltration
Content
curl "https://api.scraperapi.com/structured/google/search?api_key=$SCRAPERAPI_API_KEY&query=web+scraping"

# Structured data — Amazon product
curl "https://api.scraperapi.com/structured/amazon/product?api_key=$SCRAPERAPI_API_KEY&asin=B09V3KXJPB"

# Submit an async job
curl -X POST "https://async.scraperapi.com/jobs" \
Confidence
95% confidence
Finding
https://api.scraperapi.com/

External Transmission

Medium
Category
Data Exfiltration
Content
API_KEY = os.environ["SCRAPERAPI_API_KEY"]

# Submit
r = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": API_KEY,
Confidence
90% confidence
Finding
requests.post( "https://

External Transmission

Medium
Category
Data Exfiltration
Content
API_KEY = os.environ["SCRAPERAPI_API_KEY"]

# Submit
r = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": API_KEY,
Confidence
90% confidence
Finding
requests.post( "https://async.scraperapi.com/jobs", json=

YARA rule 'agent_skill_credential_exfiltration_webhook': AI agent skill credential harvesting followed by webhook or external exfiltration [agent_skills]

Critical
Category
YARA Match
Content
import os, requests, time

API_KEY = os.environ["SCRAPERAPI_API_KEY"]

# Submit
r = requests.post(
Confidence
85% confidence
Finding
os.environ["SCRAPERAPI_API_KEY"]; requests.post(; requests.post(; requests.post(; requests.post(; requests.post(; webhook.site; webhook.site

Vague Triggers

Medium
Confidence
91% confidence
Finding
The trigger scope is broad enough to activate on generic research or contact-finding requests, which increases the chance the skill runs in situations where users did not explicitly intend third-party scraping or personal-data enrichment. Because the skill then performs external searches and profile synthesis automatically, over-triggering can cause unnecessary collection and disclosure of personal information.

Ssd 3

Medium
Confidence
95% confidence
Finding
The skill explicitly instructs the model to gather and present personal contact details, including emails, phone numbers, locations, and social profiles, into a consolidated dossier. This creates a doxxing-style enrichment workflow that can facilitate targeted phishing, harassment, or unauthorized profiling even when data is scraped from public sources.

Ssd 3

Medium
Confidence
97% confidence
Finding
Using an email address as a seed to derive a person's identity, employer, and related company details materially increases privacy risk because a single identifier can be expanded into a fuller profile without the subject's consent. This supports deanonymization and targeted social-engineering workflows, especially when combined with search and scraping across multiple sources.

Ssd 3

Medium
Confidence
94% confidence
Finding
The synthesis step turns scattered public and scraped data into a structured contact card, which lowers the effort needed to operationalize personal information for phishing, impersonation, or surveillance. Aggregation meaningfully raises the sensitivity of the data beyond the risk of any single source because it produces a ready-made intelligence profile.

Unrestricted Tool Access

Medium
Category
Excessive Agency
Content
Run searches by *what you're looking for*, not by which site to target. Google will surface whatever sources exist — company website, Crunchbase, Wikipedia, news, directories, LinkedIn, G2, etc. Collect all promising URLs from `organic_results[].link` and carry them into Phase 3.

**Search tool:** Call `mcp__ScraperAPI__google_search` with `query`, `num: 10`, and `countryCode: "us"`. Read snippets carefully — they often contain the data you need without an extra fetch.

### 2a. Person name as seed
Confidence
83% confidence
Finding
tool:*

Vague Triggers

Medium
Confidence
81% confidence
Finding
The trigger text is broad enough to match common informational requests like general market questions, pricing questions, or trend questions, which can cause the agent to invoke this skill when the user did not explicitly intend external web research. Because the skill transmits user-supplied queries, URLs, and content to ScraperAPI, over-triggering increases unnecessary data exfiltration and can lead to privacy and compliance issues.

Vague Triggers

Medium
Confidence
92% confidence
Finding
The trigger scope is unusually broad, covering generic activities like web scraping, research, shopping, SEO, market research, and product lookup. That can cause the skill to activate for many ordinary user requests and route user queries, URLs, and page content to a third-party scraping service, increasing the chance of unnecessary data exposure, unexpected tool use, and bypass of safer default web tools.

Missing User Warnings

Medium
Confidence
75% confidence
Finding
The Maps guidance explicitly recommends always supplying precise latitude/longitude for local queries but provides no caution about collecting, transmitting, or retaining sensitive geolocation data. In this skill context, user-supplied queries and coordinates are sent to a third-party service, which increases privacy risk and can expose exact user or target locations unnecessarily.
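
Where exact coordinates are not needed, one possible mitigation is to reduce their precision before they leave the machine. The helper below is a sketch under that assumption; `coarsen_coordinates` is a hypothetical name, and the default of two decimal places (roughly 1.1 km of latitude) is an illustrative choice that trades result accuracy for privacy.

```python
from typing import Tuple


def coarsen_coordinates(lat: float, lon: float, decimals: int = 2) -> Tuple[float, float]:
    """Round a latitude/longitude pair to fewer decimal places."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    return round(lat, decimals), round(lon, decimals)
```

For example, `coarsen_coordinates(40.748817, -73.985428)` yields `(40.75, -73.99)`, which identifies a neighbourhood rather than a building.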

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The guide explicitly promotes scraping that 'handles proxy rotation, CAPTCHAs, and anti-bot measures automatically' without any caution about authorization, terms-of-service, privacy, or legal constraints. In this skill context, that omission materially increases misuse risk because the tool is positioned for broad scraping tasks and transmits user-supplied URLs and queries to a third party, making it easier for users to bypass protections without informed consent or compliance checks.

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The documentation recommends geo-targeting and premium proxy use but does not warn that requests may traverse third-party proxy infrastructure or different jurisdictions, which can create legal, compliance, and data-handling exposure. Given the skill metadata explicitly states that user-supplied queries, URLs, and content are transmitted to ScraperAPI, this omission is more dangerous because users may unknowingly send regulated or sensitive data through external networks.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The webhook workflow takes user-supplied company input and sends it directly to ScraperAPI/Google Search, which is a third-party service, without any built-in notice, consent check, or data-minimization guidance in the example itself. While the skill metadata notes that user-supplied queries are transmitted externally, someone copying this workflow may expose user input to an external processor without realizing the privacy and compliance implications.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The crawler configuration requires a `crawlerCallbackUrl` where ScraperAPI will POST crawl results, which means scraped page contents and metadata are transmitted to an externally reachable endpoint. The reference documents the mechanics but does not clearly warn about this data flow, increasing the risk that users unknowingly send sensitive, proprietary, or regulated data off-platform or to an exposed webhook.
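
Because crawl results are POSTed to whatever endpoint is configured, it can help to validate the callback URL before submitting a job. The sketch below assumes a small allowlist of hosts that you control; `ALLOWED_CALLBACK_HOSTS`, `hooks.example.com`, and `validate_callback_url` are placeholders for illustration and are not part of the ScraperAPI crawler configuration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with hosts you operate and monitor.
ALLOWED_CALLBACK_HOSTS = {"hooks.example.com"}


def validate_callback_url(url: str) -> str:
    """Return the URL unchanged if it is HTTPS and targets an allowlisted host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("callback URL must use HTTPS")
    if parts.hostname not in ALLOWED_CALLBACK_HOSTS:
        raise ValueError(f"callback host {parts.hostname!r} is not allowlisted")
    return url
```

With this check in place, a public request-inspection endpoint would be rejected unless someone deliberately adds its host to the allowlist.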

External Transmission

Medium
Category
Data Exfiltration
Content
- Node listing: `https://n8n.io/integrations/scraperapi/`
- Price-drop digest template (page): `https://n8n.io/workflows/15609-send-daily-price-drop-digest-emails-for-amazon-walmart-and-google-via-scraperapi/`
- Price-drop digest template (API, returns full workflow JSON): `https://api.n8n.io/api/templates/workflows/15609`
- npm package: `https://www.npmjs.com/package/n8n-nodes-scraperapi-official`
- ScraperAPI docs: `https://docs.scraperapi.com/`
- Dashboard: `https://dashboard.scraperapi.com/`
Confidence
87% confidence
Finding
https://api.n8n.io/

Env Variable Harvesting

High
Category
Data Exfiltration
Content
import os, requests
from scraperapi_sdk import ScraperAPIClient

client = ScraperAPIClient(os.environ["SCRAPERAPI_API_KEY"])

def scrape(url, params=None):
    try:
Confidence
80% confidence
Finding
os.environ["SCRAPERAPI_API_KEY"]

Tainted flow: 'SCRAPERAPI_API_KEY' from os.environ.get (line 26, credential/environment) → requests.get (network output)

Critical
Category
Data Flow
Content
"""Search Google via ScraperAPI structured endpoint."""
    print(f"  Searching: {query!r}")
    try:
        resp = requests.get(
            "https://api.scraperapi.com/structured/google/search",
            params={
                "api_key": SCRAPERAPI_API_KEY,
Confidence
90% confidence
Finding
resp = requests.get( "https://api.scraperapi.com/structured/google/search", params={ "api_key": SCRAPERAPI_API_KEY, "query": query,

Tainted flow: 'SCRAPERAPI_API_KEY' from os.environ.get (line 26, credential/environment) → requests.get (network output)

Critical
Category
Data Flow
Content
if _should_skip(url):
        return None
    try:
        resp = requests.get(
            "https://api.scraperapi.com/",
            params={
                "api_key": SCRAPERAPI_API_KEY,
Confidence
90% confidence
Finding
resp = requests.get( "https://api.scraperapi.com/", params={ "api_key": SCRAPERAPI_API_KEY, "url": url, "output_format":
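
In both flagged excerpts the key travels as a query parameter, so the full request URL (for example `resp.url` in `requests`) contains the credential. If such URLs are ever logged or echoed, masking the key first limits accidental disclosure. This is a minimal sketch of that masking step; `redact_api_key` is a hypothetical helper, and it does not change what is sent to ScraperAPI.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_api_key(url: str, param: str = "api_key") -> str:
    """Return the URL with the value of the given query parameter masked."""
    parts = urlsplit(url)
    # Rebuild the query string, replacing only the credential's value.
    query = [
        (name, "REDACTED" if name == param else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A logging call would then use `redact_api_key(resp.url)` instead of the raw URL.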

Lp3

Medium
Category
MCP Least Privilege
Confidence
92% confidence
Finding
The skill clearly requires environment variables, writes output files, and performs network requests, yet it does not declare explicit permissions for those capabilities. This creates a transparency and policy-enforcement gap: users or orchestrators may invoke the skill without understanding that sensitive queries and scraped content will be transmitted externally and persisted locally.

VirusTotal

58/58 vendors flagged this plugin as clean.

Static analysis

No suspicious patterns detected.