Bright Data Best Practices

v1.0.0

Build production-ready Bright Data integrations with best practices baked in. Reference documentation for developers using coding assistants (Claude Code, Cu...

by Meir Kadosh (@meirk-brd)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for meirk-brd/brightdata-bright-data-best-practices.

Prompt preview (Install & Setup):
Install the skill "Bright Data Best Practices" (meirk-brd/brightdata-bright-data-best-practices) from ClawHub.
Skill page: https://clawhub.ai/meirk-brd/brightdata-bright-data-best-practices
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install brightdata-bright-data-best-practices

ClawHub CLI


npx clawhub@latest install brightdata-bright-data-best-practices
Security Scan
Capability signals
Crypto · Requires wallet · Can make purchases · Requires OAuth token · Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Suspicious
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name and description match the included content: comprehensive Bright Data API/CLI references (Web Unlocker, SERP API, Web Scraper API, Browser API). The examples and parameters in the files align with a developer-facing best-practices guide; the requested binaries/envs in the registry are minimal (none) but the content itself legitimately discusses Bright Data credentials and CLI usage.
Instruction Scope
SKILL.md and the referenced docs provide example code, CLI usage, env-var patterns, and paths to Bright Data CLI credentials (~/.config/brightdata-cli/... etc.). The instructions do not tell the agent to scan unrelated system files or call unexpected external endpoints: all network endpoints are Bright Data domains (api.brightdata.com, brd.superproxy.io). The docs do show webhook/notify and external storage (S3/GCS) options for async jobs — legitimate for production flows but worth noting because those features can cause data to be pushed to third-party endpoints if misconfigured.
Install Mechanism
Instruction-only skill with no install spec and no code files — lowest install risk. The CLI install paths described in docs (npm install -g @brightdata/cli, curl installer, npx) are standard for the vendor; the skill itself does not perform any install.
Credentials
The documentation explicitly references multiple secrets and credential formats (BRIGHTDATA_API_KEY, BRIGHTDATA_UNLOCKER_ZONE, BRIGHTDATA_SERP_ZONE, BROWSER_AUTH, zone passwords, proxy username strings). However, the registry metadata lists no required environment variables or primary credential. That mismatch is a usability/security concern: the skill will expect or instruct use of secrets but does not declare them up-front. Ensure you only provide appropriately-scoped/ephemeral Bright Data keys and understand where credentials will be used (CLI or direct REST/proxy).
Persistence & Privilege
always: false and disable-model-invocation: false (default) — the skill is not force-included and can be invoked autonomously by the agent (normal). The skill does not request persistent system-wide changes or access to other skills' configurations.
Assessment
This package is a documentation/reference skill for Bright Data and appears internally consistent with that purpose. Before installing or allowing it to run autonomously, consider:

1. Credential handling: the docs reference BRIGHTDATA_API_KEY, BROWSER_AUTH, and zone passwords, but the registry metadata didn't declare any required env vars. Only provide credentials if you understand how the agent will use them; prefer scoped/ephemeral API keys and rotate them after testing.
2. Use the vendor CLI login (bdata login) when possible instead of embedding long-lived API keys in env vars or files.
3. Be careful with async/webhook/output features: they can send scraped data to external URLs or cloud storage (S3/GCS). Validate webhook URLs and storage configs before enabling.
4. Monitor network activity and billing (Bright Data services are billable); enable quotas/expiration on keys where possible.
5. If you don't trust the skill owner (source unknown, no homepage), review the exact agent prompts/interactions it will perform and avoid granting broad credentials until you can verify provenance.

Like a lobster shell, security has layers — review code before you run it.

latest: vk979j0ef98khkwn6c9kq4t0fcd85pcyv
41 downloads · 0 stars · 1 version
Updated 6h ago
v1.0.0 · MIT-0

CLI Setup Reference

Install, authentication, and troubleshooting for the Bright Data CLI (bdata) are documented in a single canonical place:

references/cli-setup.md

Consult it before any task that shells out to bdata.

Bright Data APIs

Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.

Choosing the Right API

| Use Case | API | Why |
|---|---|---|
| Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest |
| Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data |
| Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed |
| Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation |
| Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |

Authentication Pattern (All APIs)

All APIs share the same authentication model. The env vars below apply to direct REST API integrations — if you are using the bdata CLI, bdata login handles all of these automatically (see references/cli-setup.md).

export BRIGHTDATA_API_KEY="your-api-key"         # From Control Panel > Account Settings
export BRIGHTDATA_UNLOCKER_ZONE="zone-name"       # Web Unlocker zone name
export BRIGHTDATA_SERP_ZONE="serp-zone-name"      # SERP API zone name
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD"  # Browser API credentials

REST API authentication header for Web Unlocker and SERP API:

Authorization: Bearer YOUR_API_KEY
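
Reading the key from the environment and failing fast when it is missing avoids sending requests with an empty token. A minimal sketch (`make_headers` is our own helper name, not part of any Bright Data SDK):

```python
import os

def make_headers() -> dict:
    """Build the Authorization header used by Web Unlocker and SERP API.

    Hypothetical helper: reads BRIGHTDATA_API_KEY from the environment
    and raises if it is unset, instead of sending an empty bearer token.
    """
    api_key = os.environ.get("BRIGHTDATA_API_KEY")
    if not api_key:
        raise RuntimeError("BRIGHTDATA_API_KEY is not set")
    return {"Authorization": f"Bearer {api_key}"}

os.environ["BRIGHTDATA_API_KEY"] = "example-key"  # demo value only
print(make_headers())  # {'Authorization': 'Bearer example-key'}
```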

Web Unlocker API

HTTP-based scraping proxy. Best for simple page fetches without browser interaction.

Endpoint: POST https://api.brightdata.com/request

import os
import requests

API_KEY = os.environ["BRIGHTDATA_API_KEY"]  # see Authentication Pattern above

response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_ZONE_NAME",
        "url": "https://example.com/product/123",
        "format": "raw"
    }
)
html = response.text

Key Parameters

| Parameter | Type | Description |
|---|---|---|
| zone | string | Zone name (required) |
| url | string | Target URL with http:// or https:// (required) |
| format | string | "raw" (HTML) or "json" (structured wrapper) (required) |
| method | string | HTTP verb, default "GET" |
| country | string | 2-letter ISO code for geo-targeting (e.g., "us", "de") |
| data_format | string | Transform: "markdown" or "screenshot" |
| async | boolean | true for async mode |

Quick Patterns

# Get markdown (best for LLM input)
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
)

# Geo-targeted request
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}

# Screenshot for debugging
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}

# Async for bulk processing
json={"zone": ZONE, "url": url, "format": "raw", "async": True}
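
All of the quick patterns share one request shape, so malformed combinations can be caught before they reach the billable endpoint. A sketch of a payload builder, assuming only the parameters from the table above (`build_unlocker_payload` is a hypothetical helper, not a Bright Data API):

```python
from typing import Optional

VALID_DATA_FORMATS = {None, "markdown", "screenshot"}

def build_unlocker_payload(zone: str, url: str, *, fmt: str = "raw",
                           country: Optional[str] = None,
                           data_format: Optional[str] = None,
                           use_async: bool = False) -> dict:
    """Assemble the JSON body for POST https://api.brightdata.com/request,
    rejecting values the parameter table does not allow."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include http:// or https://")
    if fmt not in ("raw", "json"):
        raise ValueError('format must be "raw" or "json"')
    if data_format not in VALID_DATA_FORMATS:
        raise ValueError('data_format must be "markdown" or "screenshot"')
    payload = {"zone": zone, "url": url, "format": fmt}
    if country:
        payload["country"] = country
    if data_format:
        payload["data_format"] = data_format
    if use_async:
        payload["async"] = True
    return payload

print(build_unlocker_payload("my_zone", "https://example.com", country="de"))
```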

Critical rule: Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.

See references/web-unlocker.md for complete reference including proxy interface, special headers, async flow, features, and billing.


SERP API

Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.

Endpoint: POST https://api.brightdata.com/request (same as Web Unlocker)

response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_SERP_ZONE",
        "url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
        "format": "raw"
    }
)
data = response.json()
for result in data.get("organic", []):
    print(result["rank"], result["title"], result["link"])

Essential Google URL Parameters

| Parameter | Description | Example |
|---|---|---|
| q | Search query | q=python+web+scraping |
| brd_json | Parsed JSON output | brd_json=1 (always use for data pipelines) |
| gl | Country for search | gl=us |
| hl | Language | hl=en |
| start | Pagination offset | start=10 (page 2), start=20 (page 3) |
| tbm | Search type | tbm=nws (news), tbm=isch (images), tbm=vid (videos) |
| brd_mobile | Device | brd_mobile=1 (mobile), brd_mobile=ios |
| brd_browser | Browser | brd_browser=chrome |
| brd_ai_overview | Trigger AI Overview | brd_ai_overview=2 |
| uule | Encoded geo location | for precise location targeting |

Note: num parameter is deprecated as of September 2025. Use start for pagination.
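
With num gone, paging is driven entirely by start in steps of 10. A standard-library sketch that generates one URL per page (`google_serp_urls` is our own helper name):

```python
from urllib.parse import urlencode

def google_serp_urls(query: str, pages: int, gl: str = "us", hl: str = "en"):
    """Yield one Google search URL per result page, stepping start by 10."""
    for page in range(pages):
        params = {"q": query, "brd_json": 1, "gl": gl, "hl": hl,
                  "start": page * 10}
        yield f"https://www.google.com/search?{urlencode(params)}"

urls = list(google_serp_urls("python web scraping", pages=3))
# First page has start=0, second start=10, third start=20.
```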

Parsed JSON Response Structure

{
  "organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
  "paid": [],
  "people_also_ask": [],
  "knowledge_graph": {},
  "related_searches": [],
  "general": {"results_cnt": 1240000000, "query": "..."}
}

Bing Key Parameters

| Parameter | Description |
|---|---|
| q | Search query |
| setLang | Language (prefer 4-letter: en-US) |
| cc | Country code |
| first | Pagination (increment by 10: 1, 11, 21...) |
| safesearch | off, moderate, strict |
| brd_mobile | Device type |

Async for Bulk SERP

# Submit
response = requests.post(
    "https://api.brightdata.com/request",
    params={"async": "1"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
)
response_id = response.headers.get("x-response-id")

# Retrieve (retrieve calls are NOT billed)
result = requests.get(
    "https://api.brightdata.com/serp/get_result",
    params={"response_id": response_id},
    headers={"Authorization": f"Bearer {API_KEY}"}
)

Billing: Pay per 1,000 successful requests only. Async retrieve calls are not billed.

See references/serp-api.md for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.


Web Scraper API

Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.

Sync Endpoint: POST https://api.brightdata.com/datasets/v3/scrape
Async Endpoint: POST https://api.brightdata.com/datasets/v3/trigger

# Sync (up to 20 URLs, returns immediately)
response = requests.post(
    "https://api.brightdata.com/datasets/v3/scrape",
    params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
)

if response.status_code == 200:
    data = response.json()  # Results ready
elif response.status_code == 202:
    snapshot_id = response.json()["snapshot_id"]  # Poll for completion
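
The 200/202 branch above can be factored into a tiny dispatcher so callers only ever see two outcomes. A sketch (`classify_scrape_response` is a hypothetical helper, not a Bright Data API):

```python
def classify_scrape_response(status_code: int, body: dict):
    """Map a /datasets/v3/scrape response to one of two outcomes:
    ("ready", records) for a 200 sync result, or
    ("pending", snapshot_id) for a 202 async handoff."""
    if status_code == 200:
        return ("ready", body)
    if status_code == 202:
        return ("pending", body["snapshot_id"])
    raise RuntimeError(f"unexpected status {status_code}")
```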

Parameters

| Parameter | Type | Description |
|---|---|---|
| dataset_id | string | Scraper identifier from the Scraper Library (required) |
| format | string | json (default), ndjson, jsonl, csv |
| custom_output_fields | string | Pipe-separated fields: url\|title\|price |
| include_errors | boolean | Include error info in results |

Request Body

{
  "input": [
    { "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
    { "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
  ]
}

Poll for Async Results

import time

# Trigger
snapshot_id = requests.post(
    "https://api.brightdata.com/datasets/v3/trigger",
    params={"dataset_id": DATASET_ID, "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": u} for u in urls]}
).json()["snapshot_id"]

# Poll
while True:
    status = requests.get(
        f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    ).json()["status"]

    if status == "ready": break
    if status == "failed": raise Exception("Job failed")
    time.sleep(10)

# Download
data = requests.get(
    f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
    params={"format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"}
).json()

Progress status values: starting → running → ready | failed. Data retention: 30 days. Billing: per delivered record; invalid input URLs that fail are still billable.
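
The poll loop above spins forever if a job stalls. A variant with a deadline, written against an injected get_status callable so it can be exercised without network access (`wait_for_snapshot` is our own helper name):

```python
import time

def wait_for_snapshot(get_status, timeout=600, interval=10, sleep=time.sleep):
    """Call get_status() until it returns "ready".

    Raises RuntimeError on "failed" and TimeoutError once the deadline
    passes. sleep is injectable so tests can skip real waiting.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("Job failed")
        sleep(interval)
    raise TimeoutError("snapshot not ready within timeout")

# Example with a fake status source (no network):
statuses = iter(["starting", "running", "ready"])
wait_for_snapshot(lambda: next(statuses), sleep=lambda s: None)
```

In real use, get_status would wrap the GET to /datasets/v3/progress/{snapshot_id} shown above.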

See references/web-scraper-api.md for complete reference including scraper types, output formats, delivery options, and billing details.


Browser API (Scraping Browser)

Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.

Connection:

  • Playwright/Puppeteer: wss://${AUTH}@brd.superproxy.io:9222
  • Selenium: https://${AUTH}@brd.superproxy.io:9515
Node.js (Playwright; top-level await shown for brevity):

const { chromium } = require("playwright-core");

const AUTH = process.env.BROWSER_AUTH;
const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
const page = await browser.newPage();
page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes

await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
const html = await page.content();
await browser.close();

Python (Playwright):

from playwright.async_api import async_playwright

async with async_playwright() as p:
    browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
    page = await browser.new_page()
    page.set_default_navigation_timeout(120000)
    await page.goto("https://example.com", wait_until="domcontentloaded")
    html = await page.content()
    await browser.close()

Custom CDP Functions

| Function | Purpose |
|---|---|
| Captcha.solve | Manually trigger CAPTCHA solving |
| Captcha.setAutoSolve | Enable/disable auto CAPTCHA solving |
| Proxy.setLocation | Set precise geo location (call BEFORE goto) |
| Proxy.useSession | Maintain same IP across sessions |
| Emulation.setDevice | Apply device profile (iPhone 14, etc.) |
| Emulation.getSupportedDevices | List available device profiles |
| Unblocker.enableAdBlock | Block ads to save bandwidth |
| Unblocker.disableAdBlock | Re-enable ads |
| Input.type | Fast text input for bulk form filling |
| Browser.addCertificate | Install client SSL cert for session |
| Page.inspect | Get DevTools debug URL for live session |
// CDP session pattern for custom functions (Puppeteer shown; in Playwright use page.context().newCDPSession(page))
const client = await page.target().createCDPSession();

// CAPTCHA solve with timeout
const result = await client.send("Captcha.solve", { timeout: 30000 });

// Precise geo location (must be before goto)
await client.send("Proxy.setLocation", {
  latitude: 37.7749,
  longitude: -122.4194,
  distance: 10,
  strict: true
});

// Block unnecessary resources
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });

// Device emulation
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });

Session Rules

  • One initial navigation per session — new URL = new session
  • Idle timeout: 5 minutes
  • Max duration: 30 minutes

Geolocation

  • Country-level: append -country-us to credentials username
  • EU-wide: append -country-eu (routes through 29+ European countries)
  • Precise: use Proxy.setLocation CDP command (before navigation)
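
Country-level routing lives in the proxy username itself, so the suffix rule can be applied mechanically. A sketch (`with_country` is a hypothetical helper; the credential string is a placeholder):

```python
def with_country(auth: str, country: str) -> str:
    """Append a -country-XX routing suffix to the username half of
    "brd-customer-ID-zone-NAME:PASSWORD" style credentials."""
    username, _, password = auth.partition(":")
    return f"{username}-country-{country.lower()}:{password}"

auth = "brd-customer-ID-zone-NAME:PASSWORD"
print(with_country(auth, "us"))  # brd-customer-ID-zone-NAME-country-us:PASSWORD
```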

Error Codes

| Code | Issue | Fix |
|---|---|---|
| 407 | Wrong port | Playwright/Puppeteer → 9222, Selenium → 9515 |
| 403 | Bad auth | Check credentials format and zone type |
| 503 | Service scaling | Wait 1 minute, reconnect |

Billing: Traffic-based only. Block images/CSS/fonts to reduce costs.
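
Keeping the blocking rule as a pure predicate makes it easy to test and to plug into Playwright's page.route. A sketch; the resource-type names follow Playwright's request.resource_type values, and the commented wiring assumes an existing page object from connect_over_cdp:

```python
# Resource types that dominate traffic-based billing on most pages.
BLOCKED_TYPES = {"image", "stylesheet", "font", "media"}

def should_block(resource_type: str) -> bool:
    """Decide whether a request is worth fetching over metered traffic."""
    return resource_type in BLOCKED_TYPES

# Playwright wiring (sketch, not executed here):
# async def block_heavy(route):
#     if should_block(route.request.resource_type):
#         await route.abort()
#     else:
#         await route.continue_()
# await page.route("**/*", block_heavy)
```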

See references/browser-api.md for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.


Detailed References

  • references/web-unlocker.md — Web Unlocker: full parameter list, proxy interface, special headers, async flow, features, billing, anti-patterns
  • references/serp-api.md — SERP API: all Google params (Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, parsed JSON structure, async, billing
  • references/web-scraper-api.md — Web Scraper API: sync vs async, all parameters, polling, scraper types, output formats, billing
  • references/browser-api.md — Browser API: connection strings, session rules, all CDP functions, geo-targeting, bandwidth optimization, CAPTCHA, debugging, error codes
