# Site Analysis Guide

A step-by-step playbook for analyzing any website before building a scraper. This phase is critical: time spent here prevents hours of debugging fragile selectors later.

---

## Table of Contents

- [Step 1: Initial Page Fetch](#step-1-initial-page-fetch)
- [Step 2: Content Rendering Detection](#step-2-content-rendering-detection)
- [Step 3: Hidden API Discovery](#step-3-hidden-api-discovery)
- [Step 4: Selector Strategy](#step-4-selector-strategy)
- [Step 5: Pagination Analysis](#step-5-pagination-analysis)
- [Step 6: Anti-Bot Assessment](#step-6-anti-bot-assessment)
- [Step 7: Final Decision Matrix](#step-7-final-decision-matrix)

---

## Step 1: Initial Page Fetch

Start by fetching the target page through Web Unlocker to get the raw HTML the server returns.

```python
import os

import requests

API_KEY = os.environ["BRIGHTDATA_API_KEY"]
ZONE = os.environ["BRIGHTDATA_UNLOCKER_ZONE"]
# Placeholder: replace with the page you are analyzing
TARGET_URL = "https://example.com/products"

# Fetch raw HTML
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": ZONE, "url": TARGET_URL, "format": "raw"},
)
response.raise_for_status()  # fail early instead of analyzing an error page
html = response.text

# Also fetch as markdown for a quick readable overview
md_response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": ZONE, "url": TARGET_URL, "format": "raw", "data_format": "markdown"},
)
md_response.raise_for_status()
markdown = md_response.text
```

**What to look for in the raw HTML:**

- Is the actual content (product names, prices, text) present in the HTML?
- Or is the HTML mostly empty containers waiting for JavaScript to fill them?

---

## Step 2: Content Rendering Detection

Determine whether the site renders content server-side (SSR) or client-side (CSR).

### Signs of Server-Side Rendering (Web Unlocker is sufficient)

- The data you need appears directly in the raw HTML
- Product names, prices, and descriptions are visible in the source
- The HTML contains `` tags with structured data (sites with good SEO typically do this)
- JSON-LD blocks (`