Install
openclaw skills install facebook-scraper
ClawHub Security found sensitive or high-impact capabilities. Review the scan results before using.
Discover and scrape public Facebook pages and groups by location and category with browser simulation and export data in JSON or CSV formats.
Part of ScrapeClaw, a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.
A browser-based Facebook page and group discovery and scraping tool.
---
name: facebook-scraper
description: Discover and scrape Facebook pages and public groups from your browser.
emoji: 📘
version: 1.0.0
author: influenza
tags:
  - facebook
  - scraping
  - social-media
  - page-discovery
  - group-discovery
  - business-pages
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium
    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---
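This skill provides a two-phase Facebook scraping system: a discovery phase (discover) that finds pages and groups through Google search, with facebook.com as the site to search, and a scraping phase (scrape) that collects data from each one.

For OpenClaw agent integration, the skill provides JSON output: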
# Discover Facebook pages (returns JSON)
discover --location "Miami" --category "restaurant" --type page --output json
# Discover Facebook groups (returns JSON)
discover --location "New York" --category "fitness" --type group --output json
# Scrape single page (returns JSON)
scrape --page-name examplebusiness --output json
# Scrape single group (returns JSON)
scrape --page-name examplegroup --type group --output json
{
  "page_name": "example_business",
  "display_name": "Example Business",
  "entity_type": "page",
  "category": "Restaurant",
  "subcategory": "Italian Restaurant",
  "about": "Family-owned Italian restaurant since 1985",
  "followers": 45000,
  "page_likes": 42000,
  "location": "Miami, FL",
  "address": "123 Main St, Miami, FL 33101",
  "phone": "+1-555-0123",
  "email": "info@example.com",
  "website": "https://example.com",
  "hours": "Mon-Sat 11AM-10PM",
  "is_verified": false,
  "page_tier": "mid",
  "profile_pic_local": "thumbnails/example_business/profile_abc123.jpg",
  "cover_photo_local": "thumbnails/example_business/cover_def456.jpg",
  "recent_posts": [
    {"post_url": "https://facebook.com/example_business/posts/123", "reactions": 320, "comments": 45, "shares": 12}
  ],
  "scrape_timestamp": "2026-02-20T14:30:00"
}
{
  "page_name": "example_group",
  "display_name": "Miami Fitness Community",
  "entity_type": "group",
  "about": "A community for fitness enthusiasts in Miami",
  "members": 15000,
  "privacy": "Public",
  "posts_per_day": 25,
  "location": "Miami",
  "page_tier": "mid",
  "profile_pic_local": "thumbnails/example_group/profile_abc123.jpg",
  "cover_photo_local": "thumbnails/example_group/cover_def456.jpg",
  "scrape_timestamp": "2026-02-20T14:30:00"
}
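A downstream consumer might load one of these records and reduce it to a few headline numbers. The following is a minimal sketch that assumes only the field names shown in the sample records above; `summarize_record` is a hypothetical helper, not part of the skill.

```python
import json


def summarize_record(raw: str) -> dict:
    """Reduce one scraped record (JSON text) to a few headline numbers."""
    record = json.loads(raw)
    # Pages report "followers"; groups report "members" (per the samples above).
    audience = record.get("followers", record.get("members", 0))
    # The group sample has no "recent_posts" key, so default to an empty list.
    posts = record.get("recent_posts", [])
    engagement = sum(
        p.get("reactions", 0) + p.get("comments", 0) + p.get("shares", 0)
        for p in posts
    )
    return {
        "name": record["page_name"],
        "type": record["entity_type"],
        "audience": audience,
        "recent_engagement": engagement,
    }


sample = json.dumps({
    "page_name": "example_business",
    "entity_type": "page",
    "followers": 45000,
    "recent_posts": [{"reactions": 320, "comments": 45, "shares": 12}],
})
print(summarize_record(sample))
```

Reading the audience size with a fallback keeps one code path for both pages and groups.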
| Tier | Likes/Members Range |
|---|---|
| nano | < 1,000 |
| micro | 1,000 - 10,000 |
| mid | 10,000 - 100,000 |
| macro | 100,000 - 1M |
| mega | > 1,000,000 |
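The tier table can be expressed as a small lookup function. This is a sketch, not the skill's own code: the table lists 10,000 and 100,000 in two rows each, so the choice to place those boundary values in the higher tier is an assumption, and `page_tier` is a hypothetical name.

```python
def page_tier(count: int) -> str:
    """Map a likes/members count to a tier name from the table above."""
    # Assumption: 10,000 and 100,000 appear in two rows of the table;
    # here they fall into the higher tier ("mid" and "macro").
    if count < 1_000:
        return "nano"
    if count < 10_000:
        return "micro"
    if count < 100_000:
        return "mid"
    # The table defines mega as strictly greater than 1,000,000.
    if count <= 1_000_000:
        return "macro"
    return "mega"


print(page_tier(42000))   # the sample page's page_likes value
print(page_tier(15000))   # the sample group's members value
```

Both sample records above carry `"page_tier": "mid"`, which matches what this function returns for their counts.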
- data/queue/{location}_{category}_{type}_{timestamp}.json
- data/output/{page_name}.json
- thumbnails/{page_name}/profile_*.jpg, thumbnails/{page_name}/cover_*.jpg
- data/export_{timestamp}.json, data/export_{timestamp}.csv

Edit config/scraper_config.json:
{
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_likes": 1000,
    "download_thumbnails": true,
    "max_thumbnails": 6
  },
  "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
  "categories": ["restaurant", "retail", "fitness", "real-estate", "healthcare", "beauty"]
}
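One way to use the `cities` and `categories` lists is to expand them into one discovery job per combination, matching the `--location`, `--category`, and `--type` options shown earlier. Whether the skill iterates the config this way is not stated, so treat this as an illustrative sketch; `discovery_jobs` is a hypothetical helper.

```python
import json
from itertools import product


def discovery_jobs(config: dict, entity_type: str = "page") -> list[dict]:
    """Expand the city and category lists into one job per combination."""
    return [
        {"location": city, "category": category, "type": entity_type}
        for city, category in product(config["cities"], config["categories"])
    ]


# A trimmed-down config with the same keys as config/scraper_config.json.
config = json.loads("""
{
  "cities": ["New York", "Miami"],
  "categories": ["restaurant", "fitness"]
}
""")

for job in discovery_jobs(config):
    print(job)
```

With the full config above (4 cities, 6 categories) this would produce 24 jobs per entity type.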
The scraper automatically filters out:
Running a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:
| Advantage | Description |
|---|---|
| Avoid IP Bans | Residential IPs look like real household users, not data-center bots. Facebook is far less likely to flag them. |
| Automatic IP Rotation | Each request (or session) gets a fresh IP, so rate-limits never stack up on one address. |
| Geo-Targeting | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| Sticky Sessions | Keep the same IP for a configurable window (e.g. 10 min) — critical for maintaining a Facebook login session. |
| Higher Success Rate | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on Facebook. |
| Long-Running Scrapes | Scrape thousands of pages/groups over hours or days without interruption. |
| Concurrent Scraping | Run multiple browser instances across different IPs simultaneously. |
We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:
| Provider | Best For | Sign Up |
|---|---|---|
| Bright Data | World's largest residential network, 72M+ IPs, enterprise-grade | 👉 Sign Up for Bright Data |
| IPRoyal | Premium residential pool, pay-as-you-go, 195+ countries | 👉 Sign Up for IPRoyal |
| Storm Proxies | Fast & reliable residential IPs, developer-friendly API | 👉 Sign Up for Storm Proxies |
| NetNut | ISP-grade residential network, 52M+ IPs, direct connectivity | 👉 Sign Up for NetNut |
Sign up with any provider above, then grab:
export PROXY_ENABLED=true
export PROXY_PROVIDER=netnut # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_user
export PROXY_PASSWORD=your_pass
export PROXY_COUNTRY=us # optional: two-letter country code
export PROXY_STICKY=true # optional: keep same IP per session
These are auto-configured when you set the provider name:
| Provider | Host | Port |
|---|---|---|
| Bright Data | brd.superproxy.io | 22225 |
| IPRoyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |
Override with "host" and "port" in config or PROXY_HOST / PROXY_PORT env vars if your plan uses a different gateway.
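The default-plus-override behavior described above can be sketched as follows. This is not the skill's actual `ProxyManager` logic; `resolve_endpoint` and `PROVIDER_DEFAULTS` are hypothetical names, and only the hosts, ports, and environment variable names come from this document.

```python
import os

# Default gateways copied from the table above.
PROVIDER_DEFAULTS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}


def resolve_endpoint(env=None) -> tuple[str, int]:
    """Return (host, port); PROXY_HOST / PROXY_PORT override the defaults."""
    env = os.environ if env is None else env
    provider = env.get("PROXY_PROVIDER", "").lower()
    default_host, default_port = PROVIDER_DEFAULTS.get(provider, (None, None))
    host = env.get("PROXY_HOST") or default_host
    port = env.get("PROXY_PORT") or default_port
    if host is None or port is None:
        # e.g. provider "custom" without an explicit host and port
        raise ValueError(f"no host/port known for provider {provider!r}")
    return host, int(port)


print(resolve_endpoint({"PROXY_PROVIDER": "netnut"}))
print(resolve_endpoint({"PROXY_PROVIDER": "netnut", "PROXY_PORT": "9000"}))
```

Passing a plain dict instead of `os.environ` makes the lookup easy to exercise without touching the real environment.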
For any other proxy service, set provider to custom and supply host/port manually:
{
  "proxy": {
    "enabled": true,
    "provider": "custom",
    "host": "your.proxy.host",
    "port": 8080,
    "username": "user",
    "password": "pass"
  }
}
Once configured, the scraper picks up the proxy automatically — no extra flags needed:
# Discover and scrape as usual — proxy is applied automatically
python main.py discover --location "Miami" --category "restaurant" --type page
python main.py scrape --page-name examplebusiness
# The log will confirm proxy is active:
# INFO - Proxy enabled: <ProxyManager provider=netnut enabled host=gw-resi.netnut.io:5959>
# INFO - Browser using proxy: netnut → gw-resi.netnut.io:5959
from proxy_manager import ProxyManager
# From config (auto-reads config/scraper_config.json)
pm = ProxyManager.from_config()
# From environment variables
pm = ProxyManager.from_env()
# Manual construction
pm = ProxyManager(
    provider="netnut",
    username="your_user",
    password="your_pass",
    country="us",
    sticky=True
)
# For Playwright browser context
proxy = pm.get_playwright_proxy()
# → {"server": "http://gw-resi.netnut.io:5959", "username": "user-country-us-session-abc123", "password": "pass"}
# For requests / aiohttp
proxies = pm.get_requests_proxy()
# → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}
# Force new IP (rotates session ID)
pm.rotate_session()
# Debug info
print(pm.info())
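The example username in the comment above (`user-country-us-session-abc123`) suggests that country and session are encoded into the proxy username, and that rotating the session means changing the session ID. The class below is a hypothetical stand-in that illustrates the idea under that assumption; it is not the skill's `ProxyManager`, and the exact username format differs between providers.

```python
import secrets
from typing import Optional


class StickySession:
    """Hypothetical illustration of sticky-session proxy usernames."""

    def __init__(self, username: str, country: Optional[str] = None):
        self.username = username
        self.country = country
        self.session_id = secrets.token_hex(4)

    def rotate(self) -> None:
        """Pick a new session ID that differs from the current one."""
        new_id = secrets.token_hex(4)
        while new_id == self.session_id:
            new_id = secrets.token_hex(4)
        self.session_id = new_id

    def proxy_username(self) -> str:
        """Build a username shaped like the example comment above."""
        parts = [self.username]
        if self.country:
            parts += ["country", self.country]
        parts += ["session", self.session_id]
        return "-".join(parts)


session = StickySession("user", country="us")
print(session.proxy_username())
session.rotate()
print(session.proxy_username())
```

With gateways that key sticky sessions on the username, reusing the same session ID keeps the same exit IP and a new ID requests a different one.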
- Enable sticky sessions with "sticky": true.
- Set "country": "us" (or your target region) so Facebook serves content in the expected locale.
- Call pm.rotate_session() when switching Facebook accounts to get a fresh IP.
- Set delay_between_profiles in config (default 5-10s) to avoid aggressive patterns.
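A randomized pause between profiles, using the 5-10s range mentioned for delay_between_profiles, could look like the sketch below. How the skill reads and applies that setting is not shown in this document, so `polite_delay` is a hypothetical helper; the `sleep` parameter exists only so the function can be exercised without waiting.

```python
import random
import time


def polite_delay(min_s: float = 5.0, max_s: float = 10.0, sleep=time.sleep) -> float:
    """Sleep for a random interval in [min_s, max_s] and return its length."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


# Short range here so the demonstration finishes quickly.
waited = polite_delay(0.01, 0.02)
print(f"waited {waited:.3f}s")
```

Drawing the delay from a range avoids the fixed request cadence that a constant sleep would produce.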