Install
openclaw skills install scrape-creator-profile

Scrape and extract structured data from creator profiles across platforms such as YouTube, Instagram, TikTok, Twitter/X, LinkedIn, Twitch, and personal websites. Use this skill whenever the user asks to look up, fetch, analyze, or collect data about a content creator, influencer, or public profile, even if they don't say "scrape". Triggers: "get info on this creator", "pull their profile", "what's their follower count", "scrape this creator page", "summarize this influencer", "fetch creator stats", "analyze this YouTube channel", "get TikTok profile data".

Extract structured profile data from a content creator's public page. Returns a normalized JSON object regardless of platform.
Use this skill for any request that involves:
If the user provides a URL, jump straight to Step 2. If they only give a name or handle, start at Step 1.
If the user gave a username/handle without a URL:
| Platform | URL Pattern | Example |
|---|---|---|
| YouTube | https://www.youtube.com/@{handle} | https://www.youtube.com/@mkbhd |
| Instagram | https://www.instagram.com/{handle}/ | https://www.instagram.com/natgeo/ |
| TikTok | https://www.tiktok.com/@{handle} | https://www.tiktok.com/@charlidamelio |
| Twitter / X | https://x.com/{handle} | https://x.com/sama |
| LinkedIn | https://www.linkedin.com/in/{handle}/ | https://www.linkedin.com/in/satyanadella/ |
| Twitch | https://www.twitch.tv/{handle} | https://www.twitch.tv/shroud |
| Substack | https://{handle}.substack.com | https://astralcodexten.substack.com |
| GitHub | https://github.com/{handle} | https://github.com/torvalds |
| Patreon | https://www.patreon.com/{handle} | https://www.patreon.com/kurzgesagt |
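
The table above maps directly onto a lookup. A minimal sketch, assuming a hypothetical helper named `build_profile_url` (it is not one of the skill's bundled scripts) that strips a leading `@` and fills in the pattern:

```python
# Hypothetical helper for illustration; the patterns are copied from the table above.
URL_PATTERNS = {
    "youtube": "https://www.youtube.com/@{handle}",
    "instagram": "https://www.instagram.com/{handle}/",
    "tiktok": "https://www.tiktok.com/@{handle}",
    "twitter": "https://x.com/{handle}",
    "linkedin": "https://www.linkedin.com/in/{handle}/",
    "twitch": "https://www.twitch.tv/{handle}",
    "substack": "https://{handle}.substack.com",
    "github": "https://github.com/{handle}",
    "patreon": "https://www.patreon.com/{handle}",
}


def build_profile_url(platform: str, handle: str) -> str:
    """Return the profile URL for a handle; raise ValueError for unknown input."""
    key = platform.strip().lower()
    if key not in URL_PATTERNS:
        raise ValueError(f"unsupported platform: {platform!r}")
    # Drop a leading "@" so the pattern alone decides whether the URL contains one.
    clean = handle.strip().lstrip("@")
    if not clean:
        raise ValueError("handle is empty")
    return URL_PATTERNS[key].format(handle=clean)
```

Keeping the patterns in one dictionary means a new platform is a one-line change, and an unknown platform fails loudly instead of producing a guessed URL.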
- @ prefix with short handle → likely Twitter/X or Instagram

web_fetch(url)
Check if the response contains useful profile data (bio, follower count, etc.).
If the content is mostly JS placeholders, empty <div>s, or a login wall,
move to 2b.
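
One way to make the "useful data vs. placeholder" check concrete is a rough heuristic over the fetched HTML. The sketch below is an assumption, not the skill's actual logic: the function name `needs_browser`, the marker strings, and the 200-character threshold are all illustrative.

```python
import re

# Illustrative markers only; real login walls differ per platform.
LOGIN_MARKERS = ("log in to continue", "sign in to continue", "login required")


def needs_browser(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a statically fetched page lacks usable profile data."""
    lowered = html.lower()
    if any(marker in lowered for marker in LOGIN_MARKERS):
        return True
    # Remove script/style bodies, then all remaining tags, and measure what is left.
    stripped = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    visible = re.sub(r"(?s)<[^>]+>", " ", stripped)
    visible = " ".join(visible.split())
    # Very little visible text suggests a JavaScript shell or empty <div>s.
    return len(visible) < min_text_chars
```

A `True` result here corresponds to moving on to 2b; a `False` result only means the page has enough text to be worth parsing, not that every field is present.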
Platforms that almost always require browser mode:
Browser steps:
For persistent blocks, use the Apify actor for that platform. See
references/apify-actors.md for actor IDs and call patterns.
Parse the fetched content and populate the following fields. Mark any missing
field as null — do not guess.
{
  "platform": "string", // youtube | instagram | tiktok | twitter | linkedin | twitch | substack | github | patreon | other
  "handle": "string", // @-prefixed username
  "display_name": "string", // Full display name
  "verified": true | false,
  "bio": "string", // Profile description / about text
  "profile_url": "string", // Canonical URL used to scrape
  "avatar_url": "string | null",
  "external_links": ["string"], // Any links in bio or link-in-bio
  "stats": {
    "followers": "number | null",
    "following": "number | null",
    "subscribers": "number | null",
    "total_views": "number | null",
    "total_posts": "number | null",
    "monthly_listeners": "number | null", // Spotify-style, if applicable
    "engagement_rate": "number | null" // Percentage, if computable
  },
  "recent_content": [
    {
      "title": "string | null",
      "url": "string",
      "published_at": "ISO 8601 string | null",
      "views": "number | null",
      "likes": "number | null",
      "comments": "number | null"
    }
    // Up to 5 most recent items
  ],
  "contact_info": {
    "email": "string | null", // Only if publicly listed in bio/links
    "website": "string | null"
  },
  "scraped_at": "ISO 8601 UTC timestamp"
}
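
To keep the "null, never guessed" rule easy to follow, it can help to start from a record in which every unknown field is already `None` and fill in only what the page actually shows. This is a sketch under assumptions: `empty_profile` is a hypothetical name, and leaving `verified` as `None` until parsed is a choice made here, not something the schema specifies.

```python
from datetime import datetime, timezone


def empty_profile(platform: str, handle: str, profile_url: str) -> dict:
    """Build a profile record with every unknown field set to None (JSON null)."""
    return {
        "platform": platform,
        # The schema asks for an @-prefixed handle.
        "handle": handle if handle.startswith("@") else f"@{handle}",
        "display_name": None,
        "verified": None,  # assumption: unknown until the page is parsed
        "bio": None,
        "profile_url": profile_url,
        "avatar_url": None,
        "external_links": [],
        "stats": {
            "followers": None,
            "following": None,
            "subscribers": None,
            "total_views": None,
            "total_posts": None,
            "monthly_listeners": None,
            "engagement_rate": None,
        },
        "recent_content": [],  # filled with up to 5 items
        "contact_info": {"email": None, "website": None},
        # Timezone-aware UTC timestamp in ISO 8601 form.
        "scraped_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```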
Privacy rule: Only capture fields that are explicitly public on the profile page. Do not infer, deduce, or cross-reference private information. Do not store or relay phone numbers even if visible.
See references/platform-selectors.md for CSS selectors and JSON-LD paths
per platform. Quick reference:
- YouTube: #subscriber-count or meta itemprop=interactionCount; description in #description-inner
- Twitter/X: [data-testid="UserProfileHeader_Items"]; bio in [data-testid="UserDescription"]
- Instagram: window._sharedData or <script type="application/ld+json">
- TikTok: <script id="__UNIVERSAL_DATA_FOR_REHYDRATION__">
- GitHub: itemprop attributes: name, description, follows, worksFor

Convert abbreviated counts to integers before storing:

- "12.3K" → 12300
- "4.5M" → 4500000
- "1B" → 1000000000
- "1,234" → 1234

Use the helper script:
python3 ~/.openclaw/workspace/skills/scrape-creator-profile/scripts/normalize_count.py "12.3K"
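
The contents of `normalize_count.py` are not shown here, so the following is only a sketch of the conversion rule described above, not the bundled script. Returning `None` for unparseable input is an assumption chosen to match the "mark missing fields as null" rule.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def normalize_count(text: str) -> Optional[int]:
    """Convert strings such as "12.3K" or "1,234" to an int; None if unparseable."""
    cleaned = text.strip().replace(",", "").upper()
    if not cleaned:
        return None
    multiplier = 1
    if cleaned[-1] in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        # Decimal keeps "12.3" exact, so the product has no float rounding error.
        return int(Decimal(cleaned) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        return None
```

With this sketch, the four examples above come out as `12300`, `4500000`, `1000000000`, and `1234`.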
Present a clean summary in chat:
**[Display Name]** (@handle) · Platform
✅ Verified | 👥 X followers | 📝 Y posts
Bio: "..."
Top links: url1, url2
Recent content:
1. "Video Title" — X views (date)
2. ...
[Full JSON available on request]
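
The template above could be rendered from the normalized JSON roughly as follows. This is a sketch, assuming a hypothetical `render_summary` function; omitting lines whose data is null and showing only the first two links are choices made here, not rules from the skill.

```python
DASH = "\u2014"  # the dash character used in the template's "Recent content" lines


def render_summary(profile: dict) -> str:
    """Render the chat summary, skipping any line whose data is missing."""
    stats = profile.get("stats") or {}
    name = profile.get("display_name") or profile["handle"]
    lines = [f"**{name}** ({profile['handle']}) · {profile['platform']}"]

    badges = []
    if profile.get("verified"):
        badges.append("✅ Verified")
    if stats.get("followers") is not None:
        badges.append(f"👥 {stats['followers']:,} followers")
    if stats.get("total_posts") is not None:
        badges.append(f"📝 {stats['total_posts']:,} posts")
    if badges:
        lines.append(" | ".join(badges))

    if profile.get("bio"):
        lines.append(f'Bio: "{profile["bio"]}"')
    links = profile.get("external_links") or []
    if links:
        lines.append("Top links: " + ", ".join(links[:2]))

    recent = profile.get("recent_content") or []
    if recent:
        lines.append("Recent content:")
        for i, item in enumerate(recent[:5], start=1):
            title = item.get("title") or item["url"]
            views = item.get("views")
            views_part = f" {DASH} {views:,} views" if views is not None else ""
            published = item.get("published_at")
            date_part = f" ({published[:10]})" if published else ""
            lines.append(f'{i}. "{title}"{views_part}{date_part}')

    lines.append("[Full JSON available on request]")
    return "\n".join(lines)
```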
Return or save the full normalized JSON object from Step 3.
To save to disk:
python3 ~/.openclaw/workspace/skills/scrape-creator-profile/scripts/save_profile.py \
--data '<json>' \
--output ~/creator-profiles/{handle}_{platform}.json
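
For readers who want to see what saving amounts to, here is a minimal sketch of the same operation in Python. It is not the contents of `save_profile.py`, which are not shown here; the function name and the choice to strip the leading `@` from the file name are assumptions.

```python
import json
from pathlib import Path


def save_profile(profile: dict, output_dir: str = "~/creator-profiles") -> Path:
    """Write the profile to {handle}_{platform}.json and return the file path."""
    # Assumption: the "@" prefix is dropped so the file name stays shell-friendly.
    handle = profile["handle"].lstrip("@")
    target_dir = Path(output_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{handle}_{profile['platform']}.json"
    # ensure_ascii=False keeps non-ASCII bios and display names readable on disk.
    path.write_text(json.dumps(profile, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```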
If the user supplies multiple handles or URLs (comma-separated, line-separated, or a list):
- Wait CREATOR_SCRAPE_DELAY_MS between requests (default 1500 ms).
- Build a comparison table or CSV from the collected profiles:

python3 ~/.openclaw/workspace/skills/scrape-creator-profile/scripts/compare_profiles.py \
--profiles '<json array>' \
--format table # or csv
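
The batch flow (split the input, then scrape one target at a time with a pause in between) might look like the sketch below. The function names are hypothetical, `scrape_one` stands in for whatever single-profile routine is in use, and falling back to the default when the environment variable is not a valid integer is an assumption.

```python
import os
import re
import time
from typing import Callable, List


def split_targets(raw: str) -> List[str]:
    """Split comma- or line-separated handles/URLs, dropping blank entries."""
    return [part.strip() for part in re.split(r"[,\n]+", raw) if part.strip()]


def delay_seconds(default_ms: int = 1500) -> float:
    """Read CREATOR_SCRAPE_DELAY_MS from the environment, in seconds."""
    raw = os.environ.get("CREATOR_SCRAPE_DELAY_MS", "")
    try:
        return max(int(raw), 0) / 1000
    except ValueError:
        # Unset or malformed value: fall back to the documented default.
        return default_ms / 1000


def scrape_all(
    targets: List[str],
    scrape_one: Callable[[str], dict],
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Scrape targets sequentially, pausing between requests."""
    results = []
    for index, target in enumerate(targets):
        if index > 0:
            sleep(delay_seconds())  # pause between requests, not before the first
        results.append(scrape_one(target))
    return results
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.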
| Situation | Action |
|---|---|
| Login wall / auth required | Report which fields were blocked; return partial data |
| Rate limited (429) | Wait CREATOR_SCRAPE_DELAY_MS × 3, retry once, then report |
| Profile not found (404) | Inform the user; suggest alternate handle spellings |
| JavaScript-only page, no browser | Suggest enabling browser mode in OpenClaw settings |
| Ambiguous handle across platforms | Ask user to confirm platform before scraping |
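
The 429 and 404 rows of the table can be sketched as a small wrapper around a fetch call. Everything here is illustrative: `fetch_with_retry` is a hypothetical name, and `fetch` is assumed to be any callable that returns a `(status_code, body)` tuple.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    delay_ms: int = 1500,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Apply the table's policy: on 429 wait delay x 3, retry once, then report."""
    status, body = fetch(url)
    if status == 429:
        sleep(delay_ms * 3 / 1000)
        status, body = fetch(url)
        if status == 429:
            return {"ok": False, "status": 429,
                    "message": "rate limited after one retry"}
    if status == 404:
        return {"ok": False, "status": 404,
                "message": "profile not found; try alternate handle spellings"}
    if not 200 <= status < 300:
        return {"ok": False, "status": status,
                "message": f"unexpected status {status}"}
    return {"ok": True, "status": status, "body": body}
```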
Respect robots.txt disallow rules unless the user explicitly overrides and accepts responsibility.

- references/platform-selectors.md: CSS selectors and JSON-LD paths per platform
- references/apify-actors.md: Apify actor IDs and call patterns for fallback scraping
- scripts/normalize_count.py: Converts "12.3K" → 12300
- scripts/save_profile.py: Saves profile JSON to disk
- scripts/compare_profiles.py: Builds comparison table or CSV from multiple profiles