Twitter Scrape
Scrape Twitter profiles and tweets via GraphQL, export to JSON or database
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 0 · 610 · 5 current installs · 5 all-time installs
by Lucius Pang (@PHY041)
Security Scan
OpenClaw
Suspicious (medium confidence)
Purpose & Capability
Name/description match the instructions: the skill is an instruction-only Twitter/X scraper that uses a GraphQL client and browser cookies. Requested binary (python3) is reasonable. However the SKILL.md expects an external client (rnet_twitter.py from a GitHub repo) and a pip pre-release (rnet>=3.0.0rc20) which are not shipped with the skill; also the README references SUPABASE_URL and SUPABASE_KEY for optional DB import but those env vars are not declared in the registry metadata.
Instruction Scope
Instructions direct the agent to load and use browser cookies (sensitive session tokens), extract rotating GraphQL IDs from Twitter JS bundles, and explicitly use TLS/fingerprint emulation (Emulation.Chrome133) to bypass Cloudflare protections. They also instruct saving scraped data to local paths (storage/twitter/...). The cookie access and explicit evasion techniques broaden the scope beyond a simple read-only API client and could have legal/ToS implications.
Install Mechanism
There is no automatic install spec in the registry (instruction-only). SKILL.md tells users to pip-install a pre-release package (rnet) and to obtain rnet_twitter.py from a GitHub repo — this requires pulling and running third-party code outside the registry. That is typical for instruction-only skills but raises supply-chain review needs (verify the GitHub repo and pip package).
Credentials
Registry declares only TWITTER_COOKIES_PATH (appropriate, since scraping uses cookies). The instructions, however, reference additional environment variables (SUPABASE_URL and SUPABASE_KEY) for optional DB import that are not declared. The cookie file contains auth_token/ct0 session tokens (sensitive). Asking for a path to such cookies is proportionate for scraping, but users must understand these tokens are equivalent to account credentials and should be protected; the undocumented SUPABASE_* usage is an inconsistency that should be fixed.
Persistence & Privilege
always is false, the skill is user-invocable only, and there is no install-time behavior that modifies other skills or system-wide config. It does write scraped JSON to local storage paths (storage/twitter/...), which is expected for this type of tool.
What to consider before installing
This skill is instruction-only and requires you to install and run third-party code (the rnet pip package and a GitHub-hosted rnet_twitter client). Key things to consider before using it:
- Cookies are sensitive: the TWITTER_COOKIES_PATH points to session tokens (auth_token, ct0). Treat that file like a password and do not expose it to untrusted code or remote services.
- The SKILL.md references SUPABASE_URL and SUPABASE_KEY for DB import but these were not declared; if you provide them, they will grant DB access—only supply such keys to trusted code and review how they are used.
- The instructions explicitly describe bypassing Cloudflare via TLS fingerprint emulation. That indicates active evasion of anti-bot protections; confirm you are allowed to scrape the target and understand legal/ToS risks.
- Verify the rnet package and the GitHub rnet-twitter-client source before installing (inspect code, check maintainers, recent commits, and issues). Prefer using official APIs where possible.
- Run initial tests in an isolated environment, avoid running with elevated privileges, and restrict access to the cookie file and any database keys.
Current version: v1.0.0
Tags: graphql · latest · scraping · twitter
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
Runtime requirements
🐦 Clawdis
OS: macOS · Linux
Bins: python3
Env: TWITTER_COOKIES_PATH
SKILL.md
Twitter/X Scraper Skill
Scrape Twitter profiles and tweets using the rnet_twitter.py GraphQL client. Bypasses Cloudflare via TLS fingerprint emulation. Saves to local JSON and optionally imports to a database.
Prerequisites
# Install rnet (Rust HTTP client with Chrome TLS emulation)
pip install "rnet>=3.0.0rc20" --pre
# Required:
# 1. rnet_twitter.py — async Twitter GraphQL client
# Get it: https://github.com/PHY041/rnet-twitter-client
# 2. Twitter cookies file (set TWITTER_COOKIES_PATH env var)
# Format: [{"name": "auth_token", "value": "..."}, {"name": "ct0", "value": "..."}]
Getting Cookies
- Open Chrome -> go to x.com -> log in
- DevTools (F12) -> Application -> Cookies -> https://x.com
- Copy the auth_token and ct0 values
- Save them to a JSON file. Cookies expire in ~2 weeks.
Quick Usage
Scrape a Twitter User
import asyncio, json, os
from datetime import datetime

from rnet_twitter import RnetTwitterClient

async def scrape_user(username: str, limit: int = 200):
    client = RnetTwitterClient()
    cookies_path = os.environ.get("TWITTER_COOKIES_PATH", "twitter_cookies.json")
    client.load_cookies(cookies_path)
    # Get user profile
    user = await client.get_user_by_screen_name(username)
    # Get tweets
    tweets = await client.get_user_tweets(user["rest_id"], count=limit)
    # Save to JSON
    output = {
        "scraped_at": datetime.now().isoformat(),
        "profile": {
            "id": user["rest_id"],
            "username": username,
            "name": user.get("name", ""),
            "bio": user.get("description", ""),
            "followers_count": user.get("followers_count", 0),
            "following_count": user.get("friends_count", 0),
        },
        "tweets": tweets,
        "tweets_count": len(tweets),
    }
    output_path = f"storage/twitter/{username}.json"
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(output, f, indent=2, default=str)
    return output_path

# Usage
asyncio.run(scrape_user("elonmusk", limit=200))
Output Format
{
"scraped_at": "2026-01-22T10:30:00",
"profile": {
"id": "123456789",
"username": "example",
"name": "Example User",
"bio": "...",
"followers_count": 1234567,
"following_count": 1234
},
"tweets": [
{
"id": "1234567890123456789",
"text": "Tweet content...",
"created_at": "Thu Jan 22 16:45:03 +0000 2026",
"likes": 12345,
"retweets": 1234,
"replies": 567,
"views": 1234567,
"url": "https://x.com/example/status/1234567890123456789",
"is_retweet": false
}
],
"tweets_count": 200
}
Optional: Database Import
To import scraped tweets into a PostgreSQL/Supabase database:
CREATE TABLE twitter_content (
id TEXT PRIMARY KEY,
username TEXT NOT NULL,
text TEXT,
created_at TIMESTAMPTZ,
likes INTEGER DEFAULT 0,
retweets INTEGER DEFAULT 0,
replies INTEGER DEFAULT 0,
views BIGINT,
url TEXT,
is_retweet BOOLEAN DEFAULT FALSE,
imported_at TIMESTAMPTZ DEFAULT NOW()
);
import json
import os
from datetime import datetime

from supabase import create_client

supabase = create_client(
    os.environ["SUPABASE_URL"],
    os.environ["SUPABASE_KEY"],
)

# Import from scraped JSON
data = json.load(open("storage/twitter/example.json"))
for tweet in data["tweets"]:
    supabase.table("twitter_content").upsert({
        "id": tweet["id"],
        "username": data["profile"]["username"],
        "text": tweet["text"],
        # Twitter's created_at format: "Thu Jan 22 16:45:03 +0000 2026"
        "created_at": datetime.strptime(
            tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"
        ).isoformat(),
        "likes": tweet.get("likes", 0),
        "retweets": tweet.get("retweets", 0),
        "replies": tweet.get("replies", 0),
        "views": tweet.get("views"),
        "url": tweet.get("url"),
        "is_retweet": tweet.get("is_retweet", False),
    }).execute()
Troubleshooting
| Error | Cause | Solution |
|---|---|---|
| 403 Forbidden | Cookies expired | Refresh auth_token + ct0 from Chrome |
| 404 Not Found | GraphQL ID rotated | Re-extract from abs.twimg.com/.../main.*.js |
| User not found | Username wrong/suspended | Check on x.com |
| Rate limited | Too many requests | Wait 15 minutes |
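The "404 Not Found" fix in the table above can be scripted. A best-effort sketch for pulling queryId/operationName pairs out of a downloaded main.*.js bundle; the minified pattern is an assumption about how Twitter's bundles embed GraphQL operations and can break whenever the bundle changes:

```python
import re

def extract_query_ids(bundle_js: str) -> dict:
    """Map operationName -> queryId from a Twitter JS bundle string.

    Assumes minified entries of the form
    queryId:"AbC-123",operationName:"UserTweets" (a guess, not a spec).
    """
    pattern = re.compile(r'queryId:"([\w-]+)"[^}]*?operationName:"(\w+)"')
    return {op: qid for qid, op in pattern.findall(bundle_js)}
```

Run it against the bundle fetched from abs.twimg.com and look up the operation you need (e.g. UserTweets) to refresh the endpoint ID.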
Technical Notes
- SearchTimeline requires POST (GET returns 404)
- GraphQL endpoint IDs may rotate — re-extract from Twitter's JS bundle
- Rate limits: ~300 requests/15min window
- Uses Emulation.Chrome133 to bypass Cloudflare detection
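The ~300 requests per 15-minute window noted above can be enforced client-side with a sliding-window throttle. A sketch only; rnet_twitter may already throttle internally, and the class below is ours, not part of any library:

```python
import time
from collections import deque

class WindowRateLimiter:
    """Block before each request so at most max_requests happen per window."""

    def __init__(self, max_requests: int = 300, window_seconds: float = 900.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps: deque = deque()  # monotonic times of recent requests

    def wait(self) -> None:
        now = time.monotonic()
        # Drop request times that have aged out of the window
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # Sleep until the oldest request falls out of the window
            time.sleep(max(0.0, self.window - (now - self.timestamps[0])))
            self.timestamps.popleft()
        self.timestamps.append(time.monotonic())
```

Call limiter.wait() before each GraphQL request; under the limit it returns immediately, at the limit it sleeps just long enough to stay inside the window.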
Files
1 total