Twitter Scrape

Scrape Twitter profiles and tweets via GraphQL, export to JSON or database

MIT-0 · Free to use, modify, and redistribute. No attribution required.
5 current installs · 5 all-time installs
by Lucius Pang (@PHY041)
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name and description match the instructions: the skill is an instruction-only Twitter/X scraper that uses a GraphQL client and browser cookies, and the requested binary (python3) is reasonable. However, the SKILL.md expects an external client (rnet_twitter.py from a GitHub repo) and a pip pre-release (rnet>=3.0.0rc20) that are not shipped with the skill, and the README references SUPABASE_URL and SUPABASE_KEY for optional DB import even though those env vars are not declared in the registry metadata.
Instruction Scope
Instructions direct the agent to load and use browser cookies (sensitive session tokens), extract rotating GraphQL IDs from Twitter JS bundles, and explicitly use TLS/fingerprint emulation (Emulation.Chrome133) to bypass Cloudflare protections. They also instruct saving scraped data to local paths (storage/twitter/...). The cookie access and explicit evasion techniques broaden the scope beyond a simple read-only API client and could have legal/ToS implications.
Install Mechanism
There is no automatic install spec in the registry (instruction-only). SKILL.md tells users to pip-install a pre-release package (rnet) and to obtain rnet_twitter.py from a GitHub repo — this requires pulling and running third-party code outside the registry. That is typical for instruction-only skills but raises supply-chain review needs (verify the GitHub repo and pip package).
Credentials
Registry declares only TWITTER_COOKIES_PATH (appropriate because scraping uses cookies). The instructions, however, reference additional environment variables (SUPABASE_URL and SUPABASE_KEY) for optional DB import that are not declared. The cookie file contains auth_token/ct0 session tokens (sensitive). Asking for a path to such cookies is proportionate to scraping, but users must understand these tokens are equivalent to account credentials and should be protected; the undocumented SUPABASE_* usage is an inconsistency that should be fixed.
Persistence & Privilege
The always flag is false, the skill is user-invocable only, and there is no install-time behavior that modifies other skills or system-wide config. It does write scraped JSON to local storage paths (storage/twitter/...), which is expected for this type of tool.
What to consider before installing
This skill is instruction-only and will make you install and run third-party code (the rnet pip package and a GitHub rnet_twitter client). Key things to consider before using it:

- Cookies are sensitive: TWITTER_COOKIES_PATH points to session tokens (auth_token, ct0). Treat that file like a password and do not expose it to untrusted code or remote services.
- The SKILL.md references SUPABASE_URL and SUPABASE_KEY for DB import, but these are not declared; if you provide them, they grant database access. Only supply such keys to trusted code and review how they are used.
- The instructions explicitly describe bypassing Cloudflare via TLS fingerprint emulation. That is active evasion of anti-bot protections; confirm you are allowed to scrape the target and understand the legal/ToS risks.
- Verify the rnet package and the GitHub rnet-twitter-client source before installing (inspect the code, check maintainers, recent commits, and issues). Prefer official APIs where possible.
- Run initial tests in an isolated environment, avoid running with elevated privileges, and restrict access to the cookie file and any database keys.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
Tags: graphql · latest · scraping · twitter

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🐦 Clawdis
OS: macOS · Linux
Bins: python3
Env: TWITTER_COOKIES_PATH

SKILL.md

Twitter/X Scraper Skill

Scrape Twitter profiles and tweets using the rnet_twitter.py GraphQL client. Bypasses Cloudflare via TLS fingerprint emulation. Saves to local JSON and optionally imports to a database.


Prerequisites

# Install rnet (Rust HTTP client with Chrome TLS emulation)
pip install "rnet>=3.0.0rc20" --pre

# Required:
# 1. rnet_twitter.py — async Twitter GraphQL client
#    Get it: https://github.com/PHY041/rnet-twitter-client
# 2. Twitter cookies file (set TWITTER_COOKIES_PATH env var)
#    Format: [{"name": "auth_token", "value": "..."}, {"name": "ct0", "value": "..."}]
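The expected cookie-file shape can be sanity-checked before it is handed to the client. A minimal sketch (validate_cookies is illustrative, not part of the skill or of rnet_twitter):

```python
# Sketch: check a cookies file for the two required session tokens.
import json

REQUIRED_COOKIES = {"auth_token", "ct0"}

def validate_cookies(path: str) -> dict:
    """Return {name: value} from the cookie file, or raise if tokens are missing."""
    with open(path) as f:
        cookies = json.load(f)
    found = {c["name"]: c["value"] for c in cookies if "name" in c and "value" in c}
    missing = REQUIRED_COOKIES - found.keys()
    if missing:
        raise ValueError(f"cookie file is missing: {sorted(missing)}")
    return found
```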

Getting Cookies

  1. Open Chrome -> go to x.com -> log in
  2. DevTools (F12) -> Application -> Cookies -> https://x.com
  3. Copy auth_token and ct0 values
  4. Save to a JSON file. Cookies expire after ~2 weeks.
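The copied values can be written into the expected JSON shape with a few lines (a sketch; save_cookies and the default path are illustrative, not part of the skill):

```python
# Sketch: write the copied auth_token and ct0 values into the JSON
# format the client expects.
import json

def save_cookies(auth_token: str, ct0: str, path: str = "twitter_cookies.json") -> None:
    cookies = [
        {"name": "auth_token", "value": auth_token},
        {"name": "ct0", "value": ct0},
    ]
    with open(path, "w") as f:
        json.dump(cookies, f, indent=2)
```

Keep the resulting file readable only by you (e.g. chmod 600); these tokens are equivalent to account credentials.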

Quick Usage

Scrape a Twitter User

import asyncio, json, os
from datetime import datetime
from rnet_twitter import RnetTwitterClient

async def scrape_user(username: str, limit: int = 200):
    client = RnetTwitterClient()
    cookies_path = os.environ.get("TWITTER_COOKIES_PATH", "twitter_cookies.json")
    client.load_cookies(cookies_path)

    # Get user profile
    user = await client.get_user_by_screen_name(username)

    # Get tweets
    tweets = await client.get_user_tweets(user["rest_id"], count=limit)

    # Save to JSON
    output = {
        "scraped_at": datetime.now().isoformat(),
        "profile": {
            "id": user["rest_id"],
            "username": username,
            "name": user.get("name", ""),
            "bio": user.get("description", ""),
            "followers_count": user.get("followers_count", 0),
            "following_count": user.get("friends_count", 0),
        },
        "tweets": tweets,
        "tweets_count": len(tweets),
    }

    output_path = f"storage/twitter/{username}.json"
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(output, f, indent=2, default=str)

    return output_path

# Usage
asyncio.run(scrape_user("elonmusk", limit=200))

Output Format

{
  "scraped_at": "2026-01-22T10:30:00",
  "profile": {
    "id": "123456789",
    "username": "example",
    "name": "Example User",
    "bio": "...",
    "followers_count": 1234567,
    "following_count": 1234
  },
  "tweets": [
    {
      "id": "1234567890123456789",
      "text": "Tweet content...",
      "created_at": "Thu Jan 22 16:45:03 +0000 2026",
      "likes": 12345,
      "retweets": 1234,
      "replies": 567,
      "views": 1234567,
      "url": "https://x.com/example/status/1234567890123456789",
      "is_retweet": false
    }
  ],
  "tweets_count": 200
}

Optional: Database Import

To import scraped tweets into a PostgreSQL/Supabase database:

CREATE TABLE twitter_content (
  id TEXT PRIMARY KEY,
  username TEXT NOT NULL,
  text TEXT,
  created_at TIMESTAMPTZ,
  likes INTEGER DEFAULT 0,
  retweets INTEGER DEFAULT 0,
  replies INTEGER DEFAULT 0,
  views BIGINT,
  url TEXT,
  is_retweet BOOLEAN DEFAULT FALSE,
  imported_at TIMESTAMPTZ DEFAULT NOW()
);
from supabase import create_client
import os

supabase = create_client(
    os.environ["SUPABASE_URL"],
    os.environ["SUPABASE_KEY"]
)

# Import from scraped JSON
import json

with open("storage/twitter/example.json") as f:
    data = json.load(f)

for tweet in data["tweets"]:
    supabase.table("twitter_content").upsert({
        "id": tweet["id"],
        "username": data["profile"]["username"],
        "text": tweet["text"],
        "likes": tweet.get("likes", 0),
        "retweets": tweet.get("retweets", 0),
        "replies": tweet.get("replies", 0),
        "views": tweet.get("views"),
        "url": tweet.get("url"),
        "is_retweet": tweet.get("is_retweet", False),
    }).execute()
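Note that the scraped created_at values use Twitter's legacy timestamp format ("Thu Jan 22 16:45:03 +0000 2026"), while the twitter_content.created_at column is TIMESTAMPTZ. If you want to store that field, a conversion sketch (to_iso is illustrative, not part of the skill):

```python
# Sketch: convert Twitter's legacy timestamp format to ISO 8601 so it
# can be stored in a TIMESTAMPTZ column.
from datetime import datetime

def to_iso(created_at: str) -> str:
    return datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y").isoformat()
```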

Troubleshooting

| Error | Cause | Solution |
|---|---|---|
| 403 Forbidden | Cookies expired | Refresh auth_token + ct0 from Chrome |
| 404 Not Found | GraphQL ID rotated | Re-extract from abs.twimg.com/.../main.*.js |
| User not found | Username wrong/suspended | Check on x.com |
| Rate limited | Too many requests | Wait 15 minutes |
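For the rate-limit case, a retry wrapper with exponential backoff can help. This is a sketch that assumes the client surfaces rate-limit failures as exceptions whose message mentions "rate" (with_backoff is illustrative, not part of rnet_twitter):

```python
# Sketch: retry an async request after rate-limit errors, sleeping
# base, 2*base, 4*base... seconds (plus jitter) between attempts.
import asyncio
import random

async def with_backoff(make_request, retries: int = 3, base: float = 60.0):
    for attempt in range(retries):
        try:
            return await make_request()
        except Exception as exc:
            # Assumption: rate-limit failures mention "rate" in the message;
            # any other error is re-raised immediately.
            if "rate" not in str(exc).lower() or attempt == retries - 1:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.uniform(0, base))
```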

Technical Notes

  • SearchTimeline requires POST (GET returns 404)
  • GraphQL endpoint IDs may rotate — re-extract from Twitter's JS bundle
  • Rate limits: ~300 requests/15min window
  • Uses Emulation.Chrome133 to bypass Cloudflare detection
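Re-extracting rotated query IDs can be scripted with a regex over the downloaded bundle. A sketch, assuming IDs still appear as queryId/operationName pairs in the JS (the exact bundle format may change):

```python
# Sketch: map GraphQL operation names to their current query IDs by
# scanning Twitter's JS bundle for queryId/operationName pairs.
import re

QUERY_RE = re.compile(r'queryId:"([\w-]+)",operationName:"(\w+)"')

def extract_query_ids(bundle_js: str) -> dict:
    return {name: qid for qid, name in QUERY_RE.findall(bundle_js)}
```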

Files

1 total
