Twitter Scrape

Scrape Twitter profiles and tweets via GraphQL, export to JSON or database

MIT-0 · Free to use, modify, and redistribute. No attribution required.
5 current installs · 5 all-time installs
by Lucius Pang (@PHY041)
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name and description match the instructions: the skill is an instruction-only Twitter/X scraper that uses a GraphQL client and browser cookies, and the requested binary (python3) is reasonable. However, the SKILL.md expects an external client (rnet_twitter.py from a GitHub repo) and a pip pre-release (rnet>=3.0.0rc20) that are not shipped with the skill, and the README references SUPABASE_URL and SUPABASE_KEY for optional DB import even though those env vars are not declared in the registry metadata.
Instruction Scope
Instructions direct the agent to load and use browser cookies (sensitive session tokens), extract rotating GraphQL IDs from Twitter JS bundles, and explicitly use TLS/fingerprint emulation (Emulation.Chrome133) to bypass Cloudflare protections. They also instruct saving scraped data to local paths (storage/twitter/...). The cookie access and explicit evasion techniques broaden the scope beyond a simple read-only API client and could have legal/ToS implications.
Install Mechanism
There is no automatic install spec in the registry (instruction-only). SKILL.md tells users to pip-install a pre-release package (rnet) and to obtain rnet_twitter.py from a GitHub repo — this requires pulling and running third-party code outside the registry. That is typical for instruction-only skills but raises supply-chain review needs (verify the GitHub repo and pip package).
Credentials
Registry declares only TWITTER_COOKIES_PATH (appropriate because scraping uses cookies). The instructions, however, reference additional environment variables (SUPABASE_URL and SUPABASE_KEY) for optional DB import that are not declared. The cookie file contains auth_token/ct0 session tokens (sensitive). Asking for a path to such cookies is proportionate to scraping, but users must understand these tokens are equivalent to account credentials and should be protected; the undocumented SUPABASE_* usage is an inconsistency that should be fixed.
Persistence & Privilege
The always flag is false, the skill is user-invocable only, and there is no install-time behavior that modifies other skills or system-wide config. It does write scraped JSON to local storage paths (storage/twitter/...), which is expected for this type of tool.
What to consider before installing
This skill is instruction-only and will make you install and run third-party code (the rnet pip package and a GitHub rnet_twitter client). Key things to consider before using it:

- Cookies are sensitive: TWITTER_COOKIES_PATH points to session tokens (auth_token, ct0). Treat that file like a password and do not expose it to untrusted code or remote services.
- The SKILL.md references SUPABASE_URL and SUPABASE_KEY for DB import, but these are not declared; if you provide them, they grant database access. Only supply such keys to trusted code and review how they are used.
- The instructions explicitly describe bypassing Cloudflare via TLS fingerprint emulation. That is active evasion of anti-bot protections; confirm you are allowed to scrape the target and understand the legal/ToS risks.
- Verify the rnet package and the GitHub rnet-twitter-client source before installing (inspect the code, check maintainers, recent commits, and issues). Prefer official APIs where possible.
- Run initial tests in an isolated environment, avoid running with elevated privileges, and restrict access to the cookie file and any database keys.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
Tags: graphql · latest · scraping · twitter

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🐦 Clawdis
OS: macOS · Linux
Bins: python3
Env: TWITTER_COOKIES_PATH

SKILL.md

Twitter/X Scraper Skill

Scrape Twitter profiles and tweets using the rnet_twitter.py GraphQL client. Bypasses Cloudflare via TLS fingerprint emulation. Saves to local JSON and optionally imports to a database.


Prerequisites

# Install rnet (Rust HTTP client with Chrome TLS emulation)
pip install "rnet>=3.0.0rc20" --pre

# Required:
# 1. rnet_twitter.py — async Twitter GraphQL client
#    Get it: https://github.com/PHY041/rnet-twitter-client
# 2. Twitter cookies file (set TWITTER_COOKIES_PATH env var)
#    Format: [{"name": "auth_token", "value": "..."}, {"name": "ct0", "value": "..."}]
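The expected cookie-file shape can be sanity-checked before it is handed to the client. A minimal sketch (validate_cookies is illustrative, not part of the skill or of rnet_twitter):

```python
# Sketch: check a cookies file for the two required session tokens.
import json

REQUIRED_COOKIES = {"auth_token", "ct0"}

def validate_cookies(path: str) -> dict:
    """Return {name: value} from the cookie file, or raise if tokens are missing."""
    with open(path) as f:
        cookies = json.load(f)
    found = {c["name"]: c["value"] for c in cookies if "name" in c and "value" in c}
    missing = REQUIRED_COOKIES - found.keys()
    if missing:
        raise ValueError(f"cookie file is missing: {sorted(missing)}")
    return found
```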

Getting Cookies

  1. Open Chrome -> go to x.com -> log in
  2. DevTools (F12) -> Application -> Cookies -> https://x.com
  3. Copy auth_token and ct0 values
  4. Save to a JSON file. Cookies expire after ~2 weeks.
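The copied values can be written into the expected JSON shape with a few lines (a sketch; save_cookies and the default path are illustrative, not part of the skill):

```python
# Sketch: write the copied auth_token and ct0 values into the JSON
# format the client expects.
import json

def save_cookies(auth_token: str, ct0: str, path: str = "twitter_cookies.json") -> None:
    cookies = [
        {"name": "auth_token", "value": auth_token},
        {"name": "ct0", "value": ct0},
    ]
    with open(path, "w") as f:
        json.dump(cookies, f, indent=2)
```

Keep the resulting file readable only by you (e.g. chmod 600); these tokens are equivalent to account credentials.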

Quick Usage

Scrape a Twitter User

import asyncio, json, os
from datetime import datetime
from rnet_twitter import RnetTwitterClient

async def scrape_user(username: str, limit: int = 200):
    client = RnetTwitterClient()
    cookies_path = os.environ.get("TWITTER_COOKIES_PATH", "twitter_cookies.json")
    client.load_cookies(cookies_path)

    # Get user profile
    user = await client.get_user_by_screen_name(username)

    # Get tweets
    tweets = await client.get_user_tweets(user["rest_id"], count=limit)

    # Save to JSON
    output = {
        "scraped_at": datetime.now().isoformat(),
        "profile": {
            "id": user["rest_id"],
            "username": username,
            "name": user.get("name", ""),
            "bio": user.get("description", ""),
            "followers_count": user.get("followers_count", 0),
            "following_count": user.get("friends_count", 0),
        },
        "tweets": tweets,
        "tweets_count": len(tweets),
    }

    output_path = f"storage/twitter/{username}.json"
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(output, f, indent=2, default=str)

    return output_path

# Usage
asyncio.run(scrape_user("elonmusk", limit=200))

Output Format

{
  "scraped_at": "2026-01-22T10:30:00",
  "profile": {
    "id": "123456789",
    "username": "example",
    "name": "Example User",
    "bio": "...",
    "followers_count": 1234567,
    "following_count": 1234
  },
  "tweets": [
    {
      "id": "1234567890123456789",
      "text": "Tweet content...",
      "created_at": "Thu Jan 22 16:45:03 +0000 2026",
      "likes": 12345,
      "retweets": 1234,
      "replies": 567,
      "views": 1234567,
      "url": "https://x.com/example/status/1234567890123456789",
      "is_retweet": false
    }
  ],
  "tweets_count": 200
}

Optional: Database Import

To import scraped tweets into a PostgreSQL/Supabase database:

CREATE TABLE twitter_content (
  id TEXT PRIMARY KEY,
  username TEXT NOT NULL,
  text TEXT,
  created_at TIMESTAMPTZ,
  likes INTEGER DEFAULT 0,
  retweets INTEGER DEFAULT 0,
  replies INTEGER DEFAULT 0,
  views BIGINT,
  url TEXT,
  is_retweet BOOLEAN DEFAULT FALSE,
  imported_at TIMESTAMPTZ DEFAULT NOW()
);
from supabase import create_client
import os

supabase = create_client(
    os.environ["SUPABASE_URL"],
    os.environ["SUPABASE_KEY"]
)

# Import from scraped JSON
import json

with open("storage/twitter/example.json") as f:
    data = json.load(f)

for tweet in data["tweets"]:
    supabase.table("twitter_content").upsert({
        "id": tweet["id"],
        "username": data["profile"]["username"],
        "text": tweet["text"],
        "likes": tweet.get("likes", 0),
        "retweets": tweet.get("retweets", 0),
        "replies": tweet.get("replies", 0),
        "views": tweet.get("views"),
        "url": tweet.get("url"),
        "is_retweet": tweet.get("is_retweet", False),
    }).execute()
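Note that the scraped created_at values use Twitter's legacy timestamp format ("Thu Jan 22 16:45:03 +0000 2026"), while the twitter_content.created_at column is TIMESTAMPTZ. If you want to store that field, a conversion sketch (to_iso is illustrative, not part of the skill):

```python
# Sketch: convert Twitter's legacy timestamp format to ISO 8601 so it
# can be stored in a TIMESTAMPTZ column.
from datetime import datetime

def to_iso(created_at: str) -> str:
    return datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y").isoformat()
```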

Troubleshooting

| Error | Cause | Solution |
|---|---|---|
| 403 Forbidden | Cookies expired | Refresh auth_token + ct0 from Chrome |
| 404 Not Found | GraphQL ID rotated | Re-extract from abs.twimg.com/.../main.*.js |
| User not found | Username wrong/suspended | Check on x.com |
| Rate limited | Too many requests | Wait 15 minutes |
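For the rate-limit case, a retry wrapper with exponential backoff can help. This is a sketch that assumes the client surfaces rate-limit failures as exceptions whose message mentions "rate" (with_backoff is illustrative, not part of rnet_twitter):

```python
# Sketch: retry an async request after rate-limit errors, sleeping
# base, 2*base, 4*base... seconds (plus jitter) between attempts.
import asyncio
import random

async def with_backoff(make_request, retries: int = 3, base: float = 60.0):
    for attempt in range(retries):
        try:
            return await make_request()
        except Exception as exc:
            # Assumption: rate-limit failures mention "rate" in the message;
            # any other error is re-raised immediately.
            if "rate" not in str(exc).lower() or attempt == retries - 1:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.uniform(0, base))
```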

Technical Notes

  • SearchTimeline requires POST (GET returns 404)
  • GraphQL endpoint IDs may rotate — re-extract from Twitter's JS bundle
  • Rate limits: ~300 requests/15min window
  • Uses Emulation.Chrome133 to bypass Cloudflare detection
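Re-extracting rotated query IDs can be scripted with a regex over the downloaded bundle. A sketch, assuming IDs still appear as queryId/operationName pairs in the JS (the exact bundle format may change):

```python
# Sketch: map GraphQL operation names to their current query IDs by
# scanning Twitter's JS bundle for queryId/operationName pairs.
import re

QUERY_RE = re.compile(r'queryId:"([\w-]+)",operationName:"(\w+)"')

def extract_query_ids(bundle_js: str) -> dict:
    return {name: qid for qid, name in QUERY_RE.findall(bundle_js)}
```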

Files

1 total
