xclaw
Extract posts from X (Twitter) timeline or profile pages with engagement metrics, media URLs, and download local copies. Use when asked to scrape, list, or a...
Like a lobster shell, security has layers — review code before you run it.
License
SKILL.md
XCLAW
Extract posts from X (Twitter) pages with full metadata including text, author, timestamp, engagement metrics, media URLs, and download local copies.
When to Use
- List recent posts from a user's profile
- Extract posts from home timeline
- Capture post URLs without navigating into individual posts
- Gather engagement metrics (likes, reposts, replies, views)
- Download images and video thumbnails
- Analyze content patterns across multiple posts
File Structure
intel/x/
├── {date}-{time}-X.md # Intel file
└── media/
└── {YYYYMMDD}-{HHMMSS}/ # Media folder (same timestamp)
├── image_{postId}_{index}.jpg
└── video_{postId}_poster.jpg
File Naming
| Item | Format | Example |
|---|---|---|
| Intel file | {date}-{time}-X.md | 2026-03-16-143052-X.md |
| Media folder | {YYYYMMDD}-{HHMMSS} | 20260316-143052 |
| Image | image_{postId}_{index}.jpg | image_2032741269689213289_0.jpg |
| Video thumbnail | video_{postId}_poster.jpg | video_2032741269689213289_poster.jpg |
OUTPUT FORMAT — EXACT FORMAT REQUIRED
WRITE THE INTEL FILE IN THIS EXACT FORMAT. FOLLOW EVERY COMPONENT PRECISELY.
# X Intelligence Report
**Scraped:** {date} {time} (Asia/Shanghai)
**Source:** X Home Timeline
**Posts Collected:** {count}
---
## Post {n}: {Topic Title}
**Author:** {Name} (@{handle})
**Posted:** {relative time}
**URL:** {post_url}
**Content:**
{post_text_content}
**Metrics:** ❤️ {likes} 🔁 {reposts} 💬 {replies} 👁 {views}
**Media:**
🖼️ media/{folder}/image_{postId}_{index}.jpg
**Signal:** {🔴 HIGH / 🟡 MEDIUM / 🟡 LOW / ⚪ SKIP} — {Brief signal assessment}
---
## Post {n+1}: ...
(MORE POSTS...)
---
## Summary
**High Signal (Tier 1):**
1. {Topic} — {One-line summary}
2. {Topic} — {One-line summary}
**Medium Signal (Tier 2):**
- {Topic} — {One-line summary}
- {Topic} — {One-line summary}
**Skip:**
- {Topic} — {Reason}
Media in Output Format (Conditional)
IF THE POST HAS MEDIA, INCLUDE THE MEDIA LINE AFTER METRICS:
**Media:**
🖼️ media/{folder}/image_{postId}_{index}.jpg
OR FOR VIDEO THUMBNAILS:
**Media:**
🎬 media/{folder}/video_{postId}_poster.jpg
IF THE POST HAS NO MEDIA, OMIT THE MEDIA SECTION ENTIRELY.
WORKFLOW
Step 1: Create Media Folder FIRST
BEFORE EXTRACTING ANY POSTS, create the media folder:
# Create folder with current timestamp
# Format: YYYYMMDD-HHMMSS
mkdir -p ../../intel/x/media/{YYYYMMDD}-{HHMMSS}
# Example
mkdir -p ../../intel/x/media/20260316-143052
Step 2: Navigate to Target Page
// For home timeline
browser.navigate({ url: "https://x.com/home" })
// For a specific user profile
browser.navigate({ url: "https://x.com/username" })
Step 3: Scroll to Load More Posts
browser.act({
fn: "() => { for(let i=0; i<3; i++) { window.scrollBy(0, window.innerHeight); } return 'scrolled'; }",
kind: "evaluate"
})
Step 4: Extract Posts with Media URLs
browser.act({
fn: "() => {
const articles = document.querySelectorAll('article[data-testid=\"tweet\"]');
const posts = [];
articles.forEach(article => {
const url = article.querySelector('a[href*=\"/status/\"]')?.href;
if (!url || url.includes('/analytics') || url.includes('/photo')) return;
const text = article.querySelector('[data-testid=\"tweetText\"]')?.innerText || '';
const timeEl = article.querySelector('time');
const timestamp = timeEl?.innerText || '';
const datetime = timeEl?.dateTime || '';
// Get author info
const nameEl = article.querySelector('[data-testid=\"User-Name\"]');
const name = nameEl?.querySelector('span')?.innerText || '';
const handle = nameEl?.querySelector('a[href*=\"/\"]')?.innerText || '';
// Get engagement metrics
const metricButtons = article.querySelectorAll('button[role=\"button\"]');
let replies = '0', reposts = '0', likes = '0', views = '0';
metricButtons.forEach(btn => {
const txt = btn.innerText || '';
const label = btn.getAttribute('aria-label') || '';
if (label && label.includes('Reply')) {
const match = txt.match(/(\d+)/);
if (match) replies = match[1];
}
if (label && label.includes('Repost')) {
const match = txt.match(/(\d+)/);
if (match) reposts = match[1];
}
if (label && label.includes('Like')) {
const match = txt.match(/(\d+)/);
if (match) likes = match[1];
}
});
// Views are in an <a> link (not button), e.g. <a href="/.../analytics">71K</a>
const viewLink = article.querySelector('a[href*=\"/analytics\"]');
if (viewLink) {
const v = viewLink.innerText.trim();
views = v.replace(/[^0-9KMB.]/g, '') || '0';
}
// Get media URLs - images and video thumbnails
const mediaUrls = [];
// Image posts: pbs.twimg.com/media/
const images = article.querySelectorAll('img[src*=\"pbs.twimg.com/media/\"]');
images.forEach(img => {
if (img.src && !mediaUrls.includes(img.src)) {
mediaUrls.push(img.src);
}
});
// Also check for image links in <a> tags
const imageLinks = article.querySelectorAll('a[href*=\"pbs.twimg.com/media/\"]');
imageLinks.forEach(link => {
if (link.href && !mediaUrls.includes(link.href)) {
mediaUrls.push(link.href);
}
});
// Video thumbnails: pbs.twimg.com/amplify_video_thumb/
const videoThumbs = article.querySelectorAll('img[src*=\"pbs.twimg.com/amplify_video_thumb/\"]');
videoThumbs.forEach(img => {
if (img.src && !mediaUrls.includes(img.src)) {
mediaUrls.push(img.src);
}
});
const postId = url.match(/\/status\/(\d+)/)?.[1] || '';
posts.push({
name,
handle,
text: text.substring(0, 500),
timestamp,
datetime,
url,
postId,
mediaUrls,
metrics: { replies, reposts, likes, views }
});
});
return posts.slice(0, 10);
}",
kind: "evaluate"
})
Step 5: Download Media Files
FOR EACH POST WITH MEDIA, DOWNLOAD THE FILES:
# Download image - replace URL with actual media URL from extraction
curl -L -o "../../intel/x/media/{folder}/image_{postId}_{index}.jpg" "https://pbs.twimg.com/media/xxx?format=jpg&name=large"
# Download video thumbnail
curl -L -o "../../intel/x/media/{folder}/video_{postId}_poster.jpg" "https://pbs.twimg.com/amplify_video_thumb/xxx"
IMPORTANT:
- USE
-LFLAG TO FOLLOW REDIRECTS - ADD
&name=largeFOR FULL-RESOLUTION IMAGES - USE THE ACTUAL URLS EXTRACTED FROM STEP 4
Step 6: Write Intel File
WRITE THE MARKDOWN FILE TO ../../intel/x/{date}-{time}-X.md
FOLLOW THE EXACT OUTPUT FORMAT SHOWN ABOVE. INCLUDE:
- Header with Scraped, Source, Posts Collected
- Each post with Author, Posted, URL, Content, Signal
- Media references if present
- Summary section at the end
Step 7: Close Chrome Tab
browser.close()
Key Selectors
| Element | Selector |
|---|---|
| Article container | article[data-testid="tweet"] |
| Post text | [data-testid=\"tweetText\"] |
| Timestamp | time |
| User name | [data-testid=\"User-Name\"] |
| Post URL | a[href*=\"/status/\"] |
| Analytics link (views) | a[href*=\"/analytics\"] |
| Metric buttons | button[role=\"button\"] |
| Images | img[src*=\"pbs.twimg.com/media/\"] |
| Video thumbnails | img[src*=\"pbs.twimg.com/amplify_video_thumb/\"] |
Tips
- Don't click into posts - Extract URLs directly from the timeline
- Filter URLs - Exclude
/analyticsand/photoURLs - Scroll before extracting - X loads posts dynamically
- Create folder FIRST - Media folder must exist before downloads
- Use -L flag - curl must follow redirects for pbs.twimg.com URLs
- Add &name=large - Get full-resolution images, not thumbnails
- Always close the tab - Task ends when tab closes
Files
1 totalComments
Loading comments…
