Reddit Archive

Download and archive Reddit posts including images, GIFs, and videos from specified users or subreddits with filtering and sorting options.

Terry Ellison@terellison

Install

openclaw skills install @terellison/reddit-archive

SKILL.md — Reddit Archive

Download and archive Reddit posts (images, GIFs, videos) from users or subreddits.

Auto-Installation

This script automatically checks for and installs its dependencies on first run:

requests — Python HTTP library
yt-dlp — video downloader

If missing, it will attempt to install them via pip install --user. You can also:

Pre-install: pip3 install requests yt-dlp
Override yt-dlp path: export YTDLP_PATH=/your/custom/path/yt-dlp

Browser Login Required for Reddit Videos

As of mid-2026, downloading v.redd.it videos requires an authenticated Reddit session — yt-dlp's Reddit extractor reads cookies from your browser to satisfy this. Stay logged into Reddit in Safari (or another browser, see below) and the script handles it automatically.

Default browser: safari (macOS default).
Override: export REDDIT_COOKIES_BROWSER=chrome (or firefox, brave, edge, vivaldi). Set to none to skip cookie loading if you don't need Reddit videos.
Image-only / redgifs-only archives don't need this — the cookie loader is harmless if you're not logged in (those URLs won't try to use Reddit credentials), but v.redd.it posts will fail with an Account authentication is required error.

When to Use

You want to archive content from Reddit — either from a specific user (u/username) or a subreddit (r/subname).

Usage

bash

python3 ~/path/to/reddit_archive.py [options]

Options

Flag	Description	Default
`-u, --user`	Reddit username (either this OR --subreddit required)	—
`-s, --subreddit`	Subreddit name (either this OR --user required)	—
`-o, --output`	Output directory	`~/temp/.reddit_<target>`
`--sort`	Sort order: hot, new, rising, top, controversial	`hot`
`--time`	Time filter for top/controversial: hour, day, week, month, year, all	—
`--after`	Start date (YYYY-MM-DD)	No filter
`--before`	End date (YYYY-MM-DD)	No filter
`--limit`	Max posts to fetch (0 = unlimited)	0
`--images`	Download images (jpg, png, webp)	✓
`--gifs`	Download GIFs/videos (gfycat, redgifs, imgur)	✓
`--skip-existing`	Skip already-downloaded files	✓
`--workers`	Parallel download workers	4

Examples

bash

# All posts from a user
python3 reddit_archive.py -u someuser

# Subreddit with date range
python3 reddit_archive.py -s orlando --after 2025-01-01 --before 2025-12-31

# Top 10 most upvoted posts of all time from a subreddit
python3 reddit_archive.py -s funny --sort top --time all --limit 10

# New posts only
python3 reddit_archive.py -s orlando --sort new

# GIFs only, specific user
python3 reddit_archive.py -u someguy --gifs

# Custom output dir
python3 reddit_archive.py -u someuser -o ~/Downloads/reddit_archive

Output

Downloads are saved to the output directory with the following structure:

text

output_directory/
├── Pictures/
│   ├── {target}_{post_id}.jpg
│   ├── {target}_{post_id}.png
│   └── ...
└── Videos/
    ├── {target}_{post_id}.mp4
    └── ...

File Organization

The skill is organized as:

text

reddit-archive/
├── SKILL.md              ← This file
└── scripts/
    ├── reddit_archive.py ← Main downloader script
    └── requirements.txt  ← Python dependencies

Rate Limiting

Pauses 0.8s between listing-page fetches
Presents as Safari on macOS (Reddit's anti-bot blocks descriptive bot User-Agents in 2026)
Sets the over18 cookie so NSFW subreddits don't return an interstitial
Run one instance at a time — parallel runs trigger rate limits

Technical Notes

Data source: scrapes old.reddit.com listing HTML (old.reddit.com/r/<name>/<sort>/ or old.reddit.com/user/<name>/submitted/). Reddit's anonymous JSON API started returning 403 + an anti-bot HTML page in mid-2026, and the self-serve OAuth flow is gated behind a Responsible Builder Policy approval. old.reddit's server-rendered listings still work and embed the same metadata in <div class="thing" data-*> attributes (schema stable since ~2010).
Pagination: uses the after=t3_<id> cursor extracted from the page's next › button rather than a JSON after field.
Galleries: old.reddit embeds preview.redd.it/<id>.<ext> URLs for each gallery item inline. Each image is also available unsigned at i.redd.it/<id>.<ext> (full resolution, no expiry), which is what we download.
v.redd.it videos: routed through yt-dlp with --cookies-from-browser (HTML scraping doesn't expose the DASH manifest URL the way the old JSON API did, and yt-dlp's Reddit extractor in 2026 needs an authenticated session to fetch the manifest itself).
GIF/video downloads use yt-dlp (redgifs, gfycat, v.redd.it); direct images and direct mp4/gif URLs are streamed via requests.
Date filtering is done client-side after fetching (filters by the post's created_utc, which we derive from data-timestamp).