Reddit Archive

Download and archive Reddit posts including images, GIFs, and videos from specified users or subreddits with filtering and sorting options.

Install

openclaw skills install reddit-archive

SKILL.md — Reddit Archive

Download and archive Reddit posts (images, GIFs, videos) from users or subreddits.

Auto-Installation

This script automatically checks for and installs its dependencies on first run:

  • requests — Python HTTP library
  • yt-dlp — video downloader

If either is missing, the script attempts to install it with pip install --user. You can also:

  • Pre-install: pip3 install requests yt-dlp
  • Override yt-dlp path: export YTDLP_PATH=/your/custom/path/yt-dlp
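
The dependency check described above could look roughly like the sketch below. It is not taken from reddit_archive.py: the function names (missing_modules, pip_install_user, resolve_ytdlp, ensure_dependencies) are hypothetical, and falling back to a PATH lookup when YTDLP_PATH is unset is an assumption.

```python
import importlib.util
import os
import shutil
import subprocess
import sys


def missing_modules(names):
    """Return the module names that cannot be imported by this interpreter."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def pip_install_user(packages):
    """Install packages into the per-user site directory with pip."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--user", *packages]
    )


def resolve_ytdlp(env=None):
    """Prefer YTDLP_PATH when set; otherwise look for yt-dlp on PATH (assumed)."""
    env = os.environ if env is None else env
    override = env.get("YTDLP_PATH")
    if override:
        return override
    return shutil.which("yt-dlp")  # None when yt-dlp is not on PATH


def ensure_dependencies():
    """Install whatever is missing (hypothetical startup hook)."""
    # The pip package is named "yt-dlp" but the importable module is "yt_dlp".
    wanted = {"requests": "requests", "yt_dlp": "yt-dlp"}
    missing = [wanted[module] for module in missing_modules(wanted)]
    if missing:
        pip_install_user(missing)
```

Nothing runs at import time here; a script following this pattern would call ensure_dependencies() once before importing requests.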

Browser Login Required for Reddit Videos

As of mid-2026, downloading v.redd.it videos requires an authenticated Reddit session, which yt-dlp's Reddit extractor obtains by reading cookies from your browser. Stay logged in to Reddit in Safari (or another browser, see below) and the script handles the rest automatically.

  • Default browser: safari (macOS default).
  • Override: export REDDIT_COOKIES_BROWSER=chrome (or firefox, brave, edge, vivaldi). Set to none to skip cookie loading if you don't need Reddit videos.
  • Image-only and redgifs-only archives don't need a login: the cookie loader is harmless when you're not logged in, because those URLs never use Reddit credentials. v.redd.it posts, however, will fail with an "Account authentication is required" error.
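
As an illustration of the settings above, REDDIT_COOKIES_BROWSER can be mapped onto yt-dlp's --cookies-from-browser option along these lines. This is a sketch, not the script's actual code: cookie_args and build_ytdlp_command are hypothetical names, and passing the browser name through without validation is an assumption.

```python
import os


def cookie_args(env=None):
    """Translate REDDIT_COOKIES_BROWSER into yt-dlp command-line arguments."""
    env = os.environ if env is None else env
    browser = env.get("REDDIT_COOKIES_BROWSER", "safari").strip().lower()
    if browser == "none":
        return []  # skip cookie loading entirely
    return ["--cookies-from-browser", browser]


def build_ytdlp_command(url, out_path, ytdlp="yt-dlp", env=None):
    """Assemble (but do not run) a yt-dlp invocation for one video URL."""
    return [ytdlp, *cookie_args(env), "-o", out_path, url]
```

With no variable set, the command includes --cookies-from-browser safari; with REDDIT_COOKIES_BROWSER=none, the cookie arguments are omitted.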

When to Use

You want to archive content from Reddit — either from a specific user (u/username) or a subreddit (r/subname).

Usage

python3 ~/path/to/reddit_archive.py [options]

Options

Flag              Description                                                           Default
-u, --user        Reddit username (either this OR --subreddit required)
-s, --subreddit   Subreddit name (either this OR --user required)
-o, --output      Output directory                                                      ~/temp/.reddit_<target>
--sort            Sort order: hot, new, rising, top, controversial                      hot
--time            Time filter for top/controversial: hour, day, week, month, year, all
--after           Start date (YYYY-MM-DD)                                               No filter
--before          End date (YYYY-MM-DD)                                                 No filter
--limit           Max posts to fetch (0 = unlimited)                                    0
--images          Download images (jpg, png, webp)
--gifs            Download GIFs/videos (gfycat, redgifs, imgur)
--skip-existing   Skip already-downloaded files
--workers         Parallel download workers                                             4
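
For readers who want to see how these flags fit together, here is a minimal argparse sketch that mirrors the options listed above. It is an approximation rather than the script's real parser: build_parser is a hypothetical name, treating --user and --subreddit as mutually exclusive is an assumption, and the source does not say what happens when neither --images nor --gifs is passed.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    p = argparse.ArgumentParser(prog="reddit_archive.py")

    # Exactly one target is required (mutual exclusion is assumed).
    target = p.add_mutually_exclusive_group(required=True)
    target.add_argument("-u", "--user", help="Reddit username")
    target.add_argument("-s", "--subreddit", help="Subreddit name")

    # None here stands in for the documented default ~/temp/.reddit_<target>.
    p.add_argument("-o", "--output", default=None, help="Output directory")
    p.add_argument("--sort", default="hot",
                   choices=["hot", "new", "rising", "top", "controversial"])
    p.add_argument("--time", default=None,
                   choices=["hour", "day", "week", "month", "year", "all"])
    p.add_argument("--after", help="Start date (YYYY-MM-DD)")
    p.add_argument("--before", help="End date (YYYY-MM-DD)")
    p.add_argument("--limit", type=int, default=0, help="0 = unlimited")
    p.add_argument("--images", action="store_true")
    p.add_argument("--gifs", action="store_true")
    p.add_argument("--skip-existing", action="store_true")
    p.add_argument("--workers", type=int, default=4)
    return p
```

Calling build_parser().parse_args(["-s", "funny", "--sort", "top", "--time", "all", "--limit", "10"]) yields a namespace with the documented defaults filled in for every flag that was not given.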

Examples

# All posts from a user
python3 reddit_archive.py -u someuser

# Subreddit with date range
python3 reddit_archive.py -s orlando --after 2025-01-01 --before 2025-12-31

# Top 10 most upvoted posts of all time from a subreddit
python3 reddit_archive.py -s funny --sort top --time all --limit 10

# Newest posts first
python3 reddit_archive.py -s orlando --sort new

# GIFs only, specific user
python3 reddit_archive.py -u someguy --gifs

# Custom output dir
python3 reddit_archive.py -u someuser -o ~/Downloads/reddit_archive

Output

Downloads are saved to the output directory with the following structure:

output_directory/
├── Pictures/
│   ├── {target}_{post_id}.jpg
│   ├── {target}_{post_id}.png
│   └── ...
└── Videos/
    ├── {target}_{post_id}.mp4
    └── ...
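
The layout above can be expressed as a small path helper. This is a sketch under assumptions: output_path and default_output_dir are hypothetical names, and sending every non-image extension (including .gif) to Videos/ is a guess, since the tree only shows jpg/png under Pictures/ and mp4 under Videos/.

```python
from pathlib import Path

# Extensions treated as still images (assumed from the --images flag).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def default_output_dir(target):
    """Documented default: ~/temp/.reddit_<target>."""
    return Path.home() / "temp" / f".reddit_{target}"


def output_path(root, target, post_id, ext):
    """Return <root>/Pictures|Videos/<target>_<post_id><ext>."""
    ext = ext.lower()
    if not ext.startswith("."):
        ext = "." + ext
    folder = "Pictures" if ext in IMAGE_EXTS else "Videos"
    return Path(root) / folder / f"{target}_{post_id}{ext}"
```

For example, output_path("out", "orlando", "abc123", "jpg") points at out/Pictures/orlando_abc123.jpg.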

File Organization

The skill is organized as:

reddit-archive/
├── SKILL.md              ← This file
└── scripts/
    ├── reddit_archive.py ← Main downloader script
    └── requirements.txt  ← Python dependencies

Rate Limiting

  • Pauses 0.8s between listing-page fetches
  • Sends a Safari-on-macOS User-Agent (as of 2026, Reddit's anti-bot system blocks descriptive bot User-Agents)
  • Sets the over18 cookie so NSFW subreddits don't return an interstitial
  • Run one instance at a time — parallel runs trigger rate limits
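
The pacing and header rules above might be implemented along the lines of this sketch. The Throttle class and BROWSER_HEADERS dict are hypothetical, the User-Agent string is only an example of a Safari-style value, and the cookie value over18=1 is an assumption. The throttle enforces a minimum gap between fetches, which is a slight variation on a fixed 0.8s sleep.

```python
import time

LISTING_DELAY = 0.8  # seconds between listing-page fetches

# Example browser-like headers (illustrative values, not the script's own).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) "
        "Version/17.0 Safari/605.1.15"
    ),
    "Cookie": "over18=1",  # assumed value for skipping the NSFW interstitial
}


class Throttle:
    """Ensure at least min_interval seconds pass between calls to wait()."""

    def __init__(self, min_interval=LISTING_DELAY,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous fetch, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A fetch loop would call throttle.wait() before each listing request and pass BROWSER_HEADERS along with it; the clock and sleep parameters exist only so the behaviour can be tested without real delays.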

Technical Notes

  • Data source: scrapes old.reddit.com listing HTML (old.reddit.com/r/<name>/<sort>/ or old.reddit.com/user/<name>/submitted/). Reddit's anonymous JSON API started returning 403 + an anti-bot HTML page in mid-2026, and the self-serve OAuth flow is gated behind a Responsible Builder Policy approval. old.reddit's server-rendered listings still work and embed the same metadata in <div class="thing" data-*> attributes (schema stable since ~2010).
  • Pagination: uses the after=t3_<id> cursor extracted from the page's next › button rather than a JSON after field.
  • Galleries: old.reddit embeds preview.redd.it/<id>.<ext> URLs for each gallery item inline. Each image is also available unsigned at i.redd.it/<id>.<ext> (full resolution, no expiry), which is what we download.
  • v.redd.it videos: routed through yt-dlp with --cookies-from-browser. HTML scraping doesn't expose the DASH manifest URL the way the old JSON API did, and as of 2026 yt-dlp's Reddit extractor needs an authenticated session to fetch the manifest itself.
  • GIF/video downloads use yt-dlp (redgifs, gfycat, v.redd.it); direct images and direct mp4/gif URLs are streamed via requests.
  • Date filtering is done client-side after fetching (filters by the post's created_utc, which we derive from data-timestamp).
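
The scraping steps in these notes can be sketched with the standard library alone. This is an illustration, not the script's parser: ListingParser, full_res_url, and in_date_range are hypothetical names, and three details are assumptions: that the next link sits inside a span with class next-button, that data-timestamp is in milliseconds, and that --after/--before are inclusive.

```python
from datetime import date, datetime, timezone
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class ListingParser(HTMLParser):
    """Collect data-* attributes of post divs and the next-page cursor."""

    def __init__(self):
        super().__init__()
        self.posts = []    # one dict per post, keyed without the "data-" prefix
        self.after = None  # e.g. "t3_abc", or None on the last page
        self._in_next = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        fullname = a.get("data-fullname") or ""
        if tag == "div" and "thing" in classes and fullname.startswith("t3_"):
            self.posts.append(
                {k[5:]: v for k, v in a.items() if k.startswith("data-")}
            )
        elif tag == "span" and "next-button" in classes:
            self._in_next = True
        elif tag == "a" and self._in_next:
            query = parse_qs(urlparse(a.get("href") or "").query)
            self.after = query.get("after", [None])[0]
            self._in_next = False


def full_res_url(preview_url):
    """Rewrite preview.redd.it/<id>.<ext>?... to the unsigned i.redd.it URL."""
    parts = urlparse(preview_url)
    if parts.netloc != "preview.redd.it":
        return preview_url
    return f"https://i.redd.it{parts.path}"


def in_date_range(post, after=None, before=None):
    """Client-side date filter on data-timestamp (assumed milliseconds, UTC)."""
    seconds = int(post["timestamp"]) / 1000
    created = datetime.fromtimestamp(seconds, tz=timezone.utc).date()
    if after and created < date.fromisoformat(after):
        return False
    if before and created > date.fromisoformat(before):
        return False
    return True
```

A fetch loop would feed each listing page to a fresh ListingParser, keep the posts that pass in_date_range, and request the next page with the after cursor until it comes back as None.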