{"skill":{"slug":"hn-podcast-transcribe","displayName":"HN Podcast Transcribe","summary":"Download, transcribe, and archive Hacker News podcast episodes (e.g. \"Hacker News Recap\" by Wondercraft). Use when: (1) user wants to transcribe HN podcast e...","description":"---\nname: hn-podcast-transcribe\ndescription: >\n  Download, transcribe, and archive Hacker News podcast episodes (e.g. \"Hacker News Recap\" by Wondercraft).\n  Use when: (1) user wants to transcribe HN podcast episodes, (2) user asks to archive HN podcast content,\n  (3) user wants searchable text from HN audio podcasts, (4) user mentions \"HN podcast\" + \"transcribe\" or \"archive\".\n  Supports incremental processing — only new episodes are downloaded/transcribed on each run.\n  Works with any podcast RSS feed, not just HN.\n---\n\n# HN Podcast Transcribe & Archive\n\nAutomatically download, transcribe, and archive Hacker News podcast episodes into a searchable local archive.\n\n## Default Podcast\n\n**Hacker News Recap** by Wondercraft.ai — daily AI-generated recap of top HN posts.\n- RSS: `https://rss.buzzsprout.com/2170103.rss`\n\nOverride with `HN_PODCAST_RSS` env var for any podcast RSS feed.\n\n## Workflow\n\n### 1. Fetch new episodes\n\n```bash\npython3 scripts/fetch_episodes.py [--rss URL] [--archive DIR] [--limit N] [--no-download]\n```\n\n- Parses podcast RSS feed\n- Compares against existing archive to skip already-processed episodes\n- Downloads audio (mp3/m4a/wav) for each new episode\n- Saves metadata as JSON alongside audio\n- Default archive: `./hn-podcast-archive/`\n- `--no-download`: save metadata only, skip audio download\n\n**Download strategies** (tried in order):\n1. Direct HTTP download — works for most podcast CDNs\n2. yt-dlp fallback — handles some Cloudflare-protected hosts\n3. If both fail, the episode directory is created with metadata; place audio manually\n\n**Cloudflare note**: Some hosts (e.g. Buzzsprout) block automated downloads. If direct download fails:\n- Use `--no-download` to create the directory structure\n- Download audio manually via browser or podcast app\n- Place the file as `audio.mp3` in the episode directory\n- Re-run the transcribe step\n\n### 2. Transcribe audio\n\n```bash\npython3 scripts/transcribe_episodes.py [--archive DIR] [--model MODEL] [--format FORMAT]\n```\n\n- Finds episodes with audio but no transcript\n- Runs Whisper locally (no API key needed)\n- Outputs: `txt`, `srt`, `vtt`, or `json` (default: `txt`)\n- Default model: `turbo` (fast, good accuracy)\n- Supports audio formats: mp3, m4a, wav, ogg, flac\n\n### 3. Generate archive index\n\n```bash\npython3 scripts/build_index.py [--archive DIR]\n```\n\n- Creates `archive_index.json` with all episodes, dates, titles, and transcript paths\n- Enables fast search across the archive\n\n### 4. Search archive\n\n```bash\npython3 scripts/search_archive.py [--archive DIR] \"search query\"\n```\n\n- Full-text search across all transcribed episodes\n- Returns matching episodes with context snippets\n\n## One-shot: Full Pipeline\n\n```bash\npython3 scripts/pipeline.py [--rss URL] [--archive DIR] [--model MODEL] [--limit N]\n```\n\nRuns fetch → transcribe → index in sequence.\n\n## Cron Integration\n\nSet up periodic processing with OpenClaw cron:\n\n```\n# Daily at 6am — process new HN Recap episodes\ncron add --name \"hn-podcast-digest\" --schedule \"0 6 * * *\" --payload '{\"kind\":\"agentTurn\",\"message\":\"Run the HN podcast transcription pipeline: python3 scripts/pipeline.py --limit 3\"}'\n```\n\n## Archive Structure\n\n```\nhn-podcast-archive/\n├── archive_index.json\n├── 2026-05-10_hardware-attestation-as-monopoly-enabler/\n│   ├── episode.json\n│   ├── audio.mp3\n│   └── transcript.txt\n├── 2026-05-09_a-recent-experience-with-chatgpt-5-5-pro/\n│   ├── episode.json\n│   ├── audio.mp3\n│   └── transcript.txt\n└── ...\n```\n\n## Configuration\n\n| Env Var | Default | Description |\n|---|---|---|\n| `HN_PODCAST_RSS` | Buzzsprout HN Recap feed | Podcast RSS feed URL |\n| `HN_ARCHIVE_DIR` | `./hn-podcast-archive` | Archive directory |\n| `WHISPER_MODEL` | `turbo` | Whisper model name |\n| `WHISPER_FORMAT` | `txt` | Transcript output format |\n\n## Requirements\n\n- Python 3.10+\n- `openai-whisper` (`pip install openai-whisper`)\n- `requests` (`pip install requests`)\n- `static-ffmpeg` (`pip install static-ffmpeg`) — auto-provides ffmpeg\n- `yt-dlp` (optional, for fallback downloads)\n- Whisper models auto-download to `~/.cache/whisper` on first use\n","tags":{"latest":"2.0.0"},"stats":{"comments":0,"downloads":476,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":5},"createdAt":1777866431013,"updatedAt":1778534506696},"latestVersion":{"version":"2.0.0","createdAt":1778534269186,"changelog":"Major update: multi-format audio, static-ffmpeg auto-setup, Cloudflare download fallback, improved search","license":"MIT-0"},"metadata":null,"owner":{"handle":"terrycarter1985","userId":"s17brwfrqyjhbjgadkvar20h8x8492g8","displayName":"terrycarter1985","image":"https://avatars.githubusercontent.com/u/1542987?v=4"},"moderation":null}