HN Podcast Transcriber
AdvisoryAudited by Static analysis on May 6, 2026.
Overview
No suspicious patterns detected.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Installing or updating these external tools can introduce third-party code and model downloads into the local environment.
The skill depends on external local tools and a pip-installed package. This is expected for Whisper transcription, but users should install trusted versions because the registry install metadata does not automate or pin them.
- **whisper** CLI installed (`pip install openai-whisper`) - **ffmpeg** on PATH
Install Whisper and ffmpeg from trusted sources, keep them updated, and consider pinning versions if reproducibility matters.
A large or untrusted feed could cause many downloads, disk usage, or transcription work.
The script fetches an RSS feed and downloads enclosure URLs from that feed. This is central to the skill, but custom or untrusted feeds control which audio URLs are downloaded.
urllib.request.urlopen(req, timeout=30) ... urllib.request.urlretrieve(audio_url, audio_path)
Use trusted feed URLs, run with a dedicated archive directory, and use `--limit` when testing or processing large feeds.
Running transcription can consume local resources and depends on Whisper/ffmpeg safely handling downloaded media.
The script executes the local `whisper` CLI on downloaded audio. This is purpose-aligned, but it invokes local binaries and media parsing.
subprocess.run(["whisper", audio_path, "--model", whisper_model, "--output_format", "txt", "--output_dir", ep_dir], ... timeout=600)
Keep Whisper and ffmpeg updated, avoid untrusted/private feeds unless needed, and run the script in a controlled workspace.
If an agent later reads the archive, prompt-like text inside a transcript could influence it if not handled as untrusted source material.
The skill persists externally sourced podcast transcripts as markdown. This is the intended archive behavior, but later agents or searches should treat transcript text as untrusted content, not instructions.
f.write(f"## Transcript\n\n{transcript}\n")Treat archived transcripts as reference content only, and do not let later workflows follow instructions found inside podcast text.
A scheduled job may continue downloading and transcribing new episodes until disabled.
The skill suggests optional scheduled execution. This is disclosed and purpose-aligned for periodic podcast ingestion, but it is persistent behavior if the user enables it.
Set up an OpenClaw cron job for daily checks: 1. Create an isolated cron job that runs the script 2. Or add a heartbeat check in HEARTBEAT.md
Only schedule it intentionally, keep the job isolated, and periodically review disk usage and job status.
