Install
openclaw skills install autoshorts

Turn long videos into viral TikTok, Instagram Reels & YouTube Shorts. Daily AI pipeline: Whisper transcribes, Gemini 3 Flash multimodal picks every viral moment with frame-accurate cuts, FFmpeg renders with hook-text overlay, you approve, Upload-Post publishes. One video per run, human-gated before posting. Use when the user wants to create shorts/reels/clips from long videos, mentions autoshorts, viral clips, video repurposing, content automation, or asks for the daily clip batch.

Pipeline tooling lives at ~/Documents/skill-autoshorts/. Each day this skill picks ONE long video from INPUT_FOLDER, extracts every viable short-form clip (Gemini 3 Flash decides), shows them to the user for approval, and publishes the approved ones via Upload-Post.
cd ~/Documents/skill-autoshorts && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt
FFmpeg is a required system binary. Verify with ffmpeg -version; install with brew install ffmpeg if missing.
The .env file lives at ~/Documents/skill-autoshorts/.env. Required keys:
UPLOAD_POST_API_KEY=...
UPLOAD_POST_PROFILE=...
GEMINI_API_KEY=...
INPUT_FOLDER=/abs/path/to/long/videos
OUTPUT_FOLDER=/abs/path/to/clip/output
WHISPER_MODEL=medium
TIMEZONE=Europe/Madrid
If a required key is missing, ask the user for it before continuing.
UPLOAD_POST_PROFILE is the Upload-Post profile name (NOT the social handle). You can verify the key and profile with:

curl -H "Authorization: Apikey $UPLOAD_POST_API_KEY" https://api.upload-post.com/api/uploadposts/me

This skill is invoked daily by the openclaw harness, which also handles the messaging bridge (Telegram, WhatsApp, or whatever channel openclaw is configured with). The skill itself does NOT talk to Telegram or any messenger directly — it just runs the pipeline and presents the candidates as text + absolute file paths. openclaw forwards your output to the user's phone, captures the user's reply, and feeds it back into the conversation.
Concretely: at Step 5 you print the candidates table and ask which IDs to publish; openclaw delivers that table plus the clip files via the user's chosen channel; the user replies on their phone (e.g., "1, 3, 5"); openclaw injects that reply back; you continue with Steps 6–8. Same pattern for any other "ask the user" point in the workflow (metadata review, dry-run confirmation, etc.).
If the skill is invoked outside openclaw (e.g., user runs /autoshorts directly in Claude Code), the same prompts work — they just appear in the terminal instead of on the phone.
This skill is meant to run as a daily infinite loop. Every run picks ONE video and walks it through the pipeline. Pick semantics are round-robin per cycle: each video is picked at most once per cycle. When every video in INPUT_FOLDER has been processed in the current cycle, a new cycle automatically starts and the same videos become available again — generating fresh clips from already-clipped sources. The state file at state/processed.json tracks cycle_started_at, last_processed_at per video, and cycles_count per video. Inside a cycle, the next pick is the newest unprocessed-this-cycle (mtime DESC), so fresh material always jumps the queue.
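Purely for orientation, the state file might look something like the sketch below (illustrative only: the real schema is whatever autoshorts.py writes; the mark-processed step suggests entries are keyed by video hash):

```bash
cat ~/Documents/skill-autoshorts/state/processed.json
# {
#   "cycle_started_at": "2025-01-10T09:00:12",
#   "videos": {
#     "a1b2c3d4": {
#       "name": "podcast_ep12.mp4",
#       "last_processed_at": "2025-01-11T10:02:33",
#       "cycles_count": 2
#     }
#   }
# }
```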
Before doing any work, check that the environment is ready and ask the user for whatever is missing:
- Does ~/Documents/skill-autoshorts/venv/bin/python exist? If not, run setup step 1 from the Setup section. (You can do this without asking — it's mechanical.)
- ffmpeg in PATH — if missing, ask the user to brew install ffmpeg (do not install yourself; system-wide installs deserve confirmation).
- .env file — check that every required key is set and non-empty:
  - GEMINI_API_KEY → if missing, ask: "Falta la API key de Gemini. Pégamela (la generas en https://aistudio.google.com/apikey)."
  - UPLOAD_POST_API_KEY and UPLOAD_POST_PROFILE → if missing, ask: "Necesito la API key de Upload-Post y el nombre del profile (Manage Users en https://app.upload-post.com)."
  - INPUT_FOLDER and OUTPUT_FOLDER → if missing, default to ~/Documents/skill-autoshorts/input and ~/Documents/skill-autoshorts/output and write them to .env.
  - WHISPER_MODEL → default medium. TIMEZONE → default Europe/Madrid.

If the user provides an API key in the conversation, write it to .env immediately, never echo it back, and warn that the key is now in conversation logs and they should rotate it after testing.
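A minimal shell sketch of these preflight checks, assuming the paths and key names from the Setup section above (a convenience script, not part of the pipeline):

```bash
#!/usr/bin/env bash
# Illustrative preflight sketch, not part of autoshorts.py.
SKILL_DIR="$HOME/Documents/skill-autoshorts"

[ -x "$SKILL_DIR/venv/bin/python" ] || echo "MISSING: venv (run setup step 1, no confirmation needed)"
command -v ffmpeg >/dev/null 2>&1 || echo "MISSING: ffmpeg (ask the user to brew install ffmpeg)"

for key in GEMINI_API_KEY UPLOAD_POST_API_KEY UPLOAD_POST_PROFILE INPUT_FOLDER OUTPUT_FOLDER; do
  grep -Eq "^${key}=.+" "$SKILL_DIR/.env" 2>/dev/null || echo "MISSING: $key in .env (ask the user)"
done
```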
Input video format: videos in INPUT_FOLDER are expected to already be 9:16 vertical and ready to post (1080×1920 typical). If the user has burned-in subtitles, those should already be on the source video. The skill does NOT reformat, crop, scale, or burn subtitles — it only cuts the chosen segment and overlays a hook text on top. If a video arrives in landscape or any non-9:16 ratio, surface that to the user and ask before processing — running the pipeline as-is will produce TikTok-incompatible output.
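A quick way to surface the source dimensions before processing, using standard ffprobe flags (the judgement about whether a non-9:16 source should proceed stays with the user):

```bash
# Prints "WIDTHxHEIGHT" for the first video stream, e.g. 1080x1920 for a correct 9:16 source.
ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=s=x:p=0 "<VIDEO_PATH>"
```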
How videos arrive into INPUT_FOLDER is the harness's job, not the skill's. The canonical flow: the user forwards a video to openclaw / Hermes / their agent in chat (Telegram / WhatsApp / etc.), the harness downloads it and saves it to INPUT_FOLDER. The skill itself only operates on files that are already there. If the user passes a video path that is NOT inside INPUT_FOLDER (e.g. /autoshorts /Users/foo/Downloads/podcast.mp4), copy it in first (use cp, do not move — the original stays put). Otherwise pick will not find it.
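For example, assuming .env holds plain KEY=VALUE lines so it can be sourced directly:

```bash
# Copy, never move: the original stays where the user put it.
source ~/Documents/skill-autoshorts/.env
cp "/Users/foo/Downloads/podcast.mp4" "$INPUT_FOLDER/"
```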
python autoshorts.py pick
Returns JSON with the next video to process. Output fields:
- path, name, size_mb, mtime, duration_s — file metadata.
- previous_cycles_completed — how many cycles this video has already been through (0 means first time ever).
- remaining_in_cycle — how many other videos are still untouched in the current cycle.
- cycle_started_at — timestamp of the current cycle's start.
- new_cycle_started — true if THIS pick is the one that opened a fresh cycle (every video has been processed in the previous cycle, the loop wraps around).

If new_cycle_started is true, mention it to the user briefly ("starting a new cycle — already clipped this video N times before, going for fresh moments"). It's not an error — it's the expected wrap-around. Gemini will likely pick different moments since the prompt is non-deterministic and HOT.md priors evolve over time.
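For orientation, a pick call returns JSON along these lines (the values are invented for illustration; the field set is the one listed above):

```bash
python autoshorts.py pick
# {
#   "path": "/abs/path/to/long/videos/podcast_ep12.mp4",
#   "name": "podcast_ep12.mp4",
#   "size_mb": 412.7,
#   "mtime": "2025-01-10T18:22:05",
#   "duration_s": 3840.2,
#   "previous_cycles_completed": 0,
#   "remaining_in_cycle": 2,
#   "cycle_started_at": "2025-01-10T09:00:12",
#   "new_cycle_started": false
# }
```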
The pipeline does NOT hard-stop when it runs out of fresh videos. The only hard-stop case is INPUT_FOLDER being empty — surface that and ask the user to drop something in.
If the user explicitly says "reprocess video X right now" out of cycle order, remove that entry from state/processed.json first, then run pick. Do NOT bypass the cycle logic by other means.
python autoshorts.py transcribe "<VIDEO_PATH>"
Writes output/<video_slug>/transcript.json with sentence segments and per-word timestamps. Whisper auto-detects language. Default model is medium.
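The exact schema is whatever autoshorts.py writes; conceptually it is sentence segments with nested per-word timestamps, roughly in this shape (illustrative only, check the real file):

```bash
head -c 400 "output/<video_slug>/transcript.json"
# {
#   "language": "es",
#   "segments": [
#     {"start": 12.4, "end": 17.9, "text": "...",
#      "words": [{"word": "...", "start": 12.4, "end": 12.8}]}
#   ]
# }
```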
python autoshorts.py analyze "<VIDEO_PATH>"
Uploads the video to Gemini Files API and asks gemini-3-flash-preview to return EVERY viable short-form moment (20–60s each), with timestamps snapped to word boundaries from the transcript. Output: output/<video_slug>/clips.json. Read this file to get the candidate list.
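The publish flags later in the workflow (--clip-id, --hook-text, --viral-score, --reason) echo fields from each candidate, so it helps to know the rough shape of an entry. The field names below are inferred from how they are used downstream, not a guaranteed schema:

```bash
jq '.[0]' "output/<video_slug>/clips.json"   # assumes the root is a flat array; adjust if nested
# {
#   "id": 1,
#   "start": 754.32,
#   "end": 792.10,
#   "hook": "Esto te va a ahorrar 3 horas al día",
#   "score": 9,
#   "reason": "Concrete promise, payoff lands in the first 5 seconds"
# }
```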
For each clip in clips.json, run two commands:
python autoshorts.py extract "<VIDEO_PATH>" \
--start <START> --end <END> \
--output "output/<slug>/clip_<ID>.mp4"
python autoshorts.py hook "output/<slug>/clip_<ID>.mp4" \
--text "<HOOK_TEXT>" --duration 3 \
--output "output/<slug>/clip_<ID>_final.mp4"
The hook is rendered TikTok/Instagram-style: each line of text gets its own black pill (78% opacity, rounded corners) behind it, with white Impact text + black stroke on top. The pill keeps the hook legible on any background — pure white, pure black, busy screenshares — without needing to inspect the underlying frame. Positioned at the top of the frame for the first 3 seconds. Hook text comes from clips.json (Gemini wrote it in the video's language).
Cut and hook ALL candidates upfront — the user will review the actual final files visually, not metadata.
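If you want to drive that from the shell in one pass, here is a sketch. It assumes jq is installed and that clips.json is a flat array with the field names sketched in the analyze step; adjust if the real layout differs:

```bash
# Loops over every Gemini candidate and produces clip_<ID>_final.mp4 for each.
SLUG="<video_slug>"; VIDEO="<VIDEO_PATH>"
jq -c '.[]' "output/$SLUG/clips.json" | while read -r clip; do
  id=$(jq -r '.id' <<<"$clip")
  start=$(jq -r '.start' <<<"$clip")
  end=$(jq -r '.end' <<<"$clip")
  hook=$(jq -r '.hook' <<<"$clip")
  python autoshorts.py extract "$VIDEO" --start "$start" --end "$end" \
    --output "output/$SLUG/clip_${id}.mp4"
  python autoshorts.py hook "output/$SLUG/clip_${id}.mp4" \
    --text "$hook" --duration 3 \
    --output "output/$SLUG/clip_${id}_final.mp4"
done
```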
You are multimodal. Use that. Before showing the candidates to the user, verify the hook overlay actually renders cleanly on each clip.
For every clip_<ID>_final.mp4:
python autoshorts.py preview output/<slug>/clip_<ID>_final.mp4
This extracts a single frame at t=1.0s (mid-hook) to preview_clip_<ID>_final.png next to the clip. Open it with the Read tool — Claude / openclaw both view PNGs directly. No Gemini call needed; the agent running the skill IS the multimodal reviewer.
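To generate every preview in one pass (plain shell, nothing skill-specific):

```bash
for f in output/<slug>/clip_*_final.mp4; do
  python autoshorts.py preview "$f"
done
```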
For each preview, evaluate:
- Is the hook fully legible, with no clipped characters and the pill not overflowing the frame edges?
- Are accented and special characters (á é í ó ú ñ ¿ ¡) rendering correctly?

Add a "QA" column to the Step 5 table with one of:

- ✅ — clean
- ⚠️ <issue> — flag the specific problem (e.g. ⚠️ último carácter recortado, ⚠️ pill desbordado a la derecha)

Do NOT silently drop flagged clips — show them to the user with the warning so they can decide. The QA pass is advisory: a "⚠️" is a hint, not a veto. If multiple clips fail in the same way (e.g. the hook is consistently overflowing), that's a signal to suggest the user shorten the hook style going forward.
Show a markdown table:
| ID | Duration | Hook | Score | QA | Reason | File |
|---|---|---|---|---|---|---|
| 1 | 38s | "..." | 9 | ✅ | ... | output/<slug>/clip_1_final.mp4 |
| 2 | 27s | "..." | 7 | ⚠️ acento "ó" recortado | ... | output/<slug>/clip_2_final.mp4 |
| … | … | … | … | … | … | … |
Always include the absolute file paths in the table — openclaw uses them to attach the actual clip videos when it forwards the message to the user's messenger (Telegram / WhatsApp / etc.). Without absolute paths the user sees only metadata and cannot review the clips visually. Then ask:
Which clip IDs do you want to publish? (e.g. 1, 3, 5, or none.)
Wait for the user's reply (it will arrive via openclaw from the user's phone).
If the user replies none (rejects all candidates), skip directly to Step 8 and mark-processed with --clips-published 0. This consumes the video so tomorrow's run picks the next one — otherwise the same rejected candidates would surface again. If the user wants to retry the same video later, they can manually remove its entry from state/processed.json.
For every approved ID, generate platform-specific copy. This is YOUR job as Claude — write it directly, do not call a tool. Match the language of the video.
- TikTok (tiktok_title, max 90 chars): punchy hook, 1–2 emojis, hashtag mix at end of the title. Sweet spot ~70–85 chars.
- Instagram (instagram_title, up to 2200 chars): long-form storytelling — first line is the hook, then 2-4 short paragraphs (use \n\n), CTA ("Guarda esto", "Etiqueta a alguien…", "Comenta X para…"), then 20-30 hashtags mixing sizes (large/medium/niche). Sweet spot 500–800 chars total.
- YouTube (youtube_title, max 100 chars but keep ~40-60 chars so it doesn't truncate on mobile): SEO-friendly with keywords. Description focuses on searchability, 3–5 hashtags max.
- General fallback (title and description): used for any platform that doesn't have its own override.

Length contract (verify before publishing): YouTube title is the most constrained — write it shortest and most direct. TikTok and Instagram can breathe — TikTok up to ~85 chars in tiktok_title, Instagram captions are long-form by design.
Show the generated copy back to the user and confirm before publishing.
Schedule one approved clip per day starting tomorrow at 10:00 in TIMEZONE (default Europe/Madrid). Each next clip += 1 day.
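One way to produce those timestamps on macOS (BSD date syntax, matching the brew-based setup; confirm the exact string format --schedule accepts against the --dry-run pass below):

```bash
# Clip n (1-based, in approval order) is scheduled n days from today at 10:00 local time.
n=1
date -v+"${n}"d +"%Y-%m-%dT10:00:00"
# e.g. 2025-01-16T10:00:00
```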
For each approved clip:
python autoshorts.py publish "output/<slug>/clip_<ID>_final.mp4" \
--platforms tiktok,instagram,youtube \
--title "<GENERAL>" \
--description "<DESCRIPTION>" \
--tiktok-title "<TIKTOK_TITLE>" \
--instagram-title "<INSTAGRAM_CAPTION>" \
--youtube-title "<YOUTUBE_TITLE>" \
--schedule "<ISO_DATE>" \
--timezone "Europe/Madrid" \
--tiktok-mode draft \
--clip-id <ID> \
--hook-text "<HOOK_TEXT>" \
--viral-score <GEMINI_SCORE> \
--reason "<GEMINI_REASON>" \
--video-source "<SOURCE_VIDEO_FILENAME>"
The --clip-id, --hook-text, --viral-score, --reason, --video-source flags are not optional in practice — they feed the learning loop. Without them, learn cannot correlate engagement metrics back to which hook patterns worked. The values come straight from clips.json (the Gemini output) and the source video filename.
TikTok mode: --tiktok-mode draft (default) sends to the TikTok inbox via post_mode=MEDIA_UPLOAD so the user can finish editing in-app before publishing. Use --tiktok-mode direct (DIRECT_POST) only when the user explicitly wants immediate publishing.
Always run with --dry-run first and show the user the exact request payloads. Only execute the real publish after explicit "go".
python autoshorts.py mark-processed "<VIDEO_PATH>" \
--clips-generated <N_CANDIDATES> \
--clips-published <N_APPROVED>
This appends the video's hash to state/processed.json so tomorrow's pick skips it. Run this even if --clips-published 0 — a rejected video is still consumed. The only time you do NOT mark-processed is if the pipeline crashed mid-run (e.g., Gemini errored out before producing clips); in that case let the user retry the same video tomorrow.
After publishing, you can run a quick reflect to capture WHY the user approved the clips they approved (no engagement metrics needed — just the approved-vs-rejected signal):
python autoshorts.py reflect --window-days 30
This compares recent candidates (learnings/candidate-history.jsonl) against approvals (learnings/post-history.jsonl) and asks Gemini to extract qualitative patterns ("approves hooks with concrete numbers, rejects question-form hooks"). Output goes to learnings/runs/reflect-YYYY-MM-DD-HHMM.md.
These observations are NOT auto-promoted to HOT.md. They're notes for the user to review and curate. Run reflect occasionally — daily is overkill, weekly is fine.
Print:
| # | File | Duration | Hook | Schedule | Platforms |
|---|---|---|---|---|---|
…and the source video name with how many candidates were generated vs. published.
Learning loop (learn)

This skill gets smarter over time. Engagement data from past clips (views, likes, comments, shares, saves — fetched from Upload-Post analytics) is fed back into the clip-selection prompt for future runs.
Run learn weekly, not daily. Engagement metrics need time to mature; daily learn would chase noise.
python autoshorts.py learn
Defaults: 7-day soak (clips younger than this are excluded), 90-day max age (older are stale), composite score = 0.6·views + 0.4·engagement_rate, top/bottom 20% as winners/losers.
What learn does:

- Reads learnings/post-history.jsonl (every clip we published, with its hook + Gemini score + Gemini reason + source video).
- Fetches engagement for each clip via GET /api/uploadposts/post-analytics/{request_id} — same request_id we got back at publish time.
- Sends the scored winners/losers plus the current learnings/HOT.md to Gemini Flash with a meta-prompt asking it to produce an updated HOT.md (≤80 lines) listing patterns supported by the new evidence.
- Backs up the previous learnings file first (HOT.YYYYMMDD-HHMMSS.md.bak).
- Writes a run report to learnings/runs/learn-YYYY-MM-DD.md so the user can see exactly which clips were called winners/losers and how the learnings changed.

cmd_analyze automatically reads learnings/HOT.md (if it exists and is non-empty) and prepends it to the Gemini prompt as "PRIOR LEARNINGS — apply when selecting clips and writing hooks". Gemini then weighs those patterns when proposing clips and writing hooks for tomorrow's video. You don't have to do anything to make this work — it happens on every analyze call.
Run learn manually with python autoshorts.py learn, or schedule it weekly via cron:

0 9 * * 1 cd ~/Documents/skill-autoshorts && ./venv/bin/python autoshorts.py learn

Don'ts:

- Don't bother running learn while post-history.jsonl has fewer than ~10 entries — the rule of "5 winners + 5 losers minimum" will short-circuit the run with a "not enough data" note.
- Don't edit HOT.md by hand AND keep running learn — learn will overwrite your edits. If you want manual rules, put them in learnings/insights/ (manual notes, not used by the pipeline).
- Don't delete post-history.jsonl or metrics.jsonl — they're append-only memory. Without them every learn starts from zero.
- Don't run learn more than ~once a week — Gemini will just churn the same patterns.

Other gotchas:

- If Gemini's clip selection can't be parsed into clips.json, check output/<slug>/clips.raw.txt — read it and re-prompt manually.
- Rendered clips live under OUTPUT_FOLDER/<video_slug>/. Surface them clearly so the openclaw harness can attach them when forwarding to Telegram / WhatsApp / whatever messenger channel the user has configured.
- If pick says "all videos already processed", tell the user and stop — do not re-process. They need to drop a new video into INPUT_FOLDER.
- state/processed.json is the only memory between runs. Never edit it programmatically except via mark-processed. If the user asks to "reprocess video X", the right move is to ask them to confirm, then remove the matching entry from state/processed.json manually.
- Whisper's medium model (~1.5 GB) downloads on the first transcribe call. Warn the user the first run will take longer — subsequent runs reuse the cached model.