Gemini AI Video Generator

Drop a video and describe what you want — the gemini-ai-video-generator skill uses Google's Gemini AI to analyze your footage and generate new scenes, summar...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by peandrover · adam@peandrover
Security Scan
VirusTotal
Benign
OpenClaw
Benign
medium confidence
Purpose & Capability
The skill's name and description claim that it takes an uploaded video and requests generation, summaries, or edits through Gemini-powered processing. The runtime instructions send video and commands to a NemoVideo API (mega-api-prod.nemovideo.ai) and require a NemoVideo token, which is coherent with the stated purpose. Note: the skill advertises Google's Gemini model but does not contact Google APIs directly; instead it routes through NemoVideo's backend, which it claims orchestrates Gemini. That distinction may matter if you expect a direct Google integration.
Instruction Scope
Runtime instructions are specific: check NEMO_TOKEN, optionally create ~/.config/nemovideo/client_id, POST to NemoVideo endpoints to obtain an anonymous token and create sessions, upload video files or URLs, and stream SSE responses. These actions are within scope for a service that must upload and process video. Important privacy/security note: the skill explicitly directs uploading user videos to an external third party (NemoVideo), so any sensitive content will be transmitted off-host. The skill instructs not to display raw tokens or API responses, which is sensible.
Install Mechanism
No install spec and no code files; this is an instruction-only skill. That keeps disk-write risk minimal: the only file it writes is the config file the skill itself creates (~/.config/nemovideo/client_id).
Credentials
The declared primary credential is NEMO_TOKEN, which aligns with the backend API usage. The skill will also obtain an anonymous token via an API call if NEMO_TOKEN isn't set. It writes a client_id file under ~/.config/nemovideo/ for rate-limiting; that config path appears only in SKILL.md metadata (the registry summary earlier showed no required config paths) — a small metadata inconsistency. No unrelated credentials (AWS, GitHub, etc.) are requested.
Persistence & Privilege
The skill is marked always:false and uses normal autonomous invocation. The only persistent change it requests is writing a single client_id file in ~/.config/nemovideo/ and storing session_id/token for the session's lifetime, which is proportional for a client that needs to track sessions and rate limits. The skill does not request system-wide changes or access to other skills' configuration.
Assessment
Before installing: (1) Understand that using the skill will upload your videos to NemoVideo's servers (https://mega-api-prod.nemovideo.ai and https://nemovideo.com). Do not use it for sensitive or private footage unless you trust NemoVideo's policies. (2) The skill needs a NemoVideo token (NEMO_TOKEN); if none is provided, it will request an anonymous token and create a small config file at ~/.config/nemovideo/client_id. Decide whether you're comfortable with that file being created. (3) The skill advertises 'Gemini' but communicates solely with the NemoVideo API; verify that you accept NemoVideo, not Google directly, processing your data. (4) Check NemoVideo's homepage/repository and privacy terms if you need legal or ownership guarantees for uploaded media. (5) For additional assurance, request or inspect a concrete network/schema spec (exact fields sent with uploads, persistence of returned tokens, retention policy for uploaded videos) before enabling the skill.

Current version: v1.0.0
latest: vk97f4zgyka2mztxgmwhnxj59h583wbtz

Runtime requirements

🎬 Clawdis
Primary env: NEMO_TOKEN

SKILL.md

Getting Started

Welcome! Ready to let Gemini AI do the creative work? Upload your video and tell me what you want to generate — a highlight reel, scene description, caption track, or something entirely new. Let's get started.

Try saying:

  • "Here's a 10-minute product demo video — generate a 60-second highlight reel focusing on the key features shown"
  • "Analyze this lecture recording and write a structured summary with timestamps for each major topic covered"
  • "Watch this travel footage and generate a narration script I can record as a voiceover"

First-Time Connection

When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.

  1. Client identifier: Look for ~/.config/nemovideo/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
  2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response data.token is your NEMO_TOKEN — 100 free credits, valid 7 days.
  3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
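The three setup steps can be sketched in Python using only the standard library. The helper names (`load_or_create_client_id`, `resolve_token`, and the others) and the 30-second network timeout are illustrative assumptions, not part of the skill. The HTTP call is injected through `fetch`, so the logic can be checked without contacting the API.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request, urlopen

API_BASE = "https://mega-api-prod.nemovideo.ai"


def load_or_create_client_id(path: Path) -> str:
    """Step 1: reuse the stored UUID, or generate and save one on first run."""
    if path.exists():
        return path.read_text().strip()
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id)
    return client_id


def anonymous_token_request(client_id: str) -> Request:
    """Step 2: the response's data.token becomes NEMO_TOKEN."""
    return Request(
        f"{API_BASE}/api/auth/anonymous-token",
        data=b"",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def create_session_request(token: str, language: str) -> Request:
    """Step 3: the response carries the session_id used by later calls."""
    body = json.dumps({"task_name": "project", "language": language}).encode()
    return Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_json(req: Request) -> dict:
    """Hypothetical helper: send a request and decode its JSON body."""
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def resolve_token(env: dict, client_id_path: Path, fetch=fetch_json) -> str:
    """Use NEMO_TOKEN when it is set; otherwise run steps 1 and 2."""
    if env.get("NEMO_TOKEN"):
        return env["NEMO_TOKEN"]
    client_id = load_or_create_client_id(client_id_path)
    return fetch(anonymous_token_request(client_id))["data"]["token"]
```

In practice, the entry point would look something like `resolve_token(os.environ, Path.home() / ".config/nemovideo/client_id")`, followed by `fetch_json(create_session_request(token, language))` to obtain the session_id. The exact JSON path of session_id in that response is not documented here.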

Keep setup communication brief. Don't display raw API responses or token values to the user.

Turn Your Words Into Video With Gemini AI

The Gemini AI Video Generator skill brings Google's multimodal Gemini model, orchestrated through the NemoVideo backend, into your video workflow. Instead of manually cutting, scripting, or re-shooting, you upload your video and describe what you want in plain English. Gemini reads the visual content and your instructions together, then generates output grounded in what's actually on screen.

This skill is built for people who have video and a vision but don't want to spend hours in editing software. Whether you're a solo creator repurposing long-form content into short clips, a marketing team generating product highlight reels, or an educator turning lecture recordings into structured summaries, this skill handles the heavy lifting.

You're not limited to trimming or adding filters. Gemini AI Video Generator can describe what's happening in a scene, suggest narrative structure, generate spoken-word scripts based on visual cues, or produce entirely new content framed around your uploaded footage. It's a fundamentally different kind of video tool — one that listens before it creates.

Prompt Routing and Generation Flow

Each message is routed by intent. Export, credits, status, and upload requests are handled with direct API calls; everything else (generate, edit, add BGM, and so on) is sent to the NemoVideo backend over SSE, which orchestrates the Gemini model.

User says | Action | Skip SSE?
"export" / "导出" / "download" / "send me the video" | → §3.5 Export | Yes
"credits" / "积分" / "balance" / "余额" | → §3.3 Credits | Yes
"status" / "状态" / "show tracks" | → §3.4 State | Yes
"upload" / "上传" / user sends file | → §3.2 Upload | Yes
Everything else (generate, edit, add BGM…) | → §3.1 SSE | No
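The routing table can be read as a first-match keyword lookup. This Python sketch assumes plain case-insensitive substring matching and treats an attached file as an upload regardless of the message text. The skill does not say how ambiguous messages are resolved, so treat the ordering as a hypothetical choice.

```python
# Each entry: (trigger phrases, handler section, skip SSE?).
# Assumption: naive case-insensitive substring matching; the first match wins.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "§3.5 Export", True),
    (("credits", "积分", "balance", "余额"), "§3.3 Credits", True),
    (("status", "状态", "show tracks"), "§3.4 State", True),
    (("upload", "上传"), "§3.2 Upload", True),
]


def route(message: str, has_attachment: bool = False) -> tuple[str, bool]:
    """Return (section, skip_sse) for a user message."""
    if has_attachment:
        return "§3.2 Upload", True
    text = message.lower()
    for phrases, section, skip_sse in ROUTES:
        if any(phrase in text for phrase in phrases):
            return section, skip_sse
    # Generate, edit, add BGM, and anything else goes to the agent over SSE.
    return "§3.1 SSE", False
```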

NemoVideo Backend Reference

ClawHub routes all video generation requests through the NemoVideo API, which handles Gemini model orchestration, render queuing, and secure video delivery. NemoVideo manages diffusion sampling, temporal coherence, and output encoding so your clips stay smooth and consistent across every generation.

Skill attribution — read from this file's YAML frontmatter at runtime:

  • X-Skill-Source: gemini-ai-video-generator
  • X-Skill-Version: from frontmatter version
  • X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, otherwise unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
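A minimal Python sketch of the attribution step. It uses a line-by-line scan of the frontmatter rather than a real YAML parser, and it classifies the install path lexically with `PurePath.is_relative_to` (Python 3.9+). The function names and the "unknown" fallback for a missing version are assumptions.

```python
from pathlib import PurePath

SKILL_SOURCE = "gemini-ai-video-generator"


def frontmatter_version(skill_md: str) -> str:
    """Read `version:` from the frontmatter between the first two '---' lines.

    Simplified scan, not a YAML parser; returns "unknown" (an assumption) if absent.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return "unknown"
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "version":
            return value.strip().strip("'\"")
    return "unknown"


def detect_platform(install_path: PurePath, home: PurePath) -> str:
    """Map the install location to the X-Skill-Platform value."""
    if install_path.is_relative_to(home / ".clawhub"):
        return "clawhub"
    if install_path.is_relative_to(home / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def attribution_headers(token: str, version: str, platform: str) -> dict:
    """Headers every request must carry; missing ones make export fail with 402."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": version,
        "X-Skill-Platform": platform,
    }
```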

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=&task=<task_id>&session=<session_id>&skill_name=gemini-ai-video-generator&skill_version=1.0.0&skill_source=<platform>
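A small Python sketch that assembles the claim link with `urllib.parse.urlencode`, which also percent-escapes any unusual characters in the IDs. It leaves `token` empty because the documented link does; whether a value belongs there is not stated. The function name is hypothetical.

```python
from urllib.parse import urlencode

CLAIM_BASE = "https://nemovideo.com/workspace/claim"


def claim_link(task_id: str, session_id: str, platform: str,
               skill_version: str = "1.0.0") -> str:
    """Build the workspace claim link shown to the user after session creation."""
    params = {
        "token": "",  # left empty, exactly as in the documented link
        "task": task_id,
        "session": session_id,
        "skill_name": "gemini-ai-video-generator",
        "skill_version": skill_version,
        "skill_source": platform,
    }
    return f"{CLAIM_BASE}?{urlencode(params)}"
```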

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
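A Python sketch of the send-message call plus a minimal Server-Sent Events reader. The parser follows the SSE framing rules: `data:` lines accumulate, a blank line dispatches the event, lines starting with `:` are comments, and an unterminated trailing event is dropped. The function names and the `extra_headers` parameter for the attribution headers are assumptions.

```python
import json
from urllib.request import Request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def sse_request(token: str, session_id: str, text: str, extra_headers=None) -> Request:
    """POST /run_sse with the documented body; extra_headers carries X-Skill-*."""
    body = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
        **(extra_headers or {}),
    }
    return Request(f"{API_BASE}/run_sse", data=json.dumps(body).encode(),
                   headers=headers, method="POST")


def iter_sse_data(lines):
    """Yield each event's data payload (multiple data lines joined by newlines)."""
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the pending event
            if buf:
                yield "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):  # comment / keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            buf.append(value)
    # An event not terminated by a blank line is discarded, as the SSE spec requires.
```

Typical use would be `for payload in iter_sse_data(line.decode("utf-8") for line in resp)` inside `with urlopen(...) as resp`. Note that urlopen's `timeout` applies per socket operation, so enforcing the 15-minute maximum needs a separate wall-clock check.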

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
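Both upload forms can be sketched in Python with the standard library. The file variant hand-builds a one-part `multipart/form-data` body with the field name `files`, mirroring `curl -F "files=@/path"`. The helper names are hypothetical, and the sketch does not escape quotes in filenames.

```python
import json
import mimetypes
import uuid
from pathlib import PurePath
from urllib.request import Request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def _upload_url(session_id: str) -> str:
    return f"{API_BASE}/api/upload-video/nemo_agent/me/{session_id}"


def upload_by_url_request(token: str, session_id: str, url: str) -> Request:
    """Upload by reference: the backend fetches the media from `url`."""
    body = json.dumps({"urls": [url], "source_type": "url"}).encode()
    return Request(_upload_url(session_id), data=body, method="POST",
                   headers={"Authorization": f"Bearer {token}",
                            "Content-Type": "application/json"})


def upload_file_request(token: str, session_id: str, filename: str,
                        content: bytes) -> Request:
    """Equivalent of curl -F "files=@/path": one multipart part named 'files'."""
    boundary = uuid.uuid4().hex
    name = PurePath(filename).name
    mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return Request(_upload_url(session_id), data=head + content + tail, method="POST",
                   headers={"Authorization": f"Bearer {token}",
                            "Content-Type": f"multipart/form-data; boundary={boundary}"})
```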

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
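A short Python sketch of the two read-only calls and of pulling the documented key fields out of a state response. The JSON layout of the credits response (where available, frozen, and total sit) is not specified, so it is left undecoded. The names are hypothetical, and a real request would also carry the X-Skill-* attribution headers described in this section.

```python
from urllib.request import Request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def credits_request(token: str) -> Request:
    """GET the credit balance (available, frozen, total)."""
    return Request(f"{API_BASE}/api/credits/balance/simple",
                   headers={"Authorization": f"Bearer {token}"})


def state_request(token: str, session_id: str) -> Request:
    """GET the latest session state."""
    return Request(f"{API_BASE}/api/state/nemo_agent/me/{session_id}/latest",
                   headers={"Authorization": f"Bearer {token}"})


def state_fields(response: dict) -> dict:
    """Extract data.state.draft, .video_infos and .generated_media (None if absent)."""
    state = (response.get("data") or {}).get("state") or {}
    return {key: state.get(key) for key in ("draft", "video_infos", "generated_media")}
```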

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
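The export flow (build the body, then poll every 30 seconds until `status` is `completed`) can be sketched in Python. Three assumptions apply: `status` and `output.url` sit at the top level of the poll response, `<ts>` is a caller-supplied timestamp, and the 30-minute `max_wait` cap is a hypothetical guard that the skill text does not specify.

```python
import time


def export_body(session_id: str, draft: dict, ts: int) -> dict:
    """Request body for POST /api/render/proxy/lambda."""
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(fetch_status, render_id: str, interval: float = 30.0,
                    max_wait: float = 30 * 60, sleep=time.sleep) -> str:
    """Poll GET /api/render/proxy/lambda/<id> until status is 'completed'.

    fetch_status(render_id) must return the decoded JSON response.
    Returns the download URL from output.url.
    """
    waited = 0.0
    while True:
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        if waited >= max_wait:  # hypothetical safety cap
            raise TimeoutError(f"render {render_id} not completed after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Injecting `fetch_status` and `sleep` keeps the polling loop testable without real network calls or real 30-second waits.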

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
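A tiny Python pre-upload check against the list above. It looks only at the file extension, so the backend may still reject a file (error 4001). Spellings not in the list, such as `.jpeg`, are treated as unsupported.

```python
from pathlib import PurePath

SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",  # video
    "jpg", "png", "gif", "webp",         # image
    "mp3", "wav", "m4a", "aac",          # audio
}


def is_supported(filename: str) -> bool:
    """Case-insensitive extension check before uploading."""
    return PurePath(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS
```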

SSE Event Handling

Event | Action
Text response | Apply GUI translation (§4), present to user
Tool call/result | Process internally, don't forward
heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes | Process final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
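The event-handling rules, including the fallback to session state when no text arrives, can be sketched in Python. The sketch assumes raw payloads have already been classified into "text", "tool", or "heartbeat" (a hypothetical classification, because the payload schema is not documented). The progress notice fires on the first event arriving two or more minutes after the previous notice.

```python
STILL_WORKING = "⏳ Still working..."


def handle_stream(events, now, notify, poll_state, progress_every: float = 120.0):
    """Consume classified SSE events; fall back to session state if no text arrives.

    events yields (kind, payload) pairs, kind in {"text", "tool", "heartbeat"}.
    Returns ("text", combined_text) or ("state", poll_state()).
    """
    texts = []
    last_notice = now()
    for kind, payload in events:
        t = now()
        if t - last_notice >= progress_every:
            notify(STILL_WORKING)
            last_notice = t
        if kind == "text":
            texts.append(payload)
        # "tool" events are processed internally and never forwarded;
        # "heartbeat" events only keep the wait alive.
    if texts:
        return "text", "\n".join(texts)
    # Some edits stream no text: verify the result via session state instead.
    return "state", poll_state()
```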

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says | You do
"click [button]" / "点击" | Execute via API
"open [panel]" / "打开" | Query session state
"drag/drop" / "拖拽" | Send edit via SSE
"preview in timeline" | Show track summary
"Export button" / "导出" | Execute export workflow
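The translation table can be sketched as an ordered phrase lookup in Python. The ordering is a design choice of this sketch, not something the table specifies: more specific phrases are checked first, so "click the Export button" maps to the export workflow rather than a generic click. The action names are hypothetical labels, and substring matching is deliberately naive (for example, "drop" would also match "dropdown").

```python
# Checked in order: the specific "Export button" wins over a generic "click".
GUI_RULES = [
    (("export button", "导出"), "run_export_workflow"),
    (("preview in timeline",), "show_track_summary"),
    (("drag", "drop", "拖拽"), "send_edit_via_sse"),
    (("open", "打开"), "query_session_state"),
    (("click", "点击"), "execute_via_api"),
]


def translate_gui_hint(backend_text: str):
    """Map GUI-oriented backend wording to the API action to take, or None."""
    text = backend_text.lower()
    for phrases, action in GUI_RULES:
        if any(phrase in text for phrase in phrases):
            return action
    return None
```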

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Example track summary:
Timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
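A Python sketch that turns the compact draft into a track summary using only the documented keys (t, tt, sg, d). It assumes the nesting `{"t": [{"tt": ..., "sg": [{"d": ..., "m": {...}}]}]}`. Segment names and start times are not documented, so the sketch reports segment counts and summed durations rather than the richer labels in the example.

```python
# tt codes from the field mapping; other codes are shown as "type N".
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> str:
    """Render a draft as a numbered track summary (durations in seconds)."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for i, track in enumerate(tracks, 1):
        kind = TRACK_TYPES.get(track.get("tt"), f"type {track.get('tt')}")
        segments = track.get("sg", [])
        total_s = sum(seg.get("d", 0) for seg in segments) / 1000  # d is in ms
        lines.append(f"  {i}. {kind}: {len(segments)} segment(s), {total_s:g}s")
    return "\n".join(lines)
```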

Error Handling

Code | Meaning | Action
0 | Success | Continue
1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days)
1002 | Session not found | New session §3.0
2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai"
4001 | Unsupported file | Show supported formats
4002 | File too large | Suggest compress/trim
400 | Missing X-Client-Id | Generate Client-Id and retry (see §1)
402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export."
429 | Rate limit (1 token/client/7 days) | Retry in 30s once
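The error table can be condensed into a lookup plus a retry-once wrapper for 429, sketched here in Python. `NemoApiError` and the action names are hypothetical. How a code reaches the client (JSON body field or HTTP status) is not specified, so the sketch assumes the caller has already extracted it.

```python
import time


class NemoApiError(Exception):
    """Hypothetical exception carrying a NemoVideo error code."""

    def __init__(self, code: int):
        super().__init__(f"NemoVideo error {code}")
        self.code = code


def action_for(code: int, anonymous: bool) -> str:
    """Condensed version of the error table."""
    if code == 2001:  # no credits: depends on whether the user is registered
        return "show_registration_url" if anonymous else "suggest_top_up"
    return {
        0: "continue",
        1001: "reauth_anonymous_token",
        1002: "create_new_session",
        4001: "show_supported_formats",
        4002: "suggest_compress_or_trim",
        400: "generate_client_id_and_retry",
        402: "suggest_registration_for_export",
        429: "retry_once_after_30s",
    }.get(code, "report_unknown_error")


def call_with_rate_limit_retry(call, sleep=time.sleep):
    """Run call(); on a 429, wait 30 s and retry exactly once."""
    try:
        return call()
    except NemoApiError as err:
        if err.code != 429:
            raise
        sleep(30)
        return call()
```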

FAQ

What video formats does Gemini AI Video Generator support? You can upload mp4, mov, avi, webm, and mkv files. MP4 is the most reliably processed format across all generation task types.

Can it generate entirely new video footage? The skill generates text-based outputs — scripts, descriptions, summaries, captions, and structured content — derived from your uploaded video. It does not render new video frames or visual animations.

Does it understand spoken audio in the video? Yes. Gemini processes both the visual content and any audible speech in your video, which means it can cross-reference what's being said with what's on screen for more accurate generation.

What if my video has no dialogue? No problem — Gemini AI Video Generator analyzes visual cues, movement, scene transitions, and on-screen text independently. Silent product demos, tutorials, and b-roll footage all work well with descriptive or script-generation prompts.

Tips and Tricks

Be specific in your prompt — Gemini AI Video Generator responds much better to 'generate a 3-sentence product description focusing on the unboxing moment at the start' than to 'summarize this video.' The more context you give about your intended audience or output format, the sharper the results.

If you're generating captions or subtitles, mention the tone you want (formal, casual, punchy) and any terminology specific to your industry. Gemini will adapt its language accordingly rather than defaulting to generic phrasing.

For content repurposing workflows, try uploading the same video with different prompts — one for a short-form social caption, one for a blog summary, one for an email teaser. You'll get distinct outputs tailored to each format without re-editing the source file.

When generating scripts or voiceovers, ask Gemini to match the pacing of the original footage. This produces scripts that actually fit the visual rhythm rather than running long or cutting short.

Performance Notes

Gemini AI Video Generator performs best on videos under 10 minutes in length, where the model can maintain full visual context throughout the clip. Longer videos may be processed in segments, which can occasionally affect continuity in generated outputs like scripts or summaries.

File format matters less than resolution and clarity — mp4 and webm files with clear audio tracks tend to produce the most accurate scene analysis and generation results. Heavily compressed or low-light footage may result in less precise visual descriptions.

Generation time scales with video length and task complexity. A simple scene description on a 2-minute clip returns quickly, while generating a full narration script for a 15-minute video will take noticeably longer. Plan accordingly if you're working with batch content.
