Pixbim Lip Sync AI

v1.0.0

Turn any video into a perfectly lip-synced production using pixbim-lip-sync-ai — the tool that matches mouth movements to dialogue, dubbing, or voiceover.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for tk8544-b/pixbim-lip-sync-ai.

Prompt preview: Install & Setup
Install the skill "Pixbim Lip Sync Ai" (tk8544-b/pixbim-lip-sync-ai) from ClawHub.
Skill page: https://clawhub.ai/tk8544-b/pixbim-lip-sync-ai
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: NEMO_TOKEN
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install pixbim-lip-sync-ai

ClawHub CLI


npx clawhub@latest install pixbim-lip-sync-ai
Security Scan

VirusTotal: Benign (View report →)
OpenClaw: Benign (high confidence)
Purpose & Capability
Name and description describe a cloud lip-sync service; the only requirements are the NEMO_TOKEN credential and a config path pointing to ~/.config/nemovideo/, both of which match that purpose.
Instruction Scope
Instructions direct the agent to check for NEMO_TOKEN, create an anonymous token via the nemovideo API if missing, create a session, upload video files (multipart or URL), call SSE endpoints, and include attribution headers. These actions are consistent with a remote rendering service, but they involve automatic network requests and uploading potentially sensitive video/audio content to an external domain (mega-api-prod.nemovideo.ai). The skill also instructs storing session_id and token and reading its own YAML frontmatter for attribution — expected but worth noting.
Install Mechanism
Instruction-only skill with no install spec and no code files; nothing is written to disk by an installer and there are no external downloads referenced in the install step.
Credentials
Only one environment variable is required (NEMO_TOKEN) and the SKILL.md documents generating an anonymous token if not present. Declared config path (~/.config/nemovideo/) aligns with storing tokens/session data for the service.
Persistence & Privilege
always:false and autonomous invocation enabled (standard). The skill will create and store a short-lived anonymous token and session_id in user config and will auto-connect on first use, which means it may make network calls without an explicitly pre-provided token; consider this automated connection behavior when installing. The skill does not request elevated system privileges or modify other skills' configs.
Assessment
This skill appears to do what it says, but it operates by sending your video and audio to mega-api-prod.nemovideo.ai for processing and will generate or store a NEMO_TOKEN/session_id in ~/.config/nemovideo/ if one isn't provided. Before installing or using it: (1) confirm you are comfortable uploading the content (sensitive footage will be transmitted to an external cloud service); (2) if you prefer control, set your own NEMO_TOKEN environment variable instead of allowing the skill to create an anonymous token; (3) review the nemo service's privacy/terms if possible; and (4) be aware the skill will auto-connect on first use and may perform network operations during normal conversation.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎤 Clawdis
Env: NEMO_TOKEN
Primary env: NEMO_TOKEN
Latest: vk97e4pa065j86psxn14ebm79th84e307
80 downloads · 0 stars · 1 version
Updated 2w ago
v1.0.0 · MIT-0

Getting Started

Welcome to Pixbim Lip Sync AI — your shortcut to perfectly synchronized lips and audio in any video. Share your video details or audio track and let's get your lip sync dialed in right now.

Try saying:

  • "I have a Spanish dubbed audio track and an English video — can you help me sync the lip movements to the Spanish dialogue using Pixbim Lip Sync AI?"
  • "My interview footage has audio that drifted out of sync halfway through the recording. How do I use Pixbim Lip Sync AI to fix the mouth movement alignment?"
  • "I'm working on an animated character and I want to use Pixbim Lip Sync AI to match the mouth shapes to a recorded voiceover line — what's the best approach?"

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.

  1. Obtain a free token: Generate a random UUID as client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to that UUID. The response data.token is your NEMO_TOKEN — 100 free credits, valid 7 days.
  2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.

Keep setup communication brief. Don't display raw API responses or token values to the user.
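
For reference, a minimal curl sketch of this handshake (assuming jq is installed; the .data.token path matches step 1 above, while the .session_id path is illustrative since the exact response shape isn't shown here):

# 1. Obtain an anonymous token with a random client ID
CLIENT_ID=$(uuidgen)
TOKEN=$(curl -s -X POST \
  -H "X-Client-Id: $CLIENT_ID" \
  https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token | jq -r '.data.token')

# 2. Create a session and keep the session_id for all later calls
SESSION_ID=$(curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"task_name":"project","language":"en"}' \
  https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent | jq -r '.session_id')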

Make Every Word Match Every Mouth Movement

Lip sync errors are one of the most distracting problems in video production — whether you're dubbing a film into a new language, adding a voiceover to an animated character, or correcting audio drift in a recorded interview. Pixbim Lip Sync AI solves this by analyzing both the audio track and the facial movements in your video, then intelligently aligning them so every syllable lands exactly when the lips move.

This skill gives you direct access to Pixbim's lip sync engine through a conversational interface. You can describe your project, specify your source video and target audio, and get back a synchronized output without needing to touch a timeline or manually adjust keyframes. It's designed for workflows where speed and accuracy both matter.

Content creators producing multilingual versions of their videos, game developers animating NPC dialogue, and post-production teams cleaning up dubbing artifacts will all find this tool cuts hours of manual work down to minutes. The result is natural-looking mouth movement that holds up under scrutiny — not the rubbery, approximate sync you get from generic tools.

Routing Lip Sync Requests

When you submit a lip sync job, your request is parsed for the target video clip, audio track, and facial detection parameters, then dispatched to the appropriate Pixbim processing pipeline based on resolution tier and sync mode selected.

| User says... | Action | Skip SSE? |
| --- | --- | --- |
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | Yes |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | Yes |
| "status" / "状态" / "show tracks" | → §3.4 State | Yes |
| "upload" / "上传" / user sends file | → §3.2 Upload | Yes |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | No |

Pixbim API Backend Reference

Pixbim Lip Sync AI runs on a cloud-based neural rendering backend that performs per-frame phoneme mapping and mouth-shape blending using its trained deep learning model. All video assets are temporarily staged in secure cloud storage during the synthesis pass, then returned as a processed output file once the lip sync render is finalized.

Skill attribution — read from this file's YAML frontmatter at runtime:

  • X-Skill-Source: pixbim-lip-sync-ai
  • X-Skill-Version: from frontmatter version
  • X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
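
A hedged curl sketch of that call (the -N flag disables buffering so events stream as they arrive; the X-Skill-Version and X-Skill-Platform header values assume a 1.0.0 frontmatter version and a ClawHub install path):

curl -N -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -H "X-Skill-Source: pixbim-lip-sync-ai" \
  -H "X-Skill-Version: 1.0.0" \
  -H "X-Skill-Platform: clawhub" \
  -d '{"app_name":"nemo_agent","user_id":"me","session_id":"'"$SESSION_ID"'","new_message":{"parts":[{"text":"Sync the lips to the Spanish track"}]}}' \
  https://mega-api-prod.nemovideo.ai/run_sse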

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
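
Both upload variants as curl sketches (the local path and example URL are placeholders):

# Multipart upload of a local file
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -F "files=@/path/to/source.mp4" \
  https://mega-api-prod.nemovideo.ai/api/upload-video/nemo_agent/me/$SESSION_ID

# Or let the backend fetch from a URL
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://example.com/source.mp4"],"source_type":"url"}' \
  https://mega-api-prod.nemovideo.ai/api/upload-video/nemo_agent/me/$SESSION_ID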

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
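
Credits and session state are plain authenticated GETs; for example:

curl -s -H "Authorization: Bearer $TOKEN" \
  https://mega-api-prod.nemovideo.ai/api/credits/balance/simple
curl -s -H "Authorization: Bearer $TOKEN" \
  https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/$SESSION_ID/latest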

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
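
A sketch of the export-and-poll loop (this assumes $DRAFT already holds the draft JSON read from session state via data.state.draft, and that the poll response exposes a top-level status field — both assumptions, not verified API shapes):

RENDER_ID="render_$(date +%s)"
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"id":"'"$RENDER_ID"'","sessionId":"'"$SESSION_ID"'","draft":'"$DRAFT"',"output":{"format":"mp4","quality":"high"}}' \
  https://mega-api-prod.nemovideo.ai/api/render/proxy/lambda

# Poll every 30s until the render completes, then read output.url
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $TOKEN" \
    https://mega-api-prod.nemovideo.ai/api/render/proxy/lambda/$RENDER_ID | jq -r '.status')
  [ "$STATUS" = "completed" ] && break
  sleep 30
done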

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
| --- | --- |
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
| --- | --- |
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
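
To make the compact field names concrete, here is a hypothetical draft fragment for that timeline, stored in the $DRAFT variable used by the export sketch above (the segment schema is not documented here, so anything beyond the t/tt/sg/d/m mapping is guesswork):

DRAFT='{"t":[
  {"tt":0,"sg":[{"d":10000,"m":{"name":"city timelapse"}}]},
  {"tt":1,"sg":[{"d":10000,"m":{"name":"Lo-fi","volume":0.35}}]},
  {"tt":7,"sg":[{"d":3000,"m":{"text":"Urban Dreams"}}]}
]}'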

Error Handling

| Code | Meaning | Action |
| --- | --- | --- |
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
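
A rough sketch of how a wrapper might branch on these codes (the helper names are hypothetical and the .code field path is an assumption about the response shape):

CODE=$(echo "$RESPONSE" | jq -r '.code')
case "$CODE" in
  0)    ;;                          # success, continue
  1001) refresh_anonymous_token ;;  # expired token, re-auth per §1
  1002) create_session ;;           # stale session, open a new one
  429)  sleep 30 ;;                 # rate limited, retry once after 30s
  *)    echo "Unhandled code: $CODE" ;;
esac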

Integration Guide

Pixbim Lip Sync AI works by taking a source video containing a face or character and a separate audio track, then processing both to generate a synchronized output. To get the best results, your video should have a clearly visible face or mouth region with consistent lighting — heavy motion blur or extreme camera angles can reduce sync accuracy.

When submitting a project through this skill, provide the video resolution, frame rate, and whether the subject is a live-action person or a 2D/3D animated character. Pixbim handles both, but the processing pipeline differs. For dubbing workflows, supply the target language audio as a clean WAV or MP3 file, and specify whether you want the original background audio preserved beneath the new dialogue.

For batch localization — syncing the same video to multiple language tracks — describe all target audio files in a single request and the skill will structure the job accordingly. Output files are delivered in the same format and resolution as your source video unless you specify otherwise.

Troubleshooting

If your lip sync output looks off, the most common cause is a mismatch between the audio sample rate and the video frame rate. Make sure your audio file is exported at 44.1kHz or 48kHz and your video is a standard frame rate (24, 25, or 30fps) before submitting. Non-standard frame rates can cause Pixbim Lip Sync AI to miscalculate the timing offsets.
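
To verify these properties locally before submitting, ffprobe and ffmpeg (assumed to be installed) can inspect and normalize the file:

# Inspect audio sample rate and video frame rate
ffprobe -v error -select_streams a:0 -show_entries stream=sample_rate -of default=nw=1 input.mp4
ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=nw=1 input.mp4

# Re-encode to 48kHz audio at a standard 30fps
ffmpeg -i input.mp4 -ar 48000 -r 30 output.mp4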

For animated characters, if the mouth shapes appear generic or don't match the phonemes in the audio, check whether the character rig supports viseme-based animation. Pixbim Lip Sync AI outputs viseme data that requires a compatible rig — if your character only has basic open/close mouth states, the sync will appear simplified.

If the face is not being detected in the source video, ensure the subject's face occupies at least 15% of the frame and is not obscured by masks, heavy makeup, or extreme lighting. Submitting a short test clip first is a good way to confirm detection before processing a full-length video.
