Install
openclaw skills install wavespeed-infinitetalk-avatarGenerate talking head videos from a portrait image and audio using WaveSpeed AI's InfiniteTalk model. Produces lip-synced video up to 10 minutes long at 480p...
openclaw skills install wavespeed-infinitetalk-avatarGenerate talking head videos from a portrait image and audio using WaveSpeed AI's InfiniteTalk model. Produces lip-synced video up to 10 minutes long with natural facial animations.
export WAVESPEED_API_KEY="your-api-key"
Get your API key at wavespeed.ai/accesskey.
import wavespeed from 'wavespeed';
// Upload local image and audio files
const imageUrl = await wavespeed.upload("/path/to/portrait.png");
const audioUrl = await wavespeed.upload("/path/to/speech.mp3");
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl
}
))["outputs"][0];
You can also pass existing URLs directly:
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: "https://example.com/portrait.jpg",
audio: "https://example.com/speech.mp3"
}
))["outputs"][0];
Model ID: wavespeed-ai/infinitetalk
Animate a portrait image with lip-synced audio to produce a talking head video.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
image | string | Yes | -- | URL of the portrait image to animate |
audio | string | Yes | -- | URL of the audio to drive the animation |
mask_image | string | No | -- | URL of a mask image to specify which person to animate. Warning: The mask should only cover the regions to animate — do not upload the full image as mask_image, or the result may render as fully black. |
prompt | string | No | -- | Text prompt for additional guidance. Keep it short; English recommended to avoid noisy results. |
resolution | string | No | 480p | Output resolution. One of: 480p, 720p |
seed | integer | No | -1 | Random seed (-1 for random). Range: -1 to 2147483647 |
import wavespeed from 'wavespeed';
const imageUrl = await wavespeed.upload("/path/to/portrait.png");
const audioUrl = await wavespeed.upload("/path/to/speech.mp3");
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl,
resolution: "720p",
seed: 42
}
))["outputs"][0];
When multiple people are in the image, use a mask to specify which face to animate:
const imageUrl = await wavespeed.upload("/path/to/group-photo.png");
const audioUrl = await wavespeed.upload("/path/to/speech.mp3");
const maskUrl = await wavespeed.upload("/path/to/mask.png");
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl,
mask_image: maskUrl,
resolution: "720p"
}
))["outputs"][0];
Important: The mask should only highlight the face region to animate. Using the full image as a mask will produce a fully black output.
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl,
prompt: "natural head movements, subtle expressions"
}
))["outputs"][0];
import { Client } from 'wavespeed';
const client = new Client("your-api-key", {
maxRetries: 2,
maxConnectionRetries: 5,
retryInterval: 1.0,
});
const imageUrl = await client.upload("/path/to/portrait.png");
const audioUrl = await client.upload("/path/to/speech.mp3");
const output_url = (await client.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl,
resolution: "720p"
}
))["outputs"][0];
import { Client, WavespeedTimeoutException, WavespeedPredictionException } from 'wavespeed';
const client = new Client();
const result = await client.runNoThrow(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl
}
);
if (result.outputs) {
console.log("Video URL:", result.outputs[0]);
console.log("Task ID:", result.detail.taskId);
} else {
console.log("Failed:", result.detail.error.message);
if (result.detail.error instanceof WavespeedTimeoutException) {
console.log("Request timed out - try increasing timeout");
} else if (result.detail.error instanceof WavespeedPredictionException) {
console.log("Prediction failed");
}
}
| Resolution | Cost per 5 seconds | Rate per second | Max length |
|---|---|---|---|
| 480p | $0.15 | $0.03/s | 10 minutes |
| 720p | $0.30 | $0.06/s | 10 minutes |
Minimum charge is 5 seconds. Video length is determined by the audio duration (up to 10 minutes).
mask_image to target the correct faceWAVESPEED_API_KEY securely. Do not hardcode it in source files or commit it to version control. Use environment variables or secret management systems.