---
name: text-to-video
description: Generate a talking video from a photo and written script using VEED Fabric 1.0 text endpoint with AI voice
metadata:
  tags: text-to-speech, tts, voice, script, text, video
---

# Image + Text to Video

Turns a photo of a face into a talking video where an AI-generated voice speaks the provided script. The face lip-syncs to the generated speech.

## Required inputs

1. **Image** — a photo with a clearly visible face
2. **Text** — the script to be spoken (1–2000 characters)

## Supported image formats

JPG, JPEG, PNG, WebP, GIF, AVIF

**MUST** validate the image file extension before uploading.

## Text limits

The Fabric `/text` endpoint has a hard limit of **2000 characters**.

**MUST** check the character count before submitting. If the script exceeds 2000 characters, reject it immediately with:

> Your script is {length} characters ({length - 2000} over the 2000-character limit, roughly 30–45 seconds of speech). Please shorten it or split into multiple videos.

**MUST NOT** truncate or silently modify the user's script. **MUST NOT** attempt to send text over 2000 characters to the API.

## Voice presets

Ask the user to pick a voice style. Offer these presets plus custom free-text:

| Preset | `voice_description` value |
|---|---|
| Professional | `Clear, confident, professional business tone` |
| Casual | `Warm, friendly, conversational tone` |
| Energetic | `Upbeat, enthusiastic, high-energy tone` |
| Custom | User's own description passed directly |

If the user doesn't specify, default to **Professional**.

## Options

**Resolution** (default: `480p`):
- `480p` — $0.08/sec (standard), $0.10/sec (fast)
- `720p` — $0.15/sec (standard), $0.20/sec (fast)

**Speed** (default: standard):
- Standard — uses `https://queue.fal.run/veed/fabric-1.0/text`
- Fast — uses `https://queue.fal.run/veed/fabric-1.0/text/fast`

## API request

The `image_url` **MUST** be a publicly accessible URL. If the user provides a local file path, upload it first — see [./file-upload.md](./file-upload.md).

```bash
RESPONSE=$(curl -s -X POST "https://queue.fal.run/veed/fabric-1.0/text" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/headshot.jpg",
    "text": "Hi, I'\''m the founder of Acme. Let me tell you about our new product.",
    "resolution": "480p",
    "voice_description": "Clear, confident, professional business tone"
  }')
```

For fast mode, change the endpoint to `https://queue.fal.run/veed/fabric-1.0/text/fast`.

The response contains `request_id`, `status_url`, and `response_url`. Proceed to [./queue.md](./queue.md) for polling and retrieval.

## Full workflow

1. Gather image and script text from the user
2. Validate image format and text length (1–2000 chars)
3. Upload local image if needed ([file-upload.md](./file-upload.md))
4. Ask about voice style (show presets)
5. Ask about resolution and speed (show pricing)
6. Submit to queue (this file)
7. Poll for status ([queue.md](./queue.md))
8. Download result ([output.md](./output.md))