# Voice.ai API Reference

Production API endpoints used by this pipeline. Based on the [Voice.ai TTS SDK](https://github.com/gizmoGremlin/openclaw-skill-voice-ai-voices).

## Authentication

Send a bearer token in the `Authorization` header:

```
Authorization: Bearer <YOUR_API_KEY>
```

Get your key at [voice.ai/dashboard](https://voice.ai/dashboard).

## Base URL

```
https://dev.voice.ai
```

Override it with the `VOICEAI_API_BASE` environment variable.

API path prefix: `/api/v1`

---

## Endpoints

### `GET /api/v1/tts/voices`: List available voices

**Query Parameters:**

| Param        | Type    | Default | Description            |
|--------------|---------|---------|------------------------|
| `limit`      | integer | 10      | Max voices to return   |
| `offset`     | integer | 0       | Pagination offset      |
| `visibility` | string  | none    | `PUBLIC` or `PRIVATE`  |

**Response:**

```json
{
  "voices": [
    {
      "voice_id": "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
      "name": "Ellie",
      "language": "en",
      "visibility": "PUBLIC",
      "status": "AVAILABLE"
    }
  ]
}
```

### `POST /api/v1/tts/speech`: Generate speech

**Request Body:**

```json
{
  "text": "Hello, world!",
  "voice_id": "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
  "audio_format": "wav",
  "temperature": 1.0,
  "top_p": 0.8,
  "model": "voiceai-tts-v1-latest",
  "language": "en"
}
```

| Field          | Type   | Required | Default                   | Description                                                      |
|----------------|--------|----------|---------------------------|------------------------------------------------------------------|
| `text`         | string | ✅       | n/a                       | Text to synthesize (max 5000 characters)                         |
| `voice_id`     | string | No       | built-in default          | Voice UUID; omit to use the default voice                        |
| `audio_format` | string | No       | `mp3`                     | `mp3`, `wav`, `pcm`, etc. (see Audio Formats)                    |
| `temperature`  | number | No       | 1.0                       | Output variation (0.0–2.0)                                       |
| `top_p`        | number | No       | 0.8                       | Nucleus sampling (0.0–1.0)                                       |
| `model`        | string | No       | auto-selected by language | `voiceai-tts-v1-latest` or `voiceai-tts-multilingual-v1-latest`  |
| `language`     | string | No       | `en`                      | ISO 639-1 code                                                   |

**Response:** Binary audio data in the requested format.
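A minimal sketch, using only the Python standard library, of how a client might assemble the two requests above. The helper names (`list_voices_request`, `speech_request`) are hypothetical and not part of the Voice.ai SDK. The limits (5000 characters, temperature 0.0–2.0, top_p 0.0–1.0) come from the table; validating them client-side is an assumption. The requests are built here but not sent.

```python
import json
import os
import urllib.parse
import urllib.request

API_PREFIX = "/api/v1"
DEFAULT_BASE = "https://dev.voice.ai"
MAX_TEXT_CHARS = 5000  # documented limit for `text`


def base_url() -> str:
    # VOICEAI_API_BASE overrides the default host when set.
    return os.environ.get("VOICEAI_API_BASE", DEFAULT_BASE).rstrip("/")


def auth_headers(api_key: str) -> dict:
    return {"Authorization": f"Bearer {api_key}"}


def list_voices_request(api_key, limit=10, offset=0, visibility=None):
    """Build (but do not send) GET /api/v1/tts/voices."""
    params = {"limit": limit, "offset": offset}
    if visibility is not None:
        if visibility not in ("PUBLIC", "PRIVATE"):
            raise ValueError("visibility must be PUBLIC or PRIVATE")
        params["visibility"] = visibility
    url = f"{base_url()}{API_PREFIX}/tts/voices?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers=auth_headers(api_key), method="GET")


def speech_request(api_key, text, voice_id=None, audio_format="mp3",
                   temperature=1.0, top_p=0.8, model=None, language="en",
                   stream=False):
    """Build (but do not send) POST /api/v1/tts/speech or its /stream variant."""
    # Client-side checks mirror the documented ranges (assumed, not SDK behavior).
    if not 0 < len(text) <= MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    body = {"text": text, "audio_format": audio_format,
            "temperature": temperature, "top_p": top_p, "language": language}
    # Optional fields are left out so the server applies its own defaults.
    if voice_id is not None:
        body["voice_id"] = voice_id
    if model is not None:
        body["model"] = model
    path = "/tts/speech/stream" if stream else "/tts/speech"
    headers = {**auth_headers(api_key), "Content-Type": "application/json"}
    return urllib.request.Request(
        f"{base_url()}{API_PREFIX}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending is left to the caller, for example `with urllib.request.urlopen(req) as resp: audio = resp.read()`; for `/tts/speech` those bytes are the audio file itself.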
### `POST /api/v1/tts/speech/stream`: Streaming speech

Takes the same body as `/tts/speech`. Returns audio with chunked transfer encoding for low-latency playback.

---

## Audio Formats

| Format           | Description              |
|------------------|--------------------------|
| `mp3`            | MP3, 32 kHz (default)    |
| `wav`            | WAV, 32 kHz              |
| `pcm`            | Raw 16-bit PCM           |
| `mp3_44100_128`  | MP3, 44.1 kHz, 128 kbps  |
| `mp3_44100_192`  | MP3, 44.1 kHz, 192 kbps  |
| `wav_22050`      | WAV, 22.05 kHz           |
| `wav_24000`      | WAV, 24 kHz              |
| `opus_48000_128` | Opus, 48 kHz, 128 kbps   |

## Models

| Model ID                             | Languages    |
|--------------------------------------|--------------|
| `voiceai-tts-v1-latest`              | English      |
| `voiceai-tts-multilingual-v1-latest` | 11 languages |

The multilingual model supports: en, es, fr, de, it, pt, pl, ru, nl, sv, ca.

## Popular Voices

| Alias     | Voice ID (UUID)                        | Gender | Style                     |
|-----------|----------------------------------------|--------|---------------------------|
| `ellie`   | `d1bf0f33-8e0e-4fbf-acf8-45c3c6262513` | F      | Youthful, vibrant vlogger |
| `oliver`  | `f9e6a5eb-a7fd-4525-9e92-75125249c933` | M      | Friendly British          |
| `lilith`  | `4388040c-8812-42f4-a264-f457a6b2b5b9` | F      | Soft, feminine            |
| `smooth`  | `dbb271df-db25-4225-abb0-5200ba1426bc` | M      | Deep, smooth narrator     |
| `corpse`  | `72d2a864-b236-402e-a166-a838ccc2c273` | M      | Deep, distinctive         |
| `skadi`   | `559d3b72-3e79-4f11-9b62-9ec702a6c057` | F      | Anime character           |
| `zhongli` | `ed751d4d-e633-4bb0-8f5e-b5c8ddb04402` | M      | Deep, authoritative       |
| `flora`   | `a931a6af-fb01-42f0-a8c0-bd14bc302bb1` | F      | Cheerful, high pitch      |
| `chief`   | `bd35e4e6-6283-46b9-86b6-7cfa3dd409b9` | M      | Heroic, commanding        |

The CLI accepts both aliases (`--voice ellie`) and full UUIDs (`--voice d1bf0f33-...`).
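The alias lookup and the language-based model choice could be sketched as follows. The alias table, language list, and format IDs are copied from the tables above. The rule in `pick_model` (English-only model for `en`, multilingual model for the other ten languages) is an assumption about what "auto-selected by language" means, not confirmed server behavior. `resolve_voice`, `pick_model`, and `check_format` are hypothetical helper names.

```python
import re

# Alias -> UUID pairs copied from the Popular Voices table.
VOICE_ALIASES = {
    "ellie": "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
    "oliver": "f9e6a5eb-a7fd-4525-9e92-75125249c933",
    "lilith": "4388040c-8812-42f4-a264-f457a6b2b5b9",
    "smooth": "dbb271df-db25-4225-abb0-5200ba1426bc",
    "corpse": "72d2a864-b236-402e-a166-a838ccc2c273",
    "skadi": "559d3b72-3e79-4f11-9b62-9ec702a6c057",
    "zhongli": "ed751d4d-e633-4bb0-8f5e-b5c8ddb04402",
    "flora": "a931a6af-fb01-42f0-a8c0-bd14bc302bb1",
    "chief": "bd35e4e6-6283-46b9-86b6-7cfa3dd409b9",
}

# Languages listed for voiceai-tts-multilingual-v1-latest.
MULTILINGUAL_LANGS = {"en", "es", "fr", "de", "it", "pt", "pl", "ru", "nl", "sv", "ca"}

# Format IDs from the Audio Formats table (the API may accept others).
AUDIO_FORMATS = {"mp3", "wav", "pcm", "mp3_44100_128", "mp3_44100_192",
                 "wav_22050", "wav_24000", "opus_48000_128"}

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)


def resolve_voice(value: str) -> str:
    """Map a --voice value (alias or full UUID) to a voice UUID."""
    value = value.strip()
    if value.lower() in VOICE_ALIASES:
        return VOICE_ALIASES[value.lower()]
    if UUID_RE.match(value):
        return value
    raise ValueError(f"unknown voice alias or malformed UUID: {value!r}")


def pick_model(language: str) -> str:
    """Assumed rule: English-only model for 'en', multilingual model otherwise."""
    lang = language.lower()
    if lang == "en":
        return "voiceai-tts-v1-latest"
    if lang in MULTILINGUAL_LANGS:
        return "voiceai-tts-multilingual-v1-latest"
    raise ValueError(f"no documented model supports language {language!r}")


def check_format(audio_format: str) -> str:
    """Reject format IDs that are not in the documented table."""
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"undocumented audio format: {audio_format!r}")
    return audio_format
```

Falling back to the regex check lets any voice UUID from `GET /tts/voices` pass through, not just the nine aliased ones.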
---

## Error Codes

| Code | Meaning          | Action                                                       |
|------|------------------|--------------------------------------------------------------|
| 401  | Invalid API key  | Check `VOICE_AI_API_KEY`                                     |
| 402  | Out of credits   | Top up at [voice.ai/dashboard](https://voice.ai/dashboard)   |
| 422  | Validation error | Check text length and `voice_id`                             |
| 429  | Rate limited     | Wait and retry                                               |

---

## Mock Mode

When `--mock` is passed, the pipeline runs end to end without any network calls:

- Voice listing returns the 9 popular voices above
- TTS returns generated WAV files containing an audible test tone
- No API key is required
- All output files (review.html, chapters, captions, etc.) are produced just as in a live run
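Here is a sketch of how a client might react to the error codes above. It retries only on 429 with exponential backoff and fails fast on 401/402/422, which need user action. It is paired with an offline sender in the spirit of `--mock`. `call_with_retry`, `mock_send`, `mock_tone_wav`, the retry count, and the 1 s base delay are all hypothetical. The 440 Hz tone at 32 kHz is an assumed stand-in, since the actual mock tone parameters are not documented here.

```python
import io
import math
import struct
import time
import wave

# Actions from the Error Codes table.
ERROR_ACTIONS = {
    401: "Invalid API key: check VOICE_AI_API_KEY",
    402: "Out of credits: top up at voice.ai/dashboard",
    422: "Validation error: check text length and voice_id",
    429: "Rate limited: wait and retry",
}


class VoiceAIError(Exception):
    """Raised for a non-2xx response that should not be retried."""

    def __init__(self, status: int) -> None:
        self.status = status
        super().__init__(f"HTTP {status}: {ERROR_ACTIONS.get(status, 'unexpected error')}")


def call_with_retry(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body); retry only 429, doubling the delay each time."""
    attempt = 0
    while True:
        status, body = send()
        if 200 <= status < 300:
            return body
        if status == 429 and attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
            attempt += 1
            continue
        # 401/402/422 (or retries exhausted): retrying would not help.
        raise VoiceAIError(status)


def mock_tone_wav(seconds=1.0, freq_hz=440.0, sample_rate=32000):
    """Hypothetical mock TTS output: a 16-bit mono sine tone encoded as WAV bytes."""
    n_frames = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buf.getvalue()


def mock_send():
    """Offline stand-in for the HTTP call: always succeeds with a test tone."""
    return 200, mock_tone_wav()
```

In mock mode the same retry wrapper can be kept in place, for example `audio = call_with_retry(mock_send)`, so the code path matches a live run.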