Text to Speech and Voice Cloning by Verbatik AI

Turn your AI assistant into a TTS and voice cloning powerhouse using the Verbatik API. Use when generating speech from text, cloning voices, managing cloned...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Verbatik Technologies Limited (@verbatik)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (TTS and voice cloning) matches the runtime instructions: every API endpoint and header documented in SKILL.md is for api.verbatik.com. Required env var (VERBATIK_API_KEY) is appropriate for this purpose.
Instruction Scope
SKILL.md is an API usage guide: it shows GET/POST/DELETE calls to Verbatik endpoints, headers, parameters, pricing, and error codes. It does not instruct reading unrelated local files, other environment variables, or sending data to third-party endpoints outside api.verbatik.com (MCP/OAuth endpoints are on the same domain).
Install Mechanism
No install spec and no code files — this is instruction-only, so nothing is downloaded or written to disk by the skill itself.
Credentials
Only a single credential (VERBATIK_API_KEY) is required and is the primary credential declared. That aligns with the documented Bearer token auth. There are no unrelated secrets requested.
Persistence & Privilege
The skill is not marked always:true and uses default autonomous invocation settings. It does not request persistent system configuration or modify other skills.
Assessment
This skill appears coherent and purely documents calls to api.verbatik.com, but before installing:

  1. Verify the vendor and domain (https://api.verbatik.com) and read their privacy policy and terms (voice cloning has legal and ethical implications).
  2. Keep VERBATIK_API_KEY secret, store it in a safe secret store, and use a least-privilege key if available.
  3. Watch billing: the skill references prepaid balances and auto top-up, so test with a low-cost/dev account to avoid unexpected charges.
  4. Confirm you have consent and rights to clone any voices you upload.
  5. If you enable any OAuth/MCP one-click integration, review the scopes requested by that flow.
  6. Revoke and regenerate the API key if you suspect misuse.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.1


Runtime requirements

Env: VERBATIK_API_KEY
Primary env: VERBATIK_API_KEY

SKILL.md

Text-to-Speech & Voice Cloning Assistant (via Verbatik)

Autonomously generate speech, clone voices, and manage audio via the Verbatik API.

Setup

  1. Create a Verbatik account at api.verbatik.com
  2. Top up your prepaid balance (Settings → Billing)
  3. Generate an API key (Settings → API Keys)
  4. Store your API key:
    VERBATIK_API_KEY=vbt_your_api_key_here
    

Auth

All requests use a Bearer token, except where an endpoint is marked as not requiring auth:

Authorization: Bearer <VERBATIK_API_KEY>

Base URL: https://api.verbatik.com
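
A minimal sketch of building the auth header in Python, assuming the key is stored in the VERBATIK_API_KEY environment variable as described in Setup. The helper name auth_headers is hypothetical and not part of the Verbatik API.

```python
import os

BASE_URL = "https://api.verbatik.com"  # base URL from the Auth section


def auth_headers(api_key=None):
    """Return the Authorization header for a Verbatik request.

    Falls back to the VERBATIK_API_KEY environment variable when no key is passed.
    """
    key = api_key or os.environ.get("VERBATIK_API_KEY")
    if not key:
        raise RuntimeError("VERBATIK_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Raising early when the key is missing surfaces a configuration problem locally instead of as a 401 from the API.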

MCP (Model Context Protocol)

Verbatik also exposes an MCP server for direct AI assistant integration. Endpoint:

https://api.verbatik.com/api/mcp/mcp

The server supports OAuth 2.1 (one-click connect in Claude Desktop) and API key auth via the mcp-remote bridge.

Core Workflow

1. List Available Voices

GET /api/v1/voices

Query params:

  • language: filter by language code (e.g. en-US, fr-FR)
  • gender: Male, Female, or Neutral
  • search: search by voice name

Returns an array of voices, each with id (slug), name, gender, language_code, language_name, is_neural, sample_url, and styles.
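
The query parameters above can be assembled as follows. This is a sketch that only builds the URL; the function name voices_url is hypothetical, and the request still needs the Authorization header when sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.verbatik.com"


def voices_url(language=None, gender=None, search=None):
    """Build the GET /api/v1/voices URL, skipping filters that are not set."""
    params = {"language": language, "gender": gender, "search": search}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/v1/voices"
    return f"{url}?{query}" if query else url


print(voices_url(language="en-US", gender="Female"))
# prints https://api.verbatik.com/api/v1/voices?language=en-US&gender=Female
```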

2. Text-to-Speech (Pre-trained Voices)

POST /api/v1/tts
Content-Type: text/plain
Authorization: Bearer <key>
X-Voice-ID: jenny-en-us
X-Store-Audio: true

Hello, this is a test of the Verbatik text-to-speech API.

Headers:

  • Content-Type: text/plain, or application/ssml+xml for SSML
  • X-Voice-ID: voice slug (e.g. jenny-en-us). Defaults to Jenny if omitted
  • X-Store-Audio: true to get a stored URL back, false for a binary audio stream

Max text length: 50,000 characters. Large texts are automatically chunked.

Cost: $0.002 per 1,000 characters

Response (when X-Store-Audio: true):

{
  "success": true,
  "audio_url": "https://...",
  "characters_processed": 52,
  "chunks_processed": 1,
  "response_time_ms": 1200,
  "cost_cents": 1
}

Response (when X-Store-Audio: false): Binary audio with metadata in response headers (X-Characters-Processed, X-Cost-Cents, X-Balance-Cents).
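
A sketch of preparing the TTS call with Python's standard library, under the headers and limits listed above. build_tts_request is a hypothetical helper; it builds the request without sending it, and UTF-8 encoding of the body is an assumption.

```python
import urllib.request

BASE_URL = "https://api.verbatik.com"
MAX_TTS_CHARS = 50_000  # limit stated for POST /api/v1/tts


def build_tts_request(text, api_key, voice_id="jenny-en-us", store_audio=True, ssml=False):
    """Build (but do not send) a POST /api/v1/tts request."""
    if len(text) > MAX_TTS_CHARS:
        raise ValueError(f"text exceeds {MAX_TTS_CHARS} characters")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/ssml+xml" if ssml else "text/plain",
        "X-Voice-ID": voice_id,
        "X-Store-Audio": "true" if store_audio else "false",
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/tts",
        data=text.encode("utf-8"),  # assumption: the API accepts UTF-8 bodies
        headers=headers,
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. With X-Store-Audio: true the body is the JSON shown above; otherwise it is binary audio.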

3. Clone a Voice

POST /api/v1/voice-training
Content-Type: application/json
Authorization: Bearer <key>

{
  "audio_url": "https://example.com/sample.mp3",
  "name": "My Voice",
  "noise_reduction": true,
  "volume_normalization": true,
  "preview_text": "Hello, this is a preview of my cloned voice!"
}

Requirements:

  • Audio must be at least 10 seconds of speech
  • Supported formats: .mp3, .wav (max 20MB)
  • Cost: $3.00 per clone

Response:

{
  "success": true,
  "voice_id": "uuid-here",
  "name": "My Voice",
  "fal_voice_id": "...",
  "preview_url": "https://...",
  "cost_cents": 300
}
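
The clone request body can be built like this. It is a sketch: build_clone_request is a hypothetical helper, and checking the file extension on the URL path is a client-side assumption about how the sample is hosted, not a documented API rule. Duration and file size are not checked here.

```python
import json
import urllib.request
from urllib.parse import urlparse

BASE_URL = "https://api.verbatik.com"
SUPPORTED_EXTENSIONS = (".mp3", ".wav")  # formats listed under Requirements


def build_clone_request(audio_url, name, api_key, preview_text,
                        noise_reduction=True, volume_normalization=True):
    """Build (but do not send) a POST /api/v1/voice-training request."""
    # Assumption: the sample URL path ends with the file extension.
    if not urlparse(audio_url).path.lower().endswith(SUPPORTED_EXTENSIONS):
        raise ValueError("audio_url must point to a .mp3 or .wav file")
    payload = {
        "audio_url": audio_url,
        "name": name,
        "noise_reduction": noise_reduction,
        "volume_normalization": volume_normalization,
        "preview_text": preview_text,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/voice-training",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Since each clone costs $3.00, validating inputs before sending avoids paying for a request that was bound to fail.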

4. Generate Speech with Cloned Voice

POST /api/v1/voice-cloning
Content-Type: text/plain
Authorization: Bearer <key>
X-Voice-ID: <cloned_voice_uuid>
X-Store-Audio: true

Hello from my cloned voice!

Optional headers for voice control:

  • X-Speed: 0.5 to 2.0 (default: 1)
  • X-Volume: 0 to 10 (default: 1)
  • X-Pitch: -12 to 12 (default: 0)
  • X-Emotion: happy, sad, angry, fearful, disgusted, surprised, neutral
  • X-Format: mp3, pcm, flac (default: mp3)
  • X-Language-Boost: language to enhance (e.g. English, French, Japanese)
  • X-Sample-Rate: 8000, 16000, 22050, 24000, 32000, 44100
  • X-Bitrate: 32000, 64000, 128000, 256000

Voice modification (Speech 2.8 Turbo):

  • X-Voice-Modify-Pitch: -100 to 100
  • X-Voice-Modify-Intensity: -100 to 100
  • X-Voice-Modify-Timbre: -100 to 100

Supports interjection tags in text: (laughs), (sighs), (coughs), (clears throat), (gasps), (sniffs), (groans), (yawns)

Supports pause markers: <#x#> where x = 0.01–99.99 seconds

Max text length: 5,000 characters. Cost: $0.10 per 1,000 characters
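
A sketch of validating the voice-control headers and pause markers before sending a cloned-voice request. The helper names are hypothetical, only a subset of the optional headers is covered, and the number formatting (str() for header values, two decimals for pauses) is an assumption about what the API accepts.

```python
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def _check_range(name, value, low, high):
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def cloned_voice_headers(voice_uuid, api_key, speed=1, volume=1, pitch=0, emotion=None):
    """Return headers for POST /api/v1/voice-cloning after range checks."""
    _check_range("speed", speed, 0.5, 2.0)
    _check_range("volume", volume, 0, 10)
    _check_range("pitch", pitch, -12, 12)
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "text/plain",
        "X-Voice-ID": voice_uuid,
        "X-Store-Audio": "true",
        "X-Speed": str(speed),
        "X-Volume": str(volume),
        "X-Pitch": str(pitch),
    }
    if emotion is not None:
        headers["X-Emotion"] = emotion
    return headers


def pause(seconds):
    """Return a pause marker of the form <#x#> for 0.01 to 99.99 seconds."""
    _check_range("pause", seconds, 0.01, 99.99)
    return f"<#{seconds:.2f}#>"


text = "Hello" + pause(1.5) + "(laughs) from my cloned voice!"
```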

5. Manage Cloned Voices

List all cloned voices:

GET /api/v1/my-voices

Optional query param: status (pending, ready, or failed)

Get a specific voice:

GET /api/v1/my-voices/<voice_id>

Delete a voice:

DELETE /api/v1/my-voices/<voice_id>
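
The three management calls can be sketched with two hypothetical helpers that build the requests without sending them. URL-quoting the voice ID is a defensive choice, not something the API documents.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://api.verbatik.com"
VALID_STATUSES = {"pending", "ready", "failed"}


def list_my_voices_request(api_key, status=None):
    """Build GET /api/v1/my-voices, optionally filtered by status."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    url = f"{BASE_URL}/api/v1/my-voices"
    if status is not None:
        url += "?" + urlencode({"status": status})
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


def voice_request(api_key, voice_id, delete=False):
    """Build GET or DELETE /api/v1/my-voices/<voice_id>."""
    url = f"{BASE_URL}/api/v1/my-voices/{quote(voice_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE" if delete else "GET",
    )
```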

6. Preview a Pre-trained Voice

GET /api/v1/preview/<voice_slug>

Returns binary audio preview. No auth required. Cached for 24 hours.

Pricing

Action                                      Cost
Standard TTS (pre-trained voices)           $0.002 / 1,000 chars
Cloned Voice TTS                            $0.10 / 1,000 chars
Voice Cloning                               $3.00 / clone
List voices, check balance, estimate cost   Free

All usage is deducted from your prepaid balance. Auto top-up is available.
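
A rough local cost estimate based on the rates above. This is a sketch: estimate_cost here is a hypothetical local function, not the estimate_cost MCP tool, and it returns the exact pro-rata amount in dollars because the API's rounding rules are not documented here.

```python
from decimal import Decimal

RATE_PER_1000_CHARS = {
    "standard": Decimal("0.002"),  # pre-trained voices
    "cloned": Decimal("0.10"),     # cloned voices
}
CLONE_FEE = Decimal("3.00")  # per voice clone


def estimate_cost(standard_chars=0, cloned_chars=0, clones=0):
    """Return the estimated cost in dollars, without any rounding."""
    return (
        Decimal(standard_chars) * RATE_PER_1000_CHARS["standard"] / 1000
        + Decimal(cloned_chars) * RATE_PER_1000_CHARS["cloned"] / 1000
        + clones * CLONE_FEE
    )
```

For example, one clone plus 5,000 characters of cloned-voice speech comes to 3.00 + 0.50 = 3.50 dollars, while a full 50,000-character standard TTS request comes to 0.10 dollars.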

Error Handling

Status  Meaning
401     Invalid or missing API key
402     Insufficient balance (top up required)
400     Bad request (invalid params, text too long, voice not found)
404     Voice not found or doesn't belong to your workspace
429     Rate limit exceeded (check the Retry-After header)
500     Server error
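
One way to turn the status codes above into a retry decision. This is a sketch: next_action is a hypothetical helper, retrying on 500 is an assumption rather than documented guidance, and only the delay-in-seconds form of Retry-After is parsed.

```python
ERROR_MEANINGS = {
    400: "Bad request (invalid params, text too long, voice not found)",
    401: "Invalid or missing API key",
    402: "Insufficient balance (top up required)",
    404: "Voice not found or doesn't belong to your workspace",
}


def next_action(status, headers=None, default_wait=5.0):
    """Map an HTTP status to ("ok" | "retry" | "fail", detail)."""
    headers = headers or {}
    if 200 <= status < 300:
        return ("ok", None)
    if status == 429:
        # Use the server-provided delay when it is a plain number of seconds.
        value = str(headers.get("Retry-After", "")).strip()
        wait = float(value) if value.isdigit() else default_wait
        return ("retry", wait)
    if status == 500:
        # Assumption: a server error is worth retrying after a short wait.
        return ("retry", default_wait)
    return ("fail", ERROR_MEANINGS.get(status, f"Unexpected status {status}"))
```

A caller would sleep for the returned number of seconds on "retry" and stop on "fail", since 401, 402, 400, and 404 will not succeed without a change on the caller's side.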

Tips

  • Always use X-Store-Audio: true when you need a shareable URL — binary mode is for streaming
  • Use voice slugs (e.g. jenny-en-us) not internal IDs for pre-trained voices
  • Use UUIDs from voice-training or my-voices for cloned voices
  • Clone voices with clean audio (minimal background noise) for best results
  • Use noise_reduction: true when cloning from imperfect audio
  • Cloned voices expire after 7 days of inactivity — Verbatik auto-refreshes them via cron
  • Estimate costs before large batches with the estimate_cost MCP tool
  • Check your balance before bulk operations to avoid 402 errors
  • Use emotion and speed controls on cloned voices for more natural output
  • SSML is supported for pre-trained voices — use Content-Type: application/ssml+xml
