openai-tts-python

v1.0.1

Text-to-speech conversion using OpenAI's TTS API for generating high-quality, natural-sounding audio. Supports 6 voices (alloy, echo, fable, onyx, nova, shimmer), speed control (0.25x-4.0x), HD quality model, multiple output formats (mp3, opus, aac, flac), and automatic text chunking for long content (4096 char limit per request). Use when: (1) User requests audio/voice output with triggers like "read this to me", "convert to audio", "generate speech", "text to speech", "tts", "narrate", "speak", or when keywords "openai tts", "voice", "podcast" appear. (2) Content needs to be spoken rather than read (multitasking, accessibility). (3) User wants specific voice preferences like "alloy", "echo", "fable", "onyx", "nova", "shimmer" or speed adjustments.

Security Scan
VirusTotal
Benign
OpenClaw
Suspicious
high confidence
Purpose & Capability
The code, SKILL.md, README, and manifest all implement a TTS client that calls OpenAI's audio.speech.create API, supports chunking and multiple voices and formats, and uses pydub to combine audio segments. This matches the stated purpose. However, the top-level registry summary in the evaluation header incorrectly claimed 'Required env vars: none', while SKILL.md, manifest.json, and the script explicitly require OPENAI_API_KEY.
Instruction Scope
Runtime instructions and the script only read user-provided text (args, files, or stdin), call the OpenAI API, and optionally use pydub/ffmpeg to combine audio. There are no instructions to read unrelated system files or exfiltrate other data. Network access to OpenAI is required (expected for this purpose).
Install Mechanism
No custom install spec is provided (instruction-only install), and required dependencies (openai, pydub, ffmpeg) are standard packages/tools available via pip and system package managers. No downloads from untrusted URLs or archives are present in the manifest or README.
Credentials
The manifest, SKILL.md, README, and code all require OPENAI_API_KEY (reasonable for this TTS skill), but the top-level registry metadata reported 'Required env vars: none' and 'Primary credential: none' — an explicit mismatch. This metadata omission reduces transparency about credential needs. No other unrelated secrets are requested, so the single required credential is proportionate, but the metadata inconsistency is a concern.
Persistence & Privilege
The skill does not request persistent/always-on installation and does not modify other skills or system-wide agent settings. Default autonomous invocation is allowed by platform policy (normal) and the skill does not escalate privileges.
What to consider before installing
This package implements a normal OpenAI text-to-speech client and legitimately requires your OPENAI_API_KEY. Before installing or enabling it: (1) confirm the publisher/source (the package lists no homepage and 'Source: unknown'); (2) correct or ask the publisher to correct the registry metadata so the required OPENAI_API_KEY is declared; (3) ensure you are comfortable granting network access to OpenAI and that the key has appropriate billing/usage controls; (4) review the included openai/pydub/ffmpeg install instructions and run in a sandbox if you have doubts; (5) if you plan to deploy broadly, prefer an official or well-audited mirror/repository and verify package versions. The inconsistencies are likely sloppy metadata, not malicious code, but verify source trustworthiness before providing your API key.


latestvk97czawqmdde5fgqcjyzh0anvh7zxhed
2.1k downloads
1 star
2 versions
Updated 1mo ago
v1.0.1
MIT-0

OpenAI TTS

Text-to-speech conversion using OpenAI's TTS API for generating high-quality, natural-sounding audio from text.

Features

  • 6 voice options (male, female, and neutral)
  • Standard and HD quality models
  • Automatic text chunking for long content (4096 char limit)
  • Multiple output formats (mp3, opus, aac, flac)

Activation

This skill activates when the user:

  • Requests audio/voice output: "read this to me", "convert to audio", "generate speech", "make this an audio file"
  • Uses keywords: "tts", "openai tts", "text to speech", "voice", "audio", "podcast"
  • Needs content spoken for accessibility, multitasking, or podcast creation
  • Specifies voice preferences: "alloy", "echo", "fable", "onyx", "nova", "shimmer"
  • Asks to "narrate", "speak", or "vocalize" text

Requirements

  • OPENAI_API_KEY environment variable must be set
  • Python 3.8+
  • Dependencies: openai (required); pydub with ffmpeg (optional, used to combine audio for long text)
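
A preflight check can surface a missing key or package before any request is sent. The following is a minimal sketch of such a check; `check_requirements` is a hypothetical helper name introduced here for illustration and is not part of the skill or of the openai package.

```python
import importlib.util
import os
import sys


def check_requirements(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # find_spec returns None when a top-level package is not installed
    if importlib.util.find_spec("openai") is None:
        problems.append("openai package is not installed")
    if importlib.util.find_spec("pydub") is None:
        problems.append("pydub is not installed (only needed for long text)")

    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(f"warning: {problem}")
```

Passing a dictionary as `env` makes the check easy to exercise without touching the real environment.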

Voices

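| Voice   | Type    | Description              |
|---------|---------|--------------------------|
| alloy   | Neutral | Balanced, versatile      |
| echo    | Male    | Warm, conversational     |
| fable   | Neutral | Expressive, storytelling |
| onyx    | Male    | Deep, authoritative      |
| nova    | Female  | Friendly, upbeat         |
| shimmer | Female  | Clear, professional      |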
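
When a user asks for "a female voice" rather than a specific name, the request has to be mapped to one of the six voices. The sketch below shows one possible mapping based on the voice types listed above; `VOICES` and `resolve_voice` are hypothetical names, and falling back to `alloy` is an assumption, not documented skill behavior.

```python
# Voice types copied from the voice list in this document
VOICES = {
    "alloy": "neutral",
    "echo": "male",
    "fable": "neutral",
    "onyx": "male",
    "nova": "female",
    "shimmer": "female",
}


def resolve_voice(preference, default="alloy"):
    """Map a voice name or a type ('male', 'female', 'neutral') to a voice name."""
    key = (preference or "").strip().lower()
    if key in VOICES:
        return key
    # Otherwise return the first voice whose type matches the preference
    for name, voice_type in VOICES.items():
        if voice_type == key:
            return name
    return default


print(resolve_voice("Nova"))    # nova
print(resolve_voice("female"))  # nova
print(resolve_voice("robot"))   # alloy
```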

Usage

Basic Usage

from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

response = client.audio.speech.create(
    model="tts-1",      # or "tts-1-hd" for higher quality
    voice="onyx",       # choose from: alloy, echo, fable, onyx, nova, shimmer
    input="Your text here",
    speed=1.0           # 0.25 to 4.0 (optional)
)

with open("output.mp3", "wb") as f:
    for chunk in response.iter_bytes():
        f.write(chunk)
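
Since `speed` is only accepted between 0.25 and 4.0, it can help to normalize user input before building the request. The following sketch clamps the value and assembles the keyword arguments; `clamp_speed` and `speech_kwargs` are hypothetical helpers, and clamping (rather than rejecting) out-of-range values is a design assumption.

```python
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def clamp_speed(speed):
    """Clamp a requested playback speed into the documented 0.25-4.0 range."""
    return max(MIN_SPEED, min(MAX_SPEED, float(speed)))


def speech_kwargs(text, voice="onyx", model="tts-1", speed=1.0):
    """Build keyword arguments for client.audio.speech.create()."""
    if not text.strip():
        raise ValueError("input text must not be empty")
    return {
        "model": model,
        "voice": voice,
        "input": text,
        "speed": clamp_speed(speed),
    }


print(speech_kwargs("Hello world", speed=10)["speed"])  # 4.0
```

The resulting dictionary can be passed as `client.audio.speech.create(**speech_kwargs(...))`.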

Command Line

# Basic
python -c "
from openai import OpenAI
client = OpenAI()
response = client.audio.speech.create(model='tts-1', voice='onyx', input='Hello world')
open('output.mp3', 'wb').write(response.content)
"

Long Text (Auto-chunking)

from openai import OpenAI
from pydub import AudioSegment
import tempfile
import os
import re

client = OpenAI()
MAX_CHARS = 4096

def split_text(text):
    if len(text) <= MAX_CHARS:
        return [text]

    chunks = []
    sentences = re.split(r'(?<=[.!?])\s+', text)
    current = ''

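    for sentence in sentences:
        # A single sentence longer than the limit is hard-split so that
        # no chunk ever exceeds MAX_CHARS
        while len(sentence) > MAX_CHARS:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(sentence[:MAX_CHARS])
            sentence = sentence[MAX_CHARS:]

        if len(current) + len(sentence) + 1 <= MAX_CHARS:
            current += (' ' if current else '') + sentence
        else:
            if current:
                chunks.append(current)
            current = sentence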

    if current:
        chunks.append(current)

    return chunks

def generate_tts(text, output_path, voice='onyx', model='tts-1'):
    chunks = split_text(text)

    if len(chunks) == 1:
        # Short text: a single request, written straight to the output file
        response = client.audio.speech.create(model=model, voice=voice, input=text)
        with open(output_path, 'wb') as f:
            f.write(response.content)
    else:
        segments = []
        for chunk in chunks:
            response = client.audio.speech.create(model=model, voice=voice, input=chunk)
            with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as tmp:
                tmp.write(response.content)
            # Load only after the file is closed, so all bytes are flushed to disk
            try:
                segments.append(AudioSegment.from_mp3(tmp.name))
            finally:
                os.unlink(tmp.name)

        # Concatenate the segments in order (pydub needs ffmpeg for this)
        combined = segments[0]
        for seg in segments[1:]:
            combined += seg
        combined.export(output_path, format='mp3')

    return output_path

# Usage
generate_tts("Your long text here...", "output.mp3", voice="nova")
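
Where sentence boundaries do not matter, a simpler splitter can be built on the standard library. The sketch below is an alternative technique, not the skill's own method: it uses `textwrap.wrap` to break on word boundaries and hard-breaks any word longer than the limit. `split_text_simple` is a hypothetical name. Note that `textwrap` also replaces newlines with spaces, which is usually harmless for speech input.

```python
import textwrap

MAX_CHARS = 4096


def split_text_simple(text, limit=MAX_CHARS):
    """Split on word boundaries; words longer than `limit` are hard-broken."""
    return textwrap.wrap(
        text,
        width=limit,
        break_long_words=True,
        break_on_hyphens=False,
    )


chunks = split_text_simple("word " * 2000)
# Every chunk fits within a single request
print(len(chunks), all(len(c) <= MAX_CHARS for c in chunks))
```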

Models

| Model    | Quality         | Speed  | Cost            |
|----------|-----------------|--------|-----------------|
| tts-1    | Standard        | Fast   | $0.015/1K chars |
| tts-1-hd | High Definition | Slower | $0.030/1K chars |
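
Because billing is per character, the cost of a job can be estimated before any audio is generated. The following sketch uses the prices listed above; `PRICE_PER_1K_CHARS` and `estimate_cost` are hypothetical names, and the prices should be checked against current OpenAI pricing before relying on the result.

```python
# Per-1K-character prices copied from this document; verify before use
PRICE_PER_1K_CHARS = {"tts-1": 0.015, "tts-1-hd": 0.030}


def estimate_cost(text, model="tts-1"):
    """Estimate the USD cost of synthesizing `text` with the given model."""
    return len(text) / 1000 * PRICE_PER_1K_CHARS[model]


article = "x" * 20_000  # stand-in for a 20,000-character article
print(f"tts-1:    ${estimate_cost(article):.2f}")              # tts-1:    $0.30
print(f"tts-1-hd: ${estimate_cost(article, 'tts-1-hd'):.2f}")  # tts-1-hd: $0.60
```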

Output Formats

Supported formats: mp3 (default), opus, aac, flac

response = client.audio.speech.create(
    model="tts-1",
    voice="onyx",
    input="Hello",
    response_format="opus"  # or mp3, aac, flac
)
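
The output file extension should match the `response_format` that was requested, otherwise players may misidentify the file. A small sketch of keeping the two in sync follows; `SUPPORTED_FORMATS` and `output_path_for` are hypothetical names, and the list covers only the formats named in this document.

```python
from pathlib import Path

# Formats listed in this document; the extension mirrors the format name
SUPPORTED_FORMATS = ("mp3", "opus", "aac", "flac")


def output_path_for(stem, response_format="mp3"):
    """Return a file path whose extension matches the requested format."""
    fmt = response_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    return Path(stem).with_suffix(f".{fmt}")


print(output_path_for("speech", "opus"))  # speech.opus
```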

Error Handling

from openai import OpenAI, APIError, RateLimitError
import time

client = OpenAI()

def generate_with_retry(text, voice='onyx', max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.audio.speech.create(
                model="tts-1",
                voice=voice,
                input=text
            )
            return response.content
        except RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            raise
        except APIError as e:
            print(f"API Error: {e}")
            raise

    return None
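
To make the retry timing above concrete, the sleep schedule can be computed on its own. This sketch mirrors the `2 ** attempt` backoff in `generate_with_retry`; `backoff_delays` is a hypothetical helper added for illustration.

```python
def backoff_delays(max_retries=3, base=2):
    """Seconds slept before each retry; no sleep follows the final attempt."""
    return [base ** attempt for attempt in range(max_retries - 1)]


print(backoff_delays(3))  # [1, 2]
print(backoff_delays(5))  # [1, 2, 4, 8]
```

With the default of three attempts, a request that keeps hitting the rate limit waits 1 second, then 2 seconds, and then re-raises the error.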

Examples

Convert Article to Podcast

def article_to_podcast(article_text, output_file):
    intro = "Welcome to today's article reading."
    outro = "Thank you for listening."

    full_text = f"{intro}\n\n{article_text}\n\n{outro}"

    generate_tts(full_text, output_file, voice='nova', model='tts-1-hd')
    print(f"Podcast saved to {output_file}")

Batch Processing

def batch_tts(texts, output_dir, voice='onyx'):
    import os
    os.makedirs(output_dir, exist_ok=True)

    for i, text in enumerate(texts):
        output_path = os.path.join(output_dir, f"audio_{i+1}.mp3")
        generate_tts(text, output_path, voice=voice)
        print(f"Generated: {output_path}")
