Audiobook

Generate audiobooks from novels and long-form text with chapter management and character voices. Use when users mention audiobooks, narrating books, or conve...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
1 · 139 · 0 current installs · 0 all-time installs
MIT-0
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
medium confidence
Purpose & Capability
The name and description match the runtime instructions: the SKILL.md shows parsing text, chunking, and calling a SenseAudio TTS API to produce chaptered audio. Requiring a SenseAudio API key is appropriate for this purpose. One inconsistency: the registry metadata at the top reported no required env vars, while SKILL.md declares a required credential (SENSEAUDIO_API_KEY / env_var API_KEY).
Instruction Scope
Runtime instructions are limited to parsing text, chunking it, calling the SenseAudio API, decoding returned audio, and writing audio files — all consistent with audiobook generation. Important operational scope: the skill will send full user text (entire chapters/books) to a third-party API (api.senseaudio.cn). There are no instructions that access unrelated system files, other credentials, or exfiltrate data elsewhere.
Install Mechanism
This is an instruction-only skill with no install spec and no bundled code files, so nothing is downloaded or installed by the skill itself. The SKILL.md includes Python examples that assume standard libraries (and requests), but the skill does not declare or install those dependencies.
!
Credentials
The SKILL.md correctly requires an API key for SenseAudio (Authorization header uses API_KEY). However, the registry metadata did not list required environment variables, creating a mismatch. The single credential requested (SenseAudio API key) is proportionate to the stated purpose, but confirm which env var name the agent expects (SKILL.md maps SENSEAUDIO_API_KEY -> API_KEY).
Persistence & Privilege
The skill does not request always: true and does not declare elevated or persistent privileges. It is user-invocable and can be autonomously invoked by the agent (platform default), which is expected for skills of this type.
Assessment
This skill appears to do what it says: it parses text and sends chunks to SenseAudio to generate audio. Before installing, verify and be comfortable with these points: (1) You must provide a SenseAudio API key — confirm the exact environment variable name expected (SKILL.md indicates API_KEY / SENSEAUDIO_API_KEY). (2) Using the skill will upload full book text to a third-party service (api.senseaudio.cn); do not send copyrighted, confidential, or sensitive text unless you have rights and trust the service. (3) The SKILL.md uses Python + requests for examples but the skill does not install dependencies — ensure your environment has the needed runtime/libraries. (4) If you need stronger assurances, ask the publisher to correct the registry metadata to list the required env var and to document data retention/privacy for uploaded text. If any of these items worry you, test with non-sensitive sample text first.

Like a lobster shell, security has layers — review code before you run it.

Current versionv1.0.0
Download zip
latestvk97bha7sxene6fnka4c5y9tndx82vpqj

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

SenseAudio Audiobook Generator

Convert novels, books, and long-form written content into professional audiobooks with natural narration and character voices.

What This Skill Does

This skill helps you create audiobooks by:

  • Converting long texts (novels, books, articles) into audio
  • Automatically handling chapter divisions
  • Supporting multiple character voices for dialogue
  • Distinguishing between narration and character speech
  • Managing long content with automatic chunking

When to Use This Skill

Use this skill when you need to:

  • Create audiobooks from novels or books
  • Convert long-form articles to audio
  • Generate narrated versions of written content
  • Produce audio for e-learning materials
  • Create accessible audio versions of text documents

How It Works

Basic Workflow

  1. Prepare the text - Format with chapter markers and dialogue tags
  2. Configure narration - Choose narrator voice and character voices
  3. Process chapters - Generate audio for each chapter or section
  4. Combine audio - Merge all segments into complete audiobook
  5. Add metadata - Include chapter markers and timestamps

Text Format

Structure your content like this:

# Chapter 1: The Beginning

[Narrator]
It was a dark and stormy night. The old mansion stood alone on the hill.

[Character: John, Voice: male_0004_a]
"I don't think we should go in there," John whispered nervously.

[Character: Sarah, Voice: female_0006_a]
"Don't be silly. It's just an old house."

[Narrator]
They pushed open the creaky door and stepped inside.

Voice Configuration

Narrator voice: Choose a clear, neutral voice for narration

  • Recommended: male_0004_a (calm male) or female_0027_a (clear female)
  • Speed: 0.9-1.0 for comfortable listening
  • Pitch: 0 (neutral)

Character voices: Assign distinct voices to each character

  • Use different voice IDs for different characters
  • Adjust pitch slightly to differentiate (+2/-2)
  • Vary speed for character personality (excited: 1.1, calm: 0.9)

Chapter Management

For long books, process by chapters:

chapters = [
    {"title": "Chapter 1", "text": "...", "start_time": 0},
    {"title": "Chapter 2", "text": "...", "start_time": 1234},
    # ...
]

Benefits:

  • Easier to manage and debug
  • Allows chapter navigation in audio players
  • Can resume generation if interrupted
  • Enables parallel processing

Implementation Steps

Step 1: Parse and Structure Content

Extract structure from the text:

import re

def parse_audiobook_text(text):
    sections = []
    current_section = {"type": "narrator", "text": "", "voice": "male_0004_a"}

    for line in text.split('\n'):
        # Chapter markers
        if line.startswith('# Chapter'):
            if current_section["text"]:
                sections.append(current_section)
            sections.append({"type": "chapter", "title": line[2:]})
            current_section = {"type": "narrator", "text": "", "voice": "male_0004_a"}

        # Narrator
        elif line.startswith('[Narrator]'):
            if current_section["text"]:
                sections.append(current_section)
            current_section = {"type": "narrator", "text": "", "voice": "male_0004_a"}

        # Character dialogue
        elif match := re.match(r'\[Character: (\w+), Voice: ([\w_]+)\]', line):
            if current_section["text"]:
                sections.append(current_section)
            current_section = {
                "type": "character",
                "character": match.group(1),
                "voice": match.group(2),
                "text": ""
            }

        else:
            current_section["text"] += line + " "

    if current_section["text"]:
        sections.append(current_section)

    return sections

Step 2: Generate Audio Segments

For each section, call the TTS API:

import requests
import binascii

def generate_audio_segment(text, voice_id, output_file, speed=1.0):
    url = "https://api.senseaudio.cn/v1/t2a_v2"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "SenseAudio-TTS-1.0",
        "text": text,
        "stream": False,
        "voice_setting": {
            "voice_id": voice_id,
            "speed": speed,
            "vol": 1.0,
            "pitch": 0
        },
        "audio_setting": {
            "format": "mp3",
            "sample_rate": 32000,
            "bitrate": 128000,
            "channel": 2
        }
    }

    response = requests.post(url, headers=headers, json=payload)
    data = response.json()

    # Decode hex audio
    audio_hex = data['data']['audio']
    audio_binary = binascii.unhexlify(audio_hex)

    # Save to file
    with open(output_file, 'wb') as f:
        f.write(audio_binary)

    return data['extra_info']['audio_length']  # Duration in ms

Step 3: Handle Long Text

Split long chapters into manageable chunks:

def chunk_text(text, max_length=5000):
    """Split text at sentence boundaries"""
    sentences = re.split(r'([.!?]+\s+)', text)
    chunks = []
    current_chunk = ""

    for i in range(0, len(sentences), 2):
        sentence = sentences[i]
        separator = sentences[i+1] if i+1 < len(sentences) else ""

        if len(current_chunk) + len(sentence) + len(separator) > max_length:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + separator
        else:
            current_chunk += sentence + separator

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

Step 4: Combine Audio Files

Merge all segments into a complete audiobook:

from pydub import AudioSegment

def create_audiobook(segments, output_file, chapter_markers=None):
    audiobook = AudioSegment.empty()
    chapter_times = []
    current_time = 0

    for i, segment_file in enumerate(segments):
        # Load segment
        segment = AudioSegment.from_mp3(segment_file)

        # Check if this is a chapter start
        if chapter_markers and i in chapter_markers:
            chapter_times.append({
                "chapter": chapter_markers[i],
                "time": current_time
            })

        # Add segment
        audiobook += segment
        current_time += len(segment)

        # Add pause between segments (300ms)
        if i < len(segments) - 1:
            audiobook += AudioSegment.silent(duration=300)
            current_time += 300

    # Export with metadata
    audiobook.export(
        output_file,
        format="mp3",
        bitrate="128k",
        tags={
            "title": "Audiobook",
            "artist": "SenseAudio TTS",
            "album": "Generated Audiobook"
        }
    )

    return chapter_times

Advanced Features

Dialogue Enhancement

Make dialogue more natural:

  • Add slight pauses before character speech: <break time=200>
  • Vary character emotions based on context
  • Use different pitch for different characters

Narration Styles

Adjust narrator voice for different genres:

  • Fiction: Slightly expressive (speed: 1.0, varied pitch)
  • Non-fiction: Clear and steady (speed: 0.95, neutral pitch)
  • Children's books: Animated (speed: 1.1, varied emotions)
  • Technical: Slow and clear (speed: 0.9, neutral)

Chapter Markers

Add ID3 tags for chapter navigation:

from mutagen.mp3 import MP3
from mutagen.id3 import ID3, CHAP, TIT2, CTOCFlags

def add_chapter_markers(mp3_file, chapters):
    audio = MP3(mp3_file, ID3=ID3)

    # Add chapters
    for i, chapter in enumerate(chapters):
        audio.tags.add(
            CHAP(
                element_id=f"chp{i}",
                start_time=chapter["time"],
                end_time=chapters[i+1]["time"] if i+1 < len(chapters) else len(audio.info.length * 1000),
                sub_frames=[TIT2(text=chapter["chapter"])]
            )
        )

    audio.save()

Progress Tracking

For long books, track progress:

def generate_audiobook_with_progress(sections, output_dir):
    total = len(sections)
    completed = 0

    for i, section in enumerate(sections):
        output_file = f"{output_dir}/segment_{i:04d}.mp3"

        # Skip if already generated
        if os.path.exists(output_file):
            completed += 1
            continue

        # Generate audio
        generate_audio_segment(
            section["text"],
            section["voice"],
            output_file=output_file
        )

        completed += 1
        print(f"Progress: {completed}/{total} ({completed*100//total}%)")

Error Handling

Common issues:

Text too long: Chunk into smaller segments (max 10,000 chars per request)

Voice consistency: Save voice mappings to ensure same character uses same voice

Memory issues: Process chapters separately, don't load entire book at once

API rate limits: Add delays between requests (e.g., 1 second)

Output Format

The skill produces:

  • Individual chapter audio files (MP3)
  • Complete audiobook file (MP3)
  • Chapter markers and timestamps (JSON)
  • Metadata file with voice mappings and settings

Example Usage

User request: "Convert this novel into an audiobook with different voices for each character"

Skill actions:

  1. Parse the novel text and identify chapters
  2. Extract character dialogue and assign voices
  3. Generate audio for each section
  4. Combine all audio with chapter markers
  5. Export final audiobook with metadata
  6. Provide playback instructions and chapter list

Tips for Best Results

  • Use consistent voice IDs for the same character throughout
  • Add natural pauses between paragraphs and chapters
  • Test a sample chapter before processing the entire book
  • Keep narrator voice neutral and clear
  • Use higher quality settings (44100 Hz) for final production
  • Consider adding background ambience for immersive experience

Reference

For detailed API documentation, see the SenseAudio TTS API reference in the references/ directory.

Files

1 total
Select a file
Select a file to preview.

Comments

Loading comments…