Install
openclaw skills install slide-to-video-converter
Complete end-to-end pipeline for converting PPT/PPTX/PDF slides with speaker notes into high-quality narrated MP4 videos with auto-synced subtitles. Defaults to Edge TTS (Microsoft free online API).
Stage 1: Audio Generation & Validation
┌─────────────────────────────────────────────────────────┐
│ PPTX/PDF → Images (png) │
│ Script → TTS → Audio → STT Validation → Validated Audio│
└─────────────────────────────────────────────────────────┘
Stage 2: Per-Slide Video Composition
┌─────────────────────────────────────────────────────────┐
│ Image + Validated Audio + Subtitle → Individual MP4 │
└─────────────────────────────────────────────────────────┘
Stage 3: Final Video Assembly
┌─────────────────────────────────────────────────────────┐
│ Merge All Slide Videos → final.mp4 │
└─────────────────────────────────────────────────────────┘
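Stages 2 and 3 can be approximated with two ffmpeg invocations. The sketch below only builds the argument lists. It is an assumption about how such a pipeline might call ffmpeg, not the actual contents of scripts/pipeline.py; the function names, file names, and codec choices are hypothetical.

```python
from pathlib import Path


def slide_video_cmd(image, audio, output, subtitle=None):
    """Build an ffmpeg command that turns one still image plus one audio
    track into an MP4 (Stage 2). Codec choices are illustrative assumptions."""
    cmd = [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(image),  # repeat the still image as video
        "-i", str(audio),                # narration track
    ]
    if subtitle is not None:
        # Burn subtitles in with ffmpeg's `subtitles` filter (needs libass).
        cmd += ["-vf", f"subtitles={subtitle}"]
    cmd += [
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                     # stop when the audio ends
        str(output),
    ]
    return cmd


def concat_cmd(slide_videos, list_file, output):
    """Write a concat-demuxer list file and build the merge command (Stage 3)."""
    lines = [f"file '{Path(v).resolve()}'" for v in slide_videos]
    Path(list_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output)]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Stream copy (`-c copy`) in the merge step only works when every per-slide video shares the same codec settings, which is why the sketch encodes all slides identically.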
The pipeline requires two inputs from the user:
Slide file: PPT/PPTX or PDF. PPTX files are converted automatically (LibreOffice is needed unless the fallback method is used).
Speaker notes: A JSON file with per-page narration. See references/script-format.md for the expected format.
# System dependencies
brew install poppler ffmpeg libreoffice # macOS (add libreoffice for PPTX support)
# apt install poppler-utils ffmpeg libreoffice # Linux
# Python dependencies
pip install -U mlx-audio soundfile numpy edge-tts pdf2image Pillow moviepy python-pptx
# Optional: HTTP service dependencies
pip install fastapi uvicorn python-multipart
Default Mode: Edge TTS - Free online API (recommended for universal compatibility)
python scripts/pipeline.py
PPTX Support Options:
# Use PPTX file (if both PDF and PPTX exist)
python scripts/pipeline.py --use-pptx
# Use PPTX and force audio regeneration
python scripts/pipeline.py --use-pptx --force-audio
# PPTX with fallback method (no LibreOffice required)
python scripts/pipeline.py --use-pptx --fallback
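For orientation, a PPTX-to-image conversion of the kind described above might look like the following sketch. It assumes `soffice` (LibreOffice) is on the PATH and that poppler is installed for `pdf2image`. The function names and the `slide_001.png` naming scheme are hypothetical and may differ from what scripts/pipeline.py actually does.

```python
import subprocess
from pathlib import Path


def pptx_to_pdf_cmd(pptx_path, out_dir):
    """Build a headless LibreOffice command that converts a PPTX to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(pptx_path)]


def pdf_to_pngs(pdf_path, out_dir, dpi=150):
    """Render each PDF page to slide_001.png, slide_002.png, ... (assumed naming)."""
    # Imported lazily so the module loads even when pdf2image is not installed.
    from pdf2image import convert_from_path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, page in enumerate(convert_from_path(str(pdf_path), dpi=dpi), start=1):
        target = out_dir / f"slide_{i:03d}.png"
        page.save(target, "PNG")
        paths.append(target)
    return paths


def convert_slides(pptx_path, work_dir):
    """PPTX -> PDF -> one PNG per slide."""
    work_dir = Path(work_dir)
    subprocess.run(pptx_to_pdf_cmd(pptx_path, work_dir), check=True)
    pdf_path = work_dir / (Path(pptx_path).stem + ".pdf")
    return pdf_to_pngs(pdf_path, work_dir / "images")
```

Going through PDF keeps a single rendering path for both input types: a PDF input can skip the first step and call `pdf_to_pngs` directly.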
Alternative TTS Modes:
Edge TTS - Free online API, no local model required (default)
python scripts/pipeline.py --tts-edge
HTTP Service - Independent TTS server for multi-client usage
# Start TTS server
python scripts/tts_server.py &
# Run pipeline
python scripts/pipeline.py --tts-http
Qwen3-TTS - Local GPU acceleration
python scripts/pipeline.py --tts-direct
# Full pipeline with audio validation
python scripts/pipeline.py
# Specific slides only
python scripts/pipeline.py --slides 1-5
# Fast preview mode (lower quality, quicker)
python scripts/pipeline.py --fast
# Skip image generation (use existing)
python scripts/pipeline.py --skip-images
# Force regenerate audio
python scripts/pipeline.py --force-audio
# Skip audio validation (use existing audio as-is)
python scripts/pipeline.py --skip-validation
# Custom validation threshold
python scripts/pipeline.py --threshold 0.7 --max-retries 3
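The `--threshold` and `--max-retries` options suggest a loop that re-synthesizes audio until its speech-to-text transcript is close enough to the script. The sketch below is one plausible reading, not the pipeline's actual code: the similarity metric (a character-level `difflib` ratio), the interpretation of `max_retries` as attempts after the first one, and all function names are assumptions. The TTS and STT steps are passed in as plain callables.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop punctuation/whitespace so only letters and digits are compared."""
    return re.sub(r"[\W_]+", "", text.lower())


def similarity(script, transcript):
    """Return a 0..1 character-level similarity between script and transcript."""
    return SequenceMatcher(None, normalize(script), normalize(transcript)).ratio()


def generate_validated_audio(script, synthesize, transcribe,
                             threshold=0.7, max_retries=3):
    """Call synthesize() until its transcription is close enough to the script.

    `synthesize(script) -> audio` and `transcribe(audio) -> str` are
    caller-supplied stand-ins for the TTS and STT steps.
    Returns (audio, score, passed) for the best attempt seen.
    """
    best_audio, best_score = None, -1.0
    for _ in range(1 + max_retries):  # first attempt plus retries
        audio = synthesize(script)
        score = similarity(script, transcribe(audio))
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            break
    return best_audio, best_score, best_score >= threshold
```

Keeping the best-scoring attempt means a slide still gets audio when no attempt reaches the threshold; the caller can inspect `passed` to decide whether to flag that slide.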
Edit config.json to adjust:
| Voice | Gender | Style |
|---|---|---|
| serena | Female | Warm, natural (default) |
| chelsea | Female | Professional, clear |
| max | Male | Authoritative, deep |
| brian | Male | Friendly, energetic |
| Voice | Gender | Style |
|---|---|---|
| zh-CN-YunyangNeural | Male | Professional news anchor (default) |
| zh-CN-XiaoxiaoNeural | Female | Warm, natural |
| zh-CN-YunjianNeural | Male | Energetic sports |
| zh-CN-XiaoyiNeural | Female | Lively cartoon |
| zh-CN-YunxiNeural | Male | Sunny, cheerful |
List all Edge voices: edge-tts --list-voices | grep zh-CN
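Any voice from the list can also be tried directly from Python with the `edge-tts` package. This is a minimal sketch for experimenting with voices, assuming network access; the wrapper names are hypothetical and it is not how scripts/pipeline.py is known to call the library.

```python
import asyncio


async def synthesize(text, out_path, voice="zh-CN-YunyangNeural"):
    """Save `text` spoken by `voice` to `out_path` (MP3) via the edge-tts package."""
    # Imported lazily: requires `pip install edge-tts` and a network connection.
    import edge_tts

    await edge_tts.Communicate(text, voice).save(str(out_path))


def synthesize_sync(text, out_path, voice="zh-CN-YunyangNeural"):
    """Blocking wrapper for scripts that are not already async."""
    asyncio.run(synthesize(text, out_path, voice))
```

For example, `synthesize_sync("大家好", "sample.mp3", voice="zh-CN-XiaoxiaoNeural")` would write a short clip for comparing voices before committing one to config.json.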
This skill includes example resource directories that demonstrate how to organize different types of bundled resources:
Executable code (Python/Bash/etc.) that can be run directly to perform specific operations.
Examples from other skills:
fill_fillable_fields.py, extract_form_field_info.py - utilities for PDF manipulation
document.py, utilities.py - Python modules for document processing
Appropriate for: Python scripts, shell scripts, or any executable code that performs automation, data processing, or specific operations.
Note: Scripts may be executed without loading into context, but can still be read by Claude for patching or environment adjustments.
Documentation and reference material intended to be loaded into context to inform Claude's process and thinking.
Examples from other skills:
communication.md, context_building.md - detailed workflow guides
Appropriate for: In-depth documentation, API references, database schemas, comprehensive guides, or any detailed information that Claude should reference while working.
Files not intended to be loaded into context, but rather used within the output Claude produces.
Examples from other skills:
Appropriate for: Templates, boilerplate code, document templates, images, icons, fonts, or any files meant to be copied or used in the final output.
Any unneeded directories can be deleted. Not every skill requires all three types of resources.