ai-audio-processingAI Audio Processing Studio

Other

AI驱动的全栈音频处理技能。覆盖语音转文字（多语言ASR）、文字转语音（TTS含情感控制）、音频降噪与修复、音乐信息检索（MIR）、自动混音与母带处理、播客制作流水线、实时翻译配音。支持Whisper/Bark/OpenVoice/Demucs等前沿模型，兼容DAW工作流（Ableton/Logic/Reaper）。

Install

openclaw skills install ai-audio-processing

AI Audio Processing Studio

AI-powered full-stack audio processing skill. Covers ASR, TTS, noise reduction, music analysis, auto-mixing, podcast production, and real-time dubbing.

Core Modules

1. Speech-to-Text (ASR)

Multi-language transcription (100+ languages via Whisper)
Speaker diarization (identify who spoke when)
Timestamp-aligned subtitles (SRT/VTT/ASS)
Real-time streaming transcription
Domain-specific vocabulary customization (medical/legal/tech)
Punctuation and capitalization restoration

2. Text-to-Speech (TTS)

Natural voice synthesis (Bark/OpenVoice/CosyVoice)
Emotion control (happy, sad, angry, neutral, enthusiastic)
Voice cloning from 10-second sample
Multi-speaker dialog generation
Speed and pitch adjustment
Audiobook narration pipeline (chapter-aware)

3. Audio Restoration & Enhancement

Noise reduction (stationary + non-stationary)
De-click, de-clip, de-ess processing
Reverb removal and room acoustics correction
Audio upscaling (8kHz→48kHz via super-resolution)
Old recording restoration (vinyl crackle, tape hiss)
Voice isolation from background music

4. Music Information Retrieval (MIR)

Beat/tempo detection and BPM analysis
Key and chord recognition
Instrument separation (vocals/drums/bass/other via Demucs)
Music structure analysis (verse/chorus/bridge detection)
Genre classification and mood tagging
Melody extraction and MIDI transcription

5. Auto-Mixing & Mastering

Automatic level balancing (LUFS normalization)
EQ matching to reference tracks
Dynamic compression optimization
Stereo width enhancement
Loudness compliance (Broadcast/Streaming: -14 LUFS, -23 LUFS, -16 LUFS)
Multi-format export (WAV/FLAC/MP3/AAC/OGG)

6. Podcast Production Pipeline

Record → Transcribe → Edit by text → Mix & Master → Export

Text-based audio editing (cut by deleting transcript)
Intro/outro templating with dynamic content
Ad-insertion point detection
Show notes and chapter marker generation
RSS feed generation for publishing

7. Real-time Translation Dubbing

Speech→Translate→TTS pipeline
Lip-sync timing adjustment
Multi-track dubbing for multilingual content
Voice preservation across translations (voice cloning)
Subtitle burn-in with styling

Supported Audio Formats

Input: WAV, MP3, FLAC, AAC, OGG, M4A, WMA, AIFF, OPUS
Output: WAV (24-bit/48kHz), FLAC, MP3 (320kbps), AAC, OGG

Usage Examples

# Transcribe meeting recording
action: transcribe
input: meeting_2026-06-13.wav
language: zh
speakers: 4
output: meeting_transcript.srt
diarization: true

# Podcast production
action: podcast_pipeline
input: raw_interview.wav
host_voice: host_profile.json
guest_voice: guest_sample.wav
intro_music: intro.mp3
output: episode_042_final.mp3
chapters: auto
show_notes: true