Install

```shell
openclaw skills install acestep-lyrics-transcription
```

Transcribe audio to timestamped lyrics using OpenAI Whisper or the ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.
Before transcribing, you MUST check whether the user's API key is configured. Run the following command:

```shell
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key
```
This command only reports whether the active provider's API key is set or empty — it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use config --get on key fields or read config.json directly. The config --list command is safe — it automatically masks API keys as *** in output.
If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Do NOT attempt transcription without a valid key — it will fail.
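The empty-key gate above can be sketched as a small helper. The exact wording that `config --check-key` prints is not specified here, so the matched phrases below ("empty", "not set") are assumptions to adapt to the real output:

```shell
# Hedged sketch: decide whether to stop based on the check command's output.
# The matched phrases are assumptions about what `config --check-key` prints.
key_is_missing() {
  # $1: captured output of `config --check-key`
  printf '%s\n' "$1" | grep -qiE 'empty|not set'
}
```

When `key_is_missing` returns 0, stop and guide the user through configuration instead of attempting transcription.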
Use AskUserQuestion to ask the user to provide their API key, then guide them through configuration:

```shell
# Set the API key for the chosen provider
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>

# Switch the active provider if needed
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>
```

Then run config --check-key again to verify the key is set before proceeding. If the API key is already configured, proceed directly to transcription without asking.
```shell
# 1. cd to this skill's directory
cd {project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/

# 2. Configure API key (choose one)
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
# or
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# 3. Transcribe
./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh

# 4. Output saved to: {project_root}/acestep_output/<filename>.lrc
```
```shell
./scripts/acestep-lyrics-transcription.sh transcribe --audio <file> [options]
```

Options:

```
-a, --audio      Audio file path (required)
-l, --language   Language code (zh, en, ja, etc.)
-f, --format     Output format: lrc, srt, json (default: lrc)
-p, --provider   API provider: openai, elevenlabs (overrides config)
-o, --output     Output file path (default: acestep_output/<filename>.lrc)
```
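A caller that drives this CLI programmatically can assemble the invocation from the options above. This is a hypothetical helper, not part of the skill; the defaults (en, lrc) are illustrative choices:

```shell
# Hypothetical helper that assembles a transcribe invocation from arguments,
# mirroring the options above. It only builds the command string; it does
# not run the skill script.
build_transcribe_cmd() {
  audio="$1"; lang="${2:-en}"; fmt="${3:-lrc}"
  printf './scripts/acestep-lyrics-transcription.sh transcribe --audio %s --language %s --format %s' \
    "$audio" "$lang" "$fmt"
}
```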
CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently produce errors on sung lyrics. When correcting:

- Keep the [MM:SS.cc] timestamps exactly as-is (timestamps from transcription are accurate).
- Do not add structure tags such as [Verse] or [Chorus]; the LRC should only have timestamped text lines.

Transcribed (wrong):

```
[00:46.96]AC step alive,
[00:50.80]one point five eyes.
```

Original lyrics reference:

```
ACE-Step alive
One point five arrives
```

Corrected (right):

```
[00:46.96]ACE-Step alive,
[00:50.80]One point five arrives.
```
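A quick sanity check after manual correction can catch both failure modes above (dropped timestamps, stray [Verse]/[Chorus] tags). This is a sketch for illustration, not part of the skill's CLI:

```shell
# Sketch: verify every non-empty line of a corrected LRC file still starts
# with a [MM:SS.cc] timestamp; structure tags like [Verse] fail the check.
lrc_check() {
  ! grep -vE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]' "$1" | grep -q .
}
```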
Config file: scripts/config.json

```shell
# Switch provider
./scripts/acestep-lyrics-transcription.sh config --set provider openai
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# Set API keys
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...

# View config (API keys are masked)
./scripts/acestep-lyrics-transcription.sh config --list
```
| Option | Default | Description |
|---|---|---|
| provider | openai | Active provider: openai or elevenlabs |
| output_format | lrc | Default output: lrc, srt, or json |
| openai.api_key | "" | OpenAI API key |
| openai.api_url | https://api.openai.com/v1 | OpenAI API base URL |
| openai.model | whisper-1 | OpenAI model (whisper-1 for word timestamps) |
| elevenlabs.api_key | "" | ElevenLabs API key |
| elevenlabs.api_url | https://api.elevenlabs.io/v1 | ElevenLabs API base URL |
| elevenlabs.model | scribe_v2 | ElevenLabs model |
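To read a non-secret field such as the active provider from scripts/config.json without risking key exposure, a helper can extract just that field. This is a sketch; it assumes python3 is available and that top-level fields match the option table above:

```shell
# Sketch: print one top-level field from config.json via python3's json module,
# so api_key values are never echoed. Schema assumed to match the table above.
config_get() {
  python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))[sys.argv[2]])' "$1" "$2"
}
```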
| Provider | Model | Word Timestamps | Pricing |
|---|---|---|---|
| OpenAI | whisper-1 | Yes (segment + word) | $0.006/min |
| ElevenLabs | scribe_v2 | Yes (word-level) | Varies by plan |
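The OpenAI pricing row translates into a simple per-track estimate. $0.006/min is the rate quoted above; the audio duration must come from elsewhere (e.g. a media probe, not shown here):

```shell
# Back-of-envelope cost for whisper-1 at $0.006 per minute of audio.
whisper1_cost_usd() {
  # $1: audio duration in seconds; prints the cost with four decimals
  awk -v s="$1" 'BEGIN { printf "%.4f", s / 60 * 0.006 }'
}
```

For example, a 5-minute (300-second) track costs about $0.03.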
Notes:

- whisper-1 is the only OpenAI model supporting word-level timestamps.
- scribe_v2 returns word-level timestamps with type filtering.

```shell
# Basic transcription (uses config defaults)
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3

# Chinese song to LRC
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh

# Use ElevenLabs, output SRT
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt

# Custom output path
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc
```