Transcribe 小宇宙 (Xiaoyuzhou) podcast episodes to text using local Qwen3-ASR speech recognition (Metal/CUDA accelerated). Combines the xyz API (小宇宙FM API) for fetching episode metadata and audio with local transcription.
Install
openclaw skills install xiaoyuzhou-asr
Run the xyz API server:
git clone https://github.com/ultrazg/xyz.git && cd xyz && go run .
# Default port: 23020, change with -p
Authenticate: POST /sendCode then POST /login (see references/xyz-api.md).
Install ffmpeg (brew install ffmpeg).
Download the Qwen3-ASR model:
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('Qwen/Qwen3-ASR-0.6B', local_dir='models/0.6B')
"
Build the local_transcribe binary:
git clone https://github.com/alan890104/qwen3-asr-rs.git && cd qwen3-asr-rs
cargo build --release --example local_transcribe
TOKEN="$XYZ_ACCESS_TOKEN"
BASE="http://localhost:23020"
# Search episodes by keyword
curl -s -X POST $BASE/search \
-H "x-jike-access-token: $TOKEN" -H "Content-Type: application/json" \
-d '{"keyword":"关键词","type":"EPISODE"}'
# Get episode detail (contains audio URL)
curl -s -X POST $BASE/episode_detail \
-H "x-jike-access-token: $TOKEN" -H "Content-Type: application/json" \
-d '{"eid":"EPISODE_ID"}'
# List episodes of a podcast
curl -s -X POST $BASE/episode_list \
-H "x-jike-access-token: $TOKEN" -H "Content-Type: application/json" \
-d '{"pid":"PODCAST_ID","order":"desc"}'
Audio URL is in data.data.media.source.url (m4a format).
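To script the pipeline, the URL can be pulled out with jq (a sketch; assumes jq is installed and the episode ID is already known):
AUDio_URL_SKETCH="" # illustration only
AUDIO_URL=$(curl -s -X POST $BASE/episode_detail \
  -H "x-jike-access-token: $TOKEN" -H "Content-Type: application/json" \
  -d '{"eid":"EPISODE_ID"}' | jq -r '.data.data.media.source.url')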
mkdir -p /tmp/xiaoyuzhou-audio
curl -L -o /tmp/xiaoyuzhou-audio/episode.m4a "$AUDIO_URL"
ffmpeg -y -i /tmp/xiaoyuzhou-audio/episode.m4a -ar 16000 -ac 1 /tmp/xiaoyuzhou-audio/episode.wav
Podcasts are continuous speech with few silence gaps. Use fixed-interval splitting:
# Split into 3-minute segments (must split at ≥2 min for Metal GPU memory)
ffmpeg -y -i episode.wav -f segment -segment_time 180 -ar 16000 -ac 1 seg_%03d.wav
Or try silence-based splitting (may find no gaps in continuous podcasts):
ffmpeg -i episode.wav -af "silencedetect=noise=-30dB:d=2" -f null - 2>&1 | grep silence_end
ffmpeg -i episode.wav -f segment -segment_times T1,T2 -ar 16000 -ac 1 seg_%03d.wav
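One way to turn the detected silence points into the T1,T2 list automatically (a sketch; assumes standard grep/awk/paste and that at least one silence gap was found):
# Collect silence_end timestamps and join them into a comma-separated split list
TIMES=$(ffmpeg -i episode.wav -af "silencedetect=noise=-30dB:d=2" -f null - 2>&1 \
  | grep -o 'silence_end: [0-9.]*' | awk '{print $2}' | paste -sd, -)
ffmpeg -y -i episode.wav -f segment -segment_times "$TIMES" -ar 16000 -ac 1 seg_%03d.wav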
MODEL_DIR="/path/to/models/0.6B"
ASR_BIN="qwen3-asr-rs/target/release/examples/local_transcribe"
# Transcribe each segment
for seg in seg_*.wav; do
  "$ASR_BIN" "$MODEL_DIR" "$seg" 2>/dev/null | grep "^Text :" | sed 's/^Text : //'
done
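To keep the combined text for the final markdown step, the same loop can be redirected into one file (transcript.txt is just an assumed name):
for seg in seg_*.wav; do
  "$ASR_BIN" "$MODEL_DIR" "$seg" 2>/dev/null | grep "^Text :" | sed 's/^Text : //'
done > transcript.txt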
For efficiency, load the model once in Rust and reuse it across segments:
use qwen3_asr::{AsrInference, TranscribeOptions, best_device};

// Load the model once; reuse the engine for every segment
let engine = AsrInference::load("models/0.6B", best_device())?;
let mut output: Vec<String> = Vec::new();
for seg in segments {
    // `segments` holds the paths of the 16 kHz mono WAV files produced above
    let result = engine.transcribe(&seg, TranscribeOptions::default())?;
    output.push(result.text);
}
Combine transcript with metadata as markdown:
# {title}
**节目**: {podcast.title} | **日期**: {pubDate} | **时长**: {duration}s
## 转录文本
{transcript}
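A minimal shell sketch of that assembly, assuming the metadata was already extracted from the episode_detail JSON (e.g. with jq) and the combined transcript is in transcript.txt (both assumptions, not part of the skill itself):
# Assumes TITLE, PODCAST, PUBDATE and DURATION were pulled from episode_detail,
# and transcript.txt holds the concatenated segment output
cat > episode.md <<EOF
# ${TITLE}
**节目**: ${PODCAST} | **日期**: ${PUBDATE} | **时长**: ${DURATION}s
## 转录文本
$(cat transcript.txt)
EOF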
Token refresh: POST /refresh_token
Environment variables: XYZ_ACCESS_TOKEN, XYZ_REFRESH_TOKEN
The local_transcribe binary is needed (the demo binary only runs built-in test samples).
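If the access token expires, the refresh call looks roughly like this (a sketch only; the exact headers/body for /refresh_token are documented in references/xyz-api.md, and the x-jike-refresh-token header shown here is an assumption):
# Header names are an assumption; confirm against references/xyz-api.md
curl -s -X POST $BASE/refresh_token \
  -H "x-jike-access-token: $XYZ_ACCESS_TOKEN" \
  -H "x-jike-refresh-token: $XYZ_REFRESH_TOKEN" \
  -H "Content-Type: application/json"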