---
name: xiaoyuzhou-asr
description: >
  Transcribe 小宇宙 (Xiaoyuzhou) podcast episodes to text using local Qwen3-ASR speech recognition.
  Combines xyz API (小宇宙FM API) to fetch episode metadata and audio URLs with qwen3-asr-rs for
  local GPU-accelerated speech-to-text. Use when user wants to: (1) transcribe a 小宇宙 podcast episode,
  (2) download podcast audio from 小宇宙, (3) search and transcribe podcast content, (4) batch transcribe
  multiple episodes. Triggers: 小宇宙, podcast transcription, 播客转文字, ASR, speech recognition,
  episode transcript, 节目转录, qwen3-asr, 小宇宙转录.
---

# xiaoyuzhou-asr

Transcribe 小宇宙 podcast episodes to text using local Qwen3-ASR (Metal/CUDA accelerated).

> **Required service**: this skill does not call 小宇宙 directly. It requires a compatible
> [ultrazg/xyz](https://github.com/ultrazg/xyz.git) API server to be installed and running.
> Default base URL is `http://localhost:23020`; override it with `XYZ_BASE_URL`. Without
> this service, login, search, episode lookup, and audio URL retrieval will not work.

## Prerequisites

1. **xyz API server** running — fetches episode data and audio URLs from 小宇宙
   ```bash
   git clone https://github.com/ultrazg/xyz.git && cd xyz && go run .
   # Default port: 23020, change with -p
   ```
2. **ffmpeg** — audio format conversion (`brew install ffmpeg`)
3. **Qwen3-ASR model** — download (HF Hub does NOT ship tokenizer.json):
   ```bash
   python3 -c "
   from huggingface_hub import snapshot_download
   snapshot_download('Qwen/Qwen3-ASR-0.6B', local_dir='models/0.6B')
   "
   ```
4. **qwen3-asr-rs** — build from source:
   ```bash
   git clone https://github.com/alan890104/qwen3-asr-rs.git && cd qwen3-asr-rs
   cargo build --release --example local_transcribe
   ```
5. **tokenizer.json** — auto-generated by the transcription script on first run (from vocab.json + merges.txt). No manual step needed.

## Quick Start

```bash
# 1. Login (first time only, saves to ~/.xiaoyuzhou-asr.json)
python3 scripts/transcribe_podcast.py --login

# 2. Check all dependencies
python3 scripts/transcribe_podcast.py --check-env

# 3. Transcribe a single episode
python3 scripts/transcribe_podcast.py --keyword "早咖啡" -o output.md

# Or transcribe a shared episode URL
python3 scripts/transcribe_podcast.py --url "https://www.xiaoyuzhoufm.com/episode/EPISODE_ID" -o output.md
```

## CLI Commands

### Authentication

```bash
# Interactive login — sends verification code to phone, saves tokens
python3 scripts/transcribe_podcast.py --login
```

### Discovery

```bash
# Search podcasts and show PID (for batch mode)
python3 scripts/transcribe_podcast.py --podcast-info --keyword "声动早咖啡"

# List recent episodes of a podcast
python3 scripts/transcribe_podcast.py --list-episodes --pid PODCAST_ID --count 10
```

### Transcription

```bash
# Single episode by keyword (picks first result)
python3 scripts/transcribe_podcast.py --keyword "关键词" -o output.md

# Single episode by EID
python3 scripts/transcribe_podcast.py --eid EPISODE_ID -o output.md

# Single episode by Xiaoyuzhou URL
python3 scripts/transcribe_podcast.py --url "https://www.xiaoyuzhoufm.com/episode/EPISODE_ID" -o output.md

# Batch: transcribe 5 latest episodes of a podcast
python3 scripts/transcribe_podcast.py --pid PODCAST_ID --count 5 -o ./transcripts/

# With specific format
python3 scripts/transcribe_podcast.py --eid EPISODE_ID --format srt -o output.srt
```
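
The `--url` and `--eid` forms differ only in how the episode is named, so the URL form presumably reduces to an ID lookup. A minimal sketch of that step, assuming the ID is the path segment that follows `/episode/`; the helper name `extract_eid` is hypothetical and not taken from the script:

```python
from urllib.parse import urlparse


def extract_eid(url: str) -> str:
    """Return the episode ID from a share URL (hypothetical helper)."""
    # urlparse drops the query string and fragment from .path for us.
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumption: share links have the shape /episode/<EID>.
    if len(parts) >= 2 and parts[-2] == "episode":
        return parts[-1]
    raise ValueError(f"not an episode URL: {url}")


if __name__ == "__main__":
    print(extract_eid("https://www.xiaoyuzhoufm.com/episode/EPISODE_ID"))
```

Raising `ValueError` on anything that is not an episode link keeps a podcast or profile URL from being passed on as if it were an episode ID.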

### Diagnostics

```bash
# Check all dependencies (ffmpeg, xyz API, token, ASR binary, model)
python3 scripts/transcribe_podcast.py --check-env
```
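
For orientation, the kind of checks `--check-env` reports can be approximated with the standard library. This is a sketch, not the script's implementation: it only reads the environment variables listed under Configuration (it ignores the config file and auto-detection), and it tests the xyz API with a plain TCP connection rather than a real API request. The function names `port_open` and `check_env` are illustrative.

```python
import os
import shutil
import socket
from urllib.parse import urlparse


def port_open(base_url: str, timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections at base_url's host and port."""
    parsed = urlparse(base_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_env() -> dict:
    """Collect a pass/fail flag per dependency (illustrative only)."""
    model_dir = os.environ.get("QWEN3_ASR_MODEL_DIR", "")
    asr_bin = os.environ.get("QWEN3_ASR_BIN", "")
    base_url = os.environ.get("XYZ_BASE_URL", "http://localhost:23020")
    return {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "xyz_api": port_open(base_url),
        "token": bool(os.environ.get("XYZ_ACCESS_TOKEN")),
        "asr_bin": os.path.isfile(asr_bin) and os.access(asr_bin, os.X_OK),
        "model": os.path.isdir(model_dir),
    }


if __name__ == "__main__":
    for name, ok in check_env().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

An open port only shows that some process is listening there; the real `--check-env` output remains the authoritative check.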

## Output Formats

| Format | Flag | Description |
|--------|------|-------------|
| Markdown | `--format markdown` (default) | Metadata header + transcript |
| SRT | `--format srt` | Subtitles with estimated timestamps |
| Plain text | `--format txt` | Minimal header + transcript |
| JSON | `--format json` | Metadata + transcript as JSON |

## Batch Mode

- Transcribes the N most recent episodes of a podcast (`--pid --count N`)
- Saves each episode as a separate file in the output directory
- **Checkpoint/resume**: skips episodes that already exist in the output directory
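
The checkpoint/resume behaviour amounts to filtering out episodes whose output file is already present. A minimal sketch, assuming one file per episode named `<EID>.<ext>`; the script's actual naming scheme may differ, and `pending_episodes` is a hypothetical helper:

```python
from pathlib import Path


def pending_episodes(eids: list[str], out_dir: Path, ext: str = "md") -> list[str]:
    """Return the episode IDs that have no output file in out_dir yet."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: files are named <EID>.<ext>; adjust to the real scheme.
    return [eid for eid in eids if not (out_dir / f"{eid}.{ext}").exists()]


if __name__ == "__main__":
    todo = pending_episodes(["EID_1", "EID_2"], Path("./transcripts"))
    print(f"{len(todo)} episode(s) still to transcribe")
```

Because the check is based on file existence alone, a partially written file from an interrupted run would be skipped; deleting it forces that episode to be redone.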

## Configuration

Settings are resolved in priority order: **CLI argument > Environment variable > Config file**.

### Config File (`~/.xiaoyuzhou-asr.json`)

Created automatically by `--login`. It can also store the model and binary paths:

```json
{
  "token": "x-jike-access-token",
  "refresh_token": "x-jike-refresh-token",
  "model_dir": "/path/to/models/0.6B",
  "asr_bin": "/path/to/local_transcribe"
}
```

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `XYZ_ACCESS_TOKEN` | x-jike access token | — (required) |
| `XYZ_REFRESH_TOKEN` | Refresh token for auto-renewal | — (optional) |
| `XYZ_BASE_URL` | xyz API base URL | `http://localhost:23020` |
| `XYZ_HTTP_TIMEOUT` | xyz API request timeout in seconds | `15` |
| `XYZ_DOWNLOAD_TIMEOUT` | Audio download timeout in seconds | `120` |
| `QWEN3_ASR_MODEL_DIR` | Qwen3-ASR model directory | auto-detect |
| `QWEN3_ASR_BIN` | local_transcribe binary path | auto-detect |
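
The priority order (CLI argument, then environment variable, then config file) can be expressed as a small lookup. The sketch below is illustrative; `load_config` and `resolve` are hypothetical names, and only the file path, the environment variable names, and the config keys come from this document.

```python
import json
import os
from pathlib import Path
from typing import Optional

CONFIG_PATH = Path.home() / ".xiaoyuzhou-asr.json"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Read the JSON config file, returning {} if it does not exist."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def resolve(cli_value: Optional[str], env_name: str, config_key: str,
            config: dict, default: Optional[str] = None) -> Optional[str]:
    """Pick the first value that is set: CLI, then environment, then config."""
    if cli_value:
        return cli_value
    if os.environ.get(env_name):
        return os.environ[env_name]
    return config.get(config_key, default)


if __name__ == "__main__":
    cfg = load_config()
    # None stands in for "no --model-dir style argument was given".
    print(resolve(None, "QWEN3_ASR_MODEL_DIR", "model_dir", cfg))
```

Treating an empty environment variable as unset means an accidental `export QWEN3_ASR_MODEL_DIR=` falls through to the config file instead of producing an empty path.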

## Token Management

- `--login` saves tokens to config file automatically
- If the xyz API returns 401, the access token is refreshed automatically using the refresh token
- If no valid token is available, prompt the user to log in with `--login`
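
The refresh-on-401 flow is a retry-once pattern. The sketch below shows its shape with caller-supplied callables; `TokenExpired`, `call_with_refresh`, `request`, and `refresh` are stand-ins invented for illustration and are not the script's own names or the xyz API's endpoints.

```python
class TokenExpired(Exception):
    """Stand-in for the error raised when the API answers 401."""


def call_with_refresh(request, refresh, tokens: dict):
    """Run request(access_token); on expiry, refresh once and retry.

    request: callable taking an access token, raising TokenExpired on 401.
    refresh: callable taking a refresh token, returning new token fields.
    tokens:  dict with "token" and optionally "refresh_token"; updated in place.
    """
    try:
        return request(tokens["token"])
    except TokenExpired:
        if not tokens.get("refresh_token"):
            raise  # nothing to refresh with: the user has to log in again
        tokens.update(refresh(tokens["refresh_token"]))
        # A second TokenExpired propagates to the caller instead of looping.
        return request(tokens["token"])
```

Retrying exactly once avoids an endless loop when the refresh token itself is no longer accepted.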

## References

- **xyz API endpoints and auth**: [references/xyz-api.md](references/xyz-api.md)
- **Qwen3-ASR usage and performance**: [references/qwen3-asr.md](references/qwen3-asr.md)

## Constraints

- **MUST split audio** into ≤3-minute segments for Metal GPU stability (auto-handled by script)
- Audio must be WAV 16kHz mono (auto-converted by script)
- tokenizer.json auto-generated on first run (from vocab.json + merges.txt)
- Login through the xyz API requires a Chinese (+86) phone number
- All processing is local — audio never leaves the machine
- Download retries up to 3 times on network failure
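
The script performs the conversion and splitting itself, but the two audio constraints map onto a single ffmpeg invocation, which can be useful when debugging a problematic file by hand. The sketch below only builds the command; the helper names and the output pattern are assumptions, and the script's real ffmpeg arguments may differ.

```python
import math

SEGMENT_SECONDS = 180  # 3 minutes, the upper bound stated in the constraints


def ffmpeg_split_command(src: str, out_pattern: str) -> list[str]:
    """Build an ffmpeg command that resamples and splits in one pass."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        "-c:a", "pcm_s16le",   # 16-bit PCM, the usual WAV encoding
        "-f", "segment",       # ffmpeg's segment muxer
        "-segment_time", str(SEGMENT_SECONDS),
        out_pattern,           # e.g. "chunks/part_%03d.wav"
    ]


def segment_count(duration_seconds: float) -> int:
    """Number of segments to expect for a given audio duration."""
    return max(1, math.ceil(duration_seconds / SEGMENT_SECONDS))


if __name__ == "__main__":
    print(" ".join(ffmpeg_split_command("episode.m4a", "chunks/part_%03d.wav")))
    print(segment_count(45 * 60))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` runs it without going through a shell, so file names with spaces need no extra quoting.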

## Script Reuse

This is a skill project, not a packaged Python library. Prefer the CLI above. Other scripts in
this repository can still import `scripts/transcribe_podcast.py` directly:

```python
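import os
import sys

sys.path.insert(0, "scripts")  # assumes the repository root is the working directory

from transcribe_podcast import (
    search_episodes, transcribe_episode, format_output,
    TranscriptionError, ApiError, TokenExpiredError,
)

# Inputs taken from the environment variables documented above.
token = os.environ["XYZ_ACCESS_TOKEN"]
model_dir = os.environ["QWEN3_ASR_MODEL_DIR"]
asr_bin = os.environ["QWEN3_ASR_BIN"]
eid = "EPISODE_ID"  # placeholder: replace with a real episode ID

try:
    episodes, _ = search_episodes(token, "早咖啡")
    episode, transcript, timings = transcribe_episode(token, eid, model_dir, asr_bin)
    output = format_output(episode, transcript)
    print(output)
except (TranscriptionError, ApiError, TokenExpiredError) as e:
    print(f"Error: {e}")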
```
