#transcription #speech-to-text #audio #transcribe

ifly-speed-transcription

Ultra-fast speech transcription using the iFLYTEK Speed Transcription API. Transcribes audio files (WAV/PCM/MP3) up to 5 hours long, at roughly 20 seconds of processing per hour of audio. Supports Chinese, English, and 202 Chinese dialects with automatic language detection. Use when the user asks to transcribe audio files or convert speech to text, or mentions "speed transcription" or "极速转写".

Iflytek AIcloud@qingzhe2020

Install

openclaw skills install @qingzhe2020/ifly-speed-transcription

iFly Speed Transcription

An ultra-fast speech transcription service that converts audio files to text: 1 hour of audio transcribes in about 20 seconds.

Quick Start

```bash
# Basic transcription (auto-detect language and dialect)
python3 scripts/transcribe.py /path/to/audio.mp3

# Save to file
python3 scripts/transcribe.py /path/to/audio.wav --output result.txt

# With domain-specific optimization
python3 scripts/transcribe.py /path/to/audio.mp3 --pd medical

# With speaker separation
python3 scripts/transcribe.py /path/to/meeting.mp3 --vspp-on 1 --speaker-num 2
```

Setup

1. API Credentials

Get credentials from iFlytek Open Platform:

APP_ID: Application ID
API_KEY: API key for authentication
API_SECRET: API secret for signing requests

2. Environment Variables

```bash
export XFEI_APP_ID="your_app_id"
export XFEI_API_KEY="your_api_key"
export XFEI_API_SECRET="your_api_secret"
```
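A script that depends on these variables can fail early with a clear message when one is missing. The following is a minimal sketch, not the code in `scripts/transcribe.py`; the helper name `load_credentials` is hypothetical, and only the three variable names come from the export lines above.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("XFEI_APP_ID", "XFEI_API_KEY", "XFEI_API_SECRET")


def load_credentials(env=None):
    """Return the three credentials as a dict, or raise if any is missing.

    `env` defaults to os.environ; passing a plain dict makes testing easier.
    This helper is hypothetical and not part of the skill itself.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_credentials()` with no arguments reads from the current process environment.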

API Parameters

Required Parameters

| Parameter | Description |
| --- | --- |
| `file_path` | Path to audio file (MP3, 16kHz, 16-bit, mono) |
| `--language` | Language code (default: `zh_cn` for Chinese+English+202 dialects) |
| `--accent` | Accent (default: `mandarin`) |

Optional Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `--pd` | string | Domain: court, finance, medical, tech, sport, edu, gov, game, ecom, car |
| `--vspp-on` | int | Speaker separation: 0=off, 1=on |
| `--speaker-num` | int | Number of speakers (0=auto, range 1-10) |
| `--output-type` | int | Output: 0=1best, 1=cnlbest, 2=multi-candidate |
| `--postproc-on` | int | Post-processing: 0=off, 1=on (default) |
| `--enable-subtitle` | int | Subtitle mode: 0=document, 1=subtitle |
| `--smoothproc` | bool | Smoothing: true=on, false=off (default: true) |
| `--colloqproc` | bool | Colloquial processing: true=on, false=off |
| `--language-type` | int | Language mode: 1=auto, 2=Chinese, 3=English, 4=Chinese-only |
| `--dhw` | string | Hot words (comma-separated, UTF-8) |
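When the script is driven from another Python program, the flags above can be assembled into an argument list rather than a hand-written shell string. This is a rough sketch under assumptions: `build_command` is a hypothetical helper, and writing booleans as `true`/`false` is a guess at what the script accepts for `--smoothproc` and `--colloqproc`.

```python
def build_command(audio_path, script="scripts/transcribe.py", **options):
    """Build an argv list for the transcription script.

    Option names use underscores and are mapped to the documented flags,
    e.g. vspp_on -> --vspp-on. Options set to None are skipped.
    """
    argv = ["python3", script, audio_path]
    for name, value in options.items():
        if value is None:
            continue
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            # Assumption: boolean flags take the literal strings true/false.
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Hot words are documented as a comma-separated string.
            value = ",".join(value)
        argv += [flag, str(value)]
    return argv
```

For example, `build_command("meeting.mp3", vspp_on=1, speaker_num=2)` returns `["python3", "scripts/transcribe.py", "meeting.mp3", "--vspp-on", "1", "--speaker-num", "2"]`, which can be passed to `subprocess.run` without shell quoting.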

Audio Requirements

Format: MP3
Sample rate: 16kHz
Bit depth: 16-bit
Channels: Mono (single channel)
Size: ≤ 500MB
Duration: ≤ 5 hours (recommended: ≥ 5 minutes)
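The size and duration limits can be checked before uploading. The sketch below assumes the caller already knows the file size and duration (the Python standard library cannot read MP3 duration), and it reads "500MB" as 500 × 1024 × 1024 bytes, which is an assumption. `check_audio_limits` is a hypothetical helper.

```python
MAX_SIZE_BYTES = 500 * 1024 * 1024      # "<= 500MB", interpreted as MiB (assumption)
MAX_DURATION_S = 5 * 60 * 60            # 5 hours
RECOMMENDED_MIN_DURATION_S = 5 * 60     # 5 minutes (a recommendation, not a limit)


def check_audio_limits(size_bytes, duration_s):
    """Return (errors, warnings) for the documented size and duration limits."""
    errors, warnings = [], []
    if size_bytes > MAX_SIZE_BYTES:
        errors.append("file is larger than 500MB")
    if duration_s > MAX_DURATION_S:
        errors.append("audio is longer than 5 hours")
    if duration_s < RECOMMENDED_MIN_DURATION_S:
        warnings.append("audio is shorter than the recommended 5 minutes")
    return errors, warnings
```

Format, sample rate, bit depth, and channel count are not checked here; those need an audio library or a tool such as `ffprobe`.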

Workflow

1. Upload Audio File

Files < 30MB use direct upload. Files ≥ 30MB use multipart upload (5MB chunks).
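The upload rule above can be sketched as a small planning function. It only decides the upload mode and the byte ranges of each chunk; it makes no network calls. `plan_upload` is a hypothetical name, and treating "MB" as 1024 × 1024 bytes is an assumption.

```python
DIRECT_UPLOAD_LIMIT = 30 * 1024 * 1024  # files below this size use direct upload
CHUNK_SIZE = 5 * 1024 * 1024            # multipart chunk size


def plan_upload(size_bytes):
    """Return (mode, ranges), where ranges is a list of (start, end) byte offsets.

    `end` is exclusive, so data[start:end] gives the bytes of each part.
    """
    if size_bytes < DIRECT_UPLOAD_LIMIT:
        return "direct", [(0, size_bytes)]
    ranges = [
        (start, min(start + CHUNK_SIZE, size_bytes))
        for start in range(0, size_bytes, CHUNK_SIZE)
    ]
    return "multipart", ranges
```

A 32MB file, for instance, is planned as seven parts: six full 5MB chunks and a final 2MB chunk.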

2. Create Transcription Task

Submit uploaded file URL with transcription parameters.

3. Poll for Results

Query task status periodically until completion.
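A polling loop for this step might look like the sketch below. It is not the skill's actual implementation: `fetch_status` stands in for whatever function queries the task, the interval and timeout values are arbitrary, and treating status 3 and 4 as success and -1 as failure follows the Task Status list in this document.

```python
import time

DONE_STATUSES = {"3", "4"}   # Completed, Callback completed
FAILED_STATUSES = {"-1"}     # Failed


def poll_until_done(fetch_status, interval_s=2.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the task finishes, fails, or times out.

    fetch_status is a caller-supplied callable (hypothetical) that returns
    a dict containing at least "task_status".
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch_status()
        status = str(result.get("task_status"))
        if status in DONE_STATUSES:
            return result
        if status in FAILED_STATUSES:
            raise RuntimeError("transcription task failed")
        if clock() >= deadline:
            raise TimeoutError("transcription task did not finish in time")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.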

Response Format

```json
{
  "task_id": "1568100557463963551003",
  "task_status": "4",
  "text": "Transcribed text content...",
  "segments": [
    {
      "speaker": "spk-0",
      "begin": "0",
      "end": "470",
      "text": "听说。"
    }
  ]
}
```
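A response of this shape can be turned into one line per segment. The sketch below uses only the field names shown above; `format_segments` is a hypothetical helper. The unit of `begin` and `end` is not stated here, so the values are printed unchanged.

```python
import json


def format_segments(response_json):
    """Return one "[begin-end] speaker: text" line per segment."""
    data = json.loads(response_json)
    lines = []
    for seg in data.get("segments", []):
        # Fields are kept as the strings the response provides.
        lines.append(f'[{seg["begin"]}-{seg["end"]}] {seg["speaker"]}: {seg["text"]}')
    return lines
```

Applied to the sample response, this returns a single line: `[0-470] spk-0: 听说。`.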

Task Status

1: Pending
2: Processing
3: Completed
4: Callback completed
-1: Failed
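In client code these status codes are easier to handle as a lookup table. The sample response carries `task_status` as a string, so the sketch below normalizes to strings. Treating 3, 4, and -1 as the states where polling can stop is an assumption, and both function names are hypothetical.

```python
STATUS_LABELS = {
    "1": "Pending",
    "2": "Processing",
    "3": "Completed",
    "4": "Callback completed",
    "-1": "Failed",
}

# Assumption: these are the states after which the task no longer changes.
FINISHED_STATUSES = {"3", "4", "-1"}


def describe_status(code):
    """Return the label for a status code given as int or str."""
    return STATUS_LABELS.get(str(code), f"Unknown status {code}")


def is_finished(code):
    """Return True when the status is a final one."""
    return str(code) in FINISHED_STATUSES
```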

Language Support

autodialect (language=zh_cn)

Automatic recognition of Chinese, English, and 202 Chinese dialects including:

Major: Mandarin, Cantonese, Taiwanese, Sichuanese, Shanghainese, Northeastern
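Partial list: Hefei, Wuhu, Northern Anhui, Cantonese, Beijing, Fuzhou, Hokkien (Minnan), Teochew (Chaoshan), Hakka, Guiyang, Haikou, Shijiazhuang, Taiyuan, Zhengzhou, Northeastern, Wuhan, Changsha, Nanjing, Nanchang, Dalian, Hohhot, Yinchuan, Xining, Jinan, Xi'an, Shanghainese, Sichuanese, Taiwanese, Tianjin, Ürümqi, Yunnan, Hangzhou, Chongqing (202 in total)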

Common Use Cases

Meeting Transcription: Convert meeting recordings to text with speaker separation
Interview Recording: Transcribe interviews for documentation
Lecture Recording: Convert academic lectures to searchable text
Voice Notes: Transform voice memos into text notes
Call Center: Analyze customer service calls
Legal Proceedings: Transcribe court hearings with domain optimization
Medical Consultation: Doctor-patient conversation documentation

Error Handling

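| Error Code | Description | Friendly hint |
| --- | --- | --- |
| 10107 | Invalid custom audio encoding field | Check that the `encoding` value is valid (◎_◎) |
| 10303 | Invalid parameter value | Check the parameter values for mistakes (°∀°)ﾉ |
| 10043 | Audio decoding failed | Check that the audio matches the format declared in the `encoding` field |
| 20304 | Silent audio, or audio format does not match the parameters | Check that the audio is 16k, 16-bit, mono (｡•́︿•̀｡) |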
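The error codes in the table can be mapped to hints in code so that a failure produces an actionable message. This is a minimal sketch: the four codes come from the table, the English wording of the hints is a paraphrase, and `explain_error` is a hypothetical helper.

```python
ERROR_HINTS = {
    10107: "Check that the encoding value is valid.",
    10303: "Check the parameter values for mistakes.",
    10043: "Check that the audio matches the format declared in the encoding field.",
    20304: "Check that the audio is 16k, 16-bit, mono and not silent.",
}


def explain_error(code):
    """Return a hint for an error code given as int or numeric string."""
    try:
        code_int = int(code)
    except (TypeError, ValueError):
        return f"Unrecognized error code: {code!r}"
    return ERROR_HINTS.get(
        code_int, f"Unknown error code {code_int}; see the API documentation."
    )
```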

💡 Having trouble?

📖 API documentation: https://console.xfyun.cn/services/ost
💰 Purchase a plan: https://www.xfyun.cn/services/fast_lfasr?target=price

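FAQ

Q: What is the main function of Speed Transcription? A: It quickly converts long audio recordings (up to 5 hours) into text (๑•̀ㅂ•́)و✧

Q: Which languages does Speed Transcription support? A: Chinese and English, plus 202 dialects recognized without switching modes! ヽ(✿ﾟ▽ﾟ)ノ

Q: Which application platforms does Speed Transcription support? A: It currently supports the WebAPI platform.

Q: Why is only MP3 supported? A: MP3 is widely compatible, produces small files, and transfers quickly. Encode with lame and integration is easy! (◕‿◕)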

Tips

For speaker separation: Use --vspp-on 1 for better speaker diarization
For specific domains: Use --pd parameter for improved accuracy
For faster processing: Audio files ≥ 5 minutes are prioritized
For subtitle output: Use --enable-subtitle 1 for subtitle-formatted output
For hot words: Use --dhw="word1,word2" to boost recognition accuracy