ifly-speed-transcription
Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C...
iFly Speed Transcription
Ultra-fast speech transcription service that converts audio files to text: 1 hour of audio transcribes in about 20 seconds.
Quick Start
# Basic transcription (auto-detect language and dialect)
python3 scripts/transcribe.py /path/to/audio.mp3
# Save to file
python3 scripts/transcribe.py /path/to/audio.wav --output result.txt
# With domain-specific optimization
python3 scripts/transcribe.py /path/to/audio.mp3 --pd medical
# With speaker separation
python3 scripts/transcribe.py /path/to/meeting.mp3 --vspp-on 1 --speaker-num 2
Setup
1. API Credentials
Get credentials from iFlytek Open Platform:
- APP_ID: Application ID
- API_KEY: API key for authentication
- API_SECRET: API secret for signing requests
2. Environment Variables
export XFEI_APP_ID="your_app_id"
export XFEI_API_KEY="your_api_key"
export XFEI_API_SECRET="your_api_secret"
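A script that relies on these variables can fail early with a clear message when one is missing. The following is a minimal sketch of such a check; the helper name `load_credentials` is hypothetical and is not taken from `scripts/transcribe.py`.

```python
import os

# The three variable names come from the Setup section above.
REQUIRED_VARS = ("XFEI_APP_ID", "XFEI_API_KEY", "XFEI_API_SECRET")


def load_credentials(env=os.environ):
    """Return the credentials as a dict, or raise if any variable is unset or empty.

    Hypothetical helper: the real script may read its configuration differently.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.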
API Parameters
Required Parameters
| Parameter | Description |
|---|---|
| file_path | Path to audio file (MP3, 16kHz, 16-bit, mono) |
| --language | Language code (default: zh_cn for Chinese+English+202 dialects) |
| --accent | Accent (default: mandarin) |
Optional Parameters
| Parameter | Type | Description |
|---|---|---|
| --pd | string | Domain: court, finance, medical, tech, sport, edu, gov, game, ecom, car |
| --vspp-on | int | Speaker separation: 0=off, 1=on |
| --speaker-num | int | Number of speakers (0=auto, range 1-10) |
| --output-type | int | Output: 0=1best, 1=cnlbest, 2=multi-candidate |
| --postproc-on | int | Post-processing: 0=off, 1=on (default) |
| --enable-subtitle | int | Subtitle mode: 0=document, 1=subtitle |
| --smoothproc | bool | Smoothing: true=on, false=off (default: true) |
| --colloqproc | bool | Colloquial processing: true=on, false=off |
| --language-type | int | Language mode: 1=auto, 2=Chinese, 3=English, 4=Chinese-only |
| --dhw | string | Hot words (comma-separated, UTF-8) |
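When the script is driven from other Python code, the flags above can be assembled into an argument list. The sketch below uses only flags listed in this document. The function name `build_command` and its keyword arguments are hypothetical, and it assumes the script accepts `--flag value` pairs in the usual argparse style.

```python
# Domain values copied from the --pd row of the table above.
DOMAINS = {"court", "finance", "medical", "tech", "sport",
           "edu", "gov", "game", "ecom", "car"}


def build_command(audio_path, output=None, domain=None, speakers=None,
                  subtitle=False, hot_words=()):
    """Build an argv list for scripts/transcribe.py (hypothetical helper)."""
    cmd = ["python3", "scripts/transcribe.py", audio_path]
    if output:
        cmd += ["--output", output]
    if domain is not None:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        cmd += ["--pd", domain]
    if speakers is not None:
        # 0 means auto-detect; otherwise the documented range is 1-10.
        if not 0 <= speakers <= 10:
            raise ValueError("speakers must be between 0 and 10")
        cmd += ["--vspp-on", "1", "--speaker-num", str(speakers)]
    if subtitle:
        cmd += ["--enable-subtitle", "1"]
    if hot_words:
        cmd += ["--dhw", ",".join(hot_words)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with hot words that contain spaces or non-ASCII text.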
Audio Requirements
- Format: MP3
- Sample rate: 16kHz
- Bit depth: 16-bit
- Channels: Mono (single channel)
- Size: ≤ 500MB
- Duration: ≤ 5 hours (recommended: ≥ 5 minutes)
Workflow
1. Upload Audio File
Files < 30MB use direct upload. Files ≥ 30MB use multipart upload (5MB chunks).
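The size thresholds translate into a small planning step before any upload happens. The sketch below is illustrative only: `plan_upload` is a hypothetical name, and treating 1MB as 1024 × 1024 bytes is an assumption, since the document does not say which definition the service uses.

```python
MB = 1024 * 1024                 # assumption: binary megabytes
MAX_SIZE = 500 * MB              # size limit from Audio Requirements
DIRECT_UPLOAD_LIMIT = 30 * MB    # files below this use direct upload
CHUNK_SIZE = 5 * MB              # multipart chunk size


def plan_upload(size_bytes):
    """Return a list of (start, end) byte ranges to upload.

    A single range means direct upload; several ranges mean multipart
    upload in 5MB chunks, with a possibly shorter final chunk.
    """
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes > MAX_SIZE:
        raise ValueError("file exceeds the 500MB limit")
    if size_bytes < DIRECT_UPLOAD_LIMIT:
        return [(0, size_bytes)]
    return [(start, min(start + CHUNK_SIZE, size_bytes))
            for start in range(0, size_bytes, CHUNK_SIZE)]
```

For example, `plan_upload(os.path.getsize(path))` gives the ranges that an upload loop would read with `file.seek(start)` and `file.read(end - start)`.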
2. Create Transcription Task
Submit uploaded file URL with transcription parameters.
3. Poll for Results
Query task status periodically until completion.
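A polling loop for this step might look like the sketch below. It makes no real API calls: `fetch_status` is a caller-supplied function (hypothetical) that is assumed to return a dict shaped like the response shown in this document, and the interval and timeout defaults are arbitrary. The status values follow the Task Status section: `3` and `4` are treated as finished, `-1` as failed.

```python
import time

DONE_STATUSES = {"3", "4"}   # Completed, Callback completed
FAILED_STATUSES = {"-1"}     # Failed


def wait_for_result(fetch_status, task_id, interval=2.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(task_id) until the task finishes, fails, or times out."""
    deadline = clock() + timeout
    while True:
        result = fetch_status(task_id)
        status = str(result.get("task_status"))
        if status in DONE_STATUSES:
            return result
        if status in FAILED_STATUSES:
            raise RuntimeError(f"task {task_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still has status {status}")
        sleep(interval)
```

The `sleep` and `clock` parameters exist so the loop can be tested without real delays.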
Response Format
{
"task_id": "1568100557463963551003",
"task_status": "4",
"text": "Transcribed text content...",
"segments": [
{
"speaker": "spk-0",
"begin": "0",
"end": "470",
"text": "听说。"
}
]
}
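A response of this shape can be turned into speaker-labeled lines with a few lines of Python. This is a sketch under one loudly stated assumption: `begin` and `end` are treated as milliseconds, which the document does not confirm. The function name `format_segments` is hypothetical.

```python
def format_segments(response):
    """Return one formatted line per segment: "[start-end] speaker: text".

    Assumes "begin" and "end" are millisecond offsets stored as strings.
    """
    lines = []
    for seg in response.get("segments", []):
        begin_s = int(seg["begin"]) / 1000
        end_s = int(seg["end"]) / 1000
        lines.append(f"[{begin_s:.2f}s-{end_s:.2f}s] {seg['speaker']}: {seg['text']}")
    return lines
```

Applied to the example response above, this yields a single line for speaker `spk-0` spanning `0.00s` to `0.47s`.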
Task Status
- 1: Pending
- 2: Processing
- 3: Completed
- 4: Callback completed
- -1: Failed
Language Support
autodialect (language=zh_cn)
Automatic recognition of Chinese, English, and 202 Chinese dialects including:
- Major: Mandarin, Cantonese, Taiwanese, Sichuanese, Shanghainese, Northeastern
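- Examples (202 total): Hefei, Wuhu, Northern Anhui, Cantonese, Beijing, Fuzhou, Hokkien (Minnan), Teochew (Chaoshan), Hakka, Guiyang, Haikou, Shijiazhuang, Taiyuan, Zhengzhou, Northeastern, Wuhan, Changsha, Nanjing, Nanchang, Dalian, Hohhot, Yinchuan, Xining, Jinan, Xi'an, Shanghainese, Sichuanese, Taiwanese, Tianjin, Ürümqi, Yunnan, Hangzhou, Chongqing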
Common Use Cases
- Meeting Transcription: Convert meeting recordings to text with speaker separation
- Interview Recording: Transcribe interviews for documentation
- Lecture Recording: Convert academic lectures to searchable text
- Voice Notes: Transform voice memos into text notes
- Call Center: Analyze customer service calls
- Legal Proceedings: Transcribe court hearings with domain optimization
- Medical Consultation: Doctor-patient conversation documentation
Error Handling
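| Error Code | Description | Friendly Hint |
|---|---|---|
| 10107 | Invalid custom audio encoding field | Please check that the encoding value is valid~ (◎_◎) |
| 10303 | Invalid parameter value | Please check your parameter values for mistakes~ (°∀°)ノ |
| 10043 | Audio decoding failed | Please check that the audio matches the format described by the encoding field~ |
| 20304 | Silent audio, or audio format does not match the parameters | Check that the audio is 16kHz, 16-bit, mono~ (。•́︿•̀。) |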
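Client code can surface these hints by looking up the returned error code. The sketch below covers only the four codes in the table above; `explain_error` is a hypothetical helper, and how the code reaches the caller (int or string) is an assumption.

```python
# Codes and hints taken from the error table above.
ERROR_HINTS = {
    10107: "Invalid custom audio encoding field: check the encoding value.",
    10303: "Invalid parameter value: check the values you passed.",
    10043: "Audio decoding failed: check that the audio matches the declared encoding.",
    20304: "Silent audio or format mismatch: check for 16kHz, 16-bit, mono audio.",
}


def explain_error(code):
    """Map an error code (int or numeric string) to a human-readable hint."""
    try:
        key = int(code)
    except (TypeError, ValueError):
        return f"Unrecognized error code: {code!r}"
    return ERROR_HINTS.get(key, f"Unknown error code {key}; see the API documentation.")
```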
💡 Having trouble?
- 📖 API documentation: https://console.xfyun.cn/services/ost
- 💰 Purchase a plan: https://www.xfyun.cn/services/fast_lfasr?target=price
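FAQ
Q: What is the main function of Speed Transcription? A: It quickly converts long audio recordings (up to 5 hours) into text~ (๑•̀ㅂ•́)و✧
Q: Which languages does Speed Transcription support? A: Chinese, English, and 202 dialects, all recognized without switching modes! ヽ(✿゚▽゚)ノ
Q: Which application platforms does Speed Transcription support? A: Currently only the WebAPI platform~
Q: Why is only MP3 supported? A: Because MP3 is widely compatible, small, and fast to transfer~ Encoding with lame makes integration easy! (◕‿◕)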
Tips
- For speaker separation: Use `--vspp-on 1` for better speaker diarization
- For specific domains: Use `--pd` parameter for improved accuracy
- For faster processing: Audio files ≥ 5 minutes are prioritized
- For subtitle output: Use `--enable-subtitle 1` for subtitle-formatted output
- For hot words: Use `--dhw="word1,word2"` to boost recognition accuracy
