Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

ifly-speed-transcription

Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 81 · 0 current installs · 0 all-time installs
by Iflytek AIcloud @qingzhe2020
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Functionality (audio upload, multipart upload, create/poll transcription tasks) matches the description of an iFLYTEK speed-transcription client. The code expects iFlytek credentials (app id, api key, api secret), which are appropriate for this purpose. However, the registry metadata lists no required environment variables/credentials even though SKILL.md and scripts clearly require XFEI_APP_ID / XFEI_API_KEY / XFEI_API_SECRET — this metadata omission is an inconsistency.
Instruction Scope
SKILL.md gives concrete runtime instructions (set env vars, run python script, upload/poll workflow). The instructions themselves are scoped to transcription and do not ask for unrelated host data. One oddity: the repository contains a .claude/settings.local.json with Read and Bash permissions pointing to a user-specific Desktop path and zip commands; that file is not required for normal use and appears to be author-local packaging metadata rather than necessary runtime instructions, but it could reveal an over-broad permission intent if honored by an agent runtime.
Install Mechanism
There is no install spec (instruction-only + a Python script). That lowers installation risk; dependencies are standard (requests, urllib3) listed in _meta.json. No remote archive downloads or unusual install sources are present in the provided files.
Credentials
The script and SKILL.md require three secrets (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET) — these are proportionate to calling the iFlytek API. The concern is that the skill registry metadata does not declare these required env vars (it lists none). This mismatch can lead to accidental omission of required secrets or confusion about what the skill will access. Also the skill supports an optional callback_url parameter — if set to an attacker-controlled endpoint it could be used to exfiltrate transcription results; users should inspect and control any callback_url usage.
Persistence & Privilege
The skill is not always-enabled and uses normal autonomous invocation defaults — no elevated persistence requested. The only persistence/permission artifact is .claude/settings.local.json which enumerates local Bash and Read permissions (including reading an absolute Desktop path and running zip/py_compile). That file appears to be local packaging metadata and is not a necessary runtime privilege for the transcription task, but its presence is unusual and should be reviewed; it could indicate the author tested packaging with broad, user-specific filesystem access.
What to consider before installing
This skill appears to be a legitimate iFlytek transcription client, but there are inconsistencies you should address before installing:

  1. The SKILL.md and scripts require three environment secrets (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET), but the registry metadata lists no required env vars. Confirm you are comfortable providing those API credentials and that the metadata is corrected.
  2. Review scripts/transcribe.py yourself (or run it in an isolated environment) to confirm it only uploads the audio files you expect and does not read other files. Pay special attention to callback_url usage: avoid setting a callback to an endpoint you don't control, because transcription results could be delivered there.
  3. The .claude/settings.local.json contains author-local absolute paths and allowed Bash commands (py_compile, zip, read of a Desktop path). This is likely leftover packaging metadata, but inspect it and ignore or remove it before deployment.
  4. Only provide your iFlytek credentials to trusted code; consider creating a dedicated API key with limited scope/quota for testing.

If you want higher assurance, ask the publisher to update the registry metadata to declare required env vars and remove any author-local permission files, or run the script in a sandboxed container and monitor network calls to the xfyun endpoints.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
latest: vk973msz2m1fw5w4zgakbwxywvd836mgx

SKILL.md

iFly Speed Transcription

Ultra-fast speech transcription service that converts audio files to text in record time - 1 hour of audio transcribes in ~20 seconds.

Quick Start

# Basic transcription (auto-detect language and dialect)
python3 scripts/transcribe.py /path/to/audio.mp3

# Save to file
python3 scripts/transcribe.py /path/to/audio.wav --output result.txt

# With domain-specific optimization
python3 scripts/transcribe.py /path/to/audio.mp3 --pd medical

# With speaker separation
python3 scripts/transcribe.py /path/to/meeting.mp3 --vspp-on 1 --speaker-num 2

Setup

1. API Credentials

Get credentials from iFlytek Open Platform:

  • APP_ID: Application ID
  • API_KEY: API key for authentication
  • API_SECRET: API secret for signing requests

2. Environment Variables

export XFEI_APP_ID="your_app_id"
export XFEI_API_KEY="your_api_key"
export XFEI_API_SECRET="your_api_secret"
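
Before running the script, it can help to confirm that all three variables are actually set. The following is a minimal sketch, not code from this skill: the helper name `load_credentials` is hypothetical, and only the variable names come from the Setup section above.

```python
import os

# Variable names as listed in the Setup section.
REQUIRED_VARS = ("XFEI_APP_ID", "XFEI_API_KEY", "XFEI_API_SECRET")


def load_credentials(env=None):
    """Return the three credentials as a dict, or raise if any is missing.

    `load_credentials` is a hypothetical helper, not part of the skill.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing early with the names of the missing variables is usually easier to debug than an authentication error returned by the API.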

API Parameters

Required Parameters

| Parameter | Description |
| --- | --- |
| file_path | Path to audio file (MP3, 16kHz, 16-bit, mono) |
| --language | Language code (default: zh_cn for Chinese+English+202 dialects) |
| --accent | Accent (default: mandarin) |

Optional Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| --pd | string | Domain: court, finance, medical, tech, sport, edu, gov, game, ecom, car |
| --vspp-on | int | Speaker separation: 0=off, 1=on |
| --speaker-num | int | Number of speakers (0=auto, range 1-10) |
| --output-type | int | Output: 0=1best, 1=cnlbest, 2=multi-candidate |
| --postproc-on | int | Post-processing: 0=off, 1=on (default) |
| --enable-subtitle | int | Subtitle mode: 0=document, 1=subtitle |
| --smoothproc | bool | Smoothing: true=on, false=off (default: true) |
| --colloqproc | bool | Colloquial processing: true=on, false=off |
| --language-type | int | Language mode: 1=auto, 2=Chinese, 3=English, 4=Chinese-only |
| --dhw | string | Hot words (comma-separated, UTF-8) |
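
When calling the script from other Python code, the flags above can be assembled programmatically. This is a rough sketch under two assumptions: `build_command` is a hypothetical helper, and boolean flags are passed as the literal strings `true`/`false`. How scripts/transcribe.py actually parses booleans is not confirmed here.

```python
import shlex


def build_command(audio_path, script="scripts/transcribe.py", **options):
    """Build an argv list for the transcription script.

    Keyword names use underscores (vspp_on=1 becomes --vspp-on 1).
    Hypothetical helper; boolean formatting is an assumption.
    """
    argv = ["python3", script, audio_path]
    for name, value in options.items():
        if value is None:
            continue  # skip unset options so the script's defaults apply
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


cmd = build_command("/path/to/meeting.mp3", pd="medical", vspp_on=1, speaker_num=2)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with hot words or paths that contain spaces.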

Audio Requirements

  • Format: MP3
  • Sample rate: 16kHz
  • Bit depth: 16-bit
  • Channels: Mono (single channel)
  • Size: ≤ 500MB
  • Duration: ≤ 5 hours (recommended: ≥ 5 minutes)
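
Some of these limits can be checked locally before uploading. The sketch below is illustrative only: `check_audio` is a hypothetical helper, "500MB" is assumed to mean 500 MiB, and the header checks apply only to WAV input because Python's standard library cannot inspect MP3 files. For any other format it checks only the file size.

```python
import os
import wave

MAX_BYTES = 500 * 1024 * 1024   # assumption: "500MB" means 500 MiB
MAX_SECONDS = 5 * 3600          # 5 hours


def check_audio(path):
    """Return a list of detected problems (empty if none were found)."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 500MB")
    if path.lower().endswith(".wav"):
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            if rate != 16000:
                problems.append("sample rate is %d Hz, expected 16000" % rate)
            if wav.getsampwidth() != 2:
                problems.append("bit depth is %d-bit, expected 16" % (wav.getsampwidth() * 8))
            if wav.getnchannels() != 1:
                problems.append("%d channels, expected mono" % wav.getnchannels())
            if rate and wav.getnframes() / rate > MAX_SECONDS:
                problems.append("duration exceeds 5 hours")
    return problems
```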

Workflow

1. Upload Audio File

Files < 30MB use direct upload. Files ≥ 30MB use multipart upload (5MB chunks).
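
The size rule above can be sketched as follows. This is an illustration rather than the script's actual code: `plan_upload` is a hypothetical name, and "30MB" and "5MB" are assumed to mean 30 MiB and 5 MiB.

```python
DIRECT_UPLOAD_LIMIT = 30 * 1024 * 1024  # assumption: "30MB" means 30 MiB
CHUNK_SIZE = 5 * 1024 * 1024            # assumption: "5MB" means 5 MiB


def plan_upload(file_size):
    """Return (mode, parts), where parts is a list of (offset, length) byte ranges."""
    if file_size < DIRECT_UPLOAD_LIMIT:
        return "direct", [(0, file_size)]
    parts = []
    offset = 0
    while offset < file_size:
        # The final chunk may be shorter than CHUNK_SIZE.
        length = min(CHUNK_SIZE, file_size - offset)
        parts.append((offset, length))
        offset += length
    return "multipart", parts
```

For example, a 32 MiB file would be split into six 5 MiB chunks followed by one 2 MiB chunk.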

2. Create Transcription Task

Submit uploaded file URL with transcription parameters.

3. Poll for Results

Query task status periodically until completion.
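
A polling loop along these lines might look like the sketch below. The actual query request is not shown in this document, so it is represented by a caller-supplied `fetch_status` callable (hypothetical) that returns a dict with a `task_status` field. The status codes are the ones listed under Task Status; the interval and timeout values are arbitrary.

```python
import time

DONE = {"3", "4"}   # Completed, Callback completed
FAILED = {"-1"}     # Failed


def wait_for_result(fetch_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until the task finishes, fails, or times out.

    fetch_status is a caller-supplied function (hypothetical here) that
    performs one status query and returns the parsed response as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        status = str(result.get("task_status"))
        if status in DONE:
            return result
        if status in FAILED:
            raise RuntimeError("transcription task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish within %.0f seconds" % timeout)
        sleep(interval)
```

Injecting `sleep` keeps the loop easy to test without real delays.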

Response Format

{
  "task_id": "1568100557463963551003",
  "task_status": "4",
  "text": "Transcribed text content...",
  "segments": [
    {
      "speaker": "spk-0",
      "begin": "0",
      "end": "470",
      "text": "听说。"
    }
  ]
}
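
A response of this shape can be turned into a speaker-labelled transcript with a few lines of Python. This is a sketch only: `format_segments` is a hypothetical helper, and because the units of `begin` and `end` are not stated here, the values are printed unchanged.

```python
import json


def format_segments(response):
    """Return one line per segment: "[begin-end] speaker: text"."""
    lines = []
    for seg in response.get("segments", []):
        # begin/end arrive as strings in the sample response; units are not stated.
        begin = int(seg["begin"])
        end = int(seg["end"])
        speaker = seg.get("speaker", "unknown")
        lines.append("[%d-%d] %s: %s" % (begin, end, speaker, seg["text"]))
    return lines


sample = json.loads("""
{
  "task_id": "1568100557463963551003",
  "task_status": "4",
  "text": "Transcribed text content...",
  "segments": [
    {"speaker": "spk-0", "begin": "0", "end": "470", "text": "听说。"}
  ]
}
""")
print("\n".join(format_segments(sample)))
```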

Task Status

  • 1: Pending
  • 2: Processing
  • 3: Completed
  • 4: Callback completed
  • -1: Failed

Language Support

autodialect (language=zh_cn)

Automatic recognition of Chinese, English, and 202 Chinese dialects including:

  • Major: Mandarin, Cantonese, Taiwanese, Sichuanese, Shanghainese, Northeastern
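  • Partial list: Hefei, Wuhu, Northern Anhui, Cantonese, Beijing, Fuzhou, Southern Min (Minnan), Chaoshan, Hakka, Guiyang, Haikou, Shijiazhuang, Taiyuan, Zhengzhou, Northeastern, Wuhan, Changsha, Nanjing, Nanchang, Dalian, Hohhot, Yinchuan, Xining, Jinan, Xi'an, Shanghainese, Sichuanese, Taiwanese, Tianjin, Ürümqi, Yunnan, Hangzhou, Chongqing (202 total)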

Common Use Cases

  1. Meeting Transcription: Convert meeting recordings to text with speaker separation
  2. Interview Recording: Transcribe interviews for documentation
  3. Lecture Recording: Convert academic lectures to searchable text
  4. Voice Notes: Transform voice memos into text notes
  5. Call Center: Analyze customer service calls
  6. Legal Proceedings: Transcribe court hearings with domain optimization
  7. Medical Consultation: Doctor-patient conversation documentation

Error Handling

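| Error Code | Description | Suggested fix |
| --- | --- | --- |
| 10107 | Invalid custom audio encoding field | Check that the value passed for encoding is valid (◎_◎) |
| 10303 | Invalid parameter value | Check the parameter values you passed (°∀°)ノ |
| 10043 | Audio decoding failed | Check that the audio matches the format described by the encoding field |
| 20304 | Silent audio, or audio format does not match the parameters | Check that the audio is 16 kHz, 16-bit, mono (。•́︿•̀。) |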
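
The error codes in the table above can be mapped to short English messages when wrapping the script. This is a minimal sketch; `explain_error` and the exact message wording are illustrative, and only the four codes listed above are covered.

```python
# Descriptions and hints paraphrase the error table above.
ERROR_HINTS = {
    10107: "Invalid custom audio encoding field: check the encoding value.",
    10303: "Invalid parameter value: check the parameters you passed.",
    10043: "Audio decoding failed: check that the audio matches the encoding field.",
    20304: "Silent audio or format mismatch: check for 16 kHz, 16-bit, mono audio.",
}


def explain_error(code):
    """Return a readable hint for an error code given as int or str."""
    try:
        key = int(code)
    except (TypeError, ValueError):
        key = None
    return ERROR_HINTS.get(key, "Unrecognized error code: %r" % (code,))
```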

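💡 Having trouble?

FAQ

Q: What is the main function of Speed Transcription? A: It quickly converts long audio recordings (up to 5 hours) into text. (๑•̀ㅂ•́)و✧

Q: Which languages does Speed Transcription support? A: Chinese and English, plus 202 dialects recognized without switching modes. ヽ(✿゚▽゚)ノ

Q: Which application platforms does Speed Transcription support? A: Currently only the WebAPI platform.

Q: Why is only MP3 supported? A: MP3 is widely compatible, produces small files, and transfers quickly. Encoding with lame is enough to get started. (◕‿◕)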

Tips

  1. For speaker separation: Use --vspp-on 1 for better speaker diarization
  2. For specific domains: Use --pd parameter for improved accuracy
  3. For faster processing: Audio files ≥ 5 minutes are prioritized
  4. For subtitle output: Use --enable-subtitle 1 for subtitle-formatted output
  5. For hot words: Use --dhw="word1,word2" to boost recognition accuracy
